d0p1/ack - Cute Engineering : Cute solutions to hard problems

Author	SHA1	Message	Date
David Given	d2c14ca44f	Precisely one stack hreg gets allocated for each vreg/congruence group for eviction; this prevents us from having to worry about moving values from stack slot to stack slot, which is hard.	2018-09-08 18:59:55 +02:00
David Given	b7a1c96986	MIPS appears to hate converting unsigneds to floats and vice versa.	2018-09-05 23:53:38 +02:00
David Given	fc0b0ae178	(Slightly) better errors on phi congruence group mismatches.	2018-09-05 23:53:08 +02:00
David Given	7fbce066f8	We attempt to compile the first library function; we fail.	2018-09-05 00:13:01 +02:00
David Given	26fe3f7530	Added library skeletons.	2018-09-05 00:07:07 +02:00
David Given	26c0228b14	The examples all compile now (probably incorrectly, and the libc doesn't compile yet).	2018-09-04 23:55:28 +02:00
David Given	9d80756253	Lots of assembler and rule bugfixing.	2018-09-04 23:43:24 +02:00
David Given	fe5ca5a85f	Added li and la instructions.	2018-09-03 22:47:41 +02:00
David Given	26f9b4ceae	Add in floating point support to the code generator.	2018-09-03 22:06:05 +02:00
David Given	f8e71d888b	Add some painfully untested FPU instructions.	2018-09-02 21:36:09 +02:00
David Given	1d6ecddcf4	The MIPS backend is still full of holes, and cut-and-pasted PowerPC code, but is beginning to look like an actual code generator.	2018-09-02 18:57:25 +02:00
David Given	4741ed8e14	Add a completely non-tested table-based MIPS assembler.	2018-09-01 19:35:31 +02:00
David Given	d623440c77	Add the core of a simple em22 platform. Unfortunately it doesn't work; the old em libmon vanished decades ago (or never existed), and also ass appears to have a different idea of what the em opcodes are to everything else and gets confused.	2018-06-10 20:25:48 +09:00
David Given	911ce7ceb5	Crudely tweak some of the prototypes to please clang, which is pickier about K&R C than gcc is.	2018-06-02 21:31:18 +02:00
David Given	05ddefad65	Adopt a copy of Minix 2's ed; this allows the ACK's antiquated ed scripts to run with a minimum of tweaking. Rewriting them for modern ed looks really hard. Fixes: #84	2018-06-02 18:02:51 +02:00
George Koehler	1df4db99c9	Optimize libfp. Don't lose -O6 in ackcflags. This drops 124 bytes from the mandelbrot command (from 15015 to 14891 bytes) but has almost no effect on performance; the command takes about 144 seconds (in YAZE-AG) both before and after optimizing libfp.	2018-04-25 22:48:28 -04:00
George Koehler	93e01eb5d1	Teach i80 ncg to use libfp. Enable `ack -mcpm -fp`. Old .o files stop working if they use floating point. One must recompile those files. Old files don't call libfp in the correct way, and may use symbols that I removed from libem. I don't keep old symbols in libem/flp.s, because a program that pulls both libfp and flp.s would get "multiply defined" errors in the linker. I teach mach/i80/ncg/table to use libfp by copying or adapting the patterns from mach/i86/ncg/table. I did not test all the patterns, but I did use `ack -mcpm -fp -O4` to compile examples/mandelbrot.c, then I ran it in the emulator YAZE-AG. It worked, but it was slow.	2018-04-25 16:09:56 -04:00
George Koehler	b9b3428e01	Build (but don't use) libfp for cpm. This library is for software floating point. The i80 back end has never implemented floating point, and might not be ready for libfp. This commit only builds libfp without using it. I edit first/build.lua and plat/build.lua to allow `ack -c.s`, then use FP.script to edit the assembly code. I edit FP.script so it writes the edited assembly code to stdout, not to the input file.	2018-04-25 00:34:10 -04:00
David Given	aabf0bdd69	Merge pull request #73 from kernigh/kernigh-pr better code from PowerPC ncg and mcg	2018-03-13 13:57:28 +01:00
George Koehler	b1badf1851	Add instructions like "lwarx". Extend manual. Add more page numbers from PowerPC version 2.01. Remove "xnop" not in 2.01, add "mtcr" from 2.01. Add "lwarx" and the other instructions from Book II. I did not try all the newly added instructions, but these seem to work: dcbt, dcbtst, icbi, isync, lwarx, stwcx., mftb, mftbu. In man/powerpc_as.6 (not installed), add a summary of the registers and addressing modes (like in i386_as.6), describe short forms, update description of hi16/ha16, add CAVEATS about instructions that some processors can't run.	2018-03-07 13:37:31 -05:00
David Given	4b5a7fee14	Made the cgg and the cg code generator work; use this to beat the PDP/11 backend into shape. It now generates binaries --- no idea whether they work or not.	2018-02-23 22:31:46 +01:00
George Koehler	a60738a50d	Don't use '-' in option string to getopt(). Using '-' might fail on platforms like FreeBSD. Commit `50a7031` stopped using '-' in the B compiler and ego. I now stop using '-' in mcg, because I can now check that mcg still works.	2018-02-05 14:55:10 -05:00
George Koehler	04ac91889c	Only lower "addi sp, sp, X" if X > 0. If X < 0, then lowering the addi might cause the code to use the stack space before allocating it. This is a bug because an asynchronous signal handler can overwrite the unallocated stack space.	2018-02-01 12:20:31 -05:00
George Koehler	9077b3a5ab	Teach mcg to pass our tests. Tests pass if one edits the top build.lua to uncomment "qemuppc" from both vars.plats and vars.plats_with_tests, and one leaves mcg in plat/qemuppc/descr. Add or correct some EM instructions in treebuilder.c: - "lof", "stf": handle negative offsets in load() and store(). - "cuu": add using IR_FROMUI. - "lim", "sim": keep an entire word in ".ignmask", to be compatible with mach/powerpc/libem/trp.s and ncg. We also keep a word in ".ignmask" in ncg for both i386 and m68020. - "trp": pass trap number in register. See comment in helper_function_with_arg(). - "sig": push the old value of .trppc on the stack. - "and ?", "ior ?", "xor ?", "com ?", "cms ?", "set ?", "inn ?": connect to helper functions in libem. - "blm", "bls": drop call to memmove() and use new helper ".bls4", because tests/plat/structcopy_e.c can't call memmove(). - "xor s", "cms s": if s is large, fall back on helper function. - "rol", "ror": add by decomposing each rotate into 4 IR ops. - "rck s", "bls s": make fatal unless s is word size. - "loi": push multiple loads in the correct order. - "dup s", "exg s": if s is large, fall back on helper. - "dus": add using new helper ".dus4". - "lxl", "lxa": follow the static chain, not the dynamic chain. - "lor 1": materialise the stack before pushing the stack pointer. - "lor 2", "str 2": make fatal. - "los", "sts": drop calls to memcpy() and use helpers ".los4" and ".sts4", so lang/m2/libm2/LtoUset.e starts working. - "gto": correctly read descriptor. Change mach/powerpc/mcg/table: - ANY.L: add for "asp -8". - LOAD.L: work around register corruption. - COMPAREUL.I: add for "cms 8".	2018-01-31 21:05:40 -05:00
George Koehler	3dae9e49cc	Use subfic (val - reg) and mulli (reg * val). In the instruction list, put /* kills xer */ for sraw, srawi, subfic; and correct the (now unused) "addi." and "lfdu". Change MACHOPT_F from -m3 to -m2. This changes the code for `15 * i` from `slwi r3,r4,4; subfic r5,r4,0; add r3,r3,r5` to `mulli r3,r4,15`. If the sequence "slwi subfic add" takes 3 cycles and 12 bytes, and mulli takes 3 cycles and 4 bytes, then mulli is better.	2018-01-27 15:53:05 -05:00
George Koehler	7c9c4f82fd	Get `ack -mosxppc -g` to partly work with gdb. Copy and adapt code from mach/{i386,m68020}/ncg/mach.c to pass the debugging stabs from EM to assembly. The next tools (as, led, cv) already know how to put the stabs in the Mach-o executable. Modify the function prolog/prologue so gdb uses fp, not sp, for N_LSYM and N_PSYM stabs. Simplify prolog() by reducing differences between stabs and no stabs, and zero and nonzero framesize. For files without stabs, the new prolog has the same number of instructions and memory accesses as the old prolog, and runs at about the same speed on my PowerPC Mac. This is enough to see some info for global and local variables in gdb for Mac OS X. I still can't get a backtrace; gdb gets confused because EM and ncg don't link 0(sp) to the previous stack frame. I don't expect `ack -mlinuxppc -g` to work with gdb for Linux, because we prepend underscores to the symbol table, which is correct for Mach-o but wrong for ELF.	2018-01-26 20:19:38 -05:00
George Koehler	e83aaca3ec	Add some comments before I forget how this stuff works.	2018-01-24 15:17:32 -05:00
George Koehler	e3672bd66e	Allow sp and fp on the fake stack. This simplifies parts of the PowerPC table and causes ncg to better decide whether to push sp or fp to the real stack, or coerce it to REG3, or coerce it to REG-REG3, or move it to a regvar. These better decisions remove extra _mr_ instructions. The idea comes from mach/powerpc/arm/table, where SP has a property STACKPOINTER and LB has LOCALBASE. I don't need two properties, so I make one property SPFP for both registers.	2018-01-23 18:18:40 -05:00
George Koehler	66f93f08c5	Add fef 4, fif 4. Improve fef 8, fif 8. Other float changes. When I wrote fef 8, I forgot to test denormalized numbers. Oops. Now fix two of my mistakes: - When checking for zero, `extrwi r6, r3, 22, 12` needs to be `extrwi r6, r3, 20, 12`. There are only 20 bits to extract. - After the multiplication by 2**64, I forgot to put the fraction in [0.5, 1) or (-1, -0.5] by setting IEEE exponent = 1022. Teach fif 8 about signed zero and NaN. In ncg/table, change cmf so NaN is not equal to any value, and comment why ordered comparisons don't work with NaN. Also add cost for fctwiz, remove extra `uses REG`. Edit comment in cfu8.s because the conditional branch might be before or after fctwiz.	2018-01-22 14:04:15 -05:00
George Koehler	64b50b3a45	Shrink .cfu8. With my PowerBook G4, a program that converts values from 1.0 to 4000000.0 runs in about 0.32s with the old .cfu8 and 0.29s with this shrunken .cfu8. Leave a comment about other ways to implement .cfu8.	2018-01-07 16:03:55 -05:00
George Koehler	b90c97b00b	Teach top to merge or delete "addi sp, sp, X". This reduces code size, because ncg emits too many "addi sp, sp, X" instructions when unstacking things. Now top lowers "addi sp, sp, X" by lifting other instructions. This sometimes creates chances to merge or delete _addi_ instructions. If no such chance is found, the _addi_ remains uselessly lowered. Edit ncg/table to remove something that top now does. Edit ncg/mach.c to remove some spaces after commas. This removes a whitespace difference between .s and .so files, because top removes the space.	2018-01-05 17:55:50 -05:00
George Koehler	720af48d8a	Fix lim. Improve lxl, lxa, lor, str, procs with no locals. _lim_ must use _loe_ (load word external), not _lde_ (load double-word external). The new patterns for _lxl_, _lxa_, _lor_, _str_ emit shorter code in some cases. The change from GPR_EXPR to REG_EXPR allows moving LXFRAME to a register variable. Add more "reusing" clauses. We have enough registers that ncg almost never reuses a register, but sometimes it can reuse r3. In mach.c, emit one fewer instruction in procedures with no locals.	2018-01-04 20:40:35 -05:00
George Koehler	d6938108a6	Add tests for C <setjmp.h> and Modula-2 Semaphores. Fix PowerPC ncg so setjmp() returns the correct value. I got unlucky when ncg picked r3 for "uses REG"; this destroyed the return value in r3 and caused the new test to fail.	2018-01-03 14:51:14 -05:00
George Koehler	26de4c1ab1	Add test for EM _rck_. Fix traps in PowerPC ncg. The new test rck_e.e segfaults on PowerPC unless I make some changes. The inline code for _rck_ was wrong because it didn't allow the trap handler to return. _sig_ forgot to push the old trap handler. Move plat/linuxppc/libsys/trap.s to mach/powerpc/libem/trp.s and rewrite it with simplified/extended mnemonics. Remove .trap alias for .trp procedure. Add a missing `mtspr lr, r0` so we can return from the trap handler. Call write() and _exit() so trp.s works with both linuxppc and osxppc. Before, Mac OS X was wrongly using the trap.s for Linux. In powerpc/libem, simplify .aar4; teach .csa and .csb to raise the trap if the default target is zero. C programs don't need these changes. You may relink your C programs with the changed .csa and .csb, but C code doesn't raise the trap. Modula-2 code can raise traps, so you may want to relink your Modula-2 programs with the changed libem, but you might keep your old .o files from Modula-2. You may need to recompile your Pascal programs (delete old .o files from Pascal) because the Pascal compiler might use _rck_.	2017-12-24 22:37:52 -05:00
George Koehler	5f2a7b260f	Optimize `mr. X, X` after some instructions. For example, when ncg emits `slw r9,r8,r5; mr. r9,r9`, top simplifies the code to `slw. r9,r8,r5`.	2017-12-22 22:32:16 -05:00
George Koehler	c964eeddba	Remove INT32 and such. Adjust indentation. I understand `loi 4` more easily than `loi INT32`, because `loi 4` appears in .e files. So remove INT8, INT16, INT32, INT64. Add a comment to explain r3 during unconditional jumps.	2017-12-22 21:18:58 -05:00
George Koehler	f96f918a29	Generate shorter code for ret 4 and ret 8.	2017-12-22 20:37:39 -05:00
George Koehler	5867ca2f2c	Remove two obsolete patterns. These patterns seem to have no effect on the generated code.	2017-12-22 19:57:42 -05:00
George Koehler	2eeee36f78	Add FRAME_V tokens for local variables. When storing to a local, stop killing the tokens of other locals, unless they might overlap with the stored local. This helps some procedures that juggle locals when the locals aren't in registers. Also use FRAME_V tokens for locals in statically enclosing procedures. Rewrite _lxa_ as _lxl_, to skip the `addi ?,?,8` if we can add 8 to the next constant. The PowerPC code from _lxl_ is now sometimes better, sometimes worse than before. The i386 table provided the idea to use %size to find overlapping locals.	2017-12-22 17:04:16 -05:00
George Koehler	ad47fa5fe3	Add splitting coercions for IND_ALL_D. Delete my wrong comment (from commits `cfbc537`, `a8f62f4`, `5432bd0`) which claimed that such coercions are not possible.	2017-12-18 20:59:04 -05:00
George Koehler	24abaf6a25	Enable conditional expressions in splitting coercions. ncgg has parsed the optional conditional expression (optexpr) of each splitting coercion since commit `72b83cc` in 1985; but for almost 33 years, ncg has ignored the expression in c2_expr. Few tables had conditional coercions (I only found them in arm and m68020), and no tables had conditional splitting coercions, so this only becomes a problem now as I try to add a conditional splitting coercion to powerpc.	2017-12-18 20:39:56 -05:00
George Koehler	5e99baabdf	Rename two tokens. CONST_HZ was not hertz (Hz).	2017-12-18 12:36:10 -05:00
George Koehler	d8fa9d1b2a	In coercions, try to reuse a register with the same token. This reduces code size.	2017-12-17 12:45:27 -05:00
George Koehler	b0d75fed37	Rename ANY_BHW to INT_W; add FLOAT_W, FLOAT_D. INT_W, the integer set, continues to exclude FSREG, because we can't easily move FSREG to GPR. ANY4 becomes ISET+FLOAT_W and ANY8 becomes FLOAT_D.	2017-12-17 11:56:02 -05:00
George Koehler	5ba83100d6	Delete rules for sti 8 with REG IND_RC_D, with REG IND_RR_D. Prefer the rule with REG FREG, by coercing IND_RC_D or IND_RR_D to FREG. This rule looks better to ncg. When ncg chose between coercion to REG IND_RC_D or coercion to REG FREG, it chose REG FREG. It only chose REG IND_RC_D if the stack had exact REG IND_RC_D.	2017-12-12 13:36:43 -05:00
George Koehler	11a54e0a7c	These instructions write to the CR.	2017-12-10 14:01:14 -05:00
George Koehler	504d2aa34e	Revise stack shuffles and integer conversions in PowerPC ncg. Allow asp 4, exg 4 to shuffle tokens without coercing them into registers; but comment why dup 4, dup 8 coerce tokens into registers. Allow dup, dus, exg with larger sizes; and add tests dup_e.e and exg_e.e to check that dup 20, dus, exg 20 work as well in powerpc as in i80 and i86. Then powerpc failed to compile loc 2 loc 4 cuu in dup_e.e. Revise the integer conversions, so powerpc can compile and pass the test.	2017-12-09 18:57:10 -05:00
George Koehler	48788287b8	Add more chances to put results in register variables. When a rule `uses REG ... yields %a`, the result %a is always a temporary, never a regvar. If the EM code uses _stl_ to put the result in a regvar, then ncg emits _mr_ to move %a to the regvar. There are two ways to put the result in the regvar without %a: 1. Yield a token, as in `yields {MUL_RR, %2, %1}`, so that _stl_ can move the token to the regvar without using %a. 2. Provide a pattern, like `sli stl`, that just puts the result in `{LOCAL, $2}` and not %a. Allow some tokens, like SUM_RIS and XEQ, onto the stack; and add tokens like MUL_RR, and patterns like `sli stl`. Delete patterns for `stl lol` and `sdl ldl` to avoid an extra temporary %a when the local is a regvar. Delete `lal sti lal loi` because it would emit wrong code.	2017-12-08 17:19:26 -05:00
George Koehler	6b933db90b	Split C from CONST. Rename token CONST to C. Define set CONST = C + CONST_STACK. The instructions with CONST operands can now accept CONST_STACK tokens; some cases of {CONST, %1.val} become %1. Also simplify two of _rlwinm_ into _slwi_ and _srwi_.	2017-12-07 19:24:09 -05:00
George Koehler	a1d1f38691	Add test for EM rol, ror. Fix i80, i86, powerpc. EM instructions _rol_ and _ror_ rotate an integer left or right. Our compilers and optimizers never emit _rol_ or _ror_, but I might want to use them in the future. Add _rol_ and _ror_ to powerpc. Fix `rol 4` and `ror 4` in both i80 and i86, where the rules for `rol 4` and `ror 4` seem to have never been tested until now.	2017-12-07 17:16:21 -05:00
George Koehler	c95bcac91d	Correct the stack pointer when i80 shrinks an integer. The code used `sphl` to set the stack pointer, but the correct value was in de, not hl. Fix by swapping the values of de and hl, so `sphl` is now correct. When we shrink an integer from 4 to 2 bytes, both registers de and hl point to copies of the result, but only one register preserves the stack below the result. This fixes writehex() in tests/plat/lib/test.c, when I compile it with ack -mcpm, so it preserves the pointer to "0123456789abcdef", so it writes hexadecimal digits and not garbage. This bug goes back to commit `157b243` of Mar 18, 1985, so the bug is 32 years old, and probably the oldest bug that I ever fixed.	2017-12-07 15:39:41 -05:00
George Koehler	34cf0c8b63	Kill registers a, de, when i80 ncg does Call libem. I compiled tests/plat/lib/test.c with ack -mcpm, but i80 ncg emitted wrong code in writehex(uint32_t) for `"0123456789abcdef"[code & 0xf]`. The code called '.and' to evaluate `code & 0xf`, then tried to call '.cii' to narrow the result from 4 to 2 bytes, but it passed garbage instead of 4 to '.cii'. The rule for '.and' was `pat and defined($1) kills ALL uses dereg={const2,$1} gen Call {label,".and"}`. This failed to kill register de={const2,4}, so ncg pushed de, expecting to push 4, but actually pushing garbage. Fix such rules using `mvi a,...` or `lxi de,...` so ncg doesn't track the token in the register. This is like the i86 table. A different fix would use a dummy instruction `killreg a` or `killreg de` like the m68020 table. Also correct 1 to $1 when calling '.exg'.	2017-12-06 22:14:00 -05:00
George Koehler	760da1f421	Fix build with gcc. gcc gave an error because the `char *` parameter doesn't match the `const char *` in the prototype of regsave(). clang didn't give an error. I added the prototype in commit `5301cce`.	2017-11-17 17:52:37 -05:00
George Koehler	5301cceee3	Declare machine-dependent functions in mach/proto/ncg. This breaks all machines because the declared return type void disagrees with the implicit return type int (when I compile mach.c with clang). Unbreak i386, i80, i86, m68020, powerpc, vc4 by adding the return types to mach.c. We don't build any other machines; they are broken since commit `a46ee91` (May 19, 2013) declared void prolog() and commit `fd91851` (Nov 10, 2016) declared void mes(), with both declarations in mach/proto/ncg/fillem.c. Also fix mach/vc4/ncg/mach.c where type full is long, so fprintf() must use "%ld" not "%d" to print full nlocals.	2017-11-13 14:23:44 -05:00
George Koehler	e04166b85d	More prototypes, less register in mach/proto/ncg. Files that #include "equiv.h" must do so after including "data.h", now that a function prototype in equiv.h uses type rl_p from data.h. Adjust style, changing some `for(...)` to `for (...)`. The style in mach/proto/ncg is less than consistent; the big annoyance now is that some files want tabs at 4 spaces, others want tabs at 8 spaces.	2017-11-13 12:44:17 -05:00
George Koehler	909b0d5bf3	Add prototypes to functions in subr.c. Put the declarations in "data.h", because that header declares the types cost_t and token_p. Also #include <cgg_cg.h> from "data.h" to get types c3_p and set_p, and guard <cgg_cg.h> against multiple inclusion.	2017-11-12 16:11:05 -05:00
George Koehler	ba2a45180c	Prototypes for string functions. More static.	2017-11-12 11:25:18 -05:00
George Koehler	98b27dd505	Remove old "assert.h" in mach/proto/ncg. Important: You must "make clean" after checking out this commit, because the build had copied the old "assert.h" to several places in obj/. If you don't "make clean", then the compiler finds the old "assert.h" before libc <assert.h>, and the build fails because this commit removes badassertion() in subr.c. After "make clean", the compiler finds libc <assert.h> and the build succeeds.	2017-11-11 19:35:48 -05:00
George Koehler	ac4cbd735e	Use libc assert(); fix dependencies; unbreak induo(). Switch from custom assert() to libc assert() in mach/proto/as. Continue to disable asserts if DEBUG == 0. This change found a problem in the build system; comm2.y was missing dependencies on comm0.h and comm1.h. Add the missing dependencies to the cppfile rule. Allow the dependencies by modifying cppfile in first/build.lua to act like cfile if t.dir is false. Now that comm2.y gets rebuilt, I must fix the wrong prototype of yyparse() in comm1.h. I got unlucky as induo() in comm5.c was reading beyond the end of the array. It found an operator "= " ('=' then space) in the garbage, so it returned a garbage token number, and "VAR = 123" became a syntax error. Unbreak induo() by terminating the array.	2017-11-11 16:09:05 -05:00
George Koehler	d347207e60	Add more prototypes in mach/proto/as. Change "register i;" to "int i;" so clang stops warning about implicit int. Use function prototypes so clang stops warning about implicitly declared functions.	2017-11-10 23:30:46 -05:00
George Koehler	0102cc8934	lwzu writes to the register in the token.	2017-10-19 12:44:46 -04:00
George Koehler	2a92f9bf4d	Add a few more error checks and adjustments to reglap. In util/ncgg, add two more errors for tables using reglap: - "Two sizes of reg_float can't be same size" - "Missing reg_float of size %d to contain %s" In mach/proto/ncg, rename macro isregvar_size() to PICK_REGVAR(), so the macro doesn't look like a function. This macro sometimes doesn't evaluate its second argument. In mach/powerpc/ncg/mach.c, change type of lfs_set to uint32_t, and change the left shifts from 1U<<regno to (uint32_t)1<<regno, because 1U would be too small for machines with 16-bit int.	2017-10-18 22:00:12 -04:00
George Koehler	73ad5a227d	Rename RELOLIS to RELOPPC_LIS. This relocation is specific to PowerPC. @davidgiven suggested the name RELOPPC_LIS (https://github.com/davidgiven/ack/pull/52#issuecomment-279856501). Reindent the list in h/out.h and util/led/ack.out.5 because RELOPPC_LIS is a long name. I use spaces and no tabs because the tabs looked bad in the manual page.	2017-10-18 15:39:31 -04:00
George Koehler	459a9b5949	Use lwzu, stwu to tighten more loops. Because lwzu or stwu moves the pointer, I can remove an addi instruction from the loop, so the loop is slightly faster. I wrote a benchmark in Modula-2 that exercises some of these loops. I measured its time on my old PowerPC Mac. Its user time decreases from 8.401s to 8.217s with the tighter loops.	2017-10-18 12:12:42 -04:00
George Koehler	ac2b0710c8	Add more rules for single-precision reg_float. The result of single-precision fadds, fsubs, and such can go into a register variable, like we already do with double precision. This avoids an extra fmr from a temporary register to the regvar.	2017-10-17 17:53:03 -04:00
George Koehler	47bd0ef7a7	Stop inlining code to convert integers to floats. Do the conversion by calling .cif8 or .cuf8 in libem, as it was done before my commit `1de1e8f`. I used the inline conversion to experiment with the register allocator, which was too slow until `c5bb3be`. Now that libem has the only copy of the code, move some comments and code changes there.	2017-10-17 17:00:28 -04:00
George Koehler	893e170015	Use my new regvar_w() and regvar_d() in PowerPC ncg. Rename GPRE to GPR_EXPR, then define FPR_EXPR and FSREG_EXPR. Use them for moves to register variables. Keep "kills regvar($1)", because deleting it and recompiling libc would cause many failures in my test programs. Add comment to warn, /* ncg fails to infer that regvar($1) is dead! */ Remove "kills LOCAL %off==$1" because it seems to have no effect.	2017-10-17 14:15:33 -04:00
George Koehler	307a8b996e	Add regvar_w() and regvar_d() for use with reglap. If the ncg table uses reglap, then regvar($1, reg_float) would have two sizes of registers. An error from ncgg would happen if regvar() was in a token that allows only one size. Now one can pick a size with regvar_w() for word size or regvar_d() for double-word size. Add regvar_d and regvar_w as keywords in ncgg. Modify EX_REGVAR to include the register size. In ncg, add some checks for the register size. In tables without reglap, regvar() works as before, and ncg ignores the register size in EX_REGVAR.	2017-10-17 12:05:41 -04:00
George Koehler	5432bd03d6	Do a move when coercing FREG to FREG or FSREG to FSREG.	2017-10-16 12:07:55 -04:00
George Koehler	f0619ea4ae	PowerPC ncg never uses the rules to stack LOCAL or DLOCAL.	2017-10-15 15:22:52 -04:00
George Koehler	aa876ff4c2	Fix reglap for procedures that use both sizes of reg_float. After the RA phase of ego, a procedure may put single-word and double-word values in the same reg_float. Then ncg will use both LOCAL and DLOCAL tokens at the same offset. I add isregvar_size() to ncg. It receives the size of the LOCAL or DLOCAL token, and picks the register of the correct size. This fixes a problem where ncg got the wrong-size register and corrupted the stack. This problem caused one of my test programs to segfault from stack underflow. Also adjust how fixregvars() handles both sizes.	2017-10-15 13:15:03 -04:00
George Koehler	b342b83d28	Add function prototypes to mach/proto/ncg/regvar.c	2017-10-15 11:01:18 -04:00
George Koehler	d6e9eac785	Merge branch 'default' into kernigh-linuxppc. This merges several fixes and improvements from upstream. This includes commit `5f6a773` to turn off qemuppc. I see several failing tests from qemuppc; this merge will hide the test failures.	2017-10-14 13:50:49 -04:00
George Koehler	7e9348169c	Add reglap to ncg. Add 4-byte reg_float to PowerPC ncg. The new feature "reglap" allows two sizes of floating-point register variables (reg_float), if each register overlaps a single register of the other size. PowerPC ncg uses reglap to define 4-byte instances of f14 to f31 that overlap the 8-byte instances. When ncgg sees the definition of fs14("f14")=f14, it removes the 8-byte f14 from its rvnumbers array, and adds the 4-byte fs14 in its place. Later, when ncg puts a variable in fs14, if it is an 8-byte variable, then ncg switches to the 8-byte f14. The code has /* reglap */ comments in util/ncgg or #ifdef REGLAP in mach/proto/ncg. reglap became necessary because my commit `a20b87c` caused PowerPC ego to allocate reg_float in both 4-byte and 8-byte sizes.	2017-10-14 12:40:04 -04:00
David Given	64f2fa9d46	Stop using mktemp() --- on Haiku, it always generates the same filenames, pretty much guaranteeing temporary file overwrites on parallel builds. Use mkstemp() instead which creates the files atomically.	2017-08-06 13:22:05 +02:00
George Koehler	2c266c631a	Reorder registers. Fix problem with ret 8. After `c5bb3be`, ncg began to allocate regvars from r13 up. I reorder the regvars so ncg again allocates them from r31 down. I also reorder the other registers. This exposed a bug in my rule for ret 8. It was wrong if item %2 was in r3, because I moved %1 to r3 before %2 to r4. Fix it by adding back an individual register class for r3 (called REG3 here, GPR3 in `c5bb3be`). Also fix my typo in mach.c that made a syntax error in assembly.	2017-02-17 19:32:27 -05:00
George Koehler	23c365c939	Fix comparison of 4-byte floats. I broke it in `f64b7d8`. My stack pattern had the wrong type of registers. The comparison popped too many bytes and corrupted the stack.	2017-02-17 19:29:45 -05:00
George Koehler	736c45453c	Remove .ret from libem and inline the code. This removes a wrong-way dependency of libsys on libem. The C functions in libsys called .ret, but libsys is after libem in the linker arguments, so the linker didn't find .ret unless something else had called .ret. Almost everything called .ret, but I got a linker error when I wrote an assembly program using the EM runtime, because my assembly program didn't call .ret. Add a dummy comment to build.lua, so git checkout touches that file, the build system reconfigures itself, and the *.s glob sees that ret.s has gone.	2017-02-16 21:18:39 -05:00
George Koehler	e6df553ebf	For PowerPC, never put a reg_float value in a reg_any. With this type check, I can change the size checks into assertions.	2017-02-16 20:30:17 -05:00
George Koehler	aa47f52166	Switch error() and fatal() in mach/proto/ncg to stdarg. This is like David Given's change to util/ncgg in `d89f172`. I need this change in mach/proto/ncg to see fatal messages, because a 64-bit pointer doesn't fit in an int.	2017-02-16 20:26:53 -05:00
George Koehler	cbe5d8640b	Add floating-point register variables to PowerPC ncg. Use f14 to f31 as register variables for 8-byte double precision. There are no regvars for 4-byte single precision, because all regvar(reg_float) must have the same size. I expect more programs to prefer 8-byte double precision. Teach mach/powerpc/ncg/mach.c to emit stfd and lfd instructions to save and restore 8-byte regvars. Delay emitting the function prolog until f_regsave(), so we can use one addi to make stack space for both local vars and saved registers. Be more careful with types in mach.c; don't assume that int and long and full are the same. In ncg table, add f14 to f31 as register variables, and some rules to use them. Add rules to put the result of fadd, fsub, fmul, fdiv, fneg in a regvar. Without such rules, the result would go in a scratch FREG, and we would need fmr to move it to the regvar. Also add a rule for pat sdl inreg($1)==reg_float with STACK, so we can unstack the value directly into the regvar, again without a scratch FREG and fmr. Edit util/ego/descr/powerpc.descr to tell ego about the new float regvars. This might not be working right; ego usually decides against using any float regvars, so ack -O1 (not running ego) uses the regvars, but ack -O4 (running ego) doesn't use the regvars. Beware that ack -mosxppc runs ego using powerpc.descr but -mlinuxppc and -mqemuppc run ego without a config file (since `8ef7c31`). I am testing powerpc.descr with a local edit to plat/linuxppc/descr to run ego with powerpc.descr there, but I did not commit my local edit.	2017-02-15 19:34:07 -05:00
George Koehler	cf728c2a2a	Implement lxl for PowerPC ncg. This fixes lxl 1 (so it follows the static chain, not the dynamic chain) and provides lxl 2 and greater. The Modula-2 compiler uses lxl for nested procedures, so they can access the variables of the enclosing procedures.	2017-02-13 23:22:31 -05:00
George Koehler	a8f62f44d8	Remove REG_PAIR. I added REG_PAIR in `cfbc537` to speed up the register allocator, because ncg was taking about 2 seconds on each sti 8. I defined only 4 such pairs, so allocating REG_PAIR was much faster than allocating REG REG. After my last commit `c5bb3be`, allocation of REG REG is fast, and REG_PAIR seems unnecessary.	2017-02-13 18:11:27 -05:00
George Koehler	c5bb3be495	Speed up register allocation by removing some register classes. The table for PowerPC had placed each GPR and FPR into an individual register class (like GPR3, GPR4, FPR1, FPR2), and had used these classes to coerce stack values into specific registers. But ncg does not like having many register classes. In http://tack.sourceforge.net/olddocs/ncg.pdf Hans van Staveren wrote: "Every extra property means the register set is more unorthogonal and cg execution time is influenced by that, because it has to take into account a larger set of registers that are not equivalent. So try to keep the number of different register classes to a minimum." Recent changes to the PowerPC table have removed many coercions to specific registers. Many functions in libem switched from taking values in registers to taking them from the stack (see `dc05cb2`). I now remove all 64 individual register classes of GPR and FPR. In the few cases where I need a stack value in a specific register, I now do a move (as the arm and m68020 tables do). This commit speeds the compilation of some files. For my test file fconv.c, the compilation time goes from over 20 seconds to under 1 second. My fconv.c has 4 conversions from floats to integers, and the table has my experimental rules that do the conversions by allocating 4 or 5 registers.	2017-02-13 17:44:46 -05:00
George Koehler	dc05cb2dc8	Add pat cms !defined($1) Switch .cms to pass inputs and outputs on the real stack, not in registers; like we do with .and, .or (`81c677d`) and .xor (`c578c49`). At this point, nearly all functions in libem use the real stack, not registers, for passing inputs and outputs. This simplifies the ncg table (which needs fewer lists of specific registers) but slows calls to libem. For example, after `ba9b021`, each call to .aar4 is about 10 instructions slower. I moved 3 inputs and 1 output from registers to the real stack. A program would take 4 instructions to move registers to stack, 4 to move stack to registers, and perhaps 2 to adjust the stack pointer.	2017-02-13 16:52:32 -05:00
George Koehler	89dd80e34d	Add missing instances of "kills ALL" or "with STACK".	2017-02-13 16:38:26 -05:00
George Koehler	ba9b021253	Use .los4 in lar 4 and .sts4 in sar 4. Our libem had two implementations of loading a block from a stack, one for lar 4 and one for los 4. Now lar 4 and los 4 share the code in .los4. Likewise, sar 4 and sts 4 share the code in .sts4. Rename .los to .los4 and .sts to .sts4, because they implement los 4 and sts 4. Remove the special case for loading or storing 4 bytes, because we can do it with 1 iteration of the loop. Remove the lines to "align size" where the size must already be a multiple of 4. Fix the upper bound check in .aar4. Change .aar4, .lar4, .los4, .sar4, .sts4 to pass all operands on the real stack, except that .los4 and .sts4 take the size in register r3. Have .aar4 set r3 to the size of the array element. So lar 4 is just .aar4 then .los4, and sar 4 is just .aar4 then .sts4. ncg no longer calls .lar4 and .sar4 in libem, because it inlines the code; but I keep .lar4 and .sar4 in libem, because mcg references them. They might or might not work in mcg.	2017-02-13 15:22:00 -05:00
George Koehler	54949f713f	Change .fef8 and .fif8 to pass values on the stack. Reorder the code in .fef8 and .fif8 so that in the usual case, we fall through to the blr without taking any branches. The usual case, by my guess, is .fef8 with normalized numbers or .fif8 with small integers. I change .fef8 and .fif8 to pass values on the real stack, not in specific registers. This simplifies the ncg table, and might help me experiment with changes to the ncg table. This change might or might not help mcg. Seems that mcg always uses the stack to pass values to libem, but I have not tested .fef8 or .fif8 with mcg.	2017-02-12 16:44:37 -05:00
George Koehler	1de1e8f7f0	Experiment with conversions between integers and floats. Switch some conversions from libem calls to inline code. The conversions from integers to floats are now too slow, because each conversion allocates 4 or 5 registers, and the register allocator is too slow. I might use these slow conversions to experiment with the register allocator. I add the missing conversions between 4-byte single floats and integers, simply by going through 8-byte double floats. (These replace the calls to nonexistent functions in libem.) I remove the placeholder for fef 4, because it doesn't exist in libem, and our language runtimes only use fef 8.	2017-02-12 15:45:28 -05:00
George Koehler	2e41c392fa	Implement blm and bls using an inline loop. This replaces a call to memmove() in libc. That was working for me, but it can fail because EM programs don't always link to libc. blm and bls only need to copy aligned words. They don't need to copy bytes, and they don't need to copy between overlapping buffers, as memmove() does. So the new loop is simpler than memmove().	2017-02-11 19:30:12 -05:00
George Koehler	c578c495bb	Edit PowerPC assembly for .and, .cms, .ior, .xor, .zer Remove one addi instruction from some loops. These loops had increased 2 pointers, they now increase 1 index. I must initialize the index, so I add "li r6, 0" before each loop. Change .zer to use subf instead of neg, add. Change .xor to take the size on the real stack, as .and and .or have done since `81c677d`.	2017-02-11 18:00:56 -05:00
George Koehler	83c13597e1	Use "mr" and make a few other tweaks in PowerPC ncg table. Use extended "mr" instead of basic "or" to move registers. Both "mr" and "or" encode the same machine instruction. With "mr", I can more easily search the assembly output for register moves. Fold several stacking rules into a single rule ANY_BHW-REG to STACK. Remove the EM patterns for loc mlu $2==2 and loc slu. The first pattern had the wrong size (should be $2==4, not $2==2). Both patterns were redundant. They rewrote loc mlu as loc mli and loc slu as loc sli, but this table doesn't have patterns for loc mli or loc sli, so it is enough to rewrite mlu as mli and slu as sli.	2017-02-10 11:45:50 -05:00
George Koehler	85391399a4	Use ha16/lo16 to load or store 1, 2, 8 bytes from labels. Add the tokens IND_RL_B, IND_RL_H, IND_RL_H_S, IND_RL_D, along with the rules to use them. These rules emit shorter code. For example, loading a byte becomes lis, lbz instead of lis, addi, lbz. While making this, I wrongly set IND_RL_D to size 4. Then ncg made infinite recursion in codegen() and stackupto(), until it crashed by stack overflow. I correctly set IND_RL_D to size 8, preventing the crash.	2017-02-08 12:31:14 -05:00
George Koehler	5e00e1fce2	Trimming mach/powerpc/ncg/table Remove coercion from LABEL to REG. The coercion never happens because I have stopped putting LABEL on the stack. Also remove LABEL from set ANY_BHW. Retain the move from LABEL to REG because pat gto uses it. Remove li32 instruction, unused after the switch to the hi16, ha16, lo16 syntax. Remove COMMENT(...) lines from most moves. In my opinion, they took too much space, both in the table and in the assembly output. The stacking rules and coercions keep their COMMENT(...) lines. In test GPR, don't write to RSCRATCH. Fold several coercions into a single coercion from ANY_BHW uses REG. Use REG instead of GPR in stack patterns. REG and GPR act the same, because every GPR on the stack is a REG, but I want to be clear that I expect a REG, not r0. In code rules, sort SUM_RC before SORT_RR, so I can add SUM_RL later. Remove rules to optimize loc loc cii loc loc cii. If $2==$4, the peephole optimizer can optimize it. If $2!=$4, then the EM program is missing a conversion from size $2 to size $4. Remove rules to store a SEX_B with sti 1 or a SEX_H with sti 2. These rules would never get used, unless the EM program is missing a conversion from size 4 to size 1 or 2.	2017-02-08 12:27:16 -05:00
George Koehler	ed21a59a82	In PowerPC ncg, allocate register for ha16[label]. Use it to generate code like: lis r12,ha16[__II0]; lis r11,ha16[_f]; lfs f1,lo16[_f](r11); lfs f2,lo16[__II0](r12); fadds f13,f2,f1; stfs f13,lo16[_f](r11). Here ncg has allocated r11 for ha16[_f]. We use r11 in lfs and again in stfs. Before this change, we needed an extra lis before stfs, because ncg did not remember that ha16[_f] was in a register. This example has a gap between ha16[__II0] and lo16[__II0], because the lo16 is not in the next instruction. This requires my previous commit `1bf58cf` for RELOLIS. There is a gap because ncg emits the lis as soon as I allocate it. The "lfs f2,lo16[__II0](r12)" happens in a coercion from IND_RL_W to FSREG. The coercion allocates one FSREG but may not allocate any other registers. So I must allocate r12 earlier. I allocate r12 in pat lae, but this causes a gap.	2017-02-08 12:23:06 -05:00
George Koehler	754e96ef16	Use ha16/lo16 to emit pairs of lis/stw, lis/lfs, lis/stfs. A 4-byte load from a label yields a token IND_RL_W. This token emits either lis/lwz or lis/lfs, if we want a general-purpose register or a floating-point register.	2017-02-08 12:13:54 -05:00
George Koehler	7255ed403f	Tweak some tokens in PowerPC ncg. Remove the GPRINDIRECT token, and use the IND_RC_* tokens as operands to instructions. We no longer need to unpack an IND_RC_* token and repack it as a GPRINDIRECT to use it in an instruction. Allow storing IND_ALL_B and IND_ALL_H in register variables. Create a set ANY_BHW for anything that we can store in a regvar. Push register variables on the stack without using GPRE, by changing stwu to accept LOCAL. Then ncg will replace the string ">>> BUG IN LOCAL" with the register name. (I copied ">>> BUG IN LOCAL" from mach/arm/ncg/table.) Fix the rule for "pat lil inreg($1)>0" to yield an IND_RC_W token, not a register. We might need to kill the token with "kills MEMORY". Rename CONST_ALL to CONST_STACK, because it only includes constants on the stack, and excludes CONST tokens. Instructions still don't allow CONST_STACK operands, so we still need to repack each CONST_STACK as a CONST to use it in an instruction. Rename LABEL_OFFSET_HI to just LABEL_HI, and same for LABEL_HA and LABEL_HO.	2017-02-08 12:12:28 -05:00
George Koehler	1bf58cf51c	Add RELOLIS for PowerPC lis with ha16 or hi16. The new relocation type RELOLIS handles these instructions: lis RT, ha16[expr] == addis RT, r0, ha16[expr]; lis RT, hi16[expr] == addis RT, r0, hi16[expr]. RELOLIS stores a 32-bit value in the program text. In this value, the high bit is a ha16 flag, the next 5 bits are the target register RT, and the low bits are a signed 26-bit offset. The linker replaces this value with the lis instruction. The old RELOPPC relocated a ha16/lo16 or hi16/lo16 pair. The new RELOLIS relocates only a ha16 or hi16, so it is no longer necessary to have a matching lo16 in the next instruction. The disadvantage is that RELOLIS has only a signed 26-bit offset, not a 32-bit offset. Switch the assembler to use RELOLIS for ha16 or hi16 and RELO2 for lo16. The li32 instruction still uses the old RELOPPC relocation. This is not the same as my RELOPPC change from my recent mail to tack-devel (https://sourceforge.net/p/tack/mailman/message/35651528/). This commit is on a different branch. Here I am throwing away my RELOPPC change and instead trying RELOLIS.	2017-02-08 11:46:31 -05:00
George Koehler	f4cfbedd5c	Remove #include <stdbool.h> from mach/powerpc/as/mach1.c We should not include a system header file here, because mach/proto/as/comm2.y goes through cpp twice. The include can cause problems like https://github.com/davidgiven/ack/issues/1 Remove this #include <stdbool.h> and leave a comment pointing to the includes in comm0.h. Change the few instances of bool, false, true, to int, 0, 1.	2017-01-30 16:39:23 -05:00
George Koehler	3c1d2d79f0	Remove type quad, use type word_t in PowerPC as. Type word_t is for encoding the machine instructions. It only needs 32 bits for PowerPC. It was long (which can have 32 or 64 bits), and there was a second type quad (which was uint32_t). Switch word_t to uint32_t and replace quad with word_t. Also change valu_t and ADDR_T away from long.	2017-01-30 16:15:02 -05:00
George Koehler	48e3aab728	Swap RA and RS when assembling "and", "or", and such instructions. They must use OP_RA_RS_RB_C instead of OP_RS_RA_RB_C. The code generator often sets RS and RA to the same register, so swapping them causes no change in many programs. I also rename OP_RS_RA_UI_CC to OP_RA_RS_UI_CC, and OP_RS_RA_C to OP_RA_RS_C, because they already swap RA and RS.	2017-01-30 15:47:09 -05:00
George Koehler	9ddbb66c8b	Turn off comments again. I turned them on by accident in `c416889`.	2017-01-30 15:45:46 -05:00
George Koehler	c41688929c	In PowerPC ncg, switch the scratch register from r11 to r0. r0 is a special case and can't be used when adding a register to a constant. The few remaining users of the scratch register don't do that. I removed other usages of the scratch register in `7c64dab`, `5b5f774`, `19f0eb8`, `f64b7d8`.	2017-01-26 13:10:08 -05:00
George Koehler	1dfd5524e4	In PowerPC top, don't delete addi r0, r0, 0 Also don't delete addis r0, r0, 0. These instructions are special cases that set r0 to zero. If we delete them, then r0 keeps its old value. I caught this bug because osxppc protects the .text segment against writing. (linuxppc doesn't protect it.) A program tried to set r0 to the NULL pointer, but top deleted the instruction, so r0 kept an old return address pointing into .text. Later the program checked that r0 wasn't NULL, tried to write to address r0, and crashed.	2017-01-26 12:44:32 -05:00
George Koehler	8c8f291a07	In PowerPC libem, remove tge.s and powerpc.h Nothing uses the tables in tge.s, after I changed the ncg table. There are no *.e files in libem, so don't try to build them.	2017-01-26 12:39:16 -05:00
George Koehler	f64b7d8ea0	Rewrite how PowerPC ncg does conditional branches and tests. The rewritten code rules bring 3 new features: 1. The new rules compare a small constant with a register by reversing the comparison and using `cmpwi` or `cmplwi`. The old rules put the constant in a register. 2. The new rules emit shorter code to yield the test results, without referencing the tables in mach/powerpc/ncg/tge.s. 3. The new rules use the extended `beq` and relatives, not the basic `bc`, in the assembly output. I delete the old tristate tokens and the old moves, because they confused me. Some of the old moves weren't really moves. For example, `move R3, C0` and then `move C0, R0` did not move r3 to r0. I rename C0 to CR0.	2017-01-25 19:08:55 -05:00
George Koehler	a348853ece	Add missing size declarations for 8-byte registers. This fixes the coercion from IND_ALL_D to FREG. The coercion had never happened, because IND_ALL_D had 8 bytes but FREG had 4 bytes. Instead, ncg always stacked the IND_ALL_D and unstacked a FREG. The stacking rule uses f0, so the code did load f0 with the indirect value, push f0 to stack, load f1 from stack, move stack pointer. Now that FREG has 8 bytes, ncg does the coercion, and the code just loads f1 with the indirect value.	2017-01-25 11:56:58 -05:00
George Koehler	188b23bade	Add constraints for pat lab, as done in the m68020 table. Always use 'kills ALL' when reaching a label, because our registers and tokens have the wrong values if the program jumps to this label from somewhere else. When falling through a label, if the top element is in r3, then require that the rest of the stack is in the real STACK, not in registers or tokens. I'm doing this to be certain that the missing constraints are not causing bugs. I did not find any such bug, perhaps because the labels are usually near other instructions (like conditional branches and function calls) that stack or kill tokens.	2017-01-24 11:26:35 -05:00
George Koehler	bb67dbeb11	Use "kills ALL" instead of a list of killed registers. This is for fef 8 and fif 8. I changed .fef8 so it no longer kills r7, but I don't want to update the list. We already use "kills ALL" for most other calls to libem.	2017-01-23 17:31:29 -05:00
George Koehler	032bcffef6	In PowerPC libem, use the new features of our assembler. The new features are the hi16/lo16 and ha16/lo16 syntax for relocations, and the extended mnemonics like "blr". Use ha16/lo16 to load some double floats with 2 instructions (lis/lfd) instead of 3 (lis/ori/lfd). Use the extended names for branches, comparisons, and bit rotations, so I can more easily read the code. The new names often encode the same machine instructions as the old names, except in a few places where I changed the instructions. Stop using andi. when we don't need to set cr0. In inn.s, I change andi. to extrwi to extract the same bits. In los.s and sts.s, I change "andi. r3, r3, ~3" to "clrrwi r3, r3, 2". This avoids setting cr0 and also stops clearing the high 16 bits of r3. In csa.s, los.s, sts.s, I change some comparisons and right shifts from signed to unsigned (cmplw, cmplwi, srwi), because the sizes are unsigned. In inn.s, the right shift can be signed (sraw) or unsigned (srw), but I use srw because we don't need the carry bit. In fef8.s, I save an instruction by using rlwinm instead of addis/andc to rlwinm to clear a field. The code no longer kills r7. In both fef8.s and fif8.s, I remove the list of killed registers. Also remove some whitespace from ends of lines.	2017-01-23 17:16:39 -05:00
George Koehler	5aa2ac2246	Teach the assembler about PowerPC extended mnemonics. Also make a few changes to basic mnemonics. Fix typo in name of the basic "creqv". Add the basic "addc" and relatives, because it would be odd to have the extended "subc" without "addc". Fix the basic "rldicl", "rldicr", "rldic", "rldimi" to correctly encode the 6-bit MB field. Fix "slw" and relatives to correctly swap their RA and RS operands. Add many, but not all, of the extended mnemonics from IBM's Power ISA Version 2.06 Book I Appendix E. (I used 2.06, published 2009, just because I already had the PDF of it.) This commit includes mnemonics for branching, subtraction, traps, bit rotation, and a few others, like "mflr" and "nop". The assembler now understands branches like `beq cr7, label` and bit shifts like `slwi r7, r7, 2`. These encode the same machine instructions as the basic "bc" and "rlwinm". Some operands to basic names become optional. The assembler no longer requires the level in "sc" or the branch hint in "bcctr" and "bclr"; they default to zero. Some extended names take an optional branch hint or condition register. Some extended names are still missing. I don't provide names with static branch prediction, like "beq+" or "bge-", because the assembler parses '+' and '-' as operators, not as part of an instruction name. I also don't provide some names that 2.06 has for moving to or from the condition register or some special purpose registers, names like "mtcr" or "mfuamr". This commit also deletes some unused tokens and one unused yacc rule.	2017-01-21 23:49:29 -05:00
David Given	d7df126730	Merge pull request #44 from kernigh/kernigh-pr-as mach/proto/as: allow more tokens	2017-01-18 23:33:40 +01:00
George Koehler	f705339f86	Allow more tokens in the assembler. I need this so I can add more %token lines to mach/powerpc/as/mach2.c The assembler's tempfile encoded each token in a byte. This only worked with tokens 0 to 127 and 256 to 383. If a token 384 or higher existed, the assembler stopped working. I need tokens 384 and higher. I change the token encoding to a 2-byte little-endian integer. I also change a byte in the string encoding.	2017-01-17 22:41:11 -05:00
David Given	232545606d	Merge from default.	2017-01-18 00:02:32 +01:00
George Koehler	ba2a03705e	Use prototypes in mach/proto/as/comm5.c Order the function prototypes in comm1.h to match the order of the function definitions in *.c files.	2017-01-17 16:41:29 -05:00
David Given	81c677d218	Add a bunch more set operations to the PowerPC backends, and the Pascal test for the same.	2017-01-17 22:31:38 +01:00
George Koehler	916d270534	Delay inclusion of <stdint.h> when compiling comm2.y See issue #1 (https://github.com/davidgiven/ack/issues/1). The file mach/proto/as/comm2.y goes through cpp twice. The _include macro, defined in comm2.y and used in comm0.h, delays the inclusion of system header files. The inclusion of <stdint.h> wasn't delayed. This caused multiple inclusions of <sys/_types.h> in FreeBSD and <machine/_types.h> in OpenBSD. Use _include to delay <stdint.h>. Also use _include for "arch.h" and "out.h", because h/out.h includes <stdint.h> and h/arch.h might include it in the future. Sort the system includes in comm0.h by moving them up to be with <stdint.h>. Must include <stdint.h> before "mach0.c", because mach/powerpc/as/mach0.c needs it. Must include "mach0.c" before checking ASLD.	2017-01-16 22:39:44 -05:00
George Koehler	e97116c037	Remove some obsolete code that causes a gcc warning. In my OpenBSD/amd64 system, the code becomes if (0) outname.on_valu &= ~(((0xFFFFFFFF)<<32)<<32); The 0xFFFFFFFF is a 32-bit int, so the left shift by 32 is out of range and causes the gcc warning. The intent might be to clear any sign-extended bits, if the assignment outname.on_valu = valu did sign extension. Old C had no unsigned long, so .on_valu would have been long. The code is obsolete because h/out.h now declares .on_valu as uint32_t.	2017-01-16 18:09:55 -05:00
David Given	c471f617b7	Ensure that memory is zero-initialised.	2017-01-16 22:45:03 +01:00
David Given	2cdcc16bc2	Fix a buffer overrun that was manifesting on OpenBSD; also fix a bounds check and some uninitialised variable problems.	2017-01-16 22:44:37 +01:00
David Given	fa5675d439	Run through clang-format.	2017-01-16 21:16:33 +01:00
David Given	e7e29d34ff	Add a test (currently failing) to check that Pascal char sets can store all 256 possible values. Add the PowerPC ncg and mcg backend support to let the test actually run, including modifying a bunch of PowerPC libem functions so that they can be called from both ncg and mcg.	2017-01-15 22:28:14 +01:00
David Given	9a346c382d	Turns out Apple's hi16/ha16 exactly match my ha16/has16, so renamed accordingly. (Memo to self: read the docs before doing the work.)	2017-01-15 11:59:33 +01:00
David Given	f80acfe9f5	Signed vs unsigned lower halves of powerpc fixups are now handled by having two assembler directives, ha16() and has16(), for the upper half; has16() applies the sign adjustment. .powerpcfixup is now gone, as we generate the relocation in ha*() instead. Add special logic to the linker for undoing and redoing the sign adjustment when reading/writing fixups. Tests still pass.	2017-01-15 11:51:37 +01:00
David Given	3c0bc205fc	Update the hi/lo syntax to be a bit more standard.	2017-01-15 10:21:02 +01:00
David Given	8edbff9795	Add assembler support for fixing up arbitrary oris/addi pairs of instructions; this should allow oris/lwz constant value loads, which will save an opcode.	2017-01-15 00:15:01 +01:00
David Given	efab08178b	Fix a bunch of issues with pushing and popping mismatched sizes, which the B compiler does a lot; dup 8 for pairs of words is now optimised.	2017-01-07 18:47:00 +01:00
David Given	6b4f8d72b8	ine and ste are now declared to modify memory (preventing cached values being propagated across the modification).	2017-01-07 13:25:09 +01:00
David Given	7710c76d56	Introduce sequence points before store instructions to prevent loads from the same address being delayed until after the store (at which point they'll return the wrong value).	2017-01-07 13:17:39 +01:00
David Given	0da248dced	Use a better NOT; and after remembering that PowerPC bit numbers are all backwards in the documentation, rewrote IFEQ/IFLT/IFLE to actually work. Probably. Thanks to the B test suite for spotting this.	2017-01-07 01:03:15 +01:00
David Given	73922f1d16	Ensure that procedure labels are word-aligned.	2017-01-06 22:29:52 +01:00
David Given	e3f8fb84dc	Change the i80 assembler to be three-pass, which allows forward references; required for assembling B.	2016-12-29 17:08:53 +00:00
David Given	e50f4be710	Merge from default.	2016-12-26 19:44:48 +00:00
David Given	bf2e0be69a	Merge pull request #27 from kernigh/pr-qemu-doze Teach qemuppc to halt the cpu on _exit().	2016-12-11 23:17:12 +01:00
George Koehler	8605a2fcfc	Add Modula-2 set operations to PowerPC ncg. This provides and, ior, xor, com, zer, set, cms when defined($1) and ior, set when !defined($1). I don't provide the other operations !defined($1) because our Modula-2 compiler hasn't used them. I wrote a Modula-2 example in https://gist.github.com/kernigh/add79662bb3c63ffb7c46d01dc8ae788 Put a dummy comment in mach/powerpc/libem/build.lua so git checkout will touch that file. Without the touch, the build system doesn't see the new *.s files.	2016-12-10 12:23:07 -05:00
George Koehler	fcda786fe9	Add some missing clauses to los, sts, aar, inn, cmi, cmu. We only implement 'los 4', 'sts 4', 'cmi 4', 'cmu 4', not for sizes other than 4. Add clause $1==4. We only implement inn when defined($1). The rule for aar needs 'kills ALL' because it kills many registers, like other rules that call libem.	2016-12-09 19:49:50 -05:00
George Koehler	436114fce4	Add a move from CONST smalls(%val) to GPR. This allows 'move {CONST, $1}, R3' with a small enough $1 to emit one instruction (addi) instead of two instructions (addis, ori). The CONST token confusingly isn't in the CONST_ALL set.	2016-12-09 18:40:14 -05:00
George Koehler	17211eef47	Fix ass to match the EM spec. The spec says, "ASS w: Adjust the stack pointer by w-byte integer". The w argument "can either be given as argument or on top of the stack." Therefore, 'ass 4' would pop the 4-byte integer from the stack, but 'ass' would pop the size w from the stack, then pop the w-byte integer. PowerPC ncg wrongly implemented 'ass' as if it was 'ass 4'. Fix it to accept only 'ass 4'.	2016-12-09 17:32:42 -05:00
George Koehler	5bd0ad4269	Remove the bogus rules for 'lor 2' and 'str 2'. These instructions would load or store the EM heap pointer. They don't work. Programs must use brk() or sbrk() in libsys. The last file to use 'lor 2' and 'str 2' was lang/pc/libpc/sav.e in the Pascal library. Commit `c084f9f` deleted the file, so we no longer need rules 'lor 2' or 'str 2' to build the ACK.	2016-12-09 17:00:56 -05:00
George Koehler	805883e377	Fill in a hint for enabling the COMMENT macro. If you want to enable comments in the .s file, change #define COMMENT(n) /* comment {LABEL, n} */ to #define COMMENT(n) comment {LABEL, n}	2016-12-09 16:58:47 -05:00
George Koehler	244e554f2f	Remove trailing whitespace in mach/powerpc/ncg/table	2016-12-09 16:36:42 -05:00
George Koehler	b8c921ca70	Allow mfspr, mtspr with a register number. PowerPC has a few hundred special-purpose registers. The assembler had only accepted the names "xer", "lr", "ctr". Most programs use only those three SPRs. If I add more names, they would almost never get used, and they might conflict with labels. I want to use "mfspr r3, 0x3f0" and "mtspr 0x3f0, r3" in plat/qemu/boot.s to access register hid0 from supervisor mode.	2016-12-07 17:28:00 -05:00
David Given	55e24e1f24	inn was assuming that bitfields were arrays of bytes, when actually they're arrays of words (which makes the LSB move on big-endian systems).	2016-12-06 21:45:20 +01:00
David Given	fbd6e8f63d	Add support for consecutive labels; needed by the B compiler.	2016-11-27 21:18:00 +01:00
David Given	5bce5fc4da	Change the extension used by Basic files from .b to .bas, to avoid conflicts with B.	2016-11-27 20:38:33 +01:00
David Given	f8fa3ece42	inn on ncg now passes the CPU tests.	2016-11-20 19:35:34 +01:00
David Given	953c08839f	inn works now; add a helper for it.	2016-11-20 12:53:44 +01:00
David Given	196fa914b3	lxa now works, I hope; traps are better (and stubbed out on qemuppc).	2016-11-20 11:57:21 +01:00
David Given	d5328492d7	Better handling of float conversions; more tests; converting to unsigned ints works now.	2016-11-20 11:27:40 +01:00
David Given	454a7494bb	cif8 and cuf8 work now. More tests.	2016-11-19 11:42:30 +01:00
David Given	cc660b230f	Floats and doubles are now written out correctly.	2016-11-19 11:39:13 +01:00
David Given	d31bc6a3f9	Made csa and csb work with mcg; adjust the libem functions and the corresponding invocation in the ncg table so the same helpers can be used for both mcg and ncg. Add a new IR opcode, FARJUMP, which jumps to a helper function but saves volatile registers.	2016-11-19 10:55:41 +01:00
David Given	5208e5f751	Yet another OB1 stack format fix.	2016-11-19 10:42:22 +01:00
David Given	43439c6d0c	Remember to push the result of lor onto the stack.	2016-11-17 22:04:32 +01:00
David Given	81bc2c74c5	A bb's regsin are no longer the same as those of its first instruction; occasionally the first hop of a block would try to rearrange its registers (due to evicted throughs), resulting in the phi moves copying values into the wrong registers.	2016-11-16 20:52:15 +01:00
David Given	581fa4a457	Reenable eviction of corrupted registers, which had been broken by a previous change. Change the register move code to get swaps right, or at least righter.	2016-11-15 21:55:10 +01:00
David Given	86c832ef86	Put saved registers in actually the right place. I hope.	2016-11-15 21:54:15 +01:00
David Given	cc686ded62	Get subtractions the right way round.	2016-11-15 20:25:11 +01:00
David Given	0289b1004e	Allow values left on the stack at the end of the procedure (it's legal!).	2016-11-14 21:47:49 +01:00
David Given	e7132183fb	Fix buffer overrun: if LABEL_STARTER is seen but LABEL_TERMINATOR is not, the label parser will keep going forever looking for the end of the label. It now stops at the end of the string.	2016-11-13 14:04:58 +01:00
David Given	852d3a691d	Update the table to return call output values in the right registers. Fix the register allocator so the corrupted registers only apply to throughs (otherwise, you can't put output registers in corrupted registers).	2016-11-11 21:48:36 +01:00
David Given	b5c1d622f5	Rework the way stack frames are laid out to be simpler and, hopefully, more correct. Saved registers are now placed in what may be the right place.	2016-11-11 21:17:45 +01:00
David Given	84ee75ec07	Merge from default.	2016-11-11 20:17:54 +01:00
David Given	d82df74a7a	Rename addr_t to address_t to avoid clashes with the system addr_t.	2016-11-11 20:17:10 +01:00
David Given	fd91851005	Add enough return types to the K&R C that the ACK builds (on Linux) using clang now.	2016-11-10 22:04:18 +01:00
David Given	4fa2c94a4a	Correctly mangle labels used in initialisers.	2016-10-31 23:21:33 +01:00
David Given	9261cd978d	Typo fix.	2016-10-31 23:16:02 +01:00
David Given	941072e0d7	Add, I hope, patterns for fmsub, fnmadd, and fnmsub (also float versions).	2016-10-31 22:36:54 +01:00
David Given	44f0cea6ca	Also use fmadd for single-precision floats.	2016-10-31 19:55:16 +01:00
David Given	064d1a5d5d	Use fmadd for multiply-and-add instructions.	2016-10-31 19:52:17 +01:00
David Given	e19850b114	Fix a few c11isms.	2016-10-30 16:51:06 +01:00
David Given	ca5b6e07bb	Properly export symbols.	2016-10-29 23:52:17 +02:00
David Given	8c3670483f	Get top working with the PowerPC; use it to eliminate useless branches and moves.	2016-10-29 23:37:11 +02:00
David Given	a8c4dac67c	Merge from default (merging in George Koehler's PowerPC changes).	2016-10-29 22:40:40 +02:00
David Given	a311e61360	Add support for preserved registers.	2016-10-29 20:22:44 +02:00
David Given	e3ebf986e9	More opcodes.	2016-10-29 13:32:09 +02:00
David Given	1ae8b90238	More opcodes.	2016-10-29 12:55:34 +02:00
David Given	acaae765af	Emit negative constants correctly.	2016-10-29 12:55:21 +02:00
David Given	61349389fb	More opcodes. sti can now cope with non-standard sizes (really need a better fix for this). Hack in crude support for mismatched stack pushes and pops (ints vs longs).	2016-10-29 12:48:05 +02:00
David Given	68419da235	Actually, the locals need to go above the spills and saved regs, so fp == lb.	2016-10-29 12:00:33 +02:00
David Given	2cc2c0ae98	Lots more opcodes. Rearrange the stack layout so that fp->ab is a fixed value (needed for CHAINFP and FPTOAB). Wire up lfrs to calls via a phi when necessary, to allow call-bra-lfr chains.	2016-10-29 11:57:56 +02:00
David Given	bfa65168e2	Don't generate phis if unnecessary (because this breaks the critical-edge-splitting guarantee and causes insertion of phi copies to fail).	2016-10-29 10:55:48 +02:00
David Given	658db4ba71	Mangle label names (turns out that the ACK assembler can't really cope with labels that are the same name as instructions...).	2016-10-27 23:17:16 +02:00
David Given	81525c0f2c	Swaps work (at least for registers). More opcodes. Rearrange the stack layout so we can always trivially find fp, which lets CHAINFP work.	2016-10-27 21:50:58 +02:00
David Given	be3dece5af	Allow emission of strings containing ".	2016-10-27 21:48:46 +02:00
David Given	51bd3ee4dd	Fix bug where some phis weren't being inserted when a given variable definition needed more than one phi (due to the dominance frontier containing more than one basic block).	2016-10-27 21:40:25 +02:00
David Given	9977ce841a	Remove the bytes1, bytes2, bytes4, bytes8 attributes; remove the concept of a register 'type'; now use int/float/long/double throughout to identify registers. Lots of register allocator tweaks and table bugfixes --- we now get through the dreaded Mathlib.mod!	2016-10-25 23:04:20 +02:00
David Given	45a7f2e993	Phi copies are now inserted as part of type inference. More opcodes.	2016-10-24 22:14:08 +02:00
David Given	111c13e253	More opcodes.	2016-10-24 20:15:22 +02:00
David Given	a4644dee4d	More opcodes.	2016-10-24 12:08:40 +02:00
David Given	b22780c075	More opcodes, including the difficult and fairly stupid los/sts.	2016-10-23 22:24:08 +02:00
David Given	abd0cedd61	Massive change to how IR types are handled; we use the type code for matching rather than the size. Much cleaner and simpler.	2016-10-23 21:54:14 +02:00
David Given	b1a3d76d6f	Re-re-add the type inference layer, now I know more about how things work. Remove that terrible float promotion code.	2016-10-22 23:04:13 +02:00
David Given	11b0bc1055	More opcodes.	2016-10-22 20:32:51 +02:00
David Given	2d52b1fdaa	Remove GETRET; values are now returned directly by CALL. Fix a bug in convertstackops which was resulting in duplicate IR groups.	2016-10-22 12:13:57 +02:00
David Given	ceb938fb3c	More opcodes.	2016-10-22 11:26:28 +02:00
David Given	7ae888b754	Hacky workaround for the way the Modula-2 compiler generates non-standard sized loads and saves. More opcodes; simplified table using macros.	2016-10-22 10:48:22 +02:00
David Given	90d0661639	Typo fix.	2016-10-22 00:48:55 +02:00
David Given	f851ab83af	Better (and more correct) floating point conversions; fif; various new opcodes.	2016-10-22 00:48:26 +02:00
David Given	d535be87b1	fef4 and fef8 are now cleaner, albeit slower; add some more register alias stuff.	2016-10-22 00:02:15 +02:00

2677 commits