d0p1/ack - Cute Engineering : Cute solutions to hard problems

Author	SHA1	Message	Date
David Given	25b6712e63	Rework all the ackbuilder scripts not to use wildcards, because we can't expand them without luaposix, which isn't available (easily) on OSX or Windows.	2022-07-14 23:57:54 +02:00
George Koehler	9077b3a5ab	Teach mcg to pass our tests. Tests pass if one edits the top build.lua to uncomment "qemuppc" from both vars.plats and vars.plats_with_tests, and one leaves mcg in plat/qemuppc/descr. Add or correct some EM instructions in treebuilder.c: - "lof", "stf": handle negative offsets in load() and store(). - "cuu": add using IR_FROMUI. - "lim", "sim": keep an entire word in ".ignmask", to be compatible with mach/powerpc/libem/trp.s and ncg. We also keep a word in ".ignmask" in ncg for both i386 and m68020. - "trp": pass trap number in register. See comment in helper_function_with_arg(). - "sig": push the old value of .trppc on the stack. - "and ?", "ior ?", "xor ?", "com ?", "cms ?", "set ?", "inn ?": connect to helper functions in libem. - "blm", "bls": drop call to memmove() and use new helper ".bls4", because tests/plat/structcopy_e.c can't call memmove(). - "xor s", "cms s": if s is large, fall back on helper function. - "rol", "ror": add by decomposing each rotate into 4 IR ops. - "rck s", "bls s": make fatal unless s is word size. - "loi": push multiple loads in the correct order. - "dup s", "exg s": if s is large, fall back on helper. - "dus": add using new helper ".dus4". - "lxl", "lxa": follow the static chain, not the dynamic chain. - "lor 1": materialise the stack before pushing the stack pointer. - "lor 2", "str 2": make fatal. - "los", "sts": drop calls to memcpy() and use helpers ".los4" and ".sts4", so lang/m2/libm2/LtoUset.e starts working. - "gto": correctly read descriptor. Change mach/powerpc/mcg/table: - ANY.L: add for "asp -8". - LOAD.L: work around register corruption. - COMPAREUL.I: add for "cms 8".	2018-01-31 21:05:40 -05:00
George Koehler	e83aaca3ec	Add some comments before I forget how this stuff works.	2018-01-24 15:17:32 -05:00
George Koehler	66f93f08c5	Add fef 4, fif 4. Improve fef 8, fif 8. Other float changes. When I wrote fef 8, I forgot to test denormalized numbers. Oops. Now fix two of my mistakes: - When checking for zero, `extrwi r6, r3, 22, 12` needs to be `extrwi r6, r3, 20, 12`. There are only 20 bits to extract. - After the multiplication by 2**64, I forgot to put the fraction in [0.5, 1) or (-1, -0.5] by setting IEEE exponent = 1022. Teach fif 8 about signed zero and NaN. In ncg/table, change cmf so NaN is not equal to any value, and comment why ordered comparisons don't work with NaN. Also add cost for fctwiz, remove extra `uses REG`. Edit comment in cfu8.s because the conditional branch might be before or after fctwiz.	2018-01-22 14:04:15 -05:00
George Koehler	64b50b3a45	Shrink .cfu8. With my PowerBook G4, a program that converts values from 1.0 to 4000000.0 runs in about 0.32s with the old .cfu8 and 0.29s with this shrunken .cfu8. Leave a comment about other ways to implement .cfu8.	2018-01-07 16:03:55 -05:00
George Koehler	26de4c1ab1	Add test for EM _rck_. Fix traps in PowerPC ncg. The new test rck_e.e segfaults on PowerPC unless I make some changes. The inline code for _rck_ was wrong because it didn't allow the trap handler to return. _sig_ forgot to push the old trap handler. Move plat/linuxppc/libsys/trap.s to mach/powerpc/libem/trp.s and rewrite it with simplified/extended mnemonics. Remove .trap alias for .trp procedure. Add a missing `mtspr lr, r0` so we can return from the trap handler. Call write() and _exit() so trp.s works with both linuxppc and osxppc. Before, Mac OS X was wrongly using the trap.s for Linux. In powerpc/libem, simplify .aar4; teach .csa and .csb to raise the trap if the default target is zero. C programs don't need these changes. You may relink your C programs with the changed .csa and .csb, but C code doesn't raise the trap. Modula-2 code can raise traps, so you may want to relink your Modula-2 programs with the changed libem, but you might keep your old .o files from Modula-2. You may need to recompile your Pascal programs (delete old .o files from Pascal) because the Pascal compiler might use _rck_.	2017-12-24 22:37:52 -05:00
George Koehler	504d2aa34e	Revise stack shuffles and integer conversions in PowerPC ncg. Allow asp 4, exg 4 to shuffle tokens without coercing them into registers; but comment why dup 4, dup 8 coerce tokens into registers. Allow dup, dus, exg with larger sizes; and add tests dup_e.e and exg_e.e to check that dup 20, dus, exg 20 work as well in powerpc as in i80 and i86. Then powerpc failed to compile loc 2 loc 4 cuu in dup_e.e. Revise the integer conversions, so powerpc can compile and pass the test.	2017-12-09 18:57:10 -05:00
George Koehler	459a9b5949	Use lwzu, stwu to tighten more loops. Because lwzu or stwu moves the pointer, I can remove an addi instruction from the loop, so the loop is slightly faster. I wrote a benchmark in Modula-2 that exercises some of these loops. I measured its time on my old PowerPC Mac. Its user time decreases from 8.401s to 8.217s with the tighter loops.	2017-10-18 12:12:42 -04:00
George Koehler	47bd0ef7a7	Stop inlining code to convert integers to floats. Do the conversion by calling .cif8 or .cuf8 in libem, as it was done before my commit `1de1e8f`. I used the inline conversion to experiment with the register allocator, which was too slow until `c5bb3be`. Now that libem has the only copy of the code, move some comments and code changes there.	2017-10-17 17:00:28 -04:00
George Koehler	736c45453c	Remove .ret from libem and inline the code. This removes a wrong-way dependency of libsys on libem. The C functions in libsys called .ret, but libsys is after libem in the linker arguments, so the linker didn't find .ret unless something else had called .ret. Almost everything called .ret, but I got a linker error when I wrote an assembly program using the EM runtime, because my assembly program didn't call .ret. Add a dummy comment to build.lua, so git checkout touches that file, the build system reconfigures itself, and the *.s glob sees that ret.s has gone.	2017-02-16 21:18:39 -05:00
George Koehler	dc05cb2dc8	Add pat cms !defined($1). Switch .cms to pass inputs and outputs on the real stack, not in registers; like we do with .and, .or (`81c677d`) and .xor (`c578c49`). At this point, nearly all functions in libem use the real stack, not registers, for passing inputs and outputs. This simplifies the ncg table (which needs fewer lists of specific registers) but slows calls to libem. For example, after `ba9b021`, each call to .aar4 is about 10 instructions slower. I moved 3 inputs and 1 output from registers to the real stack. A program would take 4 instructions to move registers to stack, 4 to move stack to registers, and perhaps 2 to adjust the stack pointer.	2017-02-13 16:52:32 -05:00
George Koehler	ba9b021253	Use .los4 in lar 4 and .sts4 in sar 4. Our libem had two implementations of loading a block from a stack, one for lar 4 and one for los 4. Now lar 4 and los 4 share the code in .los4. Likewise, sar 4 and sts 4 share the code in .sts4. Rename .los to .los4 and .sts to .sts4, because they implement los 4 and sts 4. Remove the special case for loading or storing 4 bytes, because we can do it with 1 iteration of the loop. Remove the lines to "align size" where the size must already be a multiple of 4. Fix the upper bound check in .aar4. Change .aar4, .lar4, .los4, .sar4, .sts4 to pass all operands on the real stack, except that .los4 and .sts4 take the size in register r3. Have .aar4 set r3 to the size of the array element. So lar 4 is just .aar4 then .los4, and sar 4 is just .aar4 then .sts4. ncg no longer calls .lar4 and .sar4 in libem, because it inlines the code; but I keep .lar4 and .sar4 in libem, because mcg references them. They might or might not work in mcg.	2017-02-13 15:22:00 -05:00
George Koehler	54949f713f	Change .fef8 and .fif8 to pass values on the stack. Reorder the code in .fef8 and .fif8 so that in the usual case, we fall through to the blr without taking any branches. The usual case, by my guess, is .fef8 with normalized numbers or .fif8 with small integers. I change .fef8 and .fif8 to pass values on the real stack, not in specific registers. This simplifies the ncg table, and might help me experiment with changes to the ncg table. This change might or might not help mcg. Seems that mcg always uses the stack to pass values to libem, but I have not tested .fef8 or .fif8 with mcg.	2017-02-12 16:44:37 -05:00
George Koehler	c578c495bb	Edit PowerPC assembly for .and, .cms, .ior, .xor, .zer. Remove one addi instruction from some loops. These loops had increased 2 pointers; they now increase 1 index. I must initialize the index, so I add "li r6, 0" before each loop. Change .zer to use subf instead of neg, add. Change .xor to take the size on the real stack, as .and and .or have done since `81c677d`.	2017-02-11 18:00:56 -05:00
George Koehler	8c8f291a07	In PowerPC libem, remove tge.s and powerpc.h. Nothing uses the tables in tge.s, after I changed the ncg table. There are no *.e files in libem, so don't try to build them.	2017-01-26 12:39:16 -05:00
George Koehler	032bcffef6	In PowerPC libem, use the new features of our assembler. The new features are the hi16/lo16 and ha16/lo16 syntax for relocations, and the extended mnemonics like "blr". Use ha16/lo16 to load some double floats with 2 instructions (lis/lfd) instead of 3 (lis/ori/lfd). Use the extended names for branches, comparisons, and bit rotations, so I can more easily read the code. The new names often encode the same machine instructions as the old names, except in a few places where I changed the instructions. Stop using andi. when we don't need to set cr0. In inn.s, I change andi. to extrwi to extract the same bits. In los.s and sts.s, I change "andi. r3, r3, ~3" to "clrrwi r3, r3, 2". This avoids setting cr0 and also stops clearing the high 16 bits of r3. In csa.s, los.s, sts.s, I change some comparisons and right shifts from signed to unsigned (cmplw, cmplwi, srwi), because the sizes are unsigned. In inn.s, the right shift can be signed (sraw) or unsigned (srw), but I use srw because we don't need the carry bit. In fef8.s, I save an instruction by using rlwinm instead of addis/andc to rlwinm to clear a field. The code no longer kills r7. In both fef8.s and fif8.s, I remove the list of killed registers. Also remove some whitespace from ends of lines.	2017-01-23 17:16:39 -05:00
David Given	81c677d218	Add a bunch more set operations to the PowerPC backends, and the Pascal test for the same.	2017-01-17 22:31:38 +01:00
David Given	e7e29d34ff	Add a test (currently failing) to check that Pascal char sets can store all 256 possible values. Add the PowerPC ncg and mcg backend support to let the test actually run, including modifying a bunch of PowerPC libem functions so that they can be called from both ncg and mcg.	2017-01-15 22:28:14 +01:00
George Koehler	8605a2fcfc	Add Modula-2 set operations to PowerPC ncg. This provides and, ior, xor, com, zer, set, cms when defined($1) and ior, set when !defined($1). I don't provide the other operations !defined($1) because our Modula-2 compiler hasn't used them. I wrote a Modula-2 example in https://gist.github.com/kernigh/add79662bb3c63ffb7c46d01dc8ae788 Put a dummy comment in mach/powerpc/libem/build.lua so git checkout will touch that file. Without the touch, the build system doesn't see the new *.s files.	2016-12-10 12:23:07 -05:00
David Given	55e24e1f24	inn was assuming that bitfields were arrays of bytes, when actually they're arrays of words (which makes the LSB move on big-endian systems).	2016-12-06 21:45:20 +01:00
David Given	953c08839f	inn works now; add a helper for it.	2016-11-20 12:53:44 +01:00
David Given	d31bc6a3f9	Made csa and csb work with mcg; adjust the libem functions and the corresponding invocation in the ncg table so the same helpers can be used for both mcg and ncg. Add a new IR opcode, FARJUMP, which jumps to a helper function but saves volatile registers.	2016-11-19 10:55:41 +01:00
David Given	8c3670483f	Get top working with the PowerPC; use it to eliminate useless branches and moves.	2016-10-29 23:37:11 +02:00
George Koehler	f33b30ed3c	Rewrite .fif8 to avoid powerpc64 fctid. This fixes the SIGILL (illegal instruction) in startrek when firing phasers. The 32-bit processors in my PowerPC Mac and in QEMU don't have fctid, a 64-bit instruction. I got the idea from mach/proto/fp/fif8.c to extract the exponent, clear some bits to get an integer, then subtract the integer from the original value to get the fraction.	2016-10-17 00:39:59 -04:00
George Koehler	6ae415d48b	Rewrite fef 8 in powerpc assembly. In EM, fef splits a float into exponent and fraction. The old C code, given an infinite float, got stuck in an infinite loop. The new assembly code doesn't loop; it extracts the IEEE exponent.	2016-09-29 15:52:54 -04:00
George Koehler	5b69777647	Rename our pseudo-opcode 'la' to 'li32'. GNU as has "la %r4,8(%r3)" as an alias for "addi %r4,%r3,8", meaning to load the address of the thing at 8(%r3). Our 'la', now 'li32', makes an addis/ori pair to load an immediate 32-bit value. For example, "li32 r4,23456789" loads a big number.	2016-09-18 17:03:23 -04:00
George Koehler	03b067e1d5	Add the missing .lar4 and .sar4 for powerpc. Inspired by the sparc code (mach/sparc/libem/lar.s). My powerpc code might still have bugs, but it's enough for examples/hilo.mod to work. May need to 'make clean' or touch a build.lua file, so ackbuilder can notice the new lar4.s and sar4.s files and build them.	2016-09-17 23:55:55 -04:00
David Given	80cb6ba927	Eliminate the RELOH2 relocation, as it never worked --- the address would be calculated incorrectly because of overflow errors. Replace it with an extended RELOPPC relocation which understands addis/ori pairs; add an la pseudoop to the assembler which generates these and the appropriate relocation. Make good. --HG-- branch : dtrg-experimental-powerpc-branch	2016-09-17 12:43:15 +02:00
David Given	45a950571d	Mostly add support for the experimental and largely broken linuxppc platform. (Doesn't quite build.) --HG-- branch : dtrg-experimental-powerpc-branch	2016-09-15 23:12:03 +02:00
David Given	f67c98e239	Distributions are a pain --- let's not bother any more. Instead, we just tag the repository and download a complete snapshot, old and ancient stuff and all.	2016-09-02 23:00:38 +02:00
David Given	88bd7ce126	Remove defunct pmfiles. --HG-- branch : default-branch	2016-06-03 13:56:50 +02:00
dtrg	4dd1ff6d80	Archival checkin (semi-working code).	2007-11-02 18:56:58 +00:00

32 commits
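
Several messages above mention splitting a 32-bit value into two 16-bit halves: the 'li32' pseudo-opcode builds an addis/ori pair, and the ha16/lo16 syntax lets a lis/lfd pair address a double float. The sketch below models only the arithmetic behind those pairs, as it is usually defined for PowerPC. It is not the ack assembler's code: the Python function names are hypothetical and simply mirror the hi16, ha16 and lo16 operators named in the log. The point it illustrates is that ori zero-extends its 16-bit operand, so it pairs with the plain high half, while a load displacement is sign-extended, so it needs the adjusted high half.

```python
MASK32 = 0xFFFFFFFF


def hi16(value):
    """Plain upper 16 bits; pairs with a zero-extended low half (ori)."""
    return (value >> 16) & 0xFFFF


def lo16(value):
    """Lower 16 bits."""
    return value & 0xFFFF


def ha16(value):
    """Upper 16 bits, adjusted so a sign-extended low half adds back correctly."""
    return ((value + 0x8000) >> 16) & 0xFFFF


def sign_extend16(half):
    """Interpret a 16-bit field as a signed displacement."""
    return half - 0x10000 if half & 0x8000 else half


def addis_ori(value):
    """Model of an addis/ori pair: high half shifted up, low half OR-ed in."""
    return ((hi16(value) << 16) | lo16(value)) & MASK32


def lis_plus_displacement(value):
    """Model of lis followed by a load whose displacement is sign-extended."""
    return ((ha16(value) << 16) + sign_extend16(lo16(value))) & MASK32


if __name__ == "__main__":
    # 23456789 is the constant used in the li32 commit message.
    for value in (23456789, 0x12348000, 0xFFFF8000, 0, MASK32):
        assert addis_ori(value) == value
        assert lis_plus_displacement(value) == value
    print("both pairings reconstruct every sample value")
```

When bit 15 of the value is set, ha16 is one greater than hi16 (modulo 2**16), which compensates for the negative displacement.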