register 'type'; now use int/float/long/double throughout to identify
registers. Lots of register allocator tweaks and table bugfixes --- we now get
through the dreading Mathlib.mod!
This would have happened later, if f14 to f31 became regvar (like r13
to r31 are now). I am doing it now because ncg is too slow for rules
"with FREG FREG uses FREG". We use such rules for adf 8 and other EM
instructions that operate on 2 floats. Like my last commit cfbc537,
this commit speeds ncg by removing choices for register allocation.
ncg is too slow with this many registers. A stack pattern "with GPR
GPR GPR" or "with REG REG REG" takes too long to pick registers,
causing ncg 8 to take about 2 seconds on each sti 8. I introduce
REG_PAIR and there are only 4 such pairs.
For programs that use sti 8 (including C programs that copy 8-byte
structs), this speed hack improves the ncg run from several seconds to
almost instantaneous.
Also add a few COMMENT(...) lines in stacking rules.
This fixes the SIGILL (illegal instruction) in startrek when firing
phasers. The 32-bit processors in my PowerPC Mac and in QEMU don't
have fctid, a 64-bit instruction.
I got the idea from mach/proto/fp/fif8.c to extract the exponent,
clear some bits to get an integer, then subtract the integer from
the original value to get the fraction.
Adjust some of the loi rules (and associated moves) so we can identify
the tokens that must be in MEMORY.
With this commit, I can navigate the Enterprise even if I comment out
my work-around from e22c888.
Because li32 always loads a label into a GPR, it is sufficient to
coerce LABEL to REG, then use IND_RC_W or IND_RC_D for indirection
through the label.
Now that SUM_RC always has a signed 16-bit constant, it happens that
the various IND_RC_* tokens also have a signed 16-bit constant, so
we no longer need to touch the scratch register.
When loc (load constant) pushes a constant, it now checks the value of
the constant and pushes any of 7 tokens. These tokens allow stack
patterns to recognize 16-bit signed integers (CONST2), 16-bit unsigned
integers (UCONST2), multiples of 0x10000 (CONST_HZ), and other
interesting forms of constants.
Use the new constant tokens in the rules for adi, sbi, and, ior, xor.
Adjust a few other rules to understand the new tokens.
Require that SUM_RC has a signed 16-bit constant, and OR_RC and XOR_RC
each have an unsigned 16-bit constant. The moves from SUM_RC, OR_RC,
XOR_RC to GPR no longer touch the scratch register, because the
constant is not too big.
Change the operator in his() from a - minus to a + plus. When los(n)
becomes negative, then his(n) needs to add 0x10000, not subtract it.
Also change los(n) to do the sign extension, because smalls(los(n))
should be true, not false.
Also change hi(n) and lo(n) to wrap n in parentheses, as (n), because
these are macros and n might still contain operators.
We only need GPRE in a few places where we write {GPRE, regvar(...)}
because ncgg can't parse plain regvar(...). In all other places, a
plain GPR works.
Also remove gpr_gpr_gpr and a few other unused and fake instructions
from the list of instructions.
Rename the scratch gpr (currently r11) from SCRATCH to RSCRATCH so I
can search for RSCRATCH without finding FSCRATCH. I also want to
avoid confusion with the SCRATCH keyword of the old code generator (cg
which came before ncg).
Change the stacking rules to prevent stacking of RSCRATCH or FSCRATCH
or any other GPR or FPR that isn't an allocatable REG or FREG. Then
ncgg rejects any rule that tries to stack a GPR or FPR, so change such
rules to stack a REG or FREG.
In our powerpc table, sdl fails to kill the old value of the local.
This is a bug, because a later ldl can load the old value instead of
the newly stored value. By rewriting "sdl 0" "ldl 0" as "dup 8" "sdl
0", the newly added rule works around the bug, but only when the ldl
is immediately after the sdl.
This rule improves code that uses double-precision floating point.
The output of printf("%f", 6.0) in C changes from all zero digits to
"6000000" but still doesn't print the decimal point. The result of
atof("-123.456") becomes correct. In startrek, I can now move the
Enterprise, but I still can't fire phasers without crashing the game.
We already have a rule for stl lol $1==$2. We had two copies of the
rule, so I am deleting the second copy.
In EM, fef splits a float into exponent and fraction. The old C code,
given an infinite float, got stuck in an infinite loop. The new
assembly code doesn't loop; it extracts the IEEE exponent.