When I wrote fef 8, I forgot to test denormalized numbers. Oops. Now
fix two of my mistakes:
- When checking for zero, `extrwi r6, r3, 22, 12` needs to be
`extrwi r6, r3, 20, 12`. There are only 20 bits to extract.
- After the multiplication by 2**64, I forgot to put the fraction in
[0.5, 1) or (-1, 0.5] by setting IEEE exponent = 1022.
Teach fif 8 about signed zero and NaN.
In ncg/table, change cmf so NaN is not equal to any value, and comment
why ordered comparisons don't work with NaN. Also add cost for
fctwiz, remove extra `uses REG`.
Edit comment in cfu8.s because the conditional branch might be before
or after fctwiz.
With my PowerBook G4, a program that converts values from 1.0 to
4000000.0 runs in about 0.32s with the old .cfu8 and 0.29s with this
shrunken .cfu8
Leave a comment about other ways to implement .cfu8
The new features are the hi16/lo16 and ha16/lo16 syntax for
relocations, and the extended mnemonics like "blr".
Use ha16/lo16 to load some double floats with 2 instructions (lis/lfd)
instead of 3 (lis/ori/lfd).
Use the extended names for branches, comparisons, and bit rotations,
so I can more easily read the code. The new names often encode the
same machine instructions as the old names, except in a few places
where I changed the instructions.
Stop using andi. when we don't need to set cr0. In inn.s, I change
andi. to extrwi to extract the same bits. In los.s and sts.s, I
change "andi. r3, r3, ~3" to "clrrwi r3, r3, 2". This avoids setting
cr0 and also stops clearing the high 16 bits of r3.
In csa.s, los.s, sts.s, I change some comparisons and right shifts
from signed to unsigned (cmplw, cmplwi, srwi), because the sizes are
unsigned. In inn.s, the right shift can be signed (sraw) or unsigned
(srw), but I use srw because we don't need the carry bit.
In fef8.s, I save an instruction by using rlwinm instead of addis/andc
to rlwinm to clear a field. The code no longer kills r7. In both
fef8.s and fif8.s, I remove the list of killed registers.
Also remove some whitespace from ends of lines.
GNU as has "la %r4,8(%r3)" as an alias for "addi %r4,%r3,8", meaning
to load the address of the thing at 8(%r3). Our 'la', now 'li32',
makes an addis/ori pair to load an immediate 32-bit value. For
example, "li32 r4,23456789" loads a big number.
calculated incorrectly because of overflow errors.
Replace it with an extended RELOPPC relocation which understands addis/ori
pairs; add an la pseudoop to the assembler which generates these and the
appropriate relocation. Make good.
--HG--
branch : dtrg-experimental-powerpc-branch