/*
 * PowerPC table for ncg
 *
 * David Given created this table.
 * George Koehler made many changes between 2016 and 2018.
 *
 * This back end provides 4-byte integers, 4-byte floats, and 8-byte
 * floats. It should provide enough of EM for the ACK's compilers.
 *  - It provides neither "mon" (monitor call) nor "lor 2" and "str 2"
 *    (heap pointer). Programs should call procedures in libsys to
 *    make system calls or allocate heap memory.
 *  - It generates only a few EM traps:
 *     - EARRAY from aar, lar, sar
 *     - ERANGE from rck
 *     - ECASE from csa, csb
 *  - It uses floating-point registers to move 8-byte values that
 *    aren't floats. This might cause extra FPU context switches in
 *    programs that don't use floating point.
 *
 * The EM stack is less than optimal for PowerPC, and incompatible
 * with the calling conventions of other compilers (like gcc).
 *  - EM and ncg use the stack to pass parameters to procedures. For
 *    PowerPC, this is probably slower than passing them in registers.
 *  - This back end misaligns some 8-byte floats, because EM's stack
 *    has only 4-byte alignment. (This kind of misalignment also
 *    happened in IBM's AIX and Apple's Mac OS, where data structures
 *    had 8-byte floats with only 4-byte alignment.)
 */

EM_WSIZE = 4
EM_PSIZE = 4
EM_BSIZE = 8    /* two words saved in call frame */

FP_OFFSET = 0   /* Offset of saved FP relative to our FP */
PC_OFFSET = 4   /* Offset of saved PC relative to our FP */

#define COMMENT(n) /* comment {LABEL, n} */

#define nicesize(x) ((x)==1 || (x)==2 || (x)==4 || (x)==8)

#define smalls(n) sfit(n, 16)
#define smallu(n) ufit(n, 16)

/* Finds FRAME_V tokens that overlap myoff, mysize. */
#define fover(myoff, mysize) (%off+%size>(myoff) && %off<((myoff)+(mysize)))
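/* Illustrative example of fover(): a FRAME_D token with off = 8 and
 * size = 8 covers bytes 8 to 15. A 4-byte store at myoff = 12 overlaps
 * it (8+8 > 12 and 8 < 12+4). A 4-byte store at myoff = 16 does not,
 * because 8+8 > 16 is false.
 */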

/* Checks if we can use {LXFRAME, x}. */
#define nicelx(x) ((x)>=1 && (x)<=0x8000)

#define lo(n) ((n) & 0xFFFF)
#define hi(n) (((n)>>16) & 0xFFFF)

/* Use these for instructions that treat the low half as signed. his()
 * includes a modifier to produce the correct value when the low half gets
 * sign-extended. Be sure to load the high half first and add the low
 * half second. */
#define los(n) (lo(n) | (((0-(lo(n)>>15)) & ~0xFFFF)))
#define his(n) ((hi(n) + (lo(n)>>15)) & 0xFFFF)
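
/* Illustrative worked example (rX is an arbitrary register other than
 * r0): for n = 0x12348000,
 *   lo(n)  = 0x8000        hi(n)  = 0x1234
 *   los(n) = 0xFFFF8000    his(n) = 0x1235
 * so "lis rX,his(n)" followed by "addi rX,rX,los(n)" computes
 * 0x12350000 + (-0x8000) = 0x12348000. Pairing hi(n) with addi would
 * give 0x12340000 - 0x8000 = 0x12338000, which is wrong. hi(n) and lo(n)
 * pair with ori, which does not sign-extend:
 * "lis rX,hi(n)" followed by "ori rX,rX,lo(n)".
 */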


PROPERTIES

	GPR             /* general-purpose register */
	SPFP            /* sp or fp */
	REG             /* allocatable GPR */
	REG3            /* coercion to r3 */

	FPR(8)          /* floating-point register */
	FREG(8)         /* allocatable FPR */
	FSREG           /* allocatable single-precision FPR */

	SPR             /* special-purpose register */
	CR              /* condition register */


REGISTERS

	/*
	 * We use r1 as stack pointer and r2 as frame pointer.
	 * Our assembler has aliases sp -> r1 and fp -> r2.
	 *
	 * We preserve r13 to r31 and f14 to f31 across function
	 * calls to mimic other compilers (like gcc). See
	 *  - http://refspecs.linuxbase.org/elf/elfspec_ppc.pdf
	 *  - https://github.com/ryanarn/powerabi -> chap3-elf32abi.sgml
	 *  - Apple's "32-bit PowerPC Function Calling Conventions"
	 *
	 * When ncg allocates regvars, it seems to start with the last
	 * register in the first class. To encourage ncg to allocate
	 * them from r31 down, we list them in one class as
	 *   r13, r14, ..., r31: GPR, REG regvar(reg_any).
	 */

	r0, r12                                 : GPR.
	sp, fp                                  : GPR, SPFP.
	r3                                      : GPR, REG, REG3.
	r4, r5, r6, r7, r8, r9, r10, r11        : GPR, REG.

	r13, r14, r15, r16, r17, r18, r19, r20, r21, r22, r23, r24,
	r25, r26, r27, r28, r29, r30, r31
	                                        : GPR, REG regvar(reg_any).

	f0                                      : FPR.

	f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11, f12, f13
	                                        : FPR, FREG.

	f14, f15, f16, f17, f18, f19, f20, f21, f22, f23, f24, f25,
	f26, f27, f28, f29, f30, f31
	                                        : FPR, FREG regvar(reg_float).

	fs1("f1")=f1, fs2("f2")=f2, fs3("f3")=f3, fs4("f4")=f4,
	fs5("f5")=f5, fs6("f6")=f6, fs7("f7")=f7, fs8("f8")=f8,
	fs9("f9")=f9, fs10("f10")=f10, fs11("f11")=f11, fs12("f12")=f12,
	fs13("f13")=f13
	                                        : FSREG.

	/* reglap: reg_float may have subregister of different size */
	fs14("f14")=f14, fs15("f15")=f15, fs16("f16")=f16, fs17("f17")=f17,
	fs18("f18")=f18, fs19("f19")=f19, fs20("f20")=f20, fs21("f21")=f21,
	fs22("f22")=f22, fs23("f23")=f23, fs24("f24")=f24, fs25("f25")=f25,
	fs26("f26")=f26, fs27("f27")=f27, fs28("f28")=f28, fs29("f29")=f29,
	fs30("f30")=f30, fs31("f31")=f31
	                                        : FSREG regvar(reg_float).

	lr, ctr                                 : SPR.
	cr0                                     : CR.   /* We use cr0, ignore cr1 to cr7. */

	/* The stacking rules can't allocate registers. We use these
	 * scratch registers to stack tokens.
	 */
#define RSCRATCH r0
#define FSCRATCH f0


TOKENS

/* Primitives */

	C /* constant */ = { INT val; } 4 val.
	LABEL            = { ADDR adr; } 4 adr.
	LABEL_HI         = { ADDR adr; } 4 "hi16[" adr "]".
	LABEL_HA         = { ADDR adr; } 4 "ha16[" adr "]".
	LABEL_LO         = { ADDR adr; } 4 "lo16[" adr "]".
	LOCAL            = { INT off; } 4 ">>> BUG IN LOCAL".
	DLOCAL           = { INT off; } 8 ">>> BUG IN DLOCAL".

/* Allows us to use regvar() to refer to registers */

	REG_EXPR         = { REG reg; } 4 reg.
	FREG_EXPR        = { FREG reg; } 8 reg.
	FSREG_EXPR       = { FSREG reg; } 4 reg.

/* Constants on the stack */

	CONST_N8000       = { INT val; } 4 val.
	CONST_N7FFF_N0001 = { INT val; } 4 val.
	CONST_0000_7FFF   = { INT val; } 4 val.
	CONST_8000        = { INT val; } 4 val.
	CONST_8001_FFFF   = { INT val; } 4 val.
	CONST_HI_ZR       = { INT val; } 4 val.
	CONST_HI_LO       = { INT val; } 4 val.
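
	/* When loc pushes a constant, it picks one of the tokens above
	 * by the constant's value. The ranges below are read from the
	 * token names (the meaning of CONST_HI_ZR is an assumption):
	 *   CONST_N8000         -0x8000
	 *   CONST_N7FFF_N0001   -0x7FFF to -1
	 *   CONST_0000_7FFF     0 to 0x7FFF
	 *   CONST_8000          0x8000
	 *   CONST_8001_FFFF     0x8001 to 0xFFFF
	 *   CONST_HI_ZR         other values with a zero low half,
	 *                       like 0x12340000
	 *   CONST_HI_LO         all remaining values, like 0x12345678
	 * The sets CONST2, CONST2_WHEN_NEG and UCONST2 group these ranges.
	 */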
/* Expression partial results */

	SEX_B    = { GPR reg; } 4.               /* sign extension */
	SEX_H    = { GPR reg; } 4.

	SUM_RIS  = { GPR reg; INT offhi; } 4.    /* reg + (offhi << 16) */
	SUM_RC   = { GPR reg; INT off; } 4.      /* reg + off */
	SUM_RL   = { GPR reg; ADDR adr; } 4.     /* reg + lo16[adr] */
	SUM_RR   = { GPR reg1; GPR reg2; } 4.    /* reg1 + reg2 */

	SUB_CR   = { INT val; GPR reg; } 4.      /* val - reg */
	SUB_RR   = { GPR reg1; GPR reg2; } 4.    /* reg1 - reg2 */
	NEG_R    = { GPR reg; } 4.               /* -reg */
	MUL_RC   = { GPR reg; INT val; } 4.      /* reg * val */
	MUL_RR   = { GPR reg1; GPR reg2; } 4.    /* reg1 * reg2 */
	DIV_RR   = { GPR reg1; GPR reg2; } 4.    /* reg1 / reg2 signed */
	DIV_RR_U = { GPR reg1; GPR reg2; } 4.    /* reg1 / reg2 unsigned */

/* Indirect loads and stores */

	IND_RC_B   = { GPR reg; INT off; } 4 off "(" reg ")".
	IND_RL_B   = { GPR reg; ADDR adr; } 4 "lo16[" adr "](" reg ")".
	IND_RR_B   = { GPR reg1; GPR reg2; } 4.
	IND_RC_H   = { GPR reg; INT off; } 4 off "(" reg ")".
	IND_RL_H   = { GPR reg; ADDR adr; } 4 "lo16[" adr "](" reg ")".
	IND_RR_H   = { GPR reg1; GPR reg2; } 4.
	IND_RC_H_S = { GPR reg; INT off; } 4 off "(" reg ")".
	IND_RL_H_S = { GPR reg; ADDR adr; } 4 "lo16[" adr "](" reg ")".
	IND_RR_H_S = { GPR reg1; GPR reg2; } 4.
	IND_RC_W   = { GPR reg; INT off; } 4 off "(" reg ")".
	IND_RL_W   = { GPR reg; ADDR adr; } 4 "lo16[" adr "](" reg ")".
	IND_RR_W   = { GPR reg1; GPR reg2; } 4.
	IND_RC_D   = { GPR reg; INT off; } 8 off "(" reg ")".
	IND_RL_D   = { GPR reg; ADDR adr; } 8 "lo16[" adr "](" reg ")".
	IND_RR_D   = { GPR reg1; GPR reg2; } 8.
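
	/* Illustrative use of the IND_RL_* tokens (register choices are
	 * arbitrary): reg holds ha16[adr], and the token prints as
	 * lo16[adr](reg). Loading a float from label _f then takes
	 *     lis r11,ha16[_f]
	 *     lfs f1,lo16[_f](r11)
	 * and loading a byte becomes lis, lbz instead of lis, addi, lbz.
	 * ha16[] is adjusted like his() because the 16-bit displacement in
	 * lfs or lbz is sign-extended, like los().
	 */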

/* Local variables in frame */

	FRAME_B   = { INT level; GPR reg; INT off; INT size; }
	                    4 off "(" reg ")".
	FRAME_H   = { INT level; GPR reg; INT off; INT size; }
	                    4 off "(" reg ")".
	FRAME_H_S = { INT level; GPR reg; INT off; INT size; }
	                    4 off "(" reg ")".
	FRAME_W   = { INT level; GPR reg; INT off; INT size; }
	                    4 off "(" reg ")".
	FRAME_D   = { INT level; GPR reg; INT off; INT size; }
	                    8 off "(" reg ")".

	LXFRAME   = { INT level; } 4.

/* Bitwise logic */

	NOT_R    = { GPR reg; } 4.               /* ~reg */
	AND_RIS  = { GPR reg; INT valhi; } 4.
	AND_RC   = { GPR reg; INT val; } 4.
	AND_RR   = { GPR reg1; GPR reg2; } 4.
	ANDC_RR  = { GPR reg1; GPR reg2; } 4.    /* reg1 & ~reg2 */
	OR_RIS   = { GPR reg; INT valhi; } 4.
	OR_RC    = { GPR reg; INT val; } 4.
	OR_RR    = { GPR reg1; GPR reg2; } 4.
	ORC_RR   = { GPR reg1; GPR reg2; } 4.    /* reg1 | ~reg2 */
	XOR_RIS  = { GPR reg; INT valhi; } 4.
	XOR_RC   = { GPR reg; INT val; } 4.
	XOR_RR   = { GPR reg1; GPR reg2; } 4.
	NAND_RR  = { GPR reg1; GPR reg2; } 4.    /* ~(reg1 & reg2) */
	NOR_RR   = { GPR reg1; GPR reg2; } 4.    /* ~(reg1 | reg2) */
	EQV_RR   = { GPR reg1; GPR reg2; } 4.    /* ~(reg1 ^ reg2) */

/* Comparisons */

	COND_RC  = { GPR reg; INT val; } 4.
	COND_RR  = { GPR reg1; GPR reg2; } 4.
	CONDL_RC = { GPR reg; INT val; } 4.
	CONDL_RR = { GPR reg1; GPR reg2; } 4.
	COND_FS  = { FSREG reg1; FSREG reg2; } 4.
	COND_FD  = { FREG reg1; FREG reg2; } 4.

	XEQ      = { GPR reg; } 4.
	XNE      = { GPR reg; } 4.
	XGT      = { GPR reg; } 4.
	XGE      = { GPR reg; } 4.
	XLT      = { GPR reg; } 4.
	XLE      = { GPR reg; } 4.


SETS

	/* signed 16-bit integer */
	CONST2 = CONST_N8000 + CONST_N7FFF_N0001 + CONST_0000_7FFF.
	/* integer that, when negated, fits signed 16-bit */
	CONST2_WHEN_NEG = CONST_N7FFF_N0001 + CONST_0000_7FFF + CONST_8000.
	/* unsigned 16-bit integer */
	UCONST2 = CONST_0000_7FFF + CONST_8000 + CONST_8001_FFFF.
	/* any constant on stack */
	CONST_STACK = CONST_N8000 + CONST_N7FFF_N0001 + CONST_0000_7FFF +
		CONST_8000 + CONST_8001_FFFF +
		CONST_HI_ZR + CONST_HI_LO.

	CONST = C + CONST_STACK.

	SET_RC_B = IND_RC_B + IND_RL_B + FRAME_B.
	SET_RC_H = IND_RC_H + IND_RL_H + FRAME_H.
	SET_RC_H_S = IND_RC_H_S + IND_RL_H_S + FRAME_H_S.
	SET_RC_W = IND_RC_W + IND_RL_W + FRAME_W.
	SET_RC_D = IND_RC_D + IND_RL_D + FRAME_D.

	IND_ALL_B = IND_RC_B + IND_RL_B + IND_RR_B.
	IND_ALL_H = IND_RC_H + IND_RL_H + IND_RR_H +
		IND_RC_H_S + IND_RL_H_S + IND_RR_H_S.
	IND_ALL_W = IND_RC_W + IND_RL_W + IND_RR_W.
	IND_ALL_D = IND_RC_D + IND_RL_D + IND_RR_D.
	IND_V = IND_ALL_B + IND_ALL_H + IND_ALL_W + IND_ALL_D.

	FRAME_V = FRAME_B + FRAME_H + FRAME_H_S + FRAME_W + FRAME_D.

	/* anything killed by sti (store indirect) */
	MEMORY = IND_V + FRAME_V.

	/* any integer from stack that we can easily move to GPR */
	INT_W = SPFP + REG + CONST_STACK + SEX_B + SEX_H +
		SUM_RIS + SUM_RC + SUM_RL + SUM_RR +
		SUB_CR + SUB_RR + NEG_R +
		MUL_RC + MUL_RR + DIV_RR + DIV_RR_U +
		IND_ALL_B + IND_ALL_H + IND_ALL_W +
		FRAME_B + FRAME_H + FRAME_H_S + FRAME_W +
		NOT_R + AND_RIS + AND_RC + AND_RR + ANDC_RR +
		OR_RIS + OR_RC + OR_RR + ORC_RR +
		XOR_RIS + XOR_RC + XOR_RR + NAND_RR + NOR_RR + EQV_RR +
		XEQ + XNE + XGT + XGE + XLT + XLE.

	FLOAT_D = FREG + IND_ALL_D + FRAME_D.
	FLOAT_W = FSREG + IND_ALL_W + FRAME_W.


INSTRUCTIONS

	/* We give time as cycles of total latency from Freescale
	 * Semiconductor, MPC7450 RISC Microprocessor Family Reference
	 * Manual, Rev. 5, section 6.6.
	 *
	 * We have only 4-byte alignment for doubles; 8-byte alignment is
	 * optimal. We guess the misalignment penalty by adding 1 cycle to
	 * the cost of loading or storing a double:
	 *   lfd lfdu lfdx:    4 -> 5
	 *   stfd stfdu stfdx: 3 -> 4
	 */
	cost(4, 1) /* space, time */

	add             GPR:wo, GPR:ro, GPR:ro.
	addX "add."     GPR:wo:cc, GPR:ro, GPR:ro.
	addi            GPR:wo, GPR:ro, CONST+LABEL_LO:ro.
	li              GPR:wo, CONST:ro.
	addis           GPR:wo, GPR:ro, CONST+LABEL_HI+LABEL_HA:ro.
	lis             GPR:wo, CONST+LABEL_HI+LABEL_HA:ro.
	and             GPR:wo, GPR:ro, GPR:ro.
	andc            GPR:wo, GPR:ro, GPR:ro.
	andiX "andi."   GPR:wo:cc, GPR:ro, CONST:ro.
	andisX "andis." GPR:wo:cc, GPR:ro, CONST:ro.
	b               LABEL:ro.
	bc              CONST:ro, CONST:ro, LABEL:ro.
	bdnz            LABEL:ro.
	beq             LABEL:ro.
	bne             LABEL:ro.
	bgt             LABEL:ro.
	bge             LABEL:ro.
	blt             LABEL:ro.
	ble             LABEL:ro.
	bxx             LABEL:ro. /* dummy */
	bcctr           CONST:ro, CONST:ro, CONST:ro.
	bctr.
	bcctrl          CONST:ro, CONST:ro, CONST:ro.
	bctrl.
	bclr            CONST:ro, CONST:ro, CONST:ro.
	blr.
	bl              LABEL:ro.
	cmp             CR:wo, CONST:ro, GPR:ro, GPR:ro kills :cc.
	cmpw            GPR:ro, GPR:ro kills :cc.
	cmpi            CR:wo, CONST:ro, GPR:ro, CONST:ro kills :cc.
	cmpwi           GPR:ro, CONST:ro kills :cc.
	cmpl            CR:wo, CONST:ro, GPR:ro, GPR:ro kills :cc.
	cmplw           GPR:ro, GPR:ro kills :cc.
	cmpli           CR:wo, CONST:ro, GPR:ro, CONST:ro kills :cc.
	cmplwi          GPR:ro, CONST:ro kills :cc.
	divw            GPR:wo, GPR:ro, GPR:ro cost(4, 23).
	divwu           GPR:wo, GPR:ro, GPR:ro cost(4, 23).
	eqv             GPR:wo, GPR:ro, GPR:ro.
	extsb           GPR:wo, GPR:ro.
	extsh           GPR:wo, GPR:ro.
Add floating-point register variables to PowerPC ncg.
Use f14 to f31 as register variables for 8-byte double-precision.
There are no regvars for 4-byte single precision, because all
regvar(reg_float) must have the same size. I expect more programs to
prefer 8-byte double precision.
Teach mach/powerpc/ncg/mach.c to emit stfd and lfd instructions to
save and restore 8-byte regvars. Delay emitting the function prolog
until f_regsave(), so we can use one addi to make stack space for both
local vars and saved registers. Be more careful with types in mach.c;
don't assume that int and long and full are the same.
In ncg table, add f14 to f31 as register variables, and some rules to
use them. Add rules to put the result of fadd, fsub, fmul, fdiv, fneg
in a regvar. Without such rules, the result would go in a scratch
FREG, and we would need fmr to move it to the regvar. Also add a rule
for pat sdl inreg($1)==reg_float with STACK, so we can unstack the
value directly into the regvar, again without a scratch FREG and fmr.
Edit util/ego/descr/powerpc.descr to tell ego about the new float
regvars. This might not be working right; ego usually decides against
using any float regvars, so ack -O1 (not running ego) uses the
regvars, but ack -O4 (running ego) doesn't use the regvars.
Beware that ack -mosxppc runs ego using powerpc.descr but -mlinuxppc
and -mqemuppc run ego without a config file (since 8ef7c31). I am
testing powerpc.descr with a local edit to plat/linuxppc/descr to run
ego with powerpc.descr there, but I did not commit my local edit.
2017-02-16 00:34:07 +00:00
fadd FREG+DLOCAL:wo, FREG:ro, FREG:ro cost(4, 5).

2017-10-17 21:53:03 +00:00
fadds FSREG+LOCAL:wo, FSREG:ro, FSREG:ro cost(4, 5).

2017-01-26 00:08:55 +00:00
fcmpo CR:wo, FREG:ro, FREG:ro cost(4, 5).
fcmpo CR:wo, FSREG:ro, FSREG:ro cost(4, 5).

Add fef 4, fif 4. Improve fef 8, fif 8. Other float changes.
When I wrote fef 8, I forgot to test denormalized numbers. Oops. Now
fix two of my mistakes:
- When checking for zero, `extrwi r6, r3, 22, 12` needs to be
`extrwi r6, r3, 20, 12`. There are only 20 bits to extract.
- After the multiplication by 2**64, I forgot to put the fraction in
[0.5, 1) or (-1, -0.5] by setting IEEE exponent = 1022.
Teach fif 8 about signed zero and NaN.
In ncg/table, change cmf so NaN is not equal to any value, and comment
why ordered comparisons don't work with NaN. Also add cost for
fctiwz, remove extra `uses REG`.
Edit comment in cfu8.s because the conditional branch might be before
or after fctiwz.
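The bit fields in this commit message can be modelled in C. This is a hypothetical sketch, not the ACK's fef 8 code. extrwi() models the PowerPC extrwi mnemonic. fef8_sketch() splits a double the way frexp() does and applies the two fixes described above:
- the zero test looks at only the 20 fraction bits of the high word;
- after a denormal is scaled by 2**64, the IEEE exponent field is set to 1022.

The sketch assumes IEEE 754 doubles and ignores infinity and NaN.

```c
#include <stdint.h>
#include <string.h>

/* Model of "extrwi ra, rs, n, b": extract n bits of rs starting at
 * bit b, where bit 0 is the most significant bit, and right-justify
 * them. Requires 0 < n and b + n <= 32. */
static uint32_t extrwi(uint32_t rs, unsigned n, unsigned b)
{
    uint32_t mask = (n == 32) ? 0xffffffffu : (1u << n) - 1;
    return (rs >> (32 - b - n)) & mask;
}

/* Hypothetical sketch of fef 8: return f and set *e so that
 * x == f * 2**e with 0.5 <= |f| < 1 (or f == x, *e == 0 for zero).
 * Assumes IEEE 754 doubles; infinity and NaN are not handled. */
static double fef8_sketch(double x, int *e)
{
    uint64_t bits;
    int adjust = 0;

    memcpy(&bits, &x, sizeof bits);
    uint32_t hi = (uint32_t)(bits >> 32);   /* sign, exponent, 20 fraction bits */
    uint32_t lo = (uint32_t)bits;           /* the other 32 fraction bits */
    unsigned ieee_exp = extrwi(hi, 11, 1);  /* bits 1..11 of the high word */

    if (ieee_exp == 0) {
        /* Zero or denormal. After the sign bit and the 11 exponent
         * bits, only 32 - 12 = 20 bits remain, so extract 20, not 22. */
        if (extrwi(hi, 20, 12) == 0 && lo == 0) {
            *e = 0;
            return x;                       /* +0.0 or -0.0 */
        }
        x *= 18446744073709551616.0;        /* 2**64 makes x normal */
        adjust = 64;
        memcpy(&bits, &x, sizeof bits);
        ieee_exp = extrwi((uint32_t)(bits >> 32), 11, 1);
    }

    *e = (int)ieee_exp - 1022 - adjust;
    /* Set the IEEE exponent to 1022 so the fraction lands in
     * [0.5, 1) or (-1, -0.5], keeping the sign and fraction bits. */
    bits = (bits & ~((uint64_t)0x7ff << 52)) | ((uint64_t)1022 << 52);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Two examples:
- fef8_sketch(8.0, &e) returns 0.5 with e == 4.
- For the smallest denormal, 2**-1074, it returns 0.5 with e == -1073.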
2018-01-22 19:04:15 +00:00
fctiwz FREG:wo, FREG:ro cost(4, 5).

2017-02-16 00:34:07 +00:00
fdiv FREG+DLOCAL:wo, FREG:ro, FREG:ro cost(4, 35).

2017-10-17 21:53:03 +00:00
fdivs FSREG+LOCAL:wo, FSREG:ro, FSREG:ro cost(4, 21).

2017-10-17 18:15:33 +00:00
fmr FPR:wo, FPR:ro cost(4, 5).
fmr FSREG:wo, FSREG:ro cost(4, 5).

2017-02-16 00:34:07 +00:00
fmul FREG+DLOCAL:wo, FREG:ro, FREG:ro cost(4, 5).

2017-10-17 21:53:03 +00:00
fmuls FSREG+LOCAL:wo, FSREG:ro, FSREG:ro cost(4, 5).

2017-02-16 00:34:07 +00:00
fneg FREG+DLOCAL:wo, FREG:ro cost(4, 5).

2017-10-17 21:53:03 +00:00
fneg FSREG+LOCAL:wo, FSREG:ro cost(4, 5).

2017-12-08 22:19:26 +00:00
frsp FSREG+LOCAL:wo, FREG:ro cost(4, 5).

2017-02-16 00:34:07 +00:00
fsub FREG+DLOCAL:wo, FREG:ro, FREG:ro cost(4, 5).

2017-10-17 21:53:03 +00:00
fsubs FSREG+LOCAL:wo, FSREG:ro, FSREG:ro cost(4, 5).

2017-12-22 22:04:16 +00:00
lbz GPR:wo, SET_RC_B:ro cost(4, 3).

2016-10-17 18:57:21 +00:00
lbzx GPR:wo, GPR:ro, GPR:ro cost(4, 3).

2017-12-22 22:04:16 +00:00
lfd FPR+DLOCAL:wo, SET_RC_D:ro cost(4, 5).

2018-01-27 20:33:43 +00:00
lfdu FPR:wo, IND_RC_D:rw cost(4, 5).

2017-10-17 18:15:33 +00:00
lfdx FPR:wo, GPR:ro, GPR:ro cost(4, 5).

2017-12-22 22:04:16 +00:00
lfs FSREG+LOCAL:wo, SET_RC_W:ro cost(4, 4).

2017-02-08 17:12:28 +00:00
lfsu FSREG:wo, IND_RC_W:rw cost(4, 4).

2017-10-17 18:15:33 +00:00
lfsx FSREG:wo, GPR:ro, GPR:ro cost(4, 4).

2017-12-22 22:04:16 +00:00
lha GPR:wo, SET_RC_H_S:ro cost(4, 3).

2016-10-17 18:57:21 +00:00
lhax GPR:wo, GPR:ro, GPR:ro cost(4, 3).

2017-12-22 22:04:16 +00:00
lhz GPR:wo, SET_RC_H:ro cost(4, 3).

2016-10-17 18:57:21 +00:00
lhzx GPR:wo, GPR:ro, GPR:ro cost(4, 3).

2018-01-05 22:55:50 +00:00
lwz GPR+LOCAL:wo, SET_RC_W:ro cost(4, 3).

2017-10-19 16:44:46 +00:00
lwzu GPR:wo, IND_RC_W:rw cost(4, 3).

2016-10-17 18:57:21 +00:00
lwzx GPR:wo, GPR:ro, GPR:ro cost(4, 3).
mfcr GPR:wo cost(4,2).
mfspr GPR:wo, SPR:ro cost(4, 3).
mtspr SPR:wo, GPR:ro cost(4, 2).

2018-01-27 20:33:43 +00:00
mulli GPR:wo, GPR:ro, CONST:ro cost(4, 3).

2017-12-08 22:19:26 +00:00
mullw GPR:wo, GPR:ro, GPR:ro cost(4, 4).
nand GPR:wo, GPR:ro, GPR:ro.
neg GPR:wo, GPR:ro.
nor GPR:wo, GPR:ro, GPR:ro.

2016-10-07 02:59:27 +00:00
or GPR:wo, GPR:ro, GPR:ro.

2017-02-10 16:45:50 +00:00
mr GPR:wo, GPR:ro.
orX "or." GPR:wo:cc, GPR:ro, GPR:ro.

2017-12-23 03:32:16 +00:00
mrX_readonly "mr." GPR:ro:cc, GPR:ro.

2016-10-07 02:59:27 +00:00
orc GPR:wo, GPR:ro, GPR:ro.

2017-02-08 17:12:28 +00:00
ori GPR:wo, GPR:ro, CONST+LABEL_LO:ro.

Refactor how powerpc ncg pushes constants.
When loc (load constant) pushes a constant, it now checks the value of
the constant and pushes any of 7 tokens. These tokens allow stack
patterns to recognize 16-bit signed integers (CONST2), 16-bit unsigned
integers (UCONST2), multiples of 0x10000 (CONST_HZ), and other
interesting forms of constants.
Use the new constant tokens in the rules for adi, sbi, and, ior, xor.
Adjust a few other rules to understand the new tokens.
Require that SUM_RC has a signed 16-bit constant, and OR_RC and XOR_RC
each have an unsigned 16-bit constant. The moves from SUM_RC, OR_RC,
XOR_RC to GPR no longer touch the scratch register, because the
constant is not too big.
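Here is a rough C sketch of the classification this message describes. The commit mentions 7 tokens but names only CONST2, UCONST2 and CONST_HZ. The sketch therefore covers those three and puts every other value in a hypothetical CLASS_OTHER. The order of the tests is also an assumption, since a value such as 0 fits all three.

```c
#include <stdint.h>

/* Hypothetical constant classes; only the first three names come
 * from the commit message. */
enum const_class { CLASS_CONST2, CLASS_UCONST2, CLASS_CONST_HZ, CLASS_OTHER };

static enum const_class classify_const(int32_t v)
{
    if (v >= -32768 && v <= 32767)
        return CLASS_CONST2;    /* fits a signed 16-bit field (addi, li) */
    if (v >= 0 && v <= 65535)
        return CLASS_UCONST2;   /* fits an unsigned 16-bit field (ori, xori) */
    if (((uint32_t)v & 0xffffu) == 0)
        return CLASS_CONST_HZ;  /* a multiple of 0x10000 (lis, addis, oris) */
    return CLASS_OTHER;
}
```

The signed/unsigned split follows the instructions. addi sign-extends its 16-bit immediate, while ori and xori zero-extend theirs. This is why SUM_RC takes a signed 16-bit constant, while OR_RC and XOR_RC take unsigned ones.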
2016-10-16 17:58:54 +00:00
oris GPR:wo, GPR:ro, CONST:ro.

2016-10-07 02:59:27 +00:00
rlwinm GPR:wo, GPR:ro, CONST:ro, CONST:ro, CONST:ro.

2017-01-26 00:08:55 +00:00
extlwi GPR:wo, GPR:ro, CONST:ro, CONST:ro.
extrwi GPR:wo, GPR:ro, CONST:ro, CONST:ro.

2017-12-08 22:19:26 +00:00
rotlwi GPR+LOCAL:wo, GPR:ro, CONST:ro.
rotrwi GPR+LOCAL:wo, GPR:ro, CONST:ro.
slwi GPR+LOCAL:wo, GPR:ro, CONST:ro.
srwi GPR+LOCAL:wo, GPR:ro, CONST:ro.

2017-12-07 22:16:21 +00:00
rlwnm GPR:wo, GPR:ro, GPR:ro, CONST:ro, CONST:ro.

2017-12-08 22:19:26 +00:00
rotlw GPR+LOCAL:wo, GPR:ro, GPR:ro.
slw GPR+LOCAL:wo, GPR:ro, GPR:ro.

2018-01-27 20:33:43 +00:00
sraw GPR+LOCAL:wo, GPR:ro, GPR:ro /* kills xer */ cost(4, 2).
srawi GPR+LOCAL:wo, GPR:ro, CONST:ro /* kills xer */ cost(4, 2).
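The /* kills xer */ notes mark instructions that change the carry bit XER[CA]. For srawi, PowerPC sets CA when the source is negative and at least one 1 bit is shifted out. Below is a hypothetical C model of that rule.

```c
#include <stdint.h>

/* Hypothetical model of "srawi ra, rs, sh" for 0 <= sh <= 31.
 * Returns rs shifted right arithmetically and stores XER[CA] in *ca. */
static int32_t srawi_model(int32_t rs, unsigned sh, int *ca)
{
    uint32_t u = (uint32_t)rs;
    uint32_t lost = u & ((1u << sh) - 1);   /* the bits shifted out */
    uint32_t r = u >> sh;

    if (rs < 0)
        r |= ~(0xffffffffu >> sh);          /* copy the sign bit in */
    *ca = (rs < 0) && lost != 0;
    return (int32_t)r;                      /* assumes two's complement */
}
```

For example, srawi_model(-5, 1, &ca) returns -3 with ca == 1. Adding the carry gives -2, which is -5 / 2 rounded toward zero. That is why srawi followed by addze is a common PowerPC idiom for signed division by a power of two. It is also why the table records that srawi kills xer.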

2017-12-08 22:19:26 +00:00
srw GPR+LOCAL:wo, GPR:ro, GPR:ro.

2017-12-22 22:04:16 +00:00
stb GPR:ro, SET_RC_B:rw cost(4, 3).

2016-10-17 18:57:21 +00:00
stbx GPR:ro, GPR:ro, GPR:ro cost(4, 3).

2017-12-22 22:04:16 +00:00
stfd FPR:ro, SET_RC_D:rw cost(4, 4).

2017-10-17 18:15:33 +00:00
stfdu FPR:ro, IND_RC_D:rw cost(4, 4).

2016-10-17 18:57:21 +00:00
stfdx FPR:ro, GPR:ro, GPR:ro cost(4, 4).

2017-12-22 22:04:16 +00:00
stfs FSREG:ro, SET_RC_W:rw cost(4, 3).

2017-10-17 18:15:33 +00:00
stfsu FSREG:ro, IND_RC_W:rw cost(4, 3).

2016-10-17 18:57:21 +00:00
stfsx FSREG:ro, GPR:ro, GPR:ro cost(4, 3).

2017-12-22 22:04:16 +00:00
sth GPR:ro, SET_RC_H:rw cost(4, 3).

2016-10-17 18:57:21 +00:00
sthx GPR:ro, GPR:ro, GPR:ro cost(4, 3).

2017-12-22 22:04:16 +00:00
stw GPR:ro, SET_RC_W:rw cost(4, 3).

2016-10-17 18:57:21 +00:00
stwx GPR:ro, GPR:ro, GPR:ro cost(4, 3).

2017-10-17 18:15:33 +00:00
stwu GPR:ro, IND_RC_W:rw cost(4, 3).

2018-01-05 22:55:50 +00:00
subf GPR:wo, GPR:ro, GPR:ro.

2018-01-27 20:33:43 +00:00
subfic GPR:wo, GPR:ro, CONST:ro /* kills xer */.

2016-10-07 02:59:27 +00:00
xor GPR:wo, GPR:ro, GPR:ro.
xori GPR:wo, GPR:ro, CONST:ro.

2016-10-16 17:58:54 +00:00
xoris GPR:wo, GPR:ro, CONST:ro.

2016-09-27 20:46:11 +00:00

2017-10-15 19:22:52 +00:00
bug ">>> BUG" LABEL:ro cost(0, 0).

2017-01-14 23:15:01 +00:00
comment "!" LABEL:ro cost(0, 0).

2007-11-02 18:56:58 +00:00


MOVES

from GPR to GPR

2017-02-10 16:45:50 +00:00
gen mr %2, %1

2007-11-02 18:56:58 +00:00

2017-10-14 16:40:04 +00:00
from FSREG to FSREG
gen fmr %2, %1

2017-10-17 18:15:33 +00:00
from FPR to FPR

2017-02-16 00:34:07 +00:00
gen fmr %2, %1

2017-02-08 17:12:28 +00:00
/* Constants */

2016-12-09 21:36:42 +00:00

2017-12-08 00:24:09 +00:00
from CONST smalls(%val) to GPR

2007-11-02 18:56:58 +00:00
gen

2017-02-08 17:12:28 +00:00
COMMENT("move CONST->GPR smalls")

2017-12-08 00:24:09 +00:00
li %2, %1

2016-12-09 21:36:42 +00:00

2017-12-08 00:24:09 +00:00
from CONST lo(%val)==0 to GPR

2007-11-02 18:56:58 +00:00
gen

2017-02-08 17:12:28 +00:00
COMMENT("move CONST->GPR shifted")

2017-12-08 00:24:09 +00:00
lis %2, {C, hi(%1.val)}

2016-10-16 17:58:54 +00:00

2017-12-08 00:24:09 +00:00
from CONST to GPR

2007-11-02 18:56:58 +00:00
gen

2017-02-08 17:12:28 +00:00
COMMENT("move CONST->GPR")

2017-12-08 00:24:09 +00:00
lis %2, {C, hi(%1.val)}
ori %2, %2, {C, lo(%1.val)}
/* Can't use addi %2, %2, {C, los(%1.val)}

2016-10-16 17:58:54 +00:00
* because %2 might be R0. */
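The three moves from CONST to GPR choose between li, lis, and lis followed by ori. Below is a hypothetical C sketch of that choice. It assumes that smalls() means "fits in a signed 16-bit field" and that hi() and lo() are the upper and lower 16 bits of the value. Those macros are defined elsewhere in the table and may differ in detail.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-ins for the table's smalls(), hi() and lo(). */
static int smalls16(int32_t v) { return v >= -32768 && v <= 32767; }
static uint32_t hi16(uint32_t v) { return v >> 16; }
static uint32_t lo16(uint32_t v) { return v & 0xffffu; }

/* Writes the instructions that load val into register r<reg> to buf
 * and returns the number of instructions used. */
static int load_const(int32_t val, int reg, char *buf, size_t size)
{
    uint32_t u = (uint32_t)val;

    if (smalls16(val)) {                    /* from CONST smalls(%val) */
        snprintf(buf, size, "li r%d, %ld", reg, (long)val);
        return 1;
    }
    if (lo16(u) == 0) {                     /* from CONST lo(%val)==0 */
        snprintf(buf, size, "lis r%d, 0x%lx", reg, (unsigned long)hi16(u));
        return 1;
    }
    /* from CONST: ori zero-extends its immediate, so lis+ori needs no
     * adjustment; and unlike addi, ori still works when reg is r0. */
    snprintf(buf, size, "lis r%d, 0x%lx\nori r%d, r%d, 0x%lx",
             reg, (unsigned long)hi16(u), reg, reg, (unsigned long)lo16(u));
    return 2;
}
```

For example, load_const(0x12348765, 3, buf, sizeof buf) writes lis r3, 0x1234 and ori r3, r3, 0x8765, and returns 2.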

2007-11-02 18:56:58 +00:00
from LABEL to GPR
gen
COMMENT("move LABEL->GPR")

2017-02-08 17:12:28 +00:00
lis %2, {LABEL_HI, %1.adr}
ori %2, %2, {LABEL_LO, %1.adr}

2016-12-09 21:36:42 +00:00

2017-02-02 15:48:25 +00:00
from LABEL_HA to GPR

Trimming mach/powerpc/ncg/table
Remove coercion from LABEL to REG. The coercion never happens because
I have stopped putting LABEL on the stack. Also remove LABEL from set
ANY_BHW. Retain the move from LABEL to REG because pat gto uses it.
Remove li32 instruction, unused after the switch to the hi16, ha16,
lo16 syntax.
Remove COMMENT(...) lines from most moves. In my opinion, they took
too much space, both in the table and in the assembly output. The
stacking rules and coercions keep their COMMENT(...) lines.
In test GPR, don't write to RSCRATCH.
Fold several coercions into a single coercion from ANY_BHW uses REG.
Use REG instead of GPR in stack patterns. REG and GPR act the same,
because every GPR on the stack is a REG, but I want to be clear that I
expect a REG, not r0.
In code rules, sort SUM_RC before SUM_RR, so I can add SUM_RL later.
Remove rules to optimize loc loc cii loc loc cii. If $2==$4, the
peephole optimizer can optimize it. If $2!=$4, then the EM program is
missing a conversion from size $2 to size $4.
Remove rules to store a SEX_B with sti 1 or a SEX_H with sti 2. These
rules would never get used, unless the EM program is missing a
conversion from size 4 to size 1 or 2.
2017-02-08 17:27:16 +00:00
gen lis %2, %1
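LABEL_HA is the high-adjusted half of an address. addi sign-extends its 16-bit immediate. So when a high half is paired with addi, it must be rounded up by one whenever bit 15 of the low half is set. Below is a hypothetical C check of that identity; the function names are illustrative, not the assembler's ha16/lo16 operators.

```c
#include <stdint.h>

/* High 16 bits, adjusted for a sign-extended low half. */
static uint32_t ha16(uint32_t x) { return ((x + 0x8000u) >> 16) & 0xffffu; }

/* Low 16 bits as addi sees them: sign-extended. */
static int32_t lo16_signed(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xffffu);
    return v >= 0x8000 ? v - 0x10000 : v;
}

/* Models "lis r, ha16(x)" followed by "addi r, r, lo16(x)". */
static uint32_t lis_addi(uint32_t x)
{
    uint32_t r = ha16(x) << 16;             /* lis */
    return r + (uint32_t)lo16_signed(x);    /* addi, wrapping mod 2**32 */
}
```

Without the adjustment, lis 0x1234 followed by addi with 0x8765 (read as -30875) would give 0x12338765, off by 0x10000.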
|
2017-02-02 15:48:25 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
/* Sign extension */
|
|
|
|
|
|
|
|
from SEX_B to GPR
|
Trimming mach/powerpc/ncg/table
Remove coercion from LABEL to REG. The coercion never happens because
I have stopped putting LABEL on the stack. Also remove LABEL from set
ANY_BHW. Retain the move from LABEL to REG because pat gto uses it.
Remove li32 instruction, unused after the switch to the hi16, ha16,
lo16 syntax.
Remove COMMENT(...) lines from most moves. In my opinion, they took
too much space, both in the table and in the assembly output. The
stacking rules and coercions keep their COMMENT(...) lines.
In test GPR, don't write to RSCRATCH.
Fold several coercions into a single coercion from ANY_BHW uses REG.
Use REG instead of GPR in stack patterns. REG and GPR act the same,
because every GPR on the stack is a REG, but I want to be clear that I
expect a REG, not r0.
In code rules, sort SUM_RC before SORT_RR, so I can add SUM_RL later.
Remove rules to optimize loc loc cii loc loc cii. If $2==$4, the
peephole optimizer can optimize it. If $2!=$4, then the EM program is
missing a conversion from size $2 to size $4.
Remove rules to store a SEX_B with sti 1 or a SEX_H with sti 2. These
rules would never get used, unless the EM program is missing a
conversion from size 4 to size 1 or 2.
2017-02-08 17:27:16 +00:00
|
|
|
gen extsb %2, %1.reg
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
from SEX_H to GPR
|
Trimming mach/powerpc/ncg/table
Remove coercion from LABEL to REG. The coercion never happens because
I have stopped putting LABEL on the stack. Also remove LABEL from set
ANY_BHW. Retain the move from LABEL to REG because pat gto uses it.
Remove li32 instruction, unused after the switch to the hi16, ha16,
lo16 syntax.
Remove COMMENT(...) lines from most moves. In my opinion, they took
too much space, both in the table and in the assembly output. The
stacking rules and coercions keep their COMMENT(...) lines.
In test GPR, don't write to RSCRATCH.
Fold several coercions into a single coercion from ANY_BHW uses REG.
Use REG instead of GPR in stack patterns. REG and GPR act the same,
because every GPR on the stack is a REG, but I want to be clear that I
expect a REG, not r0.
In code rules, sort SUM_RC before SORT_RR, so I can add SUM_RL later.
Remove rules to optimize loc loc cii loc loc cii. If $2==$4, the
peephole optimizer can optimize it. If $2!=$4, then the EM program is
missing a conversion from size $2 to size $4.
Remove rules to store a SEX_B with sti 1 or a SEX_H with sti 2. These
rules would never get used, unless the EM program is missing a
conversion from size 4 to size 1 or 2.
2017-02-08 17:27:16 +00:00
|
|
|
gen extsh %2, %1.reg
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
/* Register + something */
|
|
|
|
|
Refactor how powerpc ncg pushes constants.
When loc (load constant) pushes a constant, it now checks the value of
the constant and pushes any of 7 tokens. These tokens allow stack
patterns to recognize 16-bit signed integers (CONST2), 16-bit unsigned
integers (UCONST2), multiples of 0x10000 (CONST_HZ), and other
interesting forms of constants.
Use the new constant tokens in the rules for adi, sbi, and, ior, xor.
Adjust a few other rules to understand the new tokens.
Require that SUM_RC has a signed 16-bit constant, and OR_RC and XOR_RC
each have an unsigned 16-bit constant. The moves from SUM_RC, OR_RC,
XOR_RC to GPR no longer touch the scratch register, because the
constant is not too big.
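The commit message above says `loc` chooses among 7 tokens, but it names only three classes: CONST2 (signed 16-bit), UCONST2 (unsigned 16-bit) and CONST_HZ (multiples of 0x10000). Each class fits the 16-bit immediate field of a single instruction:

- CONST2 fits `addi`/`li`.
- UCONST2 fits `ori`/`xori`.
- CONST_HZ fits `addis`/`lis`.

The sketch below classifies a 32-bit constant into those three classes. A value can fall into more than one class. The flag names and the function `classify_const` are hypothetical, and this is not how the table itself encodes its tokens.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags for the constant classes named in the commit. */
enum {
    IS_CONST2   = 1, /* -0x8000..0x7FFF: signed immediate (addi, li) */
    IS_UCONST2  = 2, /* 0..0xFFFF: unsigned immediate (ori, xori) */
    IS_CONST_HZ = 4  /* low 16 bits zero: one addis or lis */
};

static int classify_const(int32_t v)
{
    int flags = 0;
    if (v >= -0x8000 && v <= 0x7FFF)
        flags |= IS_CONST2;
    if (v >= 0 && v <= 0xFFFF)
        flags |= IS_UCONST2;
    if (((uint32_t)v & 0xFFFFu) == 0)
        flags |= IS_CONST_HZ;
    return flags;
}
```

A constant in none of these classes, such as 0x12345, needs two instructions (for example `lis` followed by `ori`).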
2016-10-16 17:58:54 +00:00
|
|
|
from SUM_RIS to GPR
|
2017-12-08 00:24:09 +00:00
|
|
|
gen addis %2, %1.reg, {C, %1.offhi}
|
2016-10-16 17:58:54 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
from SUM_RC to GPR
|
2017-12-08 00:24:09 +00:00
|
|
|
gen addi %2, %1.reg, {C, %1.off}
|
2016-10-16 17:58:54 +00:00
|
|
|
|
In PowerPC ncg, allocate register for ha16[label].
Use it to generate code like
lis r12,ha16[__II0]
lis r11,ha16[_f]
lfs f1,lo16[_f](r11)
lfs f2,lo16[__II0](r12)
fadds f13,f2,f1
stfs f13,lo16[_f](r11)
Here ncg has allocated r11 for ha16[_f]. We use r11 in lfs and again
in stfs. Before this change, we needed an extra lis before stfs,
because ncg did not remember that ha16[_f] was in a register.
This example has a gap between ha16[__II0] and lo16[__II0], because
the lo16 is not in the next instruction. This requires my previous
commit 1bf58cf for RELOLIS. There is a gap because ncg emits the lis
as soon as I allocate it. The "lfs f2,lo16[__II0](r12)" happens in a
coercion from IND_RL_W to FSREG. The coercion allocates one FSREG but
may not allocate any other registers. So I must allocate r12 earlier.
I allocate r12 in pat lae, but this causes a gap.
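The `lis r11,ha16[_f]` / `lfs f1,lo16[_f](r11)` pairs above rely on one detail: the displacement `lo16` is sign-extended when the processor adds it. For that reason, `ha16` ("high adjusted") adds 1 to the high half whenever bit 15 of the address is set. Plain `hi16` is only correct when the low half is combined without sign extension, as with `ori`. Below is a sketch of the arithmetic; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Low 16 bits of an address. */
static uint16_t lo16(uint32_t x) { return (uint16_t)(x & 0xFFFFu); }

/* Unadjusted high 16 bits. */
static uint16_t hi16(uint32_t x) { return (uint16_t)(x >> 16); }

/* Adjusted high 16 bits: compensates for sign extension of lo16. */
static uint16_t ha16(uint32_t x) { return (uint16_t)((x + 0x8000u) >> 16); }

/* lis rT,ha16[x] then d(rT) with d = lo16[x]: (ha16 << 16) + sext(lo16). */
static uint32_t lis_plus_disp(uint32_t x)
{
    uint32_t rt = (uint32_t)ha16(x) << 16;
    return rt + (uint32_t)(int32_t)(int16_t)lo16(x);
}

/* lis rT,hi16[x] then ori rT,rT,lo16[x]: the low half is not sign-extended. */
static uint32_t lis_ori(uint32_t x)
{
    return ((uint32_t)hi16(x) << 16) | lo16(x);
}
```

For example, the address 0x12348000 needs `ha16` = 0x1235. The displacement 0x8000 then acts as -0x8000 and brings the sum back to 0x12348000.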
2017-02-08 17:23:06 +00:00
|
|
|
from SUM_RL to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen addi %2, %1.reg, {LABEL_LO, %1.adr}
|
2017-02-08 17:23:06 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
from SUM_RR to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen add %2, %1.reg1, %1.reg2
|
2016-10-07 02:59:27 +00:00
|
|
|
|
Add more chances to put results in register variables.
When a rule `uses REG ... yields %a`, the result %a is always a
temporary, never a regvar. If the EM code uses _stl_ to put the
result in a regvar, then ncg emits _mr_ to move %a to the regvar.
There are two ways to put the result in the regvar without %a:
1. Yield a token, as in `yields {MUL_RR, %2, %1}`, so that _stl_
can move the token to the regvar without using %a.
2. Provide a pattern, like `sli stl`, that just puts the result
in `{LOCAL, $2}` and not %a.
Allow some tokens, like SUM_RIS and XEQ, onto the stack; and add
tokens like MUL_RR, and patterns like `sli stl`.
Delete patterns for `stl lol` and `sdl ldl` to avoid an extra
temporary %a when the local is a regvar. Delete `lal sti lal loi`
because it would emit wrong code.
2017-12-08 22:19:26 +00:00
|
|
|
/* Other arithmetic */
|
|
|
|
|
Use subfic (val - reg) and mulli (reg * val).
In the instruction list, put /* kills xer */ for sraw, srawi, subfic;
and correct the (now unused) "addi." and "lfdu".
Change MACHOPT_F from -m3 to -m2. This changes the code for 15 * i
from
slwi r3,r4,4
subfic r5,r4,0
add r3,r3,r5
to
mulli r3,r4,15
If the sequence "slwi subfic add" takes 3 cycles and 12 bytes, and
mulli takes 3 cycles and 4 bytes, then mulli is better.
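Both sequences in this commit message compute 15 * i:

- `slwi r3,r4,4` gives 16i.
- `subfic r5,r4,0` gives 0 - i.
- `add` sums them to 15i.

The C sketch below checks that the two forms agree under 32-bit wraparound. The function names are hypothetical, and the XER carry that `subfic` sets is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* slwi r3,r4,4 ; subfic r5,r4,0 ; add r3,r3,r5 */
static uint32_t times15_shift_sub(uint32_t r4)
{
    uint32_t r3 = r4 << 4;  /* slwi: 16 * i */
    uint32_t r5 = 0u - r4;  /* subfic: 0 - i */
    return r3 + r5;         /* add: 16i - i = 15i */
}

/* mulli r3,r4,15: low 32 bits of the product. */
static uint32_t times15_mulli(uint32_t r4)
{
    return r4 * 15u;
}
```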
2018-01-27 20:33:43 +00:00
|
|
|
from SUB_CR to GPR
|
|
|
|
/* val - reg -> subtract reg from val */
|
|
|
|
gen subfic %2, %1.reg, {C, %1.val}
|
|
|
|
|
2017-12-08 22:19:26 +00:00
|
|
|
from SUB_RR to GPR
|
|
|
|
/* reg1 - reg2 -> subtract reg2 from reg1 */
|
|
|
|
gen subf %2, %1.reg2, %1.reg1
|
|
|
|
|
|
|
|
from NEG_R to GPR
|
|
|
|
gen neg %2, %1.reg
|
|
|
|
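The comments on SUB_CR and SUB_RR reflect PowerPC's "subtract from" operand order:

- `subf rD,rA,rB` computes rB - rA, so `subf %2, %1.reg2, %1.reg1` yields reg1 - reg2.
- `subfic rD,rA,SIMM` computes SIMM - rA.
- `neg rD,rA` computes 0 - rA.

Below is a C model of the three instructions. The names are hypothetical, and side effects on XER are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* subf rD,rA,rB: rD = rB - rA */
static uint32_t model_subf(uint32_t ra, uint32_t rb) { return rb - ra; }

/* subfic rD,rA,SIMM: rD = sign-extended SIMM - rA */
static uint32_t model_subfic(uint32_t ra, int16_t simm)
{
    return (uint32_t)(int32_t)simm - ra;
}

/* neg rD,rA: rD = 0 - rA */
static uint32_t model_neg(uint32_t ra) { return 0u - ra; }
```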
|
2018-01-27 20:33:43 +00:00
|
|
|
from MUL_RC to GPR
|
|
|
|
gen mulli %2, %1.reg, {C, %1.val}
|
|
|
|
|
2017-12-08 22:19:26 +00:00
|
|
|
from MUL_RR to GPR
|
|
|
|
gen mullw %2, %1.reg1, %1.reg2
|
|
|
|
|
|
|
|
from DIV_RR to GPR
|
|
|
|
gen divw %2, %1.reg1, %1.reg2
|
|
|
|
|
|
|
|
from DIV_RR_U to GPR
|
|
|
|
gen divwu %2, %1.reg1, %1.reg2
|
|
|
|
|
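DIV_RR and DIV_RR_U differ only in signedness. `divw` treats both registers as signed and truncates toward zero; `divwu` treats them as unsigned. As a result, the same bit patterns can give different quotients. The sketch below uses hypothetical names. PowerPC leaves the quotient undefined for division by zero and for 0x80000000 / -1, so the sketch requires callers to avoid those cases.

```c
#include <assert.h>
#include <stdint.h>

/* divw rD,rA,rB: signed quotient, truncated toward zero.
   Callers must avoid rb == 0 and INT32_MIN / -1. */
static int32_t model_divw(int32_t ra, int32_t rb) { return ra / rb; }

/* divwu rD,rA,rB: unsigned quotient. Callers must avoid rb == 0. */
static uint32_t model_divwu(uint32_t ra, uint32_t rb) { return ra / rb; }
```

For instance, -7 / 2 is -3 with `divw`. With `divwu`, the same dividend is read as 0xFFFFFFF9, and the quotient is 0x7FFFFFFC.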
2016-10-16 22:13:39 +00:00
|
|
|
/* Read byte */
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
from SET_RC_B to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lbz %2, %1
|
2016-10-16 20:02:25 +00:00
|
|
|
|
2016-10-16 22:13:39 +00:00
|
|
|
from IND_RR_B to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lbzx %2, %1.reg1, %1.reg2
|
2016-10-16 22:13:39 +00:00
|
|
|
|
|
|
|
/* Write byte */
|
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
from GPR to SET_RC_B
|
2017-02-08 17:27:16 +00:00
|
|
|
gen stb %1, %2
|
2016-10-16 20:02:25 +00:00
|
|
|
|
2016-10-16 22:13:39 +00:00
|
|
|
from GPR to IND_RR_B
|
2017-02-08 17:27:16 +00:00
|
|
|
gen stbx %1, %2.reg1, %2.reg2
|
2016-10-16 22:13:39 +00:00
|
|
|
|
|
|
|
/* Read halfword (short) */
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
from SET_RC_H to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lhz %2, %1
|
2016-10-16 20:02:25 +00:00
|
|
|
|
2016-10-16 22:13:39 +00:00
|
|
|
from IND_RR_H to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lhzx %2, %1.reg1, %1.reg2
|
2016-10-16 22:13:39 +00:00
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
from SET_RC_H_S to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lha %2, %1
|
2016-10-16 20:02:25 +00:00
|
|
|
|
2016-10-16 22:13:39 +00:00
|
|
|
from IND_RR_H_S to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lhax %2, %1.reg1, %1.reg2
|
2016-10-16 22:13:39 +00:00
|
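The halfword reads come in two forms:

- `lhz`/`lhzx`, used for SET_RC_H and IND_RR_H, zero-extend the 16-bit value.
- `lha`/`lhax`, used for SET_RC_H_S and IND_RR_H_S, sign-extend it.

PowerPC has no sign-extending byte load, so the byte reads above use only `lbz`/`lbzx`, and a signed byte needs a separate `extsb`. The sketch below models the two halfword loads, assuming big-endian memory. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read a halfword stored most-significant byte first. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* lhz: load halfword and zero-extend to 32 bits. */
static uint32_t model_lhz(const uint8_t *ea) { return read_be16(ea); }

/* lha: load halfword algebraic, i.e. sign-extend to 32 bits. */
static int32_t model_lha(const uint8_t *ea)
{
    return (int32_t)(int16_t)read_be16(ea);
}
```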
|
|
|
|
|
|
/* Write halfword */
|
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
from GPR to SET_RC_H
|
2017-02-08 17:27:16 +00:00
|
|
|
gen sth %1, %2
|
2016-10-16 20:02:25 +00:00
|
|
|
|
2016-10-16 22:13:39 +00:00
|
|
|
from GPR to IND_RR_H
|
2017-02-08 17:27:16 +00:00
|
|
|
gen sthx %1, %2.reg1, %2.reg2
|
2016-10-16 22:13:39 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
/* Read word */
|
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
from SET_RC_W to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lwz %2, %1
|
2017-02-02 15:48:25 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
from IND_RR_W to GPR
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lwzx %2, %1.reg1, %1.reg2
|
2016-10-16 20:02:25 +00:00
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
from SET_RC_W to FSREG
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lfs %2, %1
|
2017-02-02 15:48:25 +00:00
|
|
|
|
Enable the Hall check again, and get powerpc to pass it.
Upon enabling the check, mach/powerpc/ncg/table fails to build as ncgg
gives many errors of "Previous rule impossible on empty stack". David
Given reported this problem in 2013:
https://sourceforge.net/p/tack/mailman/message/30814694/
Commit c93cb69 commented out the error in util/ncgg/cgg.y to disable
the Hall check. This commit enables it again. In ncgg, the Hall
check is checking that a rule is possible with an empty fake stack.
It would be possible if ncg can coerce the values from the real stack
to the fake stack. The powerpc table defined coercions from STACK to
{FS, %a} and {FD, %a}, but the Hall check didn't understand the
coercions and rejected each rule "with FS" or "with FD".
This commit removes the FS and FD tokens and adds a new group of FSREG
registers for single-precision floats, while keeping FREG registers
for double precision. The registers overlap, with each FSREG
containing one FREG, because it is the same register in PowerPC
hardware. FS tokens become FSREG registers and FD tokens become FREG
registers. The Hall check understands the coercions from STACK to
FSREG and FREG. The idea to define separate but overlapping registers
comes from the PDP-11 table (mach/pdp/ncg/table).
This commit also removes F0 from the FREG group. This is my attempt
to keep F0 off the fake stack, because one of the stacking rules uses
F0 as a scratch register (FSCRATCH).
2016-09-18 19:08:55 +00:00
|
|
|
from IND_RR_W to FSREG
|
2017-02-08 17:27:16 +00:00
|
|
|
gen lfsx %2, %1.reg1, %1.reg2
|
2016-10-16 20:33:24 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
/* Write word */
|
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
from GPR to SET_RC_W
|
2017-02-08 17:27:16 +00:00
|
|
|
gen stw %1, %2
|
In PowerPC ncg, allocate register for ha16[label].
Use it to generate code like
lis r12,ha16[__II0]
lis r11,ha16[_f]
lfs f1,lo16[_f](r11)
lfs f2,lo16[__II0](r12)
fadds f13,f2,f1
stfs f13,lo16[_f](r11)
Here ncg has allocated r11 for ha16[_f]. We use r11 in lfs and again
in stfs. Before this change, we needed an extra lis before stfs,
because ncg did not remember that ha16[_f] was in a register.
This example has a gap between ha16[__II0] and lo16[__II0], because
the lo16 is not in the next instruction. This requires my previous
commit 1bf58cf for RELOLIS. There is a gap because ncg emits the lis
as soon as I allocate it. The "lfs f2,lo16[__II0](r12)" happens in a
coercion from IND_RL_W to FSREG. The coercion allocates one FSREG but
may not allocate any other registers. So I must allocate r12 earlier.
I allocate r12 in pat lae, but this causes a gap.
2017-02-08 17:23:06 +00:00
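The lis/lfs pairing in the message above works because PowerPC loads and stores sign-extend their 16-bit displacement. That means ha16 has to round the high half up whenever bit 15 of the address is set. The Python sketch below is an illustrative model, not the assembler's code. It assumes 32-bit addresses, and ha16, lo16 and effective_address are hypothetical helper names. It checks that "lis rX,ha16[sym]" followed by an access at "lo16[sym](rX)" reaches the original address. This is also why a single register holding ha16[_f] can serve both the lfs and the stfs.

```python
MASK32 = 0xFFFFFFFF


def ha16(addr: int) -> int:
    """High 16 bits, rounded up when the low half will be sign-extended negative."""
    return ((addr + 0x8000) >> 16) & 0xFFFF


def lo16(addr: int) -> int:
    """Low 16 bits, as placed in the instruction's displacement field."""
    return addr & 0xFFFF


def sign_extend16(x: int) -> int:
    """Interpret a 16-bit field as a signed displacement, as the CPU does."""
    return x - 0x10000 if x & 0x8000 else x


def effective_address(addr: int) -> int:
    """Model 'lis rX,ha16[sym]' then 'lfs fY,lo16[sym](rX)'."""
    rx = (ha16(addr) << 16) & MASK32                 # lis: immediate << 16
    return (rx + sign_extend16(lo16(addr))) & MASK32  # d(rX) addressing


if __name__ == "__main__":
    for a in (0x12345678, 0x1234ABCD, 0xFFFF8000):
        print(f"{a:08x}: ha16={ha16(a):04x} lo16={lo16(a):04x} "
              f"ea={effective_address(a):08x}")
```

For 0x1234ABCD, lo16 is 0xABCD, which the load treats as negative. So ha16 becomes 0x1235 instead of 0x1234.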
2007-11-02 18:56:58 +00:00
from GPR to IND_RR_W
2017-02-08 17:27:16 +00:00
gen stwx %1, %2.reg1, %2.reg2
2016-10-16 20:02:25 +00:00
2017-12-22 22:04:16 +00:00
from FSREG to SET_RC_W
2017-02-08 17:27:16 +00:00
gen stfs %1, %2
2017-02-08 17:23:06 +00:00
Enable the Hall check again, and get powerpc to pass it.
Upon enabling the check, mach/powerpc/ncg/table fails to build as ncgg
gives many errors of "Previous rule impossible on empty stack". David
Given reported this problem in 2013:
https://sourceforge.net/p/tack/mailman/message/30814694/
Commit c93cb69 commented out the error in util/ncgg/cgg.y to disable
the Hall check. This commit enables it again. In ncgg, the Hall
check is checking that a rule is possible with an empty fake stack.
It would be possible if ncg can coerce the values from the real stack
to the fake stack. The powerpc table defined coercions from STACK to
{FS, %a} and {FD, %a}, but the Hall check didn't understand the
coercions and rejected each rule "with FS" or "with FD".
This commit removes the FS and FD tokens and adds a new group of FSREG
registers for single-precision floats, while keeping FREG registers
for double precision. The registers overlap, with each FSREG
containing one FREG, because it is the same register in PowerPC
hardware. FS tokens become FSREG registers and FD tokens become FREG
registers. The Hall check understands the coercions from STACK to
FSREG and FREG. The idea to define separate but overlapping registers
comes from the PDP-11 table (mach/pdp/ncg/table).
This commit also removes F0 from the FREG group. This is my attempt
to keep F0 off the fake stack, because one of the stacking rules uses
F0 as a scratch register (FSCRATCH).
2016-09-18 19:08:55 +00:00
from FSREG to IND_RR_W
2017-02-08 17:27:16 +00:00
gen stfsx %1, %2.reg1, %2.reg2
2007-11-02 18:56:58 +00:00
/* Read double */
2017-12-22 22:04:16 +00:00
from SET_RC_D to FPR
Use ha16/lo16 to load or store 1, 2, 8 bytes from labels.
Add the tokens IND_RL_B, IND_RL_H, IND_RL_H_S, IND_RL_D, along with
the rules to use them. These rules emit shorter code. For example,
loading a byte becomes lis, lbz instead of lis, addi, lbz.
While making this, I wrongly set IND_RL_D to size 4. Then ncg made
infinite recursion in codegen() and stackupto(), until it crashed by
stack overflow. I correctly set IND_RL_D to size 8, preventing the
crash.
2017-02-08 17:31:14 +00:00
gen lfd %2, %1
2016-10-16 20:02:25 +00:00
2017-10-17 18:15:33 +00:00
from IND_RR_D to FPR
2017-02-08 17:27:16 +00:00
gen lfdx %2, %1.reg1, %1.reg2
2007-11-02 18:56:58 +00:00
/* Write double */
2017-12-22 22:04:16 +00:00
from FPR to SET_RC_D
2017-02-08 17:31:14 +00:00
gen stfd %1, %2
2007-11-02 18:56:58 +00:00
2016-09-18 19:08:55 +00:00
from FPR to IND_RR_D
2017-02-08 17:27:16 +00:00
gen stfdx %1, %2.reg1, %2.reg2
2016-10-16 20:33:24 +00:00
2017-12-22 22:04:16 +00:00
/* LXFRAME is a lexical frame from the static chain. We define a move
so "uses REG={LXFRAME, $1}" may find a register with the same
frame, and not repeat the move. This move can't search for a REG
with {LXFRAME, $1-1}, but must always start from fp. The static
chain, if it exists, is the argument at fp + EM_BSIZE. */
from LXFRAME %level==1 to REG
gen lwz %2, {IND_RC_W, fp, EM_BSIZE}
from LXFRAME %level==2 to REG
gen lwz %2, {IND_RC_W, fp, EM_BSIZE}
/* PowerPC can't add r0 + EM_BSIZE,
* so %2 must not be r0. */
lwz %2, {IND_RC_W, %2, EM_BSIZE}
from LXFRAME %level==3 to REG
gen lwz %2, {IND_RC_W, fp, EM_BSIZE}
lwz %2, {IND_RC_W, %2, EM_BSIZE}
lwz %2, {IND_RC_W, %2, EM_BSIZE}
from LXFRAME %level==4 to REG
gen lwz %2, {IND_RC_W, fp, EM_BSIZE}
lwz %2, {IND_RC_W, %2, EM_BSIZE}
lwz %2, {IND_RC_W, %2, EM_BSIZE}
lwz %2, {IND_RC_W, %2, EM_BSIZE}
from LXFRAME to REG /* assuming %level in 2 to 0x8000 */
gen li %2, {C, %1.level-1}
mtspr ctr, %2
lwz %2, {IND_RC_W, fp, EM_BSIZE}
1: lwz %2, {IND_RC_W, %2, EM_BSIZE}
bdnz {LABEL, "1b"}
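The Python sketch below is a hedged illustration of what the LXFRAME moves compute. It treats memory as a dict from addresses to words and follows the static chain, whose link the comment above places at fp + EM_BSIZE. It mirrors the generic rule: one load from fp, then a counted loop. bdnz decrements ctr before testing it, so the loop body always runs at least once. With ctr set to 0 (level 1), the counter would wrap. The upper bound exists because li takes a signed 16-bit immediate, so level-1 can be at most 0x7FFF. Together these presumably explain the rule's "2 to 0x8000" assumption. The memory layout, frame addresses and function name are hypothetical.

```python
EM_BSIZE = 8  # from the table: two words saved in the call frame


def lxframe(memory: dict, fp: int, level: int) -> int:
    """Return the frame reached by following the static chain `level` times.

    Mirrors the generic move from LXFRAME to REG:
        li    %2, level-1
        mtspr ctr, %2
        lwz   %2, EM_BSIZE(fp)
    1:  lwz   %2, EM_BSIZE(%2)
        bdnz  1b
    """
    # li's immediate is signed 16-bit, so level-1 <= 0x7FFF; and with
    # level 1, bdnz would decrement ctr from 0 to 0xFFFFFFFF and keep looping.
    if not 2 <= level <= 0x8000:
        raise ValueError("generic rule assumes 2 <= level <= 0x8000")
    reg = memory[fp + EM_BSIZE]          # first hop always starts from fp
    ctr = level - 1                      # li + mtspr ctr
    while True:                          # do-while: body runs before bdnz
        reg = memory[reg + EM_BSIZE]     # 1: lwz %2, EM_BSIZE(%2)
        ctr = (ctr - 1) & 0xFFFFFFFF     # bdnz decrements ctr ...
        if ctr == 0:                     # ... and branches back while nonzero
            break
    return reg


if __name__ == "__main__":
    # Hypothetical chain: each frame's static link points to the next frame.
    frames = [0x1000, 0x2000, 0x3000, 0x4000, 0x5000]
    memory = {f + EM_BSIZE: nxt for f, nxt in zip(frames, frames[1:])}
    for level in (2, 3, 4):
        print(level, hex(lxframe(memory, frames[0], level)))
```

Level n performs n loads in total, which matches the unrolled moves for levels 2 to 4.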
2007-11-02 18:56:58 +00:00
/* Logicals */
from NOT_R to GPR
2017-02-08 17:27:16 +00:00
gen nor %2, %1.reg, %1.reg
2007-11-02 18:56:58 +00:00
Add more chances to put results in register variables.
When a rule `uses REG ... yields %a`, the result %a is always a
temporary, never a regvar. If the EM code uses _stl_ to put the
result in a regvar, then ncg emits _mr_ to move %a to the regvar.
There are two ways to put the result in the regvar without %a:
1. Yield a token, as in `yields {MUL_RR, %2, %1}`, so that _stl_
can move the token to the regvar without using %a.
2. Provide a pattern, like `sli stl`, that just puts the result
in `{LOCAL, $2}` and not %a.
Allow some tokens, like SUM_RIS and XEQ, onto the stack; and add
tokens like MUL_RR, and patterns like `sli stl`.
Delete patterns for `stl lol` and `sdl ldl` to avoid an extra
temporary %a when the local is a regvar. Delete `lal sti lal loi`
because it would emit wrong code.
2017-12-08 22:19:26 +00:00
from AND_RIS to GPR
gen andisX %2, %1.reg, {C, %1.valhi}
from AND_RC to GPR
gen andiX %2, %1.reg, {C, %1.val}
2007-11-02 18:56:58 +00:00
from AND_RR to GPR
2017-02-08 17:27:16 +00:00
gen and %2, %1.reg1, %1.reg2
2007-11-02 18:56:58 +00:00
2017-12-08 22:19:26 +00:00
from ANDC_RR to GPR
gen andc %2, %1.reg1, %1.reg2
2007-11-02 18:56:58 +00:00
Refactor how powerpc ncg pushes constants.
When loc (load constant) pushes a constant, it now checks the value of
the constant and pushes any of 7 tokens. These tokens allow stack
patterns to recognize 16-bit signed integers (CONST2), 16-bit unsigned
integers (UCONST2), multiples of 0x10000 (CONST_HZ), and other
interesting forms of constants.
Use the new constant tokens in the rules for adi, sbi, and, ior, xor.
Adjust a few other rules to understand the new tokens.
Require that SUM_RC has a signed 16-bit constant, and OR_RC and XOR_RC
each have an unsigned 16-bit constant. The moves from SUM_RC, OR_RC,
XOR_RC to GPR no longer touch the scratch register, because the
constant is not too big.
2016-10-16 17:58:54 +00:00
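The Python sketch below illustrates the constant forms named in the message above. It tests a 32-bit value against the three forms the message spells out:

- a signed 16-bit value, which CONST2 recognizes and which suits addi;
- an unsigned 16-bit value, which UCONST2 recognizes and which suits ori and xori, since they zero-extend their immediate;
- a multiple of 0x10000, which CONST_HZ recognizes and whose low half is zero.

The function names and the "-like" labels are hypothetical. The message does not describe the table's other token kinds, so the sketch does not model them or the order in which loc picks a token.

```python
def to_signed32(v: int) -> int:
    """Reinterpret the low 32 bits of v as a two's-complement integer."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v


def fits_signed16(v: int) -> bool:
    """Like CONST2: fits addi/li's sign-extended 16-bit immediate."""
    return -0x8000 <= to_signed32(v) <= 0x7FFF


def fits_unsigned16(v: int) -> bool:
    """Like UCONST2: fits ori/xori's zero-extended 16-bit immediate."""
    return 0 <= (v & 0xFFFFFFFF) <= 0xFFFF


def is_multiple_of_0x10000(v: int) -> bool:
    """Like CONST_HZ: the low half is zero, so only the high half matters."""
    return (v & 0xFFFF) == 0


def classify(v: int) -> list:
    """Hypothetical labels for the forms a 32-bit constant takes."""
    kinds = []
    if fits_signed16(v):
        kinds.append("CONST2-like")
    if fits_unsigned16(v):
        kinds.append("UCONST2-like")
    if is_multiple_of_0x10000(v):
        kinds.append("CONST_HZ-like")
    return kinds or ["other"]


if __name__ == "__main__":
    for v in (0, 100, -1, 0x8000, 0x10000, 0x12345678):
        print(hex(v), classify(v))
```

0x8000 fits ori but not addi, while -1 fits addi but not ori. This asymmetry underlies the message's requirement that SUM_RC hold a signed constant while OR_RC and XOR_RC hold unsigned ones.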
from OR_RIS to GPR
2017-12-08 00:24:09 +00:00
gen oris %2, %1.reg, {C, %1.valhi}
2007-11-02 18:56:58 +00:00
from OR_RC to GPR
2017-12-08 00:24:09 +00:00
gen ori %2, %1.reg, {C, %1.val}
2007-11-02 18:56:58 +00:00
2017-12-08 22:19:26 +00:00
from OR_RR to GPR
gen or %2, %1.reg1, %1.reg2
from ORC_RR to GPR
gen orc %2, %1.reg1, %1.reg2
2007-11-02 18:56:58 +00:00
2016-10-16 17:58:54 +00:00
from XOR_RIS to GPR
2017-12-08 00:24:09 +00:00
gen xoris %2, %1.reg, {C, %1.valhi}
2007-11-02 18:56:58 +00:00
from XOR_RC to GPR
2017-12-08 00:24:09 +00:00
gen xori %2, %1.reg, {C, %1.val}
2007-11-02 18:56:58 +00:00
2017-12-08 22:19:26 +00:00
from XOR_RR to GPR
gen xor %2, %1.reg1, %1.reg2
from NAND_RR to GPR
gen nand %2, %1.reg1, %1.reg2
from NOR_RR to GPR
gen nor %2, %1.reg1, %1.reg2
from EQV_RR to GPR
gen eqv %2, %1.reg1, %1.reg2
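For reference, the Python sketch below spells out what each logical move above computes on 32-bit words. It follows the PowerPC definitions of and, andc, or, orc, xor, nand, nor and eqv; andc and orc complement their second operand. It also shows why NOT_R can be emitted as nor with the same register twice. Finally, it shows that an oris of the high half followed by an ori of the low half ORs in a full 32-bit constant. That split is an arithmetic identity offered for illustration; this section does not show the table combining OR_RIS and OR_RC that way. The names LOGICAL, not_r and or_const are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# What each "from X_RR to GPR" move computes, as %1.reg1 (a) op %1.reg2 (b).
LOGICAL = {
    "and":  lambda a, b: a & b,
    "andc": lambda a, b: a & ~b & MASK32,      # a AND (NOT b)
    "or":   lambda a, b: a | b,
    "orc":  lambda a, b: (a | ~b) & MASK32,    # a OR (NOT b)
    "xor":  lambda a, b: a ^ b,
    "nand": lambda a, b: ~(a & b) & MASK32,
    "nor":  lambda a, b: ~(a | b) & MASK32,
    "eqv":  lambda a, b: ~(a ^ b) & MASK32,    # 1 wherever the bits are equal
}


def not_r(a: int) -> int:
    """NOT_R: nor %2, %1.reg, %1.reg, i.e. NOT (a OR a) == NOT a."""
    return LOGICAL["nor"](a, a)


def or_const(x: int, c: int) -> int:
    """OR a 32-bit constant with oris (high half) then ori (low half)."""
    hi, lo = (c >> 16) & 0xFFFF, c & 0xFFFF
    x = x | (hi << 16)   # oris: immediate shifted left 16
    return x | lo        # ori: zero-extended immediate


if __name__ == "__main__":
    a, b = 0xF0F0F0F0, 0xFF00FF00
    for name, op in LOGICAL.items():
        print(f"{name:4} {op(a, b):08X}")
```

Python integers are unbounded, so every operation that complements a value is masked back to 32 bits.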
2017-01-26 00:08:55 +00:00
/* Conditions */
/* Compare values, then copy cr0 to GPR. */
from COND_RC to GPR
gen
2017-12-08 00:24:09 +00:00
cmpwi %1.reg, {C, %1.val}
2017-01-26 00:08:55 +00:00
mfcr %2
from COND_RR to GPR
gen
cmpw %1.reg1, %1.reg2
mfcr %2
from CONDL_RC to GPR
gen
2017-12-08 00:24:09 +00:00
cmplwi %1.reg, {C, %1.val}
2017-01-26 00:08:55 +00:00
mfcr %2
from CONDL_RR to GPR
gen
cmplw %1.reg1, %1.reg2
mfcr %2
from COND_FS to GPR
gen
2017-02-13 22:44:46 +00:00
fcmpo cr0, %1.reg1, %1.reg2
2017-01-26 00:08:55 +00:00
mfcr %2
from COND_FD to GPR
gen
2017-02-13 22:44:46 +00:00
fcmpo cr0, %1.reg1, %1.reg2
2017-01-26 00:08:55 +00:00
mfcr %2
/* Given a copy of cr0 in %1.reg, extract a condition bit
* (lt, gt, eq) and perhaps flip it.
*/
from XEQ to GPR
gen
2017-12-08 00:24:09 +00:00
extrwi %2, %1.reg, {C, 1}, {C, 2}
2017-01-26 00:08:55 +00:00
from XNE to GPR
gen
2017-12-08 00:24:09 +00:00
extrwi %2, %1.reg, {C, 1}, {C, 2}
xori %2, %2, {C, 1}
2017-01-26 00:08:55 +00:00
from XGT to GPR
gen
2017-12-08 00:24:09 +00:00
extrwi %2, %1.reg, {C, 1}, {C, 1}
2017-01-26 00:08:55 +00:00
from XGE to GPR
gen
2017-12-08 00:24:09 +00:00
extrwi %2, %1.reg, {C, 1}, {C, 0}
xori %2, %2, {C, 1}
2017-01-26 00:08:55 +00:00
from XLT to GPR
gen
2017-12-08 00:24:09 +00:00
extrwi %2, %1.reg, {C, 1}, {C, 0}
2017-01-26 00:08:55 +00:00
from XLE to GPR
gen
2017-12-08 00:24:09 +00:00
extrwi %2, %1.reg, {C, 1}, {C, 1}
xori %2, %2, {C, 1}
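The moves above depend on where cr0 lands after mfcr. A compare sets the four cr0 bits LT, GT, EQ and SO, which are bits 0 to 3 of the condition register in PowerPC's most-significant-bit-first numbering. As a result, mfcr places them in the top four bits of the GPR. "extrwi rA, rS, 1, b" then moves bit b, counted from the most significant end, into the low bit. The Python sketch below models a signed cmpw followed by mfcr and checks that each X token yields the expected 0 or 1. It assumes the other CR fields are zero, which does not affect the result because extrwi reads only cr0. All names are hypothetical.

```python
LT, GT, EQ = 0, 1, 2   # bit numbers within CR, counted from the MSB


def cmpw_mfcr(a: int, b: int) -> int:
    """Model 'cmpw a, b' then 'mfcr': cr0 in the top nibble, SO left 0."""
    if a < b:
        field = 0b1000          # LT
    elif a > b:
        field = 0b0100          # GT
    else:
        field = 0b0010          # EQ
    return field << 28


def extrwi(rs: int, n: int, b: int) -> int:
    """extrwi rA, rS, n, b: right-justify the n bits starting at bit b."""
    return (rs >> (32 - n - b)) & ((1 << n) - 1)


# Each token as in the table: extract one bit, maybe flip it with xori 1.
TOKENS = {
    "XEQ": lambda cr: extrwi(cr, 1, EQ),
    "XNE": lambda cr: extrwi(cr, 1, EQ) ^ 1,
    "XGT": lambda cr: extrwi(cr, 1, GT),
    "XGE": lambda cr: extrwi(cr, 1, LT) ^ 1,
    "XLT": lambda cr: extrwi(cr, 1, LT),
    "XLE": lambda cr: extrwi(cr, 1, GT) ^ 1,
}

# The comparison each token is meant to produce.
EXPECTED = {
    "XEQ": lambda a, b: a == b,
    "XNE": lambda a, b: a != b,
    "XGT": lambda a, b: a > b,
    "XGE": lambda a, b: a >= b,
    "XLT": lambda a, b: a < b,
    "XLE": lambda a, b: a <= b,
}

if __name__ == "__main__":
    for a, b in ((1, 2), (2, 2), (3, 2), (-5, 4)):
        cr = cmpw_mfcr(a, b)
        for name, tok in TOKENS.items():
            assert tok(cr) == int(EXPECTED[name](a, b)), (name, a, b)
    print("all condition tokens agree")
```

Negated conditions need only one extra xori, because "not less than" is exactly "greater or equal" once EQ and GT are the only alternatives.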
2017-01-26 00:08:55 +00:00
Fix lim. Improve lxl, lxa, lor, str, procs with no locals.
_lim_ must use _loe_ (load word external), not _lde_ (load double-word
external).
The new patterns for _lxl_, _lxa_, _lor_, _str_ emit shorter code in
some cases. The change from GPR_EXPR to REG_EXPR allows moving
LXFRAME to a register variable.
Add more "reusing" clauses. We have enough registers that ncg almost
never reuses a register, but sometimes it can reuse r3.
In mach.c, emit one fewer instruction in procedures with no locals.
2018-01-05 01:40:35 +00:00
/* REG_EXPR exists solely to allow us to use regvar() (which can only
2017-10-17 18:15:33 +00:00
be used in an expression) as a register constant. We can then use
2018-01-05 01:40:35 +00:00
our moves to GPR or REG to set register variables. This is easier
than defining moves to LOCAL, and avoids confusion between GPR and
FSREG in LOCAL. */
2007-11-02 18:56:58 +00:00
2018-01-05 01:40:35 +00:00
from INT_W + LXFRAME to REG_EXPR
2017-10-17 18:15:33 +00:00
gen move %1, %2.reg
2018-01-05 01:40:35 +00:00
from FLOAT_D to FREG_EXPR
2017-10-17 18:15:33 +00:00
gen move %1, %2.reg
2017-12-14 21:26:19 +00:00
from FLOAT_W to FSREG_EXPR
2017-02-08 17:27:16 +00:00
gen move %1, %2.reg
2007-11-02 18:56:58 +00:00
2016-12-09 21:36:42 +00:00
2007-11-02 18:56:58 +00:00
TESTS
2016-12-09 21:36:42 +00:00
2017-12-23 03:32:16 +00:00
/* Given "mrX %1, %1", ncgg would say, "Instruction destroys
* %1, not allowed here". We use mrX_readonly to trick ncgg.
2017-02-08 17:27:16 +00:00
*/
2007-11-02 18:56:58 +00:00
to test GPR
gen
2017-12-23 03:32:16 +00:00
mrX_readonly %1, %1
2007-11-02 18:56:58 +00:00
STACKINGRULES
2016-09-27 20:46:11 +00:00
2018-01-23 23:18:40 +00:00
from SPFP+REG to STACK
2007-11-02 18:56:58 +00:00
gen
2018-01-23 23:18:40 +00:00
COMMENT("stack SPFP+REG")
2017-02-13 22:44:46 +00:00
stwu %1, {IND_RC_W, sp, 0-4}
2016-10-16 17:58:54 +00:00
2018-01-23 23:18:40 +00:00
from INT_W-SPFP-REG to STACK
2007-11-02 18:56:58 +00:00
gen
2018-01-23 23:18:40 +00:00
COMMENT("stack INT_W-SPFP-REG")
2016-10-07 00:47:42 +00:00
move %1, RSCRATCH
2017-02-13 22:44:46 +00:00
stwu RSCRATCH, {IND_RC_W, sp, 0-4}

from FLOAT_D-FREG to STACK
gen
COMMENT("stack FLOAT_D-FREG")
move %1, FSCRATCH
stfdu FSCRATCH, {IND_RC_D, sp, 0-8}

from FREG to STACK
gen
COMMENT("stack FREG")
stfdu %1, {IND_RC_D, sp, 0-8}

from FSREG to STACK
gen
COMMENT("stack FSREG")
stfsu %1, {IND_RC_W, sp, 0-4}

/*
* We never stack LOCAL or DLOCAL tokens, because we only use
* them for register variables, so ncg pushes the register,
* not the token. These rules only prevent an error in ncgg.
*/
from LOCAL to STACK
gen bug {LABEL, "STACKING LOCAL"}
from DLOCAL to STACK
gen bug {LABEL, "STACKING DLOCAL"}


COERCIONS

/* The unstacking coercions emit many "addi sp, sp, X"
* instructions; the target optimizer (top) will merge them.
*/
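/*
* Illustration, not captured ncg output. Here r3 and r4 stand in
* for whatever registers ncg picks. Two STACK->REG coercions in a
* row emit
*	lwz r3, 0(sp)
*	addi sp, sp, 4
*	lwz r4, 0(sp)
*	addi sp, sp, 4
* A single "addi sp, sp, 8" would also require the second load to
* become "lwz r4, 4(sp)". This table leaves that merging to top.
*/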

from STACK
uses REG
gen
COMMENT("coerce STACK->REG")
lwz %a, {IND_RC_W, sp, 0}
addi sp, sp, {C, 4}
yields %a

from STACK
uses FREG
gen
COMMENT("coerce STACK->FREG")
lfd %a, {IND_RC_D, sp, 0}
addi sp, sp, {C, 8}
yields %a

from STACK
uses FSREG
gen
COMMENT("coerce STACK->FSREG")
lfs %a, {IND_RC_W, sp, 0}
addi sp, sp, {C, 4}
yields %a

/* "uses REG=%1" may find and reuse a register containing the
* same token. For contrast, "uses REG gen move %1, %a" would
* pick a different register before doing the move.
*
* "reusing %1" helps when coercing an INT_W token like
* {SUM_RC, r3, 0-4} to REG3, by not stacking the token.
*/
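/*
* Illustration, hedged and not captured ncg output. When ncg
* coerces {SUM_RC, r3, 0-4} to REG3 with "reusing %1", it may
* allocate r3 itself and emit only
*	addi r3, r3, -4
* Without "reusing %1", r3 would still be busy holding the
* token's register, so ncg would first have to stack the token.
*/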

from INT_W
uses reusing %1, REG=%1
yields %a

from FLOAT_D
uses reusing %1, FREG=%1
yields %a

from FLOAT_W
uses reusing %1, FSREG=%1
yields %a

/* Splitting coercions can't allocate registers.
* PowerPC can't add r0 + constant: addi, like lwz, reads a base
* register of r0 as the value 0. Use r12.
*/

from IND_RC_D %off<=0x7FFA
yields
{IND_RC_W, %1.reg, %1.off+4}
{IND_RC_W, %1.reg, %1.off}

from IND_RC_D
/* Don't move to %1.reg; it might be a regvar. */
gen move {SUM_RC, %1.reg, %1.off}, r12
yields {IND_RC_W, r12, 4} {IND_RC_W, r12, 0}
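/*
* Worked example. A double at 8(r3) is {IND_RC_D, r3, 8}. Its
* offset is at most 0x7FFA, so the first rule splits it with no
* code into {IND_RC_W, r3, 12} {IND_RC_W, r3, 8}. The last token,
* the word at the lower address, ends on top of the stack.
* An offset of 0x7FFC exceeds that limit (0x7FFC + 4 = 0x8000 no
* longer fits a signed 16-bit displacement), so the second rule
* applies. It emits something like
*	addi r12, r3, 0x7FFC
* and yields {IND_RC_W, r12, 4} {IND_RC_W, r12, 0}.
*/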

from IND_RR_D
gen move {SUM_RR, %1.reg1, %1.reg2}, r12
yields {IND_RC_W, r12, 4} {IND_RC_W, r12, 0}

from FRAME_D %off<=0x7FFA
yields
{FRAME_W, %1.level, %1.reg, %1.off+4, 4}
{FRAME_W, %1.level, %1.reg, %1.off, 4}


PATTERNS

/* Constants */

pat loc $1==(0-0x8000) /* Load constant */
yields {CONST_N8000, $1}
pat loc $1>=(0-0x7FFF) && $1<=(0-1)
yields {CONST_N7FFF_N0001, $1}
pat loc $1>=0 && $1<=0x7FFF
yields {CONST_0000_7FFF, $1}
pat loc $1==0x8000
yields {CONST_8000, $1}
pat loc $1>=0x8001 && $1<=0xFFFF
yields {CONST_8001_FFFF, $1}
pat loc lo($1)==0
yields {CONST_HI_ZR, $1}
pat loc
yields {CONST_HI_LO, $1}
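/*
* Examples of the token each loc gets. These try the rules above
* in table order and assume lo() is the low 16 bits:
*	loc -32768	{CONST_N8000, -32768}
*	loc -1		{CONST_N7FFF_N0001, -1}
*	loc 0		{CONST_0000_7FFF, 0}, before lo($1)==0 is tried
*	loc 32768	{CONST_8000, 32768}
*	loc 65535	{CONST_8001_FFFF, 65535}
*	loc 0x30000	{CONST_HI_ZR, 0x30000}
*	loc 0x12345	{CONST_HI_LO, 0x12345}
*/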


/* Stack shuffles */

/* The peephole optimizer does: loc $1 ass 4 -> asp $1
* To optimize multiplication, it uses: dup 8 asp 4
*/
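/*
* Example. With words a b on the stack (b on top), dup 8 gives
* a b a b, and asp 4 then drops the top b, leaving a b a. The pair
* therefore copies the second word to the top. When a and b are
* already tokens in registers, the dup 8 and asp $1==4 rules below
* may do this on the fake stack without emitting code.
*/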

pat asp $1==4 /* Adjust stack by constant */
with exact INT_W+FLOAT_W
/* drop %1 */
with STACK
gen addi sp, sp, {C, 4}
pat asp smalls($1)
with STACK
gen addi sp, sp, {C, $1}
pat asp lo($1)==0
with STACK
gen addis sp, sp, {C, hi($1)}
pat asp
with STACK
gen
addis sp, sp, {C, his($1)}
addi sp, sp, {C, los($1)}
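/*
* Worked example for the last rule. It assumes los() is the low
* half sign-extended and his() the high half adjusted so that
* his($1) * 0x10000 + los($1) == $1:
*	asp 70000 (0x11170): his = 1, los = 0x1170 = 4464
*		addis sp, sp, 1
*		addi sp, sp, 4464
*	asp 98304 (0x18000): los = -32768, so his = 1 + 1 = 2
*		addis sp, sp, 2
*		addi sp, sp, -32768
* In both cases 65536 * his + los gives back $1.
*/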

pat ass $1==4 /* Adjust stack by variable */
with REG STACK
gen add sp, sp, %1

/* To duplicate a token, we coerce the token into a register,
* then duplicate the register. This decreases code size.
*/

pat dup $1==4 /* Duplicate word on top of stack */
with REG+FSREG
yields %1 %1

pat dup $1==8 /* Duplicate double-word */
with REG+FSREG REG+FSREG
yields %2 %1 %2 %1
with FREG
yields %1 %1

pat dup /* Duplicate other size */
leaving
loc $1
dus 4

pat dus $1==4 /* Duplicate variable size */
with REG STACK
/* ( a size%1 -- a a ) */
uses REG, REG
gen
srwi %a, %1, {C, 2}
mtspr ctr, %a
add %b, sp, %1
1: lwzu %a, {IND_RC_W, %b, 0-4}
stwu %a, {IND_RC_W, sp, 0-4}
bdnz {LABEL, "1b"}
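/*
* Trace of the loop above for size%1 == 8. On entry, w0 is at
* 0(sp) and w1 at 4(sp), and S is the entry value of sp:
*	%a = 8 >> 2 = 2, so ctr = 2, and %b = S + 8
*	pass 1: %b = S + 4, load w1; sp = S - 4, store w1 there
*	pass 2: %b = S,     load w0; sp = S - 8, store w0 there
* From S - 8 upward the stack now reads w0 w1 w0 w1. This
* assumes size%1 is a nonzero multiple of 4. With 0, bdnz would
* first decrement ctr to 0xFFFFFFFF and keep looping.
*/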

pat exg $1==4 /* Exchange top two words */
with INT_W+FLOAT_W INT_W+FLOAT_W
yields %1 %2
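/*
* Reading the rule above: in "with", %1 is the top of the stack,
* and "yields" pushes its tokens left to right, so the last one
* ends on top. With b on top of a, %1 = b and %2 = a, and
* "yields %1 %2" leaves b under a, which is the exchange.
*/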

pat exg defined($1) /* Exchange other size */
leaving
loc $1
cal ".exg"

pat exg !defined($1)
leaving
cal ".exg"


/* Type conversions */

pat loc loc ciu /* signed -> unsigned */
leaving
loc $1
loc $2
cuu

pat loc loc cui /* unsigned -> signed */
leaving
loc $1
loc $2
cuu

pat loc loc cuu $1<=4 && $2<=4 /* unsigned -> unsigned */
/* nop */

pat loc loc cii $1<=4 && $2<=$1
/* signed -> signed of smaller or same size,
* no sign extension */

pat loc loc cii $1==1 && $2<=4 /* sign-extend char */
with REG
yields {SEX_B, %1}

pat loc loc cii $1==2 && $2<=4 /* sign-extend short */
with REG
yields {SEX_H, %1}
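/*
* Example of the sign extension. This assumes a SEX_B or SEX_H
* token becomes extsb or extsh when moved into a register. After
* loc 1 loc 4 cii, a word whose low byte is 0x80 becomes
* 0xFFFFFF80 (-128). After loc 2 loc 4 cii, a low half-word of
* 0x7FFF becomes 0x00007FFF (32767). Only the low 1 or 2 bytes
* of the input word matter.
*/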


/* Local variables */

pat lal smalls($1) /* Load address of local */
yields {SUM_RC, fp, $1}

pat lal /* Load address of local */
uses REG={SUM_RIS, fp, his($1)}
yields {SUM_RC, %a, los($1)}
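/*
* Worked example for a large offset (arithmetic only). It assumes
* los() is the low half sign-extended, his() the matching high
* half, and that {SUM_RIS, fp, h} stands for fp + h * 0x10000,
* as addis computes:
*	lal -70000, where -70000 = 0xFFFEEE90
*	lo = 0xEE90 >= 0x8000, so los = 0xEE90 - 0x10000 = -4464
*	hi = 0xFFFE, so his = 0xFFFE + 1 = 0xFFFF, i.e. -1
*	%a = fp - 65536, and {SUM_RC, %a, -4464} is fp - 70000
*/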

pat lal loi smalls($1) && $2==1 /* Load byte from local */
yields {FRAME_B, 0, fp, $1, 1}

/* Load half-word from local and sign-extend */
pat lal loi loc loc cii smalls($1) && $2==2 && $3==2 && $4==4
yields {FRAME_H_S, 0, fp, $1, 1}
|
|
|
|
|
|
|
|
pat lal loi smalls($1) && $2==2 /* Load half-word from local */
|
|
|
|
yields {FRAME_H, 0, fp, $1, 1}
|
|
|
|
|
Add floating-point register variables to PowerPC ncg.
Use f14 to f31 as register variables for 8-byte double-precision.
There are no regvars for 4-byte single precision, because all
regvar(reg_float) must have the same size. I expect more programs to
prefer 8-byte double precision.
Teach mach/powerpc/ncg/mach.c to emit stfd and lfd instructions to
save and restore 8-byte regvars. Delay emitting the function prolog
until f_regsave(), so we can use one addi to make stack space for both
local vars and saved registers. Be more careful with types in mach.c;
don't assume that int and long and full are the same.
In ncg table, add f14 to f31 as register variables, and some rules to
use them. Add rules to put the result of fadd, fsub, fmul, fdiv, fneg
in a regvar. Without such rules, the result would go in a scratch
FREG, and we would need fmr to move it to the regvar. Also add a rule
for pat sdl inreg($1)==reg_float with STACK, so we can unstack the
value directly into the regvar, again without a scratch FREG and fmr.
Edit util/ego/descr/powerpc.descr to tell ego about the new float
regvars. This might not be working right; ego usually decides against
using any float regvars, so ack -O1 (not running ego) uses the
regvars, but ack -O4 (running ego) doesn't use the regvars.
Beware that ack -mosxppc runs ego using powerpc.descr but -mlinuxppc
and -mqemuppc run ego without a config file (since 8ef7c31). I am
testing powerpc.descr with a local edit to plat/linuxppc/descr to run
ego with powerpc.descr there, but I did not commit my local edit.
2017-02-16 00:34:07 +00:00
/* Load word from local */
2017-10-14 16:40:04 +00:00
pat lol inreg($1)==reg_any || inreg($1)==reg_float
2007-11-02 18:56:58 +00:00
	yields {LOCAL, $1}
2017-12-22 22:04:16 +00:00
pat lol smalls($1)
	yields {FRAME_W, 0, fp, $1, 4}

2017-02-16 00:34:07 +00:00
pat lol
2007-11-02 18:56:58 +00:00
	leaving
		lal $1
2017-02-16 00:34:07 +00:00
		loi 4
2007-11-02 18:56:58 +00:00

2017-12-22 22:04:16 +00:00
pat ldl inreg($1)==reg_float	/* Load double-word from local */
2017-02-16 00:34:07 +00:00
	yields {DLOCAL, $1}
2017-12-22 22:04:16 +00:00
pat ldl smalls($1) && smalls($1+4)
	/* smalls($1+4) implies FRAME_D %off<=0xFFFA */
	yields {FRAME_D, 0, fp, $1, 8}

2017-02-16 00:34:07 +00:00
pat ldl
2007-11-02 18:56:58 +00:00
	leaving
		lal $1
2017-02-16 00:34:07 +00:00
		loi 8
2016-12-09 21:36:42 +00:00

2017-12-22 22:04:16 +00:00
pat lal sti smalls($1) && $2==1	/* Store byte to local */
	with REG
		kills IND_V, FRAME_V %level==0 && fover($1, 1)
		gen move %1, {FRAME_B, 0, fp, $1, 1}

pat lal sti smalls($1) && $2==2	/* Store half-word to local */
	with REG
		kills IND_V, FRAME_V %level==0 && fover($1, 2)
		gen move %1, {FRAME_H, 0, fp, $1, 2}

pat stl inreg($1)==reg_any	/* Store word to local */
2017-12-14 21:26:19 +00:00
	with exact INT_W
2017-10-17 18:15:33 +00:00
		/* ncg fails to infer that regvar($1) is dead! */
		kills regvar($1)
Fix lim. Improve lxl, lxa, lor, str, procs with no locals.
_lim_ must use _loe_ (load word external), not _lde_ (load double-word
external).
The new patterns for _lxl_, _lxa_, _lor_, _str_ emit shorter code in
some cases. The change from GPR_EXPR to REG_EXPR allows moving
LXFRAME to a register variable.
Add more "reusing" clauses. We have enough registers that ncg almost
never reuses a register, but sometimes it can reuse r3.
In mach.c, emit one fewer instruction in procedures with no locals.
2018-01-05 01:40:35 +00:00
		gen move %1, {REG_EXPR, regvar($1)}
2017-10-17 18:15:33 +00:00
	with STACK
		gen
			lwz {LOCAL, $1}, {IND_RC_W, sp, 0}
2017-12-08 00:24:09 +00:00
			addi sp, sp, {C, 4}
2017-10-14 16:40:04 +00:00
pat stl inreg($1)==reg_float
2017-12-22 22:04:16 +00:00
	with exact FLOAT_W
2017-10-17 18:15:33 +00:00
		kills regvar_w($1, reg_float)
		gen move %1, {FSREG_EXPR, regvar_w($1, reg_float)}
2017-10-14 16:40:04 +00:00
	with STACK
		gen
			lfs {LOCAL, $1}, {IND_RC_W, sp, 0}
2017-12-08 00:24:09 +00:00
			addi sp, sp, {C, 4}
2017-12-22 22:04:16 +00:00
pat stl smalls($1)
	with REG+FSREG
		kills IND_V, FRAME_V %level==0 && fover($1, 4)
		gen move %1, {FRAME_W, 0, fp, $1, 4}

2017-02-16 00:34:07 +00:00
pat stl
2007-11-02 18:56:58 +00:00
	leaving
		lal $1
2017-02-16 00:34:07 +00:00
		sti 4
2016-12-09 21:36:42 +00:00

2017-12-22 22:04:16 +00:00
pat sdl inreg($1)==reg_float	/* Store double-word to local */
	with exact FLOAT_D
2017-10-17 18:15:33 +00:00
		kills regvar_d($1, reg_float)
2018-01-05 01:40:35 +00:00
		gen move %1, {FREG_EXPR, regvar_d($1, reg_float)}
2017-02-16 00:34:07 +00:00
	with STACK
		gen
			lfd {DLOCAL, $1}, {IND_RC_D, sp, 0}
2017-12-08 00:24:09 +00:00
			addi sp, sp, {C, 8}
2017-12-22 22:04:16 +00:00
pat sdl smalls($1) && smalls($1+4)
	with REG REG
		kills IND_V, FRAME_V %level==0 && fover($1, 8)
		gen
			move %1, {FRAME_W, 0, fp, $1, 4}
			move %2, {FRAME_W, 0, fp, $1+4, 4}
	with FREG
		kills IND_V, FRAME_V %level==0 && fover($1, 8)
		gen move %1, {FRAME_D, 0, fp, $1, 8}

2017-02-16 00:34:07 +00:00
pat sdl
2007-11-02 18:56:58 +00:00
	leaving
		lal $1
2017-02-16 00:34:07 +00:00
		sti 8
2016-12-09 21:36:42 +00:00

2017-12-23 00:57:42 +00:00
pat lil	/* Load indirect from local */
2007-11-02 18:56:58 +00:00
	leaving
		lol $1
2017-02-16 00:34:07 +00:00
		loi 4
2016-12-09 21:36:42 +00:00

2017-12-22 22:04:16 +00:00
pat sil	/* Store indirect to local */
2007-11-02 18:56:58 +00:00
	leaving
		lol $1
2017-02-16 00:34:07 +00:00
		sti 4
2016-09-30 15:50:50 +00:00

2017-12-22 22:04:16 +00:00
pat zrl	/* Zero local */
2007-11-02 18:56:58 +00:00
	leaving
		loc 0
		stl $1
2016-12-09 21:36:42 +00:00

2017-12-22 22:04:16 +00:00
pat inl	/* Increment local */
2007-11-02 18:56:58 +00:00
	leaving
		lol $1
		loc 1
		adi 4
		stl $1
2016-12-09 21:36:42 +00:00

2017-12-22 22:04:16 +00:00
pat del	/* Decrement local */
2007-11-02 18:56:58 +00:00
	leaving
		lol $1
		loc 1
		sbi 4
		stl $1

2017-12-22 22:04:16 +00:00
/* Local variables of procedures on static chain */

/* lxa (lexical argument base) -> lxl (lexical local base) */
pat lxa adp nicelx($1)
	leaving lxl $1 adp $2+EM_BSIZE
pat lxa lof nicelx($1)
	leaving lxl $1 lof $2+EM_BSIZE
pat lxa ldf nicelx($1)
	leaving lxl $1 ldf $2+EM_BSIZE
pat lxa stf nicelx($1)
	leaving lxl $1 stf $2+EM_BSIZE
pat lxa sdf nicelx($1)
	leaving lxl $1 sdf $2+EM_BSIZE
2018-01-05 01:40:35 +00:00
pat lxa nicelx($1)
2017-12-22 22:04:16 +00:00
	leaving lxl $1 adp EM_BSIZE

/* Load locals in statically enclosing procedures */
pat lxl adp loi nicelx($1) && smalls($2) && $3==1
	uses REG={LXFRAME, $1}
	yields {FRAME_B, $1, %a, $2, 1}
pat lxl adp loi loc loc cii nicelx($1) && smalls($2) &&
    $3==2 && $4==2 && $5==4
	uses REG={LXFRAME, $1}
	yields {FRAME_H_S, $1, %a, $2, 2}
pat lxl adp loi nicelx($1) && smalls($2) && $3==2
	uses REG={LXFRAME, $1}
	yields {FRAME_H, $1, %a, $2, 2}
pat lxl lof nicelx($1) && smalls($2)
	uses REG={LXFRAME, $1}
	yields {FRAME_W, $1, %a, $2, 4}
pat lxl ldf nicelx($1) && smalls($2) && smalls($2+4)
	uses REG={LXFRAME, $1}
	/* smalls($2+4) implies FRAME_D %off<=0xFFFA */
	yields {FRAME_D, $1, %a, $2, 8}

/* Store locals in statically enclosing procedures */
pat lxl adp sti nicelx($1) && smalls($2) && $3==1
	with REG
		kills IND_V, FRAME_V %level==$1 && fover($2, 1)
		uses REG={LXFRAME, $1}
		gen move %1, {FRAME_B, $1, %a, $2, 1}
pat lxl adp sti nicelx($1) && smalls($2) && $3==2
	with REG
		kills IND_V, FRAME_V %level==$1 && fover($2, 2)
		uses REG={LXFRAME, $1}
		gen move %1, {FRAME_H, $1, %a, $2, 2}
pat lxl stf nicelx($1) && smalls($2)
	with REG+FSREG
		kills IND_V, FRAME_V %level==$1 && fover($2, 4)
		uses REG={LXFRAME, $1}
		gen move %1, {FRAME_W, $1, %a, $2, 4}
pat lxl sdf nicelx($1) && smalls($2) && smalls($2+4)
	with REG REG
		kills IND_V, FRAME_V %level==$1 && fover($2, 8)
		uses REG={LXFRAME, $1}
		gen
			move %1, {FRAME_W, $1, %a, $2, 4}
			move %2, {FRAME_W, $1, %a, $2+4, 4}
	with FREG
		kills IND_V, FRAME_V %level==$1 && fover($2, 8)
		uses REG={LXFRAME, $1}
		gen move %1, {FRAME_D, $1, %a, $2, 8}

2018-01-23 23:18:40 +00:00
pat lxl nicelx($1)	/* Local base on static chain */
2017-12-22 22:04:16 +00:00
	uses REG={LXFRAME, $1}
	yields %a	/* Can't yield LXFRAME. */
2018-01-05 01:40:35 +00:00
pat lxl stl nicelx($1) && inreg($2)==reg_any
	kills regvar($2)
	gen move {LXFRAME, $1}, {REG_EXPR, regvar($2)}
2017-12-22 22:04:16 +00:00

2018-01-05 01:40:35 +00:00
pat lxl $1==0	/* Our local base */
2018-01-23 23:18:40 +00:00
	yields fp
2018-01-05 01:40:35 +00:00

pat lxa $1==0	/* Our argument base */
	yields {SUM_RC, fp, EM_BSIZE}
2017-12-22 22:04:16 +00:00

2007-11-02 18:56:58 +00:00
/* Global variables */
2016-12-09 21:36:42 +00:00

2017-12-22 22:04:16 +00:00
pat lpi	/* Load address of function */
2007-11-02 18:56:58 +00:00
	leaving
		lae $1
2016-12-09 21:36:42 +00:00

2007-11-02 18:56:58 +00:00
pat lae	/* Load address of external */
In PowerPC ncg, allocate register for ha16[label].
Use it to generate code like
lis r12,ha16[__II0]
lis r11,ha16[_f]
lfs f1,lo16[_f](r11)
lfs f2,lo16[__II0](r12)
fadds f13,f2,f1
stfs f13,lo16[_f](r11)
Here ncg has allocated r11 for ha16[_f]. We use r11 in lfs and again
in stfs. Before this change, we needed an extra lis before stfs,
because ncg did not remember that ha16[_f] was in a register.
This example has a gap between ha16[__II0] and lo16[__II0], because
the lo16 is not in the next instruction. This requires my previous
commit 1bf58cf for RELOLIS. There is a gap because ncg emits the lis
as soon as I allocate it. The "lfs f2,lo16[__II0](r12)" happens in a
coercion from IND_RL_W to FSREG. The coercion allocates one FSREG but
may not allocate any other registers. So I must allocate r12 earlier.
I allocate r12 in pat lae, but this causes a gap.
2017-02-08 17:23:06 +00:00
|
|
|
uses REG={LABEL_HA, $1}
|
|
|
|
yields {SUM_RL, %a, $1}
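A register holding ha16[label] can be reused by several loads and stores, because it depends only on the label. The lo16 part is applied separately as each instruction's displacement. ha16 needs an adjustment of 0x8000 because that displacement is sign-extended.

Here is a minimal C sketch of the arithmetic. It assumes the standard PowerPC behaviour: `lis` shifts its immediate left by 16 bits, and the D-form displacement is a sign-extended 16-bit value. The function names are illustrative only; they are not part of ncg or the assembler.

```c
#include <stdint.h>

/* lo16: low 16 bits; the load/store displacement sign-extends them. */
static uint16_t lo16(uint32_t addr) { return (uint16_t)(addr & 0xffff); }

/* ha16: high 16 bits, adjusted by 0x8000 so that adding the
 * sign-extended lo16 gives back the full address. */
static uint16_t ha16(uint32_t addr) { return (uint16_t)((addr + 0x8000) >> 16); }

/* Effective address of "lis rX,ha16[a]" then "lfs f,lo16[a](rX)". */
static uint32_t effective_address(uint16_t ha, uint16_t lo)
{
    uint32_t rx = (uint32_t)ha << 16;  /* lis: immediate shifted left 16 */
    int32_t disp = (int16_t)lo;        /* displacement is sign-extended */
    return rx + (uint32_t)disp;        /* wraps modulo 2^32 like the CPU */
}
```

For 0x12348000, lo16 is 0x8000, which sign-extends to -0x8000. ha16 is therefore 0x1235 rather than 0x1234.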
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat loe /* Load word external */
|
|
|
|
leaving
|
|
|
|
lae $1
|
2017-12-23 02:18:58 +00:00
|
|
|
loi 4
|
2007-11-02 18:56:58 +00:00
|
|
|
|
|
|
|
pat ste /* Store word external */
|
|
|
|
leaving
|
|
|
|
lae $1
|
2017-12-23 02:18:58 +00:00
|
|
|
sti 4
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat lde /* Load double-word external */
|
|
|
|
leaving
|
|
|
|
lae $1
|
2017-12-23 02:18:58 +00:00
|
|
|
loi 8
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat sde /* Store double-word external */
|
|
|
|
leaving
|
|
|
|
lae $1
|
2017-12-23 02:18:58 +00:00
|
|
|
sti 8
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat zre /* Zero external */
|
2007-11-02 18:56:58 +00:00
|
|
|
leaving
|
|
|
|
loc 0
|
|
|
|
ste $1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat ine /* Increment external */
|
2017-02-08 17:23:06 +00:00
|
|
|
leaving
|
|
|
|
loe $1
|
|
|
|
inc
|
|
|
|
ste $1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat dee /* Decrement external */
|
2017-02-08 17:23:06 +00:00
|
|
|
leaving
|
|
|
|
loe $1
|
|
|
|
dec
|
|
|
|
ste $1
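The external-variable patterns above mostly rewrite one EM instruction into others with `leaving`. For example, `ine` becomes `loe; inc; ste`, and those in turn become `lae` plus `loi 4` or `sti 4`.

The toy expander below models only that rewriting. It is a hypothetical illustration that ignores ncg's token stack and register allocation. In ncg, `lae` matches pat lae and yields a SUM_RL token instead of being printed as text.

```c
#include <stdio.h>
#include <string.h>

/* One "leaving" rule: an EM op and its replacement ops.
 * "$1" stands for the original instruction's argument. */
struct rule { const char *op; const char *expansion[4]; };

static const struct rule rules[] = {
    { "loe", { "lae $1", "loi 4", NULL } },
    { "ste", { "lae $1", "sti 4", NULL } },
    { "lde", { "lae $1", "loi 8", NULL } },
    { "sde", { "lae $1", "sti 8", NULL } },
    { "zre", { "loc 0", "ste $1", NULL } },
    { "ine", { "loe $1", "inc", "ste $1", NULL } },
    { "dee", { "loe $1", "dec", "ste $1", NULL } },
};

/* Recursively expand op/arg, appending terminal ops to out as
 * "op arg; op arg; ...". Ops without a rule are terminals here. */
static void expand(const char *op, const char *arg, char *out, size_t outsz)
{
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(rules[i].op, op) != 0)
            continue;
        for (int j = 0; rules[i].expansion[j] != NULL; j++) {
            const char *e = rules[i].expansion[j];
            const char *space = strchr(e, ' ');
            char sub_op[8] = "";
            char sub_arg[64] = "";
            size_t n = space ? (size_t)(space - e) : strlen(e);
            if (n >= sizeof sub_op)
                n = sizeof sub_op - 1;
            memcpy(sub_op, e, n);
            sub_op[n] = '\0';
            if (space != NULL) {
                /* substitute $1 with the caller's argument */
                const char *a = strcmp(space + 1, "$1") == 0 ? arg : space + 1;
                snprintf(sub_arg, sizeof sub_arg, "%s", a);
            }
            expand(sub_op, sub_arg, out, outsz);
        }
        return;
    }
    size_t len = strlen(out);
    if (len < outsz)
        snprintf(out + len, outsz - len, "%s%s%s%s",
                 len > 0 ? "; " : "", op, arg[0] != '\0' ? " " : "", arg);
}
```

Expanding `ine _x` into an empty buffer gives `lae _x; loi 4; inc; lae _x; sti 4`.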
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
|
|
|
|
/* Structures */
|
|
|
|
|
|
|
|
pat lof /* Load word offsetted */
|
|
|
|
leaving
|
|
|
|
adp $1
|
2017-12-23 02:18:58 +00:00
|
|
|
loi 4
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat ldf /* Load double-word offsetted */
|
|
|
|
leaving
|
|
|
|
adp $1
|
2017-12-23 02:18:58 +00:00
|
|
|
loi 8
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat stf /* Store word offsetted */
|
|
|
|
leaving
|
|
|
|
adp $1
|
2017-12-23 02:18:58 +00:00
|
|
|
sti 4
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat sdf /* Store double-word offsetted */
|
|
|
|
leaving
|
|
|
|
adp $1
|
2017-12-23 02:18:58 +00:00
|
|
|
sti 8
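Each offsetted pattern is an `adp` (add a constant to the pointer) followed by an indirect load or store. That is the same thing C does when it reads a struct field through a pointer.

Below is a minimal sketch for the word-sized cases. The functions `lof` and `stf` here are hypothetical C models, not ncg code, and `struct point` is only an example record.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of "lof n": add n to the address (adp n), then load a word (loi 4). */
static int32_t lof(const void *base, ptrdiff_t n)
{
    const unsigned char *p = (const unsigned char *)base + n; /* adp n */
    int32_t word;
    memcpy(&word, p, sizeof word);                            /* loi 4 */
    return word;
}

/* Model of "stf n": add n to the address (adp n), then store a word (sti 4). */
static void stf(void *base, ptrdiff_t n, int32_t value)
{
    unsigned char *p = (unsigned char *)base + n;             /* adp n */
    memcpy(p, &value, sizeof value);                          /* sti 4 */
}

struct point { int32_t x; int32_t y; };  /* example record */
```

With `struct point pt = {3, 7}`, `lof(&pt, offsetof(struct point, y))` reads 7.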
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
|
|
|
|
/* Loads and stores */
|
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loi $1==1 /* Load byte indirect */
|
Trimming mach/powerpc/ncg/table
Remove coercion from LABEL to REG. The coercion never happens because
I have stopped putting LABEL on the stack. Also remove LABEL from set
ANY_BHW. Retain the move from LABEL to REG because pat gto uses it.
Remove li32 instruction, unused after the switch to the hi16, ha16,
lo16 syntax.
Remove COMMENT(...) lines from most moves. In my opinion, they took
too much space, both in the table and in the assembly output. The
stacking rules and coercions keep their COMMENT(...) lines.
In test GPR, don't write to RSCRATCH.
Fold several coercions into a single coercion from ANY_BHW uses REG.
Use REG instead of GPR in stack patterns. REG and GPR act the same,
because every GPR on the stack is a REG, but I want to be clear that I
expect a REG, not r0.
In code rules, sort SUM_RC before SUM_RR, so I can add SUM_RL later.
Remove rules to optimize loc loc cii loc loc cii. If $2==$4, the
peephole optimizer can optimize it. If $2!=$4, then the EM program is
missing a conversion from size $2 to size $4.
Remove rules to store a SEX_B with sti 1 or a SEX_H with sti 2. These
rules would never get used, unless the EM program is missing a
conversion from size 4 to size 1 or 2.
2017-02-08 17:27:16 +00:00
|
|
|
with REG
|
2016-10-16 22:13:39 +00:00
|
|
|
yields {IND_RC_B, %1, 0}
|
2017-02-08 17:27:16 +00:00
|
|
|
with exact SUM_RC
|
2016-10-16 22:13:39 +00:00
|
|
|
yields {IND_RC_B, %1.reg, %1.off}
|
Use ha16/lo16 to load or store 1, 2, 8 bytes from labels.
Add the tokens IND_RL_B, IND_RL_H, IND_RL_H_S, IND_RL_D, along with
the rules to use them. These rules emit shorter code. For example,
loading a byte becomes lis, lbz instead of lis, addi, lbz.
While making this, I wrongly set IND_RL_D to size 4. Then ncg recursed
infinitely in codegen() and stackupto() until it crashed with a stack
overflow. I correctly set IND_RL_D to size 8, which prevents the
crash.
2017-02-08 17:31:14 +00:00
|
|
|
with exact SUM_RL
|
|
|
|
yields {IND_RL_B, %1.reg, %1.adr}
|
2017-02-08 17:27:16 +00:00
|
|
|
with exact SUM_RR
|
|
|
|
yields {IND_RR_B, %1.reg1, %1.reg2}
|
2016-10-16 22:13:39 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
/* Load half-word indirect and sign-extend */
|
|
|
|
pat loi loc loc cii $1==2 && $2==2 && $3==4
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG
|
2016-10-16 22:13:39 +00:00
|
|
|
yields {IND_RC_H_S, %1, 0}
|
2017-02-08 17:27:16 +00:00
|
|
|
with exact SUM_RC
|
2016-10-16 22:13:39 +00:00
|
|
|
yields {IND_RC_H_S, %1.reg, %1.off}
|
2017-02-08 17:31:14 +00:00
|
|
|
with exact SUM_RL
|
|
|
|
yields {IND_RL_H_S, %1.reg, %1.adr}
|
2017-02-08 17:27:16 +00:00
|
|
|
with exact SUM_RR
|
|
|
|
yields {IND_RR_H_S, %1.reg1, %1.reg2}
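The sequence `loi 2; loc 2; loc 4; cii` means "load a half-word, then sign-extend it from 2 to 4 bytes". A plain `loi 2` zero-extends instead. On PowerPC the two cases correspond to `lha` (load half-word algebraic) and `lhz` (load half-word and zero).

The mapping from IND_RC_H_S and IND_RC_H to those instructions is an assumption based on the token names. The C functions below are hypothetical models of the two results.

```c
#include <stdint.h>
#include <string.h>

/* loi 2: load a half-word and zero-extend it (presumably lhz). */
static uint32_t load_half_zero(const void *p)
{
    uint16_t h;
    memcpy(&h, p, sizeof h);
    return h;
}

/* loi 2 then cii 2 -> 4: load a half-word and sign-extend it
 * (presumably lha, which does both in one instruction). */
static int32_t load_half_signed(const void *p)
{
    int16_t h;
    memcpy(&h, p, sizeof h);
    return h;  /* int16_t -> int32_t conversion sign-extends */
}
```

The half-word 0xfffe loads as 0xfffe when zero-extended, but as -2 when sign-extended.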
|
2016-10-16 22:13:39 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loi $1==2 /* Load half-word indirect */
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG
|
2016-10-16 22:13:39 +00:00
|
|
|
yields {IND_RC_H, %1, 0}
|
2017-02-08 17:27:16 +00:00
|
|
|
with exact SUM_RC
|
2016-10-16 22:13:39 +00:00
|
|
|
yields {IND_RC_H, %1.reg, %1.off}
|
2017-02-08 17:31:14 +00:00
|
|
|
with exact SUM_RL
|
|
|
|
yields {IND_RL_H, %1.reg, %1.adr}
|
2017-02-08 17:27:16 +00:00
|
|
|
with exact SUM_RR
|
|
|
|
yields {IND_RR_H, %1.reg1, %1.reg2}
|
2016-10-16 22:13:39 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loi $1==4 /* Load word indirect */
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {IND_RC_W, %1, 0}
|
2017-02-02 15:48:25 +00:00
|
|
|
with exact SUM_RC
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {IND_RC_W, %1.reg, %1.off}
|
2017-02-08 17:23:06 +00:00
|
|
|
with exact SUM_RL
|
|
|
|
yields {IND_RL_W, %1.reg, %1.adr}
|
2017-02-02 15:48:25 +00:00
|
|
|
with exact SUM_RR
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {IND_RR_W, %1.reg1, %1.reg2}
|
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loi $1==8 /* Load double-word indirect */
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {IND_RC_D, %1, 0}
|
2017-02-08 17:27:16 +00:00
|
|
|
with exact SUM_RC
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {IND_RC_D, %1.reg, %1.off}
|
2017-02-08 17:31:14 +00:00
|
|
|
with exact SUM_RL
|
|
|
|
yields {IND_RL_D, %1.reg, %1.adr}
|
2017-02-08 17:27:16 +00:00
|
|
|
with exact SUM_RR
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {IND_RR_D, %1.reg1, %1.reg2}
|
|
|
|
|
|
|
|
pat loi /* Load arbitrary size */
|
|
|
|
leaving
|
|
|
|
loc $1
|
Use .los4 in lar 4 and .sts4 in sar 4.
Our libem had two implementations of loading a block from a stack, one
for lar 4 and one for los 4. Now lar 4 and los 4 share the code in
.los4. Likewise, sar 4 and sts 4 share the code in .sts4.
Rename .los to .los4 and .sts to .sts4, because they implement los 4
and sts 4. Remove the special case for loading or storing 4 bytes,
because we can do it with 1 iteration of the loop. Remove the lines
to "align size" where the size must already be a multiple of 4.
Fix the upper bound check in .aar4.
Change .aar4, .lar4, .los4, .sar4, .sts4 to pass all operands on the
real stack, except that .los4 and .sts4 take the size in register r3.
Have .aar4 set r3 to the size of the array element. So lar 4 is just
.aar4 then .los4, and sar 4 is just .aar4 then .sts4.
ncg no longer calls .lar4 and .sar4 in libem, because it inlines the
code; but I keep .lar4 and .sar4 in libem, because mcg references
them. They might or might not work in mcg.
2017-02-13 20:22:00 +00:00
|
|
|
los 4
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-13 20:22:00 +00:00
|
|
|
pat los $1==4 /* Load arbitrary size */
|
2017-02-18 00:32:27 +00:00
|
|
|
with REG3 STACK
|
2007-11-02 18:56:58 +00:00
|
|
|
kills ALL
|
2017-12-23 02:18:58 +00:00
|
|
|
gen bl {LABEL, ".los4"}
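Sizes other than 1, 2, 4 and 8 become `loc n; los 4`, and `los 4` calls `.los4` in libem with the size in r3. The real `.los4` is PowerPC assembly.

What follows is only a C model under stated assumptions:
- The stack is a downward-growing stack of 4-byte words.
- The size is already a multiple of 4, as the commit message above says.
- The source block does not overlap the stack.

The name `los4_model` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of .los4: copy a block of `size` bytes from `src`
 * onto a downward-growing word stack and return the new stack pointer.
 * `size` must be a multiple of 4, so a 4-byte block needs one iteration. */
static uint32_t *los4_model(uint32_t *sp, const uint32_t *src, uint32_t size)
{
    uint32_t words = size / 4;
    sp -= words;                     /* make room for the block */
    for (uint32_t i = 0; i < words; i++)
        sp[i] = src[i];              /* block keeps its memory layout */
    return sp;
}
```

Because there is no special case for 4 bytes, a word-sized load takes one trip through the loop, which matches the simplification described in the commit message.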
|
2016-10-16 22:13:39 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat sti $1==1 /* Store byte indirect */
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG REG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RC_B, %1, 0}
|
|
|
|
with SUM_RC REG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RC_B, %1.reg, %1.off}
|
2017-02-08 17:31:14 +00:00
|
|
|
with SUM_RL REG
|
|
|
|
kills MEMORY
|
|
|
|
gen move %2, {IND_RL_B, %1.reg, %1.adr}
|
2017-02-08 17:27:16 +00:00
|
|
|
with SUM_RR REG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RR_B, %1.reg1, %1.reg2}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat sti $1==2 /* Store half-word indirect */
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG REG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RC_H, %1, 0}
|
|
|
|
with SUM_RC REG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RC_H, %1.reg, %1.off}
|
2017-02-08 17:31:14 +00:00
|
|
|
with SUM_RL REG
|
|
|
|
kills MEMORY
|
|
|
|
gen move %2, {IND_RL_H, %1.reg, %1.adr}
|
2017-02-08 17:27:16 +00:00
|
|
|
with SUM_RR REG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RR_H, %1.reg1, %1.reg2}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat sti $1==4 /* Store word indirect */
|
2017-02-02 15:48:25 +00:00
|
|
|
with REG REG+FSREG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RC_W, %1, 0}
|
|
|
|
with SUM_RC REG+FSREG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RC_W, %1.reg, %1.off}
|
In PowerPC ncg, allocate register for ha16[label].
Use it to generate code like
lis r12,ha16[__II0]
lis r11,ha16[_f]
lfs f1,lo16[_f](r11)
lfs f2,lo16[__II0](r12)
fadds f13,f2,f1
stfs f13,lo16[_f](r11)
Here ncg has allocated r11 for ha16[_f]. We use r11 in lfs and again
in stfs. Before this change, we needed an extra lis before stfs,
because ncg did not remember that ha16[_f] was in a register.
This example has a gap between ha16[__II0] and lo16[__II0], because
the lo16 is not in the next instruction. This requires my previous
commit 1bf58cf for RELOLIS. There is a gap because ncg emits the lis
as soon as I allocate it. The "lfs f2,lo16[__II0](r12)" happens in a
coercion from IND_RL_W to FSREG. The coercion allocates one FSREG but
may not allocate any other registers. So I must allocate r12 earlier.
I allocate r12 in pat lae, but this causes a gap.
2017-02-08 17:23:06 +00:00
|
|
|
with SUM_RL REG+FSREG
|
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RL_W, %1.reg, %1.adr}
|
|
|
|
with SUM_RR REG+FSREG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RR_W, %1.reg1, %1.reg2}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat sti $1==8 /* Store double-word indirect */
|
2016-10-18 00:31:59 +00:00
|
|
|
with REG FREG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RC_D, %1, 0}
|
Enable the Hall check again, and get powerpc to pass it.
Upon enabling the check, mach/powerpc/ncg/table fails to build as ncgg
gives many errors of "Previous rule impossible on empty stack". David
Given reported this problem in 2013:
https://sourceforge.net/p/tack/mailman/message/30814694/
Commit c93cb69 commented out the error in util/ncgg/cgg.y to disable
the Hall check. This commit enables it again. In ncgg, the Hall
check is checking that a rule is possible with an empty fake stack.
It would be possible if ncg can coerce the values from the real stack
to the fake stack. The powerpc table defined coercions from STACK to
{FS, %a} and {FD, %a}, but the Hall check didn't understand the
coercions and rejected each rule "with FS" or "with FD".
This commit removes the FS and FD tokens and adds a new group of FSREG
registers for single-precision floats, while keeping FREG registers
for double precision. The registers overlap, with each FSREG
containing one FREG, because it is the same register in PowerPC
hardware. FS tokens become FSREG registers and FD tokens become FREG
registers. The Hall check understands the coercions from STACK to
FSREG and FREG. The idea to define separate but overlapping registers
comes from the PDP-11 table (mach/pdp/ncg/table).
This commit also removes F0 from the FREG group. This is my attempt
to keep F0 off the fake stack, because one of the stacking rules uses
F0 as a scratch register (FSCRATCH).
2016-09-18 19:08:55 +00:00
|
|
|
with SUM_RC FREG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2017-02-08 17:27:16 +00:00
|
|
|
gen move %2, {IND_RC_D, %1.reg, %1.off}
|
2017-02-08 17:31:14 +00:00
|
|
|
with SUM_RL FREG
|
|
|
|
kills MEMORY
|
|
|
|
gen move %2, {IND_RL_D, %1.reg, %1.adr}
|
2017-02-08 17:27:16 +00:00
|
|
|
with SUM_RR FREG
|
|
|
|
kills MEMORY
|
|
|
|
gen move %2, {IND_RR_D, %1.reg1, %1.reg2}
|
2017-02-13 23:11:27 +00:00
|
|
|
with REG REG REG
|
2016-10-16 22:13:39 +00:00
|
|
|
kills MEMORY
|
2007-11-02 18:56:58 +00:00
|
|
|
gen
|
2017-02-13 23:11:27 +00:00
|
|
|
move %2, {IND_RC_W, %1, 0}
|
|
|
|
move %3, {IND_RC_W, %1, 4}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
|
|
|
pat sti /* Store arbitrary size */
|
|
|
|
leaving
|
|
|
|
loc $1
|
Use .los4 in lar 4 and .sts4 in sar 4.
Our libem had two implementations of loading a block from a stack, one
for lar 4 and one for los 4. Now lar 4 and los 4 share the code in
.los4. Likewise, sar 4 and sts 4 share the code in .sts4.
Rename .los to .los4 and .sts to .sts4, because they implement los 4
and sts 4. Remove the special case for loading or storing 4 bytes,
because we can do it with 1 iteration of the loop. Remove the lines
to "align size" where the size must already be a multiple of 4.
Fix the upper bound check in .aar4.
Change .aar4, .lar4, .los4, .sar4, .sts4 to pass all operands on the
real stack, except that .los4 and .sts4 take the size in register r3.
Have .aar4 set r3 to the size of the array element. So lar 4 is just
.aar4 then .los4, and sar 4 is just .aar4 then .sts4.
ncg no longer calls .lar4 and .sar4 in libem, because it inlines the
code; but I keep .lar4 and .sar4 in libem, because mcg references
them. They might or might not work in mcg.
2017-02-13 20:22:00 +00:00
|
|
|
sts 4
|
2016-10-16 22:13:39 +00:00
|
|
|
|
2017-02-13 20:22:00 +00:00
|
|
|
pat sts $1==4 /* Store arbitrary size */
|
2017-02-18 00:32:27 +00:00
|
|
|
with REG3 STACK
|
2007-11-02 18:56:58 +00:00
|
|
|
kills ALL
|
2017-12-23 02:18:58 +00:00
|
|
|
gen bl {LABEL, ".sts4"}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
|
|
|
|
/* Arithmetic wrappers */
|
|
|
|
|
|
|
|
pat ads $1==4 /* Add var to pointer */
|
|
|
|
leaving adi $1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat sbs $1==4 /* Subtract var from pointer */
|
|
|
|
leaving sbi $1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat adp /* Add constant to pointer */
|
|
|
|
leaving
|
|
|
|
loc $1
|
|
|
|
adi 4
|
|
|
|
|
|
|
|
pat adu /* Add unsigned */
|
|
|
|
leaving
|
|
|
|
adi $1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat sbu /* Subtract unsigned */
|
|
|
|
leaving
|
|
|
|
sbi $1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat inc /* Add 1 */
|
|
|
|
leaving
|
|
|
|
loc 1
|
|
|
|
adi 4
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat dec /* Subtract 1 */
|
|
|
|
leaving
|
|
|
|
loc 1
|
|
|
|
sbi 4
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-10 16:45:50 +00:00
|
|
|
pat mlu /* Multiply unsigned */
|
2007-11-02 18:56:58 +00:00
|
|
|
leaving
|
|
|
|
mli $1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-10 16:45:50 +00:00
|
|
|
pat slu /* Shift left unsigned */
|
2007-11-02 18:56:58 +00:00
|
|
|
leaving
|
|
|
|
sli $1
|
|
|
|
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
/* Word arithmetic */
|
|
|
|
|
2018-01-24 20:17:32 +00:00
|
|
|
/* Like most back ends, this one doesn't trap EIOVFL, so it
|
|
|
|
* ignores overflow in signed integers.
|
|
|
|
*/
|
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat adi $1==4 /* Add word (second + top) */
|
|
|
|
with REG REG
|
|
|
|
yields {SUM_RR, %1, %2}
|
Refactor how powerpc ncg pushes constants.
When loc (load constant) pushes a constant, it now checks the value of
the constant and pushes any of 7 tokens. These tokens allow stack
patterns to recognize 16-bit signed integers (CONST2), 16-bit unsigned
integers (UCONST2), multiples of 0x10000 (CONST_HZ), and other
interesting forms of constants.
Use the new constant tokens in the rules for adi, sbi, and, ior, xor.
Adjust a few other rules to understand the new tokens.
Require that SUM_RC has a signed 16-bit constant, and OR_RC and XOR_RC
each have an unsigned 16-bit constant. The moves from SUM_RC, OR_RC,
XOR_RC to GPR no longer touch the scratch register, because the
constant is not too big.
2016-10-16 17:58:54 +00:00
|
|
|
with CONST2 REG
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {SUM_RC, %2, %1.val}
|
2016-10-16 17:58:54 +00:00
|
|
|
with REG CONST2
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {SUM_RC, %1, %2.val}
|
2017-12-18 17:36:10 +00:00
|
|
|
with CONST_HI_ZR REG
|
Add more chances to put results in register variables.
When a rule `uses REG ... yields %a`, the result %a is always a
temporary, never a regvar. If the EM code uses _stl_ to put the
result in a regvar, then ncg emits _mr_ to move %a to the regvar.
There are two ways to put the result in the regvar without %a:
1. Yield a token, as in `yields {MUL_RR, %2, %1}`, so that _stl_
can move the token to the regvar without using %a.
2. Provide a pattern, like `sli stl`, that just puts the result
in `{LOCAL, $2}` and not %a.
Allow some tokens, like SUM_RIS and XEQ, onto the stack; and add
tokens like MUL_RR, and patterns like `sli stl`.
Delete patterns for `stl lol` and `sdl ldl` to avoid an extra
temporary %a when the local is a regvar. Delete `lal sti lal loi`
because it would emit wrong code.
2017-12-08 22:19:26 +00:00
|
|
|
yields {SUM_RIS, %2, his(%1.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with REG CONST_HI_ZR
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {SUM_RIS, %1, his(%2.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with CONST_STACK-CONST2-CONST_HI_ZR REG
|
2016-10-16 17:58:54 +00:00
|
|
|
uses reusing %2, REG={SUM_RIS, %2, his(%1.val)}
|
|
|
|
yields {SUM_RC, %a, los(%1.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with REG CONST_STACK-CONST2-CONST_HI_ZR
|
2016-10-16 17:58:54 +00:00
|
|
|
uses reusing %1, REG={SUM_RIS, %1, his(%2.val)}
yields {SUM_RC, %a, los(%2.val)}
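The last two adi rules handle a constant that fits neither the signed 16-bit immediate of addi nor the shifted immediate of addis on its own. They add his(c) with addis (SUM_RIS) and then los(c) with addi (SUM_RC). Because addi sign-extends its immediate, the high half must be rounded up whenever bit 15 of c is set. The sketch below models that split in C. It assumes his() and los() behave like the usual PowerPC adjusted-high and low halves. The C functions are hypothetical stand-ins, not the assembler's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* los(c): bits 0-15 of c, sign-extended, as addi will add them. */
static int32_t los(uint32_t c)
{
    int32_t lo = (int32_t)(c & 0xffffu);
    return (c & 0x8000u) ? lo - 0x10000 : lo;
}

/* his(c): bits 16-31 of c, plus one when bit 15 is set, so that the
 * sign extension of los(c) is cancelled out. */
static uint32_t his(uint32_t c)
{
    return ((c + 0x8000u) >> 16) & 0xffffu;
}

/* Model of SUM_RIS followed by SUM_RC, all modulo 2^32:
 *     addis rT, rA, his(c)
 *     addi  rT, rT, los(c)          */
static uint32_t add_split(uint32_t ra, uint32_t c)
{
    uint32_t t = ra + (his(c) << 16);     /* addis */
    uint32_t sum = t + (uint32_t)los(c);  /* addi  */
    assert(sum == ra + c);
    return sum;
}
```

For example, with c = 0x1234abcd, los(c) is -0x5433 and his(c) is 0x1235. That is one more than the raw high half 0x1234.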
pat sbi $1==4 /* Subtract word (second - top) */
with REG REG
uses reusing %1, reusing %2, REG
yields {SUB_RR, %2, %1}
with CONST2_WHEN_NEG REG
yields {SUM_RC, %2, 0-%1.val}
with REG CONST2
yields {SUB_CR, %2.val, %1}
with CONST_HI_ZR REG
yields {SUM_RIS, %2, his(0-%1.val)}
with CONST_STACK-CONST2_WHEN_NEG-CONST_HI_ZR REG
uses reusing %2, REG={SUM_RIS, %2, his(0-%1.val)}
yields {SUM_RC, %a, los(0-%1.val)}
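The sbi rules choose a form depending on which operand is the constant.

- **Subtracting a constant, x - c:** this becomes addi with -c, so the constant must be one whose negation fits in 16 signed bits.
- **Subtracting from a constant, c - x:** this uses subfic, whose immediate is an ordinary signed 16-bit value.

The C sketch below models both cases. It assumes CONST2 is the range -32768..32767 and CONST2_WHEN_NEG is the set of c for which -c lies in that range. The helper names mirror the mnemonics but are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when v fits a signed 16-bit immediate (assumed CONST2 range). */
static bool fits_simm16(int64_t v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* True when -c fits, so "x - c" can become "addi x, -c"
 * (assumed meaning of CONST2_WHEN_NEG). */
static bool neg_fits_simm16(int32_t c)
{
    return fits_simm16(-(int64_t)c);
}

/* addi rD, rA, simm: rA + sign-extended simm, modulo 2^32. */
static uint32_t addi(uint32_t ra, int16_t simm)
{
    return ra + (uint32_t)(int32_t)simm;
}

/* subfic rD, rA, simm: sign-extended simm - rA, modulo 2^32.
 * The carry bit that subfic also sets is not modelled. */
static uint32_t subfic(uint32_t ra, int16_t simm)
{
    return (uint32_t)(int32_t)simm - ra;
}
```

The asymmetry explains why 32768 qualifies for the negated form while -32768 does not.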
pat ngi $1==4 /* Negate word */
with REG
yields {NEG_R, %1}
pat mli $1==4 /* Multiply word (second * top) */
with CONST2 REG
yields {MUL_RC, %2, %1.val}
with REG CONST2
yields {MUL_RC, %1, %2.val}
with REG REG
yields {MUL_RR, %2, %1}
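With a CONST2 operand, mli yields MUL_RC, which multiplies by a signed 16-bit immediate using mulli. Other operands end up in the REG REG rule with MUL_RR. The sketch below models mulli's low 32-bit result. As a cross-check, it also models a shift-and-subtract form of x * 15. It assumes 32-bit wrap-around arithmetic, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* mulli rD, rA, simm: low 32 bits of rA * sign-extended simm. */
static uint32_t mulli(uint32_t ra, int16_t simm)
{
    return ra * (uint32_t)(int32_t)simm;
}

/* Shift-and-subtract form of x * 15: (x << 4) - x, modulo 2^32. */
static uint32_t times15_shift(uint32_t x)
{
    return (x << 4) - x;
}
```

Both forms agree modulo 2^32. The single mulli simply encodes the constant in one 4-byte instruction.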
pat dvi $1==4 /* Divide word (second / top) */
with REG REG
yields {DIV_RR, %2, %1}
pat dvu $1==4 /* Divide unsigned word (second / top) */
with REG REG
yields {DIV_RR_U, %2, %1}
/* To calculate a remainder: a % b = a - (a / b * b) */
pat rmi $1==4 /* Remainder word (second % top) */
with REG REG
uses REG={DIV_RR, %2, %1}, REG
gen move {MUL_RR, %a, %1}, %b
yields {SUB_RR, %2, %b}
pat rmu $1==4 /* Remainder unsigned word (second % top) */
with REG REG
uses REG={DIV_RR_U, %2, %1}, REG
gen move {MUL_RR, %a, %1}, %b
yields {SUB_RR, %2, %b}
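The rmi and rmu rules apply the remainder identity a % b = a - (a / b * b): divide, multiply the quotient back, then subtract. Below is a C model of both rules. It assumes DIV_RR, DIV_RR_U, MUL_RR and SUB_RR correspond to divw, divwu, mullw and subf. C99 integer division truncates toward zero, as divw does, so the signed remainder takes the sign of the dividend.

```c
#include <assert.h>
#include <stdint.h>

/* rmi: a - (a / b) * b with truncating division. */
static int32_t rem_signed(int32_t a, int32_t b)
{
    /* divw leaves these cases undefined, and so does C. */
    assert(b != 0 && !(a == INT32_MIN && b == -1));
    int32_t q = a / b;  /* DIV_RR (divw)  */
    int32_t p = q * b;  /* MUL_RR (mullw) */
    return a - p;       /* SUB_RR (subf)  */
}

/* rmu: the same identity on unsigned words. */
static uint32_t rem_unsigned(uint32_t a, uint32_t b)
{
    assert(b != 0);
    uint32_t q = a / b;  /* DIV_RR_U (divwu) */
    return a - q * b;    /* MUL_RR, SUB_RR   */
}
```

For example, -7 % 3 gives -1, while 7 % -3 gives 1.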
/* Bitwise logic */
/* This back end doesn't know how to combine shifts and
* bitwise ops to emit rlwinm, rlwnm, or rlwimi instructions.
*/
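For reference, rlwinm rotates a register left and then ANDs the result with a contiguous, possibly wrapping, bit mask. That is why a shift followed by an AND with a suitable constant could collapse into one instruction. The C model below is a sketch based on the PowerPC instruction definition, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (n taken modulo 32). */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* PowerPC mask from bit mb to bit me, using IBM numbering in which
 * bit 0 is the most significant bit. When mb > me the mask wraps. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    assert(mb < 32u && me < 32u);
    uint32_t from_mb = 0xffffffffu >> mb;        /* bits mb..31 */
    uint32_t to_me = 0xffffffffu << (31u - me);  /* bits 0..me  */
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm rA, rS, sh, mb, me: rotate left, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

For example, `(x >> 3) & 0xff` equals rlwinm with sh = 29, mb = 24, me = 31. Per the comment, this back end emits the shift and the AND as separate instructions instead.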
pat and $1==4 /* AND word */
with REG NOT_R
yields {ANDC_RR, %1, %2.reg}
with NOT_R REG
yields {ANDC_RR, %2, %1.reg}
with REG REG
yields {AND_RR, %1, %2}
with REG UCONST2
yields {AND_RC, %1, %2.val}
with UCONST2 REG
yields {AND_RC, %2, %1.val}
with REG CONST_HI_ZR
yields {AND_RIS, %1, hi(%2.val)}
with CONST_HI_ZR REG
yields {AND_RIS, %2, hi(%1.val)}
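The word-sized AND rules pick an instruction by operand form:

- a complemented operand (NOT_R) folds into andc;
- a UCONST2 constant uses the 16-bit immediate form;
- a CONST_HI_ZR constant uses the shifted immediate form.

The C sketch below models those PowerPC instructions, under these assumptions:

- AND_RC emits andi., AND_RIS emits andis. and ANDC_RR emits andc;
- UCONST2 covers 0 to 0xffff;
- CONST_HI_ZR means a constant whose low 16 bits are zero.

The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* andc rA, rS, rB: rS & ~rB. */
static uint32_t andc(uint32_t rs, uint32_t rb)
{
    return rs & ~rb;
}

/* andi. rA, rS, uimm: the immediate is zero-extended. */
static uint32_t andi_dot(uint32_t rs, uint16_t uimm)
{
    return rs & (uint32_t)uimm;
}

/* andis. rA, rS, uimm: the immediate is shifted into bits 16-31. */
static uint32_t andis_dot(uint32_t rs, uint16_t uimm)
{
    return rs & ((uint32_t)uimm << 16);
}

/* Which single-instruction form can AND with constant c? */
static bool fits_andi(uint32_t c)  { return c <= 0xffffu; }
static bool fits_andis(uint32_t c) { return (c & 0xffffu) == 0; }
```

Both andi. and andis. also set condition register field 0, which this model ignores.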
pat and defined($1) /* AND set */
leaving
loc $1
cal ".and"
pat and !defined($1)
leaving
cal ".and"
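For sets, both rules hand the work to the runtime routine .and. When the size is known at compile time, they push it with loc first. Otherwise the size is already on the stack.

A C sketch of the intersection such a routine performs follows. It assumes sets are arrays of 32-bit words and the size is a byte count divisible by 4. The name set_and and its argument order are hypothetical and do not describe .and's calling convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Intersect two sets of size_bytes bytes, word by word:
 * dst[i] = a[i] & b[i]. */
static void set_and(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                    size_t size_bytes)
{
    assert(size_bytes % sizeof(uint32_t) == 0);
    for (size_t i = 0; i < size_bytes / sizeof(uint32_t); i++)
        dst[i] = a[i] & b[i];
}
```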
pat ior $1==4 /* OR word */
with REG NOT_R
yields {ORC_RR, %1, %2.reg}
with NOT_R REG
yields {ORC_RR, %2, %1.reg}
with REG REG
yields {OR_RR, %1, %2}
with REG UCONST2
yields {OR_RC, %1, %2.val}
with UCONST2 REG
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {OR_RC, %2, %1.val}
|
2017-12-18 17:36:10 +00:00
|
|
|
with REG CONST_HI_ZR
|
Add more chances to put results in register variables.
When a rule `uses REG ... yields %a`, the result %a is always a
temporary, never a regvar. If the EM code uses _stl_ to put the
result in a regvar, then ncg emits _mr_ to move %a to the regvar.
There are two ways to put the result in the regvar without %a:
1. Yield a token, as in `yields {MUL_RR, %2, %1}`, so that _stl_
can move the token to the regvar without using %a.
2. Provide a pattern, like `sli stl`, that just puts the result
in `{LOCAL, $2}` and not %a.
Allow some tokens, like SUM_RIS and XEQ, onto the stack; and add
tokens like MUL_RR, and patterns like `sli stl`.
Delete patterns for `stl lol` and `sdl ldl` to avoid an extra
temporary %a when the local is a regvar. Delete `lal sti lal loi`
because it would emit wrong code.
2017-12-08 22:19:26 +00:00
|
|
|
yields {OR_RIS, %1, hi(%2.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with CONST_HI_ZR REG
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {OR_RIS, %2, hi(%1.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with REG CONST_STACK-UCONST2-CONST_HI_ZR
|
2016-10-16 17:58:54 +00:00
|
|
|
uses reusing %1, REG={OR_RIS, %1, hi(%2.val)}
|
2019-02-13 19:56:10 +00:00
|
|
|
yields {OR_RC, %a, lo(%2.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with CONST_STACK-UCONST2-CONST_HI_ZR REG
|
2016-10-16 17:58:54 +00:00
|
|
|
uses reusing %2, REG={OR_RIS, %2, hi(%1.val)}
|
2019-02-13 19:56:10 +00:00
|
|
|
yields {OR_RC, %a, lo(%1.val)}
|
2016-10-16 17:58:54 +00:00
|
|
|
|
2016-12-10 17:23:07 +00:00
|
|
|
pat ior defined($1) /* OR set */
|
2017-01-15 21:28:14 +00:00
|
|
|
leaving
|
|
|
|
loc $1
|
|
|
|
cal ".ior"
|
2016-12-10 17:23:07 +00:00
|
|
|
|
|
|
|
/* OR set (variable), used in lang/m2/libm2/LtoUset.e */
|
|
|
|
pat ior !defined($1)
|
2017-01-15 21:28:14 +00:00
|
|
|
leaving
|
|
|
|
cal ".ior"
|
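The two `ior` set rules push the set size and call the library routine `.ior`. This excerpt does not show that routine's interface. As a hypothetical model of the operation itself, the union of two n-byte sets is a word-by-word OR.

`set_ior` is an illustrative name. The requirement that n be a multiple of 4 is an assumption of this sketch, not a documented contract of `.ior`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a set union: dst |= src over nbytes bytes.
 * Not the real .ior, which takes its operands from the EM stack. */
static void set_ior(uint32_t *dst, const uint32_t *src, size_t nbytes)
{
    assert(nbytes % 4 == 0); /* assumption: whole 4-byte words */
    for (size_t i = 0; i < nbytes / 4; i++)
        dst[i] |= src[i];
}
```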
2016-10-16 17:58:54 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat xor $1==4 /* XOR word */
|
2016-10-16 17:58:54 +00:00
|
|
|
with REG REG
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {XOR_RR, %1, %2}
|
2016-10-16 17:58:54 +00:00
|
|
|
with REG UCONST2
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {XOR_RC, %1, %2.val}
|
2016-10-16 17:58:54 +00:00
|
|
|
with UCONST2 REG
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {XOR_RC, %2, %1.val}
|
2017-12-18 17:36:10 +00:00
|
|
|
with REG CONST_HI_ZR
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XOR_RIS, %1, hi(%2.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with CONST_HI_ZR REG
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XOR_RIS, %2, hi(%1.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with REG CONST_STACK-UCONST2-CONST_HI_ZR
|
2016-10-16 17:58:54 +00:00
|
|
|
uses reusing %1, REG={XOR_RIS, %1, hi(%2.val)}
|
2019-02-13 19:56:10 +00:00
|
|
|
yields {XOR_RC, %a, lo(%2.val)}
|
2017-12-18 17:36:10 +00:00
|
|
|
with CONST_STACK-UCONST2-CONST_HI_ZR REG
|
2016-10-16 17:58:54 +00:00
|
|
|
uses reusing %2, REG={XOR_RIS, %2, hi(%1.val)}
|
2019-02-13 19:56:10 +00:00
|
|
|
yields {XOR_RC, %a, lo(%1.val)}
|
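The `ior` and `xor` rules above handle a constant that is neither UCONST2 nor CONST_HI_ZR with two instructions:

- OR_RIS/XOR_RIS applies the high half (oris and xoris shift their immediate left by 16 bits).
- OR_RC/XOR_RC then applies the low half.

OR and XOR act on each bit independently, so the two halves can be applied one after the other. This C sketch assumes that the table's `hi()` and `lo()` yield the plain upper and lower 16 bits, with no carry adjustment. All helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t hi16(uint32_t c) { return c >> 16; }     /* assumed hi() */
static uint32_t lo16(uint32_t c) { return c & 0xFFFFu; } /* assumed lo() */

/* oris/xoris use the immediate shifted left 16; ori/xori zero-extend it. */
static uint32_t oris(uint32_t rs, uint32_t imm)  { return rs | (imm << 16); }
static uint32_t ori(uint32_t rs, uint32_t imm)   { return rs | imm; }
static uint32_t xoris(uint32_t rs, uint32_t imm) { return rs ^ (imm << 16); }
static uint32_t xori(uint32_t rs, uint32_t imm)  { return rs ^ imm; }

/* Two-instruction sequences for a full 32-bit constant c. */
static uint32_t or_const(uint32_t x, uint32_t c)
{
    return ori(oris(x, hi16(c)), lo16(c));
}

static uint32_t xor_const(uint32_t x, uint32_t c)
{
    return xori(xoris(x, hi16(c)), lo16(c));
}
```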
2016-10-16 17:58:54 +00:00
|
|
|
|
2016-12-10 17:23:07 +00:00
|
|
|
pat xor defined($1) /* XOR set */
|
2017-02-11 23:00:56 +00:00
|
|
|
leaving
|
|
|
|
loc $1
|
|
|
|
cal ".xor"
|
|
|
|
|
|
|
|
pat xor !defined($1)
|
|
|
|
leaving
|
|
|
|
cal ".xor"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat com $1==4 /* NOT word */
|
2017-12-08 22:19:26 +00:00
|
|
|
with exact AND_RR
|
|
|
|
yields {NAND_RR, %1.reg1, %1.reg2}
|
|
|
|
with exact OR_RR
|
|
|
|
yields {NOR_RR, %1.reg1, %1.reg2}
|
|
|
|
with exact XOR_RR
|
|
|
|
yields {EQV_RR, %1.reg1, %1.reg2}
|
Trimming mach/powerpc/ncg/table
Remove coercion from LABEL to REG. The coercion never happens because
I have stopped putting LABEL on the stack. Also remove LABEL from set
ANY_BHW. Retain the move from LABEL to REG because pat gto uses it.
Remove li32 instruction, unused after the switch to the hi16, ha16,
lo16 syntax.
Remove COMMENT(...) lines from most moves. In my opinion, they took
too much space, both in the table and in the assembly output. The
stacking rules and coercions keep their COMMENT(...) lines.
In test GPR, don't write to RSCRATCH.
Fold several coercions into a single coercion from ANY_BHW uses REG.
Use REG instead of GPR in stack patterns. REG and GPR act the same,
because every GPR on the stack is a REG, but I want to be clear that I
expect a REG, not r0.
In code rules, sort SUM_RC before SORT_RR, so I can add SUM_RL later.
Remove rules to optimize loc loc cii loc loc cii. If $2==$4, the
peephole optimizer can optimize it. If $2!=$4, then the EM program is
missing a conversion from size $2 to size $4.
Remove rules to store a SEX_B with sti 1 or a SEX_H with sti 2. These
rules would never get used, unless the EM program is missing a
conversion from size 4 to size 1 or 2.
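The commit message above drops two kinds of rules: the `loc loc cii loc loc cii` rules, and the rules that store a SEX_B with `sti 1` or a SEX_H with `sti 2`. A small C model of sign extension shows the arithmetic behind both decisions:

- Repeating the same sign extension changes nothing.
- Truncating a sign-extended byte gives back the original byte.

`sext` is an illustrative helper, not part of the table.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of x to 32 bits (bits is 8, 16
 * or 32, matching EM sizes 1, 2 and 4). */
static int32_t sext(uint32_t x, unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32);
    uint32_t sign = 1u << (bits - 1);
    if (bits < 32)
        x &= (1u << bits) - 1; /* keep only the low `bits` bits */
    /* (x ^ sign) - sign, computed in 64 bits so that no signed
     * overflow or implementation-defined conversion occurs */
    return (int32_t)((int64_t)(x ^ sign) - (int64_t)sign);
}
```

For instance, `sext((uint32_t)sext(x, 8), 8)` always equals `sext(x, 8)`. The low byte of `sext(x, 8)` is always the low byte of `x`.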
2017-02-08 17:27:16 +00:00
|
|
|
with REG
|
2007-11-02 18:56:58 +00:00
|
|
|
yields {NOT_R, %1}
|
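The `com` rules fold the complement into a pending AND_RR, OR_RR or XOR_RR token, so a single nand, nor or eqv replaces two instructions. A lone register becomes NOT_R. PowerPC's `not rA,rS` mnemonic is itself the extended form of `nor rA,rS,rS`.

The identities are written out in C below. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Each PowerPC instruction produces the complemented result directly. */
static uint32_t ppc_nand(uint32_t a, uint32_t b) { return ~(a & b); } /* com of AND_RR */
static uint32_t ppc_nor(uint32_t a, uint32_t b)  { return ~(a | b); } /* com of OR_RR */
static uint32_t ppc_eqv(uint32_t a, uint32_t b)  { return ~(a ^ b); } /* com of XOR_RR */

/* "not rA,rS" is the extended mnemonic for "nor rA,rS,rS". */
static uint32_t ppc_not(uint32_t s) { return ppc_nor(s, s); }
```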
2016-12-09 21:36:42 +00:00
|
|
|
|
2016-12-10 17:23:07 +00:00
|
|
|
pat com defined($1) /* NOT set */
|
2017-01-17 21:31:38 +00:00
|
|
|
leaving
|
|
|
|
loc $1
|
|
|
|
cal ".com"
|
|
|
|
|
|
|
|
pat com !defined($1)
|
|
|
|
leaving
|
|
|
|
cal ".com"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2016-12-10 17:23:07 +00:00
|
|
|
pat zer $1==4 /* Push zero */
|
|
|
|
leaving
|
|
|
|
loc 0
|
|
|
|
|
2017-12-09 22:21:06 +00:00
|
|
|
pat zer defined($1) /* Create empty set */
|
2017-01-15 21:28:14 +00:00
|
|
|
leaving
|
|
|
|
loc $1
|
|
|
|
cal ".zer"
|
2016-12-10 17:23:07 +00:00
|
|
|
|
2017-12-07 22:16:21 +00:00
|
|
|
|
|
|
|
/* Shifts and rotations */
|
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat sli $1==4 /* Shift left (second << top) */
|
2017-02-08 17:27:16 +00:00
|
|
|
with CONST_STACK REG
|
2007-11-02 18:56:58 +00:00
|
|
|
uses reusing %2, REG
|
2017-12-08 00:24:09 +00:00
|
|
|
gen slwi %a, %2, {C, %1.val & 0x1F}
|
2007-11-02 18:56:58 +00:00
|
|
|
yields %a
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG REG
|
Fix lim. Improve lxl, lxa, lor, str, procs with no locals.
_lim_ must use _loe_ (load word external), not _lde_ (load double-word
external).
The new patterns for _lxl_, _lxa_, _lor_, _str_ emit shorter code in
some cases. The change from GPR_EXPR to REG_EXPR allows moving
LXFRAME to a register variable.
Add more "reusing" clauses. We have enough registers that ncg almost
never reuses a register, but sometimes it can reuse r3.
In mach.c, emit one fewer instruction in procedures with no locals.
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG
|
2017-12-08 00:24:09 +00:00
|
|
|
gen slw %a, %2, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
yields %a
|
2017-12-08 22:19:26 +00:00
|
|
|
pat sli stl $1==4 && inreg($2)==reg_any
|
|
|
|
with CONST_STACK REG
|
|
|
|
gen slwi {LOCAL, $2}, %2, {C, %1.val & 0x1F}
|
|
|
|
with REG REG
|
|
|
|
gen slw {LOCAL, $2}, %2, %1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-08 22:19:26 +00:00
|
|
|
pat sri $1==4 /* Shift right signed (second >> top) */
|
2017-02-08 17:27:16 +00:00
|
|
|
with CONST_STACK REG
|
2007-11-02 18:56:58 +00:00
|
|
|
uses reusing %2, REG
|
2017-12-08 00:24:09 +00:00
|
|
|
gen srawi %a, %2, {C, %1.val & 0x1F}
|
2007-11-02 18:56:58 +00:00
|
|
|
yields %a
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG
|
2017-12-08 00:24:09 +00:00
|
|
|
gen sraw %a, %2, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
yields %a
|
2017-12-08 22:19:26 +00:00
|
|
|
pat sri stl $1==4 && inreg($2)==reg_any
|
|
|
|
with CONST_STACK REG
|
|
|
|
gen srawi {LOCAL, $2}, %2, {C, %1.val & 0x1F}
|
|
|
|
with REG REG
|
|
|
|
gen sraw {LOCAL, $2}, %2, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-08 22:19:26 +00:00
|
|
|
pat sru $1==4 /* Shift right unsigned (second >> top) */
|
2017-02-08 17:27:16 +00:00
|
|
|
with CONST_STACK REG
|
2007-11-02 18:56:58 +00:00
|
|
|
uses reusing %2, REG
|
2017-12-08 00:24:09 +00:00
|
|
|
gen srwi %a, %2, {C, %1.val & 0x1F}
|
2007-11-02 18:56:58 +00:00
|
|
|
yields %a
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG
|
2017-12-08 00:24:09 +00:00
|
|
|
gen srw %a, %2, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
yields %a
|
2017-12-08 22:19:26 +00:00
|
|
|
pat sru stl $1==4 && inreg($2)==reg_any
|
|
|
|
with CONST_STACK REG
|
|
|
|
gen srwi {LOCAL, $2}, %2, {C, %1.val & 0x1F}
|
|
|
|
with REG REG
|
|
|
|
gen srw {LOCAL, $2}, %2, %1
|
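The `sli`, `sri` and `sru` rules follow EM's operand order (`second << top`), so %1 is the shift count and %2 is the value. With a constant count, the rules mask the count to 0..31 (`%1.val & 0x1F`) before emitting slwi, srawi or srwi.

The C sketch below models only that constant-count behaviour. The register forms (slw, sraw, srw) use six bits of the count. Counts from 32 to 63 therefore give 0, or all sign bits for sraw, rather than wrapping. The sketch does not model those cases. Function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Constant-count shifts as the rules emit them: count masked to 0..31. */
static uint32_t em_sli(uint32_t second, uint32_t top)
{
    return second << (top & 0x1F);           /* slwi */
}

static uint32_t em_sru(uint32_t second, uint32_t top)
{
    return second >> (top & 0x1F);           /* srwi: zero fill */
}

static int32_t em_sri(int32_t second, uint32_t top)
{
    unsigned n = top & 0x1F;                 /* srawi: sign fill */
    /* ~(~x >> n) shifts a negative x arithmetically without relying
     * on the implementation-defined result of >> on negative values */
    return second < 0 ? ~(~second >> n) : second >> n;
}
```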
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-07 22:16:21 +00:00
|
|
|
pat rol $1==4 /* Rotate left word */
|
|
|
|
with CONST_STACK REG
|
|
|
|
uses reusing %2, REG
|
2017-12-08 00:24:09 +00:00
|
|
|
gen rotlwi %a, %2, {C, %1.val & 0x1F}
|
2017-12-07 22:16:21 +00:00
|
|
|
yields %a
|
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
uses reusing %1, reusing %2, REG
2017-12-07 22:16:21 +00:00
gen rotlw %a, %2, %1
yields %a
2017-12-08 22:19:26 +00:00
pat rol stl $1==4 && inreg($2)==reg_any
with CONST_STACK REG
gen rotlwi {LOCAL, $2}, %2, {C, %1.val & 0x1F}
with REG REG
gen rotlw {LOCAL, $2}, %2, %1
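`rotlwi` encodes its count in a 5-bit field, hence `%1.val & 0x1F` in the constant cases. In the REG REG cases the count goes straight to `rotlw`, which, as we understand the PowerPC architecture, uses only the low 5 bits of its count register, so no mask is needed. A C model of rotate-left word under that rule; the function name is illustrative only.

```c
#include <stdint.h>

/*
 * Rotate a 32-bit word left, using only the low 5 bits of the count,
 * the way rotlw treats its count register.
 */
uint32_t rotlw_model(uint32_t x, uint32_t n)
{
	n &= 0x1F;
	/* Shifting a 32-bit value by 32 is undefined in C, so handle 0 apart. */
	return n == 0 ? x : (x << n) | (x >> (32 - n));
}
```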
2017-12-07 22:16:21 +00:00
/*
* ror 4 -> ngi 4, rol 4
* because to rotate right by n bits is to rotate left by
* (32 - n), which is to rotate left by -n. PowerPC rotlw
* handles -n as (-n & 0x1F).
*/
pat ror $1==4 /* Rotate right word */
with CONST_STACK REG
uses reusing %2, REG
2017-12-08 00:24:09 +00:00
gen rotrwi %a, %2, {C, %1.val & 0x1F}
2017-12-07 22:16:21 +00:00
yields %a
with /* anything */
leaving
ngi 4
rol 4
2017-12-08 22:19:26 +00:00
pat ror stl $1==4 && inreg($2)==reg_any
with CONST_STACK REG
gen rotrwi {LOCAL, $2}, %2, {C, %1.val & 0x1F}
with /* anything */
leaving
ngi 4
rol 4
stl $2
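The comment before `pat ror` lowers a variable `ror 4` to `ngi 4` followed by `rol 4`, relying on the fact that rotating right by n is rotating left by (-n & 0x1F). The C sketch below checks that identity against a direct rotate right. Negation is done on an unsigned word, which gives the same bits as `ngi 4` on a two's-complement machine; all function names are illustrative only.

```c
#include <stdint.h>

/* Rotate left, honouring only the low 5 bits of the count (like rotlw). */
uint32_t rotl(uint32_t x, uint32_t n)
{
	n &= 0x1F;
	return n == 0 ? x : (x << n) | (x >> (32 - n));
}

/* Direct rotate right, used only for comparison. */
uint32_t ror_direct(uint32_t x, uint32_t n)
{
	n &= 0x1F;
	return n == 0 ? x : (x >> n) | (x << (32 - n));
}

/* "ngi 4; rol 4": negate the count, then rotate left. */
uint32_t ror_via_rol(uint32_t x, uint32_t n)
{
	return rotl(x, 0u - n);	/* unsigned negation wraps modulo 2^32 */
}
```

For example, a count of 4 becomes 0xFFFFFFFC after negation, and its low 5 bits are 28, so the word rotates left by 28, which is the same as rotating right by 4.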
2017-12-07 22:16:21 +00:00
2007-11-02 18:56:58 +00:00
/* Arrays */
Use .los4 in lar 4 and .sts4 in sar 4.
Our libem had two implementations of loading a block from a stack, one
for lar 4 and one for los 4. Now lar 4 and los 4 share the code in
.los4. Likewise, sar 4 and sts 4 share the code in .sts4.
Rename .los to .los4 and .sts to .sts4, because they implement los 4
and sts 4. Remove the special case for loading or storing 4 bytes,
because we can do it with 1 iteration of the loop. Remove the lines
to "align size" where the size must already be a multiple of 4.
Fix the upper bound check in .aar4.
Change .aar4, .lar4, .los4, .sar4, .sts4 to pass all operands on the
real stack, except that .los4 and .sts4 take the size in register r3.
Have .aar4 set r3 to the size of the array element. So lar 4 is just
.aar4 then .los4, and sar 4 is just .aar4 then .sts4.
ncg no longer calls .lar4 and .sar4 in libem, because it inlines the
code; but I keep .lar4 and .sar4 in libem, because mcg references
them. They might or might not work in mcg.
2017-02-13 20:22:00 +00:00
pat aar $1==4 /* Address of array element */
2017-12-23 02:18:58 +00:00
leaving cal ".aar4"
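`.aar4` computes the address of an array element from the base address, the index and the array descriptor. The header comment of this table lists EARRAY among the traps raised by aar, and the commit message above says `.aar4` leaves the element size in r3. In EM, as we understand the specification, a descriptor holds three words: the lower bound, the upper bound minus the lower bound, and the element size; the third word is what `rom($1, 3)` reads in the `lae lar` and `lae sar` patterns. Below is a hedged C model with hypothetical names; it returns false where the real routine would trap.

```c
#include <stdbool.h>
#include <stdint.h>

/* EM array descriptor: three words (assumed layout). */
struct em_descriptor {
	int32_t lb;	/* lower bound */
	int32_t range;	/* upper bound minus lower bound */
	int32_t size;	/* bytes per element */
};

/*
 * Hypothetical model of .aar4: compute the address of base[index] and
 * report the element size (which the real routine leaves in r3).
 * Returns false where .aar4 would raise EARRAY.
 */
bool aar4_model(uintptr_t base, int32_t index,
    const struct em_descriptor *d, uintptr_t *addr, int32_t *size)
{
	/* One unsigned compare rejects both index < lb and index > ub. */
	uint32_t offset = (uint32_t)index - (uint32_t)d->lb;

	if (offset > (uint32_t)d->range)
		return false;
	*addr = base + (uintptr_t)offset * (uintptr_t)d->size;
	*size = d->size;
	return true;
}
```

The single unsigned comparison works because an index below the lower bound wraps around to a very large offset.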
2017-02-13 20:22:00 +00:00
pat lar $1==4 /* Load from array */
with STACK
Add some missing clauses to los, sts, aar, inn, cmi, cmu.
We only implement 'los 4', 'sts 4', 'cmi 4', 'cmu 4', not for sizes
other than 4. Add clause $1==4.
We only implement inn when defined($1).
The rule for aar needs 'kills ALL' because it kills many registers,
like other rules that call libem.
2016-12-10 00:49:50 +00:00
kills ALL
2007-11-02 18:56:58 +00:00
gen
bl {LABEL, ".aar4"}
2017-02-13 20:22:00 +00:00
/* pass r3 = size from .aar4 to .los4 */
bl {LABEL, ".los4"}
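According to the commit message above, `lar 4` is just `.aar4` followed by `.los4`, and `.los4` copies the element with a loop that moves 4 bytes per pass, so a 4-byte element takes a single iteration. A C model of that copy loop, assuming a downward-growing stack of 32-bit words and a size that is a multiple of 4; `los4_model` is a hypothetical name, not the libem routine.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of .los4: push `size` bytes from `src` onto a
 * downward-growing stack, one 4-byte word per iteration, keeping the
 * memory order of the block.  Returns the new stack pointer.
 */
uint32_t *los4_model(uint32_t *sp, const unsigned char *src, uint32_t size)
{
	uint32_t words = size / 4;	/* size is assumed to be a multiple of 4 */

	sp -= words;
	for (uint32_t i = 0; i < words; i++)
		memcpy(&sp[i], src + 4 * i, 4);
	return sp;
}
```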
2016-12-09 21:36:42 +00:00
2017-02-13 20:22:00 +00:00
pat lae lar $2==4 && nicesize(rom($1, 3))
2007-11-02 18:56:58 +00:00
leaving
lae $1
2017-02-13 20:22:00 +00:00
aar 4
2007-11-02 18:56:58 +00:00
loi rom($1, 3)
2016-12-09 21:36:42 +00:00
2017-02-13 20:22:00 +00:00
pat sar $1==4 /* Store to array */
with STACK
2007-11-02 18:56:58 +00:00
kills ALL
gen
2017-02-13 20:22:00 +00:00
bl {LABEL, ".aar4"}
/* pass r3 = size from .aar4 to .sts4 */
bl {LABEL, ".sts4"}
2016-12-09 21:36:42 +00:00
2017-02-13 20:22:00 +00:00
pat lae sar $2==4 && nicesize(rom($1, 3))
2007-11-02 18:56:58 +00:00
leaving
lae $1
2017-02-13 20:22:00 +00:00
aar 4
2007-11-02 18:56:58 +00:00
sti rom($1, 3)
/* Sets */
2016-12-10 17:23:07 +00:00
pat set defined($1) /* Create singleton set */
2017-01-15 21:28:14 +00:00
leaving
loc $1
cal ".set"
2016-12-09 21:36:42 +00:00
2016-12-10 17:23:07 +00:00
/* Create set (variable), used in lang/m2/libm2/LtoUset.e */
pat set !defined($1)
2017-01-15 21:28:14 +00:00
leaving
cal ".set"
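EM `set` builds a set with a single member: the bit number is on top of the stack and the set size in bytes comes from the instruction, or, in the `!defined($1)` form, from the stack. The C sketch below models that for a set stored as 32-bit words. The bit layout (member k in word k / 32, bit k % 32) is an assumption for illustration; the real layout is whatever libem's `.set` uses, and `set_model` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of "set n": clear an n-byte set (n a multiple of 4)
 * and make `bit` its only member.  Assumed layout: member k lives in
 * word k / 32 at bit position k % 32.  Returns -1 if `bit` does not fit;
 * what libem does in that case is not modelled here.
 */
int set_model(uint32_t *words, uint32_t nbytes, uint32_t bit)
{
	uint32_t nwords = nbytes / 4;

	memset(words, 0, nbytes);
	if (bit >= 32 * nwords)
		return -1;
	words[bit / 32] |= UINT32_C(1) << (bit % 32);
	return 0;
}
```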
2016-12-09 21:36:42 +00:00
2016-12-10 00:49:50 +00:00
pat inn defined($1) /* Test for set bit */
2017-01-15 21:28:14 +00:00
leaving
loc $1
cal ".inn"
2016-12-09 21:36:42 +00:00
2017-01-17 21:31:38 +00:00
pat inn !defined($1)
leaving
cal ".inn"
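EM `inn` tests whether a bit number is a member of a set; both rules here simply call `.inn` in libem. A C sketch of the test, assuming the set is stored as 32-bit words with member k in word k / 32 at bit position k % 32. That layout, the treatment of out-of-range bits, and the name `inn_model` are assumptions for illustration, not a description of libem's code.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of "inn n": is `bit` a member of the n-byte set
 * stored in `words`?  Bits beyond the set are treated as non-members
 * (an assumption of this sketch).
 */
bool inn_model(const uint32_t *words, uint32_t nbytes, uint32_t bit)
{
	if (bit >= 8 * nbytes)
		return false;
	return (words[bit / 32] >> (bit % 32)) & 1;
}
```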
2016-12-09 21:36:42 +00:00
2007-11-02 18:56:58 +00:00
/* Boolean resolutions */
pat teq /* top = (top == 0) */
2017-01-26 00:08:55 +00:00
with REG
2007-11-02 18:56:58 +00:00
uses reusing %1, REG
gen
2017-01-26 00:08:55 +00:00
test %1
mfcr %a
2017-12-08 22:19:26 +00:00
yields {XEQ, %a}
2016-12-09 21:36:42 +00:00
2007-11-02 18:56:58 +00:00
pat tne /* top = (top != 0) */
2017-01-26 00:08:55 +00:00
with REG
2007-11-02 18:56:58 +00:00
uses reusing %1, REG
gen
2017-01-26 00:08:55 +00:00
test %1
mfcr %a
2017-12-08 22:19:26 +00:00
yields {XNE, %a}
2016-12-09 21:36:42 +00:00
2007-11-02 18:56:58 +00:00
pat tlt /* top = (top < 0) */
2017-01-26 00:08:55 +00:00
with REG
2007-11-02 18:56:58 +00:00
uses reusing %1, REG
gen
2017-01-26 00:08:55 +00:00
test %1
mfcr %a
2017-12-08 22:19:26 +00:00
yields {XLT, %a}
2016-12-09 21:36:42 +00:00
2007-11-02 18:56:58 +00:00
pat tle /* top = (top <= 0) */
2017-01-26 00:08:55 +00:00
with REG
2007-11-02 18:56:58 +00:00
uses reusing %1, REG
gen
2017-01-26 00:08:55 +00:00
test %1
mfcr %a
2017-12-08 22:19:26 +00:00
yields {XLE, %a}
2016-12-09 21:36:42 +00:00
2007-11-02 18:56:58 +00:00
pat tgt /* top = (top > 0) */
2017-01-26 00:08:55 +00:00
with REG
2007-11-02 18:56:58 +00:00
uses reusing %1, REG
gen
2017-01-26 00:08:55 +00:00
test %1
mfcr %a
2017-12-08 22:19:26 +00:00
yields {XGT, %a}
2007-11-02 18:56:58 +00:00
pat tge /* top = (top >= 0) */
2017-01-26 00:08:55 +00:00
with REG
2007-11-02 18:56:58 +00:00
uses reusing %1, REG
gen
2017-01-26 00:08:55 +00:00
test %1
mfcr %a
2017-12-08 22:19:26 +00:00
yields {XGE, %a}
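Each boolean-resolution rule above runs `test %1`, copies the condition register into %a with `mfcr`, and yields a token that selects one result bit. As we understand the PowerPC architecture, CR0 occupies the top four bits of the word that `mfcr` produces: LT is 0x80000000, GT is 0x40000000 and EQ is 0x20000000. How the XEQ to XGE tokens are lowered is defined elsewhere in the table, so the sketch below only assumes that `test` compares %1 with zero and that tne, tle and tge take the complement of EQ, GT and LT respectively. All names are illustrative.

```c
#include <stdint.h>

/* CR0 bits as they appear in a GPR after mfcr. */
#define CR0_LT UINT32_C(0x80000000)
#define CR0_GT UINT32_C(0x40000000)
#define CR0_EQ UINT32_C(0x20000000)

/* Model of "test %1; mfcr %a": compare with 0; SO and other fields ignored. */
uint32_t test_mfcr(int32_t x)
{
	if (x < 0)
		return CR0_LT;
	if (x > 0)
		return CR0_GT;
	return CR0_EQ;
}

/* Turn one CR0 bit into 0 or 1, optionally complemented. */
int32_t cr_bit(uint32_t cr, uint32_t mask, int negate)
{
	int32_t bit = (cr & mask) != 0;

	return negate ? !bit : bit;
}

int32_t teq(int32_t x) { return cr_bit(test_mfcr(x), CR0_EQ, 0); }
int32_t tne(int32_t x) { return cr_bit(test_mfcr(x), CR0_EQ, 1); }
int32_t tlt(int32_t x) { return cr_bit(test_mfcr(x), CR0_LT, 0); }
int32_t tle(int32_t x) { return cr_bit(test_mfcr(x), CR0_GT, 1); }
int32_t tgt(int32_t x) { return cr_bit(test_mfcr(x), CR0_GT, 0); }
int32_t tge(int32_t x) { return cr_bit(test_mfcr(x), CR0_LT, 1); }
```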
2016-12-09 21:36:42 +00:00
2017-01-26 00:08:55 +00:00
pat cmi teq $1==4 /* Signed second == top */
with REG CONST2
uses reusing %1, REG={COND_RC, %1, %2.val}
2017-12-08 22:19:26 +00:00
yields {XEQ, %a}
2017-01-26 00:08:55 +00:00
with CONST2 REG
2018-01-05 01:40:35 +00:00
uses reusing %2, REG={COND_RC, %2, %1.val}
2017-12-08 22:19:26 +00:00
yields {XEQ, %a}
2017-01-26 00:08:55 +00:00
with REG REG
2018-01-05 01:40:35 +00:00
uses reusing %1, reusing %2, REG={COND_RR, %2, %1}
2017-12-08 22:19:26 +00:00
yields {XEQ, %a}
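EM `cmi 4` produces a word that is negative, zero or positive as the second word is less than, equal to, or greater than the top word, and `teq` then tests that word for zero. The combined `cmi teq` pattern can therefore skip the intermediate word and compare second with top directly. A C sketch of that equivalence; it computes `cmi` with comparisons rather than subtraction so that extreme values cannot overflow, and all names are hypothetical.

```c
#include <stdint.h>

/* EM "cmi 4": negative, zero or positive as second <, ==, > top. */
int32_t cmi(int32_t second, int32_t top)
{
	return (second > top) - (second < top);
}

/* EM "teq": 1 if the word is zero, else 0. */
int32_t teq(int32_t x)
{
	return x == 0;
}

/* What "pat cmi teq" computes with a single compare. */
int32_t cmi_teq_fused(int32_t second, int32_t top)
{
	return second == top;
}
```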
2007-11-02 18:56:58 +00:00
2017-01-26 00:08:55 +00:00
pat cmi tne $1==4 /* Signed second != top */
with REG CONST2
uses reusing %1, REG={COND_RC, %1, %2.val}
2017-12-08 22:19:26 +00:00
yields {XNE, %a}
2017-01-26 00:08:55 +00:00
with CONST2 REG
2018-01-05 01:40:35 +00:00
uses reusing %2, REG={COND_RC, %2, %1.val}
2017-12-08 22:19:26 +00:00
yields {XNE, %a}
2017-01-26 00:08:55 +00:00
with REG REG
2018-01-05 01:40:35 +00:00
uses reusing %1, reusing %2, REG={COND_RR, %2, %1}
2017-12-08 22:19:26 +00:00
yields {XNE, %a}
2007-11-02 18:56:58 +00:00
2017-01-26 00:08:55 +00:00
pat cmi tgt $1==4 /* Signed second > top */
with REG CONST2
uses reusing %1, REG={COND_RC, %1, %2.val}
2017-12-08 22:19:26 +00:00
yields {XLT, %a}
2017-01-26 00:08:55 +00:00
with CONST2 REG
2018-01-05 01:40:35 +00:00
uses reusing %2, REG={COND_RC, %2, %1.val}
2017-12-08 22:19:26 +00:00
yields {XGT, %a}
2017-01-26 00:08:55 +00:00
with REG REG
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={COND_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGT, %a}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
pat cmi tge $1==4 /* Signed second >= top */
|
|
|
|
with REG CONST2
|
|
|
|
uses reusing %1, REG={COND_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with CONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={COND_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={COND_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGE, %a}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
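The condition tokens in these rules depend on operand order. This sketch assumes, as in ncg tables, that `%1` is the top of the stack and `%2` the element below it. Under that assumption, the REG CONST2 rules compare the register (top) against the constant (second), which is the reverse of EM's "second OP top". They must therefore yield the mirrored condition: `tge` yields XLE there, but XGE in the CONST2 REG and REG REG rules. The C sketch below checks that mirroring rule over a few values. The names `cond`, `em_test`, `mirror` and `count_mismatches` are hypothetical and model only EM's semantics, not ncg internals.

```c
/* The six EM tests, named by the relation they check. */
enum cond { LT, LE, EQ, NE, GE, GT };

/* EM semantics: the test is true when "second OP top" holds. */
static int em_test(enum cond c, long second, long top)
{
    switch (c) {
    case LT: return second <  top;
    case LE: return second <= top;
    case EQ: return second == top;
    case NE: return second != top;
    case GE: return second >= top;
    case GT: return second >  top;
    }
    return 0;
}

/* Condition to use when the operands are compared in swapped order. */
static enum cond mirror(enum cond c)
{
    switch (c) {
    case LT: return GT;
    case LE: return GE;
    case GE: return LE;
    case GT: return LT;
    default: return c;            /* EQ and NE are symmetric */
    }
}

/* Counts pairs where comparing (top, second) with the mirrored condition
 * disagrees with the EM test on (second, top); 0 means the swap is safe. */
static int count_mismatches(void)
{
    static const long vals[] = { -70000, -32768, -1, 0, 1, 32767, 70000 };
    const int n = sizeof vals / sizeof vals[0];
    int bad = 0;
    for (int c = LT; c <= GT; c++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (em_test(mirror((enum cond)c), vals[j], vals[i])
                    != em_test((enum cond)c, vals[i], vals[j]))
                    bad++;
    return bad;
}
```

For example, `mirror(GT)` is `LT`, which matches `cmi tgt` yielding XLT in its REG CONST2 rule.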
2017-01-26 00:08:55 +00:00
|
|
|
pat cmi tlt $1==4 /* Signed second < top */
|
|
|
|
with REG CONST2
|
|
|
|
uses reusing %1, REG={COND_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with CONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={COND_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={COND_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLT, %a}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
pat cmi tle $1==4 /* Signed second <= top */
|
|
|
|
with REG CONST2
|
|
|
|
uses reusing %1, REG={COND_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with CONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={COND_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={COND_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLE, %a}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
pat cmu teq $1==4 /* Unsigned second == top */
|
|
|
|
with REG UCONST2
|
|
|
|
uses reusing %1, REG={CONDL_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XEQ, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with UCONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={CONDL_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XEQ, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={CONDL_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XEQ, %a}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
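The `cmu` rules differ from the `cmi` rules only in using CONDL_RC / CONDL_RR and UCONST2 where `cmi` uses COND_RC / COND_RR and CONST2. Presumably these are PowerPC's logical (unsigned) compares `cmplwi` / `cmplw` rather than `cmpwi` / `cmpw`. In PowerPC, the 16-bit immediate of `cmpwi` is sign-extended and that of `cmplwi` is zero-extended. The C sketch below assumes CONST2 means a constant in -32768..32767 and UCONST2 one in 0..65535. It also shows why a signed compare cannot stand in for an unsigned one. All function names are hypothetical.

```c
#include <stdint.h>

/* Assumed operand classes: constants that fit a signed 16-bit immediate
 * (CONST2, for cmpwi) or an unsigned 16-bit immediate (UCONST2, for cmplwi). */
static int fits_signed16(int32_t v)   { return v >= -32768 && v <= 32767; }
static int fits_unsigned16(int32_t v) { return v >= 0 && v <= 65535; }

/* -1, 0 or 1 for a < b, a == b, a > b, like EM's cmi and cmu. */
static int cmp_signed(int32_t a, int32_t b)     { return (a > b) - (a < b); }
static int cmp_unsigned(uint32_t a, uint32_t b) { return (a > b) - (a < b); }
```

With the same 32 bits, `cmp_signed(-1, 1)` is -1 but `cmp_unsigned(0xFFFFFFFF, 1)` is 1. So `cmi tgt` and `cmu tgt` can disagree on identical operands.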
2017-01-26 00:08:55 +00:00
|
|
|
pat cmu tne $1==4 /* Unsigned second != top */
|
|
|
|
with REG UCONST2
|
|
|
|
uses reusing %1, REG={CONDL_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XNE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with UCONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={CONDL_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XNE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={CONDL_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XNE, %a}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
pat cmu tgt $1==4 /* Unsigned second > top */
|
|
|
|
with REG UCONST2
|
|
|
|
uses reusing %1, REG={CONDL_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with UCONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={CONDL_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={CONDL_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGT, %a}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
pat cmu tge $1==4 /* Unsigned second >= top */
|
|
|
|
with REG UCONST2
|
|
|
|
uses reusing %1, REG={CONDL_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with UCONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={CONDL_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={CONDL_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmu tlt $1==4 /* Unsigned second < top */
|
|
|
|
with REG UCONST2
|
|
|
|
uses reusing %1, REG={CONDL_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with UCONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={CONDL_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={CONDL_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLT, %a}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
pat cmu tle $1==4 /* Unsigned second <= top */
|
|
|
|
with REG UCONST2
|
|
|
|
uses reusing %1, REG={CONDL_RC, %1, %2.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with UCONST2 REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %2, REG={CONDL_RC, %2, %1.val}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={CONDL_RR, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
|
|
|
|
/* Simple branches */
|
|
|
|
|
|
|
|
proc zxx example zeq
|
|
|
|
with REG STACK
|
2007-11-02 18:56:58 +00:00
|
|
|
gen
|
2017-01-26 00:08:55 +00:00
|
|
|
test %1
|
|
|
|
bxx* {LABEL, $1}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
/* Pop signed int, branch if... */
|
|
|
|
pat zeq call zxx("beq") /* top == 0 */
|
|
|
|
pat zne call zxx("bne") /* top != 0 */
|
|
|
|
pat zgt call zxx("bgt") /* top > 0 */
|
|
|
|
pat zge call zxx("bge") /* top >= 0 */
|
|
|
|
pat zlt call zxx("blt") /* top < 0 */
|
|
|
|
pat zle call zxx("ble") /* top <= 0 */
|
|
|
|
|
|
|
|
/* The peephole optimizer rewrites
|
|
|
|
* cmi 4 zeq
|
|
|
|
* as beq, and does the same for bne, bgt, and so on.
|
|
|
|
*/
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
proc bxx example beq
|
|
|
|
with REG CONST2 STACK
|
2007-11-02 18:56:58 +00:00
|
|
|
gen
|
2017-12-08 00:24:09 +00:00
|
|
|
cmpwi %1, %2
|
2017-01-26 00:08:55 +00:00
|
|
|
bxx[2] {LABEL, $1}
|
|
|
|
with CONST2 REG STACK
|
|
|
|
gen
|
2017-12-08 00:24:09 +00:00
|
|
|
cmpwi %2, %1
|
2017-01-26 00:08:55 +00:00
|
|
|
bxx[1] {LABEL, $1}
|
|
|
|
with REG REG STACK
|
|
|
|
gen
|
|
|
|
cmpw %2, %1
|
|
|
|
bxx[1] {LABEL, $1}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
/* Pop two signed ints, branch if... */
|
|
|
|
pat beq call bxx("beq", "beq") /* second == top */
|
|
|
|
pat bne call bxx("bne", "bne") /* second != top */
|
|
|
|
pat bgt call bxx("bgt", "blt") /* second > top */
|
|
|
|
pat bge call bxx("bge", "ble") /* second >= top */
|
|
|
|
pat blt call bxx("blt", "bgt") /* second < top */
|
|
|
|
pat ble call bxx("ble", "bge") /* second <= top */
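The second name passed to bxx is the mirrored branch, used when the table emits the compare with its operands swapped (`cmpwi %1, %2` compares top against second). A small C sketch, with hypothetical helper names that are not part of ncg, checks that every pair above agrees: branching on (second, top) with the first name gives the same answer as branching on (top, second) with the second name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: would the PowerPC branch `cond` be taken after
   "cmpw a, b" (signed compare of a against b)? */
static int branch_taken(const char *cond, int32_t a, int32_t b)
{
    if (strcmp(cond, "beq") == 0) return a == b;
    if (strcmp(cond, "bne") == 0) return a != b;
    if (strcmp(cond, "bgt") == 0) return a > b;
    if (strcmp(cond, "bge") == 0) return a >= b;
    if (strcmp(cond, "blt") == 0) return a < b;
    if (strcmp(cond, "ble") == 0) return a <= b;
    assert(!"unknown branch condition");
    return 0;
}

/* The (bxx[1], bxx[2]) pairs passed by "pat beq" ... "pat ble". */
static const char *mirror(const char *cond)
{
    static const char *const pairs[][2] = {
        {"beq", "beq"}, {"bne", "bne"}, {"bgt", "blt"},
        {"bge", "ble"}, {"blt", "bgt"}, {"ble", "bge"},
    };
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
        if (strcmp(pairs[i][0], cond) == 0)
            return pairs[i][1];
    return NULL;
}
```

For example, `bgt` with second = 5 and top = 2 is taken; the swapped compare puts 2 first, and `blt` on (2, 5) is taken as well.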
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
proc cmu4zxx example cmu zeq
|
|
|
|
with REG CONST2 STACK
|
2007-11-02 18:56:58 +00:00
|
|
|
gen
|
2017-12-08 00:24:09 +00:00
|
|
|
cmplwi %1, %2
|
2017-01-26 00:08:55 +00:00
|
|
|
bxx[2] {LABEL, $2}
|
|
|
|
with CONST2 REG STACK
|
|
|
|
gen
|
2017-12-08 00:24:09 +00:00
|
|
|
cmplwi %2, %1
|
2017-01-26 00:08:55 +00:00
|
|
|
bxx[1] {LABEL, $2}
|
|
|
|
with REG REG STACK
|
|
|
|
gen
|
|
|
|
cmplw %2, %1
|
|
|
|
bxx[1] {LABEL, $2}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
/* Pop two unsigned ints, branch if... */
|
|
|
|
pat cmu zeq $1==4 call cmu4zxx("beq", "beq")
|
|
|
|
pat cmu zne $1==4 call cmu4zxx("bne", "bne")
|
|
|
|
pat cmu zgt $1==4 call cmu4zxx("bgt", "blt")
|
|
|
|
pat cmu zge $1==4 call cmu4zxx("bge", "ble")
|
|
|
|
pat cmu zlt $1==4 call cmu4zxx("blt", "bgt")
|
|
|
|
pat cmu zle $1==4 call cmu4zxx("ble", "bge")
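cmu4zxx has the same shape as bxx but uses the logical compares cmplw and cmplwi, which order their operands as unsigned 32-bit words. A C sketch (function names are illustrative) of why the unsigned patterns cannot reuse cmpw: the bit pattern 0xFFFFFFFF is below 1 when read as a signed int and above it when read as unsigned.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit word as two's complement without relying on
   implementation-defined unsigned-to-signed conversion. */
static int32_t as_signed(uint32_t w)
{
    if (w <= (uint32_t)INT32_MAX)
        return (int32_t)w;
    return -(int32_t)~w - 1;
}

/* Sign of the comparison cmpw would make: -1, 0 or +1. */
static int cmp_signed(uint32_t second, uint32_t top)
{
    int32_t s = as_signed(second), t = as_signed(top);
    return (s > t) - (s < t);
}

/* Sign of the comparison cmplw would make: -1, 0 or +1. */
static int cmp_unsigned(uint32_t second, uint32_t top)
{
    return (second > top) - (second < top);
}
```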
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-01-26 00:08:55 +00:00
|
|
|
/* Comparisons */
|
|
|
|
|
|
|
|
/* Each comparison extracts the lt and gt bits from cr0.
|
|
|
|
* extlwi %a, %a, 2, 0
|
|
|
|
* puts lt in the sign bit, so lt yields a negative result,
|
|
|
|
* gt yields positive.
|
|
|
|
* rlwinm %a, %a, 1, 31, 0
|
|
|
|
* puts gt in the sign bit, to reverse the comparison.
|
|
|
|
*/
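The bit manipulation described in this comment can be checked with a small C model. It assumes, as the comment implies, that each COND_* token leaves a copy of the condition register in %a, with cr0 (LT, GT, EQ, SO) in the top four bits and the other CR fields below it; the helper names are illustrative, not part of ncg or any library. `extlwi %a, %a, 2, 0` is the extended mnemonic for `rlwinm %a, %a, 0, 0, 1`.

```c
#include <assert.h>
#include <stdint.h>

/* cr0 lives in the top nibble of the CR image (IBM bits 0..3). */
#define CR0_LT 0x80000000u
#define CR0_GT 0x40000000u
#define CR0_EQ 0x20000000u

/* Signed compare into cr0; `others` stands for whatever SO and
   cr1..cr7 happen to hold, which must not leak into the result. */
static uint32_t cmpw_cr(int32_t a, int32_t b, uint32_t others)
{
    uint32_t cr0 = a < b ? CR0_LT : a > b ? CR0_GT : CR0_EQ;
    return cr0 | (others & 0x1FFFFFFFu);
}

static uint32_t rotlw(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* rlwinm mask: IBM bits mb..me inclusive, wrapping when mb > me. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t m = 0;
    unsigned i = mb;

    assert(mb < 32 && me < 32);
    for (;;) {
        m |= 0x80000000u >> i;
        if (i == me)
            break;
        i = (i + 1) & 31;
    }
    return m;
}

static uint32_t rlwinm(uint32_t x, unsigned sh, unsigned mb, unsigned me)
{
    return rotlw(x, sh) & ppc_mask(mb, me);
}

/* Sign of a word read as two's complement: -1, 0 or +1. */
static int word_sign(uint32_t w)
{
    if (w & 0x80000000u)
        return -1;
    return w != 0;
}

/* "with REG REG": compare second with top, then extlwi %a, %a, 2, 0. */
static int cmi_rr(int32_t second, int32_t top, uint32_t others)
{
    return word_sign(rlwinm(cmpw_cr(second, top, others), 0, 0, 1));
}

/* "with REG CONST2": compare top with second, then rlwinm %a, %a, 1, 31, 0
   moves GT into the sign bit and LT into bit 31, reversing the result. */
static int cmi_rc(int32_t second, int32_t top, uint32_t others)
{
    return word_sign(rlwinm(cmpw_cr(top, second, others), 1, 31, 0));
}
```

Both paths should give a negative word exactly when second < top, whatever the other CR fields contain.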
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat cmi $1==4 /* Signed tristate compare */
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG CONST2
|
|
|
|
uses reusing %1, REG={COND_RC, %1, %2.val}
|
2017-12-08 00:24:09 +00:00
|
|
|
gen rlwinm %a, %a, {C, 1}, {C, 31}, {C, 0}
|
2017-01-26 00:08:55 +00:00
|
|
|
yields %a
|
|
|
|
with CONST2 REG
|
|
|
|
uses reusing %2, REG={COND_RC, %2, %1.val}
|
2017-12-08 00:24:09 +00:00
|
|
|
gen extlwi %a, %a, {C, 2}, {C, 0}
|
2017-01-26 00:08:55 +00:00
|
|
|
yields %a
|
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={COND_RR, %2, %1}
|
2017-12-08 00:24:09 +00:00
|
|
|
gen extlwi %a, %a, {C, 2}, {C, 0}
|
2017-01-26 00:08:55 +00:00
|
|
|
yields %a
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat cmu $1==4 /* Unsigned tristate compare */
|
2017-01-26 00:08:55 +00:00
|
|
|
with REG UCONST2
|
|
|
|
uses reusing %1, REG={CONDL_RC, %1, %2.val}
|
2017-12-08 00:24:09 +00:00
|
|
|
gen rlwinm %a, %a, {C, 1}, {C, 31}, {C, 0}
|
2017-01-26 00:08:55 +00:00
|
|
|
yields %a
|
|
|
|
with UCONST2 REG
|
|
|
|
uses reusing %2, REG={CONDL_RC, %2, %1.val}
|
2017-12-08 00:24:09 +00:00
|
|
|
gen extlwi %a, %a, {C, 2}, {C, 0}
|
2017-01-26 00:08:55 +00:00
|
|
|
yields %a
|
|
|
|
with REG REG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, REG={CONDL_RR, %2, %1}
|
2017-12-08 00:24:09 +00:00
|
|
|
gen extlwi %a, %a, {C, 2}, {C, 0}
|
2017-01-26 00:08:55 +00:00
|
|
|
yields %a
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat cmp /* Compare pointers */
|
|
|
|
leaving
|
2017-12-23 02:18:58 +00:00
|
|
|
cmu 4
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat cms $1==4 /* Compare blocks (word sized) */
|
2007-11-02 18:56:58 +00:00
|
|
|
leaving
|
2017-12-23 02:18:58 +00:00
|
|
|
cmi 4
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2016-12-10 17:23:07 +00:00
|
|
|
pat cms defined($1)
|
2017-02-13 21:52:32 +00:00
|
|
|
leaving
|
|
|
|
loc $1
|
|
|
|
cal ".cms"
|
|
|
|
|
|
|
|
pat cms !defined($1)
|
|
|
|
leaving
|
|
|
|
cal ".cms"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
/* Other branching and labelling */
|
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
/* During an unconditional jump, if the top element on the
|
|
|
|
* stack has 4 bytes, then we hold it in register r3.
|
|
|
|
*/
|
2007-11-02 18:56:58 +00:00
|
|
|
pat lab topeltsize($1)==4 && !fallthrough($1)
|
2017-01-24 16:26:35 +00:00
|
|
|
kills ALL
|
2017-12-23 02:18:58 +00:00
|
|
|
gen labeldef $1
|
|
|
|
yields r3
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat lab topeltsize($1)==4 && fallthrough($1)
|
2017-02-18 00:32:27 +00:00
|
|
|
with REG3 STACK
|
2017-12-23 02:18:58 +00:00
|
|
|
kills ALL
|
|
|
|
gen labeldef $1
|
|
|
|
yields r3
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat lab topeltsize($1)!=4 /* Label without r3 */
|
2007-11-02 18:56:58 +00:00
|
|
|
with STACK
|
2017-12-23 02:18:58 +00:00
|
|
|
kills ALL
|
|
|
|
gen labeldef $1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat bra topeltsize($1)==4 /* Branch with r3 */
|
2017-02-18 00:32:27 +00:00
|
|
|
with REG3 STACK
|
2017-12-23 02:18:58 +00:00
|
|
|
gen b {LABEL, $1}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat bra topeltsize($1)!=4 /* Branch without r3 */
|
2007-11-02 18:56:58 +00:00
|
|
|
with STACK
|
2017-12-23 02:18:58 +00:00
|
|
|
gen b {LABEL, $1}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
/* Miscellaneous */
|
|
|
|
|
|
|
|
pat cal /* Call procedure */
|
|
|
|
with STACK
|
|
|
|
kills ALL
|
2017-12-23 02:18:58 +00:00
|
|
|
gen bl {LABEL, $1}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
|
|
|
pat cai /* Call procedure indirect */
|
Trimming mach/powerpc/ncg/table
Remove coercion from LABEL to REG. The coercion never happens because
I have stopped putting LABEL on the stack. Also remove LABEL from set
ANY_BHW. Retain the move from LABEL to REG because pat gto uses it.
Remove li32 instruction, unused after the switch to the hi16, ha16,
lo16 syntax.
Remove COMMENT(...) lines from most moves. In my opinion, they took
too much space, both in the table and in the assembly output. The
stacking rules and coercions keep their COMMENT(...) lines.
In test GPR, don't write to RSCRATCH.
Fold several coercions into a single coercion from ANY_BHW uses REG.
Use REG instead of GPR in stack patterns. REG and GPR act the same,
because every GPR on the stack is a REG, but I want to be clear that I
expect a REG, not r0.
In code rules, sort SUM_RC before SUM_RR, so I can add SUM_RL later.
Remove rules to optimize loc loc cii loc loc cii. If $2==$4, the
peephole optimizer can optimize it. If $2!=$4, then the EM program is
missing a conversion from size $2 to size $4.
Remove rules to store a SEX_B with sti 1 or a SEX_H with sti 2. These
rules would never get used, unless the EM program is missing a
conversion from size 4 to size 1 or 2.
2017-02-08 17:27:16 +00:00
|
|
|
with REG STACK
|
2007-11-02 18:56:58 +00:00
|
|
|
kills ALL
|
|
|
|
gen
|
2017-02-13 22:44:46 +00:00
|
|
|
mtspr ctr, %1
|
2017-01-26 00:08:55 +00:00
|
|
|
bctrl.
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat lfr $1==4 /* Load function result, word */
|
2017-02-13 22:44:46 +00:00
|
|
|
yields r3
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat lfr $1==8 /* Load function result, double-word */
|
2017-02-13 22:44:46 +00:00
|
|
|
yields r4 r3
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat ret $1==0 /* Return from procedure */
|
|
|
|
gen
|
2017-02-17 02:18:39 +00:00
|
|
|
/* Restore saved registers. */
|
2007-11-02 18:56:58 +00:00
|
|
|
return
|
2017-02-17 02:18:39 +00:00
|
|
|
/* Epilog: restore lr and fp. */
|
|
|
|
lwz r0, {IND_RC_W, fp, 4}
|
|
|
|
mtspr lr, r0
|
|
|
|
lwz r0, {IND_RC_W, fp, 0}
|
|
|
|
/* Free our stack frame. */
|
2017-12-08 00:24:09 +00:00
|
|
|
addi sp, fp, {C, 8}
|
2017-02-17 02:18:39 +00:00
|
|
|
mr fp, r0
|
|
|
|
blr.
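The `ret 0` epilog can be modeled in C under stated assumptions: memory is a word array indexed by address / 4, and FP_OFFSET, PC_OFFSET and EM_BSIZE take the values defined at the top of this table (0, 4 and 8), which match the literal offsets in the gen lines. The struct and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define FP_OFFSET 0   /* saved caller fp */
#define PC_OFFSET 4   /* saved return address */
#define EM_BSIZE  8   /* two words saved in call frame */

struct regs { uint32_t fp, sp, lr; };

/* Steps of "pat ret $1==0"; mem[] holds 32-bit words at addr / 4. */
static void ret0_epilog(struct regs *r, const uint32_t *mem)
{
    uint32_t r0;

    assert(r->fp % 4 == 0);
    r0 = mem[(r->fp + PC_OFFSET) / 4];   /* lwz r0, 4(fp) */
    r->lr = r0;                          /* mtspr lr, r0 */
    r0 = mem[(r->fp + FP_OFFSET) / 4];   /* lwz r0, 0(fp) */
    r->sp = r->fp + EM_BSIZE;            /* addi sp, fp, 8 */
    r->fp = r0;                          /* mr fp, r0 */
    /* blr would now jump to r->lr */
}
```

The old fp goes through r0 rather than straight into fp because `addi sp, fp, 8` still needs the current fp.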
|
|
|
|
|
2018-01-05 22:55:50 +00:00
|
|
|
/* If "ret" coerces STACK to REG3, then top will delete the
|
|
|
|
* extra "addi sp, sp, 4".
|
|
|
|
*/
|
|
|
|
|
2017-02-17 02:18:39 +00:00
|
|
|
pat ret $1==4 /* Return from procedure, word */
|
2017-02-18 00:32:27 +00:00
|
|
|
with REG3
|
2017-02-17 02:18:39 +00:00
|
|
|
leaving ret 0
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-02-17 02:18:39 +00:00
|
|
|
pat ret $1==8 /* Return from proc, double-word */
|
2017-12-23 01:37:39 +00:00
|
|
|
with REG3 INT_W
|
2017-02-18 00:32:27 +00:00
|
|
|
gen move %2, r4
|
|
|
|
leaving ret 0
|
2017-12-23 01:37:39 +00:00
|
|
|
with REG3 STACK
|
|
|
|
gen lwz r4, {IND_RC_W, sp, 0}
|
|
|
|
leaving ret 0
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-10-18 16:12:42 +00:00
|
|
|
/*
|
|
|
|
* These rules for blm/bls are wrong if length is zero.
|
|
|
|
* So are several procedures in libem.
|
|
|
|
*/
|
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat blm /* Block move constant length */
|
2017-02-12 00:30:12 +00:00
|
|
|
leaving
|
|
|
|
loc $1
|
|
|
|
bls
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat bls /* Block move variable length */
|
2018-01-23 23:18:40 +00:00
|
|
|
with REG SPFP+REG SPFP+REG
|
|
|
|
/* allows sp as %2, %3 */
|
2017-10-18 16:12:42 +00:00
|
|
|
/* ( src%3 dst%2 len%1 -- ) */
|
|
|
|
uses reusing %1, REG, REG, REG
|
2007-11-02 18:56:58 +00:00
|
|
|
gen
|
2017-12-08 00:24:09 +00:00
|
|
|
srwi %a, %1, {C, 2}
|
2017-10-18 16:12:42 +00:00
|
|
|
mtspr ctr, %a
|
2017-12-08 00:24:09 +00:00
|
|
|
addi %b, %3, {C, 0-4}
|
|
|
|
addi %c, %2, {C, 0-4}
|
2017-10-18 16:12:42 +00:00
|
|
|
1: lwzu %a, {IND_RC_W, %b, 4}
|
|
|
|
stwu %a, {IND_RC_W, %c, 4}
|
2017-02-12 00:30:12 +00:00
|
|
|
bdnz {LABEL, "1b"}
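The bls loop runs its body once per word, and bdnz decrements ctr before testing it. A C sketch of the iteration count and of the copy (names are hypothetical; indices replace the pre-decremented pointers so the model stays inside its arrays) makes the zero-length problem noted above concrete: srwi leaves ctr = 0 for any length below 4, and bdnz then wraps ctr around, so the body would run 2^32 times.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Iterations of "1: lwzu; stwu; bdnz 1b" after "srwi %a, len, 2; mtspr ctr, %a".
   bdnz decrements ctr, then branches if the new value is nonzero, so a
   starting ctr of 0 wraps to 0xFFFFFFFF and the body runs 2^32 times. */
static uint64_t bls_iterations(uint32_t len)
{
    uint32_t ctr = len >> 2;
    return ctr != 0 ? ctr : (uint64_t)1 << 32;
}

/* Same word-by-word copy, with the zero-count guard the table lacks. */
static void bls_copy(uint32_t *dst, const uint32_t *src, uint32_t len)
{
    uint32_t ctr = len >> 2;   /* srwi: any trailing bytes are not copied */
    size_t i = 0;

    if (ctr == 0)
        return;
    assert(dst != NULL && src != NULL);
    do {
        dst[i] = src[i];       /* lwzu %a, 4(%b) then stwu %a, 4(%c) */
        i++;
    } while (--ctr != 0);      /* bdnz 1b */
}
```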
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat csa /* Array-lookup switch */
|
2016-11-19 09:55:41 +00:00
|
|
|
with STACK
|
2017-02-13 21:38:26 +00:00
|
|
|
kills ALL
|
2017-12-23 02:18:58 +00:00
|
|
|
gen b {LABEL, ".csa"}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat csb /* Table-lookup switch */
|
2016-11-19 09:55:41 +00:00
|
|
|
with STACK
|
2017-02-13 21:38:26 +00:00
|
|
|
kills ALL
|
2017-12-23 02:18:58 +00:00
|
|
|
gen b {LABEL, ".csb"}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
/* EM specials */
|
|
|
|
|
|
|
|
pat fil /* Set current filename */
|
|
|
|
leaving
|
|
|
|
lae $1
|
2016-09-30 17:40:36 +00:00
|
|
|
ste "hol0+4"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat lin /* Set current line number */
|
|
|
|
leaving
|
|
|
|
loc $1
|
2016-09-30 17:40:36 +00:00
|
|
|
ste "hol0"
|
2007-11-02 18:56:58 +00:00
|
|
|
|
|
|
|
pat lni /* Increment line number */
|
2017-12-23 02:18:58 +00:00
|
|
|
leaving ine "hol0"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat lim /* Load EM trap ignore mask */
|
2018-01-05 01:40:35 +00:00
|
|
|
leaving loe ".ignmask"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat sim /* Store EM trap ignore mask */
|
2017-12-23 02:18:58 +00:00
|
|
|
leaving ste ".ignmask"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-25 03:37:52 +00:00
|
|
|
pat sig /* Set trap handler, yield old */
|
|
|
|
leaving
|
|
|
|
loe ".trppc"
|
|
|
|
exg 4
|
|
|
|
ste ".trppc"
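The `sig` rewrite relies on EM's stack order. The new handler is already on the stack, `loe ".trppc"` pushes the old one on top of it, `exg 4` swaps the two words, and `ste ".trppc"` pops the new handler into .trppc, leaving the old one as the result. In the C sketch below, a plain variable stands in for the .trppc word; that is an assumption of the model, not a description of the runtime.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t trppc;   /* stand-in for the ".trppc" word */

/* Net effect of: loe ".trppc"; exg 4; ste ".trppc". */
static uint32_t sig(uint32_t new_handler)
{
    uint32_t old = trppc;   /* loe: old handler pushed above the new one */
                            /* exg 4: new handler now on top */
    trppc = new_handler;    /* ste: pop the new handler into .trppc */
    return old;             /* old handler remains on the stack */
}
```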
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2018-01-05 01:40:35 +00:00
|
|
|
pat trp /* Raise EM trap */
|
|
|
|
with REG3
|
|
|
|
kills ALL
|
|
|
|
gen bl {LABEL, ".trp"}
|
|
|
|
|
2007-11-02 18:56:58 +00:00
|
|
|
pat rtt /* Return from trap */
|
2017-12-23 02:18:58 +00:00
|
|
|
leaving ret 0
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2018-01-05 01:40:35 +00:00
|
|
|
pat rck $1==4 /* Range check */
|
|
|
|
leaving cal ".rck"
|
|
|
|
|
2017-12-22 22:04:16 +00:00
|
|
|
/* Our caller's local base, "lxl 0 dch", appears in
|
|
|
|
* lang/cem/libcc.ansi/setjmp/setjmp.e, lang/m2/libm2/par_misc.e
|
2017-02-14 04:22:31 +00:00
|
|
|
*/
|
2017-12-22 22:04:16 +00:00
|
|
|
pat lxl dch $1==0
|
|
|
|
yields {IND_RC_W, fp, FP_OFFSET}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-14 04:22:31 +00:00
|
|
|
pat dch /* Dynamic chain: LB -> caller's LB */
|
2017-02-08 17:27:16 +00:00
|
|
|
with REG
|
2017-02-14 04:22:31 +00:00
|
|
|
yields {IND_RC_W, %1, FP_OFFSET}
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-02-14 04:22:31 +00:00
|
|
|
pat lpb /* LB -> argument base */
|
2017-12-23 02:18:58 +00:00
|
|
|
leaving adp EM_BSIZE
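dch and lpb read the same frame layout: the caller's LB is the word at LB + FP_OFFSET, and the arguments start EM_BSIZE bytes above LB, past the saved fp and return address. The C sketch below uses illustrative names and again treats memory as a word array indexed by address / 4; it also shows a walk up the dynamic chain, which is what repeated `dch` would do.

```c
#include <assert.h>
#include <stdint.h>

#define FP_OFFSET 0   /* saved caller fp (LB) within a frame */
#define EM_BSIZE  8   /* two words saved in call frame */

/* pat dch: yields {IND_RC_W, %1, FP_OFFSET}, the word at lb + 0. */
static uint32_t dch(const uint32_t *mem, uint32_t lb)
{
    assert(lb % 4 == 0);
    return mem[(lb + FP_OFFSET) / 4];
}

/* pat lpb: leaving adp EM_BSIZE. */
static uint32_t lpb(uint32_t lb)
{
    return lb + EM_BSIZE;
}

/* Follow the dynamic chain `depth` times. */
static uint32_t caller_lb(const uint32_t *mem, uint32_t lb, unsigned depth)
{
    while (depth-- > 0)
        lb = dch(mem, lb);
    return lb;
}
```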
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2018-01-03 19:51:14 +00:00
|
|
|
/* "gto" must preserve the function result for "lfr", so
|
|
|
|
* longjmp() can pass the return value to setjmp().
|
|
|
|
* - See lang/cem/libcc.ansi/setjmp/setjmp.e
|
|
|
|
*
|
|
|
|
* Must preserve r3 and r4, so no "uses REG".
|
|
|
|
* PowerPC can't add r0 + constant. Use r12.
|
|
|
|
*/
|
2007-11-02 18:56:58 +00:00
|
|
|
pat gto /* longjmp */
|
2017-02-13 21:38:26 +00:00
|
|
|
with STACK
|
|
|
|
gen
|
2018-01-03 19:51:14 +00:00
|
|
|
move {LABEL, $1}, r12
|
|
|
|
move {IND_RC_W, r12, 8}, fp
|
|
|
|
move {IND_RC_W, r12, 4}, sp
|
|
|
|
move {IND_RC_W, r12, 0}, r12
|
|
|
|
mtspr ctr, r12
|
2017-02-13 21:38:26 +00:00
|
|
|
bctr.
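Two details of gto can be modeled in C. First, the descriptor at the label holds three words, which the generated code reads as the new pc (offset 0), sp (offset 4) and fp (offset 8); the struct name below is illustrative, and its layout is read off the `move` lines above. Second, in PowerPC D-form instructions such as lwz and addi, a base-register field of 0 means the value 0 rather than the contents of r0, which is why the comment says PowerPC can't add r0 + constant and the table uses r12.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the words that gto loads through r12. */
struct gto_descriptor {
    uint32_t pc;   /* 0(r12): moved to ctr, then bctr jumps there */
    uint32_t sp;   /* 4(r12): new stack pointer */
    uint32_t fp;   /* 8(r12): new frame pointer */
};

/* Effective address of a D-form load or addi: base field 0 reads as 0. */
static uint32_t dform_ea(unsigned ra, const uint32_t gpr[32], int16_t d)
{
    assert(ra < 32);
    return (ra == 0 ? 0u : gpr[ra]) + (uint32_t)(int32_t)d;
}
```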
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-02-14 04:22:31 +00:00
|
|
|
pat lor $1==0 /* Load local base */
|
2017-12-22 22:04:16 +00:00
|
|
|
leaving lxl 0
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-14 04:22:31 +00:00
|
|
|
pat lor $1==1 /* Load stack pointer */
|
2017-12-22 22:04:16 +00:00
|
|
|
with STACK
|
2018-01-23 23:18:40 +00:00
|
|
|
yields sp
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2018-01-05 01:40:35 +00:00
|
|
|
/* Next few patterns for "lor 1" appear in
|
|
|
|
* lang/m2/libm2/par_misc.e
|
|
|
|
*/
|
|
|
|
pat lor adp $1==1 && smalls($2) /* sp + constant */
|
|
|
|
with STACK
|
2018-01-23 23:18:40 +00:00
|
|
|
yields {SUM_RC, sp, $2}
|
2018-01-05 01:40:35 +00:00
|
|
|
|
|
|
|
/* Subtract stack pointer by doing %1 - (sp - 4)
|
|
|
|
* because sp - 4 would point to %1.
|
|
|
|
*/
|
|
|
|
pat lor sbs loc adu $1==1 && $2==4 && $4==4
|
|
|
|
with REG STACK
|
|
|
|
uses reusing %1, REG
|
|
|
|
gen subf %a, sp, %1
|
|
|
|
yields %a
|
|
|
|
leaving loc $3+4 adu 4
|
|
|
|
pat lor sbs $1==1 && $2==4
|
|
|
|
with REG STACK
|
|
|
|
uses reusing %1, REG
|
|
|
|
gen subf %a, sp, %1
|
|
|
|
yields {SUM_RC, %a, 4}
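The arithmetic behind the two `lor sbs` patterns, sketched in C with illustrative function names. PowerPC `subf rD, rA, rB` computes rB - rA, so `subf %a, sp, %1` gives %1 - sp, and adding 4 gives %1 - (sp - 4), as the comment describes. All arithmetic wraps modulo 2^32, like the registers.

```c
#include <assert.h>
#include <stdint.h>

/* subf rD, rA, rB: rD = rB - rA (mod 2^32). */
static uint32_t subf(uint32_t ra, uint32_t rb)
{
    return rb - ra;
}

/* pat lor sbs: gen subf %a, sp, %1; yields {SUM_RC, %a, 4}. */
static uint32_t lor_sbs(uint32_t reg1, uint32_t sp)
{
    return subf(sp, reg1) + 4;
}

/* pat lor sbs loc adu: yields %a, then "loc $3+4 adu 4" adds k + 4. */
static uint32_t lor_sbs_loc_adu(uint32_t reg1, uint32_t sp, uint32_t k)
{
    return subf(sp, reg1) + (k + 4);
}
```

For instance, a pointer %1 equal to sp - 4 gives 0, and the `loc adu` variant simply folds the extra constant into the same addition.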
|
|
|
|
|
2017-02-14 04:22:31 +00:00
|
|
|
pat str $1==0 /* Store local base */
|
2018-01-05 01:40:35 +00:00
|
|
|
with INT_W
|
2017-12-22 22:04:16 +00:00
|
|
|
gen move %1, fp
|
2018-01-05 01:40:35 +00:00
|
|
|
with STACK
|
|
|
|
gen
|
|
|
|
lwz fp, {IND_RC_W, sp, 0}
|
|
|
|
addi sp, sp, {C, 4}
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-14 04:22:31 +00:00
|
|
|
pat str $1==1 /* Store stack pointer */
|
2018-01-05 01:40:35 +00:00
|
|
|
with INT_W
|
|
|
|
kills ALL
|
2017-12-22 22:04:16 +00:00
|
|
|
gen move %1, sp
|
2018-01-05 01:40:35 +00:00
|
|
|
with STACK
|
|
|
|
kills ALL
|
|
|
|
gen lwz sp, {IND_RC_W, sp, 0}
|
2017-01-15 21:28:14 +00:00
|
|
|
|
|
|
|
|
Add floating-point register variables to PowerPC ncg.
Use f14 to f31 as register variables for 8-byte double-precision.
There are no regvars for 4-byte single precision, because all
regvar(reg_float) must have the same size. I expect more programs to
prefer 8-byte double precision.
Teach mach/powerpc/ncg/mach.c to emit stfd and lfd instructions to
save and restore 8-byte regvars. Delay emitting the function prolog
until f_regsave(), so we can use one addi to make stack space for both
local vars and saved registers. Be more careful with types in mach.c;
don't assume that int and long and full are the same.
In ncg table, add f14 to f31 as register variables, and some rules to
use them. Add rules to put the result of fadd, fsub, fmul, fdiv, fneg
in a regvar. Without such rules, the result would go in a scratch
FREG, and we would need fmr to move it to the regvar. Also add a rule
for pat sdl inreg($1)==reg_float with STACK, so we can unstack the
value directly into the regvar, again without a scratch FREG and fmr.
Edit util/ego/descr/powerpc.descr to tell ego about the new float
regvars. This might not be working right; ego usually decides against
using any float regvars, so ack -O1 (not running ego) uses the
regvars, but ack -O4 (running ego) doesn't use the regvars.
Beware that ack -mosxppc runs ego using powerpc.descr but -mlinuxppc
and -mqemuppc run ego without a config file (since 8ef7c31). I am
testing powerpc.descr with a local edit to plat/linuxppc/descr to run
ego with powerpc.descr there, but I did not commit my local edit.
2017-02-16 00:34:07 +00:00
|
|
|
/* Single-precision floating-point */
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat zrf $1==4 /* Push zero */
|
2017-12-25 03:37:52 +00:00
|
|
|
leaving loe ".fs_00000000"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-10-17 21:53:03 +00:00
|
|
|
pat adf $1==4 /* Add single */
|
Enable the Hall check again, and get powerpc to pass it.
Upon enabling the check, mach/powerpc/ncg/table fails to build as ncgg
gives many errors of "Previous rule impossible on empty stack". David
Given reported this problem in 2013:
https://sourceforge.net/p/tack/mailman/message/30814694/
Commit c93cb69 commented out the error in util/ncgg/cgg.y to disable
the Hall check. This commit enables it again. In ncgg, the Hall
check is checking that a rule is possible with an empty fake stack.
It would be possible if ncg can coerce the values from the real stack
to the fake stack. The powerpc table defined coercions from STACK to
{FS, %a} and {FD, %a}, but the Hall check didn't understand the
coercions and rejected each rule "with FS" or "with FD".
This commit removes the FS and FD tokens and adds a new group of FSREG
registers for single-precision floats, while keeping FREG registers
for double precision. The registers overlap, with each FSREG
containing one FREG, because it is the same register in PowerPC
hardware. FS tokens become FSREG registers and FD tokens become FREG
registers. The Hall check understands the coercions from STACK to
FSREG and FREG. The idea to define separate but overlapping registers
comes from the PDP-11 table (mach/pdp/ncg/table).
This commit also removes F0 from the FREG group. This is my attempt
to keep F0 off the fake stack, because one of the stacking rules uses
F0 as a scratch register (FSCRATCH).
2016-09-18 19:08:55 +00:00
|
|
|
with FSREG FSREG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, FSREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fadds %a, %2, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
2017-10-17 21:53:03 +00:00
|
|
|
pat adf stl $1==4 && inreg($2)==reg_float
|
|
|
|
with FSREG FSREG
|
|
|
|
gen fadds {LOCAL, $2}, %2, %1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-10-17 21:53:03 +00:00
|
|
|
pat sbf $1==4 /* Subtract single */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FSREG FSREG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, FSREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fsubs %a, %2, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
2017-10-17 21:53:03 +00:00
|
|
|
pat sbf stl $1==4 && inreg($2)==reg_float
|
|
|
|
with FSREG FSREG
|
|
|
|
gen fsubs {LOCAL, $2}, %2, %1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-10-17 21:53:03 +00:00
|
|
|
pat mlf $1==4 /* Multiply single */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FSREG FSREG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, FSREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fmuls %a, %2, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
2017-10-17 21:53:03 +00:00
|
|
|
pat mlf stl $1==4 && inreg($2)==reg_float
|
|
|
|
with FSREG FSREG
|
|
|
|
gen fmuls {LOCAL, $2}, %2, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat dvf $1==4 /* Divide single */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FSREG FSREG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, FSREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fdivs %a, %2, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
2017-10-17 21:53:03 +00:00
|
|
|
pat dvf stl $1==4 && inreg($2)==reg_float
|
|
|
|
with FSREG FSREG
|
|
|
|
gen fdivs {LOCAL, $2}, %2, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat ngf $1==4 /* Negate single */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FSREG
|
|
|
|
uses reusing %1, FSREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fneg %a, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
2017-10-17 21:53:03 +00:00
|
|
|
pat ngf stl $1==4 && inreg($2)==reg_float
|
|
|
|
with FSREG
|
|
|
|
gen fneg {LOCAL, $2}, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
|
Add fef 4, fif 4. Improve fef 8, fif 8. Other float changes.
When I wrote fef 8, I forgot to test denormalized numbers. Oops. Now
fix two of my mistakes:
- When checking for zero, `extrwi r6, r3, 22, 12` needs to be
`extrwi r6, r3, 20, 12`. There are only 20 bits to extract.
- After the multiplication by 2**64, I forgot to put the fraction in
[0.5, 1) or (-1, -0.5] by setting IEEE exponent = 1022.
Teach fif 8 about signed zero and NaN.
In ncg/table, change cmf so NaN is not equal to any value, and comment
why ordered comparisons don't work with NaN. Also add cost for
fctwiz, remove extra `uses REG`.
Edit comment in cfu8.s because the conditional branch might be before
or after fctwiz.
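The fixed fef 8 steps can be modelled in C, as a sketch only: sketch_fef8 is a hypothetical name, and this is not the assembly in .fef8. It assumes IEEE 754 binary64 doubles. A zero or denormal has IEEE exponent 0, and the 52 fraction bits (20 in the high word, 32 in the low word) tell the two apart. A denormal is multiplied by 2**64 to make it normal, then the IEEE exponent is set to 1022 so the fraction's magnitude lands in [0.5, 1). For finite inputs the result should agree with C's frexp.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

#define EXP_MASK  (UINT64_C(0x7ff) << 52)      /* 11-bit IEEE exponent */
#define FRAC_MASK UINT64_C(0x000fffffffffffff)  /* 52 fraction bits */

/* Hypothetical model of fef 8: returns f and sets *e so that
 * x == f * 2***e with |f| in [0.5, 1). Zero (including -0.0),
 * infinity and NaN are returned unchanged with *e = 0. */
static double sketch_fef8(double x, int *e)
{
    uint64_t bits;
    int adjust = 0;

    memcpy(&bits, &x, sizeof bits);
    int biased = (int)((bits & EXP_MASK) >> 52);

    if (biased == 0x7ff) {               /* infinity or NaN */
        *e = 0;
        return x;
    }
    if (biased == 0) {                   /* zero or denormal */
        if ((bits & FRAC_MASK) == 0) {   /* all 20 + 32 fraction bits clear */
            *e = 0;
            return x;
        }
        x *= 0x1p64;                     /* make the denormal normal */
        adjust = 64;
        memcpy(&bits, &x, sizeof bits);
        biased = (int)((bits & EXP_MASK) >> 52);
    }
    *e = biased - 1022 - adjust;
    /* Keep the sign and fraction bits; set the IEEE exponent to 1022. */
    bits = (bits & ~EXP_MASK) | (UINT64_C(1022) << 52);
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

For example, the smallest denormal 2**-1074 splits into 0.5 and -1073, the same as frexp.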
2018-01-22 19:04:15 +00:00
|
|
|
/* When a or b is NaN, then a < b, a <= b, a > b, a >= b
|
|
|
|
* should all be false. We can't make them false, because
|
|
|
|
* - EM's _cmf_ is only for ordered comparisons.
|
|
|
|
* - The peephole optimizer assumes (a < b) == !(a >= b).
|
|
|
|
*
|
|
|
|
* We do make a == b false and a != b true, by checking the
|
|
|
|
* eq (equal) bit or un (unordered) bit in cr0.
|
|
|
|
*/
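C follows the same IEEE 754 rules, so a short C program can show both facts from the comment above: with a NaN operand, a < b and a >= b are both false (so the peephole rule (a < b) == !(a >= b) fails), while a == b is false and a != b is true. This sketch only demonstrates C's comparison operators, assuming no fast-math compiler options; it does not model ncg.

```c
#include <math.h>
#include <stdbool.h>

/* Each helper evaluates one C comparison operator on doubles. */
static bool less(double a, double b)       { return a < b; }
static bool greater_eq(double a, double b) { return a >= b; }
static bool equal(double a, double b)      { return a == b; }
static bool not_equal(double a, double b)  { return a != b; }

/* True when the peephole identity (a < b) == !(a >= b) holds. */
static bool peephole_identity_holds(double a, double b)
{
    return less(a, b) == !greater_eq(a, b);
}
```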
|
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat cmf $1==4 /* Compare single */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FSREG FSREG
|
2017-01-26 00:08:55 +00:00
|
|
|
uses REG={COND_FS, %2, %1}
|
2018-01-22 19:04:15 +00:00
|
|
|
/* Extract lt, gt, un; put lt in sign bit. */
|
|
|
|
gen andisX %a, %a, {C, 0xd000}
|
2017-01-26 00:08:55 +00:00
|
|
|
yields %a
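A C model of the masking in the rule above, as a sketch. It assumes the COND_FS token becomes fcmpo cr0 followed by mfcr, so the four cr0 bits lt, gt, eq, un land in the top four bits of the word (PowerPC numbers bit 0 as the most significant). The helper names and macros are hypothetical. andisX with 0xd000 applies the mask 0xd0000000, keeping lt, gt, un and clearing eq. Read as a signed word, the result is negative for lt, zero only for eq, and positive for gt or un, so a NaN operand never compares equal.

```c
#include <math.h>
#include <stdint.h>

/* cr0 bits after fcmpo and mfcr; bit 0 is the most significant. */
#define CR0_LT 0x80000000u
#define CR0_GT 0x40000000u
#define CR0_EQ 0x20000000u
#define CR0_UN 0x10000000u

/* andis. shifts its 16-bit immediate left by 16: 0xd000 -> 0xd0000000. */
#define CMF_MASK ((uint32_t)0xd000u << 16)

/* Model of the cr0 field that fcmpo sets when comparing a with b. */
static uint32_t model_fcmpo(float a, float b)
{
    if (isnan(a) || isnan(b))
        return CR0_UN;
    if (a < b)
        return CR0_LT;
    if (a > b)
        return CR0_GT;
    return CR0_EQ;
}

/* Sign of the masked word when read as a signed 32-bit integer:
 * -1 if lt (sign bit), 0 if eq, 1 if gt or un. */
static int model_cmf4_sign(float second, float top)
{
    uint32_t word = model_fcmpo(second, top) & CMF_MASK;

    if (word & CR0_LT)
        return -1;
    return word != 0;
}
```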
|
|
|
|
|
|
|
|
pat cmf teq $1==4 /* Single second == top */
|
|
|
|
with FSREG FSREG
|
|
|
|
uses REG={COND_FS, %2, %1}
|
Add more chances to put results in register variables.
When a rule `uses REG ... yields %a`, the result %a is always a
temporary, never a regvar. If the EM code uses _stl_ to put the
result in a regvar, then ncg emits _mr_ to move %a to the regvar.
There are two ways to put the result in the regvar without %a:
1. Yield a token, as in `yields {MUL_RR, %2, %1}`, so that _stl_
can move the token to the regvar without using %a.
2. Provide a pattern, like `sli stl`, that just puts the result
in `{LOCAL, $2}` and not %a.
Allow some tokens, like SUM_RIS and XEQ, onto the stack; and add
tokens like MUL_RR, and patterns like `sli stl`.
Delete patterns for `stl lol` and `sdl ldl` to avoid an extra
temporary %a when the local is a regvar. Delete `lal sti lal loi`
because it would emit wrong code.
2017-12-08 22:19:26 +00:00
|
|
|
yields {XEQ, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tne $1==4 /* Single second != top */
|
|
|
|
with FSREG FSREG
|
|
|
|
uses REG={COND_FS, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XNE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tgt $1==4 /* Single second > top */
|
|
|
|
with FSREG FSREG
|
|
|
|
uses REG={COND_FS, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tge $1==4 /* Single second >= top */
|
|
|
|
with FSREG FSREG
|
|
|
|
uses REG={COND_FS, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tlt $1==4 /* Single second < top */
|
|
|
|
with FSREG FSREG
|
|
|
|
uses REG={COND_FS, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tle $1==4 /* Single second <= top */
|
|
|
|
with FSREG FSREG
|
|
|
|
uses REG={COND_FS, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
proc cmf4zxx example cmf zeq
|
2017-02-18 00:29:45 +00:00
|
|
|
with FSREG FSREG STACK
|
2017-01-26 00:08:55 +00:00
|
|
|
gen
|
2017-02-13 22:44:46 +00:00
|
|
|
fcmpo cr0, %2, %1
|
2017-01-26 00:08:55 +00:00
|
|
|
bxx* {LABEL, $2}
|
|
|
|
|
|
|
|
/* Pop 2 singles, branch if... */
|
|
|
|
pat cmf zeq $1==4 call cmf4zxx("beq")
|
|
|
|
pat cmf zne $1==4 call cmf4zxx("bne")
|
|
|
|
pat cmf zgt $1==4 call cmf4zxx("bgt")
|
|
|
|
pat cmf zge $1==4 call cmf4zxx("bge")
|
|
|
|
pat cmf zlt $1==4 call cmf4zxx("blt")
|
|
|
|
pat cmf zle $1==4 call cmf4zxx("ble")
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cff $1==4 && $2==8 /* Convert single to double */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FSREG
|
|
|
|
yields %1.1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cfi $1==4 && $2==4 /* Single to signed int */
|
2017-02-12 04:23:47 +00:00
|
|
|
leaving
|
|
|
|
loc 4
|
|
|
|
loc 8
|
|
|
|
cff
|
|
|
|
loc 8
|
|
|
|
loc 4
|
|
|
|
cfi
|
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cfu $1==4 && $2==4 /* Single to unsigned int */
|
2017-02-12 04:23:47 +00:00
|
|
|
leaving
|
|
|
|
loc 4
|
|
|
|
loc 8
|
|
|
|
cff
|
|
|
|
loc 8
|
|
|
|
loc 4
|
|
|
|
cfu
|
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cif $1==4 && $2==4 /* Signed int to single */
|
2017-02-12 04:23:47 +00:00
|
|
|
leaving
|
|
|
|
loc 4
|
|
|
|
loc 8
|
|
|
|
cif
|
|
|
|
loc 8
|
|
|
|
loc 4
|
|
|
|
cff
|
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cuf $1==4 && $2==4 /* Unsigned int to single */
|
2017-02-12 04:23:47 +00:00
|
|
|
leaving
|
|
|
|
loc 4
|
|
|
|
loc 8
|
|
|
|
cuf
|
|
|
|
loc 8
|
|
|
|
loc 4
|
|
|
|
cff
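The leaving rules above rewrite each 4-byte conversion as two steps through an 8-byte double. A C sketch of why that gives the same answer, assuming IEEE 754 float and double: every float is exact as a double, so the first step of float to double to int loses nothing, and every 32-bit int or unsigned is exact as a double (53-bit significand), so int to double to float rounds only once. The function names are hypothetical.

```c
#include <stdint.h>

/* Direct conversions, as C performs them. */
static int32_t  f_to_i_direct(float f)    { return (int32_t)f; }
static uint32_t f_to_u_direct(float f)    { return (uint32_t)f; }
static float    i_to_f_direct(int32_t i)  { return (float)i; }
static float    u_to_f_direct(uint32_t u) { return (float)u; }

/* Two-step conversions through double, like the leaving rules. */
static int32_t  f_to_i_via_double(float f)    { return (int32_t)(double)f; }
static uint32_t f_to_u_via_double(float f)    { return (uint32_t)(double)f; }
static float    i_to_f_via_double(int32_t i)  { return (float)(double)i; }
static float    u_to_f_via_double(uint32_t u) { return (float)(double)u; }
```

Float-to-integer results are only defined when the value is in range, so any comparison should stay within the target type's range.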
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2018-01-22 19:04:15 +00:00
|
|
|
pat fef $1==4 /* Split fraction, exponent */
|
|
|
|
leaving cal ".fef4"
|
|
|
|
|
|
|
|
/* Multiply two singles, then split fraction, integer */
|
|
|
|
pat fif $1==4
|
|
|
|
leaving cal ".fif4"
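A C sketch of what fif 4 computes, assuming its results match C's modff applied to the product. sketch_fif4 is a hypothetical name, and the order in which EM leaves the two results on the stack is not modelled. modff gives both parts the sign of the product, so -0.5 splits into -0.5 and -0.0, and a NaN comes back as NaN in both parts.

```c
#include <math.h>

/* Multiply two singles, then split the product. Returns the fraction
 * and stores the integer part in *integer. */
static float sketch_fif4(float a, float b, float *integer)
{
    return modff(a * b, integer);
}
```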
|
|
|
|
|
Add floating-point register variables to PowerPC ncg.
Use f14 to f31 as register variables for 8-byte double-precision.
There are no regvars for 4-byte single precision, because all
regvar(reg_float) must have the same size. I expect more programs to
prefer 8-byte double precision.
Teach mach/powerpc/ncg/mach.c to emit stfd and lfd instructions to
save and restore 8-byte regvars. Delay emitting the function prolog
until f_regsave(), so we can use one addi to make stack space for both
local vars and saved registers. Be more careful with types in mach.c;
don't assume that int and long and full are the same.
In ncg table, add f14 to f31 as register variables, and some rules to
use them. Add rules to put the result of fadd, fsub, fmul, fdiv, fneg
in a regvar. Without such rules, the result would go in a scratch
FREG, and we would need fmr to move it to the regvar. Also add a rule
for pat sdl inreg($1)==reg_float with STACK, so we can unstack the
value directly into the regvar, again without a scratch FREG and fmr.
Edit util/ego/descr/powerpc.descr to tell ego about the new float
regvars. This might not be working right; ego usually decides against
using any float regvars, so ack -O1 (not running ego) uses the
regvars, but ack -O4 (running ego) doesn't use the regvars.
Beware that ack -mosxppc runs ego using powerpc.descr but -mlinuxppc
and -mqemuppc run ego without a config file (since 8ef7c31). I am
testing powerpc.descr with a local edit to plat/linuxppc/descr to run
ego with powerpc.descr there, but I did not commit my local edit.
2017-02-16 00:34:07 +00:00
|
|
|
|
|
|
|
/* Double-precision floating-point */
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat zrf $1==8 /* Push zero */
|
|
|
|
leaving lde ".fd_00000000"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-16 00:34:07 +00:00
|
|
|
pat adf $1==8 /* Add double */
|
Enable the Hall check again, and get powerpc to pass it.
Upon enabling the check, mach/powerpc/ncg/table fails to build as ncgg
gives many errors of "Previous rule impossible on empty stack". David
Given reported this problem in 2013:
https://sourceforge.net/p/tack/mailman/message/30814694/
Commit c93cb69 commented out the error in util/ncgg/cgg.y to disable
the Hall check. This commit enables it again. In ncgg, the Hall
check is checking that a rule is possible with an empty fake stack.
It would be possible if ncg can coerce the values from the real stack
to the fake stack. The powerpc table defined coercions from STACK to
{FS, %a} and {FD, %a}, but the Hall check didn't understand the
coercions and rejected each rule "with FS" or "with FD".
This commit removes the FS and FD tokens and adds a new group of FSREG
registers for single-precision floats, while keeping FREG registers
for double precision. The registers overlap, with each FSREG
containing one FREG, because it is the same register in PowerPC
hardware. FS tokens become FSREG registers and FD tokens become FREG
registers. The Hall check understands the coercions from STACK to
FSREG and FREG. The idea to define separate but overlapping registers
comes from the PDP-11 table (mach/pdp/ncg/table).
This commit also removes F0 from the FREG group. This is my attempt
to keep F0 off the fake stack, because one of the stacking rules uses
F0 as a scratch register (FSCRATCH).
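Below is a hypothetical C model of the overlapping groups described above. It is not how ncg allocates registers. FSREG and FREG names share one busy flag per hardware FPR, so fN and its overlapping FSREG can never be live at once. The FREG allocator skips f0 because a stacking rule uses it as FSCRATCH. The names fpr_pool, alloc_freg, alloc_fsreg and release_fpr are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not ncg's allocator: FSREG (4-byte float) and
 * FREG (8-byte float) members share the same 32 hardware FPRs, so a
 * register taken through one group is busy in the other as well. */

enum { NFPR = 32, FSCRATCH = 0 };

struct fpr_pool {
	bool busy[NFPR];  /* one flag per hardware FPR, shared by both groups */
};

/* Take any free FREG; f0 is skipped because it serves as FSCRATCH. */
static int alloc_freg(struct fpr_pool *p)
{
	for (int r = 0; r < NFPR; r++) {
		if (r == FSCRATCH || p->busy[r])
			continue;
		p->busy[r] = true;
		return r;
	}
	return -1;  /* no FREG available */
}

/* Try to take the FSREG that overlaps hardware register r. */
static bool alloc_fsreg(struct fpr_pool *p, int r)
{
	assert(r >= 0 && r < NFPR);
	if (p->busy[r])
		return false;  /* the overlapping FREG already holds a value */
	p->busy[r] = true;
	return true;
}

static void release_fpr(struct fpr_pool *p, int r)
{
	assert(r >= 0 && r < NFPR);
	p->busy[r] = false;
}
```

The shared busy array is the whole point of the model. Once f1 is handed out as an FREG, the FSREG that lives in the same hardware register is refused until f1 is released.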
2016-09-18 19:08:55 +00:00
|
|
|
with FREG FREG
|
Fix lim. Improve lxl, lxa, lor, str, procs with no locals.
_lim_ must use _loe_ (load word external), not _lde_ (load double-word
external).
The new patterns for _lxl_, _lxa_, _lor_, _str_ emit shorter code in
some cases. The change from GPR_EXPR to REG_EXPR allows moving
LXFRAME to a register variable.
Add more "reusing" clauses. We have enough registers that ncg almost
never reuses a register, but sometimes it can reuse r3.
In mach.c, emit one fewer instruction in procedures with no locals.
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, FREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fadd %a, %2, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
2017-02-16 00:34:07 +00:00
|
|
|
pat adf sdl $1==8 && inreg($2)==reg_float
|
|
|
|
with FREG FREG
|
|
|
|
gen fadd {DLOCAL, $2}, %2, %1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-16 00:34:07 +00:00
|
|
|
pat sbf $1==8 /* Subtract double */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FREG FREG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, FREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fsub %a, %2, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
2017-02-16 00:34:07 +00:00
|
|
|
pat sbf sdl $1==8 && inreg($2)==reg_float
|
|
|
|
with FREG FREG
|
|
|
|
gen fsub {DLOCAL, $2}, %2, %1
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-02-16 00:34:07 +00:00
|
|
|
pat mlf $1==8 /* Multiply double */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FREG FREG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, FREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fmul %a, %2, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
2017-02-16 00:34:07 +00:00
|
|
|
pat mlf sdl $1==8 && inreg($2)==reg_float
|
|
|
|
with FREG FREG
|
|
|
|
gen fmul {DLOCAL, $2}, %2, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-02-16 00:34:07 +00:00
|
|
|
pat dvf $1==8 /* Divide double */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FREG FREG
|
2018-01-05 01:40:35 +00:00
|
|
|
uses reusing %1, reusing %2, FREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fdiv %a, %2, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
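The sbf and dvf rules above pass %2 before %1 for a reason. ncg names the top of the fake stack %1 and the item below it %2, while EM defines both operations as "second op top". Below is a small C model of that order. The names dstack, em_sbf8 and em_dvf8 are hypothetical and are not part of EM or ncg.

```c
#include <assert.h>

/* Hypothetical model of EM operand order for 8-byte sbf and dvf:
 * %1 is the top of stack, %2 is the item below it, and the result
 * replaces both. */

struct dstack {
	double v[16];
	int n;
};

static void push(struct dstack *s, double x)
{
	assert(s->n < 16);
	s->v[s->n++] = x;
}

static double pop(struct dstack *s)
{
	assert(s->n > 0);
	return s->v[--s->n];
}

/* Mirrors "gen fsub %a, %2, %1": %a = %2 - %1 */
static void em_sbf8(struct dstack *s)
{
	double top = pop(s);     /* %1 */
	double second = pop(s);  /* %2 */
	push(s, second - top);
}

/* Mirrors "gen fdiv %a, %2, %1": %a = %2 / %1 */
static void em_dvf8(struct dstack *s)
{
	double top = pop(s);     /* %1 */
	double second = pop(s);  /* %2 */
	push(s, second / top);
}
```

Pushing 10.0 and then 4.0 before em_sbf8 leaves 6.0, not -6.0. Swapping %1 and %2 in the gen lines would silently negate every subtraction.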
|
2017-02-16 00:34:07 +00:00
|
|
|
pat dvf sdl $1==8 && inreg($2)==reg_float
|
|
|
|
with FREG FREG
|
|
|
|
gen fdiv {DLOCAL, $2}, %2, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
|
2017-02-16 00:34:07 +00:00
|
|
|
pat ngf $1==8 /* Negate double */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FREG
|
2007-11-02 18:56:58 +00:00
|
|
|
uses reusing %1, FREG
|
2017-12-23 02:18:58 +00:00
|
|
|
gen fneg %a, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
|
Add floating-point register variables to PowerPC ncg.
Use f14 to f31 as register variables for 8-byte double-precision.
There are no regvars for 4-byte single precision, because all
regvar(reg_float) must have the same size. I expect more programs to
prefer 8-byte double precision.
Teach mach/powerpc/ncg/mach.c to emit stfd and lfd instructions to
save and restore 8-byte regvars. Delay emitting the function prolog
until f_regsave(), so we can use one addi to make stack space for both
local vars and saved registers. Be more careful with types in mach.c;
don't assume that int and long and full are the same.
In ncg table, add f14 to f31 as register variables, and some rules to
use them. Add rules to put the result of fadd, fsub, fmul, fdiv, fneg
in a regvar. Without such rules, the result would go in a scratch
FREG, and we would need fmr to move it to the regvar. Also add a rule
for pat sdl inreg($1)==reg_float with STACK, so we can unstack the
value directly into the regvar, again without a scratch FREG and fmr.
Edit util/ego/descr/powerpc.descr to tell ego about the new float
regvars. This might not be working right; ego usually decides against
using any float regvars, so ack -O1 (not running ego) uses the
regvars, but ack -O4 (running ego) doesn't use the regvars.
Beware that ack -mosxppc runs ego using powerpc.descr but -mlinuxppc
and -mqemuppc run ego without a config file (since 8ef7c31). I am
testing powerpc.descr with a local edit to plat/linuxppc/descr to run
ego with powerpc.descr there, but I did not commit my local edit.
2017-02-16 00:34:07 +00:00
|
|
|
pat ngf sdl $1==8 && inreg($2)==reg_float
|
|
|
|
with FREG
|
|
|
|
gen fneg {DLOCAL, $2}, %1
|
2007-11-02 18:56:58 +00:00
|
|
|
|
Add fef 4, fif 4. Improve fef 8, fif 8. Other float changes.
When I wrote fef 8, I forgot to test denormalized numbers. Oops. Now
fix two of my mistakes:
- When checking for zero, `extrwi r6, r3, 22, 12` needs to be
`extrwi r6, r3, 20, 12`. There are only 20 bits to extract.
- After the multiplication by 2**64, I forgot to put the fraction in
[0.5, 1) or (-1, -0.5] by setting IEEE exponent = 1022.
Teach fif 8 about signed zero and NaN.
In ncg/table, change cmf so NaN is not equal to any value, and comment
why ordered comparisons don't work with NaN. Also add cost for
fctiwz, remove extra `uses REG`.
Edit comment in cfu8.s because the conditional branch might be before
or after fctiwz.
2018-01-22 19:04:15 +00:00
|
|
|
/* To compare NaN, see comment above pat cmf $1==4 */
|
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat cmf $1==8 /* Compare double */
|
2016-09-18 19:08:55 +00:00
|
|
|
with FREG FREG
|
2017-01-26 00:08:55 +00:00
|
|
|
uses REG={COND_FD, %2, %1}
|
2018-01-22 19:04:15 +00:00
|
|
|
/* Extract lt, gt, un; put lt in sign bit. */
|
|
|
|
gen andisX %a, %a, {C, 0xd000}
|
2017-01-26 00:08:55 +00:00
|
|
|
yields %a
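The `andisX %a, %a, {C, 0xd000}` step can be modelled in C to show why NaN is never equal. This is a minimal sketch that makes three assumptions. First, the COND_FD coercion leaves cr0 from a compare of second against top in the four most significant bits of %a, as mfcr does. Second, andisX is the record form `andis.`, which ANDs the upper halfword. Third, the bit order is lt, gt, eq, un. The helper names are hypothetical. The masked word is negative for less, zero for equal, and positive for greater *or* unordered. A NaN operand therefore fails `teq`, but it would also pass a test for greater, which is why ordered comparisons need extra care.

```c
#include <math.h>
#include <stdint.h>

/* cr0 bits as mfcr places them, most significant first (assumed layout). */
#define CR0_LT 0x80000000u
#define CR0_GT 0x40000000u
#define CR0_EQ 0x20000000u
#define CR0_UN 0x10000000u

/* Hypothetical model of fcmpu cr0, second, top followed by mfcr. */
static uint32_t cr0_fcmpu(double second, double top)
{
    if (isnan(second) || isnan(top))
        return CR0_UN;              /* unordered: only un is set */
    if (second < top)
        return CR0_LT;
    if (second > top)
        return CR0_GT;
    return CR0_EQ;
}

/* andis. %a, %a, 0xd000: keep lt, gt and un; drop eq. */
static uint32_t cmf8_bits(double second, double top)
{
    return cr0_fcmpu(second, top) & ((uint32_t)0xd000 << 16);
}

/* Interpret the masked word as a signed EM result: lt is the sign bit. */
static int cmf8_sign(double second, double top)
{
    uint32_t r = cmf8_bits(second, top);
    if (r & CR0_LT)
        return -1;
    return r != 0;                  /* 1 for gt or un, 0 for eq */
}
```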
|
|
|
|
|
|
|
|
pat cmf teq $1==8 /* Double second == top */
|
|
|
|
with FREG FREG
|
|
|
|
uses REG={COND_FD, %2, %1}
|
Add more chances to put results in register variables.
When a rule `uses REG ... yields %a`, the result %a is always a
temporary, never a regvar. If the EM code uses _stl_ to put the
result in a regvar, then ncg emits _mr_ to move %a to the regvar.
There are two ways to put the result in the regvar without %a:
1. Yield a token, as in `yields {MUL_RR, %2, %1}`, so that _stl_
can move the token to the regvar without using %a.
2. Provide a pattern, like `sli stl`, that just puts the result
in `{LOCAL, $2}` and not %a.
Allow some tokens, like SUM_RIS and XEQ, onto the stack; and add
tokens like MUL_RR, and patterns like `sli stl`.
Delete patterns for `stl lol` and `sdl ldl` to avoid an extra
temporary %a when the local is a regvar. Delete `lal sti lal loi`
because it would emit wrong code.
2017-12-08 22:19:26 +00:00
|
|
|
yields {XEQ, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
2018-01-22 19:04:15 +00:00
|
|
|
pat cmf tne $1==8 /* Double second != top */
|
2017-01-26 00:08:55 +00:00
|
|
|
with FREG FREG
|
|
|
|
uses REG={COND_FD, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XNE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tgt $1==8 /* Double second > top */
|
|
|
|
with FREG FREG
|
|
|
|
uses REG={COND_FD, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tge $1==8 /* Double second >= top */
|
|
|
|
with FREG FREG
|
|
|
|
uses REG={COND_FD, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XGE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tlt $1==8 /* Double second < top */
|
|
|
|
with FREG FREG
|
|
|
|
uses REG={COND_FD, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLT, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
pat cmf tle $1==8 /* Double second <= top */
|
|
|
|
with FREG FREG
|
|
|
|
uses REG={COND_FD, %2, %1}
|
2017-12-08 22:19:26 +00:00
|
|
|
yields {XLE, %a}
|
2017-01-26 00:08:55 +00:00
|
|
|
|
|
|
|
proc cmf8zxx example cmf zeq
|
|
|
|
with FREG FREG STACK
|
|
|
|
gen
|
2017-02-13 22:44:46 +00:00
|
|
|
fcmpo cr0, %2, %1
|
2017-01-26 00:08:55 +00:00
|
|
|
bxx* {LABEL, $2}
|
|
|
|
|
|
|
|
/* Pop 2 doubles, branch if... */
|
|
|
|
pat cmf zeq $1==8 call cmf8zxx("beq")
|
|
|
|
pat cmf zne $1==8 call cmf8zxx("bne")
|
|
|
|
pat cmf zgt $1==8 call cmf8zxx("bgt")
|
|
|
|
pat cmf zge $1==8 call cmf8zxx("bge")
|
|
|
|
pat cmf zlt $1==8 call cmf8zxx("blt")
|
|
|
|
pat cmf zle $1==8 call cmf8zxx("ble")
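cmf8zxx compares with fcmpo and then branches on cr0. fcmpo sets the same cr0 bits as fcmpu and differs only in how it reports invalid-operation exceptions. The sketch below is a minimal model of which branch each pattern takes. It assumes the standard PowerPC extended mnemonics, under which bge means "lt clear" and ble means "gt clear". It does not read them as "gt or eq set" and "lt or eq set". The function names are hypothetical. Because an unordered compare sets only un, the model shows that bne, bge and ble are all taken when an operand is NaN.

```c
#include <math.h>
#include <stdbool.h>
#include <string.h>

enum cr0_bit { LT = 8, GT = 4, EQ = 2, UN = 1 };

/* Hypothetical model of the cr0 field after fcmpo cr0, second, top. */
static int cr0_fcmpo(double second, double top)
{
    if (isnan(second) || isnan(top)) return UN;
    if (second < top) return LT;
    if (second > top) return GT;
    return EQ;
}

/* Condition tested by each extended mnemonic that cmf8zxx emits. */
static bool branch_taken(const char *bxx, int cr0)
{
    if (strcmp(bxx, "beq") == 0) return (cr0 & EQ) != 0;
    if (strcmp(bxx, "bne") == 0) return (cr0 & EQ) == 0;
    if (strcmp(bxx, "bgt") == 0) return (cr0 & GT) != 0;
    if (strcmp(bxx, "bge") == 0) return (cr0 & LT) == 0; /* same as bnl */
    if (strcmp(bxx, "blt") == 0) return (cr0 & LT) != 0;
    if (strcmp(bxx, "ble") == 0) return (cr0 & GT) == 0; /* same as bng */
    return false;                                         /* unknown mnemonic */
}
```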
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-08 22:19:26 +00:00
|
|
|
/* Convert double to single */
|
|
|
|
/* reg_float pattern must be first, or it goes unused! */
|
|
|
|
pat loc loc cff stl $1==8 && $2==4 && inreg($4)==reg_float
|
|
|
|
with FREG
|
|
|
|
gen frsp {LOCAL, $4}, %1
|
|
|
|
pat loc loc cff $1==8 && $2==4
|
2016-09-18 19:08:55 +00:00
|
|
|
with FREG
|
|
|
|
uses reusing %1, FSREG
|
2017-12-08 22:19:26 +00:00
|
|
|
gen frsp %a, %1
|
2016-09-18 19:08:55 +00:00
|
|
|
yields %a
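Both double-to-single rules above come down to one frsp, which rounds a double to the nearest single-precision value. In C, the same rounding is a cast to float. This is a minimal sketch that assumes the FPSCR is in its default round-to-nearest mode. The name cff_8_to_4 is hypothetical.

```c
/* Model of frsp: round a double to the nearest float. Like frsp, a C cast
   rounds according to the current (default: to-nearest) rounding mode. */
static float cff_8_to_4(double d)
{
    return (float)d;
}
```

A double such as 1 + 2**-30 has more fraction bits than a float can hold, so it rounds to 1.0f. A value such as 0.1 comes back slightly different when widened again.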
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cfi $1==8 && $2==4 /* Double to signed int */
|
2017-02-12 04:23:47 +00:00
|
|
|
with FREG STACK
|
|
|
|
uses reusing %1, FREG
|
2007-11-02 18:56:58 +00:00
|
|
|
gen
|
2017-02-12 04:23:47 +00:00
|
|
|
fctiwz %a, %1
|
2017-02-13 22:44:46 +00:00
|
|
|
stfdu %a, {IND_RC_D, sp, 0-8}
|
2017-12-08 00:24:09 +00:00
|
|
|
addi sp, sp, {C, 4}
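The cfi rule works in three steps. fctiwz converts the double, stfdu stores the whole 8-byte FPR image below sp, and `addi sp, sp, {C, 4}` discards the first word. The sketch below shows why this leaves the integer on top of the stack on big-endian PowerPC. It assumes that fctiwz puts the 32-bit result in the low-order word of the FPR, and it treats the high word as undefined by filling it with an arbitrary value. It also assumes the input is in int32 range: real fctiwz saturates out-of-range values to 0x7fffffff or 0x80000000, and the sketch does not model that. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Model of fctiwz: truncate toward zero into the low word of a 64-bit
   FPR image. 0xfff80000 in the high word is an arbitrary stand-in. */
static uint64_t fctiwz_image(double d)
{
    int32_t i = (int32_t)d;                  /* assumes d fits in int32 */
    return ((uint64_t)0xfff80000u << 32) | (uint32_t)i;
}

/* Model of stfdu %a, -8(sp) followed by addi sp, sp, 4. */
static int32_t cfi_8_to_4(double d)
{
    uint64_t fpr = fctiwz_image(d);
    unsigned char slot[8];
    for (int k = 0; k < 8; k++)              /* big-endian store, MSB first */
        slot[k] = (unsigned char)(fpr >> (56 - 8 * k));
    /* After sp += 4, the word on top of the stack is bytes 4..7. */
    uint32_t w = (uint32_t)slot[4] << 24 | (uint32_t)slot[5] << 16
               | (uint32_t)slot[6] << 8 | (uint32_t)slot[7];
    int32_t out;
    memcpy(&out, &w, sizeof out);            /* reinterpret bits as signed */
    return out;
}
```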
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cfu $1==8 && $2==4 /* Double to unsigned int */
|
|
|
|
leaving cal ".cfu8"
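Double to unsigned int goes through the helper .cfu8, whose body (cfu8.s) is not shown here. A common way to build an unsigned conversion from a signed one such as fctiwz is to shift large values down by 2**31 before converting and then add the top bit back. The sketch below shows that technique. It is not a transcription of cfu8.s, it assumes the input lies in [0, 2**32), and the name cfu_8_to_4 is hypothetical.

```c
#include <stdint.h>

/* Convert a double in [0, 2**32) to uint32_t using only a signed
   double-to-int conversion plus one conditional branch. */
static uint32_t cfu_8_to_4(double d)
{
    const double two31 = 2147483648.0;           /* 2**31 */
    if (d < two31)
        return (uint32_t)(int32_t)d;             /* already in signed range */
    /* d - 2**31 is exact here and fits in int32; restore the top bit. */
    return (uint32_t)(int32_t)(d - two31) + UINT32_C(0x80000000);
}
```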
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cif $1==4 && $2==8 /* Signed int to double */
|
|
|
|
leaving cal ".cif8"
|
2016-12-09 21:36:42 +00:00
|
|
|
|
2017-12-23 02:18:58 +00:00
|
|
|
pat loc loc cuf $1==4 && $2==8 /* Unsigned int to double */
|
|
|
|
leaving cal ".cuf8"
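Signed and unsigned int to double also go through helpers (.cif8 and .cuf8), whose bodies are not shown here. A well-known technique for machines without an integer-to-double instruction works by placing the integer in the low word of a double whose high word is 0x43300000. That bit pattern is 2**52, so the resulting double is exactly 2**52 + u, and subtracting 2**52 gives the result. For a signed input, flipping bit 31 adds a bias of 2**31, which is subtracted again at the end. The sketch below shows that technique only. It is not claimed to be the code in .cif8 or .cuf8. It assumes IEEE-754 doubles that share byte order with uint64_t, and its function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Build the double whose high word is hi and low word is lo
   (assumes IEEE-754 doubles with the same byte order as uint64_t). */
static double double_from_words(uint32_t hi, uint32_t lo)
{
    uint64_t bits = (uint64_t)hi << 32 | lo;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* 0x43300000:u is exactly 2**52 + u; subtract 2**52. */
static double cuf_4_to_8(uint32_t u)
{
    return double_from_words(0x43300000u, u) - 4503599627370496.0;
}

/* Flipping bit 31 maps i to i + 2**31; subtract 2**52 + 2**31. */
static double cif_4_to_8(int32_t i)
{
    uint32_t biased = (uint32_t)i ^ 0x80000000u;
    return double_from_words(0x43300000u, biased) - 4503601774854144.0;
}
```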
|
2016-10-17 04:39:59 +00:00
|
|
|
|
2017-02-12 21:44:37 +00:00
|
|
|
pat fef $1==8 /* Split fraction, exponent */
|
2017-12-23 02:18:58 +00:00
|
|
|
leaving cal ".fef8"
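fef 8 splits a double into a fraction and an exponent, in the way C's frexp does. The commit notes on fef 8 describe three steps: test for zero using the 20 fraction bits in the high word (`extrwi r6, r3, 20, 12`) plus the low word; multiply denormals by 2**64; and set the IEEE exponent to 1022 so that the fraction lands in [0.5, 1) or (-1, -0.5]. The sketch below is a minimal C rendering of those steps. It is not a transcription of .fef8, and it assumes that zero, infinity and NaN come back unchanged with exponent 0. The name fef8 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Split d so that d == f * 2**e with 0.5 <= |f| < 1. */
static double fef8(double d, int *e)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    uint32_t hi = (uint32_t)(bits >> 32);
    int biased = (int)((hi >> 20) & 0x7ff);      /* 11-bit IEEE exponent */
    int adjust = 0;

    *e = 0;
    if (biased == 0x7ff)
        return d;                                /* infinity or NaN */
    if (biased == 0) {
        /* Zero test: 20 fraction bits of the high word, whole low word. */
        if ((hi & 0xfffff) == 0 && (uint32_t)bits == 0)
            return d;                            /* +0 or -0 */
        d *= 18446744073709551616.0;             /* 2**64 normalizes a denormal */
        adjust = -64;
        memcpy(&bits, &d, sizeof bits);
        biased = (int)((bits >> 52) & 0x7ff);
    }
    *e = biased - 1022 + adjust;
    /* Keep sign and fraction; force the IEEE exponent to 1022. */
    bits = (bits & ~((uint64_t)0x7ff << 52)) | ((uint64_t)1022 << 52);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```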
|
2016-10-17 04:39:59 +00:00
|
|
|
|
2017-02-12 21:44:37 +00:00
|
|
|
/* Multiply two doubles, then split fraction, integer */
|
|
|
|
pat fif $1==8
|
2017-12-23 02:18:58 +00:00
|
|
|
leaving cal ".fif8"
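fif 8 multiplies two doubles and splits the product into an integer part and a fraction, in the way C's modf does. The commit notes say .fif8 was taught about signed zero and NaN. The sketch below shows those cases under a few assumptions: both parts keep the sign of the product, NaN propagates into both parts, and any product with magnitude of at least 2**52 is already integral. It avoids libm so that it stands alone. It says nothing about the order in which EM pushes the two results, and the name fif8 is hypothetical.

```c
#include <stdint.h>

/* Return the fraction of a * b and store its integer part in *integer. */
static double fif8(double a, double b, double *integer)
{
    const double two52 = 4503599627370496.0;     /* 2**52 */
    double p = a * b;
    double ip, frac;

    if (p != p) {                                /* NaN: both parts NaN */
        *integer = p;
        return p;
    }
    if (p >= two52 || p <= -two52) {             /* includes infinities */
        *integer = p;
        return p < 0 ? -0.0 : 0.0;
    }
    ip = (double)(int64_t)p;                     /* truncate toward zero */
    if (ip == 0.0)
        ip = p * 0.0;                            /* gives -0.0 when p < 0 or p is -0.0 */
    frac = p - ip;                               /* exact for |p| < 2**52 */
    if (frac == 0.0)
        frac = p * 0.0;                          /* signed zero fraction */
    *integer = ip;
    return frac;
}
```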
|