Commit graph

2693 commits

Author SHA1 Message Date
George Koehler 5f2a7b260f Optimize mr. X, X after some instructions.
For example, when ncg emits
    slw r9,r8,r5
    mr. r9,r9
then top simplifies the code to
    slw. r9,r8,r5
2017-12-22 22:32:16 -05:00
George Koehler c964eeddba Remove INT32 and such. Adjust indentation.
I understand `loi 4` more easily than `loi INT32`, because `loi 4`
appears in .e files.  So remove INT8, INT16, INT32, INT64.

Add a comment to explain r3 during unconditional jumps.
2017-12-22 21:18:58 -05:00
George Koehler f96f918a29 Generate shorter code for ret 4 and ret 8. 2017-12-22 20:37:39 -05:00
George Koehler 5867ca2f2c Remove two obsolete patterns.
These patterns seem to have no effect on the generated code.
2017-12-22 19:57:42 -05:00
George Koehler 2eeee36f78 Add FRAME_V tokens for local variables.
When storing to a local, stop killing the tokens of other locals,
unless they might overlap with the stored local.  This helps some
procedures that juggle locals when the locals aren't in registers.

Also use FRAME_V tokens for locals in statically enclosing procedures.
Rewrite _lxa_ as _lxl_, to skip the `addi ?,?,8` if we can add 8 to
the next constant.  The PowerPC code from _lxl_ is now sometimes
better, sometimes worse than before.

The i386 table provided the idea to use %size to find overlapping
locals.
2017-12-22 17:04:16 -05:00
George Koehler ad47fa5fe3 Add splitting coercions for IND_ALL_D.
Delete my wrong comment (from commits cfbc537, a8f62f4, 5432bd0) which
claimed that such coercions are not possible.
2017-12-18 20:59:04 -05:00
George Koehler 24abaf6a25 Enable conditional expressions in splitting coercions.
ncgg has parsed the optional conditional expression (optexpr) of each
splitting coercion since commit 72b83cc in 1985; but for almost 33
years, ncg has ignored the expression in c2_expr.

Few tables had conditional coercions (I only found them in arm and
m68020), and no tables had conditional splitting coercions, so this
only becomes a problem now as I try to add a conditional splitting
coercion to powerpc.
2017-12-18 20:39:56 -05:00
George Koehler 5e99baabdf Rename two tokens. CONST_HZ was not hertz (Hz). 2017-12-18 12:36:10 -05:00
George Koehler d8fa9d1b2a In coercions, try to reuse a register with the same token.
This reduces code size.
2017-12-17 12:45:27 -05:00
George Koehler b0d75fed37 Rename ANY_BHW to INT_W; add FLOAT_W, FLOAT_D.
INT_W, the integer set, continues to exclude FSREG, because we can't
easily move FSREG to GPR.

ANY4 becomes ISET+FLOAT_W and ANY8 becomes FLOAT_D.
2017-12-17 11:56:02 -05:00
George Koehler 5ba83100d6 Delete rules for sti 8 with REG IND_RC_D, with REG IND_RR_D.
Prefer the rule with REG FREG, by coercing IND_RC_D or IND_RR_D to
FREG.  This rule looks better to ncg.  When ncg chose between coercion
to REG IND_RC_D or coercion to REG FREG, it chose REG FREG.  It only
chose REG IND_RC_D if the stack had exact REG IND_RC_D.
2017-12-12 13:36:43 -05:00
George Koehler 11a54e0a7c These instructions write to the CR. 2017-12-10 14:01:14 -05:00
George Koehler 504d2aa34e Revise stack shuffles and integer conversions in PowerPC ncg.
Allow asp 4, exg 4 to shuffle tokens without coercing them into
registers; but comment why dup 4, dup 8 coerce tokens into registers.

Allow dup, dus, exg with larger sizes; and add tests dup_e.e and
exg_e.e to check that dup 20, dus, exg 20 work as well in powerpc as
in i80 and i86.

Then powerpc failed to compile loc 2 loc 4 cuu in dup_e.e.  Revise the
integer conversions, so powerpc can compile and pass the test.
2017-12-09 18:57:10 -05:00
George Koehler 48788287b8 Add more chances to put results in register variables.
When a rule `uses REG ... yields %a`, the result %a is always a
temporary, never a regvar.  If the EM code uses _stl_ to put the
result in a regvar, then ncg emits _mr_ to move %a to the regvar.

There are two ways to put the result in the regvar without %a:

  1. Yield a token, as in `yields {MUL_RR, %2, %1}`, so that _stl_
     can move the token to the regvar without using %a.

  2. Provide a pattern, like `sli stl`, that just puts the result
     in `{LOCAL, $2}` and not %a.

Allow some tokens, like SUM_RIS and XEQ, onto the stack; and add
tokens like MUL_RR, and patterns like `sli stl`.

Delete patterns for `stl lol` and `sdl ldl` to avoid an extra
temporary %a when the local is a regvar.  Delete `lal sti lal loi`
because it would emit wrong code.
2017-12-08 17:19:26 -05:00
George Koehler 6b933db90b Split C from CONST.
Rename token CONST to C.  Define set CONST = C + CONST_STACK.  The
instructions with CONST operands can now accept CONST_STACK tokens;
some cases of {CONST, %1.val} become %1.

Also simplify two of _rlwinm_ into _slwi_ and _srwi_.
2017-12-07 19:24:09 -05:00
George Koehler a1d1f38691 Add test for EM rol, ror. Fix i80, i86, powerpc.
EM instructions _rol_ and _ror_ do rotate an integer left or right.
Our compilers and optimizers never emit _rol_ nor _ror_, but I might
want to use them in the future.

Add _rol_ and _ror_ to powerpc.  Fix `rol 4` and `ror 4` in both i80
and i86, where the rules for `rol 4` and `ror 4` seem to have never
been tested until now.
2017-12-07 17:16:21 -05:00
George Koehler c95bcac91d Correct the stack pointer when i80 shrinks an integer.
The code used `sphl` to set the stack pointer, but the correct value
was in de, not hl.  Fix by swapping the values of de and hl, so `sphl`
is now correct.  When we shrink an integer from 4 to 2 bytes, both
registers de and hl point to copies of the result, but only one
register preserves the stack below the result.

This fixes writehex() in tests/plat/lib/test.c, when I compile it with
ack -mcpm, so it preserves the pointer to "0123456789abcdef", so it
writes hexadecimal digits and not garbage.

This bug goes back to commit 157b243 of Mar 18, 1985, so the bug is
32 years old, and probably the oldest bug that I ever fixed.
2017-12-07 15:39:41 -05:00
George Koehler 34cf0c8b63 Kill registers a, de, when i80 ncg does Call libem.
I compiled tests/plat/lib/test.c with ack -mcpm, but i80 ncg did emit
wrong code in writehex(uint32_t) for

    "0123456789abcdef"[code & 0xf]

The code called '.and' to evaluate `code & 0xf`, then tried to call
'.cii' to narrow the result from 4 to 2 bytes, but it passed garbage
instead of 4 to '.cii'.  The rule for '.and' was

    pat and defined($1)
    kills ALL
    uses dereg={const2,$1}
    gen Call {label,".and"}

This failed to kill register de={const2,4}, so ncg pushed de,
expecting to push 4, but actually pushing garbage.

Fix such rules using `mvi a,...` or `lxi de,...` so ncg doesn't track
the token in the register.  This is like the i86 table.  A different
fix would use a dummy instruction `killreg a` or `killreg de` like the
m68020 table.

Also correct 1 to $1 when calling '.exg'.
2017-12-06 22:14:00 -05:00
George Koehler 760da1f421 Fix build with gcc.
gcc gave an error because the `char *` parameter doesn't match the
`const char *` in the prototype of regsave().  clang didn't give an
error.  I added the prototype in commit 5301cce.
2017-11-17 17:52:37 -05:00
George Koehler 5301cceee3 Declare machine-dependent functions in mach/proto/ncg
This breaks all machines because the declared return type void
disagrees with the implicit return type int (when I compile mach.c
with clang).  Unbreak i386, i80, i86, m68020, powerpc, vc4 by adding
the return types to mach.c.  We don't build any other machines; they
are broken since commit a46ee91 (May 19, 2013) declared void prolog()
and commit fd91851 (Nov 10, 2016) declared void mes(), with both
declarations in mach/proto/ncg/fillem.c.

Also fix mach/vc4/ncg/mach.c where type full is long, so fprintf()
must use "%ld" not "%d" to print full nlocals.
2017-11-13 14:23:44 -05:00
George Koehler e04166b85d More prototypes, less register in mach/proto/ncg
Files that #include "equiv.h" must do so after including "data.h", now
that a function prototype in equiv.h uses type rl_p from data.h.

Adjust style, changing some `for(...)` to `for (...)`.  The style in
mach/proto/ncg is less than consistent; the big annoyance now is that
some files want tabs at 4 spaces, others want tabs at 8 spaces.
2017-11-13 12:44:17 -05:00
George Koehler 909b0d5bf3 Add prototypes to functions in subr.c
Put the declarations in "data.h", because that header declares the
types cost_t and token_p.  Also #include <cgg_cg.h> from "data.h" to
get types c3_p and set_p, and guard <cgg_cg.h> against multiple
inclusion.
2017-11-12 16:11:05 -05:00
George Koehler ba2a45180c Prototypes for string functions. More static. 2017-11-12 11:25:18 -05:00
George Koehler 98b27dd505 Remove old "assert.h" in mach/proto/ncg
*Important:*  You must "make clean" after checking out this commit,
because the build had copied the old "assert.h" to several places in
obj/.  If you don't "make clean", then the compiler finds the old
"assert.h" before libc <assert.h>, and the build fails because this
commit removes badassertion() in subr.c.  After "make clean", the
compiler finds libc <assert.h> and the build succeeds.
2017-11-11 19:35:48 -05:00
George Koehler ac4cbd735e Use libc assert(); fix dependencies; unbreak isduo().
Switch from custom assert() to libc assert() in mach/proto/as.
Continue to disable asserts if DEBUG == 0.

This change found a problem in the build system; comm2.y was missing
depedencies on comm0.h and comm1.h.  Add the missing dependencies to
the cppfile rule.  Allow the dependencies by modifying cppfile in
first/build.lua to act like cfile if t.dir is false.

Now that comm2.y gets rebuilt, I must fix the wrong prototype of
yyparse() in comm1.h.

I got unlucky as induo() in comm5.c was reading beyond the end of the
array.  It found an operator "= " ('=' then space) in the garbage, so
it returned a garbage token number, and "VAR = 123" became a syntax
error.  Unbreak induo() by terminating the array.
2017-11-11 16:09:05 -05:00
George Koehler d347207e60 Add more prototypes in mach/proto/as
Change "register i;" to "int i;" to so clang stops warning about
implicit int.  Use function prototypes so clang stops warning about
implicitly declared functions.
2017-11-10 23:30:46 -05:00
George Koehler 0102cc8934 lwzu writes to the register in the token. 2017-10-19 12:44:46 -04:00
George Koehler 2a92f9bf4d Add a few more error checks and adjustments to reglap.
In util/ncgg, add two more errors for tables using reglap:
 - "Two sizes of reg_float can't be same size"
 - "Missing reg_float of size %d to contain %s"

In mach/proto/ncg, rename macro isregvar_size() to PICK_REGVAR(), so
the macro doesn't look like a function.  This macro sometimes doesn't
evaluate its second argument.

In mach/powerpc/ncg/mach.c, change type of lfs_set to uint32_t, and
change the left shifts from 1U<<regno to (uint32_t)1<<regno, because
1U would be too small for machines with 16-bit int.
2017-10-18 22:00:12 -04:00
George Koehler 73ad5a227d Rename RELOLIS to RELOPPC_LIS.
This relocation is specific to PowerPC.  @davidgiven suggested the
name RELOPPC_LIS in
https://github.com/davidgiven/ack/pull/52#issuecomment-279856501

Reindent the list in h/out.h and util/led/ack.out.5 because
RELOLIS_PPC is a long name.  I use spaces and no tabs because the tabs
looked bad in the manual page.
2017-10-18 15:39:31 -04:00
George Koehler 459a9b5949 Use lwzu, stwu to tighten more loops.
Because lwzu or stwu moves the pointer, I can remove an addi
instruction from the loop, so the loop is slightly faster.

I wrote a benchmark in Modula-2 that exercises some of these loops.  I
measured its time on my old PowerPC Mac.  Its user time decreases from
8.401s to 8.217s with the tighter loops.
2017-10-18 12:12:42 -04:00
George Koehler ac2b0710c8 Add more rules for single-precision reg_float.
The result of single-precision fadds, fsubs, and such can go into a
register variable, like we already do with double precision.  This
avoids an extra fmr from a temporary register to the regvar.
2017-10-17 17:53:03 -04:00
George Koehler 47bd0ef7a7 Stop inlining code to convert integers to floats.
Do the conversion by calling .cif8 or .cuf8 in libem, as it was done
before my commit 1de1e8f.  I used the inline conversion to experiment
with the register allocator, which was too slow until c5bb3be.

Now that libem has the only copy of the code, move some comments and
code changes there.
2017-10-17 17:00:28 -04:00
George Koehler 893e170015 Use my new regvar_w() and regvar_d() in PowerPC ncg.
Rename GPRE to GPR_EXPR, then define FPR_EXPR and FSREG_EXPR.  Use
them for moves to register variables.

Keep "kills regvar($1)", because deleting it and recompiling libc
would cause many failures in my test programs.  Add comment to warn,
  /* ncg fails to infer that regvar($1) is dead! */

Remove "kills LOCAL %off==$1" because it seems to have no effect.
2017-10-17 14:15:33 -04:00
George Koehler 307a8b996e Add regvar_w() and regvar_d() for use with reglap.
If the ncg table uses reglap, then regvar($1, reg_float) would have
two sizes of registers.  An error from ncgg would happen if regvar()
was in a token that allows only one size.  Now one can pick a size
with regvar_w() for word size or regvar_d() for double-word size.

Add regvar_d and regvar_w as keywords in ncgg.  Modify EX_REGVAR to
include the register size.  In ncg, add some checks for the register
size.  In tables without reglap, regvar() works as before, and ncg
ignores the register size in EX_REGVAR.
2017-10-17 12:05:41 -04:00
George Koehler 5432bd03d6 Do a move when coercing FREG to FREG or FSREG to FSREG. 2017-10-16 12:07:55 -04:00
George Koehler f0619ea4ae PowerPC ncg never uses the rules to stack LOCAL or DLOCAL. 2017-10-15 15:22:52 -04:00
George Koehler aa876ff4c2 Fix reglap for procedures that use both sizes of reg_float.
After the RA phase of ego, a procedure may put single-word and
double-word values in the same reg_float.  Then ncg will use both
LOCAL and DLOCAL tokens at the same offset.

I add isregvar_size() to ncg.  It receives the size of the LOCAL or
DLOCAL token, and picks the register of the correct size.  This fixes
a problem where ncg got the wrong-size register and corrupted the
stack.  This problem caused one of my test programs to segfault from
stack underflow.

Also adjust how fixregvars() handles both sizes.
2017-10-15 13:15:03 -04:00
George Koehler b342b83d28 Add function prototypes to mach/proto/ncg/regvar.c 2017-10-15 11:01:18 -04:00
George Koehler d6e9eac785 Merge branch 'default' into kernigh-linuxppc
This merges several fixes and improvements from upstream.  This
includes commit 5f6a773 to turn off qemuppc.  I see several failing
tests from qemuppc; this merge will hide the test failures.
2017-10-14 13:50:49 -04:00
George Koehler 7e9348169c Add reglap to ncg. Add 4-byte reg_float to PowerPC ncg.
The new feature "reglap" allows two sizes of floating-point register
variables (reg_float), if each register overlaps a single register of
the other size.  PowerPC ncg uses reglap to define 4-byte instances
of f14 to f31 that overlap the 8-byte instances.

When ncgg sees the definition of fs14("f14")=f14, it removes the
8-byte f14 from its rvnumbers array, and adds the 4-byte fs14 in its
place.  Later, when ncg puts a variable in fs14, if it is an 8-byte
variable, then ncg switches to the 8-byte f14.  The code has
/* reglap */ comments in util/ncgg or #ifdef REGLAP in mach/proto/ncg

reglap became necessary because my commit a20b87c caused PowerPC ego
to allocate reg_float in both 4-byte and 8-byte sizes.
2017-10-14 12:40:04 -04:00
David Given 64f2fa9d46 Stop using mktemp() --- on Haiku, it always generates the same filenames,
pretty much guaranteeing temporary file overwrites on parallel builds. Use
mkstemp() instead which creates the files atomically.
2017-08-06 13:22:05 +02:00
George Koehler 2c266c631a Reorder registers. Fix problem with ret 8.
After c5bb3be, ncg began to allocate regvars from r13 up.  I reorder
the regvars so ncg again allocates them from r31 down.  I also reorder
the other registers.

This exposed a bug in my rule for ret 8.  It was wrong if item %2 was
in r3, because I moved %1 to r3 before %2 to r4.  Fix it by adding
back an individual register class for r3 (called REG3 here, GPR3 in
c5bb3be).

Also fix my typo in mach.c that made a syntax error in assembly.
2017-02-17 19:32:27 -05:00
George Koehler 23c365c939 Fix comparison of 4-byte floats.
I broke it in f64b7d8.  My stack pattern had the wrong type of
registers.  The comparison popped too many bytes and corrupted the
stack.
2017-02-17 19:29:45 -05:00
George Koehler 736c45453c Remove .ret from libem and inline the code.
This removes a wrong-way dependency of libsys on libem.  The C
functions in libsys called .ret, but libsys is after libem in the
linker arguments, so the linker didn't find .ret unless something else
had called .ret.  Almost everything called .ret, but I got a linker
error when I wrote an assembly program using the EM runtime, because
my assembly program didn't call .ret.

Add a dummy comment to build.lua, so git checkout touches that file,
the build system reconfigures itself, and the *.s glob sees that ret.s
has gone.
2017-02-16 21:18:39 -05:00
George Koehler e6df553ebf For PowerPC, never put a reg_float value in a reg_any.
With this type check, I can change the size checks into assertions.
2017-02-16 20:30:17 -05:00
George Koehler aa47f52166 Switch error() and fatal() in mach/proto/ncg to stdarg.
This is like David Given's change to util/ncgg in d89f172.  I need
this change in mach/proto/ncg to see fatal messages, because a 64-bit
pointer doesn't fit in an int.
2017-02-16 20:26:53 -05:00
George Koehler cbe5d8640b Add floating-point register variables to PowerPC ncg.
Use f14 to f31 as register variables for 8-byte double-precison.
There are no regvars for 4-byte double precision, because all
regvar(reg_float) must have the same size.  I expect more programs to
prefer 8-byte double precision.

Teach mach/powerpc/ncg/mach.c to emit stfd and lfd instructions to
save and restore 8-byte regvars.  Delay emitting the function prolog
until f_regsave(), so we can use one addi to make stack space for both
local vars and saved registers.  Be more careful with types in mach.c;
don't assume that int and long and full are the same.

In ncg table, add f14 to f31 as register variables, and some rules to
use them.  Add rules to put the result of fadd, fsub, fmul, fdiv, fneg
in a regvar.  Without such rules, the result would go in a scratch
FREG, and we would need fmr to move it to the regvar.  Also add a rule
for pat sdl inreg($1)==reg_float with STACK, so we can unstack the
value directly into the regvar, again without a scratch FREG and fmr.

Edit util/ego/descr/powerpc.descr to tell ego about the new float
regvars.  This might not be working right; ego usually decides against
using any float regvars, so ack -O1 (not running ego) uses the
regvars, but ack -O4 (running ego) doesn't use the regvars.

Beware that ack -mosxppc runs ego using powerpc.descr but -mlinuxppc
and -mqemuppc run ego without a config file (since 8ef7c31).  I am
testing powerpc.descr with a local edit to plat/linuxppc/descr to run
ego with powerpc.descr there, but I did not commit my local edit.
2017-02-15 19:34:07 -05:00
George Koehler cf728c2a2a Implement lxl for PowerPC ncg.
This fixes lxl 1 (so it follows the static chain, not the dynamic
chain) and provides lxl 2 and greater.  The Modula-2 compiler uses lxl
for nested procedures, so they can access the variables of the
enclosing procedures.
2017-02-13 23:22:31 -05:00
George Koehler a8f62f44d8 Remove REG_PAIR.
I added REG_PAIR in cfbc537 to speed up the register allocator,
because ncg was taking about 2 seconds on each sti 8.  I defined only
4 such pairs, so allocating REG_PAIR was much faster than allocating
REG REG.

After my last commit c5bb3be, allocation of REG REG is fast, and
REG_PAIR seems unnecessary.
2017-02-13 18:11:27 -05:00
George Koehler c5bb3be495 Speed up register allocation by removing some register classes.
The table for PowerPC had placed each GPR and FPR into an individual
register class (like GPR3, GPR4, FPR1, FPR2), and had used these
classes to coerce stack values into specific registers.  But ncg does
not like having many register classes.

In http://tack.sourceforge.net/olddocs/ncg.pdf
Hans van Staveren wrote:

> Every extra property means the register set is more unorthogonal and
> *cg* execution time is influenced by that, because it has to take
> into account a larger set of registers that are not equivalent.  So
> try to keep the number of different register classes to a minimum.

Recent changes to the PowerPC table have removed many coercions to
specific registers.  Many functions in libem switched from taking
values in registers to taking them from the stack (see dc05cb2).

I now remove all 64 individual register classes of GPR and FPR.  In
the few cases where I need a stack value in a specific register, I now
do a move (as the arm and m68020 tables do).

This commit speeds the compilation of some files.  For my test file
fconv.c, the compilation time goes from over 20 seconds to under 1
second.  My fconv.c has 4 conversions from floats to integers, and the
table has my experimental rules that do the conversions by allocating
4 or 5 registers.
2017-02-13 17:44:46 -05:00
George Koehler dc05cb2dc8 Add pat cms !defined($1)
Switch .cms to pass inputs and outputs on the real stack, not in
registers; like we do with .and, .or (81c677d) and .xor (c578c49).

At this point, nearly all functions in libem use the real stack, not
registers, for passing inputs and outputs.  This simplifies the ncg
table (which needs fewer lists of specific registers) but slows calls
to libem.

For example, after ba9b021, each call to .aar4 is about 10
instructions slower.  I moved 3 inputs and 1 output from registers to
the real stack.  A program would take 4 instructions to move registers
to stack, 4 to move stack to registers, and perhaps 2 to adjust the
stack pointer.
2017-02-13 16:52:32 -05:00
George Koehler 89dd80e34d Add missing instances of "kills ALL" or "with STACK". 2017-02-13 16:38:26 -05:00
George Koehler ba9b021253 Use .los4 in lar 4 and .sts4 in sar 4.
Our libem had two implementations of loading a block from a stack, one
for lar 4 and one for los 4.  Now lar 4 and los 4 share the code in
.los4.  Likewise, sar 4 and sts 4 share the code in .sts4.

Rename .los to .los4 and .sts to .sts4, because they implement los 4
and sts 4.  Remove the special case for loading or storing 4 bytes,
because we can do it with 1 iteration of the loop.  Remove the lines
to "align size" where the size must already be a multiple of 4.

Fix the upper bound check in .aar4.

Change .aar4, .lar4, .los4, .sar4, .sts4 to pass all operands on the
real stack, except that .los4 and .sts4 take the size in register r3.
Have .aar4 set r3 to the size of the array element.  So lar 4 is just
.aar4 then .los4, and sar 4 is just .aar4 then .sts4.

ncg no longer calls .lar4 and .sar4 in libem, because it inlines the
code; but I keep .lar4 and .sar4 in libem, because mcg references
them.  They might or might not work in mcg.
2017-02-13 15:22:00 -05:00
George Koehler 54949f713f Change .fef8 and .fif8 to pass values on the stack.
Reorder the code in .fef8 and .fif8 so that in the usual case, we fall
through to the blr without taking any branches.  The usual case, by my
guess, is .fef8 with normalized numbers or .fif8 with small integers.

I change .fef8 and .fif8 to pass values on the real stack, not in
specific registers.  This simplifies the ncg table, and might help me
experiment with changes to the ncg table.

This change might or might not help mcg.  Seems that mcg always uses
the stack to pass values to libem, but I have not tested .fef8 or
.fif8 with mcg.
2017-02-12 16:44:37 -05:00
George Koehler 1de1e8f7f0 Experiment with conversions between integers and floats.
Switch some conversions from libem calls to inline code.  The
conversions from integers to floats are now too slow, because each
conversion allocates 4 or 5 registers, and the register allocator is
too slow.  I might use these slow conversions to experiment with the
register allocator.

I add the missing conversions between 4-byte single floats and
integers, simply by going through 8-byte double floats.  (These
replace the calls to nonexistant functions in libem.)

I remove the placeholder for fef 4, because it doesn't exist in libem,
and our language runtimes only use fef 8.
2017-02-12 15:45:28 -05:00
George Koehler 2e41c392fa Implement blm and bls using an inline loop.
This replaces a call to memmove() in libc.  That was working for me,
but it can fail because EM programs don't always link to libc.

blm and bls only need to copy aligned words.  They don't need to copy
bytes, and they don't need to copy between overlapping buffers, as
memmove() does.  So the new loop is simpler than memmove().
2017-02-11 19:30:12 -05:00
George Koehler c578c495bb Edit PowerPC assembly for .and, .cms, .ior, .xor, .zer
Remove one addi instruction from some loops.  These loops had
increased 2 pointers, they now increase 1 index.  I must initialize
the index, so I add "li r6, 0" before each loop.

Change .zer to use subf instead of neg, add.

Change .xor to take the size on the real stack, as .and and .or have
done since 81c677d.
2017-02-11 18:00:56 -05:00
George Koehler 83c13597e1 Use "mr" and make a few other tweaks in PowerPC ncg table.
Use extended "mr" instead of basic "or" to move registers.  Both "mr"
and "or" encode the same machine instruction.  With "mr", I can more
easily search the assembly output for register moves.

Fold several stacking rules into a single rule ANY_BHW-REG to STACK.

Remove the EM patterns for loc mlu $2==2 and loc slu.  The first
pattern had the wrong size (should be $2==4, not $2==2).  Both
patterns were redundant.  They rewrote loc mlu as loc mli and loc slu
as loc sli, but this table doesn't have patterns for loc mli or loc
sli, so it is enough to rewrite mlu as mli and slu as sli.
2017-02-10 11:45:50 -05:00
George Koehler 85391399a4 Use ha16/lo16 to load or store 1, 2, 8 bytes from labels.
Add the tokens IND_RL_B, IND_RL_H, IND_RL_H_S, IND_RL_D, along with
the rules to use them.  These rules emit shorter code.  For example,
loading a byte becomes lis, lbz instead of lis, addi, lbz.

While making this, I wrongly set IND_RL_D to size 4.  Then ncg made
infinite recursion in codegen() and stackupto(), until it crashed by
stack overflow.  I correctly set IND_RL_D to size 8, preventing the
crash.
2017-02-08 12:31:14 -05:00
George Koehler 5e00e1fce2 Trimming mach/powerpc/ncg/table
Remove coercion from LABEL to REG.  The coercion never happens because
I have stopped putting LABEL on the stack.  Also remove LABEL from set
ANY_BHW.  Retain the move from LABEL to REG because pat gto uses it.

Remove li32 instruction, unused after the switch to the hi16, ha16,
lo16 syntax.

Remove COMMENT(...) lines from most moves.  In my opinion, they took
too much space, both in the table and in the assembly output.  The
stacking rules and coercions keep their COMMENT(...)  lines.

In test GPR, don't write to RSCRATCH.

Fold several coercions into a single coercion from ANY_BHW uses REG.

Use REG instead of GPR in stack patterns.  REG and GPR act the same,
because every GPR on the stack is a REG, but I want to be clear that I
expect a REG, not r0.

In code rules, sort SUM_RC before SORT_RR, so I can add SUM_RL later.

Remove rules to optimize loc loc cii loc loc cii.  If $2==$4, the
peephole optimizer can optimize it.  If $2!=$4, then the EM program is
missing a conversion from size $2 to size $4.

Remove rules to store a SEX_B with sti 1 or a SEX_H with sti 2.  These
rules would never get used, unless the EM program is missing a
conversion from size 4 to size 1 or 2.
2017-02-08 12:27:16 -05:00
George Koehler ed21a59a82 In PowerPC ncg, allocate register for ha16[label].
Use it to generate code like

    lis r12,ha16[__II0]
    lis r11,ha16[_f]
    lfs f1,lo16[_f](r11)
    lfs f2,lo16[__II0](r12)
    fadds f13,f2,f1
    stfs f13,lo16[_f](r11)

Here ncg has allocated r11 for ha16[_f].  We use r11 in lfs and again
in stfs.  Before this change, we needed an extra lis before stfs,
because ncg did not remember that ha16[_f] was in a register.

This example has a gap between ha16[__II0] and lo16[__II0], because
the lo16 is not in the next instruction.  This requires my previous
commit 1bf58cf for RELOLIS.  There is a gap because ncg emits the lis
as soon as I allocate it.  The "lfs f2,lo16[__II0](r12)" happens in a
coercion from IND_RL_W to FSREG.  The coercion allocates one FSREG but
may not allocate any other registers.  So I must allocate r12 earlier.
I allocate r12 in pat lae, but this causes a gap.
2017-02-08 12:23:06 -05:00
George Koehler 754e96ef16 Use ha16/lo16 to emit pairs of lis/stw, lis/lfs, lis/stfs.
A 4-byte load from a label yields a token IND_RL_W.  This token emits
either lis/lwz or lis/lfs, if we want a general-purpose register or a
floating-point register.
2017-02-08 12:13:54 -05:00
George Koehler 7255ed403f Tweak some tokens in PowerPC ncg.
Remove the GPRINDIRECT token, and use the IND_RC_* tokens as operands
to instructions.  We no longer need to unpack an IND_RC_* token and
repack it as a GPRINDIRECT to use it in an instruction.

Allow storing IND_ALL_B and IND_ALL_H in register variables.  Create a
set ANY_BHW for anything that we can store in a regvar.

Push register variables on the stack without using GPRE, by changing
stwu to accept LOCAL.  Then ncg will replace the string ">>> BUG IN
LOCAL" with the register name.  (I copied ">>> BUG IN LOCAL" from
mach/arm/ncg/table.)

Fix the rule for "pat lil inreg($1)>0" to yield a IND_RC_W token, not
a register.  We might need to kill the token with "kills MEMORY".

Rename CONST_ALL to CONST_STACK, because it only includes constants on
the stack, and excludes CONST tokens.  Instructions still don't allow
CONST_STACK operands, so we still need to repack each CONST_STACK as a
CONST to use it in an instruction.

Rename LABEL_OFFSET_HI to just LABEL_HI, and same for LABEL_HA and
LABEL_HO.
2017-02-08 12:12:28 -05:00
George Koehler 1bf58cf51c Add RELOLIS for PowerPC lis with ha16 or hi16.
The new relocation type RELOLIS handles these instructions:

    lis RT, ha16[expr] == addis RT, r0, ha16[expr]
    lis RT, hi16[expr] == addis RT, r0, hi16[expr]

RELOLIS stores a 32-bit value in the program text.  In this value, the
high bit is a ha16 flag, the next 5 bits are the target register RT,
and the low bits are a signed 26-bit offset.  The linker replaces this
value with the lis instruction.

The old RELOPPC relocated a ha16/lo16 or hi16/lo16 pair.  The new
RELOLIS relocates only a ha16 or hi16, so it is no longer necessary to
have a matching lo16 in the next instruction.  The disadvantage is
that RELOLIS has only a signed 26-bit offset, not a 32-bit offset.

Switch the assembler to use RELOLIS for ha16 or hi16 and RELO2 for
lo16.  The li32 instruction still uses the old RELOPPC relocation.

This is not the same as my RELOPPC change from my recent mail to
tack-devel (https://sourceforge.net/p/tack/mailman/message/35651528/).
This commit is on a different branch.  Here I am throwing away my
RELOPPC change and instead trying RELOLIS.
2017-02-08 11:46:31 -05:00
George Koehler f4cfbedd5c Remove #include <stdbool.h> from mach/powerpc/as/mach1.c
We should not include a system header file here, because
mach/proto/as/comm2.y goes through cpp twice.  The include can cause
problems like https://github.com/davidgiven/ack/issues/1

Remove this include #<stdbool.h> and leave a comment pointing to the
includes in comm0.h.  Change the few instances of bool, false, true,
to int, 0, 1.
2017-01-30 16:39:23 -05:00
George Koehler 3c1d2d79f0 Remove type quad, use type word_t in PowerPC as.
Type word_t is for encoding the machine instructions.  It only needs
32 bits for PowerPC.  It was long (which can have 32 or 64 bits), and
there was a second type quad (which was uint32_t).  Switch word_t to
uint32_t and replace quad with word_t.

Also change valu_t and ADDR_T away from long.
2017-01-30 16:15:02 -05:00
George Koehler 48e3aab728 Swap RA and RS when assembling "and", "or", and such instructions.
They must use OP_RA_RS_RB_C instead of OP_RS_RA_RB_C.  The code
generator often sets RS and RA to the same register, so swapping them
causes no change in many programs.

I also rename OP_RS_RA_UI_CC to OP_RA_RS_UI_CC, and OP_RS_RA_C to
OP_RA_RS_C, because they already swap RA and RS.
2017-01-30 15:47:09 -05:00
George Koehler 9ddbb66c8b Turn off comments again. I turned them on by accident in c416889. 2017-01-30 15:45:46 -05:00
George Koehler c41688929c In PowerPC ncg, switch the scratch register from r11 to r0.
r0 is a special case and can't be used when adding a register to a
constant.  The few remaining users of the scratch register don't do
that.  I removed other usages of the scratch register in 7c64dab,
5b5f774, 19f0eb8, f64b7d8.
2017-01-26 13:10:08 -05:00
George Koehler 1dfd5524e4 In PowerPC top, don't delete addi r0, r0, 0
Also don't delete addis r0, r0, 0.  These instructions are special
cases that set r0 to zero.  If we delete them, then r0 keeps its old
value.

I caught this bug because osxppc protects the .text segment against
writing.  (linuxppc doesn't protect it.)  A program tried to set r0 to
the NULL pointer, but top deleted the instruction, so r0 kept an old
return address pointing into .text.  Later the program checked that r0
wasn't NULL, tried to write to address r0, and crashed.
2017-01-26 12:44:32 -05:00
George Koehler 8c8f291a07 In PowerPC libem, remove tge.s and powerpc.h
Nothing uses the tables in tge.s, after I changed the ncg table.
There are no *.e files in libem, so don't try to build them.
2017-01-26 12:39:16 -05:00
George Koehler f64b7d8ea0 Rewrite how PowerPC ncg does conditional branches and tests.
The rewritten code rules bring 3 new features:

  1.  The new rules compare a small constant with a register by
      reversing the comparison and using `cmpwi` or `cmplwi`.  The old
      rules put the constant in a register.

  2.  The new rules emit shorter code to yield the test results,
      without referencing the tables in mach/powerpc/ncg/tge.s.

  3.  The new rules use the extended `beq` and relatives, not the
      basic `bc`, in the assembly output.

I delete the old tristate tokens and the old moves, because they
confused me.  Some of the old moves weren't really moves.  For
example, `move R3, C0` and then `move C0, R0` did not move r3 to r0.

I rename C0 to CR0.
2017-01-25 19:08:55 -05:00
George Koehler a348853ece Add missing size declarations for 8-byte registers.
This fixes the coercion from IND_ALL_D to FREG.  The coercion had
never happened, because IND_ALL_D had 8 bytes but FREG had 4 bytes.
Instead, ncg always stacked the IND_ALL_D and unstacked a FREG.  The
stacking rule uses f0, so the code did load f0 with the indirect
value, push f0 to stack, load f1 to stack, move stack pointer.  Now
that FREG has 8 bytes, ncg does the coercion, and the code just loads
f1 with the indirect value.
2017-01-25 11:56:58 -05:00
George Koehler 188b23bade Add constraints for pat lab, as done in the m68020 table.
Always use 'kills ALL' when reaching a label, because our registers
and tokens have the wrong values if the program jumps to this label
from somewhere else.

When falling through a label, if the top element is in r3, then
require that the rest of the stack is in the real STACK, not in
registers or tokens.

I'm doing this to be certain that the missing constraints are not
causing bugs.  I did not find any such bug, perhaps because the labels
are usually near other instructions (like conditional branches and
function calls) that stack or kill tokens.
2017-01-24 11:26:35 -05:00
George Koehler bb67dbeb11 Use "kills ALL" instead of a list of killed registers.
This is for fef 8 and fif 8.  I changed .fef8 so it no longer kills
r7, but I don't want to update the list.  We already use "kills ALL"
for most other calls to libem.
2017-01-23 17:31:29 -05:00
George Koehler 032bcffef6 In PowerPC libem, use the new features of our assembler.
The new features are the hi16/lo16 and ha16/lo16 syntax for
relocations, and the extended mnemonics like "blr".

Use ha16/lo16 to load some double floats with 2 instructions (lis/lfd)
instead of 3 (lis/ori/lfd).

Use the extended names for branches, comparisons, and bit rotations,
so I can more easily read the code.  The new names often encode the
same machine instructions as the old names, except in a few places
where I changed the instructions.

Stop using andi. when we don't need to set cr0.  In inn.s, I change
andi. to extrwi to extract the same bits.  In los.s and sts.s, I
change "andi. r3, r3, ~3" to "clrrwi r3, r3, 2".  This avoids setting
cr0 and also stops clearing the high 16 bits of r3.

In csa.s, los.s, sts.s, I change some comparisons and right shifts
from signed to unsigned (cmplw, cmplwi, srwi), because the sizes are
unsigned.  In inn.s, the right shift can be signed (sraw) or unsigned
(srw), but I use srw because we don't need the carry bit.

In fef8.s, I save an instruction by using rlwinm instead of addis/andc
to rlwinm to clear a field.  The code no longer kills r7.  In both
fef8.s and fif8.s, I remove the list of killed registers.

Also remove some whitespace from ends of lines.
2017-01-23 17:16:39 -05:00
George Koehler 5aa2ac2246 Teach the assembler about PowerPC extended mnemonics.
Also make a few changes to basic mnemonics.  Fix typo in name of the
basic "creqv".  Add the basic "addc" and relatives, because it would
be odd to have the extended "subc" without "addc".  Fix the basic
"rldicl", "rldicr", "rldic", "rldimi" to correctly encode the 6-bit MB
field.  Fix "slw" and relatives to correctly swap their RA and RS
operands.

Add many, but not all, of the extended mnemonics from IBM's Power ISA
Version 2.06 Book I Appendix E.  (I used 2.06, published 2009, just
because I already had the PDF of it.)  This commit includes mnemonics
for branching, subtraction, traps, bit rotation, and a few others,
like "mflr" and "nop".  The assembler now understands branches like
`beq cr7, label` and bit shifts like `slwi r7, r7, 2`.  These encode
the same machine instructions as the basic "bc" and "rlwinm".

Some operands to basic names become optional.  The assembler no longer
requires the level in "sc" or the branch hint in "bcctr" and "bclr";
they default to zero.  Some extended names take an optional branch
hint or condition register.

Some extended names are still missing.  I don't provide names with
static branch prediction, like "beq+" or "bge-", because the assembler
parses '+' and '-' as operators, not as part of an instruction name.
I also don't provide some names that 2.06 has for moving to or from
the condition register or some special purpose registers, names like
"mtcr" or "mfuamr".

This commit also deletes some unused tokens and one unused yacc rule.
2017-01-21 23:49:29 -05:00
David Given d7df126730 Merge pull request #44 from kernigh/kernigh-pr-as
mach/proto/as: allow more tokens
2017-01-18 23:33:40 +01:00
George Koehler f705339f86 Allow more tokens in the assembler.
I need this so I can add more %token lines to mach/powerpc/as/mach2.c

The assembler's tempfile encoded each token in a byte.  This only
worked with tokens 0 to 127 and 256 and 383.  If a token 384 or higher
existed, the assembler stopped working.  I need tokens 384 and higher.

I change the token encoding to a 2-byte little-endian integer.  I also
change a byte in the string encoding.
2017-01-17 22:41:11 -05:00
David Given 232545606d Merge from default. 2017-01-18 00:02:32 +01:00
George Koehler ba2a03705e Use prototypes in mach/proto/as/comm5.c
Order the function prototypes in comm1.h to match the order of the
function definitions in *.c files.
2017-01-17 16:41:29 -05:00
David Given 81c677d218 Add a bunch more set operations to the PowerPC backends, and the Pascal test
for the same.
2017-01-17 22:31:38 +01:00
George Koehler 916d270534 Delay inclusion of <stdint.h> when compiling comm2.y
See issue #1 (https://github.com/davidgiven/ack/issues/1).  The file
mach/proto/as/comm2.y goes through cpp twice.  The _include macro,
defined in comm2.y and used in comm0.h, delays the inclusion of system
header files.  The inclusion of <stdint.h> wasn't delayed.  This
caused multiple inclusions of <sys/_types.h> in FreeBSD and
<machine/_types.h> in OpenBSD.

Use _include to delay <stdint.h>.  Also use _include for "arch.h" and
"out.h", because h/out.h includes <stdint.h> and h/arch.h might
include it in the future.

Sort the system includes in comm0.h by moving them up to be with
<stdint.h>.  Must include <stdint.h> before "mach0.c", because
mach/powerpc/as/mach0.c needs it.  Must include "mach0.c" before
checking ASLD.
2017-01-16 22:39:44 -05:00
George Koehler e97116c037 Remove some obsolete code that causes a gcc warning.
In my OpenBSD/amd64 system, the code becomes

    if (0)
        outname.on_valu &= ~(((0xFFFFFFFF)<<32)<<32);

The 0xFFFFFFFF is a 32-bit int, so the left shift by 32 is out of
range and causes the gcc warning.

The intent might be to clear any sign-extended bits, if the assignment
outname.on_valu = valu did sign extension.  Old C had no unsigned
long, so .on_valu would have been long.  The code is obsolete because
h/out.h now declares .on_valu as uint32_t.
2017-01-16 18:09:55 -05:00
David Given c471f617b7 Ensure that memory is zero-initialised. 2017-01-16 22:45:03 +01:00
David Given 2cdcc16bc2 Fix a buffer overrun that was manifesting on OpenBSD; also fix a bounds check and some uninitialised variable problems. 2017-01-16 22:44:37 +01:00
David Given fa5675d439 Run through clang-format. 2017-01-16 21:16:33 +01:00
David Given e7e29d34ff Add a test (currently failing) to check that Pascal char sets can store all 256
possible values. Add the PowerPC ncg and mcg backend support to let the test
actually run, including modifying a bunch of PowrePC libem functions so that
they can be called from both ncg and mcg.
2017-01-15 22:28:14 +01:00
David Given 9a346c382d Turns out Apple's hi16/ha16 exactly match my ha16/has16, so renamed
accordingly. (Memo to self: read the docs *before* doing the work.)
2017-01-15 11:59:33 +01:00
David Given f80acfe9f5 Signed vs unsigned lower halves of powerpc fixups are now handled by having two
assembler directives, ha16() and has16(), for the upper half; has16() applies
the sign adjustment. .powerpcfixup is now gone, as we generate the relocation
in ha*() instead. Add special logic to the linker for undoing and redoing the
sign adjustment when reading/writing fixups. Tests still pass.
2017-01-15 11:51:37 +01:00
David Given 3c0bc205fc Update the hi/lo syntax to be a bit more standard. 2017-01-15 10:21:02 +01:00
David Given 8edbff9795 Add assembler support for fixing up arbitrary oris/addi pairs of instructions;
this should allow oris/lwz constant value loads, which will save an opcode.
2017-01-15 00:15:01 +01:00
David Given efab08178b Fix a bunch of issues with pushing and popping mismatched sizes, which the B
compiler does a lot; dup 8 for pairs of words is now optimised.
2017-01-07 18:47:00 +01:00
David Given 6b4f8d72b8 ine and ste are now declared to modify memory (preventing cached values being
propagated across the modification).
2017-01-07 13:25:09 +01:00
David Given 7710c76d56 Introduce sequence points before store instructions to prevent loads from the
same address being delayed until after the store (at which point they'll return
the wrong value).
2017-01-07 13:17:39 +01:00
David Given 0da248dced Use a better NOT; and after remembering that PowerPC bit numbers are all
backwards in the documentation, rewrote IFEQ/IFLT/IFLE to actually work.
Probably. Thanks to the B test suite for spotting this.
2017-01-07 01:03:15 +01:00
David Given 73922f1d16 Ensure that procedure labels are word-aligned. 2017-01-06 22:29:52 +01:00
David Given e3f8fb84dc Change the i80 assembler to be three-pass, which allows forward references;
required for assembling B.
2016-12-29 17:08:53 +00:00
David Given e50f4be710 Merge from default. 2016-12-26 19:44:48 +00:00
David Given bf2e0be69a Merge pull request #27 from kernigh/pr-qemu-doze
Teach qemuppc to halt the cpu on _exit().
2016-12-11 23:17:12 +01:00
George Koehler 8605a2fcfc Add Modula-2 set operations to PowerPC ncg.
This provides and, ior, xor, com, zer, set, cms when defined($1) and
ior, set when !defined($1).  I don't provide the other operations
!defined($1) because our Modula-2 compiler hasn't used them.

I wrote a Modula-2 example in
https://gist.github.com/kernigh/add79662bb3c63ffb7c46d01dc8ae788

Put a dummy comment in mach/powerpc/libem/build.lua so git checkout
will touch that file.  Without the touch, the build system doesn't see
the new *.s files.
2016-12-10 12:23:07 -05:00
George Koehler fcda786fe9 Add some missing clauses to los, sts, aar, inn, cmi, cmu.
We only implement 'los 4', 'sts 4', 'cmi 4', 'cmu 4', not for sizes
other than 4.  Add clause $1==4.

We only implement inn when defined($1).

The rule for aar needs 'kills ALL' because it kills many registers,
like other rules that call libem.
2016-12-09 19:49:50 -05:00
George Koehler 436114fce4 Add a move from CONST smalls(%val) to GPR.
This allows 'move {CONST, $1}, R3' with a small enough $1 to emit one
instruction (addi) instead of two instructions (addis, ori).  The
CONST token confusingly isn't in the CONST_ALL set.
2016-12-09 18:40:14 -05:00
George Koehler 17211eef47 Fix ass to match the EM spec.
The spec says, "ASS w: Adjust the stack pointer by w-byte integer".
The w argument "can either be given as argument or on top of the
stack."  Therefore, 'ass 4' would pop the 4-byte integer from the
stack, but 'ass' would pop the size w from the stack, then pop the
w-byte integer.

PowerPC ncg wrongly implemented 'ass' as if it was 'ass 4'.  Fix it to
accept only 'ass 4'.
2016-12-09 17:32:42 -05:00
George Koehler 5bd0ad4269 Remove the bogus rules for 'lor 2' and 'str 2'.
These instructions would load or store the EM heap pointer.  They
don't work.  Programs must use brk() or sbrk() in libsys.

The last file to use 'lor 2' and 'str 2' was lang/pc/libpc/sav.e in
the Pascal library.  Commit c084f9f deleted the file, so we no longer
need rules 'lor 2' or 'str 2' to build the ACK.
2016-12-09 17:00:56 -05:00
George Koehler 805883e377 Fill in a hint for enabling the COMMENT macro.
If you want to enable comments in the .s file, change

    #define COMMENT(n) /* comment {LABEL, n} */

to

    #define COMMENT(n) comment {LABEL, n}
2016-12-09 16:58:47 -05:00
George Koehler 244e554f2f Remove trailing whitespace in mach/powerpc/ncg/table 2016-12-09 16:36:42 -05:00
George Koehler b8c921ca70 Allow mfspr, mtspr with a register number.
PowerPC has a few hundred special-purpose registers.  The assembler
had only accepted the names "xer", "lr", "ctr".  Most programs use
only those three SPRs.  If I add more names, they would almost never
get used, and they might conflict with labels.

I want to use "mfspr r3, 0x3f0" and "mtspr 0x3f0, r3" in
plat/qemu/boot.s to access register hid0 from supervisor mode.
2016-12-07 17:28:00 -05:00
David Given 55e24e1f24 inn was assuming that bitfields were arrays of bytes, when actually they're
arrays of words (which makes the LSB move on big-endian systems).
2016-12-06 21:45:20 +01:00
David Given fbd6e8f63d Add support for consecutive labels; needed by the B compiler. 2016-11-27 21:18:00 +01:00
David Given 5bce5fc4da Change the extension used by Basic files for .b to .bas, to avoid conflicts
with B.
2016-11-27 20:38:33 +01:00
David Given f8fa3ece42 inn on ncg now passes the CPU tests. 2016-11-20 19:35:34 +01:00
David Given 953c08839f inn works now; add a helper for it. 2016-11-20 12:53:44 +01:00
David Given 196fa914b3 lxa now works, I hope; traps are better (and stubbed out on qemuppc). 2016-11-20 11:57:21 +01:00
David Given d5328492d7 Better handling of float conversions; more tests; converting to unsigned ints
works now.
2016-11-20 11:27:40 +01:00
David Given 454a7494bb cif8 and cuf8 work now. More tests. 2016-11-19 11:42:30 +01:00
David Given cc660b230f Floats and doubles are now written out correctly. 2016-11-19 11:39:13 +01:00
David Given d31bc6a3f9 Made csa and csb work with mcg; adjust the libem functions and the
corresponding invocation in the ncg table so the same helpers can be used for
both mcg and ncg. Add a new IR opcode, FARJUMP, which jumps to a helper
function but saves volatile registers.
2016-11-19 10:55:41 +01:00
David Given 5208e5f751 Yet another OB1 stack format fix. 2016-11-19 10:42:22 +01:00
David Given 43439c6d0c Remember to push the result of lor onto the stack. 2016-11-17 22:04:32 +01:00
David Given 81bc2c74c5 A bb's regsin are no longer the same as those of its first instruction;
occasionally the first hop of a block would try to rearrange its registers (due
to evicted throughs), resulting in the phi moves copying values into the wrong
registers.
2016-11-16 20:52:15 +01:00
David Given 581fa4a457 Reenable eviction of corrupted registers, which had been broken by a previous
change. Change the register move code to get swaps right, or at least righter.
2016-11-15 21:55:10 +01:00
David Given 86c832ef86 Put saved registers in *actually* the write place. I hope. 2016-11-15 21:54:15 +01:00
David Given cc686ded62 Get subtractions the right way round. 2016-11-15 20:25:11 +01:00
David Given 0289b1004e Allow values left on the stack at the end of the procedure (it's legal!). 2016-11-14 21:47:49 +01:00
David Given e7132183fb Fix buffer overrun: if LABEL_STARTER is seen but LABEL_TERMINATOR is not, the
label parser will keep going forever looking for the end of the label. It now
stops at the end of the string.
2016-11-13 14:04:58 +01:00
David Given 852d3a691d Update the table to return call output values in the right registers. Fix the
register allocator so the corrupted registers only apply to throughs
(otherwise, you can't put output registers in corrupted registers).
2016-11-11 21:48:36 +01:00
David Given b5c1d622f5 Rework the way stack frames are laid out to be simpler and, hopefully, more
correct. Saved registers are now placed in what may be the right place.
2016-11-11 21:17:45 +01:00
David Given 84ee75ec07 Merge from default. 2016-11-11 20:17:54 +01:00
David Given d82df74a7a Rename addr_t to address_t to avoid clashes with the system addr_t. 2016-11-11 20:17:10 +01:00
David Given fd91851005 Add enough return types to the K&R C that the ACK builds (on Linux) using clang
now.
2016-11-10 22:04:18 +01:00
David Given 4fa2c94a4a Correctly mangle labels used in initialisers. 2016-10-31 23:21:33 +01:00
David Given 9261cd978d Typo fix. 2016-10-31 23:16:02 +01:00
David Given 941072e0d7 Add, I hope, patterns for fmsub, fnmadd, and fnmsub (also float versions). 2016-10-31 22:36:54 +01:00
David Given 44f0cea6ca Also use fmadd for single-precision floats. 2016-10-31 19:55:16 +01:00
David Given 064d1a5d5d Use fmadd for multiply-and-add instructions. 2016-10-31 19:52:17 +01:00
David Given e19850b114 Fix a few c11isms. 2016-10-30 16:51:06 +01:00
David Given ca5b6e07bb Properly export symbols. 2016-10-29 23:52:17 +02:00
David Given 8c3670483f Get top working with the PowerPC; use it to eliminate useless branches and
moves.
2016-10-29 23:37:11 +02:00
David Given a8c4dac67c Merge from default (merging in George Koehler's PowerPC changes). 2016-10-29 22:40:40 +02:00
David Given a311e61360 Add support for preserved registers. 2016-10-29 20:22:44 +02:00
David Given e3ebf986e9 More opcodes. 2016-10-29 13:32:09 +02:00
David Given 1ae8b90238 More opcodes. 2016-10-29 12:55:34 +02:00
David Given acaae765af Emit negative constants correctly. 2016-10-29 12:55:21 +02:00
David Given 61349389fb More opcodes. sti can now cope with non-standard sizes (really need a better
fix for this). Hack in crude support for mismatched stack pushes and pops (ints
vs longs).
2016-10-29 12:48:05 +02:00
David Given 68419da235 Actually, the locals need to go above the spills and saved regs, so fp == lb. 2016-10-29 12:00:33 +02:00
David Given 2cc2c0ae98 Lots more opcodes. Rearrange the stack layout so that fp->ab is a fixed value
(needed for CHAINFP and FPTOAB). Wire up lfrs to calls via a phi when
necessary, to allow call-bra-lfr chains.
2016-10-29 11:57:56 +02:00
David Given bfa65168e2 Don't generate phis if unnecessary (because this breaks the
critical-edge-splitting guarantee and causes insertion of phi copies to fail).
2016-10-29 10:55:48 +02:00
David Given 658db4ba71 Mangle label names (turns out that the ACK assembler can't really cope with
labels that are the same name as instructions...).
2016-10-27 23:17:16 +02:00
David Given 81525c0f2c Swaps work (at least for registers). More opcodes. Rearrange the stack layout
so we can always trivially find fp, which lets CHAINFP work.
2016-10-27 21:50:58 +02:00
David Given be3dece5af Allow emission of strings containing ". 2016-10-27 21:48:46 +02:00
David Given 51bd3ee4dd Fix bug where some phis weren't being inserted when a given variable definition
needed more than one phi (due to the dominance frontier containing more than
one basic block).
2016-10-27 21:40:25 +02:00
David Given 9977ce841a Remove the bytes1, bytes2, bytes4, bytes8 attributes; remove the concept of a
register 'type'; now use int/float/long/double throughout to identify
registers. Lots of register allocator tweaks and table bugfixes --- we now get
through the dreading Mathlib.mod!
2016-10-25 23:04:20 +02:00
David Given 45a7f2e993 Phi copies are now inserted as part of type inference. More opcodes. 2016-10-24 22:14:08 +02:00
David Given 111c13e253 More opcodes. 2016-10-24 20:15:22 +02:00
David Given a4644dee4d More opcodes. 2016-10-24 12:08:40 +02:00
David Given b22780c075 More opcodes, including the difficult and fairly stupid los/sts. 2016-10-23 22:24:08 +02:00
David Given abd0cedd61 Massive change to how IR types are handled; we use the type code for matching
rather than the size. Much cleaner and simpler.
2016-10-23 21:54:14 +02:00
David Given b1a3d76d6f Re-re-add the type inference layer, now I know more about how things work.
Remove that terrible float promotion code.
2016-10-22 23:04:13 +02:00
David Given 11b0bc1055 More opcodes. 2016-10-22 20:32:51 +02:00
David Given 2d52b1fdaa Remove GETRET; values are now returned directly by CALL. Fix a bug in
convertstackops which was resulting in duplicate IR groups.
2016-10-22 12:13:57 +02:00
David Given ceb938fb3c More opcodes. 2016-10-22 11:26:28 +02:00
David Given 7ae888b754 Hacky workaround the way the Modula-2 compiler generates non-standard sized
loads and saves. More opcodes; simplified table using macros.
2016-10-22 10:48:22 +02:00
David Given 90d0661639 Typo fix. 2016-10-22 00:48:55 +02:00
David Given f851ab83af Better (and more correct) floating point conversions; fif; various new opcodes. 2016-10-22 00:48:26 +02:00
David Given d535be87b1 fef4 and fef8 is now cleaner, albeit slower; add some more register alias
stuff.
2016-10-22 00:02:15 +02:00
David Given 4db402f229 Add (pretty crummy) support for register aliases and static pairs of registers.
We should have enough functionality now for rather buggy 8-bit ints and
doubles. Rework the table and the platform.c to match.
2016-10-21 23:31:00 +02:00
David Given e4fec71f9c Lots more opcodes; better eviction behaviour; better register moves. Lots more
PowerPC stuff (some working).
2016-10-19 23:29:05 +02:00
David Given ffb1eabf45 Floating point promotion is less buggy. 2016-10-19 23:27:53 +02:00
George Koehler 99dee0ad24 Remove f14 to f31 from FREG and FSREG.
This would have happened later, if f14 to f31 became regvar (like r13
to r31 are now).  I am doing it now because ncg is too slow for rules
"with FREG FREG uses FREG".  We use such rules for adf 8 and other EM
instructions that operate on 2 floats.  Like my last commit cfbc537,
this commit speeds ncg by removing choices for register allocation.
2016-10-18 21:16:47 -04:00
David Given d5071e7df1 Promote values accessed via NOP. 2016-10-18 23:58:03 +02:00
David Given 5413d47029 '!' tracing is now always emitted; tracing goes to stderr. 2016-10-18 22:32:09 +02:00
David Given 3520704ea8 Add support for floating point constants. 2016-10-18 22:29:42 +02:00
George Koehler cfbc537959 In powerpc ncg, add a speed hack for sti 8.
ncg is too slow with this many registers.  A stack pattern "with GPR
GPR GPR" or "with REG REG REG" takes too long to pick registers,
causing ncg 8 to take about 2 seconds on each sti 8.  I introduce
REG_PAIR and there are only 4 such pairs.

For programs that use sti 8 (including C programs that copy 8-byte
structs), this speed hack improves the ncg run from several seconds to
almost instantaneous.

Also add a few COMMENT(...) lines in stacking rules.
2016-10-17 20:31:59 -04:00
David Given 938fb8c2fc Lots more opcodes. 2016-10-18 00:31:26 +02:00
David Given 4a093b9eba Add li and mr pseudoinstructions. 2016-10-18 00:21:32 +02:00
George Koehler c7b68033ef Add costs to powerpc instructions.
Also show how andi., andis., or., set condition codes.
2016-10-17 14:57:21 -04:00
George Koehler f33b30ed3c Rewrite .fif8 to avoid powerpc64 fctid
This fixes the SIGILL (illegal instruction) in startrek when firing
phasers.  The 32-bit processors in my PowerPC Mac and in QEMU don't
have fctid, a 64-bit instruction.

I got the idea from mach/proto/fp/fif8.c to extract the exponent,
clear some bits to get an integer, then subtract the integer from
the original value to get the fraction.
2016-10-17 00:39:59 -04:00
George Koehler e2ccc8f942 Add "kills MEMORY" to powerpc sti rules.
Adjust some of the loi rules (and associated moves) so we can identify
the tokens that must be in MEMORY.

With this commit, I can navigate the Enterprise even if I comment out
my work-around from e22c888.
2016-10-16 18:13:39 -04:00
David Given 5f0164db62 Bolt mcg into the PowerPC backend. It doesn't build yet, but it is generating
*some* code.
2016-10-17 00:06:06 +02:00
David Given d539389e81 Merge in the unfinished PowerPC branch. 2016-10-16 22:38:27 +02:00
David Given 1e17921208 Implement saving of dirty registers onto the stack. 2016-10-16 22:37:42 +02:00
George Koehler 19f0eb86a4 Remove IND_LABEL_W and IND_LABEL_D
Because li32 always loads a label into a GPR, it is sufficient to
coerce LABEL to REG, then use IND_RC_W or IND_RC_D for indirection
through the label.
2016-10-16 16:33:24 -04:00
George Koehler 5b5f774a64 Simplify moves to and from IND_RC_*
Now that SUM_RC always has a signed 16-bit constant, it happens that
the various IND_RC_* tokens also have a signed 16-bit constant, so
we no longer need to touch the scratch register.
2016-10-16 16:02:25 -04:00
George Koehler 7c64dab491 Refactor how powerpc ncg pushes constants.
When loc (load constant) pushes a constant, it now checks the value of
the constant and pushes any of 7 tokens.  These tokens allow stack
patterns to recognize 16-bit signed integers (CONST2), 16-bit unsigned
integers (UCONST2), multiples of 0x10000 (CONST_HZ), and other
interesting forms of constants.

Use the new constant tokens in the rules for adi, sbi, and, ior, xor.
Adjust a few other rules to understand the new tokens.

Require that SUM_RC has a signed 16-bit constant, and OR_RC and XOR_RC
each have an unsigned 16-bit constant.  The moves from SUM_RC, OR_RC,
XOR_RC to GPR no longer touch the scratch register, because the
constant is not too big.
2016-10-16 13:58:54 -04:00
George Koehler baa152217e Remove unused parts of mach/powerpc/ncg/table
Remove unused tokens GPRINDIRECTLO, HILABEL, LOLABEL, LABELI.  Also
remove an #if 0 ... #endif group of patterns.
2016-10-15 20:00:48 -04:00
David Given 6a23906ad8 Various bits of cleanup; we should almost be ready to try sending this to the
assembler soon...
2016-10-15 23:39:38 +02:00
David Given 286435a2ed Oops, forgot to add the output option spec to the string! 2016-10-15 23:34:54 +02:00
David Given b36897c299 References to the stack frame are now rendered properly. 2016-10-15 23:33:30 +02:00
David Given a8ee82d197 Stop passing proc around, and use a global instead --- much cleaner. 2016-10-15 23:19:44 +02:00
David Given 7aa60a6451 Register spilling to the stack frame works, more or less. 2016-10-15 22:53:56 +02:00
David Given 0eb32e7553 Fix yet another bug to do with IR register outputs. 2016-10-15 19:14:25 +02:00
David Given 9504aec2bd Function termination gets routed through an exit block; we now have prologues
and epilogues. mcgg now exports some useful data as headers. Start factoring
out some of the architecture-specific bits into an architecture-specific file.
2016-10-15 18:38:46 +02:00
David Given 5ad3aa8595 Add a pile of new instructions used by Pascal; I'm going to need to think about
how locals and the local base are handled.
2016-10-15 13:07:59 +02:00
David Given 358c44de35 Bytes were sometimes failing to be sign extended correctly. 2016-10-15 12:11:40 +02:00
David Given 517120d0fb Allow asm names for registers which are different from the friendly names shown
in the tracing (because PowerPC register names are just numbers).
2016-10-15 11:42:47 +02:00
David Given b2ddf12473 Some more opcodes. 2016-10-15 11:22:40 +02:00
George Koehler 29cb008faa In powerpc table, fix macros los() and his().
Change the operator in his() from a - minus to a + plus.  When los(n)
becomes negative, then his(n) needs to add 0x10000, not subtract it.

Also change los(n) to do the sign extension, because smalls(los(n))
should be true, not false.

Also change hi(n) and lo(n) to wrap n in parentheses, as (n), because
these are macros and n might still contain operators.
2016-10-14 23:59:26 -04:00
David Given bb17aea73a You can now mark a register as corrupting a certain register class; calls work,
or at least look like they work. The bad news is that the register allocator
has a rare talent for putting things in the wrong register.
2016-10-15 01:15:08 +02:00
David Given 886adb86d7 Log empty hops. 2016-10-14 23:19:25 +02:00
David Given 4f2177e41f Reworked loads and stores; it's now *different*, maybe not better. 2016-10-14 23:19:02 +02:00
David Given a63052427e Factor out the register allocation routines to make them easier to deal with. 2016-10-14 23:17:06 +02:00
David Given bb53a7fb51 Fix stupid issue where hop output registers were being overwritten, leading to
invalid SSA form.
2016-10-14 23:12:29 +02:00
David Given 98fe70a7de Output register equality constraints work. 2016-10-14 22:17:02 +02:00
David Given 216ff5cc43 Make loads and stores in the table nicer; fix a place where it looked like it
was working but only accidentally.
2016-10-12 23:12:53 +02:00
David Given f06b51c981 Keep track of register types as well as attributes --- the type being how we
find new registers when evicting values. Input constraints work (they were
being ignored before). Various bug fixing so they actually work.
2016-10-12 22:58:46 +02:00
David Given 4723a1442f Add code to remove unused phis, converting to pruned SSA form, to avoid
confusing the register allocator later.
2016-10-12 21:50:12 +02:00
David Given df239b3f90 Don't allow the same IR to be added to the sequence list more than once
(sometimes happens because op_dup, but makes no sense).
2016-10-12 00:45:36 +02:00
David Given 96dffd2007 Clean up the allocator a bit, in preparation for making it lots more
complicated; no semantic changes.
2016-10-11 23:17:30 +02:00
David Given 668cccdff1 A few more opcodes. 2016-10-11 00:29:18 +02:00
David Given 2be1c51885 A little fiddling with store instructions. The PowerPC is not friendly to
iburg.
2016-10-11 00:23:35 +02:00
David Given e93c58dc8d Refactored the way hops are rendered; add support for emitting code (although
with no prologue or epilogue yet).
2016-10-11 00:12:11 +02:00
David Given 92bd1ac5f4 Register allocator now gets all the way through all of my test file without
crashing (albeit with register moves and swaps stubbed out). Correct code? Who
knows.
2016-10-10 23:19:46 +02:00
David Given a4d06d1795 D'oh, need multiple passes over the edge splitter in order to properly find all
cases.
2016-10-10 23:18:37 +02:00
David Given fac12aae32 Calculate phi congruency groups; use them to solve the
importing-hreg-from-the-future problem (probably poorly).
2016-10-09 22:04:20 +02:00
David Given 23c3575f0f The register allocator now makes a spirited attempt to honour register
attributes when allocating. Unfortunately, backward edges don't work (because
the limited def-use chain stuff doesn't work across basic blocks). Needs more
thought.
2016-10-09 15:09:34 +02:00
David Given 38de688c5a Floating point promotion was broken since the IR float change. Fix. 2016-10-09 15:08:03 +02:00
David Given 36cddd6afb Add some more opcodes; rearrange the registers to be more PowerPC-friendly. 2016-10-09 14:45:13 +02:00
David Given cfe5312fcc Predicates can now take numeric arguments. The PowerPC predicates have been
turned into generic ones (as they'll be useful everywhere). Node arguments for
predicates require the '%' prefix for consistency. Hex numbers are permitted.
2016-10-09 12:32:36 +02:00
David Given d75cc0a663 Basic register allocation works! 2016-10-08 23:32:54 +02:00
David Given 637aeed70a Only allocate an output vreg if the instruction actually wants one. 2016-10-08 12:15:21 +02:00
David Given 2198db69b1 Instruction predicates work now. 2016-10-08 11:35:33 +02:00
David Given 9ebf731335 Minor cleanup. 2016-10-08 11:07:28 +02:00
David Given 9db902314b Fix bug where pushes were being placed in the wrong blocks. 2016-10-08 10:21:24 +02:00
George Koehler 65c2a8a0ae Remove stackadjust and stackoffset() from ncg.
This feature has never been used since its introduction, more than 3
years ago, in David Given's commit c93cb69 of May 8, 2013.  The commit
was for "PowerPC and M68K work".  I am not undoing the entire commit.
I am only removing the stackadjust and stackoffset() feature.

This commit removes the feature from my branch kernigh-linuxppc.  This
removal includes the mach/proto/ncg parts.  The default branch already
removed most of the feature, but kept the mach/proto/ncg parts.  That
removal happened in commit 81778b6 of May 13, 2013 (which was a merge;
git diff af0dede 81778b6).  The branch dtrg-experimental-powerpc
merged the default branch but without the removal.  That merge was
commit 4703db0f of Sep 15, 2016 (git diff 8c94b13 4703db0).  My branch
kernigh-linuxppc is off branch dtrg-experimental-powerpc, so I can no
longer get the removal by merging default.

David Given described the stackadjust feature in
  https://sourceforge.net/p/tack/mailman/message/30814691/

The instruction stackadjust would add a value to the offset, and the
function stackoffset() would return this offset.  One would use this
to track sp - fp, then omit the frame pointer by not keeping fp in a
register.
2016-10-07 20:52:13 -04:00
David Given 4e49830e09 Overhaul of everything phi related; critical edge splitting now happens before
anything SSA happens; liveness calculations now look like they might be
working.
2016-10-08 00:21:23 +02:00
George Koehler 409ba7fb1b Remove most of GPRE from mach/powerpc/ncg/table
We only need GPRE in a few places where we write {GPRE, regvar(...)}
because ncgg can't parse plain regvar(...).  In all other places, a
plain GPR works.

Also remove gpr_gpr_gpr and a few other unused and fake instructions
from the list of instructions.
2016-10-06 22:59:27 -04:00
George Koehler 7cccd88b71 Rename SCRATCH to RSCRATCH. Never stack RSCRATCH nor FSCRATCH.
Rename the scratch gpr (currently r11) from SCRATCH to RSCRATCH so I
can search for RSCRATCH without finding FSCRATCH.  I also want to
avoid confusion with the SCRATCH keyword of the old code generator (cg
which came before ncg).

Change the stacking rules to prevent stacking of RSCRATCH or FSCRATCH
or any other GPR or FPR that isn't an allocatable REG or FREG.  Then
ncgg rejects any rule that tries to stack a GPR or FPR, so change such
rules to stack a REG or FREG.
2016-10-06 20:47:42 -04:00
David Given ee93389c5f Refactor the cfg and dominance stuff to make it a lot nicer. 2016-10-06 21:34:21 +02:00
David Given d20b63dc94 The register allocator is really a pass, so arrange the code like one. 2016-10-05 23:55:38 +02:00
David Given 87e004e4a9 Warning fix. 2016-10-05 23:55:04 +02:00
David Given 21034c0d65 No, dammit, for register allocation I need to walk the blocks in *dominance*
order. Since the dominance tree has changed when I fiddled with the graph, I
need to recompute it, so factor it out of the SSA pass. Code is uglier than I'd
like but at least the RET statement goes last in the generated code now.
2016-10-05 23:52:54 +02:00
David Given d95c75dfd7 Allowing an input filename on the command line makes debuggers happy. (Then we
don't need to redirect stdin.)
2016-10-05 23:24:29 +02:00
David Given 88fb231d6e Better constraint syntax; mcgg now passes register usage information up to mcg;
mcg can track individual hop inputs and outputs (needed for live range
analysis!); the register allocator now puts the basic blocks into the right
order in preparation for live range analysis.
2016-10-05 22:56:25 +02:00
David Given 7a6fc7a72b Made sure that all files end in vim magic. 2016-10-05 21:07:29 +02:00
David Given 92502901a7 Better management of register data. Add struct hreg. 2016-10-05 21:00:28 +02:00
David Given ac62c34e19 Add a pass to do critical edge splitting. 2016-10-04 23:42:00 +02:00
David Given 8fedf5a0a8 Added support for the op_bXX conditional branch instructions. 2016-10-04 23:28:16 +02:00
David Given 249855ed23 Fix the horror of the startup code; now uses getopt and stuff and the debug
flags can be set as an option.
2016-10-04 22:36:01 +02:00
David Given ac063a6f54 Remove unused variable (reduce memory usage by 1/10). 2016-10-04 22:35:08 +02:00
David Given c6f576f758 Bodge in enough phi support to let the instruction generator complete on basic
programs.
2016-10-04 21:58:31 +02:00
David Given e13ff5be31 Don't allocate new vregs for REG and NOP --- a bit hacky, but suppresses stray
movs very effectively.
2016-10-04 21:29:03 +02:00
David Given bd28bddb92 Massive rewrite of how emitters and the instruction selector works, after I
realised that the existing approach wasn't working. Now, hopefully, tracks the
instruction trees generated during selection properly.
2016-10-04 00:16:06 +02:00
David Given 68f98cbad7 Instruction selection now happens on a shadow tree, rather than on the IR tree
itself. Currently it's semantically the same but the implementation is cleaner.
2016-10-03 20:52:36 +02:00
David Given 288ee56203 Get quite a long way towards basic output-register equality constraints (needed
to make special nodes like NOP work properly). Realise that the way I'm dealing
with the instruction selector is all wrong; I need to physically copy chunks of
tree to give to burg (so I can terminate them correctly).
2016-10-02 23:25:54 +02:00
David Given 3aa30e50d1 Come up with a syntax for register constraints. 2016-10-02 21:51:25 +02:00
David Given c079e97492 Perform SSA conversion of locals. Much, *much* better code now, at least
inasmuch as it looks better before register allocation. Basic blocks now know
their own successors and predecessors (after a certain point in the IR
processing).
2016-10-02 17:50:34 +02:00
David Given 79d4ab1d96 Add zrl opcode. Keep track of local sizes as well as offsets. 2016-10-02 16:08:46 +02:00
David Given bf73fcdb64 Add inl and del opcodes. 2016-10-02 14:44:21 +02:00
David Given b298c27c63 Refactor mcg.h as it's getting a bit big; keep track of register variables. 2016-10-02 00:30:33 +02:00