My i386 code from 893df4b gave the wrong sign to some 8-byte
remainders. Fix by splitting .dvi8 and .rmi8 so each has its own code
to pick the sign. These two, along with .dvu8 and .rmu8, share a
private subroutine .divrem8 for unsigned division.
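As a reference for the sign rules, here is a minimal C sketch, assuming
truncating division as in C99: the quotient is negative when exactly one
operand is negative, while the remainder takes the sign of the dividend.
The names sketch_divrem8, sketch_dvi8 and sketch_rmi8 are hypothetical
stand-ins for the helpers; this is not the assembly from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for .divrem8: unsigned quotient and remainder. */
static void sketch_divrem8(uint64_t n, uint64_t d, uint64_t *quo, uint64_t *rem)
{
    assert(d != 0);
    *quo = n / d;
    *rem = n % d;
}

/* Magnitude in unsigned arithmetic, so the most negative value is safe. */
static uint64_t magnitude(int64_t x)
{
    return x < 0 ? 0 - (uint64_t)x : (uint64_t)x;
}

/* Signed quotient: negative when exactly one operand is negative. */
int64_t sketch_dvi8(int64_t n, int64_t d)
{
    uint64_t quo, rem;
    sketch_divrem8(magnitude(n), magnitude(d), &quo, &rem);
    if ((n < 0) != (d < 0))
        quo = 0 - quo;
    /* Conversion back to signed assumes the usual two's complement wrap. */
    return (int64_t)quo;
}

/* Signed remainder: follows the sign of the dividend n, not of d. */
int64_t sketch_rmi8(int64_t n, int64_t d)
{
    uint64_t quo, rem;
    sketch_divrem8(magnitude(n), magnitude(d), &quo, &rem);
    if (n < 0)
        rem = 0 - rem;
    return (int64_t)rem;
}
```

The two signed functions differ only in which operand decides the sign,
which is why one shared sign rule cannot serve both.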
Improve the i386 code by using instructions like _bsr_ and _shrd_.
Change the helpers to yield a quotient in ebx:eax or a remainder in
ecx:edx; this seems more convenient, because _div_ puts its quotient
in eax and remainder in edx.
Add tests for comparisons and shifts. Also add enough integer
conversions to compile the shift test (llshift_e.c), and disable
some wrong rules for ldc and conversions.
This provides the 8-byte forms of adi, sbi, mli, dvi, rmi, ngi, dvu,
and rmu, but not shifts or rotates. It also lacks conversions between
8-byte integers and other sizes of integers or floats. The code may
not be entirely correct, but it works at least some of the time.
I adapted this from how ncg i86 does 4-byte integers, but I use a
different algorithm when dividing by a large value. i86 avoids the div
instruction and uses a shift-and-subtract loop, whereas I use the div
instruction to estimate a quotient, which is closer to how big-integer
libraries do division. My .dvi8 and .dvu8 also set ecx:ebx to the
remainder. This might be a bad idea, because it forces .dvi8 and
.dvu8 to always calculate the remainder, even when the caller only
wants the quotient.
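To illustrate the idea of estimating a quotient with a narrower divide,
here is a C sketch of one well-known scheme (normalize the divisor,
divide by its top 32 bits, then correct the estimate). It is not the
code from this commit; sketch_udiv64, div64by32 and leading_zeros32 are
hypothetical names, and div64by32 merely models a 64-by-32 divide whose
quotient fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 64-by-32 divide; the caller ensures the quotient fits 32 bits. */
static uint32_t div64by32(uint64_t n, uint32_t d)
{
    return (uint32_t)(n / d);
}

/* Count of leading zero bits in a nonzero 32-bit value. */
static int leading_zeros32(uint32_t x)
{
    int n = 0;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Unsigned 64-bit division; also stores the remainder through rem. */
uint64_t sketch_udiv64(uint64_t u, uint64_t v, uint64_t *rem)
{
    uint32_t vhi = (uint32_t)(v >> 32);
    uint64_t q;

    assert(v != 0);
    if (vhi == 0) {
        /* Small divisor: two chained divides, high half first. */
        uint32_t d = (uint32_t)v;
        uint32_t qhi = (uint32_t)(u >> 32) / d;
        uint64_t k = (uint32_t)(u >> 32) % d;
        uint32_t qlo = div64by32((k << 32) | (uint32_t)u, d);
        q = ((uint64_t)qhi << 32) | qlo;
    } else {
        /* Large divisor: estimate from the top 32 bits of v. */
        int n = leading_zeros32(vhi);
        uint32_t v1 = (uint32_t)((v << n) >> 32);
        uint32_t q1 = div64by32(u >> 1, v1);
        q = ((uint64_t)q1 << n) >> 31;
        if (q != 0)
            q--;            /* the estimate may be one too high */
        if (u - q * v >= v)
            q++;            /* ... or, after the decrement, one too low */
    }
    *rem = u - q * v;
    return q;
}
```

Because the remainder falls out of the final correction step, returning
it alongside the quotient costs little in this scheme.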
To play with 8-byte integers, I wrote EM procedures like

    mes 2, 4, 4
    exp $ngi
    pro $ngi,0
    ldl 4
    ngi 8
    lol 0
    sti 8
    lol 0
    ret 4
    end
    exp $adi
    pro $adi,0
    ldl 4
    ldl 12
    adi 8
    lol 0
    sti 8
    lol 0
    ret 4
    end

and called them from C like

    typedef struct { int l; int h; } q;
    q ngi(q);
    q adi(q, q);
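When experimenting this way, it helps to have a reference model to
compare results against. The sketch below assumes that int is 32 bits
and that l holds the low half and h the high half; q_from_u64,
q_to_u64, model_adi and model_ngi are hypothetical helpers, not part of
this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the struct above: low half first, then high half. */
typedef struct { int l; int h; } q;

/* Split a 64-bit value into two halves.  Assumes int is 32 bits. */
q q_from_u64(uint64_t v)
{
    q r;
    assert(sizeof(int) == 4);
    r.l = (int)(uint32_t)v;
    r.h = (int)(uint32_t)(v >> 32);
    return r;
}

/* Join the two halves back into a 64-bit value. */
uint64_t q_to_u64(q v)
{
    return ((uint64_t)(uint32_t)v.h << 32) | (uint32_t)v.l;
}

/* Model of `adi 8`: add the low halves, then carry into the high halves. */
q model_adi(q a, q b)
{
    q r;
    uint32_t lo = (uint32_t)a.l + (uint32_t)b.l;
    uint32_t carry = lo < (uint32_t)a.l;    /* wrapped around, so carry out */
    r.l = (int)lo;
    r.h = (int)((uint32_t)a.h + (uint32_t)b.h + carry);
    return r;
}

/* Model of `ngi 8`: two's complement negation across both halves. */
q model_ngi(q a)
{
    q r;
    uint32_t lo = 0u - (uint32_t)a.l;
    uint32_t borrow = lo != 0;              /* borrow unless low half was 0 */
    r.l = (int)lo;
    r.h = (int)(0u - (uint32_t)a.h - borrow);
    return r;
}
```

A test program could then check that q_to_u64(adi(a, b)) equals
q_to_u64(model_adi(a, b)) for a range of inputs, and likewise for ngi.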
This turns EM `con 5000000000I8` into assembly `.data8 5000000000` for
the machines i386, i80, i86, m68020, powerpc, and vc4. These are the
only ncg machines in our build.
i80 and i86 get con_mult(sz) for sz == 4 and sz == 8. The other
machines only get sz == 8, because they have 4-byte words, and ncg
only calls con_mult(sz) when sz is greater than the word size. The
tab "\t" after .data4 or .data8 is like the tabs in the con_*() macros
of mach/*/ncg/mach.h.
i86 now uses .data4, like i80. Also, i86 and i386 now use the numeric
string without converting it to an integer and back to a string.
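A minimal sketch of this formatting step, assuming the numeric string
arrives as plain text: the function name sketch_con_mult, its buffer
parameters, and the trailing newline are illustrative assumptions, not
the actual con_mult() in any mach.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical sketch: format a multi-word constant of sz bytes.
 * str is the numeric string from the EM code, passed through unchanged
 * instead of being converted to an integer and back.
 */
int sketch_con_mult(char *out, size_t outsize, int sz, const char *str)
{
    assert(sz == 4 || sz == 8);
    /* The tab after the pseudo mirrors the tabs in the con_*() macros. */
    return snprintf(out, outsize, ".data%d\t%s\n", sz, str);
}
```

Passing the string through untouched means the code generator itself
never needs a host integer type wide enough for the constant.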
This takes literal integers, not expressions, because each machine
defines its own valu_t for expressions, but valu_t can be too narrow
for an 8-byte integer, and I don't want to change all the machines to
use a wider valu_t. Instead, change how the assembler parses literal
integers. Remove the NUMBER token and add a NUMBER8 token for an
int64_t. The new .data8 pseudo emits all 8 bytes of the int64_t;
expressions narrow the int64_t to a valu_t. Don't add any checks for
integer overflow; expressions and .data* pseudos continue to ignore
overflow when a number is too wide.
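The split between the wide literal and the narrow expression type can
be sketched as follows. This assumes a 16-bit valu_t and little-endian
output purely for illustration; sketch_valu_t, sketch_parse_number8,
sketch_emit_data8 and sketch_narrow are hypothetical names, not the
assembler's own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical narrow expression type, as a 16-bit machine might use. */
typedef int16_t sketch_valu_t;

/* Parse a decimal literal into 64 bits; overflow is silently ignored. */
int64_t sketch_parse_number8(const char *s)
{
    uint64_t v = 0;
    assert(s != NULL);
    while (*s >= '0' && *s <= '9')
        v = v * 10 + (uint64_t)(*s++ - '0');    /* wraps on overflow */
    return (int64_t)v;
}

/* Model of a .data8 pseudo: store all 8 bytes, low byte first (assumed). */
void sketch_emit_data8(unsigned char out[8], int64_t v)
{
    uint64_t u = (uint64_t)v;
    int i;
    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(u >> (8 * i));
}

/* Model of an expression using the literal: keep only the low 16 bits. */
sketch_valu_t sketch_narrow(int64_t v)
{
    return (sketch_valu_t)(uint16_t)v;
}
```

With these assumptions, the literal 5000000000 keeps all of its bits in
.data8 output, but an expression sees only its low 16 bits.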
Building the assembler now requires a C compiler that provides int64_t
and uint64_t. The ACK's own C compiler doesn't have these.
For the assembler's temporary file, add NUMBER4 to store 4-byte
integers. NUMBER4 acts like NUMBER[0-3] and only stores a
non-negative integer. Each negative integer now takes 8 bytes (up
from 4) in the temporary file.
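The size rule can be sketched as below. It assumes that NUMBERn stores
n payload bytes for n from 0 to 4, and that any value needing more than
4 bytes (including every negative one) is stored in full; that reading,
and the name sketch_number_payload, are assumptions rather than details
confirmed by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Payload bytes a literal would need, under the assumptions above. */
int sketch_number_payload(int64_t value)
{
    uint64_t v = (uint64_t)value;   /* a negative value has its top bit set */
    int n = 0;

    while (v != 0) {                /* count the significant bytes */
        v >>= 8;
        n++;
    }
    assert(n <= 8);
    return n <= 4 ? n : 8;          /* short forms, else the full 8 bytes */
}
```

Under this reading, small non-negative numbers stay as compact as
before, and only negative or very large numbers pay for all 8 bytes.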
Move the `\fI` and `\fP` in the uni_ass(6) manual, so the square
brackets in `thing [, thing]*` are not italic. This looks nicer in my
terminal, where italic text is underlined.
This causes clang to give fewer warnings of implicit declarations of
functions.
In mach/pdp/cv/cv.c, rename wr_int2() to cv_int2() because it
conflicts with wr_int2() in <object.h>.
In util/ack, rename F_OK to F_TRANSFORM because it conflicts with F_OK
for access() in <unistd.h>.
unsigned comparisons is surprisingly not that useful due to marshalling
overhead; it's only four bytes to do inline (plus jc), or six for a constant.
Also add some useful top optimisations. Star Trek goes from 39890 to 39450
bytes.