.nr PS 11
.nr VS 13p
.EQ
delim @@
.EN
.EQ
gfont R
.EN
.ND
.RP
.TL
A back end table for the Motorola MC68000, MC68010 and MC68020 microprocessors
.AU
Frank Doodeman
.AB
A back end table is part of the Amsterdam Compiler Kit (ACK). It is used
to produce the actual back end, a program that translates the intermediate
language family EM to assembly language for some target machine. The table
discussed here can be used to produce two back ends, covering three
machines in total: one for the MC68000 and MC68010 (the difference between
these two is so small that one back end table serves both), and one
for the MC68020.
.AE
.NH
Introduction
.PP
To simplify the task of producing portable (cross) compilers and interpreters,
the Vrije Universiteit designed an integrated collection of programs, the
Amsterdam Compiler Kit (ACK) [2]. It is based on the old UNCOL idea [1], which
attempts to solve the problem of how to make a compiler for each of @ N @
languages on @ M @ different machines without having to write @ N times M @
programs.
.PP
The UNCOL approach is to write @ N @
.I
front ends,
.R
which translate the
source language into a common intermediate language UNCOL (Universal Computer
Oriented Language), and @ M @
.I
back ends,
.R
each of which translates programs in
UNCOL into a specific machine language. Under these conditions only @ M + N @
programs must be written to provide all @ N @ languages on all @ M @
machines, instead of @ M times N @ programs.
.PP
The intermediate language for the Amsterdam Compiler Kit is the machine language
for a simple stack machine called EM (Encoding Machine) [3]. A back end for
the MC68020 therefore translates EM code into MC68020 assembly language. To
obtain such a back end it suffices to write a back end table [4].
.PP
The back end is a single program that is driven by a machine-dependent
table. This table, the back end table, defines the mapping of EM code to
the MC68000, MC68010 or MC68020 assembly language.
.NH
The MC68000 and MC68020 microprocessors
.PP
In this document the name MC68000 will be used for both the MC68000 and the
MC68010 microprocessors, because as far as the back end table is concerned
there is no difference between them. For a complete and detailed description
of the MC68020 the reader is referred to [5]; for the MC68000 one might also use [6].
This section covers only the parts that are relevant to the back end table.
.NH 2 
Registers
.PP
Both the MC68000 and the MC68020 have eight 32-bit data registers (@ D sub 0 @-@ D sub 7 @) that can
be used for byte (8-bit), word (16-bit) and long word (32-bit) data operations.
They also have seven 32-bit address registers (@ A sub 0 @-@ A sub 6 @) that may be used as
software stack pointers and base address registers; address register @ A sub 7 @ is
used as the system stack pointer. Address registers may also be used for
word and long word address operations.
.NH 2 
Addressing modes
.PP
First the MC68000 addressing modes will be discussed. Since the set of
addressing modes of the MC68020 is an extension of that of the MC68000, this
section also applies to the MC68020.
.PP
In the description we use:
.IP @ A sub n @
for address register;
.IP @ D sub n @
for data register;
.IP @ R sub n @
for address or data register;
.IP @ X sub n @
for index register (either data or address register);
.IP @ PC @
for program counter;
.IP @ d sub 8 @
for 8-bit displacement integer;
.IP @ d sub 16 @
for 16-bit displacement integer;
.IP @ bd @
for base displacement (may be null, word or long);
.IP @ od @
for outer displacement (may be null, word or long).
.NH 3 
General addressing modes
.NH 4 
Register Direct Addressing
.IP Syntax: 8
@ R sub n @ 
.PP
This addressing mode (it can be used with either a data register or an address
register) specifies that the operand is in one of
the 16 multifunction registers.
.NH 4 
Address Register Indirect
.IP Syntax: 8
@ ( A sub n ) @ 
.PP
The address of the operand is in the address register specified.
.NH 4 
Address Register Indirect With Postincrement
.IP Syntax: 8
@ ( A sub n )+ @ 
.PP
The address of the operand is in the address register specified. After the
operand address is used, the address register is incremented by one, two or
four depending upon whether the size of the operand is byte, word or long.
If the address register is the stack pointer and the operand size is byte, the
address register is incremented by two rather than one to keep the stack pointer
on a word boundary.
.NH 4 
Address Register Indirect With Predecrement
.IP Syntax: 8
@ -( A sub n ) @ 
.PP
The address of the operand is in the address register specified. Before the
operand address is used, the address register is decremented by one, two or
four depending upon whether the size of the operand is byte, word or long.
If the address register is the stack pointer and the operand size is byte, the
address register is decremented by two rather than one to keep the stack pointer
on a word boundary.
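.PP
The increment and decrement rules of the last two modes can be summarized in a small C model.
This is only a sketch for illustration. The names \fIstep_size\fR, \fIea_postinc\fR and \fIea_predec\fR are hypothetical, and register number 7 stands for the stack pointer @ A sub 7 @.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (An)+ and -(An); not part of the back end. */

/* Operand sizes in bytes: byte, word and long. */
enum opsize { SIZE_B = 1, SIZE_W = 2, SIZE_L = 4 };

/* Amount by which (An)+ or -(An) changes An.  A7 is the stack
   pointer and stays word aligned, so byte operands move it by two. */
uint32_t step_size(int regnum, enum opsize size)
{
    assert(regnum >= 0 && regnum <= 7);
    if (regnum == 7 && size == SIZE_B)
        return 2;
    return (uint32_t) size;
}

/* (An)+ : the operand address is the old An; An is incremented after use. */
uint32_t ea_postinc(uint32_t *an, int regnum, enum opsize size)
{
    uint32_t ea = *an;
    *an += step_size(regnum, size);
    return ea;
}

/* -(An) : An is decremented first; the new value is the operand address. */
uint32_t ea_predec(uint32_t *an, int regnum, enum opsize size)
{
    *an -= step_size(regnum, size);
    return *an;
}
```
.DE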
.NH 4 
Address Register Indirect With Displacement
.IP Syntax: 8
@ d sub 16 ( A sub n ) @ for the MC68000, @ ( d sub 16 , A sub n ) @ for the MC68020 
.PP
This address mode requires one word of extension. The address of the operand is
the sum of the contents of the address register and the sign extended 16-bit
integer in the extension word.
.NH 4 
Address Register Indirect With Index
.IP Syntax: 8
@ d sub 8 ( A sub n , X sub n .size) @ for the MC68000, @ ( d sub 8 , A sub n , X sub n .size) @ for the MC68020 
.PP
This address mode requires one word of extension in a fixed format,
which specifies
.IP 1.
which register to use as index register;
.IP 2.
a flag that indicates whether the index register is a data register or an
address register;
.IP 3.
a flag that indicates the index size; this is
.I word
when the low order part of the index register is to be used, and 
.I long
when the whole long value in the register is to be used as index;
.IP 4.
an 8-bit displacement integer (the low order byte of the extension word).
.PP
The address of the operand is the sum of the contents of the address register,
the possibly sign extended contents of the index register and the sign
extended 8-bit displacement.
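.PP
The computation above can be sketched in C as follows. This document does not give the bit positions used to decode the extension word. They are assumed here from the Motorola manuals: bit 15 for the register type, bits 14 to 12 for the register number, bit 11 for the index size, and the low order byte for the displacement. Check them against [5] or [6].
The \fIregs\fR structure and the function name are hypothetical.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: data registers d[0..7], address registers a[0..7]. */
struct regs {
    uint32_t d[8];
    uint32_t a[8];
};

/* Hypothetical helper: effective address of d8(An,Xn.size) on the MC68000.
   Assumed extension word layout: bit 15 = index is an address register,
   bits 14-12 = index register number, bit 11 = long index,
   bits 7-0 = displacement. */
uint32_t ea_index(const struct regs *r, int an, uint16_t ext)
{
    assert(an >= 0 && an <= 7);
    int is_areg = (ext >> 15) & 1;
    int xn = (ext >> 12) & 7;
    int is_long = (ext >> 11) & 1;
    int32_t disp = (int8_t) (ext & 0xFF);          /* sign extend 8 bits */
    uint32_t x = is_areg ? r->a[xn] : r->d[xn];

    if (!is_long)                                  /* .w: sign extend the low word */
        x = (uint32_t) (int32_t) (int16_t) (x & 0xFFFF);
    return r->a[an] + x + (uint32_t) disp;
}
```
.DE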
.NH 4 
Absolute Data Addressing
.IP Syntax: 8
@ address @ for the MC68000, @ ( address ) @ for the MC68020 
.PP
Two different kinds of this mode are available:
.IP 1.
Absolute Short Address; this mode requires one word of extension. The address of
the operand is the sign extended 16-bit extension word.
.IP 2.
Absolute Long Address; this mode requires two words of extension. The address of
the operand is developed by concatenation of the two extension words; the high
order part of the address is the first extension word, the low order part is
the second.
.NH 4 
Program Counter With Displacement
.IP Syntax: 8
@ d sub 16 ( PC ) @ for the MC68000, @ ( d sub 16 , PC ) @ for the MC68020 
.PP
This mode requires one word of extension. The address of the operand is the sum
of the address in the program counter and the sign extended 16-bit displacement
integer in the extension word. The value in the program counter is the
address of the extension word.
.NH 4 
Program Counter With Index
.IP Syntax: 8
@ d sub 8 ( PC , X sub n .size ) @ for the MC68000, @ ( d sub 8 , PC,  X sub n .size ) @ for the MC68020 
.PP
This mode requires one word of extension as described under
.I
Address Register Indirect With Index.
.R
The address of the operand is the sum of the value in the
program counter, the possibly sign extended index register and the sign
extended 8-bit displacement integer in the extension word.
The value in the program counter is the address of the extension word.
.NH 4 
Immediate Data
.IP Syntax: 8
@ "\#data" @
.PP
This addressing mode requires either one or two words of extension, depending
on the size of the operation:
.IP
byte operation - the operand is in the low order byte of the extension word;
.IP
word operation - the operand is in the extension word;
.IP
long operation - the operand is in the two extension words, the high order
16-bits are in the first extension word, the low order 16-bits in the second.
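.PP
The placement of an immediate operand in the extension words can be illustrated with the following C sketch.
The function name \fIimm_ext_words\fR is hypothetical. For byte operations this sketch simply leaves the high order byte of the extension word zero.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store the extension words for an immediate
   operand of 1, 2 or 4 bytes in ext[] and return how many were used. */
size_t imm_ext_words(uint32_t value, int size, uint16_t ext[2])
{
    assert(size == 1 || size == 2 || size == 4);
    switch (size) {
    case 1:
        ext[0] = (uint16_t) (value & 0xFF);  /* low order byte of the word */
        return 1;
    case 2:
        ext[0] = (uint16_t) (value & 0xFFFF);
        return 1;
    default:
        ext[0] = (uint16_t) (value >> 16);   /* high order 16 bits first */
        ext[1] = (uint16_t) (value & 0xFFFF);
        return 2;
    }
}
```
.DE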
.NH 3 
Extra MC68020 addressing modes
.PP
The MC68020 has three more addressing modes. These modes all use a displacement
(some even two), an address register and an index register. Instead of the
address register one may also use the program counter. Any of these
may be omitted. If all addends are omitted the processor creates an
effective address of zero. All three modes require at least one
extension word, the
.I
Full Format Extension Word,
.R
which specifies:
.IP 1.
the index register number (0-7);
.IP 2.
the index register type (address or data register);
.IP 3.
the size of the index (only the low order word or the whole register);
.IP 4.
a scale factor. This is a number from 0 to 3 which specifies how many bits
the contents of the index register are to be shifted to the left before being
used as an index (so the index is multiplied by 1, 2, 4 or 8);
.IP 5.
a flag that specifies whether the base (address) register is to be added or
to be suppressed;
.IP 6.
a flag that specifies whether to add or suppress the index operand;
.IP 7.
two bits that specify the size of the base displacement (null, word or long);
.IP 8.
three bits that in combination with (6) above specify which of the three
addressing modes (described below) to use and, if used, the size of the
outer displacement (null, word or long).
.IP N.B.
All modes mentioned above for the MC68000
that use an index register may have this register
scaled (only when using the MC68020).
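.PP
As an illustration, the fields listed above could be extracted from a full format extension word as in the C sketch below.
This document does not give the bit positions. They are assumed from the MC68020 user's manual [5], including bit 8 being set to mark the full format, and should be checked there.
The structure and function names are hypothetical.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Fields of a full format extension word
   (bit layout assumed from [5]; names hypothetical). */
struct fullext {
    int index_is_areg;  /* bit 15: index is an address register */
    int index_reg;      /* bits 14-12: index register number */
    int index_long;     /* bit 11: whole register (1) or low word (0) */
    int scale;          /* bits 10-9: shift count 0..3 */
    int base_suppress;  /* bit 7: base register suppressed */
    int index_suppress; /* bit 6: index operand suppressed */
    int bd_size;        /* bits 5-4: 1 = null, 2 = word, 3 = long */
    int iis;            /* bits 2-0: with bit 6, selects the mode and od size */
};

struct fullext decode_fullext(uint16_t w)
{
    struct fullext f;

    assert((w >> 8) & 1);        /* bit 8 is set in the full format */
    f.index_is_areg = (w >> 15) & 1;
    f.index_reg = (w >> 12) & 7;
    f.index_long = (w >> 11) & 1;
    f.scale = (w >> 9) & 3;
    f.base_suppress = (w >> 7) & 1;
    f.index_suppress = (w >> 6) & 1;
    f.bd_size = (w >> 4) & 3;
    f.iis = w & 7;
    return f;
}
```
.DE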
.PP
The three extra addressing modes are:
.NH 4 
Address Register Indirect With Index (Base Displacement)
.IP Syntax: 8
@ ( bd , A sub n , X sub n .size*scale ) @ (MC68020 only)
.PP
The address of the operand is the sum of the contents of the address register,
the scaled and possibly sign extended contents of the index register and the
possibly sign extended base displacement. When the program counter is used instead
of the address register, the value in the program counter is the address
of the full format extension word. This mode requires one or two more extension
words when the size of the base displacement is word or long respectively.
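.PP
A minimal C sketch of this address computation follows. It assumes the index value has already been taken from the register file. A suppressed base register or index is passed as the value zero.
The function name is hypothetical.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: EA of (bd,An,Xn.size*scale), that is
   base + (index << scale) + bd.  Pass base = 0 or xn = 0 when
   that addend is suppressed. */
uint32_t ea_base_disp(uint32_t base, uint32_t xn, int index_long,
                      int scale, int32_t bd)
{
    assert(scale >= 0 && scale <= 3);
    uint32_t x = index_long ? xn
                            : (uint32_t) (int32_t) (int16_t) (xn & 0xFFFF);
    return base + (x << scale) + (uint32_t) bd;
}
```
.DE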
.PP
Note that without the index operand, this mode is an extension of the
.I
Address Register Indirect With Displacement
.R
mode; when using the MC68020 one is no longer limited to a 16-bit displacement.
Also note that with the index operand added, this mode is an extension
of the
.I
Address Register Indirect With Index
.R
mode; when using the MC68020 one is no longer limited to an 8-bit displacement.
.NH 4 
Memory Indirect Post-Indexed
.IP Syntax: 8
@ ( [ bd , A sub n ] , X sub n .size*scale , od ) @ (MC68020 only)
.PP
This mode may use an outer displacement. First an intermediate memory
address is calculated by adding the contents of the address register and
the possibly sign extended base displacement. This address is used
for an indirect memory access of a long word, followed by adding
the index operand (scaled and possibly sign extended). Finally the
outer displacement is added to yield the address of the operand.
When the program counter is used, the value in the program counter is the
address of the full format extension word.
.NH 4
Memory Indirect Pre-Indexed
.IP Syntax: 8
@ ( [ bd , A sub n , X sub n .size*scale ] , od ) @ (MC68020 only)
.PP
This mode may use an outer displacement. First an intermediate memory
address is calculated by adding the contents of the address register,
the scaled contents of the possibly sign extended index register and
the possibly sign extended base displacement. This address is used
for an indirect memory access of a long word, followed by adding
the outer displacement to yield the address of the operand.
When the program counter is used, the value in the program counter is the
address of the full format extension word.
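.PP
The two memory indirect modes differ only in where the index is added. Post-indexed adds it after the indirection; pre-indexed adds it before.
The C sketch below makes this explicit. It uses a small hypothetical memory array holding big-endian long words, as on the MC68000 family, and assumes the index value is already sign extended.
.DS
```c
#include <assert.h>
#include <stdint.h>

static uint8_t mem[256];   /* hypothetical memory for the example */

/* Fetch a big-endian long word from mem. */
uint32_t fetch_long(uint32_t addr)
{
    assert(addr + 3 < sizeof mem);
    return ((uint32_t) mem[addr] << 24) | ((uint32_t) mem[addr + 1] << 16)
         | ((uint32_t) mem[addr + 2] << 8) | (uint32_t) mem[addr + 3];
}

/* ([bd,An],Xn*scale,od): indirect first, then add index and od. */
uint32_t ea_postindexed(uint32_t an, int32_t bd, uint32_t x, int scale,
                        int32_t od)
{
    uint32_t ptr = fetch_long(an + (uint32_t) bd);
    return ptr + (x << scale) + (uint32_t) od;
}

/* ([bd,An,Xn*scale],od): add index, indirect, then add od. */
uint32_t ea_preindexed(uint32_t an, int32_t bd, uint32_t x, int scale,
                       int32_t od)
{
    uint32_t ptr = fetch_long(an + (uint32_t) bd + (x << scale));
    return ptr + (uint32_t) od;
}
```
.DE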
.NH 3
Addressing modes used in the table
.PP
Not all addressing modes mentioned above are used in code generation. It is
clear that none of the modes that use the program counter PC can be used,
since at code generation time nothing is known about the value in PC.
Also some of the possibilities of the three MC68020 addressing modes are not
used. For example, it is possible to use a
.I
Data Register Indirect
.R
mode, which actually is the
.I
Address Register Indirect With Index (Base Displacement)
.R
mode with the address register and the displacement left out. However,
such a mode would require two extra bytes for the full format extension word,
and it would also be much slower than using
.I
Address Register Indirect.
.R
For reasons like these several possible addressing modes are not used in
code generation.
In the table, address registers are only used for holding addresses, and
only data registers are used as index registers.
.NH
The MC68000 and MC68020 back end table
.PP
The table itself has to be run through the C preprocessor 
before it can be used to generate
the back end (called
.I
code generator
.R
or
.I cg
for short). When no flags are given to
the preprocessor an MC68020 code generator is produced; for the MC68000
code generator one has to run the table through the preprocessor using the
.I -Dm68k4
flag.
.PP
The table is designed as described in [4]. For the overall design of a back
end table the reader is referred to that document. This section only deals
with problems encountered in writing the table and other things worth noting.
.NH 2 
Constant Definitions
.PP
Wordsize and pointersize (EM_WSIZE and EM_PSIZE respectively) are defined
as four (bytes). EM_BSIZE, the hole between AB (the parameter base) and
LB (the local base), is eight bytes: only
the return address and the localbase are saved.
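.PP
The frame layout implied by this value can be sketched in C. The structure below is a hypothetical model that assumes the usual MC68000 calling sequence. The call pushes the return address, the prologue then saves the old localbase, and LB (@ A sub 6 @) points at that saved value. AB, the first parameter, then lies eight bytes above LB.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of memory starting at LB (register A6). */
struct frame {
    uint32_t saved_lb;     /* 0(A6): the caller's localbase */
    uint32_t return_addr;  /* 4(A6): pushed by the call instruction */
    uint32_t params[2];    /* 8(A6) and up: AB, the parameters */
};

/* EM_BSIZE: the hole between AB and LB. */
enum { EM_BSIZE = offsetof(struct frame, params) };

/* Hypothetical helper: offset from LB of the parameter word
   that lies n words above AB. */
size_t param_offset(int n)
{
    assert(n >= 0);
    return EM_BSIZE + 4 * (size_t) n;
}
```
.DE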
.NH 2 
Properties
.PP
Since Hans van Staveren clearly states in his document [4] that
.I cg
execution time increases with the number of properties, only
four different properties have been defined. Moreover, since the registers
really are multifunctional, these four are all that are needed.
.NH 2 
Registers
.PP
The table uses register variables: @ D sub 3 @ - @ D sub 7 @ are used as general register
variables, and address registers @ A sub 2 @ - @ A sub 5 @ are used as pointer register
variables. @ A sub 6 @ is reserved for the localbase.
.NH 2 
Tokens
.PP
At first glance one might wonder about the number of tokens, especially
for the MC68020, considering the small number of different addressing modes.
However, the last three addressing modes mentioned for the MC68020 may
omit any of the addends, and this leads to a large number of different tokens.
I did consider enlarging the number of tokens and sets
even further, because there might be assemblers that do not handle displacements
of zero optimally (they might generate a 2-byte extension word holding zero).
The small saving in bytes in the generated code
however does not justify the increase
in size of the token section, the set section and the patterns section,
so this idea was not developed any further.
.PP
The timing costs of some MC68000 tokens may be incorrect.
This is because the MC68000 uses a 16-bit data bus, which requires
two separate memory accesses to fetch a 32-bit operand.
.NH 3 
Token names
.PP
The number of tokens and the limited capability of the author's imagination
may have caused some token names to be less than clarifying.
Some information about the names may be in place here.
.PP
Whenever part of a token name is in capitals that part is memory indirected
(i.e. in square brackets). In token names
.I OFF
and
.I off
mean an offsetted address register, so an address register with a displacement
(either base displacement or outer displacement).
.I
IND, ind
.R
and
.I index
stand for indexed, or index register.
.I ABS
and
.I abs
stand for absolute, which actually is just a displacement (base or outer).
These `rules' only apply to names of tokens that represent actual operands.
There are also tokens that represent addresses of operands. These
(with a few exceptions) contain
.I
regA, regX
.R
and
.I con
as parts of their names, which stand for address register, index register and
displacement (always base displacement) respectively. If the address to which
the token refers uses memory indirection, that part of the name comes first
(in small letters), followed by an underscore. The memory indirection part
follows the `rules' for operand token names.
.PP
Of course there are exceptions to these `rules' but in those cases the names
are self explanatory.
.PP
Two special cases:
.I ext_regX
is the name of the token that represents the
address of an absolute indexed operand, syntax @ ( bd , X sub n .size*scale ) @; 
.I regX
does not represent any real mode, but is used with EM array instructions and
pointer arithmetic.
.NH 3
Special tokens for the MC68000
.PP
The MC68000 requires two extra tokens, which are called
.I t_regAcon
and
.I
t_regAregXcon.
.R
They are necessary because
.I regAcon
can only have a 16-bit displacement on the MC68000, and
.I regAregXcon
uses only 8 bits for its displacement. To prevent these addressing modes from
being used with displacements that are too large, the extra tokens are needed.
Whenever the displacements become too large and they need
to be used in the generation
of assembly code, these tokens are transformed into other tokens.
To prevent the table from becoming too messy I defined
.I t_regAcon
and
.I t_regAregXcon
to be identical to
.I regAcon
and
.I regAregXcon
respectively for the MC68020.
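.PP
The check that decides whether such a token can be used directly amounts to a signed range test on the displacement.
A C sketch with hypothetical function names might look like this. In the table itself the test is expressed in the pattern and coercion rules, not in C.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: does disp fit in a signed field of the given width? */
int fits_signed(int32_t disp, int bits)
{
    assert(bits >= 1 && bits <= 31);
    int32_t hi = ((int32_t) 1 << (bits - 1)) - 1;
    int32_t lo = -hi - 1;
    return disp >= lo && disp <= hi;
}

/* MC68000: a t_regAcon may be emitted as d16(An) only if its
   displacement fits in 16 bits ... */
int m68000_regAcon_ok(int32_t disp)
{
    return fits_signed(disp, 16);
}

/* ... and a t_regAregXcon as d8(An,Xn) only if it fits in 8 bits. */
int m68000_regAregXcon_ok(int32_t disp)
{
    return fits_signed(disp, 8);
}
```
.DE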
.NH 2 
Sets
.PP
Most set names used in the table are self explanatory, especially to the reader
who is familiar with the four addressing categories as mentioned in [5]:
.I
data, memory, alterable
.R
and
.I
control.
.R
In the sets definition part some sets are defined that are not used elsewhere in
the table, but are only used to be part of the definition of
some other set. This keeps the
set definition part from getting too unreadable.
.PP
The sets called
.I imm_cmp
consist of all tokens that can be used to compare with a constant.
.NH 2 
Instructions
.PP
Only the instructions that are used in code generation are listed here.
The first few instructions are meant especially for the use with register
variables. The operand LOCAL used here refers to a register variable.
The reader should not conclude that these operations are also allowed on
ordinary locals. The space and timing costs of these instructions have been
adapted, but the use of the word LOCAL for register variables makes these costs
inaccurate anyway.
.PP
The 
.I killreg
instruction, which generates a comment in the assembly language output and
which is meant to let
.I cg
know that the data register operand has its contents destroyed,
needs some explanation, which is better given
in the discussion of groups 3 and 4 in the section about patterns.
.PP
The timing costs of the instructions are probably not very accurate for the
MC68020, because the MC68020 uses an instruction cache and prefetch. The
costs used in the table are the `worst case cost' as mentioned in section 9
of [5].
.NH 2 
Moves
.PP
These are all pretty straightforward, except perhaps when
.I t_regAcon
and
.I t_regAregXcon
are used. In these cases the size of the displacement has to be checked
before moving. This also applies to the stacking rules and the coercions.
.NH 2 
Tests
.PP
These three tests (one for each operation size) could not be more
straightforward than they are now.
.NH 2 
Stackingrules
.PP
The only peculiar stackingrule is the one for
.I
regX.
.R
This token is only used with EM array instructions and
with pointer arithmetic. Whenever it is put
on the fake stack, some EM instructions are left in the instruction stream
to remove this token. Consequently it should never have to be stacked. However
the
.I
code generator generator
.R
(or
.I cgg
for short)
complained about not having a stackingrule for this token, so it had to
be added nevertheless.
.NH 2 
Coercions
.PP
These are all straightforward. There are no splitting coercions since
the fake stack never contains any tokens that can be split.
There are only two unstacking coercions.
The rest are all transforming coercions. Almost all coercions transform
tokens into either a data register or an address register. The exception
is the MC68000 part of the table, where the
.I t_regAcon
and
.I t_regAregXcon
tokens are transformed into real
.I regAcon
and
.I regAregXcon
tokens with displacements that are properly sized.
.NH 2 
Patterns
.PP
This is the largest part of the table. It is subdivided into 17 groups.
We will take a closer look at the more interesting groups.
.NH 3 
Group 0: rules for register variables
.PP
This group makes sure that EM instructions using register variables are
handled efficiently. This group includes: local loads and
stores; arithmetic, shifts and logical operations on locals and indirect locals;
and pointer handling, where C expressions like
.I
*cp++
.R
are handled. The front end might generate any of several EM instruction
sequences for such an expression. For an integer pointer, for example:
.DS
.B
lol lol adp stl loi $1==$2 && $1==$4 && $3==4 && $5==4
.R
.DE
or
.DS
.B
lol loi lol adp stl $1==$3 && $3==$5 && $2==4 && $4==4
.R
.DE
or perhaps even
.DS
.B
lil lol adp stl $1==$2 && $2==$4 && $3==4
.R
.DE
Each of these is included, since which one is generated is up to the front
end. If the front end is consistent, some of these patterns
will never be used in code generation. This might seem a waste, but anyone
who thinks so will certainly change his mind when his new C front end
generates a different EM instruction sequence.
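.PP
In these patterns $1, $2 and so on refer to the arguments of the first, second and following instructions of the sequence.
The C sketch below makes the constraint of the first pattern concrete. It checks the constraint against a hypothetical in-memory representation of EM instructions. It only illustrates what the constraint means and is not how \fIcg\fR matches patterns.
.DS
```c
#include <assert.h>
#include <string.h>

/* Hypothetical EM instruction: mnemonic plus one argument. */
struct em_instr {
    const char *op;
    long arg;
};

/* Match  lol lol adp stl loi  with  $1==$2 && $1==$4 && $3==4 && $5==4,
   where $n is the argument of the n-th instruction. */
int match_postinc_load(const struct em_instr s[5])
{
    static const char *const ops[5] = { "lol", "lol", "adp", "stl", "loi" };

    assert(s != NULL);
    for (int i = 0; i < 5; i++)
        if (strcmp(s[i].op, ops[i]) != 0)
            return 0;
    return s[0].arg == s[1].arg && s[0].arg == s[3].arg
        && s[2].arg == 4 && s[4].arg == 4;
}
```
.DE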
.NH 3 
Groups 1 and 2: load and store instructions
.PP
In these groups
\fBlof\fR and \fBstf\fR, \fBloi\fR and \fBsti\fR, and \fBldf\fR and \fBsdf\fR
are the important instructions.
The patterns for these instructions make up the largest part of these groups,
especially those for \fBloi\fR and \fBsti\fR,
because these come in three basic sizes (byte, word and long).
Note that with these instructions in the MC68000 part the keyword
.I exact
is omitted in front of
.I regAcon
and
.I
regAregXcon.
.R
This makes sure that
.I t_regAcon
and
.I t_regAregXcon
are transformed into proper tokens before they are used as addresses.
.PP
Also note that the
.I regAregXcon
token is completely left out from the
\fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
instruction handling. This is because the sum of the token displacement
and the offset provided in the instruction cannot be checked and is likely
to exceed 8 bits. Unfortunately 
.I cgg
does not allow the inspection of subregisters of tokens that are on the
fake stack. This same problem might also occur with the
.I regAcon
token, but this is less likely because it
uses 16-bit displacements. Besides, had it been left out, the
\fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
instructions would have been handled considerably less efficiently.
.NH 3 
Groups 3 and 4: integer and unsigned arithmetic
.PP
The EM instruction
.B sbi
also works with address registers, because the 
.B cmp
instruction in group 12 is replaced by \fBsbi 4\fR.
.PP
For the MC68000 \fBmli\fR, \fBmlu\fR, \fBdvi\fR, \fBdvu\fR, \fBrmi\fR
and \fBrmu\fR are handled
by library routines. This is because the MC68000 has only 16-bit multiplications
and divisions.
.PP
The MC68020 does have 32-bit multiplications and divisions, but for the
.B rmi
and
.B rmu
EM instructions peculiar things happen anyway: they generate the
.I killreg
instruction. This is necessary because the data register that 
first held the dividend now holds the quotient; the original contents are
destroyed without
.I cg
knowing about it (the destruction of the two registers that make up the
.I DREG_pair
token couldn't be noted in the instructions part of the table).
To let
.I cg
know that these contents are destroyed, we have to use this `pseudo instruction'
for lack of a better solution.
.NH 3 
Group 5: floating point arithmetic
.PP
Since floating point arithmetic is not implemented, traps will be generated here.
.NH 3 
Group 6: pointer arithmetic
.PP
This also is a very important group, along with groups 1 and 2. The MC68020
has many different addressing modes and if possible they should be used in
the generation of assembly language.
.PP
The
.I regX
token is generated here too. It is meant to make efficient use of the
MC68020 possibility of scaling index registers.
.PP
Note that I would have liked one extra pattern to handle C-statements
like
.DS
.I
pointer += expr ? constant1 : constant2;
.R
.DE
efficiently. This pattern would have looked like:
.DS
pat ads
with const
leaving adp %1.num
.DE
but when
.I cg
is coming to the EM replacement part, the constant has already been removed
from the fake stack, causing
.I %1.num
to have a wrong value.
.NH 3 
Group 9: logical instructions
.PP
The EM instructions \fBand\fR,
.B ior
and
.B xor
are so much alike that procedures can be used here, except for the
.B
xor $1==4
.R
instruction, because the MC68000
.I eor
instruction does not allow as many kinds of operands as
.I and
and
.I
or.
.R
.NH 3 
Group 11: arrays
.PP
This group also tries to make efficient use of the available addressing modes,
but it leaves the actual work to group 6 mentioned above.
.PP
The
.I regX
token is also generated here. In this group this token is very useful for
handling array instructions for arrays with one, two, four or eight byte
elements; the array index goes into the index register, which can then
be scaled appropriately. An offset is used when the
first array element has an index other than zero.
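.PP
Consider an array with elements of 1, 2, 4 or 8 bytes. The address of one of its elements can be formed in a single addressing mode.
The index goes into the index register with the matching scale. The constant minus \fIlb\fR times the element size becomes the displacement, where \fIlb\fR is the index of the first element.
The C sketch below, with hypothetical names, performs the same arithmetic.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: scale (shift count) for an element size,
   or -1 if no MC68020 scale factor applies. */
int scale_for(uint32_t elsize)
{
    switch (elsize) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 2;
    case 8: return 3;
    default: return -1;
    }
}

/* Hypothetical helper: address of element i of an array at base whose
   first index is lb, computed as base + (i << scale) + disp with
   disp = -lb * elsize. */
uint32_t elem_addr(uint32_t base, int32_t i, int32_t lb, uint32_t elsize)
{
    int s = scale_for(elsize);

    assert(s >= 0);
    uint32_t disp = (0u - (uint32_t) lb) * elsize;  /* wraps like 32-bit hardware */
    return base + ((uint32_t) i << s) + disp;
}
```
.DE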
.PP
I would have liked some extra patterns here too but they won't work
for the same reasons as explained in the discussion of group 6.
.NH 3 
Group 14: procedure call instructions
.PP
The function return area consists of registers @ D sub 0 @ and @ D sub 1 @.
.NH 3 
Group 15: miscellaneous instructions
.PP
In many cases here library routines are called. These will be discussed
later.
.PP
Two special EM instructions are included here: \fBdch\fR and \fBlpb\fR.
I don't know when they are generated by a front end, but these
instructions were also in the back end table for the PDP. In the PDP table
these instructions were replaced by
.B
loi 4
.R
and
.B
adp 8
.R
respectively. I included them both, since they couldn't do any harm.
.NH 3 
Extra group: optimization
.PP
This group handles EM patterns with more than one instruction. This group
is not absolutely necessary but it makes the generation of code
more efficient. Among the things that are handled here are: arithmetic and
logical operations on locals, externals and indirect locals; shifting
of locals, externals and indirect locals by one; some pointer arithmetic; tests
in combination with logical and's and or's or with branches. Finally
there are sixteen patterns about divisions that could be handled more
efficiently by right shifts and which I think should be handled by the
peephole optimizer (since it also handles
the same patterns with multiplication).
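.PP
Replacing a division by a power of two with a shift is exact only for unsigned operands.
For signed operands the sketch below assumes that division truncates toward zero, as the \fIdivs\fR instruction does. Under that assumption an arithmetic right shift rounds negative values the wrong way and needs a correction.
The function names are hypothetical. The code also assumes that >> on a negative int32_t is an arithmetic shift, which is implementation-defined in C but holds for common compilers.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned x / 2^k.  A logical right shift
   gives the exact quotient. */
uint32_t udiv_pow2(uint32_t x, int k)
{
    assert(k >= 0 && k < 32);
    return x >> k;
}

/* Hypothetical helper: signed x / 2^k, truncating toward zero.
   An arithmetic shift alone rounds toward minus infinity, so a
   negative x first gets a bias of 2^k - 1. */
int32_t sdiv_pow2(int32_t x, int k)
{
    assert(k >= 0 && k < 31);
    int32_t bias = (x < 0) ? (((int32_t) 1 << k) - 1) : 0;
    return (x + bias) >> k;
}
```
.DE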
.NH
The library routines
.PP
The table is supplied with two separate libraries: one for the MC68000 and one
for the MC68020. The MC68000 uses a couple more routines than the MC68020
because it doesn't have 32-bit division and multiplication.
.PP
The routines that need to pop their operands first store their return address.
Routines that need other registers besides @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ first store
the original contents of those registers. @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ do not have
to be saved because if they contain anything useful, their contents
are pushed on the stack before the routine is called.
.PP
The
.I .trp
routine just prints a message stating the trap number and exits (except
of course when that particular trap number is masked). Usually higher
level languages use their own trap handling routines.
.PP
The
.I .mon
routine doesn't do anything useful at all. It just prints a message stating that
the specified system call is not implemented and then exits. Front ends
usually generate calls to special routines rather than the EM
instruction \fBmon\fR.
These routines have to be supplied in another library. They
may be system dependent (e.g. the MC68000 machine this table was tested on
first moves the parameters to registers, then moves the system call number
to @ D sub 0 @ and then executes
.I
trap #0,
.R
whereas the MC68020 machine this table was tested on required the parameters
to be on the stack rather than in registers). Therefore this library is not
discussed here.
.PP
The
.I .printf
routine is included for EM diagnostic messages. It can print strings using %s,
16-bit decimal numbers using %d and 32-bit hexadecimal numbers using %x.
.PP
The
.I .strhp
routine stores a new EM heap pointer, and sometimes it needs to allocate more
heap space. This is done by calling the system call routine \fI_brk\fR.
Chunks of 1K bytes are allocated, but this can easily be changed into
larger or smaller chunks.
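.PP
The chunk computation might look like the C sketch below. It is a hypothetical illustration only. It computes the new end of the allocated heap in whole 1K chunks, but it does not call \fI_brk\fR or otherwise model the actual library routine.
.DS
```c
#include <assert.h>
#include <stdint.h>

#define CHUNK 1024u   /* allocation granularity: 1K bytes */

/* Hypothetical helper: new end of the allocated heap for the requested
   heap pointer new_hp, growing heap_end by whole chunks when needed. */
uint32_t grow_heap_end(uint32_t new_hp, uint32_t heap_end)
{
    if (new_hp <= heap_end)
        return heap_end;                   /* enough space already */
    uint32_t chunks = (new_hp - heap_end + CHUNK - 1) / CHUNK;
    uint32_t end = heap_end + chunks * CHUNK;
    assert(end >= new_hp);
    return end;
}
```
.DE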
.PP
The MC68000 library also contains a routine to handle the EM instruction \fBrck\fR.
The MC68020 has an instruction
.I cmp2
that is specifically meant for range checking, so the MC68020 library can do without
that routine.
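.PP
The decision made by the \fBrck\fR routine, or by a \fIcmp2\fR instruction, can be modelled by the following C sketch.
The descriptor layout (lower bound followed by upper bound) and all names are assumptions for illustration. The real routine raises an EM range trap instead of returning a result.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Assumed range descriptor: lower bound, then upper bound. */
struct range_desc {
    int32_t lower;
    int32_t upper;
};

/* Hypothetical helper: 1 if value lies within [lower, upper];
   the library routine would trap instead of returning 0. */
int rck_in_range(const struct range_desc *d, int32_t value)
{
    assert(d->lower <= d->upper);
    return value >= d->lower && value <= d->upper;
}
```
.DE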
.PP
The MC68000 library has two multiplication routines, one for unsigned and the other
for signed multiplication. The one for unsigned multiplication
first tests the sizes of the operands, to see if it can use
the 16-bit machine instruction instead of the full routine. If not, it considers
its two operands to be two-digit numbers in a radix 65536 system. It
uses the 16-bit unsigned multiply instruction
.I mulu
three times (it does not calculate the high order result),
and adds up the intermediate results the proper way. The signed
multiplication routine calculates the sign of the result, calculates
the result as if it were an unsigned multiplication, and
adjusts the sign of the result. Here testing
the operands for their sizes would be less simple, because the operands
are signed; so that is not done here.
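.PP
The scheme described above can be sketched in C. This is not the library code, which is written in assembly language. The function \fImulu16\fR stands in for the \fImulu\fR instruction, and all names are hypothetical.
Write the operands as @ a = a sub h R + a sub l @ and @ b = b sub h R + b sub l @ with @ R = 65536 @. The low 32 bits of the product then only need @ a sub l b sub l @, @ a sub h b sub l @ and @ a sub l b sub h @, because the @ a sub h b sub h @ term is a multiple of @ R sup 2 = 2 sup 32 @.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for mulu: 16 x 16 -> 32 bit unsigned multiply. */
uint32_t mulu16(uint16_t a, uint16_t b)
{
    return (uint32_t) a * b;
}

/* Low 32 bits of an unsigned 32 x 32 bit product, using at most
   three 16-bit multiplies; the ah*bh term would only affect bits
   32 and up, so it is never computed. */
uint32_t mul32u(uint32_t a, uint32_t b)
{
    uint16_t ah = (uint16_t) (a >> 16), al = (uint16_t) (a & 0xFFFF);
    uint16_t bh = (uint16_t) (b >> 16), bl = (uint16_t) (b & 0xFFFF);

    if (ah == 0 && bh == 0)
        return mulu16(al, bl);           /* both operands fit in 16 bits */
    uint32_t cross = mulu16(ah, bl) + mulu16(al, bh);
    return mulu16(al, bl) + (cross << 16);
}

/* Signed version: multiply the magnitudes, then fix up the sign. */
int32_t mul32s(int32_t a, int32_t b)
{
    int neg = (a < 0) != (b < 0);
    uint32_t ua = a < 0 ? 0u - (uint32_t) a : (uint32_t) a;
    uint32_t ub = b < 0 ? 0u - (uint32_t) b : (uint32_t) b;
    uint32_t p = mul32u(ua, ub);
    assert(!neg || p != 0 || ua == 0 || ub == 0 || 1);
    return (int32_t) (neg ? 0u - p : p);
}
```
.DE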
.PP
The MC68000 library also has two division routines. The routine for unsigned
division uses the common shift-and-subtract algorithm, in which the dividend is
shifted out and the quotient shifted in. The signed division routine calculates
the signs of both the quotient and the remainder, calls the unsigned division
routine and adjusts the signs of the quotient and the remainder.
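.PP
A C sketch of both routines follows. The names are hypothetical, and the library itself is written in assembly language.
In the signed version the quotient is negative when the operand signs differ, and the remainder takes the sign of the dividend. This convention (truncation toward zero, as with the \fIdivs\fR instruction) is an assumption of the sketch.
.DS
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical unsigned division by shift and subtract: dividend bits
   are shifted into the partial remainder one at a time, and each
   quotient bit is shifted into the quotient. */
void divu32(uint32_t n, uint32_t d, uint32_t *q, uint32_t *r)
{
    uint32_t quot = 0, rem = 0;

    assert(d != 0);
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((n >> i) & 1);
        quot <<= 1;
        if (rem >= d) {
            rem -= d;
            quot |= 1;
        }
    }
    *q = quot;
    *r = rem;
}

/* Hypothetical signed division: divide the magnitudes, then adjust
   the signs of quotient and remainder. */
void divs32(int32_t n, int32_t d, int32_t *q, int32_t *r)
{
    uint32_t un = n < 0 ? 0u - (uint32_t) n : (uint32_t) n;
    uint32_t ud = d < 0 ? 0u - (uint32_t) d : (uint32_t) d;
    uint32_t uq, ur;

    divu32(un, ud, &uq, &ur);
    *q = (int32_t) ((n < 0) != (d < 0) ? 0u - uq : uq);
    *r = (int32_t) (n < 0 ? 0u - ur : ur);
}
```
.DE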
.PP
The
.I .nop
routine is included for testing purposes. This routine prints the line
number and the value in the stack pointer. Calls to this routine
are generated by the EM instruction \fBnop\fR, which is ordinarily
left out by the peephole optimizer.
.NH
Testing the table
.PP
There are special test programs available for testing back end tables.
First there is the EM test set, which tests most EM instructions, making
good use of the
.B nop
instruction. Then there are the Pascal and C test programs. The Pascal
test programs report errors, which makes it relatively easy
to find out what was wrong in the table. The C test programs just
generate some output, which then has to be compared to the expected
output. Differences are
not only caused by errors but also e.g. by the use of four
byte integers and unsigneds (which this table does),
the use of signed characters
instead of unsigned characters (the C front end I used generated signed
characters) or because the back end
does not support floating point.
These differences have to be `filtered out' to reveal
the differences caused by actual errors in the back end table.
These errors then have to be tracked down by examining the assembly code, since
no proper diagnostic messages are generated.
.PP
After these three basic tests there still remain a number of patterns that
haven't been tested yet. Fortunately
.I cgg
offers the possibility of generating a special
.I cg
that can print a list of patterns that haven't been used in
code generation yet.
For these patterns the table writer has to write his own test programs.
This may complicate things a bit, because failures may now be caused by
errors in the back end table as well as by errors in the test programs.
The latter happened quite often to me, because I found EM
to be an uncomfortable programming language (of course it is not meant to
be a programming language, but an intermediate language).
.PP
A couple of patterns in this table still have not been tested.
However, each of them has a very similar counterpart that has been
tested (an example of this is mentioned in the section on group 0
of the patterns section of the table). Some of the untested patterns
deal with floating point numbers; the corresponding EM instructions all generate
traps, so not all of them had to be tested. The two instructions
.B dch
and
.B lpb
have not been tested in this table, but since they only use EM replacement
and have been tested in the PDP back end table, they should
be all right.
.NH
Performance of the back end
.PP
To measure the performance of the back end I gathered a number of
C programs (the source files of the back end itself) and compiled them on the
machines on which I tested the back ends, once with the C compiler available
there and once with the back end. I then compared the sizes
of the text segments in the object files.
The final results of these comparisons are given in fig. 1 and fig. 2.
.KF
.TS
center box;
cfI s s s s s
c s s s s s
c c | c s | c s
c c | c s | c s
c | c | c  c | c  c
l | n | n  n | n  n.
Differences in text segment sizes for the MC68000
parts of the back end compiled by itself
_
original	 	old m68k4	new MC68000
compiler	(100%)	back end	back end
_
name	size	size	perc.	size	perc.
_
codegen.c	13892	16224	116.7%	12860	92.5%
compute.c	4340	4502	103.7%	4530	104.3%
equiv.c	680	662	97.3%	598	87.9%
fillem.c	8016	7304	91.1%	6880	85.8%
gencode.c	1356	1194	88.0%	1130	83.3%
glosym.c	224	202	90.1%	190	84.8%
main.c	732	672	91.8%	634	86.6%
move.c	1876	1526	81.3%	1410	75.1%
nextem.c	1288	1594	123.7%	1192	92.5%
reg.c	1076	1014	94.2%	916	85.1%
regvar.c	1352	1188	87.8%	1150	85.0%
salloc.c	1240	1100	88.7%	1024	82.5%
state.c	628	600	95.5%	532	84.7%
subr.c	6948	6382	91.8%	5680	81.7%
=
averages	3118	3155	95.8%	2766	86.6%
.TE
.DS C
fig 1.
.DE
.KE
.KF
.TS
center box;
cfI s s s
cfI s s s
c s s s
c s s s
c c | c s
c c | c s
c | c | c  c
l | n | n  n.
Differences in text segment sizes
for the MC68020
parts of the back end
compiled by itself
_
original	 	MC68020
compiler	(100%)	back end
_
name	size	size	perc.
_
codegen.c	12608	12134	96.2%
compute.c	4624	4416	95.5%
equiv.c	572	504	88.1%
fillem.c	7780	6976	89.6%
gencode.c	1320	1086	82.2%
glosym.c	228	182	79.8%
main.c	736	596	80.9%
move.c	1392	1280	91.9%
nextem.c	1176	1066	90.6%
reg.c	1052	836	79.4%
regvar.c	1196	968	80.9%
salloc.c	1200	932	77.6%
state.c	580	528	91.0%
subr.c	6136	5268	85.8%
=
averages	2900	2627	86.4%
.TE
.DS C
fig 2.
.DE
.KE
.PP
Fig. 1 also includes results for an old m68k4 back end (a back end
for the MC68000 with a four-byte word and pointer size). The table for
this back end was given to me as an example, but I thought it did not make
good use of the MC68000's addressing capabilities, it did hardly any
optimization, and it sometimes even
generated code that the assembler would not accept.
This was reason enough for me to write a completely new table.
.PP
These results should not be taken too seriously. The sizes measured
are those of the text segments of the user programs, i.e. without the
library routines, although these segments do of course contain calls
to those routines. Moreover, the
.I rom
segment may be included in the text segment (this is why the
MC68000 results for
.I compute.c
look so bad).
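.PP
The percentages in fig. 1 and fig. 2 can be reproduced from the sizes with the C sketch below. Judging from the numbers (an inference, not something stated above), each percentage is the ratio of the two sizes truncated to a tenth of a percent, and the percentage in the averages row is the mean of the per-file percentages rather than the ratio of the average sizes. The function names are hypothetical.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Size relative to base, in tenths of a percent, truncated. */
static long permille(long size, long base)
{
    assert(base > 0);
    return size * 1000 / base;
}

/* Mean of the per-file per-mille values, rounded to the nearest
   tenth of a percent. */
static long mean_permille(const long *sizes, const long *bases, size_t n)
{
    long sum = 0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++)
        sum += permille(sizes[i], bases[i]);
    return (sum + (long)n / 2) / (long)n;
}
```
.DE
.PP
For the MC68020 data in fig. 2 the mean of the per-file values is 864 per mille, i.e. the 86.4% of the table, whereas the ratio of the average sizes, 2627/2900, would give about 90.6%.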
.PP
Some other remarks must be made about these results.
The EM code
generated by the C front end is certainly not optimal. The front end
introduces temporary locals (extra locals that are used to evaluate expressions)
far too readily: for a simple C expression like
.DS
.I
*(pointer) += constant
.R
.DE
where
.I pointer
is a register variable, the C front end generates (for obscure reasons)
a temporary local that holds the contents of \fIpointer\fR. As a result,
the pattern for
.DS
.B
loc lil adi sil $2==$4 && $3==4
.R
.DE
for register variables is not used, and longer, less efficient
code is generated. Even so, the back end seems to
generate rather compact code.
.NH
Some timing results
.PP
To measure the speed of the code generated by the back end,
some timing tests were done. I chose these particular tests because
they were also done for many other back ends, so the reader can compare
the results if desired (of course such a comparison only
shows an overall difference in speed between the various machines; it does not
show whether one back end generates relatively better code than another).
.PP
On the MC68000 machine the statements were executed one million times.
On the MC68020 machine they had to be executed four million times,
because this machine was so fast that the timing results would have been
very unreliable with only one million executions.
.PP
For testing I used the following C test program:
.DS
.I
main()
{
    int i, j, ...
    ...
    for (i=0; i<1000; i++)
        for (j=0; j<1000; j++)
            STATEMENT;
}
.R
.DE
where
.I STATEMENT
is one of the test statements or the empty statement. For the MC68020
tests I used 2000 instead of 1000 as the bound of both loops.
The time measured with the empty statement served as a baseline for
calculating the execution times of the other test statements.
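.PP
The measuring scheme can be sketched in present-day C as follows, assuming the standard \fIclock\fR function for timing. The helper names are hypothetical and only one test statement is shown. The loop counters and test variables are declared \fIvolatile\fR so that a modern optimizer does not remove the loops; the test program shown above has no such declarations. The net time of one statement is the difference between the two loop timings divided by the number of executions.
.DS
```c
#include <assert.h>
#include <time.h>

/* Loop bounds of the MC68000 runs (the MC68020 runs used 2000). */
#define OUTER 1000L
#define INNER 1000L

/* volatile stops a present-day optimizer from deleting the loops */
static volatile int int1;
static volatile int int2 = 5;

static double seconds_since(clock_t start)
{
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Loop with the empty statement: the baseline. */
static double time_empty(void)
{
    clock_t start = clock();
    for (volatile long i = 0; i < OUTER; i++)
        for (volatile long j = 0; j < INNER; j++)
            ;
    return seconds_since(start);
}

/* The same loop with the test statement int1=int2-1; */
static double time_sub(void)
{
    clock_t start = clock();
    for (volatile long i = 0; i < OUTER; i++)
        for (volatile long j = 0; j < INNER; j++)
            int1 = int2 - 1;
    return seconds_since(start);
}

/* Net time of one execution of the statement, in microseconds. */
static double net_microseconds(double stmt_seconds, double empty_seconds,
                               long executions)
{
    assert(executions > 0);
    return (stmt_seconds - empty_seconds) * 1e6 / (double)executions;
}
```
.DE
.PP
With these loop bounds, \fInet_microseconds(time_sub(), time_empty(), OUTER * INNER)\fR gives the time of one \fIint1=int2-1;\fR in microseconds.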
.PP
Figures 3 and 4 show the results. For each machine two tests were actually
done: one with register variables and one without.
The original C compilers on both machines did not use
register variables unless they were explicitly requested, whereas the
back end uses register variables wherever they are profitable, even
if the user did not ask for them.
.KF
.TS
center box;
cfI s s s s
c s s s s
c | c s | c s
cw(1.5i) | c c | c c
c | c c | c c
lp-2fI | n n | n n.
timing results for the MC68000
times in @ mu @seconds
_
test statement	without register variables	with register variables
_
 	original	new MC68000	original	new MC68000
 	C compiler	back end	C compiler	back end
_
int1=0;	2.8	2.7	0.5	0.5
int1=int2-1;	4.1	4.1	1.3	1.3
int1=int1+1;	4.1	4.1	1.3	1.3
int1=int2*int3;	40.0	40.5	36.2	36.8
T{
int1=(int2<0);
\/*true*/
T}	5.5	7.3	2.0	4.5
T{
int1=(int2<0);
\/*false*/
T}	4.7	8.5	2.8	5.6
T{
int1=(int2<3);
\/*true*/
T}	6.2	7.7	2.6	5.4
T{
int1=(int2<3);
\/*false*/
T}	5.4	8.9	3.6	6.5
T{
.na
int1=((int2>3)||(int2<3));
\/* true || false */
T}	6.0	7.8	3.4	5.4
T{
.na
int1=((int2>3)||(int2<3));
\/* false || true */
T}	9.1	10.2	5.7	7.1
T{
.na
switch (int1) {
case 1: int1=0; break;
case 2: int1=1; break;
}
T}	6.3	17.8	5.3	14.0
T{
.na
if (int1==0) int2=3;
\/*true*/
T}	5.1	4.7	1.3	1.3
T{
.na
if (int1==0) int2=3;
\/*false*/
T}	2.2	2.1	1.9	1.1
while (int1>0) int1=int1-1;	2.2	2.1	1.1	1.1
int1=a[int2];	6.8	6.7	4.0	3.1
p3(int1);	14.3	11.1	13.4	10.0
int1=f(int2);	17.7	14.5	14.8	11.7
s.overhead=5400;	2.8	2.7	2.9	2.7
.TE
.DS C
Fig. 3
.DE
.KE
.KF
.TS
center box;
cfI s s s s
c s s s s
c | c s | c s
cw(1.5i) | c c | c c
c | c c | c c
lp-2fI | n n | n n.
timing results for the MC68020
times in @ mu @seconds
_
test statement	without register variables	with register variables
_
 	original	new MC68020	original	new MC68020
 	C compiler	back end	C compiler	back end
_
int1=0;	.25	.25	.15	.15
int1=int2-1;	1.3	1.3	.38	.38
int1=int1+1;	1.2	.90	.38	.15
int1=int2*int3;	4.4	4.2	3.0	3.1
T{
int1=(int2<0);
\/*true*/
T}	1.6	2.7	1.1	2.3
T{
int1=(int2<0);
\/*false*/
T}	1.9	2.9	.80	2.1
T{
int1=(int2<3);
\/*true*/
T}	1.7	2.8	1.2	2.6
T{
int1=(int2<3);
\/*false*/
T}	2.1	3.0	.85	2.3
T{
.na
int1=((int2>3)||(int2<3));
\/* true || false */
T}	2.1	3.1	1.2	2.5
T{
.na
int1=((int2>3)||(int2<3));
\/* false || true */
T}	3.4	4.2	1.8	3.2
T{
.na
switch (int1) {
case 1: int1=0; break;
case 2: int1=1; break;
}
T}	2.7	8.0	2.0	6.9
T{
.na
if (int1==0) int2=3;
\/*true*/
T}	1.2	1.3	.63	.63
T{
.na
if (int1==0) int2=3;
\/*false*/
T}	1.7	1.6	.50	.53
while (int1>0) int1=int1-1;	1.2	1.3	.55	.53
int1=a[int2];	1.8	1.8	1.0	1.0
p3(int1);	14.8	5.5	14.1	5.0
int1=f(int2);	16.3	6.6	15.2	5.9
s.overhead=5400;	.48	.48	.50	.50
.TE
.DS C
Fig. 4
.DE
.KE
.PP
The reader may have noticed that on both machines the back end seems
to generate considerably slower code for tests where a `condition' is
used in the right-hand side of an assignment statement. This is not
really the back end's fault: it is the front end that generates bad code.
Two examples: for the C statement
.DS
.I
int1 = (int2 < 0);
.R
.DE
the front end generates the following code for the right-hand side
(the labels are arbitrary):
.DS
.B
lol -16
zlt *10
loc 0
bra *11
10
loc 1
11
.R
.DE
whereas in my opinion it should have generated
.DS
.B
lol -16
tlt
.R
.DE
which is much shorter and contains no branches. As another example, for the C statement
.DS
.I
int1 = (int2 < 3);
.R
.DE
the front end generates the following code for the right-hand side
.DS
.B
lol -16
loc 3
blt *10
loc 0
bra *11
10
loc 1
11
.R
.DE
while a much better translation would be
.DS
.B
lol -16
loc 3
cmi 4
tlt
.R
.DE
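.PP
To check that the suggested translations compute the same value as the branching code, the EM sequences can be modelled in C, as in the sketch below. The models of \fBcmi\fR (push a negative, zero or positive value when the first operand is less than, equal to or greater than the second) and \fBtlt\fR (replace the top of the stack by 1 if it is negative and by 0 otherwise) follow a reading of the EM description [3]. The function names are hypothetical, and the \fIgoto\fR statements only mirror the labels 10 and 11 of the EM code.
.DS
```c
#include <assert.h>

/* Model of cmi 4: negative, zero or positive for a < b, a == b, a > b. */
static int em_cmi(int a, int b)
{
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

/* Model of tlt: 1 if the top of the stack is negative, else 0. */
static int em_tlt(int top)
{
    return top < 0 ? 1 : 0;
}

/* Front end code for int2 < 0: lol -16; zlt *10; loc 0; bra *11; 10: loc 1; 11: */
static int lt0_branching(int int2)
{
    int result;

    if (int2 < 0)          /* zlt *10 */
        goto label10;
    result = 0;            /* loc 0 */
    goto label11;          /* bra *11 */
label10:
    result = 1;            /* loc 1 */
label11:
    return result;
}

/* Suggested code for int2 < 0: lol -16; tlt */
static int lt0_direct(int int2)
{
    return em_tlt(int2);
}

/* Front end code for int2 < 3: lol -16; loc 3; blt *10; loc 0; bra *11; 10: loc 1; 11: */
static int lt3_branching(int int2)
{
    int result;

    if (int2 < 3)          /* blt *10 */
        goto label10;
    result = 0;            /* loc 0 */
    goto label11;          /* bra *11 */
label10:
    result = 1;            /* loc 1 */
label11:
    return result;
}

/* Suggested code for int2 < 3: lol -16; loc 3; cmi 4; tlt */
static int lt3_direct(int int2)
{
    return em_tlt(em_cmi(int2, 3));
}
```
.DE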
.PP
Another statement for which the back end seems to generate slower code is
the C switch statement. This is true, but it is partly caused by
the way switch statements are handled in EM. EM uses the
.B csa
or
.B csb
instruction, and for these two I had to use library routines. On larger
switch statements the
.I .csa
routine will perform relatively better, presumably because its cost
hardly grows with the number of cases.
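.PP
Why the \fI.csa\fR routine pays off for larger switch statements can be seen from a rough C model of the two kinds of case table. The table layouts in the sketch follow a reading of the EM description [3] (for \fBcsa\fR: a default target, a lower bound, the range, and one target per value in the range; for \fBcsb\fR: a default target, a count, and value/target pairs), with small integers standing in for code addresses; the type and function names are hypothetical. A \fBcsa\fR lookup costs the same however many cases there are, while the linear search used here for \fBcsb\fR grows with the number of cases.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Rough model of a csa table: targets are small integers, not addresses. */
struct csa_table {
    int default_target;
    long lower;            /* lowest case value */
    long range;            /* upper bound minus lower bound */
    const int *targets;    /* range + 1 entries */
};

/* csa: one subtraction, one range check and one indexed load,
   independent of the number of cases. */
static int csa_lookup(const struct csa_table *t, long value)
{
    long index;

    assert(t->range >= 0);
    index = value - t->lower;
    if (index < 0 || index > t->range)
        return t->default_target;
    return t->targets[index];
}

/* Rough model of a csb table: (value, target) pairs. */
struct csb_entry {
    long value;
    int target;
};

struct csb_table {
    int default_target;
    size_t count;
    const struct csb_entry *entries;
};

/* csb: a linear search here, so the cost grows with the number of cases. */
static int csb_lookup(const struct csb_table *t, long value)
{
    size_t i;

    for (i = 0; i < t->count; i++)
        if (t->entries[i].value == value)
            return t->entries[i].target;
    return t->default_target;
}
```
.DE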
.PP
The back end generates considerably faster code for procedure and function
calls, especially on the MC68020, and also for the C statement
.DS
.I
int1 = int1 + 1;
.R
.DE
The original C compilers use the same method for this statement
as for
.DS
.I
int1 = int2 - 1;
.R
.DE
In both cases they perform the addition in a scratch register and then store the
result. For the former statement this is not necessary, because
the MC68000 and MC68020 have an instruction that can add a constant
to almost any operand (in this case, to a local). The MC68000 and MC68020
back ends do use this instruction.
.NH
Some final remarks
.PP
As mentioned several times before, the C front end does not
generate optimal code, and as a consequence the
back end does not always generate optimal code either. This is especially
the case with temporary locals, which the front end introduces much
too readily, and with conditional expressions that are
used in the right-hand side of an assignment statement (fortunately these are not
needed very often).
.PP
If
.I cgg
had been able to accept operands separated by any character
instead of just by commas (in the instruction definitions part),
I would not have needed the
.I killreg
pseudo instruction. It would also be handy if
.I cgg
accepted all normal C operators. At the moment
.I cgg
does not accept the bitwise and, or and exclusive-or operators, even though [4]
states that
.I cgg
does accept all normal C operators. As it happens I did not need the
bitwise operators, but at one point in developing the table I thought
I did.
.PP
I would also like
.I cg
to make more use of the condition code information that is supplied with
each instruction in the instruction definitions section of the table.
Sometimes
.I cg
generates test instructions that are not actually necessary. This
of course makes the generated
programs slightly larger and slightly slower.
.PP
In spite of the few minor shortcomings mentioned above I found
.I cgg
a very comfortable tool to use.
.SH
References
.PP
.IP [1]
T. B. Steel Jr.,
.I
UNCOL: The myth and the Fact,
.R
in Ann. Rev. Auto. Prog.,
R. Goodman (ed.), Vol. 2 (1969), pp. 325-344
.IP [2]
A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
.I
A practical toolkit for making portable compilers,
.R
Informatica Report 74, Vrije Universiteit, Amsterdam, 1983
.IP [3]
A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
.I
Description of an experimental machine architecture for use with
block structured languages,
.R
Informatica Report 81, Vrije Universiteit, Amsterdam, 1983
.IP [4]
H. van Staveren,
.I
The table driven code generator from the Amsterdam Compiler Kit,
Second Revised Edition,
.R
Vrije Universiteit, Amsterdam
.IP [5]
.I
MC68020 32-bit Microprocessor User's Manual,
.R
Second Edition,
Motorola Inc., 1985, 1984
.IP [6]
.I
MC68000 16-bit Microprocessor User's Manual,
Preliminary,
.R
Motorola Inc., 1979