.nr PS 11
.nr VS 13p
.EQ
delim @@
.EN
.EQ
gfont R
.EN
.ND
.RP
.TL
A back end table for the Motorola MC68000, MC68010 and MC68020 microprocessors
.AU
Frank Doodeman
.AB
A back end table is part of the Amsterdam Compiler Kit (ACK). It is used
to produce the actual back end, a program that translates the intermediate
language family EM to assembly language for some target machine. The table
discussed here can be used for two back ends, which together cover three
machines: one back end for the MC68000 and MC68010 (the difference between
these two is so small that one back end table can be used for either one),
and one for the MC68020.
.AE
.NH
Introduction
.PP
To simplify the task of producing portable (cross) compilers and interpreters
the Vrije Universiteit designed an integrated collection of programs, the
Amsterdam Compiler Kit (ACK) [2]. It is based on the old UNCOL idea [1] which
attempts to solve the problem of how to make a compiler for each of @ N @
languages on @ M @ different machines without having to write @ N times M @
programs.
.PP
The UNCOL approach is to write @ N @
.I
front ends,
.R
which translate the
source language into a common intermediate language UNCOL (Universal Computer
Oriented Language), and @ M @
.I
back ends,
.R
each of which translates programs in
UNCOL into a specific machine language. Under these conditions only @ M + N @
programs must be written to provide all @ N @ languages on all @ M @
machines, instead of @ M times N @ programs.
.PP
The intermediate language for the Amsterdam Compiler Kit is the machine language
for a simple stack machine called EM (Encoding Machine) [3]. So a back end for
the MC68020 translates EM code into MC68020 assembly language. Writing a
back end table [4] suffices to get such a back end.
.PP
The back end is a single program that is driven by a machine dependent driving
table. This table, the back end table, defines the mapping of EM code to
the MC68000, MC68010 or MC68020 assembly language.
.NH
The MC68000 and MC68020 microprocessors
.PP
In this document the name MC68000 will be used for both the MC68000 and the
MC68010 microprocessors, because as far as the back end table is concerned
there is no difference between them. For a complete and detailed description
of the MC68020 the reader is referred to [5]; for the MC68000 one might also use [6].
In this section some relevant parts will be discussed.
.NH 2
Registers
.PP
Both the MC68000 and the MC68020 have eight 32-bit data registers (@ D sub 0 @-@ D sub 7 @) that can
be used for byte (8-bit), word (16-bit) and long word (32-bit) data operations.
They also have seven 32-bit address registers (@ A sub 0 @-@ A sub 6 @) that may be used as
software stack pointers and base address registers; address register @ A sub 7 @ is
used as the system stack pointer. Address registers may also be used for
word and long word address operations.
.NH 2
Addressing modes
.PP
First the MC68000 addressing modes will be discussed. Since the MC68020's
set of addressing modes is an extension of the MC68000's set, this
section also applies to the MC68020.
.PP
In the description we use:
.IP @ A sub n @
for address register;
.IP @ D sub n @
for data register;
.IP @ R sub n @
for address or data register;
.IP @ X sub n @
for index register (either data or address register);
.IP @ PC @
for program counter;
.IP @ d sub 8 @
for 8-bit displacement integer;
.IP @ d sub 16 @
for 16-bit displacement integer;
.IP @ bd @
for base displacement (may be null, word or long);
.IP @ od @
for outer displacement (may be null, word or long).
.NH 3
General addressing modes
.NH 4
Register Direct Addressing
.IP Syntax: 8
@ R sub n @
.PP
This addressing mode (it can be used with either a data register or an address
register) specifies that the operand is in one of
the 16 multifunction registers.
.NH 4
Address Register Indirect
.IP Syntax: 8
@ ( A sub n ) @
.PP
The address of the operand is in the address register specified.
.NH 4
Address Register Indirect With Postincrement
.IP Syntax: 8
@ ( A sub n )+ @
.PP
The address of the operand is in the address register specified. After the
operand address is used, the address register is incremented by one, two or
four depending upon whether the size of the operand is byte, word or long.
If the address register is the stack pointer and the operand size is byte, the
address register is incremented by two rather than one to keep the stack pointer
on a word boundary.
.NH 4
Address Register Indirect With Predecrement
.IP Syntax: 8
@ -( A sub n ) @
.PP
The address of the operand is in the address register specified. Before the
operand address is used, the address register is decremented by one, two or
four depending upon whether the size of the operand is byte, word or long.
If the address register is the stack pointer and the operand size is byte, the
address register is decremented by two rather than one to keep the stack pointer
on a word boundary.
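.PP
The stepping rule shared by the postincrement and predecrement modes can be
sketched in Python as follows. This is only an illustration under simple
assumptions: the address registers are modelled as a list of eight integers
with register 7 as the stack pointer, and the names SIZES, step, postincrement
and predecrement are made up for this sketch; they occur neither in the table
nor in [5].
.DS
```python
# Operand sizes in bytes.
SIZES = {"byte": 1, "word": 2, "long": 4}

def step(size, reg):
    # Amount by which address register `reg` is adjusted.
    n = SIZES[size]
    if reg == 7 and size == "byte":
        n = 2  # keep the stack pointer on a word boundary
    return n

def postincrement(regs, reg, size):
    # (An)+ : use the address first, then increment the register.
    ea = regs[reg]
    regs[reg] = (regs[reg] + step(size, reg)) & 0xFFFFFFFF
    return ea

def predecrement(regs, reg, size):
    # -(An) : decrement the register first, then use the address.
    regs[reg] = (regs[reg] - step(size, reg)) & 0xFFFFFFFF
    return regs[reg]
```
.DE
.PP
With register 7 holding 0x2000, a byte predecrement in this model yields
the address 0x1FFE, not 0x1FFF.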
.NH 4
Address Register Indirect With Displacement
.IP Syntax: 8
@ d sub 16 ( A sub n ) @ for the MC68000, @ ( d sub 16 , A sub n ) @ for the MC68020
.PP
This addressing mode requires one word of extension. The address of the operand is
the sum of the contents of the address register and the sign extended 16-bit
integer in the extension word.
.NH 4
Address Register Indirect With Index
.IP Syntax: 8
@ d sub 8 ( A sub n , X sub n .size) @ for the MC68000, @ ( d sub 8 , A sub n , X sub n .size) @ for the MC68020
.PP
This addressing mode requires one word of extension according to a certain format,
which specifies
.IP 1.
which register to use as index register;
.IP 2.
a flag that indicates whether the index register is a data register or an
address register;
.IP 3.
a flag that indicates the index size; this is
.I word
when the low order part of the index register is to be used, and
.I long
when the whole long value in the register is to be used as index;
.IP 4.
an 8-bit displacement integer (the low order byte of the extension word).
.PP
The address of the operand is the sum of the contents of the address register,
the possibly sign extended contents of the index register and the sign
extended 8-bit displacement.
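.PP
The address computation of the indexed mode can be written out as a small
Python sketch. It assumes that register contents are given as unsigned 32-bit
integers and that the result wraps around at 32 bits; the names MASK32,
sign_extend and indexed_ea are hypothetical and only serve this example.
.DS
```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    # Interpret the low `bits` bits of value as a two's complement number.
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value

def indexed_ea(base, index, index_size, d8):
    # base: contents of the address register
    # index: contents of the index register
    # index_size: "word" uses the sign extended low order half,
    #             "long" uses the whole register
    # d8: the 8-bit displacement from the extension word
    if index_size == "word":
        index = sign_extend(index, 16)
    else:
        index = sign_extend(index, 32)
    return (base + index + sign_extend(d8, 8)) & MASK32
```
.DE
.PP
With a base of 0x1000, an index register holding 0x0001FFFF and a displacement
byte of 0xFE, a word index gives 0x1000 - 1 - 2 = 0xFFD, whereas a long index
gives 0x1000 + 0x1FFFF - 2 = 0x20FFD.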
.NH 4
Absolute Data Addressing
.IP Syntax: 8
@ address @ for the MC68000, @ ( address ) @ for the MC68020
.PP
Two different kinds of this mode are available:
.IP 1.
Absolute Short Address; this mode requires one word of extension. The address of
the operand is the sign extended 16-bit extension word.
.IP 2.
Absolute Long Address; this mode requires two words of extension. The address of
the operand is developed by concatenation of the two extension words; the high
order part of the address is the first extension word, the low order part is
the second.
.NH 4
Program Counter With Displacement
.IP Syntax: 8
@ d sub 16 ( PC ) @ for the MC68000, @ ( d sub 16 , PC ) @ for the MC68020
.PP
This mode requires one word of extension. The address of the operand is the sum
of the address in the program counter and the sign extended 16-bit displacement
integer in the extension word. The value in the program counter is the
address of the extension word.
.NH 4
Program Counter With Index
.IP Syntax: 8
@ d sub 8 ( PC , X sub n .size ) @ for the MC68000, @ ( d sub 8 , PC,  X sub n .size ) @ for the MC68020
.PP
This mode requires one word of extension as described under
.I
Address Register Indirect With Index.
.R
The address of the operand is the sum of the value in the
program counter, the possibly sign extended index register and the sign
extended 8-bit displacement integer in the extension word.
The value in the program counter is the address of the extension word.
.NH 4
Immediate Data
.IP Syntax: 8
@ \#data @
.PP
This addressing mode requires either one or two words of extension, depending
on the size of the operation:
.IP
byte operation - the operand is in the low order byte of the extension word;
.IP
word operation - the operand is in the extension word;
.IP
long operation - the operand is in the two extension words, the high order
16 bits are in the first extension word, the low order 16 bits in the second.
.NH 3
Extra MC68020 addressing modes
.PP
The MC68020 has three more addressing modes. These modes all use a displacement
(some even two), an address register and an index register. Instead of the
address register one may also use the program counter. Any of these
may be omitted. If all addends are omitted the processor creates an
effective address of zero. All three modes require at least one
extension word, the
.I
Full Format Extension Word,
.R
which specifies:
.IP 1.
the index register number (0-7);
.IP 2.
the index register type (address or data register);
.IP 3.
the size of the index (only low order part or the whole register);
.IP 4.
a scale factor. This is a number from 0 to 3 which specifies how many bits
the contents of the index register are to be shifted to the left before being
used as an index;
.IP 5.
a flag that specifies whether the base (address) register is to be added or
to be suppressed;
.IP 6.
a flag that specifies whether to add or suppress the index operand;
.IP 7.
two bits that specify the size of the base displacement (null, word or long);
.IP 8.
three bits that in combination with (6) above specify which of the three
addressing modes (described below) to use and, if used, the size of the
outer displacement (null, word or long).
.IP N.B.
All modes mentioned above for the MC68000
that use an index register may have this register
scaled (only when using the MC68020).
.PP
The three extra addressing modes are:
.NH 4
Address Register Indirect With Index (Base Displacement)
.IP Syntax: 8
@ ( bd , A sub n , X sub n .size*scale ) @ (MC68020 only)
.PP
The address of the operand is the sum of the contents of the address register,
the scaled contents of the possibly sign extended index register and the possibly
sign extended base displacement. When the program counter is used instead
of the address register, the value in the program counter is the address
of the full format extension word. This mode requires one or two more extension
words when the size of the base displacement is word or long respectively.
.PP
Note that without the index operand, this mode is an extension of the
.I
Address Register Indirect With Displacement
.R
mode; when using the MC68020 one is no longer limited to a 16-bit displacement.
Also note that with the index operand added, this mode is an extension
of the
.I
Address Register Indirect With Index
.R
mode; when using the MC68020 one is no longer limited to an 8-bit displacement.
.NH 4
Memory Indirect Post-Indexed
.IP Syntax: 8
@ ( [ bd , A sub n ] , X sub n .size*scale , od ) @ (MC68020 only)
.PP
This mode may use an outer displacement. First an intermediate memory
address is calculated by adding the contents of the address register and
the possibly sign extended base displacement. This address is used
for an indirect memory access of a long word, followed by adding
the index operand (scaled and possibly sign extended). Finally the
outer displacement is added to yield the address of the operand.
When the program counter is used, the value in the program counter is the
address of the full format extension word.
.NH 4
Memory Indirect Pre-Indexed
.IP Syntax: 8
@ ( [ bd , A sub n , X sub n .size*scale ] , od ) @ (MC68020 only)
.PP
This mode may use an outer displacement. First an intermediate memory
address is calculated by adding the contents of the address register,
the scaled contents of the possibly sign extended index register and
the possibly sign extended base displacement. This address is used
for an indirect memory access of a long word, followed by adding
the outer displacement to yield the address of the operand.
When the program counter is used, the value in the program counter is the
address of the full format extension word.
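.PP
The difference between the two memory indirect modes lies only in the moment
at which the index operand is added. The following Python sketch makes this
explicit. It is a simplified model, not a description of the hardware: memory
is a dictionary from addresses to long words, all operands are assumed to be
sign extended already, and scale is the shift count (0 to 3) from the full
format extension word. The function names are hypothetical.
.DS
```python
MASK32 = 0xFFFFFFFF

def read_long(memory, address):
    # Indirect memory access of a long word.
    return memory[address & MASK32]

def post_indexed(memory, bd, an, xn, scale, od):
    # ([bd,An],Xn.size*scale,od): index is added after the memory access.
    intermediate = read_long(memory, an + bd)
    return (intermediate + (xn << scale) + od) & MASK32

def pre_indexed(memory, bd, an, xn, scale, od):
    # ([bd,An,Xn.size*scale],od): index is added before the memory access.
    intermediate = read_long(memory, an + bd + (xn << scale))
    return (intermediate + od) & MASK32
```
.DE
.PP
For example, with the address register holding 0x1000, a base displacement
of 4, an index of 4 shifted by 2 and an outer displacement of 8, the
post-indexed mode reads the long word at 0x1004 and adds 16 + 8 to it, while
the pre-indexed mode reads the long word at 0x1014 and adds 8.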
.NH 3
Addressing modes used in the table
.PP
Not all addressing modes mentioned above are used in code generation. It is
clear that none of the modes that use the program counter PC can be used,
since at code generation time nothing is known about the value in PC.
Also some of the possibilities of the three MC68020 addressing modes are not
used; e.g. it is possible to use a
.I
Data Register Indirect
.R
mode, which actually is the
.I
Address Register Indirect With Index
.R
mode, with the address register and the displacement left out. However
such a mode would require two extra bytes for the full format extension word,
and it would also be much slower than using
.I
Address Register Indirect.
.R
For reasons of this kind several possible addressing modes are not used in the
generation of code.
In the table address registers are only used for holding addresses, and
only data registers are used as index registers.
.NH
The MC68000 and MC68020 back end table
.PP
The table itself has to be run through the C preprocessor
before it can be used to generate
the back end (called
.I
code generator
.R
or
.I cg
for short). When no flags are given to
the preprocessor an MC68020 code generator is produced; for the MC68000
code generator one has to run the table through the preprocessor using the
.I -Dm68k4
flag.
.PP
The table is designed as described in [4]. For the overall design of a back
end table the reader is referred to that document. This section only deals
with problems encountered in writing the table and other things worth noting.
.NH 2
Constant Definitions
.PP
Wordsize and pointersize (EM_WSIZE and EM_PSIZE respectively) are defined
as four (bytes). EM_BSIZE, the hole between AB (the parameter base) and
LB (the local base), is eight bytes: only
the return address and the localbase are saved.
.NH 2
Properties
.PP
Since Hans van Staveren in his document [4] clearly states that
.I cg
execution time is negatively influenced by the number of properties, only
four different properties have been defined. Besides, since the registers
really are multifunctional, these four are really all that are needed.
.NH 2
Registers
.PP
The table uses register variables: @ D sub 3 @ - @ D sub 7 @ are used as general register
variables, and address registers @ A sub 2 @ - @ A sub 5 @ are used as pointer register
variables. @ A sub 6 @ is reserved for the localbase.
.NH 2
Tokens
.PP
At first glance one might wonder about the number of tokens, especially
for the MC68020, considering the small number of different addressing modes.
However, the last three addressing modes mentioned for the MC68020 may
omit any of the addends, and this leads to a large number of different tokens.
I did consider the possibility of enlarging the number of tokens and sets
even further, because there might be assemblers that don't handle displacements
of zero optimally (they might generate a 2 byte extension word holding zero).
The small profit in bytes in the generated code
however does not justify the increase
in size of the token section, the set section and the patterns section,
so this idea was not developed any further.
.PP
The timing cost of the tokens may be incorrect for some MC68000 tokens.
This is because the MC68000 uses a 16-bit data bus, so that
two separate memory accesses are needed to get a 32-bit operand.
.NH 3
Token names
.PP
The number of tokens and the limited capability of the author's imagination
might have caused the names of some tokens not to be very clear.
Some information about the names is in order here.
.PP
Whenever part of a token name is in capitals that part is memory indirected
(i.e. in square brackets). In token names
.I OFF
and
.I off
mean an address register with an offset, i.e. an address register with a displacement
(either base displacement or outer displacement).
.I
IND, ind
.R
and
.I index
stand for indexed, or index register.
.I ABS
and
.I abs
stand for absolute, which actually is just a displacement (base or outer).
These `rules' only apply to names of tokens that represent actual operands.
There are also tokens that represent addresses of operands. These
(with a few exceptions) contain
.I
regA, regX
.R
and
.I con
as parts of their names, which stand for address register, index register and
displacement (always base displacement) respectively. If the address to which
the token refers uses memory indirection, that part of the name comes first
(in small letters), followed by an underscore. The memory indirection part
follows the `rules' for operand token names.
.PP
Of course there are exceptions to these `rules' but in those cases the names
are self explanatory.
.PP
Two special cases:
.I ext_regX
is the name of the token that represents the
address of an absolute indexed operand, syntax @ ( bd , X sub n .size*scale ) @;
.I regX
does not represent any real mode, but is used with EM array instructions and
pointer arithmetic.
.NH 3
Special tokens for the MC68000
.PP
The MC68000 requires two extra tokens, which are called
.I t_regAcon
and
.I
t_regAregXcon.
.R
They are necessary because
.I regAcon
can only have a 16-bit displacement on the MC68000, and
.I regAregXcon
uses only 8 bits for its displacement. To prevent these addressing modes from
being used with displacements that are too large, the extra tokens are needed.
Whenever the displacements become too large and they need
to be used in the generation
of assembly code, these tokens are transformed into other tokens.
To prevent the table from becoming too messy I defined
.I t_regAcon
and
.I t_regAregXcon
to be identical to
.I regAcon
and
.I regAregXcon
respectively for the MC68020.
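.PP
The size restriction that makes the extra tokens necessary amounts to a range
test on a signed displacement. The Python sketch below shows only that test;
it is not the form the check takes in the table itself, and the function
names are invented for the illustration.
.DS
```python
def fits_signed(value, bits):
    # True when value can be held in a two's complement field of `bits` bits.
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))

def regAcon_ok(displacement):
    # regAcon on the MC68000: 16-bit displacement.
    return fits_signed(displacement, 16)

def regAregXcon_ok(displacement):
    # regAregXcon on the MC68000: 8-bit displacement.
    return fits_signed(displacement, 8)
```
.DE
.PP
A displacement of 200, for instance, passes the first test but fails the
second, so on the MC68000 it cannot be used directly in the indexed mode.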
.NH 2
Sets
.PP
Most set names used in the table are self explanatory, especially to the reader
who is familiar with the four addressing categories as mentioned in [5]:
.I
data, memory, alterable
.R
and
.I
control.
.R
In the sets definition part some sets are defined that are not used elsewhere in
the table, but only serve as part of the definition of
some other set. This keeps the
set definition part from becoming unreadable.
.PP
The sets called
.I imm_cmp
consist of all tokens that can be used to compare with a constant.
.NH 2
Instructions
.PP
Only the instructions that are used in code generation are listed here.
The first few instructions are meant especially for use with register
variables. The operand LOCAL used here refers to a register variable.
The reader should not conclude that these operations are also allowed on
ordinary locals. The space and timing costs of these instructions have been
adapted, but the use of the word LOCAL for register variables causes these costs
to be inaccurate anyway.
.PP
The
.I killreg
instruction, which generates a comment in the assembly language output and
which is meant to let
.I cg
know that the data register operand has its contents destroyed,
needs some explanation, which is better placed
in the discussion of groups 3 and 4 of the section about patterns.
.PP
The timing costs of the instructions are probably not very accurate for the
MC68020 because the MC68020 uses an instruction cache and prefetch. The
costs used in the table are the `worst case cost' as mentioned in section 9
of [5].
.NH 2
Moves
.PP
These are all pretty straightforward, except perhaps when
.I t_regAcon
and
.I t_regAregXcon
are used. In these cases the size of the displacement has to be checked
before moving. This also applies to the stacking rules and the coercions.
.NH 2
Tests
.PP
These three tests (one for each operation size) could not be more
straightforward than they are now.
.NH 2
Stackingrules
.PP
The only peculiar stackingrule is the one for
.I
regX.
.R
This token is only used with EM array instructions and
with pointer arithmetic. Whenever it is put
on the fake stack, some EM instructions are left in the instruction stream
to remove this token. Consequently it should never have to be stacked. However
the
.I
code generator generator
.R
(or
.I cgg
for short)
complained about not having a stackingrule for this token, so it had to
be added nevertheless.
.NH 2
Coercions
.PP
These are all straightforward. There are no splitting coercions since
the fake stack never contains any tokens that can be split.
There are only two unstacking coercions.
The rest are all transforming coercions. Almost all coercions transform
tokens into either a data register or an address register. The exception is
that in the MC68000 part of the table the
.I t_regAcon
and
.I t_regAregXcon
tokens are transformed into real
.I regAcon
and
.I regAregXcon
tokens with displacements that are properly sized.
.NH 2
Patterns
.PP
This is the largest part of the table. It is subdivided into 17 groups.
We will take a closer look at the more interesting groups.
.NH 3
Group 0: rules for register variables
.PP
This group makes sure that EM instructions using register variables are
handled efficiently. This group includes: local loads and
stores; arithmetic, shifts and logical operations on locals and indirect locals;
and pointer handling, where C expressions like
.I
*cp++
.R
are handled. For such an expression there are several EM instruction
sequences the front end might generate. For an integer pointer e.g.:
.DS
.B
lol lol adp stl loi $1==$2 && $1==$4 && $3==4 && $5==4
.R
.DE
or
.DS
.B
lol loi lol adp stl $1==$3 && $3==$5 && $2==4 && $4==4
.R
.DE
or perhaps even
.DS
.B
lil lol adp stl $1==$2 && $2==$4 && $3==4
.R
.DE
Each of these is included, since which one is generated is up to the front
end. If the front end is consistent this will mean that some of these patterns
will never be used in code generation. This might seem a waste, but anyone
who thinks that will certainly change his mind when his new C front end
generates a different EM instruction sequence.
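.PP
How such a pattern with its condition selects an EM instruction sequence can
be illustrated with a small Python model. This is not how
.I cg
is implemented; it merely shows that $n in a condition stands for the argument
of the n-th instruction of the pattern. EM instructions are modelled as
(mnemonic, argument) pairs, only the first and the last of the sequences above
are encoded, and all names in the sketch are hypothetical.
.DS
```python
def matches(window, mnemonics, condition):
    # window: list of (mnemonic, argument) pairs from the EM stream.
    if [m for m, _ in window] != mnemonics:
        return False
    # a[1], a[2], ... play the role of $1, $2, ... in the table.
    a = [None] + [arg for _, arg in window]
    return condition(a)

FIRST = (["lol", "lol", "adp", "stl", "loi"],
         lambda a: a[1] == a[2] and a[1] == a[4] and a[3] == 4 and a[5] == 4)

LAST = (["lil", "lol", "adp", "stl"],
        lambda a: a[1] == a[2] and a[2] == a[4] and a[3] == 4)

def recognised(window):
    # True when the window matches one of the encoded patterns.
    return any(matches(window, m, c) for m, c in (FIRST, LAST))
```
.DE
.PP
In this model the sequence lol -4, lol -4, adp 4, stl -4, loi 4 is recognised,
while the same sequence with adp 2 is not, because the condition on $3 fails.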
.NH 3
Groups 1 and 2: load and store instructions
.PP
In these groups
\fBlof\fR and \fBstf\fR, \fBloi\fR and \fBsti\fR, \fBldf\fR and \fBsdf\fR
are the important instructions.
They make up the largest parts of these groups, especially the
.B loi
and
.B sti
instructions, because they come in three basic sizes (byte, word and long).
Note that with these instructions in the MC68000 part the
.I exact
is omitted in front of
.I regAcon
and
.I
regAregXcon.
.R
This makes sure that
.I t_regAcon
and
.I t_regAregXcon
are transformed into proper tokens before they are used as addresses.
.PP
Also note that the
.I regAregXcon
token is completely left out from the
\fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
instruction handling. This is because the sum of the token displacement
and the offset provided in the instruction cannot be checked and is likely
to exceed 8 bits. Unfortunately
.I cgg
does not allow the inspection of subregisters of tokens that are on the
fake stack. This same problem might also occur with the
.I regAcon
token, but this is less likely because it
uses 16-bit displacements. Besides, if it had been left out, the
\fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
instructions would have been handled considerably less efficiently.
.NH 3
Groups 3 and 4: integer and unsigned arithmetic
.PP
EM instruction
.B sbi
also works with address registers, because the
.B cmp
instruction in group 12 is replaced by \fBsbi 4\fR.
.PP
For the MC68000 \fBmli\fR, \fBmlu\fR, \fBdvi\fR, \fBdvu\fR, \fBrmi\fR
and \fBrmu\fR are handled
by library routines. This is because the MC68000 has only 16-bit multiplications
and divisions.
.PP
The MC68020 does have 32-bit multiplications and divisions, but for the
.B rmi
and
.B rmu
EM instructions peculiar things happen anyway: they generate the
.I killreg
instruction. This is necessary because the data register that
first held the dividend now holds the quotient; the original contents are
destroyed without
.I cg
knowing about it (the destruction of the two registers that make up the
.I DREG_pair
token couldn't be noted in the instructions part of the table).
To let
.I cg
know that these contents are destroyed, we have to use this `pseudo instruction'
for lack of a better solution.
.NH 3
Group 5: floating point arithmetic
.PP
Since floating point arithmetic is not implemented, traps are generated here.
.NH 3
Group 6: pointer arithmetic
.PP
This is also a very important group, along with groups 1 and 2. The MC68020
has many different addressing modes and if possible they should be used in
the generation of assembly language.
.PP
The
.I regX
token is generated here too. It is meant to make efficient use of the
MC68020's ability to scale index registers.
.PP
Note that I would have liked one extra pattern to handle C statements
like
.DS
.I
pointer += expr ? constant1 : constant2;
.R
.DE
efficiently. This pattern would have looked like:
.DS
pat ads
with const
leaving adp %1.num
.DE
but when
.I cg
comes to the EM replacement part, the constant has already been removed
from the fake stack, causing
.I %1.num
to have a wrong value.
.NH 3
Group 9: logical instructions
.PP
The EM instructions \fBand\fR,
.B ior
and
.B xor
are so much alike that procedures can be used here, except for the
.B
xor $1==4
.R
instruction, because the MC68000
.I eor
instruction does not allow as many kinds of operands as
.I and
and
.I
or.
.R
.NH 3
Group 11: arrays
.PP
This group also tries to make efficient use of the available addressing modes,
but it leaves the actual work to group 6 mentioned above.
.PP
The
.I regX
token is also generated here. In this group this token is very useful for
handling array instructions for arrays with one, two, four or eight byte
elements; the array index goes into the index register, which can then
be scaled appropriately. An offset is used when the
first array element has an index other than zero.
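.PP
The address arithmetic described above can be sketched in C as follows.
This is only an illustration, not code from the table: the names
is_scalable and element_address are hypothetical, and the sketch assumes
that the constant part of the computation (the lower bound times the
element size) is what ends up in the offset.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when an element size can serve as an index register scale
 * factor (one, two, four or eight bytes), 0 otherwise. */
int is_scalable(size_t size)
{
    return size == 1 || size == 2 || size == 4 || size == 8;
}

/* Address of element `index` of an array at `base` whose first element
 * has index `lower`.  (Hypothetical helper, for illustration only.) */
uintptr_t element_address(uintptr_t base, long index, long lower, size_t size)
{
    /* Constant part: known at compile time, so it can become an offset. */
    long offset = -lower * (long)size;

    /* Variable part: the index, scaled by the element size. */
    return base + (uintptr_t)(index * (long)size + offset);
}
```

For an array with four byte elements whose first index is 1, element 3
lies 8 bytes past the base: the scaled index contributes 12 and the
offset contributes -4.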
.PP
I would have liked some extra patterns here too, but they won't work
for the same reason as explained in the discussion of group 6.
.NH 3
Group 14: procedure call instructions
.PP
The function return area consists of registers @ D sub 0 @ and @ D sub 1 @.
.NH 3
Group 15: miscellaneous instructions
.PP
In many cases library routines are called here. These will be discussed
later.
.PP
Two special EM instructions are included here: \fBdch\fR and \fBlpb\fR.
I don't know when they are generated by a front end, but these
instructions were also in the back end table for the PDP. In the PDP table
these instructions were replaced by
.B
loi 4
.R
and
.B
adp 8
.R
respectively. I included them both, since they couldn't do any harm.
.NH 3
Extra group: optimization
.PP
This group handles EM patterns with more than one instruction. This group
is not absolutely necessary but it makes the generation of code
more efficient. Among the things that are handled here are: arithmetic and
logical operations on locals, externals and indirect locals; shifting
of locals, externals and indirect locals by one; some pointer arithmetic; tests
in combination with logical and's and or's or with branches. Finally
there are sixteen patterns about divisions that could be handled more
efficiently by right shifts and which I think should be handled by the
peephole optimizer (since it also handles
the same patterns with multiplication).
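.PP
As an illustration of the last point, a division by a power of two can
be replaced by a right shift. The C sketch below shows the unsigned case
only; the function names are hypothetical and it does not reproduce the
sixteen patterns of the table. For signed operands a plain arithmetic
shift rounds differently from a division when the dividend is negative,
so that case needs extra care.

```c
#include <stdint.h>

/* Returns k when d == 2^k, or -1 when d is not a power of two. */
int log2_exact(uint32_t d)
{
    int k = 0;

    if (d == 0 || (d & (d - 1)) != 0)
        return -1;
    while ((d >> k) != 1)
        k++;
    return k;
}

/* Unsigned division that uses a right shift when the divisor is a
 * power of two.  The caller must not pass d == 0. */
uint32_t udiv_pow2(uint32_t x, uint32_t d)
{
    int k = log2_exact(d);

    return k >= 0 ? x >> k : x / d;
}
```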
.NH
The library routines
.PP
The table is supplied with two separate libraries: one for the MC68000 and one
for the MC68020. The MC68000 uses a couple more routines than the MC68020
because it doesn't have 32-bit division and multiplication.
.PP
The routines that need to pop their operands first store their return address.
Routines that need other registers besides @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ first store
the original contents of those registers. @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ do not have
to be saved because if they contain anything useful, their contents
are pushed on the stack before the routine is called.
.PP
The
.I .trp
routine just prints a message stating the trap number and exits (except
of course when that particular trap number is masked). Usually higher
level languages use their own trap handling routines.
.PP
The
.I .mon
routine doesn't do anything useful at all. It just prints a message stating that
the specified system call is not implemented and then exits. Front ends
usually generate calls to special routines rather than the EM
instruction \fBmon\fR.
These routines have to be supplied in another library. They
may be system dependent (e.g. the MC68000 machine this table was tested on
first moves the parameters to registers, then moves the system call number
to @ D sub 0 @ and then executes
.I
trap #0,
.R
whereas the MC68020 machine this table was tested on required the parameters
to be on the stack rather than in registers). Therefore this library is not
discussed here.
.PP
The
.I .printf
routine is included for EM diagnostic messages. It can print strings using %s,
16-bit decimal numbers using %d and 32-bit hexadecimal numbers using %x.
.PP
The
.I .strhp
routine stores a new EM heap pointer, and sometimes it needs to allocate more
heap space. This is done by calling the system call routine \fI_brk\fR.
Chunks of 1K bytes are allocated, but the chunk size can easily be made
larger or smaller.
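.PP
The chunked growth can be sketched in C as follows. This is an
assumption about how such a routine might round its request, not the
actual library code: the name grown_limit and the constant CHUNK are
hypothetical, and the sketch only computes the new limit that would be
handed to the system call routine.

```c
#include <stdint.h>

#define CHUNK 1024u   /* allocation unit; change this to use other chunk sizes */

/* Returns the heap limit needed to hold new_hp when the heap may only
 * grow in steps of CHUNK bytes.  The limit is unchanged when new_hp
 * still fits.  (Hypothetical helper, for illustration only.) */
uint32_t grown_limit(uint32_t limit, uint32_t new_hp)
{
    uint32_t shortfall, chunks;

    if (new_hp <= limit)
        return limit;
    shortfall = new_hp - limit;
    chunks = (shortfall + CHUNK - 1) / CHUNK;   /* round up to whole chunks */
    return limit + chunks * CHUNK;
}
```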
.PP
The MC68000 library also contains a routine to handle the EM instruction \fBrck\fR.
The MC68020 has an instruction
.I cmp2
that is specifically meant for range checking, so the MC68020 library can do without
that routine.
.PP
The MC68000 library has two multiplication routines, one for unsigned and the other
for signed multiplication. The one for unsigned multiplication
first tests the sizes of the operands, to see if it can perform
the 16-bit machine instruction instead of the routine. If not, it considers
its two operands to be two-digit numbers in a radix-65536 system. It
uses the 16-bit unsigned multiply instruction
.I mulu
three times (it does not calculate the high order result),
and adds up the intermediate results the proper way. The signed
multiplication routine calculates the sign of the result, calculates
the result as if it were an unsigned multiplication, and
adjusts the sign of the result. Here testing
the operands for their sizes would be less simple, because the operands
are signed, so that is not done here.
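.PP
The scheme can be sketched in C as follows. This is a minimal
illustration under stated assumptions, not a transcription of the
library routines: the function names are hypothetical, and mulu16 merely
stands in for the
.I mulu
instruction. The product of the two high digits is never computed,
because it only contributes to the high order result.

```c
#include <stdint.h>

/* 16 x 16 -> 32 bit unsigned multiply, standing in for mulu. */
static uint32_t mulu16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Low order 32 bits of a 32 x 32 bit unsigned product. */
uint32_t mul32(uint32_t x, uint32_t y)
{
    uint16_t xh = (uint16_t)(x >> 16), xl = (uint16_t)x;
    uint16_t yh = (uint16_t)(y >> 16), yl = (uint16_t)y;
    uint32_t cross;

    /* Both operands fit in 16 bits: one multiply is enough. */
    if (xh == 0 && yh == 0)
        return mulu16(xl, yl);

    /* Three multiplies; the cross terms are shifted up one digit. */
    cross = mulu16(xh, yl) + mulu16(xl, yh);
    return mulu16(xl, yl) + (cross << 16);
}

/* Signed variant: multiply the magnitudes, then adjust the sign. */
int32_t smul32(int32_t x, int32_t y)
{
    int negative = (x < 0) != (y < 0);
    uint32_t ux = x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
    uint32_t uy = y < 0 ? 0u - (uint32_t)y : (uint32_t)y;
    uint32_t p = mul32(ux, uy);

    if (negative)
        p = 0u - p;
    return (int32_t)p;
}
```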
.PP
The MC68000 library also has two division routines. The routine for unsigned
division uses the popular algorithm, where the divisor is shifted out and
the quotient shifted in. The signed division routine calculates the sign of
both the quotient and the remainder, calls the unsigned division routine
and adjusts the signs for the quotient and the remainder.
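.PP
A shift-and-subtract division of this kind can be sketched in C as
follows. This is one common formulation, not the library code itself:
the function names are hypothetical, the operand is shifted out of one
register while the quotient bits are shifted into it, and the sign rules
of the signed variant (quotient truncated towards zero, remainder with
the sign of the dividend) are an assumption.

```c
#include <stdint.h>

/* Unsigned 32-bit division by shifting and subtracting.
 * The caller must not pass divisor == 0. */
void udiv32(uint32_t dividend, uint32_t divisor,
            uint32_t *quot, uint32_t *rem)
{
    uint32_t q = dividend;   /* operand bits leave at the top,
                                quotient bits enter at the bottom */
    uint32_t r = 0;
    int i;

    for (i = 0; i < 32; i++) {
        uint32_t carry = r >> 31;      /* bit lost by the shift below */

        r = (r << 1) | (q >> 31);
        q <<= 1;
        if (carry || r >= divisor) {
            r -= divisor;
            q |= 1;
        }
    }
    *quot = q;
    *rem = r;
}

/* Signed variant: divide the magnitudes, then adjust both signs. */
void sdiv32(int32_t dividend, int32_t divisor,
            int32_t *quot, int32_t *rem)
{
    uint32_t ua = dividend < 0 ? 0u - (uint32_t)dividend : (uint32_t)dividend;
    uint32_t ub = divisor < 0 ? 0u - (uint32_t)divisor : (uint32_t)divisor;
    uint32_t q, r;

    udiv32(ua, ub, &q, &r);
    if ((dividend < 0) != (divisor < 0))
        q = 0u - q;
    if (dividend < 0)
        r = 0u - r;
    *quot = (int32_t)q;
    *rem = (int32_t)r;
}
```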
.PP
The
.I .nop
routine is included for testing purposes. This routine prints the line
number and the value in the stack pointer. Calls to this routine
are generated by the EM instruction \fBnop\fR, which is ordinarily
left out by the peephole optimizer.
.NH
Testing the table
.PP
There are special test programs available for testing back end tables.
First there is the EM test set, which tests most EM instructions, making
good use of the
.B nop
instruction. Then there are the Pascal and C test programs. The Pascal
test programs report errors, which makes it relatively easy
to find out what was wrong in the table. The C test programs just
generate some output, which then has to be compared to the expected
output. Differences are
not only caused by errors but also e.g. by the use of four
byte integers and unsigneds (which this table uses),
the use of signed characters
instead of unsigned characters (the C front end I used generated signed
characters) or because the back end
does not support floating point.
These differences have to be `filtered out' to reveal
the differences caused by actual errors in the back end table.
These errors then have to be tracked down by examining the assembly code, for
no proper diagnostic messages are generated.
.PP
After these three basic tests there still remain a number of patterns that
haven't been tested yet. Fortunately
.I cgg
offers the possibility of generating a special
.I cg
that can print a list of patterns that haven't been used in
code generation yet.
For these patterns the table writer has to write his own test programs.
This may complicate things a bit because failures may now be caused by
errors in the back end table as well as errors in the test programs.
The latter happened quite often to me, because I found EM
to be an uncomfortable programming language (of course it isn't meant to
be a programming language, but an intermediate language).
.PP
There still remain a couple of patterns in this table that haven't been tested
yet. However, these patterns all have very similar cases that have been
tested (an example of this is mentioned in the section on group 0
of the patterns section of the table). Some patterns have to
do with floating point numbers. These EM instructions all generate
traps, so they didn't all have to be tested. The two instructions
.B dch
and
.B lpb
haven't been tested in this table, but since they only use EM replacement
and they have been tested in the PDP back end table, these two should
be all right.
.NH
Performance of the back end
.PP
To test the performance of the back end I gathered a couple of
C programs and compiled them on the machines I used to test the back ends on.
I compiled them using the C compiler that was available there and
I also compiled them using the back end. I then compared the sizes
of the text segments in the object files.
The final results of these comparisons are in Fig. 1 and Fig. 2.
.KF
.TS
center box;
cfI s s s s s
c s s s s s
c c | c s | c s
c c | c s | c s
c | c | c  c | c  c
l | n | n  n | n  n.
Differences in text segment sizes for the MC68000
parts of the back end compiled by itself
_
original	 	old m68k4	new MC68000
compiler	(100%)	back end	back end
_
name	size	size	perc.	size	perc.
_
codegen.c	13892	16224	116.7%	12860	92.5%
compute.c	4340	4502	103.7%	4530	104.3%
equiv.c	680	662	97.3%	598	87.9%
fillem.c	8016	7304	91.1%	6880	85.8%
gencode.c	1356	1194	88.0%	1130	83.3%
glosym.c	224	202	90.1%	190	84.8%
main.c	732	672	91.8%	634	86.6%
move.c	1876	1526	81.3%	1410	75.1%
nextem.c	1288	1594	123.7%	1192	92.5%
reg.c	1076	1014	94.2%	916	85.1%
regvar.c	1352	1188	87.8%	1150	85.0%
salloc.c	1240	1100	88.7%	1024	82.5%
state.c	628	600	95.5%	532	84.7%
subr.c	6948	6382	91.8%	5680	81.7%
=
averages	2939	3155	95.8%	2766	86.6%
.TE
.DS C
Fig. 1
.DE
.KE
.KF
.TS
center box;
cfI s s s
cfI s s s
c s s s
c s s s
c c | c s
c c | c s
c | c | c  c
l | n | n  n.
Differences in text segment sizes
for the MC68020
parts of the back end
compiled by itself
_
original	 	MC68020
compiler	(100%)	back end
_
name	size	size	perc.
_
codegen.c	12608	12134	96.2%
compute.c	4624	4416	95.5%
equiv.c	572	504	88.1%
fillem.c	7780	6976	89.6%
gencode.c	1320	1086	82.2%
glosym.c	228	182	79.8%
main.c	736	596	80.9%
move.c	1392	1280	91.9%
nextem.c	1176	1066	90.6%
reg.c	1052	836	79.4%
regvar.c	1196	968	80.9%
salloc.c	1200	932	77.6%
state.c	580	528	91.0%
subr.c	6136	5268	85.8%
=
averages	2900	2627	86.4%
.TE
.DS C
Fig. 2
.DE
.KE
Fig. 1 also includes results of an old m68k4 back end (a back end
for the MC68000 with four byte word and pointer size). The table for
this back end was given to me as an example, but I thought it didn't make
good use of the MC68000's addressing capabilities, it hardly did any
optimization, and it sometimes even
generated code that the assembler would not accept.
This was sufficient reason for me to write a completely new table.
.PP
The results from the table should not be taken too seriously. The sizes measured
are the sizes of the text segments of the user programs, i.e. without the
inclusion of library routines. Of course these segments do contain calls
to these routines. Another thing is that the
.I rom
segment may be included in the text segment (this is why the
results for the MC68000 for
.I compute.c
look so bad).
.PP
Some other things must be said about these results.
The quality of EM code
generated by the C front end is certainly not optimal. The front end
uses temporary locals (extra locals that are used to evaluate expressions)
far too readily: for a simple C expression like
.DS
.I
*(pointer) += constant
.R
.DE
where
.I pointer
is a register variable, the C front end generates (for obscure reasons)
a temporary local that holds the contents of \fIpointer\fR. This way
the pattern for
.DS
.B
loc lil adi sil $2==$4 && $3==4
.R
.DE
for register variables is not used and longer, less efficient
code is generated. But even in spite of this, the back end seems to
generate rather compact code.
.NH
Some timing results
.PP
In order to measure the performance of the code generated by the back end
some timing tests were done. The reason I chose these particular tests is
that they were also done for many other back ends; the reader can compare
the results if he so wishes (of course comparing the results only
shows a global difference in speed of the various machines; it doesn't
show whether some back end generates relatively better code than another).
.PP
On the MC68000 machine the statements were executed one million times.
On the MC68020 machine the statements had to be executed four million times
because this machine was so fast that timing results would be very
unreliable if the statements were executed only one million times.
.PP
For testing I used the following C test program:
.DS
.I
main()
{
    int i, j, ...
    ...
    for (i=0; i<1000; i++)
        for (j=0; j<1000; j++)
            STATEMENT;
}
.R
.DE
where
.I STATEMENT
is any of the test statements or the empty statement. For the MC68020
tests I used 2000 instead of 1000.
The results of the test with the empty statement were used to calculate
the execution times of the other test statements.
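.PP
The calculation presumably amounts to subtracting the loop overhead and
dividing by the number of executions. A minimal C sketch of that
arithmetic follows; the function name and its parameters are
hypothetical, and the run times are assumed to be given in seconds.

```c
/* Time per execution of the test statement, in microseconds.
 * loop_seconds:  run time of the program with the test statement
 * empty_seconds: run time of the program with the empty statement
 * executions:    number of times the statement was executed */
double statement_time_us(double loop_seconds, double empty_seconds,
                         long executions)
{
    return (loop_seconds - empty_seconds) * 1e6 / (double)executions;
}
```

With one million executions, a run of 5 seconds against an empty-loop
run of 1 second gives 4 microseconds per statement.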
.PP
Figures 3 and 4 show the results. For each machine two tests were
done: one with register variables, and the other without them.
I noticed that the original C compilers on both machines did not
use register variables unless specifically requested. The
back end uses register variables when and where they are profitable, even
if the user did not ask for them.
.KF
.TS
center box;
cfI s s s s
c s s s s
c | c s | c s
cw(1.5i) | c c | c c
c | c c | c c
lp-2fI | n n | n n.
timing results for the MC68000
times in @ mu @seconds
_
test statement	without register variables	with register variables
_
 	original	new MC68000	original	new MC68000
 	C compiler	back end	C compiler	back end
_
int1=0;	2.8	2.7	0.5	0.5
int1=int2-1;	4.1	4.1	1.3	1.3
int1=int1+1;	4.1	4.1	1.3	1.3
int1=int2*int3;	40.0	40.5	36.2	36.8
T{
int1=(int2<0);
\/*true*/
T}	5.5	7.3	2.0	4.5
T{
int1=(int2<0);
\/*false*/
T}	4.7	8.5	2.8	5.6
T{
int1=(int2<3);
\/*true*/
T}	6.2	7.7	2.6	5.4
T{
int1=(int2<3);
\/*false*/
T}	5.4	8.9	3.6	6.5
T{
.na
int1=((int2>3)||(int2<3));
\/* true || false */
T}	6.0	7.8	3.4	5.4
T{
.na
int1=((int2>3)||(int2<3));
\/* false || true */
T}	9.1	10.2	5.7	7.1
T{
.na
switch (int1) {
case 1: int1=0; break;
case 2: int1=1; break;
}
T}	6.3	17.8	5.3	14.0
T{
.na
if (int1=0) int2=3;
\/*true*/
T}	5.1	4.7	1.3	1.3
T{
.na
if (int1=0) int2=3;
\/*false*/
T}	2.2	2.1	1.9	1.1
while (int1>0) int1=int1-1;	2.2	2.1	1.1	1.1
int1=a[int2];	6.8	6.7	4.0	3.1
p3(int1);	14.3	11.1	13.4	10.0
int1=f(int2);	17.7	14.5	14.8	11.7
s.overhead=5400;	2.8	2.7	2.9	2.7
.TE
.DS C
Fig. 3
.DE
.KE
.KF
.TS
center box;
cfI s s s s
c s s s s
c | c s | c s
cw(1.5i) | c c | c c
c | c c | c c
lp-2fI | n n | n n.
timing results for the MC68020
times in @ mu @seconds
_
test statement	without register variables	with register variables
_
 	original	new MC68020	original	new MC68020
 	C compiler	back end	C compiler	back end
_
int1=0;	.25	.25	.15	.15
int1=int2-1;	1.3	1.3	.38	.38
int1=int1+1;	1.2	.90	.38	.15
int1=int2*int3;	4.4	4.2	3.0	3.1
T{
int1=(int2<0);
\/*true*/
T}	1.6	2.7	1.1	2.3
T{
int1=(int2<0);
\/*false*/
T}	1.9	2.9	.80	2.1
T{
int1=(int2<3);
\/*true*/
T}	1.7	2.8	1.2	2.6
T{
int1=(int2<3);
\/*false*/
T}	2.1	3.0	.85	2.3
T{
.na
int1=((int2>3)||(int2<3));
\/* true || false */
T}	2.1	3.1	1.2	2.5
T{
.na
int1=((int2>3)||(int2<3));
\/* false || true */
T}	3.4	4.2	1.8	3.2
T{
.na
switch (int1) {
case 1: int1=0; break;
case 2: int1=1; break;
}
T}	2.7	8.0	2.0	6.9
T{
.na
if (int1=0) int2=3;
\/*true*/
T}	1.2	1.3	.63	.63
T{
.na
if (int1=0) int2=3;
\/*false*/
T}	1.7	1.6	.50	.53
while (int1>0) int1=int1-1;	1.2	1.3	.55	.53
int1=a[int2];	1.8	1.8	1.0	1.0
p3(int1);	14.8	5.5	14.1	5.0
int1=f(int2);	16.3	6.6	15.2	5.9
s.overhead=5400;	.48	.48	.50	.50
.TE
.DS C
Fig. 4
.DE
.KE
.PP
The reader may have noticed that on both machines the back end seems
to generate considerably slower code for tests where a `condition' is
used in the rhs of an assignment statement. This is in fact not true: it is
the front end that generates bad code. Two examples: for the C statement
.DS
.I
int1 = (int2 < 0);
.R
.DE
the front end generates the following code for the rhs (I
used arbitrary labels):
.DS
.B
lol -16
zlt *10
loc 0
bra *11
10
loc 1
11
.R
.DE
while in this case (in my opinion) it should have generated
.DS
.B
lol -16
tlt
.R
.DE
which is much shorter. Another example: for the C statement
.DS
.I
int1 = (int2 < 3);
.R
.DE
the front end generates for the rhs
.DS
.B
lol -16
loc 3
blt *10
loc 0
bra *11
10
loc 1
11
.R
.DE
while a much better translation would be
.DS
.B
lol -16
loc 3
cmi 4
tlt
.R
.DE
.PP
Another statement that the back end seems to generate slower code for is
the C switch statement. This is true, but it is also caused by
the way switch statements are handled in EM. EM uses the
.B csa
or
.B csb
instruction, and for these two I had to use library routines. On larger
switch statements the
.I .csa
routine will perform relatively better.
.PP
The back end generates considerably faster code for procedure and function
calls, especially in the MC68020 case, and also for the C statement
.DS
.I
int1 = int1 + 1;
.R
.DE
The original C compilers use the same method for this statement
as for
.DS
.I
int1 = int2 - 1;
.R
.DE
that is, they perform the addition in a scratch register, and then store the
result. For the former C statement this is not necessary, because
the MC68000 and MC68020 have an instruction that can add constants
to almost anything (in this case: to locals). The MC68000 and MC68020
back ends do use this instruction.
.NH
Some final remarks
.PP
As mentioned a few times before, the C front end does not
generate optimal code and as a consequence of this the
back end does not always generate optimal code. This is especially
the case with temporary locals, which the front end generates much
too readily, and also with conditional expressions that are
used in the rhs of an assignment statement (fortunately these are not
needed very often).
.PP
If
.I cgg
had been able to accept operands separated by any character
instead of just by commas (in the instruction definitions part),
I would not have needed the
.I killreg
pseudo instruction. It would also be handy to have
.I cgg
accept all normal C operators. At the moment
.I cgg
does not accept binary ands, ors and exors, even though in [4]
it is stated that
.I cgg
does accept all normal C operators. As it happens I did not need the
binary operators, but at some time in developing the table I thought
I did.
.PP
I would also like
.I cg
to do more with the condition code information that is supplied with
each instruction in the instruction definitions section of the table.
Sometimes
.I cg
generates test instructions that are not actually necessary. This
of course causes the generated
programs to be slightly larger and slightly slower.
.PP
In spite of the few minor shortcomings mentioned above I found
.I cgg
a very comfortable tool to use.
.SH
References
.PP
.IP [1]
T. B. Steel Jr.,
.I
UNCOL: The Myth and the Fact,
.R
in Ann. Rev. Auto. Prog.,
R. Goodman (ed.), Vol. 2 (1969), pp. 325-344
.IP [2]
A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
.I
A practical toolkit for making portable compilers,
.R
Informatica Report 74, Vrije Universiteit, Amsterdam, 1983
.IP [3]
A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
.I
Description of an experimental machine architecture for use with
block structured languages,
.R
Informatica Report 81, Vrije Universiteit, Amsterdam, 1983
.IP [4]
H. van Staveren,
.I
The table driven code generator from the Amsterdam Compiler Kit,
Second Revised Edition,
.R
Vrije Universiteit, Amsterdam
.IP [5]
.I
MC68020 32-bit Microprocessor User's Manual,
.R
Second Edition,
Motorola Inc., 1985, 1984
.IP [6]
.I
MC68000 16-bit Microprocessor User's Manual,
Preliminary,
.R
Motorola Inc., 1979