| .nr PS 11
 | |
| .nr VS 13p
 | |
| .EQ
 | |
| delim @@
 | |
| .EN
 | |
| .EQ
 | |
| gfont R
 | |
| .EN
 | |
| .ND
 | |
| .RP
 | |
| .TL
 | |
| A back end table for the Motorola MC68000, MC68010 and MC68020 microprocessors
 | |
| .AU
 | |
| Frank Doodeman
 | |
| .AB
 | |
| A back end table is part of the Amsterdam Compiler Kit (ACK). It is used
 | |
| to produce the actual back end, a program that translates the intermediate
 | |
| language family EM to assembly language for some target machine. The table
 | |
| discussed here can be used for two back ends, covering three machines in
 | |
| total: the MC68000 and MC68010 (which differ so little that one back end
 | |
| table serves both), and the MC68020.
 | |
| .AE
 | |
| .NH
 | |
| Introduction
 | |
| .PP
 | |
| To simplify the task of producing portable (cross) compilers and interpreters
 | |
| the Vrije Universiteit designed an integrated collection of programs, the
 | |
| Amsterdam Compiler Kit (ACK) [2]. It is based on the old UNCOL idea [1] which
 | |
| attempts to solve the problem of how to make a compiler for each of @ N @
 | |
| languages on @ M @ different machines without having to write @ N times M @
 | |
| programs.
 | |
| .PP
 | |
| The UNCOL approach is to write @ N @
 | |
| .I
 | |
| front ends,
 | |
| .R
 | |
| which translate the
 | |
| source language into a common intermediate language UNCOL (Universal Computer
 | |
| Oriented Language), and @ M @
 | |
| .I
 | |
| back ends,
 | |
| .R
 | |
| each of which translates programs in
 | |
| UNCOL into a specific machine language. Under these conditions only @ M + N @
 | |
| programs must be written to provide all @ N @ languages on all @ M @
 | |
| machines, instead of @ M times N @ programs.
 | |
| .PP
 | |
| The intermediate language for the Amsterdam Compiler Kit is the machine language
 | |
| for a simple stack machine called EM (Encoding Machine) [3]. So a back end for
 | |
| the MC68020 translates EM code into MC68020 assembly language. Writing a
 | |
| back end table [4] suffices to obtain such a back end.
 | |
| .PP
 | |
| The back end is a single program that is driven by a machine dependent driving
 | |
| table. This table, the back end table, defines the mapping of EM code to
 | |
| the MC68000, MC68010 or MC68020 assembly language.
 | |
| .NH
 | |
| The MC68000 and MC68020 microprocessors
 | |
| .PP
 | |
| In this document the name MC68000 will be used for both the MC68000 and the
 | |
| MC68010 microprocessors, because as far as the back end table is concerned
 | |
| there is no difference between them. For a complete and detailed description
 | |
| of the MC68020 see [5]; for the MC68000 see also [6].
 | |
| This section summarizes the parts that are relevant to the table.
 | |
| .NH 2 
 | |
| Registers
 | |
| .PP
 | |
| Both the MC68000 and the MC68020 have eight 32-bit data registers (@ D sub 0 @-@ D sub 7 @) that can
 | |
| be used for byte (8-bit), word (16-bit) and long word (32-bit) data operations.
 | |
| They also have seven 32-bit address registers (@ A sub 0 @-@ A sub 6 @) that may be used as
 | |
| software stack pointers and base address registers; address register @ A sub 7 @ is
 | |
| used as the system stack pointer. Address registers may also be used for
 | |
| word and long word address operations.
 | |
| .NH 2 
 | |
| Addressing modes
 | |
| .PP
 | |
| First the MC68000 addressing modes will be discussed. Since the MC68020's
 | |
| set of addressing modes is an extension of the MC68000's set, of course this
 | |
| section also applies to the MC68020.
 | |
| .PP
 | |
| In the description we use:
 | |
| .IP @ A sub n @
 | |
| for address register;
 | |
| .IP @ D sub n @
 | |
| for data register;
 | |
| .IP @ R sub n @
 | |
| for address or data register;
 | |
| .IP @ X sub n @
 | |
| for index register (either data or address register);
 | |
| .IP @ PC @
 | |
| for program counter;
 | |
| .IP @ d sub 8 @
 | |
| for 8-bit displacement integer;
 | |
| .IP @ d sub 16 @
 | |
| for 16-bit displacement integer;
 | |
| .IP @ bd @
 | |
| for base displacement (may be null, word or long);
 | |
| .IP @ od @
 | |
| for outer displacement (may be null, word or long).
 | |
| .NH 3 
 | |
| General addressing modes
 | |
| .NH 4 
 | |
| Register Direct Addressing
 | |
| .IP Syntax: 8
 | |
| @ R sub n @ 
 | |
| .PP
 | |
| This addressing mode (it can be used with either a data register or an address
 | |
| register) specifies that the operand is in one of
 | |
| the 16 multifunction registers.
 | |
| .NH 4 
 | |
| Address Register Indirect
 | |
| .IP Syntax: 8
 | |
| @ ( A sub n ) @ 
 | |
| .PP
 | |
| The address of the operand is in the address register specified.
 | |
| .NH 4 
 | |
| Address Register Indirect With Postincrement
 | |
| .IP Syntax: 8
 | |
| @ ( A sub n )+ @ 
 | |
| .PP
 | |
| The address of the operand is in the address register specified. After the
 | |
| operand address is used, the address register is incremented by one, two or
 | |
| four depending upon whether the size of the operand is byte, word or long.
 | |
| If the address register is the stack pointer and the operand size is byte, the
 | |
| address register is incremented by two rather than one to keep the stack pointer
 | |
| on a word boundary.
 | |
| .NH 4 
 | |
| Address Register Indirect With Predecrement
 | |
| .IP Syntax: 8
 | |
| @ -( A sub n ) @ 
 | |
| .PP
 | |
| The address of the operand is in the address register specified. Before the
 | |
| operand address is used, the address register is decremented by one, two or
 | |
| four depending upon whether the size of the operand is byte, word or long.
 | |
| If the address register is the stack pointer and the operand size is byte, the
 | |
| address register is decremented by two rather than one to keep the stack pointer
 | |
| on a word boundary.
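The step rule shared by the postincrement and predecrement modes can be sketched in a few lines of Python. This is only an illustration of the rule stated above; the function name and the string-valued size argument are assumptions made for the example and do not occur in the table.

```python
def step_size(operand_size: str, is_stack_pointer: bool) -> int:
    """Return the number of bytes by which (An)+ or -(An) adjusts An."""
    sizes = {"byte": 1, "word": 2, "long": 4}
    step = sizes[operand_size]
    # The stack pointer is kept on a word boundary, so a byte access
    # through it moves the register by two instead of one.
    if is_stack_pointer and operand_size == "byte":
        step = 2
    return step
```

For an ordinary address register a byte access moves the register by one; only the stack pointer is forced to a step of two.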
 | |
| .NH 4 
 | |
| Address Register Indirect With Displacement
 | |
| .IP Syntax: 8
 | |
| @ d sub 16 ( A sub n ) @ for the MC68000, @ ( d sub 16 , A sub n ) @ for the MC68020 
 | |
| .PP
 | |
| This address mode requires one word of extension. The address of the operand is
 | |
| the sum of the contents of the address register and the sign extended 16-bit
 | |
| integer in the extension word.
 | |
| .NH 4 
 | |
| Address Register Indirect With Index
 | |
| .IP Syntax: 8
 | |
| @ d sub 8 ( A sub n , X sub n .size) @ for the MC68000, @ ( d sub 8 , A sub n , X sub n .size) @ for the MC68020 
 | |
| .PP
 | |
| This address mode requires one word of extension according to a certain format, 
 | |
| which specifies
 | |
| .IP 1.
 | |
| which register to use as index register;
 | |
| .IP 2.
 | |
| a flag that indicates whether the index register is a data register or an
 | |
| address register;
 | |
| .IP 3.
 | |
| a flag that indicates the index size; this is
 | |
| .I word
 | |
| when the low order part of the index register is to be used, and 
 | |
| .I long
 | |
| when the whole long value in the register is to be used as index;
 | |
| .IP 4.
 | |
| an 8-bit displacement integer (the low order byte of the extension word).
 | |
| .PP
 | |
| The address of the operand is the sum of the contents of the address register,
 | |
| the possibly sign extended contents of the index register and the sign
 | |
| extended 8-bit displacement.
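As a rough illustration of this sum, the following Python sketch computes the operand address from register contents given as plain integers. The helper names are hypothetical; the sketch assumes 32-bit wrap-around arithmetic and assumes that a word-sized index is taken from the low order 16 bits of the index register and sign extended, as described above.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def indexed_address(an: int, xn: int, index_is_long: bool, d8: int) -> int:
    """Effective address for d8(An,Xn.size) with 32-bit wrap-around."""
    # A word index uses only the low order half of the index register.
    index = sign_extend(xn, 32) if index_is_long else sign_extend(xn, 16)
    return (an + index + sign_extend(d8, 8)) & MASK32
```

With An = 0x1000, Xn = 0x0001FFFE and a displacement byte of 0xFC, the word form uses an index of -2 and the long form uses the whole register, so the two forms address different locations.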
 | |
| .NH 4 
 | |
| Absolute Data Addressing
 | |
| .IP Syntax: 8
 | |
| @ address @ for the MC68000, @ ( address ) @ for the MC68020 
 | |
| .PP
 | |
| Two different kinds of this mode are available:
 | |
| .IP 1.
 | |
| Absolute Short Address; this mode requires one word of extension. The address of
 | |
| the operand is the sign extended 16-bit extension word.
 | |
| .IP 2.
 | |
| Absolute Long Address; this mode requires two words of extension. The address of
 | |
| the operand is developed by concatenation of the two extension words; the high
 | |
| order part of the address is the first extension word, the low order part is
 | |
| the second.
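The two kinds of absolute address differ only in how the extension words are turned into a 32-bit value. A minimal Python sketch, with hypothetical function names and extension words passed as integers:

```python
def absolute_short(extension_word: int) -> int:
    """Sign extend the single 16-bit extension word to a 32-bit address."""
    value = extension_word & 0xFFFF
    if value & 0x8000:
        value |= 0xFFFF0000
    return value


def absolute_long(first_word: int, second_word: int) -> int:
    """Concatenate two extension words; the first holds the high order part."""
    return ((first_word & 0xFFFF) << 16) | (second_word & 0xFFFF)
```

Because of the sign extension, the short form can reach only the lowest and the highest 32 KiB of the address space.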
 | |
| .NH 4 
 | |
| Program Counter With Displacement
 | |
| .IP Syntax: 8
 | |
| @ d sub 16 ( PC ) @ for the MC68000, @ ( d sub 16 , PC ) @ for the MC68020 
 | |
| .PP
 | |
| This mode requires one word of extension. The address of the operand is the sum
 | |
| of the address in the program counter and the sign extended 16-bit displacement
 | |
| integer in the extension word. The value in the program counter is the
 | |
| address of the extension word.
 | |
| .NH 4 
 | |
| Program Counter With Index
 | |
| .IP Syntax: 8
 | |
| @ d sub 8 ( PC , X sub n .size ) @ for the MC68000, @ ( d sub 8 , PC,  X sub n .size ) @ for the MC68020 
 | |
| .PP
 | |
| This mode requires one word of extension as described under
 | |
| .I
 | |
| Address Register Indirect With Index.
 | |
| .R
 | |
| The address of the operand is the sum of the value in the
 | |
| program counter, the possibly sign extended index register and the sign
 | |
| extended 8-bit displacement integer in the extension word.
 | |
| The value in the program counter is the address of the extension word.
 | |
| .NH 4 
 | |
| Immediate Data
 | |
| .IP Syntax: 8
 | |
| @ "\#data" @
 | |
| .PP
 | |
| This addressing mode requires either one or two words of extension, depending
 | |
| on the size of the operation;
 | |
| .IP
 | |
| byte operation - the operand is in the low order byte of extension word;
 | |
| .IP
 | |
| word operation - the operand is in the extension word;
 | |
| .IP
 | |
| long operation - the operand is in the two extension words, the high order
 | |
| 16-bits are in the first extension word, the low order 16-bits in the second.
 | |
| .NH 3 
 | |
| Extra MC68020 addressing modes
 | |
| .PP
 | |
| The MC68020 has three more addressing modes. These modes all use a displacement
 | |
| (some even two), an address register and an index register. Instead of the
 | |
| address register one may also use the program counter. Any of these
 | |
| may be omitted. If all addends are omitted the processor creates an
 | |
| effective address of zero. All of these three modes require at least one
 | |
| extension word, the
 | |
| .I
 | |
| Full Format Extension Word,
 | |
| .R
 | |
| which specifies:
 | |
| .IP 1.
 | |
| the index register number (0-7);
 | |
| .IP 2.
 | |
| the index register type (address or data register);
 | |
| .IP 3.
 | |
| the size of the index (only low order part or the whole register);
 | |
| .IP 4.
 | |
| a scale factor. This is a number from 0 to 3 which specifies how many bits
 | |
| the contents of the index register are to be shifted to the left before being
 | |
| used as an index;
 | |
| .IP 5.
 | |
| a flag that specifies whether the base (address) register is to be added or
 | |
| to be suppressed;
 | |
| .IP 6.
 | |
| a flag that specifies whether to add or suppress the index operand;
 | |
| .IP 7.
 | |
| two bits that specify the size of the base displacement (null, word or long);
 | |
| .IP 8.
 | |
| three bits that in combination with (6) above specify which of the three
 | |
| addressing modes (described below) to use and, if used, the size of the
 | |
| outer displacement (null, word or long).
 | |
| .IP N.B.
 | |
| All modes mentioned above for the MC68000
 | |
| that use an index register may have this register
 | |
| scaled (only when using the MC68020).
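The fields listed above can be pulled out of a full format extension word with a few shifts and masks. The sketch below is in Python; the bit positions are not given in this document and are an assumption based on the layout in the MC68020 manual [5] (bit 15 register type, bits 14-12 register number, bit 11 index size, bits 10-9 scale, bit 7 base suppress, bit 6 index suppress, bits 5-4 base displacement size, bits 2-0 indirection selection). The function and key names are hypothetical.

```python
def decode_full_extension_word(word: int) -> dict:
    """Split a 16-bit full format extension word into its fields.

    The bit positions are assumed from the MC68020 manual layout.
    """
    bd_sizes = {1: "null", 2: "word", 3: "long"}
    return {
        "index_register": (word >> 12) & 0x7,
        "index_is_address_register": bool(word & 0x8000),
        "index_is_long": bool(word & 0x0800),
        # The scale field is a shift count; report it as a multiplier.
        "scale": 1 << ((word >> 9) & 0x3),
        "base_suppressed": bool(word & 0x0080),
        "index_suppressed": bool(word & 0x0040),
        "bd_size": bd_sizes.get((word >> 4) & 0x3, "reserved"),
        # Together with index_suppressed this selects the addressing
        # mode and the size of the outer displacement.
        "indirect_selection": word & 0x7,
    }
```

Under these assumptions the word 0x3D20 describes data register 3 used as a long index scaled by four, with a word-sized base displacement and no memory indirection.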
 | |
| .PP
 | |
| The three extra addressing modes are:
 | |
| .NH 4 
 | |
| Address Register Indirect With Index (Base Displacement)
 | |
| .IP Syntax: 8
 | |
| @ ( bd , A sub n , X sub n .size*scale ) @ (MC68020 only)
 | |
| .PP
 | |
| The address of the operand is the sum of the contents of the address register,
 | |
| the scaled contents of the possibly sign extended index register and the possibly
 | |
| sign extended base displacement. When the program counter is used instead
 | |
| of the address register, the value in the program counter is the address
 | |
| of the full format extension word. This mode requires one or two more extension
 | |
| words when the size of the base displacement is word or long respectively.
 | |
| .PP
 | |
| Note that without the index operand, this mode is an extension of the
 | |
| .I
 | |
| Address Register Indirect With Displacement
 | |
| .R
 | |
| mode; when using the MC68020 one is no longer limited to a 16-bit displacement.
 | |
| Also note that with the index operand added, this mode is an extension
 | |
| of the
 | |
| .I
 | |
| Address Register Indirect With Index
 | |
| .R
 | |
| mode; when using the MC68020 one is no longer limited to an 8-bit displacement.
 | |
| .NH 4 
 | |
| Memory Indirect Post-Indexed
 | |
| .IP Syntax: 8
 | |
| @ ( [ bd , A sub n ] , X sub n .size*scale , od ) @ (MC68020 only)
 | |
| .PP
 | |
| This mode may use an outer displacement. First an intermediate memory
 | |
| address is calculated by adding the contents of the address register and
 | |
| the possibly sign extended base displacement. This address is used
 | |
| for an indirect memory access of a long word, followed by adding
 | |
| the index operand (scaled and possibly sign extended). Finally the
 | |
| outer displacement is added to yield the address of the operand.
 | |
| When the program counter is used, the value in the program counter is the
 | |
| address of the full format extension word.
 | |
| .NH 4
 | |
| Memory Indirect Pre-Indexed
 | |
| .IP Syntax: 8
 | |
| @ ( [ bd , A sub n , X sub n .size*scale ] , od ) @ (MC68020 only)
 | |
| .PP
 | |
| This mode may use an outer displacement. First an intermediate memory
 | |
| address is calculated by adding the contents of the address register,
 | |
| the scaled contents of the possibly sign extended index register and
 | |
| the possibly sign extended base displacement. This address is used
 | |
| for an indirect memory access of a long word, followed by adding
 | |
| the outer displacement to yield the address of the operand.
 | |
| When the program counter is used, the value in the program counter is the
 | |
| address of the full format extension word.
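The difference between the post-indexed and pre-indexed modes is the point at which the index operand is added. The Python sketch below models memory as a dictionary from addresses to long words; this memory model and all names are assumptions made for illustration. Displacements and the index are taken as already sign extended integers, and the scale is a multiplier of 1, 2, 4 or 8.

```python
MASK32 = 0xFFFFFFFF


def read_long(memory: dict, address: int) -> int:
    """Fetch the long word stored at address in the simulated memory."""
    return memory[address & MASK32]


def post_indexed(memory, bd, an, index, scale, od):
    """([bd,An],Xn.size*scale,od): the index is added after the fetch."""
    intermediate = (bd + an) & MASK32
    return (read_long(memory, intermediate) + index * scale + od) & MASK32


def pre_indexed(memory, bd, an, index, scale, od):
    """([bd,An,Xn.size*scale],od): the index is added before the fetch."""
    intermediate = (bd + an + index * scale) & MASK32
    return (read_long(memory, intermediate) + od) & MASK32


# Two pointers stored eight bytes apart in the simulated memory.
memory = {0x1010: 0x2000, 0x1018: 0x3000}
post_address = post_indexed(memory, 0x10, 0x1000, 2, 4, 4)
pre_address = pre_indexed(memory, 0x10, 0x1000, 2, 4, 4)
```

With the same operands the post-indexed form fetches the pointer at 0x1010 and indexes from it, while the pre-indexed form uses the index to select the pointer at 0x1018.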
 | |
| .NH 3
 | |
| Addressing modes used in the table
 | |
| .PP
 | |
| Not all addressing modes mentioned above are used in code generation. It is
 | |
| clear that none of the modes that use the program counter PC can be used,
 | |
| since at code generation time nothing is known about the value in PC.
 | |
| Also some of the possibilities of the three MC68020 addressing modes are not
 | |
| used; e.g. it is possible to use a
 | |
| .I
 | |
| Data Register Indirect
 | |
| .R
 | |
| mode, which actually is the
 | |
| .I
 | |
| Address Register Indirect With Index
 | |
| .R
 | |
| mode, with the address register and the displacement left out. However 
 | |
| such a mode would require two extra bytes for the full format extension word,
 | |
| and it would also be much slower than using
 | |
| .I
 | |
| Address Register Indirect.
 | |
| .R
 | |
| For reasons of this kind several possible addressing modes are not used in the
 | |
| generation of code.
 | |
| In the table address registers are only used for holding addresses, and
 | |
| for index registers only data registers are used.
 | |
| .NH
 | |
| The MC68000 and MC68020 back end table
 | |
| .PP
 | |
| The table itself has to be run through the C preprocessor 
 | |
| before it can be used to generate
 | |
| the back end (called
 | |
| .I
 | |
| code generator
 | |
| .R
 | |
| or
 | |
| .I cg
 | |
| for short). When no flags are given to
 | |
| the preprocessor an MC68020 code generator is produced; for the MC68000
 | |
| code generator one has to run the table through the preprocessor using the
 | |
| .I -Dm68k4
 | |
| flag.
 | |
| .PP
 | |
| The table is designed as described in [4]. For the overall design of a back
 | |
| end table one is referred to this document. This section only deals
 | |
| with problems encountered in writing the table and other things worth noting.
 | |
| .NH 2 
 | |
| Constant Definitions
 | |
| .PP
 | |
| Wordsize and pointersize (EM_WSIZE and EM_PSIZE respectively) are defined
 | |
| as four (bytes). EM_BSIZE, the hole between AB (the parameter base) and
 | |
| LB (the local base), is eight bytes: only
 | |
| the return address and the localbase are saved.
 | |
| .NH 2 
 | |
| Properties
 | |
| .PP
 | |
| Since Hans van Staveren in his document [4] clearly states that
 | |
| .I cg
 | |
| execution time is negatively influenced by the number of properties, only
 | |
| four different properties have been defined. Besides, since the registers
 | |
| really are multifunctional, these four are really all that are needed.
 | |
| .NH 2 
 | |
| Registers
 | |
| .PP
 | |
| The table uses register variables: @ D sub 3 @ - @ D sub 7 @ are used as general register
 | |
| variables, and address registers @ A sub 2 @ - @ A sub 5 @ are used as pointer register
 | |
| variables. @ A sub 6 @ is reserved for the localbase.
 | |
| .NH 2 
 | |
| Tokens
 | |
| .PP
 | |
| At first glance one might wonder about the number of tokens, especially
 | |
| for the MC68020, considering the small number of different addressing modes.
 | |
| However, the last three addressing modes mentioned for the MC68020 may
 | |
| omit any of the addends, and this leads to a large number of different tokens.
 | |
| I did consider the possibility of enlarging the number of tokens and sets
 | |
| even further, because there might be assemblers that don't handle displacements
 | |
| of zero optimally (they might generate a 2 byte extension word holding zero).
 | |
| The small profit in bytes in the generated code
 | |
| however does not justify the increase
 | |
| in size of the token section, the set section and the patterns section,
 | |
| so this idea was not developed any further.
 | |
| .PP
 | |
| The timing cost of the tokens may be incorrect for some MC68000 tokens.
 | |
| This is because the MC68000 uses a 16-bit data bus, so fetching a 32-bit
 | |
| operand takes two separate memory accesses.
 | |
| .NH 3 
 | |
| Token names
 | |
| .PP
 | |
| The number of tokens and the limits of the author's imagination
 | |
| may have left some token names less than clear.
 | |
| Some information about the names is therefore in order here.
 | |
| .PP
 | |
| Whenever part of a token name is in capitals that part is memory indirected
 | |
| (i.e. in square brackets). In token names
 | |
| .I OFF
 | |
| and
 | |
| .I off
 | |
| mean an offset address register, so an address register with a displacement
 | |
| (either base displacement or outer displacement).
 | |
| .I
 | |
| IND, ind
 | |
| .R
 | |
| and
 | |
| .I index
 | |
| stand for indexed, or index register.
 | |
| .I ABS
 | |
| and
 | |
| .I abs
 | |
| stand for absolute, which actually is just a displacement (base or outer).
 | |
| These `rules' only apply to names of tokens that represent actual operands.
 | |
| There are also tokens that represent addresses of operands. These
 | |
| (with a few exceptions) contain
 | |
| .I
 | |
| regA, regX
 | |
| .R
 | |
| and
 | |
| .I con
 | |
| as parts of their names, which stand for address register, index register and
 | |
| displacement (always base displacement) respectively. If the address to which
 | |
| the token refers uses memory indirection, that part of the name comes first
 | |
| (in small letters), followed by an underscore. The memory indirection part
 | |
| follows the `rules' for operand token names.
 | |
| .PP
 | |
| Of course there are exceptions to these `rules' but in those cases the names
 | |
| are self explanatory.
 | |
| .PP
 | |
| Two special cases:
 | |
| .I ext_regX
 | |
| is the name of the token that represents the
 | |
| address of an absolute indexed operand, syntax @ ( bd , X sub n .size*scale ) @; 
 | |
| .I regX
 | |
| does not represent any real mode, but is used with EM array instructions and
 | |
| pointer arithmetic.
 | |
| .NH 3
 | |
| Special tokens for the MC68000
 | |
| .PP
 | |
| The MC68000 requires two extra tokens, which are called
 | |
| .I t_regAcon
 | |
| and
 | |
| .I
 | |
| t_regAregXcon.
 | |
| .R
 | |
| They are necessary because
 | |
| .I regAcon
 | |
| can only have a 16-bit displacement on the MC68000, and
 | |
| .I regAregXcon
 | |
| uses only 8 bits for its displacement. To prevent these addressing modes from
 | |
| being used with displacements that are too large, the extra tokens are needed.
 | |
| Whenever the displacements become too large and they need
 | |
| to be used in the generation
 | |
| of assembly code, these tokens are transformed into other tokens.
 | |
| To prevent the table from becoming too messy I defined
 | |
| .I t_regAcon
 | |
| and
 | |
| .I t_regAregXcon
 | |
| to be identical to
 | |
| .I regAcon
 | |
| and
 | |
| .I regAregXcon
 | |
| respectively for the MC68020.
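The size check that decides whether a displacement may stay in one of these tokens amounts to a signed range test. The Python sketch below illustrates that test only; it is not the condition as written in the table, and the function names are hypothetical.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a signed integer of `bits` bits."""
    limit = 1 << (bits - 1)
    return -limit <= value < limit


def regAcon_displacement_ok(displacement: int) -> bool:
    """On the MC68000 regAcon can hold only a 16-bit displacement."""
    return fits_signed(displacement, 16)


def regAregXcon_displacement_ok(displacement: int) -> bool:
    """On the MC68000 regAregXcon can hold only an 8-bit displacement."""
    return fits_signed(displacement, 8)
```

A displacement of 200, for example, passes the 16-bit test but fails the 8-bit test, so it could be used with an address register alone but not together with an index register.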
 | |
| .NH 2 
 | |
| Sets
 | |
| .PP
 | |
| Most set names used in the table are self explanatory, especially to the reader
 | |
| who is familiar with the four addressing categories as mentioned in [5]:
 | |
| .I
 | |
| data, memory, alterable
 | |
| .R
 | |
| and
 | |
| .I
 | |
| control.
 | |
| .R
 | |
| In the sets definition part some sets are defined that are not used elsewhere in
 | |
| the table, but are only used to be part of the definition of
 | |
| some other set. This keeps the
 | |
| set definition part from getting too unreadable.
 | |
| .PP
 | |
| The sets called
 | |
| .I imm_cmp
 | |
| consist of all tokens that can be used to compare with a constant.
 | |
| .NH 2 
 | |
| Instructions
 | |
| .PP
 | |
| Only the instructions that are used in code generation are listed here.
 | |
| The first few instructions are meant especially for the use with register
 | |
| variables. The operand LOCAL used here refers to a register variable.
 | |
| The reader should not conclude that these operations are also allowed on
 | |
| ordinary locals. The space and timing costs of these instructions have been
 | |
| adapted, but the use of the word LOCAL for register variables causes these costs
 | |
| to be inaccurate anyway.
 | |
| .PP
 | |
| The 
 | |
| .I killreg
 | |
| instruction, which generates a comment in the assembly language output and
 | |
| which is meant to let
 | |
| .I cg
 | |
| know that the data register operand has its contents destroyed,
 | |
| needs some explaining but this explanation is better in place
 | |
| in the discussion of groups 3 and 4 of the section about patterns.
 | |
| .PP
 | |
| The timing costs of the instructions are probably not very accurate for the
 | |
| MC68020 because the MC68020 uses an instruction cache and prefetch. The
 | |
| costs used in the table are the `worst case cost' figures mentioned in section 9
 | |
| of [5].
 | |
| .NH 2 
 | |
| Moves
 | |
| .PP
 | |
| These are all pretty straightforward, except perhaps when
 | |
| .I t_regAcon
 | |
| and
 | |
| .I t_regAregXcon
 | |
| are used. In these cases the size of the displacement has to be checked
 | |
| before moving. This also applies to the stacking rules and the coercions.
 | |
| .NH 2 
 | |
| Tests
 | |
| .PP
 | |
| These three tests (one for each operation size) could not be more
 | |
| straightforward than they are now.
 | |
| .NH 2 
 | |
| Stackingrules
 | |
| .PP
 | |
| The only peculiar stackingrule is the one for
 | |
| .I
 | |
| regX.
 | |
| .R
 | |
| This token is only used with EM array instructions and
 | |
| with pointer arithmetic. Whenever it is put
 | |
| on the fake stack, some EM instructions are left in the instruction stream
 | |
| to remove this token. Consequently it should never have to be stacked. However
 | |
| the
 | |
| .I
 | |
| code generator generator
 | |
| .R
 | |
| (or
 | |
| .I cgg
 | |
| for short)
 | |
| complained about not having a stackingrule for this token, so it had to
 | |
| be added nevertheless.
 | |
| .NH 2 
 | |
| Coercions
 | |
| .PP
 | |
| These are all straightforward. There are no splitting coercions since
 | |
| the fake stack never contains any tokens that can be split.
 | |
| There are only two unstacking coercions.
 | |
| The rest are all transforming coercions. Almost all coercions transform
 | |
| tokens into either a data register or an address register, except in the
 | |
| MC68000 part of the table the
 | |
| .I t_regAcon
 | |
| and
 | |
| .I t_regAregXcon
 | |
| tokens are transformed into real
 | |
| .I regAcon
 | |
| and
 | |
| .I regAregXcon
 | |
| tokens with displacements that are properly sized.
 | |
| .NH 2 
 | |
| Patterns
 | |
| .PP
 | |
| This is the largest part of the table. It is subdivided into 17 groups.
 | |
| We will take a closer look at the more interesting groups.
 | |
| .NH 3 
 | |
| Group 0: rules for register variables
 | |
| .PP
 | |
| This group makes sure that EM instructions using register variables are
 | |
| handled efficiently. This group includes: local loads and
 | |
| stores; arithmetic, shifts and logical operations on locals and indirect locals
 | |
| and pointer handling, where C expressions like
 | |
| .I
 | |
| *cp++
 | |
| .R
 | |
| are handled. For such an expression there are several EM instruction
 | |
| sequences the front end might generate. For an integer pointer e.g.:
 | |
| .DS
 | |
| .B
 | |
| lol lol adp stl loi $1==$2 && $1==$4 && $3==4 && $5==4
 | |
| .I
 | |
| .DE
 | |
| or
 | |
| .DS
 | |
| .B
 | |
| lol loi lol adp stl $1==$3 && $3==$5 && $2==4 && $4==4
 | |
| .I
 | |
| .DE
 | |
| or perhaps even
 | |
| .DS
 | |
| .B
 | |
| lil lol adp stl $1==$2 && $2==$4 && $3==4
 | |
| .I
 | |
| .DE
 | |
| Each of these is included, since which one is generated is up to the front
 | |
| end. If the front end is consistent this will mean that some of these patterns
 | |
| will never be used in code generation. This might seem a waste, but anyone
 | |
| who thinks that will certainly change his mind when his new C front end
 | |
| generates a different EM instruction sequence.
 | |
| .NH 3 
 | |
| Groups 1 and 2: load and store instructions
 | |
| .PP
 | |
| In these groups
 | |
| .B lof
 | |
| and
 | |
| .B stf
 | |
| ,
 | |
| .B loi
 | |
| and
 | |
| .B sti
 | |
| ,
 | |
| .B ldf
 | |
| and
 | |
| .B sdf
 | |
| are the important instructions.
 | |
| These are the large parts in these groups, especially the
 | |
| .B loi
 | |
| and
 | |
| .B sti
 | |
| instructions, because they come in three basic sizes (byte, word and long).
 | |
| Note that with these instructions in the MC68000 part the
 | |
| .I exact
 | |
| is omitted in front of
 | |
| .I regAcon
 | |
| and
 | |
| .I
 | |
| regAregXcon.
 | |
| .R
 | |
| This makes sure that
 | |
| .I t_regAcon
 | |
| and
 | |
| .I t_regAregXcon
 | |
| are transformed into proper tokens before they are used as addresses.
 | |
| .PP
 | |
| Also note that the
 | |
| .I regAregXcon
 | |
| token is completely left out from the
 | |
| \fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
 | |
| instruction handling. This is because the sum of the token displacement
 | |
| and the offset provided in the instruction cannot be checked and is likely
 | |
| to exceed 8 bits. Unfortunately 
 | |
| .I cgg
 | |
| does not allow the inspection of subregisters of tokens that are on the
 | |
| fake stack. This same problem might also occur with the
 | |
| .I regAcon
 | |
| token, but this is less likely because it
 | |
| uses 16-bit displacements. Besides, had it been left out, the
 | |
| \fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
 | |
| instructions would have been handled considerably less efficiently.
 | |
| .NH 3 
 | |
| Groups 3 and 4: integer and unsigned arithmetic
 | |
| .PP
 | |
| EM instruction
 | |
| .B sbi
 | |
| also works with address registers, because the 
 | |
| .B cmp
 | |
| instruction in group 12 is replaced by \fBsbi 4\fR.
 | |
| .PP
 | |
| For the MC68000 \fBmli\fR, \fBmlu\fR, \fBdvi\fR, \fBdvu\fR, \fBrmi\fR
 | |
| and \fBrmu\fR are handled
 | |
| by library routines. This is because the MC68000 has only 16-bit multiplications
 | |
| and divisions.
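To illustrate why a routine is needed, the Python sketch below builds the low 32 bits of a 32-bit product from 16-bit by 16-bit multiplications only. It shows the general technique and is not the code of the library routines supplied with the table; the function name is hypothetical.

```python
def mul32_from_16bit_parts(a: int, b: int) -> int:
    """Low 32 bits of a * b using only 16-bit by 16-bit multiplications."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    a_hi, a_lo = a >> 16, a & 0xFFFF
    b_hi, b_lo = b >> 16, b & 0xFFFF
    low = a_lo * b_lo
    # Only the low 16 bits of the cross terms reach the 32-bit result;
    # the product of the two high halves is shifted out entirely.
    cross = (a_hi * b_lo + a_lo * b_hi) & 0xFFFF
    return (low + (cross << 16)) & 0xFFFFFFFF
```

Three 16-bit multiplications are enough for the truncated product, because the product of the two high halves only affects bits 32 and up.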
 | |
| .PP
 | |
| The MC68020 does have 32-bit multiplications and divisions, but for the
 | |
| .B rmi
 | |
| and
 | |
| .B rmu
 | |
| EM instructions peculiar things happen anyway: they generate the
 | |
| .I killreg
 | |
| instruction. This is necessary because the data register that 
 | |
| first held the dividend now holds the quotient; the original contents are
 | |
| destroyed without
 | |
| .I cg
 | |
| knowing about it (the destruction of the two registers that make up the
 | |
| .I DREG_pair
 | |
| token couldn't be noted in the instructions part of the table).
 | |
| To let
 | |
| .I cg
 | |
| know that these contents are destroyed, we have to use this `pseudo instruction'
 | |
| for lack of a better solution.
 | |
| .NH 3 
 | |
| Group 5: floating point arithmetic
 | |
| .PP
 | |
| Since floating point arithmetic is not implemented, traps will be generated here.
 | |
| .NH 3 
 | |
| Group 6: pointer arithmetic
 | |
| .PP
 | |
| This also is a very important group, along with groups 1 and 2. The MC68020
 | |
| has many different addressing modes and if possible they should be used in
 | |
| the generation of assembly language.
 | |
| .PP
 | |
| The
 | |
| .I regX
 | |
| token is generated here too. It is meant to make efficient use of the
 | |
| MC68020 possibility of scaling index registers.
 | |
| .PP
 | |
| Note that I would have liked one extra pattern to handle C-statements
 | |
| like
 | |
| .DS
 | |
| .I
 | |
| pointer += expr ? constant1 : constant2;
 | |
| .R
 | |
| .DE
 | |
| efficiently. This pattern would have looked like:
 | |
| .DS
 | |
| pat ads
 | |
| with const
 | |
| leaving adp %1.num
 | |
| .DE
 | |
| but when
 | |
| .I cg
 | |
| is coming to the EM replacement part, the constant has already been removed
 | |
| from the fake stack, causing
 | |
| .I %1.num
 | |
| to have a wrong value.
 | |
| .NH 3 
 | |
| Group 9: logical instructions
 | |
| .PP
 | |
| The EM instructions \fBand\fR,
 | |
| .B ior
 | |
| and
 | |
| .B xor
 | |
| are so much alike that procedures can be used here, except for the
 | |
| .B
 | |
| xor $1==4
 | |
| .R
 | |
| instruction, because the MC68000
 | |
| .I eor
 | |
| instruction does not allow as many kinds of operands as
 | |
| .I and
 | |
| and
 | |
| .I
 | |
| or.
 | |
| .R
 | |
| .NH 3 
 | |
| Group 11: arrays
 | |
| .PP
 | |
| This group also tries to make efficient use of the available addressing modes,
 | |
| but it leaves the actual work to group 6 mentioned above.
 | |
| .PP
 | |
| The
 | |
| .I regX
 | |
| token is also generated here. In this group this token is very useful for
 | |
| handling array instructions for arrays with one, two, four or eight byte
 | |
| elements; the array index goes into the index register, which can then
 | |
| be scaled appropriately. An offset is used when the
 | |
| first array element has an index other than zero.
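The address calculation that scaled indexing makes possible can be sketched as follows in Python. The names are hypothetical, and the sketch assumes the offset is the lower bound times the element size, subtracted once from the array base.

```python
def scale_bits(element_size: int) -> int:
    """Shift count for the index register: 0 to 3 for sizes 1, 2, 4, 8."""
    return {1: 0, 2: 1, 4: 2, 8: 3}[element_size]


def element_address(base: int, index: int, lower_bound: int,
                    element_size: int) -> int:
    """Address of array[index] for an array whose first index is lower_bound."""
    # The constant offset compensates for a non-zero lower bound, so the
    # unmodified index can go straight into the scaled index register.
    offset = -lower_bound * element_size
    return base + offset + (index << scale_bits(element_size))
```

For an array of four byte elements at 0x1000 whose first index is 1, element 5 is found at 0x1010: the offset is -4 and the scaled index contributes 20.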
 | |
| .PP
 | |
| I would have liked some extra patterns here too but they won't work
 | |
| for the same reasons as explained in the discussion of group 6.
 | |
| .NH 3 
 | |
| Group 14: procedure call instructions
 | |
| .PP
 | |
| The function return area consists of registers @ D sub 0 @ and @ D sub 1 @.
 | |
| .NH 3 
 | |
| Group 15: miscellaneous instructions
 | |
| .PP
 | |
| In many cases here library routines are called. These will be discussed
 | |
| later.
 | |
| .PP
 | |
| Two special EM instructions are included here: \fBdch\fR, and \fBlpb\fR.
 | |
| I don't know when they are generated by a front end, but these
 | |
| instructions were also in the back end table for the PDP. In the PDP table
 | |
| these instructions were replaced by
 | |
| .B
 | |
| loi 4
 | |
| .R
 | |
| and
 | |
| .B
 | |
| adp 8
 | |
| .R
 | |
| respectively. I included them both, since they couldn't do any harm.
 | |
| .NH 3 
 | |
| Extra group: optimization
 | |
| .PP
 | |
| This group handles EM patterns with more than one instruction. This group
 | |
| is not absolutely necessary but it makes the generation of code
 | |
| more efficient. Among the things that are handled here are: arithmetic and
 | |
| logical operations on locals, externals and indirect locals; shifting
 | |
| of locals, externals and indirect locals by one; some pointer arithmetic; tests
 | |
| in combination with logical and's and or's or with branches. Finally
 | |
| there are sixteen patterns about divisions that could be handled more
 | |
| efficiently by right shifts and which I think should be handled by the
 | |
| peephole optimizer (since it also handles
 | |
| the same patterns with multiplication).
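.PP
The idea behind replacing a division by a right shift can be sketched in C for the unsigned case. The name udiv_pow2 is hypothetical, and the text does not say which operand types the sixteen patterns cover. For signed operands a plain arithmetic shift rounds toward minus infinity, so it agrees with a truncating division only for non-negative values; that case is deliberately left out of this sketch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: x / 2^k for unsigned x, done with a logical
 * right shift instead of a division. Requires 0 <= k <= 31.
 */
uint32_t udiv_pow2(uint32_t x, unsigned k)
{
    return x >> k;
}
```

For example, udiv_pow2(100, 3) yields the same value as 100 / 8.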
 | |
| .NH
 | |
| The library routines
 | |
| .PP
 | |
| The table is supplied with two separate libraries: one for the MC68000 and one
 | |
| for the MC68020. The MC68000 uses a couple more routines than the MC68020
 | |
| because it doesn't have 32-bit division and multiplication.
 | |
| .PP
 | |
| The routines that need to pop their operands first store their return address.
 | |
| Routines that need other registers besides @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ first store
 | |
| the original contents of those registers. @ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ do not have
 | |
| to be saved because if they contain anything useful, their contents
 | |
| are pushed on the stack before the routine is called.
 | |
| .PP
 | |
| The
 | |
| .I .trp
 | |
| routine just prints a message stating the trap number and exits (except
 | |
| of course when that particular trap number is masked). Usually higher
 | |
| level languages use their own trap handling routines.
 | |
| .PP
 | |
| The
 | |
| .I .mon
 | |
| routine doesn't do anything useful at all. It just prints a message stating that
 | |
| the specified system call is not implemented and then exits. Front ends
 | |
| usually generate calls to special routines rather than the EM
 | |
| instruction \fBmon\fR.
 | |
| These routines have to be supplied in another library. They
 | |
| may be system dependent (e.g. the MC68000 machine this table was tested on
 | |
| first moves the parameters to registers, then moves the system call number
 | |
| to @ D sub 0 @ and then executes
 | |
| .I
 | |
| trap #0,
 | |
| .R
 | |
| whereas the MC68020 machine this table was tested on required the parameters
 | |
| to be on the stack rather than in registers). Therefore this library is not
 | |
| discussed here.
 | |
| .PP
 | |
| The
 | |
| .I .printf
 | |
| routine is included for EM diagnostic messages. It can print strings using %s,
 | |
| 16-bit decimal numbers using %d and 32-bit hexadecimal numbers using %x.
 | |
| .PP
 | |
| The
 | |
| .I .strhp
 | |
| routine stores a new EM heap pointer, and sometimes it needs to allocate more
 | |
| heap space. This is done by calling the system call routine \fI_brk\fR.
 | |
| Chunks of 1K bytes are allocated, but this can easily be changed into
 | |
| larger or smaller chunks.
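.PP
A minimal sketch of the chunked growth described above, written in C. It only computes the new heap limit and does not call any system routine. The names CHUNK_SIZE and grown_heap_limit are hypothetical, and growing upward from the current limit in whole chunks is an assumption about how the routine rounds.

```c
#include <stdint.h>

#define CHUNK_SIZE 1024u   /* 1K bytes, as in the text; easy to change */

/*
 * Hypothetical helper: given the current heap limit and the requested
 * new heap pointer, return the limit after growing in whole chunks.
 * If the new heap pointer still fits, the limit is unchanged.
 * Overflow of the 32-bit address space is not handled in this sketch.
 */
uint32_t grown_heap_limit(uint32_t limit, uint32_t new_hp)
{
    if (new_hp <= limit)
        return limit;

    uint32_t shortfall = new_hp - limit;
    uint32_t chunks = (shortfall + CHUNK_SIZE - 1u) / CHUNK_SIZE;

    return limit + chunks * CHUNK_SIZE;
}
```

A real routine would pass the returned limit to the system call routine and trap if the request fails.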
 | |
| .PP
 | |
| The MC68000 library also contains a routine to handle the EM instruction \fBrck\fR.
 | |
| The MC68020 has an instruction
 | |
| .I cmp2
 | |
| that is specially meant for range checking so the MC68020 library can do without
 | |
| that routine.
 | |
| .PP
 | |
| The MC68000 library has two multiplication routines, one for unsigned and the other
 | |
| for signed multiplication. The one for unsigned multiplication
 | |
| first tests the sizes of the operands, to see if it can use
 | |
| the 16-bit machine instruction instead of the full routine. If not, it treats
 | |
| its two operands as two-digit numbers in a radix-65536 system. It
 | |
| uses the 16-bit unsigned multiply instruction
 | |
| .I mulu
 | |
| three times (it does not calculate the high order result),
 | |
| and adds up the intermediary results the proper way. The signed
 | |
| multiplication routine calculates the sign of the result, calculates
 | |
| the result as if it were an unsigned multiplication, and
 | |
| adjusts the sign of the result. Here testing
 | |
| the operands for their sizes would be less simple, because the operands
 | |
| are signed; so that is not done here.
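.PP
The multiplication scheme can be sketched in C as follows. This is an illustration of the technique, not the library source: the names mulu16, mul32_unsigned and mul32_signed are hypothetical, mulu16 stands in for the 16-bit multiply instruction, and the size test is placed in the unsigned routine here.

```c
#include <stdint.h>

/* Stand-in for a 16 x 16 -> 32 bit unsigned multiply instruction. */
static uint32_t mulu16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Low 32 bits of x * y, using at most three 16-bit multiplications. */
uint32_t mul32_unsigned(uint32_t x, uint32_t y)
{
    uint16_t xl = (uint16_t)(x & 0xFFFFu), xh = (uint16_t)(x >> 16);
    uint16_t yl = (uint16_t)(y & 0xFFFFu), yh = (uint16_t)(y >> 16);

    /* Both operands fit in 16 bits: one multiplication is enough. */
    if (xh == 0 && yh == 0)
        return mulu16(xl, yl);

    /*
     * (xh*R + xl) * (yh*R + yl) with R = 65536.
     * The xh*yh*R*R term lies entirely above bit 31 and is not computed.
     * The sum of the cross terms may wrap modulo 2^32; only its low
     * 16 bits survive the shift, so the wrap is harmless.
     */
    uint32_t cross = mulu16(xl, yh) + mulu16(xh, yl);

    return mulu16(xl, yl) + (cross << 16);
}

/* Signed variant: work out the sign, multiply magnitudes, fix the sign. */
int32_t mul32_signed(int32_t x, int32_t y)
{
    int negative = (x < 0) != (y < 0);
    uint32_t ux = x < 0 ? 0u - (uint32_t)x : (uint32_t)x;
    uint32_t uy = y < 0 ? 0u - (uint32_t)y : (uint32_t)y;
    uint32_t product = mul32_unsigned(ux, uy);

    if (negative)
        product = 0u - product;

    /* Conversion is implementation-defined for values above INT32_MAX;
       common compilers use two's complement wraparound. */
    return (int32_t)product;
}
```

Dropping the high-order term is what keeps the count at three multiplications instead of four.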
 | |
| .PP
 | |
| The MC68000 library also has two division routines. The routine for unsigned
 | |
| division uses the popular algorithm, where the dividend is shifted out and
 | |
| the quotient shifted in. The signed division routine calculates the sign of
 | |
| both the quotient and the remainder, calls the unsigned division routine
 | |
| and adjusts the signs for the quotient and the remainder.
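.PP
A shift-and-subtract division of the kind referred to above can be sketched in C as follows. The type and function names are hypothetical. The signed version assumes that the quotient truncates toward zero and that the remainder takes the sign of the dividend; the text does not spell out the convention. Division by zero is not checked here.

```c
#include <stdint.h>

typedef struct { uint32_t quot, rem; } udiv_result;
typedef struct { int32_t quot, rem; } sdiv_result;

/* Unsigned 32-bit division, one quotient bit per iteration. */
udiv_result udiv32(uint32_t dividend, uint32_t divisor)
{
    udiv_result r = { 0u, 0u };
    int i;

    for (i = 0; i < 32; i++) {
        /* Shift the next dividend bit out and into the remainder. */
        r.rem = (r.rem << 1) | (dividend >> 31);
        dividend <<= 1;

        /* Shift the next quotient bit in. */
        r.quot <<= 1;
        if (r.rem >= divisor) {
            r.rem -= divisor;
            r.quot |= 1u;
        }
    }
    return r;
}

/* Signed division built on the unsigned routine. */
sdiv_result sdiv32(int32_t dividend, int32_t divisor)
{
    int quot_negative = (dividend < 0) != (divisor < 0);
    int rem_negative = dividend < 0;
    uint32_t a = dividend < 0 ? 0u - (uint32_t)dividend : (uint32_t)dividend;
    uint32_t b = divisor < 0 ? 0u - (uint32_t)divisor : (uint32_t)divisor;
    udiv_result u = udiv32(a, b);
    sdiv_result s;

    /* Conversions rely on two's complement wraparound for negative results. */
    s.quot = (int32_t)(quot_negative ? 0u - u.quot : u.quot);
    s.rem = (int32_t)(rem_negative ? 0u - u.rem : u.rem);
    return s;
}
```

Under these assumptions sdiv32(-7, 2) gives quotient -3 and remainder -1, the same as the C operators / and % on most compilers.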
 | |
| .PP
 | |
| The
 | |
| .I .nop
 | |
| routine is included for testing purposes. This routine prints the line
 | |
| number and the value in the stack pointer. Calls to this routine
 | |
| are generated by the EM instruction \fBnop\fR, which is ordinarily
 | |
| left out by the peephole optimizer.
 | |
| .NH
 | |
| Testing the table
 | |
| .PP
 | |
| There are special test programs available for testing back end tables.
 | |
| First there is the EM test set, which tests most EM instructions, making
 | |
| good use of the
 | |
| .B nop
 | |
| instruction. Then there are the Pascal and C test programs. The Pascal
 | |
| test programs report errors, which makes it relatively easy
 | |
| to find out what was wrong in the table. The C test programs just
 | |
| generate some output, which then has to be compared to the expected
 | |
| output. Differences are
 | |
| not only caused by errors but also e.g. by the use of four
 | |
| byte integers and unsigneds (which this table does),
 | |
| the use of signed characters
 | |
| instead of unsigned characters (the C front end I used generated signed
 | |
| characters) or because the back end
 | |
| does not support floating point.
 | |
| These differences have to be `filtered out' to reveal
 | |
| the differences caused by actual errors in the back end table.
 | |
| These errors then have to be found out by examining the assembly code, for
 | |
| no proper diagnostic messages are generated.
 | |
| .PP
 | |
| After these three basic tests there still remain a number of patterns that
 | |
| haven't been tested yet. Fortunately
 | |
| .I cgg
 | |
| offers the possibility of generating a special
 | |
| .I cg
 | |
| that can print a list of patterns that haven't been used in
 | |
| code generation yet.
 | |
| For these patterns the table writer has to write his own test programs.
 | |
| This may complicate things a bit because errors may now be caused by
 | |
| errors in the back end table as well as errors in the test programs.
 | |
| The latter happened quite often to me, because I found EM
 | |
| to be an uncomfortable programming language (of course it isn't meant to
 | |
| be a programming language, but an intermediary language).
 | |
| .PP
 | |
| There still remain a couple of patterns in this table that haven't been tested
 | |
| yet. However these patterns all have very similar cases that have been
 | |
| tested (an example of this is mentioned in the section on group 0
 | |
| of the patterns section of the table). Some patterns have to
 | |
| do with floating point numbers. These EM instructions all generate
 | |
| traps, so they didn't all have to be tested. The two instructions
 | |
| .B dch
 | |
| and
 | |
| .B lpb
 | |
| haven't been tested in this table, but since they only use EM replacement
 | |
| and they have been tested in the PDP back end table, these two should
 | |
| be all right.
 | |
| .NH
 | |
| Performance of the back end
 | |
| .PP
 | |
| To test the performance of the back end I gathered a couple of
 | |
| C programs and compiled them on the machines I used to test the back ends on.
 | |
| I compiled them using the C compiler that was available there and
 | |
| I also compiled them using the back end. I then compared the sizes
 | |
| of the text segments in the object files.
 | |
| The final results of these comparisons are in fig. 1 and fig. 2.
 | |
| .KF
 | |
| .TS
 | |
| center box;
 | |
| cfI s s s s s
 | |
| c s s s s s
 | |
| c c | c s | c s
 | |
| c c | c s | c s
 | |
| c | c | c  c | c  c
 | |
| l | n | n  n | n  n.
 | |
| Differences in text segment sizes for the MC68000
 | |
| parts of the back end compiled by itself
 | |
| _
 | |
| original	 	old m68k4	new MC68000
 | |
| compiler	(100%)	back end	back end
 | |
| _
 | |
| name	size	size	perc.	size	perc.
 | |
| _
 | |
| codegen.c	13892	16224	116.7%	12860	92.5%
 | |
| compute.c	4340	4502	103.7%	4530	104.3%
 | |
| equiv.c	680	662	97.3%	598	87.9%
 | |
| fillem.c	8016	7304	91.1%	6880	85.8%
 | |
| gencode.c	1356	1194	88.0%	1130	83.3%
 | |
| glosym.c	224	202	90.1%	190	84.8%
 | |
| main.c	732	672	91.8%	634	86.6%
 | |
| move.c	1876	1526	81.3%	1410	75.1%
 | |
| nextem.c	1288	1594	123.7%	1192	92.5%
 | |
| reg.c	1076	1014	94.2%	916	85.1%
 | |
| regvar.c	1352	1188	87.8%	1150	85.0%
 | |
| salloc.c	1240	1100	88.7%	1024	82.5%
 | |
| state.c	628	600	95.5%	532	84.7%
 | |
| subr.c	6948	6382	91.8%	5680	81.7%
 | |
| =
 | |
| averages	2939	3155	95.8%	2766	86.6%
 | |
| .TE
 | |
| .DS C
 | |
| fig 1.
 | |
| .DE
 | |
| .KE
 | |
| .KF
 | |
| .TS
 | |
| center box;
 | |
| cfI s s s
 | |
| cfI s s s
 | |
| c s s s
 | |
| c s s s
 | |
| c c | c s
 | |
| c c | c s
 | |
| c | c | c  c
 | |
| l | n | n  n.
 | |
| Differences in text segment sizes
 | |
| for the MC68020
 | |
| parts of the back end
 | |
| compiled by itself
 | |
| _
 | |
| original	 	MC68020
 | |
| compiler	(100%)	back end
 | |
| _
 | |
| name	size	size	perc.
 | |
| _
 | |
| codegen.c	12608	12134	96.2%
 | |
| compute.c	4624	4416	95.5%
 | |
| equiv.c	572	504	88.1%
 | |
| fillem.c	7780	6976	89.6%
 | |
| gencode.c	1320	1086	82.2%
 | |
| glosym.c	228	182	79.8%
 | |
| main.c	736	596	80.9%
 | |
| move.c	1392	1280	91.9%
 | |
| nextem.c	1176	1066	90.6%
 | |
| reg.c	1052	836	79.4%
 | |
| regvar.c	1196	968	80.9%
 | |
| salloc.c	1200	932	77.6%
 | |
| state.c	580	528	91.0%
 | |
| subr.c	6136	5268	85.8%
 | |
| =
 | |
| averages	2900	2627	86.4%
 | |
| .TE
 | |
| .DS C
 | |
| fig 2.
 | |
| .DE
 | |
| .KE
 | |
| Fig. 1 also includes results of an old m68k4 back end (a back end
 | |
| for the MC68000 with four byte word and pointersize). The table for
 | |
| this back end was given to me as an example, but I thought it didn't make
 | |
| good use of the MC68000's addressing capabilities, it hardly did any
 | |
| optimization, and it sometimes even
 | |
| generated code that the assembler would not swallow.
 | |
| This was sufficient reason for me to write a completely new table.
 | |
| .PP
 | |
| The results in these tables should not be taken too seriously. The sizes measured
 | |
| are the sizes of the text segments of the user programs, i.e. without the
 | |
| inclusion of library routines. Of course these segments do contain calls
 | |
| to these routines. Another thing is that the
 | |
| .I rom
 | |
| segment may be included in the text segment (this is why the
 | |
| results for the MC68000 for
 | |
| .I compute.c
 | |
| look so bad).
 | |
| .PP
 | |
| Some other things must be said about these results.
 | |
| The quality of EM code
 | |
| generated by the C front end is certainly not optimal. The front end
 | |
| uses temporary locals (extra locals that are used to evaluate expressions)
 | |
| far too quickly: for a simple C expression like
 | |
| .DS
 | |
| .I
 | |
| *(pointer) += constant
 | |
| .R
 | |
| .DE
 | |
| where
 | |
| .I pointer
 | |
| is a register variable, the C front end generates (for obscure reasons)
 | |
| a temporary local that holds the contents of \fIpointer\fR. This way
 | |
| the pattern for
 | |
| .DS
 | |
| .B
 | |
| loc lil adi sil $2==$4 && $3==4
 | |
| .R
 | |
| .DE
 | |
| for register variables is not used and longer, less efficient
 | |
| code is generated. But even in spite of this, the back end seems to
 | |
| generate rather compact code.
 | |
| .NH
 | |
| Some timing results
 | |
| .PP
 | |
| In order to measure the performance of the code generated by the back end
 | |
| some timing tests were done. The reason I chose these particular tests is
 | |
| that they were also done for many other back ends; the reader can compare
 | |
| the results if he so wishes (of course comparing the results only
 | |
| shows a global difference in speed of the various machines; it doesn't
 | |
| show whether some back end generates relatively better code than another).
 | |
| .PP
 | |
| On the MC68000 machine the statements were executed one million times.
 | |
| On the MC68020 machine the statements had to be executed four million times
 | |
| because this machine was so fast that timing results would be very
 | |
| unreliable if the statements were executed only one million times.
 | |
| .PP
 | |
| For testing I used the following C test program:
 | |
| .DS
 | |
| .I
 | |
| main()
 | |
| {
 | |
|     int i, j, ...
 | |
|     ...
 | |
|     for (i=0; i<1000; i++)
 | |
|         for (j=0; j<1000; j++)
 | |
|     	    STATEMENT;
 | |
| }
 | |
| .R
 | |
| .DE
 | |
| where
 | |
| .I STATEMENT
 | |
| is any of the test statements or the empty statement. For the MC68020
 | |
| tests I used 2000 instead of 1000.
 | |
| The results of the test with the empty statement were used to calculate
 | |
| the execution times of the other test statements.
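.PP
The correction for loop overhead amounts to a subtraction and a division. A minimal C sketch follows, with a hypothetical function name; the run times used in the example afterwards are made up for illustration and are not measurements from this report.

```c
/*
 * Hypothetical helper: time per execution of STATEMENT in microseconds.
 * total_seconds  - run time of the loop containing the test statement
 * empty_seconds  - run time of the same loop with the empty statement
 * iterations     - number of times the statement was executed
 *                  (1000 * 1000 on the MC68000, 2000 * 2000 on the MC68020)
 */
double statement_time_us(double total_seconds, double empty_seconds,
                         long iterations)
{
    return (total_seconds - empty_seconds) * 1e6 / (double)iterations;
}
```

For instance, an invented run of 3.0 seconds against an empty loop of 1.0 second over one million iterations would give 2.0 microseconds per statement.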
 | |
| .PP
 | |
| Figures 3 and 4 show the results. For each machine two tests were
 | |
| done: one with register variables, and the other without them.
 | |
| I noticed that the original C compilers on both machines did not use
 | |
| register variables unless specifically requested. The
 | |
| back end uses register variables when and where they are profitable, even
 | |
| if the user did not ask for them.
 | |
| .KF
 | |
| .TS
 | |
| center box;
 | |
| cfI s s s s
 | |
| c s s s s
 | |
| c | c s | c s
 | |
| cw(1.5i) | c c | c c
 | |
| c | c c | c c
 | |
| lp-2fI | n n | n n.
 | |
| timing results for the MC68000
 | |
| times in @ mu @seconds
 | |
| _
 | |
| test statement	without register variables	with register variables
 | |
| _
 | |
|  	original	new MC68000	original	new MC68000
 | |
|  	C compiler	back end	C compiler	back end
 | |
| _
 | |
| int1=0;	2.8	2.7	0.5	0.5
 | |
| int1=int2-1;	4.1	4.1	1.3	1.3
 | |
| int1=int1+1;	4.1	4.1	1.3	1.3
 | |
| int1=int2*int3;	40.0	40.5	36.2	36.8
 | |
| T{
 | |
| int1=(int2<0);
 | |
| \/*true*/
 | |
| T}	5.5	7.3	2.0	4.5
 | |
| T{
 | |
| int1=(int2<0);
 | |
| \/*false*/
 | |
| T}	4.7	8.5	2.8	5.6
 | |
| T{
 | |
| int1=(int2<3);
 | |
| \/*true*/
 | |
| T}	6.2	7.7	2.6	5.4
 | |
| T{
 | |
| int1=(int2<3);
 | |
| \/*false*/
 | |
| T}	5.4	8.9	3.6	6.5
 | |
| T{
 | |
| .na
 | |
| int1=((int2>3)||(int2<3));
 | |
| \/* true || false */
 | |
| T}	6.0	7.8	3.4	5.4
 | |
| T{
 | |
| .na
 | |
| int1=((int2>3)||(int2<3));
 | |
| \/* false || true */
 | |
| T}	9.1	10.2	5.7	7.1
 | |
| T{
 | |
| .na
 | |
| switch (int1) {
 | |
| case 1: int1=0; break;
 | |
| case 2: int1=1; break;
 | |
| }
 | |
| T}	6.3	17.8	5.3	14.0
 | |
| T{
 | |
| .na
 | |
| if (int1=0) int2=3;
 | |
| \/*true*/
 | |
| T}	5.1	4.7	1.3	1.3
 | |
| T{
 | |
| .na
 | |
| if (int1=0) int2=3;
 | |
| \/*false*/
 | |
| T}	2.2	2.1	1.9	1.1
 | |
| while (int1>0) int1=int1-1;	2.2	2.1	1.1	1.1
 | |
| int1=a[int2];	6.8	6.7	4.0	3.1
 | |
| p3(int1);	14.3	11.1	13.4	10.0
 | |
| int1=f(int2);	17.7	14.5	14.8	11.7
 | |
| s.overhead=5400;	2.8	2.7	2.9	2.7
 | |
| .TE
 | |
| .DS C
 | |
| Fig. 3
 | |
| .DE
 | |
| .KE
 | |
| .KF
 | |
| .TS
 | |
| center box;
 | |
| cfI s s s s
 | |
| c s s s s
 | |
| c | c s | c s
 | |
| cw(1.5i) | c c | c c
 | |
| c | c c | c c
 | |
| lp-2fI | n n | n n.
 | |
| timing results for the MC68020
 | |
| times in @ mu @seconds
 | |
| _
 | |
| test statement	without register variables	with register variables
 | |
| _
 | |
|  	original	new MC68020	original	new MC68020
 | |
|  	C compiler	back end	C compiler	back end
 | |
| _
 | |
| int1=0;	.25	.25	.15	.15
 | |
| int1=int2-1;	1.3	1.3	.38	.38
 | |
| int1=int1+1;	1.2	.90	.38	.15
 | |
| int1=int2*int3;	4.4	4.2	3.0	3.1
 | |
| T{
 | |
| int1=(int2<0);
 | |
| \/*true*/
 | |
| T}	1.6	2.7	1.1	2.3
 | |
| T{
 | |
| int1=(int2<0);
 | |
| \/*false*/
 | |
| T}	1.9	2.9	.80	2.1
 | |
| T{
 | |
| int1=(int2<3);
 | |
| \/*true*/
 | |
| T}	1.7	2.8	1.2	2.6
 | |
| T{
 | |
| int1=(int2<3);
 | |
| \/*false*/
 | |
| T}	2.1	3.0	.85	2.3
 | |
| T{
 | |
| .na
 | |
| int1=((int2>3)||(int2<3));
 | |
| \/* true || false */
 | |
| T}	2.1	3.1	1.2	2.5
 | |
| T{
 | |
| .na
 | |
| int1=((int2>3)||(int2<3));
 | |
| \/* false || true */
 | |
| T}	3.4	4.2	1.8	3.2
 | |
| T{
 | |
| .na
 | |
| switch (int1) {
 | |
| case 1: int1=0; break;
 | |
| case 2: int1=1; break;
 | |
| }
 | |
| T}	2.7	8.0	2.0	6.9
 | |
| T{
 | |
| .na
 | |
| if (int1=0) int2=3;
 | |
| \/*true*/
 | |
| T}	1.2	1.3	.63	.63
 | |
| T{
 | |
| .na
 | |
| if (int1=0) int2=3;
 | |
| \/*false*/
 | |
| T}	1.7	1.6	.50	.53
 | |
| while (int1>0) int1=int1-1;	1.2	1.3	.55	.53
 | |
| int1=a[int2];	1.8	1.8	1.0	1.0
 | |
| p3(int1);	14.8	5.5	14.1	5.0
 | |
| int1=f(int2);	16.3	6.6	15.2	5.9
 | |
| s.overhead=5400;	.48	.48	.50	.50
 | |
| .TE
 | |
| .DS C
 | |
| Fig. 4
 | |
| .DE
 | |
| .KE
 | |
| .PP
 | |
| The reader may have noticed that on both machines the back end seems
 | |
| to generate considerably slower code for tests where a `condition' is
 | |
| used in the rhs of an assignment statement. This is in fact not true: it is
 | |
| the front end that generates bad code. Two examples: for the C statement
 | |
| .DS
 | |
| .I
 | |
| int1 = (int2 < 0);
 | |
| .R
 | |
| .DE
 | |
| the front end generates the following code for the rhs (I
 | |
| used arbitrary labels):
 | |
| .DS
 | |
| .B
 | |
| lol -16
 | |
| zlt *10
 | |
| loc 0
 | |
| bra *11
 | |
| 10
 | |
| loc 1
 | |
| 11
 | |
| .R
 | |
| .DE
 | |
| while in this case (in my opinion) it should have generated
 | |
| .DS
 | |
| .B
 | |
| lol -16
 | |
| tlt
 | |
| .R
 | |
| .DE
 | |
| which is much shorter. Another example: for the C statement
 | |
| .DS
 | |
| .I
 | |
| int1 = (int2 < 3);
 | |
| .R
 | |
| .DE
 | |
| the front end generates for the rhs
 | |
| .DS
 | |
| .B
 | |
| lol -16
 | |
| loc 3
 | |
| blt *10
 | |
| loc 0
 | |
| bra *11
 | |
| 10
 | |
| loc 1
 | |
| 11
 | |
| .R
 | |
| .DE
 | |
| while a much better translation would be
 | |
| .DS
 | |
| .B
 | |
| lol -16
 | |
| loc 3
 | |
| cmi 4
 | |
| tlt
 | |
| .R
 | |
| .DE
 | |
| .PP
 | |
| Another statement that the back end seems to generate slower code for is
 | |
| the C switch statement. This is true, but it is also caused by
 | |
| the way these things are done in EM. EM uses the
 | |
| .B csa
 | |
| or
 | |
| .B csb
 | |
| instruction, and for these two I had to use library routines. On larger
 | |
| switch statements the
 | |
| .I .csa
 | |
| routine will perform relatively better.
 | |
| .PP
 | |
| The back end generates considerably faster code for procedure and function
 | |
| calls, especially in the MC68020 case, and also for the C statement
 | |
| .DS
 | |
| .I
 | |
| int1 = int1 + 1;
 | |
| .R
 | |
| .DE
 | |
| The original C compilers use the same method for this instruction
 | |
| as for
 | |
| .DS
 | |
| .I
 | |
| int1 = int2 - 1;
 | |
| .R
 | |
| .DE
 | |
| they perform the addition in a scratch register, and then store the
 | |
| result. For the former C statement this is not necessary, because
 | |
| the MC68000 and MC68020 have an instruction that can add constants
 | |
| to almost anything (in this case: to locals). The MC68000 and MC68020
 | |
| back ends do use this instruction.
 | |
| .NH
 | |
| Some final remarks
 | |
| .PP
 | |
| As mentioned a few times before, the C front end compiler does not
 | |
| generate optimal code and as a consequence of this the
 | |
| back end does not always generate optimal code. This is especially
 | |
| the case with temporary locals, which the front end generates much
 | |
| too quickly, and also with conditional expressions that are
 | |
| used in the rhs of an assignment statement (fortunately this is not
 | |
| needed so much).
 | |
| .PP
 | |
| If
 | |
| .I cgg
 | |
| had been able to accept operands separated by any character
 | |
| instead of just by commas (in the instruction definitions part),
 | |
| I would not have needed the
 | |
| .I killreg
 | |
| pseudo instruction. It would also be handy to have
 | |
| .I cgg
 | |
| accept all normal C operators. At the moment
 | |
| .I cgg
 | |
| does not accept binary ands, ors and exors, even though in [4]
 | |
| it is stated that
 | |
| .I cgg
 | |
| does accept all normal C operators. As it happens I did not need the
 | |
| binary operators, but at some time in developing the table I thought
 | |
| I did.
 | |
| .PP
 | |
| I would also like
 | |
| .I cg
 | |
| to do more with the condition codes information that is supplied with
 | |
| each instruction in the instruction definitions section of the table.
 | |
| Sometimes
 | |
| .I cg
 | |
| generates test instructions which actually were not necessary. This
 | |
| of course causes the generated
 | |
| programs to be slightly larger and slightly slower.
 | |
| .PP
 | |
| In spite of the few minor shortcomings mentioned above I found
 | |
| .I cgg
 | |
| a very comfortable tool to use.
 | |
| .SH
 | |
| References
 | |
| .PP
 | |
| .IP [1]
 | |
| T. B. Steel Jr.,
 | |
| .I
 | |
| UNCOL: The Myth and the Fact,
 | |
| .R
 | |
| in Ann. Rev. Auto. Prog.,
 | |
| R. Goodman (ed.), Vol. 2 (1969), pp. 325-344
 | |
| .IP [2]
 | |
| A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
 | |
| .I
 | |
| A practical toolkit for making portable compilers,
 | |
| .R
 | |
| Informatica Report 74, Vrije Universiteit, Amsterdam, 1983
 | |
| .IP [3]
 | |
| A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
 | |
| .I
 | |
| Description of an experimental machine architecture for use with
 | |
| block structured languages,
 | |
| .R
 | |
| Informatica Report 81, Vrije Universiteit, Amsterdam, 1983
 | |
| .IP [4]
 | |
| H. van Staveren,
 | |
| .I
 | |
| The table driven code generator from the Amsterdam Compiler Kit,
 | |
| Second Revised Edition,
 | |
| .R
 | |
| Vrije Universiteit, Amsterdam
 | |
| .IP [5]
 | |
| .I
 | |
| MC68020 32-bit Microprocessor User's Manual,
 | |
| .R
 | |
| Second Edition,
 | |
| Motorola Inc., 1985, 1984
 | |
| .IP [6]
 | |
| .I
 | |
| MC68000 16-bit Microprocessor User's Manual,
 | |
| Preliminary,
 | |
| .R
 | |
| Motorola Inc., 1979
 |