.nr PS 11
.nr VS 13p
.EQ
delim @@
.EN
.EQ
gfont R
.EN
.ND
.RP
.TL
A back end table for the Motorola MC68000, MC68010 and MC68020 microprocessors
.AU
Frank Doodeman
.AB
A back end table is part of the Amsterdam Compiler Kit (ACK).
It is used to produce the actual back end, a program that translates
the intermediate language family EM to assembly language for some target machine.
The table discussed here can be used for two back ends, which together cover three machines:
one for the MC68000 and MC68010 (the difference between these two is so small
that one back end table can be used for either one), and one for the MC68020.
.AE
.NH
Introduction
.PP
To simplify the task of producing portable (cross) compilers and interpreters
the Vrije Universiteit designed an integrated collection of programs,
the Amsterdam Compiler Kit (ACK) [2].
It is based on the old UNCOL idea [1], which attempts to solve the problem
of how to make a compiler for each of @ N @ languages on @ M @ different machines
without having to write @ N times M @ programs.
.PP
The UNCOL approach is to write @ N @
.I
front ends,
.R
which translate the source language into a common intermediate language UNCOL
(Universal Computer Oriented Language), and @ M @
.I
back ends,
.R
each of which translates programs in UNCOL into a specific machine language.
Under these conditions only @ M + N @ programs must be written to provide
all @ N @ languages on all @ M @ machines, instead of @ M times N @ programs.
.PP
The intermediate language for the Amsterdam Compiler Kit is the machine language
for a simple stack machine called EM (Encoding Machine) [3].
So a back end for the MC68020 translates EM code into MC68020 assembly language.
Writing a back end table [4] suffices to get the back end.
.PP
The back end is a single program that is driven by a machine dependent driving table.
This table, the back end table, defines the mapping of EM code to
the MC68000, MC68010 or MC68020 assembly language.
.NH
The MC68000 and MC68020 microprocessors
.PP
In this document the name MC68000 will be used for both
the MC68000 and the MC68010 microprocessors,
because as far as the back end table is concerned
there is no difference between them.
For a complete and detailed description of the MC68020 one is
referred to [5]; for the MC68000 one might also use [6].
This section covers the parts that are relevant to the table.
.NH 2
Registers
.PP
Both the MC68000 and the MC68020 have eight 32-bit data registers
(@ D sub 0 @-@ D sub 7 @) that can be used for byte (8-bit),
word (16-bit) and long word (32-bit) data operations.
They also have seven 32-bit address registers (@ A sub 0 @-@ A sub 6 @)
that may be used as software stack pointers and base address registers;
address register @ A sub 7 @ is used as the system stack pointer.
Address registers may also be used for word and long word address operations.
.NH 2
Addressing modes
.PP
First the MC68000 addressing modes will be discussed.
Since the MC68020's set of addressing modes is an extension of the MC68000's set,
this section also applies to the MC68020.
.PP
In the description we use:
.IP
@ A sub n @ for address register;
.IP
@ D sub n @ for data register;
.IP
@ R sub n @ for address or data register;
.IP
@ X sub n @ for index register (either data or address register);
.IP
@ PC @ for program counter;
.IP
@ d sub 8 @ for 8 bit displacement integer;
.IP
@ d sub 16 @ for 16 bit displacement integer;
.IP
@ bd @ for base displacement (may be null, word or long);
.IP
@ od @ for outer displacement (may be null, word or long).
.NH 3
General addressing modes
.NH 4
Register Direct Addressing
.IP Syntax: 8
@ R sub n @
.PP
This addressing mode (it can be used with either a data register or an address register)
specifies that the operand is in one of the 16 multifunction registers.
.NH 4
Address Register Indirect
.IP Syntax: 8
@ ( A sub n ) @
.PP
The address of the operand is in the address register specified.
.NH 4
Address Register Indirect With Postincrement
.IP Syntax: 8
@ ( A sub n )+ @
.PP
The address of the operand is in the address register specified.
After the operand address is used, the address register is incremented
by one, two or four depending upon whether the size of the operand
is byte, word or long.
If the address register is the stack pointer and the operand size is byte,
the address register is incremented by two rather than one
to keep the stack pointer on a word boundary.
.NH 4
Address Register Indirect With Predecrement
.IP Syntax: 8
@ -( A sub n ) @
.PP
The address of the operand is in the address register specified.
Before the operand address is used, the address register is decremented
by one, two or four depending upon whether the size of the operand
is byte, word or long.
If the address register is the stack pointer and the operand size is byte,
the address register is decremented by two rather than one
to keep the stack pointer on a word boundary.
.NH 4
Address Register Indirect With Displacement
.IP Syntax: 8
@ d sub 16 ( A sub n ) @ for the MC68000,
@ ( d sub 16 , A sub n ) @ for the MC68020
.PP
This address mode requires one word of extension.
The address of the operand is the sum of the contents of the address register
and the sign extended 16-bit integer in the extension word.
.NH 4
Address Register Indirect With Index
.IP Syntax: 8
@ d sub 8 ( A sub n , X sub n .size) @ for the MC68000,
@ ( d sub 8 , A sub n , X sub n .size) @ for the MC68020
.PP
This address mode requires one word of extension according to a certain format,
which specifies
.IP 1.
which register to use as index register;
.IP 2.
a flag that indicates whether the index register is a data register
or an address register;
.IP 3.
a flag that indicates the index size; this is
.I word
when the low order part of the index register is to be used, and
.I long
when the whole long value in the register is to be used as index;
.IP 4.
an 8-bit displacement integer (the low order byte of the extension word).
.PP
The address of the operand is the sum of the contents of the address register,
the possibly sign extended contents of the index register
and the sign extended 8-bit displacement.
.NH 4
Absolute Data Addressing
.IP Syntax: 8
@ address @ for the MC68000,
@ ( address ) @ for the MC68020
.PP
Two different kinds of this mode are available:
.IP 1.
Absolute Short Address; this mode requires one word of extension.
The address of the operand is the sign extended 16-bit extension word.
.IP 2.
Absolute Long Address; this mode requires two words of extension.
The address of the operand is developed by concatenation of the two extension words;
the high order part of the address is the first extension word,
the low order part is the second.
.NH 4
Program Counter With Displacement
.IP Syntax: 8
@ d sub 16 ( PC ) @ for the MC68000,
@ ( d sub 16 , PC ) @ for the MC68020
.PP
This mode requires one word of extension.
The address of the operand is the sum of the address in the program counter
and the sign extended 16-bit displacement integer in the extension word.
The value in the program counter is the address of the extension word.
.NH 4
Program Counter With Index
.IP Syntax: 8
@ d sub 8 ( PC , X sub n .size ) @ for the MC68000,
@ ( d sub 8 , PC, X sub n .size ) @ for the MC68020
.PP
This mode requires one word of extension as described under
.I
Address Register Indirect With Index.
.R
The address of the operand is the sum of the value in the program counter,
the possibly sign extended index register and
the sign extended 8-bit displacement integer in the extension word.
The value in the program counter is the address of the extension word.
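.PP
The address calculation for the two indexed modes above can be sketched in C.
This is only an illustration, not part of the table or the libraries:
the names
.I brief_ext,
.I decode_brief
and
.I ea_indexed
are hypothetical, and the bit positions are assumed to follow the layout of the
extension word in the Motorola manuals [5,6]
(bit 15 register type, bits 14-12 register number, bit 11 index size, bits 7-0 displacement).
For the program counter variant the base would be the address of the extension word.

```c
#include <stdint.h>

/* Fields of the one-word (brief) extension word; names are illustrative. */
struct brief_ext {
    unsigned is_addr_reg;  /* 1 = index is an address register, 0 = data register */
    unsigned reg;          /* index register number, 0-7 */
    unsigned is_long;      /* 1 = use whole register, 0 = use low order word */
    uint32_t disp;         /* 8-bit displacement, already sign extended to 32 bits */
};

/* Split an extension word into its fields (assumed bit layout, see lead-in). */
static struct brief_ext decode_brief(uint16_t w)
{
    struct brief_ext e;
    e.is_addr_reg = (w >> 15) & 1u;
    e.reg         = (w >> 12) & 7u;
    e.is_long     = (w >> 11) & 1u;
    /* sign extend the low order byte without relying on signed casts */
    e.disp        = (((uint32_t)w & 0xFFu) ^ 0x80u) - 0x80u;
    return e;
}

/* Effective address: base register + index + displacement, modulo 2^32. */
static uint32_t ea_indexed(uint32_t base, uint32_t index_reg, struct brief_ext e)
{
    uint32_t index = index_reg;
    if (!e.is_long)  /* word index: sign extend the low order 16 bits */
        index = ((index_reg & 0xFFFFu) ^ 0x8000u) - 0x8000u;
    return base + index + e.disp;
}
```

.PP
With a long index the whole register value is added; with a word index only
the sign extended low order half contributes, so the upper half of the register is ignored.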
.NH 4
Immediate Data
.IP Syntax: 8
@ \#data @
.PP
This addressing mode requires either one or two words of extension,
depending on the size of the operation;
.IP
byte operation - the operand is in the low order byte of the extension word;
.IP
word operation - the operand is in the extension word;
.IP
long operation - the operand is in the two extension words,
the high order 16 bits are in the first extension word,
the low order 16 bits in the second.
.NH 3
Extra MC68020 addressing modes
.PP
The MC68020 has three more addressing modes.
These modes all use a displacement (some even two),
an address register and an index register.
Instead of the address register one may also use the program counter.
Any of these may be omitted.
If all addends are omitted the processor creates an effective address of zero.
All of these three modes require at least one extension word, the
.I
Full Format Extension Word,
.R
which specifies:
.IP 1.
the index register number (0-7);
.IP 2.
the index register type (address or data register);
.IP 3.
the size of the index (only low order part or the whole register);
.IP 4.
a scale factor.
This is a number from 0 to 3 which specifies how many bits
the contents of the index register are to be shifted to the left
before being used as an index;
.IP 5.
a flag that specifies whether the base (address) register
is to be added or to be suppressed;
.IP 6.
a flag that specifies whether to add or suppress the index operand;
.IP 7.
two bits that specify the size of the base displacement (null, word or long);
.IP 8.
three bits that in combination with (6) above specify which of the three
addressing modes (described below) to use and, if used,
the size of the outer displacement (null, word or long).
.IP N.B.
All modes mentioned above for the MC68000 that use an index register
may have this register scaled (only when using the MC68020).
.PP
The three extra addressing modes are:
.NH 4
Address Register Indirect With Index (Base Displacement)
.IP Syntax: 8
@ ( bd , A sub n , X sub n .size*scale ) @ (MC68020 only)
.PP
The address of the operand is the sum of the contents of the address register,
the scaled contents of the possibly sign extended index register and
the possibly sign extended base displacement.
When the program counter is used instead of the address register,
the value in the program counter is the address of the full format extension word.
This mode requires one or two more extension words when
the size of the base displacement is word or long respectively.
.PP
Note that without the index operand, this mode is an extension of the
.I
Address Register Indirect With Displacement
.R
mode; when using the MC68020 one is no longer limited to a 16-bit displacement.
Also note that with the index operand added, this mode is an extension of the
.I
Address Register Indirect With Index
.R
mode; when using the MC68020 one is no longer limited to an 8-bit displacement.
.NH 4
Memory Indirect Post-Indexed
.IP Syntax: 8
@ ( [ bd , A sub n ] , X sub n .size*scale , od ) @ (MC68020 only)
.PP
This mode may use an outer displacement.
First an intermediate memory address is calculated by adding the contents of
the address register and the possibly sign extended base displacement.
This address is used for an indirect memory access of a long word,
followed by adding the index operand (scaled and possibly sign extended).
Finally the outer displacement is added to yield the address of the operand.
When the program counter is used, the value in the program counter is the address
of the full format extension word.
.NH 4
Memory Indirect Pre-Indexed
.IP Syntax: 8
@ ( [ bd , A sub n , X sub n .size*scale ] , od ) @ (MC68020 only)
.PP
This mode may use an outer displacement.
First an intermediate memory address is calculated by adding the contents of
the address register, the scaled contents of the possibly sign extended
index register and the possibly sign extended base displacement.
This address is used for an indirect memory access of a long word,
followed by adding the outer displacement to yield the address of the operand.
When the program counter is used, the value in the program counter is the address
of the full format extension word.
.NH 3
Addressing modes used in the table
.PP
Not all addressing modes mentioned above are used in code generation.
It is clear that none of the modes that use the program counter PC can be used,
since at code generation time nothing is known about the value in PC.
Also some of the possibilities of the three MC68020 addressing modes are not used;
e.g. it is possible to use a
.I
Data Register Indirect
.R
mode, which actually is the
.I
Address Register Indirect With Index
.R
mode, with the address register and the displacement left out.
However such a mode would require two extra bytes for the full format extension word,
and it would also be much slower than using
.I
Address Register Indirect.
.R
For reasons of this kind several possible addressing modes are not used
in the generation of code.
In the table address registers are only used for holding addresses,
and for index registers only data registers are used.
.NH
The MC68000 and MC68020 back end table
.PP
The table itself has to be run through the C preprocessor before it can be used
to generate the back end (called
.I
code generator
.R
or
.I cg
for short).
When no flags are given to the preprocessor an MC68020 code generator is produced;
for the MC68000 code generator one has to run the table through the preprocessor
using the
.I -Dm68k4
flag.
.PP
The table is designed as described in [4].
For the overall design of a back end table one is referred to this document.
This section only deals with problems encountered in writing the table
and other things worth noting.
.NH 2
Constant Definitions
.PP
Wordsize and pointersize (EM_WSIZE and EM_PSIZE respectively)
are defined as four (bytes).
EM_BSIZE, the hole between AB (the parameter base) and LB (the local base),
is eight bytes: only the return address and the localbase are saved.
.NH 2
Properties
.PP
Since Hans van Staveren in his document [4] clearly states that
.I cg
execution time is negatively influenced by the number of properties,
only four different properties have been defined.
Besides, since the registers really are multifunctional,
these four are really all that are needed.
.NH 2
Registers
.PP
The table uses register variables: @ D sub 3 @ - @ D sub 7 @ are used as
general register variables, and address registers @ A sub 2 @ - @ A sub 5 @
are used as pointer register variables.
@ A sub 6 @ is reserved for the localbase.
.NH 2
Tokens
.PP
At first glance one might wonder about the number of tokens,
especially for the MC68020, considering the small number of different addressing modes.
However, the last three addressing modes mentioned for the MC68020 may omit
any of the addends, and this leads to a large number of different tokens.
I did consider the possibility of enlarging the number of tokens and sets even further,
because there might be assemblers that don't handle displacements of zero optimally
(they might generate a 2 byte extension word holding zero).
The small profit in bytes in the generated code however does not justify
the increase in size of the token section, the set section and the patterns section,
so this idea was not developed any further.
.PP
The timing cost of the tokens may be incorrect for some MC68000 tokens.
This is because the MC68000 uses a 16-bit data bus,
so two separate memory accesses are needed to fetch a 32-bit operand.
.NH 3
Token names
.PP
The number of tokens and the limited capability of the author's imagination
might have caused the names of some tokens not to be very enlightening.
Some information about the names may be in place here.
.PP
Whenever part of a token name is in capitals that part is memory indirected
(i.e. in square brackets).
In token names
.I OFF
and
.I off
mean an offsetted address register, so an address register with a displacement
(either base displacement or outer displacement).
.I
IND, ind
.R
and
.I index
stand for indexed, or index register.
.I ABS
and
.I abs
stand for absolute, which actually is just a displacement (base or outer).
These `rules' only apply to names of tokens that represent actual operands.
There are also tokens that represent addresses of operands.
These (with a few exceptions) contain
.I
regA, regX
.R
and
.I con
as parts of their names, which stand for address register, index register and
displacement (always base displacement) respectively.
If the address to which the token refers uses memory indirection,
that part of the name comes first (in small letters), followed by an underscore.
The memory indirection part follows the `rules' for operand token names.
.PP
Of course there are exceptions to these `rules' but in those cases
the names are self explanatory.
.PP
Two special cases:
.I ext_regX
is the name of the token that represents the address of an absolute indexed operand,
syntax @ ( bd , X sub n .size*scale ) @;
.I regX
does not represent any real mode, but is used with EM array instructions
and pointer arithmetic.
.NH 3
Special tokens for the MC68000
.PP
The MC68000 requires two extra tokens, which are called
.I t_regAcon
and
.I
t_regAregXcon.
.R
They are necessary because
.I regAcon
can only have a 16-bit displacement on the MC68000, and
.I regAregXcon
uses only 8 bits for its displacement.
To prevent these addressing modes from being used with displacements that are too large,
the extra tokens are needed.
Whenever the displacements become too large and they need to be used
in the generation of assembly code, these tokens are transformed into other tokens.
To prevent the table from becoming too messy I defined
.I t_regAcon
and
.I t_regAregXcon
to be identical to
.I regAcon
and
.I regAregXcon
respectively for the MC68020.
.NH 2
Sets
.PP
Most set names used in the table are self explanatory, especially to the reader
who is familiar with the four addressing categories as mentioned in [5]:
.I
data, memory, alterable
.R
and
.I
control.
.R
In the sets definition part some sets are defined that are not used elsewhere
in the table, but are only used to be part of the definition of some other set.
This keeps the set definition part from getting too unreadable.
.PP
The sets called
.I imm_cmp
consist of all tokens that can be used to compare with a constant.
.NH 2
Instructions
.PP
Only the instructions that are used in code generation are listed here.
The first few instructions are meant especially for the use with register variables.
The operand LOCAL used here refers to a register variable.
The reader should not conclude that these operations are also allowed on ordinary locals.
The space and timing costs of these instructions have been adapted,
but the use of the word LOCAL for register variables causes these costs
to be inaccurate anyway.
.PP
The
.I killreg
instruction, which generates a comment in the assembly language output
and which is meant to let
.I cg
know that the data register operand has its contents destroyed,
needs some explaining, but this explanation is better in place
in the discussion of groups 3 and 4 of the section about patterns.
.PP
The timing costs of the instructions are probably not very accurate for the MC68020
because the MC68020 uses an instruction cache and prefetch.
The costs used in the table are the `worst case cost' as mentioned in section 9 of [5].
.NH 2
Moves
.PP
These are all pretty straightforward, except perhaps when
.I t_regAcon
and
.I t_regAregXcon
are used.
In these cases the size of the displacement has to be checked before moving.
This also applies to the stacking rules and the coercions.
.NH 2
Tests
.PP
These three tests (one for each operation size) could not be more straightforward
than they are now.
.NH 2
Stackingrules
.PP
The only peculiar stackingrule is the one for
.I
regX.
.R
This token is only used with EM array instructions and with pointer arithmetic.
Whenever it is put on the fake stack, some EM instructions are left
in the instruction stream to remove this token.
Consequently it should never have to be stacked.
However the
.I
code generator generator
.R
(or
.I cgg
for short) complained about not having a stackingrule for this token,
so it had to be added nevertheless.
.NH 2
Coercions
.PP
These are all straightforward.
There are no splitting coercions since the fake stack never contains any tokens
that can be split.
There are only two unstacking coercions.
The rest are all transforming coercions.
Almost all coercions transform tokens into either a data register or an address register;
the exception is in the MC68000 part of the table, where the
.I t_regAcon
and
.I t_regAregXcon
tokens are transformed into real
.I regAcon
and
.I regAregXcon
tokens with displacements that are properly sized.
.NH 2
Patterns
.PP
This is the largest part of the table.
It is subdivided into 17 groups.
We will take a closer look at the more interesting groups.
.NH 3
Group 0: rules for register variables
.PP
This group makes sure that EM instructions using register variables are handled efficiently.
This group includes: local loads and stores; arithmetic, shifts and logical operations
on locals and indirect locals; and pointer handling, where C expressions like
.I
*cp++
.R
are handled.
For such an expression there are several EM instruction sequences
the front end might generate.
For an integer pointer e.g.:
.DS
.B
lol lol adp stl loi $1==$2 && $1==$4 && $3==4 && $5==4
.I
.DE
or
.DS
.B
lol loi lol adp stl $1==$3 && $3==$5 && $2==4 && $5==4
.I
.DE
or perhaps even
.DS
.B
lil lol adp stl $1==$2 && $2==$4 && $3==4
.I
.DE
Each of these is included, since which one is generated is up to the front end.
If the front end is consistent this will mean that some of these patterns
will never be used in code generation.
This might seem a waste, but anyone who thinks that will certainly change his mind
when his new C front end generates a different EM instruction sequence.
.NH 3
Groups 1 and 2: load and store instructions
.PP
In these groups
.B lof
and
.B stf ,
.B loi
and
.B sti ,
.B ldf
and
.B sdf
are the important instructions.
These are the large parts in these groups, especially the
.B loi
and
.B sti
instructions, because they come in three basic sizes (byte, word and long).
Note that with these instructions in the MC68000 part the
.I exact
is omitted in front of
.I regAcon
and
.I
regAregXcon.
.R
This makes sure that
.I t_regAcon
and
.I t_regAregXcon
are transformed into proper tokens before they are used as addresses.
.PP
Also note that the
.I regAregXcon
token is completely left out from the \fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
instruction handling.
This is because the sum of the token displacement and the offset provided
in the instruction cannot be checked and is likely to exceed 8 bits.
Unfortunately
.I cgg
does not allow the inspection of subregisters of tokens that are on the fake stack.
This same problem might also occur with the
.I regAcon
token, but this is less likely because it uses 16-bit displacements.
Besides, if it had been left out, the \fBlof\fR, \fBstf\fR, \fBldf\fR and \fBsdf\fR
instructions would have been handled considerably less efficiently.
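.PP
The range check that the paragraph above says cannot be expressed in the table
amounts to the following test, sketched here in C.
This is an illustration only; the function names are hypothetical and
nothing like this appears in the table itself.
It shows why the 8-bit limit is hit so much sooner than the 16-bit one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Does a displacement fit the sign extended 8-bit field of the indexed mode? */
static bool fits_disp8(int64_t disp)
{
    return disp >= -128 && disp <= 127;
}

/* Does a displacement fit the sign extended 16-bit field of the displacement mode? */
static bool fits_disp16(int64_t disp)
{
    return disp >= -32768 && disp <= 32767;
}

/*
 * Could an instruction offset be folded into an operand that already
 * carries token_disp?  The sum is formed in 64 bits so it cannot overflow.
 */
static bool can_fold_indexed(int32_t token_disp, int32_t offset)
{
    return fits_disp8((int64_t)token_disp + offset);
}

static bool can_fold_displacement(int32_t token_disp, int32_t offset)
{
    return fits_disp16((int64_t)token_disp + offset);
}
```

.PP
A structure field at offset 200, for instance, already fails the 8-bit test
with a token displacement of zero, while it passes the 16-bit test easily.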
.NH 3
Groups 3 and 4: integer and unsigned arithmetic
.PP
EM instruction
.B sbi
also works with address registers, because the
.B cmp
instruction in group 12 is replaced by \fBsbi 4\fR.
.PP
For the MC68000 \fBmli\fR, \fBmlu\fR, \fBdvi\fR, \fBdvu\fR, \fBrmi\fR and \fBrmu\fR
are handled by library routines.
This is because the MC68000 has only 16-bit multiplications and divisions.
.PP
The MC68020 does have 32-bit multiplications and divisions, but for the
.B rmi
and
.B rmu
EM instructions peculiar things happen anyway: they generate the
.I killreg
instruction.
This is necessary because the data register that first held the dividend
now holds the quotient; the original contents are destroyed without
.I cg
knowing about it (the destruction of the two registers that make up the
.I DREG_pair
token couldn't be noted in the instructions part of the table).
To let
.I cg
know that these contents are destroyed, we have to use this `pseudo instruction'
for lack of a better solution.
.NH 3
Group 5: floating point arithmetic
.PP
Since floating point arithmetic is not implemented, traps will be generated here.
.NH 3
Group 6: pointer arithmetic
.PP
This also is a very important group, along with groups 1 and 2.
The MC68020 has many different addressing modes and if possible
they should be used in the generation of assembly language.
.PP
The
.I regX
token is generated here too.
It is meant to make efficient use of the MC68020 possibility of scaling index registers.
.PP
Note that I would have liked one extra pattern to handle C-statements like
.DS
.I
pointer += expr ? constant1 : constant2;
.R
.DE
efficiently.
This pattern would have looked like:
.DS
pat ads
with const
leaving adp %1.num
.DE
but when
.I cg
comes to the EM replacement part, the constant has already been removed
from the fake stack, causing
.I %1.num
to have a wrong value.
.NH 3
Group 9: logical instructions
.PP
The EM instructions \fBand\fR,
.B ior
and
.B xor
are so much alike that procedures can be used here, except for the
.B
xor $1==4
.R
instruction, because the MC68000
.I eor
instruction does not allow as many kinds of operands as
.I and
and
.I
or.
.R
.NH 3
Group 11: arrays
.PP
This group also tries to make efficient use of the available addressing modes,
but it leaves the actual work to group 6 mentioned above.
.PP
The
.I regX
token is also generated here.
In this group this token is very useful for handling array instructions for arrays
with one, two, four or eight byte elements; the array index goes into the index register,
which can then be scaled appropriately.
An offset is used when the first array element has an index other than zero.
.PP
I would have liked some extra patterns here too but they won't work
for the same reasons as explained in the discussion of group 6.
.NH 3
Group 14: procedure call instructions
.PP
The function return area consists of registers @ D sub 0 @ and @ D sub 1 @.
.NH 3
Group 15: miscellaneous instructions
.PP
In many cases here library routines are called.
These will be discussed later.
.PP
Two special EM instructions are included here: \fBdch\fR and \fBlpb\fR.
I don't know when they are generated by a front end,
but these instructions were also in the back end table for the PDP.
In the PDP table these instructions were replaced by
.B
loi 4
.R
and
.B
adp 8
.R
respectively.
I included them both, since they couldn't do any harm.
.NH 3
Extra group: optimization
.PP
This group handles EM patterns with more than one instruction.
This group is not absolutely necessary but it makes the generation of code more efficient.
Among the things that are handled here are: arithmetic and logical operations
on locals, externals and indirect locals; shifting of locals, externals and
indirect locals by one; some pointer arithmetic; tests in combination with
logical and's and or's or with branches.
Finally there are sixteen patterns about divisions that could be handled
more efficiently by right shifts and which I think should be handled
by the peephole optimizer (since it also handles the same patterns with multiplication).
.NH
The library routines
.PP
The table is supplied with two separate libraries: one for the MC68000 and
one for the MC68020.
The MC68000 uses a couple more routines than the MC68020 because it doesn't have
32-bit division and multiplication.
.PP
The routines that need to pop their operands first store their return address.
Routines that need other registers besides @ D sub 0 @-@ D sub 2 @ and
@ A sub 0 @-@ A sub 1 @ first store the original contents of those registers.
@ D sub 0 @-@ D sub 2 @ and @ A sub 0 @-@ A sub 1 @ do not have to be saved
because if they contain anything useful, their contents are pushed on the stack
before the routine is called.
.PP
The
.I .trp
routine just prints a message stating the trap number and exits
(except of course when that particular trap number is masked).
Usually higher level languages use their own trap handling routines.
.PP
The
.I .mon
routine doesn't do anything useful at all.
It just prints a message stating that the specified system call is not implemented
and then exits.
Front ends usually generate calls to special routines rather than the
EM instruction \fBmon\fR.
These routines have to be supplied in another library.
They may be system dependent (e.g. the MC68000 machine this table was tested on
first moves the parameters to registers, then moves the system call number
to @ D sub 0 @ and then executes
.I
trap #0,
.R
whereas the MC68020 machine this table was tested on required the parameters
to be on the stack rather than in registers).
Therefore this library is not discussed here.
.PP
The
.I .printf
routine is included for EM diagnostic messages.
It can print strings using %s, 16-bit decimal numbers using %d and
32-bit hexadecimal numbers using %x.
.PP
The
.I .strhp
routine stores a new EM heap pointer, and sometimes it needs to allocate
more heap space.
This is done by calling the system call routine \fI_brk\fR.
Chunks of 1K bytes are allocated, but this can easily be changed into
larger or smaller chunks.
.PP
The MC68000 library also contains a routine to handle the EM instruction \fBrck\fR.
The MC68020 has an instruction
.I cmp2
that is specially meant for range checking so the MC68020 library can do
without that routine.
.PP
The MC68000 library has two multiplication routines, one for unsigned and
the other for signed multiplication.
The one for unsigned multiplication first tests the sizes of the operands,
to see if it can perform the 16 bit machine instruction instead of the routine.
If not, it considers its two operands as two digit numbers in a 65536-radix system.
It uses the 16-bit unsigned multiply instruction
.I mulu
three times (it does not calculate the high order result),
and adds up the intermediary results the proper way.
The signed multiplication routine calculates the sign of the result,
calculates the result as if it were an unsigned multiplication,
and adjusts the sign of the result.
Here testing the operands for their sizes would be less simple,
because the operands are signed; so that is not done here.
.PP
The MC68000 library also has two division routines.
The routine for unsigned division uses the popular algorithm,
where the divisor is shifted out and the quotient shifted in.
The signed division routine calculates the sign of both the quotient and the remainder,
calls the unsigned division routine and adjusts the signs for the quotient
and the remainder.
.PP
The
.I .nop
routine is included for testing purposes.
This routine prints the line number and the value in the stack pointer.
Calls to this routine are generated by the EM instruction \fBnop\fR,
which is ordinarily left out by the peephole optimizer.
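.PP
The multiplication scheme described in this section can be sketched in C as follows.
This is not the library code, which is written in assembly language;
the names
.I mulu16,
.I mlu32
and
.I mli32
are hypothetical, and
.I mulu16
merely stands in for the machine instruction.
Each 32-bit operand is split into two 16-bit digits, and the product of the
two high order digits is skipped because it cannot reach the low order 32 bits.

```c
#include <stdint.h>

/* Stand-in for the 16 x 16 -> 32 bit unsigned multiply instruction. */
static uint32_t mulu16(uint16_t a, uint16_t b)
{
    return (uint32_t)a * (uint32_t)b;
}

/* Unsigned 32 x 32 -> 32 bit multiply built from 16-bit multiplies. */
static uint32_t mlu32(uint32_t a, uint32_t b)
{
    uint16_t ah = (uint16_t)(a >> 16), al = (uint16_t)(a & 0xFFFFu);
    uint16_t bh = (uint16_t)(b >> 16), bl = (uint16_t)(b & 0xFFFFu);

    /* Both operands fit in 16 bits: a single multiply is enough. */
    if (ah == 0 && bh == 0)
        return mulu16(al, bl);

    /* Three partial products; ah * bh only affects bits 32 and up. */
    uint32_t low   = mulu16(al, bl);
    uint32_t cross = mulu16(ah, bl) + mulu16(al, bh);  /* wraps modulo 2^32 */
    return low + (cross << 16);
}

/* Signed multiply: fix the sign, multiply the magnitudes, adjust the result. */
static int32_t mli32(int32_t a, int32_t b)
{
    int negative = (a < 0) != (b < 0);
    uint32_t ua = a < 0 ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t ub = b < 0 ? 0u - (uint32_t)b : (uint32_t)b;
    uint32_t p  = mlu32(ua, ub);
    if (negative)
        p = 0u - p;
    return (int32_t)p;  /* assumes the usual two's complement conversion */
}
```

.PP
The sketch returns only the low order 32 bits of the product,
which matches the remark that the high order result is not calculated.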
.NH
Testing the table
.PP
There are special test programs available for testing back end tables.
First there is the EM test set, which tests most EM instructions,
making good use of the
.B nop
instruction.
Then there are the Pascal and C test programs.
The Pascal test programs report errors, which makes it relatively easy
to find out what was wrong in the table.
The C test programs just generate some output, which then has to be compared
to the expected output.
Differences are not only caused by errors but also e.g. by the use of
four byte integers and unsigneds (which this table does),
the use of signed characters instead of unsigned characters
(the C front end I used generated signed characters) or because the back end
does not support floating point.
These differences have to be `filtered out' to reveal the differences caused
by actual errors in the back end table.
These errors then have to be found by examining the assembly code,
for no proper diagnostic messages are generated.
.PP
After these three basic tests there still remain a number of patterns
that haven't been tested yet.
Fortunately
.I cgg
offers the possibility of generating a special
.I cg
that can print a list of patterns that haven't been used in code generation yet.
For these patterns the table writer has to write his own test programs.
This may complicate things a bit because errors may now be caused by errors
in the back end table as well as errors in the test programs.
The latter happened quite often to me, because I found EM to be an uncomfortable
programming language (of course it isn't meant to be a programming language,
but an intermediary language).
.PP
There still remain a couple of patterns in this table that haven't been tested yet.
However these patterns all have very similar cases that have been tested
(an example of this is mentioned in the section on group 0 of the patterns section
of the table).
Some patterns have to do with floating point numbers.
These EM instructions all generate traps, so they didn't all have to be tested.
The two instructions
.B dch
and
.B lpb
haven't been tested in this table, but since they only use EM replacement
and they have been tested in the PDP back end table, these two should be
all right.
.NH
Performance of the back end
.PP
To test the performance of the back end I gathered a couple of C programs
and compiled them on the machines I used to test the back ends on.
I compiled them using the C compiler that was available there and I also
compiled them using the back end.
I then compared the sizes of the text segments in the object files.
The final results of these comparisons are in fig. 1 and fig. 2.
.KF
.TS
center box;
cfI s s s s s
c s s s s s
c c | c s | c s
c c | c s | c s
c | c | c c | c c
l | n | n n | n n.
Differences in text segment sizes for the MC68000
parts of the back end compiled by itself
_
	original	old m68k4	new MC68000
	compiler (100%)	back end	back end
_
name	size	size	perc.	size	perc.
_
codegen.c	13892	16224	116.7%	12860	92.5%
compute.c	4340	4502	103.7%	4530	104.3%
equiv.c	680	662	97.3%	598	87.9%
fillem.c	8016	7304	91.1%	6880	85.8%
gencode.c	1356	1194	88.0%	1130	83.3%
glosym.c	224	202	90.1%	190	84.8%
main.c	732	672	91.8%	634	86.6%
move.c	1876	1526	81.3%	1410	75.1%
nextem.c	1288	1594	123.7%	1192	92.5%
reg.c	1076	1014	94.2%	916	85.1%
regvar.c	1352	1188	87.8%	1150	85.0%
salloc.c	1240	1100	88.7%	1024	82.5%
state.c	628	600	95.5%	532	84.7%
subr.c	6948	6382	91.8%	5680	81.7%
=
averages	2939	3155	95.8%	2766	86.6%
.TE
.DS C
fig 1.
.DE
.KE
.KF
.TS
center box;
cfI s s s
cfI s s s
c s s s
c s s s
c c | c s
c c | c s
c | c | c c
l | n | n n.
Differences in text segment sizes
for the MC68020
parts of the back end
compiled by itself
_
	original	MC68020
	compiler (100%)	back end
_
name	size	size	perc.
_
codegen.c	12608	12134	96.2%
compute.c	4624	4416	95.5%
equiv.c	572	504	88.1%
fillem.c	7780	6976	89.6%
gencode.c	1320	1086	82.2%
glosym.c	228	182	79.8%
main.c	736	596	80.9%
move.c	1392	1280	91.9%
nextem.c	1176	1066	90.6%
reg.c	1052	836	79.4%
regvar.c	1196	968	80.9%
salloc.c	1200	932	77.6%
state.c	580	528	91.0%
subr.c	6136	5268	85.8%
=
averages	2900	2627	86.4%
.TE
.DS C
fig 2.
.DE
.KE
Fig. 1 also includes results of an old m68k4 back end (a back end for the
MC68000 with four byte word and pointer size).
The table for this back end was given to me as an example, but I thought
it didn't make good use of the MC68000's addressing capabilities, it
hardly did any optimization, and it sometimes even generated code that the
assembler would not accept.
This was sufficient reason for me to write a completely new table.
.PP
The results in these tables should not be taken too seriously.
The sizes measured are the sizes of the text segments of the user programs,
i.e. without the inclusion of library routines.
Of course these segments do contain calls to these routines.
Another thing is that the
.I rom
segment may be included in the text segment (this is why the results for
the MC68000 for
.I compute.c
look so bad).
.PP
Some other things must be said about these results.
The quality of EM code generated by the C front end is certainly not optimal.
The front end uses temporary locals (extra locals that are used to
evaluate expressions) far too readily: for a simple C expression like
.DS
.I
*(pointer) += constant
.R
.DE
where
.I pointer
is a register variable, the C front end generates (for obscure reasons) a
temporary local that holds the contents of \fIpointer\fR.
This way the pattern for
.DS
.B
loc lil adi sil $2==$4 && $3==4
.R
.DE
for register variables is not used and longer, less efficient code is
generated.
But in spite of this, the back end seems to generate rather compact code.
.NH
Some timing results
.PP
In order to measure the performance of the code generated by the back end
some timing tests were done.
The reason I chose these particular tests is that they were also done for
many other back ends; the reader can compare the results if he so wishes
(of course comparing the results only shows a global difference in speed
of the various machines; it doesn't show whether some back end generates
relatively better code than another).
.PP
On the MC68000 machine the statements were executed one million times.
On the MC68020 machine the statements had to be executed four million
times because this machine was so fast that timing results would be very
unreliable if the statements were executed only one million times.
.PP
For testing I used the following C test program:
.DS
.I
main()
{
	int i, j, ...
	...
	for (i=0; i<1000; i++)
		for (j=0; j<1000; j++)
			STATEMENT;
}
.R
.DE
where
.I STATEMENT
is any of the test statements or the empty statement.
For the MC68020 tests I used 2000 instead of 1000.
The time measured for the empty statement was subtracted from the other
measurements to obtain the execution times of the test statements themselves.
.PP
Figures 3 and 4 show the results.
For each machine two tests were done: one with register variables, and the
other without them.
I noticed that the original C compilers on both machines did not use
register variables unless specifically requested.
The back end uses register variables when and where they are profitable,
even if the user did not ask for them.
.KF
.TS
center box;
cfI s s s s
c s s s s
c | c s | c s
cw(1.5i) | c c | c c
c | c c | c c
lp-2fI | n n | n n.
timing results for the MC68000
times in @ mu @seconds
_
test statement	without register variables	with register variables
_
	original	new MC68000	original	new MC68000
	C compiler	back end	C compiler	back end
_
int1=0;	2.8	2.7	0.5	0.5
int1=int2-1;	4.1	4.1	1.3	1.3
int1=int1+1;	4.1	4.1	1.3	1.3
int1=int2*int3;	40.0	40.5	36.2	36.8
T{
int1=(int2<0); \/*true*/
T}	5.5	7.3	2.0	4.5
T{
int1=(int2<0); \/*false*/
T}	4.7	8.5	2.8	5.6
T{
int1=(int2<3); \/*true*/
T}	6.2	7.7	2.6	5.4
T{
int1=(int2<3); \/*false*/
T}	5.4	8.9	3.6	6.5
T{
.na
int1=((int2>3)||(int2<3)); \/* true || false */
T}	6.0	7.8	3.4	5.4
T{
.na
int1=((int2>3)||(int2<3)); \/* false || true */
T}	9.1	10.2	5.7	7.1
T{
.na
switch (int1) { case 1: int1=0; break; case 2: int1=1; break; }
T}	6.3	17.8	5.3	14.0
T{
.na
if (int1=0) int2=3; \/*true*/
T}	5.1	4.7	1.3	1.3
T{
.na
if (int1=0) int2=3; \/*false*/
T}	2.2	2.1	1.9	1.1
while (int1>0) int1=int1-1;	2.2	2.1	1.1	1.1
int1=a[int2];	6.8	6.7	4.0	3.1
p3(int1);	14.3	11.1	13.4	10.0
int1=f(int2);	17.7	14.5	14.8	11.7
s.overhead=5400;	2.8	2.7	2.9	2.7
.TE
.DS C
Fig. 3
.DE
.KE
.KF
.TS
center box;
cfI s s s s
c s s s s
c | c s | c s
cw(1.5i) | c c | c c
c | c c | c c
lp-2fI | n n | n n.
timing results for the MC68020
times in @ mu @seconds
_
test statement	without register variables	with register variables
_
	original	new MC68020	original	new MC68020
	C compiler	back end	C compiler	back end
_
int1=0;	.25	.25	.15	.15
int1=int2-1;	1.3	1.3	.38	.38
int1=int1+1;	1.2	.90	.38	.15
int1=int2*int3;	4.4	4.2	3.0	3.1
T{
int1=(int2<0); \/*true*/
T}	1.6	2.7	1.1	2.3
T{
int1=(int2<0); \/*false*/
T}	1.9	2.9	.80	2.1
T{
int1=(int2<3); \/*true*/
T}	1.7	2.8	1.2	2.6
T{
int1=(int2<3); \/*false*/
T}	2.1	3.0	.85	2.3
T{
.na
int1=((int2>3)||(int2<3)); \/* true || false */
T}	2.1	3.1	1.2	2.5
T{
.na
int1=((int2>3)||(int2<3)); \/* false || true */
T}	3.4	4.2	1.8	3.2
T{
.na
switch (int1) { case 1: int1=0; break; case 2: int1=1; break; }
T}	2.7	8.0	2.0	6.9
T{
.na
if (int1=0) int2=3; \/*true*/
T}	1.2	1.3	.63	.63
T{
.na
if (int1=0) int2=3; \/*false*/
T}	1.7	1.6	.50	.53
while (int1>0) int1=int1-1;	1.2	1.3	.55	.53
int1=a[int2];	1.8	1.8	1.0	1.0
p3(int1);	14.8	5.5	14.1	5.0
int1=f(int2);	16.3	6.6	15.2	5.9
s.overhead=5400;	.48	.48	.50	.50
.TE
.DS C
Fig. 4
.DE
.KE
.PP
The reader may have noticed that on both machines the back end seems to
generate considerably slower code for tests where a `condition' is used in
the rhs of an assignment statement.
This is in fact not true: it is the front end that generates bad code.
Two examples: for the C statement
.DS
.I
int1 = (int2 < 0);
.R
.DE
the front end generates the following code for the rhs (I used arbitrary
labels):
.DS
.B
lol -16
zlt *10
loc 0
bra *11
10
loc 1
11
.R
.DE
while in this case (in my opinion) it should have generated
.DS
.B
lol -16
tlt
.R
.DE
which is much shorter.
Another example: for the C statement
.DS
.I
int1 = (int2 < 3);
.R
.DE
the front end generates for the rhs
.DS
.B
lol -16
loc 3
blt *10
loc 0
bra *11
10
loc 1
11
.R
.DE
while a much better translation would be
.DS
.B
lol -16
loc 3
cmi 4
tlt
.R
.DE
.PP
Another statement that the back end seems to generate slower code for is
the C switch statement.
This is true, but it is also caused by the way these things are done in EM.
EM uses the
.B csa
or
.B csb
instruction, and for these two I had to use library routines.
On larger switch statements the
.I .csa
routine will perform relatively better.
.PP
The back end generates considerably faster code for procedure and function
calls, especially in the MC68020 case, and also for the C statement
.DS
.I
int1 = int1 + 1;
.R
.DE
The original C compilers use the same method for this statement as for
.DS
.I
int1 = int2 - 1;
.R
.DE
that is, they perform the addition in a scratch register, and then store
the result.
For the former C statement this is not necessary, because the MC68000 and
MC68020 have an instruction that can add constants to almost anything (in
this case: to locals).
The MC68000 and MC68020 back ends do use this instruction.
.NH
Some final remarks
.PP
As mentioned a few times before, the C front end does not generate optimal
code and as a consequence of this the back end does not always generate
optimal code.
This is especially the case with temporary locals, which the front end
generates much too readily, and also with conditional expressions that are
used in the rhs of an assignment statement (fortunately this is not needed
so much).
.PP
If
.I cgg
had been able to accept operands separated by any character instead of
just by commas (in the instruction definitions part), I would not have
needed the
.I killreg
pseudo instruction.
It would also be handy to have
.I cgg
accept all normal C operators.
At the moment
.I cgg
does not accept binary ands, ors and exors, even though in [4] it is
stated that
.I cgg
does accept all normal C operators.
As it happens I did not need the binary operators, but at some time in
developing the table I thought I did.
.PP
I would also like
.I cg
to do more with the condition codes information that is supplied with each
instruction in the instruction definitions section of the table.
Sometimes
.I cg
generates test instructions that were not actually necessary.
This of course causes the generated programs to be slightly larger and
slightly slower.
.PP
In spite of the few minor shortcomings mentioned above I found
.I cgg
a very comfortable tool to use.
.SH
References
.PP
.IP [1]
T. B. Steel Jr.,
.I
UNCOL: The myth and the Fact,
.R
in Ann. Rev. Auto. Prog., R. Goodman (ed.), Vol. 2 (1969), pp. 325-344
.IP [2]
A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
.I
A practical toolkit for making portable compilers,
.R
Informatica Report 74, Vrije Universiteit, Amsterdam, 1983
.IP [3]
A. S. Tanenbaum, H. van Staveren, E. G. Keizer, J. W. Stevenson,
.I
Description of an experimental machine architecture for use with block
structured languages,
.R
Informatica Report 81, Vrije Universiteit, Amsterdam, 1983
.IP [4]
H. van Staveren,
.I
The table driven code generator from the Amsterdam Compiler Kit,
Second Revised Edition,
.R
Vrije Universiteit, Amsterdam
.IP [5]
.I
MC68020 32-bit Microprocessor User's Manual,
.R
Second Edition, Motorola Inc., 1985, 1984
.IP [6]
.I
MC68000 16-bit Microprocessor User's Manual, Preliminary,
.R
Motorola Inc., 1979