.\" $Id$
.RP
.ND Dec 1984
.TL
.B
A backend table for the 6500 microprocessor
.R
.AU
Jan van Dalen
.AB
The backend table is part of the Amsterdam Compiler Kit (ACK).
It translates the intermediate language family EM to machine
code for the MCS6500 microprocessor family.
.AE
.bp
.DS C
.B
THE MCS6500 MICROPROCESSOR.
.R
.DE
.NH 0
Introduction
.PP
Why a back end table for the MCS6500 microprocessor family?
Although the MCS6500 microprocessor family has a simple
instruction set and internal structure, it is used in a variety
of microcomputers and home computers.
This is because of its low cost.
As an example, the Apple II, a well-known and widespread microcomputer,
uses the MCS6502 CPU.
The BBC home computer, whose popularity is growing day by day,
also uses the MCS6502 CPU,
although better and more powerful microprocessors are available.
The designers at Acorn Computer Industries probably chose
the MCS6502 because of the amount of software available for this CPU.
Because of its widespread use, a variety of software will be needed for it.
One can think of games, administration programs,
teaching programs, BASIC interpreters and other application programs.
Even though it will not be possible to run the complete compiler kit
on an MCS6500-based computer, it is possible to write application
programs in a high-level language, such as Pascal or C, on a minicomputer.
These application programs can be tested and compiled on that
minicomputer and put in a ROM (Read Only Memory), for example,
so that they can be executed by an MCS6500 CPU.
The strategy of writing test programs on a minicomputer,
compiling them and then executing them on an MCS6500-based
microcomputer was used in the development of the back end.
The minicomputer used is an M68000-based one, manufactured
by Bleasdale Computer Systems Ltd.
The micro- or home computer used is a BBC microcomputer,
manufactured by Acorn Computer Ltd.
.NH
The MOS Technology MCS6500
.PP
The MCS6500 is a family of CPU devices developed by
MOS Technology [1].
The members of the MCS6500 family are the same chip in
different packages.
The MCS6502, the big brother in the family, can address 64K
bytes of memory, while for example the MCS6504 can address
only 8K bytes.
This difference is due to the fact that the MCS6502 comes in a
40-pin package and the MCS6504 in a 28-pin package,
so fewer address lines are available.
.bp
.NH
The MCS6500 CPU programmable registers
.PP
The MCS6500 series is based on the same chip, so all members have the same
programmable registers.
.sp 9
.NH 2
The accumulator A.
.PP
The accumulator A is the only register on which the arithmetic
and logical instructions can be used.
For example, the instruction ADC (add with carry) adds the contents
of the accumulator A and a byte from memory or immediate data.
.NH 2
The index register X.
.PP
As the name suggests, this register can be used for some
indirect addressing modes.
The modes are explained below.
.NH 2
The index register Y.
.PP
This register is, just like the index register X, used for
certain indirect addressing modes.
These addressing modes are different from the modes which
use index register X.
.NH 2
The program counter PC
.PP
This is the only 16-bit register available.
It is used to point to the next instruction to be
carried out.
.NH 2
The stack pointer SP
.PP
The stack pointer is an 8-bit register, so the stack can contain
at most 256 bytes.
The CPU always uses 00000001 as the high byte of any stack address,
which means that memory locations
.B
0100
.R
through
.B
01FF
.R
are permanently assigned to the stack.
.sp 12
.NH 2
The status register
.PP
The status register maintains six status flags and a master
interrupt control bit.
.br
These are the six status flags:
.br
Carry (c)
.br
Zero (z)
.br
Overflow (o)
.br
Sign (n)
.br
Decimal mode (d)
.br
Break (b)
.br
The bit (i) is the master interrupt control bit.
.NH
The MCS6500 memory layout.
.PP
In the MCS6500 memory space three areas have a special meaning.
These areas are:
.IP 1)
Top page.
.IP 2)
Zero page.
.IP 3)
The stack.
.PP
MCS6500 memory is divided into pages.
Each page consists of 256 bytes.
So in a memory address the high byte denotes the page number
and the low byte the offset within the page.
.NH 2
Top page.
.PP
When an MCS6500 is restarted it jumps indirectly via memory address
.B
FFFC.
.R
At
.B
FFFC
.R
(low byte) and
.B
FFFD
.R
(high byte) there must be the address of the bootstrap subroutine.
When a break instruction (BRK) occurs or an interrupt takes place,
the MCS6500 jumps indirectly through memory address
.B
FFFE.
.R
.B
FFFE
.R
and
.B
FFFF
.R
thus must contain the address of the interrupt routine.
This holds only for maskable interrupts.
There also exists a nonmaskable interrupt.
This causes the MCS6500 to jump indirectly through memory address
.B
FFFA.
.R
So the top six bytes of memory are used by the operating system
and are therefore not available for the back end.
.NH 2
Zero page.
.PP
This page has a special meaning in the sense that addressing
this page uses special opcodes.
Since a page consists of 256 bytes, only one byte is needed
for addressing zero page.
So an instruction which uses zero page occupies two bytes.
It also uses fewer clock cycles while carrying out the instruction.
Zero page is also needed when indirect addressing is used.
This means that when indirect addressing is used, the address must
reside in zero page (two consecutive bytes).
In this case (the back end), zero page is used, for example, to
hold the local base, the second local base, the stack pointer,
etc.
.NH 2
The stack.
.PP
The stack is described in paragraph 3.5 about the MCS6500
programmable registers.
.NH
The memory addressing modes
.PP
MCS6500 memory reference instructions use direct addressing,
indexed addressing, and indirect addressing.
.NH 2
Direct addressing.
.PP
Three-byte instructions use the second and third bytes of the
object code to provide a direct 16-bit address:
therefore, 65,536 bytes of memory can be addressed directly.
The commonly used memory reference instructions also have a two-byte
object code variation, where the second byte directly addresses
one of the first 256 bytes.
.NH 2
Base page, indexed addressing.
.PP
In this case, the instruction has two bytes of object code.
The contents of either the X or Y index register are added to the
second object code byte in order to compute a memory address.
This may be illustrated as follows:
.sp 15
Base page, indexed addressing, as illustrated above, is
wraparound, which means that there is no carry.
If the sum of the index register and second object code byte contents
is more than
.B
FF
.R
, the carry bit will be discarded.
This may be illustrated as follows:
.sp 9
.NH 2
Absolute indexed addressing.
.PP
In this case, the contents of either the X or Y register are added
to a 16-bit direct address provided by the second and third bytes
of an instruction's object code.
This may be illustrated as follows:
.sp 10
.NH 2
Indirect addressing.
.PP
Instructions that use simple indirect addressing have three bytes of
object code.
The second and third object code bytes provide a 16-bit address;
therefore, the indirect address can be located anywhere in
memory.
This is straightforward indirect addressing.
.NH 3
Pre-indexed indirect addressing.
.PP
In this case, the object code consists of two bytes and the
second object code byte provides an 8-bit address.
Instructions that use pre-indexed indirect addressing add the
contents of the X index register and the second object code byte
to access a memory location in the first 256 bytes of memory, where
the indirect address will be found:
.sp 18
When using pre-indexed indirect addressing, once again wraparound
addition is used, which means that when the X index register contents
are added to the second object code byte, any carry will be discarded.
Note that only the X index register can be used with pre-indexed
addressing.
.NH 3
Post-indexed indirect addressing.
.PP
In this case, the object code consists of two bytes and the
second object code byte provides an 8-bit address.
Now the second object code byte identifies a location
in the first 256 bytes of memory where an indirect address
will be found.
The contents of the Y index register are added to this indirect
address.
This may be illustrated as follows:
.sp 18
Note that only the Y index register can be used with post-indexed
indirect addressing.
.bp
.NH
What the CPU has and does not have.
.PP
Although the designers of the MCS6500 CPU family state that
there is nothing very significant about the short stack (only
256 bytes), this stack caused problems for the back end.
The designers say that a 256-byte stack is usually sufficient
for any typical microcomputer, but this is only true if the stack is
used only for return addresses of the JSR (jump to subroutine)
instruction.
But since the EM machine is supposed to be a stack machine and
high-level languages need parameters and locals in their
procedures and functions, this short stack is insufficient.
So a software stack is implemented in this back end, requiring
two additional subroutines for stack handling.
These two stack handling subroutines slow down the processing time
of a program, since the stack is used heavily.
.PP
Since parameters and locals of EM procedures are addressed at an
offset from the local base of that procedure, indirect addressing
is heavily used.
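.PP
Because so much of the generated code depends on these modes,
it may help to see how the two indexed indirect modes described
above form an effective address.
The following C fragment is only an illustrative model:
the array mem and all function names are hypothetical
and do not occur in the back end.
.DS L
```c
#include <stdint.h>

/* Hypothetical model of the 64K address space (assumption). */
uint8_t mem[65536];

/* Fetch the 16-bit pointer stored at zero page location zp,
   low byte first. The fetch of the high byte wraps within zero page. */
uint16_t zp_pointer(uint8_t zp)
{
    uint8_t lo = mem[zp];
    uint8_t hi = mem[(uint8_t)(zp + 1)];
    return (uint16_t)(lo | (hi << 8));
}

/* Pre-indexed indirect: X is added to the zero page address first,
   and any carry out of the 8-bit sum is discarded. */
uint16_t ea_pre_indexed(uint8_t zp, uint8_t x)
{
    return zp_pointer((uint8_t)(zp + x));
}

/* Post-indexed indirect: the pointer is fetched first and Y is then
   added to it as an unsigned 8-bit offset. */
uint16_t ea_post_indexed(uint8_t zp, uint8_t y)
{
    return (uint16_t)(zp_pointer(zp) + y);
}
```
.DE
In this model ea_post_indexed can only reach the 256 bytes that start
at the stored pointer, because Y is treated as an unsigned quantity.
.PP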
Offsets are positive (for parameters) and negative (for local variables).
As explained under the addressing modes, the MCS6500 has a
post-indexed indirect addressing mode.
This addressing mode can only handle positive offsets.
This raises a problem for accessing the local variables,
for which I have chosen the following solution.
A second local base is introduced.
This second local base is the real local base minus
a constant BASE.
In the present state of the back end the value of BASE
is 240.
This means that there are 240 bytes reserved for local variables that
can be addressed indirectly, and 14 bytes for the parameters.
.DS C
.B
THE CODE GENERATOR.
.R
.DE
.NH 0
Description of the machine table.
.PP
The machine description table consists of the following sections:
.IP 1.
The macro definitions.
.IP 2.
Constant definitions.
.IP 3.
Register definitions.
.IP 4.
Token definitions.
.IP 5.
Token expressions.
.IP 6.
Code rules.
.IP 7.
Move definitions.
.IP 8.
Test definitions.
.IP 9.
Stack definitions.
.NH 2
Macro definitions.
.PP
The macro definitions at the top of the table are expanded
by the preprocessor wherever they occur in the rest of the table.
.NH 2
Constant definitions.
.PP
There are three constants which must be defined first.
They are:
.IP EM_WSIZE: 11
Number of bytes in a machine word.
This is the number of bytes a simple
.B
loc
.R
instruction will put on the stack.
.IP EM_PSIZE:
Number of bytes in a pointer.
This is the number of bytes a
.B
lal
.R
instruction will put on the stack.
.IP EM_BSIZE:
Number of bytes in the hole between AB and LB.
The calling sequence only saves LB on the stack, so this constant
is equal to the pointer size.
.NH 2
Register definitions.
.PP
The only important register definition is the definition of
the registerpair AX.
Since the rest of the machine's registers Y, PC, ST serve special
purposes, the code generator cannot use them.
.NH 2
Token definitions
.PP
There is a fake token.
This token is put in the table, since the code generator generator
complains if it cannot find one.
.NH 2
Token expression definitions.
.PP
The token expression is also a fake one.
This token expression is put in the table, since the code generator
generator complains if it cannot find one.
.NH 2
Code rules.
.PP
The code rule section is the largest section in the table.
The rules specify EM patterns, stack patterns, code to be generated,
etc.
The syntax is:
.IP "code rule:"
EM pattern '|' stack pattern '|' code '|'
stack replacement '|' EM replacement '|'
.PP
All patterns are optional; however, there must be at least one
pattern present.
If the EM pattern is missing, the rule becomes a rewriting
rule or a
.B
coercion
.R
to be used when code generation cannot continue
because of an invalid stack pattern.
The code rules are preceded by the word CODE:.
.NH 3
The EM pattern.
.PP
The EM pattern consists of a list of EM mnemonics
followed by a boolean expression.
Examples:
.sp 1
.br
.B
loe
.R
.sp 1
will match a single
.B
loe
.R
instruction,
.sp 1
.br
.B
loc loc cif
.R
$1==2 && $2==8
.sp 1
is a pattern that will match
.sp 1
.br
.B
loc
.R
2
.br
.B
loc
.R
8
.br
.B
cif
.R
.sp 1
and
.sp 1
.br
.B
lol inc stl
.R
$1==$3
.sp 1
will match for example
.sp 1
.br
.B
lol
.R
6
.br
.B
inc
.R
.br
.B
stl
.R
6
.sp 1
A missing boolean expression evaluates to TRUE.
.PP
The code generator will match the longest EM pattern on every occasion.
If two patterns of the same length match, the first in the table
will be chosen, while all patterns of length greater than or equal
to three are considered to be of the same length.
.NH 3
The stack pattern.
.PP
The only stack pattern that can occur is R16, which means that the
registerpair AX contains the word on top of the stack.
If this is not the case, a coercion occurs.
This coercion generates a "jsr Pop", which means that the top
of the stack is popped and stored in the registerpair AX.
.NH 3
The code part.
.PP
The code part consists of three parts: stack cleanup, register
allocation, and code to be generated.
All of these may be omitted.
.NH 4
Stack cleanup.
.PP
When generating something like a branch instruction it might be
necessary to empty the fake stack, that is, to remove the AX registerpair.
This is done by the instruction remove(ALL).
.NH 4
Register allocation.
.PP
If the machine code to be generated uses the registerpair AX,
this is signaled to the code generator by the allocate(R16)
instruction.
If the registerpair AX resides on the fake stack, this will result
in a "jsr Push", which means that the registerpair AX is pushed on
the stack and will be free for further use.
If registerpair AX is not on the fake stack, nothing happens.
.NH 4
Code to be generated.
.PP
Code to be generated is specified as a list of items of the following
kind:
.IP 1)
A string in double quotes ("This is a string").
This is copied to the codefile and a newline ('\\n') is appended.
Inside the string all normal C string conventions are allowed,
and substitutions can be made of the following sorts.
.RS
.IP a)
$1, $2 etc.
These are the operands of the corresponding EM instructions and are
printed according to their type.
To put a real '$' inside the string it must be doubled ('$$').
.IP b)
%[1], %[2.reg], %[b.1] etc.
These have their obvious meaning.
If they describe a complete token (%[1]), the print format for the
token is used.
If they stand for a basic term in an expression, they will be
printed according to their type.
To put a real '%' inside the string it must be doubled ('%%').
.IP c)
%( arbitrary expression %).
This allows inclusion of arbitrary expressions inside strings.
It is not needed very often, so the awkward notation
is not too bad.
Note that %(%[1]%) is equivalent to %[1].
.RE
.NH 3
Stack replacement.
.PP
The stack replacement is a possibly empty list of items to be
pushed on the fake stack.
Three things can occur:
.IP 1)
%[1] is used if the registerpair AX was on the fake stack and is
to be pushed back onto it.
.IP 2)
%[a] is used if the registerpair AX is allocated with allocate(R16)
and is to be pushed onto the fake stack.
.IP 3)
It can also be empty.
.NH 3
EM replacement.
.PP
In exceptional cases it might be useful to leave part of an EM
pattern undone.
For example, a
.B
sdl
.R
instruction might be split into two
.B
stl
.R
instructions when there is no 4-byte quantity on the stack.
The EM replacement part allows one to express this.
Example:
.sp 1
.br
.B
stl
.R
$1
.B
stl
.R
$1+2
.sp 1
The instructions are inserted in the stream so they can match
the first part of a pattern in the next step.
Note that since the code generator traverses the EM instructions
in a strictly linear fashion, it is impossible to let the EM
replacement match later parts of a pattern.
So if there is a pattern
.sp 1
.br
.B
loc stl
.R
$1==0
.sp 1
and the input is
.sp 1
.br
.B
loc
.R
0
.B
sdl
.R
4
.sp 1
the
.B
loc
.R
0 will be processed first, then the
.B
sdl
.R
might be split into two
.B
stl
.R
instructions, but the pattern cannot match now.
.NH 2
Move definitions.
.PP
This definition is a fake.
This definition is put in the table, since the code generator generator
complains if it cannot find one.
.NH 2
Test definitions.
.PP
Test definitions are not used by the table.
.NH 2
Stack definitions.
.PP
When the generator has to push the registerpair AX, it must
know how to do so.
The machine code to be generated is defined here.
.NH 1
Some remarks.
.PP
The above description of the machine table is
a description of the table for the MCS6500.
It uses only part of the possibilities which the code generator
generator offers.
For a more precise and detailed description see [2].
.DS C
.B
THE BACK END TABLE.
.R
.DE
.NH 0
Introduction.
.PP
The code rules are divided into 15 groups.
These groups are:
.IP 1.
Load instructions.
.IP 2.
Store instructions.
.IP 3.
Integer arithmetic instructions.
.IP 4.
Unsigned arithmetic instructions.
.IP 5.
Floating point arithmetic instructions.
.IP 6.
Pointer arithmetic instructions.
.IP 7.
Increment, decrement and zero instructions.
.IP 8.
Convert instructions.
.IP 9.
Logical instructions.
.IP 10.
Set manipulation instructions.
.IP 11.
Array instructions.
.IP 12.
Compare instructions.
.IP 13.
Branch instructions.
.IP 14.
Procedure call instructions.
.IP 15.
Miscellaneous instructions.
.PP
From each of these groups one or two typical EM patterns will be
explained in the next paragraphs.
Comments are placed between /* and */ (/* This is a comment */).
.NH
The instructions.
.NH 2
The load instructions.
.PP
In this group a typical instruction is
.B
lol.
.R
A
.B
lol
.R
instruction pushes the word at local base + offset, where offset
is the instruction's argument, onto the stack.
Since the MCS6500 can index over at most 256 bytes, as explained under
the memory addressing modes, there is a need for two code rules in the
table.
One which can apply the offset directly and one which must explicitly
calculate the address of the local.
.NH 3
The lol instruction with indirect offsetting.
.PP
In this case an indirect load with an offset from
the second local base is possible.
The table content is:
.sp 1
.br
.B
lol
.R
IN($1) | |
.br
allocate(R16) /* allocate registerpair AX */
.br
"ldy #BASE+$1" /* load Y with the offset from the second
.br
local base */
.br
"lda (LBl),y" /* load indirect the lowbyte of the word */
.br
"tax" /* move register A to register X */
.br
"iny" /* increment register Y (offset) */
.br
"lda (LBl),y" /* load indirect the highbyte of the word */
.br
| %[a] | | /* push the word onto the fake stack */
.NH 3
The lol instruction whose offset is too big.
.PP
In this case, the library subroutine "Lol" is used.
This subroutine expects the offset in registerpair AX, then
calculates the address of the local or parameter, and loads
it into registerpair AX.
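.PP
The choice between the two rules, and the index that the in-line rule
loads into Y, can be sketched in C as follows.
The definition of IN is not given in this document, so in_range below
is an assumption derived from the 240 bytes of locals and 14 bytes of
parameters mentioned earlier; all function names are hypothetical.
.DS L
```c
#include <stdint.h>

#define BASE 240        /* value of BASE stated in the text */
#define PARAM_BYTES 14  /* parameter bytes reachable, as stated in the text */

/* Assumed meaning of IN(off) for a 2-byte access: both bytes of the
   word lie inside the window reachable through the second local base. */
int in_range(int off)
{
    return off >= -BASE && off + 1 < PARAM_BYTES;
}

/* The second local base is the real local base minus BASE. */
uint16_t second_local_base(uint16_t lb)
{
    return (uint16_t)(lb - BASE);
}

/* Index loaded by "ldy #BASE+$1"; never negative when off is in range. */
uint8_t y_index(int off)
{
    return (uint8_t)(BASE + off);
}
```
.DE
Adding y_index(off) to second_local_base(lb) gives lb + off again,
so negative offsets of local variables become non-negative values of Y.
.PP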
The table content is:
.sp 1
.br
.B
lol
.R
| |
.br
allocate(R16) /* allocate registerpair AX */
.br
"lda #[$1].h" /* load highbyte of offset into register A */
.br
"ldx #[$1].l" /* load lowbyte of offset into register X */
.br
"jsr Lol" /* perform the subroutine */
.br
| %[a] | | /* push word onto the fake stack */
.NH 2
The store instructions.
.PP
In this group a typical instruction is
.B
stl.
.R
A
.B
stl
.R
instruction pops a word from the stack and stores
it in the word at local base + offset, where offset is the
instruction's argument.
Here too there is a need for two code rules in the table, as a
result of the offset limits.
.NH 3
The stl instruction with indirect offsetting.
.PP
In this case an indirect store with an offset from the
second local base is possible.
The table content is:
.sp 1
.br
.B
stl
.R
IN($1) | R16 | /* expect registerpair AX on top of the
.br
fake stack */
.br
"ldy #BASE+1+$1" /* load Y with the offset from the
.br
second local base */
.br
"sta (LBl),y" /* store the highbyte of the word from A */
.br
"txa" /* move register X to register A */
.br
"dey" /* decrement offset */
.br
"sta (LBl),y" /* store the lowbyte of the word from A */
.br
| | |
.NH 3
The stl instruction whose offset is too big.
.PP
In this case the library subroutine 'Stl' is used.
This subroutine expects the offset in registerpair AX, then
calculates the address, pops the word and stores it in its place.
The table content is:
.sp 1
.br
.B
stl
.R
| |
.br
allocate(R16) /* allocate registerpair AX */
.br
"lda #[$1].h" /* load highbyte of offset in register A */
.br
"ldx #[$1].l" /* load lowbyte of offset in register X */
.br
"jsr Stl" /* perform the subroutine */
.br
| | |
.NH 2
Integer arithmetic instructions.
.PP
In this group typical instructions are
.B
adi
.R
and
.B
mli.
.R
These instructions, in this table, are implemented for 2-byte
and 4-byte integers.
The only arithmetic instructions available on the MCS6500 are
ADC (add with carry) and SBC (subtract with not(carry)).
Not(carry) here means that in a subtraction, the one's complement
of the carry is taken.
The absence of multiply and divide instructions forces the
use of subroutines to handle these cases.
Because there are no registers left with which to perform the
multiplication and division, zero page is used here.
The 4-byte integer arithmetic is implemented because C has
the integer type long.
A user is free to use the type long, but will pay in performance.
.NH 3
The adi instruction.
.PP
In case of the
.B
adi
.R
2 (and
.B
sbi
.R
2) instruction there are many EM
patterns, so that the instruction can be performed in line in
most cases.
For the worst case there exists a subroutine in the library
which deals with the EM instruction.
In case of an
.B
adi
.R
4 (or
.B
sbi
.R
4) there is only a subroutine to deal with it.
A table content is:
.sp 1
.br
.B
lol lol adi
.R
(IN($1) && IN($2) && $3==2) | | /* is it in range */
.br
allocate(R16) /* allocate registerpair AX */
.br
"ldy #BASE+$1+1" /* load Y with offset for first operand */
.br
"lda (LBl),y" /* load indirect highbyte first operand */
.br
"pha" /* save highbyte first operand on hard_stack */
.br
"dey" /* decrement offset first operand */
.br
"lda (LBl),y" /* load indirect lowbyte first operand */
.br
"ldy #BASE+$2" /* load Y with offset for second operand */
.br
"clc" /* clear carry for addition */
.br
"adc (LBl),y" /* add the lowbytes of the operands */
.br
"tax" /* store lowbyte of result in place */
.br
"iny" /* increment offset second operand */
.br
"pla" /* get highbyte first operand */
.br
"adc (LBl),y" /* add the highbytes of the operands */
.br
| %[a] | | /* push the result onto the fake stack */
.NH 3
The mli instruction.
.PP
The
.B
mli
.R
2 instruction mostly uses the subroutine 'Mlinp'.
This subroutine expects the multiplicand in zero page
at locations ARTH, ARTH+1, while the multiplier is in zero
page locations ARTH+2, ARTH+3.
For a description of the algorithms used for multiplication and
division, see [3].
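.PP
The algorithms themselves are documented in [3] and are not
reproduced here.
As a rough impression of what a software multiply on a CPU without
a multiply instruction involves, the following C sketch uses the
classic shift-and-add method.
It is an assumption that Mlinp works along these lines;
the function name mul16 is hypothetical.
.DS L
```c
#include <stdint.h>

/* Shift-and-add multiplication of two 16-bit words, keeping the low
   16 bits of the product. Illustrative only; not taken from Mlinp. */
uint16_t mul16(uint16_t multiplicand, uint16_t multiplier)
{
    uint16_t product = 0;
    int i;

    for (i = 0; i < 16; i++) {
        if (multiplier & 1)     /* low bit set: add shifted multiplicand */
            product = (uint16_t)(product + multiplicand);
        multiplicand = (uint16_t)(multiplicand << 1);
        multiplier >>= 1;
    }
    return product;
}
```
.DE
Because only the low 16 bits are kept, the same loop gives the right
bit pattern for signed operands in two's complement notation.
.PP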
A table content is:
.sp 1
.br
.B
lol lol mli
.R
(IN($1) && IN($2) && $3==2) | |
.br
allocate(R16) /* allocate registerpair AX */
.br
"ldy #BASE+$1" /* load Y with offset of multiplicand */
.br
"lda (LBl),y" /* load indirect lowbyte of multiplicand */
.br
"sta ARTH" /* store lowbyte in zero page */
.br
"iny" /* increment offset of multiplicand */
.br
"lda (LBl),y" /* load indirect highbyte of multiplicand */
.br
"sta ARTH+1" /* store highbyte in zero page */
.br
"ldy #BASE+$2" /* load Y with offset of multiplier */
.br
"lda (LBl),y" /* load indirect lowbyte of multiplier */
.br
"sta ARTH+2" /* store lowbyte in zero page */
.br
"iny" /* increment offset of multiplier */
.br
"lda (LBl),y" /* load indirect highbyte of multiplier */
.br
"sta ARTH+3" /* store highbyte in zero page */
.br
"jsr Mlinp" /* perform the multiply */
.br
| %[a] | | /* push result onto fake stack */
.NH 2
The unsigned arithmetic instructions.
.PP
Since unsigned addition and subtraction are performed in the same way
as signed addition and subtraction, these cases are dealt with by
an EM replacement.
For multiplication and division there are special subroutines.
.NH 3
Unsigned addition.
.PP
This is an example of the EM replacement strategy.
.sp 1
.br
.B
lol lol adu
.R
| | | |
.B
lol
.R
$1
.B
lol
.R
$2
.B
adi
.R
$3 |
.NH 2
Floating point arithmetic.
.PP
Floating point arithmetic is not implemented in this table.
.NH 2
Pointer arithmetic instructions.
.PP
A typical pointer arithmetic instruction is
.B
adp
.R
2.
This instruction adds an offset and a pointer.
A table content is:
.sp 1
.br
.B
adp
.R
| | | |
.B
loc
.R
$1
.B
adi
.R
2 |
.NH 2
Increment, decrement and zero instructions.
.PP
In this group a typical instruction is
.B
inl,
.R
which increments a local or parameter.
The MCS6500 does not have an instruction to increment the
accumulator A, so the 'ADC' instruction must be used.
A table content is:
.sp 1
.br
.B
inl
.R
IN($1) | |
.br
allocate(R16) /* allocate registerpair AX */
.br
"ldy #BASE+$1" /* load Y with offset of the local */
.br
"clc" /* clear carry for addition */
.br
"lda (LBl),y" /* load indirect lowbyte of local */
.br
"adc #1" /* increment lowbyte */
.br
"sta (LBl),y" /* store indirect the incremented lowbyte */
.br
"bcc 1f" /* if carry is clear then ready */
.br
"iny" /* increment offset of local */
.br
"lda (LBl),y" /* load indirect highbyte of local */
.br
"adc #0" /* add carry to highbyte */
.br
"sta (LBl),y\\n1:" /* store indirect the highbyte */
.PP
If the offset of the local or parameter is too big, the local
or parameter is first fetched, then incremented, and then stored back.
.NH 2
Convert instructions.
.PP
In this case there are two convert instructions
which really do something.
One of them is in-line code, and deals with the extension of
a character (1 byte) to an integer.
The other one is a subroutine which handles the conversion
between 2-byte integers and 4-byte integers.
.NH 3
The in-line conversion.
.PP
The table content is:
.sp 1
.br
.B
loc loc cii
.R
$1==1 && $2==2 | R16 |
.br
"txa" /* see if sign extension is needed */
.br
"bpl 1f" /* there is no need for sign extension */
.br
"lda #0FFh" /* sign extension here */
.br
"bne 2f" /* conversion ready */
.br
"1: lda #0\\n2:" /* no sign extension here */
.NH 2
Logical instructions.
.PP
A typical instruction in this group is the logical
.B
and
.R
on two 2-byte words.
The logical
.B
and
.R
on groups of more than two bytes (max 254)
is also possible and uses a library subroutine.
.NH 3
The logical and on 2-byte groups.
.PP
The table content is:
.sp 1
.br
.B
and
.R
$1==2 | R16 | /* one group must be on the fake stack */
.br
"sta ARTH+1" /* temporary save of first group highbyte */
.br
"stx ARTH" /* temporary save of first group lowbyte */
.br
"jsr Pop" /* pop second group from the stack */
.br
"and ARTH+1" /* logical and on highbytes */
.br
"pha" /* temporary save the result's highbyte */
.br
"txa" /* logical and can only be done in A */
.br
"and ARTH" /* logical and on lowbytes */
.br
"tax" /* restore result's lowbyte */
.br
"pla" /* restore result's highbyte */
.br
| %[1] | | /* push result onto fake stack */
.NH 2
Set manipulation instructions.
.PP
A typical EM pattern in this group is
.B
loc inn zeq
.R
$1>0 && $1<16 && $2==2.
This EM pattern works on sets of 16 bits.
Sets can be bigger (max 256 bytes = 2048 bits), but then a
library routine is used instead of in-line code.
The table content of the above EM pattern is:
.sp 1
.br
.B
loc inn zeq
.R
$1>0 && $1<16 && $2==2 | R16 |
.br
"ldy #$1+1" /* load Y with bit number */
.br
"stx ARTH" /* cannot rotate X, so use zero page */
.br
"1: lsr a" /* right shift A */
.br
"ror ARTH" /* right rotate zero page location */
.br
"dey" /* decrement Y */
.br
"bne 1b" /* shift $1 times */
.br
"bcc $1" /* no carry, so bit is zero */
.NH 2
Array instructions.
.PP
In this group a typical EM pattern is
.B
lae lar
.R
defined(rom(1,3)) | | | |
.B
lae
.R
$1
.B
aar
.R
$2
.B
loi
.R
rom(1,3).
This pattern uses the
.B
aar
.R
instruction, which is part of a typical EM pattern:
.sp 1
.br
.B
lae aar
.R
$2==2 && rom(1,3)==2 && rom(1,1)==0 | R16 | /* registerpair AX contains
.br
the index in the array */
.br
"pha" /* save highbyte of index */
.br
"txa" /* move lowbyte of index to A */
.br
"asl a" /* shift left lowbyte == 2 times lowbyte */
.br
"tax" /* restore lowbyte */
.br
"pla" /* restore highbyte */
.br
"rol a" /* rotate left highbyte == 2 times highbyte */
.br
| %[1] | adi 2 | /* push new index, add to lowerbound array */
.NH 2
Compare instructions.
.PP
In this group all EM patterns are performed by calling
a subroutine.
Subroutines are used here because comparison is only possible
byte by byte.
This means a lot of code, and since compares are used frequently,
a lot of in-line code would be generated, thus reducing
the space left for the software stack.
These subroutines can be found in the library.
.NH 2
Branch instructions.
.PP
A typical branch instruction is
.B
beq.
.R
The table content for it is:
.sp 1
.br
.B
beq
.R
| R16 |
.br
"sta BRANCH+1" /* save highbyte second operand in zero page */
.br
"stx BRANCH" /* save lowbyte second operand in zero page */
.br
"jsr Pop" /* pop the first operand */
.br
"cmp BRANCH+1" /* compare the highbytes */
.br
"bne 1f" /* they are not equal so go on */
.br
"cpx BRANCH" /* compare the lowbytes */
.br
"beq $1\\n1:" /* lowbytes are also equal, so branch */
.PP
Another typical instruction in this group is
.B
zeq.
.R
The table content is:
.sp 1
.br
.B
zeq
.R
| R16 |
.br
"tay" /* move A to Y for setting testbits */
.br
"bmi $1" /* highbyte is minus so branch */
.br
"txa" /* move X to A for setting testbits */
.br
"beq $1\\n1:" /* lowbyte also zero, thus branch */
.NH 2
Procedure call instructions.
.PP
In this group one code generation might seem a little
awkward.
It is the EM instruction
.B
cai
.R
which generates a 'jsr Indir'.
This is because there is no indirect jump-to-subroutine instruction
in the MCS6500.
The only solution is to store the address in zero page, and then
do a 'jsr' to a known label.
At this label there must be an indirect jump instruction, which
performs a jump to the address stored in zero page.
In this case the label is Indir, and the address is stored in
zero page at the addresses ADDR, ADDR+1.
The table content is:
.sp 1
.br
.B
cai
.R
| R16 |
.br
"stx ADDR" /* store lowbyte of address in zero page */
.br
"sta ADDR+1" /* store highbyte of address in zero page */
.br
"jsr Indir" /* use the indirect jump */
.br
| | |
.NH 2
Miscellaneous instructions.
.PP
In this group, as the name suggests, there is no
typical EM instruction or EM pattern.
Most of the MCS6500 code to be generated uses a library subroutine
or is straightforward.
.DS C
.B
PERFORMANCE.
.R
.DE
.NH 0
Introduction.
.PP
To measure the performance of the back end table, some timing
tests were done.
What to time?
In this case, the execution times of several Pascal statements were
measured.
Statements in C which have a Pascal equivalent were timed as well.
The statements were timed as follows.
A test program was written which executes two
nested for_loops from 1 to 1,000.
Within these for_loops the statement to be tested
is placed, so the statement is executed 1,000,000 times.
Then the same program is executed without the test statement.
The time difference between the two executions is the time
necessary to execute the test statement 1,000,000 times.
The time to execute the test statement once is thus
the time difference divided by 1,000,000.
.NH 0
Testing Pascal statements.
.PP
The following statements were tested.
.IP 1)
int1 := 0;
.IP 2)
int1 := int2 - 1;
.IP 3)
int1 := int1 + 1;
.IP 4)
int1 := icon1 - icon2;
.IP 5)
int1 := icon2 div icon1;
.IP 6)
int1 := int2 * int3;
.IP 7)
bool := (int1 < 0);
.IP 8)
bool := (int1 < 3);
.IP 9)
bool := ((int1 > 3) or (int1 < 3))
.IP 10)
case int1 of 1: bool := false; 2: bool := true end;
.IP 11)
if int1 = 0 then int2 := 3;
.IP 12)
while int1 > 0 do int1 := int1 - 1;
.IP 13)
m := a[k];
.IP 14)
let2 := ['a'..'c'];
.IP 15)
P3(x);
.IP 16)
dum := F3(x);
.IP 17)
s.overhead := 5400;
.IP 18)
with s do overhead := 5400;
.PP
These statements were tested in a procedure test.
.sp 1
.br
procedure test;
.br
var i, j, ... : integer;
.br
bool : boolean;
.br
let2 : set of char;
.br
begin
.br
for i := 1 to 1000 do
.br
for j := 1 to 1000 do
.br
STATEMENT
.br
end;
.sp 1
.PP
STATEMENT is one of the statements shown above, or it is
the empty statement.
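.PP
The arithmetic behind this measurement can be written down in a few
lines of C.
This is only a sketch: the function name per_statement_us and the idea
of passing the two run times in seconds are assumptions made for
illustration, not a description of how the measurements were
actually recorded.
.DS L
```c
/* Time per execution of the test statement in microseconds, given the
   run times in seconds of the test program with and without the
   statement, and the number of times the statement was executed. */
double per_statement_us(double with_s, double without_s, double iterations)
{
    return (with_s - without_s) * 1e6 / iterations;
}
```
.DE
With 1,000,000 iterations the difference between the two run times in
seconds is numerically equal to the time per statement in microseconds.
.PP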
Any variables used are initialized, if necessary, before the first for_loop.
For statement 15, which uses a procedure call, a dummy procedure
is declared whose body is empty.
For statement 16, which uses a function, the function returns its argument.
For the timing of C statements a similar test program was written.
.sp 1
.br
main()
.br
{
.br
int i, j, ...;
.br
for (i = 1; i <= 1000; i++)
.br
for (j = 1; j <= 1000; j++)
.br
STATEMENT
.br
}
.sp 1
.NH
The results.
.PP
Here are tables with the results of the time measurements.
Times are in microseconds (10^-6 seconds).
Some statements appear twice in the tables.
In the second case an array of 200 integers was declared before
the variable to be tested, so this variable cannot be accessed
by indirect addressing from the second local base.
This results in a larger execution time of the statement to be tested.
The column 68000 contains the times measured on a Bleasdale,
M68000 based, computer.
The times in column pdp are measured on a DEC pdp11/44,
while the times in column 6500 come from a BBC microcomputer.
.bp
.TS
expand;
c s s s
c c c c
lw35 nw7 nw7 nw7.
Pascal timing results
statement	68000	pdp	6500
_
T{
int1 := 0;
T}	4.0	5.8	16.7
	4.0	4.2	97.8
_
T{
int1 := int2 - 1;
T}	7.2	7.1	27.2
	6.9	7.1	206.5
_
T{
int1 := int1 + 1;
T}	6.9	6.8	27.2
	6.4	6.7	106.5
_
T{
int1 := icon1 + icon2;
T}	6.2	6.2	25.6
	6.2	6.0	106.6
_
T{
int1 := icon2 div icon1;
T}	14.9	14.3	372.6
	14.9	14.7	453.7
_
T{
int1 := int2 * int3;
T}	11.5	12.0	558.1
	11.3	11.6	728.6
_
T{
bool := (int1 < 0);
T}	7.2	6.9	122.8
	7.8	8.1	453.2
_
T{
bool := (int1 < 3);
T}	7.3	7.6	126.0
	7.2	8.1	232.2
_
T{
bool := ((int1 > 3) or (int1 < 3))
T}	10.1	12.0	307.8
	10.2	11.9	440.1
_
T{
case int1 of 1: bool := false; 2: bool := true end;
T}	18.3	17.9	165.7
_
T{
if int1 = 0 then int2 := 3;
T}	9.5	8.5	133.8
_
T{
while int1 > 0 do int1 := int1 - 1;
T}	6.9	6.9	126.0
_
T{
m := a[k];
T}	7.2	6.8	134.3
_
T{
let2 := ['a'..'c'];
T}	38.4	38.8	447.4
_
T{
P3(x);
T}	18.9	18.8	180.3
_
T{
dum := F3(x);
T}	26.8	27.1	343.3
_
T{
s.overhead := 5400;
T}	4.6	4.1	16.7
_
T{
with s do overhead := 5400;
T}	4.2	4.3	16.7
.TE
.TS
expand;
c s s s
c c c c
lw35 nw7 nw7 nw7.
C timing results
statement	68000	pdp	6500
_
T{
int1 = 0;
T}	4.1	3.6	17.2
	4.1	4.1	97.7
_
T{
int1 = int2 - 1;
T}	6.6	6.9	27.2
	6.1	6.5	206.4
_
T{
int1 = int1 + 1;
T}	6.4	7.3	27.2
	6.3	6.2	206.4
_
T{
int1 = int2 * int3;
T}	11.4	12.3	522.6
	9.6	10.1	721.2
_
T{
int1 = (int2 < 0);
T}	7.2	7.6	126.4
	7.4	7.7	232.5
_
T{
int1 = (int2 < 3);
T}	7.0	7.5	126.0
	7.8	7.8	232.6
_
T{
int1 = ((int2 > 3) || (int2 < 3));
T}	11.8	12.2	193.4
	11.5	13.2	245.6
_
T{
switch (int1) {
case 1: int1 = 0; break;
case 2: int1 = 1; break;
}
T}	28.3	29.2	164.1
_
T{
if (int1 == 0) int2 = 3;
T}	4.8	4.8	19.4
_
T{
while (int2 > 0) int2 = int2 - 1;
T}	5.8	6.0	125.9
_
T{
int2 = a[int2];
T}	4.8	5.1	192.8
_
T{
P3(int2);
T}	18.8	18.4	180.3
_
T{
int2 = F3(int2);
T}	27.0	27.2	309.4
_
T{
s.overhead = 5400;
T}	5.0	4.1	16.7
.TE
.NH
Pascal statements which don't have a C equivalent.
.PP
First, the two statements that operate only on constants are left out
of the C table.
They are left out because the C front end does constant folding,
while the Pascal front end doesn't.
So in C the statements int1 = icon1 + icon2; and int1 = icon1 / icon2;
will take the same amount of time, since the expression is evaluated
by the front end.
The two other statements (let2 := ['a'..'c']; and
.B
with
.R
s
.B
do
.R
overhead := 5400;) aren't included in the C statement timing table,
because these constructs do not exist in C.
Although C offers direct bit manipulation, which could be used to
implement sets, I have not used it here.
The
.B
with
.R
statement does not exist in C and there is nothing with the slightest
resemblance to it.
.PP
At first sight, the tables suggest that there is not much difference
between the times for the M68000 and the pdp11/44, compared with the
times needed by the MCS6500.
To verify this impression, I calculated the correlation coefficient
between the times of the M68000 and pdp11/44.
It turned out to be 0.997 for both the Pascal time tests and the C time tests.
Since the correlation coefficient is close to one and the difference
between the times is small, they can be considered to be the same
when seen from the times of the MCS6500.
Then I tried to plot the times of the M68000 against those of the MCS6500.
Taking all the times together, there was no correlation to be seen.
The only correlation one could see, with some effort, was in the times
for the first three Pascal statements.
The first two C statements also show a correlation, as two points always do.
.PP
Also the three Pascal statements
.B case ,
.B if ,
and
.B while
have a correlation coefficient of 0.999.
This is probably because the
.B
case
.R
statement uses a subroutine in both cases and the other two statements,
.B
if
.R
and
.B while ,
generate in line code.
The last two Pascal statements take the same time, since the front end
will generate the same EM code for both.
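.PP
The correlation coefficient used above can be computed as in the
following minimal Python sketch.
It is an illustration only, not the calculation actually performed for
this report: the function name pearson and the list names are
assumptions, and only the first measurement of each Pascal statement
is used, so the result need not reproduce the quoted value exactly.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# First measurement of each of the 18 Pascal statements, in microseconds,
# copied from the Pascal timing table (hypothetical variable names).
m68000 = [4.0, 7.2, 6.9, 6.2, 14.9, 11.5, 7.2, 7.3, 10.1,
          18.3, 9.5, 6.9, 7.2, 38.4, 18.9, 26.8, 4.6, 4.2]
pdp = [5.8, 7.1, 6.8, 6.2, 14.3, 12.0, 6.9, 7.6, 12.0,
       17.9, 8.5, 6.9, 6.8, 38.8, 18.8, 27.1, 4.1, 4.3]
mcs6500 = [16.7, 27.2, 27.2, 25.6, 372.6, 558.1, 122.8, 126.0, 307.8,
           165.7, 133.8, 126.0, 134.3, 447.4, 180.3, 343.3, 16.7, 16.7]

r_68000_pdp = pearson(m68000, pdp)
r_68000_6500 = pearson(m68000, mcs6500)
print("68000 vs pdp :", round(r_68000_pdp, 3))
print("68000 vs 6500:", round(r_68000_6500, 3))
```

The same function can be applied to any other pair of columns, or to
a subset of the rows such as the first three statements.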
.PP
The lack of correlation among the rest of the test times arises because
in these cases the object code for the MCS6500 uses library subroutines,
while the other processors can handle the EM code with in line code.
.PP
It is clear that the MCS6500 is a slower device: it needs longer
execution times and more library subroutines.
There is, however, no constant factor between its execution times and
those of the other processors.
.PP
The slowing down of the MCS6500 as a result of the need for a library
subroutine is illustrated by the multiplication statement.
The MCS6500 needs a library subroutine, while the other two processors
have a machine instruction to perform the multiply.
This results in a factor of 48.5 (558.1 against 11.5 microseconds
for Pascal on the M68000), when the operands can be accessed
indirectly by the MCS6500.
When the MCS6500 cannot access the operands indirectly the situation
is even worse (728.6 against 11.3 microseconds, a factor of about 64).
The slight differences between the MCS6500 execution times for Pascal
statements and C statements are probably the result of the front end,
and thus beyond the scope of this discussion.
.PP
Another timing test was done in C on the statement k = i + j + 1983.
This statement is tested on many UNIX*
.FS
* UNIX is a Trademark of Bell Laboratories.
.FE
systems.
For a complete list see appendix A.
The slowest one is the IBM XT, which runs on an 8088 microprocessor.
The fastest one is the Amdahl computer.
Here is a short table to illustrate the performance of the MCS6500.
.TS
c c c
c n n.
machine	short	int
IBM XT	53.4	53.4
Amdahl	0.5	0.3
MCS6500	150.2	150.2
.TE
The MCS6500 is about three times slower than the IBM XT,
and about three hundred times slower than the Amdahl for shorts.
The reason why the times on the IBM XT and the MCS6500 are the same
for shorts and ints is that most C compilers make the types short and
int the same size on 16-bit machines.
In this project the MCS6500 is regarded as a 16-bit machine.
.NH
Length tests.
.PP
I have also compiled several programs written in Pascal and C to see
whether there is a relation between the numbers of bytes of machine
code generated for the different processors.
In the tables:
.IP length: 9
The number of bytes of the source program.
.IP 68000:
The number of bytes of the a.out file for an M68000.
.IP pdp:
The number of bytes of the a.out file for a pdp11/44.
.IP 6500:
The number of bytes of the a.out file for an MCS6500.
.LP
These are the results:
.TS
c s s s
c c c c
n n n n.
Pascal programs
length	68000	pdp	6500
_
19946	14383	16090	26710
19484	20169	20190	35416
10849	10469	11464	18949
273	4221	5106	7944
1854	5807	6610	10301
.TE
.TS
c s s s
c c c c
n n n n.
C programs
length	68000	pdp	6500
_
9444	6927	8234	11559
7655	14353	18240	26251
4775	11309	15934	19910
639	6337	9660	12494
.TE
.PP
In contrast to the execution times of the test statements, the sizes
of the object code files show a constant factor between them.
After calculating the correlation coefficient, I calculated the line
fitted between the sizes.
.FS
* x is the number of bytes of the a.out file for the first processor
named; the line gives the estimated number of bytes for the MCS6500.
.FE
.TS
c s s
c c c
l c c.
Pascal programs
processor	corr. coef.	fitted line
_
68000-pdp	0.996
68000-6500	0.999	1.76x + 502*
pdp-6500	0.999	1.80x - 1577
.TE
.TS
c s s
c c c
l c c.
C programs
processor	corr. coef.	fitted line
_
68000-pdp	0.974
68000-6500	0.992	1.80x + 502*
pdp-6500	0.980	1.40x - 1577
.TE
.PP
As seen from the tables above, the correlation coefficients for Pascal
programs are better than those for C programs.
Thus the line fits best for Pascal programs.
With the formula of the best fitted line one can now estimate the size
of the object code that a program needs on an MCS6500, without having
the compiler at hand.
One can also see from these formulas that the object code generated
for an MCS6500 is about 1.8 times larger than for the other processors.
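.PP
As an illustration of how such a line can be obtained and used, the
following Python sketch fits a straight line to the Pascal object
sizes from the table above (68000 against 6500).
The names fit_line and estimate_6500_size are hypothetical, and the
use of ordinary least squares is an assumption about how the line
was fitted.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# a.out sizes in bytes of the five Pascal programs, copied from the table.
size_68000 = [14383, 20169, 10469, 4221, 5807]
size_6500 = [26710, 35416, 18949, 7944, 10301]

slope, intercept = fit_line(size_68000, size_6500)


def estimate_6500_size(bytes_68000):
    """Estimate the MCS6500 object size from the M68000 object size."""
    return slope * bytes_68000 + intercept


print("fitted line: %.2fx + %.0f" % (slope, intercept))
print("estimate for a 10000 byte M68000 a.out:",
      round(estimate_6500_size(10000)))
```

For the Pascal data the fitted coefficients round to 1.76 and 502,
in agreement with the 68000-6500 line in the table.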
.PP
The number of bytes in the source file depends heavily on the
programmer: how many spaces he or she uses, the size of the indentation
in structured programs, the use of comments, and so on.
There is therefore no correlation between the size of the source file
and the size of the object file.
.bp
.DS C
.B
SUMMARY.
.R
.DE
.NH 0
Summary
.PP
In this chapter some final conclusions are drawn.
.PP
In spite of its simplicity, the MCS6500 is strong enough to implement
an EM machine.
A serious deficiency of the MCS6500 is the lack of 16-bit general
purpose registers, and especially the lack of a 16-bit stackpointer.
As pointed out before, one 16-bit register can be simulated by a pair
of 8-bit registers: the accumulator A holds the highbyte, and the
index register X holds the lowbyte of the word.
For lack of a 16-bit stackpointer, zero page must be used to hold
a stackpointer, and two subroutines (Push and Pop) are needed for
manipulating the stack.
.PP
As seen in the time tests, the simple instruction set of the MCS6500
forces the use of library subroutines.
These library subroutines increase the execution time of the programs.
.PP
The sizes of the object code files show a strong correlation,
in contrast to the execution times.
With this correlation one can estimate the size of a program if it is
to be used on an MCS6500.
.bp
.NH 0
.B
REFERENCES.
.R
.IP 1.
Osborne, A., Jacobson, S., and Kane, J.
The MOS Technology MCS6500.
.B
An Introduction to Microcomputers,
.R
Volume II, Some Real Products (June 1977) chap. 9.
.RS
.PP
This book gives a hardware description of some existing CPUs,
such as the Zilog Z80, MCS6500, etc.
.RE
.IP 2.
van Staveren, H.
The table driven code generator from the Amsterdam Compiler Kit.
Vrije Universiteit, Amsterdam, (July 11, 1983).
.RS
.PP
The defining document for writing a back end table.
.RE
.IP 3.
Tanenbaum, A.S.
Structured Computer Organization.
Prentice Hall. (1976).
.RS
.PP
In this book computers are described as a hierarchy of levels,
with each one performing some well-defined function.
.RE