| . \" $Header$"
 | |
| .RP
 | |
| .ND Dec 1984
 | |
| .TL
 | |
| .B
 | |
| A backend table for the 6500 microprocessor
 | |
| .R
 | |
| .AU
 | |
| Jan van Dalen
 | |
| .AB
 | |
| The backend table is part of the Amsterdam Compiler Kit (ACK).
 | |
| It translates the intermediate language family EM to a machine
 | |
| code for the MCS6500 microprocessor family.
 | |
| .AE
 | |
| .bp
 | |
| .DS C
 | |
| .B
 | |
| THE MCS6500 MICROPROCESSOR.
 | |
| .R
 | |
| .DE
 | |
| .NH 0
 | |
| Introduction
 | |
| .PP
 | |
| Why write a back end table for the MCS6500 microprocessor family?
 | |
| Although the MCS6500 microprocessor family has a simple
 | |
| instruction set and internal structure, it is used in a
 | |
| variety of microcomputers and home computers.
 | |
| This is because of its low cost.
 | |
| As an example, the Apple II, a well-known and widespread
 | |
| microcomputer, uses the MCS6502 CPU.
 | |
| Also the BBC home computer, whose popularity is growing day
 | |
| by day, uses the MCS6502 CPU.
 | |
| The BBC home computer is based on the MCS6502 CPU although
 | |
| better and stronger microprocessors are available.
 | |
| The designers of Acorn computer Industries have probably
 | |
| chosen the MCS6502 because of the amount of software
 | |
| available for this CPU.
 | |
| Because of its widespread use, a variety of software
 | |
| will be needed for it.
 | |
| One can think of games, administration programs,
 | |
| teaching programs, BASIC interpreters and other application
 | |
| programs.
 | |
| Even though it will not be possible to run the complete compiler kit
 | |
| on an MCS6500-based computer, it is possible to write application
 | |
| programs in a high level language, such as Pascal or C, on a
 | |
| minicomputer.
 | |
| These application programs can be tested and compiled on that
 | |
| minicomputer and put in a ROM (Read Only Memory), for example,
 | |
| so that they can be executed by an MCS6500 CPU.
 | |
| The strategy of writing test programs on a minicomputer,
 | |
| compiling them and then executing them on an MCS6500-based
 | |
| microcomputer was used during the development of the back end.
 | |
| The minicomputer used is an M68000-based one, manufactured by
 | |
| Bleasdale Computer Systems Ltd.
 | |
| The micro- or home computer used is a BBC microcomputer,
 | |
| manufactured by Acorn Computers Ltd.
 | |
| .NH
 | |
| The MOS Technology MCS6500
 | |
| .PP
 | |
| The MCS6500 is a family of CPU devices developed by MOS
 | |
| Technology [1].
 | |
| The members of the MCS6500 family are the same chips in a 
 | |
| different housing.
 | |
| The MCS6502, the big brother in the family, can handle 64k
 | |
| bytes of memory, while for example the MCS6504 can only handle
 | |
| 8k bytes of memory.
 | |
| This difference is due to the fact that the MCS6502 is in a
 | |
| 40-pin package and the MCS6504 is in a 28-pin package, so fewer
 | |
| address lines are available.
 | |
| .bp
 | |
| .NH
 | |
| The MCS6500 CPU programmable registers
 | |
| .PP
 | |
| The MCS6500 series is based on the same chip so all have the
 | |
| same programmable registers.
 | |
| .sp 9
 | |
| .NH 2
 | |
| The accumulator A.
 | |
| .PP
 | |
| The accumulator A is the only register on which the arithmetic
 | |
| and logical instructions can be used.
 | |
| For example, the instruction ADC (add with carry) adds the
 | |
| contents of the accumulator A, a byte from memory or immediate data, and the carry.
 | |
| .NH 2
 | |
| The index register X.
 | |
| .PP
 | |
| As the name suggests this register can be used for some
 | |
| indirect addressing modes.
 | |
| The modes are explained below.
 | |
| .NH 2
 | |
| The index register Y.
 | |
| .PP
 | |
| This register is, just as the index register X, used for
 | |
| certain indirect addressing modes.
 | |
| These addressing modes are different from the modes which
 | |
| use index register X.
 | |
| .NH 2
 | |
| The program counter PC
 | |
| .PP 
 | |
| This is the only 16-bit register available.
 | |
| It is used to point to the next instruction to be
 | |
| carried out.
 | |
| .NH 2
 | |
| The stack pointer SP
 | |
| .PP
 | |
| The stack pointer is an 8-bit register, so the stack can contain
 | |
| at most 256 bytes.
 | |
| The CPU always uses 00000001 as the highbyte of any stack address,
 | |
| which means that memory locations
 | |
| .B
 | |
| 0100
 | |
| .R
 | |
| through
 | |
| .B
 | |
| 01FF
 | |
| .R
 | |
| are permanently assigned to the stack.
 | |
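The rule above can be illustrated with a small Python sketch that forms the effective stack address from the 8-bit stack pointer. The name stack_address is hypothetical and is not part of the back end; it only restates the fixed highbyte described in the text.

```python
STACK_PAGE = 0x01  # highbyte the CPU supplies for every stack access


def stack_address(sp: int) -> int:
    """Return the 16-bit memory address selected by the 8-bit stack pointer."""
    if not 0 <= sp <= 0xFF:
        raise ValueError("the stack pointer is an 8-bit register")
    return (STACK_PAGE << 8) | sp
```

Whatever value the stack pointer holds, the result stays within 0100 through 01FF, which is why the hardware stack cannot grow beyond 256 bytes.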
| .sp 12
 | |
| .NH 2
 | |
| The status register
 | |
| .PP
 | |
| The status register maintains six status flags and a master
 | |
| interrupt control bit.
 | |
| .br
 | |
| These are the six status flags:
 | |
|     Carry        (c)
 | |
|     Zero         (z)
 | |
|     Overflow     (o)
 | |
|     Sign         (n)
 | |
|     Decimal mode (d)
 | |
|     Break        (b)
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| 
 | |
| The bit (i) is the master interrupt control bit.
 | |
| .NH
 | |
| The MCS6500 memory layout.
 | |
| .PP
 | |
| In the MCS6500 memory space three areas have a special meaning.
 | |
| These areas are:
 | |
| .IP 1)
 | |
| Top page.
 | |
| .IP 2)
 | |
| Zero page.
 | |
| .IP 3)
 | |
| The stack.
 | |
| .PP
 | |
| MCS6500 memory is divided up into pages.
 | |
| Each page consists of 256 bytes.
 | |
| So in a memory address the highbyte denotes the page number
 | |
| and the lowbyte the offset within the page.
 | |
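As a minimal sketch of this page arithmetic, the following Python function splits a 16-bit address into its page number and its offset within the page. The function name split_address is an illustrative assumption.

```python
PAGE_SIZE = 256  # bytes per page, as stated in the text


def split_address(address: int) -> tuple:
    """Return (page number, offset within page) for a 16-bit address."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("addresses are 16 bits wide")
    return address >> 8, address & 0xFF
```

For example, address 01FF lies in page 01 (the stack page) at offset FF, and every address below 0100 lies in zero page.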
| .NH 2
 | |
| Top page.
 | |
| .PP
 | |
| When an MCS6500 is restarted it jumps indirectly via memory address
 | |
| .B
 | |
| FFFC.
 | |
| .R
 | |
| At
 | |
| .B
 | |
| FFFC
 | |
| .R
 | |
| (lowbyte) and 
 | |
| .B
 | |
| FFFD
 | |
| .R
 | |
| (highbyte) there must be the address of the bootstrap subroutine.
 | |
| When a break instruction (BRK) occurs or an interrupt takes place,
 | |
| the MCS6500 jumps indirectly through memory address
 | |
| .B
 | |
| FFFE.
 | |
| .R
 | |
| .B
 | |
| FFFE
 | |
| .R
 | |
| and 
 | |
| .B
 | |
| FFFF
 | |
| .R
 | |
| thus must contain the address of the interrupt routine.
 | |
| The above only holds for maskable interrupts.
 | |
| There also exists a nonmaskable interrupt.
 | |
| This causes the MCS6500 to jump indirectly through memory address
 | |
| .B
 | |
| FFFA.
 | |
| .R
 | |
| So the top six bytes of memory are used by the operating system
 | |
| and therefore not available for the back end.
 | |
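The indirect jumps described above amount to reading a two-byte address, lowbyte first, from a fixed location in the top page. A minimal Python sketch follows; memory is modelled as a 64k bytearray, and the constant and function names are illustrative assumptions.

```python
RESET_VECTOR = 0xFFFC    # restart
IRQ_BRK_VECTOR = 0xFFFE  # maskable interrupt and BRK
NMI_VECTOR = 0xFFFA      # nonmaskable interrupt


def read_vector(memory: bytearray, vector: int) -> int:
    """Return the address stored at `vector` (lowbyte) and `vector`+1 (highbyte)."""
    return memory[vector] | (memory[vector + 1] << 8)
```

With C0 stored at FFFD and 00 at FFFC, a restart would continue execution at address C000.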
| .NH 2
 | |
| Zero page.
 | |
| .PP
 | |
| This page has a special meaning in the sense that addressing
 | |
| this page uses special opcodes.
 | |
| Since a page consists of 256 bytes, only one byte is needed
 | |
| for addressing zero page.
 | |
| So an instruction which uses zero page occupies two bytes.
 | |
| It also uses fewer clock cycles while carrying out the instruction.
 | |
| Zero page is also needed when indirect addressing is used.
 | |
| This means that when indirect addressing is used, the address must
 | |
| reside in zero page (two consecutive bytes).
 | |
| In this case (the back end), zero page is used, for example
 | |
| to hold the local base, the second local base, the stack pointer
 | |
| etc.
 | |
| .NH 2
 | |
| The stack.
 | |
| .PP
 | |
| The stack is described in paragraph 3.5 about the MCS6500
 | |
| programmable registers.
 | |
| .NH 
 | |
| The memory addressing modes
 | |
| .PP
 | |
| MCS6500 memory reference instructions use direct addressing,
 | |
| indexed addressing, and indirect addressing.
 | |
| .NH 2
 | |
| Direct addressing.
 | |
| .PP
 | |
| Three-byte instructions use the second and third bytes of the
 | |
| object code to provide a direct 16-bit address:
 | |
| therefore, 65,536 bytes of memory can be addressed directly.
 | |
| The commonly used memory reference instructions also have a two-byte
 | |
| object code variation, where the second byte directly addresses
 | |
| one of the first 256 bytes.
 | |
| .NH 2
 | |
| Base page, indexed addressing.
 | |
| .PP
 | |
| In this case, the instruction has two bytes of object code.
 | |
| The contents of either the X or Y index registers are added to the 
 | |
| second  object code byte in order to compute a memory address.
 | |
| This may be illustrated as follows:
 | |
| .sp 15
 | |
| Base page, indexed addressing, as illustrated above, is 
 | |
| wraparound - which means that there is no carry.
 | |
| If the sum of the index register and second object code byte contents
 | |
| is more than
 | |
| .B
 | |
| FF
 | |
| .R
 | |
| , the carry bit will be discarded.
 | |
| This may be illustrated as follows:
 | |
| .sp 9
 | |
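The wraparound rule for base page, indexed addressing can be written down in one line. The Python sketch below is only an illustration; the function name is hypothetical.

```python
def zero_page_indexed(operand: int, index: int) -> int:
    """Effective address for base page, indexed addressing.

    `operand` is the second object code byte and `index` the contents of
    the X or Y register. The sum is truncated to eight bits, so the
    result always stays in the first 256 bytes of memory.
    """
    return (operand + index) & 0xFF
```

With an operand of F0 and an index of 20 the sum is 110; the carry is discarded and location 10 is accessed.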
| .NH 2
 | |
| Absolute indexed addressing.
 | |
| .PP
 | |
| In this case, the contents of either the X or Y register are added
 | |
| to a 16-bit direct address provided by the second and third bytes
 | |
| of an instruction's object code.
 | |
| This may be illustrated as follows:
 | |
| .sp 10
 | |
| .NH 2
 | |
| Indirect addressing.
 | |
| .PP
 | |
| Instructions that use simple indirect addressing have three bytes of
 | |
| object code.
 | |
| The second and third object code bytes provide a 16-bit address;
 | |
| therefore, the indirect address can be located anywhere in
 | |
| memory.
 | |
| This is straightforward indirect addressing.
 | |
| .NH 3
 | |
| Pre-indexed indirect addressing.
 | |
| .PP
 | |
| In this case, the object code consists of two bytes and the 
 | |
| second object code byte provides an 8-bit address.
 | |
| Instructions that use pre-indexed indirect addressing add the contents
 | |
| of the X index register and the second object code byte to access
 | |
| a memory location in the first 256 bytes of memory, where the 
 | |
| indirect address will be found:
 | |
| .sp 18
 | |
| When using pre-indexed indirect addressing, once again wraparound
 | |
| addition is used, which means that when the X index register contents
 | |
| are added to the second object code byte, any carry will be discarded.
 | |
| Note that only the X index register can be used with pre-indexed
 | |
| addressing.
 | |
| .NH 3
 | |
| Post-indexed indirect addressing.
 | |
| .PP
 | |
| In this case, the object code consists of two bytes and the
 | |
| second object code byte provides an 8-bit address.
 | |
| Now the second object code byte identifies a location
 | |
| in the first 256 bytes of memory where an indirect address
 | |
| will be found.
 | |
| The contents of the Y index register are added to this indirect
 | |
| address.
 | |
| This may be illustrated as follows:
 | |
| .sp 18
 | |
| Note that only the Y index register can be used with post-indexed
 | |
| indirect addressing.
 | |
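The difference between the two indexed indirect modes lies in when the index register is added. The Python sketch below computes the effective address for both; memory is modelled as a 64k bytearray and the function names are illustrative assumptions. It also assumes that the highbyte of the pointer is fetched from within zero page when the pointer sits at location FF, and that post-indexed addresses are truncated to 16 bits.

```python
def pre_indexed_indirect(memory: bytearray, operand: int, x: int) -> int:
    """(operand,X): add X first (wraparound), then fetch the indirect address."""
    pointer = (operand + x) & 0xFF
    low = memory[pointer]
    high = memory[(pointer + 1) & 0xFF]
    return low | (high << 8)


def post_indexed_indirect(memory: bytearray, operand: int, y: int) -> int:
    """(operand),Y: fetch the indirect address first, then add Y to it."""
    low = memory[operand]
    high = memory[(operand + 1) & 0xFF]
    return ((low | (high << 8)) + y) & 0xFFFF
```

Since Y is an unsigned 8-bit quantity, the post-indexed mode can reach only the 256 bytes at and above the address stored in zero page. This is the property that the back end has to work around for local variables.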
| .bp
 | |
| .NH
 | |
| What the CPU has and doesn't have.
 | |
| .PP
 | |
| Although the designers of the MCS6500 CPU family state that
 | |
| there is nothing very significant about the short stack (only
 | |
| 256 bytes) this stack caused problems for the back end.
 | |
| The designers say that a 256-byte stack usually is sufficient
 | |
| for any typical microcomputer, but this is only true if the stack
 | |
| is used only for return addresses of the JSR (jump to
 | |
| subroutine) instruction.
 | |
| But since the EM machine is supposed to be a stack machine and
 | |
| high level languages need parameters and
 | |
| locals in their procedures and functions, this short stack
 | |
| is insufficient.
 | |
| So a software stack is implemented in this back end, requiring two
 | |
| additional subroutines for stack handling.
 | |
| These two stack handling subroutines slow down the processing time
 | |
| of a program since the stack is used heavily.
 | |
| .PP
 | |
| Since parameters and locals of EM procedures are addressed at an
 | |
| offset from the local base of that procedure, indirect addressing
 | |
| is heavily used.
 | |
| Offsets are positive (for parameters) and negative (for
 | |
| local variables).
 | |
| As explained above, the MCS6500 has a
 | |
| post-indexed indirect addressing mode.
 | |
| This addressing mode can only handle positive offsets.
 | |
| This raises a problem for accessing the local variables.
 | |
| I have chosen the following solution.
 | |
| A second local base is introduced.
 | |
| This second local base is the real local base minus
 | |
| a constant BASE.
 | |
| In the present situation of the back end the value of BASE
 | |
| is 240.
 | |
| This means that there are 240 bytes reserved for local
 | |
| variables that can be addressed indirectly and 14 bytes for
 | |
| the parameters.
 | |
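The second local base turns every reachable offset into a non-negative value for the Y register. The following Python sketch works this out under stated assumptions: BASE and the 14 parameter bytes come from the text, a word is taken to be two bytes, and the names in_window and y_index are hypothetical. The range test actually used by the table (the IN macro seen in the code rules) may use different bounds.

```python
BASE = 240        # distance between real and second local base (from the text)
PARAM_BYTES = 14  # parameter bytes reachable above the local base (from the text)
WORD_SIZE = 2     # assumed size of a word in bytes


def in_window(offset: int, size: int = WORD_SIZE) -> bool:
    """True if `size` bytes at local base + offset lie inside the window."""
    return -BASE <= offset and offset + size <= PARAM_BYTES


def y_index(offset: int) -> int:
    """Value for the Y register when addressing via the second local base."""
    if not in_window(offset, 1):
        raise ValueError("offset cannot be reached with post-indexed addressing")
    return BASE + offset
```

The lowest local (offset -240) is reached with Y equal to 0 and the first parameter (offset 0) with Y equal to 240. Anything outside the window has to be addressed by explicit address calculation.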
| .DS C
 | |
| .B
 | |
| THE CODE GENERATOR.
 | |
| .R
 | |
| .DE
 | |
| .NH 0
 | |
| Description of the machine table.
 | |
| .PP
 | |
| The machine description table consists of the following sections:
 | |
| .IP 1.
 | |
| The macro definitions.
 | |
| .IP 2.
 | |
| Constant definitions.
 | |
| .IP 3.
 | |
| Register definitions.
 | |
| .IP 4.
 | |
| Token definitions.
 | |
| .IP 5.
 | |
| Token expressions.
 | |
| .IP 6.
 | |
| Code rules.
 | |
| .IP 7.
 | |
| Move definitions.
 | |
| .IP 8.
 | |
| Test definitions.
 | |
| .IP 9.
 | |
| Stack definitions.
 | |
| .NH 2
 | |
| Macro definitions.
 | |
| .PP
 | |
| The macro definitions at the top of the table are expanded
 | |
| by the preprocessor on occurrence in the rest of the table.
 | |
| .NH 2
 | |
| Constant definitions.
 | |
| .PP
 | |
| There are three constants which must be defined first.
 | |
| They are:
 | |
| .IP EM_WSIZE: 11
 | |
| Number of bytes in a machine word.
 | |
| This is the number of bytes a simple
 | |
| .B
 | |
| loc
 | |
| .R
 | |
| instruction will put on the stack.
 | |
| .IP EM_PSIZE:
 | |
| Number of bytes in a pointer.
 | |
| This is the number of bytes a
 | |
| .B
 | |
| lal
 | |
| .R
 | |
| instruction will put on the stack.
 | |
| .IP EM_BSIZE:
 | |
| Number of bytes in the hole between AB and LB.
 | |
| The calling sequence only saves LB on the stack so this
 | |
| constant is equal to the pointer size.
 | |
| .NH 2
 | |
| Register definitions.
 | |
| .PP
 | |
| The only important register definition is the definition of
 | |
| the registerpair AX.
 | |
| Since the rest of the machine's registers Y, PC, ST serve
 | |
| special purposes, the code generator cannot use them.
 | |
| .NH 2
 | |
| Token definitions
 | |
| .PP
 | |
| There is a fake token.
 | |
| This token is put in the table, since the code generator generator
 | |
| complains if it cannot find one.
 | |
| .NH 2
 | |
| Token expression definitions.
 | |
| .PP
 | |
| The token expression is also a fake one.
 | |
| This token expression is put in the table, since the code generator
 | |
| generator complains if it cannot find one.
 | |
| .NH 2
 | |
| Code rules.
 | |
| .PP
 | |
| The code rule section is the largest section in the table.
 | |
| The code rules specify EM patterns, stack patterns, code to be generated,
 | |
| etc.
 | |
| The syntax is:
 | |
| .IP code rule:
 | |
| EM pattern '|' stack pattern '|' code '|'
 | |
| stack replacement '|' EM replacement '|'
 | |
| .PP
 | |
| All patterns are optional; however, there must be at least one
 | |
| pattern present.
 | |
| If the EM pattern is missing the rule becomes a rewriting
 | |
| rule or a
 | |
| .B
 | |
| coercion
 | |
| .R
 | |
| to be used when code generation cannot continue because of an
 | |
| invalid stack pattern.
 | |
| The code rules are preceded by the word CODE:.
 | |
| .NH 3
 | |
| The EM pattern.
 | |
| .PP
 | |
| The EM pattern consists of a list of EM mnemonics followed by
 | |
| a boolean expression. Examples:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| loe
 | |
| .R
 | |
| .sp 1
 | |
| will match a single
 | |
| .B
 | |
| loe
 | |
| .R
 | |
| instruction,
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| loc loc cif
 | |
| .R
 | |
| $1==2 && $2==8
 | |
| .sp 1
 | |
| is a pattern that will match
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| loc
 | |
| .R
 | |
| 2
 | |
| .br
 | |
| .B
 | |
| loc
 | |
| .R
 | |
| 8
 | |
| .br
 | |
| .B
 | |
| cif
 | |
| .R
 | |
| .sp 1
 | |
| and
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| lol
 | |
| inc
 | |
| stl
 | |
| .R
 | |
| $1==$3
 | |
| .sp 1
 | |
| will match for example
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| lol
 | |
| .R
 | |
| 6
 | |
| .br
 | |
| .B
 | |
| inc
 | |
| .R
 | |
| .br
 | |
| .B
 | |
| stl
 | |
| .R
 | |
| 6
 | |
| .sp 1
 | |
| A missing boolean expression evaluates to TRUE.
 | |
| .PP
 | |
| The code generator will match the longest EM pattern on every occasion.
 | |
| If two patterns of the same length match, the first in the table
 | |
| will be chosen; all patterns of length greater than or equal
 | |
| to three are considered to be of the same length.
 | |
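The selection rule can be sketched in Python as follows. This is only a model of the rule as stated here, not the code generator's actual implementation: rules are represented as (mnemonics, predicate) pairs in table order, the predicate receives the list of instruction arguments (so $1 is args[0]), and all names are illustrative assumptions.

```python
def effective_length(mnemonics):
    """Patterns of three or more instructions all count as length three."""
    return min(len(mnemonics), 3)


def select_rule(rules, stream):
    """Return the index of the rule chosen for the front of `stream`, or None.

    rules  -- list of (mnemonics, predicate) in table order; predicate may be None
    stream -- list of (mnemonic, argument) pairs still to be processed
    """
    best = None
    for index, (mnemonics, predicate) in enumerate(rules):
        window = stream[:len(mnemonics)]
        if [op for op, _ in window] != list(mnemonics):
            continue
        args = [arg for _, arg in window]
        # A missing boolean expression evaluates to TRUE.
        if predicate is not None and not predicate(args):
            continue
        # Strictly greater, so the first rule in the table wins a tie.
        if best is None or effective_length(mnemonics) > effective_length(rules[best][0]):
            best = index
    return best
```

A consequence of the length rule is that a four-instruction pattern placed after a matching three-instruction pattern is never chosen, so the order of long patterns in the table matters.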
| .NH 3
 | |
| The stack pattern.
 | |
| .PP
 | |
| The only stack pattern that can occur is R16, which means that the
 | |
| registerpair AX contains the word on top of the stack.
 | |
| If this is not the case a coercion occurs.
 | |
| This coercion generates a "jsr Pop", which means that the top
 | |
| of the stack is popped and stored in the registerpair AX.
 | |
| .NH 3
 | |
| The code part.
 | |
| .PP
 | |
| The code part consists of three parts, stack cleanup, register
 | |
| allocation, and code to be generated.
 | |
| All of these may be omitted.
 | |
| .NH 4
 | |
| Stack cleanup.
 | |
| .PP
 | |
| When generating something like a branch instruction it might be
 | |
| needed to empty the fake stack, that is, remove the AX registerpair.
 | |
| This is done by the instruction remove(ALL).
 | |
| .NH 4
 | |
| Register allocation.
 | |
| .PP
 | |
| If the machine code to be generated uses the registerpair AX,
 | |
| this is signaled to the code generator by the allocate(R16)
 | |
| instruction.
 | |
| If the registerpair AX resides on the fake stack, this will result
 | |
| in a "jsr Push", which means that the registerpair AX is pushed on
 | |
| the stack and will be free for further use.
 | |
| If registerpair AX is not on the fake stack nothing happens.
 | |
| .NH 4
 | |
| Code to be generated.
 | |
| .PP
 | |
| Code to be generated is specified as a list of items of the following
 | |
| kind:
 | |
| .IP 1)
 | |
| A string in double quotes ("This is a string").
 | |
| This is copied to the codefile and a newline ('\n') is appended.
 | |
| Inside the string all normal C string conventions are allowed,
 | |
| and substitutions can be made of the following sorts.
 | |
| .RS
 | |
| .IP a)
 | |
| $1, $2 etc. These are the operands of the corresponding EM
 | |
| instructions and are printed according to their type.
 | |
| To put a real '$' inside the string it must be doubled ('$$').
 | |
| .IP b)
 | |
| %[1], %[2.reg], %[b.1] etc. These have their obvious meaning.
 | |
| If they describe a complete token (%[1]) the printformat for
 | |
| the token is used.
 | |
| If they stand for a basic term in an expression they will be
 | |
| printed according to their type.
 | |
| To put a real '%' inside the string it must be doubled ('%%').
 | |
| .IP c)
 | |
| %( arbitrary expression %). This allows inclusion of arbitrary
 | |
| expressions inside strings.
 | |
| This is not needed very often, so the awkward notation
 | |
| is not too bad.
 | |
| Note that %(%[1]%) is equivalent to %[1].
 | |
| .RE
 | |
| .NH 3
 | |
| Stack replacement.
 | |
| .PP
 | |
| The stack replacement is a possibly empty list of items to be
 | |
| pushed on the fake stack.
 | |
| Three things can occur:
 | |
| .IP 1)
 | |
| %[1] is used if the registerpair AX was on the fake stack and is
 | |
| to be pushed back onto it.
 | |
| .IP 2)
 | |
| %[a] is used if the registerpair AX is allocated with allocate(R16)
 | |
| and is to be pushed onto the fake stack.
 | |
| .IP 3)
 | |
| It can also be empty.
 | |
| .NH 3
 | |
| EM replacement.
 | |
| .PP
 | |
| In exceptional cases it might be useful to leave part of an EM
 | |
| pattern undone.
 | |
| For example, a
 | |
| .B
 | |
| sdl
 | |
| .R
 | |
| instruction might be split into two
 | |
| .B
 | |
| stl
 | |
| .R
 | |
| instructions when there is no 4-byte quantity on the stack.
 | |
| The EM replacement part allows one to express this.
 | |
| Example:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| stl
 | |
| .R
 | |
| $1
 | |
| .B
 | |
| stl
 | |
| .R
 | |
| $1+2
 | |
| .sp 1
 | |
| The instructions are inserted in the stream so they can match
 | |
| the first part of a pattern in the next step.
 | |
| Note that since the code generator traverses the EM instructions
 | |
| in a strict linear fashion, it is impossible to let the EM
 | |
| replacement match later parts of a pattern.
 | |
| So if there is a pattern
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| loc
 | |
| stl
 | |
| .R
 | |
| $1==0
 | |
| .sp 1
 | |
| and the input is
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| loc
 | |
| .R
 | |
| 0
 | |
| .B
 | |
| sdl
 | |
| .R
 | |
| 4
 | |
| .sp 1
 | |
| the
 | |
| .B
 | |
| loc
 | |
| .R
 | |
| 0
 | |
| will be processed first, then the
 | |
| .B
 | |
| sdl
 | |
| .R
 | |
| might be split into two
 | |
| .B
 | |
| stl
 | |
| .R
 | |
| 's but the pattern cannot match now.
 | |
| .NH 2
 | |
| Move definitions.
 | |
| .PP
 | |
| This definition is a fake. This definition is put in the
 | |
| table, since the code generator generator complains if it
 | |
| cannot find one.
 | |
| .NH 2
 | |
| Test definitions.
 | |
| .PP
 | |
| Test definitions aren't used by the table.
 | |
| .NH 2
 | |
| Stack definitions.
 | |
| .PP
 | |
| When the generator has to push the registerpair AX, it must
 | |
| know how to do so.
 | |
| The machine code to be generated is defined here.
 | |
| .NH 1
 | |
| Some remarks.
 | |
| .PP
 | |
| The above description of the machine table is
 | |
| a description of the table for the MCS6500.
 | |
| It uses only a part of the possibilities which the code generator
 | |
| generator offers.
 | |
| For a more precise and detailed description see [2].
 | |
| .DS C
 | |
| .B
 | |
| THE BACK END TABLE.
 | |
| .R
 | |
| .DE
 | |
| .NH 0
 | |
| Introduction.
 | |
| .PP
 | |
| The code rules are divided in 15 groups.
 | |
| These groups are:
 | |
| .IP 1.
 | |
| Load instructions.
 | |
| .IP 2.
 | |
| Store instructions.
 | |
| .IP 3.
 | |
| Integer arithmetic instructions.
 | |
| .IP 4.
 | |
| Unsigned arithmetic instructions.
 | |
| .IP 5.
 | |
| Floating point arithmetic instructions.
 | |
| .IP 6.
 | |
| Pointer arithmetic instructions.
 | |
| .IP 7.
 | |
| Increment, decrement and zero instructions.
 | |
| .IP 8.
 | |
| Convert instructions.
 | |
| .IP 9.
 | |
| Logical instructions.
 | |
| .IP 10.
 | |
| Set manipulation instructions.
 | |
| .IP 11.
 | |
| Array instructions.
 | |
| .IP 12.
 | |
| Compare instructions.
 | |
| .IP 13.
 | |
| Branch instructions.
 | |
| .IP 14.
 | |
| Procedure call instructions.
 | |
| .IP 15.
 | |
| Miscellaneous instructions.
 | |
| .PP
 | |
| From all of these groups one or two typical EM patterns will be explained
 | |
| in the next paragraphs.
 | |
| Comments are placed between /* and */ (/* This is a comment */).
 | |
| .NH
 | |
| The instructions.
 | |
| .NH 2
 | |
| The load instructions.
 | |
| .PP
 | |
| In this group a typical instruction is
 | |
| .B
 | |
| lol
 | |
| .R
 | |
| .
 | |
| A
 | |
| .B
 | |
| lol
 | |
| .R
 | |
| instruction pushes the word at local base + offset, where offset
 | |
| is the instruction's argument, onto the stack.
 | |
| Since the MCS6500 can only offset by 256 bytes, as explained in the
 | |
| section on memory addressing modes, there is a need for two code rules in the
 | |
| table.
 | |
| One can offset directly and the other must explicitly
 | |
| calculate the address of the local.
 | |
| .NH 3
 | |
| The lol instruction with indirect offsetting.
 | |
| .PP
 | |
| In this case an indirect load with an offset from the second local base
 | |
| is possible.
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| lol
 | |
| .R
 | |
| IN($1) | |
 | |
| .br
 | |
| allocate(R16)	/* allocate registerpair AX */
 | |
| .br
 | |
| "ldy #BASE+$1"	/* load Y with the offset from the second
 | |
| .br
 | |
| 					      local base */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect the lowbyte of the word */
 | |
| .br
 | |
| "tax"		/* move register A to register X */
 | |
| .br
 | |
| "iny"		/* increment register Y (offset) */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect the highbyte of the word */
 | |
| .br
 | |
| | %[a] | |	/* push the word onto the fake stack */
 | |
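The effect of this code rule can be traced with a small Python model. It assumes, following the description of the second local base, that the zero page pointer LBl holds the real local base minus BASE; memory is a 64k bytearray and the function name lol is chosen for illustration only. The return value mirrors the registerpair: highbyte in A, lowbyte in X.

```python
BASE = 240  # value of BASE given earlier in the text


def lol(memory: bytearray, local_base: int, offset: int):
    """Model of the in-line lol rule; returns the pair (A, X)."""
    second_base = (local_base - BASE) & 0xFFFF  # assumed contents of LBl
    y = BASE + offset                           # ldy #BASE+$1
    a = memory[(second_base + y) & 0xFFFF]      # lda (LBl),y  (lowbyte)
    x = a                                       # tax
    y += 1                                      # iny
    a = memory[(second_base + y) & 0xFFFF]      # lda (LBl),y  (highbyte)
    return a, x
```

For a local at offset -4 the Y register receives 236, so the two loads read the bytes at local base - 4 and local base - 3.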
| .NH 3
 | |
| The lol instruction whose offset is too big.
 | |
| .PP
 | |
| In this case, the library subroutine "Lol" is used.
 | |
| This subroutine expects the offset in registerpair AX, then
 | |
| calculates the address of the local or parameter, and loads
 | |
| it into registerpair AX.
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| lol
 | |
| .R
 | |
| | |
 | |
| .br
 | |
| allocate(R16)	/* allocate registerpair AX */
 | |
| .br
 | |
| "lda #[$1].h"	/* load highbyte of offset into register A */
 | |
| .br
 | |
| "ldx #[$1].l"	/* load lowbyte of offset into register X */
 | |
| .br
 | |
| "jsr Lol"	/* perform the subroutine */
 | |
| .br
 | |
| | %[a] | |	/* push word onto the fake stack */
 | |
| .NH 2
 | |
| The store instructions.
 | |
| .PP
 | |
| In this group a typical instruction is
 | |
| .B
 | |
| stl.
 | |
| .R
 | |
| A
 | |
| .B
 | |
| stl
 | |
| .R
 | |
| instruction pops a word from the stack and stores it in the word
 | |
| at local base + offset, where offset is the instruction's argument.
 | |
| Here too there is a need for two code rules in the table as a result
 | |
| of the offset limits.
 | |
| .NH 3
 | |
| The stl instruction with indirect offsetting.
 | |
| .PP
 | |
| In this case an indirect store with an offset from the second local
 | |
| base is possible.
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| stl
 | |
| .R
 | |
| IN($1) | R16 |	/* expect registerpair AX on top of the
 | |
| .br
 | |
| 							fake stack */
 | |
| .br
 | |
| "ldy #BASE+1+$1"  /* load Y with the offset from the
 | |
| .br
 | |
| 						second local base */
 | |
| .br
 | |
| "sta (LBl),y"	/* store the highbyte of the word from A */
 | |
| .br
 | |
| "txa"		/* move register X to register A */
 | |
| .br
 | |
| "dey"		/* decrement offset */
 | |
| .br
 | |
| "sta (LBl),y"	/* store the lowbyte of the word from A */
 | |
| .br
 | |
| | | |
 | |
| .NH 3
 | |
| The stl instruction whose offset is too big.
 | |
| .PP
 | |
| In this case the library subroutine 'Stl' is used.
 | |
| This subroutine expects the offset in registerpair AX, then
 | |
| calculates the address, pops the word and stores it in its place.
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| stl
 | |
| .R
 | |
| | |
 | |
| .br
 | |
| allocate(R16)	/* allocate registerpair AX */
 | |
| .br
 | |
| "lda #[$1].h"	/* load highbyte of offset in register A */
 | |
| .br
 | |
| "ldx #[$1].l"	/* load lowbyte of offset in register X */
 | |
| .br
 | |
| "jsr Stl"	/* perform the subroutine */
 | |
| .br
 | |
| | | |
 | |
| .NH 2
 | |
| Integer arithmetic instructions.
 | |
| .PP
 | |
| In this group typical instructions are
 | |
| .B
 | |
| adi
 | |
| .R
 | |
| and
 | |
| .B
 | |
| mli.
 | |
| .R
 | |
| These instructions, in this table, are implemented for 2-byte
 | |
| and 4-byte integers.
 | |
| The only arithmetic instructions available on the MCS6500 are
 | |
| the ADC (add with carry), and SBC (subtract with not(carry)).
 | |
| Not(carry) here means that in a subtraction, the one's complement
 | |
| of the carry is taken.
 | |
| The absence of multiply and division instructions forces the
 | |
| use of subroutines to handle these cases.
 | |
| Because there are no registers left to perform the multiplication
 | |
| and division, zero page is used here.
 | |
| The 4-byte integer arithmetic is implemented, because in C there
 | |
| exists the integer type long.
 | |
| A user is free to use the type long, but will pay in performance.
 | |
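The remark about not(carry) can be made concrete with a short Python model of SBC in binary (non-decimal) mode. The function names are illustrative assumptions; the 16-bit routine shows how a two-byte subtraction would chain two SBC steps after setting the carry.

```python
def sbc(a: int, m: int, carry: int):
    """Model SBC in binary mode: returns (result byte, carry out).

    The one's complement of the incoming carry acts as a borrow, and the
    outgoing carry is set when no borrow was needed.
    """
    diff = a - m - (1 - carry)
    return diff & 0xFF, 1 if diff >= 0 else 0


def sub16(left: int, right: int) -> int:
    """Subtract two 16-bit words bytewise, lowbyte first."""
    low, carry = sbc(left & 0xFF, right & 0xFF, 1)            # sec; sbc
    high, _ = sbc((left >> 8) & 0xFF, (right >> 8) & 0xFF, carry)  # sbc
    return (high << 8) | low
```

Setting the carry before the first SBC therefore plays the same role as clearing it before the first ADC of an addition.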
| .NH 3
 | |
| The adi instruction.
 | |
| .PP
 | |
| In case of the
 | |
| .B
 | |
| adi
 | |
| .R
 | |
| 2 (and
 | |
| .B
 | |
| sbi
 | |
| .R
 | |
| 2) instruction there are many EM
 | |
| patterns, so that the instruction can be performed in line in
 | |
| most cases.
 | |
| For the worst case there exists a subroutine in the library
 | |
| which deals with the EM instruction.
 | |
| In case of an
 | |
| .B
 | |
| adi
 | |
| .R
 | |
| 4 (or
 | |
| .B
 | |
| sbi
 | |
| .R
 | |
| 4) there is only a subroutine to deal with it.
 | |
| A table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| lol lol adi
 | |
| .R
 | |
| (IN($1) && IN($2) && $3==2) | | /* is it in range */
 | |
| .br
 | |
| allocate(R16)	/* allocate registerpair AX */
 | |
| .br
 | |
| "ldy #BASE+$1+1" /* load Y with offset for first operand */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect highbyte first operand */
 | |
| .br
 | |
| "pha"		/* save highbyte first operand on hard_stack */
 | |
| .br
 | |
| "dey"		/* decrement offset first operand */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect lowbyte first operand */
 | |
| .br
 | |
| "ldy #BASE+$2"	/* load Y with offset for second operand */
 | |
| .br
 | |
| "clc"		/* clear carry for addition */
 | |
| .br
 | |
| "adc (LBl),y"	/* add the lowbytes of the operands */
 | |
| .br
 | |
| "tax"		/* store lowbyte of result in place */
 | |
| .br
 | |
| "iny"		/* increment offset second operand */
 | |
| .br
 | |
| "pla"		/* get highbyte first operand */
 | |
| .br
 | |
| "adc (LBl),y"	/* add the highbytes of the operands */
 | |
| .br
 | |
| | %[a] | |	/* push the result onto the fake stack */
 | |
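The arithmetic carried out by this code rule is a two-byte addition in which the carry of the lowbyte addition is passed on to the highbyte addition. A minimal Python sketch, with illustrative function names, is:

```python
def adc(a: int, m: int, carry: int):
    """Model ADC in binary mode: returns (result byte, carry out)."""
    total = a + m + carry
    return total & 0xFF, total >> 8


def add16(left: int, right: int) -> int:
    """Add two 16-bit words bytewise, as the in-line adi 2 code does."""
    low, carry = adc(left & 0xFF, right & 0xFF, 0)                 # clc; adc
    high, _ = adc((left >> 8) & 0xFF, (right >> 8) & 0xFF, carry)  # adc
    return (high << 8) | low
```

In the generated code the instructions between the two ADC instructions (tax, iny, pla) leave the carry flag untouched, which is what allows the carry to be handed over. Because the result is simply truncated to 16 bits, the same bit pattern is produced for signed and unsigned operands.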
| .NH 3
 | |
| The mli instruction.
 | |
| .PP
 | |
| The
 | |
| .B
 | |
| mli
 | |
| .R
 | |
| 2 instruction mostly uses the subroutine 'Mlinp'.
 | |
| This subroutine expects the multiplicand in zero page
 | |
| at locations ARTH, ARTH+1, while the multiplier is in zero
 | |
| page locations ARTH+2, ARTH+3.
 | |
| For a description of the algorithms used for multiplication and
 | |
| division, see [3].
 | |
| A table content is:
 | |
| .sp  1
 | |
| .br
 | |
| .B
 | |
| lol lol mli
 | |
| .R
 | |
| (IN($1) && IN($2) && $3==2) | |
 | |
| .br
 | |
| allocate(R16)	/* allocate registerpair AX */
 | |
| .br
 | |
| "ldy #BASE+$1"	/* load Y with offset of multiplicand */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect lowbyte of multiplicand */
 | |
| .br
 | |
| "sta ARTH"	/* store lowbyte in zero page */
 | |
| .br
 | |
| "iny"		/* increment offset of multiplicand */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect highbyte of multiplicand */
 | |
| .br
 | |
| "sta ARTH+1"	/* store highbyte in zero page */
 | |
| .br
 | |
| "ldy #BASE+$2"	/* load Y with offset of multiplier */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect lowbyte of multiplier */
 | |
| .br
 | |
| "sta ARTH+2"	/* store lowbyte in zero page */
 | |
| .br
 | |
| "iny"		/* increment offset of multiplier */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect highbyte of multiplier */
 | |
| .br
 | |
| "sta ARTH+3"	/* store highbyte in zero page */
 | |
| .br
 | |
| "jsr Mlinp"	/* perform the multiply */
 | |
| .br
 | |
| | %[a] | |	/* push result onto fake stack */
 | |
| .NH 2
 | |
| The unsigned arithmetic instructions.
 | |
| .PP
 | |
| Since unsigned addition and subtraction are performed in the same way
 | |
| as signed addition and subtraction, these cases are dealt with by
 | |
| an EM replacement.
 | |
| For multiplication and division there are special subroutines.
 | |
| .NH 3
 | |
| Unsigned addition.
 | |
| .PP
 | |
| This is an example of the EM replacement strategy.
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| lol lol adu
 | |
| .R
 | |
| 	| | | |
 | |
| .B
 | |
| lol
 | |
| .R
 | |
| $1
 | |
| .B
 | |
| lol
 | |
| .R
 | |
| $2
 | |
| .B
 | |
| adi
 | |
| .R
 | |
| $3 |
 | |
| .NH 2
 | |
| Floating point arithmetic.
 | |
| .PP
 | |
| Floating point arithmetic isn't implemented in this table.
 | |
| .NH 2
 | |
| Pointer arithmetic instructions.
 | |
| .PP
 | |
| A typical pointer arithmetic instruction is
 | |
| .B
 | |
| adp
 | |
| .R
 | |
| 2.
 | |
| This instruction adds an offset and a pointer.
 | |
| A table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| adp
 | |
| .R
 | |
| 	| | | |
 | |
| .B
 | |
| loc
 | |
| .R
 | |
| $1
 | |
| .B
 | |
| adi
 | |
| .R
 | |
| 2 |
 | |
| .NH 2
 | |
| Increment, decrement and zero instructions.
 | |
| .PP
 | |
| In this group a typical instruction is
 | |
| .B
 | |
| inl
 | |
| .R
 | |
| , which increments a local or parameter.
 | |
| The MCS6500 doesn't have an instruction to increment the
 | |
| accumulator A, so the 'ADC' instruction must be used.
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| inl
 | |
| .R
 | |
| IN($1) | |
 | |
| .br
 | |
| allocate(R16)	/* allocate registerpair AX */
 | |
| .br
 | |
| "ldy #BASE+$1"	/* load Y with offset of the local */
 | |
| .br
 | |
| "clc"		/* clear carry for addition */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect lowbyte of local */
 | |
| .br
 | |
| "adc #1"	/* increment lowbyte */
 | |
| .br
 | |
| "sta (LBl),y"	/* restore indirect the incremented lowbyte */
 | |
| .br
 | |
| "bcc 1f"	/* if carry is clear then ready */
 | |
| .br 
 | |
| "iny"		/* increment offset of local */
 | |
| .br
 | |
| "lda (LBl),y"	/* load indirect highbyte of local */
 | |
| .br
 | |
| "adc #0"	/* add carry to highbyte */
 | |
| .br
 | |
| "sta (LBl),y\\n1:"  /* restore indirect the highbyte */
 | |
| .PP
 | |
| If the offset of the local or parameter is too big, the
 | |
| local or parameter is first fetched, then incremented, and then
 | |
| stored back.
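.PP
In C terms the in line sequence for inl behaves roughly like the
following sketch, which keeps the local as two separate bytes with the
lowbyte first.
The function name is invented for this illustration.

```c
#include <stdint.h>

/* Increment a 16-bit value stored as two bytes (lowbyte first),
 * mirroring the inl sequence: bump the lowbyte, and touch the
 * highbyte only when the lowbyte wrapped around (carry set). */
void inc16_bytes(uint8_t word[2])
{
    word[0] = (uint8_t)(word[0] + 1);
    if (word[0] == 0)                      /* carry out of the lowbyte */
        word[1] = (uint8_t)(word[1] + 1);
}
```

The early exit corresponds to the "bcc 1f" in the table entry: in most
cases only the lowbyte has to be read and written.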
 | |
| .NH 2
 | |
| Convert instructions.
 | |
| .PP
 | |
| In this case there are two convert instructions
 | |
| which really do something.
 | |
| One of them is in line code, and deals with the extension of
 | |
| a character (1-byte) to an integer.
 | |
| The other one is a subroutine which handles the conversion
 | |
| between 2-byte integers and 4-byte integers.
 | |
| .NH 3
 | |
| The in line conversion.
 | |
| .PP
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| loc loc cii
 | |
| .R
 | |
| $1==1 && $2==2 | R16 |
 | |
| .br
 | |
| "txa"		/* see if sign extension is needed */
 | |
| .br
 | |
| "bpl 1f"	/* there is no need for sign extension */
 | |
| .br
 | |
| "lda #0FFh"	/* sign extension here */
 | |
| .br
 | |
| "bne 2f"	/* conversion ready */
 | |
| .br
 | |
| "1: lda #0\\n2:"	/* no sign extension here */
 | |
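.PP
The effect of this sequence can be written in C as follows.
This is a sketch for illustration only; the function name is not taken
from the table or the library.

```c
#include <stdint.h>

/* Sign-extend an 8-bit value to 16 bits the way the cii sequence does:
 * the lowbyte is kept, and the highbyte becomes 0xFF when bit 7 of the
 * lowbyte is set and 0x00 otherwise. */
uint16_t sign_extend_8_to_16(uint8_t low)
{
    uint8_t high = (low & 0x80u) ? 0xFFu : 0x00u;

    return (uint16_t)(((uint16_t)high << 8) | low);
}
```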
| .NH 2
 | |
| Logical instructions.
 | |
| .PP
 | |
| A typical instruction in this group is the logical
 | |
| .B
 | |
| and
 | |
| .R
 | |
| on two 2-byte words.
 | |
| The logical
 | |
| .B
 | |
| and
 | |
| .R
 | |
| on groups of more than two bytes (max 254)
 | |
| is also possible and uses a library subroutine.
 | |
| .NH 3
 | |
| The logical and on 2-byte groups.
 | |
| .PP
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| and
 | |
| .R
 | |
| $1==2 | R16 |	/* one group must be on the fake stack */
 | |
| .br
 | |
| "sta ARTH+1"	/* temporary save of first group highbyte */
 | |
| .br
 | |
| "stx ARTH"	/* temporary save of first group lowbyte */
 | |
| .br
 | |
| "jsr Pop"	/* pop second group from the stack */
 | |
| .br
 | |
| "and ARTH+1"	/* logical and on highbytes */
 | |
| .br
 | |
| "pha"		/* temporary save the result's highbyte */
 | |
| .br
 | |
| "txa"		/* logical and can only be done in A */
 | |
| .br
 | |
| "and ARTH"	/* logical and on lowbytes */
 | |
| .br
 | |
| "tax"		/* restore result's lowbyte */
 | |
| .br
 | |
| "pla"		/* restore result's highbyte */
 | |
| .br
 | |
| | %[1] | |	/* push result onto fake stack */
 | |
| .NH 2
 | |
| Set manipulation instructions.
 | |
| .PP
 | |
| A typical EM pattern in this group is
 | |
| .B
 | |
| loc inn zeq
 | |
| .R
 | |
| $1>0 && $1<16 && $2==2.
 | |
| This EM pattern works on sets of 16 bits.
 | |
| Sets can be bigger (max 256 bytes = 2048 bits), but then a
 | |
| library routine is used instead of in line code.
 | |
| The table content of the above EM pattern is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| loc inn zeq
 | |
| .R
 | |
| $1>0 && $1<16 && $2==2 | R16 |
 | |
| .br
 | |
| "ldy #$1+1"	/* load Y with bit number plus one */
 | |
| .br
 | |
| "stx ARTH"	/* cannot rotate X, so use zero page */
 | |
| .br
 | |
| "1: lsr a"	/* right shift A */
 | |
| .br
 | |
| "ror ARTH"	/* right rotate zero page location */
 | |
| .br
 | |
| "dey"		/* decrement Y */
 | |
| .br
 | |
| "bne 1b"	/* shift $1+1 times */
 | |
| .br
 | |
| "bcc $3"	/* no carry, so bit is zero */
 | |
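.PP
The shift loop leaves the tested bit in the carry flag.
A C sketch of the same idea is given below; the variable carry stands in
for the processor's carry flag, and the function name is an assumption
made for this example.

```c
#include <stdint.h>

/* Test bit `bit` (0..15) of a 16-bit set by shifting right bit+1 times
 * and looking at the last bit shifted out, which plays the role of the
 * carry flag after the lsr/ror loop. Returns 1 if the bit is set. */
int set_has_bit(uint16_t set, unsigned bit)
{
    unsigned carry = 0;

    for (unsigned count = bit + 1; count != 0; count--) {
        carry = set & 1u;
        set >>= 1;
    }
    return (int)carry;
}
```

The zeq part of the pattern then branches when this result is zero.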
| .NH 2
 | |
| Array instructions.
 | |
| .PP
 | |
| In this group a typical EM pattern is
 | |
| .B
 | |
| lae lar
 | |
| .R
 | |
| defined(rom(1,3)) | | | |
 | |
| .B
 | |
| lae
 | |
| .R
 | |
| $1
 | |
| .B
 | |
| aar
 | |
| .R
 | |
| $2
 | |
| .B
 | |
| loi
 | |
| .R
 | |
| rom(1,3).
 | |
| This pattern uses the 
 | |
| .B
 | |
| aar
 | |
| .R
 | |
| instruction, which is part of a typical EM pattern:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| lae aar
 | |
| .R
 | |
| $2==2 && rom(1,3)==2 && rom(1,1)==0 | R16 | /* registerpair AX contains
 | |
| the index in the array */
 | |
| .br
 | |
| "pha"		/* save highbyte of index */
 | |
| .br
 | |
| "txa"		/* move lowbyte of index to A */
 | |
| .br
 | |
| "asl a"		/* shift left lowbyte == 2 times lowbyte */
 | |
| .br
 | |
| "tax"		/* restore lowbyte */
 | |
| .br
 | |
| "pla"		/* restore highbyte */
 | |
| .br
 | |
| "rol a"		/* rotate left highbyte == 2 times highbyte */
 | |
| .br
 | |
| | %[1] | adi 2 | /* push new index, add to lowerbound array */
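.PP
The scaling of the index by the element size of 2 is a 16-bit left shift
done in two 8-bit steps.
The following C sketch models this; the function name and the explicit
carry variable are illustrative assumptions.

```c
#include <stdint.h>

/* Double a 16-bit index held as separate low and high bytes, as the
 * asl/rol pair does: the bit shifted out of the lowbyte becomes the
 * carry that is rotated into bit 0 of the highbyte. */
void double_index(uint8_t *low, uint8_t *high)
{
    uint8_t carry = (uint8_t)(*low >> 7);      /* bit 7 leaves the lowbyte */

    *low  = (uint8_t)(*low << 1);              /* asl on the lowbyte */
    *high = (uint8_t)((*high << 1) | carry);   /* rol on the highbyte */
}
```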
 | |
| .NH 2
 | |
| Compare instructions.
 | |
| .PP
 | |
| In this group all EM patterns are performed by calling
 | |
| a subroutine.
 | |
| Subroutines are used here because comparison is only
 | |
| possible byte by byte.
 | |
| This means a lot of code, and since compares are used frequently,
 | |
| a lot of in line code would be generated, thus reducing
 | |
| the space left for the software stack.
 | |
| These subroutines can be found in the library.
 | |
| .NH 2
 | |
| Branch instructions.
 | |
| .PP
 | |
| A typical branch instruction is
 | |
| .B
 | |
| beq.
 | |
| .R
 | |
| The table content for it is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| beq
 | |
| .R
 | |
| | R16 |
 | |
| .br
 | |
| "sta BRANCH+1"	/* save highbyte second operand in zero page */
 | |
| .br
 | |
| "stx BRANCH"	/* save lowbyte second operand in zero page */
 | |
| .br
 | |
| "jsr Pop"	/* pop the first operand */
 | |
| .br
 | |
| "cmp BRANCH+1" 	/* compare the highbytes */
 | |
| .br
 | |
| "bne 1f"	/* they are not equal, so go on */
 | |
| .br
 | |
| "cpx BRANCH"	/* compare the lowbytes */
 | |
| .br
 | |
| "beq $1\\n1:"	/* lowbytes are also equal, so branch */
 | |
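.PP
The byte by byte comparison for beq can be sketched in C as follows.
The function name and the argument order are chosen for this example
only.

```c
#include <stdint.h>

/* 16-bit equality test done byte by byte, as in the beq sequence:
 * compare the highbytes first and look at the lowbytes only when
 * the highbytes match. Returns 1 when the branch would be taken. */
int equal16(uint8_t a_low, uint8_t a_high, uint8_t b_low, uint8_t b_high)
{
    if (a_high != b_high)
        return 0;               /* corresponds to "bne 1f" */
    return a_low == b_low;      /* corresponds to "beq $1" */
}
```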
| .PP
 | |
| Another typical instruction in this group is
 | |
| .B
 | |
| zeq.
 | |
| .R
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| zeq
 | |
| .R
 | |
| | R16 |
 | |
| .br
 | |
| "tay"		/* move A to Y for setting testbits */
 | |
| .br
 | |
| "bne 1f"	/* highbyte is not zero, so go on */
 | |
| .br
 | |
| "txa"		/* move X to A for setting testbits */
 | |
| .br
 | |
| "beq $1\\n1:"	/* lowbyte also zero, thus branch */
 | |
| .NH 2
 | |
| Procedure call instructions.
 | |
| .PP
 | |
| In this group one code generation might seem a little
 | |
| awkward.
 | |
| It is the EM instruction
 | |
| .B
 | |
| cai
 | |
| .R
 | |
| which generates a 'jsr Indir'.
 | |
| This is because there is no indirect jump-to-subroutine instruction in the
 | |
| MCS6500.
 | |
| The only solution is to store the address in zero page, and then
 | |
| do a 'jsr' to a known label.
 | |
| At this label there must be an indirect jump instruction, which
 | |
| performs a jump to the address stored in zero page.
 | |
| In this case the label is Indir, and the address is stored in
 | |
| zero page at the addresses ADDR, ADDR+1.
 | |
| The table content is:
 | |
| .sp 1
 | |
| .br
 | |
| .B
 | |
| cai
 | |
| .R
 | |
| | R16 |
 | |
| .br
 | |
| "stx ADDR"	/* store lowbyte of address in zero page */
 | |
| .br
 | |
| "sta ADDR+1"	/* store highbyte of address in zero page */
 | |
| .br
 | |
| "jsr Indir"	/* use the indirect jump */
 | |
| .br
 | |
| | | |
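.PP
The same trick can be expressed in C with a function pointer that is
stored in a fixed place and called through a fixed routine.
This is only an analogy to the Indir mechanism; all names in the sketch
are hypothetical.

```c
/* Hypothetical C analogue of the Indir trampoline. */
typedef int (*routine)(void);

static routine addr;            /* plays the role of ADDR, ADDR+1 */

/* The known label: it only jumps through the stored address. */
static int indir(void)
{
    return addr();
}

/* Store the target address, then call the known label (as cai does). */
int call_indirect(routine target)
{
    addr = target;
    return indir();
}

/* Example target used to try the mechanism out. */
int forty_two(void)
{
    return 42;
}
```

Calling call_indirect(forty_two) returns 42, just as a direct call to
forty_two would.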
 | |
| .NH 2
 | |
| Miscellaneous instructions.
 | |
| .PP
 | |
| In this group, as the name suggests, there is no
 | |
| typical EM instruction or EM pattern.
 | |
| Most of the MCS6500 code to be generated uses a library subroutine
 | |
| or is straightforward.
 | |
| .DS C
 | |
| .B
 | |
| PERFORMANCE.
 | |
| .R
 | |
| .DE
 | |
| .NH 0
 | |
| Introduction.
 | |
| .PP
 | |
| To measure the performance of the back end table some timing
 | |
| tests were done.
 | |
| The execution times of several Pascal statements were measured.
 | |
| Statements in C which have a Pascal equivalent were timed as well.
 | |
| The statements were timed as follows.
 | |
| A test program was written which executes two
 | |
| nested for_loops, each running from 1 to 1000.
 | |
| The statement to be tested is placed within these for_loops,
 | |
| so the statement is executed 1,000,000 times.
 | |
| Then the same program is executed without the test statement.
 | |
| The time difference between the two executions is the time
 | |
| necessary to execute the test statement 1,000,000 times.
 | |
| The time to execute the test statement once is thus the
 | |
| time difference divided by 1,000,000.
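.PP
The arithmetic behind this measurement is small enough to write down.
The C sketch below is an illustration; the function name and the choice
of seconds as input unit are assumptions, not part of the test programs.

```c
/* Time per execution in microseconds, given the run time in seconds of
 * the loops with the test statement, the run time of the empty loops,
 * and the number of executions (1,000,000 for two nested loops of 1000). */
double time_per_statement_us(double with_stmt_s, double empty_s,
                             long executions)
{
    return (with_stmt_s - empty_s) * 1e6 / (double)executions;
}
```

With a difference of 2 seconds between the two runs and 1,000,000
executions, the function returns 2 microseconds per statement.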
 | |
| .NH 0
 | |
| Testing Pascal statements.
 | |
| .PP
 | |
| The following statements were tested.
 | |
| .IP 1)
 | |
| int1 := 0;
 | |
| .IP 2)
 | |
| int1 := int2 - 1;
 | |
| .IP 3)
 | |
| int1 := int1 + 1;
 | |
| .IP 4)
 | |
| int1 := icon1 + icon2;
 | |
| .IP 5)
 | |
| int1 := icon2 div icon1;
 | |
| .IP 6)
 | |
| int1 := int2 * int3;
 | |
| .IP 7)
 | |
| bool := (int1 < 0);
 | |
| .IP 8)
 | |
| bool := (int1 < 3);
 | |
| .IP 9)
 | |
| bool := ((int1 > 3) or (int1 < 3))
 | |
| .IP 10)
 | |
| case int1 of 1: bool := false; 2: bool := true end;
 | |
| .IP 11)
 | |
| if int1 = 0 then int2 := 3;
 | |
| .IP 12)
 | |
| while int1 > 0 do int1 := int1 - 1;
 | |
| .IP 13)
 | |
| m := a[k];
 | |
| .IP 14)
 | |
| let2 := ['a'..'c'];
 | |
| .IP 15)
 | |
| P3(x);
 | |
| .IP 16)
 | |
| dum := F3(x);
 | |
| .IP 17)
 | |
| s.overhead := 5400;
 | |
| .IP 18)
 | |
| with s do overhead := 5400;
 | |
| .PP
 | |
| These statements were tested in a procedure test.
 | |
| .sp 1
 | |
| .br
 | |
| procedure test;
 | |
| .br
 | |
| var i, j, ... : integer;
 | |
| .br
 | |
|     bool : boolean;
 | |
| .br
 | |
|     let2 : set of char;
 | |
| .br
 | |
| begin
 | |
| .br
 | |
|     for i := 1 to 1000 do
 | |
| .br
 | |
| 	for j := 1 to 1000 do
 | |
| .br
 | |
| 	    STATEMENT
 | |
| .br
 | |
| end;
 | |
| .sp 1
 | |
| .PP
 | |
| STATEMENT is one of the statements as shown above, or it is
 | |
| the empty statement.
 | |
| The assignment of used variables, if necessary, is done before
 | |
| the first for_loop.
 | |
| In case of the statement which uses the procedure call, statement
 | |
| 15, a dummy procedure is declared whose body is empty.
 | |
| In case of the statement which uses the function, statement 16,
 | |
| this function returns its argument.
 | |
| For the timing of C statements a similar test program was
 | |
| written.
 | |
| .sp 1
 | |
| .br
 | |
| main()
 | |
| .br
 | |
| {
 | |
| .br
 | |
|     int i, j, ...;
 | |
| .br
 | |
|     for (i = 1; i <= 1000; i++)
 | |
| .br
 | |
| 	for (j = 1; j <= 1000; j++)
 | |
| .br
 | |
| 	    STATEMENT
 | |
| .br
 | |
| }
 | |
| .sp 1
 | |
| .NH
 | |
| The results.
 | |
| .PP
 | |
| Here are tables with the results of the time measurements.
 | |
| Times are in microseconds (10^-6 seconds).
 | |
| Some statements appear twice in the tables.
 | |
| In the second case an array of 200 integers was declared
 | |
| before the variable to be tested, so this variable cannot
 | |
| be accessed by indirect addressing from the second local base.
 | |
| This results in a larger execution time of the statement to be
 | |
| tested.
 | |
| The column 68000 contains the times measured on a Bleasdale,
 | |
| M68000 based, computer.
 | |
| The times in column pdp were measured on a DEC pdp11/44, while
 | |
| the times in column 6500 come from a BBC microcomputer.
 | |
| .bp
 | |
| .TS
 | |
| expand;
 | |
| c s s s
 | |
| c c c c
 | |
| lw35 nw7 nw7 nw7.
 | |
| Pascal timing results
 | |
| statement	68000	pdp	6500
 | |
| _
 | |
| T{
 | |
| int1 := 0;
 | |
| T}	4.0	5.8	16.7
 | |
|  	4.0	4.2	97.8
 | |
| _
 | |
| T{
 | |
| int1 := int2 - 1;
 | |
| T}	7.2	7.1	27.2
 | |
|  	6.9	7.1	206.5
 | |
| _
 | |
| T{
 | |
| int1 := int1 + 1;
 | |
| T}	6.9	6.8	27.2
 | |
|  	6.4	6.7	106.5
 | |
| _
 | |
| T{
 | |
| int1 := icon1 + icon2;
 | |
| T}	6.2	6.2	25.6
 | |
|  	6.2	6.0	106.6
 | |
| _
 | |
| T{
 | |
| int1 := icon2 div icon1;
 | |
| T}	14.9	14.3	372.6
 | |
|  	14.9	14.7	453.7
 | |
| _
 | |
| T{
 | |
| int1 := int2 * int3;
 | |
| T}	11.5	12.0	558.1
 | |
|  	11.3	11.6	728.6
 | |
| _
 | |
| T{
 | |
| bool := (int1 < 0);
 | |
| T}	7.2	6.9	122.8
 | |
|  	7.8	8.1	453.2
 | |
| _
 | |
| T{
 | |
| bool := (int1 < 3);
 | |
| T}	7.3	7.6	126.0
 | |
|  	7.2	8.1	232.2
 | |
| _
 | |
| T{
 | |
| bool := ((int1 > 3) or (int1 < 3))
 | |
| T}	10.1	12.0	307.8
 | |
|  	10.2	11.9	440.1
 | |
| _
 | |
| T{
 | |
| case int1 of 1: bool := false; 2: bool := true end;
 | |
| T}	18.3	17.9	165.7
 | |
| _
 | |
| T{
 | |
| if int1 = 0 then int2 := 3;
 | |
| T}	9.5	8.5	133.8
 | |
| _
 | |
| T{
 | |
| while int1 > 0 do int1 := int1 - 1;
 | |
| T}	6.9	6.9	126.0
 | |
| _
 | |
| T{
 | |
| m := a[k];
 | |
| T}	7.2	6.8	134.3
 | |
| _
 | |
| T{
 | |
| let2 := ['a'..'c'];
 | |
| T}	38.4	38.8	447.4
 | |
| _
 | |
| T{
 | |
| P3(x);
 | |
| T}	18.9	18.8	180.3
 | |
| _
 | |
| T{
 | |
| dum := F3(x);
 | |
| T}	26.8	27.1	343.3
 | |
| _
 | |
| T{
 | |
| s.overhead := 5400;
 | |
| T}	4.6	4.1	16.7
 | |
| _
 | |
| T{
 | |
| with s do overhead := 5400;
 | |
| T}	4.2	4.3	16.7
 | |
| .TE
 | |
| .TS
 | |
| expand;
 | |
| c s s s
 | |
| c c c c
 | |
| lw35 nw7 nw7 nw7.
 | |
| C timing results
 | |
| statement	68000	pdp	6500
 | |
| _
 | |
| T{
 | |
| int1 = 0;
 | |
| T}	4.1	3.6	17.2
 | |
|  	4.1	4.1	97.7
 | |
| _
 | |
| T{
 | |
| int1 = int2 - 1;
 | |
| T}	6.6	6.9	27.2
 | |
|  	6.1	6.5	206.4
 | |
| _
 | |
| T{
 | |
| int1 = int1 + 1;
 | |
| T}	6.4	7.3	27.2
 | |
|  	6.3	6.2	206.4
 | |
| _
 | |
| T{
 | |
| int1 = int2 * int3;
 | |
| T}	11.4	12.3	522.6
 | |
| 	9.6	10.1	721.2
 | |
| _
 | |
| T{
 | |
| int1 = (int2 < 0);
 | |
| T}	7.2	7.6	126.4
 | |
|  	7.4	7.7	232.5
 | |
| _
 | |
| T{
 | |
| int1 = (int2 < 3);
 | |
| T}	7.0	7.5	126.0
 | |
|  	7.8	7.8	232.6
 | |
| _
 | |
| T{
 | |
| int1 = ((int2 > 3) || (int2 < 3));
 | |
| T}	11.8	12.2	193.4
 | |
|  	11.5	13.2	245.6
 | |
| _
 | |
| T{
 | |
| switch (int1) { case 1: int1 = 0; break; case 2: int1 = 1; break; }
 | |
| T}	28.3	29.2	164.1
 | |
| _
 | |
| T{
 | |
| if (int1 == 0) int2 = 3;
 | |
| T}	4.8	4.8	19.4
 | |
| _
 | |
| T{
 | |
| while (int2 > 0) int2 = int2 - 1;
 | |
| T}	5.8	6.0	125.9
 | |
| _
 | |
| T{
 | |
| int2 = a[int2];
 | |
| T}	4.8	5.1	192.8
 | |
| _
 | |
| T{
 | |
| P3(int2);
 | |
| T}	18.8	18.4	180.3
 | |
| _
 | |
| T{
 | |
| int2 = F3(int2);
 | |
| T}	27.0	27.2	309.4
 | |
| _
 | |
| T{
 | |
| s.overhead = 5400;
 | |
| T}	5.0	4.1	16.7
 | |
| .TE
 | |
| .NH
 | |
| Pascal statements which don't have a C equivalent.
 | |
| .PP
 | |
| First, the two statements which perform an operation on constants
 | |
| are left out.
 | |
| These are left out because the C front end does constant folding,
 | |
| while the Pascal front end doesn't.
 | |
| So in C the statements int1 = icon1 + icon2; and int1 = icon1 / icon2;
 | |
| will use the same amount of time, since the expression is evaluated
 | |
| by the front end.
 | |
| The two other statements (let2 := ['a'..'c']; and
 | |
| .B
 | |
| with
 | |
| .R
 | |
| s
 | |
| .B
 | |
| do
 | |
| .R
 | |
| overhead := 5400;) aren't included in the C statement timing table,
 | |
| because these constructs do not exist in C.
 | |
| Although C allows direct bit manipulation, which could
 | |
| be used to implement sets, I have not used it here.
 | |
| The
 | |
| .B
 | |
| with
 | |
| .R
 | |
| statement does not exist in C and there is nothing with the slightest
 | |
| resemblance to it.
 | |
| .PP
 | |
| At first sight the tables suggest that there is not much difference
 | |
| between the times for the M68000 and the pdp11/44, in comparison with the
 | |
| times needed by the MCS6500.
 | |
| To verify this impression, I calculated the correlation coefficient
 | |
| between the times of the M68000 and pdp11/44.
 | |
| It turned out to be 0.997 for both the Pascal time tests and the C
 | |
| time tests.
 | |
| Since the correlation coefficient is close to one and the difference
 | |
| between the times is small, they can be considered to be the same
 | |
| when compared with the times of the MCS6500.
 | |
| Then I tried to plot the times of the M68000 against those of
 | |
| the MCS6500.
 | |
| Taking all the times together, there wasn't any correlation to be seen.
 | |
| The only correlation one could see, with some effort, was in the
 | |
| times for the first three Pascal statements.
 | |
| The first two C statements also show a correlation, as two points
 | |
| always do.
 | |
| .PP
 | |
| Also the three Pascal statements
 | |
| .B
 | |
| case
 | |
| .R
 | |
| ,
 | |
| .B
 | |
| if
 | |
| .R
 | |
| ,
 | |
| and
 | |
| .B
 | |
| while
 | |
| .R
 | |
| have a correlation coefficient of 0.999.
 | |
| This is probably because the
 | |
| .B
 | |
| case
 | |
| .R
 | |
| statement uses a subroutine in both cases and the other two
 | |
| statements
 | |
| .B
 | |
| if
 | |
| .R
 | |
| and
 | |
| .B
 | |
| while
 | |
| .R
 | |
| generate in line code.
 | |
| The last two Pascal statements use the same time, since the front
 | |
| end will generate the same EM code for both.
 | |
| .PP
 | |
| The independence between the rest of the test times is because
 | |
| in these cases the object code for the MCS6500 uses library
 | |
| subroutines, while the other processors can handle the EM code
 | |
| with in line code.
 | |
| .PP
 | |
| It is clear that the MCS6500 is a slower device: it needs longer
 | |
| execution times and more library subroutines, but
 | |
| there is no constant factor between its execution times and those
 | |
| of the other processors.
 | |
| .PP
 | |
| The slowing down of the MCS6500 as a result of the need for a
 | |
| library subroutine is illustrated by the multiplication
 | |
| statement.
 | |
| The MCS6500 needs a library subroutine, while the other
 | |
| two processors have a machine instruction to perform the
 | |
| multiply.
 | |
| This results in a factor of 48.5 relative to the M68000 (558.1 against
 | |
| 11.5 microseconds in Pascal), when the operands can be accessed
 | |
| indirectly by the MCS6500.
 | |
| When the MCS6500 cannot access the operands indirectly the situation
 | |
| is even worse.
 | |
| The slight differences between the MCS6500 execution times for
 | |
| Pascal statements and C statements are probably the result of the
 | |
| front end, and thus beyond the scope of this discussion.
 | |
| .PP
 | |
| Another timing test is done in C on the statement k = i + j + 1983.
 | |
| This statement is tested on many UNIX*
 | |
| .FS
 | |
| * UNIX is a Trademark of Bell Laboratories.
 | |
| .FE
 | |
| systems.
 | |
| For a complete list see appendix A.
 | |
| The slowest one is the IBM XT, which runs on an 8088 microprocessor.
 | |
| The fastest one is the Amdahl computer.
 | |
| Here is a short table to illustrate the performance of the
 | |
| MCS6500.
 | |
| .TS
 | |
| c c c
 | |
| c n n.
 | |
| machine	short	int
 | |
| IBM XT	53.4	53.4
 | |
| Amdahl	0.5	0.3
 | |
| MCS6500	150.2	150.2
 | |
| .TE
 | |
| The MCS6500 is about three times slower than the IBM XT, but three hundred
 | |
| times slower than the Amdahl.
 | |
| The reason why the times on the IBM XT and the MCS6500 are the
 | |
| same for shorts and ints is that most C compilers make the types
 | |
| short and int the same size on 16-bit machines.
 | |
| In this project the MCS6500 is regarded as a 16-bit machine.
 | |
| .NH
 | |
| Length tests.
 | |
| .PP
 | |
| I have also compiled several programs written in Pascal and C to
 | |
| see if there is a resemblance between the number of bytes generated
 | |
| in the machine's language.
 | |
| In the tables:
 | |
| .IP length: 9
 | |
| The number of bytes of the source program.
 | |
| .IP 68000:
 | |
| The number of bytes of the a.out file for a M68000.
 | |
| .IP pdp:
 | |
| The number of bytes of the a.out file for a pdp11/44.
 | |
| .IP 6500:
 | |
| The number of bytes of the a.out file for a MCS6500.
 | |
| .LP
 | |
| These are the results:
 | |
| .TS
 | |
| c s s s
 | |
| c c c c
 | |
| n n n n.
 | |
| Pascal programs
 | |
| length	68000	pdp	6500
 | |
| _
 | |
| 19946	14383	16090	26710
 | |
| 19484	20169	20190	35416
 | |
| 10849	10469	11464	18949
 | |
| 273	4221	5106	7944
 | |
| 1854	5807	6610	10301
 | |
| .TE
 | |
| .TS
 | |
| c s s s
 | |
| c c c c
 | |
| n n n n.
 | |
| C programs
 | |
| length	68000	pdp	6500
 | |
| _
 | |
| 9444	6927	8234	11559
 | |
| 7655	14353	18240	26251
 | |
| 4775	11309	15934	19910
 | |
| 639	6337	9660	12494
 | |
| .TE
 | |
| .PP
 | |
| In contrast to the execution times of the test statements, the
 | |
| object code file sizes show a constant factor between them.
 | |
| After calculating the correlation coefficient, I have calculated
 | |
| the line fitted between the sizes.
 | |
| .FS
 | |
| * x is the number of bytes
 | |
| .FE
 | |
| .TS
 | |
| c s s
 | |
| c c c
 | |
| l c c.
 | |
| Pascal programs
 | |
| processor	corr. coef.	fitted line
 | |
| _
 | |
| 68000-pdp	0.996	 
 | |
| 68000-6500	0.999	1.76x + 502*
 | |
| pdp-6500	0.999	1.80x - 1577
 | |
| .TE
 | |
| .TS
 | |
| c s s
 | |
| c c c
 | |
| l c c.
 | |
| C programs
 | |
| processor	corr. coef.	fitted line
 | |
| _
 | |
| 68000-pdp	0.974	 
 | |
| 68000-6500	0.992	1.80x + 502*
 | |
| pdp-6500	0.980	1.40x - 1577
 | |
| .TE
 | |
| .PP
 | |
| As seen from the tables above, the correlation coefficients for
 | |
| Pascal programs are better than the ones for C programs.
 | |
| Thus the line fits best for Pascal programs.
 | |
| With the formula of the best fitted line one can now estimate
 | |
| the size of the object code which a program needs on a MCS6500
 | |
| without having the compiler at hand.
 | |
| One can also see from these formulas that the object code
 | |
| generated for a MCS6500 is about 1.8 times larger than for the other
 | |
| processors.
 | |
| Since the number of bytes in the source file heavily depends on the
 | |
| programmer (how many spaces he or she uses, the size of the indenting
 | |
| in structured programs, etc.), there is no correlation between the
 | |
| size of the source file and the size of the object file.
 | |
| The use of comments also influences the size.
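.PP
A fitted line of this kind can be recomputed from the size tables.
The text does not state which fitting method was used; the C sketch below
assumes an ordinary least-squares fit, and the type and function names
are invented for this example.
It returns the square of the correlation coefficient, so that no math
library is needed.

```c
#include <stddef.h>

/* Result of fitting y = slope * x + intercept. */
struct fit {
    double slope;
    double intercept;
    double r_squared;   /* square of the correlation coefficient */
};

/* Ordinary least-squares fit of y against x over n points (n >= 2,
 * with at least two distinct x values and two distinct y values). */
struct fit fit_line(const double *x, const double *y, size_t n)
{
    double mean_x = 0.0, mean_y = 0.0;
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    struct fit result;

    for (size_t i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;

    for (size_t i = 0; i < n; i++) {
        double dx = x[i] - mean_x;
        double dy = y[i] - mean_y;
        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }

    result.slope = sxy / sxx;
    result.intercept = mean_y - result.slope * mean_x;
    result.r_squared = (sxy * sxy) / (sxx * syy);
    return result;
}

/* Object file sizes from the Pascal table: columns 68000 and 6500. */
static const double size_68000[] = { 14383, 20169, 10469, 4221, 5807 };
static const double size_6500[]  = { 26710, 35416, 18949, 7944, 10301 };
```

Fitting the 6500 sizes against the 68000 sizes of the Pascal table in
this way gives a slope of about 1.76 and an intercept of about 502, in
agreement with the fitted line given for that pair.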
 | |
| .bp
 | |
| .DS C
 | |
| .B
 | |
| SUMMARY.
 | |
| .R
 | |
| .DE
 | |
| .NH 0
 | |
| Summary
 | |
| .PP
 | |
| In this chapter some final conclusions are made.
 | |
| .PP
 | |
| In spite of its simplicity, the MCS6500 is strong enough to
 | |
| implement an EM machine.
 | |
| A serious deficiency of the MCS6500 is the lack of 16-bit
 | |
| general purpose registers, and especially the lack of a
 | |
| 16-bit stackpointer.
 | |
| As pointed out before, one 16-bit register can be simulated
 | |
| by a pair of 8-bit registers: the accumulator A
 | |
| holds the highbyte, and the index register X holds the lowbyte
 | |
| of the word.
 | |
| For lack of a 16-bit stackpointer, zero page must be used to hold
 | |
| a stackpointer, and two subroutines are needed for
 | |
| manipulating the stack (Push and Pop).
 | |
| .PP
 | |
| As seen in the timing tests, the simple instruction set of the
 | |
| MCS6500 forces the use of library subroutines.
 | |
| These library subroutines increase the execution time of the
 | |
| programs.
 | |
| .PP
 | |
| The sizes of the object code files show a strong correlation
 | |
| in contrast to the execution times.
 | |
| With this correlation one can estimate the size of a program
 | |
| if it is to be used on a MCS6500.
 | |
| .bp
 | |
| .NH 0
 | |
| .B
 | |
| REFERENCES.
 | |
| .R
 | |
| .IP 1.
 | |
| Osborne, A., Jacobson, S., and Kane, J. The MOS Technology MCS6500.
 | |
| .B
 | |
| An Introduction to Microcomputers,
 | |
| .R
 | |
| Volume II, Some Real Products (June 1977) chap. 9.
 | |
| .RS
 | |
| .PP
 | |
| A hardware description of some existing CPUs, such as
 | |
| the Zilog Z80, MCS6500, etc. is given in this book.
 | |
| .RE
 | |
| .IP 2.
 | |
| van Staveren, H.
 | |
| The table driven code generator from the Amsterdam Compiler Kit.
 | |
| Vrije Universiteit, Amsterdam, (July 11, 1983).
 | |
| .RS
 | |
| .PP
 | |
| The defining document for writing a back end table.
 | |
| .RE
 | |
| .IP 3.
 | |
| Tanenbaum, A.S. Structured Computer Organization.
 | |
| Prentice Hall. (1976).
 | |
| .RS
 | |
| .PP
 | |
| In this book computers are described as a hierarchy of levels,
 | |
| with each one performing some well-defined function.
 | |
| .RE
 |