. \" $Id$"
|
|
.RP
|
|
.ND Dec 1984
|
|
.TL
|
|
.B
|
|
A backend table for the 6500 microprocessor
|
|
.R
|
|
.AU
|
|
Jan van Dalen
|
|
.AB
|
|
The backend table is part of the Amsterdam Compiler Kit (ACK).
|
|
It translates the intermediate language family EM to a machine
|
|
code for the MCS6500 microprocessor family.
|
|
.AE
|
|
.bp
|
|
.DS C
|
|
.B
|
|
THE MCS6500 MICROPROCESSOR.
|
|
.R
|
|
.DE
|
|
.NH 0
|
|
Introduction
|
|
.PP
|
|
Why write a back end table for the MCS6500 microprocessor family?
|
|
Although the MCS6500 microprocessor family has a simple
instruction set and internal structure, it is used in a
variety of microcomputers and homecomputers.
This is because of its low cost.
|
|
As an example, the Apple II, a well-known and widespread
microcomputer, uses the MCS6502 CPU.
|
|
The BBC homecomputer, whose popularity is growing day
by day, also uses the MCS6502 CPU.
|
|
The BBC homecomputer is based on the MCS6502 CPU although
|
|
better and stronger microprocessors are available.
|
|
The designers of Acorn computer Industries have probably
|
|
chosen the MCS6502 because of the amount of software
|
|
available for this CPU.
|
|
Because of its widespread use, a variety of software
will be needed for it.
|
|
One can think of games, administration programs,
teaching programs, BASIC interpreters and other application
|
|
programs.
|
|
Even though it will not be possible to run the complete compiler kit
on an MCS6500 based computer, it is possible to write application
|
|
programs in a high level language, such as Pascal or C on a
|
|
minicomputer.
|
|
These application programs can be tested and compiled on that
|
|
minicomputer and put in a ROM (Read Only Memory), for example,
|
|
so that they can be executed by an MCS6500 CPU.
|
|
The strategy of writing test programs on a minicomputer,
compiling them and then executing them on an MCS6500 based
microcomputer was used during the development of the back end.
|
|
The minicomputer used is an M68000 based one, manufactured by
Bleasdale Computer Systems Ltd.
|
|
The micro- or homecomputer used is a BBC microcomputer,
|
|
manufactured by Acorn Computer Ltd.
|
|
.NH
|
|
The MOS Technology MCS6500
|
|
.PP
|
|
The MCS6500 is a family of CPU devices developed by MOS
|
|
Technology [1].
|
|
The members of the MCS6500 family are the same chips in a
|
|
different housing.
|
|
The MCS6502, the big brother in the family, can handle 64k
|
|
bytes of memory, while for example the MCS6504 can only handle
|
|
8k bytes of memory.
|
|
This difference is due to the fact that the MCS6502 is in a
40-pin package and the MCS6504 is in a 28-pin package, so fewer
address lines are available.
|
|
.bp
|
|
.NH
|
|
The MCS6500 CPU programmable registers
|
|
.PP
|
|
The MCS6500 series is based on the same chip so all have the
|
|
same programmable registers.
|
|
.sp 9
|
|
.NH 2
|
|
The accumulator A.
|
|
.PP
|
|
The accumulator A is the only register on which the arithmetic
|
|
and logical instructions can be used.
|
|
For example, the instruction ADC (add with carry) adds the
|
|
contents of the accumulator A and a byte from memory or immediate data,
together with the carry flag.
|
|
.NH 2
|
|
The index register X.
|
|
.PP
|
|
As the name suggests this register can be used for some
|
|
indirect addressing modes.
|
|
The modes are explained below.
|
|
.NH 2
|
|
The index register Y.
|
|
.PP
|
|
This register is, just as the index register X, used for
|
|
certain indirect addressing modes.
|
|
These addressing modes are different from the modes which
|
|
use index register X.
|
|
.NH 2
|
|
The program counter PC
|
|
.PP
|
|
This is the only 16-bit register available.
|
|
It is used to point to the next instruction to be
|
|
carried out.
|
|
.NH 2
|
|
The stack pointer SP
|
|
.PP
|
|
The stack pointer is an 8-bit register, so the stack can contain
|
|
at most 256 bytes.
|
|
The CPU always supplies 00000001 as the highbyte of any stack address,
|
|
which means that memory locations
|
|
.B
|
|
0100
|
|
.R
|
|
through
|
|
.B
|
|
01FF
|
|
.R
|
|
are permanently assigned to the stack.
|
|
.sp 12
|
|
.NH 2
|
|
The status register
|
|
.PP
|
|
The status register maintains six status flags and a master
|
|
interrupt control bit.
|
|
.br
|
|
These are the six status flags:
|
|
Carry (c)
|
|
Zero (z)
|
|
Overflow (o)
|
|
Sign (n)
|
|
Decimal mode (d)
|
|
Break (b)
|
|
|
|
|
|
|
|
|
|
|
|
The bit (i) is the master interrupt control bit.
|
|
.NH
|
|
The MCS6500 memory layout.
|
|
.PP
|
|
In the MCS6500 memory space three areas have special meaning.
These areas are:
|
|
.IP 1)
|
|
Top page.
|
|
.IP 2)
|
|
Zero page.
|
|
.IP 3)
|
|
The stack.
|
|
.PP
|
|
MCS6500 memory is divided up into pages.
|
|
Each page consists of 256 bytes.
|
|
So in a memory address the highbyte denotes the page number
|
|
and the lowbyte the offset within the page.
|
|
.NH 2
|
|
Top page.
|
|
.PP
|
|
When an MCS6500 is restarted it jumps indirectly via memory address
|
|
.B
|
|
FFFC.
|
|
.R
|
|
At
|
|
.B
|
|
FFFC
|
|
.R
|
|
(lowbyte) and
|
|
.B
|
|
FFFD
|
|
.R
|
|
(highbyte) there must be the address of the bootstrap subroutine.
|
|
When a break instruction (BRK) occurs or an interrupt takes place,
|
|
the MCS6500 jumps indirectly through memory address
|
|
.B
|
|
FFFE.
|
|
.R
|
|
.B
|
|
FFFE
|
|
.R
|
|
and
|
|
.B
|
|
FFFF
|
|
.R
|
|
thus must contain the address of the interrupt routine.
|
|
The above only applies to the maskable interrupt.
There also exists a nonmaskable interrupt.
This causes the MCS6500 to jump indirectly through memory address
|
|
.B
|
|
FFFA.
|
|
.R
|
|
So the top six bytes of memory are used by the operating system
|
|
and therefore not available for the back end.
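.PP
The way the CPU fetches an address through one of these vectors can be
sketched in Python as follows.
This is only an illustration: memory is modelled as a bytearray, and the
function name read_vector and the bootstrap address D9CD are hypothetical.
.DS
```python
# Vector addresses named in the text above.
NMI_VECTOR = 0xFFFA    # nonmaskable interrupt
RESET_VECTOR = 0xFFFC  # restart
IRQ_VECTOR = 0xFFFE    # BRK and maskable interrupt


def read_vector(memory, address):
    # The lowbyte is stored first, the highbyte at the next address.
    return memory[address] | (memory[address + 1] << 8)


memory = bytearray(0x10000)
# Hypothetical bootstrap routine at D9CD (hexadecimal).
memory[RESET_VECTOR] = 0xCD
memory[RESET_VECTOR + 1] = 0xD9
start = read_vector(memory, RESET_VECTOR)
```
.DE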
|
|
.NH 2
|
|
Zero page.
|
|
.PP
|
|
This page has a special meaning in the sense that addressing
|
|
this page uses special opcodes.
|
|
Since a page consists of 256 bytes, only one byte is needed
|
|
for addressing zero page.
|
|
So an instruction which uses zero page occupies two bytes.
|
|
It also uses fewer clock cycles while carrying out the instruction.
|
|
Zero page is also needed when indirect addressing is used.
|
|
This means that when indirect addressing is used, the address must
|
|
reside in zero page (two consecutive bytes).
|
|
In this case (the back end), zero page is used, for example
|
|
to hold the local base, the second local base, the stack pointer
|
|
etc.
|
|
.NH 2
|
|
The stack.
|
|
.PP
|
|
The stack is described in paragraph 3.5 about the MCS6500
|
|
programmable registers.
|
|
.NH
|
|
The memory addressing modes
|
|
.PP
|
|
MCS6500 memory reference instructions use direct addressing,
|
|
indexed addressing, and indirect addressing.
|
|
.NH 2
|
|
Direct addressing.
|
|
.PP
|
|
Three-byte instructions use the second and third bytes of the
|
|
object code to provide a direct 16-bit address:
|
|
therefore, 65,536 bytes of memory can be addressed directly.
|
|
The commonly used memory reference instructions also have a two-byte
|
|
object code variation, where the second byte directly addresses
|
|
one of the first 256 bytes.
|
|
.NH 2
|
|
Base page, indexed addressing.
|
|
.PP
|
|
In this case, the instruction has two bytes of object code.
|
|
The contents of either the X or Y index registers are added to the
|
|
second object code byte in order to compute a memory address.
|
|
This may be illustrated as follows:
|
|
.sp 15
|
|
Base page, indexed addressing, as illustrated above, is
|
|
wraparound - which means that there is no carry.
|
|
If the sum of the index register and second object code byte contents
|
|
is more than
|
|
.B
|
|
FF
|
|
.R
|
|
, the carry bit will be discarded.
|
|
This may be illustrated as follows:
|
|
.sp 9
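.PP
The wraparound rule can be sketched in Python as follows.
This is only an illustration; the function name zero_page_indexed is
hypothetical.
.DS
```python
def zero_page_indexed(operand, index):
    # Both values are 8 bits wide; the carry out of bit 7 is discarded,
    # so the result always lies in the first 256 bytes of memory.
    return (operand + index) & 0xFF


no_wrap = zero_page_indexed(0x80, 0x10)   # stays below FF
wrapped = zero_page_indexed(0xF0, 0x20)   # sum exceeds FF, carry dropped
```
.DE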
|
|
.NH 2
|
|
Absolute indexed addressing.
|
|
.PP
|
|
In this case, the contents of either the X or Y register are added
|
|
to a 16-bit direct address provided by the second and third bytes
|
|
of an instruction's object code.
|
|
This may be illustrated as follows:
|
|
.sp 10
|
|
.NH 2
|
|
Indirect addressing.
|
|
.PP
|
|
Instructions that use simple indirect addressing have three bytes of
|
|
object code.
|
|
The second and third object code bytes provide a 16-bit address;
|
|
therefore, the indirect address can be located anywhere in
|
|
memory.
|
|
This is straightforward indirect addressing.
|
|
.NH 3
|
|
Pre-indexed indirect addressing.
|
|
.PP
|
|
In this case, the object code consists of two bytes and the
|
|
second object code byte provides an 8-bit address.
|
|
Instructions that use pre-indexed indirect addressing add the contents
|
|
of the X index register and the second object code byte to access
|
|
a memory location in the first 256 bytes of memory, where the
|
|
indirect address will be found:
|
|
.sp 18
|
|
When using pre-indexed indirect addressing, once again wraparound
|
|
addition is used, which means that when the X index register contents
|
|
are added to the second object code byte, any carry will be discarded.
|
|
Note that only the X index register can be used with pre-indexed
|
|
addressing.
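.PP
A minimal Python sketch of the pre-indexed address calculation is given
here.
Memory is modelled as a bytearray and the function name is hypothetical.
That the highbyte of the pointer is also fetched with wraparound is a
property of the CPU that is not stated in the text above.
.DS
```python
def pre_indexed_indirect(memory, operand, x):
    # Add X to the second object code byte, discarding any carry.
    pointer = (operand + x) & 0xFF
    low = memory[pointer]
    # The highbyte fetch wraps within the first 256 bytes as well.
    high = memory[(pointer + 1) & 0xFF]
    return low | (high << 8)


memory = bytearray(0x10000)
memory[0x24] = 0x74   # lowbyte of the indirect address
memory[0x25] = 0x20   # highbyte of the indirect address
address = pre_indexed_indirect(memory, 0x20, 0x04)
```
.DE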
|
|
.NH 3
|
|
Post-indexed indirect addressing.
|
|
.PP
|
|
In this case, the object code consists of two bytes and the
|
|
second object code byte provides an 8-bit address.
|
|
Now the second object code byte identifies a location
|
|
in the first 256 bytes of memory where an indirect address
|
|
will be found.
|
|
The contents of the Y index register are added to this indirect
|
|
address.
|
|
This may be illustrated as follows:
|
|
.sp 18
|
|
Note that only the Y index register can be used with post-indexed
|
|
indirect addressing.
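.PP
The post-indexed calculation can be sketched in the same way.
Again this is only an illustration with a hypothetical function name and
memory modelled as a bytearray.
.DS
```python
def post_indexed_indirect(memory, operand, y):
    # The second object code byte names a pointer in the first 256 bytes.
    low = memory[operand]
    high = memory[(operand + 1) & 0xFF]
    base = low | (high << 8)
    # Y is added to the 16-bit indirect address, not to the operand.
    return (base + y) & 0xFFFF


memory = bytearray(0x10000)
memory[0x86] = 0x28   # lowbyte of the indirect address
memory[0x87] = 0x40   # highbyte of the indirect address
address = post_indexed_indirect(memory, 0x86, 0x10)
```
.DE
.PP
Because Y is an unsigned 8-bit value, only offsets 0 through 255 above the
indirect address can be reached.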
|
|
.bp
|
|
.NH
|
|
What the CPU has and doesn't have.
|
|
.PP
|
|
Although the designers of the MCS6500 CPU family state that
there is nothing very significant about the short stack (only
256 bytes), this stack caused problems for the back end.
The designers say that a 256-byte stack usually is sufficient
for any typical microcomputer, but this is only true if the stack
is used only for return addresses of the JSR (jump to
subroutine) instruction.
|
|
But since the EM machine is supposed to be a stack machine and
high level languages need parameters and
locals in their procedures and functions, this short stack
is insufficient.
|
|
So a software stack is implemented in this back end, requiring two
|
|
additional subroutines for stack handling.
|
|
These two stack handling subroutines slow down the processing time
|
|
of a program since the stack is used heavily.
|
|
.PP
|
|
Since parameters and locals of EM procedures are addressed by an offset
from the local base of that procedure, indirect addressing
is heavily used.
|
|
Offsets are positive (for parameters) and negative (for
|
|
local variables).
|
|
As explained in the section on addressing modes, the MCS6500 has a
post-indexed indirect addressing mode.
This addressing mode can only handle positive offsets.
This raises a problem for accessing the local variables.
I have chosen the following solution.
|
|
A second local base is introduced.
|
|
This second local base is the real local base minus
a constant BASE.
|
|
In the present situation of the back end the value of BASE
|
|
is 240.
|
|
This means that there are 240 bytes reserved for local
|
|
variables to be indirect addressed and 14 bytes for
|
|
the parameters.
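.PP
The arithmetic behind the second local base can be sketched in Python as
follows.
This is only an illustration under stated assumptions: the function names
are hypothetical, and in_range is a stand-in for the range test used in the
code rules, built from the 240 and 14 byte figures given above.
.DS
```python
BASE = 240        # constant from the text
PARAM_BYTES = 14  # bytes of parameters reachable, from the text


def second_local_base(local_base):
    return (local_base - BASE) & 0xFFFF


def y_offset(offset):
    # Value loaded into Y; never negative for a reachable offset.
    return BASE + offset


def in_range(offset, size=2):
    # Hypothetical range test: the whole object must lie inside the
    # 240 bytes of locals plus the 14 bytes of parameters.
    return -BASE <= offset and offset + size <= PARAM_BYTES


def effective_address(local_base, offset):
    # (LB - BASE) + (BASE + offset) is LB + offset again.
    return (second_local_base(local_base) + y_offset(offset)) & 0xFFFF
```
.DE
.PP
With a local base of 4000 (hexadecimal), the local at offset -6 is reached
with Y equal to 234, and the parameter at offset 4 with Y equal to 244.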
|
|
.DS C
|
|
.B
|
|
THE CODE GENERATOR.
|
|
.R
|
|
.DE
|
|
.NH 0
|
|
Description of the machine table.
|
|
.PP
|
|
The machine description table consists of the following sections:
|
|
.IP 1.
|
|
The macro definitions.
|
|
.IP 2.
|
|
Constant definitions.
|
|
.IP 3.
|
|
Register definitions.
|
|
.IP 4.
|
|
Token definitions.
|
|
.IP 5.
|
|
Token expressions.
|
|
.IP 6.
|
|
Code rules.
|
|
.IP 7.
|
|
Move definitions.
|
|
.IP 8.
|
|
Test definitions.
|
|
.IP 9.
|
|
Stack definitions.
|
|
.NH 2
|
|
Macro definitions.
|
|
.PP
|
|
The macro definitions at the top of the table are expanded
|
|
by the preprocessor on occurrence in the rest of the table.
|
|
.NH 2
|
|
Constant definitions.
|
|
.PP
|
|
There are three constants which must be defined first.
They are:
|
|
.IP EM_WSIZE: 11
|
|
Number of bytes in a machine word.
|
|
This is the number of bytes a simple
|
|
.B
|
|
loc
|
|
.R
|
|
instruction will put on the stack.
|
|
.IP EM_PSIZE:
|
|
Number of bytes in a pointer.
|
|
This is the number of bytes a
|
|
.B
|
|
lal
|
|
.R
|
|
instruction will put on the stack.
|
|
.IP EM_BSIZE:
|
|
Number of bytes in the hole between AB and LB.
|
|
The calling sequence only saves LB on the stack so this
|
|
constant is equal to the pointer size.
|
|
.NH 2
|
|
Register definitions.
|
|
.PP
|
|
The only important register definition is the definition of
|
|
the registerpair AX.
|
|
Since the rest of the machine's registers Y, PC, ST serve
|
|
special purposes, the code generator cannot use them.
|
|
.NH 2
|
|
Token definitions
|
|
.PP
|
|
There is a fake token.
|
|
This token is put in the table, since the code generator generator
|
|
complains if it cannot find one.
|
|
.NH 2
|
|
Token expression definitions.
|
|
.PP
|
|
The token expression is also a fake one.
|
|
This token expression is put in the table, since the code generator
|
|
generator complains if it cannot find one.
|
|
.NH 2
|
|
Code rules.
|
|
.PP
|
|
The code rule section is the largest section in the table.
The code rules specify EM patterns, stack patterns, code to be generated,
|
|
etc.
|
|
The syntax is:
|
|
.IP code rule:
|
|
EM pattern '|' stack pattern '|' code '|'
|
|
stack replacement '|' EM replacement '|'
|
|
.PP
|
|
All patterns are optional; however, there must be at least one
|
|
pattern present.
|
|
If the EM pattern is missing the rule becomes a rewriting
|
|
rule or a
|
|
.B
|
|
coercion
|
|
.R
|
|
to be used when code generation cannot continue because of an
|
|
invalid stack pattern.
|
|
The code rules are preceded by the word CODE:.
|
|
.NH 3
|
|
The EM pattern.
|
|
.PP
|
|
The EM pattern consists of a list of EM mnemonics followed by
|
|
a boolean expression. Examples:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
loe
|
|
.R
|
|
.sp 1
|
|
will match a single
|
|
.B
|
|
loe
|
|
.R
|
|
instruction,
|
|
.sp 1
|
|
.br
|
|
.B
|
|
loc loc cif
|
|
.R
|
|
$1==2 && $2==8
|
|
.sp 1
|
|
is a pattern that will match
|
|
.sp 1
|
|
.br
|
|
.B
|
|
loc
|
|
.R
|
|
2
|
|
.br
|
|
.B
|
|
loc
|
|
.R
|
|
8
|
|
.br
|
|
.B
|
|
cif
|
|
.R
|
|
.sp 1
|
|
and
|
|
.sp 1
|
|
.br
|
|
.B
|
|
lol
|
|
inc
|
|
stl
|
|
.R
|
|
$1==$3
|
|
.sp 1
|
|
will match for example
|
|
.sp 1
|
|
.br
|
|
.B
|
|
lol
|
|
.R
|
|
6
|
|
.br
|
|
.B
|
|
inc
|
|
.R
|
|
.br
|
|
.B
|
|
stl
|
|
.R
|
|
6
|
|
.sp 1
|
|
A missing boolean expression evaluates to TRUE.
|
|
.PP
|
|
The code generator will match the longest EM pattern on every occasion.
If two patterns of the same length match, the first in the table
will be chosen; all patterns of length greater than or equal
to three are considered to be of the same length.
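.PP
The selection rule can be sketched in Python as follows.
This is not the code generator's implementation, only an illustration of
the rule as stated: the table is modelled as a list of mnemonic tuples with
an optional condition, and all names are hypothetical.
.DS
```python
def effective_length(mnemonics):
    # Patterns of three or more instructions count as equally long.
    return min(len(mnemonics), 3)


def choose_rule(table, instructions):
    best = None
    for mnemonics, condition in table:
        window = instructions[:len(mnemonics)]
        if [name for name, _ in window] != list(mnemonics):
            continue
        args = [arg for _, arg in window]
        # A missing boolean expression evaluates to TRUE.
        if condition is not None and not condition(args):
            continue
        # Strictly greater, so the first of several equal candidates wins.
        if best is None or effective_length(mnemonics) > effective_length(best):
            best = mnemonics
    return best


table = [
    (("lol",), None),
    (("lol", "inc", "stl"), lambda a: a[0] == a[2]),
    (("loc", "loc", "cif"), lambda a: a[0] == 2 and a[1] == 8),
]
chosen = choose_rule(table, [("lol", 6), ("inc", None), ("stl", 6)])
```
.DE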
|
|
.NH 3
|
|
The stack pattern.
|
|
.PP
|
|
The only stack pattern that can occur is R16, which means that the
|
|
registerpair AX contains the word on top of the stack.
|
|
If this is not the case a coercion occurs.
This coercion generates a "jsr Pop", which means that the top
|
|
of the stack is popped and stored in the registerpair AX.
|
|
.NH 3
|
|
The code part.
|
|
.PP
|
|
The code part consists of three parts: stack cleanup, register
|
|
allocation, and code to be generated.
|
|
All of these may be omitted.
|
|
.NH 4
|
|
Stack cleanup.
|
|
.PP
|
|
When generating something like a branch instruction it might be
|
|
needed to empty the fake stack, that is, remove the AX registerpair.
|
|
This is done by the instruction remove(ALL).
|
|
.NH 4
|
|
Register allocation.
|
|
.PP
|
|
If the machine code to be generated uses the registerpair AX,
|
|
this is signaled to the code generator by the allocate(R16)
|
|
instruction.
|
|
If the registerpair AX resides on the fake stack, this will result
|
|
in a "jsr Push", which means that the registerpair AX is pushed on
|
|
the stack and will be free for further use.
|
|
If registerpair AX is not on the fake stack nothing happens.
|
|
.NH 4
|
|
Code to be generated.
|
|
.PP
|
|
Code to be generated is specified as a list of items of the following
|
|
kind:
|
|
.IP 1)
|
|
A string in double quotes ("This is a string").
|
|
This is copied to the codefile and a newline ('\n') is appended.
|
|
Inside the string all normal C string conventions are allowed,
|
|
and substitutions can be made of the following sorts.
|
|
.RS
|
|
.IP a)
|
|
$1, $2 etc. These are the operands of the corresponding EM
instructions and are printed according to their type.
|
|
To put a real '$' inside the string it must be doubled ('$$').
|
|
.IP b)
|
|
%[1], %[2.reg], %[b.1] etc. These have their obvious meaning.
|
|
If they describe a complete token (%[1]) the printformat for
|
|
the token is used.
|
|
If they stand for a basic term in an expression they will be
|
|
printed according to their type.
|
|
To put a real '%' inside the string it must be doubled ('%%').
|
|
.IP c)
|
|
%( arbitrary expression %). This allows inclusion of arbitrary
|
|
expressions inside strings.
|
|
This is not needed very often, so the awkward notation
is not too bad.
|
|
Note that %(%[1]%) is equivalent to %[1].
|
|
.RE
|
|
.NH 3
|
|
Stack replacement.
|
|
.PP
|
|
The stack replacement is a possibly empty list of items to be
|
|
pushed on the fake stack.
|
|
Three things can occur:
|
|
.IP 1)
|
|
%[1] is used if the registerpair AX was on the fake stack and is
|
|
to be pushed back onto it.
|
|
.IP 2)
|
|
%[a] is used if the registerpair AX is allocated with allocate(R16)
|
|
and is to be pushed onto the fake stack.
|
|
.IP 3)
|
|
It can also be empty.
|
|
.NH 3
|
|
EM replacement.
|
|
.PP
|
|
In exceptional cases it might be useful to leave part of an EM
|
|
pattern undone.
|
|
For example, a
|
|
.B
|
|
sdl
|
|
.R
|
|
instruction might be split into two
|
|
.B
|
|
stl
|
|
.R
|
|
instructions when there is no 4-byte quantity on the stack.
|
|
The EM replacement part allows one to express this.
|
|
Example:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
stl
|
|
.R
|
|
$1
|
|
.B
|
|
stl
|
|
.R
|
|
$1+2
|
|
.sp 1
|
|
The instructions are inserted in the stream so they can match
|
|
the first part of a pattern in the next step.
|
|
Note that since the code generator traverses the EM instructions
|
|
in a strict linear fashion, it is impossible to let the EM
|
|
replacement match later parts of a pattern.
|
|
So if there is a pattern
|
|
.sp 1
|
|
.br
|
|
.B
|
|
loc
|
|
stl
|
|
.R
|
|
$1==0
|
|
.sp 1
|
|
and the input is
|
|
.sp 1
|
|
.br
|
|
.B
|
|
loc
|
|
.R
|
|
0
|
|
.B
|
|
sdl
|
|
.R
|
|
4
|
|
.sp 1
|
|
the
|
|
.B
|
|
loc
|
|
.R
|
|
0
|
|
will be processed first, then the
|
|
.B
|
|
sdl
|
|
.R
|
|
might be split into two
|
|
.B
|
|
stl
|
|
.R
|
|
's but the pattern cannot match now.
|
|
.NH 3
|
|
Move definitions.
|
|
.PP
|
|
This definition is a fake. This definition is put in the
|
|
table, since the code generator generator complains if it
|
|
cannot find one.
|
|
.NH 3
|
|
Test definitions.
|
|
.PP
|
|
Test definitions aren't used by the table.
|
|
.NH 3
|
|
Stack definitions.
|
|
.PP
|
|
When the generator has to push the registerpair AX, it must
|
|
know how to do so.
|
|
The machine code to be generated is defined here.
|
|
.NH 1
|
|
Some remarks.
|
|
.PP
|
|
The above description of the machine table is
|
|
a description of the table for the MCS6500.
|
|
It uses only a part of the possibilities which the code generator
|
|
generator offers.
|
|
For a more precise and detailed description see [2].
|
|
.DS C
|
|
.B
|
|
THE BACK END TABLE.
|
|
.R
|
|
.DE
|
|
.NH 0
|
|
Introduction.
|
|
.PP
|
|
The code rules are divided into 15 groups.
|
|
These groups are:
|
|
.IP 1.
|
|
Load instructions.
|
|
.IP 2.
|
|
Store instructions.
|
|
.IP 3.
|
|
Integer arithmetic instructions.
|
|
.IP 4.
|
|
Unsigned arithmetic instructions.
|
|
.IP 5.
|
|
Floating point arithmetic instructions.
|
|
.IP 6.
|
|
Pointer arithmetic instructions.
|
|
.IP 7.
|
|
Increment, decrement and zero instructions.
|
|
.IP 8.
|
|
Convert instructions.
|
|
.IP 9.
|
|
Logical instructions.
|
|
.IP 10.
|
|
Set manipulation instructions.
|
|
.IP 11.
|
|
Array instructions.
|
|
.IP 12.
|
|
Compare instructions.
|
|
.IP 13.
|
|
Branch instructions.
|
|
.IP 14.
|
|
Procedure call instructions.
|
|
.IP 15.
|
|
Miscellaneous instructions.
|
|
.PP
|
|
From all of these groups one or two typical EM patterns will be explained
|
|
in the next paragraphs.
|
|
Comments are placed between /* and */ (/* This is a comment */).
|
|
.NH
|
|
The instructions.
|
|
.NH 2
|
|
The load instructions.
|
|
.PP
|
|
In this group a typical instruction is
|
|
.B
|
|
lol
|
|
.R
|
|
.
|
|
A
|
|
.B
|
|
lol
|
|
.R
|
|
instruction pushes the word at local base + offset, where offset
is the instruction's argument, onto the stack.
Since the MCS6500 can only offset by 256 bytes, as explained in the
section on memory addressing modes, there is a need for two code rules in the
table:
one which can offset directly and one that must explicitly
calculate the address of the local.
|
|
.NH 3
|
|
The lol instruction with indirect offsetting.
|
|
.PP
|
|
In this case an indirect offsetted load from the second local base
|
|
is possible.
|
|
The table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
lol
|
|
.R
|
|
IN($1) | |
|
|
.br
|
|
allocate(R16) /* allocate registerpair AX */
|
|
.br
|
|
"ldy #BASE+$1" /* load Y with the offset from the second
|
|
.br
|
|
local base */
|
|
.br
|
|
"lda (LBl),y" /* load indirect the lowbyte of the word */
|
|
.br
|
|
"tax" /* move register A to register X */
|
|
.br
|
|
"iny" /* increment register Y (offset) */
|
|
.br
|
|
"lda (LBl),y" /* load indirect the highbyte of the word */
|
|
.br
|
|
| %[a] | | /* push the word onto the fake stack */
|
|
.NH 3
|
|
The lol instruction whose offset is too big.
|
|
.PP
|
|
In this case, the library subroutine "Lol" is used.
|
|
This subroutine expects the offset in registerpair AX, then
|
|
calculates the address of the local or parameter, and loads
|
|
it into registerpair AX.
|
|
The table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
lol
|
|
.R
|
|
| |
|
|
.br
|
|
allocate(R16) /* allocate registerpair AX */
|
|
.br
|
|
"lda #[$1].h" /* load highbyte of offset into register A */
|
|
.br
|
|
"ldx #[$1].l" /* load lowbyte of offset into register X */
|
|
.br
|
|
"jsr Lol" /* perform the subroutine */
|
|
.br
|
|
| %[a] | | /* push word onto the fake stack */
|
|
.NH 2
|
|
The store instructions.
|
|
.PP
|
|
In this group a typical instruction is
|
|
.B
|
|
stl.
|
|
.R
|
|
A
|
|
.B
|
|
stl
|
|
.R
|
|
instruction pops a word from the stack and stores it in the word
at local base + offset, where offset is the instruction's argument.
Here too there is a need for two code rules in the table as a result
|
|
of the offset limits.
|
|
.NH 3
|
|
The stl instruction with indirect offsetting.
|
|
.PP
|
|
In this case an indirect offsetted store from the second local
|
|
base is possible.
|
|
The table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
stl
|
|
.R
|
|
IN($1) | R16 | /* expect registerpair AX on top of the
|
|
.br
|
|
fake stack */
|
|
.br
|
|
"ldy #BASE+1+$1" /* load Y with the offset from the
|
|
.br
|
|
second local base */
|
|
.br
|
|
"sta (LBl),y" /* store the highbyte of the word from A */
|
|
.br
|
|
"txa" /* move register X to register A */
|
|
.br
|
|
"dey" /* decrement offset */
|
|
.br
|
|
"sta (LBl),y" /* store the lowbyte of the word from A */
|
|
.br
|
|
| | |
|
|
.NH 3
|
|
The stl instruction whose offset is too big.
|
|
.PP
|
|
In this case the library subroutine 'Stl' is used.
|
|
This subroutine expects the offset in registerpair AX, then
|
|
calculates the address, pops the word and stores it in its place.
|
|
The table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
stl
|
|
.R
|
|
| |
|
|
.br
|
|
allocate(R16) /* allocate registerpair AX */
|
|
.br
|
|
"lda #[$1].h" /* load highbyte of offset in register A */
|
|
.br
|
|
"ldx #[$1].l" /* load lowbyte of offset in register X */
|
|
.br
|
|
"jsr Stl" /* perform the subroutine */
|
|
.br
|
|
| | |
|
|
.NH 2
|
|
Integer arithmetic instructions.
|
|
.PP
|
|
In this group typical instructions are
|
|
.B
|
|
adi
|
|
.R
|
|
and
|
|
.B
|
|
mli.
|
|
.R
|
|
These instructions, in this table, are implemented for 2-byte
|
|
and 4-byte integers.
|
|
The only arithmetic instructions available on the MCS6500 are
|
|
the ADC (add with carry), and SBC (subtract with not(carry)).
|
|
Not(carry) here means that in a subtraction, the one's complement
|
|
of the carry is taken.
|
|
The absence of multiply and division instructions forces the
|
|
use of subroutines to handle these cases.
|
|
Because there are no registers left in which to perform the multiplication
and division, zero page is used here.
|
|
The 4-byte integer arithmetic is implemented, because in C there
|
|
exists the integer type long.
|
|
A user is free to use the type long, but will pay in performance.
|
|
.NH 3
|
|
The adi instruction.
|
|
.PP
|
|
In case of the
|
|
.B
|
|
adi
|
|
.R
|
|
2 (and
|
|
.B
|
|
sbi
|
|
.R
|
|
2) instruction there are many EM
|
|
patterns, so that the instruction can be performed in line in
|
|
most cases.
|
|
For the worst case there exists a subroutine in the library
|
|
which deals with the EM instruction.
|
|
In case of an
|
|
.B
|
|
adi
|
|
.R
|
|
4 (or
|
|
.B
|
|
sbi
|
|
.R
|
|
4) there only is a subroutine to deal with it.
|
|
A table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
lol lol adi
|
|
.R
|
|
(IN($1) && IN($2) && $3==2) | | /* is it in range */
|
|
.br
|
|
allocate(R16) /* allocate registerpair AX */
|
|
.br
|
|
"ldy #BASE+$1+1" /* load Y with offset for first operand */
|
|
.br
|
|
"lda (LBl),y" /* load indirect highbyte first operand */
|
|
.br
|
|
"pha" /* save highbyte first operand on hard_stack */
|
|
.br
|
|
"dey" /* decrement offset first operand */
|
|
.br
|
|
"lda (LBl),y" /* load indirect lowbyte first operand */
|
|
.br
|
|
"ldy #BASE+$2" /* load Y with offset for second operand */
|
|
.br
|
|
"clc" /* clear carry for addition */
|
|
.br
|
|
"adc (LBl),y" /* add the lowbytes of the operands */
|
|
.br
|
|
"tax" /* store lowbyte of result in place */
|
|
.br
|
|
"iny" /* increment offset second operand */
|
|
.br
|
|
"pla" /* get highbyte first operand */
|
|
.br
|
|
"adc (LBl),y" /* add the highbytes of the operands */
|
|
.br
|
|
| %[a] | | /* push the result onto the fake stack */
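.PP
The effect of the two ADC instructions in this rule can be sketched in
Python as follows.
This is only an illustration: the function names are hypothetical, decimal
mode is assumed to be off, and the loads through the second local base are
left out.
.DS
```python
def adc(a, operand, carry):
    # 8-bit add with carry; returns the result byte and the new carry.
    total = a + operand + carry
    return total & 0xFF, total >> 8


def add16(first, second):
    # clc, then add the lowbytes; tax moves the result to X.
    x, carry = adc(first & 0xFF, second & 0xFF, 0)
    # Add the highbytes plus the carry out of the lowbytes; result is in A.
    a, carry = adc(first >> 8, second >> 8, carry)
    return a, x   # A holds the highbyte, X the lowbyte


result = add16(0x12FF, 0x0001)
```
.DE
.PP
The carry out of the highbyte addition is simply dropped, so the sum wraps
around at 16 bits.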
|
|
.NH 3
|
|
The mli instruction.
|
|
.PP
|
|
The
|
|
.B
|
|
mli
|
|
.R
|
|
2 instruction mostly uses the subroutine 'Mlinp'.
|
|
This subroutine expects the multiplicand in zero page
|
|
at locations ARTH, ARTH+1, while the multiplier is in zero
|
|
page locations ARTH+2, ARTH+3.
|
|
For a description of the algorithms used for multiplication and
|
|
division, see [3].
|
|
A table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
lol lol mli
|
|
.R
|
|
(IN($1) && IN($2) && $3==2) | |
|
|
.br
|
|
allocate(R16) /* allocate registerpair AX */
|
|
.br
|
|
"ldy #BASE+$1" /* load Y with offset of multiplicand */
|
|
.br
|
|
"lda (LBl),y" /* load indirect lowbyte of multiplicand */
|
|
.br
|
|
"sta ARTH" /* store lowbyte in zero page */
|
|
.br
|
|
"iny" /* increment offset of multiplicand */
|
|
.br
|
|
"lda (LBl),y" /* load indirect highbyte of multiplicand */
|
|
.br
|
|
"sta ARTH+1" /* store highbyte in zero page */
|
|
.br
|
|
"ldy #BASE+$2" /* load Y with offset of multiplier */
|
|
.br
|
|
"lda (LBl),y" /* load indirect lowbyte of multiplier */
|
|
.br
|
|
"sta ARTH+2" /* store lowbyte in zero page */
|
|
.br
|
|
"iny" /* increment offset of multiplier */
|
|
.br
|
|
"lda (LBl),y" /* load indirect highbyte of multiplier */
|
|
.br
|
|
"sta ARTH+3" /* store highbyte in zero page */
|
|
.br
|
|
"jsr Mlinp" /* perform the multiply */
|
|
.br
|
|
| %[a] | | /* push result onto fake stack */
|
|
.NH 2
|
|
The unsigned arithmetic instructions.
|
|
.PP
|
|
Since unsigned addition and subtraction are performed in the same way
as signed addition and subtraction, these cases are dealt with by
an EM replacement.
For multiplication and division there are special subroutines.
|
|
.NH 3
|
|
Unsigned addition.
|
|
.PP
|
|
This is an example of the EM replacement strategy.
|
|
.sp 1
|
|
.br
|
|
.B
|
|
lol lol adu
|
|
.R
|
|
| | | |
|
|
.B
|
|
lol
|
|
.R
|
|
$1
|
|
.B
|
|
lol
|
|
.R
|
|
$2
|
|
.B
|
|
adi
|
|
.R
|
|
$3 |
|
|
.NH 2
|
|
Floating point arithmetic.
|
|
.PP
|
|
Floating point arithmetic isn't implemented in this table.
|
|
.NH 2
|
|
Pointer arithmetic instructions.
|
|
.PP
|
|
A typical pointer arithmetic instruction is
|
|
.B
|
|
adp
|
|
.R
|
|
2.
|
|
This instruction adds an offset and a pointer.
|
|
A table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
adp
|
|
.R
|
|
| | | |
|
|
.B
|
|
loc
|
|
.R
|
|
$1
|
|
.B
|
|
adi
|
|
.R
|
|
2 |
|
|
.NH 2
|
|
Increment, decrement and zero instructions.
|
|
.PP
|
|
In this group a typical instruction is
|
|
.B
|
|
inl
|
|
.R
|
|
, which increments a local or parameter.
|
|
The MCS6500 doesn't have an instruction to increment the
|
|
accumulator A, so the 'ADC' instruction must be used.
|
|
A table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
inl
|
|
.R
|
|
IN($1) | |
|
|
.br
|
|
allocate(R16) /* allocate registerpair AX */
|
|
.br
|
|
"ldy #BASE+$1" /* load Y with offset of the local */
|
|
.br
|
|
"clc" /* clear carry for addition */
|
|
.br
|
|
"lda (LBl),y" /* load indirect lowbyte of local */
|
|
.br
|
|
"adc #1" /* increment lowbyte */
|
|
.br
|
|
"sta (LBl),y" /* restore indirect the incremented lowbyte */
|
|
.br
|
|
"bcc 1f" /* if carry is clear then ready */
|
|
.br
|
|
"iny" /* increment offset of local */
|
|
.br
|
|
"lda (LBl),y" /* load indirect highbyte of local */
|
|
.br
|
|
"adc #0" /* add carry to highbyte */
|
|
.br
|
|
"sta (LBl),y\\n1:" /* restore indirect the highbyte */
|
|
.PP
|
|
If the offset of the local or parameter is too big, first the
local or parameter is fetched, then incremented, and then
stored back.
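.PP
The in line increment shown above can be sketched in Python as follows.
This is only an illustration with a hypothetical function name; it works
on the two bytes of the local directly instead of going through memory.
.DS
```python
def increment_word(low, high):
    total = low + 1                       # clc, adc #1
    low, carry = total & 0xFF, total >> 8
    if carry:                             # bcc 1f skips this when carry is clear
        high = (high + carry) & 0xFF      # adc #0 adds only the carry
    return low, high


no_carry = increment_word(0x34, 0x12)
with_carry = increment_word(0xFF, 0x12)
```
.DE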
|
|
.NH 2
|
|
Convert instructions.
|
|
.PP
|
|
In this case there are two convert instructions
|
|
which really do something.
|
|
One of them is in line code, and deals with the extension of
|
|
a character (1-byte) to an integer.
|
|
The other one is a subroutine which handles the conversion
|
|
between 2-byte integers and 4-byte integers.
|
|
.NH 3
|
|
The in line conversion.
|
|
.PP
|
|
The table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
loc loc cii
|
|
.R
|
|
$1==1 && $2==2 | R16 |
|
|
.br
|
|
"txa" /* see if sign extension is needed */
|
|
.br
|
|
"bpl 1f" /* there is no need for sign extension */
|
|
.br
|
|
"lda #0FFh" /* sign extension here */
|
|
.br
|
|
"bne 2f" /* conversion ready */
|
|
.br
|
|
"1: lda #0\\n2:" /* no sign extension here */
|
|
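.PP
The sign extension performed by this rule can be sketched in Python as
follows.
This is only an illustration; the function name is hypothetical, and the
result is returned in the same order as the registerpair, A (highbyte)
first and X (lowbyte) second.
.DS
```python
def sign_extend_byte(x):
    # txa copies the lowbyte to A and sets the N flag from bit 7.
    if x & 0x80:
        a = 0xFF   # lda #0FFh: negative, fill the highbyte with ones
    else:
        a = 0x00   # lda #0: positive, clear the highbyte
    return a, x


positive = sign_extend_byte(0x7F)
negative = sign_extend_byte(0x80)
```
.DE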
.NH 2
|
|
Logical instructions.
|
|
.PP
|
|
A typical instruction in this group is the logical
|
|
.B
|
|
and
|
|
.R
|
|
on two 2-byte words.
|
|
The logical
|
|
.B
|
|
and
|
|
.R
|
|
on groups of more than two bytes (max 254)
|
|
is also possible and uses a library subroutine.
|
|
.NH 3
|
|
The logical and on 2-byte groups.
|
|
.PP
|
|
The table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
and
|
|
.R
|
|
$1==2 | R16 | /* one group must be on the fake stack */
|
|
.br
|
|
"sta ARTH+1" /* temporary save of first group highbyte */
|
|
.br
|
|
"stx ARTH" /* temporary save of first group lowbyte */
|
|
.br
|
|
"jsr Pop" /* pop second group from the stack */
|
|
.br
|
|
"and ARTH+1" /* logical and on highbytes */
|
|
.br
|
|
"pha" /* temporary save the result's highbyte */
|
|
.br
|
|
"txa" /* logical and can only be done in A */
|
|
.br
|
|
"and ARTH" /* logical and on lowbytes */
|
|
.br
|
|
"tax" /* restore results lowbyte */
|
|
.br
|
|
"pla" /* restore results highbyte */
|
|
.br
|
|
| %[1] | | /* push result onto fake stack */
|
|
.NH 2
|
|
Set manipulation instructions.
|
|
.PP
|
|
A typical EM pattern in this group is
|
|
.B
|
|
loc inn zeq
|
|
.R
|
|
$1>0 && $1<16 && $2==2.
|
|
This EM pattern works on sets of 16 bits.
|
|
Sets can be bigger (max 256 bytes = 2048 bits), but then a
|
|
library routine is used instead of in line code.
|
|
The table content of the above EM pattern is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
loc inn zeq
|
|
.R
|
|
$1>0 && $1<16 && $2==2 | R16 |
|
|
.br
|
|
"ldy #$1+1" /* load Y with bit number */
|
|
.br
|
|
"stx ARTH" /* cannot rotate X, so use zero page */
|
|
.br
|
|
"1: lsr a" /* right shift A */
|
|
.br
|
|
"ror ARTH" /* right rotate zero page location */
|
|
.br
|
|
"dey" /* decrement Y */
|
|
.br
|
|
"bne 1b" /* shift $1 times */
|
|
.br
|
|
"bcc $1" /* no carry, so bit is zero */
|
|
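.PP
The shift loop can be followed step by step in a small C model.
This is a sketch under the assumption that A holds the highbyte and
X the lowbyte of the 16-bit set.
The function name set_bit_via_shift is hypothetical.
After the loop the carry holds the tested bit, which is what the
final conditional branch inspects.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the carry left after shifting the 16-bit set right (bit + 1) times. */
static int set_bit_via_shift(uint8_t a, uint8_t x, unsigned bit)
{
    uint8_t arth = x;               /* "stx ARTH" */
    unsigned y = bit + 1;           /* "ldy #$1+1" */
    int carry = 0;

    assert(bit > 0 && bit < 16);    /* the restriction the pattern puts on $1 */
    do {
        int out_of_a = a & 1;       /* "lsr a": bit 0 of A goes to the carry */

        a >>= 1;
        carry = arth & 1;           /* "ror ARTH": bit 0 leaves to the carry, */
        arth = (uint8_t)((arth >> 1) | (out_of_a << 7)); /* old carry enters bit 7 */
        y--;                        /* "dey" */
    } while (y != 0);               /* "bne 1b" */
    return carry;
}
```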
.NH 2
|
|
Array instructions.
|
|
.PP
|
|
In this group a typical EM pattern is
|
|
.B
|
|
lae lar
|
|
.R
|
|
defined(rom(1,3)) | | | |
|
|
.B
|
|
lae
|
|
.R
|
|
$1
|
|
.B
|
|
aar
|
|
.R
|
|
$2
|
|
.B
|
|
loi
|
|
.R
|
|
rom(1,3).
|
|
This pattern uses the
|
|
.B
|
|
aar
|
|
.R
|
|
instruction, which is part of a typical EM pattern:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
lae aar
|
|
.R
|
|
$2==2 && rom(1,3)==2 && rom(1,1)==0 | R16 | /* registerpair AX contains
|
|
the index in the array */
|
|
.br
|
|
"pha" /* save highbyte of index */
|
|
.br
|
|
"txa" /* move lowbyte of index to A */
|
|
.br
|
|
"asl a" /* shift left lowbyte == 2 times lowbyte */
|
|
.br
|
|
"tax" /* restore lowbyte */
|
|
.br
|
|
"pla" /* restore highbyte */
|
|
.br
|
|
"rol a" /* rotate left highbyte == 2 times highbyte */
|
|
.br
|
|
| %[1] | adi 2 | /* push new index, add to lowerbound array */
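.PP
The doubling of the index by a shift and a rotate can be sketched in C
as follows.
The struct index16 and the function double_index are hypothetical
names, and the sketch assumes elements of two bytes, as the pattern
requires.

```c
#include <stdint.h>

/* Hypothetical model of the index: highbyte in A, lowbyte in X. */
struct index16 {
    uint8_t hi;
    uint8_t lo;
};

/* Multiply the index by two, one byte at a time. */
static struct index16 double_index(struct index16 idx)
{
    struct index16 r;
    int carry = (idx.lo & 0x80) != 0;           /* "asl a" moves bit 7 into the carry */

    r.lo = (uint8_t)(idx.lo << 1);
    r.hi = (uint8_t)((idx.hi << 1) | carry);    /* "rol a" shifts the carry into bit 0 */
    return r;
}
```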
|
|
.NH 2
|
|
Compare instructions.
|
|
.PP
|
|
In this group all EM patterns are performed by calling
|
|
a subroutine.
|
|
Subroutines are used here because comparison is only
|
|
possible byte by byte.
|
|
This means a lot of code, and since compares are used frequently,
|
|
a lot of in line code would be generated, thus reducing
|
|
the space left for the software stack.
|
|
These subroutines can be found in the library.
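.PP
To give an impression of the work involved, here is a sketch in C of a
signed comparison of two 2-byte integers done byte by byte.
It is not the library routine itself.
The name compare16 and its interface are assumptions made for this
illustration only.

```c
#include <stdint.h>

/* Returns -1, 0 or 1 for a < b, a == b, a > b (signed 2-byte integers). */
static int compare16(uint8_t a_hi, uint8_t a_lo, uint8_t b_hi, uint8_t b_lo)
{
    /* The highbytes carry the sign, so they are compared as signed values. */
    int sa = (a_hi & 0x80) ? (int)a_hi - 256 : (int)a_hi;
    int sb = (b_hi & 0x80) ? (int)b_hi - 256 : (int)b_hi;

    if (sa != sb)
        return sa < sb ? -1 : 1;
    /* Equal highbytes: the lowbytes decide, compared as unsigned values. */
    if (a_lo != b_lo)
        return a_lo < b_lo ? -1 : 1;
    return 0;
}
```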
|
|
.NH 2
|
|
Branch instructions.
|
|
.PP
|
|
A typical branch instruction is
|
|
.B
|
|
beq.
|
|
.R
|
|
The table content for it is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
beq
|
|
.R
|
|
| R16 |
|
|
.br
|
|
"sta BRANCH+1" /* save highbyte second operand in zero page */
|
|
.br
|
|
"stx BRANCH" /* save lowbyte second operand in zero page */
|
|
.br
|
|
"jsr Pop" /* pop the first operand */
|
|
.br
|
|
"cmp BRANCH+1" /* compare the highbytes */
|
|
.br
|
|
"bne 1f" /* there not equal so go on */
|
|
.br
|
|
"cpx BRANCH" /* compare the lowbytes */
|
|
.br
|
|
"beq $1\\n1:" /* lowbytes are also equal, so branch */
|
|
.PP
|
|
Another typical instruction in this group is
|
|
.B
|
|
zeq.
|
|
.R
|
|
The table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
zeq
|
|
.R
|
|
| R16 |
|
|
.br
|
|
"tay" /* move A to Y for setting testbits */
|
|
.br
|
|
"bmi $1" /* highbyte s minus so branch */
|
|
.br
|
|
"txa" /* move X to A for setting testbits */
|
|
.br
|
|
"beq $1\\n1:" /* lowbyte also zero, thus branch */
|
|
.NH 2
|
|
Procedure call instructions.
|
|
.PP
|
|
In this group one code generation might seem a little
|
|
awkward.
|
|
It is the EM instruction
|
|
.B
|
|
cai
|
|
.R
|
|
which generates a 'jsr Indir'.
|
|
This is because there is no indirect jump_subroutine in the
|
|
MCS6500.
|
|
The only solution is to store the address in zero page, and then
|
|
do a 'jsr' to a known label.
|
|
At this label there must be an indirect jump instruction, which
|
|
performs a jump to the address stored in zero page.
|
|
In this case the label is Indir, and the address is stored in
|
|
zero page at the addresses ADDR, ADDR+1.
|
|
The table content is:
|
|
.sp 1
|
|
.br
|
|
.B
|
|
cai
|
|
.R
|
|
| R16 |
|
|
.br
|
|
"stx ADDR" /* store lowbyte of address in zero page */
|
|
.br
|
|
"sta ADDR+1" /* store highbyte of address in zero page */
|
|
.br
|
|
"jsr Indir" /* use the indirect jump */
|
|
.br
|
|
| | |
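.PP
The same idea, a call to a fixed label which then jumps through a
stored address, can be sketched in C with a function pointer.
The names addr, indir, call_indirect and sample_procedure are
hypothetical.
The variable addr only stands in for the zero page locations ADDR and
ADDR+1.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*procedure)(void);

static procedure addr;      /* stands in for zero page ADDR, ADDR+1 */

/* The known label: jumps to whatever address is stored in addr. */
static int indir(void)
{
    assert(addr != NULL);
    return addr();
}

/* What the generated code does for cai. */
static int call_indirect(procedure target)
{
    addr = target;          /* "stx ADDR" and "sta ADDR+1" */
    return indir();         /* "jsr Indir" */
}

/* Hypothetical procedure, used only to try the sketch. */
static int sample_procedure(void)
{
    return 7;
}
```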
|
|
.NH 2
|
|
Miscellaneous instructions.
|
|
.PP
|
|
In this group, as the name suggests, there is no
|
|
typical EM instruction or EM pattern.
|
|
Most of the MCS6500 code to be generated uses a library subroutine
|
|
or is straightforward.
|
|
.DS C
|
|
.B
|
|
PERFORMANCE.
|
|
.R
|
|
.DE
|
|
.NH 0
|
|
Introduction.
|
|
.PP
|
|
To measure the performance of the back end table some timing
|
|
tests are done.
|
|
What to time?
|
|
In this case, the execution times of several Pascal statements
|
|
are timed.
|
|
Statements in C which have a Pascal equivalent are timed also.
|
|
The statements are timed as follows.
|
|
A test program has been written, which executes two
|
|
nested for_loops from 1 to 1000.
|
|
Within these for_loops the statement, which is to be tested, is placed,
|
|
so the statement will be executed 1,000,000 times.
|
|
Then the same program is executed without the test statement.
|
|
The time difference between the two executions is the time
|
|
necessary to execute the test statement 1,000,000 times.
|
|
The time to execute the test statement once is thus the
|
|
time difference divided by 1,000,000.
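.PP
The calculation can be written down as a small C function.
This is a sketch.
The function name statement_time_us and the choice of seconds as the
unit of the two measured run times are assumptions.

```c
#include <assert.h>

/* Time per execution in microseconds, given the two run times in seconds. */
static double statement_time_us(double with_statement_s,
                                double without_statement_s,
                                long executions)
{
    assert(executions > 0);
    return (with_statement_s - without_statement_s) * 1e6 / (double)executions;
}
```

For example, if the run with the statement were to take 20 seconds and
the run without it 4 seconds, then with 1,000,000 executions the
function gives 16 microseconds per execution.
These two run times are made up for the example.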
|
|
.NH 0
|
|
Testing Pascal statements.
|
|
.PP
|
|
The following statements are tested.
|
|
.IP 1)
|
|
int1 := 0;
|
|
.IP 2)
|
|
int1 := int2 - 1;
|
|
.IP 3)
|
|
int1 := int1 + 1;
|
|
.IP 4)
|
|
int1 := icon1 + icon2;
|
|
.IP 5)
|
|
int1 := icon2 div icon1;
|
|
.IP 6)
|
|
int1 := int2 * int3;
|
|
.IP 7)
|
|
bool := (int1 < 0);
|
|
.IP 8)
|
|
bool := (int1 < 3);
|
|
.IP 9)
|
|
bool := ((int1 > 3) or (int1 < 3))
|
|
.IP 10)
|
|
case int1 of 1: bool := false; 2: bool := true end;
|
|
.IP 11)
|
|
if int1 = 0 then int2 := 3;
|
|
.IP 12)
|
|
while int1 > 0 do int1 := int1 - 1;
|
|
.IP 13)
|
|
m := a[k];
|
|
.IP 14)
|
|
let2 := ['a'..'c'];
|
|
.IP 15)
|
|
P3(x);
|
|
.IP 16)
|
|
dum := F3(x);
|
|
.IP 17)
|
|
s.overhead := 5400;
|
|
.IP 18)
|
|
with s do overhead := 5400;
|
|
.PP
|
|
These statements were tested in a procedure test.
|
|
.sp 1
|
|
.br
|
|
procedure test;
|
|
.br
|
|
var i, j, ... : integer;
|
|
.br
|
|
bool : boolean;
|
|
.br
|
|
let2 : set of char;
|
|
.br
|
|
begin
|
|
.br
|
|
for i := 1 to 1000 do
|
|
.br
|
|
for j := 1 to 1000 do
|
|
.br
|
|
STATEMENT
|
|
.br
|
|
end;
|
|
.sp 1
|
|
.PP
|
|
STATEMENT is one of the statements as shown above, or it is
|
|
the empty statement.
|
|
The assignment of used variables, if necessary, is done before
|
|
the first for_loop.
|
|
In case of the statement which uses the procedure call, statement
|
|
15, a dummy procedure is declared whose body is empty.
|
|
In case of the statement which uses the function, statement 16,
|
|
this function returns its argument.
|
|
For the timing of C statements a similar test program was
|
|
written.
|
|
.sp 1
|
|
.br
|
|
main()
|
|
.br
|
|
{
|
|
.br
|
|
int i, j, ...;
|
|
.br
|
|
for (i = 1; i <= 1000; i++)
|
|
.br
|
|
for (j = 1; j <= 1000; j++)
|
|
.br
|
|
STATEMENT
|
|
.br
|
|
}
|
|
.sp 1
|
|
.NH
|
|
The results.
|
|
.PP
|
|
Here are tables with the results of the time measurements.
|
|
Times are in microseconds (10^-6 seconds).
|
|
Some statements appear twice in the tables.
|
|
In the second case an array of 200 integers was declared
|
|
before the variable to be tested, so this variable cannot
|
|
be accessed by indirect addressing from the second local base.
|
|
This results in a larger execution time of the statement to be
|
|
tested.
|
|
The column 68000 contains the times measured on a Bleasdale,
|
|
M68000 based, computer.
|
|
The times in column pdp are measured on a DEC pdp11/44, and
|
|
the times from column 6500 come from a BBC microcomputer.
|
|
.bp
|
|
.TS
|
|
expand;
|
|
c s s s
|
|
c c c c
|
|
lw35 nw7 nw7 nw7.
|
|
Pascal timing results
|
|
statement 68000 pdp 6500
|
|
_
|
|
T{
|
|
int1 := 0;
|
|
T} 4.0 5.8 16.7
|
|
4.0 4.2 97.8
|
|
_
|
|
T{
|
|
int1 := int2 - 1;
|
|
T} 7.2 7.1 27.2
|
|
6.9 7.1 206.5
|
|
_
|
|
T{
|
|
int1 := int1 + 1;
|
|
T} 6.9 6.8 27.2
|
|
6.4 6.7 106.5
|
|
_
|
|
T{
|
|
int1 := icon1 + icon2;
|
|
T} 6.2 6.2 25.6
|
|
6.2 6.0 106.6
|
|
_
|
|
T{
|
|
int1 := icon2 div icon1;
|
|
T} 14.9 14.3 372.6
|
|
14.9 14.7 453.7
|
|
_
|
|
T{
|
|
int1 := int2 * int3;
|
|
T} 11.5 12.0 558.1
|
|
11.3 11.6 728.6
|
|
_
|
|
T{
|
|
bool := (int1 < 0);
|
|
T} 7.2 6.9 122.8
|
|
7.8 8.1 453.2
|
|
_
|
|
T{
|
|
bool := (int1 < 3);
|
|
T} 7.3 7.6 126.0
|
|
7.2 8.1 232.2
|
|
_
|
|
T{
|
|
bool := ((int1 > 3) or (int1 < 3))
|
|
T} 10.1 12.0 307.8
|
|
10.2 11.9 440.1
|
|
_
|
|
T{
|
|
case int1 of 1: bool := false; 2: bool := true end;
|
|
T} 18.3 17.9 165.7
|
|
_
|
|
T{
|
|
if int1 = 0 then int2 := 3;
|
|
T} 9.5 8.5 133.8
|
|
_
|
|
T{
|
|
while int1 > 0 do int1 := int1 - 1;
|
|
T} 6.9 6.9 126.0
|
|
_
|
|
T{
|
|
m := a[k];
|
|
T} 7.2 6.8 134.3
|
|
_
|
|
T{
|
|
let2 := ['a'..'c'];
|
|
T} 38.4 38.8 447.4
|
|
_
|
|
T{
|
|
P3(x);
|
|
T} 18.9 18.8 180.3
|
|
_
|
|
T{
|
|
dum := F3(x);
|
|
T} 26.8 27.1 343.3
|
|
_
|
|
T{
|
|
s.overhead := 5400;
|
|
T} 4.6 4.1 16.7
|
|
_
|
|
T{
|
|
with s do overhead := 5400;
|
|
T} 4.2 4.3 16.7
|
|
.TE
|
|
.TS
|
|
expand;
|
|
c s s s
|
|
c c c c
|
|
lw35 nw7 nw7 nw7.
|
|
C timing results
|
|
statement	68000	pdp	6500
|
|
_
|
|
T{
|
|
int1 = 0;
|
|
T} 4.1 3.6 17.2
|
|
4.1 4.1 97.7
|
|
_
|
|
T{
|
|
int1 = int2 - 1;
|
|
T} 6.6 6.9 27.2
|
|
6.1 6.5 206.4
|
|
_
|
|
T{
|
|
int1 = int1 + 1;
|
|
T} 6.4 7.3 27.2
|
|
6.3 6.2 206.4
|
|
_
|
|
T{
|
|
int1 = int2 * int3;
|
|
T} 11.4 12.3 522.6
|
|
9.6 10.1 721.2
|
|
_
|
|
T{
|
|
int1 = (int2 < 0);
|
|
T} 7.2 7.6 126.4
|
|
7.4 7.7 232.5
|
|
_
|
|
T{
|
|
int1 = (int2 < 3);
|
|
T} 7.0 7.5 126.0
|
|
7.8 7.8 232.6
|
|
_
|
|
T{
|
|
int1 = ((int2 > 3) || (int2 < 3));
|
|
T} 11.8 12.2 193.4
|
|
11.5 13.2 245.6
|
|
_
|
|
T{
|
|
switch (int1) { case 1: int1 = 0; break; case 2: int1 = 1; break; }
|
|
T} 28.3 29.2 164.1
|
|
_
|
|
T{
|
|
if (int1 == 0) int2 = 3;
|
|
T} 4.8 4.8 19.4
|
|
_
|
|
T{
|
|
while (int2 > 0) int2 = int2 - 1;
|
|
T} 5.8 6.0 125.9
|
|
_
|
|
T{
|
|
int2 = a[int2];
|
|
T} 4.8 5.1 192.8
|
|
_
|
|
T{
|
|
P3(int2);
|
|
T} 18.8 18.4 180.3
|
|
_
|
|
T{
|
|
int2 = F3(int2);
|
|
T} 27.0 27.2 309.4
|
|
_
|
|
T{
|
|
s.overhead = 5400;
|
|
T} 5.0 4.1 16.7
|
|
.TE
|
|
.NH
|
|
Pascal statements which don't have a C equivalent.
|
|
.PP
|
|
First, the two statements which perform an operation on constants
|
|
are left out.
|
|
These are left out because the C front end does constant folding,
|
|
while the Pascal front end doesn't.
|
|
So in C the statements int1 = icon1 + icon2; and int1 = icon1 / icon2;
|
|
will use the same amount of time since the expression is evaluated
|
|
by the front end.
|
|
The two other statements (let2 := ['a'..'c']; and
|
|
.B
|
|
with
|
|
.R
|
|
s
|
|
.B
|
|
do
|
|
.R
|
|
overhead := 5400;), aren't included in the C statement timing table,
|
|
because these constructs do not exist in C.
|
|
Although C offers direct bit manipulation, which could
|
|
be used to implement sets, I have not used it here.
|
|
The
|
|
.B
|
|
with
|
|
.R
|
|
statement does not exist in C and there is nothing with the slightest
|
|
resemblance to it.
|
|
.PP
|
|
At first sight of the tables, it looked as if there is not much difference
|
|
in the times for the M68000 and the pdp11/44, in comparison with the
|
|
times needed by the MCS6500.
|
|
To verify this impression, I calculated the correlation coefficient
|
|
between the times of the M68000 and pdp11/44.
|
|
It turned out to be 0.997 for both the Pascal time tests and the C
|
|
time tests.
|
|
Since the correlation coefficient is near to one and the difference
|
|
between the times is small, they can be considered to be the same
|
|
when set against the times of the MCS6500.
|
|
Then I tried to make a graph of the times from the M68000 and
|
|
the MCS6500.
|
|
There wasn't any correlation to be seen, taking all the times together.
|
|
The only correlation one could see, with some effort, was in the
|
|
times for the first three Pascal statements.
|
|
The first two C statements also show a correlation, which two points
|
|
always do.
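.PP
For reference, a correlation coefficient can be computed as in the
following C sketch.
It is not the program used for the figures quoted here.
The function names correlation and square_root are hypothetical, and
the square root is computed by a Newton iteration only so that the
sketch does not depend on the math library.

```c
#include <assert.h>
#include <stddef.h>

/* Newton iteration for the square root of a non-negative value. */
static double square_root(double v)
{
    double guess = v > 1.0 ? v : 1.0;
    int i;

    for (i = 0; i < 100; i++)
        guess = 0.5 * (guess + v / guess);
    return guess;
}

/* Correlation coefficient of the n pairs (x[i], y[i]). */
static double correlation(const double *x, const double *y, size_t n)
{
    double mean_x = 0.0, mean_y = 0.0;
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    size_t i;

    assert(n >= 2);
    for (i = 0; i < n; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= (double)n;
    mean_y /= (double)n;
    for (i = 0; i < n; i++) {
        double dx = x[i] - mean_x;
        double dy = y[i] - mean_y;

        sxx += dx * dx;
        syy += dy * dy;
        sxy += dx * dy;
    }
    assert(sxx > 0.0 && syy > 0.0);     /* both columns must vary */
    return sxy / square_root(sxx * syy);
}
```

Called with only two pairs, the function can return nothing but 1 or
-1 (apart from rounding), which is why a correlation between two
points carries no information.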
|
|
.PP
|
|
Also the three Pascal statements
|
|
.B
|
|
case
|
|
.R
|
|
,
|
|
.B
|
|
if
|
|
.R
|
|
,
|
|
and
|
|
.B
|
|
while
|
|
.R
|
|
have a correlation coefficient of 0.999.
|
|
This is probably because the
|
|
.B
|
|
case
|
|
.R
|
|
statement uses a subroutine in both cases and the other two
|
|
statements
|
|
.B
|
|
if
|
|
.R
|
|
and
|
|
.B
|
|
while
|
|
.R
|
|
generate in line code.
|
|
The last two Pascal statements use the same time, since the front
|
|
end will generate the same EM code for both.
|
|
.PP
|
|
The independence between the rest of the test times is because
|
|
in these cases the object code for the MCS6500 uses library
|
|
subroutines, while the other processors can handle the EM code
|
|
with in line code.
|
|
.PP
|
|
It is clear that the MCS6500 is a slower device: it needs longer
|
|
execution times and more library subroutines, but
|
|
there is no constant factor between its execution times and those
|
|
of other processors.
|
|
.PP
|
|
The slowing down of the MCS6500 as a result of the need for a
|
|
library subroutine is illustrated by the multiplication
|
|
statement.
|
|
The MCS6500 needs a library subroutine, while the other
|
|
two processors have a machine instruction to perform the
|
|
multiply.
|
|
This results in a factor of 48.5, when the operands can be accessed
|
|
indirectly by the MCS6500.
|
|
When the MCS6500 cannot access the operands indirectly the situation
|
|
is even worse.
|
|
The slight differences between the MCS6500 execution times for
|
|
Pascal statements and C statements are probably the result of the
|
|
front end, and thus beyond the scope of this discussion.
|
|
.PP
|
|
Another timing test is done in C on the statement k = i + j + 1983.
|
|
This statement is tested on many UNIX*
|
|
.FS
|
|
* UNIX is a Trademark of Bell Laboratories.
|
|
.FE
|
|
systems.
|
|
For a complete list see appendix A.
|
|
The slowest one is the IBM XT, which runs on an 8088 microprocessor.
|
|
The fastest one is the Amdahl computer.
|
|
Here is a short table to illustrate the performance of the
|
|
MCS6500.
|
|
.TS
|
|
c c c
|
|
c n n.
|
|
machine short int
|
|
IBM XT 53.4 53.4
|
|
Amdahl 0.5 0.3
|
|
MCS6500 150.2 150.2
|
|
.TE
|
|
The MCS6500 is about three times slower than the IBM XT, but three hundred
|
|
times slower than the Amdahl.
|
|
The reason why the times on the IBM XT and the MCS6500 are the
|
|
same for shorts and ints is that most C compilers make the types
|
|
short and int the same size on 16-bit machines.
|
|
In this project the MCS6500 is regarded as a 16-bit machine.
|
|
.NH
|
|
Length tests.
|
|
.PP
|
|
I have also compiled several programs written in Pascal and C to
|
|
see if there is a resemblance between the number of bytes generated
|
|
in the machine's language.
|
|
In the tables:
|
|
.IP length: 9
|
|
The number of bytes of the source program.
|
|
.IP 68000:
|
|
The number of bytes of the a.out file for a M68000.
|
|
.IP pdp:
|
|
The number of bytes of the a.out file for a pdp11/44.
|
|
.IP 6500:
|
|
The number of bytes of the a.out file for a MCS6500.
|
|
.LP
|
|
These are the results:
|
|
.TS
|
|
c s s s
|
|
c c c c
|
|
n n n n.
|
|
Pascal programs
|
|
length 68000 pdp 6500
|
|
_
|
|
19946 14383 16090 26710
|
|
19484 20169 20190 35416
|
|
10849 10469 11464 18949
|
|
273 4221 5106 7944
|
|
1854 5807 6610 10301
|
|
.TE
|
|
.TS
|
|
c s s s
|
|
c c c c
|
|
n n n n.
|
|
C programs
|
|
length 68000 pdp 6500
|
|
_
|
|
9444 6927 8234 11559
|
|
7655 14353 18240 26251
|
|
4775 11309 15934 19910
|
|
639 6337 9660 12494
|
|
.TE
|
|
.PP
|
|
In contrast to the execution times of the test statements, the
|
|
object code file sizes show a constant factor between them.
|
|
After calculating the correlation coefficient, I have calculated
|
|
the line fitted between the sizes.
|
|
.FS
|
|
* x is the number of bytes
|
|
.FE
|
|
.TS
|
|
c s s
|
|
c c c
|
|
l c c.
|
|
Pascal programs
|
|
processor corr. coef. fitted line
|
|
_
|
|
68000-pdp 0.996
|
|
68000-6500 0.999 1.76x + 502*
|
|
pdp-6500 0.999 1.80x - 1577
|
|
.TE
|
|
.TS
|
|
c s s
|
|
c c c
|
|
l c c.
|
|
C programs
|
|
processor corr. coef. fitted line
|
|
_
|
|
68000-pdp 0.974
|
|
68000-6500 0.992 1.80x + 502*
|
|
pdp-6500 0.980 1.40x - 1577
|
|
.TE
|
|
.PP
|
|
As seen from the tables above the correlation coefficient for
|
|
Pascal programs is better than the ones for C programs.
|
|
Thus the line fits best for Pascal programs.
|
|
With the formula of the best fitted line one can now estimate
|
|
the size of the object code, which a program needs, for a MCS6500
|
|
without having the compiler at hand.
|
|
One can also see from these formulas that the object code
|
|
generated for a MCS6500 is about 1.8 times larger than for the other
|
|
processors.
|
|
Since the number of bytes in the source file heavily depends on the
|
|
programmer, how many spaces he or she uses, the size of the indenting
|
|
in structured programs, etc., there is no correlation between the
|
|
size of the source file and the size of the object file.
|
|
Also the use of comments has its influence on the size.
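.PP
The fitted lines for the Pascal programs can be turned into a small
estimator, sketched here in C.
The function names are hypothetical.
The coefficients are the ones from the table of Pascal programs, and x
is the size in bytes of the a.out file for the other processor.

```c
#include <assert.h>

/* Estimated MCS6500 object size from the M68000 object size (Pascal programs). */
static double estimate_6500_from_68000(double x)
{
    assert(x >= 0.0);
    return 1.76 * x + 502.0;
}

/* Estimated MCS6500 object size from the pdp11/44 object size (Pascal programs). */
static double estimate_6500_from_pdp(double x)
{
    assert(x >= 0.0);
    return 1.80 * x - 1577.0;
}
```

The result is an estimate only, and how close it comes for a given
program depends on how well that program resembles the ones measured.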
|
|
.bp
|
|
.DS C
|
|
.B
|
|
SUMMARY.
|
|
.R
|
|
.DE
|
|
.NH 0
|
|
Summary
|
|
.PP
|
|
In this chapter some final conclusions are made.
|
|
.PP
|
|
In spite of its simplicity, the MCS6500 is strong enough to
|
|
implement an EM machine.
|
|
A serious deficiency of the MCS6500 is the lack of 16-bit
|
|
general purpose registers, and especially the lack of a
|
|
16-bit stackpointer.
|
|
As pointed out before, one 16-bit register can be simulated
|
|
by a pair of 8-bit registers, namely the accumulator A to
|
|
hold the highbyte, and the index register X to hold the lowbyte
|
|
of the word.
|
|
For lack of a 16-bit stackpointer, zero page must be used to hold
|
|
a stackpointer and there are also two subroutines needed for
|
|
manipulating the stack (Push and Pop).
|
|
.PP
|
|
As seen in the timing tests, the simple instruction set of the
|
|
MCS6500 forces the use of library subroutines.
|
|
These library subroutines increase the execution time of the
|
|
programs.
|
|
.PP
|
|
The sizes of the object code files show a strong correlation
|
|
in contrast to the execution times.
|
|
With this correlation one can estimate the size of a program
|
|
if it is to be used on a MCS6500.
|
|
.bp
|
|
.NH 0
|
|
.B
|
|
REFERENCES.
|
|
.R
|
|
.IP 1.
|
|
Osborne, A., Jacobson, S., and Kane, J. The MOS Technology MCS6500.
|
|
.B
|
|
An Introduction to Microcomputers,
|
|
.R
|
|
Volume II, Some Real Products (June 1977) chap. 9.
|
|
.RS
|
|
.PP
|
|
A hardware description of some existing CPUs, such as
|
|
the Zilog Z80, MCS6500, etc. is given in this book.
|
|
.RE
|
|
.IP 2.
|
|
van Staveren, H.
|
|
The table driven code generator from the Amsterdam Compiler Kit.
|
|
Vrije Universiteit, Amsterdam, (July 11, 1983).
|
|
.RS
|
|
.PP
|
|
The defining document for writing a back end table.
|
|
.RE
|
|
.IP 3.
|
|
Tanenbaum, A.S. Structured Computer Organization.
|
|
Prentice Hall. (1976).
|
|
.RS
|
|
.PP
|
|
In this book computers are described as a hierarchy of levels,
|
|
with each one performing some well-defined function.
|
|
.RE
|