.BP
.S1 "INTRODUCTION"
EM is a family of intermediate languages designed for producing
portable compilers.
The general strategy is for a program called
.B front end
to translate the source program to EM.
Another program,
.B back
.BW end
translates EM to target assembly language.
Alternatively, the EM code can be assembled to a binary form
and interpreted.
These considerations led to the following goals:
.IS 2 10
.PS 1 4
.PT
The design should allow translation to,
or interpretation on, a wide range of existing machines.
Design decisions should be delayed as far as possible
and the implications of these decisions should
be localized as much as possible.
.N
Current microcomputer technology offers 8, 16 and 32 bit machines
with various sizes of address space.
EM should be flexible enough to be useful on most of these
machines.
The differences between the members of the EM family should only
concern the wordsize and address space size.
.PT
The architecture should ease the task of code generation for
high level languages such as Pascal, C, Ada, Algol 68 and BCPL.
.PT
The instruction set used by the interpreter should be compact,
to reduce the amount of memory needed
for program storage, and to reduce the time needed to transmit
programs over communication lines.
.PT
It should be designed with microprogrammed implementations in
mind; in particular, the use of many short fields within
instruction opcodes should be avoided, because their extraction by the
microprogram or conversion to other instruction formats is inefficient.
.PE
.IE
.A
The basic architecture is based on the concept of a stack. The stack
is used for procedure return addresses, actual parameters, local variables,
and arithmetic operations.
There are several built-in object types,
for example, signed and unsigned integers,
floating point numbers, pointers and sets of bits.
There are instructions to push and pop objects
to and from the stack.
The push and pop instructions are not typed.
They only care about the size of the objects.
For each built-in type there are
reverse Polish type instructions that pop one or more
objects from the top of
the stack, perform an operation, and push the result back onto the
stack.
For all types except pointers,
these instructions have the object size
as argument.
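.P
The split between untyped push and pop and typed operations
can be illustrated with a minimal Python sketch.
It is not EM code: the names OperandStack and add_signed,
the little-endian byte order and the wrap-around on overflow
are all assumptions made for the example.
.nf
```python
class OperandStack:
    """Byte stack whose push and pop know only object sizes, not types."""

    def __init__(self):
        self.data = bytearray()

    def push(self, raw):
        self.data += raw                    # the size is simply len(raw)

    def pop(self, size):
        if size <= 0 or size > len(self.data):
            raise IndexError("bad pop size")
        raw = bytes(self.data[-size:])
        del self.data[-size:]
        return raw


def add_signed(stack, size):
    """Typed operation: treat the two top objects as signed integers."""
    right = int.from_bytes(stack.pop(size), "little", signed=True)
    left = int.from_bytes(stack.pop(size), "little", signed=True)
    # Assumption: overflow wraps around; the text does not specify this.
    total = (left + right) & ((1 << (8 * size)) - 1)
    stack.push(total.to_bytes(size, "little"))


stack = OperandStack()
stack.push((7).to_bytes(2, "little", signed=True))
stack.push((-3).to_bytes(2, "little", signed=True))
add_signed(stack, 2)                        # the size is the only argument
result = int.from_bytes(stack.pop(2), "little", signed=True)
```
.fi
Only add_signed gives the bytes a meaning;
push and pop would work unchanged for any other object of the same size.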
.P
There are no visible general registers used for arithmetic operands
etc. This is in contrast to most third generation computers, which usually
have 8 or 16 general registers. The decision not to have a group of
general registers was fully intentional, and follows W.L. van der
Poel's dictum that a machine should have 0, 1, or an infinite
number of any feature. General registers have two primary uses: to hold
intermediate results of complicated expressions, e.g.
.IS 5 0 1
((a*b + c*d)/e + f*g/h) * i
.IE 1
and to hold local variables.
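.P
For comparison, the expression above can be evaluated without any
general registers once it is written in reverse Polish order.
The following Python sketch is an illustration only;
the names RPN, evaluate and env and the sample values are hypothetical.
.nf
```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}

# ((a*b + c*d)/e + f*g/h) * i in reverse Polish order
RPN = "a b * c d * + e / f g * h / + i *".split()


def evaluate(rpn, env):
    """Evaluate rpn on a stack; also report the deepest stack seen."""
    stack, deepest = [], 0
    for token in rpn:
        if token in OPS:
            right = stack.pop()             # operands come off the top
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(env[token])        # operand: push its value
        deepest = max(deepest, len(stack))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0], deepest


env = dict(a=2, b=3, c=4, d=5, e=2, f=6, g=4, h=8, i=2)
value, deepest = evaluate(RPN, env)
```
.fi
With these sample values the result is 32.0,
and the stack never holds more than three intermediate values.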
.P
Various studies
have shown that the average expression has fewer than two operands,
making the former use of registers of doubtful value. The present trend
toward structured programs consisting of many small
procedures greatly reduces the value of registers to hold local variables
because the large number of procedure calls implies a large overhead in
saving and restoring the registers at every call.
.BP
.P
Although there are no general purpose registers, there are a
few internal registers with specific functions as follows:
.IS 2
.N 1
.TS
tab(:);
l 1 l l l.
PC:-:Program Counter:Points to the next instruction.
LB:-:Local Base:Points to base of the local variables \
in the current procedure.
SP:-:Stack Pointer:Points to the highest occupied word on the stack.
HP:-:Heap Pointer:Points to the top of the heap area.
.TE 1
.IE
.A
Furthermore, reverse Polish code is much easier to generate than
multi-register machine code, especially if highly efficient code is
desired.
When translating to assembly language the back end can make
good use of the target machine's registers.
An EM machine can
achieve high performance by keeping part of the stack
in high speed storage (a cache or microprogram scratchpad memory) rather
than in primary memory.
.P
Again according to van der Poel's dictum,
all EM instructions have zero or one argument.
We believe that instructions needing two arguments
can be split into two simpler ones.
The simpler ones can probably be used in other
circumstances as well.
Moreover, these two instructions together often
have a shorter encoding than the single
instruction they replace.
.P
This document describes EM at three different levels:
the abstract level, the assembly language level and
the machine language level.
.A
The most important level is that of the abstract EM architecture.
This level deals with the basic design issues.
Only the functional capabilities of instructions are relevant, not their
format or encoding.
Most chapters of this document refer to the abstract level
and it is explicitly stated whenever
another level is described.
.A
The assembly language is intended for the compiler writer.
It presents a more or less orthogonal instruction
set and provides symbolic names for data.
Moreover, it facilitates the linking of
separately compiled 'modules' into a single program
by providing several pseudoinstructions.
.A
The machine language is designed for interpretation with a compact
program text and easy decoding.
The binary representation of the machine language instruction set is
far from orthogonal.
Frequent instructions have a short opcode.
The encoding is fully byte oriented.
These bytes do not contain small bit fields, because
bit fields would slow down decoding considerably.
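.P
The idea of a byte oriented encoding without bit fields can be
sketched as a table-driven decoder.
The table below is hypothetical: the opcode numbers, the names
push_const and add_int and the little-endian operands are invented
for the example and are not the EM encoding.
.nf
```python
# opcode byte -> (name, implicit argument or None, operand bytes that follow)
TABLE = {
    0x00: ("push_const", 0, 0),     # frequent case: argument implied by opcode
    0x01: ("push_const", 1, 0),
    0x02: ("push_const", None, 1),  # one-byte argument follows
    0x03: ("push_const", None, 2),  # two-byte argument follows
    0x04: ("add_int", 2, 0),
}


def decode(code):
    """Decode a bytes object into a list of (name, argument) pairs."""
    pc, out = 0, []
    while pc < len(code):
        name, implicit, width = TABLE[code[pc]]   # whole byte, no bit fields
        pc += 1
        if width:
            if pc + width > len(code):
                raise ValueError("truncated operand")
            arg = int.from_bytes(code[pc:pc + width], "little", signed=True)
            pc += width
        else:
            arg = implicit
        out.append((name, arg))
    return out


program = bytes([0x01, 0x02, 0xFE, 0x03, 0x34, 0x12, 0x04])
decoded = decode(program)
```
.fi
Each opcode is resolved with a single table lookup on a whole byte.
The frequent small constants cost one byte, and the rarer large ones cost more.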
.P
A common use of EM is producing portable (cross) compilers.
When used this way, the compilers produce
EM assembly language as their output.
To run the compiled program on the target machine,
the back end translates the EM assembly language to
the target machine's assembly language.
When this approach is used, the format of the EM
machine language instructions is irrelevant.
On the other hand, when writing an interpreter for EM machine language
programs, the interpreter must deal with the machine language
and not with the symbolic assembly language.
.P
As mentioned above,
current microcomputer technology offers 8, 16 and 32 bit
machines with address spaces ranging from 2\v'-0.5m'16\v'0.5m'
to 2\v'-0.5m'32\v'0.5m' bytes.
Having one size of pointers and integers restricts
the usefulness of the language.
We decided to have a different language for each combination of
word and pointer size.
All languages offer the same instruction set and differ only in
memory alignment restrictions and the implicit size assumed in
several instructions.
The languages therefore
differ slightly between the
different size combinations,
for example in the
size of any object on the stack and in alignment restrictions.
The wordsize is restricted to powers of 2 and
the pointer size must be a multiple of the wordsize.
Almost all programs handling EM will be parametrized with word
and pointer size.
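.P
The two restrictions on size combinations can be written down as a
small check.
This Python sketch assumes sizes are counted in bytes;
the name valid_sizes and the candidate sizes 1, 2, 4 and 8 are
hypothetical and chosen only for illustration.
.nf
```python
def valid_sizes(word_size, pointer_size):
    """Return True if the combination meets both stated restrictions."""
    # The wordsize must be a power of 2.
    power_of_two = word_size > 0 and (word_size & (word_size - 1)) == 0
    # The pointer size must be a multiple of the wordsize.
    return (power_of_two
            and pointer_size > 0
            and pointer_size % word_size == 0)


# Enumerate the combinations allowed among a few candidate sizes.
combinations = [(w, p)
                for w in (1, 2, 4, 8)
                for p in (1, 2, 4, 8)
                if valid_sizes(w, p)]
```
.fi
A tool parametrized with word and pointer size could run such a
check once at start-up and reject combinations like a 4-byte word
with a 2-byte pointer.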