180 lines
		
	
	
	
		
			6.8 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			180 lines
		
	
	
	
		
			6.8 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
.BP
 | 
						|
.S1 "INTRODUCTION"
 | 
						|
EM is a family of intermediate languages designed for producing
 | 
						|
portable compilers.
 | 
						|
The general strategy is for a program called
 | 
						|
.B front end
 | 
						|
to translate the source program to EM.
 | 
						|
Another program,
 | 
						|
.B back
 | 
						|
.BW end
 | 
						|
translates EM to target assembly language.
 | 
						|
Alternatively, the EM code can be assembled to a binary form
 | 
						|
and interpreted.
 | 
						|
These considerations led to the following goals:
 | 
						|
.IS 2 10
 | 
						|
.PS 1 4
 | 
						|
.PT
 | 
						|
The design should allow translation to,
 | 
						|
or interpretation on, a wide range of existing machines.
 | 
						|
Design decisions should be delayed as far as possible
 | 
						|
and the implications of these decisions should
 | 
						|
be localized as much as possible.
 | 
						|
.N
 | 
						|
The current microcomputer technology offers 8, 16 and 32 bit machines
 | 
						|
with various sizes of address space.
 | 
						|
EM should be flexible enough to be useful on most of these
 | 
						|
machines.
 | 
						|
The differences between the members of the EM family should only
 | 
						|
concern the wordsize and address space size.
 | 
						|
.PT
 | 
						|
The architecture should ease the task of code generation for
 | 
						|
high level languages such as Pascal, C, Ada, Algol 68, BCPL.
 | 
						|
.PT
 | 
						|
The instruction set used by the interpreter should be compact,
 | 
						|
to reduce the amount of memory needed
 | 
						|
for program storage, and to reduce the time needed to transmit
 | 
						|
programs over communication lines.
 | 
						|
.PT
 | 
						|
It should be designed with microprogrammed implementations in
 | 
						|
mind; in particular, the use of many short fields within
 | 
						|
instruction opcodes should be avoided, because their extraction by the
 | 
						|
microprogram or conversion to other instruction formats is inefficient.
 | 
						|
.PE
 | 
						|
.IE
 | 
						|
.A
 | 
						|
The basic architecture is based on the concept of a stack. The stack
 | 
						|
is used for procedure return addresses, actual parameters, local variables,
 | 
						|
and arithmetic operations.
 | 
						|
There are several built-in object types,
 | 
						|
for example, signed and unsigned integers,
 | 
						|
floating point numbers, pointers and sets of bits.
 | 
						|
There are instructions to push and pop objects
 | 
						|
to and from the stack.
 | 
						|
The push and pop instructions are not typed.
 | 
						|
They only care about the size of the objects.
 | 
						|
For each built-in type there are
 | 
						|
reverse Polish type instructions that pop one or more
 | 
						|
objects from the top of
 | 
						|
the stack, perform an operation, and push the result back onto the
 | 
						|
stack.
 | 
						|
For all types except pointers,
 | 
						|
these instructions have the object size
 | 
						|
as argument.
 | 
						|
.P
 | 
						|
There are no visible general registers used for arithmetic operands
 | 
						|
etc. This is in contrast to most third generation computers, which usually
 | 
						|
have 8 or 16 general registers. The decision not to have a group of
 | 
						|
general registers was fully intentional, and follows W.L. Van der
 | 
						|
Poel's dictum that a machine should have 0, 1, or an infinite
 | 
						|
number of any feature. General registers have two primary uses: to hold
 | 
						|
intermediate results of complicated expressions, e.g.
 | 
						|
.IS 5 0 1
 | 
						|
((a*b + c*d)/e + f*g/h) * i
 | 
						|
.IE 1
 | 
						|
and to hold local variables.
 | 
						|
.P
 | 
						|
Various studies
 | 
						|
have shown that the average expression has fewer than two operands,
 | 
						|
making the former use of registers of doubtful value. The present trend
 | 
						|
toward structured programs consisting of many small
 | 
						|
procedures greatly reduces the value of registers to hold local variables
 | 
						|
because the large number of procedure calls implies a large overhead in
 | 
						|
saving and restoring the registers at every call.
 | 
						|
.BP
 | 
						|
.P
 | 
						|
Although there are no general purpose registers, there are a
 | 
						|
few internal registers with specific functions as follows:
 | 
						|
.IS 2
 | 
						|
.N 1
 | 
						|
.TS
 | 
						|
tab(:);
 | 
						|
l 1 l l.
 | 
						|
PC:-:Program Counter:Pointer to next instruction
 | 
						|
LB:-:Local Base:Points to base of the local variables \
 | 
						|
in the current procedure.
 | 
						|
SP:-:Stack Pointer:Points to the highest occupied word on the stack.
 | 
						|
HP:-:Heap Pointer:Points to the top of the heap area.
 | 
						|
.TE 1
 | 
						|
.IE
 | 
						|
.A
 | 
						|
Furthermore, reverse Polish code is much easier to generate than
 | 
						|
multi-register machine code, especially if highly efficient code is
 | 
						|
desired.
 | 
						|
When translating to assembly language the back end can make
 | 
						|
good use of the target machine's registers.
 | 
						|
An EM machine can
 | 
						|
achieve high performance by keeping part of the stack
 | 
						|
in high speed storage (a cache or microprogram scratchpad memory) rather
 | 
						|
than in primary memory.
 | 
						|
.P
 | 
						|
Again according to van der Poel's dictum,
 | 
						|
all EM instructions have zero or one argument.
 | 
						|
We believe that instructions needing two arguments
 | 
						|
can be split into two simpler ones.
 | 
						|
The simpler ones can probably be used in other
 | 
						|
circumstances as well.
 | 
						|
Moreover, these two instructions together often
 | 
						|
have a shorter encoding than the single
 | 
						|
instruction before.
 | 
						|
.P
 | 
						|
This document describes EM at three different levels:
 | 
						|
the abstract level, the assembly language level and
 | 
						|
the machine language level.
 | 
						|
.A
 | 
						|
The most important level is that of the abstract EM architecture.
 | 
						|
This level deals with the basic design issues.
 | 
						|
Only the functional capabilities of instructions are relevant, not their
 | 
						|
format or encoding.
 | 
						|
Most chapters of this document refer to the abstract level
 | 
						|
and it is explicitly stated whenever
 | 
						|
another level is described.
 | 
						|
.A
 | 
						|
The assembly language is intended for the compiler writer.
 | 
						|
It presents a more or less orthogonal instruction
 | 
						|
set and provides symbolic names for data.
 | 
						|
Moreover, it facilitates the linking of
 | 
						|
separately compiled 'modules' into a single program
 | 
						|
by providing several pseudoinstructions.
 | 
						|
.A
 | 
						|
The machine language is designed for interpretation with a compact
 | 
						|
program text and easy decoding.
 | 
						|
The binary representation of the machine language instruction set is
 | 
						|
far from orthogonal.
 | 
						|
Frequent instructions have a short opcode.
 | 
						|
The encoding is fully byte oriented.
 | 
						|
These bytes do not contain small bit fields, because
 | 
						|
bit fields would slow down decoding considerably.
 | 
						|
.P
 | 
						|
A common use for EM is for producing portable (cross) compilers.
 | 
						|
When used this way, the compilers produce
 | 
						|
EM assembly language as their output.
 | 
						|
To run the compiled program on the target machine,
 | 
						|
the back end, translates the EM assembly language to
 | 
						|
the target machine's assembly language.
 | 
						|
When this approach is used, the format of the EM
 | 
						|
machine language instructions is irrelevant.
 | 
						|
On the other hand, when writing an interpreter for EM machine language
 | 
						|
programs, the interpreter must deal with the machine language
 | 
						|
and not with the symbolic assembly language.
 | 
						|
.P
 | 
						|
As mentioned above, the
 | 
						|
current microcomputer technology offers 8, 16 and 32 bit
 | 
						|
machines with address spaces ranging from 2\v'-0.5m'16\v'0.5m'
 | 
						|
to 2\v'-0.5m'32\v'0.5m' bytes.
 | 
						|
Having one size of pointers and integers restricts
 | 
						|
the usefulness of the language.
 | 
						|
We decided to have a different language for each combination of
 | 
						|
word and pointer size.
 | 
						|
All languages offer the same instruction set and differ only in
 | 
						|
memory alignment restrictions and the implicit size assumed in
 | 
						|
several instructions.
 | 
						|
The languages
 | 
						|
differ slightly for the
 | 
						|
different size combinations.
 | 
						|
For example: the
 | 
						|
size of any object on the stack and alignment restrictions.
 | 
						|
The wordsize is restricted to powers of 2 and
 | 
						|
the pointer size must be a multiple of the wordsize.
 | 
						|
Almost all programs handling EM will be parametrized with word
 | 
						|
and pointer size.
 |