.BP .S1 "INTRODUCTION" EM is a family of intermediate languages designed for producing portable compilers. The general strategy is for a program called .B front end to translate the source program to EM. Another program, .B back .BW end translates EM to target assembly language. Alternatively, the EM code can be assembled to a binary form and interpreted. These considerations led to the following goals: .IS 2 10 .PS 1 4 .PT The design should allow translation to, or interpretation on, a wide range of existing machines. Design decisions should be delayed as far as possible and the implications of these decisions should be localized as much as possible. .N The current microcomputer technology offers 8, 16 and 32 bit machines with various sizes of address space. EM should be flexible enough to be useful on most of these machines. The differences between the members of the EM family should only concern the wordsize and address space size. .PT The architecture should ease the task of code generation for high level languages such as Pascal, C, Ada, Algol 68, BCPL. .PT The instruction set used by the interpreter should be compact, to reduce the amount of memory needed for program storage, and to reduce the time needed to transmit programs over communication lines. .PT It should be designed with microprogrammed implementations in mind; in particular, the use of many short fields within instruction opcodes should be avoided, because their extraction by the microprogram or conversion to other instruction formats is inefficient. .PE .IE .A The basic architecture is based on the concept of a stack. The stack is used for procedure return addresses, actual parameters, local variables, and arithmetic operations. There are several built-in object types, for example, signed and unsigned integers, floating point numbers, pointers and sets of bits. There are instructions to push and pop objects to and from the stack. The push and pop instructions are not typed. They only care about the size of the objects. For each built-in type there are reverse Polish type instructions that pop one or more objects from the top of the stack, perform an operation, and push the result back onto the stack. For all types except pointers, these instructions have the object size as argument. .P There are no visible general registers used for arithmetic operands etc. This is in contrast to most third generation computers, which usually have 8 or 16 general registers. The decision not to have a group of general registers was fully intentional, and follows W.L. Van der Poel's dictum that a machine should have 0, 1, or an infinite number of any feature. General registers have two primary uses: to hold intermediate results of complicated expressions, e.g. .IS 5 0 1 ((a*b + c*d)/e + f*g/h) * i .IE 1 and to hold local variables. .P Various studies have shown that the average expression has fewer than two operands, making the former use of registers of doubtful value. The present trend toward structured programs consisting of many small procedures greatly reduces the value of registers to hold local variables because the large number of procedure calls implies a large overhead in saving and restoring the registers at every call. .BP .P Although there are no general purpose registers, there are a few internal registers with specific functions as follows: .IS 2 .N 1 .TS tab(:); l 1 l l. PC:\-:Program Counter:Pointer to next instruction LB:\-:Local Base:Points to base of the local variables \ in the current procedure. SP:\-:Stack Pointer:Points to the highest occupied word on the stack. HP:\-:Heap Pointer:Points to the top of the heap area. .TE 1 .IE .A Furthermore, reverse Polish code is much easier to generate than multi-register machine code, especially if highly efficient code is desired. When translating to assembly language the back end can make good use of the target machine's registers. An EM machine can achieve high performance by keeping part of the stack in high speed storage (a cache or microprogram scratchpad memory) rather than in primary memory. .P Again according to van der Poel's dictum, all EM instructions have zero or one argument. We believe that instructions needing two arguments can be split into two simpler ones. The simpler ones can probably be used in other circumstances as well. Moreover, these two instructions together often have a shorter encoding than the single instruction before. .P This document describes EM at three different levels: the abstract level, the assembly language level and the machine language level. .A The most important level is that of the abstract EM architecture. This level deals with the basic design issues. Only the functional capabilities of instructions are relevant, not their format or encoding. Most chapters of this document refer to the abstract level and it is explicitly stated whenever another level is described. .A The assembly language is intended for the compiler writer. It presents a more or less orthogonal instruction set and provides symbolic names for data. Moreover, it facilitates the linking of separately compiled 'modules' into a single program by providing several pseudoinstructions. .A The machine language is designed for interpretation with a compact program text and easy decoding. The binary representation of the machine language instruction set is far from orthogonal. Frequent instructions have a short opcode. The encoding is fully byte oriented. These bytes do not contain small bit fields, because bit fields would slow down decoding considerably. .P A common use for EM is for producing portable (cross) compilers. When used this way, the compilers produce EM assembly language as their output. To run the compiled program on the target machine, the back end, translates the EM assembly language to the target machine's assembly language. When this approach is used, the format of the EM machine language instructions is irrelevant. On the other hand, when writing an interpreter for EM machine language programs, the interpreter must deal with the machine language and not with the symbolic assembly language. .P As mentioned above, the current microcomputer technology offers 8, 16 and 32 bit machines with address spaces ranging from 2\v'-0.5m'16\v'0.5m' to 2\v'-0.5m'32\v'0.5m' bytes. Having one size of pointers and integers restricts the usefulness of the language. We decided to have a different language for each combination of word and pointer size. All languages offer the same instruction set and differ only in memory alignment restrictions and the implicit size assumed in several instructions. The languages differ slightly for the different size combinations. For example: the size of any object on the stack and alignment restrictions. The wordsize is restricted to powers of 2 and the pointer size must be a multiple of the wordsize. Almost all programs handling EM will be parametrized with word and pointer size.