Now uses -ms macros

ceriel 1993-03-30 15:43:44 +00:00
parent e9a4337ccf
commit 5117853b1b
19 changed files with 647 additions and 667 deletions

@ -1,9 +1,7 @@
Makefile proto.make
READ_ME READ_ME
addend.n
app.codes.nr app.codes.nr
app.exam.nr app.exam.nr
app.int.nr
assem.nr assem.nr
cont.nr cont.nr
descr.nr descr.nr

@ -3,7 +3,4 @@ DESCRIPTION OF A MACHINE ARCHITECTURE FOR USE WITH BLOCK STRUCTURED LANGUAGES
The file em.i (text of the defining interpreter) was hand-edited from int/em.p The file em.i (text of the defining interpreter) was hand-edited from int/em.p
To print, set NROFF and TBL in the Makefile and call make.
It uses the kun macro package which is also distributed.
The directory int contains the interpreter. The directory int contains the interpreter.

@ -1,4 +1,4 @@
.BP .bp
.AP "EM CODE TABLES" .AP "EM CODE TABLES"
The following table is used by the assembler for EM machine The following table is used by the assembler for EM machine
language. language.
@ -10,65 +10,66 @@ Each line describes a range of interpreter opcodes by
specifying for which instruction the range is used, the type of the specifying for which instruction the range is used, the type of the
opcodes (mini, shortie, etc..) and range for the instruction opcodes (mini, shortie, etc..) and range for the instruction
argument. argument.
.A .QQ
The first field on each line gives the EM instruction mnemonic, The first field on each line gives the EM instruction mnemonic,
the second field gives some flags. the second field gives some flags.
If the opcodes are minis or shorties the third field specifies If the opcodes are minis or shorties the third field specifies
how many minis/shorties are used. how many minis/shorties are used.
The last field gives the number of the (first) interpreter The last field gives the number of the (first) interpreter
opcode. opcode.
.N 1 .LP
Flags : Flags :
.IS 3 .IP ""
.N 1
Opcode type, only one of the following may be specified. Opcode type, only one of the following may be specified.
.PS - 5 " " .RS
.PT \- .IP \-
opcode without argument opcode without argument
.PT m .IP m
mini mini
.PT s .IP s
shortie shortie
.PT 2 .IP 2
opcode with 2-byte signed argument opcode with 2-byte signed argument
.PT 4 .IP 4
opcode with 4-byte signed argument opcode with 4-byte signed argument
.PT 8 .IP 8
opcode with 8-byte signed argument opcode with 8-byte signed argument
.PT u .IP u
opcode with 2-byte unsigned argument opcode with 2-byte unsigned argument
.PE .RE
.IP ""
Secondary (escaped) opcodes. Secondary (escaped) opcodes.
.PS - 5 " " .RS
.PT e .IP e
The opcode thus marked is in the secondary opcode group instead The opcode thus marked is in the secondary opcode group instead
of the primary of the primary
.PE .RE
.IP ""
restrictions on arguments restrictions on arguments
.PS - 5 " " .RS
.PT N .IP N
Negative arguments only Negative arguments only
.PT P .IP P
Positive and zero arguments only Positive and zero arguments only
.PE .RE
.IP ""
mapping of arguments mapping of arguments
.PS - 5 " " .RS
.PT w .IP w
argument must be divisible by the wordsize and is divided by the argument must be divisible by the wordsize and is divided by the
wordsize before use as opcode argument. wordsize before use as opcode argument.
.PT o .IP o
argument ( possibly after division ) must be >= 1 and is argument ( possibly after division ) must be >= 1 and is
decremented before use as opcode argument decremented before use as opcode argument
.PE .RE
.IE .LP
If the opcode type is 2,4 or 8 the resulting argument is used as If the opcode type is 2,4 or 8 the resulting argument is used as
opcode argument (least significant byte first). opcode argument (least significant byte first).
.N
If the opcode type is mini, the argument is added If the opcode type is mini, the argument is added
to the first opcode \- if in range \- . to the first opcode \- if in range \- .
If the argument is negative, the absolute value minus one is If the argument is negative, the absolute value minus one is
used in the algorithm above. used in the algorithm above.
.N .br
For shorties with positive arguments the first opcode is used For shorties with positive arguments the first opcode is used
for arguments in the range 0..255, the second for the range for arguments in the range 0..255, the second for the range
256..511, etc.. 256..511, etc..
@ -78,30 +79,32 @@ for arguments in the range \-1..\-256, the second for the range
The byte following the opcode contains the least significant The byte following the opcode contains the least significant
byte of the argument. byte of the argument.
First some examples of these specifications. First some examples of these specifications.
.PS - 5 .IP "aar mwPo 1 34"
.PT "aar mwPo 1 34" .br
Indicates that opcode 34 is used as a mini for Positive Indicates that opcode 34 is used as a mini for Positive
instruction arguments only. instruction arguments only.
The w and o indicate division and decrementing of the The w and o indicate division and decrementing of the
instruction argument. instruction argument.
Because the resulting argument must be zero ( only opcode 34 may be used Because the resulting argument must be zero ( only opcode 34 may be used),
), this mini can only be used for instruction argument 2. this mini can only be used for instruction argument 2.
Conclusion: opcode 34 is for "AAR 2". Conclusion: opcode 34 is for "AAR 2".
.PT "adp sP 1 41" .IP "adp sP 1 41"
.br
Opcode 41 is used as shortie for ADP with arguments in the range Opcode 41 is used as shortie for ADP with arguments in the range
0..255. 0..255.
.PT "bra sN 2 60" .IP "bra sN 2 60"
.br
Opcode 60 is used as shortie for BRA with arguments \-1..\-256, Opcode 60 is used as shortie for BRA with arguments \-1..\-256,
61 is used for arguments \-257..\-512. 61 is used for arguments \-257..\-512.
.PT "zer e\- 145" .IP "zer e\- 145"
.br
Escaped opcode 145 is used for ZER. Escaped opcode 145 is used for ZER.
.PE .LP
The interpreter opcode table: The interpreter opcode table:
.N 1 .DS
.IS 3
.so itables .so itables
.IE .DE
.P .PP
The table above results in the following dispatch tables. The table above results in the following dispatch tables.
Dispatch tables are used by interpreters to jump to the Dispatch tables are used by interpreters to jump to the
routines implementing the EM instructions, indexed by the next opcode. routines implementing the EM instructions, indexed by the next opcode.
@ -110,60 +113,41 @@ of eight consecutive opcodes, preceded by the first opcode number
on that line. on that line.
Routine names consist of an EM mnemonic followed by a suffix. Routine names consist of an EM mnemonic followed by a suffix.
The suffices show the encoding used for each opcode. The suffices show the encoding used for each opcode.
.N .LP
The following suffices exist: The following suffices exist:
.N 1 .TS
.VS 1 0 tab(:);
.IS 4 l l.
.PS - 11 .z:no arguments
.PT .z .l:16-bit argument
no arguments .L:32-bit argument
.PT .l .u:16-bit unsigned argument
16-bit argument .lw:16-bit argument divided by the wordsize
.PT .L .Lw:32-bit argument divided by the wordsize
32-bit argument .p:positive 16-bit argument
.PT .u .P:positive 32-bit argument
16-bit unsigned argument .pw:positive 16-bit argument divided by the wordsize
.PT .lw .Pw:positive 32-bit argument divided by the wordsize
16-bit argument divided by the wordsize .n:negative 16-bit argument
.PT .Lw .N:negative 32-bit argument
32-bit argument divided by the wordsize .nw:negative 16-bit argument divided by the wordsize
.PT .p .Nw:negative 32-bit argument divided by the wordsize
positive 16-bit argument .s<num>:shortie with <num> as high order argument byte
.PT .P .w<num>:shortie with argument divided by the wordsize
positive 32-bit argument .<num>:mini with <num> as argument
.PT .pw .<num>W:mini with <num>*wordsize as argument
positive 16-bit argument divided by the wordsize .TE
.PT .Pw .LP
positive 32-bit argument divided by the wordsize
.PT .n
negative 16-bit argument
.PT .N
negative 32-bit argument
.PT .nw
negative 16-bit argument divided by the wordsize
.PT .Nw
negative 32-bit argument divided by the wordsize
.PT .s<num>
shortie with <num> as high order argument byte
.PT .w<num>
shortie with argument divided by the wordsize
.PT .<num>
mini with <num> as argument
.PT .<num>W
mini with <num>*wordsize as argument
.PE 1
<num> is a possibly negative integer. <num> is a possibly negative integer.
.VS .LP
.IE
The dispatch table for the 256 primary opcodes: The dispatch table for the 256 primary opcodes:
.N 1 .sp 1
.so dispat1 .so dispat1
.N 2 .sp 2
The list of secondary opcodes (escape1): The list of secondary opcodes (escape1):
.N 1 .sp 1
.so dispat2 .so dispat2
.N 2 .sp 2
Finally, the list of opcodes with four byte arguments (escape2). Finally, the list of opcodes with four byte arguments (escape2).
.N 1 .sp 1
.so dispat3 .so dispat3
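
Reading aid for the flag descriptions in this appendix: a minimal Python sketch of how one line of the opcode table could be matched against an instruction argument. The function name `encode`, its signature and the default wordsize of 2 are assumptions made for illustration and do not come from the EM sources. The `e` flag and the 2, 4, 8 and u opcode types are left out, and the trailing byte for negative shorties assumes two's complement.

```python
def encode(flags, count, first, arg, wordsize=2):
    """Try to encode `arg` with one table line such as "aar mwPo 1 34".

    flags    -- the flag field, e.g. "mwPo" or "sN"
    count    -- number of minis/shorties on the line (third field)
    first    -- number of the first interpreter opcode (last field)
    Returns (opcode, trailing_bytes), or None if the line does not apply.
    """
    # Restrictions on arguments.
    if "N" in flags and arg >= 0:
        return None
    if "P" in flags and arg < 0:
        return None
    # Mapping of arguments: divide by the wordsize, then decrement.
    if "w" in flags:
        if arg % wordsize != 0:
            return None
        arg //= wordsize
    if "o" in flags:
        if arg < 1:
            return None
        arg -= 1
    # Negative arguments use the absolute value minus one.
    index = -arg - 1 if arg < 0 else arg
    if "m" in flags:
        # Mini: the argument is added to the first opcode, if in range.
        return (first + index, b"") if index < count else None
    if "s" in flags:
        # Shortie: one opcode per block of 256 argument values.
        high = index >> 8
        if high >= count:
            return None
        # Assumption: the following byte is the low byte of the argument.
        return (first + high, bytes([arg & 0xFF]))
    return None  # other opcode types are outside this sketch


print(encode("mwPo", 1, 34, 2))   # AAR 2 with wordsize 2
print(encode("sP", 1, 41, 255))   # ADP 255
print(encode("sN", 2, 60, -257))  # BRA -257
```

With a wordsize of 2 the three calls select opcodes 34, 41 and 61, matching the examples given for aar, adp and bra in the text.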

@ -1,7 +1,7 @@
.BP .bp
.AP "AN EXAMPLE PROGRAM" .AP "AN EXAMPLE PROGRAM"
.A 1 0 .PP
.NA .na
.ta 4n 8n 12n 16n 20n .ta 4n 8n 12n 16n 20n
.nf .nf
1 program example(output); 1 program example(output);
@ -45,12 +45,12 @@
39 test(r) 39 test(r)
40 end. 40 end.
.fi .fi
.AD .ad
.BP .bp
The EM code as produced by the Pascal-VU compiler is given below. Comments The EM code as produced by the Pascal-VU compiler is given below. Comments
have been added manually. Note that this code has already been optimized. have been added manually. Note that this code has already been optimized.
.A 1 0 .LP
.NA .na
.nf .nf
.ta 1n 24n .ta 1n 24n
mes 2,2,2 ; wordsize 2, pointersize 2 mes 2,2,2 ; wordsize 2, pointersize 2
@ -231,14 +231,13 @@ have been added manually. Note that this code has already been optimized.
end 0 end 0
mes 5 ; reals were used mes 5 ; reals were used
.fi .fi
.AD .ad
.A 1 0 .PP
The compact code corresponding to the above program is listed below. The compact code corresponding to the above program is listed below.
Read it horizontally, line by line, not column by column. Read it horizontally, line by line, not column by column.
Each number represents a byte of compact code, printed in decimal. Each number represents a byte of compact code, printed in decimal.
The first two bytes form the magic word. The first two bytes form the magic word.
.N 1 .LP
.IS 3
.Dr 33 .Dr 33
173 0 159 122 122 122 255 242 1 161 250 124 116 46 112 0 173 0 159 122 122 122 255 242 1 161 250 124 116 46 112 0
255 156 245 40 2 245 0 128 120 155 249 123 115 117 109 160 255 156 245 40 2 245 0 128 120 155 249 123 115 117 109 160
@ -274,4 +273,3 @@ The first two bytes form the magic word.
116 8 122 69 120 20 249 124 95 104 108 116 8 122 152 120 116 8 122 69 120 20 249 124 95 104 108 116 8 122 152 120
159 124 160 255 159 125 255 159 124 160 255 159 125 255
.De .De
.IE

@ -1,6 +1,6 @@
.BP .bp
.SN 11 .P1 "EM ASSEMBLY LANGUAGE"
.S1 "EM ASSEMBLY LANGUAGE" .PP
We use two representations for assembly language programs, We use two representations for assembly language programs,
one is in ASCII and the other is the compact assembly language. one is in ASCII and the other is the compact assembly language.
The latter needs less space than the first for the same program The latter needs less space than the first for the same program
@ -16,7 +16,8 @@ The last part lists the EM instructions with the type of
arguments allowed and an indication of the function. arguments allowed and an indication of the function.
Appendix A gives a detailed description of the effect of all Appendix A gives a detailed description of the effect of all
instructions in the form of a Pascal program. instructions in the form of a Pascal program.
.S2 "ASCII assembly language" .P2 "ASCII assembly language"
.PP
An assembly language program consists of a series of lines, each An assembly language program consists of a series of lines, each
line may be blank, contain one (pseudo)instruction or contain one line may be blank, contain one (pseudo)instruction or contain one
label. label.
@ -25,13 +26,13 @@ Upper case is used in this
document merely to distinguish keywords from the surrounding prose. document merely to distinguish keywords from the surrounding prose.
Comment is allowed at the end of each line and starts with a semicolon ";". Comment is allowed at the end of each line and starts with a semicolon ";".
This kind of comment does not exist in the compact form. This kind of comment does not exist in the compact form.
.A .QQ
Labels must be placed all by themselves on a line and start in Labels must be placed all by themselves on a line and start in
column 1. column 1.
There are two kinds of labels, instruction and data labels. There are two kinds of labels, instruction and data labels.
Instruction labels are unsigned positive integers. Instruction labels are unsigned positive integers.
The scope of an instruction label is its procedure. The scope of an instruction label is its procedure.
.A .QQ
The pseudoinstructions CON, ROM and BSS may be preceded by a The pseudoinstructions CON, ROM and BSS may be preceded by a
line containing a line containing a
1\-8 character data label, the first character of which is a 1\-8 character data label, the first character of which is a
@ -46,13 +47,13 @@ These labels are considered as a special case and handled
more efficiently in compact assembly language (see below). more efficiently in compact assembly language (see below).
Note that a data label on its own or two consecutive labels are not Note that a data label on its own or two consecutive labels are not
allowed. allowed.
.P .PP
Each statement may contain an instruction mnemonic or pseudoinstruction. Each statement may contain an instruction mnemonic or pseudoinstruction.
These must begin in column 2 or later (not column 1) and must be followed These must begin in column 2 or later (not column 1) and must be followed
by a space, tab, semicolon or LF. by a space, tab, semicolon or LF.
Everything on the line following a semicolon is Everything on the line following a semicolon is
taken as a comment. taken as a comment.
.P .PP
Each input file contains one module. Each input file contains one module.
A module may contain many procedures, A module may contain many procedures,
which may be nested. which may be nested.
@ -62,14 +63,15 @@ collection of instructions and pseudoinstructions and finally an END
statement. statement.
Pseudoinstructions are also allowed between procedures. Pseudoinstructions are also allowed between procedures.
They do not belong to a specific procedure. They do not belong to a specific procedure.
.P .PP
All constants in EM are interpreted in the decimal base. All constants in EM are interpreted in the decimal base.
The ASCII assembly language accepts constant expressions The ASCII assembly language accepts constant expressions
wherever constants are allowed. wherever constants are allowed.
The operators recognized are: +, \-, *, % and / with the usual The operators recognized are: +, \-, *, % and / with the usual
precedence order. precedence order.
Use of the parentheses ( and ) to alter the precedence order is allowed. Use of the parentheses ( and ) to alter the precedence order is allowed.
.S3 "Instruction arguments" .P3 "Instruction arguments"
.PP
Unlike many other assembly languages, the EM assembly Unlike many other assembly languages, the EM assembly
language requires all arguments of normal and pseudoinstructions language requires all arguments of normal and pseudoinstructions
to be either a constant or an identifier, but not a combination to be either a constant or an identifier, but not a combination
@ -87,7 +89,7 @@ It is not allowed to add or subtract from instruction labels or procedure
identifiers, identifiers,
which certainly is not a severe restriction and greatly aids which certainly is not a severe restriction and greatly aids
optimization. optimization.
.P .PP
Instruction arguments can be constants, Instruction arguments can be constants,
data labels, data labels offsetted by a constant, instruction data labels, data labels offsetted by a constant, instruction
labels and procedure identifiers. labels and procedure identifiers.
@ -98,7 +100,7 @@ that fit in a word.
Arguments used as offsets to pointers should fit in a Arguments used as offsets to pointers should fit in a
pointer-sized integer. pointer-sized integer.
Finally, arguments to LDC should fit in a double-word integer. Finally, arguments to LDC should fit in a double-word integer.
.P .PP
Several instructions have two possible forms: Several instructions have two possible forms:
with an explicit argument and with an implicit argument on top of the stack. with an explicit argument and with an implicit argument on top of the stack.
The size of the implicit argument is the wordsize. The size of the implicit argument is the wordsize.
@ -109,7 +111,7 @@ integers on top of the stack are to be compared.
on top of the stack that specifies the size of the integers to on top of the stack that specifies the size of the integers to
be compared. be compared.
Thus the following two sequences are equivalent: Thus the following two sequences are equivalent:
.N 1 .KS
.TS .TS
center, tab(:) ; center, tab(:) ;
l r 30 l r. l r 30 l r.
@ -118,16 +120,18 @@ LDL:\-14:LDL:\-14
::LOC:4 ::LOC:4
CMI:4:CMI: CMI:4:CMI:
ZEQ:*1:ZEQ:*1 ZEQ:*1:ZEQ:*1
.TE 1 .TE
.KE
Section 11.1.6 shows the arguments allowed for each instruction. Section 11.1.6 shows the arguments allowed for each instruction.
.S3 "Pseudoinstruction arguments" .P3 "Pseudoinstruction arguments"
.PP
Pseudoinstruction arguments can be divided in two classes: Pseudoinstruction arguments can be divided in two classes:
Initializers and others. Initializers and others.
The following initializers are allowed: signed integer constants, The following initializers are allowed: signed integer constants,
unsigned integer constants, floating-point constants, strings, unsigned integer constants, floating-point constants, strings,
data labels, data labels offsetted by a constant, instruction data labels, data labels offsetted by a constant, instruction
labels and procedure identifiers. labels and procedure identifiers.
.P .PP
Constant initializers in BSS, HOL, CON and ROM pseudoinstructions Constant initializers in BSS, HOL, CON and ROM pseudoinstructions
can be followed by a letter I, U or F. can be followed by a letter I, U or F.
This indicator This indicator
@ -142,10 +146,9 @@ As in instruction arguments, initializers include expressions of the form:
\&"LABEL+offset" and "LABEL\-offset". \&"LABEL+offset" and "LABEL\-offset".
The offset must be an unsigned decimal constant. The offset must be an unsigned decimal constant.
The 'IUF' indicators cannot be used in the offsets. The 'IUF' indicators cannot be used in the offsets.
.P .PP
Data labels are referred to by their name. Data labels are referred to by their name.
.P .PP
Strings are surrounded by double quotes ("). Strings are surrounded by double quotes (").
Semicolon's in string do not indicate the start of comment. Semicolon's in string do not indicate the start of comment.
In the ASCII representation the escape character \e (backslash) In the ASCII representation the escape character \e (backslash)
@ -153,7 +156,6 @@ alters the meaning of subsequent character(s).
This feature allows inclusion of zeroes, graphic characters and This feature allows inclusion of zeroes, graphic characters and
the double quote in the string. the double quote in the string.
The following escape sequences exist: The following escape sequences exist:
.DS
.TS .TS
center, tab(:); center, tab(:);
l l l. l l l.
@ -166,7 +168,6 @@ backslash:\e:\e\e
double quote:":\e" double quote:":\e"
bit pattern:\fBddd\fP:\e\fBddd\fP bit pattern:\fBddd\fP:\e\fBddd\fP
.TE .TE
.DE
The escape \fB\eddd\fP consists of the backslash followed by 1, The escape \fB\eddd\fP consists of the backslash followed by 1,
2, or 3 octal digits specifying the value of 2, or 3 octal digits specifying the value of
the desired character. the desired character.
@ -176,17 +177,18 @@ the backslash is ignored.
Example: CON "hello\e012\e0". Example: CON "hello\e012\e0".
Each string element initializes a single byte. Each string element initializes a single byte.
The ASCII character set is used to map characters onto values. The ASCII character set is used to map characters onto values.
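
Reading aid for the escape rules above: a minimal Python sketch that turns the body of an ASCII assembly string into the bytes it initializes. The function name `decode_string` is hypothetical. Only the backslash, double quote and octal escapes are handled; the remaining rows of the escape table are not visible in this diff and are therefore not modelled, so any other escaped character simply loses its backslash. Masking octal values to one byte is also an assumption.

```python
OCTAL_DIGITS = "01234567"


def decode_string(body):
    """Decode the text between the double quotes of an EM string."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\" or i == len(body):
            # Ordinary character (or a stray trailing backslash).
            out.append(ord(ch))
            continue
        if body[i] in OCTAL_DIGITS:
            # Backslash followed by 1, 2 or 3 octal digits.
            j = i
            while j < len(body) and j < i + 3 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)  # assumption: keep one byte
            i = j
        else:
            # Backslash or double quote; for anything else the backslash
            # is dropped and the character is kept.
            out.append(ord(body[i]))
            i += 1
    return bytes(out)


# The example from the text: CON "hello\012\0"
print(decode_string("hello\\012\\0"))
```

Each element of the result corresponds to one initialized byte, so the example yields seven bytes: the five letters, a line feed (octal 012) and a zero byte.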
.P .PP
Instruction labels are referred to as *1, *2, etc. in both branch Instruction labels are referred to as *1, *2, etc. in both branch
instructions and as initializers. instructions and as initializers.
.P .PP
The notation $procname means the identifier for the procedure The notation $procname means the identifier for the procedure
with the specified name. with the specified name.
This identifier has the size of a pointer. This identifier has the size of a pointer.
.S3 Notation .P3 Notation
.PP
First, the notation used for the arguments, classes of First, the notation used for the arguments, classes of
instructions and pseudoinstructions. instructions and pseudoinstructions.
.IS 2 .DS
.TS .TS
tab(:); tab(:);
l l l. l l l.
@ -204,9 +206,10 @@ l l l.
<...>+:\&=:one or more of <...> <...>+:\&=:one or more of <...>
[...]:\&=:optional ... [...]:\&=:optional ...
.TE .TE
.IE .DE
.S3 "Pseudoinstructions" .P3 "Pseudoinstructions"
.S4 "Storage declaration" .P4 "Storage declaration"
.PP
Initialized global data is allocated by the pseudoinstruction CON, Initialized global data is allocated by the pseudoinstruction CON,
which needs at least one argument. which needs at least one argument.
Each argument is used to allocate and initialize a number of Each argument is used to allocate and initialize a number of
@ -215,7 +218,7 @@ The number of bytes to be allocated and the alignment depend on the type
of the argument. of the argument.
For each argument, an integral number of words, For each argument, an integral number of words,
determined by the argument type, is allocated and initialized. determined by the argument type, is allocated and initialized.
.P .PP
The pseudoinstruction ROM is the same as CON, The pseudoinstruction ROM is the same as CON,
except that it guarantees that the initialized words except that it guarantees that the initialized words
will not change during the execution of the program. will not change during the execution of the program.
@ -223,7 +226,7 @@ This information allows optimizers to do
certain calculations such as array indexing and certain calculations such as array indexing and
subrange checking at compile time instead subrange checking at compile time instead
of at run time. of at run time.
.P .PP
The pseudoinstruction BSS allocates The pseudoinstruction BSS allocates
uninitialized global data or large blocks of data initialized uninitialized global data or large blocks of data initialized
by the same value. by the same value.
@ -239,14 +242,14 @@ the second byte by 1 etc. in assembly language.
The assembler/loader adds the base address of The assembler/loader adds the base address of
the HOL block to these numbers to obtain the the HOL block to these numbers to obtain the
absolute address in the machine language. absolute address in the machine language.
.P .PP
The scope of a HOL block starts at the HOL pseudo and The scope of a HOL block starts at the HOL pseudo and
ends at the next HOL pseudo or at the end of a module ends at the next HOL pseudo or at the end of a module
whatever comes first. whatever comes first.
Each instruction falls in the scope of at most one Each instruction falls in the scope of at most one
HOL block, the current HOL block. HOL block, the current HOL block.
It is not allowed to have more than one HOL block per procedure. It is not allowed to have more than one HOL block per procedure.
.P .PP
The alignment restrictions are enforced by the The alignment restrictions are enforced by the
pseudoinstructions. pseudoinstructions.
All initializers are aligned on a multiple of their size or the wordsize All initializers are aligned on a multiple of their size or the wordsize
@ -257,52 +260,51 @@ Switching to another type of fragment or placing a label forces
word-alignment. word-alignment.
There are three types of fragments in global data space: CON, ROM and There are three types of fragments in global data space: CON, ROM and
BSS/HOL. BSS/HOL.
.N 1 .IP "BSS <cst1>,<val>,<cst2>"
.IS 2 .br
.PS - 4
.PT "BSS <cst1>,<val>,<cst2>"
Reserve <cst1> bytes. Reserve <cst1> bytes.
<val> is the value used to initialize the area. <val> is the value used to initialize the area.
<cst1> must be a multiple of the size of <val>. <cst1> must be a multiple of the size of <val>.
<cst2> is 0 if the initialization is not strictly necessary, <cst2> is 0 if the initialization is not strictly necessary,
1 if it is. 1 if it is.
.PT "HOL <cst1>,<val>,<cst2>" .IP "HOL <cst1>,<val>,<cst2>"
.br
Idem, but all following absolute global data references will Idem, but all following absolute global data references will
refer to this block. refer to this block.
Only one HOL is allowed per procedure, Only one HOL is allowed per procedure,
it has to be placed before the first instruction. it has to be placed before the first instruction.
.PT "CON <val>+" .IP "CON <val>+"
.br
Assemble global data words initialized with the <val> constants. Assemble global data words initialized with the <val> constants.
.PT "ROM <val>+" .IP "ROM <val>+"
.br
Idem, but the initialized data will never be changed by the program. Idem, but the initialized data will never be changed by the program.
.PE .P4 "Partitioning"
.IE .PP
.S4 "Partitioning"
Two pseudoinstructions partition the input into procedures: Two pseudoinstructions partition the input into procedures:
.IS 2 .IP "PRO <pro>[,<cst>]"
.PS - 4 .br
.PT "PRO <pro>[,<cst>]"
Start of procedure. Start of procedure.
<pro> is the procedure name. <pro> is the procedure name.
<cst> is the number of bytes for locals. <cst> is the number of bytes for locals.
The number of bytes for locals must be specified in the PRO or The number of bytes for locals must be specified in the PRO or
END pseudoinstruction. END pseudoinstruction.
When specified in both, they must be identical. When specified in both, they must be identical.
.PT "END [<cst>]" .IP "END [<cst>]"
.br
End of Procedure. End of Procedure.
<cst> is the number of bytes for locals. <cst> is the number of bytes for locals.
The number of bytes for locals must be specified in either the PRO or The number of bytes for locals must be specified in either the PRO or
END pseudoinstruction or both. END pseudoinstruction or both.
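
Reading aid for the PRO/END rule above: a minimal Python sketch of the consistency check on the number of bytes for locals. The function name `locals_size` and the use of `None` for an omitted <cst> are assumptions made for illustration.

```python
def locals_size(pro_cst=None, end_cst=None):
    """Return the number of bytes for locals of one procedure.

    pro_cst -- <cst> given on PRO, or None if omitted
    end_cst -- <cst> given on END, or None if omitted
    """
    if pro_cst is None and end_cst is None:
        raise ValueError("locals size must be given on PRO or END")
    if pro_cst is not None and end_cst is not None and pro_cst != end_cst:
        raise ValueError("PRO and END specify different locals sizes")
    return pro_cst if pro_cst is not None else end_cst


print(locals_size(pro_cst=4))             # size given on PRO only
print(locals_size(end_cst=6))             # size given on END only
print(locals_size(pro_cst=8, end_cst=8))  # given on both, identical
```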
.PE .P4 "Visibility"
.IE .PP
.S4 "Visibility"
Names of data and procedures in an EM module can either be Names of data and procedures in an EM module can either be
internal or external. internal or external.
External names are known outside the module and are used to link External names are known outside the module and are used to link
several pieces of a program. several pieces of a program.
Internal names are not known outside the modules they are used in. Internal names are not known outside the modules they are used in.
Other modules will not 'see' an internal name. Other modules will not 'see' an internal name.
.A .QQ
To reduce the number of passes needed, To reduce the number of passes needed,
it must be known at the first occurrence whether it must be known at the first occurrence whether
a name is internal or external. a name is internal or external.
@ -312,47 +314,51 @@ If the first occurrence of a name is a reference,
the name is considered to be external. the name is considered to be external.
If the first occurrence is in one of the following pseudoinstructions, If the first occurrence is in one of the following pseudoinstructions,
the effect of the pseudo has precedence. the effect of the pseudo has precedence.
.IS 2 .IP "EXA <dlb>"
.PS - 4 .br
.PT "EXA <dlb>"
External name. External name.
<dlb> is known, possibly defined, outside this module. <dlb> is known, possibly defined, outside this module.
Note that <dlb> may be defined in the same module. Note that <dlb> may be defined in the same module.
.PT "EXP <pro>" .IP "EXP <pro>"
.br
External procedure identifier. External procedure identifier.
Note that <pro> may be defined in the same module. Note that <pro> may be defined in the same module.
.PT "INA <dlb>" .IP "INA <dlb>"
.br
Internal name. Internal name.
<dlb> is internal to this module and must be defined in this module. <dlb> is internal to this module and must be defined in this module.
.PT "INP <pro>" .IP "INP <pro>"
.br
Internal procedure. Internal procedure.
<pro> is internal to this module and must be defined in this module. <pro> is internal to this module and must be defined in this module.
.PE .P4 "Miscellaneous"
.IE .PP
.S4 "Miscellaneous"
Two other pseudoinstructions provide miscellaneous features: Two other pseudoinstructions provide miscellaneous features:
.IS 2 .IP "EXC <cst1>,<cst2>"
.PS - 4 .br
.PT "EXC <cst1>,<cst2>"
Two blocks of instructions preceding this one are Two blocks of instructions preceding this one are
interchanged before being processed. interchanged before being processed.
<cst1> gives the number of lines of the first block. <cst1> gives the number of lines of the first block.
<cst2> gives the number of lines of the second one. <cst2> gives the number of lines of the second one.
Blank and pure comment lines do not count. Blank and pure comment lines do not count.
This instruction is obsolete. Its use is strongly discouraged. This instruction is obsolete. Its use is strongly discouraged.
.PT "MES <cst>[,<par>]*" .IP "MES <cst>[,<par>]*"
.br
A special type of comment. A special type of comment.
Used by compilers to communicate with the Used by compilers to communicate with the
optimizer, assembler, etc. as follows: optimizer, assembler, etc. as follows:
.VS 1 0 .RS
.PS - 4 .IP "MES 0"
.PT "MES 0" .br
An error has occurred, stop further processing. An error has occurred, stop further processing.
.PT "MES 1" .IP "MES 1"
.br
Suppress optimization. Suppress optimization.
.PT "MES 2,<cst1>,<cst2>" .IP "MES 2,<cst1>,<cst2>"
.br
Use wordsize <cst1> and pointer size <cst2>. Use wordsize <cst1> and pointer size <cst2>.
.PT "MES 3,<cst1>,<cst2>,<cst3>,<cst4>" .IP "MES 3,<cst1>,<cst2>,<cst3>,<cst4>"
.br
Indicates that a local variable is never referenced indirectly. Indicates that a local variable is never referenced indirectly.
Used to indicate that a register may be used for a specific Used to indicate that a register may be used for a specific
variable. variable.
@ -361,51 +367,57 @@ and offset from LB if negative.
<cst2> gives the size of the variable. <cst2> gives the size of the variable.
<cst3> indicates the class of the variable. <cst3> indicates the class of the variable.
The following values are currently recognized: The following values are currently recognized:
.PS .br
.PT 0 0\0\0\0The variable can be used for anything.
The variable can be used for anything. .br
.PT 1 1\0\0\0The variable is used as a loopindex.
The variable is used as a loopindex. .br
.PT 2 2\0\0\0The variable is used as a pointer.
The variable is used as a pointer. .br
.PT 3 3\0\0\0The variable is used as a floating point number.
The variable is used as a floating point number. .br
.PE 0
<cst4> gives the priority of the variable, <cst4> gives the priority of the variable,
higher numbers indicate better candidates. higher numbers indicate better candidates.
.PT "MES 4,<cst>,<str>" .IP "MES 4,<cst>,<str>"
.br
Number of source lines in file <str> (for profiler). Number of source lines in file <str> (for profiler).
.PT "MES 5" .IP "MES 5"
.br
Floating point used. Floating point used.
.PT "MES 6,<val>*" .IP "MES 6,<val>*"
.br
Comment. Used to provide comments in compact assembly language. Comment. Used to provide comments in compact assembly language.
.PT "MES 7,....." .IP "MES 7,....."
.br
Reserved. Reserved.
.PT "MES 8,<pro>[,<dlb>]..." .IP "MES 8,<pro>[,<dlb>]..."
.br
Library module. Indicates that the module may only be loaded Library module. Indicates that the module may only be loaded
if it is useful, that is, if it can satisfy any unresolved if it is useful, that is, if it can satisfy any unresolved
references during the loading process. references during the loading process.
May not be preceded by any other pseudo, except MES's. May not be preceded by any other pseudo, except MES's.
.PT "MES 9,<cst>" .IP "MES 9,<cst>"
.br
Guarantees that no more than <cst> bytes of parameters are Guarantees that no more than <cst> bytes of parameters are
accessed, either directly or indirectly. accessed, either directly or indirectly.
.PT "MES 10,<cst>[,<par>]* .IP "MES 10,<cst>[,<par>]*
.br
This message number is reserved for the global optimizer. This message number is reserved for the global optimizer.
It inserts these messages in its output as hints to backends. It inserts these messages in its output as hints to backends.
<cst> indicates the type of hint. <cst> indicates the type of hint.
.PT "MES 11" .IP "MES 11"
.br
Procedures containing this message are possible destinations of Procedures containing this message are possible destinations of
non-local goto's with the GTO instruction. non-local goto's with the GTO instruction.
Some backends keep locals in registers, Some backends keep locals in registers,
the locals in this procedure should not be kept in registers and the locals in this procedure should not be kept in registers and
all registers containing locals of other procedures should be all registers containing locals of other procedures should be
saved upon entry to this procedure. saved upon entry to this procedure.
.PE 1 .RE
.VS .IP ""
Each backend is free to skip irrelevant MES pseudos. Each backend is free to skip irrelevant MES pseudos.
.PE .P2 "The Compact Assembly Language"
.IE .PP
.S2 "The Compact Assembly Language"
The assembler accepts input in a highly encoded form. The assembler accepts input in a highly encoded form.
This This
form is intended to reduce the amount of file transport between the form is intended to reduce the amount of file transport between the
@ -414,16 +426,14 @@ and back ends, and also reduces the amount of storage required for storing
libraries. libraries.
Libraries are stored as archived compact assembly language, not machine Libraries are stored as archived compact assembly language, not machine
language. language.
.P .PP
When beginning to read the input, the assembler is in neutral state, and When beginning to read the input, the assembler is in neutral state, and
expects either a label or an instruction (including the pseudoinstructions). expects either a label or an instruction (including the pseudoinstructions).
The meaning of the next byte(s) when in neutral state is as follows, where The meaning of the next byte(s) when in neutral state is as follows, where
b1, b2 b1, b2
etc. represent the succeeding bytes. etc. represent the succeeding bytes.
.N 1
.DS
.TS .TS
tab(:) ; tab(:);
rw17 4 l. rw17 4 l.
0:Reserved for future use 0:Reserved for future use
1\-129:Machine instructions, see Appendix A, alphabetical list 1\-129:Machine instructions, see Appendix A, alphabetical list
@ -433,38 +443,31 @@ rw17 4 l.
180\-239:Instruction labels 0 \- 59 (180 is local label 0 etc.) 180\-239:Instruction labels 0 \- 59 (180 is local label 0 etc.)
240\-244:See the Common Table below 240\-244:See the Common Table below
245\-255:Not used 245\-255:Not used
.TE 1 .TE
.DE 0
After a label, the assembler is back in neutral state; it can immediately After a label, the assembler is back in neutral state; it can immediately
accept another label or an instruction in the next byte. accept another label or an instruction in the next byte.
No linefeeds are used to separate lines. No linefeeds are used to separate lines.
.P .PP
If an opcode expects no arguments, If an opcode expects no arguments,
the assembler is back in neutral state after the assembler is back in neutral state after
reading the one byte containing the instruction number. reading the one byte containing the instruction number.
If it has one or If it has one or
more arguments (only pseudos have more than 1), the arguments follow directly, more arguments (only pseudos have more than 1), the arguments follow directly,
encoded as follows: encoded as follows:
.N 1
.IS 2
.TS .TS
tab(:); tab(:);
r l. r l.
0\-239:Offsets from \-120 to 119 0\-239:Offsets from \-120 to 119
240\-255:See the Common Table below 240\-255:See the Common Table below
.TE 1 .TE
Absence of an optional argument is indicated by a special Absence of an optional argument is indicated by a special
byte. byte.
.IE 2
.NE 7
.CS
Common Table for Neutral State and Arguments
.CE
.TS .TS
tab(:); tab(:);
c s s s
c c s c c c s c
l4 l l4 l. l4 l l4 l.
Common Table for Neutral State and Arguments
class:bytes:description class:bytes:description
<ilb>:240:b1:Instruction label b1 (Not used for branches) <ilb>:240:b1:Instruction label b1 (Not used for branches)
@ -486,7 +489,7 @@ class:bytes:description
<end>:255::Delimiter for argument lists or <end>:255::Delimiter for argument lists or
:::indicates absence of optional argument :::indicates absence of optional argument
.TE 1 .TE 1
.P .PP
The bytes specifying the value of a 16, 32 or 64 bit constant The bytes specifying the value of a 16, 32 or 64 bit constant
are presented in two's complement notation, with the least are presented in two's complement notation, with the least
significant byte first. For example: the value of a 32 bit significant byte first. For example: the value of a 32 bit
@ -494,25 +497,22 @@ constant is ((s4*256+b3)*256+b2)*256+b1, where s4 is b4\-256 if
 b4 is greater than or equal to 128 else s4 takes the value of b4. b4 is greater than or equal to 128 else s4 takes the value of b4.
A <string> consists of a <cst> immediately followed by A <string> consists of a <cst> immediately followed by
a sequence of bytes with length <cst>. a sequence of bytes with length <cst>.
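The byte order rule above can be sketched in a few lines of Python. This is an illustration only, not the assembler's code: the name `decode_constant` is hypothetical, and the sketch treats a top byte of 128 as negative, as two's complement notation requires.

```python
def decode_constant(data: bytes) -> int:
    """Decode a 2-, 4- or 8-byte two's complement constant, least significant byte first."""
    if len(data) not in (2, 4, 8):
        raise ValueError("expected 2, 4 or 8 bytes")
    # The last byte is the most significant one and carries the sign:
    # 128..255 stand for -128..-1 (this is 's4' in the 32 bit formula).
    top = data[-1]
    value = top - 256 if top >= 128 else top
    # Fold in the remaining bytes from high to low: ((s4*256+b3)*256+b2)*256+b1.
    for b in reversed(data[:-1]):
        value = value * 256 + b
    return value
```

For valid input the result should agree with Python's built-in `int.from_bytes(data, "little", signed=True)`; the explicit loop is kept only to mirror the formula in the text.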
.P .PP
.ne 8 .ne 8
The pseudoinstructions fall into several categories, depending on their The pseudoinstructions fall into several categories, depending on their
arguments: arguments:
.N 1
.DS .DS
Group 1 \- EXC, BSS, HOL have a known number of arguments Group 1 \- EXC, BSS, HOL have a known number of arguments
Group 2 \- EXA, EXP, INA, INP have a string as argument Group 2 \- EXA, EXP, INA, INP have a string as argument
Group 3 \- CON, MES, ROM have a variable number of various things Group 3 \- CON, MES, ROM have a variable number of various things
Group 4 \- END, PRO have a trailing optional argument. Group 4 \- END, PRO have a trailing optional argument.
.DE 1 .DE
Groups 1 and 2 Groups 1 and 2
use the encoding described above. use the encoding described above.
Group 3 also uses the encoding listed above, with an <end> byte after the Group 3 also uses the encoding listed above, with an <end> byte after the
last argument to indicate the end of the list. last argument to indicate the end of the list.
Group 4 uses Group 4 uses
an <end> byte if the trailing argument is not present. an <end> byte if the trailing argument is not present.
.N 2
.IS 2
.TS .TS
tab(|); tab(|);
l s l l s l
@ -523,18 +523,17 @@ Example ASCII|Example compact
2||182 2||182
1||181 1||181
LOC|10|69 130 \0LOC|10|69 130
LOC|\-10|69 110 \0LOC|\-10|69 110
LOC|300|69 245 44 1 \0LOC|300|69 245 44 1
BRA|*19|18 139 \0BRA|*19|18 139
300||241 44 1 300||241 44 1
.3||242 3 .3||242 3
CON|4,9,*2,$foo|151 124 129 240 2 249 123 102 111 111 255 \0CON|4,9,*2,$foo|151 124 129 240 2 249 123 102 111 111 255
CON|.35|151 242 35 255 \0CON|.35|151 242 35 255
.TE 0 .TE
.IE 0 .P2 "Assembly language instruction list"
.S2 "Assembly language instruction list" .PP
.P
For each instruction in the list the range of argument values For each instruction in the list the range of argument values
in the assembly language is given. in the assembly language is given.
The column headed \fIassem\fP contains the mnemonics defined The column headed \fIassem\fP contains the mnemonics defined
@ -558,7 +557,7 @@ are indicated by letters:
.ds z \fBz\fP .ds z \fBz\fP
.ds o \fBo\fP .ds o \fBo\fP
.ds - \fB\-\fP .ds - \fB\-\fP
.N 1 .sp
.TS .TS
tab(:); tab(:);
c s l l c s l l
@ -579,8 +578,8 @@ l l 15 l l.
\&\*b:ilb:>= 0:label number \&\*b:ilb:>= 0:label number
\&\*r:cst:0,1,2:register number \&\*r:cst:0,1,2:register number
\&\*-:::no argument \&\*-:::no argument
.TE 1 .TE
.P .PP
The * at the rationale for \*w indicates that the argument The * at the rationale for \*w indicates that the argument
can either be given as argument or on top of the stack. can either be given as argument or on top of the stack.
If the argument is omitted, the argument is fetched from the If the argument is omitted, the argument is fetched from the
@ -589,8 +588,7 @@ it is assumed to be a wordsized unsigned integer.
Instructions that check for undefined integer or floating-point Instructions that check for undefined integer or floating-point
values and underflow or overflow values and underflow or overflow
are indicated below by (*). are indicated below by (*).
.N 1 .sp 1
.VS 0 0
.DS .DS
.ta 12n .ta 12n
GROUP 1 \- LOAD GROUP 1 \- LOAD
@ -687,7 +685,7 @@ GROUP 7 \- INCREMENT/DECREMENT/ZERO
ZER \*w : Load \*w zero bytes ZER \*w : Load \*w zero bytes
.DE .DE
.DS \" ??? .DS
GROUP 8 \- CONVERT (stack: source, source size, dest. size (top)) GROUP 8 \- CONVERT (stack: source, source size, dest. size (top))
CII \*- : Convert integer to integer (*) CII \*- : Convert integer to integer (*)
@ -744,7 +742,7 @@ GROUP 12 \- COMPARE
TGT \*- : True if greater, i.e. iff top of stack > 0 TGT \*- : True if greater, i.e. iff top of stack > 0
.DE .DE
.DS \" ??? .DS
GROUP 13 \- BRANCH GROUP 13 \- BRANCH
BRA \*b : Branch unconditionally to label \*b BRA \*b : Branch unconditionally to label \*b
@ -801,5 +799,4 @@ GROUP 15 \- MISCELLANEOUS
SIM \*- : Store 16 bit ignore mask SIM \*- : Store 16 bit ignore mask
STR \*r : Store register (0=LB, 1=SP, 2=HP) STR \*r : Store register (0=LB, 1=SP, 2=HP)
TRP \*- : Cause trap to occur (Error number on stack) TRP \*- : Cause trap to occur (Error number on stack)
.DE 0 .DE
.VS


@ -1,6 +1,4 @@
.MS T A 0 .de PT
.ME ..
.BP .bp
.MS B A 0 .Ct
.ME
.CT


@ -1,69 +1,62 @@
.SN 7 .bp
.BP .P1 "DESCRIPTORS"
.S1 "DESCRIPTORS" .PP
Several instructions use descriptors, notably the range check instruction, Several instructions use descriptors, notably the range check instruction,
the array instructions, the goto instruction and the case jump instructions. the array instructions, the goto instruction and the case jump instructions.
Descriptors reside in data space. Descriptors reside in data space.
They may be constructed at run time, but They may be constructed at run time, but
more often they are fixed and allocated in ROM data. more often they are fixed and allocated in ROM data.
.P .PP
All instructions using descriptors, except GTO, have as argument All instructions using descriptors, except GTO, have as argument
the size of the integers in the descriptor. the size of the integers in the descriptor.
All implementations have to allow integers of the size of a All implementations have to allow integers of the size of a
word in descriptors. word in descriptors.
All integers popped from the stack and used for indexing or comparing All integers popped from the stack and used for indexing or comparing
must have the same size as the integers in the descriptor. must have the same size as the integers in the descriptor.
.S2 "Range check descriptors" .P2 "Range check descriptors"
.PP
Range check descriptors consist of two integers: Range check descriptors consist of two integers:
.IS 2 .IP 1.
.PS 1 4 "" .
.PT
lower bound signed lower bound signed
.PT .IP 2.
upper bound signed upper bound signed
.PE .LP
.IE
The range check instruction checks an integer on the stack against The range check instruction checks an integer on the stack against
these bounds and causes a trap if the value is outside the interval. these bounds and causes a trap if the value is outside the interval.
The value itself is neither changed nor removed from the stack. The value itself is neither changed nor removed from the stack.
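A range check can be modelled as follows. This is a minimal sketch: the exception `EMTrap` and the function `range_check` are hypothetical names, and the descriptor is represented as a plain (lower, upper) pair rather than as two integers in data space.

```python
class EMTrap(Exception):
    """Stands in for an EM trap in this sketch."""


def range_check(value: int, descriptor: tuple) -> int:
    """Trap if value lies outside [lower, upper]; otherwise hand it back unchanged."""
    lower, upper = descriptor  # both bounds are signed integers
    if not lower <= value <= upper:
        raise EMTrap(f"{value} is outside {lower}..{upper}")
    # The checked value stays on the stack, so it is returned as it came in.
    return value
```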
.S2 "Array descriptors" .P2 "Array descriptors"
.PP
Each array descriptor describes a single dimension. Each array descriptor describes a single dimension.
For multi-dimensional arrays, several array instructions are For multi-dimensional arrays, several array instructions are
needed to access a single element. needed to access a single element.
Array descriptors contain the following three integers: Array descriptors contain the following three integers:
.IS 2 .IP 1.
.PS 1 4 "" .
.PT
lower bound signed lower bound signed
.PT .IP 2.
upper bound \- lower bound unsigned upper bound \- lower bound unsigned
.PT .IP 3.
number of bytes per element unsigned number of bytes per element unsigned
.PE .LP
.IE
The array instructions LAR, SAR and AAR have the pointer to the start The array instructions LAR, SAR and AAR have the pointer to the start
of the descriptor as operand on the stack. of the descriptor as operand on the stack.
.sp .LP
The element A[I] is fetched as follows: The element A[I] is fetched as follows:
.IS 2 .IP 1.
.PS 1 4 "" .
.PT
Stack the address of A (e.g., using LAE or LAL) Stack the address of A (e.g., using LAE or LAL)
.PT .IP 2.
Stack the value of I (n-byte integer) Stack the value of I (n-byte integer)
.PT .IP 3.
Stack the pointer to the descriptor (e.g., using LAE) Stack the pointer to the descriptor (e.g., using LAE)
.PT .IP 4.
LAR n (n is the size of the integers in the descriptor and I) LAR n (n is the size of the integers in the descriptor and I)
.PE .LP
.IE
All array instructions first pop the address of the descriptor All array instructions first pop the address of the descriptor
and the index. and the index.
If the index is not within the bounds specified, a trap occurs. If the index is not within the bounds specified, a trap occurs.
If ok, (I~\-~lower bound) is multiplied If ok, (I~\-~lower bound) is multiplied
by the number of bytes per element (the third word). The result is added by the number of bytes per element (the third word). The result is added
to the address of A and replaces A on the stack. to the address of A and replaces A on the stack.
.A .QQ
At this point LAR, SAR and AAR diverge. At this point LAR, SAR and AAR diverge.
AAR is finished. LAR pops the address and fetches the data AAR is finished. LAR pops the address and fetches the data
item, item,
@ -71,21 +64,19 @@ the size being specified by the descriptor.
The usual restrictions for memory access must be obeyed. The usual restrictions for memory access must be obeyed.
SAR pops the address and stores the SAR pops the address and stores the
data item now exposed. data item now exposed.
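The address computation shared by LAR, SAR and AAR can be sketched as below. The names `EMTrap` and `element_address` are hypothetical, and the descriptor is modelled as a tuple (lower bound, upper bound minus lower bound, bytes per element), matching the three integers listed above.

```python
class EMTrap(Exception):
    """Stands in for an EM trap in this sketch."""


def element_address(base: int, index: int, descriptor: tuple) -> int:
    """Return the address of A[index], given the address of A and its descriptor."""
    lower, span, elem_size = descriptor  # span is (upper bound - lower bound), unsigned
    offset = index - lower
    # The index must satisfy lower <= index <= lower + span.
    if offset < 0 or offset > span:
        raise EMTrap(f"index {index} is out of bounds")
    return base + offset * elem_size
```

For an array with bounds 1..10 and 4-byte elements starting at address 1000, `element_address(1000, 3, (1, 9, 4))` yields 1008. AAR would stop there; LAR and SAR would go on to fetch or store the element at that address.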
.S2 "Non-local goto descriptors" .P2 "Non-local goto descriptors"
.PP
The GTO instruction provides a way of returning directly to any The GTO instruction provides a way of returning directly to any
active procedure invocation. active procedure invocation.
The argument of the instruction is the address of a descriptor The argument of the instruction is the address of a descriptor
containing three pointers: containing three pointers:
.IS 2 .IP 1.
.PS 1 4 "" .
.PT
value of PC after the jump value of PC after the jump
.PT .IP 2.
value of SP after the jump value of SP after the jump
.PT .IP 3.
value of LB after the jump value of LB after the jump
.PE .LP
.IE
 GTO loads PC, SP and LB from the descriptor, GTO loads PC, SP and LB from the descriptor,
thereby jumping to a procedure thereby jumping to a procedure
and removing zero or more frames from the stack. and removing zero or more frames from the stack.
@ -94,7 +85,8 @@ dynamically enclosing procedure,
because some EM implementations will need to backtrack through because some EM implementations will need to backtrack through
the dynamic chain and use the implementation dependent data the dynamic chain and use the implementation dependent data
in frames to restore registers etc. in frames to restore registers etc.
.S2 "Case descriptors" .P2 "Case descriptors"
.PP
The case jump instructions CSA and CSB both The case jump instructions CSA and CSB both
provide multiway branches selected by a case index. provide multiway branches selected by a case index.
Both fetch two operands from the stack: Both fetch two operands from the stack:
@ -106,7 +98,7 @@ Therefore, the descriptors for CSA and CSB,
as shown in figure 4, are different. as shown in figure 4, are different.
All pointers in the table must be addresses of instructions in the All pointers in the table must be addresses of instructions in the
procedure executing the case instruction. procedure executing the case instruction.
.P .PP
CSA selects the new PC by indexing. CSA selects the new PC by indexing.
If the index, a signed integer, is greater than or equal to If the index, a signed integer, is greater than or equal to
the lower bound and less than or equal to the upper bound, the lower bound and less than or equal to the upper bound,
@ -116,23 +108,22 @@ The table does not contain the value of the upper bound,
but the value of upper-lower as an unsigned integer. but the value of upper-lower as an unsigned integer.
The default instruction pointer is used when the index is out of bounds. The default instruction pointer is used when the index is out of bounds.
If the resulting PC is 0, then trap. If the resulting PC is 0, then trap.
.P .PP
CSB selects the new PC by searching. CSB selects the new PC by searching.
The table is searched for an entry with index value equal to the case index. The table is searched for an entry with index value equal to the case index.
That entry or, if none is found, the default entry contains the That entry or, if none is found, the default entry contains the
new PC. new PC.
When the resulting PC is 0, a trap is performed. When the resulting PC is 0, a trap is performed.
.P .PP
The choice of which case instruction to use for The choice of which case instruction to use for
each source language case statement each source language case statement
is up to the front end. is up to the front end.
 If the range of the index value is dense, i.e. If the range of the index value is dense, i.e.
.DS .DS
(highest value \- lowest value) / number of cases (highest value \- lowest value) / number of cases
.DE 1 .DE
is less than some threshold, then CSA is the obvious choice. is less than some threshold, then CSA is the obvious choice.
If the range is sparse, CSB is better. If the range is sparse, CSB is better.
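A front end could apply the density rule roughly as follows. This is only a sketch: the function name `choose_case_instruction` and the default threshold of 3.0 are assumptions, since the text leaves the threshold to the front end.

```python
def choose_case_instruction(case_values, threshold: float = 3.0) -> str:
    """Pick CSA for dense case ranges and CSB for sparse ones."""
    values = set(case_values)  # duplicate labels do not add cases
    if not values:
        raise ValueError("a case statement needs at least one case")
    # (highest value - lowest value) / number of cases
    density = (max(values) - min(values)) / len(values)
    return "CSA" if density < threshold else "CSB"
```

For example, the labels 1, 2, 3, 4 give a ratio of 0.75 and select CSA, while 1, 100, 10000 give 3333 and select CSB.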
.N 2
.Dr 30 .Dr 30
|--------------------| |--------------------| high address |--------------------| |--------------------| high address
| pointer for upb | | pointer n-1 | | pointer for upb | | pointer n-1 |


@ -1,6 +1,6 @@
.BP .bp
.SN 4 .P1 "DATA ADDRESS SPACE"
.S1 "DATA ADDRESS SPACE" .PP
The data address space is divided into three parts, called 'areas', The data address space is divided into three parts, called 'areas',
each with its own addressing method: each with its own addressing method:
global data area, global data area,
@ -9,24 +9,24 @@ and heap data area.
These data areas must be part of the same These data areas must be part of the same
address space because all data is accessed by address space because all data is accessed by
the same type of pointers. the same type of pointers.
.P .PP
Space for global data is reserved using several pseudoinstructions in the Space for global data is reserved using several pseudoinstructions in the
assembly language, as described in assembly language, as described in
the next paragraph and chapter 11. the next paragraph and chapter 11.
The size of the global data area is fixed per program. The size of the global data area is fixed per program.
.A .QQ
Global data is addressed absolutely in the machine language. Global data is addressed absolutely in the machine language.
Many instructions are available to address global data. Many instructions are available to address global data.
They all have an absolute address as argument. They all have an absolute address as argument.
Examples are LOE, LAE and STE. Examples are LOE, LAE and STE.
.P .PP
Part of the global data area is initialized by the Part of the global data area is initialized by the
compiler, the compiler, the
rest is not initialized at all or is initialized rest is not initialized at all or is initialized
with a value, typically \-32768 or 0. with a value, typically \-32768 or 0.
Part of the initialized global data may be made read-only Part of the initialized global data may be made read-only
if the implementation supports protection. if the implementation supports protection.
.P .PP
The local data area is used as a stack, The local data area is used as a stack,
which grows from high to low addresses which grows from high to low addresses
and contains some data for each active procedure and contains some data for each active procedure
@ -44,14 +44,14 @@ Variables in other active procedures are addressed by following
the chain of statically enclosing procedures using the LXL or LXA instruction. the chain of statically enclosing procedures using the LXL or LXA instruction.
The variables in dynamically enclosing procedures can be The variables in dynamically enclosing procedures can be
addressed with the use of the DCH instruction. addressed with the use of the DCH instruction.
.A .QQ
Many instructions have offsets to LB as argument, Many instructions have offsets to LB as argument,
for instance LOL, LAL and STL. for instance LOL, LAL and STL.
The arguments of these instructions range from \-1 to some The arguments of these instructions range from \-1 to some
(negative) minimum (negative) minimum
for the access of local storage and from 0 to some (positive) for the access of local storage and from 0 to some (positive)
maximum for parameter access. maximum for parameter access.
.P .PP
The procedure call instructions CAL and CAI each create a new frame The procedure call instructions CAL and CAI each create a new frame
on the stack. on the stack.
Each procedure has an assembly-time parameter specifying Each procedure has an assembly-time parameter specifying
@ -62,7 +62,7 @@ Each procedure, therefore, starts with a stack with the local variables
already allocated. already allocated.
The return instructions RET and RTT remove a frame. The return instructions RET and RTT remove a frame.
The actual parameters must be removed by the calling procedure. The actual parameters must be removed by the calling procedure.
.P .PP
RET may copy some words from the stack of RET may copy some words from the stack of
the returning procedure to an unnamed 'function return area'. the returning procedure to an unnamed 'function return area'.
This area is available for 'READ-ONCE' access using the LFR instruction. This area is available for 'READ-ONCE' access using the LFR instruction.
@ -86,7 +86,7 @@ area is twice the pointer size,
because we want to be able to handle 'procedure instance because we want to be able to handle 'procedure instance
identifiers' which consist of a procedure identifier and the LB identifiers' which consist of a procedure identifier and the LB
of a frame belonging to that procedure. of a frame belonging to that procedure.
.P .PP
The heap data area grows upwards, to higher numbered The heap data area grows upwards, to higher numbered
addresses. addresses.
It is initially empty. It is initially empty.
@ -96,7 +96,8 @@ The heap pointer may be manipulated
by the LOR and STR instructions. by the LOR and STR instructions.
The heap can only be addressed indirectly, The heap can only be addressed indirectly,
by pointers derived from previous values of HP. by pointers derived from previous values of HP.
.S2 "Global data area" .P2 "Global data area"
.PP
The initial size of the global data area is determined at assembly time. The initial size of the global data area is determined at assembly time.
Global data is allocated by several Global data is allocated by several
pseudoinstructions in the EM assembly pseudoinstructions in the EM assembly
@ -109,7 +110,7 @@ under certain conditions, several blocks are allocated
in a single fragment. in a single fragment.
This guarantees that the bytes of these blocks This guarantees that the bytes of these blocks
are consecutive. are consecutive.
.P .PP
Global data is addressed absolutely in binary Global data is addressed absolutely in binary
machine language. machine language.
Most compilers, however, Most compilers, however,
@ -124,7 +125,7 @@ It is the task of the assembler/loader to
translate these labels into absolute addresses. translate these labels into absolute addresses.
These labels may also be used These labels may also be used
in CON and ROM pseudoinstructions to initialize pointers. in CON and ROM pseudoinstructions to initialize pointers.
.P .PP
The pseudoinstruction CON allocates initialized data. The pseudoinstruction CON allocates initialized data.
ROM acts like CON but indicates that the initialized data will ROM acts like CON but indicates that the initialized data will
not change during execution of the program. not change during execution of the program.
@ -134,7 +135,7 @@ data.
The pseudoinstruction HOL is similar to BSS, The pseudoinstruction HOL is similar to BSS,
but it alters the meaning of subsequent absolute addressing in but it alters the meaning of subsequent absolute addressing in
the assembly language. the assembly language.
.P .PP
Another type of global data is a small block, Another type of global data is a small block,
called the ABS block, with an implementation defined size. called the ABS block, with an implementation defined size.
Storage in this type of block can only be addressed Storage in this type of block can only be addressed
@ -146,7 +147,7 @@ update this counter.
A pointer at location 4 points to a string containing the A pointer at location 4 points to a string containing the
current source file name. current source file name.
The instruction FIL can be used to update the pointer. The instruction FIL can be used to update the pointer.
.P .PP
All numeric arguments of the instructions that address All numeric arguments of the instructions that address
the global data area refer to locations in the the global data area refer to locations in the
ABS block unless ABS block unless
@ -158,7 +159,7 @@ Thus LOE 0 loads the zeroth word of the most recent HOL, unless no HOL has
appeared in the current file so appeared in the current file so
far, in which case it loads the zeroth word of the far, in which case it loads the zeroth word of the
ABS fragment. ABS fragment.
.P .PP
The global data area is highly fragmented. The global data area is highly fragmented.
The ABS block and each HOL and BSS block are separate fragments. The ABS block and each HOL and BSS block are separate fragments.
The way fragments are formed from CON and ROM blocks is more complex. The way fragments are formed from CON and ROM blocks is more complex.
@ -169,12 +170,11 @@ allocated consecutively in a single fragment, unless
these CON pseudos are separated in the assembly language program these CON pseudos are separated in the assembly language program
by a data label definition or one or more of the following pseudos: by a data label definition or one or more of the following pseudos:
.DS .DS
ROM, BSS, HOL and END
ROM, BSS, HOL and END
.DE .DE
An analogous rule holds for ROM pseudos. An analogous rule holds for ROM pseudos.
.S2 "Local data area" .P2 "Local data area"
.PP
The local data area consists of a sequence of frames, one for The local data area consists of a sequence of frames, one for
each active procedure. each active procedure.
Below the frame of the current procedure resides the Below the frame of the current procedure resides the
@ -183,17 +183,15 @@ Frames are generated by procedure calls and are
removed by procedure returns. removed by procedure returns.
A procedure frame consists of six 'zones': A procedure frame consists of six 'zones':
.DS .DS
1. The return status block
1. The return status block 2. The local variables and compiler temporaries
2. The local variables and compiler temporaries 3. The register save block
3. The register save block 4. The dynamic local generators
4. The dynamic local generators 5. The operand stack.
5. The operand stack. 6. The parameters of a procedure one level deeper
6. The parameters of a procedure one level deeper
.DE .DE
A sample frame is shown in Figure 1. A sample frame is shown in Figure 1.
.P .PP
Before a procedure call is performed the actual Before a procedure call is performed the actual
parameters are pushed onto the stack of the calling procedure. parameters are pushed onto the stack of the calling procedure.
The exact details are compiler dependent. The exact details are compiler dependent.
@ -216,11 +214,11 @@ These instructions assume that this parameter contains the LB of
the statically enclosing procedure. the statically enclosing procedure.
Procedures that do not have a dynamically enclosing procedure Procedures that do not have a dynamically enclosing procedure
do not need a static link at offset 0. do not need a static link at offset 0.
.P .PP
Two instructions are available to perform procedure calls, CAL Two instructions are available to perform procedure calls, CAL
and CAI. and CAI.
Several tasks are performed by these call instructions. Several tasks are performed by these call instructions.
.A .QQ
First, a part of the status of the calling procedure is First, a part of the status of the calling procedure is
saved on the stack in the return status block. saved on the stack in the return status block.
This block should contain the return address of the calling This block should contain the return address of the calling
@ -235,12 +233,12 @@ The stack frames need not be contiguous then and the first
status save area can contain the parameter base AB, status save area can contain the parameter base AB,
which has the value of SP just after the last parameter has which has the value of SP just after the last parameter has
been pushed. been pushed.
.A .QQ
Second, the LB is changed to point to the Second, the LB is changed to point to the
first word above the local variables. first word above the local variables.
The new LB is a copy of the SP after the return status The new LB is a copy of the SP after the return status
block has been pushed. block has been pushed.
.A .QQ
Third, the amount of local storage needed by the procedure is Third, the amount of local storage needed by the procedure is
reserved. reserved.
The parameters and local storage are accessed by the same instructions. The parameters and local storage are accessed by the same instructions.
@ -256,28 +254,28 @@ The initial value of the allocated words is
not defined, but implementations that check for undefined not defined, but implementations that check for undefined
values will probably initialize them with a values will probably initialize them with a
special 'undefined' pattern, typically \-32768. special 'undefined' pattern, typically \-32768.
.A .QQ
Fourth, any EM implementation is allowed to reserve a variable size Fourth, any EM implementation is allowed to reserve a variable size
block beneath the local variables. block beneath the local variables.
This block could, for example, be used to save a variable number This block could, for example, be used to save a variable number
of registers. of registers.
.A .QQ
Finally, the address of the entry point of the called procedure Finally, the address of the entry point of the called procedure
is loaded into the Program Counter. is loaded into the Program Counter.
.P .PP
The ASP instruction can be used to allocate further (dynamic) The ASP instruction can be used to allocate further (dynamic)
local storage. local storage.
The base address of such storage must be obtained with a LOR~SP The base address of such storage must be obtained with a LOR~SP
instruction. instruction.
This same instruction ASP may also be used This same instruction ASP may also be used
to remove some words from the stack. to remove some words from the stack.
.P .PP
There is a version of ASP, called ASS, which fetches the number There is a version of ASP, called ASS, which fetches the number
of bytes to allocate from the stack. of bytes to allocate from the stack.
It can be used to allocate space for local It can be used to allocate space for local
objects whose size is unknown at compile time, objects whose size is unknown at compile time,
so called 'dynamic local generators'. so called 'dynamic local generators'.
.P .PP
Control is returned to the calling procedure with a RET instruction. Control is returned to the calling procedure with a RET instruction.
Any return value is then copied to the 'function return area'. Any return value is then copied to the 'function return area'.
The frame created by the call is deallocated and the status of The frame created by the call is deallocated and the status of
@ -293,7 +291,7 @@ Violating this restriction might result in hard to detect
errors. errors.
The calling procedure has to remove the parameters from the stack. The calling procedure has to remove the parameters from the stack.
This can be done with the aforementioned ASP instruction. This can be done with the aforementioned ASP instruction.
.P .PP
Each procedure frame is a separate fragment. Each procedure frame is a separate fragment.
Because any fragment may be placed anywhere in memory, Because any fragment may be placed anywhere in memory,
procedure frames need not be contiguous. procedure frames need not be contiguous.
@ -345,7 +343,8 @@ procedure frames need not be contiguous.
.Df .Df
Figure 1. A sample procedure frame and parameters. Figure 1. A sample procedure frame and parameters.
.De .De
.S2 "Heap data area" .P2 "Heap data area"
.PP
The heap area starts empty, with HP The heap area starts empty, with HP
pointing to the low end of it. pointing to the low end of it.
HP always contains a word address. HP always contains a word address.
@ -360,7 +359,7 @@ are allocated to the heap.
The heap may not grow into a part of memory that is already allocated. The heap may not grow into a part of memory that is already allocated.
When this is attempted, the STR instruction will cause a trap to occur. When this is attempted, the STR instruction will cause a trap to occur.
In this case, HP retains its old value. In this case, HP retains its old value.
.P .PP
The only way to address the heap is indirectly. The only way to address the heap is indirectly.
Whenever an object is allocated by increasing HP, Whenever an object is allocated by increasing HP,
then the old HP value must be saved and can be used later to address then the old HP value must be saved and can be used later to address
@ -370,7 +369,7 @@ is no longer part of the heap, then an attempt to access
the object is not allowed. the object is not allowed.
Furthermore, if the heap pointer is increased again to above Furthermore, if the heap pointer is increased again to above
the object address, then access to the old object gives undefined results. the object address, then access to the old object gives undefined results.
.P .PP
The heap is a single fragment. The heap is a single fragment.
All bytes have consecutive addresses. All bytes have consecutive addresses.
No limits are imposed on the size of the heap as long as it fits No limits are imposed on the size of the heap as long as it fits


@ -1,3 +1,10 @@
.bp
.AP "EM INTERPRETER"
.nf
.ft CW
.lg 0
.nr x \w' '
.ta \nxu +\nxu +\nxu +\nxu +\nxu +\nxu +\nxu +\nxu +\nxu +\nxu
{ This is an interpreter for EM. It serves as the official machine { This is an interpreter for EM. It serves as the official machine
definition. This interpreter must run on a machine which supports definition. This interpreter must run on a machine which supports
@ -1666,3 +1673,6 @@ case insr of
writeln('halt with exit status: ',exitstatus:1); writeln('halt with exit status: ',exitstatus:1);
doident; doident;
end. end.
.ft P
.lg 1
.fi


@ -1,56 +1,43 @@
.SN 8 .bp
.VS 1 0 .P1 "ENVIRONMENT INTERACTIONS"
.BP .PP
.S1 "ENVIRONMENT INTERACTIONS"
EM programs can interact with their environment in three ways. EM programs can interact with their environment in three ways.
Two, starting/stopping and monitor calls, are dealt with in this chapter. Two, starting/stopping and monitor calls, are dealt with in this chapter.
The remaining way to interact, interrupts, will be treated The remaining way to interact, interrupts, will be treated
together with traps in chapter 9. together with traps in chapter 9.
.S2 "Program starting and stopping" .P2 "Program starting and stopping"
.PP
EM user programs start with a call to a procedure called EM user programs start with a call to a procedure called
_m_a_i_n. _m_a_i_n.
The assembler and backends look for the definition of a procedure The assembler and backends look for the definition of a procedure
with this name in their input. with this name in their input.
The call passes three parameters to the procedure. The call passes three parameters to the procedure.
The parameters are similar to the parameters supplied by the The parameters are similar to the parameters supplied by the
UNIX .UX
.FS
UNIX is a Trademark of Bell Laboratories.
.FE
operating system to C programs. operating system to C programs.
These parameters are often called These parameters are often called \fBargc\fP, \fBargv\fP and \fBenvp\fP.
.BW argc ,
.B argv
and
.BW envp .
Argc is the parameter nearest to LB and is a wordsized integer. Argc is the parameter nearest to LB and is a wordsized integer.
The other two are pointers to the first element of an array of The other two are pointers to the first element of an array of
string pointers. string pointers.
.N The \fBargv\fP array contains \fBargc\fP
The
.B argv
array contains
.B argc
strings, the first of which contains the program call name. strings, the first of which contains the program call name.
The other strings in the The other strings in the \fBargv\fP
.B argv
array are the program parameters. array are the program parameters.
.P .PP
The The \fBenvp\fP
.B envp
array contains strings in the form "name=string", where 'name' array contains strings in the form "name=string", where 'name'
is the name of an environment variable and string its value. is the name of an environment variable and string its value.
The The \fBenvp\fP
.B envp
is terminated by a zero pointer. is terminated by a zero pointer.
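As a rough illustration of the start-up parameters described above, the sketch below models argv as a list of argc strings and envp as a list of "name=string" entries closed by a zero pointer. This is only a model under stated assumptions: `None` stands in for the zero pointer, and the function name `parse_envp` and the sample values are hypothetical, not part of EM.

```python
def parse_envp(envp):
    """Split "name=string" entries into a dict, stopping at the zero pointer."""
    env = {}
    for entry in envp:
        if entry is None:  # assumption: None models the terminating zero pointer
            break
        # Split on the first '=' only, so the value may itself contain '='.
        name, _, value = entry.partition("=")
        env[name] = value
    return env


# Hypothetical sample data: argv[0] is the program call name,
# the remaining strings are the program parameters.
argv = ["prog", "-v", "input"]
argc = len(argv)
envp = ["HOME=/home/user", "OPTS=a=b", None]
environment = parse_envp(envp)
```

Splitting on the first '=' only is a design choice of this sketch; the text above does not say how values containing '=' are treated.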
.P .PP
An EM user program stops if the program returns from the first An EM user program stops if the program returns from the first
invocation of _m_a_i_n. invocation of _m_a_i_n.
The contents of the function return area are used to procure a The contents of the function return area are used to procure a
wordsized program return code. wordsized program return code.
EM programs also stop when traps and interrupts occur that are EM programs also stop when traps and interrupts occur that are
not caught and when the exit monitor call is executed. not caught and when the exit monitor call is executed.
.S2 "Input/Output and other monitor calls" .P2 "Input/Output and other monitor calls"
.PP
EM differs from most conventional machines in that it has high level i/o EM differs from most conventional machines in that it has high level i/o
instructions. instructions.
Typical instructions are OPEN FILE and READ FROM FILE instead Typical instructions are OPEN FILE and READ FROM FILE instead
@ -58,7 +45,7 @@ of low level instructions such as setting and clearing
bits in device registers. bits in device registers.
By providing such high level i/o primitives, the task of implementing By providing such high level i/o primitives, the task of implementing
EM on various non EM machines is made considerably easier. EM on various non EM machines is made considerably easier.
.P .PP
I/O is initiated by the MON instruction, which expects an iocode on top I/O is initiated by the MON instruction, which expects an iocode on top
of the stack. of the stack.
Often there are also parameters which are pushed on the Often there are also parameters which are pushed on the
@ -68,45 +55,35 @@ Some i/o functions also provide results, which are returned on the stack.
In the list of monitor calls we use several types of parameters and results, In the list of monitor calls we use several types of parameters and results,
these types consist of integers and unsigneds of varying sizes, but never these types consist of integers and unsigneds of varying sizes, but never
smaller than the wordsize, and the two pointer types. smaller than the wordsize, and the two pointer types.
.N 1 .LP
The names of the types used are: The names of the types used are:
.IS 4 .DS
.PS - 10 .TS
.PT int tab(:);
an integer of wordsize l l.
.PT int2 int:an integer of wordsize
an integer whose size is the maximum of the wordsize and 2 int2:an integer whose size is the maximum of the wordsize and 2 bytes
bytes int4:an integer whose size is the maximum of the wordsize and 4 bytes
.PT int4 intp:an integer with the size of a pointer
an integer whose size is the maximum of the wordsize and 4 uns2:an unsigned integer whose size is the maximum of the wordsize and 2
bytes unsp:an unsigned integer with the size of a pointer
.PT intp ptr:a pointer into data space
an integer with the size of a pointer .TE
.PT uns2 .DE
an unsigned integer whose size is the maximum of the wordsize and 2 .LP
.PT unsp
an unsigned integer with the size of a pointer
.PT ptr
a pointer into data space
.PE 1
.IE 0
The table below lists the i/o codes with their results and The table below lists the i/o codes with their results and
parameters. parameters.
This list is similar to the system calls of the UNIX Version 7 This list is similar to the system calls of the UNIX Version 7
operating system. operating system.
.A .QQ
To execute a monitor call, proceed as follows: To execute a monitor call, proceed as follows:
.IS 2 .IP a)
.N 1
.PS a 4 "" )
.PT
Stack the parameters, in reverse order, last parameter first. Stack the parameters, in reverse order, last parameter first.
.PT .IP b)
Push the monitor call number (iocode) onto the stack. Push the monitor call number (iocode) onto the stack.
.PT .IP c)
Execute the MON instruction. Execute the MON instruction.
.PE 1 .LP
.IE
An error code is present on the top of the stack after An error code is present on the top of the stack after
execution of most monitor calls. execution of most monitor calls.
If this error code is zero, the call performed the action If this error code is zero, the call performed the action
@ -117,9 +94,12 @@ This construction enables programs to test for failure with a
single instruction (~TEQ or TNE~) and still find out the cause of single instruction (~TEQ or TNE~) and still find out the cause of
the failure. the failure.
The result name 'e' is reserved for the error code. The result name 'e' is reserved for the error code.
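The three steps a) to c) can be sketched with a toy stack machine. This is a minimal model, not the defining interpreter: the class `ToyMachine`, the helper `call_monitor` and the initial mask value are assumptions made for illustration. Only Umask (iocode 60 in the list of monitor calls, one int2 parameter, one int2 result, no error code) is modelled, so the error code convention is not exercised here.

```python
UMASK = 60  # iocode of Umask, taken from the list of monitor calls


class ToyMachine:
    """Hypothetical model: a Python list serves as the EM stack."""

    def __init__(self):
        self.stack = []
        self.umask = 0o022  # assumed initial file creation mode mask

    def push(self, value):
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()

    def mon(self):
        """Model of MON: pop the iocode, then the parameters, push the results."""
        iocode = self.pop()
        if iocode == UMASK:
            mask = self.pop()
            old, self.umask = self.umask, mask
            self.push(old)  # result 'oldmask'
        else:
            raise NotImplementedError("iocode %d is not modelled" % iocode)


def call_monitor(machine, iocode, *params):
    for param in reversed(params):  # a) stack the parameters, last one first
        machine.push(param)
    machine.push(iocode)            # b) push the monitor call number
    machine.mon()                   # c) execute the MON instruction


machine = ToyMachine()
call_monitor(machine, UMASK, 0o077)
oldmask = machine.pop()
```

Because the parameters are stacked last one first, the first parameter ends up directly below the iocode, which is why `mon` can pop the parameters in their declared order.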
.N 1 .ne 5
.LP
List of monitor calls. List of monitor calls.
.DS B .LP
.nf
.na
.ta 4n 13n 29n 52n .ta 4n 13n 29n 52n
nr name parameters results function nr name parameters results function
@ -191,22 +171,23 @@ nr name parameters results function
e:int Execute a file e:int Execute a file
60 Umask mask:int2 oldmask:int2 Set file creation mode mask 60 Umask mask:int2 oldmask:int2 Set file creation mode mask
61 Chroot string:ptr e:int Change root directory 61 Chroot string:ptr e:int Change root directory
.DE 1 .fi
.ad
.LP
Codes 0, 11, 13, 17, 31, 32, 38, 39, 40, 45, 49, 50, 52, Codes 0, 11, 13, 17, 31, 32, 38, 39, 40, 45, 49, 50, 52,
55, 57, 58, 62, and 63 are 55, 57, 58, 62, and 63 are
not used. not used.
.P .PP
All monitor calls, except fork and sigtrp All monitor calls, except fork and sigtrp
are the same as the UNIX version 7 system calls. are the same as the UNIX version 7 system calls.
.P .PP
The sigtrp entry maps UNIX signals onto EM interrupts. The sigtrp entry maps UNIX signals onto EM interrupts.
Normally, trapno is in the range 0 to 252. Normally, trapno is in the range 0 to 252.
In that case it requests that signal signo In that case it requests that signal signo
will cause trap trapno to occur. will cause trap trapno to occur.
When given trap number \-2, default signal handling is reset, and when given When given trap number \-2, default signal handling is reset, and when given
trap number \-3, the signal is ignored. trap number \-3, the signal is ignored.
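The three cases for the trapno parameter of sigtrp can be summarised in a small classifier. This is a sketch only: the function name `sigtrp_action` and the returned labels are hypothetical, and values outside the cases named above are reported as "unspecified" because the text does not define them.

```python
def sigtrp_action(trapno):
    """Classify the trapno parameter of sigtrp as described in the text."""
    if 0 <= trapno <= 252:
        return ("trap", trapno)      # signal signo will cause trap trapno
    if trapno == -2:
        return ("default", None)     # default signal handling is reset
    if trapno == -3:
        return ("ignore", None)      # the signal is ignored
    return ("unspecified", None)     # assumption: not covered by the text
```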
.P .PP
The flag returned by fork is 1 in the child process and 0 in The flag returned by fork is 1 in the child process and 0 in
the parent. the parent.
The pid returned is the process-id of the other process. The pid returned is the process-id of the other process.
.VS


@ -1,48 +1,42 @@
.BP .bp
.S1 "INTRODUCTION" .P1 "INTRODUCTION"
.PP
EM is a family of intermediate languages designed for producing EM is a family of intermediate languages designed for producing
portable compilers. portable compilers.
The general strategy is for a program called The general strategy is for a program called \fBfront end\fP
.B front end
to translate the source program to EM. to translate the source program to EM.
Another program, Another program, \fBback end\fP,
.B back
.BW end
translates EM to target assembly language. translates EM to target assembly language.
Alternatively, the EM code can be assembled to a binary form Alternatively, the EM code can be assembled to a binary form
and interpreted. and interpreted.
These considerations led to the following goals: These considerations led to the following goals:
.IS 2 10 .IP 1
.PS 1 4
.PT
The design should allow translation to, The design should allow translation to,
or interpretation on, a wide range of existing machines. or interpretation on, a wide range of existing machines.
Design decisions should be delayed as far as possible Design decisions should be delayed as far as possible
and the implications of these decisions should and the implications of these decisions should
be localized as much as possible. be localized as much as possible.
.N .br
The current microcomputer technology offers 8, 16 and 32 bit machines The current microcomputer technology offers 8, 16 and 32 bit machines
with various sizes of address space. with various sizes of address space.
EM should be flexible enough to be useful on most of these EM should be flexible enough to be useful on most of these
machines. machines.
The differences between the members of the EM family should only The differences between the members of the EM family should only
concern the wordsize and address space size. concern the wordsize and address space size.
.PT .IP 2
The architecture should ease the task of code generation for The architecture should ease the task of code generation for
high level languages such as Pascal, C, Ada, Algol 68, BCPL. high level languages such as Pascal, C, Ada, Algol 68, BCPL.
.PT .IP 3
The instruction set used by the interpreter should be compact, The instruction set used by the interpreter should be compact,
to reduce the amount of memory needed to reduce the amount of memory needed
for program storage, and to reduce the time needed to transmit for program storage, and to reduce the time needed to transmit
programs over communication lines. programs over communication lines.
.PT .IP 3
It should be designed with microprogrammed implementations in It should be designed with microprogrammed implementations in
mind; in particular, the use of many short fields within mind; in particular, the use of many short fields within
instruction opcodes should be avoided, because their extraction by the instruction opcodes should be avoided, because their extraction by the
microprogram or conversion to other instruction formats is inefficient. microprogram or conversion to other instruction formats is inefficient.
.PE .PP
.IE
.A
The basic architecture is based on the concept of a stack. The stack The basic architecture is based on the concept of a stack. The stack
is used for procedure return addresses, actual parameters, local variables, is used for procedure return addresses, actual parameters, local variables,
and arithmetic operations. and arithmetic operations.
@ -61,7 +55,7 @@ stack.
For all types except pointers, For all types except pointers,
these instructions have the object size these instructions have the object size
as argument. as argument.
.P .PP
There are no visible general registers used for arithmetic operands There are no visible general registers used for arithmetic operands
etc. This is in contrast to most third generation computers, which usually etc. This is in contrast to most third generation computers, which usually
have 8 or 16 general registers. The decision not to have a group of have 8 or 16 general registers. The decision not to have a group of
@ -69,11 +63,11 @@ general registers was fully intentional, and follows W.L. Van der
Poel's dictum that a machine should have 0, 1, or an infinite Poel's dictum that a machine should have 0, 1, or an infinite
number of any feature. General registers have two primary uses: to hold number of any feature. General registers have two primary uses: to hold
intermediate results of complicated expressions, e.g. intermediate results of complicated expressions, e.g.
.IS 5 0 1 .DS
((a*b + c*d)/e + f*g/h) * i ((a*b + c*d)/e + f*g/h) * i
.IE 1 .DE
and to hold local variables. and to hold local variables.
.P .PP
Various studies Various studies
have shown that the average expression has fewer than two operands, have shown that the average expression has fewer than two operands,
making the former use of registers of doubtful value. The present trend making the former use of registers of doubtful value. The present trend
@ -81,11 +75,9 @@ toward structured programs consisting of many small
procedures greatly reduces the value of registers to hold local variables procedures greatly reduces the value of registers to hold local variables
because the large number of procedure calls implies a large overhead in because the large number of procedure calls implies a large overhead in
saving and restoring the registers at every call. saving and restoring the registers at every call.
.P .PP
Although there are no general purpose registers, there are a Although there are no general purpose registers, there are a
few internal registers with specific functions as follows: few internal registers with specific functions as follows:
.IS 2
.N 1
.TS .TS
tab(:); tab(:);
l 1 l l l. l 1 l l l.
@ -94,9 +86,8 @@ LB:\-:Local Base:Points to base of the local variables
:::in the current procedure. :::in the current procedure.
SP:\-:Stack Pointer:Points to the highest occupied word on the stack. SP:\-:Stack Pointer:Points to the highest occupied word on the stack.
HP:\-:Heap Pointer:Points to the top of the heap area. HP:\-:Heap Pointer:Points to the top of the heap area.
.TE 1 .TE
.IE .PP
.A
Furthermore, reverse Polish code is much easier to generate than Furthermore, reverse Polish code is much easier to generate than
multi-register machine code, especially if highly efficient code is multi-register machine code, especially if highly efficient code is
desired. desired.
@ -106,7 +97,7 @@ An EM machine can
achieve high performance by keeping part of the stack achieve high performance by keeping part of the stack
in high speed storage (a cache or microprogram scratchpad memory) rather in high speed storage (a cache or microprogram scratchpad memory) rather
than in primary memory. than in primary memory.
.P .PP
Again according to van der Poel's dictum, Again according to van der Poel's dictum,
all EM instructions have zero or one argument. all EM instructions have zero or one argument.
We believe that instructions needing two arguments We believe that instructions needing two arguments
@ -116,11 +107,11 @@ circumstances as well.
Moreover, these two instructions together often Moreover, these two instructions together often
have a shorter encoding than the single have a shorter encoding than the single
instruction before. instruction before.
.P .PP
This document describes EM at three different levels: This document describes EM at three different levels:
the abstract level, the assembly language level and the abstract level, the assembly language level and
the machine language level. the machine language level.
.A .QQ
The most important level is that of the abstract EM architecture. The most important level is that of the abstract EM architecture.
This level deals with the basic design issues. This level deals with the basic design issues.
Only the functional capabilities of instructions are relevant, not their Only the functional capabilities of instructions are relevant, not their
@ -128,14 +119,14 @@ format or encoding.
Most chapters of this document refer to the abstract level Most chapters of this document refer to the abstract level
and it is explicitly stated whenever and it is explicitly stated whenever
another level is described. another level is described.
.A .QQ
The assembly language is intended for the compiler writer. The assembly language is intended for the compiler writer.
It presents a more or less orthogonal instruction It presents a more or less orthogonal instruction
set and provides symbolic names for data. set and provides symbolic names for data.
Moreover, it facilitates the linking of Moreover, it facilitates the linking of
separately compiled 'modules' into a single program separately compiled 'modules' into a single program
by providing several pseudoinstructions. by providing several pseudoinstructions.
.A .QQ
The machine language is designed for interpretation with a compact The machine language is designed for interpretation with a compact
program text and easy decoding. program text and easy decoding.
The binary representation of the machine language instruction set is The binary representation of the machine language instruction set is
@ -144,7 +135,7 @@ Frequent instructions have a short opcode.
The encoding is fully byte oriented. The encoding is fully byte oriented.
These bytes do not contain small bit fields, because These bytes do not contain small bit fields, because
bit fields would slow down decoding considerably. bit fields would slow down decoding considerably.
.P .PP
A common use for EM is for producing portable (cross) compilers. A common use for EM is for producing portable (cross) compilers.
When used this way, the compilers produce When used this way, the compilers produce
EM assembly language as their output. EM assembly language as their output.
@ -156,7 +147,7 @@ machine language instructions is irrelevant.
On the other hand, when writing an interpreter for EM machine language On the other hand, when writing an interpreter for EM machine language
programs, the interpreter must deal with the machine language programs, the interpreter must deal with the machine language
and not with the symbolic assembly language. and not with the symbolic assembly language.
.P .PP
As mentioned above, the As mentioned above, the
current microcomputer technology offers 8, 16 and 32 bit current microcomputer technology offers 8, 16 and 32 bit
machines with address spaces ranging from machines with address spaces ranging from


@ -1,6 +1,5 @@
.SN 3 .bp
.BP .P1 "INSTRUCTION ADDRESS SPACE"
.S1 "INSTRUCTION ADDRESS SPACE"
The instruction space of the EM machine contains The instruction space of the EM machine contains
the code for procedures. the code for procedures.
Tables necessary for the execution of this code, for example, procedure Tables necessary for the execution of this code, for example, procedure
@ -10,14 +9,14 @@ the execution of a program, so that it may be
protected. protected.
No further restrictions to the instruction address space are No further restrictions to the instruction address space are
necessary for the abstract and assembly language level. necessary for the abstract and assembly language level.
.P .PP
Each procedure has a single entry point: the first instruction. Each procedure has a single entry point: the first instruction.
A special type of pointer identifies a procedure. A special type of pointer identifies a procedure.
Pointers into the instruction Pointers into the instruction
address space have the same size as pointers into data space and address space have the same size as pointers into data space and
can, for example, contain the address of the first instruction can, for example, contain the address of the first instruction
or an index in a procedure descriptor table. or an index in a procedure descriptor table.
.A .QQ
There is a single EM program counter, PC, pointing There is a single EM program counter, PC, pointing
to the next instruction to be executed. to the next instruction to be executed.
The procedure pointed to by PC is The procedure pointed to by PC is
@ -28,35 +27,31 @@ The calling procedure remains 'active' and is resumed whenever the called
procedure returns. procedure returns.
Note that a procedure has several 'active' invocations when Note that a procedure has several 'active' invocations when
called recursively. called recursively.
.P .PP
Each procedure must return properly. Each procedure must return properly.
It is not allowed to fall through to the It is not allowed to fall through to the
code of the next procedure. code of the next procedure.
There are several ways to exit from a procedure: There are several ways to exit from a procedure:
.IS 3 .IP -
.PS
.PT
the RET instruction, which returns to the the RET instruction, which returns to the
calling procedure. calling procedure.
.PT .IP -
the RTT instruction, which exits a trap handling routine and resumes the RTT instruction, which exits a trap handling routine and resumes
the trapping instruction (see next chapter). the trapping instruction (see next chapter).
.PT .IP -
the GTO instruction, which is used for non-local goto's. the GTO instruction, which is used for non-local goto's.
It can remove several frames from the stack and transfer It can remove several frames from the stack and transfer
control to an active procedure. control to an active procedure.
(see also MES~11 in paragraph 11.1.4.4) (see also MES~11 in paragraph 11.1.4.4)
.PE .PP
.IE
.P
All branch instructions can transfer control All branch instructions can transfer control
to any label within the same procedure. to any label within the same procedure.
Branch instructions can never jump out of a procedure. Branch instructions can never jump out of a procedure.
.P .PP
Several language implementations use a so called procedure Several language implementations use a so called procedure
instance identifier, a combination of a procedure identifier and instance identifier, a combination of a procedure identifier and
the LB of a stack frame, also called static link. the LB of a stack frame, also called static link.
.P .PP
The program text for each procedure, as well as any tables, The program text for each procedure, as well as any tables,
are fragments and can be allocated anywhere are fragments and can be allocated anywhere
in the instruction address space. in the instruction address space.


@ -1,6 +1,6 @@
.BP .bp
.SN 10 .P1 "EM MACHINE LANGUAGE"
.S1 "EM MACHINE LANGUAGE" .PP
The EM machine language is designed to make program text compact The EM machine language is designed to make program text compact
and to make decoding easy. and to make decoding easy.
Compact program text has many advantages: programs execute faster, Compact program text has many advantages: programs execute faster,
@ -11,16 +11,18 @@ that it is feasible to use interpreters as long as EM hardware
machines are not available. machines are not available.
This chapter is irrelevant when back ends are used to This chapter is irrelevant when back ends are used to
produce executable target machine code. produce executable target machine code.
.S2 "Instruction encoding" .P2 "Instruction encoding"
.PP
A design goal of EM is to make the A design goal of EM is to make the
program text as compact as possible. program text as compact as possible.
Decoding must be easy, however. Decoding must be easy, however.
The encoding is fully byte oriented, without any small bit fields. The encoding is fully byte oriented, without any small bit fields.
There are 256 primary opcodes, two of which are an escape to There are 256 primary opcodes, two of which are an escape to
two groups of 256 secondary opcodes each. two groups of 256 secondary opcodes each.
.A .QQ
EM instructions without arguments have a single opcode assigned, EM instructions without arguments have a single opcode assigned,
possibly escaped: possibly escaped:
.ta 12n 24n
.Dr 6 .Dr 6
|--------------| |--------------|
| opcode | | opcode |
@ -37,7 +39,7 @@ Several instructions have an address from the global data area
as argument. as argument.
Other instructions have different opcodes for positive Other instructions have different opcodes for positive
and negative arguments. and negative arguments.
.N 1 .LP
There is always an opcode that takes the next two bytes as argument, There is always an opcode that takes the next two bytes as argument,
high byte first: high byte first:
.Dr 6 .Dr 6
@ -94,7 +96,7 @@ several different encodings are available.
It is the task of the assembler to select the shortest of these. It is the task of the assembler to select the shortest of these.
The savings by these mini and shortie The savings by these mini and shortie
opcodes are considerable, about 55%. opcodes are considerable, about 55%.
.P .PP
Further improvements are possible: Further improvements are possible:
the arguments of the arguments of
many instructions are a multiple of the wordsize. many instructions are a multiple of the wordsize.
@ -106,26 +108,24 @@ The arguments of some other instructions
rarely or never assume the value 0, but start at 1. rarely or never assume the value 0, but start at 1.
The value 1 is then encoded as 0, The value 1 is then encoded as 0,
2 as 1 and so on. 2 as 1 and so on.
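The offset-by-one rule for arguments that start at 1 can be written down directly. The sketch below covers only that rule; the function names are hypothetical, and which instructions use the rule is determined by the tables in appendix B, not by this code.

```python
def encode_biased(arg):
    """Encode an argument that starts at 1: the value 1 becomes 0, 2 becomes 1."""
    if arg < 1:
        raise ValueError("this encoding only covers arguments starting at 1")
    return arg - 1


def decode_biased(code):
    """Inverse of encode_biased."""
    return code + 1
```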
.P .PP
Assigning opcodes to instructions by the assembler is completely Assigning opcodes to instructions by the assembler is completely
table driven. table driven.
For details see appendix B. For details see appendix B.
.S2 "Procedure descriptors" .P2 "Procedure descriptors"
.PP
The procedure identifiers used in the interpreter are indices The procedure identifiers used in the interpreter are indices
into a table of procedure descriptors. into a table of procedure descriptors.
Each descriptor contains: Each descriptor contains:
.IS 6 .IP 1.
.PS - 4
.PT 1.
the number of bytes to be reserved for locals at each the number of bytes to be reserved for locals at each
invocation. invocation.
.N .br
This is a pointer-sized integer. This is a pointer-sized integer.
.PT 2. .IP 2.
the start address of the procedure the start address of the procedure
.PE .P2 "Load format"
.IE .PP
.S2 "Load format"
The EM machine language load format defines the interface between The EM machine language load format defines the interface between
the EM assembler/loader and the EM machine itself. the EM assembler/loader and the EM machine itself.
A load file consists of a header, the program text to be executed, A load file consists of a header, the program text to be executed,
@ -133,7 +133,7 @@ a description of the global data area and the procedure descriptor table,
in this order. in this order.
All integers in the load file are presented with the All integers in the load file are presented with the
least significant byte first. least significant byte first.
.P .PP
The header has two parts: the first half (eight 16-bit integers) The header has two parts: the first half (eight 16-bit integers)
aids in selecting aids in selecting
the correct EM machine or interpreter. the correct EM machine or interpreter.
@ -141,68 +141,59 @@ Some EM machines, for instance, may have hardware floating point
instructions. instructions.
.N .N
The header entries are as follows (bit 0 is rightmost): The header entries are as follows (bit 0 is rightmost):
.IS 2 .IP 1:
.VS 1 0
.PS 1 4 "" :
.PT
magic number (07255) magic number (07255)
.PT .IP 2:
flag bits with the following meaning: flag bits with the following meaning:
.PS - 7 "" : .RS
.PT bit 0 .IP "bit 0"
TEST; test for integer overflow etc. TEST; test for integer overflow etc.
.PT bit 1 .IP "bit 1"
PROFILE; for each source line: count the number of memory PROFILE; for each source line: count the number of memory
cycles executed. cycles executed.
.PT bit 2 .IP "bit 2"
FLOW; for each source line: set a bit in a bit map table if FLOW; for each source line: set a bit in a bit map table if
instructions on that line are executed. instructions on that line are executed.
.PT bit 3 .IP "bit 3"
COUNT; for each source line: increment a counter if that line COUNT; for each source line: increment a counter if that line
is entered. is entered.
.PT bit 4 .IP "bit 4"
REALS; set if a program uses floating point instructions. REALS; set if a program uses floating point instructions.
.PT bit 5 .IP "bit 5"
EXTRA; more tests during compiler debugging. EXTRA; more tests during compiler debugging.
.PE .RE
.PT .IP 3:
number of unresolved references. number of unresolved references.
.PT .IP 4:
version number; used to detect obsolete EM load files. version number; used to detect obsolete EM load files.
.PT .IP 5:
wordsize ; the number of bytes in each machine word. wordsize ; the number of bytes in each machine word.
.PT .IP 6:
pointer size ; the number of bytes available for addressing. pointer size ; the number of bytes available for addressing.
.PT .IP 7:
unused unused
.PT .IP 8:
unused unused
.PE .LP
.IE
The second part of the header (eight entries, of pointer size bytes each) The second part of the header (eight entries, of pointer size bytes each)
describes the load file itself: describes the load file itself:
.IS 2 .IP 1:
.PS 1 4 "" :
.PT
NTEXT; the program text size in bytes. NTEXT; the program text size in bytes.
.PT .IP 2:
NDATA; the number of load-file descriptors (see below). NDATA; the number of load-file descriptors (see below).
.PT .IP 3:
NPROC; the number of entries in the procedure descriptor table. NPROC; the number of entries in the procedure descriptor table.
.PT .IP 4:
ENTRY; procedure number of the procedure to start with. ENTRY; procedure number of the procedure to start with.
.PT .IP 5:
NLINE; the maximum source line number. NLINE; the maximum source line number.
.PT .IP 6:
SZDATA; the address of the lowest uninitialized data byte. SZDATA; the address of the lowest uninitialized data byte.
.PT .IP 7:
unused unused
.PT .IP 8:
unused unused
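
The two-part header described above can be sketched as a small decoder. This is only an illustration, not code from the EM distribution: the function name `parse_em_header` and the dictionary keys are hypothetical, and treating all header integers as unsigned is an assumption. The layout itself (eight 16-bit integers, then eight pointer-sized entries, least significant byte first) follows the text.

```python
import struct

EM_MAGIC = 0o7255  # header entry 1, given in octal in the text
FLAG_NAMES = ("TEST", "PROFILE", "FLOW", "COUNT", "REALS", "EXTRA")  # bits 0..5


def parse_em_header(data: bytes) -> dict:
    """Decode both header halves from the start of a load file (sketch)."""
    if len(data) < 16:
        raise ValueError("truncated header")
    # First half: eight 16-bit integers, least significant byte first.
    # Reading them as unsigned is an assumption of this sketch.
    (magic, flags, unresolved, version,
     wordsize, ptrsize, _unused7, _unused8) = struct.unpack_from("<8H", data, 0)
    if magic != EM_MAGIC:
        raise ValueError("bad magic number")
    if len(data) < 16 + 8 * ptrsize:
        raise ValueError("truncated header")
    # Second half: eight entries of pointer size bytes each.
    second = [
        int.from_bytes(data[16 + i * ptrsize:16 + (i + 1) * ptrsize], "little")
        for i in range(8)
    ]
    header = {
        "flags": [name for bit, name in enumerate(FLAG_NAMES) if flags >> bit & 1],
        "unresolved": unresolved,
        "version": version,
        "wordsize": wordsize,
        "pointer_size": ptrsize,
    }
    # Entries 7 and 8 of the second half are unused and dropped here.
    header.update(zip(("NTEXT", "NDATA", "NPROC", "ENTRY", "NLINE", "SZDATA"), second))
    return header
```

A loader would typically check the version, wordsize and pointer size from the first half before trusting the sizes in the second half.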
.PE .PP
.IE
.VS
.P
The program text consists of NTEXT bytes. The program text consists of NTEXT bytes.
NTEXT is always a multiple of the wordsize. NTEXT is always a multiple of the wordsize.
The first byte of the program text is the The first byte of the program text is the
@ -212,7 +203,7 @@ Pointers into the program text are found in the procedure descriptor
table where relocation is simple and in the global data area. table where relocation is simple and in the global data area.
The initialization of the global data area allows easy The initialization of the global data area allows easy
relocation of pointers into both address spaces. relocation of pointers into both address spaces.
.P .PP
The global data area is described by the NDATA descriptors. The global data area is described by the NDATA descriptors.
Each descriptor describes a number of consecutive words (of~wordsize) Each descriptor describes a number of consecutive words (of~wordsize)
and consists of a sequence of bytes. and consists of a sequence of bytes.
@ -220,7 +211,7 @@ While reading the descriptors from the load file, one can
initialize the global data area from low to high addresses. initialize the global data area from low to high addresses.
The size of the initialized data area is given by SZDATA, The size of the initialized data area is given by SZDATA,
this number can be used to check the initialization. this number can be used to check the initialization.
.N .br
The header of each descriptor consists of a byte, describing the type, The header of each descriptor consists of a byte, describing the type,
and a count. and a count.
The number of bytes used for this (unsigned) count depends on the The number of bytes used for this (unsigned) count depends on the
@ -232,31 +223,22 @@ At load time an interpreter can
perform any conversion deemed necessary, such as perform any conversion deemed necessary, such as
reordering bytes in integers reordering bytes in integers
and pointers and adding base addresses to pointers. and pointers and adding base addresses to pointers.
.A .QQ
In the following pictures we show a graphical notation of the In the following pictures we show a graphical notation of the
initializers. initializers.
The leftmost rectangle represents the leading byte. The leftmost rectangle represents the leading byte.
.N 1 .LP
.VS 1 0
.DS
.PS - 4 " "
Fields marked with Fields marked with
.N 1 .TS
.PT n tab(:);
contain a pointer-sized integer used as a count l l.
.PT m n:contain a pointer-sized integer used as a count
contain a one-byte integer used as a count m:contain a one-byte integer used as a count
.PT b b:contain a one-byte integer
contain a one-byte integer w:contain a wordsized integer
.PT w p:contain a data or instruction pointer
contain a wordsized integer s:contain a null terminated ASCII string
.PT p .TE
contain a data or instruction pointer
.PT s
contain a null terminated ASCII string
.PE 1
.DE 0
.VS
.Dr 6 .Dr 6
------------------- -------------------
| 0 | n | repeat last initialization n times | 0 | n | repeat last initialization n times
@ -316,8 +298,7 @@ contain a null terminated ASCII string
| 8 | m | s | initialized float of size m | 8 | m | s | initialized float of size m
------------------------- -------------------------
.De .De
.PS - 8 .IP type~0: 10
.PT type~0:
If the last initialization initialized k bytes starting If the last initialization initialized k bytes starting
at address \fIa\fP, do the same initialization again n times, at address \fIa\fP, do the same initialization again n times,
starting at \fIa\fP+k, \fIa\fP+2*k, .... \fIa\fP+n*k. starting at \fIa\fP+k, \fIa\fP+2*k, .... \fIa\fP+n*k.
@ -328,43 +309,43 @@ pointer,
in all other descriptors the first byte is followed by a one-byte count. in all other descriptors the first byte is followed by a one-byte count.
This descriptor must be preceded by a descriptor of This descriptor must be preceded by a descriptor of
another type. another type.
.PT type~1: .IP type~1: 10
Reserve m words, not explicitly initialized (BSS and HOL). Reserve m words, not explicitly initialized (BSS and HOL).
.PT type~2: .IP type~2: 10
The m bytes following the descriptor header are The m bytes following the descriptor header are
initializers for the next m bytes of the initializers for the next m bytes of the
global data area. global data area.
m is divisible by the wordsize. m is divisible by the wordsize.
.PT type~3: .IP type~3: 10
The m words following the header are initializers for the next m words of the The m words following the header are initializers for the next m words of the
global data area. global data area.
.PT type~4: .IP type~4: 10
The m data address space pointers following the header are The m data address space pointers following the header are
initializers for the next initializers for the next
m data pointers in the global data area. m data pointers in the global data area.
Interpreters that represent EM pointers by Interpreters that represent EM pointers by
target machine addresses must relocate all data pointers. target machine addresses must relocate all data pointers.
.PT type~5: .IP type~5: 10
The m instruction address space pointers following the header are The m instruction address space pointers following the header are
initializers for the next initializers for the next
m instruction pointers in the global data area. m instruction pointers in the global data area.
Interpreters that represent EM instruction pointers by Interpreters that represent EM instruction pointers by
target machine addresses must relocate these pointers. target machine addresses must relocate these pointers.
.PT type~6: .IP type~6: 10
The m bytes following the header form The m bytes following the header form
a signed integer number with a size of m bytes, a signed integer number with a size of m bytes,
which is an initializer for the next m bytes which is an initializer for the next m bytes
of the global data area. of the global data area.
m is governed by the same restrictions as for m is governed by the same restrictions as for
transfer of objects to/from memory. transfer of objects to/from memory.
.PT type~7: .IP type~7: 10
The m bytes following the header form The m bytes following the header form
an unsigned integer number with a size of m bytes, an unsigned integer number with a size of m bytes,
which is an initializer for the next m bytes which is an initializer for the next m bytes
of the global data area. of the global data area.
m is governed by the same restrictions as for m is governed by the same restrictions as for
transfer of objects to/from memory. transfer of objects to/from memory.
.PT type~8: .IP type~8: 10
The header is followed by an ASCII string, null terminated, to The header is followed by an ASCII string, null terminated, to
initialize, in global data, initialize, in global data,
a floating point number with a size of m bytes. a floating point number with a size of m bytes.
@ -372,8 +353,7 @@ m is governed by the same restrictions as for
transfer of objects to/from memory. transfer of objects to/from memory.
The ASCII string contains the notation of a real as used in the The ASCII string contains the notation of a real as used in the
Pascal language. Pascal language.
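
The descriptor types listed above can be sketched as a loop that fills the global data area from low to high addresses. This is a partial, hypothetical sketch: `load_global_data` is an invented name, only types 0 to 3 are handled, reserved words (type 1) are zero-filled as an assumption, and a type 1 reservation is assumed to count as a "last initialization" for a following type 0. Byte-order conversion and pointer relocation are left out.

```python
def load_global_data(stream: bytes, ndata: int, wordsize: int, ptrsize: int) -> bytearray:
    """Apply `ndata` load-file descriptors (types 0..3 only) to a fresh data area."""
    area = bytearray()
    last = None  # (start, length) of the previous initialization, if repeatable
    pos = 0
    for _ in range(ndata):
        dtype = stream[pos]
        pos += 1
        if dtype == 0:
            # Type 0 carries a pointer-sized count n.
            n = int.from_bytes(stream[pos:pos + ptrsize], "little")
            pos += ptrsize
            if last is None:
                raise ValueError("type 0 must follow a descriptor of another type")
            start, length = last
            # Repeat the last k bytes n more times, at a+k, a+2*k, ..., a+n*k.
            area += bytes(area[start:start + length]) * n
            last = None
            continue
        m = stream[pos]  # all other types carry a one-byte count
        pos += 1
        start = len(area)
        if dtype == 1:
            area += bytes(m * wordsize)  # reserve m words; zero fill is an assumption
        elif dtype == 2:
            area += stream[pos:pos + m]  # m initializer bytes
            pos += m
        elif dtype == 3:
            size = m * wordsize  # m initializer words, copied without conversion
            area += stream[pos:pos + size]
            pos += size
        else:
            raise NotImplementedError(f"descriptor type {dtype} not covered by this sketch")
        last = (start, len(area) - start)
    return area
```

After loading, the length of the returned area can be compared with SZDATA from the header to check the initialization.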
.PE .PP
.P
The NPROC procedure descriptors on the load file consist of The NPROC procedure descriptors on the load file consist of
an instruction space address (of~pointer~size) and an instruction space address (of~pointer~size) and
an integer (of~pointer~size) specifying the number of bytes for an integer (of~pointer~size) specifying the number of bytes for


@ -1,39 +1,113 @@
.SS 10 .LP
.if n .LL 78 .if n \{\
.RP .nr LL 78
.MS T E .ll 78 \}
\!.TL '%''' .tr ~
.ME
.MS T O
\!.TL '''%'
.ME
.MS B
.sp 1
.ME
.SM S1 B
.SM S2 B
.\" below are three simple macros to get the drawings right .\" below are three simple macros to get the drawings right
.\" added by Dick Grune .\" added by Dick Grune
.de Dr \" Drawing $1 (size) .de Dr \" Drawing $1 (size)
.N 1 .sp 1
.NE \\$1 .ne \\$1
.NA .na
.ft CW \" constant spacing .ft CW \" constant spacing
.lg 0 \" no ligatures .lg 0 \" no ligatures
.. ..
.de Df \" Drawing Footer .de Df \" Drawing Footer
.N 1 .br
.sp 1
.ft R .ft R
.CS .ce 1000
.lg 1 .lg 1
.. ..
.de De \" Drawing End $1 (lines) .de De \" Drawing End $1 (lines)
.Df \" if it hasn't happened yet .br
.CE .ft R
.AD .lg 1
.N \\$1 .ce 0
.ad
.sp \\$1
.. ..
.\" macro for exponents, added by Ceriel Jacobs .\" macro for exponents, added by Ceriel Jacobs
.de Ex \" Exponent $1 $2 [$3] .de Ex \" Exponent $1 $2 [$3]
\\$1\v'-0.5m'\s-2\\$2\s+2\v'0.5m'\\$3 \\$1\v'-0.5m'\s-2\\$2\s+2\v'0.5m'\\$3
.. ..
.\" QQ is like PP, but without space
. \" use .PP, with PD 0.
.de QQ
.nr xx \\n(PD
.nr PD 0
.PP
.nr PD \\n(xx
..
.nr N1 0
.nr N2 0
.nr N3 0
.nr N4 0
.nr N5 0
.nr A5 0
.af A5 A
.de P1
.nr N2 0
.nr N1 \\n(N1+1
.ds Tl "\\n(N1. \\$1
.Ca 0
.sp
.LP
\\fB\\n(N1. \\$1\\fP
.sp
..
.de P2
.nr N3 0
.nr N2 \\n(N2+1
.ds Tl "\\n(N1.\\n(N2 \\$1
.ne 5
.Ca 2
.sp
.LP
\\fB\\n(N1.\\n(N2 \\$1\fP
..
.de P3
.nr N4 0
.nr N3 \\n(N3+1
.ds Tl "\\n(N1.\\n(N2.\\n(N3 \\$1
.Ca 4
.LP
\\fI\\n(N1.\\n(N2.\\n(N3 \\$1\fP
..
.de P4
.nr N4 \\n(N4+1
.ds Tl "\\n(N1.\\n(N2.\\n(N3.\\n(N4 \\$1
.ne 5
.Ca 6
.LP
\\fI\\n(N1.\\n(N2.\\n(N3.\\n(N4 \\$1\fP
..
.de AP
.nr N5 \\n(N5+1
.nr A5 \\n(N5
.ds Tl "\\n(A5. \\$1
.ne 5
.Ca 0
.LP
\\fB\\n(A5. \\$1\\fP
.sp
..
.de Ca
.da Cc
.if \\$1=0 \!.sp \\\\n(PDu
\!\l\&\\$1n\ \&\\*(Tl \l\&|\\\\n(LLu-\w\&\ \\n(PN\&u.\&\ \\n(PN
\!.br
.da
..
.de Ct
.Cc
.rm Cc
..
.de PT
.lt \\n(LLu
.pc %
.nr PN \\n%-1
.if \\n(PN%2=1 .tl '''\\n(PN'
.if (\\n(PN%2=0)&(\\n(PN) .tl '\\n(PN'''
.lt \\n(.lu
..


@ -1,12 +1,12 @@
.SN 5 .bp
.BP .P1 "MAPPING OF EM DATA MEMORY ONTO TARGET MACHINE MEMORY"
.S1 "MAPPING OF EM DATA MEMORY ONTO TARGET MACHINE MEMORY" .PP
The EM architecture is designed to be implemented The EM architecture is designed to be implemented
on many existing and future machines. on many existing and future machines.
EM memory is highly fragmented to make EM memory is highly fragmented to make
adaptation to various memory architectures possible. adaptation to various memory architectures possible.
Format and encoding of pointers is explicitly undefined. Format and encoding of pointers is explicitly undefined.
.P .PP
This chapter gives solutions to some of the This chapter gives solutions to some of the
anticipated problems. anticipated problems.
First, we describe a possible memory layout for machines First, we describe a possible memory layout for machines
@ -53,11 +53,9 @@ The most straightforward layout is shown in figure 2.
Figure 2. Memory layout showing typical register Figure 2. Memory layout showing typical register
positions during execution of an EM program. positions during execution of an EM program.
.De .De
.N 1 .sp 1
The base registers for the various memory pieces can be stored The base registers for the various memory pieces can be stored
in target machine registers or memory. in target machine registers or memory.
.IS
.N 1
.TS .TS
tab(;); tab(;);
l 1 l l l. l 1 l l l.
@ -65,8 +63,8 @@ PB;:;program base;points to the base of the instruction address space.
EB;:;external base;points to the base of the data address space. EB;:;external base;points to the base of the data address space.
HB;:;heap base;points to the base of the heap area. HB;:;heap base;points to the base of the heap area.
ML;:;memory limit;marks the high end of the addressable data space. ML;:;memory limit;marks the high end of the addressable data space.
.TE 1 .TE
.IE .LP
The stack grows from high The stack grows from high
EM addresses to low EM addresses, and the heap the EM addresses to low EM addresses, and the heap the
other way. other way.
@ -74,7 +72,7 @@ The memory between SP and HP is not accessible,
but may be allocated later to the stack or the heap if needed. but may be allocated later to the stack or the heap if needed.
The local data area is allocated starting at the high end of The local data area is allocated starting at the high end of
memory. memory.
.P .PP
Because EM address 0 is not mapped onto target Because EM address 0 is not mapped onto target
address 0, a problem arises when pointers are used. address 0, a problem arises when pointers are used.
If a program pushed a constant, say 6, onto the stack, If a program pushed a constant, say 6, onto the stack,
@ -86,7 +84,7 @@ This particular problem is solved by explicitly declaring
the format of a pointer to be undefined, the format of a pointer to be undefined,
so that using a constant as a pointer is completely illegal. so that using a constant as a pointer is completely illegal.
However, the general problem of mapping pointers still exists. However, the general problem of mapping pointers still exists.
.P .PP
There are two possible solutions. There are two possible solutions.
In the first solution, EM pointers are represented In the first solution, EM pointers are represented
in the target machine as true EM addresses, in the target machine as true EM addresses,
@ -100,7 +98,7 @@ facilities, EB can be kept in a target machine register,
and the relocation can indeed be done on and the relocation can indeed be done on
every reference to the data address space every reference to the data address space
at a modest cost in speed. at a modest cost in speed.
.P .PP
The other solution consists of having EM pointers The other solution consists of having EM pointers
refer to the true target machine address. refer to the true target machine address.
Thus the instruction LAE 6 (Load Address of External 6) Thus the instruction LAE 6 (Load Address of External 6)
@ -112,7 +110,7 @@ However, the problem is not completely solved,
because a front end may have to initialize a pointer because a front end may have to initialize a pointer
in CON or ROM data to point to a global address. in CON or ROM data to point to a global address.
This pointer must also be relocated by the back end or the interpreter. This pointer must also be relocated by the back end or the interpreter.
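
The difference between the two pointer representations can be sketched in a few lines. The sketch assumes the simplest layout, in which EM data address e is stored at target address EB + e (stack growing downwards); the function names are hypothetical and not part of any EM back end.

```python
def target_address(em_address: int, eb: int) -> int:
    """First solution: pointers hold EM addresses, relocated on every reference."""
    return eb + em_address


def lae_pushed_value(offset: int, eb: int, pointers_are_target_addresses: bool) -> int:
    """Value pushed by LAE `offset` under each pointer representation."""
    if pointers_are_target_addresses:
        return eb + offset  # second solution: relocation done once, at load address time
    return offset  # first solution: plain EM address, relocated when dereferenced
```

With the first representation the cost is an addition on every data reference; with the second, pointers stored in CON or ROM data still have to be relocated once by the back end or the interpreter.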
.P .PP
Although the EM stack grows from high to low EM addresses, Although the EM stack grows from high to low EM addresses,
some machines have hardware PUSH and POP some machines have hardware PUSH and POP
instructions that require the stack to grow upwards. instructions that require the stack to grow upwards.
@ -144,17 +142,13 @@ Figure 3. Two possible memory implementations.
Numbers within the boxes are EM addresses. Numbers within the boxes are EM addresses.
The other numbers are physical addresses. The other numbers are physical addresses.
.De .De
.A 1 0 .LP
So, we have two different EM memory implementations: So, we have two different EM memory implementations:
.IS .IP "A~\-"
.PS - 4
.PT A~\-
stack downwards stack downwards
.PT B~\- .IP "B~\-"
stack upwards stack upwards
.PE .PP
.IE
.P
For each of these two possibilities we give the translation of For each of these two possibilities we give the translation of
the EM instructions to push the third byte of a global data the EM instructions to push the third byte of a global data
block starting at EM address 40 onto the stack and to load the block starting at EM address 40 onto the stack and to load the
@ -164,22 +158,20 @@ The target machine used is a PDP-11 augmented with push and pop instructions.
Registers 'r0' and 'r1' are used and suffer from sign extension for byte Registers 'r0' and 'r1' are used and suffer from sign extension for byte
transfers. transfers.
Push $40 means push the constant 40, not word 40. Push $40 means push the constant 40, not word 40.
.P .PP
The translation of the EM instructions depends on the pointer representation The translation of the EM instructions depends on the pointer representation
used. used.
For each of the two solutions explained above the translation is given. For each of the two solutions explained above the translation is given.
.P .PP
First, the translation for the two implementations using EM addresses as First, the translation for the two implementations using EM addresses as
pointer representation: pointer representation:
.DS .KS
.TS .TS
tab(:), center; tab(:), center;
l s l s l s l s l s l s
_ s _ s _ s
l 2 l 6 l 2 l 6 l 2 l. l 2 l 6 l 2 l 6 l 2 l.
EM:type A:type B EM:type A:type B
_
LAE:40:push:$40:push:$40 LAE:40:push:$40:push:$40
ADP:3:pop:r0:pop:r0 ADP:3:pop:r0:pop:r0
@ -194,20 +186,17 @@ LOI:1:pop:r0:pop:r0
LOE:40:push:eb+40:push:eb-41 LOE:40:push:eb+40:push:eb-41
.TE .TE
.DE .KE
.P .PP
The translation for the two implementations, if the target machine address is The translation for the two implementations, if the target machine address is
used as pointer representation, is: used as pointer representation, is:
.N 1 .KS
.DS
.TS .TS
tab(:), center; tab(:), center;
l s l s l s l s l s l s
_ s _ s _ s
l 2 l 6 l 2 l 6 l 2 l. l 2 l 6 l 2 l 6 l 2 l.
EM:type A:type B EM:type A:type B
_
LAE:40:push:$eb+40:push:$eb-40 LAE:40:push:$eb+40:push:$eb-40
ADP:3:pop:r0:pop:r0 ADP:3:pop:r0:pop:r0
@ -221,12 +210,12 @@ LOI:1:pop:r0:pop:r0
LOE:40:push:eb+40:push:eb-41 LOE:40:push:eb+40:push:eb-41
.TE .TE
.DE .KE
.P .PP
The translation presented above is not intended to be optimal. The translation presented above is not intended to be optimal.
Most machines can handle these simple cases in one or two instructions. Most machines can handle these simple cases in one or two instructions.
It demonstrates, however, the flexibility of the EM design. It demonstrates, however, the flexibility of the EM design.
.P .PP
There are several possibilities to implement EM on machines with There are several possibilities to implement EM on machines with
address spaces larger than 64k bytes. address spaces larger than 64k bytes.
For EM with two byte pointers one could allocate instruction and For EM with two byte pointers one could allocate instruction and
@ -236,7 +225,7 @@ but the base registers PB and EB may be loaded in hardware registers
wider than 16 bits, if available. wider than 16 bits, if available.
EM implementations can also make efficient use of a machine EM implementations can also make efficient use of a machine
with separate instruction and data space. with separate instruction and data space.
.P .PP
EM with 32 bit pointers allows one to make use of machines EM with 32 bit pointers allows one to make use of machines
with large address spaces. with large address spaces.
In a virtual, segmented memory system one could use a separate In a virtual, segmented memory system one could use a separate


@ -1,13 +1,13 @@
.BP .bp
.SN 2 .P1 MEMORY
.S1 MEMORY .PP
The EM machine has two distinct address spaces, The EM machine has two distinct address spaces,
one for instructions and one for data. one for instructions and one for data.
The data space is divided up into 8-bit bytes. The data space is divided up into 8-bit bytes.
The smallest addressable unit is a byte. The smallest addressable unit is a byte.
Bytes are numbered consecutively from 0 to some maximum. Bytes are numbered consecutively from 0 to some maximum.
All sizes in EM are expressed in bytes. All sizes in EM are expressed in bytes.
.P .PP
Some EM instructions can transfer objects containing several bytes Some EM instructions can transfer objects containing several bytes
to and/or from memory. to and/or from memory.
The size of all objects larger than a word must be a multiple of The size of all objects larger than a word must be a multiple of
@ -26,7 +26,7 @@ location \fIm\fP and the wordsize is 2,
\fIm\fP must be a multiple of 2 and the bytes at \fIm\fP must be a multiple of 2 and the bytes at
locations \fIm\fP, \fIm\fP\|+\|1,\fIm\fP\|+\|2 and locations \fIm\fP, \fIm\fP\|+\|1,\fIm\fP\|+\|2 and
\fIm\fP\|+\|3 are overwritten. \fIm\fP\|+\|3 are overwritten.
.P .PP
The size of almost all objects in EM The size of almost all objects in EM
is an integral number of words. is an integral number of words.
Only two operations are allowed on Only two operations are allowed on
@ -42,11 +42,11 @@ EM provides a way to sign-extend a small integer.
Popping a small object from the stack removes a word Popping a small object from the stack removes a word
from the stack, stores the least significant byte(s) from the stack, stores the least significant byte(s)
of this word in memory and discards the rest of the word. of this word in memory and discards the rest of the word.
.P .PP
The format of pointers into both address spaces is explicitly undefined. The format of pointers into both address spaces is explicitly undefined.
The size of a pointer, however, is fixed for a member of EM, so that The size of a pointer, however, is fixed for a member of EM, so that
the compiler writer knows how much storage to allocate for a pointer. the compiler writer knows how much storage to allocate for a pointer.
.P .PP
A minor problem is raised by the undefined pointer format. A minor problem is raised by the undefined pointer format.
Some languages, notably Pascal, require a special, Some languages, notably Pascal, require a special,
otherwise illegal, pointer value to represent the nil pointer. otherwise illegal, pointer value to represent the nil pointer.
@ -59,12 +59,12 @@ but it is hard to imagine an implementation
for which the current solution is inadequate, for which the current solution is inadequate,
especially because the first word in the EM data space especially because the first word in the EM data space
is special and probably not the target of any pointer. is special and probably not the target of any pointer.
.P .PP
The next two chapters describe the EM memory The next two chapters describe the EM memory
in more detail. in more detail.
One describes the instruction address space, One describes the instruction address space,
the other the data address space. the other the data address space.
.P .PP
A design goal of EM has been to allow A design goal of EM has been to allow
its implementation on a wide range of existing machines, its implementation on a wide range of existing machines,
as well as allowing a new one to be built in hardware. as well as allowing a new one to be built in hardware.


@ -1,6 +1,4 @@
.po 0 .LP
.TP 1
.ll 79n
\& \&
.sp 10 .sp 10
.ce 4 .ce 4
@ -25,12 +23,9 @@ Abstract
.ti +5 .ti +5
EM is a family of intermediate languages EM is a family of intermediate languages
designed for producing portable compilers. designed for producing portable compilers.
A program called A program called \fBfront end\fP
.B front end
translates source programs to EM. translates source programs to EM.
Another program, Another program, \fBback end\fP,
.B back
.BW end ,
translates EM to the assembly language of the target machine. translates EM to the assembly language of the target machine.
Alternatively, the EM program can be assembled to a highly Alternatively, the EM program can be assembled to a highly
efficient binary format for interpretation. efficient binary format for interpretation.


@ -1,13 +1,12 @@
.SN 9 .bp
.VS 1 0 .P1 "TRAPS AND INTERRUPTS"
.BP .PP
.S1 "TRAPS AND INTERRUPTS"
EM provides a means for the user program to catch all traps EM provides a means for the user program to catch all traps
generated by the program itself, the hardware, or external conditions. generated by the program itself, the hardware, or external conditions.
This mechanism uses five instructions: LIM, SIM, SIG, TRP and RTT. This mechanism uses five instructions: LIM, SIM, SIG, TRP and RTT.
This section of the manual may be omitted on the first reading since it This section of the manual may be omitted on the first reading since it
presupposes knowledge of the EM instruction set. presupposes knowledge of the EM instruction set.
.P .PP
The action taken when a trap occurs is determined by the value The action taken when a trap occurs is determined by the value
of an internal EM trap register. of an internal EM trap register.
This register contains a pointer to a procedure. This register contains a pointer to a procedure.
@ -26,7 +25,7 @@ Two consecutive SIGs are a no-op.
When a trap occurs, the trap register is reset to its initial When a trap occurs, the trap register is reset to its initial
condition, to prevent recursive traps from hanging the machine up, condition, to prevent recursive traps from hanging the machine up,
e.g. stack overflow in the stack overflow handling procedure. e.g. stack overflow in the stack overflow handling procedure.
.P .PP
The runtime systems for some languages need to ignore some EM The runtime systems for some languages need to ignore some EM
traps. traps.
EM offers a feature called the ignore mask. EM offers a feature called the ignore mask.
@ -37,24 +36,24 @@ If a certain bit is 1 the corresponding trap never
occurs and processing simply continues. occurs and processing simply continues.
The actions performed by the offending instruction are The actions performed by the offending instruction are
described by the Pascal program in appendix A. described by the Pascal program in appendix A.
.N .br
If the bit is 0, traps are not ignored. If the bit is 0, traps are not ignored.
The instructions LIM and SIM allow copying and replacement of The instructions LIM and SIM allow copying and replacement of
the ignore mask.~ the ignore mask.~
.P .PP
The TRP instruction generates a trap, the trap number being found on the The TRP instruction generates a trap, the trap number being found on the
stack. stack.
This is, among other things, This is, among other things,
useful for library procedures and runtime systems. useful for library procedures and runtime systems.
It can also be used by a low level trap procedure to pass the trap to a It can also be used by a low level trap procedure to pass the trap to a
higher level one (see example below). higher level one (see example below).
.P .PP
The RTT instruction returns from the trap procedure and continues after the The RTT instruction returns from the trap procedure and continues after the
trap. trap.
In the list below all traps marked with an asterisk ('*') are In the list below all traps marked with an asterisk ('*') are
considered to be fatal and it is explicitly undefined what happens when considered to be fatal and it is explicitly undefined what happens when
restarting after the trap. restarting after the trap.
.P .PP
The way a trap procedure is called is completely compatible The way a trap procedure is called is completely compatible
with normal calling conventions. The only way a trap procedure with normal calling conventions. The only way a trap procedure
differs from normal procedures is the return. It has to use RTT instead differs from normal procedures is the return. It has to use RTT instead
@ -62,25 +61,20 @@ of RET. This is necessary because the complete runtime status is saved on the
stack before calling the procedure and all this status has to be reloaded. stack before calling the procedure and all this status has to be reloaded.
Error numbers are in the range 0 to 252. Error numbers are in the range 0 to 252.
The trap numbers are divided into three categories: The trap numbers are divided into three categories:
.IS 4 .IP "\0\00\-\063" 12
.N 1
.PS - 10
.PT ~~0\-~63
EM machine errors, e.g. illegal instruction. EM machine errors, e.g. illegal instruction.
.PS - 8 .RS
.PT ~0\-15 .IP "\00\-15" 8
maskable maskable
.PT 16\-63 .IP "16\-63" 8
not maskable not maskable
.PE .RE
.PT ~64\-127 .IP "\064\-127" 12
Reserved for use by compilers, run time systems, etc. Reserved for use by compilers, run time systems, etc.
.PT 128\-252 .IP "128\-252" 12
Available for user programs. Available for user programs.
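
The three categories of trap numbers can be sketched as a small classifier. The function names are hypothetical. The ignore-mask check further assumes that bit i of the mask corresponds to trap number i, which is consistent with traps 0 to 15 being maskable but should be read as an assumption of this sketch.

```python
def classify_trap(number: int) -> str:
    """Return the category of an EM trap number (0..252)."""
    if not 0 <= number <= 252:
        raise ValueError("trap numbers are in the range 0 to 252")
    if number <= 15:
        return "EM machine error, maskable"
    if number <= 63:
        return "EM machine error, not maskable"
    if number <= 127:
        return "reserved for compilers and run time systems"
    return "available for user programs"


def is_ignored(number: int, ignore_mask: int) -> bool:
    """True if the trap is maskable and its (assumed) mask bit is 1."""
    return 0 <= number <= 15 and bool(ignore_mask >> number & 1)
```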
.PE 1 .LP
.IE
EM machine errors are numbered as follows: EM machine errors are numbered as follows:
.DS I 5
.TS .TS
tab(@); tab(@);
n l l. n l l.
@ -108,15 +102,16 @@ n l l.
26@EBADLIN@Argument of LIN too high 26@EBADLIN@Argument of LIN too high
27@EBADGTO@GTO descriptor error 27@EBADGTO@GTO descriptor error
.TE .TE
.DE 0 .PP
.P
As an example, As an example,
suppose a subprocedure has to be written to do a numeric suppose a subprocedure has to be written to do a numeric
calculation. calculation.
When an overflow occurs the computation has to be stopped and When an overflow occurs the computation has to be stopped and
the higher level procedure must be resumed. the higher level procedure must be resumed.
This can be programmed as follows using the mechanism described above: This can be programmed as follows using the mechanism described above:
.DS B .LP
.KS
.nf
.ta 1n 24n .ta 1n 24n
mes 2,2,2 ; set sizes mes 2,2,2 ; set sizes
ersave ersave
@ -150,10 +145,12 @@ msave
jmpbuf jmpbuf
con *1,0,0 con *1,0,0
end end
.DE 0 .KE
.VS .KS
.DS .LP
Example of catch procedure Example of catch procedure
.LP
.nf
.ta 1n 24n .ta 1n 24n
pro $catch,0 ; Local procedure that must catch the overflow trap pro $catch,0 ; Local procedure that must catch the overflow trap
lol 2 ; Load trap number lol 2 ; Load trap number
@ -168,4 +165,5 @@ Example of catch procedure
trp ; call other trap procedure trp ; call other trap procedure
rtt ; if other procedure returns, do the same rtt ; if other procedure returns, do the same
end end
.DE .KE
.fi


@ -1,6 +1,6 @@
.SN 6 .bp
.BP .P1 "TYPE REPRESENTATIONS"
.S1 "TYPE REPRESENTATIONS" .PP
The representations used for typed objects are not precisely The representations used for typed objects are not precisely
specified by EM. specified by EM.
Sometimes we only specify that a typed object occupies a Sometimes we only specify that a typed object occupies a
@ -15,7 +15,7 @@ on the same object(s).
For example, the instruction ZER pushes signed and For example, the instruction ZER pushes signed and
unsigned integers with the value zero and empty sets. unsigned integers with the value zero and empty sets.
ZER has as only argument the size of the object. ZER has as only argument the size of the object.
.A .QQ
The representation of floating point numbers is a good example, The representation of floating point numbers is a good example,
it allows widely varying implementations. it allows widely varying implementations.
The only ways to create floating point numbers are via The only ways to create floating point numbers are via
@ -26,13 +26,14 @@ be converted to human readable output.
Implementations may use base 10, base 2 or any other Implementations may use base 10, base 2 or any other
base for exponents, and have freedom in choosing the range of base for exponents, and have freedom in choosing the range of
exponent and mantissa. exponent and mantissa.
.A .QQ
Other types are more precisely described. Other types are more precisely described.
In the following paragraphs a description will be given of the In the following paragraphs a description will be given of the
restrictions imposed on the representation of the types used. restrictions imposed on the representation of the types used.
A number \fBn\fP used in these paragraphs indicates the size of A number \fBn\fP used in these paragraphs indicates the size of
the object in \fIbits\fP. the object in \fIbits\fP.
.S2 "Unsigned integers" .P2 "Unsigned integers"
.PP
The range of unsigned integers is 0.. The range of unsigned integers is 0..
.Ex 2 "\fBn\fP" -1. .Ex 2 "\fBn\fP" -1.
A binary representation is assumed. A binary representation is assumed.
@ -47,20 +48,21 @@ This of course means that some sequences of instructions have
unpredictable effects. unpredictable effects.
For example: For example:
.DS .DS
LOC 258 ; STL 0 ; LAL 0 ; LOI 1 ( wordsize >=2 ) LOC 258 ; STL 0 ; LAL 0 ; LOI 1 ( wordsize >=2 )
.DE .DE
The value on the stack after executing this sequence The value on the stack after executing this sequence
can be anything, can be anything,
but will most likely be 1 or 2. but will most likely be 1 or 2.
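
Why the result is most likely 1 or 2 can be seen from a small model of the sequence. This is only a sketch under the assumption of a 2-byte word stored in either little-endian or big-endian order; the helper name `first_byte_of_word` and its parameters are hypothetical and not part of EM.

```python
# Hypothetical model of "LOC 258 ; STL 0 ; LAL 0 ; LOI 1" with wordsize 2.
# Assumption: the word is stored in little- or big-endian byte order.

def first_byte_of_word(value, wordsize, byteorder):
    """Store value as a word, then load the single byte at the word's address."""
    word = value.to_bytes(wordsize, byteorder)  # STL 0: store the word in memory
    return word[0]                              # LAL 0 ; LOI 1: load the byte at the lowest address

# 258 is 0x0102, so the two bytes of the word are 1 and 2.
for order in ("little", "big"):
    print(order, first_byte_of_word(258, 2, order))
# little 2
# big 1
```

Which of the two values appears depends on the byte order of the implementation, which is why the text leaves the result unspecified.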
.A .QQ
Conversions between unsigned integers of different sizes have to Conversions between unsigned integers of different sizes have to
be done with explicit convert instructions. be done with explicit convert instructions.
One cannot simply pad an unsigned integer with zeros at either end One cannot simply pad an unsigned integer with zeros at either end
and expect a correct result. and expect a correct result.
.A .QQ
We assume existence of at least single word unsigned arithmetic We assume existence of at least single word unsigned arithmetic
in any implementation. in any implementation.
.S2 "Signed Integers" .P2 "Signed Integers"
.PP
The range of signed integers is The range of signed integers is
.Ex \-2 "\fBn\fP\-1" ~.. .Ex \-2 "\fBn\fP\-1" ~..
.Ex 2 "\fBn\fP\-1" \-1, .Ex 2 "\fBn\fP\-1" \-1,
@ -75,29 +77,31 @@ range
In other words, the most significant bit is used as sign bit. In other words, the most significant bit is used as sign bit.
The convert instructions between signed and unsigned integers The convert instructions between signed and unsigned integers
of the same size can be used to catch errors. of the same size can be used to catch errors.
.A .QQ
The value The value
.Ex \-2 "\fBn\fP\-1" .Ex \-2 "\fBn\fP\-1"
is used for undefined is used for undefined
signed integers. signed integers.
EM implementations should trap when this value is used in an EM implementations should trap when this value is used in an
operation on signed integers. operation on signed integers.
The instruction mask, accessed with SIM and LIM \-~see chapter 9~\- , The instruction mask, accessed with SIM and LIM \-~see chapter 9~\-,
can be used to disable such traps. can be used to disable such traps.
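
The range and the reserved "undefined" value can be made concrete for a given size. The sketch below assumes a two's complement bit pattern, with the most significant bit as sign bit; the function names are illustrative only and do not correspond to EM instructions.

```python
# Sketch: range of n-bit signed integers and the value reserved for "undefined".
# Assumption: two's complement bit patterns.

def signed_range(n):
    """Return (lowest, highest) value of an n-bit signed integer."""
    return -(2 ** (n - 1)), 2 ** (n - 1) - 1

def undefined_signed(n):
    """The value -2**(n-1), used for undefined signed integers."""
    return -(2 ** (n - 1))

def bit_pattern(value, n):
    """Two's complement bit pattern of value in n bits, as an unsigned number."""
    return value & ((1 << n) - 1)

print(signed_range(16))                           # (-32768, 32767)
print(hex(bit_pattern(undefined_signed(16), 16))) # 0x8000
```

Under this assumption the undefined value is the pattern with only the sign bit set, so it is the one value in the range whose negation cannot be represented in the same size.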
.A .QQ
We assume existence of at least single word signed arithmetic We assume existence of at least single word signed arithmetic
in any implementation. in any implementation.
.S2 "Floating point values" .P2 "Floating point values"
.PP
Floating point values must have a signed mantissa and a signed Floating point values must have a signed mantissa and a signed
exponent. exponent.
Although no base is specified, base 2 is the normal choice, Although no base is specified, base 2 is the normal choice,
because the FEF instruction pushes the exponent in base 2. because the FEF instruction pushes the exponent in base 2.
.A .QQ
The implementation of floating point arithmetic is optional. The implementation of floating point arithmetic is optional.
The compilers currently in use have runtime parameters for the The compilers currently in use have runtime parameters for the
size of the floating point values they should use. size of the floating point values they should use.
Common choices are 4 and/or 8 bytes. Common choices are 4 and/or 8 bytes.
.S2 Pointers .P2 Pointers
.PP
EM has two kinds of pointers: for instruction and for data EM has two kinds of pointers: for instruction and for data
space. space.
Each kind can only be used for its own space, conversion between Each kind can only be used for its own space, conversion between
@ -109,13 +113,14 @@ One can of course not expect to be able to address two megabyte
of memory using a 2-byte pointer. of memory using a 2-byte pointer.
Normally, a 2-byte pointer allows up to 65536 bytes of Normally, a 2-byte pointer allows up to 65536 bytes of
addressable memory. addressable memory.
.A .QQ
Pointer representation has one restriction. Pointer representation has one restriction.
The pointer with the same representation as the integer zero of The pointer with the same representation as the integer zero of
the same size should be invalid. the same size should be invalid.
Some languages and/or runtime systems represent the nil Some languages and/or runtime systems represent the nil
pointer as zero. pointer as zero.
.S2 "Bit sets" .P2 "Bit sets"
.PP
All bit sets of size \fBn\fP are subsets of the set All bit sets of size \fBn\fP are subsets of the set
{~i~|~i>=0,~i<\fBn\fP~}. {~i~|~i>=0,~i<\fBn\fP~}.
A bit set contains a bit for each element showing its A bit set contains a bit for each element showing its
@ -129,7 +134,7 @@ The relation between a set with size of
a word and an unsigned integer word is that a word and an unsigned integer word is that
the value of the unsigned integer is the summation of the the value of the unsigned integer is the summation of the
2\v'-0.5m'i\v'0.5m' where i is in the set. 2\v'-0.5m'i\v'0.5m' where i is in the set.
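
The relation between a bit set and its unsigned integer value can be sketched as follows. The single-word case follows directly from the summation above; the layout of a multi-word set (element i stored in word i div 16, bit i mod 16, for a wordsize of 2 bytes) is an assumption made for this illustration, and `set_to_words` is a hypothetical helper.

```python
# Sketch: compute the unsigned word values that represent a bit set.
# Assumption: element i lives in word i // wordbits, at bit i % wordbits.

def set_to_words(elements, nwords, wordbits=16):
    """Return a list of nwords unsigned integers representing the set."""
    words = [0] * nwords
    for i in elements:
        if not 0 <= i < nwords * wordbits:
            raise ValueError("element %d does not fit in the set" % i)
        words[i // wordbits] |= 1 << (i % wordbits)  # adds 2**(i mod wordbits)
    return words

# One-word set {0, 3}: 2**0 + 2**3 = 9
print(set_to_words([0, 3], 1))                            # [9]

# Two-word set with the elements 1, 6, 8, 15, 18, 21, 27 and 28
print(set_to_words([1, 6, 8, 15, 18, 21, 27, 28], 2))     # [33090, 6180]
```

The first word is 2 + 64 + 256 + 32768 = 33090 and the second is 4 + 32 + 2048 + 4096 = 6180, each being the sum of the powers of two for the elements that fall in that word.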
.A .QQ
Example: a 2-word bit set (wordsize 2) containing the Example: a 2-word bit set (wordsize 2) containing the
elements 1, 6, 8, 15, 18, 21, 27 and 28 is composed of two elements 1, 6, 8, 15, 18, 21, 27 and 28 is composed of two
integers, e.g. at addresses 40 and 42. integers, e.g. at addresses 40 and 42.