Now uses -ms macros
parent e9a4337ccf
commit 5117853b1b
19 changed files with 647 additions and 667 deletions
|
@ -1,9 +1,7 @@
|
|||
Makefile
|
||||
proto.make
|
||||
READ_ME
|
||||
addend.n
|
||||
app.codes.nr
|
||||
app.exam.nr
|
||||
app.int.nr
|
||||
assem.nr
|
||||
cont.nr
|
||||
descr.nr
|
||||
|
|
|
@ -3,7 +3,4 @@ DESCRIPTION OF A MACHINE ARCHITECTURE FOR USE WITH BLOCK STRUCTURED LANGUAGES
|
|||
|
||||
The file em.i (text of the defining interpreter) was hand-edited from int/em.p
|
||||
|
||||
To print, set NROFF and TBL in the Makefile and call make.
|
||||
It uses the kun macro package which is also distributed.
|
||||
|
||||
The directory int contains the interpreter.
|
||||
|
|
|
@ -1,4 +1,4 @@
|
|||
.BP
|
||||
.bp
|
||||
.AP "EM CODE TABLES"
|
||||
The following table is used by the assembler for EM machine
|
||||
language.
|
||||
|
@ -10,65 +10,66 @@ Each line describes a range of interpreter opcodes by
|
|||
specifying for which instruction the range is used, the type of the
|
||||
opcodes (mini, shortie, etc..) and range for the instruction
|
||||
argument.
|
||||
.A
|
||||
.QQ
|
||||
The first field on each line gives the EM instruction mnemonic,
|
||||
the second field gives some flags.
|
||||
If the opcodes are minis or shorties the third field specifies
|
||||
how many minis/shorties are used.
|
||||
The last field gives the number of the (first) interpreter
|
||||
opcode.
|
||||
.N 1
|
||||
.LP
|
||||
Flags :
|
||||
.IS 3
|
||||
.N 1
|
||||
.IP ""
|
||||
Opcode type, only one of the following may be specified.
|
||||
.PS - 5 " "
|
||||
.PT \-
|
||||
.RS
|
||||
.IP \-
|
||||
opcode without argument
|
||||
.PT m
|
||||
.IP m
|
||||
mini
|
||||
.PT s
|
||||
.IP s
|
||||
shortie
|
||||
.PT 2
|
||||
.IP 2
|
||||
opcode with 2-byte signed argument
|
||||
.PT 4
|
||||
.IP 4
|
||||
opcode with 4-byte signed argument
|
||||
.PT 8
|
||||
.IP 8
|
||||
opcode with 8-byte signed argument
|
||||
.PT u
|
||||
.IP u
|
||||
opcode with 2-byte unsigned argument
|
||||
.PE
|
||||
.RE
|
||||
.IP ""
|
||||
Secondary (escaped) opcodes.
|
||||
.PS - 5 " "
|
||||
.PT e
|
||||
.RS
|
||||
.IP e
|
||||
The opcode thus marked is in the secondary opcode group instead
|
||||
of the primary
|
||||
.PE
|
||||
.RE
|
||||
.IP ""
|
||||
restrictions on arguments
|
||||
.PS - 5 " "
|
||||
.PT N
|
||||
.RS
|
||||
.IP N
|
||||
Negative arguments only
|
||||
.PT P
|
||||
.IP P
|
||||
Positive and zero arguments only
|
||||
.PE
|
||||
.RE
|
||||
.IP ""
|
||||
mapping of arguments
|
||||
.PS - 5 " "
|
||||
.PT w
|
||||
.RS
|
||||
.IP w
|
||||
argument must be divisible by the wordsize and is divided by the
|
||||
wordsize before use as opcode argument.
|
||||
.PT o
|
||||
.IP o
|
||||
argument ( possibly after division ) must be >= 1 and is
|
||||
decremented before use as opcode argument
|
||||
.PE
|
||||
.IE
|
||||
.RE
|
||||
.LP
|
||||
If the opcode type is 2,4 or 8 the resulting argument is used as
|
||||
opcode argument (least significant byte first).
|
||||
.N
|
||||
If the opcode type is mini, the argument is added
|
||||
to the first opcode \- if in range \- .
|
||||
If the argument is negative, the absolute value minus one is
|
||||
used in the algorithm above.
|
||||
.N
|
||||
.br
|
||||
For shorties with positive arguments the first opcode is used
|
||||
for arguments in the range 0..255, the second for the range
|
||||
256..511, etc..
|
||||
|
@ -78,30 +79,32 @@ for arguments in the range \-1..\-256, the second for the range
|
|||
The byte following the opcode contains the least significant
|
||||
byte of the argument.
|
||||
First some examples of these specifications.
|
||||
.PS - 5
|
||||
.PT "aar mwPo 1 34"
|
||||
.IP "aar mwPo 1 34"
|
||||
.br
|
||||
Indicates that opcode 34 is used as a mini for Positive
|
||||
instruction arguments only.
|
||||
The w and o indicate division and decrementing of the
|
||||
instruction argument.
|
||||
Because the resulting argument must be zero ( only opcode 34 may be used
|
||||
), this mini can only be used for instruction argument 2.
|
||||
Because the resulting argument must be zero ( only opcode 34 may be used),
|
||||
this mini can only be used for instruction argument 2.
|
||||
Conclusion: opcode 34 is for "AAR 2".
|
||||
.PT "adp sP 1 41"
|
||||
.IP "adp sP 1 41"
|
||||
.br
|
||||
Opcode 41 is used as shortie for ADP with arguments in the range
|
||||
0..255.
|
||||
.PT "bra sN 2 60"
|
||||
.IP "bra sN 2 60"
|
||||
.br
|
||||
Opcode 60 is used as shortie for BRA with arguments \-1..\-256,
|
||||
61 is used for arguments \-257..\-512.
|
||||
.PT "zer e\- 145"
|
||||
.IP "zer e\- 145"
|
||||
.br
|
||||
Escaped opcode 145 is used for ZER.
|
||||
.PE
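The flag rules and the four examples above can be checked with a small sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the wordsize defaults to 2 (the value implied by the "AAR 2" conclusion), the byte following a shortie opcode is taken to be the two's complement low byte of the mapped argument, and escaped (e) and unsigned (u) opcodes are left out because their encoding is not shown in this excerpt.

```python
def map_argument(arg, flags, wordsize):
    """Apply the N/P restrictions and the w/o mappings to an instruction argument."""
    if "N" in flags and arg >= 0:
        raise ValueError("negative arguments only")
    if "P" in flags and arg < 0:
        raise ValueError("positive and zero arguments only")
    if "w" in flags:
        if arg % wordsize:
            raise ValueError("argument must be divisible by the wordsize")
        arg //= wordsize
    if "o" in flags:
        if arg < 1:
            raise ValueError("argument must be >= 1")
        arg -= 1
    return arg


def encode(flags, count, first_opcode, arg, wordsize=2):
    """Return the byte list for one table line, e.g. ("mwPo", 1, 34)."""
    arg = map_argument(arg, flags, wordsize)
    # Negative arguments use the absolute value minus one to select the opcode.
    index = arg if arg >= 0 else -arg - 1
    if "m" in flags:                      # mini: argument folded into the opcode
        if index >= count:
            raise ValueError("argument out of range for this mini")
        return [first_opcode + index]
    if "s" in flags:                      # shortie: opcode selects the 256-block
        if index // 256 >= count:
            raise ValueError("argument out of range for this shortie")
        return [first_opcode + index // 256, arg & 0xFF]
    for width in (2, 4, 8):               # full argument, least significant byte first
        if str(width) in flags:
            return [first_opcode] + list(arg.to_bytes(width, "little", signed=True))
    raise ValueError("opcode type not covered by this sketch")
```

With these assumptions, `encode("mwPo", 1, 34, 2)` yields the single opcode 34, and any other argument is rejected for that table line, which matches the conclusion that opcode 34 is for "AAR 2".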
|
||||
.LP
|
||||
The interpreter opcode table:
|
||||
.N 1
|
||||
.IS 3
|
||||
.DS
|
||||
.so itables
|
||||
.IE
|
||||
.P
|
||||
.DE
|
||||
.PP
|
||||
The table above results in the following dispatch tables.
|
||||
Dispatch tables are used by interpreters to jump to the
|
||||
routines implementing the EM instructions, indexed by the next opcode.
|
||||
|
@ -110,60 +113,41 @@ of eight consecutive opcodes, preceded by the first opcode number
|
|||
on that line.
|
||||
Routine names consist of an EM mnemonic followed by a suffix.
|
||||
The suffices show the encoding used for each opcode.
|
||||
.N
|
||||
.LP
|
||||
The following suffices exist:
|
||||
.N 1
|
||||
.VS 1 0
|
||||
.IS 4
|
||||
.PS - 11
|
||||
.PT .z
|
||||
no arguments
|
||||
.PT .l
|
||||
16-bit argument
|
||||
.PT .L
|
||||
32-bit argument
|
||||
.PT .u
|
||||
16-bit unsigned argument
|
||||
.PT .lw
|
||||
16-bit argument divided by the wordsize
|
||||
.PT .Lw
|
||||
32-bit argument divided by the wordsize
|
||||
.PT .p
|
||||
positive 16-bit argument
|
||||
.PT .P
|
||||
positive 32-bit argument
|
||||
.PT .pw
|
||||
positive 16-bit argument divided by the wordsize
|
||||
.PT .Pw
|
||||
positive 32-bit argument divided by the wordsize
|
||||
.PT .n
|
||||
negative 16-bit argument
|
||||
.PT .N
|
||||
negative 32-bit argument
|
||||
.PT .nw
|
||||
negative 16-bit argument divided by the wordsize
|
||||
.PT .Nw
|
||||
negative 32-bit argument divided by the wordsize
|
||||
.PT .s<num>
|
||||
shortie with <num> as high order argument byte
|
||||
.PT .w<num>
|
||||
shortie with argument divided by the wordsize
|
||||
.PT .<num>
|
||||
mini with <num> as argument
|
||||
.PT .<num>W
|
||||
mini with <num>*wordsize as argument
|
||||
.PE 1
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
.z:no arguments
|
||||
.l:16-bit argument
|
||||
.L:32-bit argument
|
||||
.u:16-bit unsigned argument
|
||||
.lw:16-bit argument divided by the wordsize
|
||||
.Lw:32-bit argument divided by the wordsize
|
||||
.p:positive 16-bit argument
|
||||
.P:positive 32-bit argument
|
||||
.pw:positive 16-bit argument divided by the wordsize
|
||||
.Pw:positive 32-bit argument divided by the wordsize
|
||||
.n:negative 16-bit argument
|
||||
.N:negative 32-bit argument
|
||||
.nw:negative 16-bit argument divided by the wordsize
|
||||
.Nw:negative 32-bit argument divided by the wordsize
|
||||
.s<num>:shortie with <num> as high order argument byte
|
||||
.w<num>:shortie with argument divided by the wordsize
|
||||
.<num>:mini with <num> as argument
|
||||
.<num>W:mini with <num>*wordsize as argument
|
||||
.TE
|
||||
.LP
|
||||
<num> is a possibly negative integer.
|
||||
.VS
|
||||
.IE
|
||||
.LP
|
||||
The dispatch table for the 256 primary opcodes:
|
||||
.N 1
|
||||
.sp 1
|
||||
.so dispat1
|
||||
.N 2
|
||||
.sp 2
|
||||
The list of secondary opcodes (escape1):
|
||||
.N 1
|
||||
.sp 1
|
||||
.so dispat2
|
||||
.N 2
|
||||
.sp 2
|
||||
Finally, the list of opcodes with four byte arguments (escape2).
|
||||
.N 1
|
||||
.sp 1
|
||||
.so dispat3
|
||||
|
|
|
@ -1,7 +1,7 @@
|
|||
.BP
|
||||
.bp
|
||||
.AP "AN EXAMPLE PROGRAM"
|
||||
.A 1 0
|
||||
.NA
|
||||
.PP
|
||||
.na
|
||||
.ta 4n 8n 12n 16n 20n
|
||||
.nf
|
||||
1 program example(output);
|
||||
|
@ -45,12 +45,12 @@
|
|||
39 test(r)
|
||||
40 end.
|
||||
.fi
|
||||
.AD
|
||||
.BP
|
||||
.ad
|
||||
.bp
|
||||
The EM code as produced by the Pascal-VU compiler is given below. Comments
|
||||
have been added manually. Note that this code has already been optimized.
|
||||
.A 1 0
|
||||
.NA
|
||||
.LP
|
||||
.na
|
||||
.nf
|
||||
.ta 1n 24n
|
||||
mes 2,2,2 ; wordsize 2, pointersize 2
|
||||
|
@ -231,14 +231,13 @@ have been added manually. Note that this code has already been optimized.
|
|||
end 0
|
||||
mes 5 ; reals were used
|
||||
.fi
|
||||
.AD
|
||||
.A 1 0
|
||||
.ad
|
||||
.PP
|
||||
The compact code corresponding to the above program is listed below.
|
||||
Read it horizontally, line by line, not column by column.
|
||||
Each number represents a byte of compact code, printed in decimal.
|
||||
The first two bytes form the magic word.
|
||||
.N 1
|
||||
.IS 3
|
||||
.LP
|
||||
.Dr 33
|
||||
173 0 159 122 122 122 255 242 1 161 250 124 116 46 112 0
|
||||
255 156 245 40 2 245 0 128 120 155 249 123 115 117 109 160
|
||||
|
@ -274,4 +273,3 @@ The first two bytes form the magic word.
|
|||
116 8 122 69 120 20 249 124 95 104 108 116 8 122 152 120
|
||||
159 124 160 255 159 125 255
|
||||
.De
|
||||
.IE
|
||||
|
|
271
doc/em/assem.nr
|
@ -1,6 +1,6 @@
|
|||
.BP
|
||||
.SN 11
|
||||
.S1 "EM ASSEMBLY LANGUAGE"
|
||||
.bp
|
||||
.P1 "EM ASSEMBLY LANGUAGE"
|
||||
.PP
|
||||
We use two representations for assembly language programs,
|
||||
one is in ASCII and the other is the compact assembly language.
|
||||
The latter needs less space than the first for the same program
|
||||
|
@ -16,7 +16,8 @@ The last part lists the EM instructions with the type of
|
|||
arguments allowed and an indication of the function.
|
||||
Appendix A gives a detailed description of the effect of all
|
||||
instructions in the form of a Pascal program.
|
||||
.S2 "ASCII assembly language"
|
||||
.P2 "ASCII assembly language"
|
||||
.PP
|
||||
An assembly language program consists of a series of lines, each
|
||||
line may be blank, contain one (pseudo)instruction or contain one
|
||||
label.
|
||||
|
@ -25,13 +26,13 @@ Upper case is used in this
|
|||
document merely to distinguish keywords from the surrounding prose.
|
||||
Comment is allowed at the end of each line and starts with a semicolon ";".
|
||||
This kind of comment does not exist in the compact form.
|
||||
.A
|
||||
.QQ
|
||||
Labels must be placed all by themselves on a line and start in
|
||||
column 1.
|
||||
There are two kinds of labels, instruction and data labels.
|
||||
Instruction labels are unsigned positive integers.
|
||||
The scope of an instruction label is its procedure.
|
||||
.A
|
||||
.QQ
|
||||
The pseudoinstructions CON, ROM and BSS may be preceded by a
|
||||
line containing a
|
||||
1\-8 character data label, the first character of which is a
|
||||
|
@ -46,13 +47,13 @@ These labels are considered as a special case and handled
|
|||
more efficiently in compact assembly language (see below).
|
||||
Note that a data label on its own or two consecutive labels are not
|
||||
allowed.
|
||||
.P
|
||||
.PP
|
||||
Each statement may contain an instruction mnemonic or pseudoinstruction.
|
||||
These must begin in column 2 or later (not column 1) and must be followed
|
||||
by a space, tab, semicolon or LF.
|
||||
Everything on the line following a semicolon is
|
||||
taken as a comment.
|
||||
.P
|
||||
.PP
|
||||
Each input file contains one module.
|
||||
A module may contain many procedures,
|
||||
which may be nested.
|
||||
|
@ -62,14 +63,15 @@ collection of instructions and pseudoinstructions and finally an END
|
|||
statement.
|
||||
Pseudoinstructions are also allowed between procedures.
|
||||
They do not belong to a specific procedure.
|
||||
.P
|
||||
.PP
|
||||
All constants in EM are interpreted in the decimal base.
|
||||
The ASCII assembly language accepts constant expressions
|
||||
wherever constants are allowed.
|
||||
The operators recognized are: +, \-, *, % and / with the usual
|
||||
precedence order.
|
||||
Use of the parentheses ( and ) to alter the precedence order is allowed.
|
||||
.S3 "Instruction arguments"
|
||||
.P3 "Instruction arguments"
|
||||
.PP
|
||||
Unlike many other assembly languages, the EM assembly
|
||||
language requires all arguments of normal and pseudoinstructions
|
||||
to be either a constant or an identifier, but not a combination
|
||||
|
@ -87,7 +89,7 @@ It is not allowed to add or subtract from instruction labels or procedure
|
|||
identifiers,
|
||||
which certainly is not a severe restriction and greatly aids
|
||||
optimization.
|
||||
.P
|
||||
.PP
|
||||
Instruction arguments can be constants,
|
||||
data labels, data labels offsetted by a constant, instruction
|
||||
labels and procedure identifiers.
|
||||
|
@ -98,7 +100,7 @@ that fit in a word.
|
|||
Arguments used as offsets to pointers should fit in a
|
||||
pointer-sized integer.
|
||||
Finally, arguments to LDC should fit in a double-word integer.
|
||||
.P
|
||||
.PP
|
||||
Several instructions have two possible forms:
|
||||
with an explicit argument and with an implicit argument on top of the stack.
|
||||
The size of the implicit argument is the wordsize.
|
||||
|
@ -109,7 +111,7 @@ integers on top of the stack are to be compared.
|
|||
on top of the stack that specifies the size of the integers to
|
||||
be compared.
|
||||
Thus the following two sequences are equivalent:
|
||||
.N 1
|
||||
.KS
|
||||
.TS
|
||||
center, tab(:) ;
|
||||
l r 30 l r.
|
||||
|
@ -118,16 +120,18 @@ LDL:\-14:LDL:\-14
|
|||
::LOC:4
|
||||
CMI:4:CMI:
|
||||
ZEQ:*1:ZEQ:*1
|
||||
.TE 1
|
||||
.TE
|
||||
.KE
|
||||
Section 11.1.6 shows the arguments allowed for each instruction.
|
||||
.S3 "Pseudoinstruction arguments"
|
||||
.P3 "Pseudoinstruction arguments"
|
||||
.PP
|
||||
Pseudoinstruction arguments can be divided in two classes:
|
||||
Initializers and others.
|
||||
The following initializers are allowed: signed integer constants,
|
||||
unsigned integer constants, floating-point constants, strings,
|
||||
data labels, data labels offsetted by a constant, instruction
|
||||
labels and procedure identifiers.
|
||||
.P
|
||||
.PP
|
||||
Constant initializers in BSS, HOL, CON and ROM pseudoinstructions
|
||||
can be followed by a letter I, U or F.
|
||||
This indicator
|
||||
|
@ -142,10 +146,9 @@ As in instruction arguments, initializers include expressions of the form:
|
|||
\&"LABEL+offset" and "LABEL\-offset".
|
||||
The offset must be an unsigned decimal constant.
|
||||
The 'IUF' indicators cannot be used in the offsets.
|
||||
.P
|
||||
.PP
|
||||
Data labels are referred to by their name.
|
||||
.P
|
||||
|
||||
.PP
|
||||
Strings are surrounded by double quotes (").
|
||||
Semicolon's in string do not indicate the start of comment.
|
||||
In the ASCII representation the escape character \e (backslash)
|
||||
|
@ -153,7 +156,6 @@ alters the meaning of subsequent character(s).
|
|||
This feature allows inclusion of zeroes, graphic characters and
|
||||
the double quote in the string.
|
||||
The following escape sequences exist:
|
||||
.DS
|
||||
.TS
|
||||
center, tab(:);
|
||||
l l l.
|
||||
|
@ -166,7 +168,6 @@ backslash:\e:\e\e
|
|||
double quote:":\e"
|
||||
bit pattern:\fBddd\fP:\e\fBddd\fP
|
||||
.TE
|
||||
.DE
|
||||
The escape \fB\eddd\fP consists of the backslash followed by 1,
|
||||
2, or 3 octal digits specifying the value of
|
||||
the desired character.
|
||||
|
@ -176,17 +177,18 @@ the backslash is ignored.
|
|||
Example: CON "hello\e012\e0".
|
||||
Each string element initializes a single byte.
|
||||
The ASCII character set is used to map characters onto values.
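A rough sketch of how the string body in the example could be turned into bytes. In the troff source `\e` prints as a backslash, so the example reads `CON "hello\012\0"`. The function name is hypothetical, and only the escapes visible in this excerpt are handled (octal digits, backslash, double quote); the remaining rows of the escape table are elided here, so other escapes are rejected rather than guessed.

```python
def decode_string(body):
    """Convert the text between the double quotes into the bytes it initializes."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch))           # ASCII maps characters onto values
            i += 1
            continue
        i += 1                            # skip the backslash
        if i >= len(body):
            raise ValueError("dangling backslash")
        j = i
        while j < len(body) and j - i < 3 and body[j] in "01234567":
            j += 1                        # collect 1, 2 or 3 octal digits
        if j > i:
            out.append(int(body[i:j], 8))
            i = j
        elif body[i] in '\\"':
            out.append(ord(body[i]))      # escaped backslash or double quote
            i += 1
        else:
            raise ValueError("escape not shown in this excerpt")
    return bytes(out)
```

Applied to the example, the seven resulting bytes are the five letters of "hello", a linefeed (octal 012) and a zero byte, one byte per string element.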
|
||||
.P
|
||||
.PP
|
||||
Instruction labels are referred to as *1, *2, etc. in both branch
|
||||
instructions and as initializers.
|
||||
.P
|
||||
.PP
|
||||
The notation $procname means the identifier for the procedure
|
||||
with the specified name.
|
||||
This identifier has the size of a pointer.
|
||||
.S3 Notation
|
||||
.P3 Notation
|
||||
.PP
|
||||
First, the notation used for the arguments, classes of
|
||||
instructions and pseudoinstructions.
|
||||
.IS 2
|
||||
.DS
|
||||
.TS
|
||||
tab(:);
|
||||
l l l.
|
||||
|
@ -204,9 +206,10 @@ l l l.
|
|||
<...>+:\&=:one or more of <...>
|
||||
[...]:\&=:optional ...
|
||||
.TE
|
||||
.IE
|
||||
.S3 "Pseudoinstructions"
|
||||
.S4 "Storage declaration"
|
||||
.DE
|
||||
.P3 "Pseudoinstructions"
|
||||
.P4 "Storage declaration"
|
||||
.PP
|
||||
Initialized global data is allocated by the pseudoinstruction CON,
|
||||
which needs at least one argument.
|
||||
Each argument is used to allocate and initialize a number of
|
||||
|
@ -215,7 +218,7 @@ The number of bytes to be allocated and the alignment depend on the type
|
|||
of the argument.
|
||||
For each argument, an integral number of words,
|
||||
determined by the argument type, is allocated and initialized.
|
||||
.P
|
||||
.PP
|
||||
The pseudoinstruction ROM is the same as CON,
|
||||
except that it guarantees that the initialized words
|
||||
will not change during the execution of the program.
|
||||
|
@ -223,7 +226,7 @@ This information allows optimizers to do
|
|||
certain calculations such as array indexing and
|
||||
subrange checking at compile time instead
|
||||
of at run time.
|
||||
.P
|
||||
.PP
|
||||
The pseudoinstruction BSS allocates
|
||||
uninitialized global data or large blocks of data initialized
|
||||
by the same value.
|
||||
|
@ -239,14 +242,14 @@ the second byte by 1 etc. in assembly language.
|
|||
The assembler/loader adds the base address of
|
||||
the HOL block to these numbers to obtain the
|
||||
absolute address in the machine language.
|
||||
.P
|
||||
.PP
|
||||
The scope of a HOL block starts at the HOL pseudo and
|
||||
ends at the next HOL pseudo or at the end of a module
|
||||
whatever comes first.
|
||||
Each instruction falls in the scope of at most one
|
||||
HOL block, the current HOL block.
|
||||
It is not allowed to have more than one HOL block per procedure.
|
||||
.P
|
||||
.PP
|
||||
The alignment restrictions are enforced by the
|
||||
pseudoinstructions.
|
||||
All initializers are aligned on a multiple of their size or the wordsize
|
||||
|
@ -257,52 +260,51 @@ Switching to another type of fragment or placing a label forces
|
|||
word-alignment.
|
||||
There are three types of fragments in global data space: CON, ROM and
|
||||
BSS/HOL.
|
||||
.N 1
|
||||
.IS 2
|
||||
.PS - 4
|
||||
.PT "BSS <cst1>,<val>,<cst2>"
|
||||
.IP "BSS <cst1>,<val>,<cst2>"
|
||||
.br
|
||||
Reserve <cst1> bytes.
|
||||
<val> is the value used to initialize the area.
|
||||
<cst1> must be a multiple of the size of <val>.
|
||||
<cst2> is 0 if the initialization is not strictly necessary,
|
||||
1 if it is.
|
||||
.PT "HOL <cst1>,<val>,<cst2>"
|
||||
.IP "HOL <cst1>,<val>,<cst2>"
|
||||
.br
|
||||
Idem, but all following absolute global data references will
|
||||
refer to this block.
|
||||
Only one HOL is allowed per procedure,
|
||||
it has to be placed before the first instruction.
|
||||
.PT "CON <val>+"
|
||||
.IP "CON <val>+"
|
||||
.br
|
||||
Assemble global data words initialized with the <val> constants.
|
||||
.PT "ROM <val>+"
|
||||
.IP "ROM <val>+"
|
||||
.br
|
||||
Idem, but the initialized data will never be changed by the program.
|
||||
.PE
|
||||
.IE
|
||||
.S4 "Partitioning"
|
||||
.P4 "Partitioning"
|
||||
.PP
|
||||
Two pseudoinstructions partition the input into procedures:
|
||||
.IS 2
|
||||
.PS - 4
|
||||
.PT "PRO <pro>[,<cst>]"
|
||||
.IP "PRO <pro>[,<cst>]"
|
||||
.br
|
||||
Start of procedure.
|
||||
<pro> is the procedure name.
|
||||
<cst> is the number of bytes for locals.
|
||||
The number of bytes for locals must be specified in the PRO or
|
||||
END pseudoinstruction.
|
||||
When specified in both, they must be identical.
|
||||
.PT "END [<cst>]"
|
||||
.IP "END [<cst>]"
|
||||
.br
|
||||
End of Procedure.
|
||||
<cst> is the number of bytes for locals.
|
||||
The number of bytes for locals must be specified in either the PRO or
|
||||
END pseudoinstruction or both.
|
||||
.PE
|
||||
.IE
|
||||
.S4 "Visibility"
|
||||
.P4 "Visibility"
|
||||
.PP
|
||||
Names of data and procedures in an EM module can either be
|
||||
internal or external.
|
||||
External names are known outside the module and are used to link
|
||||
several pieces of a program.
|
||||
Internal names are not known outside the modules they are used in.
|
||||
Other modules will not 'see' an internal name.
|
||||
.A
|
||||
.QQ
|
||||
To reduce the number of passes needed,
|
||||
it must be known at the first occurrence whether
|
||||
a name is internal or external.
|
||||
|
@ -312,47 +314,51 @@ If the first occurrence of a name is a reference,
|
|||
the name is considered to be external.
|
||||
If the first occurrence is in one of the following pseudoinstructions,
|
||||
the effect of the pseudo has precedence.
|
||||
.IS 2
|
||||
.PS - 4
|
||||
.PT "EXA <dlb>"
|
||||
.IP "EXA <dlb>"
|
||||
.br
|
||||
External name.
|
||||
<dlb> is known, possibly defined, outside this module.
|
||||
Note that <dlb> may be defined in the same module.
|
||||
.PT "EXP <pro>"
|
||||
.IP "EXP <pro>"
|
||||
.br
|
||||
External procedure identifier.
|
||||
Note that <pro> may be defined in the same module.
|
||||
.PT "INA <dlb>"
|
||||
.IP "INA <dlb>"
|
||||
.br
|
||||
Internal name.
|
||||
<dlb> is internal to this module and must be defined in this module.
|
||||
.PT "INP <pro>"
|
||||
.IP "INP <pro>"
|
||||
.br
|
||||
Internal procedure.
|
||||
<pro> is internal to this module and must be defined in this module.
|
||||
.PE
|
||||
.IE
|
||||
.S4 "Miscellaneous"
|
||||
.P4 "Miscellaneous"
|
||||
.PP
|
||||
Two other pseudoinstructions provide miscellaneous features:
|
||||
.IS 2
|
||||
.PS - 4
|
||||
.PT "EXC <cst1>,<cst2>"
|
||||
.IP "EXC <cst1>,<cst2>"
|
||||
.br
|
||||
Two blocks of instructions preceding this one are
|
||||
interchanged before being processed.
|
||||
<cst1> gives the number of lines of the first block.
|
||||
<cst2> gives the number of lines of the second one.
|
||||
Blank and pure comment lines do not count.
|
||||
This instruction is obsolete. Its use is strongly discouraged.
|
||||
.PT "MES <cst>[,<par>]*"
|
||||
.IP "MES <cst>[,<par>]*"
|
||||
.br
|
||||
A special type of comment.
|
||||
Used by compilers to communicate with the
|
||||
optimizer, assembler, etc. as follows:
|
||||
.VS 1 0
|
||||
.PS - 4
|
||||
.PT "MES 0"
|
||||
.RS
|
||||
.IP "MES 0"
|
||||
.br
|
||||
An error has occurred, stop further processing.
|
||||
.PT "MES 1"
|
||||
.IP "MES 1"
|
||||
.br
|
||||
Suppress optimization.
|
||||
.PT "MES 2,<cst1>,<cst2>"
|
||||
.IP "MES 2,<cst1>,<cst2>"
|
||||
.br
|
||||
Use wordsize <cst1> and pointer size <cst2>.
|
||||
.PT "MES 3,<cst1>,<cst2>,<cst3>,<cst4>"
|
||||
.IP "MES 3,<cst1>,<cst2>,<cst3>,<cst4>"
|
||||
.br
|
||||
Indicates that a local variable is never referenced indirectly.
|
||||
Used to indicate that a register may be used for a specific
|
||||
variable.
|
||||
|
@ -361,51 +367,57 @@ and offset from LB if negative.
|
|||
<cst2> gives the size of the variable.
|
||||
<cst3> indicates the class of the variable.
|
||||
The following values are currently recognized:
|
||||
.PS
|
||||
.PT 0
|
||||
The variable can be used for anything.
|
||||
.PT 1
|
||||
The variable is used as a loopindex.
|
||||
.PT 2
|
||||
The variable is used as a pointer.
|
||||
.PT 3
|
||||
The variable is used as a floating point number.
|
||||
.PE 0
|
||||
.br
|
||||
0\0\0\0The variable can be used for anything.
|
||||
.br
|
||||
1\0\0\0The variable is used as a loopindex.
|
||||
.br
|
||||
2\0\0\0The variable is used as a pointer.
|
||||
.br
|
||||
3\0\0\0The variable is used as a floating point number.
|
||||
.br
|
||||
<cst4> gives the priority of the variable,
|
||||
higher numbers indicate better candidates.
|
||||
.PT "MES 4,<cst>,<str>"
|
||||
.IP "MES 4,<cst>,<str>"
|
||||
.br
|
||||
Number of source lines in file <str> (for profiler).
|
||||
.PT "MES 5"
|
||||
.IP "MES 5"
|
||||
.br
|
||||
Floating point used.
|
||||
.PT "MES 6,<val>*"
|
||||
.IP "MES 6,<val>*"
|
||||
.br
|
||||
Comment. Used to provide comments in compact assembly language.
|
||||
.PT "MES 7,....."
|
||||
.IP "MES 7,....."
|
||||
.br
|
||||
Reserved.
|
||||
.PT "MES 8,<pro>[,<dlb>]..."
|
||||
.IP "MES 8,<pro>[,<dlb>]..."
|
||||
.br
|
||||
Library module. Indicates that the module may only be loaded
|
||||
if it is useful, that is, if it can satisfy any unresolved
|
||||
references during the loading process.
|
||||
May not be preceded by any other pseudo, except MES's.
|
||||
.PT "MES 9,<cst>"
|
||||
.IP "MES 9,<cst>"
|
||||
.br
|
||||
Guarantees that no more than <cst> bytes of parameters are
|
||||
accessed, either directly or indirectly.
|
||||
.PT "MES 10,<cst>[,<par>]*
|
||||
.IP "MES 10,<cst>[,<par>]*
|
||||
.br
|
||||
This message number is reserved for the global optimizer.
|
||||
It inserts these messages in its output as hints to backends.
|
||||
<cst> indicates the type of hint.
|
||||
.PT "MES 11"
|
||||
.IP "MES 11"
|
||||
.br
|
||||
Procedures containing this message are possible destinations of
|
||||
non-local goto's with the GTO instruction.
|
||||
Some backends keep locals in registers,
|
||||
the locals in this procedure should not be kept in registers and
|
||||
all registers containing locals of other procedures should be
|
||||
saved upon entry to this procedure.
|
||||
.PE 1
|
||||
.VS
|
||||
.RE
|
||||
.IP ""
|
||||
Each backend is free to skip irrelevant MES pseudos.
|
||||
.PE
|
||||
.IE
|
||||
.S2 "The Compact Assembly Language"
|
||||
.P2 "The Compact Assembly Language"
|
||||
.PP
|
||||
The assembler accepts input in a highly encoded form.
|
||||
This
|
||||
form is intended to reduce the amount of file transport between the
|
||||
|
@ -414,16 +426,14 @@ and back ends, and also reduces the amount of storage required for storing
|
|||
libraries.
|
||||
Libraries are stored as archived compact assembly language, not machine
|
||||
language.
|
||||
.P
|
||||
.PP
|
||||
When beginning to read the input, the assembler is in neutral state, and
|
||||
expects either a label or an instruction (including the pseudoinstructions).
|
||||
The meaning of the next byte(s) when in neutral state is as follows, where
|
||||
b1, b2
|
||||
etc. represent the succeeding bytes.
|
||||
.N 1
|
||||
.DS
|
||||
.TS
|
||||
tab(:) ;
|
||||
tab(:);
|
||||
rw17 4 l.
|
||||
0:Reserved for future use
|
||||
1\-129:Machine instructions, see Appendix A, alphabetical list
|
||||
|
@ -433,38 +443,31 @@ rw17 4 l.
|
|||
180\-239:Instruction labels 0 \- 59 (180 is local label 0 etc.)
|
||||
240\-244:See the Common Table below
|
||||
245\-255:Not used
|
||||
.TE 1
|
||||
.DE 0
|
||||
.TE
|
||||
After a label, the assembler is back in neutral state; it can immediately
|
||||
accept another label or an instruction in the next byte.
|
||||
No linefeeds are used to separate lines.
|
||||
.P
|
||||
.PP
|
||||
If an opcode expects no arguments,
|
||||
the assembler is back in neutral state after
|
||||
reading the one byte containing the instruction number.
|
||||
If it has one or
|
||||
more arguments (only pseudos have more than 1), the arguments follow directly,
|
||||
encoded as follows:
|
||||
.N 1
|
||||
.IS 2
|
||||
.TS
|
||||
tab(:);
|
||||
r l.
|
||||
0\-239:Offsets from \-120 to 119
|
||||
|
||||
240\-255:See the Common Table below
|
||||
.TE 1
|
||||
.TE
|
||||
Absence of an optional argument is indicated by a special
|
||||
byte.
|
||||
.IE 2
|
||||
.NE 7
|
||||
.CS
|
||||
Common Table for Neutral State and Arguments
|
||||
.CE
|
||||
.TS
|
||||
tab(:);
|
||||
c s s s
|
||||
c c s c
|
||||
l4 l l4 l.
|
||||
Common Table for Neutral State and Arguments
|
||||
class:bytes:description
|
||||
|
||||
<ilb>:240:b1:Instruction label b1 (Not used for branches)
|
||||
|
@ -486,7 +489,7 @@ class:bytes:description
|
|||
<end>:255::Delimiter for argument lists or
|
||||
:::indicates absence of optional argument
|
||||
.TE 1
|
||||
.P
|
||||
.PP
|
||||
The bytes specifying the value of a 16, 32 or 64 bit constant
|
||||
are presented in two's complement notation, with the least
|
||||
significant byte first. For example: the value of a 32 bit
|
||||
|
@ -494,25 +497,22 @@ constant is ((s4*256+b3)*256+b2)*256+b1, where s4 is b4\-256 if
|
|||
b4 is greater than 128 else s4 takes the value of b4.
|
||||
A <string> consists of a <cst> immediately followed by
|
||||
a sequence of bytes with length <cst>.
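The constant formula above can be written out as a short sketch. The function name is hypothetical. One assumption is made explicit: the top byte is treated as negative when it is 128 or more, which is what two's complement requires; the wording "greater than 128" in the text is read in that sense here.

```python
def decode_constant(data):
    """Decode a 16, 32 or 64 bit constant stored least significant byte first."""
    if len(data) not in (2, 4, 8):
        raise ValueError("expected 2, 4 or 8 bytes")
    value = 0
    for byte in reversed(data):           # ((b4*256+b3)*256+b2)*256+b1
        value = value * 256 + byte
    if data[-1] >= 128:                   # sign bit set: same as using s4 = b4-256
        value -= 256 ** len(data)
    return value
```

For the byte pair 44 1 this gives 300, the same bytes that follow the type byte in the `LOC 300` example of the compact format.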
|
||||
.P
|
||||
.PP
|
||||
.ne 8
|
||||
The pseudoinstructions fall into several categories, depending on their
|
||||
arguments:
|
||||
.N 1
|
||||
.DS
|
||||
Group 1 \- EXC, BSS, HOL have a known number of arguments
|
||||
Group 2 \- EXA, EXP, INA, INP have a string as argument
|
||||
Group 3 \- CON, MES, ROM have a variable number of various things
|
||||
Group 4 \- END, PRO have a trailing optional argument.
|
||||
.DE 1
|
||||
Group 1 \- EXC, BSS, HOL have a known number of arguments
|
||||
Group 2 \- EXA, EXP, INA, INP have a string as argument
|
||||
Group 3 \- CON, MES, ROM have a variable number of various things
|
||||
Group 4 \- END, PRO have a trailing optional argument.
|
||||
.DE
|
||||
Groups 1 and 2
|
||||
use the encoding described above.
|
||||
Group 3 also uses the encoding listed above, with an <end> byte after the
|
||||
last argument to indicate the end of the list.
|
||||
Group 4 uses
|
||||
an <end> byte if the trailing argument is not present.
|
||||
.N 2
|
||||
.IS 2
|
||||
.TS
|
||||
tab(|);
|
||||
l s l
|
||||
|
@ -523,18 +523,17 @@ Example ASCII|Example compact
|
|||
|
||||
2||182
|
||||
1||181
|
||||
LOC|10|69 130
|
||||
LOC|\-10|69 110
|
||||
LOC|300|69 245 44 1
|
||||
BRA|*19|18 139
|
||||
\0LOC|10|69 130
|
||||
\0LOC|\-10|69 110
|
||||
\0LOC|300|69 245 44 1
|
||||
\0BRA|*19|18 139
|
||||
300||241 44 1
|
||||
.3||242 3
|
||||
CON|4,9,*2,$foo|151 124 129 240 2 249 123 102 111 111 255
|
||||
CON|.35|151 242 35 255
|
||||
.TE 0
|
||||
.IE 0
|
||||
.S2 "Assembly language instruction list"
|
||||
.P
|
||||
\0CON|4,9,*2,$foo|151 124 129 240 2 249 123 102 111 111 255
|
||||
\0CON|.35|151 242 35 255
|
||||
.TE
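The example rows in the table above can be reproduced with a small encoder sketch. Everything here is an illustration, not the assembler's code: the function names are hypothetical, and the type bytes 245 (16 bit constant), 241 (16 bit label number) and 249 (procedure name) are inferred from the example rows because the full Common Table is elided in this diff. The opcode numbers 69 (LOC), 18 (BRA) and 151 (CON) are taken from the same rows. Data labels and floating-point constants are not covered.

```python
END = 255                                 # delimiter for argument lists


def enc_cst(value):
    """Encode a constant argument: one byte for -120..119, else 16 bit."""
    if -120 <= value <= 119:
        return [value + 120]              # bytes 0-239 are offsets from -120
    if -32768 <= value <= 32767:
        return [245] + list(value.to_bytes(2, "little", signed=True))
    raise ValueError("wider constants are not covered by this sketch")


def enc_label_def(number):
    """Encode an instruction label met in neutral state."""
    if 0 <= number <= 59:
        return [180 + number]             # 180 is local label 0
    if number <= 255:
        return [240, number]
    return [241] + list(number.to_bytes(2, "little"))


def enc_initializer(arg):
    """Encode one CON argument: an int, '*N' (label) or '$name' (procedure)."""
    if isinstance(arg, int):
        return enc_cst(arg)
    if arg.startswith("*"):
        return [240, int(arg[1:])]        # <ilb>, not used for branches
    if arg.startswith("$"):
        name = arg[1:].encode("ascii")
        return [249] + enc_cst(len(name)) + list(name)
    raise ValueError("argument kind not covered by this sketch")


def enc_con(args):
    """Encode a CON pseudoinstruction with its trailing <end> byte."""
    out = [151]
    for arg in args:
        out += enc_initializer(arg)
    return out + [END]
```

Under these assumptions `enc_con([4, 9, "*2", "$foo"])` returns the byte sequence shown in the table, and `[18] + enc_cst(19)` gives the `BRA *19` row, since branch targets are encoded as plain constants.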
|
||||
.P2 "Assembly language instruction list"
|
||||
.PP
|
||||
For each instruction in the list the range of argument values
|
||||
in the assembly language is given.
|
||||
The column headed \fIassem\fP contains the mnemonics defined
|
||||
|
@ -558,7 +557,7 @@ are indicated by letters:
|
|||
.ds z \fBz\fP
|
||||
.ds o \fBo\fP
|
||||
.ds - \fB\-\fP
|
||||
.N 1
|
||||
.sp
|
||||
.TS
|
||||
tab(:);
|
||||
c s l l
|
||||
|
@ -579,8 +578,8 @@ l l 15 l l.
|
|||
\&\*b:ilb:>= 0:label number
|
||||
\&\*r:cst:0,1,2:register number
|
||||
\&\*-:::no argument
|
||||
.TE 1
|
||||
.P
|
||||
.TE
|
||||
.PP
|
||||
The * at the rationale for \*w indicates that the argument
|
||||
can either be given as argument or on top of the stack.
|
||||
If the argument is omitted, the argument is fetched from the
|
||||
|
@ -589,8 +588,7 @@ it is assumed to be a wordsized unsigned integer.
|
|||
Instructions that check for undefined integer or floating-point
|
||||
values and underflow or overflow
|
||||
are indicated below by (*).
|
||||
.N 1
|
||||
.VS 0 0
|
||||
.sp 1
|
||||
.DS
|
||||
.ta 12n
|
||||
GROUP 1 \- LOAD
|
||||
|
@ -687,7 +685,7 @@ GROUP 7 \- INCREMENT/DECREMENT/ZERO
|
|||
ZER \*w : Load \*w zero bytes
|
||||
.DE
|
||||
|
||||
.DS \" ???
|
||||
.DS
|
||||
GROUP 8 \- CONVERT (stack: source, source size, dest. size (top))
|
||||
|
||||
CII \*- : Convert integer to integer (*)
|
||||
|
@ -744,7 +742,7 @@ GROUP 12 \- COMPARE
|
|||
TGT \*- : True if greater, i.e. iff top of stack > 0
|
||||
.DE
|
||||
|
||||
.DS \" ???
|
||||
.DS
|
||||
GROUP 13 \- BRANCH
|
||||
|
||||
BRA \*b : Branch unconditionally to label \*b
|
||||
|
@ -801,5 +799,4 @@ GROUP 15 \- MISCELLANEOUS
|
|||
SIM \*- : Store 16 bit ignore mask
|
||||
STR \*r : Store register (0=LB, 1=SP, 2=HP)
|
||||
TRP \*- : Cause trap to occur (Error number on stack)
|
||||
.DE 0
|
||||
.VS
|
||||
.DE
|
||||
|
|
|
@ -1,6 +1,4 @@
|
|||
.MS T A 0
|
||||
.ME
|
||||
.BP
|
||||
.MS B A 0
|
||||
.ME
|
||||
.CT
|
||||
.de PT
|
||||
..
|
||||
.bp
|
||||
.Ct
|
||||
|
|
|
@ -1,69 +1,62 @@
|
|||
.SN 7
|
||||
.BP
|
||||
.S1 "DESCRIPTORS"
|
||||
.bp
|
||||
.P1 "DESCRIPTORS"
|
||||
.PP
|
||||
Several instructions use descriptors, notably the range check instruction,
|
||||
the array instructions, the goto instruction and the case jump instructions.
|
||||
Descriptors reside in data space.
|
||||
They may be constructed at run time, but
|
||||
more often they are fixed and allocated in ROM data.
|
||||
.P
|
||||
.PP
|
||||
All instructions using descriptors, except GTO, have as argument
|
||||
the size of the integers in the descriptor.
|
||||
All implementations have to allow integers of the size of a
|
||||
word in descriptors.
|
||||
All integers popped from the stack and used for indexing or comparing
|
||||
must have the same size as the integers in the descriptor.
|
||||
.S2 "Range check descriptors"
|
||||
.P2 "Range check descriptors"
|
||||
.PP
|
||||
Range check descriptors consist of two integers:
|
||||
.IS 2
|
||||
.PS 1 4 "" .
|
||||
.PT
|
||||
.IP 1.
|
||||
lower bound signed
|
||||
.PT
|
||||
.IP 2.
|
||||
upper bound signed
|
||||
.PE
|
||||
.IE
|
||||
.LP
|
||||
The range check instruction checks an integer on the stack against
|
||||
these bounds and causes a trap if the value is outside the interval.
|
||||
The value itself is neither changed nor removed from the stack.
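The range check described above can be sketched in Python. This is a minimal illustration, not the defining interpreter: the names `RangeTrap` and `range_check` are hypothetical, and the descriptor is passed as a plain tuple rather than fetched from data space.

```python
class RangeTrap(Exception):
    """Stands in for an EM trap; the name is hypothetical."""


def range_check(stack, descriptor):
    """Check the top of `stack` against a (lower, upper) descriptor."""
    lower, upper = descriptor      # two signed integers, lower bound first
    value = stack[-1]              # peek only: the value stays on the stack
    if value < lower or value > upper:
        raise RangeTrap(f"{value} outside [{lower}, {upper}]")
    return value
```

Peeking at `stack[-1]` instead of popping mirrors the rule that the checked value is neither changed nor removed.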
|
||||
.S2 "Array descriptors"
|
||||
.P2 "Array descriptors"
|
||||
.PP
|
||||
Each array descriptor describes a single dimension.
|
||||
For multi-dimensional arrays, several array instructions are
|
||||
needed to access a single element.
|
||||
Array descriptors contain the following three integers:
|
||||
.IS 2
|
||||
.PS 1 4 "" .
|
||||
.PT
|
||||
.IP 1.
|
||||
lower bound signed
|
||||
.PT
|
||||
.IP 2.
|
||||
upper bound \- lower bound unsigned
|
||||
.PT
|
||||
.IP 3.
|
||||
number of bytes per element unsigned
|
||||
.PE
|
||||
.IE
|
||||
.LP
|
||||
The array instructions LAR, SAR and AAR have the pointer to the start
|
||||
of the descriptor as operand on the stack.
|
||||
.sp
|
||||
.LP
|
||||
The element A[I] is fetched as follows:
|
||||
.IS 2
|
||||
.PS 1 4 "" .
|
||||
.PT
|
||||
.IP 1.
|
||||
Stack the address of A (e.g., using LAE or LAL)
|
||||
.PT
|
||||
.IP 2.
|
||||
Stack the value of I (n-byte integer)
|
||||
.PT
|
||||
.IP 3.
|
||||
Stack the pointer to the descriptor (e.g., using LAE)
|
||||
.PT
|
||||
.IP 4.
|
||||
LAR n (n is the size of the integers in the descriptor and I)
|
||||
.PE
|
||||
.IE
|
||||
.LP
|
||||
All array instructions first pop the address of the descriptor
|
||||
and the index.
|
||||
If the index is not within the bounds specified, a trap occurs.
|
||||
If ok, (I~\-~lower bound) is multiplied
|
||||
by the number of bytes per element (the third word). The result is added
|
||||
to the address of A and replaces A on the stack.
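The common part of LAR, SAR and AAR described above can be sketched in Python. This is an illustration under stated assumptions: `descriptors` is a hypothetical dictionary that maps a descriptor address to its three integers, and `ArrayTrap` is a made-up name for the trap.

```python
class ArrayTrap(Exception):
    """Stands in for the trap on an out-of-bounds index (hypothetical name)."""


def array_element_address(stack, descriptors):
    """Replace the address of A on `stack` by the address of A[I]."""
    desc_addr = stack.pop()                 # descriptor pointer is on top
    index = stack.pop()                     # then the index I
    lower, span, elem_size = descriptors[desc_addr]
    offset = index - lower
    if offset < 0 or offset > span:         # span is upper bound - lower bound
        raise ArrayTrap(f"index {index} out of bounds")
    stack[-1] += offset * elem_size         # element address replaces A
    return stack[-1]
```

With a descriptor `(1, 9, 4)` stored at address 100, the stack `[1000, 3, 100]` becomes `[1008]`, because (3 - 1) * 4 = 8 is added to the address of A.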
|
||||
.A
|
||||
.QQ
|
||||
At this point LAR, SAR and AAR diverge.
|
||||
AAR is finished. LAR pops the address and fetches the data
|
||||
item,
|
||||
|
@ -71,21 +64,19 @@ the size being specified by the descriptor.
|
|||
The usual restrictions for memory access must be obeyed.
|
||||
SAR pops the address and stores the
|
||||
data item now exposed.
|
||||
.S2 "Non-local goto descriptors"
|
||||
.P2 "Non-local goto descriptors"
|
||||
.PP
|
||||
The GTO instruction provides a way of returning directly to any
|
||||
active procedure invocation.
|
||||
The argument of the instruction is the address of a descriptor
|
||||
containing three pointers:
|
||||
.IS 2
|
||||
.PS 1 4 "" .
|
||||
.PT
|
||||
.IP 1.
|
||||
value of PC after the jump
|
||||
.PT
|
||||
.IP 2.
|
||||
value of SP after the jump
|
||||
.PT
|
||||
.IP 3.
|
||||
value of LB after the jump
|
||||
.PE
|
||||
.IE
|
||||
.LP
|
||||
GTO loads PC, SP and LB from the descriptor,
|
||||
thereby jumping to a procedure
|
||||
and removing zero or more frames from the stack.
|
||||
|
@ -94,7 +85,8 @@ dynamically enclosing procedure,
|
|||
because some EM implementations will need to backtrack through
|
||||
the dynamic chain and use the implementation dependent data
|
||||
in frames to restore registers etc.
|
||||
.S2 "Case descriptors"
|
||||
.P2 "Case descriptors"
|
||||
.PP
|
||||
The case jump instructions CSA and CSB both
|
||||
provide multiway branches selected by a case index.
|
||||
Both fetch two operands from the stack:
|
||||
|
@ -106,7 +98,7 @@ Therefore, the descriptors for CSA and CSB,
|
|||
as shown in figure 4, are different.
|
||||
All pointers in the table must be addresses of instructions in the
|
||||
procedure executing the case instruction.
|
||||
.P
|
||||
.PP
|
||||
CSA selects the new PC by indexing.
|
||||
If the index, a signed integer, is greater than or equal to
|
||||
the lower bound and less than or equal to the upper bound,
|
||||
|
@ -116,23 +108,22 @@ The table does not contain the value of the upper bound,
|
|||
but the value of upper-lower as an unsigned integer.
|
||||
The default instruction pointer is used when the index is out of bounds.
|
||||
If the resulting PC is 0, then trap.
|
||||
.P
|
||||
.PP
|
||||
CSB selects the new PC by searching.
|
||||
The table is searched for an entry with index value equal to the case index.
|
||||
That entry or, if none is found, the default entry contains the
|
||||
new PC.
|
||||
When the resulting PC is 0, a trap is performed.
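The two selection rules can be sketched in Python. This is a sketch only: the descriptor fields are passed as separate arguments because the exact memory layout is given by figure 4, and the names `csa`, `csb` and `CaseTrap` are hypothetical.

```python
class CaseTrap(Exception):
    """Stands in for the trap taken when the selected PC is 0."""


def csa(index, default_pc, lower, span, targets):
    """Select by indexing; `span` is upper bound minus lower bound."""
    offset = index - lower
    pc = targets[offset] if 0 <= offset <= span else default_pc
    if pc == 0:
        raise CaseTrap("CSA selected a zero PC")
    return pc


def csb(index, default_pc, entries):
    """Select by searching a list of (index value, target PC) pairs."""
    pc = default_pc
    for value, target in entries:
        if value == index:
            pc = target
            break
    if pc == 0:
        raise CaseTrap("CSB selected a zero PC")
    return pc
```

In both functions a zero PC traps regardless of whether it came from a table entry or from the default.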
|
||||
.P
|
||||
.PP
|
||||
The choice of which case instruction to use for
|
||||
each source language case statement
|
||||
is up to the front end.
|
||||
If the range of the index value is dense, i.e.
|
||||
.DS
|
||||
(highest value \- lowest value) / number of cases
|
||||
.DE 1
|
||||
.DE
|
||||
is less than some threshold, then CSA is the obvious choice.
|
||||
If the range is sparse, CSB is better.
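A front end could apply this density test roughly as follows. The function name and the default threshold of 3.0 are assumptions made for illustration; the text leaves the threshold to the front end.

```python
def choose_case_instruction(case_values, threshold=3.0):
    """Return "CSA" for a dense set of case values, "CSB" for a sparse one."""
    values = set(case_values)               # distinct case labels
    if not values:
        raise ValueError("a case statement needs at least one case")
    # (highest value - lowest value) / number of cases
    density = (max(values) - min(values)) / len(values)
    return "CSA" if density < threshold else "CSB"
```

For example, the labels 1, 2, 3, 4 give 3 / 4 = 0.75 and select CSA, while 1, 1000, 5000 give 4999 / 3 and select CSB.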
|
||||
.N 2
|
||||
.Dr 30
|
||||
|--------------------| |--------------------| high address
|
||||
| pointer for upb | | pointer n-1 |
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
.BP
|
||||
.SN 4
|
||||
.S1 "DATA ADDRESS SPACE"
|
||||
.bp
|
||||
.P1 "DATA ADDRESS SPACE"
|
||||
.PP
|
||||
The data address space is divided into three parts, called 'areas',
|
||||
each with its own addressing method:
|
||||
global data area,
|
||||
|
@ -9,24 +9,24 @@ and heap data area.
|
|||
These data areas must be part of the same
|
||||
address space because all data is accessed by
|
||||
the same type of pointers.
|
||||
.P
|
||||
.PP
|
||||
Space for global data is reserved using several pseudoinstructions in the
|
||||
assembly language, as described in
|
||||
the next paragraph and chapter 11.
|
||||
The size of the global data area is fixed per program.
|
||||
.A
|
||||
.QQ
|
||||
Global data is addressed absolutely in the machine language.
|
||||
Many instructions are available to address global data.
|
||||
They all have an absolute address as argument.
|
||||
Examples are LOE, LAE and STE.
|
||||
.P
|
||||
.PP
|
||||
Part of the global data area is initialized by the
|
||||
compiler, the
|
||||
rest is not initialized at all or is initialized
|
||||
with a value, typically \-32768 or 0.
|
||||
Part of the initialized global data may be made read-only
|
||||
if the implementation supports protection.
|
||||
.P
|
||||
.PP
|
||||
The local data area is used as a stack,
|
||||
which grows from high to low addresses
|
||||
and contains some data for each active procedure
|
||||
|
@ -44,14 +44,14 @@ Variables in other active procedures are addressed by following
|
|||
the chain of statically enclosing procedures using the LXL or LXA instruction.
|
||||
The variables in dynamically enclosing procedures can be
|
||||
addressed with the use of the DCH instruction.
|
||||
.A
|
||||
.QQ
|
||||
Many instructions have offsets to LB as argument,
|
||||
for instance LOL, LAL and STL.
|
||||
The arguments of these instructions range from \-1 to some
|
||||
(negative) minimum
|
||||
for the access of local storage and from 0 to some (positive)
|
||||
maximum for parameter access.
|
||||
.P
|
||||
.PP
|
||||
The procedure call instructions CAL and CAI each create a new frame
|
||||
on the stack.
|
||||
Each procedure has an assembly-time parameter specifying
|
||||
|
@ -62,7 +62,7 @@ Each procedure, therefore, starts with a stack with the local variables
|
|||
already allocated.
|
||||
The return instructions RET and RTT remove a frame.
|
||||
The actual parameters must be removed by the calling procedure.
|
||||
.P
|
||||
.PP
|
||||
RET may copy some words from the stack of
|
||||
the returning procedure to an unnamed 'function return area'.
|
||||
This area is available for 'READ-ONCE' access using the LFR instruction.
|
||||
|
@ -86,7 +86,7 @@ area is twice the pointer size,
|
|||
because we want to be able to handle 'procedure instance
|
||||
identifiers' which consist of a procedure identifier and the LB
|
||||
of a frame belonging to that procedure.
|
||||
.P
|
||||
.PP
|
||||
The heap data area grows upwards, to higher numbered
|
||||
addresses.
|
||||
It is initially empty.
|
||||
|
@ -96,7 +96,8 @@ The heap pointer may be manipulated
|
|||
by the LOR and STR instructions.
|
||||
The heap can only be addressed indirectly,
|
||||
by pointers derived from previous values of HP.
|
||||
.S2 "Global data area"
|
||||
.P2 "Global data area"
|
||||
.PP
|
||||
The initial size of the global data area is determined at assembly time.
|
||||
Global data is allocated by several
|
||||
pseudoinstructions in the EM assembly
|
||||
|
@ -109,7 +110,7 @@ under certain conditions, several blocks are allocated
|
|||
in a single fragment.
|
||||
This guarantees that the bytes of these blocks
|
||||
are consecutive.
|
||||
.P
|
||||
.PP
|
||||
Global data is addressed absolutely in binary
|
||||
machine language.
|
||||
Most compilers, however,
|
||||
|
@ -124,7 +125,7 @@ It is the task of the assembler/loader to
|
|||
translate these labels into absolute addresses.
|
||||
These labels may also be used
|
||||
in CON and ROM pseudoinstructions to initialize pointers.
|
||||
.P
|
||||
.PP
|
||||
The pseudoinstruction CON allocates initialized data.
|
||||
ROM acts like CON but indicates that the initialized data will
|
||||
not change during execution of the program.
|
||||
|
@ -134,7 +135,7 @@ data.
|
|||
The pseudoinstruction HOL is similar to BSS,
|
||||
but it alters the meaning of subsequent absolute addressing in
|
||||
the assembly language.
|
||||
.P
|
||||
.PP
|
||||
Another type of global data is a small block,
|
||||
called the ABS block, with an implementation defined size.
|
||||
Storage in this type of block can only be addressed
|
||||
|
@ -146,7 +147,7 @@ update this counter.
|
|||
A pointer at location 4 points to a string containing the
|
||||
current source file name.
|
||||
The instruction FIL can be used to update the pointer.
|
||||
.P
|
||||
.PP
|
||||
All numeric arguments of the instructions that address
|
||||
the global data area refer to locations in the
|
||||
ABS block unless
|
||||
|
@ -158,7 +159,7 @@ Thus LOE 0 loads the zeroth word of the most recent HOL, unless no HOL has
|
|||
appeared in the current file so
|
||||
far, in which case it loads the zeroth word of the
|
||||
ABS fragment.
|
||||
.P
|
||||
.PP
|
||||
The global data area is highly fragmented.
|
||||
The ABS block and each HOL and BSS block are separate fragments.
|
||||
The way fragments are formed from CON and ROM blocks is more complex.
|
||||
|
@ -169,12 +170,11 @@ allocated consecutively in a single fragment, unless
|
|||
these CON pseudos are separated in the assembly language program
|
||||
by a data label definition or one or more of the following pseudos:
|
||||
.DS
|
||||
|
||||
ROM, BSS, HOL and END
|
||||
|
||||
ROM, BSS, HOL and END
|
||||
.DE
|
||||
An analogous rule holds for ROM pseudos.
|
||||
.S2 "Local data area"
|
||||
.P2 "Local data area"
|
||||
.PP
|
||||
The local data area consists of a sequence of frames, one for
|
||||
each active procedure.
|
||||
Below the frame of the current procedure resides the
|
||||
|
@ -183,17 +183,15 @@ Frames are generated by procedure calls and are
|
|||
removed by procedure returns.
|
||||
A procedure frame consists of six 'zones':
|
||||
.DS
|
||||
|
||||
1. The return status block
|
||||
2. The local variables and compiler temporaries
|
||||
3. The register save block
|
||||
4. The dynamic local generators
|
||||
5. The operand stack.
|
||||
6. The parameters of a procedure one level deeper
|
||||
|
||||
1. The return status block
|
||||
2. The local variables and compiler temporaries
|
||||
3. The register save block
|
||||
4. The dynamic local generators
|
||||
5. The operand stack.
|
||||
6. The parameters of a procedure one level deeper
|
||||
.DE
|
||||
A sample frame is shown in Figure 1.
|
||||
.P
|
||||
.PP
|
||||
Before a procedure call is performed the actual
|
||||
parameters are pushed onto the stack of the calling procedure.
|
||||
The exact details are compiler dependent.
|
||||
|
@ -216,11 +214,11 @@ These instructions assume that this parameter contains the LB of
|
|||
the statically enclosing procedure.
|
||||
Procedures that do not have a statically enclosing procedure
|
||||
do not need a static link at offset 0.
|
||||
.P
|
||||
.PP
|
||||
Two instructions are available to perform procedure calls, CAL
|
||||
and CAI.
|
||||
Several tasks are performed by these call instructions.
|
||||
.A
|
||||
.QQ
|
||||
First, a part of the status of the calling procedure is
|
||||
saved on the stack in the return status block.
|
||||
This block should contain the return address of the calling
|
||||
|
@ -235,12 +233,12 @@ The stack frames need not be contiguous then and the first
|
|||
status save area can contain the parameter base AB,
|
||||
which has the value of SP just after the last parameter has
|
||||
been pushed.
|
||||
.A
|
||||
.QQ
|
||||
Second, the LB is changed to point to the
|
||||
first word above the local variables.
|
||||
The new LB is a copy of the SP after the return status
|
||||
block has been pushed.
|
||||
.A
|
||||
.QQ
|
||||
Third, the amount of local storage needed by the procedure is
|
||||
reserved.
|
||||
The parameters and local storage are accessed by the same instructions.
|
||||
|
@ -256,28 +254,28 @@ The initial value of the allocated words is
|
|||
not defined, but implementations that check for undefined
|
||||
values will probably initialize them with a
|
||||
special 'undefined' pattern, typically \-32768.
|
||||
.A
|
||||
.QQ
|
||||
Fourth, any EM implementation is allowed to reserve a variable size
|
||||
block beneath the local variables.
|
||||
This block could, for example, be used to save a variable number
|
||||
of registers.
|
||||
.A
|
||||
.QQ
|
||||
Finally, the address of the entry point of the called procedure
|
||||
is loaded into the Program Counter.
|
||||
.P
|
||||
.PP
|
||||
The ASP instruction can be used to allocate further (dynamic)
|
||||
local storage.
|
||||
The base address of such storage must be obtained with a LOR~SP
|
||||
instruction.
|
||||
This same instruction ASP may also be used
|
||||
to remove some words from the stack.
|
||||
.P
|
||||
.PP
|
||||
There is a version of ASP, called ASS, which fetches the number
|
||||
of bytes to allocate from the stack.
|
||||
It can be used to allocate space for local
|
||||
objects whose size is unknown at compile time,
|
||||
so called 'dynamic local generators'.
|
||||
.P
|
||||
.PP
|
||||
Control is returned to the calling procedure with a RET instruction.
|
||||
Any return value is then copied to the 'function return area'.
|
||||
The frame created by the call is deallocated and the status of
|
||||
|
@ -293,7 +291,7 @@ Violating this restriction might result in hard to detect
|
|||
errors.
|
||||
The calling procedure has to remove the parameters from the stack.
|
||||
This can be done with the aforementioned ASP instruction.
|
||||
.P
|
||||
.PP
|
||||
Each procedure frame is a separate fragment.
|
||||
Because any fragment may be placed anywhere in memory,
|
||||
procedure frames need not be contiguous.
|
||||
|
@ -345,7 +343,8 @@ procedure frames need not be contiguous.
|
|||
.Df
|
||||
Figure 1. A sample procedure frame and parameters.
|
||||
.De
|
||||
.S2 "Heap data area"
|
||||
.P2 "Heap data area"
|
||||
.PP
|
||||
The heap area starts empty, with HP
|
||||
pointing to the low end of it.
|
||||
HP always contains a word address.
|
||||
|
@ -360,7 +359,7 @@ are allocated to the heap.
|
|||
The heap may not grow into a part of memory that is already allocated.
|
||||
When this is attempted, the STR instruction will cause a trap to occur.
|
||||
In this case, HP retains its old value.
|
||||
.P
|
||||
.PP
|
||||
The only way to address the heap is indirectly.
|
||||
Whenever an object is allocated by increasing HP,
|
||||
then the old HP value must be saved and can be used later to address
|
||||
|
@ -370,7 +369,7 @@ is no longer part of the heap, then an attempt to access
|
|||
the object is not allowed.
|
||||
Furthermore, if the heap pointer is increased again to above
|
||||
the object address, then access to the old object gives undefined results.
|
||||
.P
|
||||
.PP
|
||||
The heap is a single fragment.
|
||||
All bytes have consecutive addresses.
|
||||
No limits are imposed on the size of the heap as long as it fits
|
||||
|
|
10
doc/em/em.i
|
@ -1,3 +1,10 @@
|
|||
.bp
|
||||
.AP "EM INTERPRETER"
|
||||
.nf
|
||||
.ft CW
|
||||
.lg 0
|
||||
.nr x \w' '
|
||||
.ta \nxu +\nxu +\nxu +\nxu +\nxu +\nxu +\nxu +\nxu +\nxu +\nxu
|
||||
|
||||
{ This is an interpreter for EM. It serves as the official machine
|
||||
definition. This interpreter must run on a machine which supports
|
||||
|
@ -1666,3 +1673,6 @@ case insr of
|
|||
writeln('halt with exit status: ',exitstatus:1);
|
||||
doident;
|
||||
end.
|
||||
.ft P
|
||||
.lg 1
|
||||
.fi
|
||||
|
|
113
doc/em/env.nr
|
@ -1,56 +1,43 @@
|
|||
.SN 8
|
||||
.VS 1 0
|
||||
.BP
|
||||
.S1 "ENVIRONMENT INTERACTIONS"
|
||||
.bp
|
||||
.P1 "ENVIRONMENT INTERACTIONS"
|
||||
.PP
|
||||
EM programs can interact with their environment in three ways.
|
||||
Two, starting/stopping and monitor calls, are dealt with in this chapter.
|
||||
The remaining way to interact, interrupts, will be treated
|
||||
together with traps in chapter 9.
|
||||
.S2 "Program starting and stopping"
|
||||
.P2 "Program starting and stopping"
|
||||
.PP
|
||||
EM user programs start with a call to a procedure called
|
||||
_m_a_i_n.
|
||||
The assembler and backends look for the definition of a procedure
|
||||
with this name in their input.
|
||||
The call passes three parameters to the procedure.
|
||||
The parameters are similar to the parameters supplied by the
|
||||
UNIX
|
||||
.FS
|
||||
UNIX is a Trademark of Bell Laboratories.
|
||||
.FE
|
||||
.UX
|
||||
operating system to C programs.
|
||||
These parameters are often called
|
||||
.BW argc ,
|
||||
.B argv
|
||||
and
|
||||
.BW envp .
|
||||
These parameters are often called \fBargc\fP, \fBargv\fP and \fBenvp\fP.
|
||||
Argc is the parameter nearest to LB and is a wordsized integer.
|
||||
The other two are pointers to the first element of an array of
|
||||
string pointers.
|
||||
.N
|
||||
The
|
||||
.B argv
|
||||
array contains
|
||||
.B argc
|
||||
The \fBargv\fP array contains \fBargc\fP
|
||||
strings, the first of which contains the program call name.
|
||||
The other strings in the
|
||||
.B argv
|
||||
The other strings in the \fBargv\fP
|
||||
array are the program parameters.
|
||||
.P
|
||||
The
|
||||
.B envp
|
||||
.PP
|
||||
The \fBenvp\fP
|
||||
array contains strings in the form "name=string", where 'name'
|
||||
is the name of an environment variable and string its value.
|
||||
The
|
||||
.B envp
|
||||
The \fBenvp\fP
|
||||
is terminated by a zero pointer.
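The layout of the envp array can be illustrated with a short Python sketch. The zero pointer is modelled here as `None`, and the function name `parse_envp` is hypothetical.

```python
def parse_envp(envp):
    """Turn "name=string" entries into a dict, stopping at the zero pointer."""
    env = {}
    for entry in envp:
        if entry is None:                   # zero pointer terminates the array
            break
        name, _, value = entry.partition("=")   # split at the first '=' only
        env[name] = value
    return env
```

Splitting at the first `=` keeps values that themselves contain an equals sign intact.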
|
||||
.P
|
||||
.PP
|
||||
An EM user program stops if the program returns from the first
|
||||
invocation of _m_a_i_n.
|
||||
The contents of the function return area are used to procure a
|
||||
wordsized program return code.
|
||||
EM programs also stop when traps and interrupts occur that are
|
||||
not caught and when the exit monitor call is executed.
|
||||
.S2 "Input/Output and other monitor calls"
|
||||
.P2 "Input/Output and other monitor calls"
|
||||
.PP
|
||||
EM differs from most conventional machines in that it has high level i/o
|
||||
instructions.
|
||||
Typical instructions are OPEN FILE and READ FROM FILE instead
|
||||
|
@ -58,7 +45,7 @@ of low level instructions such as setting and clearing
|
|||
bits in device registers.
|
||||
By providing such high level i/o primitives, the task of implementing
|
||||
EM on various non EM machines is made considerably easier.
|
||||
.P
|
||||
.PP
|
||||
I/O is initiated by the MON instruction, which expects an iocode on top
|
||||
of the stack.
|
||||
Often there are also parameters which are pushed on the
|
||||
|
@ -68,45 +55,35 @@ Some i/o functions also provide results, which are returned on the stack.
|
|||
In the list of monitor calls we use several types of parameters and results,
|
||||
these types consist of integers and unsigneds of varying sizes, but never
|
||||
smaller than the wordsize, and the two pointer types.
|
||||
.N 1
|
||||
.LP
|
||||
The names of the types used are:
|
||||
.IS 4
|
||||
.PS - 10
|
||||
.PT int
|
||||
an integer of wordsize
|
||||
.PT int2
|
||||
an integer whose size is the maximum of the wordsize and 2
|
||||
bytes
|
||||
.PT int4
|
||||
an integer whose size is the maximum of the wordsize and 4
|
||||
bytes
|
||||
.PT intp
|
||||
an integer with the size of a pointer
|
||||
.PT uns2
|
||||
an unsigned integer whose size is the maximum of the wordsize and 2
|
||||
.PT unsp
|
||||
an unsigned integer with the size of a pointer
|
||||
.PT ptr
|
||||
a pointer into data space
|
||||
.PE 1
|
||||
.IE 0
|
||||
.DS
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
int:an integer of wordsize
|
||||
int2:an integer whose size is the maximum of the wordsize and 2 bytes
|
||||
int4:an integer whose size is the maximum of the wordsize and 4 bytes
|
||||
intp:an integer with the size of a pointer
|
||||
uns2:an unsigned integer whose size is the maximum of the wordsize and 2 bytes
|
||||
unsp:an unsigned integer with the size of a pointer
|
||||
ptr:a pointer into data space
|
||||
.TE
|
||||
.DE
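The sizes implied by these type names can be computed as in the sketch below. The function name is hypothetical, and `wordsize` and `pointersize` are given in bytes as parameters of the EM family member in use.

```python
def monitor_type_sizes(wordsize, pointersize):
    """Return the size in bytes of each monitor call parameter type."""
    return {
        "int": wordsize,
        "int2": max(wordsize, 2),   # never smaller than the wordsize
        "int4": max(wordsize, 4),
        "intp": pointersize,
        "uns2": max(wordsize, 2),
        "unsp": pointersize,
        "ptr": pointersize,
    }
```

On a member with 2-byte words and 4-byte pointers, int2 occupies 2 bytes and int4 occupies 4; with 4-byte words both occupy 4 bytes.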
|
||||
.LP
|
||||
The table below lists the i/o codes with their results and
|
||||
parameters.
|
||||
This list is similar to the system calls of the UNIX Version 7
|
||||
operating system.
|
||||
.A
|
||||
.QQ
|
||||
To execute a monitor call, proceed as follows:
|
||||
.IS 2
|
||||
.N 1
|
||||
.PS a 4 "" )
|
||||
.PT
|
||||
.IP a)
|
||||
Stack the parameters, in reverse order, last parameter first.
|
||||
.PT
|
||||
.IP b)
|
||||
Push the monitor call number (iocode) onto the stack.
|
||||
.PT
|
||||
.IP c)
|
||||
Execute the MON instruction.
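The stacking order in steps a) and b) can be sketched in Python with a list as the stack. The function name is hypothetical, and step c), the MON instruction itself, is not modelled.

```python
def prepare_monitor_call(stack, iocode, params):
    """Push `params` last-first, then the iocode, ready for MON."""
    for param in reversed(params):   # last parameter is pushed first
        stack.append(param)
    stack.append(iocode)             # iocode ends up on top of the stack
    return stack
```

For Chroot (code 61, one string parameter) the stack ends with the string pointer directly below the value 61, so the first parameter always lies just under the iocode.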
|
||||
.PE 1
|
||||
.IE
|
||||
.LP
|
||||
An error code is present on the top of the stack after
|
||||
execution of most monitor calls.
|
||||
If this error code is zero, the call performed the action
|
||||
|
@ -117,9 +94,12 @@ This construction enables programs to test for failure with a
|
|||
single instruction (~TEQ or TNE~) and still find out the cause of
|
||||
the failure.
|
||||
The result name 'e' is reserved for the error code.
|
||||
.N 1
|
||||
.ne 5
|
||||
.LP
|
||||
List of monitor calls.
|
||||
.DS B
|
||||
.LP
|
||||
.nf
|
||||
.na
|
||||
.ta 4n 13n 29n 52n
|
||||
nr name parameters results function
|
||||
|
||||
|
@ -191,22 +171,23 @@ nr name parameters results function
|
|||
e:int Execute a file
|
||||
60 Umask mask:int2 oldmask:int2 Set file creation mode mask
|
||||
61 Chroot string:ptr e:int Change root directory
|
||||
.DE 1
|
||||
.fi
|
||||
.ad
|
||||
.LP
|
||||
Codes 0, 11, 13, 17, 31, 32, 38, 39, 40, 45, 49, 50, 52,
|
||||
55, 57, 58, 62, and 63 are
|
||||
not used.
|
||||
.P
|
||||
.PP
|
||||
All monitor calls, except fork and sigtrp
|
||||
are the same as the UNIX version 7 system calls.
|
||||
.P
|
||||
.PP
|
||||
The sigtrp entry maps UNIX signals onto EM interrupts.
|
||||
Normally, trapno is in the range 0 to 252.
|
||||
In that case it requests that signal signo
|
||||
will cause trap trapno to occur.
|
||||
When given trap number \-2, default signal handling is reset, and when given
|
||||
trap number \-3, the signal is ignored.
|
||||
.P
|
||||
.PP
|
||||
The flag returned by fork is 1 in the child process and 0 in
|
||||
the parent.
|
||||
The pid returned is the process-id of the other process.
|
||||
.VS
|
||||
|
|
|
@ -1,48 +1,42 @@
|
|||
.BP
|
||||
.S1 "INTRODUCTION"
|
||||
.bp
|
||||
.P1 "INTRODUCTION"
|
||||
.PP
|
||||
EM is a family of intermediate languages designed for producing
|
||||
portable compilers.
|
||||
The general strategy is for a program called
|
||||
.B front end
|
||||
The general strategy is for a program called \fBfront end\fP
|
||||
to translate the source program to EM.
|
||||
Another program,
|
||||
.B back
|
||||
.BW end
|
||||
Another program, \fBback end\fP,
|
||||
translates EM to target assembly language.
|
||||
Alternatively, the EM code can be assembled to a binary form
|
||||
and interpreted.
|
||||
These considerations led to the following goals:
|
||||
.IS 2 10
|
||||
.PS 1 4
|
||||
.PT
|
||||
.IP 1
|
||||
The design should allow translation to,
|
||||
or interpretation on, a wide range of existing machines.
|
||||
Design decisions should be delayed as far as possible
|
||||
and the implications of these decisions should
|
||||
be localized as much as possible.
|
||||
.N
|
||||
.br
|
||||
The current microcomputer technology offers 8, 16 and 32 bit machines
|
||||
with various sizes of address space.
|
||||
EM should be flexible enough to be useful on most of these
|
||||
machines.
|
||||
The differences between the members of the EM family should only
|
||||
concern the wordsize and address space size.
|
||||
.PT
|
||||
.IP 2
|
||||
The architecture should ease the task of code generation for
|
||||
high level languages such as Pascal, C, Ada, Algol 68, BCPL.
|
||||
.PT
|
||||
.IP 3
|
||||
The instruction set used by the interpreter should be compact,
|
||||
to reduce the amount of memory needed
|
||||
for program storage, and to reduce the time needed to transmit
|
||||
programs over communication lines.
|
||||
.PT
|
||||
.IP 4
|
||||
It should be designed with microprogrammed implementations in
|
||||
mind; in particular, the use of many short fields within
|
||||
instruction opcodes should be avoided, because their extraction by the
|
||||
microprogram or conversion to other instruction formats is inefficient.
|
||||
.PE
|
||||
.IE
|
||||
.A
|
||||
.PP
|
||||
The basic architecture is based on the concept of a stack. The stack
|
||||
is used for procedure return addresses, actual parameters, local variables,
|
||||
and arithmetic operations.
|
||||
|
@ -61,7 +55,7 @@ stack.
|
|||
For all types except pointers,
|
||||
these instructions have the object size
|
||||
as argument.
|
||||
.P
|
||||
.PP
|
||||
There are no visible general registers used for arithmetic operands
|
||||
etc. This is in contrast to most third generation computers, which usually
|
||||
have 8 or 16 general registers. The decision not to have a group of
|
||||
|
@ -69,11 +63,11 @@ general registers was fully intentional, and follows W.L. Van der
|
|||
Poel's dictum that a machine should have 0, 1, or an infinite
|
||||
number of any feature. General registers have two primary uses: to hold
|
||||
intermediate results of complicated expressions, e.g.
|
||||
.IS 5 0 1
|
||||
.DS
|
||||
((a*b + c*d)/e + f*g/h) * i
|
||||
.IE 1
|
||||
.DE
|
||||
and to hold local variables.
|
||||
.P
|
||||
.PP
|
||||
Various studies
|
||||
have shown that the average expression has fewer than two operands,
|
||||
making the former use of registers of doubtful value. The present trend
|
||||
|
@ -81,11 +75,9 @@ toward structured programs consisting of many small
|
|||
procedures greatly reduces the value of registers to hold local variables
|
||||
because the large number of procedure calls implies a large overhead in
|
||||
saving and restoring the registers at every call.
|
||||
.P
|
||||
.PP
|
||||
Although there are no general purpose registers, there are a
|
||||
few internal registers with specific functions as follows:
|
||||
.IS 2
|
||||
.N 1
|
||||
.TS
|
||||
tab(:);
|
||||
l 1 l l l.
|
||||
|
@ -94,9 +86,8 @@ LB:\-:Local Base:Points to base of the local variables
|
|||
:::in the current procedure.
|
||||
SP:\-:Stack Pointer:Points to the highest occupied word on the stack.
|
||||
HP:\-:Heap Pointer:Points to the top of the heap area.
|
||||
.TE 1
|
||||
.IE
|
||||
.A
|
||||
.TE
|
||||
.PP
|
||||
Furthermore, reverse Polish code is much easier to generate than
|
||||
multi-register machine code, especially if highly efficient code is
|
||||
desired.
|
||||
|
@ -106,7 +97,7 @@ An EM machine can
|
|||
achieve high performance by keeping part of the stack
|
||||
in high speed storage (a cache or microprogram scratchpad memory) rather
|
||||
than in primary memory.
|
||||
.P
|
||||
.PP
|
||||
Again according to van der Poel's dictum,
|
||||
all EM instructions have zero or one argument.
|
||||
We believe that instructions needing two arguments
|
||||
|
@ -116,11 +107,11 @@ circumstances as well.
|
|||
Moreover, these two instructions together often
|
||||
have a shorter encoding than the single
|
||||
instruction before.
|
||||
.P
|
||||
.PP
|
||||
This document describes EM at three different levels:
|
||||
the abstract level, the assembly language level and
|
||||
the machine language level.
|
||||
.A
|
||||
.QQ
|
||||
The most important level is that of the abstract EM architecture.
|
||||
This level deals with the basic design issues.
|
||||
Only the functional capabilities of instructions are relevant, not their
|
||||
|
@ -128,14 +119,14 @@ format or encoding.
|
|||
Most chapters of this document refer to the abstract level
|
||||
and it is explicitly stated whenever
|
||||
another level is described.
|
||||
.A
|
||||
.QQ
|
||||
The assembly language is intended for the compiler writer.
|
||||
It presents a more or less orthogonal instruction
|
||||
set and provides symbolic names for data.
|
||||
Moreover, it facilitates the linking of
|
||||
separately compiled 'modules' into a single program
|
||||
by providing several pseudoinstructions.
|
||||
.A
|
||||
.QQ
|
||||
The machine language is designed for interpretation with a compact
|
||||
program text and easy decoding.
|
||||
The binary representation of the machine language instruction set is
|
||||
|
@ -144,7 +135,7 @@ Frequent instructions have a short opcode.
|
|||
The encoding is fully byte oriented.
|
||||
These bytes do not contain small bit fields, because
|
||||
bit fields would slow down decoding considerably.
|
||||
.P
|
||||
.PP
|
||||
A common use for EM is for producing portable (cross) compilers.
|
||||
When used this way, the compilers produce
|
||||
EM assembly language as their output.
|
||||
|
@ -156,7 +147,7 @@ machine language instructions is irrelevant.
|
|||
On the other hand, when writing an interpreter for EM machine language
|
||||
programs, the interpreter must deal with the machine language
|
||||
and not with the symbolic assembly language.
|
||||
.P
|
||||
.PP
|
||||
As mentioned above, the
|
||||
current microcomputer technology offers 8, 16 and 32 bit
|
||||
machines with address spaces ranging from
|
||||
|
|
|
@ -1,6 +1,5 @@
|
|||
.SN 3
|
||||
.BP
|
||||
.S1 "INSTRUCTION ADDRESS SPACE"
|
||||
.bp
|
||||
.P1 "INSTRUCTION ADDRESS SPACE"
|
||||
The instruction space of the EM machine contains
|
||||
the code for procedures.
|
||||
Tables necessary for the execution of this code, for example, procedure
|
||||
|
@ -10,14 +9,14 @@ the execution of a program, so that it may be
|
|||
protected.
|
||||
No further restrictions to the instruction address space are
|
||||
necessary for the abstract and assembly language level.
|
||||
.P
|
||||
.PP
|
||||
Each procedure has a single entry point: the first instruction.
|
||||
A special type of pointer identifies a procedure.
|
||||
Pointers into the instruction
|
||||
address space have the same size as pointers into data space and
|
||||
can, for example, contain the address of the first instruction
|
||||
or an index in a procedure descriptor table.
|
||||
.A
|
||||
.QQ
|
||||
There is a single EM program counter, PC, pointing
|
||||
to the next instruction to be executed.
|
||||
The procedure pointed to by PC is
|
||||
|
@ -28,35 +27,31 @@ The calling procedure remains 'active' and is resumed whenever the called
|
|||
procedure returns.
|
||||
Note that a procedure has several 'active' invocations when
|
||||
called recursively.
|
||||
.P
|
||||
.PP
|
||||
Each procedure must return properly.
|
||||
It is not allowed to fall through to the
|
||||
code of the next procedure.
|
||||
There are several ways to exit from a procedure:
|
||||
.IS 3
|
||||
.PS
|
||||
.PT
|
||||
.IP -
|
||||
the RET instruction, which returns to the
|
||||
calling procedure.
|
||||
.PT
|
||||
.IP -
|
||||
the RTT instruction, which exits a trap handling routine and resumes
|
||||
the trapping instruction (see next chapter).
|
||||
.PT
|
||||
.IP -
|
||||
the GTO instruction, which is used for non-local goto's.
|
||||
It can remove several frames from the stack and transfer
|
||||
control to an active procedure.
|
||||
(see also MES~11 in paragraph 11.1.4.4)
|
||||
.PE
|
||||
.IE
|
||||
.P
|
||||
.PP
|
||||
All branch instructions can transfer control
|
||||
to any label within the same procedure.
|
||||
Branch instructions can never jump out of a procedure.
|
||||
.P
|
||||
.PP
|
||||
Several language implementations use a so called procedure
|
||||
instance identifier, a combination of a procedure identifier and
|
||||
the LB of a stack frame, also called static link.
|
||||
.P
|
||||
.PP
|
||||
The program text for each procedure, as well as any tables,
|
||||
are fragments and can be allocated anywhere
|
||||
in the instruction address space.
|
||||
|
|
156
doc/em/mach.nr
|
@ -1,6 +1,6 @@
|
|||
.BP
|
||||
.SN 10
|
||||
.S1 "EM MACHINE LANGUAGE"
|
||||
.bp
|
||||
.P1 "EM MACHINE LANGUAGE"
|
||||
.PP
|
||||
The EM machine language is designed to make program text compact
|
||||
and to make decoding easy.
|
||||
Compact program text has many advantages: programs execute faster,
|
||||
|
@ -11,16 +11,18 @@ that it is feasible to use interpreters as long as EM hardware
|
|||
machines are not available.
|
||||
This chapter is irrelevant when back ends are used to
|
||||
produce executable target machine code.
|
||||
.S2 "Instruction encoding"
|
||||
.P2 "Instruction encoding"
|
||||
.PP
|
||||
A design goal of EM is to make the
|
||||
program text as compact as possible.
|
||||
Decoding must be easy, however.
|
||||
The encoding is fully byte oriented, without any small bit fields.
|
||||
There are 256 primary opcodes, two of which are an escape to
|
||||
two groups of 256 secondary opcodes each.
|
||||
.A
|
||||
.QQ
|
||||
EM instructions without arguments have a single opcode assigned,
|
||||
possibly escaped:
|
||||
.ta 12n 24n
|
||||
.Dr 6
|
||||
|--------------|
|
||||
| opcode |
|
||||
|
@ -37,7 +39,7 @@ Several instructions have an address from the global data area
|
|||
as argument.
|
||||
Other instructions have different opcodes for positive
|
||||
and negative arguments.
|
||||
.N 1
|
||||
.LP
|
||||
There is always an opcode that takes the next two bytes as argument,
|
||||
high byte first:
|
||||
.Dr 6
|
||||
|
@ -94,7 +96,7 @@ several different encodings are available.
|
|||
It is the task of the assembler to select the shortest of these.
|
||||
The savings by these mini and shortie
|
||||
opcodes are considerable, about 55%.
|
||||
.P
|
||||
.PP
|
||||
Further improvements are possible:
|
||||
the arguments of
|
||||
many instructions are a multiple of the wordsize.
|
||||
|
@ -106,26 +108,24 @@ The arguments of some other instructions
|
|||
rarely or never assume the value 0, but start at 1.
|
||||
The value 1 is then encoded as 0,
|
||||
2 as 1 and so on.
|
||||
.P
|
||||
.PP
|
||||
Assigning opcodes to instructions by the assembler is completely
|
||||
table driven.
|
||||
For details see appendix B.
|
||||
.S2 "Procedure descriptors"
|
||||
.P2 "Procedure descriptors"
|
||||
.PP
|
||||
The procedure identifiers used in the interpreter are indices
|
||||
into a table of procedure descriptors.
|
||||
Each descriptor contains:
|
||||
.IS 6
|
||||
.PS - 4
|
||||
.PT 1.
|
||||
.IP 1.
|
||||
the number of bytes to be reserved for locals at each
|
||||
invocation.
|
||||
.N
|
||||
.br
|
||||
This is a pointer-sized integer.
|
||||
.PT 2.
|
||||
.IP 2.
|
||||
the start address of the procedure
|
||||
.PE
|
||||
.IE
|
||||
.S2 "Load format"
|
||||
.P2 "Load format"
|
||||
.PP
|
||||
The EM machine language load format defines the interface between
|
||||
the EM assembler/loader and the EM machine itself.
|
||||
A load file consists of a header, the program text to be executed,
|
||||
|
@ -133,7 +133,7 @@ a description of the global data area and the procedure descriptor table,
|
|||
in this order.
|
||||
All integers in the load file are presented with the
|
||||
least significant byte first.
|
||||
.P
|
||||
.PP
|
||||
The header has two parts: the first half (eight 16-bit integers)
|
||||
aids in selecting
|
||||
the correct EM machine or interpreter.
|
||||
|
@ -141,68 +141,59 @@ Some EM machines, for instance, may have hardware floating point
|
|||
instructions.
|
||||
.N
|
||||
The header entries are as follows (bit 0 is rightmost):
|
||||
.IS 2
|
||||
.VS 1 0
|
||||
.PS 1 4 "" :
|
||||
.PT
|
||||
.IP 1:
|
||||
magic number (07255)
|
||||
.PT
|
||||
.IP 2:
|
||||
flag bits with the following meaning:
|
||||
.PS - 7 "" :
|
||||
.PT bit 0
|
||||
.RS
|
||||
.IP "bit 0"
|
||||
TEST; test for integer overflow etc.
|
||||
.PT bit 1
|
||||
.IP "bit 1"
|
||||
PROFILE; for each source line: count the number of memory
|
||||
cycles executed.
|
||||
.PT bit 2
|
||||
.IP "bit 2"
|
||||
FLOW; for each source line: set a bit in a bit map table if
|
||||
instructions on that line are executed.
|
||||
.PT bit 3
|
||||
.IP "bit 3"
|
||||
COUNT; for each source line: increment a counter if that line
|
||||
is entered.
|
||||
.PT bit 4
|
||||
.IP "bit 4"
|
||||
REALS; set if a program uses floating point instructions.
|
||||
.PT bit 5
|
||||
.IP "bit 5"
|
||||
EXTRA; more tests during compiler debugging.
|
||||
.PE
|
||||
.PT
|
||||
.RE
|
||||
.IP 3:
|
||||
number of unresolved references.
|
||||
.PT
|
||||
.IP 4:
|
||||
version number; used to detect obsolete EM load files.
|
||||
.PT
|
||||
.IP 5:
|
||||
wordsize ; the number of bytes in each machine word.
|
||||
.PT
|
||||
.IP 6:
|
||||
pointer size ; the number of bytes available for addressing.
|
||||
.PT
|
||||
.IP 7:
|
||||
unused
|
||||
.PT
|
||||
.IP 8:
|
||||
unused
|
||||
.PE
|
||||
.IE
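As a hedged illustration of the first header half listed above, the following sketch unpacks the eight 16-bit entries, least significant byte first as the load format prescribes. The function name `parse_header_words` and the dictionary keys are hypothetical and do not come from the EM sources; only the field order, the magic number 07255 and the flag bit positions are taken from the text.

```python
import struct

MAGIC = 0o7255  # magic number of header entry 1 (octal, as written in the text)

# Flag names by bit position for header entry 2 (bit 0 is rightmost).
FLAG_NAMES = ["TEST", "PROFILE", "FLOW", "COUNT", "REALS", "EXTRA"]


def parse_header_words(raw: bytes) -> dict:
    """Decode the first half of a load-file header (hypothetical helper)."""
    if len(raw) < 16:
        raise ValueError("first header half needs eight 16-bit integers")
    # "<8H": eight unsigned 16-bit integers, least significant byte first.
    magic, flags, unresolved, version, wordsize, ptrsize, _, _ = struct.unpack(
        "<8H", raw[:16]
    )
    if magic != MAGIC:
        raise ValueError("not an EM load file")
    return {
        "flags": {name for bit, name in enumerate(FLAG_NAMES) if flags >> bit & 1},
        "unresolved": unresolved,
        "version": version,
        "wordsize": wordsize,
        "pointer_size": ptrsize,
    }
```

The two unused entries are read and discarded, so the sketch keeps working if a later format version assigns them a meaning.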
|
||||
.LP
|
||||
The second part of the header (eight entries, of pointer size bytes each)
|
||||
describes the load file itself:
|
||||
.IS 2
|
||||
.PS 1 4 "" :
|
||||
.PT
|
||||
.IP 1:
|
||||
NTEXT; the program text size in bytes.
|
||||
.PT
|
||||
.IP 2:
|
||||
NDATA; the number of load-file descriptors (see below).
|
||||
.PT
|
||||
.IP 3:
|
||||
NPROC; the number of entries in the procedure descriptor table.
|
||||
.PT
|
||||
.IP 4:
|
||||
ENTRY; procedure number of the procedure to start with.
|
||||
.PT
|
||||
.IP 5:
|
||||
NLINE; the maximum source line number.
|
||||
.PT
|
||||
.IP 6:
|
||||
SZDATA; the address of the lowest uninitialized data byte.
|
||||
.PT
|
||||
.IP 7:
|
||||
unused
|
||||
.PT
|
||||
.IP 8:
|
||||
unused
|
||||
.PE
|
||||
.IE
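The second header half can be read in the same spirit. This is a minimal sketch, assuming the pointer size has already been taken from entry 6 of the first half; the function name `parse_header_sizes` is hypothetical, and the field names are the ones used in the list above.

```python
FIELD_NAMES = ["NTEXT", "NDATA", "NPROC", "ENTRY", "NLINE", "SZDATA"]


def parse_header_sizes(raw: bytes, ptrsize: int) -> dict:
    """Decode eight pointer-sized entries, least significant byte first."""
    if len(raw) < 8 * ptrsize:
        raise ValueError("second header half needs eight pointer-sized entries")
    values = [
        int.from_bytes(raw[i * ptrsize:(i + 1) * ptrsize], "little")
        for i in range(8)
    ]
    # zip stops after the six named fields, dropping the two unused entries.
    return dict(zip(FIELD_NAMES, values))
```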
|
||||
.VS
|
||||
.P
|
||||
.PP
|
||||
The program text consists of NTEXT bytes.
|
||||
NTEXT is always a multiple of the wordsize.
|
||||
The first byte of the program text is the
|
||||
|
@ -212,7 +203,7 @@ Pointers into the program text are found in the procedure descriptor
|
|||
table where relocation is simple and in the global data area.
|
||||
The initialization of the global data area allows easy
|
||||
relocation of pointers into both address spaces.
|
||||
.P
|
||||
.PP
|
||||
The global data area is described by the NDATA descriptors.
|
||||
Each descriptor describes a number of consecutive words (of~wordsize)
|
||||
and consists of a sequence of bytes.
|
||||
|
@ -220,7 +211,7 @@ While reading the descriptors from the load file, one can
|
|||
initialize the global data area from low to high addresses.
|
||||
The size of the initialized data area is given by SZDATA;
|
||||
this number can be used to check the initialization.
|
||||
.N
|
||||
.br
|
||||
The header of each descriptor consists of a byte, describing the type,
|
||||
and a count.
|
||||
The number of bytes used for this (unsigned) count depends on the
|
||||
|
@ -232,31 +223,22 @@ At load time an interpreter can
|
|||
perform any conversion deemed necessary, such as
|
||||
reordering bytes in integers
|
||||
and pointers and adding base addresses to pointers.
|
||||
.A
|
||||
.QQ
|
||||
In the following pictures we show a graphical notation of the
|
||||
initializers.
|
||||
The leftmost rectangle represents the leading byte.
|
||||
.N 1
|
||||
.VS 1 0
|
||||
.DS
|
||||
.PS - 4 " "
|
||||
.LP
|
||||
Fields marked with
|
||||
.N 1
|
||||
.PT n
|
||||
contain a pointer-sized integer used as a count
|
||||
.PT m
|
||||
contain a one-byte integer used as a count
|
||||
.PT b
|
||||
contain a one-byte integer
|
||||
.PT w
|
||||
contain a wordsized integer
|
||||
.PT p
|
||||
contain a data or instruction pointer
|
||||
.PT s
|
||||
contain a null terminated ASCII string
|
||||
.PE 1
|
||||
.DE 0
|
||||
.VS
|
||||
.TS
|
||||
tab(:);
|
||||
l l.
|
||||
n:contain a pointer-sized integer used as a count
|
||||
m:contain a one-byte integer used as a count
|
||||
b:contain a one-byte integer
|
||||
w:contain a wordsized integer
|
||||
p:contain a data or instruction pointer
|
||||
s:contain a null terminated ASCII string
|
||||
.TE
|
||||
.Dr 6
|
||||
-------------------
|
||||
| 0 | n | repeat last initialization n times
|
||||
|
@ -316,8 +298,7 @@ contain a null terminated ASCII string
|
|||
| 8 | m | s | initialized float of size m
|
||||
-------------------------
|
||||
.De
|
||||
.PS - 8
|
||||
.PT type~0:
|
||||
.IP type~0: 10
|
||||
If the last initialization initialized k bytes starting
|
||||
at address \fIa\fP, do the same initialization again n times,
|
||||
starting at \fIa\fP+k, \fIa\fP+2*k, .... \fIa\fP+n*k.
|
||||
|
@ -328,43 +309,43 @@ pointer,
|
|||
in all other descriptors the first byte is followed by a one-byte count.
|
||||
This descriptor must be preceded by a descriptor of
|
||||
another type.
|
||||
.PT type~1:
|
||||
.IP type~1: 10
|
||||
Reserve m words, not explicitly initialized (BSS and HOL).
|
||||
.PT type~2:
|
||||
.IP type~2: 10
|
||||
The m bytes following the descriptor header are
|
||||
initializers for the next m bytes of the
|
||||
global data area.
|
||||
m is divisible by the wordsize.
|
||||
.PT type~3:
|
||||
.IP type~3: 10
|
||||
The m words following the header are initializers for the next m words of the
|
||||
global data area.
|
||||
.PT type~4:
|
||||
.IP type~4: 10
|
||||
The m data address space pointers following the header are
|
||||
initializers for the next
|
||||
m data pointers in the global data area.
|
||||
Interpreters that represent EM pointers by
|
||||
target machine addresses must relocate all data pointers.
|
||||
.PT type~5:
|
||||
.IP type~5: 10
|
||||
The m instruction address space pointers following the header are
|
||||
initializers for the next
|
||||
m instruction pointers in the global data area.
|
||||
Interpreters that represent EM instruction pointers by
|
||||
target machine addresses must relocate these pointers.
|
||||
.PT type~6:
|
||||
.IP type~6: 10
|
||||
The m bytes following the header form
|
||||
a signed integer number with a size of m bytes,
|
||||
which is an initializer for the next m bytes
|
||||
of the global data area.
|
||||
m is governed by the same restrictions as for
|
||||
transfer of objects to/from memory.
|
||||
.PT type~7:
|
||||
.IP type~7: 10
|
||||
The m bytes following the header form
|
||||
an unsigned integer number with a size of m bytes,
|
||||
which is an initializer for the next m bytes
|
||||
of the global data area.
|
||||
m is governed by the same restrictions as for
|
||||
transfer of objects to/from memory.
|
||||
.PT type~8:
|
||||
.IP type~8: 10
|
||||
The header is followed by an ASCII string, null terminated, to
|
||||
initialize, in global data,
|
||||
a floating point number with a size of m bytes.
|
||||
|
@ -372,8 +353,7 @@ m is governed by the same restrictions as for
|
|||
transfer of objects to/from memory.
|
||||
The ASCII string contains the notation of a real as used in the
|
||||
Pascal language.
|
||||
.PE
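The descriptor types above can be sketched as a small expander that builds the global data area from low to high addresses. This is an illustration under stated assumptions, not the loader of any EM interpreter: the name `expand_descriptors` is hypothetical, uninitialized words (type 1) are zero-filled, pointers (types 4 and 5) are copied without relocation, and floats (type 8) are stored in the host's IEEE format although EM leaves the representation open.

```python
import struct


def expand_descriptors(desc: bytes, wordsize: int = 2, ptrsize: int = 2) -> bytearray:
    """Expand load-file data descriptors into a data image (hypothetical helper)."""
    data = bytearray()  # global data area, filled from low to high addresses
    last = b""          # bytes written by the most recent non-repeat descriptor
    pos = 0
    while pos < len(desc):
        kind = desc[pos]
        pos += 1
        if kind == 0:
            # Type 0: pointer-sized count n; repeat the last initialization n times.
            n = int.from_bytes(desc[pos:pos + ptrsize], "little")
            pos += ptrsize
            if not last:
                raise ValueError("type 0 must follow a descriptor of another type")
            data += last * n
            continue
        m = desc[pos]  # every other type carries a one-byte count
        pos += 1
        if kind == 1:
            last = bytes(m * wordsize)  # assumption: reserved words are zeroed
        elif kind == 8:
            end = desc.index(0, pos)  # null-terminated ASCII string
            text = desc[pos:end].decode("ascii")
            pos = end + 1
            # Assumption: host IEEE format; only sizes 4 and 8 are handled.
            last = struct.pack({4: "<f", 8: "<d"}[m], float(text))
        elif 2 <= kind <= 7:
            size = {2: m, 3: m * wordsize, 4: m * ptrsize,
                    5: m * ptrsize, 6: m, 7: m}[kind]
            last = desc[pos:pos + size]  # copied as is, no relocation
            pos += size
        else:
            raise ValueError(f"unknown descriptor type {kind}")
        data += last
    return data
```

After expansion, the length of the result could be compared against SZDATA from the header as a consistency check.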
|
||||
.P
|
||||
.PP
|
||||
The NPROC procedure descriptors on the load file consist of
|
||||
an instruction space address (of~pointer~size) and
|
||||
an integer (of~pointer~size) specifying the number of bytes for
|
||||
|
|
120
doc/em/macr.nr
120
doc/em/macr.nr
|
@ -1,39 +1,113 @@
|
|||
.SS 10
|
||||
.if n .LL 78
|
||||
.RP
|
||||
.MS T E
|
||||
\!.TL '%'''
|
||||
.ME
|
||||
.MS T O
|
||||
\!.TL '''%'
|
||||
.ME
|
||||
.MS B
|
||||
.sp 1
|
||||
.ME
|
||||
.SM S1 B
|
||||
.SM S2 B
|
||||
.LP
|
||||
.if n \{\
|
||||
.nr LL 78
|
||||
.ll 78 \}
|
||||
.tr ~
|
||||
.\" below are three simple macros to get the drawings right
|
||||
.\" added by Dick Grune
|
||||
.de Dr \" Drawing $1 (size)
|
||||
.N 1
|
||||
.NE \\$1
|
||||
.NA
|
||||
.sp 1
|
||||
.ne \\$1
|
||||
.na
|
||||
.ft CW \" constant spacing
|
||||
.lg 0 \" no ligatures
|
||||
..
|
||||
.de Df \" Drawing Footer
|
||||
.N 1
|
||||
.br
|
||||
.sp 1
|
||||
.ft R
|
||||
.CS
|
||||
.ce 1000
|
||||
.lg 1
|
||||
..
|
||||
.de De \" Drawing End $1 (lines)
|
||||
.Df \" if it hasn't happened yet
|
||||
.CE
|
||||
.AD
|
||||
.N \\$1
|
||||
.br
|
||||
.ft R
|
||||
.lg 1
|
||||
.ce 0
|
||||
.ad
|
||||
.sp \\$1
|
||||
..
|
||||
.\" macro for exponents, added by Ceriel Jacobs
|
||||
.de Ex \" Exponent $1 $2 [$3]
|
||||
\\$1\v'-0.5m'\s-2\\$2\s+2\v'0.5m'\\$3
|
||||
..
|
||||
.\" QQ is like PP, but without space
|
||||
. \" use .PP, with PD 0.
|
||||
.de QQ
|
||||
.nr xx \\n(PD
|
||||
.nr PD 0
|
||||
.PP
|
||||
.nr PD \\n(xx
|
||||
..
|
||||
.nr N1 0
|
||||
.nr N2 0
|
||||
.nr N3 0
|
||||
.nr N4 0
|
||||
.nr N5 0
|
||||
.nr A5 0
|
||||
.af A5 A
|
||||
.de P1
|
||||
.nr N2 0
|
||||
.nr N1 \\n(N1+1
|
||||
.ds Tl "\\n(N1. \\$1
|
||||
.Ca 0
|
||||
.sp
|
||||
.LP
|
||||
\\fB\\n(N1. \\$1\\fP
|
||||
.sp
|
||||
..
|
||||
.de P2
|
||||
.nr N3 0
|
||||
.nr N2 \\n(N2+1
|
||||
.ds Tl "\\n(N1.\\n(N2 \\$1
|
||||
.ne 5
|
||||
.Ca 2
|
||||
.sp
|
||||
.LP
|
||||
\\fB\\n(N1.\\n(N2 \\$1\fP
|
||||
..
|
||||
.de P3
|
||||
.nr N4 0
|
||||
.nr N3 \\n(N3+1
|
||||
.ds Tl "\\n(N1.\\n(N2.\\n(N3 \\$1
|
||||
.Ca 4
|
||||
.LP
|
||||
\\fI\\n(N1.\\n(N2.\\n(N3 \\$1\fP
|
||||
..
|
||||
.de P4
|
||||
.nr N4 \\n(N4+1
|
||||
.ds Tl "\\n(N1.\\n(N2.\\n(N3.\\n(N4 \\$1
|
||||
.ne 5
|
||||
.Ca 6
|
||||
.LP
|
||||
\\fI\\n(N1.\\n(N2.\\n(N3.\\n(N4 \\$1\fP
|
||||
..
|
||||
.de AP
|
||||
.nr N5 \\n(N5+1
|
||||
.nr A5 \\n(N5
|
||||
.ds Tl "\\n(A5. \\$1
|
||||
.ne 5
|
||||
.Ca 0
|
||||
.LP
|
||||
\\fB\\n(A5. \\$1\\fP
|
||||
.sp
|
||||
..
|
||||
.de Ca
|
||||
.da Cc
|
||||
.if \\$1=0 \!.sp \\\\n(PDu
|
||||
\!\l\&\\$1n\ \&\\*(Tl \l\&|\\\\n(LLu-\w\&\ \\n(PN\&u.\&\ \\n(PN
|
||||
\!.br
|
||||
.da
|
||||
..
|
||||
.de Ct
|
||||
.Cc
|
||||
.rm Cc
|
||||
..
|
||||
.de PT
|
||||
.lt \\n(LLu
|
||||
.pc %
|
||||
.nr PN \\n%-1
|
||||
.if \\n(PN%2=1 .tl '''\\n(PN'
|
||||
.if (\\n(PN%2=0)&(\\n(PN) .tl '\\n(PN'''
|
||||
.lt \\n(.lu
|
||||
..
|
||||
|
|
|
@ -1,12 +1,12 @@
|
|||
.SN 5
|
||||
.BP
|
||||
.S1 "MAPPING OF EM DATA MEMORY ONTO TARGET MACHINE MEMORY"
|
||||
.bp
|
||||
.P1 "MAPPING OF EM DATA MEMORY ONTO TARGET MACHINE MEMORY"
|
||||
.PP
|
||||
The EM architecture is designed to be implemented
|
||||
on many existing and future machines.
|
||||
EM memory is highly fragmented to make
|
||||
adaptation to various memory architectures possible.
|
||||
Format and encoding of pointers is explicitly undefined.
|
||||
.P
|
||||
.PP
|
||||
This chapter gives solutions to some of the
|
||||
anticipated problems.
|
||||
First, we describe a possible memory layout for machines
|
||||
|
@ -53,11 +53,9 @@ The most straightforward layout is shown in figure 2.
|
|||
Figure 2. Memory layout showing typical register
|
||||
positions during execution of an EM program.
|
||||
.De
|
||||
.N 1
|
||||
.sp 1
|
||||
The base registers for the various memory pieces can be stored
|
||||
in target machine registers or memory.
|
||||
.IS
|
||||
.N 1
|
||||
.TS
|
||||
tab(;);
|
||||
l 1 l l l.
|
||||
|
@ -65,8 +63,8 @@ PB;:;program base;points to the base of the instruction address space.
|
|||
EB;:;external base;points to the base of the data address space.
|
||||
HB;:;heap base;points to the base of the heap area.
|
||||
ML;:;memory limit;marks the high end of the addressable data space.
|
||||
.TE 1
|
||||
.IE
|
||||
.TE
|
||||
.LP
|
||||
The stack grows from high
|
||||
EM addresses to low EM addresses, and the heap the
|
||||
other way.
|
||||
|
@ -74,7 +72,7 @@ The memory between SP and HP is not accessible,
|
|||
but may be allocated later to the stack or the heap if needed.
|
||||
The local data area is allocated starting at the high end of
|
||||
memory.
|
||||
.P
|
||||
.PP
|
||||
Because EM address 0 is not mapped onto target
|
||||
address 0, a problem arises when pointers are used.
|
||||
If a program pushed a constant, say 6, onto the stack,
|
||||
|
@ -86,7 +84,7 @@ This particular problem is solved by explicitly declaring
|
|||
the format of a pointer to be undefined,
|
||||
so that using a constant as a pointer is completely illegal.
|
||||
However, the general problem of mapping pointers still exists.
|
||||
.P
|
||||
.PP
|
||||
There are two possible solutions.
|
||||
In the first solution, EM pointers are represented
|
||||
in the target machine as true EM addresses,
|
||||
|
@ -100,7 +98,7 @@ facilities, EB can be kept in a target machine register,
|
|||
and the relocation can indeed be done on
|
||||
every reference to the data address space
|
||||
at a modest cost in speed.
|
||||
.P
|
||||
.PP
|
||||
The other solution consists of having EM pointers
|
||||
refer to the true target machine address.
|
||||
Thus the instruction LAE 6 (Load Address of External 6)
|
||||
|
@ -112,7 +110,7 @@ However, the problem is not completely solved,
|
|||
because a front end may have to initialize a pointer
|
||||
in CON or ROM data to point to a global address.
|
||||
This pointer must also be relocated by the back end or the interpreter.
|
||||
.P
|
||||
.PP
|
||||
Although the EM stack grows from high to low EM addresses,
|
||||
some machines have hardware PUSH and POP
|
||||
instructions that require the stack to grow upwards.
|
||||
|
@ -144,17 +142,13 @@ Figure 3. Two possible memory implementations.
|
|||
Numbers within the boxes are EM addresses.
|
||||
The other numbers are physical addresses.
|
||||
.De
|
||||
.A 1 0
|
||||
.LP
|
||||
So, we have two different EM memory implementations:
|
||||
.IS
|
||||
.PS - 4
|
||||
.PT A~\-
|
||||
.IP "A~\-"
|
||||
stack downwards
|
||||
.PT B~\-
|
||||
.IP "B~\-"
|
||||
stack upwards
|
||||
.PE
|
||||
.IE
|
||||
.P
|
||||
.PP
|
||||
For each of these two possibilities we give the translation of
|
||||
the EM instructions to push the third byte of a global data
|
||||
block starting at EM address 40 onto the stack and to load the
|
||||
|
@ -164,22 +158,20 @@ The target machine used is a PDP-11 augmented with push and pop instructions.
|
|||
Registers 'r0' and 'r1' are used and suffer from sign extension for byte
|
||||
transfers.
|
||||
Push $40 means push the constant 40, not word 40.
|
||||
.P
|
||||
.PP
|
||||
The translation of the EM instructions depends on the pointer representation
|
||||
used.
|
||||
For each of the two solutions explained above the translation is given.
|
||||
.P
|
||||
.PP
|
||||
First, the translation for the two implementations using EM addresses as
|
||||
pointer representation:
|
||||
.DS
|
||||
.KS
|
||||
.TS
|
||||
tab(:), center;
|
||||
l s l s l s
|
||||
_ s _ s _ s
|
||||
l 2 l 6 l 2 l 6 l 2 l.
|
||||
EM:type A:type B
|
||||
|
||||
|
||||
_
|
||||
LAE:40:push:$40:push:$40
|
||||
|
||||
ADP:3:pop:r0:pop:r0
|
||||
|
@ -194,20 +186,17 @@ LOI:1:pop:r0:pop:r0
|
|||
|
||||
LOE:40:push:eb+40:push:eb-41
|
||||
.TE
|
||||
.DE
|
||||
.P
|
||||
.KE
|
||||
.PP
|
||||
The translation for the two implementations, if the target machine address is
|
||||
used as pointer representation, is:
|
||||
.N 1
|
||||
.DS
|
||||
.KS
|
||||
.TS
|
||||
tab(:), center;
|
||||
l s l s l s
|
||||
_ s _ s _ s
|
||||
l 2 l 6 l 2 l 6 l 2 l.
|
||||
EM:type A:type B
|
||||
|
||||
|
||||
_
|
||||
LAE:40:push:$eb+40:push:$eb-40
|
||||
|
||||
ADP:3:pop:r0:pop:r0
|
||||
|
@ -221,12 +210,12 @@ LOI:1:pop:r0:pop:r0
|
|||
|
||||
LOE:40:push:eb+40:push:eb-41
|
||||
.TE
|
||||
.DE
|
||||
.P
|
||||
.KE
|
||||
.PP
|
||||
The translation presented above is not intended to be optimal.
|
||||
Most machines can handle these simple cases in one or two instructions.
|
||||
It demonstrates, however, the flexibility of the EM design.
|
||||
.P
|
||||
.PP
|
||||
There are several possibilities to implement EM on machines with
|
||||
address spaces larger than 64k bytes.
|
||||
For EM with two byte pointers one could allocate instruction and
|
||||
|
@ -236,7 +225,7 @@ but the base registers PB and EB may be loaded in hardware registers
|
|||
wider than 16 bits, if available.
|
||||
EM implementations can also make efficient use of a machine
|
||||
with separate instruction and data space.
|
||||
.P
|
||||
.PP
|
||||
EM with 32 bit pointers allows one to make use of machines
|
||||
with large address spaces.
|
||||
In a virtual, segmented memory system one could use a separate
|
||||
|
|
|
@ -1,13 +1,13 @@
|
|||
.BP
|
||||
.SN 2
|
||||
.S1 MEMORY
|
||||
.bp
|
||||
.P1 MEMORY
|
||||
.PP
|
||||
The EM machine has two distinct address spaces,
|
||||
one for instructions and one for data.
|
||||
The data space is divided up into 8-bit bytes.
|
||||
The smallest addressable unit is a byte.
|
||||
Bytes are numbered consecutively from 0 to some maximum.
|
||||
All sizes in EM are expressed in bytes.
|
||||
.P
|
||||
.PP
|
||||
Some EM instructions can transfer objects containing several bytes
|
||||
to and/or from memory.
|
||||
The size of all objects larger than a word must be a multiple of
|
||||
|
@ -26,7 +26,7 @@ location \fIm\fP and the wordsize is 2,
|
|||
\fIm\fP must be a multiple of 2 and the bytes at
|
||||
locations \fIm\fP, \fIm\fP\|+\|1,\fIm\fP\|+\|2 and
|
||||
\fIm\fP\|+\|3 are overwritten.
|
||||
.P
|
||||
.PP
|
||||
The size of almost all objects in EM
|
||||
is an integral number of words.
|
||||
Only two operations are allowed on
|
||||
|
@ -42,11 +42,11 @@ EM provides a way to sign-extend a small integer.
|
|||
Popping a small object from the stack removes a word
|
||||
from the stack, stores the least significant byte(s)
|
||||
of this word in memory and discards the rest of the word.
|
||||
.P
|
||||
.PP
|
||||
The format of pointers into both address spaces is explicitly undefined.
|
||||
The size of a pointer, however, is fixed for a member of EM, so that
|
||||
the compiler writer knows how much storage to allocate for a pointer.
|
||||
.P
|
||||
.PP
|
||||
A minor problem is raised by the undefined pointer format.
|
||||
Some languages, notably Pascal, require a special,
|
||||
otherwise illegal, pointer value to represent the nil pointer.
|
||||
|
@ -59,12 +59,12 @@ but it is hard to imagine an implementation
|
|||
for which the current solution is inadequate,
|
||||
especially because the first word in the EM data space
|
||||
is special and probably not the target of any pointer.
|
||||
.P
|
||||
.PP
|
||||
The next two chapters describe the EM memory
|
||||
in more detail.
|
||||
One describes the instruction address space,
|
||||
the other the data address space.
|
||||
.P
|
||||
.PP
|
||||
A design goal of EM has been to allow
|
||||
its implementation on a wide range of existing machines,
|
||||
as well as allowing a new one to be built in hardware.
|
||||
|
|
|
@ -1,6 +1,4 @@
|
|||
.po 0
|
||||
.TP 1
|
||||
.ll 79n
|
||||
.LP
|
||||
\&
|
||||
.sp 10
|
||||
.ce 4
|
||||
|
@ -25,12 +23,9 @@ Abstract
|
|||
.ti +5
|
||||
EM is a family of intermediate languages
|
||||
designed for producing portable compilers.
|
||||
A program called
|
||||
.B front end
|
||||
A program called \fBfront end\fP
|
||||
translates source programs to EM.
|
||||
Another program,
|
||||
.B back
|
||||
.BW end ,
|
||||
Another program, \fBback end\fP,
|
||||
translates EM to the assembly language of the target machine.
|
||||
Alternatively, the EM program can be assembled to a highly
|
||||
efficient binary format for interpretation.
|
||||
|
|
|
@ -1,13 +1,12 @@
|
|||
.SN 9
|
||||
.VS 1 0
|
||||
.BP
|
||||
.S1 "TRAPS AND INTERRUPTS"
|
||||
.bp
|
||||
.P1 "TRAPS AND INTERRUPTS"
|
||||
.PP
|
||||
EM provides a means for the user program to catch all traps
|
||||
generated by the program itself, the hardware, or external conditions.
|
||||
This mechanism uses five instructions: LIM, SIM, SIG, TRP and RTT.
|
||||
This section of the manual may be omitted on the first reading since it
|
||||
presupposes knowledge of the EM instruction set.
|
||||
.P
|
||||
.PP
|
||||
The action taken when a trap occurs is determined by the value
|
||||
of an internal EM trap register.
|
||||
This register contains a pointer to a procedure.
|
||||
|
@ -26,7 +25,7 @@ Two consecutive SIGs are a no-op.
|
|||
When a trap occurs, the trap register is reset to its initial
|
||||
condition, to prevent recursive traps from hanging the machine up,
|
||||
e.g. stack overflow in the stack overflow handling procedure.
|
||||
.P
|
||||
.PP
|
||||
The runtime systems for some languages need to ignore some EM
|
||||
traps.
|
||||
EM offers a feature called the ignore mask.
|
||||
|
@ -37,24 +36,24 @@ If a certain bit is 1 the corresponding trap never
|
|||
occurs and processing simply continues.
|
||||
The actions performed by the offending instruction are
|
||||
described by the Pascal program in appendix A.
|
||||
.N
|
||||
.br
|
||||
If the bit is 0, traps are not ignored.
|
||||
The instructions LIM and SIM allow copying and replacement of
|
||||
the ignore mask.~
|
||||
.P
|
||||
.PP
|
||||
The TRP instruction generates a trap, the trap number being found on the
|
||||
stack.
|
||||
This is, among other things,
|
||||
useful for library procedures and runtime systems.
|
||||
It can also be used by a low level trap procedure to pass the trap to a
|
||||
higher level one (see example below).
|
||||
.P
|
||||
.PP
|
||||
The RTT instruction returns from the trap procedure and continues after the
|
||||
trap.
|
||||
In the list below all traps marked with an asterisk ('*') are
|
||||
considered to be fatal and it is explicitly undefined what happens when
|
||||
restarting after the trap.
|
||||
.P
|
||||
.PP
|
||||
The way a trap procedure is called is completely compatible
|
||||
with normal calling conventions. The only way a trap procedure
|
||||
differs from normal procedures is the return. It has to use RTT instead
|
||||
|
@ -62,25 +61,20 @@ of RET. This is necessary because the complete runtime status is saved on the
|
|||
stack before calling the procedure and all this status has to be reloaded.
|
||||
Error numbers are in the range 0 to 252.
|
||||
The trap numbers are divided into three categories:
|
||||
.IS 4
|
||||
.N 1
|
||||
.PS - 10
|
||||
.PT ~~0\-~63
|
||||
.IP "\0\00\-\063" 12
|
||||
EM machine errors, e.g. illegal instruction.
|
||||
.PS - 8
|
||||
.PT ~0\-15
|
||||
.RS
|
||||
.IP "\00\-15" 8
|
||||
maskable
|
||||
.PT 16\-63
|
||||
.IP "16\-63" 8
|
||||
not maskable
|
||||
.PE
|
||||
.PT ~64\-127
|
||||
.RE
|
||||
.IP "\064\-127" 12
|
||||
Reserved for use by compilers, run time systems, etc.
|
||||
.PT 128\-252
|
||||
.IP "128\-252" 12
|
||||
Available for user programs.
|
||||
.PE 1
|
||||
.IE
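The three categories above can be summarized in a short sketch. The function names are hypothetical, and the mapping of ignore-mask bit i to trap number i is an assumption made for illustration; the text only states that a mask bit of 1 suppresses the corresponding trap.

```python
def classify_trap(number: int) -> str:
    """Return the category of a trap number, following the ranges listed above."""
    if not 0 <= number <= 252:
        raise ValueError("trap numbers are in the range 0 to 252")
    if number <= 15:
        return "EM machine error, maskable"
    if number <= 63:
        return "EM machine error, not maskable"
    if number <= 127:
        return "reserved for compilers and runtime systems"
    return "available for user programs"


def is_ignored(number: int, ignore_mask: int) -> bool:
    """True if the trap is maskable and its bit in the ignore mask is 1.

    Assumption: bit i of the mask corresponds to trap number i.
    """
    return 0 <= number <= 15 and bool(ignore_mask >> number & 1)
```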
|
||||
.LP
|
||||
EM machine errors are numbered as follows:
|
||||
.DS I 5
|
||||
.TS
|
||||
tab(@);
|
||||
n l l.
|
||||
|
@ -108,15 +102,16 @@ n l l.
|
|||
26@EBADLIN@Argument of LIN too high
|
||||
27@EBADGTO@GTO descriptor error
|
||||
.TE
|
||||
.DE 0
|
||||
.P
|
||||
.PP
|
||||
As an example,
|
||||
suppose a subprocedure has to be written to do a numeric
|
||||
calculation.
|
||||
When an overflow occurs the computation has to be stopped and
|
||||
the higher level procedure must be resumed.
|
||||
This can be programmed as follows using the mechanism described above:
|
||||
.DS B
|
||||
.LP
|
||||
.KS
|
||||
.nf
|
||||
.ta 1n 24n
|
||||
mes 2,2,2 ; set sizes
|
||||
ersave
|
||||
|
@ -150,10 +145,12 @@ msave
|
|||
jmpbuf
|
||||
con *1,0,0
|
||||
end
|
||||
.DE 0
|
||||
.VS
|
||||
.DS
|
||||
.KE
|
||||
.KS
|
||||
.LP
|
||||
Example of catch procedure
|
||||
.LP
|
||||
.nf
|
||||
.ta 1n 24n
|
||||
pro $catch,0 ; Local procedure that must catch the overflow trap
|
||||
lol 2 ; Load trap number
|
||||
|
@ -168,4 +165,5 @@ Example of catch procedure
|
|||
trp ; call other trap procedure
|
||||
rtt ; if other procedure returns, do the same
|
||||
end
|
||||
.DE
|
||||
.KE
|
||||
.fi
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
.SN 6
|
||||
.BP
|
||||
.S1 "TYPE REPRESENTATIONS"
|
||||
.bp
|
||||
.P1 "TYPE REPRESENTATIONS"
|
||||
.PP
|
||||
The representations used for typed objects are not precisely
|
||||
specified by EM.
|
||||
Sometimes we only specify that a typed object occupies a
|
||||
|
@ -15,7 +15,7 @@ on the same object(s).
|
|||
For example, the instruction ZER pushes signed and
|
||||
unsigned integers with the value zero and empty sets.
|
||||
ZER has as only argument the size of the object.
|
||||
.A
|
||||
.QQ
|
||||
The representation of floating point numbers is a good example;
|
||||
it allows widely varying implementations.
|
||||
The only ways to create floating point numbers are via
|
||||
|
@ -26,13 +26,14 @@ be converted to human readable output.
|
|||
Implementations may use base 10, base 2 or any other
|
||||
base for exponents, and have freedom in choosing the range of
|
||||
exponent and mantissa.
|
||||
.A
|
||||
.QQ
|
||||
Other types are more precisely described.
|
||||
In the following paragraphs a description will be given of the
|
||||
restrictions imposed on the representation of the types used.
|
||||
A number \fBn\fP used in these paragraphs indicates the size of
|
||||
the object in \fIbits\fP.
|
||||
.S2 "Unsigned integers"
|
||||
.P2 "Unsigned integers"
|
||||
.PP
|
||||
The range of unsigned integers is 0..
|
||||
.Ex 2 "\fBn\fP" -1.
|
||||
A binary representation is assumed.
|
||||
|
@ -47,20 +48,21 @@ This of course means that some sequences of instructions have
|
|||
unpredictable effects.
|
||||
For example:
|
||||
.DS
|
||||
LOC 258 ; STL 0 ; LAL 0 ; LOI 1 ( wordsize >=2 )
|
||||
LOC 258 ; STL 0 ; LAL 0 ; LOI 1 ( wordsize >=2 )
|
||||
.DE
|
||||
The value on the stack after executing this sequence
|
||||
can be anything,
|
||||
but will most likely be 1 or 2.
|
||||
.A
|
||||
.QQ
|
||||
Conversions between unsigned integers of different sizes have to
|
||||
be done with explicit convert instructions.
|
||||
One cannot simply pad an unsigned integer with zeros at either end
|
||||
and expect a correct result.
|
||||
.A
|
||||
.QQ
|
||||
We assume existence of at least single word unsigned arithmetic
|
||||
in any implementation.
|
||||
.S2 "Signed Integers"
|
||||
.P2 "Signed Integers"
|
||||
.PP
|
||||
The range of signed integers is
|
||||
.Ex \-2 "\fBn\fP\-1" ~..
|
||||
.Ex 2 "\fBn\fP\-1" \-1,
|
||||
|
@@ -75,29 +77,31 @@ range
 In other words, the most significant bit is used as sign bit.
 The convert instructions between signed and unsigned integers
 of the same size can be used to catch errors.
-.A
+.QQ
 The value
 .Ex \-2 "\fBn\fP\-1"
 is used for undefined
 signed integers.
 EM implementations should trap when this value is used in an
 operation on signed integers.
-The instruction mask, accessed with SIM and LIM \-~see chapter 9~\- ,
+The instruction mask, accessed with SIM and LIM \-~see chapter 9~\-,
 can be used to disable such traps.
-.A
+.QQ
 We assume existence of at least single word signed arithmetic
 in any implementation.
-.S2 "Floating point values"
+.P2 "Floating point values"
+.PP
 Floating point values must have a signed mantissa and a signed
 exponent.
 Although no base is specified, base 2 is the normal choice,
 because the FEF instruction pushes the exponent in base 2.
-.A
+.QQ
 The implementation of floating point arithmetic is optional.
 The compilers currently in use have runtime parameters for the
 size of the floating point values they should use.
 Common choices are 4 and/or 8 bytes.
-.S2 Pointers
+.P2 Pointers
+.PP
 EM has two kinds of pointers: for instruction and for data
 space.
 Each kind can only be used for its own space, conversion between
@@ -109,13 +113,14 @@ One can of course not expect to be able to address two megabyte
 of memory using a 2-byte pointer.
 Normally, a 2-byte pointer allows up to 65536 bytes of
 addressable memory.
-.A
+.QQ
 Pointer representation has one restriction.
 The pointer with the same representation as the integer zero of
 the same size should be invalid.
 Some languages and/or runtime systems represent the nil
 pointer as zero.
-.S2 "Bit sets"
+.P2 "Bit sets"
+.PP
 All bit sets of size \fBn\fP are subsets of the set
 {~i~|~i>=0,~i<\fBn\fP~}.
 A bit set contains a bit for each element showing its
@@ -129,7 +134,7 @@ The relation between a set with size of
 a word and an unsigned integer word is that
 the value of the unsigned integer is the summation of the
 2\v'-0.5m'i\v'0.5m' where i is in the set.
-.A
+.QQ
 Example: a 2-word bit set (wordsize 2) containing the
 elements 1, 6, 8, 15, 18, 21, 27 and 28 is composed of two
 integers, e.g. at addresses 40 and 42.
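Note on the last hunk: the rule that a word-sized bit set equals the sum of 2^i over its elements can be checked against the two-word example with a short sketch. The diff context is cut off before the document states the resulting word values, so everything below is an illustration under assumptions rather than the document's own text: the helper name `bitset_words` is hypothetical, and the sketch assumes element i is stored in word i div 16 at bit i mod 16 (lower elements in the lower-addressed word), with wordsize 2 meaning 16-bit words.

```python
def bitset_words(elements, n_words, word_bits=16):
    """Pack set elements into a list of unsigned words.

    Assumption (not confirmed by the visible diff context): element i
    sets bit (i % word_bits) of word (i // word_bits).
    """
    words = [0] * n_words
    for i in elements:
        if not 0 <= i < n_words * word_bits:
            raise ValueError(f"element {i} outside 0..{n_words * word_bits - 1}")
        # Each element contributes 2**(i mod word_bits) to its word.
        words[i // word_bits] |= 1 << (i % word_bits)
    return words


# The 2-word set from the example (wordsize 2, so 16 bits per word).
print(bitset_words({1, 6, 8, 15, 18, 21, 27, 28}, n_words=2))
```

Under these assumptions the first word collects the elements 1, 6, 8 and 15, and the second collects 18, 21, 27 and 28 as bits 2, 5, 11 and 12. An implementation that orders the words differently would simply swap the two values.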