Initial revision
This commit is contained in:
parent
253118db19
commit
e0872423d9
21 changed files with 7189 additions and 0 deletions
1121
doc/em/addend.n
Normal file
488
doc/em/app.nr
Normal file
.BP
.AP "EM INTERPRETER"
.nf
.ta 8 16 24 32 40 48 56 64 72 80
.so em.i
.fi
.BP
.AP "EM CODE TABLES"
The following table is used by the assembler for EM machine
language.
It specifies the opcodes used for each instruction and
how arguments are mapped to machine language arguments.
The table is presented in three columns;
each line in each column contains three or four fields.
Each line describes a range of interpreter opcodes by
specifying for which instruction the range is used, the type of the
opcodes (mini, shortie, etc.) and the range of the instruction
argument.
.A
The first field on each line gives the EM instruction mnemonic;
the second field gives some flags.
If the opcodes are minis or shorties, the third field specifies
how many minis/shorties are used.
The last field gives the number of the (first) interpreter
opcode.
.N 1
Flags:
.IS 3
.N 1
Opcode type; only one of the following may be specified.
.PS - 5 " "
.PT -
opcode without argument
.PT m
mini
.PT s
shortie
.PT 2
opcode with 2-byte signed argument
.PT 4
opcode with 4-byte signed argument
.PT 8
opcode with 8-byte signed argument
.PE
Secondary (escaped) opcodes.
.PS - 5 " "
.PT e
The opcode thus marked is in the secondary opcode group instead
of the primary one.
.PE
Restrictions on arguments.
.PS - 5 " "
.PT N
Negative arguments only
.PT P
Positive and zero arguments only
.PE
Mapping of arguments.
.PS - 5 " "
.PT w
argument must be divisible by the wordsize and is divided by the
wordsize before use as opcode argument.
.PT o
argument (possibly after division) must be >= 1 and is
decremented before use as opcode argument.
.PE
.IE
If the opcode type is 2, 4 or 8, the resulting argument is used as
opcode argument (least significant byte first).
.N
If the opcode type is mini, the argument is added
to the first opcode, provided it is in range
(less than the number of minis).
If the argument is negative, the absolute value minus one is
used in the algorithm above.
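.P
The mapping and mini rules above can be condensed into a few lines of C.
This is a minimal sketch, not the assembler's own code: struct table_line and mini_opcode are hypothetical names, and the order of the checks (sign restriction on the argument as written, then w, then o, then the negative-argument rule) is an assumption of the sketch.
.DS B
```c
#include <assert.h>

/* Hypothetical parsed form of one table line such as "aar mwPo 1 34". */
struct table_line {
    int w_flag;   /* 'w': argument divisible by and divided by wordsize */
    int o_flag;   /* 'o': argument must be >= 1 and is decremented      */
    int sign;     /* 'N' = -1, 'P' = +1, no restriction = 0             */
    int count;    /* third field: number of minis in this range         */
    int first;    /* last field: first interpreter opcode               */
};

/* Returns the mini opcode encoding arg, or -1 if this line cannot encode it. */
int mini_opcode(const struct table_line *t, long arg, int wordsize)
{
    assert(wordsize > 0);

    /* Restrictions apply to the instruction argument as written (assumed). */
    if (t->sign < 0 && arg >= 0)
        return -1;
    if (t->sign > 0 && arg < 0)
        return -1;

    /* Mapping of arguments: first w, then o. */
    if (t->w_flag) {
        if (arg % wordsize != 0)
            return -1;            /* not divisible by the wordsize */
        arg /= wordsize;
    }
    if (t->o_flag) {
        if (arg < 1)
            return -1;            /* must be >= 1 before decrementing */
        arg -= 1;
    }

    /* Negative arguments: the absolute value minus one is used. */
    if (arg < 0)
        arg = -arg - 1;

    /* The argument is added to the first opcode only if it is in range. */
    if (arg >= t->count)
        return -1;
    return t->first + (int)arg;
}
```
.DE
For the table line aar mwPo 1 34 and a wordsize of 2, mini_opcode returns 34 for AAR 2 and -1 for AAR 4, because the mapped argument 1 is outside the single available mini.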
.N
For shorties with positive arguments the first opcode is used
for arguments in the range 0..255, the second for the range
256..511, etc.
For shorties with negative arguments the first opcode is used
for arguments in the range -1..-256, the second for the range
-257..-512, etc.
The byte following the opcode contains the least significant
byte of the argument.
Some examples of these specifications follow.
.PS - 5
.PT "aar mwPo 1 34"
Indicates that opcode 34 is used as a mini for Positive
instruction arguments only.
The w and o indicate division and decrementing of the
instruction argument.
Because the resulting argument must be zero (only opcode 34 may be used),
this mini can only be used for an instruction argument equal to the wordsize.
Conclusion: with a wordsize of 2, opcode 34 is for "AAR 2".
.PT "adp sP 1 41"
Opcode 41 is used as shortie for ADP with arguments in the range
0..255.
.PT "bra sN 2 60"
Opcode 60 is used as shortie for BRA with arguments -1..-256,
and 61 for arguments -257..-512.
.PT "zer e- 145"
Escaped opcode 145 is used for ZER.
.PE
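.P
The shortie selection and the least-significant-byte-first order of 2-, 4- and 8-byte arguments can be sketched under the same caveats.
shortie_encode and put_le are hypothetical helpers, the w mapping used by the .sw suffixes is left out, and negative arguments are assumed to be stored in two's complement, which matches the high-order bytes -1 and -2 in the names bra.s-1 and bra.s-2 of the dispatch table.
.DS B
```c
#include <assert.h>

/* Result of encoding one instruction as a shortie: opcode plus one byte. */
struct shortie {
    int opcode;          /* interpreter opcode, or -1 if out of range */
    unsigned char low;   /* least significant byte of the argument    */
};

/* Encode arg with a shortie range "<mnemonic> s<N|P> count first".
   negative != 0 selects the N variant (-1..-256 -> first opcode, ...). */
struct shortie shortie_encode(long arg, int negative, int count, int first)
{
    struct shortie s = { -1, 0 };
    long group;

    assert(count > 0);
    if (negative) {
        if (arg >= 0)
            return s;
        group = (-arg - 1) / 256;   /* -1..-256 -> 0, -257..-512 -> 1 */
    } else {
        if (arg < 0)
            return s;
        group = arg / 256;          /* 0..255 -> 0, 256..511 -> 1 */
    }
    if (group >= count)
        return s;                   /* no opcode left for this range */
    s.opcode = first + (int)group;
    s.low = (unsigned char)((unsigned long)arg & 0xFF);  /* low byte */
    return s;
}

/* Write a 2-, 4- or 8-byte opcode argument, least significant byte first. */
void put_le(unsigned char *out, long long value, int nbytes)
{
    unsigned long long u = (unsigned long long)value;

    assert(nbytes == 2 || nbytes == 4 || nbytes == 8);
    for (int i = 0; i < nbytes; i++) {
        out[i] = (unsigned char)(u & 0xFF);
        u >>= 8;
    }
}
```
.DE
With the examples above, shortie_encode(-300, 1, 2, 60) selects opcode 61 followed by the byte 212, and shortie_encode(100, 0, 1, 41) selects opcode 41 followed by 100.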
The interpreter opcode table:
.N 1
.IS 3
.DS B
.so itables
.DE 0
.IE
.P
The table above results in the following dispatch tables.
Dispatch tables are used by interpreters to jump to the
routines implementing the EM instructions, indexed by the next opcode.
Each line of the dispatch tables gives the routine names
of eight consecutive opcodes, preceded by the first opcode number
on that line.
Routine names consist of an EM mnemonic followed by a suffix.
The suffixes show the encoding used for each opcode.
.N
The following suffixes exist:
.N 1
.VS 1 0
.IS 4
.PS - 11
.PT .z
no arguments
.PT .l
16-bit argument
.PT .lw
16-bit argument divided by the wordsize
.PT .p
positive 16-bit argument
.PT .pw
positive 16-bit argument divided by the wordsize
.PT .n
negative 16-bit argument
.PT .nw
negative 16-bit argument divided by the wordsize
.PT .s<num>
shortie with <num> as high-order argument byte
.PT .sw<num>
shortie with argument divided by the wordsize
.PT .<num>
mini with <num> as argument
.PT .<num>W
mini with <num>*wordsize as argument
.PE 3
<num> is a possibly negative integer.
.VS 1 1
.IE
The dispatch table for the 256 primary opcodes:
.DS B
0 loc.0 loc.1 loc.2 loc.3 loc.4 loc.5 loc.6 loc.7
8 loc.8 loc.9 loc.10 loc.11 loc.12 loc.13 loc.14 loc.15
16 loc.16 loc.17 loc.18 loc.19 loc.20 loc.21 loc.22 loc.23
24 loc.24 loc.25 loc.26 loc.27 loc.28 loc.29 loc.30 loc.31
32 loc.32 loc.33 aar.1W adf.s0 adi.1W adi.2W adp.l adp.1
40 adp.2 adp.s0 adp.s-1 ads.1W and.1W asp.1W asp.2W asp.3W
48 asp.4W asp.5W asp.w0 beq.l beq.s0 bge.s0 bgt.s0 ble.s0
56 blm.s0 blt.s0 bne.s0 bra.l bra.s-1 bra.s-2 bra.s0 bra.s1
64 cal.1 cal.2 cal.3 cal.4 cal.5 cal.6 cal.7 cal.8
72 cal.9 cal.10 cal.11 cal.12 cal.13 cal.14 cal.15 cal.16
80 cal.17 cal.18 cal.19 cal.20 cal.21 cal.22 cal.23 cal.24
88 cal.25 cal.26 cal.27 cal.28 cal.s0 cff.z cif.z cii.z
96 cmf.s0 cmi.1W cmi.2W cmp.z cms.s0 csa.1W csb.1W dec.z
104 dee.w0 del.w-1 dup.1W dvf.s0 dvi.1W fil.l inc.z ine.lw
112 ine.w0 inl.-1W inl.-2W inl.-3W inl.w-1 inn.s0 ior.1W ior.s0
120 lae.l lae.w0 lae.w1 lae.w2 lae.w3 lae.w4 lae.w5 lae.w6
128 lal.p lal.n lal.0 lal.-1 lal.w0 lal.w-1 lal.w-2 lar.W
136 ldc.0 lde.lw lde.w0 ldl.0 ldl.w-1 lfr.1W lfr.2W lfr.s0
144 lil.w-1 lil.w0 lil.0 lil.1W lin.l lin.s0 lni.z loc.l
152 loc.-1 loc.s0 loc.s-1 loe.lw loe.w0 loe.w1 loe.w2 loe.w3
160 loe.w4 lof.l lof.1W lof.2W lof.3W lof.4W lof.s0 loi.l
168 loi.1 loi.1W loi.2W loi.3W loi.4W loi.s0 lol.pw lol.nw
176 lol.0 lol.1W lol.2W lol.3W lol.-1W lol.-2W lol.-3W lol.-4W
184 lol.-5W lol.-6W lol.-7W lol.-8W lol.w0 lol.w-1 lxa.1 lxl.1
192 lxl.2 mlf.s0 mli.1W mli.2W rck.1W ret.0 ret.1W ret.s0
200 rmi.1W sar.1W sbf.s0 sbi.1W sbi.2W sdl.w-1 set.s0 sil.w-1
208 sil.w0 sli.1W ste.lw ste.w0 ste.w1 ste.w2 stf.l stf.W
216 stf.2W stf.s0 sti.1 sti.1W sti.2W sti.3W sti.4W sti.s0
224 stl.pw stl.nw stl.0 stl.1W stl.-1W stl.-2W stl.-3W stl.-4W
232 stl.-5W stl.w-1 teq.z tgt.z tlt.z tne.z zeq.l zeq.s0
240 zeq.s1 zer.s0 zge.s0 zgt.s0 zle.s0 zlt.s0 zne.s0 zne.s-1
248 zre.lw zre.w0 zrl.-1W zrl.-2W zrl.w-1 zrl.nw escape1 escape2
.DE 2
The list of secondary opcodes (escape1):
.N 1
.DS B
0 aar.l aar.z adf.l adf.z adi.l adi.z ads.l ads.z
8 adu.l adu.z and.l and.z asp.lw ass.l ass.z bge.l
16 bgt.l ble.l blm.l bls.l bls.z blt.l bne.l cai.z
24 cal.l cfi.z cfu.z ciu.z cmf.l cmf.z cmi.l cmi.z
32 cms.l cms.z cmu.l cmu.z com.l com.z csa.l csa.z
40 csb.l csb.z cuf.z cui.z cuu.z dee.lw del.pw del.nw
48 dup.l dus.l dus.z dvf.l dvf.z dvi.l dvi.z dvu.l
56 dvu.z fef.l fef.z fif.l fif.z inl.pw inl.nw inn.l
64 inn.z ior.l ior.z lar.l lar.z ldc.l ldf.l ldl.pw
72 ldl.nw lfr.l lil.pw lil.nw lim.z los.l los.z lor.s0
80 lpi.l lxa.l lxl.l mlf.l mlf.z mli.l mli.z mlu.l
88 mlu.z mon.z ngf.l ngf.z ngi.l ngi.z nop.z rck.l
96 rck.z ret.l rmi.l rmi.z rmu.l rmu.z rol.l rol.z
104 ror.l ror.z rtt.z sar.l sar.z sbf.l sbf.z sbi.l
112 sbi.z sbs.l sbs.z sbu.l sbu.z sde.l sdf.l sdl.pw
120 sdl.nw set.l set.z sig.z sil.pw sil.nw sim.z sli.l
128 sli.z slu.l slu.z sri.l sri.z sru.l sru.z sti.l
136 sts.l sts.z str.s0 tge.z tle.z trp.z xor.l xor.z
144 zer.l zer.z zge.l zgt.l zle.l zlt.l zne.l zrf.l
152 zrf.z zrl.pw dch.z exg.s0 exg.l exg.z lpb.z gto.l
.DE 2
Finally, the list of opcodes with four-byte arguments (escape2):
.DS

0 loc
.DE 0
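.P
The way an interpreter might consume these three tables can be sketched in C.
This is an illustration under stated assumptions, not the interpreter of the first appendix: select_routine, the routine type and the dispatch array are hypothetical, and each routine is assumed to receive a pointer to the bytes after the opcode and to return the address of the next opcode.
Only the escape values 254 and 255 are taken from the primary table above.
.DS B
```c
#include <assert.h>
#include <stddef.h>

#define ESCAPE1 254   /* primary opcode 254: escape1 */
#define ESCAPE2 255   /* primary opcode 255: escape2 */

enum table_id { PRIMARY, SECONDARY, FOUR_BYTE };

/* Decide which dispatch table the opcode at pc selects and at which index.
   Returns the number of opcode bytes consumed (1, or 2 after an escape). */
int select_routine(const unsigned char *pc, enum table_id *table, int *index)
{
    switch (pc[0]) {
    case ESCAPE1:
        *table = SECONDARY;
        *index = pc[1];
        return 2;
    case ESCAPE2:
        *table = FOUR_BYTE;
        *index = pc[1];
        return 2;
    default:
        *table = PRIMARY;
        *index = pc[0];
        return 1;
    }
}

/* Hypothetical routine type: gets a pointer just past the opcode byte(s)
   and returns the address of the next opcode, or NULL to stop. */
typedef const unsigned char *(*routine)(const unsigned char *args);

/* One set of routine pointers per dispatch table (filled in elsewhere). */
routine dispatch[3][256];

/* Fetch-dispatch loop: the next opcode selects the routine to run. */
void interpret(const unsigned char *pc)
{
    while (pc != NULL) {
        enum table_id t;
        int i;
        int n = select_routine(pc, &t, &i);

        assert(dispatch[t][i] != NULL);   /* every opcode needs a routine */
        pc = dispatch[t][i](pc + n);
    }
}
```
.DE
Keeping the escape decision in one small function means the loop itself needs no knowledge of how many tables exist.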
.BP
.AP "AN EXAMPLE PROGRAM"
.DS B
1 program example(output);
2 {This program just demonstrates typical EM code.}
3 type rec = record r1: integer; r2:real; r3: boolean end;
4 var mi: integer; mx:real; r:rec;
5
6 function sum(a,b:integer):integer;
7 begin
8 sum := a + b
9 end;
10
11 procedure test(var r: rec);
12 label 1;
13 var i,j: integer;
14 x,y: real;
15 b: boolean;
16 c: char;
17 a: array[1..100] of integer;
18
19 begin
20 j := 1;
21 i := 3 * j + 6;
22 x := 4.8;
23 y := x/0.5;
24 b := true;
25 c := 'z';
26 for i:= 1 to 100 do a[i] := i * i;
27 r.r1 := j+27;
28 r.r3 := b;
29 r.r2 := x+y;
30 i := sum(r.r1, a[j]);
31 while i > 0 do begin j := j + r.r1; i := i - 1 end;
32 with r do begin r3 := b; r2 := x+y; r1 := 0 end;
33 goto 1;
34 1: writeln(j, i:6, x:9:3, b)
35 end; {test}
36 begin {main program}
37 mx := 15.96;
38 mi := 99;
39 test(r)
40 end.
.DE 0
.BP
The EM code as produced by the Pascal-VU compiler is given below. Comments
have been added manually. Note that this code has already been optimized.
.DS B
mes 2,2,2 ; wordsize 2, pointersize 2
.1
rom 't.p\e000' ; the name of the source file
hol 552,-32768,0 ; externals and buf occupy 552 bytes
exp $sum ; sum can be called from other modules
pro $sum,2 ; procedure sum; 2 bytes local storage
lin 8 ; code from source line 8
ldl 0 ; load two locals (a and b)
adi 2 ; add them
ret 2 ; return the result
end 2 ; end of procedure (still two bytes local storage)
.2
rom 1,99,2 ; descriptor of array a[]
exp $test ; the compiler exports all level 0 procedures
pro $test,226 ; procedure test, 226 bytes local storage
.3
rom 4.8F8 ; assemble floating-point 4.8 (8 bytes) in
.4 ; global storage
rom 0.5F8 ; same for 0.5
mes 3,-226,2,2 ; compiler temporary not referenced by address
mes 3,-24,2,0 ; the same is true for i, j, b and c in test
mes 3,-22,2,0
mes 3,-4,2,0
mes 3,-2,2,0
mes 3,-20,8,0 ; and for x and y
mes 3,-12,8,0
lin 20 ; maintain source line number
loc 1
stl -4 ; j := 1
lni ; lin 21 prior to optimization
lol -4
loc 3
mli 2
loc 6
adi 2
stl -2 ; i := 3 * j + 6
lni ; lin 22 prior to optimization
lae .3
loi 8
lal -12
sti 8 ; x := 4.8
lni ; lin 23 prior to optimization
lal -12
loi 8
lae .4
loi 8
dvf 8
lal -20
sti 8 ; y := x / 0.5
lni ; lin 24 prior to optimization
loc 1
stl -22 ; b := true
lni ; lin 25 prior to optimization
loc 122
stl -24 ; c := 'z'
lni ; lin 26 prior to optimization
loc 1
stl -2 ; for i:= 1
2
lol -2
dup 2
mli 2 ; i*i
lal -224
lol -2
lae .2
sar 2 ; a[i] :=
lol -2
loc 100
beq *3 ; to 100 do
inl -2 ; increment i and loop
bra *2
3
lin 27
lol -4
loc 27
adi 2 ; j + 27
sil 0 ; r.r1 :=
lni ; lin 28 prior to optimization
lol -22 ; b
lol 0
stf 10 ; r.r3 :=
lni ; lin 29 prior to optimization
lal -20
loi 16
adf 8 ; x + y
lol 0
adp 2
sti 8 ; r.r2 :=
lni ; lin 30 prior to optimization
lal -224
lol -4
lae .2
lar 2 ; a[j]
lil 0 ; r.r1
cal $sum ; call now
asp 4 ; remove parameters from stack
lfr 2 ; get function result
stl -2 ; i :=
4
lin 31
lol -2
zle *5 ; while i > 0 do
lol -4
lil 0
adi 2
stl -4 ; j := j + r.r1
del -2 ; i := i - 1
bra *4 ; loop
5
lin 32
lol 0
stl -226 ; make copy of address of r
lol -22
lol -226
stf 10 ; r3 := b
lal -20
loi 16
adf 8
lol -226
adp 2
sti 8 ; r2 := x + y
loc 0
sil -226 ; r1 := 0
lin 34 ; note the absence of the unnecessary jump
lae 22 ; address of output structure
lol -4
cal $_wri ; write integer with default width
asp 4 ; pop parameters
lae 22
lol -2
loc 6
cal $_wsi ; write integer width 6
asp 6
lae 22
lal -12
loi 8
loc 9
loc 3
cal $_wrf ; write fixed format real, width 9, precision 3
asp 14
lae 22
lol -22
cal $_wrb ; write boolean, default width
asp 4
lae 22
cal $_wln ; writeln
asp 2
ret 0 ; return, no result
end 226
exp $_main
pro $_main,0 ; main program
.6
con 2,-1,22 ; description of external files
.5
rom 15.96F8
fil .1 ; maintain source file name
lae .6 ; description of external files
lae 0 ; base of hol area to relocate buffer addresses
cal $_ini ; initialize files, etc.
asp 4
lin 37
lae .5
loi 8
lae 2
sti 8 ; mx := 15.96
lni ; lin 38 prior to optimization
loc 99
ste 0 ; mi := 99
lni ; lin 39 prior to optimization
lae 10 ; address of r
cal $test
asp 2
loc 0 ; normal exit
cal $_hlt ; cleanup and finish
asp 2
end 0
mes 5 ; reals were used
.DE 0
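.P
To see how the stack code above works, the following toy C model executes the instructions generated for source lines 20 and 21 (j := 1; i := 3 * j + 6).
It is a sketch under explicit assumptions: 2-byte words as declared by mes 2,2,2, only LOC, LOL, STL, MLI and ADI, no modelling of word overflow, and locals kept in a small array indexed by the negative offset from LB divided by the wordsize rather than in real EM memory.
.DS B
```c
#include <assert.h>

enum op { LOC, LOL, STL, MLI, ADI };

struct instr { enum op op; int arg; };

#define WS 2          /* wordsize, as declared by "mes 2,2,2" */
#define NLOCALS 16

struct machine {
    int stack[32];
    int sp;                  /* number of words on the stack        */
    int locals[NLOCALS];     /* locals[k]: the word at offset -WS*k */
};

/* Map a negative LB offset such as -4 to an index into locals. */
static int slot(int offset)
{
    assert(offset < 0 && offset % WS == 0 && -offset / WS < NLOCALS);
    return -offset / WS;
}

void execute(struct machine *m, const struct instr *code, int n)
{
    for (int pc = 0; pc < n; pc++) {
        int a = code[pc].arg;

        assert(m->sp >= 0 && m->sp < 32);
        switch (code[pc].op) {
        case LOC:                       /* push the constant */
            m->stack[m->sp++] = a;
            break;
        case LOL:                       /* push a local word */
            m->stack[m->sp++] = m->locals[slot(a)];
            break;
        case STL:                       /* pop into a local word */
            m->locals[slot(a)] = m->stack[--m->sp];
            break;
        case MLI:                       /* pop two words, push product */
        case ADI: {                     /* pop two words, push sum */
            assert(a == WS);            /* only word-sized integers here */
            int right = m->stack[--m->sp];
            int left = m->stack[--m->sp];
            m->stack[m->sp++] = code[pc].op == MLI ? left * right : left + right;
            break;
        }
        }
    }
}
```
.DE
Running the eight instructions loc 1, stl -4, lol -4, loc 3, mli 2, loc 6, adi 2, stl -2 leaves j (offset -4) at 1, i (offset -2) at 9, and an empty stack.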
The compact code corresponding to the above program is listed below.
Read it horizontally, line by line, not column by column.
Each number represents a byte of compact code, printed in decimal.
The first two bytes form the magic word.
.N 1
.IS 3
.DS B
173 0 159 122 122 122 255 242 1 161 250 124 116 46 112 0
255 156 245 40 2 245 0 128 120 155 249 123 115 117 109 160
249 123 115 117 109 122 67 128 63 120 3 122 88 122 152 122
242 2 161 121 219 122 255 155 249 124 116 101 115 116 160 249
124 116 101 115 116 245 226 0 242 3 161 253 128 123 52 46
56 255 242 4 161 253 128 123 48 46 53 255 159 123 245 30
255 122 122 255 159 123 96 122 120 255 159 123 98 122 120 255
159 123 116 122 120 255 159 123 118 122 120 255 159 123 100 128
120 255 159 123 108 128 120 255 67 140 69 121 113 116 68 73
116 69 123 81 122 69 126 3 122 113 118 68 57 242 3 72
128 58 108 112 128 68 58 108 72 128 57 242 4 72 128 44
128 58 100 112 128 68 69 121 113 98 68 69 245 122 0 113
96 68 69 121 113 118 182 73 118 42 122 81 122 58 245 32
255 73 118 57 242 2 94 122 73 118 69 220 10 123 54 118
18 122 183 67 147 73 116 69 147 3 122 104 120 68 73 98
73 120 111 130 68 58 100 72 136 2 128 73 120 4 122 112
128 68 58 245 32 255 73 116 57 242 2 59 122 65 120 20
249 123 115 117 109 8 124 64 122 113 118 184 67 151 73 118
128 125 73 116 65 120 3 122 113 116 41 118 18 124 185 67
152 73 120 113 245 30 255 73 98 73 245 30 255 111 130 58
100 72 136 2 128 73 245 30 255 4 122 112 128 69 120 104
245 30 255 67 154 57 142 73 116 20 249 124 95 119 114 105
8 124 57 142 73 118 69 126 20 249 124 95 119 115 105 8
126 57 142 58 108 72 128 69 129 69 123 20 249 124 95 119
114 102 8 134 57 142 73 98 20 249 124 95 119 114 98 8
124 57 142 20 249 124 95 119 108 110 8 122 88 120 152 245
226 0 155 249 125 95 109 97 105 110 160 249 125 95 109 97
105 110 120 242 6 151 122 119 142 255 242 5 161 253 128 125
49 53 46 57 54 255 50 242 1 57 242 6 57 120 20 249
124 95 105 110 105 8 124 67 157 57 242 5 72 128 57 122
112 128 68 69 219 110 120 68 57 130 20 249 124 116 101 115
116 8 122 69 120 20 249 124 95 104 108 116 8 122 152 120
159 124 160 255 159 125 255
.DE 0
.IE
.MS T A 0
.ME
.BP
.MS B A 0
.ME
.CT
756
doc/em/assem.nr
Normal file
.BP
.SN 11
.S1 "EM ASSEMBLY LANGUAGE"
We use two representations for assembly language programs:
one in ASCII and the other in compact assembly language.
The latter needs less space than the former for the same program
and therefore allows faster processing.
Our only program accepting ASCII assembly
language converts it to the compact form.
All other programs expect compact assembly input.
The first part of the chapter describes the ASCII assembly
language and its semantics.
The second part describes the syntax of the compact assembly
language.
The last part lists the EM instructions with the type of
arguments allowed and an indication of the function.
Appendix A gives a detailed description of the effect of all
instructions in the form of a Pascal program.
.S2 "ASCII assembly language"
An assembly language program consists of a series of lines;
each line may be blank, contain one (pseudo)instruction or contain one
label.
Input to the assembler is in lower case.
Upper case is used in this
document merely to distinguish keywords from the surrounding prose.
A comment is allowed at the end of each line and starts with a semicolon ";".
This kind of comment does not exist in the compact form.
.A
Labels must be placed all by themselves on a line and start in
column 1.
There are two kinds of labels: instruction labels and data labels.
Instruction labels are unsigned positive integers.
The scope of an instruction label is its procedure.
.A
The pseudoinstructions CON, ROM and BSS may be preceded by a
line containing a
1-8 character data label, the first character of which is a
letter, period or underscore.
The period may only be followed by
digits; the others may be followed by letters, digits and underscores.
The use of the character "." followed by a constant,
which must be in the range 1 to 32767 (e.g. ".40"), is recommended
for compiler-generated programs.
These labels are treated as a special case and handled
more efficiently in compact assembly language (see below).
Note that a data label on its own or two consecutive labels are not
allowed.
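.P
The data label rules can be checked mechanically.
In this C sketch valid_data_label and is_numeric_label are hypothetical names; a lone period is treated as invalid, which is an assumption since the text only says that a period must be followed by digits, and the recommended compiler-generated form is checked separately.
.DS B
```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* 1-8 characters; first a letter, period or underscore; after a period
   only digits, otherwise letters, digits and underscores. */
int valid_data_label(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    if (len < 1 || len > 8)
        return 0;
    if (s[0] == '.') {
        if (len == 1)
            return 0;                 /* assumption: "." alone is invalid */
        for (size_t i = 1; i < len; i++)
            if (!isdigit((unsigned char)s[i]))
                return 0;
        return 1;
    }
    if (!isalpha((unsigned char)s[0]) && s[0] != '_')
        return 0;
    for (size_t i = 1; i < len; i++)
        if (!isalnum((unsigned char)s[i]) && s[i] != '_')
            return 0;
    return 1;
}

/* The recommended compiler-generated form ".N" with 1 <= N <= 32767. */
int is_numeric_label(const char *s)
{
    if (!valid_data_label(s) || s[0] != '.')
        return 0;
    long n = 0;
    for (const char *p = s + 1; *p; p++) {
        n = n * 10 + (*p - '0');
        if (n > 32767)
            return 0;                 /* outside the recommended range */
    }
    return n >= 1;
}
```
.DE
For example, .40 passes both checks, buf and _x1 are ordinary data labels, and .4a is rejected because a period may only be followed by digits.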
.P
Each statement may contain an instruction mnemonic or pseudoinstruction.
These must begin in column 2 or later (not column 1) and must be followed
by a space, tab, semicolon or LF.
Everything on the line following a semicolon is
taken as a comment.
.P
Each input file contains one module.
A module may contain many procedures,
which may be nested.
A procedure consists of
a PRO statement, a (possibly empty)
collection of instructions and pseudoinstructions and finally an END
statement.
Pseudoinstructions are also allowed between procedures.
They do not belong to a specific procedure.
.P
All constants in EM are interpreted in the decimal base.
The ASCII assembly language accepts constant expressions
wherever constants are allowed.
The operators recognized are: +, -, *, % and / with the usual
precedence order.
Use of the parentheses ( and ) to alter the precedence order is allowed.
.S3 "Instruction arguments"
Unlike many other assembly languages, the EM assembly
language requires all arguments of normal and pseudoinstructions
to be either a constant or an identifier, but not a combination
of these two.
There is one exception to this rule: when a data label is used
for initialization or as an instruction argument,
expressions of the form 'label+constant' and 'label-constant'
are allowed.
This makes it possible to address, for example, the
third word of a ten-word BSS block
directly.
Thus LOE LABEL+4 is permitted and so is CON LABEL+3.
The resulting address must be in the same fragment as the label.
Constants may not be added to or subtracted from instruction labels
or procedure identifiers;
this is hardly a severe restriction and greatly aids
optimization.
.P
Instruction arguments can be constants,
data labels, data labels offset by a constant, instruction
labels and procedure identifiers.
The range of integers allowed depends on the instruction.
Most instructions allow only integers
(signed or unsigned)
that fit in a word.
Arguments used as offsets to pointers should fit in a
pointer-sized integer.
Finally, arguments to LDC should fit in a double-word integer.
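.P
These range rules can be expressed as a single check, assuming that "fits" means representable either as a signed or as an unsigned integer of the given size, since the text allows both.
fits_in and argument_fits are hypothetical helpers; the sizes are assumed to come from the wordsize and pointer size declared with MES 2.
.DS B
```c
#include <assert.h>

/* Does value fit in nbytes bytes, signed or unsigned?
   The accepted range is -2^(8n-1) .. 2^(8n)-1. */
int fits_in(long long value, int nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    if (nbytes == 8)
        return 1;                   /* every long long value fits */
    long long lo = -(1LL << (8 * nbytes - 1));
    long long hi = (1LL << (8 * nbytes)) - 1;
    return value >= lo && value <= hi;
}

enum arg_kind { WORD_ARG, POINTER_OFFSET_ARG, LDC_ARG };

/* Pick the size that applies to an argument of the given kind. */
int argument_fits(long long value, enum arg_kind kind, int wordsize, int ptrsize)
{
    switch (kind) {
    case WORD_ARG:           return fits_in(value, wordsize);
    case POINTER_OFFSET_ARG: return fits_in(value, ptrsize);
    case LDC_ARG:            return fits_in(value, 2 * wordsize);
    }
    return 0;
}
```
.DE
With a wordsize of 2, an LDC argument of 70000 fits in the 4-byte double word, while the same value is out of range for an ordinary word-sized argument.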
.P
Several instructions have two possible forms:
with an explicit argument and with an implicit argument on top of the stack.
The size of the implicit argument is the wordsize.
The implicit argument is always popped before all other operands.
For example: 'CMI 4' specifies that two four-byte signed
integers on top of the stack are to be compared.
\&'CMI' without an argument expects a word-sized integer
on top of the stack that specifies the size of the integers to
be compared.
Thus the following two sequences are equivalent:
.N 2
.TS
center, tab(:) ;
l r 30 l r.
LDL:-10:LDL:-10
LDL:-14:LDL:-14
::LOC:4
CMI:4:CMI:
ZEQ:*1:ZEQ:*1
.TE 2
Section 11.1.6 shows the arguments allowed for each instruction.
.S3 "Pseudoinstruction arguments"
Pseudoinstruction arguments can be divided into two classes:
initializers and others.
The following initializers are allowed: signed integer constants,
unsigned integer constants, floating-point constants, strings,
data labels, data labels offset by a constant, instruction
labels and procedure identifiers.
.P
Constant initializers in BSS, HOL, CON and ROM pseudoinstructions
can be followed by a letter I, U or F.
This indicator
specifies the type of the initializer: Integer, Unsigned or Float.
If no indicator is present, I is assumed.
The size of the object is the wordsize unless
the indicator is followed by an integer specifying the
object's size.
This integer is governed by the same restrictions as for
transfer of objects to/from memory.
As in instruction arguments, initializers include expressions of the form:
\&"LABEL+offset" and "LABEL-offset".
The offset must be an unsigned decimal constant.
The 'IUF' indicators cannot be used in the offsets.
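.P
A sketch of how an assembler might split a constant initializer into value, indicator and size.
parse_initializer and struct initializer are hypothetical; the sketch only separates the three parts and does not check that the value itself is a well-formed number.
.DS B
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct initializer {
    char text[32];   /* the constant without its indicator and size */
    char type;       /* 'I', 'U' or 'F'                             */
    int size;        /* object size in bytes                        */
};

/* Returns 1 on success, 0 if the text is empty, too long or malformed. */
int parse_initializer(const char *s, int wordsize, struct initializer *out)
{
    size_t len = strlen(s);
    size_t end = len;

    assert(wordsize > 0);
    /* Skip an optional trailing size, then look for an indicator. */
    while (end > 0 && isdigit((unsigned char)s[end - 1]))
        end--;
    if (end > 0 && (s[end - 1] == 'I' || s[end - 1] == 'U' || s[end - 1] == 'F')) {
        out->type = s[end - 1];
        out->size = end < len ? atoi(s + end) : wordsize;
        end--;                        /* drop the indicator letter */
    } else {
        out->type = 'I';              /* no indicator: I is assumed */
        out->size = wordsize;         /* default size: the wordsize */
        end = len;                    /* trailing digits belong to the value */
    }
    if (end == 0 || end >= sizeof out->text || out->size <= 0)
        return 0;
    memcpy(out->text, s, end);
    out->text[end] = 0;
    return 1;
}
```
.DE
For example, 4.8F8 yields the value 4.8 with type F and size 8, while a plain 552 defaults to type I and the wordsize.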
.P
Data labels are referred to by their name.
.P
Strings are surrounded by double quotes (").
Semicolons in strings do not indicate the start of a comment.
In the ASCII representation the escape character \e (backslash)
alters the meaning of subsequent character(s).
This feature allows inclusion of zeroes, graphic characters and
the double quote in the string.
The following escape sequences exist:
.DS
.TS
center, tab(:);
l l l.
newline:NL\|(LF):\en
horizontal tab:HT:\et
backspace:BS:\eb
carriage return:CR:\er
form feed:FF:\ef
backslash:\e:\e\e
double quote:":\e"
bit pattern:\fBddd\fP:\e\fBddd\fP
.TE
.DE
The escape \fBddd\fP consists of the backslash followed by 1,
2, or 3 octal digits specifying the value of
the desired character.
If the character following a backslash is not one of those
specified,
the backslash is ignored.
Example: CON "hello\e012\e0".
Each string element initializes a single byte.
The ASCII character set is used to map characters onto values.
Strings are padded with zeroes up to a multiple of the wordsize.
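.P
The escape rules and the zero padding can be sketched in C as below.
decode_em_string is a hypothetical helper; it spells the escape character as its ASCII code 92 only to keep the listing free of backslashes, and it assumes that a backslash at the very end of the string is kept as an ordinary character, a case the text does not cover.
.DS B
```c
#include <assert.h>
#include <stddef.h>

#define BACKSLASH 92   /* ASCII code of the escape character */

/* Decode a string body (without the surrounding double quotes) into bytes
   and pad with zeroes to a multiple of the wordsize.
   Returns the padded length, or -1 if out is too small. */
long decode_em_string(const char *s, unsigned char *out, size_t cap, int wordsize)
{
    size_t n = 0;

    assert(wordsize > 0);
    while (*s) {
        int c = (unsigned char)*s++;
        if (c == BACKSLASH && *s) {
            c = (unsigned char)*s++;
            switch (c) {
            case 'n': c = 10; break;   /* NL (LF) */
            case 't': c = 9;  break;   /* HT */
            case 'b': c = 8;  break;   /* BS */
            case 'r': c = 13; break;   /* CR */
            case 'f': c = 12; break;   /* FF */
            default:
                if (c >= '0' && c <= '7') {          /* 1 to 3 octal digits */
                    int v = c - '0';
                    for (int k = 1; k < 3 && *s >= '0' && *s <= '7'; k++)
                        v = v * 8 + (*s++ - '0');
                    c = v & 0xFF;
                }
                /* any other character: the backslash is ignored, which
                   also yields the backslash and double quote themselves */
            }
        }
        if (n >= cap)
            return -1;
        out[n++] = (unsigned char)c;             /* one byte per element */
    }
    while (n % (size_t)wordsize != 0) {          /* pad with zero bytes */
        if (n >= cap)
            return -1;
        out[n++] = 0;
    }
    return (long)n;
}
```
.DE
Applied to the body of the example CON "hello\e012\e0" with a wordsize of 2, it produces the five bytes of hello, a newline (10), a zero byte and one padding zero, 8 bytes in total.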
.P
Instruction labels are referred to as *1, *2, etc. both in branch
instructions and in initializers.
.P
The notation $procname means the identifier for the procedure
with the specified name.
This identifier has the size of a pointer.
.S3 Notation
The notation used for arguments, classes of
instructions and pseudoinstructions is as follows.
.IS 2
.TS
tab(:);
l l l.
<cst>:\&=:integer constant (current range -2**31..2**31-1)
<dlb>:\&=:data label
<arg>:\&=:<cst> or <dlb> or <dlb>+<cst> or <dlb>-<cst>
<con>:\&=:integer constant, unsigned constant, floating-point constant
<str>:\&=:string constant (surrounded by double quotes)
<ilb>:\&=:instruction label
::'*' followed by an integer in the range 0..32767.
<pro>:\&=:procedure number ('$' followed by a procedure name)
<val>:\&=:<arg>, <con>, <pro> or <ilb>.
<par>:\&=:<val> or <str>
<...>*:\&=:zero or more of <...>
<...>+:\&=:one or more of <...>
[...]:\&=:optional ...
.TE
.IE
.S3 "Pseudoinstructions"
.S4 Storage declaration
Initialized global data is allocated by the pseudoinstruction CON,
which needs at least one argument.
For each argument, an integral number of words,
determined by the argument type, is allocated and initialized.
.P
The pseudoinstruction ROM is the same as CON,
except that it guarantees that the initialized words
will not change during the execution of the program.
This information allows optimizers to do
certain calculations such as array indexing and
subrange checking at compile time instead
of at run time.
.P
The pseudoinstruction BSS allocates
uninitialized global data or large blocks of data initialized
by the same value.
The first argument to this pseudo is the number
of bytes required, which must be a multiple of the wordsize.
The other arguments specify the value used for initialization and
whether the initialization is only for convenience or a strict necessity.
The pseudoinstruction HOL is similar to BSS in that it requests an
(un)initialized global data block.
Addressing of a HOL block, however, is quasi-absolute.
The first byte is addressed by 0,
the second byte by 1, etc. in assembly language.
The assembler/loader adds the base address of
the HOL block to these numbers to obtain the
absolute address in the machine language.
.P
The scope of a HOL block starts at the HOL pseudo and
ends at the next HOL pseudo or at the end of a module,
whichever comes first.
Each instruction falls in the scope of at most one
HOL block, the current HOL block.
It is not allowed to have more than one HOL block per procedure.
.P
The alignment restrictions are enforced by the
pseudoinstructions.
All objects are aligned on a multiple of their size or the wordsize,
whichever is smaller.
Switching to another type of fragment or placing a label forces
word-alignment.
There are three types of fragments in global data space: CON, ROM and
BSS/HOL.
.N 2
.IS 2
.PS - 4
.PT "BSS <cst1>,<val>,<cst2>"
Reserve <cst1> bytes.
<val> is the value used to initialize the area.
<cst1> must be a multiple of the size of <val>.
<cst2> is 0 if the initialization is not strictly necessary,
1 if it is.
.PT "HOL <cst1>,<val>,<cst2>"
Idem, but all following absolute global data references will
refer to this block.
Only one HOL is allowed per procedure;
it has to be placed before the first instruction.
.PT "CON <val>+"
Assemble global data words initialized with the <val> constants.
.PT "ROM <val>+"
Idem, but the initialized data will never be changed by the program.
.PE
.IE
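.P
The alignment rule stated before this list amounts to rounding the current offset up to a multiple of min(size, wordsize).
A minimal sketch with hypothetical function names, assuming non-negative offsets within one fragment:
.DS B
```c
#include <assert.h>

/* Objects are aligned on a multiple of their size or the wordsize,
   whichever is smaller; return the first suitable offset. */
long align_object(long offset, int size, int wordsize)
{
    assert(offset >= 0 && size > 0 && wordsize > 0);
    int a = size < wordsize ? size : wordsize;
    return (offset + a - 1) / a * a;      /* round up to a multiple of a */
}

/* Switching fragment type or placing a label forces word alignment. */
long align_word(long offset, int wordsize)
{
    return align_object(offset, wordsize, wordsize);
}
```
.DE
With a wordsize of 2, a 4-byte object following a 5-byte block starts at offset 6, while a 1-byte object may start at offset 5.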
.S4 Partitioning
Two pseudoinstructions partition the input into procedures:
.IS 2
.PS - 4
.PT "PRO <pro>[,<cst>]"
Start of procedure.
<pro> is the procedure name.
<cst> is the number of bytes for locals.
The number of bytes for locals must be specified in the PRO or
END pseudoinstruction.
When specified in both, they must be identical.
.PT "END [<cst>]"
End of procedure.
<cst> is the number of bytes for locals.
The number of bytes for locals must be specified in either the PRO or
END pseudoinstruction or both.
.PE
.IE
.S4 Visibility
Names of data and procedures in an EM module can either be
internal or external.
External names are known outside the module and are used to link
several pieces of a program.
Internal names are not known outside the modules they are used in.
Other modules will not 'see' an internal name.
.A
To reduce the number of passes needed,
it must be known at the first occurrence whether
a name is internal or external.
If the first occurrence of a name is in a definition,
the name is considered to be internal.
If the first occurrence of a name is a reference,
the name is considered to be external.
If the first occurrence is in one of the following pseudoinstructions,
the effect of that pseudo takes precedence.
.IS 2
.PS - 4
.PT "EXA <dlb>"
External name.
<dlb> is known, possibly defined, outside this module.
Note that <dlb> may be defined in the same module.
.PT "EXP <pro>"
External procedure identifier.
Note that <pro> may be defined in the same module.
.PT "INA <dlb>"
Internal name.
<dlb> is internal to this module and must be defined in this module.
.PT "INP <pro>"
Internal procedure.
<pro> is internal to this module and must be defined in this module.
.PE
.IE
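.P
The first-occurrence rule can be modelled with a small symbol table.
This is a hypothetical sketch: the enum and function names, the 32-character name limit and the table size are assumptions, and later EXA, EXP, INA or INP occurrences are not checked against the decision already made.
.DS B
```c
#include <assert.h>
#include <string.h>

enum occurrence { DEFINITION, REFERENCE, PSEUDO_EXTERNAL, PSEUDO_INTERNAL };
enum visibility { INTERNAL, EXTERNAL };

#define MAXSYMS 64

struct symtab {
    int n;
    char name[MAXSYMS][33];         /* names of up to 32 characters */
    enum visibility vis[MAXSYMS];
};

/* Record one occurrence of name and return its visibility, which is
   fixed at the first occurrence and unchanged afterwards. */
enum visibility occur(struct symtab *t, const char *name, enum occurrence how)
{
    for (int i = 0; i < t->n; i++)
        if (strcmp(t->name[i], name) == 0)
            return t->vis[i];                   /* already decided */

    assert(t->n < MAXSYMS && strlen(name) < sizeof t->name[0]);
    strcpy(t->name[t->n], name);
    switch (how) {
    case DEFINITION:                            /* defined first */
    case PSEUDO_INTERNAL:                       /* INA or INP first */
        t->vis[t->n] = INTERNAL;
        break;
    case REFERENCE:                             /* referenced first */
    case PSEUDO_EXTERNAL:                       /* EXA or EXP first */
        t->vis[t->n] = EXTERNAL;
        break;
    }
    return t->vis[t->n++];
}
```
.DE
In an example such as exp $sum followed by pro $sum, the EXP comes first, so sum is external even though it is defined in the same module.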
.S4 Miscellaneous
|
||||
Two other pseudoinstructions provide miscellaneous features:
|
||||
.IS 2
|
||||
.PS - 4
|
||||
.PT "EXC <cst1>,<cst2>"
|
||||
Two blocks of instructions preceding this one are
|
||||
interchanged before being processed.
|
||||
<cst1> gives the number of lines of the first block.
|
||||
<cst2> gives the number of lines of the second one.
|
||||
Blank and pure comment lines do not count.
|
||||
.PT "MES <cst>[,<par>]*"
|
||||
A special type of comment.
|
||||
Used by compilers to communicate with the
|
||||
optimizer, assembler, etc. as follows:
|
||||
.VS 1 0
|
||||
.PS - 4
|
||||
.PT "MES 0"
|
||||
An error has occurred, stop further processing.
|
||||
.PT "MES 1"
|
||||
Suppress optimization.
|
||||
.PT "MES 2,<cst1>,<cst2>"
|
||||
Use wordsize <cst1> and pointer size <cst2>.
|
||||
.PT "MES 3,<cst1>,<cst2>,<cst3>,<cst4>"
|
||||
Indicates that a local variable is never referenced indirectly.
|
||||
Used to indicate that a register may be used for a specific
|
||||
variable.
|
||||
<cst1> is the offset in bytes from AB if positive
|
||||
and offset from LB if negative.
|
||||
<cst2> gives the size of the variable.
|
||||
<cst3> indicates the class of the variable.
|
||||
The following values are currently recognized:
|
||||
.PS
|
||||
.PT 0
|
||||
The variable can be used for anything.
|
||||
.PT 1
|
||||
The variable is used as a loop index.
|
||||
.PT 2
|
||||
The variable is used as a pointer.
|
||||
.PT 3
|
||||
The variable is used as a floating point number.
|
||||
.PE 0
|
||||
<cst4> gives the priority of the variable,
|
||||
higher numbers indicate better candidates.
|
||||
.PT "MES 4,<cst>,<str>"
|
||||
Number of source lines in file <str> (for profiler).
|
||||
.PT "MES 5"
|
||||
Floating point used.
|
||||
.PT "MES 6,<val>*"
|
||||
Comment. Used to provide comments in compact assembly language.
|
||||
.PT "MES 7,....."
|
||||
Reserved.
|
||||
.PT "MES 8,<pro>[,<dlb>]..."
|
||||
Library module. Indicates that the module may only be loaded
|
||||
if it is useful, that is, if it can satisfy any unresolved
|
||||
references during the loading process.
|
||||
May not be preceded by any other pseudo, except MES's.
|
||||
.PT "MES 9,<cst>"
|
||||
Guarantees that no more than <cst> bytes of parameters are
|
||||
accessed, either directly or indirectly.
|
||||
.PE 1
|
||||
.VS 1 1
|
||||
Each backend is free to skip irrelevant MES pseudos.
|
||||
.PE
|
||||
.IE
|
||||
.S2 "The Compact Assembly Language"
|
||||
The assembler accepts input in a highly encoded form.
|
||||
This
|
||||
form is intended to reduce the amount of file transport between the
|
||||
front ends, optimizers
|
||||
and back ends, and to reduce the amount of storage required for
|
||||
libraries.
|
||||
Libraries are stored as archived compact assembly language, not machine
|
||||
language.
|
||||
.P
|
||||
When beginning to read the input, the assembler is in neutral state, and
|
||||
expects either a label or an instruction (including the pseudoinstructions).
|
||||
The meaning of the next byte(s) when in neutral state is as follows, where
|
||||
b1, b2
|
||||
etc. represent the succeeding bytes.
|
||||
.N 1
|
||||
.DS
|
||||
.TS
|
||||
tab(:) ;
|
||||
rw17 4 l.
|
||||
0:Reserved for future use
|
||||
1-129:Machine instructions, see Appendix A, alphabetical list
|
||||
130-149:Reserved for future use
|
||||
150-161:BSS,CON,END,EXA,EXC,EXP,HOL,INA,INP,MES,PRO,ROM
|
||||
162-179:Reserved for future pseudoinstructions
|
||||
180-239:Instruction labels 0 - 59 (180 is local label 0 etc.)
|
||||
240-244:See the Common Table below
|
||||
245-255:Not used
|
||||
.TE 1
|
||||
.DE 0
|
||||
After a label, the assembler is back in neutral state; it can immediately
|
||||
accept another label or an instruction in the next byte.
|
||||
No linefeeds are used to separate lines.
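.P
The neutral-state table above can be modelled by a small classification
function. The C sketch assumes that one input byte has already been read.
The names classify_neutral, neutral_kind and pseudo_names are hypothetical.
Only the byte ranges and the order of the pseudoinstructions come from the table.

```c
/* Kinds of bytes the assembler can meet in neutral state. */
enum neutral_kind {
    NK_RESERVED,     /* 0, 130-149, 162-179 */
    NK_INSTRUCTION,  /* 1-129: machine instruction number */
    NK_PSEUDO,       /* 150-161: pseudoinstruction */
    NK_LABEL,        /* 180-239: instruction label 0-59 */
    NK_COMMON,       /* 240-244: see the Common Table */
    NK_UNUSED        /* 245-255 */
};

/* Pseudoinstructions in byte order, starting at byte 150. */
static const char *const pseudo_names[12] = {
    "BSS", "CON", "END", "EXA", "EXC", "EXP",
    "HOL", "INA", "INP", "MES", "PRO", "ROM"
};

/* Classifies byte b. *value receives the label number for NK_LABEL,
   the index into pseudo_names for NK_PSEUDO, and b itself otherwise. */
static enum neutral_kind classify_neutral(unsigned char b, int *value)
{
    *value = b;
    if (b >= 1 && b <= 129)
        return NK_INSTRUCTION;
    if (b >= 150 && b <= 161) {
        *value = b - 150;
        return NK_PSEUDO;
    }
    if (b >= 180 && b <= 239) {
        *value = b - 180;
        return NK_LABEL;
    }
    if (b >= 240 && b <= 244)
        return NK_COMMON;
    if (b >= 245)
        return NK_UNUSED;
    return NK_RESERVED;
}
```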
|
||||
.P
|
||||
If an opcode expects no arguments,
|
||||
the assembler is back in neutral state after
|
||||
reading the one byte containing the instruction number.
|
||||
If it has one or
|
||||
more arguments (only pseudos have more than 1), the arguments follow directly,
|
||||
encoded as follows:
|
||||
.N 1
|
||||
.IS 2
|
||||
.TS
|
||||
tab(:);
|
||||
r l.
|
||||
0-239:Values from -120 to 119 (byte value minus 120)
|
||||
|
||||
240-255:See the Common Table below
|
||||
.TE 1
|
||||
Absence of an optional argument is indicated by the special <end>
|
||||
byte.
|
||||
.IE 2
|
||||
.CS
|
||||
Common Table for Neutral State and Arguments
|
||||
.CE
|
||||
.TS
|
||||
tab(:);
|
||||
c c s c
|
||||
l8 l l8 l.
|
||||
class:bytes:description
|
||||
|
||||
<ilb>:240:b1:Instruction label b1 (Not used for branches)
|
||||
<ilb>:241:b1 b2:16 bit instruction label (256*b2 + b1)
|
||||
<dlb>:242:b1:Global label .0-.255, with b1 being the label
|
||||
<dlb>:243:b1 b2:Global label .0-.32767
|
||||
:::with 256*b2+b1 being the label
|
||||
<dlb>:244:<string>:Global symbol not of the form .nnn
|
||||
<cst>:245:b1 b2:16 bit constant
|
||||
<cst>:246:b1 b2 b3 b4:32 bit constant
|
||||
<cst>:247:b1 .. b8:64 bit constant
|
||||
<arg>:248:<dlb><cst>:Global label + (possibly negative) constant
|
||||
<pro>:249:<string>:Procedure name (not including $)
|
||||
<str>:250:<string>:String used in CON or ROM (no quotes, no escapes)
|
||||
<con>:251:<cst><string>:Integer constant, size <cst> bytes
|
||||
<con>:252:<cst><string>:Unsigned constant, size <cst> bytes
|
||||
<con>:253:<cst><string>:Floating constant, size <cst> bytes
|
||||
:254::unused
|
||||
<end>:255::Delimiter for argument lists or
|
||||
:::indicates absence of optional argument
|
||||
.TE 1
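.P
Here is a minimal C sketch of how a reader of compact code might handle the
first byte of an argument. Bytes 0 to 239 carry a small value directly.
Any byte from 240 up selects an entry of the Common Table above. The
function names are hypothetical.

```c
#include <stdbool.h>

#define CT_END 255  /* <end>: list delimiter or absent optional argument */

/* Decodes the first byte of an argument. Returns true and stores the value
   for the one-byte form (0-239 encode -120 to 119). Returns false when b
   selects a Common Table entry (240-255), which the caller must then handle,
   for example 245 followed by the two bytes of a 16 bit constant. */
static bool decode_short_arg(unsigned char b, int *value)
{
    if (b < 240) {
        *value = (int)b - 120;
        return true;
    }
    return false;
}

/* True if b ends an argument list or marks an absent optional argument. */
static bool is_end_marker(unsigned char b)
{
    return b == CT_END;
}
```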
|
||||
.P
|
||||
The bytes specifying the value of a 16, 32 or 64 bit constant
|
||||
are presented in two's complement notation, with the least
|
||||
significant byte first. For example: the value of a 32 bit
|
||||
constant is ((s4*256+b3)*256+b2)*256+b1, where s4 is b4-256 if
|
||||
b4 is greater than or equal to 128, else s4 takes the value of b4.
|
||||
A <string> consists of a <cst> immediately followed by
|
||||
a sequence of bytes with length <cst>.
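.P
The constant formula above generalizes to the 16 and 64 bit forms. The C
sketch below is one way to evaluate it. The name decode_cst is hypothetical.
The sketch treats a most significant byte of 128 or more as negative, as
two's complement notation requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes an n-byte (n = 2, 4 or 8) constant stored least significant byte
   first in two's complement, i.e. the bytes following 245, 246 or 247. */
static int64_t decode_cst(const unsigned char *b, size_t n)
{
    /* Sign-extend the most significant byte: s = b - 256 if b >= 128. */
    int64_t v = b[n - 1] >= 128 ? (int64_t)b[n - 1] - 256 : (int64_t)b[n - 1];

    /* Horner form of ((s*256 + b[n-2])*256 + ...)*256 + b[0]. */
    for (size_t i = n - 1; i-- > 0; )
        v = v * 256 + b[i];
    return v;
}
```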
|
||||
.P
|
||||
.ne 8
|
||||
The pseudoinstructions fall into several categories, depending on their
|
||||
arguments:
|
||||
.N 1
|
||||
.DS
|
||||
Group 1 -- EXC, BSS, HOL have a known number of arguments
|
||||
Group 2 -- EXA, EXP, INA, INP have a string as argument
|
||||
Group 3 -- CON, MES, ROM have a variable number of various things
|
||||
Group 4 -- END, PRO have a trailing optional argument.
|
||||
.DE 1
|
||||
Groups 1 and 2
|
||||
use the encoding described above.
|
||||
Group 3 also uses the encoding listed above, with an <end> byte after the
|
||||
last argument to indicate the end of the list.
|
||||
Group 4 uses
|
||||
an <end> byte if the trailing argument is not present.
|
||||
.N 2
|
||||
.IS 2
|
||||
.TS
|
||||
tab(|);
|
||||
l s l
|
||||
l s s
|
||||
l 2 lw(46) l.
|
||||
Example ASCII|Example compact
|
||||
(LOC = 69, BRA = 18 here):
|
||||
|
||||
2||182
|
||||
1||181
|
||||
LOC|10|69 130
|
||||
LOC|-10|69 110
|
||||
LOC|300|69 245 44 1
|
||||
BRA|*19|18 139
|
||||
300||241 44 1
|
||||
.3||242 3
|
||||
CON|4,9,*2,$foo|151 124 129 240 2 249 123 102 111 111 255
|
||||
CON|.35|151 242 35 255
|
||||
.TE 0
|
||||
.IE 0
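.P
Going the other way, a front end must choose an encoding for each constant.
The C sketch below assumes that the shortest applicable form is always used.
The text does not say whether that is required. The name encode_cst is
hypothetical. For 10, -10 and 300 it produces the argument bytes shown in the
LOC lines of the example table.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes a <cst> argument into out (room for at least 9 bytes) and returns
   the number of bytes written. One byte for -120..119, otherwise
   245 (16 bit), 246 (32 bit) or 247 (64 bit) followed by the value. */
static size_t encode_cst(int64_t v, unsigned char *out)
{
    size_t n;

    if (v >= -120 && v <= 119) {
        out[0] = (unsigned char)(v + 120);
        return 1;
    }
    if (v >= INT16_MIN && v <= INT16_MAX) {
        out[0] = 245;
        n = 2;
    } else if (v >= INT32_MIN && v <= INT32_MAX) {
        out[0] = 246;
        n = 4;
    } else {
        out[0] = 247;
        n = 8;
    }
    /* Two's complement bytes, least significant first. */
    uint64_t u = (uint64_t)v;
    for (size_t i = 0; i < n; i++)
        out[1 + i] = (unsigned char)(u >> (8 * i));
    return 1 + n;
}
```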
|
||||
.BP
|
||||
.S2 "Assembly language instruction list"
|
||||
.P
|
||||
For each instruction in the list the range of argument values
|
||||
in the assembly language is given.
|
||||
The column headed \fIassem\fP contains the mnemonics defined
|
||||
in 11.1.3.
|
||||
The following column specifies restrictions on the argument
|
||||
value.
|
||||
Addresses have to obey the restrictions mentioned in chapter 2.
|
||||
The classes of arguments
|
||||
are indicated by letters:
|
||||
.ds b \fBb\fP
|
||||
.ds c \fBc\fP
|
||||
.ds d \fBd\fP
|
||||
.ds g \fBg\fP
|
||||
.ds f \fBf\fP
|
||||
.ds l \fBl\fP
|
||||
.ds n \fBn\fP
|
||||
.ds w \fBw\fP
|
||||
.ds p \fBp\fP
|
||||
.ds r \fBr\fP
|
||||
.ds s \fBs\fP
|
||||
.ds z \fBz\fP
|
||||
.ds o \fBo\fP
|
||||
.ds - \fB-\fP
|
||||
.N 1
|
||||
.TS
|
||||
tab(:);
|
||||
c s l l
|
||||
l l 15 l l.
|
||||
\fIassem\fP:constraints:rationale
|
||||
|
||||
\&\*c:cst:fits word:constant
|
||||
\&\*d:cst:fits double word:constant
|
||||
\&\*l:cst::local offset
|
||||
\&\*g:arg:>= 0:global offset
|
||||
\&\*f:cst::fragment offset
|
||||
\&\*n:cst:>= 0:counter
|
||||
\&\*s:cst:>0 , word multiple:object size
|
||||
\&\*z:cst:>= 0 , zero or word multiple:object size
|
||||
\&\*o:cst:>= 0 , word multiple or fraction:object size
|
||||
\&\*w:cst:> 0 , word multiple:object size *
|
||||
\&\*p:pro::pro identifier
|
||||
\&\*b:ilb:>= 0:label number
|
||||
\&\*r:cst:0,1,2:register number
|
||||
\&\*-:::no argument
|
||||
.TE 1
|
||||
.P
|
||||
The * at the rationale for \*w indicates that the argument
|
||||
can either be given as argument or on top of the stack.
|
||||
If the argument is omitted, the argument is fetched from the
|
||||
stack;
|
||||
it is assumed to be a wordsized unsigned integer.
|
||||
Instructions that check for undefined integer or floating-point
|
||||
values and underflow or overflow
|
||||
are indicated below by (*).
|
||||
.N 1
|
||||
.DS B
|
||||
GROUP 1 - LOAD
|
||||
|
||||
LOC \*c : Load constant (i.e. push one word onto the stack)
|
||||
LDC \*d : Load double constant (push two words)
|
||||
LOL \*l : Load word at \*l-th local (\*l<0) or parameter (\*l>=0)
|
||||
LOE \*g : Load external word \*g
|
||||
LIL \*l : Load word pointed to by \*l-th local or parameter
|
||||
LOF \*f : Load offsetted (top of stack + \*f yield address)
|
||||
LAL \*l : Load address of local or parameter
|
||||
LAE \*g : Load address of external
|
||||
LXL \*n : Load lexical (address of LB \*n static levels back)
|
||||
LXA \*n : Load lexical (address of AB \*n static levels back)
|
||||
LOI \*o : Load indirect \*o bytes (address is popped from the stack)
|
||||
LOS \*w : Load indirect, \*w-byte integer on top of stack gives object size
|
||||
LDL \*l : Load double local or parameter (two consecutive words are stacked)
|
||||
LDE \*g : Load double external (two consecutive externals are stacked)
|
||||
LDF \*f : Load double offsetted (top of stack + \*f yield address)
|
||||
LPI \*p : Load procedure identifier
|
||||
|
||||
GROUP 2 - STORE
|
||||
|
||||
STL \*l : Store local or parameter
|
||||
STE \*g : Store external
|
||||
SIL \*l : Store into word pointed to by \*l-th local or parameter
|
||||
STF \*f : Store offsetted
|
||||
STI \*o : Store indirect \*o bytes (pop address, then data)
|
||||
STS \*w : Store indirect, \*w-byte integer on top of stack gives object size
|
||||
SDL \*l : Store double local or parameter
|
||||
SDE \*g : Store double external
|
||||
SDF \*f : Store double offsetted
|
||||
|
||||
GROUP 3 - INTEGER ARITHMETIC
|
||||
|
||||
ADI \*w : Addition (*)
|
||||
SBI \*w : Subtraction (*)
|
||||
MLI \*w : Multiplication (*)
|
||||
DVI \*w : Division (*)
|
||||
RMI \*w : Remainder (*)
|
||||
NGI \*w : Negate (two's complement) (*)
|
||||
SLI \*w : Shift left (*)
|
||||
SRI \*w : Shift right (*)
|
||||
|
||||
GROUP 4 - UNSIGNED ARITHMETIC
|
||||
|
||||
ADU \*w : Addition
|
||||
SBU \*w : Subtraction
|
||||
MLU \*w : Multiplication
|
||||
DVU \*w : Division
|
||||
RMU \*w : Remainder
|
||||
SLU \*w : Shift left
|
||||
SRU \*w : Shift right
|
||||
|
||||
GROUP 5 - FLOATING POINT ARITHMETIC
|
||||
|
||||
ADF \*w : Floating add (*)
|
||||
SBF \*w : Floating subtract (*)
|
||||
MLF \*w : Floating multiply (*)
|
||||
DVF \*w : Floating divide (*)
|
||||
NGF \*w : Floating negate (*)
|
||||
FIF \*w : Floating multiply and split integer and fraction part (*)
|
||||
FEF \*w : Split floating number into exponent and fraction part (*)
|
||||
|
||||
GROUP 6 - POINTER ARITHMETIC
|
||||
|
||||
ADP \*f : Add \*f to pointer on top of stack
|
||||
ADS \*w : Add \*w-byte value and pointer
|
||||
SBS \*w : Subtract pointers in same fragment and push diff as size \*w integer
|
||||
|
||||
GROUP 7 - INCREMENT/DECREMENT/ZERO
|
||||
|
||||
INC \*- : Increment word on top of stack by 1 (*)
|
||||
INL \*l : Increment local or parameter (*)
|
||||
INE \*g : Increment external (*)
|
||||
DEC \*- : Decrement word on top of stack by 1 (*)
|
||||
DEL \*l : Decrement local or parameter (*)
|
||||
DEE \*g : Decrement external (*)
|
||||
ZRL \*l : Zero local or parameter
|
||||
ZRE \*g : Zero external
|
||||
ZRF \*w : Load a floating zero of size \*w
|
||||
ZER \*w : Load \*w zero bytes
|
||||
|
||||
GROUP 8 - CONVERT (stack: source, source size, dest. size (top))
|
||||
|
||||
CII \*- : Convert integer to integer (*)
|
||||
CUI \*- : Convert unsigned to integer (*)
|
||||
CFI \*- : Convert floating to integer (*)
|
||||
CIF \*- : Convert integer to floating (*)
|
||||
CUF \*- : Convert unsigned to floating (*)
|
||||
CFF \*- : Convert floating to floating (*)
|
||||
CIU \*- : Convert integer to unsigned
|
||||
CUU \*- : Convert unsigned to unsigned
|
||||
CFU \*- : Convert floating to unsigned
|
||||
|
||||
GROUP 9 - LOGICAL
|
||||
|
||||
AND \*w : Boolean and on two groups of \*w bytes
|
||||
IOR \*w : Boolean inclusive or on two groups of \*w bytes
|
||||
XOR \*w : Boolean exclusive or on two groups of \*w bytes
|
||||
COM \*w : Complement (one's complement of top \*w bytes)
|
||||
ROL \*w : Rotate left a group of \*w bytes
|
||||
ROR \*w : Rotate right a group of \*w bytes
|
||||
|
||||
GROUP 10 - SETS
|
||||
|
||||
INN \*w : Bit test on \*w byte set (bit number on top of stack)
|
||||
SET \*w : Create singleton \*w byte set with bit n on (n is top of stack)
|
||||
|
||||
GROUP 11 - ARRAY
|
||||
|
||||
LAR \*w : Load array element, descriptor contains integers of size \*w
|
||||
SAR \*w : Store array element
|
||||
AAR \*w : Load address of array element
|
||||
|
||||
GROUP 12 - COMPARE
|
||||
|
||||
CMI \*w : Compare \*w byte integers, push negative, zero, positive for <, = or >
|
||||
CMF \*w : Compare \*w byte reals
|
||||
CMU \*w : Compare \*w byte unsigneds
|
||||
CMS \*w : Compare \*w byte values, can only be used for bit for bit equality test
|
||||
CMP \*- : Compare pointers
|
||||
|
||||
TLT \*- : True if less, i.e. iff top of stack < 0
|
||||
TLE \*- : True if less or equal, i.e. iff top of stack <= 0
|
||||
TEQ \*- : True if equal, i.e. iff top of stack = 0
|
||||
TNE \*- : True if not equal, i.e. iff top of stack non zero
|
||||
TGE \*- : True if greater or equal, i.e. iff top of stack >= 0
|
||||
TGT \*- : True if greater, i.e. iff top of stack > 0
|
||||
|
||||
GROUP 13 - BRANCH
|
||||
|
||||
BRA \*b : Branch unconditionally to label \*b
|
||||
|
||||
BLT \*b : Branch less (pop 2 words, branch if top > second)
|
||||
BLE \*b : Branch less or equal
|
||||
BEQ \*b : Branch equal
|
||||
BNE \*b : Branch not equal
|
||||
BGE \*b : Branch greater or equal
|
||||
BGT \*b : Branch greater
|
||||
|
||||
ZLT \*b : Branch less than zero (pop 1 word, branch negative)
|
||||
ZLE \*b : Branch less or equal to zero
|
||||
ZEQ \*b : Branch equal zero
|
||||
ZNE \*b : Branch not zero
|
||||
ZGE \*b : Branch greater or equal zero
|
||||
ZGT \*b : Branch greater than zero
|
||||
|
||||
GROUP 14 - PROCEDURE CALL
|
||||
|
||||
CAI \*- : Call procedure (procedure identifier on stack)
|
||||
CAL \*p : Call procedure (with identifier \*p)
|
||||
LFR \*s : Load function result
|
||||
RET \*z : Return (function result consists of top \*z bytes)
|
||||
|
||||
GROUP 15 - MISCELLANEOUS
|
||||
|
||||
ASP \*f : Adjust the stack pointer by \*f
|
||||
ASS \*w : Adjust the stack pointer by \*w-byte integer
|
||||
BLM \*z : Block move \*z bytes; first pop destination addr, then source addr
|
||||
BLS \*w : Block move, size is in \*w-byte integer on top of stack
|
||||
CSA \*w : Case jump; address of jump table at top of stack
|
||||
CSB \*w : Table lookup jump; address of jump table at top of stack
|
||||
DCH \*- : Follow dynamic chain, convert LB to LB of caller
|
||||
DUP \*s : Duplicate top \*s bytes
|
||||
DUS \*w : Duplicate top \*w bytes
|
||||
EXG \*w : Exchange top \*w bytes
|
||||
FIL \*g : File name (external 4 := \*g)
|
||||
GTO \*g : Non-local goto, descriptor at \*g
|
||||
LIM \*- : Load 16 bit ignore mask
|
||||
LIN \*n : Line number (external 0 := \*n)
|
||||
LNI \*- : Line number increment
|
||||
LOR \*r : Load register (0=LB, 1=SP, 2=HP)
|
||||
LPB \*- : Convert local base to argument base
|
||||
MON \*- : Monitor call
|
||||
NOP \*- : No operation
|
||||
RCK \*w : Range check; trap on error
|
||||
RTT \*- : Return from trap
|
||||
SIG \*- : Trap errors to proc identifier on top of stack, -2 resets default
|
||||
SIM \*- : Store 16 bit ignore mask
|
||||
STR \*r : Store register (0=LB, 1=SP, 2=HP)
|
||||
TRP \*- : Cause trap to occur (Error number on stack)
|
||||
.DE 0
|
164
doc/em/descr.nr
Normal file
|
@ -0,0 +1,164 @@
|
|||
.SN 7
|
||||
.BP
|
||||
.S1 "DESCRIPTORS"
|
||||
Several instructions use descriptors, notably the range check instruction,
|
||||
the array instructions, the goto instruction and the case jump instructions.
|
||||
Descriptors reside in data space.
|
||||
They may be constructed at run time, but
|
||||
more often they are fixed and allocated in ROM data.
|
||||
.P
|
||||
All instructions using descriptors, except GTO, have as argument
|
||||
the size of the integers in the descriptor.
|
||||
All implementations have to allow integers of the size of a
|
||||
word in descriptors.
|
||||
All integers popped from the stack and used for indexing or comparing
|
||||
must have the same size as the integers in the descriptor.
|
||||
.S2 "Range check descriptors"
|
||||
Range check descriptors consist of two integers:
|
||||
.IS 2
|
||||
.PS 1 4 "" .
|
||||
.PT
|
||||
lower bound~~~~~~~signed
|
||||
.PT
|
||||
upper bound~~~~~~~signed
|
||||
.PE
|
||||
.IE
|
||||
The range check instruction checks an integer on the stack against
|
||||
these bounds and causes a trap if the value is outside the interval.
|
||||
The value itself is neither changed nor removed from the stack.
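.P
The check performed by RCK can be sketched in C as follows. The sketch
assumes a descriptor of 4-byte integers. The structure and function names
are hypothetical. A real implementation would raise the range trap instead
of returning a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Range check descriptor with 4-byte integers (assumed word size). */
struct rck_desc {
    int32_t lower;  /* lower bound, signed */
    int32_t upper;  /* upper bound, signed */
};

/* Returns true if RCK would trap for value v; v itself is not changed. */
static bool rck_traps(const struct rck_desc *d, int32_t v)
{
    return v < d->lower || v > d->upper;
}
```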
|
||||
.S2 "Array descriptors"
|
||||
Each array descriptor describes a single dimension.
|
||||
For multi-dimensional arrays, several array instructions are
|
||||
needed to access a single element.
|
||||
Array descriptors contain the following three integers:
|
||||
.IS 2
|
||||
.PS 1 4 "" .
|
||||
.PT
|
||||
lower bound~~~~~~~~~~~~~~~~~~~~~signed
|
||||
.PT
|
||||
upper bound - lower bound~~~~~~~unsigned
|
||||
.PT
|
||||
number of bytes per element~~~~~unsigned
|
||||
.PE
|
||||
.IE
|
||||
The array instructions LAR, SAR and AAR have the pointer to the start
|
||||
of the descriptor as operand on the stack.
|
||||
.sp
|
||||
The element A[I] is fetched as follows:
|
||||
.IS 2
|
||||
.PS 1 4 "" .
|
||||
.PT
|
||||
Stack the address of A (e.g., using LAE or LAL)
|
||||
.PT
|
||||
Stack the value of I (n-byte integer)
|
||||
.PT
|
||||
Stack the pointer to the descriptor (e.g., using LAE)
|
||||
.PT
|
||||
LAR n (n is the size of the integers in the descriptor and I)
|
||||
.PE
|
||||
.IE
|
||||
All array instructions first pop the address of the descriptor
|
||||
and the index.
|
||||
If the index is not within the bounds specified, a trap occurs.
|
||||
Otherwise, (I~-~lower bound) is multiplied
|
||||
by the number of bytes per element (the third integer). The result is added
|
||||
to the address of A and replaces A on the stack.
|
||||
.A
|
||||
At this point LAR, SAR and AAR diverge.
|
||||
AAR is finished. LAR pops the address and fetches the data
|
||||
item,
|
||||
the size being specified by the descriptor.
|
||||
The usual restrictions for memory access must be obeyed.
|
||||
SAR pops the address and stores the
|
||||
data item now exposed.
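.P
The index computation shared by LAR, SAR and AAR can be sketched in C. The
sketch assumes 4-byte integers in the descriptor. The names array_desc and
array_offset are hypothetical, and a false result stands for the array bound
trap. AAR would push the address of A plus the computed offset. LAR and SAR
would then load or store an element of the size given by the third integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* Array descriptor with 4-byte integers (assumed word size). */
struct array_desc {
    int32_t  lower;   /* lower bound, signed */
    uint32_t range;   /* upper bound - lower bound, unsigned */
    uint32_t elsize;  /* number of bytes per element */
};

/* Computes the byte offset added to the address of A.
   Returns false where the instruction would trap. */
static bool array_offset(const struct array_desc *d, int32_t index,
                         uint64_t *offset)
{
    int64_t rel = (int64_t)index - d->lower;  /* I - lower bound */

    if (rel < 0 || rel > (int64_t)d->range)
        return false;                         /* index out of bounds: trap */
    *offset = (uint64_t)rel * d->elsize;
    return true;
}
```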
|
||||
.S2 "Non-local goto descriptors"
|
||||
The GTO instruction provides a way of returning directly to any
|
||||
active procedure invocation.
|
||||
The argument of the instruction is the address of a descriptor
|
||||
containing three pointers:
|
||||
.IS 2
|
||||
.PS 1 4 "" .
|
||||
.PT
|
||||
value of PC after the jump
|
||||
.PT
|
||||
value of SP after the jump
|
||||
.PT
|
||||
value of LB after the jump
|
||||
.PE
|
||||
.IE
|
||||
GTO loads PC, SP and LB from the descriptor,
|
||||
thereby jumping to a procedure
|
||||
and removing zero or more frames from the stack.
|
||||
The LB, SP and PC in the descriptor must belong to a
|
||||
dynamically enclosing procedure,
|
||||
because some EM implementations will need to backtrack through
|
||||
the dynamic chain and use the implementation dependent data
|
||||
in frames to restore registers etc.
|
||||
.S2 "Case descriptors"
|
||||
The case jump instructions CSA and CSB both
|
||||
provide multiway branches selected by a case index.
|
||||
Both fetch two operands from the stack:
|
||||
first a pointer to the low address of the case descriptor
|
||||
and then the case index.
|
||||
CSA uses the case index as index in the descriptor table, but CSB searches
|
||||
the table for an occurrence of the case index.
|
||||
Therefore, the descriptors for CSA and CSB,
|
||||
as shown in figure 4, are different.
|
||||
All pointers in the table must be addresses of instructions in the
|
||||
procedure executing the case instruction.
|
||||
.P
|
||||
CSA selects the new PC by indexing.
|
||||
If the index, a signed integer, is greater than or equal to
|
||||
the lower bound and less than or equal to the upper bound,
|
||||
then fetch the new PC from the list of instruction pointers by indexing with
|
||||
index-lower.
|
||||
The table does not contain the value of the upper bound,
|
||||
but the value of upper-lower as an unsigned integer.
|
||||
If the index is out of bounds or if the fetched pointer is 0,
|
||||
then fetch the default instruction pointer.
|
||||
If the resulting PC is 0, then trap.
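.P
The CSA selection rule can be sketched in C, including the fall-back to the
default pointer. Instruction addresses are modelled as plain integers, with
0 meaning no target. The names csa_desc and csa_target are hypothetical. The
sketch shows the descriptor as a structure rather than as raw integers in
data space.

```c
#include <stdint.h>

/* CSA descriptor, modelled as a structure. */
struct csa_desc {
    uintptr_t        default_pc;  /* default pointer */
    int32_t          lower;       /* lower bound, signed */
    uint32_t         range;       /* upper - lower, unsigned */
    const uintptr_t *pcs;         /* range + 1 pointers, for lwb .. upb */
};

/* Returns the new PC; a result of 0 means CSA traps. */
static uintptr_t csa_target(const struct csa_desc *d, int32_t index)
{
    int64_t   rel = (int64_t)index - d->lower;  /* index - lower */
    uintptr_t pc = 0;

    if (rel >= 0 && rel <= (int64_t)d->range)
        pc = d->pcs[rel];
    if (pc == 0)          /* out of bounds or empty slot */
        pc = d->default_pc;
    return pc;            /* still 0: trap */
}
```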
|
||||
.P
|
||||
CSB selects the new PC by searching.
|
||||
The table is searched for an entry with index value equal to the case index.
|
||||
That entry or, if none is found, the default entry contains the
|
||||
new PC.
|
||||
When the resulting PC is 0, a trap is performed.
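.P
Here is a corresponding C sketch for CSB that uses a linear search. The text
does not prescribe a search method, so an implementation could equally use
binary search over a sorted table. The names csb_desc, csb_entry and
csb_target are hypothetical. A matching entry whose pointer is 0 traps. It
does not fall back to the default.

```c
#include <stdint.h>

/* One (index, pointer) pair of a CSB descriptor. */
struct csb_entry {
    int32_t   index;  /* case index value */
    uintptr_t pc;     /* pointer for that value */
};

/* CSB descriptor, modelled as a structure. */
struct csb_desc {
    uintptr_t               default_pc;  /* default pointer */
    uint32_t                count;       /* number of entries */
    const struct csb_entry *entries;
};

/* Returns the new PC; a result of 0 means CSB traps. */
static uintptr_t csb_target(const struct csb_desc *d, int32_t index)
{
    for (uint32_t i = 0; i < d->count; i++)
        if (d->entries[i].index == index)
            return d->entries[i].pc;  /* found: may itself be 0 (trap) */
    return d->default_pc;             /* not found: default, 0 means trap */
}
```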
|
||||
.P
|
||||
The choice of which case instruction to use for
|
||||
each source language case statement
|
||||
is up to the front end.
|
||||
If the range of the index value is dense, i.e.
|
||||
.DS
|
||||
(highest value - lowest value) / number of cases
|
||||
.DE 1
|
||||
is less than some threshold, then CSA is the obvious choice.
|
||||
If the range is sparse, CSB is better.
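.P
The density test can be written as a one-line predicate. The front end
chooses the threshold, so the sketch takes it as a parameter. The name
prefer_csa and any threshold value passed to it are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if case values low..high with ncases labels are dense enough
   for CSA, i.e. (high - low) / ncases < threshold. */
static bool prefer_csa(int64_t low, int64_t high, uint32_t ncases,
                       double threshold)
{
    if (ncases == 0)
        return false;  /* no cases: nothing to index */
    return (double)(high - low) / ncases < threshold;
}
```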
|
||||
.N 2
|
||||
.DS
|
||||
|--------------------| |--------------------| high address
|
||||
| pointer for upb | | pointer n-1 |
|
||||
|--------------------| |- - - - - - - |
|
||||
| . | | index n-1 |
|
||||
| . | |--------------------|
|
||||
| . | | . |
|
||||
| . | | . |
|
||||
| . | | . |
|
||||
| . | |--------------------|
|
||||
| . | | pointer 1 |
|
||||
|--------------------| |- - - - - - - |
|
||||
| pointer for lwb+1 | | index 1 |
|
||||
|--------------------| |--------------------|
|
||||
| pointer for lwb | | pointer 0 |
|
||||
|--------------------| |- - - - - - - |
|
||||
| upper - lower | | index 0 |
|
||||
|--------------------| |--------------------|
|
||||
| lower bound | | number of entries |
|
||||
|--------------------| |--------------------|
|
||||
| default pointer | | default pointer | low address
|
||||
|--------------------| |--------------------|
|
||||
|
||||
CSA descriptor CSB descriptor
|
||||
|
||||
|
||||
Figure 4. Descriptor layout for CSA and CSB
|
||||
.DE
|
377
doc/em/dspace.nr
Normal file
|
@ -0,0 +1,377 @@
|
|||
.BP
|
||||
.SN 4
|
||||
.S1 "DATA ADDRESS SPACE"
|
||||
The data address space is divided into three parts, called 'areas',
|
||||
each with its own addressing method:
|
||||
global data area,
|
||||
local data area (including the stack),
|
||||
and heap data area.
|
||||
These data areas must be part of the same
|
||||
address space because all data is accessed by
|
||||
the same type of pointers.
|
||||
.P
|
||||
Space for global data is reserved using several pseudoinstructions in the
|
||||
assembly language, as described in
|
||||
the next paragraph and chapter 11.
|
||||
The size of the global data area is fixed per program.
|
||||
.A
|
||||
Global data is addressed absolutely in the machine language.
|
||||
Many instructions are available to address global data.
|
||||
They all have an absolute address as argument.
|
||||
Examples are LOE, LAE and STE.
|
||||
.P
|
||||
Part of the global data area is initialized by the
|
||||
compiler, the
|
||||
rest is not initialized at all or is initialized
|
||||
with a value, typically -32768 or 0.
|
||||
Part of the initialized global data may be made read-only
|
||||
if the implementation supports protection.
|
||||
.P
|
||||
The local data area is used as a stack,
|
||||
which grows from high to low addresses
|
||||
and contains some data for each active procedure
|
||||
invocation, called a 'frame'.
|
||||
The size of the local data area varies dynamically during
|
||||
execution.
|
||||
Below the current procedure frame resides the operand stack.
|
||||
The stack pointer SP always points to the bottom of
|
||||
the local data area.
|
||||
Local data is addressed by offsetting from the local base pointer LB.
|
||||
LB always points to the frame of the current procedure.
|
||||
Only the words of the current frame and the parameters
|
||||
can be addressed directly.
|
||||
Variables in other active procedures are addressed by following
|
||||
the chain of statically enclosing procedures using the LXL or LXA instruction.
|
||||
The variables in dynamically enclosing procedures can be
|
||||
addressed with the use of the DCH instruction.
|
||||
.A
|
||||
Many instructions have offsets to LB as argument,
|
||||
for instance LOL, LAL and STL.
|
||||
The arguments of these instructions range from -1 to some
|
||||
(negative) minimum
|
||||
for the access of local storage and from 0 to some (positive)
|
||||
maximum for parameter access.
|
||||
.P
|
||||
The procedure call instructions CAL and CAI each create a new frame
|
||||
on the stack.
|
||||
Each procedure has an assembly-time parameter specifying
|
||||
the number of bytes needed for local storage.
|
||||
This storage is allocated each time the procedure is called and
|
||||
must be a multiple of the wordsize.
|
||||
Each procedure, therefore, starts with a stack with the local variables
|
||||
already allocated.
|
||||
The return instructions RET and RTT remove a frame.
|
||||
The actual parameters must be removed by the calling procedure.
|
||||
.P
|
||||
RET may copy some words from the stack of
|
||||
the returning procedure to an unnamed 'function return area'.
|
||||
This area is available for 'READ-ONCE' access using the LFR instruction.
|
||||
The result of an LFR is only defined if the size used to fetch
|
||||
is identical to the size used in the last return.
|
||||
The instruction ASP, used to remove the parameters from the
|
||||
stack, the branch instruction BRA and the non-local goto
|
||||
instruction GTO are the only ones that leave the contents of
|
||||
the 'function return area' intact.
|
||||
All other instructions are allowed to destroy the function
|
||||
return area.
|
||||
Thus parameters can be popped before fetching the function result.
|
||||
The maximum size of all function return areas is
|
||||
implementation dependent,
|
||||
but should allow procedure instance identifiers and all
|
||||
implemented objects of type integer, unsigned, float
|
||||
and pointer to be returned.
|
||||
In most implementations
|
||||
the maximum size of the function return
|
||||
area is twice the pointer size,
|
||||
because we want to be able to handle 'procedure instance
|
||||
identifiers' which consist of a procedure identifier and the LB
|
||||
of a frame belonging to that procedure.
|
||||
.P
|
||||
The heap data area grows upwards, to higher numbered
|
||||
addresses.
|
||||
It is initially empty.
|
||||
The initial value of the heap pointer HP
|
||||
marks the low end.
|
||||
The heap pointer may be manipulated
|
||||
by the LOR and STR instructions.
|
||||
The heap can only be addressed indirectly,
|
||||
by pointers derived from previous values of HP.
|
||||
.S2 "Global data area"
|
||||
The initial size of the global data area is determined at assembly time.
|
||||
Global data is allocated by several
|
||||
pseudoinstructions in the EM assembly
|
||||
language.
|
||||
Each pseudoinstruction allocates one or more bytes.
|
||||
The bytes allocated for a single pseudo form
|
||||
a 'block'.
|
||||
A block differs from a fragment, because,
|
||||
under certain conditions, several blocks are allocated
|
||||
in a single fragment.
|
||||
This guarantees that the bytes of these blocks
|
||||
are consecutive.
|
||||
.P
|
||||
Global data is addressed absolutely in binary
|
||||
machine language.
|
||||
Most compilers, however,
|
||||
cannot assign absolute addresses to their global variables,
|
||||
especially not if the language
|
||||
allows programs to be composed of several separately compiled modules.
|
||||
The assembly language therefore allows the compiler to name
|
||||
the first address of a global data block with an alphanumeric label.
|
||||
Moreover, the only way to address such a named global data block
|
||||
in the assembly language is by using its name.
|
||||
It is the task of the assembler/loader to
|
||||
translate these labels into absolute addresses.
|
||||
These labels may also be used
|
||||
in CON and ROM pseudoinstructions to initialize pointers.
|
||||
.P
|
||||
The pseudoinstruction CON allocates initialized data.
|
||||
ROM acts like CON but indicates that the initialized data will
|
||||
not change during execution of the program.
|
||||
The pseudoinstruction BSS allocates a block of uninitialized
|
||||
or identically initialized
|
||||
data.
|
||||
The pseudoinstruction HOL is similar to BSS,
|
||||
but it alters the meaning of subsequent absolute addressing in
|
||||
the assembly language.
|
||||
.P
|
||||
Another type of global data is a small block,
|
||||
called the ABS block, with an implementation defined size.
|
||||
Storage in this type of block can only be addressed
|
||||
absolutely in assembly language.
|
||||
The first word has address 0 and is used to maintain the
|
||||
source line number.
|
||||
Special instructions LIN and LNI are provided to
|
||||
update this counter.
|
||||
A pointer at location 4 points to a string containing the
|
||||
current source file name.
|
||||
The instruction FIL can be used to update the pointer.
|
||||
.P
|
||||
All numeric arguments of the instructions that address
|
||||
the global data area refer to locations in the
|
||||
ABS block unless
|
||||
they are preceded by at least one HOL pseudo in the same
|
||||
module,
|
||||
in which case they refer to the storage area allocated by the
|
||||
last HOL pseudoinstruction.
|
||||
Thus LOE 0 loads the zeroth word of the most recent HOL, unless no HOL has
|
||||
appeared in the current file so
|
||||
far, in which case it loads the zeroth word of the
|
||||
ABS fragment.
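.P
The resolution rule for numeric operands can be sketched in C as
assembler-side state. The names hol_state, on_hol and resolve_numeric are
hypothetical. The ABS block is placed at address 0 in the usage below
because its first word has address 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-module assembler state relevant to numeric global operands. */
struct hol_state {
    bool     seen_hol;  /* has a HOL appeared so far in this module? */
    uint64_t hol_base;  /* address of the block of the most recent HOL */
    uint64_t abs_base;  /* address of the ABS block */
};

/* Called for each HOL pseudo, with the address allocated for its block. */
static void on_hol(struct hol_state *s, uint64_t block_addr)
{
    s->seen_hol = true;
    s->hol_base = block_addr;
}

/* Translates the numeric argument of, for example, LOE, LAE or STE. */
static uint64_t resolve_numeric(const struct hol_state *s, uint64_t n)
{
    return (s->seen_hol ? s->hol_base : s->abs_base) + n;
}
```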
|
||||
.P
|
||||
The global data area is highly fragmented.
|
||||
The ABS block and each HOL and BSS block are separate fragments.
|
||||
The way fragments are formed from CON and ROM blocks is more complex.
|
||||
The assemblers group several blocks into a single fragment.
|
||||
A fragment only contains blocks of the same type: CON or ROM.
|
||||
It is guaranteed that the bytes allocated for two consecutive CON pseudos are
|
||||
allocated consecutively in a single fragment, unless
|
||||
these CON pseudos are separated in the assembly language program
|
||||
by a data label definition or one or more of the following pseudos:
|
||||
.DS
|
||||
|
||||
ROM, BSS, HOL and END
|
||||
|
||||
.DE
|
||||
An analogous rule holds for ROM pseudos.
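.P
The fragment guarantee can be modelled in C as follows. This is a
hypothetical sketch. It always starts a new fragment at the separating items
listed above, whereas the text only states that the guarantee of consecutive
allocation lapses there. The enumeration and assign_fragments are invented
names.

```c
#include <stddef.h>

/* Items of an assembly-language program relevant to data fragments. */
enum item {
    IT_CON, IT_ROM, IT_BSS, IT_HOL, IT_END,
    IT_DATA_LABEL,  /* a data label definition */
    IT_OTHER        /* instructions, MES, etc.: do not separate */
};

/* Stores in frag[i] the fragment number of item i, or -1 if the item
   allocates no global data. */
static void assign_fragments(const enum item *items, size_t n, int *frag)
{
    int next = 0;                      /* next fresh fragment number */
    int con_frag = -1, rom_frag = -1;  /* open CON and ROM fragments */

    for (size_t i = 0; i < n; i++) {
        frag[i] = -1;
        switch (items[i]) {
        case IT_CON:
            rom_frag = -1;             /* a CON separates ROM blocks */
            if (con_frag < 0)
                con_frag = next++;
            frag[i] = con_frag;
            break;
        case IT_ROM:
            con_frag = -1;             /* a ROM separates CON blocks */
            if (rom_frag < 0)
                rom_frag = next++;
            frag[i] = rom_frag;
            break;
        case IT_BSS:
        case IT_HOL:
            frag[i] = next++;          /* each BSS or HOL block is its own fragment */
            con_frag = rom_frag = -1;
            break;
        case IT_END:
        case IT_DATA_LABEL:
            con_frag = rom_frag = -1;
            break;
        case IT_OTHER:
        default:
            break;
        }
    }
}
```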
|
||||
.S2 "Local data area"
|
||||
The local data area consists of a sequence of frames, one for
|
||||
each active procedure.
|
||||
Below the frame of the current procedure resides the
|
||||
expression stack.
|
||||
Frames are generated by procedure calls and are
|
||||
removed by procedure returns.
|
||||
A procedure frame consists of six 'zones':
|
||||
.DS
|
||||
|
||||
1. The return status block
|
||||
2. The local variables and compiler temporaries
|
||||
3. The register save block
|
||||
4. The dynamic local generators
|
||||
5. The operand stack
|
||||
6. The parameters of a procedure one level deeper
|
||||
|
||||
.DE
|
||||
A sample frame is shown in Figure 1.
|
||||
.P
Before a procedure call is performed, the actual
parameters are pushed onto the stack of the calling procedure.
The exact details are compiler dependent.
EM allows procedures to be called with a variable number of
parameters.
The implementation of the C language almost forces its runtime
system to push the parameters in reverse order, that is,
the first positional parameter last,
because the first parameter must then be found at a fixed offset
whatever the number of parameters.
Most compilers use the C calling convention for compatibility.
The parameters of a procedure belong to the frame of the
calling procedure.
Note that the evaluation of the actual parameters may imply
the calling of procedures.
The parameters can be accessed with certain instructions using
offsets of 0 and greater.
The first byte of the last parameter pushed has offset 0.
Note that the parameter at offset 0 has a special use in the
instructions following the static chain (LXL and LXA).
These instructions assume that this parameter contains the LB of
the statically enclosing procedure.
Procedures that do not have a statically enclosing procedure
do not need a static link at offset 0.
.P
Two instructions are available to perform procedure calls, CAL
and CAI.
CAL takes the procedure identifier as its argument; CAI takes it from the stack.
Several tasks are performed by these call instructions.
.A
First, a part of the status of the calling procedure is
saved on the stack in the return status block.
This block should contain the return address of the calling
procedure, its LB and other implementation dependent data.
The size of this block is fixed for any given implementation
because the lexical instructions LPB, LXL and LXA must be able to
obtain the base addresses of the procedure parameters \fBand\fP local
variables.
An alternative solution can be used on machines with a highly
segmented address space.
There the stack frames need not be contiguous, and the first
status save area can contain the parameter base AB,
which has the value of SP just after the last parameter has
been pushed.
.A
Second, the LB is changed to point to the
first word above the local variables.
The new LB is a copy of the SP after the return status
block has been pushed.
.A
Third, the amount of local storage needed by the procedure is
reserved.
The parameters and local storage are accessed by the same instructions.
Negative offsets are used for access to local variables.
The highest byte, that is the byte nearest
to LB, has to be accessed with offset -1.
The pseudoinstruction specifying the entry point of a
procedure (PRO) has an argument that specifies the amount of local
storage needed.
The local variables allocated by the CAI or CAL instructions
are the only ones that can be accessed with a fixed negative offset.
The initial value of the allocated words is
not defined, but implementations that check for undefined
values will probably initialize them with a
special 'undefined' pattern, typically -32768.
.A
Fourth, any EM implementation is allowed to reserve a variable-size
block beneath the local variables.
This block could, for example, be used to save a variable number
of registers.
.A
Finally, the address of the entry point of the called procedure
is loaded into the Program Counter.
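.P
The call steps can be traced on a toy machine.
The C sketch below is hypothetical and is not an EM implementation.
Several details are assumptions: the layout of the return status block (struct status), the memory array, the names em_cal, em_param and em_local, and the use of LB plus the status block size as the parameter base.
The optional register save block is left out.
Following Figure 1, the stack grows toward lower addresses.
.DS
```c
#include <string.h>

/* Assumed contents of the return status block. */
struct status { long ret_pc; long caller_lb; };

/* Toy machine: byte memory plus the SP, LB and PC registers. */
struct machine {
    unsigned char mem[4096];
    long sp, lb, pc;
};

/* The CAL steps in order; the register save block is omitted. */
void em_cal(struct machine *m, long entry, long nlocals)
{
    struct status s = { m->pc, m->lb };
    m->sp -= (long)sizeof s;                /* first: push status block */
    memcpy(&m->mem[m->sp], &s, sizeof s);
    m->lb = m->sp;                          /* second: LB = SP after push */
    m->sp -= nlocals;                       /* third: reserve locals */
    m->pc = entry;                          /* finally: jump to entry */
}

/* Address of the parameter byte at offset off (off >= 0). */
long em_param(const struct machine *m, long off)
{
    return m->lb + (long)sizeof(struct status) + off;
}

/* Address of the local byte at offset off (off < 0). */
long em_local(const struct machine *m, long off)
{
    return m->lb + off;
}
```
.DE
The status block has a fixed size, so em_param can locate the parameters from LB alone.
This is the reason the text requires that size to be fixed.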
.P
The ASP instruction can be used to allocate further (dynamic)
local storage.
The base address of such storage must be obtained with a LOR~SP
instruction.
The same ASP instruction may also be used
to remove some words from the stack.
.P
There is a version of ASP, called ASS, which fetches the number
of bytes to allocate from the stack.
It can be used to allocate space for local
objects whose size is unknown at compile time,
so-called 'dynamic local generators'.
.P
Control is returned to the calling procedure with a RET instruction.
Any return value is then copied to the 'function return area';
the caller can fetch it from there with LFR.
The frame created by the call is deallocated and the status of
the calling procedure is restored.
The value of SP just after the return value has been popped must
be the same as the
value of SP just before executing the first instruction of this
invocation.
This means that when a RET is executed the operand stack can
only contain the return value, and all dynamically generated locals must be
deallocated.
Violating this restriction might result in hard-to-detect
errors.
The calling procedure has to remove the parameters from the stack.
This can be done with the aforementioned ASP instruction.
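.P
Dynamic allocation with ASP, and the condition that must hold at RET, can be sketched in C.
The names frame_state, em_asp, alloc_dynamic and ret_is_legal are hypothetical.
The stack is assumed to grow toward lower addresses as in Figure 1.
Under that assumption, a negative ASP argument allocates storage and a positive one removes it.
.DS
```c
#include <stdbool.h>

/* Hypothetical model of one procedure activation. */
struct frame_state {
    long sp;            /* current stack pointer */
    long sp_at_entry;   /* SP just before the first instruction ran */
};

/* ASP f: a positive f removes f bytes, a negative f allocates -f. */
void em_asp(struct frame_state *fs, long f)
{
    fs->sp += f;
}

/* ASP -n followed by LOR SP: allocate n bytes of dynamic local
   storage and return its base address. */
long alloc_dynamic(struct frame_state *fs, long n)
{
    em_asp(fs, -n);
    return fs->sp;
}

/* RET rule: popping the retsize bytes of return value must leave
   SP exactly where it was when the procedure started. */
bool ret_is_legal(const struct frame_state *fs, long retsize)
{
    return fs->sp + retsize == fs->sp_at_entry;
}
```
.DE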
.P
Each procedure frame is a separate fragment.
Because any fragment may be placed anywhere in memory,
procedure frames need not be contiguous.
.DS
|===============================|
|     actual parameter n-1      |
|-------------------------------|
|               .               |
|               .               |
|               .               |
|-------------------------------|
|      actual parameter 0       |  ( <- AB )
|===============================|


|===============================|
|///////////////////////////////|
|///// return status block /////|
|///////////////////////////////|  <- LB
|===============================|
|                               |
|        local variables        |
|                               |
|-------------------------------|
|                               |
|     compiler temporaries      |
|                               |
|===============================|
|///////////////////////////////|
|///// register save block /////|
|///////////////////////////////|
|===============================|
|                               |
|   dynamic local generators    |
|                               |
|===============================|
|            operand            |
|-------------------------------|
|            operand            |
|===============================|
|         parameter m-1         |
|-------------------------------|
|               .               |
|               .               |
|               .               |
|-------------------------------|
|          parameter 0          |  <- SP
|===============================|

Figure 1. A sample procedure frame and parameters.
.DE
.S2 "Heap data area"
The heap area starts empty, with HP
pointing to the low end of it.
HP always contains a word address.
A copy of HP can always be obtained with the LOR instruction.
A new value may be stored in the heap pointer using the STR instruction.
If the new value is greater than the old one,
then the heap grows.
If it is smaller, then the heap shrinks.
HP may never point below its original value.
All words between the current HP and the original HP
are allocated to the heap.
The heap may not grow into a part of memory that is already allocated
for the stack.
When this is attempted, the STR instruction will cause a trap to occur.
.P
The only way to address the heap is indirectly.
Whenever an object is allocated by increasing HP,
the old HP value must be saved; it can be used later to address
the allocated object.
If, in the meantime, HP is decreased so that the object
is no longer part of the heap, then an attempt to access
the object is not allowed.
Furthermore, if the heap pointer is increased again to above
the object address, then access to the old object gives undefined results.
.P
The heap is a single fragment.
All bytes have consecutive addresses.
No limits are imposed on the size of the heap as long as it fits
in the available data address space.
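.P
The rules for HP can be collected in a small C model.
It is a sketch that rests on several assumptions.
The heap is placed below the stack and grows toward it.
Both forbidden cases are reported by a false return where EM would trap; the text only states explicitly that growing into the stack traps.
Sizes are assumed to be multiples of the word size, because HP always holds a word address.
The names heap, str_hp and heap_alloc are hypothetical.
.DS
```c
#include <stdbool.h>

/* Hypothetical heap model; addresses are in bytes. */
struct heap {
    long hp_origin;   /* original value of HP: the heap starts empty */
    long hp;          /* current heap pointer */
    long stack_low;   /* lowest address currently used by the stack */
};

/* STR into HP: returns false where EM would not allow the store. */
bool str_hp(struct heap *h, long new_hp)
{
    if (new_hp < h->hp_origin)   /* HP may never go below its origin */
        return false;
    if (new_hp > h->stack_low)   /* would grow into stack memory: trap */
        return false;
    h->hp = new_hp;              /* grow or shrink the heap */
    return true;
}

/* Allocate n bytes: the saved old HP addresses the new object.
   Returns -1 where EM would trap. */
long heap_alloc(struct heap *h, long n)
{
    long obj = h->hp;
    return str_hp(h, h->hp + n) ? obj : -1;
}
```
.DE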
9
doc/em/even.c
Normal file
@@ -0,0 +1,9 @@
#include <stdio.h>

/* Print every byte read from standard input as a decimal number,
   16 numbers per output line. */
int main(void) {
	register int l,j ;

	for ( j=0 ; (l=getchar()) != EOF ; j++ ) {
		if ( j%16 == 15 ) printf("%3d\n",l&0377 ) ;
		else printf("%3d ",l&0377 ) ;
	}
	printf("\n") ;
	return 0 ;
}
178
doc/em/exam.e
Normal file
@@ -0,0 +1,178 @@
	mes 2,2,2 ; wordsize 2, pointersize 2
.1
	rom 't.p\000' ; the name of the source file
	hol 552,-32768,0 ; externals and buf occupy 552 bytes
	exp $sum ; sum can be called from other modules
	pro $sum,2 ; procedure sum; 2 bytes local storage
	lin 8 ; code from source line 8
	ldl 0 ; load the two parameters ( a and b )
	adi 2 ; add them
	ret 2 ; return the result
	end 2 ; end of procedure ( still two bytes local storage )
.2
	rom 1,99,2 ; descriptor of array a[]
	exp $test ; the compiler exports all level 0 procedures
	pro $test,226 ; procedure test, 226 bytes local storage
.3
	rom 4.8F8 ; assemble floating point 4.8 (8 bytes) in
.4 ; global storage
	rom 0.5F8 ; same for 0.5
	mes 3,-226,2,2 ; compiler temporary not referenced indirectly
	mes 3,-24,2,0 ; the same is true for i, j, b and c in test
	mes 3,-22,2,0
	mes 3,-4,2,0
	mes 3,-2,2,0
	mes 3,-20,8,0 ; and for x and y
	mes 3,-12,8,0
	lin 20 ; maintain source line number
	loc 1
	stl -4 ; j := 1
	lni ; was lin 21 prior to optimization
	lol -4
	loc 3
	mli 2
	loc 6
	adi 2
	stl -2 ; i := 3 * j + 6
	lni ; was lin 22 prior to optimization
	lae .3
	loi 8
	lal -12
	sti 8 ; x := 4.8
	lni ; was lin 23 prior to optimization
	lal -12
	loi 8
	lae .4
	loi 8
	dvf 8
	lal -20
	sti 8 ; y := x / 0.5
	lni ; was lin 24 prior to optimization
	loc 1
	stl -22 ; b := true
	lni ; was lin 25 prior to optimization
	loc 122
	stl -24 ; c := 'z'
	lni ; was lin 26 prior to optimization
	loc 1
	stl -2 ; for i:= 1
2
	lol -2
	dup 2
	mli 2 ; i*i
	lal -224
	lol -2
	lae .2
	sar 2 ; a[i] :=
	lol -2
	loc 100
	beq *3 ; to 100 do
	inl -2 ; increment i and loop
	bra *2
3
	lin 27
	lol -4
	loc 27
	adi 2 ; j + 27
	sil 0 ; r.r1 :=
	lni ; was lin 28 prior to optimization
	lol -22 ; b
	lol 0
	stf 10 ; r.r3 :=
	lni ; was lin 29 prior to optimization
	lal -20
	loi 16
	adf 8 ; x + y
	lol 0
	adp 2
	sti 8 ; r.r2 :=
	lni ; was lin 30 prior to optimization
	lal -224
	lol -4
	lae .2
	lar 2 ; a[j]
	lil 0 ; r.r1
	cal $sum ; call now
	asp 4 ; remove parameters from stack
	lfr 2 ; get function result
	stl -2 ; i :=
4
	lin 31
	lol -2
	zle *5 ; while i > 0 do
	lol -4
	lil 0
	adi 2
	stl -4 ; j := j + r.r1
	del -2 ; i := i - 1
	bra *4 ; loop
5
	lin 32
	lol 0
	stl -226 ; make copy of address of r
	lol -22
	lol -226
	stf 10 ; r3 := b
	lal -20
	loi 16
	adf 8
	lol -226
	adp 2
	sti 8 ; r2 := x + y
	loc 0
	sil -226 ; r1 := 0
	lin 34 ; note the absence of the unnecessary jump
	lae 22 ; address of output structure
	lol -4
	cal $_wri ; write integer with default width
	asp 4 ; pop parameters
	lae 22
	lol -2
	loc 6
	cal $_wsi ; write integer width 6
	asp 6
	lae 22
	lal -12
	loi 8
	loc 9
	loc 3
	cal $_wrf ; write fixed format real, width 9, precision 3
	asp 14
	lae 22
	lol -22
	cal $_wrb ; write boolean, default width
	asp 4
	lae 22
	cal $_wln ; writeln
	asp 2
	ret 0 ; return, no result
	end 226
	exp $_main
	pro $_main,0 ; main program
.6
	con 2,-1,22 ; description of external files
.5
	rom 15.96F8
	fil .1 ; maintain source file name
	lae .6 ; description of external files
	lae 0 ; base of hol area to relocate buffer addresses
	cal $_ini ; initialize files, etc...
	asp 4
	lin 37
	lae .5
	loi 8
	lae 2
	sti 8 ; mx := 15.96
	lni ; was lin 38 prior to optimization
	loc 99
	ste 0 ; mi := 99
	lni ; was lin 39 prior to optimization
	lae 10 ; address of r
	cal $test
	asp 2
	loc 0 ; normal exit
	cal $_hlt ; cleanup and finish
	asp 2
	end 0
	mes 4,40 ; length of source file is 40 lines
	mes 5 ; reals were used
40
doc/em/exam.p
Normal file
@@ -0,0 +1,40 @@
program example(output);
{This program just demonstrates typical EM code.}
type rec = record r1: integer; r2:real; r3: boolean end;
var mi: integer; mx:real; r:rec;

function sum(a,b:integer):integer;
begin
	sum := a + b
end;

procedure test(var r: rec);
label 1;
var i,j: integer;
	x,y: real;
	b: boolean;
	c: char;
	a: array[1..100] of integer;

begin
	j := 1;
	i := 3 * j + 6;
	x := 4.8;
	y := x/0.5;
	b := true;
	c := 'z';
	for i:= 1 to 100 do a[i] := i * i;
	r.r1 := j+27;
	r.r3 := b;
	r.r2 := x+y;
	i := sum(r.r1, a[j]);
	while i > 0 do begin j := j + r.r1; i := i - 1 end;
	with r do begin r3 := b; r2 := x+y; r1 := 0 end;
	goto 1;
1:	writeln(j, i:6, x:9:3, b)
end; {test}
begin {main program}
	mx := 15.96;
	mi := 99;
	test(r)
end.
180
doc/em/intro.nr
Normal file
@@ -0,0 +1,180 @@
.BP
.S1 "INTRODUCTION"
EM is a family of intermediate languages designed for producing
portable compilers.
The general strategy is for a program called
.B front end
to translate the source program to EM.
Another program,
.B back
.BW end
translates EM to target assembly language.
Alternatively, the EM code can be assembled to a binary form
and interpreted.
These considerations led to the following goals:
.IS 2 10
.PS 1 4
.PT
The design should allow translation to,
or interpretation on, a wide range of existing machines.
Design decisions should be delayed as far as possible
and the implications of these decisions should
be localized as much as possible.
.N
The current microcomputer technology offers 8-, 16- and 32-bit machines
with various sizes of address space.
EM should be flexible enough to be useful on most of these
machines.
The differences between the members of the EM family should only
concern the wordsize and address space size.
.PT
The architecture should ease the task of code generation for
high level languages such as Pascal, C, Ada, Algol 68 and BCPL.
.PT
The instruction set used by the interpreter should be compact,
to reduce the amount of memory needed
for program storage, and to reduce the time needed to transmit
programs over communication lines.
.PT
It should be designed with microprogrammed implementations in
mind; in particular, the use of many short fields within
instruction opcodes should be avoided, because their extraction by the
microprogram or conversion to other instruction formats is inefficient.
.PE
.IE
.A
The architecture is based on the concept of a stack. The stack
is used for procedure return addresses, actual parameters, local variables,
and arithmetic operations.
There are several built-in object types,
for example, signed and unsigned integers,
floating point numbers, pointers and sets of bits.
There are instructions to push and pop objects
to and from the stack.
The push and pop instructions are not typed;
they only care about the size of the objects.
For each built-in type there are
reverse Polish type instructions that pop one or more
objects from the top of
the stack, perform an operation, and push the result back onto the
stack.
For all types except pointers,
these instructions have the object size
as argument.
.P
There are no visible general registers used for arithmetic operands
etc. This is in contrast to most third generation computers, which usually
have 8 or 16 general registers. The decision not to have a group of
general registers was fully intentional, and follows W.L. van der
Poel's dictum that a machine should have 0, 1, or an infinite
number of any feature. General registers have two primary uses: to hold
intermediate results of complicated expressions, e.g.
.IS 5 0 1
((a*b + c*d)/e + f*g/h) * i
.IE 1
and to hold local variables.
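.P
A stack can handle such an expression without general registers.
To show this, the hypothetical C evaluator below runs the reverse Polish form
a b * c d * + e / f g * h / + i *.
The instruction set (PUSH, ADD, MUL, DIV) is invented for the sketch and is far simpler than EM.
For this expression the stack never holds more than three values at once.
.DS
```c
/* Hypothetical reverse Polish machine: operands are pushed, and each
   operator pops two values and pushes the result, as EM does. */
enum op { PUSH, ADD, MUL, DIV };
struct insn { enum op op; long value; };

long run_rpn(const struct insn *code, int n)
{
    long stack[16];               /* no overflow checks in this sketch */
    int top = 0;                  /* number of values on the stack */

    for (int k = 0; k < n; k++) {
        if (code[k].op == PUSH) {
            stack[top++] = code[k].value;
            continue;
        }
        long r = stack[--top];    /* right operand is on top */
        long l = stack[--top];
        switch (code[k].op) {
        case ADD: l = l + r; break;
        case MUL: l = l * r; break;
        case DIV: l = l / r; break;
        default:  break;
        }
        stack[top++] = l;         /* push the result back */
    }
    return stack[top - 1];
}
```
.DE
With a=2, b=3, c=4, d=5, e=2, f=6, g=2, h=3 and i=10, the program computes 170.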
.P
Various studies
have shown that the average expression has fewer than two operands,
making the former use of registers of doubtful value. The present trend
toward structured programs consisting of many small
procedures greatly reduces the value of registers to hold local variables
because the large number of procedure calls implies a large overhead in
saving and restoring the registers at every call.
.BP
.P
Although there are no general purpose registers, there are a
few internal registers with specific functions as follows:
.IS 2
.N 1
.TS
tab(:);
l 1 l l.
PC:-:Program Counter:Pointer to next instruction
LB:-:Local Base:Points to base of the local variables \
in the current procedure.
SP:-:Stack Pointer:Points to the highest occupied word on the stack.
HP:-:Heap Pointer:Points to the top of the heap area.
.TE 1
.IE
.A
Furthermore, reverse Polish code is much easier to generate than
multi-register machine code, especially if highly efficient code is
desired.
When translating to assembly language the back end can make
good use of the target machine's registers.
An EM machine can
achieve high performance by keeping part of the stack
in high speed storage (a cache or microprogram scratchpad memory) rather
than in primary memory.
.P
Again according to van der Poel's dictum,
all EM instructions have zero or one argument.
We believe that instructions needing two arguments
can be split into two simpler ones.
The simpler ones can probably be used in other
circumstances as well.
Moreover, these two instructions together often
have a shorter encoding than the single
instruction they replace.
.P
This document describes EM at three different levels:
the abstract level, the assembly language level and
the machine language level.
.A
The most important level is that of the abstract EM architecture.
This level deals with the basic design issues.
Only the functional capabilities of instructions are relevant, not their
format or encoding.
Most chapters of this document refer to the abstract level
and it is explicitly stated whenever
another level is described.
.A
The assembly language is intended for the compiler writer.
It presents a more or less orthogonal instruction
set and provides symbolic names for data.
Moreover, it facilitates the linking of
separately compiled 'modules' into a single program
by providing several pseudoinstructions.
.A
The machine language is designed for interpretation with a compact
program text and easy decoding.
The binary representation of the machine language instruction set is
far from orthogonal.
Frequent instructions have a short opcode.
The encoding is fully byte oriented.
These bytes do not contain small bit fields, because
bit fields would slow down decoding considerably.
.P
A common use of EM is producing portable (cross) compilers.
When used this way, the compilers produce
EM assembly language as their output.
To run the compiled program on the target machine,
the back end translates the EM assembly language to
the target machine's assembly language.
When this approach is used, the format of the EM
machine language instructions is irrelevant.
On the other hand, when writing an interpreter for EM machine language
programs, the interpreter must deal with the machine language
and not with the symbolic assembly language.
.P
As mentioned above, the
current microcomputer technology offers 8-, 16- and 32-bit
machines with address spaces ranging from 2\v'-0.5m'16\v'0.5m'
to 2\v'-0.5m'32\v'0.5m' bytes.
Having one size of pointers and integers restricts
the usefulness of the language.
We decided to have a different language for each combination of
word and pointer size.
All languages offer the same instruction set and differ only in
memory alignment restrictions and the implicit size assumed in
several instructions,
for example the size of objects on the stack.
The wordsize is restricted to powers of 2 and
the pointer size must be a multiple of the wordsize.
Almost all programs handling EM will be parametrized by the word
and pointer size.
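.P
A program parametrized by word and pointer size might begin by checking the two restrictions above.
The function below is a hypothetical C sketch of that check, not part of any EM tool.
An EM module states its sizes with a message such as mes 2,2,2, which declares wordsize 2 and pointer size 2.
.DS
```c
#include <stdbool.h>

/* Hypothetical check of the size restrictions on an EM family member. */
bool valid_em_sizes(int wordsize, int pointersize)
{
    /* a power of 2 has exactly one bit set */
    bool pow2 = wordsize > 0 && (wordsize & (wordsize - 1)) == 0;
    /* a pointer must consist of a whole number of words */
    return pow2 && pointersize > 0 && pointersize % wordsize == 0;
}
```
.DE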
376
doc/em/iotrap.nr
Normal file
@@ -0,0 +1,376 @@
.SN 8
.VS 1 0
.BP
.S1 "ENVIRONMENT INTERACTIONS"
EM programs can interact with their environment in three ways.
Two, starting/stopping and monitor calls, are dealt with in this chapter.
The remaining way to interact, interrupts, will be treated
together with traps in chapter 9.
.S2 "Program starting and stopping"
EM user programs start with a call to a procedure called
m_a_i_n.
The assembler and back ends look for the definition of a procedure
with this name in their input.
The call passes three parameters to the procedure.
The parameters are similar to the parameters supplied by the
UNIX
.FS
UNIX is a Trademark of Bell Laboratories.
.FE
operating system to C programs.
These parameters are often called
.BW argc ,
.B argv
and
.BW envp .
Argc is the parameter nearest to LB and is a wordsized integer.
The other two are pointers to the first element of an array of
string pointers.
.N
The
.B argv
array contains
.B argc
strings, the first of which contains the program call name.
The other strings in the
.B argv
array are the program parameters.
.P
The
.B envp
array contains strings in the form "name=string", where 'name'
is the name of an environment variable and 'string' its value.
The
.B envp
array is terminated by a zero pointer.
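.P
Looking up a variable in the envp array amounts to a linear scan up to the terminating zero pointer.
The C function below is a hypothetical sketch of such a scan.
The name env_lookup is invented and is not part of an EM runtime.
.DS
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the value of variable name from an
   envp array of "name=string" entries ended by a zero pointer,
   or NULL when the variable is absent. */
const char *env_lookup(char *const envp[], const char *name)
{
    size_t len = strlen(name);
    for (size_t i = 0; envp[i] != NULL; i++) {
        /* the entry must start with name followed directly by '=' */
        if (strncmp(envp[i], name, len) == 0 && envp[i][len] == '=')
            return envp[i] + len + 1;
    }
    return NULL;
}
```
.DE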
.P
An EM user program stops if the program returns from the first
invocation of m_a_i_n.
The contents of the function return area are used to obtain a
wordsized program return code.
EM programs also stop when a trap or interrupt occurs that is
not caught, and when the exit monitor call is executed.
.S2 "Input/Output and other monitor calls"
EM differs from most conventional machines in that it has high level i/o
instructions.
Typical instructions are OPEN FILE and READ FROM FILE instead
of low level instructions such as setting and clearing
bits in device registers.
By providing such high level i/o primitives, the task of implementing
EM on various non EM machines is made considerably easier.
.P
I/O is initiated by the MON instruction, which expects an iocode on top
of the stack.
Often there are also parameters which are pushed on the
stack in reverse order, that is: last
parameter first.
Some i/o functions also provide results, which are returned on the stack.
In the list of monitor calls we use several types of parameters and results;
these types consist of integers and unsigneds of varying sizes, but never
smaller than the wordsize, and the two pointer types.
.N 1
The names of the types used are:
.IS 4
.PS - 10
.PT int
an integer of wordsize
.PT int2
an integer whose size is the maximum of the wordsize and 2
bytes
.PT int4
an integer whose size is the maximum of the wordsize and 4
bytes
.PT intp
an integer with the size of a pointer
.PT uns2
an unsigned integer whose size is the maximum of the wordsize and 2
bytes
.PT unsp
an unsigned integer with the size of a pointer
.PT ptr
a pointer into data space
.PE 1
.IE 0
The table below lists the i/o codes with their results and
parameters.
This list is similar to the system calls of the UNIX Version 7
operating system.
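.P
These size rules translate directly into code.
The helper below is a hypothetical C sketch; the enumeration mon_type and the function mon_type_size are invented names.
With wordsize 2, for example, the Lseek parameters fildes:int, off:int4 and whence:int occupy 2 + 4 + 2 = 8 bytes.
.DS
```c
/* Hypothetical names for the monitor call parameter types. */
enum mon_type { T_INT, T_INT2, T_INT4, T_INTP, T_UNS2, T_UNSP, T_PTR };

static int max_int(int a, int b) { return a > b ? a : b; }

/* Size in bytes of a parameter type, given the word and pointer size. */
int mon_type_size(enum mon_type t, int wordsize, int ptrsize)
{
    switch (t) {
    case T_INT:  return wordsize;
    case T_INT2:
    case T_UNS2: return max_int(wordsize, 2);
    case T_INT4: return max_int(wordsize, 4);
    case T_INTP:
    case T_UNSP:
    case T_PTR:  return ptrsize;
    }
    return -1;   /* not reached for valid types */
}
```
.DE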
.BP
.A
To execute a monitor call, proceed as follows:
.IS 2
.N 1
.PS a 4 "" )
.PT
Stack the parameters, in reverse order, last parameter first.
.PT
Push the monitor call number (iocode) onto the stack.
.PT
Execute the MON instruction.
.PE 1
.IE
An error code is present on the top of the stack after
execution of most monitor calls.
If this error code is zero, the call performed the action
requested and the results lie on the stack beneath it.
Non-zero error codes indicate a failure; in this case no
results are available and the error code has been pushed twice.
This construction enables programs to test for failure with a
single instruction (~TEQ or TNE~) and still find out the cause of
the failure.
The result name 'e' is reserved for the error code.
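.P
The calling sequence and the doubled error code can be mimicked on a small model stack.
The C sketch below is hypothetical, and its functions divide the work as follows.
mon_setup performs the first two steps.
mon_finish shows what MON leaves behind for a call with one result word.
mon_check plays the caller: it tests a single word (as TEQ or TNE would) and looks further only when needed.
None of these names belong to EM, and MON popping its own arguments is not modelled.
.DS
```c
/* Hypothetical model stack: v[n-1] is the top of the stack. */
struct opstack { long v[32]; int n; };

static void push(struct opstack *s, long x) { s->v[s->n++] = x; }
static long pop(struct opstack *s) { return s->v[--s->n]; }

/* Steps 1 and 2: parameters last-first, then the iocode on top. */
void mon_setup(struct opstack *s, const long *params, int nparams,
               long iocode)
{
    for (int i = nparams - 1; i >= 0; i--)
        push(s, params[i]);
    push(s, iocode);
}

/* What MON leaves behind for a call with a single result word. */
void mon_finish(struct opstack *s, long err, long result)
{
    if (err == 0) {
        push(s, result);   /* the result lies beneath */
        push(s, 0);        /* a zero error code */
    } else {
        push(s, err);      /* failure: no result, and the */
        push(s, err);      /* error code pushed twice */
    }
}

/* Caller: one test on the top word; on failure the second copy
   still tells the cause.  Returns 0 on success, else the error. */
long mon_check(struct opstack *s, long *value)
{
    if (pop(s) == 0) {
        *value = pop(s);
        return 0;
    }
    return pop(s);
}
```
.DE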
.N 1
List of monitor calls.
.DS B
number name parameters results function

1 Exit status:int Terminate this process
2 Fork e,flag,pid:int Spawn new process
3 Read fildes:int;buf:ptr;nbytes:unsp
e:int;rbytes:unsp Read from file
4 Write fildes:int;buf:ptr;nbytes:unsp
e:int;wbytes:unsp Write on a file
5 Open string:ptr;flag:int
e,fildes:int Open file for read and/or write
6 Close fildes:int e:int Close a file
7 Wait e:int;status,pid:int2
Wait for child
8 Creat string:ptr;mode:int
e,fildes:int Create a new file
9 Link string1,string2:ptr
e:int Link to a file
10 Unlink string:ptr e:int Remove directory entry
12 Chdir string:ptr e:int Change default directory
14 Mknod string:ptr;mode,addr:int2
e:int Make a special file
15 Chmod string:ptr;mode:int2
e:int Change mode of file
16 Chown string:ptr;owner,group:int2
e:int Change owner/group of a file
18 Stat string,statbuf:ptr
e:int Get file status
19 Lseek fildes:int;off:int4;whence:int
e:int;oldoff:int4 Move read/write pointer
20 Getpid pid:int2 Get process identification
21 Mount special,string:ptr;rwflag:int
e:int Mount file system
22 Umount special:ptr e:int Unmount file system
23 Setuid userid:int2 e:int Set user ID
24 Getuid e_uid,r_uid:int2 Get user ID
25 Stime time:int4 e:int Set time and date
26 Ptrace request:int;pid:int2;addr:ptr;data:int
e,value:int Process trace
27 Alarm seconds:uns2 previous:uns2 Schedule signal
28 Fstat fildes:int;statbuf:ptr
e:int Get file status
29 Pause Stop until signal
30 Utime string,timep:ptr
e:int Set file times
33 Access string:ptr;mode:int e:int Determine file accessibility
34 Nice incr:int Set program priority
35 Ftime bufp:ptr e:int Get date and time
36 Sync Update filesystem
37 Kill pid:int2;sig:int
e:int Send signal to a process
41 Dup fildes,newfildes:int
e,fildes:int Duplicate a file descriptor
42 Pipe e,w_des,r_des:int Create a pipe
43 Times buffer:ptr Get process times
44 Profil buff:ptr;bufsiz,offset,scale:intp Execution time profile
46 Setgid gid:int2 e:int Set group ID
47 Getgid e_gid,r_gid:int Get group ID
48 Sigtrp trapno,signo:int
e,prevtrap:int See below
51 Acct file:ptr e:int Turn accounting on or off
53 Lock flag:int e:int Lock a process
54 Ioctl fildes,request:int;argp:ptr
e:int Control device
56 Mpxcall cmd:int;vec:ptr e:int Multiplexed file handling
59 Exece name,argv,envp:ptr
e:int Execute a file
60 Umask complmode:int2 oldmask:int2 Set file creation mode mask
61 Chroot string:ptr e:int Change root directory
.DE 1
Codes 0, 11, 13, 17, 31, 32, 38, 39, 40, 45, 49, 50, 52,
55, 57, 58, 62, and 63 are
not used.
.P
All monitor calls, except fork and sigtrp,
are the same as the UNIX version 7 system calls.
.P
The sigtrp entry maps UNIX signals onto EM interrupts.
Normally, trapno is in the range 0 to 252.
In that case it requests that signal signo
cause trap trapno to occur.
When given trap number -2, default signal handling is reset, and when given
trap number -3, the signal is ignored.
.P
The flag returned by fork is 1 in the child process and 0 in
the parent.
The pid returned is the process-id of the other process.
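.P
The bookkeeping behind sigtrp can be pictured as a table from signal numbers to trap numbers.
This C sketch is hypothetical.
The table size NSIG_EM, the constants EM_SIG_DEFAULT and EM_SIG_IGNORE, and the functions sigtrp_init and em_sigtrp are all assumptions.
A real implementation would also install UNIX signal handlers and report failures through e.
.DS
```c
/* Hypothetical sigtrp table: for each UNIX signal, the EM trap it
   raises, or one of the two special values below. */
#define NSIG_EM 32               /* assumed number of signals */
#define EM_SIG_DEFAULT (-2)      /* default UNIX handling */
#define EM_SIG_IGNORE  (-3)      /* signal is ignored */

static int sig_to_trap[NSIG_EM];

void sigtrp_init(void)
{
    for (int s = 0; s < NSIG_EM; s++)
        sig_to_trap[s] = EM_SIG_DEFAULT;
}

/* Returns the previous setting (prevtrap), or -1 for a bad request. */
int em_sigtrp(int trapno, int signo)
{
    int ok_trap = (trapno >= 0 && trapno <= 252)
                  || trapno == EM_SIG_DEFAULT || trapno == EM_SIG_IGNORE;
    if (!ok_trap || signo < 0 || signo >= NSIG_EM)
        return -1;
    int prev = sig_to_trap[signo];
    sig_to_trap[signo] = trapno;
    return prev;
}
```
.DE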
.BP
.S1 "TRAPS AND INTERRUPTS"
EM provides a means for the user program to catch all traps
generated by the program itself, the hardware, or external conditions.
This mechanism uses five instructions: LIM, SIM, SIG, TRP and RTT.
This section of the manual may be omitted on the first reading since it
presupposes knowledge of the EM instruction set.
.P
|
||||
The action taken when a trap occures is determined by the value
|
||||
of an internal EM trap register.
|
||||
This register contains a pointer to a procedure.
|
||||
Initially the pointer used is zero and all traps halt the
|
||||
program with, hopefully, a useful message to the outside world.
|
||||
The SIG instruction can be used to alter the trap register,
|
||||
it pops a procedure pointer from the
|
||||
stack into the trap register.
|
||||
When a trap occurs after storing a nonzero value in the trap
|
||||
register, the procedure pointed to by the trap register
|
||||
is called with the trap number
|
||||
as the only parameter (see below).
|
||||
SIG returns the previous value of the trap register on the
|
||||
stack.
|
||||
Two consecutive SIGs are a no-op.
|
||||
When a trap occurs, the trap register is reset to its initial
|
||||
condition, to prevent recursive traps from hanging the machine up,
|
||||
e.g. stack overflow in the stack overflow handling procedure.
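.P
The trap register behaviour can be sketched in C as follows.
This is a model under stated assumptions, not EM itself: procedure identifiers
are represented as C function pointers, and the names em_sig, em_raise,
trap_register and record_trap are hypothetical.
The sketch shows the two properties described above: two consecutive SIGs leave
the register unchanged, and the register is cleared before a handler runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: EM procedure identifiers as C function pointers. */
typedef void (*trap_proc)(int trapno);

static trap_proc trap_register = NULL;   /* initially zero */

/* SIG: store a new procedure and return the previous one. */
trap_proc em_sig(trap_proc new_proc) {
    trap_proc old = trap_register;
    trap_register = new_proc;
    return old;
}

/* Raise trap trapno.  Returns -1 when the register is zero (the real
   machine would halt), 0 after the handler has been called. */
int em_raise(int trapno) {
    trap_proc handler = trap_register;
    if (handler == NULL)
        return -1;
    trap_register = NULL;   /* reset first, so a trap in the handler halts */
    handler(trapno);
    return 0;
}

/* Example handler that only records the trap number. */
static int last_trap = -1;
void record_trap(int trapno) { last_trap = trapno; }
```

Clearing trap_register before the call is what makes a trap raised inside the handler fall back to the default action instead of recursing.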
.P
The runtime systems for some languages need to ignore some EM
traps.
EM offers a feature called the ignore mask.
It contains one bit for each of the lowest 16 trap numbers.
The bits are numbered 0 to 15, with the least significant bit
having number 0.
If a certain bit is 1, the corresponding trap never
occurs and processing simply continues.
The actions performed by the offending instruction are
described by the Pascal program in appendix A.
.N
If the bit is 0, the corresponding trap is not ignored.
The instructions LIM and SIM allow copying and replacement of
the ignore mask.
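.P
As a small illustration of the mask layout, the following hypothetical C helpers
compute the mask bit for a trap number and test whether a trap is ignored.
The bit for trap number n has the value 2 to the power n, so ignoring EFOVFL
(trap 4) requires the mask value 16, not 4.
The function names are assumptions; the EFOVFL value is taken from the table
of EM machine errors.

```c
#include <assert.h>

/* Hypothetical helpers for the 16-bit ignore mask. */
#define EFOVFL 4   /* floating overflow, from the table of EM errors */

/* Mask value with only the bit for trap number trapno set. */
unsigned mask_bit(int trapno) {
    assert(trapno >= 0 && trapno <= 15);
    return 1u << trapno;
}

/* Nonzero if trap trapno is suppressed by the mask. */
int trap_ignored(unsigned mask, int trapno) {
    if (trapno < 0 || trapno > 15)
        return 0;                 /* only traps 0-15 are maskable */
    return (int)((mask >> trapno) & 1u);
}
```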
.P
The TRP instruction generates a trap whose number is taken from the top of the
stack.
This is, among other things,
useful for library procedures and runtime systems.
It can also be used by a low level trap procedure to pass the trap to a
higher level one (see example below).
.P
The RTT instruction returns from the trap procedure and continues after the
trap.
In the list below, all traps marked with an asterisk ('*') are
considered to be fatal, and it is explicitly undefined what happens if
you try to restart after the trap.
.P
The way a trap procedure is called is completely compatible
with normal calling conventions. The only way a trap procedure
differs from normal procedures is the return: it has to use RTT instead
of RET. This is necessary because the complete runtime status is saved on the
stack before calling the procedure, and all this status has to be reloaded.
Trap numbers are in the range 0 to 252
and are divided into three categories:
.IS 4
.N 1
.PS - 10
.PT ~~0-~63
EM machine errors, e.g. illegal instruction.
.PS - 8
.PT ~0-15
maskable
.PT 16-63
not maskable
.PE
.PT ~64-127
Reserved for use by compilers, run time systems, etc.
.PT 128-252
Available for user programs.
.PE 1
.IE
EM machine errors are numbered as follows:
.DS I 5
.TS
tab(@);
n l l.
0@EARRAY@Array bound error
1@ERANGE@Range bound error
2@ESET@Set bound error
3@EIOVFL@Integer overflow
4@EFOVFL@Floating overflow
5@EFUNFL@Floating underflow
6@EIDIVZ@Divide by 0
7@EFDIVZ@Divide by 0.0
8@EIUND@Undefined integer
9@EFUND@Undefined float
10@ECONV@Conversion error
16*@ESTACK@Stack overflow
17*@EHEAP@Heap overflow
18*@EILLINS@Illegal instruction
19*@EODDZ@Illegal size argument
20*@ECASE@Case error
21*@EMEMFLT@Addressing nonexistent memory
22*@EBADPTR@Bad pointer used
23*@EBADPC@Program counter out of range
24@EBADLAE@Bad argument of LAE
25@EBADMON@Bad monitor call
26@EBADLIN@Argument of LIN too high
27@EBADGTO@GTO descriptor error
.TE
.DE 0
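.P
The category boundaries and the asterisk-marked entries of this table can be
summarized in a small C helper.
The enumeration and function names are hypothetical; the ranges and the fatal
traps 16 to 23 are those listed in this section.

```c
#include <assert.h>

/* Hypothetical classification of EM trap numbers. */
enum trap_class {
    TRAP_INVALID,        /* outside 0..252 */
    TRAP_EM_MASKABLE,    /* 0..15    */
    TRAP_EM_UNMASKABLE,  /* 16..63   */
    TRAP_RESERVED,       /* 64..127  */
    TRAP_USER            /* 128..252 */
};

enum trap_class classify_trap(int trapno) {
    if (trapno < 0 || trapno > 252) return TRAP_INVALID;
    if (trapno <= 15)  return TRAP_EM_MASKABLE;
    if (trapno <= 63)  return TRAP_EM_UNMASKABLE;
    if (trapno <= 127) return TRAP_RESERVED;
    return TRAP_USER;
}

/* Traps 16 (ESTACK) to 23 (EBADPC) are the ones marked fatal. */
int trap_is_fatal(int trapno) {
    return trapno >= 16 && trapno <= 23;
}
```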
.P
As an example,
suppose a subprocedure has to be written to do a numeric
calculation.
When an overflow occurs, the computation has to be stopped and
the higher level procedure must be resumed.
This can be programmed as follows using the mechanism described above:
.DS B
	mes 2,2,2 ; set sizes
ersave
	bss 2,0,0 ; Room to save previous value of trap procedure
msave
	bss 2,0,0 ; Room to save previous value of trap mask

	pro calcule,0 ; entry point
	lxl 0 ; fill in non-local goto descriptor with LB
	ste jmpbuf+4
	lor 1 ; and SP
	ste jmpbuf+2
	lim ; get current ignore mask
	ste msave ; save it
	lim
	loc 16 ; bit 4, the bit for EFOVFL (trap 4)
	ior 2 ; set in mask
	sim ; ignore EFOVFL from now on
	lpi $catch ; load procedure identifier
	sig ; catch will get all traps now
	ste ersave ; save previous trap procedure identifier
; perform calculation now, possibly generating overflow
1 ; label jumped to by catch procedure
	loe ersave ; get old trap procedure
	sig ; refer all following traps to old procedure
	asp 2 ; remove result of sig
	loe msave ; restore previous mask
	sim ; done now
; load result of calculation
	ret 2 ; return result
jmpbuf
	con *1,0,0
	end
.DE 0
.VS 1 1
.DS
Example of catch procedure
	pro catch,0 ; Local procedure that must catch the overflow trap
	lol 2 ; Load trap number
	loc 4 ; check for overflow
	bne *1 ; if other trap, call higher trap procedure
	gto jmpbuf ; return to procedure calcule
1 ; other trap has occurred
	loe ersave ; previous trap procedure
	sig ; other procedure will get the traps now
	asp 2 ; remove the result of sig
	lol 2 ; stack trap number
	trp ; call other trap procedure
	rtt ; if other procedure returns, do the same
	end
.DE
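.P
For readers more familiar with C, the control flow of this example resembles
setjmp and longjmp: jmpbuf plays the role of a jmp_buf, and GTO that of longjmp.
The sketch below is only an analogy under that assumption.
It does not model the ignore mask, the trap is simulated by calling the handler
directly when the product becomes infinite, and the computation itself
(a repeated multiplication) and the -1.0 error result are invented for illustration.

```c
#include <assert.h>
#include <math.h>
#include <setjmp.h>

#define EFOVFL 4              /* floating overflow, from the trap table */

static jmp_buf jmpbuf;        /* plays the role of the GTO descriptor */

/* Plays the role of the catch procedure. */
static void catch_trap(int trapno) {
    if (trapno == EFOVFL)
        longjmp(jmpbuf, 1);   /* like "gto jmpbuf" */
    /* catch would pass any other trap on to the previous handler */
}

/* Hypothetical computation: x to the power n by repeated
   multiplication; returns -1.0 if an intermediate result overflows. */
double calcule(double x, int n) {
    volatile double result = 1.0;
    if (setjmp(jmpbuf) != 0)
        return -1.0;          /* resumed here, like label 1 in the example */
    for (int i = 0; i < n; i++) {
        result = result * x;
        if (isinf(result))
            catch_trap(EFOVFL);   /* simulated floating overflow trap */
    }
    return result;
}
```

As with GTO, longjmp is only valid while the procedure that filled in the descriptor (here calcule) is still active.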
6
doc/em/ip.awk
Normal file
@@ -0,0 +1,6 @@
BEGIN { printf ".TS\nlw(6) lw(8) rw(3) rw(6) 14 lw(6) lw(8) rw(3) rw(6) 14 lw(6) lw(8) rw(3) rw(6).\n" }
NF == 4 { printf "%s\t%s\t%d\t%d",$1,$2,$3,$4 }
NF == 3 { printf "%s\t%s\t\t%d",$1,$2,$3 }
{ if ( NR%3 == 0 ) printf("\n") ; else printf("\t"); }
END { if ( NR%3 != 0 ) printf("\n")
printf ".TE\n" }
61
doc/em/ispace.nr
Normal file
@@ -0,0 +1,61 @@
.SN 3
.BP
.S1 "INSTRUCTION ADDRESS SPACE"
The instruction space of the EM machine contains
the code for procedures.
Tables necessary for the execution of this code, for example, procedure
descriptor tables, may also be present.
The instruction space does not change during
the execution of a program, so that it may be
protected.
No further restrictions on the instruction address space are
necessary at the abstract and assembly language levels.
.P
Each procedure has a single entry point: the first instruction.
A special type of pointer identifies a procedure.
Pointers into the instruction
address space have the same size as pointers into data space and
can, for example, contain the address of the first instruction
or an index into a procedure descriptor table.
.A
There is a single EM program counter, PC, pointing
to the next instruction to be executed.
The procedure pointed to by PC is
called the 'current' procedure.
A procedure may call another procedure using the CAL or CAI
instruction.
The calling procedure remains 'active' and is resumed whenever the called
procedure returns.
Note that a procedure has several 'active' invocations when
called recursively.
.P
Each procedure must return properly.
It is not allowed to fall through to the
code of the next procedure.
There are several ways to exit from a procedure:
.IS 3
.PS
.PT
the RET instruction, which returns to the
calling procedure.
.PT
the RTT instruction, which exits a trap handling routine and resumes
the trapping instruction (see next chapter).
.PT
the GTO instruction, which is used for non-local gotos.
It can remove several frames from the stack and transfer
control to an active procedure.
.PE
.IE
.P
All branch instructions can transfer control
to any label within the same procedure.
Branch instructions can never jump out of a procedure.
.P
Several language implementations use a so-called procedure
instance identifier, a combination of a procedure identifier and
the LB of a stack frame, also called static link.
.P
The program text for each procedure, as well as each table,
is a fragment and can be allocated anywhere
in the instruction address space.
2525
doc/em/itables
Normal file
File diff suppressed because it is too large
390
doc/em/mach.nr
Normal file
@@ -0,0 +1,390 @@
.BP
.SN 10
.S1 "EM MACHINE LANGUAGE"
The EM machine language is designed to make program text compact
and to make decoding easy.
Compact program text has many advantages: programs execute faster,
programs occupy less primary and secondary storage, and loading
programs into satellite processors is faster.
The decoding of EM machine language is so simple
that it is feasible to use interpreters as long as EM hardware
machines are not available.
This chapter is irrelevant when back ends are used to
produce executable target machine code.
.S2 "Instruction encoding"
A design goal of EM is to make the
program text as compact as possible.
Decoding must be easy, however.
The encoding is fully byte oriented, without any small bit fields.
There are 256 primary opcodes, two of which are an escape to
two groups of 256 secondary opcodes each.
.A
EM instructions without arguments have a single opcode assigned,
possibly escaped:
.DS

|--------------|
|    opcode    |
|--------------|

or

|--------------|--------------|
|    escape    |    opcode    |
|--------------|--------------|

.DE
The encoding for instructions with an argument is more complex.
Several instructions have an address from the global data area
as argument.
Other instructions have different opcodes for positive
and negative arguments.
.N 1
There is always an opcode that takes the next two bytes as argument,
high byte first:
.DS

|--------------|--------------|--------------|
|    opcode    |    hibyte    |    lobyte    |
|--------------|--------------|--------------|

or

|--------------|--------------|--------------|--------------|
|    escape    |    opcode    |    hibyte    |    lobyte    |
|--------------|--------------|--------------|--------------|

.DE
.DS
An extra escape is provided for instructions with four- or eight-byte arguments.

|--------------|--------------|--------------|   |--------------|
|    ESCAPE    |    opcode    |    hibyte    |...|    lobyte    |
|--------------|--------------|--------------|   |--------------|

.DE
For most instructions some argument values predominate.
The most frequent combinations of instruction and argument
will be encoded in a single byte, called a mini:
.DS

|---------------|
|opcode+argument|  (mini)
|---------------|

.DE
The number of minis is restricted, because only
254 primary opcodes are available.
Many instructions have the bulk of their arguments
in the range 0 to 255.
Instructions that address global data have their arguments
distributed over a wider range,
but small values of the high byte are common.
For all these cases there is another encoding
that combines the instruction and the high byte of the argument
into a single opcode.
These opcodes are called shorties.
Shorties may be escaped.
.DS

|--------------|--------------|
| opcode+high  |    lobyte    |  (shortie)
|--------------|--------------|

or

|--------------|--------------|--------------|
|    escape    | opcode+high  |    lobyte    |
|--------------|--------------|--------------|

.DE
Escaped shorties are useless if the normal encoding has a primary opcode.
Note that for some instruction-argument combinations
several different encodings are available.
It is the task of the assembler to select the shortest of these.
The savings from these mini and shortie
opcodes are considerable, about 55%.
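.P
The assembler's choice among these forms can be sketched as follows.
The opcode ranges are invented for the example (minis for arguments 0 to 7,
shorties for high bytes 0 to 3, and one general opcode with a two-byte argument),
escaped forms are ignored, and the names struct op_ranges and encode are
hypothetical; the real assignments come from the tables in appendix B.

```c
#include <assert.h>

/* Hypothetical opcode assignment for one instruction
   (not taken from the real EM tables). */
struct op_ranges {
    int mini_first, mini_count;        /* minis: args 0..mini_count-1 */
    int shortie_first, shortie_count;  /* shorties: high bytes 0..count-1 */
    int two_byte_op;                   /* opcode, hibyte, lobyte */
};

/* Encodes arg into out[], choosing the shortest form available;
   returns the number of bytes written. */
int encode(const struct op_ranges *r, int arg, unsigned char out[3]) {
    assert(arg >= 0 && arg <= 0xFFFF);
    int high = arg >> 8, low = arg & 0xFF;
    if (arg < r->mini_count) {                  /* mini: one byte */
        out[0] = (unsigned char)(r->mini_first + arg);
        return 1;
    }
    if (high < r->shortie_count) {              /* shortie: two bytes */
        out[0] = (unsigned char)(r->shortie_first + high);
        out[1] = (unsigned char)low;
        return 2;
    }
    out[0] = (unsigned char)r->two_byte_op;     /* general form, high byte first */
    out[1] = (unsigned char)high;
    out[2] = (unsigned char)low;
    return 3;
}
```

With mini_first = 10 and shortie_first = 20, for example, the argument 5 becomes the single byte 15, and 300 (high byte 1, low byte 44) becomes the shortie pair 21, 44.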
.P
Further improvements are possible:
the arguments of
many instructions are a multiple of the wordsize.
Some also do not allow zero as an argument.
If these arguments are divided by the wordsize and,
when zero is not allowed, then decremented by 1, more of them can
be encoded as shorties or minis.
The arguments of some other instructions
rarely or never assume the value 0, but start at 1.
The value 1 is then encoded as 0,
2 as 1, and so on.
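.P
A worked example of these argument transformations, using hypothetical helper
names: with a wordsize of 2, an argument of 8 for an instruction whose arguments
are word multiples is encoded as 8/2 = 4, or as 4 - 1 = 3 if that instruction
never takes zero.
An interpreter applies the inverse mapping when decoding.

```c
#include <assert.h>

/* Hypothetical helpers for the argument transformations;
   wordsize is the EM word size in bytes. */

/* Word-multiple argument: divide by the wordsize, and subtract 1
   if the instruction never takes zero. */
int scale_arg(int arg, int wordsize, int zero_allowed) {
    assert(arg % wordsize == 0);
    assert(zero_allowed || arg != 0);
    int v = arg / wordsize;
    return zero_allowed ? v : v - 1;
}

/* Inverse mapping, as applied when decoding. */
int unscale_arg(int v, int wordsize, int zero_allowed) {
    return (zero_allowed ? v : v + 1) * wordsize;
}

/* Arguments that start at 1: the value 1 is encoded as 0, 2 as 1, ... */
int encode_from_one(int arg) { return arg - 1; }
```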
.P
Assigning opcodes to instructions by the assembler is completely
table driven.
For details see appendix B.
.S2 "Procedure descriptors"
The procedure identifiers used in the interpreter are indices
into a table of procedure descriptors.
Each descriptor contains:
.IS 6
.PS - 4
.PT 1.
the number of bytes to be reserved for locals at each
invocation.
.N
This is a pointer-sized integer.
.PT 2.
the start address of the procedure.
.PE
.IE
.S2 "Load format"
The EM machine language load format defines the interface between
the EM assembler/loader and the EM machine itself.
A load file consists of a header, the program text to be executed,
a description of the global data area and the procedure descriptor table,
in this order.
All integers in the load file are presented with the
least significant byte first.
.P
The header has two parts: the first half (eight 16-bit integers)
aids in selecting
the correct EM machine or interpreter.
Some EM machines, for instance, may have hardware floating point
instructions.
.N
The header entries are as follows (bit 0 is rightmost):
.IS 2
.VS 1 0
.PS 1 4 "" :
.PT
magic number (07255)
.PT
flag bits with the following meaning:
.PS - 7 "" :
.PT bit 0
TEST; test for integer overflow etc.
.PT bit 1
PROFILE; for each source line: count the number of memory
cycles executed.
.PT bit 2
FLOW; for each source line: set a bit in a bit map table if
instructions on that line are executed.
.PT bit 3
COUNT; for each source line: increment a counter if that line
is entered.
.PT bit 4
REALS; set if a program uses floating point instructions.
.PT bit 5
EXTRA; more tests during compiler debugging.
.PE
.PT
number of unresolved references.
.PT
version number; used to detect obsolete EM load files.
.PT
wordsize; the number of bytes in each machine word.
.PT
pointer size; the number of bytes available for addressing.
.PT
unused
.PT
unused
.PE
.IE
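.P
Reading the first half of the header is straightforward because every entry is
a 16-bit integer stored least significant byte first.
The C sketch below assumes the 16 header bytes are already in memory; the
structure, the flag names and read_header1 are hypothetical, while the magic
number and the bit assignments are the ones listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reader for the first half of the load-file header. */
#define EM_MAGIC 07255   /* octal, as in the text */

enum { F_TEST = 1 << 0, F_PROFILE = 1 << 1, F_FLOW = 1 << 2,
       F_COUNT = 1 << 3, F_REALS = 1 << 4, F_EXTRA = 1 << 5 };

struct em_header1 {
    uint16_t magic, flags, unresolved, version, wordsize, ptrsize;
};

/* 16-bit integer, least significant byte first. */
static uint16_t get16le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decodes 16 bytes from buf; returns 0 on success, -1 on bad magic. */
int read_header1(const unsigned char buf[16], struct em_header1 *h) {
    h->magic      = get16le(buf + 0);
    h->flags      = get16le(buf + 2);
    h->unresolved = get16le(buf + 4);
    h->version    = get16le(buf + 6);
    h->wordsize   = get16le(buf + 8);
    h->ptrsize    = get16le(buf + 10);
    /* bytes 12..15 hold the two unused entries */
    return h->magic == EM_MAGIC ? 0 : -1;
}
```

The magic number 07255 is 0x0EAD, so a valid file starts with the bytes 0xAD, 0x0E.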
The second part of the header (eight entries of pointer size bytes each)
describes the load file itself:
.IS 2
.PS 1 4 "" :
.PT
NTEXT; the program text size in bytes.
.PT
NDATA; the number of load-file descriptors (see below).
.PT
NPROC; the number of entries in the procedure descriptor table.
.PT
ENTRY; procedure number of the procedure to start with.
.PT
NLINE; the maximum source line number.
.PT
SZDATA; the address of the lowest uninitialized data byte.
.PT
unused
.PT
unused
.PE
.IE
.P
The program text consists of NTEXT bytes.
NTEXT is always a multiple of the wordsize.
The first byte of the program text is the
first byte of the instruction address
space, i.e. it has address 0.
Pointers into the program text are found in the procedure descriptor
table, where relocation is simple, and in the global data area.
The initialization of the global data area allows easy
relocation of pointers into both address spaces.
.P
The global data area is described by the NDATA descriptors.
Each descriptor describes a number of consecutive words (of~wordsize)
and consists of a sequence of bytes.
While reading the descriptors from the load file, one can
initialize the global data area from low to high addresses.
The size of the initialized data area is given by SZDATA;
this number can be used to check the initialization.
.N
The header of each descriptor consists of a type byte
and a count.
Depending on the type of the descriptor,
this (unsigned) count is either a pointer-sized integer
or a single byte.
The meaning of the count depends on the descriptor type.
At load time an interpreter can
perform any conversion deemed necessary, such as
reordering bytes in integers
and pointers and adding base addresses to pointers.
.BP
.A
In the following pictures we show a graphical notation of the
initializers.
The leftmost rectangle represents the leading byte.
.N 1
.DS
.PS - 4 " "
Fields marked with
.N 1
.PT n
contain a pointer-sized integer used as a count
.PT m
contain a one-byte integer used as a count
.PT b
contain a one-byte integer
.PT w
contain a wordsized integer
.PT p
contain a data or instruction pointer
.PT s
contain a null terminated ASCII string
.PE 1
.DE 0
.VS 1 1
.DS

-------------------
| 0 |      n      |     repeat last initialization n times
-------------------
.DE
.DS
---------
| 1 | m |     m uninitialized words
---------
.DE
.DS
          _____________
         /    bytes    \e
-----------------   -----
| 2 | m | b | b |...| b |     m initialized bytes
-----------------   -----
.DE
.DS
          _________
         /  word   \e
---------------------
| 3 | m |     w     |...     m initialized wordsized integers
---------------------
.DE
.DS
          _________
         / pointer \e
---------------------
| 4 | m |     p     |...     m initialized data pointers
---------------------
.DE
.DS
          _________
         / pointer \e
---------------------
| 5 | m |     p     |...     m initialized instruction pointers
---------------------
.DE
.DS
          _____________
         /    bytes    \e
-------------------------
| 6 | m | b | b |...| b |     initialized integer of size m
-------------------------
.DE
.DS
          _____________
         /    bytes    \e
-------------------------
| 7 | m | b | b |...| b |     initialized unsigned of size m
-------------------------
.DE
.DS
          _____________
         /   string    \e
-------------------------
| 8 | m |       s       |     initialized float of size m
-------------------------
.DE 3
.PS - 8
.PT type~0:
If the last initialization initialized k bytes starting
at address \fIa\fP, do the same initialization again n times,
starting at \fIa\fP+k, \fIa\fP+2*k, ..., \fIa\fP+n*k.
This is the only descriptor whose starting byte
is followed by an integer with the
size of a
pointer;
in all other descriptors the first byte is followed by a one-byte count.
This descriptor must be preceded by a descriptor of
another type.
.PT type~1:
Reserve m words, not explicitly initialized (BSS and HOL).
.PT type~2:
The m bytes following the descriptor header are
initializers for the next m bytes of the
global data area.
m is divisible by the wordsize.
.PT type~3:
The m words following the header are initializers for the next m words of the
global data area.
.PT type~4:
The m data address space pointers following the header are
initializers for the next
m data pointers in the global data area.
Interpreters that represent EM pointers by
target machine addresses must relocate all data pointers.
.PT type~5:
The m instruction address space pointers following the header are
initializers for the next
m instruction pointers in the global data area.
Interpreters that represent EM instruction pointers by
target machine addresses must relocate these pointers.
.PT type~6:
The m bytes following the header form
a signed integer with a size of m bytes,
which is an initializer for the next m bytes
of the global data area.
m is governed by the same restrictions as for
transfer of objects to/from memory.
.PT type~7:
The m bytes following the header form
an unsigned integer with a size of m bytes,
which is an initializer for the next m bytes
of the global data area.
m is governed by the same restrictions as for
transfer of objects to/from memory.
.PT type~8:
The header is followed by a null-terminated ASCII string that
initializes, in global data,
a floating point number with a size of m bytes.
m is governed by the same restrictions as for
transfer of objects to/from memory.
The ASCII string contains the notation of a real as used in the
Pascal language.
.PE
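.P
The descriptor formats lend themselves to a simple sequential loader.
The C sketch below handles only types 0 to 3 and is a hypothetical illustration,
not the EM loader: it assumes a little-endian target (so the bytes of a word are
copied unchanged), zero-fills the words of a type 1 descriptor although EM leaves
their contents unspecified, and leaves bounds checking of the data area to the caller.

```c
#include <assert.h>
#include <string.h>

/* State of a hypothetical loader filling the global data area. */
struct loader {
    unsigned char *data;        /* global data area, large enough */
    long pos;                   /* next address to initialize */
    long last_start, last_len;  /* previous initialization, for type 0 */
};

/* Little-endian integer of size bytes (small sizes only). */
static long get_le(const unsigned char *p, int size) {
    long v = 0;
    for (int i = size - 1; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Processes one descriptor at d.  Returns the number of descriptor
   bytes consumed, or -1 for a type this sketch does not handle. */
long load_descriptor(struct loader *ld, const unsigned char *d,
                     int wordsize, int ptrsize) {
    int type = d[0];
    if (type == 0) {                       /* repeat last initialization */
        if (ld->last_len == 0)
            return -1;                     /* must follow another type */
        long n = get_le(d + 1, ptrsize);
        for (long i = 0; i < n; i++) {
            memcpy(ld->data + ld->pos, ld->data + ld->last_start,
                   (size_t)ld->last_len);
            ld->pos += ld->last_len;
        }
        return 1 + ptrsize;
    }
    long m = d[1], len, used;
    switch (type) {
    case 1:                                /* m words, zero-filled here */
        len = m * wordsize; used = 2;
        memset(ld->data + ld->pos, 0, (size_t)len);
        break;
    case 2:                                /* m bytes */
        len = m; used = 2 + m;
        memcpy(ld->data + ld->pos, d + 2, (size_t)len);
        break;
    case 3:                                /* m words, copied unchanged */
        len = m * wordsize; used = 2 + len;
        memcpy(ld->data + ld->pos, d + 2, (size_t)len);
        break;
    default:
        return -1;                         /* types 4-8 are not handled */
    }
    ld->last_start = ld->pos;
    ld->last_len = len;
    ld->pos += len;
    return used;
}
```

Types 4 and 5 would additionally need pointer relocation, and types 6 to 8 a conversion to the target representation, which is why they are left out of this sketch.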
.P
The NPROC procedure descriptors in the load file consist of
an instruction space address (of~pointer~size) and
an integer (of~pointer~size) specifying the number of bytes for
locals.
16
doc/em/macr.nr
Normal file
@@ -0,0 +1,16 @@
.so /usr/lib/tmac/tmac.kun
.SS 6
.RP
.PL 12i 11i
.LL 89
.MS T E
\!.TL '%'''
.ME
.MS T O
\!.TL '''%'
.ME
.MS B
.sp 1
.ME
.SM S1 B
.SM S2 B
245
doc/em/mapping.nr
Normal file
@@ -0,0 +1,245 @@
.SN 5
.BP
.S1 "MAPPING OF EM DATA MEMORY ONTO TARGET MACHINE MEMORY"
The EM architecture is designed to be implemented
on many existing and future machines.
EM memory is highly fragmented to make
adaptation to various memory architectures possible.
The format and encoding of pointers are explicitly undefined.
.P
This chapter gives solutions to some of the
anticipated problems.
First, we describe a possible memory layout for machines
with 64K bytes of address space.
Here we use a member of the EM family with 2-byte word and pointer
size.
The most straightforward layout is shown in figure 2.
.N 1
.DS
  65534 -> |-------------------------------|
           |///////////////////////////////|
           |//// unimplemented memory /////|
           |///////////////////////////////|
     ML -> |-------------------------------|
           |                               |
           |                               | <- LB
           |     stack and local area      |
           |                               |
           |-------------------------------| <- SP
           |///////////////////////////////|
           |//////// inaccessible /////////|
           |///////////////////////////////|
           |-------------------------------| <- HP
           |                               |
           |           heap area           |
           |                               |
           |                               |
     HB -> |-------------------------------|
           |                               |
           |       global data area        |
           |                               |
     EB -> |-------------------------------|
           |                               |
           |         program text          | <- PC
           |                               |
           |        ( and tables )         |
           |                               |
           |                               |
     PB -> |-------------------------------|
           |///////////////////////////////|
           |////////// undefined //////////|
           |///////////////////////////////|
      0 -> |-------------------------------|

Figure 2. Memory layout showing typical register
positions during execution of an EM program.
.DE 2
The base registers for the various memory pieces can be stored
in target machine registers or memory.
.IS
.N 1
.TS
tab(;);
l 1 l l l.
PB;:;program base;points to the base of the instruction address space.
EB;:;external base;points to the base of the data address space.
HB;:;heap base;points to the base of the heap area.
ML;:;memory limit;marks the high end of the addressable data space.
.TE 1
.IE
The stack grows from high
EM addresses to low EM addresses, and the heap the
other way.
The memory between SP and HP is not accessible,
but may be allocated later to the stack or the heap if needed.
The local data area is allocated starting at the high end of
memory.
.P
Because EM address 0 is not mapped onto target
address 0, a problem arises when pointers are used.
If a program pushed a constant, say 6, onto the stack,
and then tried to indirect through it,
the wrong word would be fetched,
because EM address 6 is mapped onto target address EB+6
and not target address 6 itself.
This particular problem is solved by explicitly declaring
the format of a pointer to be undefined,
so that using a constant as a pointer is completely illegal.
However, the general problem of mapping pointers still exists.
.P
There are two possible solutions.
In the first solution, EM pointers are represented
in the target machine as true EM addresses;
for example, a pointer to EM address 6 really is
stored as a 6 in the target machine.
This solution implies that every time a pointer is fetched,
EB must be added to it before
the target machine's memory is referenced.
If the target machine has powerful indexing
facilities, EB can be kept in a target machine register,
and the relocation can indeed be done on
every reference to the data address space
at a modest cost in speed.
.P
The other solution consists of having EM pointers
refer to the true target machine address.
Thus the instruction LAE 6 (Load Address of External 6)
would push the value of EB+6 onto the stack.
When this approach is chosen, back ends must know
how to offset from EB to translate all
instructions that manipulate EM addresses.
However, the problem is not completely solved,
because a front end may have to initialize a pointer
in CON or ROM data to point to a global address.
This pointer must also be relocated by the back end or the interpreter.
.P
Although the EM stack grows from high to low EM addresses,
some machines have hardware PUSH and POP
instructions that require the stack to grow upwards.
If reasons of efficiency urge you to use these
instructions, then EM
can be implemented with the memory layout
upside down, as shown in figure 3.
This is possible because the pointer format is explicitly undefined.
In the upside-down layout, the first element of a word array has a
higher physical address than the second element.
.N 2
.DS
    |                 |           |                 |
    |      EB=60      |           |        ^        |
    |                 |           |        |        |
    |-----------------|           |-----------------|
105 |   45   |   44   | 104   214 |   41   |   40   | 215
    |-----------------|           |-----------------|
103 |   43   |   42   | 102   212 |   43   |   42   | 213
    |-----------------|           |-----------------|
101 |   41   |   40   | 100   210 |   45   |   44   | 211
    |-----------------|           |-----------------|
    |        |        |           |                 |
    |        v        |           |     EB=255      |
    |                 |           |                 |

          Type A                        Type B
.sp 2
Figure 3. Two possible memory implementations.
Numbers within the boxes are EM addresses.
The other numbers are physical addresses.
.DE 2
.A 0 0
So, we have two different EM memory implementations:
.IS
.PS - 4
.PT A~-
stack downwards
.PT B~-
stack upwards
.PE
.IE
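.P
The address arithmetic of figure 3 can be written down directly.
The C functions below are a sketch assuming that EM addresses are plain integers
and that EB holds the physical base address; their names are hypothetical.
In type A, EM address a lies at EB + a; in type B it lies at EB - a, so the
lowest physical byte of a word is the one belonging to its last EM address.

```c
#include <assert.h>

/* Physical address of the byte at EM address a. */
long phys_byte_a(long eb, long a) { return eb + a; }   /* stack downwards */
long phys_byte_b(long eb, long a) { return eb - a; }   /* stack upwards */

/* Physical address (lowest byte) of the word at EM address a. */
long phys_word_a(long eb, long a, int wordsize) {
    (void)wordsize;                  /* bytes run upwards: start at a */
    return eb + a;
}
long phys_word_b(long eb, long a, int wordsize) {
    assert(wordsize > 0);
    return eb - (a + wordsize - 1);  /* bytes run downwards */
}
```

With EB = 255 and a two-byte word, the word at EM address 40 starts at physical address 214, which matches the operand eb-41 used for LOE 40 in the type B translations.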
|
||||
.P
|
||||
For each of these two possibilities we give the translation of
|
||||
the EM instructions to push the third byte of a global data
|
||||
block starting at EM address 40 onto the stack and to load the
|
||||
word at address 40.
|
||||
All translations assume a word and pointer size of two bytes.
|
||||
The target machine used is a PDP-11 augmented with push and pop instructions.
|
||||
Registers 'r0' and 'r1' are used and suffer from sign extension for byte
|
||||
transfers.
|
||||
Push $40 means push the constant 40, not word 40.
|
||||
.P
|
||||
The translation of the EM instructions depends on the pointer representation
|
||||
used.
|
||||
For each of the two solutions explained above the translation is given.
|
||||
.P
|
||||
First, the translation for the two implementations using EM addresses as
|
||||
pointer representation:
|
||||
.DS
|
||||
.TS
|
||||
tab(:), center;
|
||||
l s l s l s
|
||||
_ s _ s _ s
|
||||
l 2 l 6 l 2 l 6 l 2 l.
|
||||
EM:type A:type B
|
||||
|
||||
|
||||
LAE:40:push:$40:push:$40
|
||||
|
||||
ADP:3:pop:r0:pop:r0
|
||||
::add:$3,r0:add:$3,r0
|
||||
::push:r0:push:r0
|
||||
|
||||
LOI:1:pop:r0:pop:r0
|
||||
::-::neg:r0
|
||||
::clr:r1:clr:r1
|
||||
::bisb:eb(r0),r1:bisb:eb(r0),r1
|
||||
::push:r1:push:r1
|
||||
|
||||
LOE:40:push:eb+40:push:eb-41
|
||||
.TE
|
||||
.DE
|
||||
.BP
|
||||
.P
|
||||
The translation for the two implementations, if the target machine address is
|
||||
used as pointer representation, is:
|
||||
.N 1
|
||||
.DS
|
||||
.TS
|
||||
tab(:), center;
|
||||
l s l s l s
|
||||
_ s _ s _ s
|
||||
l 2 l 6 l 2 l 6 l 2 l.
|
||||
EM:type A:type B
|
||||
|
||||
|
||||
LAE:40:push:$eb+40:push:$eb-40
|
||||
|
||||
ADP:3:pop:r0:pop:r0
|
||||
::add:$3,r0:sub:$3,r0
|
||||
::push:r0:push:r0
|
||||
|
||||
LOI:1:pop:r0:pop:r0
|
||||
::clr:r1:clr:r1
|
||||
::bisb:(r0),r1:bisb:(r0),r1
|
||||
::push:r1:push:r1
|
||||
|
||||
LOE:40:push:eb+40:push:eb-41
|
||||
.TE
|
||||
.DE
|
||||
.P
|
||||
The translation presented above is not intended to be optimal.
|
||||
Most machines can handle these simple cases in one or two instructions.
|
||||
It demonstrates, however, the flexibility of the EM design.
|
||||
.P
|
||||
There are several possibilities to implement EM on machines with
|
||||
address spaces larger than 64k bytes.
|
||||
For EM with two byte pointers one could allocate instruction and
|
||||
data space each in a separate 64k piece of memory.
|
||||
EM pointers still have to fit in two bytes,
|
||||
but the base registers PB and EB may be loaded in hardware registers
|
||||
wider than 16 bits, if available.
|
||||
EM implementations can also make efficient use of a machine
|
||||
with separate instruction and data space.
|
||||
.P
|
||||
EM with 32 bit pointers allows one to make use of machines
|
||||
with large address spaces.
|
||||
In a virtual, segmented memory system one could use a separate
|
||||
segment for each fragment.
|
80
doc/em/mem.nr
Normal file
80
doc/em/mem.nr
Normal file
|
@ -0,0 +1,80 @@
|
|||
.BP
.SN 2
.S1 MEMORY
The EM machine has two distinct address spaces,
one for instructions and one for data.
The data space is divided into 8-bit bytes.
The smallest addressable unit is a byte.
Bytes are numbered consecutively from 0 to some maximum.
All sizes in EM are expressed in bytes.
.P
Some EM instructions can transfer objects containing several bytes
to and/or from memory.
The size of every object larger than a word must be a multiple of
the wordsize.
The size of every object smaller than a word must be a divisor
of the wordsize.
For example: if the wordsize is 2 bytes, objects of sizes 1,
2, 4, 6, ... bytes are allowed.
The address of such an object is the lowest address of all bytes it contains.
For objects smaller than the wordsize, the
address must be a multiple of the object size.
For all other objects the address must be a multiple of the
wordsize.
For example, if an instruction transfers a 4-byte object to memory at
location \fIm\fP and the wordsize is 2,
\fIm\fP must be a multiple of 2 and the bytes at
locations \fIm\fP, \fIm\fP\|+\|1, \fIm\fP\|+\|2 and
\fIm\fP\|+\|3 are overwritten.
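.P
The size and alignment rules above can be checked mechanically.
The following Python sketch is illustrative only; the function names are
hypothetical, and all sizes and addresses are in bytes.
.DS
```python
# Sketch of the EM size and alignment rules for data space objects.

def valid_size(size, wordsize):
    """Smaller than a word: the size must divide the wordsize.
    A word or larger: the size must be a multiple of the wordsize."""
    if size <= 0:
        return False
    if size < wordsize:
        return wordsize % size == 0
    return size % wordsize == 0

def valid_address(addr, size, wordsize):
    """Sub-word objects are aligned on their own size, all others on a word."""
    if not valid_size(size, wordsize):
        return False
    alignment = size if size < wordsize else wordsize
    return addr % alignment == 0

def bytes_covered(addr, size):
    """The address of an object is the lowest address of its bytes."""
    return list(range(addr, addr + size))

if __name__ == "__main__":
    print([s for s in range(1, 9) if valid_size(s, 2)])  # sizes 1, 2, 4, 6, 8
    print(valid_address(10, 4, 2), bytes_covered(10, 4))
```
.DE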
.P
The size of almost all objects in EM
is an integral number of words.
Only two operations are allowed on
objects smaller than a word (whose size is therefore a divisor of the wordsize):
pushing them onto the stack and popping them from the stack.
The addressing of these objects in memory is always indirect.
If such a small object is pushed onto the stack
it is assumed to be a small integer and stored
in the least significant part of a word.
The rest of the word is cleared to zero,
although
EM provides a way to sign-extend a small integer.
Popping a small object from the stack removes a word
from the stack, stores the least significant byte(s)
of this word in memory and discards the rest of the word.
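.P
A minimal sketch of how a small object travels through a stack word follows.
It treats memory contents as unsigned integers, so the byte order, which EM
leaves open, plays no role.
The function names are hypothetical, and sign_extend only models the effect
of sign extension, not the particular EM instructions that provide it.
.DS
```python
# Hypothetical model of pushing and popping an object of `size` bytes,
# where size is smaller than (and a divisor of) the wordsize.

def push_small(value, size, wordsize):
    """The object becomes the least significant part of a stack word;
    all higher-order bits of that word are zero."""
    assert size < wordsize and wordsize % size == 0
    assert 0 <= value < 1 << (8 * size)
    return value  # zero extension leaves the numeric value unchanged

def pop_small(word, size):
    """Only the least significant `size` bytes of the word reach memory."""
    return word & ((1 << (8 * size)) - 1)

def sign_extend(word, size):
    """Reinterpret the low `size` bytes of a word as two's complement."""
    bits = 8 * size
    low = word & ((1 << bits) - 1)
    return low - (1 << bits) if low >> (bits - 1) else low

if __name__ == "__main__":
    w = push_small(0xFF, 1, 2)
    print(w, sign_extend(w, 1), pop_small(0x1234, 1))
```
.DE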
.P
The format of pointers into both address spaces is explicitly undefined.
The size of a pointer, however, is fixed for a member of EM, so that
the compiler writer knows how much storage to allocate for a pointer.
.P
A minor problem is raised by the undefined pointer format.
Some languages, notably Pascal, require a special,
otherwise illegal, pointer value to represent the nil pointer.
The current Pascal-VU compiler uses the
integer value 0 as the nil pointer.
This value is also used by many C programs as a normally impossible address.
A better solution would be to have a special
instruction loading an illegal pointer value,
but it is hard to imagine an implementation
for which the current solution is inadequate,
especially because the first word in the EM data space
is special and probably not the target of any pointer.
.P
The next two chapters describe the EM memory
in more detail.
One describes the instruction address space,
the other the data address space.
.P
A design goal of EM has been to allow
its implementation on a wide range of existing machines,
as well as the construction of a new machine that implements EM in hardware.
To this end we have tried to minimize the demands
of EM on the memory structure of the target machine.
Therefore, apart from the logical partitioning,
EM memory is divided into 'fragments'.
A fragment consists of consecutive machine
words and has a base address and a size.
Pointer arithmetic is only defined within a fragment.
The only exception to this rule is comparison with the null
pointer.
All fragments must be word-aligned.
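.P
The fragment rule can be illustrated with a hypothetical model in which a
pointer is a pair of a fragment and an offset.
This is only a sketch, not a description of how any EM implementation
represents pointers; in particular, allowing an offset equal to the
fragment size (one past the end) is an assumption borrowed from C.
.DS
```python
# Hypothetical model: pointers carry their fragment, so arithmetic and
# comparisons that cross fragment boundaries can be detected.

NULL = None  # stands in for the null pointer

class Fragment:
    """Consecutive words with a base address and a size, in bytes."""

    def __init__(self, base, size, wordsize=2):
        # All fragments are word-aligned and consist of whole words.
        assert base % wordsize == 0 and size % wordsize == 0
        self.base, self.size = base, size

class Pointer:
    def __init__(self, fragment, offset):
        self.fragment, self.offset = fragment, offset

    def __add__(self, n):
        offset = self.offset + n
        # Assumption: offsets 0 .. size (one past the end) are allowed.
        if not 0 <= offset <= self.fragment.size:
            raise ValueError("pointer arithmetic leaves the fragment")
        return Pointer(self.fragment, offset)

    def __sub__(self, other):
        if other.fragment is not self.fragment:
            raise ValueError("difference of pointers into different fragments")
        return self.offset - other.offset

def pointers_equal(p, q):
    """Comparison with the null pointer is always allowed; any other
    comparison requires both pointers to lie in the same fragment."""
    if p is NULL or q is NULL:
        return p is q
    if p.fragment is not q.fragment:
        raise ValueError("comparison across fragments is undefined")
    return p.offset == q.offset
```
.DE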
5
doc/em/print
Executable file
@ -0,0 +1,5 @@

case $# in
1) make "$1".t ; ntlp "$1".t | lpr ;;  # build NAME.t, pipe it through ntlp to lpr
*) echo "$0 needs an argument" >&2 ; exit 1 ;;
esac
4
doc/em/show
Executable file
@ -0,0 +1,4 @@
case $# in
1) make "$1".t ; ntout "$1".t ;;  # build NAME.t, then run ntout on it
*) echo "$0 needs an argument" >&2 ; exit 1 ;;
esac
38
doc/em/title.nr
Normal file
@ -0,0 +1,38 @@
.po 0
.TP 1
.ll 79
.sp 15
.ce 4
DESCRIPTION OF A MACHINE
ARCHITECTURE FOR USE WITH
BLOCK STRUCTURED LANGUAGES
.sp 6
.ce 4
Andrew S. Tanenbaum
Hans van Staveren
Ed G. Keizer
Johan W. Stevenson\v'-0.5m'*\v'0.5m'
.sp 2
.ce
August 1983
.sp 2
.ce
Informatica Rapport IR-81
.sp 13
Abstract
.sp 2
.ti +5
EM is a family of intermediate languages
designed for producing portable compilers.
A program called
.B front end
translates source programs to EM.
Another program,
.B back
.BW end ,
translates EM to the assembly language of the target machine.
Alternatively, the EM program can be assembled to a highly
efficient binary format for interpretation.
This document describes the EM languages in detail.
.sp 4
\v'-0.5m'*\v'0.5m' Present affiliation: NV Philips, Eindhoven
130
doc/em/types.nr
Normal file
@ -0,0 +1,130 @@
.SN 6
.BP
.S1 "TYPE REPRESENTATIONS"
The representations used for typed objects are not precisely
specified by EM.
Sometimes we only specify that a typed object occupies a
certain amount of space and state no further restrictions.
To obtain a different representation of the value of
an object on the stack, one usually has to use a convert
instruction.
We do specify some relations between the representations of
types.
This allows some intermixed use of operators for different types
on the same object(s).
For example, the instruction ZER pushes a signed integer zero,
an unsigned integer zero and an empty set alike, because all three
have the same representation.
ZER has the size of the object as its only argument.
.A
The representation of floating point numbers is a good example;
it allows widely varying implementations.
The only ways to create floating point numbers are via
initialization and via conversions from integer numbers.
Conversely, these numbers can only be turned into human-readable
output by converting them to integers and by comparing
two floating point numbers with each other.
Implementations may use base 10, base 2 or any other
base for exponents, and are free to choose the ranges of
exponent and mantissa.
.A
Other types are described more precisely.
The following paragraphs describe the restrictions imposed on
the representation of each type.
A number \fBn\fP used in these paragraphs indicates the size of
the object in \fIbits\fP.
.S2 "Unsigned integers"
The range of unsigned integers is 0..2\v'-0.5m'\fBn\fP\v'0.5m'-1.
A binary representation is assumed.
The order of the bits within an object is deliberately left
unspecified.
Discussing bit order within each 8-bit byte is academic,
so the only real freedom of this specification lies in the byte
order.
We really do not care in which order of significance an
implementation stores the bytes of a 4-byte integer.
This of course means that some sequences of instructions have
unpredictable effects.
For example:
.DS
LOC 258 ; STL 0 ; LAL 0 ; LOI 1 ( wordsize >=2 )
.DE
The value on the stack after executing this sequence
can be anything,
but will most likely be 1 or 2.
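.A
The following Python sketch models the sequence above: it stores 258 as one
word with an explicit byte order and then reads the single byte at the
word's address, which is what LAL 0 ; LOI 1 does.
The function name is hypothetical; the byte orders are the two offered by
Python's int.to_bytes.
.DS
```python
# Store `value` as one word in the given byte order (STL 0), then load the
# single byte at the word's address (LAL 0 ; LOI 1).

def first_byte(value, wordsize, byteorder):
    stored = value.to_bytes(wordsize, byteorder)
    return stored[0]

for wordsize in (2, 4):
    for order in ("little", "big"):
        print(wordsize, order, first_byte(258, wordsize, order))
```
.DE
With a 2-byte word the result is 2 (least significant byte first) or 1
(most significant byte first); with a 4-byte word stored most significant
byte first it is 0.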
.A
Conversions between unsigned integers of different sizes have to
be done with explicit convert instructions.
One cannot simply pad an unsigned integer with zeros at either end
and expect a correct result.
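.A
Why zero padding is not portable can be seen in a small Python sketch that
widens a stored 2-byte unsigned integer to 4 bytes by adding zero bytes
either after or before it.
The helper is hypothetical and only models the stored bytes; it is not an
EM convert instruction.
.DS
```python
# Widen a stored 2-byte unsigned integer to 4 bytes by zero padding.
# Whether the padding belongs after or before the original bytes depends
# on the byte order, so neither choice is correct everywhere.

def widen_by_padding(value, byteorder, pad_at_end):
    stored = value.to_bytes(2, byteorder)
    padded = stored + bytes(2) if pad_at_end else bytes(2) + stored
    return int.from_bytes(padded, byteorder)

for order in ("little", "big"):
    for at_end in (True, False):
        where = "pad at end" if at_end else "pad at start"
        print(order, where, widen_by_padding(258, order, at_end))
```
.DE
Each padding choice yields 258 for one byte order and 16908288 (258 shifted
left by 16 bits) for the other.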
.A
We assume that every implementation provides at least single-word
unsigned arithmetic.
.S2 "Signed integers"
The range of signed integers is -2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~2\v'-0.5m'\fBn\fP-1\v'0.5m'-1,
in other words the range of signed integers of \fBn\fP bits
using two's complement arithmetic.
The representation is the same as for unsigned integers except that
the range 2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~2\v'-0.5m'\fBn\fP\v'0.5m'-1 is mapped onto the
range -2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~-1.
In other words, the most significant bit is used as the sign bit.
The convert instructions between signed and unsigned integers
of the same size can be used to catch errors.
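.A
The mapping between the unsigned and signed views of an \fBn\fP-bit pattern
can be written down directly.
The Python sketch below is illustrative: the function names are hypothetical,
and checked_unsigned_to_signed only shows the kind of range error a
same-size conversion can catch, not the exact behaviour of a particular EM
instruction.
.DS
```python
# Two's complement view of n-bit patterns: unsigned values 2**(n-1) .. 2**n - 1
# stand for the signed values -2**(n-1) .. -1.

def to_signed(u, n):
    assert 0 <= u < 2 ** n
    return u - 2 ** n if u >= 2 ** (n - 1) else u

def to_unsigned(s, n):
    assert -(2 ** (n - 1)) <= s < 2 ** (n - 1)
    return s + 2 ** n if s < 0 else s

def checked_unsigned_to_signed(u, n):
    """Reject unsigned values that have no signed counterpart of the same
    size instead of silently reinterpreting their bit pattern."""
    if not 0 <= u < 2 ** (n - 1):
        raise OverflowError(f"{u} does not fit in a signed {n}-bit integer")
    return u

if __name__ == "__main__":
    print(to_signed(33090, 16), to_unsigned(-1, 16))
```
.DE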
.A
The value -2\v'-0.5m'\fBn\fP-1\v'0.5m' is used for undefined
signed integers.
EM implementations should trap when this value is used in an
operation on signed integers.
The ignore mask, accessed with SIM and LIM (see chapter 9),
can be used to disable such traps.
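.A
A minimal sketch of the check for undefined signed integers follows.
The exception class, the function names and the traps_enabled flag are
hypothetical; the flag merely stands in for the corresponding setting of the
mask accessed with SIM and LIM, whose bit layout is not modelled here.
.DS
```python
# Sketch of the undefined-signed-integer check: the pattern with only the
# sign bit set, -2**(n-1), is reserved and should trap when used as an
# operand, unless the trap has been disabled.

class UndefinedIntegerTrap(Exception):
    pass

def check_defined(value, n, traps_enabled=True):
    if traps_enabled and value == -(2 ** (n - 1)):
        raise UndefinedIntegerTrap(f"undefined {n}-bit signed integer")
    return value

def add_signed(a, b, n, traps_enabled=True):
    """Signed addition that checks both operands first; overflow handling
    is outside the scope of this sketch."""
    check_defined(a, n, traps_enabled)
    check_defined(b, n, traps_enabled)
    return a + b
```
.DE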
.A
We assume that every implementation provides at least single-word
signed arithmetic.
.BP
.S2 "Floating point values"
Floating point values must have a signed mantissa and a signed
exponent.
Although no base is specified, base 2 is the normal choice,
because the FEF instruction pushes the exponent in base 2.
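.A
Python's math.frexp performs an analogous base-2 split of a float into a
fraction and an exponent such that the value equals the fraction times 2 to
the power of the exponent.
It is used here only to illustrate a base-2 exponent; whether FEF normalises
the fraction to the same range as frexp is not assumed.
.DS
```python
import math

# Split floats into a base-2 fraction and exponent, then rebuild them.
def split(x):
    fraction, exponent = math.frexp(x)  # x == fraction * 2**exponent
    assert math.ldexp(fraction, exponent) == x
    return fraction, exponent

for x in (1.0, 6.5, -0.375):
    print(x, split(x))
```
.DE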
.A
The implementation of floating point arithmetic is optional.
The compilers currently in use have runtime parameters for the
size of the floating point values they should use.
Common choices are 4 and/or 8 bytes.
.S2 Pointers
EM has two kinds of pointers: for instruction space and for data
space.
Each kind can only be used for its own space; conversion between
these two subtypes is impossible.
We assume that pointers have a range from 0 upwards.
Any implementation may have holes in the pointer range between
fragments.
One can of course not expect to address two megabytes
of memory using a 2-byte pointer.
Normally, a 2-byte pointer allows up to 65536 bytes of
addressable memory.
.A
Pointer representation has one restriction.
The pointer with the same representation as the integer zero of
the same size should be invalid,
because some languages and/or runtime systems represent the nil
pointer as zero.
.S2 "Bit sets"
All bit sets of size \fBn\fP are subsets of the set
{~i~|~i>=0,~i<\fBn\fP~}.
A bit set contains a bit for each element showing its
presence or absence.
Bit sets are subdivided into words.
The word with the lowest EM address governs the subset
{~i~|~i>=0,~i<\fBm\fP~}, where \fBm\fP is the number of bits in
a word.
Each following word governs the next \fBm\fP set elements.
For a set the size of one word, the value of that word read as an
unsigned integer is the sum of
2\v'-0.5m'i\v'0.5m' over all elements i in the set.
.A
Example: a 2-word bit set (wordsize 2) containing the
elements 1, 6, 8, 15, 18, 21, 27 and 28 is composed of two
words, e.g. at addresses 40 and 42.
The word at 40 contains the value 33090 (or~-32446),
the word at 42 contains the value 6180.
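.A
The example can be checked with a short Python sketch that packs set
elements into words as described above and unpacks them again.
The function names are hypothetical; only word values are computed, so the
byte order within each word does not matter.
.DS
```python
# Pack a bit set into words: word k (counting from the lowest EM address)
# holds elements k*m .. k*m + m - 1, where m is the number of bits per word,
# and element i adds 2**(i - k*m) to that word's unsigned value.

def set_to_words(elements, n_words, wordsize):
    m = 8 * wordsize  # bits per word
    words = [0] * n_words
    for i in elements:
        if not 0 <= i < n_words * m:
            raise ValueError(f"element {i} outside the set")
        words[i // m] |= 1 << (i % m)
    return words

def words_to_set(words, wordsize):
    m = 8 * wordsize
    return [k * m + b for k, w in enumerate(words) for b in range(m) if w >> b & 1]

words = set_to_words([1, 6, 8, 15, 18, 21, 27, 28], n_words=2, wordsize=2)
print(words)  # [33090, 6180]
```
.DE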