Initial revision

1984-06-29 14:46:39 +00:00 · 1984-06-29 14:46:39 +00:00 · e0872423d9
commit e0872423d9
parent 253118db19
21 changed files with 7189 additions and 0 deletions
--- a/doc/em/addend.n
+++ b/doc/em/addend.n
--- a/doc/em/app.nr
+++ b/doc/em/app.nr
@@ -0,0 +1,488 @@
+.BP
+.AP "EM INTERPRETER"
+.nf
+.ta 8 16 24 32 40 48 56 64 72 80
+.so em.i
+.fi
+.BP
+.AP "EM CODE TABLES"
+The following table is used by the assembler for EM machine
+language.
+It specifies the opcodes used for each instruction and
+how arguments are mapped to machine language arguments.
+The table is presented in three columns;
+each line in each column contains three or four fields.
+Each line describes a range of interpreter opcodes by
+specifying for which instruction the range is used, the type of the
+opcodes (mini, shortie, etc.) and the range for the instruction
+argument.
+.A
+The first field on each line gives the EM instruction mnemonic;
+the second field gives some flags.
+If the opcodes are minis or shorties, the third field specifies
+how many minis/shorties are used.
+The last field gives the number of the (first) interpreter
+opcode.
+.N 1
+Flags :
+.IS 3
+.N 1
+Opcode type; only one of the following may be specified.
+.PS - 5 "  "
+.PT -
+opcode without argument
+.PT m
+mini
+.PT s
+shortie
+.PT 2
+opcode with 2-byte signed argument
+.PT 4
+opcode with 4-byte signed argument
+.PT 8
+opcode with 8-byte signed argument
+.PE
+Secondary (escaped) opcodes.
+.PS - 5 "  "
+.PT e
+The opcode thus marked is in the secondary opcode group instead
+of the primary
+.PE
+Restrictions on arguments.
+.PS - 5 "  "
+.PT N
+Negative arguments only
+.PT P
+Positive and zero arguments only
+.PE
+Mapping of arguments.
+.PS - 5 "  "
+.PT w
+The argument must be divisible by the wordsize and is divided by the
+wordsize before use as opcode argument.
+.PT o
+The argument (possibly after division) must be >= 1 and is
+decremented before use as opcode argument.
+.PE
+.IE
+If the opcode type is 2, 4 or 8, the resulting argument is used as
+opcode argument (least significant byte first).
+.N
+If the opcode type is mini, the argument is added
+to the first opcode, provided the result is in range.
+If the argument is negative, the absolute value minus one is
+used in the algorithm above.
+.N
+For shorties with positive arguments the first opcode is used
+for arguments in the range 0..255, the second for the range
+256..511, etc.
+For shorties with negative arguments the first opcode is used
+for arguments in the range -1..-256, the second for the range
+-257..-512, etc.
+The byte following the opcode contains the least significant
+byte of the argument.
+First some examples of these specifications.
+.PS - 5
+.PT "aar mwPo 1 34"
+Indicates that opcode 34 is used as a mini for Positive
+instruction arguments only.
+The w and o indicate division and decrementing of the
+instruction argument.
+Because the resulting argument must be zero (only opcode 34 may be used),
+this mini can only be used for an instruction argument equal to the
+wordsize, which is 2 in this example.
+Conclusion: opcode 34 is for "AAR 2".
+.PT "adp sP 1 41"
+Opcode 41 is used as shortie for ADP with arguments in the range
+0..255.
+.PT "bra sN 2 60"
+Opcode 60 is used as shortie for BRA with arguments -1..-256,
+61 is used for arguments -257..-512.
+.PT "zer e- 145"
+Escaped opcode 145 is used for ZER.
+.PE
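The mapping rules and examples above can be sketched in Python. This is only an illustration under stated assumptions: the function `encode_operand` and its parameters are hypothetical names, not part of the EM tools; only mini and shortie entries are handled; and the default wordsize of 2 is an assumption taken from the "aar" example.

```python
def encode_operand(count, first_opcode, flags, arg, wordsize=2):
    """Return the opcode bytes for one table entry, or None if the
    argument cannot be encoded by that entry (hypothetical helper)."""
    # Restrictions on arguments.
    if "N" in flags and arg >= 0:
        return None
    if "P" in flags and arg < 0:
        return None
    # Mapping of arguments: divide by the wordsize, then decrement.
    if "w" in flags:
        if arg % wordsize != 0:
            return None
        arg //= wordsize
    if "o" in flags:
        if arg < 1:
            return None
        arg -= 1
    # For negative arguments the absolute value minus one selects the opcode.
    index = -arg - 1 if arg < 0 else arg
    if "m" in flags:
        # Mini: the argument is folded into the opcode itself.
        return [first_opcode + index] if index < count else None
    if "s" in flags:
        # Shortie: opcode chosen per block of 256, low byte follows.
        block = index // 256
        if block >= count:
            return None
        return [first_opcode + block, arg & 0xFF]
    return None


print(encode_operand(1, 34, "mwPo", 2))   # "aar mwPo 1 34"
print(encode_operand(2, 60, "sN", -257))  # "bra sN 2 60"
```

The helper returns None rather than raising, so a caller could try the entries of an instruction in turn and fall back to a longer encoding when no mini or shortie fits.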
+The interpreter opcode table:
+.N 1
+.IS 3
+.DS B
+.so itables
+.DE 0
+.IE
+.P
+The table above results in the following dispatch tables.
+Dispatch tables are used by interpreters to jump to the
+routines implementing the EM instructions, indexed by the next opcode.
+Each line of the dispatch tables gives the routine names
+of eight consecutive opcodes, preceded by the first opcode number
+on that line.
+Routine names consist of an EM mnemonic followed by a suffix.
+The suffixes show the encoding used for each opcode.
+.N
+The following suffixes exist:
+.N 1
+.VS 1 0
+.IS 4
+.PS - 11
+.PT .z
+no arguments
+.PT .l
+16-bit argument
+.PT .lw
+16-bit argument divided by the wordsize
+.PT .p
+positive 16-bit argument
+.PT .pw
+positive 16-bit argument divided by the wordsize
+.PT .n
+negative 16-bit argument
+.PT .nw
+negative 16-bit argument divided by the wordsize
+.PT .s<num>
+shortie with <num> as high order argument byte
+.PT .sw<num>
+shortie with argument divided by the wordsize
+.PT .<num>
+mini with <num> as argument
+.PT .<num>W
+mini with <num>*wordsize as argument
+.PE 3
+<num> is a possibly negative integer.
+.VS 1 1
+.IE
+The dispatch table for the 256 primary opcodes:
+.DS B
+   0   loc.0    loc.1    loc.2    loc.3    loc.4    loc.5    loc.6    loc.7
+   8   loc.8    loc.9    loc.10   loc.11   loc.12   loc.13   loc.14   loc.15
+  16   loc.16   loc.17   loc.18   loc.19   loc.20   loc.21   loc.22   loc.23
+  24   loc.24   loc.25   loc.26   loc.27   loc.28   loc.29   loc.30   loc.31
+  32   loc.32   loc.33   aar.1W   adf.s0   adi.1W   adi.2W   adp.l    adp.1
+  40   adp.2    adp.s0   adp.s-1  ads.1W   and.1W   asp.1W   asp.2W   asp.3W
+  48   asp.4W   asp.5W   asp.w0   beq.l    beq.s0   bge.s0   bgt.s0   ble.s0
+  56   blm.s0   blt.s0   bne.s0   bra.l    bra.s-1  bra.s-2  bra.s0   bra.s1
+  64   cal.1    cal.2    cal.3    cal.4    cal.5    cal.6    cal.7    cal.8
+  72   cal.9    cal.10   cal.11   cal.12   cal.13   cal.14   cal.15   cal.16
+  80   cal.17   cal.18   cal.19   cal.20   cal.21   cal.22   cal.23   cal.24
+  88   cal.25   cal.26   cal.27   cal.28   cal.s0   cff.z    cif.z    cii.z
+  96   cmf.s0   cmi.1W   cmi.2W   cmp.z    cms.s0   csa.1W   csb.1W   dec.z
+ 104   dee.w0   del.w-1  dup.1W   dvf.s0   dvi.1W   fil.l    inc.z    ine.lw
+ 112   ine.w0   inl.-1W  inl.-2W  inl.-3W  inl.w-1  inn.s0   ior.1W   ior.s0
+ 120   lae.l    lae.w0   lae.w1   lae.w2   lae.w3   lae.w4   lae.w5   lae.w6
+ 128   lal.p    lal.n    lal.0    lal.-1   lal.w0   lal.w-1  lal.w-2  lar.W
+ 136   ldc.0    lde.lw   lde.w0   ldl.0    ldl.w-1  lfr.1W   lfr.2W   lfr.s0
+ 144   lil.w-1  lil.w0   lil.0    lil.1W   lin.l    lin.s0   lni.z    loc.l
+ 152   loc.-1   loc.s0   loc.s-1  loe.lw   loe.w0   loe.w1   loe.w2   loe.w3
+ 160   loe.w4   lof.l    lof.1W   lof.2W   lof.3W   lof.4W   lof.s0   loi.l
+ 168   loi.1    loi.1W   loi.2W   loi.3W   loi.4W   loi.s0   lol.pw   lol.nw
+ 176   lol.0    lol.1W   lol.2W   lol.3W   lol.-1W  lol.-2W  lol.-3W  lol.-4W
+ 184   lol.-5W  lol.-6W  lol.-7W  lol.-8W  lol.w0   lol.w-1  lxa.1    lxl.1
+ 192   lxl.2    mlf.s0   mli.1W   mli.2W   rck.1W   ret.0    ret.1W   ret.s0
+ 200   rmi.1W   sar.1W   sbf.s0   sbi.1W   sbi.2W   sdl.w-1  set.s0   sil.w-1
+ 208   sil.w0   sli.1W   ste.lw   ste.w0   ste.w1   ste.w2   stf.l    stf.W
+ 216   stf.2W   stf.s0   sti.1    sti.1W   sti.2W   sti.3W   sti.4W   sti.s0
+ 224   stl.pw   stl.nw   stl.0    stl.1W   stl.-1W  stl.-2W  stl.-3W  stl.-4W
+ 232   stl.-5W  stl.w-1  teq.z    tgt.z    tlt.z    tne.z    zeq.l    zeq.s0
+ 240   zeq.s1   zer.s0   zge.s0   zgt.s0   zle.s0   zlt.s0   zne.s0   zne.s-1
+ 248   zre.lw   zre.w0   zrl.-1W  zrl.-2W  zrl.w-1  zrl.nw   escape1  escape2
+.DE 2
+The list of secondary opcodes (escape1):
+.N  1
+.DS  B
+   0   aar.l    aar.z    adf.l    adf.z    adi.l    adi.z    ads.l    ads.z
+   8   adu.l    adu.z    and.l    and.z    asp.lw   ass.l    ass.z    bge.l
+  16   bgt.l    ble.l    blm.l    bls.l    bls.z    blt.l    bne.l    cai.z
+  24   cal.l    cfi.z    cfu.z    ciu.z    cmf.l    cmf.z    cmi.l    cmi.z
+  32   cms.l    cms.z    cmu.l    cmu.z    com.l    com.z    csa.l    csa.z
+  40   csb.l    csb.z    cuf.z    cui.z    cuu.z    dee.lw   del.pw   del.nw
+  48   dup.l    dus.l    dus.z    dvf.l    dvf.z    dvi.l    dvi.z    dvu.l
+  56   dvu.z    fef.l    fef.z    fif.l    fif.z    inl.pw   inl.nw   inn.l
+  64   inn.z    ior.l    ior.z    lar.l    lar.z    ldc.l    ldf.l    ldl.pw
+  72   ldl.nw   lfr.l    lil.pw   lil.nw   lim.z    los.l    los.z    lor.s0
+  80   lpi.l    lxa.l    lxl.l    mlf.l    mlf.z    mli.l    mli.z    mlu.l
+  88   mlu.z    mon.z    ngf.l    ngf.z    ngi.l    ngi.z    nop.z    rck.l
+  96   rck.z    ret.l    rmi.l    rmi.z    rmu.l    rmu.z    rol.l    rol.z
+ 104   ror.l    ror.z    rtt.z    sar.l    sar.z    sbf.l    sbf.z    sbi.l
+ 112   sbi.z    sbs.l    sbs.z    sbu.l    sbu.z    sde.l    sdf.l    sdl.pw
+ 120   sdl.nw   set.l    set.z    sig.z    sil.pw   sil.nw   sim.z    sli.l
+ 128   sli.z    slu.l    slu.z    sri.l    sri.z    sru.l    sru.z    sti.l
+ 136   sts.l    sts.z    str.s0   tge.z    tle.z    trp.z    xor.l    xor.z
+ 144   zer.l    zer.z    zge.l    zgt.l    zle.l    zlt.l    zne.l    zrf.l
+ 152   zrf.z    zrl.pw   dch.z    exg.s0   exg.l    exg.z    lpb.z    gto.l
+.DE 2
+Finally, the list of opcodes with four byte arguments (escape2).
+.DS
+
+   0  loc
+.DE 0
+.BP
+.AP "AN EXAMPLE PROGRAM"
+.DS B
+ 1      program example(output);
+ 2      {This program just demonstrates typical EM code.}
+ 3      type rec = record r1: integer; r2:real; r3: boolean end;
+ 4      var mi: integer;  mx:real;  r:rec;
+ 5
+ 6      function sum(a,b:integer):integer;
+ 7      begin
+ 8        sum := a + b
+ 9      end;
+10
+11      procedure test(var r: rec);
+12      label 1;
+13      var i,j: integer;
+14          x,y: real;
+15          b: boolean;
+16          c: char;
+17          a: array[1..100] of integer;
+18
+19      begin
+20              j := 1;
+21              i := 3 * j + 6;
+22              x := 4.8;
+23              y := x/0.5;
+24              b := true;
+25              c := 'z';
+26              for i:= 1 to 100 do a[i] := i * i;
+27              r.r1 := j+27;
+28              r.r3 := b;
+29              r.r2 := x+y;
+30              i := sum(r.r1, a[j]);
+31              while i > 0 do begin j := j + r.r1; i := i - 1 end;
+32              with r do begin r3 := b;  r2 := x+y;  r1 := 0 end;
+33              goto 1;
+34      1:      writeln(j, i:6, x:9:3, b)
+35      end; {test}
+36      begin {main program}
+37        mx := 15.96;
+38        mi := 99;
+39        test(r)
+40      end.
+.DE 0
+.BP
+The EM code as produced by the Pascal-VU compiler is given below. Comments
+have been added manually.  Note that this code has already been optimized.
+.DS B
+  mes 2,2,2              ; wordsize 2, pointersize 2
+ .1
+  rom 't.p\e000'         ; the name of the source file
+  hol 552,-32768,0       ; externals and buf occupy 552 bytes
+  exp $sum               ; sum can be called from other modules
+  pro $sum,2             ; procedure sum; 2 bytes local storage
+  lin 8                  ; code from source line 8
+  ldl 0                  ; load two locals ( a and b )
+  adi 2                  ; add them
+  ret 2                  ; return the result
+  end 2                  ; end of procedure ( still two bytes local storage )
+ .2
+  rom 1,99,2             ; descriptor of array a[]
+  exp $test              ; the compiler exports all level 0 procedures
+  pro $test,226          ; procedure test, 226 bytes local storage
+ .3
+  rom 4.8F8              ; assemble Floating point 4.8 (8 bytes) in
+ .4                              ; global storage
+  rom 0.5F8              ; same for 0.5
+  mes 3,-226,2,2         ; compiler temporary not referenced by address
+  mes 3,-24,2,0          ; the same is true for i, j, b and c in test
+  mes 3,-22,2,0
+  mes 3,-4,2,0
+  mes 3,-2,2,0
+  mes 3,-20,8,0          ; and for x and y
+  mes 3,-12,8,0
+  lin 20                 ; maintain source line number
+  loc 1
+  stl -4                 ; j := 1
+  lni                    ; lin 21 prior to optimization
+  lol -4
+  loc 3
+  mli 2
+  loc 6
+  adi 2
+  stl -2                 ; i := 3 * j + 6
+  lni                    ; lin 22 prior to optimization
+  lae .3
+  loi 8
+  lal -12
+  sti 8                  ; x := 4.8
+  lni                    ; lin 23 prior to optimization
+  lal -12
+  loi 8
+  lae .4
+  loi 8
+  dvf 8
+  lal -20
+  sti 8                  ; y := x / 0.5
+  lni                    ; lin 24 prior to optimization
+  loc 1
+  stl -22                ; b := true
+  lni                    ; lin 25 prior to optimization
+  loc 122
+  stl -24                ; c := 'z'
+  lni                    ; lin 26 prior to optimization
+  loc 1
+  stl -2                 ; for i:= 1
+ 2
+  lol -2
+  dup 2
+  mli 2                  ; i*i
+  lal -224
+  lol -2
+  lae .2
+  sar 2                  ; a[i] :=
+  lol -2
+  loc 100
+  beq *3                 ; to 100 do
+  inl -2                 ; increment i and loop
+  bra *2
+ 3
+  lin 27
+  lol -4
+  loc 27
+  adi 2                  ; j + 27
+  sil 0                  ; r.r1 :=
+  lni                    ; lin 28 prior to optimization
+  lol -22                ; b
+  lol 0
+  stf 10                 ; r.r3 :=
+  lni                    ; lin 29 prior to optimization
+  lal -20
+  loi 16
+  adf 8                  ; x + y
+  lol 0
+  adp 2
+  sti 8                  ; r.r2 :=
+  lni                    ; lin 30 prior to optimization
+  lal -224
+  lol -4
+  lae .2
+  lar 2                  ; a[j]
+  lil 0                  ; r.r1
+  cal $sum               ; call now
+  asp 4                  ; remove parameters from stack
+  lfr 2                  ; get function result
+  stl -2                 ; i :=
+ 4
+  lin 31
+  lol -2
+  zle *5                 ; while i > 0 do
+  lol -4
+  lil 0
+  adi 2
+  stl -4                 ; j := j + r.r1
+  del -2                 ; i := i - 1
+  bra *4                 ; loop
+ 5
+  lin 32
+  lol 0
+  stl -226               ; make copy of address of r
+  lol -22
+  lol -226
+  stf 10                 ; r3 := b
+  lal -20
+  loi 16
+  adf 8
+  lol -226
+  adp 2
+  sti 8                  ; r2 := x + y
+  loc 0
+  sil -226               ; r1 := 0
+  lin 34                 ; note the absence of the unnecessary jump
+  lae 22                 ; address of output structure
+  lol -4
+  cal $_wri              ; write integer with default width
+  asp 4                  ; pop parameters
+  lae 22
+  lol -2
+  loc 6
+  cal $_wsi              ; write integer width 6
+  asp 6
+  lae 22
+  lal -12
+  loi 8
+  loc 9
+  loc 3
+  cal $_wrf              ; write fixed format real, width 9, precision 3
+  asp 14
+  lae 22
+  lol -22
+  cal $_wrb              ; write boolean, default width
+  asp 4
+  lae 22
+  cal $_wln              ; writeln
+  asp 2
+  ret 0                  ; return, no result
+  end 226
+  exp $_main
+  pro $_main,0           ; main program
+ .6
+  con 2,-1,22            ; description of external files
+ .5
+  rom 15.96F8
+  fil .1                 ; maintain source file name
+  lae .6                 ; description of external files
+  lae 0                  ; base of hol area to relocate buffer addresses
+  cal $_ini              ; initialize files, etc...
+  asp 4
+  lin 37
+  lae .5
+  loi 8
+  lae 2
+  sti 8                  ; mx := 15.96
+  lni                    ; lin 38 prior to optimization
+  loc 99
+  ste 0                  ; mi := 99
+  lni                    ; lin 39 prior to optimization
+  lae 10                 ; address of r
+  cal $test
+  asp 2
+  loc 0                  ; normal exit
+  cal $_hlt              ; cleanup and finish
+  asp 2
+  end 0
+  mes 5                  ; reals were used
+.DE 0
+The compact code corresponding to the above program is listed below.
+Read it horizontally, line by line, not column by column.
+Each number represents a byte of compact code, printed in decimal.
+The first two bytes form the magic word.
+.N 1
+.IS 3
+.DS B
+173   0 159 122 122 122 255 242   1 161 250 124 116  46 112   0
+255 156 245  40   2 245   0 128 120 155 249 123 115 117 109 160
+249 123 115 117 109 122  67 128  63 120   3 122  88 122 152 122
+242   2 161 121 219 122 255 155 249 124 116 101 115 116 160 249
+124 116 101 115 116 245 226   0 242   3 161 253 128 123  52  46
+ 56 255 242   4 161 253 128 123  48  46  53 255 159 123 245  30
+255 122 122 255 159 123  96 122 120 255 159 123  98 122 120 255
+159 123 116 122 120 255 159 123 118 122 120 255 159 123 100 128
+120 255 159 123 108 128 120 255  67 140  69 121 113 116  68  73
+116  69 123  81 122  69 126   3 122 113 118  68  57 242   3  72
+128  58 108 112 128  68  58 108  72 128  57 242   4  72 128  44
+128  58 100 112 128  68  69 121 113  98  68  69 245 122   0 113
+ 96  68  69 121 113 118 182  73 118  42 122  81 122  58 245  32
+255  73 118  57 242   2  94 122  73 118  69 220  10 123  54 118
+ 18 122 183  67 147  73 116  69 147   3 122 104 120  68  73  98
+ 73 120 111 130  68  58 100  72 136   2 128  73 120   4 122 112
+128  68  58 245  32 255  73 116  57 242   2  59 122  65 120  20
+249 123 115 117 109   8 124  64 122 113 118 184  67 151  73 118
+128 125  73 116  65 120   3 122 113 116  41 118  18 124 185  67
+152  73 120 113 245  30 255  73  98  73 245  30 255 111 130  58
+100  72 136   2 128  73 245  30 255   4 122 112 128  69 120 104
+245  30 255  67 154  57 142  73 116  20 249 124  95 119 114 105
+  8 124  57 142  73 118  69 126  20 249 124  95 119 115 105   8
+126  57 142  58 108  72 128  69 129  69 123  20 249 124  95 119
+114 102   8 134  57 142  73  98  20 249 124  95 119 114  98   8
+124  57 142  20 249 124  95 119 108 110   8 122  88 120 152 245
+226   0 155 249 125  95 109  97 105 110 160 249 125  95 109  97
+105 110 120 242   6 151 122 119 142 255 242   5 161 253 128 125
+ 49  53  46  57  54 255  50 242   1  57 242   6  57 120  20 249
+124  95 105 110 105   8 124  67 157  57 242   5  72 128  57 122
+112 128  68  69 219 110 120  68  57 130  20 249 124 116 101 115
+116   8 122  69 120  20 249 124  95 104 108 116   8 122 152 120
+159 124 160 255 159 125 255
+.DE 0
+.IE
+.MS T A 0
+.ME
+.BP
+.MS B A 0
+.ME
+.CT
--- a/doc/em/assem.nr
+++ b/doc/em/assem.nr
@@ -0,0 +1,756 @@
+.BP
+.SN 11
+.S1 "EM ASSEMBLY LANGUAGE"
+We use two representations for assembly language programs:
+one is in ASCII and the other is the compact assembly language.
+The latter needs less space than the first for the same program
+and therefore allows faster processing.
+Our only program accepting ASCII assembly
+language converts it to the compact form.
+All other programs expect compact assembly input.
+The first part of the chapter describes the ASCII assembly
+language and its semantics.
+The second part describes the syntax of the compact assembly
+language.
+The last part lists the EM instructions with the type of
+arguments allowed and an indication of the function.
+Appendix A gives a detailed description of the effect of all
+instructions in the form of a Pascal program.
+.S2 "ASCII assembly language"
+An assembly language program consists of a series of lines; each
+line may be blank, contain one (pseudo)instruction or contain one
+label.
+Input to the assembler is in lower case.
+Upper case is used in this
+document merely to distinguish keywords from the surrounding prose.
+A comment is allowed at the end of each line and starts with a semicolon ";".
+This kind of comment does not exist in the compact form.
+.A
+Labels must be placed all by themselves on a line and start in
+column 1.
+There are two kinds of labels: instruction labels and data labels.
+Instruction labels are unsigned positive integers.
+The scope of an instruction label is its procedure.
+.A
+The pseudoinstructions CON, ROM and BSS may be preceded by a
+line containing a
+1-8 character data label, the first character of which is a
+letter, period or underscore.
+The period may only be followed by
+digits; the others may be followed by letters, digits and underscores.
+The use of the character "." followed by a constant,
+which must be in the range 1 to 32767 (e.g. ".40") is recommended
+for compiler
+generated programs.
+These labels are considered as a special case and handled
+more efficiently in compact assembly language (see below).
+Note that a data label on its own or two consecutive labels are not
+allowed.
+.P
+Each statement may contain an instruction mnemonic or pseudoinstruction.
+These must begin in column 2 or later (not column 1) and must be followed
+by a space, tab, semicolon or LF.
+Everything on the line following a semicolon is
+taken as a comment.
+.P
+Each input file contains one module.
+A module may contain many procedures,
+which may be nested.
+A procedure consists of
+a PRO statement, a (possibly empty)
+collection of instructions and pseudoinstructions and finally an END
+statement.
+Pseudoinstructions are also allowed between procedures.
+They do not belong to a specific procedure.
+.P
+All constants in EM are interpreted in the decimal base.
+The ASCII assembly language accepts constant expressions
+wherever constants are allowed.
+The operators recognized are: +, -, *, % and / with the usual
+precedence order.
+Use of the parentheses ( and ) to alter the precedence order is allowed.
+.S3 "Instruction arguments"
+Unlike many other assembly languages, the EM assembly
+language requires all arguments of normal and pseudoinstructions
+to be either a constant or an identifier, but not a combination
+of these two.
+There is one exception to this rule: when a data label is used
+for initialization or as an instruction argument,
+expressions of the form 'label+constant' and 'label-constant'
+are allowed.
+This makes it possible to address, for example, the
+third word of a ten word BSS block
+directly.
+Thus LOE LABEL+4 is permitted and so is CON LABEL+3.
+The resulting address must be in the same fragment as the label.
+It is not allowed to add or subtract from instruction labels or procedure
+identifiers,
+which certainly is not a severe restriction and greatly aids
+optimization.
+.P
+Instruction arguments can be constants,
+data labels, data labels offset by a constant, instruction
+labels and procedure identifiers.
+The range of integers allowed depends on the instruction.
+Most instructions allow only integers
+(signed or unsigned)
+that fit in a word.
+Arguments used as offsets to pointers should fit in a
+pointer-sized integer.
+Finally, arguments to LDC should fit in a double-word integer.
+.P
+Several instructions have two possible forms:
+with an explicit argument and with an implicit argument on top of the stack.
+The size of the implicit argument is the wordsize.
+The implicit argument is always popped before all other operands.
+For example: 'CMI 4' specifies that two four-byte signed
+integers on top of the stack are to be compared.
+\&'CMI' without an argument expects a wordsized integer
+on top of the stack that specifies the size of the integers to
+be compared.
+Thus the following two sequences are equivalent:
+.N 2
+.TS
+center, tab(:) ;
+l r 30 l r.
+LDL:-10:LDL:-10
+LDL:-14:LDL:-14
+::LOC:4
+CMI:4:CMI:
+ZEQ:*1:ZEQ:*1
+.TE 2
+Section 11.1.6 shows the arguments allowed for each instruction.
+.S3 "Pseudoinstruction arguments"
+Pseudoinstruction arguments can be divided into two classes:
+initializers and others.
+The following initializers are allowed: signed integer constants,
+unsigned integer constants, floating-point constants, strings,
+data labels, data labels offset by a constant, instruction
+labels and procedure identifiers.
+.P
+Constant initializers in BSS, HOL, CON and ROM pseudoinstructions
+can be followed by a letter I, U or F.
+This indicator
+specifies the type of the initializer: Integer, Unsigned or Float.
+If no indicator is present I is assumed.
+The size of the object is the wordsize unless
+the indicator is followed by an integer specifying the
+object's size.
+This integer is governed by the same restrictions as for
+transfer of objects to/from memory.
+As in instruction arguments, initializers include expressions of the form:
+\&"LABEL+offset" and "LABEL-offset".
+The offset must be an unsigned decimal constant.
+The 'IUF' indicators cannot be used in the offsets.
+.P
+Data labels are referred to by their name.
+.P
+Strings are surrounded by double quotes (").
+Semicolons in a string do not indicate the start of a comment.
+In the ASCII representation the escape character \e (backslash)
+alters the meaning of subsequent character(s).
+This feature allows inclusion of zeroes, graphic characters and
+the double quote in the string.
+The following escape sequences exist:
+.DS
+.TS
+center, tab(:);
+l l l.
+newline:NL\|(LF):\en
+horizontal tab:HT:\et
+backspace:BS:\eb
+carriage return:CR:\er
+form feed:FF:\ef
+backslash:\e:\e\e
+double quote:":\e"
+bit pattern:\fBddd\fP:\e\fBddd\fP
+.TE
+.DE
+The escape \fBddd\fP consists of the backslash followed by 1,
+2, or 3 octal digits specifying the value of
+the desired character.
+If the character following a backslash is not one of those
+specified,
+the backslash is ignored.
+Example: CON "hello\e012\e0".
+Each string element initializes a single byte.
+The ASCII character set is used to map characters onto values.
+Strings are padded with zeroes up to a multiple of the wordsize.
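As an illustration of the string rules above, the following Python sketch turns the body of a string (the text between the double quotes) into the bytes it initializes. The function name `decode_string`, the default wordsize of 2, the masking of octal values to one byte and the treatment of a trailing lone backslash are assumptions made for this sketch; they are not taken from the EM assembler.

```python
# Escape letters and the byte values they stand for (ASCII).
SIMPLE_ESCAPES = {"n": 10, "t": 9, "b": 8, "r": 13, "f": 12, "\\": 92, '"': 34}
OCTAL_DIGITS = "01234567"


def decode_string(body, wordsize=2):
    """Decode escapes in an ASCII string body and pad it with zero bytes
    up to a multiple of the wordsize (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\" or i == len(body):
            # Ordinary character (assumption: a trailing backslash is kept).
            out.append(ord(ch))
            continue
        nxt = body[i]
        if nxt in OCTAL_DIGITS:
            # Bit pattern: 1, 2 or 3 octal digits.
            j = i
            while j < len(body) and j - i < 3 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)  # assumption: keep low byte
            i = j
        elif nxt in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[nxt])
            i += 1
        else:
            # Unknown escape: the backslash is ignored, the character kept.
            out.append(ord(nxt))
            i += 1
    out.extend(b"\0" * (-len(out) % wordsize))
    return bytes(out)


print(decode_string(r"hello\012\0"))
```

With a wordsize of 2, the example string from the text yields seven data bytes plus one byte of padding.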
+.P
+Instruction labels are referred to as *1, *2, etc.  in both branch
+instructions and as initializers.
+.P
+The notation $procname means the identifier for the procedure
+with the specified name.
+This identifier has the size of a pointer.
+.S3 Notation
+First, the notation used for the arguments, classes of
+instructions and pseudoinstructions.
+.IS 2
+.TS
+tab(:);
+l l l.
+<cst>:\&=:integer constant (current range -2**31..2**31-1)
+<dlb>:\&=:data label
+<arg>:\&=:<cst> or <dlb> or <dlb>+<cst> or <dlb>-<cst>
+<con>:\&=:integer constant, unsigned constant, floating-point constant
+<str>:\&=:string constant (surrounded by double quotes),
+<ilb>:\&=:instruction label
+::'*' followed by an integer in the range 0..32767.
+<pro>:\&=:procedure number ('$' followed by a procedure name)
+<val>:\&=:<arg>, <con>, <pro> or <ilb>.
+<par>:\&=:<val> or <str>
+<...>*:\&=:zero or more of <...>
+<...>+:\&=:one or more of <...>
+[...]:\&=:optional ...
+.TE
+.IE
+.S3 "Pseudoinstructions"
+.S4 Storage declaration
+Initialized global data is allocated by the pseudoinstruction CON,
+which needs at least one argument.
+For each argument, an integral number of words,
+determined by the argument type, is allocated and initialized.
+.P
+The pseudoinstruction ROM is the same as CON,
+except that it guarantees that the initialized words
+will not change during the execution of the program.
+This information allows optimizers to do
+certain calculations such as array indexing and
+subrange checking at compile time instead
+of at run time.
+.P
+The pseudoinstruction BSS allocates
+uninitialized global data or large blocks of data initialized
+by the same value.
+The first argument to this pseudo is the number
+of bytes required, which must be a multiple of the wordsize.
+The other arguments specify the value used for initialization and
+whether the initialization is only for convenience or a strict necessity.
+The pseudoinstruction HOL is similar to BSS in that it requests an
+(un)initialized global data block.
+Addressing of a HOL block, however, is quasi absolute.
+The first byte is addressed by 0,
+the second byte by 1 etc. in assembly language.
+The assembler/loader adds the base address of
+the HOL block to these numbers to obtain the
+absolute address in the machine language.
+.P
+The scope of a HOL block starts at the HOL pseudo and
+ends at the next HOL pseudo or at the end of a module,
+whichever comes first.
+Each instruction falls in the scope of at most one
+HOL block, the current HOL block.
+It is not allowed to have more than one HOL block per procedure.
+.P
+The alignment restrictions are enforced by the
+pseudoinstructions.
+All objects are aligned on a multiple of their size or the wordsize,
+whichever is smaller.
+Switching to another type of fragment or placing a label forces
+word-alignment.
+There are three types of fragments in global data space: CON, ROM and
+BSS/HOL.
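The alignment rule above amounts to rounding an offset up to a boundary of min(size, wordsize). A minimal Python sketch, in which the name `aligned_offset` and the default wordsize of 2 are assumptions made for illustration only:

```python
def aligned_offset(offset, size, wordsize=2):
    """Round offset up to the alignment boundary of an object of the
    given size: its own size or the wordsize, whichever is smaller."""
    boundary = min(size, wordsize)
    # Ceiling division, then scale back up to the boundary.
    return -(-offset // boundary) * boundary


print(aligned_offset(3, 8))  # an 8-byte object placed after 3 bytes of data
```

Under this rule a single byte never needs padding, and objects larger than a word are only word-aligned.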
+.N 2
+.IS 2
+.PS - 4
+.PT "BSS <cst1>,<val>,<cst2>"
+Reserve <cst1> bytes.
+<val> is the value used to initialize the area.
+<cst1> must be a multiple of the size of <val>.
+<cst2> is 0 if the initialization is not strictly necessary,
+1 if it is.
+.PT "HOL <cst1>,<val>,<cst2>"
+Idem, but all following absolute global data references will
+refer to this block.
+Only one HOL is allowed per procedure;
+it has to be placed before the first instruction.
+.PT "CON <val>+"
+Assemble global data words initialized with the <val> constants.
+.PT "ROM <val>+"
+Idem, but the initialized data will never be changed by the program.
+.PE
+.IE
+.S4 Partitioning
+Two pseudoinstructions partition the input into procedures:
+.IS 2
+.PS - 4
+.PT "PRO <pro>[,<cst>]"
+Start of procedure.
+<pro> is the procedure name.
+<cst> is the number of bytes for locals.
+The number of bytes for locals must be specified in the PRO or
+END pseudoinstruction.
+When specified in both, they must be identical.
+.PT "END  [<cst>]"
+End of Procedure.
+<cst> is the number of bytes for locals.
+The number of bytes for locals must be specified in either the PRO or
+END pseudoinstruction or both.
+.PE
+.IE
+.S4 Visibility
+Names of data and procedures in an EM module can either be
+internal or external.
+External names are known outside the module and are used to link
+several pieces of a program.
+Internal names are not known outside the modules they are used in.
+Other modules will not 'see' an internal name.
+.A
+To reduce the number of passes needed,
+it must be known at the first occurrence whether
+a name is internal or external.
+If the first occurrence of a name is in a definition,
+the name is considered to be internal.
+If the first occurrence of a name is a reference,
+the name is considered to be external.
+If the first occurrence is in one of the following pseudoinstructions,
+the effect of the pseudo has precedence.
+.IS 2
+.PS - 4
+.PT "EXA <dlb>"
+External name.
+<dlb> is known, possibly defined, outside this module.
+Note that <dlb> may be defined in the same module.
+.PT "EXP <pro>"
+External procedure identifier.
+Note that <pro> may be defined in the same module.
+.PT "INA <dlb>"
+Internal name.
+<dlb> is internal to this module and must be defined in this module.
+.PT "INP <pro>"
+Internal procedure.
+<pro> is internal to this module and must be defined in this module.
+.PE
+.IE
+.S4 Miscellaneous
+Two other pseudoinstructions provide miscellaneous features:
+.IS 2
+.PS - 4
+.PT "EXC <cst1>,<cst2>"
+Two blocks of instructions preceding this one are
+interchanged before being processed.
+<cst1> gives the number of lines of the first block.
+<cst2> gives the number of lines of the second one.
+Blank and pure comment lines do not count.
+.PT "MES <cst>[,<par>]*"
+A special type of comment.
+Used by compilers to communicate with the
+optimizer, assembler, etc. as follows:
+.VS 1 0
+.PS - 4
+.PT "MES 0"
+An error has occurred, stop further processing.
+.PT "MES 1"
+Suppress optimization.
+.PT "MES 2,<cst1>,<cst2>"
+Use wordsize <cst1> and pointer size <cst2>.
+.PT "MES 3,<cst1>,<cst2>,<cst3>,<cst4>"
+Indicates that a local variable is never referenced indirectly.
+Used to indicate that a register may be used for a specific
+variable.
+<cst1> is offset in bytes from AB if positive
+and offset from LB if negative.
+<cst2> gives the size of the variable.
+<cst3> indicates the class of the variable.
+The following values are currently recognized:
+.PS
+.PT 0
+The variable can be used for anything.
+.PT 1
+The variable is used as a loop index.
+.PT 2
+The variable is used as a pointer.
+.PT 3
+The variable is used as a floating point number.
+.PE 0
+<cst4> gives the priority of the variable;
+higher numbers indicate better candidates.
+.PT "MES 4,<cst>,<str>"
+Number of source lines in file <str> (for profiler).
+.PT "MES 5"
+Floating point used.
+.PT "MES 6,<val>*"
+Comment.  Used to provide comments in compact assembly language.
+.PT "MES 7,....."
+Reserved.
+.PT "MES 8,<pro>[,<dlb>]..."
+Library module. Indicates that the module may only be loaded
+if it is useful, that is, if it can satisfy any unresolved
+references during the loading process.
+May not be preceded by any other pseudo, except MES's.
+.PT "MES 9,<cst>"
+Guarantees that no more than <cst> bytes of parameters are
+accessed, either directly or indirectly.
+.PE 1
+.VS 1 1
+Each backend is free to skip irrelevant MES pseudos.
+.PE
+.IE
+.S2 "The Compact Assembly Language"
+The assembler accepts input in a highly encoded form.
+This form is intended to reduce the amount of file transport between the
+front ends, optimizers
+and back ends, and also to reduce the amount of storage required for
+libraries.
+Libraries are stored as archived compact assembly language, not machine
+language.
+.P
+When beginning to read the input, the assembler is in neutral state, and
+expects either a label or an instruction (including the pseudoinstructions).
+The meaning of the next byte(s) when in neutral state is as follows, where
+b1, b2
+etc. represent the succeeding bytes.
+.N 1
+.DS
+.TS
+tab(:) ;
+rw17 4 l.
+0:Reserved for future use
+1-129:Machine instructions, see Appendix A, alphabetical list
+130-149:Reserved for future use
+150-161:BSS,CON,END,EXA,EXC,EXP,HOL,INA,INP,MES,PRO,ROM
+162-179:Reserved for future pseudoinstructions
+180-239:Instruction labels 0 - 59  (180 is local label 0 etc.)
+240-244:See the Common Table below
+245-255:Not used
+.TE 1
+.DE 0
+After a label, the assembler is back in neutral state; it can immediately
+accept another label or an instruction in the next byte.
+No linefeeds are used to separate lines.
+.P
+If an opcode expects no arguments,
+the assembler is back in neutral state after
+reading the one byte containing the instruction number.
+If it has one or
+more arguments (only pseudos have more than 1), the arguments follow directly,
+encoded as follows:
+.N 1
+.IS 2
+.TS
+tab(:);
+r l.
+0-239:Offsets from -120 to 119
+
+240-255:See the Common Table below
+.TE 1
+Absence of an optional argument is indicated by a special
+byte.
+.IE 2
+.CS
+Common Table for Neutral State and Arguments
+.CE
+.TS
+tab(:);
+c c s c
+l8 l l8 l.
+class:bytes:description
+
+<ilb>:240:b1:Instruction label b1  (Not used for branches)
+<ilb>:241:b1 b2:16 bit instruction label  (256*b2 + b1)
+<dlb>:242:b1:Global label .0-.255, with b1 being the label
+<dlb>:243:b1 b2:Global label .0-.32767
+:::with 256*b2+b1 being the label
+<dlb>:244:<string>:Global symbol not of the form .nnn
+<cst>:245:b1 b2:16 bit constant
+<cst>:246:b1 b2 b3 b4:32 bit constant
+<cst>:247:b1 .. b8:64 bit constant
+<arg>:248:<dlb><cst>:Global label + (possibly negative) constant
+<pro>:249:<string>:Procedure name  (not including $)
+<str>:250:<string>:String used in CON or ROM (no quotes-no escapes)
+<con>:251:<cst><string>:Integer constant, size <cst> bytes
+<con>:252:<cst><string>:Unsigned constant, size <cst> bytes
+<con>:253:<cst><string>:Floating constant, size <cst> bytes
+:254::unused
+<end>:255::Delimiter for argument lists or
+:::indicates absence of optional argument
+.TE 1
+.P
+The bytes specifying the value of a 16, 32 or 64 bit constant
+are presented in two's complement notation, with the least
+significant byte first. For example: the value of a 32 bit
+constant is ((s4*256+b3)*256+b2)*256+b1, where s4 is b4-256 if
+b4 is greater than 127; otherwise s4 takes the value of b4.
+A <string> consists of a <cst> immediately followed by
+a sequence of bytes with length <cst>.
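The constant and string encodings described above can be sketched in Python as follows. This is a minimal illustration, not the assembler's actual code: the names decode_cst and decode_string are hypothetical, and the sketch assumes that a <cst> inside a <string> uses the same encoding as any other constant argument (the $foo example in this section suggests so).

```python
def decode_cst(data, pos):
    """Decode one <cst> at data[pos]; return (value, next_position).

    Bytes 0-239 encode the offsets -120..119 directly; 245, 246 and 247
    introduce 16, 32 and 64 bit constants, least significant byte first,
    in two's complement notation.
    """
    tag = data[pos]
    if tag < 240:
        return tag - 120, pos + 1
    sizes = {245: 2, 246: 4, 247: 8}
    if tag not in sizes:
        raise ValueError("byte %d does not start a <cst>" % tag)
    size = sizes[tag]
    raw = bytes(data[pos + 1:pos + 1 + size])
    if len(raw) != size:
        raise ValueError("truncated constant")
    return int.from_bytes(raw, "little", signed=True), pos + 1 + size


def decode_string(data, pos):
    """Decode one <string>: a <cst> length followed by that many bytes."""
    length, pos = decode_cst(data, pos)
    if length < 0 or pos + length > len(data):
        raise ValueError("bad string length")
    return bytes(data[pos:pos + length]), pos + length
```

A most significant byte of 128 already makes the value negative, which is why the sign test has to include 128 itself.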
+.P
+.ne 8
+The pseudoinstructions fall into several categories, depending on their
+arguments:
+.N 1
+.DS
+ Group 1 -- EXC, BSS, HOL have a known number of arguments
+ Group 2 -- EXA, EXP, INA, INP have a string as argument
+ Group 3 -- CON, MES, ROM have a variable number of various things
+ Group 4 -- END, PRO have a trailing optional argument.
+.DE 1
+Groups 1 and 2
+use the encoding described above.
+Group 3 also uses the encoding listed above, with an <end> byte after the
+last argument to indicate the end of the list.
+Group 4 uses
+an <end> byte if the trailing argument is not present.
+.N 2
+.IS 2
+.TS
+tab(|);
+l s l
+l s s
+l 2 lw(46) l.
+Example  ASCII|Example compact
+(LOC = 69, BRA = 18 here):
+
+2||182
+1||181
+ LOC|10|69 130
+ LOC|-10|69 110
+ LOC|300|69 245 44 1
+ BRA|*19|18 139
+300||241 44 1
+.3||242 3
+ CON|4,9,*2,$foo|151 124 129 240 2 249 123 102 111 111 255
+ CON|.35|151 242 35 255
+.TE 0
+.IE 0
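The example table above can be reproduced with a small encoder. The sketch below is illustrative only: the function names are hypothetical, LOC = 69 is taken from the example table, and it assumes that label definitions from 60 to 255 use byte 240 in neutral state, as the Common Table suggests.

```python
LOC = 69  # opcode number as used in the example table


def encode_cst(value):
    """Encode a constant argument as a list of byte values."""
    if -120 <= value <= 119:
        return [value + 120]              # single byte 0-239
    for tag, size in ((245, 2), (246, 4), (247, 8)):
        bound = 1 << (8 * size - 1)
        if -bound <= value < bound:
            # two's complement, least significant byte first
            return [tag] + list(value.to_bytes(size, "little", signed=True))
    raise ValueError("constant does not fit in 64 bits")


def encode_string(text):
    """Encode a <string>: its length as a <cst>, then the bytes."""
    return encode_cst(len(text)) + list(text)


def encode_label_definition(number):
    """Encode the definition of an instruction label in neutral state."""
    if 0 <= number <= 59:
        return [180 + number]             # bytes 180-239
    if 60 <= number <= 255:
        return [240, number]              # assumed: one-byte form
    if 256 <= number <= 65535:
        return [241, number & 255, number >> 8]
    raise ValueError("label number out of range")
```

For instance, [LOC] + encode_cst(300) gives the bytes 69 245 44 1 listed in the table, and encode_label_definition(300) gives 241 44 1.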
+.BP
+.S2 "Assembly language instruction list"
+.P
+For each instruction in the list the range of argument values
+in the assembly language is given.
+The column headed \fIassem\fP contains the mnemonics defined
+in 11.1.3.
+The following column specifies restrictions of the argument
+value.
+Addresses have to obey the restrictions mentioned in chapter 2.
+The classes of arguments
+are indicated by letters:
+.ds b \fBb\fP
+.ds c \fBc\fP
+.ds d \fBd\fP
+.ds g \fBg\fP
+.ds f \fBf\fP
+.ds l \fBl\fP
+.ds n \fBn\fP
+.ds w \fBw\fP
+.ds p \fBp\fP
+.ds r \fBr\fP
+.ds s \fBs\fP
+.ds z \fBz\fP
+.ds o \fBo\fP
+.ds - \fB-\fP
+.N 1
+.TS
+tab(:);
+c s l l
+l l 15 l l.
+\fIassem\fP:constraints:rationale
+
+\&\*c:cst:fits word:constant
+\&\*d:cst:fits double word:constant
+\&\*l:cst::local offset
+\&\*g:arg:>= 0:global offset
+\&\*f:cst::fragment offset
+\&\*n:cst:>= 0:counter
+\&\*s:cst:>0 , word multiple:object size
+\&\*z:cst:>= 0 , zero or word multiple:object size
+\&\*o:cst:>= 0 , word multiple or fraction:object size
+\&\*w:cst:> 0 , word multiple:object size *
+\&\*p:pro::pro identifier
+\&\*b:ilb:>= 0:label number
+\&\*r:cst:0,1,2:register number
+\&\*-:::no argument
+.TE 1
+.P
+The * at the rationale for \*w indicates that the object size
+can be given either as the instruction argument or on top of the stack.
+If the argument is omitted, the size is fetched from the
+stack;
+it is assumed to be a wordsized unsigned integer.
+Instructions that check for undefined integer or floating-point
+values and underflow or overflow
+are indicated below by (*).
+.N 1
+.DS B
+GROUP 1 - LOAD
+
+  LOC \*c : Load constant (i.e. push one word onto the stack)
+  LDC \*d : Load double constant ( push two words )
+  LOL \*l : Load word at \*l-th local (\*l<0) or parameter (\*l>=0)
+  LOE \*g : Load external word \*g
+  LIL \*l : Load word pointed to by \*l-th local or parameter
+  LOF \*f : Load offsetted (top of stack + \*f yields address)
+  LAL \*l : Load address of local or parameter
+  LAE \*g : Load address of external
+  LXL \*n : Load lexical (address of LB \*n static levels back)
+  LXA \*n : Load lexical (address of AB \*n static levels back)
+  LOI \*o : Load indirect \*o bytes (address is popped from the stack)
+  LOS \*w : Load indirect, \*w-byte integer on top of stack gives object size
+  LDL \*l : Load double local or parameter (two consecutive words are stacked)
+  LDE \*g : Load double external (two consecutive externals are stacked)
+  LDF \*f : Load double offsetted (top of stack + \*f yields address)
+  LPI \*p : Load procedure identifier
+
+GROUP 2 - STORE
+
+  STL \*l : Store local or parameter
+  STE \*g : Store external
+  SIL \*l : Store into word pointed to by \*l-th local or parameter
+  STF \*f : Store offsetted
+  STI \*o : Store indirect \*o bytes (pop address, then data)
+  STS \*w : Store indirect, \*w-byte integer on top of stack gives object size
+  SDL \*l : Store double local or parameter
+  SDE \*g : Store double external
+  SDF \*f : Store double offsetted
+
+GROUP 3 - INTEGER ARITHMETIC
+
+  ADI \*w : Addition (*)
+  SBI \*w : Subtraction (*)
+  MLI \*w : Multiplication (*)
+  DVI \*w : Division (*)
+  RMI \*w : Remainder (*)
+  NGI \*w : Negate (two's complement) (*)
+  SLI \*w : Shift left (*)
+  SRI \*w : Shift right (*)
+
+GROUP 4 - UNSIGNED ARITHMETIC
+
+  ADU \*w : Addition
+  SBU \*w : Subtraction
+  MLU \*w : Multiplication
+  DVU \*w : Division
+  RMU \*w : Remainder
+  SLU \*w : Shift left
+  SRU \*w : Shift right
+
+GROUP 5 - FLOATING POINT ARITHMETIC
+
+  ADF \*w : Floating add (*)
+  SBF \*w : Floating subtract (*)
+  MLF \*w : Floating multiply (*)
+  DVF \*w : Floating divide (*)
+  NGF \*w : Floating negate (*)
+  FIF \*w : Floating multiply and split integer and fraction part (*)
+  FEF \*w : Split floating number in exponent and fraction part (*)
+
+GROUP 6 - POINTER ARITHMETIC
+
+  ADP \*f : Add \*f to pointer on top of stack
+  ADS \*w : Add \*w-byte value and pointer
+  SBS \*w : Subtract pointers in same fragment and push diff as size \*w integer
+
+GROUP 7 - INCREMENT/DECREMENT/ZERO
+
+  INC \*- : Increment word on top of stack by 1 (*)
+  INL \*l : Increment local or parameter (*)
+  INE \*g : Increment external (*)
+  DEC \*- : Decrement word on top of stack by 1 (*)
+  DEL \*l : Decrement local or parameter (*)
+  DEE \*g : Decrement external (*)
+  ZRL \*l : Zero local or parameter
+  ZRE \*g : Zero external
+  ZRF \*w : Load a floating zero of size \*w
+  ZER \*w : Load \*w zero bytes
+
+GROUP 8 - CONVERT    (stack: source, source size, dest. size (top))
+
+  CII \*- : Convert integer to integer (*)
+  CUI \*- : Convert unsigned to integer (*)
+  CFI \*- : Convert floating to integer (*)
+  CIF \*- : Convert integer to floating (*)
+  CUF \*- : Convert unsigned to floating (*)
+  CFF \*- : Convert floating to floating (*)
+  CIU \*- : Convert integer to unsigned
+  CUU \*- : Convert unsigned to unsigned
+  CFU \*- : Convert floating to unsigned
+
+GROUP 9 - LOGICAL
+
+  AND \*w : Boolean and on two groups of \*w bytes
+  IOR \*w : Boolean inclusive or on two groups of \*w bytes
+  XOR \*w : Boolean exclusive or on two groups of \*w bytes
+  COM \*w : Complement (one's complement of top \*w bytes)
+  ROL \*w : Rotate left a group of \*w bytes
+  ROR \*w : Rotate right a group of \*w bytes
+
+GROUP 10 - SETS
+
+  INN \*w : Bit test on \*w byte set (bit number on top of stack)
+  SET \*w : Create singleton \*w byte set with bit n on (n is top of stack)
+
+GROUP 11 - ARRAY
+
+  LAR \*w : Load array element, descriptor contains integers of size \*w
+  SAR \*w : Store array element
+  AAR \*w : Load address of array element
+
+GROUP 12 - COMPARE
+
+  CMI \*w : Compare \*w byte integers, Push negative, zero, positive for <, = or >
+  CMF \*w : Compare \*w byte reals
+  CMU \*w : Compare \*w byte unsigneds
+  CMS \*w : Compare \*w byte values, can only be used for bit for bit equality test
+  CMP \*- : Compare pointers
+
+  TLT \*- : True if less, i.e. iff top of stack < 0
+  TLE \*- : True if less or equal, i.e. iff top of stack <= 0
+  TEQ \*- : True if equal, i.e. iff top of stack = 0
+  TNE \*- : True if not equal, i.e. iff top of stack non zero
+  TGE \*- : True if greater or equal, i.e. iff top of stack >= 0
+  TGT \*- : True if greater, i.e. iff top of stack > 0
+
+GROUP 13 - BRANCH
+
+  BRA \*b : Branch unconditionally to label \*b
+
+  BLT \*b : Branch less (pop 2 words, branch if top > second)
+  BLE \*b : Branch less or equal
+  BEQ \*b : Branch equal
+  BNE \*b : Branch not equal
+  BGE \*b : Branch greater or equal
+  BGT \*b : Branch greater
+
+  ZLT \*b : Branch less than zero (pop 1 word, branch negative)
+  ZLE \*b : Branch less or equal to zero
+  ZEQ \*b : Branch equal zero
+  ZNE \*b : Branch not zero
+  ZGE \*b : Branch greater or equal zero
+  ZGT \*b : Branch greater than zero
+
+GROUP 14 - PROCEDURE CALL
+
+  CAI \*- : Call procedure (procedure identifier on stack)
+  CAL \*p : Call procedure (with identifier \*p)
+  LFR \*s : Load function result
+  RET \*z : Return (function result consists of top \*z bytes)
+
+GROUP 15 - MISCELLANEOUS
+
+  ASP \*f : Adjust the stack pointer by \*f
+  ASS \*w : Adjust the stack pointer by \*w-byte integer
+  BLM \*z : Block move \*z bytes; first pop destination addr, then source addr
+  BLS \*w : Block move, size is in \*w-byte integer on top of stack
+  CSA \*w : Case jump; address of jump table at top of stack
+  CSB \*w : Table lookup jump; address of jump table at top of stack
+  DCH \*- : Follow dynamic chain, convert LB to LB of caller
+  DUP \*s : Duplicate top \*s bytes
+  DUS \*w : Duplicate top \*w bytes
+  EXG \*w : Exchange top \*w bytes
+  FIL \*g : File name (external 4 := \*g)
+  GTO \*g : Non-local goto, descriptor at \*g
+  LIM \*- : Load 16 bit ignore mask
+  LIN \*n : Line number (external 0 := \*n)
+  LNI \*- : Line number increment
+  LOR \*r : Load register (0=LB, 1=SP, 2=HP)
+  LPB \*- : Convert local base to argument base
+  MON \*- : Monitor call
+  NOP \*- : No operation
+  RCK \*w : Range check; trap on error
+  RTT \*- : Return from trap
+  SIG \*- : Trap errors to proc identifier on top of stack, -2 resets default
+  SIM \*- : Store 16 bit ignore mask
+  STR \*r : Store register (0=LB, 1=SP, 2=HP)
+  TRP \*- : Cause trap to occur (Error number on stack)
+.DE 0
--- a/doc/em/descr.nr
+++ b/doc/em/descr.nr
@ -0,0 +1,164 @@
+.SN 7
+.BP
+.S1 "DESCRIPTORS"
+Several instructions use descriptors, notably the range check instruction,
+the array instructions, the goto instruction and the case jump instructions.
+Descriptors reside in data space.
+They may be constructed at run time, but
+more often they are fixed and allocated in ROM data.
+.P
+All instructions using descriptors, except GTO, have as argument
+the size of the integers in the descriptor.
+All implementations have to allow integers of the size of a
+word in descriptors.
+All integers popped from the stack and used for indexing or comparing
+must have the same size as the integers in the descriptor.
+.S2 "Range check descriptors"
+Range check descriptors consist of two integers:
+.IS 2
+.PS 1 4 "" .
+.PT
+lower bound~~~~~~~signed
+.PT
+upper bound~~~~~~~signed
+.PE
+.IE
+The range check instruction checks an integer on the stack against
+these bounds and causes a trap if the value is outside the interval.
+The value itself is neither changed nor removed from the stack.
+.S2 "Array descriptors"
+Each array descriptor describes a single dimension.
+For multi-dimensional arrays, several array instructions are
+needed to access a single element.
+Array descriptors contain the following three integers:
+.IS 2
+.PS 1 4 "" .
+.PT
+lower bound~~~~~~~~~~~~~~~~~~~~~signed
+.PT
+upper bound - lower bound~~~~~~~unsigned
+.PT
+number of bytes per element~~~~~unsigned
+.PE
+.IE
+The array instructions LAR, SAR and AAR have the pointer to the start
+of the descriptor as operand on the stack.
+.sp
+The element A[I] is fetched as follows:
+.IS 2
+.PS 1 4 "" .
+.PT
+Stack the address of A  (e.g., using LAE or LAL)
+.PT
+Stack the value of I (n-byte integer)
+.PT
+Stack the pointer to the descriptor (e.g., using LAE)
+.PT
+LAR n (n is the size of the integers in the descriptor and I)
+.PE
+.IE
+All array instructions first pop the address of the descriptor
+and the index.
+If the index is not within the bounds specified, a trap occurs.
+Otherwise, (I~-~lower bound) is multiplied
+by the number of bytes per element (the third integer in the descriptor).  The result is added
+to the address of A and replaces A on the stack.
+.A
+At this point LAR, SAR and AAR diverge.
+AAR is finished.  LAR pops the address and fetches the data
+item,
+the size being specified by the descriptor.
+The usual restrictions for memory access must be obeyed.
+SAR pops the address and stores the
+data item now exposed.
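The address computation shared by LAR, SAR and AAR can be sketched as follows. This is a minimal model, not an implementation of any EM interpreter: the function name is hypothetical, the descriptor is modelled as a tuple of its three integers, and the trap is modelled as a Python exception.

```python
def array_element_address(base, index, descriptor):
    """Return the address of A[index] for one array dimension.

    descriptor = (lower bound, upper bound - lower bound, bytes per element)
    """
    lower, span, element_size = descriptor
    if not (lower <= index <= lower + span):
        # The EM machine traps here; an exception stands in for the trap.
        raise IndexError("array bound error")
    return base + (index - lower) * element_size
```

With an array A[1..10] of 4-byte elements at address 1000, the descriptor is (1, 9, 4) and A[3] lies at address 1008.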
+.S2 "Non-local goto descriptors"
+The GTO instruction provides a way of returning directly to any
+active procedure invocation.
+The argument of the instruction is the address of a descriptor
+containing three pointers:
+.IS 2
+.PS 1 4 "" .
+.PT
+value of PC after the jump
+.PT
+value of SP after the jump
+.PT
+value of LB after the jump
+.PE
+.IE
+GTO loads PC, SP and LB from the descriptor,
+thereby jumping to a procedure
+and removing zero or more frames from the stack.
+The LB, SP and PC in the descriptor must belong to a
+dynamically enclosing procedure,
+because some EM implementations will need to backtrack through
+the dynamic chain and use the implementation dependent data
+in frames to restore registers etc.
+.S2 "Case descriptors"
+The case jump instructions CSA and CSB both
+provide multiway branches selected by a case index.
+Both fetch two operands from the stack:
+first a pointer to the low address of the case descriptor
+and then the case index.
+CSA uses the case index as index in the descriptor table, but CSB searches
+the table for an occurrence of the case index.
+Therefore, the descriptors for CSA and CSB,
+as shown in figure 4, are different.
+All pointers in the table must be addresses of instructions in the
+procedure executing the case instruction.
+.P
+CSA selects the new PC by indexing.
+If the index, a signed integer, is greater than or equal to
+the lower bound and less than or equal to the upper bound,
+then fetch the new PC from the list of instruction pointers by indexing with
+index-lower.
+The table does not contain the value of the upper bound,
+but the value of upper-lower as an unsigned integer.
+If the index is out of bounds or if the fetched pointer is 0,
+then fetch the default instruction pointer.
+If the resulting PC is 0, then trap.
+.P
+CSB selects the new PC by searching.
+The table is searched for an entry with index value equal to the case index.
+That entry or, if none is found, the default entry contains the
+new PC.
+When the resulting PC is 0, a trap is performed.
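The two selection rules can be sketched as follows. This is a hedged model of the description above, not actual interpreter code: the function names are hypothetical, the descriptor fields are passed as separate Python values rather than read from memory, and a trap is modelled as an exception.

```python
def csa_target(index, default, lower, span, pointers):
    """Select the new PC by indexing (CSA).

    span is (upper - lower); pointers[i] belongs to case value lower + i.
    """
    target = 0
    if lower <= index <= lower + span:
        target = pointers[index - lower]
    if target == 0:            # out of bounds, or a zero table entry
        target = default
    if target == 0:
        raise RuntimeError("case trap")
    return target


def csb_target(index, default, entries):
    """Select the new PC by searching (CSB).

    entries is a sequence of (case value, instruction pointer) pairs.
    """
    target = default
    for value, pointer in entries:
        if value == index:
            target = pointer
            break
    if target == 0:
        raise RuntimeError("case trap")
    return target
```

Note the asymmetry that follows from the text: a zero pointer in a CSA table falls back to the default, while a zero pointer in a matching CSB entry leads directly to a trap.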
+.P
+The choice of which case instruction to use for
+each source language case statement
+is up to the front end.
+If the range of the index value is dense, i.e.,
+.DS
+(highest value - lowest value) / number of cases
+.DE 1
+is less than some threshold, then CSA is the obvious choice.
+If the range is sparse, CSB is better.
+.N 2
+.DS
+   |--------------------|        |--------------------|  high address
+   | pointer for upb    |        |    pointer n-1     |
+   |--------------------|        |-  -  -  -  -  -  - |
+   |         .          |        |     index  n-1     |
+   |         .          |        |--------------------|
+   |         .          |        |          .         |
+   |         .          |        |          .         |
+   |         .          |        |          .         |
+   |         .          |        |--------------------|
+   |         .          |        |    pointer  1      |
+   |--------------------|        |-  -  -  -  -  -  - |
+   | pointer for lwb+1  |        |     index   1      |
+   |--------------------|        |--------------------|
+   | pointer for lwb    |        |    pointer  0      |
+   |--------------------|        |-  -  -  -  -  -  - |
+   |   upper - lower    |        |     index   0      |
+   |--------------------|        |--------------------|
+   |    lower bound     |        | number of entries  |
+   |--------------------|        |--------------------|
+   |  default pointer   |        |  default pointer   |  low address
+   |--------------------|        |--------------------|
+
+       CSA descriptor                CSB descriptor
+
+
+      Figure 4. Descriptor layout for CSA and CSB
+.DE
--- a/doc/em/dspace.nr
+++ b/doc/em/dspace.nr
@ -0,0 +1,377 @@
+.BP
+.SN 4
+.S1 "DATA ADDRESS SPACE"
+The data address space is divided into three parts, called 'areas',
+each with its own addressing method:
+global data area,
+local data area (including the stack),
+and heap data area.
+These data areas must be part of the same
+address space because all data is accessed by
+the same type of pointers.
+.P
+Space for global data is reserved using several pseudoinstructions in the
+assembly language, as described in
+the next paragraph and chapter 11.
+The size of the global data area is fixed per program.
+.A
+Global data is addressed absolutely in the machine language.
+Many instructions are available to address global data.
+They all have an absolute address as argument.
+Examples are LOE, LAE and STE.
+.P
+Part of the global data area is initialized by the
+compiler; the
+rest is either not initialized at all or initialized
+with a fixed value, typically -32768 or 0.
+Part of the initialized global data may be made read-only
+if the implementation supports protection.
+.P
+The local data area is used as a stack,
+which grows from high to low addresses
+and contains some data for each active procedure
+invocation, called a 'frame'.
+The size of the local data area varies dynamically during
+execution.
+Below the current procedure frame resides the operand stack.
+The stack pointer SP always points to the bottom of
+the local data area.
+Local data is addressed by offsetting from the local base pointer LB.
+LB always points to the frame of the current procedure.
+Only the words of the current frame and the parameters
+can be addressed directly.
+Variables in other active procedures are addressed by following
+the chain of statically enclosing procedures using the LXL or LXA instruction.
+The variables in dynamically enclosing procedures can be
+addressed with the use of the DCH instruction.
+.A
+Many instructions have offsets to LB as argument,
+for instance LOL, LAL and STL.
+The arguments of these instructions range from -1 to some
+(negative) minimum
+for the access of local storage and from 0 to some (positive)
+maximum for parameter access.
+.P
+The procedure call instructions CAL and CAI each create a new frame
+on the stack.
+Each procedure has an assembly-time parameter specifying
+the number of bytes needed for local storage.
+This storage is allocated each time the procedure is called and
+must be a multiple of the wordsize.
+Each procedure, therefore, starts with a stack with the local variables
+already allocated.
+The return instructions RET and RTT remove a frame.
+The actual parameters must be removed by the calling procedure.
+.P
+RET may copy some words from the stack of
+the returning procedure to an unnamed 'function return area'.
+This area is available for 'READ-ONCE' access using the LFR instruction.
+The result of an LFR is only defined if the size used to fetch
+is identical to the size used in the last return.
+The instruction ASP, used to remove the parameters from the
+stack, the branch instruction BRA and the non-local goto
+instruction GTO are the only ones that leave the contents of
+the 'function return area' intact.
+All other instructions are allowed to destroy the function
+return area.
+Thus parameters can be popped before fetching the function result.
+The maximum size of all function return areas is
+implementation dependent,
+but should allow procedure instance identifiers and all
+implemented objects of type integer, unsigned, float
+and pointer to be returned.
+In most implementations
+the maximum size of the function return
+area is twice the pointer size,
+because we want to be able to handle 'procedure instance
+identifiers' which consist of a procedure identifier and the LB
+of a frame belonging to that procedure.
+.P
+The heap data area grows upwards, to higher numbered
+addresses.
+It is initially empty.
+The initial value of the heap pointer HP
+marks the low end.
+The heap pointer may be manipulated
+by the LOR and STR instructions.
+The heap can only be addressed indirectly,
+by pointers derived from previous values of HP.
+.S2 "Global data area"
+The initial size of the global data area is determined at assembly time.
+Global data is allocated by several
+pseudoinstructions in the EM assembly
+language.
+Each pseudoinstruction allocates one or more bytes.
+The bytes allocated for a single pseudo form
+a 'block'.
+A block differs from a fragment, because,
+under certain conditions, several blocks are allocated
+in a single fragment.
+This guarantees that the bytes of these blocks
+are consecutive.
+.P
+Global data is addressed absolutely in binary
+machine language.
+Most compilers, however,
+cannot assign absolute addresses to their global variables,
+especially not if the language
+allows programs to be composed of several separately compiled modules.
+The assembly language therefore allows the compiler to name
+the first address of a global data block with an alphanumeric label.
+Moreover, the only way to address such a named global data block
+in the assembly language is by using its name.
+It is the task of the assembler/loader to
+translate these labels into absolute addresses.
+These labels may also be used
+in CON and ROM pseudoinstructions to initialize pointers.
+.P
+The pseudoinstruction CON allocates initialized data.
+ROM acts like CON but indicates that the initialized data will
+not change during execution of the program.
+The pseudoinstruction BSS allocates a block of uninitialized
+or identically initialized
+data.
+The pseudoinstruction HOL is similar to BSS,
+but it alters the meaning of subsequent absolute addressing in
+the assembly language.
+.P
+Another type of global data is a small block,
+called the ABS block, with an implementation defined size.
+Storage in this type of block can only be addressed
+absolutely in assembly language.
+The first word has address 0 and is used to maintain the
+source line number.
+Special instructions LIN and LNI are provided to
+update this counter.
+A pointer at location 4 points to a string containing the
+current source file name.
+The instruction FIL can be used to update the pointer.
+.P
+All numeric arguments of the instructions that address
+the global data area refer to locations in the
+ABS block unless
+they are preceded by at least one HOL pseudo in the same
+module,
+in which case they refer to the storage area allocated by the
+last HOL pseudoinstruction.
+Thus LOE 0 loads the zeroth word of the most recent HOL, unless no HOL has
+appeared in the current file so
+far, in which case it loads the zeroth word of the
+ABS fragment.
+.P
+The global data area is highly fragmented.
+The ABS block and each HOL and BSS block are separate fragments.
+The way fragments are formed from CON and ROM blocks is more complex.
+The assemblers group several blocks into a single fragment.
+A fragment only contains blocks of the same type: CON or ROM.
+It is guaranteed that the bytes allocated for two consecutive CON pseudos are
+allocated consecutively in a single fragment, unless
+these CON pseudos are separated in the assembly language program
+by a data label definition or one or more of the following pseudos:
+.DS
+
+     ROM, BSS, HOL and END
+
+.DE
+An analogous rule holds for ROM pseudos.
+.S2 "Local data area"
+The local data area consists of a sequence of frames, one for
+each active procedure.
+Below the frame of the current procedure resides the
+expression stack.
+Frames are generated by procedure calls and are
+removed by procedure returns.
+A procedure frame consists of six 'zones':
+.DS
+
+  1.  The return status block
+  2.  The local variables and compiler temporaries
+  3.  The register save block
+  4.  The dynamic local generators
+  5.  The operand stack
+  6.  The parameters of a procedure one level deeper
+
+.DE
+A sample frame is shown in Figure 1.
+.P
+Before a procedure call is performed the actual
+parameters are pushed onto the stack of the calling procedure.
+The exact details are compiler dependent.
+EM allows procedures to be called with a variable number of
+parameters.
+The implementation of the C language almost forces its runtime
+system to push the parameters in reverse order, that is,
+the first positional parameter last.
+Most compilers use the C calling convention to be compatible.
+The parameters of a procedure belong to the frame of the
+calling procedure.
+Note that the evaluation of the actual parameters may imply
+the calling of procedures.
+The parameters can be accessed with certain instructions using
+offsets of 0 and greater.
+The first byte of the last parameter pushed has offset 0.
+Note that the parameter at offset 0 has a special use in the
+instructions following the static chain (LXL and LXA).
+These instructions assume that this parameter contains the LB of
+the statically enclosing procedure.
+Procedures that do not have a statically enclosing procedure
+do not need a static link at offset 0.
+.P
+Two instructions are available to perform procedure calls, CAL
+and CAI.
+Several tasks are performed by these call instructions.
+.A
+First, a part of the status of the calling procedure is
+saved on the stack in the return status block.
+This block should contain the return address of the calling
+procedure, its LB and other implementation dependent data.
+The size of this block is fixed for any given implementation
+because the lexical instructions LPB, LXL and LXA must be able to
+obtain the base addresses of the procedure parameters \fBand\fP local
+variables.
+An alternative solution can be used on machines with a highly
+segmented address space.
+The stack frames need not be contiguous then and the first
+status save area can contain the parameter base AB,
+which has the value of SP just after the last parameter has
+been pushed.
+.A
+Second, the LB is changed to point to the
+first word above the local variables.
+The new LB is a copy of the SP after the return status
+block has been pushed.
+.A
+Third, the amount of local storage needed by the procedure is
+reserved.
+The parameters and local storage are accessed by the same instructions.
+Negative offsets are used for access to local variables.
+The highest byte, that is the byte nearest
+to LB, has to be accessed with offset -1.
+The pseudoinstruction specifying the entry point of a
+procedure, has an argument that specifies the amount of local
+storage needed.
+The local variables allocated by the CAI or CAL instructions
+are the only ones that can be accessed with a fixed negative offset.
+The initial value of the allocated words is
+not defined, but implementations that check for undefined
+values will probably initialize them with a
+special 'undefined' pattern, typically -32768.
+.A
+Fourth, any EM implementation is allowed to reserve a variable size
+block beneath the local variables.
+This block could, for example, be used to save a variable number
+of registers.
+.A
+Finally, the address of the entry point of the called procedure
+is loaded into the Program Counter.
+.P
+The ASP instruction can be used to allocate further (dynamic)
+local storage.
+The base address of such storage must be obtained with a LOR~SP
+instruction.
+This same instruction ASP may also be used
+to remove some words from the stack.
+.P
+There is a version of ASP, called ASS, which fetches the number
+of bytes to allocate from the stack.
+It can be used to allocate space for local
+objects whose size is unknown at compile time,
+so called 'dynamic local generators'.
+.P
+Control is returned to the calling procedure with a RET instruction.
+Any return value is then copied to the 'function return area'.
+The frame created by the call is deallocated and the status of
+the calling procedure is restored.
+The value of SP just after the return value has been popped must
+be the same as the
+value of SP just before executing the first instruction of this
+invocation.
+This means that when a RET is executed the operand stack can
+only contain the return value and all dynamically generated locals must be
+deallocated.
+Violating this restriction might result in hard to detect
+errors.
+The calling procedure has to remove the parameters from the stack.
+This can be done with the aforementioned ASP instruction.
+.P
+Each procedure frame is a separate fragment.
+Because any fragment may be placed anywhere in memory,
+procedure frames need not be contiguous.
+.DS
+                |===============================|
+                |     actual parameter  n-1     |
+                |-------------------------------|
+                |              .                |
+                |              .                |
+                |              .                |
+                |-------------------------------|
+                |     actual parameter  0       | ( <- AB )
+                |===============================|
+
+
+                |===============================|
+                |///////////////////////////////|
+                |///// return status block /////|
+                |///////////////////////////////|   <- LB
+                |===============================|
+                |                               |
+                |       local variables         |
+                |                               |
+                |-------------------------------|
+                |                               |
+                |      compiler temporaries     |
+                |                               |
+                |===============================|
+                |///////////////////////////////|
+                |///// register save block /////|
+                |///////////////////////////////|
+                |===============================|
+                |                               |
+                |   dynamic local generators    |
+                |                               |
+                |===============================|
+                |           operand             |
+                |-------------------------------|
+                |           operand             |
+                |===============================|
+                |         parameter  m-1        |
+                |-------------------------------|
+                |              .                |
+                |              .                |
+                |              .                |
+                |-------------------------------|
+                |         parameter  0          | <- SP
+                |===============================|
+
+          Figure 1. A sample procedure frame and parameters.
+.DE
+.S2 "Heap data area"
+The heap area starts empty, with HP
+pointing to the low end of it.
+HP always contains a word address.
+A copy of HP can always be obtained with the LOR instruction.
+A new value may be stored in the heap pointer using the STR instruction.
+If the new value is greater than the old one,
+then the heap grows.
+If it is smaller, then the heap shrinks.
+HP may never point below its original value.
+All words between the current HP and the original HP
+are allocated to the heap.
+The heap may not grow into a part of memory that is already allocated
+for the stack.
+When this is attempted, the STR instruction will cause a trap to occur.
+.P
+The only way to address the heap is indirectly.
+Whenever an object is allocated by increasing HP,
+then the old HP value must be saved and can be used later to address
+the allocated object.
+If, in the meantime, HP is decreased so that the object
+is no longer part of the heap, then an attempt to access
+the object is not allowed.
+Furthermore, if the heap pointer is increased again to above
+the object address, then access to the old object gives undefined results.
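+.P
+The rules above can be sketched as follows for a 10-byte object,
+assuming a wordsize and pointer size of 2 and assuming that
+register number 2 in LOR and STR denotes HP
+(the object size and the local offset are only illustrative):
+.DS
+ lor 2                  ; push the current HP, the address of the new object
+ dup 2                  ; keep one copy to address the object later
+ adp 10                 ; new HP: 10 bytes higher, still a word address
+ str 2                  ; grow the heap; traps if it would run into the stack
+ stl -2                 ; save the old HP in a local pointer variable
+.DE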
+.P
+The heap is a single fragment.
+All bytes have consecutive addresses.
+No limits are imposed on the size of the heap as long as it fits
+in the available data address space.
--- a/doc/em/even.c
+++ b/doc/em/even.c
@ -0,0 +1,9 @@
+main() {
+	register int l,j ;
+
+	for ( j=0 ; (l=getchar()) != -1 ; j++ ) {
+		if ( j%16 == 15 ) printf("%3d\n",l&0377 ) ;
+		else              printf("%3d ",l&0377 ) ;
+	}
+	printf("\n") ;
+}
--- a/doc/em/exam.e
+++ b/doc/em/exam.e
@ -0,0 +1,178 @@
+  mes 2,2,2              ; wordsize 2, pointersize 2
+ .1
+  rom 't.p\000'          ; the name of the source file
+  hol 552,-32768,0       ; externals and buf occupy 552 bytes
+  exp $sum               ; sum can be called from other modules
+  pro $sum,2             ; procedure sum; 2 bytes local storage
+  lin 8                  ; code from source line 8
+  ldl 0                  ; load two locals ( a and b )
+  adi 2                  ; add them
+  ret 2                  ; return the result
+  end 2                  ; end of procedure ( still two bytes local storage )
+ .2
+  rom 1,99,2             ; descriptor of array a[]
+  exp $test              ; the compiler exports all level 0 procedures
+  pro $test,226          ; procedure test, 226 bytes local storage
+ .3
+  rom 4.8F8              ; assemble Floating point 4.8 (8 bytes) in
+ .4                              ; global storage
+  rom 0.5F8              ; same for 0.5
+  mes 3,-226,2,2         ; compiler temporary not referenced indirect
+  mes 3,-24,2,0          ; the same is true for i, j, b and c in test
+  mes 3,-22,2,0
+  mes 3,-4,2,0
+  mes 3,-2,2,0
+  mes 3,-20,8,0          ; and for x and y
+  mes 3,-12,8,0
+  lin 20                 ; maintain source line number
+  loc 1
+  stl -4                 ; j := 1
+  lni                    ; was lin 21 prior to optimization
+  lol -4
+  loc 3
+  mli 2
+  loc 6
+  adi 2
+  stl -2                 ; i := 3 * j + 6
+  lni                    ; was lin 22 prior to optimization
+  lae .3
+  loi 8
+  lal -12
+  sti 8                  ; x := 4.8
+  lni                    ; was lin 23 prior to optimization
+  lal -12
+  loi 8
+  lae .4
+  loi 8
+  dvf 8
+  lal -20
+  sti 8                  ; y := x / 0.5
+  lni                    ; was lin 24 prior to optimization
+  loc 1
+  stl -22                ; b := true
+  lni                    ; was lin 25 prior to optimization
+  loc 122
+  stl -24                ; c := 'z'
+  lni                    ; was lin 26 prior to optimization
+  loc 1
+  stl -2                 ; for i:= 1
+ 2
+  lol -2
+  dup 2
+  mli 2                  ; i*i
+  lal -224
+  lol -2
+  lae .2
+  sar 2                  ; a[i] :=
+  lol -2
+  loc 100
+  beq *3                 ; to 100 do
+  inl -2                 ; increment i and loop
+  bra *2
+ 3
+  lin 27
+  lol -4
+  loc 27
+  adi 2                  ; j + 27
+  sil 0                  ; r.r1 :=
+  lni                    ; was lin 28 prior to optimization
+  lol -22                ; b
+  lol 0
+  stf 10                 ; r.r3 :=
+  lni                    ; was lin 29 prior to optimization
+  lal -20
+  loi 16
+  adf 8                  ; x + y
+  lol 0
+  adp 2
+  sti 8                  ; r.r2 :=
+  lni                    ; was lin 30 prior to optimization
+  lal -224
+  lol -4
+  lae .2
+  lar 2                  ; a[j]
+  lil 0                  ; r.r1
+  cal $sum               ; call now
+  asp 4                  ; remove parameters from stack
+  lfr 2                  ; get function result
+  stl -2                 ; i :=
+ 4
+  lin 31
+  lol -2
+  zle *5                 ; while i > 0 do
+  lol -4
+  lil 0
+  adi 2
+  stl -4                 ; j := j + r.r1
+  del -2                 ; i := i - 1
+  bra *4                 ; loop
+ 5
+  lin 32
+  lol 0
+  stl -226               ; make copy of address of r
+  lol -22
+  lol -226
+  stf 10                 ; r3 := b
+  lal -20
+  loi 16
+  adf 8
+  lol -226
+  adp 2
+  sti 8                  ; r2 := x + y
+  loc 0
+  sil -226               ; r1 := 0
+  lin 34                 ; note the absence of the unnecessary jump
+  lae 22                 ; address of output structure
+  lol -4
+  cal $_wri              ; write integer with default width
+  asp 4                  ; pop parameters
+  lae 22
+  lol -2
+  loc 6
+  cal $_wsi              ; write integer width 6
+  asp 6
+  lae 22
+  lal -12
+  loi 8
+  loc 9
+  loc 3
+  cal $_wrf              ; write fixed format real, width 9, precision 3
+  asp 14
+  lae 22
+  lol -22
+  cal $_wrb              ; write boolean, default width
+  asp 4
+  lae 22
+  cal $_wln              ; writeln
+  asp 2
+  ret 0                  ; return, no result
+  end 226
+  exp $_main
+  pro $_main,0           ; main program
+ .6
+  con 2,-1,22            ; description of external files
+ .5
+  rom 15.96F8
+  fil .1                 ; maintain source file name
+  lae .6                 ; description of external files
+  lae 0                  ; base of hol area to relocate buffer addresses
+  cal $_ini              ; initialize files, etc...
+  asp 4
+  lin 37
+  lae .5
+  loi 8
+  lae 2
+  sti 8                  ; mx := 15.96
+  lni                    ; was lin 38 prior to optimization
+  loc 99
+  ste 0                  ; mi := 99
+  lni                    ; was lin 39 prior to optimization
+  lae 10                 ; address of r
+  cal $test
+  asp 2
+  loc 0                  ; normal exit
+  cal $_hlt              ; cleanup and finish
+  asp 2
+  end 0
+  mes 4,40               ; length of source file is 40 lines
+  mes 5                  ; reals were used
--- a/doc/em/exam.p
+++ b/doc/em/exam.p
@ -0,0 +1,40 @@
+  program example(output);
+  {This program just demonstrates typical EM code.}
+  type rec = record r1: integer; r2:real; r3: boolean end;
+  var mi: integer;  mx:real;  r:rec;
+
+  function sum(a,b:integer):integer;
+  begin
+    sum := a + b
+  end;
+
+  procedure test(var r: rec);
+  label 1;
+  var   i,j: integer;
+	x,y: real;
+	b: boolean;
+	c: char;
+	a: array[1..100] of integer;
+
+  begin
+	j := 1;
+	i := 3 * j + 6;
+	x := 4.8;
+	y := x/0.5;
+	b := true;
+	c := 'z';
+	for i:= 1 to 100 do a[i] := i * i;
+	r.r1 := j+27;
+	r.r3 := b;
+	r.r2 := x+y;
+	i := sum(r.r1, a[j]);
+	while i > 0 do begin j := j + r.r1; i := i - 1 end;
+	with r do begin r3 := b;  r2 := x+y;  r1 := 0 end;
+	goto 1;
+  1:    writeln(j, i:6, x:9:3, b)
+  end; {test}
+  begin {main program}
+    mx := 15.96;
+    mi := 99;
+    test(r)
+  end.
--- a/doc/em/intro.nr
+++ b/doc/em/intro.nr
@ -0,0 +1,180 @@
+.BP
+.S1 "INTRODUCTION"
+EM is a family of intermediate languages designed for producing
+portable compilers.
+The general strategy is for a program called
+.B front end
+to translate the source program to EM.
+Another program, the
+.B back
+.BW end ,
+translates EM to target assembly language.
+Alternatively, the EM code can be assembled to a binary form
+and interpreted.
+These considerations led to the following goals:
+.IS 2 10
+.PS 1 4
+.PT
+The design should allow translation to,
+or interpretation on, a wide range of existing machines.
+Design decisions should be delayed as far as possible
+and the implications of these decisions should
+be localized as much as possible.
+.N
+The current microcomputer technology offers 8, 16 and 32 bit machines
+with various sizes of address space.
+EM should be flexible enough to be useful on most of these
+machines.
+The differences between the members of the EM family should only
+concern the wordsize and address space size.
+.PT
+The architecture should ease the task of code generation for
+high level languages such as Pascal, C, Ada, Algol 68, BCPL.
+.PT
+The instruction set used by the interpreter should be compact,
+to reduce the amount of memory needed
+for program storage, and to reduce the time needed to transmit
+programs over communication lines.
+.PT
+It should be designed with microprogrammed implementations in
+mind; in particular, the use of many short fields within
+instruction opcodes should be avoided, because their extraction by the
+microprogram or conversion to other instruction formats is inefficient.
+.PE
+.IE
+.A
+The basic architecture is based on the concept of a stack. The stack
+is used for procedure return addresses, actual parameters, local variables,
+and arithmetic operations.
+There are several built-in object types,
+for example, signed and unsigned integers,
+floating point numbers, pointers and sets of bits.
+There are instructions to push and pop objects
+to and from the stack.
+The push and pop instructions are not typed.
+They only care about the size of the objects.
+For each built-in type there are
+reverse Polish type instructions that pop one or more
+objects from the top of
+the stack, perform an operation, and push the result back onto the
+stack.
+For all types except pointers,
+these instructions have the object size
+as argument.
+.P
+There are no visible general registers used for arithmetic operands
+etc. This is in contrast to most third generation computers, which usually
+have 8 or 16 general registers. The decision not to have a group of
+general registers was fully intentional, and follows W.L. Van der
+Poel's dictum that a machine should have 0, 1, or an infinite
+number of any feature. General registers have two primary uses: to hold
+intermediate results of complicated expressions, e.g.
+.IS 5 0 1
+((a*b + c*d)/e + f*g/h) * i
+.IE 1
+and to hold local variables.
+.P
+Various studies
+have shown that the average expression has fewer than two operands,
+making the former use of registers of doubtful value. The present trend
+toward structured programs consisting of many small
+procedures greatly reduces the value of registers to hold local variables
+because the large number of procedure calls implies a large overhead in
+saving and restoring the registers at every call.
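+.P
+To illustrate the first use, the expression above needs no registers
+when it is evaluated on the stack.
+A sketch, assuming a wordsize of 2 and 2-byte integer locals a to i
+at the hypothetical offsets -2 to -18:
+.DS
+ lol -2                 ; a
+ lol -4                 ; b
+ mli 2                  ; a*b
+ lol -6                 ; c
+ lol -8                 ; d
+ mli 2                  ; c*d
+ adi 2                  ; a*b + c*d
+ lol -10                ; e
+ dvi 2                  ; (a*b + c*d)/e
+ lol -12                ; f
+ lol -14                ; g
+ mli 2                  ; f*g
+ lol -16                ; h
+ dvi 2                  ; f*g/h
+ adi 2                  ; (a*b + c*d)/e + f*g/h
+ lol -18                ; i
+ mli 2                  ; the complete expression
+.DE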
+.BP
+.P
+Although there are no general purpose registers, there are a
+few internal registers with specific functions as follows:
+.IS 2
+.N 1
+.TS
+tab(:);
+l 1 l l.
+PC:-:Program Counter:Pointer to next instruction
+LB:-:Local Base:Points to base of the local variables \
+in the current procedure.
+SP:-:Stack Pointer:Points to the highest occupied word on the stack.
+HP:-:Heap Pointer:Points to the top of the heap area.
+.TE 1
+.IE
+.A
+Furthermore, reverse Polish code is much easier to generate than
+multi-register machine code, especially if highly efficient code is
+desired.
+When translating to assembly language the back end can make
+good use of the target machine's registers.
+An EM machine can
+achieve high performance by keeping part of the stack
+in high speed storage (a cache or microprogram scratchpad memory) rather
+than in primary memory.
+.P
+Again according to van der Poel's dictum,
+all EM instructions have zero or one argument.
+We believe that instructions needing two arguments
+can be split into two simpler ones.
+The simpler ones can probably be used in other
+circumstances as well.
+Moreover, these two instructions together often
+have a shorter encoding than the single
+instruction before.
+.P
+This document describes EM at three different levels:
+the abstract level, the assembly language level and
+the machine language level.
+.A
+The most important level is that of the abstract EM architecture.
+This level deals with the basic design issues.
+Only the functional capabilities of instructions are relevant, not their
+format or encoding.
+Most chapters of this document refer to the abstract level
+and it is explicitly stated whenever
+another level is described.
+.A
+The assembly language is intended for the compiler writer.
+It presents a more or less orthogonal instruction
+set and provides symbolic names for data.
+Moreover, it facilitates the linking of
+separately compiled 'modules' into a single program
+by providing several pseudoinstructions.
+.A
+The machine language is designed for interpretation with a compact
+program text and easy decoding.
+The binary representation of the machine language instruction set is
+far from orthogonal.
+Frequent instructions have a short opcode.
+The encoding is fully byte oriented.
+These bytes do not contain small bit fields, because
+bit fields would slow down decoding considerably.
+.P
+A common use for EM is for producing portable (cross) compilers.
+When used this way, the compilers produce
+EM assembly language as their output.
+To run the compiled program on the target machine,
+the back end translates the EM assembly language to
+the target machine's assembly language.
+When this approach is used, the format of the EM
+machine language instructions is irrelevant.
+On the other hand, when writing an interpreter for EM machine language
+programs, the interpreter must deal with the machine language
+and not with the symbolic assembly language.
+.P
+As mentioned above, the
+current microcomputer technology offers 8, 16 and 32 bit
+machines with address spaces ranging from 2\v'-0.5m'16\v'0.5m'
+to 2\v'-0.5m'32\v'0.5m' bytes.
+Having one size of pointers and integers restricts
+the usefulness of the language.
+We decided to have a different language for each combination of
+word and pointer size.
+All languages offer the same instruction set and differ only in
+memory alignment restrictions and the implicit size assumed in
+several instructions.
+The differences between the size combinations are slight:
+for example, the size of objects on the stack
+and the alignment restrictions.
+The wordsize is restricted to powers of 2 and
+the pointer size must be a multiple of the wordsize.
+Almost all programs handling EM will be parametrized with word
+and pointer size.
--- a/doc/em/iotrap.nr
+++ b/doc/em/iotrap.nr
@ -0,0 +1,376 @@
+.SN 8
+.VS 1 0
+.BP
+.S1 "ENVIRONMENT INTERACTIONS"
+EM programs can interact with their environment in three ways.
+Two, starting/stopping and monitor calls, are dealt with in this chapter.
+The remaining way to interact, interrupts, will be treated
+together with traps in chapter 9.
+.S2 "Program starting and stopping"
+EM user programs start with a call to a procedure called
+m_a_i_n.
+The assembler and backends look for the definition of a procedure
+with this name in their input.
+The call passes three parameters to the procedure.
+The parameters are similar to the parameters supplied by the
+UNIX
+.FS
+UNIX is a Trademark of Bell Laboratories.
+.FE
+operating system to C programs.
+These parameters are often called
+.BW argc ,
+.B argv
+and
+.BW envp .
+Argc is the parameter nearest to LB and is a wordsized integer.
+The other two are pointers to the first element of an array of
+string pointers.
+.N
+The
+.B argv
+array contains
+.B argc
+strings, the first of which contains the program call name.
+The other strings in the
+.B argv
+array are the program parameters.
+.P
+The
+.B envp
+array contains strings in the form "name=string", where 'name'
+is the name of an environment variable and string its value.
+The
+.B envp
+is terminated by a zero pointer.
+.P
+An EM user program stops if the program returns from the first
+invocation of m_a_i_n.
+The contents of the function return area are used to procure a
+wordsized program return code.
+EM programs also stop when traps and interrupts occur that are
+not caught and when the exit monitor call is executed.
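+.P
+A minimal sketch of such a procedure, assuming a wordsize of 2
+and written in the style of the assembly examples elsewhere in this document
+(the return code 0 is only illustrative):
+.DS
+ exp $m_a_i_n           ; make the entry point visible outside this module
+ pro $m_a_i_n,0         ; no local storage
+ lol 0                  ; argc, the parameter nearest to LB
+ asp 2                  ; this sketch does not use it
+ loc 0                  ; program return code
+ ret 2                  ; copy it to the function return area and stop
+ end 0
+.DE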
+.S2 "Input/Output and other monitor calls"
+EM differs from most conventional machines in that it has high level i/o
+instructions.
+Typical instructions are OPEN FILE and READ FROM FILE instead
+of low level instructions such as setting and clearing
+bits in device registers.
+By providing such high level i/o primitives, the task of implementing
+EM on various non EM machines is made considerably easier.
+.P
+I/O is initiated by the MON instruction, which expects an iocode on top
+of the stack.
+Often there are also parameters which are pushed on the
+stack in reverse order, that is: last
+parameter first.
+Some i/o functions also provide results, which are returned on the stack.
+In the list of monitor calls we use several types of parameters and results.
+These types consist of integers and unsigneds of varying sizes, but never
+smaller than the wordsize, and the two pointer types.
+.N 1
+The names of the types used are:
+.IS 4
+.PS - 10
+.PT int
+an integer of wordsize
+.PT int2
+an integer whose size is the maximum of the wordsize and 2
+bytes
+.PT int4
+an integer whose size is the maximum of the wordsize and 4
+bytes
+.PT intp
+an integer with the size of a pointer
+.PT uns2
+an unsigned integer whose size is the maximum of the wordsize and 2
+bytes
+.PT unsp
+an unsigned integer with the size of a pointer
+.PT ptr
+a pointer into data space
+.PE 1
+.IE 0
+The table below lists the i/o codes with their results and
+parameters.
+This list is similar to the system calls of the UNIX Version 7
+operating system.
+.BP
+.A
+To execute a monitor call, proceed as follows:
+.IS 2
+.N 1
+.PS a 4 "" )
+.PT
+Stack the parameters, in reverse order, last parameter first.
+.PT
+Push the monitor call number (iocode) onto the stack.
+.PT
+Execute the MON instruction.
+.PE 1
+.IE
+An error code is present on the top of the stack after
+execution of most monitor calls.
+If this error code is zero, the call performed the action
+requested and the results are available on top of the stack.
+Non-zero error codes indicate a failure; in this case no
+results are available and the error code has been pushed twice.
+This construction enables programs to test for failure with a
+single instruction (~TEQ or TNE~) and still find out the cause of
+the failure.
+The result name 'e' is reserved for the error code.
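+.N 1
+The steps above can be sketched for the Close call (iocode 6 in the list below),
+assuming a wordsize of 2 and a file descriptor kept in a local at the
+hypothetical offset -2:
+.DS
+ lol -2                 ; fildes, the only parameter of Close
+ loc 6                  ; iocode of Close
+ mon                    ; execute the monitor call
+ zne *1                 ; pops e; branch if the call failed
+; success: Close has no further results
+ bra *2
+1                       ; failure: the second copy of e is still on the stack
+ stl -4                 ; save the error code (offset -4 is an assumption)
+2
+.DE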
+.N 1
+List of monitor calls.
+.DS B
+number name    parameters      results           function
+
+   1   Exit    status:int                        Terminate this process
+   2   Fork                    e,flag,pid:int    Spawn new process
+   3   Read    fildes:int;buf:ptr;nbytes:unsp
+                               e:int;rbytes:unsp Read from file
+   4   Write   fildes:int;buf:ptr;nbytes:unsp
+                               e:int;wbytes:unsp Write on a file
+   5   Open    string:ptr;flag:int
+                               e,fildes:int      Open file for read and/or write
+   6   Close   fildes:int      e:int             Close a file
+   7   Wait                    e:int;status,pid:int2
+                                                 Wait for child
+   8   Creat   string:ptr;mode:int
+                               e,fildes:int      Create a new file
+   9   Link    string1,string2:ptr
+                               e:int             Link to a file
+  10   Unlink  string:ptr      e:int             Remove directory entry
+  12   Chdir   string:ptr      e:int             Change default directory
+  14   Mknod   string:ptr;mode,addr:int2
+                               e:int             Make a special file
+  15   Chmod   string:ptr;mode:int2
+                               e:int             Change mode of file
+  16   Chown   string:ptr;owner,group:int2
+                               e:int             Change owner/group of a file
+  18   Stat    string,statbuf:ptr
+                               e:int             Get file status
+  19   Lseek   fildes:int;off:int4;whence:int
+                               e:int;oldoff:int4 Move read/write pointer
+  20   Getpid                  pid:int2          Get process identification
+  21   Mount   special,string:ptr;rwflag:int
+                               e:int             Mount file system
+  22   Umount  special:ptr     e:int             Unmount file system
+  23   Setuid  userid:int2     e:int             Set user ID
+  24   Getuid                  e_uid,r_uid:int2  Get user ID
+  25   Stime   time:int4       e:int             Set time and date
+  26   Ptrace  request:int;pid:int2;addr:ptr;data:int
+                               e,value:int       Process trace
+  27   Alarm   seconds:uns2    previous:uns2     Schedule signal
+  28   Fstat   fildes:int;statbuf:ptr
+                               e:int             Get file status
+  29   Pause                                     Stop until signal
+  30   Utime   string,timep:ptr
+                               e:int             Set file times
+  33   Access  string:ptr;mode:int e:int         Determine file accessibility
+  34   Nice    incr:int                          Set program priority
+  35   Ftime   bufp:ptr        e:int             Get date and time
+  36   Sync                                      Update filesystem
+  37   Kill    pid:int2;sig:int
+                               e:int             Send signal to a process
+  41   Dup     fildes,newfildes:int
+                               e,fildes:int      Duplicate a file descriptor
+  42   Pipe                    e,w_des,r_des:int Create a pipe
+  43   Times   buffer:ptr                        Get process times
+  44   Profil  buff:ptr;bufsiz,offset,scale:intp Execution time profile
+  46   Setgid  gid:int2        e:int             Set group ID
+  47   Getgid                  e_gid,r_gid:int   Get group ID
+  48   Sigtrp  trapno,signo:int
+                               e,prevtrap:int    See below
+  51   Acct    file:ptr        e:int             Turn accounting on or off
+  53   Lock    flag:int        e:int             Lock a process
+  54   Ioctl   fildes,request:int;argp:ptr
+                               e:int             Control device
+  56   Mpxcall cmd:int;vec:ptr e:int             Multiplexed file handling
+  59   Exece   name,argv,envp:ptr
+                               e:int             Execute a file
+  60   Umask   complmode:int2  oldmask:int2      Set file creation mode mask
+  61   Chroot  string:ptr      e:int             Change root directory
+.DE 1
+Codes 0, 11, 13, 17, 31, 32, 38, 39, 40, 45, 49, 50, 52,
+55, 57, 58, 62, and 63 are
+not used.
+.P
+All monitor calls, except fork and sigtrp
+are the same as the UNIX version 7 system calls.
+.P
+The sigtrp entry maps UNIX signals onto EM interrupts.
+Normally, trapno is in the range 0 to 252.
+In that case it requests that signal signo
+will cause trap trapno to occur.
+When given trap number -2, default signal handling is reset, and when given
+trap number -3, the signal is ignored.
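+.P
+As a sketch, the following fragment asks for signal 2
+(the interrupt signal in UNIX Version 7) to be delivered as trap 128,
+the first trap number available to user programs;
+both numbers are only illustrative choices:
+.DS
+ loc 2                  ; signo, the last parameter, is stacked first
+ loc 128                ; trapno
+ loc 48                 ; iocode of Sigtrp
+ mon                    ; results: e and prevtrap
+.DE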
+.P
+The flag returned by fork is 1 in the child process and 0 in
+the parent.
+The pid returned is the process-id of the other process.
+.BP
+.S1 "TRAPS AND INTERRUPTS"
+EM provides a means for the user program to catch all traps
+generated by the program itself, the hardware, or external conditions.
+This mechanism uses five instructions: LIM, SIM, SIG, TRP and RTT.
+This section of the manual may be omitted on the first reading since it
+presupposes knowledge of the EM instruction set.
+.P
+The action taken when a trap occurs is determined by the value
+of an internal EM trap register.
+This register contains a pointer to a procedure.
+Initially the pointer used is zero and all traps halt the
+program with, hopefully, a useful message to the outside world.
+The SIG instruction can be used to alter the trap register:
+it pops a procedure pointer from the
+stack into the trap register.
+When a trap occurs after storing a nonzero value in the trap
+register, the procedure pointed to by the trap register
+is called with the trap number
+as the only parameter (see below).
+SIG returns the previous value of the trap register on the
+stack.
+Two consecutive SIGs are a no-op.
+When a trap occurs, the trap register is reset to its initial
+condition, to prevent recursive traps from hanging the machine up,
+e.g. stack overflow in the stack overflow handling procedure.
+.P
+The runtime systems for some languages need to ignore some EM
+traps.
+EM offers a feature called the ignore mask.
+It contains one bit for each of the lowest 16 trap numbers.
+The bits are numbered 0 to 15, with the least significant bit
+having number 0.
+If a certain bit is 1 the corresponding trap never
+occurs and processing simply continues.
+The actions performed by the offending instruction are
+described by the Pascal program in appendix A.
+.N
+If the bit is 0, traps are not ignored.
+The instructions LIM and SIM allow copying and replacement of
+the ignore mask.~
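+.P
+Since trap number n corresponds to the mask bit with value 2 to the power n,
+a sketch that makes the machine ignore integer overflow
+(trap number 3 in the list of EM machine errors),
+assuming a wordsize of 2, is:
+.DS
+ lim                    ; push the current ignore mask
+ loc 8                  ; bit 3 has the value 8
+ ior 2                  ; set that bit in the mask
+ sim                    ; install the new mask
+.DE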
+.P
+The TRP instruction generates a trap, the trap number being found on the
+stack.
+This is, among other things,
+useful for library procedures and runtime systems.
+It can also be used by a low level trap procedure to pass the trap to a
+higher level one (see example below).
+.P
+The RTT instruction returns from the trap procedure and continues after the
+trap.
+In the list below all traps marked with an asterisk ('*') are
+considered to be fatal and it is explicitly undefined what happens if
+you try to restart after the trap.
+.P
+The way a trap procedure is called is completely compatible
+with normal calling conventions. The only way a trap procedure
+differs from normal procedures is the return. It has to use RTT instead
+of RET. This is necessary because the complete runtime status is saved on the
+stack before calling the procedure and all this status has to be reloaded.
+Error numbers are in the range 0 to 252.
+The trap numbers are divided into three categories:
+.IS 4
+.N 1
+.PS - 10
+.PT ~~0-~63
+EM machine errors, e.g. illegal instruction.
+.PS - 8
+.PT ~0-15
+maskable
+.PT 16-63
+not maskable
+.PE
+.PT ~64-127
+Reserved for use by compilers, run time systems, etc.
+.PT 128-252
+Available for user programs.
+.PE 1
+.IE
+EM machine errors are numbered as follows:
+.DS I 5
+.TS
+tab(@);
+n l l.
+0@EARRAY@Array bound error
+1@ERANGE@Range bound error
+2@ESET@Set bound error
+3@EIOVFL@Integer overflow
+4@EFOVFL@Floating overflow
+5@EFUNFL@Floating underflow
+6@EIDIVZ@Divide by 0
+7@EFDIVZ@Divide by 0.0
+8@EIUND@Undefined integer
+9@EFUND@Undefined float
+10@ECONV@Conversion error
+16*@ESTACK@Stack overflow
+17*@EHEAP@Heap overflow
+18*@EILLINS@Illegal instruction
+19*@EODDZ@Illegal size argument
+20*@ECASE@Case error
+21*@EMEMFLT@Addressing non existent memory
+22*@EBADPTR@Bad pointer used
+23*@EBADPC@Program counter out of range
+24@EBADLAE@Bad argument of LAE
+25@EBADMON@Bad monitor call
+26@EBADLIN@Argument of LIN too high
+27@EBADGTO@GTO descriptor error
+.TE
+.DE 0
+.P
+As an example,
+suppose a subprocedure has to be written to do a numeric
+calculation.
+When an overflow occurs the computation has to be stopped and
+the higher level procedure must be resumed.
+This can be programmed as follows using the mechanism described above:
+.DS B
+ mes 2,2,2              ; set sizes
+ersave
+ bss 2,0,0              ; Room to save previous value of trap procedure
+msave
+ bss 2,0,0              ; Room to save previous value of trap mask
+
+ pro calcule,0          ; entry point
+ lxl 0                  ; fill in non-local goto descriptor with LB
+ ste jmpbuf+4
+ lor 1                  ; and SP
+ ste jmpbuf+2
+ lim                    ; get current ignore mask
+ ste msave              ; save it
+ lim
+ loc 16                 ; bit for EFOVFL (trap number 4)
+ ior 2                  ; set in mask
+ sim                    ; ignore EFOVFL from now on
+ lpi $catch             ; load procedure identifier
+ sig                    ; catch will get all traps now
+ ste ersave             ; save previous trap procedure identifier
+; perform calculation now, possibly generating overflow
+1                       ; label jumped to by catch procedure
+ loe ersave             ; get old trap procedure
+ sig                    ; refer all following traps to old procedure
+ asp 2                  ; remove result of sig
+ loe msave              ; restore previous mask
+ sim                    ; done now
+; load result of calculation
+ ret 2                  ; return result
+jmpbuf
+ con *1,0,0
+ end
+.DE 0
+.VS 1 1
+.DS
+Example of catch procedure
+ pro catch,0            ; Local procedure that must catch the overflow trap
+ lol 2                  ; Load trap number
+ loc 4                  ; check for overflow
+ bne *1                 ; if other trap, call higher trap procedure
+ gto jmpbuf             ; return to procedure calcule
+1                       ; other trap has occurred
+ loe ersave             ; previous trap procedure
+ sig                    ; other procedure will get the traps now
+ asp 2                  ; remove the result of sig
+ lol 2                  ; stack trap number
+ trp                    ; call other trap procedure
+ rtt                    ; if other procedure returns, do the same
+ end
+.DE
--- a/doc/em/ip.awk
+++ b/doc/em/ip.awk
@ -0,0 +1,6 @@
+BEGIN { printf ".TS\nlw(6) lw(8) rw(3) rw(6) 14 lw(6) lw(8) rw(3) rw(6) 14 lw(6) lw(8) rw(3) rw(6).\n" }
+NF == 4 { printf "%s\t%s\t%d\t%d",$1,$2,$3,$4 }
+NF == 3 { printf "%s\t%s\t\t%d",$1,$2,$3 }
+ { if ( NR%3 == 0 ) printf("\n") ; else printf("\t"); }
+END { if ( NR%3 != 0 ) printf("\n")
+      printf ".TE\n" }
--- a/doc/em/ispace.nr
+++ b/doc/em/ispace.nr
@ -0,0 +1,61 @@
+.SN 3
+.BP
+.S1 "INSTRUCTION ADDRESS SPACE"
+The instruction space of the EM machine contains
+the code for procedures.
+Tables necessary for the execution of this code, for example, procedure
+descriptor tables, may also be present.
+The instruction space does not change during
+the execution of a program, so that it may be
+protected.
+No further restrictions to the instruction address space are
+necessary for the abstract and assembly language level.
+.P
+Each procedure has a single entry point: the first instruction.
+A special type of pointer identifies a procedure.
+Pointers into the instruction
+address space have the same size as pointers into data space and
+can, for example, contain the address of the first instruction
+or an index in a procedure descriptor table.
+.A
+There is a single EM program counter, PC, pointing
+to the next instruction to be executed.
+The procedure pointed to by PC is
+called the 'current' procedure.
+A procedure may call another procedure using the CAL or CAI
+instruction.
+The calling procedure remains 'active' and is resumed whenever the called
+procedure returns.
+Note that a procedure has several 'active' invocations when
+called recursively.
+.P
+Each procedure must return properly.
+It is not allowed to fall through to the
+code of the next procedure.
+There are several ways to exit from a procedure:
+.IS 3
+.PS
+.PT
+the RET instruction, which returns to the
+calling procedure.
+.PT
+the RTT instruction, which exits a trap handling routine and resumes
+the trapping instruction (see next chapter).
+.PT
+the GTO instruction, which is used for non-local goto's.
+It can remove several frames from the stack and transfer
+control to an active procedure.
+.PE
+.IE
+.P
+All branch instructions can transfer control
+to any label within the same procedure.
+Branch instructions can never jump out of a procedure.
+.P
+Several language implementations use a so-called procedure
+instance identifier, a combination of a procedure identifier and
+the LB of a stack frame, also called static link.
+.P
+The program text for each procedure, as well as any tables,
+are fragments and can be allocated anywhere
+in the instruction address space.
--- a/doc/em/itables
+++ b/doc/em/itables
--- a/doc/em/mach.nr
+++ b/doc/em/mach.nr
@ -0,0 +1,390 @@
+.BP
+.SN 10
+.S1 "EM MACHINE LANGUAGE"
+The EM machine language is designed to make program text compact
+and to make decoding easy.
+Compact program text has many advantages: programs execute faster,
+programs occupy less primary and secondary storage and loading
+programs into satellite processors is faster.
+The decoding of EM machine language is so simple,
+that it is feasible to use interpreters as long as EM hardware
+machines are not available.
+This chapter is irrelevant when back ends are used to
+produce executable target machine code.
+.S2 "Instruction encoding"
+A design goal of EM is to make the
+program text as compact as possible.
+Decoding must be easy, however.
+The encoding is fully byte oriented, without any small bit fields.
+There are 256 primary opcodes, two of which are an escape to
+two groups of 256 secondary opcodes each.
+.A
+EM instructions without arguments have a single opcode assigned,
+possibly escaped:
+.DS
+
+         |--------------|
+         |    opcode    |
+         |--------------|
+
+                or
+
+         |--------------|--------------|
+         |    escape    |     opcode   |
+         |--------------|--------------|
+
+.DE
+The encoding for instructions with an argument is more complex.
+Several instructions have an address from the global data area
+as argument.
+Other instructions have different opcodes for positive
+and negative arguments.
+.N 1
+There is always an opcode that takes the next two bytes as argument,
+high byte first:
+.DS
+
+         |--------------|--------------|--------------|
+         |    opcode    |    hibyte    |    lobyte    |
+         |--------------|--------------|--------------|
+
+                or
+
+         |--------------|--------------|--------------|--------------|
+         |    escape    |    opcode    |    hibyte    |    lobyte    |
+         |--------------|--------------|--------------|--------------|
+
+.DE
+.DS
+An extra escape is provided for instructions with four or eight byte arguments.
+
+  |--------------|--------------|--------------|   |--------------|
+  |    ESCAPE    |    opcode    |    hibyte    |...|    lobyte    |
+  |--------------|--------------|--------------|   |--------------|
+
+.DE
+For most instructions some argument values predominate.
+The most frequent combinations of instruction and argument
+will be encoded in a single byte, called a mini:
+.DS
+
+         |---------------|
+         |opcode+argument|  (mini)
+         |---------------|
+
+.DE
+The number of minis is restricted, because only
+254 primary opcodes are available.
+Many instructions have the bulk of their arguments
+fall in the range 0 to 255.
+Instructions that address global data have their arguments
+distributed over a wider range,
+but small values of the high byte are common.
+For all these cases there is another encoding
+that combines the instruction and the high byte of the argument
+into a single opcode.
+These opcodes are called shorties.
+Shorties may be escaped.
+.DS
+
+         |--------------|--------------|
+         | opcode+high  |    lobyte    |  (shortie)
+         |--------------|--------------|
+
+                or
+
+         |--------------|--------------|--------------|
+         |    escape    | opcode+high  |    lobyte    |
+         |--------------|--------------|--------------|
+
+.DE
+Escaped shorties are useless if the normal encoding has a primary opcode.
+Note that for some instruction-argument combinations
+several different encodings are available.
+It is the task of the assembler to select the shortest of these.
+The savings by these mini and shortie
+opcodes are considerable, about 55%.
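+.P
+The selection of the shortest encoding can be sketched in Python as follows.
+This is a minimal sketch under stated assumptions: the function name encode,
+the (first opcode, count) pairs and the opcode numbers used with it are
+hypothetical and are not taken from the EM code tables.
+It also assumes that a mini is formed by adding the argument to the first
+opcode of its range, and it handles non-negative arguments and unescaped
+opcodes only.
```python
def encode(arg, mini=None, shortie=None, two_byte=None):
    """Return the shortest available encoding of a non-negative argument.

    mini and shortie are hypothetical (first_opcode, count) pairs;
    two_byte is the opcode that takes a 2-byte argument, high byte first.
    """
    candidates = []
    if mini is not None:
        first, count = mini
        if 0 <= arg < count:
            # mini: opcode and argument share a single byte (assumed first + arg)
            candidates.append(bytes([first + arg]))
    if shortie is not None:
        first, count = shortie
        high, low = arg >> 8, arg & 0xFF
        if 0 <= high < count:
            # shortie: opcode combined with the high byte, then the low byte
            candidates.append(bytes([first + high, low]))
    if two_byte is not None and 0 <= arg <= 0x7FFF:
        # general form: opcode, hibyte, lobyte
        candidates.append(bytes([two_byte, arg >> 8, arg & 0xFF]))
    if not candidates:
        raise ValueError("no encoding available for argument %d" % arg)
    return min(candidates, key=len)
```
+With the made-up ranges mini=(10, 4), shortie=(20, 2) and two_byte=30,
+the argument 3 fits in one byte, 300 needs a shortie and 600 falls back
+to the three-byte form.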
+.P
+Further improvements are possible:
+the arguments of
+many instructions are a multiple of the wordsize.
+Some also do not allow zero as an argument.
+If these arguments are divided by the wordsize and,
+when zero is not allowed, then decremented by 1, more of them can
+be encoded as shortie or mini.
+The arguments of some other instructions
+rarely or never assume the value 0, but start at 1.
+The value 1 is then encoded as 0,
+2 as 1 and so on.
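+.P
+The two argument mappings can be sketched in Python as follows.
+The function name map_argument and its keyword parameters w and o are
+assumptions chosen to mirror the flags of the code tables; this is an
+illustration, not the assembler's actual code.
```python
def map_argument(arg, wordsize, w=False, o=False):
    """Map an instruction argument to the value stored in the opcode.

    w: the argument must be a multiple of the wordsize and is divided by it.
    o: the argument (after any division) must be >= 1 and is decremented.
    """
    if w:
        if arg % wordsize != 0:
            raise ValueError("argument is not a multiple of the wordsize")
        arg //= wordsize
    if o:
        if arg < 1:
            raise ValueError("argument must be at least 1")
        arg -= 1
    return arg
```
+For a wordsize of 2, an argument of 8 with both mappings applied becomes 3,
+which is more likely to fit in a mini or shortie than 8 itself.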
+.P
+Assigning opcodes to instructions by the assembler is completely
+table driven.
+For details see appendix B.
+.S2 "Procedure descriptors"
+The procedure identifiers used in the interpreter are indices
+into a table of procedure descriptors.
+Each descriptor contains:
+.IS 6
+.PS - 4
+.PT 1.
+the number of bytes to be reserved for locals at each
+invocation.
+.N
+This is a pointer-sized integer.
+.PT 2.
+the start address of the procedure
+.PE
+.IE
+.S2 "Load format"
+The EM machine language load format defines the interface between
+the EM assembler/loader and the EM machine itself.
+A load file consists of a header, the program text to be executed,
+a description of the global data area and the procedure descriptor table,
+in this order.
+All integers in the load file are presented with the
+least significant byte first.
+.P
+The header has two parts: the first half (eight 16-bit integers)
+aids in selecting
+the correct EM machine or interpreter.
+Some EM machines, for instance, may have hardware floating point
+instructions.
+.N
+The header entries are as follows (bit 0 is rightmost):
+.IS 2
+.VS 1 0
+.PS 1 4 "" :
+.PT
+magic number (07255)
+.PT
+flag bits with the following meaning:
+.PS - 7 "" :
+.PT bit 0
+TEST; test for integer overflow etc.
+.PT bit 1
+PROFILE; for each source line: count the number of memory
+cycles executed.
+.PT bit 2
+FLOW; for each source line: set a bit in a bit map table if
+instructions on that line are executed.
+.PT bit 3
+COUNT; for each source line: increment a counter if that line
+is entered.
+.PT bit 4
+REALS; set if a program uses floating point instructions.
+.PT bit 5
+EXTRA; more tests during compiler debugging.
+.PE
+.PT
+number of unresolved references.
+.PT
+version number; used to detect obsolete EM load files.
+.PT
+wordsize ; the number of bytes in each machine word.
+.PT
+pointer size ; the number of bytes available for addressing.
+.PT
+unused
+.PT
+unused
+.PE
+.IE
+The second part of the header (eight entries, of pointer size bytes each)
+describes the load file itself:
+.IS 2
+.PS 1 4 "" :
+.PT
+NTEXT; the program text size in bytes.
+.PT
+NDATA; the number of load-file descriptors (see below).
+.PT
+NPROC; the number of entries in the procedure descriptor table.
+.PT
+ENTRY; procedure number of the procedure to start with.
+.PT
+NLINE; the maximum source line number.
+.PT
+SZDATA; the address of the lowest uninitialized data byte.
+.PT
+unused
+.PT
+unused
+.PE
+.IE
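+.P
+Reading the two header parts could look like the following Python sketch.
+The names parse_header and FLAG_NAMES and the dictionary keys are
+assumptions made for this example, and only pointer sizes of 2 and 4 bytes
+are handled.
```python
import struct

MAGIC = 0o7255
FLAG_NAMES = ["TEST", "PROFILE", "FLOW", "COUNT", "REALS", "EXTRA"]

def parse_header(data):
    """Decode both header parts of a load file held in a bytes object."""
    # First part: eight 16-bit integers, least significant byte first.
    (magic, flags, unresolved, version,
     wordsize, ptrsize, _, _) = struct.unpack_from("<8H", data, 0)
    if magic != MAGIC:
        raise ValueError("not an EM load file")
    # Second part: eight entries of pointer size each (2 or 4 assumed here).
    fmt = {2: "<8H", 4: "<8I"}[ptrsize]
    (ntext, ndata, nproc, entry,
     nline, szdata, _, _) = struct.unpack_from(fmt, data, 16)
    return {
        "flags": [name for bit, name in enumerate(FLAG_NAMES)
                  if (flags >> bit) & 1],
        "unresolved": unresolved, "version": version,
        "wordsize": wordsize, "ptrsize": ptrsize,
        "NTEXT": ntext, "NDATA": ndata, "NPROC": nproc,
        "ENTRY": entry, "NLINE": nline, "SZDATA": szdata,
    }

# A made-up header for a 2-byte word, 2-byte pointer member of the family.
sample = (struct.pack("<8H", MAGIC, 0b10001, 0, 3, 2, 2, 0, 0)
          + struct.pack("<8H", 100, 4, 2, 0, 50, 64, 0, 0))
header = parse_header(sample)
```
+The sample values are invented; a real load file supplies its own.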
+.P
+The program text consists of NTEXT bytes.
+NTEXT is always a multiple of the wordsize.
+The first byte of the program text is the
+first byte of the instruction address
+space, i.e. it has address 0.
+Pointers into the program text are found in the procedure descriptor
+table, where relocation is simple, and in the global data area.
+The initialization of the global data area allows easy
+relocation of pointers into both address spaces.
+.P
+The global data area is described by the NDATA descriptors.
+Each descriptor describes a number of consecutive words (of~wordsize)
+and consists of a sequence of bytes.
+While reading the descriptors from the load file, one can
+initialize the global data area from low to high addresses.
+The size of the initialized data area is given by SZDATA;
+this number can be used to check the initialization.
+.N
+The header of each descriptor consists of a byte, describing the type,
+and a count.
+The number of bytes used for this (unsigned) count depends on the
+type of the descriptor and
+is either a pointer-sized integer
+or one byte.
+The meaning of the count depends on the descriptor type.
+At load time an interpreter can
+perform any conversion deemed necessary, such as
+reordering bytes in integers
+and pointers and adding base addresses to pointers.
+.BP
+.A
+In the following pictures we show a graphical notation of the
+initializers.
+The leftmost rectangle represents the leading byte.
+.N 1
+.DS
+.PS - 4 " "
+Fields marked with
+.N 1
+.PT n
+contain a pointer-sized integer used as a count
+.PT m
+contain a one-byte integer used as a count
+.PT b
+contain a one-byte integer
+.PT w
+contain a wordsized integer
+.PT p
+contain a data or instruction pointer
+.PT s
+contain a null terminated ASCII string
+.PE 1
+.DE 0
+.VS 1 1
+.DS
+
+    -------------------
+    | 0 |      n      |           repeat last initialization n times
+    -------------------
+.DE
+.DS
+    ---------
+    | 1 | m |                     m uninitialized words
+    ---------
+.DE
+.DS
+               ____________
+              /    bytes   \e
+    -----------------   -----
+    | 2 | m | b | b |...| b |     m initialized bytes
+    -----------------   -----
+.DE
+.DS
+               _________
+              /  word   \e
+    -----------------------
+    | 3 | m |      w      |...    m initialized wordsized integers
+    -----------------------
+.DE
+.DS
+               _________
+              / pointer \e
+    -----------------------
+    | 4 | m |      p      |...    m initialized data pointers
+    -----------------------
+.DE
+.DS
+               _________
+              / pointer \e
+    -----------------------
+    | 5 | m |      p      |...    m initialized instruction pointers
+    -----------------------
+.DE
+.DS
+               ____________
+              /    bytes   \e
+    -------------------------
+    | 6 | m | b | b |...| b |     initialized integer of size m
+    -------------------------
+.DE
+.DS
+               ____________
+              /    bytes   \e
+    -------------------------
+    | 7 | m | b | b |...| b |     initialized unsigned of size m
+    -------------------------
+.DE
+.DS
+               ____________
+              /   string   \e
+    -------------------------
+    | 8 | m |        s      |     initialized float of size m
+    -------------------------
+.DE 3
+.PS - 8
+.PT type~0:
+If the last initialization initialized k bytes starting
+at address \fIa\fP, do the same initialization again n times,
+starting at \fIa\fP+k, \fIa\fP+2*k, ..., \fIa\fP+n*k.
+This is the only descriptor whose starting byte
+is followed by an integer with the
+size of a
+pointer;
+in all other descriptors the first byte is followed by a one-byte count.
+This descriptor must be preceded by a descriptor of
+another type.
+.PT type~1:
+Reserve m words, not explicitly initialized (BSS and HOL).
+.PT type~2:
+The m bytes following the descriptor header are
+initializers for the next m bytes of the
+global data area.
+m is divisible by the wordsize.
+.PT type~3:
+The m words following the header are initializers for the next m words of the
+global data area.
+.PT type~4:
+The m data address space pointers following the header are
+initializers for the next
+m data pointers in the global data area.
+Interpreters that represent EM pointers by
+target machine addresses must relocate all data pointers.
+.PT type~5:
+The m instruction address space pointers following the header are
+initializers for the next
+m instruction pointers in the global data area.
+Interpreters that represent EM instruction pointers by
+target machine addresses must relocate these pointers.
+.PT type~6:
+The m bytes following the header form
+a signed integer number with a size of m bytes,
+which is an initializer for the next m bytes
+of the global data area.
+m is governed by the same restrictions as for
+transfer of objects to/from memory.
+.PT type~7:
+The m bytes following the header form
+an unsigned integer number with a size of m bytes,
+which is an initializer for the next m bytes
+of the global data area.
+m is governed by the same restrictions as for
+transfer of objects to/from memory.
+.PT type~8:
+The header is followed by an ASCII string, null terminated, to
+initialize, in global data,
+a floating point number with a size of m bytes.
+m is governed by the same restrictions as for
+transfer of objects to/from memory.
+The ASCII string contains the notation of a real as used in the
+Pascal language.
+.PE
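+.P
+As an illustration of the descriptor layout, the following Python sketch
+splits a byte string into descriptor records.
+The function name parse_descriptors, the record tuples and the sample
+bytes are assumptions made for this example; a real loader would also
+fill the global data area and relocate pointers.
```python
def parse_descriptors(buf, ndata, wordsize, ptrsize):
    """Split buf into ndata descriptor records (hypothetical tuple format)."""
    def integers(pos, count, size):
        # Load-file integers are stored least significant byte first.
        values = [int.from_bytes(buf[pos + i * size:pos + (i + 1) * size],
                                 "little") for i in range(count)]
        return values, pos + count * size

    records, pos = [], 0
    for _ in range(ndata):
        kind = buf[pos]
        pos += 1
        if kind == 0:                   # only type with a pointer-sized count
            (n,), pos = integers(pos, 1, ptrsize)
            records.append(("repeat", n))
            continue
        m = buf[pos]                    # one-byte count for all other types
        pos += 1
        if kind == 1:
            records.append(("uninitialized_words", m))
        elif kind == 2:
            records.append(("bytes", bytes(buf[pos:pos + m])))
            pos += m
        elif kind == 3:
            values, pos = integers(pos, m, wordsize)
            records.append(("words", values))
        elif kind in (4, 5):
            values, pos = integers(pos, m, ptrsize)
            name = "data_pointers" if kind == 4 else "instr_pointers"
            records.append((name, values))
        elif kind in (6, 7):
            value = int.from_bytes(buf[pos:pos + m], "little",
                                   signed=(kind == 6))
            records.append(("integer" if kind == 6 else "unsigned", m, value))
            pos += m
        elif kind == 8:
            end = buf.index(0, pos)     # the string is null terminated
            records.append(("float", m, buf[pos:end].decode("ascii")))
            pos = end + 1
        else:
            raise ValueError("unknown descriptor type %d" % kind)
    return records

# Made-up input: two words, a repeat count, four BSS words and one float.
sample = bytes([3, 2, 2, 1, 5, 0, 0, 3, 0, 1, 4, 8, 4]) + b"1.5\x00"
records = parse_descriptors(sample, 4, wordsize=2, ptrsize=2)
```
+The type~0 record is kept as a plain repeat count here; applying it to the
+preceding initialization is left to the caller.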
+.P
+The NPROC procedure descriptors on the load file consist of
+an instruction space address (of~pointer~size) and
+an integer (of~pointer~size) specifying the number of bytes for
+locals.
--- a/doc/em/macr.nr
+++ b/doc/em/macr.nr
@ -0,0 +1,16 @@
+.so /usr/lib/tmac/tmac.kun
+.SS 6
+.RP
+.PL 12i 11i
+.LL 89
+.MS T E
+\!.TL '%'''
+.ME
+.MS T O
+\!.TL '''%'
+.ME
+.MS B
+.sp 1
+.ME
+.SM S1 B
+.SM S2 B
--- a/doc/em/mapping.nr
+++ b/doc/em/mapping.nr
@ -0,0 +1,245 @@
+.SN 5
+.BP
+.S1 "MAPPING OF EM DATA MEMORY ONTO TARGET MACHINE MEMORY"
+The EM architecture is designed to be implemented
+on many existing and future machines.
+EM memory is highly fragmented to make
+adaptation to various memory architectures possible.
+Format and encoding of pointers are explicitly undefined.
+.P
+This chapter gives solutions to some of the
+anticipated problems.
+First, we describe a possible memory layout for machines
+with 64K bytes of address space.
+Here we use a member of the EM family with 2-byte word and pointer
+size.
+The most straightforward layout is shown in figure 2.
+.N 1
+.DS
+       65534 -> |-------------------------------|
+                |///////////////////////////////|
+                |//// unimplemented memory /////|
+                |///////////////////////////////|
+          ML -> |-------------------------------|
+                |                               |
+                |                               | <- LB
+                |     stack and local area      |
+                |                               |
+                |-------------------------------| <- SP
+                |///////////////////////////////|
+                |//////// inaccessible /////////|
+                |///////////////////////////////|
+                |-------------------------------| <- HP
+                |                               |
+                |           heap area           |
+                |                               |
+                |                               |
+          HB -> |-------------------------------|
+                |                               |
+                |       global data area        |
+                |                               |
+          EB -> |-------------------------------|
+                |                               |
+                |         program text          | <- PC
+                |                               |
+                |        ( and tables )         |
+                |                               |
+                |                               |
+          PB -> |-------------------------------|
+                |///////////////////////////////|
+                |////////// undefined //////////|
+                |///////////////////////////////|
+           0 -> |-------------------------------|
+
+           Figure 2.  Memory layout showing typical register
+           positions during execution of an EM program.
+.DE 2
+The base registers for the various memory pieces can be stored
+in target machine registers or memory.
+.IS
+.N 1
+.TS
+tab(;);
+l 1 l l l.
+PB;:;program base;points to the base of the instruction address space.
+EB;:;external base;points to the base of the data address space.
+HB;:;heap base;points to the base of the heap area.
+ML;:;memory limit;marks the high end of the addressable data space.
+.TE 1
+.IE
+The stack grows from high
+EM addresses to low EM addresses, and the heap the
+other way.
+The memory between SP and HP is not accessible,
+but may be allocated later to the stack or the heap if needed.
+The local data area is allocated starting at the high end of
+memory.
+.P
+Because EM address 0 is not mapped onto target
+address 0, a problem arises when pointers are used.
+If a program pushed a constant, say 6, onto the stack,
+and then tried to indirect through it,
+the wrong word would be fetched,
+because EM address 6 is mapped onto target address EB+6
+and not target address 6 itself.
+This particular problem is solved by explicitly declaring
+the format of a pointer to be undefined,
+so that using a constant as a pointer is completely illegal.
+However, the general problem of mapping pointers still exists.
+.P
+There are two possible solutions.
+In the first solution, EM pointers are represented
+in the target machine as true EM addresses,
+for example, a pointer to EM address 6 really is
+stored as a 6 in the target machine.
+This solution implies that every time a pointer is fetched
+EB must be added before referencing
+the target machine's memory.
+If the target machine has powerful indexing
+facilities, EB can be kept in a target machine register,
+and the relocation can indeed be done on
+every reference to the data address space
+at a modest cost in speed.
+.P
+The other solution consists of having EM pointers
+refer to the true target machine address.
+Thus the instruction LAE 6 (Load Address of External 6)
+would push the value of EB+6 onto the stack.
+When this approach is chosen, back ends must know
+how to offset from EB, to translate all
+instructions that manipulate EM addresses.
+However, the problem is not completely solved,
+because a front end may have to initialize a pointer
+in CON or ROM data to point to a global address.
+This pointer must also be relocated by the back end or the interpreter.
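+.P
+The difference between the two pointer representations can be sketched
+in Python.
+The value chosen for EB, the bytearray standing in for target memory and
+the function names are all assumptions made for this illustration.
```python
EB = 100                     # hypothetical base of the data address space
memory = bytearray(256)      # stand-in for target machine memory
memory[EB + 6] = 42          # the byte at EM address 6

def lae(offset, target_pointers):
    """Value pushed by LAE under each pointer representation."""
    return EB + offset if target_pointers else offset

def load_byte_em_pointer(ptr):
    # First solution: pointers hold EM addresses, EB is added on each access.
    return memory[EB + ptr]

def load_byte_target_pointer(ptr):
    # Second solution: pointers already hold target machine addresses.
    return memory[ptr]
```
+Both representations reach the same byte; they differ only in where the
+addition of EB takes place.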
+.P
+Although the EM stack grows from high to low EM addresses,
+some machines have hardware PUSH and POP
+instructions that require the stack to grow upwards.
+If reasons of efficiency urge you to use these
+instructions, then EM
+can be implemented with the memory layout
+upside down, as shown in figure 3.
+This is possible because the pointer format is explicitly undefined.
+The first element of a word array will have a
+lower physical address than the second element.
+.N 2
+.DS
+          |                 |                    |                 |
+          |      EB=60      |                    |        ^        |
+          |                 |                    |        |        |
+          |-----------------|                    |-----------------|
+      105 |   45   |   44   | 104            214 |   41   |   40   | 215
+          |-----------------|                    |-----------------|
+      103 |   43   |   42   | 102            212 |   43   |   42   | 213
+          |-----------------|                    |-----------------|
+      101 |   41   |   40   | 100            210 |   45   |   44   | 211
+          |-----------------|                    |-----------------|
+          |        |        |                    |                 |
+          |        v        |                    |      EB=255     |
+          |                 |                    |                 |
+
+                Type A                                 Type B
+.sp 2
+              Figure 3. Two possible memory implementations.
+                 Numbers within the boxes are EM addresses.
+                 The other numbers are physical addresses.
+.DE 2
+.A 0 0
+So, we have two different EM memory implementations:
+.IS
+.PS - 4
+.PT A~-
+stack downwards
+.PT B~-
+stack upwards
+.PE
+.IE
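+.P
+The address arithmetic of figure 3 can be written down as a small Python
+sketch.
+The function names physical and word_operand are assumptions for this
+example; the EB values are those of the figure.
```python
def physical(em_address, eb, layout):
    """Physical address of an EM byte address for layout 'A' or 'B'."""
    if layout == "A":        # stack downwards: addresses run the same way
        return eb + em_address
    if layout == "B":        # stack upwards: memory layout is upside down
        return eb - em_address
    raise ValueError("layout must be 'A' or 'B'")

def word_operand(em_address, eb, layout, wordsize=2):
    """Lowest physical address occupied by the word at em_address."""
    first = physical(em_address, eb, layout)
    last = physical(em_address + wordsize - 1, eb, layout)
    return min(first, last)
```
+For type B the word at EM address 40 occupies physical addresses 214 and
+215, so a word access uses EB-41.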
+.P
+For each of these two possibilities we give the translation of
+the EM instructions to push the third byte of a global data
+block starting at EM address 40 onto the stack and to load the
+word at address 40.
+All translations assume a word and pointer size of two bytes.
+The target machine used is a PDP-11 augmented with push and pop instructions.
+Registers 'r0' and 'r1' are used and suffer from sign extension for byte
+transfers.
+Push $40 means push the constant 40, not word 40.
+.P
+The translation of the EM instructions depends on the pointer representation
+used.
+For each of the two solutions explained above the translation is given.
+.P
+First, the translation for the two implementations using EM addresses as
+pointer representation:
+.DS
+.TS
+tab(:), center;
+l s l s l s
+_ s _ s _ s
+l 2 l 6 l 2 l 6 l 2 l.
+EM:type A:type B
+
+
+LAE:40:push:$40:push:$40
+
+ADP:3:pop:r0:pop:r0
+::add:$3,r0:add:$3,r0
+::push:r0:push:r0
+
+LOI:1:pop:r0:pop:r0
+::-::neg:r0
+::clr:r1:clr:r1
+::bisb:eb(r0),r1:bisb:eb(r0),r1
+::push:r1:push:r1
+
+LOE:40:push:eb+40:push:eb-41
+.TE
+.DE
+.BP
+.P
+The translation for the two implementations, if the target machine address is
+used as pointer representation, is:
+.N 1
+.DS
+.TS
+tab(:), center;
+l s l s l s
+_ s _ s _ s
+l 2 l 6 l 2 l 6 l 2 l.
+EM:type A:type B
+
+
+LAE:40:push:$eb+40:push:$eb-40
+
+ADP:3:pop:r0:pop:r0
+::add:$3,r0:sub:$3,r0
+::push:r0:push:r0
+
+LOI:1:pop:r0:pop:r0
+::clr:r1:clr:r1
+::bisb:(r0),r1:bisb:(r0),r1
+::push:r1:push:r1
+
+LOE:40:push:eb+40:push:eb-41
+.TE
+.DE
+.P
+The translation presented above is not intended to be optimal.
+Most machines can handle these simple cases in one or two instructions.
+It demonstrates, however, the flexibility of the EM design.
+.P
+There are several possibilities to implement EM on machines with
+address spaces larger than 64k bytes.
+For EM with two byte pointers one could allocate instruction and
+data space each in a separate 64k piece of memory.
+EM pointers still have to fit in two bytes,
+but the base registers PB and EB may be loaded in hardware registers
+wider than 16 bits, if available.
+EM implementations can also make efficient use of a machine
+with separate instruction and data space.
+.P
+EM with 32 bit pointers allows one to make use of machines
+with large address spaces.
+In a virtual, segmented memory system one could use a separate
+segment for each fragment.
--- a/doc/em/mem.nr
+++ b/doc/em/mem.nr
@ -0,0 +1,80 @@
+.BP
+.SN 2
+.S1 MEMORY
+The EM machine has two distinct address spaces,
+one for instructions and one for data.
+The data space is divided up into 8-bit bytes.
+The smallest addressable unit is a byte.
+Bytes are numbered consecutively from 0 to some maximum.
+All sizes in EM are expressed in bytes.
+.P
+Some EM instructions can transfer objects containing several bytes
+to and/or from memory.
+The size of all objects larger than a word must be a multiple of
+the wordsize.
+The size of all objects smaller than a word must be a divisor
+of the wordsize.
+For example: if the wordsize is 2 bytes, objects of the sizes 1,
+2, 4, 6,... are allowed.
+The address of such an object is the lowest address of all bytes it contains.
+For objects smaller than the wordsize, the
+address must be a multiple of the object size.
+For all other objects the address must be a multiple of the
+wordsize.
+For example, if an instruction transfers a 4-byte object to memory at
+location \fIm\fP and the wordsize is 2,
+\fIm\fP must be a multiple of 2 and the bytes at
+locations \fIm\fP, \fIm\fP\|+\|1,\fIm\fP\|+\|2 and
+\fIm\fP\|+\|3 are overwritten.
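+.P
+The size and alignment rules can be checked with a few lines of Python.
+The function names below are assumptions chosen for this sketch.
```python
def valid_size(size, wordsize):
    """True if an object of this size may be transferred to or from memory."""
    if size <= 0:
        return False
    if size >= wordsize:
        return size % wordsize == 0     # a multiple of the wordsize
    return wordsize % size == 0         # a divisor of the wordsize

def required_alignment(size, wordsize):
    """Small objects align to their own size, all others to the wordsize."""
    return size if size < wordsize else wordsize

def valid_address(address, size, wordsize):
    return (valid_size(size, wordsize)
            and address % required_alignment(size, wordsize) == 0)
```
+With a wordsize of 2, a 4-byte object may be stored at address 6 but not
+at address 5, while a single byte may be stored at either.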
+.P
+The size of almost all objects in EM
+is an integral number of words.
+Only two operations are allowed on
+objects whose size is a divisor of the wordsize:
+push it onto the stack and pop it from the stack.
+The addressing of these objects in memory is always indirect.
+If such a small object is pushed onto the stack
+it is assumed to be a small integer and stored
+in the least significant part of a word.
+The rest of the word is cleared to zero,
+although
+EM provides a way to sign-extend a small integer.
+Popping a small object from the stack removes a word
+from the stack, stores the least significant byte(s)
+of this word in memory and discards the rest of the word.
+.P
+The format of pointers into both address spaces is explicitly undefined.
+The size of a pointer, however, is fixed for a member of EM, so that
+the compiler writer knows how much storage to allocate for a pointer.
+.P
+A minor problem is raised by the undefined pointer format.
+Some languages, notably Pascal, require a special,
+otherwise illegal, pointer value to represent the nil pointer.
+The current Pascal-VU compiler uses the
+integer value 0 as nil pointer.
+This value is also used by many C programs as a normally impossible address.
+A better solution would be to have a special
+instruction loading an illegal pointer value,
+but it is hard to imagine an implementation
+for which the current solution is inadequate,
+especially because the first word in the EM data space
+is special and probably not the target of any pointer.
+.P
+The next two chapters describe the EM memory
+in more detail.
+One describes the instruction address space,
+the other the data address space.
+.P
+A design goal of EM has been to allow
+its implementation on a wide range of existing machines,
+as well as allowing a new one to be built in hardware.
+To this end we have tried to minimize the demands
+of EM on the memory structure of the target machine.
+Therefore, apart from the logical partitioning,
+EM memory is divided into 'fragments'.
+A fragment consists of consecutive machine
+words and has a base address and a size.
+Pointer arithmetic is only defined within a fragment.
+The only exception to this rule is comparison with the null
+pointer.
+All fragments must be word aligned.
--- a/doc/em/print
+++ b/doc/em/print
@ -0,0 +1,5 @@
+
+case $# in
+1)      make "$1".t ; ntlp "$1".t^lpr ;;
+*)      echo $0 needs an argument ;;
+esac
--- a/doc/em/show
+++ b/doc/em/show
@ -0,0 +1,4 @@
+case $# in
+1)      make $1.t ; ntout $1.t ;;
+*)      echo $0 needs an argument ;;
+esac
--- a/doc/em/title.nr
+++ b/doc/em/title.nr
@ -0,0 +1,38 @@
+.po 0
+.TP 1
+.ll 79
+.sp 15
+.ce 4
+DESCRIPTION OF A MACHINE
+ARCHITECTURE FOR  USE WITH
+BLOCK  STRUCTURED  LANGUAGES
+.sp 6
+.ce 4
+Andrew S. Tanenbaum
+Hans  van  Staveren
+Ed G. Keizer
+Johan  W. Stevenson\v'-0.5m'*\v'0.5m'
+.sp 2
+.ce
+August 1983
+.sp 2
+.ce
+Informatica Rapport IR-81
+.sp 13
+Abstract
+.sp 2
+.ti +5
+EM is a family of intermediate languages
+designed for producing portable compilers.
+A program called
+.B front end
+translates source programs to EM.
+Another program,
+.B back
+.BW end ,
+translates EM to the assembly language of the target machine.
+Alternatively, the EM program can be assembled to a highly
+efficient binary format for interpretation.
+This document describes the EM languages in detail.
+.sp 4
+\v'-0.5m'*\v'0.5m' Present affiliation: NV Philips, Eindhoven
--- a/doc/em/types.nr
+++ b/doc/em/types.nr
@ -0,0 +1,130 @@
+.SN 6
+.BP
+.S1 "TYPE REPRESENTATIONS"
+The representations used for typed objects are not precisely
+specified by EM.
+Sometimes we only specify that a typed object occupies a
+certain amount of space and state no further restrictions.
+If one wants to have a different representation of the value of
+an object on the stack one has to use a convert instruction
+in most cases.
+We do specify some relations between the representations of
+types.
+This allows some intermixed use of operators for different types
+on the same object(s).
+For example, the instruction ZER pushes signed and
+unsigned integers with the value zero and empty sets.
+ZER has as only argument the size of the object.
+.A
+The representation of floating point numbers is a good example:
+it allows widely varying implementations.
+The only ways to create floating point numbers are via
+initialization and via conversions from integer numbers.
+These numbers can be converted to human readable output only
+by using conversions to integers and by comparing
+two floating point numbers with each other.
+Implementations may use base 10, base 2 or any other
+base for exponents, and have freedom in choosing the range of
+exponent and mantissa.
+.A
+Other types are more precisely described.
+In the following paragraphs a description will be given of the
+restrictions imposed on the representation of the types used.
+A number \fBn\fP used in these paragraphs indicates the size of
+the object in \fIbits\fP.
+.S2 "Unsigned integers"
+The range of unsigned integers is 0..2\v'-0.5m'\fBn\fP\v'0.5m'-1.
+A binary representation is assumed.
+The order of the bits within an object is knowingly left
+unspecified.
+Discussing bit order within each 8-bit byte is academic,
+so the only real freedom of this specification lies in the byte
+order.
+We really do not care whether an implementation of a 4-byte
+integer has its bytes in a particular order of significance.
+This of course means that some sequences of instructions have
+unpredictable effects.
+For example:
+.DS
+   LOC 258 ; STL 0 ; LAL 0 ; LOI 1      ( wordsize >=2 )
+.DE
+The value on the stack after executing this sequence
+can be anything,
+but will most likely be 1 or 2.
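+.P
+The effect of the byte order on this sequence can be imitated in Python.
+The function name loi1_after_stl is an assumption for this sketch, which
+models only the two usual byte orders.
```python
def loi1_after_stl(value, wordsize, byteorder):
    """Byte found at the lowest address of a word holding value."""
    word = value.to_bytes(wordsize, byteorder)   # STL 0 stores the word
    return word[0]                               # LAL 0 ; LOI 1 fetch byte 0
```
+For a wordsize of 2 the result is 2 when the least significant byte is
+stored first and 1 when the most significant byte is stored first.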
+.A
+Conversions between unsigned integers of different sizes have to
+be done with explicit convert instructions.
+One cannot simply pad an unsigned integer with zeros at either end
+and expect a correct result.
+.A
+We assume existence of at least single word unsigned arithmetic
+in any implementation.
+.S2 "Signed Integers"
+The range of signed integers is -2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~2\v'-0.5m'\fBn\fP-1\v'0.5m'-1,
+in other words the range of signed integers of \fBn\fP bits
+using two's complement arithmetic.
+The representation is the same as for unsigned integers except
+the range 2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~2\v'-0.5m'\fBn\fP\v'0.5m'-1 is mapped on the
+range -2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~-1.
+In other words, the most significant bit is used as sign bit.
+The convert instructions between signed and unsigned integers
+of the same size can be used to catch errors.
+.A
+The value -2\v'-0.5m'\fBn\fP-1\v'0.5m' is used for undefined
+signed integers.
+EM implementations should trap when this value is used in an
+operation on signed integers.
+The instruction mask, accessed with SIM and LIM -~see chapter 9~- ,
+can be used to disable such traps.
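+.P
+The relation between the signed and unsigned ranges can be expressed in
+Python.
+The function names are assumptions for this sketch; \fBn\fP is the size
+in bits, as elsewhere in this chapter.
```python
def signed_range(n):
    """Smallest and largest signed integer of n bits."""
    return -(1 << (n - 1)), (1 << (n - 1)) - 1

def to_signed(u, n):
    """Signed value sharing its representation with the unsigned value u."""
    if not 0 <= u < (1 << n):
        raise ValueError("value does not fit in %d bits" % n)
    return u - (1 << n) if u >= (1 << (n - 1)) else u

def undefined_signed(n):
    """The value reserved for undefined signed integers."""
    return -(1 << (n - 1))
```
+For 16 bits the range is -32768 to 32767, and -32768 is the value
+reserved for undefined integers.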
+.A
+We assume existence of at least single word signed arithmetic
+in any implementation.
+.BP
+.S2 "Floating point values"
+Floating point values must have a signed mantissa and a signed
+exponent.
+Although no base is specified, base 2 is the normal choice,
+because the FEF instruction pushes the exponent in base 2.
+.A
+The implementation of floating point arithmetic is optional.
+The compilers currently in use have runtime parameters for the
+size of the floating point values they should use.
+Common choices are 4 and/or 8 bytes.
+.S2 Pointers
+EM has two kinds of pointers: for instruction and for data
+space.
+Each kind can only be used for its own space; conversion between
+these two subtypes is impossible.
+We assume that pointers have a range from 0 upwards.
+Any implementation may have holes in the pointer range between
+fragments.
+One can of course not expect to be able to address two megabytes
+of memory using a 2-byte pointer.
+Normally, a 2-byte pointer allows up to 65536 bytes of
+addressable memory.
+.A
+Pointer representation has one restriction.
+The pointer with the same representation as the integer zero of
+the same size should be invalid.
+Some languages and/or runtime systems represent the nil
+pointer as zero.
+.S2 "Bit sets"
+All bit sets of size \fBn\fP are subsets of the set
+{~i~|~i>=0,~i<\fBn\fP~}.
+A bit set contains a bit for each element showing its
+presence or absence.
+Bit sets are subdivided into words.
+The word with the lowest EM address governs the subset
+{~i~|~i>=0,~i<\fBm\fP~}, where \fBm\fP is the number of bits in
+a word.
+The next higher words each govern the next higher \fBm\fP set elements.
+The relation between a set with size of
+a word and an unsigned integer word is that
+the value of the unsigned integer is the summation of the
+2\v'-0.5m'i\v'0.5m' where i is in the set.
+.A
+Example: a 2-word bit set (wordsize 2) containing the
+elements 1, 6, 8, 15, 18, 21, 27 and 28 is composed of two
+integers, e.g. at addresses 40 and 42.
+The word at 40 contains the value 33090 (or~-32446),
+the word at 42 contains the value 6180.
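+.P
+The values in this example can be recomputed with a short Python sketch.
+The function name bitset_words is an assumption made for this
+illustration.
```python
def bitset_words(elements, nwords, wordsize):
    """Unsigned word values of a bit set, lowest EM address first."""
    bits = 8 * wordsize                  # number of bits in a word
    words = [0] * nwords
    for i in elements:
        if not 0 <= i < bits * nwords:
            raise ValueError("element %d is outside the set" % i)
        # element i sets bit (i mod bits) of word (i div bits)
        words[i // bits] |= 1 << (i % bits)
    return words

words = bitset_words([1, 6, 8, 15, 18, 21, 27, 28], nwords=2, wordsize=2)
```
+The first word is 2+64+256+32768 and the second is 4+32+2048+4096.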