Initial revision

1984-06-29 14:46:39 +00:00 · 1984-06-29 14:46:39 +00:00 · e0872423d9
commit e0872423d9
parent 253118db19
21 changed files with 7189 additions and 0 deletions
--- a/doc/em/addend.n
+++ b/doc/em/addend.n
--- a/doc/em/app.nr
+++ b/doc/em/app.nr
@@ -0,0 +1,488 @@
 .BP
 .AP "EM INTERPRETER"
 .nf
 .ta 8 16 24 32 40 48 56 64 72 80
 .so em.i
 .fi
 .BP
 .AP "EM CODE TABLES"
 The following table is used by the assembler for EM machine
 language.
 It specifies the opcodes used for each instruction and
 how arguments are mapped to machine language arguments.
 The table is presented in three columns;
 each line in each column contains three or four fields.
 Each line describes a range of interpreter opcodes by
 specifying the instruction for which the range is used, the type of the
 opcodes (mini, shortie, etc.) and the range of the instruction
 argument.
 .A
 The first field on each line gives the EM instruction mnemonic,
 the second field gives some flags.
 If the opcodes are minis or shorties the third field specifies
 how many minis/shorties are used.
 The last field gives the number of the (first) interpreter
 opcode.
 .N 1
 Flags :
 .IS 3
 .N 1
 Opcode type; only one of the following may be specified.
 .PS - 5 "  "
 .PT -
 opcode without argument
 .PT m
 mini
 .PT s
 shortie
 .PT 2
 opcode with 2-byte signed argument
 .PT 4
 opcode with 4-byte signed argument
 .PT 8
 opcode with 8-byte signed argument
 .PE
 Secondary (escaped) opcodes.
 .PS - 5 "  "
 .PT e
 The opcode thus marked is in the secondary opcode group instead
 of the primary one.
 .PE
 Restrictions on arguments.
 .PS - 5 "  "
 .PT N
 Negative arguments only
 .PT P
 Positive and zero arguments only
 .PE
 Mapping of arguments.
 .PS - 5 "  "
 .PT w
 argument must be divisible by the wordsize and is divided by the
 wordsize before use as opcode argument.
 .PT o
 argument (possibly after division) must be >= 1 and is
 decremented before use as opcode argument
 .PE
 .IE
 If the opcode type is 2, 4 or 8, the resulting argument is used as the
 opcode argument (least significant byte first).
 .N
 If the opcode type is mini, the argument is added
 to the first opcode, provided the result lies within the range
 of minis on that line.
 If the argument is negative, its absolute value minus one is
 used in the algorithm above.
 .N
 For shorties with positive arguments the first opcode is used
 for arguments in the range 0..255, the second for the range
 256..511, etc.
 For shorties with negative arguments the first opcode is used
 for arguments in the range -1..-256, the second for the range
 -257..-512, etc.
 The byte following the opcode contains the least significant
 byte of the argument.
 Some examples of these specifications:
 .PS - 5
 .PT "aar mwPo 1 34"
 Indicates that opcode 34 is used as a mini for positive
 instruction arguments only.
 The w and o indicate division and decrementing of the
 instruction argument.
 Because the resulting argument must be zero (only opcode 34 may be used),
 this mini can only be used for an instruction argument equal to the wordsize.
 Conclusion: with wordsize 2, opcode 34 is for "AAR 2".
 .PT "adp sP 1 41"
 Opcode 41 is used as a shortie for ADP with arguments in the range
 0..255.
 .PT "bra sN 2 60"
 Opcode 60 is used as a shortie for BRA with arguments -1..-256,
 61 is used for arguments -257..-512.
 .PT "zer e- 145"
 Escaped opcode 145 is used for ZER.
 .PE
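The mapping rules above can be sketched in C. This is a hypothetical illustration, not the assembler's own code: the struct opspec layout and the functions map_arg, encode_mini and encode_shortie are invented for this example, and applying the N/P restriction to the instruction argument before the w and o mappings is an assumption.

```c
#include <stdbool.h>

/* One line of the opcode table (hypothetical representation). */
struct opspec {
    bool w;      /* 'w' flag: argument divisible by wordsize, then divided */
    bool o;      /* 'o' flag: argument must be >= 1, then decremented */
    int sign;    /* +1 for 'P', -1 for 'N', 0 if unrestricted */
    int count;   /* number of minis or shorties on this line */
    int first;   /* first interpreter opcode */
};

/* Apply the N/P restriction (assumed to test the original argument)
 * and then the w and o mappings. */
static bool map_arg(const struct opspec *s, long arg, long wordsize, long *out)
{
    if (s->sign > 0 && arg < 0) return false;
    if (s->sign < 0 && arg >= 0) return false;
    if (s->w) {
        if (arg % wordsize != 0) return false;
        arg /= wordsize;
    }
    if (s->o) {
        if (arg < 1) return false;
        arg -= 1;
    }
    *out = arg;
    return true;
}

/* Mini: add the mapped argument to the first opcode; a negative
 * argument uses its absolute value minus one. */
bool encode_mini(const struct opspec *s, long arg, long wordsize, int *opcode)
{
    long v;
    if (!map_arg(s, arg, wordsize, &v)) return false;
    if (v < 0) v = -v - 1;
    if (v >= s->count) return false;
    *opcode = s->first + (int)v;
    return true;
}

/* Shortie: the opcode selects the 256-value range (0..255 or -1..-256
 * on the first opcode, 256..511 or -257..-512 on the second, ...);
 * the following byte holds the least significant byte. */
bool encode_shortie(const struct opspec *s, long arg, long wordsize,
                    int *opcode, unsigned char *low)
{
    long v, index;
    if (!map_arg(s, arg, wordsize, &v)) return false;
    index = v >= 0 ? v / 256 : (-v - 1) / 256;
    if (index >= s->count) return false;
    *opcode = s->first + (int)index;
    *low = (unsigned char)((unsigned long)v & 0xFFu);
    return true;
}
```

With wordsize 2, the "aar mwPo 1 34" line gives opcode 34 for AAR 2, and the "bra sN 2 60" line encodes BRA -300 as opcode 61 followed by the byte 212, the low-order byte of -300.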
 The interpreter opcode table:
 .N 1
 .IS 3
 .DS B
 .so itables
 .DE 0
 .IE
 .P
 The table above results in the following dispatch tables.
 Dispatch tables are used by interpreters to jump to the
 routines implementing the EM instructions, indexed by the next opcode.
 Each line of the dispatch tables gives the routine names
 of eight consecutive opcodes, preceded by the first opcode number
 on that line.
 Routine names consist of an EM mnemonic followed by a suffix.
 The suffixes show the encoding used for each opcode.
 .N
 The following suffixes exist:
 .N 1
 .VS 1 0
 .IS 4
 .PS - 11
 .PT .z
 no arguments
 .PT .l
 16-bit argument
 .PT .lw
 16-bit argument divided by the wordsize
 .PT .p
 positive 16-bit argument
 .PT .pw
 positive 16-bit argument divided by the wordsize
 .PT .n
 negative 16-bit argument
 .PT .nw
 negative 16-bit argument divided by the wordsize
 .PT .s<num>
 shortie with <num> as high order argument byte
 .PT .w<num>
 shortie with <num> as high order byte of the argument divided by the wordsize
 .PT .<num>
 mini with <num> as argument
 .PT .<num>W
 mini with <num>*wordsize as argument
 .PE 3
 <num> is a possibly negative integer.
 .VS 1 1
 .IE
 The dispatch table for the 256 primary opcodes:
 .DS B
   0   loc.0    loc.1    loc.2    loc.3    loc.4    loc.5    loc.6    loc.7
   8   loc.8    loc.9    loc.10   loc.11   loc.12   loc.13   loc.14   loc.15
  16   loc.16   loc.17   loc.18   loc.19   loc.20   loc.21   loc.22   loc.23
  24   loc.24   loc.25   loc.26   loc.27   loc.28   loc.29   loc.30   loc.31
  32   loc.32   loc.33   aar.1W   adf.s0   adi.1W   adi.2W   adp.l    adp.1
  40   adp.2    adp.s0   adp.s-1  ads.1W   and.1W   asp.1W   asp.2W   asp.3W
  48   asp.4W   asp.5W   asp.w0   beq.l    beq.s0   bge.s0   bgt.s0   ble.s0
  56   blm.s0   blt.s0   bne.s0   bra.l    bra.s-1  bra.s-2  bra.s0   bra.s1
  64   cal.1    cal.2    cal.3    cal.4    cal.5    cal.6    cal.7    cal.8
  72   cal.9    cal.10   cal.11   cal.12   cal.13   cal.14   cal.15   cal.16
  80   cal.17   cal.18   cal.19   cal.20   cal.21   cal.22   cal.23   cal.24
  88   cal.25   cal.26   cal.27   cal.28   cal.s0   cff.z    cif.z    cii.z
  96   cmf.s0   cmi.1W   cmi.2W   cmp.z    cms.s0   csa.1W   csb.1W   dec.z
 104   dee.w0   del.w-1  dup.1W   dvf.s0   dvi.1W   fil.l    inc.z    ine.lw
 112   ine.w0   inl.-1W  inl.-2W  inl.-3W  inl.w-1  inn.s0   ior.1W   ior.s0
 120   lae.l    lae.w0   lae.w1   lae.w2   lae.w3   lae.w4   lae.w5   lae.w6
 128   lal.p    lal.n    lal.0    lal.-1   lal.w0   lal.w-1  lal.w-2  lar.W
 136   ldc.0    lde.lw   lde.w0   ldl.0    ldl.w-1  lfr.1W   lfr.2W   lfr.s0
 144   lil.w-1  lil.w0   lil.0    lil.1W   lin.l    lin.s0   lni.z    loc.l
 152   loc.-1   loc.s0   loc.s-1  loe.lw   loe.w0   loe.w1   loe.w2   loe.w3
 160   loe.w4   lof.l    lof.1W   lof.2W   lof.3W   lof.4W   lof.s0   loi.l
 168   loi.1    loi.1W   loi.2W   loi.3W   loi.4W   loi.s0   lol.pw   lol.nw
 176   lol.0    lol.1W   lol.2W   lol.3W   lol.-1W  lol.-2W  lol.-3W  lol.-4W
 184   lol.-5W  lol.-6W  lol.-7W  lol.-8W  lol.w0   lol.w-1  lxa.1    lxl.1
 192   lxl.2    mlf.s0   mli.1W   mli.2W   rck.1W   ret.0    ret.1W   ret.s0
 200   rmi.1W   sar.1W   sbf.s0   sbi.1W   sbi.2W   sdl.w-1  set.s0   sil.w-1
 208   sil.w0   sli.1W   ste.lw   ste.w0   ste.w1   ste.w2   stf.l    stf.W
 216   stf.2W   stf.s0   sti.1    sti.1W   sti.2W   sti.3W   sti.4W   sti.s0
 224   stl.pw   stl.nw   stl.0    stl.1W   stl.-1W  stl.-2W  stl.-3W  stl.-4W
 232   stl.-5W  stl.w-1  teq.z    tgt.z    tlt.z    tne.z    zeq.l    zeq.s0
 240   zeq.s1   zer.s0   zge.s0   zgt.s0   zle.s0   zlt.s0   zne.s0   zne.s-1
 248   zre.lw   zre.w0   zrl.-1W  zrl.-2W  zrl.w-1  zrl.nw   escape1  escape2
 .DE 2
 The list of secondary opcodes (escape1):
 .N  1
 .DS  B
   0   aar.l    aar.z    adf.l    adf.z    adi.l    adi.z    ads.l    ads.z
   8   adu.l    adu.z    and.l    and.z    asp.lw   ass.l    ass.z    bge.l
  16   bgt.l    ble.l    blm.l    bls.l    bls.z    blt.l    bne.l    cai.z
  24   cal.l    cfi.z    cfu.z    ciu.z    cmf.l    cmf.z    cmi.l    cmi.z
  32   cms.l    cms.z    cmu.l    cmu.z    com.l    com.z    csa.l    csa.z
  40   csb.l    csb.z    cuf.z    cui.z    cuu.z    dee.lw   del.pw   del.nw
  48   dup.l    dus.l    dus.z    dvf.l    dvf.z    dvi.l    dvi.z    dvu.l
  56   dvu.z    fef.l    fef.z    fif.l    fif.z    inl.pw   inl.nw   inn.l
  64   inn.z    ior.l    ior.z    lar.l    lar.z    ldc.l    ldf.l    ldl.pw
  72   ldl.nw   lfr.l    lil.pw   lil.nw   lim.z    los.l    los.z    lor.s0
  80   lpi.l    lxa.l    lxl.l    mlf.l    mlf.z    mli.l    mli.z    mlu.l
  88   mlu.z    mon.z    ngf.l    ngf.z    ngi.l    ngi.z    nop.z    rck.l
  96   rck.z    ret.l    rmi.l    rmi.z    rmu.l    rmu.z    rol.l    rol.z
 104   ror.l    ror.z    rtt.z    sar.l    sar.z    sbf.l    sbf.z    sbi.l
 112   sbi.z    sbs.l    sbs.z    sbu.l    sbu.z    sde.l    sdf.l    sdl.pw
 120   sdl.nw   set.l    set.z    sig.z    sil.pw   sil.nw   sim.z    sli.l
 128   sli.z    slu.l    slu.z    sri.l    sri.z    sru.l    sru.z    sti.l
 136   sts.l    sts.z    str.s0   tge.z    tle.z    trp.z    xor.l    xor.z
 144   zer.l    zer.z    zge.l    zgt.l    zle.l    zlt.l    zne.l    zrf.l
 152   zrf.z    zrl.pw   dch.z    exg.s0   exg.l    exg.z    lpb.z    gto.l
 .DE 2
 Finally, the list of opcodes with four-byte arguments (escape2):
 .DS
   0  loc
 .DE 0
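Dispatching through these tables can be sketched in C. Only three routines are installed (loc.0, zer.z in escape1 slot 145, and the four-byte loc in escape2 slot 0); the handler names, the enum routine tag and the choice to record rather than execute the instruction are hypothetical. The sketch assumes, as the tables above show, that primary opcodes 254 and 255 select the escape1 and escape2 tables, and that the four-byte argument follows least significant byte first.

```c
#include <stdint.h>

/* Hypothetical routine tags; a real interpreter would execute the
 * instruction instead of recording which routine was reached. */
enum routine { R_ILLEGAL, R_LOC_0, R_ZER_Z, R_LOC_4 };

typedef void (*handler)(const unsigned char **pc);

static handler primary[256], escape1[256], escape2[256];
static enum routine last_routine;
static int64_t last_arg;

static void illegal(const unsigned char **pc) { (void)pc; last_routine = R_ILLEGAL; }
static void loc_0(const unsigned char **pc) { (void)pc; last_routine = R_LOC_0; last_arg = 0; }
static void zer_z(const unsigned char **pc) { (void)pc; last_routine = R_ZER_Z; }

/* escape2 opcode 0: LOC with a four-byte argument, least significant byte first */
static void loc_4(const unsigned char **pc)
{
    const unsigned char *p = *pc;
    uint32_t u = (uint32_t)p[0] | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    int64_t v = (int64_t)u;
    if (v >= 0x80000000LL)
        v -= 0x100000000LL;            /* sign-extend the 32-bit value */
    *pc += 4;
    last_routine = R_LOC_4;
    last_arg = v;
}

void init_tables(void)
{
    for (int i = 0; i < 256; i++)
        primary[i] = escape1[i] = escape2[i] = illegal;
    primary[0] = loc_0;                /* loc.0 */
    escape1[145] = zer_z;              /* zer.z */
    escape2[0] = loc_4;                /* loc with four-byte argument */
}

/* Fetch one opcode and jump through the matching dispatch table. */
void step(const unsigned char **pc)
{
    unsigned char op = *(*pc)++;
    if (op == 254) {                   /* escape1: secondary opcode follows */
        unsigned char op2 = *(*pc)++;
        escape1[op2](pc);
    } else if (op == 255) {            /* escape2: four-byte-argument opcode follows */
        unsigned char op2 = *(*pc)++;
        escape2[op2](pc);
    } else {
        primary[op](pc);
    }
}
```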
 .BP
 .AP "AN EXAMPLE PROGRAM"
 .DS B
 1      program example(output);
 2      {This program just demonstrates typical EM code.}
 3      type rec = record r1: integer; r2:real; r3: boolean end;
 4      var mi: integer;  mx:real;  r:rec;
 5
 6      function sum(a,b:integer):integer;
 7      begin
 8        sum := a + b
 9      end;
 10
 11      procedure test(var r: rec);
 12      label 1;
 13      var i,j: integer;
 14          x,y: real;
 15          b: boolean;
 16          c: char;
 17          a: array[1..100] of integer;
 18
 19      begin
 20              j := 1;
 21              i := 3 * j + 6;
 22              x := 4.8;
 23              y := x/0.5;
 24              b := true;
 25              c := 'z';
 26              for i:= 1 to 100 do a[i] := i * i;
 27              r.r1 := j+27;
 28              r.r3 := b;
 29              r.r2 := x+y;
 30              i := sum(r.r1, a[j]);
 31              while i > 0 do begin j := j + r.r1; i := i - 1 end;
 32              with r do begin r3 := b;  r2 := x+y;  r1 := 0 end;
 33              goto 1;
 34      1:      writeln(j, i:6, x:9:3, b)
 35      end; {test}
 36      begin {main program}
 37        mx := 15.96;
 38        mi := 99;
 39        test(r)
 40      end.
 .DE 0
 .BP
 The EM code as produced by the Pascal-VU compiler is given below. Comments
 have been added manually. Note that this code has already been optimized.
 .DS B
  mes 2,2,2              ; wordsize 2, pointersize 2
 .1
  rom 't.p\e000'         ; the name of the source file
  hol 552,-32768,0       ; externals and buf occupy 552 bytes
  exp $sum               ; sum can be called from other modules
  pro $sum,2             ; procedure sum; 2 bytes local storage
  lin 8                  ; code from source line 8
 ldl 0                  ; load two parameters (a and b)
  adi 2                  ; add them
  ret 2                  ; return the result
  end 2                  ; end of procedure ( still two bytes local storage )
 .2
  rom 1,99,2             ; descriptor of array a[]
  exp $test              ; the compiler exports all level 0 procedures
  pro $test,226          ; procedure test, 226 bytes local storage
 .3
  rom 4.8F8              ; assemble Floating point 4.8 (8 bytes) in
 .4                              ; global storage
  rom 0.5F8              ; same for 0.5
  mes 3,-226,2,2         ; compiler temporary not referenced by address
  mes 3,-24,2,0          ; the same is true for i, j, b and c in test
  mes 3,-22,2,0
  mes 3,-4,2,0
  mes 3,-2,2,0
  mes 3,-20,8,0          ; and for x and y
  mes 3,-12,8,0
  lin 20                 ; maintain source line number
  loc 1
  stl -4                 ; j := 1
  lni                    ; lin 21 prior to optimization
  lol -4
  loc 3
  mli 2
  loc 6
  adi 2
  stl -2                 ; i := 3 * j + 6
  lni                    ; lin 22 prior to optimization
  lae .3
  loi 8
  lal -12
  sti 8                  ; x := 4.8
  lni                    ; lin 23 prior to optimization
  lal -12
  loi 8
  lae .4
  loi 8
  dvf 8
  lal -20
  sti 8                  ; y := x / 0.5
  lni                    ; lin 24 prior to optimization
  loc 1
  stl -22                ; b := true
  lni                    ; lin 25 prior to optimization
  loc 122
  stl -24                ; c := 'z'
  lni                    ; lin 26 prior to optimization
  loc 1
  stl -2                 ; for i:= 1
 2
  lol -2
  dup 2
  mli 2                  ; i*i
  lal -224
  lol -2
  lae .2
  sar 2                  ; a[i] :=
  lol -2
  loc 100
  beq *3                 ; to 100 do
  inl -2                 ; increment i and loop
  bra *2
 3
  lin 27
  lol -4
  loc 27
  adi 2                  ; j + 27
  sil 0                  ; r.r1 :=
  lni                    ; lin 28 prior to optimization
  lol -22                ; b
  lol 0
  stf 10                 ; r.r3 :=
  lni                    ; lin 29 prior to optimization
  lal -20
  loi 16
  adf 8                  ; x + y
  lol 0
  adp 2
  sti 8                  ; r.r2 :=
 lni                    ; lin 30 prior to optimization
  lal -224
  lol -4
  lae .2
  lar 2                  ; a[j]
  lil 0                  ; r.r1
  cal $sum               ; call now
  asp 4                  ; remove parameters from stack
  lfr 2                  ; get function result
  stl -2                 ; i :=
 4
  lin 31
  lol -2
  zle *5                 ; while i > 0 do
  lol -4
  lil 0
  adi 2
  stl -4                 ; j := j + r.r1
  del -2                 ; i := i - 1
  bra *4                 ; loop
 5
  lin 32
  lol 0
  stl -226               ; make copy of address of r
  lol -22
  lol -226
  stf 10                 ; r3 := b
  lal -20
  loi 16
  adf 8
  lol -226
  adp 2
  sti 8                  ; r2 := x + y
  loc 0
  sil -226               ; r1 := 0
 lin 34                 ; note the absence of the unnecessary jump
  lae 22                 ; address of output structure
  lol -4
  cal $_wri              ; write integer with default width
  asp 4                  ; pop parameters
  lae 22
  lol -2
  loc 6
  cal $_wsi              ; write integer width 6
  asp 6
  lae 22
  lal -12
  loi 8
  loc 9
  loc 3
  cal $_wrf              ; write fixed format real, width 9, precision 3
  asp 14
  lae 22
  lol -22
  cal $_wrb              ; write boolean, default width
  asp 4
  lae 22
  cal $_wln              ; writeln
  asp 2
  ret 0                  ; return, no result
  end 226
  exp $_main
  pro $_main,0           ; main program
 .6
  con 2,-1,22            ; description of external files
 .5
  rom 15.96F8
  fil .1                 ; maintain source file name
  lae .6                 ; description of external files
  lae 0                  ; base of hol area to relocate buffer addresses
  cal $_ini              ; initialize files, etc...
  asp 4
  lin 37
  lae .5
  loi 8
  lae 2
  sti 8                  ; mx := 15.96
  lni                    ; lin 38 prior to optimization
  loc 99
  ste 0                  ; mi := 99
  lni                    ; lin 39 prior to optimization
  lae 10                 ; address of r
  cal $test
  asp 2
  loc 0                  ; normal exit
  cal $_hlt              ; cleanup and finish
  asp 2
  end 0
  mes 5                  ; reals were used
 .DE 0
 The compact code corresponding to the above program is listed below.
 Read it horizontally, line by line, not column by column.
 Each number represents a byte of compact code, printed in decimal.
 The first two bytes form the magic word.
 .N 1
 .IS 3
 .DS B
 173   0 159 122 122 122 255 242   1 161 250 124 116  46 112   0
 255 156 245  40   2 245   0 128 120 155 249 123 115 117 109 160
 249 123 115 117 109 122  67 128  63 120   3 122  88 122 152 122
 242   2 161 121 219 122 255 155 249 124 116 101 115 116 160 249
 124 116 101 115 116 245 226   0 242   3 161 253 128 123  52  46
 56 255 242   4 161 253 128 123  48  46  53 255 159 123 245  30
 255 122 122 255 159 123  96 122 120 255 159 123  98 122 120 255
 159 123 116 122 120 255 159 123 118 122 120 255 159 123 100 128
 120 255 159 123 108 128 120 255  67 140  69 121 113 116  68  73
 116  69 123  81 122  69 126   3 122 113 118  68  57 242   3  72
 128  58 108 112 128  68  58 108  72 128  57 242   4  72 128  44
 128  58 100 112 128  68  69 121 113  98  68  69 245 122   0 113
 96  68  69 121 113 118 182  73 118  42 122  81 122  58 245  32
 255  73 118  57 242   2  94 122  73 118  69 220  10 123  54 118
 18 122 183  67 147  73 116  69 147   3 122 104 120  68  73  98
 73 120 111 130  68  58 100  72 136   2 128  73 120   4 122 112
 128  68  58 245  32 255  73 116  57 242   2  59 122  65 120  20
 249 123 115 117 109   8 124  64 122 113 118 184  67 151  73 118
 128 125  73 116  65 120   3 122 113 116  41 118  18 124 185  67
 152  73 120 113 245  30 255  73  98  73 245  30 255 111 130  58
 100  72 136   2 128  73 245  30 255   4 122 112 128  69 120 104
 245  30 255  67 154  57 142  73 116  20 249 124  95 119 114 105
  8 124  57 142  73 118  69 126  20 249 124  95 119 115 105   8
 126  57 142  58 108  72 128  69 129  69 123  20 249 124  95 119
 114 102   8 134  57 142  73  98  20 249 124  95 119 114  98   8
 124  57 142  20 249 124  95 119 108 110   8 122  88 120 152 245
 226   0 155 249 125  95 109  97 105 110 160 249 125  95 109  97
 105 110 120 242   6 151 122 119 142 255 242   5 161 253 128 125
 49  53  46  57  54 255  50 242   1  57 242   6  57 120  20 249
 124  95 105 110 105   8 124  67 157  57 242   5  72 128  57 122
 112 128  68  69 219 110 120  68  57 130  20 249 124 116 101 115
 116   8 122  69 120  20 249 124  95 104 108 116   8 122 152 120
 159 124 160 255 159 125 255
 .DE 0
 .IE
 .MS T A 0
 .ME
 .BP
 .MS B A 0
 .ME
 .CT
--- a/doc/em/assem.nr
+++ b/doc/em/assem.nr
@@ -0,0 +1,756 @@
 .BP
 .SN 11
 .S1 "EM ASSEMBLY LANGUAGE"
 We use two representations for assembly language programs:
 one is in ASCII and the other is the compact assembly language.
 The latter needs less space than the former for the same program
 and therefore allows faster processing.
 Our only program accepting ASCII assembly
 language converts it to the compact form.
 All other programs expect compact assembly input.
 The first part of the chapter describes the ASCII assembly
 language and its semantics.
 The second part describes the syntax of the compact assembly
 language.
 The last part lists the EM instructions with the type of
 arguments allowed and an indication of the function.
 Appendix A gives a detailed description of the effect of all
 instructions in the form of a Pascal program.
 .S2 "ASCII assembly language"
 An assembly language program consists of a series of lines;
 each line may be blank, contain one (pseudo)instruction or contain one
 label.
 Input to the assembler is in lower case.
 Upper case is used in this
 document merely to distinguish keywords from the surrounding prose.
 A comment is allowed at the end of each line and starts with a semicolon ";".
 This kind of comment does not exist in the compact form.
 .A
 Labels must be placed all by themselves on a line and start in
 column 1.
 There are two kinds of labels, instruction and data labels.
 Instruction labels are non-negative integers.
 The scope of an instruction label is its procedure.
 .A
 The pseudoinstructions CON, ROM and BSS may be preceded by a
 line containing a
 1-8 character data label, the first character of which is a
 letter, period or underscore.
 A period may only be followed by
 digits; a letter or underscore may be followed by letters, digits and underscores.
 The use of the character "." followed by a constant
 in the range 1 to 32767 (e.g. ".40") is recommended
 for compiler-generated programs.
 These labels are considered as a special case and handled
 more efficiently in compact assembly language (see below).
 Note that a data label on its own or two consecutive labels are not
 allowed.
 .P
 Each statement may contain an instruction mnemonic or pseudoinstruction.
 These must begin in column 2 or later (not column 1) and must be followed
 by a space, tab, semicolon or LF.
 Everything on the line following a semicolon is
 taken as a comment.
 .P
 Each input file contains one module.
 A module may contain many procedures,
 which may be nested.
 A procedure consists of
 a PRO statement, a (possibly empty)
 collection of instructions and pseudoinstructions and finally an END
 statement.
 Pseudoinstructions are also allowed between procedures.
 They do not belong to a specific procedure.
 .P
 All constants in EM are interpreted in the decimal base.
 The ASCII assembly language accepts constant expressions
 wherever constants are allowed.
 The operators recognized are: +, -, *, % and / with the usual
 precedence order.
 Use of the parentheses ( and ) to alter the precedence order is allowed.
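A small recursive-descent evaluator shows one way to honour the usual precedence (*, / and % before + and -, each evaluated left to right). It is a sketch only: unary minus, C semantics for / and % (truncation toward zero) and the absence of error reporting are assumptions, and eval_constant is a hypothetical name.

```c
#include <ctype.h>

/* Hypothetical evaluator for EM constant expressions: decimal integers,
 * + - * / %, parentheses. Unary minus and C division semantics are
 * assumptions; malformed input and division by zero are not checked. */
static const char *cur;

static long sum_expr(void);

static void skip_blanks(void)
{
    while (*cur == ' ' || *cur == '\t')
        cur++;
}

static long factor(void)
{
    skip_blanks();
    if (*cur == '(') {
        cur++;
        long v = sum_expr();
        skip_blanks();
        if (*cur == ')')
            cur++;
        return v;
    }
    if (*cur == '-') {                 /* unary minus (assumption) */
        cur++;
        return -factor();
    }
    long v = 0;
    while (isdigit((unsigned char)*cur))   /* constants are decimal */
        v = v * 10 + (*cur++ - '0');
    return v;
}

/* *, / and % bind tighter than + and - */
static long product(void)
{
    long v = factor();
    for (;;) {
        skip_blanks();
        char op = *cur;
        if (op != '*' && op != '/' && op != '%')
            return v;
        cur++;
        long r = factor();
        if (op == '*') v *= r;
        else if (op == '/') v /= r;
        else v %= r;
    }
}

static long sum_expr(void)
{
    long v = product();
    for (;;) {
        skip_blanks();
        char op = *cur;
        if (op != '+' && op != '-')
            return v;
        cur++;
        long r = product();
        v = (op == '+') ? v + r : v - r;
    }
}

long eval_constant(const char *s)
{
    cur = s;
    return sum_expr();
}
```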
 .S3 "Instruction arguments"
 Unlike many other assembly languages, the EM assembly
 language requires all arguments of normal and pseudoinstructions
 to be either a constant or an identifier, but not a combination
 of these two.
 There is one exception to this rule: when a data label is used
 for initialization or as an instruction argument,
 expressions of the form 'label+constant' and 'label-constant'
 are allowed.
 This makes it possible to address, for example, the
 third word of a ten-word BSS block
 directly.
 Thus LOE LABEL+4 (the third word when the wordsize is 2) is permitted and so is CON LABEL+3.
 The resulting address must be in the same fragment as the label.
 Adding a constant to, or subtracting one from, an instruction label or
 procedure identifier is not allowed;
 this is not a severe restriction and greatly aids
 optimization.
 .P
 Instruction arguments can be constants,
 data labels, data labels offsetted by a constant, instruction
 labels and procedure identifiers.
 The range of integers allowed depends on the instruction.
 Most instructions allow only integers
 (signed or unsigned)
 that fit in a word.
 Arguments used as offsets to pointers should fit in a
 pointer-sized integer.
 Finally, arguments to LDC should fit in a double-word integer.
 .P
 Several instructions have two possible forms:
 with an explicit argument and with an implicit argument on top of the stack.
 The size of the implicit argument is the wordsize.
 The implicit argument is always popped before all other operands.
 For example: 'CMI 4' specifies that two four-byte signed
 integers on top of the stack are to be compared.
 \&'CMI' without an argument expects a wordsized integer
 on top of the stack that specifies the size of the integers to
 be compared.
 Thus the following two sequences are equivalent:
 .N 2
 .TS
 center, tab(:) ;
 l r 30 l r.
 LDL:-10:LDL:-10
 LDL:-14:LDL:-14
 ::LOC:4
 CMI:4:CMI:
 ZEQ:*1:ZEQ:*1
 .TE 2
 Section 11.1.6 shows the arguments allowed for each instruction.
 .S3 "Pseudoinstruction arguments"
 Pseudoinstruction arguments can be divided into two classes:
 initializers and others.
 The following initializers are allowed: signed integer constants,
 unsigned integer constants, floating-point constants, strings,
 data labels, data labels offsetted by a constant, instruction
 labels and procedure identifiers.
 .P
 Constant initializers in BSS, HOL, CON and ROM pseudoinstructions
 can be followed by a letter I, U or F.
 This indicator
 specifies the type of the initializer: Integer, Unsigned or Float.
 If no indicator is present, I is assumed.
 The size of the object is the wordsize unless
 the indicator is followed by an integer specifying the
 object's size.
 This integer is governed by the same restrictions as for
 transfer of objects to/from memory.
 As in instruction arguments, initializers include expressions of the form:
 \&"LABEL+offset" and "LABEL-offset".
 The offset must be an unsigned decimal constant.
 The 'IUF' indicators cannot be used in the offsets.
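Splitting an initializer into its value, type indicator and size might look like the following C sketch. The function split_initializer and struct init_const are hypothetical; accepting the indicator letters in either case is an assumption (the example program writes 4.8F8), and the size restrictions for memory transfers are not checked.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of splitting a constant initializer such as "4.8F8". */
struct init_const {
    char type;        /* 'I', 'U' or 'F' */
    long size;        /* object size in bytes */
    char text[32];    /* the constant without its indicator */
};

/* Returns 0 on success, -1 if the text is malformed. Indicator letters
 * are accepted in either case (assumption); whether the size is legal
 * for memory transfers is not checked. */
int split_initializer(const char *s, long wordsize, struct init_const *out)
{
    size_t n = strcspn(s, "IUFiuf");   /* length of the numeric part */
    if (n == 0 || n >= sizeof out->text)
        return -1;
    memcpy(out->text, s, n);
    out->text[n] = '\0';
    out->type = 'I';                   /* no indicator: I is assumed */
    out->size = wordsize;              /* no size: the wordsize */
    if (s[n] == '\0')
        return 0;
    out->type = (char)toupper((unsigned char)s[n]);
    const char *rest = s + n + 1;
    if (*rest == '\0')
        return 0;                      /* indicator without a size */
    char *end;
    long size = strtol(rest, &end, 10);
    if (*end != '\0' || size <= 0)
        return -1;
    out->size = size;
    return 0;
}
```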
 .P
 Data labels are referred to by their name.
 .P
 Strings are surrounded by double quotes (").
 Semicolons in strings do not indicate the start of a comment.
 In the ASCII representation the escape character \e (backslash)
 alters the meaning of subsequent character(s).
 This feature allows inclusion of zeroes, graphic characters and
 the double quote in the string.
 The following escape sequences exist:
 .DS
 .TS
 center, tab(:);
 l l l.
 newline:NL\|(LF):\en
 horizontal tab:HT:\et
 backspace:BS:\eb
 carriage return:CR:\er
 form feed:FF:\ef
 backslash:\e:\e\e
 double quote:":\e"
 bit pattern:\fBddd\fP:\e\fBddd\fP
 .TE
 .DE
 The escape \fBddd\fP consists of the backslash followed by 1,
 2, or 3 octal digits specifying the value of
 the desired character.
 If the character following a backslash is not one of those
 specified,
 the backslash is ignored.
 Example: CON "hello\e012\e0".
 Each string element initializes a single byte.
 The ASCII character set is used to map characters onto values.
 Strings are padded with zeroes up to a multiple of the wordsize.
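The escape rules and the zero padding can be sketched as a decoder in C. The name decode_em_string is hypothetical; the caller's buffer is assumed to hold at least strlen(s) + wordsize bytes, a lone trailing backslash is simply dropped, and octal values above 255 are truncated to one byte. These choices are assumptions, not documented behaviour.

```c
#include <stddef.h>

/* Decode the characters between the double quotes of an EM string,
 * store one byte per element in out, pad with zero bytes up to a
 * multiple of wordsize and return the padded length. */
size_t decode_em_string(const char *s, unsigned char *out, size_t wordsize)
{
    size_t n = 0;
    while (*s != '\0') {
        unsigned char c = (unsigned char)*s++;
        if (c == '\\') {
            if (*s == '\0')
                break;                 /* lone trailing backslash: dropped */
            c = (unsigned char)*s++;
            switch (c) {
            case 'n': c = '\n'; break;
            case 't': c = '\t'; break;
            case 'b': c = '\b'; break;
            case 'r': c = '\r'; break;
            case 'f': c = '\f'; break;
            case '\\':
            case '"':
                break;                 /* the character itself */
            default:
                if (c >= '0' && c <= '7') {   /* 1 to 3 octal digits */
                    unsigned v = c - '0';
                    for (int i = 1; i < 3 && *s >= '0' && *s <= '7'; i++)
                        v = v * 8 + (unsigned)(*s++ - '0');
                    c = (unsigned char)v;
                }
                /* any other character: the backslash is ignored */
                break;
            }
        }
        out[n++] = c;
    }
    while (n % wordsize != 0)          /* pad to a multiple of the wordsize */
        out[n++] = 0;
    return n;
}
```

Decoding the file name string of the example program (t.p followed by an octal zero escape) with wordsize 2 yields the four bytes 116 46 112 0, the same bytes that appear after the ROM in its compact code listing.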
 .P
 Instruction labels are referred to as *1, *2, etc. in both branch
 instructions and initializers.
 .P
 The notation $procname means the identifier for the procedure
 with the specified name.
 This identifier has the size of a pointer.
 .S3 Notation
 First, the notation used for the arguments, classes of
 instructions and pseudoinstructions.
 .IS 2
 .TS
 tab(:);
 l l l.
 <cst>:\&=:integer constant (current range -2**31..2**31-1)
 <dlb>:\&=:data label
 <arg>:\&=:<cst> or <dlb> or <dlb>+<cst> or <dlb>-<cst>
 <con>:\&=:integer constant, unsigned constant, floating-point constant
 <str>:\&=:string constant (surrounded by double quotes)
 <ilb>:\&=:instruction label
 ::'*' followed by an integer in the range 0..32767.
 <pro>:\&=:procedure number ('$' followed by a procedure name)
 <val>:\&=:<arg>, <con>, <pro> or <ilb>.
 <par>:\&=:<val> or <str>
 <...>*:\&=:zero or more of <...>
 <...>+:\&=:one or more of <...>
 [...]:\&=:optional ...
 .TE
 .IE
 .S3 "Pseudoinstructions"
 .S4 Storage declaration
 Initialized global data is allocated by the pseudoinstruction CON,
 which needs at least one argument.
 For each argument, an integral number of words,
 determined by the argument type, is allocated and initialized.
 .P
 The pseudoinstruction ROM is the same as CON,
 except that it guarantees that the initialized words
 will not change during the execution of the program.
 This information allows optimizers to do
 certain calculations such as array indexing and
 subrange checking at compile time instead
 of at run time.
 .P
 The pseudoinstruction BSS allocates
 uninitialized global data or large blocks of data initialized
 by the same value.
 The first argument to this pseudo is the number
 of bytes required, which must be a multiple of the wordsize.
 The other arguments specify the value used for initialization and
 whether the initialization is only for convenience or a strict necessity.
 The pseudoinstruction HOL is similar to BSS in that it requests an
 (un)initialized global data block.
 Addressing of a HOL block, however, is quasi absolute.
 The first byte is addressed by 0,
 the second byte by 1 etc. in assembly language.
 The assembler/loader adds the base address of
 the HOL block to these numbers to obtain the
 absolute address in the machine language.
 .P
 The scope of a HOL block starts at the HOL pseudo and
 ends at the next HOL pseudo or at the end of a module,
 whichever comes first.
 Each instruction falls in the scope of at most one
 HOL block, the current HOL block.
 It is not allowed to have more than one HOL block per procedure.
 .P
 The alignment restrictions are enforced by the
 pseudoinstructions.
 All objects are aligned on a multiple of their size or the wordsize,
 whichever is smaller.
 Switching to another type of fragment or placing a label forces
 word-alignment.
 There are three types of fragments in global data space: CON, ROM and
 BSS/HOL.
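The alignment rule reduces to rounding a location counter up to the smaller of the object size and the wordsize. In this C sketch the offset parameter is a hypothetical location counter within global data space, and align_object and align_word are illustrative names.

```c
#include <stddef.h>

/* Round offset up so that an object of `size` bytes is aligned on a
 * multiple of its size or of the wordsize, whichever is smaller. */
size_t align_object(size_t offset, size_t size, size_t wordsize)
{
    size_t a = size < wordsize ? size : wordsize;
    if (a == 0)
        return offset;
    return (offset + a - 1) / a * a;
}

/* Placing a label or switching to another fragment type forces word
 * alignment. */
size_t align_word(size_t offset, size_t wordsize)
{
    return align_object(offset, wordsize, wordsize);
}
```

For example, with wordsize 2 an 8-byte real placed at offset 3 moves to offset 4, not 8, because the wordsize is the smaller of the two.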
 .N 2
 .IS 2
 .PS - 4
 .PT "BSS <cst1>,<val>,<cst2>"
 Reserve <cst1> bytes.
 <val> is the value used to initialize the area.
 <cst1> must be a multiple of the size of <val>.
 <cst2> is 0 if the initialization is not strictly necessary,
 1 if it is.
 .PT "HOL <cst1>,<val>,<cst2>"
 Idem, but all following absolute global data references will
 refer to this block.
 Only one HOL is allowed per procedure;
 it has to be placed before the first instruction.
 .PT "CON <val>+"
 Assemble global data words initialized with the <val> constants.
 .PT "ROM <val>+"
 Idem, but the initialized data will never be changed by the program.
 .PE
 .IE
 .S4 Partitioning
 Two pseudoinstructions partition the input into procedures:
 .IS 2
 .PS - 4
 .PT "PRO <pro>[,<cst>]"
 Start of procedure.
 <pro> is the procedure name.
 <cst> is the number of bytes for locals.
 The number of bytes for locals must be specified in the PRO or
 END pseudoinstruction.
 When specified in both, they must be identical.
 .PT "END  [<cst>]"
 End of procedure.
 <cst> is the number of bytes for locals.
 The number of bytes for locals must be specified in either the PRO or
 END pseudoinstruction or both.
 .PE
 .IE
 .S4 Visibility
 Names of data and procedures in an EM module can either be
 internal or external.
 External names are known outside the module and are used to link
 several pieces of a program.
 Internal names are not known outside the modules they are used in.
 Other modules will not 'see' an internal name.
 .A
 To reduce the number of passes needed,
 it must be known at the first occurrence whether
 a name is internal or external.
 If the first occurrence of a name is in a definition,
 the name is considered to be internal.
 If the first occurrence of a name is a reference,
 the name is considered to be external.
 If the first occurrence is in one of the following pseudoinstructions,
 the effect of the pseudo has precedence.
 .IS 2
 .PS - 4
 .PT "EXA <dlb>"
 External name.
 <dlb> is known, possibly defined, outside this module.
 Note that <dlb> may be defined in the same module.
 .PT "EXP <pro>"
 External procedure identifier.
 Note that <pro> may be defined in the same module.
 .PT "INA <dlb>"
 Internal name.
 <dlb> is internal to this module and must be defined in this module.
 .PT "INP <pro>"
 Internal procedure.
 <pro> is internal to this module and must be defined in this module.
 .PE
 .IE
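The first-occurrence rule can be sketched with a small symbol table in C. Everything here (struct symtab, note_name, the occurrence kinds) is hypothetical, and the sketch simplifies by letting only the first occurrence decide and by not checking that INA/INP names are actually defined in the module.

```c
#include <string.h>

enum visibility { VIS_INTERNAL, VIS_EXTERNAL };

/* Kind of occurrence of a name in the input. */
enum occurrence { OCC_DEFINITION, OCC_REFERENCE, OCC_EXA, OCC_EXP, OCC_INA, OCC_INP };

#define MAX_NAMES 128

/* Hypothetical symbol table; names are stored as pointers, so the
 * caller must keep the strings alive. */
struct symtab {
    int count;
    const char *name[MAX_NAMES];
    enum visibility vis[MAX_NAMES];
};

/* Record an occurrence of `name` and return its visibility. Only the
 * first occurrence decides; later occurrences do not change it. */
enum visibility note_name(struct symtab *t, const char *name, enum occurrence occ)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->name[i], name) == 0)
            return t->vis[i];

    enum visibility v;
    switch (occ) {
    case OCC_DEFINITION:               /* first seen in a definition */
    case OCC_INA:
    case OCC_INP:
        v = VIS_INTERNAL;
        break;
    default:                           /* reference, EXA or EXP */
        v = VIS_EXTERNAL;
        break;
    }
    if (t->count < MAX_NAMES) {
        t->name[t->count] = name;
        t->vis[t->count] = v;
        t->count++;
    }
    return v;
}
```

In the example program, exp $sum precedes pro $sum, so sum is external, while a procedure first seen in its PRO would be internal.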
 .S4 Miscellaneous
 Two other pseudoinstructions provide miscellaneous features:
 .IS 2
 .PS - 4
 .PT "EXC <cst1>,<cst2>"
 Two blocks of instructions preceding this one are
 interchanged before being processed.
 <cst1> gives the number of lines of the first block.
 <cst2> gives the number of lines of the second one.
 Blank and pure comment lines do not count.
 .PT "MES <cst>[,<par>]*"
 A special type of comment.
 Used by compilers to communicate with the
 optimizer, assembler, etc. as follows:
 .VS 1 0
 .PS - 4
 .PT "MES 0"
 An error has occurred, stop further processing.
 .PT "MES 1"
 Suppress optimization.
 .PT "MES 2,<cst1>,<cst2>"
 Use wordsize <cst1> and pointer size <cst2>.
 .PT "MES 3,<cst1>,<cst2>,<cst3>,<cst4>"
 Indicates that a local variable is never referenced indirectly.
 Used to indicate that a register may be used for a specific
 variable.
 <cst1> is offset in bytes from AB if positive
 and offset from LB if negative.
 <cst2> gives the size of the variable.
 <cst3> indicates the class of the variable.
 The following values are currently recognized:
 .PS
 .PT 0
 The variable can be used for anything.
 .PT 1
 The variable is used as a loop index.
 .PT 2
 The variable is used as a pointer.
 .PT 3
 The variable is used as a floating point number.
 .PE 0
 <cst4> gives the priority of the variable,
 higher numbers indicate better candidates.
 .PT "MES 4,<cst>,<str>"
 Number of source lines in file <str> (for profiler).
 .PT "MES 5"
 Floating point used.
 .PT "MES 6,<val>*"
 Comment.  Used to provide comments in compact assembly language.
 .PT "MES 7,....."
 Reserved.
 .PT "MES 8,<pro>[,<dlb>]..."
 Library module. Indicates that the module may only be loaded
 if it is useful, that is, if it can satisfy any unresolved
 references during the loading process.
 May not be preceded by any pseudo other than MES.
 .PT "MES 9,<cst>"
 Guarantees that no more than <cst> bytes of parameters are
 accessed, either directly or indirectly.
 .PE 1
 .VS 1 1
 Each backend is free to skip irrelevant MES pseudos.
 .PE
 .IE
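Interchanging the two blocks named by EXC is a rotation, which can be done in place with three reversals. In this C sketch lines[] is a hypothetical array holding the statements read so far, with blank and pure comment lines already removed, and n is the number of statements preceding the EXC; apply_exc is an illustrative name.

```c
#include <stddef.h>

/* Reverse lines[lo..hi-1] in place. */
static void reverse(const char **lines, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        const char *t = lines[lo];
        lines[lo] = lines[hi - 1];
        lines[hi - 1] = t;
        lo++;
        hi--;
    }
}

/* Apply "EXC cst1,cst2" found after the first n statements: the cst1
 * statements followed by the cst2 statements that end at position n
 * are interchanged. Returns 0, or -1 if there are too few statements. */
int apply_exc(const char **lines, size_t n, size_t cst1, size_t cst2)
{
    if (cst1 > n || cst2 > n - cst1)
        return -1;
    size_t start = n - cst1 - cst2;
    reverse(lines, start, start + cst1);   /* reverse the first block */
    reverse(lines, start + cst1, n);       /* reverse the second block */
    reverse(lines, start, n);              /* reverse both: blocks swap, order restored */
    return 0;
}
```

Each block keeps its internal order; only the positions of the two blocks are exchanged.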
 .S2 "The Compact Assembly Language"
 The assembler accepts input in a highly encoded form.
 This
 form is intended to reduce the amount of data transferred between the
 front ends, optimizers
 and back ends; it also reduces the amount of storage required for
 libraries.
 Libraries are stored as archived compact assembly language, not machine
 language.
 .P
 When beginning to read the input, the assembler is in neutral state, and
 expects either a label or an instruction (including the pseudoinstructions).
 The meaning of the next byte(s) when in neutral state is as follows, where
 b1, b2
 etc. represent the succeeding bytes.
 .N 1
 .DS
 .TS
 tab(:) ;
 rw17 4 l.
 0:Reserved for future use
 1-129:Machine instructions, see Appendix A, alphabetical list
 130-149:Reserved for future use
 150-161:BSS,CON,END,EXA,EXC,EXP,HOL,INA,INP,MES,PRO,ROM
 162-179:Reserved for future pseudoinstructions
 180-239:Instruction labels 0 - 59  (180 is local label 0 etc.)
 240-244:See the Common Table below
 245-255:Not used
 .TE 1
 .DE 0
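The table above maps naturally onto a dispatch on the first byte read in neutral state. The C sketch below is only an illustration: the enum, `classify_neutral`, `pseudo_name` and `local_label` are hypothetical names, and the byte ranges are copied from the table.

```c
#include <stddef.h>

/* Hypothetical classification of a byte read in neutral state,
   following the ranges in the table above. */
enum cl_kind {
    CL_RESERVED,     /* 0 and 130-149: reserved for future use */
    CL_INSTR,        /* 1-129: machine instruction */
    CL_PSEUDO,       /* 150-161: BSS ... ROM */
    CL_PSEUDO_RES,   /* 162-179: reserved for future pseudos */
    CL_LABEL,        /* 180-239: instruction label 0-59 */
    CL_COMMON,       /* 240-244: see the Common Table */
    CL_UNUSED        /* 245-255: not used in neutral state */
};

enum cl_kind classify_neutral(unsigned char b)
{
    if (b == 0)   return CL_RESERVED;
    if (b <= 129) return CL_INSTR;
    if (b <= 149) return CL_RESERVED;
    if (b <= 161) return CL_PSEUDO;
    if (b <= 179) return CL_PSEUDO_RES;
    if (b <= 239) return CL_LABEL;
    if (b <= 244) return CL_COMMON;
    return CL_UNUSED;
}

/* Pseudo names in byte order 150..161, as listed in the table. */
static const char *const pseudo_names[] = {
    "BSS", "CON", "END", "EXA", "EXC", "EXP",
    "HOL", "INA", "INP", "MES", "PRO", "ROM"
};

/* Name of the pseudo coded by b, or NULL if b is not a pseudo. */
const char *pseudo_name(unsigned char b)
{
    return classify_neutral(b) == CL_PSEUDO ? pseudo_names[b - 150] : NULL;
}

/* Local label number coded by b (180 is label 0), or -1. */
int local_label(unsigned char b)
{
    return classify_neutral(b) == CL_LABEL ? b - 180 : -1;
}
```

For example, byte 151 is CON and byte 182 defines local label 2, matching the example table later in this section.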
 After a label, the assembler is back in neutral state; it can immediately
 accept another label or an instruction in the next byte.
 No linefeeds are used to separate lines.
 .P
 If an opcode expects no arguments,
 the assembler is back in neutral state after
 reading the one byte containing the instruction number.
 If it has one or
 more arguments (only pseudos have more than 1), the arguments follow directly,
 encoded as follows:
 .N 1
 .IS 2
 .TS
 tab(:);
 r l.
 0-239:Offsets from -120 to 119
 240-255:See the Common Table below
 .TE 1
 Absence of an optional argument is indicated by a special
 byte, the <end> byte (255) of the Common Table below.
 .IE 2
 .CS
 Common Table for Neutral State and Arguments
 .CE
 .TS
 tab(:);
 c c s c
 l8 l l8 l.
 class:bytes:description
 <ilb>:240:b1:Instruction label b1  (Not used for branches)
 <ilb>:241:b1 b2:16 bit instruction label  (256*b2 + b1)
 <dlb>:242:b1:Global label .0-.255, with b1 being the label
 <dlb>:243:b1 b2:Global label .0-.32767
 :::with 256*b2+b1 being the label
 <dlb>:244:<string>:Global symbol not of the form .nnn
 <cst>:245:b1 b2:16 bit constant
 <cst>:246:b1 b2 b3 b4:32 bit constant
 <cst>:247:b1 .. b8:64 bit constant
 <arg>:248:<dlb><cst>:Global label + (possibly negative) constant
 <pro>:249:<string>:Procedure name  (not including $)
 <str>:250:<string>:String used in CON or ROM (no quotes-no escapes)
 <con>:251:<cst><string>:Integer constant, size <cst> bytes
 <con>:252:<cst><string>:Unsigned constant, size <cst> bytes
 <con>:253:<cst><string>:Floating constant, size <cst> bytes
 :254::unused
 <end>:255::Delimiter for argument lists or
 :::indicates absence of optional argument
 .TE 1
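As a sketch of how a front end might emit a <cst> argument, the hypothetical function `encode_cst` below uses the one-byte form for -120 to 119 and otherwise the 245, 246 or 247 escape from the Common Table, followed by the two's complement bytes, least significant first. Always choosing the shortest form is an assumption of this sketch, not a rule stated here.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode v as a compact-assembly <cst> into out[] and return the
   number of bytes written (at most 9). */
size_t encode_cst(int64_t v, unsigned char out[9])
{
    size_t nbytes, i;
    uint64_t u = (uint64_t)v;   /* two's complement bit pattern */

    if (v >= -120 && v <= 119) {          /* one-byte form: v + 120 */
        out[0] = (unsigned char)(v + 120);
        return 1;
    }
    if (v >= INT16_MIN && v <= INT16_MAX) {
        out[0] = 245; nbytes = 2;         /* 16 bit constant */
    } else if (v >= INT32_MIN && v <= INT32_MAX) {
        out[0] = 246; nbytes = 4;         /* 32 bit constant */
    } else {
        out[0] = 247; nbytes = 8;         /* 64 bit constant */
    }
    for (i = 0; i < nbytes; i++)          /* least significant byte first */
        out[1 + i] = (unsigned char)(u >> (8 * i));
    return 1 + nbytes;
}
```

With this sketch, 300 becomes 245 44 1 and -10 becomes 110, as in the LOC lines of the example table further on.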
 .P
 The bytes specifying the value of a 16, 32 or 64 bit constant
 are presented in two's complement notation, with the least
 significant byte first. For example, the value of a 32 bit
 constant is ((s4*256+b3)*256+b2)*256+b1, where s4 is b4-256 if
 b4 is 128 or greater, and b4 otherwise.
 A <string> consists of a <cst> immediately followed by
 a sequence of bytes with length <cst>.
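The 32 bit decoding formula translates directly into C. In this sketch the function name `decode_cst32` is hypothetical; it takes the four bytes b1..b4 that follow a 246 escape.

```c
#include <stdint.h>

/* Decode b1..b4 (b[0]..b[3]) with ((s4*256+b3)*256+b2)*256+b1,
   where s4 = b4 - 256 if b4 >= 128 (sign bit set), else b4. */
int32_t decode_cst32(const unsigned char b[4])
{
    int64_t s4 = b[3] >= 128 ? (int64_t)b[3] - 256 : (int64_t)b[3];
    int64_t v = ((s4 * 256 + b[2]) * 256 + b[1]) * 256 + b[0];
    return (int32_t)v;   /* v always lies within the 32 bit range */
}
```

The bytes 44 1 0 0 decode to 300, and 255 255 255 255 decode to -1.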
 .P
 .ne 8
 The pseudoinstructions fall into several categories, depending on their
 arguments:
 .N 1
 .DS
 Group 1 -- EXC, BSS, HOL have a known number of arguments
 Group 2 -- EXA, EXP, INA, INP have a string as argument
 Group 3 -- CON, MES, ROM have a variable number of various things
 Group 4 -- END, PRO have a trailing optional argument.
 .DE 1
 Groups 1 and 2
 use the encoding described above.
 Group 3 also uses the encoding listed above, with an <end> byte after the
 last argument to indicate the end of the list.
 Group 4 uses
 an <end> byte if the trailing argument is not present.
 .N 2
 .IS 2
 .TS
 tab(|);
 l s l
 l s s
 l 2 lw(46) l.
 Example  ASCII|Example compact
 (LOC = 69, BRA = 18 here):
 2||182
 1||181
 LOC|10|69 130
 LOC|-10|69 110
 LOC|300|69 245 44 1
 BRA|*19|18 139
 300||241 44 1
 .3||242 3
 CON|4,9,*2,$foo|151 124 129 240 2 249 123 102 111 111 255
 CON|.35|151 242 35 255
 .TE 0
 .IE 0
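To check the encoding rules against the example, the sketch below rebuilds the compact form of CON 4,9,*2,$foo byte by byte. All names are hypothetical, the one-byte <cst> helper covers only -120 to 119, and an ASCII character set is assumed for the bytes of foo.

```c
#include <stddef.h>
#include <string.h>

/* Byte values taken from the tables above. */
enum { PS_CON = 151, SP_ILB = 240, SP_PNAM = 249, SP_END = 255 };

/* Append a <cst> in the one-byte form; returns 0 (and writes nothing)
   if v is outside -120..119. */
static int put_small(unsigned char *buf, size_t *n, int v)
{
    if (v < -120 || v > 119)
        return 0;
    buf[(*n)++] = (unsigned char)(v + 120);
    return 1;
}

/* Encode "CON 4,9,*2,$foo" into buf and return the length. */
size_t encode_con_example(unsigned char *buf)
{
    const char *name = "foo";
    size_t n = 0, len = strlen(name);

    buf[n++] = PS_CON;              /* group 3 pseudo */
    put_small(buf, &n, 4);          /* plain constants */
    put_small(buf, &n, 9);
    buf[n++] = SP_ILB;              /* *2: instruction label 2 */
    buf[n++] = 2;
    buf[n++] = SP_PNAM;             /* $foo: procedure name, given as */
    put_small(buf, &n, (int)len);   /* a <string>: length <cst> ...   */
    memcpy(buf + n, name, len);     /* ... followed by the bytes      */
    n += len;
    buf[n++] = SP_END;              /* end of the argument list */
    return n;
}
```

Calling `encode_con_example` yields the eleven bytes 151 124 129 240 2 249 123 102 111 111 255 listed in the table.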
 .BP
 .S2 "Assembly language instruction list"
 .P
 For each instruction in the list the range of argument values
 in the assembly language is given.
 The column headed \fIassem\fP contains the mnemonics defined
 in 11.1.3.
 The following column specifies restrictions on the argument
 value.
 Addresses have to obey the restrictions mentioned in chapter 2.
 The classes of arguments
 are indicated by letters:
 .ds b \fBb\fP
 .ds c \fBc\fP
 .ds d \fBd\fP
 .ds g \fBg\fP
 .ds f \fBf\fP
 .ds l \fBl\fP
 .ds n \fBn\fP
 .ds w \fBw\fP
 .ds p \fBp\fP
 .ds r \fBr\fP
 .ds s \fBs\fP
 .ds z \fBz\fP
 .ds o \fBo\fP
 .ds - \fB-\fP
 .N 1
 .TS
 tab(:);
 c s l l
 l l 15 l l.
 \fIassem\fP:constraints:rationale
 \&\*c:cst:fits word:constant
 \&\*d:cst:fits double word:constant
 \&\*l:cst::local offset
 \&\*g:arg:>= 0:global offset
 \&\*f:cst::fragment offset
 \&\*n:cst:>= 0:counter
 \&\*s:cst:> 0, word multiple:object size
 \&\*z:cst:>= 0, zero or word multiple:object size
 \&\*o:cst:>= 0, word multiple or fraction:object size
 \&\*w:cst:> 0, word multiple:object size *
 \&\*p:pro::pro identifier
 \&\*b:ilb:>= 0:label number
 \&\*r:cst:0,1,2:register number
 \&\*-:::no argument
 .TE 1
 .P
 The * in the rationale column for \*w indicates that the argument
 can be given either in the instruction or on top of the stack.
 If the argument is omitted, it is fetched from the
 stack;
 it is assumed to be a word-sized unsigned integer.
 Instructions that check for undefined integer or floating-point
 values and underflow or overflow
 are indicated below by (*).
 .N 1
 .DS B
 GROUP 1 - LOAD
  LOC \*c : Load constant (i.e. push one word onto the stack)
  LDC \*d : Load double constant ( push two words )
  LOL \*l : Load word at \*l-th local (\*l<0) or parameter (\*l>=0)
  LOE \*g : Load external word \*g
  LIL \*l : Load word pointed to by \*l-th local or parameter
  LOF \*f : Load offsetted (top of stack + \*f yield address)
  LAL \*l : Load address of local or parameter
  LAE \*g : Load address of external
  LXL \*n : Load lexical (address of LB \*n static levels back)
  LXA \*n : Load lexical (address of AB \*n static levels back)
  LOI \*o : Load indirect \*o bytes (address is popped from the stack)
  LOS \*w : Load indirect, \*w-byte integer on top of stack gives object size
  LDL \*l : Load double local or parameter (two consecutive words are stacked)
  LDE \*g : Load double external (two consecutive externals are stacked)
  LDF \*f : Load double offsetted (top of stack + \*f yield address)
  LPI \*p : Load procedure identifier
 GROUP 2 - STORE
  STL \*l : Store local or parameter
  STE \*g : Store external
  SIL \*l : Store into word pointed to by \*l-th local or parameter
  STF \*f : Store offsetted
  STI \*o : Store indirect \*o bytes (pop address, then data)
  STS \*w : Store indirect, \*w-byte integer on top of stack gives object size
  SDL \*l : Store double local or parameter
  SDE \*g : Store double external
  SDF \*f : Store double offsetted
 GROUP 3 - INTEGER ARITHMETIC
  ADI \*w : Addition (*)
  SBI \*w : Subtraction (*)
  MLI \*w : Multiplication (*)
  DVI \*w : Division (*)
  RMI \*w : Remainder (*)
  NGI \*w : Negate (two's complement) (*)
  SLI \*w : Shift left (*)
  SRI \*w : Shift right (*)
 GROUP 4 - UNSIGNED ARITHMETIC
  ADU \*w : Addition
  SBU \*w : Subtraction
  MLU \*w : Multiplication
  DVU \*w : Division
  RMU \*w : Remainder
  SLU \*w : Shift left
  SRU \*w : Shift right
 GROUP 5 - FLOATING POINT ARITHMETIC
  ADF \*w : Floating add (*)
  SBF \*w : Floating subtract (*)
  MLF \*w : Floating multiply (*)
  DVF \*w : Floating divide (*)
  NGF \*w : Floating negate (*)
  FIF \*w : Floating multiply and split integer and fraction part (*)
  FEF \*w : Split floating number in exponent and fraction part (*)
 GROUP 6 - POINTER ARITHMETIC
  ADP \*f : Add \*f to pointer on top of stack
  ADS \*w : Add \*w-byte value and pointer
  SBS \*w : Subtract pointers in same fragment and push diff as size \*w integer
 GROUP 7 - INCREMENT/DECREMENT/ZERO
  INC \*- : Increment word on top of stack by 1 (*)
  INL \*l : Increment local or parameter (*)
  INE \*g : Increment external (*)
  DEC \*- : Decrement word on top of stack by 1 (*)
  DEL \*l : Decrement local or parameter (*)
  DEE \*g : Decrement external (*)
  ZRL \*l : Zero local or parameter
  ZRE \*g : Zero external
  ZRF \*w : Load a floating zero of size \*w
  ZER \*w : Load \*w zero bytes
 GROUP 8 - CONVERT    (stack: source, source size, dest. size (top))
  CII \*- : Convert integer to integer (*)
  CUI \*- : Convert unsigned to integer (*)
  CFI \*- : Convert floating to integer (*)
  CIF \*- : Convert integer to floating (*)
  CUF \*- : Convert unsigned to floating (*)
  CFF \*- : Convert floating to floating (*)
  CIU \*- : Convert integer to unsigned
  CUU \*- : Convert unsigned to unsigned
  CFU \*- : Convert floating to unsigned
 GROUP 9 - LOGICAL
  AND \*w : Boolean and on two groups of \*w bytes
  IOR \*w : Boolean inclusive or on two groups of \*w bytes
  XOR \*w : Boolean exclusive or on two groups of \*w bytes
  COM \*w : Complement (one's complement of top \*w bytes)
  ROL \*w : Rotate left a group of \*w bytes
  ROR \*w : Rotate right a group of \*w bytes
 GROUP 10 - SETS
  INN \*w : Bit test on \*w byte set (bit number on top of stack)
  SET \*w : Create singleton \*w byte set with bit n on (n is top of stack)
 GROUP 11 - ARRAY
  LAR \*w : Load array element, descriptor contains integers of size \*w
  SAR \*w : Store array element
  AAR \*w : Load address of array element
 GROUP 12 - COMPARE
  CMI \*w : Compare \*w byte integers, Push negative, zero, positive for <, = or >
  CMF \*w : Compare \*w byte reals
  CMU \*w : Compare \*w byte unsigneds
  CMS \*w : Compare \*w byte values, can only be used for bit for bit equality test
  CMP \*- : Compare pointers
  TLT \*- : True if less, i.e. iff top of stack < 0
  TLE \*- : True if less or equal, i.e. iff top of stack <= 0
  TEQ \*- : True if equal, i.e. iff top of stack = 0
  TNE \*- : True if not equal, i.e. iff top of stack non zero
  TGE \*- : True if greater or equal, i.e. iff top of stack >= 0
  TGT \*- : True if greater, i.e. iff top of stack > 0
 GROUP 13 - BRANCH
  BRA \*b : Branch unconditionally to label \*b
  BLT \*b : Branch less (pop 2 words, branch if top > second)
  BLE \*b : Branch less or equal
  BEQ \*b : Branch equal
  BNE \*b : Branch not equal
  BGE \*b : Branch greater or equal
  BGT \*b : Branch greater
  ZLT \*b : Branch less than zero (pop 1 word, branch negative)
  ZLE \*b : Branch less or equal to zero
  ZEQ \*b : Branch equal zero
  ZNE \*b : Branch not zero
  ZGE \*b : Branch greater or equal zero
  ZGT \*b : Branch greater than zero
 GROUP 14 - PROCEDURE CALL
  CAI \*- : Call procedure (procedure identifier on stack)
  CAL \*p : Call procedure (with identifier \*p)
  LFR \*s : Load function result
  RET \*z : Return (function result consists of top \*z bytes)
 GROUP 15 - MISCELLANEOUS
  ASP \*f : Adjust the stack pointer by \*f
  ASS \*w : Adjust the stack pointer by \*w-byte integer
  BLM \*z : Block move \*z bytes; first pop destination addr, then source addr
  BLS \*w : Block move, size is in \*w-byte integer on top of stack
  CSA \*w : Case jump; address of jump table at top of stack
  CSB \*w : Table lookup jump; address of jump table at top of stack
  DCH \*- : Follow dynamic chain, convert LB to LB of caller
  DUP \*s : Duplicate top \*s bytes
  DUS \*w : Duplicate top \*w bytes
  EXG \*w : Exchange top \*w bytes
  FIL \*g : File name (external 4 := \*g)
  GTO \*g : Non-local goto, descriptor at \*g
  LIM \*- : Load 16 bit ignore mask
  LIN \*n : Line number (external 0 := \*n)
  LNI \*- : Line number increment
  LOR \*r : Load register (0=LB, 1=SP, 2=HP)
  LPB \*- : Convert local base to argument base
  MON \*- : Monitor call
  NOP \*- : No operation
  RCK \*w : Range check; trap on error
  RTT \*- : Return from trap
  SIG \*- : Trap errors to proc identifier on top of stack, -2 resets default
  SIM \*- : Store 16 bit ignore mask
  STR \*r : Store register (0=LB, 1=SP, 2=HP)
  TRP \*- : Cause trap to occur (Error number on stack)
 .DE 0
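The operand order of the compare, test and branch groups can be illustrated with a toy word stack. In this hypothetical C sketch, CMI is assumed to compare the second word against the top word (by analogy with BLT, whose order the list does state), and TLT is assumed to push 1 for true and 0 for false.

```c
#include <stdbool.h>

/* A toy operand stack of words, for illustration only. */
typedef struct { long w[64]; int sp; } stack;

static void push(stack *s, long v) { s->w[s->sp++] = v; }
static long pop(stack *s) { return s->w[--s->sp]; }

/* CMI (assumed order): pop top b and second a, then push
   negative, zero or positive for a < b, a == b, a > b. */
void cmi(stack *s)
{
    long b = pop(s), a = pop(s);
    push(s, (a > b) - (a < b));
}

/* TLT: replace the top word by 1 if it is negative, else 0. */
void tlt(stack *s) { push(s, pop(s) < 0); }

/* BLT: pop two words; the branch is taken iff top > second. */
bool blt_taken(stack *s)
{
    long top = pop(s), second = pop(s);
    return top > second;
}
```

With this reading, pushing 3 and then 5 and executing CMI followed by TLT leaves 1, and BLT on the same two words would branch.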
--- a/doc/em/descr.nr
+++ b/doc/em/descr.nr
@ -0,0 +1,164 @@
 .SN 7
 .BP
 .S1 "DESCRIPTORS"
 Several instructions use descriptors, notably the range check instruction,
 the array instructions, the goto instruction and the case jump instructions.
 Descriptors reside in data space.
 They may be constructed at run time, but
 more often they are fixed and allocated in ROM data.
 .P
 All instructions using descriptors, except GTO, have as argument
 the size of the integers in the descriptor.
 All implementations have to allow integers of the size of a
 word in descriptors.
 All integers popped from the stack and used for indexing or comparing
 must have the same size as the integers in the descriptor.
 .S2 "Range check descriptors"
 Range check descriptors consist of two integers:
 .IS 2
 .PS 1 4 "" .
 .PT
 lower bound~~~~~~~signed
 .PT
 upper bound~~~~~~~signed
 .PE
 .IE
 The range check instruction checks an integer on the stack against
 these bounds and causes a trap if the value is outside the interval.
 The value itself is neither changed nor removed from the stack.
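In C terms, the test RCK performs amounts to the following sketch. The struct and function names are hypothetical, and word-sized (long) bounds are assumed.

```c
#include <stdbool.h>

/* Range check descriptor: two signed integers. */
struct rck_desc { long lower, upper; };

/* True when v lies within [lower, upper].  A real RCK traps when this
   is false, and in either case leaves v on the stack unchanged. */
bool rck_ok(const struct rck_desc *d, long v)
{
    return v >= d->lower && v <= d->upper;
}
```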
 .S2 "Array descriptors"
 Each array descriptor describes a single dimension.
 For multi-dimensional arrays, several array instructions are
 needed to access a single element.
 Array descriptors contain the following three integers:
 .IS 2
 .PS 1 4 "" .
 .PT
 lower bound~~~~~~~~~~~~~~~~~~~~~signed
 .PT
 upper bound - lower bound~~~~~~~unsigned
 .PT
 number of bytes per element~~~~~unsigned
 .PE
 .IE
 The array instructions LAR, SAR and AAR have the pointer to the start
 of the descriptor as operand on the stack.
 .sp
 The element A[I] is fetched as follows:
 .IS 2
 .PS 1 4 "" .
 .PT
 Stack the address of A  (e.g., using LAE or LAL)
 .PT
 Stack the value of I (n-byte integer)
 .PT
 Stack the pointer to the descriptor (e.g., using LAE)
 .PT
 LAR n (n is the size of the integers in the descriptor and I)
 .PE
 .IE
 All array instructions first pop the address of the descriptor
 and the index.
 If the index is not within the bounds specified, a trap occurs.
 Otherwise, (I~-~lower bound) is multiplied
 by the number of bytes per element (the third integer of the descriptor).  The result is added
 to the address of A and replaces A on the stack.
 .A
 At this point LAR, SAR and AAR diverge.
 AAR is finished.  LAR pops the address and fetches the data
 item,
 the size being specified by the descriptor.
 The usual restrictions for memory access must be obeyed.
 SAR pops the address and stores the
 data item now exposed.
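The index computation shared by LAR, SAR and AAR can be sketched in C as follows. The names are hypothetical, the descriptor integers are assumed to be C longs, and a false return stands for the trap. The unsigned subtraction avoids signed overflow when the bounds are far apart.

```c
#include <stdbool.h>

/* Array descriptor (integer size assumed to be that of a long). */
struct arr_desc {
    long lower;             /* lower bound, signed */
    unsigned long range;    /* upper bound - lower bound, unsigned */
    unsigned long elsize;   /* number of bytes per element */
};

/* Compute the address of A[i] as AAR does; returns false where the
   real instruction would trap because i is out of bounds. */
bool aar(const struct arr_desc *d, unsigned char *a, long i,
         unsigned char **elem)
{
    unsigned long off;

    if (i < d->lower)
        return false;
    off = (unsigned long)i - (unsigned long)d->lower;  /* i - lower */
    if (off > d->range)
        return false;
    *elem = a + off * d->elsize;
    return true;
}
```

LAR would then fetch `elsize` bytes from `*elem`, and SAR would store the exposed data item there.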
 .S2 "Non-local goto descriptors"
 The GTO instruction provides a way of returning directly to any
 active procedure invocation.
 The argument of the instruction is the address of a descriptor
 containing three pointers:
 .IS 2
 .PS 1 4 "" .
 .PT
 value of PC after the jump
 .PT
 value of SP after the jump
 .PT
 value of LB after the jump
 .PE
 .IE
 GTO loads PC, SP and LB from the descriptor,
 thereby jumping to a procedure
 and removing zero or more frames from the stack.
 The LB, SP and PC in the descriptor must belong to a
 dynamically enclosing procedure,
 because some EM implementations will need to backtrack through
 the dynamic chain and use the implementation dependent data
 in frames to restore registers etc.
 .S2 "Case descriptors"
 The case jump instructions CSA and CSB both
 provide multiway branches selected by a case index.
 Both fetch two operands from the stack:
 first a pointer to the low address of the case descriptor
 and then the case index.
 CSA uses the case index as index in the descriptor table, but CSB searches
 the table for an occurrence of the case index.
 Therefore, the descriptors for CSA and CSB,
 as shown in figure 4, are different.
 All pointers in the table must be addresses of instructions in the
 procedure executing the case instruction.
 .P
 CSA selects the new PC by indexing.
 If the index, a signed integer, is greater than or equal to
 the lower bound and less than or equal to the upper bound,
 then fetch the new PC from the list of instruction pointers by indexing with
 index-lower.
 The table does not contain the value of the upper bound,
 but the value of upper-lower as an unsigned integer.
 If the index is out of bounds or if the fetched pointer is 0,
 then fetch the default instruction pointer.
 If the resulting PC is 0, then trap.
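A C sketch of the CSA selection rule follows. Instruction addresses are modelled as unsigned long values with 0 meaning "no target", the pointer list is held in a separate array rather than inline in memory, and all names are hypothetical; a false return stands for the trap.

```c
#include <stdbool.h>

/* CSA descriptor, fields in Figure 4 order (lowest address first). */
struct csa_desc {
    unsigned long deflt;        /* default pointer */
    long lower;                 /* lower bound */
    unsigned long range;        /* upper - lower */
    const unsigned long *ptrs;  /* range + 1 instruction pointers */
};

/* Select the new PC for case index idx. */
bool csa(const struct csa_desc *d, long idx, unsigned long *pc)
{
    unsigned long target = 0;

    if (idx >= d->lower) {
        unsigned long k = (unsigned long)idx - (unsigned long)d->lower;
        if (k <= d->range)
            target = d->ptrs[k];    /* index with index - lower */
    }
    if (target == 0)                /* out of bounds or empty slot */
        target = d->deflt;
    if (target == 0)
        return false;               /* trap */
    *pc = target;
    return true;
}
```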
 .P
 CSB selects the new PC by searching.
 The table is searched for an entry with index value equal to the case index.
 That entry or, if none is found, the default entry contains the
 new PC.
 When the resulting PC is 0, a trap is performed.
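The CSB rule can be sketched the same way. This version, with hypothetical names, scans the entries linearly; the text does not prescribe a search method, so an implementation could equally sort the table and use binary search. As the text reads, a matching entry whose PC is 0 traps instead of falling back to the default, unlike CSA.

```c
#include <stdbool.h>

/* One (index, pointer) entry of a CSB table, per Figure 4. */
struct csb_entry { long index; unsigned long pc; };

struct csb_desc {
    unsigned long deflt;              /* default pointer */
    long nentries;                    /* number of entries */
    const struct csb_entry *entries;
};

/* Select the new PC for case index idx; false stands for the trap. */
bool csb(const struct csb_desc *d, long idx, unsigned long *pc)
{
    unsigned long target = d->deflt;  /* used if no entry matches */
    long i;

    for (i = 0; i < d->nentries; i++) {
        if (d->entries[i].index == idx) {
            target = d->entries[i].pc;
            break;
        }
    }
    if (target == 0)
        return false;                 /* trap */
    *pc = target;
    return true;
}
```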
 .P
 The choice of which case instruction to use for
 each source language case statement
 is up to the front end.
 If the range of the index value is dense, i.e.,
 .DS
 (highest value - lowest value) / number of cases
 .DE 1
 is less than some threshold, then CSA is the obvious choice.
 If the range is sparse, CSB is better.
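The density test can be written as a small helper. The name `choose_case_instr` is hypothetical, and the threshold is a parameter because the text leaves its value to the front end.

```c
#include <stddef.h>

enum case_instr { USE_CSA, USE_CSB };

/* Choose CSA or CSB for the case values vals[0..n-1] (n >= 1) using
   the ratio (highest value - lowest value) / number of cases. */
enum case_instr choose_case_instr(const long *vals, size_t n,
                                  double threshold)
{
    long lo = vals[0], hi = vals[0];
    size_t i;

    for (i = 1; i < n; i++) {
        if (vals[i] < lo) lo = vals[i];
        if (vals[i] > hi) hi = vals[i];
    }
    return ((double)hi - (double)lo) / (double)n < threshold
           ? USE_CSA : USE_CSB;
}
```

With a threshold of 3.0, say, the values 1, 2, 3, 4 (ratio 0.75) would get CSA, while 1 and 1000 (ratio 499.5) would get CSB.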
 .N 2
 .DS
   |--------------------|        |--------------------|  high address
   | pointer for upb    |        |    pointer n-1     |
   |--------------------|        |-  -  -  -  -  -  - |
   |         .          |        |     index  n-1     |
   |         .          |        |--------------------|
   |         .          |        |          .         |
   |         .          |        |          .         |
   |         .          |        |          .         |
   |         .          |        |--------------------|
   |         .          |        |    pointer  1      |
   |--------------------|        |-  -  -  -  -  -  - |
   | pointer for lwb+1  |        |     index   1      |
   |--------------------|        |--------------------|
   | pointer for lwb    |        |    pointer  0      |
   |--------------------|        |-  -  -  -  -  -  - |
   |   upper - lower    |        |     index   0      |
   |--------------------|        |--------------------|
   |    lower bound     |        | number of entries  |
   |--------------------|        |--------------------|
   |  default pointer   |        |  default pointer   |  low address
   |--------------------|        |--------------------|
       CSA descriptor                CSB descriptor
      Figure 4. Descriptor layout for CSA and CSB
 .DE
--- a/doc/em/dspace.nr
+++ b/doc/em/dspace.nr
@ -0,0 +1,377 @@
 .BP
 .SN 4
 .S1 "DATA ADDRESS SPACE"
 The data address space is divided into three parts, called 'areas',
 each with its own addressing method:
 global data area,
 local data area (including the stack),
 and heap data area.
 These data areas must be part of the same
 address space because all data is accessed by
 the same type of pointers.
 .P
 Space for global data is reserved using several pseudoinstructions in the
 assembly language, as described in
 the next paragraph and chapter 11.
 The size of the global data area is fixed per program.
 .A
 Global data is addressed absolutely in the machine language.
 Many instructions are available to address global data.
 They all have an absolute address as argument.
 Examples are LOE, LAE and STE.
 .P
 Part of the global data area is initialized by the
 compiler, the
 rest is not initialized at all or is initialized
 with a value, typically -32768 or 0.
 Part of the initialized global data may be made read-only
 if the implementation supports protection.
 .P
 The local data area is used as a stack,
 which grows from high to low addresses
 and contains some data for each active procedure
 invocation, called a 'frame'.
 The size of the local data area varies dynamically during
 execution.
 Below the current procedure frame resides the operand stack.
 The stack pointer SP always points to the bottom of
 the local data area.
 Local data is addressed by offsetting from the local base pointer LB.
 LB always points to the frame of the current procedure.
 Only the words of the current frame and the parameters
 can be addressed directly.
 Variables in other active procedures are addressed by following
 the chain of statically enclosing procedures using the LXL or LXA instruction.
 The variables in dynamically enclosing procedures can be
 addressed with the use of the DCH instruction.
 .A
 Many instructions have offsets to LB as argument,
 for instance LOL, LAL and STL.
 The arguments of these instructions range from -1 to some
 (negative) minimum
 for the access of local storage and from 0 to some (positive)
 maximum for parameter access.
 .P
 The procedure call instructions CAL and CAI each create a new frame
 on the stack.
 Each procedure has an assembly-time parameter specifying
 the number of bytes needed for local storage.
 This storage is allocated each time the procedure is called and
 must be a multiple of the wordsize.
 Each procedure, therefore, starts with a stack with the local variables
 already allocated.
 The return instructions RET and RTT remove a frame.
 The actual parameters must be removed by the calling procedure.
 .P
 RET may copy some words from the stack of
 the returning procedure to an unnamed 'function return area'.
 This area is available for 'READ-ONCE' access using the LFR instruction.
 The result of a LFR is only defined if the size used to fetch
 is identical to the size used in the last return.
 The instruction ASP, used to remove the parameters from the
 stack, the branch instruction BRA and the non-local goto
 instruction GTO are the only ones that leave the contents of
 the 'function return area' intact.
 All other instructions are allowed to destroy the function
 return area.
 Thus parameters can be popped before fetching the function result.
 The maximum size of all function return areas is
 implementation dependent,
 but should allow procedure instance identifiers and all
 implemented objects of type integer, unsigned, float
 and pointer to be returned.
 In most implementations
 the maximum size of the function return
 area is twice the pointer size,
 because we want to be able to handle 'procedure instance
 identifiers' which consist of a procedure identifier and the LB
 of a frame belonging to that procedure.
 .P
 The heap data area grows upwards, to higher numbered
 addresses.
 It is initially empty.
 The initial value of the heap pointer HP
 marks the low end.
 The heap pointer may be manipulated
 by the LOR and STR instructions.
 The heap can only be addressed indirectly,
 by pointers derived from previous values of HP.
 .S2 "Global data area"
 The initial size of the global data area is determined at assembly time.
 Global data is allocated by several
 pseudoinstructions in the EM assembly
 language.
 Each pseudoinstruction allocates one or more bytes.
 The bytes allocated for a single pseudo form
 a 'block'.
 A block differs from a fragment, because,
 under certain conditions, several blocks are allocated
 in a single fragment.
 This guarantees that the bytes of these blocks
 are consecutive.
 .P
 Global data is addressed absolutely in binary
 machine language.
 Most compilers, however,
 cannot assign absolute addresses to their global variables,
 especially not if the language
 allows programs to be composed of several separately compiled modules.
 The assembly language therefore allows the compiler to name
 the first address of a global data block with an alphanumeric label.
 Moreover, the only way to address such a named global data block
 in the assembly language is by using its name.
 It is the task of the assembler/loader to
 translate these labels into absolute addresses.
 These labels may also be used
 in CON and ROM pseudoinstructions to initialize pointers.
 .P
 The pseudoinstruction CON allocates initialized data.
 ROM acts like CON but indicates that the initialized data will
 not change during execution of the program.
 The pseudoinstruction BSS allocates a block of uninitialized
 or identically initialized
 data.
 The pseudoinstruction HOL is similar to BSS,
 but it alters the meaning of subsequent absolute addressing in
 the assembly language.
 .P
 Another type of global data is a small block,
 called the ABS block, with an implementation defined size.
 Storage in this type of block can only be addressed
 absolutely in assembly language.
 The first word has address 0 and is used to maintain the
 source line number.
 Special instructions LIN and LNI are provided to
 update this counter.
 A pointer at location 4 points to a string containing the
 current source file name.
 The instruction FIL can be used to update the pointer.
 .P
 All numeric arguments of the instructions that address
 the global data area refer to locations in the
 ABS block unless
 they are preceded by at least one HOL pseudo in the same
 module,
 in which case they refer to the storage area allocated by the
 last HOL pseudoinstruction.
 Thus LOE 0 loads the zeroth word of the most recent HOL, unless no HOL has
 appeared in the current file so
 far, in which case it loads the zeroth word of the
 ABS fragment.
 .P
 The global data area is highly fragmented.
 The ABS block and each HOL and BSS block are separate fragments.
 The way fragments are formed from CON and ROM blocks is more complex.
 The assemblers group several blocks into a single fragment.
 A fragment only contains blocks of the same type: CON or ROM.
 It is guaranteed that the bytes allocated for two consecutive CON pseudos are
 allocated consecutively in a single fragment, unless
 these CON pseudos are separated in the assembly language program
 by a data label definition or one or more of the following pseudos:
 .DS
     ROM, BSS, HOL and END
 .DE
 An analogous rule holds for ROM pseudos.
 .S2 "Local data area"
 The local data area consists of a sequence of frames, one for
 each active procedure.
 Below the frame of the current procedure resides the
 expression stack.
 Frames are generated by procedure calls and are
 removed by procedure returns.
 A procedure frame consists of six 'zones':
 .DS
  1.  The return status block
  2.  The local variables and compiler temporaries
  3.  The register save block
  4.  The dynamic local generators
  5.  The operand stack
  6.  The parameters of a procedure one level deeper
 .DE
 A sample frame is shown in Figure 1.
 .P
 Before a procedure call is performed the actual
 parameters are pushed onto the stack of the calling procedure.
 The exact details are compiler dependent.
 EM allows procedures to be called with a variable number of
 parameters.
 The implementation of the C language almost forces its runtime
 system to push the parameters in reverse order, that is,
 the first positional parameter last.
 Most compilers use the C calling convention for compatibility with C.
 The parameters of a procedure belong to the frame of the
 calling procedure.
 Note that the evaluation of the actual parameters may imply
 the calling of procedures.
 The parameters can be accessed with certain instructions using
 offsets of 0 and greater.
 The first byte of the last parameter pushed has offset 0.
 Note that the parameter at offset 0 has a special use in the
 instructions following the static chain (LXL and LXA).
 These instructions assume that this parameter contains the LB of
 the statically enclosing procedure.
 Procedures that do not have a statically enclosing procedure
 do not need a static link at offset 0.
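Under the C convention described above, parameter offsets follow from the parameter sizes alone. The C sketch below, with the hypothetical name `param_offsets`, assumes each size is already rounded to a word multiple and ignores the static link, which, if present, would occupy offset 0 and shift the other offsets up by its size.

```c
#include <stddef.h>

/* Offsets from AB of parameters pushed in reverse order: the first
   parameter is pushed last and so sits at offset 0.  sizes[] holds
   the parameter sizes in bytes, in declaration order. */
void param_offsets(const size_t *sizes, size_t n, long *offs)
{
    long off = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        offs[i] = off;          /* first byte of parameter i */
        off += (long)sizes[i];
    }
}
```

For parameter sizes 2, 4 and 2 it yields offsets 0, 2 and 6.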
 .P
 Two instructions are available to perform procedure calls, CAL
 and CAI.
 Several tasks are performed by these call instructions.
 .A
 First, a part of the status of the calling procedure is
 saved on the stack in the return status block.
 This block should contain the return address of the calling
 procedure, its LB and other implementation dependent data.
 The size of this block is fixed for any given implementation
 because the lexical instructions LPB, LXL and LXA must be able to
 obtain the base addresses of the procedure parameters \fBand\fP local
 variables.
 An alternative solution can be used on machines with a highly
 segmented address space.
 The stack frames need not be contiguous then and the first
 status save area can contain the parameter base AB,
 which has the value of SP just after the last parameter has
 been pushed.
 .A
 Second, the LB is changed to point to the
 first word above the local variables.
 The new LB is a copy of the SP after the return status
 block has been pushed.
 .A
 Third, the amount of local storage needed by the procedure is
 reserved.
 The parameters and local storage are accessed by the same instructions.
 Negative offsets are used for access to local variables.
 The highest byte, that is the byte nearest
 to LB, has to be accessed with offset -1.
 The pseudoinstruction specifying the entry point of a
 procedure has an argument that specifies the amount of local
 storage needed.
 The local variables allocated by the CAI or CAL instructions
 are the only ones that can be accessed with a fixed negative offset.
 The initial value of the allocated words is
 not defined, but implementations that check for undefined
 values will probably initialize them with a
 special 'undefined' pattern, typically -32768.
 .A
 Fourth, any EM implementation is allowed to reserve a variable size
 block beneath the local variables.
 This block could, for example, be used to save a variable number
 of registers.
 .A
 Finally, the address of the entry point of the called procedure
 is loaded into the Program Counter.
 .P
 The ASP instruction can be used to allocate further (dynamic)
 local storage.
 The base address of such storage must be obtained with a LOR~SP
 instruction.
 This same instruction ASP may also be used
 to remove some words from the stack.
 .P
 There is a version of ASP, called ASS, which fetches the number
 of bytes to allocate from the stack.
 It can be used to allocate space for local
 objects whose size is unknown at compile time,
 so called 'dynamic local generators'.
 .P
 Control is returned to the calling procedure with a RET instruction.
 Any return value is then copied to the 'function return area'.
 The frame created by the call is deallocated and the status of
 the calling procedure is restored.
 The value of SP just after the return value has been popped must
 be the same as the
 value of SP just before executing the first instruction of this
 invocation.
 This means that when a RET is executed the operand stack can
 only contain the return value and all dynamically generated locals must be
 deallocated.
 Violating this restriction might result in hard-to-detect
 errors.
 The calling procedure has to remove the parameters from the stack.
 This can be done with the aforementioned ASP instruction.
 .P
 Each procedure frame is a separate fragment.
 Because any fragment may be placed anywhere in memory,
 procedure frames need not be contiguous.
 .DS
                |===============================|
                |     actual parameter  n-1     |
                |-------------------------------|
                |              .                |
                |              .                |
                |              .                |
                |-------------------------------|
                |     actual parameter  0       | ( <- AB )
                |===============================|
                |===============================|
                |///////////////////////////////|
                |///// return status block /////|
                |///////////////////////////////|   <- LB
                |===============================|
                |                               |
                |       local variables         |
                |                               |
                |-------------------------------|
                |                               |
                |      compiler temporaries     |
                |                               |
                |===============================|
                |///////////////////////////////|
                |///// register save block /////|
                |///////////////////////////////|
                |===============================|
                |                               |
                |   dynamic local generators    |
                |                               |
                |===============================|
                |           operand             |
                |-------------------------------|
                |           operand             |
                |===============================|
                |         parameter  m-1        |
                |-------------------------------|
                |              .                |
                |              .                |
                |              .                |
                |-------------------------------|
                |         parameter  0          | <- SP
                |===============================|
          Figure 1. A sample procedure frame and parameters.
 .DE
 .S2 "Heap data area"
 The heap area starts empty, with HP
 pointing to the low end of it.
 HP always contains a word address.
 A copy of HP can always be obtained with the LOR instruction.
 A new value may be stored in the heap pointer using the STR instruction.
 If the new value is greater than the old one,
 then the heap grows.
 If it is smaller, then the heap shrinks.
 HP may never point below its original value.
 All words between the current HP and the original HP
 are allocated to the heap.
 The heap may not grow into a part of memory that is already allocated
 for the stack.
 When this is attempted, the STR instruction will cause a trap to occur.
 .P
 The only way to address the heap is indirectly.
 Whenever an object is allocated by increasing HP,
 the old HP value must be saved; it can be used later to address
 the allocated object.
 If, in the meantime, HP is decreased so that the object
 is no longer part of the heap, then any attempt to access
 the object is illegal.
 Furthermore, if the heap pointer is later increased again to above
 the object address, access to the old object gives undefined results.
 .P
 The heap is a single fragment.
 All bytes have consecutive addresses.
 No limits are imposed on the size of the heap as long as it fits
 in the available data address space.
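The rules for HP can be illustrated with a small model. The C sketch below is a simulation under stated assumptions, not part of EM: addresses are plain integers, the wordsize is assumed to be 2, the names `DataSpace`, `str_hp` and `heap_alloc` are hypothetical, and the trap raised by STR is assumed to be EHEAP (heap overflow) from the trap table.

```c
/* Hypothetical data-space model: addresses are plain integers, the
   heap grows upward from heap_base and must stay at or below
   stack_low, the lowest address currently allocated to the stack. */
#define WORDSIZE 2UL            /* assumed wordsize: HP stays word aligned */
#define EHEAP    17             /* heap overflow trap number (assumed trap) */

typedef struct {
    unsigned long heap_base;    /* original HP; HP may never go below it */
    unsigned long hp;           /* current heap pointer */
    unsigned long stack_low;    /* lowest address in use by the stack */
} DataSpace;

/* Model of storing a new value in HP with STR: returns 0 on success or
   the trap number that would occur. Treating new_hp < heap_base as a
   trap is an assumption; the text only says it is not allowed. */
int str_hp(DataSpace *ds, unsigned long new_hp) {
    if (new_hp < ds->heap_base || new_hp > ds->stack_low)
        return EHEAP;
    ds->hp = new_hp;            /* larger value: heap grows; smaller: it shrinks */
    return 0;
}

/* Allocate nbytes, rounded up to whole words. The old HP value is the
   address of the new object; 0 signals failure (heap_base is assumed
   to be nonzero, so 0 is never a valid object address here). */
unsigned long heap_alloc(DataSpace *ds, unsigned long nbytes) {
    unsigned long obj = ds->hp;
    unsigned long size = (nbytes + WORDSIZE - 1) / WORDSIZE * WORDSIZE;
    if (str_hp(ds, obj + size) != 0)
        return 0;               /* HP is left unchanged on failure */
    return obj;
}
```

With `heap_base` and `hp` at 0x1000 and `stack_low` at 0x1010, allocating 5 bytes returns 0x1000 and moves HP to 0x1006, while a further 16-byte request would cross into the stack and is refused.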
--- a/doc/em/even.c
+++ b/doc/em/even.c
@ -0,0 +1,9 @@
 #include <stdio.h>
 
 /* Print each byte of standard input in decimal, 16 per line. */
 int main(void) {
 	int l,j ;
 	for ( j=0 ; (l=getchar()) != EOF ; j++ ) {
 		if ( j%16 == 15 ) printf("%3d\n",l&0377 ) ;
 		else              printf("%3d ",l&0377 ) ;
 	}
 	printf("\n") ;
 	return 0 ;
 }
--- a/doc/em/exam.e
+++ b/doc/em/exam.e
@ -0,0 +1,178 @@
  mes 2,2,2              ; wordsize 2, pointersize 2
 .1
  rom 't.p\000'          ; the name of the source file
  hol 552,-32768,0       ; externals and buf occupy 552 bytes
  exp $sum               ; sum can be called from other modules
  pro $sum,2             ; procedure sum; 2 bytes local storage
  lin 8                  ; code from source line 8
  ldl 0                  ; load two locals ( a and b )
  adi 2                  ; add them
  ret 2                  ; return the result
  end 2                  ; end of procedure ( still two bytes local storage )
 .2
  rom 1,99,2             ; descriptor of array a[]
  exp $test              ; the compiler exports all level 0 procedures
  pro $test,226          ; procedure test, 226 bytes local storage
 .3
  rom 4.8F8              ; assemble Floating point 4.8 (8 bytes) in
 .4                              ; global storage
  rom 0.5F8              ; same for 0.5
  mes 3,-226,2,2         ; compiler temporary not referenced indirect
  mes 3,-24,2,0          ; the same is true for i, j, b and c in test
  mes 3,-22,2,0
  mes 3,-4,2,0
  mes 3,-2,2,0
  mes 3,-20,8,0          ; and for x and y
  mes 3,-12,8,0
  lin 20                 ; maintain source line number
  loc 1
  stl -4                 ; j := 1
  lni                    ; was lin 21 prior to optimization
  lol -4
  loc 3
  mli 2
  loc 6
  adi 2
  stl -2                 ; i := 3 * j + 6
  lni                    ; was lin 22 prior to optimization
  lae .3
  loi 8
  lal -12
  sti 8                  ; x := 4.8
  lni                    ; was lin 23 prior to optimization
  lal -12
  loi 8
  lae .4
  loi 8
  dvf 8
  lal -20
  sti 8                  ; y := x / 0.5
  lni                    ; was lin 24 prior to optimization
  loc 1
  stl -22                ; b := true
  lni                    ; was lin 25 prior to optimization
  loc 122
  stl -24                ; c := 'z'
  lni                    ; was lin 26 prior to optimization
  loc 1
  stl -2                 ; for i:= 1
 2
  lol -2
  dup 2
  mli 2                  ; i*i
  lal -224
  lol -2
  lae .2
  sar 2                  ; a[i] :=
  lol -2
  loc 100
  beq *3                 ; to 100 do
  inl -2                 ; increment i and loop
  bra *2
 3
  lin 27
  lol -4
  loc 27
  adi 2                  ; j + 27
  sil 0                  ; r.r1 :=
  lni                    ; was lin 28 prior to optimization
  lol -22                ; b
  lol 0
  stf 10                 ; r.r3 :=
  lni                    ; was lin 29 prior to optimization
  lal -20
  loi 16
  adf 8                  ; x + y
  lol 0
  adp 2
  sti 8                  ; r.r2 :=
  lni                    ; was lin 30 prior to optimization
  lal -224
  lol -4
  lae .2
  lar 2                  ; a[j]
  lil 0                  ; r.r1
  cal $sum               ; call now
  asp 4                  ; remove parameters from stack
  lfr 2                  ; get function result
  stl -2                 ; i :=
 4
  lin 31
  lol -2
  zle *5                 ; while i > 0 do
  lol -4
  lil 0
  adi 2
  stl -4                 ; j := j + r.r1
  del -2                 ; i := i - 1
  bra *4                 ; loop
 5
  lin 32
  lol 0
  stl -226               ; make copy of address of r
  lol -22
  lol -226
  stf 10                 ; r3 := b
  lal -20
  loi 16
  adf 8
  lol -226
  adp 2
  sti 8                  ; r2 := x + y
  loc 0
  sil -226               ; r1 := 0
  lin 34                 ; note the absence of the unnecessary jump
  lae 22                 ; address of output structure
  lol -4
  cal $_wri              ; write integer with default width
  asp 4                  ; pop parameters
  lae 22
  lol -2
  loc 6
  cal $_wsi              ; write integer width 6
  asp 6
  lae 22
  lal -12
  loi 8
  loc 9
  loc 3
  cal $_wrf              ; write fixed format real, width 9, precision 3
  asp 14
  lae 22
  lol -22
  cal $_wrb              ; write boolean, default width
  asp 4
  lae 22
  cal $_wln              ; writeln
  asp 2
  ret 0                  ; return, no result
  end 226
  exp $_main
  pro $_main,0           ; main program
 .6
  con 2,-1,22            ; description of external files
 .5
  rom 15.96F8
  fil .1                 ; maintain source file name
  lae .6                 ; description of external files
  lae 0                  ; base of hol area to relocate buffer addresses
  cal $_ini              ; initialize files, etc...
  asp 4
  lin 37
  lae .5
  loi 8
  lae 2
  sti 8                  ; mx := 15.96
  lni                    ; was lin 38 prior to optimization
  loc 99
  ste 0                  ; mi := 99
  lni                    ; was lin 39 prior to optimization
  lae 10                 ; address of r
  cal $test
  asp 2
  loc 0                  ; normal exit
  cal $_hlt              ; cleanup and finish
  asp 2
  end 0
  mes 4,40               ; length of source file is 40 lines
  mes 5                  ; reals were used
--- a/doc/em/exam.p
+++ b/doc/em/exam.p
@ -0,0 +1,40 @@
  program example(output);
  {This program just demonstrates typical EM code.}
  type rec = record r1: integer; r2:real; r3: boolean end;
  var mi: integer;  mx:real;  r:rec;
  function sum(a,b:integer):integer;
  begin
    sum := a + b
  end;
  procedure test(var r: rec);
  label 1;
  var   i,j: integer;
 	x,y: real;
 	b: boolean;
 	c: char;
 	a: array[1..100] of integer;
  begin
 	j := 1;
 	i := 3 * j + 6;
 	x := 4.8;
 	y := x/0.5;
 	b := true;
 	c := 'z';
 	for i:= 1 to 100 do a[i] := i * i;
 	r.r1 := j+27;
 	r.r3 := b;
 	r.r2 := x+y;
 	i := sum(r.r1, a[j]);
 	while i > 0 do begin j := j + r.r1; i := i - 1 end;
 	with r do begin r3 := b;  r2 := x+y;  r1 := 0 end;
 	goto 1;
  1:    writeln(j, i:6, x:9:3, b)
  end; {test}
  begin {main program}
    mx := 15.96;
    mi := 99;
    test(r)
  end.
--- a/doc/em/intro.nr
+++ b/doc/em/intro.nr
@ -0,0 +1,180 @@
 .BP
 .S1 "INTRODUCTION"
 EM is a family of intermediate languages designed for producing
 portable compilers.
 The general strategy is for a program called
 .B front end
 to translate the source program to EM.
 Another program,
 .B back
 .BW end
 translates EM to target assembly language.
 Alternatively, the EM code can be assembled to a binary form
 and interpreted.
 These considerations led to the following goals:
 .IS 2 10
 .PS 1 4
 .PT
 The design should allow translation to,
 or interpretation on, a wide range of existing machines.
 Design decisions should be delayed as far as possible
 and the implications of these decisions should
 be localized as much as possible.
 .N
 The current microcomputer technology offers 8, 16 and 32 bit machines
 with various sizes of address space.
 EM should be flexible enough to be useful on most of these
 machines.
 The differences between the members of the EM family should only
 concern the wordsize and address space size.
 .PT
 The architecture should ease the task of code generation for
 high level languages such as Pascal, C, Ada, Algol 68, BCPL.
 .PT
 The instruction set used by the interpreter should be compact,
 to reduce the amount of memory needed
 for program storage, and to reduce the time needed to transmit
 programs over communication lines.
 .PT
 It should be designed with microprogrammed implementations in
 mind; in particular, the use of many short fields within
 instruction opcodes should be avoided, because their extraction by the
 microprogram or conversion to other instruction formats is inefficient.
 .PE
 .IE
 .A
 The basic architecture is based on the concept of a stack. The stack
 is used for procedure return addresses, actual parameters, local variables,
 and arithmetic operations.
 There are several built-in object types,
 for example, signed and unsigned integers,
 floating point numbers, pointers and sets of bits.
 There are instructions to push and pop objects
 to and from the stack.
 The push and pop instructions are not typed.
 They only care about the size of the objects.
 For each built-in type there are
 reverse Polish type instructions that pop one or more
 objects from the top of
 the stack, perform an operation, and push the result back onto the
 stack.
 For all types except pointers,
 these instructions have the object size
 as argument.
 .P
 There are no visible general registers used for arithmetic operands
 etc. This is in contrast to most third generation computers, which usually
 have 8 or 16 general registers. The decision not to have a group of
 general registers was fully intentional and follows W.L. van der
 Poel's dictum that a machine should have 0, 1, or an infinite
 number of any feature. General registers have two primary uses: to hold
 intermediate results of complicated expressions, e.g.
 .IS 5 0 1
 ((a*b + c*d)/e + f*g/h) * i
 .IE 1
 and to hold local variables.
 .P
 Various studies
 have shown that the average expression has fewer than two operands,
 making the former use of registers of doubtful value. The present trend
 toward structured programs consisting of many small
 procedures greatly reduces the value of registers to hold local variables
 because the large number of procedure calls implies a large overhead in
 saving and restoring the registers at every call.
 .BP
 .P
 Although there are no general purpose registers, there are a
 few internal registers with specific functions as follows:
 .IS 2
 .N 1
 .TS
 tab(:);
 l 1 l l.
 PC:-:Program Counter:Pointer to next instruction
 LB:-:Local Base:Points to base of the local variables \
 in the current procedure.
 SP:-:Stack Pointer:Points to the highest occupied word on the stack.
 HP:-:Heap Pointer:Points to the top of the heap area.
 .TE 1
 .IE
 .A
 Furthermore, reverse Polish code is much easier to generate than
 multi-register machine code, especially if highly efficient code is
 desired.
 When translating to assembly language the back end can make
 good use of the target machine's registers.
 An EM machine can
 achieve high performance by keeping part of the stack
 in high speed storage (a cache or microprogram scratchpad memory) rather
 than in primary memory.
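As an illustration of how reverse Polish code does without registers, the expression ((a*b + c*d)/e + f*g/h) * i can be written as a b * c d * + e / f g * h / + i * and evaluated with a stack alone. The following C sketch uses a hypothetical four-operation instruction set (`PUSH`, `ADD`, `MUL`, `DIV`), loosely in the spirit of EM's LOC, ADI, MLI and DVI; it is not EM's actual encoding and assumes well-formed input.

```c
#include <stddef.h>

/* Hypothetical mini instruction set: push a constant, or pop two
   operands, apply an operator and push the result. */
typedef enum { PUSH, ADD, MUL, DIV } Op;
typedef struct { Op op; long arg; } Instr;

/* Evaluate n instructions and return the value left on top.
   No overflow or underflow checks: the code is assumed well formed. */
long rpn_run(const Instr *code, size_t n) {
    long stack[64];
    size_t sp = 0;                      /* number of occupied slots */
    for (size_t k = 0; k < n; k++) {
        if (code[k].op == PUSH) {
            stack[sp++] = code[k].arg;
            continue;
        }
        long b = stack[--sp];           /* right operand is on top */
        long a = stack[--sp];
        switch (code[k].op) {
        case ADD: stack[sp++] = a + b; break;
        case MUL: stack[sp++] = a * b; break;
        case DIV: stack[sp++] = a / b; break;
        default:  break;
        }
    }
    return stack[sp - 1];
}
```

For this expression the stack never holds more than three values at once, so a back end can map the top few stack slots onto a handful of target registers.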
 .P
 Again according to van der Poel's dictum,
 all EM instructions have zero or one argument.
 We believe that instructions needing two arguments
 can be split into two simpler ones.
 The simpler ones can probably be used in other
 circumstances as well.
 Moreover, these two instructions together often
 have a shorter encoding than the single
 two-argument instruction would have.
 .P
 This document describes EM at three different levels:
 the abstract level, the assembly language level and
 the machine language level.
 .A
 The most important level is that of the abstract EM architecture.
 This level deals with the basic design issues.
 Only the functional capabilities of instructions are relevant, not their
 format or encoding.
 Most chapters of this document refer to the abstract level
 and it is explicitly stated whenever
 another level is described.
 .A
 The assembly language is intended for the compiler writer.
 It presents a more or less orthogonal instruction
 set and provides symbolic names for data.
 Moreover, it facilitates the linking of
 separately compiled 'modules' into a single program
 by providing several pseudoinstructions.
 .A
 The machine language is designed for interpretation with a compact
 program text and easy decoding.
 The binary representation of the machine language instruction set is
 far from orthogonal.
 Frequent instructions have a short opcode.
 The encoding is fully byte oriented.
 These bytes do not contain small bit fields, because
 bit fields would slow down decoding considerably.
 .P
 A common use for EM is for producing portable (cross) compilers.
 When used this way, the compilers produce
 EM assembly language as their output.
 To run the compiled program on the target machine,
 a back end translates the EM assembly language to
 the target machine's assembly language.
 When this approach is used, the format of the EM
 machine language instructions is irrelevant.
 On the other hand, when writing an interpreter for EM machine language
 programs, the interpreter must deal with the machine language
 and not with the symbolic assembly language.
 .P
 As mentioned above, the
 current microcomputer technology offers 8, 16 and 32 bit
 machines with address spaces ranging from 2\v'-0.5m'16\v'0.5m'
 to 2\v'-0.5m'32\v'0.5m' bytes.
 A single size for pointers and integers would restrict
 the usefulness of the language.
 We decided to have a different language for each combination of
 word and pointer size.
 All languages offer the same instruction set and differ only
 slightly: in memory alignment restrictions, in the implicit size
 assumed by several instructions, and in the size of objects on the stack.
 The wordsize is restricted to powers of 2 and
 the pointer size must be a multiple of the wordsize.
 Almost all programs handling EM will be parametrized with word
 and pointer size.
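A program parametrized with word and pointer size might validate the two constraints just stated before using them. A minimal C sketch; the function name `em_sizes_valid` is hypothetical, and rejecting a zero pointer size is an added assumption.

```c
#include <stdbool.h>

/* True if (wordsize, pointersize) is a legal EM size combination:
   the wordsize is a power of 2 and the pointer size is a nonzero
   multiple of the wordsize. */
bool em_sizes_valid(unsigned wordsize, unsigned pointersize) {
    bool word_pow2 = wordsize != 0 && (wordsize & (wordsize - 1)) == 0;
    return word_pow2 && pointersize != 0 && pointersize % wordsize == 0;
}
```

For instance, the combination declared by `mes 2,2,2` in the example program (wordsize 2, pointer size 2) passes, as does 2 and 4, while a wordsize of 3 or a pointer size of 2 with wordsize 4 is rejected.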
--- a/doc/em/iotrap.nr
+++ b/doc/em/iotrap.nr
@ -0,0 +1,376 @@
 .SN 8
 .VS 1 0
 .BP
 .S1 "ENVIRONMENT INTERACTIONS"
 EM programs can interact with their environment in three ways.
 Two, starting/stopping and monitor calls, are dealt with in this chapter.
 The remaining way to interact, interrupts, will be treated
 together with traps in chapter 9.
 .S2 "Program starting and stopping"
 EM user programs start with a call to a procedure called
 m_a_i_n.
 The assembler and backends look for the definition of a procedure
 with this name in their input.
 The call passes three parameters to the procedure.
 The parameters are similar to the parameters supplied by the
 UNIX
 .FS
 UNIX is a Trademark of Bell Laboratories.
 .FE
 operating system to C programs.
 These parameters are often called
 .BW argc ,
 .B argv
 and
 .BW envp .
 Argc is the parameter nearest to LB and is a wordsized integer.
 The other two are pointers to the first element of an array of
 string pointers.
 .N
 The
 .B argv
 array contains
 .B argc
 strings, the first of which contains the program call name.
 The other strings in the
 .B argv
 array are the program parameters.
 .P
 The
 .B envp
 array contains strings in the form "name=string", where 'name'
 is the name of an environment variable and 'string' its value.
 The
 .B envp
 array is terminated by a zero pointer.
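The envp layout can be read with ordinary C string handling. This is a minimal sketch, assuming the array is presented to C as a null-terminated array of `char *`; the function name `env_lookup` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Look up name in an envp-style array of "name=string" entries that
   ends with a null pointer. Returns a pointer to the value part, or
   NULL if the variable is absent. */
const char *env_lookup(char *const *envp, const char *name) {
    size_t len = strlen(name);
    for (; *envp != NULL; envp++) {
        /* require an exact name match followed by '=' */
        if (strncmp(*envp, name, len) == 0 && (*envp)[len] == '=')
            return *envp + len + 1;     /* skip "name=" */
    }
    return NULL;
}
```

The check for '=' after the name keeps a prefix such as "TER" from matching the entry "TERM=vt100".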
 .P
 An EM user program stops if the program returns from the first
 invocation of m_a_i_n.
 The contents of the function return area are used to obtain a
 wordsized program return code.
 EM programs also stop when traps and interrupts occur that are
 not caught and when the exit monitor call is executed.
 .S2 "Input/Output and other monitor calls"
 EM differs from most conventional machines in that it has high level i/o
 instructions.
 Typical instructions are OPEN FILE and READ FROM FILE instead
 of low level instructions such as setting and clearing
 bits in device registers.
 By providing such high level i/o primitives, the task of implementing
 EM on various non EM machines is made considerably easier.
 .P
 I/O is initiated by the MON instruction, which expects an iocode on top
 of the stack.
 Often there are also parameters which are pushed on the
 stack in reverse order, that is: last
 parameter first.
 Some i/o functions also provide results, which are returned on the stack.
 In the list of monitor calls we use several types of parameters and results.
 These types are integers and unsigneds of varying sizes, never
 smaller than the wordsize, and the two pointer types.
 .N 1
 The names of the types used are:
 .IS 4
 .PS - 10
 .PT int
 an integer of wordsize
 .PT int2
 an integer whose size is the maximum of the wordsize and 2
 bytes
 .PT int4
 an integer whose size is the maximum of the wordsize and 4
 bytes
 .PT intp
 an integer with the size of a pointer
 .PT uns2
 an unsigned integer whose size is the maximum of the wordsize and 2
 bytes
 .PT unsp
 an unsigned integer with the size of a pointer
 .PT ptr
 a pointer into data space
 .PE 1
 .IE 0
 The table below lists the i/o codes with their results and
 parameters.
 This list is similar to the system calls of the UNIX Version 7
 operating system.
 .BP
 .A
 To execute a monitor call, proceed as follows:
 .IS 2
 .N 1
 .PS a 4 "" )
 .PT
 Stack the parameters, in reverse order, last parameter first.
 .PT
 Push the monitor call number (iocode) onto the stack.
 .PT
 Execute the MON instruction.
 .PE 1
 .IE
 An error code is present on the top of the stack after
 execution of most monitor calls.
 If this error code is zero, the call performed the action
 requested and the results are available on top of the stack.
 Non-zero error codes indicate a failure; in this case no
 results are available and the error code has been pushed twice.
 This construction enables programs to test for failure with a
 single instruction (~TEQ or TNE~) and still find out the cause of
 the failure.
 The result name 'e' is reserved for the error code.
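The success/failure convention can be modelled with a toy operand stack. In this C sketch, which is an illustration rather than an EM implementation, `Stack`, `push`, `pop`, `mon_finish` and `mon_failed` are hypothetical names, the stack grows upward in an array, and results are assumed to be pushed so that the first listed result lies just below the error code.

```c
#include <stddef.h>

/* Hypothetical operand stack; sp counts occupied slots. */
typedef struct { long slot[32]; size_t sp; } Stack;

void push(Stack *s, long v) { s->slot[s->sp++] = v; }
long pop(Stack *s) { return s->slot[--s->sp]; }

/* How a MON implementation might leave its outcome on the stack:
   success (e == 0): the n results, then 0 on top;
   failure (e != 0): e pushed twice and no results. */
void mon_finish(Stack *s, long e, const long *results, size_t n) {
    if (e != 0) {
        push(s, e);
        push(s, e);
        return;
    }
    for (size_t k = n; k > 0; k--)
        push(s, results[k - 1]);        /* results[0] ends up nearest the top */
    push(s, 0);
}

/* Caller side: a single test of the top word decides success or
   failure; on failure the second copy still holds the cause. */
int mon_failed(Stack *s, long *err) {
    long e = pop(s);
    if (e != 0) {
        *err = pop(s);
        return 1;
    }
    return 0;
}
```

On success the caller finds its results directly under the consumed zero; on failure nothing but the duplicated code was pushed, so popping it leaves the stack as it was before the call.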
 .N 1
 List of monitor calls.
 .DS B
 number name    parameters      results           function
   1   Exit    status:int                        Terminate this process
   2   Fork                    e,flag,pid:int    Spawn new process
   3   Read    fildes:int;buf:ptr;nbytes:unsp
                               e:int;rbytes:unsp Read from file
   4   Write   fildes:int;buf:ptr;nbytes:unsp
                               e:int;wbytes:unsp Write on a file
   5   Open    string:ptr;flag:int
                               e,fildes:int      Open file for read and/or write
   6   Close   fildes:int      e:int             Close a file
   7   Wait                    e:int;status,pid:int2
                                                 Wait for child
   8   Creat   string:ptr;mode:int
                               e,fildes:int      Create a new file
   9   Link    string1,string2:ptr
                               e:int             Link to a file
  10   Unlink  string:ptr      e:int             Remove directory entry
  12   Chdir   string:ptr      e:int             Change default directory
  14   Mknod   string:ptr;mode,addr:int2
                               e:int             Make a special file
  15   Chmod   string:ptr;mode:int2
                               e:int             Change mode of file
  16   Chown   string:ptr;owner,group:int2
                               e:int             Change owner/group of a file
  18   Stat    string,statbuf:ptr
                               e:int             Get file status
  19   Lseek   fildes:int;off:int4;whence:int
                               e:int;oldoff:int4 Move read/write pointer
  20   Getpid                  pid:int2          Get process identification
  21   Mount   special,string:ptr;rwflag:int
                               e:int             Mount file system
  22   Umount  special:ptr     e:int             Unmount file system
  23   Setuid  userid:int2     e:int             Set user ID
  24   Getuid                  e_uid,r_uid:int2  Get user ID
  25   Stime   time:int4       e:int             Set time and date
  26   Ptrace  request:int;pid:int2;addr:ptr;data:int
                               e,value:int       Process trace
  27   Alarm   seconds:uns2    previous:uns2     Schedule signal
  28   Fstat   fildes:int;statbuf:ptr
                               e:int             Get file status
  29   Pause                                     Stop until signal
  30   Utime   string,timep:ptr
                               e:int             Set file times
  33   Access  string:ptr;mode:int
                               e:int             Determine file accessibility
  34   Nice    incr:int                          Set program priority
  35   Ftime   bufp:ptr        e:int             Get date and time
  36   Sync                                      Update filesystem
  37   Kill    pid:int2;sig:int
                               e:int             Send signal to a process
  41   Dup     fildes,newfildes:int
                               e,fildes:int      Duplicate a file descriptor
  42   Pipe                    e,w_des,r_des:int Create a pipe
  43   Times   buffer:ptr                        Get process times
  44   Profil  buff:ptr;bufsiz,offset,scale:intp Execution time profile
  46   Setgid  gid:int2        e:int             Set group ID
  47   Getgid                  e_gid,r_gid:int   Get group ID
  48   Sigtrp  trapno,signo:int
                               e,prevtrap:int    See below
  51   Acct    file:ptr        e:int             Turn accounting on or off
  53   Lock    flag:int        e:int             Lock a process
  54   Ioctl   fildes,request:int;argp:ptr
                               e:int             Control device
  56   Mpxcall cmd:int;vec:ptr e:int             Multiplexed file handling
  59   Exece   name,argv,envp:ptr
                               e:int             Execute a file
  60   Umask   complmode:int2  oldmask:int2      Set file creation mode mask
  61   Chroot  string:ptr      e:int             Change root directory
 .DE 1
 Codes 0, 11, 13, 17, 31, 32, 38, 39, 40, 45, 49, 50, 52,
 55, 57, 58, 62, and 63 are
 not used.
 .P
 All monitor calls except fork and sigtrp
 are the same as the UNIX version 7 system calls.
 .P
 The sigtrp entry maps UNIX signals onto EM interrupts.
 Normally, trapno is in the range 0 to 252.
 In that case it requests that signal signo
 cause trap trapno to occur.
 When given trap number -2, default signal handling is reset, and when given
 trap number -3, the signal is ignored.
 .P
 The flag returned by fork is 1 in the child process and 0 in
 the parent.
 The pid returned is the process-id of the other process.
 .BP
 .S1 "TRAPS AND INTERRUPTS"
 EM provides a means for the user program to catch all traps
 generated by the program itself, the hardware, or external conditions.
 This mechanism uses five instructions: LIM, SIM, SIG, TRP and RTT.
 This section of the manual may be omitted on the first reading since it
 presupposes knowledge of the EM instruction set.
 .P
 The action taken when a trap occurs is determined by the value
 of an internal EM trap register.
 This register contains a pointer to a procedure.
 Initially the pointer used is zero and all traps halt the
 program with, hopefully, a useful message to the outside world.
 The SIG instruction can be used to alter the trap register;
 it pops a procedure pointer from the
 stack into the trap register.
 When a trap occurs after storing a nonzero value in the trap
 register, the procedure pointed to by the trap register
 is called with the trap number
 as the only parameter (see below).
 SIG returns the previous value of the trap register on the
 stack.
 Two consecutive SIGs are a no-op.
 When a trap occurs, the trap register is reset to its initial
 condition, to prevent recursive traps from hanging the machine up,
 e.g. stack overflow in the stack overflow handling procedure.
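The behaviour of the trap register under SIG and under trap delivery can be sketched in C, with a function pointer standing in for the procedure pointer. All names here (`TrapProc`, `trap_register`, `sig`, `raise_trap`, and the sample handler `record_trap`) are hypothetical, and a null pointer models the initial zero value.

```c
#include <stddef.h>

typedef void (*TrapProc)(int trapno);

/* Hypothetical trap register; NULL means "halt on any trap". */
TrapProc trap_register = NULL;

/* SIG: install p and hand back the previous value, which the EM
   instruction leaves on the stack. */
TrapProc sig(TrapProc p) {
    TrapProc old = trap_register;
    trap_register = p;
    return old;
}

/* Trap delivery: reset the register before calling the handler, so
   a trap inside the handler halts instead of recursing. Returns 0 if
   a handler ran, -1 if the program would halt. */
int raise_trap(int trapno) {
    TrapProc p = trap_register;
    trap_register = NULL;
    if (p == NULL)
        return -1;
    p(trapno);
    return 0;
}

/* Sample handler for illustration: remembers the last trap number. */
int last_trap = -1;
void record_trap(int trapno) { last_trap = trapno; }
```

Calling `sig(sig(q))` installs `q`, then reinstalls the previous value and returns `q`, which is the C counterpart of two consecutive SIGs being a no-op.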
 .P
 The runtime systems for some languages need to ignore some EM
 traps.
 EM offers a feature called the ignore mask.
 It contains one bit for each of the lowest 16 trap numbers.
 The bits are numbered 0 to 15, with the least significant bit
 having number 0.
 If a certain bit is 1 the corresponding trap never
 occurs and processing simply continues.
 The actions performed by the offending instruction are
 described by the Pascal program in appendix A.
 .N
 If the bit is 0, traps are not ignored.
 The instructions LIM and SIM allow copying and replacement of
 the ignore mask.~
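Testing the mask amounts to a single shift and AND. A minimal C sketch with hypothetical names `trap_ignored` and `mask_with`; it shows, for example, that ignoring EFOVFL (trap 4) means setting the mask bit with value 16.

```c
#include <stdbool.h>

/* Bit t of the ignore mask (bit 0 = least significant) suppresses
   trap t; only traps 0..15 are maskable. */
bool trap_ignored(unsigned mask, int trapno) {
    if (trapno < 0 || trapno > 15)
        return false;               /* traps 16 and up cannot be masked */
    return (mask >> trapno) & 1u;
}

/* Mask value after adding one trap, as a LIM / LOC / IOR / SIM
   sequence would compute it. trapno is assumed to be in 0..15. */
unsigned mask_with(unsigned mask, int trapno) {
    return mask | (1u << trapno);
}
```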
 .P
 The TRP instruction generates a trap, the trap number being found on the
 stack.
 This is, among other things,
 useful for library procedures and runtime systems.
 It can also be used by a low level trap procedure to pass the trap to a
 higher level one (see example below).
 .P
 The RTT instruction returns from the trap procedure and continues after the
 trap.
 In the list below all traps marked with an asterisk ('*') are
 considered to be fatal and it is explicitly undefined what happens if
 you try to restart after the trap.
 .P
 The way a trap procedure is called is completely compatible
 with normal calling conventions. The only way a trap procedure
 differs from normal procedures is the return. It has to use RTT instead
 of RET. This is necessary because the complete runtime status is saved on the
 stack before calling the procedure and all this status has to be reloaded.
 Trap numbers are in the range 0 to 252.
 The trap numbers are divided into three categories:
 .IS 4
 .N 1
 .PS - 10
 .PT ~~0-~63
 EM machine errors, e.g. illegal instruction.
 .PS - 8
 .PT ~0-15
 maskable
 .PT 16-63
 not maskable
 .PE
 .PT ~64-127
 Reserved for use by compilers, run time systems, etc.
 .PT 128-252
 Available for user programs.
 .PE 1
 .IE
 EM machine errors are numbered as follows:
 .DS I 5
 .TS
 tab(@);
 n l l.
 0@EARRAY@Array bound error
 1@ERANGE@Range bound error
 2@ESET@Set bound error
 3@EIOVFL@Integer overflow
 4@EFOVFL@Floating overflow
 5@EFUNFL@Floating underflow
 6@EIDIVZ@Divide by 0
 7@EFDIVZ@Divide by 0.0
 8@EIUND@Undefined integer
 9@EFUND@Undefined float
 10@ECONV@Conversion error
 16*@ESTACK@Stack overflow
 17*@EHEAP@Heap overflow
 18*@EILLINS@Illegal instruction
 19*@EODDZ@Illegal size argument
 20*@ECASE@Case error
 21*@EMEMFLT@Addressing non-existent memory
 22*@EBADPTR@Bad pointer used
 23*@EBADPC@Program counter out of range
 24@EBADLAE@Bad argument of LAE
 25@EBADMON@Bad monitor call
 26@EBADLIN@Argument of LIN too high
 27@EBADGTO@GTO descriptor error
 .TE
 .DE 0
 .P
 As an example,
 suppose a subprocedure has to be written to do a numeric
 calculation.
 When an overflow occurs the computation has to be stopped and
 the higher level procedure must be resumed.
 This can be programmed as follows using the mechanism described above:
 .DS B
 mes 2,2,2              ; set sizes
 ersave
 bss 2,0,0              ; Room to save previous value of trap procedure
 msave
 bss 2,0,0              ; Room to save previous value of trap mask
 pro $calcule,0         ; entry point
 lxl 0                  ; fill in non-local goto descriptor with LB
 ste jmpbuf+4
 lor 1                  ; and SP
 ste jmpbuf+2
 lim                    ; get current ignore mask
 ste msave              ; save it
 lim
 loc 16                 ; bit 4 (value 16) is EFOVFL
 ior 2                  ; set in mask
 sim                    ; ignore EFOVFL from now on
 lpi $catch             ; load procedure identifier
 sig                    ; catch will get all traps now
 ste ersave             ; save previous trap procedure identifier
 ; perform calculation now, possibly generating overflow
 1                       ; label jumped to by catch procedure
 loe ersave             ; get old trap procedure
 sig                    ; refer all following traps to old procedure
 asp 2                  ; remove result of sig
 loe msave              ; restore previous mask
 sim                    ; done now
 ; load result of calculation
 ret 2                  ; return result
 jmpbuf
 con *1,0,0
 end
 .DE 0
 .VS 1 1
 .DS
 Example of catch procedure
 pro $catch,0           ; Local procedure that must catch the overflow trap
 lol 0                  ; Load trap number
 loc 4                  ; check for overflow
 bne *1                 ; if other trap, call higher trap procedure
 gto jmpbuf             ; return to procedure calcule
 1                       ; other trap has occurred
 loe ersave             ; previous trap procedure
 sig                    ; other procedure will get the traps now
 asp 2                  ; remove the result of sig
 lol 0                  ; stack trap number
 trp                    ; call other trap procedure
 rtt                    ; if other procedure returns, do the same
 end
 .DE
--- a/doc/em/ip.awk
+++ b/doc/em/ip.awk
@ -0,0 +1,6 @@
 BEGIN { printf ".TS\nlw(6) lw(8) rw(3) rw(6) 14 lw(6) lw(8) rw(3) rw(6) 14 lw(6) lw(8) rw(3) rw(6).\n" }
 NF == 4 { printf "%s\t%s\t%d\t%d",$1,$2,$3,$4 }
 NF == 3 { printf "%s\t%s\t\t%d",$1,$2,$3 }
 { if ( NR%3 == 0 ) printf("\n") ; else printf("\t"); }
 END { if ( NR%3 != 0 ) printf("\n")
      printf ".TE\n" }
--- a/doc/em/ispace.nr
+++ b/doc/em/ispace.nr
@ -0,0 +1,61 @@
 .SN 3
 .BP
 .S1 "INSTRUCTION ADDRESS SPACE"
 The instruction space of the EM machine contains
 the code for procedures.
 Tables necessary for the execution of this code, for example, procedure
 descriptor tables, may also be present.
 The instruction space does not change during
 the execution of a program, so that it may be
 protected.
 No further restrictions to the instruction address space are
 necessary for the abstract and assembly language level.
 .P
 Each procedure has a single entry point: the first instruction.
 A special type of pointer identifies a procedure.
 Pointers into the instruction
 address space have the same size as pointers into data space and
 can, for example, contain the address of the first instruction
 or an index in a procedure descriptor table.
 .A
 There is a single EM program counter, PC, pointing
 to the next instruction to be executed.
 The procedure pointed to by PC is
 called the 'current' procedure.
 A procedure may call another procedure using the CAL or CAI
 instruction.
 The calling procedure remains 'active' and is resumed whenever the called
 procedure returns.
 Note that a procedure has several 'active' invocations when
 called recursively.
 .P
 Each procedure must return properly.
 It is not allowed to fall through to the
 code of the next procedure.
 There are several ways to exit from a procedure:
 .IS 3
 .PS
 .PT
 the RET instruction, which returns to the
 calling procedure.
 .PT
 the RTT instruction, which exits a trap handling routine and resumes
 the trapping instruction (see the chapter on traps and interrupts).
 .PT
 the GTO instruction, which is used for non-local gotos.
 It can remove several frames from the stack and transfer
 control to an active procedure.
 .PE
 .IE
 .P
 All branch instructions can transfer control
 to any label within the same procedure.
 Branch instructions can never jump out of a procedure.
 .P
 Several language implementations use a so-called procedure
 instance identifier: a combination of a procedure identifier and
 the LB of a stack frame, the latter also called the static link.
 .P
 The program text for each procedure and any tables
 are fragments and can be allocated anywhere
 in the instruction address space.
--- a/doc/em/itables
+++ b/doc/em/itables
--- a/doc/em/mach.nr
+++ b/doc/em/mach.nr
@ -0,0 +1,390 @@
 .BP
 .SN 10
 .S1 "EM MACHINE LANGUAGE"
 The EM machine language is designed to make program text compact
 and to make decoding easy.
 Compact program text has many advantages: programs execute faster,
 programs occupy less primary and secondary storage and loading
 programs into satellite processors is faster.
 The decoding of EM machine language is so simple
 that it is feasible to use interpreters as long as EM hardware
 machines are not available.
 This chapter is irrelevant when back ends are used to
 produce executable target machine code.
 .S2 "Instruction encoding"
 A design goal of EM is to make the
 program text as compact as possible.
 Decoding must be easy, however.
 The encoding is fully byte oriented, without any small bit fields.
 There are 256 primary opcodes, two of which are escapes,
 each to a group of 256 secondary opcodes.
 .A
 EM instructions without arguments have a single opcode assigned,
 possibly escaped:
 .DS
         |--------------|
         |    opcode    |
         |--------------|
                or
         |--------------|--------------|
         |    escape    |     opcode   |
         |--------------|--------------|
 .DE
 The encoding for instructions with an argument is more complex.
 Several instructions have an address from the global data area
 as argument.
 Other instructions have different opcodes for positive
 and negative arguments.
 .N 1
 There is always an opcode that takes the next two bytes as argument,
 high byte first:
 .DS
         |--------------|--------------|--------------|
         |    opcode    |    hibyte    |    lobyte    |
         |--------------|--------------|--------------|
                or
         |--------------|--------------|--------------|--------------|
         |    escape    |    opcode    |    hibyte    |    lobyte    |
         |--------------|--------------|--------------|--------------|
 .DE
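 .P
 As a minimal sketch, the C fragment below shows how an interpreter might
 recover the argument of an opcode whose type is 2 in the code tables,
 that is, a 2-byte signed argument stored high byte first.
 The function name em_arg2 and the flat byte buffer holding the program
 text are assumptions made for illustration only.
 ```c
 #include <assert.h>
 #include <stddef.h>
 #include <stdint.h>
 
 /* Hypothetical helper: decode the 2-byte signed argument that starts
  * at text[pos], right after the opcode (or escape and opcode).
  * The argument is stored high byte first. */
 static int16_t em_arg2(const uint8_t *text, size_t pos)
 {
     /* Assemble the unsigned 16-bit value, high byte first. */
     uint16_t u = (uint16_t)((text[pos] << 8) | text[pos + 1]);
     /* Reinterpret as two's complement without relying on an
      * implementation-defined narrowing conversion. */
     return u < 0x8000u ? (int16_t)u : (int16_t)((int32_t)u - 0x10000);
 }
 ```
 For the bytes 0xFF 0xFE the function yields -2; for 0x01 0x02 it yields 258.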
 An extra escape is provided for instructions with four- or eight-byte arguments:
 .DS
  |--------------|--------------|--------------|   |--------------|
  |    ESCAPE    |    opcode    |    hibyte    |...|    lobyte    |
  |--------------|--------------|--------------|   |--------------|
 .DE
 For most instructions some argument values predominate.
 The most frequent combinations of instruction and argument
 will be encoded in a single byte, called a mini:
 .DS
         |---------------|
         |opcode+argument|  (mini)
         |---------------|
 .DE
 The number of minis is restricted, because only
 254 primary opcodes are available.
 For many instructions the bulk of the arguments
 falls in the range 0 to 255.
 Instructions that address global data have their arguments
 distributed over a wider range,
 but small values of the high byte are common.
 For all these cases there is another encoding
 that combines the instruction and the high byte of the argument
 into a single opcode.
 These opcodes are called shorties.
 Shorties may be escaped.
 .DS
         |--------------|--------------|
         | opcode+high  |    lobyte    |  (shortie)
         |--------------|--------------|
                or
         |--------------|--------------|--------------|
         |    escape    | opcode+high  |    lobyte    |
         |--------------|--------------|--------------|
 .DE
 Escaped shorties are useless if the normal encoding has a primary opcode,
 because both encodings then take three bytes.
 Note that for some instruction-argument combinations
 several different encodings are available.
 It is the task of the assembler to select the shortest of these.
 The savings achieved by these mini and shortie
 opcodes are considerable: about 55%.
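 .P
 The selection among these encodings can be sketched in C, assuming an
 instruction described by a hypothetical struct em_opinfo that records
 how many minis and shorties it has and whether its shorties and its
 2-byte form are escaped.
 Real opcode assignments come from the code tables of appendix B;
 the struct, its fields and the treatment of negative arguments are
 simplifications made for this sketch only.
 ```c
 #include <assert.h>
 #include <stdbool.h>
 
 /* Hypothetical description of one instruction's opcode ranges. */
 struct em_opinfo {
     int nmini;        /* minis cover arguments 0 .. nmini-1             */
     int nshort;       /* shorties cover high bytes 0 .. nshort-1        */
     bool short_esc;   /* the shorties are in the escaped group          */
     bool two_esc;     /* the 2-byte-argument opcode is escaped          */
 };
 
 /* Sketch: length in bytes of the shortest encoding of `arg`, or -1
  * if the argument needs the 4/8-byte escape (not handled here). */
 static int em_encoded_length(const struct em_opinfo *op, long arg)
 {
     if (arg >= 0 && arg < op->nmini)
         return 1;                             /* mini               */
     if (arg >= 0 && arg < 256L * op->nshort)
         return op->short_esc ? 3 : 2;         /* (escaped) shortie  */
     if (arg < -32768 || arg > 32767)
         return -1;                            /* too large for 2 bytes */
     return op->two_esc ? 4 : 3;               /* (escaped) 2-byte form */
 }
 ```
 An escaped shortie and a primary 2-byte opcode both take three bytes,
 so in this sketch escaped shorties only gain something when the 2-byte
 form is itself escaped.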
 .P
 Further improvements are possible:
 the arguments of
 many instructions are a multiple of the wordsize.
 Some also do not allow zero as an argument.
 If these arguments are divided by the wordsize and,
 when zero is not allowed, then decremented by 1, more of them can
 be encoded as shortie or mini.
 The arguments of some other instructions
 rarely or never assume the value 0, but start at 1.
 The value 1 is then encoded as 0,
 2 as 1 and so on.
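 .P
 A minimal sketch of this argument mapping, assuming the flag letters w
 (divide by the wordsize) and o (must be >= 1, then decrement) used in
 the code tables of appendix B; the function em_map_arg is hypothetical.
 ```c
 #include <assert.h>
 #include <stdbool.h>
 
 /* Hypothetical helper: map an instruction argument to the value used
  * for opcode selection.  Returns false if the argument violates the
  * restriction implied by a flag. */
 static bool em_map_arg(long arg, int wordsize, bool flag_w, bool flag_o,
                        long *out)
 {
     if (flag_w) {
         if (arg % wordsize != 0)
             return false;           /* must be a multiple of the wordsize */
         arg /= wordsize;
     }
     if (flag_o) {
         if (arg < 1)
             return false;           /* zero is not allowed */
         arg -= 1;                   /* 1 is encoded as 0, 2 as 1, ... */
     }
     *out = arg;
     return true;
 }
 ```
 With a wordsize of 2 and both flags set, the argument 6 maps to 2.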
 .P
 Assigning opcodes to instructions by the assembler is completely
 table driven.
 For details see appendix B.
 .S2 "Procedure descriptors"
 The procedure identifiers used in the interpreter are indices
 into a table of procedure descriptors.
 Each descriptor contains:
 .IS 6
 .PS - 4
 .PT 1.
 the number of bytes to be reserved for locals at each
 invocation.
 .N
 This is a pointer-sized integer.
 .PT 2.
 the start address of the procedure
 .PE
 .IE
 .S2 "Load format"
 The EM machine language load format defines the interface between
 the EM assembler/loader and the EM machine itself.
 A load file consists of a header, the program text to be executed,
 a description of the global data area and the procedure descriptor table,
 in this order.
 All integers in the load file are presented with the
 least significant byte first.
 .P
 The header has two parts: the first half (eight 16-bit integers)
 aids in selecting
 the correct EM machine or interpreter.
 Some EM machines, for instance, may have hardware floating point
 instructions.
 .N
 The header entries are as follows (bit 0 is rightmost):
 .IS 2
 .VS 1 0
 .PS 1 4 "" :
 .PT
 magic number (07255)
 .PT
 flag bits with the following meaning:
 .PS - 7 "" :
 .PT bit 0
 TEST; test for integer overflow etc.
 .PT bit 1
 PROFILE; for each source line: count the number of memory
 cycles executed.
 .PT bit 2
 FLOW; for each source line: set a bit in a bit map table if
 instructions on that line are executed.
 .PT bit 3
 COUNT; for each source line: increment a counter if that line
 is entered.
 .PT bit 4
 REALS; set if a program uses floating point instructions.
 .PT bit 5
 EXTRA; more tests during compiler debugging.
 .PE
 .PT
 number of unresolved references.
 .PT
 version number; used to detect obsolete EM load files.
 .PT
 wordsize; the number of bytes in each machine word.
 .PT
 pointer size; the number of bytes available for addressing.
 .PT
 unused
 .PT
 unused
 .PE
 .IE
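 .P
 The following C sketch decodes this first half of the header from a
 buffer holding the start of a load file.
 It assumes the bytes were read into memory beforehand; the names
 parse_header1, struct em_header1 and the EM_ flag constants are
 illustrative only.
 ```c
 #include <assert.h>
 #include <stddef.h>
 #include <stdint.h>
 
 #define EM_MAGIC 07255          /* magic number of an EM load file */
 
 /* Flag bits of the second header word (bit 0 is rightmost). */
 enum {
     EM_TEST = 1 << 0, EM_PROFILE = 1 << 1, EM_FLOW = 1 << 2,
     EM_COUNT = 1 << 3, EM_REALS = 1 << 4, EM_EXTRA = 1 << 5
 };
 
 /* Hypothetical in-memory form of the first half of the header. */
 struct em_header1 {
     unsigned magic, flags, unresolved, version, wordsize, ptrsize;
 };
 
 /* Read a 16-bit integer stored least significant byte first. */
 static unsigned get16(const uint8_t *p)
 {
     return (unsigned)p[0] | ((unsigned)p[1] << 8);
 }
 
 /* Sketch: decode the eight 16-bit integers (the last two are unused).
  * Returns 0 on success, -1 if the buffer is short or the magic is wrong. */
 static int parse_header1(const uint8_t *buf, size_t len,
                          struct em_header1 *h)
 {
     if (len < 16)
         return -1;
     h->magic      = get16(buf + 0);
     h->flags      = get16(buf + 2);
     h->unresolved = get16(buf + 4);
     h->version    = get16(buf + 6);
     h->wordsize   = get16(buf + 8);
     h->ptrsize    = get16(buf + 10);
     return h->magic == EM_MAGIC ? 0 : -1;
 }
 ```
 The magic number 07255 is stored as the bytes 0xAD 0x0E, least
 significant byte first.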
 The second part of the header (eight entries, each pointer-size bytes long)
 describes the load file itself:
 .IS 2
 .PS 1 4 "" :
 .PT
 NTEXT; the program text size in bytes.
 .PT
 NDATA; the number of load-file descriptors (see below).
 .PT
 NPROC; the number of entries in the procedure descriptor table.
 .PT
 ENTRY; procedure number of the procedure to start with.
 .PT
 NLINE; the maximum source line number.
 .PT
 SZDATA; the address of the lowest uninitialized data byte.
 .PT
 unused
 .PT
 unused
 .PE
 .IE
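 .P
 A sketch of reading this second part, assuming the pointer size is
 already known from the first part and is at most four bytes;
 get_le, parse_header2 and struct em_header2 are hypothetical names.
 ```c
 #include <assert.h>
 #include <stddef.h>
 #include <stdint.h>
 
 /* Hypothetical in-memory form of the second half of the header. */
 struct em_header2 {
     uint32_t ntext, ndata, nproc, entry, nline, szdata;
 };
 
 /* Read an unsigned integer of `size` bytes, least significant byte first. */
 static uint32_t get_le(const uint8_t *p, unsigned size)
 {
     uint32_t v = 0;
     for (unsigned i = size; i-- > 0; )
         v = (v << 8) | p[i];
     return v;
 }
 
 /* Sketch: parse the eight pointer-sized entries (the last two are
  * unused).  Assumes 1 <= ptrsize <= 4.  Returns 0 or -1. */
 static int parse_header2(const uint8_t *buf, size_t len, unsigned ptrsize,
                          struct em_header2 *h)
 {
     if (ptrsize == 0 || ptrsize > 4 || len < 8u * ptrsize)
         return -1;
     uint32_t e[6];
     for (unsigned i = 0; i < 6; i++)
         e[i] = get_le(buf + i * ptrsize, ptrsize);
     h->ntext = e[0];  h->ndata = e[1];  h->nproc = e[2];
     h->entry = e[3];  h->nline = e[4];  h->szdata = e[5];
     return 0;
 }
 ```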
 .P
 The program text consists of NTEXT bytes.
 NTEXT is always a multiple of the wordsize.
 The first byte of the program text is the
 first byte of the instruction address
 space, i.e. it has address 0.
 Pointers into the program text are found in the procedure descriptor
 table, where relocation is simple, and in the global data area.
 The initialization of the global data area allows easy
 relocation of pointers into both address spaces.
 .P
 The global data area is described by the NDATA descriptors.
 Each descriptor describes a number of consecutive words (of~wordsize)
 and consists of a sequence of bytes.
 While reading the descriptors from the load file, one can
 initialize the global data area from low to high addresses.
 The size of the initialized data area is given by SZDATA;
 this number can be used to check the initialization.
 .N
 The header of each descriptor consists of a byte, describing the type,
 and a count.
 The number of bytes used for this (unsigned) count depends on the
 type of the descriptor and
 is either a pointer-sized integer
 or one byte.
 The meaning of the count depends on the descriptor type.
 At load time an interpreter can
 perform any conversion deemed necessary, such as
 reordering bytes in integers
 and pointers and adding base addresses to pointers.
 .BP
 .A
 In the following pictures we show a graphical notation of the
 initializers.
 The leftmost rectangle represents the leading byte.
 .N 1
 .DS
 .PS - 4 " "
 Fields marked with
 .N 1
 .PT n
 contain a pointer-sized integer used as a count
 .PT m
 contain a one-byte integer used as a count
 .PT b
 contain a one-byte integer
 .PT w
 contain a wordsized integer
 .PT p
 contain a data or instruction pointer
 .PT s
 contain a null terminated ASCII string
 .PE 1
 .DE 0
 .VS 1 1
 .DS
    -------------------
    | 0 |      n      |           repeat last initialization n times
    -------------------
 .DE
 .DS
    ---------
    | 1 | m |                     m uninitialized words
    ---------
 .DE
 .DS
               ____________
              /    bytes   \e
    -----------------   -----
    | 2 | m | b | b |...| b |     m initialized bytes
    -----------------   -----
 .DE
 .DS
               _________
              /  word   \e
    -----------------------
    | 3 | m |      w      |...    m initialized wordsized integers
    -----------------------
 .DE
 .DS
               _________
              / pointer \e
    -----------------------
    | 4 | m |      p      |...    m initialized data pointers
    -----------------------
 .DE
 .DS
               _________
              / pointer \e
    -----------------------
    | 5 | m |      p      |...    m initialized instruction pointers
    -----------------------
 .DE
 .DS
               ____________
              /    bytes   \e
    -------------------------
    | 6 | m | b | b |...| b |     initialized integer of size m
    -------------------------
 .DE
 .DS
               ____________
              /    bytes   \e
    -------------------------
    | 7 | m | b | b |...| b |     initialized unsigned of size m
    -------------------------
 .DE
 .DS
               ____________
              /   string   \e
    -------------------------
    | 8 | m |        s      |     initialized float of size m
    -------------------------
 .DE 3
 .PS - 8
 .PT type~0:
 If the last initialization initialized k bytes starting
 at address \fIa\fP, do the same initialization again n times,
 starting at \fIa\fP+k, \fIa\fP+2*k, ..., \fIa\fP+n*k.
 This is the only descriptor whose starting byte
 is followed by an integer with the size of a pointer;
 in all other descriptors the first byte is followed by a one-byte count.
 This descriptor must be preceded by a descriptor of
 another type.
 .PT type~1:
 Reserve m words, not explicitly initialized (BSS and HOL).
 .PT type~2:
 The m bytes following the descriptor header are
 initializers for the next m bytes of the
 global data area.
 m is divisible by the wordsize.
 .PT type~3:
 The m words following the header are initializers for the next m words of the
 global data area.
 .PT type~4:
 The m data address space pointers following the header are
 initializers for the next
 m data pointers in the global data area.
 Interpreters that represent EM pointers by
 target machine addresses must relocate all data pointers.
 .PT type~5:
 The m instruction address space pointers following the header are
 initializers for the next
 m instruction pointers in the global data area.
 Interpreters that represent EM instruction pointers by
 target machine addresses must relocate these pointers.
 .PT type~6:
 The m bytes following the header form
 a signed integer number with a size of m bytes,
 which is an initializer for the next m bytes
 of the global data area.
 m is governed by the same restrictions as for
 transfer of objects to/from memory.
 .PT type~7:
 The m bytes following the header form
 an unsigned integer number with a size of m bytes,
 which is an initializer for the next m bytes
 of the global data area.
 m is governed by the same restrictions as for
 transfer of objects to/from memory.
 .PT type~8:
 The header is followed by an ASCII string, null terminated, to
 initialize, in global data,
 a floating point number with a size of m bytes.
 m is governed by the same restrictions as for
 transfer of objects to/from memory.
 The ASCII string contains the notation of a real as used in the
 Pascal language.
 .PE
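 .P
 To make the descriptor formats concrete, here is a hedged C sketch of a
 loader for descriptor types 0 to 3.
 It assumes a wordsize and pointer size of 2 and a little-endian target,
 so that words, which the load file stores least significant byte first,
 can be copied unchanged; it also zero-fills reserved words, although EM
 leaves their contents undefined.
 Types 4 to 8 would need relocation or conversion and are simply rejected;
 the function em_init_data is hypothetical.
 ```c
 #include <assert.h>
 #include <stddef.h>
 #include <stdint.h>
 #include <string.h>
 
 #define WS 2   /* assumed wordsize */
 #define PS 2   /* assumed pointer size */
 
 /* Hypothetical sketch: initialize global data `mem` (capacity `cap`)
  * from `ndata` descriptors in `in` (length `len`); types 0-3 only.
  * Returns the number of data bytes set up, or -1 on error. */
 static long em_init_data(const uint8_t *in, size_t len, unsigned ndata,
                          uint8_t *mem, size_t cap)
 {
     size_t ip = 0, dp = 0;              /* input and data positions */
     size_t last_start = 0, last_len = 0;
     int have_last = 0;                  /* previous descriptor not type 0 */
 
     for (unsigned d = 0; d < ndata; d++) {
         if (ip >= len)
             return -1;
         unsigned type = in[ip++];
 
         if (type == 0) {                /* repeat last initialization */
             if (!have_last || ip + PS > len)
                 return -1;
             size_t n = in[ip] | (size_t)in[ip + 1] << 8;   /* LSB first */
             ip += PS;
             if (dp + n * last_len > cap)
                 return -1;
             for (size_t i = 0; i < n; i++, dp += last_len)
                 memcpy(mem + dp, mem + last_start, last_len);
             have_last = 0;              /* type 0 may not follow type 0 */
             continue;
         }
 
         if (ip >= len)
             return -1;
         size_t m = in[ip++];            /* one-byte count */
         size_t nbytes;
         switch (type) {
         case 1:                         /* m uninitialized words */
             nbytes = m * WS;
             if (dp + nbytes > cap)
                 return -1;
             memset(mem + dp, 0, nbytes);    /* sketch choice: zero-fill */
             break;
         case 2:                         /* m initialized bytes */
         case 3:                         /* m initialized words */
             nbytes = type == 2 ? m : m * WS;
             if (nbytes % WS != 0 || ip + nbytes > len || dp + nbytes > cap)
                 return -1;
             memcpy(mem + dp, in + ip, nbytes);  /* little-endian target */
             ip += nbytes;
             break;
         default:                        /* types 4-8: not handled */
             return -1;
         }
         last_start = dp;
         last_len = nbytes;
         have_last = 1;
         dp += nbytes;
     }
     return (long)dp;
 }
 ```
 Note how a type 0 descriptor is refused unless the previous descriptor
 was of another type, mirroring the rule stated above.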
 .P
 The NPROC procedure descriptors on the load file consist of
 an instruction space address (of~pointer~size) and
 an integer (of~pointer~size) specifying the number of bytes for
 locals.
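 .P
 A short sketch of reading this table, assuming a pointer size of at
 most four bytes and the same least-significant-byte-first order as the
 rest of the load file; struct em_proc and em_read_procs are
 hypothetical names.
 A procedure identifier is then simply an index into the resulting array.
 ```c
 #include <assert.h>
 #include <stddef.h>
 #include <stdint.h>
 
 /* Hypothetical in-memory procedure descriptor. */
 struct em_proc {
     uint32_t addr;      /* start address in instruction space */
     uint32_t nlocals;   /* bytes to reserve for locals per invocation */
 };
 
 /* Read an unsigned integer of `size` bytes, least significant byte first. */
 static uint32_t get_le(const uint8_t *p, unsigned size)
 {
     uint32_t v = 0;
     for (unsigned i = size; i-- > 0; )
         v = (v << 8) | p[i];
     return v;
 }
 
 /* Sketch: read `nproc` descriptors, each an instruction address
  * followed by a locals count.  Assumes 1 <= ptrsize <= 4. */
 static int em_read_procs(const uint8_t *buf, size_t len, unsigned ptrsize,
                          unsigned nproc, struct em_proc *out)
 {
     if (ptrsize == 0 || ptrsize > 4 || len < (size_t)nproc * 2 * ptrsize)
         return -1;
     for (unsigned i = 0; i < nproc; i++) {
         const uint8_t *p = buf + (size_t)i * 2 * ptrsize;
         out[i].addr = get_le(p, ptrsize);
         out[i].nlocals = get_le(p + ptrsize, ptrsize);
     }
     return 0;
 }
 ```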
--- a/doc/em/macr.nr
+++ b/doc/em/macr.nr
@ -0,0 +1,16 @@
 .so /usr/lib/tmac/tmac.kun
 .SS 6
 .RP
 .PL 12i 11i
 .LL 89
 .MS T E
 \!.TL '%'''
 .ME
 .MS T O
 \!.TL '''%'
 .ME
 .MS B
 .sp 1
 .ME
 .SM S1 B
 .SM S2 B
--- a/doc/em/mapping.nr
+++ b/doc/em/mapping.nr
@ -0,0 +1,245 @@
 .SN 5
 .BP
 .S1 "MAPPING OF EM DATA MEMORY ONTO TARGET MACHINE MEMORY"
 The EM architecture is designed to be implemented
 on many existing and future machines.
 EM memory is highly fragmented to make
 adaptation to various memory architectures possible.
 Format and encoding of pointers is explicitly undefined.
 .P
 This chapter gives solutions to some of the
 anticipated problems.
 First, we describe a possible memory layout for machines
 with 64K bytes of address space.
 Here we use a member of the EM family with 2-byte word and pointer
 size.
 The most straightforward layout is shown in figure 2.
 .N 1
 .DS
       65534 -> |-------------------------------|
                |///////////////////////////////|
                |//// unimplemented memory /////|
                |///////////////////////////////|
          ML -> |-------------------------------|
                |                               |
                |                               | <- LB
                |     stack and local area      |
                |                               |
                |-------------------------------| <- SP
                |///////////////////////////////|
                |//////// inaccessible /////////|
                |///////////////////////////////|
                |-------------------------------| <- HP
                |                               |
                |           heap area           |
                |                               |
                |                               |
          HB -> |-------------------------------|
                |                               |
                |       global data area        |
                |                               |
          EB -> |-------------------------------|
                |                               |
                |         program text          | <- PC
                |                               |
                |        ( and tables )         |
                |                               |
                |                               |
          PB -> |-------------------------------|
                |///////////////////////////////|
                |////////// undefined //////////|
                |///////////////////////////////|
           0 -> |-------------------------------|
           Figure 2.  Memory layout showing typical register
           positions during execution of an EM program.
 .DE 2
 The base registers for the various memory pieces can be stored
 in target machine registers or memory.
 .IS
 .N 1
 .TS
 tab(;);
 l 1 l l l.
 PB;:;program base;points to the base of the instruction address space.
 EB;:;external base;points to the base of the data address space.
 HB;:;heap base;points to the base of the heap area.
 ML;:;memory limit;marks the high end of the addressable data space.
 .TE 1
 .IE
 The stack grows from high
 EM addresses to low EM addresses, and the heap the
 other way.
 The memory between SP and HP is not accessible,
 but may be allocated later to the stack or the heap if needed.
 The local data area is allocated starting at the high end of
 memory.
 .P
 Because EM address 0 is not mapped onto target
 address 0, a problem arises when pointers are used.
 If a program pushed a constant, say 6, onto the stack,
 and then tried to indirect through it,
 the wrong word would be fetched,
 because EM address 6 is mapped onto target address EB+6
 and not target address 6 itself.
 This particular problem is solved by explicitly declaring
 the format of a pointer to be undefined,
 so that using a constant as a pointer is completely illegal.
 However, the general problem of mapping pointers still exists.
 .P
 There are two possible solutions.
 In the first solution, EM pointers are represented
 in the target machine as true EM addresses,
 for example, a pointer to EM address 6 really is
 stored as a 6 in the target machine.
 This solution implies that every time a pointer is fetched
 EB must be added before referencing
 the target machine's memory.
 If the target machine has powerful indexing
 facilities, EB can be kept in a target machine register,
 and the relocation can indeed be done on
 every reference to the data address space
 at a modest cost in speed.
 .P
 The other solution consists of having EM pointers
 refer to the true target machine address.
 Thus the instruction LAE 6 (Load Address of External 6)
 would push the value of EB+6 onto the stack.
 When this approach is chosen, back ends must know
 how to offset from EB in order to translate all
 instructions that manipulate EM addresses.
 However, the problem is not completely solved,
 because a front end may have to initialize a pointer
 in CON or ROM data to point to a global address.
 This pointer must also be relocated by the back end or the interpreter.
 .P
 Although the EM stack grows from high to low EM addresses,
 some machines have hardware PUSH and POP
 instructions that require the stack to grow upwards.
 If reasons of efficiency urge you to use these
 instructions, then EM
 can be implemented with the memory layout
 upside down, as shown in figure 3.
 This is possible because the pointer format is explicitly undefined.
 In that layout the first element of a word array will have a
 higher physical address than the second element.
 .N 2
 .DS
          |                 |                    |                 |
          |      EB=60      |                    |        ^        |
          |                 |                    |        |        |
          |-----------------|                    |-----------------|
      105 |   45   |   44   | 104            214 |   41   |   40   | 215
          |-----------------|                    |-----------------|
      103 |   43   |   42   | 102            212 |   43   |   42   | 213
          |-----------------|                    |-----------------|
      101 |   41   |   40   | 100            210 |   45   |   44   | 211
          |-----------------|                    |-----------------|
          |        |        |                    |                 |
          |        v        |                    |      EB=255     |
          |                 |                    |                 |
                Type A                                 Type B
 .sp 2
              Figure 3. Two possible memory implementations.
                 Numbers within the boxes are EM addresses.
                 The other numbers are physical addresses.
 .DE 2
 .A 0 0
 So, we have two different EM memory implementations:
 .IS
 .PS - 4
 .PT A~-
 stack downwards
 .PT B~-
 stack upwards
 .PE
 .IE
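 .P
 The address arithmetic behind figure 3 can be sketched in C, assuming
 EM addresses are used as the pointer representation and the word size
 is two bytes.
 The helper names are hypothetical; the expected values come straight
 from the figure (EB=60 for type A, EB=255 for type B).
 ```c
 #include <assert.h>
 
 /* Hypothetical helpers for the two layouts of figure 3. */
 
 /* Type A (stack downwards): physical byte address = EB + EM address. */
 static long phys_byte_a(long eb, long em) { return eb + em; }
 
 /* Type B (stack upwards, layout upside down): EB - EM address. */
 static long phys_byte_b(long eb, long em) { return eb - em; }
 
 /* Lowest physical address of the 2-byte word at even EM address `em`. */
 static long phys_word_a(long eb, long em) { return eb + em; }
 static long phys_word_b(long eb, long em) { return eb - em - 1; }
 ```
 For type B the word at EM address 40 thus starts at eb-41, which is
 the operand used for LOE 40 in the translations that follow.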
 .P
 For each of these two possibilities we give the translation of
 the EM instructions to push the byte at offset 3 (EM address 43) of a
 global data block starting at EM address 40 onto the stack and to load the
 word at address 40.
 All translations assume a word and pointer size of two bytes.
 The target machine used is a PDP-11 augmented with push and pop instructions.
 Registers 'r0' and 'r1' are used.
 Because a byte moved into a register is sign-extended,
 the translations clear the register and combine the byte with bisb instead.
 Push $40 means push the constant 40, not word 40.
 .P
 The translation of the EM instructions depends on the pointer representation
 used.
 For each of the two solutions explained above the translation is given.
 .P
 First, the translation for the two implementations using EM addresses as
 pointer representation:
 .DS
 .TS
 tab(:), center;
 l s l s l s
 _ s _ s _ s
 l 2 l 6 l 2 l 6 l 2 l.
 EM:type A:type B
 LAE:40:push:$40:push:$40
 ADP:3:pop:r0:pop:r0
 ::add:$3,r0:add:$3,r0
 ::push:r0:push:r0
 LOI:1:pop:r0:pop:r0
 ::-::neg:r0
 ::clr:r1:clr:r1
 ::bisb:eb(r0),r1:bisb:eb(r0),r1
 ::push:r1:push:r1
 LOE:40:push:eb+40:push:eb-41
 .TE
 .DE
 .BP
 .P
 The translation for the two implementations, if the target machine address is
 used as pointer representation, is:
 .N 1
 .DS
 .TS
 tab(:), center;
 l s l s l s
 _ s _ s _ s
 l 2 l 6 l 2 l 6 l 2 l.
 EM:type A:type B
 LAE:40:push:$eb+40:push:$eb-40
 ADP:3:pop:r0:pop:r0
 ::add:$3,r0:sub:$3,r0
 ::push:r0:push:r0
 LOI:1:pop:r0:pop:r0
 ::clr:r1:clr:r1
 ::bisb:(r0),r1:bisb:(r0),r1
 ::push:r1:push:r1
 LOE:40:push:eb+40:push:eb-41
 .TE
 .DE
 .P
 The translation presented above is not intended to be optimal.
 Most machines can handle these simple cases in one or two instructions.
 It demonstrates, however, the flexibility of the EM design.
 .P
 There are several possibilities to implement EM on machines with
 address spaces larger than 64K bytes.
 For EM with two-byte pointers one could allocate instruction and
 data space each in a separate 64K piece of memory.
 EM pointers still have to fit in two bytes,
 but the base registers PB and EB may be loaded in hardware registers
 wider than 16 bits, if available.
 EM implementations can also make efficient use of a machine
 with separate instruction and data space.
 .P
 EM with 32-bit pointers allows one to make use of machines
 with large address spaces.
 In a virtual, segmented memory system one could use a separate
 segment for each fragment.
--- a/doc/em/mem.nr
+++ b/doc/em/mem.nr
@ -0,0 +1,80 @@
 .BP
 .SN 2
 .S1 MEMORY
 The EM machine has two distinct address spaces,
 one for instructions and one for data.
 The data space is divided up into 8-bit bytes.
 The smallest addressable unit is a byte.
 Bytes are numbered consecutively from 0 to some maximum.
 All sizes in EM are expressed in bytes.
 .P
 Some EM instructions can transfer objects containing several bytes
 to and/or from memory.
 The size of all objects larger than a word must be a multiple of
 the wordsize.
 The size of all objects smaller than a word must be a divisor
 of the wordsize.
 For example: if the wordsize is 2 bytes, objects of the sizes 1,
 2, 4, 6,... are allowed.
 The address of such an object is the lowest address of all bytes it contains.
 For objects smaller than the wordsize, the
 address must be a multiple of the object size.
 For all other objects the address must be a multiple of the
 wordsize.
 For example, if an instruction transfers a 4-byte object to memory at
 location \fIm\fP and the wordsize is 2,
 \fIm\fP must be a multiple of 2 and the bytes at
 locations \fIm\fP, \fIm\fP\|+\|1, \fIm\fP\|+\|2 and
 \fIm\fP\|+\|3 are overwritten.
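 .P
 These size and alignment rules can be checked mechanically; the C
 sketch below is one possible formulation, with em_transfer_ok a
 hypothetical name.
 ```c
 #include <assert.h>
 #include <stdbool.h>
 
 /* Hypothetical check: may an object of `size` bytes be transferred
  * at data address `addr` for the given wordsize? */
 static bool em_transfer_ok(long addr, long size, long wordsize)
 {
     if (size <= 0 || addr < 0)
         return false;
     if (size < wordsize)         /* small object: size divides the word */
         return wordsize % size == 0 && addr % size == 0;
     /* word-sized or larger object: whole words, word aligned */
     return size % wordsize == 0 && addr % wordsize == 0;
 }
 ```
 With a wordsize of 2, a 4-byte object at address 6 is accepted, while
 a 3-byte object is rejected at any address.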
 .P
 The size of almost all objects in EM
 is an integral number of words.
 Only two operations are allowed on
 objects smaller than a word, whose size must be a divisor of the wordsize:
 pushing them onto the stack and popping them from the stack.
 The addressing of these objects in memory is always indirect.
 If such a small object is pushed onto the stack
 it is assumed to be a small integer and stored
 in the least significant part of a word.
 The rest of the word is cleared to zero,
 although
 EM provides a way to sign-extend a small integer.
 Popping a small object from the stack removes a word
 from the stack, stores the least significant byte(s)
 of this word in memory and discards the rest of the word.
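 .P
 A minimal sketch of these push and pop rules for 1-byte objects,
 assuming a wordsize of 2 so that a word fits a uint16_t.
 The sign-extension helper only models the effect of the EM facility
 mentioned above; all names here are hypothetical.
 ```c
 #include <assert.h>
 #include <stdint.h>
 
 /* Hypothetical: pushing a 1-byte object puts it in the least
  * significant part of a word and clears the rest of the word. */
 static uint16_t push_byte(uint8_t b)
 {
     return (uint16_t)b;
 }
 
 /* Hypothetical: popping a 1-byte object keeps the least significant
  * byte of the word and discards the rest. */
 static uint8_t pop_byte(uint16_t w)
 {
     return (uint8_t)(w & 0xFFu);
 }
 
 /* Hypothetical model of sign-extending a small integer in a word. */
 static uint16_t sign_extend_byte(uint16_t w)
 {
     return (w & 0x80u) ? (uint16_t)(w | 0xFF00u) : (uint16_t)(w & 0x00FFu);
 }
 ```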
 .P
 The format of pointers into both address spaces is explicitly undefined.
 The size of a pointer, however, is fixed for a member of EM, so that
 the compiler writer knows how much storage to allocate for a pointer.
 .P
 A minor problem is raised by the undefined pointer format.
 Some languages, notably Pascal, require a special,
 otherwise illegal, pointer value to represent the nil pointer.
 The current Pascal-VU compiler uses the
 integer value 0 as nil pointer.
 This value is also used by many C programs as a normally impossible address.
 A better solution would be to have a special
 instruction loading an illegal pointer value,
 but it is hard to imagine an implementation
 for which the current solution is inadequate,
 especially because the first word in the EM data space
 is special and probably not the target of any pointer.
 .P
 The next two chapters describe the EM memory
 in more detail.
 One describes the instruction address space,
 the other the data address space.
 .P
 A design goal of EM has been to allow
 its implementation on a wide range of existing machines,
 as well as allowing a new one to be built in hardware.
 To this end we have tried to minimize the demands
 of EM on the memory structure of the target machine.
 Therefore, apart from the logical partitioning,
 EM memory is divided into 'fragments'.
 A fragment consists of consecutive machine
 words and has a base address and a size.
 Pointer arithmetic is only defined within a fragment.
 The only exception to this rule is comparison with the null
 pointer.
 All fragments must be word aligned.
--- a/doc/em/print
+++ b/doc/em/print
@ -0,0 +1,5 @@
 case $# in
 1)      make "$1".t ; ntlp "$1".t | lpr ;;
 *)      echo "$0 needs an argument" ;;
 esac
--- a/doc/em/show
+++ b/doc/em/show
@ -0,0 +1,4 @@
 case $# in
 1)      make "$1".t ; ntout "$1".t ;;
 *)      echo "$0 needs an argument" ;;
 esac
--- a/doc/em/title.nr
+++ b/doc/em/title.nr
@ -0,0 +1,38 @@
 .po 0
 .TP 1
 .ll 79
 .sp 15
 .ce 4
 DESCRIPTION OF A MACHINE
 ARCHITECTURE FOR  USE WITH
 BLOCK  STRUCTURED  LANGUAGES
 .sp 6
 .ce 4
 Andrew S. Tanenbaum
 Hans  van  Staveren
 Ed G. Keizer
 Johan  W. Stevenson\v'-0.5m'*\v'0.5m'
 .sp 2
 .ce
 August 1983
 .sp 2
 .ce
 Informatica Rapport IR-81
 .sp 13
 Abstract
 .sp 2
 .ti +5
 EM is a family of intermediate languages
 designed for producing portable compilers.
 A program called
 .B front end
 translates source programs to EM.
 Another program,
 .B back
 .BW end ,
 translates EM to the assembly language of the target machine.
 Alternatively, the EM program can be assembled to a highly
 efficient binary format for interpretation.
 This document describes the EM languages in detail.
 .sp 4
 \v'-0.5m'*\v'0.5m' Present affiliation: NV Philips, Eindhoven
--- a/doc/em/types.nr
+++ b/doc/em/types.nr
@ -0,0 +1,130 @@
 .SN 6
 .BP
 .S1 "TYPE REPRESENTATIONS"
 The representations used for typed objects are not precisely
 specified by EM.
 Sometimes we only specify that a typed object occupies a
 certain amount of space and state no further restrictions.
 If one wants to have a different representation of the value of
 an object on the stack one has to use a convert instruction
 in most cases.
 We do specify some relations between the representations of
 types.
 This allows some intermixed use of operators for different types
 on the same object(s).
 For example, the instruction ZER pushes signed and
 unsigned integers with the value zero as well as empty sets.
 The only argument of ZER is the size of the object.
 .A
 The representation of floating point numbers is a good example:
 it allows widely varying implementations.
 The only ways to create floating point numbers are via
 initialization and via conversions from integer numbers.
 These numbers can only be converted to human readable output
 by converting them to integers and by comparing
 two floating point numbers with each other.
 Implementations may use base 10, base 2 or any other
 base for exponents, and have freedom in choosing the range of
 exponent and mantissa.
 .A
 Other types are more precisely described.
 In the following paragraphs a description will be given of the
 restrictions imposed on the representation of the types used.
 A number \fBn\fP used in these paragraphs indicates the size of
 the object in \fIbits\fP.
 .S2 "Unsigned integers"
 The range of unsigned integers is 0..2\v'-0.5m'\fBn\fP\v'0.5m'-1.
 A binary representation is assumed.
 The order of the bits within an object is deliberately left
 unspecified.
 Discussing bit order within each 8-bit byte is academic,
 so the only real freedom of this specification lies in the byte
 order.
 We really do not care whether an implementation of a 4-byte
 integer has its bytes in a particular order of significance.
 This of course means that some sequences of instructions have
 unpredictable effects.
 For example:
 .DS
   LOC 258 ; STL 0 ; LAL 0 ; LOI 1      ( wordsize >=2 )
 .DE
 The value on the stack after executing this sequence
 can be anything,
 but will most likely be 1 or 2.
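 .P
 The same effect can be observed in C.
 The sketch below is only a rough analogue of the EM sequence, assuming
 a 2-byte word modelled by uint16_t; it returns 2 on a little-endian host
 and 1 on a big-endian one.
 The function name is hypothetical.
 ```c
 #include <assert.h>
 #include <stdint.h>
 #include <string.h>
 
 /* Hypothetical analogue of  LOC 258 ; STL 0 ; LAL 0 ; LOI 1 :
  * store 258 (0x0102) in a 16-bit local, then fetch the byte at the
  * lowest address of that local.  The result depends on byte order. */
 static unsigned first_byte_of_258(void)
 {
     uint16_t local = 258;                  /* LOC 258 ; STL 0 */
     unsigned char bytes[sizeof local];
     memcpy(bytes, &local, sizeof local);   /* LAL 0 ; LOI 1   */
     return bytes[0];
 }
 ```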
 .A
 Conversions between unsigned integers of different sizes have to
 be done with explicit convert instructions.
 One cannot simply pad an unsigned integer with zeros at either end
 and expect a correct result.
 .A
 We assume existence of at least single word unsigned arithmetic
 in any implementation.
 .S2 "Signed Integers"
 The range of signed integers is -2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~2\v'-0.5m'\fBn\fP-1\v'0.5m'-1,
 in other words the range of signed integers of \fBn\fP bits
 using two's complement arithmetic.
 The representation is the same as for unsigned integers except
 the range 2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~2\v'-0.5m'\fBn\fP\v'0.5m'-1 is mapped on the
 range -2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~-1.
 In other words, the most significant bit is used as sign bit.
 The convert instructions between signed and unsigned integers
 of the same size can be used to catch errors.
 .A
 The value -2\v'-0.5m'\fBn\fP-1\v'0.5m' is used for undefined
 signed integers.
 EM implementations should trap when this value is used in an
 operation on signed integers.
 The instruction mask, accessed with SIM and LIM (see chapter 9),
 can be used to disable such traps.
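 .P
 As an illustration, the C sketch below maps an n-bit pattern onto the
 signed range described above and tests for the undefined value.
 The function names are hypothetical and the sketch assumes 1 <= n <= 32.
 ```c
 #include <assert.h>
 #include <stdbool.h>
 #include <stdint.h>
 
 /* Hypothetical: interpret the low n bits of `bits` as a signed integer;
  * patterns 2^(n-1) .. 2^n - 1 map onto -2^(n-1) .. -1. */
 static int64_t em_to_signed(uint32_t bits, unsigned n)
 {
     uint64_t half = (uint64_t)1 << (n - 1);
     uint64_t u = bits & ((half << 1) - 1);   /* keep the low n bits */
     return u < half ? (int64_t)u : (int64_t)u - (int64_t)(half << 1);
 }
 
 /* Hypothetical: -2^(n-1) marks an undefined signed integer, on which
  * an implementation should trap unless the trap is masked. */
 static bool em_is_undefined(uint32_t bits, unsigned n)
 {
     return em_to_signed(bits, n) == -(int64_t)((uint64_t)1 << (n - 1));
 }
 ```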
 .A
 We assume existence of at least single word signed arithmetic
 in any implementation.
 .BP
 .S2 "Floating point values"
 Floating point values must have a signed mantissa and a signed
 exponent.
 Although no base is specified, base 2 is the normal choice,
 because the FEF instruction pushes the exponent in base 2.
 .A
 The implementation of floating point arithmetic is optional.
 The compilers currently in use have runtime parameters for the
 size of the floating point values they should use.
 Common choices are 4 and/or 8 bytes.
 .S2 Pointers
 EM has two kinds of pointers: for instruction and for data
 space.
 Each kind can only be used for its own space; conversion between
 these two subtypes is impossible.
 We assume that pointers have a range from 0 upwards.
 Any implementation may have holes in the pointer range between
 fragments.
 One can of course not expect to be able to address two megabytes
 of memory using a 2-byte pointer.
 Normally, a 2-byte pointer allows up to 65536 bytes of
 addressable memory.
 .A
 Pointer representation has one restriction.
 The pointer with the same representation as the integer zero of
 the same size should be invalid.
 Some languages and/or runtime systems represent the nil
 pointer as zero.
 .S2 "Bit sets"
 All bit sets of size \fBn\fP are subsets of the set
 {~i~|~i>=0,~i<\fBn\fP~}.
 A bit set contains a bit for each element showing its
 presence or absence.
 Bit sets are subdivided into words.
 The word with the lowest EM address governs the subset
 {~i~|~i>=0,~i<\fBm\fP~}, where \fBm\fP is the number of bits in
 a word.
 The next higher words each govern the next higher \fBm\fP set elements.
 The relation between a set with the size of
 a word and an unsigned integer word is that
 the value of the unsigned integer is the sum of
 2\v'-0.5m'i\v'0.5m' over all i in the set.
 .A
 Example: a 2-word bit set (wordsize 2) containing the
 elements 1, 6, 8, 15, 18, 21, 27 and 28 is composed of two
 integers, e.g. at addresses 40 and 42.
 The word at 40 contains the value 33090 (or~-32446),
 the word at 42 contains the value 6180.
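 .P
 The example can be reproduced with the C sketch below, which assumes a
 wordsize of 2 (16 bits per word); em_make_set is a hypothetical name.
 ```c
 #include <assert.h>
 #include <stddef.h>
 #include <stdint.h>
 
 #define WORD_BITS 16   /* wordsize 2, as in the example */
 
 /* Hypothetical: build a bit set of `nwords` words.  Word k (the k-th
  * lowest EM address) holds elements k*WORD_BITS .. (k+1)*WORD_BITS - 1,
  * and element i contributes 2^(i mod WORD_BITS) to its word. */
 static void em_make_set(const unsigned *elems, size_t count,
                         uint16_t *words, size_t nwords)
 {
     for (size_t k = 0; k < nwords; k++)
         words[k] = 0;
     for (size_t j = 0; j < count; j++) {
         unsigned i = elems[j];
         assert(i < nwords * WORD_BITS);   /* element must fit in the set */
         words[i / WORD_BITS] |= (uint16_t)(1u << (i % WORD_BITS));
     }
 }
 ```
 For the elements 1, 6, 8, 15, 18, 21, 27 and 28 this yields 33090
 and 6180, matching the values given above.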