Initial revision
		
							parent
							
								
									253118db19
								
							
						
					
					
						commit
						e0872423d9
					
				
					 21 changed files with 7189 additions and 0 deletions
				
			
		
							
								
								
									
										1121
									
								
								doc/em/addend.n
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							
										
											
												File diff suppressed because it is too large
											
										
									
								
							
							
								
								
									
										488
									
								
								doc/em/app.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,488 @@ | |||
| .BP | ||||
| .AP "EM INTERPRETER" | ||||
| .nf | ||||
| .ta 8 16 24 32 40 48 56 64 72 80 | ||||
| .so em.i | ||||
| .fi | ||||
| .BP | ||||
| .AP "EM CODE TABLES" | ||||
| The following table is used by the assembler for EM machine | ||||
| language. | ||||
| It specifies the opcodes used for each instruction and | ||||
| how arguments are mapped to machine language arguments. | ||||
| The table is presented in three columns; | ||||
| each line in each column contains three or four fields. | ||||
| Each line describes a range of interpreter opcodes by | ||||
| specifying for which instruction the range is used, the type of the | ||||
| opcodes (mini, shortie, etc.) and the range for the instruction | ||||
| argument. | ||||
| .A | ||||
| The first field on each line gives the EM instruction mnemonic; | ||||
| the second field gives some flags. | ||||
| If the opcodes are minis or shorties the third field specifies | ||||
| how many minis/shorties are used. | ||||
| The last field gives the number of the (first) interpreter | ||||
| opcode. | ||||
| .N 1 | ||||
| Flags: | ||||
| .IS 3 | ||||
| .N 1 | ||||
| Opcode type, only one of the following may be specified. | ||||
| .PS - 5 "  " | ||||
| .PT - | ||||
| opcode without argument | ||||
| .PT m | ||||
| mini | ||||
| .PT s | ||||
| shortie | ||||
| .PT 2 | ||||
| opcode with 2-byte signed argument | ||||
| .PT 4 | ||||
| opcode with 4-byte signed argument | ||||
| .PT 8 | ||||
| opcode with 8-byte signed argument | ||||
| .PE | ||||
| Secondary (escaped) opcodes. | ||||
| .PS - 5 "  " | ||||
| .PT e | ||||
| The opcode thus marked is in the secondary opcode group instead | ||||
| of the primary | ||||
| .PE | ||||
| Restrictions on arguments. | ||||
| .PS - 5 "  " | ||||
| .PT N | ||||
| Negative arguments only | ||||
| .PT P | ||||
| Positive and zero arguments only | ||||
| .PE | ||||
| Mapping of arguments. | ||||
| .PS - 5 "  " | ||||
| .PT w | ||||
| argument must be divisible by the wordsize and is divided by the | ||||
| wordsize before use as opcode argument. | ||||
| .PT o | ||||
| argument (possibly after division) must be >= 1 and is | ||||
| decremented before use as opcode argument | ||||
| .PE | ||||
| .IE | ||||
| If the opcode type is 2, 4 or 8 the resulting argument is used as | ||||
| opcode argument (least significant byte first). | ||||
| .N | ||||
| If the opcode type is mini, the argument is added | ||||
| to the first opcode, provided the result is in range. | ||||
| If the argument is negative, the absolute value minus one is | ||||
| used in the algorithm above. | ||||
| .N | ||||
| For shorties with positive arguments the first opcode is used | ||||
| for arguments in the range 0..255, the second for the range | ||||
| 256..511, etc. | ||||
| For shorties with negative arguments the first opcode is used | ||||
| for arguments in the range -1..-256, the second for the range | ||||
| -257..-512, etc. | ||||
| The byte following the opcode contains the least significant | ||||
| byte of the argument. | ||||
| First some examples of these specifications. | ||||
| .PS - 5 | ||||
| .PT "aar mwPo 1 34" | ||||
| Indicates that opcode 34 is used as a mini for Positive | ||||
| instruction arguments only. | ||||
| The w and o indicate division and decrementing of the | ||||
| instruction argument. | ||||
| Because the resulting argument must be zero (only opcode 34 may be used), | ||||
| this mini can only be used for instruction argument 2, assuming a wordsize of 2. | ||||
| Conclusion: opcode 34 is for "AAR 2". | ||||
| .PT "adp sP 1 41" | ||||
| Opcode 41 is used as shortie for ADP with arguments in the range | ||||
| 0..255. | ||||
| .PT "bra sN 2 60" | ||||
| Opcode 60 is used as shortie for BRA with arguments -1..-256, | ||||
| 61 is used for arguments -257..-512. | ||||
| .PT "zer e- 145" | ||||
| Escaped opcode 145 is used for ZER. | ||||
| .PE | ||||
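The encoding rules and examples above can be sketched in Python. This is a minimal sketch under stated assumptions, not the assembler's actual code: the function name `encode` and its parameters are hypothetical, the wordsize defaults to 2, and the `e` flag (the escape prefix for secondary opcodes) is deliberately not modeled.

```python
def encode(arg, flags, first_opcode, count=1, wordsize=2):
    """Apply one table line to `arg`; return the byte list, or None if it does not fit.

    Hypothetical helper: `flags`, `count` and `first_opcode` are the
    fields of a table line; the 'e' (escape) flag is ignored here.
    """
    if "-" in flags:                      # opcode without argument
        return [first_opcode]
    if "N" in flags and arg >= 0:         # negative arguments only
        return None
    if "P" in flags and arg < 0:          # positive and zero arguments only
        return None
    if "w" in flags:                      # must be divisible by the wordsize
        if arg % wordsize != 0:
            return None
        arg //= wordsize
    if "o" in flags:                      # must be >= 1, then decremented
        if arg < 1:
            return None
        arg -= 1
    # For negative arguments the absolute value minus one selects the opcode.
    index = -arg - 1 if arg < 0 else arg
    if "m" in flags:                      # mini: argument folded into the opcode
        return [first_opcode + index] if index < count else None
    if "s" in flags:                      # shortie: opcode per block of 256 values
        high = index // 256
        if high >= count:
            return None
        return [first_opcode + high, arg & 0xFF]
    for size in (2, 4, 8):                # signed argument, least significant byte first
        if str(size) in flags:
            try:
                return [first_opcode] + list(arg.to_bytes(size, "little", signed=True))
            except OverflowError:
                return None
    return None
```

For instance, `encode(2, "mwPo", 34)` reproduces the "AAR 2" example and `encode(-257, "sN", 60, count=2)` selects the second BRA shortie.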
| The interpreter opcode table: | ||||
| .N 1 | ||||
| .IS 3 | ||||
| .DS B | ||||
| .so itables | ||||
| .DE 0 | ||||
| .IE | ||||
| .P | ||||
| The table above results in the following dispatch tables. | ||||
| Dispatch tables are used by interpreters to jump to the | ||||
| routines implementing the EM instructions, indexed by the next opcode. | ||||
| Each line of the dispatch tables gives the routine names | ||||
| of eight consecutive opcodes, preceded by the first opcode number | ||||
| on that line. | ||||
| Routine names consist of an EM mnemonic followed by a suffix. | ||||
| The suffixes show the encoding used for each opcode. | ||||
| .N | ||||
| The following suffixes exist: | ||||
| .N 1 | ||||
| .VS 1 0 | ||||
| .IS 4 | ||||
| .PS - 11 | ||||
| .PT .z | ||||
| no arguments | ||||
| .PT .l | ||||
| 16-bit argument | ||||
| .PT .lw | ||||
| 16-bit argument divided by the wordsize | ||||
| .PT .p | ||||
| positive 16-bit argument | ||||
| .PT .pw | ||||
| positive 16-bit argument divided by the wordsize | ||||
| .PT .n | ||||
| negative 16-bit argument | ||||
| .PT .nw | ||||
| negative 16-bit argument divided by the wordsize | ||||
| .PT .s<num> | ||||
| shortie with <num> as high order argument byte | ||||
| .PT .sw<num> | ||||
| shortie with argument divided by the wordsize | ||||
| .PT .<num> | ||||
| mini with <num> as argument | ||||
| .PT .<num>W | ||||
| mini with <num>*wordsize as argument | ||||
| .PE 3 | ||||
| <num> is a possibly negative integer. | ||||
| .VS 1 1 | ||||
| .IE | ||||
| The dispatch table for the 256 primary opcodes: | ||||
| .DS B | ||||
|    0   loc.0    loc.1    loc.2    loc.3    loc.4    loc.5    loc.6    loc.7 | ||||
|    8   loc.8    loc.9    loc.10   loc.11   loc.12   loc.13   loc.14   loc.15 | ||||
|   16   loc.16   loc.17   loc.18   loc.19   loc.20   loc.21   loc.22   loc.23 | ||||
|   24   loc.24   loc.25   loc.26   loc.27   loc.28   loc.29   loc.30   loc.31 | ||||
|   32   loc.32   loc.33   aar.1W   adf.s0   adi.1W   adi.2W   adp.l    adp.1 | ||||
|   40   adp.2    adp.s0   adp.s-1  ads.1W   and.1W   asp.1W   asp.2W   asp.3W | ||||
|   48   asp.4W   asp.5W   asp.w0   beq.l    beq.s0   bge.s0   bgt.s0   ble.s0 | ||||
|   56   blm.s0   blt.s0   bne.s0   bra.l    bra.s-1  bra.s-2  bra.s0   bra.s1 | ||||
|   64   cal.1    cal.2    cal.3    cal.4    cal.5    cal.6    cal.7    cal.8 | ||||
|   72   cal.9    cal.10   cal.11   cal.12   cal.13   cal.14   cal.15   cal.16 | ||||
|   80   cal.17   cal.18   cal.19   cal.20   cal.21   cal.22   cal.23   cal.24 | ||||
|   88   cal.25   cal.26   cal.27   cal.28   cal.s0   cff.z    cif.z    cii.z | ||||
|   96   cmf.s0   cmi.1W   cmi.2W   cmp.z    cms.s0   csa.1W   csb.1W   dec.z | ||||
|  104   dee.w0   del.w-1  dup.1W   dvf.s0   dvi.1W   fil.l    inc.z    ine.lw | ||||
|  112   ine.w0   inl.-1W  inl.-2W  inl.-3W  inl.w-1  inn.s0   ior.1W   ior.s0 | ||||
|  120   lae.l    lae.w0   lae.w1   lae.w2   lae.w3   lae.w4   lae.w5   lae.w6 | ||||
|  128   lal.p    lal.n    lal.0    lal.-1   lal.w0   lal.w-1  lal.w-2  lar.W | ||||
|  136   ldc.0    lde.lw   lde.w0   ldl.0    ldl.w-1  lfr.1W   lfr.2W   lfr.s0 | ||||
|  144   lil.w-1  lil.w0   lil.0    lil.1W   lin.l    lin.s0   lni.z    loc.l | ||||
|  152   loc.-1   loc.s0   loc.s-1  loe.lw   loe.w0   loe.w1   loe.w2   loe.w3 | ||||
|  160   loe.w4   lof.l    lof.1W   lof.2W   lof.3W   lof.4W   lof.s0   loi.l | ||||
|  168   loi.1    loi.1W   loi.2W   loi.3W   loi.4W   loi.s0   lol.pw   lol.nw | ||||
|  176   lol.0    lol.1W   lol.2W   lol.3W   lol.-1W  lol.-2W  lol.-3W  lol.-4W | ||||
|  184   lol.-5W  lol.-6W  lol.-7W  lol.-8W  lol.w0   lol.w-1  lxa.1    lxl.1 | ||||
|  192   lxl.2    mlf.s0   mli.1W   mli.2W   rck.1W   ret.0    ret.1W   ret.s0 | ||||
|  200   rmi.1W   sar.1W   sbf.s0   sbi.1W   sbi.2W   sdl.w-1  set.s0   sil.w-1 | ||||
|  208   sil.w0   sli.1W   ste.lw   ste.w0   ste.w1   ste.w2   stf.l    stf.W | ||||
|  216   stf.2W   stf.s0   sti.1    sti.1W   sti.2W   sti.3W   sti.4W   sti.s0 | ||||
|  224   stl.pw   stl.nw   stl.0    stl.1W   stl.-1W  stl.-2W  stl.-3W  stl.-4W | ||||
|  232   stl.-5W  stl.w-1  teq.z    tgt.z    tlt.z    tne.z    zeq.l    zeq.s0 | ||||
|  240   zeq.s1   zer.s0   zge.s0   zgt.s0   zle.s0   zlt.s0   zne.s0   zne.s-1 | ||||
|  248   zre.lw   zre.w0   zrl.-1W  zrl.-2W  zrl.w-1  zrl.nw   escape1  escape2 | ||||
| .DE 2 | ||||
| The list of secondary opcodes (escape1): | ||||
| .N  1 | ||||
| .DS  B | ||||
|    0   aar.l    aar.z    adf.l    adf.z    adi.l    adi.z    ads.l    ads.z | ||||
|    8   adu.l    adu.z    and.l    and.z    asp.lw   ass.l    ass.z    bge.l | ||||
|   16   bgt.l    ble.l    blm.l    bls.l    bls.z    blt.l    bne.l    cai.z | ||||
|   24   cal.l    cfi.z    cfu.z    ciu.z    cmf.l    cmf.z    cmi.l    cmi.z | ||||
|   32   cms.l    cms.z    cmu.l    cmu.z    com.l    com.z    csa.l    csa.z | ||||
|   40   csb.l    csb.z    cuf.z    cui.z    cuu.z    dee.lw   del.pw   del.nw | ||||
|   48   dup.l    dus.l    dus.z    dvf.l    dvf.z    dvi.l    dvi.z    dvu.l | ||||
|   56   dvu.z    fef.l    fef.z    fif.l    fif.z    inl.pw   inl.nw   inn.l | ||||
|   64   inn.z    ior.l    ior.z    lar.l    lar.z    ldc.l    ldf.l    ldl.pw | ||||
|   72   ldl.nw   lfr.l    lil.pw   lil.nw   lim.z    los.l    los.z    lor.s0 | ||||
|   80   lpi.l    lxa.l    lxl.l    mlf.l    mlf.z    mli.l    mli.z    mlu.l | ||||
|   88   mlu.z    mon.z    ngf.l    ngf.z    ngi.l    ngi.z    nop.z    rck.l | ||||
|   96   rck.z    ret.l    rmi.l    rmi.z    rmu.l    rmu.z    rol.l    rol.z | ||||
|  104   ror.l    ror.z    rtt.z    sar.l    sar.z    sbf.l    sbf.z    sbi.l | ||||
|  112   sbi.z    sbs.l    sbs.z    sbu.l    sbu.z    sde.l    sdf.l    sdl.pw | ||||
|  120   sdl.nw   set.l    set.z    sig.z    sil.pw   sil.nw   sim.z    sli.l | ||||
|  128   sli.z    slu.l    slu.z    sri.l    sri.z    sru.l    sru.z    sti.l | ||||
|  136   sts.l    sts.z    str.s0   tge.z    tle.z    trp.z    xor.l    xor.z | ||||
|  144   zer.l    zer.z    zge.l    zgt.l    zle.l    zlt.l    zne.l    zrf.l | ||||
|  152   zrf.z    zrl.pw   dch.z    exg.s0   exg.l    exg.z    lpb.z    gto.l | ||||
| .DE 2 | ||||
| Finally, the list of opcodes with four byte arguments (escape2). | ||||
| .DS | ||||
| 
 | ||||
|    0  loc | ||||
| .DE 0 | ||||
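How an interpreter might index these tables can be sketched as follows. This is only an illustration under assumptions: the function `fetch_routine` and the three dictionaries are hypothetical, only a handful of entries are copied from the tables above, and it is assumed that the byte following an escape opcode indexes the corresponding secondary table. Decoding of the arguments themselves is not shown.

```python
# A few entries copied from the dispatch tables; the real tables are complete.
PRIMARY = {0: "loc.0", 34: "aar.1W", 41: "adp.s0", 254: "escape1", 255: "escape2"}
SECONDARY = {144: "zer.l", 145: "zer.z"}      # reached through escape1
FOUR_BYTE = {0: "loc"}                        # reached through escape2


def fetch_routine(code, pc):
    """Return (routine name, index of the byte after the opcode).

    Assumption: an escape opcode is followed by one byte that
    selects the routine in the corresponding secondary table.
    """
    name = PRIMARY[code[pc]]
    if name == "escape1":
        return SECONDARY[code[pc + 1]], pc + 2
    if name == "escape2":
        return FOUR_BYTE[code[pc + 1]], pc + 2
    return name, pc + 1
```

With these entries, the byte pair 254, 145 selects `zer.z`, which matches the "zer e- 145" line discussed earlier.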
| .BP | ||||
| .AP "AN EXAMPLE PROGRAM" | ||||
| .DS B | ||||
|  1      program example(output); | ||||
|  2      {This program just demonstrates typical EM code.} | ||||
|  3      type rec = record r1: integer; r2:real; r3: boolean end; | ||||
|  4      var mi: integer;  mx:real;  r:rec; | ||||
|  5 | ||||
|  6      function sum(a,b:integer):integer; | ||||
|  7      begin | ||||
|  8        sum := a + b | ||||
|  9      end; | ||||
| 10 | ||||
| 11      procedure test(var r: rec); | ||||
| 12      label 1; | ||||
| 13      var i,j: integer; | ||||
| 14          x,y: real; | ||||
| 15          b: boolean; | ||||
| 16          c: char; | ||||
| 17          a: array[1..100] of integer; | ||||
| 18 | ||||
| 19      begin | ||||
| 20              j := 1; | ||||
| 21              i := 3 * j + 6; | ||||
| 22              x := 4.8; | ||||
| 23              y := x/0.5; | ||||
| 24              b := true; | ||||
| 25              c := 'z'; | ||||
| 26              for i:= 1 to 100 do a[i] := i * i; | ||||
| 27              r.r1 := j+27; | ||||
| 28              r.r3 := b; | ||||
| 29              r.r2 := x+y; | ||||
| 30              i := sum(r.r1, a[j]); | ||||
| 31              while i > 0 do begin j := j + r.r1; i := i - 1 end; | ||||
| 32              with r do begin r3 := b;  r2 := x+y;  r1 := 0 end; | ||||
| 33              goto 1; | ||||
| 34      1:      writeln(j, i:6, x:9:3, b) | ||||
| 35      end; {test} | ||||
| 36      begin {main program} | ||||
| 37        mx := 15.96; | ||||
| 38        mi := 99; | ||||
| 39        test(r) | ||||
| 40      end. | ||||
| .DE 0 | ||||
| .BP | ||||
| The EM code as produced by the Pascal-VU compiler is given below. Comments | ||||
| have been added manually.  Note that this code has already been  optimized. | ||||
| .DS B | ||||
|   mes 2,2,2              ; wordsize 2, pointersize 2 | ||||
|  .1 | ||||
|   rom 't.p\e000'         ; the name of the source file | ||||
|   hol 552,-32768,0       ; externals and buf occupy 552 bytes | ||||
|   exp $sum               ; sum can be called from other modules | ||||
|   pro $sum,2             ; procedure sum; 2 bytes local storage | ||||
|   lin 8                  ; code from source line 8 | ||||
|   ldl 0                  ; load two locals ( a and b ) | ||||
|   adi 2                  ; add them | ||||
|   ret 2                  ; return the result | ||||
|   end 2                  ; end of procedure ( still two bytes local storage ) | ||||
|  .2 | ||||
|   rom 1,99,2             ; descriptor of array a[] | ||||
|   exp $test              ; the compiler exports all level 0 procedures | ||||
|   pro $test,226          ; procedure test, 226 bytes local storage | ||||
|  .3 | ||||
|   rom 4.8F8              ; assemble Floating point 4.8 (8 bytes) in | ||||
|  .4                              ; global storage | ||||
|   rom 0.5F8              ; same for 0.5 | ||||
|   mes 3,-226,2,2         ; compiler temporary not referenced by address | ||||
|   mes 3,-24,2,0          ; the same is true for i, j, b and c in test | ||||
|   mes 3,-22,2,0 | ||||
|   mes 3,-4,2,0 | ||||
|   mes 3,-2,2,0 | ||||
|   mes 3,-20,8,0          ; and for x and y | ||||
|   mes 3,-12,8,0 | ||||
|   lin 20                 ; maintain source line number | ||||
|   loc 1 | ||||
|   stl -4                 ; j := 1 | ||||
|   lni                    ; lin 21 prior to optimization | ||||
|   lol -4 | ||||
|   loc 3 | ||||
|   mli 2 | ||||
|   loc 6 | ||||
|   adi 2 | ||||
|   stl -2                 ; i := 3 * j + 6 | ||||
|   lni                    ; lin 22 prior to optimization | ||||
|   lae .3 | ||||
|   loi 8 | ||||
|   lal -12 | ||||
|   sti 8                  ; x := 4.8 | ||||
|   lni                    ; lin 23 prior to optimization | ||||
|   lal -12 | ||||
|   loi 8 | ||||
|   lae .4 | ||||
|   loi 8 | ||||
|   dvf 8 | ||||
|   lal -20 | ||||
|   sti 8                  ; y := x / 0.5 | ||||
|   lni                    ; lin 24 prior to optimization | ||||
|   loc 1 | ||||
|   stl -22                ; b := true | ||||
|   lni                    ; lin 25 prior to optimization | ||||
|   loc 122 | ||||
|   stl -24                ; c := 'z' | ||||
|   lni                    ; lin 26 prior to optimization | ||||
|   loc 1 | ||||
|   stl -2                 ; for i:= 1 | ||||
|  2 | ||||
|   lol -2 | ||||
|   dup 2 | ||||
|   mli 2                  ; i*i | ||||
|   lal -224 | ||||
|   lol -2 | ||||
|   lae .2 | ||||
|   sar 2                  ; a[i] := | ||||
|   lol -2 | ||||
|   loc 100 | ||||
|   beq *3                 ; to 100 do | ||||
|   inl -2                 ; increment i and loop | ||||
|   bra *2 | ||||
|  3 | ||||
|   lin 27 | ||||
|   lol -4 | ||||
|   loc 27 | ||||
|   adi 2                  ; j + 27 | ||||
|   sil 0                  ; r.r1 := | ||||
|   lni                    ; lin 28 prior to optimization | ||||
|   lol -22                ; b | ||||
|   lol 0 | ||||
|   stf 10                 ; r.r3 := | ||||
|   lni                    ; lin 29 prior to optimization | ||||
|   lal -20 | ||||
|   loi 16 | ||||
|   adf 8                  ; x + y | ||||
|   lol 0 | ||||
|   adp 2 | ||||
|   sti 8                  ; r.r2 := | ||||
|   lni                    ; lin 30 prior to optimization | ||||
|   lal -224 | ||||
|   lol -4 | ||||
|   lae .2 | ||||
|   lar 2                  ; a[j] | ||||
|   lil 0                  ; r.r1 | ||||
|   cal $sum               ; call now | ||||
|   asp 4                  ; remove parameters from stack | ||||
|   lfr 2                  ; get function result | ||||
|   stl -2                 ; i := | ||||
|  4 | ||||
|   lin 31 | ||||
|   lol -2 | ||||
|   zle *5                 ; while i > 0 do | ||||
|   lol -4 | ||||
|   lil 0 | ||||
|   adi 2 | ||||
|   stl -4                 ; j := j + r.r1 | ||||
|   del -2                 ; i := i - 1 | ||||
|   bra *4                 ; loop | ||||
|  5 | ||||
|   lin 32 | ||||
|   lol 0 | ||||
|   stl -226               ; make copy of address of r | ||||
|   lol -22 | ||||
|   lol -226 | ||||
|   stf 10                 ; r3 := b | ||||
|   lal -20 | ||||
|   loi 16 | ||||
|   adf 8 | ||||
|   lol -226 | ||||
|   adp 2 | ||||
|   sti 8                  ; r2 := x + y | ||||
|   loc 0 | ||||
|   sil -226               ; r1 := 0 | ||||
|   lin 34                 ; note the absence of the unnecessary jump | ||||
|   lae 22                 ; address of output structure | ||||
|   lol -4 | ||||
|   cal $_wri              ; write integer with default width | ||||
|   asp 4                  ; pop parameters | ||||
|   lae 22 | ||||
|   lol -2 | ||||
|   loc 6 | ||||
|   cal $_wsi              ; write integer width 6 | ||||
|   asp 6 | ||||
|   lae 22 | ||||
|   lal -12 | ||||
|   loi 8 | ||||
|   loc 9 | ||||
|   loc 3 | ||||
|   cal $_wrf              ; write fixed format real, width 9, precision 3 | ||||
|   asp 14 | ||||
|   lae 22 | ||||
|   lol -22 | ||||
|   cal $_wrb              ; write boolean, default width | ||||
|   asp 4 | ||||
|   lae 22 | ||||
|   cal $_wln              ; writeln | ||||
|   asp 2 | ||||
|   ret 0                  ; return, no result | ||||
|   end 226 | ||||
|   exp $_main | ||||
|   pro $_main,0           ; main program | ||||
|  .6 | ||||
|   con 2,-1,22            ; description of external files | ||||
|  .5 | ||||
|   rom 15.96F8 | ||||
|   fil .1                 ; maintain source file name | ||||
|   lae .6                 ; description of external files | ||||
|   lae 0                  ; base of hol area to relocate buffer addresses | ||||
|   cal $_ini              ; initialize files, etc... | ||||
|   asp 4 | ||||
|   lin 37 | ||||
|   lae .5 | ||||
|   loi 8 | ||||
|   lae 2 | ||||
|   sti 8                  ; mx := 15.96 | ||||
|   lni                    ; lin 38 prior to optimization | ||||
|   loc 99 | ||||
|   ste 0                  ; mi := 99 | ||||
|   lni                    ; lin 39 prior to optimization | ||||
|   lae 10                 ; address of r | ||||
|   cal $test | ||||
|   asp 2 | ||||
|   loc 0                  ; normal exit | ||||
|   cal $_hlt              ; cleanup and finish | ||||
|   asp 2 | ||||
|   end 0 | ||||
|   mes 5                  ; reals were used | ||||
| .DE 0 | ||||
| The compact code corresponding to the above program is listed below. | ||||
| Read it horizontally, line by line, not column by column. | ||||
| Each number represents a byte of compact code, printed in decimal. | ||||
| The first two bytes form the magic word. | ||||
| .N 1 | ||||
| .IS 3 | ||||
| .DS B | ||||
| 173   0 159 122 122 122 255 242   1 161 250 124 116  46 112   0 | ||||
| 255 156 245  40   2 245   0 128 120 155 249 123 115 117 109 160 | ||||
| 249 123 115 117 109 122  67 128  63 120   3 122  88 122 152 122 | ||||
| 242   2 161 121 219 122 255 155 249 124 116 101 115 116 160 249 | ||||
| 124 116 101 115 116 245 226   0 242   3 161 253 128 123  52  46 | ||||
|  56 255 242   4 161 253 128 123  48  46  53 255 159 123 245  30 | ||||
| 255 122 122 255 159 123  96 122 120 255 159 123  98 122 120 255 | ||||
| 159 123 116 122 120 255 159 123 118 122 120 255 159 123 100 128 | ||||
| 120 255 159 123 108 128 120 255  67 140  69 121 113 116  68  73 | ||||
| 116  69 123  81 122  69 126   3 122 113 118  68  57 242   3  72 | ||||
| 128  58 108 112 128  68  58 108  72 128  57 242   4  72 128  44 | ||||
| 128  58 100 112 128  68  69 121 113  98  68  69 245 122   0 113 | ||||
|  96  68  69 121 113 118 182  73 118  42 122  81 122  58 245  32 | ||||
| 255  73 118  57 242   2  94 122  73 118  69 220  10 123  54 118 | ||||
|  18 122 183  67 147  73 116  69 147   3 122 104 120  68  73  98 | ||||
|  73 120 111 130  68  58 100  72 136   2 128  73 120   4 122 112 | ||||
| 128  68  58 245  32 255  73 116  57 242   2  59 122  65 120  20 | ||||
| 249 123 115 117 109   8 124  64 122 113 118 184  67 151  73 118 | ||||
| 128 125  73 116  65 120   3 122 113 116  41 118  18 124 185  67 | ||||
| 152  73 120 113 245  30 255  73  98  73 245  30 255 111 130  58 | ||||
| 100  72 136   2 128  73 245  30 255   4 122 112 128  69 120 104 | ||||
| 245  30 255  67 154  57 142  73 116  20 249 124  95 119 114 105 | ||||
|   8 124  57 142  73 118  69 126  20 249 124  95 119 115 105   8 | ||||
| 126  57 142  58 108  72 128  69 129  69 123  20 249 124  95 119 | ||||
| 114 102   8 134  57 142  73  98  20 249 124  95 119 114  98   8 | ||||
| 124  57 142  20 249 124  95 119 108 110   8 122  88 120 152 245 | ||||
| 226   0 155 249 125  95 109  97 105 110 160 249 125  95 109  97 | ||||
| 105 110 120 242   6 151 122 119 142 255 242   5 161 253 128 125 | ||||
|  49  53  46  57  54 255  50 242   1  57 242   6  57 120  20 249 | ||||
| 124  95 105 110 105   8 124  67 157  57 242   5  72 128  57 122 | ||||
| 112 128  68  69 219 110 120  68  57 130  20 249 124 116 101 115 | ||||
| 116   8 122  69 120  20 249 124  95 104 108 116   8 122 152 120 | ||||
| 159 124 160 255 159 125 255 | ||||
| .DE 0 | ||||
| .IE | ||||
| .MS T A 0 | ||||
| .ME | ||||
| .BP | ||||
| .MS B A 0 | ||||
| .ME | ||||
| .CT | ||||
							
								
								
									
										756
									
								
								doc/em/assem.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,756 @@ | |||
| .BP | ||||
| .SN 11 | ||||
| .S1 "EM ASSEMBLY LANGUAGE" | ||||
| We use two representations for assembly language programs: | ||||
| one is in ASCII and the other is the compact assembly language. | ||||
| The latter needs less space than the first for the same program | ||||
| and therefore allows faster processing. | ||||
| Our only program accepting ASCII assembly | ||||
| language converts it to the compact form. | ||||
| All other programs expect compact assembly input. | ||||
| The first part of the chapter describes the ASCII assembly | ||||
| language and its semantics. | ||||
| The second part describes the syntax of the compact assembly | ||||
| language. | ||||
| The last part lists the EM instructions with the type of | ||||
| arguments allowed and an indication of the function. | ||||
| Appendix A gives a detailed description of the effect of all | ||||
| instructions in the form of a Pascal program. | ||||
| .S2 "ASCII assembly language" | ||||
| An assembly language program consists of a series of lines; each | ||||
| line may be blank, contain one (pseudo)instruction or contain one | ||||
| label. | ||||
| Input to the assembler is in lower case. | ||||
| Upper case is used in this | ||||
| document merely to distinguish keywords from the surrounding prose. | ||||
| A comment is allowed at the end of each line and starts with a semicolon ";". | ||||
| This kind of comment does not exist in the compact form. | ||||
| .A | ||||
| Labels must be placed all by themselves on a line and start in | ||||
| column 1. | ||||
| There are two kinds of labels, instruction and data labels. | ||||
| Instruction labels are unsigned positive integers. | ||||
| The scope of an instruction label is its procedure. | ||||
| .A | ||||
| The pseudoinstructions CON, ROM and BSS may be preceded by a | ||||
| line containing a | ||||
| 1-8 character data label, the first character of which is a | ||||
| letter, period or underscore. | ||||
| The period may only be followed by | ||||
| digits, the others may be followed by letters, digits and underscores. | ||||
| The use of the character "." followed by a constant, | ||||
| which must be in the range 1 to 32767 (e.g. ".40") is recommended | ||||
| for compiler | ||||
| generated programs. | ||||
| These labels are considered as a special case and handled | ||||
| more efficiently in compact assembly language (see below). | ||||
| Note that a data label on its own or two consecutive labels are not | ||||
| allowed. | ||||
| .P | ||||
| Each statement may contain an instruction mnemonic or pseudoinstruction. | ||||
| These must begin in column 2 or later (not column 1) and must be followed | ||||
| by a space, tab, semicolon or LF. | ||||
| Everything on the line following a semicolon is | ||||
| taken as a comment. | ||||
| .P | ||||
| Each input file contains one module. | ||||
| A module may contain many procedures, | ||||
| which may be nested. | ||||
| A procedure consists of | ||||
| a PRO statement, a (possibly empty) | ||||
| collection of instructions and pseudoinstructions and finally an END | ||||
| statement. | ||||
| Pseudoinstructions are also allowed between procedures. | ||||
| They do not belong to a specific procedure. | ||||
| .P | ||||
| All constants in EM are interpreted in the decimal base. | ||||
| The ASCII assembly language accepts constant expressions | ||||
| wherever constants are allowed. | ||||
| The operators recognized are: +, -, *, % and / with the usual | ||||
| precedence order. | ||||
| Use of the parentheses ( and ) to alter the precedence order is allowed. | ||||
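The constant expressions described above can be evaluated with a small recursive-descent parser. The following Python sketch is illustrative only: the function name `evaluate` is hypothetical, and both the unary minus and the truncation of `/` and `%` toward zero are assumptions, since the text does not specify them.

```python
import re


def trunc_div(a, b):
    """Integer division truncating toward zero (an assumption, not stated in the text)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def evaluate(text):
    """Evaluate a decimal constant expression with + - * / % and parentheses."""
    tokens = re.findall(r"\d+|[-+*/%()]", text)
    if "".join(tokens) != re.sub(r"\s+", "", text):
        raise ValueError("unexpected character in expression")
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def factor():
        tok = take()
        if tok == "(":
            value = expr()
            if take() != ")":
                raise ValueError("missing )")
            return value
        if tok == "-":                    # unary minus: an assumption
            return -factor()
        return int(tok)                   # raises ValueError on a stray operator

    def term():                           # * / % bind tighter than + -
        value = factor()
        while peek() in ("*", "/", "%"):
            op, right = take(), factor()
            if op == "*":
                value *= right
            elif op == "/":
                value = trunc_div(value, right)
            else:
                value -= right * trunc_div(value, right)
        return value

    def expr():
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if take() == "+" else value - term()
        return value

    result = expr()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return result
```

For example, `evaluate("2+3*4")` applies the usual precedence, while `evaluate("(2+3)*4")` uses parentheses to override it.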
| .S3 "Instruction arguments" | ||||
| Unlike many other assembly languages, the EM assembly | ||||
| language requires all arguments of normal and pseudoinstructions | ||||
| to be either a constant or an identifier, but not a combination | ||||
| of these two. | ||||
| There is one exception to this rule: when a data label is used | ||||
| for initialization or as an instruction argument, | ||||
| expressions of the form 'label+constant' and 'label-constant' | ||||
| are allowed. | ||||
| This makes it possible to address, for example, the | ||||
| third word of a ten word BSS block | ||||
| directly. | ||||
| Thus LOE LABEL+4 is permitted and so is CON LABEL+3. | ||||
| The resulting address must be in the same fragment as the label. | ||||
| It is not allowed to add or subtract from instruction labels or procedure | ||||
| identifiers, | ||||
| which certainly is not a severe restriction and greatly aids | ||||
| optimization. | ||||
| .P | ||||
| Instruction arguments can be constants, | ||||
| data labels, data labels offset by a constant, instruction | ||||
| labels and procedure identifiers. | ||||
| The range of integers allowed depends on the instruction. | ||||
| Most instructions allow only integers | ||||
| (signed or unsigned) | ||||
| that fit in a word. | ||||
| Arguments used as offsets to pointers should fit in a | ||||
| pointer-sized integer. | ||||
| Finally, arguments to LDC should fit in a double-word integer. | ||||
| .P | ||||
| Several instructions have two possible forms: | ||||
| with an explicit argument and with an implicit argument on top of the stack. | ||||
| The size of the implicit argument is the wordsize. | ||||
| The implicit argument is always popped before all other operands. | ||||
| For example: 'CMI 4' specifies that two four-byte signed | ||||
| integers on top of the stack are to be compared. | ||||
| \&'CMI' without an argument expects a wordsized integer | ||||
| on top of the stack that specifies the size of the integers to | ||||
| be compared. | ||||
| Thus the following two sequences are equivalent: | ||||
| .N 2 | ||||
| .TS | ||||
| center, tab(:) ; | ||||
| l r 30 l r. | ||||
| LDL:-10:LDL:-10 | ||||
| LDL:-14:LDL:-14 | ||||
| ::LOC:4 | ||||
| CMI:4:CMI: | ||||
| ZEQ:*1:ZEQ:*1 | ||||
| .TE 2 | ||||
| Section 11.1.6 shows the arguments allowed for each instruction. | ||||
| .S3 "Pseudoinstruction arguments" | ||||
| Pseudoinstruction arguments can be divided into two classes: | ||||
| initializers and others. | ||||
| The following initializers are allowed: signed integer constants, | ||||
| unsigned integer constants, floating-point constants, strings, | ||||
| data labels, data labels offset by a constant, instruction | ||||
| labels and procedure identifiers. | ||||
| .P | ||||
| Constant initializers in BSS, HOL, CON and ROM pseudoinstructions | ||||
| can be followed by a letter I, U or F. | ||||
| This indicator | ||||
| specifies the type of the initializer: Integer, Unsigned or Float. | ||||
| If no indicator is present I is assumed. | ||||
| The size of the object is the wordsize unless | ||||
| the indicator is followed by an integer specifying the | ||||
| object's size. | ||||
| This integer is governed by the same restrictions as for | ||||
| transfer of objects to/from memory. | ||||
| As in instruction arguments, initializers include expressions of the form: | ||||
| \&"LABEL+offset" and "LABEL-offset". | ||||
| The offset must be an unsigned decimal constant. | ||||
| The 'IUF' indicators cannot be used in the offsets. | ||||
| .P | ||||
| Data labels are referred to by their name. | ||||
| .P | ||||
| Strings are surrounded by double quotes ("). | ||||
| Semicolons in a string do not indicate the start of a comment. | ||||
| In the ASCII representation the escape character \e (backslash) | ||||
| alters the meaning of subsequent character(s). | ||||
| This feature allows inclusion of zeroes, graphic characters and | ||||
| the double quote in the string. | ||||
| The following escape sequences exist: | ||||
| .DS | ||||
| .TS | ||||
| center, tab(:); | ||||
| l l l. | ||||
| newline:NL\|(LF):\en | ||||
| horizontal tab:HT:\et | ||||
| backspace:BS:\eb | ||||
| carriage return:CR:\er | ||||
| form feed:FF:\ef | ||||
| backslash:\e:\e\e | ||||
| double quote:":\e" | ||||
| bit pattern:\fBddd\fP:\e\fBddd\fP | ||||
| .TE | ||||
| .DE | ||||
| The escape \fBddd\fP consists of the backslash followed by 1, | ||||
| 2, or 3 octal digits specifying the value of | ||||
| the desired character. | ||||
| If the character following a backslash is not one of those | ||||
| specified, | ||||
| the backslash is ignored. | ||||
| Example: CON "hello\e012\e0". | ||||
| Each string element initializes a single byte. | ||||
| The ASCII character set is used to map characters onto values. | ||||
| Strings are padded with zeroes up to a multiple of the wordsize. | ||||
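The string rules above (escape sequences, one byte per element, zero padding) can be sketched in Python. The function name `string_bytes` is hypothetical and the wordsize defaults to 2. Two details are assumptions: an octal escape larger than 255 is reduced to its low byte, and a lone backslash at the end of the string is dropped.

```python
# Escape letters from the table above, mapped to ASCII values.
SIMPLE = {"n": 10, "t": 9, "b": 8, "r": 13, "f": 12, "\\": 92, '"': 34}
OCTAL = "01234567"


def string_bytes(body, wordsize=2):
    """Convert the text between the double quotes into zero-padded byte values."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        i += 1
        if ch != "\\":
            out.append(ord(ch))           # ordinary character: one byte
            continue
        if i == len(body):                # lone trailing backslash (assumption: dropped)
            break
        nxt = body[i]
        if nxt in OCTAL:                  # backslash followed by 1 to 3 octal digits
            j = i
            while j < len(body) and j - i < 3 and body[j] in OCTAL:
                j += 1
            out.append(int(body[i:j], 8) & 0xFF)
            i = j
        elif nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 1
        # Otherwise the backslash is ignored; nxt is read as an
        # ordinary character on the next pass through the loop.
    while len(out) % wordsize:            # pad with zeroes to a multiple of the wordsize
        out.append(0)
    return out
```

Applied to the body of the example `CON "hello\012\0"` with a wordsize of 2, this yields the five letters, a line feed, the explicit zero byte and one padding zero.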
| .P | ||||
| Instruction labels are referred to as *1, *2, etc.  in both branch | ||||
| instructions and as initializers. | ||||
| .P | ||||
| The notation $procname means the identifier for the procedure | ||||
| with the specified name. | ||||
| This identifier has the size of a pointer. | ||||
| .S3 Notation | ||||
| First, we give the notation used for the arguments, classes of | ||||
| instructions and pseudoinstructions. | ||||
| .IS 2 | ||||
| .TS | ||||
| tab(:); | ||||
| l l l. | ||||
| <cst>:\&=:integer constant (current range -2**31..2**31-1) | ||||
| <dlb>:\&=:data label | ||||
| <arg>:\&=:<cst> or <dlb> or <dlb>+<cst> or <dlb>-<cst> | ||||
| <con>:\&=:integer constant, unsigned constant, floating-point constant | ||||
| <str>:\&=:string constant (surrounded by double quotes), | ||||
| <ilb>:\&=:instruction label | ||||
| ::'*' followed by an integer in the range 0..32767. | ||||
| <pro>:\&=:procedure number ('$' followed by a procedure name) | ||||
| <val>:\&=:<arg>, <con>, <pro> or <ilb>. | ||||
| <par>:\&=:<val> or <str> | ||||
| <...>*:\&=:zero or more of <...> | ||||
| <...>+:\&=:one or more of <...> | ||||
| [...]:\&=:optional ... | ||||
| .TE | ||||
| .IE | ||||
| .S3 "Pseudoinstructions" | ||||
| .S4 Storage declaration | ||||
| Initialized global data is allocated by the pseudoinstruction CON, | ||||
| which needs at least one argument. | ||||
| For each argument, an integral number of words, | ||||
| determined by the argument type, is allocated and initialized. | ||||
| .P | ||||
| The pseudoinstruction ROM is the same as CON, | ||||
| except that it guarantees that the initialized words | ||||
| will not change during the execution of the program. | ||||
| This information allows optimizers to do | ||||
| certain calculations such as array indexing and | ||||
| subrange checking at compile time instead | ||||
| of at run time. | ||||
| .P | ||||
| The pseudoinstruction BSS allocates | ||||
| uninitialized global data or large blocks of data initialized | ||||
| by the same value. | ||||
| The first argument to this pseudo is the number | ||||
| of bytes required, which must be a multiple of the wordsize. | ||||
| The other arguments specify the value used for initialization and | ||||
| whether the initialization is only for convenience or a strict necessity. | ||||
| The pseudoinstruction HOL is similar to BSS in that it requests an | ||||
| (un)initialized global data block. | ||||
| Addressing of a HOL block, however, is quasi absolute. | ||||
| The first byte is addressed by 0, | ||||
| the second byte by 1 etc. in assembly language. | ||||
| The assembler/loader adds the base address of | ||||
| the HOL block to these numbers to obtain the | ||||
| absolute address in the machine language. | ||||
| .P | ||||
| The scope of a HOL block starts at the HOL pseudo and | ||||
| ends at the next HOL pseudo or at the end of a module, | ||||
| whichever comes first. | ||||
| Each instruction falls in the scope of at most one | ||||
| HOL block, the current HOL block. | ||||
| It is not allowed to have more than one HOL block per procedure. | ||||
| .P | ||||
| The alignment restrictions are enforced by the | ||||
| pseudoinstructions. | ||||
| All objects are aligned on a multiple of their size or of the wordsize, | ||||
| whichever is smaller. | ||||
| Switching to another type of fragment or placing a label forces | ||||
| word-alignment. | ||||
| There are three types of fragments in global data space: CON, ROM and | ||||
| BSS/HOL. | ||||
| .N 2 | ||||
| .IS 2 | ||||
| .PS - 4 | ||||
| .PT "BSS <cst1>,<val>,<cst2>" | ||||
| Reserve <cst1> bytes. | ||||
| <val> is the value used to initialize the area. | ||||
| <cst1> must be a multiple of the size of <val>. | ||||
| <cst2> is 0 if the initialization is not strictly necessary, | ||||
| 1 if it is. | ||||
| .PT "HOL <cst1>,<val>,<cst2>" | ||||
| Idem, but all following absolute global data references will | ||||
| refer to this block. | ||||
| Only one HOL is allowed per procedure; | ||||
| it has to be placed before the first instruction. | ||||
| .PT "CON <val>+" | ||||
| Assemble global data words initialized with the <val> constants. | ||||
| .PT "ROM <val>+" | ||||
| Idem, but the initialized data will never be changed by the program. | ||||
| .PE | ||||
| .IE | ||||
| .S4 Partitioning | ||||
| Two pseudoinstructions partition the input into procedures: | ||||
| .IS 2 | ||||
| .PS - 4 | ||||
| .PT "PRO <pro>[,<cst>]" | ||||
| Start of procedure. | ||||
| <pro> is the procedure name. | ||||
| <cst> is the number of bytes for locals. | ||||
| The number of bytes for locals must be specified in the PRO or | ||||
| END pseudoinstruction. | ||||
| When specified in both, they must be identical. | ||||
| .PT "END  [<cst>]" | ||||
| End of Procedure. | ||||
| <cst> is the number of bytes for locals. | ||||
| The number of bytes for locals must be specified in either the PRO or | ||||
| END pseudoinstruction or both. | ||||
| .PE | ||||
| .IE | ||||
| .S4 Visibility | ||||
| Names of data and procedures in an EM module can either be | ||||
| internal or external. | ||||
| External names are known outside the module and are used to link | ||||
| several pieces of a program. | ||||
| Internal names are not known outside the modules they are used in. | ||||
| Other modules will not 'see' an internal name. | ||||
| .A | ||||
| To reduce the number of passes needed, | ||||
| it must be known at the first occurrence whether | ||||
| a name is internal or external. | ||||
| If the first occurrence of a name is in a definition, | ||||
| the name is considered to be internal. | ||||
| If the first occurrence of a name is a reference, | ||||
| the name is considered to be external. | ||||
| If the first occurrence is in one of the following pseudoinstructions, | ||||
| the effect of the pseudo has precedence. | ||||
| .IS 2 | ||||
| .PS - 4 | ||||
| .PT "EXA <dlb>" | ||||
| External name. | ||||
| <dlb> is known, possibly defined, outside this module. | ||||
| Note that <dlb> may be defined in the same module. | ||||
| .PT "EXP <pro>" | ||||
| External procedure identifier. | ||||
| Note that <pro> may be defined in the same module. | ||||
| .PT "INA <dlb>" | ||||
| Internal name. | ||||
| <dlb> is internal to this module and must be defined in this module. | ||||
| .PT "INP <pro>" | ||||
| Internal procedure. | ||||
| <pro> is internal to this module and must be defined in this module. | ||||
| .PE | ||||
| .IE | ||||
| .S4 Miscellaneous | ||||
| Two other pseudoinstructions provide miscellaneous features: | ||||
| .IS 2 | ||||
| .PS - 4 | ||||
| .PT "EXC <cst1>,<cst2>" | ||||
| Two blocks of instructions preceding this one are | ||||
| interchanged before being processed. | ||||
| <cst1> gives the number of lines of the first block. | ||||
| <cst2> gives the number of lines of the second one. | ||||
| Blank and pure comment lines do not count. | ||||
| .PT "MES <cst>[,<par>]*" | ||||
| A special type of comment. | ||||
| Used by compilers to communicate with the | ||||
| optimizer, assembler, etc. as follows: | ||||
| .VS 1 0 | ||||
| .PS - 4 | ||||
| .PT "MES 0" | ||||
| An error has occurred, stop further processing. | ||||
| .PT "MES 1" | ||||
| Suppress optimization. | ||||
| .PT "MES 2,<cst1>,<cst2>" | ||||
| Use wordsize <cst1> and pointer size <cst2>. | ||||
| .PT "MES 3,<cst1>,<cst2>,<cst3>,<cst4>" | ||||
| Indicates that a local variable is never referenced indirectly. | ||||
| Used to indicate that a register may be used for a specific | ||||
| variable. | ||||
| <cst1> is offset in bytes from AB if positive | ||||
| and offset from LB if negative. | ||||
| <cst2> gives the size of the variable. | ||||
| <cst3> indicates the class of the variable. | ||||
| The following values are currently recognized: | ||||
| .PS | ||||
| .PT 0 | ||||
| The variable can be used for anything. | ||||
| .PT 1 | ||||
| The variable is used as a loop index. | ||||
| .PT 2 | ||||
| The variable is used as a pointer. | ||||
| .PT 3 | ||||
| The variable is used as a floating point number. | ||||
| .PE 0 | ||||
| <cst4> gives the priority of the variable, | ||||
| higher numbers indicate better candidates. | ||||
| .PT "MES 4,<cst>,<str>" | ||||
| Number of source lines in file <str> (for profiler). | ||||
| .PT "MES 5" | ||||
| Floating point used. | ||||
| .PT "MES 6,<val>*" | ||||
| Comment.  Used to provide comments in compact assembly language. | ||||
| .PT "MES 7,....." | ||||
| Reserved. | ||||
| .PT "MES 8,<pro>[,<dlb>]..." | ||||
| Library module. Indicates that the module may only be loaded | ||||
| if it is useful, that is, if it can satisfy any unresolved | ||||
| references during the loading process. | ||||
| May not be preceded by any other pseudo, except MES's. | ||||
| .PT "MES 9,<cst>" | ||||
| Guarantees that no more than <cst> bytes of parameters are | ||||
| accessed, either directly or indirectly. | ||||
| .PE 1 | ||||
| .VS 1 1 | ||||
| Each backend is free to skip irrelevant MES pseudos. | ||||
| .PE | ||||
| .IE | ||||
| .S2 "The Compact Assembly Language" | ||||
| The assembler accepts input in a highly encoded form. | ||||
| This | ||||
| form is intended to reduce the amount of file transport between the | ||||
| front ends, optimizers | ||||
| and back ends, and also reduces the amount of storage required for storing | ||||
| libraries. | ||||
| Libraries are stored as archived compact assembly language, not machine | ||||
| language. | ||||
| .P | ||||
| When beginning to read the input, the assembler is in neutral state, and | ||||
| expects either a label or an instruction (including the pseudoinstructions). | ||||
| The meaning of the next byte(s) when in neutral state is as follows, where | ||||
| b1, b2 | ||||
| etc. represent the succeeding bytes. | ||||
| .N 1 | ||||
| .DS | ||||
| .TS | ||||
| tab(:) ; | ||||
| rw17 4 l. | ||||
| 0:Reserved for future use | ||||
| 1-129:Machine instructions, see Appendix A, alphabetical list | ||||
| 130-149:Reserved for future use | ||||
| 150-161:BSS,CON,END,EXA,EXC,EXP,HOL,INA,INP,MES,PRO,ROM | ||||
| 162-179:Reserved for future pseudoinstructions | ||||
| 180-239:Instruction labels 0 - 59  (180 is local label 0 etc.) | ||||
| 240-244:See the Common Table below | ||||
| 245-255:Not used | ||||
| .TE 1 | ||||
| .DE 0 | ||||
| After a label, the assembler is back in neutral state; it can immediately | ||||
| accept another label or an instruction in the next byte. | ||||
| No linefeeds are used to separate lines. | ||||
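| .P | ||||
| As an illustration, the neutral-state table can be written as a small | ||||
| Python classifier. | ||||
| The function name and the returned tags are hypothetical; | ||||
| only the byte ranges and the order of the pseudoinstructions come | ||||
| from the table. | ||||

```python
# Pseudoinstructions in the order given for bytes 150-161.
PSEUDOS = ["BSS", "CON", "END", "EXA", "EXC", "EXP",
           "HOL", "INA", "INP", "MES", "PRO", "ROM"]


def classify_neutral(b):
    """Classify one byte read while the assembler is in neutral state."""
    if not 0 <= b <= 255:
        raise ValueError("not a byte")
    if 1 <= b <= 129:
        return ("instruction", b)           # machine instruction number
    if 150 <= b <= 161:
        return ("pseudo", PSEUDOS[b - 150])
    if 180 <= b <= 239:
        return ("label", b - 180)           # instruction labels 0..59
    if 240 <= b <= 244:
        return ("common", b)                # continue with the Common Table
    if 245 <= b <= 255:
        return ("unused", b)
    return ("reserved", b)                  # 0, 130-149 and 162-179
```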
| .P | ||||
| If an opcode expects no arguments, | ||||
| the assembler is back in neutral state after | ||||
| reading the one byte containing the instruction number. | ||||
| If it has one or | ||||
| more arguments (only pseudos have more than 1), the arguments follow directly, | ||||
| encoded as follows: | ||||
| .N 1 | ||||
| .IS 2 | ||||
| .TS | ||||
| tab(:); | ||||
| r l. | ||||
| 0-239:Offsets from -120 to 119 | ||||
| 
 | ||||
| 240-255:See the Common Table below | ||||
| .TE 1 | ||||
| Absence of an optional argument is indicated by a special | ||||
| byte. | ||||
| .IE 2 | ||||
| .CS | ||||
| Common Table for Neutral State and Arguments | ||||
| .CE | ||||
| .TS | ||||
| tab(:); | ||||
| c c s c | ||||
| l8 l l8 l. | ||||
| class:bytes:description | ||||
| 
 | ||||
| <ilb>:240:b1:Instruction label b1  (Not used for branches) | ||||
| <ilb>:241:b1 b2:16 bit instruction label  (256*b2 + b1) | ||||
| <dlb>:242:b1:Global label .0-.255, with b1 being the label | ||||
| <dlb>:243:b1 b2:Global label .0-.32767 | ||||
| :::with 256*b2+b1 being the label | ||||
| <dlb>:244:<string>:Global symbol not of the form .nnn | ||||
| <cst>:245:b1 b2:16 bit constant | ||||
| <cst>:246:b1 b2 b3 b4:32 bit constant | ||||
| <cst>:247:b1 .. b8:64 bit constant | ||||
| <arg>:248:<dlb><cst>:Global label + (possibly negative) constant | ||||
| <pro>:249:<string>:Procedure name  (not including $) | ||||
| <str>:250:<string>:String used in CON or ROM (no quotes-no escapes) | ||||
| <con>:251:<cst><string>:Integer constant, size <cst> bytes | ||||
| <con>:252:<cst><string>:Unsigned constant, size <cst> bytes | ||||
| <con>:253:<cst><string>:Floating constant, size <cst> bytes | ||||
| :254::unused | ||||
| <end>:255::Delimiter for argument lists or | ||||
| :::indicates absence of optional argument | ||||
| .TE 1 | ||||
| .P | ||||
| The bytes specifying the value of a 16, 32 or 64 bit constant | ||||
| are presented in two's complement notation, with the least | ||||
| significant byte first. For example: the value of a 32 bit | ||||
| constant is ((s4*256+b3)*256+b2)*256+b1, where s4 is b4-256 if | ||||
| b4 is greater than 127; otherwise s4 takes the value of b4. | ||||
| A <string> consists of a <cst> immediately followed by | ||||
| a sequence of bytes with length <cst>. | ||||
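| .P | ||||
| A minimal Python sketch of reading a <cst> and a <string> from | ||||
| compact code follows. | ||||
| The names read_cst and read_string are hypothetical, and rejecting a | ||||
| negative string length is an assumption; | ||||
| the signed little-endian conversion performs the sign adjustment | ||||
| described for s4. | ||||

```python
# Common Table tags for multi-byte constants and their widths in bytes.
CST_WIDTHS = {245: 2, 246: 4, 247: 8}


def read_cst(buf, pos):
    """Read a <cst> at buf[pos]; return (value, position after it)."""
    tag = buf[pos]
    if tag < 240:
        return tag - 120, pos + 1          # one-byte form: -120..119
    if tag not in CST_WIDTHS:
        raise ValueError("byte %d does not start a <cst>" % tag)
    width = CST_WIDTHS[tag]
    raw = bytes(buf[pos + 1:pos + 1 + width])
    if len(raw) != width:
        raise ValueError("truncated constant")
    # Least significant byte first, two's complement.
    return int.from_bytes(raw, "little", signed=True), pos + 1 + width


def read_string(buf, pos):
    """Read a <string>: a <cst> length followed by that many bytes."""
    length, pos = read_cst(buf, pos)
    if length < 0 or pos + length > len(buf):
        raise ValueError("bad string length")
    return bytes(buf[pos:pos + length]), pos + length
```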
| .P | ||||
| .ne 8 | ||||
| The pseudoinstructions fall into several categories, depending on their | ||||
| arguments: | ||||
| .N 1 | ||||
| .DS | ||||
|  Group 1 -- EXC, BSS, HOL have a known number of arguments | ||||
|  Group 2 -- EXA, EXP, INA, INP have a string as argument | ||||
|  Group 3 -- CON, MES, ROM have a variable number of various things | ||||
|  Group 4 -- END, PRO have a trailing optional argument. | ||||
| .DE 1 | ||||
| Groups 1 and 2 | ||||
| use the encoding described above. | ||||
| Group 3 also uses the encoding listed above, with an <end> byte after the | ||||
| last argument to indicate the end of the list. | ||||
| Group 4 uses | ||||
| an <end> byte if the trailing argument is not present. | ||||
| .N 2 | ||||
| .IS 2 | ||||
| .TS | ||||
| tab(|); | ||||
| l s l | ||||
| l s s | ||||
| l 2 lw(46) l. | ||||
| Example  ASCII|Example compact | ||||
| (LOC = 69, BRA = 18 here): | ||||
| 
 | ||||
| 2||182 | ||||
| 1||181 | ||||
|  LOC|10|69 130 | ||||
|  LOC|-10|69 110 | ||||
|  LOC|300|69 245 44 1 | ||||
|  BRA|*19|18 139 | ||||
| 300||241 44 1 | ||||
| .3||242 3 | ||||
|  CON|4,9,*2,$foo|151 124 129 240 2 249 123 102 111 111 255 | ||||
|  CON|.35|151 242 35 255 | ||||
| .TE 0 | ||||
| .IE 0 | ||||
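| .P | ||||
| The examples in the table can be reproduced with a few lines of | ||||
| Python. | ||||
| This is a sketch under assumptions: the helper names are | ||||
| hypothetical, the opcode numbers are the ones quoted in the table, | ||||
| and encoding a branch target like a constant is inferred from the | ||||
| BRA example rather than stated in the text. | ||||

```python
LOC, BRA, CON = 69, 18, 151   # opcode numbers quoted in the example table
END_MARK = 255                # <end> delimiter


def enc_cst(v):
    """Encode an integer constant in the shortest form of the Common Table."""
    if -120 <= v <= 119:
        return [v + 120]
    if -2**15 <= v < 2**15:
        return [245] + list(v.to_bytes(2, "little", signed=True))
    if -2**31 <= v < 2**31:
        return [246] + list(v.to_bytes(4, "little", signed=True))
    return [247] + list(v.to_bytes(8, "little", signed=True))


def enc_label_def(n):
    """Encode the definition of instruction label n in neutral state."""
    if n < 60:
        return [180 + n]
    return enc_ilb(n)


def enc_ilb(n):
    """Encode instruction label n as an initializer (not for branches)."""
    return [240, n] if n < 256 else [241, n & 255, n >> 8]


def enc_dlb(n):
    """Encode the numeric global label .n"""
    return [242, n] if n < 256 else [243, n & 255, n >> 8]


def enc_pro(name):
    """Encode a procedure name (without the $) as tag 249 plus a <string>."""
    data = name.encode("ascii")
    return [249] + enc_cst(len(data)) + list(data)


# CON 4,9,*2,$foo
con_example = ([CON] + enc_cst(4) + enc_cst(9) + enc_ilb(2)
               + enc_pro("foo") + [END_MARK])
```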
| .BP | ||||
| .S2 "Assembly language instruction list" | ||||
| .P | ||||
| For each instruction in the list the range of argument values | ||||
| in the assembly language is given. | ||||
| The column headed \fIassem\fP contains the mnemonics defined | ||||
| in 11.1.3. | ||||
| The following column specifies restrictions of the argument | ||||
| value. | ||||
| Addresses have to obey the restrictions mentioned in chapter 2. | ||||
| The classes of arguments | ||||
| are indicated by letters: | ||||
| .ds b \fBb\fP | ||||
| .ds c \fBc\fP | ||||
| .ds d \fBd\fP | ||||
| .ds g \fBg\fP | ||||
| .ds f \fBf\fP | ||||
| .ds l \fBl\fP | ||||
| .ds n \fBn\fP | ||||
| .ds w \fBw\fP | ||||
| .ds p \fBp\fP | ||||
| .ds r \fBr\fP | ||||
| .ds s \fBs\fP | ||||
| .ds z \fBz\fP | ||||
| .ds o \fBo\fP | ||||
| .ds - \fB-\fP | ||||
| .N 1 | ||||
| .TS | ||||
| tab(:); | ||||
| c s l l | ||||
| l l 15 l l. | ||||
| \fIassem\fP:constraints:rationale | ||||
| 
 | ||||
| \&\*c:cst:fits word:constant | ||||
| \&\*d:cst:fits double word:constant | ||||
| \&\*l:cst::local offset | ||||
| \&\*g:arg:>= 0:global offset | ||||
| \&\*f:cst::fragment offset | ||||
| \&\*n:cst:>= 0:counter | ||||
| \&\*s:cst:>0 , word multiple:object size | ||||
| \&\*z:cst:>= 0 , zero or word multiple:object size | ||||
| \&\*o:cst:>= 0 , word multiple or fraction:object size | ||||
| \&\*w:cst:> 0 , word multiple:object size * | ||||
| \&\*p:pro::pro identifier | ||||
| \&\*b:ilb:>= 0:label number | ||||
| \&\*r:cst:0,1,2:register number | ||||
| \&\*-:::no argument | ||||
| .TE 1 | ||||
| .P | ||||
| The * at the rationale for \*w indicates that the argument | ||||
| can be given either as an instruction argument or on top of the stack. | ||||
| If the argument is omitted, the argument is fetched from the | ||||
| stack; | ||||
| it is assumed to be a wordsized unsigned integer. | ||||
| Instructions that check for undefined integer or floating-point | ||||
| values and underflow or overflow | ||||
| are indicated below by (*). | ||||
| .N 1 | ||||
| .DS B | ||||
| GROUP 1 - LOAD | ||||
| 
 | ||||
|   LOC \*c : Load constant (i.e. push one word onto the stack) | ||||
|   LDC \*d : Load double constant ( push two words ) | ||||
|   LOL \*l : Load word at \*l-th local (\*l<0) or parameter (\*l>=0) | ||||
|   LOE \*g : Load external word \*g | ||||
|   LIL \*l : Load word pointed to by \*l-th local or parameter | ||||
|   LOF \*f : Load offsetted (top of stack + \*f yield address) | ||||
|   LAL \*l : Load address of local or parameter | ||||
|   LAE \*g : Load address of external | ||||
|   LXL \*n : Load lexical (address of LB \*n static levels back) | ||||
|   LXA \*n : Load lexical (address of AB \*n static levels back) | ||||
|   LOI \*o : Load indirect \*o bytes (address is popped from the stack) | ||||
|   LOS \*w : Load indirect, \*w-byte integer on top of stack gives object size | ||||
|   LDL \*l : Load double local or parameter (two consecutive words are stacked) | ||||
|   LDE \*g : Load double external (two consecutive externals are stacked) | ||||
|   LDF \*f : Load double offsetted (top of stack + \*f yield address) | ||||
|   LPI \*p : Load procedure identifier | ||||
| 
 | ||||
| GROUP 2 - STORE | ||||
| 
 | ||||
|   STL \*l : Store local or parameter | ||||
|   STE \*g : Store external | ||||
|   SIL \*l : Store into word pointed to by \*l-th local or parameter | ||||
|   STF \*f : Store offsetted | ||||
|   STI \*o : Store indirect \*o bytes (pop address, then data) | ||||
|   STS \*w : Store indirect, \*w-byte integer on top of stack gives object size | ||||
|   SDL \*l : Store double local or parameter | ||||
|   SDE \*g : Store double external | ||||
|   SDF \*f : Store double offsetted | ||||
| 
 | ||||
| GROUP 3 - INTEGER ARITHMETIC | ||||
| 
 | ||||
|   ADI \*w : Addition (*) | ||||
|   SBI \*w : Subtraction (*) | ||||
|   MLI \*w : Multiplication (*) | ||||
|   DVI \*w : Division (*) | ||||
|   RMI \*w : Remainder (*) | ||||
|   NGI \*w : Negate (two's complement) (*) | ||||
|   SLI \*w : Shift left (*) | ||||
|   SRI \*w : Shift right (*) | ||||
| 
 | ||||
| GROUP 4 - UNSIGNED ARITHMETIC | ||||
| 
 | ||||
|   ADU \*w : Addition | ||||
|   SBU \*w : Subtraction | ||||
|   MLU \*w : Multiplication | ||||
|   DVU \*w : Division | ||||
|   RMU \*w : Remainder | ||||
|   SLU \*w : Shift left | ||||
|   SRU \*w : Shift right | ||||
| 
 | ||||
| GROUP 5 - FLOATING POINT ARITHMETIC | ||||
| 
 | ||||
|   ADF \*w : Floating add (*) | ||||
|   SBF \*w : Floating subtract (*) | ||||
|   MLF \*w : Floating multiply (*) | ||||
|   DVF \*w : Floating divide (*) | ||||
|   NGF \*w : Floating negate (*) | ||||
|   FIF \*w : Floating multiply and split integer and fraction part (*) | ||||
|   FEF \*w : Split floating number in exponent and fraction part (*) | ||||
| 
 | ||||
| GROUP 6 - POINTER ARITHMETIC | ||||
| 
 | ||||
|   ADP \*f : Add \*f to pointer on top of stack | ||||
|   ADS \*w : Add \*w-byte value and pointer | ||||
|   SBS \*w : Subtract pointers in same fragment and push diff as size \*w integer | ||||
| 
 | ||||
| GROUP 7 - INCREMENT/DECREMENT/ZERO | ||||
| 
 | ||||
|   INC \*- : Increment word on top of stack by 1 (*) | ||||
|   INL \*l : Increment local or parameter (*) | ||||
|   INE \*g : Increment external (*) | ||||
|   DEC \*- : Decrement word on top of stack by 1 (*) | ||||
|   DEL \*l : Decrement local or parameter (*) | ||||
|   DEE \*g : Decrement external (*) | ||||
|   ZRL \*l : Zero local or parameter | ||||
|   ZRE \*g : Zero external | ||||
|   ZRF \*w : Load a floating zero of size \*w | ||||
|   ZER \*w : Load \*w zero bytes | ||||
| 
 | ||||
| GROUP 8 - CONVERT    (stack: source, source size, dest. size (top)) | ||||
| 
 | ||||
|   CII \*- : Convert integer to integer (*) | ||||
|   CUI \*- : Convert unsigned to integer (*) | ||||
|   CFI \*- : Convert floating to integer (*) | ||||
|   CIF \*- : Convert integer to floating (*) | ||||
|   CUF \*- : Convert unsigned to floating (*) | ||||
|   CFF \*- : Convert floating to floating (*) | ||||
|   CIU \*- : Convert integer to unsigned | ||||
|   CUU \*- : Convert unsigned to unsigned | ||||
|   CFU \*- : Convert floating to unsigned | ||||
| 
 | ||||
| GROUP 9 - LOGICAL | ||||
| 
 | ||||
|   AND \*w : Boolean and on two groups of \*w bytes | ||||
|   IOR \*w : Boolean inclusive or on two groups of \*w bytes | ||||
|   XOR \*w : Boolean exclusive or on two groups of \*w bytes | ||||
|   COM \*w : Complement (one's complement of top \*w bytes) | ||||
|   ROL \*w : Rotate left a group of \*w bytes | ||||
|   ROR \*w : Rotate right a group of \*w bytes | ||||
| 
 | ||||
| GROUP 10 - SETS | ||||
| 
 | ||||
|   INN \*w : Bit test on \*w byte set (bit number on top of stack) | ||||
|   SET \*w : Create singleton \*w byte set with bit n on (n is top of stack) | ||||
| 
 | ||||
| GROUP 11 - ARRAY | ||||
| 
 | ||||
|   LAR \*w : Load array element, descriptor contains integers of size \*w | ||||
|   SAR \*w : Store array element | ||||
|   AAR \*w : Load address of array element | ||||
| 
 | ||||
| GROUP 12 - COMPARE | ||||
| 
 | ||||
|   CMI \*w : Compare \*w byte integers; push negative, zero or positive for <, = or > | ||||
|   CMF \*w : Compare \*w byte reals | ||||
|   CMU \*w : Compare \*w byte unsigneds | ||||
|   CMS \*w : Compare \*w byte values, can only be used for bit for bit equality test | ||||
|   CMP \*- : Compare pointers | ||||
| 
 | ||||
|   TLT \*- : True if less, i.e. iff top of stack < 0 | ||||
|   TLE \*- : True if less or equal, i.e. iff top of stack <= 0 | ||||
|   TEQ \*- : True if equal, i.e. iff top of stack = 0 | ||||
|   TNE \*- : True if not equal, i.e. iff top of stack non zero | ||||
|   TGE \*- : True if greater or equal, i.e. iff top of stack >= 0 | ||||
|   TGT \*- : True if greater, i.e. iff top of stack > 0 | ||||
| 
 | ||||
| GROUP 13 - BRANCH | ||||
| 
 | ||||
|   BRA \*b : Branch unconditionally to label \*b | ||||
| 
 | ||||
|   BLT \*b : Branch less (pop 2 words, branch if top > second) | ||||
|   BLE \*b : Branch less or equal | ||||
|   BEQ \*b : Branch equal | ||||
|   BNE \*b : Branch not equal | ||||
|   BGE \*b : Branch greater or equal | ||||
|   BGT \*b : Branch greater | ||||
| 
 | ||||
|   ZLT \*b : Branch less than zero (pop 1 word, branch negative) | ||||
|   ZLE \*b : Branch less or equal to zero | ||||
|   ZEQ \*b : Branch equal zero | ||||
|   ZNE \*b : Branch not zero | ||||
|   ZGE \*b : Branch greater or equal zero | ||||
|   ZGT \*b : Branch greater than zero | ||||
| 
 | ||||
| GROUP 14 - PROCEDURE CALL | ||||
| 
 | ||||
|   CAI \*- : Call procedure (procedure identifier on stack) | ||||
|   CAL \*p : Call procedure (with identifier \*p) | ||||
|   LFR \*s : Load function result | ||||
|   RET \*z : Return (function result consists of top \*z bytes) | ||||
| 
 | ||||
| GROUP 15 - MISCELLANEOUS | ||||
| 
 | ||||
|   ASP \*f : Adjust the stack pointer by \*f | ||||
|   ASS \*w : Adjust the stack pointer by \*w-byte integer | ||||
|   BLM \*z : Block move \*z bytes; first pop destination addr, then source addr | ||||
|   BLS \*w : Block move, size is in \*w-byte integer on top of stack | ||||
|   CSA \*w : Case jump; address of jump table at top of stack | ||||
|   CSB \*w : Table lookup jump; address of jump table at top of stack | ||||
|   DCH \*- : Follow dynamic chain, convert LB to LB of caller | ||||
|   DUP \*s : Duplicate top \*s bytes | ||||
|   DUS \*w : Duplicate top \*w bytes | ||||
|   EXG \*w : Exchange top \*w bytes | ||||
|   FIL \*g : File name (external 4 := \*g) | ||||
|   GTO \*g : Non-local goto, descriptor at \*g | ||||
|   LIM \*- : Load 16 bit ignore mask | ||||
|   LIN \*n : Line number (external 0 := \*n) | ||||
|   LNI \*- : Line number increment | ||||
|   LOR \*r : Load register (0=LB, 1=SP, 2=HP) | ||||
|   LPB \*- : Convert local base to argument base | ||||
|   MON \*- : Monitor call | ||||
|   NOP \*- : No operation | ||||
|   RCK \*w : Range check; trap on error | ||||
|   RTT \*- : Return from trap | ||||
|   SIG \*- : Trap errors to proc identifier on top of stack, -2 resets default | ||||
|   SIM \*- : Store 16 bit ignore mask | ||||
|   STR \*r : Store register (0=LB, 1=SP, 2=HP) | ||||
|   TRP \*- : Cause trap to occur (Error number on stack) | ||||
| .DE 0 | ||||
							
								
								
									
										164
									
								
								doc/em/descr.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,164 @@ | |||
| .SN 7 | ||||
| .BP | ||||
| .S1 "DESCRIPTORS" | ||||
| Several instructions use descriptors, notably the range check instruction, | ||||
| the array instructions, the goto instruction and the case jump instructions. | ||||
| Descriptors reside in data space. | ||||
| They may be constructed at run time, but | ||||
| more often they are fixed and allocated in ROM data. | ||||
| .P | ||||
| All instructions using descriptors, except GTO, have as argument | ||||
| the size of the integers in the descriptor. | ||||
| All implementations have to allow integers of the size of a | ||||
| word in descriptors. | ||||
| All integers popped from the stack and used for indexing or comparing | ||||
| must have the same size as the integers in the descriptor. | ||||
| .S2 "Range check descriptors" | ||||
| Range check descriptors consist of two integers: | ||||
| .IS 2 | ||||
| .PS 1 4 "" . | ||||
| .PT | ||||
| lower bound~~~~~~~signed | ||||
| .PT | ||||
| upper bound~~~~~~~signed | ||||
| .PE | ||||
| .IE | ||||
| The range check instruction checks an integer on the stack against | ||||
| these bounds and causes a trap if the value is outside the interval. | ||||
| The value itself is neither changed nor removed from the stack. | ||||
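| .P | ||||
| The check can be sketched in Python as follows. | ||||
| The stack is modelled as a list whose last element is the top, and | ||||
| the descriptor is passed directly as a pair instead of through a | ||||
| pointer; both choices, and the names rck and RangeTrap, are | ||||
| simplifying assumptions. | ||||

```python
class RangeTrap(Exception):
    """Stands in for the trap raised by a failed range check."""


def rck(stack, descriptor):
    """Check the top of stack against (lower bound, upper bound)."""
    lower, upper = descriptor
    value = stack[-1]          # inspect only; the value stays on the stack
    if not lower <= value <= upper:
        raise RangeTrap("%d not in %d..%d" % (value, lower, upper))
```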
| .S2 "Array descriptors" | ||||
| Each array descriptor describes a single dimension. | ||||
| For multi-dimensional arrays, several array instructions are | ||||
| needed to access a single element. | ||||
| Array descriptors contain the following three integers: | ||||
| .IS 2 | ||||
| .PS 1 4 "" . | ||||
| .PT | ||||
| lower bound~~~~~~~~~~~~~~~~~~~~~signed | ||||
| .PT | ||||
| upper bound - lower bound~~~~~~~unsigned | ||||
| .PT | ||||
| number of bytes per element~~~~~unsigned | ||||
| .PE | ||||
| .IE | ||||
| The array instructions LAR, SAR and AAR have the pointer to the start | ||||
| of the descriptor as operand on the stack. | ||||
| .sp | ||||
| The element A[I] is fetched as follows: | ||||
| .IS 2 | ||||
| .PS 1 4 "" . | ||||
| .PT | ||||
| Stack the address of A  (e.g., using LAE or LAL) | ||||
| .PT | ||||
| Stack the value of I (n-byte integer) | ||||
| .PT | ||||
| Stack the pointer to the descriptor (e.g., using LAE) | ||||
| .PT | ||||
| LAR n (n is the size of the integers in the descriptor and I) | ||||
| .PE | ||||
| .IE | ||||
| All array instructions first pop the address of the descriptor | ||||
| and the index. | ||||
| If the index is not within the bounds specified, a trap occurs. | ||||
| If the index is within bounds, (I~-~lower bound) is multiplied | ||||
| by the number of bytes per element (the third word).  The result is added | ||||
| to the address of A and replaces A on the stack. | ||||
| .A | ||||
| At this point LAR, SAR and AAR diverge. | ||||
| AAR is finished.  LAR pops the address and fetches the data | ||||
| item, | ||||
| the size being specified by the descriptor. | ||||
| The usual restrictions for memory access must be obeyed. | ||||
| SAR pops the address and stores the | ||||
| data item now exposed. | ||||
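| .P | ||||
| The address computation shared by LAR, SAR and AAR can be sketched | ||||
| in Python. | ||||
| The descriptor is passed as a triple rather than through a pointer, | ||||
| and the names element_address and ArrayTrap are hypothetical. | ||||

```python
class ArrayTrap(Exception):
    """Stands in for the trap raised when the index is out of bounds."""


def element_address(base, index, descriptor):
    """Return the address of element `index` of the array starting at `base`.

    descriptor = (lower bound, upper bound - lower bound, bytes per element)
    """
    lower, span, element_size = descriptor
    offset = index - lower
    if not 0 <= offset <= span:
        raise ArrayTrap("index %d out of bounds" % index)
    return base + offset * element_size
```

| With a descriptor for A[1..10] of 4-byte elements, (1, 9, 4), and A | ||||
| at address 1000, element A[3] is found at address 1008. | ||||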
| .S2 "Non-local goto descriptors" | ||||
| The GTO instruction provides a way of returning directly to any | ||||
| active procedure invocation. | ||||
| The argument of the instruction is the address of a descriptor | ||||
| containing three pointers: | ||||
| .IS 2 | ||||
| .PS 1 4 "" . | ||||
| .PT | ||||
| value of PC after the jump | ||||
| .PT | ||||
| value of SP after the jump | ||||
| .PT | ||||
| value of LB after the jump | ||||
| .PE | ||||
| .IE | ||||
| GTO loads PC, SP and LB from the descriptor, | ||||
| thereby jumping to a procedure | ||||
| and removing zero or more frames from the stack. | ||||
| The LB, SP and PC in the descriptor must belong to a | ||||
| dynamically enclosing procedure, | ||||
| because some EM implementations will need to backtrack through | ||||
| the dynamic chain and use the implementation dependent data | ||||
| in frames to restore registers etc. | ||||
| .S2 "Case descriptors" | ||||
| The case jump instructions CSA and CSB both | ||||
| provide multiway branches selected by a case index. | ||||
| Both fetch two operands from the stack: | ||||
| first a pointer to the low address of the case descriptor | ||||
| and then the case index. | ||||
| CSA uses the case index as index in the descriptor table, but CSB searches | ||||
| the table for an occurrence of the case index. | ||||
| Therefore, the descriptors for CSA and CSB, | ||||
| as shown in figure 4, are different. | ||||
| All pointers in the table must be addresses of instructions in the | ||||
| procedure executing the case instruction. | ||||
| .P | ||||
| CSA selects the new PC by indexing. | ||||
| If the index, a signed integer, is greater than or equal to | ||||
| the lower bound and less than or equal to the upper bound, | ||||
| then fetch the new PC from the list of instruction pointers by indexing with | ||||
| index-lower. | ||||
| The table does not contain the value of the upper bound, | ||||
| but the value of upper-lower as an unsigned integer. | ||||
| If the index is out of bounds or if the fetched pointer is 0, | ||||
| then fetch the default instruction pointer. | ||||
| If the resulting PC is 0, then trap. | ||||
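| .P | ||||
| A Python sketch of the CSA selection follows. | ||||
| The descriptor is modelled as a list in low-to-high address order | ||||
| (default pointer, lower bound, upper - lower, then the pointers); | ||||
| the names csa and CaseTrap are hypothetical. | ||||

```python
class CaseTrap(Exception):
    """Stands in for the trap raised when no target is available."""


def csa(descriptor, index):
    """Select the new PC by indexing into a CSA descriptor."""
    default, lower, span = descriptor[0], descriptor[1], descriptor[2]
    target = 0
    if lower <= index <= lower + span:
        target = descriptor[3 + (index - lower)]
    if target == 0:
        # Out of bounds, or an empty slot in the table: use the default.
        target = default
    if target == 0:
        raise CaseTrap("no target for case index %d" % index)
    return target
```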
| .P | ||||
| CSB selects the new PC by searching. | ||||
| The table is searched for an entry with index value equal to the case index. | ||||
| That entry or, if none is found, the default entry contains the | ||||
| new PC. | ||||
| When the resulting PC is 0, a trap is performed. | ||||
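| .P | ||||
| The CSB search can be sketched in the same way. | ||||
| The descriptor is again a list in low-to-high address order | ||||
| (default pointer, number of entries, then index/pointer pairs). | ||||
| Following the text literally, a matching entry whose pointer is 0 | ||||
| traps without falling back to the default; the names csb and | ||||
| CaseTrap are hypothetical. | ||||

```python
class CaseTrap(Exception):
    """Stands in for the trap raised when the resulting PC is 0."""


def csb(descriptor, index):
    """Select the new PC by searching a CSB descriptor."""
    default, count = descriptor[0], descriptor[1]
    target = default
    for k in range(count):
        if descriptor[2 + 2 * k] == index:
            target = descriptor[3 + 2 * k]
            break
    if target == 0:
        raise CaseTrap("no target for case index %d" % index)
    return target
```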
| .P | ||||
| The choice of which case instruction to use for | ||||
| each source language case statement | ||||
| is up to the front end. | ||||
| If the range of the index value is dense, i.e. | ||||
| .DS | ||||
| (highest value - lowest value) / number of cases | ||||
| .DE 1 | ||||
| is less than some threshold, then CSA is the obvious choice. | ||||
| If the range is sparse, CSB is better. | ||||
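| .P | ||||
| A front end might apply this rule as in the following Python sketch. | ||||
| The function name and the threshold value of 3 are assumptions; | ||||
| the text leaves the threshold to the front end. | ||||

```python
def choose_case_instruction(case_values, threshold=3.0):
    """Pick CSA for dense case values and CSB for sparse ones."""
    values = sorted(set(case_values))
    if not values:
        raise ValueError("a case statement needs at least one case")
    # (highest value - lowest value) / number of cases
    ratio = (values[-1] - values[0]) / len(values)
    return "CSA" if ratio < threshold else "CSB"
```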
| .N 2 | ||||
| .DS | ||||
|    |--------------------|        |--------------------|  high address | ||||
|    | pointer for upb    |        |    pointer n-1     | | ||||
|    |--------------------|        |-  -  -  -  -  -  - | | ||||
|    |         .          |        |     index  n-1     | | ||||
|    |         .          |        |--------------------| | ||||
|    |         .          |        |          .         | | ||||
|    |         .          |        |          .         | | ||||
|    |         .          |        |          .         | | ||||
|    |         .          |        |--------------------| | ||||
|    |         .          |        |    pointer  1      | | ||||
|    |--------------------|        |-  -  -  -  -  -  - | | ||||
|    | pointer for lwb+1  |        |     index   1      | | ||||
|    |--------------------|        |--------------------| | ||||
|    | pointer for lwb    |        |    pointer  0      | | ||||
|    |--------------------|        |-  -  -  -  -  -  - | | ||||
|    |   upper - lower    |        |     index   0      | | ||||
|    |--------------------|        |--------------------| | ||||
|    |    lower bound     |        | number of entries  | | ||||
|    |--------------------|        |--------------------| | ||||
|    |  default pointer   |        |  default pointer   |  low address | ||||
|    |--------------------|        |--------------------| | ||||
| 
 | ||||
|        CSA descriptor                CSB descriptor | ||||
| 
 | ||||
| 
 | ||||
|       Figure 4. Descriptor layout for CSA and CSB | ||||
| .DE | ||||
							
								
								
									
										377
									
								
								doc/em/dspace.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,377 @@ | |||
| .BP | ||||
| .SN 4 | ||||
| .S1 "DATA ADDRESS SPACE" | ||||
| The data address space is divided into three parts, called 'areas', | ||||
| each with its own addressing method: | ||||
| global data area, | ||||
| local data area (including the stack), | ||||
| and heap data area. | ||||
| These data areas must be part of the same | ||||
| address space because all data is accessed by | ||||
| the same type of pointers. | ||||
| .P | ||||
| Space for global data is reserved using several pseudoinstructions in the | ||||
| assembly language, as described in | ||||
| the next paragraph and chapter 11. | ||||
| The size of the global data area is fixed per program. | ||||
| .A | ||||
| Global data is addressed absolutely in the machine language. | ||||
| Many instructions are available to address global data. | ||||
| They all have an absolute address as argument. | ||||
| Examples are LOE, LAE and STE. | ||||
| .P | ||||
| Part of the global data area is initialized by the | ||||
| compiler, the | ||||
| rest is not initialized at all or is initialized | ||||
| with a value, typically -32768 or 0. | ||||
| Part of the initialized global data may be made read-only | ||||
| if the implementation supports protection. | ||||
| .P | ||||
| The local data area is used as a stack, | ||||
| which grows from high to low addresses | ||||
| and contains some data for each active procedure | ||||
| invocation, called a 'frame'. | ||||
| The size of the local data area varies dynamically during | ||||
| execution. | ||||
| Below the current procedure frame resides the operand stack. | ||||
| The stack pointer SP always points to the bottom of | ||||
| the local data area. | ||||
| Local data is addressed by offsetting from the local base pointer LB. | ||||
| LB always points to the frame of the current procedure. | ||||
| Only the words of the current frame and the parameters | ||||
| can be addressed directly. | ||||
| Variables in other active procedures are addressed by following | ||||
| the chain of statically enclosing procedures using the LXL or LXA instruction. | ||||
| The variables in dynamically enclosing procedures can be | ||||
| addressed with the use of the DCH instruction. | ||||
| .A | ||||
| Many instructions have offsets to LB as argument, | ||||
| for instance LOL, LAL and STL. | ||||
| The arguments of these instructions range from -1 to some | ||||
| (negative) minimum | ||||
| for the access of local storage and from 0 to some (positive) | ||||
| maximum for parameter access. | ||||
| .P | ||||
| The procedure call instructions CAL and CAI each create a new frame | ||||
| on the stack. | ||||
| Each procedure has an assembly-time parameter specifying | ||||
| the number of bytes needed for local storage. | ||||
| This storage is allocated each time the procedure is called and | ||||
| must be a multiple of the wordsize. | ||||
| Each procedure, therefore, starts with a stack with the local variables | ||||
| already allocated. | ||||
| The return instructions RET and RTT remove a frame. | ||||
| The actual parameters must be removed by the calling procedure. | ||||
| .P | ||||
| RET may copy some words from the stack of | ||||
| the returning procedure to an unnamed 'function return area'. | ||||
| This area is available for 'READ-ONCE' access using the LFR instruction. | ||||
| The result of a LFR is only defined if the size used to fetch | ||||
| is identical to the size used in the last return. | ||||
| The instruction ASP, used to remove the parameters from the | ||||
| stack, the branch instruction BRA and the non-local goto | ||||
| instruction GTO are the only ones that leave the contents of | ||||
| the 'function return area' intact. | ||||
| All other instructions are allowed to destroy the function | ||||
| return area. | ||||
| Thus parameters can be popped before fetching the function result. | ||||
| The maximum size of all function return areas is | ||||
| implementation dependent, | ||||
| but should allow procedure instance identifiers and all | ||||
| implemented objects of type integer, unsigned, float | ||||
| and pointer to be returned. | ||||
| In most implementations | ||||
| the maximum size of the function return | ||||
| area is twice the pointer size, | ||||
| because we want to be able to handle 'procedure instance | ||||
| identifiers' which consist of a procedure identifier and the LB | ||||
| of a frame belonging to that procedure. | ||||
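The rules for the function return area can be summarized in a small C model. This is only a sketch: the names fra, fra_ret, fra_clobber and fra_lfr are hypothetical, and the capacity FRA_MAX assumes 4-byte pointers, giving twice the pointer size. It shows that a fetch is defined only when its size equals the size of the last return, and that the area can be read once.

```c
#include <assert.h>
#include <string.h>

enum { FRA_MAX = 8 };       /* assumed: twice a 4-byte pointer */

typedef struct {
    unsigned char bytes[FRA_MAX];
    int size;               /* size used by the last RET */
    int valid;              /* nonzero while the contents are intact */
} fra;

/* RET n: copy n bytes of result into the function return area. */
int fra_ret(fra *a, const void *result, int n)
{
    if (n < 0 || n > FRA_MAX)
        return -1;
    memcpy(a->bytes, result, (size_t)n);
    a->size = n;
    a->valid = 1;
    return 0;
}

/* Any instruction other than ASP, BRA and GTO may destroy the area. */
void fra_clobber(fra *a)
{
    a->valid = 0;
}

/* LFR n: defined only if the area is intact and n matches the
 * size of the last RET.  A successful fetch consumes the area. */
int fra_lfr(fra *a, void *out, int n)
{
    if (!a->valid || n != a->size)
        return -1;
    memcpy(out, a->bytes, (size_t)n);
    a->valid = 0;
    return 0;
}
```

In this model an ASP between the return and the fetch does nothing to the area, which is why parameters can be popped before the result is fetched.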
| .P | ||||
| The heap data area grows upwards, to higher numbered | ||||
| addresses. | ||||
| It is initially empty. | ||||
| The initial value of the heap pointer HP | ||||
| marks the low end. | ||||
| The heap pointer may be manipulated | ||||
| by the LOR and STR instructions. | ||||
| The heap can only be addressed indirectly, | ||||
| by pointers derived from previous values of HP. | ||||
| .S2 "Global data area" | ||||
| The initial size of the global data area is determined at assembly time. | ||||
| Global data is allocated by several | ||||
| pseudoinstructions in the EM assembly | ||||
| language. | ||||
| Each pseudoinstruction allocates one or more bytes. | ||||
| The bytes allocated for a single pseudo form | ||||
| a 'block'. | ||||
| A block differs from a fragment, because, | ||||
| under certain conditions, several blocks are allocated | ||||
| in a single fragment. | ||||
| This guarantees that the bytes of these blocks | ||||
| are consecutive. | ||||
| .P | ||||
| Global data is addressed absolutely in binary | ||||
| machine language. | ||||
| Most compilers, however, | ||||
| cannot assign absolute addresses to their global variables, | ||||
| especially not if the language | ||||
| allows programs to be composed of several separately compiled modules. | ||||
| The assembly language therefore allows the compiler to name | ||||
| the first address of a global data block with an alphanumeric label. | ||||
| Moreover, the only way to address such a named global data block | ||||
| in the assembly language is by using its name. | ||||
| It is the task of the assembler/loader to | ||||
| translate these labels into absolute addresses. | ||||
| These labels may also be used | ||||
| in CON and ROM pseudoinstructions to initialize pointers. | ||||
| .P | ||||
| The pseudoinstruction CON allocates initialized data. | ||||
| ROM acts like CON but indicates that the initialized data will | ||||
| not change during execution of the program. | ||||
| The pseudoinstruction BSS allocates a block of uninitialized | ||||
| or identically initialized | ||||
| data. | ||||
| The pseudoinstruction HOL is similar to BSS, | ||||
| but it alters the meaning of subsequent absolute addressing in | ||||
| the assembly language. | ||||
| .P | ||||
| Another type of global data is a small block, | ||||
| called the ABS block, with an implementation defined size. | ||||
| Storage in this type of block can only be addressed | ||||
| absolutely in assembly language. | ||||
| The first word has address 0 and is used to maintain the | ||||
| source line number. | ||||
| Special instructions LIN and LNI are provided to | ||||
| update this counter. | ||||
| A pointer at location 4 points to a string containing the | ||||
| current source file name. | ||||
| The instruction FIL can be used to update the pointer. | ||||
| .P | ||||
| All numeric arguments of the instructions that address | ||||
| the global data area refer to locations in the | ||||
| ABS block unless | ||||
| they are preceded by at least one HOL pseudo in the same | ||||
| module, | ||||
| in which case they refer to the storage area allocated by the | ||||
| last HOL pseudoinstruction. | ||||
| Thus LOE 0 loads the zeroth word of the most recent HOL, unless no HOL has | ||||
| appeared in the current file so | ||||
| far, in which case it loads the zeroth word of the | ||||
| ABS block. | ||||
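The resolution rule for numeric arguments can be sketched in C as follows. The structure module_state and the functions note_hol and resolve_numeric are hypothetical names chosen for illustration; the sketch assumes the assembler tracks, per module, whether a HOL has been seen and where the most recent one was placed.

```c
#include <assert.h>

enum { ABS_BASE = 0 };      /* the first word of the ABS block has address 0 */

/* Hypothetical per-module assembler state. */
typedef struct {
    int  seen_hol;          /* nonzero once a HOL pseudo has appeared */
    long hol_base;          /* absolute address of the most recent HOL block */
} module_state;

/* Record a HOL pseudo whose storage was placed at 'base'. */
void note_hol(module_state *m, long base)
{
    m->seen_hol = 1;
    m->hol_base = base;
}

/* Translate the numeric argument of LOE, LAE, STE, etc. */
long resolve_numeric(const module_state *m, long arg)
{
    return (m->seen_hol ? m->hol_base : ABS_BASE) + arg;
}
```

With no HOL seen, an argument of 4 resolves to address 4 in the ABS block, the file name pointer; after a HOL placed at a (made-up) address 1000, an argument of 0 resolves to 1000.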
| .P | ||||
| The global data area is highly fragmented. | ||||
| The ABS block and each HOL and BSS block are separate fragments. | ||||
| The way fragments are formed from CON and ROM blocks is more complex. | ||||
| The assemblers group several blocks into a single fragment. | ||||
| A fragment only contains blocks of the same type: CON or ROM. | ||||
| It is guaranteed that the bytes allocated for two consecutive CON pseudos are | ||||
| allocated consecutively in a single fragment, unless | ||||
| these CON pseudos are separated in the assembly language program | ||||
| by a data label definition or one or more of the following pseudos: | ||||
| .DS | ||||
| 
 | ||||
|      ROM, BSS, HOL and END | ||||
| 
 | ||||
| .DE | ||||
| An analogous rule holds for ROM pseudos. | ||||
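The grouping rule can be made concrete with a short C sketch. It is an illustration under assumptions, not the assembler's algorithm: the names pseudo and assign_fragments are hypothetical, the ABS block is left out, and the 'analogous rule' for ROM is taken to mean that a CON separates two ROM pseudos in the same way that a ROM separates two CON pseudos.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { P_CON, P_ROM, P_BSS, P_HOL, P_END, P_LABEL } pseudo;

/* Assign a fragment number to every data-allocating pseudo in seq.
 * frag[i] is -1 for items that allocate nothing (labels and END).
 * Returns the total number of fragments. */
int assign_fragments(const pseudo *seq, size_t n, int *frag)
{
    int next = 0;           /* number of fragments created so far */
    int open = -1;          /* kind of the open CON/ROM fragment, or -1 */

    for (size_t i = 0; i < n; i++) {
        switch (seq[i]) {
        case P_CON:
        case P_ROM:
            if (open != (int)seq[i]) {      /* start a new fragment */
                next++;
                open = (int)seq[i];
            }
            frag[i] = next - 1;
            break;
        case P_BSS:
        case P_HOL:
            frag[i] = next++;               /* always a fragment of its own */
            open = -1;
            break;
        default:                            /* P_END and P_LABEL */
            frag[i] = -1;
            open = -1;                      /* closes any open fragment */
            break;
        }
    }
    return next;
}
```

For the sequence CON, CON, label, CON the first two blocks share a fragment and the third starts a new one, so a compiler that needs contiguous data must not place a label between the blocks.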
| .S2 "Local data area" | ||||
| The local data area consists of a sequence of frames, one for | ||||
| each active procedure. | ||||
| Below the frame of the current procedure resides the | ||||
| expression stack. | ||||
| Frames are generated by procedure calls and are | ||||
| removed by procedure returns. | ||||
| A procedure frame consists of six 'zones': | ||||
| .DS | ||||
| 
 | ||||
|   1.  The return status block | ||||
|   2.  The local variables and compiler temporaries | ||||
|   3.  The register save block | ||||
|   4.  The dynamic local generators | ||||
|   5.  The operand stack | ||||
|   6.  The parameters of a procedure one level deeper | ||||
| 
 | ||||
| .DE | ||||
| A sample frame is shown in Figure 1. | ||||
| .P | ||||
| Before a procedure call is performed the actual | ||||
| parameters are pushed onto the stack of the calling procedure. | ||||
| The exact details are compiler dependent. | ||||
| EM allows procedures to be called with a variable number of | ||||
| parameters. | ||||
| The implementation of the C-language almost forces its runtime | ||||
| system to push the parameters in reverse order, that is, | ||||
| the first positional parameter last. | ||||
| Most compilers use the C calling convention to be compatible. | ||||
| The parameters of a procedure belong to the frame of the | ||||
| calling procedure. | ||||
| Note that the evaluation of the actual parameters may imply | ||||
| the calling of procedures. | ||||
| The parameters can be accessed with certain instructions using | ||||
| offsets of 0 and greater. | ||||
| The first byte of the last parameter pushed has offset 0. | ||||
| Note that the parameter at offset 0 has a special use in the | ||||
| instructions following the static chain (LXL and LXA). | ||||
| These instructions assume that this parameter contains the LB of | ||||
| the statically enclosing procedure. | ||||
| Procedures that do not have a statically enclosing procedure | ||||
| do not need a static link at offset 0. | ||||
| .P | ||||
| Two instructions are available to perform procedure calls, CAL | ||||
| and CAI. | ||||
| Several tasks are performed by these call instructions. | ||||
| .A | ||||
| First, a part of the status of the calling procedure is | ||||
| saved on the stack in the return status block. | ||||
| This block should contain the return address of the calling | ||||
| procedure, its LB and other implementation dependent data. | ||||
| The size of this block is fixed for any given implementation | ||||
| because the lexical instructions LPB, LXL and LXA must be able to | ||||
| obtain the base addresses of the procedure parameters \fBand\fP local | ||||
| variables. | ||||
| An alternative solution can be used on machines with a highly | ||||
| segmented address space. | ||||
| The stack frames need not be contiguous then and the first | ||||
| status save area can contain the parameter base AB, | ||||
| which has the value of SP just after the last parameter has | ||||
| been pushed. | ||||
| .A | ||||
| Second, the LB is changed to point to the | ||||
| first word above the local variables. | ||||
| The new LB is a copy of the SP after the return status | ||||
| block has been pushed. | ||||
| .A | ||||
| Third, the amount of local storage needed by the procedure is | ||||
| reserved. | ||||
| The parameters and local storage are accessed by the same instructions. | ||||
| Negative offsets are used for access to local variables. | ||||
| The highest byte, that is the byte nearest | ||||
| to LB, has to be accessed with offset -1. | ||||
| The pseudoinstruction specifying the entry point of a | ||||
| procedure, has an argument that specifies the amount of local | ||||
| storage needed. | ||||
| The local variables allocated by the CAI or CAL instructions | ||||
| are the only ones that can be accessed with a fixed negative offset. | ||||
| The initial value of the allocated words is | ||||
| not defined, but implementations that check for undefined | ||||
| values will probably initialize them with a | ||||
| special 'undefined' pattern, typically -32768. | ||||
| .A | ||||
| Fourth, any EM implementation is allowed to reserve a variable size | ||||
| block beneath the local variables. | ||||
| This block could, for example, be used to save a variable number | ||||
| of registers. | ||||
| .A | ||||
| Finally, the address of the entry point of the called procedure | ||||
| is loaded into the Program Counter. | ||||
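The call steps can be traced with a toy C model. Everything specific in it is an assumption made for illustration: a word and pointer size of 2 bytes, a return status block holding only the return address and the caller's LB (in that order), a 256-byte memory, and the hypothetical names machine, push16, call, local_addr and param_addr. The optional register save block of the fourth step is omitted.

```c
#include <assert.h>
#include <stdint.h>

enum {
    WORD = 2,               /* assumed word size in bytes */
    PTR  = 2,               /* assumed pointer size in bytes */
    RSB  = 2 * PTR          /* assumed block: return address + old LB */
};

typedef struct {
    uint16_t pc, lb, sp;
    uint8_t  mem[256];
} machine;

/* Push one 16-bit value; the stack grows towards low addresses. */
void push16(machine *m, uint16_t v)
{
    m->sp = (uint16_t)(m->sp - 2);
    m->mem[m->sp]     = (uint8_t)(v & 0xff);
    m->mem[m->sp + 1] = (uint8_t)(v >> 8);
}

/* Model of CAL for a procedure needing 'locals' bytes of storage. */
int call(machine *m, uint16_t entry, uint16_t locals)
{
    if (locals % WORD != 0)
        return -1;                  /* must be a multiple of the word size */
    push16(m, m->pc);               /* first: save the return address ... */
    push16(m, m->lb);               /* ... and the caller's LB */
    m->lb = m->sp;                  /* second: LB = SP after the block */
    m->sp = (uint16_t)(m->sp - locals);  /* third: reserve local storage */
    m->pc = entry;                  /* finally: enter the procedure */
    return 0;
}

/* Locals use negative offsets from LB, parameters offsets >= 0. */
uint16_t local_addr(const machine *m, int off)
{
    return (uint16_t)(m->lb + off);
}

uint16_t param_addr(const machine *m, int off)
{
    return (uint16_t)(m->lb + RSB + off);
}
```

Because the block has a fixed size in this model, the parameter base is always LB plus that size, which is the property the lexical instructions rely on.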
| .P | ||||
| The ASP instruction can be used to allocate further (dynamic) | ||||
| local storage. | ||||
| The base address of such storage must be obtained with a LOR~SP | ||||
| instruction. | ||||
| This same instruction ASP may also be used | ||||
| to remove some words from the stack. | ||||
| .P | ||||
| There is a version of ASP, called ASS, which fetches the number | ||||
| of bytes to allocate from the stack. | ||||
| It can be used to allocate space for local | ||||
| objects whose size is unknown at compile time, | ||||
| so-called 'dynamic local generators'. | ||||
| .P | ||||
| Control is returned to the calling procedure with a RET instruction. | ||||
| Any return value is then copied to the 'function return area'. | ||||
| The frame created by the call is deallocated and the status of | ||||
| the calling procedure is restored. | ||||
| The value of SP just after the return value has been popped must | ||||
| be the same as the | ||||
| value of SP just before executing the first instruction of this | ||||
| invocation. | ||||
| This means that when a RET is executed the operand stack can | ||||
| only contain the return value and all dynamically generated locals must be | ||||
| deallocated. | ||||
| Violating this restriction might result in hard-to-detect | ||||
| errors. | ||||
| The calling procedure has to remove the parameters from the stack. | ||||
| This can be done with the aforementioned ASP instruction. | ||||
| .P | ||||
| Each procedure frame is a separate fragment. | ||||
| Because any fragment may be placed anywhere in memory, | ||||
| procedure frames need not be contiguous. | ||||
| .DS | ||||
|                 |===============================| | ||||
|                 |     actual parameter  n-1     | | ||||
|                 |-------------------------------| | ||||
|                 |              .                | | ||||
|                 |              .                | | ||||
|                 |              .                | | ||||
|                 |-------------------------------| | ||||
|                 |     actual parameter  0       | ( <- AB ) | ||||
|                 |===============================| | ||||
| 
 | ||||
| 
 | ||||
|                 |===============================| | ||||
|                 |///////////////////////////////| | ||||
|                 |///// return status block /////| | ||||
|                 |///////////////////////////////|   <- LB | ||||
|                 |===============================| | ||||
|                 |                               | | ||||
|                 |       local variables         | | ||||
|                 |                               | | ||||
|                 |-------------------------------| | ||||
|                 |                               | | ||||
|                 |      compiler temporaries     | | ||||
|                 |                               | | ||||
|                 |===============================| | ||||
|                 |///////////////////////////////| | ||||
|                 |///// register save block /////| | ||||
|                 |///////////////////////////////| | ||||
|                 |===============================| | ||||
|                 |                               | | ||||
|                 |   dynamic local generators    | | ||||
|                 |                               | | ||||
|                 |===============================| | ||||
|                 |           operand             | | ||||
|                 |-------------------------------| | ||||
|                 |           operand             | | ||||
|                 |===============================| | ||||
|                 |         parameter  m-1        | | ||||
|                 |-------------------------------| | ||||
|                 |              .                | | ||||
|                 |              .                | | ||||
|                 |              .                | | ||||
|                 |-------------------------------| | ||||
|                 |         parameter  0          | <- SP | ||||
|                 |===============================| | ||||
| 
 | ||||
|           Figure 1. A sample procedure frame and parameters. | ||||
| .DE | ||||
| .S2 "Heap data area" | ||||
| The heap area starts empty, with HP | ||||
| pointing to the low end of it. | ||||
| HP always contains a word address. | ||||
| A copy of HP can always be obtained with the LOR instruction. | ||||
| A new value may be stored in the heap pointer using the STR instruction. | ||||
| If the new value is greater than the old one, | ||||
| then the heap grows. | ||||
| If it is smaller, then the heap shrinks. | ||||
| HP may never point below its original value. | ||||
| All words between the current HP and the original HP | ||||
| are allocated to the heap. | ||||
| The heap may not grow into a part of memory that is already allocated | ||||
| for the stack. | ||||
| When this is attempted, the STR instruction will cause a trap to occur. | ||||
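The constraints on the heap pointer can be collected in one check, sketched here in C. The names data_space and str_hp are hypothetical and a word size of 2 bytes is assumed. The text only states that running into the stack causes a trap; treating an unaligned value or a value below the original HP as a trap as well is an assumption of this sketch.

```c
#include <assert.h>

enum { WORD = 2 };          /* assumed word size in bytes */

typedef struct {
    unsigned long hp0;      /* original HP: the low end of the heap */
    unsigned long hp;       /* current HP */
    unsigned long sp;       /* current SP: lowest address used by the stack */
} data_space;

typedef enum { STR_OK, STR_TRAP } str_result;

/* Model of storing a new value into HP with STR. */
str_result str_hp(data_space *d, unsigned long new_hp)
{
    if (new_hp % WORD != 0)
        return STR_TRAP;    /* HP always contains a word address */
    if (new_hp < d->hp0)
        return STR_TRAP;    /* HP may never point below its original value */
    if (new_hp > d->sp)
        return STR_TRAP;    /* the heap would grow into the stack */
    d->hp = new_hp;         /* larger value grows the heap, smaller shrinks it */
    return STR_OK;
}
```

A failed store leaves HP unchanged in this model, so the words between the original and the current HP remain the only ones allocated to the heap.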
| .P | ||||
| The only way to address the heap is indirectly. | ||||
| Whenever an object is allocated by increasing HP, | ||||
| then the old HP value must be saved and can be used later to address | ||||
| the allocated object. | ||||
| If, in the meantime, HP is decreased so that the object | ||||
| is no longer part of the heap, then an attempt to access | ||||
| the object is not allowed. | ||||
| Furthermore, if the heap pointer is increased again to above | ||||
| the object address, then access to the old object gives undefined results. | ||||
| .P | ||||
| The heap is a single fragment. | ||||
| All bytes have consecutive addresses. | ||||
| No limits are imposed on the size of the heap as long as it fits | ||||
| in the available data address space. | ||||
							
								
								
									
										9
									
								
								doc/em/even.c
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,9 @@ | |||
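| #include <stdio.h> | ||||
| 
 | ||||
| /* Print each input byte as a decimal number, 16 per line. */ | ||||
| int main(void) { | ||||
| 	int l, j ; | ||||
| 
 | ||||
| 	for ( j=0 ; (l=getchar()) != EOF ; j++ ) { | ||||
| 		if ( j%16 == 15 ) printf("%3d\n",l&0377 ) ; | ||||
| 		else              printf("%3d ",l&0377 ) ; | ||||
| 	} | ||||
| 	printf("\n") ; | ||||
| 	return 0 ; | ||||
| } | ||||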
							
								
								
									
										178
									
								
								doc/em/exam.e
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,178 @@ | |||
|   mes 2,2,2              ; wordsize 2, pointersize 2 | ||||
|  .1 | ||||
|   rom 't.p\000'          ; the name of the source file | ||||
|   hol 552,-32768,0       ; externals and buf occupy 552 bytes | ||||
|   exp $sum               ; sum can be called from other modules | ||||
|   pro $sum,2             ; procedure sum; 2 bytes local storage | ||||
|   lin 8                  ; code from source line 8 | ||||
|   ldl 0                  ; load the two parameters ( a and b ) | ||||
|   adi 2                  ; add them | ||||
|   ret 2                  ; return the result | ||||
|   end 2                  ; end of procedure ( still two bytes local storage ) | ||||
|  .2 | ||||
|   rom 1,99,2             ; descriptor of array a[] | ||||
|   exp $test              ; the compiler exports all level 0 procedures | ||||
|   pro $test,226          ; procedure test, 226 bytes local storage | ||||
|  .3 | ||||
|   rom 4.8F8              ; assemble Floating point 4.8 (8 bytes) in | ||||
|  .4                              ; global storage | ||||
|   rom 0.5F8              ; same for 0.5 | ||||
|   mes 3,-226,2,2         ; compiler temporary not referenced indirectly | ||||
|   mes 3,-24,2,0          ; the same is true for i, j, b and c in test | ||||
|   mes 3,-22,2,0 | ||||
|   mes 3,-4,2,0 | ||||
|   mes 3,-2,2,0 | ||||
|   mes 3,-20,8,0          ; and for x and y | ||||
|   mes 3,-12,8,0 | ||||
|   lin 20                 ; maintain source line number | ||||
|   loc 1 | ||||
|   stl -4                 ; j := 1 | ||||
|   lni                    ; was lin 21 prior to optimization | ||||
|   lol -4 | ||||
|   loc 3 | ||||
|   mli 2 | ||||
|   loc 6 | ||||
|   adi 2 | ||||
|   stl -2                 ; i := 3 * j + 6 | ||||
|   lni                    ; was lin 22 prior to optimization | ||||
|   lae .3 | ||||
|   loi 8 | ||||
|   lal -12 | ||||
|   sti 8                  ; x := 4.8 | ||||
|   lni                    ; was lin 23 prior to optimization | ||||
|   lal -12 | ||||
|   loi 8 | ||||
|   lae .4 | ||||
|   loi 8 | ||||
|   dvf 8 | ||||
|   lal -20 | ||||
|   sti 8                  ; y := x / 0.5 | ||||
|   lni                    ; was lin 24 prior to optimization | ||||
|   loc 1 | ||||
|   stl -22                ; b := true | ||||
|   lni                    ; was lin 25 prior to optimization | ||||
|   loc 122 | ||||
|   stl -24                ; c := 'z' | ||||
|   lni                    ; was lin 26 prior to optimization | ||||
|   loc 1 | ||||
|   stl -2                 ; for i:= 1 | ||||
|  2 | ||||
|   lol -2 | ||||
|   dup 2 | ||||
|   mli 2                  ; i*i | ||||
|   lal -224 | ||||
|   lol -2 | ||||
|   lae .2 | ||||
|   sar 2                  ; a[i] := | ||||
|   lol -2 | ||||
|   loc 100 | ||||
|   beq *3                 ; to 100 do | ||||
|   inl -2                 ; increment i and loop | ||||
|   bra *2 | ||||
|  3 | ||||
|   lin 27 | ||||
|   lol -4 | ||||
|   loc 27 | ||||
|   adi 2                  ; j + 27 | ||||
|   sil 0                  ; r.r1 := | ||||
|   lni                    ; was lin 28 prior to optimization | ||||
|   lol -22                ; b | ||||
|   lol 0 | ||||
|   stf 10                 ; r.r3 := | ||||
|   lni                    ; was lin 29 prior to optimization | ||||
|   lal -20 | ||||
|   loi 16 | ||||
|   adf 8                  ; x + y | ||||
|   lol 0 | ||||
|   adp 2 | ||||
|   sti 8                  ; r.r2 := | ||||
|   lni                    ; was lin 30 prior to optimization | ||||
|   lal -224 | ||||
|   lol -4 | ||||
|   lae .2 | ||||
|   lar 2                  ; a[j] | ||||
|   lil 0                  ; r.r1 | ||||
|   cal $sum               ; call now | ||||
|   asp 4                  ; remove parameters from stack | ||||
|   lfr 2                  ; get function result | ||||
|   stl -2                 ; i := | ||||
|  4 | ||||
|   lin 31 | ||||
|   lol -2 | ||||
|   zle *5                 ; while i > 0 do | ||||
|   lol -4 | ||||
|   lil 0 | ||||
|   adi 2 | ||||
|   stl -4                 ; j := j + r.r1 | ||||
|   del -2                 ; i := i - 1 | ||||
|   bra *4                 ; loop | ||||
|  5 | ||||
|   lin 32 | ||||
|   lol 0 | ||||
|   stl -226               ; make copy of address of r | ||||
|   lol -22 | ||||
|   lol -226 | ||||
|   stf 10                 ; r3 := b | ||||
|   lal -20 | ||||
|   loi 16 | ||||
|   adf 8 | ||||
|   lol -226 | ||||
|   adp 2 | ||||
|   sti 8                  ; r2 := x + y | ||||
|   loc 0 | ||||
|   sil -226               ; r1 := 0 | ||||
|   lin 34                 ; note the absence of the unnecessary jump | ||||
|   lae 22                 ; address of output structure | ||||
|   lol -4 | ||||
|   cal $_wri              ; write integer with default width | ||||
|   asp 4                  ; pop parameters | ||||
|   lae 22 | ||||
|   lol -2 | ||||
|   loc 6 | ||||
|   cal $_wsi              ; write integer width 6 | ||||
|   asp 6 | ||||
|   lae 22 | ||||
|   lal -12 | ||||
|   loi 8 | ||||
|   loc 9 | ||||
|   loc 3 | ||||
|   cal $_wrf              ; write fixed format real, width 9, precision 3 | ||||
|   asp 14 | ||||
|   lae 22 | ||||
|   lol -22 | ||||
|   cal $_wrb              ; write boolean, default width | ||||
|   asp 4 | ||||
|   lae 22 | ||||
|   cal $_wln              ; writeln | ||||
|   asp 2 | ||||
|   ret 0                  ; return, no result | ||||
|   end 226 | ||||
|   exp $_main | ||||
|   pro $_main,0           ; main program | ||||
|  .6 | ||||
|   con 2,-1,22            ; description of external files | ||||
|  .5 | ||||
|   rom 15.96F8 | ||||
|   fil .1                 ; maintain source file name | ||||
|   lae .6                 ; description of external files | ||||
|   lae 0                  ; base of hol area to relocate buffer addresses | ||||
|   cal $_ini              ; initialize files, etc... | ||||
|   asp 4 | ||||
|   lin 37 | ||||
|   lae .5 | ||||
|   loi 8 | ||||
|   lae 2 | ||||
|   sti 8                  ; mx := 15.96 | ||||
|   lni                    ; was lin 38 prior to optimization | ||||
|   loc 99 | ||||
|   ste 0                  ; mi := 99 | ||||
|   lni                    ; was lin 39 prior to optimization | ||||
|   lae 10                 ; address of r | ||||
|   cal $test | ||||
|   asp 2 | ||||
|   loc 0                  ; normal exit | ||||
|   cal $_hlt              ; cleanup and finish | ||||
|   asp 2 | ||||
|   end 0 | ||||
|   mes 4,40               ; length of source file is 40 lines | ||||
|   mes 5                  ; reals were used | ||||
							
								
								
									
										40
									
								
								doc/em/exam.p
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,40 @@ | |||
|   program example(output); | ||||
|   {This program just demonstrates typical EM code.} | ||||
|   type rec = record r1: integer; r2:real; r3: boolean end; | ||||
|   var mi: integer;  mx:real;  r:rec; | ||||
| 
 | ||||
|   function sum(a,b:integer):integer; | ||||
|   begin | ||||
|     sum := a + b | ||||
|   end; | ||||
| 
 | ||||
|   procedure test(var r: rec); | ||||
|   label 1; | ||||
|   var   i,j: integer; | ||||
| 	x,y: real; | ||||
| 	b: boolean; | ||||
| 	c: char; | ||||
| 	a: array[1..100] of integer; | ||||
| 
 | ||||
|   begin | ||||
| 	j := 1; | ||||
| 	i := 3 * j + 6; | ||||
| 	x := 4.8; | ||||
| 	y := x/0.5; | ||||
| 	b := true; | ||||
| 	c := 'z'; | ||||
| 	for i:= 1 to 100 do a[i] := i * i; | ||||
| 	r.r1 := j+27; | ||||
| 	r.r3 := b; | ||||
| 	r.r2 := x+y; | ||||
| 	i := sum(r.r1, a[j]); | ||||
| 	while i > 0 do begin j := j + r.r1; i := i - 1 end; | ||||
| 	with r do begin r3 := b;  r2 := x+y;  r1 := 0 end; | ||||
| 	goto 1; | ||||
|   1:    writeln(j, i:6, x:9:3, b) | ||||
|   end; {test} | ||||
|   begin {main program} | ||||
|     mx := 15.96; | ||||
|     mi := 99; | ||||
|     test(r) | ||||
|   end. | ||||
							
								
								
									
										180
									
								
								doc/em/intro.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,180 @@ | |||
| .BP | ||||
| .S1 "INTRODUCTION" | ||||
| EM is a family of intermediate languages designed for producing | ||||
| portable compilers. | ||||
| The general strategy is for a program called | ||||
| .B front end | ||||
| to translate the source program to EM. | ||||
| Another program, | ||||
| .B back | ||||
| .BW end | ||||
| translates EM to target assembly language. | ||||
| Alternatively, the EM code can be assembled to a binary form | ||||
| and interpreted. | ||||
| These considerations led to the following goals: | ||||
| .IS 2 10 | ||||
| .PS 1 4 | ||||
| .PT | ||||
| The design should allow translation to, | ||||
| or interpretation on, a wide range of existing machines. | ||||
| Design decisions should be delayed as far as possible | ||||
| and the implications of these decisions should | ||||
| be localized as much as possible. | ||||
| .N | ||||
| The current microcomputer technology offers 8, 16 and 32 bit machines | ||||
| with various sizes of address space. | ||||
| EM should be flexible enough to be useful on most of these | ||||
| machines. | ||||
| The differences between the members of the EM family should only | ||||
| concern the wordsize and address space size. | ||||
| .PT | ||||
| The architecture should ease the task of code generation for | ||||
| high level languages such as Pascal, C, Ada, Algol 68, BCPL. | ||||
| .PT | ||||
| The instruction set used by the interpreter should be compact, | ||||
| to reduce the amount of memory needed | ||||
| for program storage, and to reduce the time needed to transmit | ||||
| programs over communication lines. | ||||
| .PT | ||||
| It should be designed with microprogrammed implementations in | ||||
| mind; in particular, the use of many short fields within | ||||
| instruction opcodes should be avoided, because their extraction by the | ||||
| microprogram or conversion to other instruction formats is inefficient. | ||||
| .PE | ||||
| .IE | ||||
| .A | ||||
| The basic architecture is based on the concept of a stack. The stack | ||||
| is used for procedure return addresses, actual parameters, local variables, | ||||
| and arithmetic operations. | ||||
| There are several built-in object types, | ||||
| for example, signed and unsigned integers, | ||||
| floating point numbers, pointers and sets of bits. | ||||
| There are instructions to push and pop objects | ||||
| to and from the stack. | ||||
| The push and pop instructions are not typed. | ||||
| They only care about the size of the objects. | ||||
| For each built-in type there are | ||||
| reverse Polish type instructions that pop one or more | ||||
| objects from the top of | ||||
| the stack, perform an operation, and push the result back onto the | ||||
| stack. | ||||
| For all types except pointers, | ||||
| these instructions have the object size | ||||
| as argument. | ||||
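As an illustration of the reverse Polish style, the C sketch below evaluates an expression such as ((a*b + c*d)/e + f*g/h) * i on an explicit operand stack. It is a toy model, not EM code: the names opstack, push, pop, add, mul, dvi and eval are hypothetical, all operands are C long integers, and the stack holds at most 16 entries.

```c
#include <assert.h>

typedef struct {
    long slot[16];
    int  depth;             /* number of operands currently on the stack */
    int  max_depth;         /* high-water mark, for illustration only */
} opstack;

void push(opstack *s, long v)
{
    assert(s->depth < 16);
    s->slot[s->depth++] = v;
    if (s->depth > s->max_depth)
        s->max_depth = s->depth;
}

long pop(opstack *s)
{
    assert(s->depth > 0);
    return s->slot[--s->depth];
}

/* Each operator pops two operands, operates, and pushes the result. */
void add(opstack *s) { long r = pop(s), l = pop(s); push(s, l + r); }
void mul(opstack *s) { long r = pop(s), l = pop(s); push(s, l * r); }
void dvi(opstack *s) { long r = pop(s), l = pop(s); push(s, l / r); }

/* ((a*b + c*d)/e + f*g/h) * i, written in reverse Polish order. */
long eval(opstack *s, long a, long b, long c, long d, long e,
          long f, long g, long h, long i)
{
    push(s, a); push(s, b); mul(s);
    push(s, c); push(s, d); mul(s);
    add(s);
    push(s, e); dvi(s);
    push(s, f); push(s, g); mul(s);
    push(s, h); dvi(s);
    add(s);
    push(s, i); mul(s);
    return pop(s);
}
```

No operator names a register or a temporary: the operands are implied by their position on the stack, so a code generator can emit this sequence in a single postorder walk of the expression tree.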
| .P | ||||
| There are no visible general registers used for arithmetic operands | ||||
| etc. This is in contrast to most third generation computers, which usually | ||||
| have 8 or 16 general registers. The decision not to have a group of | ||||
| general registers was fully intentional, and follows W.L. van der | ||||
| Poel's dictum that a machine should have 0, 1, or an infinite | ||||
| number of any feature. General registers have two primary uses: to hold | ||||
| intermediate results of complicated expressions, e.g. | ||||
| .IS 5 0 1 | ||||
| ((a*b + c*d)/e + f*g/h) * i | ||||
| .IE 1 | ||||
| and to hold local variables. | ||||
| .P | ||||
| Various studies | ||||
| have shown that the average expression has fewer than two operands, | ||||
| making the former use of registers of doubtful value. The present trend | ||||
| toward structured programs consisting of many small | ||||
| procedures greatly reduces the value of registers to hold local variables | ||||
| because the large number of procedure calls implies a large overhead in | ||||
| saving and restoring the registers at every call. | ||||
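| .P | ||||
| As an illustration of why such an expression needs no general registers, | ||||
| the sketch below evaluates the expression shown earlier on an explicit stack. | ||||
| It is a toy model in Python, not EM code; the name eval_rpn and the token | ||||
| format are assumptions made for this example only. | ||||

```python
import operator

# Binary operators of this toy model; each pops two operands and pushes one result.
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_rpn(tokens, variables):
    """Evaluate a reverse Polish token list on an explicit stack."""
    stack = []
    for token in tokens:
        if token in BINARY_OPS:
            right = stack.pop()          # top of stack is the right operand
            left = stack.pop()
            stack.append(BINARY_OPS[token](left, right))
        else:
            stack.append(variables[token])   # push an operand
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


# The expression ((a*b + c*d)/e + f*g/h) * i in reverse Polish order.
TOKENS = "a b * c d * + e / f g * h / + i *".split()
```

| All intermediate results live on the stack, so no register allocation is needed. | ||||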
| .BP | ||||
| .P | ||||
| Although there are no general purpose registers, there are a | ||||
| few internal registers with specific functions as follows: | ||||
| .IS 2 | ||||
| .N 1 | ||||
| .TS | ||||
| tab(:); | ||||
| l 1 l l. | ||||
| PC:-:Program Counter:Pointer to next instruction | ||||
| LB:-:Local Base:Points to base of the local variables \ | ||||
| in the current procedure. | ||||
| SP:-:Stack Pointer:Points to the highest occupied word on the stack. | ||||
| HP:-:Heap Pointer:Points to the top of the heap area. | ||||
| .TE 1 | ||||
| .IE | ||||
| .A | ||||
| Furthermore, reverse Polish code is much easier to generate than | ||||
| multi-register machine code, especially if highly efficient code is | ||||
| desired. | ||||
| When translating to assembly language the back end can make | ||||
| good use of the target machine's registers. | ||||
| An EM machine can | ||||
| achieve high performance by keeping part of the stack | ||||
| in high speed storage (a cache or microprogram scratchpad memory) rather | ||||
| than in primary memory. | ||||
| .P | ||||
| Again according to van der Poel's dictum, | ||||
| all EM instructions have zero or one argument. | ||||
| We believe that instructions needing two arguments | ||||
| can be split into two simpler ones. | ||||
| The simpler ones can probably be used in other | ||||
| circumstances as well. | ||||
| Moreover, these two instructions together often | ||||
| have a shorter encoding than the single | ||||
| instruction before. | ||||
| .P | ||||
| This document describes EM at three different levels: | ||||
| the abstract level, the assembly language level and | ||||
| the machine language level. | ||||
| .A | ||||
| The most important level is that of the abstract EM architecture. | ||||
| This level deals with the basic design issues. | ||||
| Only the functional capabilities of instructions are relevant, not their | ||||
| format or encoding. | ||||
| Most chapters of this document refer to the abstract level | ||||
| and it is explicitly stated whenever | ||||
| another level is described. | ||||
| .A | ||||
| The assembly language is intended for the compiler writer. | ||||
| It presents a more or less orthogonal instruction | ||||
| set and provides symbolic names for data. | ||||
| Moreover, it facilitates the linking of | ||||
| separately compiled 'modules' into a single program | ||||
| by providing several pseudoinstructions. | ||||
| .A | ||||
| The machine language is designed for interpretation with a compact | ||||
| program text and easy decoding. | ||||
| The binary representation of the machine language instruction set is | ||||
| far from orthogonal. | ||||
| Frequent instructions have a short opcode. | ||||
| The encoding is fully byte oriented. | ||||
| These bytes do not contain small bit fields, because | ||||
| bit fields would slow down decoding considerably. | ||||
| .P | ||||
| A common use for EM is for producing portable (cross) compilers. | ||||
| When used this way, the compilers produce | ||||
| EM assembly language as their output. | ||||
| To run the compiled program on the target machine, | ||||
| the back end translates the EM assembly language to | ||||
| the target machine's assembly language. | ||||
| When this approach is used, the format of the EM | ||||
| machine language instructions is irrelevant. | ||||
| On the other hand, when writing an interpreter for EM machine language | ||||
| programs, the interpreter must deal with the machine language | ||||
| and not with the symbolic assembly language. | ||||
| .P | ||||
| As mentioned above, the | ||||
| current microcomputer technology offers 8, 16 and 32 bit | ||||
| machines with address spaces ranging from 2\v'-0.5m'16\v'0.5m' | ||||
| to 2\v'-0.5m'32\v'0.5m' bytes. | ||||
| Having one size of pointers and integers restricts | ||||
| the usefulness of the language. | ||||
| We decided to have a different language for each combination of | ||||
| word and pointer size. | ||||
| All languages offer the same instruction set and differ only in | ||||
| memory alignment restrictions and the implicit size assumed in | ||||
| several instructions. | ||||
| For example, the size of any object on the stack and the alignment | ||||
| restrictions depend on the size combination chosen. | ||||
| The wordsize is restricted to powers of 2 and | ||||
| the pointer size must be a multiple of the wordsize. | ||||
| Almost all programs handling EM will be parametrized with word | ||||
| and pointer size. | ||||
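| .P | ||||
| The two restrictions on the sizes can be checked mechanically. | ||||
| The following Python sketch is only an illustration; the function name | ||||
| check_sizes is hypothetical and sizes are assumed to be given in bytes. | ||||

```python
def check_sizes(wordsize, pointersize):
    """Return True if the pair satisfies the two restrictions stated above."""
    # A positive power of 2 has exactly one bit set.
    is_power_of_two = wordsize > 0 and (wordsize & (wordsize - 1)) == 0
    # The pointer size must be a positive multiple of the wordsize.
    return is_power_of_two and pointersize > 0 and pointersize % wordsize == 0
```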
							
								
								
									
										376
									
								
								doc/em/iotrap.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,376 @@ | |||
| .SN 8 | ||||
| .VS 1 0 | ||||
| .BP | ||||
| .S1 "ENVIRONMENT INTERACTIONS" | ||||
| EM programs can interact with their environment in three ways. | ||||
| Two, starting/stopping and monitor calls, are dealt with in this chapter. | ||||
| The remaining way to interact, interrupts, will be treated | ||||
| together with traps in chapter 9. | ||||
| .S2 "Program starting and stopping" | ||||
| EM user programs start with a call to a procedure called | ||||
| m_a_i_n. | ||||
| The assembler and backends look for the definition of a procedure | ||||
| with this name in their input. | ||||
| The call passes three parameters to the procedure. | ||||
| The parameters are similar to the parameters supplied by the | ||||
| UNIX | ||||
| .FS | ||||
| UNIX is a Trademark of Bell Laboratories. | ||||
| .FE | ||||
| operating system to C programs. | ||||
| These parameters are often called | ||||
| .BW argc , | ||||
| .B argv | ||||
| and | ||||
| .BW envp . | ||||
| Argc is the parameter nearest to LB and is a wordsized integer. | ||||
| The other two are pointers to the first element of an array of | ||||
| string pointers. | ||||
| .N | ||||
| The | ||||
| .B argv | ||||
| array contains | ||||
| .B argc | ||||
| strings, the first of which contains the program call name. | ||||
| The other strings in the | ||||
| .B argv | ||||
| array are the program parameters. | ||||
| .P | ||||
| The | ||||
| .B envp | ||||
| array contains strings in the form "name=string", where 'name' | ||||
| is the name of an environment variable and 'string' is its value. | ||||
| The | ||||
| .B envp | ||||
| is terminated by a zero pointer. | ||||
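| .P | ||||
| The layout of the envp array can be sketched as follows. | ||||
| This Python fragment is an illustration only: the name parse_envp is | ||||
| hypothetical, and None stands in for the terminating zero pointer. | ||||

```python
def parse_envp(envp):
    """Collect "name=string" entries up to the terminating zero pointer."""
    env = {}
    for entry in envp:
        if entry is None:                 # stands in for the zero pointer
            break
        # Split at the first '=' only; the value may itself contain '='.
        name, _, value = entry.partition("=")
        env[name] = value
    return env
```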
| .P | ||||
| An EM user program stops if the program returns from the first | ||||
| invocation of m_a_i_n. | ||||
| The contents of the function return area are used to procure a | ||||
| wordsized program return code. | ||||
| EM programs also stop when traps and interrupts occur that are | ||||
| not caught and when the exit monitor call is executed. | ||||
| .S2 "Input/Output and other monitor calls" | ||||
| EM differs from most conventional machines in that it has high level i/o | ||||
| instructions. | ||||
| Typical instructions are OPEN FILE and READ FROM FILE instead | ||||
| of low level instructions such as setting and clearing | ||||
| bits in device registers. | ||||
| By providing such high level i/o primitives, the task of implementing | ||||
| EM on various non-EM machines is made considerably easier. | ||||
| .P | ||||
| I/O is initiated by the MON instruction, which expects an iocode on top | ||||
| of the stack. | ||||
| Often there are also parameters which are pushed on the | ||||
| stack in reverse order, that is: last | ||||
| parameter first. | ||||
| Some i/o functions also provide results, which are returned on the stack. | ||||
| In the list of monitor calls we use several types of parameters and results. | ||||
| These types consist of integers and unsigneds of varying sizes, but never | ||||
| smaller than the wordsize, and the two pointer types. | ||||
| .N 1 | ||||
| The names of the types used are: | ||||
| .IS 4 | ||||
| .PS - 10 | ||||
| .PT int | ||||
| an integer of wordsize | ||||
| .PT int2 | ||||
| an integer whose size is the maximum of the wordsize and 2 | ||||
| bytes | ||||
| .PT int4 | ||||
| an integer whose size is the maximum of the wordsize and 4 | ||||
| bytes | ||||
| .PT intp | ||||
| an integer with the size of a pointer | ||||
| .PT uns2 | ||||
| an unsigned integer whose size is the maximum of the wordsize and 2 | ||||
| .PT unsp | ||||
| an unsigned integer with the size of a pointer | ||||
| .PT ptr | ||||
| a pointer into data space | ||||
| .PE 1 | ||||
| .IE 0 | ||||
| The table below lists the i/o codes with their results and | ||||
| parameters. | ||||
| This list is similar to the system calls of the UNIX Version 7 | ||||
| operating system. | ||||
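| .P | ||||
| The sizes of the parameter and result types listed above follow directly | ||||
| from the word and pointer size. | ||||
| A minimal Python sketch, assuming sizes in bytes; the function name | ||||
| monitor_type_size is hypothetical. | ||||

```python
def monitor_type_size(type_name, wordsize, pointersize):
    """Size in bytes of a monitor call parameter or result type."""
    sizes = {
        "int": wordsize,
        "int2": max(wordsize, 2),   # never smaller than the wordsize
        "int4": max(wordsize, 4),
        "intp": pointersize,
        "uns2": max(wordsize, 2),
        "unsp": pointersize,
        "ptr": pointersize,
    }
    return sizes[type_name]
```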
| .BP | ||||
| .A | ||||
| To execute a monitor call, proceed as follows: | ||||
| .IS 2 | ||||
| .N 1 | ||||
| .PS a 4 "" ) | ||||
| .PT | ||||
| Stack the parameters, in reverse order, last parameter first. | ||||
| .PT | ||||
| Push the monitor call number (iocode) onto the stack. | ||||
| .PT | ||||
| Execute the MON instruction. | ||||
| .PE 1 | ||||
| .IE | ||||
| An error code is present on the top of the stack after | ||||
| execution of most monitor calls. | ||||
| If this error code is zero, the call performed the action | ||||
| requested and the results are available on top of the stack. | ||||
| Non-zero error codes indicate a failure; in this case no | ||||
| results are available and the error code has been pushed twice. | ||||
| This construction enables programs to test for failure with a | ||||
| single instruction (~TEQ or TNE~) and still find out the cause of | ||||
| the failure. | ||||
| The result name 'e' is reserved for the error code. | ||||
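| .P | ||||
| The calling sequence and the error convention can be modelled with a small | ||||
| stack simulation. | ||||
| The Python sketch below is a toy model, not an EM implementation: it handles | ||||
| only Write (iocode 4), represents files as a dictionary, and uses a | ||||
| hypothetical error code 9; the names mon and write are chosen for this example. | ||||

```python
WRITE = 4  # iocode of Write in the list of monitor calls
EBADF = 9  # hypothetical non-zero error code, chosen for this sketch only


def mon(stack, files):
    """Toy MON: pop the iocode and its parameters, push results and error code."""
    iocode = stack.pop()
    if iocode != WRITE:
        raise NotImplementedError("only Write is modelled in this sketch")
    fildes = stack.pop()              # first parameter was pushed last
    buf = stack.pop()
    nbytes = stack.pop()
    if fildes not in files:
        stack.extend([EBADF, EBADF])  # failure: no results, error code twice
        return
    data = buf[:nbytes]
    files[fildes] += data
    stack.append(len(data))           # result wbytes
    stack.append(0)                   # e = 0 on top signals success


def write(stack, files, fildes, buf, nbytes):
    """Caller side: stack the parameters last-first, push the iocode, do MON."""
    stack.extend([nbytes, buf, fildes])
    stack.append(WRITE)
    mon(stack, files)
    if stack.pop() == 0:              # a single test decides success or failure
        return 0, stack.pop()         # (e, wbytes)
    return stack.pop(), None          # the second copy still tells the cause
```

| Because the error code is pushed twice, the test consumes one copy and the | ||||
| other remains available to report the cause. | ||||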
| .N 1 | ||||
| List of monitor calls. | ||||
| .DS B | ||||
| number name    parameters      results           function | ||||
| 
 | ||||
|    1   Exit    status:int                        Terminate this process | ||||
|    2   Fork                    e,flag,pid:int    Spawn new process | ||||
|    3   Read    fildes:int;buf:ptr;nbytes:unsp | ||||
|                                e:int;rbytes:unsp Read from file | ||||
|    4   Write   fildes:int;buf:ptr;nbytes:unsp | ||||
|                                e:int;wbytes:unsp Write on a file | ||||
|    5   Open    string:ptr;flag:int | ||||
|                                e,fildes:int      Open file for read and/or write | ||||
|    6   Close   fildes:int      e:int             Close a file | ||||
|    7   Wait                    e:int;status,pid:int2 | ||||
|                                                  Wait for child | ||||
|    8   Creat   string:ptr;mode:int | ||||
|                                e,fildes:int      Create a new file | ||||
|    9   Link    string1,string2:ptr | ||||
|                                e:int             Link to a file | ||||
|   10   Unlink  string:ptr      e:int             Remove directory entry | ||||
|   12   Chdir   string:ptr      e:int             Change default directory | ||||
|   14   Mknod   string:ptr;mode,addr:int2 | ||||
|                                e:int             Make a special file | ||||
|   15   Chmod   string:ptr;mode:int2 | ||||
|                                e:int             Change mode of file | ||||
|   16   Chown   string:ptr;owner,group:int2 | ||||
|                                e:int             Change owner/group of a file | ||||
|   18   Stat    string,statbuf:ptr | ||||
|                                e:int             Get file status | ||||
|   19   Lseek   fildes:int;off:int4;whence:int | ||||
|                                e:int;oldoff:int4 Move read/write pointer | ||||
|   20   Getpid                  pid:int2          Get process identification | ||||
|   21   Mount   special,string:ptr;rwflag:int | ||||
|                                e:int             Mount file system | ||||
|   22   Umount  special:ptr     e:int             Unmount file system | ||||
|   23   Setuid  userid:int2     e:int             Set user ID | ||||
|   24   Getuid                  e_uid,r_uid:int2  Get user ID | ||||
|   25   Stime   time:int4       e:int             Set time and date | ||||
|   26   Ptrace  request:int;pid:int2;addr:ptr;data:int | ||||
|                                e,value:int       Process trace | ||||
|   27   Alarm   seconds:uns2    previous:uns2     Schedule signal | ||||
|   28   Fstat   fildes:int;statbuf:ptr | ||||
|                                e:int             Get file status | ||||
|   29   Pause                                     Stop until signal | ||||
|   30   Utime   string,timep:ptr | ||||
|                                e:int             Set file times | ||||
|   33   Access  string:ptr;mode:int | ||||
|                                e:int             Determine file accessibility | ||||
|   34   Nice    incr:int                          Set program priority | ||||
|   35   Ftime   bufp:ptr        e:int             Get date and time | ||||
|   36   Sync                                      Update filesystem | ||||
|   37   Kill    pid:int2;sig:int | ||||
|                                e:int             Send signal to a process | ||||
|   41   Dup     fildes,newfildes:int | ||||
|                                e,fildes:int      Duplicate a file descriptor | ||||
|   42   Pipe                    e,w_des,r_des:int Create a pipe | ||||
|   43   Times   buffer:ptr                        Get process times | ||||
|   44   Profil  buff:ptr;bufsiz,offset,scale:intp Execution time profile | ||||
|   46   Setgid  gid:int2        e:int             Set group ID | ||||
|   47   Getgid                  e_gid,r_gid:int   Get group ID | ||||
|   48   Sigtrp  trapno,signo:int | ||||
|                                e,prevtrap:int    See below | ||||
|   51   Acct    file:ptr        e:int             Turn accounting on or off | ||||
|   53   Lock    flag:int        e:int             Lock a process | ||||
|   54   Ioctl   fildes,request:int;argp:ptr | ||||
|                                e:int             Control device | ||||
|   56   Mpxcall cmd:int;vec:ptr e:int             Multiplexed file handling | ||||
|   59   Exece   name,argv,envp:ptr | ||||
|                                e:int             Execute a file | ||||
|   60   Umask   complmode:int2  oldmask:int2      Set file creation mode mask | ||||
|   61   Chroot  string:ptr      e:int             Change root directory | ||||
| .DE 1 | ||||
| Codes 0, 11, 13, 17, 31, 32, 38, 39, 40, 45, 49, 50, 52, | ||||
| 55, 57, 58, 62, and 63 are | ||||
| not used. | ||||
| .P | ||||
| All monitor calls, except fork and sigtrp, | ||||
| are the same as the UNIX version 7 system calls. | ||||
| .P | ||||
| The sigtrp entry maps UNIX signals onto EM interrupts. | ||||
| Normally, trapno is in the range 0 to 252. | ||||
| In that case the call requests that signal signo | ||||
| causes trap trapno to occur. | ||||
| When given trap number -2, default signal handling is reset, and when given | ||||
| trap number -3, the signal is ignored. | ||||
| .P | ||||
| The flag returned by fork is 1 in the child process and 0 in | ||||
| the parent. | ||||
| The pid returned is the process-id of the other process. | ||||
| .BP | ||||
| .S1 "TRAPS AND INTERRUPTS" | ||||
| EM provides a means for the user program to catch all traps | ||||
| generated by the program itself, the hardware, or external conditions. | ||||
| This mechanism uses five instructions: LIM, SIM, SIG, TRP and RTT. | ||||
| This section of the manual may be omitted on the first reading since it | ||||
| presupposes knowledge of the EM instruction set. | ||||
| .P | ||||
| The action taken when a trap occurs is determined by the value | ||||
| of an internal EM trap register. | ||||
| This register contains a pointer to a procedure. | ||||
| Initially the pointer used is zero and all traps halt the | ||||
| program with, hopefully, a useful message to the outside world. | ||||
| The SIG instruction can be used to alter the trap register; | ||||
| it pops a procedure pointer from the | ||||
| stack into the trap register. | ||||
| When a trap occurs after storing a nonzero value in the trap | ||||
| register, the procedure pointed to by the trap register | ||||
| is called with the trap number | ||||
| as the only parameter (see below). | ||||
| SIG returns the previous value of the trap register on the | ||||
| stack. | ||||
| Two consecutive SIGs are a no-op. | ||||
| When a trap occurs, the trap register is reset to its initial | ||||
| condition, to prevent recursive traps from hanging the machine up, | ||||
| e.g. stack overflow in the stack overflow handling procedure. | ||||
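| .P | ||||
| The behaviour of the trap register can be summarized in a small model. | ||||
| This Python sketch is an illustration under simplifying assumptions: | ||||
| 0 stands for the initial null pointer, handlers are ordinary functions, | ||||
| and halting the program is modelled by raising SystemExit. | ||||
| The class name TrapRegister is hypothetical. | ||||

```python
class TrapRegister:
    """Toy model of the EM trap register; 0 stands for the initial null pointer."""

    def __init__(self):
        self.handler = 0

    def sig(self, new_handler):
        """SIG: install new_handler and return the previous value."""
        previous, self.handler = self.handler, new_handler
        return previous

    def trap(self, trapno):
        """Deliver a trap: reset the register first, then call the old handler."""
        handler, self.handler = self.handler, 0
        if handler == 0:
            raise SystemExit(f"trap {trapno} not caught")
        return handler(trapno)
```

| Feeding the result of one sig into the next restores the original handler, | ||||
| which is why two consecutive SIGs have no net effect on the register. | ||||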
| .P | ||||
| The runtime systems for some languages need to ignore some EM | ||||
| traps. | ||||
| EM offers a feature called the ignore mask. | ||||
| It contains one bit for each of the lowest 16 trap numbers. | ||||
| The bits are numbered 0 to 15, with the least significant bit | ||||
| having number 0. | ||||
| If a certain bit is 1 the corresponding trap never | ||||
| occurs and processing simply continues. | ||||
| The actions performed by the offending instruction are | ||||
| described by the Pascal program in appendix A. | ||||
| .N | ||||
| If the bit is 0, traps are not ignored. | ||||
| The instructions LIM and SIM allow copying and replacement of | ||||
| the ignore mask.~ | ||||
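| .P | ||||
| The relation between a trap number and its bit in the ignore mask can be | ||||
| written down directly. | ||||
| The Python sketch below is illustrative; the function names mask_bit and | ||||
| is_ignored are hypothetical. | ||||

```python
MASKABLE_TRAPS = 16  # the ignore mask covers trap numbers 0..15


def mask_bit(trapno):
    """Value of the ignore mask bit for a maskable trap number."""
    if not 0 <= trapno < MASKABLE_TRAPS:
        raise ValueError("trap is not maskable")
    return 1 << trapno        # bit 0 is the least significant bit


def is_ignored(mask, trapno):
    """True if the trap is maskable and its bit is set in mask."""
    return 0 <= trapno < MASKABLE_TRAPS and bool(mask & (1 << trapno))
```

| For trap number 4, for instance, the corresponding mask value is 16. | ||||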
| .P | ||||
| The TRP instruction generates a trap, the trap number being found on the | ||||
| stack. | ||||
| This is, among other things, | ||||
| useful for library procedures and runtime systems. | ||||
| It can also be used by a low level trap procedure to pass the trap to a | ||||
| higher level one (see example below). | ||||
| .P | ||||
| The RTT instruction returns from the trap procedure and continues after the | ||||
| trap. | ||||
| In the list below all traps marked with an asterisk ('*') are | ||||
| considered to be fatal and it is explicitly undefined what happens if | ||||
| you try to restart after the trap. | ||||
| .P | ||||
| The way a trap procedure is called is completely compatible | ||||
| with normal calling conventions. The only way a trap procedure | ||||
| differs from normal procedures is the return. It has to use RTT instead | ||||
| of RET. This is necessary because the complete runtime status is saved on the | ||||
| stack before calling the procedure and all this status has to be reloaded. | ||||
| Trap numbers are in the range 0 to 252. | ||||
| The trap numbers are divided into three categories: | ||||
| .IS 4 | ||||
| .N 1 | ||||
| .PS - 10 | ||||
| .PT ~~0-~63 | ||||
| EM machine errors, e.g. illegal instruction. | ||||
| .PS - 8 | ||||
| .PT ~0-15 | ||||
| maskable | ||||
| .PT 16-63 | ||||
| not maskable | ||||
| .PE | ||||
| .PT ~64-127 | ||||
| Reserved for use by compilers, run time systems, etc. | ||||
| .PT 128-252 | ||||
| Available for user programs. | ||||
| .PE 1 | ||||
| .IE | ||||
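| .P | ||||
| The categories above translate into a simple range test. | ||||
| A minimal Python sketch; the function name trap_category and the returned | ||||
| strings are chosen for this example only. | ||||

```python
def trap_category(trapno):
    """Classify a trap number according to the ranges listed above."""
    if not 0 <= trapno <= 252:
        raise ValueError("trap number out of range")
    if trapno <= 15:
        return "EM machine error, maskable"
    if trapno <= 63:
        return "EM machine error, not maskable"
    if trapno <= 127:
        return "reserved for compilers and run time systems"
    return "available for user programs"
```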
| EM machine errors are numbered as follows: | ||||
| .DS I 5 | ||||
| .TS | ||||
| tab(@); | ||||
| n l l. | ||||
| 0@EARRAY@Array bound error | ||||
| 1@ERANGE@Range bound error | ||||
| 2@ESET@Set bound error | ||||
| 3@EIOVFL@Integer overflow | ||||
| 4@EFOVFL@Floating overflow | ||||
| 5@EFUNFL@Floating underflow | ||||
| 6@EIDIVZ@Divide by 0 | ||||
| 7@EFDIVZ@Divide by 0.0 | ||||
| 8@EIUND@Undefined integer | ||||
| 9@EFUND@Undefined float | ||||
| 10@ECONV@Conversion error | ||||
| 16*@ESTACK@Stack overflow | ||||
| 17*@EHEAP@Heap overflow | ||||
| 18*@EILLINS@Illegal instruction | ||||
| 19*@EODDZ@Illegal size argument | ||||
| 20*@ECASE@Case error | ||||
| 21*@EMEMFLT@Addressing non existent memory | ||||
| 22*@EBADPTR@Bad pointer used | ||||
| 23*@EBADPC@Program counter out of range | ||||
| 24@EBADLAE@Bad argument of LAE | ||||
| 25@EBADMON@Bad monitor call | ||||
| 26@EBADLIN@Argument of LIN too high | ||||
| 27@EBADGTO@GTO descriptor error | ||||
| .TE | ||||
| .DE 0 | ||||
| .P | ||||
| As an example, | ||||
| suppose a subprocedure has to be written to do a numeric | ||||
| calculation. | ||||
| When an overflow occurs the computation has to be stopped and | ||||
| the higher level procedure must be resumed. | ||||
| This can be programmed as follows using the mechanism described above: | ||||
| .DS B | ||||
|  mes 2,2,2              ; set sizes | ||||
| ersave | ||||
|  bss 2,0,0              ; Room to save previous value of trap procedure | ||||
| msave | ||||
|  bss 2,0,0              ; Room to save previous value of trap mask | ||||
| 
 | ||||
|  pro calcule,0          ; entry point | ||||
|  lxl 0                  ; fill in non-local goto descriptor with LB | ||||
|  ste jmpbuf+4 | ||||
|  lor 1                  ; and SP | ||||
|  ste jmpbuf+2 | ||||
|  lim                    ; get current ignore mask | ||||
|  ste msave              ; save it | ||||
|  lim | ||||
|  loc 16                 ; bit for EFOVFL (trap number 4) | ||||
|  ior 2                  ; set in mask | ||||
|  sim                    ; ignore EFOVFL from now on | ||||
|  lpi $catch             ; load procedure identifier | ||||
|  sig                    ; catch will get all traps now | ||||
|  ste ersave             ; save previous trap procedure identifier | ||||
| ; perform calculation now, possibly generating overflow | ||||
| 1                       ; label jumped to by catch procedure | ||||
|  loe ersave             ; get old trap procedure | ||||
|  sig                    ; refer all following traps to old procedure | ||||
|  asp 2                  ; remove result of sig | ||||
|  loe msave              ; restore previous mask | ||||
|  sim                    ; done now | ||||
| ; load result of calculation | ||||
|  ret 2                  ; return result | ||||
| jmpbuf | ||||
|  con *1,0,0 | ||||
|  end | ||||
| .DE 0 | ||||
| .VS 1 1 | ||||
| .DS | ||||
| Example of catch procedure | ||||
|  pro catch,0            ; Local procedure that must catch the overflow trap | ||||
|  lol 2                  ; Load trap number | ||||
|  loc 4                  ; check for overflow | ||||
|  bne *1                 ; if other trap, call higher trap procedure | ||||
|  gto jmpbuf             ; return to procedure calcule | ||||
| 1                       ; other trap has occurred | ||||
|  loe ersave             ; previous trap procedure | ||||
|  sig                    ; other procedure will get the traps now | ||||
|  asp 2                  ; remove the result of sig | ||||
|  lol 2                  ; stack trap number | ||||
|  trp                    ; call other trap procedure | ||||
|  rtt                    ; if other procedure returns, do the same | ||||
|  end | ||||
| .DE | ||||
							
								
								
									
										6
									
								
								doc/em/ip.awk
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,6 @@ | |||
| BEGIN { printf ".TS\nlw(6) lw(8) rw(3) rw(6) 14 lw(6) lw(8) rw(3) rw(6) 14 lw(6) lw(8) rw(3) rw(6).\n" } | ||||
| NF == 4 { printf "%s\t%s\t%d\t%d",$1,$2,$3,$4 } | ||||
| NF == 3 { printf "%s\t%s\t\t%d",$1,$2,$3 } | ||||
|  { if ( NR%3 == 0 ) printf("\n") ; else printf("\t"); } | ||||
| END { if ( NR%3 != 0 ) printf("\n") | ||||
|       printf ".TE\n" } | ||||
							
								
								
									
										61
									
								
								doc/em/ispace.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,61 @@ | |||
| .SN 3 | ||||
| .BP | ||||
| .S1 "INSTRUCTION ADDRESS SPACE" | ||||
| The instruction space of the EM machine contains | ||||
| the code for procedures. | ||||
| Tables necessary for the execution of this code, for example, procedure | ||||
| descriptor tables, may also be present. | ||||
| The instruction space does not change during | ||||
| the execution of a program, so that it may be | ||||
| protected. | ||||
| No further restrictions to the instruction address space are | ||||
| necessary for the abstract and assembly language level. | ||||
| .P | ||||
| Each procedure has a single entry point: the first instruction. | ||||
| A special type of pointer identifies a procedure. | ||||
| Pointers into the instruction | ||||
| address space have the same size as pointers into data space and | ||||
| can, for example, contain the address of the first instruction | ||||
| or an index in a procedure descriptor table. | ||||
| .A | ||||
| There is a single EM program counter, PC, pointing | ||||
| to the next instruction to be executed. | ||||
| The procedure pointed to by PC is | ||||
| called the 'current' procedure. | ||||
| A procedure may call another procedure using the CAL or CAI | ||||
| instruction. | ||||
| The calling procedure remains 'active' and is resumed whenever the called | ||||
| procedure returns. | ||||
| Note that a procedure has several 'active' invocations when | ||||
| called recursively. | ||||
| .P | ||||
| Each procedure must return properly. | ||||
| It is not allowed to fall through to the | ||||
| code of the next procedure. | ||||
| There are several ways to exit from a procedure: | ||||
| .IS 3 | ||||
| .PS | ||||
| .PT | ||||
| the RET instruction, which returns to the | ||||
| calling procedure. | ||||
| .PT | ||||
| the RTT instruction, which exits a trap handling routine and resumes | ||||
| the trapping instruction (see next chapter). | ||||
| .PT | ||||
| the GTO instruction, which is used for non-local goto's. | ||||
| It can remove several frames from the stack and transfer | ||||
| control to an active procedure. | ||||
| .PE | ||||
| .IE | ||||
| .P | ||||
| All branch instructions can transfer control | ||||
| to any label within the same procedure. | ||||
| Branch instructions can never jump out of a procedure. | ||||
| .P | ||||
| Several language implementations use a so-called procedure | ||||
| instance identifier, a combination of a procedure identifier and | ||||
| the LB of a stack frame, also called the static link. | ||||
| .P | ||||
| The program text for each procedure, as well as any tables, | ||||
| are fragments and can be allocated anywhere | ||||
| in the instruction address space. | ||||
							
								
								
									
										2525
									
								
								doc/em/itables
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							
										
											
												File diff suppressed because it is too large
											
										
									
								
							
							
								
								
									
										390
									
								
								doc/em/mach.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,390 @@ | |||
| .BP | ||||
| .SN 10 | ||||
| .S1 "EM MACHINE LANGUAGE" | ||||
| The EM machine language is designed to make program text compact | ||||
| and to make decoding easy. | ||||
| Compact program text has many advantages: programs execute faster, | ||||
| programs occupy less primary and secondary storage and loading | ||||
| programs into satellite processors is faster. | ||||
| The decoding of EM machine language is so simple | ||||
| that it is feasible to use interpreters as long as EM hardware | ||||
| machines are not available. | ||||
| This chapter is irrelevant when back ends are used to | ||||
| produce executable target machine code. | ||||
| .S2 "Instruction encoding" | ||||
| A design goal of EM is to make the | ||||
| program text as compact as possible. | ||||
| Decoding must be easy, however. | ||||
| The encoding is fully byte oriented, without any small bit fields. | ||||
| There are 256 primary opcodes, two of which are an escape to | ||||
| two groups of 256 secondary opcodes each. | ||||
| .A | ||||
| EM instructions without arguments have a single opcode assigned, | ||||
| possibly escaped: | ||||
| .DS | ||||
| 
 | ||||
|          |--------------| | ||||
|          |    opcode    | | ||||
|          |--------------| | ||||
| 
 | ||||
|                 or | ||||
| 
 | ||||
|          |--------------|--------------| | ||||
|          |    escape    |     opcode   | | ||||
|          |--------------|--------------| | ||||
| 
 | ||||
| .DE | ||||
| The encoding for instructions with an argument is more complex. | ||||
| Several instructions have an address from the global data area | ||||
| as argument. | ||||
| Other instructions have different opcodes for positive | ||||
| and negative arguments. | ||||
| .N 1 | ||||
| There is always an opcode that takes the next two bytes as argument, | ||||
| high byte first: | ||||
| .DS | ||||
| 
 | ||||
|          |--------------|--------------|--------------| | ||||
|          |    opcode    |    hibyte    |    lobyte    | | ||||
|          |--------------|--------------|--------------| | ||||
| 
 | ||||
|                 or | ||||
| 
 | ||||
|          |--------------|--------------|--------------|--------------| | ||||
|          |    escape    |    opcode    |    hibyte    |    lobyte    | | ||||
|          |--------------|--------------|--------------|--------------| | ||||
| 
 | ||||
| .DE | ||||
| .DS | ||||
| An extra escape is provided for instructions with four or eight byte arguments. | ||||
| 
 | ||||
|   |--------------|--------------|--------------|   |--------------| | ||||
|   |    ESCAPE    |    opcode    |    hibyte    |...|    lobyte    | | ||||
|   |--------------|--------------|--------------|   |--------------| | ||||
| 
 | ||||
| .DE | ||||
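| .P | ||||
| The two-byte argument layout shown above, with the high byte first and an | ||||
| optional escape byte in front, can be sketched as follows. | ||||
| This Python fragment is an illustration only: the opcode and escape values | ||||
| passed in are hypothetical, and only non-negative arguments are modelled. | ||||

```python
def encode_two_byte(opcode, argument, escape=None):
    """Encode an opcode plus a 16-bit argument, high byte first."""
    if not 0 <= opcode <= 255:
        raise ValueError("opcode must fit in one byte")
    if not 0 <= argument <= 0xFFFF:
        raise ValueError("argument must fit in two bytes")
    prefix = [] if escape is None else [escape]
    # hibyte precedes lobyte, as in the diagrams above
    return bytes(prefix + [opcode, argument >> 8, argument & 0xFF])
```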
| For most instructions some argument values predominate. | ||||
| The most frequent combinations of instruction and argument | ||||
| will be encoded in a single byte, called a mini: | ||||
| .DS | ||||
| 
 | ||||
|          |---------------| | ||||
|          |opcode+argument|  (mini) | ||||
|          |---------------| | ||||
| 
 | ||||
| .DE | ||||
| The number of minis is restricted, because only | ||||
| 254 primary opcodes are available. | ||||
| Many instructions have the bulk of their arguments | ||||
| fall in the range 0 to 255. | ||||
| Instructions that address global data have their arguments | ||||
| distributed over a wider range, | ||||
| but small values of the high byte are common. | ||||
| For all these cases there is another encoding | ||||
| that combines the instruction and the high byte of the argument | ||||
| into a single opcode. | ||||
| These opcodes are called shorties. | ||||
| Shorties may be escaped. | ||||
| .DS | ||||
| 
 | ||||
|          |--------------|--------------| | ||||
|          | opcode+high  |    lobyte    |  (shortie) | ||||
|          |--------------|--------------| | ||||
| 
 | ||||
|                 or | ||||
| 
 | ||||
|          |--------------|--------------|--------------| | ||||
|          |    escape    | opcode+high  |    lobyte    | | ||||
|          |--------------|--------------|--------------| | ||||
| 
 | ||||
| .DE | ||||
| Escaped shorties are useless if the normal encoding has a primary opcode. | ||||
| Note that for some instruction-argument combinations | ||||
| several different encodings are available. | ||||
| It is the task of the assembler to select the shortest of these. | ||||
| The savings by these mini and shortie | ||||
| opcodes are considerable, about 55%. | ||||
| .P | ||||
| Further improvements are possible: | ||||
| the arguments of | ||||
| many instructions are a multiple of the wordsize. | ||||
| Some also do not allow zero as an argument. | ||||
| If these arguments are divided by the wordsize and, | ||||
| when zero is not allowed, then decremented by 1, more of them can | ||||
| be encoded as shortie or mini. | ||||
| The arguments of some other instructions | ||||
| rarely or never assume the value 0, but start at 1. | ||||
| The value 1 is then encoded as 0, | ||||
| 2 as 1 and so on. | ||||
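The two compaction rules described above can be sketched in a few lines. This is only an illustration: the function name and its parameters are hypothetical, and which instructions are scaled or exclude zero is decided by the assembler tables, not by this code.

```python
def compact_argument(arg, wordsize=2, scaled=False, zero_allowed=True):
    """Return the value that would actually be encoded for an argument.

    scaled:       the argument is always a multiple of the wordsize,
                  so it is divided by the wordsize first.
    zero_allowed: False when the instruction never takes 0, so the
                  remaining value is decremented by 1.
    """
    value = arg
    if scaled:
        if arg % wordsize != 0:
            raise ValueError("argument is not a multiple of the wordsize")
        value = arg // wordsize
    if not zero_allowed:
        if value == 0:
            raise ValueError("zero is not a legal argument here")
        value -= 1
    return value
```

With a wordsize of 2, an argument of 8 for a scaled instruction that excludes zero is encoded as 3, which is more likely to fit in a mini or shortie than 8 itself.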
| .P | ||||
| The assignment of opcodes to instructions by the assembler is completely | ||||
| table driven. | ||||
| For details see appendix B. | ||||
| .S2 "Procedure descriptors" | ||||
| The procedure identifiers used in the interpreter are indices | ||||
| into a table of procedure descriptors. | ||||
| Each descriptor contains: | ||||
| .IS 6 | ||||
| .PS - 4 | ||||
| .PT 1. | ||||
| the number of bytes to be reserved for locals at each | ||||
| invocation. | ||||
| .N | ||||
| This is a pointer-sized integer. | ||||
| .PT 2. | ||||
| the start address of the procedure | ||||
| .PE | ||||
| .IE | ||||
| .S2 "Load format" | ||||
| The EM machine language load format defines the interface between | ||||
| the EM assembler/loader and the EM machine itself. | ||||
| A load file consists of a header, the program text to be executed, | ||||
| a description of the global data area and the procedure descriptor table, | ||||
| in this order. | ||||
| All integers in the load file are presented with the | ||||
| least significant byte first. | ||||
| .P | ||||
| The header has two parts: the first half (eight 16-bit integers) | ||||
| aids in selecting | ||||
| the correct EM machine or interpreter. | ||||
| Some EM machines, for instance, may have hardware floating point | ||||
| instructions. | ||||
| .N | ||||
| The header entries are as follows (bit 0 is rightmost): | ||||
| .IS 2 | ||||
| .VS 1 0 | ||||
| .PS 1 4 "" : | ||||
| .PT | ||||
| magic number (07255) | ||||
| .PT | ||||
| flag bits with the following meaning: | ||||
| .PS - 7 "" : | ||||
| .PT bit 0 | ||||
| TEST; test for integer overflow etc. | ||||
| .PT bit 1 | ||||
| PROFILE; for each source line: count the number of memory | ||||
| cycles executed. | ||||
| .PT bit 2 | ||||
| FLOW; for each source line: set a bit in a bit map table if | ||||
| instructions on that line are executed. | ||||
| .PT bit 3 | ||||
| COUNT; for each source line: increment a counter if that line | ||||
| is entered. | ||||
| .PT bit 4 | ||||
| REALS; set if a program uses floating point instructions. | ||||
| .PT bit 5 | ||||
| EXTRA; more tests during compiler debugging. | ||||
| .PE | ||||
| .PT | ||||
| number of unresolved references. | ||||
| .PT | ||||
| version number; used to detect obsolete EM load files. | ||||
| .PT | ||||
| wordsize; the number of bytes in each machine word. | ||||
| .PT | ||||
| pointer size; the number of bytes available for addressing. | ||||
| .PT | ||||
| unused | ||||
| .PT | ||||
| unused | ||||
| .PE | ||||
| .IE | ||||
| The second part of the header (eight entries, of pointer size bytes each) | ||||
| describes the load file itself: | ||||
| .IS 2 | ||||
| .PS 1 4 "" : | ||||
| .PT | ||||
| NTEXT; the program text size in bytes. | ||||
| .PT | ||||
| NDATA; the number of load-file descriptors (see below). | ||||
| .PT | ||||
| NPROC; the number of entries in the procedure descriptor table. | ||||
| .PT | ||||
| ENTRY; procedure number of the procedure to start with. | ||||
| .PT | ||||
| NLINE; the maximum source line number. | ||||
| .PT | ||||
| SZDATA; the address of the lowest uninitialized data byte. | ||||
| .PT | ||||
| unused | ||||
| .PT | ||||
| unused | ||||
| .PE | ||||
| .IE | ||||
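A reader of a load file could decode the two header parts roughly as follows. This is a minimal sketch that assumes the header starts at offset 0 of the file; `parse_header`, `FLAG_NAMES` and the dictionary keys are names chosen here for illustration only.

```python
import struct

MAGIC = 0o7255
FLAG_NAMES = ["TEST", "PROFILE", "FLOW", "COUNT", "REALS", "EXTRA"]
SECOND_PART = ["NTEXT", "NDATA", "NPROC", "ENTRY", "NLINE", "SZDATA"]

def parse_header(data):
    """Decode both header parts of an EM load file held in `data`."""
    # First part: eight 16-bit integers, least significant byte first.
    first = struct.unpack_from("<8H", data, 0)
    magic, flags, unresolved, version, wordsize, psize = first[:6]
    if magic != MAGIC:
        raise ValueError("bad magic number")
    header = {
        "flags": [n for bit, n in enumerate(FLAG_NAMES) if (flags >> bit) & 1],
        "unresolved": unresolved,
        "version": version,
        "wordsize": wordsize,
        "pointer_size": psize,
        "size": 16 + 8 * psize,      # total header length in bytes
    }
    # Second part: eight entries of pointer size each; the last two are unused.
    for i, name in enumerate(SECOND_PART):
        start = 16 + i * psize
        header[name] = int.from_bytes(data[start:start + psize], "little")
    return header
```

The pointer size must be read from the first part before the second part can be decoded, which is why the two halves are handled separately.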
| .P | ||||
| The program text consists of NTEXT bytes. | ||||
| NTEXT is always a multiple of the wordsize. | ||||
| The first byte of the program text is the | ||||
| first byte of the instruction address | ||||
| space, i.e. it has address 0. | ||||
| Pointers into the program text are found in the procedure descriptor | ||||
| table, where relocation is simple, and in the global data area. | ||||
| The initialization of the global data area allows easy | ||||
| relocation of pointers into both address spaces. | ||||
| .P | ||||
| The global data area is described by the NDATA descriptors. | ||||
| Each descriptor describes a number of consecutive words (of~wordsize) | ||||
| and consists of a sequence of bytes. | ||||
| While reading the descriptors from the load file, one can | ||||
| initialize the global data area from low to high addresses. | ||||
| The size of the initialized data area is given by SZDATA; | ||||
| this number can be used to check the initialization. | ||||
| .N | ||||
| The header of each descriptor consists of a byte, describing the type, | ||||
| and a count. | ||||
| The number of bytes used for this (unsigned) count depends on the | ||||
| type of the descriptor and | ||||
| is either a pointer-sized integer | ||||
| or one byte. | ||||
| The meaning of the count depends on the descriptor type. | ||||
| At load time an interpreter can | ||||
| perform any conversion deemed necessary, such as | ||||
| reordering bytes in integers | ||||
| and pointers and adding base addresses to pointers. | ||||
| .BP | ||||
| .A | ||||
| In the following pictures we show a graphical notation of the | ||||
| initializers. | ||||
| The leftmost rectangle represents the leading byte. | ||||
| .N 1 | ||||
| .DS | ||||
| .PS - 4 " " | ||||
| Fields marked with | ||||
| .N 1 | ||||
| .PT n | ||||
| contain a pointer-sized integer used as a count | ||||
| .PT m | ||||
| contain a one-byte integer used as a count | ||||
| .PT b | ||||
| contain a one-byte integer | ||||
| .PT w | ||||
| contain a wordsized integer | ||||
| .PT p | ||||
| contain a data or instruction pointer | ||||
| .PT s | ||||
| contain a null terminated ASCII string | ||||
| .PE 1 | ||||
| .DE 0 | ||||
| .VS 1 1 | ||||
| .DS | ||||
| 
 | ||||
|     ------------------- | ||||
|     | 0 |      n      |           repeat last initialization n times | ||||
|     ------------------- | ||||
| .DE | ||||
| .DS | ||||
|     --------- | ||||
|     | 1 | m |                     m uninitialized words | ||||
|     --------- | ||||
| .DE | ||||
| .DS | ||||
|                ____________ | ||||
|               /    bytes   \e | ||||
|     -----------------   ----- | ||||
|     | 2 | m | b | b |...| b |     m initialized bytes | ||||
|     -----------------   ----- | ||||
| .DE | ||||
| .DS | ||||
|                _________ | ||||
|               /  word   \e | ||||
|     ----------------------- | ||||
|     | 3 | m |      w      |...    m initialized wordsized integers | ||||
|     ----------------------- | ||||
| .DE | ||||
| .DS | ||||
|                _________ | ||||
|               / pointer \e | ||||
|     ----------------------- | ||||
|     | 4 | m |      p      |...    m initialized data pointers | ||||
|     ----------------------- | ||||
| .DE | ||||
| .DS | ||||
|                _________ | ||||
|               / pointer \e | ||||
|     ----------------------- | ||||
|     | 5 | m |      p      |...    m initialized instruction pointers | ||||
|     ----------------------- | ||||
| .DE | ||||
| .DS | ||||
|                ____________ | ||||
|               /    bytes   \e | ||||
|     ------------------------- | ||||
|     | 6 | m | b | b |...| b |     initialized integer of size m | ||||
|     ------------------------- | ||||
| .DE | ||||
| .DS | ||||
|                ____________ | ||||
|               /    bytes   \e | ||||
|     ------------------------- | ||||
|     | 7 | m | b | b |...| b |     initialized unsigned of size m | ||||
|     ------------------------- | ||||
| .DE | ||||
| .DS | ||||
|                ____________ | ||||
|               /   string   \e | ||||
|     ------------------------- | ||||
|     | 8 | m |        s      |     initialized float of size m | ||||
|     ------------------------- | ||||
| .DE 3 | ||||
| .PS - 8 | ||||
| .PT type~0: | ||||
| If the last initialization initialized k bytes starting | ||||
| at address \fIa\fP, do the same initialization again n times, | ||||
| starting at \fIa\fP+k, \fIa\fP+2*k, .... \fIa\fP+n*k. | ||||
| This is the only descriptor whose starting byte | ||||
| is followed by an integer with the size of a pointer; | ||||
| in all other descriptors the first byte is followed by a one-byte count. | ||||
| This descriptor must be preceded by a descriptor of | ||||
| another type. | ||||
| .PT type~1: | ||||
| Reserve m words, not explicitly initialized (BSS and HOL). | ||||
| .PT type~2: | ||||
| The m bytes following the descriptor header are | ||||
| initializers for the next m bytes of the | ||||
| global data area. | ||||
| m is divisible by the wordsize. | ||||
| .PT type~3: | ||||
| The m words following the header are initializers for the next m words of the | ||||
| global data area. | ||||
| .PT type~4: | ||||
| The m data address space pointers following the header are | ||||
| initializers for the next | ||||
| m data pointers in the global data area. | ||||
| Interpreters that represent EM pointers by | ||||
| target machine addresses must relocate all data pointers. | ||||
| .PT type~5: | ||||
| The m instruction address space pointers following the header are | ||||
| initializers for the next | ||||
| m instruction pointers in the global data area. | ||||
| Interpreters that represent EM instruction pointers by | ||||
| target machine addresses must relocate these pointers. | ||||
| .PT type~6: | ||||
| The m bytes following the header form | ||||
| a signed integer number with a size of m bytes, | ||||
| which is an initializer for the next m bytes | ||||
| of the global data area. | ||||
| m is governed by the same restrictions as for | ||||
| transfer of objects to/from memory. | ||||
| .PT type~7: | ||||
| The m bytes following the header form | ||||
| an unsigned integer number with a size of m bytes, | ||||
| which is an initializer for the next m bytes | ||||
| of the global data area. | ||||
| m is governed by the same restrictions as for | ||||
| transfer of objects to/from memory. | ||||
| .PT type~8: | ||||
| The header is followed by an ASCII string, null terminated, to | ||||
| initialize, in global data, | ||||
| a floating point number with a size of m bytes. | ||||
| m is governed by the same restrictions as for | ||||
| transfer of objects to/from memory. | ||||
| The ASCII string contains the notation of a real as used in the | ||||
| Pascal language. | ||||
| .PE | ||||
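The descriptor types above can be decoded with a small loop. The sketch below makes several assumptions that the text leaves to the implementation: uninitialized words are filled with zeros, integers and pointers are kept in the load file's byte order without relocation, and floats of size 4 or 8 are stored in the host's IEEE format. The function name `load_global_data` is hypothetical.

```python
import struct

def load_global_data(buf, ndata, wordsize, psize):
    """Decode ndata descriptors from buf and return the data area as bytes."""
    data = bytearray()
    last = b""                                # result of the previous descriptor
    pos = 0
    for _ in range(ndata):
        kind = buf[pos]
        pos += 1
        if kind == 0:                         # type 0: pointer-sized count n
            n = int.from_bytes(buf[pos:pos + psize], "little")
            pos += psize
            if not last:
                raise ValueError("type 0 must follow another descriptor")
            data += last * n                  # n further copies of the last unit
            continue
        m = buf[pos]                          # every other type: one-byte count
        pos += 1
        if kind == 1:
            chunk = bytes(m * wordsize)       # zero fill is an assumption
        elif kind == 8:
            end = buf.index(0, pos)           # the string is null terminated
            text = buf[pos:end].decode("ascii")
            pos = end + 1
            # Assumed float layout: IEEE single or double, LSB first.
            chunk = struct.pack({4: "<f", 8: "<d"}[m], float(text))
        elif 2 <= kind <= 7:
            # Types 3, 4 and 5 count words or pointers; 2, 6 and 7 count bytes.
            unit = {3: wordsize, 4: psize, 5: psize}.get(kind, 1)
            chunk = bytes(buf[pos:pos + m * unit])
            pos += m * unit
        else:
            raise ValueError("unknown descriptor type %d" % kind)
        data += chunk
        last = chunk
    return bytes(data)
```

An interpreter that represents pointers by target machine addresses would add the appropriate base address while handling types 4 and 5 instead of copying the bytes unchanged.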
| .P | ||||
| The NPROC procedure descriptors on the load file consist of | ||||
| an instruction space address (of~pointer~size) and | ||||
| an integer (of~pointer~size) specifying the number of bytes for | ||||
| locals. | ||||
							
								
								
									
										16
									
								
								doc/em/macr.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,16 @@ | |||
| .so /usr/lib/tmac/tmac.kun | ||||
| .SS 6 | ||||
| .RP | ||||
| .PL 12i 11i | ||||
| .LL 89 | ||||
| .MS T E | ||||
| \!.TL '%''' | ||||
| .ME | ||||
| .MS T O | ||||
| \!.TL '''%' | ||||
| .ME | ||||
| .MS B | ||||
| .sp 1 | ||||
| .ME | ||||
| .SM S1 B | ||||
| .SM S2 B | ||||
							
								
								
									
										245
									
								
								doc/em/mapping.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,245 @@ | |||
| .SN 5 | ||||
| .BP | ||||
| .S1 "MAPPING OF EM DATA MEMORY ONTO TARGET MACHINE MEMORY" | ||||
| The EM architecture is designed to be implemented | ||||
| on many existing and future machines. | ||||
| EM memory is highly fragmented to make | ||||
| adaptation to various memory architectures possible. | ||||
| Format and encoding of pointers is explicitly undefined. | ||||
| .P | ||||
| This chapter gives solutions to some of the | ||||
| anticipated problems. | ||||
| First, we describe a possible memory layout for machines | ||||
| with 64K bytes of address space. | ||||
| Here we use a member of the EM family with 2-byte word and pointer | ||||
| size. | ||||
| The most straightforward layout is shown in figure 2. | ||||
| .N 1 | ||||
| .DS | ||||
|        65534 -> |-------------------------------| | ||||
|                 |///////////////////////////////| | ||||
|                 |//// unimplemented memory /////| | ||||
|                 |///////////////////////////////| | ||||
|           ML -> |-------------------------------| | ||||
|                 |                               | | ||||
|                 |                               | <- LB | ||||
|                 |     stack and local area      | | ||||
|                 |                               | | ||||
|                 |-------------------------------| <- SP | ||||
|                 |///////////////////////////////| | ||||
|                 |//////// inaccessible /////////| | ||||
|                 |///////////////////////////////| | ||||
|                 |-------------------------------| <- HP | ||||
|                 |                               | | ||||
|                 |           heap area           | | ||||
|                 |                               | | ||||
|                 |                               | | ||||
|           HB -> |-------------------------------| | ||||
|                 |                               | | ||||
|                 |       global data area        | | ||||
|                 |                               | | ||||
|           EB -> |-------------------------------| | ||||
|                 |                               | | ||||
|                 |         program text          | <- PC | ||||
|                 |                               | | ||||
|                 |        ( and tables )         | | ||||
|                 |                               | | ||||
|                 |                               | | ||||
|           PB -> |-------------------------------| | ||||
|                 |///////////////////////////////| | ||||
|                 |////////// undefined //////////| | ||||
|                 |///////////////////////////////| | ||||
|            0 -> |-------------------------------| | ||||
| 
 | ||||
|            Figure 2.  Memory layout showing typical register | ||||
|            positions during execution of an EM program. | ||||
| .DE 2 | ||||
| The base registers for the various memory pieces can be stored | ||||
| in target machine registers or memory. | ||||
| .IS | ||||
| .N 1 | ||||
| .TS | ||||
| tab(;); | ||||
| l 1 l l l. | ||||
| PB;:;program base;points to the base of the instruction address space. | ||||
| EB;:;external base;points to the base of the data address space. | ||||
| HB;:;heap base;points to the base of the heap area. | ||||
| ML;:;memory limit;marks the high end of the addressable data space. | ||||
| .TE 1 | ||||
| .IE | ||||
| The stack grows from high | ||||
| EM addresses to low EM addresses, and the heap the | ||||
| other way. | ||||
| The memory between SP and HP is not accessible, | ||||
| but may be allocated later to the stack or the heap if needed. | ||||
| The local data area is allocated starting at the high end of | ||||
| memory. | ||||
| .P | ||||
| Because EM address 0 is not mapped onto target | ||||
| address 0, a problem arises when pointers are used. | ||||
| If a program pushed a constant, say 6, onto the stack, | ||||
| and then tried to indirect through it, | ||||
| the wrong word would be fetched, | ||||
| because EM address 6 is mapped onto target address EB+6 | ||||
| and not target address 6 itself. | ||||
| This particular problem is solved by explicitly declaring | ||||
| the format of a pointer to be undefined, | ||||
| so that using a constant as a pointer is completely illegal. | ||||
| However, the general problem of mapping pointers still exists. | ||||
| .P | ||||
| There are two possible solutions. | ||||
| In the first solution, EM pointers are represented | ||||
| in the target machine as true EM addresses, | ||||
| for example, a pointer to EM address 6 really is | ||||
| stored as a 6 in the target machine. | ||||
| This solution implies that every time a pointer is fetched | ||||
| EB must be added before referencing | ||||
| the target machine's memory. | ||||
| If the target machine has powerful indexing | ||||
| facilities, EB can be kept in a target machine register, | ||||
| and the relocation can indeed be done on | ||||
| every reference to the data address space | ||||
| at a modest cost in speed. | ||||
| .P | ||||
| The other solution consists of having EM pointers | ||||
| refer to the true target machine address. | ||||
| Thus the instruction LAE 6 (Load Address of External 6) | ||||
| would push the value of EB+6 onto the stack. | ||||
| When this approach is chosen, back ends must know | ||||
| how to offset from EB, to translate all | ||||
| instructions that manipulate EM addresses. | ||||
| However, the problem is not completely solved, | ||||
| because a front end may have to initialize a pointer | ||||
| in CON or ROM data to point to a global address. | ||||
| This pointer must also be relocated by the back end or the interpreter. | ||||
| .P | ||||
| Although the EM stack grows from high to low EM addresses, | ||||
| some machines have hardware PUSH and POP | ||||
| instructions that require the stack to grow upwards. | ||||
| If reasons of efficiency urge you to use these | ||||
| instructions, then EM | ||||
| can be implemented with the memory layout | ||||
| upside down, as shown in figure 3. | ||||
| This is possible because the pointer format is explicitly undefined. | ||||
| The first element of a word array will have a | ||||
| lower physical address than the second element. | ||||
| .N 2 | ||||
| .DS | ||||
|           |                 |                    |                 | | ||||
|           |      EB=60      |                    |        ^        | | ||||
|           |                 |                    |        |        | | ||||
|           |-----------------|                    |-----------------| | ||||
|       105 |   45   |   44   | 104            214 |   41   |   40   | 215 | ||||
|           |-----------------|                    |-----------------| | ||||
|       103 |   43   |   42   | 102            212 |   43   |   42   | 213 | ||||
|           |-----------------|                    |-----------------| | ||||
|       101 |   41   |   40   | 100            210 |   45   |   44   | 211 | ||||
|           |-----------------|                    |-----------------| | ||||
|           |        |        |                    |                 | | ||||
|           |        v        |                    |      EB=255     | | ||||
|           |                 |                    |                 | | ||||
| 
 | ||||
|                 Type A                                 Type B | ||||
| .sp 2 | ||||
|               Figure 3. Two possible memory implementations. | ||||
|                  Numbers within the boxes are EM addresses. | ||||
|                  The other numbers are physical addresses. | ||||
| .DE 2 | ||||
| .A 0 0 | ||||
| So, we have two different EM memory implementations: | ||||
| .IS | ||||
| .PS - 4 | ||||
| .PT A~- | ||||
| stack downwards | ||||
| .PT B~- | ||||
| stack upwards | ||||
| .PE | ||||
| .IE | ||||
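The address arithmetic behind figure 3 can be written down directly. This is a sketch derived from the numbers in the figure (EB=60 for type A, EB=255 for type B); the function names are illustrative.

```python
def physical_address(em_addr, eb, stack_upwards=False):
    """Map an EM data address to a target address (type A or type B)."""
    return eb - em_addr if stack_upwards else eb + em_addr

def word_base(em_addr, eb, wordsize=2, stack_upwards=False):
    """Lowest physical address occupied by the word at EM address em_addr."""
    if stack_upwards:
        # In type B the last EM byte of the word has the lowest target address.
        return eb - (em_addr + wordsize - 1)
    return eb + em_addr
```

For type B the word at EM address 40 starts at physical address 214, that is EB-41, because the layout is mirrored.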
| .P | ||||
| For each of these two possibilities we give the translation of | ||||
| the EM instructions to push the third byte of a global data | ||||
| block starting at EM address 40 onto the stack and to load the | ||||
| word at address 40. | ||||
| All translations assume a word and pointer size of two bytes. | ||||
| The target machine used is a PDP-11 augmented with push and pop instructions. | ||||
| Registers 'r0' and 'r1' are used and suffer from sign extension for byte | ||||
| transfers. | ||||
| Push $40 means push the constant 40, not word 40. | ||||
| .P | ||||
| The translation of the EM instructions depends on the pointer representation | ||||
| used. | ||||
| For each of the two solutions explained above the translation is given. | ||||
| .P | ||||
| First, the translation for the two implementations using EM addresses as | ||||
| pointer representation: | ||||
| .DS | ||||
| .TS | ||||
| tab(:), center; | ||||
| l s l s l s | ||||
| _ s _ s _ s | ||||
| l 2 l 6 l 2 l 6 l 2 l. | ||||
| EM:type A:type B | ||||
| 
 | ||||
| 
 | ||||
| LAE:40:push:$40:push:$40 | ||||
| 
 | ||||
| ADP:3:pop:r0:pop:r0 | ||||
| ::add:$3,r0:add:$3,r0 | ||||
| ::push:r0:push:r0 | ||||
| 
 | ||||
| LOI:1:pop:r0:pop:r0 | ||||
| ::-::neg:r0 | ||||
| ::clr:r1:clr:r1 | ||||
| ::bisb:eb(r0),r1:bisb:eb(r0),r1 | ||||
| ::push:r1:push:r1 | ||||
| 
 | ||||
| LOE:40:push:eb+40:push:eb-41 | ||||
| .TE | ||||
| .DE | ||||
| .BP | ||||
| .P | ||||
| The translation for the two implementations, if the target machine address is | ||||
| used as pointer representation, is: | ||||
| .N 1 | ||||
| .DS | ||||
| .TS | ||||
| tab(:), center; | ||||
| l s l s l s | ||||
| _ s _ s _ s | ||||
| l 2 l 6 l 2 l 6 l 2 l. | ||||
| EM:type A:type B | ||||
| 
 | ||||
| 
 | ||||
| LAE:40:push:$eb+40:push:$eb-40 | ||||
| 
 | ||||
| ADP:3:pop:r0:pop:r0 | ||||
| ::add:$3,r0:sub:$3,r0 | ||||
| ::push:r0:push:r0 | ||||
| 
 | ||||
| LOI:1:pop:r0:pop:r0 | ||||
| ::clr:r1:clr:r1 | ||||
| ::bisb:(r0),r1:bisb:(r0),r1 | ||||
| ::push:r1:push:r1 | ||||
| 
 | ||||
| LOE:40:push:eb+40:push:eb-41 | ||||
| .TE | ||||
| .DE | ||||
| .P | ||||
| The translation presented above is not intended to be optimal. | ||||
| Most machines can handle these simple cases in one or two instructions. | ||||
| It demonstrates, however, the flexibility of the EM design. | ||||
| .P | ||||
| There are several ways to implement EM on machines with | ||||
| address spaces larger than 64k bytes. | ||||
| For EM with two byte pointers one could allocate instruction and | ||||
| data space each in a separate 64k piece of memory. | ||||
| EM pointers still have to fit in two bytes, | ||||
| but the base registers PB and EB may be loaded in hardware registers | ||||
| wider than 16 bits, if available. | ||||
| EM implementations can also make efficient use of a machine | ||||
| with separate instruction and data space. | ||||
| .P | ||||
| EM with 32 bit pointers allows one to make use of machines | ||||
| with large address spaces. | ||||
| In a virtual, segmented memory system one could use a separate | ||||
| segment for each fragment. | ||||
							
								
								
									
										80
									
								
								doc/em/mem.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,80 @@ | |||
| .BP | ||||
| .SN 2 | ||||
| .S1 MEMORY | ||||
| The EM machine has two distinct address spaces, | ||||
| one for instructions and one for data. | ||||
| The data space is divided up into 8-bit bytes. | ||||
| The smallest addressable unit is a byte. | ||||
| Bytes are numbered consecutively from 0 to some maximum. | ||||
| All sizes in EM are expressed in bytes. | ||||
| .P | ||||
| Some EM instructions can transfer objects containing several bytes | ||||
| to and/or from memory. | ||||
| The size of all objects larger than a word must be a multiple of | ||||
| the wordsize. | ||||
| The size of all objects smaller than a word must be a divisor | ||||
| of the wordsize. | ||||
| For example: if the wordsize is 2 bytes, objects of the sizes 1, | ||||
| 2, 4, 6,... are allowed. | ||||
| The address of such an object is the lowest address of all bytes it contains. | ||||
| For objects smaller than the wordsize, the | ||||
| address must be a multiple of the object size. | ||||
| For all other objects the address must be a multiple of the | ||||
| wordsize. | ||||
| For example, if an instruction transfers a 4-byte object to memory at | ||||
| location \fIm\fP and the wordsize is 2, | ||||
| \fIm\fP must be a multiple of 2 and the bytes at | ||||
| locations \fIm\fP, \fIm\fP\|+\|1, \fIm\fP\|+\|2 and | ||||
| \fIm\fP\|+\|3 are overwritten. | ||||
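The size and alignment rules of this paragraph can be checked mechanically. The following sketch uses hypothetical helper names and simply restates the rules in code.

```python
def valid_size(size, wordsize):
    """True if an object of `size` bytes may be transferred to or from memory."""
    if size <= 0:
        return False
    if size < wordsize:
        return wordsize % size == 0      # must be a divisor of the wordsize
    return size % wordsize == 0          # must be a multiple of the wordsize

def valid_address(addr, size, wordsize):
    """True if `addr` is a legal address for an object of `size` bytes."""
    if not valid_size(size, wordsize):
        return False
    unit = size if size < wordsize else wordsize
    return addr % unit == 0
```

With a wordsize of 2 this accepts the sizes 1, 2, 4, 6 and so on, and requires an even address for every object of two bytes or more.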
| .P | ||||
| The size of almost all objects in EM | ||||
| is an integral number of words. | ||||
| Only two operations are allowed on | ||||
| objects whose size is a divisor of the wordsize: | ||||
| push it onto the stack and pop it from the stack. | ||||
| The addressing of these objects in memory is always indirect. | ||||
| If such a small object is pushed onto the stack | ||||
| it is assumed to be a small integer and stored | ||||
| in the least significant part of a word. | ||||
| The rest of the word is cleared to zero, | ||||
| although | ||||
| EM provides a way to sign-extend a small integer. | ||||
| Popping a small object from the stack removes a word | ||||
| from the stack, stores the least significant byte(s) | ||||
| of this word in memory and discards the rest of the word. | ||||
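The push and pop behaviour for small objects can be modelled with plain integers. This is a sketch only: the stack is a Python list of word values, and the helper names are not part of EM.

```python
def push_small(stack, value, size):
    """Push an object of `size` bytes; the rest of the word is zero."""
    mask = (1 << (8 * size)) - 1
    stack.append(value & mask)

def pop_small(stack, size):
    """Pop a word and return the least significant `size` byte(s) to store."""
    mask = (1 << (8 * size)) - 1
    return stack.pop() & mask
```

Popping a one-byte object from a stack whose top word is 0x1234 stores 0x34 and discards the 0x12.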
| .P | ||||
| The format of pointers into both address spaces is explicitly undefined. | ||||
| The size of a pointer, however, is fixed for a member of EM, so that | ||||
| the compiler writer knows how much storage to allocate for a pointer. | ||||
| .P | ||||
| A minor problem is raised by the undefined pointer format. | ||||
| Some languages, notably Pascal, require a special, | ||||
| otherwise illegal, pointer value to represent the nil pointer. | ||||
| The current Pascal-VU compiler uses the | ||||
| integer value 0 as nil pointer. | ||||
| This value is also used by many C programs as a normally impossible address. | ||||
| A better solution would be to have a special | ||||
| instruction loading an illegal pointer value, | ||||
| but it is hard to imagine an implementation | ||||
| for which the current solution is inadequate, | ||||
| especially because the first word in the EM data space | ||||
| is special and probably not the target of any pointer. | ||||
| .P | ||||
| The next two chapters describe the EM memory | ||||
| in more detail. | ||||
| One describes the instruction address space, | ||||
| the other the data address space. | ||||
| .P | ||||
| A design goal of EM has been to allow | ||||
| its implementation on a wide range of existing machines, | ||||
| as well as allowing a new one to be built in hardware. | ||||
| To this end we have tried to minimize the demands | ||||
| of EM on the memory structure of the target machine. | ||||
| Therefore, apart from the logical partitioning, | ||||
| EM memory is divided into 'fragments'. | ||||
| A fragment consists of consecutive machine | ||||
| words and has a base address and a size. | ||||
| Pointer arithmetic is only defined within a fragment. | ||||
| The only exception to this rule is comparison with the null | ||||
| pointer. | ||||
| All fragments must be word aligned. | ||||
							
								
								
									
										5
									
								
								doc/em/print
									
										
									
									
									
										Executable file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,5 @@ | |||
| 
 | ||||
| case $# in | ||||
| 1)      make "$1".t ; ntlp "$1".t | lpr ;; | ||||
| *)      echo $0 needs an argument ;; | ||||
| esac | ||||
							
								
								
									
										4
									
								
								doc/em/show
									
										
									
									
									
										Executable file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,4 @@ | |||
| case $# in | ||||
| 1)      make "$1".t ; ntout "$1".t ;; | ||||
| *)      echo $0 needs an argument ;; | ||||
| esac | ||||
							
								
								
									
										38
									
								
								doc/em/title.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,38 @@ | |||
| .po 0 | ||||
| .TP 1 | ||||
| .ll 79 | ||||
| .sp 15 | ||||
| .ce 4 | ||||
| DESCRIPTION OF A MACHINE | ||||
| ARCHITECTURE FOR  USE WITH | ||||
| BLOCK  STRUCTURED  LANGUAGES | ||||
| .sp 6 | ||||
| .ce 4 | ||||
| Andrew S. Tanenbaum | ||||
| Hans  van  Staveren | ||||
| Ed G. Keizer | ||||
| Johan  W. Stevenson\v'-0.5m'*\v'0.5m' | ||||
| .sp 2 | ||||
| .ce | ||||
| August 1983 | ||||
| .sp 2 | ||||
| .ce | ||||
| Informatica Rapport IR-81 | ||||
| .sp 13 | ||||
| Abstract | ||||
| .sp 2 | ||||
| .ti +5 | ||||
| EM is a family of intermediate languages | ||||
| designed for producing portable compilers. | ||||
| A program called | ||||
| .B front end | ||||
| translates source programs to EM. | ||||
| Another program, | ||||
| .B back | ||||
| .BW end , | ||||
| translates EM to the assembly language of the target machine. | ||||
| Alternatively, the EM program can be assembled to a highly | ||||
| efficient binary format for interpretation. | ||||
| This document describes the EM languages in detail. | ||||
| .sp 4 | ||||
| \v'-0.5m'*\v'0.5m' Present affiliation: NV Philips, Eindhoven | ||||
							
								
								
									
										130
									
								
								doc/em/types.nr
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
									
								
							|  | @ -0,0 +1,130 @@ | |||
| .SN 6 | ||||
| .BP | ||||
| .S1 "TYPE REPRESENTATIONS" | ||||
| The representations used for typed objects are not precisely | ||||
| specified by EM. | ||||
| Sometimes we only specify that a typed object occupies a | ||||
| certain amount of space and state no further restrictions. | ||||
| If one wants to have a different representation of the value of | ||||
| an object on the stack one has to use a convert instruction | ||||
| in most cases. | ||||
| We do specify some relations between the representations of | ||||
| types. | ||||
| This allows some intermixed use of operators for different types | ||||
| on the same object(s). | ||||
| For example, the instruction ZER pushes signed and | ||||
| unsigned integers with the value zero and empty sets. | ||||
| The only argument of ZER is the size of the object. | ||||
| .A | ||||
| The representation of floating point numbers is a good example: | ||||
| it allows widely varying implementations. | ||||
| The only ways to create floating point numbers are via | ||||
| initialization and via conversions from integer numbers. | ||||
| Only by using conversions to integers and comparing | ||||
| two floating point numbers with each other, can these numbers | ||||
| be converted to human readable output. | ||||
| Implementations may use base 10, base 2 or any other | ||||
| base for exponents, and have freedom in choosing the range of | ||||
| exponent and mantissa. | ||||
| .A | ||||
| Other types are more precisely described. | ||||
| In the following paragraphs a description will be given of the | ||||
| restrictions imposed on the representation of the types used. | ||||
| A number \fBn\fP used in these paragraphs indicates the size of | ||||
| the object in \fIbits\fP. | ||||
| .S2 "Unsigned integers" | ||||
| The range of unsigned integers is 0..2\v'-0.5m'\fBn\fP\v'0.5m'-1. | ||||
| A binary representation is assumed. | ||||
| The order of the bits within an object is knowingly left | ||||
| unspecified. | ||||
| Discussing bit order within each 8-bit byte is academic, | ||||
| so the only real freedom of this specification lies in the byte | ||||
| order. | ||||
| We really do not care whether an implementation of a 4-byte | ||||
| integer has its bytes in a particular order of significance. | ||||
| This of course means that some sequences of instructions have | ||||
| unpredictable effects. | ||||
| For example: | ||||
| .DS | ||||
|    LOC 258 ; STL 0 ; LAL 0 ; LOI 1      ( wordsize >=2 ) | ||||
| .DE | ||||
| The value on the stack after executing this sequence | ||||
| can be anything, | ||||
| but will most likely be 1 or 2. | ||||
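The dependence on byte order can be illustrated outside EM. The following is a minimal Python sketch, not EM code: the helper name `lowest_address_byte` is hypothetical, and the sketch assumes 8-bit bytes and a plain binary representation. It shows which byte of the value 258 ends up at the lowest address for the two common byte orders.

```python
def lowest_address_byte(value: int, wordsize: int, byteorder: str) -> int:
    """Return the byte stored at the lowest address of a wordsize-byte integer.

    Models the effect of 'LOC 258 ; STL 0 ; LAL 0 ; LOI 1' under the
    assumption that the word is stored in the given byte order.
    """
    return value.to_bytes(wordsize, byteorder)[0]


if __name__ == "__main__":
    # 258 is 0x0102: the low-order byte is 2, the next byte is 1.
    for order in ("little", "big"):
        print(order, lowest_address_byte(258, 2, order))
```

With a wordsize of 2 the sketch yields 2 for low-byte-first storage and 1 for high-byte-first storage, matching the two likely outcomes mentioned above. Other word sizes or byte orders can give other values.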
| .A | ||||
| Conversions between unsigned integers of different sizes have to | ||||
| be done with explicit convert instructions. | ||||
| One cannot simply pad an unsigned integer with zeros at either end | ||||
| and expect a correct result. | ||||
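Why padding is unreliable can be sketched in Python. This is an illustration under assumptions, not part of EM: the function `pad_and_reinterpret` and its parameters are hypothetical, and only the two common byte orders are modelled. Padding at one end gives the right value for one byte order and a wrong value for the other.

```python
def pad_and_reinterpret(value: int, old_size: int, new_size: int,
                        byteorder: str, pad_at_high_address: bool) -> int:
    """Widen an unsigned integer by adding zero bytes at one end of its
    memory image, then read the result back as a new_size-byte integer."""
    raw = value.to_bytes(old_size, byteorder)
    padding = bytes(new_size - old_size)  # all-zero bytes
    widened = raw + padding if pad_at_high_address else padding + raw
    return int.from_bytes(widened, byteorder)


if __name__ == "__main__":
    for order in ("little", "big"):
        for high in (True, False):
            print(order, "pad at high address" if high else "pad at low address",
                  pad_and_reinterpret(258, 2, 4, order, high))
```

Since EM leaves the byte order unspecified, a program cannot know which end to pad. Hence the explicit convert instructions.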
| .A | ||||
| We assume the existence of at least single word unsigned arithmetic | ||||
| in any implementation. | ||||
| .S2 "Signed integers" | ||||
| The range of signed integers is -2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~2\v'-0.5m'\fBn\fP-1\v'0.5m'-1, | ||||
| in other words the range of signed integers of \fBn\fP bits | ||||
| using two's complement arithmetic. | ||||
| The representation is the same as for unsigned integers except | ||||
| the range 2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~2\v'-0.5m'\fBn\fP\v'0.5m'-1 is mapped on the | ||||
| range -2\v'-0.5m'\fBn\fP-1\v'0.5m'~..~-1. | ||||
| In other words, the most significant bit is used as sign bit. | ||||
| The convert instructions between signed and unsigned integers | ||||
| of the same size can be used to catch errors. | ||||
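The mapping between the two ranges can be written out as a small Python sketch. The helper names `as_signed` and `as_unsigned` are hypothetical and only model the arithmetic described above for an object of n bits. They are not EM convert instructions and do no trapping.

```python
def as_signed(pattern: int, n: int) -> int:
    """Interpret an n-bit unsigned value as a two's complement signed value."""
    if not 0 <= pattern < 2 ** n:
        raise ValueError("pattern does not fit in n bits")
    # The upper half 2^(n-1) .. 2^n - 1 maps onto -2^(n-1) .. -1.
    return pattern - 2 ** n if pattern >= 2 ** (n - 1) else pattern


def as_unsigned(value: int, n: int) -> int:
    """Inverse mapping: n-bit signed value to its unsigned counterpart."""
    if not -(2 ** (n - 1)) <= value < 2 ** (n - 1):
        raise ValueError("value is outside the signed range for n bits")
    return value % 2 ** n


if __name__ == "__main__":
    print(as_signed(65535, 16), as_unsigned(-1, 16))
```

For n = 16, the unsigned value 65535 corresponds to -1, and 32768 corresponds to -32768, the smallest signed value.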
| .A | ||||
| The value -2\v'-0.5m'\fBn\fP-1\v'0.5m' is used for undefined | ||||
| signed integers. | ||||
| EM implementations should trap when this value is used in an | ||||
| operation on signed integers. | ||||
| The instruction mask, accessed with SIM and LIM -~see chapter 9~- , | ||||
| can be used to disable such traps. | ||||
| .A | ||||
| We assume the existence of at least single word signed arithmetic | ||||
| in any implementation. | ||||
| .BP | ||||
| .S2 "Floating point values" | ||||
| Floating point values must have a signed mantissa and a signed | ||||
| exponent. | ||||
| Although no base is specified, base 2 is the normal choice, | ||||
| because the FEF instruction pushes the exponent in base 2. | ||||
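What a base 2 exponent means can be shown with Python's standard `math.frexp`. This is only an analogy and an assumption on our part: it does not model the FEF instruction itself, and the normalization of the mantissa used by `math.frexp` is Python's, not necessarily EM's.

```python
import math


def split_base2(x: float) -> tuple[float, int]:
    """Split x into (mantissa, exponent) such that x == mantissa * 2**exponent."""
    return math.frexp(x)


if __name__ == "__main__":
    mantissa, exponent = split_base2(8.0)
    # The product reconstructs the original value exactly.
    print(mantissa, exponent, mantissa * 2 ** exponent)
```

An implementation that stores its exponents in another base would have to convert them to obtain such a base 2 split, which is why base 2 is the convenient choice.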
| .A | ||||
| The implementation of floating point arithmetic is optional. | ||||
| The compilers currently in use have runtime parameters for the | ||||
| size of the floating point values they should use. | ||||
| Common choices are 4 and/or 8 bytes. | ||||
| .S2 Pointers | ||||
| EM has two kinds of pointers: for instruction and for data | ||||
| space. | ||||
| Each kind can only be used for its own space; conversion between | ||||
| these two subtypes is impossible. | ||||
| We assume that pointers have a range from 0 upwards. | ||||
| Any implementation may have holes in the pointer range between | ||||
| fragments. | ||||
| One can of course not expect to be able to address two megabytes | ||||
| of memory using a 2-byte pointer. | ||||
| Normally, a 2-byte pointer allows up to 65536 bytes of | ||||
| addressable memory. | ||||
| .A | ||||
| Pointer representation has one restriction. | ||||
| The pointer with the same representation as the integer zero of | ||||
| the same size should be invalid. | ||||
| Some languages and/or runtime systems represent the nil | ||||
| pointer as zero. | ||||
| .S2 "Bit sets" | ||||
| All bit sets of size \fBn\fP are subsets of the set | ||||
| {~i~|~i>=0,~i<\fBn\fP~}. | ||||
| A bit set contains a bit for each element showing its | ||||
| presence or absence. | ||||
| Bit sets are subdivided into words. | ||||
| The word with the lowest EM address governs the subset | ||||
| {~i~|~i>=0,~i<\fBm\fP~}, where \fBm\fP is the number of bits in | ||||
| a word. | ||||
| The next higher words each govern the next higher \fBm\fP set elements. | ||||
| For a set the size of a word, the relation to an unsigned integer | ||||
| word is that the value of the unsigned integer is the sum of the | ||||
| 2\v'-0.5m'i\v'0.5m' for all i in the set. | ||||
| .A | ||||
| Example: a 2-word bit set (wordsize 2) containing the | ||||
| elements 1, 6, 8, 15, 18, 21, 27 and 28 is composed of two | ||||
| integers, e.g. at addresses 40 and 42. | ||||
| The word at 40 contains the value 33090 (or~-32446), | ||||
| the word at 42 contains the value 6180. | ||||
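The values in this example can be checked with a short Python sketch. The function `bitset_words` is hypothetical. It assumes the layout described above: the word at the lowest address governs elements 0 to m-1, and element i contributes 2 to the power (i mod m) to its word.

```python
def bitset_words(elements, word_count: int, bits_per_word: int) -> list[int]:
    """Return the unsigned word values of a bit set, lowest address first."""
    words = [0] * word_count
    for i in elements:
        if not 0 <= i < word_count * bits_per_word:
            raise ValueError(f"element {i} does not fit in the set")
        # Word index selects the word; the remainder selects the bit in it.
        words[i // bits_per_word] |= 1 << (i % bits_per_word)
    return words


if __name__ == "__main__":
    # Wordsize 2 means 16 bits per word; two words give room for 0..31.
    print(bitset_words([1, 6, 8, 15, 18, 21, 27, 28], 2, 16))
```

The first word is 2 + 64 + 256 + 32768 = 33090 and the second is 4 + 32 + 2048 + 4096 = 6180, in agreement with the text. Read as a signed 16-bit integer, 33090 is -32446.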