589 lines
		
	
	
	
		
			29 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			589 lines
		
	
	
	
		
			29 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
| .\"	Implementation details
 | |
| .\"
 | |
| .\"	$Header$
 | |
| .bp
 | |
| .NH
 | |
| IMPLEMENTATION DETAILS.
 | |
| .PP
 | |
| The pertinent issues are addressed below, in arbitrary order.
 | |
| .NH 2
 | |
| Stack manipulation and start-up
 | |
| .PP
 | |
| It is not at all easy to start the EM machine with the stack in a reasonable
 | |
| and consistent state.  One reason is the anomalous value of the ML register
 | |
| and another is the absence of a proper RSB.  It may be argued that the initial
 | |
| stack does not have to be in a consistent state, since the first instruction
 | |
| proper is only executed after \fIargc\fP, \fIargv\fP and \fIenviron\fP
 | |
| have been stacked (which takes care of the empty stack) and the initial
 | |
| procedure has been called (which creates a RSB).  We would, however, like to
 | |
| preform the stacking of these values and the calling of the initial procedure
 | |
| using the normal stack and call routines, which again require the stack to be
 | |
| in an acceptable state.
 | |
| .NH 3
 | |
| The anomalous value of the ML register
 | |
| .PP
 | |
| All registers in the EM machine point to word boundaries, and all of them,
 | |
| except ML, address the even-numbered byte at the boundary.
 | |
| The exception has a good reason: the even numbered byte at the ML boundary does
 | |
| not exist.
 | |
| This problem is not particular to EM but is inherent in the number system: the
 | |
| number of N-digit numbers can itself not be expressed in an N-digit number, and
 | |
| the number of addresses in an N-bit machine will itself not fit in an N-bit
 | |
| address.  The problem is solved in the interpreter by having ML point to the
 | |
| highest word boundary that has bytes on either side; this makes ML+1
 | |
| expressible.
 | |
| .NH 3
 | |
| The absence of an initial Return Status Block
 | |
| .PP
 | |
| When the stack is empty, there is no legal value for AB, since there are no
 | |
| actuals; LB can be set naturally to ML+1.  This is all right when the
 | |
| interpreter starts with a call of the initial routine which stores the value
 | |
| of LB in the first RSB, but causes problems when finally this call returns.  We
 | |
| want this call to return completely before stopping the interpreter, to check
 | |
| the integrity of the last RSB; restoring information from it will, however,
 | |
| cause illegal values to be stored in LB and AB (ML+1 and ML+1+rsbsize, resp.).
 | |
| On top of this, the initial (illegal) Procedure Identifier of the running
 | |
| procedure will be restored; then, upon restoring the likewise illegal PC will
 | |
| cause a check to see if it still is inside the running procedure.  After a few
 | |
| attempts at writing special cases, we have decided that it is possible, but not
 | |
| worth the effort; the final (= initial) RSB will not be unstacked.
 | |
| .NH 2
 | |
| Floating point numbers.
 | |
| .PP
 | |
| The interpreter is capable of working with 4- and 8-byte floating point (FP)
 | |
| numbers.
 | |
| In C-terms, this corresponds to objects of type float and double respectively.
 | |
| Both types fit in a C-double so the obvious way to manipulate these entities
 | |
| internally is in doubles.
 | |
| Pushing a 8-byte FP, all bytes of the C-double are pushed.
 | |
| Pushing a 4-byte FP causes the 4 bytes representing the smallest fraction
 | |
| to be discarded.
 | |
| .PP
 | |
| In EM, floats can be obtained in two different ways: via conversion
 | |
| of another type, or via initialization in the loadfile.
 | |
| Initialized floats are represented in the loadfile by an ASCII string in
 | |
| the syntax of a Pascal real (signed \fPUnsignedReal\fP).
 | |
| I.e. a float looks like:
 | |
| .DS
 | |
| [ \fISign\fP ] \fIDigit\fP+ [ . \fIDigit\fP+ ] [ \fIExp\fP [ \fISign\fP ] \fIDigit\fP+ ]                                (G1)
 | |
| .DE
 | |
| followed by a null byte.
 | |
| Here \fISign\fP = {+, \-}; \fIDigit\fP = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
 | |
| \fIExp\fP = {e, E}; [ \fIAnything\fP ] means that \fIAnything\fP is optional;
 | |
| and a + means one or more times.
 | |
| To accommodate some loose code generators, the actual grammar accepted is:
 | |
| .DS
 | |
| [ \fISign\fP ] \fIDigit\fP\(** [ . \fIDigit\fP\(** ] [ \fIExp\fP [ \fISign\fP ] \fIDigit\fP+ ]                                (G2)
 | |
| .DE
 | |
| followed by a null byte. Here \(** means zero or more times.  A floating
 | |
| denotation which is in G2 but not in G1 draws a warning, one that is not even
 | |
| in G2 causes a fatal error.
 | |
| .LP
 | |
| A string, representing a float which does not fit in a double causes a
 | |
| warning to be given.
 | |
| In that case, the returned value will be the double 0.0.
 | |
| .LP
 | |
| Floating point arithmetic is handled by some simple routines, checking for
 | |
| over/underflow, and returning appropriate values in case of an ignored error.
 | |
| .PP
 | |
| Since not all C compilers provide floating point operations, there is a
 | |
| compile time flag NOFLOAT, which, if defined, suppresses the use of all
 | |
| fp operations in the interpreter.  The resulting interpreter will still load
 | |
| EM files with floats in the global data area (and ignore them) but will give a
 | |
| fatal error upon attempt to execute a floating point instruction; consequently
 | |
| code involving floating point operations can be run as long as the actual
 | |
| instructions are avoided.
 | |
| .NH 2
 | |
| Pointers.
 | |
| .PP
 | |
| The following sub-sections both deal with problems concerning pointers.
 | |
| First, something is said about pointer arithmetic in general.
 | |
| Then, the null-pointer problem is dealt with.
 | |
| .NH 3
 | |
| Pointer arithmetic.
 | |
| .PP
 | |
| Strictly speaking, pointer arithmetic is defined only within a \fBfragment\fP.
 | |
| From the explanation of the term fragment however (as given in [1], page 3),
 | |
| it is not quite clear what a fragment should look like
 | |
| from an interpreter's point of view.
 | |
| For this reason we introduced the term \fBsegment\fP,
 | |
| bordering the various areas within which pointer arithmetic is allowed.
 | |
| Every stack-frame is a segment, and so are the global data area (GDA) and
 | |
| the heap area.
 | |
| Thus, the number of segments varies over time, and at some point in time is
 | |
| given by the number of currently active stack-frames
 | |
| (#CAL + #CAI \- #RET \- #RTT) plus 2 (gda, heap).
 | |
| Pointers in the area between heap and stack (which is inaccessible by
 | |
| definition), are assumed to be in the heap segment.
 | |
| .PP
 | |
| The interpreter, while building a new stack-frame (i.e. segment), stores the
 | |
| value of the last ActualBase in a pointer-array  (\fIAB_list[\ ]\fP).
 | |
| When a pointer (say \fIP\fP) is available for arithmetic, the number
 | |
| of the segment where it points (say \fIS\d\s-2P\s+2\u\fP),
 | |
| is determined first.
 | |
| Next, the arithmetic is performed, followed by a check on the number
 | |
| of the segment where the resulting pointer \fIR\fP points
 | |
| (say \fIS\d\s-2R\s+2\u\fP).
 | |
| Now, if \fIS\d\s-2P\s+2\u != S\d\s-2R\s+2\u\fP, a warning is given:
 | |
| \fBPointer arithmetic yields pointer to bad segment\fP.
 | |
| .br
 | |
| It may also be clear now, why the illegal area between heap and stack
 | |
| was joined with the heap segment.
 | |
| When calculating a new heap pointer (\fIHP\fP), one will obtain intermediate
 | |
| results being pointers in this area just before it is made legal.
 | |
| We do not want error messages all of the time, just because someone is
 | |
| allocating space in the heap.
 | |
| .LP
 | |
| A similar treatment is given to the pointers in the SBS instruction; they have
 | |
| to point into the same fragment for subtraction to be meaningful.
 | |
| .LP
 | |
| The length of the \fIAB_list[\ ]\fP is initially 100,
 | |
| and it is reallocated in the same way the dynamically growing partitions
 | |
| are (see 1.1).
 | |
| .NH 3
 | |
| Null pointer.
 | |
| .PP
 | |
| Because the EM language lacks an instruction for loading a null pointer,
 | |
| most programs solve this problem by loading a pointer-sized integer of
 | |
| value zero, and using this as a null pointer (this is also proposed in [1]).
 | |
| \fBInt\fP allows this, and will not complain.
 | |
| A warning is given however, when an attempt is made to add something to a
 | |
| null pointer (i.e. the pointer-sized integer zero).
 | |
| .LP
 | |
| Since many programming languages use a pointer to location 0 as an illegal
 | |
| value, it is desirable to detect its use.
 | |
| The big problem is though that 0 is a perfectly legal EM address;
 | |
| address 0 holds the current line number in the source file.  It may be freely
 | |
| read but is written only by means of the LIN instruction.  This allows us to
 | |
| declare the area consisting of the line number and the file name pointer to be
 | |
| read-only memory.  Thus a store will be caught (and result in a warning) but a
 | |
| read will succeed (and yield the EM information stored there).
 | |
| .NH 2
 | |
| Function Return Area (FRA).
 | |
| .PP
 | |
| The Function Return Area (\fIFRA[\ ]\fP) has a default size of 8 bytes;
 | |
| this default can
 | |
| be overridden through the use of the \fB\-r\fP-option, but cannot be
 | |
| made smaller than the size of two pointers, in accordance with the
 | |
| remark on page 5 of [1].
 | |
| The global variable \fIFRASize\fP keeps track of how many bytes were
 | |
| stored in the FRA, the last time a RET instruction was executed.
 | |
| The LFR instruction only works when its argument is equal to this size.
 | |
| If not, the FRA contents are loaded anyhow, but one of the following warnings
 | |
| is given:
 | |
| \fBReturned function result too large\fP (\fIFRASize\fP > LFR size) or
 | |
| \fBReturned function result too small\fP (\fIFRASize\fP < LFR size).
 | |
| .LP
 | |
| Note that a C-program, falling through the end of its code without doing
 | |
| a proper \fIreturn\fP or \fIexit()\fP, will generate this warning.
 | |
| .PP
 | |
| The only instructions that do not disturb the contents of the FRA are
 | |
| GTO, BRA, ASP and RET.
 | |
| This is expressed in the program by setting \fIFRA_def\fP to "undefined"
 | |
| in any instruction except these four.
 | |
| We realize this is a useless action most of the time, but a more
 | |
| efficient solution does not seem to be at hand.
 | |
| If a result is loaded when \fIFRA_def\fP is "undefined", the warning:
 | |
| \fBReturned function result may be garbled\fP is generated.
 | |
| .LP
 | |
| Note that the FRA needs a shadow-FRA in order to store the shadow
 | |
| information when performing a LFR instruction.
 | |
| .NH 2
 | |
| Environment interaction.
 | |
| .PP
 | |
| The EM machine represented by \fBint\fP can communicate with
 | |
| the environment in three different ways.
 | |
| A first possibility is by means of (UNIX) interrupts;
 | |
| the second by executing (relatively) high level system calls (called
 | |
| monitor calls).
 | |
| A third means of interaction, especially interesting for the debugging
 | |
| programmer, is via internal variables set on the command line.
 | |
| The former two techniques, and the way they are implemented will be described
 | |
| in this section.
 | |
| The latter has been allotted a separate section (3).
 | |
| .NH 3
 | |
| Traps and interrupts.
 | |
| .PP
 | |
| Simple user programs will generally not mess around with UNIX-signals.
 | |
| In interpreting these programs, the default actions will be taken
 | |
| when a signal is received by the program: it gives a message and
 | |
| stops running.
 | |
| .LP
 | |
| There are programs however, which try to handle certain signals
 | |
| themselves.
 | |
| In C, this is achieved by the system call \fIsignal(\ sig_no,\ catch\ )\fP,
 | |
| which calls the handling routine \fIcatch()\fP, as soon as signal
 | |
| \fBsig_no\fP occurs.
 | |
| EM does not provide this call; instead, the \fIsigtrp()\fP monitor call
 | |
| is available for mapping UNIX signals onto EM traps.
 | |
| This implies that a \fIsignal()\fP call in a C-program
 | |
| must be translated by the EM library routine to a \fIsigtrp()\fP call in EM.
 | |
| .PP
 | |
| The interpreter keeps an administration of the mapping of UNIX-signals
 | |
| onto EM traps in the array \fIsig_map[NSIG]\fP.
 | |
| Initially, the signals all have their default values.
 | |
| Now assume a \fIsigtrp()\fP occurs, telling to map signal \fBsig_no\fP onto
 | |
| trap \fBtrap_no\fP.
 | |
| This results in:
 | |
| .IP 1.
 | |
| setting the relevant array element
 | |
| \fIsig_map[sig_no]\fP to \fBtrap_no\fP (after saving the old value),
 | |
| .IP 2.
 | |
| catching the next to come \fBsig_no\fP signal with the handling routine
 | |
| \fIHndlEMSig\fP (by a plain UNIX \fIsignal()\fP of course), and
 | |
| .IP 3.
 | |
| returning the saved map-value on the stack so the user can know the previous
 | |
| trap value onto which \fBsig_no\fP was mapped.
 | |
| .LP
 | |
| On an incoming signal,
 | |
| the handling routine for signal \fBsig_no\fP arms the
 | |
| correct EM trap by calling the routine \fIarm_trap()\fP with argument
 | |
| \fIsig_map[sig_no]\fP.
 | |
| At the end of the EM instruction the proper call of \fItrap()\fP is done.
 | |
| \fITrap()\fP on its turn examines the value of the \fIHaltOnTrap\fP variable;
 | |
| if it is set, the interpreter will stop with a message. In the normal case of
 | |
| controlled trap handling this bit is not on and the interpreter examines
 | |
| the value of the \fITrapPI\fP variable,
 | |
| which contains the procedure identifier of the EM trap handling routine.
 | |
| It then initiates a call to this routine and performs a \fIlongjmp()\fP
 | |
| to the main
 | |
| loop to bypass all further processing of the instruction that caused the trap.
 | |
| \fITrapPI\fP should be set properly by the library routines, through the
 | |
| SIG instruction.
 | |
| .LP
 | |
| In short:
 | |
| .IP 1.
 | |
| A UNIX interrupt is caught by the interpreter.
 | |
| .IP 2.
 | |
| A handling routine is called which generates the corresponding EM trap
 | |
| (according to the mapping).
 | |
| .IP 3.
 | |
| The trap handler calls the corresponding EM routine which emulates a UNIX
 | |
| interrupt for the benefit of the interpreted program.
 | |
| .PP
 | |
| When considering UNIX signals, it is important to notice that some of them
 | |
| are real signals, i.e., messages coming from outside the program, like DEL
 | |
| and QUIT, but some are actually program-caused synchronous traps, like Illegal
 | |
| Instruction.  The latter, if they happen, are incurred by the interpreter
 | |
| itself and consequently are of no concern to the interpreted program: it
 | |
| cannot catch them.  The present code assumes that the UNIX signals between
 | |
| SIGILL (4) and SIGSYS (12) are really traps; \fIdo_sigtrp()\fP
 | |
| will fail on them.
 | |
| .LP
 | |
| To avoid losing the last line(s) of output files, the interpreter should
 | |
| always do a proper close-down, even in the presence of signals.  To this end,
 | |
| all non-ignored genuine signals are initially caught by the interpreter,
 | |
| through the routine \fIHndlIntSig\fP, which gives a message and preforms a
 | |
| proper close-down.
 | |
| Synchronous trap can only be caused by the interpreter itself; they are never
 | |
| caught, and consequently the UNIX default action prevails.  Generally they
 | |
| cause a core dump.
 | |
| Signals requested by the interpreted program are caught by the routine
 | |
| \fIHndlEMSig\fP, as explained above.
 | |
| .NH 3
 | |
| Monitor calls.
 | |
| .PP
 | |
| For the convenience of the programmer, as many monitor calls as possible
 | |
| have been implemented.
 | |
| The list of monitor calls given in [1] pages 20/21, has been implemented
 | |
| completely, except for \fIptrace()\fP, \fIprofil()\fP and \fImpxcall()\fP.
 | |
| The semantics of \fIptrace()\fP and \fIprofil()\fP from an interpreted program
 | |
| is unclear; the data structure passed to \fImpxcall()\fP is non-trivial
 | |
| and the system call has low portability and applicability.
 | |
| For these calls, on invocation a warning is generated, and the arguments which
 | |
| were meant for the call are popped properly, so the program can continue
 | |
| without the stack being messed up.
 | |
| The errorcode 5 (IOERROR) is pushed onto the stack (twice), in order to
 | |
| fake an unsuccessful monitor call.
 | |
| No other \- more meaningful \- errorcode is available in the errno-list.
 | |
| .LP
 | |
| Now for the implemented monitor calls.
 | |
| The returned value is zero for a successful call.
 | |
| When something goes wrong, the value of the external \fIerrno\fP variable
 | |
| is pushed, thus enabling the user to find out what the reason of failure was.
 | |
| The implementation of the majority of the monitor calls is straightforward.
 | |
| Those working with a special format buffer, (e.g. \fIioctl()\fP,
 | |
| \fItime()\fP and \fIstat()\fP variants), need some extra attention.
 | |
| This is due to the fact that working with varying word/pointer size
 | |
| combinations may cause alignment problems.
 | |
| .LP
 | |
| The data structure returned by the UNIX system call results from
 | |
| C code that has been translated with the regular C compiler, which,
 | |
| on the VAX, happens to be a 4-4 compiler.
 | |
| The data structure expected by the interpreted program conforms
 | |
| to the translation by \fBack\fP of the pertinent include file.
 | |
| Depending on the exact call of \fBack\fP, sizes and alignment may differ.
 | |
| .LP
 | |
| An example is in order. The EM MON 18 instruction in the interpreted program
 | |
| leads to a UNIX \fIstat()\fP system call by the interpreter.
 | |
| This call fills the given struct with stat information, the contents
 | |
| and alignments of which are determined by the version of UNIX and the
 | |
| used C compiler, resp.
 | |
| The interpreter, like any program wishing to do system calls that fill
 | |
| structs, has to be translated by a C compiler that uses the
 | |
| appropriate struct definition and alignments, so that it can use, e.g.,
 | |
| \fIstab.st_mtime\fP and expect to obtain the right field.
 | |
| This struct cannot be copied directly to the EM memory to fulfill the
 | |
| MON instruction.
 | |
| First, the struct may contain extraneous, system-dependent fields,
 | |
| pertaining, e.g., to symbolic links, sockets, etc.
 | |
| Second, it may contain holes, due to alignment requirements.
 | |
| The EM program runs on an EM machine, knows nothing about these
 | |
| requirements and expects UNIX Version 7 fields, with offsets as
 | |
| determined by the em22, em24 or em44 compiler, resp.
 | |
| To do the conversion, the interpreter has a built-in table of the
 | |
| offsets of all the fields in the structs that are filled by the MON
 | |
| instruction.
 | |
| The appropriate fields from the result of the UNIX \fIstat()\fP are copied
 | |
| one by one to the appropriate positions in the EM memory to be filled
 | |
| by MON 18.
 | |
| .PP
 | |
| The \fIioctl()\fP call (MON 54) poses additional problems. Not only does it
 | |
| have a second argument which is a pointer to a struct, the type of
 | |
| which is dynamically determined, but its first argument is an opcode
 | |
| that varies considerably between the versions of UNIX.
 | |
| To solve the first problem, the interpreter examines the opcode (request) and
 | |
| treats the second argument accordingly.  The second problem can be solved by
 | |
| translating the UNIX Version 7 \fIioctl()\fP request codes to their proper
 | |
| values on the various systems.  This is, however, not always useful, since
 | |
| some EM run-time systems use the local request codes.  There is a compile-time
 | |
| flag, V7IOCTL, which, if defined, will restrict the \fIioctl()\fP call to the
 | |
| version 7 request codes and emulate them on the local system; otherwise the
 | |
| request codes of the local system will be used (as far as implemented).
 | |
| .PP
 | |
| Minor problems also showed up with the implementation of \fIexecve()\fP
 | |
| and \fIfork()\fP.
 | |
| \fIExecve()\fP expects three pointers on the stack.
 | |
| The first points to the name of the program to be executed,
 | |
| the second and third are the beginnings of the \fBargv\fP and \fBenvp\fP
 | |
| pointer arrays respectively.
 | |
| We cannot pass these pointers to the system call however, because
 | |
| the EM addresses to which they point do not correspond with UNIX
 | |
| addresses.
 | |
| Moreover, (it is not very likely to happen but) what if someone constructs
 | |
| a program holding the contents for one of these pointers in the stack?
 | |
| The stack is implemented upside down, so passing the pointer to
 | |
| \fIexecve()\fP causes trouble for this reason too.
 | |
| The only solution was to copy the pointer contents completely
 | |
| to fresh UNIX memory, constructing vectors which can be passed to the
 | |
| system call.
 | |
| Any impending memory fault while making these copies results in failure of the
 | |
| system call, with \fIerrno\fP set to EFAULT.
 | |
| .PP
 | |
| The implementation of the \fIfork()\fP call faced us with problems
 | |
| concerning IO-channels.
 | |
| Checking messages (as well as logging) must be divided over different files.
 | |
| Otherwise, these messages will coincide.
 | |
| This problem was solved by post-fixing the default message file
 | |
| \fBint.mess\fP (as well as the logging file \fBint.log\fP) with an
 | |
| automatically leveled number for every new forked process.
 | |
| Children of the original process do their diagnostics
 | |
| in files with postfix 1,2,3 etc.
 | |
| Second generation processes are assigned files numbered 11, 12, 21 etc.
 | |
| When 6 generations of processes exist at one moment, the seventh will
 | |
| get the same message file as the sixth, for the length of the filename
 | |
| will become too long.
 | |
| .PP
 | |
| Some of the monitor calls receive pointers (addresses) from to program, to be
 | |
| passed to the kernel; examples are the struct stat for \fIstat()\fP, the area
 | |
| to be filled for \fIread()\fP, etc. If the address is wrong, the kernel does
 | |
| not generate a trap, but rather the system call returns with failure, while
 | |
| \fIerrno\fP is set to EFAULT.  This is implemented by consistent checking of
 | |
| all pointers in the MON instruction.
 | |
| .NH 2
 | |
| Internal arithmetic.
 | |
| .PP
 | |
| Doing arithmetic on signed integers, the smallest negative integer
 | |
| (\fIminsint\fP) is considered a legal value.
 | |
| This is in contradiction with the EM Manual [1], page 14, which proposes using
 | |
| \fIminsint\fP for uninitialized integers.
 | |
| The shadow bytes already check for uninitialized integers however,
 | |
| so we do not need this special illegal value.
 | |
| Although the EM Manual provides two traps, for undefined integers and floats,
 | |
| undefined objects occur so frequently (e.g. in block copying partially
 | |
| initialized areas) that the interpreter just gives a warning.
 | |
| .LP
 | |
| Except for arithmetic on unsigneds, all arithmetic checks for overflow.
 | |
| The value that is pushed on the stack after an overflow occurs depends
 | |
| on the UNIX behavior with regard to that particular calculation.
 | |
| If UNIX would not accept the calculation (e.g. division by zero), a zero
 | |
| is pushed as a convention.
 | |
| Illegal computations which UNIX does accept in silence (e.g. one's
 | |
| complement of \fIminsint\fP), simply push the UNIX-result after giving a
 | |
| trap message.
 | |
| .NH 2
 | |
| Shadow bytes implementation.
 | |
| .PP
 | |
| A great deal of run-time checking is performed by the interpreter (except if
 | |
| used in the fast version).
 | |
| This section gives all details about the shadow bytes.
 | |
| In order to keep track of information about the contents of D-space (stack
 | |
| and global data area), there is one shadow-byte for each byte in these spaces.
 | |
| Each bit in a shadow-byte represents some piece
 | |
| of information about the contents of its corresponding 'sun-byte'.
 | |
| All bits off indicates an undefined sun-byte.
 | |
| One or more bits on always guarantees a well-defined sun-byte.
 | |
| The bits have the following meaning:
 | |
| .IP "\(bu bit 0:" 8
 | |
| indicates that the sun-byte is (a part of) an integer.
 | |
| .IP "\(bu bit 1:" 8
 | |
| the sun-byte is a part of a floating point number.
 | |
| .IP "\(bu bit 2:" 8
 | |
| the sun-byte is a part of a pointer in dataspace.
 | |
| .IP "\(bu bit 3:" 8
 | |
| the sun-byte is a part of a pointer in the instruction space.
 | |
| According to [1] (paragraph 6.4), there are two types pointers which
 | |
| must be distinguishable.
 | |
| Conversion between these two types is impossible.
 | |
| The shadow-bytes make the distinction here.
 | |
| .IP "\(bu bit 4:" 8
 | |
| protection bit.
 | |
| Indicates that the sun-byte is part of a protected piece of memory.
 | |
| There is a protected area in the stack, the Return Status Block.
 | |
| The EM machine language has no possibility to declare protected
 | |
| memory, as is possible in EM assembly (the ROM instruction).  The protection
 | |
| bit is, however, set for the line number and filename pointer area near
 | |
| location 0, to aid in catching references to location 0.
 | |
| .IP "\(bu bit 5/6/7:" 8
 | |
| free for later use.
 | |
| .LP
 | |
| The shadow bytes are managed by the routines declared in \fIshadow.h\fP.
 | |
| The warnings originating from checking these shadow-bytes during
 | |
| run-time are various.
 | |
| A list of them is given in appendix A, together with suggestions
 | |
| (primarily for the C-programmer) where to look for the trouble maker(s).
 | |
| .LP
 | |
| A point to notice is, that once a warning is generated, it may be repeated
 | |
| thousands of times.
 | |
| Since repetitive warnings carry little information, but consume much
 | |
| file space, the interpreter keeps track of the number of times a given warning
 | |
| has been produced from a given line in a given file.
 | |
| The warning message will
 | |
| be printed only if the corresponding counter is a power of four (starting at
 | |
| 1).  In this way, a logarithmic back-off in warning generation is established.
 | |
| .LP
 | |
| It might be argued that the counter should be kept for each (warning, PC
 | |
| value) pair rather than for each (warning, file position) pair.  Suppose,
 | |
| however, that two instruction in a given line would cause the same message
 | |
| regularly; this would produce two intertwined streams of identical messages,
 | |
| with their counters jumping up and down.  This does not seem desirable.
 | |
| .NH 2
 | |
| Return Status Block (RSB)
 | |
| .PP
 | |
| According to the description in [1], at least the return address and the
 | |
| base address of the previous RSB have to be pushed when performing a call.
 | |
| Besides these two pointers, other information can be stored in the RSB
 | |
| also.
 | |
| The interpreter pushes the following items:
 | |
| .IP \-
 | |
| a pointer to the current filename,
 | |
| .IP \-
 | |
| the current line number (always four bytes),
 | |
| .IP \-
 | |
| the Local Base,
 | |
| .IP \-
 | |
| the return address (Program Counter),
 | |
| .IP \-
 | |
| the current procedure identifier
 | |
| .IP \-
 | |
| the RSB code, which distinguishes between initial start-up, normal call,
 | |
| returnable trap and non-returnable trap (a word-size integer).
 | |
| .LP
 | |
| Consequently, the size of the RSB varies, depending on
 | |
| word size and pointer size; its value is available as \fIrsbsize\fP.
 | |
| When the RSB is removed from the stack (by a RET or RTT) the RSB code is under
 | |
| the Stack Pointer for immediate checking.  It is not clear what should be done
 | |
| if RSB code and return instruction do not match; at present we give a message
 | |
| and continue, for what it is worth.
 | |
| .PP
 | |
| The reason for pushing filename and line number is that some front-ends tend
 | |
| to forget the LIN and FIL instructions after returning from a function.
 | |
| This may result in error messages in wrong source files and/or line numbers.
 | |
| .PP
 | |
| The procedure identifier is kept and restored to check that the PC will not
 | |
| move out of the running procedure.  The PI is an index in the proctab, which
 | |
| tells the limits in the text segment of the running procedure.
 | |
| .PP
 | |
| If the Return Status Block is generated as a result of a trap, more is
 | |
| stacked.  Before stacking the normal RSB, the trap function pushes the
 | |
| following items:
 | |
| .IP \-
 | |
| the contents of the entire Function Return Area,
 | |
| .IP \-
 | |
| the number of bytes significant in the above (a word-size integer),
 | |
| .IP \-
 | |
| a word-size flag indicating if the contents of the FRA are valid,
 | |
| .IP \-
 | |
| the trap number (a word-size integer).
 | |
| .LP
 | |
| The latter is followed directly by the RSB, and consequently acts as the only
 | |
| parameter to the trap handler.
 | |
| .NH 2
 | |
| Operand access.
 | |
| .PP
 | |
| The EM Manual mentions two ways to access the operands of an instruction.  It
 | |
| should be noticed that the operand in EM is often not the direct operand of the
 | |
| operation; the operand of the ADI instruction, e.g., is the width of the
 | |
| integers to be added, not one of the integers themselves.  The various operand
 | |
| types are described in [1].  Each opcode in the text segment identifies an
 | |
| instruction with a particular operand type; these relations are described in
 | |
| computer-readable format in a file in the EM tree, \fIip_spec.t\fP.
 | |
| .PP
 | |
| The interpreter uses the third method.  Several other approaches
 | |
| can be designed, with increasing efficiency and equally increasing complexity.
 | |
| They are briefly treated below.
 | |
| .NH 3
 | |
| The Dispatch Table, Method 1.
 | |
| .PP
 | |
| When the interpreter starts, it reads the ip_spec.t file and constructs from it
 | |
| a dispatch table.  This table (of which there are actually three,
 | |
| for primary, secondary
 | |
| and tertiary opcodes) has 256 entries, each describing an instruction with
 | |
| indications on how to decode the operand.  For each instruction executed, the
 | |
| interpreter finds the entry in the dispatch table, finds information there on
 | |
| how to access the operand, constructs the operand and calls the appropriate
 | |
| routine with the operand as calculated.  There is one routine for each
 | |
| instruction, which is called with the ready-made operand.  Method 1 is easy to
 | |
| program but requires constant interpretation of the dispatch table.
 | |
| .NH 3
 | |
| Intelligent Routines, Method 2.
 | |
| .PP
 | |
| For each opcode there is a separate routine, and since an opcode uniquely
 | |
| defines the instruction and the operand format, the routine knows how to get
 | |
| the operand; this knowledge is built into the routine.  Preferably the heading
 | |
| of the routine is generated automatically from the ip_spec.t file.  Operand
 | |
| decoding is immediate, and no dispatch table is needed.  Generation of the
 | |
| 469 required routines is, however, far from simple.  Either a generated array
 | |
| of routine names or a generated switch statement is used to map the opcode onto
 | |
| the correct routine.  The switch approach has the advantage that parameters can
 | |
| be passed to the routines.
 | |
| .NH 3
 | |
| Intelligent Calls, Method 3.
 | |
| .PP
 | |
| The call in the switch statement does full operand construction, and the
 | |
| resulting operand is passed to the routine.  This reduces the number of
 | |
| routines to 133, the number of EM instructions.  Generation of the switch
 | |
| statement from ip_spec.t is more complicated, but the routine space is
 | |
| much cleaner.  This does not give any speed-up since the same actions are still
 | |
| required; they are just performed in a different place.
 | |
| .NH 3
 | |
| Static Evaluation.
 | |
| .PP
 | |
| It can be observed that the evaluation of the operand of a given instruction in
 | |
| the text segment will always give the same result.  It is therefore possible to
 | |
| preprocess the text segment, decomposing the instructions into structs which
 | |
| contain the address, the instruction code and the operand.  No operand decoding
 | |
| will be necessary at run-time: all operands have been precalculated.  This will
 | |
| probably give a considerable speed-up.  Jumps, especially GTO jumps, will,
 | |
| however, require more attention.
 | |
| .NH 2
 | |
| Disassembly.
 | |
| .PP
 | |
| A disassembly facility is available, which gives a readable but not
 | |
| letter-perfect disassembly of the EM object.  The procedure structure is
 | |
| indicated by placing the indication  \fBP[n]\fP  at the entry point of each
 | |
| procedure, where \fBn\fP is the procedure identifier.  The number of locals is
 | |
| given in a comment.
 | |
| .LP
 | |
| The disassembler was generated by the software in the directory \fIswitch\fP
 | |
| and then further processed by hand.
 |