595 lines
		
	
	
	
		
			29 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			595 lines
		
	
	
	
		
			29 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
.\"	Implementation details
 | 
						|
.\"
 | 
						|
.\"	$Header$
 | 
						|
.bp
 | 
						|
.NH
 | 
						|
IMPLEMENTATION DETAILS.
 | 
						|
.PP
 | 
						|
The pertinent issues are addressed below, in arbitrary order.
 | 
						|
.NH 2
 | 
						|
Stack manipulation and start-up
 | 
						|
.PP
 | 
						|
It is not at all easy to start the EM machine with the stack in a reasonable
 | 
						|
and consistent state.  One reason is the anomalous value of the ML register
 | 
						|
and another is the absence of a proper RSB.  It may be argued that the initial
 | 
						|
stack does not have to be in a consistent state, since the first instruction
 | 
						|
proper is only executed after \fIargc\fP, \fIargv\fP and \fIenviron\fP
 | 
						|
have been stacked (which takes care of the empty stack) and the initial
 | 
						|
procedure has been called (which creates a RSB).  We would, however, like to
 | 
						|
preform the stacking of these values and the calling of the initial procedure
 | 
						|
using the normal stack and call routines, which again require the stack to be
 | 
						|
in an acceptable state.
 | 
						|
.NH 3
 | 
						|
The anomalous value of the ML register
 | 
						|
.PP
 | 
						|
All registers in the EM machine point to word boundaries, and all of them,
 | 
						|
except ML, address the even-numbered byte at the boundary.
 | 
						|
The exception has a good reason: the even numbered byte at the ML boundary does
 | 
						|
not exist.
 | 
						|
This problem is not particular to EM but is inherent in the number system: the
 | 
						|
number of N-digit numbers can itself not be expressed in an N-digit number, and
 | 
						|
the number of addresses in an N-bit machine will itself not fit in an N-bit
 | 
						|
address.  The problem is solved in the interpreter by having ML point to the
 | 
						|
highest word boundary that has bytes on either side; this makes ML+1
 | 
						|
expressible.
 | 
						|
.NH 3
 | 
						|
The absence of an initial Return Status Block
 | 
						|
.PP
 | 
						|
When the stack is empty, there is no legal value for AB, since there are no
 | 
						|
actuals; LB can be set naturally to ML+1.  This is all right when the
 | 
						|
interpreter starts with a call of the initial routine which stores the value
 | 
						|
of LB in the first RSB, but causes problems when finally this call returns.  We
 | 
						|
want this call to return completely before stopping the interpreter, to check
 | 
						|
the integrity of the last RSB; restoring information from it will, however,
 | 
						|
cause illegal values to be stored in LB and AB (ML+1 and ML+1+rsbsize, resp.).
 | 
						|
On top of this, the initial (illegal) Procedure Identifier of the running
 | 
						|
procedure will be restored; then, upon restoring the likewise illegal PC will
 | 
						|
cause a check to see if it still is inside the running procedure.  After a few
 | 
						|
attempts at writing special cases, we have decided that it is possible, but not
 | 
						|
worth the effort; the final (= initial) RSB will not be unstacked.
 | 
						|
.NH 2
 | 
						|
Floating point numbers.
 | 
						|
.PP
 | 
						|
The interpreter is capable of working with 4- and 8-byte floating point (FP)
 | 
						|
numbers.
 | 
						|
In C-terms, this corresponds to objects of type float and double respectively.
 | 
						|
Both types fit in a C-double so the obvious way to manipulate these entities
 | 
						|
internally is in doubles.
 | 
						|
Pushing a 8-byte FP, all bytes of the C-double are pushed.
 | 
						|
Pushing a 4-byte FP causes the 4 bytes representing the smallest fraction
 | 
						|
to be discarded.
 | 
						|
.PP
 | 
						|
In EM, floats can be obtained in two different ways: via conversion
 | 
						|
of another type, or via initialization in the loadfile.
 | 
						|
Initialized floats are represented in the loadfile by an ASCII string in
 | 
						|
the syntax of a Pascal real (signed \fPUnsignedReal\fP).
 | 
						|
I.e. a float looks like:
 | 
						|
.DS
 | 
						|
[ \fISign\fP ] \fIDigit\fP+ [ . \fIDigit\fP+ ] [ \fIExp\fP [ \fISign\fP ] \fIDigit\fP+ ]                                (G1)
 | 
						|
.DE
 | 
						|
followed by a null byte.
 | 
						|
Here \fISign\fP = {+, \-}; \fIDigit\fP = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
 | 
						|
\fIExp\fP = {e, E}; [ \fIAnything\fP ] means that \fIAnything\fP is optional;
 | 
						|
and a + means one or more times.
 | 
						|
To accommodate some loose code generators, the actual grammar accepted is:
 | 
						|
.DS
 | 
						|
[ \fISign\fP ] \fIDigit\fP\(** [ . \fIDigit\fP\(** ] [ \fIExp\fP [ \fISign\fP ] \fIDigit\fP+ ]                                (G2)
 | 
						|
.DE
 | 
						|
followed by a null byte. Here \(** means zero or more times.  A floating
 | 
						|
denotation which is in G2 but not in G1 draws a warning, one that is not even
 | 
						|
in G2 causes a fatal error.
 | 
						|
.LP
 | 
						|
A string, representing a float which does not fit in a double causes a
 | 
						|
warning to be given.
 | 
						|
In that case, the returned value will be the double 0.0.
 | 
						|
.LP
 | 
						|
Floating point arithmetic is handled by some simple routines, checking for
 | 
						|
over/underflow, and returning appropriate values in case of an ignored error.
 | 
						|
.PP
 | 
						|
Since not all C compilers provide floating point operations, there is a
 | 
						|
compile time flag NOFLOAT, which, if defined, suppresses the use of all
 | 
						|
fp operations in the interpreter.  The resulting interpreter will still load
 | 
						|
EM files with floats in the global data area (and ignore them) but will give a
 | 
						|
fatal error upon attempt to execute a floating point instruction; consequently
 | 
						|
code involving floating point operations can be run as long as the actual
 | 
						|
instructions are avoided.
 | 
						|
.NH 2
 | 
						|
Pointers.
 | 
						|
.PP
 | 
						|
The following sub-sections both deal with problems concerning pointers.
 | 
						|
First, something is said about pointer arithmetic in general.
 | 
						|
Then, the null-pointer problem is dealt with.
 | 
						|
.NH 3
 | 
						|
Pointer arithmetic.
 | 
						|
.PP
 | 
						|
Strictly speaking, pointer arithmetic is defined only within a \fBfragment\fP.
 | 
						|
From the explanation of the term fragment however (as given in [1], page 3),
 | 
						|
it is not quite clear what a fragment should look like
 | 
						|
from an interpreter's point of view.
 | 
						|
For this reason we introduced the term \fBsegment\fP,
 | 
						|
bordering the various areas within which pointer arithmetic is allowed.
 | 
						|
Every stack-frame is a segment, and so are the global data area (GDA) and
 | 
						|
the heap area.
 | 
						|
Thus, the number of segments varies over time, and at some point in time is
 | 
						|
given by the number of currently active stack-frames
 | 
						|
(#CAL + #CAI \- #RET \- #RTT) plus 2 (gda, heap).
 | 
						|
Pointers in the area between heap and stack (which is inaccessible by
 | 
						|
definition), are assumed to be in the heap segment.
 | 
						|
.PP
 | 
						|
The interpreter, while building a new stack-frame (i.e. segment), stores the
 | 
						|
value of the last ActualBase in a pointer-array  (\fIAB_list[\ ]\fP).
 | 
						|
When a pointer (say \fIP\fP) is available for arithmetic, the number
 | 
						|
of the segment where it points (say \fIS\d\s-2P\s+2\u\fP),
 | 
						|
is determined first.
 | 
						|
Next, the arithmetic is performed, followed by a check on the number
 | 
						|
of the segment where the resulting pointer \fIR\fP points
 | 
						|
(say \fIS\d\s-2R\s+2\u\fP).
 | 
						|
Now, if \fIS\d\s-2P\s+2\u != S\d\s-2R\s+2\u\fP, a warning is given:
 | 
						|
\fBPointer arithmetic yields pointer to bad segment\fP.
 | 
						|
.br
 | 
						|
It may also be clear now, why the illegal area between heap and stack
 | 
						|
was joined with the heap segment.
 | 
						|
When calculating a new heap pointer (\fIHP\fP), one will obtain intermediate
 | 
						|
results being pointers in this area just before it is made legal.
 | 
						|
We do not want error messages all of the time, just because someone is
 | 
						|
allocating space in the heap.
 | 
						|
.LP
 | 
						|
A similar treatment is given to the pointers in the SBS instruction; they have
 | 
						|
to point into the same fragment for subtraction to be meaningful.
 | 
						|
.LP
 | 
						|
The length of the \fIAB_list[\ ]\fP is initially 100,
 | 
						|
and it is reallocated in the same way the dynamically growing partitions
 | 
						|
are (see 1.1).
 | 
						|
.NH 3
 | 
						|
Null pointer.
 | 
						|
.PP
 | 
						|
Because the EM language lacks an instruction for loading a null pointer,
 | 
						|
most programs solve this problem by loading a pointer-sized integer of
 | 
						|
value zero, and using this as a null pointer (this is also proposed in [1]).
 | 
						|
\fBInt\fP allows this, and will not complain.
 | 
						|
A warning is given however, when an attempt is made to add something to a
 | 
						|
null pointer (i.e. the pointer-sized integer zero).
 | 
						|
.LP
 | 
						|
Since many programming languages use a pointer to location 0 as an illegal
 | 
						|
value, it is desirable to detect its use.
 | 
						|
The big problem is though that 0 is a perfectly legal EM address;
 | 
						|
address 0 holds the current line number in the source file.  It may be freely
 | 
						|
read but is written only by means of the LIN instruction.  This allows us to
 | 
						|
declare the area consisting of the line number and the file name pointer to be
 | 
						|
read-only memory.  Thus a store will be caught (and result in a warning) but a
 | 
						|
read will succeed (and yield the EM information stored there).
 | 
						|
.NH 2
 | 
						|
Function Return Area (FRA).
 | 
						|
.PP
 | 
						|
The Function Return Area (\fIFRA[\ ]\fP) has a default size of 8 bytes;
 | 
						|
this default can
 | 
						|
be overridden through the use of the \fB\-r\fP-option, but cannot be
 | 
						|
made smaller than the size of two pointers, in accordance with the
 | 
						|
remark on page 5 of [1].
 | 
						|
The global variable \fIFRASize\fP keeps track of how many bytes were
 | 
						|
stored in the FRA, the last time a RET instruction was executed.
 | 
						|
The LFR instruction only works when its argument is equal to this size.
 | 
						|
If not, the FRA contents are loaded anyhow, but one of the following warnings
 | 
						|
is given:
 | 
						|
\fBReturned function result too large\fP (\fIFRASize\fP > LFR size) or
 | 
						|
\fBReturned function result too small\fP (\fIFRASize\fP < LFR size).
 | 
						|
.LP
 | 
						|
Note that a C-program, falling through the end of its code without doing
 | 
						|
a proper \fIreturn\fP or \fIexit()\fP, will generate this warning.
 | 
						|
.PP
 | 
						|
The only instructions that do not disturb the contents of the FRA are
 | 
						|
GTO, BRA, ASP and RET.
 | 
						|
This is expressed in the program by setting \fIFRA_def\fP to "undefined"
 | 
						|
in any instruction except these four.
 | 
						|
We realize this is a useless action most of the time, but a more
 | 
						|
efficient solution does not seem to be at hand.
 | 
						|
If a result is loaded when \fIFRA_def\fP is "undefined", the warning:
 | 
						|
\fBReturned function result may be garbled\fP is generated.
 | 
						|
.LP
 | 
						|
Note that the FRA needs a shadow-FRA in order to store the shadow
 | 
						|
information when performing a LFR instruction.
 | 
						|
.NH 2
 | 
						|
Environment interaction.
 | 
						|
.PP
 | 
						|
The EM machine represented by \fBint\fP can communicate with
 | 
						|
the environment in three different ways.
 | 
						|
A first possibility is by means of (UNIX) interrupts;
 | 
						|
the second by executing (relatively) high level system calls (called
 | 
						|
monitor calls).
 | 
						|
A third means of interaction, especially interesting for the debugging
 | 
						|
programmer, is via internal variables set on the command line.
 | 
						|
The former two techniques, and the way they are implemented will be described
 | 
						|
in this section.
 | 
						|
The latter has been allotted a separate section (3).
 | 
						|
.NH 3
 | 
						|
Traps and interrupts.
 | 
						|
.PP
 | 
						|
Simple user programs will generally not mess around with UNIX-signals.
 | 
						|
In interpreting these programs, the default actions will be taken
 | 
						|
when a signal is received by the program: it gives a message and
 | 
						|
stops running.
 | 
						|
.LP
 | 
						|
There are programs however, which try to handle certain signals
 | 
						|
themselves.
 | 
						|
In C, this is achieved by the system call \fIsignal(\ sig_no,\ catch\ )\fP,
 | 
						|
which calls the handling routine \fIcatch()\fP, as soon as signal
 | 
						|
\fBsig_no\fP occurs.
 | 
						|
EM does not provide this call; instead, the \fIsigtrp()\fP monitor call
 | 
						|
is available for mapping UNIX signals onto EM traps.
 | 
						|
This implies that a \fIsignal()\fP call in a C-program
 | 
						|
must be translated by the EM library routine to a \fIsigtrp()\fP call in EM.
 | 
						|
.PP
 | 
						|
The interpreter keeps an administration of the mapping of UNIX-signals
 | 
						|
onto EM traps in the array \fIsig_map[NSIG]\fP.
 | 
						|
Initially, the signals all have their default values.
 | 
						|
Now assume a \fIsigtrp()\fP occurs, telling to map signal \fBsig_no\fP onto
 | 
						|
trap \fBtrap_no\fP.
 | 
						|
This results in:
 | 
						|
.IP 1.
 | 
						|
setting the relevant array element
 | 
						|
\fIsig_map[sig_no]\fP to \fBtrap_no\fP (after saving the old value),
 | 
						|
.IP 2.
 | 
						|
catching the next to come \fBsig_no\fP signal with the handling routine
 | 
						|
\fIHndlEMSig\fP (by a plain UNIX \fIsignal()\fP of course), and
 | 
						|
.IP 3.
 | 
						|
returning the saved map-value on the stack so the user can know the previous
 | 
						|
trap value onto which \fBsig_no\fP was mapped.
 | 
						|
.LP
 | 
						|
On an incoming signal,
 | 
						|
the handling routine for signal \fBsig_no\fP arms the
 | 
						|
correct EM trap by calling the routine \fIarm_trap()\fP with argument
 | 
						|
\fIsig_map[sig_no]\fP.
 | 
						|
At the end of the EM instruction the proper call of \fItrap()\fP is done.
 | 
						|
\fITrap()\fP on its turn examines the value of the \fIHaltOnTrap\fP variable;
 | 
						|
if it is set, the interpreter will stop with a message. In the normal case of
 | 
						|
controlled trap handling this bit is not on and the interpreter examines
 | 
						|
the value of the \fITrapPI\fP variable,
 | 
						|
which contains the procedure identifier of the EM trap handling routine.
 | 
						|
It then initiates a call to this routine and performs a \fIlongjmp()\fP
 | 
						|
to the main
 | 
						|
loop to bypass all further processing of the instruction that caused the trap.
 | 
						|
\fITrapPI\fP should be set properly by the library routines, through the
 | 
						|
SIG instruction.
 | 
						|
.LP
 | 
						|
In short:
 | 
						|
.IP 1.
 | 
						|
A UNIX interrupt is caught by the interpreter.
 | 
						|
.IP 2.
 | 
						|
A handling routine is called which generates the corresponding EM trap
 | 
						|
(according to the mapping).
 | 
						|
.IP 3.
 | 
						|
The trap handler calls the corresponding EM routine which emulates a UNIX
 | 
						|
interrupt for the benefit of the interpreted program.
 | 
						|
.PP
 | 
						|
When considering UNIX signals, it is important to notice that some of them
 | 
						|
are real signals, i.e., messages coming from outside the program, like DEL
 | 
						|
and QUIT, but some are actually program-caused synchronous traps, like Illegal
 | 
						|
Instruction.  The latter, if they happen, are incurred by the interpreter
 | 
						|
itself and consequently are of no concern to the interpreted program: it
 | 
						|
cannot catch them.  The present code assumes that the UNIX signals between
 | 
						|
SIGILL (4) and SIGSYS (12) are really traps; \fIdo_sigtrp()\fP
 | 
						|
will fail on them.
 | 
						|
.LP
 | 
						|
To avoid losing the last line(s) of output files, the interpreter should
 | 
						|
always do a proper close-down, even in the presence of signals.  To this end,
 | 
						|
all non-ignored genuine signals are initially caught by the interpreter,
 | 
						|
through the routine \fIHndlIntSig\fP, which gives a message and preforms a
 | 
						|
proper close-down.
 | 
						|
Synchronous trap can only be caused by the interpreter itself; they are never
 | 
						|
caught, and consequently the UNIX default action prevails.  Generally they
 | 
						|
cause a core dump.
 | 
						|
Signals requested by the interpreted program are caught by the routine
 | 
						|
\fIHndlEMSig\fP, as explained above.
 | 
						|
.NH 3
 | 
						|
Monitor calls.
 | 
						|
.PP
 | 
						|
For the convenience of the programmer, as many monitor calls as possible
 | 
						|
have been implemented.
 | 
						|
The list of monitor calls given in [1] pages 20/21, has been implemented
 | 
						|
completely, except for \fIptrace()\fP, \fIprofil()\fP and \fImpxcall()\fP.
 | 
						|
The semantics of \fIptrace()\fP and \fIprofil()\fP from an interpreted program
 | 
						|
is unclear; the data structure passed to \fImpxcall()\fP is non-trivial
 | 
						|
and the system call has low portability and applicability.
 | 
						|
For these calls, on invocation a warning is generated, and the arguments which
 | 
						|
were meant for the call are popped properly, so the program can continue
 | 
						|
without the stack being messed up.
 | 
						|
The errorcode 5 (IOERROR) is pushed onto the stack (twice), in order to
 | 
						|
fake an unsuccessful monitor call.
 | 
						|
No other \- more meaningful \- errorcode is available in the errno-list.
 | 
						|
.LP
 | 
						|
Now for the implemented monitor calls.
 | 
						|
The returned value is zero for a successful call.
 | 
						|
When something goes wrong, the value of the external \fIerrno\fP variable
 | 
						|
is pushed, thus enabling the user to find out what the reason of failure was.
 | 
						|
The implementation of the majority of the monitor calls is straightforward.
 | 
						|
Those working with a special format buffer, (e.g. \fIioctl()\fP,
 | 
						|
\fItime()\fP and \fIstat()\fP variants), need some extra attention.
 | 
						|
This is due to the fact that working with varying word/pointer size
 | 
						|
combinations may cause alignment problems.
 | 
						|
.LP
 | 
						|
The data structure returned by the UNIX system call results from
 | 
						|
C code that has been translated with the regular C compiler, which,
 | 
						|
on the VAX, happens to be a 4-4 compiler.
 | 
						|
The data structure expected by the interpreted program conforms
 | 
						|
to the translation by \fBack\fP of the pertinent include file.
 | 
						|
Depending on the exact call of \fBack\fP, sizes and alignment may differ.
 | 
						|
.LP
 | 
						|
An example is in order. The EM MON 18 instruction in the interpreted program
 | 
						|
leads to a UNIX \fIstat()\fP system call by the interpreter.
 | 
						|
This call fills the given struct with stat information, the contents
 | 
						|
and alignments of which are determined by the version of UNIX and the
 | 
						|
used C compiler, resp.
 | 
						|
The interpreter, like any program wishing to do system calls that fill
 | 
						|
structs, has to be translated by a C compiler that uses the
 | 
						|
appropriate struct definition and alignments, so that it can use, e.g.,
 | 
						|
\fIstab.st_mtime\fP and expect to obtain the right field.
 | 
						|
This struct cannot be copied directly to the EM memory to fulfill the
 | 
						|
MON instruction.
 | 
						|
First, the struct may contain extraneous, system-dependent fields,
 | 
						|
pertaining, e.g., to symbolic links, sockets, etc.
 | 
						|
Second, it may contain holes, due to alignment requirements.
 | 
						|
The EM program runs on an EM machine, knows nothing about these
 | 
						|
requirements and expects UNIX Version 7 fields, with offsets as
 | 
						|
determined by the em22, em24 or em44 compiler, resp.
 | 
						|
To do the conversion, the interpreter has a built-in table of the
 | 
						|
offsets of all the fields in the structs that are filled by the MON
 | 
						|
instruction.
 | 
						|
The appropriate fields from the result of the UNIX \fIstat()\fP are copied
 | 
						|
one by one to the appropriate positions in the EM memory to be filled
 | 
						|
by MON 18.
 | 
						|
.PP
 | 
						|
The \fIioctl()\fP call (MON 54) poses additional problems. Not only does it
 | 
						|
have a second argument which is a pointer to a struct, the type of
 | 
						|
which is dynamically determined, but its first argument is an opcode
 | 
						|
that varies considerably between the versions of UNIX.
 | 
						|
To solve the first problem, the interpreter examines the opcode (request) and
 | 
						|
treats the second argument accordingly.  The second problem can be solved by
 | 
						|
translating the UNIX Version 7 \fIioctl()\fP request codes to their proper
 | 
						|
values on the various systems.  This is, however, not always useful, since
 | 
						|
some EM run-time systems use the local request codes.  There is a compile-time
 | 
						|
flag, V7IOCTL, which, if defined, will restrict the \fIioctl()\fP call to the
 | 
						|
version 7 request codes and emulate them on the local system; otherwise the
 | 
						|
request codes of the local system will be used (as far as implemented).
 | 
						|
.PP
 | 
						|
Minor problems also showed up with the implementation of \fIexecve()\fP
 | 
						|
and \fIfork()\fP.
 | 
						|
\fIExecve()\fP expects three pointers on the stack.
 | 
						|
The first points to the name of the program to be executed,
 | 
						|
the second and third are the beginnings of the \fBargv\fP and \fBenvp\fP
 | 
						|
pointer arrays respectively.
 | 
						|
We cannot pass these pointers to the system call however, because
 | 
						|
the EM addresses to which they point do not correspond with UNIX
 | 
						|
addresses.
 | 
						|
Moreover, (it is not very likely to happen but) what if someone constructs
 | 
						|
a program holding the contents for one of these pointers in the stack?
 | 
						|
The stack is implemented upside down, so passing the pointer to
 | 
						|
\fIexecve()\fP causes trouble for this reason too.
 | 
						|
The only solution was to copy the pointer contents completely
 | 
						|
to fresh UNIX memory, constructing vectors which can be passed to the
 | 
						|
system call.
 | 
						|
Any impending memory fault while making these copies results in failure of the
 | 
						|
system call, with \fIerrno\fP set to EFAULT.
 | 
						|
.PP
 | 
						|
The implementation of the \fIfork()\fP call faced us with problems
 | 
						|
concerning IO-channels.
 | 
						|
Checking messages (as well as logging) must be divided over different files.
 | 
						|
Otherwise, these messages will coincide.
 | 
						|
This problem was solved by post-fixing the default message file
 | 
						|
\fBint.mess\fP (as well as the logging file \fBint.log\fP) with an
 | 
						|
automatically leveled number for every new forked process.
 | 
						|
Children of the original process do their diagnostics
 | 
						|
in files with postfix 1,2,3 etc.
 | 
						|
Second generation processes are assigned files numbered 11, 12, 21 etc.
 | 
						|
When 6 generations of processes exist at one moment, the seventh will
 | 
						|
get the same message file as the sixth, for the length of the filename
 | 
						|
will become too long.
 | 
						|
.PP
 | 
						|
Some of the monitor calls receive pointers (addresses) from to program, to be
 | 
						|
passed to the kernel; examples are the struct stat for \fIstat()\fP, the area
 | 
						|
to be filled for \fIread()\fP, etc. If the address is wrong, the kernel does
 | 
						|
not generate a trap, but rather the system call returns with failure, while
 | 
						|
\fIerrno\fP is set to EFAULT.  This is implemented by consistent checking of
 | 
						|
all pointers in the MON instruction.
 | 
						|
.NH 2
 | 
						|
Internal arithmetic.
 | 
						|
.PP
 | 
						|
Doing arithmetic on signed integers, the smallest negative integer
 | 
						|
(\fIminsint\fP) is considered a legal value.
 | 
						|
This is in contradiction with the EM Manual [1], page 14, which proposes using
 | 
						|
\fIminsint\fP for uninitialized integers.
 | 
						|
The shadow bytes already check for uninitialized integers however,
 | 
						|
so we do not need this special illegal value.
 | 
						|
Although the EM Manual provides two traps, for undefined integers and floats,
 | 
						|
undefined objects occur so frequently (e.g. in block copying partially
 | 
						|
initialized areas) that the interpreter just gives a warning.
 | 
						|
.LP
 | 
						|
Except for arithmetic on unsigneds, all arithmetic checks for overflow.
 | 
						|
The value that is pushed on the stack after an overflow occurs depends
 | 
						|
on the UNIX behavior with regard to that particular calculation.
 | 
						|
If UNIX would not accept the calculation (e.g. division by zero), a zero
 | 
						|
is pushed as a convention.
 | 
						|
Illegal computations which UNIX does accept in silence (e.g. one's
 | 
						|
complement of \fIminsint\fP), simply push the UNIX-result after giving a
 | 
						|
trap message.
 | 
						|
.NH 2
 | 
						|
Shadow bytes implementation.
 | 
						|
.PP
 | 
						|
A great deal of run-time checking is performed by the interpreter (except if
 | 
						|
used in the fast version).
 | 
						|
This section gives all details about the shadow bytes.
 | 
						|
In order to keep track of information about the contents of D-space (stack
 | 
						|
and global data area), there is one shadow-byte for each byte in these spaces.
 | 
						|
Each bit in a shadow-byte represents some piece
 | 
						|
of information about the contents of its corresponding 'sun-byte'.
 | 
						|
All bits off indicates an undefined sun-byte.
 | 
						|
One or more bits on always guarantees a well-defined sun-byte.
 | 
						|
The bits have the following meaning:
 | 
						|
.IP "\(bu bit 0:" 8
 | 
						|
indicates that the sun-byte is (a part of) an integer.
 | 
						|
.IP "\(bu bit 1:" 8
 | 
						|
the sun-byte is a part of a floating point number.
 | 
						|
.IP "\(bu bit 2:" 8
 | 
						|
the sun-byte is a part of a pointer in dataspace.
 | 
						|
.IP "\(bu bit 3:" 8
 | 
						|
the sun-byte is a part of a pointer in the instruction space.
 | 
						|
According to [1] (paragraph 6.4), there are two types pointers which
 | 
						|
must be distinguishable.
 | 
						|
Conversion between these two types is impossible.
 | 
						|
The shadow-bytes make the distinction here.
 | 
						|
.IP "\(bu bit 4:" 8
 | 
						|
protection bit.
 | 
						|
Indicates that the sun-byte is part of a protected piece of memory.
 | 
						|
There is a protected area in the stack, the Return Status Block.
 | 
						|
The EM machine language has no possibility to declare protected
 | 
						|
memory, as is possible in EM assembly (the ROM instruction).  The protection
 | 
						|
bit is, however, set for the line number and filename pointer area near
 | 
						|
location 0, to aid in catching references to location 0.
 | 
						|
.IP "\(bu bit 5/6/7:" 8
 | 
						|
free for later use.
 | 
						|
.LP
 | 
						|
The shadow bytes are managed by the routines declared in \fIshadow.h\fP.
 | 
						|
The warnings originating from checking these shadow-bytes during
 | 
						|
run-time are various.
 | 
						|
A list of them is given in appendix A, together with suggestions
 | 
						|
(primarily for the C-programmer) where to look for the trouble maker(s).
 | 
						|
.LP
 | 
						|
A point to notice is, that once a warning is generated, it may be repeated
 | 
						|
thousands of times.
 | 
						|
Since repetitive warnings carry little information, but consume much
 | 
						|
file space, the interpreter keeps track of the number of times a given warning
 | 
						|
has been produced from a given line in a given file.
 | 
						|
The warning message will
 | 
						|
be printed only if the corresponding counter is a power of four (starting at
 | 
						|
1).  In this way, a logarithmic back-off in warning generation is established.
 | 
						|
.LP
 | 
						|
It might be argued that the counter should be kept for each (warning, PC
 | 
						|
value) pair rather than for each (warning, file position) pair.  Suppose,
 | 
						|
however, that two instruction in a given line would cause the same message
 | 
						|
regularly; this would produce two intertwined streams of identical messages,
 | 
						|
with their counters jumping up and down.  This does not seem desirable.
 | 
						|
.NH 2
 | 
						|
Return Status Block (RSB)
 | 
						|
.PP
 | 
						|
According to the description in [1], at least the return address and the
 | 
						|
base address of the previous RSB have to be pushed when performing a call.
 | 
						|
Besides these two pointers, other information can be stored in the RSB
 | 
						|
also.
 | 
						|
The interpreter pushes the following items:
 | 
						|
.IP \-
 | 
						|
a pointer to the current filename,
 | 
						|
.IP \-
 | 
						|
the current line number (always four bytes),
 | 
						|
.IP \-
 | 
						|
the Local Base,
 | 
						|
.IP \-
 | 
						|
the return address (Program Counter),
 | 
						|
.IP \-
 | 
						|
the current procedure identifier
 | 
						|
.IP \-
 | 
						|
the RSB code, which distinguishes between initial start-up, normal call,
 | 
						|
returnable trap and non-returnable trap (a word-size integer).
 | 
						|
.LP
 | 
						|
Consequently, the size of the RSB varies, depending on
 | 
						|
word size and pointer size; its value is available as \fIrsbsize\fP.
 | 
						|
When the RSB is removed from the stack (by a RET or RTT) the RSB code is under
 | 
						|
the Stack Pointer for immediate checking.  It is not clear what should be done
 | 
						|
if RSB code and return instruction do not match; at present we give a message
 | 
						|
and continue, for what it is worth.
 | 
						|
.PP
 | 
						|
The reason for pushing filename and line number is that some front-ends tend
 | 
						|
to forget the LIN and FIL instructions after returning from a function.
 | 
						|
This may result in error messages in wrong source files and/or line numbers.
 | 
						|
.PP
 | 
						|
The procedure identifier is kept and restored to check that the PC will not
 | 
						|
move out of the running procedure.  The PI is an index in the proctab, which
 | 
						|
tells the limits in the text segment of the running procedure.
 | 
						|
.PP
 | 
						|
If the Return Status Block is generated as a result of a trap, more is
 | 
						|
stacked.  Before stacking the normal RSB, the trap function pushes the
 | 
						|
following items:
 | 
						|
.IP \-
 | 
						|
the contents of the entire Function Return Area,
 | 
						|
.IP \-
 | 
						|
the number of bytes significant in the above (a word-size integer),
 | 
						|
.IP \-
 | 
						|
a word-size flag indicating if the contents of the FRA are valid,
 | 
						|
.IP \-
 | 
						|
the trap number (a word-size integer).
 | 
						|
.LP
 | 
						|
The latter is followed directly by the RSB, and consequently acts as the only
 | 
						|
parameter to the trap handler.
 | 
						|
.NH 2
 | 
						|
Operand access.
 | 
						|
.PP
 | 
						|
The EM Manual mentions two ways to access the operands of an instruction.  It
 | 
						|
should be noticed that the operand in EM is often not the direct operand of the
 | 
						|
operation; the operand of the ADI instruction, e.g., is the width of the
 | 
						|
integers to be added, not one of the integers themselves.  The various operand
 | 
						|
types are described in [1].  Each opcode in the text segment identifies an
 | 
						|
instruction with a particular operand type; these relations are described in
 | 
						|
computer-readable format in a file in the EM tree, \fIip_spec.t\fP.
 | 
						|
.PP
 | 
						|
The interpreter uses a variant of the second method.  Several other approaches
 | 
						|
can be designed, with increasing efficiency and equally increasing complexity.
 | 
						|
They are briefly treated below.
 | 
						|
.NH 3
 | 
						|
The Dispatch Table, Method 1.
 | 
						|
.PP
 | 
						|
When the interpreter starts, it reads the ip_spec.t file and constructs from it
 | 
						|
a dispatch table.  This table (of which there are actually three,
 | 
						|
for primary, secondary
 | 
						|
and tertiary opcodes) has 256 entries, each describing an instruction with
 | 
						|
indications on how to decode the operand.  For each instruction executed, the
 | 
						|
interpreter finds the entry in the dispatch table, finds information there on
 | 
						|
how to access the operand, constructs the operand and calls the appropriate
 | 
						|
routine with the operand as calculated.  There is one routine for each
 | 
						|
instruction, which is called with the ready-made operand.  Method 1 is easy to
 | 
						|
program but requires constant interpretation of the dispatch table.
 | 
						|
.NH 3
 | 
						|
Intelligent Routines, Method 2.
 | 
						|
.PP
 | 
						|
For each opcode there is a separate routine, and since an opcode uniquely
 | 
						|
defines the instruction and the operand format, the routine knows how to get
 | 
						|
the operand; this knowledge is built into the routine.  Preferably the heading
 | 
						|
of the routine is generated automatically from the ip_spec.t file.  Operand
 | 
						|
decoding is immediate, and no dispatch table is needed.  Generation of the
 | 
						|
469 required routines is, however, far from simple.  Either a generated array
 | 
						|
of routine names or a generated switch statement is used to map the opcode onto
 | 
						|
the correct routine.  The switch approach has the advantage that parameters can
 | 
						|
be passed to the routines.
 | 
						|
.LP
 | 
						|
The interpreter uses a variant of the switch statement scheme.  Numerical
 | 
						|
information that can be deduced from the opcode is passed as parameters to the
 | 
						|
routine; this includes the argument of minis, the high order byte of shorties,
 | 
						|
and the fact that the result is to be multiplied by the word size.  This
 | 
						|
reduces the number of required routines to 338.
 | 
						|
.NH 3
 | 
						|
Intelligent Calls.
 | 
						|
.PP
 | 
						|
The call in the switch statement does full operand construction, and the
 | 
						|
resulting operand is passed to the routine.  This reduces the number of
 | 
						|
routines to 133, the number of EM instructions.  Generation of the switch
 | 
						|
statement from ip_spec.t will be complicated, but the routine space will be
 | 
						|
much cleaner.  This will not give any speed-up since the same actions are still
 | 
						|
required; they are just performed in a different place.
 | 
						|
.NH 3
 | 
						|
Static Evaluation.
 | 
						|
.PP
 | 
						|
It can be observed that the evaluation of the operand of a given instruction in
 | 
						|
the text segment will always give the same result.  It is therefore possible to
 | 
						|
preprocess the text segment, decomposing the instructions into structs which
 | 
						|
contain the address, the instruction code and the operand.  No operand decoding
 | 
						|
will be necessary at run-time: all operands have been precalculated.  This will
 | 
						|
probably give a considerable speed-up.  Jumps, especially GTO jumps, will,
 | 
						|
however, require more attention.
 | 
						|
.NH 2
 | 
						|
Disassembly.
 | 
						|
.PP
 | 
						|
A disassembly facility is available, which gives a readable but not
 | 
						|
letter-perfect disassembly of the EM object.  The procedure structure is
 | 
						|
indicated by placing the indication  \fBP[n]\fP  at the entry point of each
 | 
						|
procedure, where \fBn\fP is the procedure identifier.  The number of locals is
 | 
						|
given in a comment.
 | 
						|
.LP
 | 
						|
The disassembler was generated by the software in the directory \fIswitch\fP
 | 
						|
and then further processed by hand.
 |