.\"	Introduction
.\"
.\"	$Id$
.NH
INTRODUCTION.
.PP
This document describes an EM interpreter that performs extensive checking.
The interpreter exists in two versions: the normal version with full checking
and debugging facilities, and a fast stripped version that does interpretation
only.
This document assumes that the full version is used.
.LP
The first section describes the virtual EM machine embodied by the
interpreter (called \fBint\fP) and closes with some remarks on performance.
The second section discusses some specific implementation decisions.
Section three explains the use of the built-in debugging tool.
.LP
Appendix A gives an overview of the various warnings \fBint\fP gives,
with possible causes and solutions.
Appendix B is a simple tutorial on the use of \fBint\fP.
A separate manual page exists.
.PP
The document assumes a good understanding of what EM is and what
the assembly code looks like [1].
Notions like 'procedure descriptor', 'mini', 'shortie' etc. are not
explained.
From here on, any word in \fIthis font\fP is the name of a
variable, constant, function or other entity that appears under
the same name in the source code.
.LP
To avoid confusion: \fBint\fP interprets EM machine language (e.out files),
\fInot\fP the assembly language (.e files) and \fInot\fP the compact
code (.k files).
.NH 2
The virtual EM machine.
.PP
The memory layout of the virtual EM machine represented by the interpreter
differs in details from the description in [1].
Virtual memory is split into two separate spaces:
one containing the instructions (I-space),
the other containing all the data, including stack and heap (D-space).
The procedure descriptors are preprocessed and stored in a separate array,
\fIproctab[]\fP.
Both spaces start off at address 0.
This is possible because pointers in the two different spaces are
distinguishable by context (and shadow-bytes: see 2.6).
.NH 3
Instruction Space
.PP
Figure 1 shows the I-space, together with the position of some important
EM registers.
.Dr 12
                      NEXT -->  |________________|  <-- DB    \e
                                |                |            |
                                |                |            |  T
                                |                |  <-- PC    |
                                |     Program    |            |  e
                                |                |            |
                                |      Text      |            |  x
                                |                |            |
                                |                |            |  t
                         0 -->  |________________|  <--(PB)   /
.Df
\fI Fig 1. Virtual instruction space (I-space).\fP
.De
.PP
The I-space is just big enough to contain all the instructions.
The size needed for the program text (\fINTEXT\fP) is found from the
header-bytes of the loadfile.
Legal values for the program counter (\fIPC\fP) consist of all
addresses in the range from 0 through \fINTEXT\fP \- 1.
If the \fIPC\fP is made to point to an illegal address, a trap will occur.
.NH 3
The Procedure Table
.PP
The \fINProc\fP constant indicates how many procedure descriptors there
are in the \fIproctab\fP array.
Each element of this array holds, for one procedure, the number of locals, the
entry point, and the entry point of the textually following procedure.
This information is used to check the restriction that the program counter
may not wander from one procedure into another.
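.LP
The check described above can be sketched in C as follows.
This is only an illustration: the structure layout and the names proc_desc,
nlocals, entry, next_entry and pc_in_proc are assumptions made for this
example and are not taken from the source of \fBint\fP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical procedure descriptor; the field names are illustrative. */
struct proc_desc {
    long nlocals;     /* number of locals */
    long entry;       /* entry point (I-space address) */
    long next_entry;  /* entry point of the textually following procedure */
};

/* Return 1 if pc lies within the text of procedure p, 0 otherwise. */
int pc_in_proc(const struct proc_desc *p, long pc)
{
    assert(p != NULL);
    assert(p->entry <= p->next_entry);
    return pc >= p->entry && pc < p->next_entry;
}
```

.LP
Under these assumptions, a program counter equal to next_entry already belongs
to the following procedure and is rejected for the current one.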
.NH 3
The Data Space
.PP
Figure 2 shows the layout of the data space, which closely conforms to the EM
Manual.
.Dr 36
                                __________________
            maxaddr(psize) -->  |                |  <-- ML    \e
                                |                |            |  S
                                |     Locals     |            |  t
                                |       &        |            |  a
                                |      RSBs      |            |  c
                                |                |            |  k
                                |________________|  <-- SP    /
                                .                .
                                .                .
                                .     Unused     .
                                .                .
                                .                .
                                .                .
                                .                .
                                .                .
                                .     Unused     .
                                .                .
                                .                .
                                |________________|  <-- HP
                                |                |            \e
                                |      Heap      |            |
                                |________________|  <-- HB    |
                                |                |            |  D
                                |    Arguments   |            |
                                |     Environ    |            |  a
                                |  _   _   _   _ |            |
                                |                |            |  t
                                |                |            |
                                |                |            |  a
                                |   Global data  |            |
                                |                |            |
                                |                |            |
                         0 -->  |________________|  <--(EB)   /
.Df
\fI Fig 2. Virtual dataspace (D-space).\fP
.De
.PP
D-space begins at address 0, and ends at the largest address
representable by the pointer size (\fIpsize\fP) being used;
for a 2-byte pointer size this maximum address is
.DS
((2 ^ 16 \- 1) / word size * word size) \- 1
.DE
for a 4-byte pointer size it is
.DS
((2 ^ 31 \- 1) / word size * word size) \- 1
.DE
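(2 ^ 31 rather than 2 ^ 32 is used, to allow illegal pointers to be
implemented in the future).
Dividing by the word size and multiplying again rounds the limit down to a
multiple of the word size; this is required to make \fIML\fP+1 expressible as
the initialisation value of \fILB\fP and \fISP\fP.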
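.LP
As a worked example, the two formulas can be written as one small C function.
The function name max_addr and its parameter names are assumptions made for
this sketch; the division is integer division, as the rounding in the
formulas implies.

```c
#include <assert.h>

/*
 * Largest D-space address for pointer size psize and word size wsize
 * (both in bytes).  Hypothetical helper, not part of the interpreter.
 */
unsigned long max_addr(int psize, unsigned long wsize)
{
    unsigned long limit;

    assert(psize == 2 || psize == 4);
    assert(wsize > 0);

    /* 2^16 - 1 for 2-byte pointers, 2^31 - 1 for 4-byte pointers */
    limit = (psize == 2) ? 0xFFFFUL : 0x7FFFFFFFUL;

    /* round down to a multiple of the word size, then subtract 1 */
    return limit / wsize * wsize - 1;
}
```

.LP
For a 2-byte pointer size and a 2-byte word size this yields 65533, so that
ML+1 equals 65534, a multiple of the word size that still fits in 16 bits.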
.PP
D-space is split into two partitions: Data and Stack (indicated by the
brackets).
The Data partition holds the global data area (GDA) and the heap.
The size of the GDA is given by the loadfile constant SZDATA, plus some
space for the arguments and the environment, which are also stored here.
This total is fixed during interpretation.
The heap, however, may grow during execution (e.g. through dynamic
allocation), so the size of the Data partition as a whole is variable.
Initially, the size of the Data partition is the sum of the space needed
by the GDA (including the space needed for arguments and environment) and
the initial heap space.
The lowest legal Data address is 0; the highest \fIHP\fP \- 1.
.LP
The Stack partition holds the stack.
It begins at the highest available D-space address, and grows
towards the low addresses, so the Stack partition is of variable size too.
The lowest legal Stack address is the stackpointer (\fISP\fP),
the highest is the memory limit (\fIML\fP).
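.LP
The legal address ranges of the two partitions can be summarised in a short
C sketch.
The enumeration and the function classify_addr are hypothetical names
introduced for this example; only the bounds 0, HP, SP and ML come from the
description above.

```c
#include <assert.h>

enum region { IN_DATA, IN_STACK, IN_NEITHER };

/*
 * Classify D-space address a, given the current heap pointer hp,
 * stack pointer sp and memory limit ml.
 * Data:  0  .. hp - 1
 * Stack: sp .. ml
 */
enum region classify_addr(unsigned long a, unsigned long hp,
                          unsigned long sp, unsigned long ml)
{
    assert(hp <= sp);   /* heap and stack must not overlap */

    if (a < hp)
        return IN_DATA;
    if (a >= sp && a <= ml)
        return IN_STACK;
    return IN_NEITHER;  /* in the unused gap, or beyond ML */
}
```

.LP
In this sketch an empty stack is represented by sp equal to ml + 1, in which
case no address is classified as a Stack address.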
.NH 2
Physical layout.
.PP
Each partition is mapped onto a piece of physical memory with the
same name: \fItext\fP (fig. 1), \fIstack\fP and \fIdata\fP (fig. 2).
These are the storage structures which \fBint\fP uses to physically
store the contents of the virtual EM spaces.
Figure 2 thus shows the mapping of D-space onto two
different physical parts: \fIstack\fP and \fIdata\fP.
The I-space is represented by one physical part: \fItext\fP.
.LP
Each time more space is needed, the partition concerned is reallocated,
with the new size computed by the formula:
.DS
\fInew size\fP = 1.5 \(mu (\fIold size\fP + \fIextra\fP)
.DE
where \fIextra\fP is the number of bytes by which the request exceeds the
\fIold size\fP.
One can prove that with this method the total allocation time is
linear in the partition size eventually needed.
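.LP
The growth rule can be sketched in C as follows.
The function name grown_size is an assumption made for this example, and the
factor 1.5 is computed in integer arithmetic, so the result is rounded down;
how \fBint\fP itself rounds is not stated here.

```c
#include <assert.h>
#include <limits.h>

/*
 * New partition size according to
 *     new size = 1.5 * (old size + extra)
 * Hypothetical helper; the result is truncated to an integer.
 */
unsigned long grown_size(unsigned long old_size, unsigned long extra)
{
    unsigned long needed = old_size + extra;

    assert(needed >= old_size);               /* the sum did not wrap */
    assert(needed <= ULONG_MAX / 3 * 2);      /* 1.5 * needed fits */

    return needed + needed / 2;
}
```

.LP
For an old size of 100 bytes and 20 extra bytes the new size is 180 bytes.
Since every reallocation enlarges the partition by at least a factor 1.5,
the successive sizes form a geometric series, which is the usual argument
behind a linear bound of this kind.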
.PP
Having the virtual D-space start at address 0 conforms to
the definition in [1], p. 3\-6.
The main reason for doing so is that it induces
a one-to-one correspondence between the heap and GDA
addresses on the virtual machine (and hence in the definition) on the one hand,
and the offsets within the \fIdata\fP partition on the other.
This implies that no extra calculation is needed to perform load and
store operations.
.LP
Some calculation however cannot be avoided, because the stack part of
the D-space grows downwards by EM definition.
The first address of the virtual stack (\fIML\fP, the maximum address for
the given \fIpsize\fP) is mapped onto the
beginning of the \fIstack\fP partition.
When the stack grows (i.e. EM addresses get lower), the offset within the
\fIstack\fP partition gets higher.
By taking offset \fIML \- A\fP in the stack partition, one obtains the
physical address corresponding to some virtual EM (stack) address \fIA\fP.
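.LP
The two mappings can be put side by side in a small C sketch.
The function names data_offset and stack_offset are hypothetical; the
arithmetic is exactly the identity mapping for Data addresses and the offset
ML \- A for Stack addresses described above.

```c
#include <assert.h>

/* Data partition: the offset equals the virtual address. */
unsigned long data_offset(unsigned long a)
{
    return a;
}

/*
 * Stack partition: virtual address ml maps to offset 0, and lower
 * virtual addresses map to higher offsets.
 */
unsigned long stack_offset(unsigned long ml, unsigned long a)
{
    assert(a <= ml);    /* a must not lie above the memory limit */
    return ml - a;
}
```

.LP
With ML equal to 65533, the virtual stack address 65530 is found at offset 3
in the stack partition.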
.NH 2
Speed.
.PP
Tests with both versions of the interpreter lead to the following
conclusions.
The speed of the interpreter depends strongly on the type of
program being interpreted.
If plain CPU arithmetic is performed, the interpreter is
relatively slow (run time 1000 \(mu that of the cc version).
When stack manipulation dominates, the interpreter is
quite fast (run time 100 \(mu that of the cc version).
.LP
Most programs, however, are not this extreme, so for a normal program
an interpretation time of between 300 and 500 times that of direct
execution is to be expected.
.LP
The fast version runs in about 60% of the time of the full version, at the
expense of considerably reduced functionality.
Tallying costs about 10%.