ack/doc/int/appB

487 lines
21 KiB
Plaintext
Raw Permalink Normal View History

1988-06-22 21:48:19 +00:00
.\" A simple tutorial
.\"
1994-06-24 11:31:16 +00:00
.\" $Id$
1988-06-22 21:48:19 +00:00
.\"
.bp
.DS
APPENDIX B
.DE
.SH
How to use the interpreter
.PP
The interpreter is not normally used for the debugging of programs under
construction. Its primary application is as a verification tool for almost
completed programs. Although the proper operation of the interpreter is
obviously a black art, this chapter tries to provide some guidelines.
.LP
For the sake of the argument, the source language is assumed to be C, but most
hints apply equally well to other languages supported by ACK.
.sp
.LP
.I "Initial measures"
.PP
Start with a test case of trivial size; to be on the safe side, reckon with a
time dilatation factor of about 500, i.e., a second grows into 10 minutes.
(The interpreter takes 0.5 msec to do one EM instruction on a Sun 3/50).
Fortunately many trivial test cases are much shorter than one second.
.PP
Compile the program into an \fIe.out\fP, the EM machine version of a
\fIa.out\fP, by calling \fIem22\fP (for 2-byte integers and 2-byte pointers),
\fIem24\fP (for 2 and 4) or \fIem44\fP (for 4 and 4) as seems appropriate;
if in doubt, use \fIem44\fP. These compilers can be found in the ACK
\fIbin\fP directory, and should be used instead of \fIacc\fP (or normal
.UX
1991-11-19 13:09:50 +00:00
\fIcc\fP). Alternatively, \fIacc \-memNN\fP can be used instead of
1988-06-22 21:48:19 +00:00
\fIemNN\fP.
.LP
1991-11-19 13:09:50 +00:00
If a C program consists of more than one file, as it usually does, there is
1988-06-22 21:48:19 +00:00
a small problem. The \fIacc\fP and \fIcc\fP compilers generate .o files,
whereas the \fIemNN\fP compilers generate .m files as object files.
A simple technique to avoid the problem is to call
.DS
em44 *.c
.DE
1991-11-19 13:09:50 +00:00
if possible. If not, the following hack on the \fIMakefile\fP generally works.
1988-06-22 21:48:19 +00:00
.IP \-
Make sure the \fIMakefile\fP is reasonably clean and complete: all calls to
the compiler are through \fI$(CC)\fP, \fICFLAGS\fP is used properly and all
dependencies are specified.
.IP \-
Add the following lines to the \fIMakefile\fP (possibly permanently):
.DS
\&.SUFFIXES: .o
\&.c.o:
\& $(CC) \-c $(CFLAGS) $<
.DE
.IP \-
Set CC to \fIem44 \-.c\fP (for example). Make sure CFLAGS includes
the \-O option; this yields a speed-up of about 15 %.
.IP \-
1991-11-19 13:09:50 +00:00
Change all .o to .m (or .k if the \-O option is not used).
1988-06-22 21:48:19 +00:00
.IP \-
If necessary, change \fIa.out\fP to \fIe.out\fP.
.PP
1991-11-19 13:09:50 +00:00
With these changes, \fImake\fP will produce an EM object;
\fIesize\fP can be used to verify that it is indeed an EM object and obtain some
1988-06-22 21:48:19 +00:00
statistics. Then call the interpreter:
.DS
int <EM-object-file> [ parameters ]
.DE
1991-11-19 13:09:50 +00:00
where the parameters are the normal parameters of the program. This should
1988-06-22 21:48:19 +00:00
work exactly like the original program, though slower. It reads from the
terminal if the original does, it opens and closes files like the original and
it accepts interrupts.
.sp
.LP
.I "Interpreting the results"
.PP
Now there are several possibilities.
.PP
It does all this. Great! This means the program
does not do very uncouth things. Now
read the file \fIint.mess\fP to see if any messages were generated. If there
are none, the program did not really run (perhaps the original cc \fIa.out\fP
got called instead?) Normally there is at least a termination message like
.DS
(Message): program exits with status 0 at "awa.p", line 64, INR = 4124
.DE
This says that the program terminated through an exit(0) on line 64 of the
file \fIawa.p\fP after 4124 EM instructions.
If this is the only message it is time to move to a bigger test case.
.PP
On the other hand, the program may come to a grinding halt with an error
message.
All messages (errors and warnings) have a format in which the sequence
.DS
"<file name>", line <ln#>
.DE
occurs, which is the same sequence many compilers produce for their error
messages. Consequently, the \fIint.mess\fP file can be processed as any
compiler message output.
.PP
One such message can be
.DS
(Fatal error) a.em: trap "Addressing non existent memory" not caught at "a.c", line 2, INR = 16
.DE
produced by the abysmal program
.DS
main() {
*(int*)200000 = 1;
}
.DE
.LP
Often the effects are more subtle, however. The program
.DS
main() {
int *a, b = 777;
b = *a;
}
.DE
produces the following five warnings (in far less than a second):
.DS
(Warning 47, #1): Local data pointer expected at "t.c", line 4, INR = 17
(Warning 61, cont.): Actual memory is undefined at "t.c", line 4, INR = 17
(Warning 102, #1): Returned function result too small at "<unknown>", line 0, INR = 21
(Warning 43, #1): Local integer expected at "exit.c", line 11, INR = 34
(Warning 61, cont.): Actual memory is undefined at "exit.c", line 11, INR = 34
.DE
The one about the function result looks the most frightening,
but is the most easily solved:
\fImain\fP is a function returning an int, so the start-up routine expects a
(four-byte) integer but gets an empty (zero-byte) return area.
.LP
\fINote\fP: The experts are divided about this. The traditional school holds
that \fImain\fP is an int function and its result is the return code; this
leaves them with two ways of supplying a return code: one as the parameter
of \fIexit()\fP and one as the result
of \fImain\fP. The modern school (Berkeley 4.2 etc.) claims that
return codes are supplied exclusively
by \fIexit()\fP, and they have an \fIexit(0)\fP in
the start-up routine, just after the call to \fImain()\fP; leaving \fImain()\fP
through the bottom implies successful termination.
.LP
We shall satisfy both groups by
.DS
main() {
int *a, b = 777;
b = *a;
exit(0);
}
.DE
This results in
.DS
(Warning 47, #1): Local data pointer expected at "t.c", line 4, INR = 17
(Warning 61, cont.): Actual memory is undefined at "t.c", line 4, INR = 17
(Message): program exits with status 0 at "exit.c", line 11, INR = 33
.DE
which is pretty clear as it stands.
.sp
.LP
.I "Using stack dumps"
.PP
Let's, for the sake of argument
and to avoid the fierce realism of 10000-line programs, assume that the above
1991-11-19 13:09:50 +00:00
still does not give enough information.
1988-06-22 21:48:19 +00:00
Since the error occurred in EM instruction number 17, we should like to see
more information around that moment. Call the interpreter again, now with the
shell variable AT set at 17:
.DS
int AT=17 t.em
.DE
(The interpreter has a number of internal variables that can be set by
assignments on the command line, like with \fImake\fP.)
1991-11-19 13:09:50 +00:00
This gives a file called \fIint.log\fP containing the
1988-06-22 21:48:19 +00:00
stack dump of 150 lines presented at the end of this chapter.
.PP
Since dumping is a subfacility of logging in the interpreter, the formats of
the lines are
the same. If a line starts with an @, it will contain a file-name/line-number
indication; the next two characters are the subject and the log
level. Then comes the information, preceded by a space. The text contains
three stack dumps, one before the offending instruction, one at it, and one
after it; then the interpreter stops. All kinds of other dumps can be
obtained, but this is default.
.PP
For each instruction we have, in order:
.IP \-
an @x9 line, giving the position in the program,
.IP \-
the messages, warnings and errors from the instruction as it is being executed,
.IP \-
dump(s), as requested.
.PP
The first two lines mean that at line 4 in file \fIt.c\fP the interpreter
performed its 16-th instruction, with the Program Counter at 30 pointing at
opcode 180 in the text segment; the instruction was an LOL (LOad Local)
with the operand \-4 derived from the opcode. It copies the local at offset
\-4 to the top of the stack. The effect can be seen from the subsequent stack
dump, where the undefined word at addresses 2147483568 to ...571 (the variable
\fIa\fP) has been copied to the top of the stack at 2147483560 (copying
undefined values does not generate a warning).
Since we used the \fIem44\fP compiler, all pointers and ints in our dump are
4 bytes long.
So a variable at address X in reality extends from address X to X+3.
.br
Note that this is not the offending instruction; this stack dump represents
the situation just before the error.
.PP
The stack consists of a sequence of frames, each containing data followed by
a Return Status Block resulting from a call; the last frame ends in
top-of-stack. The first frame represents the stack when the program starts,
through a call to the start-up routine. This routine prepares the second
stack frame with the actual parameters to \fImain()\fP:
\fIargc\fP at 2147483596, \fIargv\fP at 2147483600 and \fIenviron\fP at
2147483604.
.LP
The RSB line shows that the call to \fImain()\fP was made from procedure 0
which has 0 locals, with PC at
16, an LB of 2147483608 and file name and line number still unknown.
The \fIcode\fP in the RSB tells how this RSB was made; possible values are STP
(start-up), CAL, RTT (returnable trap) and NRT (non-returnable trap).
.PP
The next frame shows the local variable(s) of \fImain()\fP; there are two of
them, the pointer \fIa\fP at 2147483568, which is undefined, and variable
\fIb\fP at 2147483564, which has the value 777. Then comes a copy of \fIa\fP,
just made by the LOL instruction, at 2147483560. The following line shows that
the Function Return Area (which does not reside at the end of the stack, but
just happens to be printed here) has size 0 and is presently undefined.
The stack dump ends
by showing that the Actuals Base is at 2147483596 (pointing at \fIargc\fP), the
Locals Base at 2147483572 (pointing just above the local \fIa\fP), the Stack
Pointer at 2147483560 (pointing at the undefined pointer), the line count is 4
and the file name is "t.c".
.LP
1991-11-19 13:09:50 +00:00
(Notice that there is one more stack frame than one would probably expect, the
1988-06-22 21:48:19 +00:00
one above the start-up routine.)
.LP
The Function Return Area
could have a size larger than 0 and still be undefined, for
example when an instruction that does not preserve the contents of the FRA has
just been executed; likewise the FRA could have size 0 and be defined
nevertheless, for example just after a RET 0 instruction.
.PP
All this has set the scene for the distaster which is about to strike in the
next instruction. This is indeed a LOI (LOad Indirect) of size 4, opcode 169;
it causes the message
.DS
warning: Local data pointer expected [stack.c: 242]
.DE
and its continuation
.DS
warning cont.: Actual memory is undefined
.DE
(detected in the interpreter file \fIstack.c\fP at line 242; this can be
useful for sorting out dubious semantics). We see that the effect, as shown in
the third frame of this stack dump (at instruction number 17) is somewhat
unexpected: the LOI has fetched the value 4 and stacked it. The reason is
that, unfortunately, undefinedness is not transitive in the interpreter. When
an undefined value is used in an operation (other than copying) a warning is
given, but thereafter the value is treated as if it were zero. So, after the
warning a normal null pointer remains, which is then used to pick up the value
at location 0. This is the place where the EM machine stores its current line
number, which is presently 4.
.PP
The third stack dump shows the final effect: the value 4 has been unstacked
and copied to variable \fIb\fP at 2147483564 through an STL (STore Local)
instruction.
.PP
Since this form of logging dumps the stack only, the log file is relatively
small as dumps go.
Nevertheless, a useful excerpt can be obtained with the command
.DS
grep 'd1' int.log
.DE
This extracts the Return Status Block lines from the log, thus producing three
traces of calls, one for each instruction in the log:
.DS
d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL
d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL
d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, LIN = 4, FIL = "t.c"
d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL
d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL
d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, LIN = 4, FIL = "t.c"
d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL
d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL
d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483564, HP = 848, LIN = 4, FIL = "t.c"
.DE
Theoretically, the pertinent trace is the middle one, but in practice all three
are equal. In the present case there isn't much to trace, but in real programs
the trace can be useful.
.sp
.LP
.I "Errors in libraries"
.PP
Since libraries are generally compiled with suppression of line number and
file name information, the line number and file name in the interpreter will
not be updated when it enters a library routine. Consequently, all messages
generated by interpreting library routines will seem to originate from the
line of the call. This is especially true for the routine malloc(), which,
from the nature of its business, often contains dubitable code.
.PP
A usual message is:
.DS
(Warning 43, #1): Local integer expected at "buff.c", line 18, INR = 266
(Warning 64, cont.): Actual memory contains a data pointer at "buff.c", line 18, INR = 266
.DE
and indeed at line 18 of the file buff.c we find:
.DS
buff = malloc(buff_size = BFSIZE);
.DE
This problem can be avoided by using a specially compiled version of the
library that contains the correct LIN and FIL instructions, or, less
elegantly, by including the source code of the library routines in the
1991-11-19 13:09:50 +00:00
program; in the latter case, one has to be sure to have them all.
1988-06-22 21:48:19 +00:00
.sp
.LP
.I "Unavoidable messages"
.br
Some messages produced by the logging are almost unavoidable; sometimes the
writer of a library routine is forced to take liberties with the semantics of
EM.
.LP
Examples from C include the memory allocation routines.
For efficiency reasons, one bit of an pointer in the administration is used as
a flag; setting, clearing and reading this bit requires bitwise operations on
pointers, which gives the above messages.
Realloc causes a problem in that it may have to copy the originally allocated
area to a different place; this area may contain uninitialised bytes.
.bp
.DS
.ft CW
@x9 "t.c", line 4, INR = 16, PC = 30 OPCODE = 180
@L6 "t.c", line 4, INR = 16, DoLOLm(-4)
d2
d2 . . STACK_DUMP[4/4] . . INR = 16 . . STACK_DUMP . .
d2 ----------------------------------------------------------------
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483643 0 (Dp)
d2 2147483642 0 (Dp)
d2 2147483641 0 (Dp)
d2 2147483640 40 [ 40] (Dp)
d2 2147483639 0 (Dp)
d2 2147483638 0 (Dp)
d2 2147483637 3 (Dp)
d2 2147483636 64 [ 832] (Dp)
d2 2147483635 0 (In)
d2 2147483634 0 (In)
d2 2147483633 0 (In)
d2 2147483632 1 [ 1] (In)
d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL
d2
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483607 0 (Dp)
d2 2147483606 0 (Dp)
d2 2147483605 0 (Dp)
d2 2147483604 40 [ 40] (Dp)
d2 2147483603 0 (Dp)
d2 2147483602 0 (Dp)
d2 2147483601 3 (Dp)
d2 2147483600 64 [ 832] (Dp)
d2 2147483599 0 (In)
d2 2147483598 0 (In)
d2 2147483597 0 (In)
d2 2147483596 1 [ 1] (In)
d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL
d2
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483571 undef
d2 | | | | | |
d2 2147483568 undef (1 word)
d2 2147483567 0 (In)
d2 2147483566 0 (In)
d2 2147483565 3 (In)
d2 2147483564 9 [ 777] (In)
d2 2147483563 undef
d2 | | | | | |
d2 2147483560 undef (1 word)
d2 FRA: size = 0, undefined
d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, \e
LIN = 4, FIL = "t.c"
d2 ----------------------------------------------------------------
d2
@x9 "t.c", line 4, INR = 17, PC = 31 OPCODE = 169
@w1 "t.c", line 4, INR = 17, warning: Local data pointer expected [stack.c: 242]
@w1 "t.c", line 4, INR = 17, warning cont.: Actual memory is undefined
@L6 "t.c", line 4, INR = 17, DoLOIm(4)
d2
d2 . . STACK_DUMP[4/4] . . INR = 17 . . STACK_DUMP . .
d2 ----------------------------------------------------------------
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483643 0 (Dp)
d2 2147483642 0 (Dp)
d2 2147483641 0 (Dp)
d2 2147483640 40 [ 40] (Dp)
d2 2147483639 0 (Dp)
d2 2147483638 0 (Dp)
d2 2147483637 3 (Dp)
d2 2147483636 64 [ 832] (Dp)
d2 2147483635 0 (In)
d2 2147483634 0 (In)
d2 2147483633 0 (In)
d2 2147483632 1 [ 1] (In)
d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL
d2
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483607 0 (Dp)
d2 2147483606 0 (Dp)
d2 2147483605 0 (Dp)
d2 2147483604 40 [ 40] (Dp)
d2 2147483603 0 (Dp)
d2 2147483602 0 (Dp)
d2 2147483601 3 (Dp)
d2 2147483600 64 [ 832] (Dp)
d2 2147483599 0 (In)
d2 2147483598 0 (In)
d2 2147483597 0 (In)
d2 2147483596 1 [ 1] (In)
d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL
d2
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483571 undef
d2 | | | | | |
d2 2147483568 undef (1 word)
d2 2147483567 0 (In)
d2 2147483566 0 (In)
d2 2147483565 3 (In)
d2 2147483564 9 [ 777] (In)
d2 2147483563 0 (In)
d2 2147483562 0 (In)
d2 2147483561 0 (In)
d2 2147483560 4 [ 4] (In)
d2 FRA: size = 0, undefined
d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, \e
LIN = 4, FIL = "t.c"
d2 ----------------------------------------------------------------
d2
@x9 "t.c", line 4, INR = 18, PC = 32 OPCODE = 229
@S6 "t.c", line 4, INR = 18, DoSTLm(-8)
d2
d2 . . STACK_DUMP[4/4] . . INR = 18 . . STACK_DUMP . .
d2 ----------------------------------------------------------------
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483643 0 (Dp)
d2 2147483642 0 (Dp)
d2 2147483641 0 (Dp)
d2 2147483640 40 [ 40] (Dp)
d2 2147483639 0 (Dp)
d2 2147483638 0 (Dp)
d2 2147483637 3 (Dp)
d2 2147483636 64 [ 832] (Dp)
d2 2147483635 0 (In)
d2 2147483634 0 (In)
d2 2147483633 0 (In)
d2 2147483632 1 [ 1] (In)
d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL
d2
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483607 0 (Dp)
d2 2147483606 0 (Dp)
d2 2147483605 0 (Dp)
d2 2147483604 40 [ 40] (Dp)
d2 2147483603 0 (Dp)
d2 2147483602 0 (Dp)
d2 2147483601 3 (Dp)
d2 2147483600 64 [ 832] (Dp)
d2 2147483599 0 (In)
d2 2147483598 0 (In)
d2 2147483597 0 (In)
d2 2147483596 1 [ 1] (In)
d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL
d2
d2 ADDRESS BYTE ITEM VALUE SHADOW
d2 2147483571 undef
d2 | | | | | |
d2 2147483568 undef (1 word)
d2 2147483567 0 (In)
d2 2147483566 0 (In)
d2 2147483565 0 (In)
d2 2147483564 4 [ 4] (In)
d2 FRA: size = 0, undefined
d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483564, HP = 848, \e
LIN = 4, FIL = "t.c"
d2 ----------------------------------------------------------------
d2
.DE