.\" A simple tutorial .\" .\" $Header$ .\" .bp .DS APPENDIX B .DE .SH How to use the interpreter .PP The interpreter is not normally used for the debugging of programs under construction. Its primary application is as a verification tool for almost completed programs. Although the proper operation of the interpreter is obviously a black art, this chapter tries to provide some guidelines. .LP For the sake of the argument, the source language is assumed to be C, but most hints apply equally well to other languages supported by ACK. .sp .LP .I "Initial measures" .PP Start with a test case of trivial size; to be on the safe side, reckon with a time dilatation factor of about 500, i.e., a second grows into 10 minutes. (The interpreter takes 0.5 msec to do one EM instruction on a Sun 3/50). Fortunately many trivial test cases are much shorter than one second. .PP Compile the program into an \fIe.out\fP, the EM machine version of a \fIa.out\fP, by calling \fIem22\fP (for 2-byte integers and 2-byte pointers), \fIem24\fP (for 2 and 4) or \fIem44\fP (for 4 and 4) as seems appropriate; if in doubt, use \fIem44\fP. These compilers can be found in the ACK \fIbin\fP directory, and should be used instead of \fIacc\fP (or normal .UX \fIcc\fP). Alternatively, \fIacc \-memNN\fP can be used instead of \fIemNN\fP. .LP If a C program consists of more than one file, as it usually does, there is a small problem. The \fIacc\fP and \fIcc\fP compilers generate .o files, whereas the \fIemNN\fP compilers generate .m files as object files. A simple technique to avoid the problem is to call .DS em44 *.c .DE if possible. If not, the following hack on the \fIMakefile\fP generally works. .IP \- Make sure the \fIMakefile\fP is reasonably clean and complete: all calls to the compiler are through \fI$(CC)\fP, \fICFLAGS\fP is used properly and all dependencies are specified. .IP \- Add the following lines to the \fIMakefile\fP (possibly permanently): .DS \&.SUFFIXES: .o \&.c.o: \& $(CC) \-c $(CFLAGS) $< .DE .IP \- Set CC to \fIem44 \-.c\fP (for example). Make sure CFLAGS includes the \-O option; this yields a speed-up of about 15 %. .IP \- Change all .o to .m (or .k if the \-O option is not used). .IP \- If necessary, change \fIa.out\fP to \fIe.out\fP. .PP With these changes, \fImake\fP will produce an EM object; \fIesize\fP can be used to verify that it is indeed an EM object and obtain some statistics. Then call the interpreter: .DS int [ parameters ] .DE where the parameters are the normal parameters of the program. This should work exactly like the original program, though slower. It reads from the terminal if the original does, it opens and closes files like the original and it accepts interrupts. .sp .LP .I "Interpreting the results" .PP Now there are several possibilities. .PP It does all this. Great! This means the program does not do very uncouth things. Now read the file \fIint.mess\fP to see if any messages were generated. If there are none, the program did not really run (perhaps the original cc \fIa.out\fP got called instead?) Normally there is at least a termination message like .DS (Message): program exits with status 0 at "awa.p", line 64, INR = 4124 .DE This says that the program terminated through an exit(0) on line 64 of the file \fIawa.p\fP after 4124 EM instructions. If this is the only message it is time to move to a bigger test case. .PP On the other hand, the program may come to a grinding halt with an error message. All messages (errors and warnings) have a format in which the sequence .DS "", line .DE occurs, which is the same sequence many compilers produce for their error messages. Consequently, the \fIint.mess\fP file can be processed as any compiler message output. .PP One such message can be .DS (Fatal error) a.em: trap "Addressing non existent memory" not caught at "a.c", line 2, INR = 16 .DE produced by the abysmal program .DS main() { *(int*)200000 = 1; } .DE .LP Often the effects are more subtle, however. The program .DS main() { int *a, b = 777; b = *a; } .DE produces the following five warnings (in far less than a second): .DS (Warning 47, #1): Local data pointer expected at "t.c", line 4, INR = 17 (Warning 61, cont.): Actual memory is undefined at "t.c", line 4, INR = 17 (Warning 102, #1): Returned function result too small at "", line 0, INR = 21 (Warning 43, #1): Local integer expected at "exit.c", line 11, INR = 34 (Warning 61, cont.): Actual memory is undefined at "exit.c", line 11, INR = 34 .DE The one about the function result looks the most frightening, but is the most easily solved: \fImain\fP is a function returning an int, so the start-up routine expects a (four-byte) integer but gets an empty (zero-byte) return area. .LP \fINote\fP: The experts are divided about this. The traditional school holds that \fImain\fP is an int function and its result is the return code; this leaves them with two ways of supplying a return code: one as the parameter of \fIexit()\fP and one as the result of \fImain\fP. The modern school (Berkeley 4.2 etc.) claims that return codes are supplied exclusively by \fIexit()\fP, and they have an \fIexit(0)\fP in the start-up routine, just after the call to \fImain()\fP; leaving \fImain()\fP through the bottom implies successful termination. .LP We shall satisfy both groups by .DS main() { int *a, b = 777; b = *a; exit(0); } .DE This results in .DS (Warning 47, #1): Local data pointer expected at "t.c", line 4, INR = 17 (Warning 61, cont.): Actual memory is undefined at "t.c", line 4, INR = 17 (Message): program exits with status 0 at "exit.c", line 11, INR = 33 .DE which is pretty clear as it stands. .sp .LP .I "Using stack dumps" .PP Let's, for the sake of argument and to avoid the fierce realism of 10000-line programs, assume that the above still does not give enough information. Since the error occurred in EM instruction number 17, we should like to see more information around that moment. Call the interpreter again, now with the shell variable AT set at 17: .DS int AT=17 t.em .DE (The interpreter has a number of internal variables that can be set by assignments on the command line, like with \fImake\fP.) This gives a file called \fIint.log\fP containing the stack dump of 150 lines presented at the end of this chapter. .PP Since dumping is a subfacility of logging in the interpreter, the formats of the lines are the same. If a line starts with an @, it will contain a file-name/line-number indication; the next two characters are the subject and the log level. Then comes the information, preceded by a space. The text contains three stack dumps, one before the offending instruction, one at it, and one after it; then the interpreter stops. All kinds of other dumps can be obtained, but this is default. .PP For each instruction we have, in order: .IP \- an @x9 line, giving the position in the program, .IP \- the messages, warnings and errors from the instruction as it is being executed, .IP \- dump(s), as requested. .PP The first two lines mean that at line 4 in file \fIt.c\fP the interpreter performed its 16-th instruction, with the Program Counter at 30 pointing at opcode 180 in the text segment; the instruction was an LOL (LOad Local) with the operand \-4 derived from the opcode. It copies the local at offset \-4 to the top of the stack. The effect can be seen from the subsequent stack dump, where the undefined word at addresses 2147483568 to ...571 (the variable \fIa\fP) has been copied to the top of the stack at 2147483560 (copying undefined values does not generate a warning). Since we used the \fIem44\fP compiler, all pointers and ints in our dump are 4 bytes long. So a variable at address X in reality extends from address X to X+3. .br Note that this is not the offending instruction; this stack dump represents the situation just before the error. .PP The stack consists of a sequence of frames, each containing data followed by a Return Status Block resulting from a call; the last frame ends in top-of-stack. The first frame represents the stack when the program starts, through a call to the start-up routine. This routine prepares the second stack frame with the actual parameters to \fImain()\fP: \fIargc\fP at 2147483596, \fIargv\fP at 2147483600 and \fIenviron\fP at 2147483604. .LP The RSB line shows that the call to \fImain()\fP was made from procedure 0 which has 0 locals, with PC at 16, an LB of 2147483608 and file name and line number still unknown. The \fIcode\fP in the RSB tells how this RSB was made; possible values are STP (start-up), CAL, RTT (returnable trap) and NRT (non-returnable trap). .PP The next frame shows the local variable(s) of \fImain()\fP; there are two of them, the pointer \fIa\fP at 2147483568, which is undefined, and variable \fIb\fP at 2147483564, which has the value 777. Then comes a copy of \fIa\fP, just made by the LOL instruction, at 2147483560. The following line shows that the Function Return Area (which does not reside at the end of the stack, but just happens to be printed here) has size 0 and is presently undefined. The stack dump ends by showing that the Actuals Base is at 2147483596 (pointing at \fIargc\fP), the Locals Base at 2147483572 (pointing just above the local \fIa\fP), the Stack Pointer at 2147483560 (pointing at the undefined pointer), the line count is 4 and the file name is "t.c". .LP (Notice that there is one more stack frame than one would probably expect, the one above the start-up routine.) .LP The Function Return Area could have a size larger than 0 and still be undefined, for example when an instruction that does not preserve the contents of the FRA has just been executed; likewise the FRA could have size 0 and be defined nevertheless, for example just after a RET 0 instruction. .PP All this has set the scene for the distaster which is about to strike in the next instruction. This is indeed a LOI (LOad Indirect) of size 4, opcode 169; it causes the message .DS warning: Local data pointer expected [stack.c: 242] .DE and its continuation .DS warning cont.: Actual memory is undefined .DE (detected in the interpreter file \fIstack.c\fP at line 242; this can be useful for sorting out dubious semantics). We see that the effect, as shown in the third frame of this stack dump (at instruction number 17) is somewhat unexpected: the LOI has fetched the value 4 and stacked it. The reason is that, unfortunately, undefinedness is not transitive in the interpreter. When an undefined value is used in an operation (other than copying) a warning is given, but thereafter the value is treated as if it were zero. So, after the warning a normal null pointer remains, which is then used to pick up the value at location 0. This is the place where the EM machine stores its current line number, which is presently 4. .PP The third stack dump shows the final effect: the value 4 has been unstacked and copied to variable \fIb\fP at 2147483564 through an STL (STore Local) instruction. .PP Since this form of logging dumps the stack only, the log file is relatively small as dumps go. Nevertheless, a useful excerpt can be obtained with the command .DS grep 'd1' int.log .DE This extracts the Return Status Block lines from the log, thus producing three traces of calls, one for each instruction in the log: .DS d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, LIN = 4, FIL = "t.c" d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, LIN = 4, FIL = "t.c" d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483564, HP = 848, LIN = 4, FIL = "t.c" .DE Theoretically, the pertinent trace is the middle one, but in practice all three are equal. In the present case there isn't much to trace, but in real programs the trace can be useful. .sp .LP .I "Errors in libraries" .PP Since libraries are generally compiled with suppression of line number and file name information, the line number and file name in the interpreter will not be updated when it enters a library routine. Consequently, all messages generated by interpreting library routines will seem to originate from the line of the call. This is especially true for the routine malloc(), which, from the nature of its business, often contains dubitable code. .PP A usual message is: .DS (Warning 43, #1): Local integer expected at "buff.c", line 18, INR = 266 (Warning 64, cont.): Actual memory contains a data pointer at "buff.c", line 18, INR = 266 .DE and indeed at line 18 of the file buff.c we find: .DS buff = malloc(buff_size = BFSIZE); .DE This problem can be avoided by using a specially compiled version of the library that contains the correct LIN and FIL instructions, or, less elegantly, by including the source code of the library routines in the program; in the latter case, one has to be sure to have them all. .sp .LP .I "Unavoidable messages" .br Some messages produced by the logging are almost unavoidable; sometimes the writer of a library routine is forced to take liberties with the semantics of EM. .LP Examples from C include the memory allocation routines. For efficiency reasons, one bit of an pointer in the administration is used as a flag; setting, clearing and reading this bit requires bitwise operations on pointers, which gives the above messages. Realloc causes a problem in that it may have to copy the originally allocated area to a different place; this area may contain uninitialised bytes. .bp .DS .ft CW @x9 "t.c", line 4, INR = 16, PC = 30 OPCODE = 180 @L6 "t.c", line 4, INR = 16, DoLOLm(-4) d2 d2 . . STACK_DUMP[4/4] . . INR = 16 . . STACK_DUMP . . d2 ---------------------------------------------------------------- d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483643 0 (Dp) d2 2147483642 0 (Dp) d2 2147483641 0 (Dp) d2 2147483640 40 [ 40] (Dp) d2 2147483639 0 (Dp) d2 2147483638 0 (Dp) d2 2147483637 3 (Dp) d2 2147483636 64 [ 832] (Dp) d2 2147483635 0 (In) d2 2147483634 0 (In) d2 2147483633 0 (In) d2 2147483632 1 [ 1] (In) d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL d2 d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483607 0 (Dp) d2 2147483606 0 (Dp) d2 2147483605 0 (Dp) d2 2147483604 40 [ 40] (Dp) d2 2147483603 0 (Dp) d2 2147483602 0 (Dp) d2 2147483601 3 (Dp) d2 2147483600 64 [ 832] (Dp) d2 2147483599 0 (In) d2 2147483598 0 (In) d2 2147483597 0 (In) d2 2147483596 1 [ 1] (In) d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL d2 d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483571 undef d2 | | | | | | d2 2147483568 undef (1 word) d2 2147483567 0 (In) d2 2147483566 0 (In) d2 2147483565 3 (In) d2 2147483564 9 [ 777] (In) d2 2147483563 undef d2 | | | | | | d2 2147483560 undef (1 word) d2 FRA: size = 0, undefined d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, \e LIN = 4, FIL = "t.c" d2 ---------------------------------------------------------------- d2 @x9 "t.c", line 4, INR = 17, PC = 31 OPCODE = 169 @w1 "t.c", line 4, INR = 17, warning: Local data pointer expected [stack.c: 242] @w1 "t.c", line 4, INR = 17, warning cont.: Actual memory is undefined @L6 "t.c", line 4, INR = 17, DoLOIm(4) d2 d2 . . STACK_DUMP[4/4] . . INR = 17 . . STACK_DUMP . . d2 ---------------------------------------------------------------- d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483643 0 (Dp) d2 2147483642 0 (Dp) d2 2147483641 0 (Dp) d2 2147483640 40 [ 40] (Dp) d2 2147483639 0 (Dp) d2 2147483638 0 (Dp) d2 2147483637 3 (Dp) d2 2147483636 64 [ 832] (Dp) d2 2147483635 0 (In) d2 2147483634 0 (In) d2 2147483633 0 (In) d2 2147483632 1 [ 1] (In) d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL d2 d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483607 0 (Dp) d2 2147483606 0 (Dp) d2 2147483605 0 (Dp) d2 2147483604 40 [ 40] (Dp) d2 2147483603 0 (Dp) d2 2147483602 0 (Dp) d2 2147483601 3 (Dp) d2 2147483600 64 [ 832] (Dp) d2 2147483599 0 (In) d2 2147483598 0 (In) d2 2147483597 0 (In) d2 2147483596 1 [ 1] (In) d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL d2 d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483571 undef d2 | | | | | | d2 2147483568 undef (1 word) d2 2147483567 0 (In) d2 2147483566 0 (In) d2 2147483565 3 (In) d2 2147483564 9 [ 777] (In) d2 2147483563 0 (In) d2 2147483562 0 (In) d2 2147483561 0 (In) d2 2147483560 4 [ 4] (In) d2 FRA: size = 0, undefined d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483560, HP = 848, \e LIN = 4, FIL = "t.c" d2 ---------------------------------------------------------------- d2 @x9 "t.c", line 4, INR = 18, PC = 32 OPCODE = 229 @S6 "t.c", line 4, INR = 18, DoSTLm(-8) d2 d2 . . STACK_DUMP[4/4] . . INR = 18 . . STACK_DUMP . . d2 ---------------------------------------------------------------- d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483643 0 (Dp) d2 2147483642 0 (Dp) d2 2147483641 0 (Dp) d2 2147483640 40 [ 40] (Dp) d2 2147483639 0 (Dp) d2 2147483638 0 (Dp) d2 2147483637 3 (Dp) d2 2147483636 64 [ 832] (Dp) d2 2147483635 0 (In) d2 2147483634 0 (In) d2 2147483633 0 (In) d2 2147483632 1 [ 1] (In) d1 >> RSB: code = STP, PI = uninit, PC = 0, LB = 2147483644, LIN = 0, FIL = NULL d2 d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483607 0 (Dp) d2 2147483606 0 (Dp) d2 2147483605 0 (Dp) d2 2147483604 40 [ 40] (Dp) d2 2147483603 0 (Dp) d2 2147483602 0 (Dp) d2 2147483601 3 (Dp) d2 2147483600 64 [ 832] (Dp) d2 2147483599 0 (In) d2 2147483598 0 (In) d2 2147483597 0 (In) d2 2147483596 1 [ 1] (In) d1 >> RSB: code = CAL, PI = (0,0), PC = 16, LB = 2147483608, LIN = 0, FIL = NULL d2 d2 ADDRESS BYTE ITEM VALUE SHADOW d2 2147483571 undef d2 | | | | | | d2 2147483568 undef (1 word) d2 2147483567 0 (In) d2 2147483566 0 (In) d2 2147483565 0 (In) d2 2147483564 4 [ 4] (In) d2 FRA: size = 0, undefined d1 >> AB = 2147483596, LB = 2147483572, SP = 2147483564, HP = 848, \e LIN = 4, FIL = "t.c" d2 ---------------------------------------------------------------- d2 .DE