.nr PS 12
.nr VS 14
.nr LL 6i
.tr ~
.TL
The Code Expander Generator
.AU
Frans Kaashoek
Koen Langendoen
.AI
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands
.NH
Introduction
.PP
A \fBcode expander\fR (\fBce\fR for short) is a part of the
Amsterdam Compiler Kit
.[
toolkit
.]
(\fBACK\fR) and provides the user with
high-speed generation of medium-quality code. Although conceptually
equivalent to the more usual \fBcode generator\fR, it differs in some
aspects.
.PP
Normally, a program to be compiled with \fBACK\fR
is first fed to the preprocessor. The output of the preprocessor goes
into the appropriate front end, which produces EM
.[
block
.]
(a
machine independent low level intermediate code). The generated EM code is fed
into the peephole optimizer, which scans it with a window of a few instructions,
replacing certain inefficient code sequences by better ones. After the
peephole optimizer a back end follows, which produces high-quality assembly code.
The assembly code goes via the target optimizer into the assembler and the
object code then goes into the
linker/loader, the final component in the pipeline.
.PP
For various applications
this scheme is too slow. When debugging, for example,
compile time is more important than execution time of a program.
For this purpose a new scheme is introduced:
.IP \ \ 1:
The code generator and assembler are
replaced by a library, the \fBcode expander\fR, consisting of a set of
routines, one for every EM-instruction. Each routine expands its EM-instruction
into relocatable object code. In contrast, the usual ACK code generator uses
expensive pattern matching on sequences of EM-instructions.
The peephole and target optimizers are not used.
.IP \ \ 2:
These routines replace the usual EM-generating routines in the front end; this
eliminates the overhead of intermediate files.
.LP
This results in a fast compiler producing an object file, ready to be
linked and loaded, at the cost of unoptimized object code.
.PP
Because of the
simple nature of the code expander, it is much easier to build, to debug, and to
test than a code generator. Experience has demonstrated that a code expander can be constructed,
debugged, and tested in less than two weeks.
.PP
This document describes the tools for automatically generating a
\fBce\fR (a library of C files) from two tables and
a few machine-dependent functions.
A thorough knowledge of EM is necessary to understand this document.
.NH
The code expander generator
.PP
The code expander generator (\fBceg\fR) generates a code expander from
two tables and a few machine-dependent functions. This section explains how
\fBceg\fR works. The first half describes the transformations that are done on
the two tables. The
second half tells how these transformations are done by \fBceg\fR.
.PP
A code expander consists of a set of routines that convert EM-instructions
directly to relocatable object code. These routines are called by a front
end through the EM_CODE(3ACK)
.[
EM_CODE
.]
interface. To free the table writer of the burden of building
an object file, we supply a set of routines that build an object file
in the ACK.OUT(5ACK)
.[
aout
.]
format (see appendix B). This set of routines is called
the
\fBback\fR-primitives (see appendix A). In short, a code expander consists of a
set of routines that map the EM_CODE interface onto the
\fBback\fR-primitives interface.
.PP
To avoid repetition of the same sequences of
\fBback\fR-primitives in different
EM-instructions
and to improve readability, the EM-to-object information must be supplied in
two
tables. The EM_table maps EM to an assembly language, and the as_table
maps
assembly code to \fBback\fR-primitives. The assembly language is chosen by the
table writer. It can either be an actual assembly language or an ad-hoc
language of his own design.
.LP
The following picture shows the dependencies between the different components:
.sp
.PS
linewid = 0.5i
A: line down 2i
B: line down 2i with .start at A.start + (1.5i, 0)
C: line down 2i with .start at B.start + (1.5i, 0)
D: arrow right with .start at A.center - (0.25i, 0)
E: arrow right with .start at B.center - (0.25i, 0)
F: arrow right with .start at C.center - (0.25i, 0)
"EM_CODE(3ACK)" at A.start above
"EM_table" at B.start above
"as_table" at C.start above
"source language  " at D.start rjust
"EM" at 0.5 of the way between D.end and E.start
G: "assembly" at 0.5 of the way between E.end and F.start
H: "  back primitives" at F.end ljust
"(user defined)" at G - (0, 0.2i)
"   (ACK.OUT)" at H - (0, 0.2i) ljust
.PE
.PP
The picture suggests that, during compilation, the EM instructions are
first transformed into assembly instructions and then the assembly instructions
are transformed into object-generating calls. This
is not what happens in practice, although the user is free to think it does.
Actually, however, the EM_table and the as_table are combined at code
expander generation time, yielding an imaginary compound table that results in
routines from the EM_CODE interface that generate object code directly.
.PP
As already indicated, the compound table does not exist either. Instead, each
assembly instruction in the as_table is converted to a C
.[
Kernighan
.]
routine that generates C code to call the \fBback\fR-primitives. The EM_table is
converted into a program that for each EM instruction generates a routine,
using the routines generated from the as_table. Execution of the latter program
will then generate the code expander.
.PP
This scheme allows great flexibility
in the table writing, while still
resulting in a very efficient code expander. One implication is that the
as_table is interpreted twice and the EM_table only once. This has consequences
for their structure.
.PP
To illustrate what happens, we give an example. The example is an entry in
the tables for the VAX-machine. The assembly language chosen is a subset of the
VAX assembly language.
.PP
One of the most fundamental operations in EM is ``loc c'', which loads the
constant c onto the stack. To expand this instruction the
tables contain the following information:
.DS
EM_table   :
.ft CW
   C_loc   ==>   "pushl $$$1".
     /* $1 refers to the first argument of C_loc.
      * $$ is a quoted $. */


\fRas_table   :
.ft CW
   pushl  src : CONST   ==>
                         @text1( 0xd0);
                         @text1( 0xef);
                         @text4( %$( src->num)).
\fR
.DE
.LP
The as_table is transformed into the following routine:
.DS
.ft CW
pushl_instr(src)
t_operand *src;
/* ``t_operand'' is a struct defined by the
 * table writer. */
{
   printf("swtxt();");
   printf("text1( 0xd0 );");
   printf("text1( 0xef );");
   printf("text4(%s);", substitute_dollar( src->num));
}
\fR
.DE
Using ``pushl_instr()'', the following routine is generated from the EM_table:
.DS
.ft CW
C_loc( c)
arith c;
/* text1() and text4() are library routines that fill the
 * text segment. */
{
    swtxt();
    text1( 0xd0);
    text1( 0xef);
    text4( c);
}
\fR
.DE
.LP
A compiler call to ``C_loc()'' will cause the 1-byte numbers ``0xd0''
and ``0xef''
and the 4-byte value of the variable ``c'' to be stored in the text segment.
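.LP
The effect of such a call can be imitated with a small self-contained sketch.
It is an illustration under assumptions, not the real library: ``text_seg'' and
``text_len'' are hypothetical stand-ins for the text segment, the bodies of
text1() and text4() are guesses at their behaviour (the real
\fBback\fR-primitives build an ACK.OUT object file), swtxt() is left out, and
the default least-significant-byte-first order is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the text segment. */
static unsigned char text_seg[64];
static size_t text_len = 0;

/* Append one byte to the stand-in text segment. */
static void text1(int b)
{
    assert(text_len < sizeof text_seg);
    text_seg[text_len++] = (unsigned char)(b & 0xff);
}

/* Append a 4-byte value, least significant byte first (assumed order). */
static void text4(long v)
{
    unsigned long u = (unsigned long)v;
    int i;

    for (i = 0; i < 4; i++)
        text1((int)((u >> (8 * i)) & 0xff));
}

/* Same shape as the generated C_loc() shown above, without swtxt(). */
static void C_loc(long c)
{
    text1(0xd0);
    text1(0xef);
    text4(c);
}
```

Under these assumptions a call C_loc(0x12345678L) appends the six bytes
d0 ef 78 56 34 12 to the stand-in segment.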
.PP
The transformations on the tables are done automatically by the code expander
generator.
The code expander generator is made up of two tools:
\fBemg\fR and \fBasg\fR. \fBAsg\fR
transforms
each assembly instruction into a C routine. These C routines generate calls
to the \fBback\fR-primitives. The generated C routines are used
by \fBemg\fR to generate the actual code expander from the EM_table.
.PP
The link between \fBemg\fR and \fBasg\fR is an assembly language.
We did not enforce a specific syntax for the assembly language;
instead we have given the table writer the freedom
to make an ad-hoc assembly language or to use an actual assembly language
suitable for his purpose. Apart from greater flexibility this
has another advantage: if the table writer adopts the assembly language that
runs on the machine at hand, he can test the EM_table independently from the
as_table. Of course there is a price to pay: the table writer has to
do the decoding of the operands himself. See section 4 for more details.
.PP
Before we describe the structure of the tables in detail, we will give
an overview of the four main phases.
.IP "phase 1:"
.br
The as_table is transformed by \fBasg\fR. This results in a set of C routines.
Each assembly-opcode generates one C routine. Note that a call to such a
routine does not generate the corresponding object code; it generates C code,
which, when executed, generates the desired object code.
.IP "phase 2:"
.br
The C routines generated by \fBasg\fR are used by \fBemg\fR to expand the EM_table.
This
results in a set of C routines, the code expander, which conform to the
procedural interface EM_CODE(3ACK). A call to such a routine does indeed
generate the desired object code.
.IP "phase 3:"
.br
The front end that uses the procedural interface is linked/loaded with the
code expander generated in phase 2 and the \fBback\fR-primitives (a supplied
library). This results in a compiler.
.IP "phase 4:"
.br
The compiler runs. The routines in the code expander are
executed and produce object code.
.RE
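.LP
The difference between phase 1 and phase 2 can be sketched in a few lines of C.
This is a simplified illustration, not the output of \fBasg\fR or \fBemg\fR:
the names ``gen_pushl'' and ``gen_C_loc'' are hypothetical, and the generated
text is written into a buffer so that it can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Phase 1 (simplified): a routine of the kind asg produces. It does not
 * emit object code; it writes C statements that will do so later.
 * "operand" is the C expression that supplies the constant. */
static int gen_pushl(char *buf, size_t size, const char *operand)
{
    int n;

    assert(buf != NULL && operand != NULL && strlen(operand) > 0);
    n = snprintf(buf, size,
                 "swtxt(); text1(0xd0); text1(0xef); text4(%s);", operand);
    return n >= 0 && (size_t)n < size;
}

/* Phase 2 (simplified): what emg does for a rule such as
 * C_loc ==> "pushl $$$1". The statements from phase 1 become the
 * body of the C_loc() routine of the code expander. */
static int gen_C_loc(char *buf, size_t size)
{
    char body[128];
    int n;

    if (!gen_pushl(body, sizeof body, "c"))
        return 0;
    n = snprintf(buf, size, "C_loc(c) arith c; { %s }", body);
    return n >= 0 && (size_t)n < size;
}
```

Only when the text produced by gen_C_loc() is itself compiled and called
(phases 3 and 4) is any object code generated.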
.NH
Description of the EM_table
.PP
This section describes the EM_table. It contains four subsections.
The first three subsections describe the syntax of the EM_table,
the
semantics of the EM_table, and the functions and
constants that must be present in the EM_table, in the file ``mach.c'' or in
the file ``mach.h''. The last subsection explains how a table writer can generate
assembly code instead of object code. The subsection on
semantics contains many examples.
.NH 2
Grammar
.PP
The following grammar describes the syntax of the EM_table.
.VS +4
.TS
center tab(%);
l c l.
TABLE%::=%( RULE)*
RULE%::=%C_instr   ( COND_SEQUENCE | SIMPLE)
COND_SEQUENCE%::=%( condition   SIMPLE)*   ``default''   SIMPLE
SIMPLE%::=% ``==>'' ACTION_LIST
ACTION_LIST%::=%[ ACTION   ( ``;'' ACTION)* ]   ``.''
ACTION%::=%AS_INSTR
%|%function-call
AS_INSTR%::=%``"'' [ label ``:'']   [ INSTR] ``"''
INSTR%::=%mnemonic   [ operand   ( ``,''   operand)* ]
.TE
.VS -4
.PP
The ``('' ``)'' brackets are used for grouping, ``['' ... ``]''
means zero or one time,
a ``*'' means zero or more times, and
a ``|'' means
a choice between left or right. A \fBC_instr\fR is
a name in the EM_CODE(3ACK) interface. \fBcondition\fR is a C expression.
\fBfunction-call\fR is a call of a C function. \fBlabel\fR, \fBmnemonic\fR,
and \fBoperand\fR are arbitrary strings. If an \fBoperand\fR
contains brackets, the
brackets must match. There is an upper bound on the number of
operands; the maximum number is defined by the constant MAX_OPERANDS in the
file ``const.h'' in the directory assemble.c. Comments in the table should be
placed between ``/*'' and ``*/''.
The table is processed by the C preprocessor, before being parsed by
\fBemg\fR.
.NH 2
Semantics
.PP
The EM_table is processed by \fBemg\fR. \fBEmg\fR generates a C function
for every instruction in the EM_CODE(3ACK) interface.
For every EM-instruction not mentioned in the EM_table, a
C function that prints an error message is generated.
It is possible to divide the EM_CODE(3ACK)-interface into four parts:
.IP \0\01:
text instructions      (e.g., C_loc, C_adi, ..)
.IP \0\02:
pseudo instructions    (e.g., C_open, C_df_ilb, ..)
.IP \0\03:
storage instructions   (e.g., C_rom_icon,  ..)
.IP \0\04:
message instructions   (e.g., C_mes_begin, ..)
.LP
This section starts by giving the semantics of the grammar. The examples
are text instructions. The section ends with remarks on the pseudo
instructions and the storage instructions. Since message instructions are not
useful for a code expander, they are ignored.
.PP
.NH 3
Actions
.PP
The EM_table is made up of rules describing how to expand a \fBC_instr\fR
defined by the EM_CODE(3ACK)-interface (corresponding
to an EM instruction) into actions.
There are two kinds of actions: assembly instructions and C function calls.
An assembly instruction is defined as a mnemonic followed by zero or more
operands separated by commas. The semantics of an assembly instruction are
defined by the table writer. When the assembly language is not expressive
enough, then, as an escape route, function calls can be made. However, this
reduces
the speed of the actual code expander. Finally, actions can be grouped into
a list of actions; the actions are separated by semicolons and the list is
terminated by a ``.''.
.DS
.ft CW
C_nop   ==> .
       /* Empty action list : no operation. */

C_inc   ==> "incl (sp)".
       /* Assembler instruction, which is evaluated
        * during expansion of the EM_table */

C_slu   ==> C_sli( $1).
       /* Function call, which is evaluated during
        *  execution of the compiler. */
\fR
.DE
.NH 3
Labels
.PP
Since an assembly language without instruction labels is a rather weak
language, labels inside a contiguous block of assembly instructions are
allowed. When using labels two rules must be observed:
.IP \0\01:
The name of a label should be unique inside an action list.
.IP \0\02:
The labels used in an assembler instruction should be defined in the same
action list.
.LP
The following example illustrates the usage of labels.
.DS
.ft CW
   /* Compare the two top elements on the stack. */
C_cmp      ==>     "pop bx";
                   "pop cx";
                   "xor ax, ax";
                   "cmp cx, bx";
                /* Forward jump to local label */
                   "je 2f";
                   "jb 1f";
                   "inc ax";
                   "jmp 2f";
                   "1: dec ax";
                   "2: push ax".
\fR
.DE
We will come back to labels in the section on the as_table.
.NH 3
Arguments of an EM instruction
.PP
In most cases the translation of a \fBC_instr\fR depends on its arguments.
The arguments of a \fBC_instr\fR are numbered from 1 to \fIn\fR, where \fIn\fR
is the
total number of arguments of the current \fBC_instr\fR (there are a few
exceptions, see Implicit arguments). The table writer may
refer to an argument as $\fIi\fR. If a plain $-sign is needed in an
assembly instruction, it must be preceded by an extra $-sign.
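.LP
The $-notation can be illustrated with a short sketch of the substitution.
It is not the code used by \fBemg\fR: ``expand_dollars'' is a hypothetical
helper, the arguments are assumed to be available as strings, and only the
single-digit references $1 to $9 are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy tmpl to out, replacing "$$" by "$" and "$i" (i = 1..9) by
 * args[i-1]. Returns 1 on success, 0 if an argument is missing or
 * the output buffer is too small. */
static int expand_dollars(const char *tmpl, const char *const args[],
                          int nargs, char *out, size_t size)
{
    size_t len = 0;

    assert(tmpl != NULL && out != NULL && size > 0);
    while (*tmpl != '\0') {
        const char *piece;
        size_t piece_len;

        if (tmpl[0] == '$' && tmpl[1] == '$') {
            piece = "$";                    /* quoted dollar sign */
            piece_len = 1;
            tmpl += 2;
        } else if (tmpl[0] == '$' && tmpl[1] >= '1' && tmpl[1] <= '9') {
            int i = tmpl[1] - '0';          /* argument reference */

            if (i > nargs)
                return 0;
            piece = args[i - 1];
            piece_len = strlen(piece);
            tmpl += 2;
        } else {
            piece = tmpl;                   /* ordinary character */
            piece_len = 1;
            tmpl += 1;
        }
        if (len + piece_len >= size)
            return 0;
        memcpy(out + len, piece, piece_len);
        len += piece_len;
    }
    out[len] = '\0';
    return 1;
}
```

With the single argument "42", the template "pushl $$$1" expands to
"pushl $42": the first two $-signs yield one literal $, and $1 is replaced
by the argument.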
.PP
There are two groups of \fBC_instr\fRs whose arguments are handled specially:
.RS
.IP "1: Instructions dealing with local offsets"
.br
The value of the $\fIi\fR argument referring to a parameter ($\fIi\fR >= 0)
is increased by ``EM_BSIZE''. ``EM_BSIZE'' is the size of the return status block
and must be defined in the file ``mach.h'' (see section 3.3). For example:
.DS
.ft CW
C_lol   ==>     "push $1(bp)".
       /* automatic conversion of $1 */
\fR
.DE
.IP "2: Instructions using global names or instruction labels"
.br
All the arguments referring to global names or instruction labels will be
transformed into a unique assembly name. To prevent name clashes with library
names the table writer has to provide the
conversions in the file ``mach.h''. For example:
.DS
.ft CW
C_bra   ==>     "jmp $1".
        /* automatic conversion of $1 */
        /* type arith is converted to string */
\fR
.DE
.RE
.NH 3
Conditionals
.PP
The rules in the EM_table can be divided into two groups: simple rules and
conditional rules. The simple rules are made up of a \fBC_instr\fR followed by
a list of actions, as described above. The conditional rules (COND_SEQUENCE)
allow the table writer to select an action list depending on the value of
a condition.
.PP
A COND_SEQUENCE is a list of boolean expressions, each with a corresponding
simple rule. If
an expression evaluates to true then the corresponding simple rule is carried
out. If more than one condition evaluates to true, the first one is chosen.
The last case of a COND_SEQUENCE of a \fBC_instr\fR must handle
the default case.
The boolean expressions in a COND_SEQUENCE must be C expressions. Besides the
ordinary C operators and constants, $\fIi\fR references can be used
in an expression.
.DS
.ft CW
    /* Load address of LB $1 levels back. */
C_lxl
    $1 == 0    ==>    "pushl fp".
    $1 == 1    ==>    "pushl 4(ap)".
    default    ==>    "movl $$$1, r0";
                      "jsb .lxl";
                      "pushl r0".
\fR
.DE
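.LP
One plausible shape for the routine generated from such a COND_SEQUENCE is an
if/else chain that tests the conditions in table order. What \fBemg\fR actually
emits is not shown in this document, so the sketch below is an assumption;
``lxl_choice'' is a hypothetical function that returns the text of the selected
action list instead of generating code.

```c
#include <assert.h>
#include <string.h>

/* Select the action list for C_lxl; the first true condition wins and
 * the final branch plays the role of the mandatory "default" case. */
static const char *lxl_choice(long levels)
{
    assert(levels >= 0);
    if (levels == 0)
        return "pushl fp";
    else if (levels == 1)
        return "pushl 4(ap)";
    else
        return "movl $levels, r0; jsb .lxl; pushl r0";
}
```

Because the chain stops at the first match, the order of the conditions in
the table matters whenever two of them can be true at the same time.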
.NH 3
Abbreviations
.PP
EM instructions with an external as an argument come in three variants in
the EM_CODE(3ACK) interface. In most cases it will be possible to treat
these variants together. For this purpose the ``..'' notation is introduced.
For the code expander there is no difference between the
following instructions.
.DS
.ft CW
C_loe_dlb    ==>    "pushl $1 + $2".
C_loe_dnam   ==>    "pushl $1 + $2".
C_loe        ==>    "pushl $1 + $2".
\fR
.DE
So they can be written in the following way.
.DS
.ft CW
C_loe..      ==>    "pushl $1 + $2".
\fR
.DE
.NH 3
Implicit arguments
.PP
In the last example ``C_loe'' has two arguments, but in the EM_CODE interface
it has one argument. This argument depends on the current ``hol''
block; in the EM_table this is made explicit. Every \fBC_instr\fR whose
argument depends on a ``hol'' block has one extra argument; argument 1 refers
to the ``hol'' block.
.NH 3
Pseudo instructions
.PP
Most pseudo instructions are machine independent and are provided
by \fBceg\fR. The table writer has only to supply the following functions,
which are used to build a stackframe:
.DS
.ft CW
C_prolog()
/* Performs the prolog, for example save
 * return address */

C_locals( n)
arith n;
/* Allocate n bytes for locals on the stack */

C_jump( label)
char *label;
/* Generates code for a jump to ``label'' */
\fR
.DE
.LP
These functions can be defined in ``mach.c'' or in the EM_table (see
section 3.3).
.NH 3
Storage instructions
.PP
The storage instructions ``C_bss_\fIcstp()\fR'', ``C_hol_\fIcstp()\fR'',
``C_con_\fIcstp()\fR'', and ``C_rom_\fIcstp()\fR'', except for the instructions
dealing with constants of type string (C_..._icon, C_..._ucon, C_..._fcon), are
generated automatically. No information is needed in the table.
To generate the C_..._icon, C_..._ucon, C_..._fcon instructions
\fBceg\fR only has to know how to convert a number of type string to bytes;
this can be defined with the constants ONE_BYTE, TWO_BYTES, and FOUR_BYTES.
C_rom_icon, C_con_icon, C_bss_icon, C_hol_icon can be abbreviated by ..icon.
This also holds for ..ucon and ..fcon.
For example:
.DS
.ft CW
\\.\\.icon
    $2 == 1   ==>  gen1( (ONE_BYTE) atoi( $1)).
    $2 == 2   ==>  gen2( (TWO_BYTES) atoi( $1)).
    $2 == 4   ==>  gen4( (FOUR_BYTES) atol( $1)).
    default   ==>   arg_error( "..icon", $2).
\fR
.DE
Gen1(), gen2() and gen4() are \fBback\fR-primitives (see appendix A), and
generate one, two, or four byte constants. Atoi() and atol() are C library
functions that convert strings to int and long values, respectively.
The constants ``ONE_BYTE'', ``TWO_BYTES'', and ``FOUR_BYTES'' must be defined in
the file ``mach.h''.
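.LP
The ..icon rule can be mimicked in plain C to show how a string constant ends
up as bytes. The sketch rests on several assumptions: ``data_seg'', ``data_len''
and ``put_bytes'' are hypothetical, gen1(), gen2() and gen4() are simple
stand-ins for the real \fBback\fR-primitives, the type definitions follow the
vax4 example, and the default least-significant-byte-first order is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ONE_BYTE   int
#define TWO_BYTES  int
#define FOUR_BYTES long

/* Hypothetical stand-in for the segment being filled. */
static unsigned char data_seg[16];
static size_t data_len = 0;

/* Store the n low-order bytes of u, least significant byte first. */
static void put_bytes(unsigned long u, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        assert(data_len < sizeof data_seg);
        data_seg[data_len++] = (unsigned char)((u >> (8 * i)) & 0xff);
    }
}

static void gen1(ONE_BYTE v)   { put_bytes((unsigned long)v, 1); }
static void gen2(TWO_BYTES v)  { put_bytes((unsigned long)v, 2); }
static void gen4(FOUR_BYTES v) { put_bytes((unsigned long)v, 4); }

/* Mirrors the ..icon rule; returns 0 where the table calls arg_error(). */
static int icon(const char *val, int size)
{
    switch (size) {
    case 1: gen1((ONE_BYTE) atoi(val)); return 1;
    case 2: gen2((TWO_BYTES) atoi(val)); return 1;
    case 4: gen4((FOUR_BYTES) atol(val)); return 1;
    default: return 0;
    }
}
```

For instance, icon("258", 2) stores the two bytes 02 01, while an unsupported
size such as 3 stores nothing and reports failure.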
.NH 2
User supplied definitions and functions
.PP
If the table writer uses all the default functions he has only to supply
the following constants and functions:
.TS
tab(#);
l c lw(10c).
C_prolog()#:#T{
Do prolog
T}
C_jump( l)#:#T{
Perform a jump to label l
T}
C_locals( n)#:#T{
Allocate n bytes on the stack
T}
#
NAME_FMT#:#T{
Print format for converting a name to a unique name. The format must
contain %s.
T}
DNAM_FMT#:#T{
Print format for converting a data-label to a unique name. The format
must contain %s.
T}
DLB_FMT#:#T{
Print format for converting a numerical-data-label to a unique name.
The format must contain a %ld.
T}
ILB_FMT#:#T{
Print format for converting an instruction-label to a unique name.
The format must contain %d followed by %ld.
T}
HOL_FMT#:#T{
Print format for converting a hol-block-number to a unique name.
The format must contain %d.
T}
#
EM_WSIZE#:#T{
Size of a word in bytes on the target machine
T}
EM_PSIZE#:#T{
Size of a pointer in bytes on the target machine
T}
EM_BSIZE#:#T{
Size of base block in bytes on the target machine
T}
#
ONE_BYTE#:#T{
A C type that can hold one byte on the machine where the \fBce\fR runs
T}
TWO_BYTES#:#T{
A C type that can hold two bytes on the machine where the \fBce\fR runs
T}
FOUR_BYTES#:#T{
A C type that can hold four bytes on the machine where the \fBce\fR runs
T}
#
BSS_INIT#:#T{
The default value that the loader puts in the bss segment
T}
#
BYTES_REVERSED#:#T{
Must be defined if you want the byte order reversed.
By default the least significant byte is output first.\fR\(dg
.FS
\fR\(dg When both byte orders are used, for
example NS 16032, the table writer has to
supply his own set of routines.
.FE
T}
WORDS_REVERSED#:#T{
Must be defined if you want the word order reversed.
By default the least significant word is output first.
T}
.TE
.LP
An example of the file ``mach.h'' for the vax4:
.TS
tab(:);
l l l.
#define : ONE_BYTE : int
#define : TWO_BYTES : int
#define : FOUR_BYTES : long
:
#define : EM_WSIZE : 4
#define : EM_PSIZE : 4
#define : EM_BSIZE : 0
:
#define : BSS_INIT : 0
:
#define : NAME_FMT : "_%s"
#define : DNAM_FMT : "_%s"
#define : DLB_FMT  : "_%ld"
#define : ILB_FMT  : "I%03d%ld"
#define : HOL_FMT  : "hol%d"
.TE
Notice that EM_BSIZE is zero. The vax ``call'' instruction automatically takes
care of the base block.
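.LP
The print formats behave like ordinary printf formats, which the following
sketch illustrates with the vax4 values of NAME_FMT and ILB_FMT. The helpers
``global_name'' and ``instr_label'' are hypothetical, and the meaning given to
the two ILB_FMT arguments (a procedure number for %d, a label number for %ld)
is an assumption; this document only fixes their order and types.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_FMT "_%s"
#define ILB_FMT  "I%03d%ld"

/* Convert a global name to its assembly name. */
static int global_name(char *buf, size_t size, const char *name)
{
    int n;

    assert(buf != NULL && name != NULL && strlen(name) > 0);
    n = snprintf(buf, size, NAME_FMT, name);
    return n >= 0 && (size_t)n < size;
}

/* Convert an instruction label to its assembly name.
 * Assumption: %d is a procedure number, %ld the label number. */
static int instr_label(char *buf, size_t size, int proc_no, long label)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, size, ILB_FMT, proc_no, label);
    return n >= 0 && (size_t)n < size;
}
```

With these formats the name ``main'' becomes ``_main'', and label 12 in
procedure 7 becomes ``I00712''.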
.PP
There are three primitives that have to be defined by the table writer, either
as functions in the file ``mach.c'' or as rules in the EM_table.
For example, for the 8086 they look like this:
.DS
.ft CW
C_jump       ==>       "jmp $1".

C_prolog     ==>       "push bp";
                     "mov bp, sp".

C_locals
  $1  == 0   ==>     .
  $1  == 2   ==>     "push ax".
  $1  == 4   ==>     "push ax";
                     "push ax".
  default    ==>     "sub sp, $1".
\fR
.DE
.NH 2
Generating assembly code
.PP
When the code expander generator is used for generating assembly instead of
object code (see section 5), additional print formats have to be defined
in ``mach.h''. The following table lists these formats.
.TS
tab(#);
l c lw(10c).
BYTE_FMT#:#T{
Print format to allocate and initialize one byte. The format must
contain %ld.
T}
WORD_FMT#:#T{
Print format to allocate and initialize one word. The format must
contain %ld.
T}
LONG_FMT#:#T{
Print format to allocate and initialize one long. The format must
contain %ld.
T}
BSS_FMT#:#T{
Print format to allocate space in the bss segment. The format must
contain %ld (number of bytes).
T}
COMM_FMT#:#T{
Print format to declare a "common". The format must contain a %s (name to be declared
common), followed by a %ld (number of bytes).
T}

SEGTXT_FMT#:#T{
 | 
						|
Print format to switch to the text segment.
 | 
						|
T}
 | 
						|
SEGDAT_FMT#:#T{
 | 
						|
Print format to switch to the data segment.
 | 
						|
T}
 | 
						|
SEGBSS_FMT#:#T{
 | 
						|
Print format to switch to the bss segment.
 | 
						|
T}
 | 
						|
 | 
						|
SYMBOL_DEF_FMT#:#T{
 | 
						|
Print format to define a label. The format must contain %s.
 | 
						|
T}
 | 
						|
GLOBAL_FMT#:#T{
 | 
						|
Print format to declare a global name. The format must contain %s.
 | 
						|
T}
 | 
						|
LOCAL_FMT#:#T{
 | 
						|
Print format to declare a local name. The format must contain %s.
 | 
						|
T}
 | 
						|
 | 
						|
RELOC1_FMT#:#T{
 | 
						|
Print format to initialize a byte with an address expression. The format must
 | 
						|
contain %s (name) and %ld (offset).
 | 
						|
T}
 | 
						|
RELOC2_FMT#:#T{
 | 
						|
Print format to initialize a word with an address expression. The format must
 | 
						|
contain %s (name) and %ld (offset).
 | 
						|
T}
 | 
						|
RELOC4_FMT#:#T{
 | 
						|
Print format to initialize a long with an address expression. The format must
 | 
						|
contain %s (name) and %ld (offset).
 | 
						|
T}
 | 
						|
 | 
						|
ALIGN_FMT#:#T{
 | 
						|
Print format to align a segment.
 | 
						|
T}
 | 
						|
.TE
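.PP
As a rough illustration of how such print formats could be used, the sketch
below defines two of them for a hypothetical assembler syntax and expands them
with snprintf(). The directive spellings (``.byte'', ``.comm'') and the helper
names emit_byte() and emit_comm() are assumptions made for this example only;
they are not taken from an actual ``mach.h''.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical format definitions, as they might appear in mach.h. */
#define BYTE_FMT   ".byte %ld\n"
#define COMM_FMT   ".comm %s, %ld\n"

/* Expand BYTE_FMT for one value; returns the number of characters written. */
int emit_byte(char *buf, size_t size, long value)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, BYTE_FMT, value);
}

/* Expand COMM_FMT: the name comes first, then the size in bytes. */
int emit_comm(char *buf, size_t size, const char *name, long nbytes)
{
    assert(buf != NULL && name != NULL && size > 0);
    return snprintf(buf, size, COMM_FMT, name, nbytes);
}
```

The argument order in emit_comm() follows the requirement stated in the table:
a %s for the name, followed by a %ld for the number of bytes.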
 | 
						|
.NH 1
 | 
						|
Description of the as_table
 | 
						|
.PP
 | 
						|
This section describes the as_table. Like the previous section, it is divided 
 | 
						|
into
 | 
						|
four parts: the first two parts describe the grammar and the semantics of the 
 | 
						|
as_table; the third part gives an overview
 | 
						|
of the functions and the constants that must be present in the as_table (in 
 | 
						|
the file ``as.h'' or in the file ``as.c''); the last part describes the case when
 | 
						|
assembly is generated instead of object code.
 | 
						|
The part on semantics contains examples that appear in the as_table for the
 | 
						|
VAX or for the 8086. 
 | 
						|
.NH 2
 | 
						|
Grammar
 | 
						|
.PP
 | 
						|
The form of the as_table is given by the following grammar:
.VS +4
.TS
center tab(#);
l c l.
TABLE#::=#( RULE)*
RULE#::=#( mnemonic | ``...'')   DECL_LIST   ``==>''   ACTION_LIST
DECL_LIST#::=#DECLARATION   ( ``,''   DECLARATION)*
DECLARATION#::=#operand   [ ``:''   type]
ACTION_LIST#::=#ACTION   ( ``;''   ACTION)*   ``.''
ACTION#::=#IF_STATEMENT
#|#function-call
#|#``@''function-call
IF_STATEMENT#::=#``@if''   ``('' condition ``)''   ACTION_LIST
##( ``@elsif''   ``('' condition ``)''   ACTION_LIST)*
##[ ``@else''   ACTION_LIST]
##``@fi''
function-call#::=#function-identifier ``('' [arg (``,'' arg)*] ``)''
arg#::=#argument
#|#reference
.TE
 | 
						|
.VS -4
 | 
						|
.LP
 | 
						|
\fBmnemonic\fR, \fBoperand\fR, and \fBtype\fR are all C identifiers;
 | 
						|
\fBcondition\fR is a normal C expression;
 | 
						|
\fBfunction-call\fR must be a C function call. A function can be called with
 | 
						|
standard C arguments or with a reference (see section 4.2.4).
 | 
						|
Since the as_table is
 | 
						|
interpreted during code expander generation as well as during code
 | 
						|
expander execution, two levels of calls are present in it. A ``function-call''
 | 
						|
is done during code expander generation, a ``@function-call'' during code
 | 
						|
expander execution.
 | 
						|
.NH 2
 | 
						|
Semantics
 | 
						|
.PP
 | 
						|
The as_table is made up of rules that map assembly instructions onto
 | 
						|
\fBback\fR-primitives, a set of functions that construct an object file. 
 | 
						|
The table is processed by \fBasg\fR, which generates a C function
for each assembler mnemonic. The names of
 | 
						|
these functions are the assembler mnemonics postfixed 
 | 
						|
with ``_instr'' (e.g., ``add'' becomes ``add_instr()''). These functions 
 | 
						|
will be used by the function 
 | 
						|
assemble() during the expansion of the EM_table. 
 | 
						|
After explaining the semantics of the as_table the function
 | 
						|
assemble() will be described.
 | 
						|
.NH 3
 | 
						|
Rules
 | 
						|
.PP
 | 
						|
A rule in the as_table is made up of a left and a right hand side; 
 | 
						|
the left hand side describes an assembler 
 | 
						|
instruction (mnemonic and operands); the
 | 
						|
right hand side gives the corresponding actions as \fBback\fR-primitives or as
 | 
						|
functions defined by the table writer, which call \fBback\fR-primitives.
 | 
						|
Two simple examples from the VAX as_table and the 8086 as_table, resp.:
 | 
						|
.DS
 | 
						|
.ft CW
 | 
						|
movl src, dst  ==> @text1( 0xd0);
 | 
						|
                   gen_operand( src); 
 | 
						|
                   gen_operand( dst). 
 | 
						|
    /* ``gen_operand'' is a function that encodes 
 | 
						|
     * operands by calling back-primitives. */
 | 
						|
 | 
						|
rep ens:MOVS   ==>  @text1( 0xf3);
 | 
						|
                    @text1( 0xa5).  
 | 
						|
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
.NH 3
 | 
						|
Declaration of types
 | 
						|
.PP
 | 
						|
In general, a machine instruction is encoded as an opcode followed by zero or
more operands. There are two methods for mapping assembler mnemonics
 | 
						|
onto opcodes: the mnemonic determines the opcode, or mnemonic and operands 
 | 
						|
together determine the opcode. Both cases can be 
 | 
						|
easily expressed in the as_table.
 | 
						|
The first case is obvious. 
 | 
						|
The second case is handled by introducing type fields for the operands.
 | 
						|
.PP
 | 
						|
When mnemonic and operands together determine the opcode, the table writer has 
to give a separate rule for each combination of mnemonic and operand types.
The rules differ in the type fields of the operands.
 | 
						|
The table writer has to supply functions that check the type
 | 
						|
of the operand. The name of such a function is the name of the type; it
 | 
						|
has one argument: a pointer to a struct of type \fIt_operand\fR; it returns
 | 
						|
non-zero when the operand is of this type, otherwise it returns 0.
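.PP
A minimal sketch of two such type-checking functions is given below. It assumes
a reduced struct \fIt_operand\fR and the flag values IS_REG and IS_DATA that
are used in the 8086 ``as.h'' example later in this section; there they are
written as macros, here as ordinary functions.

```c
#include <assert.h>
#include <stddef.h>

#define IS_REG   0x1    /* operand is a register */
#define IS_DATA  0x4    /* operand is a constant */

/* Reduced version of the operand description; a real one holds more. */
struct t_operand {
    unsigned type;
    int reg;
};

/* Restriction function for the type REG: non-zero for a register operand. */
int REG(struct t_operand *op)
{
    assert(op != NULL);
    return (op->type & IS_REG) != 0;
}

/* Restriction function for the type DATA: non-zero for a constant operand. */
int DATA(struct t_operand *op)
{
    assert(op != NULL);
    return (op->type & IS_DATA) != 0;
}
```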
 | 
						|
.PP
 | 
						|
This will usually lead to a list of rules per mnemonic. To reduce the amount of
 | 
						|
work an abbreviation is supplied. Once the mnemonic is specified it can be
 | 
						|
referred to in the following rules by ``...''.
 | 
						|
One has to make sure
 | 
						|
that each mnemonic is mentioned only once in the as_table, otherwise 
 | 
						|
\fBasg\fR will generate more than one function with the same name.
 | 
						|
.PP
 | 
						|
The following example shows the usage of type fields.
 | 
						|
.DS 
 | 
						|
.ft CW
 | 
						|
 mov dst:REG, src:EADDR  ==>  
 | 
						|
          @text1( 0x8b);                /* opcode */
 | 
						|
          mod_RM( %d(dst->reg), src). /* operands */
 | 
						|
 | 
						|
 ... dst:EADDR, src:REG  ==>  
 | 
						|
          @text1( 0x89);                /* opcode */
 | 
						|
          mod_RM( %d(src->reg), dst). /* operands */
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
The table-writer must supply the restriction functions, 
 | 
						|
.ft CW
 | 
						|
REG\fR and
 | 
						|
.ft CW
 | 
						|
EADDR\fR in the previous example, in ``as.c'' or ``as.h''.
 | 
						|
.NH 3 
 | 
						|
The function of the @-sign and the if-statement
 | 
						|
.PP
 | 
						|
The right hand side of a rule is made up of function calls. 
 | 
						|
Since the as_table is
 | 
						|
interpreted on two levels, during code expander generation and during code
 | 
						|
expander execution, two levels of calls are present in it. A function-call
 | 
						|
without an ``@''-sign
 | 
						|
is called during code expander generation (e.g., the
 | 
						|
.ft CW
 | 
						|
gen_operand()\fR in the
 | 
						|
first example). 
 | 
						|
A function call with an ``@''-sign is called during code 
 | 
						|
expander execution (e.g.,
 | 
						|
the \fBback\fR-primitives). Calls of this second kind therefore become part of the compiler.
 | 
						|
.PP
 | 
						|
The need for the ``@''-sign construction arises, for example, when you 
 | 
						|
implement push/pop optimization (e.g., ``push x'' followed by ``pop y'' 
 | 
						|
can be replaced by ``move x, y'').
 | 
						|
In this case flags need to be set, unset, and tested during the execution of
 | 
						|
the compiler:
 | 
						|
.DS L
 | 
						|
.ft CW
 | 
						|
PUSH src  ==>   /* save in ax */
 | 
						|
                mov_instr( AX_oper, src);  
 | 
						|
                /* set flag */
 | 
						|
                @assign( push_waiting, TRUE).         
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
.DS
 | 
						|
.ft CW
 | 
						|
POP dst   ==>   @if ( push_waiting)
 | 
						|
                       /* ``mov_instr'' is asg-generated */
 | 
						|
                       mov_instr( dst, AX_oper);      
 | 
						|
                       @assign( push_waiting, FALSE).
 | 
						|
                @else
 | 
						|
                       /* ``pop_instr'' is asg-generated */
 | 
						|
                       pop_instr( dst).               
 | 
						|
                @fi.
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
.LP
 | 
						|
Although the @-sign is followed syntactically by a
function name, this name can very well be that of a macro defined in C.
 | 
						|
This is in fact the case with ``@assign()'' in the above example.
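.PP
The run-time effect of the two rules above can be imitated in plain C. The
sketch below is only an approximation under stated assumptions: the definition
of assign() as a macro is one possible choice, and the functions push_seen()
and pop_seen() and the two counters are hypothetical stand-ins for the code
that would be executed inside the compiler.

```c
#include <assert.h>

#define TRUE   1
#define FALSE  0

/* "assign" as a macro rather than a function, as the text suggests. */
#define assign(var, val)   ((var) = (val))

static int push_waiting = FALSE;   /* set when a push has been deferred */
static int moves_emitted = 0;      /* counts simulated register moves */
static int pops_emitted = 0;       /* counts simulated real pops */

/* Run-time part of the PUSH rule: the operand was saved, remember that. */
void push_seen(void)
{
    moves_emitted++;               /* stands in for the saved "mov ax, src" */
    assign(push_waiting, TRUE);
}

/* Run-time part of the POP rule: test the flag set by a preceding PUSH. */
void pop_seen(void)
{
    if (push_waiting) {
        moves_emitted++;           /* stands in for "mov dst, ax" */
        assign(push_waiting, FALSE);
    } else {
        pops_emitted++;            /* stands in for a real "pop dst" */
    }
}
```

A push followed by a pop thus results in two moves and no pop; a pop without a
waiting push falls back to the ordinary pop instruction.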
 | 
						|
.PP
 | 
						|
It may happen that information is needed that is not known 
until execution of
the compiler.  For example, one may need to know whether a ``$\fIi\fR'' argument fits in
one byte.
In this case one can use a special if-statement provided 
by \fBasg\fR: @if, @elsif, @else, @fi. The conditions of this statement
are evaluated at
run time of the \fBce\fR. In such a condition one may of course refer 
to the ``$\fIi\fR'' arguments. For example, constants can be 
packed into one or two byte arguments as follows:
 | 
						|
.DS 
.ft CW
mov dst:ACCU, src:DATA ==> 
                       @if ( fits_byte( %$(src->expr)))
                            @text1( 0xc0);
                            @text1( %$(src->expr)).
                       @else
                            @text1( 0xc8);
                            @text2( %$(src->expr)).
                       @fi.
\fR
.DE
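.PP
The test used in the condition above has to be supplied by the table writer.
A possible definition is sketched below; it assumes that the constant is to be
stored as a signed byte, which may differ from what a particular instruction
requires. The helper packed_size() is hypothetical and only serves to show the
choice between the one byte and the two byte encoding.

```c
#include <assert.h>

/* Assumed helper: non-zero when val is representable as a signed byte. */
int fits_byte(long val)
{
    return val >= -128L && val <= 127L;
}

/* Number of bytes the constant occupies after packing: one or two. */
int packed_size(long val)
{
    assert(val >= -32768L && val <= 65535L);   /* must fit in a word */
    return fits_byte(val) ? 1 : 2;
}
```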
 | 
						|
.NH 3
 | 
						|
References to operands
 | 
						|
.PP
 | 
						|
As noted before, the operands of an assembler instruction may be used as
 | 
						|
pointers to the struct \fIt_operand\fR in the right hand side of the table.
 | 
						|
Because of the free format assembler, the types of the fields in the struct
 | 
						|
\fIt_operand\fR are unknown to \fBasg\fR. As these fields can appear in calls
 | 
						|
to functions, \fBasg\fR must know 
 | 
						|
these types. This section explains how these types must be specified.
 | 
						|
.PP
 | 
						|
References to operands come in three forms: ordinary operands, operands that
 | 
						|
contain ``$\fIi\fR'' references, and operands that refer to names of local labels.
 | 
						|
The ``$\fIi\fR'' in operands represent names or numbers of a \fBC_instr\fR and must
 | 
						|
be given as arguments to the \fBback\fR-primitives. Labels in operands
 | 
						|
must be converted to a number that tells the distance, the number of bytes, 
 | 
						|
between the label and the current position in the text-segment. 
 | 
						|
.LP
 | 
						|
All three cases are treated in a uniform way. When the table writer
 | 
						|
makes a reference to an operand of an assembly instruction, he must describe
 | 
						|
the type of the operand in the following way.
 | 
						|
.VS +4
 | 
						|
.TS
center tab(#);
l c l.
reference#::=#``%'' conversion
##``('' operand-name ``\->'' field-name ``)''
conversion#::=#printformat
#|#``$''
#|#``dist''
printformat#::=#see PRINT(3ACK)
.[
PRINT
.]
.TE
 | 
						|
.VS -4
 | 
						|
.LP
 | 
						|
The three cases differ only in the conversion field. The printformat conversion
 | 
						|
applies to ordinary operands. The ``%$'' applies to operands that contain
 | 
						|
a ``$\fIi\fR''. The expression between parentheses must result in a pointer to
 | 
						|
a char. The
 | 
						|
result of ``%$'' is of the type of ``$\fIi\fR''. The ``%dist''
 | 
						|
applies to operands that refer to a local label. The expression between
parentheses must result in a pointer to a char. The result of ``%dist'' is 
 | 
						|
of type arith.
 | 
						|
.PP
 | 
						|
The following example illustrates the usage of ``%$''. (For an
 | 
						|
example that illustrates the usage of ordinary fields see
 | 
						|
the section on ``User supplied definitions and functions'').
 | 
						|
.DS
 | 
						|
.ft CW
 | 
						|
jmp dst ==> 
 | 
						|
    @text1( 0xe9);
 | 
						|
    @reloc2( %$(dst->lab), %$(dst->off), PC_REL).
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
.PP
 | 
						|
A useful function concerning $\fIi\fRs is arg_type(), which takes as input a
 | 
						|
string starting with $\fIi\fR and returns the type of the \fIi\fR'th argument
 | 
						|
of the current EM-instruction, which can be STRING, ARITH or INT. One may need
 | 
						|
this function while decoding operands if the context of the $\fIi\fR does not
 | 
						|
give enough information.
 | 
						|
If the function arg_type() is used, the file
 | 
						|
arg_type.h must contain the definition of STRING, ARITH and INT.
 | 
						|
.PP
 | 
						|
%dist is only guaranteed to work when called as a parameter of text1(), text2() or text4().
 | 
						|
The goal of the %dist conversion is to reduce the number of reloc1(), reloc2()
 | 
						|
and reloc4()
 | 
						|
calls, saving space and time (no relocation at compiler run time). 
 | 
						|
The following example illustrates the usage of ``%dist''.
 | 
						|
.DS 
 | 
						|
.ft CW
 | 
						|
 jmp dst:ILB    ==> /* label in an instruction list */
 | 
						|
     @text1( 0xeb);          
 | 
						|
     @text1( %dist( dst->lab)).
 | 
						|
 | 
						|
 ... dst:LABEL  ==> /* global label */
 | 
						|
     @text1( 0xe9);       
 | 
						|
     @reloc2( %$(dst->lab), %$(dst->off), PC_REL).
 | 
						|
\fR
 | 
						|
.DE
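.PP
The idea behind ``%dist'' can be sketched in C as follows. The table layout and
the names record_label() and label_dist() are hypothetical, and the sketch only
handles labels that have already been recorded. The position from which the
distance is measured is passed in by the caller, since it depends on the jump
encoding of the target machine.

```c
#include <assert.h>
#include <string.h>

#define MAX_LABELS 16

/* Hypothetical table of local labels and their text-segment offsets. */
static struct { const char *name; long pos; } labels[MAX_LABELS];
static int nlabels = 0;

/* Record a local label at the given position in the text segment. */
void record_label(const char *name, long pos)
{
    assert(nlabels < MAX_LABELS);
    labels[nlabels].name = name;
    labels[nlabels].pos = pos;
    nlabels++;
}

/* Distance in bytes from position "from" to the label; it must exist. */
long label_dist(const char *name, long from)
{
    int i;

    for (i = 0; i < nlabels; i++)
        if (strcmp(labels[i].name, name) == 0)
            return labels[i].pos - from;
    assert(!"unknown local label");
    return 0;
}
```

Because the distance is a plain number, it can be emitted directly as an
operand byte, and no relocation record is needed for the jump.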
 | 
						|
.NH 3
 | 
						|
The functions assemble() and block_assemble()
 | 
						|
.PP
 | 
						|
The functions assemble() and block_assemble() are provided by \fBceg\fR.
 | 
						|
If, however, the table writer is not satisfied with the way they work 
 | 
						|
he can
 | 
						|
supply his own assemble() or block_assemble().
 | 
						|
The default function assemble() splits an assembly string into a 
 | 
						|
label, mnemonic,
 | 
						|
and operands and performs the following actions on them:
 | 
						|
.IP \0\01:
 | 
						|
It processes the local label; it records the name and current position. Thereafter it calls the function process_label() with one argument of type string,
 | 
						|
the label. The table writer has to define this function.
 | 
						|
.IP \0\02:
 | 
						|
Thereafter it calls the function process_mnemonic() with one argument of
 | 
						|
type string, the mnemonic. The table writer has to define this function.
 | 
						|
.IP \0\03:
 | 
						|
It calls process_operand() for each operand. Process_operand() must be
 | 
						|
written by the table-writer since no fixed representation for operands
 | 
						|
is enforced. It has two arguments: a string (the operand to decode) 
 | 
						|
and a pointer to the struct \fIt_operand\fR. The declaration of the struct 
 | 
						|
\fIt_operand\fR must be given in the
 | 
						|
file ``as.h'', and the table-writer can put all the information needed for
 | 
						|
encoding the operand in machine format in it.
 | 
						|
.IP \0\04:
 | 
						|
It examines the mnemonic and calls the associated function, generated by
 | 
						|
\fBasg\fR, with pointers to the decoded operands as arguments. This makes it
 | 
						|
possible to use the decoded operands in the right hand side of a rule (see
 | 
						|
below).
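.LP
The splitting performed in steps 1 to 3 can be sketched as follows. The
instruction syntax ``label: mnemonic op1, op2'' is an assumption made for this
example, as is the name split_instruction(); the actual syntax is whatever the
table writer has chosen. The sketch does not trim trailing blanks from operands
and leaves the call of the function for the mnemonic (step 4) out.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_OPERANDS 4

/* Skip leading blanks and return the first non-blank position. */
static char *skip_blanks(char *s)
{
    while (*s != '\0' && isspace((unsigned char)*s))
        s++;
    return s;
}

/*
 * Split "label: mnemonic op1, op2" in place. *label is NULL when no label
 * is present. Returns the number of operands stored in operands[].
 */
int split_instruction(char *line, char **label, char **mnemonic,
                      char *operands[MAX_OPERANDS])
{
    char *p, *colon;
    int n = 0;

    assert(line != NULL && label != NULL && mnemonic != NULL);
    p = skip_blanks(line);
    *label = NULL;
    colon = strchr(p, ':');
    if (colon != NULL) {               /* step 1: the local label */
        *colon = '\0';
        *label = p;
        p = skip_blanks(colon + 1);
    }
    *mnemonic = p;                     /* step 2: the mnemonic */
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;
    if (*p != '\0')
        *p++ = '\0';
    p = skip_blanks(p);
    while (*p != '\0' && n < MAX_OPERANDS) {   /* step 3: the operands */
        operands[n++] = p;
        p = strchr(p, ',');
        if (p == NULL)
            break;
        *p = '\0';
        p = skip_blanks(p + 1);
    }
    return n;
}
```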
 | 
						|
.LP
 | 
						|
If the default assemble() does not work the way the table writer wants, he
 | 
						|
can supply his own version of it. Assemble() has the following arguments:
 | 
						|
.DS
 | 
						|
.ft CW
 | 
						|
assemble( instruction )
 | 
						|
    char *instruction;
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
\fIinstruction\fR points to a null-terminated string.
 | 
						|
.PP
 | 
						|
The default function block_assemble() is called with a sequence of assembly
 | 
						|
instructions that belong to one action list. It calls assemble() for 
 | 
						|
every assembly instruction in
 | 
						|
this block. But if a special action is
 | 
						|
required on a block of assembly instructions, the table writer only has to
 | 
						|
rewrite this function to get a new \fBceg\fR that meets his needs.
 | 
						|
The function block_assemble has the following arguments:
 | 
						|
.DS
 | 
						|
.ft CW
 | 
						|
block_assemble( instruction, nr, first, last)
 | 
						|
      char   **instruction;
 | 
						|
      int      nr, first, last;
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
\fIInstruction\fR points to an array of pointers to strings representing
 | 
						|
assembly instructions. \fINr\fR is
 | 
						|
the number of instructions that must be assembled. \fIFirst\fR 
 | 
						|
and \fIlast\fR have no function in the default block_assemble(), but are 
 | 
						|
useful when optimizations are done in block_assemble().
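.PP
The default behaviour described above amounts to a simple loop. The sketch
below shows it in ANSI C with a stand-in for assemble() that merely counts its
calls; the counter and the stand-in are assumptions of this example and not
part of \fBceg\fR.

```c
#include <assert.h>
#include <stddef.h>

static int assembled = 0;          /* instructions handled so far */

/* Stand-in for assemble(): it only counts the instructions it is given. */
static void assemble(char *instruction)
{
    assert(instruction != NULL);
    assembled++;
}

/*
 * Default behaviour as described in the text: call assemble() once per
 * instruction. first and last are accepted but not used.
 */
void block_assemble(char **instruction, int nr, int first, int last)
{
    int i;

    (void)first;
    (void)last;
    assert(instruction != NULL || nr == 0);
    for (i = 0; i < nr; i++)
        assemble(instruction[i]);
}
```

A table writer who wants to optimize over a whole block would replace the loop
body, for instance by inspecting neighbouring instructions before calling
assemble().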
 | 
						|
.PP
 | 
						|
Four things have to be specified in ``as.h'' and ``as.c''. First the user must
 | 
						|
give the declaration of struct \fIt_operand\fR in ``as.h'', and the functions
 | 
						|
process_operand(), process_mnemonic(), and process_label() must be given 
 | 
						|
in ``as.c''. If the right hand side of the as_table
 | 
						|
contains function calls other than the \fBback\fR-primitives, these functions
 | 
						|
must also be present in ``as.c''. Note that both the ``@''-sign (see 4.2.3) 
 | 
						|
and ``references'' (see 4.2.4) also work in the functions defined in ``as.c''. 
 | 
						|
.PP
 | 
						|
The following example shows the representative and essential parts of the 
 | 
						|
8086 ``as.h'' and ``as.c'' files. 
 | 
						|
.nr PS 10
 | 
						|
.nr VS 12
 | 
						|
.LP
 | 
						|
.DS L
 | 
						|
.ft CW
 | 
						|
/* Constants and type definitions in as.h */
 | 
						|
 | 
						|
#define        UNKNOWN                0
 | 
						|
#define        IS_REG                 0x1
 | 
						|
#define        IS_ACCU                0x2
 | 
						|
#define        IS_DATA                0x4
 | 
						|
#define        IS_LABEL               0x8
 | 
						|
#define        IS_MEM                 0x10
 | 
						|
#define        IS_ADDR                0x20
 | 
						|
#define        IS_ILB                 0x40
 | 
						|
 | 
						|
#define AX                0
 | 
						|
#define BX                3
 | 
						|
#define CL                1
 | 
						|
#define SP                4
 | 
						|
#define BP                5
 | 
						|
#define SI                6
 | 
						|
#define DI                7
 | 
						|
 | 
						|
#define REG( op)         ( op->type & IS_REG)
 | 
						|
#define ACCU( op)        ( op->type & IS_REG  &&  op->reg == AX)
 | 
						|
#define REG_CL( op)      ( op->type & IS_REG  &&  op->reg == CL)
 | 
						|
#define DATA( op)        ( op->type & IS_DATA)
 | 
						|
#define LABEL( op)       ( op->type & IS_LABEL)
 | 
						|
#define ILB( op)         ( op->type & IS_ILB)
 | 
						|
#define MEM( op)         ( op->type & IS_MEM)
 | 
						|
#define ADDR( op)        ( op->type & IS_ADDR)
 | 
						|
#define EADDR( op)       ( op->type & ( IS_ADDR | IS_MEM | IS_REG))
 | 
						|
#define CONST1( op)      ( op->type & IS_DATA  && strcmp( "1", op->expr) == 0)
 | 
						|
#define MOVS( op)        ( op->type & IS_LABEL  &&  strcmp( "\e"movs\e"", op->lab) == 0)
 | 
						|
#define IMMEDIATE( op)   ( op->type & ( IS_DATA | IS_LABEL))
 | 
						|
 | 
						|
struct t_operand {
 | 
						|
        unsigned type;
 | 
						|
        int reg;
 | 
						|
        char *expr, *lab, *off;
 | 
						|
       };
 | 
						|
 | 
						|
extern struct t_operand saved_op, *AX_oper;
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
.nr PS 12
 | 
						|
.nr VS 14
 | 
						|
.LP
 | 
						|
.nr PS 10
 | 
						|
.nr VS 12
 | 
						|
.DS L
 | 
						|
.ft CW
 | 
						|
 | 
						|
/* Some functions in as.c. */
 | 
						|
 | 
						|
#include "arg_type.h"
 | 
						|
#include "as.h"
 | 
						|
 | 
						|
#define last( s)     ( s + strlen( s) - 1)
 | 
						|
#define LEFT         '('
 | 
						|
#define RIGHT        ')'
 | 
						|
#define DOLLAR       '$'
 | 
						|
 | 
						|
process_operand( str, op)
 | 
						|
char *str;
 | 
						|
struct t_operand *op;
 | 
						|
 | 
						|
/*        expr            ->        IS_DATA and IS_LABEL
 *        reg             ->        IS_REG and IS_ACCU
 | 
						|
 *        (expr)          ->        IS_ADDR
 | 
						|
 *        expr(reg)       ->        IS_MEM
 | 
						|
 */
 | 
						|
{
 | 
						|
        char *ptr, *index();
 | 
						|
 | 
						|
        op->type = UNKNOWN;
 | 
						|
        if ( *last( str) == RIGHT) {
 | 
						|
                ptr = index( str, LEFT);
 | 
						|
                *last( str) = '\e0';
                *ptr = '\e0';
                if ( is_reg( ptr+1, op)) {
                        op->type = IS_MEM;
                        op->expr = ( *str == '\e0' ? "0" : str);
 | 
						|
                }
 | 
						|
                else {
 | 
						|
                        set_label( ptr+1, op);
 | 
						|
                        op->type = IS_ADDR;
 | 
						|
                }
 | 
						|
        }
 | 
						|
        else
 | 
						|
                if ( is_reg( str, op))
 | 
						|
                        op->type = IS_REG;
 | 
						|
                else {
 | 
						|
                        if ( contains_label( str))
 | 
						|
                                set_label( str, op);
 | 
						|
                        else {
 | 
						|
                                op->type = IS_DATA;
 | 
						|
                                op->expr = str;
 | 
						|
                        }
 | 
						|
                }
 | 
						|
}
 | 
						|
 | 
						|
/*********************************************************************/
 | 
						|
 | 
						|
mod_RM( reg, op)
 | 
						|
int reg;
 | 
						|
struct t_operand *op;
 | 
						|
 | 
						|
/* This function helps to encode operands in machine format.
 | 
						|
 * Note the $-operators
 | 
						|
 */
 | 
						|
{
 | 
						|
      if ( REG( op))
 | 
						|
              R233( 0x3, reg, op->reg);
 | 
						|
      else if ( ADDR( op)) {
 | 
						|
              R233( 0x0, reg, 0x6);
 | 
						|
              @reloc2( %$(op->lab), %$(op->off), ABSOLUTE);
 | 
						|
      }
 | 
						|
      else if ( strcmp( op->expr, "0") == 0)
 | 
						|
              switch( op->reg) {
 | 
						|
                case SI : R233( 0x0, reg, 0x4);
 | 
						|
                          break;
 | 
						|
 | 
						|
                case DI : R233( 0x0, reg, 0x5);
 | 
						|
                          break;
 | 
						|
 | 
						|
                case BP : R233( 0x1, reg, 0x6);        /* exception! */
 | 
						|
                          @text1( 0);
 | 
						|
                          break;
 | 
						|
 | 
						|
                case BX : R233( 0x0, reg, 0x7);
 | 
						|
                          break;
 | 
						|
 | 
						|
                default : fprint( STDERR, "Wrong index register %d\en",
 | 
						|
                                  op->reg);
 | 
						|
              }
 | 
						|
      else {
 | 
						|
              @if ( fit_byte( %$(op->expr)))
 | 
						|
                      switch( op->reg) {
 | 
						|
                        case SI : R233( 0x1, reg, 0x4);
 | 
						|
                                  break;
 | 
						|
      
 | 
						|
                        case DI : R233( 0x1, reg, 0x5);
 | 
						|
                                  break;
 | 
						|
      
 | 
						|
                        case BP : R233( 0x1, reg, 0x6);
 | 
						|
                                  break;
 | 
						|
      
 | 
						|
                        case BX : R233( 0x1, reg, 0x7);
 | 
						|
                                  break;
 | 
						|
      
 | 
						|
                        default : fprint( STDERR, "Wrong index register %d\en",
 | 
						|
                                          op->reg);
 | 
						|
                      }
 | 
						|
                      @text1( %$(op->expr));
 | 
						|
              @else
 | 
						|
                      switch( op->reg) {
 | 
						|
                        case SI : R233( 0x2, reg, 0x4);
 | 
						|
                                  break;
 | 
						|
      
 | 
						|
                        case DI : R233( 0x2, reg, 0x5);
 | 
						|
                                  break;
 | 
						|
      
 | 
						|
                        case BP : R233( 0x2, reg, 0x6);
 | 
						|
                                  break;
 | 
						|
      
 | 
						|
                        case BX : R233( 0x2, reg, 0x7);
 | 
						|
                                  break;
 | 
						|
      
 | 
						|
                        default : fprint( STDERR, "Wrong index register %d\en",
 | 
						|
                                          op->reg);
 | 
						|
                      }
 | 
						|
                      @text2( %$(op->expr));
 | 
						|
              @fi
 | 
						|
      }
 | 
						|
}
 | 
						|
\fR
 | 
						|
.DE
 | 
						|
.nr PS 12
 | 
						|
.nr VS 14
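.PP
The function R233() used in mod_RM() is not shown in the example. Judging from
its use, it presumably packs a two-bit field and two three-bit fields into one
byte and emits that byte; this is an assumption. The sketch below shows only
the packing step, under the hypothetical name pack233().

```c
#include <assert.h>

/*
 * Pack a two-bit field and two three-bit fields into one byte, in the
 * layout of the 8086 mod-reg-r/m byte: bits 7-6, bits 5-3, bits 2-0.
 */
unsigned char pack233(unsigned mod, unsigned reg, unsigned rm)
{
    assert(mod <= 0x3 && reg <= 0x7 && rm <= 0x7);
    return (unsigned char)((mod << 6) | (reg << 3) | rm);
}
```

With this layout a call such as R233( 0x3, reg, op->reg) would describe a
register-to-register operand.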
 | 
						|
.NH 2
 | 
						|
Generating assembly code
 | 
						|
.PP
 | 
						|
It is possible to generate assembly instead of object files (see section 5), in
 | 
						|
which case there is no need to supply ``as_table'', ``as.h'', and ``as.c''. 
 | 
						|
This option is useful for debugging the EM_table.
 | 
						|
.NH 1
 | 
						|
Building a code expander
 | 
						|
.PP
 | 
						|
This section describes how to generate a code expander in two phases.
 | 
						|
In phase one, the EM_table is
 | 
						|
written and assembly code is generated. If the assembly code is an actual
 | 
						|
language, the EM_table can be tested by assembling and running the generated 
 | 
						|
code. 
 | 
						|
If an ad-hoc assembly language is used by the table writer, it is not possible
 | 
						|
to test the EM_table, but the code generated is at least in readable form.
 | 
						|
In the second phase, the as_table is written and object code is generated.
 | 
						|
After the generated object code is fed into the loader, it can be tested.
 | 
						|
.NH 2
 | 
						|
Phase one
 | 
						|
.PP
 | 
						|
The following is a list of instructions to make a
 | 
						|
code expander that generates assembly instructions.
 | 
						|
.IP \0\01:
 | 
						|
Create a new directory.
 | 
						|
.IP \0\02:
 | 
						|
Create the ``EM_table'', ``mach.h'', and ``mach.c'' files; there is no need 
 | 
						|
for ``as_table'', ``as.h'', and ``as.c'' at this moment.
 | 
						|
.IP \0\03:
 | 
						|
type
 | 
						|
.br
 | 
						|
.ft CW
 | 
						|
install_ceg -as
 | 
						|
\fR
 | 
						|
.br
 | 
						|
install_ceg will create a Makefile and three directories: ceg, ce, and back.
 | 
						|
Ceg will contain the program ceg; this program will be
 | 
						|
used to turn ``EM_table'' into a set of C source files (in the ce directory),
 | 
						|
one for each
 | 
						|
EM-instruction. All these files will be compiled and put in a library called
 | 
						|
\fBce.a\fR.
 | 
						|
.br
 | 
						|
The option 
 | 
						|
.ft CW
 | 
						|
-as\fR means that a \fBback\fR-library will be 
 | 
						|
generated (in the directory ``back'') that
 | 
						|
supports the generation of assembly language. The library is named ``back.a''.
 | 
						|
.IP \0\04:
 | 
						|
Link a front end, ``ce.a'', and ``back.a'' together resulting in a compiler
 | 
						|
that generates assembly code.
 | 
						|
.LP
 | 
						|
If the table writer has chosen an actual assembly language, the EM_table can be
 | 
						|
tested (e.g., by running the compiler on the EM test set). If an error occurs,
 | 
						|
change the EM_table and type
 | 
						|
.IP
 | 
						|
.br
 | 
						|
.ft CW
 | 
						|
update_ceg\fR \fBC_instr
 | 
						|
\fR
 | 
						|
.br
 | 
						|
.LP
 | 
						|
where \fBC_instr\fR stands for the name of the erroneous EM-instruction.
 | 
						|
If the table writer has chosen an ad-hoc assembly language, he can at least
 | 
						|
read the generated code and look for possible errors. If an error is found,
 | 
						|
the same procedure as described above can be followed.
 | 
						|
.NH 2
 | 
						|
Phase two
 | 
						|
.PP
 | 
						|
The next phase is to generate a \fBce\fR that produces relocatable object
 | 
						|
code.
 | 
						|
.IP \0\01:
 | 
						|
Remove the ``ce'', ``ceg'', and ``back'' directories.
 | 
						|
.IP \0\02:
 | 
						|
Write the ``as_table'', ``as.h'', and ``as.c'' files.
 | 
						|
.IP \0\03:
 | 
						|
type
 | 
						|
.sp
 | 
						|
.ft CW
 | 
						|
install_ceg -obj \fR
 | 
						|
.sp
 | 
						|
The option 
 | 
						|
.ft CW
 | 
						|
-obj\fR means that ``back.a'' will contain a library 
 | 
						|
for generating
 | 
						|
ACK.OUT(5ACK) object files, see appendix B. 
 | 
						|
If the writer does not want to use the default ``back.a'',
 | 
						|
the 
 | 
						|
.ft CW
 | 
						|
-obj\fR flag must be omitted and a ``back.a'' should be supplied that
generates object code in the desired format.
 | 
						|
.IP \0\04:
 | 
						|
Link a front end, ``ce.a'', and ``back.a'' together resulting in a compiler
 | 
						|
that generates object code.
 | 
						|
.LP
 | 
						|
The as_table is ready to be tested. If an error occurs, adapt the table.
 | 
						|
Then there are two ways to proceed: 
 | 
						|
.IP \0\01:
 | 
						|
recompile the whole EM_table,
 | 
						|
.sp
 | 
						|
.ft CW
 | 
						|
update_ceg ALL \fR
 | 
						|
.sp
 | 
						|
.IP \0\02:
 | 
						|
recompile just the few EM-instructions that contained the error,
 | 
						|
.sp
 | 
						|
.ft CW
 | 
						|
update_ceg \fBC_instr\fR
 | 
						|
.sp
 | 
						|
where \fBC_instr\fR is an erroneous EM-instruction.
 | 
						|
This has to be done for every EM-instruction that contained the erroneous
 | 
						|
assembly instruction.
 | 
						|
.NH
 | 
						|
Acknowledgements
 | 
						|
.PP
 | 
						|
We want to thank Henri Bal, Dick Grune, and Ceriel Jacobs for their 
 | 
						|
valuable suggestions and the critical reading of this paper.
 | 
						|
.NH
 | 
						|
References
 | 
						|
.LP
 | 
						|
.[
 | 
						|
$LIST$
 | 
						|
.]
 | 
						|
.bp
 | 
						|
.SH 
 | 
						|
Appendix A, \fRthe \fBback\fR-primitives
 | 
						|
.PP
 | 
						|
This appendix describes the routines available to generate relocatable
 | 
						|
object code. If the default back.a is used, the object code is in 
 | 
						|
ACK.OUT(5ACK) format.
 | 
						|
In the default back.a, the names defined here are remapped to more hidden names,
to avoid name conflicts with, for instance, names used in the front-end. This
 | 
						|
remapping is done in an include-file, "back.h". If you implement your own
 | 
						|
back.a library, you are advised to do the same thing. You need some parts of
 | 
						|
the default "back.h" anyway.
 | 
						|
.nr PS 10
 | 
						|
.nr VS 12
 | 
						|
.PP
 | 
						|
.IP A1.
 | 
						|
Text and data generation; with ONE_BYTE b; TWO_BYTES w; FOUR_BYTES l; arith n;
 | 
						|
.VS +4
 | 
						|
.TS
 | 
						|
tab(#);
 | 
						|
l c lw(10c).
 | 
						|
text1( b)#:#T{
 | 
						|
Put one byte in text-segment.
 | 
						|
T}
 | 
						|
text2( w)#:#T{
 | 
						|
Put word (two bytes) in text-segment, byte-order is defined by
 | 
						|
BYTES_REVERSED in mach.h.
 | 
						|
T}
 | 
						|
text4( l)#:#T{
 | 
						|
Put long (two words) in text-segment, word-order is defined by
 | 
						|
WORDS_REVERSED in mach.h.
 | 
						|
T}
 | 
						|
#
 | 
						|
con1( b)#:#T{
 | 
						|
Same for CON-segment.
 | 
						|
T}
 | 
						|
con2( w)#:
 | 
						|
con4( l)#:
 | 
						|
#
 | 
						|
rom1( b)#:#T{
 | 
						|
Same for ROM-segment.
 | 
						|
T}
 | 
						|
rom2( w)#:
 | 
						|
rom4( l)#:
 | 
						|
#
 | 
						|
gen1( b)#:#T{
 | 
						|
Same for the current segment, only to be used in the ``..icon'', ``..ucon'', etc.
 | 
						|
pseudo EM-instructions.
 | 
						|
T}
 | 
						|
gen2( w)#:
 | 
						|
gen4( l)#:
 | 
						|
#
 | 
						|
bss( n)#:#T{
 | 
						|
Put n bytes in bss-segment, value is BSS_INIT.
 | 
						|
T}
 | 
						|
common( n)#:#T{
 | 
						|
If there is a saved label, generate a "common" for it, of size
 | 
						|
n. Otherwise, it is equivalent to bss(n).
 | 
						|
(see also the save_label routine).
 | 
						|
T}
 | 
						|
.TE
 | 
						|
.VS -4
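.LP
The byte- and word-order rules for text2 and text4 can be sketched as follows.
This is a minimal mock, not the code of the default back.a: the buffer
text_area, the counter text_cnt and the unsigned stand-in types are
hypothetical, and the sketch assumes that a defined BYTES_REVERSED or
WORDS_REVERSED means that the high-order part is emitted first.

```c
#include <stddef.h>

/* Assumed for this sketch: both macros set, i.e. high-order part first. */
#define BYTES_REVERSED
#define WORDS_REVERSED

/* Stand-in types; the real definitions live in the back library headers. */
typedef unsigned char  ONE_BYTE;
typedef unsigned short TWO_BYTES;
typedef unsigned long  FOUR_BYTES;

static unsigned char text_area[64];   /* mock text segment */
static size_t text_cnt;               /* bytes emitted so far */

void text1(ONE_BYTE b)
{
    if (text_cnt < sizeof text_area)  /* the mock silently drops overflow */
        text_area[text_cnt++] = b;
}

void text2(TWO_BYTES w)
{
#ifdef BYTES_REVERSED
    text1((ONE_BYTE) ((w >> 8) & 0xFF));
    text1((ONE_BYTE) (w & 0xFF));
#else
    text1((ONE_BYTE) (w & 0xFF));
    text1((ONE_BYTE) ((w >> 8) & 0xFF));
#endif
}

void text4(FOUR_BYTES l)
{
#ifdef WORDS_REVERSED
    text2((TWO_BYTES) ((l >> 16) & 0xFFFF));
    text2((TWO_BYTES) (l & 0xFFFF));
#else
    text2((TWO_BYTES) (l & 0xFFFF));
    text2((TWO_BYTES) ((l >> 16) & 0xFFFF));
#endif
}
```

.LP
With both macros defined as in this sketch, text4(0x12345678) appends the
bytes 12 34 56 78 (hexadecimal) to the mock segment. The con, rom and gen
families would follow the same pattern on their own segments.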
 | 
						|
.IP A2.
 | 
						|
Relocation; with char *s; arith o; int r;
 | 
						|
.VS +4
 | 
						|
.TS
 | 
						|
tab(#);
 | 
						|
l c lw(10c).
 | 
						|
reloc1( s, o, r)#:#T{
 | 
						|
Generates relocation-information for 1 byte in the current segment.
 | 
						|
T}
 | 
						|
##s\0:\0the string which must be relocated
 | 
						|
##o\0:\0the offset in bytes from the string. 
 | 
						|
##T{
 | 
						|
r\0:\0relocation type. It can have the values ABSOLUTE or PC_REL. These
 | 
						|
two constants are defined in the file ``back.h''.
 | 
						|
T}
 | 
						|
reloc2( s, o, r)#:#T{
 | 
						|
Generates relocation-information for 1 word in the
 | 
						|
current segment. Byte-order according to BYTES_REVERSED in mach.h.
 | 
						|
T}
 | 
						|
reloc4( s, o, r)#:#T{
 | 
						|
Generates relocation-information for 1 long in the
 | 
						|
current segment. Word-order according to WORDS_REVERSED in mach.h.
 | 
						|
T}
 | 
						|
.TE
 | 
						|
.VS -4
 | 
						|
.IP A3.
 | 
						|
Symbol table interaction; with int seg; char *s;
 | 
						|
.VS +4
 | 
						|
.TS
 | 
						|
tab(#);
 | 
						|
l c lw(10c).
 | 
						|
switch_segment( seg)#:#T{
 | 
						|
Sets the current segment to ``seg'', and does alignment if necessary. ``seg'' 
 | 
						|
can be one of the four constants defined in ``back.h'': SEGTXT, SEGROM,
 | 
						|
SEGCON, SEGBSS.
 | 
						|
T}
 | 
						|
#
 | 
						|
symbol_definition( s)#:#T{
 | 
						|
Define s in symbol-table.
 | 
						|
T}
 | 
						|
set_local_visible( s)#:#T{
 | 
						|
Record in the symbol table that s has local scope.
 | 
						|
T}
 | 
						|
set_global_visible( s)#:#T{
 | 
						|
Record in the symbol table that s has global scope.
 | 
						|
T}
 | 
						|
.TE
 | 
						|
.VS -4
 | 
						|
.IP A4.
 | 
						|
Start/end actions; with char *f;
 | 
						|
.VS +4
 | 
						|
.TS
 | 
						|
tab(#);
 | 
						|
l c lw(10c).
 | 
						|
open_back( f)#:#T{
 | 
						|
Directs output to file ``f''; if f is the null pointer, output must be given on
 | 
						|
standard output.
 | 
						|
T}
 | 
						|
close_back()#:#T{
 | 
						|
Closes the output stream.
 | 
						|
T}
 | 
						|
init_back()#:#T{
 | 
						|
Only used with a user-written back-library; gives it the opportunity to initialize.
 | 
						|
T}
 | 
						|
end_back()#:#T{
 | 
						|
Only used with user-written back-library.
 | 
						|
T}
 | 
						|
.TE
 | 
						|
.VS -4
 | 
						|
.IP A5.
 | 
						|
Label generation routines; with int n; arith g; char *l; These routines all
 | 
						|
return a "char *" to a static area, which is overwritten at each call.
 | 
						|
.VS +4
 | 
						|
.TS
 | 
						|
tab(#);
 | 
						|
l c lw(10c).
 | 
						|
extnd_pro( n)#:#T{
 | 
						|
Label set at the end of procedure \fIn\fP, to generate space for locals.
 | 
						|
T}
 | 
						|
extnd_start( n)#:#T{
 | 
						|
Label set at the beginning of procedure \fIn\fP, to jump back to after generating
 | 
						|
space for locals.
 | 
						|
T}
 | 
						|
extnd_name( l)#:#T{
 | 
						|
Create a name for a procedure named \fIl\fP.
 | 
						|
T}
 | 
						|
extnd_dnam( l)#:#T{
 | 
						|
Create a name for an external variable named \fIl\fP.
 | 
						|
T}
 | 
						|
extnd_dlb( g)#:#T{
 | 
						|
Create a name for numeric data label \fIg\fP.
 | 
						|
T}
 | 
						|
extnd_ilb( l, n)#:#T{
 | 
						|
Create a name for instruction label \fIl\fP in procedure \fIn\fP.
 | 
						|
T}
 | 
						|
extnd_hol( n)#:#T{
 | 
						|
Create a name for HOL block number \fIn\fP.
 | 
						|
T}
 | 
						|
extnd_part( n)#:#T{
 | 
						|
Create a unique label for the C_insertpart mechanism.
 | 
						|
T}
 | 
						|
extnd_cont( n)#:#T{
 | 
						|
Create another unique label for the C_insertpart mechanism.
 | 
						|
T}
 | 
						|
extnd_main( n)#:#T{
 | 
						|
Create yet another unique label for the C_insertpart mechanism.
 | 
						|
T}
 | 
						|
.TE
 | 
						|
.VS -4
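.LP
Because the label routines return a pointer to a static area, a result that
must survive the next call has to be copied first. The sketch below
illustrates this with a hypothetical stand-in for extnd_name; the naming
scheme (a leading underscore), the buffer size and the helper keep_label are
assumptions made for this example, not part of the back library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: returns a pointer to one shared static buffer,
 * as the label routines are documented to do. */
char *extnd_name(const char *l)
{
    static char buf[64];

    snprintf(buf, sizeof buf, "_%s", l);
    return buf;
}

/* Hypothetical helper: copy a label into storage owned by the caller. */
char *keep_label(char *dst, size_t size, const char *label)
{
    if (size == 0)
        return dst;
    strncpy(dst, label, size - 1);
    dst[size - 1] = 0;          /* strncpy may leave dst unterminated */
    return dst;
}
```

.LP
A caller that needs two names at once would first store one of them with
keep_label, since the second call to extnd_name overwrites the area that the
first result points to.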
 | 
						|
.IP A6.
 | 
						|
Miscellaneous routines; with char *l;
 | 
						|
.VS +4
 | 
						|
.TS
 | 
						|
tab(#);
 | 
						|
l c lw(10c).
 | 
						|
save_label( l)#:#T{
 | 
						|
Save label \fIl\fP. Unfortunately, in EM when you see a label, you don't
 | 
						|
know yet in which segment it will end up. The save_label/dump_label mechanism
 | 
						|
is there to solve this problem.
 | 
						|
T}
 | 
						|
dump_label()#:#T{
 | 
						|
If there is a label saved, force definition for it now.
 | 
						|
T}
 | 
						|
align_word()#:#T{
 | 
						|
Align to a word boundary, if the current segment is not a text segment.
 | 
						|
T}
 | 
						|
.TE
 | 
						|
.VS -4
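.LP
The interplay of save_label, dump_label, bss and common described above can be
sketched as a small mock. It only records what would be generated in a trace
string; the trace format, the buffers log_buf and saved, and the use of long
in place of arith are assumptions of this sketch, not the behaviour of the
default back.a.

```c
#include <stdio.h>
#include <string.h>

static char saved[64];      /* the pending label, valid if have_saved */
static int  have_saved;
static char log_buf[256];   /* trace of what the mock "generated" */

/* Append one "what name n;" entry to the trace. */
static void emit(const char *what, const char *name, long n)
{
    size_t used = strlen(log_buf);

    snprintf(log_buf + used, sizeof log_buf - used, "%s %s %ld;",
             what, name, n);
}

void save_label(const char *l)
{
    snprintf(saved, sizeof saved, "%s", l);
    have_saved = 1;
}

/* Force the definition of a pending label in the current segment. */
void dump_label(void)
{
    if (have_saved) {
        emit("define", saved, 0);
        have_saved = 0;
    }
}

void bss(long n)
{
    emit("bss", "-", n);
}

/* A pending label becomes a common of size n; otherwise act as bss(n). */
void common(long n)
{
    if (have_saved) {
        emit("common", saved, n);
        have_saved = 0;
    } else
        bss(n);
}
```

.LP
In this mock, save_label("x") followed by common(8) records a common for x,
while a second common(4) without a saved label falls back to bss(4).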
 | 
						|
.nr PS 12
 | 
						|
.nr VS 14
 | 
						|
.bp
 | 
						|
.SH 
 | 
						|
Appendix B, description of ACK-a.out library
 | 
						|
.PP 
 | 
						|
The object file produced by \fBce\fR is by default in ACK.OUT(5ACK)
 | 
						|
format. The object file is made up of one header, followed by
 | 
						|
four segment headers, followed by text, data, relocation information, 
 | 
						|
symbol table, and the string area. The object file is tuned for the ACK-LED,
 | 
						|
so there are some special things done just before the object file is dumped.
 | 
						|
First, four relocation records are added which contain the names of the four
 | 
						|
segments. Second, all the local relocation is resolved. This is done by the 
 | 
						|
function do_relo(). If a relocation record belongs to a local
name, the address is relocated in the segment to which the record belongs.
 | 
						|
Besides doing the local relocation, do_relo() changes the ``nami''-field
 | 
						|
of the local relocation records. This field receives the index of one of the
 | 
						|
four
 | 
						|
relocation records belonging to a segment. After the local
 | 
						|
relocation has been resolved the routine output_back() dumps the 
 | 
						|
ACK object file.
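.LP
The local relocation step can be pictured with the following sketch. It is
not the actual do_relo(): the structures, the two-byte low-byte-first field
and the array seg_index (the index of the record naming each segment) are
hypothetical simplifications; only the ``nami'' field name is taken from the
description above.

```c
#include <stddef.h>

enum { SEGTXT, SEGROM, SEGCON, SEGBSS, NSEG };

/* Hypothetical name record: defined locally in segment seg at offset value. */
struct name { int local; int seg; long value; };

/* Hypothetical relocation record: the field at segs[seg] + addr refers to
 * names[nami]. */
struct relo { int seg; size_t addr; int nami; };

static long get2(const unsigned char *p)
{
    return (long) p[0] | ((long) p[1] << 8);
}

static void put2(unsigned char *p, long v)
{
    p[0] = (unsigned char) (v & 0xFF);
    p[1] = (unsigned char) ((v >> 8) & 0xFF);
}

/* Resolve every record that refers to a local name: add the value of the
 * name to the field and let nami point to the record of its segment.
 * Records for non-local names are left untouched for the linker. */
void do_relo_sketch(struct relo *r, size_t nrelo, const struct name *names,
                    unsigned char *segs[NSEG], const int seg_index[NSEG])
{
    size_t i;

    for (i = 0; i < nrelo; i++) {
        const struct name *n = &names[r[i].nami];
        unsigned char *field;

        if (!n->local)
            continue;
        field = segs[r[i].seg] + r[i].addr;
        put2(field, get2(field) + n->value);
        r[i].nami = seg_index[n->seg];
    }
}
```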
 | 
						|
.LP
 | 
						|
If a different a.out format is wanted, one can choose between three strategies:
 | 
						|
.IP \ \1:
 | 
						|
The simplest one is to use a conversion program, which converts the ACK
 | 
						|
a.out format to the wanted a.out format. Such a program exists for almost
 | 
						|
all machines on which ACK runs. However,
 | 
						|
not all conversion programs can generate relocation information.
 | 
						|
The disadvantage is that the compiler will become slower.
 | 
						|
.IP \ \2: 
 | 
						|
A better solution is to change the functions output_back(), do_relo(),
 | 
						|
open_back(), and close_back() in such a way
 | 
						|
that they produce the wanted a.out format. This strategy saves a lot of I/O.
 | 
						|
.IP \ \3:
 | 
						|
If you are still not satisfied and have a lot of spare time, adapt the
 | 
						|
\fBback\fR-primitives to produce the wanted a.out format.
 |