.nr PS 12
.nr VS 14
.nr LL 6i
.TL
Code expander generator
.AU
Frans Kaashoek
Koen Langendoen
.AI
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands
.NH
Introduction
.PP
A \fBcode expander\fR (\fBce\fR for short) is a part of the
Amsterdam Compiler Kit (\fBACK\fR), which provides the user with
high-speed generation of medium-quality code. Although conceptually
equivalent to the more usual \fBcode generator\fR, it differs in some
aspects.
.LP
Normally, a program to be compiled with \fBACK\fR
is first fed into the preprocessor. The output of the preprocessor goes
into the appropriate front end, whose job it is to produce EM (a
machine-independent, low-level intermediate code). The generated EM code is fed
into the peephole optimizer, which scans it with a window of a few instructions,
replacing certain inefficient code sequences by better ones. After the
peephole optimizer a back end follows, which produces high-quality assembly code.
The assembly code goes via the target optimizer into the assembler, and the
object code then goes into the
linker/loader, the final component in the pipeline.
.LP
For various applications
this scheme is too slow. When debugging programs, for example, the program
has to be compiled fast, while a slower run time of the program is acceptable.
For this purpose a new scheme is introduced:
.IP \ \ 1:
The code generator and assembler have
been replaced by one program: the \fBcode expander\fR, which directly expands
the EM-instructions into a relocatable object file.
The peephole and target optimizers are not used.
.IP \ \ 2:
The front end and \fBce\fR have been combined into a single
program, eliminating the overhead of intermediate files.
.LP
This results in a fast compiler producing object files, ready to be
linked and loaded, at the cost of unoptimized object code.
.LP
An extra speedup is gained by the way the code expander works. Instead of
trying to generate code for a sequence of EM-instructions, like the usual
code generator, it expands each EM-instruction separately.
.LP
Because of the
simple nature of the code expander, it is much easier to build, to debug and to
test. Experience has demonstrated that a code expander can be constructed,
debugged and tested in less than two weeks.
.LP
This document describes the tools for automatically generating a
\fBce\fR (a library of C files) from two tables and
a few machine-dependent functions.
To understand this document and the examples, a
thorough knowledge of EM is necessary.
.NH
An overview
.PP
A code expander consists of a set of routines that convert EM-instructions
directly to relocatable object code. These routines are called by a front
end through the
EM_CODE(3L) interface. To free the table writer of the burden of building
an object file, we supply a set of routines that build an object file
in the NEW_A.OUT(5L) format (see appendix B). This set of routines is called
the
\fBback\fR-primitives (see appendix A).
.PP
To avoid repetition of the same sequences of
\fBback\fR-primitives in different
EM-instructions
and to improve readability, the EM to object information must be supplied in
two
tables: one that maps EM to an assembly language, the EM_table, and one
that maps
assembler to \fBback\fR-primitives, the as_table. The assembler may be an
actual assembler or one designed ad hoc by the table writer.
.LP
The following picture shows the dependencies between the different components:
.sp
.PS
linewid = 0.5i
A: line down 2i
B: line down 2i with .start at A.start + (1.5i, 0)
C: line down 2i with .start at B.start + (1.5i, 0)
D: arrow right with .start at A.center - (0.25i, 0)
E: arrow right with .start at B.center - (0.25i, 0)
F: arrow right with .start at C.center - (0.25i, 0)
"EM_CODE(3L)" at A.start above
"EM_table" at B.start above
"as_table" at C.start above
"source language " at D.start rjust
"EM" at 0.5 of the way between D.end and E.start
G: "assembler" at 0.5 of the way between E.end and F.start
H: " back primitives" at F.end ljust
"(user defined)" at G - (0, 0.2i)
" (NEW_A.OUT)" at H - (0, 0.2i) ljust
.PE
.PP
The entries in the as_table map assembly instructions onto \fBback\fR-primitives.
The as_table is used to transform the EM to assembly mapping into an EM to
\fBback\fR-primitives mapping;
the expanded EM_table is then transformed into a set of C-routines, which are
normally incorporated in a compiler. All this happens at compiler
generation time. The C-routines are activated during the
execution of the compiler.
.PP
To illustrate what happens, we give an example. The example is an entry in
the tables for the VAX machine. The assembly language chosen is a subset of the
VAX assembly language.
.PP
One of the most fundamental operations in EM is 'loc c': load the value of c
on the stack. To expand this instruction the
tables contain the following information:
.DS
\f5
EM_table : C_loc ==> "pushl $$$1".
/* $1 refers to the first argument of C_loc. */


as_table : pushl src : CONST ==>
        @text1( 0xd0);
        @text1( 0xef);
        @text4( %$( src->num)).
\fR
.DE
.LP
The following routine will be generated for C_loc:
.DS
\f5
C_loc( c)
arith c;
{
        swtxt();
        text1( 0xd0);   /* text1(), text4() are library routines, */
        text1( 0xef);   /* which fill the text segment */
        text4( c);
}
\fR
.DE
.LP
A call by the compiler to 'C_loc' will cause the 1-byte numbers '0xd0'
and '0xef'
and the 4-byte value of the variable 'c' to be stored in the text segment.
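.LP
To make the effect of this generated routine concrete, here is a small self-contained C sketch of it. It is not the \fBce\fR library. The array text_seg, the counter text_pos and the bodies of swtxt(), text1() and text4() are hypothetical stand-ins for the real \fBback\fR-primitives of appendix A, and arith is assumed to be a long. This text4() writes the least significant byte first, which is the default byte order when BYTES_REVERSED is not defined.
.DS
\f5
```c
#include <stddef.h>

typedef long arith;   /* assumption: arith is a long */

/* Hypothetical in-memory text segment; the real back-primitives
 * write into a NEW_A.OUT object file instead. */
static unsigned char text_seg[1024];
static size_t text_pos = 0;

/* Switch to the text segment: nothing to do in this sketch. */
static void swtxt(void)
{
}

/* Append one byte to the text segment. */
static void text1(int b)
{
    if (text_pos < sizeof text_seg)
        text_seg[text_pos++] = (unsigned char) (b & 0xff);
}

/* Append a four-byte value, least significant byte first. */
static void text4(arith w)
{
    unsigned long u = (unsigned long) w;
    int i;

    for (i = 0; i < 4; i++)
        text1((int) ((u >> (8 * i)) & 0xff));
}

/* The routine generated for C_loc, written in ANSI C. */
void C_loc(arith c)
{
    swtxt();
    text1(0xd0);   /* first byte from the as_table entry */
    text1(0xef);   /* second byte from the as_table entry */
    text4(c);
}
```
\fR
.DE
.LP
With this sketch, calling C_loc(5) appends the six bytes 0xd0, 0xef, 0x05, 0x00, 0x00, 0x00.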
.PP
The transformations on the tables are done automatically by the code expander
generator.
The code expander generator consists of two tools: one to handle the EM_table,
\fBemg\fR, and one to handle the as_table, \fBasg\fR. \fBAsg\fR transforms
each assembly instruction into a C-routine. These C-routines generate calls
to the \fBback\fR-primitives. Finally, the generated C-routines are used
by \fBemg\fR to generate the actual code expander from the EM_table.
.PP
The link between \fBemg\fR and \fBasg\fR is an assembly language.
We did not enforce a specific syntax for the assembly language;
instead we have chosen to give the table writer the freedom
to make an ad-hoc assembly language or to use an actual assembly language
suitable for his purpose. Apart from greater flexibility, this
has another advantage: if the table writer adopts the assembly language that
runs on the machine at hand, he can test the EM_table independently of the
as_table. Of course there is a price to pay: the table writer has to
do the decoding of the operands himself. See section 4 for more details.
.PP
Before we explain the various parts of \fBceg\fR (the code expander generator),
we give an overview of the four important phases.
.IP "phase 1):"
.br
The as_table is transformed by \fBasg\fR. This results in a set of C-routines.
Each assembly mnemonic results in one C-routine.
.IP "phase 2):"
.br
The C-routines generated by \fBasg\fR are used by \fBemg\fR to expand the EM_table.
This
results in a set of C-routines, the code expander, which form the procedural
interface EM_CODE(3L).
.IP "phase 3):"
.br
The front end that uses the procedural interface is linked/loaded with the
code expander generated in phase 2) and the \fBback\fR-primitives.
This results in a compiler.
.IP "phase 4):"
.br
Execution of the compiler: the routines in the code expander are
executed and produce object code.
.RE
.NH
Description of the EM_table
.PP
This section describes the EM_table. It contains four subsections:
a section that describes the syntax of the EM_table; a section that deals with the
semantics of the EM_table; a section that gives a list of the functions and
constants that must be present in the EM_table, in the file 'mach.c' or in
the file 'mach.h'; and a section that deals with the case that the table
writer wants to generate assembly instead of object code. The section on
semantics contains many examples.
.NH 2
Grammar
.PP
The following grammar describes the syntax of the EM_table.
.VS +4
.TS
center tab(%);
l c l.
TABLE%::=%( RULE)*
RULE%::=%C_instr ( CONDITIONAL | SIMPLE)
CONDITIONAL%::=%( condition SIMPLE)+ 'default' SIMPLE
SIMPLE%::=%( '==>' | '::=') ACTION_LIST
ACTION_LIST%::=%[ ACTION ( ';' ACTION)* ] '.'
ACTION%::=%AS_INSTR
%|%function-call
.sp
AS_INSTR%::=%'"' [ label ':'] [ INSTR] '"'
INSTR%::=%mnemonic [ operand ( ',' operand)* ]
.TE
.VS -4
.PP
\&'(' ')' brackets are used for grouping, '[' ... ']' means ... 0 or 1 time,
\&'*' means zero or more times, '+' means one or more times and '|' means
a choice between left or right. A \fBC_instr\fR is
a name in the EM_CODE(3L) interface. \fBcondition\fR is a 'C' expression.
\fBfunction-call\fR is a call of a 'C' function. \fBlabel\fR, \fBmnemonic\fR
and \fBoperand\fR are arbitrary strings. If an \fBoperand\fR contains brackets, the
brackets must match. In reality there is an upper bound on the number of
operands; the maximum number is defined by the constant MAX_OPERANDS in the
file 'const.h' in the directory assemble.c. Comments in the table should be
placed between '/*' and '*/'. Finally, before the table is parsed, the
C preprocessor runs.
.NH 2
Semantics
.PP
The EM_table is processed by \fBemg\fR. \fBEmg\fR generates a C function for every
instruction in the EM_CODE(3L) interface.
For every EM-instruction not mentioned in the EM_table, a
C function that prints an error message is generated.
It is possible to divide the EM_CODE(3L) interface into four parts:
.IP \0\01)
text instructions (e.g., C_loc, C_adi, ..)
.IP \0\02)
pseudo instructions (e.g., C_open, C_df_ilb, ..)
.IP \0\03)
storage instructions (e.g., C_rom_icon, ..)
.IP \0\04)
message instructions (e.g., C_mes_begin, ..)
.LP
This section starts by giving the semantics of the grammar. The examples
are text instructions. The section ends with remarks on the pseudo
instructions and the storage instructions. Since message instructions are not
useful for a code expander, they are ignored.
.PP
.NH 3
Actions
.PP
The EM_table consists of rules which describe how to expand a \fBC_instr\fR
from the EM_CODE(3L) interface, an EM instruction, into actions.
There are two kinds of actions: assembly instructions and C function calls.
An assembly instruction is defined as a mnemonic followed by zero or more
operands, separated by commas. The semantics of an assembly instruction are
defined by the table writer. When the assembly language is not expressive
enough, function calls can be made as an escape route. However, this
reduces
the speed of the actual code expander. Finally, actions can be grouped into
a list of actions; actions are separated by a semicolon and the list is terminated
by a '.'.
.DS
\f5
C_nop ==> .             /* Empty action list : no operation. */

C_inc ==> "incl (sp)".  /* Assembler instruction, which is evaluated
                         * during expansion of the EM_table */

C_slu ==> C_sli( $1).   /* Function call, which is evaluated during
                         * execution of the compiler. */
\fR
.DE
.NH 3
Labels
.PP
Since an assembly language without instruction labels is a rather weak
language, labels inside a contiguous block of assembly instructions are
allowed. When using labels two rules must be observed:
.IP \0\01)
The name of a label should be unique inside an action list.
.IP \0\02)
The labels used in an assembler instruction should be defined in the same
action list.
.LP
The following example illustrates the usage of labels.
.DS
\f5
C_cmp ==> "pop bx";         /* Compare the two top */
          "pop cx";         /* elements on the stack. */
          "xor ax, ax";
          "cmp cx, bx";
          "je 2f";          /* Forward jump to local label */
          "jb 1f";
          "inc ax";
          "jmp 2f";
          "1: dec ax";
          "2: push ax".
\fR
.DE
We will come back to labels in the section on the as_table.
.NH 3
Arguments of an EM instruction
.PP
In most cases the translation of a \fBC_instr\fR depends on its arguments.
The arguments of a \fBC_instr\fR are numbered from 1 to \fIn\fR, where \fIn\fR
is the
total number of arguments of the current \fBC_instr\fR (there are a few
exceptions; see Implicit arguments). The table writer may
refer to an argument as $\fIi\fR. If a plain $-sign is needed in an
assembly instruction, it must be preceded by an extra $-sign.
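.LP
The two rules for '$' can be illustrated with a small C sketch of template expansion. It only shows what the notation means and is not taken from \fBemg\fR. The function expand_template() and its parameters are hypothetical. The sketch handles single-digit references ($1 to $9), replaces '$$' by a plain '$' and copies everything else unchanged, so "pushl $$$1" with first argument "5" becomes "pushl $5".
.DS
\f5
```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand $i references (i = 1..9) and $$ escapes
 * in an assembly template. args[i - 1] holds the text of argument i.
 * Returns 0 on success, -1 on a bad reference or a full buffer. */
int expand_template(const char *tmpl, const char *const args[], int nargs,
                    char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return -1;
    while (*tmpl != 0) {
        const char *piece;
        size_t len;
        char single[2];

        if (tmpl[0] == '$' && tmpl[1] == '$') {
            piece = "$";                    /* $$ stands for a plain $ */
            tmpl += 2;
        } else if (tmpl[0] == '$' && tmpl[1] >= '1' && tmpl[1] <= '9') {
            int i = tmpl[1] - '0';

            if (i > nargs)
                return -1;                  /* no such argument */
            piece = args[i - 1];
            tmpl += 2;
        } else {
            single[0] = *tmpl++;            /* ordinary character */
            single[1] = 0;
            piece = single;
        }
        len = strlen(piece);
        if (n + len >= outsize)
            return -1;                      /* output buffer too small */
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = 0;
    return 0;
}
```
\fR
.DE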
.PP
There are two groups of \fBC_instr\fRs whose arguments are specially handled:
.RS
.IP "1) Instructions dealing with local offsets."
.br
The value of the $\fIi\fR argument referring to a parameter ($\fIi\fR >= 0)
is increased by 'EM_BSIZE'. 'EM_BSIZE' is the size of the return status block
and must be defined in the file 'mach.h'; see section 3.3. For example:
.DS
\f5
C_lol ==> "push $1(bp)".    /* automatic conversion of $1 */
\fR
.DE
.IP "2) Instructions using global names or instruction labels"
.br
All the arguments referring to global names or instruction labels will be
transformed into a unique assembly name. To prevent name clashes with library
names, the table writer has to provide the
conversions in the file 'mach.h'. For example:
.DS
\f5
C_bra ==> "jmp $1".         /* automatic conversion of $1 */
                            /* type arith is converted to string */
\fR
.DE
.RE
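.LP
The conversion of parameter offsets in the first group amounts to one comparison. The following C sketch assumes that arith is a long and uses a placeholder value for EM_BSIZE, which in a real code expander comes from 'mach.h'. The function name adjust_local_offset() is hypothetical. In EM, parameters have non-negative offsets and locals negative ones, so only non-negative offsets are moved past the return status block.
.DS
\f5
```c
typedef long arith;    /* assumption: arith is a long */

/* Placeholder: the real value is defined in mach.h. */
#ifndef EM_BSIZE
#define EM_BSIZE 8
#endif

/* Offsets >= 0 refer to parameters and are moved past the
 * return status block; negative offsets (locals) are unchanged. */
arith adjust_local_offset(arith off)
{
    return off >= 0 ? off + EM_BSIZE : off;
}
```
\fR
.DE
.LP
With the placeholder EM_BSIZE of 8, for example, the C_lol rule above would turn C_lol(4) into "push 12(bp)".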
.NH 3
Conditionals
.PP
The rules in the EM_table can be divided into two groups: simple rules and
conditional rules. The simple rules consist of a \fBC_instr\fR followed by
a list of actions, as described above. The conditional rules (CONDITIONAL)
allow the table writer to select an action list depending on the value of
a condition.
.PP
A CONDITIONAL is a list of boolean expressions, each with a corresponding
simple rule. If
an expression evaluates to true, the corresponding simple rule is carried
out. If more than one condition evaluates to true, an arbitrary one is chosen.
The last case of a CONDITIONAL of a \fBC_instr\fR must handle the default case.
The boolean expression in a CONDITIONAL must be a 'C' expression. Besides the
ordinary 'C' operators and constants, $\fIi\fR references can be used
in an expression.
.DS
\f5
C_lxl           /* Load address of LB $1 levels back. */
        $1 == 0 ==> "pushl fp".
        $1 == 1 ==> "pushl 4(ap)".
        default ==> "movl $$$1, r0";
                    "jsb .lxl";
                    "pushl r0".
\fR
.DE
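.LP
To show how such a conditional rule could end up in the code expander, here is a hand-written C sketch of a routine with the same behaviour as the C_lxl rule. It is an assumption about the shape of the generated code, not actual \fBemg\fR output. The helper emit() and the buffer out_log are hypothetical: they record the instruction text where the real code expander would call the routines generated by \fBasg\fR. Because any true condition may be chosen, testing the conditions in order is one valid implementation.
.DS
\f5
```c
#include <stdio.h>
#include <string.h>

typedef long arith;   /* assumption: arith is a long */

/* Hypothetical log of the instructions the routine selects. */
static char out_log[256];

static void emit(const char *instr)
{
    size_t used = strlen(out_log);

    snprintf(out_log + used, sizeof out_log - used, "%s;", instr);
}

/* Same choices as the C_lxl rule: the conditions are tested at
 * compiler run time, and the default case comes last. */
void C_lxl(arith n)
{
    char buf[64];

    if (n == 0) {
        emit("pushl fp");
    } else if (n == 1) {
        emit("pushl 4(ap)");
    } else {
        snprintf(buf, sizeof buf, "movl $%ld, r0", n);
        emit(buf);
        emit("jsb .lxl");
        emit("pushl r0");
    }
}
```
\fR
.DE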
.NH 3
Equivalence rule
.PP
Among the simple rules there is a special case:
the equivalence rule. This rule declares two \fBC_instr\fRs equivalent. To
distinguish it from the usual simple rule, '==>' is replaced by '::='.
The benefit of an equivalence rule is that the arguments are not converted.
.DS
\f5
C_slu ::= C_sli( $1).
\fR
.DE
.NH 3
Abbreviations
.PP
EM instructions with an external as argument come in three variants in
the EM_CODE(3L) interface. In most cases it will be possible to take
these variants together. For this purpose the '..' notation is introduced.
.DS
\f5
/* For the code expander there is no difference between the following
 * instructions. */
C_loe_dlb ==> "pushl $1 + $2".
C_loe_dnam ==> "pushl $1 + $2".
C_loe ==> "pushl $1 + $2".

/* So it can be written in the following way.
 */
C_loe.. ==> "pushl $1 + $2".
\fR
.DE
.NH 3
Implicit arguments
.PP
In the last example 'C_loe' has two arguments, but in the EM_CODE interface
it has one argument. However, this argument depends on the current 'hol'
block; in the EM_table this is made explicit. Every \fBC_instr\fR whose
argument depends on a 'hol' block has one extra argument; argument 1 refers
to the 'hol' block.
.NH 3
Pseudo instructions
.PP
Most pseudo instructions are machine independent and are provided
by \fBceg\fR. The table writer only has to supply the functions:
.DS
\f5
prolog()
        /* Performs the prolog, for example save return address */

locals( n)
arith n;
        /* Allocate n bytes for locals on the stack */

jump( label)
char *label;
        /* Generates code for a jump to 'label' */
\fR
.DE
.LP
These functions can be defined in 'mach.c' or in the EM_table.
.NH 3
Storage instructions
.PP
The storage instructions 'C_bss_\fIcstp()\fR', 'C_hol_\fIcstp()\fR',
\&'C_con_\fIcstp()\fR' and 'C_rom_\fIcstp()\fR', except for the instructions
dealing with constants of type string (C_..._icon, C_..._ucon, C_..._fcon), are
generated automatically. No information is needed in the table.
To generate the C_..._icon, C_..._ucon and C_..._fcon instructions,
\fBceg\fR only has to know how to convert a number of type string to bytes;
this can be defined with the constants ONE_BYTE, TWO_BYTES, and FOUR_BYTES.
C_rom_icon, C_con_icon, C_bss_icon and C_hol_icon can be abbreviated by ..icon.
This also holds for ..ucon and ..fcon.
For example:
.DS
\f5
\\.\\.icon
        $2 == 1 ==> gen1( (ONE_BYTE) atoi( $1)).
        $2 == 2 ==> gen2( (TWO_BYTES) atoi( $1)).
        $2 == 4 ==> gen4( (FOUR_BYTES) atoi( $1)).
        default ==> arg_error( "..icon", $2).
\fR
.DE
Gen1(), gen2() and gen4() are \fBback\fR-primitives (see appendix A) and
generate one-, two- or four-byte constants. Atoi() is a 'C' library function which
converts strings to integers.
The constants 'ONE_BYTE', 'TWO_BYTES' and 'FOUR_BYTES' must be defined in
the file 'mach.h'.
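.LP
The following C sketch mimics the ..icon rule for a host where char, short and long are one, two and four bytes, as in the vax4 'mach.h' of section 3.3. The data buffer and the bodies of gen1(), gen2() and gen4() are hypothetical stand-ins for the \fBback\fR-primitives, and they write the least significant byte first. Unlike the table, the sketch uses atol() for the four-byte case, because atoi() only returns an int. It returns -1 where the table would call arg_error().
.DS
\f5
```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed vax4 definitions from mach.h. */
#define ONE_BYTE   char
#define TWO_BYTES  short
#define FOUR_BYTES long

/* Hypothetical in-memory data segment. */
static unsigned char data_seg[64];
static size_t data_pos = 0;

static void put_byte(unsigned long b)
{
    if (data_pos < sizeof data_seg)
        data_seg[data_pos++] = (unsigned char) (b & 0xff);
}

/* Stand-ins for gen1(), gen2() and gen4(): least significant byte first. */
static void gen1(ONE_BYTE v)
{
    put_byte((unsigned char) v);
}

static void gen2(TWO_BYTES v)
{
    unsigned long u = (unsigned short) v;

    put_byte(u);
    put_byte(u >> 8);
}

static void gen4(FOUR_BYTES v)
{
    unsigned long u = (unsigned long) v;

    put_byte(u);
    put_byte(u >> 8);
    put_byte(u >> 16);
    put_byte(u >> 24);
}

/* The ..icon rule: val is $1 (the constant as a string),
 * size is $2 (its size in bytes). */
int gen_icon(const char *val, int size)
{
    switch (size) {
    case 1:
        gen1((ONE_BYTE) atoi(val));
        return 0;
    case 2:
        gen2((TWO_BYTES) atoi(val));
        return 0;
    case 4:
        gen4((FOUR_BYTES) atol(val));
        return 0;
    default:
        return -1;      /* the table calls arg_error() here */
    }
}
```
\fR
.DE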
.NH 2
User supplied definitions and functions
.PP
If the table writer uses all the default functions, he only has to supply
the following constants and functions:
.TS
tab(#);
l c lw(10c).
prolog()#:#T{
Do prolog
T}
jump( l)#:#T{
Perform a jump to label l
T}
locals( n)#:#T{
Allocate n bytes on the stack
T}
#
NAME_FMT#:#T{
Print format describing name to a unique name conversion. The format must
contain %s.
T}
DNAM_FMT#:#T{
Print format describing data-label to a unique name conversion. The format
must contain %s.
T}
DLB_FMT#:#T{
Print format describing numerical-data-label to a unique name conversion.
The format must contain %ld.
T}
ILB_FMT#:#T{
Print format describing instruction-label to a unique name conversion.
The format must contain %d followed by %ld.
T}
HOL_FMT#:#T{
Print format describing hol-block-number to a unique name conversion.
The format must contain %d.
T}
#
EM_WSIZE#:#T{
Size of a word in bytes on the target machine
T}
EM_PSIZE#:#T{
Size of a pointer in bytes on the target machine
T}
EM_BSIZE#:#T{
Size of the base block in bytes on the target machine
T}
#
ONE_BYTE#:#T{
\\'C'-type which occupies one byte on the machine where the \fBce\fR runs
T}
TWO_BYTES#:#T{
\\'C'-type which occupies two bytes on the machine where the \fBce\fR runs
T}
FOUR_BYTES#:#T{
\\'C'-type which occupies four bytes on the machine where the \fBce\fR runs
T}
#
BSS_INIT#:#T{
The default value which the loader puts in the bss segment
T}
#
BYTES_REVERSED#:#T{
Must be defined if you want the byte order reversed.
By default the least significant byte is output first.
T}
WORD_REVERSED#:#T{
Must be defined if you want the word order reversed.
By default the least significant word is output first.
T}
.TE
.LP
An example of the file 'mach.h' for the vax4 with 4.1 BSD UNIX:
.TS
tab(:);
l l l.
#define : ONE_BYTE : char
#define : TWO_BYTES : short
#define : FOUR_BYTES : long
:
#define : EM_WSIZE : 4
#define : EM_PSIZE : 4
#define : EM_BSIZE : 0
:
#define : BSS_INIT : 0
:
#define : NAME_FMT : "_%s"
#define : DNAM_FMT : "_%s"
#define : DLB_FMT : "_%ld"
#define : ILB_FMT : "I%03d%ld"
#define : HOL_FMT : "hol%d"
.TE
.nr PS 12
.nr VS 20
Notice that EM_BSIZE is zero. The vax4 takes care of this automatically.
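.LP
You can try out with snprintf() how the five formats turn EM names into unique assembler names. The sketch below copies the vax4 formats from the 'mach.h' example above, and the helper functions are hypothetical. We assume that the %d in ILB_FMT is a per-procedure number that keeps the instruction labels of different procedures apart. The document itself only prescribes the %d and %ld conversions.
.DS
\f5
```c
#include <stdio.h>

/* The vax4 formats from the mach.h example. */
#define NAME_FMT "_%s"
#define DNAM_FMT "_%s"
#define DLB_FMT  "_%ld"
#define ILB_FMT  "I%03d%ld"
#define HOL_FMT  "hol%d"

/* Hypothetical helpers: each writes a unique assembly name into buf. */
void ext_name(char *buf, size_t size, const char *name)
{
    snprintf(buf, size, NAME_FMT, name);
}

void dnam_name(char *buf, size_t size, const char *name)
{
    snprintf(buf, size, DNAM_FMT, name);
}

void dlb_name(char *buf, size_t size, long label)
{
    snprintf(buf, size, DLB_FMT, label);
}

void ilb_name(char *buf, size_t size, int proc_no, long label)
{
    snprintf(buf, size, ILB_FMT, proc_no, label);
}

void hol_name(char *buf, size_t size, int block)
{
    snprintf(buf, size, HOL_FMT, block);
}
```
\fR
.DE
.LP
With these formats, instruction label 23 of procedure number 1 becomes I00123, and data label 12 becomes _12.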
.PP
There are three routines which have to be defined by the table writer. The
table writer can define them as ordinary C functions in the file 'mach.c' or
define them in the EM_table. For example, for the 8086 it looks like this:
.DS
\f5
jump ==> "jmp $1".

prolog ==> "push bp";
        "mov bp, sp".

locals
        $1 == 0 ::= .
        $1 == 2 ==> "push ax".
        $1 == 4 ==> "push ax";
                    "push ax".
        default ==> "sub sp, $1".
\fR
.DE
.NH 2
Generating assembly
.PP
When assembly is generated instead of object code, the constants
\&'BYTES_REVERSED' and 'WORDS_REVERSED' are not needed.
.NH 1
Description of the as_table
.PP
This section describes the as_table. Like the previous section it is divided into
four parts: the first part describes the grammar of the as_table; the second
part describes the semantics of the as_table; the third part gives an overview
of the functions and the constants that must be present in the as_table, in
the file 'as.h' or in the file 'as.c'; the last part describes the case when
assembly is generated instead of object code.
The part on semantics contains examples which appear in the as_table for the
VAX or for the 8086.
.NH 2
Grammar
.PP
The formal form of the as_table is given by the following grammar:
.VS +4
.TS
center tab(#);
l c l.
TABLE#::=#( RULE)*
RULE#::=#( mnemonic | '...') DECL_LIST '==>' ACTION_LIST
DECL_LIST#::=#DECLARATION ( ',' DECLARATION)*
DECLARATION#::=#operand [ ':' type]
ACTION_LIST#::=#ACTION ( ';' ACTION)* '.'
ACTION#::=#IF_STATEMENT
#|#function-call
#|#@function-call
IF_STATEMENT#::=#'@if' '(' condition ')' ACTION_LIST
##( '@elsif' '(' condition ')' ACTION_LIST)*
##[ '@else' ACTION_LIST]
##'@fi'
.TE
.VS -4
.LP
\fBmnemonic\fR, \fBoperand\fR and \fBtype\fR are all C identifiers,
\fBcondition\fR is a normal C expression, and
\fBfunction-call\fR must be a C function call.
.NH 2
Semantics
.PP
The as_table consists of rules which map assembly instructions onto
\fBback\fR-primitives, a set of functions that write to the object file.
The table is processed by \fBasg\fR, which generates a set of C functions,
one for each assembler mnemonic. (The names of
these functions are the assembler mnemonics postfixed with '_instr'; e.g.,
\&'add' becomes 'add_instr()'.) These functions will be used by the function
assemble() during the expansion of the EM_table.
After explaining the semantics of the as_table, the function
assemble() will be described.
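.LP
The sketch below shows a minimal version of the lookup that assemble() needs, from a mnemonic to the corresponding '_instr' function. We do not know how \fBceg\fR's own assemble() performs this lookup. The table, the parameterless signatures and the variable last_called are simplifications for illustration. The real functions receive the decoded operands.
.DS
\f5
```c
#include <stddef.h>
#include <string.h>

/* Stand-ins for functions generated by asg; here they only record
 * which one was called. */
static int last_called = 0;

static void add_instr(void)  { last_called = 1; }
static void mov_instr(void)  { last_called = 2; }
static void push_instr(void) { last_called = 3; }

struct mnemonic_entry {
    const char *name;
    void (*fn)(void);
};

static const struct mnemonic_entry mnemonics[] = {
    { "add",  add_instr },
    { "mov",  mov_instr },
    { "push", push_instr },
};

/* Call the _instr function for a mnemonic; -1 if it is unknown. */
int dispatch(const char *mnemonic)
{
    size_t i;

    for (i = 0; i < sizeof mnemonics / sizeof mnemonics[0]; i++) {
        if (strcmp(mnemonics[i].name, mnemonic) == 0) {
            mnemonics[i].fn();
            return 0;
        }
    }
    return -1;
}
```
\fR
.DE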
.NH 3
Rules
.PP
A rule in the as_table consists of a left and a right side;
the left side describes an assembler instruction (mnemonic and operands); the
right side gives the corresponding actions as \fBback\fR-primitives or as
functions, defined by the table writer, that call \fBback\fR-primitives.
A simple example from the VAX as_table and the 8086 as_table:
.DS L
\f5
movl src, dst ==> @text1( 0xd0);
        gen_operand( src);  /* function that encodes operands */
        gen_operand( dst).  /* by calling back-primitives. */

rep ens:MOVS ==> @text1( 0xf3);
        @text1( 0xa5).

\fR
.DE
.NH 3
Declaration of types
.PP
In general a machine instruction is encoded as an opcode optionally followed by
the operands, but there are two methods for mapping assembler mnemonics
onto opcodes: either the mnemonic determines the opcode, or the mnemonic and
operands together determine the opcode. Both cases can be easily expressed in the as_table.
The first case is obvious. For the second case, type fields for the operands
are introduced.
.LP
When both mnemonic and operands determine the opcode, the table writer has
to give a separate rule for each combination of mnemonic and operand types. The rules
differ in the type fields of the operands.
The table writer has to supply functions that check the type
of an operand. The name of such a function is the name of the type; it
has one argument, a pointer to a struct of type t_operand, and it returns
1 when the operand is of this type; otherwise it returns 0.
.LP
This will usually lead to a list of rules per mnemonic. To reduce the amount of
work an abbreviation is supplied. Once the mnemonic is specified, it can be
referred to in the following rules by '...'.
One has to make sure
that each mnemonic is spelled out only once in the as_table; otherwise \fBasg\fR will
generate more than one function with the same name.
.LP
The following example shows the usage of type fields.
.DS L
\f5
mov dst:REG, src:EADDR ==> @text1( 0x8b);       /* opcode */
        mod_RM( %d(dst->reg), src).             /* operands */

\&... dst:EADDR, src:REG ==> @text1( 0x89);     /* opcode */
        mod_RM( %d(src->reg), dst).             /* operands */
\fR
.DE
The table writer must supply the restriction functions, \f5REG\fR and
\f5EADDR\fR in the previous example, in 'as.c'/'as.h'.
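.LP
A sketch of such restriction functions for the 8086 example follows. The layout of struct t_operand is defined by the table writer, so the fields kind and reg used here are hypothetical. The function mov_opcode() only illustrates the choice between the two 'mov' rules. It assumes that the rules are tried in the order in which they appear in the table, so that a register-to-register move matches the first rule.
.DS
\f5
```c
/* Hypothetical operand description; the real struct t_operand is
 * defined by the table writer in as.h. */
typedef enum { OP_REG, OP_MEM, OP_IMM } op_kind;

typedef struct t_operand {
    op_kind kind;
    int reg;            /* register number if kind == OP_REG */
} t_operand;

/* Restriction functions: 1 if the operand has the type, else 0. */
int REG(t_operand *op)
{
    return op->kind == OP_REG;
}

/* An effective address is a register or a memory operand. */
int EADDR(t_operand *op)
{
    return op->kind == OP_REG || op->kind == OP_MEM;
}

/* Opcode selected by the two mov rules, or -1 if none applies. */
int mov_opcode(t_operand *dst, t_operand *src)
{
    if (REG(dst) && EADDR(src))
        return 0x8b;
    if (EADDR(dst) && REG(src))
        return 0x89;
    return -1;
}
```
\fR
.DE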
.NH 3
The function of the @-sign and the if-statement
.PP
The right-hand side of a rule consists of function calls. Some of the
functions generate object code directly (e.g., the \fBback\fR-primitives);
others are needed for further assembly (e.g., \f5gen_operand()\fR in the
first example). Calls of the second kind are evaluated during the expansion
of the EM_table, while calls of the first kind are incorporated into the compiler.
This is denoted by the @-sign in front of the \fBback\fR-primitives.
.LP
The next example concerns the use of the '@'-sign in front of a function
written by the table writer. The need for this construction arises when you implement push/pop
optimization: flags need to be set, reset and tested during the execution of
the compiler:
.DS L
\f5
PUSH src ==> mov_instr( AX_oper, src);          /* save in ax */
        @assign( push_waiting, TRUE).           /* set flag */

POP dst ==> @if ( push_waiting)
                mov_instr( dst, AX_oper);       /* asg-generated */
                @assign( push_waiting, FALSE).
        @else
                pop_instr( dst).                /* asg-generated */
        @fi.
\fR
.DE
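.LP
The run-time effect of the '@assign' and '@if' calls above can be sketched in plain C. In this hypothetical version the flag push_waiting lives in the compiler, PUSH() and POP() stand for the routines generated for the two rules, and the instructions are recorded as text instead of being encoded. A complete push/pop optimization must also emit a pending push before any other instruction that uses the stack or ax. That part is left out of the sketch.
.DS
\f5
```c
#include <stdio.h>
#include <string.h>

static char emitted[256];      /* hypothetical record of emitted code */
static int push_waiting = 0;   /* set and tested at compiler run time */

static void emit(const char *mnem, const char *ops)
{
    size_t used = strlen(emitted);

    snprintf(emitted + used, sizeof emitted - used, "%s %s;", mnem, ops);
}

/* PUSH src: keep the value in ax instead of pushing it. */
void PUSH(const char *src)
{
    char ops[64];

    snprintf(ops, sizeof ops, "ax, %s", src);
    emit("mov", ops);
    push_waiting = 1;
}

/* POP dst: take the value from ax if a push is pending. */
void POP(const char *dst)
{
    char ops[64];

    if (push_waiting) {
        snprintf(ops, sizeof ops, "%s, ax", dst);
        emit("mov", ops);
        push_waiting = 0;
    } else {
        emit("pop", dst);
    }
}
```
\fR
.DE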
.PP
A problem arises when information is needed that is not known until execution of
the compiler. For example, one needs to know whether a '$\fIi\fR' argument fits in
one byte.
In this case one can use a special if-statement provided by \fBasg\fR:
@if, @elsif, @else, @fi. This means that the conditions will be evaluated at
run time of the \fBce\fR. In such a condition one may of course refer to the
\&'$\fIi\fR' arguments. For example, constants can be packed into one- or two-byte
arguments:
.DS L
\f5
mov dst:ACCU, src:DATA ==> @if ( fits_byte( %$(src->expr)))
                @text1( 0xc0);
                @text1( %$(src->expr)).
        @else
                @text1( 0xc8);
                @text2( %$(src->expr)).
        @fi.
\fR
.DE
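.LP
In C, the choice made by this @if at run time of the \fBce\fR could look like the sketch below. The bodies of text1() and text2(), the buffer code and the function mov_accu_const() are hypothetical. This text2() writes the low byte first, and fits_byte() is assumed to test for a signed byte. The opcodes 0xc0 and 0xc8 are simply those of the example above.
.DS
\f5
```c
#include <stddef.h>

typedef long arith;   /* assumption: arith is a long */

static unsigned char code[16];   /* hypothetical text segment */
static size_t code_len = 0;

static void text1(int b)
{
    if (code_len < sizeof code)
        code[code_len++] = (unsigned char) (b & 0xff);
}

static void text2(arith w)
{
    unsigned long u = (unsigned long) w;

    text1((int) (u & 0xff));
    text1((int) ((u >> 8) & 0xff));
}

/* Assumed meaning: does v fit in a signed byte? */
int fits_byte(arith v)
{
    return v >= -128 && v <= 127;
}

/* Run-time part of the mov dst:ACCU, src:DATA rule. */
void mov_accu_const(arith val)
{
    if (fits_byte(val)) {
        text1(0xc0);
        text1((int) val);
    } else {
        text1(0xc8);
        text2(val);
    }
}
```
\fR
.DE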
|
||
|
.NH 3
|
||
|
References to operands
|
||
|
.PP
|
||
|
As mentioned before, the operands of an assembler instruction may be used as
|
||
|
pointers, to the struct t_operand, in the righthand side of the table.
|
||
|
Because of the free format assembler, the types of the fields in the struct
|
||
|
t_operand are unknown to \fBasg\fR. Clearly \fBasg\fR must know these types.
|
||
|
This section explains how these types must be specified.
|
||
|
.LP
|
||
|
References to operands come in three forms: ordinary operands, operands that
|
||
|
contain '$\fIi\fR' referneces, and operands that refer to names of local labels.
|
||
|
The '$\fIi\fR' in operands represent names or numbers of an \fBC_instr\fR and must
|
||
|
be given as arguments to the \fBback\fR-primitives. Labels in operands
|
||
|
must be converted to a number that tells the distance, the number of bytes,
|
||
|
between the label and the current position in the text-segment.
|
||
|
.LP
|
||
|
All these three cases are treated in an uniform way. When the table writer
|
||
|
makes a reference to an operand of an assembly instruction, he must describe
|
||
|
the type of the operand in the following way.
|
||
|
.DS
|
||
|
\f5
|
||
|
reference := '%' conversion '(' operand-name '->' field-name ')'
|
||
|
conversion := printformat |
|
||
|
'$' |
|
||
|
'dist'
|
||
|
printformat := see PRINT(3ACK)
|
||
|
\fR
|
||
|
.DE
|
||
|
The three cases differ only in the conversion field. The first conversion
|
||
|
applies to ordinary operands. The second applies to operands that contain
|
||
|
a '$\fIi\fR'. The expression between brackets must of type char *. The
|
||
|
result of '%$' is of the type of '$\fIi\fR'. The
|
||
|
third applies operands that refer to a local label. The expression between
|
||
|
the brackets must be of type char *. The result of '%dist' is of type arith.
.LP
The following example illustrates the usage of '%$'. (For an
example that illustrates the usage of ordinary fields, see the example in
the section on 'User supplied definitions and functions'.)
.DS L
\f5
jmp dst ==> @text1( 0xe9);
@reloc2( %$(dst->lab), %$(dst->off), PC_REL).
\fR
.DE
.LP
A useful function for handling $\fIi\fRs is arg_type(). It takes as input a
string starting with $\fIi\fR and returns the type of the \fIi\fR'th argument
of the current EM-instruction: STRING, ARITH or INT. This function may be
needed while decoding operands when the context of the $\fIi\fR does not
give enough information.
If arg_type() is used, the file
arg_type.h must contain the definitions of STRING, ARITH and INT.
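.LP
As an illustration, the following C sketch combines possible definitions
from 'arg_type.h' with an operand classifier that relies on arg_type().
The numeric values of STRING, ARITH and INT are arbitrary, and the arg_type()
shown is a hypothetical stand-in for the real one: it simply looks up the
type in a table cur_arg_types[] and accepts only $1 to $3. The flags IS_DATA
and IS_LABEL are taken from the 8086 'as.h' shown later; classify_dollar()
is a made-up name.
.DS L
\f5
/* arg_type.h: the three argument types (values chosen arbitrarily) */
#define STRING 1
#define ARITH  2
#define INT    3

/* Operand flags, as in the 8086 as.h */
#define IS_DATA  0x4
#define IS_LABEL 0x8

/* Hypothetical types of the arguments of the current EM-instruction;
 * slot 0 is unused so that "$1" maps to cur_arg_types[1]. */
int cur_arg_types[4];

/* Stand-in for arg_type(): type of the i'th argument for "$i" */
int arg_type(char *s)
{
    if (s[0] != '$' || s[1] < '1' || s[1] > '3')
        return 0;               /* not a "$i" with 1 <= i <= 3 */
    return cur_arg_types[s[1] - '0'];
}

/* Treat a "$i" operand as a label if the argument is a name
 * (STRING), and as data if it is a number (ARITH or INT). */
unsigned classify_dollar(char *s)
{
    return arg_type(s) == STRING ? IS_LABEL : IS_DATA;
}
\fR
.DE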
.LP
%dist is only guaranteed to work when used as an argument of text1(), text2() or text4().
The goal of the %dist conversion is to reduce the number of reloc1(), reloc2()
and reloc4() calls, saving space and time (no relocation is needed at compiler run time).
.LP
The following example illustrates the usage of '%dist'.
.DS L
\f5
jmp dst:ILB ==> @text1( 0xeb); /* label in an instruction list */
@text1( %dist( dst->lab)).

\&... dst:LABEL ==> @text1( 0xe9); /* global label */
@reloc2( %$(dst->lab), %$(dst->off), PC_REL).
\fR
.DE
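.LP
The idea behind %dist can be illustrated with a small C sketch. It assumes a
hypothetical table of local labels, filled in as labels are processed, and
measures the distance from the current text-segment position, as described
above. The names record_label(), dist() and fits_in_byte() are illustrative,
and the sketch only handles labels that have already been seen. Whether a table
must adjust the value further (the 8086 short jump, for example, is relative
to the end of the instruction) depends on how \fBceg\fR defines the current position.
.DS L
\f5
#include <string.h>

#define MAX_LABELS 16

/* Hypothetical table of local labels with their text-segment positions */
struct local_label {
    const char *name;
    long pos;
};

static struct local_label labels[MAX_LABELS];
static int nlabels;
static long text_pos;           /* current position in the text-segment */

/* Record a local label at the current position */
void record_label(const char *name)
{
    if (nlabels < MAX_LABELS) {
        labels[nlabels].name = name;
        labels[nlabels].pos = text_pos;
        nlabels++;
    }
}

/* Distance in bytes from the current position to an already recorded
 * label; negative for a backward reference, 0 if the label is unknown. */
long dist(const char *name)
{
    int i;

    for (i = 0; i < nlabels; i++)
        if (strcmp(labels[i].name, name) == 0)
            return labels[i].pos - text_pos;
    return 0;
}

/* Does a distance fit in the single byte written by text1()? */
int fits_in_byte(long d)
{
    return d >= -128 && d <= 127;
}
\fR
.DE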
.NH 3
The functions assemble() and block_assemble()
.PP
Assemble() and block_assemble() are two functions provided by \fBceg\fR.
However, if the table writer is not satisfied with the way they work, he can
supply his own assemble() or block_assemble().
The default function assemble() splits an assembly string into a label, a mnemonic
and operands, and performs the following actions on them:
.IP \0\01)
It processes the local label: it records the name and the current position.
Thereafter it calls the function process_label() with one argument of type string,
the label. The table writer has to define this function.
.IP \0\02)
Thereafter it calls the function process_mnemonic() with one argument of
type string, the mnemonic. The table writer has to define this function.
.IP \0\03)
It calls process_operand() for each operand. Process_operand() must be
written by the table writer, since no fixed representation for operands
is enforced. It has two arguments: a string (the operand to decode)
and a pointer to the struct t_operand. The declaration of the struct
t_operand must be given in the
file 'as.h', and the table writer can put in it all the information needed for
encoding the operand in machine format.
.IP \0\04)
It examines the mnemonic and calls the associated function, generated by
\fBasg\fR, with pointers to the decoded operands as arguments. This makes it
possible to use the decoded operands in the right-hand side of a rule (see
the section on references to operands).
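.PP
The splitting step of the default assemble() can be sketched in C as follows.
The syntax accepted here (an optional label ending in ':', a mnemonic, and
comma-separated operands) is only an assumed example, since the assembly format
is chosen by the table writer; struct parts and split_instr() are hypothetical
names. The real assemble() goes on to hand the pieces to process_label(),
process_mnemonic() and process_operand().
.DS L
\f5
#include <ctype.h>
#include <stddef.h>

#define MAX_OPS 4

/* The parts of one assembly instruction */
struct parts {
    char *label;            /* null pointer if there is no label */
    char *mnemonic;
    char *ops[MAX_OPS];
    int nops;
};

static char *skip_blanks(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* Split s in place into label, mnemonic and operands, assuming the
 * example syntax "label: mnemonic op1, op2".  Returns 0 if there is
 * no mnemonic. */
int split_instr(char *s, struct parts *p)
{
    char *end;

    p->label = NULL;
    p->nops = 0;
    s = skip_blanks(s);
    end = s;
    while (*end != 0 && *end != ':' && !isspace((unsigned char)*end))
        end++;
    if (*end == ':') {              /* first word ends in ':' */
        *end = 0;
        p->label = s;
        s = skip_blanks(end + 1);
    }
    if (*s == 0)
        return 0;
    p->mnemonic = s;
    while (*s != 0 && !isspace((unsigned char)*s))
        s++;
    if (*s != 0)
        *s++ = 0;
    s = skip_blanks(s);
    while (*s != 0 && p->nops < MAX_OPS) {
        p->ops[p->nops++] = s;
        while (*s != 0 && *s != ',')
            s++;
        if (*s == 0)
            break;
        *s = 0;                     /* end this operand */
        s = skip_blanks(s + 1);
    }
    return 1;
}
\fR
.DE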
.PP
The default function block_assemble() is called with a sequence of assembly
instructions that belong to one action list. It calls assemble() for every
assembly instruction in this block. If a special action is
required on a block of assembly instructions, the table writer only has to
rewrite this function to get a new \fBceg\fR that obeys his wishes.
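.PP
The default behaviour can be sketched in a few lines of C. The argument list
of block_assemble() (an array of strings and a count) is an assumption made for
this sketch, and assemble() is replaced by a stub that merely records its
argument, so that the fragment runs on its own. A table writer who needs a
special action on a whole block would change the loop in block_assemble().
.DS L
\f5
/* Stub standing in for the real assemble(): it only records the
 * instructions it receives, so that this sketch runs on its own. */
static char *seen[8];
static int nseen;

void assemble(char *instr)
{
    if (nseen < 8)
        seen[nseen++] = instr;
}

/* Default behaviour: assemble every instruction of one action list,
 * in order.  Passing an array and a count is an assumption. */
void block_assemble(char **instrs, int n)
{
    int i;

    for (i = 0; i < n; i++)
        assemble(instrs[i]);
}
\fR
.DE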
.PP
Only four things have to be specified in 'as.h' and 'as.c': the declaration
of struct t_operand in 'as.h', and the functions
process_operand(), process_mnemonic() and process_label() in 'as.c'.
If the right-hand side of the as_table
contains function calls other than the \fBback\fR-primitives, these functions
must also be present in 'as.c'. Note that both the '@'-sign and references
to operands also work in
the functions defined in 'as.c'. As an example, here are parts of the
8086 'as.h' and 'as.c' files:
.nr PS 10
.nr VS 12
.DS L
\f5
/*============== as.h ========================================*/

/* type of operand */
#define UNKNOWN 0
#define IS_REG 0x1
#define IS_ACCU 0x2
#define IS_DATA 0x4
#define IS_LABEL 0x8
#define IS_MEM 0x10
#define IS_ADDR 0x20
#define IS_ILB 0x40

/* restriction macros */
#define REG( op) ( op->type & IS_REG)
#define DATA( op) ( op->type & IS_DATA)
#define LABEL( op) ( op->type & IS_LABEL)
#define ILB( op) ( op->type & IS_ILB)
#define MEM( op) ( op->type & IS_MEM)
#define ADDR( op) ( op->type & IS_ADDR)

/* decoded information */
struct t_operand {
    unsigned type;
    int reg;
    char *expr, *lab, *off;
};
\fR
.DE
.DS L
\f5
/*============== as.c ========================================*/

#include "as.h"
#include "arg_type.h"


#define last( s) ( s + strlen( s) - 1)
#define LEFT '('
#define RIGHT ')'

decode_operand( str, op)
char *str;
struct t_operand *op;

/* Operands in i86-assembly have the following syntax :
 *
 * expr -> IS_DATA | IS_LABEL | IS_ILB
 * reg -> IS_REG
 * (expr) -> IS_ADDR
 * expr(reg) -> IS_MEM
 */
{
    char *ptr, *index();

    op->type = UNKNOWN;
    if ( *last( str) == RIGHT) {      /* (expr) or expr(reg) */
        ptr = index( str, LEFT);
        *last( str) = '\\\\0';
        *ptr = '\\\\0';
        if ( is_reg( ptr+1, op)) {    /* expr(reg) */
            op->type = IS_MEM;
            op->expr = ( *str == '\\\\0' ? "0" : str);
        }
        else {
            set_label( ptr+1, op);    /* (expr) */
            op->type = IS_ADDR;
        }
    }
    else
        if ( is_reg( str, op))
            op->type = IS_REG;
        else {
            if ( contains_label( str))
                set_label( str, op);
            else {
                op->type = IS_DATA;
                op->expr = str;
            }
        }
}


mod_RM( reg, op)
int reg;
struct t_operand *op;

/* This function helps to encode operands in machine format;
 * note the '@'-sign and the %-references.
 */
{
    if ( REG( op))
        @R233( 0x3, reg, op->reg);
    else if ( ADDR( op)) {
        @R233( 0x0, reg, 0x6);
        @reloc2( %$(op->lab), %$(op->off), !PC_REL);
    }
    else if ( strcmp( op->expr, "0") == 0)
        switch( op->reg) {
        case SI : @R233( 0x0, %d(reg), 0x4);
                  break;

        case DI : @R233( 0x0, %d(reg), 0x5);
                  break;

        case BP : @R233( 0x1, %d(reg), 0x6);
                  @text1( 0);
                  break;

        case BX : @R233( 0x0, %d(reg), 0x7);
                  break;

        default : fprint( STDERR, "Wrong index register %d\\\\n",
                          op->reg);
        }
    else {
        switch( op->reg) {
        case SI : @R233( 0x2, %d(reg), 0x4);
                  break;

        case DI : @R233( 0x2, %d(reg), 0x5);
                  break;

        case BP : @R233( 0x2, %d(reg), 0x6);
                  break;

        case BX : @R233( 0x2, %d(reg), 0x7);
                  break;

        default : fprint( STDERR, "Wrong index register %d\\\\n",
                          op->reg);
        }
        @text2( %$(op->expr));
    }
}
\fR
.DE
.nr PS 12
.nr VS 20
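.LP
The excerpt above relies on helpers that are not shown, such as is_reg() and
R233(). The following C sketch gives one possible is_reg() for the 16-bit 8086
registers, together with a function r233_byte() that packs fields of 2, 3 and
3 bits into one byte, as the mod-reg-r/m byte built by mod_RM() requires.
Both are assumptions: the register numbers are the standard 8086 encodings,
r233_byte() is a made-up name, and it is presumed (not stated here) that
R233() emits such a byte with @text1. The struct t_operand is copied from 'as.h'.
.DS L
\f5
#include <string.h>

/* Standard 8086 numbers of the 16-bit registers (reg field) */
#define AX 0
#define CX 1
#define DX 2
#define BX 3
#define SP 4
#define BP 5
#define SI 6
#define DI 7

/* as in as.h */
struct t_operand {
    unsigned type;
    int reg;
    char *expr, *lab, *off;
};

static const char *reg_names[] = {
    "ax", "cx", "dx", "bx", "sp", "bp", "si", "di"
};

/* Hypothetical is_reg(): if str names a 16-bit register, store its
 * number in op->reg and return 1, otherwise return 0. */
int is_reg(const char *str, struct t_operand *op)
{
    int i;

    for (i = 0; i < 8; i++)
        if (strcmp(str, reg_names[i]) == 0) {
            op->reg = i;
            return 1;
        }
    return 0;
}

/* Pack fields of 2, 3 and 3 bits into one byte, most significant
 * first, as in the 8086 mod-reg-r/m byte. */
int r233_byte(int mod, int reg, int rm)
{
    return ((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7);
}
\fR
.DE
For instance, the BP case of mod_RM() with reg equal to 2 would correspond
to r233_byte( 0x1, 2, 0x6), that is 0x56, followed by the zero displacement byte.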
.LP
If the default assemble() function is unsatisfactory, one may put one's
own in the file 'as.c'; assemble() takes one string argument.
.NH 2
Generating assembly
.PP
It is possible to generate assembly instead of object files (see section 5), in
which case one does not have to supply 'as_table', 'as.h' and 'as.c'. This option
is useful for debugging the EM_table.
.NH 1
Building a ce
.PP
This section describes how to generate a code expander. The best way to
do this is to build it in two phases: in phase one, the EM_table is
written and tested; in phase two, the as_table is written and tested.
.NH 2
Phase one
.PP
The following steps describe how to make a
code expander that generates assembly instructions.
.IP \0\0-1
Create a new directory.
.IP \0\0-2
Create the 'EM_table', 'mach.h' and 'mach.c' files; there is no need
for 'as_table', 'as.h' and 'as.c' at this moment.
.IP \0\0-3
Type
.br
\f5
install_ceg -as
\fR
.br
install_ceg will create a Makefile and three directories: ceg, ce and back.
The directory ceg will contain the program ceg; this program will be
used to turn 'EM_table' into a set of C source files (in the ce directory),
one for each
EM-instruction. All these files will be compiled and put in a library called
\fBce.a\fR.
.br
The option \f5-as\fR means that a \fBback\fR-library will be generated (in the directory back) that
supports the generation of assembly language. The library is named 'back.a'.
.IP \0\0-4
Link a front end, 'ce.a' and 'back.a' together, resulting in a compiler.
.LP
Now the EM_table can be tested. If an error occurs, change the table
and type
.DS
\f5update\fR \fBC_instr\fR
.DE
where \fBC_instr\fR stands for the name of the erroneous EM-instruction.
.NH 2
Phase two
.PP
The next phase is to generate a \fBce\fR that produces relocatable object
code.
.IP \0\0-1
Remove the 'ce' and 'ceg' directories.
.IP \0\0-2
Write the 'as_table', 'as.h' and 'as.c' files.
.IP \0\0-3
Type
.br
\f5
install_ceg -obj
\fR
.br
The option \f5-obj\fR means that 'back.a' will contain a library for generating
NEW_A.OUT(5L) object files (see appendix B). If another 'back.a' is used,
omit the \f5-obj\fR flag.
.IP \0\0-4
Link a front end, 'ce.a' and 'back.a' together, resulting in a compiler.
.LP
The as_table is now ready to be tested. If an error occurs, change the table.
Then there are two ways to proceed:
.IP \0\0-1
Recompile the whole EM_table:
.br
\f5update ALL\fR
.br
.IP \0\0-2
Recompile just the few EM-instructions that contained the error:
.br
\f5update\fR \fBC_instr\fR
.br
where \fBC_instr\fR is an erroneous EM-instruction.
.NH
References
.PP
.IP \ \ 1:
PRINT(3ACK), an ACK manual page.
.IP \ \ 2:
EM_CODE(3L), an ACK manual page.
.IP \ \ 3:
NEW_A.OUT(5L), an ACK manual page.
.IP \ \ 4:
The C Programming Language, B.W. Kernighan and D.M. Ritchie.
.IP \ \ 5:
Description of a Machine Architecture for use with Block Structured
Languages (IR-81), A.S. Tanenbaum, H. van Staveren, E.G. Keizer and
J.W. Stevenson.
.bp
.SH
Appendix A, \fRthe \fBback\fR-primitives
.PP
This appendix describes the routines available to generate relocatable
object code. If the default back.a is used, the object code is in
ACK NEW_A.OUT(5L) format.
.nr PS 10
.nr VS 12
.PP
.IP A1.
Text and data generation; with ONE_BYTE b; TWO_BYTES w; FOUR_BYTES l; arith n;
.VS +4
.TS
tab(#);
l c lw(10c).
text1( b)#:#T{
Put one byte in the text-segment.
T}
text2( w)#:#T{
Put a word (two bytes) in the text-segment; the byte order is defined by
BYTES_REVERSED in mach.h.
T}
text4( l)#:#T{
Put a long (two words) in the text-segment; the word order is defined by
WORDS_REVERSED in mach.h.
T}
#
con1( b)#:#T{
Same for the CON-segment.
T}
con2( w)#:
con4( l)#:
#
rom1( b)#:#T{
Same for the ROM-segment.
T}
rom2( w)#:
rom4( l)#:
#
gen1( b)#:#T{
Same for the current segment; only to be used in the "..icon", "..ucon", etc.
pseudo EM-instructions.
T}
gen2( w)#:
gen4( l)#:
#
bss( n)#:#T{
Put n bytes with value BSS_INIT in the bss-segment.
T}
.TE
.VS -4
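.LP
The effect of BYTES_REVERSED and WORDS_REVERSED on text2() and text4() can be
sketched in C as follows. The buffer text_seg[] and the variable text_pos are
hypothetical, and it is assumed here that defining BYTES_REVERSED
(WORDS_REVERSED) in mach.h puts the most significant byte (word) first.
.DS L
\f5
/* Hypothetical text-segment buffer and current position */
static unsigned char text_seg[64];
static int text_pos;

void text1(int b)
{
    text_seg[text_pos++] = b & 0xff;
}

/* A word: byte order chosen by BYTES_REVERSED */
void text2(int w)
{
#ifdef BYTES_REVERSED
    text1(w >> 8);
    text1(w);
#else
    text1(w);
    text1(w >> 8);
#endif
}

/* A long: word order chosen by WORDS_REVERSED */
void text4(long l)
{
#ifdef WORDS_REVERSED
    text2((int)(l >> 16));
    text2((int)l);
#else
    text2((int)l);
    text2((int)(l >> 16));
#endif
}
\fR
.DE
With neither macro defined, text4( 0x12345678) stores the bytes 0x78, 0x56,
0x34 and 0x12, in that order.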
.IP A2.
Relocation; with char *s; arith o; int r;
.VS +4
.TS
tab(#);
l c lw(10c).
reloc1( s, o, r)#:#T{
Generates relocation information for 1 byte in the current segment.
T}
##s\0:\0the string (name) which must be relocated
##o\0:\0the offset in bytes from the string
##T{
r\0:\0relocation type. It can have the values ABSOLUTE or PC_REL. These
two constants are defined in the file 'back.h'.
T}
reloc2( s, o, r)#:#T{
Generates relocation information for 1 word in the
current segment. Byte order according to BYTES_REVERSED in mach.h.
T}
reloc4( s, o, r)#:#T{
Generates relocation information for 1 long in the
current segment. Word order according to WORDS_REVERSED in mach.h.
T}
.TE
.VS -4
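.LP
A simplified model of reloc2() is sketched below in C. The record layout, the
buffer and the values given to ABSOLUTE and PC_REL are assumptions (the real
constants are defined in 'back.h'). The sketch also assumes that the offset o is
kept in the relocated bytes themselves, low byte first (BYTES_REVERSED undefined),
and that the address of s is added to it later by the linker.
.DS L
\f5
/* Assumed values; the real ones are defined in back.h */
#define ABSOLUTE 0
#define PC_REL   1

#define MAX_RELOCS 16

/* Hypothetical relocation record */
struct reloc_rec {
    const char *name;   /* symbol the bytes are relocated against */
    long where;         /* position of the bytes in the segment */
    int type;           /* ABSOLUTE or PC_REL */
};

static unsigned char seg[64];       /* the current segment */
static long seg_pos;                /* current position in it */
static struct reloc_rec relocs[MAX_RELOCS];
static int nrelocs;

/* Sketch of reloc2( s, o, r): record that the word at the current
 * position refers to s, and store the offset o in that word,
 * low byte first. */
void reloc2(const char *s, long o, int r)
{
    if (nrelocs < MAX_RELOCS) {
        relocs[nrelocs].name = s;
        relocs[nrelocs].where = seg_pos;
        relocs[nrelocs].type = r;
        nrelocs++;
    }
    seg[seg_pos++] = o & 0xff;
    seg[seg_pos++] = (o >> 8) & 0xff;
}
\fR
.DE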
.IP A3.
Symbol table interaction; with int seg; char *s;
.VS +4
.TS
tab(#);
l c lw(10c).
switch_segment( seg)#:#T{
Sets the current segment to 'seg', and does alignment if necessary.
Seg can be one of the four constants defined in 'back.h': SEGTXT, SEGROM,
SEGCON, SEGBSS.
T}
#
symbol_definition( s)#:#T{
Define s in the symbol table.
T}
set_local_visible( s)#:#T{
Record scope information in the symbol table.
T}
set_global_visible( s)#:
.TE
.VS -4
.IP A4.
Start/end actions; with char *f;
.VS +4
.TS
tab(#);
l c lw(10c).
do_open( f)#:#T{
Directs output to file 'f'; if f is the null pointer, output goes to
standard output.
T}
output()#:#T{
End of the job: flush the output.
T}
do_close()#:#T{
Close the output stream.
T}
init_back()#:#T{
Only used with a user-written back-library; gives it the opportunity to initialize.
T}
end_back()#:#T{
Only used with a user-written back-library.
T}
.TE
.VS -4
.nr PS 12
.nr VS 14
.bp
.SH
Appendix B, description of ACK-a.out library
.PP
The object file produced by \fBce\fR is by default in ACK NEW_A.OUT(5L)
format. The object file consists of one header, followed by
four segment headers, followed by the text, data, relocation information,
symbol table and the string area. The object file is tuned for the ACK-LED,
so some special things are done just before the object file is dumped.
First, four relocation records containing the names of the four
segments are added. Second, all the local relocation is resolved. This is done by the
function do_relo(). If there is a record belonging to a local
name, its address is relocated in the segment to which the record belongs.
Besides doing the local relocation, do_relo() changes the 'nami'-field
of the local relocation records. This field receives the index of one of the
four relocation records belonging to a segment. After the local
relocation has been resolved, the routine output() dumps the ACK object file.
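.LP
The local relocation step can be modelled in C as follows. All types and
names here (struct symbol, struct relo, resolve_local()) are hypothetical
simplifications, not the real a.out structures or do_relo() itself: each
record points at a 2-byte field in the text segment (low byte first), the
offset of a local name within its segment is added to that field, and the
record's nami is redirected to the entry naming that segment. PC-relative
records and other field sizes are not treated separately.
.DS L
\f5
/* Hypothetical, simplified symbol and relocation records */
struct symbol {
    const char *name;
    int is_local;   /* nonzero for a local name */
    int seg;        /* segment (0..3) in which the name is defined */
    long value;     /* offset of the name within that segment */
};

struct relo {
    long addr;      /* position of a 2-byte field in the text segment */
    int nami;       /* index of the symbol the field refers to */
};

/* Resolve relocation against local names: add the name's offset to
 * the field and redirect nami to the symbol that names the segment;
 * seg_sym[k] is the index of the symbol for segment k. */
void resolve_local(unsigned char *text, struct relo *r, int nr,
                   const struct symbol *syms, const int *seg_sym)
{
    int i;

    for (i = 0; i < nr; i++) {
        const struct symbol *s = &syms[r[i].nami];
        long v;

        if (!s->is_local)
            continue;               /* left for the linker */
        v = text[r[i].addr] | (text[r[i].addr + 1] << 8);
        v += s->value;
        text[r[i].addr] = v & 0xff;
        text[r[i].addr + 1] = (v >> 8) & 0xff;
        r[i].nami = seg_sym[s->seg];
    }
}
\fR
.DE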
.LP
If a different a.out format is wanted, one can choose between three strategies:
.IP \ \ 1:
The simplest one is to use a conversion program, which converts the ACK
a.out format to the wanted a.out format. Such a program exists for almost
all machines on which ACK runs. The disadvantage is that the compiler
becomes slower.
.IP \ \ 2:
A better solution is to change the functions output(), do_relo(), do_open()
and do_close() in such a way
that they produce the wanted a.out format. This strategy saves a lot of I/O.
.IP \ \ 3:
If you are still not satisfied and have a lot of spare time, change the
\fBback\fR-primitives in such a way that they produce the wanted a.out format.