.nr PS 12
.nr VS 14
.nr LL 6i
.tr ~
.TL
The Code Expander Generator
.AU
Frans Kaashoek
Koen Langendoen
.AI
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands
.NH
Introduction
.PP
A \fBcode expander\fR (\fBce\fR for short) is a part of the
Amsterdam Compiler Kit (\fBACK\fR) and provides the user with
high-speed generation of medium-quality code. Although conceptually
equivalent to the more usual \fBcode generator\fR, it differs in some
aspects.
.PP
Normally, a program to be compiled with \fBACK\fR
is first fed to the preprocessor. The output of the preprocessor goes
into the appropriate front end, which produces EM
.[
Tanenbaum toolkit
.]
(a
machine-independent, low-level intermediate code). The generated EM code is fed
into the peephole optimizer, which scans it with a window of a few instructions,
replacing certain inefficient code sequences by better ones. After the
peephole optimizer a back end follows, which produces high-quality assembly code.
The assembly code goes via the target optimizer into the assembler and the
object code then goes into the
linker/loader, the final component in the pipeline.
.PP
For various applications
this scheme is too slow. When debugging, for example,
compile time is more important than execution time of a program.
For this purpose a new scheme is introduced:
.IP \ \ 1:
The code generator and assembler are
replaced by a library, the \fBcode expander\fR, consisting of a set of
routines, one for every EM-instruction. Each routine expands its EM-instruction
into relocatable object code. In contrast, the usual ACK code generator uses
expensive pattern matching on sequences of EM-instructions.
The peephole and target optimizer are not used.
.IP \ \ 2:
These routines replace the usual EM-generating routines in the front end; this
eliminates the overhead of intermediate files.
.LP
This results in a fast compiler producing an object file, ready to be
linked and loaded, at the cost of unoptimized object code.
.PP
Because of the
simple nature of the code expander, it is much easier to build, to debug, and to
test. Experience has demonstrated that a code expander can be constructed,
debugged, and tested in less than two weeks.
.PP
This document describes the tools for automatically generating a
\fBce\fR (a library of C files) from two tables and
a few machine-dependent functions.
A thorough knowledge of EM is necessary to understand this document.
.NH
The code expander generator
.PP
The code expander generator (\fBceg\fR) generates a code expander from
two tables and a few machine-dependent functions. This section explains how
\fBceg\fR works. The first half describes the transformations that are done on
the two tables. The
second half tells how these transformations are done by \fBceg\fR.
.PP
A code expander consists of a set of routines that convert EM-instructions
directly to relocatable object code. These routines are called by a front
end through the EM_CODE(3ACK)
.[
EM_CODE
.]
interface. To free the table writer from the burden of building
an object file, we supply a set of routines that build an object file
in the ACK.OUT(5ACK)
.[
aout
.]
format (see appendix B). This set of routines is called
the
\fBback\fR-primitives (see appendix A). In short, a code expander consists of a
set of routines that map the EM_CODE interface onto the
\fBback\fR-primitives interface.
.PP
To avoid repetition of the same sequences of
\fBback\fR-primitives in different
EM-instructions
and to improve readability, the EM-to-object information must be supplied in
two
tables. The EM_table maps EM to an assembly language, and the as_table
maps
assembly code to \fBback\fR-primitives. The assembly language is chosen by the
table writer. It can either be an actual assembly language or an ad-hoc
language of his own design.
.LP
The following picture shows the dependencies between the different components:
.sp
.PS
linewid = 0.5i
A: line down 2i
B: line down 2i with .start at A.start + (1.5i, 0)
C: line down 2i with .start at B.start + (1.5i, 0)
D: arrow right with .start at A.center - (0.25i, 0)
E: arrow right with .start at B.center - (0.25i, 0)
F: arrow right with .start at C.center - (0.25i, 0)
"EM_CODE(3ACK)" at A.start above
"EM_table" at B.start above
"as_table" at C.start above
"source language " at D.start rjust
"EM" at 0.5 of the way between D.end and E.start
G: "assembly" at 0.5 of the way between E.end and F.start
H: " back primitives" at F.end ljust
"(user defined)" at G - (0, 0.2i)
"    (ACK.OUT)" at H - (0, 0.2i) ljust
.PE
.PP
The picture suggests that, during compilation, the EM instructions are
first transformed into assembly instructions and then the assembly instructions
are transformed into object-generating calls. This
is not what happens in practice, although the user is free to think it does.
In fact, the EM_table and the as_table are combined at code
expander generation time, yielding an imaginary compound table that results in
routines from the EM_CODE interface that generate object code directly.
.PP
As already indicated, the compound table does not actually exist. Instead, each
assembly instruction in the as_table is converted to a routine that generates C
.[
Kernighan
.]
code to call the \fBback\fR-primitives. The EM_table is
converted into a program that for each EM instruction generates a routine,
using the routines generated from the as_table. Execution of the latter program
will then generate the code expander.
.PP
This scheme allows great flexibility
in the table writing, while still
resulting in a very efficient code expander. One implication is that the
as_table is interpreted twice and the EM_table only once. This has consequences
for their structure.
.PP
To illustrate what happens, we give an example. The example is an entry in
the tables for the VAX-machine. The assembly language chosen is a subset of the
VAX assembly language.
.PP
One of the most fundamental operations in EM is ``loc c'', which loads the value of c
onto the stack. To expand this instruction the
tables contain the following information:
.DS
EM_table : \f5
   C_loc   ==>   "pushl $$$1".
         /* $1 refers to the first argument of C_loc.
          * $$ is a quoted $. */

\fRas_table :\f5
   pushl  src : CONST  ==>
         @text1( 0xd0);
         @text1( 0xef);
         @text4( %$( src->num)).
\fR
.DE
.LP
The as_table is transformed into the following routine:
.DS
\f5
pushl_instr(src)
t_operand *src;
/* ``t_operand'' is a struct defined by the
 * table writer. */
{
    printf("swtxt();");
    printf("text1( 0xd0 );");
    printf("text1( 0xef );");
    printf("text4(%s);", substitute_dollar( src->num));
}
\fR
.DE
Using ``pushl_instr()'', the following routine is generated from the EM_table:
.DS
\f5
C_loc( c)
arith c;
/* text1() and text4() are library routines that fill the
 * text segment. */
{
    swtxt();
    text1( 0xd0);
    text1( 0xef);
    text4( c);
}
\fR
.DE
.LP
A compiler call to ``C_loc()'' will cause the 1-byte numbers ``0xd0''
and ``0xef''
and the 4-byte value of the variable ``c'' to be stored in the text segment.
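.PP
The effect of such a call can be tried out with a toy model of the text
segment. The following is only a sketch: the buffer ``text_seg'', its length
counter, and the bodies of swtxt(), text1(), and text4() are assumptions made
for illustration and are not the real \fBback\fR-primitives. The byte order
used is least significant byte first, the default mentioned in section 3.3.

```c
#include <stddef.h>

typedef long arith;                 /* assumption: stand-in for EM's arith */

static unsigned char text_seg[64];  /* toy text segment (hypothetical) */
static size_t text_len = 0;

/* Hypothetical stand-in: selects the text segment; nothing to do here. */
static void swtxt(void)
{
}

/* Hypothetical stand-in: appends one byte to the toy text segment. */
static void text1(int b)
{
    if (text_len < sizeof text_seg)
        text_seg[text_len++] = (unsigned char)(b & 0xff);
}

/* Hypothetical stand-in: appends four bytes, least significant first. */
static void text4(arith v)
{
    unsigned long u = (unsigned long)v;
    int i;

    for (i = 0; i < 4; i++)
        text1((int)((u >> (8 * i)) & 0xff));
}

/* The routine generated from the EM_table, as shown above. */
void C_loc(arith c)
{
    swtxt();
    text1(0xd0);
    text1(0xef);
    text4(c);
}
```

In this model a call ``C_loc(5)'' leaves six bytes in the buffer: the two
opcode bytes followed by the four bytes of the constant.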
.PP
The transformations on the tables are done automatically by the code expander
generator.
The code expander generator is made up of two tools:
\fBemg\fR and \fBasg\fR. \fBAsg\fR
transforms
each assembly instruction into a C routine. These C routines generate calls
to the \fBback\fR-primitives. The generated C routines are used
by \fBemg\fR to generate the actual code expander from the EM_table.
.PP
The link between \fBemg\fR and \fBasg\fR is an assembly language.
We did not enforce a specific syntax for the assembly language;
instead we have given the table writer the freedom
to make an ad-hoc assembly language or to use an actual assembly language
suitable for his purpose. Apart from greater flexibility this
has another advantage: if the table writer adopts the assembly language that
runs on the machine at hand, he can test the EM_table independently from the
as_table. Of course there is a price to pay: the table writer has to
do the decoding of the operands himself. See section 4 for more details.
.PP
Before we describe the structure of the tables in detail, we will give
an overview of the four main phases.
.IP "phase 1:"
.br
The as_table is transformed by \fBasg\fR. This results in a set of C routines,
one for each assembly opcode. Note that a call to such a
routine does not generate the corresponding object code; it generates C code,
which, when executed, generates the desired object code.
.IP "phase 2:"
.br
The C routines generated by \fBasg\fR are used by \fBemg\fR to expand the EM_table.
This
results in a set of C routines, the code expander, which conform to the
procedural interface EM_CODE(3ACK). A call to such a routine does indeed
generate the desired object code.
.IP "phase 3:"
.br
The front end that uses the procedural interface is linked/loaded with the
code expander generated in phase 2 and the \fBback\fR-primitives (a supplied
library). This results in a compiler.
.IP "phase 4:"
.br
The compiler runs. The routines in the code expander are
executed and produce object code.
.RE
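.PP
The difference between phase 1 and phase 2 can be made concrete with a small
sketch. It assumes a much simplified ``t_operand'' with a single string field,
a simplified substitute_dollar() that only knows ``$1'', and a string buffer
``out'' in place of the generated C file; emit() and expand_C_loc() are
hypothetical helpers, not part of \fBasg\fR or \fBemg\fR.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, minimal operand record; the real one is table-writer defined. */
typedef struct {
    const char *num;
} t_operand;

static char out[256];   /* collects the generated C text */

/* Hypothetical helper: append a piece of generated C text to the buffer. */
static void emit(const char *s)
{
    strncat(out, s, sizeof out - strlen(out) - 1);
}

/* Assumed simplification: "$1" becomes the name of the first parameter. */
static const char *substitute_dollar(const char *s)
{
    return strcmp(s, "$1") == 0 ? "c" : s;
}

/* Phase 1 product: a routine that writes C code, not object code. */
static void pushl_instr(const t_operand *src)
{
    char buf[64];

    emit("swtxt();");
    emit("text1( 0xd0 );");
    emit("text1( 0xef );");
    snprintf(buf, sizeof buf, "text4(%s);", substitute_dollar(src->num));
    emit(buf);
}

/* Phase 2, sketched: wrap the emitted calls in the body of C_loc(). */
static void expand_C_loc(void)
{
    t_operand src = { "$1" };

    emit("C_loc( c) arith c; {");
    pushl_instr(&src);
    emit("}");
}
```

After expand_C_loc() has run, ``out'' holds the text of a C_loc() routine
whose body consists of calls to the \fBback\fR-primitives; only when that
text is compiled and called (phases 3 and 4) is object code produced.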
.NH
Description of the EM_table
.PP
This section describes the EM_table. It contains four subsections.
The first three subsections describe the syntax of the EM_table,
the
semantics of the EM_table, and the functions and
constants that must be present in the EM_table, in the file ``mach.c'' or in
the file ``mach.h''. The last subsection explains how a table writer can generate
assembly code instead of object code. The subsection on
semantics contains many examples.
.NH 2
Grammar
.PP
The following grammar describes the syntax of the EM_table.
.VS +4
.TS
center tab(%);
l c l.
TABLE%::=%( RULE)*
RULE%::=%C_instr ( COND_SEQUENCE | SIMPLE)
COND_SEQUENCE%::=%( condition SIMPLE)* ``default'' SIMPLE
SIMPLE%::=% ``==>'' ACTION_LIST
ACTION_LIST%::=%[ ACTION ( ``;'' ACTION)* ] ``.''
ACTION%::=%AS_INSTR
%|%function-call
AS_INSTR%::=%``"'' [ label ``:''] [ INSTR] ``"''
INSTR%::=%mnemonic [ operand ( ``,'' operand)* ]
.TE
.VS -4
.PP
The ``('' ``)'' brackets are used for grouping, ``['' ... ``]''
means that ... occurs zero or one time,
a ``*'' means zero or more times, and
a ``|'' means
a choice between left or right. A \fBC_instr\fR is
a name in the EM_CODE(3ACK) interface. \fBcondition\fR is a C expression.
\fBfunction-call\fR is a call of a C function. \fBlabel\fR, \fBmnemonic\fR,
and \fBoperand\fR are arbitrary strings. If an \fBoperand\fR
contains brackets, the
brackets must match. There is an upper bound on the number of
operands; the maximum number is defined by the constant MAX_OPERANDS in the
file ``const.h'' in the directory assemble.c. Comments in the table should be
placed between ``/*'' and ``*/''.
The table is processed by the C preprocessor before being parsed by
\fBemg\fR.
.NH 2
Semantics
.PP
The EM_table is processed by \fBemg\fR. \fBEmg\fR generates a C function
for every instruction in the EM_CODE(3ACK) interface.
For every EM-instruction not mentioned in the EM_table, a
C function that prints an error message is generated.
It is possible to divide the EM_CODE(3ACK)-interface into four parts:
.IP \0\01:
text instructions (e.g., C_loc, C_adi, ..)
.IP \0\02:
pseudo instructions (e.g., C_open, C_df_ilb, ..)
.IP \0\03:
storage instructions (e.g., C_rom_icon, ..)
.IP \0\04:
message instructions (e.g., C_mes_begin, ..)
.LP
This section starts by giving the semantics of the grammar. The examples
are text instructions. The section ends with remarks on the pseudo
instructions and the storage instructions. Since message instructions are not
useful for a code expander, they are ignored.
.PP
.NH 3
Actions
.PP
The EM_table is made up of rules describing how to expand a \fBC_instr\fR
defined by the EM_CODE(3ACK)-interface (corresponding
to an EM instruction) into actions.
There are two kinds of actions: assembly instructions and C function calls.
An assembly instruction is defined as a mnemonic followed by zero or more
operands separated by commas. The semantics of an assembly instruction is
defined by the table writer. When the assembly language is not expressive
enough, function calls can be made as an escape route. However, this
reduces
the speed of the actual code expander. Finally, actions can be grouped into
a list of actions; actions are separated by a semicolon and terminated
by a ``.''.
.DS
\f5
C_nop   ==>   .
      /* Empty action list : no operation. */

C_inc   ==>   "incl (sp)".
      /* Assembler instruction, which is evaluated
       * during expansion of the EM_table */

C_slu   ==>   C_sli( $1).
      /* Function call, which is evaluated during
       * execution of the compiler. */
\fR
.DE
.NH 3
Labels
.PP
Since an assembly language without instruction labels is a rather weak
language, labels inside a contiguous block of assembly instructions are
allowed. When using labels two rules must be observed:
.IP \0\01:
The name of a label should be unique inside an action list.
.IP \0\02:
The labels used in an assembler instruction should be defined in the same
action list.
.LP
The following example illustrates the usage of labels.
.DS
\f5
/* Compare the two top elements on the stack. */
C_cmp   ==>   "pop bx";
              "pop cx";
              "xor ax, ax";
              "cmp cx, bx";
              /* Forward jump to local label */
              "je 2f";
              "jb 1f";
              "inc ax";
              "jmp 2f";
              "1: dec ax";
              "2: push ax".
\fR
.DE
We will come back to labels in the section on the as_table.
.NH 3
Arguments of an EM instruction
.PP
In most cases the translation of a \fBC_instr\fR depends on its arguments.
The arguments of a \fBC_instr\fR are numbered from 1 to \fIn\fR, where \fIn\fR
is the
total number of arguments of the current \fBC_instr\fR (there are a few
exceptions, see Implicit arguments). The table writer may
refer to an argument as $\fIi\fR. If a plain $-sign is needed in an
assembly instruction, it must be preceded by an extra $-sign.
.PP
There are two groups of \fBC_instr\fRs whose arguments are handled specially:
.RS
.IP "1: Instructions dealing with local offsets"
.br
The value of the $\fIi\fR argument referring to a parameter ($\fIi\fR >= 0)
is increased by ``EM_BSIZE''. ``EM_BSIZE'' is the size of the return status block
and must be defined in the file ``mach.h'' (see section 3.3). For example:
.DS
\f5
C_lol   ==>   "push $1(bp)".
      /* automatic conversion of $1 */
\fR
.DE
.IP "2: Instructions using global names or instruction labels"
.br
All the arguments referring to global names or instruction labels will be
transformed into a unique assembly name. To prevent name clashes with library
names the table writer has to provide the
conversions in the file ``mach.h''. For example:
.DS
\f5
C_bra   ==>   "jmp $1".
      /* automatic conversion of $1 */
      /* type arith is converted to string */
\fR
.DE
.RE
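.PP
The two automatic conversions can be pictured as follows. This is a sketch
under stated assumptions: the helper names local_offset(), global_name(), and
instr_label() are hypothetical, the value 4 for ``EM_BSIZE'' is chosen only
for illustration, and the guess that the ``%d'' in ``ILB_FMT'' receives a
procedure number while ``%ld'' receives the label number is ours. The format
strings themselves are the ones from the ``mach.h'' example in section 3.3.

```c
#include <stdio.h>

typedef long arith;              /* assumption: stand-in for EM's arith */

#define EM_BSIZE  4              /* illustrative value only */
#define NAME_FMT  "_%s"          /* as in the mach.h example */
#define ILB_FMT   "I%03d%ld"     /* as in the mach.h example */

/* Offsets >= 0 refer to parameters and are moved past the base block;
 * negative offsets (locals) are left alone. */
static arith local_offset(arith off)
{
    return off >= 0 ? off + EM_BSIZE : off;
}

/* Turn a global name into a unique assembly name. */
static void global_name(char *buf, size_t n, const char *name)
{
    snprintf(buf, n, NAME_FMT, name);
}

/* Turn an instruction label into a unique assembly name.
 * Assumption: proc_nr distinguishes labels of different procedures. */
static void instr_label(char *buf, size_t n, int proc_nr, long label)
{
    snprintf(buf, n, ILB_FMT, proc_nr, label);
}
```

With these definitions, the ``$1'' in the C_lol rule would be replaced by the
adjusted offset, and the ``$1'' in the C_bra rule by a string such as the one
produced by instr_label().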
.NH 3
Conditionals
.PP
The rules in the EM_table can be divided into two groups: simple rules and
conditional rules. The simple rules are made up of a \fBC_instr\fR followed by
a list of actions, as described above. The conditional rules (COND_SEQUENCE)
allow the table writer to select an action list depending on the value of
a condition.
.PP
A COND_SEQUENCE is a list of boolean expressions, each with its corresponding
simple rule. If
an expression evaluates to true then the corresponding simple rule is carried
out. If more than one condition evaluates to true, the first one is chosen.
The last case of a COND_SEQUENCE of a \fBC_instr\fR must handle
the default case.
The boolean expressions in a COND_SEQUENCE must be C expressions. Besides the
ordinary C operators and constants, $\fIi\fR references can be used
in an expression.
.DS
\f5
/* Load address of LB $1 levels back. */
C_lxl
   $1 == 0   ==>   "pushl fp".
   $1 == 1   ==>   "pushl 4(ap)".
   default   ==>   "movl $$$1, r0";
                   "jsb .lxl";
                   "pushl r0".
\fR
.DE
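.PP
Conceptually, a COND_SEQUENCE corresponds to an if-else chain in the
generated C routine, tested in table order with the default case last. The
sketch below is an illustration only: it records the assembly text of the
selected action list in a buffer, whereas a real code expander would contain
the calls to \fBback\fR-primitives derived from the as_table. The buffer
``asm_out'' and the helper gen_asm() are hypothetical; instructions are
separated by ``;'' in the buffer.

```c
#include <stdio.h>
#include <string.h>

typedef long arith;          /* assumption: stand-in for EM's arith */

static char asm_out[256];    /* hypothetical: collects the chosen actions */

/* Hypothetical helper: append one assembly instruction and a ';'. */
static void gen_asm(const char *line)
{
    strncat(asm_out, line, sizeof asm_out - strlen(asm_out) - 1);
    strncat(asm_out, ";", sizeof asm_out - strlen(asm_out) - 1);
}

/* Sketch of the routine for the C_lxl rule shown above. */
void C_lxl(arith n)
{
    char buf[64];

    if (n == 0) {
        gen_asm("pushl fp");
    } else if (n == 1) {
        gen_asm("pushl 4(ap)");
    } else {                 /* the mandatory default case */
        snprintf(buf, sizeof buf, "movl $%ld, r0", (long)n);
        gen_asm(buf);
        gen_asm("jsb .lxl");
        gen_asm("pushl r0");
    }
}
```

Because the conditions are tested in order, a table writer can put the
cheapest special cases first and leave the general sequence to the default.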
.NH 3
Abbreviations
.PP
EM instructions with an external as an argument come in three variants in
the EM_CODE(3ACK) interface. In most cases it will be possible to take
these variants together. For this purpose the ``..'' notation is introduced.
For the code expander there is no difference between the
following instructions.
.DS
\f5
C_loe_dlb    ==>   "pushl $1 + $2".
C_loe_dnam   ==>   "pushl $1 + $2".
C_loe        ==>   "pushl $1 + $2".
\fR
.DE
So they can be written in the following way.
.DS
\f5
C_loe..      ==>   "pushl $1 + $2".
\fR
.DE
.NH 3
Implicit arguments
.PP
In the last example ``C_loe'' has two arguments, but in the EM_CODE interface
it has one argument. This argument depends on the current ``hol''
block; in the EM_table this is made explicit. Every \fBC_instr\fR whose
argument depends on a ``hol'' block has one extra argument; argument 1 refers
to the ``hol'' block.
.NH 3
Pseudo instructions
.PP
Most pseudo instructions are machine independent and are provided
by \fBceg\fR. The table writer has only to supply the following functions,
which are used to build a stack frame:
.DS
\f5
prolog()
/* Performs the prolog, for example save
 * return address */
locals( n)
arith n;
/* Allocate n bytes for locals on the stack */
jump( label)
char *label;
/* Generates code for a jump to ``label'' */
\fR
.DE
.LP
These functions can be defined in ``mach.c'' or in the EM_table (see
section 3.3).
.NH 3
Storage instructions
.PP
The storage instructions ``C_bss_\fIcstp()\fR'', ``C_hol_\fIcstp()\fR'',
``C_con_\fIcstp()\fR'', and ``C_rom_\fIcstp()\fR'', except for the instructions
dealing with constants of type string (C_..._icon, C_..._ucon, C_..._fcon), are
generated automatically. No information is needed in the table.
To generate the C_..._icon, C_..._ucon, C_..._fcon instructions
\fBceg\fR only has to know how to convert a number of type string to bytes;
this can be defined with the constants ONE_BYTE, TWO_BYTES, and FOUR_BYTES.
C_rom_icon, C_con_icon, C_bss_icon, C_hol_icon can be abbreviated by ..icon.
This also holds for ..ucon and ..fcon.
For example:
.DS
\f5
\\.\\.icon
   $2 == 1   ==>   gen1( (ONE_BYTE) atoi( $1)).
   $2 == 2   ==>   gen2( (TWO_BYTES) atoi( $1)).
   $2 == 4   ==>   gen4( (FOUR_BYTES) atoi( $1)).
   default   ==>   arg_error( "..icon", $2).
\fR
.DE
Gen1(), gen2(), and gen4() are \fBback\fR-primitives (see appendix A), and
generate one, two, or four byte constants. Atoi() is a C library function that
converts strings to integers.
The constants ``ONE_BYTE'', ``TWO_BYTES'', and ``FOUR_BYTES'' must be defined in
the file ``mach.h''.
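.PP
The ..icon rule above amounts to a dispatch on the size argument. The
following sketch shows one way this could look in C. It is not the code that
\fBceg\fR generates: the buffer ``seg'', the helper put_byte(), the function
icon(), and the bodies of gen1(), gen2(), and gen4() are assumptions made
for illustration, typedefs are used where ``mach.h'' would use #define, and
bytes are written least significant first.

```c
#include <stdlib.h>
#include <stddef.h>

/* Assumed choices, matching the vax4 example of mach.h. */
typedef char  ONE_BYTE;
typedef short TWO_BYTES;
typedef long  FOUR_BYTES;

static unsigned char seg[16];   /* toy data segment (hypothetical) */
static size_t seg_len = 0;

/* Hypothetical helper: store the low 8 bits of b. */
static void put_byte(unsigned long b)
{
    if (seg_len < sizeof seg)
        seg[seg_len++] = (unsigned char)(b & 0xff);
}

/* Hypothetical stand-ins for the back-primitives gen1(), gen2(), gen4(). */
static void gen1(ONE_BYTE v)
{
    put_byte((unsigned long)v);
}

static void gen2(TWO_BYTES v)
{
    unsigned long u = (unsigned long)v;

    put_byte(u);
    put_byte(u >> 8);
}

static void gen4(FOUR_BYTES v)
{
    unsigned long u = (unsigned long)v;
    int i;

    for (i = 0; i < 4; i++)
        put_byte(u >> (8 * i));
}

/* The ..icon rule as a function: val is the constant in string form,
 * size its size in bytes. Returns 0, or -1 where the table would call
 * arg_error(). */
static int icon(const char *val, int size)
{
    switch (size) {
    case 1:  gen1((ONE_BYTE) atoi(val));   return 0;
    case 2:  gen2((TWO_BYTES) atoi(val));  return 0;
    case 4:  gen4((FOUR_BYTES) atoi(val)); return 0;
    default: return -1;
    }
}
```

For example, icon("258", 2) stores the two bytes 0x02 and 0x01 in this model.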
.NH 2
User supplied definitions and functions
.PP
If the table writer uses all the default functions he has only to supply
the following constants and functions:
.TS
tab(#);
l c lw(10c).
prolog()#:#T{
Do prolog
T}
jump( l)#:#T{
Perform a jump to label l
T}
locals( n)#:#T{
Allocate n bytes on the stack
T}
#
NAME_FMT#:#T{
Print format describing name to a unique name conversion. The format must
contain %s.
T}
DNAM_FMT#:#T{
Print format describing data-label to a unique name conversion. The format
must contain %s.
T}
DLB_FMT#:#T{
Print format describing numerical-data-label to a unique name conversion.
The format must contain %ld.
T}
ILB_FMT#:#T{
Print format describing instruction-label to a unique name conversion.
The format must contain %d followed by %ld.
T}
HOL_FMT#:#T{
Print format describing hol-block-number to a unique name conversion.
The format must contain %d.
T}
#
EM_WSIZE#:#T{
Size of a word in bytes on the target machine
T}
EM_PSIZE#:#T{
Size of a pointer in bytes on the target machine
T}
EM_BSIZE#:#T{
Size of base block in bytes on the target machine
T}
#
ONE_BYTE#:#T{
\\C type that occupies one byte on the machine where the \fBce\fR runs
T}
TWO_BYTES#:#T{
\\C type that occupies two bytes on the machine where the \fBce\fR runs
T}
FOUR_BYTES#:#T{
\\C type that occupies four bytes on the machine where the \fBce\fR runs
T}
#
BSS_INIT#:#T{
The default value that the loader puts in the bss segment
T}
#
BYTES_REVERSED#:#T{
Must be defined if you want the byte order reversed.
By default the least significant byte is output first.\fR\(dg
.FS
\fR\(dg When both byte orders are used, for
example on the NS 16032, the table writer has to
supply his own set of routines.
.FE
T}
WORDS_REVERSED#:#T{
Must be defined if you want the word order reversed.
By default the least significant word is output first.
T}
.TE
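.PP
The effect of the two byte-order constants on a four-byte value can be
sketched as follows. This is our reading of the descriptions in the table,
not the library code: the functions put2() and put4() are hypothetical, and
the two compile-time constants are modelled as run-time flags so that all
combinations can be tried in one program.

```c
/* Store a 16-bit word as two bytes; low byte first unless reversed. */
static void put2(unsigned w, int bytes_reversed, unsigned char *out)
{
    unsigned lo = w & 0xff;
    unsigned hi = (w >> 8) & 0xff;

    out[0] = (unsigned char)(bytes_reversed ? hi : lo);
    out[1] = (unsigned char)(bytes_reversed ? lo : hi);
}

/* Store a 32-bit value as two words; low word first unless reversed. */
static void put4(unsigned long v, int bytes_reversed, int words_reversed,
                 unsigned char *out)
{
    unsigned lo = (unsigned)(v & 0xffff);
    unsigned hi = (unsigned)((v >> 16) & 0xffff);

    put2(words_reversed ? hi : lo, bytes_reversed, out);
    put2(words_reversed ? lo : hi, bytes_reversed, out + 2);
}
```

Under this reading, leaving both constants undefined gives a little-endian
layout, and defining both gives a big-endian layout.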
.LP
An example of the file ``mach.h'' for the vax4:
.TS
tab(:);
l l l.
#define : ONE_BYTE : char
#define : TWO_BYTES : short
#define : FOUR_BYTES : long
:
#define : EM_WSIZE : 4
#define : EM_PSIZE : 4
#define : EM_BSIZE : 0
:
#define : BSS_INIT : 0
:
#define : NAME_FMT : "_%s"
#define : DNAM_FMT : "_%s"
#define : DLB_FMT : "_%ld"
#define : ILB_FMT : "I%03d%ld"
#define : HOL_FMT : "hol%d"
.TE
Notice that EM_BSIZE is zero. The vax ``call'' instruction automatically takes
care of the base block.
.PP
There are three primitives that have to be defined by the table writer, either
as functions in the file ``mach.c'' or as rules in the EM_table.
For example, for the 8086 they look like this:
.DS
\f5
jump     ==>   "jmp $1".
prolog   ==>   "push bp";
               "mov bp, sp".
locals
   $1 == 0   ==>   .
   $1 == 2   ==>   "push ax".
   $1 == 4   ==>   "push ax";
                   "push ax".
   default   ==>   "sub sp, $1".
\fR
.DE
.NH 2
Generating assembly code
.PP
When the code expander generator is used for generating assembly instead of
object code (see section 5), not all the above mentioned constants
and functions have to
be defined. In this
case, the constants ``BYTES_REVERSED'' and ``WORDS_REVERSED'' are not used.
.NH 1
Description of the as_table
.PP
This section describes the as_table. Like the previous section, it is divided
into
four parts: the first two parts describe the grammar and the semantics of the
as_table; the third part gives an overview
of the functions and the constants that must be present in the as_table (in
the file ``as.h'' or in the file ``as.c''); the last part describes the case when
assembly is generated instead of object code.
The part on semantics contains examples that appear in the as_table for the
VAX or for the 8086.
.NH 2
Grammar
.PP
The form of the as_table is given by the following grammar:
.VS +4
.TS
center tab(#);
l c l.
TABLE#::=#( RULE)*
RULE#::=#( mnemonic | ``...'') DECL_LIST ``==>'' ACTION_LIST
DECL_LIST#::=#DECLARATION ( ``,'' DECLARATION)*
DECLARATION#::=#operand [ ``:'' type]
ACTION_LIST#::=#ACTION ( ``;'' ACTION)* ``.''
ACTION#::=#IF_STATEMENT
#|#function-call
#|#``@''function-call
IF_STATEMENT#::=#``@if'' ``('' condition ``)'' ACTION_LIST
##( ``@elsif'' ``('' condition ``)'' ACTION_LIST)*
##[ ``@else'' ACTION_LIST]
##``@fi''
function-call#::=#function-identifier ``('' [arg (,arg)*] ``)''
arg#::=#argument
#|#reference
.TE
.VS -4
.LP
\fBmnemonic\fR, \fBoperand\fR, and \fBtype\fR are all C identifiers;
\fBcondition\fR is a normal C expression;
\fBfunction-call\fR must be a C function call. A function can be called with
standard C arguments or with a reference (see section 4.2.4).
Since the as_table is
interpreted during code expander generation as well as during code
expander execution, two levels of calls are present in it. A ``function-call''
is done during code expander generation, a ``@function-call'' during code
expander execution.
.NH 2
Semantics
.PP
The as_table is made up of rules that map assembly instructions onto
\fBback\fR-primitives, a set of functions that construct an object file.
The table is processed by \fBasg\fR, which generates a C function
for each assembler mnemonic. The names of
these functions are the assembler mnemonics postfixed
with ``_instr'' (e.g., ``add'' becomes ``add_instr()''). These functions
will be used by the function
assemble() during the expansion of the EM_table.
After explaining the semantics of the as_table the function
assemble() will be described.
.NH 3
Rules
.PP
A rule in the as_table is made up of a left and a right hand side;
the left hand side describes an assembler
instruction (mnemonic and operands); the
right hand side gives the corresponding actions as \fBback\fR-primitives or as
functions defined by the table writer, which call \fBback\fR-primitives.
Two simple examples from the VAX as_table and the 8086 as_table, respectively:
.DS
\f5
   movl src, dst  ==>  @text1( 0xd0);
                       gen_operand( src);
                       gen_operand( dst).
         /* ``gen_operand'' is a function that encodes
          * operands by calling back-primitives. */

   rep ens:MOVS   ==>  @text1( 0xf3);
                       @text1( 0xa5).
\fR
.DE
.NH 3
Declaration of types.
.PP
1988-05-03 15:15:28 +00:00
In general, a machine instruction is encoded as an opcode followed by zero or
more
the operands. There are two methods for mapping assembler mnemonics
1988-04-13 14:33:11 +00:00
onto opcodes: the mnemonic determines the opcode, or mnemonic and operands
1988-05-03 15:15:28 +00:00
together determine the opcode. Both cases can be
easily expressed in the as_table.
The first case is obvious.
The second case is handled by introducing type fields for the operands.
.PP
When mnemonic and operands together determine the opcode, the table writer has
to give several rules for each combination of mnemonic and operands. The rules
differ in the type fields of the operands.
The table writer has to supply functions that check the type
of the operand. The name of such a function is the name of the type; it
has one argument: a pointer to a struct of type \fIt_operand\fR; it returns
non-zero when the operand is of this type, otherwise it returns 0.
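.PP
As a minimal sketch of what such type-checking functions might look like in C,
assuming a \fIt_operand\fR with a bit-set ``type'' field as in the 8086 ``as.h''
shown later in this document (the reduced struct and the choice of functions
over macros are illustrative only):

```c
/* Type bits; the values mirror the 8086 as.h example, but any
 * encoding chosen by the table writer would do. */
#define IS_REG  0x1
#define IS_MEM  0x10
#define IS_ADDR 0x20

/* Reduced operand record, for illustration only. */
struct t_operand {
    unsigned type;   /* bit set filled in by process_operand() */
    int reg;         /* register number, if any */
};

/* Type field REG: non-zero when the operand is a register. */
int REG(struct t_operand *op)
{
    return (op->type & IS_REG) != 0;
}

/* Type field EADDR: a register, a memory reference, or an address. */
int EADDR(struct t_operand *op)
{
    return (op->type & (IS_REG | IS_MEM | IS_ADDR)) != 0;
}
```

.LP
The name of each function is the name of the type field used in the rules,
so ``dst:REG'' leads to a call of REG() on the decoded operand.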
.PP
This will usually lead to a list of rules per mnemonic. To reduce the amount of
work an abbreviation is supplied. Once the mnemonic is specified it can be
referred to in the following rules by ``...''.
One has to make sure
that each mnemonic is mentioned only once in the as_table, otherwise
\fBasg\fR will generate more than one function with the same name.
.PP
The following example shows the usage of type fields.
.DS
\f5
mov dst:REG, src:EADDR ==>
@text1( 0x8b); /* opcode */
mod_RM( %d(dst->reg), src). /* operands */
... dst:EADDR, src:REG ==>
@text1( 0x89); /* opcode */
mod_RM( %d(src->reg), dst). /* operands */
\fR
.DE
The table-writer must supply the restriction functions, \f5REG\fR and
\f5EADDR\fR in the previous example, in ``as.c'' or ``as.h''.
.NH 3
The function of the @-sign and the if-statement.
.PP
The right hand side of a rule is made up of function calls.
Since the as_table is
interpreted on two levels, during code expander generation and during code
expander execution, two levels of calls are present in it. A function-call
without an ``@''-sign
is called during code expander generation (e.g., the \f5gen_operand()\fR in the
first example).
A function call with an ``@''-sign is called during code
expander execution (e.g.,
the \fBback\fR-primitives). So the last group will be part of the compiler.
.PP
The need for the ``@''-sign construction arises, for example, when you
implement push/pop optimization (e.g., ``push x'' followed by ``pop y''
can be replaced by ``move x, y'').
In this case flags need to be set, unset, and tested during the execution of
the compiler:
.DS L
\f5
PUSH src ==> /* save in ax */
mov_instr( AX_oper, src);
/* set flag */
@assign( push_waiting, TRUE).
POP dst ==> @if ( push_waiting)
/* ``mov_instr'' is asg-generated */
mov_instr( dst, AX_oper);
@assign( push_waiting, FALSE).
@else
/* ``pop_instr'' is asg-generated */
pop_instr( dst).
@fi.
\fR
.DE
.LP
Although the @-sign is followed syntactically by a
function name, this name can very well be that of a macro defined in C.
This is in fact the case with ``@assign()'' in the above example.
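.PP
The run-time behaviour that the PUSH and POP rules above aim for can be
sketched in plain C. This is only an illustration, not the code that
\fBasg\fR generates: the names expand_push(), expand_pop(), and emit() are
hypothetical, and instructions are appended as text to a buffer instead of
being encoded.

```c
#include <stdio.h>
#include <string.h>

static int  push_waiting = 0;   /* run-time flag, as in the table */
static char code[256];          /* stands in for the generated code */

/* Append one instruction with one or two operands to the buffer. */
static void emit(const char *mnem, const char *a, const char *b)
{
    size_t used = strlen(code);

    if (b != NULL)
        snprintf(code + used, sizeof code - used, "%s %s, %s\n", mnem, a, b);
    else
        snprintf(code + used, sizeof code - used, "%s %s\n", mnem, a);
}

/* PUSH src: save the value in ax and remember that a push is pending. */
void expand_push(const char *src)
{
    emit("mov", "ax", src);
    push_waiting = 1;
}

/* POP dst: reuse ax if a push is pending, otherwise emit a real pop. */
void expand_pop(const char *dst)
{
    if (push_waiting) {
        emit("mov", dst, "ax");
        push_waiting = 0;
    } else {
        emit("pop", dst, NULL);
    }
}
```

.LP
The flag lives in the compiler, not in the code expander generator, which is
why the table marks the assignments and the test with an ``@''-sign.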
.PP
The case may arise when information is needed that is not known
until execution of
the compiler. For example, one may need to know whether a ``$\fIi\fR'' argument fits in
one byte.
In this case one can use a special if-statement provided
by \fBasg\fR: @if, @elsif, @else, @fi. This means that the conditions
will be evaluated at
run time of the \fBce\fR. In such a condition one may of course refer
to the ``$\fIi\fR'' arguments. For example, constants can be
packed into one or two byte arguments as follows:
.DS
\f5
mov dst:ACCU, src:DATA ==>
@if ( fits_byte( %$(src->expr)))
@text1( 0xc0);
@text1( %$(src->expr)).
@else
@text1( 0xc8);
@text2( %$(src->expr)).
@fi.
\fR
.DE
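.PP
The test used in the condition above is not defined in this document.
A minimal sketch, assuming that the constant is available as a long and that
``one byte'' means a signed 8-bit value (both are assumptions; const_size()
is a hypothetical helper added for illustration):

```c
/* Non-zero when v can be stored in one signed byte.
 * Whether the real helper treats the byte as signed is an assumption. */
int fits_byte(long v)
{
    return v >= -128 && v <= 127;
}

/* Number of bytes the constant would occupy:
 * 1 corresponds to the text1() branch, 2 to the text2() branch. */
int const_size(long v)
{
    return fits_byte(v) ? 1 : 2;
}
```

.LP
Because the value of a ``$\fIi\fR'' argument is only known when the compiler
runs, a test like this has to sit behind ``@if'' and cannot be evaluated
during code expander generation.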
.NH 3
References to operands
.PP
As noted before, the operands of an assembler instruction may be used as
pointers to the struct \fIt_operand\fR in the right hand side of the table.
Because of the free format assembler, the types of the fields in the struct
\fIt_operand\fR are unknown to \fBasg\fR. As these fields can appear in calls
to functions, \fBasg\fR must know
these types. This section explains how these types must be specified.
.PP
References to operands come in three forms: ordinary operands, operands that
contain ``$\fIi\fR'' references, and operands that refer to names of local labels.
The ``$\fIi\fR'' in operands represent names or numbers of a \fBC_instr\fR and must
be given as arguments to the \fBback\fR-primitives. Labels in operands
must be converted to a number that gives the distance, in bytes,
between the label and the current position in the text-segment.
.LP
All three cases are treated in a uniform way. When the table writer
makes a reference to an operand of an assembly instruction, he must describe
the type of the operand in the following way.
.VS +4
.TS
center tab(#);
l c l.
reference#::=#``%'' conversion
##``('' operand-name ``\(->'' field-name ``)''
conversion#::=# printformat
#|#``$''
#|#``dist''
printformat#::=#see PRINT(3ACK)
.[
PRINT
.]
.TE
.VS -4
.LP
The three cases differ only in the conversion field. The printformat conversion
applies to ordinary operands. The ``%$'' conversion applies to operands that contain
a ``$\fIi\fR''. The expression between the parentheses must result in a pointer to
a char. The
result of ``%$'' is of the type of ``$\fIi\fR''. The ``%dist'' conversion
applies to operands that refer to a local label. The expression between
the parentheses must result in a pointer to a char. The result of ``%dist'' is
of type arith.
.PP
The following example illustrates the usage of ``%$''. (For an
example that illustrates the usage of ordinary fields see
the section on ``User supplied definitions and functions'').
.DS
\f5
jmp dst ==>
@text1( 0xe9);
@reloc2( %$(dst->lab), %$(dst->off), PC_REL).
\fR
.DE
.PP
A useful function concerning $\fIi\fRs is arg_type(), which takes as input a
string starting with $\fIi\fR and returns the type of the \fIi\fR'th argument
of the current EM-instruction, which can be STRING, ARITH or INT. One may need
this function while decoding operands if the context of the $\fIi\fR does not
give enough information.
If the function arg_type() is used, the file
arg_type.h must contain the definition of STRING, ARITH and INT.
.PP
%dist is only guaranteed to work when called as a parameter of text1(), text2() or text4().
The goal of the %dist conversion is to reduce the number of reloc1(), reloc2()
and reloc4()
calls, saving space and time (no relocation at compiler run time).
The following example illustrates the usage of ``%dist''.
.DS
\f5
jmp dst:ILB ==> /* label in an instruction list */
@text1( 0xeb);
@text1( %dist( dst->lab)).
... dst:LABEL ==> /* global label */
@text1( 0xe9);
@reloc2( %$(dst->lab), %$(dst->off), PC_REL).
\fR
.DE
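.PP
The value that ``%dist'' yields can be pictured as a lookup of the recorded
label position followed by a subtraction. The sketch below is an assumption
about the bookkeeping, not the \fBceg\fR implementation: struct local_label,
the function label_dist(), and the sign convention (label position minus
current position) are all hypothetical.

```c
#include <string.h>

/* Hypothetical record of a local label and its text-segment offset. */
struct local_label {
    const char *name;
    long        pos;
};

/* Distance in bytes from the current text position to the label.
 * Sets *found to 0 and returns 0 when the label is not in the table. */
long label_dist(const struct local_label *tab, int n,
                const char *name, long cur_pos, int *found)
{
    int i;

    for (i = 0; i < n; i++) {
        if (strcmp(tab[i].name, name) == 0) {
            *found = 1;
            return tab[i].pos - cur_pos;
        }
    }
    *found = 0;
    return 0;
}
```

.LP
Since the distance is a plain number once both positions are known, it can be
written with text1(), text2(), or text4() and needs no relocation record.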
.NH 3
The functions assemble() and block_assemble()
.PP
The functions assemble() and block_assemble() are provided by \fBceg\fR.
If, however, the table writer is not satisfied with the way they work
he can
supply his own assemble() or block_assemble().
The default function assemble() splits an assembly string into a
label, mnemonic,
and operands and performs the following actions on them:
.IP \0\01:
It processes the local label; it records the name and current position. Thereafter it calls the function process_label() with one argument of type string,
the label. The table writer has to define this function.
.IP \0\02:
Thereafter it calls the function process_mnemonic() with one argument of
type string, the mnemonic. The table writer has to define this function.
.IP \0\03:
It calls process_operand() for each operand. Process_operand() must be
written by the table-writer since no fixed representation for operands
is enforced. It has two arguments: a string (the operand to decode)
and a pointer to the struct \fIt_operand\fR. The declaration of the struct
\fIt_operand\fR must be given in the
file ``as.h'', and the table-writer can put all the information needed for
encoding the operand in machine format in it.
.IP \0\04:
It examines the mnemonic and calls the associated function, generated by
\fBasg\fR, with pointers to the decoded operands as arguments. This makes it
possible to use the decoded operands in the right hand side of a rule (see
below).
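.LP
The splitting step that precedes these four actions can be sketched as
follows. The syntax assumed here (an optional label terminated by ``:'',
a mnemonic, and comma-separated operands) is only one possibility, since the
assembler format is free; split_instruction() and struct split_instr are
hypothetical names, and trailing blanks after an operand are not trimmed.

```c
#include <ctype.h>
#include <string.h>

#define MAX_OPERANDS 4

struct split_instr {
    char *label;                     /* NULL when absent */
    char *mnemonic;
    char *operand[MAX_OPERANDS];
    int   n_operands;
};

static char *skip_blanks(char *s)
{
    while (*s && isspace((unsigned char)*s))
        s++;
    return s;
}

/* Splits line in place; all fields point into line. */
void split_instruction(char *line, struct split_instr *out)
{
    char *p, *q;

    out->label = NULL;
    out->n_operands = 0;

    /* A first token that ends in ':' is taken to be a label. */
    p = skip_blanks(line);
    q = p;
    while (*q && *q != ':' && !isspace((unsigned char)*q))
        q++;
    if (*q == ':') {
        *q = '\0';
        out->label = p;
        p = skip_blanks(q + 1);
    }

    /* The mnemonic runs up to the next blank. */
    out->mnemonic = p;
    while (*p && !isspace((unsigned char)*p))
        p++;
    if (*p)
        *p++ = '\0';

    /* The rest is a comma-separated operand list. */
    p = skip_blanks(p);
    while (*p && out->n_operands < MAX_OPERANDS) {
        char *end;

        out->operand[out->n_operands++] = p;
        end = strchr(p, ',');
        if (end == NULL)
            break;
        *end = '\0';
        p = skip_blanks(end + 1);
    }
}
```

.LP
Each field would then be handed to process_label(), process_mnemonic(), and
process_operand() respectively.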
.LP
If the default assemble() does not work the way the table writer wants, he
can supply his own version of it. Assemble() has the following arguments:
.DS
\f5
assemble( instruction )
char *instruction;
\fR
.DE
\fIinstruction\fR points to a null-terminated string.
.PP
The default function block_assemble() is called with a sequence of assembly
instructions that belong to one action list. It calls assemble() for
every assembly instruction in
this block. But if a special action is
required on a block of assembly instructions, the table writer only has to
rewrite this function to get a new \fBceg\fR that meets his wishes.
The function block_assemble() has the following arguments:
.DS
\f5
block_assemble( instruction, nr, first, last)
char **instruction;
int nr, first, last;
\fR
.DE
\fIInstruction\fR points to an array of pointers to strings representing
assembly instructions. \fINr\fR is
the number of instructions that must be assembled. \fIFirst\fR
and \fIlast\fR have no function in the default block_assemble(), but are
useful when optimizations are done in block_assemble().
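.PP
A minimal sketch of the default behaviour described above, written in ANSI C
rather than the older style of the declarations shown. The assemble() here is
a stand-in that only counts its calls, and the counter is hypothetical.

```c
static int assembled = 0;           /* counts calls, for illustration only */

/* Stand-in for the real assemble(), which would encode one instruction. */
void assemble(char *instruction)
{
    (void)instruction;
    assembled++;
}

/* One assemble() call per instruction of the block;
 * first and last are accepted but not used by the default version. */
void block_assemble(char **instruction, int nr, int first, int last)
{
    int i;

    (void)first;
    (void)last;
    for (i = 0; i < nr; i++)
        assemble(instruction[i]);
}
```

.LP
A table writer who wants to optimize across instructions would replace the
loop body, for instance by inspecting instruction[i] and instruction[i+1]
together before deciding what to assemble.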
.PP
Four things have to be specified in ``as.h'' and ``as.c''. First the user must
give the declaration of struct \fIt_operand\fR in ``as.h'', and the functions
process_operand(), process_mnemonic(), and process_label() must be given
in ``as.c''. If the right hand side of the as_table
contains function calls other than the \fBback\fR-primitives, these functions
must also be present in ``as.c''. Note that both the ``@''-sign (see 4.2.3)
and ``references'' (see 4.2.4) also work in the functions defined in ``as.c''.
.PP
The following example shows the representative and essential parts of the
8086 ``as.h'' and ``as.c'' files.
.DS L
.nr PS 10
.nr VS 12
\f5
/* Constants and type definitions in as.h */
#define UNKNOWN 0
#define IS_REG 0x1
#define IS_ACCU 0x2
#define IS_DATA 0x4
#define IS_LABEL 0x8
#define IS_MEM 0x10
#define IS_ADDR 0x20
#define IS_ILB 0x40
#define AX 0
#define BX 3
#define CL 1
#define SP 4
#define BP 5
#define SI 6
#define DI 7
#define REG( op) ( op->type & IS_REG)
#define ACCU( op) ( op->type & IS_REG && op->reg == AX)
#define REG_CL( op) ( op->type & IS_REG && op->reg == CL)
#define DATA( op) ( op->type & IS_DATA)
#define LABEL( op) ( op->type & IS_LABEL)
#define ILB( op) ( op->type & IS_ILB)
#define MEM( op) ( op->type & IS_MEM)
#define ADDR( op) ( op->type & IS_ADDR)
#define EADDR( op) ( op->type & ( IS_ADDR | IS_MEM | IS_REG))
#define CONST1( op) ( op->type & IS_DATA && strcmp( "1", op->expr) == 0)
#define MOVS( op) ( op->type & IS_LABEL&&strcmp("\"movs\"", op->lab) == 0)
#define IMMEDIATE( op) ( op->type & ( IS_DATA | IS_LABEL))
struct t_operand {
unsigned type;
int reg;
char *expr, *lab, *off;
};
extern struct t_operand saved_op, *AX_oper;
\fR
.nr PS 12
.nr VS 14
.DE
.DS L
.nr PS 10
.nr VS 12
\f5
/* Some functions in as.c. */
#include "arg_type.h"
#include "as.h"
#define last( s) ( s + strlen( s) - 1)
#define LEFT '('
#define RIGHT ')'
#define DOLLAR '$'
process_label( l)
char *l;
{
}
process_mnemonic( m)
char *m;
{
}
process_operand( str, op)
char *str;
struct t_operand *op;
/* expr -> IS_DATA or IS_LABEL
 * reg -> IS_REG or IS_ACCU
 * (expr) -> IS_ADDR
 * expr(reg) -> IS_MEM
 */
{
char *ptr, *index();
op->type = UNKNOWN;
if ( *last( str) == RIGHT) {
ptr = index( str, LEFT);
*last( str) = '\0';
*ptr = '\0';
if ( is_reg( ptr+1, op)) {
op->type = IS_MEM;
op->expr = ( *str == '\0' ? "0" : str);
}
else {
set_label( ptr+1, op);
op->type = IS_ADDR;
}
}
else
if ( is_reg( str, op))
op->type = IS_REG;
else {
if ( contains_label( str))
set_label( str, op);
else {
op->type = IS_DATA;
op->expr = str;
}
}
}
/*********************************************************************/
mod_RM( reg, op)
int reg;
struct t_operand *op;
/* This function helps to encode operands in machine format.
 * Note the $-operators.
 */
{
if ( REG( op))
R233( 0x3, reg, op->reg);
else if ( ADDR( op)) {
R233( 0x0, reg, 0x6);
@reloc2( %$(op->lab), %$(op->off), ABSOLUTE);
}
else if ( strcmp( op->expr, "0") == 0)
switch( op->reg) {
case SI : R233( 0x0, reg, 0x4);
break;
case DI : R233( 0x0, reg, 0x5);
break;
case BP : R233( 0x1, reg, 0x6); /* exception! */
@text1( 0);
break;
case BX : R233( 0x0, reg, 0x7);
break;
default : fprint( STDERR, "Wrong index register %d\n",
op->reg);
}
else {
@if ( fit_byte( %$(op->expr)))
switch( op->reg) {
case SI : R233( 0x1, reg, 0x4);
break;
case DI : R233( 0x1, reg, 0x5);
break;
case BP : R233( 0x1, reg, 0x6);
break;
case BX : R233( 0x1, reg, 0x7);
break;
default : fprint( STDERR, "Wrong index register %d\n",
op->reg);
}
@text1( %$(op->expr));
@else
switch( op->reg) {
case SI : R233( 0x2, reg, 0x4);
break;
case DI : R233( 0x2, reg, 0x5);
break;
case BP : R233( 0x2, reg, 0x6);
break;
case BX : R233( 0x2, reg, 0x7);
break;
default : fprint( STDERR, "Wrong index register %d\n",
op->reg);
}
@text2( %$(op->expr));
@fi
}
}
\fR
.nr PS 12
.nr VS 14
.DE
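.PP
The helper R233() used in mod_RM() is not shown in this excerpt. Judging by
its name and arguments, it presumably combines a 2-bit, a 3-bit, and a 3-bit
field into the 8086 mod-reg-r/m byte and emits it. A sketch of the packing
alone, under this assumption and with the hypothetical name pack233():

```c
/* Pack a 2-bit, a 3-bit, and a 3-bit field into one byte, which is the
 * layout of the 8086 mod-reg-r/m byte: mod in bits 7-6, reg in bits 5-3,
 * r/m in bits 2-0. Excess bits of each argument are masked off. */
unsigned char pack233(unsigned mod, unsigned reg, unsigned rm)
{
    return (unsigned char)(((mod & 0x3) << 6) |
                           ((reg & 0x7) << 3) |
                            (rm  & 0x7));
}
```

.LP
With this layout, a first argument of 0x3 selects the register form, while
0x1 and 0x2 select the forms with a one-byte and a two-byte displacement,
which matches the way mod_RM() follows them with text1() and text2().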
.NH 2
Generating assembly code
.PP
It is possible to generate assembly instead of object files (see section 5), in
which case there is no need to supply ``as_table'', ``as.h'', and ``as.c''.
This option is useful for debugging the EM_table.
.NH 1
Building a code expander
.PP
This section describes how to generate a code expander in two phases.
In phase one, the EM_table is
written and assembly code is generated. If the assembly language is an existing
one, the EM_table can be tested by assembling and running the generated
code.
If an ad-hoc assembly language is used by the table writer, it is not possible
to test the EM_table, but the code generated is at least in readable form.
In the second phase, the as_table is written and object code is generated.
After the generated object code is fed into the loader, it can be tested.
.NH 2
Phase one
.PP
The following is a list of instructions to make a
code expander that generates assembly instructions.
.IP \0\01:
Create a new directory.
.IP \0\02:
Create the ``EM_table'', ``mach.h'', and ``mach.c'' files; there is no need
for ``as_table'', ``as.h'', and ``as.c'' at this moment.
.IP \0\03:
Type
.br
\f5
install_ceg -as
\fR
.br
install_ceg will create a Makefile and three directories: ceg, ce, and back.
Ceg will contain the program ceg; this program will be
used to turn ``EM_table'' into a set of C source files (in the ce directory),
one for each
EM-instruction. All these files will be compiled and put in a library called
\fBce.a\fR.
.br
The option \f5-as\fR means that a \fBback\fR-library will be
generated (in the directory ``back'') that
supports the generation of assembly language. The library is named ``back.a''.
.IP \0\04:
Link a front end, ``ce.a'', and ``back.a'' together resulting in a compiler
that generates assembly code.
.LP
If the table writer has chosen an actual assembly language, the EM_table can be
tested (e.g., by running the compiler on the EM test set). If an error occurs,
change the EM_table and type
.IP
.br
\f5update\fR \fBC_instr\fR
.br
.LP
where \fBC_instr\fR stands for the name of the erroneous EM-instruction.
If the table writer has chosen an ad-hoc assembly language, he can at least
read the generated code and look for possible errors. If an error is found,
the same procedure as described above can be followed.
.NH 2
Phase two
.PP
The next phase is to generate a \fBce\fR that produces relocatable object
code.
.IP \0\01:
Remove the ``ce'' and ``ceg'' directories.
.IP \0\02:
Write the ``as_table'', ``as.h'', and ``as.c'' files.
.IP \0\03:
Type
.br
\f5 install_ceg -obj \fR
.br
The option \f5-obj\fR means that ``back.a'' will contain a library
for generating
ACK.OUT(5ACK) object files, see appendix B.
If the writer does not want to use the default ``back.a'',
the \f5-obj\fR flag must be omitted and a ``back.a'' should be supplied that
generates object code in the desired format.
.IP \0\04:
Link a front end, ``ce.a'', and ``back.a'' together resulting in a compiler
that generates object code.
.LP
The as_table is ready to be tested. If an error occurs, adapt the table.
Then there are two ways to proceed:
.IP \0\01:
recompile the whole EM_table,
.br
\f5
update ALL
\fR
.IP \0\02:
recompile just the few EM-instructions that contained the error,
\f5
.br
update \fBC_instr\fR
.br
where \fBC_instr\fR is an erroneous EM-instruction.
This has to be done for every EM-instruction that contained the erroneous
assembly instruction.
.NH
Acknowledgements
.PP
We want to thank Henri Bal, Dick Grune, and Ceriel Jacobs for their
valuable suggestions and the critical reading of this paper.
.NH
References
.LP
.[
$LIST$
.]
.bp
.SH
Appendix A, \fRthe \fBback\fR-primitives
.PP
This appendix describes the routines available to generate relocatable
object code. If the default back.a is used, the object code is in
ACK.OUT(5ACK) format.
.nr PS 10
.nr VS 12
.PP
.IP A1.
Text and data generation; with ONE_BYTE b; TWO_BYTES w; FOUR_BYTES l; arith n;
.VS +4
.TS
tab(#);
l c lw(10c).
text1( b)#:#T{
Put one byte in text-segment.
T}
text2( w)#:#T{
Put word (two bytes) in text-segment; byte-order is defined by
BYTES_REVERSED in mach.h.
T}
text4( l)#:#T{
Put long (two words) in text-segment; word-order is defined by
WORDS_REVERSED in mach.h.
T}
#
con1( b)#:#T{
Same for CON-segment.
T}
con2( w)#:
con4( l)#:
#
rom1( b)#:#T{
Same for ROM-segment.
T}
rom2( w)#:
rom4( l)#:
#
gen1( b)#:#T{
Same for the current segment, only to be used in the ``..icon'', ``..ucon'', etc.
pseudo EM-instructions.
T}
gen2( w)#:
gen4( l)#:
#
bss( n)#:#T{
Put n bytes in bss-segment, value is BSS_INIT.
T}
.TE
.VS -4
.IP A2.
Relocation; with char *s; arith o; int r;
.VS +4
.TS
tab(#);
l c lw(10c).
reloc1( s, o, r)#:#T{
Generates relocation-information for 1 byte in the current segment.
T}
##s\0:\0the string which must be relocated
##o\0:\0the offset in bytes from the string.
##T{
r\0:\0relocation type. It can have the values ABSOLUTE or PC_REL. These
two constants are defined in the file ``back.h''.
T}
reloc2( s, o, r)#:#T{
Generates relocation-information for 1 word in the
current segment. Byte-order according to BYTES_REVERSED in mach.h.
T}
reloc4( s, o, r)#:#T{
Generates relocation-information for 1 long in the
current segment. Word-order according to WORDS_REVERSED in mach.h.
T}
.TE
.VS -4
.IP A3.
Symbol table interaction; with int seg; char *s;
.VS +4
.TS
tab(#);
l c lw(10c).
switch_segment( seg)#:#T{
Sets current segment to ``seg'', and does alignment if necessary. ``seg''
can be one of the four constants defined in ``back.h'': SEGTXT, SEGROM,
SEGCON, SEGBSS.
T}
#
symbol_definition( s)#:#T{
Define s in symbol-table.
T}
set_local_visible( s)#:#T{
Record scope-information in symbol table.
T}
set_global_visible( s)#:#T{
Record scope-information in symbol table.
T}
.TE
.VS -4
.IP A4.
Start/end actions; with char *f;
.VS +4
.TS
tab(#);
l c lw(10c).
open_back( f)#:#T{
Directs output to file ``f''; if f is the null pointer, output is written to
standard output.
T}
output_back()#:#T{
End of the job, flush output.
T}
close_back()#:#T{
Close output stream.
T}
init_back()#:#T{
Only used with user-written back-library, gives the opportunity to initialize.
T}
end_back()#:#T{
Only used with user-written back-library.
T}
.TE
.VS -4
.nr PS 12
.nr VS 14
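.PP
The remark on byte order for text2() in A1 can be illustrated with a small
sketch. How BYTES_REVERSED selects between the two orders is not specified in
this appendix, so the function below takes the order as an explicit parameter;
put_word() and low_first are hypothetical names.

```c
/* Store a 16-bit word in buf, low byte first or high byte first.
 * The mapping from BYTES_REVERSED onto low_first is left open here. */
void put_word(unsigned char *buf, unsigned w, int low_first)
{
    unsigned char lo = (unsigned char)(w & 0xff);
    unsigned char hi = (unsigned char)((w >> 8) & 0xff);

    buf[0] = low_first ? lo : hi;
    buf[1] = low_first ? hi : lo;
}
```

.LP
The same idea applies one level up to text4(), where WORDS_REVERSED decides
the order of the two words that make up a long.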
.bp
.SH
Appendix B, description of ACK-a.out library
.PP
The object file produced by \fBce\fR is by default in ACK.OUT(5ACK)
format. The object file is made up of one header, followed by
four segment headers, followed by text, data, relocation information,
symbol table, and the string area. The object file is tuned for the ACK-LED,
so there are some special things done just before the object file is dumped.
First, four relocation records are added which contain the names of the four
segments. Second, all the local relocation is resolved. This is done by the
function do_relo(). If there is a record belonging to a local
name this address is relocated in the segment to which the record belongs.
Besides doing the local relocation, do_relo() changes the ``nami''-field
of the local relocation records. This field receives the index of one of the
four
relocation records belonging to a segment. After the local
relocation has been resolved the routine output_back() dumps the
ACK object file.
.LP
If a different a.out format is wanted, one can choose between three strategies:
.IP \0\01:
The simplest one is to use a conversion program, which converts the ACK
a.out format to the wanted a.out format. Such a program exists for almost
all machines on which ACK runs. However,
not all conversion programs can generate relocation information.
The disadvantage is that the compiler will become slower.
.IP \0\02:
A better solution is to change the functions output_back(), do_relo(),
open_back(), and close_back() in such a way
that they produce the wanted a.out format. This strategy saves a lot of I/O.
.IP \0\03:
If you are still not satisfied and have a lot of spare time, adapt the
\fBback\fR-primitives to produce the wanted a.out format.