ack/doc/ego/ca/ca1

.bp
.NH 1
Compact assembly generation
.NH 2
Introduction
.PP
The "Compact Assembly generation phase" (CA) transforms the
intermediate code of the optimizer into EM code in
Compact Assembly Language (CAL) format.
In the intermediate code, all program entities
(such as procedures, labels, global variables)
are denoted by a unique identifying number (see 3.5).
In the CAL output of the optimizer these numbers have to
be replaced by normal identifiers (strings).
The original identifiers of the input program are used whenever possible.
Recall that the IC phase generates two files that can be
used to map unique identifying numbers to procedure names and
global variable names.
For instruction labels CA always generates new names.
The reasons for doing so are:
.IP -
instruction labels are only visible inside one procedure, so they can
not be referenced in other modules
.IP -
the names are not very suggestive anyway, as they must be integer numbers
.IP -
the optimizer considerably changes the control structure of the program,
so there is really no one to one mapping of instruction labels in
the input and the output program.
.LP
As the optimizer combines all input modules into one module,
visibility problems may occur.
Two modules M1 and M2 can both define an identifier X (provided that
X is not externally visible in any of these modules).
If M1 and M2 are combined into one module M, two distinct
entities with the same name would exist in M, which
is not allowed.
.[~[
tanenbaum machine architecture
.], section 11.1.4.3]
In these cases, CA invents a new unique name for one of the entities.
.NH 2
Implementation
.PP
CA first reads the files containing the procedure and global variable names
and stores the names in two tables.
It scans these tables to make sure that all names are different.
Subsequently it reads the EM text, one procedure at a time,
and outputs it in CAL format.
The major part of the code that does the latter transformation
is adapted from the EM Peephole Optimizer.
.PP
The main problem of the implementation of CA is to
assure that the visibility rules are obeyed.
If an identifier must be externally visible (i.e.
it was externally visible in the input program)
and the identifier is defined (in the output program) before
being referenced,
an EXA or EXP pseudo must be generated for it.
(Note that the optimizer may change the order of definitions and
references, so some pseudos may be needed that were not
present in the input program).
On the other hand, an identifier may be only internally visible.
If such an identifier is referenced before being defined,
an INA or INP pseudo must be emitted prior to its first reference.
.UH
Acknowledgements
.PP
The author would like to thank Andy Tanenbaum for his guidance,
Duk Bekema for implementing the Common Subexpression Elimination phase
and writing the initial documentation of that phase,
Dick Grune for reading the manuscript of this report
and Ceriel Jacobs, Ed Keizer, Martin Kersten, Hans van Staveren
and the members of the S.T.W. user's group for their
interest and assistance.