ack/doc/peep.doc
1994-12-06 09:12:22 +00:00

521 lines
16 KiB
Text

.\" $Id$
.TL
Internal documentation on the peephole optimizer
.br
from the Amsterdam Compiler Kit
.NH 1
Introduction
.PP
Part of the Amsterdam Compiler Kit is a program to do
peephole optimization on an EM program.
The optimizer scans the program to match patterns from a table
and if found makes the optimization from the table,
and with the result of the optimization
it tries to find yet another optimization
continuing until no more optimizations are found.
.PP
Furthermore it does some optimizations that can not be called
peephole optimizations for historical reasons,
like branch chaining and the deletion of unreachable code.
.PP
The peephole optimizer consists of three parts
.IP 1)
A driving table
.IP 2)
A program translating the table to internal format
.IP 3)
C code compiled with the table to make the optimizer proper
.PP
In this document the table format, internal format and
data structures in the optimizer will be explained,
plus a hint on what the code does where it might not be obvious.
It is a simple program mostly.
.NH 1
Table format
.PP
The driving table consists of pattern/replacement pairs,
in principle one per line,
although a line starting with white space is considered
a continuation line for the previous.
The general format is:
.DS
optimization : pattern ':' replacement '\en'
.sp
pattern : EMlist optional_boolean_expression
.sp
replacement : EM_plus_operand_list
.DE
Example of a simple one
.DS
loc stl $1==0 : zrl $2
.DE
There is no real limit for the length of the pattern or the replacement,
the replacement might even be longer than the pattern,
and expressions can be made arbitrarily complicated.
.PP
The expressions in the table are made of the following pieces:
.IP -
Integer constants
.IP -
$\fIn\fP, standing for the operand of the \fIn\fP'th EM
instruction in the pattern,
undefined if that instruction has no operand.
.IP -
w, standing for the wordsize of the code optimized.
.IP -
p, for the pointersize.
.IP -
defined(expr), true if expression is defined
.IP -
samesign(expr,expr), true if expressions have the same sign.
.IP -
sfit(expr,expr), ufit(expr,expr),
true if the first expression fits signed or unsigned in the number
of bits given in the second expression.
.IP -
rotate(expr,expr),
first expression rotated left the number of bits given by the second expression.
.IP -
notreg(expr),
true if the local with the expression as number is not a candidate to put
in a register.
.IP -
rom(\fIn\fP,expr), contents of the rom descriptor at index expr that
is associated with the global label that should be the argument of
the \fIn\fP'th EM instruction.
Undefined if such a thing does not exist.
.PP
The usual arithmetic operators may be used on integer values,
if any operand is undefined the expression is undefined,
except for the defined() function above.
An undefined expression used for its truth value is false.
All arithmetic on local label operands is forbidden,
only things allowed are tests for equality.
Arithmetic on global labels makes sense,
i.e. one can add a global label and a constant,
but not two global labels.
.PP
In the table one can use five additional EM instructions in patterns.
These are:
.IP lab
Stands for a local label
.IP LLP
Load Local Pointer, translates into a
.B lol
or into a
.B ldl
depending on the relationship between wordsize and pointersize.
.IP LEP
Load External Pointer, translates into a
.B loe
or into a
.B lde .
.IP SLP
Store Local Pointer,
.B stl
or
.B sdl .
.IP SEP
Store External Pointer,
.B ste
or
.B sde .
.PP
There is only one peephole optimizer,
so the substitutions to be made for the last four instructions
are made at run time before the first optimizations are made.
.NH 1
Internal format
.PP
The translating program,
.I mktab
converts the table into an array of bytes where all
patterns follow unaligned.
Format of a pattern is:
.IP 1)
One byte for high byte of hash value,
will be explained later on.
.IP 2)
Two bytes for the index of the next pattern in a chain.
.IP 3)
An integer\u*\d,
.FS
* An integer is encoded as a byte when less than 255,
otherwise as a byte containing 255 followed by two
bytes with the real value.
.FE
pattern length.
.IP 4)
The list of pattern opcodes, one per byte.
.IP 5)
An integer expression index, 0 if not used.
.IP 6)
An integer, replacement length.
.IP 7)
A list of pairs consisting of a one byte opcode and an integer
expression index.
.PP
The expressions are kept in an array of triples,
implementing a binary tree.
The
.I mktab
program tries to minimize the number of triples by reusing
duplicates and even reverses the operands of commutative operators
when doing so would spare a triple.
.NH 1
A tour through the sources
.PP
Now we will walk through the sources and note things of interest.
.NH 2
The header files
.PP
The header files are the place where data structures and options reside.
.NH 3
alloc.h
.PP
In the header file alloc.h several defines can be used to select various
kinds of core allocation schemes.
This is important on small machines like the PDP-11 since a complete
procedure must be in core at the same space,
and the peephole optimizer should not be the limiting factor in
determining the maximum size of procedures if possible.
Options are:
.IP -
USEMALLOC, standard malloc() and free() are used instead of the own
core allocation package.
Not recommended unless the own package does not work on some bizarre
machine.
.IP -
COREDEBUG, prints large amounts of information about core management.
Not recommended unless the code is changed and it stops working.
.IP -
SEPID, defining this will add an extra procedure that will
go through a lot of work to scrape the last bytes together if the
system won't provide more.
This is not a good idea if memory is scarce and code and data reside
in the same spaces, since the room used by the procedure might well
be more than the room saved.
.IP -
STACKROOM, number of shorts used in stack space.
This is used if memory is scarce and stack space and data space are
different.
On the PDP-11 a UNIX process starts with an 8K stack segment which
cannot be transferred to the data segment.
Under these conditions one can use a lot of the stack space for storage.
.NH 3
assert.h
.PP
Just defines the assert macro.
When compiled with -DNDEBUG all asserts will be off.
.NH 3
ext.h
.PP
Gives external definitions of variables used by more than one module.
.NH 3
line.h
.PP
Defines the structures used to keep instructions,
one structure per line of EM code,
and the structure to keep arguments of pseudos,
one structure per argument.
Both structures essentially contain a pointer to the next,
a type,
and a union containing information depending on the type.
Core is allocated only for the part of the union used.
.PP
The
.I
struct line
.R
has a very compact encoding for small integers,
they are encoded in the type field.
On the PDP-11 this gives a line structure of only 4 bytes for most
instructions.
.NH 3
lookup.h
.PP
Contains definition of the struct used for symbol table management,
global labels and procedure names are kept in one table.
.NH 3
optim.h
.PP
If one defines the DIAGOPT option in this header file,
for every optimization performed a number is written on stderr.
The number gives the number of the pattern in the table
or one of the four special numbers in this header file.
.NH 3
param.h
.PP
Contains one settable option,
LONGOFF.
If this is not defined the optimizer can only optimize programs
with wordsize 2 and pointersize 2.
Set this only if it must be run on a Z80 or something pathetic like that.
.PP
Other defines here should not be touched.
.NH 3
pattern.h
.PP
Contains defines of indices in a pattern,
definition of the expression triples,
definitions of the various expression operators
and definition of the result struct where expression results are put.
.PP
This header file is the main one that is also included by
.I mktab .
.NH 3
proinf.h
.PP
This one contains definitions
for the local label table structs
and for the struct where all information for one procedure is kept.
This is in one struct so it can be saved easily when recursive
procedures have to be resolved.
.NH 3
tes.h
.PP
Contains the data structure used by the top element size computation.
.NH 3
types.h
.PP
Collection of typedefs to be used by almost all modules.
.NH 2
The C code itself.
.PP
The C code will now be the center of our attention.
We will make a walk through the sources and we will try
to follow the sources in a logical order.
So we will start at
.NH 3
main.c
.PP
The main.c module contains the main() function.
Here nothing spectacular happens,
only thing of interest is the handling of flags:
.IP -L
This is an instruction to the peephole optimizer to perform
one of its auxiliary functions, the generation of a library module.
This makes the peephole optimizer write its output on a temporary file,
and at the end making the real output by first generating a list
of exported symbols and then copying the temporary file behind it.
.IP -n
Disables all optimization.
Only thing the optimizer does now is filling in the blank after the
.I END
pseudo and resolving recursive procedures.
.PP
The place where main() is left is the call to getlines() which brings
us to
.NH 3
getline.c
.PP
This module reads the EM code and constructs a list of
.I
struct line
.R
records,
linked together backwards,
i.e. the first instruction read is the last in the list.
Pseudos are handled here also,
for most pseudos this just means that a chain of argument records
is linked into the linked line list but some pseudos get special attention:
.IP exc
This pseudo is acted upon right away.
Lines read are shuffled around according to instruction.
.IP mes
Some messages are acted upon.
These are:
.RS
.IP ms_err 8
The input is drained, just in case it is a pipe.
After that the optimizer exits.
.IP ms_opt
The do not optimize flag is set.
Acts just like -n on the command line.
.IP ms_emx
The word- and pointersize are read,
complain if we are not able to handle this.
.IP ms_reg
We take notice of the offset of this local.
See also comments in the description of peephole.c
.RE
.IP pro
A new procedure starts, if we are already in one save the status,
else process collected input.
Collect information about this procedure and if already in a procedure
call getlines() recursively.
.IP end
Process collected input.
.PP
The phrase "process collected input" is used twice,
which brings us to
.NH 3
process.c
.PP
This module contains the entry point process() which is called at any
time the collected input must be processed.
It calls a variety of other routines to get the real work done.
Routines in this module are in chronological order:
.IP symknown 12
Marks all symbols seen until now as known,
i.e. it is now known whether their scope is local or global.
This information is used again during output.
.IP symvalue
Runs through the chain of pseudos to give values to data labels.
This needs an extra pass.
It cannot be done during the getlines pass, since an
.B exc
pseudo could destroy things.
Nor can it be done during the backward pass since it is impossible
to do good fragment numbering backward.
.IP checklocs
Checks whether all local labels referenced are defined.
It needs to be sure about this since otherwise the
semi global optimizations made cannot work.
.IP relabel
This routine finds the final destination for each label in the procedure.
Labels followed by unconditional branches or other labels are marked during
the peephole fase and this leeds to chains of identical labels.
These chains are followed here, and in the local label table each label
has associated with it its replacement label, after this procedure is run.
Care is taken in this routine to prevent a loop in the program to
cause the optimizer to loop.
.IP cleanlocals
This routine empties the local label table after everything
is processed.
.PP
But before this can all be done,
the backward linked list of instructions first has to be reversed,
so here comes
.NH 3
backward.c
.PP
The routine backward has a number of functions:
.IP -
It reverses the backward linked list, making two forward linked lists,
one for the instructions and one for the pseudos.
.IP -
It notes the last occurrence of data labels in the backward linked list
and puts it in the global symbol table.
This is of course the first occurence in the procedure.
This information is needed to decide whether the symbols are global
or local to this module.
.IP -
It decides about the fragment boundaries of data blocks.
Fragments are numbered backwards starting at 3.
This is done to be able to make the type of an expression
containing a symbol equal to its fragment.
This type can then not clash with the types integer and local label.
.IP -
It allocates a rom buffer to every data label with a rom behind
it, if that rom contains only plain integers at the start.
.PP
The first thing done after process() has called backward() and some
of its own little routines is a call to the real routine,
the one that does the work the program was written for
.NH 3
peephole.c
.PP
The first routines in peephole.c
implement a linked list for the offsets of local variables
that are candidates for a register implementation.
Several patterns use the notreg() function,
since it is forbidden to combine a load of that variable
with the load of another and
it is not allowed to take the address of that variable.
.PP
The routine peephole hashes the patterns the first time it is called
after which it doesn't do much more than calling optimize.
But first hashpatterns().
.PP
The patterns are hashed at run time of the optimizer because of
the
.B LLP ,
.B LEP ,
.B SLP
and
.B SEP
instructions added to the instruction set in this optimizer.
These are first replaced everywhere in the table by the correct
replacement after which the first three instructions of the
pattern are hashed and the pattern is linked into one of the
256 linked lists.
There is a define CHK_HASH in this module that
can be set if the randomness of the hashing
function is not trusted.
.PP
The attention now shifts to optimize().
This routine calls basicblock() for every piece of code between two labels.
It also notes which labels have another label or a branch behind them
so the relabel() routine from process.c can do something with that.
.PP
Basicblock() keeps making passes over its basic block
until no more optimizations are found.
This might be inefficient if there is a long basicblock with some
deep recursive optimization in one part of it.
The entire basic block is then scanned a lot of times just for
that one piece.
The alternative is backing up after making an optimization and running
through the same code again, but that is difficult
in a single linked list.
.PP
It hashes instructions and calls trypat() for every pattern that has
a full hash value match,
i.e. lower byte and upper byte equal.
Longest pattern is tried first.
.PP
Trypat() checks length and opcodes of the pattern.
If correct it fills the iargs[] array with argument values
and calculates the expression.
If that is also correct the work shifts to tryrepl().
.PP
Tryrepl() generates the list of replacement instructions,
links it into the list and returns true.
Why then the name tryrepl() if it always succeeds?
Well, there is a mechanism in the optimizer,
unused until today that makes it possible to do optimizations that cannot
be described by the table.
It is possible to give a number as a replacement which will cause the
optimizer to call a routine special() to do some work.
This routine might decide not to do an optimization and return false.
.PP
The last routine that is called from process() is putline()
to write the optimized code, bringing us to
.NH 3
tes.c
.PP
Contains the routines used by the top element size computation phase,
which is run after the peephole-optimisation.
The main routine of tes.c is tes_instr(). This looks at an instruction and
decides the size of the element on top of the stack after the instruction
is executed. When a label is defined or used, the size of the top element
is remembered for later use. When the information in consistent throuhout
the procedure, it is passed to the code generator by means of an ms_tes
message.
.NH 3
putline.c
.PP
The major part of putline.c is the standard set of routines
that makes EM compact code.
The extra functions performed are:
.IP -
For every occurence of a global symbol it might be necessary to
output a
.B exa ,
.B exp ,
.B ina
or
.B inp
pseudo instruction.
That task is performed.
.IP -
The
.B lin
instructions are optimized here,
.B lni
instructions added for
.B lin
instructions and superfluous
.B lin
instructions deleted.