.\" $Header$
|
|
.TL
|
|
A Tour of the New Peephole Optimizer
|
|
.AU
|
|
B. J. McKenzie
|
|
.NH
|
|
Introduction
|
|
.LP
|
|
The peephole optimizer consists of four major parts:
|
|
.IP a)
|
|
the table describing the optimizations to be performed
|
|
.IP b)
|
|
a program to parse these tables and build input and output routines to
|
|
interface to the library and a dfa based routine to recognize patterns and
|
|
make the requested replacements.
|
|
.IP c)
|
|
common routines for the library that are independent of the table of a)
|
|
.IP d)
|
|
a stand alone version of the optimizer.
|
|
.LP
|
|
The library conforms to the
|
|
.I EM_CODE(3)
|
|
module interface but with routine names of the form
|
|
.BI C_ xxx
|
|
replaced by names like
|
|
.BI O_ xxx.
|
|
Furthermore there is also no routine
|
|
.I O_getid
|
|
and no variable
|
|
.I O_tmpdir
|
|
in the module.
|
|
The library module results in calls to the usual
|
|
.I EM_CODE(3)
|
|
module. It is possible to write a front end so that it can call either the
|
|
normal
|
|
.I EM_CODE(3)
|
|
module or this new module by adding
|
|
.B
|
|
#define PEEPHOLE
|
|
.R
|
|
before the line
|
|
.B
|
|
#include <em.h>
|
|
.R
|
|
This will map all calls to the routine
|
|
.BI C_ xxx
|
|
into a call to the routine
|
|
.BI O_ xxx.
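.LP
The effect of the define can be pictured with the following minimal C sketch.
It does not use the real em.h: the routines C_loc and O_loc are stand-in
stubs, and the conditional macro is only an assumption about how such a
mapping could be written in a header.
.DS
```c
#include <string.h>

/* Stand-in stubs, NOT the real EM_CODE(3) or optimizer routines.
 * Each one records which module received the call. */
static char last_call[16];

static void C_loc(long n) { (void)n; strcpy(last_call, "C_loc"); }
static void O_loc(long n) { (void)n; strcpy(last_call, "O_loc"); }

/* A front end would place this define before including em.h. */
#define PEEPHOLE

/* Hypothetical fragment of what a header could do when PEEPHOLE is set. */
#ifdef PEEPHOLE
#define C_loc O_loc
#endif

/* Front end code is written against the C_xxx names only. */
static const char *emit_constant(long n)
{
    C_loc(n);            /* expands to O_loc(n) when PEEPHOLE is defined */
    return last_call;
}
```
.DE
With the define in place the call in emit_constant reaches the O_loc stub;
without it the same source line would reach C_loc.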
|
|
|
|
.LP
|
|
We shall now describe each of these major parts in some detail.
|
|
|
|
.NH
|
|
The optimization table
|
|
.LP
|
|
The file
|
|
.I patterns
|
|
contains the patterns of EM instructions to be recognized by the optimizer
|
|
and the EM instructions to replace them. Each pattern may have an
|
|
optional restriction that must be satisfied before the replacement is made.
|
|
The syntax of the table will be described using extended BNF notation
|
|
used by
|
|
.I LLGen
|
|
where:
|
|
.DS
|
|
.I
|
|
[...] - are used to group items
|
|
| - is used to separate alternatives
|
|
; - terminates a rule
|
|
? - indicates item is optional
|
|
* - indicates item is repeated zero or more times
|
|
+ - indicates item is repeated one or more times
|
|
.R
|
|
.DE
|
|
The format of each rule in the table is:
|
|
.DS
|
|
.I
|
|
rule : pattern global_restriction? ':' replacement
|
|
;
|
|
.R
|
|
.DE
|
|
Each rule must be on a single line except that it may be broken after the
|
|
colon if the next line begins with a tab character.
|
|
The pattern has the syntax:
|
|
.DS
|
|
.I
|
|
pattern : [ EM_mnem [ local_restriction ]? ]+
|
|
;
|
|
EM_mnem : "An EM instruction mnemonic"
|
|
| 'lab'
|
|
;
|
|
.R
|
|
.DE
|
|
and consists of a sequence of one or more EM instructions or
|
|
.I lab
|
|
which stands for a defined instruction label. Each EM_mnem may optionally be
followed by a local restriction on the argument of the mnemonic, which takes
one of the following forms depending on the type of the EM instruction it
follows:
|
|
.DS
|
|
.I
|
|
local_restriction : normal_restriction
|
|
| opt_arg_restriction
|
|
| ext_arg_restriction
|
|
;
|
|
.R
|
|
.DE
|
|
A normal restriction is used after all types of EM instruction except for
|
|
those that allow an optional argument, (such as
|
|
.I adi
|
|
) or those involving external names, (such as
|
|
.I lae
|
|
)
|
|
and takes the form:
|
|
.DS
|
|
.I
|
|
normal_restriction : [ rel_op ]? expression
|
|
;
|
|
rel_op : '=='
|
|
| '!='
|
|
| '<='
|
|
| '<'
|
|
| '>='
|
|
| '>'
|
|
;
|
|
.R
|
|
.DE
|
|
If the rel_op is missing, the equality
|
|
.I ==
|
|
operator is assumed. The general form of expression is defined later but
|
|
basically it involves simple constants, references to EM_mnem arguments
|
|
that appear earlier in the pattern and expressions similar to those used
|
|
in C expressions.
|
|
|
|
The restriction after those EM instructions like
.I adi
whose arguments are optional takes the form:
|
|
.DS
|
|
.I
|
|
opt_arg_restriction : normal_restriction
|
|
| 'defined'
|
|
| 'undefined'
|
|
;
|
|
.R
|
|
.DE
|
|
The
|
|
.I defined
|
|
and
|
|
.I undefined
|
|
indicate that the argument is present
|
|
or absent respectively. The normal restriction form implies that the
|
|
argument is present and satisfies the restriction.
|
|
|
|
The restriction after those EM instructions like
.I lae
whose arguments refer to external objects takes the form:
|
|
.DS
|
|
.I
|
|
ext_arg_restriction : patarg offset_part?
|
|
;
|
|
offset_part : [ '+' | '-' ] expression
|
|
;
|
|
.R
|
|
.DE
|
|
Such an argument has one of three forms: an offset with no name, an
offset from a name or an offset from a label. With no offset part
the restriction requires the argument to be identical to a previous
external argument. With an offset part it requires an identical name
part (either empty, same name or same label) and supplies a relationship
among the offset parts. It is possible to test for the same
external argument, the same name or to obtain the offset part of an external
argument using the
|
|
.I sameext
|
|
,
|
|
.I samenam
|
|
and
|
|
.I offset
|
|
functions given below.
|
|
.LP
|
|
The general form of an expression is:
|
|
.DS
|
|
.I
|
|
expression : expression binop expression
|
|
| unaryop expression
|
|
| '(' expression ')'
|
|
| bin_function '(' expression ',' expression ')'
|
|
| ext_function '(' patarg ',' patarg ')'
|
|
| 'offset' '(' patarg ')'
|
|
| patarg
|
|
| 'p'
|
|
| 'w'
|
|
| INTEGER
|
|
;
|
|
.R
|
|
.DE
|
|
.DS
|
|
.I
|
|
bin_function : 'sfit'
|
|
| 'ufit'
|
|
| 'samesign'
|
|
| 'rotate'
|
|
;
|
|
.R
|
|
.DE
|
|
.DS
|
|
.I
|
|
ext_function : 'samenam'
|
|
| 'sameext'
|
|
;
|
|
patarg : '$' INTEGER
|
|
;
|
|
binop : "As for C language"
|
|
unaryop : "As for C language"
|
|
.R
|
|
.DE
|
|
The INTEGER in the
|
|
.I patarg
|
|
refers to the first, second, etc. argument in the pattern and it is
required to refer to an argument that appears earlier in the pattern.
The
|
|
.I w
|
|
and
|
|
.I p
|
|
refer to the word size and pointer size (in bytes) respectively. The
various functions test for or yield:
.IP sfit 10
the first argument fits as a signed value of
the number of bits specified by the second argument.
|
|
.IP ufit 10
|
|
as for sfit but for unsigned values.
|
|
.IP samesign 10
|
|
the first argument has the same sign as the second.
|
|
.IP rotate 10
|
|
the value of the first argument rotated by the number of bits specified
|
|
by the second argument.
|
|
.IP samenam 10
|
|
both arguments refer to externals and have either no name, the same name
|
|
or same label.
|
|
.IP sameext 10
|
|
both arguments refer to the same external.
|
|
.IP offset 10
|
|
the argument is an external and this yields its offset part.
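.LP
As an illustration of the three predicate functions, the following C sketch
gives one possible reading of their descriptions. The bodies are assumptions
and not the code found in aux.c; in particular, treating zero as
non-negative in samesign and the limit on the bit count are choices made
here. The rotate function is left out because the text does not fix its
direction or width.
.DS
```c
/* Illustrative versions of three table functions. The names mirror the
 * table, but the bodies are an assumption, not the code in aux.c.
 * All of them assume 1 <= nbits <= (bits in long) - 2. */

/* Nonzero if val is representable as a signed value of nbits bits. */
static int sfit(long val, int nbits)
{
    long limit = 1L << (nbits - 1);
    return val >= -limit && val < limit;
}

/* Nonzero if val is representable as an unsigned value of nbits bits. */
static int ufit(long val, int nbits)
{
    return val >= 0 && val < (1L << nbits);
}

/* Nonzero if both values are negative or both are non-negative. */
static int samesign(long a, long b)
{
    return (a < 0) == (b < 0);
}
```
.DE
Under these assumptions a local restriction such as sfit($1,8) would accept
arguments from -128 to 127 inclusive.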
|
|
|
|
.LP
|
|
The global restriction takes the form:
|
|
.DS
|
|
.I
|
|
global_restriction : '?' expression
|
|
;
|
|
.R
|
|
.DE
|
|
and is used to express restrictions that cannot be expressed as simple
restrictions on a single argument or that can be expressed in a more
readable fashion as a global restriction. An example of such a rule is:
|
|
.DS
|
|
.I
|
|
dup w ldl stf ? p==2*w : ldl $2 stf $3 ldl $2 lof $3
|
|
.R
|
|
.DE
|
|
which says that this rule only applies if the pointer size is twice the
|
|
word size.
|
|
|
|
.NH
|
|
Incompatibilities with Previous Optimizer
|
|
.LP
|
|
The current table format is not compatible with previous versions of the
|
|
peephole optimizer tables. In particular the previous table had no provision
|
|
for local restrictions and only the equivalent of the global restriction.
|
|
This meant that our
|
|
.I '?'
|
|
character that announces the presence of the optional global restriction was
|
|
not required. The previous optimizer performed a number of other tasks that
|
|
were unrelated to optimization that were possible because the old optimizer
|
|
read the EM code for a complete procedure at a time. This included tasks such
|
|
as register variable reference counting and moving the information regarding
|
|
the number of bytes of local storage required by a procedure from its
.I end
pseudo instruction to its
|
|
.I pro
|
|
pseudo instruction. These tasks are no longer done by this module but have
|
|
been moved to other modules or programs in the pipeline. The register variable
|
|
reference counting is now performed by the front end. The reordering of
|
|
code, such as the moving of mes instructions and the local storage
|
|
requirements from the end to beginning of procedures, is now performed using
|
|
the insertpart mechanism in the
|
|
.I EM_CODE
|
|
(or
|
|
.I EM_OPT
|
|
) module.
|
|
The removal of dead code is performed by the global optimizer.
|
|
Various
|
|
.I ext_functions
|
|
available in the old tables are no longer available as they rely on
|
|
information that is not available to the current program.
|
|
These are the
|
|
.I notreg
|
|
and the
|
|
.I rom
|
|
functions.
|
|
The previous optimizer allowed the use of
|
|
.I LLP,
|
|
.I LEP,
|
|
.I SLP
|
|
and
|
|
.I SEP
|
|
in patterns. For example
|
|
.I LLP
|
|
stood for either
|
|
.I lol
|
|
if the pointer size was the same as the word size, or for
|
|
.I ldl
|
|
if the pointer size was twice the word size.
|
|
In the current optimizer it is necessary to include two patterns for each
|
|
such single pattern in the old table. For example for a pattern containing
|
|
.I LLP
|
|
there would be one pattern with
|
|
.I lol
|
|
and with a global restriction of the form
|
|
.I p==w
and another pattern with
.I ldl
and a global restriction of the form
.I p==2*w.
|
|
|
|
.NH
|
|
The Parser
|
|
.LP
|
|
The program to parse the tables and build the pattern table dependent dfa
|
|
routines is built from the files:
|
|
.IP parser.h 15
|
|
header file
|
|
.IP parser.g 15
|
|
LLGen source file defining syntax of table
|
|
.IP syntax.l 15
|
|
Lex source file defining form of tokens in table.
|
|
.IP initlex.c 15
|
|
Uses the data in the library
|
|
.I em_data.a
|
|
to initialize the lexical analyzer to recognize EM instruction mnemonics.
|
|
.IP outputdfa.c 15
|
|
Routines to output the dfa when it has been constructed. It outputs the files
|
|
.I dfa.c
|
|
and
|
|
.I trans.c.
|
|
.IP outcalls.c 15
|
|
Routines to output the file
|
|
.I incalls.r
|
|
defined in the next section.
|
|
.IP findworst.c 15
|
|
Routines to analyze patterns to find how to continue matching after a
|
|
successful replacement or failed match.
|
|
|
|
.LP
|
|
The parser checks that the tables conform to the syntax outlined in the
|
|
previous section and also makes a number of semantic checks on their
|
|
validity. Future versions could make further checks such as looking for
|
|
cycles in the rules or checking that each replacement leaves the same
|
|
number of bytes on the stack as the pattern it replaces. The parser
|
|
builds an internal dfa representation of the rules by combining rules with
|
|
common prefixes. All local and global restrictions are combined into a single
|
|
test to be performed after a complete pattern has been detected in the input.
|
|
The idea is to build a structure so that each of the patterns can be matched
|
|
and then the corresponding tests made and the first that succeeds is replaced.
|
|
If two rules have the same pattern and both their tests also succeed the one
|
|
that appears first in the tables file will be done. Somewhat less obvious
|
|
is that if one pattern is a proper prefix of a longer pattern and its test
|
|
succeeds then the second pattern will not be checked for.
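.LP
The combining of rules with common prefixes can be pictured with the
following C sketch. It is a simplification written for this description and
not the parser's own code: opcodes are small integers from a toy set,
restrictions are ignored, and all names (next_state, accepting, add_pattern)
are hypothetical.
.DS
```c
#define MAXSTATES 64
#define NOPS 8          /* size of the toy opcode set */

enum { OP_LOC, OP_ADI, OP_SBI, OP_INC, OP_DEC };

/* next_state[s][op] is the state reached from s on op, or 0 if none.
 * State 0 is the start state, so 0 can double as "no transition". */
static int next_state[MAXSTATES][NOPS];
static int accepting[MAXSTATES];   /* rule number + 1, or 0 if not final */
static int nstates = 1;

/* Add one pattern, reusing existing states for any shared prefix.
 * Returns the final state of the pattern, or -1 if the table is full. */
static int add_pattern(const int *ops, int len, int rule)
{
    int s = 0, i;

    for (i = 0; i < len; i++) {
        if (next_state[s][ops[i]] == 0) {
            if (nstates == MAXSTATES)
                return -1;
            next_state[s][ops[i]] = nstates++;
        }
        s = next_state[s][ops[i]];
    }
    if (accepting[s] == 0)          /* the rule seen first is recorded */
        accepting[s] = rule + 1;
    return s;
}
```
.DE
Adding the mnemonic sequences loc adi loc adi and loc adi loc sbi in that
order creates four states for the first pattern and only one more for the
second, since the first three instructions are shared.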
|
|
|
|
A major task of the parser is to decide on the action to take when a rule has
been partially matched or when a pattern has been completely matched but its
test does not succeed. This requires a search of all patterns to see if any
part of the matched portion could be part of some other pattern. For example,
|
|
given the two patterns:
|
|
.DS
|
|
.I
|
|
loc adi w loc adi w : loc $1+$3 adi w
|
|
loc adi w loc sbi w : loc $1-$3 adi w
|
|
.R
|
|
.DE
|
|
If the first pattern fails after seeing the input:
|
|
.DS
|
|
.I
|
|
loc adi loc
|
|
.R
|
|
.DE
|
|
the parser will still need to check whether the second pattern matches.
|
|
This requires a decision on how to fix up any internal data structures in
|
|
the dfa matcher, such as moving some instructions from the pattern to the
|
|
output queue and moving the pattern along and then deciding what state
|
|
it should continue from. Similar decisions are required after a pattern
has been replaced. For example if the replacement is empty it is necessary
to back up
|
|
.I n-1
|
|
instructions where
|
|
.I n
|
|
is the length of the longest pattern in the tables.
|
|
|
|
.NH
|
|
Structure of the Resulting Library
|
|
|
|
.LP
|
|
The major data structures maintained by the library consist of three queues:
|
|
an
|
|
.I output
|
|
queue of instructions awaiting output, a
|
|
.I pattern
|
|
queue containing instructions that match the current prefix, and a
|
|
.I backup
|
|
queue of instructions that have been backed up over and need to be reparsed
|
|
for further pattern matches.
|
|
These three queues are maintained in a single fixed size buffer as explained
|
|
in more detail in the next section.
|
|
Also, after a successful match, a replacement queue is constructed.
|
|
|
|
|
|
.LP
|
|
If no errors are detected by the parser in the tables it outputs the following
|
|
files if they have changed from the existing version of the file:
|
|
.IP dfa.c 10
|
|
this consists of a routine for each state in the dfa. Each routine contains
|
|
a switch statement that decides on the basis of the current instruction
|
|
opcode the next state if any in the dfa to make a transition to.
|
|
These routines are called from
|
|
.I OO_dfa
|
|
declared in
|
|
.I OO_dfa.c
|
|
via an array
|
|
.I OO_fstate
|
|
that is indexed by state. Attempts to implement this code by a large nested
switch statement ran into difficulties with compilers that had fixed
limits on the size of switch statements. A better implementation of this
|
|
might be to find some hashing function that mapped state and opcode onto a
|
|
unique value and then switch on this via an array.
|
|
.IP trans.c 10
|
|
this contains external declarations of transition routines with names like
|
|
.B OO_xxxdotrans
|
|
(where
|
|
.I xxx
|
|
is a small integer).
|
|
These are called when there is a transition to state
|
|
.I xxx
|
|
that corresponds to a
|
|
complete pattern. Any tests are performed if necessary to confirm that the
|
|
pattern matches and then the replacement instructions are placed on the
|
|
output queue and backup and freeing of instructions is performed. If there are
|
|
a number of patterns with the same instructions but different tests, these
|
|
will all appear in the same routine and the tests performed in the order they
|
|
appear in the original
|
|
.I patterns
|
|
file.
|
|
.IP incalls.r 10
|
|
this contains an entry for every EM instruction (plus
|
|
.I lab
|
|
) giving information on how to build a routine with the name
|
|
.BI O_ xxx
|
|
for the library version of the module.
|
|
If the EM instruction does not appear in the tables
|
|
patterns at all then the dfa routine is called to flush any current queued
|
|
output and then the output
|
|
.BI C_ xxx
|
|
routine is called. If the EM instruction does appear in a pattern then the
|
|
instruction data structure fields are
|
|
initialized and it is added onto the end of the pattern queue.
|
|
The dfa routines are then called to attempt to make a transition.
|
|
This file is input to the
|
|
.I awk
|
|
program
|
|
.I makefuns.awk.
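.LP
The shape of the code described for dfa.c, one routine per state reached
through an array indexed by state, can be sketched in C as follows. This is
a hand-written toy for the single pattern inc dec and not output of the
parser. The names state0, state1, fstate and dfa are hypothetical, and the
replacement, backup and queue handling of the real transition routines are
reduced to a counter.
.DS
```c
enum { OP_INC, OP_DEC, OP_OTHER };

static int state;            /* current dfa state */
static int matches;          /* complete "inc dec" matches seen so far */

/* One routine per state. Each switches on the opcode of the incoming
 * instruction and selects the next state. */
static void state0(int op)
{
    switch (op) {
    case OP_INC: state = 1; break;
    default:     state = 0; break;
    }
}

static void state1(int op)   /* an "inc" has been seen */
{
    switch (op) {
    case OP_DEC: matches++; state = 0; break;  /* complete pattern */
    case OP_INC: state = 1; break;             /* prefix restarts here */
    default:     state = 0; break;
    }
}

/* Table of state routines indexed by state, in the spirit of OO_fstate. */
static void (*fstate[])(int) = { state0, state1 };

/* Feed one instruction to the matcher, in the spirit of OO_dfa. */
static void dfa(int op)
{
    fstate[state](op);
}
```
.DE
Keeping each state in its own small routine avoids one very large switch
statement, which is the difficulty mentioned in the description of dfa.c.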
|
|
|
|
.LP
|
|
The following files contain code that is independent of the pattern tables:
|
|
.IP main.c 10
|
|
this is used only in the stand alone version of the optimizer and consists
|
|
of code to open the input file, read the input using the
|
|
.I READ_EM(3)
|
|
module and call the dfa routines. This version does not require the routines
|
|
constructed from the incalls.r file described above.
|
|
.IP nopt.c 10
|
|
general routines to initialize and maintain the data structures. The file
handling routines
.I O_open
etc. are defined here. Also defined are routines for flushing the output queue
|
|
by calling the
|
|
.I EM_mkcalls
|
|
routine from the
|
|
.I READ_EM(3)
|
|
module and moving instructions from the output to the backup queue.
|
|
Routines to free the strings stored in instructions
|
|
with types of
|
|
.I sof_ptyp,
|
|
.I pro_ptyp,
|
|
.I str_ptyp,
|
|
.I ico_ptyp,
|
|
.I uco_ptyp,
|
|
and
|
|
.I fco_ptyp
are also defined. These strings are copied to a large array that
|
|
is extended by
|
|
.I Realloc
|
|
if it overflows. The strings can be thrown away on any flush that occurs when
|
|
the backup queue is empty.
|
|
.IP mkstrct.c 10
|
|
contains routines to build the data structure from the input
|
|
.BI C_ xxx
|
|
routines and place the structure on the pattern queue. These routines are also
|
|
used to build the data structures when a replacement is constructed.
|
|
.IP aux.c 10
|
|
routines to implement the external functions used in the pattern table.
|
|
|
|
.LP
|
|
The following files are also used in building the module library:
|
|
.IP makefuns.awk 10
|
|
this
|
|
.I awk
|
|
program is used to produce individual C files with names like
|
|
.BI O_ xxx.c
|
|
each containing a single function definition and then call the
|
|
.I cc
|
|
compiler to produce a single output file.
|
|
This enables the loader to only load those routines that are actually
|
|
needed when the library is loaded.
|
|
.IP pseudo.r 10
|
|
this file is like the
|
|
.I incalls.r
|
|
file produced by the parser but is built by hand and handles the pseudo
|
|
EM instructions. It is also processed by
|
|
.I makefuns.awk.
|
|
|
|
.NH
|
|
Miscellaneous Issues
|
|
.LP
|
|
The output, pattern and backup queues are maintained in a fixed length array,
.I OO_buffer,
allocated at run time with size
.I MAXBUFFER
(a constant declared in nopt.h).
|
|
It consists of an array of the
|
|
.I e_instr
|
|
data structure used by the
|
|
.I READ_EM(3)
|
|
module.
|
|
At any time the pointers
|
|
.I OO_patternqueue
|
|
and
|
|
.I OO_nxtpatt
|
|
point to the beginning and end of the current pattern prefix that corresponds
|
|
to the current state. Any instructions on the backup queue are between
|
|
.I OO_nxtpatt
|
|
and
|
|
.I OO_endbackup.
|
|
If there are no instructions on the backup queue then
|
|
.I OO_endbackup
|
|
will be 0 (zero).
|
|
The size of the replacement queue is set to the maximum
replacement length by the tables output by the parser.
|
|
|
|
.LP
|
|
The fixed size of the buffer causes no difficulty in
|
|
practice and can only result in some potential optimizations being missed.
|
|
When space for a new instruction is required and the buffer is full the
|
|
routine
|
|
.I OO_halfflush
|
|
is called to flush half the buffer and move all the data structures left.
|
|
It should be noted that it is not possible to statically determine the
|
|
maximum possible size for these queues as they need to be unbounded in
|
|
the worst case.
|
|
A study of the rule
|
|
.DS
|
|
.I
|
|
inc dec :
|
|
.R
|
|
.DE
|
|
with the input consisting of
|
|
.I N
|
|
.I inc
|
|
and then
|
|
.I N
|
|
.I dec
|
|
instructions requires an output queue length of
|
|
.I N-1
|
|
to find all possible replacements.
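.LP
The following C sketch simulates this case. It is a toy model written for
this description and not the library code: it handles only an input of N inc
instructions followed by N dec instructions, it counts instructions instead
of storing them, and the flushing of half the buffer is only loosely
modelled on OO_halfflush. The names simulate, limit and result are
hypothetical.
.DS
```c
/* Toy simulation of the rule "inc dec :" (the pair is replaced by nothing).
 * Held-back inc instructions are counted in pending: the newest one plays
 * the role of the pattern queue, the others that of the output queue. */

enum { OP_INC, OP_DEC };

struct result {
    int emitted;       /* instructions that reached the output */
    int max_output;    /* largest output queue length observed */
};

/* n is the number of inc (and of dec) instructions; limit is the capacity
 * of the buffer and must be at least 2. */
static struct result simulate(int n, int limit)
{
    struct result r = { 0, 0 };
    int pending = 0;
    int i;

    for (i = 0; i < 2 * n; i++) {
        int op = (i < n) ? OP_INC : OP_DEC;

        if (op == OP_INC) {
            if (pending == limit) {        /* buffer full: flush half */
                r.emitted += limit / 2;
                pending -= limit / 2;
            }
            pending++;
            if (pending - 1 > r.max_output)
                r.max_output = pending - 1;
        } else if (pending > 0) {
            pending--;                     /* "inc dec" matched and removed */
        } else {
            r.emitted++;                   /* unmatched dec goes straight out */
        }
    }
    r.emitted += pending;                  /* final flush */
    return r;
}
```
.DE
With N equal to 8 and a buffer large enough to hold everything, all
instructions cancel and the output queue reaches a length of 7. With a
buffer of 4 the early flushes let four inc instructions escape, so four dec
instructions find no partner and eight instructions are output.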
|