Added optimser document
This commit is contained in:
parent
40b9920f8f
commit
ea247567ba
|
@ -14,7 +14,7 @@ RESFILES= \
|
|||
peep.$(SUF) cg.$(SUF) ncg.$(SUF) regadd.$(SUF) LLgen.$(SUF) \
|
||||
basic.$(SUF) crefman.$(SUF) pcref.$(SUF) val.$(SUF) \
|
||||
6500.$(SUF) i80.$(SUF) z80.$(SUF) top.$(SUF) ego.$(SUF) \
|
||||
m68020.$(SUF) occam.$(SUF)
|
||||
m68020.$(SUF) occam.$(SUF) nopt.$(SUF)
|
||||
|
||||
.SUFFIXES: .doc .$(SUF)
|
||||
|
||||
|
|
543
doc/nopt.doc
Normal file
543
doc/nopt.doc
Normal file
|
@ -0,0 +1,543 @@
|
|||
.\" $Header$
|
||||
.TL
|
||||
A Tour of the New Peephole Optimizer
|
||||
.AU
|
||||
B. J. McKenzie
|
||||
.NH
|
||||
Introduction
|
||||
.LP
|
||||
The peephole optimizer consists of four major parts:
|
||||
.IP a)
|
||||
the table describing the optimization to be performed
|
||||
.IP b)
|
||||
a program to parse these tables and build input and output routines to
|
||||
interface to the library and a dfa based routine to recognize patterns and
|
||||
make the requested replacements.
|
||||
.IP c)
|
||||
common routines for the library that are independent of the table of a)
|
||||
.IP d)
|
||||
a stand alone version of the optimizer.
|
||||
.LP
|
||||
The library conforms to the
|
||||
.I EM_CODE(3)
|
||||
module interface but with routine names of the form
|
||||
.BI C_ xxx
|
||||
replaced by names like
|
||||
.BI O_ xxx.
|
||||
Furthermore there is also routine
|
||||
.I O_getid
|
||||
and no variable
|
||||
.I O_tmpdir
|
||||
in the module.
|
||||
The library module results in calls to the usual
|
||||
.I EM_CODE(3)
|
||||
module. It is possible to write a front end so that it can call either the
|
||||
normal
|
||||
.I EM_CODE(3)
|
||||
module or this new module by adding
|
||||
.B
|
||||
#define PEEPHOLE
|
||||
.R
|
||||
before the line
|
||||
.B
|
||||
#include <em.h>
|
||||
.R
|
||||
This will map all calls to the routine
|
||||
.BI C_ xxx
|
||||
into a call to the routine
|
||||
.BI O_ xxx.
|
||||
|
||||
.LP
|
||||
We shall now describe each of these major parts in some detail.
|
||||
|
||||
.NH
|
||||
The optimization table
|
||||
.LP
|
||||
The file
|
||||
.I patterns
|
||||
contains the patterns of EM instructions to be recognized by the optimizer
|
||||
and the EM instructions to replace them. Each pattern may have an
|
||||
optional restriction that must be satisfied before the replacement is made.
|
||||
The syntax of the table will be described using extended BNF notation
|
||||
used by
|
||||
.I LLGen
|
||||
where:
|
||||
.DS
|
||||
.I
|
||||
[...] - are used to group items
|
||||
| - is used to separate alternatives
|
||||
; - terminates a rule
|
||||
? - indicates item is optional
|
||||
* - indicates item is repeated zero or more times
|
||||
+ - indicates item is repeated one or more times
|
||||
.R
|
||||
.DE
|
||||
The format of each rule in the table is:
|
||||
.DS
|
||||
.I
|
||||
rule : pattern global_restriction? ':' replacement
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
Each rule must be on a single line except that it may be broken after the
|
||||
colon if the next line begins with a tab character.
|
||||
The pattern has the syntax:
|
||||
.DS
|
||||
.I
|
||||
pattern : [ EM_mnem [ local_restriction ]? ]+
|
||||
;
|
||||
EM-mnem : "An EM instruction mnemonic"
|
||||
| 'lab'
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
and consists of a sequence of one or more EM instructions or
|
||||
.I lab
|
||||
which stands for a defined instruction label. Each EM-mnem may optionally be
|
||||
followed by a local restriction on the argument of the mnemonic and take
|
||||
one of the following forms depending on the type of the EM instruction it
|
||||
follows:
|
||||
.DS
|
||||
.I
|
||||
local_restriction : normal_restriction
|
||||
| opt_arg_restriction
|
||||
| ext_arg_restriction
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
A normal restriction is used after all types of EM instruction except for
|
||||
those that allow an optional argument, (such as
|
||||
.I adi
|
||||
) or those involving external names, (such as
|
||||
.I lae
|
||||
)
|
||||
and takes the form:
|
||||
.DS
|
||||
.I
|
||||
normal_restriction : [ rel_op ]? expression
|
||||
;
|
||||
rel_op : '=='
|
||||
| '!='
|
||||
| '<='
|
||||
| '<'
|
||||
| '>='
|
||||
| '>'
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
If the rel_op is missing, the equality
|
||||
.I ==
|
||||
operator is assumed. The general form of expression is defined later but
|
||||
basically it involves simple constants, references to EM_mnem arguments
|
||||
that appear earlier in the pattern and expressions similar to those used
|
||||
in C expressions.
|
||||
|
||||
The form of the restriction after those EM instructions like
|
||||
.I adi
|
||||
whose arguments are optional takes the form:
|
||||
.DS
|
||||
.I
|
||||
opt_arg_restriction : normal_restriction
|
||||
| 'defined'
|
||||
| 'undefined'
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
The
|
||||
.I defined
|
||||
and
|
||||
.I undefined
|
||||
indicate that the argument is present
|
||||
or absent respectively. The normal restriction form implies that the
|
||||
argument is present and satisfies the restriction.
|
||||
|
||||
The form of the restriction after those EM instructions like
|
||||
.I lae
|
||||
whose arguments refer to external object take the form:
|
||||
.DS
|
||||
.I
|
||||
ext_arg_restriction : patarg offset_part?
|
||||
;
|
||||
offset_part : [ '+' | '-' ] expression
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
Such an argument has one of three forms: a offset with no name, an
|
||||
offset form a name or an offset from a label. With no offset part
|
||||
the restriction requires the argument to be identical to a previous
|
||||
external argument. With an offset part it requires an identical name
|
||||
part, (either empty, same name or same label) and supplies a relationship
|
||||
among the offset parts. It is possible to refer to test for the same
|
||||
external argument, the same name or to obtain the offset part of an external
|
||||
argument using the
|
||||
.I sameext
|
||||
,
|
||||
.I samenam
|
||||
and
|
||||
.I offset
|
||||
functions given below.
|
||||
.LP
|
||||
The general form of an expression is:
|
||||
.DS
|
||||
.I
|
||||
expression : expression binop expression
|
||||
| unaryop expression
|
||||
| '(' expression ')'
|
||||
| bin_function '(' expression ',' expression ')'
|
||||
| ext_function '(' patarg ',' patarg ')'
|
||||
| 'offset' '(' patarg ')'
|
||||
| patarg
|
||||
| 'p'
|
||||
| 'w'
|
||||
| INTEGER
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
.DS
|
||||
.I
|
||||
bin_function : 'sfit'
|
||||
| 'ufit'
|
||||
| 'samesign'
|
||||
| 'rotate'
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
.DS
|
||||
.I
|
||||
ext_function : 'samenam'
|
||||
| 'sameext'
|
||||
;
|
||||
patarg : '$' INTEGER
|
||||
;
|
||||
binop : "As for C language"
|
||||
unaryop : "As for C language"
|
||||
.R
|
||||
.DE
|
||||
The INTEGER in the
|
||||
.I patarg
|
||||
refers to the first, second, etc. argument in the pattern and it is
|
||||
required to refer to a pattern that appears earlier in the pattern
|
||||
The
|
||||
.I w
|
||||
and
|
||||
.I p
|
||||
refer to the word size and pointer size (in bytes) respectively. The
|
||||
various function test for:
|
||||
.IP sfit 10
|
||||
the first argument fits as a signed value of
|
||||
the number of bit specified by the second argument.
|
||||
.IP ufit 10
|
||||
as for sfit but for unsigned values.
|
||||
.IP samesign 10
|
||||
the first argument has the same sign as the second.
|
||||
.IP rotate 10
|
||||
the value of the first argument rotated by the number of bit specified
|
||||
by the second argument.
|
||||
.IP samenam 10
|
||||
both arguments refer to externals and have either no name, the same name
|
||||
or same label.
|
||||
.IP sameext 10
|
||||
both arguments refer to the same external.
|
||||
.IP offset 10
|
||||
the argument is an external and this yields it offset part.
|
||||
|
||||
.LP
|
||||
The global restriction takes the form:
|
||||
.DS
|
||||
.I
|
||||
global_restriction : '?' expression
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
and is used to express restrictions that cannot be expressed as simple
|
||||
restrictions on a single argument or are can be expressed in a more
|
||||
readable fashion as a global restriction. An example of such a rule is:
|
||||
.DS
|
||||
.I
|
||||
dup w ldl stf ? p==2*w : ldl $2 stf $3 ldl $2 lof $3
|
||||
.R
|
||||
.DE
|
||||
which says that this rule only applies if the pointer size is twice the
|
||||
word size.
|
||||
|
||||
.NH
|
||||
Incompatibilities with Previous Optimizer
|
||||
.LP
|
||||
The current table format is not compatible with previous versions of the
|
||||
peephole optimizer tables. In particular the previous table had no provision
|
||||
for local restrictions and only the equivalent of the global restriction.
|
||||
This meant that our
|
||||
.I '?'
|
||||
character that announces the presence of the optional global restriction was
|
||||
not required. The previous optimizer performed a number of other tasks that
|
||||
were unrelated to optimization that were possible because the old optimizer
|
||||
read the EM code for a complete procedure at a time. This included tasks such
|
||||
as register variable reference counting and moving the information regarding
|
||||
the number of bytes of local storage required by a procedure from it
|
||||
.I end
|
||||
pseudo instruction to it's
|
||||
.I pro
|
||||
pseudo instruction. These tasks are no longer done by this module but have
|
||||
been moved to other modules or programs in the pipeline. The register variable
|
||||
reference counting is now performed by the front end. The reordering of
|
||||
code, such as the moving of mes instructions and the local storage
|
||||
requirements from the end to beginning of procedures, is now performed using
|
||||
the insertpart mechanism in the
|
||||
.I EM_CODE
|
||||
(or
|
||||
.I EM_OPT
|
||||
) module.
|
||||
The removal of dead code is performed by the global optimizer.
|
||||
|
||||
.NH
|
||||
The Parser
|
||||
.LP
|
||||
The program to parse the tables and build the pattern table dependent dfa
|
||||
routines is built from the files:
|
||||
.IP parser.h 15
|
||||
header file
|
||||
.IP parser.g 15
|
||||
LLGen source file defining syntax of table
|
||||
.IP syntax.l 15
|
||||
Lex sources file defining form of tokens in table.
|
||||
.IP initlex.c 15
|
||||
Uses the data in the library
|
||||
.I em_data.a
|
||||
to initialize the lexical analyser to recognize EM instruction mnemonics.
|
||||
.IP outputdfa.c 15
|
||||
Routines to output the dfa when it has been constructed. It outputs the files
|
||||
.I dfa.c
|
||||
and
|
||||
.I trans.c
|
||||
.IP outcalls.c 15
|
||||
Routines to output the file
|
||||
.I incalls.r
|
||||
defined in the next section.
|
||||
.IP findworst.c 15
|
||||
Routines to analyze patterns to find how to continue matching after a
|
||||
successful replacement or failed match.
|
||||
|
||||
.LP
|
||||
The parser checks that the tables conform to the syntax outlined in the
|
||||
previous section and also makes a number of semantic checks on their
|
||||
validity. Further versions could make further checks such as looking for
|
||||
cycles in the rules or checking that each replacement leaves the same
|
||||
number of bytes on the stack as the pattern it replaces. The parser
|
||||
builds an internal dfa representation of the rules by combining rules with
|
||||
common prefixes. All local and global restrictions are combined into a single
|
||||
test to be performed are a complete pattern has been detected in the input.
|
||||
The idea is to build a structure so that each of the patterns can be matched
|
||||
and then the corresponding tests made and the first that succeeds is replaced.
|
||||
If two rules have the same pattern and both their tests also succeed the one
|
||||
that appears first in the tables file will be done. Somewhat less obvious
|
||||
is that if one pattern is a proper prefix of a longer pattern and its test
|
||||
succeeds then the second pattern will not be checked for.
|
||||
|
||||
A major task of the parser if to decide on the action to take when a rule has
|
||||
been partially matched or when a pattern has been completely matched but its
|
||||
test does not succeed. This requires a search of all patterns to see if any
|
||||
part of the part matched could be part of some other pattern. for example
|
||||
given the two patterns:
|
||||
.DS
|
||||
.I
|
||||
loc adi w loc adi w : loc $1+$3 adi w
|
||||
loc adi w loc sbi w : loc $1-$3 adi w
|
||||
.R
|
||||
.DE
|
||||
If the first pattern fails after seeing the input:
|
||||
.DS
|
||||
.I
|
||||
loc adi loc
|
||||
.R
|
||||
.DE
|
||||
the parser will still need to check whether the second pattern matches.
|
||||
This requires a decision on how to fix up any internal data structures in
|
||||
the dfa matcher, such as moving some instructions from the pattern to the
|
||||
output queue and moving the pattern along and then deciding what state
|
||||
it should continue from. Similar decisions are requires after a pattern
|
||||
has been replaced. For example if the replacement is empty it is necessary
|
||||
to backup
|
||||
.I n-1
|
||||
instructions where
|
||||
.I n
|
||||
is the length of the longest pattern in the tables.
|
||||
|
||||
.NH
|
||||
Structure of the Resulting Library
|
||||
|
||||
.LP
|
||||
The major data structures maintained by the library consist of three queues;
|
||||
an
|
||||
.I output
|
||||
queue of instructions awaiting output, a
|
||||
.I pattern
|
||||
queue containing instructions that match the current prefix, and a
|
||||
.I backup
|
||||
queue of instructions that have been backed up over and need to be reparsed
|
||||
for further pattern matches.
|
||||
|
||||
.LP
|
||||
If no errors are detected by the parser in the tables it output the following
|
||||
files if they have changed from the existing version of the file:
|
||||
.IP dfa.c 10
|
||||
this consists of a routine for each state in the dfa. Each routine contains
|
||||
a switch statement that decides on the basis of the current instruction
|
||||
opcode the next state if any in the dfa to make a transition to.
|
||||
These routines are called from
|
||||
.I OO_dfa
|
||||
declared in
|
||||
.I OO_dfa.c
|
||||
via an array
|
||||
.I OO_fstate
|
||||
that is indexed by state. Attempt to implement this code by a large nested
|
||||
switch statement experienced difficulties with compilers that had fixed
|
||||
limits on the size of switch statements. A better implementation of this
|
||||
might be to find some hashing function that mapped state and opcode onto a
|
||||
unique value and then switch on this via an array.
|
||||
.IP trans.c 10
|
||||
this contains external declarations of transition routines with names like
|
||||
.B OO_xxxdotrans
|
||||
(where
|
||||
.I xxx
|
||||
is a small integer).
|
||||
These are called when there a transition to state
|
||||
.I xxx
|
||||
that corresponds to a
|
||||
complete pattern. Any tests are performed if necessary to confirm that the
|
||||
pattern matches and then the replacement instructions are placed on the
|
||||
output queue and backup and freeing of instructions is performed. If there are
|
||||
a number of patterns with the same instructions but different tests, these
|
||||
will all appear in the same routine and the tests performed in the order they
|
||||
appear in the original
|
||||
.I patterns
|
||||
file.
|
||||
.IP incalls.r 10
|
||||
this contains an entry for every EM instruction (plus
|
||||
.I lab
|
||||
) giving information on how to build a routine with the name
|
||||
.BI O_ xxx
|
||||
for the library version of the module.
|
||||
If the EM instruction does not appear in the tables
|
||||
patterns at all then the dfa routine is called to flush any current queued
|
||||
output and the the output
|
||||
.BI C_ xxx
|
||||
routine is called. If the EM instruction does appear in a pattern then the
|
||||
instruction data structure is allocated, (from the free list), its fields
|
||||
initialized and it is added onto the end of the pattern queue.
|
||||
The dfa routines are then called to attempted to make a transition.
|
||||
This file is input to the
|
||||
.I awk
|
||||
program
|
||||
.I makefuns.awk.
|
||||
|
||||
.LP
|
||||
The following files contain code that is independent of the pattern tables:
|
||||
.IP main.c 10
|
||||
this is used only in the stand alone version of the optimizer and consists
|
||||
of code to open the input file, read the input using the
|
||||
.I READ_EM(3)
|
||||
module and call the dfa routines. This version does not require the routines
|
||||
constructed from the incalls.r file described above.
|
||||
.IP nopt.c 10
|
||||
general routines to initialize, and maintain the data structures. The file
|
||||
handling routines
|
||||
.I O_open
|
||||
etc are defined here. Also defined are routines for flushing the output queue
|
||||
by calling the
|
||||
.I EM_mkcalls
|
||||
routine from the
|
||||
.I READ_EM(3)
|
||||
module and moving instructions from the output to the backup queue.
|
||||
Routines to free the strings stored in instructions
|
||||
with types of
|
||||
.I sof_ptyp,
|
||||
.I pro_ptyp,
|
||||
.I str_ptyp,
|
||||
.I ico_ptyp,
|
||||
.I uco_ptyp,
|
||||
and
|
||||
.I fco_ptyp are also defined. These strings are copied to a large array that
|
||||
is extended by
|
||||
.I Realloc
|
||||
if it overflows. The strings can be thrown away on any flush that occurs when
|
||||
the backup queue is empty.
|
||||
.IP mkstrct.c 10
|
||||
contains routines to build the data structure from the input
|
||||
.BI C_ xxx
|
||||
routines and place the structure on the pattern queue. These routines are not
|
||||
required in the stand alone optimizer.
|
||||
.IP aux.c 10
|
||||
routines to implement the functions used in the rules.
|
||||
|
||||
.LP
|
||||
The following files are also used in building the module library:
|
||||
.IP makefuns.awk 10
|
||||
this
|
||||
.I awk
|
||||
program is used to produce individual C files with names like
|
||||
.BI O_ xxx.c
|
||||
each containing a single function definition and then call the
|
||||
.I cc
|
||||
compiler to produce a single output file.
|
||||
This enables the loader to only load those routines that are actually
|
||||
needed when the library is loaded.
|
||||
.IP pseudo.r 10
|
||||
this file is like the
|
||||
.I incalls.r
|
||||
file produced by the parser but is built by hand and handles the pseudo
|
||||
EM instructions. It is also processed by
|
||||
.I makefuns.awk.
|
||||
|
||||
.NH
|
||||
Miscellaneous Issues
|
||||
.LP
|
||||
The output and backup queues are maintained on fixed length arrays
|
||||
of pointers the the
|
||||
.I e_instr
|
||||
data structure used by the
|
||||
.I READ_EM(3)
|
||||
module.
|
||||
The size of these queues are fixed in size according to the
|
||||
values of
|
||||
.I MAXOUTPUT
|
||||
and
|
||||
.I MAXBACKUP
|
||||
defined in the file
|
||||
.I nopt.c.
|
||||
The size of the pattern queue is set to the length of the maximum pattern
|
||||
length by the tables output by the parser.
|
||||
The space for the structures are initially obtained by calls to
|
||||
.I Malloc
|
||||
(from the
|
||||
.I alloc(3)
|
||||
module),
|
||||
and freed when the output queue or patterns queue is cleared. These freed
|
||||
structures are collected on the free list and reused to avoid the overheads
|
||||
of repeated calls to
|
||||
.I malloc
|
||||
and
|
||||
.I free.
|
||||
|
||||
.LP
|
||||
The fixed size of the output and pattern queues causes no difficulty in
|
||||
practice and can only result in some potential optimizations being missed.
|
||||
When the output queue fills it is simply prematurely flushed and backups
|
||||
when the backup queue is fill are simply ignored. A possible improvement
|
||||
would be to flush only part of the output queue when it fills. It should
|
||||
be noted that it is not possible to statically determine the maximum possible
|
||||
size for these queues as they need to be unbounded in the worst case. A
|
||||
study of the rule
|
||||
.DS
|
||||
.I
|
||||
inc dec :
|
||||
.R
|
||||
.DE
|
||||
with the input consisting of
|
||||
.I N
|
||||
.I inc
|
||||
and then
|
||||
.I N
|
||||
.I dec
|
||||
instructions requires an output queue length of
|
||||
.I N-1
|
||||
to find all possible replacements.
|
Loading…
Reference in a new issue