Initial revision
parent
d0103bd7d9
commit
2f204ff2ca
434
modules/src/em_opt/doc.t
Normal file
|
@ -0,0 +1,434 @@
|
|||
.TL
|
||||
A Tour of the Peephole Optimizer Library
|
||||
.AU
|
||||
B. J. McKenzie
|
||||
.NH
|
||||
Introduction
|
||||
.LP
|
||||
The peephole optimizer consists of three major parts:
|
||||
.IP a)
|
||||
the table describing the optimizations to be performed
|
||||
.IP b)
|
||||
a program to parse these tables and build input and output routines to
|
||||
interface to the library and a dfa based routine to recognize patterns and
|
||||
make the requested replacements.
|
||||
.IP c)
|
||||
common routines for the library that are independent of the table of a)
|
||||
.LP
|
||||
The library conforms to the
|
||||
.I EM_CODE(3)
|
||||
module interface with entry points with names like
|
||||
.I C_xxx.
|
||||
The library in turn makes calls to a module with an identical interface
but whose routines have names of the form
|
||||
.I O_xxx.
|
||||
|
||||
.LP
|
||||
We shall now describe each of these in turn in some detail.
|
||||
|
||||
.NH
|
||||
The optimization table
|
||||
.LP
|
||||
The file
|
||||
.I patterns
|
||||
contains the patterns of EM instructions to be recognized by the optimizer
|
||||
and the EM instructions to replace them. Each pattern may have an
|
||||
optional restriction that must be satisfied before the replacement is made.
|
||||
The syntax of the table will be described using the extended BNF notation
|
||||
used by
|
||||
.I LLGen
|
||||
where:
|
||||
.DS
|
||||
.I
|
||||
[...] - are used to group items
|
||||
| - is used to separate alternatives
|
||||
; - terminates a rule
|
||||
? - indicates item is optional
|
||||
* - indicates item is repeated zero or more times
|
||||
+ - indicates item is repeated one or more times
|
||||
.R
|
||||
.DE
|
||||
The format of each rule in the table is:
|
||||
.DS
|
||||
.I
|
||||
rule : pattern global_restriction? ':' replacement
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
Each rule must be on a single line except that it may be broken after the
|
||||
colon if the next line begins with a tab character.
|
||||
The pattern has the syntax:
|
||||
.DS
|
||||
.I
|
||||
pattern : [ EM_mnem [ local_restriction ]? ]+
|
||||
;
|
||||
EM_mnem : "An EM instruction mnemonic"
|
||||
| 'lab'
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
and consists of a sequence of one or more EM instructions or
|
||||
.I lab
|
||||
which stands for a defined instruction label. Each EM_mnem may optionally be
|
||||
followed by a local restriction on the argument of the mnemonic, which takes
|
||||
one of the following forms depending on the type of the EM instruction it
|
||||
follows:
|
||||
.DS
|
||||
.I
|
||||
local_restriction : normal_restriction
|
||||
| opt_arg_restriction
|
||||
| ext_arg_restriction
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
A normal restriction is used after all types of EM instruction except for
|
||||
those that allow an optional argument (such as
|
||||
.I adi
|
||||
) or those involving external names (such as
|
||||
.I lae
|
||||
)
|
||||
and takes the form:
|
||||
.DS
|
||||
.I
|
||||
normal_restriction : [ rel_op ]? expression
|
||||
;
|
||||
rel_op : '=='
|
||||
| '!='
|
||||
| '<='
|
||||
| '<'
|
||||
| '>='
|
||||
| '>'
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
If the rel_op is missing, the equality
|
||||
.I ==
|
||||
operator is assumed. The general form of expression is defined later but
|
||||
basically it involves simple constants, references to EM_mnem arguments
|
||||
that appear earlier in the pattern and expressions similar to those used
|
||||
in C expressions.
|
||||
|
||||
The restriction after those EM instructions like
|
||||
.I adi
|
||||
whose arguments are optional takes the form:
|
||||
.DS
|
||||
.I
|
||||
opt_arg_restriction : normal_restriction
|
||||
| 'defined'
|
||||
| 'undefined'
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
The
|
||||
.I defined
|
||||
and
|
||||
.I undefined
|
||||
indicate that the argument is present
|
||||
or absent respectively. The normal restriction form implies that the
|
||||
argument is present and satisfies the restriction.
|
||||
|
||||
The restriction after those EM instructions like
|
||||
.I lae
|
||||
whose arguments refer to external objects takes the form:
|
||||
.DS
|
||||
.I
|
||||
ext_arg_restriction : patarg offset_part?
|
||||
;
|
||||
offset_part : [ '+' | '-' ] expression
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
Such an argument has one of three forms: an offset with no name, an
|
||||
offset from a name or an offset from a label. With no offset part
|
||||
the restriction requires the argument to be identical to a previous
|
||||
external argument. With an offset part it requires an identical name
|
||||
part, (either empty, same name or same label) and supplies a relationship
|
||||
among the offset parts. It is possible to test for the same
|
||||
external argument, the same name or to obtain the offset part of an external
|
||||
argument using the
|
||||
.I sameext
|
||||
,
|
||||
.I samenam
|
||||
and
|
||||
.I offset
|
||||
functions given below.
|
||||
.LP
|
||||
The general form of an expression is:
|
||||
.DS
|
||||
.I
|
||||
expression : expression binop expression
|
||||
| unaryop expression
|
||||
| '(' expression ')'
|
||||
| bin_function '(' expression ',' expression ')'
|
||||
| ext_function '(' patarg ',' patarg ')'
|
||||
| 'offset' '(' patarg ')'
|
||||
| patarg
|
||||
| 'p'
|
||||
| 'w'
|
||||
| INTEGER
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
.DS
|
||||
.I
|
||||
bin_function : 'sfit'
|
||||
| 'ufit'
|
||||
| 'samesign'
|
||||
| 'rotate'
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
.DS
|
||||
.I
|
||||
ext_function : 'samenam'
|
||||
| 'sameext'
|
||||
;
|
||||
patarg : '$' INTEGER
|
||||
;
|
||||
binop : "As for C language"
|
||||
unaryop : "As for C language"
|
||||
.R
|
||||
.DE
|
||||
The INTEGER in the
|
||||
.I patarg
|
||||
refers to the first, second, etc. argument in the pattern and it is
|
||||
required to refer to an argument that appears earlier in the pattern.
|
||||
The
|
||||
.I w
|
||||
and
|
||||
.I p
|
||||
refer to the word size and pointer size (in bytes) respectively. The
|
||||
various functions test for:
|
||||
.IP sfit 10
|
||||
the first argument fits as a signed value of
|
||||
the number of bits specified by the second argument.
|
||||
.IP ufit 10
|
||||
as for sfit but for unsigned values.
|
||||
.IP samesign 10
|
||||
the first argument has the same sign as the second.
|
||||
.IP rotate 10
|
||||
the value of the first argument rotated by the number of bits specified
|
||||
by the second argument.
|
||||
.IP samenam 10
|
||||
both arguments refer to externals and have either no name, the same name
|
||||
or same label.
|
||||
.IP sameext 10
|
||||
both arguments refer to the same external.
|
||||
.IP offset 10
|
||||
the argument is an external and this yields its offset part.
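.LP
As an illustration only, the three boolean tests above could be written in C
roughly as follows. This is a minimal sketch and not the code found in aux.c:
it assumes that values are held in a long, that ufit rejects negative values,
and that zero counts as non-negative for samesign. The rotate function is
left out because the text does not fix its direction or word width.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: restriction values are held in a host long. */
#define LONG_BITS ((int)(sizeof(long) * CHAR_BIT))

/* sfit(val, nbits): val fits in an nbits-wide two's complement field. */
int sfit(long val, int nbits)
{
    long max;

    if (nbits <= 0) return 0;
    if (nbits >= LONG_BITS) return 1;
    max = (1L << (nbits - 1)) - 1;        /* largest positive value */
    return val >= -max - 1 && val <= max;
}

/* ufit(val, nbits): val fits in an nbits-wide unsigned field. */
int ufit(long val, int nbits)
{
    if (val < 0 || nbits <= 0) return 0;
    if (nbits >= LONG_BITS - 1) return 1; /* any non-negative long fits */
    return val < (1L << nbits);
}

/* samesign(a, b): both negative, or both non-negative. */
int samesign(long a, long b)
{
    return (a < 0) == (b < 0);
}
```

With these definitions a restriction such as sfit($1, 8) would accept any
argument in the range -128 to 127.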
|
||||
|
||||
.LP
|
||||
The global restriction takes the form:
|
||||
.DS
|
||||
.I
|
||||
global_restriction : '?' expression
|
||||
;
|
||||
.R
|
||||
.DE
|
||||
and is used to express restrictions that cannot be expressed as simple
|
||||
restrictions on a single argument or that can be expressed in a more
|
||||
readable fashion as a global restriction. An example of such a rule is:
|
||||
.DS
|
||||
.I
|
||||
dup w ldl stf ? p==2*w : ldl $2 stf $3 ldl $2 lof $3
|
||||
.R
|
||||
.DE
|
||||
which says that this rule only applies if the pointer size is twice the
|
||||
word size.
|
||||
|
||||
.NH
|
||||
Incompatibilities with Previous Optimizer
|
||||
.LP
|
||||
The current table format is not compatible with previous versions of the
|
||||
peephole optimizer tables. In particular the previous table had no provision
|
||||
for local restrictions and only the equivalent of the global restriction.
|
||||
This meant that our
|
||||
.I '?'
|
||||
character that announces the presence of the optional global restriction was
|
||||
not required. The previous optimizer performed a number of other tasks that
|
||||
were unrelated to optimization and were possible because the old optimizer
|
||||
read the EM code for a complete procedure at a time. These included tasks such
|
||||
as register variable reference counting and moving the information regarding
|
||||
the number of bytes of local storage required by a procedure from its
|
||||
.I end
|
||||
pseudo instruction to its
|
||||
.I pro
|
||||
pseudo instruction. These tasks are no longer done. If they are required
|
||||
then they must be performed by some other program in the pipeline.
|
||||
|
||||
.NH
|
||||
The Parser
|
||||
.LP
|
||||
The program to parse the tables and build the pattern table dependent dfa
|
||||
routines is built from the files:
|
||||
.IP parser.h 15
|
||||
header file
|
||||
.IP parser.g 15
|
||||
LLGen source file defining syntax of table
|
||||
.IP syntax.l 15
|
||||
Lex source file defining form of tokens in table.
|
||||
.IP initlex.c 15
|
||||
Uses the data in the library
|
||||
.I em_data.a
|
||||
to initialize the lexical analyser to recognize EM instruction mnemonics.
|
||||
.IP outputdfa.c 15
|
||||
Routines to output dfa when it has been constructed.
|
||||
.IP outcalls.c 15
|
||||
Routines to output the file
|
||||
.I incalls.c
|
||||
described in section 5.
|
||||
.IP findworst.c 15
|
||||
Routines to analyze patterns to find how to continue matching after a
|
||||
successful replacement or failed match.
|
||||
|
||||
.LP
|
||||
The parser checks that the tables conform to the syntax outlined in the
|
||||
section on the optimization table and also makes a number of semantic checks on their
|
||||
validity. Future versions could make further checks such as looking for
|
||||
cycles in the rules or checking that each replacement leaves the same
|
||||
number of bytes on the stack as the pattern it replaces. The parser
|
||||
builds an internal dfa representation of the rules by combining rules with
|
||||
common prefixes. All local and global restrictions are combined into a single
|
||||
test to be performed after a complete pattern has been detected in the input.
|
||||
The idea is to build a structure so that each of the patterns can be matched
|
||||
and then the corresponding tests made and the first that succeeds is replaced.
|
||||
If two rules have the same pattern and both their tests also succeed the one
|
||||
that appears first in the tables file will be done. Somewhat less obvious
|
||||
is that if one pattern is a proper prefix of a longer pattern and its test
|
||||
succeeds then the second pattern will not be checked for.
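.LP
The combining of rules with common prefixes can be pictured as building a
trie whose nodes are the dfa states. The following C fragment is a sketch
under stated assumptions and not the parser's own code: the structure, the
fixed limits and the names add_pattern and new_state are all hypothetical.
Rules that share a complete pattern are kept in table order on the final
state, so that their tests can later be tried first to last.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAXKIDS  8   /* illustrative limit on transitions per state */
#define MAXRULES 4   /* illustrative limit on rules sharing a pattern */

struct state {
    const char *mnem[MAXKIDS];    /* label on each outgoing transition */
    struct state *next[MAXKIDS];  /* target state of each transition */
    int nkids;
    int rules[MAXRULES];          /* rules ending here, in table order */
    int nrules;
};

int nstates = 0;                  /* number of states created so far */

struct state *new_state(void)
{
    struct state *s = calloc(1, sizeof *s);

    if (s == NULL) abort();
    nstates++;
    return s;
}

/* Add one pattern (an array of n mnemonics) below the state s. */
void add_pattern(struct state *s, const char **pat, int n, int rule)
{
    int i, k;

    for (i = 0; i < n; i++) {
        for (k = 0; k < s->nkids; k++)
            if (strcmp(s->mnem[k], pat[i]) == 0) break;
        if (k == s->nkids) {      /* the shared prefix ends here */
            if (s->nkids == MAXKIDS) abort();
            s->mnem[k] = pat[i];
            s->next[k] = new_state();
            s->nkids++;
        }
        s = s->next[k];
    }
    if (s->nrules == MAXRULES) abort();
    s->rules[s->nrules++] = rule;
}
```

Adding the patterns loc adi loc adi and loc adi loc sbi to an empty root
creates five new states rather than eight, since the first three
transitions are shared.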
|
||||
|
||||
A major task of the parser is to decide on the action to take when a rule has
|
||||
been partially matched or when a pattern has been completely matched but its
|
||||
test does not succeed. This requires a search of all patterns to see if any
|
||||
part of the matched input could be part of some other pattern. For example,
|
||||
given the two patterns:
|
||||
.DS
|
||||
.I
|
||||
loc adi w loc adi w : loc $1+$3 adi w
|
||||
loc adi w loc sbi w : loc $1-$3 adi w
|
||||
.R
|
||||
.DE
|
||||
If the first pattern fails after seeing the input:
|
||||
.DS
|
||||
.I
|
||||
loc adi loc
|
||||
.R
|
||||
.DE
|
||||
the parser will still need to check whether the second pattern matches.
|
||||
This requires a decision on how to fix up any internal data structures in
|
||||
the dfa matcher, such as moving some instructions from the pattern to the
|
||||
output queue and moving the pattern along and then deciding what state
|
||||
it should continue from. Similar decisions are required after a pattern
|
||||
has been replaced. For example if the replacement is empty it is necessary
|
||||
to back up
|
||||
.I n-1
|
||||
instructions where
|
||||
.I n
|
||||
is the length of the longest pattern in the tables.
|
||||
|
||||
.NH
|
||||
Structure of the Resulting Library
|
||||
|
||||
.LP
|
||||
The major data structures maintained by the library consist of three queues:
|
||||
an
|
||||
.I output
|
||||
queue of instructions awaiting output, a
|
||||
.I pattern
|
||||
queue containing instructions that match the current prefix, and a
|
||||
.I backup
|
||||
queue of instructions that have been backed up over and need to be reparsed
|
||||
for further pattern matches.
|
||||
|
||||
.LP
|
||||
If no errors are detected by the parser in the tables it outputs the following
|
||||
files:
|
||||
.IP dfa.c 10
|
||||
this consists of a large switch statement that maintains the current state of
|
||||
the dfa and makes a transition to the next state if the next input instruction
|
||||
matches.
|
||||
.IP incalls.r 10
|
||||
this contains an entry for every EM instruction (plus
|
||||
.I lab
|
||||
) giving information on how to build a routine with the name
|
||||
.I C_xxx
|
||||
that conforms to the
|
||||
.I EM_CODE(3)
|
||||
module interface. If the EM instruction does not appear in the table
|
||||
patterns at all then the dfa routine is called to flush any current queued
|
||||
output and then the output
|
||||
.I O_xxx
|
||||
routine is called. If the EM instruction does appear in a pattern then the
instruction is added onto the end of the pattern queue and the dfa routine is called
|
||||
to attempt to make a transition. This file is input to the
|
||||
.I awk
|
||||
program
|
||||
.I makefuns.awk
|
||||
to produce individual C files with names like
|
||||
.I C_xxx.c
|
||||
each containing a single function definition. This enables the loader to
|
||||
only load those routines that are actually needed when the library is loaded.
|
||||
.IP trans.c 10
|
||||
this contains a routine that is called after each transition to a state that
|
||||
contains restrictions and replacements. The restrictions are converted to
|
||||
C expressions and the replacements coded as calls to output instructions
|
||||
into the output queue.
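.LP
The two kinds of entry routine described for incalls.r can be sketched in C
as follows. This is only an illustration of the control flow, not generated
output: flush_queues, add_to_pattern, dfa_step, the opcode OP_INC and the
trace buffer are hypothetical stand-ins, and it is merely assumed here that
nop appears in no pattern while inc appears in at least one.

```c
#include <assert.h>
#include <string.h>

char trace[128];              /* records the order of calls, for illustration */

static void note(const char *s) { strcat(trace, s); }

/* Stand-ins for routines that the real library would provide. */
void O_nop(void)            { note("O_nop "); }
void flush_queues(void)     { note("flush "); }
void add_to_pattern(int op) { (void)op; note("enqueue "); }
void dfa_step(void)         { note("dfa "); }

enum { OP_INC = 1 };          /* hypothetical opcode number */

/* nop is in no pattern: flush whatever is queued, then pass it on. */
void C_nop(void)
{
    flush_queues();
    O_nop();
}

/* inc is in some pattern: queue it and let the dfa try a transition. */
void C_inc(void)
{
    add_to_pattern(OP_INC);
    dfa_step();
}
```

Because each such routine would live in its own file, a program that never
calls C_inc would not need to load it.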
|
||||
|
||||
.LP
|
||||
The following files contain code that is independent of the pattern tables:
|
||||
.IP nopt.c 10
|
||||
general routines to initialize and maintain the data structures.
|
||||
.IP aux.c 10
|
||||
routines to implement the functions used in the rules.
|
||||
.IP mkcalls.c 10
|
||||
code to convert the internal data structures to calls on the output
|
||||
.I O_xxx
|
||||
routines when the output queue is flushed.
|
||||
|
||||
.NH
|
||||
Miscellaneous Issues
|
||||
.LP
|
||||
The sizes of the output and backup queues are fixed according to the
|
||||
values of
|
||||
.I MAXOUTPUT
|
||||
and
|
||||
.I MAXBACKUP
|
||||
defined in the file
|
||||
.I nopt.h.
|
||||
The size of the pattern queue is set to the maximum pattern
|
||||
length by the tables output by the parser. The queues are implemented as
|
||||
arrays of pointers to structures containing the instruction and its arguments.
|
||||
The space for the structures is initially obtained by calls to
|
||||
.I Malloc
|
||||
(from the
|
||||
.I alloc(3)
|
||||
module),
|
||||
and freed when the output queue or pattern queue is cleared. These freed
|
||||
structures are collected on a free list and reused to avoid the overheads
|
||||
of repeated calls to
|
||||
.I malloc
|
||||
and
|
||||
.I free.
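.LP
A free list of this kind might look as follows in C. This is a minimal
sketch and not the code in nopt.c: the structure fields and the names
new_instr and release_instr are hypothetical, and the standard malloc is
used as a stand-in for the Malloc routine of the alloc(3) module.

```c
#include <assert.h>
#include <stdlib.h>

struct instr {
    int opcode;             /* hypothetical field: which EM instruction */
    long arg;               /* hypothetical field: its argument */
    struct instr *link;     /* next structure on the free list */
};

struct instr *free_list = NULL;
int fresh_allocs = 0;       /* how often the allocator was really called */

/* Take a structure from the free list, allocating only if it is empty. */
struct instr *new_instr(void)
{
    struct instr *p = free_list;

    if (p != NULL) {
        free_list = p->link;
    } else {
        p = malloc(sizeof *p);
        if (p == NULL) abort();
        fresh_allocs++;
    }
    p->link = NULL;
    return p;
}

/* Return a structure to the free list instead of freeing it. */
void release_instr(struct instr *p)
{
    p->link = free_list;
    free_list = p;
}
```

After the queues have been filled and cleared once, further instructions
are served from the list and the allocator is no longer called.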
|
||||
|
||||
.LP
|
||||
The fixed size of the output and backup queues causes no difficulty in
|
||||
practice and can only result in some potential optimizations being missed.
|
||||
When the output queue fills it is simply prematurely flushed and backups
|
||||
when the backup queue is full are simply ignored. A possible improvement
|
||||
would be to flush only part of the output queue when it fills. It should
|
||||
be noted that it is not possible to statically determine the maximum possible
|
||||
size for these queues as they need to be unbounded in the worst case. A
|
||||
study of the rule
|
||||
.DS
|
||||
.I
|
||||
inc dec :
|
||||
.R
|
||||
.DE
|
||||
with the input consisting of
|
||||
.I N
|
||||
.I inc
|
||||
and then
|
||||
.I N
|
||||
.I dec
|
||||
instructions requires an output queue length of
|
||||
.I N-1
|
||||
to find all possible replacements.
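.LP
The effect of the queue length on this example can be checked with a small
simulation. The C function below is a simplified model and not the library's
matcher: it assumes a one-element pattern queue, folds the backup queue into
reloading that element from the tail of the output queue, and flushes the
whole output queue when it is full. The name count_replacements and the
limit MAXQ are hypothetical.

```c
#include <assert.h>

#define MAXQ 64   /* upper bound on the capacities this sketch accepts */

/* Append c to the output queue, flushing the queue first if it is full. */
static void enqueue(char *out, int *nout, int cap, char c)
{
    if (*nout == cap) *nout = 0;     /* premature flush */
    if (cap > 0) out[(*nout)++] = c;
}

/*
 * Feed n inc and then n dec instructions through the single rule
 * "inc dec :" and return the number of replacements made when the
 * output queue holds at most cap instructions.
 */
int count_replacements(int n, int cap)
{
    char out[MAXQ];   /* output queue; its tail is what a backup reaches */
    int nout = 0;
    char pat = 0;     /* pattern queue: 0 = empty, 'i' = pending inc */
    int found = 0;
    int k;

    if (cap > MAXQ) cap = MAXQ;
    for (k = 0; k < 2 * n; k++) {
        char x = (k < n) ? 'i' : 'd';

        if (pat == 'i' && x == 'd') {
            found++;                 /* inc dec replaced by nothing */
            pat = 0;
            if (nout > 0 && out[nout - 1] == 'i')
                pat = out[--nout];   /* back up over the previous inc */
            continue;
        }
        if (pat != 0) {              /* pending inc did not match */
            enqueue(out, &nout, cap, pat);
            pat = 0;
        }
        if (x == 'i')
            pat = x;                 /* possible start of a match */
        else
            enqueue(out, &nout, cap, x);
    }
    return found;
}
```

In this model a queue of N-1 entries finds all N replacements, while a
shorter queue is flushed before the first dec arrives and some of the
pairs are missed.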
|