Initial revision
This commit is contained in:
parent
d0103bd7d9
commit
2f204ff2ca
434
modules/src/em_opt/doc.t
Normal file
434
modules/src/em_opt/doc.t
Normal file
|
@ -0,0 +1,434 @@
|
||||||
|
.TL
|
||||||
|
A Tour of the Peephole Optimizer Library
|
||||||
|
.AU
|
||||||
|
B. J. McKenzie
|
||||||
|
.NH
|
||||||
|
Introduction
|
||||||
|
.LP
|
||||||
|
The peephole optimizer consists of three major parts:
|
||||||
|
.IP a)
|
||||||
|
the table describing the optimization to be performed
|
||||||
|
.IP b)
|
||||||
|
a program to parse these tables and build input and output routines to
|
||||||
|
interface to the library and a dfa based routine to recognize patterns and
|
||||||
|
make the requested replacements.
|
||||||
|
.IP c)
|
||||||
|
common routines for the library that are independent of the table of a)
|
||||||
|
.LP
|
||||||
|
The library conforms to the
|
||||||
|
.I EM_CODE(3)
|
||||||
|
module interface with entry points with names like
|
||||||
|
.I C_xxx.
|
||||||
|
The library module results in calls to a module with an identical interface
|
||||||
|
but with calls to routines with names of the form
|
||||||
|
.I O_xxx.
|
||||||
|
|
||||||
|
.LP
|
||||||
|
We shall now describe each of these in turn in some detail.
|
||||||
|
|
||||||
|
.NH
|
||||||
|
The optimization table
|
||||||
|
.LP
|
||||||
|
The file
|
||||||
|
.I patterns
|
||||||
|
contains the patterns of EM instructions to be recognized by the optimizer
|
||||||
|
and the EM instructions to replace them. Each pattern may have an
|
||||||
|
optional restriction that must be satisfied before the replacement is made.
|
||||||
|
The syntax of the table will be described using extended BNF notation
|
||||||
|
used by
|
||||||
|
.I LLGen
|
||||||
|
where:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
[...] - are used to group items
|
||||||
|
| - is used to separate alternatives
|
||||||
|
; - terminates a rule
|
||||||
|
? - indicates item is optional
|
||||||
|
* - indicates item is repeated zero or more times
|
||||||
|
+ - indicates item is repeated one or more times
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
The format of each rule in the table is:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
rule : pattern global_restriction? ':' replacement
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
Each rule must be on a single line except that it may be broken after the
|
||||||
|
colon if the next line begins with a tab character.
|
||||||
|
The pattern has the syntax:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
pattern : [ EM_mnem [ local_restriction ]? ]+
|
||||||
|
;
|
||||||
|
EM-mnem : "An EM instruction mnemonic"
|
||||||
|
| 'lab'
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
and consists of a sequence of one or more EM instructions or
|
||||||
|
.I lab
|
||||||
|
which stands for a defined instruction label. Each EM-mnem may optionally be
|
||||||
|
followed by a local restriction on the argument of the mnemonic and take
|
||||||
|
one of the following forms depending on the type of the EM instruction it
|
||||||
|
follows:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
local_restriction : normal_restriction
|
||||||
|
| opt_arg_restriction
|
||||||
|
| ext_arg_restriction
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
A normal restriction is used after all types of EM instruction except for
|
||||||
|
those that allow an optional argument, (such as
|
||||||
|
.I adi
|
||||||
|
) or those involving external names, (such as
|
||||||
|
.I lae
|
||||||
|
)
|
||||||
|
and takes the form:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
normal_restriction : [ rel_op ]? expression
|
||||||
|
;
|
||||||
|
rel_op : '=='
|
||||||
|
| '!='
|
||||||
|
| '<='
|
||||||
|
| '<'
|
||||||
|
| '>='
|
||||||
|
| '>'
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
If the rel_op is missing, the equality
|
||||||
|
.I ==
|
||||||
|
operator is assumed. The general form of expression is defined later but
|
||||||
|
basically it involves simple constants, references to EM_mnem arguments
|
||||||
|
that appear earlier in the pattern and expressions similar to those used
|
||||||
|
in C expressions.
|
||||||
|
|
||||||
|
The form of the restriction after those EM instructions like
|
||||||
|
.I adi
|
||||||
|
whose arguments are optional takes the form:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
opt_arg_restriction : normal_restriction
|
||||||
|
| 'defined'
|
||||||
|
| 'undefined'
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
The
|
||||||
|
.I defined
|
||||||
|
and
|
||||||
|
.I undefined
|
||||||
|
indicate that the argument is present
|
||||||
|
or absent respectively. The normal restriction form implies that the
|
||||||
|
argument is present and satisfies the restriction.
|
||||||
|
|
||||||
|
The form of the restriction after those EM instructions like
|
||||||
|
.I lae
|
||||||
|
whose arguments refer to external object take the form:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
ext_arg_restriction : patarg offset_part?
|
||||||
|
;
|
||||||
|
offset_part : [ '+' | '-' ] expression
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
Such an argument has one of three forms: a offset with no name, an
|
||||||
|
offset form a name or an offset from a label. With no offset part
|
||||||
|
the restriction requires the argument to be identical to a previous
|
||||||
|
external argument. With an offset part it requires an identical name
|
||||||
|
part, (either empty, same name or same label) and supplies a relationship
|
||||||
|
among the offset parts. It is possible to refer to test for the same
|
||||||
|
external argument, the same name or to obtain the offset part of an external
|
||||||
|
argument using the
|
||||||
|
.I sameext
|
||||||
|
,
|
||||||
|
.I samenam
|
||||||
|
and
|
||||||
|
.I offset
|
||||||
|
functions given below.
|
||||||
|
.LP
|
||||||
|
The general form of an expression is:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
expression : expression binop expression
|
||||||
|
| unaryop expression
|
||||||
|
| '(' expression ')'
|
||||||
|
| bin_function '(' expression ',' expression ')'
|
||||||
|
| ext_function '(' patarg ',' patarg ')'
|
||||||
|
| 'offset' '(' patarg ')'
|
||||||
|
| patarg
|
||||||
|
| 'p'
|
||||||
|
| 'w'
|
||||||
|
| INTEGER
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
bin_function : 'sfit'
|
||||||
|
| 'ufit'
|
||||||
|
| 'samesign'
|
||||||
|
| 'rotate'
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
ext_function : 'samenam'
|
||||||
|
| 'sameext'
|
||||||
|
;
|
||||||
|
patarg : '$' INTEGER
|
||||||
|
;
|
||||||
|
binop : "As for C language"
|
||||||
|
unaryop : "As for C language"
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
The INTEGER in the
|
||||||
|
.I patarg
|
||||||
|
refers to the first, second, etc. argument in the pattern and it is
|
||||||
|
required to refer to a pattern that appears earlier in the pattern
|
||||||
|
The
|
||||||
|
.I w
|
||||||
|
and
|
||||||
|
.I p
|
||||||
|
refer to the word size and pointer size (in bytes) respectively. The
|
||||||
|
various function test for:
|
||||||
|
.IP sfit 10
|
||||||
|
the first argument fits as a signed value of
|
||||||
|
the number of bit specified by the second argument.
|
||||||
|
.IP ufit 10
|
||||||
|
as for sfit but for unsigned values.
|
||||||
|
.IP samesign 10
|
||||||
|
the first argument has the same sign as the second.
|
||||||
|
.IP rotate 10
|
||||||
|
the value of the first argument rotated by the number of bit specified
|
||||||
|
by the second argument.
|
||||||
|
.IP samenam 10
|
||||||
|
both arguments refer to externals and have either no name, the same name
|
||||||
|
or same label.
|
||||||
|
.IP sameext 10
|
||||||
|
both arguments refer to the same external.
|
||||||
|
.IP offset 10
|
||||||
|
the argument is an external and this yields it offset part.
|
||||||
|
|
||||||
|
.LP
|
||||||
|
The global restriction takes the form:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
global_restriction : '?' expression
|
||||||
|
;
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
and is used to express restrictions that cannot be expressed as simple
|
||||||
|
restrictions on a single argument or are can be expressed in a more
|
||||||
|
readable fashion as a global restriction. An example of such a rule is:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
dup w ldl stf ? p==2*w : ldl $2 stf $3 ldl $2 lof $3
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
which says that this rule only applies if the pointer size is twice the
|
||||||
|
word size.
|
||||||
|
|
||||||
|
.NH
|
||||||
|
Incompatibilities with Previous Optimizer
|
||||||
|
.LP
|
||||||
|
The current table format is not compatible with previous versions of the
|
||||||
|
peephole optimizer tables. In particular the previous table had no provision
|
||||||
|
for local restrictions and only the equivalent of the global restriction.
|
||||||
|
This meant that our
|
||||||
|
.I '?'
|
||||||
|
character that announces the presence of the optional global restriction was
|
||||||
|
not required. The previous optimizer performed a number of other tasks that
|
||||||
|
were unrelated to optimization that were possible because the old optimizer
|
||||||
|
read the EM code for a complete procedure at a time. This included task such
|
||||||
|
as register variable reference counting and moving the information regarding
|
||||||
|
the number of bytes of local storage required by a procedure from it
|
||||||
|
.I end
|
||||||
|
pseudo instruction to it's
|
||||||
|
.I pro
|
||||||
|
pseudo instruction. These tasks are no longer done. If there are required
|
||||||
|
then the must be performed by some other program in the pipeline.
|
||||||
|
|
||||||
|
.NH
|
||||||
|
The Parser
|
||||||
|
.LP
|
||||||
|
The program to parse the tables and build the pattern table dependent dfa
|
||||||
|
routines is built from the files:
|
||||||
|
.IP parser.h 15
|
||||||
|
header file
|
||||||
|
.IP parser.g 15
|
||||||
|
LLGen source file defining syntax of table
|
||||||
|
.IP syntax.l 15
|
||||||
|
Lex sources file defining form of tokens in table.
|
||||||
|
.IP initlex.c 15
|
||||||
|
Uses the data in the library
|
||||||
|
.I em_data.a
|
||||||
|
to initialize the lexical analyser to recognize EM instruction mnemonics.
|
||||||
|
.IP outputdfa.c 15
|
||||||
|
Routines to output dfa when it has been constructed.
|
||||||
|
.IP outcalls.c 15
|
||||||
|
Routines to output the file
|
||||||
|
.I incalls.c
|
||||||
|
defined in section 4.
|
||||||
|
.IP findworst.c 15
|
||||||
|
Routines to analyze patterns to find how to continue matching after a
|
||||||
|
successful replacement or failed match.
|
||||||
|
|
||||||
|
.LP
|
||||||
|
The parser checks that the tables conform to the syntax outlined in the
|
||||||
|
previous section and also mades a number of semantic checks on their
|
||||||
|
validity. Further versions could make further checks such as looking for
|
||||||
|
cycles in the rules or checking that each replacement leaves the same
|
||||||
|
number of bytes on the stack as the pattern it replaces. The parser
|
||||||
|
builds an internal dfa representation of the rules by combining rules with
|
||||||
|
common prefixes. All local and global restrictions are combined into a single
|
||||||
|
test to be performed are a complete pattern has been detected in the input.
|
||||||
|
The idea is to build a structure so that each of the patterns can be matched
|
||||||
|
and then the corresponding tests made and the first that succeeds is replaced.
|
||||||
|
If two rules have the same pattern and both their tests also succeed the one
|
||||||
|
that appears first in the tables file will be done. Somewhat less obvious
|
||||||
|
is that id one pattern is a proper prefix of a longer pattern and its test
|
||||||
|
succeeds then the second pattern will not be checked for.
|
||||||
|
|
||||||
|
A major task of the parser if to decide on the action to take when a rule has
|
||||||
|
been partially matched or when a pattern has been completely matched but its
|
||||||
|
test does not succeed. This requires a search of all patterns to see if any
|
||||||
|
part of the part matched could be part of some other pattern. for example
|
||||||
|
given the two patterns:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
loc adi w loc adi w : loc $1+$3 adi w
|
||||||
|
loc adi w loc sbi w : loc $1-$3 adi w
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
If the first pattern fails after seeing the input:
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
loc adi loc
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
the parser will still need to check whether the second pattern matches.
|
||||||
|
This requires a decision on how to fix up any internal data structures in
|
||||||
|
the dfa matcher, such as moving some instructions from the pattern to the
|
||||||
|
output queue and moving the pattern along and then deciding what state
|
||||||
|
it should continue from. Similar decisions are requires after a pattern
|
||||||
|
has been replaced. For example if the replacement is empty it is necessary
|
||||||
|
to backup
|
||||||
|
.I n-1
|
||||||
|
instructions where
|
||||||
|
.I n
|
||||||
|
is the length of the longest pattern in the tables.
|
||||||
|
|
||||||
|
.NH
|
||||||
|
Structure of the Resulting Library
|
||||||
|
|
||||||
|
.LP
|
||||||
|
The major data structures maintained by the library consist of three queues;
|
||||||
|
an
|
||||||
|
.I output
|
||||||
|
queue of instructions awaiting output, a
|
||||||
|
.I pattern
|
||||||
|
queue containing instructions that match the current prefix, and a
|
||||||
|
.I backup
|
||||||
|
queue of instructions that have been backed up over and need to be reparsed
|
||||||
|
for further pattern matches.
|
||||||
|
|
||||||
|
.LP
|
||||||
|
If no errors are detected by the parser in the tables it output the following
|
||||||
|
files:
|
||||||
|
.IP dfa.c 10
|
||||||
|
this consists of a large switch statement that maintains the current state of
|
||||||
|
the dfa and makes a transition to the next state if the next input instruction
|
||||||
|
matches.
|
||||||
|
.IP incalls.r 10
|
||||||
|
this contains an entry for every EM instruction (plus
|
||||||
|
.I lab
|
||||||
|
) giving information on how to build a routine with the name
|
||||||
|
.I C_xxx
|
||||||
|
that conforms to the
|
||||||
|
.I EM_CODE(3)
|
||||||
|
modules interface. If the EM instruction does not appear in the tables
|
||||||
|
patterns at all then the dfa routine is called to flush any current queued
|
||||||
|
output and the the output
|
||||||
|
.I O_xxx
|
||||||
|
routine is called. If the EM instruction does appear in a pattern then the instruction is added onto the end of the pattern queue and the dfa routines called
|
||||||
|
to attempted to make a transition. This file is input to the
|
||||||
|
.I awk
|
||||||
|
program
|
||||||
|
.I makefuns.awk
|
||||||
|
to produce individual C files with names like
|
||||||
|
.I C_xxx.c
|
||||||
|
each containing a single function definition. This enables the loader to
|
||||||
|
only load those routines that are actually needed when the library is loaded.
|
||||||
|
.IP trans.c 10
|
||||||
|
this contains a routine that is called after each transition to a state that
|
||||||
|
contains restrictions and replacements. The restrictions a converted to
|
||||||
|
C expressions and the replacements coded as calls to output instructions
|
||||||
|
into the output queue.
|
||||||
|
|
||||||
|
.LP
|
||||||
|
The following files contain code that is independent of the pattern tables:
|
||||||
|
.IP nopt.c 10
|
||||||
|
general routines to initialize, and maintain the data structures.
|
||||||
|
.IP aux.c 10
|
||||||
|
routines to implement the functions used in the rules.
|
||||||
|
.IP mkcalls.c 10
|
||||||
|
code to convert the internal data structures to calls on the output
|
||||||
|
.I O_xxx
|
||||||
|
routines when the output queue is flushed.
|
||||||
|
|
||||||
|
.NH
|
||||||
|
Miscellaneous Issues
|
||||||
|
.LP
|
||||||
|
The size of the output and backup queues are fixed in size according to the
|
||||||
|
values of
|
||||||
|
.I MAXOUTPUT
|
||||||
|
and
|
||||||
|
.I MAXBACKUP
|
||||||
|
defined in the file
|
||||||
|
.I nopt.h.
|
||||||
|
The size of the pattern queue is set to the length of the maximum pattern
|
||||||
|
length by the tables output by the parser. The queues are implemented as
|
||||||
|
arrays of pointers to structures containing the instruction and its arguments.
|
||||||
|
The space for the structures are initially obtained by calls to
|
||||||
|
.I Malloc
|
||||||
|
(from the
|
||||||
|
.I alloc(3)
|
||||||
|
module),
|
||||||
|
and freed when the output queue or patterns queue is cleared. These freed
|
||||||
|
structures are collected on a free list and reused to avoid the overheads
|
||||||
|
of repeated calls to
|
||||||
|
.I malloc
|
||||||
|
and
|
||||||
|
.I free.
|
||||||
|
|
||||||
|
.LP
|
||||||
|
The fixed size of the output and pattern queues causes no difficulty in
|
||||||
|
practice and can only result in some potential optimizations being missed.
|
||||||
|
When the output queue fills it is simply prematurely flushed and backups
|
||||||
|
when the backup queue is fill are simply ignored. A possible improvement
|
||||||
|
would be to flush only part of the output queue when it fills. It should
|
||||||
|
be noted that it is not possible to statically determine the maximum possible
|
||||||
|
size for these queues as they need to be unbounded in the worst case. A
|
||||||
|
study of the rule
|
||||||
|
.DS
|
||||||
|
.I
|
||||||
|
inc dec :
|
||||||
|
.R
|
||||||
|
.DE
|
||||||
|
with the input consisting of
|
||||||
|
.I N
|
||||||
|
.I inc
|
||||||
|
and then
|
||||||
|
.I N
|
||||||
|
.I dec
|
||||||
|
instructions requires an output queue length of
|
||||||
|
.I N-1
|
||||||
|
to find all possible replacements.
|
Loading…
Reference in a new issue