.\" $Header$
.TL
A Tour of the New Peephole Optimizer
.AU
B. J. McKenzie
.NH
Introduction
.LP
The peephole optimizer consists of four major parts:
.IP a)
the table describing the optimizations to be performed
.IP b)
a program to parse these tables and build input and output routines to
interface to the library and a dfa based routine to recognize patterns and
make the requested replacements.
.IP c)
common routines for the library that are independent of the table of a)
.IP d)
a stand alone version of the optimizer.
.LP
The library conforms to the
.I EM_CODE(3)
module interface but with routine names of the form
.BI C_ xxx
replaced by names like
.BI O_ xxx.
Furthermore, there is also a routine
.I O_getid
and no variable
.I O_tmpdir
in the module.
The library module results in calls to the usual
.I EM_CODE(3)
module. It is possible to write a front end so that it can call either the
normal
.I EM_CODE(3)
module or this new module by adding
.B
#define PEEPHOLE
.R
before the line
.B
#include <em.h>
.R
This will map all calls to the routine
.BI C_ xxx
into a call to the routine
.BI O_ xxx.
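.LP
The renaming can be pictured with a small self-contained sketch.
It does not reproduce the real contents of em.h; the conditional
block below is only an assumed form of what such a header might do, and
.I emit_constant ,
the two counters and the stub bodies are hypothetical names introduced
for illustration.
.DS
```c
/* Counters that record which module received each call (illustration only). */
int direct_calls = 0;     /* calls that went straight to the code generator */
int optimized_calls = 0;  /* calls that went through the peephole library   */

/* Stand-in for the EM_CODE(3) routine that emits a "loc" instruction. */
void C_loc(long value) { (void)value; direct_calls++; }

/* Stand-in for the peephole library routine with the same interface. */
void O_loc(long value) { (void)value; optimized_calls++; }

/* The front end asks for the optimizing module ... */
#define PEEPHOLE

/* ... and a header such as em.h could then rename the calls (assumed form). */
#ifdef PEEPHOLE
#define C_loc O_loc
#endif

/* Front-end code is written against the C_xxx names only. */
void emit_constant(long value)
{
    C_loc(value);   /* expands to O_loc(value) when PEEPHOLE is defined */
}
```
.DE
.LP
With the define in place a call to
.I emit_constant
reaches the
.BI O_ xxx
stub; without it the same source would reach the
.BI C_ xxx
stub.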

.LP
We shall now describe each of these major parts in some detail.

.NH
The optimization table
.LP
The file
.I patterns
contains the patterns of EM instructions to be recognized by the optimizer
and the EM instructions to replace them. Each pattern may have an
optional restriction that must be satisfied before the replacement is made.
The syntax of the table is described using the extended BNF notation
used by
.I LLGen
where:
.DS
.I
	[...]	- are used to group items
	|	- is used to separate alternatives
	;	- terminates a rule
	?	- indicates item is optional
	*	- indicates item is repeated zero or more times
	+	- indicates item is repeated one or more times
.R
.DE
The format of each rule in the table is:
.DS
.I
	rule	: pattern global_restriction? ':' replacement
		;
.R
.DE
Each rule must be on a single line except that it may be broken after the
colon if the next line begins with a tab character.
The pattern has the syntax:
.DS
.I
	pattern	: [ EM_mnem [ local_restriction ]? ]+
		;
	EM_mnem	: "An EM instruction mnemonic"
		| 'lab'
		;
.R
.DE
and consists of a sequence of one or more EM instructions or
.I lab ,
which stands for a defined instruction label. Each EM_mnem may optionally be
followed by a local restriction on the argument of the mnemonic, which takes
one of the following forms depending on the type of the EM instruction it
follows:
.DS
.I
	local_restriction	: normal_restriction
				| opt_arg_restriction
				| ext_arg_restriction
				;
.R
.DE
A normal restriction is used after all types of EM instruction except for
those that allow an optional argument (such as
.I adi )
or those involving external names (such as
.I lae ),
and takes the form:
.DS
.I
	normal_restriction	: [ rel_op ]? expression
				;
	rel_op	: '=='
		| '!='
		| '<='
		| '<'
		| '>='
		| '>'
		;
.R
.DE
If the rel_op is missing, the equality operator
.I ==
is assumed. The general form of an expression is defined later, but
basically it involves simple constants, references to EM_mnem arguments
that appear earlier in the pattern, and operators similar to those used
in C expressions.
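.LP
As a minimal sketch of how a normal restriction could be evaluated once
its expression has been reduced to a value, the fragment below applies a
rel_op to an instruction argument. The names
.I rel_op ,
.I REL_NONE
and
.I normal_restriction_holds
are hypothetical and are not taken from the optimizer sources.
.DS
```c
/* Relational operators allowed in a normal restriction. */
enum rel_op { REL_NONE, REL_EQ, REL_NE, REL_LE, REL_LT, REL_GE, REL_GT };

/* Return 1 if instruction argument arg satisfies "op value", else 0.
 * REL_NONE models a restriction written without a rel_op and is
 * treated as equality, as the text describes. */
int normal_restriction_holds(enum rel_op op, long arg, long value)
{
    switch (op) {
    case REL_NONE:
    case REL_EQ: return arg == value;
    case REL_NE: return arg != value;
    case REL_LE: return arg <= value;
    case REL_LT: return arg <  value;
    case REL_GE: return arg >= value;
    case REL_GT: return arg >  value;
    }
    return 0;   /* unknown operator: treat the restriction as failed */
}
```
.DE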

The restriction after EM instructions like
.I adi ,
whose arguments are optional, takes the form:
.DS
.I
	opt_arg_restriction	: normal_restriction
				| 'defined'
				| 'undefined'
				;
.R
.DE
The
.I defined
and
.I undefined
indicate that the argument is present
or absent respectively. The normal restriction form implies that the
argument is present and satisfies the restriction.
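.LP
The three alternatives can be sketched as follows. The types
.I opt_kind
and
.I opt_arg
and the routine
.I opt_restriction_holds
are hypothetical, and for brevity the normal restriction is limited to
an equality test.
.DS
```c
/* The three forms an opt_arg_restriction can take. */
enum opt_kind { OPT_DEFINED, OPT_UNDEFINED, OPT_NORMAL };

/* An optional instruction argument: value is meaningful only if present. */
struct opt_arg { int present; long value; };

/* Return 1 if the argument satisfies the restriction, else 0.
 * expected is used only for OPT_NORMAL. */
int opt_restriction_holds(enum opt_kind kind, const struct opt_arg *a,
                          long expected)
{
    switch (kind) {
    case OPT_DEFINED:   return a->present;
    case OPT_UNDEFINED: return !a->present;
    case OPT_NORMAL:    /* implies that the argument is present */
        return a->present && a->value == expected;
    }
    return 0;
}
```
.DE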

The restriction after EM instructions like
.I lae ,
whose arguments refer to external objects, takes the form:
.DS
.I
	ext_arg_restriction	: patarg  offset_part?
				;
	offset_part		: [ '+' | '-' ] expression
				;
.R
.DE
Such an argument has one of three forms: an offset with no name, an
offset from a name, or an offset from a label. With no offset part
the restriction requires the argument to be identical to a previous
external argument. With an offset part it requires an identical name
part (either empty, the same name, or the same label) and supplies a
relationship among the offset parts. It is possible to test for the same
external argument or the same name, or to obtain the offset part of an
external argument, using the
.I sameext ,
.I samenam
and
.I offset
functions given below.
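.LP
One way to picture the three forms of external argument and the three
functions is the sketch below. The layout of
.I ext_arg
is an assumption made for illustration and is not the representation
used by the library.
.DS
```c
#include <string.h>

/* The name part of an external argument: absent, a name, or a label. */
enum ext_kind { EXT_NONE, EXT_NAME, EXT_LABEL };

struct ext_arg {
    enum ext_kind kind;
    const char *name;   /* used when kind == EXT_NAME  */
    int label;          /* used when kind == EXT_LABEL */
    long off;           /* the offset part             */
};

/* Same name part: both empty, the same name, or the same label. */
int samenam(const struct ext_arg *a, const struct ext_arg *b)
{
    if (a->kind != b->kind) return 0;
    switch (a->kind) {
    case EXT_NONE:  return 1;
    case EXT_NAME:  return strcmp(a->name, b->name) == 0;
    case EXT_LABEL: return a->label == b->label;
    }
    return 0;
}

/* Same external: same name part and same offset part. */
int sameext(const struct ext_arg *a, const struct ext_arg *b)
{
    return samenam(a, b) && a->off == b->off;
}

/* Yield the offset part of an external argument. */
long offset(const struct ext_arg *a)
{
    return a->off;
}
```
.DE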
.LP
The general form of an expression is:
.DS
.I
	expression	: expression binop expression
			| unaryop expression
			| '(' expression ')'
			| bin_function '(' expression ',' expression ')'
			| ext_function '(' patarg ',' patarg ')'
			| 'offset' '(' patarg ')'
			| patarg
			| 'p'
			| 'w'
			| INTEGER
			;
.R
.DE
.DS
.I
	bin_function	: 'sfit'
			| 'ufit'
			| 'samesign'
			| 'rotate'
			;
.R
.DE
.DS
.I
	ext_function	: 'samenam'
			| 'sameext'
			;
	patarg		: '$' INTEGER
			;
	binop		: "As for C language"
	unaryop		: "As for C language"
.R
.DE
The INTEGER in the
.I patarg
refers to the first, second, etc. argument in the pattern, and it is
required to refer to an argument that appears earlier in the pattern.
The
.I w
and
.I p
refer to the word size and pointer size (in bytes) respectively. The
various functions test for, or yield:
.IP sfit 10
the first argument fits as a signed value of
the number of bits specified by the second argument.
.IP ufit 10
as for sfit but for unsigned values.
.IP samesign 10
the first argument has the same sign as the second.
.IP rotate 10
the value of the first argument rotated by the number of bits specified
by the second argument.
.IP samenam 10
both arguments refer to externals and have either no name, the same name
or same label.
.IP sameext 10
both arguments refer to the same external.
.IP offset 10
the argument is an external and this yields its offset part.
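.LP
The arithmetic tests in this list can be sketched in C as follows.
This is one possible reading of the descriptions above and not the code
found in aux.c: it assumes a bit count between 1 and 62 and treats zero
as non-negative in
.I samesign .
The
.I rotate
function is left out because the text does not state the width or the
direction of the rotation.
.DS
```c
/* sfit: val is representable as a signed value of nbits bits.
 * Assumes 1 <= nbits <= 62. */
int sfit(long long val, int nbits)
{
    long long lo = -(1LL << (nbits - 1));
    long long hi = (1LL << (nbits - 1)) - 1;
    return val >= lo && val <= hi;
}

/* ufit: val is representable as an unsigned value of nbits bits.
 * Assumes 1 <= nbits <= 62. */
int ufit(long long val, int nbits)
{
    return val >= 0 && val <= (1LL << nbits) - 1;
}

/* samesign: a and b are both negative or both non-negative. */
int samesign(long long a, long long b)
{
    return (a < 0) == (b < 0);
}
```
.DE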

.LP
The global restriction takes the form:
.DS
.I
	global_restriction	: '?' expression
				;
.R
.DE
and is used to express restrictions that cannot be expressed as simple
restrictions on a single argument or that can be expressed in a more
readable fashion as a global restriction. An example of such a rule is:
.DS
.I
	dup w ldl stf  ? p==2*w : ldl $2 stf $3 ldl $2 lof $3
.R
.DE
which says that this rule only applies if the pointer size is twice the
word size.
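.LP
For this rule, the local restriction on
.I dup
and the global restriction together amount to a single test, which might
be sketched as below. The routine name and the argument array are
hypothetical; the generated code may combine the tests differently.
.DS
```c
/* Arguments of the instructions matched by "dup ldl stf", in order:
 * arg[0] is $1 (dup), arg[1] is $2 (ldl), arg[2] is $3 (stf).
 * w and p are the word size and pointer size in bytes. */
int dup_ldl_stf_rule_applies(const long arg[3], long w, long p)
{
    int local_ok  = (arg[0] == w);   /* local restriction:  dup w     */
    int global_ok = (p == 2 * w);    /* global restriction: ? p==2*w  */
    return local_ok && global_ok;
}
```
.DE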

.NH
Incompatibilities with Previous Optimizer
.LP
The current table format is not compatible with previous versions of the
peephole optimizer tables. In particular the previous table had no provision
for local restrictions and only the equivalent of the global restriction.
This meant that our
.I '?'
character that announces the presence of the optional global restriction was
not required. The previous optimizer also performed a number of tasks that
were unrelated to optimization and were possible because the old optimizer
read the EM code for a complete procedure at a time. These included
register variable reference counting and moving the information regarding
the number of bytes of local storage required by a procedure from its
.I end
pseudo instruction to its
.I pro
pseudo instruction. These tasks are no longer done by this module but have
been moved to other modules or programs in the pipeline. The register variable
reference counting is now performed by the front end. The reordering of
code, such as the moving of mes instructions and the local storage
requirements from the end to beginning of procedures, is now performed using
the insertpart mechanism in the
.I EM_CODE
(or
.I EM_OPT )
module.
The removal of dead code is performed by the global optimizer.

.NH
The Parser
.LP
The program to parse the tables and build the pattern table dependent dfa
routines is built from the files:
.IP parser.h 15
header file
.IP parser.g 15
LLGen source file defining syntax of table
.IP syntax.l 15
Lex source file defining form of tokens in table.
.IP initlex.c 15
Uses the data in the library
.I em_data.a
to initialize the lexical analyser to recognize EM instruction mnemonics.
.IP outputdfa.c 15
Routines to output the dfa when it has been constructed. It outputs the files
.I dfa.c
and
.I trans.c .
.IP outcalls.c 15
Routines to output the file
.I incalls.r
defined in the next section.
.IP findworst.c 15
Routines to analyze patterns to find how to continue matching after a
successful replacement or failed match.

.LP
The parser checks that the tables conform to the syntax outlined in the
previous section and also makes a number of semantic checks on their
validity. Future versions could make further checks, such as looking for
cycles in the rules or checking that each replacement leaves the same
number of bytes on the stack as the pattern it replaces. The parser
builds an internal dfa representation of the rules by combining rules with
common prefixes. All local and global restrictions are combined into a single
test to be performed after a complete pattern has been detected in the input.
The idea is to build a structure so that each of the patterns can be matched
and then the corresponding tests made; the first that succeeds is replaced.
If two rules have the same pattern and both their tests succeed, the one
that appears first in the tables file is applied. Somewhat less obvious
is that if one pattern is a proper prefix of a longer pattern and its test
succeeds then the longer pattern will not be checked for.
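.LP
Combining rules with common prefixes can be illustrated by a small
table-driven sketch in which each pattern is a sequence of opcode
numbers. The table sizes, the opcode numbering and the names
.I next_state ,
.I state_count
and
.I add_pattern
are assumptions made for the example, and restrictions are ignored.
.DS
```c
#define MAX_STATES  64
#define MAX_OPCODES 8

/* Small opcode numbers used only by this example. */
enum { OP_LOC, OP_ADI, OP_SBI };

/* next_state[s][op] is the state reached from s on opcode op, or 0 if
 * there is none.  State 0 is the start state and is never the target
 * of a transition, so 0 can double as "no transition". */
int next_state[MAX_STATES][MAX_OPCODES];
int state_count = 1;    /* the start state always exists */

/* Add one pattern and return its final state, or -1 if an opcode is
 * out of range or the table is full.  Existing transitions are reused,
 * so patterns with a common prefix share states. */
int add_pattern(const int *ops, int len)
{
    int s = 0;
    for (int i = 0; i < len; i++) {
        int op = ops[i];
        if (op < 0 || op >= MAX_OPCODES) return -1;
        if (next_state[s][op] == 0) {
            if (state_count >= MAX_STATES) return -1;
            next_state[s][op] = state_count++;
        }
        s = next_state[s][op];
    }
    return s;
}
```
.DE
.LP
Adding the opcode sequences loc adi loc adi and loc adi loc sbi in that
order creates six states in total, because the first three transitions
are shared.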

A major task of the parser is to decide on the action to take when a rule has
been partially matched or when a pattern has been completely matched but its
test does not succeed. This requires a search of all patterns to see if any
part of the input matched so far could be part of some other pattern. For
example, consider the two patterns:
.DS
.I
	loc adi w loc adi w : loc $1+$3 adi w
	loc adi w loc sbi w : loc $1-$3 adi w
.R
.DE
If the first pattern fails after seeing the input:
.DS
.I
	loc adi loc
.R
.DE
the parser will still need to check whether the second pattern matches.
This requires a decision on how to fix up any internal data structures in
the dfa matcher, such as moving some instructions from the pattern to the
output queue and moving the pattern along and then deciding what state
it should continue from. Similar decisions are required after a pattern
has been replaced. For example, if the replacement is empty it is necessary
to back up
.I n-1
instructions, where
.I n
is the length of the longest pattern in the tables.
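.LP
The search described above can be approximated by asking how many of the
most recently seen instructions could still begin some pattern. The
sketch below compares opcodes only and ignores restrictions; it is a
simplified illustration and not the algorithm in findworst.c. The names
.I pattern ,
.I is_prefix
and
.I instructions_to_keep
are hypothetical.
.DS
```c
/* Small opcode numbers used only by this example. */
enum { OP_LOC, OP_ADI, OP_SBI, OP_LOL };

struct pattern { const int *ops; int len; };

/* Return 1 if s[0..n-1] equals the first n opcodes of pat. */
static int is_prefix(const int *pat, int pat_len, const int *s, int n)
{
    if (n > pat_len) return 0;
    for (int i = 0; i < n; i++)
        if (pat[i] != s[i]) return 0;
    return 1;
}

/* After the sequence seen[0..seen_len-1] has failed to produce a
 * replacement, return the length of its longest proper suffix that is
 * a prefix of some pattern.  That many instructions stay in the
 * pattern queue; the earlier ones can move to the output queue. */
int instructions_to_keep(const struct pattern *pats, int npats,
                         const int *seen, int seen_len)
{
    for (int keep = seen_len - 1; keep > 0; keep--) {
        const int *suffix = seen + (seen_len - keep);
        for (int p = 0; p < npats; p++)
            if (is_prefix(pats[p].ops, pats[p].len, suffix, keep))
                return keep;
    }
    return 0;
}
```
.DE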

.NH
Structure of the Resulting Library

.LP
The major data structures maintained by the library consist of three queues:
an
.I output
queue of instructions awaiting output, a
.I pattern
queue containing instructions that match the current prefix, and a
.I backup
queue of instructions that have been backed up over and need to be reparsed
for further pattern matches.

.LP
If the parser detects no errors in the tables, it outputs the following
files if they have changed from the existing version of the file:
.IP dfa.c 10
this consists of a routine for each state in the dfa. Each routine contains
a switch statement that decides, on the basis of the current instruction
opcode, the next state (if any) in the dfa to make a transition to.
These routines are called from
.I OO_dfa
declared in
.I OO_dfa.c
via an array
.I OO_fstate
that is indexed by state. Attempts to implement this code as a large nested
switch statement ran into difficulties with compilers that had fixed
limits on the size of switch statements. A better implementation
might be to find some hashing function that maps state and opcode onto a
unique value and then switch on this via an array.
.IP trans.c 10
this contains external declarations of transition routines with names like
.B OO_xxxdotrans
(where
.I xxx
is a small integer).
These are called when there is a transition to state
.I xxx
that corresponds to a
complete pattern. Any tests are performed if necessary to confirm that the
pattern matches and then the replacement instructions are placed on the
output queue and backup and freeing of instructions is performed. If there are
a number of patterns with the same instructions but different tests, these
will all appear in the same routine and the tests are performed in the order
they appear in the original
.I patterns
file.
.IP incalls.r 10
this contains an entry for every EM instruction (plus
.I lab )
giving information on how to build a routine with the name
.BI O_ xxx
for the library version of the module.
If the EM instruction does not appear in the table's
patterns at all then the dfa routine is called to flush any currently queued
output and then the output
.BI C_ xxx
routine is called. If the EM instruction does appear in a pattern then the
instruction data structure is allocated (from the free list), its fields
are initialized and it is added onto the end of the pattern queue.
The dfa routines are then called to attempt to make a transition.
This file is input to the
.I awk
program
.I makefuns.awk .
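.LP
The shape of the generated state routines, one routine per state
selected through an array indexed by state, can be sketched as follows.
The three states recognize the opcode sequence loc adi only; the names
.I state_fn ,
.I fstate
and
.I run_dfa
are hypothetical and the real
.I OO_fstate
array and its routines may have different signatures.
.DS
```c
/* Small opcode numbers used only by this example. */
enum { OP_LOC, OP_ADI, OP_OTHER };

#define NO_STATE (-1)

/* One routine per state: given the current opcode, return the next
 * state, or NO_STATE if there is no transition. */
typedef int (*state_fn)(int opcode);

static int state0(int opcode)
{
    switch (opcode) {
    case OP_LOC: return 1;
    default:     return NO_STATE;
    }
}

static int state1(int opcode)
{
    switch (opcode) {
    case OP_ADI: return 2;
    default:     return NO_STATE;
    }
}

/* State 2 corresponds to the complete pattern and has no successors. */
static int state2(int opcode)
{
    (void)opcode;
    return NO_STATE;
}

/* Table indexed by state, in the spirit of OO_fstate. */
static const state_fn fstate[] = { state0, state1, state2 };

/* Feed opcodes through the table; return the final state, or NO_STATE
 * as soon as some opcode has no transition. */
int run_dfa(const int *ops, int n)
{
    int state = 0;
    for (int i = 0; i < n; i++) {
        state = fstate[state](ops[i]);
        if (state == NO_STATE) break;
    }
    return state;
}
```
.DE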

.LP
The following files contain code that is independent of the pattern tables:
.IP main.c 10
this is used only in the stand alone version of the optimizer and consists
of code to open the input file, read the input using the
.I READ_EM(3)
module and call the dfa routines. This version does not require the routines
constructed from the incalls.r file described above.
.IP nopt.c 10
general routines to initialize and maintain the data structures. The file
handling routines
.I O_open
etc. are defined here. Also defined are routines for flushing the output queue
by calling the
.I EM_mkcalls
routine from the
.I READ_EM(3)
module and moving instructions from the output to the backup queue.
Routines to free the strings stored in instructions
with types of
.I sof_ptyp ,
.I pro_ptyp ,
.I str_ptyp ,
.I ico_ptyp ,
.I uco_ptyp ,
and
.I fco_ptyp
are also defined. These strings are copied to a large array that
is extended by
.I Realloc
if it overflows. The strings can be thrown away on any flush that occurs when
the backup queue is empty.
.IP mkstrct.c 10
contains routines to build the data structure from the input
.BI C_ xxx
routines and place the structure on the pattern queue. These routines are not
required in the stand alone optimizer.
.IP aux.c 10
routines to implement the functions used in the rules.

.LP
The following files are also used in building the module library:
.IP makefuns.awk 10
this
.I awk
program is used to produce individual C files with names like
.BI O_ xxx.c
each containing a single function definition and then call the
.I cc
compiler to produce a single output file.
This enables the loader to only load those routines that are actually
needed when the library is loaded.
.IP pseudo.r 10
this file is like the
.I incalls.r
file produced by the parser but is built by hand and handles the pseudo
EM instructions. It is also processed by
.I makefuns.awk.

.NH
Miscellaneous Issues
.LP
The output and backup queues are maintained in fixed-length arrays
of pointers to the
.I e_instr
data structure used by the
.I READ_EM(3)
module.
The sizes of these queues are fixed by the
values of
.I MAXOUTPUT
and
.I MAXBACKUP
defined in the file
.I nopt.c .
The size of the pattern queue is set to the length of the longest
pattern by the tables output by the parser.
The space for the structures is initially obtained by calls to
.I Malloc
(from the
.I alloc(3)
module),
and freed when the output queue or pattern queue is cleared. These freed
structures are collected on the free list and reused to avoid the overhead
of repeated calls to
.I malloc
and
.I free .
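.LP
A free list of this kind can be sketched in a few lines. The structure
.I instr
below is a stand-in for
.I e_instr ,
the routines
.I instr_alloc
and
.I instr_release
are hypothetical, and plain
.I malloc
is used in place of the
.I alloc(3)
module.
.DS
```c
#include <stdlib.h>

/* Stand-in for the instruction structure; next_free links free entries. */
struct instr {
    int opcode;
    long arg;
    struct instr *next_free;
};

static struct instr *free_list = NULL;

/* Take a structure from the free list if one is available, otherwise
 * obtain a new one from malloc.  Returns NULL if malloc fails. */
struct instr *instr_alloc(void)
{
    struct instr *p = free_list;
    if (p != NULL)
        free_list = p->next_free;
    else
        p = malloc(sizeof *p);
    return p;
}

/* Put a structure back on the free list for later reuse. */
void instr_release(struct instr *p)
{
    if (p == NULL) return;
    p->next_free = free_list;
    free_list = p;
}
```
.DE
.LP
A structure that has been released is handed out again by the next
allocation, so a steady stream of instructions causes no further calls
to
.I malloc .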

.LP
The fixed size of the output and backup queues causes no difficulty in
practice and can only result in some potential optimizations being missed.
When the output queue fills it is simply flushed prematurely, and backups
attempted when the backup queue is full are simply ignored. A possible improvement
would be to flush only part of the output queue when it fills. It should
be noted that it is not possible to statically determine the maximum possible
size for these queues as they need to be unbounded in the worst case. Consider
the rule
.DS
.I
	inc dec :
.R
.DE
With input consisting of
.I N
.I inc
instructions followed by
.I N
.I dec
instructions, an output queue length of
.I N-1
is required to find all possible replacements.
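.LP
The effect of the queue length on this example can be checked with a
small simulation. It is a simplified model and not the library code: it
assumes that one
.I inc
is held as the current pattern prefix, that a full output queue is
flushed completely, and that after each replacement the matcher backs up
over one instruction. The routine name
.I surviving_instructions
is hypothetical.
.DS
```c
/* Simulate the rule "inc dec :" on n inc instructions followed by
 * n dec instructions, with an output queue of at most cap entries.
 * Returns the number of instructions that survive to the output. */
int surviving_instructions(int n, int cap)
{
    int queued = 0;   /* inc instructions waiting in the output queue   */
    int pending = 0;  /* 1 if an inc is held as the pattern prefix      */
    int emitted = 0;  /* instructions flushed, no longer replaceable    */

    for (int i = 0; i < n; i++) {        /* the inc instructions */
        if (pending) {
            if (cap == 0) {
                emitted++;               /* nowhere to queue the old inc */
            } else {
                if (queued == cap) {     /* queue full: premature flush */
                    emitted += queued;
                    queued = 0;
                }
                queued++;                /* old inc moves to the queue */
            }
        }
        pending = 1;                     /* the new inc becomes the prefix */
    }
    for (int i = 0; i < n; i++) {        /* the dec instructions */
        if (pending) {
            pending = 0;                 /* "inc dec" replaced by nothing */
            if (queued > 0) {            /* back up over one instruction */
                queued--;
                pending = 1;
            }
        } else {
            emitted++;                   /* unmatched dec goes straight out */
        }
    }
    return emitted + queued + pending;
}
```
.DE
.LP
Under these assumptions, five
.I inc
and five
.I dec
instructions cancel completely with a queue of four entries, while
shorter queues let some instructions through.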