591 lines
		
	
	
	
		
			17 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			591 lines
		
	
	
	
		
			17 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
| .\" $Header$
 | |
| .TL
 | |
| A Tour of the New Peephole Optimizer
 | |
| .AU
 | |
| B. J. McKenzie
 | |
| .NH
 | |
| Introduction
 | |
| .LP
 | |
| The peephole optimizer consists of four major parts:
 | |
| .IP a)
 | |
| the table describing the optimization to be performed
 | |
| .IP b)
 | |
| a program to parse these tables and build input and output routines to
 | |
| interface to the library and a dfa based routine to recognize patterns and
 | |
| make the requested replacements.
 | |
| .IP c)
 | |
| common routines for the library that are independent of the table of a)
 | |
| .IP d)
 | |
| a stand alone version of the optimizer.
 | |
| .LP
 | |
| The library conforms to the
 | |
| .I EM_CODE(3)
 | |
| module interface but with routine names of the form
 | |
| .BI C_ xxx
 | |
| replaced by names like
 | |
| .BI O_ xxx.
 | |
| Furthermore there is also no routine
 | |
| .I O_getid
 | |
| and no variable
 | |
| .I O_tmpdir
 | |
| in the module.
 | |
| The library module results in calls to the usual
 | |
| .I EM_CODE(3)
 | |
| module. It is possible to write a front end so that it can call either the
 | |
| normal
 | |
| .I EM_CODE(3)
 | |
| module or this new module by adding
 | |
| .B
 | |
| #define PEEPHOLE
 | |
| .R
 | |
| before the line
 | |
| .B
 | |
| #include <em.h>
 | |
| .R
 | |
| This will map all calls to the routine
 | |
| .BI C_ xxx
 | |
| into a call to the routine
 | |
| .BI O_ xxx.
 | |
| 
 | |
| .LP
 | |
| We shall now describe each of these major parts in some detail.
 | |
| 
 | |
| .NH
 | |
| The optimization table
 | |
| .LP
 | |
| The file
 | |
| .I patterns
 | |
| contains the patterns of EM instructions  to be recognized by the optimizer
 | |
| and the EM instructions to replace them. Each pattern may have an
 | |
| optional restriction that must be satisfied before the replacement is made.
 | |
| The syntax of the table will be described using extended BNF notation
 | |
| used by
 | |
| .I LLGen
 | |
| where:
 | |
| .DS
 | |
| .I
 | |
| 	[...]	- are used to group items
 | |
| 	|	- is used to separate alternatives
 | |
| 	;	- terminates a rule
 | |
| 	?	- indicates item is optional
 | |
| 	*	- indicates item is repeated zero or more times
 | |
| 	+	- indicates item is repeated one or more times
 | |
| .R
 | |
| .DE
 | |
| The format of each rule in the table is:
 | |
| .DS
 | |
| .I
 | |
| 	rule	: pattern global_restriction? ':' replacement
 | |
| 		;
 | |
| .R
 | |
| .DE
 | |
| Each rule must be on a single line except that it may be broken after the
 | |
| colon if the next line begins with a tab character.
 | |
| The pattern has the syntax:
 | |
| .DS
 | |
| .I
 | |
| 	pattern	: [ EM_mnem [ local_restriction ]? ]+
 | |
| 		;
 | |
| 	EM-mnem : "An EM instruction mnemonic"
 | |
| 		| 'lab'
 | |
| 		;
 | |
| .R
 | |
| .DE
 | |
| and consists of a sequence of one or more EM instructions or
 | |
| .I lab
 | |
| which stands for a defined instruction label. Each EM-mnem may optionally be
 | |
| followed by a local restriction on the argument of the mnemonic and take
 | |
| one of the following forms depending on the type of the EM instruction it
 | |
| follows:
 | |
| .DS
 | |
| .I
 | |
| 	local_restriction	: normal_restriction
 | |
| 				| opt_arg_restriction
 | |
| 				| ext_arg_restriction
 | |
| 				;
 | |
| .R
 | |
| .DE
 | |
| A normal restriction is used after all types of EM instruction except for
 | |
| those that allow an optional argument, (such as
 | |
| .I adi
 | |
| ) or those involving external names, (such as
 | |
| .I lae
 | |
| )
 | |
| and takes the form:
 | |
| .DS
 | |
| .I
 | |
| 	normal_restriction	: [ rel_op ]? expression
 | |
| 				;
 | |
| 	rel_op	: '=='
 | |
| 		| '!='
 | |
| 		| '<='
 | |
| 		| '<'
 | |
| 		| '>='
 | |
| 		| '>'
 | |
| 		;
 | |
| .R
 | |
| .DE
 | |
| If the rel_op is missing, the equality
 | |
| .I ==
 | |
| operator is assumed. The general form of expression is defined later but
 | |
| basically it involves simple constants, references to EM_mnem arguments
 | |
| that appear earlier in the pattern and expressions similar to those used
 | |
| in C expressions.
 | |
| 
 | |
| The form of the restriction after those EM instructions like
 | |
| .I adi
 | |
| whose arguments are optional takes the form:
 | |
| .DS
 | |
| .I
 | |
| 	opt_arg_restriction	: normal_restriction
 | |
| 				| 'defined'
 | |
| 				| 'undefined'
 | |
| 				;
 | |
| .R
 | |
| .DE
 | |
| The
 | |
| .I defined
 | |
| and
 | |
| .I undefined
 | |
| indicate that the argument is present
 | |
| or absent respectively. The normal restriction form implies that the
 | |
| argument is present and satisfies the restriction.
 | |
| 
 | |
| The form of the restriction after those EM instructions like
 | |
| .I lae
 | |
| whose arguments refer to external object take the form:
 | |
| .DS
 | |
| .I
 | |
| 	ext_arg_restriction	: patarg  offset_part?
 | |
| 				;
 | |
| 	offset_part		: [ '+' | '-' ] expression
 | |
| 				;
 | |
| .R
 | |
| .DE
 | |
| Such an argument has one of three forms: a offset with no name, an
 | |
| offset form a name or an offset from a label. With no offset part
 | |
| the restriction requires the argument to be identical to a previous
 | |
| external argument. With an offset part it requires an identical name
 | |
| part, (either empty, same name or same label) and supplies a relationship
 | |
| among the offset parts. It is possible to refer to test for the same
 | |
| external argument, the same name or to obtain the offset part of an external
 | |
| argument using the
 | |
| .I sameext
 | |
| ,
 | |
| .I samenam
 | |
| and
 | |
| .I offset
 | |
| functions given below.
 | |
| .LP
 | |
| The general form of an expression is:
 | |
| .DS
 | |
| .I
 | |
| 	expression	: expression binop expression
 | |
| 			| unaryop expression
 | |
| 			| '(' expression ')'
 | |
| 			| bin_function '(' expression ',' expression ')'
 | |
| 			| ext_function '(' patarg ',' patarg ')'
 | |
| 			| 'offset' '(' patarg ')'
 | |
| 			| patarg
 | |
| 			| 'p'
 | |
| 			| 'w2'
 | |
| 			| 'w'
 | |
| 			| INTEGER
 | |
| 			;
 | |
| .R
 | |
| .DE
 | |
| .DS
 | |
| .I
 | |
| 	bin_function	: 'sfit'
 | |
| 			| 'ufit'
 | |
| 			| 'samesign'
 | |
| 			| 'rotate'
 | |
| 			;
 | |
| .R
 | |
| .DE
 | |
| .DS
 | |
| .I
 | |
| 	ext_function	: 'samenam'
 | |
| 			| 'sameext'
 | |
| 			;
 | |
| 	patarg		: '$' INTEGER
 | |
| 			;
 | |
| 	binop		: "As for C language"
 | |
| 	unaryop		: "As for C language"
 | |
| .R
 | |
| .DE
 | |
| The INTEGER in the
 | |
| .I patarg
 | |
| refers to the first, second, etc. argument in the pattern and it is
 | |
| required to refer to a pattern that appears earlier in the pattern
 | |
| The
 | |
| .I w
 | |
| and
 | |
| .I p
 | |
| refer to the word size and pointer size (in bytes) respectively.
 | |
| The
 | |
| .I w2
 | |
| refers to twice the word size.
 | |
| The
 | |
| various function test for:
 | |
| .IP sfit 10
 | |
| the first argument fits as a signed value of
 | |
| the number of bit specified by the second argument.
 | |
| .IP ufit 10
 | |
| as for sfit but for unsigned values.
 | |
| .IP samesign 10
 | |
| the first argument has the same sign as the second.
 | |
| .IP rotate 10
 | |
| the value of the first argument rotated by the number of bit specified
 | |
| by the second argument.
 | |
| .IP samenam 10
 | |
| both arguments refer to externals and have either no name, the same name
 | |
| or same label.
 | |
| .IP sameext 10
 | |
| both arguments refer to the same external.
 | |
| .IP offset 10
 | |
| the argument is an external and this yields it offset part.
 | |
| 
 | |
| .LP
 | |
| The global restriction takes the form:
 | |
| .DS
 | |
| .I
 | |
| 	global_restriction	: '?' expression
 | |
| 				;
 | |
| .R
 | |
| .DE
 | |
| and is used to express restrictions that cannot be expressed as simple
 | |
| restrictions on a single argument or are can be expressed in a more
 | |
| readable fashion as a global restriction. An example of such a rule is:
 | |
| .DS
 | |
| .I
 | |
| 	dup w ldl stf  ? p==2*w : ldl $2 stf $3 ldl $2 lof $3
 | |
| .R
 | |
| .DE
 | |
| which says that this rule only applies if the pointer size is twice the
 | |
| word size.
 | |
| 
 | |
| .NH
 | |
| Incompatibilities with Previous Optimizer
 | |
| .LP
 | |
| The current table format is not compatible with previous versions of the
 | |
| peephole optimizer tables. In particular the previous table had no provision
 | |
| for local restrictions and only the equivalent of the global restriction.
 | |
| This meant that our
 | |
| .I '?'
 | |
| character that announces the presence of the optional global restriction was
 | |
| not required. The previous optimizer performed a number of other tasks that
 | |
| were unrelated to optimization that were possible because the old optimizer
 | |
| read the EM code for a complete procedure at a time. This included tasks such
 | |
| as register variable reference counting and moving the information regarding
 | |
| the number of bytes of local storage required by a procedure from it
 | |
| .I end
 | |
| pseudo instruction to it's
 | |
| .I pro
 | |
| pseudo instruction. These tasks are no longer done by this module but have
 | |
| been moved to other modules or programs in the pipeline. The register variable
 | |
| reference counting is now performed by the front end. The reordering of
 | |
| code, such as the moving of mes instructions and the local storage
 | |
| requirements from the end to beginning of procedures, is now performed using
 | |
| the insertpart mechanism in the
 | |
| .I EM_CODE
 | |
| (or
 | |
| .I EM_OPT
 | |
| ) module.
 | |
| The removal of dead code is performed by the global optimizer.
 | |
| Various
 | |
| .I ext_functions
 | |
| available in the old tables are no longer available as they rely on
 | |
| information that is not available to the current program.
 | |
| These are the
 | |
| .I notreg
 | |
| and the
 | |
| .I rom
 | |
| functions.
 | |
| The previous optimizer allowed the use of
 | |
| .I LLP,
 | |
| .I LEP,
 | |
| .I SLP
 | |
| and
 | |
| .I SEP
 | |
| in patterns. For example
 | |
| .I LLP
 | |
| stood for either
 | |
| .I lol
 | |
| if the pointer size was the same as the word size, or for
 | |
| .I ldl
 | |
| if the pointer size was twice the word size.
 | |
| In the current optimizer it is necessary to include two patterns for each
 | |
| such single pattern in the old table. For example for a pattern containing
 | |
| .I LLP
 | |
| there would be one pattern with
 | |
| .I lol
 | |
| and with a global restriction of the form
 | |
| .I p=w
 | |
| and another pattern with ldl and a global restriction of the form
 | |
| .I p=2*w.
 | |
| 
 | |
| .NH
 | |
| The Parser
 | |
| .LP
 | |
| The program to parse the tables and build the pattern table dependent dfa
 | |
| routines is built from the files:
 | |
| .IP parser.h 15
 | |
| header file
 | |
| .IP parser.g 15
 | |
| LLGen source file defining syntax of table
 | |
| .IP syntax.l 15
 | |
| Lex sources file defining form of tokens in table.
 | |
| .IP initlex.c 15
 | |
| Uses the data in the library
 | |
| .I em_data.a
 | |
| to initialize the lexical analyzer to recognize EM instruction mnemonics.
 | |
| .IP outputdfa.c 15
 | |
| Routines to output the dfa when it has been constructed. It outputs the files
 | |
| .I dfa.c
 | |
| and
 | |
| .I trans.c
 | |
| .IP outcalls.c 15
 | |
| Routines to output the file
 | |
| .I incalls.r
 | |
| defined in the next section.
 | |
| .IP findworst.c 15
 | |
| Routines to analyze patterns to find how to continue matching after a
 | |
| successful replacement or failed match.
 | |
| 
 | |
| .LP
 | |
| The parser checks that the tables conform to the syntax outlined in the
 | |
| previous section and also makes a number of semantic checks on their
 | |
| validity. Further versions could make further checks such as looking for
 | |
| cycles in the rules or checking that each replacement leaves the same
 | |
| number of bytes on the stack as the pattern it replaces. The parser
 | |
| builds an internal dfa representation of the rules by combining rules with
 | |
| common prefixes. All local and global restrictions are combined into a single
 | |
| test to be performed are a complete pattern has been detected in the input.
 | |
| The idea is to build a structure so that each of the patterns can be matched
 | |
| and then the corresponding tests made and the first that succeeds is replaced.
 | |
| If two rules have the same pattern and both their tests also succeed the one
 | |
| that appears first in the tables file will be done. Somewhat less obvious
 | |
| is that if one pattern is a proper prefix of a longer pattern and its test
 | |
| succeeds then the second pattern will not be checked for.
 | |
| 
 | |
| A major task of the parser if to decide on the action to take when a rule has
 | |
| been partially matched or when a pattern has been completely matched but its
 | |
| test does not succeed. This requires a search of all patterns to see if any
 | |
| part of the part matched could be part of some other pattern. for example
 | |
| given the two patterns:
 | |
| .DS
 | |
| .I
 | |
| 	loc adi w loc adi w : loc $1+$3 adi w
 | |
| 	loc adi w loc sbi w : loc $1-$3 adi w
 | |
| .R
 | |
| .DE
 | |
| If the first pattern fails after seeing the input:
 | |
| .DS
 | |
| .I
 | |
| 	loc adi loc
 | |
| .R
 | |
| .DE
 | |
| the parser will still need to check whether the second pattern matches.
 | |
| This requires a decision on how to fix up any internal data structures in
 | |
| the dfa matcher, such as moving some instructions from the pattern to the
 | |
| output queue and moving the pattern along and then deciding what state
 | |
| it should continue from. Similar decisions  are requires after a pattern
 | |
| has been replaced. For example if the replacement is empty it is necessary
 | |
| to backup
 | |
| .I n-1
 | |
| instructions where
 | |
| .I n
 | |
| is the length of the longest pattern in the tables.
 | |
| 
 | |
| .NH
 | |
| Structure of the Resulting Library
 | |
| 
 | |
| .LP
 | |
| The major data structures maintained by the library consist of three queues;
 | |
| an
 | |
| .I output
 | |
| queue of instructions awaiting output, a
 | |
| .I pattern
 | |
| queue containing instructions that match the current prefix, and a
 | |
| .I backup
 | |
| queue of instructions that have been backed up over and need to be reparsed
 | |
| for further pattern matches.
 | |
| These three queues are maintained in a single fixed size buffer as explained
 | |
| in more detail in the next section.
 | |
| Also, after a successful match, a replacement queue is constructed.
 | |
| 
 | |
| 
 | |
| .LP
 | |
| If no errors are detected by the parser in the tables it output the following
 | |
| files if they have changed from the existing version of the file:
 | |
| .IP dfa.c 10
 | |
| this contains the dfa encoded into a number of arrays using the technique
 | |
| of row displacement for compacted sparse matricies. Given an opcode and
 | |
| the current state, the value of
 | |
| .I OO_base[OO_state]
 | |
| is consulted to obtain a pointer into the array
 | |
| .I OO_checknext.
 | |
| If this pointer in zero or the
 | |
| .I check
 | |
| field of the addressed structure does
 | |
| not correspond to the curerent state then it is known there is no entry for
 | |
| this opcode/state pair and the
 | |
| .I OO_default
 | |
| array is consulted instead.
 | |
| If the check field does match then the
 | |
| .I next
 | |
| field contains the new state.
 | |
| After each transition the array
 | |
| .I OO_ftrans
 | |
| is consulted to see if this state corresponds to a final state
 | |
| (i.e. a complete pattern) and if so the corresponding function is called.
 | |
| .IP trans.c 10
 | |
| this contains external declarations of transition routines with names like
 | |
| .B OO_xxxdotrans
 | |
| (where
 | |
| .I xxx
 | |
| is a small integer).
 | |
| These are called when there a transition to state
 | |
| .I xxx
 | |
| that corresponds to a
 | |
| complete pattern. Any tests are performed if necessary to confirm that the
 | |
| pattern matches and then the replacement instructions are placed on the
 | |
| output queue and the routine
 | |
| .I OO_mkrepl
 | |
| is called to make the replacement and if backup the amount required.
 | |
| If there are a number of patterns with the same instructions but different
 | |
| tests, these will all appear in the same routine and the tests performed in
 | |
| the order they appear in the original
 | |
| .I patterns
 | |
| file.
 | |
| .IP incalls.r 10
 | |
| this contains an entry for every EM instruction (plus
 | |
| .I lab
 | |
| ) giving information on how to build a routine with the name
 | |
| .BI O_ xxx
 | |
| for the library version of the module.
 | |
| If the EM instruction does not appear in the tables
 | |
| patterns at all then the dfa routine is called to flush any current queued
 | |
| output and the the output
 | |
| .BI C_ xxx
 | |
| routine is called. If the EM instruction does appear in a pattern then the
 | |
| instruction data structure fields are
 | |
| initialized and it is added onto the end of the pattern queue.
 | |
| The dfa routines are then called to attempted to make a transition.
 | |
| This file is input to the
 | |
| .I awk
 | |
| program
 | |
| .I makefuns.awk.
 | |
| 
 | |
| .LP
 | |
| The following files contain code that is independent of the pattern tables:
 | |
| .IP main.c 10
 | |
| this is used only in the stand alone version of the optimizer and consists
 | |
| of code to open the input file, read the input using the
 | |
| .I READ_EM(3)
 | |
| module and call the dfa routines. This version does not require the routines
 | |
| constructed from the incalls.r file described above.
 | |
| .IP nopt.c 10
 | |
| general routines to initialize, and maintain the data structures. The file
 | |
| handling routines
 | |
| .I O_open
 | |
| etc are defined here. Also defined are routines for flushing the output queue
 | |
| by calling the
 | |
| .I EM_mkcalls
 | |
| routine from the
 | |
| .I READ_EM(3)
 | |
| module and moving instructions from the output to the backup queue.
 | |
| Routines to free the strings stored in instructions
 | |
| with types of
 | |
| .I sof_ptyp,
 | |
| .I pro_ptyp,
 | |
| .I str_ptyp,
 | |
| .I ico_ptyp,
 | |
| .I uco_ptyp,
 | |
| and
 | |
| .I fco_ptyp are also defined. These strings are copied to a large array that
 | |
| is extended by
 | |
| .I Realloc
 | |
| if it overflows. The strings can be thrown away on any flush that occurs when
 | |
| the backup queue is empty.
 | |
| .IP mkstrct.c 10
 | |
| contains routines to build the data structure from the input
 | |
| .BI C_ xxx
 | |
| routines and place the structure on the pattern queue. These routines are also
 | |
| used to build the data structures when a replacement is constructed.
 | |
| .IP aux.c 10
 | |
| routines to implement the external functions used in the pattern table.
 | |
| 
 | |
| .LP
 | |
| The following files are also used in building the module library:
 | |
| .IP makefuns.awk 10
 | |
| this
 | |
| .I awk
 | |
| program is used to produce individual C files with names like
 | |
| .BI O_ xxx.c
 | |
| each containing a single function definition and then call the
 | |
| .I cc
 | |
| compiler to produce a single output file.
 | |
| This enables the loader to only load those routines that are actually
 | |
| needed when the library is loaded.
 | |
| .IP pseudo.r 10
 | |
| this file is like the
 | |
| .I incalls.r
 | |
| file produced by the parser but is built by hand and handles the pseudo
 | |
| EM instructions. It is also processed by
 | |
| .I makefuns.awk.
 | |
| 
 | |
| .NH
 | |
| Miscellaneous Issues
 | |
| .LP
 | |
| The output, pattern and backup queues are maintained in fixed length array,
 | |
| .I OO_buffer
 | |
| allocated of size
 | |
| .I MAXBUFFER
 | |
| (a constant declared in nopt.h) at run time.
 | |
| It consists of an array of the
 | |
| .I e_instr
 | |
| data structure used by the
 | |
| .I READ_EM(3)
 | |
| module.
 | |
| At any time the pointers
 | |
| .I OO_patternqueue
 | |
| and
 | |
| .I OO_nxtpatt
 | |
| point to the beginning and end of the current pattern prefix that corresponds
 | |
| to the current state. Any instructions on the backup queue are between
 | |
| .I OO_nxtpatt
 | |
| and
 | |
| .I OO_endbackup.
 | |
| If there are no instructions on the backup queue then
 | |
| .I OO_endbackup
 | |
| will be 0 (zero).
 | |
| The size of the replacement queue is set to the length of the maximum
 | |
| replacement length by the tables output by the parser.
 | |
| 
 | |
| .LP
 | |
| The fixed size of the buffer causes no difficulty in
 | |
| practice and can only result in some potential optimizations being missed.
 | |
| When space for a new instruction is required and the buffer is full the
 | |
| routine
 | |
| .I OO_halfflush
 | |
| is called to flush half the buffer and move all the data structures left.
 | |
| It should be noted that it is not possible to statically determine the
 | |
| maximum possible size for these queues as they need to be unbounded in
 | |
| the worst case.
 | |
| A study of the rule
 | |
| .DS
 | |
| .I
 | |
| 	inc dec :
 | |
| .R
 | |
| .DE
 | |
| with the input consisting of
 | |
| .I N
 | |
| .I inc
 | |
| and then
 | |
| .I N
 | |
| .I dec
 | |
| instructions requires an output queue length of
 | |
| .I N-1
 | |
| to find all possible replacements.
 |