diff --git a/doc/Makefile b/doc/Makefile index b735a3b83..ab14cc969 100644 --- a/doc/Makefile +++ b/doc/Makefile @@ -14,7 +14,7 @@ RESFILES= \ peep.$(SUF) cg.$(SUF) ncg.$(SUF) regadd.$(SUF) LLgen.$(SUF) \ basic.$(SUF) crefman.$(SUF) pcref.$(SUF) val.$(SUF) \ 6500.$(SUF) i80.$(SUF) z80.$(SUF) top.$(SUF) ego.$(SUF) \ - m68020.$(SUF) occam.$(SUF) + m68020.$(SUF) occam.$(SUF) nopt.$(SUF) .SUFFIXES: .doc .$(SUF) diff --git a/doc/nopt.doc b/doc/nopt.doc new file mode 100644 index 000000000..03ccb02a6 --- /dev/null +++ b/doc/nopt.doc @@ -0,0 +1,543 @@ +.\" $Header$ +.TL +A Tour of the New Peephole Optimizer +.AU +B. J. McKenzie +.NH +Introduction +.LP +The peephole optimizer consists of four major parts: +.IP a) +the table describing the optimization to be performed +.IP b) +a program to parse these tables and build input and output routines to +interface to the library and a dfa based routine to recognize patterns and +make the requested replacements. +.IP c) +common routines for the library that are independent of the table of a) +.IP d) +a stand alone version of the optimizer. +.LP +The library conforms to the +.I EM_CODE(3) +module interface but with routine names of the form +.BI C_ xxx +replaced by names like +.BI O_ xxx. +Furthermore there is also routine +.I O_getid +and no variable +.I O_tmpdir +in the module. +The library module results in calls to the usual +.I EM_CODE(3) +module. It is possible to write a front end so that it can call either the +normal +.I EM_CODE(3) +module or this new module by adding +.B +#define PEEPHOLE +.R +before the line +.B +#include +.R +This will map all calls to the routine +.BI C_ xxx +into a call to the routine +.BI O_ xxx. + +.LP +We shall now describe each of these major parts in some detail. + +.NH +The optimization table +.LP +The file +.I patterns +contains the patterns of EM instructions to be recognized by the optimizer +and the EM instructions to replace them. 
Each pattern may have an +optional restriction that must be satisfied before the replacement is made. +The syntax of the table will be described using extended BNF notation +used by +.I LLGen +where: +.DS +.I + [...] - are used to group items + | - is used to separate alternatives + ; - terminates a rule + ? - indicates item is optional + * - indicates item is repeated zero or more times + + - indicates item is repeated one or more times +.R +.DE +The format of each rule in the table is: +.DS +.I + rule : pattern global_restriction? ':' replacement + ; +.R +.DE +Each rule must be on a single line except that it may be broken after the +colon if the next line begins with a tab character. +The pattern has the syntax: +.DS +.I + pattern : [ EM_mnem [ local_restriction ]? ]+ + ; + EM-mnem : "An EM instruction mnemonic" + | 'lab' + ; +.R +.DE +and consists of a sequence of one or more EM instructions or +.I lab +which stands for a defined instruction label. Each EM-mnem may optionally be +followed by a local restriction on the argument of the mnemonic and take +one of the following forms depending on the type of the EM instruction it +follows: +.DS +.I + local_restriction : normal_restriction + | opt_arg_restriction + | ext_arg_restriction + ; +.R +.DE +A normal restriction is used after all types of EM instruction except for +those that allow an optional argument, (such as +.I adi +) or those involving external names, (such as +.I lae +) +and takes the form: +.DS +.I + normal_restriction : [ rel_op ]? expression + ; + rel_op : '==' + | '!=' + | '<=' + | '<' + | '>=' + | '>' + ; +.R +.DE +If the rel_op is missing, the equality +.I == +operator is assumed. The general form of expression is defined later but +basically it involves simple constants, references to EM_mnem arguments +that appear earlier in the pattern and expressions similar to those used +in C expressions. 
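A minimal sketch of how a normal restriction could be evaluated once its expression has been reduced to a value. It is written in Python for brevity rather than in the C of the optimizer, and the names `REL_OPS` and `satisfies` are hypothetical; they do not come from the optimizer sources. The only behaviour taken from the text above is the set of relational operators and the rule that a missing operator means equality.

```python
import operator

# The six relational operators allowed by the rel_op rule.
REL_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    "<": operator.lt,
    ">=": operator.ge,
    ">": operator.gt,
}


def satisfies(argument, expression_value, rel_op=None):
    """Check one normal restriction: [ rel_op ]? expression.

    `argument` is the argument of the EM instruction being matched and
    `expression_value` is the already evaluated restriction expression.
    When no rel_op is given, '==' is assumed, as the table format specifies.
    """
    return REL_OPS[rel_op or "=="](argument, expression_value)
```

For instance, `satisfies(4, 4)` applies the implied equality test, while `satisfies(4, 8, "<")` checks that the argument is less than the expression value.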
+ +The restriction after those EM instructions like +.I adi +whose arguments are optional takes the form: +.DS +.I + opt_arg_restriction : normal_restriction + | 'defined' + | 'undefined' + ; +.R +.DE +The +.I defined +and +.I undefined +indicate that the argument is present +or absent respectively. The normal restriction form implies that the +argument is present and satisfies the restriction. + +The restriction after those EM instructions like +.I lae +whose arguments refer to external objects takes the form: +.DS +.I + ext_arg_restriction : patarg offset_part? + ; + offset_part : [ '+' | '-' ] expression + ; +.R +.DE +Such an argument has one of three forms: an offset with no name, an +offset from a name or an offset from a label. With no offset part +the restriction requires the argument to be identical to a previous +external argument. With an offset part it requires an identical name +part, (either empty, same name or same label) and supplies a relationship +among the offset parts. It is possible to test for the same +external argument, the same name or to obtain the offset part of an external +argument using the +.I sameext +, +.I samenam +and +.I offset +functions given below. +.LP +The general form of an expression is: +.DS +.I + expression : expression binop expression + | unaryop expression + | '(' expression ')' + | bin_function '(' expression ',' expression ')' + | ext_function '(' patarg ',' patarg ')' + | 'offset' '(' patarg ')' + | patarg + | 'p' + | 'w' + | INTEGER + ; +.R +.DE +.DS +.I + bin_function : 'sfit' + | 'ufit' + | 'samesign' + | 'rotate' + ; +.R +.DE +.DS +.I + ext_function : 'samenam' + | 'sameext' + ; + patarg : '$' INTEGER + ; + binop : "As for C language" + unaryop : "As for C language" +.R +.DE +The INTEGER in the +.I patarg +refers to the first, second, etc.
argument in the pattern and it is +required to refer to an argument that appears earlier in the pattern. +The +.I w +and +.I p +refer to the word size and pointer size (in bytes) respectively. The +various functions test for: +.IP sfit 10 +the first argument fits as a signed value of +the number of bits specified by the second argument. +.IP ufit 10 +as for sfit but for unsigned values. +.IP samesign 10 +the first argument has the same sign as the second. +.IP rotate 10 +the value of the first argument rotated by the number of bits specified +by the second argument. +.IP samenam 10 +both arguments refer to externals and have either no name, the same name +or same label. +.IP sameext 10 +both arguments refer to the same external. +.IP offset 10 +the argument is an external and this yields its offset part. + +.LP +The global restriction takes the form: +.DS +.I + global_restriction : '?' expression + ; +.R +.DE +and is used to express restrictions that cannot be expressed as simple +restrictions on a single argument or that can be expressed in a more +readable fashion as a global restriction. An example of such a rule is: +.DS +.I + dup w ldl stf ? p==2*w : ldl $2 stf $3 ldl $2 lof $3 +.R +.DE +which says that this rule only applies if the pointer size is twice the +word size. + +.NH +Incompatibilities with Previous Optimizer +.LP +The current table format is not compatible with previous versions of the +peephole optimizer tables. In particular the previous table had no provision +for local restrictions and only the equivalent of the global restriction. +This meant that our +.I '?' +character that announces the presence of the optional global restriction was +not required. The previous optimizer performed a number of other tasks that +were unrelated to optimization and that were possible because the old optimizer +read the EM code for a complete procedure at a time.
This included tasks such +as register variable reference counting and moving the information regarding +the number of bytes of local storage required by a procedure from its +.I end +pseudo instruction to its +.I pro +pseudo instruction. These tasks are no longer done by this module but have +been moved to other modules or programs in the pipeline. The register variable +reference counting is now performed by the front end. The reordering of +code, such as the moving of mes instructions and the local storage +requirements from the end to the beginning of procedures, is now performed using +the insertpart mechanism in the +.I EM_CODE +(or +.I EM_OPT +) module. +The removal of dead code is performed by the global optimizer. + +.NH +The Parser +.LP +The program to parse the tables and build the pattern table dependent dfa +routines is built from the files: +.IP parser.h 15 +header file +.IP parser.g 15 +LLGen source file defining syntax of table +.IP syntax.l 15 +Lex source file defining form of tokens in table. +.IP initlex.c 15 +Uses the data in the library +.I em_data.a +to initialize the lexical analyser to recognize EM instruction mnemonics. +.IP outputdfa.c 15 +Routines to output the dfa when it has been constructed. It outputs the files +.I dfa.c +and +.I trans.c +.IP outcalls.c 15 +Routines to output the file +.I incalls.r +defined in the next section. +.IP findworst.c 15 +Routines to analyze patterns to find how to continue matching after a +successful replacement or failed match. + +.LP +The parser checks that the tables conform to the syntax outlined in the +previous section and also makes a number of semantic checks on their +validity. Future versions could make further checks such as looking for +cycles in the rules or checking that each replacement leaves the same +number of bytes on the stack as the pattern it replaces. The parser +builds an internal dfa representation of the rules by combining rules with +common prefixes.
All local and global restrictions are combined into a single +test to be performed after a complete pattern has been detected in the input. +The idea is to build a structure so that each of the patterns can be matched +and then the corresponding tests made and the first that succeeds is replaced. +If two rules have the same pattern and both their tests also succeed the one +that appears first in the tables file will be applied. Somewhat less obvious +is that if one pattern is a proper prefix of a longer pattern and its test +succeeds then the second pattern will not be checked for. + +A major task of the parser is to decide on the action to take when a rule has +been partially matched or when a pattern has been completely matched but its +test does not succeed. This requires a search of all patterns to see if any +part of the matched portion could be part of some other pattern. For example, +given the two patterns: +.DS +.I + loc adi w loc adi w : loc $1+$3 adi w + loc adi w loc sbi w : loc $1-$3 adi w +.R +.DE +If the first pattern fails after seeing the input: +.DS +.I + loc adi loc +.R +.DE +the parser will still need to check whether the second pattern matches. +This requires a decision on how to fix up any internal data structures in +the dfa matcher, such as moving some instructions from the pattern to the +output queue and moving the pattern along and then deciding what state +it should continue from. Similar decisions are required after a pattern +has been replaced. For example if the replacement is empty it is necessary +to back up +.I n-1 +instructions where +.I n +is the length of the longest pattern in the tables.
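The idea of combining rules with common prefixes can be sketched as follows. This is an illustration in Python, not the parser's actual C code: the function `build_dfa` and its data layout are hypothetical, restrictions are ignored, and the fallback actions computed for failed or partial matches are not modelled at all.

```python
def build_dfa(patterns):
    """Merge patterns that share a prefix into one state machine.

    `patterns` is a list of mnemonic sequences in table order.
    Returns (transitions, accepting) where transitions[state][mnemonic]
    gives the next state and accepting[state] lists the rules completed
    on reaching that state, in the order they appear in the table.
    """
    transitions = [{}]  # state 0 is the start state
    accepting = {}
    for rule_no, pattern in enumerate(patterns):
        state = 0
        for mnem in pattern:
            if mnem not in transitions[state]:
                # No existing prefix covers this mnemonic: add a new state.
                transitions.append({})
                transitions[state][mnem] = len(transitions) - 1
            state = transitions[state][mnem]
        accepting.setdefault(state, []).append(rule_no)
    return transitions, accepting


# The two example patterns, with their restrictions left out.
patterns = [
    ["loc", "adi", "loc", "adi"],
    ["loc", "adi", "loc", "sbi"],
]
transitions, accepting = build_dfa(patterns)
```

In this sketch both rules share the states reached by `loc adi loc` and diverge only on the final mnemonic, so the input is scanned once for both. Rules with identical patterns end up in the same accepting state, listed in table order, which mirrors the rule that the first one whose test succeeds is applied.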
+ +.NH +Structure of the Resulting Library + +.LP +The major data structures maintained by the library consist of three queues: +an +.I output +queue of instructions awaiting output, a +.I pattern +queue containing instructions that match the current prefix, and a +.I backup +queue of instructions that have been backed up over and need to be reparsed +for further pattern matches. + +.LP +If no errors are detected by the parser in the tables it outputs the following +files if they have changed from the existing version of the file: +.IP dfa.c 10 +this consists of a routine for each state in the dfa. Each routine contains +a switch statement that decides on the basis of the current instruction +opcode the next state if any in the dfa to make a transition to. +These routines are called from +.I OO_dfa +declared in +.I OO_dfa.c +via an array +.I OO_fstate +that is indexed by state. An attempt to implement this code as one large nested +switch statement ran into difficulties with compilers that had fixed +limits on the size of switch statements. A better implementation of this +might be to find some hashing function that mapped state and opcode onto a +unique value and then switch on this via an array. +.IP trans.c 10 +this contains external declarations of transition routines with names like +.B OO_xxxdotrans +(where +.I xxx +is a small integer). +These are called when there is a transition to state +.I xxx +that corresponds to a +complete pattern. Any tests are performed if necessary to confirm that the +pattern matches and then the replacement instructions are placed on the +output queue and backup and freeing of instructions is performed. If there are +a number of patterns with the same instructions but different tests, these +will all appear in the same routine and the tests are performed in the order they +appear in the original +.I patterns +file.
+.IP incalls.r 10 +this contains an entry for every EM instruction (plus +.I lab +) giving information on how to build a routine with the name +.BI O_ xxx +for the library version of the module. +If the EM instruction does not appear in the table's +patterns at all then the dfa routine is called to flush any currently queued +output and then the output +.BI C_ xxx +routine is called. If the EM instruction does appear in a pattern then the +instruction data structure is allocated (from the free list), its fields +are initialized and it is added onto the end of the pattern queue. +The dfa routines are then called to attempt to make a transition. +This file is input to the +.I awk +program +.I makefuns.awk. + +.LP +The following files contain code that is independent of the pattern tables: +.IP main.c 10 +this is used only in the stand alone version of the optimizer and consists +of code to open the input file, read the input using the +.I READ_EM(3) +module and call the dfa routines. This version does not require the routines +constructed from the incalls.r file described above. +.IP nopt.c 10 +general routines to initialize and maintain the data structures. The file +handling routines +.I O_open +etc are defined here. Also defined are routines for flushing the output queue +by calling the +.I EM_mkcalls +routine from the +.I READ_EM(3) +module and moving instructions from the output to the backup queue. +Routines to free the strings stored in instructions +with types of +.I sof_ptyp, +.I pro_ptyp, +.I str_ptyp, +.I ico_ptyp, +.I uco_ptyp, +and +.I fco_ptyp +are also defined. These strings are copied to a large array that +is extended by +.I Realloc +if it overflows. The strings can be thrown away on any flush that occurs when +the backup queue is empty. +.IP mkstrct.c 10 +contains routines to build the data structure from the input +.BI C_ xxx +routines and place the structure on the pattern queue. These routines are not +required in the stand alone optimizer.
+.IP aux.c 10 +routines to implement the functions used in the rules. + +.LP +The following files are also used in building the module library: +.IP makefuns.awk 10 +this +.I awk +program is used to produce individual C files with names like +.BI O_ xxx.c +each containing a single function definition and then call the +.I cc +compiler to produce a single output file. +This enables the loader to only load those routines that are actually +needed when the library is loaded. +.IP pseudo.r 10 +this file is like the +.I incalls.r +file produced by the parser but is built by hand and handles the pseudo +EM instructions. It is also processed by +.I makefuns.awk. + +.NH +Miscellaneous Issues +.LP +The output and backup queues are maintained as fixed length arrays +of pointers to the +.I e_instr +data structure used by the +.I READ_EM(3) +module. +The sizes of these queues are fixed according to the +values of +.I MAXOUTPUT +and +.I MAXBACKUP +defined in the file +.I nopt.c. +The size of the pattern queue is set to the maximum pattern +length by the tables output by the parser. +The space for the structures is initially obtained by calls to +.I Malloc +(from the +.I alloc(3) +module), +and freed when the output queue or pattern queue is cleared. These freed +structures are collected on the free list and reused to avoid the overheads +of repeated calls to +.I malloc +and +.I free. + +.LP +The fixed size of the output and backup queues causes no difficulty in +practice and can only result in some potential optimizations being missed. +When the output queue fills it is simply prematurely flushed and backups +attempted when the backup queue is full are simply ignored. A possible improvement +would be to flush only part of the output queue when it fills. It should +be noted that it is not possible to statically determine the maximum possible +size for these queues as they need to be unbounded in the worst case.
A +study of the rule +.DS +.I + inc dec : +.R +.DE +with the input consisting of +.I N +.I inc +and then +.I N +.I dec +instructions requires an output queue length of +.I N-1 +to find all possible replacements.
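The effect of the queue bound on this rule can be illustrated with a deliberately simplified model, written in Python rather than taken from the optimizer. The function `simulate` and its parameter `max_output` are hypothetical. The model assumes a single pending instruction in place of the pattern queue, a flush of the whole output queue as soon as it would exceed its limit, and a backup of one instruction after each empty replacement.

```python
def simulate(n, max_output):
    """Feed n 'inc' then n 'dec' through the rule 'inc dec :' (empty result).

    Returns (emitted, replaced): the instructions that reach the output
    and the number of times the rule was applied.
    """
    emitted = []    # flushed instructions, no longer available for matching
    output = []     # bounded output queue that can still be backed up over
    pending = None  # an 'inc' held as a partial match of the pattern
    replaced = 0
    for instr in ["inc"] * n + ["dec"] * n:
        if instr == "inc":
            if pending is not None:
                # The older 'inc' cannot start this match; queue it for output.
                output.append(pending)
                if len(output) > max_output:
                    emitted.extend(output)  # premature flush loses candidates
                    output.clear()
            pending = "inc"
        elif pending == "inc":
            # 'inc dec' matched: replace by nothing and back up one instruction.
            replaced += 1
            pending = output.pop() if output else None
        else:
            emitted.extend(output)
            output.clear()
            emitted.append(instr)
    if pending is not None:
        output.append(pending)
    emitted.extend(output)
    return emitted, replaced
```

Under these assumptions, `simulate(5, 4)` removes all five pairs and emits nothing, whereas `simulate(5, 3)` flushes four `inc` instructions early, applies the rule only once and emits eight instructions. This agrees with the bound of N-1 stated above.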