521 lines
		
	
	
	
		
			16 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			521 lines
		
	
	
	
		
			16 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
| .\" $Header$
 | |
| .TL
 | |
| Internal documentation on the peephole optimizer
 | |
| .br
 | |
| from the Amsterdam Compiler Kit
 | |
| .NH 1
 | |
| Introduction
 | |
| .PP
 | |
| Part of the Amsterdam Compiler Kit is a program to do
 | |
| peephole optimization on an EM program.
 | |
| The optimizer scans the program to match patterns from a table
 | |
| and if found makes the optimization from the table,
 | |
| and with the result of the optimization
 | |
| it tries to find yet another optimization
 | |
| continuing until no more optimizations are found.
 | |
| .PP
 | |
| Furthermore it does some optimizations that can not be called
 | |
| peephole optimizations for historical reasons,
 | |
| like branch chaining and the deletion of unreachable code.
 | |
| .PP
 | |
| The peephole optimizer consists of three parts
 | |
| .IP 1)
 | |
| A driving table
 | |
| .IP 2)
 | |
| A program translating the table to internal format
 | |
| .IP 3)
 | |
| C code compiled with the table to make the optimizer proper
 | |
| .PP
 | |
| In this document the table format, internal format and 
 | |
| data structures in the optimizer will be explained,
 | |
| plus a hint on what the code does where it might not be obvious.
 | |
| It is a simple program mostly.
 | |
| .NH 1
 | |
| Table format
 | |
| .PP
 | |
| The driving table consists of pattern/replacement pairs,
 | |
| in principle one per line,
 | |
| although a line starting with white space is considered
 | |
| a continuation line for the previous.
 | |
| The general format is:
 | |
| .DS
 | |
| optimization : pattern ':' replacement '\en'
 | |
| .sp
 | |
| pattern : EMlist optional_boolean_expression
 | |
| .sp
 | |
| replacement : EM_plus_operand_list
 | |
| .DE
 | |
| Example of a simple one
 | |
| .DS
 | |
| loc stl $1==0 : zrl $2
 | |
| .DE
 | |
| There is no real limit for the length of the pattern or the replacement,
 | |
| the replacement might even be longer than the pattern,
 | |
| and expressions can be made arbitrarily complicated.
 | |
| .PP
 | |
| The expressions in the table are made of the following pieces:
 | |
| .IP -
 | |
| Integer constants
 | |
| .IP -
 | |
| $\fIn\fP, standing for the operand of the \fIn\fP'th EM
 | |
| instruction in the pattern,
 | |
| undefined if that instruction has no operand.
 | |
| .IP -
 | |
| w, standing for the wordsize of the code optimized.
 | |
| .IP -
 | |
| p, for the pointersize.
 | |
| .IP -
 | |
| defined(expr), true if expression is defined
 | |
| .IP -
 | |
| samesign(expr,expr), true if expressions have the same sign.
 | |
| .IP -
 | |
| sfit(expr,expr), ufit(expr,expr),
 | |
| true if the first expression fits signed or unsigned in the number
 | |
| of bits given in the second expression.
 | |
| .IP -
 | |
| rotate(expr,expr),
 | |
| first expression rotated left the number of bits given by the second expression.
 | |
| .IP -
 | |
| notreg(expr),
 | |
| true if the local with the expression as number is not a candidate to put
 | |
| in a register.
 | |
| .IP -
 | |
| rom(\fIn\fP,expr), contents of the rom descriptor at index expr that
 | |
| is associated with the global label that should be the argument of
 | |
| the \fIn\fP'th EM instruction.
 | |
| Undefined if such a thing does not exist.
 | |
| .PP
 | |
| The usual arithmetic operators may be used on integer values,
 | |
| if any operand is undefined the expression is undefined,
 | |
| except for the defined() function above.
 | |
| An undefined expression used for its truth value is false.
 | |
| All arithmetic on local label operands is forbidden,
 | |
| only things allowed are tests for equality.
 | |
| Arithmetic on global labels makes sense,
 | |
| i.e. one can add a global label and a constant,
 | |
| but not two global labels.
 | |
| .PP
 | |
| In the table one can use five additional EM instructions in patterns.
 | |
| These are:
 | |
| .IP lab
 | |
| Stands for a local label
 | |
| .IP LLP
 | |
| Load Local Pointer, translates into a 
 | |
| .B lol
 | |
| or into a 
 | |
| .B ldl
 | |
| depending on the relationship between wordsize and pointersize.
 | |
| .IP LEP
 | |
| Load External Pointer, translates into a 
 | |
| .B loe
 | |
| or into a 
 | |
| .B lde .
 | |
| .IP SLP
 | |
| Store Local Pointer,
 | |
| .B stl
 | |
| or 
 | |
| .B sdl .
 | |
| .IP SEP
 | |
| Store External Pointer,
 | |
| .B ste
 | |
| or
 | |
| .B sde .
 | |
| .PP
 | |
| There is only one peephole optimizer,
 | |
| so the substitutions to be made for the last four instructions
 | |
| are made at run time before the first optimizations are made.
 | |
| .NH 1
 | |
| Internal format
 | |
| .PP
 | |
| The translating program,
 | |
| .I mktab
 | |
| converts the table into an array of bytes where all
 | |
| patterns follow unaligned.
 | |
| Format of a pattern is:
 | |
| .IP 1)
 | |
| One byte for high byte of hash value,
 | |
| will be explained later on.
 | |
| .IP 2)
 | |
| Two bytes for the index of the next pattern in a chain.
 | |
| .IP 3)
 | |
| An integer\u*\d,
 | |
| .FS
 | |
| * An integer is encoded as a byte when less than 255,
 | |
| otherwise as a byte containing 255 followed by two
 | |
| bytes with the real value.
 | |
| .FE
 | |
| pattern length.
 | |
| .IP 4)
 | |
| The list of pattern opcodes, one per byte.
 | |
| .IP 5)
 | |
| An integer expression index, 0 if not used.
 | |
| .IP 6)
 | |
| An integer, replacement length.
 | |
| .IP 7)
 | |
| A list of pairs consisting of a one byte opcode and an integer
 | |
| expression index.
 | |
| .PP
 | |
| The expressions are kept in an array of triples,
 | |
| implementing a binary tree.
 | |
| The
 | |
| .I mktab
 | |
| program tries to minimize the number of triples by reusing
 | |
| duplicates and even reverses the operands of commutative operators
 | |
| when doing so would spare a triple.
 | |
| .NH 1
 | |
| A tour through the sources
 | |
| .PP
 | |
| Now we will walk through the sources and note things of interest.
 | |
| .NH 2
 | |
| The header files
 | |
| .PP
 | |
| The header files are the place where data structures and options reside.
 | |
| .NH 3
 | |
| alloc.h
 | |
| .PP
 | |
| In the header file alloc.h several defines can be used to select various
 | |
| kinds of core allocation schemes.
 | |
| This is important on small machines like the PDP-11 since a complete
 | |
| procedure must be in core at the same space,
 | |
| and the peephole optimizer should not be the limiting factor in
 | |
| determining the maximum size of procedures if possible.
 | |
| Options are:
 | |
| .IP -
 | |
| USEMALLOC, standard malloc() and free() are used instead of the own
 | |
| core allocation package.
 | |
| Not recommended unless the own package does not work on some bizarre
 | |
| machine.
 | |
| .IP -
 | |
| COREDEBUG, prints large amounts of information about core management.
 | |
| Not recommended unless the code is changed and it stops working.
 | |
| .IP -
 | |
| SEPID, defining this will add an extra procedure that will
 | |
| go through a lot of work to scrape the last bytes together if the
 | |
| system won't provide more.
 | |
| This is not a good idea if memory is scarce and code and data reside
 | |
| in the same spaces, since the room used by the procedure might well
 | |
| be more than the room saved.
 | |
| .IP -
 | |
| STACKROOM, number of shorts used in stack space.
 | |
| This is used if memory is scarce and stack space and data space are
 | |
| different.
 | |
| On the PDP-11 a UNIX process starts with an 8K stack segment which
 | |
| cannot be transferred to the data segment.
 | |
| Under these conditions one can use a lot of the stack space for storage.
 | |
| .NH 3
 | |
| assert.h
 | |
| .PP
 | |
| Just defines the assert macro.
 | |
| When compiled with -DNDEBUG all asserts will be off.
 | |
| .NH 3
 | |
| ext.h
 | |
| .PP
 | |
| Gives external definitions of variables used by more than one module.
 | |
| .NH 3
 | |
| line.h
 | |
| .PP
 | |
| Defines the structures used to keep instructions,
 | |
| one structure per line of EM code,
 | |
| and the structure to keep arguments of pseudos,
 | |
| one structure per argument.
 | |
| Both structures essentially contain a pointer to the next,
 | |
| a type,
 | |
| and a union containing information depending on the type.
 | |
| Core is allocated only for the part of the union used.
 | |
| .PP
 | |
| The 
 | |
| .I
 | |
| struct line
 | |
| .R
 | |
| has a very compact encoding for small integers,
 | |
| they are encoded in the type field.
 | |
| On the PDP-11 this gives a line structure of only 4 bytes for most
 | |
| instructions.
 | |
| .NH 3
 | |
| lookup.h
 | |
| .PP
 | |
| Contains definition of the struct used for symbol table management,
 | |
| global labels and procedure names are kept in one table.
 | |
| .NH 3
 | |
| optim.h
 | |
| .PP
 | |
| If one defines the DIAGOPT option in this header file,
 | |
| for every optimization performed a number is written on stderr.
 | |
| The number gives the number of the pattern in the table
 | |
| or one of the four special numbers in this header file.
 | |
| .NH 3
 | |
| param.h
 | |
| .PP
 | |
| Contains one settable option,
 | |
| LONGOFF.
 | |
| If this is not defined the optimizer can only optimize programs
 | |
| with wordsize 2 and pointersize 2.
 | |
| Set this only if it must be run on a Z80 or something pathetic like that.
 | |
| .PP
 | |
| Other defines here should not be touched.
 | |
| .NH 3
 | |
| pattern.h
 | |
| .PP
 | |
| Contains defines of indices in a pattern,
 | |
| definition of the expression triples,
 | |
| definitions of the various expression operators
 | |
| and definition of the result struct where expression results are put.
 | |
| .PP
 | |
| This header file is the main one that is also included by
 | |
| .I mktab .
 | |
| .NH 3
 | |
| proinf.h
 | |
| .PP
 | |
| This one contains definitions 
 | |
| for the local label table structs
 | |
| and for the struct where all information for one procedure is kept.
 | |
| This is in one struct so it can be saved easily when recursive
 | |
| procedures have to be resolved.
 | |
| .NH 3
 | |
| tes.h
 | |
| .PP
 | |
| Contains the data structure used by the top element size computation.
 | |
| .NH 3
 | |
| types.h
 | |
| .PP
 | |
| Collection of typedefs to be used by almost all modules.
 | |
| .NH 2
 | |
| The C code itself.
 | |
| .PP
 | |
| The C code will now be the center of our attention.
 | |
| We will make a walk through the sources and we will try
 | |
| to follow the sources in a logical order.
 | |
| So we will start at
 | |
| .NH 3
 | |
| main.c
 | |
| .PP
 | |
| The main.c module contains the main() function.
 | |
| Here nothing spectacular happens,
 | |
| only thing of interest is the handling of flags:
 | |
| .IP -L
 | |
| This is an instruction to the peephole optimizer to perform
 | |
| one of its auxiliary functions, the generation of a library module.
 | |
| This makes the peephole optimizer write its output on a temporary file,
 | |
| and at the end making the real output by first generating a list
 | |
| of exported symbols and then copying the temporary file behind it.
 | |
| .IP -n
 | |
| Disables all optimization.
 | |
| Only thing the optimizer does now is filling in the blank after the
 | |
| .I END
 | |
| pseudo and resolving recursive procedures.
 | |
| .PP
 | |
| The place where main() is left is the call to getlines() which brings
 | |
| us to
 | |
| .NH 3
 | |
| getline.c
 | |
| .PP
 | |
| This module reads the EM code and constructs a list of 
 | |
| .I
 | |
| struct line
 | |
| .R
 | |
| records,
 | |
| linked together backwards,
 | |
| i.e. the first instruction read is the last in the list.
 | |
| Pseudos are handled here also,
 | |
| for most pseudos this just means that a chain of argument records
 | |
| is linked into the linked line list but some pseudos get special attention:
 | |
| .IP exc
 | |
| This pseudo is acted upon right away.
 | |
| Lines read are shuffled around according to instruction.
 | |
| .IP mes
 | |
| Some messages are acted upon.
 | |
| These are:
 | |
| .RS
 | |
| .IP ms_err 8
 | |
| The input is drained, just in case it is a pipe.
 | |
| After that the optimizer exits.
 | |
| .IP ms_opt
 | |
| The do not optimize flag is set.
 | |
| Acts just like -n on the command line.
 | |
| .IP ms_emx
 | |
| The word- and pointersize are read,
 | |
| complain if we are not able to handle this.
 | |
| .IP ms_reg
 | |
| We take notice of the offset of this local.
 | |
| See also comments in the description of peephole.c
 | |
| .RE
 | |
| .IP pro
 | |
| A new procedure starts, if we are already in one save the status,
 | |
| else process collected input.
 | |
| Collect information about this procedure and if already in a procedure
 | |
| call getlines() recursively.
 | |
| .IP end
 | |
| Process collected input.
 | |
| .PP
 | |
| The phrase "process collected input" is used twice,
 | |
| which brings us to
 | |
| .NH 3
 | |
| process.c
 | |
| .PP
 | |
| This module contains the entry point process() which is called at any
 | |
| time the collected input must be processed.
 | |
| It calls a variety of other routines to get the real work done.
 | |
| Routines in this module are in chronological order:
 | |
| .IP symknown 12
 | |
| Marks all symbols seen until now as known,
 | |
| i.e. it is now known whether their scope is local or global.
 | |
| This information is used again during output.
 | |
| .IP symvalue
 | |
| Runs through the chain of pseudos to give values to data labels.
 | |
| This needs an extra pass.
 | |
| It cannot be done during the getlines pass, since an
 | |
| .B exc
 | |
| pseudo could destroy things.
 | |
| Nor can it be done during the backward pass since it is impossible
 | |
| to do good fragment numbering backward.
 | |
| .IP checklocs
 | |
| Checks whether all local labels referenced are defined.
 | |
| It needs to be sure about this since otherwise the
 | |
| semi global optimizations made cannot work.
 | |
| .IP relabel
 | |
| This routine finds the final destination for each label in the procedure.
 | |
| Labels followed by unconditional branches or other labels are marked during
 | |
| the peephole fase and this leeds to chains of identical labels.
 | |
| These chains are followed here, and in the local label table each label
 | |
| has associated with it its replacement label, after this procedure is run.
 | |
| Care is taken in this routine to prevent a loop in the program to
 | |
| cause the optimizer to loop.
 | |
| .IP cleanlocals
 | |
| This routine empties the local label table after everything
 | |
| is processed.
 | |
| .PP
 | |
| But before this can all be done,
 | |
| the backward linked list of instructions first has to be reversed,
 | |
| so here comes
 | |
| .NH 3
 | |
| backward.c
 | |
| .PP
 | |
| The routine backward has a number of functions:
 | |
| .IP -
 | |
| It reverses the backward linked list, making two forward linked lists,
 | |
| one for the instructions and one for the pseudos.
 | |
| .IP -
 | |
| It notes the last occurrence of data labels in the backward linked list
 | |
| and puts it in the global symbol table.
 | |
| This is of course the first occurence in the procedure.
 | |
| This information is needed to decide whether the symbols are global
 | |
| or local to this module.
 | |
| .IP -
 | |
| It decides about the fragment boundaries of data blocks.
 | |
| Fragments are numbered backwards starting at 3.
 | |
| This is done to be able to make the type of an expression
 | |
| containing a symbol equal to its fragment.
 | |
| This type can then not clash with the types integer and local label.
 | |
| .IP -
 | |
| It allocates a rom buffer to every data label with a rom behind
 | |
| it, if that rom contains only plain integers at the start.
 | |
| .PP
 | |
| The first thing done after process() has called backward() and some
 | |
| of its own little routines is a call to the real routine,
 | |
| the one that does the work the program was written for
 | |
| .NH 3
 | |
| peephole.c
 | |
| .PP
 | |
| The first routines in peephole.c 
 | |
| implement a linked list for the offsets of local variables
 | |
| that are candidates for a register implementation.
 | |
| Several patterns use the notreg() function,
 | |
| since it is forbidden to combine a load of that variable
 | |
| with the load of another and
 | |
| it is not allowed to take the address of that variable.
 | |
| .PP
 | |
| The routine peephole hashes the patterns the first time it is called
 | |
| after which it doesn't do much more than calling optimize.
 | |
| But first hashpatterns().
 | |
| .PP
 | |
| The patterns are hashed at run time of the optimizer because of
 | |
| the
 | |
| .B LLP ,
 | |
| .B LEP ,
 | |
| .B SLP 
 | |
| and
 | |
| .B SEP
 | |
| instructions added to the instruction set in this optimizer.
 | |
| These are first replaced everywhere in the table by the correct
 | |
| replacement after which the first three instructions of the
 | |
| pattern are hashed and the pattern is linked into one of the
 | |
| 256 linked lists.
 | |
| There is a define CHK_HASH in this module that
 | |
| can be set if the randomness of the hashing
 | |
| function is not trusted.
 | |
| .PP
 | |
| The attention now shifts to optimize().
 | |
| This routine calls  basicblock() for every piece of code between two labels.
 | |
| It also notes which labels have another label or a branch behind them
 | |
| so the relabel() routine from process.c can do something with that.
 | |
| .PP
 | |
| Basicblock() keeps making passes over its basic block
 | |
| until no more optimizations are found.
 | |
| This might be inefficient if there is a long basicblock with some
 | |
| deep recursive optimization in one part of it.
 | |
| The entire basic block is then scanned a lot of times just for
 | |
| that one piece.
 | |
| The alternative is backing up after making an optimization and running
 | |
| through the same code again, but that is difficult
 | |
| in a single linked list.
 | |
| .PP
 | |
| It hashes instructions and calls trypat() for every pattern that has
 | |
| a full hash value match,
 | |
| i.e. lower byte and upper byte equal.
 | |
| Longest pattern is tried first.
 | |
| .PP
 | |
| Trypat() checks length and opcodes of the pattern.
 | |
| If correct it fills the iargs[] array with argument values
 | |
| and calculates the expression.
 | |
| If that is also correct the work shifts to tryrepl().
 | |
| .PP
 | |
| Tryrepl() generates the list of replacement instructions,
 | |
| links it into the list and returns true.
 | |
| Why then the name tryrepl() if it always succeeds?
 | |
| Well, there is a mechanism in the optimizer,
 | |
| unused until today that makes it possible to do optimizations that cannot
 | |
| be described by the table.
 | |
| It is possible to give a number as a replacement which will cause the
 | |
| optimizer to call a routine special() to do some work.
 | |
| This routine might decide not to do an optimization and return false.
 | |
| .PP
 | |
| The last routine that is called from process() is putline()
 | |
| to write the optimized code, bringing us to
 | |
| .NH 3
 | |
| tes.c
 | |
| .PP
 | |
| Contains the routines used by the top element size computation phase,
 | |
| which is run after the peephole-optimisation.
 | |
| The main routine of tes.c is tes_instr().  This looks at an instruction and
 | |
| decides the size of the element on top of the stack after the instruction
 | |
| is executed.  When a label is defined or used, the size of the top element
 | |
| is remembered for later use.  When the information in consistent throuhout
 | |
| the procedure, it is passed to the code generator by means of an ms_tes
 | |
| message.
 | |
| .NH 3
 | |
| putline.c
 | |
| .PP
 | |
| The major part of putline.c is the standard set of routines
 | |
| that makes EM compact code.
 | |
| The extra functions performed are:
 | |
| .IP -
 | |
| For every occurence of a global symbol it might be necessary to
 | |
| output a 
 | |
| .B exa ,
 | |
| .B exp ,
 | |
| .B ina
 | |
| or 
 | |
| .B inp
 | |
| pseudo instruction.
 | |
| That task is performed.
 | |
| .IP -
 | |
| The
 | |
| .B lin
 | |
| instructions are optimized here,
 | |
| .B lni
 | |
| instructions added for 
 | |
| .B lin
 | |
| instructions and superfluous
 | |
| .B lin
 | |
| instructions deleted.
 | |
| 
 |