Initial revision

ceriel 1987-03-03 10:59:52 +00:00
parent 4d4c8b45fb
commit 004f017550
30 changed files with 3903 additions and 0 deletions

doc/ego/ic/ic1
@@ -0,0 +1,57 @@
.bp
.NH
The Intermediate Code and the IC phase
.PP
In this chapter the intermediate code of the EM global optimizer
will be defined.
The 'Intermediate Code construction' phase (IC),
which builds the initial intermediate code from
EM Compact Assembly Language,
will be described.
.NH 2
Introduction
.PP
The EM global optimizer is a multi-pass program,
hence there is a need for an intermediate code.
Usually, programs in the Amsterdam Compiler Kit use the
Compact Assembly Language format
.[~[
keizer architecture
.], section 11.2]
for this purpose.
Although this code has some convenient features,
such as being compact,
it is quite unsuitable in our case,
for a number of reasons.
First, the code lacks global information
about whole procedures or whole basic blocks.
Second, it uses identifiers ('names') to bind
defining and applied occurrences of
procedures, data labels and instruction labels.
Although this is usual in high level programming
languages, it is awkward in an intermediate code
that must be read many times.
Each pass of the optimizer would have
to incorporate an identifier look-up mechanism
to associate a defining occurrence with each
applied occurrence of an identifier.
Finally, EM programs declare blocks of bytes,
rather than variables. A 'hol 6' instruction may be used to
declare three 2-byte variables.
Clearly, the optimizer wants to deal with variables, and
not with rows of bytes.
.PP
To overcome these problems, we have developed a new
intermediate code.
This code does not merely consist of the EM instructions,
but also contains global information in the
form of tables and graphs.
Before describing the intermediate code we will
first digress to outline
the problems one generally encounters
when trying to store complex data structures such as
graphs outside the program, i.e. in a file.
We trust this will enhance the
comprehensibility of the
intermediate code definition and the design and implementation
of the IC phase.

doc/ego/ic/ic2
@@ -0,0 +1,146 @@
.NH 2
Representation of complex data structures in a sequential file
.PP
Most programmers are quite used to dealing with
complex data structures, such as
arrays, graphs and trees.
There are some particular problems that occur
when storing such a data structure
in a sequential file.
We call data that is kept in
main memory
.UL internal,
as opposed to
.UL external
data
that is kept in a file outside the program.
.sp
We assume that a simple data item of a
scalar type (integer, floating point number)
has some known external representation.
An
.UL array
having elements of a scalar type can be represented
externally easily, by successively
representing its elements.
The external representation may be preceded by a
number, giving the length of the array.
Now, consider a linear, singly linked list,
the elements of which look like:
.DS
record
data: scalar_type;
next: pointer_type;
end;
.DE
It is important to note that the "next"
fields of the elements only have a meaning within
main memory.
The field contains the address of some location in
main memory.
If a list element is written to a file in
some program,
and read by another program,
the element will be allocated at a different
address in main memory.
Hence this address value is completely
useless outside the program.
.sp
One may represent the list by ignoring these "next" fields
and storing the data items in the order they are linked.
The "next" fields are represented \fIimplicitly\fR.
When the file is read again,
the same list can be reconstructed.
In order to know where the external representation of the
list ends,
it may be useful to put the length of
the list in front of it.
.sp
Note that arrays and linear lists have the
same external representation.
.PP
A doubly linked, linear list,
with elements of the type:
.DS
record
data: scalar_type;
next,
previous: pointer_type;
end
.DE
can be represented in precisely the same way.
Both the "next" and the "previous" fields are represented
implicitly.
.PP
Next, consider a binary tree,
the nodes of which have type:
.DS
record
data: scalar_type;
left,
right: pointer_type;
end
.DE
Such a tree can be represented sequentially,
by storing its nodes in some fixed order, e.g. prefix order.
A special null data item may be used to
denote a missing left or right son.
For example, let the scalar type be integer,
and let the null item be 0.
Then the tree of fig. 3.1(a)
can be represented as in fig. 3.1(b).
.DS
              4
       9             12
    12     3      4      6
          8 1      5    1

Fig. 3.1(a) A binary tree

4 9 12 0 0 3 8 0 0 1 0 0 12 4 0 5 0 0 6 1 0 0 0

Fig. 3.1(b) Its sequential representation
.DE
We are still able to represent the pointer fields ("left"
and "right") implicitly.
.PP
Finally, consider a general
.UL graph,
where each node has a "data" field and
pointer fields,
with no restriction on where they may point to.
Now we're at the end of our tale.
There is no way to represent the pointers implicitly,
as we did with lists and trees.
In order to represent them explicitly,
we use the following scheme.
Every node gets an extra field,
containing some unique number that identifies the node.
We call this number its
.UL id.
A pointer is represented externally as the id of the node
it points to.
When reading the file we use a table that maps
an id to the address of its node.
In general this table will not be completely filled in
until we have read the entire external representation of
the graph and allocated internal memory locations for
every node.
Hence we cannot reconstruct the graph in one scan.
That is, there may be a pointer from node A to node B,
where B is placed later in the sequential file than A.
When we read node A we cannot map the id of B
to the address of node B,
as we have not yet allocated node B.
We can overcome this problem if the size
of every node is known in advance.
In this case we can allocate memory for a node
on first reference.
Otherwise, the mapping from id to pointer
cannot be done while reading nodes.
The mapping can be done either in an extra scan
or at every reference to the node.
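.PP
To make this concrete, here is a minimal sketch in C
(purely illustrative; the node layout, the external format and
all names are assumptions, not part of the optimizer sources)
that reconstructs a graph in one scan.
As the size of a node is known in advance,
a node can be allocated on the first reference to its id:
.DS
#include <stdio.h>
#include <stdlib.h>

#define MAXID 1000

struct node {
        int data;
        struct node *left, *right;
};

static struct node *map[MAXID]; /* maps id -> internal address */

/* Return the internal node for an id, allocating it on first
 * reference; id 0 encodes a nil pointer. */
static struct node *
get_node(int id)
{
        if (id == 0)
                return 0;
        if (map[id] == 0)
                map[id] = calloc(1, sizeof(struct node));
        return map[id];
}

/* Assumed external format per node: id, data, left id, right id */
static void
read_graph(FILE *f)
{
        int id, data, l, r;

        while (fscanf(f, "%d %d %d %d", &id, &data, &l, &r) == 4) {
                struct node *n = get_node(id);

                n->data = data;
                n->left = get_node(l);
                n->right = get_node(r);
        }
}
.DE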

doc/ego/ic/ic3
@@ -0,0 +1,414 @@
.NH 2
Definition of the intermediate code
.PP
The intermediate code of the optimizer consists
of several components:
.IP -
the object table
.IP -
the procedure table
.IP -
the em code
.IP -
the control flow graphs
.IP -
the loop table
.LP
.PP
These components are described in
the next sections.
The syntactic structure of every component
is described by a set of context free syntax rules,
with the following conventions:
.DS
x               a non-terminal symbol
A               a terminal symbol (in capitals)
x: a b c;       a grammar rule
a | b           a or b
(a)+            1 or more occurrences of a
{a}             0 or more occurrences of a
.DE
.NH 3
The object table
.PP
EM programs declare blocks of bytes rather than (global) variables.
A typical program may declare 'HOL 7780'
to allocate space for 8 I/O buffers,
2 large arrays and 10 scalar variables.
The optimizer wants to deal with
.UL objects
like variables, buffers and arrays
and certainly not with huge numbers of bytes.
Therefore the intermediate code contains information
about which global objects are used.
This information can be obtained from an EM program
by just looking at the operands of instructions
such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
.PP
The object table consists of a list of
.UL datablock
entries.
Each such entry represents a declaration like HOL, BSS,
CON or ROM.
There are five kinds of datablock entries.
The fifth kind,
UNKNOWN, denotes a declaration in a
separately compiled file that is not made
available to the optimizer.
Each datablock entry contains the type of the block,
its size, and a description of the objects that
belong to it.
If it is a rom,
it also contains a list of values given
as arguments to the rom instruction,
provided that this list contains only integer numbers.
An object has an offset (within its datablock)
and a size.
The size need not always be determinable.
Both datablock and object contain a unique
identifying number
(see previous section for their use).
.DS
.UL syntax
object_table:
        {datablock} ;
datablock:
        D_ID            -- unique identifying number
        PSEUDO          -- one of ROM,CON,BSS,HOL,UNKNOWN
        SIZE            -- # bytes declared
        FLAGS
        {value}         -- contents of rom
        {object} ;      -- objects of the datablock
object:
        O_ID            -- unique identifying number
        OFFSET          -- offset within the datablock
        SIZE ;          -- size of the object in bytes
value:
        argument ;
.DE
A data block has only one flag: "external", indicating
whether the data label is externally visible.
The syntax for "argument" will be given later on
(see em_text).
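.PP
In C, the entries might map onto structures along the following
lines (a hedged sketch; the field and type names are invented
here for illustration and are not the actual declarations
of the optimizer):
.DS
struct object {
        int     o_id;           /* unique identifying number */
        long    off;            /* offset within the datablock */
        long    size;           /* size in bytes, or UNKNOWN_SIZE */
        struct object *next;
};

struct datablock {
        int     d_id;           /* unique identifying number */
        int     pseudo;         /* ROM, CON, BSS, HOL or UNKNOWN */
        long    size;           /* number of bytes declared */
        int     flags;          /* the "external" flag */
        long    *values;        /* contents, if a rom of integers */
        struct object    *objects; /* objects of the datablock */
        struct datablock *next;
};
.DE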
.NH 3
The procedure table
.PP
The procedure table contains global information
about all procedures that are made available
to the optimizer
and that are needed by the EM program.
(Library units may not be needed, see section 3.5).
The table has one entry for
every procedure.
.DS
.UL syntax
procedure_table:
        {procedure} ;
procedure:
        P_ID            -- unique identifying number
        #LABELS         -- number of instruction labels
        #LOCALS         -- number of bytes for locals
        #FORMALS        -- number of bytes for formals
        FLAGS           -- flag bits
        calling         -- procedures called by this one
        change          -- info about global variables changed
        use ;           -- info about global variables used
calling:
        {P_ID} ;        -- procedures called
change:
        ext             -- external variables changed
        FLAGS ;
use:
        FLAGS ;
ext:
        {O_ID} ;        -- a set of objects
.DE
.PP
The number of bytes of formal parameters accessed by
a procedure is determined by the front ends and
passed via a message (parameter message) to the optimizer.
If the front end is not able to determine this number
(e.g. the parameter may be an array of dynamic size or
the procedure may have a variable number of arguments), the attribute
contains the value 'UNKNOWN_SIZE'.
.sp 0
A procedure has the following flags:
.IP -
external: true if the proc. is externally visible
.IP -
bodyseen: true if its code is available as EM text
.IP -
calunknown: true if it calls a procedure whose bodyseen
flag is not set
.IP -
environ: true if it uses or changes a (non-global) variable in
a lexically enclosing procedure
.IP -
lpi: true if it is used as operand of an LPI instruction, so
it may be called indirectly
.LP
The change and use attributes both have one flag: "indirect",
indicating whether the procedure does a 'use indirect'
or a 'store indirect' (indirect means through a pointer).
.NH 3
The EM text
.PP
The EM text contains the EM instructions.
Every EM instruction has an operation code (opcode)
and 0 or 1 operands.
EM pseudo instructions can have more than
1 operand.
The opcode is just a small (8 bit) integer.
.sp
There are several kinds of operands, which we will
refer to as
.UL types.
Many EM instructions can have more than one type of operand.
The types and their encodings in Compact Assembly Language
are discussed extensively in.
.[~[
keizer architecture
.], section 11.2]
Of special interest is the way numeric values
are represented.
Of prime importance is the machine independence of
the representation.
Ultimately, one could store every integer
just as a string of the characters '0' to '9'.
As doing arithmetic on strings is awkward,
Compact Assembly Language allows several alternatives.
The main idea is to look at the value of the integer.
Integers that fit in 16, 32 or 64 bits are
represented as a row of 2, 4 or 8 bytes, respectively,
preceded by an indication of how many bytes are used.
Longer integers are represented as strings;
this is only allowed within pseudo instructions, however.
This concept works very well for target machines
with reasonable word sizes.
At present, most ACK software cannot be used for word sizes
larger than 32 bits,
although the handles for using larger word sizes are
present in the design of the EM code.
In the intermediate code we essentially use the
same ideas.
We allow three representations of integers.
.IP -
integers that fit in a short are represented as a short
.IP -
integers that fit in a long but not in a short are represented
as longs
.IP -
all remaining integers are represented as strings
(only allowed in pseudos).
.LP
The terms short and long are defined in
.[~[
ritchie reference manual programming language
.], section 4]
and depend only on the source machine
(i.e. the machine on which ACK runs),
not on the target machines.
For historical reasons a long will often be called an
.UL offset.
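.PP
A sketch of this choice in C (an illustrative helper, not an
actual optimizer routine; integers that do not even fit in a
long keep their string representation and never reach it):
.DS
#include <limits.h>

enum rep { REP_SHORT, REP_LONG, REP_STRING };

enum rep
choose_rep(long v)
{
        if (v >= SHRT_MIN && v <= SHRT_MAX)
                return REP_SHORT;  /* fits in a short */
        return REP_LONG;  /* fits in a long but not in a short */
}
.DE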
.PP
Operands can also be instruction labels,
objects or procedures.
Instruction labels are denoted by a
.UL label
.UL identifier,
which can be distinguished from a normal identifier.
.sp
The operand of a pseudo instruction can be a list of
.UL arguments.
Arguments can have the same type as operands, except
for the type short, which is not used for arguments.
Furthermore, an argument can be a string or
a string representation of a signed integer, unsigned integer
or floating point number.
If the number of arguments is not fully determined by
the pseudo instruction (e.g. a ROM pseudo can have any number
of arguments), then the list is terminated by a special
argument of type CEND.
.DS
.UL syntax
em_text:
        {line} ;
line:
        INSTR           -- opcode
        OPTYPE          -- operand type
        operand ;
operand:
        empty |         -- OPTYPE = NO
        SHORT |         -- OPTYPE = SHORT
        OFFSET |        -- OPTYPE = OFFSET
        LAB_ID |        -- OPTYPE = INSTRLAB
        O_ID |          -- OPTYPE = OBJECT
        P_ID |          -- OPTYPE = PROCEDURE
        {argument} ;    -- OPTYPE = LIST
argument:
        ARGTYPE
        arg ;
arg:
        empty |         -- ARGTYPE = CEND
        OFFSET |
        LAB_ID |
        O_ID |
        P_ID |
        string |        -- ARGTYPE = STRING
        const ;         -- ARGTYPE = ICON,UCON or FCON
string:
        LENGTH          -- number of characters
        {CHARACTER} ;
const:
        SIZE            -- number of bytes
        string ;        -- string representation of (un)signed
                        -- or floating point constant
.DE
.NH 3
The control flow graphs
.PP
Each procedure can be divided
into a number of basic blocks.
A basic block is a piece of code with
no jumps in, except at the beginning,
and no jumps out, except at the end.
.PP
Every basic block has a set of
.UL successors,
which are basic blocks that can follow it immediately in
the dynamic execution sequence.
The
.UL predecessors
are the basic blocks of which this one
is a successor.
The successor and predecessor attributes
of all basic blocks of a single procedure
are said to form the
.UL control
.UL flow
.UL graph
of that procedure.
.PP
Another important attribute is the
.UL immediate
.UL dominator.
A basic block B dominates a block C if
every path in the graph from the procedure entry block
to C goes through B.
The immediate dominator of C is the closest dominator
of C on any path from the entry block.
(Note that the dominator relation is transitive,
so the immediate dominator is well defined.)
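.PP
As a small illustration (not taken from the EM text itself),
consider a procedure whose entry block 1 branches to blocks
2 and 3, which both jump to block 4:
.DS
1 -> 2    1 -> 3    2 -> 4    3 -> 4
.DE
Neither 2 nor 3 dominates 4, as each can be bypassed via the
other; the dominators of 4 are 1 and 4 itself, so the immediate
dominator of 4 is block 1.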
.PP
A basic block also has an attribute containing
the identifiers of every
.UL loop
that the block belongs to (see next section for loops).
.DS
.UL syntax
control_flow_graph:
        {basic_block} ;
basic_block:
        B_ID            -- unique identifying number
        #INSTR          -- number of EM instructions
        succ
        pred
        idom            -- immediate dominator
        loops           -- set of loops
        FLAGS ;         -- flag bits
succ:
        {B_ID} ;
pred:
        {B_ID} ;
idom:
        B_ID ;
loops:
        {LP_ID} ;
.DE
The flag bits can have the values 'firm' and 'strong',
which are explained below.
.NH 3
The loop tables
.PP
Every procedure has an associated
.UL loop
.UL table
containing information about all the loops
in the procedure.
Loops can be detected by a close inspection of
the control flow graph.
The main idea is to look for two basic blocks,
B and C, for which the following holds:
.IP -
B is a successor of C
.IP -
B is a dominator of C
.LP
B is called the loop
.UL entry
and C is called the loop
.UL end.
Intuitively, C contains a jump backwards to
the beginning of the loop (B).
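.PP
A sketch of this detection in C (dominates(), new_loop() and
the list types are assumed helpers for illustration, not the
actual ego routines):
.DS
/* For every edge C -> B in the CFG: if B dominates C, the
 * edge is a jump backwards, and B and C form a loop. */
void
find_loops(struct basic_block *cfg)
{
        struct basic_block *c;
        struct block_list *s;

        for (c = cfg; c != 0; c = c->next) {
                for (s = c->succ; s != 0; s = s->next) {
                        struct basic_block *b = s->block;

                        if (dominates(b, c))
                                new_loop(b, c); /* entry B, end C */
                }
        }
}
.DE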
.PP
A loop L1 is said to be
.UL nested
within loop L2 if all basic blocks of L1
are also part of L2.
It is important to note that loops could
originally be written as a well-structured for- or
while-loop, or as a messy goto loop.
Hence loops may partly overlap without one
being nested inside the other.
The
.UL nesting
.UL level
of a loop is the number of loops in
which it is nested (so it is 0 for
an outermost loop).
The details of loop detection will be discussed later.
.PP
It is often desirable to know whether a
basic block gets executed during every iteration
of a loop.
This leads to the following definitions:
.IP -
A basic block B of a loop L is said to be a \fIfirm\fR block
of L if B is executed on all successive iterations of L,
with the only possible exception of the last iteration.
.IP -
A basic block B of a loop L is said to be a \fIstrong\fR block
of L if B is executed on all successive iterations of L.
.LP
Note that a strong block is also a firm block.
If a block is part of a conditional statement, it is neither
strong nor firm, as it may be skipped during some iterations
(see Fig. 3.2).
.DS
loop
    if cond1 then
        ...              -- this code will not
                         -- result in a firm or strong block
    end if;
    ...                  -- strong (always executed)
    exit when cond2;
    ...                  -- firm (not executed on
                         -- last iteration)
end loop;

Fig. 3.2 Example of firm and strong blocks
.DE
.DS
.UL syntax
looptable:
        {loop} ;
loop:
        LP_ID           -- unique identifying number
        LEVEL           -- loop nesting level
        entry           -- loop entry block
        end ;
entry:
        B_ID ;
end:
        B_ID ;
.DE

doc/ego/ic/ic4
@@ -0,0 +1,80 @@
.NH 2
External representation of the intermediate code
.PP
The syntax of the intermediate code was given
in the previous section.
In this section we will make some remarks about
the representation of the code in sequential files.
.sp
We use sequential files in order to avoid
the bookkeeping of complex file indices.
As a consequence of this decision
we can't store all components
of the intermediate code
in one file.
If a phase wishes to change some attribute
of a procedure,
or wants to add or delete entire procedures
(inline substitution may do the latter),
the procedure table will only be fully updated
after the entire EM text has been scanned.
Yet, the next phase undoubtedly wants
to read the procedure table before it
starts working on the EM text.
Hence there is an ordering problem, which
can be solved easily by putting the
procedure table in a separate file.
Similarly, the data block table is kept
in a file of its own.
.PP
The control flow graphs (CFGs) could be mixed
with the EM text.
Rather, we have chosen to put them
in a separate file too.
The control flow graph file should be regarded as a
file that imposes some structure on the EM-text file,
just as a transparency containing a flow chart
may be laid over one containing statements.
The loop tables are also put in the CFG file.
A loop imposes an extra structure on the
CFGs and hence on the EM text.
So there are four files:
.IP -
the EM-text file
.IP -
the procedure table file
.IP -
the object table file
.IP -
the CFG and loop tables file
.LP
Every table is preceded by its length, in order to
tell where it ends.
The CFG file also contains the number of instructions of
every basic block,
indicating which part of the EM text belongs
to that block.
.DS
.UL syntax
intermediate_code:
        object_table_file
        proctable_file
        em_text_file
        cfg_file ;
object_table_file:
        LENGTH          -- number of objects
        object_table ;
proctable_file:
        LENGTH          -- number of procedures
        procedure_table ;
em_text_file:
        em_text ;
cfg_file:
        {per_proc} ;    -- one for every procedure
per_proc:
        BLENGTH         -- number of basic blocks
        LLENGTH         -- number of loops
        control_flow_graph
        looptable ;
.DE

doc/ego/ic/ic5
@@ -0,0 +1,163 @@
.NH 2
The Intermediate Code construction phase
.PP
The first phase of the global optimizer,
called
.UL IC,
constructs a major part of the intermediate code.
To be specific, it produces:
.IP -
the EM text
.IP -
the object table
.IP -
part of the procedure table
.LP
The calling, change and use attributes of a procedure
and all its flags except the external and bodyseen flags
are computed by the next phase (Control Flow phase).
.PP
As explained before,
the intermediate code does not contain
any names of variables or procedures.
The normal identifiers are replaced by identifying
numbers.
Yet, the output of the global optimizer must
contain normal identifiers, as this
output is in Compact Assembly Language format.
We certainly want all externally visible names
to be the same in the input as in the output,
because the optimized EM module may be a library unit,
used by other modules.
IC dumps the names of all procedures and data labels
on two files:
.IP -
the procedure dump file, containing tuples (P_ID, procedure name)
.IP -
the data dump file, containing tuples (D_ID, data label name)
.LP
The names of instruction labels are not dumped,
as they are not visible outside the procedure
in which they are defined.
.PP
The input to IC consists of one or more files.
Each file is either an EM module in Compact Assembly Language
format, or a Unix archive file (library) containing such modules.
IC only extracts those modules from a library that are
needed somehow, just as a linker does.
It is advisable to present as much code
of the EM program as possible to the optimizer,
although it is not required to present the whole program.
If a procedure is called somewhere in the EM text,
but its body (text) is not included in the input,
its bodyseen flag in the procedure table will still
be off.
Whenever such a procedure is called,
we assume the worst case for everything;
it will change and use all variables it has access to,
it will call every procedure, etc.
.sp
Similarly, if a data label is used
but not defined, the PSEUDO attribute in its data block
will be set to UNKNOWN.
.NH 3
Implementation
.PP
Part of the code for the EM Peephole Optimizer
.[
staveren peephole toplass
.]
has been used for IC.
In particular, the routines that read and unravel
Compact Assembly Language and the identifier
look-up mechanism have been reused.
New code was added to recognize objects,
build the object and procedure tables and to
output the intermediate code.
.PP
IC uses singly linked linear lists for both the
procedure and object table.
Hence there are no limits on the size of such
a table (except for the trivial fact that it must fit
in main memory).
Both tables are written out after all EM code has
been processed.
IC reads the EM text of one entire procedure
at a time,
processes it and appends the modified code to
the EM text file.
EM code is represented internally as a doubly linked linear
list of EM instructions.
.PP
Objects are recognized by looking at the operands
of instructions that reference global data.
If we come across the instructions:
.DS
LDE X+6 -- Load Double External
LAE X+20 -- Load Address External
.DE
we conclude that the data block
preceded by the data label X contains an object
at offset 6 of size twice the word size,
and an object at offset 20 of unknown size.
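.PP
A sketch of this recognition step in C, using the hypothetical
structures sketched earlier (the names are again invented for
illustration; the actual IC routines differ):
.DS
/* Enter the object at offset `off' of `size' bytes (possibly
 * UNKNOWN_SIZE) into datablock `db', on first reference. */
void
note_object(struct datablock *db, long off, long size)
{
        struct object *o;

        for (o = db->objects; o != 0; o = o->next)
                if (o->off == off)
                        return;         /* object already known */
        o = calloc(1, sizeof(struct object));
        o->o_id = new_o_id();
        o->off = off;
        o->size = size;
        o->next = db->objects;
        db->objects = o;
}
.DE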
.sp
A data block entry of the object table is allocated
at the first reference to a data label.
If this reference is a defining occurrence
or an INA pseudo instruction,
the label is not externally visible
.[~[
keizer architecture
.], section 11.1.4.3].
In this case, the external flag of the data block
is turned off.
If the first reference is an applied occurrence
or an EXA pseudo instruction, the flag is set.
We record this information, because the
optimizer may change the order of defining and
applied occurrences.
The INA and EXA pseudos are removed from the EM text.
They may be regenerated by the last phase
of the optimizer.
.sp
Similar rules hold for the procedure table
and the INP and EXP pseudos.
.NH 3
Source files of IC
.PP
The source files of IC consist
of the files ic.c, ic.h and several packages.
.UL ic.h
contains type definitions, macros and
variable declarations that may be used by
ic.c and by every package.
.UL ic.c
contains the definitions of these variables,
the procedure
.UL main
and some high level I/O routines used by main.
.sp
Every package xxx consists of two files.
ic_xxx.h contains type definitions,
macros, variable declarations and
procedure declarations that may be used by
every .c file that includes this .h file.
The file ic_xxx.c provides the
definitions of these variables and
the implementation of the declared procedures.
IC uses the following packages:
.IP lookup: 18
procedures that look up procedure, data label
and instruction label names; procedures to dump
the procedure and data label names.
.IP lib:
one procedure that gets the next useful input module;
while scanning archives, it skips unnecessary modules.
.IP aux:
several auxiliary routines.
.IP io:
low-level I/O routines that unravel the Compact
Assembly Language.
.IP put:
routines that output the intermediate code.
.LP

doc/ego/il/il1
@@ -0,0 +1,112 @@
.bp
.NH 1
Inline substitution
.NH 2
Introduction
.PP
The Inline Substitution technique (IL)
tries to decrease the overhead associated
with procedure calls (invocations).
During a procedure call, several actions
must be undertaken to set up the right
environment for the called procedure.
.[
johnson calling sequence
.]
On return from the procedure, most of these
effects must be undone.
This entire process introduces significant
costs in execution time as well as
in object code size.
.PP
The inline substitution technique replaces
some of the calls by the modified body of
the called procedure, hence eliminating
the overhead.
Furthermore, as the calling and called procedure
are now integrated, they can be optimized
together, using other techniques of the optimizer.
This often leads to extra opportunities for
optimization
.[
ball predicting effects
.]
.[
carter code generation cacm
.]
.[
scheifler inline cacm
.]
.PP
An inline substitution of a call to a procedure P increases
the size of the program, unless P is very small or P is
called only once.
In the latter case, P can be eliminated.
In practice, procedures that are called only once occur
quite frequently, due to the
introduction of structured programming.
(Carter
.[
carter umi ann arbor
.]
states that almost 50% of the Pascal procedures
he analyzed were called just once).
.PP
Scheifler
.[
scheifler inline cacm
.]
has a more general view of inline substitution.
In his model, the program under consideration is
allowed to grow by a certain amount,
i.e. code size is sacrificed to speed up the program.
The above two cases are just special cases of
his model, obtained by setting the size-change to
(approximately) zero.
He formulates the substitution problem as follows:
.IP
"Given a program, a subset of all invocations,
a maximum program size, and a maximum procedure size,
find a sequence of substitutions that minimizes
the expected execution time."
.LP
Scheifler shows that this problem is NP-complete
.[~[
aho hopcroft ullman analysis algorithms
.], chapter 10]
by reduction to the Knapsack Problem.
Heuristics will have to be used to find a near-optimal
solution.
.PP
In the following chapters we will extend
Scheifler's view and adapt it to the EM Global Optimizer.
We will first describe the transformations that have
to be applied to the EM text when a call is substituted
in line.
Next we will examine in which cases inline substitution
is not possible or desirable.
Heuristics will be developed for
choosing a good sequence of substitutions.
These heuristics make no demand on the user
(such as making profiles
.[
scheifler inline cacm
.]
or giving pragmas
.[~[
ichbiah ada military standard
.], section 6.3.2]),
although the model could easily be extended
to use such information.
Finally, we will discuss the implementation
of the IL phase of the optimizer.
.PP
We will often use the term inline expansion
as a synonym for inline substitution.
.sp 0
The inverse technique of procedure abstraction
(automatic subroutine generation)
.[
shaffer subroutine generation
.]
will not be discussed in this report.

doc/ego/il/il2
@@ -0,0 +1,93 @@
.NH 2
Parameters and local variables
.PP
In the EM calling sequence, the calling procedure
pushes its parameters on the stack
before doing the CAL.
The called routine first saves some
status information on the stack and then
allocates space for its own locals
(also on the stack).
Usually, one special purpose register,
the Local Base (LB) register,
is used to access both the locals and the
parameters.
If memory is highly segmented,
the stack frames of the caller and the callee
may be allocated in different fragments;
an extra Argument Base (AB) register is used
in this case to access the actual parameters.
See 4.2 of
.[
keizer architecture
.]
for further details.
.PP
If a procedure call is expanded in line,
there are two problems:
.IP 1. 3
No stack frame will be allocated for the called procedure;
we must find another place to put its locals.
.IP 2.
The LB register cannot be used to access the actual
parameters;
as the CAL instruction is deleted, the LB will
still point to the local base of the \fIcalling\fR procedure.
.LP
The local variables of the called procedure will
be put in the stack frame of the calling procedure,
just after its own locals.
The size of the stack frame of the
calling procedure will be increased
during its entire lifetime.
Therefore our model will allow a
limit to be set on the number of bytes
for locals that the called procedure may have
(see next section).
.PP
There are several alternatives to access the parameters.
An actual parameter may be an arbitrary expression,
which we will refer to as
the \fIactual parameter expression\fR.
The value of this expression is stored
in a location on the stack (see above),
the \fIparameter location\fR.
.sp 0
The alternatives for accessing parameters are:
.IP -
save the value of the stackpointer at the point of the CAL
in a temporary variable X;
this variable can be used to simulate the AB register, i.e.
parameter locations are accessed via an offset to
the value of X.
.IP -
create a new temporary local variable T for
the parameter (in the stack frame of the caller);
every access to the parameter location must be changed
into an access to T.
.IP -
do not evaluate the actual parameter expression before the call;
instead, substitute this expression for every use of the
parameter location.
.LP
The first method may be expensive if X is not
put in a register.
We will not use this method.
The time required to evaluate and access the
parameters when the second method is used
will not differ much from the normal
calling sequence (i.e. not in line call).
It is not expensive, but there are no
extra savings either.
The third method is essentially the 'by name'
parameter mechanism of Algol60.
If the actual parameter is just a numeric constant,
it is advantageous to use it.
Yet, there are several circumstances
under which it cannot or should not be used.
We will deal with this in the next section.
.sp 0
In general we will use the third method,
if it is possible and desirable.
Such parameters will be called \fIin line parameters\fR.
In all other cases we will use the second method.
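.PP
To illustrate the last two methods, a hypothetical source-level
view of a call to a procedure that doubles its argument
(the optimizer of course performs the corresponding
transformations on EM text, not on source code):
.DS
p(x) { return x + x; }

q = p(a*b);    method 2:    t = a*b;  q = t + t;
q = p(5);      method 3:    q = 5 + 5;
.DE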

doc/ego/il/il3
@@ -0,0 +1,164 @@
.NH 2
Feasibility and desirability analysis
.PP
Feasibility and desirability analysis
of in line substitution differ
somewhat from most other techniques.
Usually, much effort is needed to find
a feasible opportunity for optimization
(e.g. a redundant subexpression).
Desirability analysis then checks
if it is really advantageous to do
the optimization.
For IL, opportunities are easy to find.
To see if an in line expansion is
desirable will not be hard either.
Yet, the main problem is to find the most
desirable ones.
We will deal with this problem later and
we will first attend to feasibility and
desirability analysis.
.PP
There are several reasons why a procedure invocation
cannot or should not be expanded in line.
.sp
A call to a procedure P cannot be expanded in line
in any of the following cases:
.IP 1. 3
The body of P is not available as EM text.
Clearly, there is no way to do the substitution.
.IP 2.
P, or any procedure called by P (transitively),
follows the chain of statically enclosing
procedures (via an LXL or LXA instruction)
or follows the chain of dynamically enclosing
procedures (via a DCH).
If the call were expanded in line,
one level would be removed from the chains,
leading to total chaos.
This chaos could be solved by patching up
every LXL, LXA or DCH in all procedures
that could be part of the chains,
but this is hard to implement.
.IP 3.
P, or any procedure called by P (transitively),
calls a procedure whose body is not
available as EM text.
The unknown procedure may use an LXL, LXA or DCH.
However, in several languages a separately
compiled procedure has no access to the
static or dynamic chain.
In this case
this point does not apply.
.IP 4.
P, or any procedure called by P (transitively),
uses the LPB instruction, which converts a
local base to an argument base;
as the locals and parameters are stored
in a non-standard way (differing from the
normal EM calling sequence) this instruction
would yield incorrect results.
.IP 5.
The total number of bytes of the parameters
of P is not known.
P may be a procedure with a variable number
of parameters or may have an array of dynamic size
as value parameter.
.LP
It is undesirable to expand a call to a procedure P in line
in any of the following cases:
.IP 1. 3
P is large, i.e. the number of EM instructions
of P exceeds some threshold.
The expanded code would be large too.
Furthermore, several programs in ACK,
including the global optimizer itself,
may run out of memory if they have to run
in a small address space and are presented with
very large procedures.
The threshold may be set to infinity,
in which case this point does not apply.
.IP 2.
P has many local variables.
All these variables would have to be allocated
in the stack frame of the calling procedure.
.PP
If a call may be expanded in line, we have to
decide how to access its parameters.
In the previous section we stated that we would
use in line parameters whenever possible and desirable.
There are several reasons why a parameter
cannot or should not be expanded in line.
.sp
No parameter of a procedure P can be expanded in line,
in any of the following cases:
.IP 1. 3
P, or any procedure called by P (transitively),
does a store-indirect or a use-indirect (i.e. through
a pointer).
However, if the front-end has generated messages
telling that certain parameters cannot be accessed
indirectly, those parameters may be expanded in line.
.IP 2.
P, or any procedure called by P (transitively),
calls a procedure whose body is not available as EM text.
The unknown procedure may do a store-indirect
or a use-indirect.
However, the same remark about front-end messages
as for 1. holds here.
.IP 3.
The address of a parameter location is taken (via an LAL).
In the normal calling sequence, all parameters
are stored sequentially. If the address of one
parameter location is taken, the address of any
other parameter location can be computed from it.
Hence we must put every parameter in a temporary location;
furthermore, all these locations must be in
the same order as for the normal calling sequence.
.IP 4.
P has overlapping parameters; for example, it uses
the parameter at offset 10 both as a 2 byte and as a 4 byte
parameter.
Such code may be produced by the front ends if
the formal parameter is of some record type
with variants.
.PP
Sometimes a specific parameter must not be expanded in line.
.sp 0
An actual parameter expression cannot be expanded in line
in any of the following cases:
.IP 1. 3
P stores into the parameter location.
Even if the actual parameter expression is a simple
variable, it is incorrect to change the 'store into
formal' into a 'store into actual', because of
the parameter mechanism used.
In Pascal, the following expansion is incorrect:
.DS
procedure p (x:integer);
begin
    x := 20;
end;
 ...
a := 10;                  a := 10;
p(a);         --->        a := 20;
write(a);                 write(a);
.DE
.IP 2.
P changes any of the operands of the
actual parameter expression.
If the expression is expanded and evaluated
after the operand has been changed,
the wrong value will be used.
.IP 3.
The actual parameter expression has side effects.
It must be evaluated only once,
at the place of the call.
.LP
It is undesirable to expand an actual parameter in line
in the following case:
.IP 1. 3
The parameter is used more than once
(dynamically) and the actual parameter expression
is not just a simple variable or constant.
.LP

doc/ego/il/il4
@@ -0,0 +1,132 @@
.NH 2
Heuristic rules
.PP
Using the information described
in the previous section,
we can find all calls that can
be expanded in line, and for which
this expansion is desirable.
In general, we cannot expand all these calls,
so we have to choose the 'best' ones.
With every CAL instruction
that may be expanded, we associate
a \fIpay-off\fR,
which expresses how desirable it is
to expand this specific CAL.
.sp
Let Tc denote the portion of EM text involved
in a specific call, i.e. the pushing of the actual
parameter expressions, the CAL itself,
the popping of the parameters and the
pushing of the result (if any, via an LFR).
Let Te denote the EM text that would be obtained
by expanding the call in line.
Let Pc be the original program and Pe the program
with Te substituted for Tc.
The pay-off of the CAL depends on two factors:
.IP -
T = execution_time(Pe) - execution_time(Pc)
.IP -
S = code_size(Pe) - code_size(Pc)
.LP
The change in execution time (T) depends on:
.IP -
T1 = execution_time(Te) - execution_time(Tc)
.IP -
N = number of times Te or Tc get executed.
.LP
We assume that T1 will be the same every
time the code gets executed.
This is a reasonable assumption.
(Note that we are talking about one CAL,
not about different calls to the same procedure).
Hence
.DS
T = N * T1
.DE
T1 can be estimated by a careful analysis
of the transformations that are performed.
Below, we list everything that will be
different when a call is expanded in line:
.IP -
The CAL instruction is not executed.
This saves a subroutine jump.
.IP -
The instructions in the procedure prolog
are not executed.
These instructions, generated from the PRO pseudo,
save some machine registers
(including the old LB), set the new LB and allocate space
for the locals of the called routine.
The savings may be less if there are no
locals to allocate.
.IP -
In line parameters are not evaluated before the call
and are not pushed on the stack.
.IP -
All remaining parameters are stored in local variables,
instead of being pushed on the stack.
.IP -
If the number of parameters is nonzero,
the ASP instruction after the CAL is not executed.
.IP -
Every reference to an in line parameter is
substituted by the parameter expression.
.IP -
RET (return) instructions are replaced by
BRA (branch) instructions.
If the called procedure 'falls through'
(i.e. it has only one RET, at the end of its code),
even the BRA is not needed.
.IP -
The LFR (fetch function result) is not executed.
.PP
Besides these changes, which are caused directly by IL,
other changes may occur as IL influences other optimization
techniques, such as Register Allocation and Constant Propagation.
Our heuristic rules do not take into account the quite
unpredictable effects on Register Allocation.
They do, however, favour calls that have numeric \fIconstants\fR
as parameters; especially the constant "0" as an in line
parameter gets high scores,
as further optimizations may often be possible.
.PP
It cannot be determined statically how often a CAL instruction gets
executed.
We will use \fIloop nesting\fR information here.
The nesting level of the loop in which
the CAL appears (if any) will be used as an
indication for the number of times it gets executed.
.PP
Based on all these facts,
the pay-off of a call will be computed.
The following model was developed empirically.
Assume procedure P calls procedure Q.
The call takes place in basic block B.
.DS
ZP = # zero parameters
CP = # constant parameters - ZP
LN = Loop Nesting level (0 if outside any loop)
F = \fIif\fR # formal parameters of Q > 0 \fIthen\fR 1 \fIelse\fR 0
FT = \fIif\fR Q falls through \fIthen\fR 1 \fIelse\fR 0
S = size(Q) - 1 - # inline_parameters - F
L = \fIif\fR # local variables of P > 0 \fIthen\fR 0 \fIelse\fR -1
A = CP + 2 * ZP
N = \fIif\fR LN=0 and P is never called from a loop \fIthen\fR 0 \fIelse\fR (LN+1)**2
FM = \fIif\fR B is a firm block \fIthen\fR 2 \fIelse\fR 1
pay_off = (100/S + FT + F + L + A) * N * FM
.DE
S stands for the size increase of the program,
which is slightly less than the size of Q.
The size of a procedure is taken to be its number
of (non-pseudo) EM instructions.
The terms "loop nesting level" and "firm" were defined
in the chapter on the Intermediate Code (section "loop tables").
If a call is not inside a loop and the calling procedure
is itself never called from a loop (transitively),
then the call will probably be executed at most once.
Such a call is never expanded in line (its pay-off is zero).
If the calling procedure doesn't have local variables, a penalty (L)
is introduced, as it will most likely get local variables if the
call gets expanded.
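.PP
The model transcribes directly into C (a sketch with invented
names; the counts are assumed to have been gathered during the
analysis of P, Q and B, and the model assumes S >= 1):
.DS
double
pay_off(int ZP, int CP, int LN, int nformals, int falls_thru,
        int size_q, int ninline, int caller_has_locals,
        int called_from_loop, int firm)
{
        /* CP is the number of constant parameters minus ZP */
        int F  = nformals > 0 ? 1 : 0;
        int FT = falls_thru ? 1 : 0;
        int S  = size_q - 1 - ninline - F; /* size increase */
        int L  = caller_has_locals ? 0 : -1; /* penalty */
        int A  = CP + 2 * ZP;
        int N  = (LN == 0 && !called_from_loop) ?
                        0 : (LN + 1) * (LN + 1);
        int FM = firm ? 2 : 1;

        return (100.0 / S + FT + F + L + A) * N * FM;
}
.DE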

440
doc/ego/il/il5 Normal file
View file

@ -0,0 +1,440 @@
.NH 2
Implementation
.PP
A major factor in the implementation
of Inline Substitution is the requirement
not to use an excessive amount of memory.
IL essentially analyzes the entire program;
it makes decisions based on which procedure calls
appear in the whole program.
Yet, because of the memory restriction, it is
not feasible to read the entire program
in main memory.
To solve this problem, the IL phase has been
split up into three subphases that are executed sequentially:
.IP 1.
analyze every procedure; see how it accesses its parameters;
simultaneously collect all calls
appearing in the whole program and put them
in a \fIcall-list\fR.
.IP 2.
use the call-list and decide which calls will be substituted
in line.
.IP 3.
take the decisions of subphase 2 and modify the
program accordingly.
.LP
Subphases 1 and 3 scan the input program; only
subphase 3 modifies it.
It is essential that the decisions can be made
in subphase 2
without using the input program,
provided that subphase 1 puts enough information
in the call-list.
Subphase 2 keeps the entire call-list in main memory
and repeatedly scans it, to
find the next best candidate for expansion.
.PP
We will specify the
data structures used by IL before
describing the subphases.
.NH 3
Data structures
.NH 4
The procedure table
.PP
In subphase 1 information is gathered about every procedure
and added to the procedure table.
This information is used by the heuristic rules.
A proctable entry for procedure p has
the following extra information:
.IP -
is it allowed to substitute an invocation of p in line?
.IP -
is it allowed to put any parameter of such a call in line?
.IP -
the size of p (number of EM instructions)
.IP -
does p 'fall through'?
.IP -
a description of the formal parameters that p accesses; this information
is obtained by looking at the code of p. For every parameter f,
we record:
.RS
.IP -
the offset of f
.IP -
the type of f (word, double word, pointer)
.IP -
may the corresponding actual parameter be put in line?
.IP -
is f ever accessed indirectly?
.IP -
if f used: never, once or more than once?
.RE
.IP -
the number of times p is called (see below)
.IP -
the file address of its call-count information (see below).
.LP
.NH 4
Call-count information
.PP
As a result of Inline Substitution, some procedures may
become useless, because all their invocations have been
substituted in line.
One of the tasks of IL is to keep track of which
procedures are no longer called.
Note that IL is especially keen on procedures that are
called only once
(possibly as a result of expanding all other calls to it).
So we want to know how many times a procedure
is called \fIduring\fR Inline Substitution.
It is not good enough to compute this
information afterwards.
The task is rather complex, because
the number of times a procedure is called
varies during the entire process:
.IP 1.
If a call to p is substituted in line,
the number of calls to p gets decremented by 1.
.IP 2.
If a call to p is substituted in line,
and p contains n calls to q, then the number of calls to q
gets incremented by n.
.IP 3.
If a procedure p is removed (because it is no
longer called) and p contains n calls to q,
then the number of calls to q gets decremented by n.
.LP
(Note that p may be the same as q, if p is recursive).
.sp 0
So we actually want to have the following information:
.DS
NRCALL(p,q) = number of calls to q appearing in p,
for all procedures p and q that may be put in line.
.DE
This information, called \fIcall-count information\fR is
computed by the first subphase.
It is stored in a file.
It is represented as a number of lists, rather than as
a (very sparse) matrix.
Every procedure has a list of (proc,count) pairs,
telling which procedures it calls, and how many times.
The file address of its call-count list is stored
in its proctable entry.
Whenever this information is needed, it is fetched from
the file, using direct access.
The proctable entry also contains the number of times
a procedure is called, at any moment.
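.PP
A sketch of the bookkeeping in C (hypothetical names; as
described above, the real lists live on a file and are fetched
on demand):
.DS
struct count {          /* one (proc,count) pair */
        int p_id;       /* the procedure q that is called ... */
        int n;          /* ... and how many times */
        struct count *next;
};

/* Rules 2 and 3: after expanding a call to p (sign = +1) or
 * removing p (sign = -1), adjust the call count of every
 * procedure q that p calls. */
void
update_counts(struct count *calls_of_p, int sign)
{
        struct count *c;

        for (c = calls_of_p; c != 0; c = c->next)
                proctab[c->p_id].ncalls += sign * c->n;
}
.DE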
.NH 4
The call-list
.PP
The call-list is the major data structure used by IL.
Every item of the list describes one procedure call.
It contains the following attributes:
.IP -
the calling procedure (caller)
.IP -
the called procedure (callee)
.IP -
identification of the CAL instruction (sequence number)
.IP -
the loop nesting level; our heuristic rules appreciate
calls inside a loop (or even inside a loop nested inside
another loop, etc.) more than other calls
.IP -
the actual parameter expressions involved in the call;
for every actual, we record:
.RS
.IP -
the EM code of the expression
.IP -
the number of bytes of its result (size)
.IP -
an indication if the actual may be put in line
.RE
.LP
The structure of the call-list is rather complex.
Whenever a call is expanded in line, new calls
will suddenly appear in the program,
that were not contained in the original body
of the calling subroutine.
These calls are inherited from the called procedure.
We will refer to these invocations as \fInested calls\fR
(see Fig. 5.1).
.DS
procedure p is
begin
    a();
    b();
end;

procedure r is                  procedure r is
begin                           begin
    x();                            x();
    p(); -- in line                 a(); -- nested call
    y();                            b(); -- nested call
end;                                y();
                                end;

Fig. 5.1 Example of nested procedure calls
.DE
Nested calls may subsequently be put in line too
(probably resulting in a yet deeper nesting level, etc.).
So the call-list does not always reflect the source program,
but changes dynamically, as decisions are made.
If a call to p is expanded, all calls appearing in p
will be added to the call-list.
.sp 0
A convenient and elegant way to represent
the call-list is to use a LISP-like list.
.[
poel lisp trac
.]
Calls that appear at the same level
are linked in the CDR direction. If a call C
to a procedure p is expanded,
all calls appearing in p are put in a sub-list
of C, i.e. in its CAR.
In the example above, before the decision
to expand the call to p is made, the
call-list of procedure r looks like:
.DS
(call-to-x, call-to-p, call-to-y)
.DE
After the decision, it looks like:
.DS
(call-to-x, (call-to-p*, call-to-a, call-to-b), call-to-y)
.DE
The call to p is marked, because it has been
substituted.
Whenever IL wants to traverse the call-list of some procedure,
it uses the well-known LISP technique of
recursion in the CAR direction and
iteration in the CDR direction
(see page 1.19-2 of
.[
poel lisp trac
.]
).
All list traversals look like:
.DS
traverse(list)
{
        for (c = first(list); c != 0; c = CDR(c)) {
                if (marked(c)) {
                        /* c was expanded: visit its sub-list */
                        traverse(CAR(c));
                } else {
                        visit(c);   /* process call c itself */
                }
        }
}
.DE
The entire call-list consists of a number of LISP-like lists,
one for every procedure.
The proctable entry of a procedure contains a pointer
to the beginning of the list.
.NH 3
The first subphase: procedure analysis
.PP
The tasks of the first subphase are to determine
several attributes of every procedure
and to construct the basic call-list,
i.e. without nested calls.
The size of a procedure is determined
by simply counting its EM instructions.
Pseudo instructions are skipped.
A procedure does not 'fall through' if its CFG
contains a basic block
that is not the last block of the CFG and
that ends on a RET instruction.
The formal parameters of a procedure are determined
by inspection of
its code.
.PP
The call-list is constructed by looking at all CAL instructions
appearing in the program.
The call-list should only contain calls to procedures
that may be put in line.
This fact is only known if the procedure was
analyzed earlier.
If a call to a procedure p appears in the program
before the body of p,
the call will always be put in the call-list.
If p is later found to be unsuitable,
the call will be removed from the list by the
second subphase.
.PP
An important issue is the recognition
of the actual parameter expressions of the call.
The front ends produce messages telling how many
bytes of formal parameters every procedure accesses.
(If there is no such message for a procedure, it
cannot be put in line).
The actual parameters together must account for
the same number of bytes.
A recursive descent parser is used
to parse side-effect free EM expressions.
It uses a table and some
auxiliary routines to determine
how many bytes every EM instruction pops from the stack
and how many bytes it pushes onto the stack.
These numbers depend on the EM instruction, its argument,
and the wordsize and pointersize of the target machine.
Initially, the parser has to recognize the
number of bytes specified in the formals-message,
say N.
Assume the first instruction before the CAL pops S bytes
and pushes R bytes.
If R > N, too many bytes are recognized
and the parser fails.
Else, it calls itself recursively to recognize the
S bytes used as operand of the instruction.
If it succeeds in doing so, it continues with the next instruction,
i.e. the first instruction before the code recognized by
the recursive call, to recognize N-R more bytes.
The result is a number of EM instructions that collectively push N bytes.
If an instruction is encountered that has side effects
(e.g. a store or a procedure call) or whose R and S cannot
be computed statically (e.g. a LOS), the parser fails.
.sp 0
Note that the parser traverses the code backwards.
As EM code is essentially postfix code, the parser works top down.
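.PP
A sketch of the parser in C (ins_push(), ins_pop(), prev() and
has_side_effects() stand for the table lookups and auxiliary
routines mentioned above; the names are invented):
.DS
/* Recognize, scanning backwards from *ip, instructions that
 * collectively push n bytes; on success *ip is left at the
 * instruction preceding the recognized code. */
int
recognize(struct instr **ip, int n)
{
        while (n > 0) {
                struct instr *i = *ip;
                int R, S;

                if (i == 0 || has_side_effects(i))
                        return 0;       /* fail */
                R = ins_push(i);        /* bytes pushed (result) */
                S = ins_pop(i);         /* bytes popped (operands) */
                if (R > n)
                        return 0;       /* too many bytes recognized */
                *ip = prev(i);
                if (!recognize(ip, S))  /* recognize the operands */
                        return 0;
                n -= R;
        }
        return 1;
}
.DE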
.PP
If the parser fails to recognize the parameters, the call will not
be substituted in line.
If the parameters can be determined, they still have to
match the formal parameters of the called procedure.
This check is performed by the second subphase; it cannot be
done here, because it is possible that the called
procedure has not been analyzed yet.
.PP
The entire call-list is written to a file,
to be processed by the second subphase.
.NH 3
The second subphase: making decisions
.PP
The task of the second subphase is quite easy
to understand.
It reads the call-list file,
builds an incore call-list and deletes every
call that may not be expanded in line (either because the called
procedure may not be put in line, or because the actual parameters
of the call do not match the formal parameters of the called procedure).
It assigns a \fIpay-off\fR to every call,
indicating how desirable it is to expand it.
.PP
The subphase repeatedly scans the call-list and takes
the call with the highest pay-off.
The chosen one gets marked,
and the call-list is extended with the nested calls,
as described above.
These nested calls are also assigned a pay-off,
and will be considered too during the next scans.
.sp 0
After every decision the number of times
every procedure is called is updated, using
the call-count information.
Meanwhile, the subphase keeps track of the amount of space
still available.
If all space is used, or if there are no more calls left to
be expanded, it exits this loop.
Finally, calls to procedures that are called only
once are also chosen.
.PP
The actual parameters of a call are only needed by
this subphase to assign a pay-off to a call.
To save some space, these actuals are not kept in main memory.
They are removed after the call has been read and a pay-off
has been assigned to it.
So this subphase works with \fIabstracts\fR of calls.
After all work has been done,
the actual parameters of the chosen calls are retrieved
from a file,
as they are needed by the transformation subphase.
.NH 3
The third subphase: doing transformations
.PP
The third subphase makes the actual modifications to
the EM text.
It is directed by the decisions made in the previous subphase,
as expressed via the call-list.
The call-list read by this subphase contains
only calls that were selected for expansion.
The list is ordered in the same way as the EM text,
i.e. if a call C1 appears before a call C2 in the call-list,
C1 also appears before C2 in the EM text.
So the EM text is traversed linearly,
the calls that have to be substituted are determined
and the modifications are made.
If a procedure is encountered that is no longer needed,
it is simply not written to the output EM file.
The substitution of a call takes place in distinct steps:
.IP "change the calling sequence" 7
.sp 0
The actual parameter expressions are changed.
Parameters that are put in line are removed.
All remaining ones must store their result in a
temporary local variable, rather than
push it on the stack.
The CAL instruction and any ASP (to pop actual parameters)
or LFR (to fetch the result of a function)
are deleted.
.IP "fetch the text of the called procedure"
.sp 0
Direct disk access is used to read the text of the
called procedure.
The file offset is obtained from the proctable entry.
.IP "allocate bytes for locals and temporaries"
.sp 0
The local variables of the called procedure will be put in the
stack frame of the calling procedure.
The same applies to any temporary variables
that hold the result of parameters
that were not put in line.
The proctable entry of the caller is updated.
.IP "put a label after the CAL"
.sp 0
If the called procedure contains a RET (return) instruction
somewhere in the middle of its text (i.e. it does
not fall through), the RET must be changed into
a BRA (branch), to jump over the
remainder of the text.
This label is not needed if the called
procedure falls through.
.IP "copy the text of the called procedure and modify it"
.sp 0
References to local variables of the called routine
and to parameters that are not put in line
are changed to refer to the
new local of the caller.
References to in line parameters are replaced
by the actual parameter expression.
Returns (RETs) are either deleted or
replaced by a BRA.
Messages containing information about local
variables or parameters are changed.
Global data declarations and the PRO and END pseudos
are removed.
Instruction labels and references to them are
changed to make sure they do not have the
same identifying number as
labels in the calling procedure.
.IP "insert the modified text"
.sp 0
The pseudos of the called procedure are put after the pseudos
of the calling procedure.
The real text of the callee is put at
the place where the CAL was.
.IP "take care of nested substitutions"
.sp 0
The expanded procedure may contain calls that
have to be expanded too (nested calls).
If the descriptor of this call contains actual
parameter expressions,
the code of the expressions has to be changed
in the same way as the code of the callee was changed.
Next, the entire process of finding CALs and doing
the substitutions is repeated recursively.
.LP

doc/ego/il/il6
@@ -0,0 +1,27 @@
.NH 2
Source files of IL
.PP
The sources of IL are in the following files
and packages (the prefixes 1_, 2_ and 3_ refer to the three subphases):
.IP il.h: 14
declarations of global variables and
data structures
.IP il.c:
the routine main; the driving routines of the three subphases
.IP 1_anal:
contains a subroutine that analyzes a procedure
.IP 1_cal:
contains a subroutine that analyzes a call
.IP 1_aux:
implements auxiliary procedures used by subphase 1
.IP 2_aux:
implements auxiliary procedures used by subphase 2
.IP 3_subst:
the driving routine for doing the substitution
.IP 3_change:
lower level routines that do certain modifications
.IP 3_aux:
implements auxiliary procedures used by subphase 3
.IP aux:
implements auxiliary procedures used by several subphases.
.LP

doc/ego/intro/head
@@ -0,0 +1,7 @@
.ND
.ll 80m
.nr LL 80m
.nr tl 78m
.tr ~
.ds >. .
.ds [. " \[

doc/ego/intro/intro1
@@ -0,0 +1,79 @@
.TL
The design and implementation of
the EM Global Optimizer
.AU
H.E. Bal
.AI
Vrije Universiteit
Wiskundig Seminarium, Amsterdam
.AB
The EM Global Optimizer is part of the Amsterdam Compiler Kit,
a toolkit for making retargetable compilers.
It optimizes the intermediate code common to all compilers of
the toolkit (EM),
so it can be used for all programming languages and
all processors supported by the kit.
.PP
The optimizer is based on well-understood concepts like
control flow analysis and data flow analysis.
It performs the following optimizations:
Inline Substitution, Strength Reduction, Common Subexpression Elimination,
Stack Pollution, Cross Jumping, Branch Optimization, Copy Propagation,
Constant Propagation, Dead Code Elimination and Register Allocation.
.PP
This report describes the design of the optimizer and several
of its implementation issues.
.AE
.bp
.NH 1
Introduction
.PP
.FS
This work was supported by the
Stichting Technische Wetenschappen (STW)
under grant VWI00.0001.
.FE
The EM Global Optimizer is part of a software toolkit
for making production-quality retargetable compilers.
This toolkit,
called the Amsterdam Compiler Kit
.[
tanenbaum toolkit rapport
.]
.[
tanenbaum toolkit cacm
.]
runs under the Unix*
.FS
*Unix is a Trademark of Bell Laboratories
.FE
operating system.
.sp 0
The main design philosophy of the toolkit is to use
a language- and machine-independent
intermediate code, called EM.
.[
keizer architecture
.]
The basic compilation process can be split up into
two parts.
A language-specific front end translates the source program into EM.
A machine-specific back end transforms EM to assembly code
of the target machine.
.PP
The global optimizer is an optional phase of the
compilation process, and can be used to obtain
machine code of a higher quality.
The optimizer transforms EM-code to better EM-code,
so it comes between the front end and the back end.
It can be used with any combination of languages
and machines, as long as they are supported by
the compiler kit.
.PP
This report describes the design of the
global optimizer and several of its
implementation issues.
Measurements can be found in.
.[
bal tanenbaum global
.]

doc/ego/intro/tail
.[
$LIST$
.]

doc/ego/lv/lv1
.bp
.NH 1
Live-Variable analysis
.NH 2
Introduction
.PP
The "Live-Variable analysis" optimization technique (LV)
performs some code improvements and computes information that may be
used by subsequent optimizations.
The main task of this phase is the
computation of \fIlive-variable information\fR.
.[~[
aho compiler design
.] section 14.4]
A variable A is said to be \fIdead\fR at some point p of the
program text, if on no path in the control flow graph
from p to a RET (return), A can be used before being changed;
else A is said to be \fIlive\fR.
.PP
A statement of the form
.DS
VARIABLE := EXPRESSION
.DE
is said to be dead if the left hand side variable is dead just after
the statement and the right hand side expression has no
side effects (i.e. it doesn't change any variable).
Such a statement can be eliminated entirely.
Dead code will seldom be present in the original program,
but it may be the result of earlier optimizations,
such as copy propagation.
.PP
Live-variable information is passed to other phases via
messages in the EM code.
Live/dead messages are generated at points in the EM text where
variables become dead or live.
This information is especially useful for the Register
Allocation phase.
.NH 2
Implementation
.PP
The implementation uses algorithm 14.6 of.
.[
aho compiler design
.]
First, two sets DEF and USE are computed for every basic block b:
.IP DEF(b) 9
the set of all variables that are assigned a value in b before
being used
.IP USE(b) 9
the set of all variables that may be used in b before being changed.
.LP
(So variables that may, but need not, be used or changed via a procedure
call or through a pointer are included in USE but not in DEF.)
The next step is to compute the sets IN and OUT:
.IP IN[b] 9
the set of all variables that are live at the beginning of b
.IP OUT[b] 9
the set of all variables that are live at the end of b
.LP
IN and OUT can be computed for all blocks simultaneously by solving the
data flow equations:
.DS
(1) IN[b] = OUT[b] - DEF[b] + USE[b]
(2) OUT[b] = IN[s1] + ... + IN[sn] ;
where SUCC[b] = {s1, ... , sn}
.DE
The equations are solved by an algorithm similar to that used for
the Use-Definition equations (see previous chapter).
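.PP
The iterative solution process may be sketched as follows.
All names are hypothetical and, for brevity, each set is represented
by a single machine word (one bit per local variable); the real sets
may be longer bit vectors.
.DS
#define MAXSUCC 8                 /* hypothetical successor bound */

typedef unsigned long vset;       /* one bit per local variable */

vset set_minus(vset a, vset b)    /* a - b */
{
    return a ^ (a & b);
}

void solve_lv(int nblocks, vset DEF[], vset USE[], int nsucc[],
              int succ[][MAXSUCC], vset IN[], vset OUT[])
{
    int b, i, change = 1;
    vset in, out;

    while (change) {              /* iterate until a fixed point */
        change = 0;
        for (b = 0; b < nblocks; b++) {
            out = 0;
            for (i = 0; i < nsucc[b]; i++)
                out |= IN[succ[b][i]];             /* equation (2) */
            in = set_minus(out, DEF[b]) | USE[b];  /* equation (1) */
            if (in != IN[b] || out != OUT[b]) {
                IN[b] = in;
                OUT[b] = out;
                change = 1;
            }
        }
    }
}
.DE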
.PP
Finally, each basic block is visited in turn to remove its dead code
and to emit the live/dead messages.
Every basic block b is traversed from its last
instruction backwards to the beginning of b.
Initially, all variables that are dead at the end
of b are marked dead. All others are marked live.
If we come across an assignment to a variable X that
was marked live, a live-message is put after the
assignment and X is marked dead;
if X was marked dead, the assignment may be removed, provided that
the right hand side expression contains no side effects.
If we come across a use of a variable X that
was marked dead, a dead-message is put after the
use and X is marked live.
So at any point, the mark of X tells whether X is
live or dead immediately before that point.
A message is also generated at the start of a basic block
for every variable that was live at the end of the (textually)
previous block, but dead at the entry of this block, or vice versa.
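.PP
The backward scan over one basic block may be sketched as follows.
The list primitives and the routines bit, set_minus (as in the
previous sketch), assigns_local, uses_local, etc. are hypothetical.
.DS
void scan_block(struct block *b, vset out)
{
    vset live = out;              /* the marks at the current point */
    struct instr *i, *prev;
    int x;

    for (i = b->last; i != NULL; i = prev) {
        prev = i->prev;           /* save: i may be removed */
        if (assigns_local(i)) {
            x = target_of(i);
            if (live & bit(x)) {                 /* X was marked live */
                put_live_message_after(i, x);
                live = set_minus(live, bit(x));  /* X is dead above here */
            } else if (!has_side_effects(i)) {
                delete_instr(i);                 /* dead assignment */
            }
        } else if (uses_local(i)) {
            x = used_var(i);
            if (!(live & bit(x))) {              /* X was marked dead */
                put_dead_message_after(i, x);
                live |= bit(x);                  /* X is live above here */
            }
        }
    }
}
.DE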
.PP
Only local variables are considered.
This significantly reduces the memory needed by this phase,
eases the implementation and is hardly less efficient than
considering all variables.
(Note that it is very hard to prove that an assignment to
a global variable is dead).

doc/ego/ov/ov1
.bp
.NH 1
Overview of the global optimizer
.NH 2
The ACK compilation process
.PP
The EM Global Optimizer is one of three optimizers that are
part of the Amsterdam Compiler Kit (ACK).
The phases of ACK are:
.IP 1.
A Front End translates a source program to EM
.IP 2.
The Peephole Optimizer
.[
tanenbaum staveren peephole toplass
.]
reads EM code and produces 'better' EM code.
It performs a number of optimizations (mostly peephole
optimizations)
such as constant folding, strength reduction and unreachable code
elimination.
.IP 3.
The Global Optimizer further improves the EM code.
.IP 4.
The Code Generator transforms EM to assembly code
of the target computer.
.IP 5.
The Target Optimizer improves the assembly code.
.IP 6.
An Assembler/Loader generates an executable file.
.LP
For a more extensive overview of the ACK compilation process,
we refer to.
.[
tanenbaum toolkit rapport
.]
.[
tanenbaum toolkit cacm
.]
.PP
The input of the Global Optimizer may consist of files and
libraries.
Every file or module in the library must contain EM code in
Compact Assembly Language format.
.[~[
tanenbaum machine architecture
.], section 11.2]
The output consists of one such EM file.
The input files and libraries together need not
constitute an entire program,
although as much of the program as possible should be supplied.
The more information about the program the optimizer
gets, the better its output code will be.
.PP
The Global Optimizer is language- and machine-independent,
i.e. it can be used for all languages and machines supported by ACK.
Yet, it puts some unavoidable restrictions on the EM code
produced by the Front End (see below).
It must have some knowledge of the target machine.
This knowledge is expressed in a machine description table
which is passed as argument to the optimizer.
This table does not contain very detailed information about the
target (such as its instruction set and addressing modes).
.NH 2
The EM code
.PP
The definition of EM, the intermediate code of all ACK compilers,
is given in a separate document.
.[
tanenbaum machine architecture
.]
We will only discuss some features of EM that are most relevant
to the Global Optimizer.
.PP
EM is the assembly code of a virtual \fIstack machine\fR.
All operations are performed on the top of the stack.
For example, the statement "A := B + 3" may be expressed in EM as:
.DS
LOL -4 -- push local variable B
LOC 3 -- push constant 3
ADI 2 -- add two 2-byte items on top of
-- the stack and push the result
STL -2 -- pop A
.DE
So EM is essentially a \fIpostfix\fR code.
.PP
EM has a rich instruction set, containing several arithmetic
and logical operators.
It also contains special-case instructions (such as INCrement).
.PP
EM has \fIglobal\fR (\fIexternal\fR) variables, accessible
by all procedures and \fIlocal\fR variables, accessible by a few
(nested) procedures.
The local variables of a lexically enclosing procedure may
be accessed via a \fIstatic link\fR.
EM has instructions to follow the static chain.
There are EM instructions that allow a procedure
to access its local variables directly (such as LOL and STL above).
Local variables are referenced via an offset in the stack frame
of the procedure, rather than by their names (e.g. -2 and -4 above).
The EM code does not contain the (source language) type
of the variables.
.PP
All structured statements in the source program are expressed in
low level jump instructions.
Besides conditional and unconditional branch instructions, there are
two case instructions (CSA and CSB),
to allow efficient translation of case statements.
.NH 2
Requirements on the EM input
.PP
As the optimizer should be useful for all languages,
it clearly should not put severe restrictions on the EM code
of the input.
There is, however, one immovable requirement:
it must be possible to determine the \fIflow of control\fR of the
input program.
As virtually all global optimizations are based on control flow information,
the optimizer would be totally powerless without it.
For this reason we restrict the usage of the case jump instructions (CSA/CSB)
of EM.
Such an instruction is always called with the address of a case descriptor
on top of the stack.
.[~[
tanenbaum machine architecture
.] section 7.4]
This descriptor contains the labels of all possible
destinations of the jump.
We demand that all case descriptors be allocated in a global
data fragment of type ROM, i.e. the case descriptors
may not be modifiable.
Furthermore, any case instruction should be immediately preceded by
a LAE (Load Address External) instruction, that loads the
address of the descriptor,
so the descriptor can be uniquely identified.
.PP
The optimizer will work improperly if the program disguises its true
flow of control. We will give two examples of how this may happen.
.PP
In "C" the notorious library routines "setjmp" and "longjmp"
.[
unix programmer's manual
.]
may be used to jump out of a procedure,
but they can also be used for a number of more dubious purposes,
for example, to create an extra entry point in a loop.
.DS
while (condition) {
....
setjmp(buf);
...
}
...
longjmp(buf);
.DE
An invocation of longjmp is in fact a jump to the site of
the most recent call to setjmp with the same argument (buf).
As the calls to setjmp and longjmp are indistinguishable from
normal procedure calls, the optimizer will not see the danger.
Needless to say, several loop optimizations will behave
unexpectedly when presented with such pathological input.
.PP
Another way to deceive the flow of control is
by using exception handling routines.
Ada*
.FS
* Ada is a registered trademark of the U.S. Government
(Ada Joint Program Office).
.FE
has clearly recognized the dangers of exception handling,
but other languages (such as PL/I) have not.
.[
ada rationale
.]
.PP
The optimizer will be more effective if the EM input contains
some extra information about the source program.
Especially the \fIregister message\fR is very important.
These messages indicate which local variables may never be
accessed indirectly.
Most optimizations benefit significantly from this information.
.PP
The Inline Substitution technique needs to know how many bytes
of formal parameters every procedure accesses.
Only calls to procedures for which the EM code contains this information
will be substituted in line.
.NH 2
Structure of the optimizer
.PP
The Global Optimizer is organized as a number of \fIphases\fR,
each one performing some task.
The main structure is as follows:
.IP IC 6
the Intermediate Code construction phase transforms EM into the
intermediate code (ic) of the optimizer
.IP CF
the Control Flow phase extends the ic with control flow
information and interprocedural information
.IP OPTs
zero or more optimization phases, each one performing one or
more related optimizations
.IP CA
the Compact Assembly phase generates Compact Assembly Language EM code
out of ic.
.LP
.PP
An important issue in the design of a global optimizer is the
interaction between optimization techniques.
It is often advantageous to combine several techniques in
one algorithm that takes into account all interactions between them.
Ideally, one single algorithm should be developed that does
all optimizations simultaneously and deals with all possible interactions.
In practice, such an algorithm is still far out of reach.
Instead some rather ad hoc (albeit important) combinations are chosen,
such as Common Subexpression Elimination and Register Allocation.
.[
prabhala sethi common subexpressions
.]
.[
sethi ullman optimal code
.]
.PP
In the EM Global Optimizer there is one separate algorithm for
every technique.
Note that this does not mean that all techniques are independent
of each other.
.PP
In principle, the optimization phases can be run in any order;
a phase may even be run more than once.
However, the following rules should be obeyed:
.IP -
the Live Variable analysis phase (LV) must be run prior to
Register Allocation (RA), as RA uses information produced by LV.
.IP -
RA should be the last phase; this is a consequence of the way
the interface between RA and the Code Generator is defined.
.LP
The ordering of the phases has significant impact on
the quality of the produced code.
In
.[
wulf overview production quality carnegie-mellon
.]
two kinds of phase ordering problems are distinguished.
If two techniques A and B both take away opportunities from each other,
there is a "negative" ordering problem.
If, on the other hand, both A and B introduce new optimization
opportunities for each other, the problem is called "positive".
In the Global Optimizer the following interactions must be
taken into account:
.IP -
Inline Substitution (IL) may create new opportunities for most
other techniques, so it should be run as early as possible.
.IP -
Use Definition analysis (UD) may introduce opportunities for LV.
.IP -
Strength Reduction may create opportunities for UD.
.LP
The optimizer has a default phase ordering, which can
be changed by the user.
.NH 2
Structure of this document
.PP
The remaining chapters of this document each describe one
phase of the optimizer.
For every phase, we describe its task, its design,
its implementation, and its source files.
The latter two sections are intended to aid the
maintenance of the optimizer and
can be skipped by the initial reader.
.NH 2
References
.PP
There are very
few modern textbooks on optimization.
Chapters 12, 13, and 14 of
.[
aho compiler design
.]
are a good introduction to the subject.
Wulf et al.
.[
wulf optimizing compiler
.]
describe one specific optimizing (Bliss) compiler.
Anklam et al.
.[
anklam vax-11
.]
discuss code generation and optimization in
compilers for one specific machine (a Vax-11).
Kirchgaesner et al.
.[
optimizing ada compiler
.]
present a brief description of many
optimizations; the report also contains a lengthy (over 60 pages)
bibliography.
.PP
The number of articles on optimization is quite impressive.
The Lowry and Medlock paper on the Fortran H compiler
.[
object code optimization
.]
is a classical one.
Other papers on global optimization are.
.[
faiman optimizing pascal
.]
.[
perkins sites
.]
.[
harrison general purpose optimizing
.]
.[
morel partial redundancies
.]
.[
Mintz global optimizer
.]
Freudenberger
.[
freudenberger setl optimizer
.]
describes an optimizer for a Very High Level Language (SETL).
The Production-Quality Compiler-Compiler (PQCC) project uses
very sophisticated compiler techniques, as described in.
.[
wulf overview ieee
.]
.[
wulf overview carnegie-mellon
.]
.[
wulf machine-relative
.]
.PP
Several Ph.D. theses are dedicated to optimization.
Davidson
.[
davidson simplifying
.]
outlines a machine-independent peephole optimizer that
improves assembly code.
Katkus
.[
katkus
.]
describes how efficient programs can be obtained at little cost by
optimizing only a small part of a program.
Photopoulos
.[
photopoulos mixed code
.]
discusses the idea of generating interpreted intermediate code as well
as assembly code, to obtain programs that are both small and fast.
Shaffer
.[
shaffer automatic
.]
describes the theory of automatic subroutine generation.
Leverett
.[
leverett register allocation compilers
.]
deals with register allocation in the PQCC compilers.
.PP
References to articles about specific optimization techniques
will be given in later chapters.

doc/ego/ra/ra1
.bp
.NH 1
Register Allocation
.NH 2
Introduction
.PP
The efficient usage of the general purpose registers
of the target machine plays a key role in any optimizing compiler.
This subject, often referred to as \fIRegister Allocation\fR,
has great impact on both the code generator and the
optimizing part of such a compiler.
The code generator needs registers for at least the evaluation of
arithmetic expressions;
the optimizer uses the registers to decrease the access costs
of frequently used entities (such as variables).
The design of an optimizing compiler must pay great
attention to the cooperation of optimization, register allocation
and code generation.
.PP
Register allocation has received much attention in the literature (see
.[
leverett register allocation compilers
.]
.[
chaitin register coloring
.]
.[
freiburghouse usage counts
.]
and
.[~[
sites register
.]]).

doc/ego/ra/ra2
.NH 2
Usage of registers in ACK compilers
.PP
We will first describe the major design decisions
of the Amsterdam Compiler Kit,
as far as they concern register allocation.
Subsequently we will outline
the role of the Global Optimizer in the register
allocation process and the interface
between the code generator and the optimizer.
.NH 3
Usage of registers without the intervention of the Global Optimizer
.PP
Registers are used for two purposes:
.IP 1.
for the evaluation of arithmetic expressions
.IP 2.
to hold local variables, for the duration of the procedure they
are local to.
.LP
It is essential to note that no translation part of the compilers,
except for the code generator, knows anything at all
about the register set of the target computer.
Hence all decisions about registers are ultimately made by
the code generator.
Earlier phases of a compiler can only \fIadvise\fR the code generator.
.PP
The code generator splits the register set into two:
a fixed part for the evaluation of expressions (called \fIscratch\fR
registers) and a fixed part to store local variables.
This partitioning, which depends only on the target computer, significantly
reduces the complexity of register allocation, at the penalty
of some loss of code quality.
.PP
The code generator has some (machine-dependent) knowledge of the access costs
of memory locations and registers and of the costs of saving and
restoring registers. (Registers are always saved by the \fIcalled\fR
procedure).
This knowledge is expressed in a set of procedures for each target machine.
The code generator also knows how many registers there are and of
which type they are.
A register can be of type \fIpointer\fR, \fIfloating point\fR
or \fIgeneral\fR.
.PP
The front ends of the compilers determine which local variables may
be put in a register;
such a variable may never be accessed indirectly (i.e. through a pointer).
The front end also determines the types and sizes of these variables.
The type can be any of the register types or the type \fIloop variable\fR,
which denotes a general-typed variable that is used as loop variable
in a for-statement.
All this information is collected in a \fIregister message\fR in
the EM code.
Such a message is a pseudo EM instruction.
This message also contains a \fIscore\fR field,
indicating how desirable it is to put this variable in a register.
A front end may assign a high score to a variable if it
was declared as a register variable (which is only possible in
some languages, such as "C").
Any compiler phase before the code generator may change this score field,
if it has reason to do so.
The code generator bases its decisions on the information contained
in the register message, most notably on the score.
.PP
If the global optimizer is not used,
the score fields are set by the Peephole Optimizer.
This optimizer simply counts the number of occurrences
of every local (register) variable and adds this count
to the score provided by the front end.
In this way a simple, yet quite effective
register allocation scheme is achieved.
.NH 3
The role of the Global Optimizer
.PP
The Global Optimizer essentially tries to improve the scheme
outlined above.
It uses the following principles for this purpose:
.IP -
entities are not always assigned a register for the duration
of an entire procedure; smaller regions of the program text
may be considered too.
.IP -
several variables may be put in the same register simultaneously,
provided at most one of them is live at any point.
.IP -
besides local variables, other entities (such as constants and addresses of
variables and procedures) may be put in a register.
.IP -
more accurate cost estimates are used.
.LP
To perform its task, the optimizer must have some
knowledge of the target machine.
.NH 3
The interface between the register allocator and the code generator
.PP
The RA phase of the optimizer must somehow be able to express its
decisions.
Such decisions may look like: 'put constant 1283 in a register from
line 12 to line 40'.
To be precise, RA must be able to tell the code generator to:
.IP -
initialize a register with some value
.IP -
update an entity from a register
.IP -
replace all occurrences of an entity in a certain region
of text by a reference to the register.
.LP
At least three problems arise here: the code generator normally puts
only local variables in registers,
it assigns a register to a variable only for the duration of an entire
procedure, and it is not accustomed to having some earlier compiler phase
make all the decisions.
.PP
All problems are solved by one mechanism, that involves no changes
to the code generator.
With every (non-scratch) register R that will be used in
a procedure P, we associate a new variable T, local to P.
The size of T is the same as the size of R.
A register message is generated for T with an exceptionally high score.
The scores of all original register messages are set to zero.
Consequently, the code generator will always assign precisely those new
variables to a register.
If the optimizer wants to put some entity, say the constant 1283, in
a register, it emits the code "T := 1283" and replaces all occurrences
of '1283' by T.
Similarly, it can put the address of a procedure in T and replace all
calls to that procedure by indirect calls.
Furthermore, it can put several different entities in T (and thus in R)
during the lifetime of P.
.PP
In principle, the code generated by the optimizer in this way would
always be valid EM code, even if the optimizer were presented
with a totally wrong description of the target computer register set.
In practice, it would be a waste of data as well as text space to
allocate memory for these new variables, as they will always be assigned
a register (in the correct order of events).
Hence, no memory locations are allocated for them.
For this reason they are called pseudo local variables.
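.PP
The mechanism may be sketched as follows for a single entity;
all names are hypothetical.
.DS
/* put entity e (e.g. the constant 1283) in a register during
   region r of procedure p, via a pseudo local variable T */
void via_pseudo_local(struct proc *p, struct entity *e, struct region *r)
{
    struct local *T = new_pseudo_local(p, size_of(e));

    emit_register_message(T, VERY_HIGH_SCORE); /* T will get a register */
    emit_init(region_entry(r), T, e);          /* T := 1283 */
    replace_occurrences(e, T, r);              /* use T instead of e */
}
.DE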

doc/ego/ra/ra3
.NH 2
The register allocation phase
.PP
.NH 3
Overview
.PP
The RA phase deals with one procedure at a time.
For every procedure, it first determines which entities
may be put in a register. Such an entity
is called an \fIitem\fR.
For every item it decides during which parts of the procedure it
might be assigned a register.
Such a region is called a \fItimespan\fR.
For any item, several (possibly overlapping) timespans may
be considered.
A pair (item,timespan) is called an \fIallocation\fR.
If the items of two allocations are both live at some
point of time in the intersections of their timespans,
these allocations are said to be \fIrivals\fR of each other,
as they cannot be assigned the same register.
The rivals-set of every allocation is computed.
Next, the gains of assigning a register to an allocation are estimated,
for every allocation.
With all this information, decisions are made which allocations
to store in which registers (\fIpacking\fR).
Finally, the EM text is transformed to reflect these decisions.
.NH 3
The item recognition subphase
.PP
RA tries to put the following entities in a register:
.IP -
a local variable for which a register message was found
.IP -
the address of a local variable for which no
register message was found
.IP -
the address of a global variable
.IP -
the address of a procedure
.IP -
a numeric constant.
.LP
Only the \fIaddress\fR of a global variable
may be put in a register, not the variable itself.
This approach avoids the very complex problems that would be
caused by procedure calls and indirect pointer references (see
.[~[
aho design compiler
.] sections 14.7 and 14.8]
and
.[~[
spillman side-effects
.]]).
Still, on most machines accessing a global variable using indirect
addressing through a register is much cheaper than
accessing it via its address.
Similarly, if the address of a procedure is put in a register, the
procedure can be called via an indirect call.
.PP
With every item we associate a register type.
This type is
.DS
for local variables: the type contained in the register message
for addresses of variables and procedures: the pointer type
for constants: the general type
.DE
An entity other than a local variable is not taken to be an item
if it is used only once within the current procedure.
.PP
An item is said to be \fIlive\fR at some point of the program text
if its value may be used before it is changed.
As addresses and constants are never changed, all items but local
variables are always live.
The region of text during which a local variable is live is
determined via the live/dead messages generated by the
Live Variable analysis phase of the Global Optimizer.
.NH 3
The allocation determination subphase
.PP
If a procedure has more items than registers,
it may be advantageous to put an item in a register
only during those parts of the procedure where it is most
heavily used.
Such a part will be called a timespan.
With every item we may associate a set of timespans.
If two timespans of an item overlap,
at most one of them may be granted a register,
as there is no use in putting the same item in two
registers simultaneously.
If two timespans of an item are distinct,
both may be chosen;
the item will possibly be put in two
different registers during different parts of the procedure.
The timespan may also consist
of the whole procedure.
.PP
A list of (item,timespan) pairs (allocations)
is built, which will be the input to the decision making
subphase of RA (packing subphase).
This allocation list is the main data structure of RA.
The description of the remainder of RA will be in terms
of allocations rather than items.
The phrase "to assign a register to an allocation" means "to assign
a register to the item of the allocation for the duration of
the timespan of the allocation".
Subsequent subphases will add more information
to this list.
.PP
Several factors must be taken into account when a
timespan for an item is constructed:
.IP 1.
At any \fIentry point\fR of the timespan where the
item is live,
the register must be initialized with the item
.IP 2.
At any exit point of the timespan where the item is live,
the item must be updated.
.LP
In order to decrease these costs, we will only consider timespans with
one entry point
and no live exit points.
.NH 3
The rivals computation subphase
.PP
As stated before, several different items may be put in the
same register, provided they are not live simultaneously.
For every allocation we determine the intersection
of its timespan and the lifetime of its item (i.e. the part of the
procedure during which the item is live).
The allocation is said to be busy during this intersection.
If two allocations are ever busy simultaneously they are
said to be rivals of each other.
The rivals information is added to the allocation list.
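.PP
The computation may be sketched as follows. All names are hypothetical
and, for simplicity, the busy region is represented as a single
interval, whereas in reality it may be a set of intervals.
.DS
void compute_rivals(struct alloc *list)
{
    struct alloc *a, *b;

    for (a = list; a != NULL; a = a->next) {
        struct interval busy_a =
            intersect(a->timespan, lifetime(a->item));
        for (b = a->next; b != NULL; b = b->next) {
            struct interval busy_b =
                intersect(b->timespan, lifetime(b->item));
            if (overlap(busy_a, busy_b)) {   /* busy simultaneously */
                add_rival(a, b);
                add_rival(b, a);
            }
        }
    }
}
.DE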
.NH 3
The profits computation subphase
.PP
To make good decisions, the packing subphase needs to
know which allocations can be assigned the same register
(rivals information) and how much is gained by
granting an allocation a register.
.PP
Besides the gains of using a register instead of an
item,
two kinds of overhead costs must be
taken into account:
.IP -
the register must be initialized with the item
.IP -
the register must be saved at procedure entry
and restored at procedure exit.
.LP
The latter costs should not be attributed to a single
allocation, as several allocations can be assigned the same register.
These costs are dealt with after packing has been done.
They do not influence the decisions of the packing algorithm,
they may only undo them.
.PP
The actual profits consist of improvements
of execution time and code size.
As the former is far more difficult to estimate, we will
discuss code size improvements first.
.PP
The gains of putting a certain item in a register
depend on how the item is used.
Suppose the item is
a pointer variable.
On machines that do not have a
double-indirect addressing mode,
two instructions are needed to dereference the variable
if it is not in a register, but only one if it is put in a register.
If the variable is not dereferenced, but simply copied, one instruction
may be sufficient in both cases.
So the gains of putting a pointer variable in a register are higher
if the variable is dereferenced often.
.PP
To make accurate estimates, detailed knowledge of
the target machine and of the code generator
would be needed.
Therefore, a simplification has been made that substantially limits
the amount of target machine information that is needed.
The estimation of the number of bytes saved does
not take into account how an item is used.
Rather, an average number is used.
So these gains are computed as follows:
.DS
#bytes_saved = #occurrences * gains_per_occurrence
.DE
The number of occurrences is derived from
the EM code.
Note that this is not exact either,
as there is no one-to-one correspondence between occurrences in
the EM code and in the assembler code.
.PP
The gains of one occurrence depend on:
.IP 1.
the type of the item
.IP 2.
the size of the item
.IP 3.
the type of the register
.LP
and for local variables and addresses of local variables:
.IP 4.
the type of the local variable
.IP 5.
the offset of the variable in the stackframe
.LP
For every allocation we try two types of registers: the register type
of the item and the general register type.
Only the type with the highest profits will subsequently be used.
This type is added to the allocation information.
.PP
To compute the gains, RA uses a machine-dependent table
that is read from a machine descriptor file.
By means of this table the number of bytes saved can be computed
as a function of the five properties.
.PP
The cost of initializing a register with an item
is determined in a similar way.
The cost of one initialization is also
obtained from the descriptor file.
Note that there can be at most one initialization for any
allocation.
.PP
To summarize, the number of bytes a certain allocation would
save is computed as follows:
.DS
net_bytes_saved = bytes_saved - init_cost
bytes_saved = #occurrences * gains_per_occ
init_cost = #initializations * costs_per_init
.DE
.PP
It is inherently more difficult to estimate the execution
time saved by putting an item in a register,
because it is impossible to predict how
many times an item will be used dynamically.
If an occurrence is part of a loop,
it may be executed many times.
If it is part of a conditional statement,
it may never be executed at all.
In the latter case, the speed of the program may even get
worse if an initialization is needed.
As a clear example, consider the piece of "C" code in Fig. 13.1.
.DS
switch(expr) {
case 1: p(); break;
case 2: p(); p(); break;
case 3: p(); break;
default: break;
}
Fig. 13.1 A "C" switch statement
.DE
Lots of bytes may be saved by putting the address of procedure p
in a register, as p is called four times (statically).
Dynamically, p will be called zero, one or two times,
depending on the value of the expression.
.PP
The optimizer uses the following strategy for optimizing
execution time:
.IP 1.
try to put items in registers during \fIloops\fR first
.IP 2.
always keep the initializing code outside the loop
.IP 3.
if an item is not used in a loop, do not put it in a register if
the initialization costs may be higher than the gains
.LP
The latter condition can be checked by determining the
minimal number of usages (dynamically) of the item during the procedure,
via a shortest path algorithm.
In the example above, this minimal number is zero, so the address of
p is not put in a register.
.PP
The cost of one occurrence is estimated as described above for the
code size.
The number of dynamic occurrences is guessed by looking at the
loop nesting level of every occurrence.
If the item is never used in a loop,
the minimal number of occurrences is used.
From these facts, the execution time improvement is assessed
for every allocation.
.NH 3
The packing subphase
.PP
The packing subphase takes as input the allocation
list and outputs a
description of which allocations should be put
in which registers.
So it is essentially the decision making part of RA.
.PP
The packing system tries to assign a register to allocations one
at a time, in some yet to be defined order.
For every allocation A, it first checks if there is a register
(of the right type)
that is already assigned to one or more allocations,
none of which are rivals of A.
In this case A is assigned the same register.
Else, A is assigned a new register, if one exists.
A table containing the number of free registers for every type
is maintained.
It is initialized with the number of non-scratch registers of
the target computer and updated whenever a
new register is handed out.
The packing algorithm stops when no more allocations can
or need be assigned a register.
.PP
After an allocation A has been packed,
all allocations of the same item with non-disjunct timespans (including
A itself) are removed from the allocation list.
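.PP
The packing loop may be sketched as follows (all names hypothetical).
The routine best_candidate embodies the ordering rules discussed below;
for code size it simply returns the unpacked allocation with the
highest profits.
.DS
void pack(struct alloc *list)
{
    struct alloc *a;
    struct reg *r;

    while ((a = best_candidate(list)) != NULL) {
        r = shared_reg(a);           /* right type, no rivals of a */
        if (r == NULL)
            r = new_reg(a->regtype); /* NULL if none free */
        if (r != NULL) {
            assign(a, r);
            /* remove a itself and the allocations of the same
               item whose timespans overlap that of a */
            remove_overlapping(list, a);
        } else {
            remove_alloc(list, a);   /* cannot be packed */
        }
    }
}
.DE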
.PP
In case the number of items exceeds the number of registers, it
is important to choose the most profitable allocations.
Due to the possibility of having several allocations
occupying the same register,
this problem is quite complex.
Our packing algorithm uses simple heuristic rules
and avoids any combinatorial search.
It has distinct rules for the different cost measures.
.PP
If object code size is the most important factor,
the algorithm is greedy and chooses allocations in
decreasing order of their profits attribute.
It does not take into account the fact that
other allocations may be passed over because of
this decision.
.PP
If execution time is at prime stake, the algorithm
first considers allocations whose timespans consist of loops.
After all these have been packed, it considers the remaining
allocations.
Within the two subclasses, it considers allocations
with the highest profits first.
When assigning a register to an allocation with a loop
as timespan, the algorithm checks if the item has
already been put in a register during another loop.
If so, it tries to use the same register for the
new allocation.
After all packing has been done,
it checks if the item has always been assigned the same
register (although not necessarily during all loops).
If so, it tries to put the item in that register during
the entire procedure. This is possible
if the allocation (item,whole_procedure) is not a rival
of any allocation with a different item that has been
assigned to the same register.
Note that this approach is essentially 'bottom up',
as registers are first assigned over small regions
of text which are later collapsed into larger regions.
The advantage of this approach is the fact that
the decisions for one loop can be made independently
of all other loops.
.PP
After the entire packing process has been completed,
we compute for each register how much is gained in using
this register, by simply adding the net profits
of all allocations assigned to it.
This total yield should outweigh the costs of
saving/restoring the register at procedure entry/exit.
As most modern processors (e.g. 68000, Vax) have special
instructions to save/restore several registers,
the differential costs of saving one extra register are by
no means constant.
The costs are read from the machine descriptor file and
compared to the total yields of the registers.
As a consequence of this analysis, some allocations
may have their registers taken away.
.NH 3
The transformation subphase
.PP
The final subphase of RA transforms the EM text according to the
decisions made by the packing system.
It traverses the text of the currently optimized procedure and
changes all occurrences of items at points where
they are assigned a register.
It also clears the score field of the register messages for
normal local variables and emits register messages with a very
high score for the pseudo locals.
At points where registers have to be initialized with items,
it generates EM code to do so.
Finally it tries to decrease the size of the stackframe
of the procedure by looking at which local variables need not
be given memory locations.

doc/ego/ra/ra4
.NH 2
Source files of RA
.PP
The sources of RA are in the following files and packages:
.IP ra.h: 14
declarations of global variables and data structures
.IP ra.c:
the routine main; initialization of target machine-dependent tables
.IP items:
a routine to build the list of items of one procedure;
routines to manipulate items
.IP lifetime:
contains a subroutine that determines when items are live/dead
.IP alloclist:
contains subroutines that build the initial allocations list
and that compute the rivals sets.
.IP profits:
contains a subroutine that computes the profits of the allocations
and a routine that determines the costs of saving/restoring registers
.IP pack:
contains the packing subphase
.IP xform:
contains the transformation subphase
.IP interval:
contains routines to manipulate intervals of time
.IP aux:
contains auxiliary routines
.LP

doc/ego/sp/sp1
.bp
.NH 1
Stack pollution
.NH 2
Introduction
.PP
The "Stack Pollution" optimization technique (SP) decreases the costs
(time as well as space) of procedure calls.
In the EM calling sequence, the actual parameters are popped from
the stack by the \fIcalling\fR procedure.
The ASP (Adjust Stack Pointer) instruction is used for this purpose.
A call in EM is shown in Fig. 8.1.
.DS
Pascal: EM:
f(a,2) LOC 2
LOE A
CAL F
ASP 4 -- pop 4 bytes
Fig. 8.1 An example procedure call in Pascal and EM
.DE
As procedure calls occur often in most programs,
the ASP is one of the most frequently used EM instructions.
.PP
The main intention of removing the actual parameters after a procedure call
is to keep the stack from growing rapidly.
Yet, in some cases, it is possible to \fIdelay\fR or even \fIavoid\fR the
removal of the parameters without letting the stack grow
significantly.
In this way, considerable savings in code size and execution time may
be achieved, at the cost of a slightly increased stack size.
.PP
A stack adjustment may be delayed if there is some other stack adjustment
later on in the same basic block.
The two ASPs can be combined into one.
.DS
Pascal: EM: optimized EM:
f(a,2) LOC 2 LOC 2
g(3,b,c) LOE A LOE A
CAL F CAL F
ASP 4 LOE C
LOE C LOE B
LOE B LOC 3
LOC 3 CAL G
CAL G ASP 10
ASP 6
Fig. 8.2 An example of local Stack Pollution
.DE
The stack size will be increased only temporarily.
If the basic block contains another ASP, the ASP 10 may subsequently be
combined with that next ASP, and so on.
.PP
For some back ends, a stack adjustment also takes place
at the point of a procedure return.
There is no need to specify the number of bytes to be popped at a
return.
This provides an opportunity to remove ASPs more globally.
If all ASPs outside any loop are removed, the increase of the
stack size will still only be small, as no such ASP is executed more
than once without an intervening return from the procedure it is part of.
.PP
This second approach is not generally applicable to all target machines,
as some back ends require the stack to be cleaned up at the point of
a procedure return.
.NH 2
Implementation
.PP
There is one main problem the implementation has to solve.
In EM, the stack is not only used for passing parameters,
but also for evaluating expressions.
Hence, ASP instructions can only be combined or removed
if certain conditions are satisfied.
.PP
Two consecutive ASPs of one basic block can only be combined
(as described above) if:
.IP 1.
At no point of the text in between the two ASPs may any item be popped
from the stack that was pushed onto it before the first ASP.
.IP 2.
The number of bytes popped from the stack by the second ASP must equal
the number of bytes pushed since the first ASP.
.LP
Condition 1. is not satisfied in Fig. 8.3.
.DS
Pascal: EM:
5 + f(10) + g(30) LOC 5
LOC 10
CAL F
ASP 2 -- cannot be removed
LFR 2 -- push function result
ADI 2
LOC 30
CAL G
ASP 2
LFR 2
ADI 2
Fig. 8.3 An illegal transformation
.DE
If the first ASP were removed (delayed), the first ADI would add
10 and f(10), instead of 5 and f(10).
.sp
Condition 2. is not satisfied in Fig. 8.4.
.DS
Pascal: EM:
f(10) + 5 * g(30) LOC 10
CAL F
ASP 2
LFR 2
LOC 5
LOC 30
CAL G
ASP 2
LFR 2
MLI 2 -- 5 * g(30)
ADI 2
Fig. 8.4 A second illegal transformation
.DE
If the two ASPs were combined into one 'ASP 4', the constant 5 would
have been popped, rather than the parameter 10 (so '10 + f(10)*g(30)'
would have been computed).
.PP
The second approach to deleting ASPs (i.e. letting the procedure return
do the stack clean-up)
is only applied to the last ASP of every basic block.
Any preceding ASPs are dealt with by the first approach.
The last ASP of a basic block B will only be removed if:
.IP -
on no path in the control flow graph from B to any block containing a
RET (return) there is a basic block that, at some point of its text, pops
items from the stack that it has not itself pushed earlier.
.LP
Clearly, if this condition is satisfied, no harm can be done; no
other basic block will ever access items that were pushed
on the stack before the ASP.
.PP
The number of bytes pushed onto or popped from the stack can be
easily encoded in a so-called "pop-push table".
The numbers in general depend on the target machine word- and pointer
size and on the argument given to the instruction.
For example, an ADS instruction is described by:
.DS
-a-p+p
.DE
which means: an 'ADS n' first pops an n-byte value (n being the argument),
next pops a pointer-size value and finally pushes a pointer-size value.
For some infrequently used EM instructions the pop-push numbers
cannot be computed statically.
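.PP
Interpreting such a descriptor may be sketched as follows.
The letters 'a' (the instruction argument) and 'p' (the pointer size)
occur in the example above; 'w' (the word size) is an assumed extension.
For brevity only the net stack effect is computed here, although the
table can equally be used to follow the pops and pushes separately.
.DS
/* net bytes pushed (negative: popped) by one instruction */
long net_effect(const char *desc, long arg, long ws, long ps)
{
    long net = 0, val;
    int sign;

    while (*desc) {
        sign = (*desc++ == '+') ? 1 : -1;
        switch (*desc++) {
        case 'a': val = arg; break;   /* instruction argument */
        case 'p': val = ps;  break;   /* pointer size */
        case 'w': val = ws;  break;   /* word size (assumed) */
        default:  return 0;           /* not statically computable */
        }
        net += sign * val;
    }
    return net;
}
.DE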
.PP
The stack pollution algorithm first performs a depth first search over
the control flow graph and marks all blocks that do not satisfy
the global condition.
Next it visits all basic blocks in turn.
For every pair of adjacent ASPs, it checks conditions 1. and 2. and
combines the ASPs if they are satisfied.
The new ASP may be used as first ASP in the next pair.
If a condition fails, it simply continues with the next ASP
(a sketch of this pairwise combining is given below).
Finally, the last ASP is removed if:
.IP -
nothing has been popped from the stack after the last ASP that was
pushed before it
.IP -
the block was not marked by the depth first search
.IP -
the block is not in a loop
.LP
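.PP
The pairwise combining within one basic block may be sketched as
follows; all names are hypothetical.
.DS
void pollute_block(struct block *b)
{
    struct instr *first = next_asp(b->first);
    struct instr *second;
    long pushed;    /* net bytes pushed between the two ASPs */

    while (first != NULL) {
        second = next_asp(first->next);
        if (second == NULL)
            break;
        if (stack_undisturbed(first, second, &pushed) /* condition 1 */
            && pushed == asp_bytes(second)) {         /* condition 2 */
            set_asp_bytes(second,
                asp_bytes(first) + asp_bytes(second));
            delete_instr(first);   /* the two ASPs are combined */
        }
        first = second;            /* may start the next pair */
    }
}
.DE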

doc/ego/sr/sr1
.bp
.NH 1
Strength reduction
.NH 2
Introduction
.PP
The Strength Reduction optimization technique (SR)
tries to replace expensive operators
by cheaper ones,
in order to decrease the execution time
of the program.
A classical example is replacing a 'multiplication by 2'
by an addition or a shift instruction.
These kinds of local transformations are already
done by the EM Peephole Optimizer.
Strength reduction can also be applied
more generally to operators used in a loop.
.DS
i := 1; i := 1;
while i < 100 loop --> TMP := i * 118;
put(i * 118); while i < 100 loop
i := i + 1; put(TMP);
end loop; i := i + 1;
TMP := TMP + 118;
end loop;
Fig. 6.1 An example of Strength Reduction
.DE
In Fig. 6.1, a multiplication inside a loop is
replaced by an addition inside the loop and a multiplication
outside the loop.
Clearly, this is a global optimization; it cannot
be done by a peephole optimizer.
.PP
In some cases a related technique, \fItest replacement\fR,
can be used to eliminate the
loop variable i.
This technique will not be discussed in this report.
.sp 0
In the example above, the resulting code
can be further optimized by using
constant propagation.
Obviously, this is not the task of the
Strength Reduction phase.

doc/ego/sr/sr2
.NH 2
The model of strength reduction
.PP
In this section we will describe
the transformations performed by
Strength Reduction (SR).
Before doing so, we will introduce the
central notion of an induction variable.
.NH 3
Induction variables
.PP
SR looks for variables whose
values form an arithmetic progression
at the beginning of a loop.
These variables are called induction variables.
The most frequently occurring example of such
a variable is a loop-variable in a high-order
programming language.
Several quite sophisticated models of strength
reduction can be found in the literature.
.[
cocke reduction strength cacm
.]
.[
allen cocke kennedy reduction strength
.]
.[
lowry medlock cacm
.]
.[
aho compiler design
.]
In these models the notion of an induction variable
is far more general than the intuitive notion
of a loop-variable.
The definition of an induction variable we present here
is more restricted,
yielding a simpler model and simpler transformations.
We think the principal source for strength reduction lies in
expressions using a loop-variable,
i.e. a variable that is incremented or decremented
by the same amount after every loop iteration,
and that cannot be changed in any other way.
.PP
Of course, the EM code does not contain high level constructs
such as for-statements.
We will define an induction variable in terms
of the Intermediate Code of the optimizer.
Note that the notions of a loop in the
EM text and of a firm basic block
were defined in section 3.3.5.
.sp
.UL definition
.sp 0
An induction variable i of a loop L is a local variable
that is never accessed indirectly,
whose size is the word size of the target machine, and
that is assigned exactly once within L,
the assignment:
.IP -
being of the form i := i + c or i := c + i,
where c is a constant
called the \fIstep value\fR of i.
.IP -
occurring in a firm block of L.
.LP
(Note that the first restriction on the assignment
is not described in terms of the Intermediate Code;
we will give such a description later; the current
definition is easier to understand, however.)
.NH 3
Recognized expressions
.PP
SR recognizes certain expressions using
an induction variable and replaces
them by cheaper ones.
Two kinds of expensive operations are recognized:
multiplication and array address computations.
The expressions that are simplified must
use an induction variable
as an operand of
a multiplication or as index in an array expression.
.PP
Often a linear function of an induction variable is used,
rather than the variable itself.
In these cases optimization is still possible.
We call such expressions \fIiv-expressions\fR.
.sp
.UL definition:
.sp 0
An iv-expression of an induction variable i of a loop L is
an expression that:
.IP -
uses only the operators + and - (unary as well as binary)
.IP -
uses i as operand exactly once
.IP -
uses (besides i) only constants or variables that are
never changed in L as operands.
.LP
.PP
The expressions recognized by SR are of the following forms:
.IP (1)
iv_expression * constant
.IP (2)
constant * iv_expression
.IP (3)
A[iv-expression] := (assign to array element)
.IP (4)
A[iv-expression] (use array element)
.IP (5)
& A[iv-expression] (take address of array element)
.LP
(Note that EM has different instructions to use an array element,
store into one, or take the address of one: LAR, SAR, and AAR, respectively).
.sp 0
The size of the elements of A must
be known statically.
In cases (3) and (4) this size
must equal the word size of the
target machine.
.NH 3
Transformations
.PP
With every recognized expression we associate
a new temporary local variable TMP,
allocated in the stack frame of the
procedure containing the expression.
At any program point within the loop, TMP will
contain the following value:
.IP multiplication: 18
the current value of iv-expression * constant
.IP arrays:
the current value of &A[iv-expression].
.LP
In the second case, TMP essentially is a pointer variable,
pointing to the element of A that is currently in use.
.sp 0
If the same expression occurs several times in the loop,
the same temporary local is used each time.
.PP
Three transformations are applied to the EM text:
.IP (1)
TMP is initialized with the right value.
This initialization takes place just
before the loop.
.IP (2)
The recognized expression is simplified.
.IP (3)
TMP is incremented; this takes place just
after the induction variable is incremented.
.LP
For multiplication, the initial value of TMP
is the value of the recognized expression at
the program point immediately before the loop.
For arrays, TMP is initialized with the address
of the first array element that is accessed.
So the initialization code is:
.DS
TMP := iv-expression * constant; or
TMP := &A[iv-expression]
.DE
At the point immediately before the loop,
the induction variable will already have been
initialized,
so the value used in the code above will be the
value it has during the first iteration.
.PP
For multiplication, the recognized expression can simply be
replaced by TMP.
For array optimizations, the replacement
depends on the form:
.DS
\fIform\fR                \fIreplacement\fR
(3) A[iv-expr] :=         *TMP :=     (assign indirect)
(4) A[iv-expr]            *TMP        (use indirect)
(5) &A[iv-expr]           TMP
.DE
The '*' denotes the indirect operator. (Note that
EM has different instructions to do
an assign-indirect and a use-indirect).
As the size of the array elements is restricted
to be the word size in case (3) and (4),
only one EM instruction needs to
be generated in all cases.
.PP
The amount by which TMP is incremented is:
.IP multiplication: 18
step value * constant
.IP arrays:
step value * element size
.LP
Note that the step value (see definition of induction variable above),
the constant, and the element size (see previous section) can all
be determined statically.
If the sign of the induction variable in the
iv-expression is negative, the amount
must be negated.
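.PP
Computing this amount may be sketched as follows; the structure and
field names are hypothetical (a similar 'code_info' structure is
described with the implementation in the next section), and MLI, MLU
stand for the opcodes of the multiply instructions.
.DS
long tmp_step(struct code_info *c)
{
    long amount;

    if (c->op == MLI || c->op == MLU)        /* multiplication */
        amount = c->step_value * c->constant;
    else                                     /* LAR, SAR or AAR */
        amount = c->step_value * c->elem_size;
    if (c->iv_sign < 0)                      /* i occurs negated in */
        amount = -amount;                    /* the iv-expression */
    return amount;
}
.DE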
.PP
The transformations are demonstrated by an example.
.DS
i := 100; i := 100;
while i > 1 loop TMP := (6-i) * 5;
X := (6-i) * 5 + 2; while i > 1 loop
Y := (6-i) * 5 - 8; --> X := TMP + 2;
i := i - 3; Y := TMP - 8;
end loop; i := i - 3;
TMP := TMP + 15;
end loop;
Fig. 6.2 Example of complex Strength Reduction transformations
.DE
The expression '(6-i)*5' is recognized twice. The constant
is 5.
The step value is -3.
The sign of i in the recognized expression is '-'.
So the increment value of TMP is -(-3*5) = +15.

doc/ego/sr/sr3
.NH 2
Implementation
.PP
Like most phases, SR deals with one procedure
at a time.
Within a procedure, SR works on one loop at a time.
Loops are processed in textual order.
If loops are nested inside each other,
SR starts with the outermost loop and proceeds inwards.
This order is chosen,
because it enables the optimization
of multi-dimensional array address computations,
if the elements are accessed in the usual way
(i.e. row after row, rather than column after column).
For every loop, SR first detects all induction variables
and then tries to recognize
expressions that can be optimized.
.NH 3
Finding induction variables
.PP
The process of finding induction variables
can conveniently be split up
into two parts.
First, the EM text of the loop is scanned to find
all \fIcandidate\fR induction variables,
which are word-sized local variables
that are assigned precisely once
in the loop, within a firm block.
Second, for every candidate, the single assignment
is inspected, to see if it has the form
required by the definition of an induction variable.
.PP
Candidates are found by scanning the EM code of the loop.
During this scan, two sets are maintained.
The set "cand" contains all variables that were
assigned exactly once so far, within a firm block.
The set "dismiss" contains all variables that
should not be made a candidate.
Initially, both sets are empty.
If a variable is assigned to, it is put
in the cand set, if three conditions are met:
.IP 1.
the variable was not in cand or dismiss already
.IP 2.
the assignment takes place in a firm block
.IP 3.
the assignment is not a ZRL instruction (assignment by zero)
or a SDL instruction (store double local).
.LP
If any condition fails, the variable is dismissed from cand
(if it was there already) and put in dismiss
(if it was not there already).
.sp 0
All variables for which no register message was generated (i.e. those
variables that may be accessed indirectly) are assumed
to be changed in the loop.
.sp 0
All variables that remain in cand are candidate induction variables.
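.PP
The scan may be sketched as follows. All names are hypothetical;
the sets are represented by one bit per local variable, and ZRL and
SDL stand for the opcodes of the corresponding EM instructions.
.DS
void find_candidates(struct loop *lp,
                     unsigned long *cand, unsigned long *dismiss)
{
    struct instr *i;
    unsigned long b;

    *cand = 0;
    *dismiss = 0;
    for (i = loop_first(lp); i != NULL; i = next_in_loop(lp, i)) {
        if (!is_store_local(i))
            continue;
        b = bit(target_of(i));
        if (!(*cand & b) && !(*dismiss & b)
            && in_firm_block(lp, i)
            && opcode(i) != ZRL && opcode(i) != SDL) {
            *cand |= b;                   /* first well-behaved store */
        } else {
            *cand = *cand ^ (*cand & b);  /* dismiss the variable */
            *dismiss |= b;
        }
    }
    /* the variables left in cand are the candidates */
}
.DE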
.PP
From the set of candidates, the induction variables can
be determined, by inspecting the single assignment.
The assignment must match one of the EM patterns below.
('x' is the candidate. 'ws' is the word size of the target machine.
'n' is any number.)
.DS
\fIpattern\fR                                      \fIstep size\fR
INL x                                               +1
DEL x                                               -1
LOL x ; (INC | DEC) ; STL x                         +1 | -1
LOL x ; LOC n ; (ADI ws | SBI ws) ; STL x           +n | -n
LOC n ; LOL x ; ADI ws ; STL x                      +n
.DE
From the patterns the step size of the induction variable
can also be determined.
These step sizes are displayed on the right hand side.
.sp
For every induction variable we maintain the following information:
.IP -
the offset of the variable in the stackframe of its procedure
.IP -
a pointer to the EM text of the assignment statement
.IP -
the step value
.LP
.NH 3
Optimizing expressions
.PP
If any induction variables of the loop were found,
the EM text of the loop is scanned again,
to detect expressions that can be optimized.
SR scans for multiplication and array instructions.
Whenever it finds such an instruction, it analyses the
code in front of it.
If an expression is to be optimized, it must
be generated by the following syntax rules.
.DS
optimizable_expr:
iv_expr const mult |
const iv_expr mult |
address iv_expr address array_instr;
mult:
MLI ws |
MLU ws ;
array_instr:
LAR ws |
SAR ws |
AAR ws ;
const:
LOC n ;
.DE
An 'address' is an EM instruction that loads an
address on the stack.
An instruction like LOL may be an 'address', if
the size of an address (pointer size, =ps) is
the same as the word size.
If the pointer size is twice the word size,
instructions like LDL are an 'address'.
(The addresses in the third grammar rule
denote resp. the array address and the
array descriptor address).
.DS
address:
LAE |
LAL |
LOL if ps=ws |
LOE ,, |
LIL ,, |
LDL if ps=2*ws |
LDE ,, ;
.DE
The notion of an iv-expression was introduced earlier.
.DS
iv_expr:
iv_expr unair_op |
iv_expr iv_expr binary_op |
loopconst |
iv ;
unair_op:
NGI ws |
INC |
DEC ;
binary_op:
ADI ws |
ADU ws |
SBI ws |
SBU ws ;
loopconst:
const |
LOL x if x is not changed in loop ;
iv:
LOL x if x is an induction variable ;
.DE
An iv_expression must satisfy one additional constraint:
it must use exactly one operand that is an induction
variable.
A simple, hand written, top-down parser is used
to recognize an iv-expression.
It scans the EM code from right to left
(recall that EM is essentially postfix).
It uses semantic attributes (inherited as well as
derived) to check the additional constraint.
.PP
All information assembled during the recognition
process is put in a 'code_info' structure.
This structure contains the following information:
.IP -
the optimizable code itself
.IP -
the loop and basic block the code is part of
.IP -
the induction variable
.IP -
the iv-expression
.IP -
the sign of the induction variable in the
iv-expression
.IP -
the offset and size of the temporary local variable
.IP -
the expensive operator (MLI, LAR etc.)
.IP -
the instruction that loads the constant
(for multiplication) or the array descriptor
(for arrays).
.LP
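.PP
A minimal sketch of such a structure in C, with illustrative field
names (the real declaration lives in sr.h):
.DS
struct code_info {
    struct instr  *ci_code;    /* the optimizable code itself */
    struct loop   *ci_loop;    /* loop the code is part of */
    struct bblock *ci_block;   /* basic block it is part of */
    struct iv     *ci_iv;      /* the induction variable */
    struct instr  *ci_expr;    /* the iv-expression */
    int            ci_sign;    /* sign of the iv in the expression */
    long           ci_tmpoff;  /* offset of the temporary local */
    long           ci_tmpsize; /* size of the temporary local */
    int            ci_op;      /* expensive operator (MLI, LAR, ...) */
    struct instr  *ci_load;    /* loads the constant or descriptor */
};
.DE
.LP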
The entire transformation process is driven
by this information.
As the EM text is represented internally
as a list, this process consists
mainly of straightforward list manipulations.
.sp 0
The initialization code must be put
immediately before the loop entry.
For this purpose a \fIheader block\fR is
created that has the loop entry block as
its only successor and that dominates the
entry block.
The CFG and all relations (SUCC, PRED, IDOM, LOOPS, etc.)
are updated.
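.PP
In outline, assuming helper routines like the ones declared below
(the actual updates are done by the xform package):
.DS
struct bblock;                          /* CFG node, details omitted */

/* assumed helpers, illustrative only */
extern struct bblock *new_bblock(void);
extern void add_edge(struct bblock *from, struct bblock *to);
extern void redirect_outside_edges(struct bblock *entry,
                                   struct bblock *hdr);
extern void set_idom(struct bblock *b, struct bblock *idom);

/* Create a header block immediately before the loop entry. */
struct bblock *make_header(struct bblock *entry)
{
    struct bblock *hdr = new_bblock();

    add_edge(hdr, entry);               /* SUCC(hdr) = { entry } */
    redirect_outside_edges(entry, hdr); /* edges from outside the
                                           loop now enter hdr */
    set_idom(entry, hdr);               /* hdr dominates entry */
    return hdr;
}
.DE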
.sp 0
An EM instruction that will
replace the optimizable code
is created and put at the place of the old code.
The list representing the old optimizable code
is used to create a list for the initializing code,
as they are similar.
Only two modifications are required:
.IP -
if the expensive operator is a LAR or SAR,
it must be replaced by an AAR, as the initial value
of TMP is the \fIaddress\fR of the first
array element that is accessed.
.IP -
code must be appended to store the result of the
expression in TMP.
.LP
Finally, code to increment TMP is created and put after
the code of the single assignment to the
induction variable.
The generated code uses either an integer addition
(ADI) or an integer-to-pointer addition (ADS)
to do the increment.
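.PP
A sketch of this last step, assuming the pointer size equals the
word size and an emit helper that returns the instruction it
inserted (all names illustrative):
.DS
struct instr;                    /* EM instruction, details omitted */

/* illustrative opcodes and emit helper */
enum { op_lol, op_loc, op_adi, op_ads, op_stl };
extern struct instr *gen_after(struct instr *where, int op, long arg);

/* Append 'TMP := TMP + step' after instruction a, the store of
 * the single assignment to the induction variable. */
void gen_tmp_increment(struct instr *a, long tmpoff, long step,
                       int array_op, int ws)
{
    a = gen_after(a, op_lol, tmpoff);    /* load TMP */
    a = gen_after(a, op_loc, step);      /* step * constant, or
                                            step * element size */
    a = gen_after(a, array_op ? op_ads : op_adi, (long)ws);
    (void) gen_after(a, op_stl, tmpoff); /* store TMP back */
}
.DE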
.PP
SR maintains a set of all expressions that have already
been recognized in the present loop.
Such expressions are said to be \fIavailable\fR.
If an expression is recognized that is
already available,
no new temporary local variable is allocated for it,
and the code to initialize and increment the local
is not generated.

28
doc/ego/sr/sr4 Normal file
View file

@ -0,0 +1,28 @@
.NH 2
Source files of SR
.PP
The sources of SR are in the following files
and packages:
.IP sr.h: 14
declarations of global variables and
data structures
.IP sr.c:
the routine main; a driving routine to process
(possibly nested) loops in the right order
.IP iv:
implements a procedure that finds the induction variables
of a loop
.IP reduce:
implements a procedure that finds optimizable expressions
and that does the transformations
.IP cand:
implements a procedure that finds the candidate induction
variables; used to implement iv
.IP xform:
implements several useful routines that transform
lists of EM text or a CFG; used to implement reduce
.IP expr:
implements a procedure that parses iv-expressions
.IP aux:
implements several auxiliary procedures.
.LP

58
doc/ego/ud/ud1 Normal file
View file

@ -0,0 +1,58 @@
.bp
.NH 1
Use-Definition analysis
.NH 2
Introduction
.PP
The "Use-Definition analysis" phase (UD) consists of two related optimization
techniques that both depend on "Use-Definition" information.
The techniques are Copy Propagation and Constant Propagation.
They are best explained via an example (see Figs. 11.1 and 11.2).
.DS
(1)   A := B                A := B
      ...          -->      ...
(2)   use(A)                use(B)

    Fig. 11.1  An example of Copy Propagation
.DE
.DS
(1)   A := 12               A := 12
      ...          -->      ...
(2)   use(A)                use(12)

    Fig. 11.2  An example of Constant Propagation
.DE
Both optimizations have to check that the value of A at line (2)
can only have been obtained at line (1).
Copy Propagation also has to ensure that the value of B is
the same at line (2) as at line (1).
.PP
One purpose of both transformations is to introduce
opportunities for the Dead Code Elimination optimization.
If the variable A is used nowhere else, the assignment A := B
becomes useless and can be eliminated.
.sp 0
If B is less expensive to access than A (e.g. this is sometimes the case
if A is a local variable and B is a global variable),
Copy Propagation directly improves the code itself.
If A is cheaper to access, the transformation will not be performed.
Likewise, a constant as operand may be cheaper than a variable.
Having a constant as operand may also facilitate other optimizations.
.PP
The design of UD is based on the theory described in sections
14.1 and 14.3 of
.[
aho compiler design
.]
As a main departure from that theory,
we do not demand that the statement A := B become redundant after
Copy Propagation.
If B is cheaper to access than A, the optimization is always performed;
if B is more expensive than A, we never do the transformation.
If A and B are equally expensive, UD uses the heuristic rule of
replacing infrequently used variables by frequently used ones.
This rule increases the chance that the assignment becomes useless.
.PP
In the next section we will give a brief outline of the data
flow theory used
for the implementation of UD.

64
doc/ego/ud/ud2 Normal file
View file

@ -0,0 +1,64 @@
.NH 2
Data flow information
.PP
.NH 3
Use-Definition information
.PP
A \fIdefinition\fR of a variable A is an assignment to A.
A definition is said to \fIreach\fR a point p if there is a
path in the control flow graph from the definition to p, such that
A is not redefined on that path.
.PP
For every basic block b, we define the following sets:
.IP GEN[b] 9
the set of definitions in b that reach the end of b.
.IP KILL[b]
the set of definitions outside b that define a variable that
is changed in b.
.IP IN[b]
the set of all definitions reaching the beginning of b.
.IP OUT[b]
the set of all definitions reaching the end of b.
.LP
GEN and KILL can be determined by inspecting the code of the procedure.
IN and OUT are computed by solving the following data flow equations
('+' denotes set union, '-' set difference):
.DS
(1) OUT[b] = IN[b] - KILL[b] + GEN[b]
(2) IN[b] = OUT[p1] + ... + OUT[pn],
where PRED(b) = {p1, ... , pn}
.DE
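.PP
These equations can be solved with the usual iterative algorithm.
A sketch in C, with every set reduced to a single machine word for
brevity (the implementation uses bitvectors of arbitrary length);
pred[b] lists the predecessor blocks of b, and all names are
illustrative:
.DS
void solve_ud(unsigned long in[], unsigned long out[],
              const unsigned long gen[], const unsigned long kill[],
              const int npred[], int *const pred[], int nblocks)
{
    int b, i, change = 1;

    for (b = 0; b < nblocks; b++)
        out[b] = gen[b];         /* corresponds to IN[b] empty */

    while (change) {             /* iterate until a fixed point */
        change = 0;
        for (b = 0; b < nblocks; b++) {
            unsigned long new_in = 0, new_out;

            for (i = 0; i < npred[b]; i++)
                new_in |= out[pred[b][i]];     /* union over preds */
            new_out = (new_in & ~kill[b]) | gen[b];
            if (new_out != out[b]) {
                out[b] = new_out;
                change = 1;
            }
            in[b] = new_in;
        }
    }
}
.DE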
.NH 3
Copy information
.PP
A \fIcopy\fR is a definition of the form "A := B".
A copy is said to be \fIgenerated\fR in a basic block n if
it occurs in n and there is no subsequent assignment to B in n.
A copy is said to be \fIkilled\fR in n if:
.IP (i)
it occurs in n and there is a subsequent assignment to B within n, or
.IP (ii)
it occurs outside n, the definition A := B reaches the beginning of n
and B is changed in n (note that a copy is also a definition).
.LP
A copy \fIreaches\fR a point p if there is no assignment to B
on any path in the control flow graph from the copy to p.
.PP
We define the following sets:
.IP C_GEN[b] 11
the set of all copies generated in b.
.IP C_KILL[b]
the set of all copies killed in b.
.IP C_IN[b]
the set of all copies reaching the beginning of b.
.IP C_OUT[b]
the set of all copies reaching the end of b.
.LP
C_IN and C_OUT are computed by solving the following equations:
(root is the entry node of the current procedure; '*' denotes
set intersection)
.DS
(1) C_OUT[b] = C_IN[b] - C_KILL[b] + C_GEN[b]
(2) C_IN[b] = C_OUT[p1] * ... * C_OUT[pn],
where PRED(b) = {p1, ... , pn} and b /= root
C_IN[root] = {all copies}
.DE
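.PP
The same iterative scheme solves these equations, with intersection
in place of union and the sets initialized to the universal set.
Again a one-word-per-set sketch with illustrative names:
.DS
void solve_copies(unsigned long c_in[], unsigned long c_out[],
                  const unsigned long c_gen[],
                  const unsigned long c_kill[],
                  const int npred[], int *const pred[],
                  int nblocks, int root, unsigned long all_copies)
{
    int b, i, change = 1;

    for (b = 0; b < nblocks; b++)
        c_out[b] = all_copies;             /* optimistic start */

    while (change) {
        change = 0;
        for (b = 0; b < nblocks; b++) {
            unsigned long new_in = all_copies, new_out;

            if (b != root)
                for (i = 0; i < npred[b]; i++)
                    new_in &= c_out[pred[b][i]];  /* intersection */
            new_out = (new_in & ~c_kill[b]) | c_gen[b];
            if (new_out != c_out[b]) {
                c_out[b] = new_out;
                change = 1;
            }
            c_in[b] = new_in;
        }
    }
}
.DE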

26
doc/ego/ud/ud3 Normal file
View file

@ -0,0 +1,26 @@
.NH 2
Pointers and subroutine calls
.PP
The theory outlined above assumes that variables can
only be changed by a direct assignment.
This condition does not hold for EM.
In case of an assignment through a pointer variable,
it is in general impossible to see which variable is affected
by the assignment.
Similar problems occur in the presence of procedure calls.
Therefore we distinguish two kinds of definitions:
.IP -
an \fIexplicit\fR definition is a direct assignment to one
specific variable
.IP -
an \fIimplicit\fR definition is the potential alteration of
a variable as a result of a procedure call or an indirect assignment.
.LP
An indirect assignment causes implicit definitions to
all variables that may be accessed indirectly, i.e.
all local variables for which no register message was generated
and all global variables.
If a procedure contains an indirect assignment, a call to it may
change that same set of variables; otherwise it may only change some
global variables directly.
The KILL, GEN, IN and OUT sets contain explicit as well
as implicit definitions.
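.PP
The set of variables that may be accessed indirectly can be captured
by a simple predicate; a sketch with illustrative field names:
.DS
struct var {
    int v_global;   /* nonzero for a global variable */
    int v_regmsg;   /* nonzero if a register message was found */
};

/* May this variable be changed by an indirect assignment?
 * If so, an indirect store (or a call to a procedure that may
 * do one) implicitly defines it. */
int indirectly_accessible(const struct var *v)
{
    if (v->v_global)
        return 1;           /* all globals are suspect */
    return !v->v_regmsg;    /* locals only without register message */
}
.DE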

78
doc/ego/ud/ud4 Normal file
View file

@ -0,0 +1,78 @@
.NH 2
Implementation
.PP
UD first builds a number of tables:
.IP locals: 9
contains information about the local variables of the
current procedure (offset, size, whether a register message was found
for it and, if so, the score field of that message)
.IP defs:
a table of all explicit definitions appearing in the
current procedure.
.IP copies:
a table of all copies appearing in the
current procedure.
.LP
Every variable (local as well as global), definition and copy
is identified by a unique number, which is the index
in the table.
All tables are constructed by traversing the EM code.
A fourth table, "vardefs", indexed by variable number,
contains for every variable the set of its explicit definitions.
Also, for each basic block b, the set CHGVARS[b] of all variables
changed by b is computed.
.PP
The GEN sets are obtained in one scan over the EM text,
by analyzing every EM instruction.
The KILL set of a basic block b is computed by looking at the
set of variables
changed by b (i.e. CHGVARS[b]).
For every such variable v, all explicit definitions to v
(i.e. vardefs[v]) that are not in GEN[b] are added to KILL[b].
Also, the implicit definition of v is added to KILL[b].
Next, the data flow equations for use-definition information
are solved,
using a straightforward, iterative algorithm.
All sets are represented as bitvectors, so the operations
on sets (union, difference) can be implemented efficiently.
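.PP
A sketch of that representation: a set over n items becomes an array
of machine words, and the set operations turn into word-wise logical
operations (illustrative, not the actual bitvector package):
.DS
#define WORDBITS  (8 * sizeof(unsigned long))
#define NWORDS(n) (((n) + WORDBITS - 1) / WORDBITS)

typedef unsigned long *bitset;

void bs_union(bitset dst, const unsigned long *src, int nwords)
{
    int i;
    for (i = 0; i < nwords; i++)
        dst[i] |= src[i];                  /* dst = dst + src */
}

void bs_diff(bitset dst, const unsigned long *src, int nwords)
{
    int i;
    for (i = 0; i < nwords; i++)
        dst[i] &= ~src[i];                 /* dst = dst - src */
}

int bs_member(const unsigned long *set, int item)
{
    return (set[item / WORDBITS] >> (item % WORDBITS)) & 1;
}
.DE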
.PP
The C_GEN and C_KILL sets are computed simultaneously in one scan
over the EM text.
For every copy A := B appearing in basic block b we do
the following:
.IP 1.
for every basic block n /= b that changes B, see if the definition A := B
reaches the beginning of n (i.e. check if the index number of A := B in
the "defs" table is an element of IN[n]);
if so, add the copy to C_KILL[n]
.IP 2.
if B is redefined later on in b, add the copy to C_KILL[b], else
add it to C_GEN[b]
.LP
C_IN and C_OUT are computed from C_GEN and C_KILL via the second set of
data flow equations.
.PP
Finally, in one last scan all opportunities for optimization are
detected.
For every use u of a variable A, we check if
there is a unique explicit definition d reaching u.
.sp
If the definition is a copy A := B and B has the same value at d as
at u, then the use of A at u may be replaced by a use of B.
The latter condition can be verified as follows
(a sketch in C follows the list):
.IP -
if u and d are in the same basic block, see if there is
any assignment to B in between d and u
.IP -
if u and d are in different basic blocks, the condition is
satisfied if there is no assignment to B in the block of u prior to u
and the copy d is in C_IN of the block of u.
.LP
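.PP
In C the test might look as follows; helper names and fields are
illustrative, and bs_member tests membership in a bitvector:
.DS
struct instr;                   /* EM instruction, details omitted */

struct use  { int u_block; struct instr *u_instr; };
struct copy { int c_block; int c_nr;       /* index in "copies" */
              struct instr *c_instr;
              long c_src; };               /* offset of B */

/* assumed helpers */
extern int assigned_between(struct instr *from, struct instr *to,
                            long var);  /* store into var between? */
extern int assigned_before(struct instr *to, long var);
extern int bs_member(const unsigned long *set, int item);
extern unsigned long **c_in;            /* C_IN, per basic block */

/* Does B still have the value it had at the copy A := B? */
int value_unchanged(struct use *u, struct copy *c)
{
    if (u->u_block == c->c_block)       /* same block: no store   */
        return !assigned_between(c->c_instr, u->u_instr, c->c_src);
    return bs_member(c_in[u->u_block], c->c_nr)  /* copy reaches   */
        && !assigned_before(u->u_instr, c->c_src);
}
.DE
.LP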
Before the transformation is actually done, UD first makes sure the
alteration is really desirable, as described before.
The information needed for this purpose (access costs of local and
global variables) is read from a machine descriptor file.
.sp
If the only definition reaching u has the form "A := constant", the use
of A at u is replaced by the constant.

19
doc/ego/ud/ud5 Normal file
View file

@ -0,0 +1,19 @@
.NH 2
Source files of UD
.PP
The sources of UD are in the following files and packages:
.IP ud.h: 14
declarations of global variables and data structures
.IP ud.c:
the routine main; initialization of target machine dependent tables
.IP defs:
routines to compute the GEN and KILL sets and routines to analyse
EM instructions
.IP const:
routines involved in constant propagation
.IP copy:
routines involved in copy propagation
.IP aux:
contains auxiliary routines
.LP