.NH 2
Implementation.
.PP
In this section we will discuss the implementation of the CS phase.
We will first describe the basic actions that are undertaken
by the algorithm, then the algorithm itself.
.NH 3
Partitioning the EM instructions
.PP
There are over 100 EM instructions.
For our purpose we partition this huge set into groups of
instructions which can be more or less conveniently handled together.
.PP
There are groups for all sorts of load instructions:
simple loads, expensive loads, loads of an array element.
A load is considered \fIexpensive\fP when more than one EM instruction
is involved in loading it.
The load of a lexical entity is also considered expensive.
For instance: LOF is expensive, LAL is not.
LAR forms a group on its own,
because it is not only an expensive load,
but also implicitly includes the ternary operator AAR,
which computes the address of the array element.
.PP
There are groups for all sorts of operators:
unary, binary, and ternary.
The groups of operators are further partitioned according to the size
of their operand(s) and result.
.\" .PP
.\" The distinction between operators and expensive loads is not always clear.
.\" The ADP instruction for example,
.\" might seem a unary operator because it pops one item
.\" (a pointer) from the stack.
.\" However, two ADP-instructions which pop an item with the same value number
.\" need not have the same result,
.\" because the attributes (an offset, to be added to the pointer)
.\" can be different.
.\" Is it then a binary operator?
.\" That would give rise to the strange, and undesirable,
.\" situation that some binary operators pop two operands
.\" and others pop one.
.\" The conclusion is inevitable:
.\" we have been fooled by the name (ADd Pointer).
.\" The ADP-instruction is an expensive load.
.\" In this context LAF, meaning Load Address of oFfsetted,
.\" would have been a better name,
.\" corresponding to LOF, like LAL,
.\" Load Address of Local, corresponds to LOL.
.PP
There are groups for all sorts of stores:
direct, indirect, array element.
SAR forms a group on its own,
for the same reason as LAR.
.PP
The effect of the remaining instructions is less clear.
They do not help very much in parsing expressions or
in constructing our pseudo symbol table.
They are partitioned according to the following criteria:
.RS
.IP "-"
They change the value of an entity without using the stack
(e.g. ZRL, DEE).
.IP "-"
They are subroutine calls (CAI, CAL).
.IP "-"
They change the stack in some irreproducible way (e.g. ASP, LFR, DUP).
.IP "-"
They have no effect whatever on the stack or on the entities.
This does not mean they can be deleted,
but they can be ignored for the moment
(e.g. MES, LIN, NOP).
.IP "-"
Their effect is too complicated to compute,
so we just assume worst case behaviour.
Hopefully, they do not occur very often
(e.g. MON, STR, BLM).
.IP "-"
They signal the end of the basic block (e.g. BLT, RET, TRP).
.RE
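.PP
The partitioning described above can be pictured as a table lookup.
The following is only a minimal sketch:
the group names, the table layout and the function classify are
hypothetical and are not taken from the CS sources,
and the table lists only mnemonics that this section itself assigns to a group.
Anything else is reported as GRP_UNKNOWN, a value that exists only in this sketch.
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical group names; the real CS phase may name and split them differently. */
enum group {
    GRP_SIMPLE_LOAD,      /* e.g. LAL */
    GRP_EXPENSIVE_LOAD,   /* e.g. LOF */
    GRP_ARRAY_LOAD,       /* LAR, a group on its own */
    GRP_TERNARY_OP,       /* e.g. AAR */
    GRP_ARRAY_STORE,      /* SAR, a group on its own */
    GRP_ASSIGN_ENTITY,    /* changes an entity without using the stack */
    GRP_CALL,             /* subroutine calls */
    GRP_STACK_CHANGE,     /* changes the stack in an irreproducible way */
    GRP_IGNORE,           /* no effect on stack or entities */
    GRP_WORST_CASE,       /* too complicated: assume the worst */
    GRP_END_OF_BLOCK,     /* ends the basic block */
    GRP_UNKNOWN           /* not covered by this sketch */
};

struct entry {
    const char *mnem;
    enum group grp;
};

/* Only the mnemonics that the text assigns to a group. */
static const struct entry group_table[] = {
    {"LAL", GRP_SIMPLE_LOAD},   {"LOF", GRP_EXPENSIVE_LOAD},
    {"LAR", GRP_ARRAY_LOAD},    {"AAR", GRP_TERNARY_OP},
    {"SAR", GRP_ARRAY_STORE},
    {"ZRL", GRP_ASSIGN_ENTITY}, {"DEE", GRP_ASSIGN_ENTITY},
    {"CAI", GRP_CALL},          {"CAL", GRP_CALL},
    {"ASP", GRP_STACK_CHANGE},  {"LFR", GRP_STACK_CHANGE},
    {"DUP", GRP_STACK_CHANGE},
    {"MES", GRP_IGNORE},        {"LIN", GRP_IGNORE},
    {"NOP", GRP_IGNORE},
    {"MON", GRP_WORST_CASE},    {"STR", GRP_WORST_CASE},
    {"BLM", GRP_WORST_CASE},
    {"BLT", GRP_END_OF_BLOCK},  {"RET", GRP_END_OF_BLOCK},
    {"TRP", GRP_END_OF_BLOCK}
};

/* Return the group of an EM mnemonic, or GRP_UNKNOWN if it is not in the table. */
enum group classify(const char *mnem)
{
    size_t i;

    assert(mnem != NULL);
    for (i = 0; i < sizeof group_table / sizeof group_table[0]; i++)
        if (strcmp(group_table[i].mnem, mnem) == 0)
            return group_table[i].grp;
    return GRP_UNKNOWN;
}
```
.DE
.PP
A linear search keeps the sketch short;
an implementation that is indexed by opcode would more likely use an array lookup.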
.NH 3
Parsing expressions
.PP
To recognize expressions,
we simulate the behaviour of the EM machine,
by means of a fake-stack.
When we scan the instructions in sequential order,
we first encounter the instructions that load
the operands on the stack,
and then the instruction that indicates the operator,
because EM expressions are postfix.
When we find an instruction to load an operand,
we load on the fake-stack a struct with the following information:
.DS
.TS
l l.
(1)	the value number of the operand
(2)	the size of the operand
(3)	a pointer to the first line of EM-code
	that constitutes the operand
.TE
.DE
In most cases, (3) will point to the line
that loaded the operand (e.g. LOL, LOC),
i.e. there is only one line that refers to this operand,
but sometimes some information must be popped
to load the operand (e.g. LOI, LAR).
This information must have been pushed before,
so we also pop a pointer to the first line that pushed
the information.
This line is now the first line that defines the operand.
.PP
When we find the operator instruction,
we pop its operand(s) from the fake-stack.
The first line that defines the first operand is
now the first line of the expression.
We now have all information to determine
whether the just parsed expression has occurred before.
We also know the first and last line of the expression;
we need this when we decide to eliminate it.
Associated with each available expression is a set
whose elements each contain the first and last line of
a recurrence of this expression.
.PP
Not only will the operand(s) be popped from the fake-stack,
but the following will be pushed:
.DS
.TS
l l.
(1)	the value number of the result
(2)	the size of the result
(3)	a pointer to the first line of the expression
.TE
.DE
In this way an item on the fake-stack always contains
the necessary information.
EM expressions are parsed bottom up.
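.PP
As an illustration of the fake-stack, here is a minimal sketch in C.
The struct and function names, the fixed capacity,
and the use of plain line indices instead of pointers to EM lines
are all assumptions made for this example.
Value numbering itself is not modelled:
the caller supplies the value number of the result.
.DS
```c
#include <assert.h>

#define FS_MAX 64   /* assumed capacity, chosen for this sketch only */

/* One fake-stack item: the three fields listed in the text. */
struct token {
    int vn;      /* (1) value number */
    int size;    /* (2) size in bytes */
    int first;   /* (3) index of the first EM line of the operand */
};

struct fake_stack {
    struct token item[FS_MAX];
    int depth;
};

void fs_push(struct fake_stack *s, int vn, int size, int first)
{
    assert(s->depth < FS_MAX);
    s->item[s->depth].vn = vn;
    s->item[s->depth].size = size;
    s->item[s->depth].first = first;
    s->depth++;
}

struct token fs_pop(struct fake_stack *s)
{
    assert(s->depth > 0);
    s->depth--;
    return s->item[s->depth];
}

/*
 * Simulate a binary operator: pop both operands and push the result.
 * The first line of the first (deepest) operand becomes the first
 * line of the whole expression, which is also the return value.
 */
int fs_binary_op(struct fake_stack *s, int result_vn, int result_size)
{
    struct token right = fs_pop(s);   /* pushed last */
    struct token left = fs_pop(s);    /* pushed first */

    (void) right;   /* its value number would feed the expression lookup */
    fs_push(s, result_vn, result_size, left.first);
    return left.first;
}
```
.DE
.PP
For two loads on lines 10 and 11 followed by a binary operator,
fs_binary_op returns 10 and leaves a single item on the fake-stack
that also starts at line 10.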
.NH 3
Updating entities
.PP
As said before,
we build our private "symbol table",
while scanning the EM-instructions.
The behaviour of the EM-machine is not only reflected
in the fake-stack,
but also in the entities.
When an entity is created,
we do not yet know its value,
so we assign a brand new value number to it.
Each time a store-instruction is encountered,
we change the value number of the target entity of this store
to the value number of the token that was popped
from the fake-stack.
Because entities may overlap,
we must also "forget" the value numbers of entities
that might be affected by this store.
Each such entity will be \fIkilled\fP,
i.e. assigned a brand new value number.
.PP
Because we lose information when we forget
the value number of an entity,
we try to save as many entities as possible.
When we store into an external,
we don't have to kill locals and vice versa.
Furthermore, we can see whether two locals or
two externals overlap,
because we know the offset from the local base,
resp. the offset within the data block,
and the size.
The situation becomes more complicated when we have
to consider indirection.
The worst case is that we store through an unknown pointer.
In that case we kill all entities except those locals
for which a so-called \fIregister message\fP has been generated;
this register message indicates that this local can never be
accessed indirectly.
If we know this pointer we can be more careful.
If it points to a local then the entity that is accessed through
this pointer can never overlap with an external.
If it points to an external this entity can never overlap with a local.
Furthermore, in the latter case,
we can find the data block this entity belongs to.
Since pointer arithmetic is only defined within a data block,
this entity can never overlap with entities that are known to
belong to another data block.
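.PP
The killing rules for a direct store and for a store through an
unknown pointer can be sketched as follows.
This is an illustration under stated assumptions, not the code of the CS phase:
the entity layout, the function names and the value number counter
starting at 100 are hypothetical,
and stores through a known pointer are left out.
.DS
```c
#include <assert.h>

enum ent_kind { ENT_LOCAL, ENT_EXTERNAL };

/* Hypothetical entity record for this sketch. */
struct entity {
    enum ent_kind kind;
    int block;      /* data block id, meaningful for externals only */
    long offset;    /* from the local base, or within the data block */
    long size;      /* in bytes */
    int regmsg;     /* nonzero: a register message exists for this local */
    int vn;         /* current value number */
};

static int vn_counter = 100;   /* assumed start of unused value numbers */

static int new_vn(void)
{
    return vn_counter++;
}

/* Can a and b share storage? Locals never overlap with externals. */
int may_overlap(const struct entity *a, const struct entity *b)
{
    if (a->kind != b->kind)
        return 0;
    if (a->kind == ENT_EXTERNAL && a->block != b->block)
        return 0;
    return a->offset < b->offset + b->size
        && b->offset < a->offset + a->size;
}

/* Direct store of value number vn into ents[target]. */
void store_direct(struct entity *ents, int n, int target, int vn)
{
    int i;

    assert(target >= 0 && target < n);
    for (i = 0; i < n; i++)
        if (i != target && may_overlap(&ents[i], &ents[target]))
            ents[i].vn = new_vn();   /* kill */
    ents[target].vn = vn;
}

/* Store through an unknown pointer: only locals with a register message survive. */
void store_unknown(struct entity *ents, int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (!(ents[i].kind == ENT_LOCAL && ents[i].regmsg))
            ents[i].vn = new_vn();   /* kill */
}
```
.DE
.PP
The overlap test is the usual interval intersection on offset and size;
it is applied only after the cheaper checks on kind and data block have failed
to rule out overlap.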
.PP
Not only after a store-instruction but also after a
subroutine-call it may be necessary to kill entities;
the subroutine may affect global variables or store
through a pointer.
If a subroutine is called that is not available as EM-text,
we assume worst case behaviour,
i.e. we kill all entities without register message.
.NH 3
Additions and replacements
.PP
When a new expression becomes available,
we check whether the result is saved in a local
that may go in a register.
The last line of the expression must be followed
by a STL or SDL instruction,
depending on the size of the result
(resp. WS and 2*WS),
and a register message must be present for
this local.
If we have found such a local,
we store a pointer to it with the available expression.
Each time a new occurrence of this expression
is found,
we compare the value number of the local against
the value number of the result.
When they are different we remove the pointer to it,
because we cannot use it.
.PP
The available expressions are singly linked in a list.
When a new expression becomes available,
we link it at the head of the list.
In this way expressions that are contained within other
expressions appear later in the list,
because EM-expressions are postfix.
When we are going to eliminate expressions,
we walk through the list,
starting at the head, to find the largest expressions first.
When we decide to eliminate an expression,
we look at the expressions in the tail of the list,
starting from where we are now,
to delete expressions that are contained within
the chosen one because
we cannot eliminate an expression more than once.
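.PP
The list handling described above can be sketched in C as follows.
The names are hypothetical,
and containment is approximated here by comparing the line range
of a single occurrence of each expression,
which is an assumption of this sketch.
.DS
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node: one available expression. */
struct avail {
    int first;            /* first line of the expression */
    int last;             /* last line of the expression */
    struct avail *next;
};

/* Link a newly available expression at the head; returns the new head. */
struct avail *av_add(struct avail *head, struct avail *e)
{
    e->next = head;
    return e;
}

static int av_contained(const struct avail *inner, const struct avail *outer)
{
    return outer->first <= inner->first && inner->last <= outer->last;
}

/*
 * After deciding to eliminate `chosen`, unlink every expression in the
 * tail of the list that lies within it. Returns the number unlinked.
 */
int av_drop_contained(struct avail *chosen)
{
    struct avail **pp = &chosen->next;
    int n = 0;

    while (*pp != NULL) {
        if (av_contained(*pp, chosen)) {
            *pp = (*pp)->next;    /* skip the contained expression */
            n++;
        } else {
            pp = &(*pp)->next;
        }
    }
    return n;
}
```
.DE
.PP
Only the tail has to be searched:
an expression contained in the chosen one was completed earlier,
so it was linked in earlier and lies further down the list.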
.PP
When we are going to eliminate expressions,
and we do not have a local that holds the result,
we emit a STL or SDL after the line where the expression
was first found.
The other occurrences are simply removed,
unless they contain instructions that affect more
than just the stack, e.g. messages, stores, calls.
Before each instruction that needs the result on the stack,
we emit a LOL or LDL.
When the expression was an AAR,
but the instruction was a LAR or a SAR,
we append a LOI resp. a STI of the number of bytes
in an array-element after each LOL/LDL.
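.PP
The choice of instructions in this step depends only on the size of
the result and on the instruction being replaced.
A minimal sketch, in which the function names are hypothetical
and instructions are represented by their mnemonics as strings:
.DS
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Instruction that saves a result of `size` bytes in a local;
 * ws is the word size. Returns NULL when the size is neither
 * WS nor 2*WS, in which case this sketch offers no instruction.
 */
const char *save_instr(long size, long ws)
{
    assert(ws > 0);
    if (size == ws)
        return "STL";
    if (size == 2 * ws)
        return "SDL";
    return NULL;
}

/* Instruction that reloads the saved result before each use. */
const char *reload_instr(long size, long ws)
{
    assert(ws > 0);
    if (size == ws)
        return "LOL";
    if (size == 2 * ws)
        return "LDL";
    return NULL;
}

/*
 * When the saved expression is an AAR but the original instruction
 * was a LAR or a SAR, this instruction follows the reload.
 * Returns NULL when nothing has to be appended.
 */
const char *array_suffix(const char *orig)
{
    assert(orig != NULL);
    if (strcmp(orig, "LAR") == 0)
        return "LOI";
    if (strcmp(orig, "SAR") == 0)
        return "STI";
    return NULL;
}
```
.DE
.PP
The argument of the appended LOI or STI, the number of bytes in an array element,
is not modelled here.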
.NH 3
Desirability analysis
.PP
Although the global optimizer works on EM code,
the goal is to improve the quality of the object code.
Therefore we need some machine dependent information
to decide whether it is desirable to
eliminate a given expression.
Because it is impossible for the CS phase to know
exactly what code will be generated,
we use some heuristics.
In most cases it will save time when we eliminate an
operator, so we just do it.
We only look for some special cases.
.PP
Some operators can in some cases be translated
into an addressing mode for the machine at hand.
We only eliminate such an operator,
when its operand is itself "expensive",
i.e. not just a simple load.
The user of the CS phase has to supply
a set of such operators.
.PP
Eliminating the loading of the Local Base or
the Argument Base by the LXL resp. LXA instruction
is only beneficial when the number of lexical levels
we have to go back exceeds a certain threshold.
This threshold will be different when registers
are saved by the back end.
The user must supply this threshold.
.PP
Replacing a SAR or a LAR by an AAR followed by a LOI
may possibly increase the size of the object code.
We assume that this is only possible when the
size of the array element is greater than some
(user-supplied) limit.
.PP
There are back ends that can very efficiently translate
the index computing instruction sequence LOC SLI ADS.
If this is the case,
we do not eliminate the SLI instruction between a LOC
and an ADS.
.PP
To handle unforeseen cases, the user may also supply
a set of operators that should never be eliminated.
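.PP
Taken together, the heuristics above amount to a small decision function.
The sketch below is one possible reading of them, not the code of the CS phase:
all names are hypothetical,
the user-supplied sets are reduced to flags on the candidate,
and the array case assumes that elimination is wanted only when the
element size does not exceed the limit.
.DS
```c
#include <assert.h>

/* Hypothetical classification of a candidate for elimination. */
enum cand_kind {
    CAND_PLAIN,         /* ordinary operator */
    CAND_ADDR_MODE,     /* operator that may become an addressing mode */
    CAND_LEX_BASE,      /* LXL or LXA */
    CAND_ARRAY,         /* AAR that would replace a LAR or a SAR */
    CAND_INDEX_SHIFT    /* SLI between a LOC and an ADS */
};

/* User-supplied, machine dependent parameters. */
struct cs_params {
    int lex_threshold;   /* levels needed before LXL/LXA is worth saving */
    long array_limit;    /* largest array element size still replaced */
    int fast_index;      /* nonzero: back end handles LOC SLI ADS well */
};

struct candidate {
    enum cand_kind kind;
    int forbidden;           /* operator is in the never-eliminate set */
    int operand_expensive;   /* operand is more than a simple load */
    int levels;              /* lexical levels to go back */
    long elem_size;          /* array element size in bytes */
};

/* Nonzero when eliminating the candidate looks worthwhile. */
int desirable(const struct candidate *c, const struct cs_params *p)
{
    if (c->forbidden)
        return 0;
    switch (c->kind) {
    case CAND_ADDR_MODE:
        return c->operand_expensive != 0;
    case CAND_LEX_BASE:
        return c->levels > p->lex_threshold;
    case CAND_ARRAY:
        return c->elem_size <= p->array_limit;
    case CAND_INDEX_SHIFT:
        return !p->fast_index;
    case CAND_PLAIN:
    default:
        return 1;   /* in most cases eliminating an operator saves time */
    }
}
```
.DE
.PP
The forbidden set is tested first,
so that it overrides every other heuristic.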
.NH 3
The algorithm
.PP
After these preparatory explanations,
we can be brief about the algorithm itself.
For each instruction within our window,
the following steps are performed in the order given:
.IP 1.
We check if this instruction defines an entity.
If this is the case the set of entities is updated accordingly.
.IP 2.
We kill all entities that might be affected by this instruction.
.IP 3.
The instruction is simulated on the fake-stack.
Copy propagation is done.
If this instruction is an operator,
we update the list of available expressions accordingly.
.PP
When we have processed all instructions this way,
we have built a list of available expressions plus the information we
need to eliminate them.
We eliminate those expressions that desirability analysis
marks as worth eliminating.
Then we shift our window and continue.