ack/doc/ego/sr/sr3

.NH 2
Implementation
.PP
Like most phases, SR deals with one procedure
at a time.
Within a procedure, SR works on one loop at a time.
Loops are processed in textual order.
If loops are nested inside each other,
SR starts with the outermost loop and proceeds in the
inwards direction.
This order is chosen,
because it enables the optimization
of multi-dimensional array address computations,
if the elements are accessed in the usual way
(i.e. row after row, rather than column after column).
For every loop, SR first detects all induction variables
and then tries to recognize
expressions that can be optimized.
.NH 3
Finding induction variables
.PP
The process of finding induction variables
can conveniently be split up
into two parts.
First, the EM text of the loop is scanned to find
all \fIcandidate\fR induction variables,
which are word-sized local variables
that are assigned precisely once
in the loop, within a firm block.
Second, for every candidate, the single assignment
is inspected, to see if it has the form
required by the definition of an induction variable.
.PP
Candidates are found by scanning the EM code of the loop.
During this scan, two sets are maintained.
The set "cand" contains all variables that were
assigned exactly once so far, within a firm block.
The set "dismiss" contains all variables that
should not be made a candidate.
Initially, both sets are empty.
If a variable is assigned to, it is put
in the cand set, if three conditions are met:
.IP 1.
the variable was not in cand or dismiss already
.IP 2.
the assignment takes place in a firm block
.IP 3.
the assignment is not a ZRL instruction (assignment by zero)
or a SDL instruction (store double local).
.LP
If any condition fails, the variable is dismissed from cand
(if it was there already) and put in dismiss
(if it was not there already).
.sp 0
All variables for which no register message was generated (i.e. those
variables that may be accessed indirectly) are assumed
to be changed in the loop.
.sp 0
All variables that remain in cand are candidate induction variables.
.PP
From the set of candidates, the induction variables can
be determined, by inspecting the single assignment.
The assignment must match one of the EM patterns below.
('x' is the candidate. 'ws' is the word size of the target machine.
'n' is any number.)
.DS
.TS
l l.
\fIpattern\fR	\fIstep size\fR
INL x  |	+1
DEL x  |	-1
LOL x ; (INC | DEC) ; STL x  |	+1 | -1
LOL x ; LOC n ; (ADI ws | SBI ws) ; STL x  |	+n | -n
LOC n ; LOL x ; ADI ws ; STL x	+n
.TE
.DE
From the patterns the step size of the induction variable
can also be determined.
These step sizes are displayed on the right hand side.
.sp
For every induction variable we maintain the following information:
.IP -
the offset of the variable in the stackframe of its procedure
.IP -
a pointer to the EM text of the assignment statement
.IP -
the step value
.LP
.NH 3
Optimizing expressions
.PP
If any induction variables of the loop were found,
the EM text of the loop is scanned again,
to detect expressions that can be optimized.
SR scans for multiplication and array instructions.
Whenever it finds such an instruction, it analyses the
code in front of it.
If an expression is to be optimized, it must
be generated by the following syntax rules.
.DS
.TS
l l.
optimizable_expr:
	iv_expr const mult |
	const iv_expr mult |
	address iv_expr address array_instr;
mult:
	MLI ws |
	MLU ws ;
array_instr:
	LAR ws |
	SAR ws |
	AAR ws ;
const:
	LOC n ;
.TE
.DE
An 'address' is an EM instruction that loads an
address on the stack.
An instruction like LOL may be an 'address', if
the size of an address (pointer size, =ps) is
the same as the word size.
If the pointer size is twice the word size,
instructions like LDL are an 'address'.
(The addresses in the third grammar rule
denote resp. the array address and the
array descriptor address).
.DS
.TS
l l.
address:
	LAE |
	LAL |
	LOL if ps=ws |
	LOE    ,,    |
	LIL    ,,    |
	LDL if ps=2*ws |
	LDE    ,,      ;
.TE
.DE
The notion of an iv-expression was introduced earlier.
.DS
.TS
l l.
iv_expr:
	iv_expr unair_op |
	iv_expr iv_expr binary_op |
	loopconst |
	iv ;
unair_op:
	NGI ws |
	INC |
	DEC ;
binary_op:
	ADI ws |
	ADU ws |
	SBI ws |
	SBU ws ;
loopconst:
	const |
	LOL x  if x is not changed in loop ;
iv:
	LOL x  if x is an induction variable ;
.TE
.DE
An iv_expression must satisfy one additional constraint:
it must use exactly one operand that is an induction
variable.
A simple, hand written, top-down parser is used
to recognize an iv-expression.
It scans the EM code from right to left
(recall that EM is essentially postfix).
It uses semantic attributes (inherited as well as
derived) to check the additional constraint.
.PP
All information assembled during the recognition
process is put in a 'code_info' structure.
This structure contains the following information:
.IP -
the optimizable code itself
.IP -
the loop and basic block the code is part of
.IP -
the induction variable
.IP -
the iv-expression
.IP -
the sign of the induction variable in the
iv-expression
.IP -
the offset and size of the temporary local variable
.IP -
the expensive operator (MLI, LAR etc.)
.IP -
the instruction that loads the constant
(for multiplication) or the array descriptor
(for arrays).
.LP
The entire transformation process is driven
by this information.
As the EM text is represented internally
as a list, this process consists
mainly of straightforward list manipulations.
.sp 0
The initialization code must be put
immediately before the loop entry.
For this purpose a \fIheader block\fR is
created that has the loop entry block as
its only successor and that dominates the
entry block.
The CFG and all relations (SUCC,PRED, IDOM, LOOPS etc.)
are updated.
.sp 0
An EM instruction that will
replace the optimizable code
is created and put at the place of the old code.
The list representing the old optimizable code
is used to create a list for the initializing code,
as they are similar.
Only two modifications are required:
.IP -
if the expensive operator is a LAR or SAR,
it must be replaced by an AAR, as the initial value
of TMP is the \fIaddress\fR of the first
array element that is accessed.
.IP -
code must be appended to store the result of the
expression in TMP.
.LP
Finally, code to increment TMP is created and put after
the code of the single assignment to the
induction variable.
The generated code uses either an integer addition
(ADI) or an integer-to-pointer addition (ADS)
to do the increment.
.PP
SR maintains a set of all expressions that have already
been recognized in the present loop.
Such expressions are said to be \fIavailable\fR.
If an expression is recognized that is
already available,
no new temporary local variable is allocated for it,
and the code to initialize and increment the local
is not generated.