.NH 2
Definition of the intermediate code
.PP
The intermediate code of the optimizer consists
of several components:
.IP -
the object table
.IP -
the procedure table
.IP -
the EM text
.IP -
the control flow graphs
.IP -
the loop tables
.LP
.PP
These components are described in
the next sections.
The syntactic structure of every component
is described by a set of context-free syntax rules,
with the following conventions:
.DS
   x            a non-terminal symbol
   A            a terminal symbol (in capitals)
   x: a b c;    a grammar rule
   a | b        a or b
   (a)+         1 or more occurrences of a
   {a}          0 or more occurrences of a
.DE
.NH 3
The object table
.PP
EM programs declare blocks of bytes rather than (global) variables.
A typical program may declare 'HOL 7780'
to allocate space for 8 I/O buffers,
2 large arrays and 10 scalar variables.
The optimizer wants to deal with
.UL objects
like variables, buffers and arrays
and certainly not with huge numbers of bytes.
Therefore the intermediate code contains information
about which global objects are used.
This information can be obtained from an EM program
by just looking at the operands of instructions
such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
.PP
The object table consists of a list of
.UL datablock
entries.
Each such entry represents a declaration like HOL, BSS,
CON or ROM.
There are five kinds of datablock entries.
The fifth kind,
UNKNOWN, denotes a declaration in a
separately compiled file that is not made
available to the optimizer.
Each datablock entry contains the type of the block,
its size, and a description of the objects that
belong to it.
If it is a ROM,
it also contains a list of values given
as arguments to the ROM pseudo instruction,
provided that this list contains only integer numbers.
An object has an offset (within its datablock)
and a size.
The size cannot always be determined.
Both datablock and object contain a unique
identifying number
(see previous section for their use).
.DS
.UL syntax
   object_table:
      {datablock} ;
   datablock:
      D_ID           -- unique identifying number
      PSEUDO         -- one of ROM,CON,BSS,HOL,UNKNOWN
      SIZE           -- # bytes declared
      FLAGS
      {value}        -- contents of rom
      {object} ;     -- objects of the datablock
   object:
      O_ID           -- unique identifying number
      OFFSET         -- offset within the datablock
      SIZE ;         -- size of the object in bytes
   value:
      argument ;
.DE
A data block has only one flag: "external", indicating
whether the data label is externally visible.
The syntax for "argument" will be given later on
(see em_text).
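.PP
As an illustration only, the object table grammar can be mirrored by a small
in-memory model.
The Python sketch below is not the optimizer's own data structure:
the names Obj, Datablock, find_object and SIZE_UNKNOWN, as well as the
example offsets and sizes, are assumptions made for this example.
It shows how an offset within a datablock could be mapped back to the
object that contains it.
.DS
```python
from dataclasses import dataclass, field
from typing import List, Optional

SIZE_UNKNOWN = -1  # hypothetical marker: the object's size could not be determined


@dataclass
class Obj:
    o_id: int      # O_ID, unique identifying number
    offset: int    # OFFSET within the datablock
    size: int      # SIZE in bytes, or SIZE_UNKNOWN


@dataclass
class Datablock:
    d_id: int                      # D_ID, unique identifying number
    pseudo: str                    # one of ROM, CON, BSS, HOL, UNKNOWN
    size: int                      # number of bytes declared
    external: bool = False         # the single flag of a datablock
    values: List[int] = field(default_factory=list)    # ROM contents (integers only)
    objects: List[Obj] = field(default_factory=list)   # objects of the datablock


def find_object(block: Datablock, offset: int) -> Optional[Obj]:
    """Return the object whose byte range contains `offset`, if any."""
    for obj in block.objects:
        if obj.size == SIZE_UNKNOWN:
            # Without a size, only an exact match on the start offset is safe.
            if obj.offset == offset:
                return obj
        elif obj.offset <= offset < obj.offset + obj.size:
            return obj
    return None


# A HOL block of 7780 bytes with a scalar at offset 0 and a buffer at offset 4.
hol = Datablock(d_id=1, pseudo="HOL", size=7780,
                objects=[Obj(1, 0, 4), Obj(2, 4, 512)])
```
.DE
.LP
With this model, find_object(hol, 100) yields the buffer object,
while an offset that no known object covers yields None.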
.NH 3
The procedure table
.PP
The procedure table contains global information
about all procedures that are made available
to the optimizer
and that are needed by the EM program.
(Library units may not be needed, see section 3.5).
The table has one entry for
every procedure.
.DS
.UL syntax
   procedure_table:
      {procedure} ;
   procedure:
      P_ID        -- unique identifying number
      #LABELS     -- number of instruction labels
      #LOCALS     -- number of bytes for locals
      #FORMALS    -- number of bytes for formals
      FLAGS       -- flag bits
      calling     -- procedures called by this one
      change      -- info about global variables changed
      use ;       -- info about global variables used
   calling:
      {P_ID} ;    -- procedures called
   change:
      ext         -- external variables changed
      FLAGS ;
   use:
      FLAGS ;
   ext:
      {O_ID} ;    -- a set of objects
.DE
.PP
The number of bytes of formal parameters accessed by
a procedure is determined by the front ends and
passed via a message (parameter message) to the optimizer.
If the front end is not able to determine this number
(e.g. the parameter may be an array of dynamic size or
the procedure may have a variable number of arguments) the #FORMALS attribute
contains the value 'UNKNOWN_SIZE'.
.sp 0
A procedure has the following flags:
.IP -
external: true if the procedure is externally visible
.IP -
bodyseen: true if its code is available as EM text
.IP -
calunknown: true if it calls a procedure that has its bodyseen
flag not set
.IP -
environ: true if it uses or changes a (non-global) variable in
a lexically enclosing procedure
.IP -
lpi: true if it is used as the operand of an lpi instruction, so
it may be called indirectly
.LP
The change and use attributes both have one flag: "indirect",
indicating whether the procedure does a 'use indirect'
or a 'store indirect' (indirect means through a pointer).
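.PP
The relation between the calling attribute and the bodyseen and calunknown
flags can be sketched as follows.
This is a minimal Python illustration, not the optimizer's code:
the Proc class and the function set_calunknown are hypothetical, and
treating a callee that is missing from the table as "body not seen"
is an assumption of this sketch.
.DS
```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Proc:
    p_id: int
    bodyseen: bool = False      # EM text of the body is available
    calunknown: bool = False    # calls a procedure without bodyseen
    calling: Set[int] = field(default_factory=set)   # P_IDs called by this one


def set_calunknown(table: Dict[int, Proc]) -> None:
    """Set calunknown on every procedure that calls a procedure whose
    bodyseen flag is not set (assumption: unknown callees count as unseen)."""
    for proc in table.values():
        proc.calunknown = any(
            callee not in table or not table[callee].bodyseen
            for callee in proc.calling
        )


table = {
    1: Proc(1, bodyseen=True, calling={2, 3}),
    2: Proc(2, bodyseen=True),
    3: Proc(3, bodyseen=False),   # e.g. a routine whose EM text is unavailable
}
set_calunknown(table)
```
.DE
.LP
After the call, only procedure 1 has calunknown set, because it is the
only one that calls a procedure whose body has not been seen.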
.NH 3
The EM text
.PP
The EM text contains the EM instructions.
Every EM instruction has an operation code (opcode)
and 0 or 1 operands.
EM pseudo instructions can have more than
1 operand.
The opcode is just a small (8 bit) integer.
.sp
There are several kinds of operands, which we will
refer to as
.UL types.
Many EM instructions can have more than one type of operand.
The types and their encodings in Compact Assembly Language
are discussed extensively in
.[~[
keizer architecture
.], section 11.2]
Of special interest is the way numeric values
are represented.
Of prime importance is the machine independence of
the representation.
Ultimately, one could store every integer
just as a string of the characters '0' to '9'.
As doing arithmetic on strings is awkward,
Compact Assembly Language allows several alternatives.
The main idea is to look at the value of the integer.
Integers that fit in 16, 32 or 64 bits are
represented as a row of 2, 4 or 8 bytes, respectively,
preceded by an indication of how many bytes are used.
Longer integers are represented as strings;
this is only allowed within pseudo instructions, however.
This concept works very well for target machines
with reasonable word sizes.
At present, most ACK software cannot be used for word sizes
larger than 32 bits,
although the handles for using larger word sizes are
present in the design of the EM code.
In the intermediate code we essentially use the
same ideas.
We allow three representations of integers.
.IP -
integers that fit in a short are represented as a short
.IP -
integers that fit in a long but not in a short are represented
as longs
.IP -
all remaining integers are represented as strings
(only allowed in pseudos).
.LP
The terms short and long are defined in
.[~[
ritchie reference manual programming language
.], section 4]
and depend only on the source machine
(i.e. the machine on which ACK runs),
not on the target machines.
For historical reasons a long will often be called an
.UL offset.
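.PP
The choice between the three integer representations can be sketched
as follows.
The Python fragment is an illustration under stated assumptions:
the widths of 16 and 32 bits for short and long are assumed here
(the text only says they depend on the source machine), and the
names fits, represent and allow_string are hypothetical.
.DS
```python
SHORT_BITS = 16   # assumed width of a short on the source machine
LONG_BITS = 32    # assumed width of a long on the source machine


def fits(value: int, bits: int) -> bool:
    """True if `value` fits in a signed two's complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def represent(value: int, allow_string: bool = False):
    """Return a (kind, payload) pair describing how `value` would be stored."""
    if fits(value, SHORT_BITS):
        return ("SHORT", value)
    if fits(value, LONG_BITS):
        return ("OFFSET", value)   # a long, called an offset in the text
    if not allow_string:
        # Strings are only allowed within pseudo instructions.
        raise ValueError("integer too large outside a pseudo instruction")
    return ("STRING", str(value))
```
.DE
.LP
For example, represent(100) selects the short form and represent(70000)
the long form, while a value such as 2**40 is only accepted when
allow_string is true.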
.PP
Operands can also be instruction labels,
objects or procedures.
Instruction labels are denoted by a
.UL label
.UL identifier,
which can be distinguished from a normal identifier.
.sp
The operand of a pseudo instruction can be a list of
.UL arguments.
Arguments can have the same type as operands, except
for the type short, which is not used for arguments.
Furthermore, an argument can be a string or
a string representation of a signed integer, unsigned integer
or floating point number.
If the number of arguments is not fully determined by
the pseudo instruction (e.g. a ROM pseudo can have any number
of arguments), then the list is terminated by a special
argument of type CEND.
.DS
.UL syntax
   em_text:
      {line} ;
   line:
      INSTR          -- opcode
      OPTYPE         -- operand type
      operand ;
   operand:
      empty |        -- OPTYPE = NO
      SHORT |        -- OPTYPE = SHORT
      OFFSET |       -- OPTYPE = OFFSET
      LAB_ID |       -- OPTYPE = INSTRLAB
      O_ID |         -- OPTYPE = OBJECT
      P_ID |         -- OPTYPE = PROCEDURE
      {argument} ;   -- OPTYPE = LIST
   argument:
      ARGTYPE
      arg ;
   arg:
      empty |        -- ARGTYPE = CEND
      OFFSET |
      LAB_ID |
      O_ID |
      P_ID |
      string |       -- ARGTYPE = STRING
      const ;        -- ARGTYPE = ICON,UCON or FCON
   string:
      LENGTH         -- number of characters
      {CHARACTER} ;
   const:
      SIZE           -- number of bytes
      string ;       -- string representation of (un)signed
                     -- or floating point constant
.DE
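.PP
The role of the CEND argument can be illustrated with a small reader
for an argument list of undetermined length.
The Python sketch below assumes that arguments arrive as
(ARGTYPE, value) pairs; that encoding and the name read_arguments
are hypothetical and do not describe the actual file layout.
.DS
```python
from typing import Iterator, List, Tuple

Arg = Tuple[str, object]   # (ARGTYPE, value); value is None for CEND


def read_arguments(stream: Iterator[Arg]) -> List[Arg]:
    """Collect arguments up to, but not including, the CEND terminator."""
    args = []
    for argtype, value in stream:
        if argtype == "CEND":
            return args
        args.append((argtype, value))
    raise ValueError("argument list not terminated by CEND")


# Arguments of a ROM pseudo, followed by data belonging to the next line.
items = iter([("OFFSET", 1), ("OFFSET", 2), ("STRING", "abc"),
              ("CEND", None), ("OFFSET", 99)])
rom_args = read_arguments(items)
```
.DE
.LP
The reader stops at CEND, so whatever follows the terminator is left
in the stream for the next line.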
.NH 3
The control flow graphs
.PP
Each procedure can be divided
into a number of basic blocks.
A basic block is a piece of code with
no jumps in, except at the beginning,
and no jumps out, except at the end.
.PP
Every basic block has a set of
.UL successors,
which are basic blocks that can follow it immediately in
the dynamic execution sequence.
The
.UL predecessors
are the basic blocks of which this one
is a successor.
The successor and predecessor attributes
of all basic blocks of a single procedure
are said to form the
.UL control
.UL flow
.UL graph
of that procedure.
.PP
Another important attribute is the
.UL immediate
.UL dominator.
A basic block B dominates a block C if
every path in the graph from the procedure entry block
to C goes through B.
The immediate dominator of C is the closest dominator
of C (other than C itself) on any path from the entry block.
(Note that the dominator relation is transitive,
so the immediate dominator is well defined.)
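.PP
The definitions of dominator and immediate dominator can be made
concrete with a short sketch.
The Python code below uses the textbook iterative set-intersection
method, which is not necessarily the method used by the optimizer;
it assumes that every block appears as a key of the successor map and
is reachable from the entry block, and the function names are
hypothetical.
.DS
```python
from typing import Dict, List, Optional, Set


def dominators(succ: Dict[int, List[int]], entry: int) -> Dict[int, Set[int]]:
    """Iteratively compute, for every block, the set of blocks dominating it."""
    blocks = set(succ)
    pred: Dict[int, Set[int]] = {b: set() for b in blocks}
    for b, targets in succ.items():
        for t in targets:
            pred[t].add(b)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            preds = [dom[p] for p in pred[b]]
            new = {b} | (set.intersection(*preds) if preds else set())
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def immediate_dominator(dom: Dict[int, Set[int]], block: int) -> Optional[int]:
    """The strict dominator of `block` that all other strict dominators dominate."""
    strict = dom[block] - {block}
    for d in strict:
        if strict <= dom[d]:
            return d
    return None   # the entry block has no immediate dominator


# Diamond-shaped graph: 1 -> 2, 1 -> 3, 2 -> 4, 3 -> 4
cfg = {1: [2, 3], 2: [4], 3: [4], 4: []}
dom = dominators(cfg, entry=1)
```
.DE
.LP
In the diamond, block 4 can be reached through either 2 or 3, so
neither of them dominates it and its immediate dominator is block 1.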
.PP
A basic block also has an attribute containing
the identifiers of every
.UL loop
that the block belongs to (see next section for loops).
.DS
.UL syntax
   control_flow_graph:
      {basic_block} ;
   basic_block:
      B_ID        -- unique identifying number
      #INSTR      -- number of EM instructions
      succ
      pred
      idom        -- immediate dominator
      loops       -- set of loops
      FLAGS ;     -- flag bits
   succ:
      {B_ID} ;
   pred:
      {B_ID} ;
   idom:
      B_ID ;
   loops:
      {LP_ID} ;
.DE
The flag bits can have the values 'firm' and 'strong',
which are explained below.
.NH 3
The loop tables
.PP
Every procedure has an associated
.UL loop
.UL table
containing information about all the loops
in the procedure.
Loops can be detected by a close inspection of
the control flow graph.
The main idea is to look for two basic blocks,
B and C, for which the following holds:
.IP -
B is a successor of C
.IP -
B is a dominator of C
.LP
B is called the loop
.UL entry
and C is called the loop
.UL end.
Intuitively, C contains a jump backwards to
the beginning of the loop (B).
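.PP
The two conditions on B and C translate directly into a search over the
edges of the control flow graph.
The Python sketch below is illustrative only: it assumes the dominator
sets have already been computed, and the function name find_back_edges
is hypothetical.
.DS
```python
from typing import Dict, List, Set, Tuple


def find_back_edges(succ: Dict[int, List[int]],
                    dom: Dict[int, Set[int]]) -> List[Tuple[int, int]]:
    """Return (entry, end) pairs: B is a successor of C and B dominates C."""
    pairs = []
    for c in sorted(succ):
        for b in succ[c]:
            if b in dom[c]:
                pairs.append((b, c))
    return pairs


# 1 -> 2 -> 3, 3 -> 2 (jump backwards), 3 -> 4
succ = {1: [2], 2: [3], 3: [2, 4], 4: []}
# Dominator sets, assumed to have been computed beforehand.
dom = {1: {1}, 2: {1, 2}, 3: {1, 2, 3}, 4: {1, 2, 3, 4}}
loops = find_back_edges(succ, dom)
```
.DE
.LP
Here the only pair found is (2, 3): block 2 is the loop entry and
block 3 is the loop end.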
.PP
A loop L1 is said to be
.UL nested
within loop L2 if all basic blocks of L1
are also part of L2.
It is important to note that loops could
originally be written as a well-structured for or
while loop or as a messy goto loop.
Hence loops may partly overlap without one
being nested inside the other.
The
.UL nesting
.UL level
of a loop is the number of loops in
which it is nested (so it is 0 for
an outermost loop).
The details of loop detection will be discussed later.
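.PP
Given the set of basic blocks of every loop, the nesting level follows
from the definition of nesting by counting enclosing loops.
The Python sketch below assumes each loop is available as a set of
block identifiers; that representation and the name nesting_levels are
hypothetical.
Two distinct loops with identical block sets would count as nested in
each other here, which follows the definition literally.
.DS
```python
from typing import Dict, FrozenSet


def nesting_levels(loops: Dict[int, FrozenSet[int]]) -> Dict[int, int]:
    """Map each loop id to the number of other loops it is nested in."""
    levels = {}
    for lp, blocks in loops.items():
        levels[lp] = sum(
            1 for other, other_blocks in loops.items()
            if other != lp and blocks <= other_blocks
        )
    return levels


loops = {
    1: frozenset({2, 3, 4, 5}),   # outermost loop
    2: frozenset({3, 4}),         # nested within loop 1
    3: frozenset({5, 6}),         # overlaps loop 1 but is not nested in it
}
levels = nesting_levels(loops)
```
.DE
.LP
Loop 3 shares block 5 with loop 1 but also contains block 6, so it
partly overlaps loop 1 and keeps nesting level 0.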
.PP
It is often desirable to know whether a
basic block gets executed during every iteration
of a loop.
This leads to the following definitions:
.IP -
A basic block B of a loop L is said to be a \fIfirm\fR block
of L if B is executed on all successive iterations of L,
with the only possible exception of the last iteration.
.IP -
A basic block B of a loop L is said to be a \fIstrong\fR block
of L if B is executed on all successive iterations of L.
.LP
Note that a strong block is also a firm block.
If a block is part of a conditional statement, it is neither
strong nor firm, as it may be skipped during some iterations
(see Fig. 3.2).
.DS
loop
   if cond1 then
      ...       -- this code will not
                -- result in a firm or strong block
   end if;
   ...          -- strong (always executed)
   exit when cond2;
   ...          -- firm (not executed on
                -- last iteration).
end loop;

Fig. 3.2 Example of firm and strong blocks
.DE
.DS
.UL syntax
   looptable:
      {loop} ;
   loop:
      LP_ID       -- unique identifying number
      LEVEL       -- loop nesting level
      entry       -- loop entry block
      end ;
   entry:
      B_ID ;
   end:
      B_ID ;
.DE