ack/doc/ego/ic/ic3

.NH 2
Definition of the intermediate code
.PP
The intermediate code of the optimizer consists
of several components:
.IP -
the object table
.IP -
the procedure table
.IP -
the em code
.IP -
the control flow graphs
.IP -
the loop table
.LP -
.PP
These components are described in
the next sections.
The syntactic structure of every component
is described by a set of context free syntax rules,
with the following conventions:
.DS
.TS
l l.
x	a non-terminal symbol
A	a terminal symbol (in capitals)
x: a b c;	a grammar rule
a | b	a or b
(a)+	1 or more occurrences of a
{a}	0 or more occurrences of a
.TE
.DE
.NH 3
The object table
.PP
EM programs declare blocks of bytes rather than (global) variables.
A typical program may declare 'HOL 7780'
to allocate space for 8 I/O buffers,
2 large arrays and 10 scalar variables.
The optimizer wants to deal with
.UL objects
like variables, buffers and arrays
and certainly not with huge numbers of bytes.
Therefore the intermediate code contains information
about which global objects are used.
This information can be obtained from an EM program
by just looking at the operands of instruction
such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
.PP
The object table consists of a list of
.UL datablock
entries.
Each such entry represents a declaration like HOL, BSS,
CON or ROM.
There are five kinds of datablock entries.
The fifth kind,
UNKNOWN, denotes a declaration in a
separately compiled file that is not made
available to the optimizer.
Each datablock entry contains the type of the block,
its size, and a description of the objects that
belong to it.
If it is a rom,
it also contains a list of values given
as arguments to the rom instruction,
provided that this list contains only integer numbers.
An object has an offset (within its datablock)
and a size.
The size need not always be determinable.
Both datablock and object contain a unique
identifying number
(see previous section for their use).
.DS
.UL syntax
.TS
lw(1i) l l.
object_table:
	{datablock} ;
datablock:
	D_ID	-- unique identifying number
	PSEUDO	-- one of ROM,CON,BSS,HOL,UNKNOWN
	SIZE	-- # bytes declared
	FLAGS
	{value}	-- contents of rom
	{object} ;	-- objects of the datablock
object:
	O_ID	-- unique identifying number
	OFFSET	-- offset within the datablock
	SIZE ;	-- size of the object in bytes
value:
	argument ;
.TE
.DE
A data block has only one flag: "external", indicating
whether the data label is externally visible.
The syntax for "argument" will be given later on
(see em_text).
.NH 3
The procedure table
.PP
The procedure table contains global information
about all procedures that are made available
to the optimizer
and that are needed by the EM program.
(Library units may not be needed, see section 3.5).
The table has one entry for
every procedure.
.DS
.UL syntax
.TS
lw(1i) l l.
procedure_table:
	{procedure}
procedure:
	P_ID	-- unique identifying number
	#LABELS	-- number of instruction labels
	#LOCALS	-- number of bytes for locals
	#FORMALS	-- number of bytes for formals
	FLAGS	-- flag bits
	calling	-- procedures called by this one
	change	-- info about global variables changed
	use ;	-- info about global variables used
calling:
	{P_ID} ;	-- procedures called
change:
	ext	-- external variables changed
	FLAGS ;
use:
	FLAGS ;
ext:
	{O_ID} ;	-- a set of objects
.TE
.DE
.PP
The number of bytes of formal parameters accessed by
a procedure is determined by the front ends and
passed via a message (parameter message) to the optimizer.
If the front end is not able to determine this number
(e.g. the parameter may be an array of dynamic size or
the procedure may have a variable number of arguments) the attribute
contains the value 'UNKNOWN_SIZE'.
.sp 0
A procedure has the following flags:
.IP -
external: true if the proc. is externally visible
.IP -
bodyseen: true if its code is available as EM text
.IP -
calunknown: true if it calls a procedure that has its bodyseen
flag not set
.IP -
environ: true if it uses or changes a (non-global) variable in
a lexically enclosing procedure
.IP -
lpi: true if is used as operand of an lpi instruction, so
it may be called indirect
.LP
The change and use attributes both have one flag: "indirect",
indicating whether the procedure does a 'use indirect'
or a 'store indirect' (indirect means through a pointer).
.NH 3
The EM text
.PP
The EM text contains the EM instructions.
Every EM instruction has an operation code (opcode)
and 0 or 1 operands.
EM pseudo instructions can have more than
1 operand.
The opcode is just a small (8 bit) integer.
.sp
There are several kinds of operands, which we will
refer to as
.UL types.
Many EM instructions can have more than one type of operand.
The types and their encodings in Compact Assembly Language
are discussed extensively in.
.[~[
keizer architecture
.], section 11.2]
Of special interest is the way numeric values
are represented.
Of prime importance is the machine independency of
the representation.
Ultimately, one could store every integer
just as a string of the characters '0' to '9'.
As doing arithmetic on strings is awkward,
Compact Assembly Language allows several alternatives.
The main idea is to look at the value of the integer.
Integers that fit in 16, 32 or 64 bits are
represented as a row of resp. 2, 4 and 8 bytes,
preceded by an indication of how many bytes are used.
Longer integers are represented as strings;
this is only allowed within pseudo instructions, however.
This concept works very well for target machines
with reasonable word sizes.
At present, most ACK software cannot be used for word sizes
higher than 32 bits,
although the handles for using larger word sizes are
present in the design of the EM code.
In the intermediate code we essentially use the
same ideas.
We allow three representations of integers.
.IP -
integers that fit in a short are represented as a short
.IP -
integers that fit in a long but not in a short are represented
as longs
.IP -
all remaining integers are represented as strings
(only allowed in pseudos).
.LP
The terms short and long are defined in
.[~[
ritchie reference manual programming language
.], section 4]
and depend only on the source machine
(i.e. the machine on which ACK runs),
not on the target machines.
For historical reasons a long will often be called an
.UL offset.
.PP
Operands can also be instruction labels,
objects or procedures.
Instruction labels are denoted by a
.UL label
.UL identifier,
which can be distinguished from a normal identifier.
.sp
The operand of a pseudo instruction can be a list of
.UL arguments.
Arguments can have the same type as operands, except
for the type short, which is not used for arguments.
Furthermore, an argument can be a string or
a string representation of a signed integer, unsigned integer
or floating point number.
If the number of arguments is not fully determined by
the pseudo instruction (e.g. a ROM pseudo can have any number
of arguments), then the list is terminated by a special
argument of type CEND.
.DS
.UL syntax
.TS
lw(1i) l l.
em_text:
	{line} ;
line:
	INSTR	-- opcode
	OPTYPE	-- operand type
	operand ;
operand:
	empty |	-- OPTYPE = NO
	SHORT |	-- OPTYPE = SHORT
	OFFSET |	-- OPTYPE = OFFSET
	LAB_ID |	-- OPTYPE = INSTRLAB
	O_ID |	-- OPTYPE = OBJECT
	P_ID |	-- OPTYPE = PROCEDURE
	{argument} ;	-- OPTYPE = LIST
argument:
	ARGTYPE
	arg ;
arg:
	empty |	-- ARGTYPE = CEND
	OFFSET |
	LAB_ID |
	O_ID |
	P_ID |
	string |	-- ARGTYPE = STRING
	const ;	-- ARGTYPE = ICON,UCON or FCON
string:
	LENGTH	-- number of characters
	{CHARACTER} ;
const:
	SIZE	-- number of bytes
	string ;	-- string representation of (un)signed
		-- or floating point constant
.TE
.DE
.NH 3
The control flow graphs
.PP
Each procedure can be divided
into a number of basic blocks.
A basic block is a piece of code with
no jumps in, except at the beginning,
and no jumps out, except at the end.
.PP
Every basic block has a set of
.UL successors,
which are basic blocks that can follow it immediately in
the dynamic execution sequence.
The
.UL predecessors
are the basic blocks of which this one
is a successor.
The successor and predecessor attributes
of all basic blocks of a single procedure
are said to form the
.UL control
.UL flow
.UL graph
of that procedure.
.PP
Another important attribute is the
.UL immediate
.UL dominator.
A basic block B dominates a block C if
every path in the graph from the procedure entry block
to C goes through B.
The immediate dominator of C is the closest dominator
of C on any path from the entry block.
(Note that the dominator relation is transitive,
so the immediate dominator is well defined.)
.PP
A basic block also has an attribute containing
the identifiers of every
.UL loop
that the block belongs to (see next section for loops).
.DS
.UL syntax
.TS
lw(1i) l l.
control_flow_graph:
	{basic_block} ;
basic_block:
	B_ID	-- unique identifying number
	#INSTR	-- number of EM instructions
	succ
	pred
	idom	-- immediate dominator
	loops	-- set of loops
	FLAGS ;	-- flag bits
succ:
	{B_ID} ;
pred:
	{B_ID} ;
idom:
	B_ID ;
loops:
	{LP_ID} ;
.TE
.DE
The flag bits can have the values 'firm' and 'strong',
which are explained below.
.NH 3
The loop tables
.PP
Every procedure has an associated
.UL loop
.UL table
containing information about all the loops
in the procedure.
Loops can be detected by a close inspection of
the control flow graph.
The main idea is to look for two basic blocks,
B and C, for which the following holds:
.IP -
B is a successor of C
.IP -
B is a dominator of C
.LP
B is called the loop
.UL entry
and C is called the loop
.UL end.
Intuitively, C contains a jump backwards to
the beginning of the loop (B).
.PP
A loop L1 is said to be
.UL nested
within loop L2 if all basic blocks of L1
are also part of L2.
It is important to note that loops could
originally be written as a well structured for -or
while loop or as a messy goto loop.
Hence loops may partly overlap without one
being nested inside the other.
The
.UL nesting
.UL level
of a loop is the number of loops in
which it is nested (so it is 0 for
an outermost loop).
The details of loop detection will be discussed later.
.PP
It is often desirable to know whether a
basic block gets executed during every iteration
of a loop.
This leads to the following definitions:
.IP -
A basic block B of a loop L is said to be a \fIfirm\fR block
of L if B is executed on all successive iterations of L,
with the only possible exception of the last iteration.
.IP -
A basic block B of a loop L is said to be a \fIstrong\fR block
of L if B is executed on all successive iterations of L.
.LP
Note that a strong block is also a firm block.
If a block is part of a conditional statement, it is neither
strong nor firm, as it may be skipped during some iterations
(see Fig. 3.2).
.DS
loop
       if cond1 then
	      ... \kx-- this code will not
		  \h'|\nxu'-- result in a firm or strong block
       end if;
       ...  -- strong (always executed)
       exit when cond2;
       ...  \kx-- firm (not executed on last iteration).
end loop;

Fig. 3.2 Example of firm and strong block
.DE
.DS
.UL syntax
.TS
lw(1i) l l.
looptable:
	{loop} ;
loop:
	LP_ID	-- unique identifying number
	LEVEL	-- loop nesting level
	entry	-- loop entry block
	end ;
entry:
	B_ID ;
end:
	B_ID ;
.TE
.DE