431 lines
		
	
	
	
		
			11 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
			
		
		
	
	
			431 lines
		
	
	
	
		
			11 KiB
		
	
	
	
		
			Text
		
	
	
	
	
	
.NH 2
 | 
						|
Definition of the intermediate code
 | 
						|
.PP
 | 
						|
The intermediate code of the optimizer consists
 | 
						|
of several components:
 | 
						|
.IP -
 | 
						|
the object table
 | 
						|
.IP -
 | 
						|
the procedure table
 | 
						|
.IP -
 | 
						|
the em code
 | 
						|
.IP -
 | 
						|
the control flow graphs
 | 
						|
.IP -
 | 
						|
the loop table
 | 
						|
.LP -
 | 
						|
.PP
 | 
						|
These components are described in
 | 
						|
the next sections.
 | 
						|
The syntactic structure of every component
 | 
						|
is described by a set of context free syntax rules,
 | 
						|
with the following conventions:
 | 
						|
.DS
 | 
						|
.TS
 | 
						|
l l.
 | 
						|
x	a non-terminal symbol
 | 
						|
A	a terminal symbol (in capitals)
 | 
						|
x: a b c;	a grammar rule
 | 
						|
a | b	a or b
 | 
						|
(a)+	1 or more occurrences of a
 | 
						|
{a}	0 or more occurrences of a
 | 
						|
.TE
 | 
						|
.DE
 | 
						|
.NH 3
 | 
						|
The object table
 | 
						|
.PP
 | 
						|
EM programs declare blocks of bytes rather than (global) variables.
 | 
						|
A typical program may declare 'HOL 7780'
 | 
						|
to allocate space for 8 I/O buffers,
 | 
						|
2 large arrays and 10 scalar variables.
 | 
						|
The optimizer wants to deal with
 | 
						|
.UL objects
 | 
						|
like variables, buffers and arrays
 | 
						|
and certainly not with huge numbers of bytes.
 | 
						|
Therefore the intermediate code contains information
 | 
						|
about which global objects are used.
 | 
						|
This information can be obtained from an EM program
 | 
						|
by just looking at the operands of instruction
 | 
						|
such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
 | 
						|
.PP
 | 
						|
The object table consists of a list of
 | 
						|
.UL datablock
 | 
						|
entries.
 | 
						|
Each such entry represents a declaration like HOL, BSS,
 | 
						|
CON or ROM.
 | 
						|
There are five kinds of datablock entries.
 | 
						|
The fifth kind,
 | 
						|
UNKNOWN, denotes a declaration in a
 | 
						|
separately compiled file that is not made
 | 
						|
available to the optimizer.
 | 
						|
Each datablock entry contains the type of the block,
 | 
						|
its size, and a description of the objects that
 | 
						|
belong to it.
 | 
						|
If it is a rom,
 | 
						|
it also contains a list of values given
 | 
						|
as arguments to the rom instruction,
 | 
						|
provided that this list contains only integer numbers.
 | 
						|
An object has an offset (within its datablock)
 | 
						|
and a size.
 | 
						|
The size need not always be determinable.
 | 
						|
Both datablock and object contain a unique
 | 
						|
identifying number
 | 
						|
(see previous section for their use).
 | 
						|
.DS
 | 
						|
.UL syntax
 | 
						|
.TS
 | 
						|
lw(1i) l l.
 | 
						|
object_table:
 | 
						|
	{datablock} ;
 | 
						|
datablock:
 | 
						|
	D_ID	-- unique identifying number
 | 
						|
	PSEUDO	-- one of ROM,CON,BSS,HOL,UNKNOWN
 | 
						|
	SIZE	-- # bytes declared
 | 
						|
	FLAGS
 | 
						|
	{value}	-- contents of rom
 | 
						|
	{object} ;	-- objects of the datablock
 | 
						|
object:
 | 
						|
	O_ID	-- unique identifying number
 | 
						|
	OFFSET	-- offset within the datablock
 | 
						|
	SIZE ;	-- size of the object in bytes
 | 
						|
value:
 | 
						|
	argument ;
 | 
						|
.TE
 | 
						|
.DE
 | 
						|
A data block has only one flag: "external", indicating
 | 
						|
whether the data label is externally visible.
 | 
						|
The syntax for "argument" will be given later on
 | 
						|
(see em_text).
 | 
						|
.NH 3
 | 
						|
The procedure table
 | 
						|
.PP
 | 
						|
The procedure table contains global information
 | 
						|
about all procedures that are made available
 | 
						|
to the optimizer
 | 
						|
and that are needed by the EM program.
 | 
						|
(Library units may not be needed, see section 3.5).
 | 
						|
The table has one entry for
 | 
						|
every procedure.
 | 
						|
.DS
 | 
						|
.UL syntax
 | 
						|
.TS
 | 
						|
lw(1i) l l.
 | 
						|
procedure_table:
 | 
						|
	{procedure}
 | 
						|
procedure:
 | 
						|
	P_ID	-- unique identifying number
 | 
						|
	#LABELS	-- number of instruction labels
 | 
						|
	#LOCALS	-- number of bytes for locals 
 | 
						|
	#FORMALS	-- number of bytes for formals
 | 
						|
	FLAGS	-- flag bits
 | 
						|
	calling	-- procedures called by this one
 | 
						|
	change	-- info about global variables changed
 | 
						|
	use ;	-- info about global variables used
 | 
						|
calling:
 | 
						|
	{P_ID} ;	-- procedures called
 | 
						|
change:
 | 
						|
	ext	-- external variables changed
 | 
						|
	FLAGS ;
 | 
						|
use:
 | 
						|
	FLAGS ;
 | 
						|
ext:
 | 
						|
	{O_ID} ;	-- a set of objects
 | 
						|
.TE
 | 
						|
.DE
 | 
						|
.PP
 | 
						|
The number of bytes of formal parameters accessed by
 | 
						|
a procedure is determined by the front ends and
 | 
						|
passed via a message (parameter message) to the optimizer.
 | 
						|
If the front end is not able to determine this number
 | 
						|
(e.g. the parameter may be an array of dynamic size or
 | 
						|
the procedure may have a variable number of arguments) the attribute
 | 
						|
contains the value 'UNKNOWN_SIZE'.
 | 
						|
.sp 0
 | 
						|
A procedure has the following flags:
 | 
						|
.IP -
 | 
						|
external: true if the proc. is externally visible
 | 
						|
.IP -
 | 
						|
bodyseen: true if its code is available as EM text
 | 
						|
.IP -
 | 
						|
calunknown: true if it calls a procedure that has its bodyseen
 | 
						|
flag not set
 | 
						|
.IP -
 | 
						|
environ: true if it uses or changes a (non-global) variable in
 | 
						|
a lexically enclosing procedure
 | 
						|
.IP -
 | 
						|
lpi: true if is used as operand of an lpi instruction, so
 | 
						|
it may be called indirect
 | 
						|
.LP
 | 
						|
The change and use attributes both have one flag: "indirect",
 | 
						|
indicating whether the procedure does a 'use indirect'
 | 
						|
or a 'store indirect' (indirect means through a pointer).
 | 
						|
.NH 3
 | 
						|
The EM text
 | 
						|
.PP
 | 
						|
The EM text contains the EM instructions.
 | 
						|
Every EM instruction has an operation code (opcode)
 | 
						|
and 0 or 1 operands.
 | 
						|
EM pseudo instructions can have more than
 | 
						|
1 operand.
 | 
						|
The opcode is just a small (8 bit) integer.
 | 
						|
.sp
 | 
						|
There are several kinds of operands, which we will
 | 
						|
refer to as
 | 
						|
.UL types.
 | 
						|
Many EM instructions can have more than one type of operand.
 | 
						|
The types and their encodings in Compact Assembly Language
 | 
						|
are discussed extensively in.
 | 
						|
.[~[
 | 
						|
keizer architecture 
 | 
						|
.], section 11.2]
 | 
						|
Of special interest is the way numeric values
 | 
						|
are represented.
 | 
						|
Of prime importance is the machine independency of
 | 
						|
the representation.
 | 
						|
Ultimately, one could store every integer
 | 
						|
just as a string of the characters '0' to '9'.
 | 
						|
As doing arithmetic on strings is awkward,
 | 
						|
Compact Assembly Language allows several alternatives.
 | 
						|
The main idea is to look at the value of the integer.
 | 
						|
Integers that fit in 16, 32 or 64 bits are
 | 
						|
represented as a row of resp. 2, 4 and 8 bytes,
 | 
						|
preceded by an indication of how many bytes are used.
 | 
						|
Longer integers are represented as strings;
 | 
						|
this is only allowed within pseudo instructions, however.
 | 
						|
This concept works very well for target machines
 | 
						|
with reasonable word sizes.
 | 
						|
At present, most ACK software cannot be used for word sizes
 | 
						|
higher than 32 bits,
 | 
						|
although the handles for using larger word sizes are
 | 
						|
present in the design of the EM code.
 | 
						|
In the intermediate code we essentially use the
 | 
						|
same ideas.
 | 
						|
We allow three representations of integers.
 | 
						|
.IP -
 | 
						|
integers that fit in a short are represented as a short
 | 
						|
.IP -
 | 
						|
integers that fit in a long but not in a short are represented
 | 
						|
as longs
 | 
						|
.IP -
 | 
						|
all remaining integers are represented as strings
 | 
						|
(only allowed in pseudos).
 | 
						|
.LP
 | 
						|
The terms short and long are defined in
 | 
						|
.[~[
 | 
						|
ritchie reference manual programming language
 | 
						|
.], section 4]
 | 
						|
and depend only on the source machine
 | 
						|
(i.e. the machine on which ACK runs),
 | 
						|
not on the target machines.
 | 
						|
For historical reasons a long will often be called an
 | 
						|
.UL offset.
 | 
						|
.PP
 | 
						|
Operands can also be instruction labels,
 | 
						|
objects or procedures.
 | 
						|
Instruction labels are denoted by a
 | 
						|
.UL label
 | 
						|
.UL identifier,
 | 
						|
which can be distinguished from a normal identifier.
 | 
						|
.sp
 | 
						|
The operand of a pseudo instruction can be a list of
 | 
						|
.UL arguments.
 | 
						|
Arguments can have the same type as operands, except
 | 
						|
for the type short, which is not used for arguments.
 | 
						|
Furthermore, an argument can be a string or
 | 
						|
a string representation of a signed integer, unsigned integer
 | 
						|
or floating point number.
 | 
						|
If the number of arguments is not fully determined by
 | 
						|
the pseudo instruction (e.g. a ROM pseudo can have any number
 | 
						|
of arguments), then the list is terminated by a special
 | 
						|
argument of type CEND.
 | 
						|
.DS
 | 
						|
.UL syntax
 | 
						|
.TS
 | 
						|
lw(1i) l l.
 | 
						|
em_text:
 | 
						|
	{line} ;
 | 
						|
line:
 | 
						|
	INSTR	-- opcode
 | 
						|
	OPTYPE	-- operand type
 | 
						|
	operand ;
 | 
						|
operand:
 | 
						|
	empty |	-- OPTYPE = NO
 | 
						|
	SHORT |	-- OPTYPE = SHORT
 | 
						|
	OFFSET |	-- OPTYPE = OFFSET
 | 
						|
	LAB_ID |	-- OPTYPE = INSTRLAB
 | 
						|
	O_ID |	-- OPTYPE = OBJECT
 | 
						|
	P_ID |	-- OPTYPE = PROCEDURE
 | 
						|
	{argument} ;	-- OPTYPE = LIST
 | 
						|
argument:
 | 
						|
	ARGTYPE
 | 
						|
	arg ;
 | 
						|
arg:
 | 
						|
	empty |	-- ARGTYPE = CEND
 | 
						|
	OFFSET |
 | 
						|
	LAB_ID |
 | 
						|
	O_ID |
 | 
						|
	P_ID |
 | 
						|
	string |	-- ARGTYPE = STRING
 | 
						|
	const ;	-- ARGTYPE = ICON,UCON or FCON
 | 
						|
string:
 | 
						|
	LENGTH	-- number of characters
 | 
						|
	{CHARACTER} ;
 | 
						|
const:
 | 
						|
	SIZE	-- number of bytes
 | 
						|
	string ;	-- string representation of (un)signed
 | 
						|
		-- or floating point constant
 | 
						|
.TE
 | 
						|
.DE
 | 
						|
.NH 3
 | 
						|
The control flow graphs
 | 
						|
.PP
 | 
						|
Each procedure can be divided
 | 
						|
into a number of basic blocks.
 | 
						|
A basic block is a piece of code with
 | 
						|
no jumps in, except at the beginning,
 | 
						|
and no jumps out, except at the end.
 | 
						|
.PP
 | 
						|
Every basic block has a set of
 | 
						|
.UL successors,
 | 
						|
which are basic blocks that can follow it immediately in
 | 
						|
the dynamic execution sequence.
 | 
						|
The
 | 
						|
.UL predecessors
 | 
						|
are the basic blocks of which this one
 | 
						|
is a successor.
 | 
						|
The successor and predecessor attributes
 | 
						|
of all basic blocks of a single procedure
 | 
						|
are said to form the
 | 
						|
.UL control
 | 
						|
.UL flow
 | 
						|
.UL graph
 | 
						|
of that procedure.
 | 
						|
.PP
 | 
						|
Another important attribute is the
 | 
						|
.UL immediate
 | 
						|
.UL dominator.
 | 
						|
A basic block B dominates a block C if
 | 
						|
every path in the graph from the procedure entry block
 | 
						|
to C goes through B.
 | 
						|
The immediate dominator of C is the closest dominator
 | 
						|
of C on any path from the entry block.
 | 
						|
(Note that the dominator relation is transitive,
 | 
						|
so the immediate dominator is well defined.)
 | 
						|
.PP
 | 
						|
A basic block also has an attribute containing
 | 
						|
the identifiers of every
 | 
						|
.UL loop
 | 
						|
that the block belongs to (see next section for loops).
 | 
						|
.DS
 | 
						|
.UL syntax
 | 
						|
.TS
 | 
						|
lw(1i) l l.
 | 
						|
control_flow_graph:
 | 
						|
	{basic_block} ;
 | 
						|
basic_block:
 | 
						|
	B_ID	-- unique identifying number
 | 
						|
	#INSTR	-- number of EM instructions
 | 
						|
	succ
 | 
						|
	pred
 | 
						|
	idom	-- immediate dominator
 | 
						|
	loops	-- set of loops
 | 
						|
	FLAGS ;	-- flag bits
 | 
						|
succ:
 | 
						|
	{B_ID} ;
 | 
						|
pred:
 | 
						|
	{B_ID} ;
 | 
						|
idom:
 | 
						|
	B_ID ;
 | 
						|
loops:
 | 
						|
	{LP_ID} ;
 | 
						|
.TE
 | 
						|
.DE
 | 
						|
The flag bits can have the values 'firm' and 'strong',
 | 
						|
which are explained below.
 | 
						|
.NH 3
 | 
						|
The loop tables
 | 
						|
.PP
 | 
						|
Every procedure has an associated
 | 
						|
.UL loop
 | 
						|
.UL table
 | 
						|
containing information about all the loops
 | 
						|
in the procedure.
 | 
						|
Loops can be detected by a close inspection of
 | 
						|
the control flow graph.
 | 
						|
The main idea is to look for two basic blocks,
 | 
						|
B and C, for which the following holds:
 | 
						|
.IP -
 | 
						|
B is a successor of C
 | 
						|
.IP -
 | 
						|
B is a dominator of C
 | 
						|
.LP
 | 
						|
B is called the loop
 | 
						|
.UL entry
 | 
						|
and C is called the loop
 | 
						|
.UL end.
 | 
						|
Intuitively, C contains a jump backwards to
 | 
						|
the beginning of the loop (B).
 | 
						|
.PP
 | 
						|
A loop L1 is said to be
 | 
						|
.UL nested
 | 
						|
within loop L2 if all basic blocks of L1
 | 
						|
are also part of L2.
 | 
						|
It is important to note that loops could
 | 
						|
originally be written as a well structured for -or
 | 
						|
while loop or as a messy goto loop.
 | 
						|
Hence loops may partly overlap without one
 | 
						|
being nested inside the other.
 | 
						|
The
 | 
						|
.UL nesting
 | 
						|
.UL level
 | 
						|
of a loop is the number of loops in
 | 
						|
which it is nested (so it is 0 for
 | 
						|
an outermost loop).
 | 
						|
The details of loop detection will be discussed later.
 | 
						|
.PP
 | 
						|
It is often desirable to know whether a
 | 
						|
basic block gets executed during every iteration
 | 
						|
of a loop.
 | 
						|
This leads to the following definitions:
 | 
						|
.IP -
 | 
						|
A basic block B of a loop L is said to be a \fIfirm\fR block
 | 
						|
of L if B is executed on all successive iterations of L,
 | 
						|
with the only possible exception of the last iteration.
 | 
						|
.IP -
 | 
						|
A basic block B of a loop L is said to be a \fIstrong\fR block
 | 
						|
of L if B is executed on all successive iterations of L.
 | 
						|
.LP
 | 
						|
Note that a strong block is also a firm block.
 | 
						|
If a block is part of a conditional statement, it is neither
 | 
						|
strong nor firm, as it may be skipped during some iterations
 | 
						|
(see Fig. 3.2).
 | 
						|
.DS
 | 
						|
loop
 | 
						|
       if cond1 then
 | 
						|
	      ... \kx-- this code will not
 | 
						|
		  \h'|\nxu'-- result in a firm or strong block
 | 
						|
       end if;
 | 
						|
       ...  -- strong (always executed)
 | 
						|
       exit when cond2;
 | 
						|
       ...  \kx-- firm (not executed on last iteration).
 | 
						|
end loop;
 | 
						|
 | 
						|
Fig. 3.2 Example of firm and strong block
 | 
						|
.DE
 | 
						|
.DS
 | 
						|
.UL syntax
 | 
						|
.TS
 | 
						|
lw(1i) l l.
 | 
						|
looptable:
 | 
						|
	{loop} ;
 | 
						|
loop:
 | 
						|
	LP_ID	-- unique identifying number
 | 
						|
	LEVEL	-- loop nesting level
 | 
						|
	entry	-- loop entry block
 | 
						|
	end ;
 | 
						|
entry:
 | 
						|
	B_ID ;
 | 
						|
end:
 | 
						|
	B_ID ;
 | 
						|
.TE
 | 
						|
.DE
 |