.NH 2
Definition of the intermediate code
.PP
The intermediate code of the optimizer consists
of several components:
.IP -
the object table
.IP -
the procedure table
.IP -
the EM text
.IP -
the control flow graphs
.IP -
the loop tables
.PP
These components are described in the next sections.
The syntactic structure of every component
is described by a set of context-free syntax rules,
with the following conventions:
.DS
x            a non-terminal symbol
A            a terminal symbol (in capitals)
x: a b c;    a grammar rule
a | b        a or b
(a)+         1 or more occurrences of a
{a}          0 or more occurrences of a
.DE
.NH 3
The object table
.PP
EM programs declare blocks of bytes rather than (global) variables.
A typical program may declare 'HOL 7780' to allocate space for
8 I/O buffers, 2 large arrays and 10 scalar variables.
The optimizer wants to deal with
.UL objects
like variables, buffers and arrays
and certainly not with huge numbers of bytes.
Therefore the intermediate code contains information
about which global objects are used.
This information can be obtained from an EM program
by just looking at the operands of instructions
such as LOE, LAE, LDE, STE, SDE, INE, DEE and ZRE.
.PP
The object table consists of a list of
.UL datablock
entries.
Each such entry represents a declaration like HOL, BSS, CON or ROM.
There are five kinds of datablock entries.
The fifth kind, UNKNOWN, denotes a declaration in a
separately compiled file that is not made available to the optimizer.
Each datablock entry contains the type of the block,
its size, and a description of the objects that belong to it.
If it is a rom, it also contains a list of values given
as arguments to the rom instruction, provided that this list
contains only integer numbers.
An object has an offset (within its datablock) and a size.
The size need not always be determinable.
Both datablock and object contain a unique identifying number
(see previous section for their use).
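.PP
To make the description above concrete, the following is a minimal sketch in C
of a datablock entry and of how objects might be recorded
while scanning the operands of instructions such as LOE or STE.
All names (struct datablock, note_object, UNKNOWN_OBJ_SIZE, MAX_OBJECTS)
are hypothetical and chosen for illustration only;
they are not taken from the optimizer's sources.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OBJECTS      32     /* hypothetical fixed capacity for this sketch */
#define UNKNOWN_OBJ_SIZE (-1L)  /* hypothetical marker: size not determinable */

enum pseudo { PS_ROM, PS_CON, PS_BSS, PS_HOL, PS_UNKNOWN };

struct object {
    int  o_id;      /* unique identifying number */
    long offset;    /* offset within the datablock */
    long size;      /* size in bytes, or UNKNOWN_OBJ_SIZE */
};

struct datablock {
    int           d_id;     /* unique identifying number */
    enum pseudo   pseudo;   /* kind of declaration */
    long          size;     /* number of bytes declared */
    int           nobjects; /* number of objects recorded so far */
    struct object objects[MAX_OBJECTS];
};

static int next_o_id = 1;   /* source of unique object numbers (assumption) */

/*
 * Record that an instruction operand refers to `offset` within block `d`.
 * Returns the existing object at that offset if there is one, a newly
 * created object otherwise, or NULL if the table of this sketch is full.
 */
static struct object *note_object(struct datablock *d, long offset, long size)
{
    assert(offset >= 0 && offset < d->size);

    for (int i = 0; i < d->nobjects; i++)
        if (d->objects[i].offset == offset)
            return &d->objects[i];

    if (d->nobjects == MAX_OBJECTS)
        return NULL;

    struct object *o = &d->objects[d->nobjects++];
    o->o_id = next_o_id++;
    o->offset = offset;
    o->size = size;
    return o;
}
```

In this sketch a block declared as 'HOL 7780' starts out with no objects;
each distinct operand offset seen in the EM text adds one object to it,
so the optimizer ends up reasoning about a handful of objects
rather than about 7780 individual bytes.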
.DS
.UL syntax
  object_table:
        {datablock} ;
  datablock:
        D_ID            -- unique identifying number
        PSEUDO          -- one of ROM,CON,BSS,HOL,UNKNOWN
        SIZE            -- # bytes declared
        FLAGS
        {value}         -- contents of rom
        {object} ;      -- objects of the datablock
  object:
        O_ID            -- unique identifying number
        OFFSET          -- offset within the datablock
        SIZE ;          -- size of the object in bytes
  value:
        argument ;
.DE
A data block has only one flag: "external", indicating
whether the data label is externally visible.
The syntax for "argument" will be given later on
(see em_text).
.NH 3
The procedure table
.PP
The procedure table contains global information
about all procedures that are made available
to the optimizer
and that are needed by the EM program.
(Library units may not be needed, see section 3.5).
The table has one entry for
every procedure.
.DS
.UL syntax
  procedure_table:
        {procedure} ;
  procedure:
        P_ID            -- unique identifying number
        #LABELS         -- number of instruction labels
        #LOCALS         -- number of bytes for locals
        #FORMALS        -- number of bytes for formals
        FLAGS           -- flag bits
        calling         -- procedures called by this one
        change          -- info about global variables changed
        use ;           -- info about global variables used
  calling:
        {P_ID} ;        -- procedures called
  change:
        ext             -- external variables changed
        FLAGS ;
  use:
        FLAGS ;
  ext:
        {O_ID} ;        -- a set of objects
.DE
.PP
The number of bytes of formal parameters accessed by
a procedure is determined by the front ends and
passed via a message (parameter message) to the optimizer.
If the front end is not able to determine this number
(e.g. the parameter may be an array of dynamic size or
the procedure may have a variable number of arguments),
the attribute contains the value 'UNKNOWN_SIZE'.
.sp 0
A procedure has the following flags:
.IP -
external: true if the procedure
is externally visible
.IP -
bodyseen: true if its code is available as EM text
.IP -
calunknown: true if it calls a procedure whose bodyseen flag is not set
.IP -
environ: true if it uses or changes a (non-global)
variable in a lexically enclosing procedure
.IP -
lpi: true if it is used as the operand of an lpi instruction,
so it may be called indirectly
.LP
The change and use attributes both have one flag: "indirect",
indicating whether the procedure does a 'use indirect'
or a 'store indirect' (indirect means through a pointer).
.NH 3
The EM text
.PP
The EM text contains the EM instructions.
Every EM instruction has an operation code (opcode)
and 0 or 1 operands.
EM pseudo instructions can have more than
1 operand.
The opcode is just a small (8 bit) integer.
.sp
There are several kinds of operands, which we will
refer to as
.UL types.
Many EM instructions can have more than one type of operand.
The types and their encodings in Compact Assembly Language
are discussed extensively in.
.[~[
keizer architecture
.], section 11.2]
Of special interest is the way numeric values
are represented.
Of prime importance is the machine independence of
the representation.
In the extreme, one could store every integer
just as a string of the characters '0' to '9'.
As doing arithmetic on strings is awkward,
Compact Assembly Language allows several alternatives.
The main idea is to look at the value of the integer.
Integers that fit in 16, 32 or 64 bits are
represented as a row of 2, 4 and 8 bytes respectively,
preceded by an indication of how many bytes are used.
Longer integers are represented as strings;
this is only allowed within pseudo instructions, however.
This concept works very well for target machines
with reasonable word sizes.
At present, most ACK software cannot be used for word sizes
larger than 32 bits, although the handles for using larger word sizes
are present in the design of the EM code.
In the intermediate code we essentially use the
same ideas.
We allow three representations of integers.
.IP -
integers that fit in a short are represented as a short
.IP -
integers that fit in a long but not in a short are represented
as longs
.IP -
all remaining integers are represented as strings
(only allowed in pseudos).
.LP
The terms short and long are defined in
.[~[
ritchie reference manual programming language
.], section 4]
and depend only on the source machine
(i.e. the machine on which ACK runs),
not on the target machines.
For historical reasons a long will often be called an
.UL offset.
.PP
Operands can also be instruction labels,
objects or procedures.
Instruction labels are denoted by a
.UL label
.UL identifier,
which can be distinguished from a normal identifier.
.sp
The operand of a pseudo instruction can be a list of
.UL arguments.
Arguments can have the same type as operands, except
for the type short, which is not used for arguments.
Furthermore, an argument can be a string or
a string representation of a signed integer, unsigned integer
or floating point number.
If the number of arguments is not fully determined by
the pseudo instruction (e.g. a ROM pseudo can have any number
of arguments), then the list is terminated by a special
argument of type CEND.
.DS
.UL syntax
  em_text:
        {line} ;
  line:
        INSTR           -- opcode
        OPTYPE          -- operand type
        operand ;
  operand:
        empty |         -- OPTYPE = NO
        SHORT |         -- OPTYPE = SHORT
        OFFSET |        -- OPTYPE = OFFSET
        LAB_ID |        -- OPTYPE = INSTRLAB
        O_ID |          -- OPTYPE = OBJECT
        P_ID |          -- OPTYPE = PROCEDURE
        {argument} ;    -- OPTYPE = LIST
  argument:
        ARGTYPE
        arg ;
  arg:
        empty |         -- ARGTYPE = CEND
        OFFSET |
        LAB_ID |
        O_ID |
        P_ID |
        string |        -- ARGTYPE = STRING
        const ;         -- ARGTYPE = ICON,UCON or FCON
  string:
        LENGTH          -- number of characters
        {CHARACTER} ;
  const:
        SIZE            -- number of bytes
        string ;        -- string representation of (un)signed
                        -- or floating point constant
.DE
.NH 3
The control flow graphs
.PP
Each procedure can be divided
into a number of basic blocks.
A basic block is a piece of code with
no jumps in, except at the beginning,
and no jumps out, except at the end.
.PP
Every basic block has a set of
.UL successors,
which are basic blocks that can follow it immediately in the
dynamic execution sequence.
The
.UL predecessors
are the basic blocks of which this one
is a successor.
The successor and predecessor attributes
of all basic blocks of a single procedure
are said to form the
.UL control
.UL flow
.UL graph
of that procedure.
.PP
Another important attribute is the
.UL immediate
.UL dominator.
A basic block B dominates a block C if
every path in the graph from the procedure entry block
to C goes through B.
The immediate dominator of C is the closest dominator
of C (other than C itself) on any path from the entry block.
(Note that the dominator relation is transitive,
so the immediate dominator is well defined.)
.PP
A basic block also has an attribute containing
the identifiers of every
.UL loop
that the block belongs to (see next section for loops).
.DS
.UL syntax
  control_flow_graph:
        {basic_block} ;
  basic_block:
        B_ID            -- unique identifying number
        #INSTR          -- number of EM instructions
        succ
        pred
        idom            -- immediate dominator
        loops           -- set of loops
        FLAGS ;         -- flag bits
  succ:
        {B_ID} ;
  pred:
        {B_ID} ;
  idom:
        B_ID ;
  loops:
        {LP_ID} ;
.DE
The flag bits can have the values 'firm' and 'strong',
which are explained below.
.NH 3
The loop tables
.PP
Every procedure has an associated
.UL loop
.UL table
containing information about all the loops
in the procedure.
Loops can be detected by a close inspection of
the control flow graph.
The main idea is to look for two basic blocks,
B and C, for which the following holds:
.IP -
B is a successor of C
.IP -
B is a dominator of C
.LP
B is called the loop
.UL entry
and C is called the loop
.UL end.
Intuitively, C contains a jump backwards to
the beginning of the loop (B).
.PP
A loop L1 is said to be
.UL nested
within loop L2 if all basic blocks of L1
are also part of L2.
It is important to note that loops could originally
be written as a well-structured for- or while-loop or as
a messy goto loop.
Hence loops may partly overlap without one
being nested inside the other.
The
.UL nesting
.UL level
of a loop is the number of loops in
which it is nested (so it is 0 for
an outermost loop).
The details of loop detection will be discussed later.
.PP
It is often desirable to know whether a
basic block gets executed during every iteration
of a loop.
This leads to the following definitions:
.IP -
A basic block B of a loop L is said to be a \fIfirm\fR block
of L if B is executed on all successive iterations of L,
with the only possible exception of the last iteration.
.IP -
A basic block B of a loop L is said to be a \fIstrong\fR block
of L if B is executed on all successive iterations of L.
.LP
Note that a strong block is also a firm block.
If a block is part of a conditional statement, it is neither
strong nor firm, as it may be skipped during some iterations
(see Fig. 3.2).
.DS
loop
   if cond1 then
      ...               -- this code will not
                        -- result in a firm or strong block
   end if;
   ...                  -- strong (always executed)
   exit when cond2;
   ...                  -- firm (not executed on
                        -- last iteration).
end loop;

Fig. 3.2 Example of firm and strong block
.DE
.DS
.UL syntax
  looptable:
        {loop} ;
  loop:
        LP_ID           -- unique identifying number
        LEVEL           -- loop nesting level
        entry           -- loop entry block
        end ;
  entry:
        B_ID ;
  end:
        B_ID ;
.DE
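.PP
The two conditions that define a loop entry and a loop end
(B is a successor of C, and B dominates C)
can be checked directly on the succ and idom attributes.
The following C fragment is a minimal sketch under simplifying assumptions:
blocks are numbered from 0, the entry block has no immediate dominator,
and the names struct basic_block, dominates, find_loops, NO_BLOCK and MAX_SUCC
are hypothetical.
It is not the loop detection algorithm used by the optimizer,
whose details are discussed elsewhere.

```c
#include <assert.h>

#define MAX_SUCC 4      /* hypothetical bound on successors per block */
#define NO_BLOCK (-1)   /* hypothetical marker: entry block has no idom */

/* Simplified basic block holding only the attributes needed here. */
struct basic_block {
    int idom;            /* immediate dominator, or NO_BLOCK */
    int nsucc;           /* number of successors */
    int succ[MAX_SUCC];  /* successor block numbers */
};

/*
 * Return 1 if block b dominates block c, by walking from c up the chain
 * of immediate dominators. A block is taken to dominate itself.
 */
static int dominates(const struct basic_block *g, int b, int c)
{
    while (c != NO_BLOCK) {
        if (c == b)
            return 1;
        c = g[c].idom;
    }
    return 0;
}

/*
 * Look for pairs (B, C) such that B is a successor of C and B dominates C.
 * B is stored in entry[], C in end[]; at most max pairs are stored.
 * Returns the number of pairs stored.
 */
static int find_loops(const struct basic_block *g, int nblocks,
                      int entry[], int end[], int max)
{
    int n = 0;

    assert(max > 0);
    for (int c = 0; c < nblocks; c++) {
        for (int i = 0; i < g[c].nsucc; i++) {
            int b = g[c].succ[i];
            if (n < max && dominates(g, b, c)) {
                entry[n] = b;
                end[n] = c;
                n++;
            }
        }
    }
    return n;
}
```

For a procedure with blocks 0 -> 1, 1 -> 2, 1 -> 3 and 2 -> 1,
the only pair found is entry 1 with end 2,
since block 1 dominates block 2 and is also one of its successors.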