Initial revision
This commit is contained in:
parent
2565b29679
commit
b52eecb904
284
doc/ceg/proposal.tr
Normal file
284
doc/ceg/proposal.tr
Normal file
|
@ -0,0 +1,284 @@
|
||||||
|
.TL
|
||||||
|
|
||||||
|
Code Expander
|
||||||
|
.br
|
||||||
|
(proposal)
|
||||||
|
|
||||||
|
.SH
|
||||||
|
Introduction
|
||||||
|
.LP
|
||||||
|
The \fBcode expander\fR, \fBce\fR, is a program that translates EM-code to
|
||||||
|
objectcode. The main goal is to translate very fast. \fBce\fR is an instance
|
||||||
|
of the EM_CODE(3L)-interface. During execution of \fBce\fR, \fBce\fR will build
|
||||||
|
in core a machine independent objectfile ( NEW A.OUT(5L)). With \fBcv\fR or
|
||||||
|
with routines supplied by the user the machine independent objectcode will
|
||||||
|
be converted to a machine dependent object code. \fBce\fR needs
|
||||||
|
information about the targetmachine (e.g. the opcode's). We divide the
|
||||||
|
information into two parts:
|
||||||
|
.IP
|
||||||
|
- The description in assembly instructions of EM-code instructions.
|
||||||
|
.IP
|
||||||
|
- The description in objectcode of assembly instructions.
|
||||||
|
.LP
|
||||||
|
With these two tables we can make a \fBcode expander generator\fR which
|
||||||
|
generates a \fBce\fR. It is possible to put the information in one table
|
||||||
|
but that will probably introduce (propable) more bugs in the table. So we
|
||||||
|
divide and conquer. With this approach it is also possible to generate
|
||||||
|
assembly code ( rather yhan objectcode), wich is useful for debugging.
|
||||||
|
There is of course a link between the two tables, the link
|
||||||
|
consist of a restriction on the assembly format. Every assembly
|
||||||
|
instruction must have the following format:
|
||||||
|
.sp
|
||||||
|
INSTR ::= LABEL : MNEMONIC [ OPERAND ( "," OPERAND)* ]
|
||||||
|
.sp
|
||||||
|
.LP
|
||||||
|
\fBCeg\fR uses the following algorithm:
|
||||||
|
.IP \0\0a)
|
||||||
|
The assembly table will be converted to a (C-)routine assemble().
|
||||||
|
assemble() gets as argument a string, the assembler instruction,
|
||||||
|
and can use the MNEMONIC to execute the corresponding action in the
|
||||||
|
assembly table.
|
||||||
|
.IP \0\0b)
|
||||||
|
The routine assemble() can now be used to convert the EM-code table to
|
||||||
|
a set of C-routines, wich together form an instance of the
|
||||||
|
EM_CODE(3L).
|
||||||
|
.SH
|
||||||
|
The EM-instruction table
|
||||||
|
.LP
|
||||||
|
We use the following grammar:
|
||||||
|
.sp
|
||||||
|
.TS
|
||||||
|
center box ;
|
||||||
|
l.
|
||||||
|
TABLE ::= (ROW)*
|
||||||
|
ROW ::= C_instr ( SPECIAL | SIMPLE)
|
||||||
|
SPECIAL ::= ( CONDITION SIMPLE)+ 'default' SIMPLE
|
||||||
|
SIMPLE ::= '==>' ACTIONLIST | '::=' ACTIONLIST
|
||||||
|
ACTIONLIST ::= [ ACTION ( ';' ACTION)* ] '.'
|
||||||
|
ACTION ::= function-call | assembly-instruction
|
||||||
|
.TE
|
||||||
|
.LP
|
||||||
|
An example for the 8086:
|
||||||
|
.LP
|
||||||
|
.DS
|
||||||
|
C_lxl
|
||||||
|
$arg1 == 0 ==> "push bp".
|
||||||
|
$arg1 == 1 ==> "push EM_BSIZE(bp)".
|
||||||
|
default ==> "mov cx, $arg1";
|
||||||
|
"mov si, bp";
|
||||||
|
"1: mov si, EM_BSIZE(si);
|
||||||
|
"loop 1b"
|
||||||
|
"push si".
|
||||||
|
.DE
|
||||||
|
.sp
|
||||||
|
Some remarks:
|
||||||
|
.sp
|
||||||
|
* The C_instr is a function indentifier in the EM_CODE(3L)-interface.
|
||||||
|
.LP
|
||||||
|
* CONDITION is a "boolean" C-expression.
|
||||||
|
.LP
|
||||||
|
* The arguments of an EM-instruction can be used in CONDITION and in assembly
|
||||||
|
instructions. They are referred by $arg\fIi\fR. \fBceg\fR modifies the
|
||||||
|
arguments as follows:
|
||||||
|
.IP \0\0-
|
||||||
|
For local variables at positive offsets it increases this offset by EM_BSIZE
|
||||||
|
.IP \0\0-
|
||||||
|
It makes names en labels unique. The user must supply the formats (see mach.h).
|
||||||
|
.LP
|
||||||
|
* function-call is allowed to implement e.g. push/pop optimization.
|
||||||
|
For example:
|
||||||
|
.LP
|
||||||
|
.DS
|
||||||
|
C_adi
|
||||||
|
$arg1 == 2 ==> combine( "pop ax");
|
||||||
|
combine( "pop bx");
|
||||||
|
"add ax, bx";
|
||||||
|
save( "push ax").
|
||||||
|
default ==> arg_error( "C_adi", $arg1).
|
||||||
|
.DE
|
||||||
|
.LP
|
||||||
|
* The C-functions called in the EM-instructions table have to use the routine
|
||||||
|
assemble()/gen?(). "assembler-instr" is in fact assemble( "assembler-instr").
|
||||||
|
.LP
|
||||||
|
* \fBceg\fR takes care not only about the conversions of arguments but also
|
||||||
|
about
|
||||||
|
changes between segments. There are situation when one doesn't want
|
||||||
|
conversion of arguments. This can be done by using ::= in stead of ==>.
|
||||||
|
This is usefull when two C_instr are equivalent. For example:
|
||||||
|
.IP
|
||||||
|
C_slu ::= C_sli( $arg1)
|
||||||
|
.LP
|
||||||
|
* There are EM-CODE instructions wich are machine independent (e.g. C_open()).
|
||||||
|
For these EM_CODE instructions \fBceg\fR will generate \fIdefault\fR-
|
||||||
|
instructions. There is one exception: in the case of C_pro() the tablewriter
|
||||||
|
has to supply a function prolog().
|
||||||
|
.LP
|
||||||
|
* Also the EM-pseudoinstructions C_bss_\fIcstp\fR(), C_hol_\fIcstp\fR(),
|
||||||
|
C_con_\fIcstp\fR() and C_rom_\fIcstp\fR can be translated automaticly.
|
||||||
|
\fBceg\fR only has to know how to interpretate string-constants:
|
||||||
|
.DS
|
||||||
|
\&..icon $arg2 == 1 ==> gen1( (char) atoi( $arg1))
|
||||||
|
$arg2 == 2 ==> gen2( atoi( $arg1))
|
||||||
|
$arg2 == 4 ==> gen4( atol( $arg1))
|
||||||
|
\&..ucon $arg2 == 1 ==> gen1( (char) atoi( $arg1))
|
||||||
|
$arg2 == 2 ==> gen2( atoi( $arg1))
|
||||||
|
$arg2 == 4 ==> gen4( atol( $arg1))
|
||||||
|
\&..fcon ::= not_implemented( "..fcon")
|
||||||
|
.DE
|
||||||
|
.LP
|
||||||
|
* Still, life can be made easier for the tablewriter; For the routines wich
|
||||||
|
he/she didn't implement \fBceg\fR will generate a default instruction wich
|
||||||
|
generates an error-message. \fBceg\fR seems to generate :
|
||||||
|
.IP
|
||||||
|
C_xxx ::= not_implemented( "C_xxx")
|
||||||
|
.SH
|
||||||
|
The assembly table
|
||||||
|
.LP
|
||||||
|
How to map assembly on objectcode.
|
||||||
|
.LP
|
||||||
|
Each row in the table consists of two fields, one field for the assembly
|
||||||
|
instruction, the other field for the corresponding objectcode. The tablewriter
|
||||||
|
can use the following primitives to generate code for the machine
|
||||||
|
instructions :
|
||||||
|
.IP "\0\0gen1( b)\0\0:" 17
|
||||||
|
generates one byte in de machine independent objectfile.
|
||||||
|
.IP "\0\0gen2( w)\0\0:" 17
|
||||||
|
generates one word ( = two bytes), the table writer can change the byte
|
||||||
|
order by setting the flag BYTES_REVERSED.
|
||||||
|
.IP "\0\0gen4( l)\0\0:" 17
|
||||||
|
generates two words ( = four bytes), the table writer can change the word
|
||||||
|
order by setting the flag WORDS_REVERSED.
|
||||||
|
.IP "\0\0reloc( n, o, r)\0\0:" 17
|
||||||
|
generates relocation information for a label ( = name + offset +
|
||||||
|
relocationtype).
|
||||||
|
.LP
|
||||||
|
Besides these primitives the table writer may use his self written
|
||||||
|
C-functions. This allows the table writer e.g. to write functions to set
|
||||||
|
bitfields within a byte.
|
||||||
|
.LP
|
||||||
|
There are more or less two methods to encode the assembly instructions:
|
||||||
|
.IP \0\0a)
|
||||||
|
MNEMONIC and OPERAND('s) are encoded independently of each other. This can be
|
||||||
|
done when the target machine has an orthogonal instruction set (e.g. pdp-11).
|
||||||
|
.IP \0\0b)
|
||||||
|
MNEMONIC and OPERAND('s) together determine the opcode. In this case the
|
||||||
|
assembler often uses overloading: one MNEMONIC is used for several
|
||||||
|
different machine-instructions. For example : (8086)
|
||||||
|
.br
|
||||||
|
mov ax, bx
|
||||||
|
.br
|
||||||
|
mov ax, variable
|
||||||
|
.br
|
||||||
|
These instructions have different opcodes.
|
||||||
|
.LP
|
||||||
|
As the transformation MNEMONIC-OPCODE is not one to
|
||||||
|
one the table writer must be allowed to put restrictions on the operands.
|
||||||
|
This can be done with type declarations. For example:
|
||||||
|
.LP
|
||||||
|
.DS
|
||||||
|
mov dst:REG, src:MEM ==>
|
||||||
|
gen1( 0x8b);
|
||||||
|
modRM( op2.reg, op1);
|
||||||
|
.DE
|
||||||
|
.DS
|
||||||
|
mov dst:REG, src:REG ==>
|
||||||
|
gen1( 0x89);
|
||||||
|
modRM( op2.reg, op1);
|
||||||
|
.DE
|
||||||
|
.LP
|
||||||
|
modRM() is a function written by the tablewriter and is used to encode
|
||||||
|
the operands. This frees the table writer of endless typing.
|
||||||
|
.LP
|
||||||
|
The table writer has to do the "typechecking" by himself. But typechecking
|
||||||
|
is almost the same as operand decoding. So it's more efficient to do this
|
||||||
|
in one function. We now have all the tools to describe the function
|
||||||
|
assemble().
|
||||||
|
.IP
|
||||||
|
assemble() first calls the function
|
||||||
|
decode_operand() ( by the table writer written), with two arguments: a
|
||||||
|
string ( the operand) and a
|
||||||
|
pointer to a struct. The struct is declared by the table writer and must
|
||||||
|
consist of at least a field called type. ( the other fields in the struct can
|
||||||
|
be used to remember information about the decoded operand.) Now assemble()
|
||||||
|
fires a row wich is selected by mapping the MNEMONIC and the type of the
|
||||||
|
operands.
|
||||||
|
.br
|
||||||
|
In the second field of a row there may be references to other
|
||||||
|
fields in the struct (e.g. op2.reg in the example above).
|
||||||
|
.LP
|
||||||
|
We ignored one problem. It's possible when the operands are encoded, that
|
||||||
|
not everything is known. For example $arg\fIi\fR arguments in the
|
||||||
|
EM-instruction table get their value at runtime. This problem is solved by
|
||||||
|
introducing a function eval(). eval() has a string as argument and returns
|
||||||
|
an arith. The string consists of constants and/or $arg\fIi\fR's and the value
|
||||||
|
returned by eval() is the value of the string. To encode the $arg\fIi\fR's
|
||||||
|
in as few bytes as possible the table writer can use the statements %if,
|
||||||
|
%else and %endif. They can be used in the same manner as #if, #else and
|
||||||
|
#endif in C and result in a runtime test. An example :
|
||||||
|
.LP
|
||||||
|
.DS
|
||||||
|
-- Some rows of the assembly table
|
||||||
|
|
||||||
|
mov dst:REG, src:DATA ==>
|
||||||
|
%if sfit( eval( src), 8) /* does the immediate-data fit in 1 byte? */
|
||||||
|
R53( 0x16 , op1.reg);
|
||||||
|
gen1( eval( src));
|
||||||
|
%else
|
||||||
|
R53( 0x17 , op1.reg);
|
||||||
|
gen2( eval( src));
|
||||||
|
%endif
|
||||||
|
.LD
|
||||||
|
|
||||||
|
mov dst:REG, src:REG ==>
|
||||||
|
gen1( 0x8b);
|
||||||
|
modRM( op1.reg, op2);
|
||||||
|
|
||||||
|
.DE
|
||||||
|
.DS
|
||||||
|
-- The corresponding part in the function assemble() :
|
||||||
|
|
||||||
|
case MNEM_mov :
|
||||||
|
decode_operand( arg1, &op1);
|
||||||
|
decode_operand( arg2, &op2);
|
||||||
|
if ( REG( op1.type) && DATA( op2.type)) {
|
||||||
|
printf( "if ( sfit( %s, 8)) {\\\\n", eval( src));
|
||||||
|
R53( 0x16 , op1.reg);
|
||||||
|
printf( "gen1( %s)\\\\n", eval( arg2));
|
||||||
|
printf( "}\\\\nelse {\\\\n");
|
||||||
|
R53( 0x17 , op1.reg);
|
||||||
|
printf( "gen2( %s)\\\\n", eval( arg2));
|
||||||
|
printf( "}\\\\n");
|
||||||
|
}
|
||||||
|
else if ( REG( op1.type) && REG( op2.type)) {
|
||||||
|
gen1( 0x8b);
|
||||||
|
modRM( op1.reg, op2);
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
.DE
|
||||||
|
.DS
|
||||||
|
-- Some rows of the right part of the EM-instruction table are translated
|
||||||
|
-- in the following C-functions.
|
||||||
|
|
||||||
|
"mov ax, $arg1" ==>
|
||||||
|
if ( sfit( w, 8)) { /* w is the actual argument of C_xxx( w) */
|
||||||
|
gen1( 176); /* R53() */
|
||||||
|
gen1( w);
|
||||||
|
}
|
||||||
|
else {
|
||||||
|
gen1( 184);
|
||||||
|
gen2( w);
|
||||||
|
}
|
||||||
|
.LD
|
||||||
|
|
||||||
|
"mov ax, bx" ==>
|
||||||
|
gen1( 138);
|
||||||
|
gen1( 99); /* modRM() */
|
||||||
|
.DE
|
||||||
|
.SH
|
||||||
|
Restrictions
|
||||||
|
.LP
|
||||||
|
.IP \0\01)
|
||||||
|
The EM-instructions C_exc() is not implemented.
|
||||||
|
.IP \0\03)
|
||||||
|
All messages are ignored.
|
276
doc/ceg/prototype.tr
Normal file
276
doc/ceg/prototype.tr
Normal file
|
@ -0,0 +1,276 @@
|
||||||
|
.TL
|
||||||
|
A prototype Code expander
|
||||||
|
.NH
|
||||||
|
Introduction
|
||||||
|
.PP
|
||||||
|
A program to be compiled with ACK is first fed into the preprocessor.
|
||||||
|
The output of the preprocessor goes into the appropiate front end,
|
||||||
|
whose job it is to produce EM. The EM code generated is
|
||||||
|
fed into the peephole optimizer, wich scans it with a window of few
|
||||||
|
instructions, replacing certain inefficient code sequences by better
|
||||||
|
ones. Following the peephole optimizer follows a backend wich produces
|
||||||
|
good assembly code. The assembly code goes into the assembler and the objectcode
|
||||||
|
then goes into the loader/linker, the final component in the pipeline.
|
||||||
|
.PP
|
||||||
|
For various applications this scheme is too slow. For example for testing
|
||||||
|
programs; In this case the program has to be translated fast and the
|
||||||
|
runtime of the objectcode may be slower. A solution is to build a code
|
||||||
|
expander ( \fBce\fR) wich translates EM code to objectcode. Of course this
|
||||||
|
has to
|
||||||
|
be done automaticly by a code expander generator, but to get some feeling
|
||||||
|
for the problem we started out to build prototypes.
|
||||||
|
We built two types of ce's. One wich tranlated EM to assembly, one
|
||||||
|
wich translated EM to objectcode.
|
||||||
|
.NH
|
||||||
|
EM to assembly
|
||||||
|
.PP
|
||||||
|
We made one for the 8086 and one for the vax4. These ce's are instances of the
|
||||||
|
EM_CODE(3L)-interface and produce for a single EM instruction a set
|
||||||
|
of assembly instruction wich are semantic equivalent.
|
||||||
|
We implemented in the 8086-ce push/pop-optimalization.
|
||||||
|
.NH
|
||||||
|
EM to objectcode
|
||||||
|
.PP
|
||||||
|
Instead of producing assembly code we tried to produce vax4-objectcode.
|
||||||
|
During execution of ce, ce builds in core a machine independent
|
||||||
|
objectfile ( NEW A.OUT(5L)) and just before dumping the tables this
|
||||||
|
objectfile is converted to a Berkly 4.2BSD a.out-file. We build two versions;
|
||||||
|
One with static memory allocation and one with dynamic memory allocation.
|
||||||
|
If the first one runs out of memory it will give an error message and stop,
|
||||||
|
the second one will allocate more memory and proceed with producing
|
||||||
|
objectcode.
|
||||||
|
.PP
|
||||||
|
The C-frontend calls the EM_CODE-interface. So after linking the frontend
|
||||||
|
and the ce we have a pipeline in a program saving a lot of i/o.
|
||||||
|
It is interesting to compare this C-compiler ( called fcemcom) with "cc -c".
|
||||||
|
fcemcom1 (the dynamic variant of fcemcom) is tuned in such a way, that
|
||||||
|
alloc() won't be called.
|
||||||
|
.NH 2
|
||||||
|
Compile time
|
||||||
|
.PP
|
||||||
|
fac.c is a small program that produces n! ( see below). foo.c is small program
|
||||||
|
that loops a lot.
|
||||||
|
.TS
|
||||||
|
center, box, tab(:);
|
||||||
|
c | c | c | c | c | c
|
||||||
|
c | c | n | n | n | n.
|
||||||
|
compiler : program : real : user : sys : object size
|
||||||
|
=
|
||||||
|
fcemcom : sort.c : 31.0 : 17.5 : 1.8 : 23824
|
||||||
|
fcemcom1 : : 59.0 : 21.2 : 3.3 :
|
||||||
|
cc -c : : 50.0 : 38.0 : 3.5 : 6788
|
||||||
|
_
|
||||||
|
fcemcom : ed.c : 37.0 : 23.6 : 2.3 : 41744
|
||||||
|
fcemcom1 : : 1.16.0 : 28.3 : 4.6 :
|
||||||
|
cc -c : : 1.19.0 : 54.8 : 4.3 : 11108
|
||||||
|
_
|
||||||
|
fcemcom : cp.c : 4.0 : 2.4 : 0.8 : 4652
|
||||||
|
fcemcom1 : : 9.0 : 3.0 : 1.0 :
|
||||||
|
cc -c : : 8.0 : 5.2 : 1.6 : 1048
|
||||||
|
_
|
||||||
|
fcemcom : uniq.c : 5.0 : 2.5 : 0.8 : 5568
|
||||||
|
fcemcom1 : : 9.0 : 2.9 : 0.8 :
|
||||||
|
cc -c : : 13.0 : 5.4 : 2.0 : 3008
|
||||||
|
_
|
||||||
|
fcemcom : btlgrep.c : 24.0 : 7.2 : 1.4 : 12968
|
||||||
|
fcemcom1 : : 23.0 : 8.1 : 1.2 :
|
||||||
|
cc -c : : 1.20.0 : 15.3 : 3.8 : 2392
|
||||||
|
_
|
||||||
|
fcemcom : fac.c : 1.0 : 0.1 : 0.5 : 216
|
||||||
|
fecmcom1 : : 2.0 : 0.2 : 0.5 :
|
||||||
|
cc -c : : 3.0 : 0.7 : 1.3 : 92
|
||||||
|
_
|
||||||
|
fcemcom : foo.c : 4.0 : 0.2 : 0.5 : 272
|
||||||
|
fcemcom1 : : 11.0 : 0.3 : 0.5 :
|
||||||
|
cc -c : : 7.0 : 0.8 : 1.6 : 108
|
||||||
|
.TE
|
||||||
|
.NH 2
|
||||||
|
Run time
|
||||||
|
.LP
|
||||||
|
Is the runtime very bad?
|
||||||
|
.TS
|
||||||
|
tab(:), box, center;
|
||||||
|
c | c | c | c | c
|
||||||
|
c | c | n | n | n.
|
||||||
|
compiler : program : real : user : system
|
||||||
|
=
|
||||||
|
fcem : sort.c : 22.0 : 17.5 : 1.5
|
||||||
|
cc : : 5.0 : 2.4 : 1.1
|
||||||
|
_
|
||||||
|
fcem : btlgrep.c : 1.58.0 : 27.2 : 4.2
|
||||||
|
cc : : 12.0 : 3.6 : 1.1
|
||||||
|
_
|
||||||
|
fcem : foo.c : 1.0 : 0.7 : 0.1
|
||||||
|
cc : : 1.0 : 0.4 : 0.1
|
||||||
|
_
|
||||||
|
fcem : uniq.c : 2.0 : 0.5 : 0.3
|
||||||
|
cc : : 1.0 : 0.1 : 0.2
|
||||||
|
.TE
|
||||||
|
.NH 2
|
||||||
|
quality object code
|
||||||
|
.LP
|
||||||
|
The runtime is very bad so its interesting to have look at the code which is
|
||||||
|
produced by fcemcom and by cc -c. I took a program which computes recursively
|
||||||
|
n!.
|
||||||
|
.DS
|
||||||
|
long fac();
|
||||||
|
|
||||||
|
main()
|
||||||
|
{
|
||||||
|
int n;
|
||||||
|
|
||||||
|
scanf( "%D", &n);
|
||||||
|
printf( "fac is %D\\\\n", fac( n));
|
||||||
|
}
|
||||||
|
|
||||||
|
long fac( n)
|
||||||
|
int n;
|
||||||
|
{
|
||||||
|
if ( n == 0)
|
||||||
|
return( 1);
|
||||||
|
else
|
||||||
|
return( n * fac( n-1));
|
||||||
|
}
|
||||||
|
.DE
|
||||||
|
.br
|
||||||
|
.br
|
||||||
|
.br
|
||||||
|
.br
|
||||||
|
.LP
|
||||||
|
"cc -c fac.c" produces :
|
||||||
|
.DS
|
||||||
|
fac: tstl 4(ap)
|
||||||
|
bnequ 7f
|
||||||
|
movl $1, r0
|
||||||
|
ret
|
||||||
|
7f: subl3 $1, 4(ap), r0
|
||||||
|
pushl r0
|
||||||
|
call $1, fac
|
||||||
|
movl r0, -4(fp)
|
||||||
|
mull3 -4(fp), 4(ap), r0
|
||||||
|
ret
|
||||||
|
.DE
|
||||||
|
.br
|
||||||
|
.br
|
||||||
|
.LP
|
||||||
|
"fcem fac.c fac.o" produces :
|
||||||
|
.DS
|
||||||
|
_fac: 0
|
||||||
|
42: jmp be
|
||||||
|
48: pushl 4(ap)
|
||||||
|
4e: pushl $0
|
||||||
|
54: subl2 (sp)+,(sp)
|
||||||
|
57: tstl (sp)+
|
||||||
|
59: bnequ 61
|
||||||
|
5b: jmp 67
|
||||||
|
61: jmp 79
|
||||||
|
67: pushl $1
|
||||||
|
6d: jmp ba
|
||||||
|
73: jmp b9
|
||||||
|
79: pushl 4(ap)
|
||||||
|
7f: pushl $1
|
||||||
|
85: subl2 (sp)+,(sp)
|
||||||
|
88: calls $0,_fac
|
||||||
|
8f: addl2 $4,sp
|
||||||
|
96: pushl r0
|
||||||
|
98: pushl 4(ap)
|
||||||
|
9e: pushl $4
|
||||||
|
a4: pushl $4
|
||||||
|
aa: jsb .cii
|
||||||
|
b0: mull2 (sp)+,(sp)
|
||||||
|
b3: jmp ba
|
||||||
|
b9: ret
|
||||||
|
ba: movl (sp)+,r0
|
||||||
|
bd: ret
|
||||||
|
be: jmp 48
|
||||||
|
.DE
|
||||||
|
.NH 1
|
||||||
|
Conclusions
|
||||||
|
.PP
|
||||||
|
comparing "cc -c" with "fcemcom"
|
||||||
|
.LP
|
||||||
|
.TS
|
||||||
|
center, box, tab(:);
|
||||||
|
c | c s | c | c s
|
||||||
|
^ | c s | ^ | c s
|
||||||
|
^ | c | c | ^ | c | c
|
||||||
|
l | n | n | n | n | n.
|
||||||
|
program : compile time : object size : runtime
|
||||||
|
:_::_
|
||||||
|
: user : sys :: user : sys
|
||||||
|
=
|
||||||
|
sort.c : 0.47 : 0.5 : 3.5 : 7.3 : 1.4
|
||||||
|
_
|
||||||
|
ed.c : 0.46 : 0.5 : 3.8 : : :
|
||||||
|
_
|
||||||
|
cp.c : 0.46 : 0.5 : 4.4 : : :
|
||||||
|
_
|
||||||
|
uniq.c : 0.46 : 0.4 : 1.8 : : :
|
||||||
|
_
|
||||||
|
btlgrep.c : 0.47 : 0.3 : 5.4 : 7.5 : 3.8
|
||||||
|
_
|
||||||
|
fac.c : 0.14 : 0.4 : 2.3 : 1.8 : 1.0
|
||||||
|
_
|
||||||
|
foo.c : 0.25 : 0.3 : 2.5 : 5.0 : 1.5
|
||||||
|
.TE
|
||||||
|
.PP
|
||||||
|
The results for fcemcom1 are almost identical; The only thing that changes
|
||||||
|
is that fcemcom1 is 1.2 slower than fcemcom. ( compile time) This is due to
|
||||||
|
to an another datastructure . In the static version we use huge array's for
|
||||||
|
the text- and
|
||||||
|
data-segment, the relocation information, the symboltable and stringarea.
|
||||||
|
In the dynamic version we use linked lists, wich makes it expensive to get
|
||||||
|
and to put a byte on a abritrary memory location. So it is probably better
|
||||||
|
to use realloc(), because in the most cases there will be enough memory.
|
||||||
|
.PP
|
||||||
|
The quality of the objectcode is very bad. The reason is that the frontend
|
||||||
|
generates bad code and expects the peephole-optimizer to improve the code.
|
||||||
|
This is also one of the main reasons that the runtime is very bad.
|
||||||
|
(e.g. the expensive "cii" with arguments 4 and 4 could be deleted.)
|
||||||
|
So its seems a good
|
||||||
|
idea to put a new peephole-optimizer between the frontend and the ce.
|
||||||
|
.PP
|
||||||
|
Using the peephole optimizer the ce would produce :
|
||||||
|
.DS
|
||||||
|
_fac: 0
|
||||||
|
pushl 4(ap)
|
||||||
|
tstl (sp)+
|
||||||
|
beqlu 1f
|
||||||
|
jmp 3f
|
||||||
|
1 : pushl $1
|
||||||
|
jmp 2f
|
||||||
|
3 : pushl 4(ap)
|
||||||
|
decl (sp)
|
||||||
|
calls $0,_fac
|
||||||
|
addl2 $4,sp
|
||||||
|
pushl r0
|
||||||
|
pushl 4(ap)
|
||||||
|
mull2 (sp)+,(sp)
|
||||||
|
movl (sp)+,r0
|
||||||
|
2 : ret
|
||||||
|
.DE
|
||||||
|
.PP
|
||||||
|
Bruce McKenzy already implemented it and made some improvements in the
|
||||||
|
source code of the ce. The compile-time is two to two and a half times better
|
||||||
|
and the
|
||||||
|
size of the objectcode is two to three times bigger.(comparing with "cc -c")
|
||||||
|
Still we could do better.
|
||||||
|
.PP
|
||||||
|
Using peephole- and push/pop-optimization ce could produce :
|
||||||
|
.DS
|
||||||
|
_fac: 0
|
||||||
|
tstl 4(ap)
|
||||||
|
beqlu 1f
|
||||||
|
jmp 2f
|
||||||
|
1 : pushl $1
|
||||||
|
jmp 3f
|
||||||
|
2 : decl 4(ap)
|
||||||
|
calls $0,_fac
|
||||||
|
addl2 $4,sp
|
||||||
|
mull3 4(ap), r0, -(sp)
|
||||||
|
movl (sp)+, r0
|
||||||
|
3 : ret
|
||||||
|
.DE
|
||||||
|
.PP
|
||||||
|
prof doesn't cooperate, so no profile information.
|
||||||
|
.PP
|
Loading…
Reference in a new issue