Added
parent
63c9fea5c2
commit
fb51183da2
29 changed files with 2085 additions and 0 deletions
14
doc/sparc/.distr
Normal file
|
@ -0,0 +1,14 @@
|
||||||
|
1
|
||||||
|
2
|
||||||
|
3
|
||||||
|
4
|
||||||
|
5
|
||||||
|
A
|
||||||
|
B
|
||||||
|
init
|
||||||
|
intro
|
||||||
|
note_on_reg_wins
|
||||||
|
refs
|
||||||
|
timing
|
||||||
|
title
|
||||||
|
Makefile
|
53
doc/sparc/1
Normal file
|
@ -0,0 +1,53 @@
|
||||||
|
.so init
|
||||||
|
.NH
|
||||||
|
INTRODUCTION
|
||||||
|
.NH 2
|
||||||
|
Why an EM backend for SPARC processors?
|
||||||
|
.PP
|
||||||
|
With the introduction of SPARC-based computers like the Sun-4, a
|
||||||
|
whole new range of fast computers became readily available to the general
|
||||||
|
public. The power of large mainframes had been captured into a small
|
||||||
|
desk-top computer at only a fraction of the cost.
|
||||||
|
.PP
|
||||||
|
In the old days, a new computer used to be very hard to integrate into
|
||||||
|
the existing environment, but due to standardization in the software world
|
||||||
|
incompatibility in hardware no longer means incompatibility in software.
|
||||||
|
Programs that are written for computer A can often be run on computer B
|
||||||
|
without major modifications. Unfortunately this is not true for all software.
|
||||||
|
.PP
|
||||||
|
There will always be programs that rely on the specific
|
||||||
|
hardware of a certain computer for many different reasons. They
|
||||||
|
can be categorized as:
|
||||||
|
.IP -
|
||||||
|
poorly written programs
|
||||||
|
.IP -
|
||||||
|
programs to directly control hardware (device drivers)
|
||||||
|
.IP -
|
||||||
|
code that requires efficiency (time-critical I/O drivers)
|
||||||
|
.IP -
|
||||||
|
programs to generate code to run on the hardware (compilers)
|
||||||
|
.LP
|
||||||
|
This project for instance, the design and implementation of an EM backend
|
||||||
|
for SPARC processors, falls into the last category.
|
||||||
|
.PP
|
||||||
|
We have designed and implemented an algorithm to convert EM programs to code
|
||||||
|
that will run directly on the SPARC hardware. Henceforth, both the algorithm
|
||||||
|
and the implementation will be referred to as the EM-to-SPARC backend,
|
||||||
|
or simply: the backend.
|
||||||
|
.NH 2
|
||||||
|
Why has nobody done this before?
|
||||||
|
.PP
|
||||||
|
Since EM was designed around 1981 and even SPARC has been around for some
|
||||||
|
years now, one may wonder why nobody has ever written an EM to SPARC backend
|
||||||
|
before. The reason is twofold. In the first place, there are some
|
||||||
|
non-trivial problems to be solved in the design phase, and secondly,
|
||||||
|
the SPARC-design combined with the lack of documentation, would surely
|
||||||
|
cost a lot of blood, sweat and tears. The absence of
|
||||||
|
clues to any of the design problems, combined with the \(em at first
|
||||||
|
glance \(em inhuman
|
||||||
|
SPARC instruction set did not make this a very attractive project.
|
||||||
|
.PP
|
||||||
|
On the other hand, these were exactly the reasons which made us take on
|
||||||
|
this particular project: it would require design skills, as well as some
|
||||||
|
hard work; a golden combination for a successful project.
|
||||||
|
.bp
|
109
doc/sparc/2
Normal file
|
@ -0,0 +1,109 @@
|
||||||
|
.so init
|
||||||
|
.nr H1 1
|
||||||
|
.NH
|
||||||
|
CLOSE-UP LOOK
|
||||||
|
.NH 2
|
||||||
|
What is EM?
|
||||||
|
.PP
|
||||||
|
As the abstract of the IR-81 report on EM
|
||||||
|
.[ [
|
||||||
|
description of a machine architecture
|
||||||
|
.]]
|
||||||
|
says: \*(OQEM is a family
|
||||||
|
of intermediate languages designed for producing portable compilers.\*(CQ
|
||||||
|
Because EM is to be used on a wide range of languages and processors,
|
||||||
|
the instruction set is kept simple enough to allow easy translation to,
|
||||||
|
or interpretation on, almost any processor. Yet it is also powerful enough
|
||||||
|
to accommodate easy translation from almost any block-structured language.
|
||||||
|
.PP
|
||||||
|
Even though EM was designed in the early 1980s, it
|
||||||
|
is based on
|
||||||
|
.\" already shows strong signs of being influenced by
|
||||||
|
the (then innovative) RISC architecture. All instructions
|
||||||
|
have 0 or 1 operands, there are no fancy addressing modes as in the
|
||||||
|
68020's\*(Si move.w a3(_array,d3.w*2), -(sp)\*(So, no explicit registers,
|
||||||
|
although instructions for high-level language constructs
|
||||||
|
such as array-operations, multiway branches (case) and
|
||||||
|
floating point operations are provided.
|
||||||
|
.PP
|
||||||
|
To fully understand the discussion in the following chapters,
|
||||||
|
the reader should at least have some knowledge of EM.
|
||||||
|
.NH 2
|
||||||
|
What is SPARC?
|
||||||
|
.PP
|
||||||
|
According to Sun's RISC tutorial: \*(OQSun Microsystems has designed a RISC
|
||||||
|
architecture, called SPARC, and has implemented that architecture with
|
||||||
|
the Sun-4 family of supercomputing workstations and servers. SPARC stands
|
||||||
|
for Scalable Processor ARChitecture, emphasizing its applicability to
|
||||||
|
large as well as small machines.\*(CQ
|
||||||
|
.PP
|
||||||
|
In sharp contrast to EM, SPARC does have
|
||||||
|
explicit registers (31 integer and 32 floating point, all of which
|
||||||
|
are 32 bits wide) and
|
||||||
|
does not support any high level language operations: it does not even have
|
||||||
|
multiplication or division instructions. Because the SPARC design is
|
||||||
|
very straightforward, all instructions could be hard-coded (no microcode
|
||||||
|
involved) to
|
||||||
|
provide extremely high performance. All register-to-register operations
|
||||||
|
require exactly one clock cycle, and all register-to-memory and
|
||||||
|
memory-to-register operations require two clock cycles, one to retrieve
|
||||||
|
the instruction and one to access external memory. At a clock speed of
|
||||||
|
over 20 MHz this means that you can achieve well over 10 VAX MIPS:
|
||||||
|
more than 4 times the speed of a 15 MHz 68020 used in the Sun3/50.
|
||||||
|
.PP
|
||||||
|
As above, the reader should also have some general knowledge about
|
||||||
|
the SPARC processor to be able to understand the following chapters.
|
||||||
|
.NH 2
|
||||||
|
What exactly is a (fast) backend?
|
||||||
|
.PP
|
||||||
|
To put it in the simplest of ways: a (fast) backend is a set of routines to
|
||||||
|
translate EM code to code that will run 'on the metal' (for example the SPARC
|
||||||
|
processor). The distinction between full-fledged backends (code generators)
|
||||||
|
.[ [
|
||||||
|
The table driven code generator
|
||||||
|
.]]
|
||||||
|
and fast backends (code expanders)
|
||||||
|
.[ [
|
||||||
|
The Code Expander Generator
|
||||||
|
.]]
|
||||||
|
is related to
|
||||||
|
the compilation-time vs. run-time trade off. Code generators generate
|
||||||
|
efficient code, whereas code expanders generate code very efficiently.
|
||||||
|
For details about code expanders see also
|
||||||
|
.[ [
|
||||||
|
The design of very fast portable compilers
|
||||||
|
.]].
|
||||||
|
.PP
|
||||||
|
The reasons for us to implement a code expander are numerous: Our first reason to
|
||||||
|
implement a code expander, rather than a code generator was that implementing a
|
||||||
|
code expander would be hard enough already. Code generators only give
|
||||||
|
more problems and there were already enough problems to be solved. Secondly,
|
||||||
|
we knew we would never be able to compete with original SPARC compilers due
|
||||||
|
to loss of information in the frontends (see also chapter 5). By implementing
|
||||||
|
a code expander we might be able to outrun the existing compilers on a
|
||||||
|
completely different terrain: compile speed.
|
||||||
|
.PP
|
||||||
|
The third 'reason' to implement a code expander lies a little deeper and was
|
||||||
|
not discovered until we had actually started the implementation... It was only
|
||||||
|
then that we found out that for certain architectures, such as the SPARC,
|
||||||
|
the idea behind the code-expander is not necessarily inferior to that
|
||||||
|
behind a code-generator. It seems that for highly orthogonal instruction
|
||||||
|
sets it is possible to generate near optimal code using just the
|
||||||
|
code-expander. We have to say, however, that this is only true for our
|
||||||
|
optimized version of the code-expander. With the original code-expander
|
||||||
|
it would not have been possible to generate near-optimal code for the
|
||||||
|
SPARC processor.
|
||||||
|
.NH 2
|
||||||
|
So, what are the main differences between EM and SPARC?
|
||||||
|
.PP
|
||||||
|
The main
|
||||||
|
difference between EM and SPARC is the stack versus register orientation.
|
||||||
|
The other differences, such as the presence of high level language
|
||||||
|
operations in EM, can easily be overcome by subroutines,
|
||||||
|
or small pieces of in-line SPARC code.
|
||||||
|
The design-part of this project mostly concentrates on
|
||||||
|
building a bridge between EM's stack and SPARC's registers.
|
||||||
|
.PP
|
||||||
|
In the next chapter we will make a list of all our design problems which
|
||||||
|
will then be discussed in chapter 4.
|
||||||
|
.bp
|
82
doc/sparc/3
Normal file
|
@ -0,0 +1,82 @@
|
||||||
|
.so init
|
||||||
|
.nr H1 2
|
||||||
|
.NH
|
||||||
|
PROBLEMS
|
||||||
|
.NH 2
|
||||||
|
Maintain SPARC speed
|
||||||
|
.PP
|
||||||
|
If we want to generate SPARC code, we should try to generate efficient code
|
||||||
|
as fast as possible. It would be quite embarrassing to find out that the
|
||||||
|
same program would run faster on a Motorola 68020 than on a SPARC processor,
|
||||||
|
when both operate at the same clock frequency.
|
||||||
|
Looking at some code generated by Sun's C-compiler and optimizing assembler,
|
||||||
|
we can spot a few remarkable characteristics of the generated SPARC code:
|
||||||
|
.IP -
|
||||||
|
There are almost no memory references
|
||||||
|
.IP -
|
||||||
|
Parameters to functions are passed through registers.
|
||||||
|
.IP -
|
||||||
|
Almost all delay slots\(dg
|
||||||
|
.FS
|
||||||
|
\(dg For details about delay slots see the SPARC Architecture Manual, chapter 4, pp. 42-48
|
||||||
|
.FE
|
||||||
|
are filled in by the assembler
|
||||||
|
.LP
|
||||||
|
If we want to generate efficient code, we should at least try to
|
||||||
|
reduce the number of memory references and use registers wherever we can.
|
||||||
|
Since EM is stack-oriented it references its stack for every operation so
|
||||||
|
this will not be an easy task; a suitable solution will however be given in
|
||||||
|
the next chapter.
|
||||||
|
.NH 2
|
||||||
|
Increase compilation speed
|
||||||
|
.PP
|
||||||
|
Because we will implement a code expander (fast backend) we should keep
|
||||||
|
a close eye on efficiency; if we cannot beat regular compilers on producing
|
||||||
|
efficient code we will try to beat them on fast code generation.
|
||||||
|
The usual trick to achieve fast compilation is to pack the frontend,
|
||||||
|
optimizer, code-generator and
|
||||||
|
assembler all into a single large binary to reduce the overhead of
|
||||||
|
reading and writing temporary files. Unfortunately, due to the
|
||||||
|
SPARC instruction set, its relocation information is slightly bizarre
|
||||||
|
and cannot be represented with the present primitives.
|
||||||
|
This means that it will not be possible to generate the required output
|
||||||
|
format directly from our backend.
|
||||||
|
.PP
|
||||||
|
There are three solutions here: generate assembler code, and let an
|
||||||
|
existing assembler generate the required object (\fI.o\fR) files,
|
||||||
|
create our own primitives that can handle the SPARC relocation format, or
|
||||||
|
do not use any of the addressing modes that require the bizarre relocation.
|
||||||
|
Because we have enough on our hands already we will
|
||||||
|
let the existing assembler deal with generating object files.
|
||||||
|
.NH 2
|
||||||
|
Convert stack to register operations
|
||||||
|
.PP
|
||||||
|
As we wrote in the previous chapter, for RISC machines a code expander can
|
||||||
|
produce almost as efficient code as a code generator. The fact that this is
|
||||||
|
true for stack-oriented RISC processors is rather obvious. The problem we
|
||||||
|
face, however, is that the SPARC processor is register-oriented, instead of
|
||||||
|
stack oriented. In the next chapter we will give a suitable solution to
|
||||||
|
convert most stack accesses to register accesses.
|
||||||
|
.NH 2
|
||||||
|
Miscellaneous
|
||||||
|
.PP
|
||||||
|
Besides performance and \fI.o\fR-compatibility there are some other
|
||||||
|
peculiarities of the SPARC processor and Sun's C-compiler (henceforth
|
||||||
|
simply called \fIcc\fR).
|
||||||
|
.PP
|
||||||
|
For some reason, the SPARC stack pointer requires alignment
|
||||||
|
on 8 bytes, so you cannot push a 4-byte integer on the stack
|
||||||
|
and then \*(Sisub 4, %sp\*(So\(dd.
|
||||||
|
.FS
|
||||||
|
\(dd For more information about SPARC assembler see the Sun-4 Assembly
|
||||||
|
Language Reference Manual
|
||||||
|
.FE
|
||||||
|
This too will be discussed in the next chapter, where we will take a
|
||||||
|
more in-depth look into this problem and also discuss a couple of
|
||||||
|
possible solutions.
|
||||||
|
.PP
|
||||||
|
Another thing is that \fIcc\fR usually passes the first six parameters of a
|
||||||
|
function-call through registers. To be \fI.o\fR-compatible we would have to
|
||||||
|
pass the first six parameters of each function call through registers as well.
|
||||||
|
Exactly why this is not feasible will also be discussed in the next chapter.
|
||||||
|
.bp
|
468
doc/sparc/4
Normal file
|
@ -0,0 +1,468 @@
|
||||||
|
.so init
|
||||||
|
.hw data-structures
|
||||||
|
.nr H1 3
|
||||||
|
.NH
|
||||||
|
SOLUTIONS
|
||||||
|
.NH 2
|
||||||
|
Maintaining SPARC speed
|
||||||
|
.PP
|
||||||
|
In chapter 3 we wrote:
|
||||||
|
.sp 0.3
|
||||||
|
.nf
|
||||||
|
>If we want to generate efficient code, we should at least try to reduce the number of
|
||||||
|
>memory references and use registers wherever we can.
|
||||||
|
.fi
|
||||||
|
.sp 0.3
|
||||||
|
In this chapter we will devise a strategy to swiftly generate acceptable
|
||||||
|
code by using push-pop optimization.
|
||||||
|
Note that this is not the push-pop
|
||||||
|
optimization already available in the EM-kit, since that is only present
|
||||||
|
in the assembler-to-binary part which we do not use
|
||||||
|
.[ [
|
||||||
|
The Code Expander Generator
|
||||||
|
.]].
|
||||||
|
Our push-pop optimization
|
||||||
|
works more like the fake-stack described in
|
||||||
|
.[ [
|
||||||
|
The table driven code generator
|
||||||
|
.]].
|
||||||
|
.NH 3
|
||||||
|
Ad-hoc optimization
|
||||||
|
.PP
|
||||||
|
Before getting involved in any optimization let's have a look at some
|
||||||
|
code generated with a straightforward EM to SPARC conversion of the
|
||||||
|
C statement: \*(Sif(a[i]);\*(So. Note that \*(Si%SP\*(So is an alias
|
||||||
|
for a general purpose
|
||||||
|
register and acts as the EM stack pointer. It has nothing to do with
|
||||||
|
\*(Si%sp\*(So \(em the SPARC stack pointer.
|
||||||
|
Analogously, \*(Si%LB\*(So is EM's local base pointer.
|
||||||
|
.br
|
||||||
|
.IP
|
||||||
|
.HS
|
||||||
|
.TS
|
||||||
|
;
|
||||||
|
l s l s l
|
||||||
|
l1f6 lf6 l2f6 lf6 l.
|
||||||
|
EM code SPARC code Comment
|
||||||
|
|
||||||
|
lae _a set _a, %g1 ! load address of external _a
|
||||||
|
dec 4, %SP
|
||||||
|
st %g1, [%SP]
|
||||||
|
|
||||||
|
lol -4 set -4, %g1 ! load local -4 (i)
|
||||||
|
ld [%g1+%LB], %g2
|
||||||
|
dec 4, %SP
|
||||||
|
st %g2, [%SP]
|
||||||
|
|
||||||
|
loc 2 set 2, %g1 ! load constant 2
|
||||||
|
dec 4, %SP
|
||||||
|
st %g1, [%SP]
|
||||||
|
|
||||||
|
sli 4 ld [%SP], %g1 ! pop shift count
|
||||||
|
ld [%SP+4], %g2 ! pop shiftee
|
||||||
|
sll %g2, %g1, %g3
|
||||||
|
inc 4, %SP
|
||||||
|
st %g3, [%SP] ! push 4 * i
|
||||||
|
|
||||||
|
ads 4 ld [%SP], %g1 ! add pointer and offset
|
||||||
|
ld [%SP+4], %g2
|
||||||
|
add %g1, %g2, %g3
|
||||||
|
inc 4, %SP
|
||||||
|
st %g3, [%SP] ! push address of _a + (4 * i)
|
||||||
|
|
||||||
|
loi 4 ld [%SP], %g1 ! load indirect 4 bytes
|
||||||
|
ld [%g1], %g2
|
||||||
|
st %g2, [%SP] ! push a[i]
|
||||||
|
cal _f
|
||||||
|
...
|
||||||
|
.TE
|
||||||
|
.HS
|
||||||
|
.LP
|
||||||
|
Although the code is easy to understand, it clearly is far from optimal.
|
||||||
|
The above code uses approximately 60 clock-cycles\(dg
|
||||||
|
.FS
|
||||||
|
\(dg In general each instruction only takes one cycle,
|
||||||
|
except for \*(Sild\*(So and
|
||||||
|
\*(Sist\*(So which may both require additional clock cycles. The exact amount
|
||||||
|
of extra cycles needed depends on the SPARC implementation and memory access
|
||||||
|
time. Furthermore, the
|
||||||
|
\*(Siset\*(So pseudo-instruction is a bit tricky. It takes one cycle when
|
||||||
|
its argument lies between -4096 and 4095, and two cycles otherwise.
|
||||||
|
.FE
|
||||||
|
to push an array-element on the stack,
|
||||||
|
something which a 68020 can do in a single instruction. The SPARC
|
||||||
|
processor may be fast, but not fast enough to justify the above code.
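.PP
The footnote's rule for \*(Siset\*(So can be made concrete with a small C sketch.
It assumes the usual expansion of the pseudo-instruction: one instruction when the
constant fits the 13-bit signed immediate field, and otherwise a \*(Sisethi\*(So that
loads the upper 22 bits followed by an \*(Sior\*(So that fills in the lower 10 bits.
Like the footnote, it ignores the special case in which the lower 10 bits are zero.
All function names are hypothetical; none of them comes from the backend.

```c
#include <assert.h>
#include <stdint.h>

/* 1 when v fits the 13-bit signed immediate field (-4096..4095). */
int fits_simm13(int32_t v)
{
    return v >= -4096 && v <= 4095;
}

/* Upper 22 bits, the part loaded by "sethi %hi(v), rd". */
uint32_t hi22(uint32_t v)
{
    return v >> 10;
}

/* Lower 10 bits, the part supplied by "or rd, %lo(v), rd". */
uint32_t lo10(uint32_t v)
{
    return v & 0x3ffu;
}

/* Recombine both halves; checks that each half is within its field. */
uint32_t join_hi_lo(uint32_t hi, uint32_t lo)
{
    assert(hi < (1u << 22) && lo < (1u << 10));
    return (hi << 10) | lo;
}

/* Instruction count of "set v, rd" under the footnote's simplified rule. */
int set_cost(int32_t v)
{
    return fits_simm13(v) ? 1 : 2;
}
```

.LP
Under these assumptions \*(Siset 2, %g1\*(So costs one instruction, while loading
the address of an external such as \*(Si_a\*(So will in general cost two.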
|
||||||
|
.PP
|
||||||
|
The same statement can be translated much more efficiently:
|
||||||
|
.DS
|
||||||
|
.TS
|
||||||
|
;
|
||||||
|
l2f6 lf6 l.
|
||||||
|
sll %i0, 2, %g2 ! multiply index by 4
|
||||||
|
set _a, %g3
|
||||||
|
ld [%g2+%g3], %g1 ! get contents of a[i]
|
||||||
|
dec 4, %SP
|
||||||
|
st %g1, [%SP] ! push a[i] onto the stack
|
||||||
|
.TE
|
||||||
|
.DE
|
||||||
|
which, instead of 60, uses only 5 clock cycles to retrieve the element
|
||||||
|
from memory and 5 additional cycles when the result has to be pushed
|
||||||
|
on the stack. Note that when the result is not a parameter it does not
|
||||||
|
have to be pushed on the stack. By making efficient use of the SPARC
|
||||||
|
registers we can fetch \*(Sia[i]\*(So in only 5 cycles!
|
||||||
|
.NH 3
|
||||||
|
Analyzing optimization
|
||||||
|
.PP
|
||||||
|
Instead of ad-hoc optimization we will need something more solid.
|
||||||
|
When one tries to optimize the above code in an ad-hoc manner one will
|
||||||
|
probably notice the large overhead due to stack access. Almost every EM
|
||||||
|
instruction requires at least three SPARC instructions: one to carry out
|
||||||
|
the EM instruction and two to pop and push the result from and onto the
|
||||||
|
stack. This happens for every instruction, even though the data being pushed
|
||||||
|
will probably be needed by the next instruction. To optimize this extensive
|
||||||
|
pushing and popping of data we will use the appropriately named push-pop
|
||||||
|
optimization.
|
||||||
|
.PP
|
||||||
|
The idea behind push-pop optimization is to delay the push operation until
|
||||||
|
it is almost certain that the data actually has to be pushed.
|
||||||
|
As is often the case, the data does not have to be pushed,
|
||||||
|
but will be used as input to another EM instruction.
|
||||||
|
If we can decide at compile time that this will indeed be
|
||||||
|
the case we can save the time of first pushing the data and then popping it
|
||||||
|
back again by temporarily storing the data (possibly only during compilation!)
|
||||||
|
and using it no sooner than it is actually needed.
|
||||||
|
.PP
|
||||||
|
The \*(Sisli 4\*(So instruction, for instance, expects two inputs on top of the
|
||||||
|
stack: on top a counter and right below that the shiftee (the number
|
||||||
|
to be shifted). As a result \*(Sisli\*(So
|
||||||
|
pushes 'shiftee << counter' back to the stack. Now consider the following
|
||||||
|
sequence, which could be the result of the expression \*(Si4 * i\*(So
|
||||||
|
.DS
|
||||||
|
.TS
|
||||||
|
;
|
||||||
|
l1f6 lf6 l.
|
||||||
|
lol -4
|
||||||
|
loc 2
|
||||||
|
sli 4
|
||||||
|
.TE
|
||||||
|
.DE
|
||||||
|
In the non-optimized situation the \*(Silol\*(So would push
|
||||||
|
a local variable (whose offset is -4) on the stack.
|
||||||
|
Then the \*(Siloc\*(So pushes a 2 on the stack and finally \*(Sisli\*(So
|
||||||
|
retrieves both these numbers to replace them with the result.
|
||||||
|
On most machines it is not necessary to
|
||||||
|
push the 2 on the stack, since it can be used in the shift instruction
|
||||||
|
as an immediate operand. On a SPARC, for instance, one can write
|
||||||
|
.DS
|
||||||
|
.TS
|
||||||
|
;
|
||||||
|
l2f6 lf6 l.
|
||||||
|
ld [%LB-4], %g1 ! load local variable into register g1
|
||||||
|
sll %g1, 2, %g2 ! perform the shift-left-by-2
|
||||||
|
.TE
|
||||||
|
.DE
|
||||||
|
where the output of the \*(Silol\*(So, as well as the immediate operand 2 are used
|
||||||
|
in the shift instruction. As suggested before, all of this can be
|
||||||
|
achieved with push-pop optimization.
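.PP
A minimal C sketch of this idea is given below. It is not the mechanism used in
the backend, which the next section describes; it only illustrates how a
compile-time stack of pending items lets \*(Siloc 2\*(So emit no code at all.
Every name in it is hypothetical, registers are handed out naively
(\*(Si%g1\*(So, \*(Si%g2\*(So, ...), and only a shiftee that already lives in a
register is handled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

enum kind { K_CONST, K_REG };

struct item {
    enum kind kind;
    long value;                  /* the constant, or N for register %gN */
};

static struct item pending[16];  /* compile-time stack of delayed pushes */
static int depth;
static int next_reg = 1;         /* naive allocator: %g1, %g2, ... */
static char out[256];            /* SPARC code generated so far */

/* Append one formatted line of assembly to out. */
static void emit(const char *fmt, ...)
{
    size_t used = strlen(out);
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(out + used, sizeof out - used, fmt, ap);
    va_end(ap);
}

static void delay_push(enum kind kind, long value)
{
    assert(depth < 16);
    pending[depth].kind = kind;
    pending[depth].value = value;
    depth++;
}

static struct item take(void)
{
    assert(depth > 0);
    return pending[--depth];
}

/* lol: load a local into a fresh register and remember that register. */
static void em_lol(long offset)
{
    int r = next_reg++;

    emit("ld [%%LB%+ld], %%g%d\n", offset, r);
    delay_push(K_REG, r);
}

/* loc: a constant costs no code until somebody needs it. */
static void em_loc(long c)
{
    delay_push(K_CONST, c);
}

/* sli: a constant shift count becomes an immediate operand. */
static void em_sli(void)
{
    struct item count = take();
    struct item shiftee = take();
    int r = next_reg++;

    assert(shiftee.kind == K_REG);  /* sketch: shiftee must be in a register */
    if (count.kind == K_CONST)
        emit("sll %%g%ld, %ld, %%g%d\n", shiftee.value, count.value, r);
    else
        emit("sll %%g%ld, %%g%ld, %%g%d\n", shiftee.value, count.value, r);
    delay_push(K_REG, r);
}
```

.LP
Calling \*(Siem_lol(-4)\*(So, \*(Siem_loc(2)\*(So and \*(Siem_sli()\*(So in that
order leaves the two-instruction sequence shown above in \*(Siout\*(So, with the
result register as the only pending item.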
|
||||||
|
.NH 3
|
||||||
|
A mechanism for push-pop optimization
|
||||||
|
.PP
|
||||||
|
To implement the above optimization we need some mechanism to
|
||||||
|
temporarily store information during compilation.
|
||||||
|
We need to be able to store, compare and retrieve information from the
|
||||||
|
temporary storage (cache) without any
|
||||||
|
loss of information. Before describing all the routines used
|
||||||
|
to implement our cache we will first describe how the cache works.
|
||||||
|
.PP
|
||||||
|
Items in the cache are structures containing an external (\*(Sichar *\*(So),
|
||||||
|
two registers (\*(Sireg_t\*(So) and a constant (\*(Siarith\*(So),
|
||||||
|
any of which may be 0.
|
||||||
|
The value of such a structure is the sum of (the values of)
|
||||||
|
its elements. To put a register in the cache, one has to be allocated either
|
||||||
|
by calling \*(Sialloc_reg\*(So which returns a free register, by
|
||||||
|
\*(Siforced_alloc_reg\*(So which allocates a specific register or any
|
||||||
|
of the other routines available to allocate a register. To keep things
|
||||||
|
simple, we will not discuss all of the available primitives here.
|
||||||
|
When the register
|
||||||
|
is then put in the cache by the \*(Sipush_reg\*(So routine, the ownership will
|
||||||
|
be transferred from the user to the cache. Ownership is important, because
|
||||||
|
only the owner of a register may (and must!) deallocate it. Registers can be
|
||||||
|
owned by either an (imaginary) register manager, the cache or the user.
|
||||||
|
When the user retrieves a register from the stack with \*(Sipop_reg\*(So for
|
||||||
|
instance, ownership passes back to the user.
|
||||||
|
The user should then call \*(Sifree_reg\*(So
|
||||||
|
to transfer ownership to the register manager or call \*(Sipush_reg\*(So
|
||||||
|
to give it back to the cache.
|
||||||
|
Since the cache behaves like a stack, we will use the terms pop and push
|
||||||
|
to get items from and put items in the cache, respectively.
|
||||||
|
.PP
|
||||||
|
We shall now present the sets of routines that implement the cache.
|
||||||
|
.IP \(bu
|
||||||
|
The routines
|
||||||
|
.DS
|
||||||
|
\*(Si
|
||||||
|
reg_t alloc_reg(void)
|
||||||
|
reg_t alloc_reg_var(void)
|
||||||
|
reg_t alloc_float(void)
|
||||||
|
reg_t alloc_float_var(void)
|
||||||
|
reg_t alloc_double(void)
|
||||||
|
reg_t alloc_double_var(void)
|
||||||
|
|
||||||
|
void forced_alloc_reg(reg_t)
|
||||||
|
void soft_alloc_reg(reg_t)
|
||||||
|
|
||||||
|
void free_reg(reg_t)
|
||||||
|
void free_double_reg(reg_t)
|
||||||
|
\*(So
|
||||||
|
.DE
|
||||||
|
allocate and deallocate registers. If there are no more registers left,
|
||||||
|
i.e. they are owned by the cache,
|
||||||
|
one or more registers will be freed by flushing part of the cache
|
||||||
|
onto the real stack.
|
||||||
|
The \*(Sialloc_xxx_var\*(So primitives try to allocate a register that
|
||||||
|
can be used to store local variables. (In the current implementation
|
||||||
|
only the input and local registers.) If none can be found \*(SiNULL\*(So
|
||||||
|
is returned. \*(Siforced_alloc_reg\*(So forces the allocation of a certain
|
||||||
|
register. If it was already in use, its contents are moved to another
|
||||||
|
register. Finally \*(Sisoft_alloc_reg\*(So provides the possibility to
|
||||||
|
push a register onto the cache and still keep a copy for later use.
|
||||||
|
(Used to implement the \*(Sidup 4\*(So for example.)
|
||||||
|
.IP \(bu
|
||||||
|
The routines
|
||||||
|
.DS
|
||||||
|
\*(Si
|
||||||
|
void push_const(arith)
|
||||||
|
arith pop_const(void)
|
||||||
|
\*(So
|
||||||
|
.DE
|
||||||
|
push or pop a constant onto or from the stack. Distinction between
|
||||||
|
constants and other types is made so as not to lose any information; constants
|
||||||
|
may be used later on as immediate operands, which is not the case
|
||||||
|
for other types. If \*(Sipop_const\*(So is called, but the element on top of
|
||||||
|
the cache has either one of the external or register fields non-zero a
|
||||||
|
fatal error will be reported.
|
||||||
|
.IP \(bu
|
||||||
|
The routines
|
||||||
|
.DS
|
||||||
|
\*(Si
|
||||||
|
reg_t pop_reg(void)
|
||||||
|
reg_t pop_float(void)
|
||||||
|
reg_t pop_double(void)
|
||||||
|
reg_t pop_reg_c13(char *n)
|
||||||
|
|
||||||
|
void pop_reg_as(reg_t)
|
||||||
|
|
||||||
|
void push_reg(reg_t)
|
||||||
|
\*(So
|
||||||
|
.DE
|
||||||
|
push or pop a register. These will be used most often since results from one
|
||||||
|
EM instruction, which are computed in a register, are often used in the next.
|
||||||
|
When the element on top of the cache is more
|
||||||
|
than just a register the cache manager
|
||||||
|
will generate code to compute the sum of its fields and put the result in a
|
||||||
|
register. This register will then be given to the user.
|
||||||
|
If the user wants the result in a specific register, he should use the
|
||||||
|
\*(Sipop_reg_as\*(So routine.
|
||||||
|
The \*(Sipop_reg_c13\*(So gives an optional number (as character string) whose
|
||||||
|
value can be represented in 13 bits. The constant can then be used as an
|
||||||
|
offset for the SPARC \*(Sild\*(So and \*(Sist\*(So instructions.
|
||||||
|
.IP \(bu
|
||||||
|
The routine
|
||||||
|
.DS
|
||||||
|
\*(Si
|
||||||
|
void push_ext(char *)
|
||||||
|
\*(So
|
||||||
|
.DE
|
||||||
|
pushes an external onto the stack. There is no pop-variant of this one since
|
||||||
|
there is no use in popping an external.
|
||||||
|
.IP \(bu
|
||||||
|
The routines
|
||||||
|
.DS
|
||||||
|
\*(Si
|
||||||
|
void inc_tos(arith n)
|
||||||
|
void inc_tos_reg(reg_t r)
|
||||||
|
\*(So
|
||||||
|
.DE
|
||||||
|
increment the element on top of the cache by either the constant \*(Sin\*(So
|
||||||
|
or by a register. The latter is useful for pointer addition when referencing
|
||||||
|
external memory.
|
||||||
|
.KS
|
||||||
|
.IP \(bu
|
||||||
|
The routine
|
||||||
|
.DS
|
||||||
|
\*(Si
|
||||||
|
int type_of_tos(void)
|
||||||
|
\*(So
|
||||||
|
.DE
|
||||||
|
.KE
|
||||||
|
returns the type of the element on top of the cache. This is a combination
|
||||||
|
(binary OR) of \*(SiT_ext\*(So, \*(SiT_reg\*(So or \*(SiT_float\*(So,
|
||||||
|
\*(SiT_reg2\*(So or \*(SiT_float2\*(So, and \*(SiT_cst\*(So,
|
||||||
|
and tells the
|
||||||
|
user which of the three fields are non-zero. When the register-fields
|
||||||
|
represent \*(Si%g0\*(So, it is considered zero.
|
||||||
|
.IP \(bu
|
||||||
|
Miscellaneous routines:
|
||||||
|
.DS
|
||||||
|
\*(Si
|
||||||
|
void init_cache(void)
|
||||||
|
void cache_need(int)
|
||||||
|
void change_reg(void)
|
||||||
|
void flush_cache(void)
|
||||||
|
\*(So
|
||||||
|
.DE
|
||||||
|
\*(Siinit_cache\*(So should be called before any
|
||||||
|
other cache routines, to initialize some internal data structures.
|
||||||
|
\*(Sicache_need\*(So is used to tell the cache that a certain number
|
||||||
|
of registers are needed for the next operation. This way the cache can
|
||||||
|
load them efficiently in one fell swoop. \*(Sichange_reg\*(So is to be
|
||||||
|
called when the user changes a register of which the cache (possibly) has
|
||||||
|
co-ownership. Because the contents of registers in the cache are
|
||||||
|
not allowed to change the user should call \*(Sichange_reg\*(So to
|
||||||
|
instruct the cache to copy the contents to some other register.
|
||||||
|
\*(Siflush_cache\*(So writes the cache to the stack and invalidates
|
||||||
|
the cache. It should be used before branches,
|
||||||
|
before labels and in other places where the stack has to be valid (i.e. where
|
||||||
|
every item on the EM-stack should be stored on the real stack, not in some
|
||||||
|
virtual cache).
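.PP
To make the notion of a cache item more tangible, the following C sketch models a
single item together with a query in the spirit of \*(Sitype_of_tos\*(So and an
increment in the spirit of \*(Siinc_tos\*(So. The typedefs, the flag values and
the helper names are assumptions made for illustration only; the text above fixes
just the kinds of fields and the flag names, and the floating point flags are
left out.

```c
#include <assert.h>
#include <stddef.h>

typedef int  reg_t;   /* assumption: a register number, 0 standing for %g0 */
typedef long arith;   /* assumption: wide enough for any EM constant */

/* Flag values are invented for this sketch. */
enum { T_cst = 1, T_reg = 2, T_reg2 = 4, T_ext = 8 };

/* One cache item; its value is ext + reg + reg2 + cst. */
struct cache_item {
    const char *ext;  /* external symbol, or NULL */
    reg_t reg, reg2;  /* up to two registers, 0 when unused */
    arith cst;        /* constant part, 0 when unused */
};

/* Report which fields of the item are non-zero. */
int item_type(const struct cache_item *it)
{
    int t = 0;

    assert(it != NULL);
    if (it->ext)  t |= T_ext;
    if (it->reg)  t |= T_reg;
    if (it->reg2) t |= T_reg2;
    if (it->cst)  t |= T_cst;
    return t;
}

/* Adding a constant generates no code: only the constant field changes. */
void item_add_const(struct cache_item *it, arith n)
{
    assert(it != NULL);
    it->cst += n;
}
```

.LP
With this representation the address of \*(Sia[i]\*(So can sit in the cache as
the external \*(Si_a\*(So plus one register holding 4 * i, and no addition has
to be generated until the item is actually popped into a register.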
|
||||||
|
.NH 3
|
||||||
|
Implementing push-pop optimization in the EM_table
|
||||||
|
.PP
|
||||||
|
As indicated above, there is no regular way to represent the described
|
||||||
|
optimization in the EM_table. The only possible escapes from the EM_table
|
||||||
|
are function calls, but that is clearly not enough to implement a good
|
||||||
|
push-pop optimizer. Therefore we will use a modified version of the EM_table
|
||||||
|
format, where the description of, say, the \*(Silol\*(So instruction might look
|
||||||
|
like this\(dg:
|
||||||
|
.FS
|
||||||
|
\(dg This is not the way the \*(Silol\*(So actually looks in the EM_table;
|
||||||
|
it only shows how it \fImight\fR look using the aforementioned push/pop
|
||||||
|
primitives.
|
||||||
|
.FE
|
||||||
|
.DS
|
||||||
|
\*(Si
|
||||||
|
reg_t A, B;
|
||||||
|
const_str_t n;
|
||||||
|
|
||||||
|
A = alloc_reg();
|
||||||
|
push_reg(LB);
|
||||||
|
inc_tos($1);
|
||||||
|
B = pop_reg_c13(n);
|
||||||
|
"ld [$B+$n], $A";
|
||||||
|
push_reg(A);
|
||||||
|
free_reg(B);
|
||||||
|
\*(So
|
||||||
|
.DE
|
||||||
|
For more details about the exact implementation consult
|
||||||
|
appendix B which contains some characteristic excerpts from the EM_table.
|
||||||
|
.NH 2
|
||||||
|
Stack management
|
||||||
|
.PP
|
||||||
|
When converting EM code to some executable code there is the problem of
|
||||||
|
maintaining multiple stacks. The usual way to do this is described in
|
||||||
|
.[ [
|
||||||
|
Description of a Machine Architecture
|
||||||
|
.]]
|
||||||
|
and is shown in figure \*(SN1.
|
||||||
|
.KS
|
||||||
|
.PS
|
||||||
|
copy "pics/EM_stack.orig"
|
||||||
|
.PE
|
||||||
|
.ce 1
|
||||||
|
\fIFigure \*(SN1: usual stack management.\fR
|
||||||
|
.KE
|
||||||
|
.sp
|
||||||
|
.LP
|
||||||
|
This means that the EM stack and the hardware stack (used
|
||||||
|
for subroutine calls, etc.) are interleaved in memory. On the SPARC, however,
|
||||||
|
this brings up a large problem: in the former model it is assumed that the
|
||||||
|
resolution of the stack pointer is a word, but this is not the case on the
|
||||||
|
SPARC processor. On the SPARC processor the stack-pointer as well as the
|
||||||
|
frame-pointer have to be aligned on 8-byte boundaries, so one cannot simply
|
||||||
|
push a word on the stack and then lower the stack-pointer by 4 bytes!
|
||||||
|
.NH 3
|
||||||
|
Possible solutions
|
||||||
|
.PP
|
||||||
|
A simple idea might be to use a swiss-cheese stack; we could
|
||||||
|
push a 4-byte word onto the stack and then lower the stack by 8.
|
||||||
|
Unfortunately, this is not a very solid solution, because
|
||||||
|
pointer-arithmetic involving pointers to objects on the stack would cause
|
||||||
|
hard-to-predict anomalies.
|
||||||
|
.PP
|
||||||
|
Another try would be not to use the hardware stack at all. As long as we
|
||||||
|
do not generate subroutine-calls everything will be all right. This
|
||||||
|
approach, however, also has some disadvantages: first we would not be able
|
||||||
|
to use any of the existing debuggers such as \fIadb\fR, because they all
|
||||||
|
assume a regular stack format. Secondly, we would not be able to make use
|
||||||
|
of the SPARC's register windows to keep local variables. Finally, doing all the
|
||||||
|
administrative work necessary for subroutine calls ourselves instead of
|
||||||
|
letting the hardware handle it for us,
|
||||||
|
causes unnecessary procedure-call overhead.
|
||||||
|
.PP
|
||||||
|
Yet another alternative would be to emulate the EM-part of the stack,
|
||||||
|
and to let the hardware handle the subroutine call. Since we will
|
||||||
|
emulate our own stack, there are no alignment restrictions and because
|
||||||
|
we will use the hardware procedure call we can still make use of
|
||||||
|
the register windows.
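The idea of an emulated EM stack can be sketched in C as follows. This is only an illustration under our own assumptions: the type em_stack, the functions em_init, em_push_word and em_pop_word, and the buffer size are all hypothetical, and the sketch does no overflow checking. The point is that the emulated stack pointer has word resolution, independent of the 8-byte rule for the hardware stack pointer.

```c
#include <stdint.h>
#include <string.h>

#define EM_STACK_BYTES 1024	/* arbitrary size chosen for this sketch */

/* Hypothetical emulated EM stack, kept apart from the hardware stack. */
typedef struct {
    unsigned char mem[EM_STACK_BYTES];
    size_t sp;			/* byte offset of the top; grows downward */
} em_stack;

static void em_init(em_stack *s)
{
    s->sp = EM_STACK_BYTES;	/* empty stack: pointer at the high end */
}

/* Push one 4-byte word; the pointer moves by 4, not by 8. */
static void em_push_word(em_stack *s, int32_t w)
{
    s->sp -= 4;
    memcpy(&s->mem[s->sp], &w, 4);
}

/* Pop one 4-byte word. */
static int32_t em_pop_word(em_stack *s)
{
    int32_t w;

    memcpy(&w, &s->mem[s->sp], 4);
    s->sp += 4;
    return w;
}
```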
|
||||||
|
.NH 3
|
||||||
|
Our implementation
|
||||||
|
.PP
|
||||||
|
To implement the hybrid stack we need two extra registers: one for the
|
||||||
|
EM stack pointer (the aforementioned \*(Si%SP\*(So) and one for the
|
||||||
|
EM local base pointer (\*(Si%LB\*(So). The most elegant solution would be to
|
||||||
|
put both stacks in different segments, so they would not influence
|
||||||
|
each other. Unfortunately
|
||||||
|
.UX
|
||||||
|
lacks the ability to add segments and
|
||||||
|
since we will implement our backend under
|
||||||
|
.UX ,
|
||||||
|
we will have to put
|
||||||
|
both stacks in the same segment. Exactly how this can be done is shown
|
||||||
|
in figure \*(SN2.
|
||||||
|
.DS
|
||||||
|
.PS
|
||||||
|
copy "pics/mem_config"
|
||||||
|
.PE
|
||||||
|
.ce 1
|
||||||
|
\fIFigure \*(SN2: our stack management.\fR
|
||||||
|
.DE
|
||||||
|
.sp
|
||||||
|
During normal procedure execution, the SPARC stack pointer has to point to
|
||||||
|
a memory location where the operating system can dump the active part of
|
||||||
|
the register window. The rest of the
|
||||||
|
register window will be dumped in the (stack) space pre-allocated for it
|
||||||
|
by following the frame
|
||||||
|
pointer. When a signal occurs things get even more complicated and
|
||||||
|
result in figure \*(SN3.
|
||||||
|
.DS
|
||||||
|
.PS
|
||||||
|
copy "pics/signal_stack"
|
||||||
|
.PE
|
||||||
|
.ce 1
|
||||||
|
\fIFigure \*(SN3: our signal stack.\fR
|
||||||
|
.DE
|
||||||
|
.PP
|
||||||
|
The exact implementation of the stack is shown in figure \*(SN4.
|
||||||
|
.KF
|
||||||
|
.PS
|
||||||
|
copy "pics/EM_stack.ours"
|
||||||
|
.PE
|
||||||
|
.ce 1
|
||||||
|
\fIFigure \*(SN4: stack overview.\fR
|
||||||
|
.KE
|
||||||
|
.NH 2
|
||||||
|
Miscellaneous
|
||||||
|
.PP
|
||||||
|
As mentioned in the previous chapter, the generated \fI.o\fR-files are
|
||||||
|
not compatible with Sun's own object format. The primary reason for
|
||||||
|
this is that Sun usually passes the first six parameters of a procedure call
|
||||||
|
through registers. If we were to do that too, we would always have
|
||||||
|
to fetch the top six words from the stack into registers, even when
|
||||||
|
the procedure would not have any parameters at all. Apart from this,
|
||||||
|
structure-passing is another exception in Sun's object format which
|
||||||
|
makes it impossible to generate object-compatible code.\(dg
|
||||||
|
.FS
|
||||||
|
\(dg Exactly how Sun passes structures as parameters is described in
|
||||||
|
Appendix D of the SPARC Architecture Manual (Software Considerations).
|
||||||
|
.FE
|
||||||
|
.bp
|
153
doc/sparc/5
Normal file
|
@ -0,0 +1,153 @@
|
||||||
|
.so init
|
||||||
|
.nr H1 4
|
||||||
|
.NH
|
||||||
|
FUTURE WORK
|
||||||
|
.NH 2
|
||||||
|
A critique of EM
|
||||||
|
.PP
|
||||||
|
In general, EM fits its purpose quite well. Numerous compilers have been
|
||||||
|
written using EM as their intermediate language and it has even become a
|
||||||
|
commercial product. A great deal of its success is probably due to its
|
||||||
|
simplicity. There are no extravagant instructions but it does have all the
|
||||||
|
necessary functions to write a decent compiler.
|
||||||
|
.PP
|
||||||
|
There are, however, a few functions that come rather close to being
|
||||||
|
extravagant. The \*(Silar\*(So function for example \(em used
|
||||||
|
to fetch an element from an array \(em does not make it much easier
|
||||||
|
to write a frontend, but does make it unnecessarily hard to write an
|
||||||
|
efficient backend. Other instructions for which it is difficult
|
||||||
|
to generate efficient code are those that permit
|
||||||
|
dynamic operators, such as the \*(Silos\*(So. Dynamic operators, however, provide
|
||||||
|
significant extra possibilities and can therefore not be disposed of.
|
||||||
|
Note that even though the array operations \*(Silar\*(So and \*(Sisar\*(So
|
||||||
|
provide dynamic operators, they do not add additional power, since
|
||||||
|
they can easily be replaced with a sequence using the \*(Silos\*(So or
|
||||||
|
\*(Sists\*(So instructions.
|
||||||
|
.PP
|
||||||
|
EM code to reference arrays generated by the C frontend can be translated
|
||||||
|
very efficiently for almost any processor. However, the same operation
|
||||||
|
generated by the Modula-2 frontend (which uses the \*(Silar\*(So),
|
||||||
|
is much less efficient, although the only difference is that the
|
||||||
|
latter performs range checking whereas the former does not.\(dg
|
||||||
|
.FS
|
||||||
|
\(dg Actually this depends on whether or not explicit range checking is enabled.
|
||||||
|
This clearly shows that the current code generators are not optimal and
|
||||||
|
often depend on ad-hoc decisions.
|
||||||
|
.FE
|
||||||
|
Since range checking can also be expressed explicitly in
|
||||||
|
EM (\*(Sirck\*(So) there is no need for any of the array operations
|
||||||
|
(\*(Siaar\*(So, \*(Silar\*(So and \*(Sisar\*(So).
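To illustrate why the array operations add no power, the following C sketch splits an array access into an explicit range check and ordinary address arithmetic. The descriptor layout (lower bound, upper bound minus lower bound, element size) is our assumption about the EM array descriptor, and the names array_descr, in_range and element_offset are hypothetical.

```c
/* Assumed array descriptor: lower bound, range and element size. */
typedef struct {
    long lower;		/* lower bound of the index */
    long range;		/* upper bound minus lower bound */
    long elem_size;	/* size of one element in bytes */
} array_descr;

/* Explicit range check, the role an rck would play.
 * Returns 1 when lower <= index <= upper, 0 otherwise. */
static int in_range(long index, long lower, long upper)
{
    return index >= lower && index <= upper;
}

/* Byte offset of element `index` from the array base, computed with
 * plain arithmetic; returns -1 when the index is out of range. */
static long element_offset(const array_descr *d, long index)
{
    if (!in_range(index, d->lower, d->lower + d->range))
        return -1;
    return (index - d->lower) * d->elem_size;
}
```

A frontend that emits the check and the arithmetic separately leaves the backend with simple instructions only, and a frontend that does not need the check can simply omit it.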
|
||||||
|
.PP
|
||||||
|
Besides efficiency of the array-operations themselves, there still is another
|
||||||
|
major disadvantage of using these array-operations. In sharp contrast to
|
||||||
|
all other EM instructions except the \*(Silos\*(So and the \*(Sists\*(So,
|
||||||
|
they allow dynamic operators, so their effect on the stack-pointer cannot
|
||||||
|
always be
|
||||||
|
determined at compile-time. This means that efficient caching of the
|
||||||
|
top-of-stack in registers is almost impossible,
|
||||||
|
so using these array-operations also affects the
|
||||||
|
efficiency of the surrounding code. Now that processors are produced with
|
||||||
|
more and more registers it could be very beneficial to cache the
|
||||||
|
top-of-stack, so that the memory/register reference ratio decreases
|
||||||
|
to the benefit of the overall performance.
|
||||||
|
.PP
|
||||||
|
As a final critique, we would also like to discuss the semantics of some of
|
||||||
|
the EM instructions. In
|
||||||
|
.[ [
|
||||||
|
Description of a Machine Architecture
|
||||||
|
.]]
|
||||||
|
it is said that
|
||||||
|
all signed instructions such as the \*(Siadi\*(So, should cause an exception
|
||||||
|
on overflow. The unsigned operations such as \*(Siadu\*(So, however,
|
||||||
|
should act as modulo operations and therefore not perform overflow checking.
|
||||||
|
Since it is very expensive to perform overflow checking in EM,
|
||||||
|
we would suggest that the backend takes care of this. For languages which
|
||||||
|
do not require overflow checking, a simple message could be generated to
|
||||||
|
disable overflow checking in backends. This way all backends could be
|
||||||
|
written to fully comply with the official EM definition without any reduction in
|
||||||
|
efficiency.\(dd
|
||||||
|
.FS
|
||||||
|
\(dd Currently many backends do not implement error checks because they
|
||||||
|
are too expensive and almost never needed. Some frontends even have
|
||||||
|
facilities built in to generate EM-code to force these checks. If this
|
||||||
|
trend continues we will end up with a de-facto and a de-jure standard
|
||||||
|
both developed by the same people but nonetheless incompatible.
|
||||||
|
.FE
|
||||||
|
If such messages are added, we would like to suggest
|
||||||
|
that they can enforce overflow checks on unsigned, as well as signed arithmetic.
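The difference between the two kinds of arithmetic can be sketched in C. The function names are hypothetical; the sketch only shows what a checked signed addition costs compared to a modular unsigned one, not how any existing backend implements them.

```c
#include <limits.h>

/* Signed addition in the spirit of adi with overflow checking.
 * Returns 1 and stores the sum on success, 0 on overflow. */
static int add_signed_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;		/* the sum would not fit in an int */
    *sum = a + b;
    return 1;
}

/* Unsigned addition in the spirit of adu: a modulo operation.
 * Wrap-around is well defined for unsigned operands in C. */
static unsigned add_unsigned_modular(unsigned a, unsigned b)
{
    return a + b;
}
```

The checked version needs two extra comparisons per addition, which is why a message that switches the check off for languages that do not need it would be attractive.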
|
||||||
|
.PP
|
||||||
|
As a conclusion we would like to suggest removal of the array operations from
|
||||||
|
EM, or at least discontinuation of their usage in frontends.
|
||||||
|
.NH 2
|
||||||
|
\*(OQWanted: Procedure call information\*(CQ
|
||||||
|
.PP
|
||||||
|
The advantage of an intermediate language such as EM is that the backend
|
||||||
|
no longer has to know about any 'quirks' of the 'input'-language. The major
|
||||||
|
disadvantage, however, is that the backend no longer knows about any 'quirks'
|
||||||
|
of the 'input'-language... If the SPARC backend ever has to compete
|
||||||
|
with Sun's own C-compiler for example, removal of the array-operations
|
||||||
|
will not be enough. The amount of information that is lost during
|
||||||
|
the translation to EM is too large to ever generate truly efficient SPARC code.
|
||||||
|
.PP
|
||||||
|
To write such an efficient backend one needs to know, for example, whether,
|
||||||
|
when and what type of parameter is being computed, so the result can be stored
|
||||||
|
in the proper place and scratch registers can be reused.
|
||||||
|
(On the SPARC processor, for example, it is very beneficial
|
||||||
|
to pass the first six parameters of a procedure call through
|
||||||
|
registers instead of using the stack.)
|
||||||
|
One way to express such things in EM is to insert extra messages in
|
||||||
|
the EM-code. The C statement \*(Sia = f(4, a + b);\*(So for example,
|
||||||
|
could be translated to the following EM-code:
|
||||||
|
.DS
|
||||||
|
.TS
|
||||||
|
;
|
||||||
|
l1f6 lf6 l.
|
||||||
|
lol -4 ! a
|
||||||
|
lol -8 ! b
|
||||||
|
mes x, 2 ! next instruction will compute 2nd parameter
|
||||||
|
adi 4
|
||||||
|
mes x, 1 ! next instruction will compute 1st parameter
|
||||||
|
loc 4
|
||||||
|
cal _f ! call function f
|
||||||
|
lfr 4
|
||||||
|
stl -4 ! store result in a
|
||||||
|
.TE
|
||||||
|
.DE
|
||||||
|
For a code expander it is important that the \*(Simes\*(So pseudo
|
||||||
|
instructions appear \fIbefore\fR
|
||||||
|
the EM instruction that computes the parameter, because that way the final
|
||||||
|
computation (the \*(Siadi\*(So and \*(Siloc\*(So in the previous example)
|
||||||
|
can be translated to machine code that performs the required computation
|
||||||
|
and also puts the result in the required place. If it is found to be
|
||||||
|
too difficult for the frontend to insert these \*(Simes\*(So instructions
|
||||||
|
at the right place the peep-hole optimizer might swap the \*(Simes\*(So and
|
||||||
|
the instruction that computes the parameter.
|
||||||
|
.PP
|
||||||
|
For some architectures, it is also
|
||||||
|
possible to generate more efficient code for a procedure when it is a
|
||||||
|
so-called leaf-procedure: a procedure that doesn't call other procedures.
|
||||||
|
On the SPARC, for example, it is not necessary to rotate the register
|
||||||
|
window for a call to a leaf procedure and it is also possible to use
|
||||||
|
the global registers for register variables in leaf procedures.
|
||||||
|
It will be a little harder to insert useful messages about leaf procedures,
|
||||||
|
because just as with register messages, they are only useful to the
|
||||||
|
backend when they appear immediately
|
||||||
|
after or before the \*(Sipro\*(So pseudo instruction. The frontend,
|
||||||
|
however, only knows whether a certain procedure is a leaf-procedure or not
|
||||||
|
when it has already generated the entire procedure in EM. Just as with the
|
||||||
|
\*(Sipro ? / end n\*(So-dilemma the peep-hole optimizer
|
||||||
|
.[ [
|
||||||
|
Using Peephole Optimization
|
||||||
|
.]]
|
||||||
|
might be able to lend a hand
|
||||||
|
and help us out by delaying EM-code generation until it has reached the
|
||||||
|
end of the procedure.
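Why the leaf property is only known at the end of a procedure can be seen from a small sketch. It assumes that a procedure body is available as a list of EM mnemonics and that only cal and cai count as calls; the functions is_call and is_leaf are hypothetical and not part of any existing optimizer.

```c
#include <string.h>

/* Returns 1 when the mnemonic is an EM call instruction. */
static int is_call(const char *mnemonic)
{
    return strcmp(mnemonic, "cal") == 0 || strcmp(mnemonic, "cai") == 0;
}

/* Returns 1 when none of the n instructions in body is a call.
 * The whole body has to be scanned before the answer is known. */
static int is_leaf(const char *const body[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (is_call(body[i]))
            return 0;
    return 1;
}
```

Because the answer depends on the last instruction as much as on the first, whoever emits the message has to buffer the procedure until its end has been seen.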
|
||||||
|
.PP
|
||||||
|
As with most optimizations, the main problem is that they have to be
|
||||||
|
implemented with the \*(Simes\*(So pseudo instruction.
|
||||||
|
Because the \*(Simes\*(So instruction can have many different meanings
|
||||||
|
depending on its argument,
|
||||||
|
it is important that all optimizers recognize and respect them. Addition
|
||||||
|
of even a single message will require careful inspection of, and maybe even
|
||||||
|
small changes to, each of the optimizers.
|
||||||
|
.bp
|
184
doc/sparc/A
Normal file
|
@ -0,0 +1,184 @@
|
||||||
|
.so init
|
||||||
|
.SH
|
||||||
|
A. MEASUREMENTS
|
||||||
|
.SH
|
||||||
|
A.1. \*(OQThe bottom line\*(CQ
|
||||||
|
.PP
|
||||||
|
Although examples often are most illustrative, the cruel world out there is
|
||||||
|
usually more interested in everyday performance figures. To satisfy those
|
||||||
|
people too, we will present a series of measurements on our code expander
|
||||||
|
taken from (close to) real life situations. These include measurements
|
||||||
|
of compile and run times of different programs,
|
||||||
|
compiled with different compilers.
|
||||||
|
.SH
|
||||||
|
A.2. Compile time measurements
|
||||||
|
.PP
|
||||||
|
Figure A.2.1 shows compile-time measurements for typical C code:
|
||||||
|
the dhrystone benchmark\(dg
|
||||||
|
.[ [
|
||||||
|
dhrystone
|
||||||
|
.]].
|
||||||
|
.FS
|
||||||
|
\(dg To be certain that we only tested the compiler and not the quality of
|
||||||
|
the code in the library, we have added our own version of
|
||||||
|
\fIstrcmp\fR and \fIstrcpy\fR and have not used the ones present in the
|
||||||
|
library.
|
||||||
|
.FE
|
||||||
|
The numbers represent the duration of each separate pass of the compiler.
|
||||||
|
The numbers at the end of each bar represent the total duration of the
|
||||||
|
compilation process. As with all measurements in this chapter, the
|
||||||
|
quoted time or duration is the sum of user and system time in seconds.
|
||||||
|
.PS
|
||||||
|
copy "pics/compile_bars"
|
||||||
|
.PE
|
||||||
|
.DS
|
||||||
|
.IP cem: 6
|
||||||
|
C to EM frontend
|
||||||
|
.IP opt:
|
||||||
|
EM peep-hole optimizer
|
||||||
|
.IP be:
|
||||||
|
EM to assembler backend
|
||||||
|
.IP cpp:
|
||||||
|
Sun's C preprocessor
|
||||||
|
.IP ccom:
|
||||||
|
Sun's C compiler
|
||||||
|
.IP iropt:
|
||||||
|
Sun's optimizer
|
||||||
|
.IP cg:
|
||||||
|
Sun's code generator
|
||||||
|
.IP as:
|
||||||
|
Sun's assembler
|
||||||
|
.IP ld:
|
||||||
|
Sun's linker
|
||||||
|
.ce 1
|
||||||
|
\fIFigure A.2.1: compile-time measurements.\fR
|
||||||
|
.DE
|
||||||
|
.sp
|
||||||
|
.PP
|
||||||
|
A close examination of the first two bars in figure A.2.1 shows that the maximum
|
||||||
|
achievable compile-time
|
||||||
|
gain compared to \fIcc\fR is about 50% for medium-sized
|
||||||
|
programs.\(dd
|
||||||
|
.FS
|
||||||
|
\(dd (cpp+ccom+as+ld)/(cem+as+ld) = 1.53
|
||||||
|
.FE
|
||||||
|
For small programs the gain will be less, due to the almost constant
|
||||||
|
start-up time of each pass in the compilation process. Only a
|
||||||
|
built-in assembler may increase this number up to
|
||||||
|
180% in the ideal case that the optimizer, backend and assembler
|
||||||
|
would run in zero time. Speed-ups of 5 to 10 times as mentioned in
|
||||||
|
.[ [
|
||||||
|
fast portable compilers
|
||||||
|
.]]
|
||||||
|
are therefore not possible on the Sun-4 family. This is also due to
|
||||||
|
Sun's implementation of saving and restoring register windows. With
|
||||||
|
the current implementation in which only a single window is saved
|
||||||
|
or restored on a register-window overflow, it is very time consuming
|
||||||
|
when programs have highly dynamic stack use
|
||||||
|
due to procedure calls (as is often the case with compilers).
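The cost of saving a single window per trap can be estimated with a simple model. The sketch below is our own simplification, not a description of the Sun kernel: the function count_traps and its parameters are hypothetical, cap is the number of windows available to the program and k the number of windows moved per trap (with 1 <= k < cap).

```c
/* Count window traps for `loops` repetitions of a chain of `calls`
 * nested calls followed by the matching returns. */
static long count_traps(int cap, int k, int calls, int loops)
{
    int resident = 1;	/* windows in registers; starts with the caller's */
    int spilled = 0;	/* windows saved in memory */
    long traps = 0;
    int l, c, n;

    for (l = 0; l < loops; l++) {
        for (c = 0; c < calls; c++) {
            if (resident >= cap) {		/* overflow trap */
                n = k < resident ? k : resident;
                resident -= n;
                spilled += n;
                traps++;
            }
            resident++;				/* window of the callee */
        }
        for (c = 0; c < calls; c++) {
            resident--;				/* callee returns */
            if (resident == 0 && spilled > 0) {	/* underflow trap */
                n = k < spilled ? k : spilled;
                spilled -= n;
                resident += n;
                traps++;
            }
        }
    }
    return traps;
}
```

In this model, with four windows and chains of five calls, moving one window per trap costs four traps per chain, whereas moving two windows per trap costs only two.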
|
||||||
|
.PP
|
||||||
|
Although we are currently a little slower than \fIcc\fR, it is hard to
|
||||||
|
blame this on our backend. Optimizing the backend so that it would run
|
||||||
|
twice as fast would only reduce the total compilation process by
|
||||||
|
a mere 14%.
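The 14% follows from simple arithmetic. In the sketch below, time_saved is a hypothetical helper; the backend share of 0.28 used in the usage note is not a measured value but is inferred from the 14% quoted above.

```c
/* Fraction of the total time that is saved when a pass taking
 * `fraction` of the total runs `speedup` times as fast. */
static double time_saved(double fraction, double speedup)
{
    return fraction * (1.0 - 1.0 / speedup);
}
```

For example, time_saved(0.28, 2.0) is about 0.14: if the backend takes roughly 28% of the total compile time, doubling its speed saves about 14%.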
|
||||||
|
.PP
|
||||||
|
Finally it is nice to see that our push/pop-optimization,
|
||||||
|
initially designed to generate faster code, has also increased the
|
||||||
|
compilation speed. (See also figures A.4.1 and A.4.2.)
|
||||||
|
.SH
|
||||||
|
A.3. Run time performance
|
||||||
|
.PP
|
||||||
|
Figure A.3.1 shows the run-time performance of different compilers.
|
||||||
|
All results are normalized, where the best available compiler (Sun's
|
||||||
|
compiler with full optimization) is represented by 1.0 on our scale.
|
||||||
|
.PS
|
||||||
|
copy "pics/run-time_bars"
|
||||||
|
.PE
|
||||||
|
.ce 1
|
||||||
|
\fIFigure A.3.1: run time performance.\fR
|
||||||
|
.sp 1
|
||||||
|
.PP
|
||||||
|
The fact that our compiler behaves rather poorly compared to Sun's
|
||||||
|
compiler is due to the fact that the dhrystone benchmark uses
|
||||||
|
relatively many subroutine calls, all of which have to be 'emulated'
|
||||||
|
by our backend.
|
||||||
|
.SH
|
||||||
|
A.4. Overall performance
|
||||||
|
.LP
|
||||||
|
In the next two figures we will show the combined run and compile time
|
||||||
|
performance of 'our' compiler (the ACK C frontend and our backend)
|
||||||
|
compared to Sun's C compiler. Figure A.4.1 shows the results from
|
||||||
|
measurements on the dhrystone benchmark.
|
||||||
|
.G1
|
||||||
|
frame invis left solid bot solid
|
||||||
|
label left "run time" "(in \(*msec/dhrystone)"
|
||||||
|
label bot "compile time (in sec)"
|
||||||
|
coord x 0,21 y 0,610
|
||||||
|
ticks left out from 0 to 600 by 200
|
||||||
|
ticks bot out from 0 to 20 by 5
|
||||||
|
"\(bu" at 3.5, 1000000/1700
|
||||||
|
"ack w/o opt" ljust at 3.5 + 1, 1000000/1700
|
||||||
|
"\(bu" at 2.8, 1000000/8770
|
||||||
|
"ack with opt" below at 2.8 + 0.1, 1000000/8770
|
||||||
|
"\(bu" at 16.0, 1000000/10434
|
||||||
|
"ack -O4" above at 16.0, 1000000/10434
|
||||||
|
"\(bu" at 2.3, 1000000/7270
|
||||||
|
"\fIcc\fR" above at 2.3, 1000000/7270
|
||||||
|
"\(bu" at 9.0, 1000000/12500
|
||||||
|
"\fIcc -O4\fR" above at 9.0, 1000000/12500
|
||||||
|
"\(bu" at 5.9, 1000000/15250
|
||||||
|
"\fIcc -O\fR" below at 5.9, 1000000/15250
|
||||||
|
.G2
|
||||||
|
.ce 1
|
||||||
|
\fIFigure A.4.1: overall performance on dhrystones.\fR
|
||||||
|
.sp 1
|
||||||
|
.LP
|
||||||
|
Fortunately for us, dhrystones are not all there is. The following
|
||||||
|
figure shows the same measurements as the previous one, except
|
||||||
|
this time we took a benchmark that uses no subroutines: an implementation
|
||||||
|
of Eratosthenes' sieve:
|
||||||
|
.G1
|
||||||
|
frame invis left solid bot solid
|
||||||
|
label left "run time" "for one run" "(in sec)" left .6
|
||||||
|
label bot "compile time (in sec)"
|
||||||
|
coord x 0,11 y 0,21
|
||||||
|
ticks bot out from 0 to 10 by 5
|
||||||
|
ticks left out from 0 to 20 by 5
|
||||||
|
"\(bu" at 2.5, 17.28
|
||||||
|
"ack w/o opt" above at 2.5, 17.28
|
||||||
|
"\(bu" at 1.6, 2.93
|
||||||
|
"ack with opt" above at 1.6, 2.93
|
||||||
|
"\(bu" at 9.4, 2.26
|
||||||
|
"ack -O4" above at 9.4, 2.26
|
||||||
|
"\(bu" at 1.5, 7.43
|
||||||
|
"\fIcc\fR" above at 1.5, 7.43
|
||||||
|
"\(bu" at 2.7, 2.02
|
||||||
|
"\fIcc -O4\fR" ljust at 1.9, 1.2
|
||||||
|
"\(bu" at 2.6, 2.10
|
||||||
|
"\fIcc -O\fR" ljust at 3.1,2.5
|
||||||
|
.G2
|
||||||
|
.ce 1
|
||||||
|
\fIFigure A.4.2: overall performance on Eratosthenes' sieve.\fR
|
||||||
|
.sp 1
|
||||||
|
.PP
|
||||||
|
Although the above figures speak for themselves, a small comment
|
||||||
|
may be in order. First, it is clear that our compiler is neither
|
||||||
|
faster than \fIcc\fR, nor produces faster code than \fIcc -O4\fR. It should
|
||||||
|
also be noted however, that we do produce better code than \fIcc\fR
|
||||||
|
at only a very small additional cost.
|
||||||
|
It is also worth noticing that push-pop optimization
|
||||||
|
increases run-time speed as well as compile speed.
|
||||||
|
The first seems rather obvious,
|
||||||
|
since optimized code is
|
||||||
|
faster code, but the increase in compile speed may come as a surprise.
|
||||||
|
The main reason is that the \fIas\fR+\fIld\fR time depends largely on the
|
||||||
|
amount of generated code, which in general
|
||||||
|
depends on the efficiency of the code.
|
||||||
|
Push-pop optimization removes a lot of useless instructions which
|
||||||
|
would otherwise
|
||||||
|
have found their way through to the assembler and the loader.
|
||||||
|
Useless instructions inserted in an early stage in the compilation
|
||||||
|
process will slow down every following stage, so elimination of useless
|
||||||
|
instructions in an early stage, even when it requires a little computational
|
||||||
|
overhead, can often be beneficial to the overall compilation speed.
|
||||||
|
.bp
|
128
doc/sparc/B
Normal file
|
@ -0,0 +1,128 @@
|
||||||
|
.so init
|
||||||
|
.SH
|
||||||
|
B. IMPLEMENTATION
|
||||||
|
.SH
|
||||||
|
B.1. Excerpts from the non-optimized EM_table
|
||||||
|
.PP
|
||||||
|
Even though the non-optimized version of the EM_table is relatively
|
||||||
|
straightforward, examples have never hurt anybody.
|
||||||
|
One of the simplest instructions is the \*(Siloc\*(So, which appears in
|
||||||
|
our EM_table as follows:
|
||||||
|
.DS
|
||||||
|
\f6
|
||||||
|
.TA 8 16 24 32 40 48 56 64
|
||||||
|
C_loc ==> "set $1, T1";
|
||||||
|
"dec 4, SP";
|
||||||
|
"st T1, [SP]".
|
||||||
|
\f1
|
||||||
|
.DE
|
||||||
|
Just as \*(SiSP\*(So is an alias for \*(Si%l0\*(So, \*(SiT1\*(So is
|
||||||
|
an alias for \*(Si%g1\*(So.
|
||||||
|
A little more complex is the \*(Siadi\*(So which performs integer
|
||||||
|
addition.
|
||||||
|
.DS
|
||||||
|
\f6
|
||||||
|
C_adi ==> "ld [SP], T1";
|
||||||
|
"ld [SP+4], T2";
|
||||||
|
"add T1, T2, T3";
|
||||||
|
"st T3, [SP+4]";
|
||||||
|
"inc 4, SP".
|
||||||
|
\f1
|
||||||
|
.DE
|
||||||
|
We could go on with even more complex instructions, but since that would
|
||||||
|
not contribute anything, the reader is referred to the implementation
|
||||||
|
for more details.
|
||||||
|
.SH
|
||||||
|
B.2. Excerpts from the optimized EM_table
|
||||||
|
.PP
|
||||||
|
The optimized EM_table uses the cache primitives mentioned in chapter 4.
|
||||||
|
This means that the \*(Siloc\*(So this time appears as
|
||||||
|
.DS
|
||||||
|
\f6
|
||||||
|
C_loc ==> push_const($1).
|
||||||
|
\f1
|
||||||
|
.DE
|
||||||
|
The \*(Silol\*(So can now be written as
|
||||||
|
.DS
|
||||||
|
\f6
|
||||||
|
C_lol ==> push_reg(LB);
|
||||||
|
inc_tos($1);
|
||||||
|
push_const(4);
|
||||||
|
C_los(4).
|
||||||
|
\f1
|
||||||
|
.DE
|
||||||
|
Due to the law of conservation of misery somebody has to do the dirty work.
|
||||||
|
In this case, it is the \*(Silos\*(So. To show just a small part of
|
||||||
|
the implementation of the \*(Silos\*(So:
|
||||||
|
.DS
|
||||||
|
\f6
|
||||||
|
C_los $1 == 4 ==>
|
||||||
|
if (type_of_tos() == T_cst) {
|
||||||
|
arith size;
|
||||||
|
const_str_t n;
|
||||||
|
|
||||||
|
size= pop_const();
|
||||||
|
if (size <= 4) {
|
||||||
|
reg_t a;
|
||||||
|
reg_t b;
|
||||||
|
char *LD;
|
||||||
|
|
||||||
|
switch (size) {
|
||||||
|
case 1: LD = "ldub"; break;
|
||||||
|
case 2: LD = "lduh"; break;
|
||||||
|
case 4: LD = "ld"; break;
|
||||||
|
default: arg_error("C_los", size);
|
||||||
|
}
|
||||||
|
a = pop_reg_c13(n);
|
||||||
|
b = alloc_reg();
|
||||||
|
"$LD [$a+$n], $b";
|
||||||
|
push_reg(b);
|
||||||
|
free_reg(a);
|
||||||
|
} else ...
|
||||||
|
\f1
|
||||||
|
.DE
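How such cache primitives could work is sketched below. The names push_const, push_reg, pop_const, type_of_tos and T_cst are taken from the excerpts above, but the bodies, the T_reg tag, the cache_elem type and the fixed cache size are our own assumptions; the real implementation would also have to emit code to flush the cache to memory.

```c
#include <assert.h>

enum kind { T_cst, T_reg };	/* T_reg is an assumed tag */

/* One cached top-of-stack element: a constant or a register number. */
typedef struct {
    enum kind kind;
    long value;
} cache_elem;

#define CACHE_MAX 16		/* arbitrary size for this sketch */
static cache_elem cache[CACHE_MAX];
static int cache_depth = 0;

/* Record a constant at compile time; no code is emitted. */
static void push_const(long c)
{
    assert(cache_depth < CACHE_MAX);
    cache[cache_depth].kind = T_cst;
    cache[cache_depth].value = c;
    cache_depth++;
}

/* Record that the top of stack lives in register r. */
static void push_reg(int r)
{
    assert(cache_depth < CACHE_MAX);
    cache[cache_depth].kind = T_reg;
    cache[cache_depth].value = r;
    cache_depth++;
}

static enum kind type_of_tos(void)
{
    assert(cache_depth > 0);
    return cache[cache_depth - 1].kind;
}

/* Retrieve a constant that was pushed earlier, again without code. */
static long pop_const(void)
{
    assert(cache_depth > 0 && cache[cache_depth - 1].kind == T_cst);
    return cache[--cache_depth].value;
}
```

With this in place, a push_const(4) followed by a C_los never touches memory for the size operand: the code expander sees T_cst on top and pops the 4 at compile time.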
|
||||||
|
For the full implementation, the reader is again referred to the actual
|
||||||
|
implementation. Just to show how other instructions are affected
|
||||||
|
by the optimization we will show the implementation of the \*(Sitge\*(So
|
||||||
|
instruction:
|
||||||
|
.DS
|
||||||
|
\f6
|
||||||
|
C_tge ==> {
|
||||||
|
reg_t a;
|
||||||
|
reg_t b;
|
||||||
|
|
||||||
|
a = pop_reg();
|
||||||
|
b = alloc_reg();
|
||||||
|
" tst $a";
|
||||||
|
" bge,a 1f";
|
||||||
|
" mov 1, $b"; /* delay slot */
|
||||||
|
" set 0, $b";
|
||||||
|
"1:";
|
||||||
|
free_reg(a);
|
||||||
|
push_reg(b);
|
||||||
|
}.
|
||||||
|
|
||||||
|
\f1
|
||||||
|
.DE
|
||||||
|
.SH
|
||||||
|
.bp
|
||||||
|
CREDITS
|
||||||
|
.PP
|
||||||
|
In order of appearance:
|
||||||
|
.TS
|
||||||
|
center;
|
||||||
|
r c l.
|
||||||
|
Original idea - Dick Grune
|
||||||
|
Design & implementation - Philip Homburg
|
||||||
|
- Raymond Michiels
|
||||||
|
Tutor - Dick Grune
|
||||||
|
Assistant Tutor - Ceriel Jacobs
|
||||||
|
Proofreading - Dick Grune
|
||||||
|
- Hans van Eck
|
||||||
|
.TE
|
||||||
|
.SH
|
||||||
|
REFERENCES
|
||||||
|
.PP
|
||||||
|
.[
|
||||||
|
$LIST$
|
||||||
|
.]
|
10
doc/sparc/Makefile
Normal file
|
@ -0,0 +1,10 @@
|
||||||
|
# $Header$
|
||||||
|
|
||||||
|
REFER=refer
|
||||||
|
TBL=tbl
|
||||||
|
TARGET=-Tlp
|
||||||
|
PIC=pic
|
||||||
|
GRAP=grap
|
||||||
|
|
||||||
|
../sparc.doc: refs title intro 1 2 3 4 5 A B init
|
||||||
|
$(REFER) -sA+T '-l\", ' -p refs title intro 1 2 3 4 5 A B | $(GRAP) | $(PIC) | $(TBL) | soelim > $@
|
18
doc/sparc/init
Normal file
|
@ -0,0 +1,18 @@
|
||||||
|
.nr PS 12
|
||||||
|
.nr VS 14
|
||||||
|
.\" .fp 6 AM
|
||||||
|
.fp 6 CW
|
||||||
|
.ds Si \f6\s-1
|
||||||
|
.ds So \f1\s+1
|
||||||
|
.ds OQ `\h'-1p'`
|
||||||
|
.ds CQ '\h'-1p''
|
||||||
|
.de UX
|
||||||
|
.ie \\n(UX \s-1UNIX\s0\\$1
|
||||||
|
.el \{\
|
||||||
|
\s-1UNIX\s0\\$1\(dg
|
||||||
|
.FS
|
||||||
|
\(dg \s-1UNIX\s0 is a registered bell of AT&T Trademark Laboratories.
|
||||||
|
.FE
|
||||||
|
.nr UX 1
|
||||||
|
.\}
|
||||||
|
..
|
23
doc/sparc/intro
Normal file
|
@ -0,0 +1,23 @@
|
||||||
|
.so init
|
||||||
|
.hw de-vised
|
||||||
|
.TL
|
||||||
|
A fast backend for SPARC processors
|
||||||
|
.AU
|
||||||
|
Philip Homburg
|
||||||
|
Raymond Michiels
|
||||||
|
.AI
|
||||||
|
Dept. of Mathematics and Computer Science
|
||||||
|
Vrije Universiteit
|
||||||
|
Amsterdam, The Netherlands
|
||||||
|
.AB
|
||||||
|
The language EM is an intermediate language for use in compiler
|
||||||
|
construction.
|
||||||
|
In this paper we describe the construction of a so-called fast backend
|
||||||
|
which translates EM code to assembler for SPARC processors.
|
||||||
|
.br
|
||||||
|
Our construction deviates strongly from the usual procedure. We have
|
||||||
|
devised and implemented a virtual stack with which it is possible to
|
||||||
|
generate very acceptable code without much loss in compile time.
|
||||||
|
.AE
|
||||||
|
.PP
|
||||||
|
.bp
|
58
doc/sparc/note_on_reg_wins
Normal file
|
@ -0,0 +1,58 @@
|
||||||
|
When developing a fast compiler for the Sun-4 series we have encountered
|
||||||
|
rather strange behavior of the Sun kernel.
|
||||||
|
|
||||||
|
The problem is that when you have lots of nested procedure calls, (as
|
||||||
|
is often the case in compilers and parsers) the registers fill up which
|
||||||
|
causes a kernel trap. The kernel will then write out some of the registers
|
||||||
|
to memory to make room for another window. When you return from the nested
|
||||||
|
procedure call, just the reverse happens: yet another kernel trap so the
|
||||||
|
kernel can load the registers from memory.
|
||||||
|
|
||||||
|
Unfortunately the kernel only saves or loads a single window (= 16 registers)
|
||||||
|
on each trap. This means that when you call a procedure recursively it causes
|
||||||
|
a kernel trap on almost every invocation (except for the first few).
|
||||||
|
|
||||||
|
To illustrate this consider the following little program:
|
||||||
|
|
||||||
|
--------------- little program -------------
|
||||||
|
f(i) /* calls itself i times */
|
||||||
|
int i;
|
||||||
|
{
|
||||||
|
if (i)
|
||||||
|
f(i-1);
|
||||||
|
}
|
||||||
|
|
||||||
|
main(argc, argv)
|
||||||
|
int argc;
|
||||||
|
char *argv[];
|
||||||
|
{
|
||||||
|
|
||||||
|
|
||||||
|
int i, j;
i = atoi(argv[1]); /* # loops */
|
||||||
|
j = atoi(argv[2]); /* depth */
|
||||||
|
|
||||||
|
while (i--)
|
||||||
|
f(j);
|
||||||
|
}
|
||||||
|
------------ end of little program -----------
|
||||||
|
|
||||||
|
|
||||||
|
The performance decreases abruptly when the depth (j) becomes larger
|
||||||
|
than 5. On a SPARC station we got the following results:
|
||||||
|
|
||||||
|
depth run time (in seconds)
|
||||||
|
|
||||||
|
1 0.5
|
||||||
|
2 0.8
|
||||||
|
3 1.0
|
||||||
|
4 1.4 <- from here on it's +6 seconds for each
|
||||||
|
5 7.6 step deeper.
|
||||||
|
6 13.9
|
||||||
|
7 19.9
|
||||||
|
8 26.3
|
||||||
|
9 32.9
|
||||||
|
|
||||||
|
Things would be a lot better if, instead of just 1 window, the kernel would
|
||||||
|
save or restore 4 windows (= 64 registers = 50% on our SPARC stations).
|
||||||
|
|
||||||
|
-Raymond.
|
12
doc/sparc/pics/.distr
Normal file
|
@ -0,0 +1,12 @@
|
||||||
|
EM_stack.orig
|
||||||
|
EM_stack.ours
|
||||||
|
compile_bars
|
||||||
|
mem_config
|
||||||
|
perf
|
||||||
|
perf.comp
|
||||||
|
perf.d
|
||||||
|
perf.dhry
|
||||||
|
reg_layout
|
||||||
|
run-time_bars
|
||||||
|
run-time_bars.bup
|
||||||
|
signal_stack
|
34
doc/sparc/pics/EM_stack.orig
Normal file
|
@ -0,0 +1,34 @@
|
||||||
|
.PS
|
||||||
|
.ps -2
|
||||||
|
.vs -2
|
||||||
|
boxwid = 1.5;
|
||||||
|
boxht = 0.24
|
||||||
|
down;
|
||||||
|
box "actual parameter n-1";
|
||||||
|
box "." "." "." ht 0.6;
|
||||||
|
box "actual parameter 0";
|
||||||
|
move 0.3
|
||||||
|
box "return status block";
|
||||||
|
{arrow <- right with .w at last box.e; \
|
||||||
|
box invis wid 0.3 "LB" }
|
||||||
|
down
|
||||||
|
move to 2nd last box.s
|
||||||
|
move 0.1
|
||||||
|
box "local variables"
|
||||||
|
box "compiler temporaries"
|
||||||
|
move 0.1
|
||||||
|
box "register save block"
|
||||||
|
move 0.1
|
||||||
|
box "dynamic local generators"
|
||||||
|
move 0.1
|
||||||
|
box "operand"
|
||||||
|
box "operand"
|
||||||
|
move 0.1
|
||||||
|
box "parameter m-1"
|
||||||
|
box "." "." "." ht 0.6;
|
||||||
|
box "parameter 0" with .n at last box .s
|
||||||
|
{ arrow <- right with .w at last box.e; \
|
||||||
|
box invis wid 0.3 "SP" }
|
||||||
|
.ps +2
|
||||||
|
.vs +2
|
||||||
|
.PE
|
106
doc/sparc/pics/EM_stack.ours
Normal file
|
@ -0,0 +1,106 @@
.ps 10
.vs 12
.PS
boxwid = 1.3
boxht = 0.25
down;
box "floating point" "register dump area" ht 0.6
box "tmp float store"
box "register dump area" ht 0.6
{ arrow <- right with .w at 3/4 <last box.e, last box.se>; \
box invis wid 0.3 "%fp" }
move .1
box dotted "gap"
{ arrow <- right with .w at last box.e; \
box invis wid 0.3 "%LB" }
move .1
box "locals"
box "actual parameter n-1";
box "." "." "." ht 0.6;
box "actual parameter 0";
{ arrow <- right with .w at last box.e; \
box invis wid 0.3 "%SP" }
move 0.1
box "large gap" "(>64kb)" ht 1.0
box "register dump area" ht 0.6
{ arrow <- right with .w at 3/4 <last box.e, last box.se>; \
box invis wid 0.3 "%sp" }
move 0.2
box invis "\\s+2just before call\\s0"
move 1
box dotted "gap"
box invis "0 or 4 bytes" "for stack alignment" with .w at last box.e
box invis height .7 "when gap is 0 bytes," "%fp == %LB" with .n at 2nd last box.s
.PF
.PS
down;
move to 2.4,0
box "floating point" "register dump area" ht 0.6
box "tmp float store"
box "register dump area" ht 0.6
{ arrow <- right with .w at 3/4 <last box.e, last box.se>; \
box invis wid 0.3 "%fp" }
move .1
box dotted "gap"
{ arrow <- right with .w at last box.e; \
box invis wid 0.3 "%LB" }
move .1
box "locals"
box "actual parameter n-1";
box "." "." "." ht 0.6;
box "actual parameter 0";
{ arrow <- right with .w at last box.e; \
box invis wid 0.3 "%SP" }
move .1
box dotted "gap"
move .4
box "floating point" "register dump area" ht 0.6
box "tmp float store"
box "register dump area" ht 0.6
{ arrow <- right with .w at 3/4 <last box.e, last box.se>; \
box invis wid 0.3 "%sp" }
move 0.2
box invis "\\s+2'during' call\\s0"
.PF
.PS
down;
move to 4.8,0
box "floating point" "register dump area" ht 0.6
box "tmp float store"
box "register dump area" ht 0.6
move .1
box dotted "gap"
move .1
box "locals"
box "actual parameter n-1";
box "." "." "." ht 0.6;
box "actual parameter 0";
move .1
box dotted "gap"
move .4
box "floating point" "register dump area" ht 0.6
box "tmp float store"
box "register dump area" ht 0.6
{ arrow <- right with .w at 3/4 <last box.e, last box.se>; \
box invis wid 0.3 "%fp" }
move .1
box dotted "gap"
{ arrow <- right with .w at last box.e; \
box invis wid 0.3 "%LB" }
move .1
box "locals"
box "actual parameter n-1";
box "." "." "." ht 0.6;
box "actual parameter 0";
{ arrow <- right with .w at last box.e; \
box invis wid 0.3 "%SP" }
move 0.1
box "large gap" "(>64kb)" ht 1.0
box "register dump area" ht 0.6
{ arrow <- right with .w at 3/4 <last box.e, last box.se>; \
box invis wid 0.3 "%sp" }
move 0.2
box invis "\\s+2after call\\s0"
.PF
.ps 12
.vs 14
49
doc/sparc/pics/compile_bars
Normal file
@ -0,0 +1,49 @@
.PS
boxht = 0.5
boxwid = 1
moveht = 0.65
down;
{
right;
box invis "ACK" "w/o" "opt"
box "cem" "0.7" wid 0.7
box "opt" "0.4" wid 0.4
box "be" "1.1" wid 1.1
box "as" "1.4" wid 1.4
box "ld" "0.4" wid 0.4
box invis "4.0" wid 0.5
}
move
{
right;
box invis "ACK" "with" "opt"
box "cem" "0.7" wid 0.7
box "opt" "0.4" wid 0.4
box "be" "0.6" wid 0.6
box "as" "0.7" wid 0.7
box "ld" "0.4" wid 0.4
box invis "2.8" wid 0.5
}
move
{
right;
box invis "\fIcc\fR"
box "cpp" "0.2" wid 0.2
box "ccom" "1.0" wid 1.0
box "as" "0.7" wid 0.7
box "ld" "0.4" wid 0.4
box invis "2.3" wid 0.5
}
move
{
right;
box invis "\fIcc -O4\fR"
box "cpp" "0.2" wid 0.2
box "ccom" "1.0" wid 1.0
box "iropt" "5.0 (not to scale!)" wid 1.5
box "cg" "0.7" wid 0.7
box "as" "1.7" wid 1.7
box "ld" "0.4" wid 0.4
box invis "9.0" wid 0.5
}
.PE
34
doc/sparc/pics/mem_config
Normal file
@ -0,0 +1,34 @@
.PS
boxwid = 1.3
down
[
right
[
down;
box "stack" ht .6
box "free" ht 1
box "heap" ht .3
box "text" ht .5
]
move 1
[
down;
box "\s-4SPARC stack\s+4" ht .2
box "\s-4EM stack\s+4" ht .1
box "\s-4SPARC stack\s+4" ht .1
box "\s-4EM stack\s+4" ht .1
box "\s-4free\s+4" ht .2
box "\s-4SPARC stack\s+4" ht .1
box "free" ht .8
box "heap" ht .3
box "text" ht .5
]
]
move .3
[
right
box invis "regular \(UX memory layout"
move 1
box invis "memory layout for EM"
]
.PF
12
doc/sparc/pics/perf
Normal file
@ -0,0 +1,12 @@
.G1
frame invis left solid bot solid
label left "run time" "(log scale)" left .5
label bot "compile time (log scale)"
coord x 0.1,10 log x y 1000,20000 log y
ticks left out at 2000,5000,10000,20000
ticks bot out at 0.1 0.3 1.0 3.0 10
copy "perf.d" thru X
"\(bu" at $1, $2
"$3" rjust at $1, $2
X
.G2
7
doc/sparc/pics/perf.comp
Normal file
@ -0,0 +1,7 @@
in-line in ../A

2.5 17.28 ack w/o opt
1.6 2.93 ack with opt
9.4 2.26 ack -O4
1.5 7.43 \fIcc\fR
2.7 2.02 \fIcc -O4\fR
4
doc/sparc/pics/perf.d
Normal file
@ -0,0 +1,4 @@
1.0 1700 ack w/o opt
1.9 8000 ack with opt
1.6 8000 \fIcc\fR
7 18000 \fIcc -O4\fR
7
doc/sparc/pics/perf.dhry
Normal file
@ -0,0 +1,7 @@
in-line in ../A

3.5 1700 ack w/o opt
2.8 8770 ack with opt
16.0 10434 ack -O4
2.3 7270 \fIcc\fR
9.0 12500 \fIcc -O4\fR
24
doc/sparc/pics/reg_layout
Normal file
@ -0,0 +1,24 @@
.nr PS 12
.nr VS 14
.PP
.TS
allbox;
l l l l
l2f6 l l2f6 l.
g0	0	l0	EM_SP
g1	temporary 1	l1	EM_LB
g2	temporary 2	l2
g3	temporary 3	l3	reserved
g4	64k..1M	l4	reserved
g5	temporary 4	l5	reserved
g6	line number	l6	reserved
g7	file name	l7	reserved
o0	param 1	i0
o1	param 2	i1
o2	param 3	i2
o3	param 4	i3
o4	RETL_LD	i4	RETL_ST
o5	RETH_LD	i5	RETH_ST
sp	stack pointer	fp	frame pointer
o7	xxx	i7	return address
.TE
101
doc/sparc/pics/run-time_bars
Normal file
@ -0,0 +1,101 @@
.PS
boxht = 0.5
boxwid = 1
moveht = 1
down;
{
right;
box invis "ACK" "w/o" "opt."
move
[
down;
boxht = 0.25
box wid 4.5
"Sieve" ljust at last box.w + 0.1,-0.02
"10(!)" ljust at last box.e + 0.1,-0.02
box wid 4.5 with .nw at last box.sw
"Dhrystones" ljust at last box.w + 0.1,-0.02
"10(!)" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "ACK" "with" "our" "opt."
move
[
down;
boxht = 0.25
box wid 1.4
"Sieve" ljust at last box.w + 0.1,-0.02
"1.4" ljust at last box.e + 0.1,-0.02
box wid 1.9 with .nw at last box.sw
"Dhrystones" ljust at last box.w + 0.1,-0.02
"1.9" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "ACK" "-O4"
move
[
down;
boxht = 0.25
box wid 1.1
"Sieve" ljust at last box.w + 0.1,-0.02
"1.1" ljust at last box.e + 0.1,-0.02
box wid 1.6 with .nw at last box.sw
"Dhrystones" ljust at last box.w + 0.1,-0.02
"1.6" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "Sun's" "compiler" "w/o opt."
move
[
down;
boxht = 0.25
box wid 3.7
"Sieve" ljust at last box.w + 0.1,-0.02
"3.7" ljust at last box.e + 0.1,-0.02
box wid 2.2 with .nw at last box.sw
"Dhrystones" ljust at last box.w + 0.1,-0.02
"2.2" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "Sun's" "compiler" "-O"
move
[
down;
boxht = 0.25
box wid 1.1
"Sieve" ljust at last box.w + 0.1,-0.02
"1.1" ljust at last box.e + 0.1,-0.02
box wid 0.8 with .nw at last box.sw
"Dhryst." ljust at last box.w + 0.1,-0.02
"0.8!" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "Sun's" "compiler" "-O4"
move
[
down;
boxht = 0.25
box wid 1.0
"Sieve" ljust at last box.w + 0.1,-0.02
"1.0" ljust at last box.e + 0.1,-0.02
box wid 1.0 with .nw at last box.sw
"Dhrystones" ljust at last box.w + 0.1,-0.02
"1.0" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
.PE
100
doc/sparc/pics/run-time_bars.bup
Normal file
@ -0,0 +1,100 @@
.PS
boxht = 0.5
boxwid = 1
moveht = 1
down;
{
right;
box invis "ACK" "w/o" "opt"
move
[
down;
boxht = 0.25
box wid 4.5
"C (arithmetic)" ljust at last box.w + 0.1,-0.02
"10(!)" ljust at last box.e + 0.1,-0.02
box wid 4.5 with .nw at last box.sw
"C (dhrystones)" ljust at last box.w + 0.1,-0.02
"10(!)" ljust at last box.e + 0.1,-0.02
box wid 4.5 with .nw at last box.sw
"Modula-2" ljust at last box.w + 0.1,-0.02
"8(!)" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "ACK" "with" "peep-hole" "opt"
move
[
down;
boxht = 0.25
box wid 1.4
"C (arithmetic)" ljust at last box.w + 0.1,-0.02
"1.4" ljust at last box.e + 0.1,-0.02
box wid 1.9 with .nw at last box.sw
"C (dhrystones)" ljust at last box.w + 0.1,-0.02
"1.9" ljust at last box.e + 0.1,-0.02
box wid 2.5 with .nw at last box.sw
"Modula-2" ljust at last box.w + 0.1,-0.02
"2.5" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "ACK" "-O4"
move
[
down;
boxht = 0.25
box wid 1.1
"C (arithmetic)" ljust at last box.w + 0.1,-0.02
"1.1" ljust at last box.e + 0.1,-0.02
box wid 1.6 with .nw at last box.sw
"C (dhrystones)" ljust at last box.w + 0.1,-0.02
"1.6" ljust at last box.e + 0.1,-0.02
box wid 2.5 with .nw at last box.sw
"Modula-2" ljust at last box.w + 0.1,-0.02
"2.5" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "Sun's" "compiler" "w/o opt."
move
[
down;
boxht = 0.25
box wid 3.7
"C (arithmetic)" ljust at last box.w + 0.1,-0.02
"3.7" ljust at last box.e + 0.1,-0.02
box wid 2.2 with .nw at last box.sw
"C (dhrystones)" ljust at last box.w + 0.1,-0.02
"2.2" ljust at last box.e + 0.1,-0.02
box wid 1.8 with .nw at last box.sw
"Modula-2" ljust at last box.w + 0.1,-0.02
"1.8" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
move
{
right;
box invis "Sun's" "compiler" "-O4"
move
[
down;
boxht = 0.25
box wid 1.0
"C (arith.)" ljust at last box.w + 0.1,-0.02
"1.0" ljust at last box.e + 0.1,-0.02
box wid 1.0 with .nw at last box.sw
"C (dhryst.)" ljust at last box.w + 0.1,-0.02
"1.0" ljust at last box.e + 0.1,-0.02
box wid 1.0 with .nw at last box.sw
"Modula-2" ljust at last box.w + 0.1,-0.02
"1.0" ljust at last box.e + 0.1,-0.02
] with .w at last box.e
}
.PE
42
doc/sparc/pics/signal_stack
Normal file
@ -0,0 +1,42 @@
.PS
boxwid = 1.3
down
[
right
[
down;
box "\s-4SPARC stack\s+4" ht .2
box "\s-4EM stack\s+4" ht .1
box "\s-4SPARC stack\s+4" ht .1
box "\s-4EM stack\s+4" ht .1
box "\s-4free\s+4" ht .2
box "\s-4SPARC stack\s+4" ht .1
box "free" ht .8
box "heap" ht .3
box "text" ht .5
]
move 1
[
down;
box "\s-4SPARC stack\s+4" ht .2
box "\s-4EM stack\s+4" ht .1
box "\s-4SPARC stack\s+4" ht .1
box "\s-4EM stack\s+4" ht .1
box "\s-4free\s+4" ht .2
box "\s-4SPARC stack\s+4" ht .1
box "\s-4EM stack\s+4" ht .1
box "\s-4free\s+4" ht .2
box "\s-4SPARC stack\s+4" ht .1
box "free" ht .4
box "heap" ht .3
box "text" ht .5
]
]
move .3
[
right
box invis "before signal"
move 1
box invis "during (1st) signal"
]
.PF
31
doc/sparc/printP4P
Normal file
@ -0,0 +1,31 @@
echo "$0"
# Pick the troff preprocessors that chapter $1 needs.
case "$1" in
1 )
	CMD="cat"
	;;
2 )
	CMD="cat"
	;;
3 )
	CMD="cat"
	;;
4 )
	CMD="pic | tbl"
	;;
5 )
	CMD="tbl"
	;;
A )
	CMD="grap | pic"
	;;
B )
	CMD="tbl"
	;;
esac
echo "$0"
# Compare the base name only, so the test also works when the script is run via a path.
NAME=`basename "$0"`
if [ "$NAME" = printP4P ]
then
	# Invoked as printP4P: typeset and send to the printer.
	refer -sA+T '-l\", ' -p refs "$1" | eval $CMD | troff -ms -Tp4p | dip -Tp4p -Pp4p
else
	# Invoked under any other name: preview on screen.
	xtroff -full -geom 665x883+566+0 -command "refer -sA+T '-l\", ' -p refs $1 | $CMD | troff -ms -Tp4p"
fi
185
doc/sparc/refs
Normal file
@ -0,0 +1,185 @@
%T The design of very fast portable compilers
%A A.S. Tanenbaum
%A M.F. Kaashoek
%A K.G. Langendoen
%A C.J.H. Jacobs
%J SIGPLAN Notices
%V 24
%N 11
%P 125-131
%D November 1989

%T A Programmer-friendly LL(1) Parser Generator
%A D. Grune
%A C.J.H. Jacobs
%J Software \- Practice and Experience
%V 18
%N 1
%P 29-38
%D January 1988

%T The Code Expander Generator
%A Frans Kaashoek
%A Koen Langendoen
%R IM-9
%I Vrije Universiteit, Amsterdam
%D November 1987

%T The ACK Pascal Compiler
%A Aad Geudeke
%A Frans Hofmeester
%R IM-8
%I Vrije Universiteit, Amsterdam
%D November 1987

%T The EM-interpreter
%A Eddo de Groot
%A Leo van den Berge
%R IM-7
%I Vrije Universiteit, Amsterdam
%D June 1987

%T A set of multi\-process primitives for stack based machines
%A K. Bot
%A E. Scheffer
%R IR-122
%I Vrije Universiteit, Amsterdam
%D December 1986

%T An Occam Compiler
%A K. Bot
%A E. Scheffer
%R IM-6
%I Vrije Universiteit, Amsterdam
%D December 1986

%T Language- and Machine-independent Global Optimization on Intermediate Code
%A H.E. Bal
%A A.S. Tanenbaum
%J Computer Languages
%V 11
%N 2
%P 105-121
%D April 1986

%T The ACK Target Optimizer
%A H.E. Bal
%R IR-107
%D 1985
%I Vrije Universiteit, Amsterdam

%T Some Topics in Parser Generation
%A C.J.H. Jacobs
%R IR-105
%D October 1985
%I Vrije Universiteit, Amsterdam

%T The CEM compiler
%A E.H. Baalbergen
%A D. Grune
%A M. Waage
%R IM-4
%I Vrije Universiteit, Amsterdam
%D 1985

%T The Design and Implementation of the EM Global Optimizer
%A H.E. Bal
%I Vrije Universiteit, Amsterdam
%R IR-99
%D March 1985

%T Does anybody out there want to write HALF of a compiler?
%A A.S. Tanenbaum
%A E.G. Keizer
%A H. van Staveren
%J SIGPLAN Notices
%V 19
%N 8
%P 106-108
%D August 1984

%T Amsterdam Compiler Kit documentation
%A A.S. Tanenbaum et al.
%I Vrije Universiteit, Amsterdam
%R IR-90
%D June 1984

%T A Practical Toolkit for Making Portable Compilers
%A A. S. Tanenbaum
%A H. van Staveren
%A E. G. Keizer
%A J. W. Stevenson
%J Communications of the ACM
%V 26
%N 9
%P 654-660
%D September 1983

%T Description of a Machine Architecture for use with Block Structured
Languages
%A A. S. Tanenbaum
%A H. van Staveren
%A E. G. Keizer
%A J. W. Stevenson
%R IR-81
%D August 1983
%I Vrije Universiteit, Amsterdam

%T A Unix Toolkit for Making Portable Compilers
%A A.S. Tanenbaum
%A H. van Staveren
%A E.G. Keizer
%A J.W. Stevenson
%J Proceedings USENIX conf.
%C Toronto, Canada
%V 26
%D July 1983
%P 255-261

%T Using Peephole Optimization on Intermediate Code
%A A.S. Tanenbaum
%A J.M. van Staveren
%A J.W. Stevenson
%J TOPLAS
%V 4
%N 1
%P 21-36
%D January 1982

%T EM-1 Compiler
%A A.S. Tanenbaum
%J Pascal News
%D September 1981
%P 4-38

%T A portable compiler for the Proposed ISO Standard Pascal Language
%A A.S. Tanenbaum
%A J.W. Stevenson
%A H. van Staveren
%J SIGPLAN Notices
%V 15
%N 10
%D 1980

%T Implications of Structured Programming for Machine Architecture
%A A.S. Tanenbaum
%J CACM
%V 21
%N 3
%P 237-246
%D March 1978

%T The table driven code generator from the Amsterdam Compiler Kit (Second
revised edition)
%A H. van Staveren
%I Vrije Universiteit, Amsterdam
%R on-line internal ACK documentation
%D early 1985

%T Dhrystone Benchmark: Rationale for Version 2 and Measurement Rules
%A R.P. Weicker
%J SIGPLAN Notices
%V 23
%N 8
%D August 1988
%P 49-62
22
doc/sparc/timing
Normal file
@ -0,0 +1,22 @@
DHRYSTONES V2.0

cc cc -O4 cc -O fccO fccCE ack ack -O4
compile time:
real 4.0 12.0 10.0 6.4 8.0 31.0
user 1.6 7.3 4.1 1.9 1.8 2.0 9.3
sys 0.9 2.1 1.8 2.5 1.5 2.0 7.7

run time: 7263 16250 15250 4730 3430 8474 10434
(stones/sec)

SIEVE

cc cc -O4 fccO fccCE ack ack -O4
compile time:
real 2.4 4.4 x 3.3 6.4 17.0
user 0.8 1.6 x 0.7 0.7 3.2
sys 0.7 1.0 x 0.8 1.3 6.2

run time: 7.43 2.02 x 12.18 2.93 2.26

All ack-derived compilers are shell script driven
15
doc/sparc/title
Normal file
@ -0,0 +1,15 @@
.so init
.TL
.sp 1.2c
A fast backend for SPARC processors
.AU
Philip Homburg
Raymond Michiels
.AI
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands
.PP
.sp 1i
Graduation report, 20 August 1990
.bp