ceriel 1991-11-01 09:43:36 +00:00
parent 0633c900a8
commit 44cc075183
23 changed files with 2115 additions and 0 deletions

50
doc/pascal/ab+intro.doc Normal file

@ -0,0 +1,50 @@
.TL
The ACK Pascal Compiler
.AU
Aad Geudeke
Frans Hofmeester
.AI
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands
.AB
This document describes the implementation of a Pascal to EM compiler. The
compiler is written in C. The lexical analysis is done by a hand-written
lexical analyzer. The parser is generated by the extended LL(1) parser
generator LLgen, whose actions also perform the semantic analysis. Several EM
utility modules are used in the compiler.
.AE
.sp 2
.NH
Introduction
.PP
.nh
The Pascal front end of the Amsterdam Compiler Kit (ACK) complies with the
requirements of the international standard published by the International
Organization for Standardization (ISO) [ISO]. An informal description of the
programming language Pascal, which unfortunately does not conform to the
standard, is given in [JEN].
.PP
The main reason for rewriting the Pascal compiler was the lack of flexibility
of the old Pascal compiler, which was written in Pascal itself. It did not
meet the needs of the current ACK framework, which makes use of modern parsing
techniques and utility modules. In this framework it is, for example, possible
to use a fast back end, which translates directly to object code [ACK]. Our
compiler is written in C and is designed similarly to the current C and
Modula-2 compilers of ACK.
.PP
Chapter 2 describes the basic structure of the compiler. Chapter 3 discusses
the code generation of the main Pascal constructs. Chapter 4 covers one of
the major components of Pascal, viz. the conformant array. In Chapter 5 the
various compiler options are enumerated. The extensions to the standard and
the deviations from the standard are listed in Chapters 6 and 7. Chapter 8
presents some ideas to improve the standard. Chapter 9 gives a short overview
of testing the compiler. The major differences between the old and new
compiler can be found in Chapter 10. Suggestions to improve the compiler are
described in Chapter 11. The appendices contain the grammar of Pascal, the
changes made to the ACK Pascal run time library, and an example translation
of a Pascal program to EM code.
.bp

89
doc/pascal/compar.doc Normal file

@ -0,0 +1,89 @@
.sp 2
.NH
Comparison with the Pascal-VU compiler
.nh
.LP
In this chapter, the differences from the Pascal-VU compiler [IM2] are listed.
The points enumerated below can be used as improvements to the compiler (see
also Chapter 11).
.sp
.NH 2
Deviations
.LP
.sp
- large labels
.in +3m
only labels in the closed interval 0..9999 are allowed, whereas the
Pascal-VU compiler allows any unsigned integer as a label.
.in -3m
- goto
.in +3m
unlike the old compiler, the new compiler conforms to the standard. The
following program, which contains an illegal jump to label 1 inside the
for-statement, is accepted by the Pascal-VU compiler.
.nf
\fBprogram\fR illegal_goto(output);
\fBlabel\fR 1;
\fBvar\fR i : integer;
\fBbegin\fR
\fBgoto\fR 1;
\fBfor\fR i := 1 \fBto\fR 10 \fBdo\fR
\fBbegin\fR
1 : writeln(i);
\fBend\fR;
\fBend\fR.
.fi
This program is rejected by the new compiler.
.in -3m
.NH 2
Extensions
.LP
.sp
The extensions implemented by the Pascal-VU compiler are listed in
Chapter 5 of [IM2].
.sp
- separate compilation
.ti +3m
the new compiler only accepts programs, not modules.
- assertions
.ti +3m
not implemented.
- additional procedures
.ti +3m
the procedures \fIhalt, mark\fR and \fIrelease\fR are not available.
.bp
- UNIX\(tm interfacing
.ti +3m
the \-c option is not implemented.
.FS
\(tm UNIX is a Trademark of Bell Laboratories.
.FE
- double length integers
.ti +3m
integer size can be set with the \-V option, so the additional type \fIlong\fR
is not implemented.
.NH 2
Compiler options
.LP
.sp
The options implemented by the Pascal-VU compiler are listed in
Chapter 7 of [IM2].
.sp
The construction "{$....}" is not recognized.
The options \fIa, c, d, s\fR and \fIt\fR are not available.
The \-l option has been replaced by the \-L option.
The size of reals can be set with the \-V option.

88
doc/pascal/conf.doc Normal file

@ -0,0 +1,88 @@
.sp 1.5i
.nr H1 3
.NH
Conformant Arrays
.nh
.LP
.sp
A fifth kind of parameter, besides the value, variable, procedure, and function
parameter, is the conformant array parameter (\fBISO 6.6.3.7\fR). This
parameter, undoubtedly the major addition to Pascal from the compiler writer's
point of view, has been implemented. With this kind of parameter, the required
bounds of the index-type of an actual parameter are not fixed, but are
restricted to a specified range of values. Two types of conformant array
parameters can be distinguished: variable conformant array parameters and
value conformant array parameters.
.sp
.NH 2
Variable conformant array parameters
.LP
.sp
The treatment of variable conformant array parameters is comparable to that of
normal variable parameters: both use \fIcall by reference\fR as the parameter
mechanism.
.br
An example is:
.br
.in +5m
to sort variable length arrays of integers, the following Pascal procedure could be used:
.nf
\fBprocedure\fR bubblesort(\fBvar\fR A : \fBarray\fR[low..high : integer] \fBof\fR integer);
\fBvar\fR i, j, t : integer;
\fBbegin
for\fR j := high - 1 \fBdownto\fR low \fBdo
for\fR i := low \fBto\fR j \fBdo
if\fR A[i+1] < A[i] \fBthen
begin\fR t := A[i]; A[i] := A[i+1]; A[i+1] := t \fBend
end\fR;
.fi
.in -5m
For every actual parameter, the base address of the array is pushed onto the
stack, and for every index-type-specification exactly one array descriptor
is pushed.
.sp
.NH 2
Value conformant array parameters
.LP
.sp
The treatment of value conformant array parameters is more complex than its
variable counterpart.
.br
An example is:
.br
.in +5m
an unpacked array of characters could be printed as a string with the following program part:
.nf
\fBprocedure\fR WriteAsString( A : \fBarray\fR[low..high : integer] \fBof\fR char);
\fBvar\fR i : integer;
\fBbegin
for\fR i := low \fBto\fR high \fBdo\fR write(A[i]);
\fBend\fR;
.fi
.in -5m
The calling procedure pushes the base address of the actual parameter and
the array descriptors belonging to it onto the stack. Subsequently, the
procedure using the conformant array parameter is called. Because it is a
\fIcall by value\fR, the called procedure has to create a copy of the actual
parameter. Normally this implies that the calling procedure must know how much
space on the stack has to be reserved for the parameters. If the actual
parameter is itself a conformant array, whose size is only known at run time,
the called procedure keeps track of the size of the activation record instead.
Hence the restrictions on the use of value conformant array parameters, as
specified in \fBISO 6.6.3.7.2\fR, are dropped.
A description of the EM code generated by the compiler is:
.nf
.ft I
load the stack adjustment so far
load base address of array parameter
compute the size in bytes of the array
add this size to the stack adjustment
copy the array
remember the new address of the array
.ft R
.fi

41
doc/pascal/contents.doc Normal file

@ -0,0 +1,41 @@
.sp 1.5i
.ps 12
.vs 14
.ft B
Contents\fR\h'+108u'\h'+5i'Page
\h'+34u'1. Introduction \h'+34u'\h'+1.5i'1
\h'+34u'2. The compiler \h'+34u'\h'+1.5i'2
\h'+34u'3. Translation of Pascal to EM \h'+34u'\h'+1.5i'5
\h'+34u'4. Conformant arrays \h'+1.5i'10
\h'+34u'5. Compiler options \h'+1.5i'11
\h'+34u'6. Extensions to the standard \h'+1.5i'13
\h'+34u'7. Deviations from the standard \h'+1.5i'13
\h'+34u'8. Hints to change the standard \h'+1.5i'15
\h'+34u'9. Testing the compiler \h'+1.5i'16
10. Comparison with the old compiler \h'+1.5i'16
11. Improvements to the compiler \h'+1.5i'17
12. History & Acknowledgements \h'+1.5i'18
13. References \h'+1.5i'19
\fBAppendices\fR
\h'+16u'A. ISO-PASCAL Grammar \h'+1.5i'20
\h'+24u'B. Changes to run time library \h'+1.5i'26
\h'+20u'C. An example \h'+1.5i'28

118
doc/pascal/deviations.doc Normal file

@ -0,0 +1,118 @@
.sp 2
.NH
Deviations from the standard
.nh
.PP
The compiler deviates from the ISO 7185 standard with respect to the
following clauses:
.IP "\fBISO 6.1.3:\fR" 14
\h'-5u'Identifiers may be of any length and all characters of an identifier
shall be significant in distinguishing between them.
.sp
.in +3m
The constant IDFSIZE, defined in the file \fIidfsize.h\fR, determines
the (maximum) significant length of an identifier. It can be set at run
time with the \-M option (see also section on compiler options).
.in -3m
.sp
.IP "\fBISO 6.1.8:\fR"
\h'-5u'There shall be at least one separator between any pair of consecutive tokens
made up of identifiers, word-symbols, labels or unsigned-numbers.
.sp
.in +3m
A token separator is not needed when a number is followed by an identifier
or a word-symbol. For example, the input sequence 2\fBthen\fR is recognized
as the integer 2 followed by the keyword \fBthen\fR.
.in -3m
.sp
.IP "\fBISO 6.2.1:\fR"
\h'-29u'The label-declaration-part shall specify all labels that prefix a statement
in the corresponding statement-part.
.sp
.ti +3m
The compiler generates a warning if a label is declared but never defined.
.bp
.IP "\fBISO 6.2.2:\fR"
\h'-9u'The scope of identifiers and labels should start at the beginning of the
block in which these identifiers or labels are declared.
.sp
.in +3m
Like most other one-pass compilers, the compiler deviates in this respect,
because the scope of identifiers and labels starts at their defining-point.
.nf
.in +4m
\fBprogram\fR deviates\fB;
const\fR
x \fB=\fR 3\fB;
procedure\fR p\fB;
const\fR
y \fB=\fR x\fB;\fR
x \fB=\fR true\fB;
begin end;
begin
end.\fR
.in -4m
.fi
In procedure p, the constant y has the integer value 3. This program does not
conform to the standard. In [SAL] a simple algorithm for enforcing the scope
rules is described. It involves numbering all scopes encountered in the
program in order of their opening, and recording in each identifier table
entry the number of the latest scope in which it is used.
Note: The compiler does not deviate from the standard in the following program:
.nf
.in +4m
\fBprogram\fR conforms\fB;
type\fR
x \fB=\fR real\fB;
procedure\fR p\fB;
type\fR
y \fB= ^\fRx\fB;\fR
x \fB=\fR boolean\fB;
var\fR
p \fB:\fR y\fB;
begin end;
begin
end.\fR
.in -4m
.fi
In procedure p, the variable p is a pointer to boolean.
.fi
.in -3m
.sp
.IP "\fBISO 6.4.3.2:\fR"
The standard specifies that any ordinal type is allowed as index-type.
.sp
.in +3m
The required type \fIinteger\fR is not allowed as index-type, i.e.
.ti +2m
\fBARRAY [ \fIinteger\fB ] OF\fR <component-type>
is not permitted.
.br
This could be implemented, but such an array would have 2*maxint+1
elements, which might cause problems on machines with a small memory.
.in -3m
.sp
.IP "\fBISO 6.4.3.3:\fR"
\h'-1u'The type possessed by the variant-selector, called the tag-type, must
be an ordinal type, so the integer type is permitted. The values denoted by
all case-constants shall be distinct and the set thereof shall be equal
to the set of values specified by the tag-type.
.sp
.in +3m
Because it is impracticable to enumerate all integers as case-constants,
the integer type is not permitted as tag-type. Allowing it as tag-type
would not make a great difference.
.in -3m
.sp
.IP "\fBISO 6.8.3.9:\fR"
The standard specifies that the control-variable of a for-statement is not
allowed to be modified while executing the loop.
.sp
.in +3m
Violation of this rule is not detected. An algorithm to implement this rule
can be found in [PCV].

92
doc/pascal/example.doc Normal file

@ -0,0 +1,92 @@
.sp 1.5i
.ft B
Appendix C: An example
.ft R
.nh
.nf
\h'+10u' 1 \fBprogram\fR factorials(input, output);
\h'+10u' 2 { This program prints factorials }
\h'+10u' 3
\h'+10u' 4 \fBconst\fR
\h'+10u' 5 FAC1 = 1;
\h'+10u' 6 \fBvar\fR
\h'+10u' 7 i : integer;
\h'+10u' 8
\h'+10u' 9 \fBfunction\fR factorial(n : integer) : integer;
10 \fBbegin\fR
11 \fBif\fR n = FAC1 \fBthen\fR
12 factorial := FAC1
13 \fBelse\fR
14 factorial := n * factorial(n-1);
15 \fBend\fR;
16
17 \fBbegin\fR
18 write('Give a number : ');
19 readln(i);
20 \fBif\fR i < 1 \fBthen\fR
21 writeln('No factorial')
22 \fBelse\fR
23 writeln(factorial(i):1);
24 \fBend\fR.
.bp
.po
.DS
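\h'|0.5i'mes 2,4,4\h'|3.5i'loc 16
\&.1\h'|3.5i'cal $_wrs
\h'|0.5i'rom 'factorials.p\(rs000'\h'|3.5i'asp 12
i\h'|3.5i'lin 19
\h'|0.5i'bss 4,0,0\h'|3.5i'lae input
output\h'|3.5i'cal $_rdi
\h'|0.5i'bss 540,0,0\h'|3.5i'asp 4
input\h'|3.5i'lfr 4
\h'|0.5i'bss 540,0,0\h'|3.5i'ste i
\h'|0.5i'exp $factorial\h'|3.5i'lae input
\h'|0.5i'pro $factorial, ?\h'|3.5i'cal $_rln
\h'|0.5i'mes 9,4\h'|3.5i'asp 4
\h'|0.5i'lin 11\h'|3.5i'lin 20
\h'|0.5i'lol 0\h'|3.5i'loe i
\h'|0.5i'loc 1\h'|3.5i'loc 1
\h'|0.5i'cmi 4\h'|3.5i'cmi 4
\h'|0.5i'teq\h'|3.5i'tlt
\h'|0.5i'zeq *1\h'|3.5i'zeq *1
\h'|0.5i'lin 12\h'|3.5i'lin 21
\h'|0.5i'loc 1\h'|3i'.4
\h'|0.5i'stl -4\h'|3.5i'rom 'No factorial'
\h'|0.5i'bra *2\h'|3.5i'lae output
1\h'|3.5i'lae .4
\h'|0.5i'lin 14\h'|3.5i'loc 12
\h'|0.5i'lol 0\h'|3.5i'cal $_wrs
\h'|0.5i'lol 0\h'|3.5i'asp 12
\h'|0.5i'loc 1\h'|3.5i'lae output
\h'|0.5i'sbi 4\h'|3.5i'cal $_wln
\h'|0.5i'cal $factorial\h'|3.5i'asp 4
\h'|0.5i'asp 4\h'|3.5i'bra *2
\h'|0.5i'lfr 4\h'|3i'1
\h'|0.5i'mli 4\h'|3.5i'lin 23
\h'|0.5i'stl -4\h'|3.5i'lae output
2\h'|3.5i'loe i
\h'|0.5i'lin 15\h'|3.5i'cal $factorial
\h'|0.5i'mes 3,0,4,0,0\h'|3.5i'asp 4
\h'|0.5i'lol -4\h'|3.5i'lfr 4
\h'|0.5i'ret 4\h'|3.5i'loc 1
\h'|0.5i'end 4\h'|3.5i'cal $_wsi
\h'|0.5i'exp $m_a_i_n\h'|3.5i'asp 12
\h'|0.5i'pro $m_a_i_n, ?\h'|3.5i'lae output
\h'|0.5i'mes 9,0\h'|3.5i'cal $_wln
\h'|0.5i'fil .1\h'|3.5i'asp 4
\&.2\h'|3i'2
\h'|0.5i'con input, output\h'|3.5i'lin 24
\h'|0.5i'lxl 0\h'|3.5i'loc 0
\h'|0.5i'lae .2\h'|3.5i'cal $_hlt
\h'|0.5i'loc 2\h'|3.5i'end 0
\h'|0.5i'lxa 0\h'|3.5i'mes 4,24,'factorials.p\(rs000'
\h'|0.5i'cal $_ini
\h'|0.5i'asp 16
\h'|0.5i'lin 18
\&.3
\h'|0.5i'rom 'Give a number : '
\h'|0.5i'lae output
\h'|0.5i'lae .3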
.DE

60
doc/pascal/extensions.doc Normal file

@ -0,0 +1,60 @@
.pl 12i
.sp 1.5i
.NH
Extensions to Pascal as specified by ISO 7185
.nh
.IP "\fBISO 6.1.3:\fR" 14
\h'-11u'The underscore is treated as a letter when the \-u option is turned
on (see also section 5.2). This is implemented to be compatible with
Pascal-VU and can be used in identifiers to increase readability.
.sp
.IP "\fBISO 6.1.4:\fR"
\h'-12u'The directive \fIextern\fR can be used in a procedure-declaration or
function-declaration to specify that the procedure-block or function-block
corresponding to that declaration is external to the program-block. This can
be used in conjunction with library routines.
.sp
.IP "\fBISO 6.1.9:\fR"
\h'-22u'An alternative representation for the following tokens and delimiting
characters is recognized:
.in +5m
.nf
.ft 5
\fBtoken\h'|2i'alternative token\fP
.sp 0.5
^\h'|2i'@
[\h'|2i'(.
]\h'|2i'.)
.sp
\fBdelimiting character\h'|2i'alternative delimiting pair of characters\fP
.sp 0.5
{\h'|2i'(*
}\h'|2i'*)
.ft R
.fi
.in -5m
.sp
.IP "\fBISO 6.6.3.7.2:\fR"
\h'-1u'A conformant array parameter can be passed as an actual parameter to a
value conformant array parameter without the restrictions imposed by the
standard. The compiler gives a warning in this case. This is implemented to
keep the parameter mechanism orthogonal (see also Chapter 4).
.sp
.IP "\fBISO 6.9.3.1:\fR"
\h'-16u'If the value of the argument \fITotalWidth\fR of the required
procedure \fIwrite\fR is zero or negative, no characters are written for
character, string or boolean type arguments. If the value of the argument
\fIFracDigits\fR of the required procedure \fIwrite\fR is zero or negative,
the fraction and '.' character are suppressed for fixed-point arguments.

76
doc/pascal/hints.doc Normal file

@ -0,0 +1,76 @@
.sp 1.5i
.nr H1 7
.NH
Hints to change the standard
.nh
.sp
.LP
We encountered some difficulties when the compiler was developed. In this
chapter some hints are presented to change the standard, which would make
the implementation of the compiler easier. The semantics of Pascal would
not be altered by these adaptations.
.sp 2
.LP
\- Some changes to the grammar of Pascal that are minor from the user's point
of view, but which would make writing an LL(1) parser considerably easier, are:
.in +3m
.nf
field-list : [ ( fixed-part [ variant-part ] | variant-part ) ] .
fixed-part : record-section \fB;\fR { record-section \fB;\fR } .
variant-part : \fBcase\fR variant-selector \fBof\fR variant \fB;\fR { variant \fB;\fR } .
case-statement : \fBcase\fR case-index \fBof\fR case-list-element \fB;\fR { case-list-element \fB;\fR } \fBend\fR .
.fi
.in -3m
.LP
\- To ease the semantic checking of sets, the principle of qualified sets could
be used: every set-constructor would be preceded by its type-identifier:
.nf
.ti +3m
set-constructor : type-identifier \fB[\fR [ member-designator { \fB,\fR member-designator } ] \fB]\fR .
Example:
t1 = \fBset of\fR 1..5;
t2 = \fBset of\fR integer;
The type of [3, 5] would be ambiguous, but the type of t1[3, 5] would not.
.fi
.LP
\- Another problem arises from the fact that a function name can appear in
three distinct 'use' contexts: function call, assignment of function
result and as function parameter.
.br
Example:
.in +5m
.nf
\fBprogram\fR function_name;
\fBfunction\fR p(x : integer; function y : integer) : integer;
\fBbegin\fR .. \fBend\fR;
\fBfunction\fR f : integer;
\fBbegin\fR
f := p(f, f); (*)
\fBend\fR;
\fBbegin\fR .. \fBend\fR.
.fi
.in -5m
A possible solution in case of a call (also a procedure call) would be to
make the (possibly empty) actual-parameter-list mandatory. The assignment
of the function result could be replaced by a \fIreturn\fR statement,
though this would change the semantics of the program slightly.
.br
With these changes, the above statement (*) would read: return p(f(), f);
.LP
\- Another extension to the standard could be the implementation of an
\fIotherwise\fR clause in a case-statement. This would behave exactly like
the \fIdefault\fR clause in a switch-statement in C.
.bp

36
doc/pascal/his.doc Normal file

@ -0,0 +1,36 @@
.sp 2
.NH
History & Acknowledgements
.nh
.sp 2
.ft B
History
.ft R
.sp
.LP
The purpose of this project was to make a Pascal compiler that satisfies
the requirements of the ISO standard. The task was considerably simplified,
because parts of the Modula-2 compiler were used. This had the additional
advantage of increasing the uniformity of the compilers in ACK.
.br
During the development of the compiler, a number of errors were detected in
the Modula-2 compiler, the EM utility modules and the old Pascal compiler.
.sp 2
.ft B
Acknowledgements
.ft R
.sp
.LP
During the development of the compiler, valuable support was received from
a number of persons. In this regard we owe a debt of gratitude to
Fred van Beek, Casper Capel, Rob Dekker, Frank Engel, Jos\('e Gouweleeuw
and Sonja Keijzer (Jut and Jul !!), Herold Kroon, Martin van Nieuwkerk,
Sjaak Schouten, Eric Valk, and Didan Westra.
.br
Special thanks go to Dick Grune, who introduced us to the field of
Compiler Design and who helped test the compiler, and to Ceriel Jacobs, who
developed LLgen and the Modula-2 compiler of ACK. Finally we would like to
thank Erik Baalbergen, who supervised this entire project and
gave us many valuable suggestions.
.bp

87
doc/pascal/improv.doc Normal file

@ -0,0 +1,87 @@
.sp 2
.NH
Improvements to the compiler
.nh
.sp
.LP
For the sake of portability, a restrictive option could be implemented.
Under this option, extensions and warnings would be treated as errors.
.LP
The restrictions imposed by the standard on the control variable of a
for-statement should be implemented (\fBISO 6.8.3.9\fR).
.LP
To check whether a function returns a valid result, the following algorithm
could be used. When a function is entered a hidden temporary variable of
type boolean is created. This variable is initialized with the value false.
The variable is set to true, when an assignment to the function name occurs.
On exit of the function a test is performed on the variable. If the value
of the variable is false, a run-time error occurs.
.br
Note: The check has to be done at run time.
.LP
The \fIundefined value\fR should be implemented. A problem arises with
local variables, for which space on the stack is allocated. A possible
solution would be to generate code for the initialization of the local
variables with the undefined value at the beginning of a procedure or
function.
.br
The implementation for the global variables is easy, because \fBbss\fR
blocks are used.
.LP
Closely related to the last point is the generation of warnings when
variables are never used or assigned. This is not yet implemented.
.LP
The error messages could give more details about the errors that occurred,
if some additional testing is done.
.bp
.LP
Every time the compiler detects sets with different base-types, a warning
is given. Sometimes this is superfluous.
.nf
\fBprogram\fR sets(output);
\fBtype\fR
week = (sunday, monday, tuesday, wednesday, thursday, friday, saturday);
workweek = monday..friday;
\fBvar\fR
s : \fBset of\fR workweek;
day : week;
\fBbegin\fR
day := monday;
s := [day]; (* warning *)
day := saturday;
s := [day]; (* warning *)
\fBend\fR.
.fi
The new compiler gives two warnings; the first one is redundant, because monday belongs to workweek.
.LP
A weak point in the compiler is the way the procedures \fIread, readln,
write\fR and \fIwriteln\fR are handled (see also section 2.2). They have
been added to the grammar, so, unlike the other required procedures and
functions, they cannot be redefined. They should be removed from the
grammar altogether, although this would mean that more semantic checks
have to be performed.
.LP
No effort is made to detect possible run-time errors during compilation.
.br
For example, given a : \fBarray\fR[1..10] \fBof\fI something\fR, the array
selection a[11] is not reported, although its constant index lies outside
the index-type.
.LP
Some assistance in implementing the improvements mentioned above can be
obtained from [PCV].

342
doc/pascal/internal.doc Normal file

@ -0,0 +1,342 @@
.pl 12.5i
.sp 1.5i
.NH
The compiler
.nh
.LP
The compiler can be divided roughly into four modules:
.br
\(bu lexical analysis
.br
\(bu syntax analysis
.br
\(bu semantic analysis
.br
\(bu code generation
.br
The four modules are grouped into one pass, and their activities are
interleaved during this pass.
.br
The lexical analyzer, some expression handling routines and various
data structures were taken from the Modula-2 compiler.
.sp 2
.NH 2
Lexical Analysis
.LP
The first module of the compiler is the lexical analyzer. In this module, the
stream of input characters making up the source program is grouped into
\fItokens\fR, as defined in \fBISO 6.1\fR. The analyzer is hand-written,
because the lexical analyzer generator at our disposal, \fILex\fR [LEX],
produces much slower analyzers. A character table, in the file
\fIchar.c\fR, is created using the program \fItab\fR, which takes as input
the file \fIchar.tab\fR. In this table each character is placed into a
particular class. The classes, as defined in the file \fIclass.h\fR,
each represent a set of tokens. The strategy of the analyzer is as follows: the
first character of a new token is used in a multiway branch to eliminate as
many candidate tokens as possible. Then the remaining characters of the token
are read. The constant INP_NPUSHBACK, defined in the file \fIinput.h\fR,
specifies the maximum number of characters the analyzer looks ahead. The
value has to be at least 3, to handle input sequences such as:
.br
1e+4 (which is a real number)
.br
1e+a (which is the integer 1, followed by the identifier "e", a plus, and the identifier "a")
.br
Another aspect of this module is the insertion and deletion of tokens
required by the parser to recover from syntactic errors (see also section
2.2). A generic input module [ACK] is used to avoid the burden of I/O.
.sp 2
.NH 2
Syntax Analysis
.LP
The second module of the compiler is the parser, which is the central part of
the compiler. It invokes the routines of the other modules. The tokens obtained
from the lexical analyzer are grouped into grammatical phrases. These phrases
are stored as parse trees and handed over to the next part. The parser is
generated using \fILLgen\fR [LL], a tool for generating an efficient recursive
descent parser without backtracking from an Extended Context Free Syntax.
.br
An error recovery mechanism is generated almost completely automatically. A
routine called \fILLmessage\fR had to be written, which gives the necessary
error messages and deals with the insertion and deletion of tokens.
The routine \fILLmessage\fR must accept one parameter, whose value is
a token number, zero or -1. A zero parameter indicates that the current token
(the one in the external variable \fILLsymb\fR) is deleted.
A -1 parameter indicates that the parser expected end of file, but did
not get it. The parser will then skip tokens until end of file is detected.
A parameter that is a token number (a positive parameter) indicates that
this token is to be inserted in front of the token currently in \fILLsymb\fR.
Also, care must be taken that the token currently in \fILLsymb\fR is
returned again, with the proper attributes, by the \fBnext\fR call to the
lexical analyzer. So, the lexical analyzer must have a facility to push back
one token.
.br
Calls to the two standard procedures \fIwrite\fR and \fIwriteln\fR differ
from calls to other procedures, because the syntax of a write-parameter
(e.g. x:10:2) differs from the syntax of an actual-parameter. We decided to
include them, together with \fIread\fR and \fIreadln\fR, in the grammar. An
alternative solution would be to make the syntax of an actual-parameter
identical to the syntax of a write-parameter. Afterwards, each parameter would
have to be checked to see whether it is used properly.
.bp
As the parser is LL(1), it must always be able to determine what to do,
based on the last token read (\fILLsymb\fR). Unfortunately, this was not the
case with the grammar as specified in [ISO]. Two kinds of problems
appeared, viz. the \fBalternation\fR and \fBrepetition\fR conflicts.
The examples given in the following paragraphs are taken from the grammar.
.NH 3
Alternation conflict
.LP
An alternation conflict arises when the parser can not decide which
production to choose.
.br
\fBExample:\fR
.in +2m
.ft 5
.nf
procedure-declaration : procedure-heading \fB';'\f5 directive |
.br
\h'\w'procedure-declaration : 'u'procedure-identification \fB';'\f5 procedure-block |
.br
\h'\w'procedure-declaration : 'u'procedure-heading \fB';'\f5 procedure-block ;
.br
procedure-heading : \fBprocedure\f5 identifier [ formal-parameter-list ]? ;
.br
procedure-identification : \fBprocedure\f5 procedure-identifier ;
.fi
.ft R
.in -2m
A sentence that starts with the terminal \fBprocedure\fR can be derived from
each of the three alternative productions. This conflict can be resolved in
two ways: by adjusting the grammar, which usually means replacing some rules
by one rule and doing more work in the semantic analysis; or by using the
LLgen conflict resolver "\fB%if\fR (C-expression)": if the C-expression
evaluates to non-zero, the production in question is chosen, otherwise one
of the remaining productions is chosen. The grammar rules were rewritten to
solve this conflict. The new rules are given below. For more details see the
file \fIdeclar.g\fR.
.in +2m
.ft 5
.nf
procedure-declaration : procedure-heading \fB';'\f5 ( directive | procedure-block ) ;
.br
procedure-heading : \fBprocedure\f5 identifier [ formal-parameter-list ]? ;
.fi
.ft R
.in -2m
A special case of an alternation conflict, which is common to many block
structured languages, is the \fI"dangling-else"\fR ambiguity.
.in +2m
.ft 5
.nf
if-statement : \fBif\f5 boolean-expression \fBthen\f5 statement [ else-part ]? ;
.br
else-part : \fBelse\f5 statement ;
.fi
.ft R
.in -2m
The following statement that can be derived from the rules above is ambiguous:
.ti +2m
\fBif\f5 boolean-expr-1 \fBthen\f5 \fBif\f5 boolean-expr-2 \fBthen\f5 statement-1 \fBelse\f5 statement-2
.ft R
.ps 8
.vs 7
.PS
move right 1.1i
S: line down 0.5i
"if-statement" at S.start above
.ft B
"then" at S.end below
.ft R
move to S.start then down 0.25i
L: line left 0.5i then down 0.25i
box ht 0.33i wid 0.6i "boolean" "expression-1"
move to L.start then left 0.5i
L: line left 0.5i then down 0.25i
.ft B
"if" at L.end below
.ft R
move to L.start then right 0.5i
L: line right 0.5i then down 0.25i
"statement" at L.end below
move to L.end then down 0.10i
L: line down 0.25i dashed
"if-statement" at L.end below
move to L.end then down 0.10i
L: line down 0.5i
.ft B
"then" at L.end below
.ft R
move to L.start then down 0.25i
L: line left 0.5i then down 0.25i
box ht 0.33i wid 0.6i "boolean" "expression-2"
move to L.start then left 0.5i
L: line left 0.5i then down 0.25i
.ft B
"if" at L.end below
.ft R
move to L.start then right 0.5i
L: line right 0.5i then down 0.25i
box ht 0.33i wid 0.6i "statement-1"
move to L.start then right 0.5i
L: line right 0.5i then down 0.25i
.ft B
"else" at L.end below
.ft R
move to L.start then right 0.5i
L: line right 0.5i then down 0.25i
box ht 0.33i wid 0.6i "statement-2"
move to S.start
move right 3.5i
L: line down 0.5i
"if-statement" at L.start above
.ft B
"then" at L.end below
.ft R
move to L.start then down 0.25i
L: line left 0.5i then down 0.25i
box ht 0.33i wid 0.6i "boolean" "expression-1"
move to L.start then left 0.5i
L: line left 0.5i then down 0.25i
.ft B
"if" at L.end below
.ft R
move to L.start then right 0.5i
S: line right 0.5i then down 0.25i
"statement" at S.end below
move to S.start then right 0.5i
L: line right 0.5i then down 0.25i
.ft B
"else" at L.end below
.ft R
move to L.start then right 0.5i
L: line right 0.5i then down 0.25i
box ht 0.33i wid 0.6i "statement-2"
move to S.end then down 0.10i
L: line down 0.25i dashed
"if-statement" at L.end below
move to L.end then down 0.10i
L: line down 0.5i
.ft B
"then" at L.end below
.ft R
move to L.start then down 0.25i
L: line left 0.5i then down 0.25i
box ht 0.33i wid 0.6i "boolean" "expression-2"
move to L.start then left 0.5i
L: line left 0.5i then down 0.25i
.ft B
"if" at L.end below
.ft R
move to L.start then right 0.5i
L: line right 0.5i then down 0.25i
box ht 0.33i wid 0.6i "statement-1"
.PE
.ps
.vs
\h'615u'(a)\h'1339u'(b)
.sp
.ce
Two parse trees showing the \fIdangling-else\fR ambiguity
.sp 2
According to the standard, \fBelse\fR is matched with the nearest preceding
unmatched \fBthen\fR, i.e. parse tree (a) is valid (\fBISO 6.8.3.4\fR).
This conflict is statically resolved in LLgen by using "\fB%prefer\fR",
which is equivalent in behaviour to "\fB%if\fR(1)".
.bp
.NH 3
Repetition conflict
.LP
A repetition conflict arises when the parser can not decide whether to choose
a production once more, or not.
.br
\fBExample:\fR
.in +2m
.ft 5
.nf
field-list : [ ( fixed-part [ \fB';'\f5 variant-part ]? | variant-part ) [ \fB';'\f5 ]? ]? ;
.br
fixed-part : record-section [ \fB';'\f5 record-section ]* ;
.fi
.in -2m
.ft R
When the parser sees the semicolon, it can not decide whether another
record-section or a variant-part follows. This conflict can be resolved in
two ways: adjusting the grammar or using the conflict resolver,
"\fB%while\fR (C-expression)". The grammar rules that deal with this conflict
were completely rewritten. For more details, the reader is referred to the
file \fIdeclar.g\fR.
.sp 2
.NH 2
Semantic Analysis
.LP
The third module of the compiler checks the semantic conventions of
ISO-Pascal. To check the program being parsed, actions are used in
LLgen. An action consists of several C-statements, enclosed in braces
"{" and "}". In order to facilitate communication between the actions and
\fILLparse\fR, the parsing routines can be given C-like parameters and
local variables. An important part of the semantic analyzer is the symbol
table. This table stores all information concerning identifiers and their
definitions. Symbol-table lookup and hashing are done by a generic namelist
module [ACK]. The parser turns each program construction into a parse tree,
which is the major data structure in the compiler. This parse tree is used
to exchange information between various routines.
.sp 2
.NH 2
Code Generation
.LP
The final module in the compiler is that of code generation. The information
stored in the parse trees is used to generate the EM code [EM]. EM code is
generated with the help of a procedural EM-code interface [ACK]. Static code
exchanges are avoided, because the fast back end cannot cope with them;
hence the EM pseudoinstruction \fBexc\fR is never generated.
.br
Chapter 3 discusses the code generation in more detail.
.sp 2
.NH 2
Error Handling
.LP
The first three modules can all detect errors in the Pascal program being
compiled. When an error is detected, an appropriate message is given and
appropriate action is taken. If code generation has to be aborted, an error
message is given; otherwise a warning is given. The constant MAXERR_LINE,
defined in the file \fIerrout.h\fR, specifies the maximum number of messages
given per line. This avoids long lists of error messages caused by, for
example, the omission of a ';'. Three kinds of errors can be distinguished:
lexical, syntactic and semantic errors. Examples of these are, respectively,
nested comments, an expression with unbalanced parentheses, and the
addition of two characters.
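.sp
The per-line limit can be pictured with the following sketch. It is not the
code of the compiler's error routines: the helper \fImay_report\fR, its two
counters and the value 5 for MAXERR_LINE are assumptions made for
illustration only. A caller would print a message only when the helper
returns 1.
.nf
.ft 5
```c
#include <assert.h>

#define MAXERR_LINE 5   /* assumed value; the real one is defined in errout.h */

static int last_line = -1;    /* source line of the previous message */
static int count_on_line = 0; /* messages already given for that line */

/* Hypothetical helper: returns 1 if another message may be given for
   source line `line`, and 0 once MAXERR_LINE messages have been given. */
int may_report(int line)
{
	assert(line > 0);               /* source lines are numbered from 1 */
	if (line != last_line) {        /* first message on a new line */
		last_line = line;
		count_on_line = 0;
	}
	if (count_on_line >= MAXERR_LINE)
		return 0;               /* limit reached: suppress */
	count_on_line++;
	return 1;
}
```
.ft R
.fi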
.sp 2
.NH 2
Memory Allocation and Garbage Collection
.LP
The routines \fIst_alloc\fR and \fIst_free\fR provide a mechanism for
maintaining free lists of structures whose first field is a pointer called
\fBnext\fR. This field is used to chain free structures together. Each
structure with tag ST has a free list pointed to by h_ST. Associated with
this list are two operations: \fInew_ST()\fR, an allocation routine that
supplies the space for a new ST struct, and \fIfree_ST()\fR, a garbage
collecting routine that links the specified structure into the free list.
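.sp
A minimal sketch of such a free list is given below for one hypothetical
structure tag, \fInode\fR, so that the operations are called
\fInew_node()\fR and \fIfree_node()\fR and the list head is h_node. It only
mimics the behaviour described above; it is not the code of \fIst_alloc\fR
and \fIst_free\fR, whose interfaces may differ. Falling back on
\fImalloc\fR when the free list is empty is an assumption of the sketch.
.nf
.ft 5
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical structure with tag "node"; as required above, its first
   field is the pointer `next`. */
struct node {
	struct node *next;
	int value;
};

static struct node *h_node = NULL;   /* head of the free list */

/* new_node: reuse a structure from the free list if one is available,
   otherwise obtain fresh memory from malloc. */
struct node *new_node(void)
{
	struct node *p = h_node;

	if (p != NULL)
		h_node = p->next;            /* unlink it from the free list */
	else if ((p = malloc(sizeof *p)) == NULL)
		abort();                     /* out of memory */
	p->next = NULL;
	p->value = 0;
	return p;
}

/* free_node: link the structure into the free list instead of
   returning it to the system. */
void free_node(struct node *p)
{
	assert(p != NULL);
	p->next = h_node;
	h_node = p;
}
```
.ft R
.fi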
.bp

doc/pascal/options.doc Normal file

@ -0,0 +1,166 @@
.sp 1.5i
.NH
Compiler options
.nh
.PP
There are some options available to control the behaviour of the compiler.
Two types of options can be distinguished: compile-time options and
run-time options.
.sp
.NH 2
Compile time options
.LP
.sp
Some options can be set when the compiler is installed. They can be found
in the file \fIParameters\fR; to set a parameter, simply modify its
definition there. The shell script in the file \fImake.hfiles\fR creates a
separate .h file for each parameter. This mechanism is derived from the C
compiler in ACK.
.sp
\fBIDFSIZE\fR
.in +3m
The maximum number of characters that are significant in an identifier. This
value has to be at least the value of \fBMINIDFSIZE\fR, defined in the file
\fIoptions.c\fR. A compile-time check is included to see if the value of
\fBMINIDFSIZE\fR is legal. If \fBIDFSIZE\fR is too small, the compiler will
not recognize some keywords; the longest word-symbol, \fBprocedure\fR, has
nine letters.
.in -3m
.sp
\fBISTRSIZE\fR, \fBRSTRSIZE\fR
.in +3m
The lexical analyzer uses these two values to allocate the memory needed to
store a string. \fBISTRSIZE\fR is the initial number of bytes allocated.
\fBRSTRSIZE\fR is the step size by which this memory is enlarged when it is
full.
.in -3m
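.sp
.in +3m
The effect of the two values can be sketched as follows, assuming a buffer
that is enlarged with \fIrealloc\fR. The helper \fIcollect_string\fR and the
values 32 and 8 are hypothetical; they are not taken from the lexical
analyzer.
.nf
.ft 5
```c
#include <assert.h>
#include <stdlib.h>

#define ISTRSIZE 32   /* assumed initial size in bytes */
#define RSTRSIZE 8    /* assumed step size in bytes */

/* Hypothetical helper: copy the n characters of src into a buffer that
   starts at ISTRSIZE bytes and grows by RSTRSIZE bytes whenever it is
   full. *sizep receives the final buffer size. */
char *collect_string(const char *src, size_t n, size_t *sizep)
{
	size_t size = ISTRSIZE, len = 0;
	char *buf;

	assert(src != NULL || n == 0);
	if ((buf = malloc(size)) == NULL)
		abort();
	while (len < n) {
		if (len + 1 >= size) {       /* keep room for the terminator */
			char *nbuf = realloc(buf, size + RSTRSIZE);
			if (nbuf == NULL)
				abort();
			buf = nbuf;
			size += RSTRSIZE;
		}
		buf[len] = src[len];
		len++;
	}
	buf[len] = 0;                        /* terminating null byte */
	*sizep = size;
	return buf;
}
```
.ft R
.fi
A larger \fBRSTRSIZE\fR means fewer reallocations for long strings, at the
price of more unused space at the end of the buffer.
.in -3m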
.sp
\fBNUMSIZE\fR
.in +3m
The maximum length of a numeric constant recognized by the lexical analyzer.
It is an error if this length is exceeded.
.in -3m
.sp
\fBERROUT\fR, \fBMAXERR_LINE\fR
.in +3m
Used for error messages. \fBERROUT\fR defines the file to which the
messages are written. \fBMAXERR_LINE\fR is the maximum number of error
messages given per line.
.in -3m
.sp
\fBSZ_CHAR\fR, \fBAL_CHAR\fR, etc.
.in +3m
The default values of the target machine sizes and alignments. The values
can be overridden with the \-V option.
.in -3m
.sp
\fBMAXSIZE\fR
.in +3m
This value must be set to the maximum of the values of the target machine
sizes. This parameter is used in overflow detection (see also section 3.2).
.in -3m
.sp
\fBDENSITY\fR
.in +3m
This parameter is used to decide which EM instruction is generated for a
case-statement. If the range of the index values is sparse, i.e.
.br
.ti +5m
(upperbound \- lowerbound) / number_of_cases
.br
is more than some threshold (\fBDENSITY\fR), the \fBcsb\fR instruction is
chosen. If the range is dense, a jump table is generated (\fBcsa\fR); this
uses more space. Reasonable values are 2, 3 or 4.
.br
Higher values might also be reasonable on machines which have lots of
address space and memory (see also section 3.3.6).
.in -3m
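.sp
.in +3m
The decision can be sketched in C as below. The value 3 for \fBDENSITY\fR,
the use of integer division and the helper \fIchoose_case_instr\fR are
assumptions made for illustration, not the compiler's code.
.nf
.ft 5
```c
#include <assert.h>

#define DENSITY 3   /* assumed value; the text suggests 2, 3 or 4 */

enum case_instr { CSA, CSB };   /* jump table vs. searched table */

/* Hypothetical helper: pick the EM case instruction from the range of
   the case labels and the number of cases. */
enum case_instr choose_case_instr(long lowerbound, long upperbound,
				  long number_of_cases)
{
	assert(number_of_cases > 0 && lowerbound <= upperbound);
	if ((upperbound - lowerbound) / number_of_cases > DENSITY)
		return CSB;   /* sparse: search a table of (value, label) pairs */
	return CSA;           /* dense: index a jump table */
}
```
.ft R
.fi
For example, four cases with labels 1 to 4 give (4 \- 1) / 4 = 0, so a jump
table is chosen, whereas three cases spread over 1 to 1000 give 333, so
\fBcsb\fR is chosen.
.in -3m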
.sp
\fBINP_READ_IN_ONE\fR
.in +3m
Used by the generic input module. It can either be defined or left
undefined. Defining it has the effect that files are read completely into
memory using a single read system call. This should be used only on
machines with lots of memory.
.in -3m
.sp
.bp
\fBDEBUG\fR
.in +3m
.nf
If this parameter is defined, some built-in compiler-debugging tools can be used:
.in +2m
\(bu only lexical analysis is done if the \-l option is given.
\(bu if the \-I option is turned on, the number of allocated structures is printed.
\(bu the routine debug can be used to print miscellaneous information.
\(bu the routine PrNode prints a tree of nodes.
\(bu the routine DumpType prints information about a type structure.
\(bu the macro DO_DEBUG(x,y) defined as ((x) && (y)) can be used to perform
several actions.
.in -2m
.in -3m
.sp
.NH 2
Run time options
.LP
.sp
The run time options can be given on the command line when the compiler is
called.
.br
They all have the form: \-<character>
.br
Depending on the option, the character may have to be followed by an
argument. The following options are currently available:
.sp
.IP \-\fBC\fR 18
The lower case and upper case letters are treated as distinct (\fBISO 6.1.1\fR).
.sp
.IP \-\fBu\fR
The character '_' is treated like a letter, so it is allowed to use the
underscore in identifiers.
.br
Note: identifiers starting with an underscore may cause problems, because
.br
\h'\w'Note: 'u'most identifiers in library routines start with an underscore.
.sp
.IP \-\fBn\fR
This option suppresses the generation of register messages.
.sp
.IP \-\fBr\fR
With this option range checks are generated where necessary.
.sp
.IP \-\fBL\fR
Do not generate EM \fBlin\fR and \fBfil\fR instructions. These instructions
are used only for profiling.
.sp
.IP \-\fBM\fR<number>
Set the number of characters that are significant in an identifier to <number>.
The maximum significant identifier length depends on the constant IDFSIZE,
defined in \fIidfsize.h\fR.
.sp
.IP \-\fBi\fR<number>
With this flag the size of a set of integers can be changed: <number> is the
number of bits per set. The default value is (#bits in a word) \- 1.
.sp
.IP \-\fBw\fR
Suppress warning messages (see also section 2.5).
.sp
.IP \-\fBV\fR[[\fBw\fR|\fBi\fR|\fBf\fR|\fBp\fR|\fBS\fR][\fIsize\fR]?[\fI.alignment\fR]?]*
.br
Option to set the object sizes and alignments on the target machine
dynamically. The objects that can be manipulated are:
.br
\fBw\fR\h'\w'ifpS'u' word
.br
\fBi\fR\h'\w'wfpS'u' integer
.br
\fBf\fR\h'\w'wipS'u' float
.br
\fBp\fR\h'\w'wifS'u' pointer
.br
\fBS\fR\h'\w'wifp'u' structure
.br
In case of a structure, \fIsize\fR is discarded and the \fIalignment\fR is
the initial alignment of the structure. The effective alignment is the least
common multiple of \fIalignment\fR and the alignment of its members. This
option has been implemented so that the compiler can be used as a cross
compiler.
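.sp
The least common multiple rule for structures can be sketched as follows.
Passing the member alignments as an array, and the helper names, are
assumptions made for illustration; this is not the compiler's own code.
.nf
.ft 5
```c
#include <assert.h>

/* Greatest common divisor (Euclid's algorithm). */
static int gcd(int a, int b)
{
	while (b != 0) {
		int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

static int lcm(int a, int b)
{
	return a / gcd(a, b) * b;
}

/* Effective alignment of a structure: the least common multiple of the
   initial alignment (from the S letter of -V) and its member alignments. */
int struct_alignment(int initial, const int *member_align, int nmembers)
{
	int al = initial;

	assert(initial > 0);
	for (int i = 0; i < nmembers; i++) {
		assert(member_align[i] > 0);
		al = lcm(al, member_align[i]);
	}
	return al;
}
```
.ft R
.fi
For example, an initial alignment of 4 and member alignments of 2 and 3
give an effective alignment of 12.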
.bp

doc/pascal/p1-9 Executable file

@ -0,0 +1 @@
pic ab+intro.doc internal.doc transpem.doc | troff -ms > p1-9.dit

doc/pascal/p10-14 Executable file

@ -0,0 +1 @@
troff -ms -n10 conf.doc options.doc extensions.doc deviations.doc > p10-14.dit

doc/pascal/p15-19 Executable file

@ -0,0 +1 @@
troff -ms -n15 hints.doc test.doc compar.doc improv.doc his.doc reference.doc > p15-19.dit

doc/pascal/p20-29 Executable file

@ -0,0 +1 @@
troff -ms -n20 syntax.doc rtl.doc example.doc > p20-29.dit

doc/pascal/reference.doc Normal file

@ -0,0 +1,50 @@
.ps 12
.vs 14
.NH
References
.sp
.nh
.IP [ISO] 8
ISO 7185 Specification for Computer Programming Language Pascal, 1982,
Acornsoft ISO-PASCAL, 1984
.sp
.IP [EM]
A.S. Tanenbaum, H. van Staveren, E.G. Keizer and J.W. Stevenson,
\fIDescription Of A Machine Architecture for use with Block Structured
Languages\fR, Informatica Rapport IR-81, Vrije Universiteit, Amsterdam, 1983
.sp
.IP [C]
B.W. Kernighan and D.M. Ritchie, \fIThe C Programming Language\fR,
Prentice-Hall, 1978
.sp
.IP [LL]
C.J.H. Jacobs, \fISome Topics in Parser Generation\fR, Informatica Rapport
IR-105, Vrije Universiteit, Amsterdam, October 1985
.sp
.IP [IM2]
J.W. Stevenson, \fIPascal-VU Reference Manual and Unix Manual Pages\fR,
Informatica Manual IM-2, Vrije Universiteit, Amsterdam, 1980
.sp
.IP [JEN]
K. Jensen and N. Wirth, \fIPascal User Manual and Report\fR,
Springer-Verlag, 1978
.sp
.IP [ACK]
\fIACK Manual Pages\fR: ALLOC, ASSERT, EM_CODE, EM_MES, IDF, INPUT, PRINT,
STRING, SYSTEM
.sp
.IP [AHO]
A.V. Aho, R. Sethi and J.D. Ullman, \fICompiler Principles, Techniques, and
Tools\fR, Addison Wesley, 1985
.sp
.IP [LEX]
M.E. Lesk, \fILex - A Lexical Analyser Generator\fR, Comp. Sci. Tech. Rep.
No. 39, Bell Laboratories, Murray Hill, New Jersey, October 1975
.sp
.IP [PCV]
B.A. Wichmann and Z.J. Ciechanowicz, \fIPascal Compiler Validation\fR, John
Wiley & Sons, 1983
.sp
.IP [SAL]
A.H.J. Sale, \fIA Note on Scope, One-Pass Compilers and Pascal\fR, Australian
Communications, 1, 1, 80-82, 1979

doc/pascal/rtl.doc Normal file

@ -0,0 +1,85 @@
.sp 1.5i
.ft B
Appendix B: Changes to the run time library
.ft R
.nh
.sp
Some minor changes have been made in the run time library concerning the
external files (i.e. program arguments). The old compiler reserved space
for the file structures of the external files in one \fBhol\fR block. In
the new compiler, every file structure is placed in a separate \fBbss\fR
block. This implies that the arguments with which \fI_ini\fR is called are
slightly different. The second argument used to be the base of the
\fBhol\fR block, used to relocate the buffer addresses; it has been changed
into an integer denoting the size of the array passed as the third
argument. The third argument used to be a pointer to an array of integers
describing the external files; it is now a pointer to an array of pointers
to file structures. The differences in the generated EM code for an
arbitrary Pascal program are listed below (only the relevant parts are
shown):
.in +5m
.nf
\fBprogram\fR external_files(output,f);
\fBvar\fR
f : \fBfile of \fIsome-type\fR;
.
.
\fBend\fR.
.in -5m
EM code generated by Pascal-VU:
.in +5m
.
.
hol 1088,-2147483648,0 ; space belonging to file structures of the program arguments
.
.
.
\&.2
con 3, -1, 544, 0 \h'80u'; description of external files
lxl 0
lae .2
lae 0 \h'146u'; base of hol block, to relocate buffer addresses
lxa 0
cal $_ini
asp 16
.
.
.in -5m
EM code generated by our compiler:
.in +5m
.
.
f
bss 540,0,0 \h'100u'; space belonging to file structure of program argument f
output
bss 540,0,0 \h'100u'; space belonging to file structure of standard output
.
.
.
\&.2
con 0U4, output, f \h'50u'; the absence of standard input is denoted by a null pointer
lxl 0
lae .2
loc 3 \h'144u'; denotes the size of the array of pointers to file structures
lxa 0
cal $_ini
asp 16
.
.
.in -5m
.po
The following files in the run time library have been changed:
.in +1m
pc_file.h
hlt.c
ini.c
opn.c
pentry.c
pexit.c
.in -1m
.fi
.bp
.po

doc/pascal/syntax.doc Normal file

@ -0,0 +1,269 @@
.sp 1.5i
.LP
.vs 14
.nh
.ft B
Appendix A: ISO-PASCAL grammar
.ft R
\fBA.1 Lexical tokens\fR
The syntax describes the formation of lexical tokens from characters and the
separation of these tokens, and therefore does not adhere to the same rules
as the syntax in A.2.
The lexical tokens used to construct Pascal programs shall be classified into
special-symbols, identifiers, directives, unsigned-numbers, labels and
character-strings. The representation of any letter (upper-case or lower-case,
differences of font, etc) occurring anywhere outside of a character-string
shall be insignificant in that occurrence to the meaning of the program.
letter = \fBa\fR | \fBb\fR | \fBc\fR | \fBd\fR | \fBe\fR | \fBf\fR | \fBg\fR | \fBh\fR | \fBi\fR | \fBj\fR | \fBk\fR | \fBl\fR | \fBm\fR | \fBn\fR | \fBo\fR | \fBp\fR | \fBq\fR | \fBr\fR | \fBs\fR | \fBt\fR | \fBu\fR | \fBv\fR | \fBw\fR | \fBx\fR | \fBy\fR | \fBz\fR .
digit = \fB0\fR | \fB1\fR | \fB2\fR | \fB3\fR | \fB4\fR | \fB5\fR | \fB6\fR | \fB7\fR | \fB8\fR | \fB9\fR .
The special symbols are tokens having special meanings and shall be used to
delimit the syntactic units of the language.
special-symbol = \fB+\fR | \fB\-\fR | \fB*\fR | \fB/\fR | \fB=\fR | \fB<\fR | \fB>\fR | \fB[\fR | \fB]\fR | \fB.\fR | \fB,\fR | \fB:\fR | \fB;\fR | \fB^\fR | \fB(\fR | \fB)\fR | \fB<>\fR | \fB<=\fR | \fB>=\fR | \fB:=\fR | \fB..\fR |
\h'\w'special-symbol = 'u'word-symbol .
word-symbol = \fBand\fR | \fBarray\fR | \fBbegin\fR | \fBcase\fR | \fBconst\fR | \fBdiv\fR | \fBdo\fR | \fBdownto\fR | \fBelse\fR | \fBend\fR | \fBfile\fR | \fBfor\fR | \fBfunction\fR |
\h'\w'word-symbol = 'u'\fBgoto\fR | \fBif\fR | \fBin\fR | \fBlabel\fR | \fBmod\fR | \fBnil\fR | \fBnot\fR | \fBof\fR | \fBor\fR | \fBpacked\fR | \fBprocedure\fR | \fBprogram\fR | \fBrecord\fR |
\h'\w'word-symbol = 'u'\fBrepeat\fR | \fBset\fR | \fBthen\fR | \fBto\fR | \fBtype\fR | \fBuntil\fR | \fBvar\fR | \fBwhile\fR | \fBwith\fR .
Identifiers may be of any length. All characters of an identifier shall be
significant. No identifier shall have the same spelling as any word-symbol.
identifier = letter { letter | digit } .
A directive shall only occur in a procedure-declaration or function-declaration.
No directive shall have the same spelling as any word-symbol.
directive = letter {letter | digit} .
Numbers are given in decimal notation.
.nf
unsigned-integer = digit-sequence .
unsigned-real = unsigned-integer \fB.\fR fractional-part [ \fBe\fR scale-factor ] | unsigned-integer \fBe\fR scale-factor .
digit-sequence = digit {digit} .
fractional-part = digit-sequence .
scale-factor = signed-integer .
signed-integer = [sign] unsigned-integer .
sign = \fB+\fR | \fB\-\fR .
.fi
.bp
Labels shall be digit-sequences and shall be distinguished by their apparent
integral values and shall be in the closed interval 0 to 9999.
label = digit-sequence .
A character-string containing a single string-element shall denote a value of
the required char-type. Each string-character shall denote an
implementation-defined value of the required char-type.
.nf
character-string = \fB'\fR string-element { string-element } \fB'\fR .
string-element = apostrophe-image | string-character .
apostrophe-image = \fB''\fR .
string-character = All 7-bit ASCII characters except linefeed (10), vertical tab (11), and new page (12).
.fi
The construct:
\fB{\fR any-sequence-of-characters-and-separations-of-lines-not-containing-right-brace \fB}\fR
shall be a comment if the "{" does not occur within a character-string or
within a comment. The substitution of a space for a comment shall not alter
the meaning of a program.
Comments, spaces (except in character-strings), and the separation of
consecutive lines shall be considered to be token separators. Zero or more
token separators may occur between any two consecutive tokens, or before
the first token of a program text. No separators shall occur within tokens.
.bp
.po
\fBA.2 Grammar\fR
The non-terminal symbol \fIprogram\fR is the start symbol of the grammar.
.nf
actual-parameter : expression | variable-access | procedure-identifier | function-identifier .
actual-parameter-list : \fB(\fR actual-parameter { \fB,\fR actual-parameter } \fB)\fR .
adding-operator : \fB+\fR | \fB\-\fR | \fBor\fR .
array-type : \fBarray\fR \fB[\fR index-type { \fB,\fR index-type } \fB]\fR \fBof\fR component-type .
array-variable : variable-access .
assignment-statement : ( variable-access | function-identifier ) \fB:=\fR expression .
base-type : ordinal-type .
block : label-declaration-part constant-definition-part type-definition-part variable-declaration-part
\h'\w'block : 'u'procedure-and-function-declaration-part statement-part .
Boolean-expression : expression .
bound-identifier : identifier .
buffer-variable : file-variable \fB^\fR .
case-constant : constant .
case-constant-list : case-constant { \fB,\fR case-constant } .
case-index : expression .
case-list-element : case-constant-list \fB:\fR statement .
case-statement : \fBcase\fR case-index \fBof\fR case-list-element { \fB;\fR case-list-element } [ \fB;\fR ] \fBend\fR .
component-type : type-denoter .
component-variable : indexed-variable | field-designator .
compound-statement : \fBbegin\fR statement-sequence \fBend\fR .
conditional-statement : if-statement | case-statement .
conformant-array-parameter-specification : value-conformant-array-specification |
\h'+18.5m'variable-conformant-array-specification .
conformant-array-schema : packed-conformant-array-schema | unpacked-conformant-array-schema .
constant : [ sign ] ( unsigned-number | constant-identifier ) | character-string .
constant-definition : identifier \fB=\fR constant .
constant-definition-part : [ \fBconst\fR constant-definition \fB;\fR { constant-definition \fB;\fR } ] .
constant-identifier : identifier .
control-variable : entire-variable .
domain-type : type-identifier .
else-part : \fBelse\fR statement .
empty-statement : .
entire-variable : variable-identifier .
enumerated-type : \fB(\fR identifier-list \fB)\fR .
expression : simple-expression [ relational-operator simple-expression ] .
.bp
.po
factor : variable-access | unsigned-constant | bound-identifier | function-designator | set-constructor |
\h'\w'factor : 'u'\fB(\fR expression \fB)\fR | \fBnot\fR factor .
field-designator : record-variable \fB.\fR field-specifier | field-designator-identifier .
field-designator-identifier : identifier .
field-identifier : identifier .
field-list : [ ( fixed-part [ \fB;\fR variant-part ] | variant-part ) [ \fB;\fR ] ] .
field-specifier : field-identifier .
file-type : \fBfile\fR \fBof\fR component-type .
file-variable : variable-access .
final-value : expression .
fixed-part : record-section { \fB;\fR record-section } .
for-statement : \fBfor\fR control-variable \fB:=\fR initial-value ( \fBto\fR | \fBdownto\fR ) final-value \fBdo\fR statement .
formal-parameter-list : \fB(\fR formal-parameter-section { \fB;\fR formal-parameter-section } \fB)\fR .
formal-parameter-section : value-parameter-specification | variable-parameter-specification |
\h'\w'formal-parameter-section : 'u'procedural-parameter-specification | functional-parameter-specification |
\h'\w'formal-parameter-section : 'u'conformant-array-parameter-specification .
function-block : block .
function-declaration : function-heading \fB;\fR directive | function-identification \fB;\fR function-block |
\h'\w'function-declaration : 'u'function-heading \fB;\fR function-block .
function-designator : function-identifier [ actual-parameter-list ] .
function-heading : \fBfunction\fR identifier [ formal-parameter-list ] \fB:\fR result-type .
function-identification : \fBfunction\fR function-identifier .
function-identifier : identifier .
functional-parameter-specification : function-heading .
goto-statement : \fBgoto\fR label .
identified-variable : pointer-variable \fB^\fR .
identifier-list : identifier { \fB,\fR identifier } .
if-statement : \fBif\fR Boolean-expression \fBthen\fR statement [ else-part ] .
index-expression : expression .
index-type : ordinal-type .
index-type-specification : identifier \fB..\fR identifier \fB:\fR ordinal-type-identifier .
indexed-variable : array-variable \fB[\fR index-expression { \fB,\fR index-expression } \fB]\fR .
initial-value : expression .
label : digit-sequence .
label-declaration-part : [ \fBlabel\fR label { \fB,\fR label } \fB;\fR ] .
member-designator : expression [ \fB..\fR expression ] .
multiplying-operator : \fB*\fR | \fB/\fR | \fBdiv\fR | \fBmod\fR | \fBand\fR .
.bp
.po
new-ordinal-type : enumerated-type | subrange-type .
new-pointer-type : \fB^\fR domain-type .
new-structured-type : [ \fBpacked\fR ] unpacked-structured-type .
new-type : new-ordinal-type | new-structured-type | new-pointer-type .
ordinal-type : new-ordinal-type | ordinal-type-identifier .
ordinal-type-identifier : type-identifier .
packed-conformant-array-schema : \fBpacked\fR \fBarray\fR \fB[\fR index-type-specification \fB]\fR \fBof\fR type-identifier .
pointer-type-identifier : type-identifier .
pointer-variable : variable-access .
procedural-parameter-specification : procedure-heading .
procedure-and-function-declaration-part : { ( procedure-declaration | function-declaration ) \fB;\fR } .
procedure-block : block .
procedure-declaration : procedure-heading \fB;\fR directive | procedure-identification \fB;\fR procedure-block |
\h'\w'procedure-declaration : 'u'procedure-heading \fB;\fR procedure-block .
procedure-heading : \fBprocedure\fR identifier [ formal-parameter-list ] .
procedure-identification : \fBprocedure \fR procedure-identifier .
procedure-identifier : identifier .
procedure-statement : procedure-identifier ( [ actual-parameter-list ] | read-parameter-list | readln-parameter-list |
\h'\w'procedure-statement : procedure-identifier ( ['u'write-parameter-list | writeln-parameter-list ) .
program : program-heading \fB;\fR program-block \fB.\fR .
program-block : block .
program-heading : \fBprogram\fR identifier [ \fB(\fR program-parameters \fB)\fR ] .
program-parameters : identifier-list .
read-parameter-list : \fB(\fR [ file-variable \fB,\fR ] variable-access { \fB,\fR variable-access } \fB)\fR .
readln-parameter-list : [ \fB(\fR ( file-variable | variable-access ) { \fB,\fR variable-access } \fB)\fR ] .
record-section : identifier-list \fB:\fR type-denoter .
record-type : \fBrecord\fR field-list \fBend\fR .
record-variable : variable-access .
record-variable-list : record-variable { \fB,\fR record-variable } .
relational-operator : \fB=\fR | \fB<>\fR | \fB<\fR | \fB>\fR | \fB<=\fR | \fB>=\fR | \fBin\fR .
repeat-statement : \fBrepeat\fR statement-sequence \fBuntil\fR Boolean-expression .
repetitive-statement : repeat-statement | while-statement | for-statement .
result-type : simple-type-identifier | pointer-type-identifier .
set-constructor : \fB[\fR [ member-designator { \fB,\fR member-designator } ] \fB]\fR .
set-type : \fBset\fR \fBof\fR base-type .
sign : \fB+\fR | \fB\-\fR .
simple-expression : [ sign ] term { adding-operator term } .
simple-statement : empty-statement | assignment-statement | procedure-statement | goto-statement .
simple-type-identifier : type-identifier .
.bp
.po
statement : [ label \fB:\fR ] ( simple-statement | structured-statement ) .
statement-part : compound-statement .
statement-sequence : statement { \fB;\fR statement } .
structured-statement : compound-statement | conditional-statement | repetitive-statement | with-statement .
subrange-type : constant \fB..\fR constant .
tag-field : identifier .
tag-type : ordinal-type-identifier .
term : factor { multiplying-operator factor } .
type-definition : identifier \fB=\fR type-denoter .
type-definition-part : [ \fBtype\fR type-definition \fB;\fR { type-definition \fB;\fR } ] .
type-denoter : type-identifier | new-type .
type-identifier : identifier .
unpacked-conformant-array-schema : \fBarray\fR \fB[\fR index-type-specification { \fB;\fR index-type-specification } \fB]\fR \fBof\fR
\h'\w'unpacked-conformant-array-schema : 'u'( type-identifier | conformant-array-schema ) .
unpacked-structured-type : array-type | record-type | set-type | file-type .
unsigned-constant : unsigned-number | character-string | constant-identifier | \fBnil\fR .
unsigned-number : unsigned-integer | unsigned-real .
value-conformant-array-specification : identifier-list \fB:\fR conformant-array-schema .
value-parameter-specification : identifier-list \fB:\fR type-identifier .
variable-access : entire-variable | component-variable | identified-variable | buffer-variable .
variable-conformant-array-specification : \fBvar\fR identifier-list \fB:\fR conformant-array-schema .
variable-declaration : identifier-list \fB:\fR type-denoter .
variable-declaration-part : [ \fBvar\fR variable-declaration \fB;\fR { variable-declaration \fB;\fR } ] .
variable-identifier : identifier .
variable-parameter-specification : \fBvar\fR identifier-list \fB:\fR type-identifier .
variant : case-constant-list \fB:\fR \fB(\fR field-list \fB)\fR .
variant-part : \fBcase\fR variant-selector \fBof\fR variant { \fB;\fR variant } .
variant-selector : [ tag-field \fB:\fR ] tag-type .
while-statement : \fBwhile\fR Boolean-expression \fBdo\fR statement .
with-statement : \fBwith\fR record-variable-list \fBdo\fR statement .
write-parameter : expression [ \fB:\fR expression [ \fB:\fR expression ] ] .
write-parameter-list : \fB(\fR [ file-variable \fB,\fR ] write-parameter { \fB,\fR write-parameter } \fB)\fR .
writeln-parameter-list : [ \fB(\fR ( file-variable | write-parameter ) { \fB,\fR write-parameter } \fB)\fR ] .
.fi
.vs
.bp
.po

doc/pascal/test.doc Normal file

@ -0,0 +1,19 @@
.sp 2
.NH
Testing the compiler
.nh
.sp
.LP
Although it is practically impossible to prove the correctness of a compiler,
a systematic method of testing was used to increase the confidence that it
works satisfactorily in practice. The first step was to check that the
lexical analysis was performed correctly. For this purpose, the routine
LexScan() was used (see also the \-l option). Next we tested the parser
generated by LLgen, to see whether correct Pascal programs were accepted and
garbage was dealt with gracefully. The largest test was the validation of
the semantic analysis, during which we also tested the code generation.
First some small Pascal test programs were translated and executed. When
these programs worked correctly, the Pascal validation suite and a large
set of Pascal test programs were compiled to see whether they behaved in
the manner the standard specifies. For more details about the Pascal
validation suite, the reader is referred to [PCV].

doc/pascal/titlepg.doc Normal file

@ -0,0 +1,13 @@
\v'3i'
.ps 36
The ACK Pascal Compiler
.ps 12
.sp 30
.ce 5
.ft I
There is always something like something that there should not be.
.sp 2
.ps 10
For Whom The Bell Tolls
.ft R
Ernest Hemingway

doc/pascal/transpem.doc Normal file

@ -0,0 +1,407 @@
.sp 1.5i
.de CL
.ft R
c\\$1
.ft 5
\fIcode statement-\\$1
.ft 5
\fBbra *\fRexit_label
.ft 5
..
.NH
Translation of Pascal to EM code
.nh
.LP
.sp
A short description of the translation of Pascal constructs to EM code is
given in the following paragraphs. The EM instructions and Pascal terminal
symbols are printed in \fBboldface\fR. A sentence in \fIitalics\fR is a
description of a group of EM (pseudo)instructions.
.sp
.NH 2
Global Variables
.LP
.sp
For every global variable, a \fBbss\fR block is reserved. To enhance the
readability of the EM-code generated, the variable-identifier is used as
a data label to address the block.
.sp
.NH 2
Expressions
.LP
.sp
Both operands of a Boolean operator are always evaluated, even when the
first operand already determines the result, so the execution of
.br
.ti +3m
\fBif\fR ( p <> nil ) \fBand\fR ( p^.value <> 0 ) \fBthen\fR .....
.br
might cause a run-time error if p is equal to nil.
.LP
The left-hand operand of a dyadic operator is almost always evaluated before
the right-hand operand. The set inclusion operators are evaluated in a
special way:
.sp
The expression set1 <= set2 is evaluated as follows:
.nf
- evaluate set2
- evaluate set1
- compute set2+set1
- test set2 and set2+set1 for equality
.fi
.sp
The expression set1 >= set2 is evaluated as follows:
.nf
- evaluate set1
- evaluate set2
- compute set1+set2
- test set1 and set1+set2 for equality
.fi
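.sp
Both tests rely on the fact that set1 is a subset of set2 exactly when
set2+set1 equals set2. The sketch below shows this in C, assuming (for
illustration only) that a small set is represented as a bit vector in an
unsigned int, so that set union is a bitwise or.
.nf
.ft 5
```c
#include <assert.h>

typedef unsigned int set;   /* assumed representation: one bit per element */

/* set1 <= set2 (set1 is a subset of set2): form the union set2+set1
   and test it against set2 for equality. */
int set_le(set set1, set set2)
{
	set u = set2 | set1;   /* Pascal set union "+" */
	return u == set2;
}

/* set1 >= set2 (set1 is a superset of set2): form set1+set2 and test it
   against set1 for equality. */
int set_ge(set set1, set set2)
{
	set u = set1 | set2;
	return u == set1;
}
```
.ft R
.fi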
.sp
Where the standard allows it, constant integral expressions are evaluated at
compile time, and an effort is made to report overflow with respect to the
target machine. The integral expressions are evaluated in the type
\fIarith\fR. The size of an arith is assumed to be at least the size of the
integer type on the target machine. If the target machine's integer size is
less than the size of an arith, overflow can be detected at compile time.
Note, however, that the call \fInew(p, 3+5)\fR of the standard procedure new
is illegal, because according to the grammar 3+5 is not a constant.
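.sp
The kind of check involved can be sketched as follows. It assumes 8-bit
bytes, a two's complement target and, for illustration, that \fIarith\fR is
a \fIlong\fR; the helper \fIfits_target_int\fR is hypothetical. For a 2-byte
target integer, 30000+30000 is evaluated without trouble in \fIarith\fR, but
the result does not fit in the target integer, so the overflow can be
reported.
.nf
.ft 5
```c
#include <assert.h>
#include <stddef.h>

typedef long arith;   /* assumed here; the compiler's own arith may differ */

/* Hypothetical check: does v fit in a signed integer of `size` bytes on
   the target?  Only meaningful when size < sizeof(arith), the case in
   which overflow can be detected at compile time. */
int fits_target_int(arith v, int size)
{
	arith max;

	assert(size > 0 && (size_t) size < sizeof(arith));
	max = ((arith) 1 << (8 * size - 1)) - 1;   /* 32767 for size 2 */
	return v >= -max - 1 && v <= max;
}
```
.ft R
.fi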
.sp
Constant floating expressions are not compile-time evaluated, because the
precision on the target machine and the precision on the machine on which the
compiler runs could be different. The boolean expression \fI(1.0 + 1.0) = 2.0\fR
could evaluate to false.
.sp
.NH 2
Statements
.NH 3
Assignment Statement
\fRPASCAL :
.ti +3m
\f5(variable-access | function-identifier) \fB:=\f5 expression
\fREM :
.nf
.in +3m
.ft I
evaluate expression
store in variable-access or function-identifier
.ft R
.in -3m
.fi
In case of a function-identifier, a hidden temporary variable is used to
keep the function result.
.bp
.NH 3
Goto Statement
\fRPASCAL :
.ti +3m
\fBGOTO\f5 label
\fREM :
.in +3m
Two cases can be distinguished :
.br
- local goto,
.ti +2m
in which a \fBbra\fR is generated.
- non-local goto,
.in +2m
.ll -1i
a goto_descriptor is built, containing the ProgramCounter of the instruction
jumped to and an offset in the target procedure frame, which contains the
value of the StackPointer after the jump. The code for the jump itself
loads the address of the goto_descriptor, pushes the LocalBase of the
target procedure, and performs a \fBcal\fR $_gto. A message is generated to
indicate that a procedure or function contains a statement which is the
target of a non-local goto.
.ll +1i
.in -2m
.in -3m
.sp 2
.NH 3
If Statement
\fRPASCAL :
.in +3m
.ft 5
\fBIF\f5 boolean-expression \fBTHEN\f5 statement
.in -3m
\fREM :
.nf
.in +3m
\fIevaluation boolean-expression
\fBzeq \fR*exit_label
\fIcode statement
\fRexit_label
.in -3m
.fi
.sp 2
\fRPASCAL :
.in +3m
.ft 5
\fBIF\f5 boolean-expression \fBTHEN\f5 statement-1 \fBELSE\f5 statement-2
.in -3m
\fREM :
.nf
.in +3m
\fIevaluation boolean-expression
\fBzeq \fR*else_label
\fIcode statement-1
\fBbra \fR*exit_label
\fRelse_label
\fIcode statement-2
\fRexit_label
.in -3m
.fi
.sp 2
.NH 3
Repeat Statement
\fRPASCAL :
.in +3m
.ft 5
\fBREPEAT\f5 statement-sequence \fBUNTIL\f5 boolean-expression
.in -3m
\fREM :
.nf
.in +3m
\fRrepeat_label
\fIcode statement-sequence
\fIevaluation boolean-expression
\fBzeq\fR *repeat_label
.in -3m
.fi
.bp
.NH 3
While Statement
\fRPASCAL :
.in +3m
.ft 5
\fBWHILE\f5 boolean-expression \fBDO\f5 statement
.in -3m
\fREM :
.nf
.in +3m
\fRwhile_label
\fIevaluation boolean-expression
\fBzeq\fR *exit_label
\fIcode statement
\fBbra\fR *while_label
\fRexit_label
.in -3m
.fi
.sp 2
.NH 3
Case Statement
.LP
.sp
The case-statement is implemented using the \fBcsa\fR and \fBcsb\fR
instructions.
\fRPASCAL :
.in +3m
\fBCASE\f5 case-expression \fBOF\f5
.in +5m
case-constant-list-1 \fB:\f5 statement-1 \fB;\f5
.br
case-constant-list-2 \fB:\f5 statement-2 \fB;\f5
.br
\&.
.br
\&.
.br
case-constant-list-n \fB:\f5 statement-n [\fB;\f5]
.in -5m
\fBEND\fR
.in -3m
.sp 2
.LP
.ll -1i
The \fBcsa\fR instruction is used if the range of the case-expression
value is dense, i.e.
.br
.ti +3m
\f5( upperbound \- lowerbound ) / number_of_cases\fR
.br
is less than the constant DENSITY, defined in the file \fIdensity.h\fR.
If the range is sparse, a \fBcsb\fR instruction is used.
.ll +1i
\fREM :
.nf
.in +3m
\fIevaluation case-expression
\fBbra\fR *l1
.CL 1
.CL 2
.
.
.CL n
.ft R
\&.case_descriptor
.ft 5
\fIgeneration case_descriptor
\fRl1
.ft 5
\fBlae\fR .case_descriptor
.ft 5
\fBcsa\fR size of (case-expression)
\fRexit_label
.in -3m
.fi
.bp
.NH 3
For Statement
\fRPASCAL :
.in +3m
.ft 5
\fBFOR\f5 control-variable \fB:=\f5 initial-value (\fBTO\f5 | \fBDOWNTO\f5) final-value \fBDO\f5 statement
.ft R
.in -3m
The initial-value and final-value are evaluated at the beginning of the loop.
If the values are not constant, they are evaluated once and stored in a
temporary.
EM :
.nf
.in +3m
\fIload initial-value
\fIload final-value
\fBbgt\fR *exit_label (* DOWNTO : \fBblt\fR *exit_label *)
\fIload initial-value
\fRl1
\fIstore in control-variable
\fIcode statement
\fIload control-variable
\fBdup\fI control-variable
\fIload final-value
\fBbeq\fR *exit_label
\fBinc\fI control-variable\fR (* DOWNTO : \fBdec\fI control-variable\fR *)
\fBbra *\fRl1
\fRexit_label
.in -3m
.fi
Note: the control-variable must be tested before it is incremented
(decremented),
.br
\h'\w'Note: 'u'because otherwise wraparound could occur, which could lead to an
infinite loop.
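.sp
The following C sketch models a \fBto\fR loop with the scheme above. The
helper \fIfor_to\fR is hypothetical and only counts iterations. Because the
control variable is compared with the final value before it is incremented,
a loop whose final value is the largest integer still terminates; if it were
incremented first and then tested, it would wrap around at the largest
integer and the loop would never end.
.nf
.ft 5
```c
#include <assert.h>
#include <limits.h>   /* INT_MAX, for trying the boundary case */

/* Hypothetical model of "for i := lo to hi do statement" following the EM
   scheme above. Returns the number of times the statement is executed. */
long for_to(int lo, int hi)
{
	long iterations = 0;
	int i;

	if (lo > hi)                /* bgt exit_label: body never runs */
		return 0;
	i = lo;
	for (;;) {
		iterations++;       /* code statement */
		if (i == hi)        /* test the control variable first ... */
			break;
		i++;                /* ... then increment: i never exceeds hi */
	}
	assert(i == hi);
	return iterations;
}
```
.ft R
.fi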
.sp 2
.NH 3
With Statement
\fRPASCAL :
.ti +3m
\fBWITH\f5 record-variable-list \fBDO\f5 statement
.ft R
The statement
.ti +3m
\fBWITH\fR r\s-3\d1\u\s0, r\s-3\d2\u\s0, ..., r\s-3\dn\u\s0 \fBDO\f5 statement
.ft R
is equivalent to
.in +3m
.nf
\fBWITH\fR r\s-3\d1\u\s0 \fBDO\fR
\fBWITH\fR r\s-3\d2\u\s0 \fBDO\fR
\&...
\fBWITH\fR r\s-3\dn\u\s0 \fBDO\f5 statement
.ft R
.fi
.in -3m
The translation of
.ti +3m
\fBWITH\fR r\s-3\d1\u\s0 \fBDO\f5 statement
.br
.ft R
is
.nf
.in +3m
\fIpush address of r\s-3\d1\u\s0
\fIstore address in temporary
\fIcode statement
.in -3m
.fi
.ft R
An occurrence of a field is translated into:
.in +3m
\fIload temporary
.br
\fIadd field-offset
.in -3m
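.LP
In C terms, this translation amounts to taking the address of the record once
and reaching every field through that address plus the field's offset. The
record type struct point and the function with_translate are hypothetical;
offsetof stands for the field-offset that the compiler knows at compile time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type standing for the record-variable r1. */
struct point {
    int x;
    int y;
};

/* WITH r DO BEGIN x := x + dx; y := y + dy END */
static void with_translate(struct point *r, int dx, int dy)
{
    /* push address of r, store address in temporary */
    char *temporary = (char *)r;
    assert(temporary != NULL);

    /* each field occurrence: load temporary, add field-offset */
    int *x = (int *)(temporary + offsetof(struct point, x));
    int *y = (int *)(temporary + offsetof(struct point, y));

    /* code statement */
    *x = *x + dx;
    *y = *y + dy;
}
```

Since the address is stored in the temporary only once, a record-variable such
as a[f(i)] is evaluated once, however often its fields are used in the
statement.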
.bp
.NH 2
Procedure and Function Calls
.ft R
In general, the call
.ti +5m
p(a\s-3\d1\u\s0, a\s-3\d2\u\s0, ..., a\s-3\dn\u\s0)
.br
is translated into the sequence:
.in +5m
.nf
\fIevaluate a\s-3\dn\u\s0
\&.
\&.
\fIevaluate a\s-3\d2\u\s0
\fIevaluate a\s-3\d1\u\s0
\fIpush localbase
\fBcal\fR $p
\fIpop parameters
.ft R
.fi
.in -5m
i.e. the actual-parameters are evaluated and bound from right to left.
In general, a copy of the actual-parameter is made when the formal-parameter
is a value-parameter; if the formal-parameter is a variable-parameter, a
pointer to the actual-parameter is pushed instead.
In the case of a function call, an \fBlfr\fR instruction is generated, which
pushes the function result onto the stack.
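.LP
The difference between the two kinds of parameters corresponds to passing a
copy versus passing a pointer in C. The sketch below models a hypothetical
procedure p(x : integer; VAR y : integer). It models neither the localbase
that is pushed nor the right-to-left order, since C leaves the evaluation
order of arguments unspecified.

```c
#include <assert.h>
#include <stddef.h>

/* Models PROCEDURE p(x : integer; VAR y : integer):
   x is a value-parameter (the callee gets a copy),
   y is a variable-parameter (the callee gets a pointer). */
static void p(int x, int *y)
{
    assert(y != NULL);
    x = x + 1;       /* modifies only the copy */
    *y = *y + x;     /* modifies the caller's variable */
}
```

After a := 1; b := 10; p(a, b), the value of a is still 1, while b has
become 12.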
.sp 2
.NH 2
Register Messages
.ft R
A register message can be generated to indicate that a local variable is never
referenced indirectly, which implies that the variable may be kept in a register.
We distinguish the following classes, in decreasing order of priority:
.sp
\(bu control-variable and final-value of a for-statement
.br
.ti +5m
to speed up the loop test and the execution of the body of the for-statement
.sp
\(bu record-variable of a with-statement
.br
.ti +5m
to speed up the selection of the fields of the record
.sp
\(bu remaining local variables and parameters
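.LP
The selection of candidates can be sketched as follows: variables that are
referenced indirectly (for instance, passed as a variable-parameter) are
dropped, and the remaining ones are ordered by class. The enum reg_class, the
struct local and the function register_candidates are hypothetical; the
sketch does not produce actual EM register messages, and whether a candidate
really ends up in a register is left to the back end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Candidate classes in decreasing priority (smaller value = higher). */
enum reg_class {
    RC_FOR_STATEMENT, /* control-variable or final-value of a for-statement */
    RC_WITH_RECORD,   /* record-variable of a with-statement */
    RC_OTHER          /* remaining local variables and parameters */
};

/* Hypothetical description of a local variable. */
struct local {
    const char *name;
    enum reg_class cls;
    int referenced_indirectly;  /* e.g. passed as a variable-parameter */
};

static int by_priority(const void *a, const void *b)
{
    const struct local *la = a, *lb = b;
    return (int)la->cls - (int)lb->cls;
}

/* Compacts vars[0..n) to the variables that are never referenced
   indirectly, sorts them by class and returns how many remain. */
static size_t register_candidates(struct local *vars, size_t n)
{
    size_t kept = 0;
    assert(vars != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (!vars[i].referenced_indirectly)
            vars[kept++] = vars[i];
    if (kept > 1)
        qsort(vars, kept, sizeof vars[0], by_priority);
    return kept;
}
```

Given buf (passed as a variable-parameter), n (an ordinary local), r (a
with-statement record-variable) and i (a for-statement control-variable), the
candidates that remain are i, r and n, in that order.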
.sp 2
.NH 2
Compile-time optimizations
.ft R
The only optimization that is performed is the evaluation of constant
integral expressions. The optimization of constructs like
.ti +5m
\fBif\f5 false \fBthen\f5 statement\fR
.br
is left to either the peephole optimizer or a global optimizer.
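.LP
Evaluating constant integral expressions can be sketched as a bottom-up walk
over the expression tree: a binary operator whose operands are both constants
is replaced by a constant. The node layout and the function fold are
hypothetical; a real compiler must also detect results beyond maxint and
handle the other operators (\fBdiv\fR, \fBmod\fR, comparisons), which are
omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression node: a constant, a (non-constant) variable,
   or a binary operator applied to two subexpressions. */
enum kind { E_CONST, E_VAR, E_ADD, E_SUB, E_MUL };

struct expr {
    enum kind kind;
    long value;                 /* meaningful when kind == E_CONST */
    struct expr *left, *right;  /* operands of a binary operator */
};

/* Folds constant subexpressions bottom-up: a binary node whose operands
   are both constants is replaced by a constant node. */
static void fold(struct expr *e)
{
    assert(e != NULL);
    if (e->kind == E_CONST || e->kind == E_VAR)
        return;
    fold(e->left);
    fold(e->right);
    if (e->left->kind != E_CONST || e->right->kind != E_CONST)
        return;
    long a = e->left->value, b = e->right->value;
    switch (e->kind) {
    case E_ADD: e->value = a + b; break;
    case E_SUB: e->value = a - b; break;
    case E_MUL: e->value = a * b; break;
    default:    return;
    }
    e->kind = E_CONST;
}
```

For example, 2 + 3 * 4 folds to the constant 14, while in v + (6 \- 5) only
the subtraction is folded, because v is not a constant.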
.TL
The ACK Pascal Compiler
.AU
Aad Geudeke
Frans Hofmeester
.AI
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands
.LP
.ps 12
.sp 24
.ce 5
.ft I
There is always something like something that there should not be.
.sp 2
.ps 10
For Whom The Bell Tolls
.ft R
Ernest Hemingway