ack/doc/m2ref.doc

.\" $Id$
.\" troff -ms m2ref.doc
.TL
The ACK Modula-2 Compiler
.AU
Ceriel J.H. Jacobs
.AI
Department of Mathematics and Computer Science
Vrije Universiteit
Amsterdam
The Netherlands
.AB no
.AE
.NH
Introduction
.PP
This document describes the implementation-specific features of the
ACK Modula-2 compiler.
It is not intended to teach Modula-2 programming.
For a description of the Modula-2 language,
the reader is referred to [1].
.PP
The ACK Modula-2 compiler is currently available for use with the VAX,
Motorola MC68020,
Motorola MC68000,
PDP-11,
and Intel 8086 code-generators.
For the 8086,
MC68000,
and MC68020,
floating point emulation is used.
This is made available with the \fI-fp\fP
option,
which must be passed to \fIack\fP[4,5].
.NH
The language implemented
.PP
This section discusses the deviations from the Modula-2 language as described
in the "Report on The Programming Language Modula-2",
as it appeared in [1],
from now on referred to as "the Report".
Also,
the Report sometimes leaves room for interpretation.
The section numbers
mentioned are the section numbers of the Report.
.NH 2
Syntax (section 2)
.PP
The syntax recognized is that of the Report,
with some extensions to
also recognize the syntax of an earlier definition,
given in [2].
Only one compilation unit per file is accepted.
.NH 2
Vocabulary and Representation (section 3)
.PP
The input "\f(CW10..\fP" is parsed as two tokens: "\f(CW10\fP" and "\f(CW..\fP".
.PP
The empty string \f(CW""\fP has type
.DS
.ft CW
ARRAY [0 .. 0] OF CHAR
.ft P
.DE
and contains one character: \f(CW0C\fP.
.PP
When the text of a comment starts with a '\f(CW$\fP',
it may be a pragma.
Currently,
the following pragmas exist:
.DS
.ft CW
(*$F      (F stands for Foreign) *)
(*$R[+|-] (Runtime checks, on or off, default on) *)
(*$A[+|-] (Array bound checks, on or off, default off) *)
(*$U      (Allow for underscores within identifiers) *)
.ft P
.DE
The Foreign pragma is only meaningful in a \f(CWDEFINITION MODULE\fP,
and indicates that this
\f(CWDEFINITION MODULE\fP describes an interface to a module written in another
language (for instance C,
Pascal,
or EM).
Runtime checks that can be disabled are:
range checks,
\f(CWCARDINAL\fP overflow checks,
checks when assigning a \f(CWCARDINAL\fP to an \f(CWINTEGER\fP and vice versa,
and checks that \f(CWFOR\fP-loop control-variables are not changed
in the body of the loop.
Array bound checks can be enabled,
because many EM implementations do not
implement the array bound checking of the EM array instructions.
When enabled,
the compiler generates a check before generating an
EM array instruction.
Even when underscores are enabled,
they still may not start an identifier.
.PP
Constants of type \f(CWLONGINT\fP are integers with a suffix letter \f(CWD\fP
(for instance \f(CW1987D\fP).
Constants of type \f(CWLONGREAL\fP have suffix \f(CWD\fP if a scale factor is missing,
or have \f(CWD\fP in place of \f(CWE\fP in the scale factor (f.i. \f(CW1.0D\fP,
\f(CW0.314D1\fP).
This addition was made,
because there was no way to indicate long constants,
and also because the addition was made in Wirth's newest Modula-2 compiler.
.NH 2
Declarations and scope rules (section 4)
.PP
Standard identifiers are considered to be predeclared,
and valid in all
parts of a program.
They are called \fIpervasive\fP.
Unfortunately,
the Report does not state how this pervasiveness is accomplished.
However,
page 87 of [1] states: "Standard identifiers are automatically
imported into all modules".
Our implementation therefore allows
redeclarations of standard identifiers within procedures,
but not within
modules.
.NH 2
Constant expressions (section 5)
.PP
Each operand of a constant expression must be a constant:
a string,
a number,
a set,
an enumeration literal,
a qualifier denoting a
constant expression,
a type transfer with a constant argument,
or one of the standard procedures
\f(CWABS\fP,
\f(CWCAP\fP,
\f(CWCHR\fP,
\f(CWLONG\fP,
\f(CWMAX\fP,
\f(CWMIN\fP,
\f(CWODD\fP,
\f(CWORD\fP,
\f(CWSIZE\fP,
\f(CWSHORT\fP,
\f(CWTSIZE\fP,
or \f(CWVAL\fP,
with constant argument(s);
\f(CWTSIZE\fP and \f(CWSIZE\fP may also have a variable as argument.
.PP
Floating point expressions are never evaluated compile time,
because the compiler basically functions as a cross-compiler,
and thus cannot
use the floating point instructions of the machine on which it runs.
Also,
\f(CWMAX(REAL)\fP and \f(CWMIN(REAL)\fP are not allowed.
.NH 2
Type declarations (section 6)
.NH 3
Basic types (section 6.1)
.PP
The type \f(CWCHAR\fP includes the ASCII character set as a subset.
Values range from
\f(CW0C\fP to \f(CW377C\fP,
not from \f(CW0C\fP to \f(CW177C\fP.
.NH 3
Enumerations (section 6.2)
.PP
The maximum number of enumeration literals in any one enumeration type
is \f(CWMAX(INTEGER)\fP.
.NH 3
Record types (section 6.5)
.PP
The syntax of variant sections in [1] is different from the one in [2].
Our implementation recognizes both,
giving a warning for the older one.
However,
see section 3.
.NH 3
Set types (section 6.6)
.PP
The only limitation imposed by the compiler is that the base type of the
set must be a subrange type,
an enumeration type,
\f(CWCHAR\fP,
or \f(CWBOOLEAN\fP.
So,
the lower bound may be negative.
However,
if a negative lower bound is used,
the compiler gives a warning of the \fIrestricted\fP class (see the manual
page of the compiler).
.PP
The standard type \f(CWBITSET\fP is defined as
.DS
.ft CW
TYPE BITSET = SET OF [0 .. 8*SIZE(INTEGER)-1];
.ft P
.DE
.NH 2
Expressions (section 8)
.NH 3
Operators (section 8.2)
.NH 4
Arithmetic operators (section 8.2.1)
.PP
The Report does not specify the priority of the unary
operators \f(CW+\fP or \f(CW-\fP:
It does not specify whether
.DS
.ft CW
- 1 + 1
.ft P
.DE
means
.DS
.ft CW
- (1 + 1)
.ft P
.DE
or
.DS
.ft CW
(-1) + 1
.ft P
.DE
I have seen some compilers that implement the first alternative,
and others that implement the second.
Our compiler implements the second,
which is suggested by the fact that their priority is not specified,
which might indicate that it is the same as that of their binary counterparts.
And then the rule about left to right decides for the second.
On the other hand one might argue that,
since the grammar only allows for one unary operator in a simple expression,
it must apply to the whole simple expression,
not just the first term.
.NH 2
Statements (section 9)
.NH 3
Assignments (section 9.1)
.PP
The Report does not define the evaluation order in an assignment.
Our compiler certainly chooses an evaluation order,
but it is explicitly left undefined.
Therefore,
programs that depend on it may cease to work later.
.PP
The types \f(CWINTEGER\fP and \f(CWCARDINAL\fP are assignment-compatible with
\f(CWLONGINT\fP,
and \f(CWREAL\fP is assignment-compatible with \f(CWLONGREAL\fP.
.NH 3
Case statements (section 9.5)
.PP
The size of the type of the case-expression must be less than or equal to
the word-size.
.PP
The Report does not specify what happens if the value of the case-expression
does not occur as a label of any case,
and there is no \f(CWELSE\fP-part.
In our implementation,
this results in a runtime error.
.NH 3
For statements (section 9.8)
.PP
The Report does not specify the legal types for a control variable.
Our implementation allows the basic types (except \f(CWREAL\fP),
enumeration types,
and subranges.
A runtime warning is generated when the value of the control variable
is changed by the statement sequence that forms the body of the loop,
unless runtime checking is disabled.
.NH 3
Return and exit statements (section 9.11)
.PP
The Report does not specify which result-types are legal.
Our implementation allows any result type.
.NH 2
Procedure declarations (section 10)
.PP
Function procedures must exit through a RETURN statement,
or a runtime error occurs.
.NH 3
Standard procedures (section 10.2)
.PP
Our implementation supports \f(CWNEW\fP and \f(CWDISPOSE\fP
for backwards compatibility,
but issues warnings for their use.
However,
see section 3.
.PP
Also,
some new standard procedures were added,
similar to the new standard procedures in Wirth's newest compiler:
.IP \-
\f(CWLONG\fP converts an argument of type \f(CWINTEGER\fP or \f(CWREAL\fP to the
types \f(CWLONGINT\fP or \f(CWLONGREAL\fP.
.IP \-
\f(CWSHORT\fP performs the inverse transformation,
without range checks.
.IP \-
\f(CWFLOATD\fP is analogous to \f(CWFLOAT\fP,
but yields a result of type
\f(CWLONGREAL\fP.
.IP \-
\f(CWTRUNCD\fP is analogous to \f(CWTRUNC\fP,
but yields a result of type
\f(CWLONGINT\fP.
.NH 2
System-dependent facilities (section 12)
.PP
The type \f(CWBYTE\fP is added to the \f(CWSYSTEM\fP module.
It occupies a storage unit of 8 bits.
\f(CWARRAY OF BYTE\fP has a similar effect to \f(CWARRAY OF WORD\fP,
but is safer.
In some obscure cases the \f(CWARRAY OF WORD\fP mechanism does not quite
work properly.
.PP
The procedure \f(CWIOTRANSFER\fP is not implemented.
.NH 1
Backwards compatibility
.PP
Besides recognizing the language as described in [1],
the compiler recognizes most of the language described in [2],
for backwards compatibility.
It warns the user for old-fashioned
constructions (constructions that [1] does not allow).
If the \fI-Rm2-3\fP option (see [6]) is passed to \fIack\fP,
this backwards compatibility feature is disabled.
Also,
it may not be present on some
smaller machines,
like the PDP-11.
.NH 1
Compile time errors
.PP
The compile time error messages are intended to be self-explanatory,
and not listed here.
The compiler also sometimes issues warnings,
recognizable by a warning-classification between parentheses.
Currently,
there are 3 classifications:
.IP "(old-fashioned use)"
.br
These warnings are given on constructions that are not allowed by [1],
but are allowed by [2].
.IP (strict)
.br
These warnings are given on constructions that are supported by the
ACK Modula-2 compiler,
but might not be supported by others.
Examples: functions returning structured types,
SET types of subranges with
negative lower bound.
.IP (warning)
.br
The other warnings,
such as warnings about variables that are never assigned,
never used,
etc.
.NH 1
Runtime errors
.PP
The ACK Modula-2 compiler produces code for an EM machine as defined in [3].
Therefore,
it depends on the implementation
of the EM machine for detection some of the runtime errors that could occur.
.PP
The \fITraps\fP module enables the user to install his own runtime
error handler.
The default one just displays what happened and exits.
Basically,
a trap handler is just a procedure that takes an INTEGER as
parameter.
The INTEGER is the trap number.
This INTEGER can be one of the
EM trap numbers,
listed in [3],
or one of the numbers listed in the
\fITraps\fP definition module.
.PP
The following runtime errors may occur:
.IP "array bound error"
.br
The detection of this error depends on the EM implementation.
.IP "range bound error"
.br
Range bound errors are always detected,
unless runtime checks are disabled.
.IP "set bound error"
.br
The detection of this error depends on the EM implementation.
The current implementations detect this error.
.IP "integer overflow"
.br
The detection of this error depends on the EM implementation.
.IP "cardinal overflow"
.br
This error is detected,
unless runtime checks are disabled.
.IP "cardinal underflow"
.br
This error is detected,
unless runtime checks are disabled.
.IP "real overflow"
.br
The detection of this error depends on the EM implementation.
.IP "real underflow"
.br
The detection of this error depends on the EM implementation.
.IP "divide by 0"
.br
The detection of this error depends on the EM implementation.
.IP "divide by 0.0"
.br
The detection of this error depends on the EM implementation.
.IP "undefined integer"
.br
The detection of this error depends on the EM implementation.
.IP "undefined real"
.br
The detection of this error depends on the EM implementation.
.IP "conversion error"
.br
This error occurs when assigning a negative value of type INTEGER to a
variable of type CARDINAL,
or when assigning a value of CARDINAL that is > MAX(INTEGER),
to a variable of type INTEGER.
It is detected,
unless runtime checking is disabled.
.IP "stack overflow"
.br
The detection of this error depends on the EM implementation.
.IP "heap overflow"
.br
The detection of this error depends on the EM implementation.
Might happen when ALLOCATE fails.
.IP "case error"
.br
This error occurs when non of the cases in a CASE statement are selected,
and the CASE statement has no ELSE part.
The detection of this error depends on the EM implementation.
All current EM implementations detect this error.
.IP "stack size of process too large"
.br
This is most likely to happen if the reserved space for a coroutine stack
is too small.
In this case,
increase the size of the area given to
\f(CWNEWPROCESS\fP.
It can also happen if the stack needed for the main
process is too large and there are coroutines.
In this case,
the only fix is to reduce the stack size needed by the main process,
f.i. by avoiding local arrays.
.IP "too many nested traps + handlers"
.br
This error can only occur when the user has installed his own trap handler.
It means that during execution of the trap handler another trap has occurred,
and that several times.
In some cases,
this is an error because of overflow of some internal tables.
.IP "no RETURN from function procedure"
.br
This error occurs when a function procedure does not return properly
("falls" through).
.IP "illegal instruction"
.br
This error might occur when floating point operations are used on an
implementation that does not have floating point.
.PP
In addition,
some of the library modules may give error messages.
The \fBTraps\fP-module has a suitable mechanism for this.
.NH 1
Calling the compiler
.PP
See [4,5,6] for a detailed explanation.
.PP
The compiler itself has no version checking mechanism.
A special linker
would be needed to do that.
Therefore,
a makefile generator is included [7].
.NH 1
The procedure call interface
.PP
Parameters are pushed on the stack in reversed order,
so that the EM AB
(argument base) register indicates the first parameter.
For VAR parameters,
its address is passed,
for value parameters its value.
The only exception to this rule is with conformant arrays.
For conformant arrays,
the address is passed,
and an array descriptor is
passed.
The descriptor is an EM array descriptor.
It consists of three
fields: the lower bound (always 0),
upper bound - lower bound,
and the size of the elements.
The descriptor is pushed first.
If the parameter is a value parameter,
the called routine must make sure
that its value is never changed,
for instance by making its own copy
of the array.
The Modula-2 compiler does exactly this.
.PP
When the size of the return value of a function procedure is larger than
the maximum of \f(CWSIZE(LONGREAL)\fP and twice the pointer-size,
the caller reserves this space on the stack,
above the parameters.
Callee then stores
its result there,
and returns no other value.
.NH 1
References
.IP [1]
Niklaus Wirth,
.I
Programming in Modula-2, third, corrected edition,
.R
Springer-Verlag, Berlin (1985)
.IP [2]
Niklaus Wirth,
.I
Programming in Modula-2,
.R
Stringer-Verlag, Berlin (1983)
.IP [3]
A.S.Tanenbaum, J.W.Stevenson, Hans van Staveren, E.G.Keizer,
.I
Description of a machine architecture for use with block structured languages,
.R
Informatica rapport IR-81, Vrije Universiteit, Amsterdam
.IP [4]
UNIX manual \fIack\fP(1)
.IP [5]
UNIX manual \fImodula-2\fP(1)
.IP [6]
UNIX manual \fIem_m2\fP(6)
.IP [7]
UNIX manual \fIm2mm\fP(1)