.EQ delim $$ .EN .RP .TL ACK/CEM Compiler .br Reference Manual .AU Erik H. Baalbergen .AI Department of Mathematics and Computer Science Vrije Universiteit Amsterdam The Netherlands .AB no .AE .NH C Language .PP This section discusses the extensions to and deviations from the C language, as described in [1]. The issues are numbered according to the reference manual. .SH 2.2 Identifiers .PP Upper and lower case letters are different. The number of significant letters is 32 by default, but may be set to another value using the \fB\-M\fP option. The identifier length should be set according to the rest of the compilation programs. .SH 2.3 Keywords .SH \f5asm\fP .PP The keyword \f5asm\fP is recognized. However, the statement .DS .ft 5 asm(string); .ft R .DE is skipped, while a warning is given. .SH \f5enum\fP .PP The \f5enum\fP keyword is recognized and interpreted. .SH \f5entry\fP, \f5fortran\fP .PP The words \f5entry\fP and \f5fortran\fP are reserved under the restricted option. The words are not interpreted by the compiler. .SH 2.4.1 Integer Constants .PP An octal or hex constant which is less than or equal to the largest unsigned (target) machine integer is taken to be \f5unsigned\fP. An octal or hex constant which exceeds the largest unsigned (target) machine integer is taken to be \f5long\fP. .SH 2.4.3 Character Constants .PP A character constant is a sequence of 1 up to \f5sizeof(int)\fP characters enclosed in single quotes. The value of a character constant '$c sub 1 c sub 2 ... c sub n$' is $d sub n + M \(mu d sub {n - 1} + ... + M sup {n - 1} \(mu d sub 2 + M sup n \(mu d sub 1$, where M is 1 + maximum unsigned number representable in an \f5unsigned char\fP, and $d sub i$ is the signed value (ASCII) of character $c sub i$. .SH 2.4.4 Floating Constants .PP The compiler does not support compile-time floating point arithmetic. .SH 2.6 Hardware characteristics .PP The compiler is capable of producing EM code for machines with the following properties .IP \(bu a \f5char\fP is 8 bits .IP \(bu the size of \f5int\fP is equal to the word size .IP \(bu the size of \f5short\fP may not exceed the size of \f5int\fP .IP \(bu the size of \f5int\fP may not exceed the size of \f5long\fP .IP \(bu the size of pointers is equal to the size of either \f5short\fP, \f5int\fP or \f5long\fP .LP .SH 4 What's in a name? .SH \f5char\fP .PP Objects of type \f5char\fP are taken to be signed. The combination \f5unsigned char\fP is legal. .SH \f5unsigned\fP .PP The type combinations \f5unsigned char\fP, \f5unsigned short\fP and \f5unsigned long\fP are supported. .SH \f5enum\fP .PP The data type \f5enum\fP is implemented as described in \fIRecent Changes to C\fP (see appendix A). .I Cem treats enumeration variables as if they were \f5int\fP. .SH \f5void\fP .PP Type \f5void\fP is implemented. The type specifies an empty set of values, which takes no storage space. .SH \fRFundamental types\fP .PP The names of the fundamental types can be redefined by the user, using \f5typedef\fP. .SH 7 Expressions .PP The order of evaluation of expressions depends on the complexity of the subexpressions. In case of commutative operations, the most complex subexpression is evaluated first. Parameter lists are evaluated from right to left. .SH 7.2 Unary operators .PP The type of a \f5sizeof\fP expression is \f5unsigned int\fP. .SH 7.13 Conditional operator .PP Both the second and the third expression in a conditional expression may include assignment operators. They may be structs or unions. .SH 7.14 Assignment operators .PP Structures may be assigned, passed as arguments to functions, and returned by functions. The types of operands taking part must be the same. .SH 8.2 Type specifiers .PP The combinations \f5unsigned char\fP, \f5unsigned short\fP and \f5unsigned long\fP are implemented. .SH 8.5 Structure and union declarations .PP Fields of any integral type, either signed or unsigned, are supported, as long as the type fits in a word on the target machine. .PP Fields are left adjusted by default; the first field is put into the left part of a word, the next one on the right side of the first one, etc. The \f5-Vr\fP option in the call of the compiler causes fields to be right adjusted within a machine word. .PP The tags of structs and unions occupy a different name space from that of variables and that of member names. .SH 9.7 Switch statement .PP The type of \fIexpression\fP in .DS .ft 5 \f5switch (\fP\fIexpression\fP\f5)\fP \fIstatement\fP .ft .DE must be integral. A warning is given under the restricted option if the type is \f5long\fP. .SH 10 External definitions .PP See [4] for a discussion on this complicated issue. .SH 10.1 External function definitions .PP Structures may be passed as arguments to functions, and returned by functions. .SH 11.1 Lexical scope .PP Typedef names may be redeclared like any other variable name; the ice mentioned in \(sc11.1 is walked correctly. .SH 12 Compiler control lines .PP Lines which do not occur within comment, and with \f5#\fP as first character, are interpreted as compiler control line. There may be an arbitrary number of spaces, tabs and comments (collectively referred as \fIwhite space\fP) following the \f5#\fP. Comments may contain newline characters. Control lines with only white space between the \f5#\fP and the line separator are skipped. .PP The #\f5include\fP, #\f5ifdef\fP, #\f5ifndef\fP, #\f5undef\fP, #\f5else\fP and #\f5endif\fP control lines and line directives consist of a fixed number of arguments. The list of arguments may be followed an arbitrary sequence of characters, in which comment is interpreted as such. (I.e., the text between \f5/*\fP and \f5*/\fP is skipped, regardless of newlines; note that commented-out lines beginning with \f5#\fP are not considered to be control lines.) .SH 12.1 Token replacement .PP The replacement text of macros is taken to be a string of characters, in which an identifier may stand for a formal parameter, and in which comment is interpreted as such. Comments and newline characters, preceeded by a backslash, in the replacement text are replaced by a space character. .PP The actual parameters of a macro are considered tokens and are balanced with regard to \f5()\fP, \f5{}\fP and \f5[]\fP. This prevents the use of macros like .DS .ft 5 CTL([) .ft .DE .PP Formal parameters of a macro must have unique names within the formal-parameter list of that macro. .PP A message is given at the definition of a macro if the macro has already been #\f5defined\fP, while the number of formal parameters differ or the replacement texts are not equal (apart from leading and trailing white space). .PP Recursive use of macros is detected by the compiler. .PP Standard #\f5defined\fP macros are .DS \f5__FILE__\fP name of current input file as string constant \f5__DATE__\fP curent date as string constant; e.g. \f5"Tue Wed 2 14:45:23 1986"\fP \f5__LINE__\fP current line number as an integer .DE .PP No message is given if \fIidentifier\fP is not known in .DS .ft 5 #undef \fIidentifier\fP .ft .DE .SH 12.2 File inclusion .PP A newline character is appended to each file which is included. .SH 12.3 Conditional compilation .PP The #\f5if\fP, #\f5ifdef\fP and #\f5ifndef\fP control lines may be followed by an arbitrary number of .DS .ft 5 #elif \fIconstant-expression\fP .ft .DE control lines, before the corresponding #\f5else\fP or #\f5endif\fP is encountered. The construct .DS .ft 5 #elif \fIconstant-expression\fP some text #endif /* corresponding to #elif */ .ft .DE is equivalent to .DS .ft 5 #else #if \fIconstant-expression\fP some text #endif /* corresponding to #if */ #endif /* corresponding to #else */ .ft .DE .PP The \fIconstant-expression\fP in #\f5if\fP and #\f5elif\fP control lines may contain the construction .DS .ft 5 defined(\fIidentifier\fP) .ft .DE which is replaced by \f51\fP, if \fIidentifier\fP has been #\f5defined\fP, and by \f50\fP, if not. .PP Comments in skipped lines are interpreted as such. .SH 12.4 Line control .PP Line directives may occur in the following forms: .DS .ft 5 #line \fIconstant\fP #line \fIconstant\fP "\fIfilename\fP" #\fIconstant\fP #\fIconstant\fP "\fIfilename\fP" .ft .DE Note that \fIfilename\fP is enclosed in double quotes. .SH 14.2 Functions .PP If a pointer to a function is called, the function the pointer points to is called instead. .SH 15 Constant expressions .PP The compiler distinguishes the following types of integral constant expressions .IP \(bu field-width specifier .IP \(bu case-entry specifier .IP \(bu array-size specifier .IP \(bu global variable initialization value .IP \(bu enum-value specifier .IP \(bu truth value in \f5#if\fP control line .LP .PP Constant integral expressions are compile-time evaluated while an effort is made to report overflow. Constant floating expressions are not compile-time evaluated. .NH Compiler flags .IP \fB\-C\fR Run the preprocessor stand-alone while maintaining the comments. Line directives are produced whenever needed. .IP \fB\-D\fP\fIname\fP=\fIstring-of-characters\fP .br Define \fIname\fR as macro with \fIstring-of-characters\fR as replacement text. .IP \fB\-D\fP\fIname\fP .br Equal to \fB\-D\fP\fIname\fP\fB=1\fP. .IP \fB\-E\fP Run the preprocessor stand alone, i.e., list the sequence of input tokens and delete any comments. Line directives are produced whenever needed. .IP \fB\-I\fIpath\fR .br Prepend \fIpath\fR to the list of include directories. To put the directories "include", "sys/h" and "util/h" into the include directory list in that order, the user has to specify .DS .ft 5 -Iinclude -Isys/h -Iutil/h .ft R .DE An empty \fIpath\fP causes the standard include directory (usually \f5/usr/include\fP) to be forgotten. .IP \fB\-M\fP\fIn\fP .br Set maximum significant identifier length to \fIn\fP. .IP \fB\-n\fP Suppress EM register messages. The user-declared variables are not stored into registers on the target machine. .IP \fB\-p\fP Generate the EM \fBfil\fP and \fBlin\fP instructions in order to enable an interpreter to keep track of the current location in the source code. .IP \fB\-P\fP Equivalent with \fB\-E\fP, but without line directives. .IP \fB\-R\fP Interpret the input as restricted C (according to the language as described in [1]). .IP \fB\-T\fP\fIpath\fP .br Create temporary files, if necessary, in directory \fIpath\fP. .IP \fB\-U\fP\fIname\fP .br Get rid of the compiler-predefined macro \fIname\fP, i.e., consider .DS .ft 5 #undef \fIname\fP .ft R .DE to appear in the beginning of the file. .IP \fB\-V\fIcm\fR.\fIn\fR,\ \fB\-V\fIcm\fR.\fIncm\fR.\fIn\fR\ ... .br Set the size and alignment requirements. The letter \fIc\fR indicates the simple type, which is one of \fBs\fR(short), \fBi\fR(int), \fBl\fR(long), \fBf\fR(float), \fBd\fR(double) or \fBp\fR(pointer). If \fIc\fR is \fBS\fP or \fBU\fP, then \fIn\fP is taken to be the initial alignment of structs or unions, respectively. The effective alignment of a struct or union is the least common multiple of the initial struct/union alignment and the alignments of its members. The \fIm\fR parameter can be used to specify the length of the type (in bytes) and the \fIn\fR parameter for the alignment of that type. Absence of \fIm\fR or \fIn\fR causes the default value to be retained. To specify that the bitfields should be right adjusted instead of the default left adjustment, specify \fBr\fR as \fIc\fR parameter. .IP \fB\-w\fR Suppress warning messages .IP \fB\-\-\fIcharacter\fR .br Set debug-flag \fIcharacter\fP. This enables some special features offered by a debug and develop version of the compiler. Some particular flags may be recognized, others may have surprising effects. .RS .IP \fBd\fP Generate a dependency graph, reflecting the calling structure of functions. Lines of the form .DS .ft 5 DFA: \fIcalling-function\fP: \fIcalled-function\fP .ft .DE are generated whenever a function call is encountered. .IP \fBf\fP Dump whole identifier table, including macros and reserved words. .IP \fBh\fP Supply hash-table statistics. .IP \fBi\fP Print names of included files. .IP \fBm\fP Supply statistics concerning the memory allocation. .IP \fBt\fP Dump table of identifiers. .IP \fBu\fP Generate extra statistics concerning the predefined types and identifiers. Works in combination with \fBf\fP or \fBt\fP. .IP \fBx\fP Print expression trees in human-readable format. .RE .LP .SH References .IP [1] Brian W. Kernighan, Dennis M. Ritchie, .I The C Programming Language .R .IP [2] L. Rosler, .I Draft Proposed Standard - Programming Language C, .R ANSI X3J11 Language Subcommittee .IP [3] Erik H. Baalbergen, Dick Grune, Maarten Waage, .I The CEM Compiler, .R Informatica Manual IM-4, Dept. of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands .IP [4] Erik H. Baalbergen, .I Modeling global declarations in C, .R internal paper .LP .bp .SH Appendix A - Enumeration Type .PP The syntax is .sp .RS .I enum-specifier : .RS \&\f5enum\fP { \fIenum-list\fP } .br \&\f5enum\fP \fIidentifier\fP { \fIenum-list\fP } .br \&\f5enum\fP \fIidentifier\fP .RE .sp \&\fIenum-list\fP : .RS \&\fIenumerator\fP .br \&\fIenum-list\fP , \fIenumerator\fP .RE .sp \&\fIenumerator\fP : .RS \&\fIidentifier\fP .br \&\fIidentifier\fP = \fIconstant-expression\fP .RE .sp .RE The identifier has the same role as the structure tag in a struct specification. It names a particular enumeration type. .PP The identifiers in the enum-list are declared as constants, and may appear whenever constants are required. If no enumerators with .B = appear, then the values of the constants begin at 0 and increase by 1 as the declaration is read from left to right. An enumerator with .B = gives the associated identifier the value indicated; subsequent identifiers continue the progression from the assigned value. .PP Enumeration tags and constants must all be distinct, and, unlike structure tags and members, are drawn from the same set as ordinary identifiers. .PP Objects of a given enumeration type are regarded as having a type distinct from objects of all other types. .bp .SH Appendix B: C grammar in LL(1) form .PP The \fBbold-faced\fP and \fIitalicized\fP tokens represent terminal symbols. .vs 16 .nf \fBexternal definitions\fP program: external-definition* external-definition: ext-decl-specifiers [declarator [function | non-function] | '\fB;\fP'] | asm-statement ext-decl-specifiers: decl-specifiers? non-function: initializer? ['\fB,\fP' init-declarator]* '\fB;\fP' function: declaration* compound-statement .sp 1 \fBdeclarations\fP declaration: decl-specifiers init-declarator-list? '\fB;\fP' decl-specifiers: other-specifier+ [single-type-specifier other-specifier*]? | single-type-specifier other-specifier* other-specifier: \fBauto\fP | \fBstatic\fP | \fBextern\fP | \fBtypedef\fP | \fBregister\fP | \fBshort\fP | \fBlong\fP | \fBunsigned\fP type-specifier: decl-specifiers single-type-specifier: \fItype-identifier\fP | struct-or-union-specifier | enum-specifier init-declarator-list: init-declarator ['\fB,\fP' init-declarator]* init-declarator: declarator initializer? declarator: primary-declarator ['\fB(\fP' formal-list ? '\fB)\fP' | arrayer]* | '\fB*\fP' declarator primary-declarator: identifier | '\fB(\fP' declarator '\fB)\fP' arrayer: '\fB[\fP' constant-expression? '\fB]\fP' formal-list: formal ['\fB,\fP' formal]* formal: identifier enum-specifier: \fBenum\fP [enumerator-pack | identifier enumerator-pack?] enumerator-pack: '\fB{\fP' enumerator ['\fB,\fP' enumerator]* '\fB,\fP'? '\fB}\fP' enumerator: identifier ['\fB=\fP' constant-expression]? struct-or-union-specifier: [ \fBstruct\fP | \fBunion\fP] [ struct-declaration-pack | identifier struct-declaration-pack?] struct-declaration-pack: '\fB{\fP' struct-declaration+ '\fB}\fP' struct-declaration: type-specifier struct-declarator-list '\fB;\fP'? struct-declarator-list: struct-declarator ['\fB,\fP' struct-declarator]* struct-declarator: declarator bit-expression? | bit-expression bit-expression: '\fB:\fP' constant-expression initializer: '\fB=\fP'? initial-value cast: '\fB(\fP' type-specifier abstract-declarator '\fB)\fP' abstract-declarator: primary-abstract-declarator ['\fB(\fP' '\fB)\fP' | arrayer]* | '\fB*\fP' abstract-declarator primary-abstract-declarator: ['\fB(\fP' abstract-declarator '\fB)\fP']? .sp 1 \fBstatements\fP statement: expression-statement | label '\fB:\fP' statement | compound-statement | if-statement | while-statement | do-statement | for-statement | switch-statement | case-statement | default-statement | break-statement | continue-statement | return-statement | jump | '\fB;\fP' | asm-statement ; expression-statement: expression '\fB;\fP' label: identifier if-statement: \fBif\fP '\fB(\fP' expression '\fB)\fP' statement [\fBelse\fP statement]? while-statement: \fBwhile\fP '\fB(\fP' expression '\fB)\fP' statement do-statement: \fBdo\fP statement \fBwhile\fP '\fB(\fP' expression '\fB)\fP' '\fB;\fP' for-statement: \fBfor\fP '\fB(\fP' expression? '\fB;\fP' expression? '\fB;\fP' expression? '\fB)\fP' statement switch-statement: \fBswitch\fP '\fB(\fP' expression '\fB)\fP' statement case-statement: \fBcase\fP constant-expression '\fB:\fP' statement default-statement: \fBdefault\fP '\fB:\fP' statement break-statement: \fBbreak\fP '\fB;\fP' continue-statement: \fBcontinue\fP '\fB;\fP' return-statement: \fBreturn\fP expression? '\fB;\fP' jump: \fBgoto\fP identifier '\fB;\fP' compound-statement: '\fB{\fP' declaration* statement* '\fB}\fP' asm-statement: \fBasm\fP '\fB(\fP' \fIstring\fP '\fB)\fP' '\fB;\fP' .sp 1 \fBexpressions\fP initial-value: assignment-expression | initial-value-pack initial-value-pack: '\fB{\fP' initial-value-list '\fB}\fP' initial-value-list: initial-value ['\fB,\fP' initial-value]* '\fB,\fP'? primary: \fIidentifier\fP | constant | \fIstring\fP | '\fB(\fP' expression '\fB)\fP' secundary: primary [index-pack | parameter-pack | selection]* index-pack: '\fB[\fP' expression '\fB]\fP' parameter-pack: '\fB(\fP' parameter-list? '\fB)\fP' selection: ['\fB.\fP' | '\fB\->\fP'] identifier parameter-list: assignment-expression ['\fB,\fP' assignment-expression]* postfixed: secundary postop? unary: cast unary | postfixed | unop unary | size-of size-of: \fBsizeof\fP [cast | unary] binary-expression: unary [binop binary-expression]* conditional-expression: binary-expression ['\fB?\fP' expression '\fB:\fP' assignment-expression]? assignment-expression: conditional-expression [asgnop assignment-expression]? expression: assignment-expression ['\fB,\fP' assignment-expression]* unop: '\fB*\fP' | '\fB&\fP' | '\fB\-\fP' | '\fB!\fP' | '\fB~ \fP' | '\fB++\fP' | '\fB\-\-\fP' postop: '\fB++\fP' | '\fB\-\-\fP' multop: '\fB*\fP' | '\fB/\fP' | '\fB%\fP' addop: '\fB+\fP' | '\fB\-\fP' shiftop: '\fB<<\fP' | '\fB>>\fP' relop: '\fB<\fP' | '\fB>\fP' | '\fB<=\fP' | '\fB>=\fP' eqop: '\fB==\fP' | '\fB!=\fP' arithop: multop | addop | shiftop | '\fB&\fP' | '\fB^ \fP' | '\fB|\fP' binop: arithop | relop | eqop | '\fB&&\fP' | '\fB||\fP' asgnop: '\fB=\fP' | '\fB+\fP' '\fB=\fP' | '\fB\-\fP' '\fB=\fP' | '\fB*\fP' '\fB=\fP' | '\fB/\fP' '\fB=\fP' | '\fB%\fP' '\fB=\fP' | '\fB<<\fP' '\fB=\fP' | '\fB>>\fP' '\fB=\fP' | '\fB&\fP' '\fB=\fP' | '\fB^ \fP' '\fB=\fP' | '\fB|\fP' '\fB=\fP' | '\fB+=\fP' | '\fB\-=\fP' | '\fB*=\fP' | '\fB/=\fP' | '\fB%=\fP' | '\fB<<=\fP' | '\fB>>=\fP' | '\fB&=\fP' | '\fB^=\fP' | '\fB|=\fP' constant: \fIinteger\fP | \fIfloating\fP constant-expression: assignment-expression identifier: \fIidentifier\fP | \fItype-identifier\fP .fi