.\" $Id$
.\" Run this paper off with
.\" refer [options] -p LLgen.refs LLgen.doc | [n]eqn | tbl | (nt)roff -ms
.if '\*(>.'' \{\
. if '\*(<.'' \{\
. if n .ds >. .
. if n .ds >, ,
. if t .ds <. .
. if t .ds <, ,\
\}\
\}
.R1
database doc/LLgen.refs
accumulate
.R2
.ND
.EQ
delim @@
.EN
.TL
LLgen, an extended LL(1) parser generator
.AU
Ceriel J. H. Jacobs
.AI
Dept. of Mathematics and Computer Science
Vrije Universiteit
Amsterdam, The Netherlands
.AB
\fILLgen\fR provides a tool for generating an efficient recursive
descent parser with no backtrack from an Extended Context Free syntax.
The \fILLgen\fR user specifies the syntax, together with code
describing actions associated with the parsing process.
\fILLgen\fR turns this specification into a number of subroutines that
handle the parsing process.
.PP
The grammar may be ambiguous.
\fILLgen\fR contains both static and dynamic facilities
to resolve these ambiguities.
.PP
The specification can be split into several files, for each of
which \fILLgen\fR generates an output file containing the
corresponding part of the parser.
Furthermore, only output files that differ from their previous
version are updated.
Other output files are not affected in any way.
This allows the user to recompile only those output files that have changed.
.PP
The subroutine produced by \fILLgen\fR calls a user supplied routine
that must return the next token.
This way, the input to the parser can be split into single characters
or higher level tokens.
.PP
An error recovery mechanism is generated almost completely automatically.
It is based on so-called \fBdefault choices\fR, which are implicitly or
explicitly specified by the user.
.PP
\fILLgen\fR has been used successfully to create recognizers for Pascal,
C, and Modula-2.
.AE
.NH
Introduction
.PP
\fILLgen\fR provides a tool for generating an efficient recursive
descent parser with no backtrack from an Extended Context Free syntax.
A parser generated by \fILLgen\fR will be called \fILLparse\fR for the rest of this document. It is assumed that the reader has some knowledge of LL(1) grammars and recursive descent parsers. For a survey on the subject, see reference .[ ( griffiths .]). .PP Extended LL(1) parsers are an extension of LL(1) parsers. They are derived from an Extended Context-Free (ECF) syntax instead of a Context-Free (CF) syntax. ECF syntax is described in section 2. Section 3 provides an outline of a specification as accepted by \fILLgen\fR and also discusses the lexical conventions of grammar specification files. Section 4 provides a description of the way the \fILLgen\fR user can associate actions with the syntax. These actions must be written in the programming language C, .[ kernighan ritchie .] which also is the target language of \fILLgen\fR. The error recovery technique is discussed in section 5. This section also discusses what the user can do about it. Section 6 discusses the facilities \fILLgen\fR offers to resolve ambiguities and conflicts. \fILLgen\fR offers facilities to resolve them both at parser generation time and during the execution of \fILLparse\fR. Section 7 discusses the \fILLgen\fR working environment. It also discusses the lexical analyzer that must be supplied by the user. This lexical analyzer must read the input stream and break it up into basic input items, called \fBtokens\fR for the rest of this document. Appendix A gives a summary of the \fILLgen\fR input syntax. Appendix B gives an example. It is very instructive to compare this example with the one given in reference .[ ( yacc .]). It demonstrates the struggle \fILLparse\fR and other LL(1) parsers have with expressions. Appendix C gives an example of the \fILLgen\fR features allowing the user to recompile only those output files that have changed, using the \fImake\fR program. .[ make .] 
.NH The Extended Context-Free Syntax .PP The extensions of an ECF syntax with respect to an ordinary CF syntax are: .IP 1. 10 An ECF syntax contains the repetition operator: "N" (N represents a positive integer). .IP 2. 10 An ECF syntax contains the closure set operator without and with upperbound: "*" and "*N". .IP 3. 10 An ECF syntax contains the positive closure set operator without and with upperbound: "+" and "+N". .IP 4. 10 An ECF syntax contains the optional operator: "?", which is a shorthand for "*1". .IP 5. 10 An ECF syntax contains parentheses "[" and "]" which can be used for grouping. .PP We can describe the syntax of an ECF syntax with an ECF syntax : .DS .ft CW grammar : rule + ; .ft R .DE This grammar rule states that a grammar consists of one or more rules. .DS .ft CW rule : nonterminal ':' productionrule ';' ; .ft R .DE A rule consists of a left hand side, the nonterminal, followed by ":", the \fBproduce symbol\fR, followed by a production rule, followed by a ";", in\%di\%ca\%ting the end of the rule. .DS .ft CW productionrule : production [ '|' production ]* ; .ft R .DE A production rule consists of one or more alternative productions separated by "|". This symbol is called the \fBalternation symbol\fR. .DS .ft CW production : term * ; .ft R .DE A production consists of a possibly empty list of terms. So, empty productions are allowed. .DS .ft CW term : element repeats ; .ft R .DE A term is an element, possibly with a repeat specification. .DS .ft CW element : LITERAL | IDENTIFIER | '[' productionrule ']' ; .ft R .DE An element can be a LITERAL, which basically is a single character between apostrophes, it can be an IDENTIFIER, which is either a nonterminal or a token, and it can be a production rule between square parentheses. .DS .ft CW repeats : '?' | [ '*' | '+' ] NUMBER ? | NUMBER ? ; .ft R .DE These are the repeat specifications discussed above. Notice that this specification may be empty. 
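.PP
The repeat specifications listed above can be read as a pair of bounds
on how often a term may occur.
The following C fragment is only a sketch of that reading, not code taken
from \fILLgen\fR: the type "struct repeat" and the function "repeat_bounds"
are names invented for this illustration, and a missing NUMBER is assumed
to be passed as 0.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical representation of a repeat specification:
 * the term must occur at least `min` and at most `max` times. */
struct repeat {
    int min;
    int max;                /* INT_MAX stands for "no upper bound" */
};

/* op is '*', '+', '?' or 0 (no operator given);
 * n is the NUMBER that follows, or 0 when there is none. */
struct repeat repeat_bounds(int op, int n)
{
    struct repeat r;

    assert(n >= 0);
    switch (op) {
    case '*':               /* closure: zero or more, at most n */
        r.min = 0;
        r.max = n ? n : INT_MAX;
        break;
    case '+':               /* positive closure: one or more, at most n */
        r.min = 1;
        r.max = n ? n : INT_MAX;
        break;
    case '?':               /* optional: shorthand for "*1" */
        r.min = 0;
        r.max = 1;
        break;
    default:                /* "N": exactly n times; empty: exactly once */
        r.min = r.max = n ? n : 1;
        break;
    }
    return r;
}
```

With this reading, "term*3" yields the bounds 0 and 3, while a term
without any repeat specification yields 1 and 1.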
.PP The class of ECF languages is identical with the class of CF languages. However, in many cases recursive definitions of language features can now be replaced by iterative ones. This tends to reduce the number of nonterminals and gives rise to very efficient recursive descent parsers. .NH Grammar Specifications .PP The major part of a \fILLgen\fR grammar specification consists of an ECF syntax specification. Names in this syntax specification refer to either tokens or nonterminal symbols. \fILLgen\fR requires token names to be declared as such. This way it can be avoided that a typing error in a nonterminal name causes it to be accepted as a token name. The token declarations will be discussed later. A name will be regarded as a nonterminal symbol, unless it is declared as a token name. If there is no production rule for a nonterminal symbol, \fILLgen\fR will complain. .PP A grammar specification may also include some C routines, for instance the lexical analyzer and an error reporting routine. Thus, a grammar specification file can contain declarations, grammar rules and C-code. .PP Blanks, tabs and newlines are ignored, but may not appear in names or keywords. Comments may appear wherever a name is legal (which is almost everywhere). They are enclosed in /* ... */, as in C. Comments do not nest. .PP Names may be of arbitrary length, and can be made up of letters, underscore "\_" and non-initial digits. Upper and lower case letters are distinct. Only the first 50 characters are significant. Notice however, that the names for the tokens will be used by the C-preprocessor. The number of significant characters therefore depends on the underlying C-implementation. A safe rule is to make the identifiers distinct in the first six characters, case ignored. .PP There are two kinds of tokens: those that are declared and are denoted by a name, and literals. .PP A literal consists of a character enclosed in apostrophes "'". The "\e" is an escape character within literals. 
The following escapes are recognized:
.TS
center;
l l.
\&'\en'	newline
\&'\er'	return
\&'\e''	apostrophe "'"
\&'\e\e'	backslash "\e"
\&'\et'	tab
\&'\eb'	backspace
\&'\ef'	form feed
\&'\exxx'	"xxx" in octal
.TE
.PP
Names representing tokens must be declared before they are used.
This can be done using the "\fB%token\fR" keyword,
by writing
.nf
.ft CW
.sp 1
%token name1, name2, . . . ;
.ft R
.fi
.PP
\fILLparse\fR is designed to recognize special nonterminal
symbols called \fBstart symbols\fR.
\fILLgen\fR allows for more than one start symbol.
Thus, grammars with more than one entry point are accepted.
The start symbols must be declared explicitly using the
"\fB%start\fR" keyword.
It can be used whenever a declaration is legal, f.i.:
.nf
.ft CW
.sp 1
%start LLparse, specification ;
.ft R
.fi
.sp 1
declares "specification" as a start symbol and associates the
identifier "LLparse" with it.
"LLparse" will now be the name of the C-function that must be
called to recognize "specification".
.NH
Actions
.PP
\fILLgen\fR allows arbitrary insertions of actions within the right
hand side of a production rule in the ECF syntax.
An action consists of a number of C statements,
enclosed in the brackets "{" and "}".
.PP
\fILLgen\fR generates a parsing routine for each rule in the grammar.
The actions supplied by the user are just inserted in the proper place.
There may also be declarations before the statements in the
action, as the "{" and "}" are copied into the target code along
with the action.
The scope of these declarations terminates with the closing bracket "}"
of the action.
.PP
In addition to actions, it is also possible to declare local variables
in the parsing routine, which can then be used in the actions.
Such a declaration consists of a number of C variable declarations,
enclosed in the brackets "{" and "}".
It must be placed right in front of the ":" in the grammar rule.
The scope of these local variables consists of the complete grammar rule.
.PP
In order to facilitate communication between the actions and
\fILLparse\fR,
the parsing routines can be given C-like parameters.
Each parameter must be declared separately, and each of these declarations must
end with a semicolon.
For the last parameter, the semicolon is optional.
.PP
So, for example
.nf
.ft CW
.sp 1
expr(int *pval;) { int fact; } :
                /*
                 * Rule with one parameter, a pointer to an int.
                 * Parameter specifications are ordinary C declarations.
                 * One local variable, of type int.
                 */
        factor (&fact) { *pval = fact; }
                /*
                 * factor is another nonterminal symbol.
                 * One actual parameter is supplied.
                 * Notice that the parameter passing mechanism is that
                 * of C.
                 */
        [ '+' factor (&fact) { *pval += fact; } ]*
                /*
                 * remember the '*' means zero or more times
                 */
        ;
.sp 1
.ft R
.fi
is a rule to recognize a number of factors, separated by "+", and
to compute their sum.
.PP
\fILLgen\fR generates C code, so the parameter passing mechanism is that
of C, as is shown in the example above.
.PP
Actions often manipulate attributes of the token just read.
For instance, when an identifier is read, its name must be
looked up in a symbol table.
Therefore, \fILLgen\fR generates code
such that at a number of places in the grammar rule
it is defined which token has last been read.
After a token, the last token read is this token.
After a "[" or a "|", the last token read is the next token to be
accepted by \fILLparse\fR.
At all other places, it is undefined which token has last been read.
The last token read is available in the global integer variable
\fILLsymb\fR.
.PP
The user may also specify C-code wherever a \fILLgen\fR-declaration is
legal.
Again, this code must be enclosed in the brackets "{" and "}".
This way, the user can define global declarations and
C-functions.
To avoid name-conflicts with identifiers generated by \fILLgen\fR,
\fILLparse\fR only uses names beginning with "LL"; the user should avoid
such names.
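.PP
To make the correspondence between a rule and its parsing routine more
concrete, the following is a hand-written C sketch of what a recursive
descent routine for the "expr" rule above might look like.
It is not the code that \fILLgen\fR generates, and it has no error recovery.
The names "next_token", "parse_sum", "symb", "tokval" and the token number
DIGIT are assumptions made for this illustration, as is the choice to let a
factor be a single digit.

```c
#include <assert.h>
#include <ctype.h>

enum { DIGIT = 256 };           /* hypothetical token number */

static const char *input;       /* hypothetical input buffer */
static int symb;                /* last token read; plays the role of LLsymb */
static int tokval;              /* attribute of the last token read */

/* Toy lexical analyzer: digits become DIGIT, anything else is a literal. */
static void next_token(void)
{
    int c = (unsigned char)*input;

    if (c == 0) {               /* end of input */
        symb = 0;
        return;
    }
    input++;
    if (isdigit(c)) {
        symb = DIGIT;
        tokval = c - '0';
    } else {
        symb = c;
    }
}

/* factor : DIGIT ;   returns 0 on a syntax error */
static int factor(int *pval)
{
    if (symb != DIGIT) return 0;
    *pval = tokval;
    next_token();
    return 1;
}

/* expr : factor [ '+' factor ]* ; */
static int expr(int *pval)
{
    int fact;                   /* the local variable of the rule */

    if (!factor(&fact)) return 0;
    *pval = fact;
    while (symb == '+') {       /* the "*" closure becomes a loop */
        next_token();
        if (!factor(&fact)) return 0;
        *pval += fact;
    }
    return 1;
}

/* Parse a whole string; returns 1 and stores the sum on success. */
int parse_sum(const char *text, int *sum)
{
    assert(text != 0 && sum != 0);
    input = text;
    next_token();
    return expr(sum) && symb == 0;
}
```

Calling parse_sum("1+2+3", &s) accepts the input and leaves the sum in s,
whereas an input such as "1+" is rejected.
The loop in "expr" shows why the closure operator needs no extra
nonterminal and no recursion.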
.NH
Error Recovery
.PP
The error recovery technique used by \fILLgen\fR is a modification
of the one presented in reference
.[ (
automatic construction error correcting
.]).
It is based on \fBdefault choices\fR, which are just what the name says:
default choices at
every point in the grammar where there is a choice.
Thus, in an alternation, one of the productions is marked as a default choice,
and in a term with a non-fixed repetition specification there will also be a
default choice (between doing the term (once more) and continuing with the
rest of the production in which the term appears).
.PP
When \fILLparse\fR detects an error after having parsed the string @s@,
the default choices enable it to compute one syntactically correct
continuation, consisting of the tokens @t sub 1~...~t sub n@,
such that @s~t sub 1~...~t sub n@ is a string of tokens that
is a member of the language defined by the grammar.
Notice that the computation of this continuation must terminate,
which implies that the default choices may not invoke recursive rules.
.PP
At each point in this continuation, a certain number of other tokens could
also be syntactically correct, f.i. the token @t@ is syntactically correct
at point @t sub i@ in this continuation, if the string
@s~t sub 1~...~t sub i~t~s sub 1@ is a string of the language
defined by the grammar for some string @s sub 1@ and i >= 0.
.PP
The set @T@
containing all these tokens (including @t sub 1 ,~...,~t sub n@)
is computed.
Next, \fILLparse\fR discards zero or more tokens from its input,
until a token @t@ \(mo @T@ is found.
The error is then corrected by inserting i (i >= 0) tokens
@t sub 1~...~t sub i@,
such that the string @s~t sub 1~...~t sub i~t~s sub 1@
is a string of the language defined by the grammar,
for some @s sub 1@.
Then, normal parsing is resumed.
.PP
The above is difficult to implement in a recursive descent parser,
and is not the way \fILLparse\fR does it, but the effect is the same.
In fact, \fILLparse\fR maintains a list of tokens that may not be discarded, which is adjusted as \fILLparse\fR proceeds. This list is just a representation of the set @T@ mentioned above. When an error occurs, \fILLparse\fR discards tokens until a token @t@ that is a member of this list is found. Then, it continues parsing, following the default choices, inserting tokens along the way, until this token @t@ is legal. The selection of the default choices must guarantee that this will always happen. .PP The default choices are explicitly or implicitly specified by the user. By default, the default choice in an alternation is the alternative with the shortest possible terminal production. The user can select one of the other productions in the alternation as the default choice by putting the keyword "\fB%default\fR" in front of it. .PP By default, for terms with a repetition count containing "*" or "?" the default choice is to continue with the rest of the rule in which the term appears, and .sp 1 .ft CW .nf term+ .fi .ft R .sp 1 is treated as .sp 1 .nf .ft CW term term* . .ft R .fi .PP It is also clear, that it can never be the default choice to do the term (once more), because this could cause the parser to loop, inserting tokens forever. However, when the user does not want the parser to skip tokens that would not have been skipped if the term would have been the default choice, the skipping of such a term can be prevented by using the keyword "\fB%persistent\fR". For instance, the rule .sp 1 .ft CW .nf commandlist : command* ; .fi .ft R .sp 1 could be changed to .sp 1 .ft CW .nf commandlist : [ %persistent command ]* ; .fi .ft R .sp 1 The effects of this in case of a syntax error are twofold: The set @T@ mentioned above will be extended as if "command" were in the default production, so that fewer tokens will be skipped. 
Also, if the first token that is not skipped is a member of the
subset of @T@ arising from the grammar rule for "command",
\fILLparse\fR will enter that rule.
So, in fact the default choice
is determined dynamically (by \fILLparse\fR).
Again, \fILLgen\fR checks (statically)
that \fILLparse\fR will always terminate, and if not,
\fILLgen\fR will complain.
.PP
An important property of this error recovery method is that,
once a rule is started, it will be finished.
This means that all actions in the rule will be executed
normally, so that the user can be sure that there will be no
inconsistencies in his data structures because of syntax errors.
Also, as the method is in fact error correcting, the
actions in a rule only have to deal with syntactically correct input.
.NH
Ambiguities and conflicts
.PP
As \fILLgen\fR generates a recursive descent parser with no backtrack,
it must at all times be able to determine what to do,
based on the current input symbol.
Unfortunately, this cannot be done for all grammars.
Two kinds of conflicts can arise:
.IP 1) 10
the grammar rule is of the form "production1 | production2",
and \fILLparse\fR cannot decide which production to choose.
This we call an \fBalternation conflict\fR.
.IP 2) 10
the grammar rule is of the form "[ productionrule ]...",
where ... specifies a non-fixed repetition count,
and \fILLparse\fR cannot decide whether to
choose "productionrule" once more, or to continue.
This we call a \fBrepetition conflict\fR.
.PP
There can be several causes for conflicts: the grammar may be
ambiguous, or the grammar may require a more complex parser
than \fILLgen\fR can construct.
The conflicts can be examined by inspecting the verbose
(-\fBv\fR) option output file.
The conflicts can be resolved by rewriting the grammar
or by using \fBconflict resolvers\fR.
The mechanism described here is based on the attributed parsing
of reference
.[ (
milton
.]).
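.PP
One way to picture an alternation conflict is as two alternatives that can
both start with the same token.
The C sketch below tests for that situation with simple token sets.
It is a simplification, based on the usual LL(1) reasoning rather than on
the internals of \fILLgen\fR: it ignores empty productions, and the names
"struct tokset", "tokset_add", "alternation_conflict", NTOKENS and the token
numbers DIGIT and IDENT are all invented for this illustration.

```c
#include <assert.h>

#define NTOKENS 512             /* assumption: token numbers are below 512 */

enum { DIGIT = 256, IDENT = 257 };      /* hypothetical token numbers */

/* Set of tokens with which one alternative can start. */
struct tokset {
    unsigned char member[NTOKENS];
};

void tokset_add(struct tokset *s, int tok)
{
    assert(tok >= 0 && tok < NTOKENS);
    s->member[tok] = 1;
}

/* Return the lowest token on which both alternatives can start,
 * or -1 if the current input symbol always decides between them. */
int alternation_conflict(const struct tokset *a, const struct tokset *b)
{
    int t;

    for (t = 0; t < NTOKENS; t++)
        if (a->member[t] && b->member[t])
            return t;
    return -1;
}
```

For the "stat" rule of Appendix B, the first alternative starts with IDENT
and the second one (an expression) can start with "(", "-", DIGIT or IDENT,
so this check would report a conflict on IDENT.
That is the conflict the "%if" condition in the example resolves.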
.PP An alternation conflict can be resolved by putting an \fBif condition\fR in front of the first conflicting production. It consists of a "\fB%if\fR" followed by a C-expression between parentheses. \fILLparse\fR will then evaluate this expression whenever a token is met at this point on which there is a conflict, so the conflict will be resolved dynamically. If the expression evaluates to non-zero, the first conflicting production is chosen, otherwise one of the remaining ones is chosen. .PP An alternation conflict can also be resolved using the keywords "\fB%prefer\fR" or "\fB%avoid\fR". "\fB%prefer\fR" is equivalent in behaviour to "\fB%if\fR (1)". "\fB%avoid\fR" is equivalent to "\fB%if\fR (0)". In these cases however, "\fB%prefer\fR" and "\fB%avoid\fR" should be used, as they resolve the conflict statically and thus give rise to better C-code. .PP A repetition conflict can be resolved by putting a \fBwhile condition\fR right after the opening parentheses. This while condition consists of a "\fB%while\fR" followed by a C-expression between parentheses. Again, \fILLparse\fR will then evaluate this expression whenever a token is met at this point on which there is a conflict. If the expression evaluates to non-zero, the repeating part is chosen, otherwise the parser continues with the rest of the rule. Appendix B will give an example of these features. .PP A useful aid in writing conflict resolvers is the "\fB%first\fR" keyword. It is used to declare a C-macro that forms an expression returning 1 if the parameter supplied can start a specified nonterminal, f.i.: .sp 1 .nf .ft CW %first fmac, nonterm ; .ft R .sp 1 .fi declares "fmac" as a macro with one parameter, whose value is a token number. If the parameter X can start the nonterminal "nonterm", "fmac(X)" is true, otherwise it is false. .NH The LLgen working environment .PP \fILLgen\fR generates a number of files: one for each input file, and two other files: \fILpars.c\fR and \fILpars.h\fR. 
\fILpars.h\fR contains "#-define"s for the tokennames. \fILpars.c\fR contains the error recovery routines and tables. Only those output files that differ from their previous version are updated. See appendix C for a possible application of this feature. .PP The names of the output files are constructed as follows: in the input file name, the suffix after the last point is replaced by a "c". If no point is present in the input file name, ".c" is appended to it. \fILLgen\fR checks that the filename constructed this way in fact represents a previous version, or does not exist already. .PP The user must provide some environment to obtain a complete program. Routines called \fImain\fR and \fILLmessage\fR must be defined. Also, a lexical analyzer must be provided. .PP The routine \fImain\fR must be defined, as it must be in every C-program. It should eventually call one of the startsymbol routines. .PP The routine \fILLmessage\fR must accept one parameter, whose value is a token number, zero or -1. .br A zero parameter indicates that the current token (the one in the external variable \fILLsymb\fR) is deleted. .br A -1 parameter indicates that the parser expected end of file, but didn't get it. The parser will then skip tokens until end of file is detected. .br A parameter that is a token number (a positive parameter) indicates that this token is to be inserted in front of the token currently in \fILLsymb\fR. The user can give the token the proper attributes. Also, the user must take care, that the token currently in \fILLsymb\fR is again returned by the \fBnext\fR call to the lexical analyzer, with the proper attributes. So, the lexical analyzer must have a facility to push back one token. .PP The user may also supply his own error recovery routines, or handle errors differently. For this purpose, the name of a routine to be called when an error occurs may be declared using the keyword \fB%onerror\fR. This routine takes two parameters. 
The first one is either the token number of the
token expected, or 0.
In the latter case, the error occurred at a choice.
In both cases, the routine must ensure that the next call to the lexical
analyzer returns the token that replaces the current one.
Of course, that could well be the current one, in which case
\fILLparse\fR recovers from the error.
The second parameter contains a list of tokens that are not skipped at the
error point.
The list is in the form of a null-terminated array of integers,
whose address is passed.
.PP
The user must supply a lexical analyzer to read the input stream and
break it up into tokens, which are passed to \fILLparse\fR.
It should be an integer valued function, returning the token number.
The name of this function can be declared using the
"\fB%lexical\fR" keyword.
This keyword can be used wherever a declaration is legal and may appear
only once in the grammar specification, f.i.:
.sp 1
.nf
.ft CW
%lexical scanner ;
.ft R
.fi
.sp 1
declares "scanner" as the name of the lexical analyzer.
The default name for the lexical analyzer is "yylex".
The reason for this funny name is that a useful tool for constructing
lexical analyzers is the \fILex\fR program,
.[
lex
.]
which generates a routine of that name.
.PP
The token numbers are chosen by \fILLgen\fR.
The token number for a literal
is the numerical value of the character in the local character set.
If the tokens have a name,
the "#\ define" mechanism of C is used to give them a value and
to allow the lexical analyzer to return their token numbers symbolically.
These "#\ define"s are collected in the file \fILpars.h\fR which
can be "#\ include"d in any file that needs the token-names.
The maximum token number chosen is defined in the macro \fILL_MAXTOKNO\fP.
.PP
The lexical analyzer must signal the end of input to \fILLparse\fR
by returning a number less than or equal to zero.
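.PP
The requirements on the lexical analyzer can be summarized in a small C
sketch.
It is only an illustration under several assumptions: the input is taken
from a string instead of an input stream, token attributes are left out,
the token number NUMBER would normally come from \fILpars.h\fR, and the
helper names "scanner_init" and "scanner_pushback" are invented here.

```c
#include <assert.h>
#include <ctype.h>

enum { NUMBER = 257 };          /* hypothetical; normally from Lpars.h */

static const char *src = "";    /* hypothetical input buffer */
static int pushed_back;         /* nonzero when a token is waiting */
static int pushed_tok;

void scanner_init(const char *text)
{
    assert(text != 0);
    src = text;
    pushed_back = 0;
}

/* Make the next call to scanner() return tok again. */
void scanner_pushback(int tok)
{
    pushed_tok = tok;
    pushed_back = 1;
}

/* Integer valued function returning the next token number. */
int scanner(void)
{
    int c;

    if (pushed_back) {          /* hand out the saved token first */
        pushed_back = 0;
        return pushed_tok;
    }
    while (isspace((unsigned char)*src))
        src++;
    c = (unsigned char)*src;
    if (c == 0)
        return 0;               /* end of input: a value <= 0 */
    if (isdigit(c)) {
        while (isdigit((unsigned char)*src))
            src++;
        return NUMBER;
    }
    src++;
    return c;                   /* a literal is its own character value */
}
```

The one-token pushback is what an error reporting routine needs when a
token is inserted in front of the current one: it can hand the current
token back, so that the next call to the lexical analyzer returns it again.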
.NH
Programs with more than one parser
.PP
\fILLgen\fR offers a simple facility for having more than one parser in
a program: in this case, the user can change the names of global procedures,
variables, etc., by giving a different prefix, like this:
.sp 1
.nf
.ft CW
%prefix XX ;
.ft R
.fi
.sp 1
The effect of this is that all global names start with XX instead of LL, for
the parser that has this prefix.
This holds for the variable \fILLsymb\fP,
which now is called \fIXXsymb\fP, for the routine \fILLmessage\fP,
which must now be called \fIXXmessage\fP, and for the macro \fILL_MAXTOKNO\fP,
which is now called \fIXX_MAXTOKNO\fP.
\fILL.output\fP is now \fIXX.output\fP, and \fILpars.c\fP and \fILpars.h\fP
are now called \fIXXpars.c\fP and \fIXXpars.h\fP.
.bp
.SH
References
.[
$LIST$
.]
.bp
.SH
Appendix A : LLgen Input Syntax
.PP
This appendix has a description of the \fILLgen\fR input syntax,
as a \fILLgen\fR specification.
As a matter of fact, the current version of \fILLgen\fR is written with
\fILLgen\fR.
.nf
.ft CW
.sp 2
/*
 * First the declarations of the terminals
 * The order is not important
 */

%token IDENTIFIER;      /* terminal or nonterminal name */
%token NUMBER;
%token LITERAL;

/*
 * Reserved words
 */

%token TOKEN;           /* %token */
%token START;           /* %start */
%token PERSISTENT;      /* %persistent */
%token IF;              /* %if */
%token WHILE;           /* %while */
%token AVOID;           /* %avoid */
%token PREFER;          /* %prefer */
%token DEFAULT;         /* %default */
%token LEXICAL;         /* %lexical */
%token PREFIX;          /* %prefix */
%token ONERROR;         /* %onerror */
%token FIRST;           /* %first */

/*
 * Declare LLparse to be a C-routine that recognizes "specification"
 */

%start LLparse, specification;

specification
        : declaration*
        ;

declaration
        : START IDENTIFIER ',' IDENTIFIER ';'
        | '{' /* Read C-declaration here */ '}'
        | TOKEN IDENTIFIER [ ',' IDENTIFIER ]* ';'
        | FIRST IDENTIFIER ',' IDENTIFIER ';'
        | LEXICAL IDENTIFIER ';'
        | PREFIX IDENTIFIER ';'
        | ONERROR IDENTIFIER ';'
        | rule
        ;

rule    : IDENTIFIER parameters? ldecl?
          ':' productions
          ';'
        ;

ldecl   : '{' /* Read C-declaration here */ '}'
        ;

productions
        : simpleproduction [ '|' simpleproduction ]*
        ;

simpleproduction
        : DEFAULT?
          [ IF '(' /* Read C-expression here */ ')'
          | PREFER
          | AVOID
          ]?
          [ element repeats ]*
        ;

element : '{' /* Read action here */ '}'
        | '[' [ WHILE '(' /* Read C-expression here */ ')' ]?
              PERSISTENT?
              productions
          ']'
        | LITERAL
        | IDENTIFIER parameters?
        ;

parameters
        : '(' /* Read C-parameters here */ ')'
        ;

repeats : /* empty */
        | [ '*' | '+' ] NUMBER?
        | NUMBER
        | '?'
        ;
.fi
.ft R
.bp
.SH
Appendix B : An example
.PP
This example gives the complete \fILLgen\fR specification of a simple
desk calculator.
It has 26 registers, labeled "a" through "z",
and accepts arithmetic expressions made up of the C operators
+, -, *, /, %, &, and |, with their usual priorities.
The value of the expression is printed.
As in C, an integer that begins with 0 is assumed to be
octal; otherwise it is assumed to be decimal.
.PP
Although the example is short and not very complicated, it demonstrates
the use of if and while conditions.
In the example they are in fact used to reduce the number of nonterminals,
and to reduce the overhead due to the recursion that would be involved in
parsing an expression with an ordinary recursive descent parser.
In an ordinary LL(1) grammar there would be one nonterminal for each
operator priority.
The example shows how we can do it all with one nonterminal, no matter
how many priority levels there are.
.sp 1
.nf
.ft CW
{
#include <stdio.h>
#include <ctype.h>
#define MAXPRIO 5
#define prio(op) (ptab[op])

struct token {
        int t_tokno;            /* token number */
        int t_tval;             /* Its attribute */
} stok = { 0,0 }, tok;

int nerrors = 0;
int regs[26];                   /* Space for the registers */
int ptab[128];                  /* Attribute table */

struct token
nexttok() {     /* Read next token and return it */
        register c;
        struct token new;

        while ((c = getchar()) == ' ' || c == '\et') { /* nothing */ }
        if (isdigit(c)) new.t_tokno = DIGIT;
        else if (islower(c)) new.t_tokno = IDENT;
        else new.t_tokno = c;
        if (c >= 0) new.t_tval = ptab[c];
        return new;
}
}

%token DIGIT, IDENT;
%start parse, list;

list : stat* ;

stat { int ident, val; } :
        %if (stok = nexttok(), stok.t_tokno == '=')
                /* The conflict is resolved by looking one further
                 * token ahead. The grammar is LL(2)
                 */
          IDENT { ident = tok.t_tval; }
          '=' expr(1,&val) '\en'
                { if (!nerrors) regs[ident] = val; }
        | expr(1,&val) '\en'
                { if (!nerrors) printf("%d\en",val); }
        | '\en'
        ;

expr(int level; int *val;) { int expr; } :
        factor(val)
        [ %while (prio(tok.t_tokno) >= level)
                /* Swallow operators as long as their priority is
                 * larger than or equal to the level of this invocation
                 */
          '+' expr(prio('+')+1,&expr) { *val += expr; }
                /* This states that '+' groups left to right. If it
                 * should group right to left, the rule should read:
                 * '+' expr(prio('+'),&expr)
                 */
        | '-' expr(prio('-')+1,&expr) { *val -= expr; }
        | '*' expr(prio('*')+1,&expr) { *val *= expr; }
        | '/' expr(prio('/')+1,&expr) { *val /= expr; }
        | '%' expr(prio('%')+1,&expr) { *val %= expr; }
        | '&' expr(prio('&')+1,&expr) { *val &= expr; }
        | '|' expr(prio('|')+1,&expr) { *val |= expr; }
        ]*
                /* Notice the "*" here. It is important.
                 */
        ;

factor(int *val;):
          '(' expr(1,val) ')'
        | '-' expr(MAXPRIO+1,val) { *val = -*val; }
        | number(val)
        | IDENT { *val = regs[tok.t_tval]; }
        ;

number(int *val;) { int base; } :
        DIGIT { base = (*val=tok.t_tval)==0?8:10; }
        [ DIGIT { *val = base * *val + tok.t_tval; } ]*
        ;

%lexical scanner ;
{
scanner() {
        if (stok.t_tokno) {     /* a token has been inserted or read ahead */
                tok = stok;
                stok.t_tokno = 0;
                return tok.t_tokno;
        }
        if (nerrors && tok.t_tokno == '\en') {
                printf("ERROR\en");
                nerrors = 0;
        }
        tok = nexttok();
        return tok.t_tokno;
}

LLmessage(insertedtok) {
        nerrors++;
        if (insertedtok) {      /* token inserted, save old token */
                stok = tok;
                tok.t_tval = 0;
                if (insertedtok < 128) tok.t_tval = ptab[insertedtok];
        }
}

main() {
        register *p;

        for (p = ptab; p < &ptab[128]; p++) *p = 0;
        /* for letters, their attribute is their index in the regs array */
        for (p = &ptab['a']; p <= &ptab['z']; p++) *p = p - &ptab['a'];
        /* for digits, their attribute is their value */
        for (p = &ptab['0']; p <= &ptab['9']; p++) *p = p - &ptab['0'];
        /* for operators, their attribute is their priority */
        ptab['*'] = 4;
        ptab['/'] = 4;
        ptab['%'] = 4;
        ptab['+'] = 3;
        ptab['-'] = 3;
        ptab['&'] = 2;
        ptab['|'] = 1;
        parse();
        exit(nerrors);
}
}
.fi
.ft R
.bp
.SH
Appendix C. How to use \fILLgen\fR.
.PP
This appendix demonstrates how \fILLgen\fR can be used in
combination with the \fImake\fR program, to make effective use
of the \fILLgen\fR-feature that it only changes output files
when necessary.
\fIMake\fR uses a "makefile", which
is a file containing dependencies and associated commands.
A dependency usually indicates that some files depend on other
files.
When a file depends on another file and is older than that other file,
the commands associated with the dependency are executed.
.PP
So, \fImake\fR seems just the program that we always wanted.
However, it is not very good at handling programs that generate more than
one file.
As usual, there is a way around this problem.
A sample makefile follows:
.sp 1
.ft CW
.nf
# The grammar consists of the files decl.g, stat.g and expr.g.
# The ".o"-files are the result of a C-compilation.

GFILES = decl.g stat.g expr.g
OFILES = decl.o stat.o expr.o Lpars.o
LLOPT =

# As make doesn't handle programs that generate more than one
# file well, we just don't tell make about it.
# We just create a dummy file, and touch it whenever LLgen is
# executed. This way, the dummy in fact depends on the grammar
# files.
# Then, we execute make again, to do the C-compilations and
# such.

all:	dummy
	make parser

dummy:	$(GFILES)
	LLgen $(LLOPT) $(GFILES)
	touch dummy

parser:	$(OFILES)
	$(CC) -o parser $(LDFLAGS) $(OFILES)

# Some dependencies without actions :
# make already knows what to do about them

Lpars.o:	Lpars.h
stat.o:	Lpars.h
decl.o:	Lpars.h
expr.o:	Lpars.h
.fi
.ft R