This is the Info file bison.info, produced by Makeinfo version 4.2 from bison.texinfo. This manual is for GNU Bison (version 1.75, 14 October 2002), the GNU parser generator. Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999, 2000, 2001, 2002 Free Software Foundation, Inc. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover texts being "A GNU Manual," and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled "GNU Free Documentation License." (a) The FSF's Back-Cover Text is: "You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development." INFO-DIR-SECTION GNU programming tools START-INFO-DIR-ENTRY * bison: (bison). GNU parser generator (yacc replacement). END-INFO-DIR-ENTRY File: bison.info, Node: Semantic Tokens, Next: Lexical Tie-ins, Up: Context Dependency Semantic Info in Token Types ============================ The C language has a context dependency: the way an identifier is used depends on what its current meaning is. For example, consider this: foo (x); This looks like a function call statement, but if `foo' is a typedef name, then this is actually a declaration of `x'. How can a Bison parser for C decide how to parse this input? The method used in GNU C is to have two different token types, `IDENTIFIER' and `TYPENAME'. When `yylex' finds an identifier, it looks up the current declaration of the identifier in order to decide which token type to return: `TYPENAME' if the identifier is declared as a typedef, `IDENTIFIER' otherwise. The grammar rules can then express the context dependency by the choice of token type to recognize. `IDENTIFIER' is accepted as an expression, but `TYPENAME' is not. 
`TYPENAME' can start a declaration, but `IDENTIFIER' cannot. In contexts where the meaning of the identifier is _not_ significant, such as in declarations that can shadow a typedef name, either `TYPENAME' or `IDENTIFIER' is accepted--there is one rule for each of the two token types. This technique is simple to use if the decision of which kinds of identifiers to allow is made at a place close to where the identifier is parsed. But in C this is not always so: C allows a declaration to redeclare a typedef name provided an explicit type has been specified earlier: typedef int foo, bar, lose; static foo (bar); /* redeclare `bar' as static variable */ static int foo (lose); /* redeclare `foo' as function */ Unfortunately, the name being declared is separated from the declaration construct itself by a complicated syntactic structure--the "declarator". As a result, part of the Bison parser for C needs to be duplicated, with all the nonterminal names changed: once for parsing a declaration in which a typedef name can be redefined, and once for parsing a declaration in which that can't be done. Here is a part of the duplication, with actions omitted for brevity: initdcl: declarator maybeasm '=' init | declarator maybeasm ; notype_initdcl: notype_declarator maybeasm '=' init | notype_declarator maybeasm ; Here `initdcl' can redeclare a typedef name, but `notype_initdcl' cannot. The distinction between `declarator' and `notype_declarator' is the same sort of thing. There is some similarity between this technique and a lexical tie-in (described next), in that information which alters the lexical analysis is changed during parsing by other parts of the program. The difference is here the information is global, and is used for other purposes in the program. A true lexical tie-in has a special-purpose flag controlled by the syntactic context. 
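As a rough illustration of the token-type choice made in `yylex', here is a minimal sketch in C. The table `typedef_names' and the functions `is_typedef_name' and `classify_identifier' are hypothetical names invented for this example; they are not part of GNU C or Bison. The token codes are placeholders for the ones Bison would generate from `%token' declarations.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder token codes; in a real parser Bison generates these
   from the %token declarations.  */
enum { IDENTIFIER = 258, TYPENAME = 259 };

/* Hypothetical table of names currently declared as typedefs.  A real
   compiler would consult its scoped symbol table instead.  */
static const char *typedef_names[] = { "foo", "bar", "lose" };

/* Return nonzero if NAME is currently declared as a typedef.  */
int
is_typedef_name (const char *name)
{
  size_t i;
  for (i = 0; i < sizeof typedef_names / sizeof typedef_names[0]; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return 1;
  return 0;
}

/* What yylex would do once it has scanned an identifier: choose the
   token type from the identifier's current declaration.  */
int
classify_identifier (const char *name)
{
  return is_typedef_name (name) ? TYPENAME : IDENTIFIER;
}
```

With this sketch, `classify_identifier ("foo")' yields `TYPENAME' while `classify_identifier ("x")' yields `IDENTIFIER', so the grammar can tell a declaration from a call. The fixed table is only a stand-in: the information really comes from the declarations seen so far, which is why it is global rather than a special-purpose flag.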
File: bison.info, Node: Lexical Tie-ins, Next: Tie-in Recovery, Prev: Semantic Tokens, Up: Context Dependency Lexical Tie-ins =============== One way to handle context-dependency is the "lexical tie-in": a flag which is set by Bison actions, whose purpose is to alter the way tokens are parsed. For example, suppose we have a language vaguely like C, but with a special construct `hex (HEX-EXPR)'. After the keyword `hex' comes an expression in parentheses in which all integers are hexadecimal. In particular, the token `a1b' must be treated as an integer rather than as an identifier if it appears in that context. Here is how you can do it: %{ int hexflag; %} %% ... expr: IDENTIFIER | constant | HEX '(' { hexflag = 1; } expr ')' { hexflag = 0; $$ = $4; } | expr '+' expr { $$ = make_sum ($1, $3); } ... ; constant: INTEGER | STRING ; Here we assume that `yylex' looks at the value of `hexflag'; when it is nonzero, all integers are parsed in hexadecimal, and tokens starting with letters are parsed as integers if possible. The declaration of `hexflag' shown in the prologue of the parser file is needed to make it accessible to the actions (*note The Prologue: Prologue.). You must also write the code in `yylex' to obey the flag. File: bison.info, Node: Tie-in Recovery, Prev: Lexical Tie-ins, Up: Context Dependency Lexical Tie-ins and Error Recovery ================================== Lexical tie-ins make strict demands on any error recovery rules you have. *Note Error Recovery::. The reason for this is that the purpose of an error recovery rule is to abort the parsing of one construct and resume in some larger construct. For example, in C-like languages, a typical error recovery rule is to skip tokens until the next semicolon, and then start a new statement, like this: stmt: expr ';' | IF '(' expr ')' stmt { ... } ... 
error ';' { hexflag = 0; } ; If there is a syntax error in the middle of a `hex (EXPR)' construct, this error rule will apply, and then the action for the completed `hex (EXPR)' will never run. So `hexflag' would remain set for the entire rest of the input, or until the next `hex' keyword, causing identifiers to be misinterpreted as integers. To avoid this problem the error recovery rule itself clears `hexflag'. There may also be an error recovery rule that works within expressions. For example, there could be a rule which applies within parentheses and skips to the close-parenthesis: expr: ... | '(' expr ')' { $$ = $2; } | '(' error ')' ... If this rule acts within the `hex' construct, it is not going to abort that construct (since it applies to an inner level of parentheses within the construct). Therefore, it should not clear the flag: the rest of the `hex' construct should be parsed with the flag still in effect. What if there is an error recovery rule which might abort out of the `hex' construct or might not, depending on circumstances? There is no way you can write the action to determine whether a `hex' construct is being aborted or not. So if you are using a lexical tie-in, you had better make sure your error recovery rules are not of this kind. Each rule must be such that you can be sure that it always will, or always won't, have to clear the flag. File: bison.info, Node: Debugging, Next: Invocation, Prev: Context Dependency, Up: Top Debugging Your Parser ********************* Developing a parser can be a challenge, especially if you don't understand the algorithm (*note The Bison Parser Algorithm: Algorithm.). Even so, sometimes a detailed description of the automaton can help (*note Understanding Your Parser: Understanding.), or tracing the execution of the parser can give some insight on why it behaves improperly (*note Tracing Your Parser: Tracing.). * Menu: * Understanding:: Understanding the structure of your parser. 
* Tracing:: Tracing the execution of your parser. File: bison.info, Node: Understanding, Next: Tracing, Up: Debugging Understanding Your Parser ========================= As documented elsewhere (*note The Bison Parser Algorithm: Algorithm.) Bison parsers are "shift/reduce automata". In some cases (much more frequent than one would hope), looking at this automaton is required to tune or simply fix a parser. Bison provides two different representations of it, either textually or graphically (as a VCG file). The textual file is generated when the options `--report' or `--verbose' are specified, see *Note Invoking Bison: Invocation. Its name is made by removing `.tab.c' or `.c' from the parser output file name, and adding `.output' instead. Therefore, if the input file is `foo.y', then the parser file is called `foo.tab.c' by default. As a consequence, the verbose output file is called `foo.output'. The following grammar file, `calc.y', will be used in the sequel: %token NUM STR %left '+' '-' %left '*' %% exp: exp '+' exp | exp '-' exp | exp '*' exp | exp '/' exp | NUM ; useless: STR; %% `bison' reports: calc.y: warning: 1 useless nonterminal and 1 useless rule calc.y:11.1-7: warning: useless nonterminal: useless calc.y:11.8-12: warning: useless rule: useless: STR calc.y contains 7 shift/reduce conflicts. When given `--report=state', in addition to `calc.tab.c', it creates a file `calc.output' with contents detailed below. The order of the output and the exact presentation might vary, but the interpretation is the same. The first section includes details on conflicts that were solved thanks to precedence and/or associativity: Conflict in state 8 between rule 2 and token '+' resolved as reduce. Conflict in state 8 between rule 2 and token '-' resolved as reduce. Conflict in state 8 between rule 2 and token '*' resolved as shift. ... The next section lists states that still have conflicts. State 8 contains 1 shift/reduce conflict. State 9 contains 1 shift/reduce conflict. 
State 10 contains 1 shift/reduce conflict. State 11 contains 4 shift/reduce conflicts. The next section reports useless tokens, nonterminals and rules. Useless nonterminals and rules are removed in order to produce a smaller parser, but useless tokens are preserved, since they might be used by the scanner (note the difference between "useless" and "not used" below): Useless nonterminals: useless Terminals which are not used: STR Useless rules: #6 useless: STR; The next section reproduces the exact grammar that Bison used: Grammar Number, Line, Rule 0 5 $accept -> exp $end 1 5 exp -> exp '+' exp 2 6 exp -> exp '-' exp 3 7 exp -> exp '*' exp 4 8 exp -> exp '/' exp 5 9 exp -> NUM and reports the uses of the symbols: Terminals, with rules where they appear $end (0) 0 '*' (42) 3 '+' (43) 1 '-' (45) 2 '/' (47) 4 error (256) NUM (258) 5 Nonterminals, with rules where they appear $accept (8) on left: 0 exp (9) on left: 1 2 3 4 5, on right: 0 1 2 3 4 Bison then proceeds onto the automaton itself, describing each state with its set of "items", also known as "pointed rules". Each item is a production rule together with a point (marked by `.') that is the input cursor. state 0 $accept -> . exp $ (rule 0) NUM shift, and go to state 1 exp go to state 2 This reads as follows: "state 0 corresponds to being at the very beginning of the parsing, in the initial rule, right before the start symbol (here, `exp'). When the parser returns to this state right after having reduced a rule that produced an `exp', the control flow jumps to state 2. If there is no such transition on a nonterminal symbol, and the lookahead is a `NUM', then this token is shifted on the parse stack, and the control flow jumps to state 1. Any other lookahead triggers a parse error." Even though the only active rule in state 0 seems to be rule 0, the report lists `NUM' as a lookahead symbol because `NUM' can be at the beginning of any rule deriving an `exp'. 
By default Bison reports the so-called "core" or "kernel" of the item set, but if you want to see more detail you can invoke `bison' with `--report=itemset' to list all the items, including those that can be derived: state 0 $accept -> . exp $ (rule 0) exp -> . exp '+' exp (rule 1) exp -> . exp '-' exp (rule 2) exp -> . exp '*' exp (rule 3) exp -> . exp '/' exp (rule 4) exp -> . NUM (rule 5) NUM shift, and go to state 1 exp go to state 2 In the state 1... state 1 exp -> NUM . (rule 5) $default reduce using rule 5 (exp) the rule 5, `exp: NUM;', is completed. Whatever the lookahead (`$default'), the parser will reduce it. If it was coming from state 0, then, after this reduction it will return to state 0, and will jump to state 2 (`exp: go to state 2'). state 2 $accept -> exp . $ (rule 0) exp -> exp . '+' exp (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) $ shift, and go to state 3 '+' shift, and go to state 4 '-' shift, and go to state 5 '*' shift, and go to state 6 '/' shift, and go to state 7 In state 2, the automaton can only shift a symbol. For instance, because of the item `exp -> exp . '+' exp', if the lookahead is `+', it will be shifted on the parse stack, and the automaton control will jump to state 4, corresponding to the item `exp -> exp '+' . exp'. Since there is no default action, any other token than those listed above will trigger a parse error. The state 3 is named the "final state", or the "accepting state": state 3 $accept -> exp $ . (rule 0) $default accept the initial rule is completed (the start symbol and the end of input were read), the parsing exits successfully. The interpretation of states 4 to 7 is straightforward, and is left to the reader. state 4 exp -> exp '+' . exp (rule 1) NUM shift, and go to state 1 exp go to state 8 state 5 exp -> exp '-' . exp (rule 2) NUM shift, and go to state 1 exp go to state 9 state 6 exp -> exp '*' . 
exp (rule 3) NUM shift, and go to state 1 exp go to state 10 state 7 exp -> exp '/' . exp (rule 4) NUM shift, and go to state 1 exp go to state 11 As was announced in the beginning of the report, `State 8 contains 1 shift/reduce conflict': state 8 exp -> exp . '+' exp (rule 1) exp -> exp '+' exp . (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 1 (exp)] $default reduce using rule 1 (exp) Indeed, there are two actions associated with the lookahead `/': either shifting (and going to state 7), or reducing rule 1. The conflict means that either the grammar is ambiguous, or the parser lacks information to make the right decision. Indeed the grammar is ambiguous, as, since we did not specify the precedence of `/', the sentence `NUM + NUM / NUM' can be parsed as `NUM + (NUM / NUM)', which corresponds to shifting `/', or as `(NUM + NUM) / NUM', which corresponds to reducing rule 1. Because in LALR(1) parsing a single decision can be made, Bison arbitrarily chose to disable the reduction, see *Note Shift/Reduce Conflicts: Shift/Reduce. Discarded actions are reported in between square brackets. Note that all the previous states had a single possible action: either shifting the next token and going to the corresponding state, or reducing a single rule. In the other cases, i.e., when shifting _and_ reducing is possible or when _several_ reductions are possible, the lookahead is required to select the action. State 8 is one such state: if the lookahead is `*' or `/' then the action is shifting, otherwise the action is reducing rule 1. In other words, the first two items, corresponding to rule 1, are not eligible when the lookahead is `*', since we specified that `*' has higher precedence than `+'. More generally, some items are eligible only with some set of possible lookaheads. 
When run with `--report=lookahead', Bison specifies these lookaheads: state 8 exp -> exp . '+' exp [$, '+', '-', '/'] (rule 1) exp -> exp '+' exp . [$, '+', '-', '/'] (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 1 (exp)] $default reduce using rule 1 (exp) The remaining states are similar: state 9 exp -> exp . '+' exp (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp '-' exp . (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) '*' shift, and go to state 6 '/' shift, and go to state 7 '/' [reduce using rule 2 (exp)] $default reduce using rule 2 (exp) state 10 exp -> exp . '+' exp (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp '*' exp . (rule 3) exp -> exp . '/' exp (rule 4) '/' shift, and go to state 7 '/' [reduce using rule 3 (exp)] $default reduce using rule 3 (exp) state 11 exp -> exp . '+' exp (rule 1) exp -> exp . '-' exp (rule 2) exp -> exp . '*' exp (rule 3) exp -> exp . '/' exp (rule 4) exp -> exp '/' exp . (rule 4) '+' shift, and go to state 4 '-' shift, and go to state 5 '*' shift, and go to state 6 '/' shift, and go to state 7 '+' [reduce using rule 4 (exp)] '-' [reduce using rule 4 (exp)] '*' [reduce using rule 4 (exp)] '/' [reduce using rule 4 (exp)] $default reduce using rule 4 (exp) Observe that state 11 contains conflicts due to the lack of precedence of `/' wrt `+', `-', and `*', but also because the associativity of `/' is not specified. File: bison.info, Node: Tracing, Prev: Understanding, Up: Debugging Tracing Your Parser =================== If a Bison grammar compiles properly but doesn't do what you want when it runs, the `yydebug' parser-trace feature can help you figure out why. There are several means to enable compilation of trace facilities: the macro `YYDEBUG' Define the macro `YYDEBUG' to a nonzero value when you compile the parser. 
This is compliant with POSIX Yacc. You could use `-DYYDEBUG=1' as a compiler option or you could put `#define YYDEBUG 1' in the prologue of the grammar file (*note The Prologue: Prologue.). the option `-t', `--debug' Use the `-t' option when you run Bison (*note Invoking Bison: Invocation.). This is POSIX compliant too. the directive `%debug' Add the `%debug' directive (*note Bison Declaration Summary: Decl Summary.). This is a Bison extension, which will prove useful when Bison outputs parsers for languages that don't use a preprocessor. Unless POSIX and Yacc portability matter to you, this is the preferred solution. We suggest that you always enable the debug option so that debugging is always possible. The trace facility outputs messages with macro calls of the form `YYFPRINTF (stderr, FORMAT, ARGS)' where FORMAT and ARGS are the usual `printf' format and arguments. If you define `YYDEBUG' to a nonzero value but do not define `YYFPRINTF', `<stdio.h>' is automatically included and `YYFPRINTF' is defined to `fprintf'. Once you have compiled the program with trace facilities, the way to request a trace is to store a nonzero value in the variable `yydebug'. You can do this by making the C code do it (in `main', perhaps), or you can alter the value with a C debugger. Each step taken by the parser when `yydebug' is nonzero produces a line or two of trace information, written on `stderr'. The trace messages tell you these things: * Each time the parser calls `yylex', what kind of token was read. * Each time a token is shifted, the depth and complete contents of the state stack (*note Parser States::). * Each time a rule is reduced, which rule it is, and the complete contents of the state stack afterward. To make sense of this information, it helps to refer to the listing file produced by the Bison `-v' option (*note Invoking Bison: Invocation.). 
This file shows the meaning of each state in terms of positions in various rules, and also what each state will do with each possible input token. As you read the successive trace messages, you can see that the parser is functioning according to its specification in the listing file. Eventually you will arrive at the place where something undesirable happens, and you will see which parts of the grammar are to blame. The parser file is a C program and you can use C debuggers on it, but it's not easy to interpret what it is doing. The parser function is a finite-state machine interpreter, and aside from the actions it executes the same code over and over. Only the values of variables show where in the grammar it is working. The debugging information normally gives the token type of each token read, but not its semantic value. You can optionally define a macro named `YYPRINT' to provide a way to print the value. If you define `YYPRINT', it should take three arguments. The parser will pass a standard I/O stream, the numeric code for the token type, and the token value (from `yylval'). Here is an example of `YYPRINT' suitable for the multi-function calculator (*note Declarations for `mfcalc': Mfcalc Decl.): #define YYPRINT(file, type, value) yyprint (file, type, value) static void yyprint (FILE *file, int type, YYSTYPE value) { if (type == VAR) fprintf (file, " %s", value.tptr->name); else if (type == NUM) fprintf (file, " %g", value.val); } File: bison.info, Node: Invocation, Next: Table of Symbols, Prev: Debugging, Up: Top Invoking Bison ************** The usual way to invoke Bison is as follows: bison INFILE Here INFILE is the grammar file name, which usually ends in `.y'. The parser file's name is made by replacing the `.y' with `.tab.c'. Thus, the `bison foo.y' filename yields `foo.tab.c', and the `bison hack/foo.y' filename yields `hack/foo.tab.c'. 
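The default naming rule for a grammar file ending in `.y' can be sketched in C as follows. The function `parser_file_name' is a hypothetical helper written for this illustration; it is not how Bison itself computes the name, and it only handles the plain `.y' case.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write into OUT (of size OUTSIZE) the default
   parser file name for the grammar file INFILE, replacing a trailing
   `.y' with `.tab.c'.  Return 0 on success, -1 if INFILE does not end
   in `.y' or OUT is too small.  */
int
parser_file_name (const char *infile, char *out, size_t outsize)
{
  size_t len = strlen (infile);
  int n;

  if (len < 2 || strcmp (infile + len - 2, ".y") != 0)
    return -1;

  /* Copy everything before `.y', then append `.tab.c'.  */
  n = snprintf (out, outsize, "%.*s.tab.c", (int) (len - 2), infile);
  return (n < 0 || (size_t) n >= outsize) ? -1 : 0;
}
```

Given `"hack/foo.y"', the sketch leaves the directory part untouched and fills the buffer with `hack/foo.tab.c'.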
It's also possible, in case you are writing C++ code instead of C in your grammar file, to name it `foo.ypp' or `foo.y++'. Then the output files take an extension matching that of the input (respectively `foo.tab.cpp' and `foo.tab.c++'). This feature takes effect with all options that manipulate filenames like `-o' or `-d'. For example: bison -d INFILE.YXX will produce `infile.tab.cxx' and `infile.tab.hxx', and bison -d -o OUTPUT.C++ INFILE.Y will produce `output.c++' and `output.h++'. * Menu: * Bison Options:: All the options described in detail, in alphabetical order by short options. * Option Cross Key:: Alphabetical list of long options. * VMS Invocation:: Bison command syntax on VMS. File: bison.info, Node: Bison Options, Next: Option Cross Key, Up: Invocation Bison Options ============= Bison supports both traditional single-letter options and mnemonic long option names. Long option names are indicated with `--' instead of `-'. Abbreviations for option names are allowed as long as they are unique. When a long option takes an argument, like `--file-prefix', connect the option name and the argument with `='. Here is a list of options that can be used with Bison, alphabetized by short option. It is followed by a cross key alphabetized by long option. Operation modes: `-h' `--help' Print a summary of the command-line options to Bison and exit. `-V' `--version' Print the version number of Bison and exit. `-y' `--yacc' Equivalent to `-o y.tab.c'; the parser output file is called `y.tab.c', and the other outputs are called `y.output' and `y.tab.h'. The purpose of this option is to imitate Yacc's output file name conventions. Thus, the following shell script can substitute for Yacc: bison -y $* Tuning the parser: `-S FILE' `--skeleton=FILE' Specify the skeleton to use. You probably don't need this option unless you are developing Bison. 
`-t' `--debug' In the parser file, define the macro `YYDEBUG' to 1 if it is not already defined, so that the debugging facilities are compiled. *Note Tracing Your Parser: Tracing. `--locations' Pretend that `%locations' was specified. *Note Decl Summary::. `-p PREFIX' `--name-prefix=PREFIX' Pretend that `%name-prefix="PREFIX"' was specified. *Note Decl Summary::. `-l' `--no-lines' Don't put any `#line' preprocessor commands in the parser file. Ordinarily Bison puts them in the parser file so that the C compiler and debuggers will associate errors with your source file, the grammar file. This option causes them to associate errors with the parser file, treating it as an independent source file in its own right. `-n' `--no-parser' Pretend that `%no-parser' was specified. *Note Decl Summary::. `-k' `--token-table' Pretend that `%token-table' was specified. *Note Decl Summary::. Adjust the output: `-d' `--defines' Pretend that `%defines' was specified, i.e., write an extra output file containing macro definitions for the token type names defined in the grammar and the semantic value type `YYSTYPE', as well as a few `extern' variable declarations. *Note Decl Summary::. `--defines=DEFINES-FILE' Same as above, but save in the file DEFINES-FILE. `-b FILE-PREFIX' `--file-prefix=PREFIX' Pretend that `%file-prefix="PREFIX"' was specified, i.e., specify the prefix to use for all Bison output file names. *Note Decl Summary::. `-r THINGS' `--report=THINGS' Write an extra output file containing a verbose description of the comma-separated list of THINGS among: `state' Description of the grammar, conflicts (resolved and unresolved), and LALR automaton. `lookahead' Implies `state' and augments the description of the automaton with each rule's lookahead set. `itemset' Implies `state' and augments the description of the automaton with the full set of items for each state, instead of its core only. 
`-v' `--verbose' Pretend that `%verbose' was specified, i.e., write an extra output file containing verbose descriptions of the grammar and parser. *Note Decl Summary::. `-o FILENAME' `--output=FILENAME' Specify the FILENAME for the parser file. The other output files' names are constructed from FILENAME as described under the `-v' and `-d' options. `-g' Output a VCG definition of the LALR(1) grammar automaton computed by Bison. If the grammar file is `foo.y', the VCG output file will be `foo.vcg'. `--graph=GRAPH-FILE' The behavior of `--graph' is the same as that of `-g'. The only difference is that it takes an optional argument, the name of the output graph file. File: bison.info, Node: Option Cross Key, Next: VMS Invocation, Prev: Bison Options, Up: Invocation Option Cross Key ================ Here is a list of options, alphabetized by long option, to help you find the corresponding short option. --debug -t --defines=DEFINES-FILE -d --file-prefix=PREFIX -b FILE-PREFIX --graph=GRAPH-FILE -g --help -h --name-prefix=PREFIX -p NAME-PREFIX --no-lines -l --no-parser -n --output=OUTFILE -o OUTFILE --token-table -k --verbose -v --version -V --yacc -y File: bison.info, Node: VMS Invocation, Prev: Option Cross Key, Up: Invocation Invoking Bison under VMS ======================== The command line syntax for Bison on VMS is a variant of the usual Bison command syntax--adapted to fit VMS conventions. To find the VMS equivalent for any Bison option, start with the long option, and substitute a `/' for the leading `--', and substitute a `_' for each `-' in the name of the long option. For example, the following invocation under VMS: bison /debug/name_prefix=bar foo.y is equivalent to the following command under POSIX. bison --debug --name-prefix=bar foo.y The VMS file system does not permit filenames such as `foo.tab.c'. In the above example, the output file would instead be named `foo_tab.c'. 
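The substitution rule for VMS option names can be written out as a small C function. This is only a sketch to make the rule concrete: `vms_option' is a hypothetical helper, not something Bison provides, and it assumes that any argument after `=' is passed through unchanged, as in the `name_prefix=bar' example.

```c
#include <string.h>

/* Hypothetical helper: translate the POSIX long option LONGOPT (such as
   "--name-prefix=bar") into its VMS spelling in OUT.  Only the option
   name is rewritten; any argument after `=' is copied unchanged (an
   assumption of this sketch).  Return 0 on success, -1 if LONGOPT does
   not start with `--' or OUT is too small.  */
int
vms_option (const char *longopt, char *out, size_t outsize)
{
  size_t i, len;
  int in_name = 1;

  if (strncmp (longopt, "--", 2) != 0)
    return -1;
  longopt += 2;                 /* drop the leading `--' */
  len = strlen (longopt);
  if (outsize < len + 2)        /* `/' + text + terminating NUL */
    return -1;

  out[0] = '/';
  for (i = 0; i < len; i++)
    {
      char c = longopt[i];
      if (c == '=')
        in_name = 0;            /* the argument starts here */
      out[i + 1] = (in_name && c == '-') ? '_' : c;
    }
  out[len + 1] = '\0';
  return 0;
}
```

Applied to `--debug' and `--name-prefix=bar', the sketch produces `/debug' and `/name_prefix=bar', matching the example invocation above.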
File: bison.info, Node: FAQ, Next: Copying This Manual, Prev: Glossary, Up: Top Frequently Asked Questions ************************** Several questions about Bison come up occasionally. Here some of them are addressed. * Menu: * Parser Stack Overflow:: Breaking the Stack Limits File: bison.info, Node: Parser Stack Overflow, Up: FAQ Parser Stack Overflow ===================== My parser returns with error with a `parser stack overflow' message. What can I do? This question is already addressed elsewhere, *Note Recursive Rules: Recursion. File: bison.info, Node: Table of Symbols, Next: Glossary, Prev: Invocation, Up: Top Bison Symbols ************* `@$' In an action, the location of the left-hand side of the rule. *Note Locations Overview: Locations. `@N' In an action, the location of the N-th symbol of the right-hand side of the rule. *Note Locations Overview: Locations. `$$' In an action, the semantic value of the left-hand side of the rule. *Note Actions::. `$N' In an action, the semantic value of the N-th symbol of the right-hand side of the rule. *Note Actions::. `$accept' The predefined nonterminal whose only rule is `$accept: START $end', where START is the start symbol. *Note The Start-Symbol: Start Decl. It cannot be used in the grammar. `$end' The predefined token marking the end of the token stream. It cannot be used in the grammar. `$undefined' The predefined token onto which all undefined values returned by `yylex' are mapped. It cannot be used in the grammar, rather, use `error'. `error' A token name reserved for error recovery. This token may be used in grammar rules so as to allow the Bison parser to recognize an error in the grammar without halting the process. In effect, a sentence containing an error may be recognized as valid. On a parse error, the token `error' becomes the current look-ahead token. Actions corresponding to `error' are then executed, and the look-ahead token is reset to the token that originally caused the violation. 
*Note Error Recovery::. `YYABORT' Macro to pretend that an unrecoverable syntax error has occurred, by making `yyparse' return 1 immediately. The error reporting function `yyerror' is not called. *Note The Parser Function `yyparse': Parser Function. `YYACCEPT' Macro to pretend that a complete utterance of the language has been read, by making `yyparse' return 0 immediately. *Note The Parser Function `yyparse': Parser Function. `YYBACKUP' Macro to discard a value from the parser stack and fake a look-ahead token. *Note Special Features for Use in Actions: Action Features. `YYDEBUG' Macro to define to equip the parser with tracing code. *Note Tracing Your Parser: Tracing. `YYERROR' Macro to pretend that a syntax error has just been detected: call `yyerror' and then perform normal error recovery if possible (*note Error Recovery::), or (if recovery is impossible) make `yyparse' return 1. *Note Error Recovery::. `YYERROR_VERBOSE' Macro that you define with `#define' in the Bison declarations section to request verbose, specific error message strings when `yyerror' is called. `YYINITDEPTH' Macro for specifying the initial size of the parser stack. *Note Stack Overflow::. `YYLEX_PARAM' Macro for specifying an extra argument (or list of extra arguments) for `yyparse' to pass to `yylex'. *Note Calling Conventions for Pure Parsers: Pure Calling. `YYLTYPE' Macro for the data type of `yylloc'; a structure with four members. *Note Data Types of Locations: Location Type. `yyltype' Default value for YYLTYPE. `YYMAXDEPTH' Macro for specifying the maximum size of the parser stack. *Note Stack Overflow::. `YYPARSE_PARAM' Macro for specifying the name of a parameter that `yyparse' should accept. *Note Calling Conventions for Pure Parsers: Pure Calling. `YYRECOVERING' Macro whose value indicates whether the parser is recovering from a syntax error. *Note Special Features for Use in Actions: Action Features. `YYSTACK_USE_ALLOCA' Macro used to control the use of `alloca'. 
If defined to `0', the parser will not use `alloca' but `malloc' when trying to grow its internal stacks. Do _not_ define `YYSTACK_USE_ALLOCA' to anything else. `YYSTYPE' Macro for the data type of semantic values; `int' by default. *Note Data Types of Semantic Values: Value Type. `yychar' External integer variable that contains the integer value of the current look-ahead token. (In a pure parser, it is a local variable within `yyparse'.) Error-recovery rule actions may examine this variable. *Note Special Features for Use in Actions: Action Features. `yyclearin' Macro used in error-recovery rule actions. It clears the previous look-ahead token. *Note Error Recovery::. `yydebug' External integer variable set to zero by default. If `yydebug' is given a nonzero value, the parser will output information on input symbols and parser action. *Note Tracing Your Parser: Tracing. `yyerrok' Macro to cause parser to recover immediately to its normal mode after a parse error. *Note Error Recovery::. `yyerror' User-supplied function to be called by `yyparse' on error. The function receives one argument, a pointer to a character string containing an error message. *Note The Error Reporting Function `yyerror': Error Reporting. `yylex' User-supplied lexical analyzer function, called with no arguments to get the next token. *Note The Lexical Analyzer Function `yylex': Lexical. `yylval' External variable in which `yylex' should place the semantic value associated with a token. (In a pure parser, it is a local variable within `yyparse', and its address is passed to `yylex'.) *Note Semantic Values of Tokens: Token Values. `yylloc' External variable in which `yylex' should place the line and column numbers associated with a token. (In a pure parser, it is a local variable within `yyparse', and its address is passed to `yylex'.) You can ignore this variable if you don't use the `@' feature in the grammar actions. *Note Textual Positions of Tokens: Token Positions. 

`yynerrs'
     Global variable which Bison increments each time there is a parse
     error. (In a pure parser, it is a local variable within
     `yyparse'.) *Note The Error Reporting Function `yyerror': Error
     Reporting.

`yyparse'
     The parser function produced by Bison; call this function to start
     parsing. *Note The Parser Function `yyparse': Parser Function.

`%debug'
     Equip the parser for debugging. *Note Decl Summary::.

`%defines'
     Bison declaration to create a header file meant for the scanner.
     *Note Decl Summary::.

`%dprec'
     Bison declaration to assign a precedence to a rule that is used at
     parse time to resolve reduce/reduce conflicts. *Note GLR
     Parsers::.

`%file-prefix="PREFIX"'
     Bison declaration to set the prefix of the output files. *Note
     Decl Summary::.

`%glr-parser'
     Bison declaration to produce a GLR parser. *Note GLR Parsers::.

`%left'
     Bison declaration to assign left associativity to token(s). *Note
     Operator Precedence: Precedence Decl.

`%merge'
     Bison declaration to assign a merging function to a rule. If there
     is a reduce/reduce conflict with a rule having the same merging
     function, the function is applied to the two semantic values to
     get a single result. *Note GLR Parsers::.

`%name-prefix="PREFIX"'
     Bison declaration to rename the external symbols. *Note Decl
     Summary::.

`%no-lines'
     Bison declaration to avoid generating `#line' directives in the
     parser file. *Note Decl Summary::.

`%nonassoc'
     Bison declaration to assign non-associativity to token(s). *Note
     Operator Precedence: Precedence Decl.

`%output="FILENAME"'
     Bison declaration to set the name of the parser file. *Note Decl
     Summary::.

`%prec'
     Bison declaration to assign a precedence to a specific rule. *Note
     Context-Dependent Precedence: Contextual Precedence.

`%pure-parser'
     Bison declaration to request a pure (reentrant) parser. *Note A
     Pure (Reentrant) Parser: Pure Decl.

`%right'
     Bison declaration to assign right associativity to token(s). *Note
     Operator Precedence: Precedence Decl.

`%start'
     Bison declaration to specify the start symbol. *Note The
     Start-Symbol: Start Decl.

`%token'
     Bison declaration to declare token(s) without specifying
     precedence. *Note Token Type Names: Token Decl.

`%token-table'
     Bison declaration to include a token name table in the parser
     file. *Note Decl Summary::.

`%type'
     Bison declaration to declare nonterminals. *Note Nonterminal
     Symbols: Type Decl.

`%union'
     Bison declaration to specify several possible data types for
     semantic values. *Note The Collection of Value Types: Union Decl.

These are the punctuation and delimiters used in Bison input:

`%%'
     Delimiter used to separate the grammar rule section from the Bison
     declarations section or the epilogue. *Note The Overall Layout of
     a Bison Grammar: Grammar Layout.

`%{ %}'
     All code listed between `%{' and `%}' is copied directly to the
     output file uninterpreted. Such code forms the prologue of the
     input file. *Note Outline of a Bison Grammar: Grammar Outline.

`/*...*/'
     Comment delimiters, as in C.

`:'
     Separates a rule's result from its components. *Note Syntax of
     Grammar Rules: Rules.

`;'
     Terminates a rule. *Note Syntax of Grammar Rules: Rules.

`|'
     Separates alternate rules for the same result nonterminal. *Note
     Syntax of Grammar Rules: Rules.


File: bison.info, Node: Glossary, Next: FAQ, Prev: Table of Symbols, Up: Top

Glossary
********

Backus-Naur Form (BNF)
     Formal method of specifying context-free grammars. BNF was first
     used in the `ALGOL-60' report, 1963. *Note Languages and
     Context-Free Grammars: Language and Grammar.

Context-free grammars
     Grammars specified as rules that can be applied regardless of
     context. Thus, if there is a rule which says that an integer can
     be used as an expression, integers are allowed _anywhere_ an
     expression is permitted. *Note Languages and Context-Free
     Grammars: Language and Grammar.

Dynamic allocation
     Allocation of memory that occurs during execution, rather than at
     compile time or on entry to a function.

Empty string
     Analogous to the empty set in set theory, the empty string is a
     character string of length zero.

Finite-state stack machine
     A "machine" that has discrete states in which it is said to exist
     at each instant in time. As input to the machine is processed, the
     machine moves from state to state as specified by the logic of the
     machine. In the case of the parser, the input is the language
     being parsed, and the states correspond to various stages in the
     grammar rules. *Note The Bison Parser Algorithm: Algorithm.

Generalized LR (GLR)
     A parsing algorithm that can handle all context-free grammars,
     including those that are not LALR(1). It resolves situations that
     Bison's usual LALR(1) algorithm cannot by effectively splitting
     off multiple parsers, trying all possible parsers, and discarding
     those that fail in the light of additional right context. *Note
     Generalized LR Parsing: Generalized LR Parsing.

Grouping
     A language construct that is (in general) grammatically divisible;
     for example, `expression' or `declaration' in C. *Note Languages
     and Context-Free Grammars: Language and Grammar.

Infix operator
     An arithmetic operator that is placed between the operands on
     which it performs some operation.

Input stream
     A continuous flow of data between devices or programs.

Language construct
     One of the typical usage schemas of the language. For example, one
     of the constructs of the C language is the `if' statement. *Note
     Languages and Context-Free Grammars: Language and Grammar.

Left associativity
     Operators having left associativity are analyzed from left to
     right: `a+b+c' first computes `a+b' and then combines with `c'.
     *Note Operator Precedence: Precedence.

Left recursion
     A rule whose result symbol is also its first component symbol; for
     example, `expseq1 : expseq1 ',' exp;'. *Note Recursive Rules:
     Recursion.

Left-to-right parsing
     Parsing a sentence of a language by analyzing it token by token
     from left to right. *Note The Bison Parser Algorithm: Algorithm.

Lexical analyzer (scanner)
     A function that reads an input stream and returns tokens one by
     one. *Note The Lexical Analyzer Function `yylex': Lexical.

Lexical tie-in
     A flag, set by actions in the grammar rules, which alters the way
     tokens are parsed. *Note Lexical Tie-ins::.

Literal string token
     A token which consists of two or more fixed characters. *Note
     Symbols::.

Look-ahead token
     A token already read but not yet shifted. *Note Look-Ahead Tokens:
     Look-Ahead.

LALR(1)
     The class of context-free grammars that Bison (like most other
     parser generators) can handle; a subset of LR(1). *Note Mysterious
     Reduce/Reduce Conflicts: Mystery Conflicts.

LR(1)
     The class of context-free grammars in which at most one token of
     look-ahead is needed to disambiguate the parsing of any piece of
     input.

Nonterminal symbol
     A grammar symbol standing for a grammatical construct that can be
     expressed through rules in terms of smaller constructs; in other
     words, a construct that is not a token. *Note Symbols::.

Parse error
     An error encountered during parsing of an input stream due to
     invalid syntax. *Note Error Recovery::.

Parser
     A function that recognizes valid sentences of a language by
     analyzing the syntax structure of a set of tokens passed to it
     from a lexical analyzer.

Postfix operator
     An arithmetic operator that is placed after the operands upon
     which it performs some operation.

Reduction
     Replacing a string of nonterminals and/or terminals with a single
     nonterminal, according to a grammar rule. *Note The Bison Parser
     Algorithm: Algorithm.

Reentrant
     A reentrant subprogram is a subprogram which can be invoked any
     number of times in parallel, without interference between the
     various invocations. *Note A Pure (Reentrant) Parser: Pure Decl.

Reverse polish notation
     A language in which all operators are postfix operators.

Right recursion
     A rule whose result symbol is also its last component symbol; for
     example, `expseq1: exp ',' expseq1;'. *Note Recursive Rules:
     Recursion.

Semantics
     In computer languages, the semantics are specified by the actions
     taken for each instance of the language, i.e., the meaning of each
     statement. *Note Defining Language Semantics: Semantics.

Shift
     A parser is said to shift when it makes the choice of analyzing
     further input from the stream rather than reducing immediately
     some already-recognized rule. *Note The Bison Parser Algorithm:
     Algorithm.

Single-character literal
     A single character that is recognized and interpreted as is. *Note
     From Formal Rules to Bison Input: Grammar in Bison.

Start symbol
     The nonterminal symbol that stands for a complete valid utterance
     in the language being parsed. The start symbol is usually listed
     as the first nonterminal symbol in a language specification. *Note
     The Start-Symbol: Start Decl.

Symbol table
     A data structure where symbol names and associated data are stored
     during parsing to allow for recognition and use of existing
     information in repeated uses of a symbol. *Note Multi-function
     Calc::.

Token
     A basic, grammatically indivisible unit of a language. The symbol
     that describes a token in the grammar is a terminal symbol. The
     input of the Bison parser is a stream of tokens which comes from
     the lexical analyzer. *Note Symbols::.

Terminal symbol
     A grammar symbol that has no rules in the grammar and therefore is
     grammatically indivisible. The piece of text it represents is a
     token. *Note Languages and Context-Free Grammars: Language and
     Grammar.


File: bison.info, Node: Copying This Manual, Next: Index, Prev: FAQ, Up: Top

Copying This Manual
*******************

* Menu:

* GNU Free Documentation License:: License for copying this manual.