| <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
| <html> |
| <head> |
| <meta http-equiv="content-type" |
| content="text/html; charset=ISO-8859-1"> |
| <title>manual.html</title> |
| </head> |
| <body> |
| <div style="text-align: center;"> <big><big><span |
| style="font-weight: bold;">DParser Manual<br> |
| </span></big></big> |
| <div style="text-align: left;"><big><br> |
| <br> |
| <span style="font-weight: bold;">Contents</span><br> |
| </big> |
| <ol> |
| <li>Installation</li> |
| <li>Getting Started</li> |
| <li>Comments<br> |
| </li> |
| <li>Productions</li> |
| <li>Global Code<br> |
| </li> |
| <li>Terminals</li> |
| <ol> |
| <li>Strings</li> |
| <li>Regular Expressions</li> |
| <li>External (C) scanners</li> |
| <li>Tokenizers</li> |
| <li>Longest Match<br> |
| </li> |
| </ol> |
| <li>Priorities and Associativity</li> |
| <ol> |
| <li>Token Priorities</li> |
| <li>Operator Priorities</li> |
| <li>Rule Priorities</li> |
| </ol> |
| <li>Actions</li> |
| <ol> |
| <li>Speculative Actions</li> |
| <li>Final Actions</li> |
| <li>Embedded<br> |
| </li> |
| <li>Pass Actions<br> |
| </li> |
| <li>Default Actions<br> |
| </li> |
| </ol> |
| <li>Attributes and Action Specifiers</li> |
| <ol> |
| <li>Global State</li> |
| <li>Parse Nodes</li> |
| <li>Misc</li> |
| </ol> |
| <li>Symbol Table</li> |
| <li>Whitespace</li> |
| <li>Ambiguities</li> |
| <li>Error Recovery</li> |
| <li>Parsing Options<br> |
| </li> |
| <li>Grammar Grammar<br> |
| </li> |
| </ol> |
| <span style="font-weight: bold;">1. Installation</span><br> |
| <br> |
| To build: |
| 'gmake' |
| (only available with source code package)<br> |
| <br> |
| To test: 'gmake |
| test' (only |
| available with source code package)<br> |
| <br> |
| To install, 'gmake install' (binary or source code |
| packages)<br> |
| <br> |
| <span style="font-weight: bold;">2. Getting Started<span |
| style="font-weight: bold;"></span><span style="font-weight: bold;"></span></span><span |
| style="font-weight: bold;"></span><br> |
| <br> |
| 2.1. Create your grammar, for example, in the file "my.g":<br> |
| <br> |
| E: E '+' E | "[abc]";<br> |
| <br> |
| 2.2. Convert grammar into parsing tables:<br> |
| <br> |
| % make_dparser -g my.g<br> |
| <br> |
| 2.3. Create a driver program, for example, in the file |
| "my.c": <br> |
| <br> |
| #include <stdio.h><br> |
| #include <dparse.h><br> |
| extern D_ParserTables parser_tables_gram;<br> |
| int<br> |
| main(int argc, char *argv[]) {<br> |
| char s[256], *ss;<br> |
| D_Parser *p = new_D_Parser(&parser_tables_gram, 0);<br> |
| if (fgets(s,255,stdin) && dparse(p, s, strlen(s)) |
| && !p->syntax_errors)<br> |
| printf("success\n");<br> |
| else<br> |
| printf("failure\n");<br> |
| }<br> |
| <br> |
| 2.4. Compile:<br> |
| <br> |
| % cc -I/usr/local/include my.c my.g.d_parser.c -L/usr/local/lib |
| -ldparse<br> |
| <br> |
| 2.5. Run:<br> |
| <br> |
| % a.out<br> |
| a=<br> |
| syntax error, '' line 1<br> |
| failure<br> |
| %<br> |
| <br> |
| % a.out<br> |
| a+b<br> |
| success<br> |
| %<br> |
| <br> |
| We'll come back to this example later.<br> |
| <br> |
| <span style="font-weight: bold;">3. Comments</span><br> |
| <br> |
| Grammars can include C/C++ style comments.<br> |
| <br> |
| EXAMPLE<br> |
| <br> |
| // My first grammar<br> |
| E: E '+' E | "[abc]";<br> |
| /* is this right? */<br> |
| <br> |
| <span style="font-weight: bold;">4. Productions</span><br> |
| <br> |
| 4.1. The first production is the root of your grammar (what you |
| will be trying to parse).<br> |
| 4.2. Productions start with the non-terminal being defined |
| followed by a colon ':', a set of right hand sides seperated by '|' |
| (or) |
| consisting of elements (non-terminals or terminals).<br> |
| 4.3. Elements can be grouped with parens '(', and the normal |
| regular expression symbols can be used ('+' '*' '?' '|').<br> |
| <br> |
| EXAMPLE<br> |
| <br> |
| program: statements+ | comment* (function | procedure)?;<br> |
| <br> |
| 4.4. <span style="font-weight: bold;">NOTE:</span> Instead of |
| using '[' ']' for optional elements we use the more familar and |
| consistent '?' operator. The square brackets are reserved for |
| speculative actions (below).<br> |
| <br> |
| <span style="font-weight: bold;">5. Global Code</span><br> |
| <br> |
| Global (or static) C code can be intermixed with productions by |
| surrounding the code with brackets '{'.<br> |
| <br> |
| EXAMPLE<br> |
| <br> |
| { void dr_s() { printf("Dr. S\n"); }<br> |
| S: 'the' 'cat' 'and' 'the' 'hat' { dr_s(); } | T;<br> |
| { void twain() { printf("Mark Twain\n"); }<br> |
| T: 'Huck' 'Finn' { twain(); };<br> |
| <br> |
| <span style="font-weight: bold;">6. Terminals</span><br> |
| <br> |
| 6.1. Strings terminals are surrounded with single quotes. |
| For example:<br> |
| <br> |
| <span style="font-weight: bold;"></span>block: '{' statements* '}';<br> |
| whileblock: 'while' '(' expression ')' block;<br> |
| <br> |
| 6.2. Regular expressions are surrounded with double quotes. |
| For example:<br> |
| <br> |
| hexint: "(0x|0X)[0-9a-fA-F]+[uUlL]?";<br> |
| <br> |
| <span style="font-weight: bold;">NOTE: </span>only the simple |
| regular expression operators are currently supported (v1.3). This |
| include parens, square parens, ranges, and '*', '+', '?'. If you |
| need something more, request a feature or implement it yourself; the |
| code is in scan.c.<br> |
| <br> |
| 6.3. External (C) Scanners<br> |
| <br> |
| There are two types of external scanners, those which read a |
| single terminal, and those which are global (called for every |
| terminal). |
| Here is an example of a scanner for a single terminal. |
| Notice how it can be mixed with regular string terminals.<br> |
| <br> |
| {<br> |
| extern char *ops;<br> |
| extern void *ops_cache;<br> |
| int ops_scan(char *ops, void *ops_cache, char **as,<br> |
| int *col, int *line, unsigned short *op_assoc, int |
| *op_priority);<br> |
| }<br> |
| <br> |
| X: '1' (${scan ops_scan(ops, ops_cache)} '2')*;<br> |
| <br> |
| The user provides the 'ops_scan' function. This example is |
| from tests/g4.test.g in the source distribution.<br> |
| <br> |
| The second type of scanner is a global scanner:<br> |
| <br> |
| {<br> |
| #include "g7.test.g.d_parser.h"<br> |
| int myscanner(char **s, int *col, int *line, unsigned short *symbol,<br> |
| int *term_priority, unsigned short |
| *op_assoc, int *op_priority)<br> |
| {<br> |
| if (**s == 'a') {<br> |
| (*s)++;<br> |
| *symbol = A;<br> |
| return 1;<br> |
| } else if (**s == 'b') {<br> |
| (*s)++;<br> |
| *symbol = BB;<br> |
| return 1;<br> |
| } else if (**s == 'c') {<br> |
| (*s)++;<br> |
| *symbol = CCC;<br> |
| return 1;<br> |
| } else if (**s == 'd') {<br> |
| (*s)++;<br> |
| *symbol = DDDD;<br> |
| return 1;<br> |
| } else<br> |
| return 0;<br> |
| }<br> |
| ${scanner myscanner}<br> |
| ${token A BB CCC DDDD}<br> |
| <br> |
| S: A (BB CCC)+ SS;<br> |
| SS: DDDD;<br> |
| <br> |
| Notice how the you need to include the header file generated by <span |
| style="font-weight: bold;">make_dparser</span> which contains the |
| token |
| definitions.<br> |
| <br> |
| 6.4. Tokenizers<br> |
| <br> |
| Tokenizers are non-context sensitive global scanners which |
| produce only one token for any given input string. Some |
| programming languages (for example <span style="font-weight: bold;">C</span>) |
| are easier to specify using a tokenizer because (for example) reserved |
| words can be handled simply by lowering the terminal priority for |
| identifiers.<br> |
| <br> |
| EXAMPLE:<br> |
| <br> |
| S : 'if' '(' S ')' S ';' | 'do' S 'while' '(' S ')' ';' | ident;<br> |
| ident: "[a-z]+" $term -1;<br> |
| <br> |
| The sentence: <span style="font-weight: bold;">if ( while ) a;</span> |
| is legal because <span style="font-weight: bold;">while</span> cannot |
| appear at the start of <span style="font-weight: bold;">S</span> and |
| so |
| it doesn't conflict with the parsing of <span |
| style="font-weight: bold;">while</span> |
| as an <span style="font-weight: bold;">ident</span> in that position. |
| However, if a tokenizer is specified, all tokens will be possible |
| at each position and the sentense will produce a syntax error.<br> |
| <br> |
| <span style="font-weight: bold;">DParser</span> provides two |
| ways to specify tokenizers: globally as an option (-T) to <span |
| style="font-weight: bold;">make_dparser</span> and locally with a |
| ${declare tokenize ...} specifier (see the ANSI C grammar for an |
| example). The ${declare tokenize ...} declartion allows a |
| tokenizer to be specified over a subset of the parsing states so that |
| (for example) ANSI C could be a subgrammar of another larger grammar. |
| Currently the parse states are not split so that the productions |
| for the substates must be disjoint.<br> |
| <br> |
| 6.5 Longest Match<br> |
| <br> |
| Longest match lexical ambiguity resolution is a technique used |
| by seperate phase lexers to help decide (along<br> |
| with lexical priorities) which single token to select for a given input |
| string. It is used in the definition of ANSI-C, but not in C++ |
| because of a snafu in the definition of templates whereby templates of |
| templates (List<List <Int>>) can end with the right shift |
| token ('>>"). Since <span style="font-weight: bold;">DParser</span> |
| does not have a seperate lexical phase, it does not require longest |
| match disambiguation, but provides it as an option.<br> |
| <br> |
| There are two ways to specify longest match disabiguation: |
| globally as an option (-l) to <span style="font-weight: bold;">make_dparser</span> |
| or locally with with a ${declare ... longest_match}. If global |
| longest match disambiguation is <span style="font-weight: bold;">ON</span>, |
| it can be locally disabled with {$declare ... all_matches} . As |
| with Tokenizers above, local declarations operate on disjoint subsets |
| of |
| parsing states.<br> |
| <br> |
| <span style="font-weight: bold;">7. Priorities and Associativity<br> |
| <br> |
| </span> Priorities can very from MININT to MAXINT and are |
| specified as integers. Associativity can take the values:<br> |
| <br> |
| assoc : '$unary_op_right' | '$unary_op_left' | '$binary_op_right'<br> |
| | |
| '$binary_op_left' | '$unary_right' | '$unary_left'<br> |
| | |
| '$binary_right' | '$binary_left' | '$right' | '$left' ;<br> |
| <br> |
| 7.1. Token Prioritites<br> |
| <br> |
| Currently (v1.0) the automatically generated scanners use the <span |
| style="font-style: italic;">longest match</span> rule so that, for |
| example:<br> |
| <br> |
| OP: '>' | '>>';<br> |
| <br> |
| will match the string '>>' only as '>>' instead of |
| ambiguously as either '>' or '>>'. A planned feature is |
| to make this optional.<br> |
| <br> |
| Termininal priorities apply after the longest match string has |
| been found and the terminal with the highest priority is selected.<br> |
| They are introduced after a terminal by the specifier <span |
| style="font-weight: bold;">$term</span>. We saw an example of |
| token priorities with the definition of <span |
| style="font-weight: bold;">ident</span>.<br> |
| <br> |
| EXAMPLE:<br> |
| <br> |
| S : 'if' '(' S ')' S ';' | 'do' S 'while' '(' S ')' ';' | ident;<br> |
| ident: "[a-z]+" $term -1;<br> |
| <br> |
| 7.2. Operator Priorities<br> |
| <br> |
| Operator priorities specify the priority of a operator symbol |
| (either a terminal or a non-terminal). This corresponds to the <span |
| style="font-style: italic;">yacc</span> or <span |
| style="font-style: italic;">bison</span> <span |
| style="font-weight: bold;">%left</span> etc. declaration. |
| However, since <span style="font-weight: bold;">DParser</span> |
| is |
| doesn't require a global tokenizer, operator priorities and |
| associativities are specified on the reduction which creates the token. |
| Moreover, the associativity includes the operator usage as well |
| since it cannot be infered from rule context. Possible operator |
| associativies are:<br> |
| <br> |
| operator_assoc : '$unary_op_right' | '$unary_op_left' | |
| '$binary_op_right'<br> |
| | |
| '$binary_op_left' | '$unary_right' | '$unary_left'<br> |
| | |
| '$binary_right' | '$binary_left';<br> |
| <br> |
| EXAMPLE:<br> |
| <br> |
| E: ident op ident;<br> |
| ident: '[a-z]+';<br> |
| <span style="font-weight: bold;"></span>op: '*' $binary_op_left 2 |<br> |
| '+' $binary_op_left 1;<br> |
| <br> |
| 7.3. Rule Priorities<br> |
| <br> |
| Rule priorities specify the priority of the reduction itself and |
| have the possible associativies:<br> |
| <br> |
| rule_assoc: '$right' | '$left';<br> |
| <br> |
| Rule and operator priorities can be intermixed and are |
| interpreted at run time (<span style="font-weight: bold;">not</span> |
| when |
| the tables are built). This make it possible for user-defined |
| scanners to return the associativities and priorities of tokens.<br> |
| <br> |
| <span style="font-weight: bold;">8. Actions<br> |
| <br> |
| </span> Actions are the bits of code which run when a reduction |
| occurs.<br> |
| <br> |
| EXAMPLE<br> |
| <br> |
| S: this | that;<br> |
| this: 'this' { printf("got this\n"); };<br> |
| that: 'that' { printf("got that\n"); };<br> |
| <br> |
| 8.1 Speculative Action<br> |
| <br> |
| Speculative actions occur when the reduction takes place during |
| the speculative parsing process. It is possible<br> |
| that the reduction will not be part of the final parse or that it will |
| occur a different number of times. For example:<br> |
| <br> |
| S: this | that;<br> |
| this: hi 'mom';<br> |
| that: ho 'dad';<br> |
| ho: 'hello' [ printf("ho\n"); ];<br> |
| hi: 'hello' [ printf("hi\n"); ];<br> |
| <br> |
| Will print both 'hi' and 'ho' when given the input 'hello dad' because |
| at the time hello is reduced, the following token is not known.<br> |
| <br> |
| 8.2 Final Actions<br> |
| <br> |
| Final actions occur only when the reduction must be part of any |
| legal final parse (committed). It is possible to do final actions |
| during parsing or delay them till the entire parse tree is constructed |
| (see Options). Final actions are executed in order and in number |
| according the the single final unambiguous parse.<br> |
| <br> |
| S: A S 'b' | 'x';<br> |
| A: [ printf("speculative e-reduce A\n"); ] <br> |
| { printf("final e-reduce A\n"); };<br> |
| <br> |
| On input:<br> |
| <br> |
| xbbb<br> |
| <br> |
| Will produce:<br> |
| <br> |
| speculative e-reduce A<br> |
| final e-reduce A<br> |
| final e-reduce A<br> |
| final e-reduce A<br> |
| <br> |
| 8.3 Embedded Actions<br> |
| <br> |
| Actions can be embedded into rule. These actions are executed |
| as if they were replaced with a synthetic production with a single null |
| rule containing the actions. For example:<br> |
| <br> |
| S: A { printf("X"); } B;<br> |
| A: 'a' { printf("a"); };<br> |
| B: 'b' { printf("b"); };<br> |
| <br> |
| On input:<br> |
| <br> |
| ab<br> |
| <br> |
| Will produce:<br> |
| <br> |
| aXb<br> |
| <br> |
| 8.4 Pass Actions<br> |
| <br> |
| <span style="font-weight: bold;">DParser</span> supports |
| multiple pass compilation. The passes are declared at the top of |
| the grammar, and the actions are associated with individual rules.<br> |
| <br> |
| EXAMPLE<br> |
| <br> |
| ${pass sym for_all postorder}<br> |
| ${pass gen for_all postorder}<br> |
| <br> |
| translation_unit: statement*;<br> |
| <br> |
| statement<br> |
| : expression ';' {<br> |
| d_pass(${parser}, &$n, ${pass sym});<br> |
| d_pass(${parser}, &$n, ${pass gen});<br> |
| }<br> |
| ;<br> |
| <br> |
| expression : integer<br> |
| gen: { printf("gen integer\n"); }<br> |
| sym: { printf("sym integer\n"); }<br> |
| | expression '+' expression $right 2<br> |
| sym: { printf("sym +\n"); }<br> |
| ;<br> |
| <br> |
| A pass name then a colon indicate that the following action is |
| associated with a particular pass. Passes can be either <span |
| style="font-weight: bold;">for_all</span> or <span |
| style="font-weight: bold;">for_undefined</span> (which means that the |
| automatic traversal only applies to rules without actions defined for |
| this pass). Furthermore, passes can be <span |
| style="font-weight: bold;">postorder</span>, <span |
| style="font-weight: bold;">preorder</span>, and <span |
| style="font-weight: bold;">manual</span> (you have to call <span |
| style="font-weight: bold;">d_pass</span> yourself). Passes can |
| be initiated in the final action of any rule.<br> |
| <br> |
| 8.5 Default Actions<br> |
| <br> |
| The special production "<span style="font-weight: bold;">_</span>" |
| can be defined with a single rule whose actions become the default when |
| no other action is specified. Default actions can be specified |
| for speculative, final and pass actions and apply to each seperately.<br> |
| <br> |
| EXAMPLE<br> |
| <br> |
| _: { printf("final action"); }<br> |
| gen: { printf("default gen action"); }<br> |
| sym: { printf("default sym action"); }<br> |
| ;<br> |
| <span style="font-weight: bold;"></span><br> |
| <span style="font-weight: bold;">9. Attributes and Action Specifiers</span><br> |
| <br> |
| 9.1. Global State (<span style="font-weight: bold;">$g</span>)<br> |
| <br> |
| Global state is declared by <span style="font-weight: bold;">define</span>'ing<span |
| style="font-weight: bold;"> D_ParseNodeGlobals</span> (see the ANSI C |
| grammar for a similar declaration for symbols). Global state can be |
| accessed in any action with <span style="font-weight: bold;">$g</span>. |
| Because <span style="font-weight: bold;">DParser</span> handles |
| ambiguous parsing global state can be accessed on different speculative |
| parses. In the future automatic splitting of global state may be |
| implemented (if there is demand). Currently, the global state can be |
| copied and assigned to <span style="font-weight: bold;">$g</span> to |
| ensure that the changes made only effect subsequent speculative parses |
| derived from the particular parse.<br> |
| <br> |
| EXAMPLE<br> |
| <br> |
| [ $g = copy_globals($g);<br> |
| $g->my_variable = 1;<br> |
| ]<br> |
| <br> |
| The symbol table (Section 10) can be used to manage state |
| information safely for different speculative parses.<br> |
| <br> |
| 9.2. Parse Node State<br> |
| <br> |
| Each parse node includes a set of system state variables and can |
| have a set of user-defined state variables. User defined parse |
| node state is declared by <span style="font-weight: bold;">define</span>'ing<span |
| style="font-weight: bold;"> D_ParseNodeUser</span>. Parse node |
| state is accessed with:<br> |
| <br> |
| <span style="font-weight: bold;">$#</span> - number of child nodes<br> |
| <span style="font-weight: bold;">$$</span> - user parse node |
| state |
| for parent node (non-terminal defined by the production)<br> |
| <span style="font-weight: bold;">$X</span> (where X is a number) |
| - |
| the user parse node state of element X of the production<br> |
| <span style="font-weight: bold;">$nX</span> - the system parse |
| node state of element X of the production<br> |
| <br> |
| The system parse node state is defined in <span |
| style="font-weight: bold;">dparse.h</span> which is installed with <span |
| style="font-weight: bold;">DParser</span>. It contains such |
| information as the symbol, the location of the parsed string, and |
| pointers to the start and end of the parsed string.<br> |
| <br> |
| 9.3. Misc<br> |
| <br> |
| <span style="font-weight: bold;">${scope}</span> - the current |
| symbol table scope<br> |
| <span style="font-weight: bold;">${reject}</span> - in |
| speculative actions permits the current parse to be rejected<br> |
| <br> |
| <span style="font-weight: bold;">10. Symbol Table</span><br> |
| <br> |
| <span style="font-weight: bold;"></span> The symbol table can be |
| updated down different speculative paths while sharing the bulk of the |
| data. It defines the following functions in the file (dsymtab.h):<br> |
| <br> |
| struct D_Scope *new_D_Scope(struct D_Scope *st);<br> |
| struct D_Scope *enter_D_Scope(struct D_Scope *current, struct D_Scope |
| *scope);<br> |
| D_Sym *NEW_D_SYM(struct D_Scope *st, char *name, char *end);<br> |
| D_Sym *find_D_Sym(struct D_Scope *st, char *name, char *end);<br> |
| D_Sym *UPDATE_D_SYM(struct D_Scope *st, D_Sym *sym);<br> |
| D_Sym *current_D_Sym(struct D_Scope *st, D_Sym *sym);<br> |
| D_Sym *find_D_Sym_in_Scope(struct D_Scope *st, char *name, char *end);<br> |
| <br> |
| 'new_D_Scope' creates a new scope below 'st' or NULL for a 'top level' |
| scope. 'enter_D_Scope' returns to a previous scoping level. |
| NOTE: do not simply assign ${scope} to a previous scope as any updated |
| symbol information will be lost. 'commit_D_Scope' can be used in |
| final actions to compress the update list for the top level scope and |
| improve efficiency.<br> |
| <br> |
| 'find_D_Sym' finds the most current version of a symbol in a given |
| scope. 'UPDATE_D_SYM' updates the value of symbol (creates a |
| difference record on the current speculative parse path). |
| 'current_D_Sym' is used to retrive the current version of a symbol, the |
| pointer to which may have been stored in some other attribute or |
| variable. Symbols with the same name should not be created in the |
| same scope. The function 'find_D_Sym_in_Scope' is provided to |
| detect this case. <br> |
| <br> |
| User data can be attached to symbols by <span |
| style="font-weight: bold;">define</span>'ing <span |
| style="font-weight: bold;">D_UserSym</span>. See the ANSI C |
| grammar for an example. <br> |
| <br> |
| Here is a full example of scope usage (from tests/g29.test.g):<br> |
| <br> |
| <span style="font-family: monospace;">#include <stdio.h><br> |
| <br> |
| typedef struct My_Sym {<br> |
| int value;<br> |
| } My_Sym;<br> |
| #define D_UserSym My_Sym<br> |
| typedef struct My_ParseNode {<br> |
| int value;<br> |
| struct D_Scope *scope;<br> |
| } My_ParseNode;<br> |
| #define D_ParseNode_User My_ParseNode<br> |
| }<br> |
| <br> |
| translation_unit: statement*;<br> |
| <br> |
| statement <br> |
| : expression ';' <br> |
| { printf("%d\n", $0.value); }<br> |
| | '{' new_scope statement* '}'<br> |
| [ ${scope} = enter_D_Scope(${scope}, $n0.scope); ]<br> |
| { ${scope} = commit_D_Scope(${scope}); }<br> |
| ;<br> |
| <br> |
| new_scope: [ ${scope} = new_D_Scope(${scope}); ];<br> |
| <br> |
| expression <br> |
| : identifier ':' expression <br> |
| [ <br> |
| D_Sym *s;<br> |
| if (find_D_Sym_in_Scope(${scope}, $n0.start_loc.s, |
| $n0.end))<br> |
| printf("duplicate identifier line %d\n", |
| $n0.start_loc.line);<br> |
| s = NEW_D_SYM(${scope}, $n0.start_loc.s, $n0.end);<br> |
| s->user.value = $2.value;<br> |
| $$.value = s->user.value;<br> |
| ]<br> |
| | identifier '=' expression<br> |
| [ D_Sym *s = find_D_Sym(${scope}, $n0.start_loc.s, $n0.end);<br> |
| s = UPDATE_D_SYM(${scope}, s);<br> |
| s->user.value = $2.value;<br> |
| $$.value = s->user.value;<br> |
| ]<br> |
| | integer <br> |
| [ $$.value = atoi($n0.start_loc.s); ]<br> |
| | identifier <br> |
| [ D_Sym *s = find_D_Sym(${scope}, $n0.start_loc.s, $n0.end);<br> |
| if (s)<br> |
| $$.value = s->user.value;<br> |
| ]<br> |
| | expression '+' expression<br> |
| [ $$.value = $0.value + $1.value; ]<br> |
| ;<br> |
| <br> |
| integer: "-?([0-9]|0(x|X))[0-9]*(u|U|b|B|w|W|L|l)*" $term -1;<br> |
| identifier: "[a-zA-Z_][a-zA-Z_0-9]*";<br> |
| <br> |
| </span> <br> |
| <span style="font-weight: bold;">11. Whitespace</span><br> |
| <br> |
| Whitespace can be specified two ways: C function which can be |
| user-defined, or as a subgrammar. The default whitespace parser |
| is |
| compatible with C/C++ #line directives and comments. It can be |
| replaced with any user specified function as a parsing option (see |
| Options).<br> |
| <br> |
| Additionally, if the (optionally) reserved production <span |
| style="font-weight: bold;">whitespace</span> is defined, the |
| subgrammar |
| it defines will be used to consume whitespace for the main grammar. |
| This subgrammar can include normal actions.<br> |
| <br> |
| EXAMPLE<br> |
| <br> |
| <span style="font-weight: bold;"></span>S: 'a' 'b' 'c';<br> |
| whitespace: "[ \t\n]*";<br> |
| <br> |
| Whitespace can be accessed on a per parse node basis using the |
| functions: <span style="font-weight: bold;">d_ws_before</span> and <span |
| style="font-weight: bold;">d_ws_after</span>, which return the start |
| of the whitespace before <span style="font-weight: bold;">start_loc.s</span> |
| and after <span style="font-weight: bold;">end</span> respectively. <br> |
| <br> |
| <span style="font-weight: bold;">12. Ambiguities<br> |
| </span> <br> |
| Ambiguities are resolved automatically based on priorities and |
| associativities. In addition, when the other resolution |
| techniques |
| fail, user defined ambiguity resolution is possible. The default |
| ambiguity handler produces a fatal error on an unresolved |
| ambiguity. This behavior can be replaced with a user defined |
| resolvers the signature of which is provided in <span |
| style="font-weight: bold;">dparse.h</span>.<br> |
| <br> |
| If the <span style="font-weight: bold;">verbose_level</span> |
| flag is set, the default ambiguity handler will print out parenthesized |
| versions of the ambiguous parse trees. This may be of some |
| assistence in disambiguating a grammar.<br> |
| <br> |
| <span style="font-weight: bold;">13. Error Recovery<br> |
| <br> |
| DParser</span> implements an error recovery scheme appropriate |
| to |
| scannerless parsers. I haven't had time to investigate all the |
| prior work in this area, so I am not sure if it is novel. Suffice |
| for now that it is optional and works well with C/C++ like grammars.<br> |
| <br> |
| <span style="font-weight: bold;">14. Parsing Options</span><br> |
| <br> |
| Parser are instantiated with the function <span |
| style="font-style: italic;">new_D_Parser<span |
| style="font-weight: bold;">. </span></span> The resulting data |
| structure contains a number of user configurable options (see <span |
| style="font-weight: bold;">dparser.h</span>). These are provided |
| reasonable default values and include:<br> |
| <ul> |
| <li><span style="font-weight: bold;">initial_globals</span> - the |
| initial global variables accessable through <span |
| style="font-weight: bold;">$g</span></li> |
| <li><span style="font-weight: bold;"></span><span |
| style="font-weight: bold;">initial_skip_space_fn</span> - the initial |
| whitespace function</li> |
| <li><span style="font-family: monospace;"></span><span |
| style="font-weight: bold;">initial_scope</span> - the initial symbol |
| table scope</li> |
| <li><span style="font-weight: bold;"></span><span |
| style="font-weight: bold;">syntax_error_fn</span> - the function |
| called |
| on a syntax error</li> |
| <li><span style="font-weight: bold;">ambiguity_fn</span> - the |
| function called on an unresolved ambiguity</li> |
| <li><span style="font-weight: bold;">loc</span> - the initial |
| location |
| (set on an error).</li> |
| </ul> |
| In addtion, there are the following user configurables:<br> |
| <ul> |
| <li><span style="font-weight: bold;">sizeof_user_parse_node</span> - |
| the sizeof <span style="font-weight: bold;">D_ParseNodeUser</span></li> |
| <li><span style="font-weight: bold;">save_parse_tree</span> - whether |
| or not the parse tree should be save once the final actions have been |
| executed</li> |
| <li><span style="font-style: italic;"></span><span |
| style="font-weight: bold;">dont_fixup_internal_productions</span> - to |
| not convert the Kleene star into a variable number of children from a |
| tree of reductions</li> |
| <li><span style="font-weight: bold;">dont_merge_epsilon_trees</span> |
| - |
| to not automatically remove ambiguities which result from trees of |
| epsilon reductions without actions</li> |
| <li><span style="font-weight: bold;">dont_use_eagerness_for_disambiguation</span> |
| - do not use the rule that the longest parse which reduces to the same |
| token should be used to disambiguate parses. This rule is used to |
| handle the case (<span style="font-weight: bold;">if then else?</span>) |
| relatively cleanly.</li> |
| <li><span style="font-weight: bold;">dont_use_height_for_disambiguation</span> |
| - do not use the rule that the least deep parse which reduces to the |
| same token should be used to disabiguate parses. This rule is |
| used |
| to handle recursive grammars relatiively cleanly.</li> |
| <li><span style="font-weight: bold;">dont_compare_stacks</span> - |
| disables comparing stacks to handle certain exponential cases during |
| ambiguous operator priority resolution. This feature is |
| relatively |
| new, and this disables the new code.<br> |
| </li> |
| <li><span style="font-weight: bold;">commit_actions_interval</span> - |
| how often to commit final actions (0 is immediate, MAXINT is |
| essentially |
| not till the end of parsing)</li> |
| <li><span style="font-weight: bold;"></span><span |
| style="font-weight: bold;">error_recovery</span> - whether or not to |
| use error recovery (defaults ON)</li> |
| </ul> |
| An the following result values:<br> |
| <ul> |
| <li><span style="font-weight: bold;">syntax_errors</span> - how many |
| syntax errors (if <span style="font-weight: bold;">error_recovery</span> |
| was on)</li> |
| </ul> |
| This final value should be checked to see if parse was successful.<br> |
| <br> |
| <span style="font-weight: bold;">15. Grammar Grammar</span><br> |
| <br> |
| <span style="font-weight: bold;">DParser</span> is fully |
| self-hosted (would you trust a parser generator which wasn't?). |
| The grammar grammar is <a href="grammar.g">here (Grammar Grammar)</a>. |
| <span style="font-weight: bold;"></span><br> |
| <span style="font-weight: bold;"></span><br> |
| <span style="font-weight: bold;"><br> |
| </span><big><big><span style="font-weight: bold;"></span></big></big></div> |
| </div> |
| </body> |
| </html> |