Tree construction - ANTLR 3 - ANTLR Project

There are two mechanisms in v3 for building abstract syntax trees (ASTs): operators and rewrite rules.
Operators
Nodes created for unmodified tokens and trees for unmodified rule references are added to the current subtree as children.
Operator    Description
!           do not include node or subtree (if referencing a rule) in subtree
^           make node root of subtree created for subrule
^^          make node root of subtree created for entire enclosing rule; this is equivalent to the single ^ used in v2
additiveExpression
    : multiplicativeExpression (('+'^^ | '-'^^) multiplicativeExpression)*
    ;
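To make the effect of the root operator concrete, here is a minimal Python sketch of the tree shape that additiveExpression produces. It is an emulation written for illustration, not code generated by ANTLR; the names to_lisp and additive_expression are hypothetical, and operands are assumed to be single tokens.

```python
# Hypothetical emulation of the tree shape; not ANTLR-generated code.

def to_lisp(tree):
    """Render a (root, children) tuple in the ^(root child ...) style, minus the caret."""
    if isinstance(tree, str):
        return tree
    root, children = tree
    return "(" + " ".join([root] + [to_lisp(c) for c in children]) + ")"


def additive_expression(tokens):
    """operand (('+' | '-') operand)*, with each operator made root of the rule's tree."""
    tree = tokens[0]                    # first operand is the initial tree
    i = 1
    while i < len(tokens) and tokens[i] in ("+", "-"):
        op, right = tokens[i], tokens[i + 1]
        tree = (op, [tree, right])      # operator becomes the new root; old tree is its first child
        i += 2
    return tree


print(to_lisp(additive_expression(["a", "+", "b", "-", "c"])))  # (- (+ a b) c)
```

Because each operator is hoisted to the root of everything built so far, the result nests to the left, which matches the usual left-associative reading of + and -.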
Rewrite rules
The rewrite syntax is more powerful than the operators and suffices for most common tree transformations.
While the parser grammar specifies how to recognize input, the rewrites are generational grammars, specifying how to generate output. ANTLR figures out how to map the input grammar to the output grammar. To create an imaginary node, just mention it, as in the following example (UNIT is a node created from an imaginary token and is used to group the compilation unit chunks):
compilationUnit
    : packageDefinition? importDefinition* typeDefinition+
      -> ^(UNIT packageDefinition? importDefinition* typeDefinition*)
    ;
ANTLR collects all elements with the same name into a single implicit list:
formalArgs
    : formalArg (',' formalArg)* -> formalArg+
    |
    ;
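The implicit list can be pictured as a simple accumulator. The following Python sketch is an illustration under the assumption that each formalArg is a single token; formal_args is a hypothetical name, not part of ANTLR.

```python
# Hypothetical emulation of "formalArg (',' formalArg)* -> formalArg+"; not ANTLR-generated code.

def formal_args(tokens):
    """Collect every formalArg match into one list; an empty input models the empty alternative."""
    collected = []                  # the single implicit list for the name formalArg
    for tok in tokens:
        if tok != ",":              # commas are not mentioned in the rewrite, so they are dropped
            collected.append(tok)
    return collected


print(formal_args(["x", ",", "y", ",", "z"]))  # ['x', 'y', 'z']
```

Both occurrences of formalArg in the rule feed the same list, which is why the rewrite can refer to all of them with a single formalArg+.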
If the same rule or token is mentioned twice, you generally must label the elements to distinguish them. If you want to combine multiple elements into a single list, list labels are very handy (though in this case, since they have the same name, ANTLR will automatically combine them):
('implements' i+=typename (',' i+=typename)*)?
Here is the entire rule:
classDefinition[MantraAST mod]
    : 'class' cname=ID
      ('extends' sup=typename)?
      ('implements' i+=typename (',' i+=typename)*)?
      '{'
      ( variableDefinition
      | methodDefinition
      | ctorDefinition
      )*
      '}'
      -> ^('class' ID {$mod} ^('extends' $sup)? ^('implements' $i+)?
           variableDefinition* ctorDefinition* methodDefinition*
         )
    ;
Note that a simple action in a rewrite, such as {$mod} here, means: evaluate the expression and use the result as a tree node or subtree. The mod argument is a set of modifiers passed in from an enclosing rule.
Deleting tokens or rules is easy: just don't mention them:
packageDefinition
    : 'package' classname ';' -> ^('package' classname)
    ;
If you need to build different trees based upon semantic information, use a semantic predicate:
variableDefinition
    : modifiers typename ID ('=' completeExpression)? ';'
      -> {inMethod}? ^(VARIABLE ID modifiers? typename completeExpression?)
      ->             ^(FIELD ID modifiers? typename completeExpression?)
    ;
where inMethod is set by the method rule.
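The predicate-gated rewrite behaves like an if/else over two tree templates. A minimal Python sketch of that choice follows; variable_definition and its parameters are hypothetical names chosen for illustration, and optional elements are modeled as None.

```python
# Hypothetical emulation of the two rewrites for variableDefinition; not ANTLR-generated code.

def variable_definition(name, type_name, modifiers=None, init=None, in_method=False):
    """Build ^(VARIABLE ...) when in_method is true, otherwise ^(FIELD ...)."""
    root = "VARIABLE" if in_method else "FIELD"   # {inMethod}? selects the first rewrite
    children = [name]
    if modifiers is not None:
        children.append(modifiers)                # modifiers?
    children.append(type_name)
    if init is not None:
        children.append(init)                     # completeExpression?
    return (root, children)


print(variable_definition("x", "int", in_method=True))
print(variable_definition("x", "int", modifiers="public", init="0"))
```

The last rewrite carries no predicate, so it acts as the default when the earlier predicate is false.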
Often you will need to build a tree node from an input token but with the token type changed:
compoundStatement
    : lc='{' statement* '}' -> ^(SLIST[$lc] statement*)
    ;
SLIST by itself is a new node based upon token type SLIST but it has no line/column information nor text. By using SLIST[$lc], all information except the token type is copied to the new node.
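The difference between the two forms can be sketched in Python as follows. The Token class and the helper names imaginary and derived are assumptions made for this illustration; they are not the ANTLR runtime API.

```python
# Hypothetical emulation of SLIST versus SLIST[$lc]; not the ANTLR runtime API.
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Token:
    type: str
    text: Optional[str] = None
    line: Optional[int] = None
    column: Optional[int] = None


def imaginary(token_type):
    """Like a bare SLIST: only the token type is set; no text or position."""
    return Token(type=token_type)


def derived(token_type, source):
    """Like SLIST[$lc]: copy everything from the source token, then override the type."""
    return replace(source, type=token_type)


lc = Token(type="LCURLY", text="{", line=12, column=4)
print(imaginary("SLIST"))
print(derived("SLIST", lc))
```

Keeping the line and column of the original '{' on the SLIST node is what lets later phases report errors against the right place in the source.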
A rewrite rule does not have to sit at the extreme right edge of a production, but wherever it appears it still always sets the overall subtree for the enclosing rule.
'if' '(' equalityExpression ')' s1=statement
( 'else' s2=statement -> ^('if' ^(EXPR equalityExpression) $s1 $s2)
|                     -> ^('if' ^(EXPR equalityExpression) $s1)
)
You may reference the previous subtree for the enclosing rule using $rulename syntax:
postfixExpression
    : (primary->primary) // set return tree
      ( lp='(' args=expressionList ')' -> ^(CALL $postfixExpression $args)
      | lb='[' ie=expression ']'       -> ^(INDEX $postfixExpression $ie)
      | dot='.' p=primary              -> ^(FIELDACCESS $postfixExpression $p)
      | c=':' cl=closure[false]        -> ^(APPLY ^(EXPR $postfixExpression) $cl)
      )*
    ;
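The way $postfixExpression threads the previous tree through each iteration of the loop can be sketched in Python. This is an illustration only: postfix_expression is a hypothetical name, and each suffix is assumed to arrive as a (kind, operand) pair rather than as parsed tokens.

```python
# Hypothetical emulation of the postfixExpression rewrites; not ANTLR-generated code.

def postfix_expression(primary, suffixes):
    """suffixes is a list of (kind, operand) pairs such as ("CALL", "args")."""
    tree = primary                                      # (primary -> primary): initial return tree
    for kind, operand in suffixes:
        if kind == "APPLY":
            tree = ("APPLY", [("EXPR", [tree]), operand])
        else:                                           # CALL, INDEX, FIELDACCESS
            tree = (kind, [tree, operand])              # the previous tree becomes the first child
    return tree


# Models input shaped like a.b(args)
print(postfix_expression("a", [("FIELDACCESS", "b"), ("CALL", "args")]))
```

Each suffix wraps the tree built so far, so a chain of suffixes yields a tree whose outermost node is the last suffix applied.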