ebook img

Grammatic -- a tool for grammar definition reuse and modularity PDF

0.22 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Grammatic -- a tool for grammar definition reuse and modularity

Grammatic – a tool for grammar definition reuse and modularity Andrey Breslav St.Petersburg State Universityof Information Technology, Mechanics and Optics 9 [email protected] 0 0 2 Abstract. Grammatic is a tool for grammar definition and manipula- n a tion aimed to improve modularity and reuse of grammars and related J development artifacts. It is independent from parsing technology and 6 any other details of target system implementation. Grammatic provides 1 a way for annotating grammars with arbitrary metadata (associativity attributes, semantic actions or anything else). It might be used as a ] front-end for external tools like parser generators to make their input L grammars modular and reusable. This paper describes main principles P behind Grammatic and gives an overview of languages it provides and . s their ability to separate concerns and define reusable modules. Also it c presents sketchesof possible usecases for thetool. [ 1 v 1 Introduction 1 6 Tofollowideasoflanguage-orientedprogramming[1]orDSL-baseddevelopment 4 2 weneedmanyDSLs,whichmaymeandozensforamedium-sizedproject.Tobe . able to produce and support so many languages we need tools which strongly 1 supportreuseandmodularity.Weneedtoreusedevelopmentartifactstoproduce 0 9 many languagessince evendifferent ones maysometimes have muchin common 0 and it is really hard to work on a bunch of languages having common features v: implementedbycodeduplication.Theprimaryconditionforreuseismodularity i therefore we will always mention them together. X Language development involves many aspects but here we will mostly con- r centrate ontextualsyntax.Structure oflanguagesyntaxisusually describedby a a context-free grammar. Engineering aspects of grammar development did not yet receive enough attention [2]. To provide tools suitable for industry we need toconsolidateandimproveresultsachievedbydifferentauthors.We giveabrief overview of existing tools in section 2. In this paper we describe Grammatic – a tool which is aimed to provide reuseandmodularityforgrammardefinitions whichmightbe usedforarbitrary purpose (not only parser generation but also analysis of grammars, language synthesis, layout and many other applications). Grammatic provides general languagesfor context-freegrammardefinition andannotation.We presentthem in section 3. These languages are designed to support modularity and reuse, we describe these features in section 4. 2 Grammaticismostlydedicatedtobeafront-endforothertoolsandaframe- workfor high-levelmanipulations with grammarsand metadata.As a front-end Grammatic does not replace existing tools but helps to extend and join their powerful abilities. We outline some possible use cases in section 5. 2 Related work Thebestdevelopedandmostwidelyusedfamilyoftoolsworkingwithgrammars are parser and compiler generators. Here we give an overview of most popular andpowerfulofthem.Ourgoalistopresentfeaturessupportingmodularityand reuse, so we concentrate on these aspects here. Ideas implemented in some of these tools significantly influenced our approach. 2.1 Separation of concerns Themostsignificantmodularityissuesarerelatedtoseparationofsyntacticand semantic information in grammar definitions. Traditionalparsergenerators(Yacc [3], ANTLR [4], COCO/R [5] and many others) use grammar definitions with embedded semantic actions. This leads to mixing up syntactical and semantical aspects of the system into one definition file: it lacks modularity and is hard to maintain. Tools specially dedicated to creating textual syntax for simple DSLs (xText [6], TCS [7] and others) use grammar annotations needed for their specific pur- poses(namelybuilding targetmodels)andthusalsomixsyntaxwithsemantics. SableCC [8] avoids mixing things up: it does not allow to embed semantic actions. A parser generated by SableCC creates an abstract syntax tree (AST) and the user has to write his/her own custom code to analyze it or transform into the target format. This provides a clear separation of concerns. SDF [9] also separates semantic actions from parsing by restricting parser responsibilities to building a tree. Unlike SableCC, SDF provides much more powerful grammar definition language. Another approach to separation of concerns is presented by LISA compiler- compiler [10]: it provides an opportunity to either embed semantic actions or attachthemusingAOP-likepointcuts.Thisallowstoestablishdifferentmodules (aspects) even for semantic actions themselves which improves overall system modularity. 2.2 Reuse A basic way to reuse grammar definitions in textual form is to use a C-like preprocessor (any system taking textual input allows reuse with preprocessor). This is the only way to go when using many traditional tools. One of the most popular parser generators today is ANTLR. In version 2 it provided “grammar inheritance” with “rule overriding” and in latest version 3.1.1 it introduced grammar imports [11] which are rather restricted. 3 SDF has a powerful import mechanism, supports grammar composition and templates (parameterized modules). LISA allows grammar inheritance, aspect-oriented composition and tem- plates for semantic actions. 3 Grammatic’s languages overview Grammatic defines a language for describing context-free grammars which does not involve anything but pure grammar productions. This language supports EBNF constructs for both lexical and syntactical rules. Grammarisnotenoughtobuildthewholesystem.Weneedsomesemantics- related or other information to be added to the grammar to have a complete systemdescription.ForthispurposeGrammaticdefinesalanguagesforattaching metadata to grammars and their elements. Metadata annotations are grouped into aspects. A metadata aspect can be though of as a description of the system made from a certain point of view: parsing, pretty printing, building target objects etc. Below we describe these languages in more details. 3.1 Grammar definitions A grammar definition language allows to define a grammar as a set of symbols each of which is associated to a set of productions (in concrete syntax we sep- arate productions by “||”). Grammatic itself does not restrict grammars to any particular class (like LL(k), LALR or anything else). Here is an example from a grammar for arithmetic expressions: Factor -> Literal || ID || ’(’ Expression ’)’ ; In this example characters in single quotes represent embedded lexical defi- nitions. There is no separate notion of a lexical rule (since it is not necessarily required, see [9]) and all the regular expression operations (sequence, alterna- tive, iteration) are available both for syntactical and lexical definitions. Here is a syntactic (nonterminal) symbol: Product -> Factor (’*’ Factor)* ; And these are “lexical” (terminal) symbols: 4 INT -> [’0’--’9’]+ ; REAL -> INT (’.’ INT)? ((’e’ | ’E’) (’+’ | ’-’)? INT)? ; Separationoflexerandparserisnotneededinsomecases(see[9]).Bydefault lexicalsymbols do not differ fromsyntactical.If it is needed, we cancan specify a symbol as lexical by providing corresponding metadata. 3.2 Metadata Puregrammardefinitiondefinesatextualformofalanguagebuttherearemany more things to express. Existing tools (see section 2) use some extensions to grammardefinitions:embeddedsemanticactions,ASTbuildinginstructionsetc. Such things might be generally described by metadata attached to a grammar. Grammatic allows to attach arbitrary metadata to a grammar definition. Metadata annotations might be attached to a grammar, symbol, individual production or a subexpression. Each annotation may contain several attributes (name-valuepairs).Attribute valuesmaybeofdifferenttypes.Thereareseveral predefined value types: identifier, string, integer, annotation and sequence of values and punctuation symbols. id = someName; // the value is an identifier str = "some string"; // string int = 10; // integer class = { // annotation name = MyClass; super = Object; }; astProduction = {{ // sequence ^(’+’ left ^(’-’ right 10)) }}; Users may add their own types. No attribute itself has any fixed semantics. Metadata is passive, some tools (like analyzers, transformers, generators etc.) may use it according to their needs. Evenwithout adding custom types many things might be expressedby such annotations. The most powerful type is a sequence of values – it allows to de- fine small embedded DSLs inside Grammatic. We use such DSLs to describe complicated custom properties when workingwith external tools (i.e. high-level definitions for AST structures). 3.3 Queries How to attach metadata to a grammar? In many cases it is done by directly embedding annotations into grammar definition. Therefore different concerns 5 are mixed together and this results into a problem: system is not modular, is hard to understand and extend. We employ ideas from aspect-oriented programming paradigm (AOP, see [12]) to solve this problem. We consider experience of creators of LISA [10] and extend their ideas. In Grammatic a grammar definition itself knows nothing about metadata. All the metadata is attached “from the outside”. In AOP this isdonebydefiningjoinpointswhicharedescribedbypointcuts[13].Alanguage of point cuts is a kind of “addressing” notation – a way to find some object. In original AOP this object may be a class or a method, in our case this will be a rule,subexpressionoranotherobjectfromagrammardefinition.Whenwehave found such an object we may attach metadata to it (or perform other actions, see below). InGrammaticwehavealanguageanalogoustoAOPpointcuts–wecallita querylanguage.Weusestructuralquerieswithembeddedvariables.Forexample this query matches rules defining binary operations: Op -> Arg (Sign Arg)* ; All the names here represent variables. This query matches any of the fol- lowing rules: Product -> Factor (MultOrDiv Factor)* ; Sum -> Product (PlusOrMinus Product)* ; By default a variable matches a symbol but it may match a subexpression or a whole production. A -> Alt=(B | C); HerevariablesA,BandCmatchsymbolreferencesandAltmatchesasubex- pression “B | C”. We can use wildcards in queries. The following query matches immediately left-recursive rules: Rec -> Rec ..; Twodotsrepresentawildcardwhichmatchesarbitrarysubexpression.Italso might be assigned to a variable: Rec -> Rec Rest=..; We can consider metadata in our queries. We can restrict a particular at- tribute to a certain type or value or require attribute’s presence or absence: 6 N { type = Nonterminal; operation; associativity : ID; !commutative; } This query matches a symbol with “type” attribute having value “Nonter- minal”, “operation” attribute present, “associativity” attribute having value of type ID and “commutative” attribute not present. 3.4 Aspects When a query selects some objects from a grammar definition, we can attach some metadata to them. Rec -> Rec ..; { Rec { leftRecursive; }; } This rule adds a “leftRecursive”attribute (with no value) to all the symbols matchedby Rec variableofthis query.Asetofsuchrules constitutes anaspect. Many aspects (independent or not) might be assignedto a single grammar,and even to many grammars since our queries are not tied to concrete objects but only to a grammar structure. 3.5 More applications of queries Query language appears to be useful not only in attaching metadata to gram- mars.AsinAOP(AspectJ,forexample)wecanuseittodefinesomeconstraints for our grammarsand their metadata. It’s done by defining a query and assign- inganerrormessagetoit:whenaqueryismatchedanerror(orwarning)willbe generated. The same might be defined for a case when a query does not match anything on a whole grammar. Also we can use queries as a general mechanism for locating objects in a grammar which is useful when defining grammar transformations or some gen- eratorstaking a grammar as input. Here we can consider our query language as analogous to XPath in the context of XSLT. 4 Reuse OneofthemainpurposesofGrammaticisimprovingreuseexperienceofgrammar- oriented tools. An approach described above gives a good basis for it since no concerns are to be mixed together – we can describe each of them in a separate aspect. 7 4.1 Reusing aspects Aspects themselves might be generally reusable – as we told above, query lan- guage does not require “hard linking” to grammar objects, these objects are located by their structural context and properties. Above we gave an example of marking all immediately left-recursive rules with a “leftRecursive” attribute. This is an example of a reusable aspect – we can use it on any grammar. Al- though this technique is not very powerful since we frequently define queries which simply describe whole rules with no generalization, it is still useful. Below we describe more powerful reuse instruments defined by Grammatic. 4.2 Grammar imports and templates First we focus on reusing grammar definitions themselves. There are many achievements in this field done by creators of LISA, SDF and other tools (see section 2). The most popular way of reuse is importing. Some grammar definition A might be imported into some other grammar definition B. This means that all the rules of A are inserted into B. Rules of B may refer to symbols of A – this is the way two grammar definitions are connected. Very frequently we have to customize some of the imported rules, i.e. add some more productions to the same symbols or replace existing productions. In paper [11] this is referredto as rule overriding.In Grammatic we decided to use more general form of this concept, namely templates. A language of grammar templates allows creating grammar definitions with “placeholders”whichcanbe replacedwith actualobjectsupontemplateinstan- tiation. Placeholdersmight be defined for rolesof identifier, expression,produc- tion or symbol. A template instantiation might result into grammar object of the type specified by template declaration. Any objects except symbols might be used immediately in rule definitions. Symbols are treated as imported from a template instantiation. This is due to naming reasons:to avoidname duplica- tion a template instantiation expression is a namespace and symbols from that namespace might be referred to by qualified names. Here is an example of a template and its usage. Symbol binaryOperation<ID $name, Expression $sign, Expression $argument> { $name -> $argument ($sign $argument)*; } import binaryOperation<Product, ’*’ | ’/’, Factor>; import binaryOperation<Sum, ’+’ | ’-’, Product>; Factor -> NUMBER || ID || ’(’ Sum ’)’ ; 8 Inthisexamplewedefineatemplatenamed“binaryOperation”whichmakes upaninfixbinaryoperationoutofsymbolname,signandargumentexpression. Then we instantiate ittwice andimportinstantiationresults into currentgram- mar definition – so we can use new symbol “Product” to create “Sum” and “Sum” to define “Factor”. Here we did not need to use qualified names explic- itly – there’s no name duplication. Now let’s assume that we need to refer to signs of our binary operations as separate symbols. We modify the template as follows: Symbol binaryOperation<ID $name, Expression $sign, Expression $argument> { Sign -> $sign; $name -> $argument (Sign $argument)*; } Now we get two symbols out of a single template instantiation: one symbol for operation and another (named “Sign”) – for its sign. To refer to these new symbols we need qualified names (and named namespaces): import product = binaryOperation<Product, ’*’ | ’/’, Factor>; import sum = binaryOperation<Sum, ’+’ | ’-’, Product>; AnySign -> product.Sign | sum.Sign ; Howcanweusetemplatesfor“overriding”things?Wecanputacustomizable setofrulesintoatemplate,provideaplaceholderforproductionorsubexpression that should be replaced and then put a right thing in upon instantiation. Symbol attributeValue<Production* $moreValueTypes> { AttributeValue -> STRING || ID || INT || Annotation || ValueSequence || $moreValueTypes ; } import attributeValue< ’{{{’ Expression ’}}}’ >; This defines a template for “AttributeValue” symbol and instantiates it adding a new production (to use expressions as attribute values). 9 5 Use cases Grammatic might be used for high level manipulation on modular grammar definitions annotated with arbitrary metadata. In practice this means that we canuseitasafront-endforsomeexternaltoolstoprovidemodularityandreuse. Forexample,ifweneedtouseYacctogenerateparserswecanwriteamodule whichtransformsaGrammaticgrammartoaYaccgrammartouseGrammaic’s modularity and Yacc’s generator together. But pure grammar will not allow us to generate any meaningful parser. We needtoprovidesomeadditionalinformationtogenerateYacc’ssemanticactions (and some other stuff which we will not consider here). The easiest way to do it is to provide string-valued attributes for Grammatic productions which simplycontainsemanticactions’code.Bydoingsowewillgetmodulargrammar separated from semantic actions which might be transformed into a working parser using Grammatic-to-Yacc generator and then Yacc. Anotherwayhereistorestrictfunctionalityoftheparserwearegoingtoget and go after SDF or xText. To do so we must define attributes which describe needed actions in a declarative way: create node of this type, attach this node tothis etc.Andagainwegeta workingparserbyapplyingtwotransformations: Grammatic-to-Yaccandthen Yacc.Although the firstwill be morecomplicated this time. NotethatGrammatic-to-Yacctransformationistobewrittenonceandshipped as a pluggable module for Grammatic. And this might be done for many other external tools (not only parser generators). InmanycasesweneedtorestrictaninputofGrammatic-to-Somethingtrans- formation:wehavetoensureattributevaluetypes,prohibitsomeconstructslike left recursion and so on. To do so we can define queries which generate some errors or warnings (see section 3.5). N { leftAssoc; rightAssoc; }; { error on N : "A symbol cannot be left- and right-associative at the same time"; } Inthisexampleweprohibitusingattributes“leftAssoc”and“rightAssoc”on the same symbol at the same time. These checks may be put into aspects along with metadata attachment rules which makes diagnostic modular and reusable. 6 Conclusion AsweshowedaboveGrammaticcansolvemanyissuesaboutgrammardefinition reuseandmodularity.Thistoolisintendedtobeanopensystemsowearegoing 10 to extend it by adding support for more and more external tools. Grammatic’s generalized grammar definition format may serve as an interchange format for different tools which will again improve end user’s experience. References 1. Martin P. Ward. Language-oriented programming. Software — Concepts and Tools, 15(4):147–161, 1994. 2. PaulKlint,RalfL¨ammel,andChrisVerhoef. Toward anengineeringdisciplinefor grammarware. ACM Trans. Softw. Eng. Methodol., 14(3):331–380, 2005. 3. StevenC. Johnson. Yacc:Yetanothercompiler compiler. InUNIX Programmer’s Manual, volume 2, pages 353–387. Holt, Rinehart, and Winston, New York, NY, USA,1979. 4. Terence Parr. The Definitive ANTLR Reference: Building Domain-Specific Lan- guages. The Pragmatic Bookshelf, Raleigh, 2007. 5. HanspeterMssenbckandJohannesKepler. ThecompilergeneratorCoco/R–user manual, 2005. 6. Open ArchitectureWare. xText. http://www.openarchitectureware.org/, 2007. 7. Fr`ed`ericJouault,JeanB`ezivin,andIvanKurtev. TCS:aDSLfortheSpecification ofTextualConcrete SyntaxesinModel Engineering. In GPCE 2006, ACM,pages 249–254, 2006. 8. EtienneGagnon and Laurie Hendren. An object-oriented compiler framework. In In Proceedings of TOOLS,pages 140–154, 1998. 9. J. Heering, P. R. H. Hendriks, P. Klint, and J. Rekers. The syntax definition formalism SDF– reference manual. SIGPLAN Not., 24(11):43–75, 1989. 10. Marjan Mernik, Mitja Leni, and Enis A. Compiler/interpreter generator system LISA. InInIEEEProceedings of33rdHawaiiInternational ConferenceonSystem Sciences, pages 590–594, 2000. 11. Terence Parr. The reuse of grammars with embedded semantic actions. Interna- tional Conference on Program Comprehension, 0:5–10, 2008. 12. GregorKiczales,JohnLamping,AnuragMendhekar,ChrisMaeda,CristinaVideira Lopes, Jean marc Loingtier, John Irwin, Gregor Kiczales, John Lamping, Anurag Mendhekar, Chris Maeda, Cristina Lopes, Jean marc Loingtier, and John Irwin. Aspect-oriented programming. In In ECOOP.SpringerVerlag, 1997. 13. Gregor Kiczales, Erik Hilsdale, Jim Hugunin, Mik Kersten, Jeffrey Palm, and William G. Griswold. An overview of AspectJ. In ECOOP ’01: Proceedings of the 15th European Conference on Object-Oriented Programming, pages 327–353, London,UK, 2001. Springer-Verlag.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.