Grammar Development Process

The grammar development process for Panini-based systems has the following steps. Not all steps are mandatory, and the basic ones can be skipped by people who have already gained expertise in writing grammars for Panini systems (in other words, as people’s knowledge of grammar gets closer to that of Panini, the father of Sanskrit grammar and perhaps the father of grammar itself, they can write Panini grammars directly!). It is assumed that the grammar writer is already familiar with the Flex and Bison tools and the syntax of their “.l” and “.y” files (if not, see Technical Resources to learn more about Flex and Bison). It is also assumed that the grammar writer is familiar with the various hardware functions that are at the disposal of the CPU (which executes the action code). Please refer to the Panini subsystem’s user manual for details of these hardware functions.

The Five Steps

1.  Creating a suitable grammar (either from scratch or from available ones) for the message types to be processed by the application. The intention is to make sure that the productions of interest are defined with sufficient granularity and accuracy (the more accurate the better: keep in mind that structural parsing is done entirely by hardware, so it is extremely fast compared to CPU-based parsing). Using the Flex/Bison tools, verify the grammar for errors, create an executable, and check that the appropriate productions are generated when a message of the given type is fed to the executable. While creating the grammar, a few points should be kept in mind; they make the transition to the next steps easier.

a. Unlike a regular Bison/Flex grammar, which is written with a procedural CPU in mind, a Panini grammar runs on a data-flow-driven machine. The grammar writer’s mindset has to be that autonomous hardware does the structural lexing/parsing and hands over a terminal token or a production (whenever an action is called for), along with its associated attributes, to the CPU, and also forces the CPU to start executing instructions from the area in memory where the associated “action code” is located. The job of the “action writer” is to access the token-related data structure (which the hardware places at a well-defined memory location), do any manipulation that may be called for, and exit via a well-defined “macro”. After the exit, control automatically returns to the hardware, which keeps the CPU suspended until there is further action for it.

b. Unlike regular Bison/Flex, which supports only monolithic (also referred to as atomic) grammars, Panini supports composite grammars. This means that if messages can be seen as having multiple parts, each of which can be defined by an independent grammar, it is a good idea to use multiple grammars (one for each part) rather than a single monolithic grammar. The two big (and possibly obvious) advantages of multiple grammars are re-use (for example, an HTTP grammar can be combined with SMTP, HTML and XML grammars to create composite grammars for Internet mail, Web access and application communication) and maintenance (if the HTTP part changes, only one small grammar is modified instead of three). A side advantage of multiple grammars is that less total grammar memory is used, so the grammar data can often fit in fast on-chip memory, giving even better performance.

c. Unlike a general-purpose CPU, the Panini chip has many specialized hardware units at the command of the embedded processor, among them the Memory manager, DMA, Hash lookup, Normalizer and Decoder. Keeping this in mind, one should create equivalent functions early on and use them in the action code (later these functions are replaced by suitable hardware API calls).
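The data-flow model described in point (a) can be illustrated with a small simulation. This is only a sketch in Python, not Panini action code: the `TokenRecord` layout, the `ACTIONS` registry and the `run_dataflow` loop are hypothetical stand-ins for the hardware-filled token structure, the action code locations and the hardware dispatcher. The point it shows is that action code never drives the parse; it is invoked with a ready-made record, does its work and returns.

```python
from dataclasses import dataclass, field


@dataclass
class TokenRecord:
    """Hypothetical stand-in for the token data structure filled in by hardware."""
    kind: str                      # terminal or production name
    value: str                     # matched text
    attributes: dict = field(default_factory=dict)


# Hypothetical registry mapping a token/production name to its action code.
ACTIONS = {}


def action(kind):
    """Register a function as the action code for one token or production."""
    def register(fn):
        ACTIONS[kind] = fn
        return fn
    return register


@action("HOST_HEADER")
def on_host_header(tok, results):
    # The action only touches the record it was handed, then returns
    # (the return plays the role of the exit macro).
    results.append(("host", tok.value.strip().lower()))


def run_dataflow(token_stream):
    """Plays the role of the hardware: dispatch only where an action is called for."""
    results = []
    for tok in token_stream:
        fn = ACTIONS.get(tok.kind)
        if fn is not None:         # no action registered: the CPU is never woken
            fn(tok, results)
    return results


stream = [
    TokenRecord("METHOD", "GET"),
    TokenRecord("HOST_HEADER", " Example.COM "),
    TokenRecord("CRLF", "\r\n"),
]
print(run_dataflow(stream))        # [('host', 'example.com')]
```

Only the `HOST_HEADER` token reaches the CPU side here; the other two tokens are consumed without any action being run.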

2.  Replacing the functions that have hardware equivalents with suitable API calls. Although it is transparent to the writer, it is good to know that most of the hardware APIs invoke the hardware in “sleep” or background mode. That means that after triggering the hardware they transfer control of the CPU to the hardware scheduler, which schedules the CPU to run another thread, if one is available. When the other thread transfers control back to the hardware scheduler, and if the invoked hardware has completed its action, execution control comes back to the thread that invoked the hardware API.
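The “sleep” mode behaviour described in step 2 resembles cooperative scheduling, which can be sketched with Python generators. Everything named here is an assumption made for illustration (`hw_hash_lookup`, `hardware_step`, `scheduler` and the toy “hash” that just returns the key length); the real hardware APIs and scheduler are documented in the Panini user manual, not here.

```python
from collections import deque

pending_hw = deque()   # requests the simulated hardware has not finished yet
log = []               # order in which the threads make progress


def hw_hash_lookup(key):
    """Hypothetical sleep-mode API: trigger the hardware, then give up the CPU."""
    request = {"key": key, "done": False, "result": None}
    pending_hw.append(request)            # trigger the (simulated) hardware unit
    while not request["done"]:
        yield                             # hand control back to the scheduler
    return request["result"]


def action_thread(name, key):
    log.append(f"{name}: start")
    value = yield from hw_hash_lookup(key)
    log.append(f"{name}: got {value}")


def hardware_step():
    """Simulated hardware: completes one outstanding request per scheduler pass."""
    if pending_hw:
        request = pending_hw.popleft()
        request["result"] = len(request["key"])   # toy "hash": the key length
        request["done"] = True


def scheduler(threads):
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)                  # run until the thread yields the CPU
            ready.append(thread)          # still waiting on hardware: requeue
        except StopIteration:
            pass                          # thread finished its action
        hardware_step()


scheduler([action_thread("A", "http"), action_thread("B", "smtp-gw")])
print(log)   # ['A: start', 'B: start', 'A: got 4', 'B: got 7']
```

Thread B gets the CPU while A is waiting on its lookup, and A resumes only after the scheduler comes back to it and finds the request complete.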

3.  Replacing certain sets of actions with PRAGMAs. This step requires an understanding of the various PRAGMAs. The difference between hardware functions and PRAGMAs is that PRAGMAs are for pre-processing and cannot be invoked by the CPU. The hardware, even before scheduling the CPU to execute an action, does the processing indicated by the PRAGMAs. To illustrate the power of these PRAGMAs, note that in many XML applications the action can be defined solely by PRAGMAs (in other words, not only the structural parsing but even the action part is done by hardware alone). Some of the most important PRAGMAs supported relate to automatic hash lookup (for example, in the Lexing stage, the hardware can compute the hash value of, say, a namespace URI and send the hash as the value of the token instead of the URI string), string merging (for example, in the Parsing stage, one can merge the string values of, say, the DomainName, DirName and FileName terminal tokens to create a URL string value corresponding to the URL production), and string aggregation (for example, in the Parsing stage, one can create a length-delimited composite string of, say, ElementName, AttributeName, AttributeValue etc. as the value of the SOE SAX event).
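The three PRAGMA examples in step 3 describe data transformations, and their effect can be sketched in software. The functions below are hypothetical equivalents of the kind one might write in step 1 before switching to PRAGMAs; they do not show PRAGMA syntax. CRC-32 is only a placeholder for whatever hash the hardware computes, and the one-byte length prefix is an assumed layout for the length-delimited string.

```python
import zlib


def hash_token_value(uri: str) -> int:
    """Lexing-stage idea: pass on a hash of the string instead of the string.

    CRC-32 is a placeholder; the hardware's hash function is not specified here.
    """
    return zlib.crc32(uri.encode("utf-8"))


def merge_strings(*parts: str) -> str:
    """Parsing-stage idea: join terminal values into one production value."""
    return "".join(parts)


def aggregate_length_delimited(*parts: str) -> bytes:
    """Parsing-stage idea: pack strings as <1-byte length><bytes> records.

    The one-byte length prefix is an assumed layout.
    """
    out = bytearray()
    for part in parts:
        data = part.encode("utf-8")
        if len(data) > 255:
            raise ValueError("part too long for a one-byte length prefix")
        out.append(len(data))
        out += data
    return bytes(out)


# DomainName + DirName + FileName -> value of the URL production
url = merge_strings("www.example.com", "/docs/", "index.html")

# ElementName, AttributeName, AttributeValue -> value of one SOE event
soe = aggregate_length_delimited("item", "id", "42")

print(url)   # www.example.com/docs/index.html
print(soe)   # b'\x04item\x02id\x0242'
print(hash_token_value("http://www.w3.org/1999/xhtml"))
```

Writing the step 1 stand-ins in this shape keeps the later switch mechanical: each call site becomes a candidate for a PRAGMA, and the action code that remains is whatever the hardware cannot do on its own.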

4.  Validating the Panini grammar. Once the above transformations have been done, the grammar tools (Xlex, Xparse) should be used to generate the relevant binary files (.ltf for lex, .ptf for parse and .elf for the action code of each stage). The sample application supplied by Xambala (called “simple”, as in the full form of SNMP!) can be modified to point to the relevant grammar and message files. Once the board (the Panini subsystem) is installed in the system and powered up, and all the good LEDs are glowing, the modified (and compiled) sample application can be run. The application configures the board, loads the grammar files into the board’s memory and enables it. The Panini subsystem reads the message from host memory space, generates the output and notifies the sample application. The sample application prints the output on screen (one can modify the application to capture the data in a file as well).

5.  As advanced validation of the grammar, one can use instrumentation. There are two kinds. The first is built into the Panini subsystem (statistics counters that count various types of tokens and other events). The second can be embedded in the action code. For example, after a production of interest has been detected, the instrumented parser action code could write the value of the production, its associated attributes (which are already extracted by hardware) and a time stamp to known memory locations, which one can then check in the debugger environment. The debugger (XDB) is run after the sample application has been run (and its output captured) as explained in the previous step. The debugger can be used to check the various statistics counters as well as any data in memory created by the separately inserted instrumentation code. Note that there are two free-running timers at the disposal of the CPU, which can be independently started, stopped and cleared at any time.
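The second kind of instrumentation in step 5 can be prototyped in software before moving to the board. The sketch below is an assumption-laden model, not the Panini API: `FreeRunningTimer` imitates a timer with independent start, stop and clear operations, and the `TRACE` list stands in for the known memory locations that would later be inspected in the debugger. A fake clock is injected so that the example is deterministic.

```python
import time


class FreeRunningTimer:
    """Software stand-in for one timer; start, stop and clear are independent."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._elapsed = 0.0
        self._started_at = None

    def start(self):
        if self._started_at is None:
            self._started_at = self._clock()

    def stop(self):
        if self._started_at is not None:
            self._elapsed += self._clock() - self._started_at
            self._started_at = None

    def clear(self):
        self._elapsed = 0.0
        if self._started_at is not None:      # keep running, but from zero
            self._started_at = self._clock()

    def read(self):
        if self._started_at is None:
            return self._elapsed
        return self._elapsed + (self._clock() - self._started_at)


TRACE = []   # stands in for the known memory locations checked afterwards


def instrumented_action(production, value, attributes, timer):
    """Record what the parser action saw, plus a time stamp."""
    TRACE.append({
        "production": production,
        "value": value,
        "attributes": dict(attributes),
        "timestamp": timer.read(),
    })


# Deterministic fake clock: returns 0, 1, 2, ... on successive calls.
ticks = iter(range(100))
timer = FreeRunningTimer(clock=lambda: next(ticks))

timer.start()                                  # clock reads 0
value = "www.example.com/index.html"
instrumented_action("URL", value, {"length": len(value)}, timer)  # clock reads 1
timer.stop()                                   # clock reads 2

print(TRACE[0]["timestamp"], timer.read())     # 1.0 2.0
```

Copying the attributes into the trace record, rather than keeping a reference, mirrors the idea of writing a snapshot to memory that stays valid after the action has exited.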

And the Grammar is ready for prime time

Once the grammar has gone through the above five steps (again, not all steps are mandatory), it is ready for the pilot stage. In the pilot stage there is a real application (as opposed to the sample one). The various APIs (whether for control purposes, such as grammar loading, or for data purposes, such as sending and receiving data) are used by the Host application. Likewise, any result checking (or instrumentation checking) is also done through the Host APIs. Once the pilot setup has been run and regression-tested, the grammar (and the whole setup) is ready for deployment in a live scenario.
