Parse Handles

Started by cman, January 01, 2005, 08:45:46 PM


cman

I'm reading through my second book on compiler construction, Introduction to Compiler Construction by T.W. Parsons, and I've come across an example I can't figure out. Suppose we have the grammar:


     
     E -> E + E | E * E | i

    ( "i" is a terminal )



and we are given the expression:


   
   i + i * i




and now we parse the expression "bottom up", step by step:



line    stack        input          production

 1.                  i + i * i

 2.     i            + i * i        E -> i

        ( "i" is a handle on line 2 )

 3.     E            + i * i

 4.     E +          i * i

 5.     E + i        * i            E -> i

        ( "i" is a handle on line 5 )

 6.     E + E        * i

( why is ( E + E ) not a handle on line 6 ??????? )






The definition of handle states:

   for b to be a handle of the sentential form abw , we must have

   ( a ) aAw ==> abw   ( A a nonterminal )

   ( b ) S ==> aAw     ( aAw must be derivable from the start symbol )

where we are dealing with rightmost derivations throughout.


If anyone could explain this I would be very grateful. Thanks.........

Randall Hyde

First of all, the grammar is ambiguous. Therefore, there are multiple derivations going from the input string to the starting symbol (i.e., a rightmost derivation). "E+E" *could* be a handle in the derivation. The authors chose, instead, to shift rather than reduce at this point. Their decision, of course, was made to support the standard arithmetic precedence we're used to using. But they could just as easily have reduced "E+E" as shifted the "* i".  In a real parser, you would need to disambiguate this before programming (e.g., in Yacc/Bison you could define the precedence, in an LL parser you would rewrite the grammar in a non-ambiguous form).

So the short answer to your question is "it's magic."  That is, the author decided to shift rather than reduce. The grammar gives you no information as to how you should handle the shift/reduce conflict.
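Randy's point about resolving the shift/reduce conflict by precedence can be sketched in code. The following is a minimal sketch in C (not from the book): a shift-reduce evaluator over single-digit operands that does exactly what a precedence declaration would do, reducing while the operator already on the stack binds at least as tightly as the incoming one, and shifting otherwise. The names sr_eval, prec, and apply, and the fixed-size stacks, are invented for illustration.

```c
#include <assert.h>
#include <ctype.h>

static int prec(char op) { return op == '*' ? 2 : 1; }

static int apply(int a, char op, int b) {
    return op == '+' ? a + b : a * b;
}

/* Evaluate a string of single-digit operands and +/* operators with a
 * shift-reduce loop: reduce while the operator on the stack has
 * precedence >= the incoming operator, otherwise shift. */
int sr_eval(const char *s) {
    int  vals[32]; int nv = 0;   /* value stack    */
    char ops[32];  int no = 0;   /* operator stack */
    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            vals[nv++] = *s - '0';              /* shift i, reduce E -> i */
        } else if (*s == '+' || *s == '*') {
            while (no > 0 && prec(ops[no - 1]) >= prec(*s)) {
                int b = vals[--nv], a = vals[--nv];
                vals[nv++] = apply(a, ops[--no], b);    /* reduce */
            }
            ops[no++] = *s;                     /* shift the operator */
        }
    }
    while (no > 0) {                            /* reduce what remains */
        int b = vals[--nv], a = vals[--nv];
        vals[nv++] = apply(a, ops[--no], b);
    }
    return vals[0];
}
```

With this rule, on "E + E" with "* i" remaining, the incoming "*" outranks the "+" on the stack, so the parser shifts instead of treating E + E as the handle.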

cman

OK, thanks for the information! I see what you're saying ( operator precedence ). In the text the author implies that the decision can be made by looking at what is on the parse stack and what is left in the input stream, and applying the definition of "handle" to this information. That's what had me confused. I'll look this over again and get back with you......

Randall Hyde

Quote from: cman on January 02, 2005, 03:30:45 AM
OK, thanks for the information! I see what you're saying ( operator precedence ). In the text the author implies that the decision can be made by looking at what is on the parse stack and what is left in the input stream, and applying the definition of "handle" to this information. That's what had me confused. I'll look this over again and get back with you......

Yeah, I've used Parson's book in compiler classes before. It had the same effect on my students :-(.
It's an okay book, but not exactly the clearest to understand.
Cheers,
Randy Hyde

tenkey

Quote from: cman on January 01, 2005, 08:45:46 PMThe definition of handle states:

   for b to be a handle of the sentential form abw , we must have
   
   ( a ) aAw  ==> abw ( A a nonterminal )

   ( b ) S ==> aAw   ( aAw must be derivable from the start symbol )

where we are dealing with rightmost derivations throughout.

w is the string of tokens that have not yet been accepted.
The first token in w is the "next" or lookahead symbol - your lexer has decoded it, but you have not yet validated it as being part of a correct parse. One of the things you may need to determine before parsing is what are the valid "follower" tokens for A.

a is the reduced form of the token sequence you have already parsed. There are no handles in it because the parser reduces handles as soon as they are detected. After reduction of b to A, aA may also contain a handle ending in A.

(a) must use a direct derivation, i.e., a rule from the grammar.
(b) may be any derivation, usually indicated with a * for arbitrary number of rules applied, i.e., it may take many rules to show the relationship between S and aAw.

I agree with Randy. Using an ambiguous example is a bad idea. You should try it with a more conventional way of handling operator precedence first.

E ::= T | E + T
T ::= F | T * F
F ::= id | num
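tenkey's unambiguous grammar can be checked with a quick hand-written parser. Here is a minimal sketch in C, assuming single-digit operands only: the left-recursive rules E ::= E + T and T ::= T * F are rewritten as loops (E ::= T { '+' T }, T ::= F { '*' F }), which is the rewrite a recursive-descent or LL parser requires anyway. The function names are invented for illustration.

```c
#include <assert.h>

static const char *p;   /* cursor into the input string */

/* F ::= digit  (num only, for brevity) */
static int parse_factor(void) { return *p++ - '0'; }

/* T ::= F { '*' F }  -- iteration replaces T ::= T * F */
static int parse_term(void) {
    int v = parse_factor();
    while (*p == '*') { p++; v *= parse_factor(); }
    return v;
}

/* E ::= T { '+' T }  -- iteration replaces E ::= E + T */
static int parse_expr(void) {
    int v = parse_term();
    while (*p == '+') { p++; v += parse_term(); }
    return v;
}

int eval(const char *s) { p = s; return parse_expr(); }
```

Because * is introduced one level below +, the precedence falls out of the grammar itself and there is no shift/reduce conflict to disambiguate.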
A programming language is low level when its programs require attention to the irrelevant.
Alan Perlis, Epigram #8

cman

OK, thanks for the information! Here's the exact paragraph that confuses me ( this comes immediately after the entire parse of the expression ( i + i * i ) ):

Quote

  I have defined a handle previously: for b to be a handle of the sentential form abw, we must have that ( a ) aAw ==> abw  ( b ) S ==> aAw , where we are dealing with rightmost derivations throughout. The first thing to observe is that w must consist of nothing but terminals ( which is why I wrote it as w instead of , say ,  Y ). Since we are dealing with rightmost derivations , A must be the rightmost nonterminal in aAw. Hence we conclude that there are never any nonterminals to the right of a handle.

( here comes the part that is confusing to me )

  The definition helps make clear why the ( E + E ) in line 6 ( see my first post ) of our example wasn't a handle .........



To me that says that I can determine whether ( E + E ) is a handle on line 6, even though the grammar is ambiguous ( which makes no sense ). Am I reading this correctly? Thanks....



tenkey

Without a disambiguating rule, the final sentence in the quote is absolutely wrong. Because of ambiguity, there are two rightmost derivations. (We expand the rightmost nonterminal.)

E => E + E     (expand E)
  => E + E * E (expand 2nd E)
  => E + E * i (expand 3rd E)
  => E + i * i (expand 2nd E)
  => i + i * i (expand E)

E => E * E     (expand E)
  => E * i     (expand 2nd E)
  => E + E * i (expand E)
  => E + i * i (expand 2nd E)
  => i + i * i (expand E)


In the first case, the first E + E encountered in parsing is not a handle because the E * E must be reduced first.
In the second case, the first (and only, in this case) E + E is a handle.

cman

Quote
Without a disambiguating rule, the final sentence in the quote is absolutely wrong.

That's exactly what I thought! Authors have to be careful of what they write, as one out-of-place sentence can confuse and frustrate readers of their text. I really like the book, however. The author assures me I will be sufficiently prepared to read Compilers: Principles, Techniques, and Tools by Aho, Sethi and Ullman after finishing this text ( that book seems to be the standard for compiler texts ). Maybe someday I will have enough information to author a hobby-quality translator! BTW, has anyone seen a compiler with a table-driven parser that was not software generated? I know this is very difficult to do ( there would be maybe thousands of table entries ), just curious. Using other people's software to write something seems like cheating  :bg ...........

ÇódèKñïght

I don't think this is cheating. RAD tools are used in other applications, so why not in compilers? It would be a big waste of time to do everything again when such tools are available. Besides, tools like flex/bison have been used by many people, so they must be much more stable and bug-free compared to a new tool. But if you are working on some complex compiler, like Verilog, which has around 3000 rules, then it's better to create your own software or parser.

tenkey

Yes, I have seen compilers with hand-built tables.

In the late 50s, one compiler, for the NELIAC language, had a table that was created by hand. It was called a CO-NO table (current operator/next operator). The table was translated into a branch/jump tree, although it could have been coded as a jump table. Source code for a production version of this compiler was published in a book by Maurice Halstead, Machine-Independent Computer Programming. The book is long out-of-print, and is not a good model for modern compiling techniques.

In D. Gries's book (also out-of-print), Compiler Construction for Digital Computers, a "state transition table" was described, and it was usually hand coded as a jump table. It's similar to one of the ways that FSMs (finite state machines) can be implemented in software.

Tables for operator precedence parsing can be hand coded, and the tables are relatively small compared to other parsing tables.
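A hand-coded operator precedence table of the kind mentioned here really is small. The following is a sketch in C (not from any of the cited books) for the thread's grammar E -> E + E | E * E | i, assuming the usual precedence and left associativity: '<' means shift (the stack symbol yields to the lookahead), '>' means reduce, and 0 marks an error entry.

```c
#include <assert.h>

/* Token codes used to index the relation table. */
enum { PLUS, TIMES, ID, END, NTOK };

/* Hand-built precedence relations.
 * Rows: topmost terminal on the stack; columns: lookahead token. */
static const char rel[NTOK][NTOK] = {
    /*            +    *    i    $   */
    /* + */    { '>', '<', '<', '>' },
    /* * */    { '>', '>', '<', '>' },
    /* i */    { '>', '>',  0,  '>' },
    /* $ */    { '<', '<', '<',  0  },
};

/* Look up the shift/reduce decision for a (stack, lookahead) pair. */
char precedence(int stack_top, int lookahead) {
    return rel[stack_top][lookahead];
}
```

The + row has '<' under * (shift, giving * higher precedence) but '>' under + (reduce, giving + left associativity); that one table entry is the "magic" that answers the original question about line 6.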

The modern LR parsers can handle null-generating rules, which complicates grammar analysis. A first pass at LR-analysis can yield a very large number of states. Many of these states can be optimized by merging, but it's much easier to use tools to do all this. You just need to understand why your grammar is being flagged as having "local" ambiguities (conflicts).

Aho, Sethi, and Ullman was once the standard for compiler textbooks. My feeling is that Appel's textbooks are the next standard. They introduce techniques that had not been published in textbooks back in the "dragon book" days. They also discuss details of implementing object-oriented languages and functional languages.



To CodeKnight
It is when your language gets large and complicated that you want tools to manage it. PL/I was considered to be an overly complex language. The designers of the "full optimizing" compiler built special languages to manage the complexities of creating "better code" for PL/I.

Flex and bison provide only parsing support with hooks for adding translation rules. They don't handle compiler problems such as symbol table management, optimization, register allocation, machine instruction formats, and object file formats.

As for Verilog and VHDL, these are languages for simulating (digital) systems with parallel processes. The execution model is more elaborate than conventional sequential languages. There are also inference rules for synthesizing hardware. Tools for placement and routing must be added, as this task is heavily dependent on whether you are generating ASICs, or on which configurable logic system (FPGA or CPLD model) you are programming. The execution model and the inference rules aren't defined by the language grammar.

cman

Thanks for the replies! Is Yacc/Bison basically a C compiler with extra features? This is how the code appears in the tutorials that I have viewed. Can C++ files be linked to Yacc/Bison files to develop translators? I guess using Bison would not be a big deal as long as I understand LALR parsing fully......

tenkey

Yacc/Bison is a grammar compiler. You embed C code in it, and the C code gets passed on to the output, along with the generated parse table and other code. You need a separate C compiler to further process your code.

If you prefer recursive descent, you may want to look at ANTLR. The only major drawback is that it generates Java. If you hate Java, you may want to find its predecessor PCCTS, which generates C++ code.