Parsing Techniques and Recursive Descent Parsing for XML Developers

recursive descent parsing for xml developers n.w
1 / 74
Embed
Share

Learn about parsing techniques and recursive descent parsing in XML development. Understand how to structure flat XML documents for easier processing. Explore examples and limitations of recursive descent parsing. Discover the essence of parsing, turning linear data into structured output according to a defined grammar.

  • XML Development
  • Parsing Techniques
  • Recursive Descent Parsing
  • Structured Data
  • Syntax Analysis

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Recursive Descent Parsing for XML Developers Roger L. Costello 15 October 20141

  2. Table of Contents Introduction to parsing in general, recursive descent parsing in particular Example #1: How to do recursive descent parsing on Book data Example #2: How to do recursive descent parsing for a grammar that contains alternatives Limitations of recursive descent parsing 2

  3. Flat XML Document You might receive an XML document that has no structure. For example, this XML document contains a flat (linear) list of Book data: <input> <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </input> 2

  4. Give it structure to facilitate processing <Books> <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> </Book> <Book> <Title>Introduction to Graph Theory</Title> <Authors> <Author>Richard J. Trudeau</Author> </Authors> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> </Book> <Book> <Title>Introduction to Formal Languages</Title> <Authors> <Author>Gyorgy E. Revesz</Author> </Authors> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </Book> </Books> <input> <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </input> 3

  5. Thats parsing! Parsing is taking a flat (linear) sequence of items and adding structure so that the result conforms to a grammar. 4

  6. Parsing <Books> <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> </Book> <Book> <Title>Introduction to Graph Theory</Title> <Authors> <Author>Richard J. Trudeau</Author> </Authors> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> </Book> <Book> <Title>Introduction to Formal Languages</Title> <Authors> <Author>Gyorgy E. Revesz</Author> </Authors> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </Book> </Books> <input> <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </input> parse 5

  7. From the book: Parsing Techniques Parsing is the process of structuring a linear representation in accordance with a given grammar. The linear representation may be: a flat sequence of XML elements a sentence a computer program a knitting pattern a sequence of geological strata a piece of music actions of ritual behavior 7

  8. Grammar A grammar is a succinct description of the structure. Here is a grammar for Books: Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text 7

  9. Parsing Grammar Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Linear representation <input> <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </input> Structured representation <Books> <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> </Book> <Book> <Title>Introduction to Graph Theory</Title> <Authors> <Author>Richard J. Trudeau</Author> </Authors> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> </Book> <Book> <Title>Introduction to Formal Languages</Title> <Authors> <Author>Gyorgy E. Revesz</Author> </Authors> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </Book> </Books> parser 8

  10. Alternate view of the parser output Grammar Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Linear representation <input> <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </input> parser Parse tree Books Book Book Book Authors Publisher Authors Authors Title Date 2012 Title Date 2007 Title Date 1993 ISBN 0-486-66697-2Dover Publications ISBN ISBN Publisher Springer Publisher 0-486-67870-9 Dover PublicationsIntroduction to Parsing Techniques 978-0-387-20248-8 Introduction to Graph Theory Formal Languages Author Richard J. Trudeau Author Gyorgy E. Revesz Author Ceriel J.H. Jacobs Author Dick Grune 8

  11. Parsing Techniques Over the last 50 years many parsing techniques have been created. Some parsing techniques work from the starting grammar rule to the bottom. Those are called top-down parsing techniques. Other parsing techniques work from the bottom grammar rules to the starting grammar rule. Those are called bottom-up parsing techniques. The following slides explain the recursive descent parsing technique. It is a top-down parsing technique. 9

  12. Terminology: Token A token is an atomic (indivisible) unit. Each item in the input is a token. After parsing the tokens will be leaf nodes. 12

  13. The input consists of a sequence of tokens <input> <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </input> Each of these are tokens. This input consists of 16 tokens. 13

  14. After parsing the tokens will be leaf nodes <Books> <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> </Book> <Book> <Title>Introduction to Graph Theory</Title> <Authors> <Author>Richard J. Trudeau</Author> </Authors> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> </Book> <Book> <Title>Introduction to Formal Languages</Title> <Authors> <Author>Gyorgy E. Revesz</Author> </Authors> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </Book> </Books> tokens (terminal symbols) 14

  15. Another view of the tokens, after parsing Books Book Book Book Authors Publisher Authors Authors Title Date 2012 Title Date 2007 Title Date 1993 ISBN 0-486-66697-2Dover Publications ISBN ISBN Publisher Springer Publisher 0-486-67870-9 Dover PublicationsIntroduction to Parsing Techniques 978-0-387-20248-8 Introduction to Graph Theory Formal Languages Author Richard J. Trudeau Author Gyorgy E. Revesz Author Ceriel J.H. Jacobs Author Dick Grune 15

  16. Parsing structures the input by wrapping the tokens in non-terminal symbols <Books> <Book> <Title>Parsing Techniques</Title> <Authors> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> </Authors> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> </Book> <Book> <Title>Introduction to Graph Theory</Title> <Authors> <Author>Richard J. Trudeau</Author> </Authors> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> </Book> <Book> <Title>Introduction to Formal Languages</Title> <Authors> <Author>Gyorgy E. Revesz</Author> </Authors> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> </Book> </Books> non-terminal symbols 16

  17. Recursive descent parsing Recursive descent parsing works like this: Start at the grammar s start symbol and output it. In our grammar, the start symbol is <Books>, so output it. Progress through each grammar rule. For a non-terminal symbol, output it. For a terminal symbol (i.e., token), check the token in the input stream for match with the terminal symbol; if it matches, output it. 17

  18. Initial <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Start with the grammar s start symbol and the first token in the input stream. 7

  19. Output the start symbol <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> </Books> 19

  20. Grammar says there must be at least one Book Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text So the input stream must contain all the tokens for at least one Book. Let s process the grammar rule for Book. 20

  21. Output <Book> <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Book> </Books> 21

  22. Grammar says the token in the input stream must be Title Yea, the input token matches the grammar rule <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Title>Parsing Techniques</Title> <Book> </Books> 22

  23. Grammar: after Title must be Authors Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text So the input stream must contain Author tokens. Let s process the rule for Authors. 23

  24. Output <Authors> <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Authors> <Authors> <Book> </Books> 24

  25. Grammar says the next token in the input stream must be an Author token Yea, the input token matches the grammar rule <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Author>Dick Grune</Author> <Authors> <Authors> <Book> </Books> 25

  26. Grammar says the next token in the input stream may be an Author token Another Author match <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Authors> <Authors> <Book> </Books> 26

  27. The next token in the input stream is not an Author token <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text So, return to the caller (i.e., return to the Book rule). 27

  28. Grammar says the input stream token must be a Date token Yea, the input token matches the grammar rule <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Authors> <Date>2007</Date> <Authors> <Book> </Books> 28

  29. Grammar says the input stream token must be an ISBN token Yea, the input token matches the grammar rule <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Authors> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Authors> <Book> </Books> 29

  30. Grammar says the input stream token must be a Publisher token Yea, the input token matches the grammar rule <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Authors> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Book> </Books> <Authors> 30

  31. Weve completed structuring the first 6 input tokens <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text Output: <Books> <Book> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Authors> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Book> </Books> <Authors> 31

  32. Completed the Book rule <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text We ve finished processing the Book rule, so return to the caller (i.e., the Books rule). 32

  33. Begin work on structuring the next Book <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Title text Author text Date text ISBN text Publisher text 33

  34. Implementation The following slides show, in a step-by-step manner, how to implement a recursive descent parser 34

  35. Step 1 Create a function for each non-terminal symbol in the grammar: Functions Books Book+ Book Title Authors Date ISBN Publisher Authors Author+ Books() { } Book() { } Authors() { } 35

  36. Step 2 Create a global element, Token, that is used to identify the current position in the input stream. Initialize Token to 0: Token = 0 36

  37. Step 3 Create a function, get_next_token(). When it is called, it increments the current position in the input stream: get_next_token() { Token = Token + 1 } 37

  38. Step 4 Create a function, token(), and pass it a name, tk. The purpose of this function is to answer the question: Does the token at the current position in the input stream match tk? 38

  39. Example of using the token() function Suppose that during recursive descent parsing the grammar indicates that the next token in the input stream must be Title. Suppose the global variable, Token, indicates that we are here in the input stream: <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> 39

  40. Example (cont.) The token() function determines that there is a match, so it calls get_next_token() to increment the position in the input stream and returns the token: <Title>Parsing Techniques</Title> <Author>Dick Grune</Author> <Author> Ceriel J.H. Jacobs</Author> <Date>2007</Date> <ISBN>978-0-387-20248-8</ISBN> <Publisher>Springer</Publisher> <Title>Introduction to Graph Theory</Title> <Author>Richard J. Trudeau</Author> <Date>1993</Date> <ISBN>0-486-67870-9</ISBN> <Publisher>Dover Publications</Publisher> <Title>Introduction to Formal Languages</Title> <Author>Gyorgy E. Revesz</Author> <Date>2012</Date> <ISBN>0-486-66697-2</ISBN> <Publisher>Dover Publications</Publisher> return 40

  41. The token() function token(string tk) { if (tk != input[position() = Token]) then return () else { get_next_token() return input[position() = Token]) } } Notice that token() returns empty if there is not a match. 41

  42. Motivation for Step 5 Suppose that during recursive descent parsing we are in the Book() function. The Book() function first checks by calling the token() function to see if the current position of the input stream contains Title. Suppose it does. Then, according to the grammar, there must be Authors, Date, ISBN, and then Publisher: Book Title Authors Date ISBN Publisher 42

  43. Step 5 Create a function, require(), and pass it a token, found. If the token is empty (i.e., the token() function returned empty because there was not a match) then call the error() function. Otherwise, return the token. require(element found) { if empty(found) then error( Invalid input ) else return found } 43

  44. Step 6 Create an error function, error(). Pass it a string. It outputs the string and then halts the parser. error(string s) { output s stop } 44

  45. The complete implementation Recursive descent has been around a long time and people have developed beautiful code for it. The following two slides collects all the code from the previous slides. I recommend spending some time studying it to appreciate its beauty. 45

  46. Token = 0 Code for a Recursive Descent Parser main() { get_next_token() require(input()) } input() { return require(Books()) } Books() { <Books> return (require(Book()), optional_additional_Books()) </Books> } optional_additional_Books() { book = Book() if exists(book) then return (book, optional_additional_Books()) } Book() { title = token('Title') if exists(title) then <Book> return (title, require(Authors(), require(token('Date')), require(token( ISBN')), require(token( Publisher')) </Book> } Authors() { <Authors> return (require(Author()), optional_additional_Authors()) </Authors> } 46

  47. optional_additional_Authors() { author = token( Author') if exists(author) then return (author, optional_additional_Authors()) } token(string tk) { if (tk != input[position() = Token]) then return () else { get_next_token() return input[position() = Token]) } } require(element found) { if empty(found) then error( Invalid input ) else return found } get_next_token() { Token = Token + 1 } 47

  48. XSLT Implementation I created an XSLT implementation. I tried to mirror the beautiful code shown on the previous slides. If you would like to give my implementation a go, here is the XSLT program and a sample flat (linear) input XML document: http://www.xfront.com/parsing-techniques/recursive-descent-parser/books- parser.xsl http://www.xfront.com/parsing-techniques/recursive-descent-parser/books- test.xml 48

  49. Richer example The Books example shown on the previous slides was fine for introducing recursive descent parsing. But it glossed over an important problem: grammar rules with alternatives. The following example shows how to do recursive descent parsing with a grammar that has alternatives. 49

  50. Expressions Let s parse a simple expression language that has these tokens: IDENTIFIER, addition, parentheses, and EoF. Here are a few examples of expressions: IDENTIFIER EoF (IDENTIFIER) EoF IDENTIFIER + IDENTIFIER EoF (IDENTIFIER + IDENTIFIER) EoF IDENTIFIER + (IDENTIFIER + IDENTIFIER) EoF (IDENTIFIER + IDENTIFIER) + IDENTIFIER EoF IDENTIFIER + (IDENTIFIER + (IDENTIFIER + IDENTIFIER)) EoF Each expression ends with an end-of-file (EoF) token. 50

More Related Content