
Understanding Bottom-Up Parsing and Chomsky Normal Form
Explore the concepts of Bottom-Up Parsing, CYK algorithm, Chomsky Normal Form, different parsing approaches, and the benefits of Chomsky Normal Form in parsing. Dive into dynamic programming in parsing with the CYK algorithm.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Bottom-Up Parsing (A First Step) Cocke Younger Kasami (CYK) algorithm and Chomsky Normal Form 1
Last time Showed how to use Java CUP for getting ASTs But we never saw HOW the parser works 2
This time Dip our toe into parsing Approaches to parsing CFG transformations Useless non-terminals Chomsky Normal Form: Chomsky Normal Form: A form of grammar that is easier to deal with CYK CYK: powerful, heavyweight approach to parsing 3
Approaches to Parsing Top Down / Goal driven Begin with the start nonterminal Grow parse tree downward to match the string Bottom Up / Data Driven Start at terminals Generate ever larger subtrees; the goal is to obtain a single tree whose root is the start nonterminal Expr Expr plus Term Term id id 4
CYK: A General Approach to Parsing (Cocke Younger Kasami algorithm) ) Operates in time O(n3) Works bottom-up Requires the grammar to be in Chomsky Normal Form This turns out not to be a limitation: any context-free grammar can be converted into one in Chomsky Normal Form 5
Chomsky Normal Form All rules must be one of two forms: X t (terminal) X A B The only rule allowed to derive epsilon is the start S 6
What CNF buys CYK The fact that non-terminals come in pairs allows you to think of a subtree as a subspan of the input The fact that non-terminals are not nullable (except for start) means that each subspan has at least one character s = s1 s2 s3 s4 7
CYK: Dynamic Programming X t Form the leaves of the parse tree X A B Form binary interior nodes of the parse tree S1,4 S1,4 S2,4 S1,3 S1,4 S3,4 S1,2 S1,2 S3,4 S1,1 S2,2 S3,3 S4,4 S1,1 S2,2 S3,3 S4,4 S1,1 S2,2 S3,3 S4,4 s1 s2 s3 s4 s1 s2 s3 s4 s1 s2 s3 s4 8
Running CYK Track every viable subtree from leaf to root. Here are all the subspans for a string of 6 terminals: Full string 1,6 start, end 1,5 2,6 Ending position of subspan 1,4 2,5 3,6 1,3 2,4 3,5 4,6 1,2 2,3 3,4 4,5 5,6 Single characters 1,1 2,2 3,3 4,4 5,5 6,6 Starting position of subspan 9
CYK Example F 1,6 W In general, go up a column and down a diagonal F I W 1,5 2,6 F I Y X W L X 1,4 2,5 3,6 X N R Y L R N N id 1,3 2,4 3,5 4,6 N I Z Z C N Z X I id L ( 1,2 2,3 3,4 4,5 5,6 R ) I,N L I,N I,N R C , C id ( id , id ) 10
CYK Example F 1,6 F I W W F I Y W L X 2,6 X N R X Y L R 3,6 N id N I Z N Z C N 3,5 I id L ( Z R ) C , 4,5 I,N L I,N C I,N R id ( id , id ) 11
CYK Example F 1,6 F I W W F I Y W L X 2,6 X N R X Y L R 3,6 N id N I Z N Z C N 3,5 I id L ( Z R ) 4,5 C , I,N L I,N C N R id ( id , id ) 12
CYK Example F 1,6 F I W W F I Y W L X 2,6 X N R X Y L R 3,6 N id N I Z N Z C N 3,5 I id L ( Z R ) 4,5 C , I,N L I C N R id ( id , id ) 13
CYK Example F 1,6 F I W W F I Y W L X 2,6 X N R X Y L R 3,6 N id N I Z N Z C N 3,5 I id L ( Z R ) 4,5 C , I,N L I C N R id ( id , id ) 14
CYK Example F 1,6 F I W W 2,6 F I Y W L X X N R X Y L R 3,6 N id N I Z N Z C N 3,5 I id L ( Z R ) 4,5 C , I,N L I C N R id ( id , id ) 15
CYK Example F 1,6 F I W W 2,6 F I Y W L X X N R X Y L R 3,6 N id N I Z N Z C N 3,5 I id L ( Z R ) 4,5 C , I,N L I C N R id ( id , id ) 16
Cleaning up our grammars We want to avoid unnecessary work Remove useless rules 17
Eliminating Useless Nonterminals 1. If a nonterminal cannot derive a sequence of terminal symbols, then it is useless 2. If a nonterminal cannot be derived from the start symbol, then it is useless 18
Eliminate Useless Nonterminals Mark all terminal symbols Repeat If all symbols on the righthand side of a production are marked mark the lefthand side Until no more non-terminals can be marked If a nonterminal cannot derive a sequence of terminal symbols, then it is useless 19
Example: S X Y X | Y ( ) ( Y Y ) 20
Eliminate Useless Nonterminals Mark the start symbol Repeat If the lefthand side of a production is marked mark all righthand non-terminal Until no more non-terminals can be marked If a nonterminal cannot be derived from the start symbol, then it is useless 21
Example: S A B C A B + | - | digit | B digit . B 22
Chomsky Normal Form 4 Steps Eliminate epsilon rules Eliminate unit rules Fix productions with terminals on RHS Fix productions with > 2 nonterminals on RHS 23
Eliminate (Most) Epsilon Productions If a nonterminal A immediately derives epsilon Make copies of all rules with A on the RHS and delete all combinations of A in those copies 24
Example 1 F id ( A ) A A N N id N id , N F id ( A ) F id ( ) A N N id N id , N 25
Example 2 X AxAyA A A z X AxAyA | AxAy | AxyA | x AyA | Axy | x Ay | xyA | xy A z 26
Eliminate Unit Productions Productions of the form A B are called unit productions Place B anywhere A could have appeared and remove the unit production 27
Example 1 F id ( A ) F id ( ) A N N id N id , N F id ( N ) F id ( ) N id N id , N 28
Fix RHS Terminals For productions with terminals and something else on the RHS For each terminal t t add the rule X t t Where X is a new non-terminal Replace t with X in the original rules 29
Example F I L N R F I L R F id ( N ) N id F id ( ) N I C N N id N id , N I id L ( R ) C , 30
Fix RHS Nonterminals For productions with > 2 Nonterminals on the RHS Replace all but the first nonterminal with a new nonterminal Add a rule from the new nonterminal to the replaced nonterminal sequence Repeat 31
Example F I L N R F I W W L N R F I W W L X X N R 32
Parsing is Tough CYK parses an arbitrary CFG, but O(n3) time Too slow! For special classes of grammars O(n) time Examples of such classes: LL(1) and LALR(1) 33
Classes of Grammars LL(1) Scans input from Left-to-right (first L) Builds a Leftmost Derivation (second L) Can peek (1) token ahead of the token being parsed Top-down predictive parsers LALR(1) Uses special lookahead procedure (LA) Scans input from Left-to-right (second L) Rightmost derivation (R) Can also peek (1) token ahead LALR(1) strictly more powerful, but the algorithm is harder to understand (Java CUP generates a LALR(1) parser) 34
Summary We covered How to parse with the CYK algorithm (dynamic programming) How to put a grammar into Chomsky Normal Form 35