Lexical and Syntactic Analysis in Compilation Process

Slide Note

We delve into breaking source code into lexemes units and parsing them into syntactic uses during Lexical and Syntactic Analysis. Learn how errors can be identified and reported, the implementation process, and the importance of parsing in generating parse trees with grammatical categories. Understand the forms of parsers - top-down and bottom-up - used in programming languages for a seamless compilation process.

forero_w Follow

Uploaded on Feb 27, 2025 | 1 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Lexical and Syntactic Analysis We look at two of the tasks involved in the compilation process break the source code into lexemes units (lexical analysis) the goal is to identify each lexeme and assign it to its proper token parse the lexical components into their syntactic uses (syntactic analysis) the goal is to parse the lexemes into a parse tree During both lexical and syntactic analysis, errors can be detected and reported

Continued Lexical analysis takes source code that consists of reserved words, identifiers, punctuation, blank spaces, comments Identify each item for its lexical category e.g., the reserved word for , a semicolon, an identifier, a close ), etc How do we perform this operation? use a relatively simple state transition diagram to describe the various entities of interest implement a program based on recursive functions, one per lexical category, to search the next portion of the program to identify a component s category

Example

Recognizing Names/Words/Numbers

Implementation int lex( ) { getChar( ); switch (charClass) { case LETTER: addChar( ); getChar( ); while (charClass == LETTER || charClass == DIGIT) { addChar( ); getChar( ); } return lookup(lexeme); break; case DIGIT: addChar( ); getChar( ); while (charClass == DIGIT) { addChar( ); getChar( ); } return INT_LIT; break; } /* End of switch */ } /* End of function lex */

Parsing The process of generating a parse tree from a set of input that identifies the grammatical categories of each element of the input identifying if and where errors occur Parsing is similar whether for a natural language or a programming language a good parser will continue parsing even after errors have been found this requires a recovery process Parsing is based on the language s grammar but must also include the use of attributes in the grammar (that is, an attribute grammar)

Forms of Parsers Top-down (used in the LL parser algorithm) start with LHS rules, map to RHS rules until terminal symbols have been identified, match these against the input Bottom-up (used in the LR parser algorithms) start with RHS rules and input, collapse terminals and non-terminals into non-terminals until you have reached the starting non-terminal Parsing is an O(n3) problem where n is the number of items in the input if we cannot determine a single token for each lexeme, the problem becomes O(2n)! by restricting our parser to work only on the grammar of the given language, we can reduce the complexity to O(n)

Top-Down Parsing Uses an LL parser (left-to-right, using leftmost derivation) Generate a recursive-decent parser from a BNF grammar non-terminal grammatical categories converted into functions e.g., <expr>, <if>, <factor>, <assign>, <id> each function, when called, parses the next lexeme using a function called lex( ) and maps it to terminal symbols and/or calls further functions Two restrictions on the grammar cannot have left recursion if a rule has recursive parts, those parts must not be the first items on the RHS of a rule, for instance <A> <A>b cannot be allowed but <A> b<A> can must pass the pairwise disjointness test (covered shortly) Algorithms exist to alter a grammar so that it passes both restrictions

Recursive Decent Parser Example Recall our example expression grammar from chapter 3: <expr> <term> {(+ | -) <term>} <term> <factor> {(* | /) <factor>} <factor> id | ( <expr> ) void expr( ) { term( ); while (nextToken = = PLUS_CODE || nextToken = = MINUS_CODE){ lex( ); term( ); } } void term( ) { factor( ); while (nextToken = = MULT_CODE || nextToken = = DIV_CODE) { lex( ); factor( ); } } void factor( ) { if(nextToken = = ID_CODE) lex( ); else if(nextToken = = LEFT_PAREN_CODE) { lex( ); expr( ); if(nextToken = = RIGHT_PAREN_CODE) lex( ); else error( ); } else error( ); }

If Statement Example void ifstmt( ) { if (nextToken != IF_CODE) error( ); else { lex( ); if (nextToken != LEFT_PAREN_CODE) error( ); else { boolexpr( ); if (nextToken != RIGHT_PAREN_CODE) error( ); else { statement( ); if(nextToken = = ELSE_CODE) { lex( ); statement( ); } } } } } We expect an if statement to look like this: if (boolean expr) statement; optionally followed by: else statement; Otherwise, we return an error

Pairwise Disjointness Consider a rule with multiple RHS parts, for instance <A> a<B> | a<C> The LL parser must be able to select the part of the rule to simplify, the first non-terminal on each right-hand side rule must differ (that is, each RHS mapping must start uniquely to make the choice obvious) This is pairwise disjointness Here are some examples A aB | bAb | c passes (pairwise disjoint) A aB | aAb fails (not pairwise disjoint) <var> id | id[<expr>] fails but can be made pairwise disjoint as follows <var> id<next> <next> | [<expr>] ( means empty set)

Bottom-Up Parsing To avoid the restrictions on an LL parser, we might want to use an LR parser (left-to-right parsing, rightmost derivation) Implemented using a pushdown automaton a stack added to the state diagrams seen earlier Parser has two basic processes shift move items from the input onto the stack reduce take consecutive stack items and reduce them for instance, if we have a rule <A> a<B> and we have a and <B> on the stack, reduce them to <A> the parser is easy to implement but we must first construct what is known as an LR parsing table there are numerous algorithms to generate the parsing table

Parser Algorithm Given input S0, a1, , an, $ S0 is the start state a1, , an are the lexemes that make up the program $ is a special end of input symbol If action[Sm, ai] = Shift S, then push ai, S onto stack and change state to S If action[Sm, ai] = Reduce R, then use rule R in the grammar and reduce the items on the stack appropriately, changing state to be the state GOTO[Sm-1, R] If action[Sm, ai] = Accept then the parse is complete with no errors If action[Sm, ai] = Error (or the entry in the table is blank) then call error-handling and recovery routine The Parsing table stores the values of action[x, y] and GOTO[x, y]

Example Grammar: 1. E E + T 2. E T 3. T T * F 4. T F 5. F (E) 6. F id Parse of id+id*id$ Stack 0 0id5 0F3 0T2 0E1 0E1+6 0E1+6id5 *id$ R6(GOTO[6,F]) 0E1+6F3 *id$ R4(GOTO[6,T]) 0E1+6T9 *id$ S7 0E1+6T9*7 id$ S5 0E1+6T9*7id5 $ R6(GOTO[7,F]) 0E1+6T9*7F10 $ R3(GOTO[6,T]) 0E1+6T9 $ R1(GOTO[0,E]) 0E1 $ ACCEPT Input id+id*id$ S5 +id*id$ R6(GOTO[0,F]) +id*id$ R4(GOTO[0,T]) +id*id$ R2(GOTO[0,E]) +id*id$ S6 id*id$ S5 Action

Lexical and Syntactic Analysis in Compilation Process

Download Presentation

Presentation Transcript

Related

More Related Content