Regular Expressions and Finite Automata Overview

Regular Expressions and Finite Automata Overview
Slide Note
Embed
Share

Explore the relationship between regular expressions and finite automata in the context of lexical analysis, compiler design, and token specification. Learn about the implementation of lexical analyzers as finite state machines and the synthesis of FSMs to recognize desired patterns. Discover how transition diagrams depict the action of lexical analyzers in parsing source code to identify tokens efficiently.

  • Regular Expressions
  • Finite Automata
  • Lexical Analysis
  • Compiler Design
  • Token Specification

Uploaded on Apr 03, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Regular Expressions To Finite Automata Mr.Keshav Tambre Information Technology Department International Institute of Information Technology, I IT www.isquareit.edu.in

  2. Regular Expressions Lexical analyzer recognizes tokens present in the source program Compiler defines tokens using RE These indicate what characters may go into a lexeme belonging to a particular token and how they go together Lexical analyzer is implemented as FSM 2

  3. Regular Expressions Design of lexical analyzer involves:- Represent tokens by RE Synthesis of finite state machines that recognize desired strings generated by RE Simulating the behavior of FSM which generates the lexical analyzer 3

  4. Specification of Tokens RE are notations for specifying patterns Alphabet/Character class Denotes a finite set of symbols String Strings are defined over some alphabets, is a finite sequence of symbols drawn from the alphabet Empty string denoted by Language Set of strings over some alphabet 4

  5. Regular Expression ={a,b} a|b denotes the set {a,b} RE (a|b) (a|b) denotes the set {aa,ab,ba,bb} (a|b)* a|a*b Identifiers in C letter|(letter | digit)* 5

  6. Regular Definitions Names given to RE If is an alphabet of basic symbols then RE is a sequence of definitions of the form: d1 r1 .. dn rn where di is the distinct name and ri is the RE Short hand notations: * + ? 6

  7. Transition diagrams Intermediate steps in the construction of the analyzer Depicts the action of the lexical analyzer when called upon by the parser to get the next token Positions are represented as circles called as states States connected by arrows called edges Edges have labels indicating the input characters that can appear next after the transition diagram has reached states Assumption TD are deterministic in nature 7

  8. Finite automata When RE is compiled as a recognizer for a language by constructing a generalized transition diagram called finite automata Deterministic and Non deterministic NFA Mathematical model that consists of Set of states , s Set of input symbol Transition function move that maps state symbol pairs to set of states A start state A final state 8

  9. Finite automata Diagrammatically represented as a labeled directed graph called transition graph Implementation-State Transition Table DFA Special case of NFA in which No state has transition For each state s and input symbol a is at most one edge labeled a leaving s 9

  10. Specification of Tokens RE are notations for specifying patterns Alphabet/Character class Denotes a finite set of symbols String Strings are defined over some alphabets, is a finite sequence of symbols drawn from the alphabet Empty string denoted by Language Set of strings over some alphabet 10

  11. RE to DFA RE to DFA We augment the grammar the RE with # Construct nullable,firstpos ,lastpos and followpos which is done by making traversal over the tree Construct DFA from followpos Firstpos: at what positions the string can start Lastpos:at what positions the string can end Follopos:what positions can follow position I in the syntax tree Nullable:whether an empty string can be generated at the node (sub expression) 11

  12. RE to DFA Node n Nullable(n) Firstpos(n) Lastpos(n) true N is a leaf labeled Nullable(c1) or Nullable (c2) Firstpos (C1) U Firstpos(C2) Lastpos (C1) U Lasttpos(C2) if N is | node with left child c1 and right child c2 N is a leaf labeled with position i false { i } { i } Nullable(c1) and Nullable (c2) If nullable (C1) then Firstpos (C1) U Firstpos(C2) else Firstpos(C1) Firstpos (C1) If nullable (C2) then Lastpos (C1) U Lastpos(C2) else Lastpos(C2) Lastpos (C2) if N is . Node node with left child c1 and right child c2 true if N is the node with only c1 as child 12

  13. RE to DFA Computation of Followpos : Followpos(i) tells us what positions can follow position i in the syntax tree. This can be computed as follows. if n is a . Node, with a left child C1 and right child C2 and i is a position in the Lastpos(C1), then all positions in Firstpos(C2) are in Followpos(i) if n is a * Nodeand i is a position in the Lastpos(n), then all positions in Firstpos(n) are Followpos(i) 13

  14. RE to DFA Algorithm for construction of DFA transition table Initially , the only unmarked state in Dstates is firstpos(root), where root is the root of a syntax tree. While there is an unmarked state T in Dstates do Begin Mark T For each input symbol a do Begin Let U be the set of positions that are in Followpos(P) for some P in T, such that the symbol at position P is a. If U is not empty and is not in Dstates then add U as an unmarked state to Dstates Dtran [T,a] = U End End 14

More Related Content