
Regular Expressions and Lexical Analysis for IUST Compilers - TA Class Fall 2017
Explore the concepts of regular expressions, converting FA to regular expressions, and introduction to flex in the context of compilers TA class at IUST for Fall 2017. Learn about defining, describing, and utilizing regular languages through examples and recursive definitions.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Regular Expressions and Lexical Analysis IUST Compilers - TA Class Morteza Zakeri Mohsen Amirian Fall 2017 1
Outline Regular Expressions Convert FA to Regular Expressions Introduction to flex 2
Regular Expressions Regular expressions describe regular languages + ( ) * a b c Example: describes the language * , bc a = , , , , , ,... a bc aa abc bca 4
Recursive Definition , Primitive regular expressions: , 1r 2r Given regular expressions and + r r 1 2 r r 1 2 Are regular expressions * r 1 ( ) 1 r 5
Examples ( ) + + A regular expression: * ( ) a b c c ( ) +b + a Not a regular expression: 6
Languages of Regular Expressions : language of regular expression ( ) r L r Example ( ( L ) * = + ) , , , , , ,... a b c a bc aa abc bca 7
Definition For primitive regular expressions: ( ) = L ( ) = L ( ) a = L a 8
Definition (continued) 1r 2r For regular expressions and ( 1 r L + ) ( ) 1 r ( ) 2 r = r L L 2 ( ) ( ) ( ) 1 L r = L r r L r 1 2 2 ( ) ( ( ) 1 r )* = * L r L 1 ( ) ( r ) ( ) 1 r = L L 1 9
Example a Regular expression: ( ) + * b a ( ( ) ) * a ( a ( ) a a , b a , a ( ( ) ( L ) ( L ( ) L b , , aaa aa ) ) * a ) * a ) b ) ( a , aa a + = = = = = = + L a b L L ( ( a b + b ( ( ) a )* aaa )* L L , ,... ,..., , , ,... b ba baa 10
Example ( ) ( * ) = + + Regular expression r a b a bb ( ) r L = , , , , , ,... a bb aa abb ba bbb 11
Example ( ) ( * ) b * = r aa bb Regular expression ( ) r 2 2 n m = { : , 0 } L a b b n m 12
Example = ) 1 + ) 1 + 0 ( * 00 0 ( * r Regular expression (r ) L = { all strings with at least two consecutive 0 } 13
Example = + + 1 ( 01 ) 0 ( * ) r Regular expression (r ) L = { all strings without two consecutive 0 } 14
Generalized Transition Graph Obtaining the final regular expression: 4r 1r 3r f q q 0 2r = + * ( * ) * r r r r r r r 1 2 4 3 1 2 = = ( ) ( ) L r L M L 16
M From construct the equivalent Generalized Transition Graph transition labels are regular expressions Example: M a c a c a+ a, b b 17
b b Another Example: a a, b 1 q q q 0 2 b b b a a+ b q 1 q q 0 2 b 18
b b Reducing the states: a a+ b 1 q q q 0 2 b bb* a b + * ( ) bb a b q q 0 2 19
Resulting Regular Expression: bb* a b + * ( ) bb a b q q 0 2 = + ( * ) * * ( ) * r bb a bb a b b = = ( ) ( ) L r L M L 20
In General e Removing states: c d j q iq q a b ae* ce* d b ce* d iq j q ae* b 21
Format of the Input File %option noyywrap %{ #include<stdio.h> %} Definitions %% Rules (Regular Expressions) Actions ( C codes) %% int main() { yylex(); // include the main loop return 0; } 24
Toy example %{ %} digit [0-9] letter [a-zA-Z] %% #include<stdio.h> #include<conio.h> int count; Definitions Rule and action {letter}({letter}|{digit})* { count++; printf("matched!!! ); } %% int main(void) { yylex(); printf("number of identifiers = %d\n", count); return 0; } 25
Patterns x match the character x . any character (byte) except newline [xyz] either an x , a y , or a z a character class; in this case, the pattern matches [^A-Z] in the class. In this case, any character EXCEPT an uppercase letter. a "negated character class", i.e., any character but those r* r+ r? r{2,5} zero or more r s, where r is any regular expression one or more r s zero or one r s (that is, an optional r ) anywhere from two to five r s 27
Example (0+2)*[(1+3)(0+2)*(1+3)]* 28
Predefined Variables Name Function int yylex(void) char *yytext yyleng yylval int yywrap(void) FILE *yyout FILE *yyin call to invoke lexer, returns token pointer to matched string length of matched string value associated with token wrapup, return 1 if done, 0 if not done output file input file 29
Read input from file int main( int argc, char **argv ) { ++argv, --argc; if ( argc > 0 ) { yyin = fopen( argv[0], "r" ); } else{ yyin = stdin; } yylex(); return 0; } /* skip over program name */ 30
Use lex In Windows and Mac: Refer to our video tutorial. In Unix and Linux: 31
Reference An Introduction to Formal Languages and Automata, Peter Linz Dr. Costas Busch Slides Slides 3 to 20 Flex Online Manual http://dinosaur.compilertools.net/flex/index.html 32
Thank you! 33