
Compilers and Grammars
Learn about compilers, their functions, and the importance of grammars in programming languages. Explore lexical analysis, syntactic analysis, and the BNF notation used for defining language constructs.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
CHAPTER 5 COMPILERS
The compiler functions are: To generate execution language to object code) needed (translate in the level code domain high To provide diagnostics for violations of the programming language semantics in the source program (To detect errors) A High level language is usually described in terms of a grammar (a formal description of its syntax or legal statements) The semantics statements grammar does meaning not describe the the or of various
E.g. an assignment statement might be defined by the grammar as: a variable name followed by an assignment operator (:=) followed by an expression. is the matching of the Compilation statements written by the programmer to structures defined by the grammar, and generating the appropriate object code for each statement.
The fundamental building blocks of the language are called keyword, a variable name, an integer, an arithmetic operator etc. e.g. a Tokens The task of scanning the source statement, recognizing and classifying the various tokens is known as lexical analysis. This is done by part of a compiler called a Scanner
After the token scan, each statement in the program must be recognized as some language construct, declaration, or an assignment statement, described by the grammar. such as a This process which is called syntactic analysis or parsing is performed by part of the compiler that is called the parser. The last step is the basic translation process in the generation of object code.
GRAMMARS The simplest used notation to write a grammar is the BNF (Backus Naur Form). A BNF grammar consists of a set of rules each of which defines the syntax of some construct in the programming language.
BNF grammar of a Pascal Language. 1. <prog> ::= PROGRAM <prog-name> VAR <dec-list> BEGIN <stmt-list> END 2. <prog-name>::= id 3. <dec-list> ::= <dec> | <dec-list> ; <dec> 4. <dec> ::= <id-list> : <type> 5. <type> ::= INTEGER 6. <id-list>::= id | <id-list> , id 7. <stmt-list> ::= <stmt> | <stmt-list> ; <stmt> 8. <stmt> ::= <assign> | <read> | <write> | for 9. <assign> ::= id := <exp>
BNF grammar of a Pascal Language. 10.<exp> ::= <term> | <exp> + <term> | <exp> - <term> 11.<term> ::= <factor> | <term * <factor> | <term> DIV <factor> 12.<factor>::= id | int | ( <exp>) 13.<read> ::= READ ( <id-list> ) 14.<write> ::= WRITE ( <id-list> ) 15.<for> ::= FOR <index-exp> DO <body> 16.<index-exp> ::= id := <exp> TO <exp> 17.<body> ::= <stmt> | BEGIN <stmt-list> END
BNF grammar of a Pascal Language. The symbol ::= means is defined to be. Character angle brackets < terminal constructs defined in the grammar). strings enclosed and (i.e. between > are called non names the of the symbols Entries not enclosed in angle brackets are terminal symbols of the grammar (i.e. tokens).
13. <read> ::= READ ( <id-list>) The non terminal symbols are <read> and <id-list>, and the terminal symbols are the tokens READ, (, and ). The rule specifies that a <read> consists of the token READ, followed by the token ( , followed by a language construct <id- list>, followed by the token ) . To recognize a <read> we need the definition of <id-list> which is provided for in rule 6.
Lexical Analysis This involves scanning the program to be compiled and recognizing the tokens that make up the source statements. Scanners identifiers, integers, floating point numbers, character strings and other similar items that are written as part of the source program. recognize keywords, operators, Items such as identifiers and integers are usually recognized as either single tokens or they could be defined as part of the grammar e.g.
Lexical Analysis <ident> ::= <letter> | <ident> <letter> | <ident> <digit> <letter> ::= A | B | C | D | |Z <digit> ::= 0 | 1 | 2 | 3 | ..|9 In such a case the scanner would recognize as tokens the single characters A, B, 0, 1 etc. Similarly the scanner recognizes both single character and multiple character tokens directly. The output of a scanner consists of a sequence of tokens. Each token is usually represented by some fixed length code such as an integer. Below is the token coding scheme for the grammar considered:
Lexical Analysis A token specifier is the name of a particular token. In addition to recognizing tokens the scanner also reads the lines of the source program and it prints the source listing.
Example 1. PROGRAM STATS 2. VAR 3. SUM, SUMQ, I, VALUE,MEAN, VARIANCE : INTEGER 4. BEGIN 5. SUM := 0; 6. SUMQ := 0; 7. FOR I := 1 TO 100 DO 8. BEGIN 9. READ (VALUE); 10. SUM := SUM + VALUE; 11. SUMQ := SUMQ + VALUE * VALUE 12. END; 13. MEAN : = SUM DIV 100; 14. VARIANCE := SUMQ DIV 100 - MEAN * MEAN; 15. WRITE (MEAN, VARIANCE) 16.END.
Syntactic Analysis syntactic analysis statements are recognized as language constructs described by the grammar being used using a Parse Tree or Syntax Tree. In the source Parsing techniques are of two types; bottom up and top down according to the way in which the parse tree is being constructed.
Syntactic Analysis Top down methods begin with the rule of the grammar that specifies the goal of the analysis (i.e. the root of the tree), and attempt to construct the tree so that the terminal nodes match the statements being analyzed.
9: READ (VALUE) <Read> <id-list> Read ( id ) {value}
Syntactic Analysis Bottom up methods begin with terminal nodes of the tree (the statements being analyzed), and attempt to combine these into successively higher- level nodes until the root is reached. One of the bottom-up parsing techniques is the operator-precedence method which is based on examining pairs of consecutive operators in the source program and making decisions about which operation should be performed first.
Syntactic Analysis Consider A + B * C D Multiplication precedence than addition and subtraction. and division usually have higher So for the first pair of operators i.e. (+ and *), + has lower precedence than * i.e. + < *; similarly * > - for the next pair. For the expression A + B * C D < > This implies that the expression B * C is to be computed before either of the other operations. In form of a parse tree this means that the * operation appears at a lower level than the + or operators.
Syntactic Analysis The statement being analyzed is scanned for a sub expression whose operators have higher precedence than the surrounding operators. This sub expression then is interpreted in terms of the rules of the grammar under consideration. This process continues until the root of the tree is reached. The first step in constructing an operator precedence parser is precedence relations between the operators of the grammar. In this context, operator means any terminal symbol (a token). to determine the
Syntactic Analysis From the table: PROGRAM = VAR means that the two tokens involved have equal precedence. means that BEGIN has less precedence than BEGIN < FOR FOR. Precedence relations do not follow the ordinary rules for comparisons e.g. ; > END but END > ; Where there are no precedence relations between pairs of tokens means that the two tokens cannot appear together in any legal statement. If such a combination occurs during parsing it should be recognized as a syntax error. The statements are scanned from left to right one token at a time. For each pair of operators the precedence relation between them is determined.
1. Read(Value);
2. Variance := Sumq DIV 100 Mean * Mean ;
2. Variance := Sumq DIV 100 Mean * Mean ;
2. Variance := Sumq DIV 100 Mean * Mean ;
2. Variance := Sumq DIV 100 Mean * Mean ;
CODE GENERATION The last task in compilation is the generation of object code. The object code for each part of the program is generated as soon as its syntax has been recognized. There are routines (functions) for each rule in the grammar. When the parser recognizes a portion of the source program according to some rule of the grammar, the corresponding routine is executed. Such routines are called semantic routines or code generation routines.
CODE GENERATION code generated computer for which compiled. The depends the program upon is the being The parser always recognizes at each step the left most substring of the input that can be interpreted according grammar. to the rule of the In recognition occurs when a substring of the input is reduced to some non terminal <Ni> the operator precedence method this
CODE GENERATION assembler code representation of generated for the READ statement on the SIC/XE machine. This shows object the code symbolic to the be +JSUB WORD WORD XREAD 1 VALUE XREAD is part of a standard library associated with the compiler. It can be called by any program that wants operation. to perform a READ
CODE GENERATION XREAD is passed parameters that specify the details of the READ. The second word in this parameter list contains the number of variables that will be read and the third line gives the address of this variable. The parser in generating the parse tree recognizes first <id-list> and then <read>. At appropriate code generation routine. each step the parser calls the