Understanding Language Translation in Computer Science

compilation n.w
1 / 25
Embed
Share

Explore the complexity of language translation in computer science, from compilers to de-compilation. Dive into the layers of high-level languages and learn about courses like Formal Languages and Compiler Construction. Discover the challenges and importance of defining programming languages and generating code accurately.

  • Language Translation
  • Computer Science
  • Compilers
  • Programming Languages
  • Formal Languages

Uploaded on | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Compilation more generally, language translation

  2. compilers read the source code you write and produce machine code from it we likely all know this But there are many facets to the story (and several chapters to each) K&R C compiler Java compiler early C++ compilers Lisp interpreter Are these all essentially the same?

  3. language translation a very big part of computer science consider all the many layers from your high-level language of choice, down to the nand gate The layers often correspond to, or are defined by a language

  4. language translation There is a lot of complexity to the topic both In academia, this shows up in a course named Formal Languages theoretical and in terms of and this shows up in a course named compiler construction technical implementation (if they show up at all)

  5. language translation some of the topics are of interest here in our world of reverse engineering. It is especially relevant to attempts to evolve de-compilation beyond the current juvenile state in which it lives. Dis-assembly is pretty well understood, but de-compiling is another matter

  6. language translation We will skip over much error handling much of type checking 1001 of the 1002 available optimizations and more But we will bother with: A way to define a language Code to read and recognize a statement in the source language A way to canonically generate code for the target language

  7. language translation We don t translate French into English We translate a work written in French into a work written in English (a subtle thought, but worth mentioning) C, Java, Algol, APL, Fortran, etc. are languages, The programs we write in them are statements in the language. Are programming languages precisely defined? Can we look at a program and, 100% of the time, Give a correct, binary, answer to the question: is this a correct statement in the language?

  8. language translation Forget about translating to the target language And everything else beyond Just answer: is this correct Java? That s what the first couple phases of compilation is mostly about

  9. language translation Except for early Fortran (only counter example I know of anyway) A computer programming language is defined by a grammar

  10. language translation There are different forms of grammars, but they share A precise definition of a language and, they can be used to generate a program which can read the statement (code) and say yes/no

  11. language translation grammars define the language let s just get the idea by example: For the C programming language see the pdf in the code section of the class website: c-Grammar.pdf For the Java programming language (early version) see: http://csci.csusb.edu/dick/samples/java.syntax.html

  12. The Chomsky hierarchy some theory behind it all compiler phase 2, syntax analysis compiler phase 1, lexical analysis a set of grammars along with the abstract machines which can recognize them

  13. The stages of compilation 1d = check each token to make sure it is correct correct = the fsm which recognizes the RE ends in an accepting state 1-d checking tokens lexical analysis is called: regular expression each token type defined by finite state machine recognized by: According to Chomsky type 3 software helper: lex

  14. Consider reading source code: int func1(char x) { int a=3,b; if (x > a) { 1-d lexical analysis, recognize individual tokens int func1 lparen char x rparen etc -type -ident -( -type -ident -) So we are turning the source code into a token stream.

  15. we are turning the source code into a token stream Each token is described by a Regular Expression A Finite State Machine can be built to recognize each RE A Finite State Machine is an abstract machine which implements a mathematical model of computation. lex (also flex) is well-known code used to generate a lexical analyzer

  16. Each token can be described by a Regular Expression int func1 lparen char x rparen -type -ident -( -type -ident -) type: int | char | float | short ident: (A-Za-z)(A-Za-z0-9_)* lparen: ( rparen: ) equal: = num10: [(+,-)] (1-9)+[(1-9)]*

  17. Formal definition of a Finite State Machine A finite finite state Q denotes the finite set of states represents the alphabet that consists of input symbols : Q x = Q denotes the transition function the defines the transition from qito qjQ for each input symbol where qi, qj Q. q0denotes the start state F denotes the set of final states state machine machine is a 5-tuple (Q, , , q0, F) where

  18. A Finite State Machine can be built to recognize each RE S is the start state (sometimes labelled Q0) Alphabet is ASCII Q = the 4 states is the (only) accepting state 1-S are the transitions A 2 start in the start state S (q0); read a character, take the indicated branch; when EOI, if you are in an accepting state: 3 it is recognized other conditions are errors

  19. You may recognize FSMs from other contexts GUI design protocols other human interfaces lot s of stuff in CS/logic

  20. Example simple FSM for an elevator FSM to recognize 111 in binary input Floor you are on ? Button you press Andrew Tuline, tuline.com

  21. 2d = check arrangement of tokens according to the grammar correct = the pda which recognizes the language ends in an accepting state checking arrangement of tokens: ( followed by Boolean-expr? followed by ) ? 2-d syntactical analysis is called: BNF grammar Language defined by pushdown automata recognized by: According to Chomsky Type 2 software helper: yacc

  22. Consider reading source code: int func1(char x) { int a=3,b; if (x > a) { 2-d sytactical analysis, recognize 2-d structure lparen args rparen The language is defined by the grammar A push-down automata can be built to recognize each sentence in the grammar-defined language, in other words, each program Add a stack to a Finite State Machine and you have a PDA yacc is well-know code to generate a parser

  23. Consider reading source code: int func1(char x) { int a=3,b; if (x > a) { Each time you recognize a token, push a symbol for it on the stack If the right hand side of a rule exists on the top of the stack pop it off and replace it with the left hand side of a rule ID lparen Args rparen id call id

  24. Deeper details are not needed for this class seek out Formal Language and Compiler Construction in graduate school if you d like to pursue it further Takeaway for our purposes: A programming language is defined by a grammar Any given program is a statement in that language Compilers process the source in stages to convert to binary stage 1 is lexical analysis each token is recognized by FSM/RE software to generate lexical analyzer from RE s is lex (flex) stage 2 is syntax analysis program is accepted or not by PDA/grammar software to generate parser from grammar is yacc (bison) following stage is code generation Comfort understanding/reading/creating FSM s

  25. lex/fsm/yacc/pda/bnf/grammar/etc parsing example https://www.youtube.com/watch?v=yTXCPGAD3SQ

Related


More Related Content