Parsing APL for Static Analysis by Anders Schack-Nielsen, Ph.D.

parsing apl for static analysis n.w
1 / 16
Embed
Share

Explore the background, motivation, and solutions for parsing APL for static analysis, including examples and the importance of formalizing assumptions through variable types and static analysis tools. Learn from the experience of APL developers at SimCorp in maintaining a complex codebase and the challenges of documentation assumptions in programming.

  • Parsing APL
  • Static Analysis
  • Variable Types
  • Anders Schack-Nielsen
  • Static Analysis Tools.

Uploaded on | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Parsing APL for Static Analysis Speaker: Anders Schack-Nielsen, Ph.D. Sept. 23rd2014

  2. Outline Background and Motivation To remove bullets: Use backspace. Variable Types To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. Static Analysis Tool Parsing APL In case you need to reset the click on the slide and choose Reset Slide . Kind Inference BNF Grammar 2

  3. Background APL codebase in SimCorp: 68000 functions 1.7m lines of code To remove bullets: Use backspace. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. 215 APL developers actively developing and maintaining this codebase In case you need to reset the click on the slide and choose Reset Slide . Additional functions and developers covering utilities, etc.

  4. Motivation example Programmer A writes function foo. To remove bullets: Use backspace. foo args : args should be ... mat1 mat2 strings args A makes certain assumptions about the input arguments. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. ... (implicit assumptions) A documents his assumptions. In case you need to reset the click on the slide and choose Reset Slide . Programmer K writes function bar and calls foo. bar ... foo mat1 mat2 strings ... K has read the header of foo so he knows what sort of arguments to supply. For good measure he also tests it.

  5. Motivation what can go wrong? Translating: A s assumptions documentation of foo K s understanding A lot can be missed, misinterpreted, or left out To remove bullets: Use backspace. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. Test might not catch this In case you need to reset the click on the slide and choose Reset Slide . Maintenance: Updates to foo Updates to bar Assumptions change requires a synchronous update in three places to be correct.

  6. Solution: Variable Types Formalize assumptions make them checkable. To remove bullets: Use backspace. Introduce variable types and static analysis. Check header specification. Check foo against its header. Check the call to foo from bar. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. In case you need to reset the click on the slide and choose Reset Slide . foo args : args[1] : mat1 As vtINT[;] : [2] : mat2 As vtINT[mat1:1;mat1:2] : [3] : strings As vtCHAR[][] mat1 mat2 strings args ...

  7. Static Analysis Tool First type checker was introduced in SimCorp 10 years ago. Worked well, but had many flaws. To remove bullets: Use backspace. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. Recently, the tool has been rewritten from scratch. Many interesting challenges, e.g. parsing APL. 8k lines of F# including 500 lines of FsLex/FsYacc. Understands the semantics of all APL symbols and control- flow constructs. New type checker catches many things the old did not, e.g. potentially all rank errors. In case you need to reset the click on the slide and choose Reset Slide . 7

  8. Real life example To remove bullets: Use backspace. r y textStringRemove x;h 2: y As vtSTRING|vtSTRING[] : (string1)(string2)..... 3: x As vtSTRING|vtCHAR[;] : text vector or matrix 4: r As vtSTRING|vtCHAR[;] : resulting text vector or m ... To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. In case you need to reset the click on the slide and choose Reset Slide . ... dbsource ' 'textStringRemove dbsource tokens '('textSplitAt')'textStringRemove dbsource ... vtSTRING is a short-hand for vtCHAR[] 8

  9. Parsing APL APL is statically un-parsable! To remove bullets: Use backspace. However, it becomes parsable with only a few very minor restrictions. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. In fact, we can make an LALR(1) parser: In case you need to reset the click on the slide and choose Reset Slide . It is possible to define a completely disambiguated BNF grammar, allowing us to code-generate the parser using Yacc. I.e. we can parse APL from left to right with only a single token lookahead and no backtracking. 9

  10. Parsing APL To remove bullets: Use backspace. x/ y To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. MonadicApply DyadicApply OperatorApply ArrayVariable(y) ArrayVariable(x) ArrayVariable(y) In case you need to reset the click on the slide and choose Reset Slide . OperatorApply Each OperatorApply FunctionVariable(x) Reduce Replicate Each 10

  11. Parsing APL Values come in 3 kinds: Arrays, Functions, and Operators. Sequences of Arrays form vectors. Functions associate to the right. Operators associate to the left. To remove bullets: Use backspace. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. In case you need to reset the click on the slide and choose Reset Slide . Parsing needs complete kind information. Solution: Separate parsing in two steps with a kind inference algorithm sandwiched in-between: 1. Parse control-flow and matching parentheses, effectively representing expressions as mere token trees. 2. Do kind inference on the token trees. 3. Parse the token trees as full-fledged expressions. 11

  12. Kind Inference Kind inference naturally proceeds from left to right: Consider e.g.: x.y , x/y , x[y] To remove bullets: Use backspace. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. Left-to-right, depth-first scan: Individual tokens can be inferred based on the kinds of the tokens to the left of it. Parenthesized expressions can have their compound kind inferred based on the kinds of their subparts. In case you need to reset the click on the slide and choose Reset Slide . Tag all tokens with their kind and all left-parentheses with the compound kind they enclose. 12

  13. Inferring compound kinds Kind sequence rewrite algorithm: Uses an elaboration into 5 kinds: Array (A), Function (F), Namespace indexer (.), Monadic operator (M), and Dyadic operator (D). To remove bullets: Use backspace. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. K K (done) A A Ks A Ks A . Ks Ks A F Ks A (done) K D D Ks A (done) // outer product F F Ks A (done) F A Ks A (done) [AF] M Ks F Ks K D A A Ks K D A Ks K D A . Ks K D Ks [AF] D [AF] Ks F Ks In case you need to reset the click on the slide and choose Reset Slide . *Assumes a minor preprocessing step that wraps A . F with parentheses. Also slightly simplified assuming no A . D or A . M . 13

  14. BNF Grammar (sample excerpt) Expr: | Vector Func Expr { DyadicApply(vector $1, $2, $3) } | FuncLeftmost Expr { MonadicApply($1, $2) } | Vector { vector $1 } To remove bullets: Use backspace. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. Vector: | SimpleExprLeftmost { [$1] } | SimpleExprLeftmost SimpleVector { $1 :: $2 } SimpleExprLeftmost: | AtomicExpr { $1 } | Vector LBRACKET IdxList RBRACKET { Index(vector $1, $3) } | NameSpaceExprLeftmost AtomicExpr { NameSpace($1, $2) } In case you need to reset the click on the slide and choose Reset Slide . AtomicExpr: | LPAREN Expr RPAREN { $2 } | IDARRAY { IdenArray($1) } | INT { Value(parseInt($1)) } | FLOAT { Value(Float(parseDouble($1))) } | STRING { Value(parseStringValue($1)) } | APLVALUE { Value(AplNil(parseNiladic($1))) } Func: | Func MonadicOperator { MonadicOpApply($1, $2) } | Func DyadicOperatorFuncFunc SimpleFunc { DyadicOpApply($2, FF($1, $3)) } | JOT DyadicOperatorFuncFunc SimpleFunc { DyadicOpApply($2, FF(AplFunction(OuterProduct), $3)) } | SimpleFunc { $1 } 14

  15. Restrictions the fine print What were those restrictions to allow parsing? Defined operators need a static description of whether their operands are functions or arrays. This is not a problem in practice. We need an environment describing all global variables and functions. We need this anyway to typecheck function calls. (Minor quirk related to the :Until-:AndIf construction.) To remove bullets: Use backspace. To reinsert bullets: Choose Home/Paragraph/ Filled Round Bullets. For next bullet level, use the tab button. In case you need to reset the click on the slide and choose Reset Slide . 15

  16. 16

More Related Content