
Importance of Data Visualisation in Analysis
Data visualisation plays a crucial role in the analysis process, aiding in exploring data, checking assumptions, generating hypotheses, and conveying results effectively. Understanding the principles and elements of good data visualisation is key to presenting information in a clear and concise manner.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
On Abstraction Refinement for Program Analyses in Datalog Xin Zhang, Ravi Mangal, Mayur Naik Georgia Tech Radu Grigore, Hongseok Yang Oxford University
Datalog for program analysis Datalog 2 Programming Language Design and Implementation, 2014 6/10/2014
What is Datalog? Datalog 3 Programming Language Design and Implementation, 2014 6/10/2014
What is Datalog? Input relations: Output relations: Rules: (1) path(i, i). (2) path(i, k) :- path(i, j), edge(j, k). edge(i, j). path(i, j). Least fixpoint computation: Input: edge(0, 1), edge(1, 2). path(0, 0). path(1, 1). path(2, 2). path(0, 1) :- path(0, 0), edge(0, 1). path(0, 2) :- path(0, 1), edge(1, 2). Datalog 4 Programming Language Design and Implementation, 2014 6/10/2014
Why Datalog? If there exists a path from a to b, and there is an edge from b to c, then there exists a path from a to c: path(a, c) :- path(a, b), edge(b, c). Datalog 5 Programming Language Design and Implementation, 2014 6/10/2014
Why Datalog? k-object-sensitivity, k = 2, ~100KLOC 6 Programming Language Design and Implementation, 2014 6/10/2014
Limitation k-object-sensitivity, k = 10, ~500KLOC k-object-sensitivity, k = 2, ~100KLOC 7 Programming Language Design and Implementation, 2014 6/10/2014
Program abstraction Abstraction Precision Scalability 8 Programming Language Design and Implementation, 2014 6/10/2014
Parametric program abstraction 1 1 1 1 1 Abstraction Precision Scalability 9 Programming Language Design and Implementation, 2014 6/10/2014
Parametric program abstraction 1 0 1 1 0 Abstraction Precision Scalability 10 Programming Language Design and Implementation, 2014 6/10/2014
Parametric program abstraction: Example 1 Cloning depth K for each call site and allocation site 1 0 1 1 0 Pointer Analysis 11 Programming Language Design and Implementation, 2014 6/10/2014
Parametric program abstraction: Example 2 Predicates to use as abstraction predicates 1 0 1 1 0 Shape Analysis 12 Programming Language Design and Implementation, 2014 6/10/2014
Program abstraction 1 0 1 1 0 0 1 1 0 0 Datalog Program Datalog Program alias(p, q)? alias(m, n)? 13 Programming Language Design and Implementation, 2014 6/10/2014
Program abstraction 1 0 1 1 0 0 1 1 0 0 Counterexample guided refinement (CEGAR) via MAXSAT Datalog Program Datalog Program alias(p, q)? alias(m, n)? 14 Programming Language Design and Implementation, 2014 6/10/2014
Pointer analysis example f(){ v2 = id1(v1); v3 = id2(v2); q2:assert(v3!= v1); } g(){ v5 = id1(v4); v6 = id2(v5); q1:assert(v6!= v1); } v1 = new ...; v4 = new ...; id1(v){return v;} id2(v){return v;} 15 Programming Language Design and Implementation, 2014 6/10/2014
Pointer analysis as graph reachability 0 3 a0 b0 b1 a1 6 6 6 a0 b0 b1 a1 1 4 c1 d0 c0 d1 7 7 7 c0 d1 d0 c1 2 5 16 Programming Language Design and Implementation, 2014 6/10/2014
Graph reachability in Datalog 0 3 Input relations: edge(i, j, n), abs(n) a0 b0 b1 a1 6 6 6 Output relations: path(i, j) a0 b0 b1 a1 1 4 c1 d0 c0 d1 Rules: (1) path(i, i). (2) path(i, j) :- path(i, k), edge(k, j, n), abs(n). 7 7 7 c0 d1 d0 c1 2 5 Input tuples: edge(0, 6, a0), edge(0, 6 , a1), edge(3, 6, b0), abs(a0) abs(a1), abs(b0) abs(b1), abs(c0) abs(c1), abs(d0) abs(d1). Query Tuple q1: path(0, 5) assert(v6!= v1) q2: path(0, 2) assert(v3!= v1) Original Query 16 possible abstractions in total 17 Programming Language Design and Implementation, 2014 6/10/2014
Desired result 0 3 Input relations: edge(i, j, n), abs(n) a0 b0 b1 a1 6 6 6 Output relations: path(i, j) a0 b0 b1 a1 1 4 c1 d0 c0 d1 Rules: (1) path(i, i). (2) path(i, j) :- path(i, k), edge(k, j, n), abs(n). 7 7 7 c0 d1 d0 c1 2 5 Input tuples: edge(0, 6, a0), edge(0, 6 , a1), edge(3, 6, b0), abs(a0) abs(a1), abs(b0) abs(b1), abs(c0) abs(c1), abs(d0) abs(d1). Query Answer a1b0c1d0 Impossibility q1: path(0, 5) q2: path(0, 2) 18 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 1 0 3 path(0, 0). path(0, 6) :- path(0, 0), edge(0, 6, a0), abs(a0). path(0, 1) :- path(0, 6), edge(6, 1, a0), abs(a0). path(0, 7) :- path(0, 1), edge(1, 7, c0), abs(c0). path(0, 2) :- path(0, 7), edge(7, 2, c0), abs(c0). path(0, 4) :- path(0, 6), edge(6, 4, b0), abs(b0). path(0, 7) :- path(0, 4), edge(4, 7, d0), abs(d0). path(0, 5) :- path(0, 7), edge(7, 5, d0), abs(d0). a0 b0 b1 a1 6 6 6 a0 b0 b1 a1 1 4 c1 d0 c0 d1 7 7 7 c0 d1 d0 c1 2 5 Query Eliminated Abstractions q1: path(0, 5) q2: path(0, 2) abs(a0) abs(a1), abs(b0) abs(b1), abs(c0) abs(c1), abs(d0) abs(d1). 19 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 1 - derivation graph 0 3 a0 b0 b1 a1 6 6 6 a0 b0 b1 a1 1 4 c1 d0 c0 d1 7 7 7 c0 d1 d0 c1 2 5 Query Eliminated Abstractions q1: path(0, 5) q2: path(0, 2) abs(a0) abs(a1), abs(b0) abs(b1), abs(c0) abs(c1), abs(d0) abs(d1). 20 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 1 - derivation graph abs(a0) path(0,0) edge(0,6,a0) abs(b0) abs(a0) edge(6,1,a0) path(0,6) edge(6,4,b0) abs(d0) edge(4,7,d0) abs(c0) edge(1,7,c0) path(0,1) path(0,4) abs(c0) edge(7,2,c0) edge(7,5,d0) abs(d0) path(0,7) path(0,2) path(0,5) 21 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 1 - derivation graph abs(a0) path(0,0) edge(0,6,a0) abs(b0) abs(a0) edge(6,1,a0) path(0,6) edge(6,4,b0) abs(d0) edge(4,7,d0) abs(c0) edge(1,7,c0) path(0,1) path(0,4) abs(c0) edge(7,2,c0) edge(7,5,d0) abs(d0) path(0,7) path(0,2) path(0,5) a0 c0 22 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 1 - derivation graph abs(a0) path(0,0) edge(0,6,a0) abs(b0) abs(a0) edge(6,1,a0) path(0,6) edge(6,4,b0) abs(d0) edge(4,7,d0) abs(c0) edge(1,7,c0) path(0,1) path(0,4) abs(c0) edge(7,2,c0) edge(7,5,d0) abs(d0) path(0,7) path(0,2) path(0,5) a0 c0d0 23 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 1 - derivation graph abs(a0) path(0,0) edge(0,6,a0) abs(b0) abs(a0) edge(6,1,a0) path(0,6) edge(6,4,b0) abs(d0) edge(4,7,d0) abs(c0) edge(1,7,c0) path(0,1) path(0,4) abs(c0) edge(7,2,c0) edge(7,5,d0) abs(d0) path(0,7) path(0,2) path(0,5) a0b0 d0 24 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 1 - derivation graph 0 3 a0 b0 b1 a1 6 6 6 a0 b0 b1 a1 1 4 c1 d0 c0 d1 7 7 7 c0 d1 d0 c1 2 5 Query Eliminated Abstractions a0 c0d0, a0b0 d0(4/16) a0 c0 (4/16) q1: path(0, 5) q2: path(0, 2) abs(a0) abs(a1), abs(b0) abs(b1), abs(c0) abs(c1), abs(d0) abs(d1). 25 Programming Language Design and Implementation, 2014 6/10/2014
Encoded as MAXSAT Hard Constraints Soft Constraints MAXSAT(??, ??,??, ,(??,??)): Find solution ?that Maximize ?? ? ??,1 ? ?} Subject to ? ?0 26 Programming Language Design and Implementation, 2014 6/10/2014
Encoded as MAXSAT Avoid all the counterexamples Hard constraints: ??? (0,0) (??? (0,6) ??? (0,0) ???(?0)) (??? (0,1) ??? (0,6) ???(?0)) (??? (0,7) ??? (0,1) ???(?0)) (??? (0,4) ??? (0,6) ???(?0)) Minimize the abstraction cost Soft constraints: (??? ?0 ?????? ?) (??? ?0 ?????? ?) (??? ?0 ?????? ?) (??? ?0 ?????? ?) ( ??? 0,2 ?????? ?) ( ??? 0,5 ?????? ?) abs(a0) abs(a1), abs(b0) abs(b1), abs(c0) abs(c1), abs(d0) abs(d1). 27 Programming Language Design and Implementation, 2014 6/10/2014
Encoded as MAXSAT Solution: ??? 0,0 = true,??? 0,6 = false, ??? 0,1 = false,??? 0,4 = false, ??? 0,7 = false,??? 0,2 = false, ??? 0,5 = false,?,?? 0,6 = 0, ??? ?0 = false,??? ?0 = true, ??? ?0 = true,??? ?0 = true. Hard constraints: ??? (0,0) (??? (0,6) ??? (0,0) ???(?0)) (??? (0,1) ??? (0,6) ???(?0)) (??? (0,7) ??? (0,1) ???(?0)) (??? (0,4) ??? (0,6) ???(?0)) Soft constraints: a1b0c0d0 (??? ?0 ?????? ?) (??? ?0 ?????? ?) (??? ?0 ?????? ?) (??? ?0 ?????? ?) ( ??? 0,2 ?????? ?) ( ??? 0,5 ?????? ?) Query Eliminated Abstractions a0 c0d0, a0b0 d0(4/16) a0 c0 (4/16) q1: path(0, 5) q2: path(0, 2) 28 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 2 and beyond Iteration 1 Derivation ?? Constraints Datalog solver MAXSAT solver ?? ?? a1b0c0d0 Query Answer Eliminated Abstractions a0 c0d0, a0b0 d0, a1 c0d0 (4/16) a0 c0 , a1 c0 (4/16) q1: path(0, 5) q2: path(0, 2) 29 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 2 and beyond Iteration 2 Derivation ?? Constraints Datalog solver MAXSAT solver ?? ?? a1b0c0d0 Query Answer Eliminated Abstractions a0 c0d0, a0b0 d0, a1 c0d0 a0 c0 , a1 c0 (4/16) q1: path(0, 5) q2: path(0, 2) (4/16) 30 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 2 and beyond Iteration 2 Derivation ?? Constraints Datalog solver MAXSAT solver ?? ?? ?? a1b0c0d0 Query Answer Eliminated Abstractions a0 c0d0, a0b0 d0, a1 c0d0 (4/16) a0 c0 (4/16) q1: path(0, 5) q2: path(0, 2) 31 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 2 and beyond Iteration 2 Derivation ?? Constraints Datalog solver MAXSAT solver ?? ?? ?? a1b0c1d0 Query Answer Eliminated Abstractions a0 c0d0, a0b0 d0, a1 c0d0 (6/16) a0 c0 , a1 c0 (8/16) q1: path(0, 5) q2: path(0, 2) 32 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 2 and beyond Iteration 3 Derivation ?? Constraints Datalog solver MAXSAT solver ?? ?? ?? q1 is proven. a1b0c1d0 Query Answer Eliminated Abstractions a0 c0d0, a0b0 d0, a1 c0d0 a0 c0 , a1 c0 (8/16) q1: path(0, 5) q2: path(0, 2) (6/16) a1b0c1d0 33 Programming Language Design and Implementation, 2014 6/10/2014
Iteration 2 and beyond Iteration 3 Derivation ?? Constraints Datalog solver MAXSAT solver ?? ?? ?? ?? q2 is impossible to prove. q1 is proven. a1b0c1d0 Query Answer Eliminated Abstractions a0 c0d0, a0b0 d0, a1 c0d0 a0 c0 , a1 c0 , a1 c1 , a0 c1 (16/16) q1: path(0, 5) q2: path(0, 2) (6/16) a1b0c1d0 Impossibility 34 Programming Language Design and Implementation, 2014 6/10/2014
Mixing counterexamples Iteration 1 Iteration 3 Eliminated Abstractions: a0 c0 a1 c1 35 Programming Language Design and Implementation, 2014 6/10/2014
Mixing counterexamples Iteration 1 Mixed! Iteration 3 Eliminated Abstractions: a0 c0 a0 c1 a1 c1 36 Programming Language Design and Implementation, 2014 6/10/2014
Experimental setup Implemented in JChord using off-the-shelf solvers: Datalog: bddbddb MAXSAT: MiFuMaX Applied to two analyses that are challenging to scale: k-object-sensitivity pointer analysis: flow-insensitive, weak updates, cloning-based typestate analysis: flow-sensitive, strong updates, summary-based Evaluated on 8 Java programs from DaCapo and Ashes. 37 Programming Language Design and Implementation, 2014 6/10/2014
Benchmark characteristics classes 1K 1K 1.2K 1K 1.1K 1.3K 1.2K 1.9k methods 6K 6.5K 8K 7K 7.7K 7.9K 8K 12K bytecode(KB) 423 434 504 442 532 508 511 708 KLOC 258 265 326 283 303 295 314 460 toba-s javasrc-p weblech hedc antlr luindex lusearch schroeder-m 38 Programming Language Design and Implementation, 2014 6/10/2014
Results: pointer analysis 4-object-sensitivity < 50% queries abstraction size resolved iterations total current 7 46 5 47 143 138 322 51 baseline 0 0 2 6 5 67 29 25 final 170 470 140 730 970 1K 1K 450 max 18K 18K 31K 29K 29K 40K 39K 58K toba-s javasrc-p weblech hedc antlr luindex lusearch schroeder-m 7 46 5 47 143 138 322 51 10 13 10 18 15 26 17 15 < 3% of max 39 Programming Language Design and Implementation, 2014 6/10/2014
Performance of Datalog: pointer analysis k = 4, 3h28m Baseline k = 3, 590s k = 2, 214s lusearch k = 1, 153s 40 Programming Language Design and Implementation, 2014 6/10/2014
Performance of MAXSAT: pointer analysis lusearch 41 Programming Language Design and Implementation, 2014 6/10/2014
Statistics of MAXSAT formulae pointer analysis variables clauses 0.7M 1.5M toba-s javasrc-p weblech hedc antlr luindex lusearch schroeder-m 0.5M 0.9M 1.6M 3.3M 1.2M 2.7M 3.6M 6.9M 2.4M 5.6M 2.1M 5M 6.7M 23.7M 42 Programming Language Design and Implementation, 2014 6/10/2014
Conclusion Datalog MAXSAT Abstraction 43 Programming Language Design and Implementation, 2014 6/10/2014
Conclusion Datalog MAXSAT A(x, y):- B(x, z), C(z, y) Soundness Hard Constraints ? ?,? ? ?,? ?(?,?) Tradeoffs 1 0 1 1 0 Scalability vs. Precision Soft Constraints ??? ?0 ?????? ? Sound vs. Complete 44 Programming Language Design and Implementation, 2014 6/10/2014