
Parallelization of Information Flow Queries and Dynamic Information Flow Tracking
Explore the parallelization of information flow queries and dynamic information flow tracking techniques as discussed by Andrew Quinn, David Devecsery, Peter Chen, and Jason Flinn. Dive into the complexities, limitations, and strategies for optimizing these processes effectively.
Presentation Transcript
JetStream: Cluster-scale Parallelization of Information Flow Queries. Andrew Quinn, David Devecsery, Peter Chen, and Jason Flinn.
Dynamic Information Flow Tracking: DIFT instruments an execution to track causality from sources (inputs) to sinks (outputs). Also known as taint tracking.
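The source-to-sink model can be sketched with a tiny, hypothetical taint-tracking wrapper (not JetStream's instrumentation; the `Tainted` class and the `read`/`write` helpers are invented for illustration): each value carries the set of input sources that influenced it, and every operation unions the sets of its operands.

```python
# Hypothetical sketch of taint tracking; not JetStream's binary-level instrumentation.
class Tainted:
    """A value plus the set of input sources that influenced it."""
    def __init__(self, value, sources=frozenset()):
        self.value = value
        self.sources = frozenset(sources)

    def __add__(self, other):
        # Propagation rule: a result depends on both operands' sources.
        return Tainted(self.value + other.value, self.sources | other.sources)

def read(name, value):
    """Source (input): label data entering the program with its origin."""
    return Tainted(value, {name})

def write(t):
    """Sink (output): report which sources flow to this output."""
    return t.sources

a = read("input1", 3)
b = read("input2", 4)
c = a + b                         # c is causally derived from both inputs
print(sorted(write(c)))           # ['input1', 'input2']
```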
DIFT for Debugging (animation over several slides): a query traces which inputs flow through a server to a particular output o1.
DIFT limitations (as seen in systems such as Arnold '14, X-Ray '12, TaintDroid '10, and Backtracker '03): overheads of ~100x; long-running queries; no native-code support; coarse-grained causality.
Parallelizing DIFT is HARD. Prior attempts include Speck (ASPLOS '08) and Parallel Lifeguards (ASPLOS '08); Ruwase et al. describe the problem as "embarrassingly sequential". Consider this trace, in which every step carries a sequential dependency:

    A = read()
    B = read()
    C = A + B
    D = X + Y
    E = C
    B = 0
    Z = A[D]
    F = E
    write(F)
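Walking the slide's trace sequentially shows why: each step reads the taint state left by the one before it, including taint kills (B = 0) and address-based dependencies (Z = A[D]). A hypothetical sketch over a simple taint map (the input names `in1`/`in2` are invented for illustration):

```python
# Hypothetical sketch: replaying the slide's trace over a taint map, where
# taint[x] is the set of inputs that location x currently depends on.
taint = {"X": set(), "Y": set()}          # X and Y start untainted

taint["A"] = {"in1"}                      # A = read()
taint["B"] = {"in2"}                      # B = read()
taint["C"] = taint["A"] | taint["B"]      # C = A + B
taint["D"] = taint["X"] | taint["Y"]      # D = X + Y
taint["E"] = set(taint["C"])              # E = C
taint["B"] = set()                        # B = 0   (taint kill: order matters)
taint["Z"] = taint["A"] | taint["D"]      # Z = A[D] (depends on array A and index D)
taint["F"] = set(taint["E"])              # F = E
print(sorted(taint["F"]))                 # ['in1', 'in2']: write(F) exposes both reads
```

Reordering any pair of these steps (say, the kill of B and the read of B in C = A + B) changes the result, which is what makes naive parallelization unsound.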
JetStream: local DIFT via epoch parallelism, aggregation via pipeline parallelism. Up to 21x speedup, and 2x faster than the original execution!
Outline: design of JetStream (local DIFT, aggregation); evaluation.
Debugging Query (figure): trace flows from inputs to outputs.
Local DIFT: time-slice the execution into epochs.
Local DIFT: leverage record and replay to calculate DIFT for each epoch in parallel.
Local DIFT: track the mapping between all intermediate locations.
Local DIFT: the full mapping is too expensive to calculate, so instead log operations and defer calculating relationships until aggregation.
DIFT within an epoch (animation over three slides). Fast forward: replay the execution until the start of the epoch. Analysis: log the epoch's operations (here, D = B + C and C = A[D]) as a graph over the locations A, B, C, and D.
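The analysis step sketched as code (hypothetical; JetStream logs binary-level operations, and location versioning is elided here): rather than eagerly computing each location's source set, the epoch worker appends one graph edge per data dependence and leaves the transitive reasoning to aggregation.

```python
# Hypothetical sketch: an epoch worker logs each instruction as graph edges
# instead of eagerly computing taint sets. (A real DIFT would version
# locations so that successive writes to the same name are distinguished.)
edges = []                       # (destination, source) pairs

def log_op(dst, srcs):
    """Record one executed operation as data-dependence edges."""
    for s in srcs:
        edges.append((dst, s))

log_op("D", ["B", "C"])          # D = B + C
log_op("C", ["A", "D"])          # C = A[D]: depends on array A and index D

print(edges)                     # [('D', 'B'), ('D', 'C'), ('C', 'A'), ('C', 'D')]
```

Logging is constant work per operation, which is what keeps the per-epoch pass cheap; the expensive source-to-sink reasoning is deferred.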
Local DIFT output (figure): per-epoch graphs spanning from inputs to outputs.
JetStream (recap): local DIFT via epoch parallelism; aggregation via pipeline parallelism.
Aggregation: calculate paths between sources and sinks. Many nodes are not on any path between a source and a sink, so use sequential information to prune work.
Forward Pass (animation): propagate the set of locations which are derived from a source (the derived locations) forward from the inputs.
Backward Pass (animation): propagate the set of locations which are used by a sink (the used locations) backward from the outputs.
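The two passes amount to intersecting two reachability sets over the logged graph: keep only nodes that are both derived from a source (forward pass) and used by a sink (backward pass), and prune everything else. A hypothetical sketch on a toy graph (node names invented for illustration):

```python
# Hypothetical sketch of the two-pass pruning: a node survives only if it is
# reachable from a source AND reaches a sink.
from collections import defaultdict

def reachable(starts, adj):
    """Simple depth-first reachability from a set of start nodes."""
    seen, stack = set(starts), list(starts)
    while stack:
        n = stack.pop()
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen

# Toy dependence graph: edges point from a value to the values derived from it.
fwd = defaultdict(list, {"src": ["a"], "a": ["sink"], "x": ["y"]})
# Reverse edges for the backward pass.
bwd = defaultdict(list)
for u, vs in list(fwd.items()):
    for v in vs:
        bwd[v].append(u)

derived = reachable({"src"}, fwd)     # forward pass: derived from a source
used = reachable({"sink"}, bwd)       # backward pass: used by a sink
on_path = derived & used              # 'x' and 'y' are pruned
print(sorted(on_path))                # ['a', 'sink', 'src']
```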
Also in the paper: insights about why naive approaches fail; partitioning (it is challenging to predict the local DIFT time); pre-pruning (garbage collection of the graph).
Outline: evaluation.
Experimental Setup: CloudLab cluster of 32 machines, 1 to 128 cores. Queries: (1) sources: home directory, sinks: all; (2) sources: cookies, sinks: suspicious connections.
Two Different Scenarios: unexpected analysis (prioritize low record overhead) and expected analysis (periodic checkpoints; gather partitioning stats).
Scalability of Unexpected Analysis (graph: normalized speedup vs. number of cores, 1 to 100, for Ghostscript, OpenOffice, Gzip, Nginx, Evince, Firefox, and MongoDB, against an ideal line). Mean speedup: 13x.
Scalability of Expected Analysis (graph: normalized speedup vs. number of cores, 1 to 100, for the same benchmarks, against an ideal line). Mean speedup: 21x.
JetStream (recap): local DIFT via epoch parallelism; aggregation via pipeline parallelism. Up to 21x speedup, and 2x faster than the original execution!
Questions?