Parallelization of Information Flow Queries and Dynamic Information Flow Tracking

Explore the parallelization of information flow queries and dynamic information flow tracking techniques as discussed by Andrew Quinn, David Devecsery, Peter Chen, and Jason Flinn. Dive into the complexities, limitations, and strategies for optimizing these processes effectively.

  • Information Flow
  • Parallelization
  • Dynamic Tracking
  • Limitations
  • Optimization


Presentation Transcript


  1. JetStream: Cluster-scale Parallelization of Information Flow Queries. Andrew Quinn, David Devecsery, Peter Chen, and Jason Flinn

  2. Dynamic Information Flow Tracking. DIFT instruments execution to track causality; it is also known as taint-tracking. Data flows from sources (inputs) to sinks (outputs).
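The idea on this slide can be sketched in a few lines: every value carries a taint set naming the input sources it was derived from, and operations propagate the union of their operands' taints. This is a minimal illustration of taint-tracking, not JetStream's implementation; the names (`TaintedValue`, `read_input`, `write_output`) are hypothetical.

```python
class TaintedValue:
    """A value paired with the set of input sources it derives from."""

    def __init__(self, value, taint=frozenset()):
        self.value = value
        self.taint = frozenset(taint)

    def __add__(self, other):
        # Propagation rule: the result is tainted by the union of the
        # operands' taints.
        return TaintedValue(self.value + other.value,
                            self.taint | other.taint)

def read_input(value, source):
    """Source: mark data entering the program with its origin."""
    return TaintedValue(value, {source})

def write_output(tv):
    """Sink: report which sources flow to this output."""
    return sorted(tv.taint)

a = read_input(3, "file:config")
b = read_input(4, "net:socket")
c = a + b                  # taint propagates through the computation
print(write_output(c))     # ['file:config', 'net:socket']
```

Both sources reach the sink through `c`, which is exactly the causality relation a DIFT query asks about.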

  3–7. DIFT for Debugging. [Animation: inputs flow through a server's execution to its outputs; DIFT tracks which inputs influenced a given output o1.]

  8. DIFT limitations. Prior systems (Arnold '14, X-Ray '12, TaintDroid '10, Backtracker '03) suffer from ~100x overheads, long queries, lack of native-code support, and coarse-grained causality.

  9. Parallelize DIFT

  10–12. Parallelizing DIFT is HARD. Taint propagation follows the sequential dependencies of the execution:

     A = read()
     B = read()
     C = A + B
     D = X + Y
     E = C
     B = 0
     Z = A[D]
     F = E
     write(F)

  Prior attempts include Speck (ASPLOS '08) and Parallel Lifeguards (ASPLOS '08); Ruwase et al. describe the problem as "embarrassingly sequential".

  13. JetStream. Local DIFT: epoch parallelism. Aggregation: pipeline parallelism. Speedups of 2x–21x; faster than the original execution!

  14. Outline. Design of JetStream: local DIFT, aggregation. Evaluation.

  15. Debugging Query. [Figure: the queried execution with its inputs and outputs.]

  16. Local DIFT. Time-slice the execution into epochs.

  17. Local DIFT. Leverage record and replay to calculate DIFT over the epochs in parallel.

  18. Local DIFT. Track the mapping between all intermediate locations.

  19. Local DIFT. The full mapping is too expensive to calculate eagerly: instead, log operations and defer calculating relationships until aggregation.
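The epoch parallelism described on these slides can be sketched as follows: given a deterministic replay log sliced into epochs, each epoch's DIFT analysis runs independently, producing an epoch-local list of data-flow edges for aggregation to stitch together. This is a hypothetical simplification (the names `analyze_epoch` and the `(dest, sources)` log format are illustrative, and a thread pool stands in for replaying epochs on separate cores).

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_epoch(ops):
    """Run DIFT over one epoch's replayed operations, returning the
    epoch-local (dest, source) data-flow edges."""
    edges = []
    for dest, sources in ops:
        for s in sources:
            edges.append((dest, s))
    return edges

# The replay log, sliced into epochs at instruction boundaries.
epochs = [
    [("C", ("A", "B"))],             # epoch 1: C = A + B
    [("E", ("C",)), ("F", ("E",))],  # epoch 2: E = C; F = E
]

# Each epoch is analyzed independently and in parallel.
with ThreadPoolExecutor() as pool:
    per_epoch = list(pool.map(analyze_epoch, epochs))

print(per_epoch[1])   # [('E', 'C'), ('F', 'E')]
```

The key point from the slides survives even in this toy form: no epoch waits on another's taint state, because relationships across epoch boundaries are resolved later, during aggregation.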

  20–22. DIFT. Fast forward: replay the execution until the start of the epoch. Analysis: log operations (e.g., D = B + C; C = A[D]) as a graph over locations (A, B, C, D).
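The analysis phase above can be sketched as appending edges to an epoch-local graph instead of resolving taints immediately. The class and method names here (`EpochGraph`, `log_op`) are illustrative, not JetStream's API.

```python
from collections import defaultdict

class EpochGraph:
    """Per-epoch log of data-flow edges: dest location -> source locations."""

    def __init__(self):
        self.edges = defaultdict(set)

    def log_op(self, dest, *sources):
        # e.g. D = B + C becomes log_op("D", "B", "C")
        self.edges[dest] |= set(sources)

g = EpochGraph()
g.log_op("D", "B", "C")   # D = B + C
g.log_op("C", "A", "D")   # C = A[D]: depends on A's contents and the index D

print(sorted(g.edges["C"]))   # ['A', 'D']
```

Logging is cheap (one set union per operation); the expensive question of which sources actually reach which sinks is deferred to aggregation, as the slide says.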

  23. Local DIFT output. [Figure: the per-epoch results spanning inputs to outputs.]

  24. JetStream. Local DIFT: epoch parallelism. Aggregation: pipeline parallelism.

  25. Aggregation. Calculate the paths between sources and sinks. Many nodes are not on any source-to-sink path, so use sequential information to prune work.

  26–27. Forward Pass. Pass along the locations which are derived from a source (the derived locations).

  28–29. Backward Pass. Pass along the locations which are used by a sink (the used locations).
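The two passes can be sketched as reachability over the merged graph: the forward pass keeps locations reachable from a source, the backward pass keeps locations that reach a sink, and only their intersection lies on a source-to-sink path. This is a minimal sketch under the assumption that the graph maps each location to the locations it was computed from; the names (`deps`, `uses`, `reachable`) are illustrative.

```python
def reachable(graph, roots):
    """Transitive closure over a {node: set(neighbors)} adjacency map."""
    seen, stack = set(roots), list(roots)
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# deps[x] = locations x was derived from (as logged during local DIFT)
deps = {"C": {"A", "B"}, "E": {"C"}, "F": {"E"}, "D": {"X", "Y"}}

# uses[x] = locations derived from x (the reversed graph, for the forward pass)
uses = {}
for dst, srcs in deps.items():
    for s in srcs:
        uses.setdefault(s, set()).add(dst)

derived = reachable(uses, {"A"})   # forward pass from source A
used = reachable(deps, {"F"})      # backward pass from sink F
on_path = derived & used
print(sorted(on_path))             # ['A', 'C', 'E', 'F']
```

Everything outside the intersection (here D, X, Y, and the overwritten B) is pruned, which is the work-saving observation from slide 25.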

  30. JetStream. In the paper: insights about why naïve approaches fail; partitioning (it is challenging to predict the local DIFT time); pre-pruning (garbage collection of the graph).

  31. Outline. Evaluation.

  32–33. Experimental Setup. CloudLab cluster of 32 machines, 1–128 cores. Queries: (1) sources: home directory, sinks: all; (2) sources: cookies, sinks: suspicious connections.

  34. Two Different Scenarios. Unexpected analysis: prioritize low recording overhead. Expected analysis: take periodic checkpoints and gather partitioning statistics.

  35. Scalability of Unexpected Analysis: mean speedup 13x. [Log-log plot of normalized speedup vs. number of cores for Ghostscript, OpenOffice, Gzip, Nginx, Evince, Firefox, and MongoDB, against ideal scaling.]

  36. Scalability of Expected Analysis: mean speedup 21x. [Log-log plot of normalized speedup vs. number of cores for the same applications, against ideal scaling.]

  37. JetStream. Local DIFT: epoch parallelism. Aggregation: pipeline parallelism. Speedups of 2x–21x; faster than the original execution!

  38. Questions
