The Marriage of Incremental and Approximate Computing

The Marriage of Incremental and Approximate Computing
Slide Note
Embed
Share

This article explores the fusion of incremental and approximate computing paradigms, focusing on their trade-offs and benefits. By computing over subsets of data items and reusing memoized parts, these paradigms offer efficiency gains in time and resources. Learn how incremental computation reruns applications over evolving inputs, while approximate computation generates good enough outputs by computing only selected parts of inputs. The basic idea lies in biased sampling, selecting input items for which memoized results exist. The study delves into motivation, design, and evaluation of these innovative computing approaches.

  • Incremental computing
  • Approximate computing
  • Efficiency gains
  • Data analysis
  • Computing paradigms

Uploaded on Mar 17, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. IncApprox The marriage of incremental and approximate computing Pramod Bhatotia Dhanya Krishnan, Do Le Quoc, Christof Fetzer, Rodrigo Rodrigues* (TU Dresden & *IST Lisbon)

  2. Data analytics systems Raw data Information 2

  3. Massive scale Big data systems High throughput Low latency 3

  4. To strike a balance Tension Low latency High throughput Novel computing paradigms 4

  5. How do these computing paradigms make this trade-off? Observation: Compute over a sub-set of data items instead of the entire data-set! Take less time and resources for computation 5

  6. Two such computing paradigms Inc Approx Approximate computing Incremental computing 6

  7. Incremental computation Common workflow: Rerun the same application over evolving input Application Small Incrementally updated output changed input Incremental updates: Reuse memoized parts of the computation that are unaffected by the changed input 7

  8. Approximate computation Common use-case: Approximate output is good enough! Application Approximate output Input Approximate output: Compute only parts of the input selected by representative sampling 8

  9. Basic idea Incremental computation Approximate computation Both paradigms compute over a sub-set of data items ! by the changed input Affected Selected by the input sampling IncApprox Biased sampling: Select input items for which we already have memoized result from previous runs 9

  10. Outline Motivation Design Evaluation 10

  11. Overview of IncApprox Streaming query Query budget (Latency or resource constraints) IncApprox Input Approximate output data stream Incremental computing Approximate computing + Query budget provides adaptive execution interface to systematically tune b/w latency & throughput! 11

  12. Computation model Batched stream processing Input data stream Computation window M M M R R For each sliding window M M M R R M M Output M Input Run a data-parallel job 12

  13. High-level approach Step #1 Step #2 Step #3 Stratified sampling Biased sampling Run job incrementally Computation input window Approximate output 13

  14. #1: Stratified sampling Step #1 Step #2 Step #3 Stratified sampling Biased sampling Run job incrementally Computation input window Approximate output 14

  15. #1: Why stratified sampling? Sub-streams S1 Stream aggregator (Kafka) S2 Stream processing system Input stream Sn Need proportional allocation of data-items for all sub-streams Disparate events with different distributions Different arrival rates Sub-streams: 15

  16. #1: Stratified sampling in IncApprox Query budget Sub-streams Computation window for the input stream Sample size S1 Stream aggregator (Kafka) S2 IncApprox Sn Stratified reservoir sampling (see the paper for details) 16

  17. #2: Biased sampling Step #1 Step #2 Step #3 Stratified sampling Biased sampling Run job incrementally Computation input window Approximate output 17

  18. #2: Why biased sampling? Input data stream Window at T1 Window at T2 Overlap Successive overlapping computation windows provide an opportunity to reuse result 18

  19. #2: Biased sampling in IncApprox Adaptive budget / Sample size T1 T2 IncApprox Overlapping windows w/ fluctuating arrival rates Biased sampling (see the paper for details) 19

  20. #3: Run job incrementally Step #1 Step #2 Step #3 Stratified sampling Biased sampling Run job incrementally Computation input window Approximate output 20

  21. #3: Why incremental run? old new old old new old old new Computation window (with old and new data-items) To reuse results: Need for automatic and efficient mechanism Design and implement Dynamic algorithms to incrementally update the output 21

  22. #3: Incremental run in IncApprox Self-adjusting computation Change in a data item Window M M M M M Dependence graph Change propagation R R R R R (see the paper for details) 22

  23. Outline Motivation Design Evaluation 23

  24. Evaluation Performance gains of IncApprox 1. Twitter stream analytics 2. Network monitoring See the paper for more results! Implementation Apache Spark Streaming Platform 24 nodes distributed computing cluster 24

  25. Performance gains 2000 Network monitoring Twitter analytics Higher the better 1800 1600 (# of flows/ # of tweets) 1400 Events (K) 1200 1000 800 600 400 200 0 Spark-Streaming Inc Approx IncApprox 2X over native Spark Streaming 1.4X over individual Inc & Approx modules 25

  26. Summary: IncApprox A data analytics system for incremental approximate computing Transparent: Targets existing applications w/o any code changes Practical: Supports adaptive execution based on the query budget Efficient: Employs a mix of Inc & Approx computing paradigms 26

  27. IncApprox Transparent + Practical + Efficient IncApprox also provides error estimation approximate output = output error-estimate See the paper for details! Thank you! www.mpi-sws.org/~bhatotia 27

Related


More Related Content