Simultaneous Approximation of Multiple Functions over Distributed Streams

https www flickr com photos pasukaru76 5597863793 n.w
1 / 24
Embed
Share

Explore the efficient computation of multiple functions over distributed streams in a dynamic data environment to avoid centralizing and re-running costs. This research highlights the challenges and solutions for processing distributed streams, emphasizing the importance of simultaneous approximation for enhanced efficiency.

  • Distributed Streams
  • Stream Processing
  • Functions
  • Data Mining
  • Efficiency

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. https://www.flickr.com/photos/pasukaru76/5597863793 One for All and All for One Simultaneous Approximation of Multiple Functions over Distributed Streams Arnon Lazerson, Moshe Gabel, Assaf Schuster -Technion I.I.T. , Daniel Keren -Haifa University This research was supported by the European Union's Horizon 2020 Research And Innovation Programme under grant agreements no. 688380 and 727277-2.

  2. Distributed Stream Processing Stream mining But, streams are Often distributed Distributed algorithms (graph algs./model fitting/learning ) But, data is not static Centralizing and re-running is very expensive We must be smarter! ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 2

  3. Distributed Stream Processing Space efficient Time efficient Communication efficient Battery lifetime Network congestion Computation bottleneck at central server Smartphones WSN ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 3

  4. Distributed Stream Processing Multiple sources/nodes/sensors Nodes see different data streams Data is dynamic Can t store or centralize Nodes communicate over network Goal: efficiently compute ? ? ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 4

  5. Distributed Model ?? ? Output ? ? Coordinator Nodes ?1 ?2 ?3 ?? Streams ?1 ?2 ?3 ?? ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 5

  6. Function Over the Average Distributed Histogram Distributed Graph ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 6

  7. Single Function Approximation Approximate a function over distributed streams Tradeoff accuracy for efficiency Early work contains special-purpose algorithms Sampling, sketching, local constraints Geometric monitoring General approach - can track any function of the average of streams ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 7

  8. Geometric Monitoring Admissible region: ? = ? ? ? ? Safe zone ? is a convex subset of ?. ?? ? Convexity: ?1,?2 ?? ? ? = ? ?. ? = {?|?(?) ?} ? ?2 ?1 ?= ?? ? ?3 Node ? is silent as long as ?? ?. ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 8

  9. Tracking a Single Function Poll nodes and compute ?0(? at t=0) Output ?(?0) As long as ?(?0) ? ? ?(?0) + Do nothing Restart algorithm Two threshold crossing problems ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 9

  10. Tracking Multiple Functions Centralize and compute Inefficient even for a single function No per-function overhead Independently track each function Each function is tracked efficiently But, each function adds to the communication cost Can we do better? ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 10

  11. Multiple Functions with GM Multiple functions require multiple admissible regions. Use ?? the common admissible region Intersection of individual admissible regions. ? ?? all approximations hold ? ?,? = ? ?2 0.8, g ?,? = ?2+ ?2 1, ?? = 1 ?? = 0.5 ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 11

  12. Common Admissible Region ? ?,? = ?2 ? 0.4, ?? = 1 ? ?,? = ?2 ?, ?? = 1 ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 12

  13. Common Admissible Region ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 13

  14. Safe Zones ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 14

  15. Common Safe Zone - CSZ ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 15

  16. Evaluation Methodology Applied Common safe zone & independent tracking Counted messages Cost is reported as ratio to naive Expect CSZ to be better than indep ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 16

  17. Tracking PCA Scores KDDCUP-1999 dataset TCP connection data 37 features (duration, protocol, bytes sent, ) Nodes maintain ? ? scatter matrices (?=37) ? ?=1 ?=1 ?? ?? , for ? = 1 ? Track top ? PCA scores: ??= ? ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 17

  18. Comm. Cost for PCA Scores = 0.2 = 0.01 ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 18

  19. Tracking Regression and Condition Number Intel Lab dataset 16 sensors measuring humidity, temperature light and voltage Track (Generalized) Least Squares Model Humidity as a function of temperature light and voltage Track Condition number Metric for the stability of the linear system = ?1 ?,? ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 19

  20. Regression and Condition Number Communication Costs ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 20

  21. Scalability ??? = 0.1 ????? = 0.1 ??? = 0.05 ????? = 0.1 ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 21

  22. One for All and All for One One for all One SZ forall functions All for one Tracking all function for (nearly) the price of one ONE FOR ALL AND ALL FOR ONE: SIMULTANEOUS APPROXIMATION OF MULTIPLE FUNCTIONS OVER DISTRIBUTED STREAMS - DEBS 2017 22

  23. Thank you! Questions MONITORING DISTRIBUTED STREAMS USING CONVEX DECOMPOSITIONS VLDB 2015 23

  24. Function Correlations MONITORING DISTRIBUTED STREAMS USING CONVEX DECOMPOSITIONS VLDB 2015 24

Related


More Related Content