
Continuous Distributed Counting for Non-monotonic Streams Overview
Explore the challenges and solutions in continuous distributed counting for non-monotonic streams, featuring algorithmic methodologies, functional monitoring, SUM tracking problems, related works, applications like network monitoring, and real-time monitoring examples.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Continuous Distributed Counting for Non-monotonic Streams Milan Vojnovic Microsoft Research Joint work with Zhenming Liu and Bozidar Radunovic Workshop Big Data in Digital Life , Paris, June 20-21, 2012
Big Data Algorithmic Challenges Massive scale of data May arrive in a continuous and distributed fashion Efficiency: time, space, communication A wide range of computational problems: Combinatorial Optimization Database Queries Machine Learning +1, -2, -2, Distributed computing system 2
Functional Monitoring Distributed streams: k sites connected to a coordinator Traditional streaming computation k 1 2 3 ?1 ?2 ?3 ?1 ?2 ?3 ?4 ?5 3
SUM Tracking Problem : 1 ? ?? ?? 1 + ? ?? Maintain estimate ?? k 1 2 3 ?1 ?2 ?3 ?4 ?5 ??= ? t ?? SUM: 4
Outline Applications Overview of Related Work Sum Tracking Algorithm for Non-Monotonic Randomized Streams Back to Applications Conclusion 5
Applications Network monitoring Frequency moments Machine learning 6
Ex 1: Real Time Monitoring A financial product traded in different markets Hedge funds want to track bid-offer queues in all the markets The queues change frequently (high frequency) k 1 2 3 ?1 ?2 ?3 ?4 ?5 7
Ex 2: ?2 Tracking ??= ??,??,?? [?], ?? 1,1 2(?) ??? = ? ?:??=??? ?2? = ? [?]?? Problem: track ?2? within a prescribed relative tolerance ? > 0 with high probability 8
AMS Sketch Maintain ?1 ?2 counters ?,?= ? ??? (??) = ? [?] ? ??? ?? ?,?2 1 ?1 ??? ???= ?2 estimate: ??= median(???) Hash function: : ? { 1,1} 9
Ex 3: Bayesian Linear Regression Feature vector ?? ?? Output ?? ? ??= ????+ ? 0,? 1, ?? Prior ? ? ?0,?0 Posterior ? ? ??,?? ?? 1?0+ ??? 1+ ??? ??? ??= ???0 ?? 1= ?0 ??? ? ??= ?1, ,?? 10
Outline Applications Overview of Related Work Sum Tracking Algorithm for Non-Monotonic Randomized Streams Back to Applications Conclusion 11
Key Notation Number of sites: ? Total length of stream: ? Multiplicative error guarantee 1 ? 1 ? ?? ?? 1 + ? ?? Worst-case input arrival Adversary decides when and where the next input arrives 12
Related Work Count tracking [Huang, Yi and Zhang, 2011] monotonic sum: all ?? either positive or negative Expected communication cost: ? ?log ? ? Lower bound for adversary value updates [Arackaparambil, Brody and Chakrabarti, 2009] Expected communication cost: (? ?) 13
Related Work (contd) coordinator St site Lower bound (single site case) Input:+1, 1,+1, 1,+1, 1, Sum: ?1= 1,?2= 0,?3= 1,?4= 0, . Suppose at ? = 4, the site does not report Need: 1 ? ?? ?? 1 + ? ?? ?4= 1, compared as ?4= 0 bad estimate 14
Outline Applications Overview of Related Work Sum Tracking Algorithm for Non-Monotonic Randomized Streams Back to Applications Conclusion 15
Our Work Relaxed adversarial setting Random input values Adversary assignment of values Random input streams Random permutation Random i. i. d. Fractional Brownian motion 16
Our Work (contd) An algorithm with sub-linear communication For drift defined as ? = E ??, the total communication cost: ? ?( ?min{1/|?|, ?}) Matching lower bounds Our algorithm is optimal 17
Our Tracker Algorithm ?? Sample based algorithm Use two monotonic counters ?/? Always report ? ?/? Sample based algorithm 1/(?2?) 18
Our Tracker Algorithm (contd) Each site reports to the coordinator upon receiving a value update ? with probability ? log?? ???2,1 ??= min Sync all whenever the coordinator receives an update from a site S1S S = S1+ + Sk S, S1 site coordinator Mi = 1 Xi Sk S S, Sk site 19
Design Intuition At time ?, the site sync with the coordinator Q: what is the laziest way to sync next time? A: sync before exist of the safe region Estimating using a random input model (ex. those that we used) Safe region 20
Fractional Brownian Motion Updates according to a fractional Brownian motion with Hurst parameter 1 2 ? 1/? Sampling probability: ??= min{??log1+? ? ?? Continual tracking within relative accuracy ? > 0 with high probability 2? ?,1} 3 ? 2 Expected communication cost: ?(min{? ?1 ?,?}) ? 21
Lower Bounds Single site i.i.d. Bernoulli input ? ??= 1 = 1 ? ??= 1 =1 Expected total communication cost: 2 (min{1 ?,?}) ? 1/? 1/? 22
Lower Bounds (contd) Multiple sites Input: i.i.d. Bernoulli ? ??= 1 = 1 ? ??= 1 =1 or a random permutation 2 Expected total communication: ? (min{ ?,?}) ? 23
Outline Applications Overview of Related Work Sum Tracking Algorithm for Non-Monotonic Randomized Streams Back to Applications Conclusion 24
F2 Frequency Moment 2(?) ??? = ? ?:??=??? ?2? = ? [?]?? Expected number of messages: ? ?( ?) ?2 25
Bayesian Linear Regression 1?0+ ??? 1+ ??? ??? ??= ???0 ?? 1= ?0 ??? ?? ?? Expected number of messages: ? ?(?2 ? log?) ? 26
Conclusion First results for sum tracking with non- monotonic distributed streams inputs Practical algorithms Matching lower bounds Implications to other computational problems 27
Some Open Questions Sliding windows? Other classes of random inputs? Other queries with non-monotonic distributed stream inputs? 28
Proof Key Ideas 0 ? 2? ?? (? + 1)? ? ?? + 1 ?? + 2 ?? (? + 1)? 1 2 k ? ?, ? ?, ??= ?(??? [ min ?? ,min ?? ]) Under ??= 1, maximum deviation ? ??? ? 29
Proof Key Ideas (contd) k-input problem: 1 2 k i. i. d. ?? ??= 1 = ?? ??= 1 =1 ?1 ?2 ?? 2 Query: H0: ???> ? or H1: ???< ? ? Answer: incorrect if | ???| > ? under H1 or ???< ? under H0 ? and the answer is ???> Lemma: (?) messages is necessary to answer the query correctly with a constant positive probability 30