Streaming Algorithms (Part 1)

Streaming Algorithms  (Part 1)
Slide Note
Embed
Share

Streaming algorithms are essential for efficiently processing continuous data streams without the ability to store entire input data. They provide solutions for handling large volumes of data sequentially, optimizing time and space complexities. Explore various aspects of streaming algorithms such as data stream analysis, handling distinct elements, graph stream algorithms, and practical examples of their applications in diverse fields like databases, routers, and satellites.

  • Streaming Algorithms
  • Data Streams
  • Efficiency
  • Sequential Processing
  • Space-Time Tradeoffs

Uploaded on Feb 15, 2025 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Streaming Algorithms (Part 1) Nitzan Weissman 1

  2. Overview What is a streaming algorithm? Data stream algorithms: Finding Maximum Counting distinct elements Graph Stream algorithms: Insert-only streams- spanners Sliding window- connectivity 2

  3. Part 1 What is a streaming algorithm? 3

  4. Data streams A data stream is a (very big) amount of data that cannot be stored. Most algorithms usually require for the whole input to be available, we don t have that luxury. 4

  5. Approaches Ignore it. Construct algorithms for handling with such data. Streaming algorithms are algorithms for processing data streams. 5

  6. Examples Routers Databases Satellites Facebook actions 6

  7. Two characteristics of the model The input data can be accessed only sequentially in the form of a stream. The working memory is considerably smaller than the size of the entire input stream. 7

  8. Streaming algorithms analysis We look at three different parameters: Time complexity for each element Assume we can control it Overall time complexity (if exists) Space complexity 8

  9. Streaming algorithms analysis Different algorithms give different space-time tradeoffs Algorithms may use approximations 9

  10. Part 2 Data stream algorithms: Finding Maximum Counting distinct elements 10

  11. Our world = 1 2... a a a - the data stream If the stream is finite, is the last element n a - the domain {1,2,..., } m Only when handling with numbers - approximation factor 11

  12. Warming up- finding max Input A finite data stream of numbers (in a random order) Output The maximum element in the stream 12

  13. Its trivial! max 0 For each i in : If : max {1,..., } n max ia ia Return max 13

  14. Analysis Time complexity for each element- (log ) O m Overall time complexity- ( log ) O n m Space complexity- (log ) O m This algorithm is not probabilistic! 14

  15. A faster max algorithm max 0 For each i in : With probability skip (go to next i) If : max {1,..., } n 1 1 n max ia ia max n + m Return m 1 15

  16. Is it even better? Time complexity for each element- 1 1 log (1 ) 0 log ( m O n n n m + = ) Overall time complexity- ( log ) O n m Space complexity- + (log log ) O m n So is it strictly better? 17

  17. Counting distinct elements Input A finite data stream of numbers Output How many distinct elements are in the stream? The answer will be probabilistic 18

  18. A better algorithm Pick a random hash function . :{1,..., } h m [0,1] hash min_ 1 For each i in : Calculate . If : min_ hash 1 min_hash {1,..., } ( ) i h a n ( ) h a min_ hash i ( ) h a i Return . 20

  19. Why does it work? Assuming that h has a uniform continuous distribution If there are t distinct elements, they are uniformly distributed by h, so smallest of them should be ~ . 1 t 21

  20. Analysis Time complexity for each element- (log ) O m Overall time complexity- ( log ) O n m Space complexity- (log ) O m 22

  21. Part 3 Graph Stream algorithms: Insert-only streams- spanners Sliding window- connectivity 23

  22. Our world = = = - A graph with no edges, ( , V E ) | | V n G In insert-only streams, we have: = { , e e ,..., } S e - A stream that defines the edges in the graph. 1 2 m 24

  23. Finding a spanner Given a graph G, we say that a subgraph H is a c-spanner for G if for all , , u v V ( , ) u v ( , ) u v ( , ) u v d d c d G H G Goal: find a c-spanner after one pass (while minimizing number of edges) 25

  24. Algorithm for c-spanner H {} For each : If : H ( , ) u v S ( , ) u v d c {( , )} u v H H Return H 26

  25. How does it work? On the board. 27

  26. Proof of correctness Claim: for each , , u v V ( , ) u v ( , ) u v ( , ) u v d d c d G H G Proof: the first inequality follows from the fact the H is a subgraph of G. The second inequality will be proved with induction. Our parameter- . ( , ) u v d G 28

  27. Induction (base) If , it means . ( , ) 1 G d u v = ( , ) u v S When the loop arrived to the edge : If it was added to H, became 1. Otherwise, it wasn t added to H because at that time, was . ( , ) H d u v ( , ) u v ( , ) u v d H c During the course of the algorithm, edges are only added to H. So the distance between edges can only decrease. In conclusion, after one pass over S ( , ) u v d c H 29

  28. Induction- cont. Assume for each s.t . ( , ) G d u v t , u v V ( , ) u v ( , ) u v d c d H G For with , the path from u to v in G is . = + , u v V ( , ) u v u 1 d t , ,..., , u u G = = u u v + 0 1 1 t t For , so by the induction hypothesis . ( , ) H n d u u ct ( , u u ) , d t u u V G n n are neighbors so . , ( , ) d u v c n u v V H n Conclusion: . + ( , ) u v ( 1) d c t H 30

  29. Analysis Overall time complexity- slow 2 + Space complexity- + 1 ( ) O n 1 c 31

  30. One last world- sliding window = = = - A graph with no edges, ( , V E ) | | V n G = - An infinite stream of edges. { , e e ,...} S 1 2 At time t we only consider the graph whose edge set consists of the last w edges, . = { ,..., } W t w e + e 1 t 32

  31. K-edge connectivity A graph is k-connected if every cut in the graph includes at least k edges. Goal: testing whether a graph is k- edge connected (k is given). 33

  32. Algorithm for k-connectivity Maintain k sets of edges , along with their time of arrival. ( ,..., ) E E 1 k For every new edge : Add the edge to . If it completes a cycle in , remove the oldest edge in the cycle and add it to . If it completes a cycle in , remove the oldest edge in the cycle and add it to If it completes a cycle in , remove the oldest edge in the cycle. ( , ) u v S 1 E ( , V E ) 1 E 2 ( , V E ) 2 E 3 ( , V E ) k 34

  33. Algorithm- cont. When k-connectivity check is needed: Remove all old edges from . Check if is k-connected 1 i k ( ,..., ) E E 1 k ( , V ) E i 35

  34. How does it work? On the board 36

  35. Proof of correctness If there are k or more edges among the last w edges across a cut, F will include the k most recent of these edges. 37

  36. Space complexity - much better than . ( log ) O kn n ( log ) O m n 38

  37. Conclusion 39

Related


More Related Content