Matrix Sketching over Sliding Windows for Streaming Data Optimization

optimal matrix sketching over sliding windows n.w
1 / 17
Embed
Share

Explore the concept of matrix sketching in sliding window and streaming models to optimize handling vast data streams efficiently with sublinear space or time complexity. Learn about the challenges, motivations, and techniques involved in this cutting-edge approach.

  • Matrix Sketching
  • Sliding Windows
  • Streaming Data
  • Optimization
  • Data Streams

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Optimal Matrix Sketching over Sliding Windows Hanyan Yin1, Dongxie Wen1, Jiajun Li1, Zhewei Wei1*, Xiao Zhang1, Zengfeng Huang2, Feifei Li3 1Renmin University of China 2Fudan University 3Alibaba Group Aug 2024 *Zhewei Wei is the corresponding author. 0

  2. Streaming Scenario & Sliding Window Model Streaming data scenario encounters vast volume of data streams with rapid throughput rate. (sensor data/network logs) Computational/storage resources are still limited today. Algorithms for streaming data provide approximate solutions with sublinear space or time complexity relative to the input size 1

  3. Matrix sketching over sliding windows Matrix sketching problem: approximate large matrix ? ? ? with ? ?, ?, in an online row-update fashion. Covariance error [Liberty 2013, Ghashami 2014, Woodruff 2016]: cova_err ?,? = ? ? ? ?2/ ?? Sliding window model: consider the most recent ? rows or arrived in a past time . 2 ? E.g. recent 1k posted tweets, network logs in last 24 hours ? ? ? ? ? ? ? ? 1 ? 1 ? ? ? Streaming model Sliding window model 2

  4. Motivations Matrix sketching in sliding window model is not optimal. Matrix sketching in streaming model is optimal. [Liberty 2013] Matrix sketching in sliding window model was NOT optimal yet. Lower bound. The space complexity of our method aligns with the lower bound, thus is optimal. ?2 ?3 ? ? ?2log1 ? ? d ?2 ? ? ?log1 O EXPONENTAIL HISTOGRAM EXPONENTAIL HISTOGRAM DYADIC INTERVAL DYADIC INTERVAL HASHING HASHING RANDOM PROJECT RANDOM PROJECT SAMPLING SAMPLING ? ? ? ? ? streaming ? LM LM- -FD DI DI- -FD FD FD DS DS- -FD FD FD FD SLIDING WINDOW LB SLIDING WINDOW LB This work This work STREAMING LB STREAMING LB 2013 2013 2016 2016- - 2016 2016 3

  5. Dump snapshot Consider sequence-based & normalized sliding window model. Sequence-based: each update occupies a timestamp. ? 2= 1. Normalized: the norm of each row ?? 2 Maintain a sketch matrix ? and a queue ? of the dumped snapshots. ? Time 4

  6. Dump snapshot Consider sequence-based & normalized sliding window model. Sequence-based: each update occupies a timestamp. ? 2= 1. Normalized: the norm of each row ?? 2 Receive a row update. Pop the oldest snapshot out of ? if it expires. ? ?? Time 5

  7. Dump snapshot Consider sequence-based & normalized sliding window model. Sequence-based: each update occupies a timestamp. ? 2= 1. Normalized: the norm of each row ?? 2 ? ??. ? Perform SVD on ?? = ? Time ? ? 6

  8. Dump snapshot Consider sequence-based & normalized sliding window model. Sequence-based: each update occupies a timestamp. ? 2= 1. Normalized: the norm of each row ?? 2 ? ??. ? Perform SVD on ?? If the top singular value ?1> ??, push a snapshot (?1 ?1,?) into the queue ?. = ? Time ? ? 7

  9. Dump snapshot Consider sequence-based & normalized sliding window model. Sequence-based: each update occupies a timestamp. ? 2= 1. Normalized: the norm of each row ?? 2 Set ?? as the new sketch ?. ? Time 8

  10. Analysis Theorem 3.1 The algorithm returns a sketch matrix ??if we set the maximum number of row of ? to be = min 1 ?,? , then ??2 ? ? ? ?? ?,? ?? ?,? ?? ? ? The algorithm uses ? ? ? + 3time. space and processes each update in 9

  11. Unnormalized rows & time-based window Unnormalized rows: ?? 1,? Time-based window: ?? 0 1,? ?? ??, window length [?,??] Regard the arrival of ??as ?? Current Window Level log? ? = ??? Sequence-based: 1 ?log? . . . . . . . . . ? Level 2 ? = 4?? Time-based: 1 ?log??? Level 1 ? = 2?? Level 0 ? = ?? ? Time ? ? ? Expired Snapshot Dropped Snapshot Saved Snapshot 10

  12. Space Lower bound What is the minimum space complexity to maintain the matrix sketching over sliding window with ? ? ? ? ? ?? Construct challenging adversarial input against the algorithms 2? Choose from 2? matrics Divide the window into log? blocks Adjust the rows and norms Only 0 vector will arrive Block log? ?/4 Block 0 /4 later ?/4 ?/8 /2 /4 /4 /4 ?-dim ?? ?? 2 ? ? ? ? 2? 2? ? 2?? ?? ? ? ? 1 1 1 1 1 1 1 1 1 ? ? + 1 ? ? + ?/4 Sliding Window 1 Sliding Window 2 2 1 ? ?? ? 1 ?? ? ?? ? 1 ?? ?? ?? Require ? bits [Theorem 4.1, Ghashami et al. 2015] ?? 2 1 3l 1 3l ? 4 2?+1+3? ? 4 2?+3? ?? ?? ? ? ?? ? Sliding Window 1 : ?? l 4 2 ? 1 ? 1 ?? Sliding Window 2 : ?? 2 11 4

  13. Experiments Sampling: 2[Frieze2004]. Sample ?? w.p. proportional to ?? Priority sampling [Efraimidis2006] + Sliding window top-k. LM-FD: Exponential Histogram (Logarithmic method) [Datar2002] + Frequent Directions. DI-FD: Dyadic interval techniques [Arasu2004] + Frequent Directions. Sketches Update Space ? ?2log?? ? ?2log??? ?? ?log? ? ?log??? Window ? ?2loglog?? Sampling Sequence & time LM-FD Sequence & time ?log??? ? ?log? DI-FD Sequence ? ? ? ?+1 DS-FD(Our Work) Sequence & time ?3log??? 12

  14. Experiments: space vs. error (a) SYNTHETIC (b) BIBD (c) PAMAP (d) RAIL (d) YEAR DS-FD achieves the best space-error trade-off. DS-FD achieves the best update-query time cost trade-off. 13

  15. Applications of Matrix Sketching Sketched linear contextual bandit [Kuzborskij 2019, Chen 2020] Second-order online learning gradient descent: Online Newton Step ??+1= ?? 1 ? ??+1 ?? 1?? ??= ?? 1+ ???? ??+1= arg min ??? ? ??+1 time: ?2 Sketched Online Newton Step [Luo 2016] 1 ??= FD ?? 1,?? ??+1= ?? 1 ??+1= ??+1 ??(??+1 ?? time: ? ?? ??= ???+ ???? ??? ?? ?????? ??????+1) 14

  16. Conclusion We introduce DS-FD, a deterministic algorithm that achieves better space complexity for matrix sketching over sliding windows. We solve the open question of the lower bounds for any deterministic matrix sketching algorithm over sliding windows. The answer to the open problem confirms that our DS-FD algorithm is optimal in terms of space complexity. 15

  17. Thanks! 16

More Related Content