
Space-Efficient Range Queries and Data Structures
Explore efficient algorithms and data structures for handling range queries in arrays, focusing on minimizing space usage while providing fast query responses. Topics include encoding range queries, typical data structure preprocessing, and succinct data structure design. Discover how to preprocess input data efficiently and optimize encoding approaches for smaller auxiliary data structures.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
PaweGawrychowski* and Pat Nicholson** *University of Warsaw **Max-Planck-Institut f r Informatik
Range Queries in Arrays Input: an array ?[1..?] Preprocess the array to answer queries of the form Given a range [?,?] find _____ in the subarray ?[?..?] Where ______ is something like: the index of the maximum/minimum element the index of the top-? values the index of the ?-th largest/smallest number find the maximum sum range ? ,? [?,?]
Encoding Range Queries in Arrays How much space do we need to answer these queries? As an example, think of range min. queries (RMinQ): If we return the value of min, then we must store the array. Why? Because we can ask the query [?,?] for each ? [1,?] This allows us to recover the entire array If we return just the array index, then we can do much better. There is a succinct data structure that occupies 2? + ?(?) bits, and answers queries in constant time. Fischer and Heun (2011)
Typical Data Structure Input Data (Relatively Big)
Typical Data Structure Input Data (Relatively Big) Data Structure Preprocess
Encoding Approach Input Data (Relatively Big)
Encoding Approach Preprocess w.r.t. Some Query Encoding (Hope: much smaller) Input Data (Relatively Big)
Encoding Approach Encoding (Hope: much smaller)
Encoding Approach Auxiliary Data Structures: (Should be smaller still) Encoding (Hope: much smaller)
Encoding Approach Succinct Data Structure: Minimum Space Possible Auxiliary Data Structures: (Should be smaller still) Encoding (Hope: much smaller)
Encoding Approach Succinct Data Structure: Minimum Space Possible Auxiliary Data Structures: (Should be smaller still) Encoding (Hope: much smaller) Query (Hope: as fast as non- succinct counterpart)
This Talk: Maximum-Sum Segments From Jon Bentley s Programming Pearls : Input: an array ?[1..?] containing arbitrary numbers Output: the range [?,?] s.t. ?=? Only non-trivial if array contains negative numbers Can be solved in linear time (credited to Kadane) Applications: Bentley [1986]: [problem] is a toy it was never incorporated into a system. Chen and Chao [2004]: plays an important role in sequence analysis. ? ?[?] is maximized We focus on the range query case: Find range ? ,? [?,?] s.t. ?=? Also motivated by biological sequence analytics applications ? ?[?] is maximized
Range Maximum-Sum Segment Queries What was known: Chen and Chao [ISAAC 2004, Disc. App. Math. 2007] This can be done in (?)words of space and (1)time Very closely related to the range maximum problem: RMSSQ RMaxQ: Pad elements with large negative numbers RMinQ/RMaxQ RMSSQ: More complicated argument
Range Maximum-Sum Segment Queries What was not known: Is there an efficient encoding structure for this problem? That is: can we beat (?) words Solution stores several (?) word arrays; compares various array elements Really similar to RMaxQ
Range Maximum-Sum Segment Queries Our main results: We can encode these queries using (?) bits (Rest of this talk) I. II. A space lower bound of 1.89113? bits (Enumeration argument using methods from: ) III. Application to computing ?-covers (? disjoint subranges that achieve the maximum sum) Cs r s: The problem arises in DNA and protein segmentation, and in postprocessing of sequence alignments.
Main Idea: (?) word solution Define an array ? consisting of the partial sums of ?
Main Idea: (?) word solution Imagine shooting a ray from each ? ? to the left
Main Idea: (?) word solution Imagine shooting a ray from each ? ? to the left
Main Idea: (?) word solution Now find the minimum in this range
Main Idea: (?) word solution Now find the minimum in this range
Main Idea: (?) word solution Define another array ? storing these minima ? ?[?]
Candidate Pairs We call each pair (? ? ,?) a candidate We define (yet another) array ? as follows: ?[?] is the score of the candidate (? ? ,?) That is: the sum within the range [? ? + 1,?]
What Do They Store? The array ?: (?) words (Cumulative Sums) 1) 2) The array ?: (?) words (Candidate partners) 3) Range min (RMinQ) structure on ?: 2? + ?(?) bits 4) Range max (RMaxQ) structure on ?: 2? + ? ? bits (Candidate Scores)
Main Idea: (?) word solution How to answer a query: the easy case
Main Idea: (?) word solution Let ? = RMaxQ(?,?,?), and examine candidate pair ? ?[?]
Main Idea: (?) word solution If ? ? + 1 is in query range, return [? ? + 1,?] ? ?[?]
Main Idea: (?) word solution How to answer a query: the not so easy case
Main Idea: (?) word solution Let ? = RMaxQ(?,?,?) this time ? ? + 1 [?,?] ? ?[?]
Main Idea: (?) word solution Let ? = RMinQ ?,?,? ? ?
Main Idea: (?) word solution Let ? = RMinQ ?,?,? and ? = RMaxQ ?,? + 1,? ? ? ? ?[?]
Main Idea: (?) word solution Return the greater sum: [? + 1,?] or [? ? + 1,?] ? ? ? ?[?]
Reducing the Space What are the bottlenecks in the data structure? Storing the array ? I. We need to store the candidate pairs II. Storing the array ? We must compare scores of candidates in the not so easy case
Nested Is Good Imagine indices as ? vertices, candidate pairs as edges We can represent an ?-edge nested graph in 4? bits Also known as a one-page or outerplanar graph Navigation is efficient: select vertices, follow edges, etc. Jacobson (1989), Munro and Raman (2001); 4? + ?(?) bits 1 2 3 4 5 6 7 8 ()(( ())( ()( ())) ())(( ()( ())) ())
Dealing with ?: Bottleneck II We call the point the left sibling of (? ? ,?) Knowing , we can handle the not so easy case. ? ?[?]
Recall The Query Algorithm Return the greater sum: [? + 1,?] or [? ? + 1,?] ? ? ? ?[?]
Recall The Query Algorithm Return the greater sum: [? + 1,?] or [? ? + 1,?] If the left sibling of (? ? ,?) is < ? ? ? ? ?[?]
Recall The Query Algorithm Return the greater sum: [? + 1,?] or [? ? + 1,?] If the left sibling of (? ? ,?) is [?,?] ? ? ? ?[?]
Recall The Query Algorithm Return the greater sum: [? + 1,?] or [? ? + 1,?] Left sibling of (? ? ,?)can t be here ? ? ? ?[?]
Dealing with ?: Bottleneck II Problem: cannot store the left siblings explicitly ? ?[?]
Dealing with ?: Bottleneck II Idea: try to find something that is nested ? ?[?]
Dealing with ?: Bottleneck II Solution: the pairs ( ,? ? ) are nested ? ?[?]
What Do We Store? The graph representing candidates: 4? + ?(?) bits 1) 2) The graph representing left siblings: 4? + ?(?) bits 3) Range min (RMinQ) structure on ?: 2? + ?(?) bits 4) Range max (RMaxQ) structure on ?: 2? + ? ? bits Grand total: 12? + ?(?)bits (can be reduced slightly with more tricks)