Fundamental Concepts in Logic Design
Delve into the world of logic design exploring fundamental hardware requirements, communication methods, computation principles, and storage techniques. Understand the significance of bits, digital signals, logic gates, combinational circuits, and equality operations at the bit and word levels. Discover how to compute with logic gates, utilize multiplexors, and represent data in hardware control language.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Succinct: Enabling Queries on Compressed Data Shreya
Current Query Methods Data Scans: Loads file into memory and scans. Column oriented stores Low memory, high latency Index based scan: the file is preprocessed and stored in memory with indices. High memory, low latency
Succinct Stores compressed representation of the file into memory only. Can store more data before resorting to slow storage No additional indexes, data scans, or decompression (unless data accesses)
Extensions for semi-structured data Given a collection of records (key, avplist) Avplist = ((attribute1, value1), (attributeN, valueN)) Avplist is encoded into Succinct s data representation (flat files)
Existing Techniques Classical search techniques are usually based on tries or suffix trees. May result in 10-20x memory Burrows-Wheeler Transform (BWT) and Suffix arrays FM-indexes and Compressed Suffix Arrays Succinct adapts compressed suffix arrays
Compressed Suffix Arrays Array of Suffixes (AoS): array containing all suffixes in the input file in lexicographically sorted order O(n2) bits AoS2Input: O(nlogn) bits Input2Aos: O(nlogn) bits
Succinct Data Representation Uses a more space-efficient AoS2Input and Input2AoS using sampling by value. Samples every multiple of and scales down by to save space More space efficient representation for NextCharIdx Skewed wavelet trees
Queries on Compressed Data Binary search is inefficient Query algorithm takes advantage of 2d NextCharIdx representation 2.3 speed-up on an average and 19 speed- up in the best case
Succinct Multistore Design Write friendly multi-store design that chains multiple individual stores Log Store: New data is appended into this. Executes queries via in-memory data scans sped up using an inverted index that supports fast fine-grained update Suffix Store: supports bulk appends(aggregates larger amounts of data before compression) Succinct Store: supports queries on compressed data
SuccinctStore Can choose sampling rate. Smaller rate means less memory and higher latency. Can specify string lengths for the 2d NextCharIdx rows Tuning of these parameters allows SuccinctStore to avoid disk and handle overloaded partitions for skewed workloads
Data Transformation between Stores Logstore partition < 250 MB AoS2Input can be constructed on the server itself using an efficient linear-time Aggregates data across multiple partitions before transforming it to SuccinctStore Transforming SuffixStore data into a SuccinctStore partition requires a merge sort of AoS2Input for each of the SuffixStore partitions Currently done in the background, but could dedicate a server to just this process
Evaluation Compared to MongoDB and Cassandra for queries using secondary indexes; HyperDex using hyperspace hashing; and an industrial columnar- store DB-X, using in-memory data scans Used two multiattribute record datasets: smallVal and largeVal from Conviva Amazon EC2 with15GB RAM and 4 cores
Results: Memory MongoDB and Cassandra fit roughly 10 11 less data than Succinct due to storing secondary indexes and input data HyperDex not only stores large metadata and stores 126x more