
Data Reduction Schemes in imMens Visualization
A discussion of data reduction schemes in imMens visualization, whose interaction patterns deviate from traditional database queries. Covers key lessons, cases where the techniques may not work, the range of data reduction schemes available, and how imMens differs from full OLAP.
Presentation Transcript
Lessons
(Figure: the visualization layer sitting atop the database layer.)
Key lessons:
- Visualizations impose a specific set of interaction patterns that do not directly correspond to database queries.
- Data reduction schemes need to be aware of these interaction patterns.
- Data representation is important, and so is the ability to do parallel execution.
Paper itself
What did you think?
- Simple idea, with end-to-end execution.
- Not extensive in experiments or theory, but many important engineering design decisions.
- Unclear whether the techniques would generalize.
Where would these techniques not work?
- Many attributes / high cardinalities.
- Arbitrary filters in addition to the fixed visualization interactions.
Data Reduction Schemes
We've seen a number of data reduction schemes, i.e., ways of doing more with the same hardware. What are they? Are there others? Pros/cons?
Compression-based:
- Exact compression
- Lossy compression
- Storing outliers
- Storing models
Aggregation-based:
- Storing image tiles
- Pre-aggregation/binning only at the finest granularity (sketched below)
- OLAP data cubes (complete cube)
Sampling-based:
- Random sampling
- Complete stratified sampling (Aqua)
- Sampling via QCS
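To make the aggregation-based option concrete, here is a minimal sketch of pre-aggregation/binning at the finest granularity over two numeric columns. The function name `bin_counts` and the normalization scheme are my illustration, not the paper's code.

```python
import numpy as np

def bin_counts(x, y, b=50):
    """Precompute a b x b count table at the finest granularity;
    interactive queries then read this table instead of the raw rows."""
    span_x = (x.max() - x.min()) or 1.0  # guard against constant columns
    span_y = (y.max() - y.min()) or 1.0
    xi = np.minimum(((x - x.min()) / span_x * b).astype(int), b - 1)
    yi = np.minimum(((y - y.min()) / span_y * b).astype(int), b - 1)
    counts = np.zeros((b, b), dtype=np.int64)
    np.add.at(counts, (xi, yi), 1)  # scatter-add one count per record
    return counts
```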
Comparison to Full OLAP
How is imMens's data reduction scheme different from full OLAP?
- Only the lowest-granularity aggregates are kept.
- Slower for roll-ups, since coarser aggregates must be recomputed by summing the fine bins (sketched below).
- Similar if range queries are executed.
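A minimal sketch of that roll-up cost, reusing the b x b `counts` table from the earlier snippet (the function name is my own):

```python
def roll_up(counts, factor):
    """Aggregate a b x b count table into coarser (b/factor)-per-side bins.
    A full OLAP cube would have this level precomputed; with only the
    finest granularity kept, it must be recomputed by summing fine bins."""
    b = counts.shape[0]
    assert b % factor == 0, "bin count must divide evenly by the factor"
    return (counts.reshape(b // factor, factor, b // factor, factor)
                  .sum(axis=(1, 3)))
```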
Comparison to BlinkDB
What are the similarities/differences to BlinkDB?
- Similarity: BlinkDB's query column sets (QCS) play a role like imMens's binned aggregation sets.
- Difference: imMens stores something more like materialized views/cubes than samples (contrast sketched below).
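A toy contrast of the two approaches on synthetic data; the column names, group values, and the per-stratum cap k are all made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "state": rng.choice(["CA", "NY", "TX"], size=100_000),
    "value": rng.exponential(size=100_000),
})

# BlinkDB-style stratified sample on the QCS {state}: cap each stratum
# at k rows so rare groups stay represented; answers are approximate.
k = 1_000
strata = df.groupby("state", group_keys=False).apply(
    lambda g: g.sample(min(len(g), k), random_state=0))

# imMens-style binned aggregation over the same columns: one exact,
# precomputed aggregate per bin; the raw rows are not retained.
agg = df.groupby("state")["value"].sum()
```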
True Number of Multivariate Tiles
If the bin count per dimension is b and the number of attributes is 5, the paper argues that the bin count goes down from b^5 to 4b^3 in their example. What is the worst case?
- Four-dimensional scatterplot tiles (X, Y, Z, W): C(5,4) * b^4 = 5b^4.
- If b = 50: b^5 = 312.5M, 4b^3 = 0.5M, 5b^4 = 31.25M (checked below).
- Can be pretty bad!
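The arithmetic from the bullets above, reproduced as a quick check:

```python
from math import comb

b = 50
print(b ** 5)               # full 5-D cube:             312,500,000 bins
print(4 * b ** 3)           # the paper's example:           500,000 bins
print(comb(5, 4) * b ** 4)  # worst case, all 4-D tiles:  31,250,000 bins
```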
Regular bins
The paper talks about bins "of the same dimensions". An alternative is bins of different dimensions, tailored to where the data is. Pros/cons?
(Figure: points clustered unevenly across the domain.)
- Con: harder to locate cells of interest (sketched below).
- Pro: more compact storage, in case of sparsity.
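A minimal sketch of the lookup difference (the function names are mine): regular bins are located by simple arithmetic, while data-tailored bins require a search over the bin edges.

```python
import bisect

def regular_bin(x, lo, width):
    # Same-size bins: the cell index is one subtraction and one division.
    return int((x - lo) // width)

def irregular_bin(x, edges):
    # Data-tailored bins: locating the cell takes a binary search
    # over the sorted list of bin edges.
    return bisect.bisect_right(edges, x) - 1
```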
Comment from the paper
"(Sampling methods) require that specific dimensions be chosen ahead of time, requiring prior knowledge, and often costly pre-processing" (and are therefore a bad fit for the paper). Do you agree? Do you think sampling is still a good idea? When would it be a good idea relative to what they have?
- Sampling is a good idea when individual data values are needed (illustrated below).
- If only one aggregate is to be computed, you might as well compute aggregates across bins.
- It depends on how fine the bin granularity is: if the bins hold many points each, sampling may be worse.
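A toy illustration of the trade-off on synthetic data (the distribution and sizes are made up): the binned table answers its pre-chosen aggregate exactly, while the sample keeps raw values around for queries that were not anticipated.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.exponential(size=1_000_000)

# Binned aggregation: exact counts per bin, but individual values are gone.
counts, edges = np.histogram(values, bins=50)

# Random sample: answers are approximate, but raw values survive, so
# arbitrary new aggregates and filters can still be computed later.
sample = rng.choice(values, size=10_000, replace=False)
print(values.mean(), sample.mean())  # exact vs. ~1% sample estimate
```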