
Advanced Techniques in Data Management Without Traditional DBMS
Explore why organizations such as CERN opt for flat files over DBMS, the challenges of managing data effectively, and innovative solutions for adaptive loading and file adaptation to enhance performance and efficiency.
Presentation Transcript
Here are my Data Files. Here are my Queries. Where are my Results? Stratos Idreos*, Ioannis Alagiannis, Ryan Johnson, Anastasia Ailamaki. University of Toronto; École Polytechnique Fédérale de Lausanne; *CWI, Amsterdam.
CERN ($20B physics experiment). Last year: 35 PB! Experiments, simulation, and user data are all stored in flat files. The database only stores metadata. Custom solutions and scripts, almost never a DBMS. Why?
Why don't people use a DBMS? The usual path: requirements analysis, define a schema, load the data, tune the system, and iterate to convergence. Evolving requirements => no convergence.
Data import & tuning. Flat files -> massage data -> load tuples -> database. The DBMS owns the data now. Why wait? Why a complete load? Which format? Hire a DB expert? Not worth the startup cost.
Avoiding up-front overheads. Flat files are an integral part of the system. Hot data. Query over flat files. Adaptive loads. Tuning in the background. DBMS actions are driven by the workload. [Figure: a flat file with attributes a1, a2, a3, a10.]
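Adaptive loading. Options: full load, column load, partial load. [Figure: a flat file with attributes a1, a2, a3, a4, plus metadata and storage. Column load: metadata records the loaded columns (a2, a3). Partial load: metadata records the loaded parts (a2, a3).]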
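As a rough illustration of the column-load idea on the slide above, the sketch below loads a column from the flat file only when a query first asks for it and records which columns are already in memory. The class name `AdaptiveLoader`, its methods, and the assumption of a headerless, comma-separated, all-numeric file are hypothetical choices for this example; they are not taken from the presentation.

```python
import csv


class AdaptiveLoader:
    """Hypothetical sketch: load columns of a headerless CSV file on first use."""

    def __init__(self, path, column_names):
        self.path = path
        self.column_names = list(column_names)  # e.g. ["a1", "a2", "a3", "a4"]
        self.loaded = {}  # doubles as metadata and storage: name -> list of values

    def loaded_columns(self):
        """Metadata lookup: names of the columns already held in memory."""
        return sorted(self.loaded)

    def get(self, needed):
        """Return the requested columns, reading the file only for missing ones."""
        missing = [name for name in needed if name not in self.loaded]
        if missing:
            positions = [self.column_names.index(name) for name in missing]
            fresh = {name: [] for name in missing}
            with open(self.path, newline="") as f:
                for row in csv.reader(f):
                    for name, pos in zip(missing, positions):
                        # Assumption: every attribute is numeric.
                        fresh[name].append(float(row[pos]))
            self.loaded.update(fresh)
        return {name: self.loaded[name] for name in needed}
```

A query that touches only a2 and a3 would call `loader.get(["a2", "a3"])`; a later query on the same columns is answered from memory without reopening the file, while a query on a1 triggers one more pass that loads a1 alone.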
Dynamic file adaptation. a) Parse only the needed columns. b) Create a new flat file per attribute. Analyze non-tokenized attributes. [Figure: an original flat file and the new flat files derived from it; attribute labels a1, a2, a3, a4.]
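The two steps on the slide above (parse only the needed columns, then write a new flat file per attribute) can be sketched as follows. The function name `adapt_file`, the `.col` file suffix, and the assumption of a comma-separated file without quoted fields are all hypothetical and chosen only for this example.

```python
import os


def adapt_file(path, column_names, needed, out_dir):
    """Hypothetical sketch: write one new flat file per needed attribute.

    Returns a dict mapping each attribute name to the path of its new file.
    Assumes plain comma-separated lines with no quoted commas.
    """
    if not needed:
        return {}
    positions = {name: column_names.index(name) for name in needed}
    last = max(positions.values())
    out_paths = {name: os.path.join(out_dir, name + ".col") for name in needed}
    outputs = {name: open(p, "w") for name, p in out_paths.items()}
    try:
        with open(path) as src:
            for line in src:
                # Tokenize only up to the last needed attribute; the rest of
                # the line stays as one unparsed trailing string.
                fields = line.rstrip("\n").split(",", last + 1)
                for name, pos in positions.items():
                    outputs[name].write(fields[pos] + "\n")
    finally:
        for handle in outputs.values():
            handle.close()
    return out_paths
```

Later queries on an attribute that already has its own file can read that narrow file instead of re-tokenizing every line of the original.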
Adaptive loading in practice. Query: select sum(a1), avg(a2) from R where a1<v1 and a2<v2. [Plot: response time in seconds (axis ticks 1, 10, 100) against the query sequence (1 to 20) for MonetDB, MySQL CSV, Column Loads, and Partial Loads. Annotations: "Q1: loading cost + first query"; "Constant performance for all queries"; "Q1: half the cost"; "Q11: load from FF"; "Filtering on-the-fly"; "a) On-the-fly load b) Cache data".] Amortize the loading cost over the query sequence.
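To make "filtering on-the-fly" concrete, here is a minimal sketch that evaluates the benchmark query from the slide above in a single pass over the flat file, applying the predicate while tokenizing only the first two attributes. The function name `sum_avg_query` and the file layout (comma-separated, a1 and a2 as the first two numeric fields) are assumptions made for this example; the sketch does not cache anything, so it shows only the on-the-fly part.

```python
def sum_avg_query(path, v1, v2):
    """select sum(a1), avg(a2) from R where a1 < v1 and a2 < v2

    Hypothetical sketch: assumes a1 and a2 are the first two comma-separated
    numeric fields of every line. Returns (None, None) if no row qualifies.
    """
    sum_a1 = 0.0
    sum_a2 = 0.0
    count = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            # Tokenize only the two attributes the query needs.
            a1_text, a2_text = line.split(",", 2)[:2]
            a1, a2 = float(a1_text), float(a2_text)
            # Filter while reading: non-qualifying rows are never stored.
            if a1 < v1 and a2 < v2:
                sum_a1 += a1
                sum_a2 += a2
                count += 1
    if count == 0:
        return None, None
    return sum_a1, sum_a2 / count
```

Run repeatedly, this pays the full file-scan cost on every query; keeping the qualifying values in memory after the first pass is what would let the loading cost be amortized over the query sequence.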
Towards a fully autonomous system. Give me your data as is. Give me your queries (supports SQL + your tools: grep, awk). Get your results! Components: adaptive load, adaptive data store, adaptive kernel, together an invisible DBMS. Challenge: make this invisible.