Special Topics in Database Systems by Antonios Deligiannakis
Dive into special topics in database systems with Antonios Deligiannakis, exploring course details, class hours, course material, grading criteria, project options, and presentation requirements. Learn about large-scale systems and more in this comprehensive course.
Presentation Transcript
Special Topics in Database Systems Antonios Deligiannakis adeli76@yahoo.com
Contact. Office: 145.A24 (2 . ) | Email: adeli76@yahoo.com | Webpage: http://www.softnet.tuc.gr/~adeli | Phone: 28210-37415
Hours of Classes, Choice 1: The course is listed as 3 hours/week => 39 hours/semester in the graduate coursebook. Monday 11:00-13:00, room 137.P39; Tuesday 11:00-13:00, room 137.P39. 9 weeks (36 hours) of classes (until 26/11) + 1 week in December (December 16 & 17) for presentation of projects & review. Total: 40 hours.
Hours of Classes, Choice 2: The course is listed as 3 hours/week => 39 hours/semester in the graduate coursebook. Monday 11:00-12:00, room 137.P39; Tuesday 11:00-13:00, room 137.P39. 13 weeks of classes => 39 hours. Please vote between the 2 choices.
Course Material: Lectures based on scientific articles, posted on eclass. Supplementary textbook: Database Systems: The Complete Book (H. Garcia-Molina, J.D. Ullman, J. Widom).
Grading: 40% Final. 40% Project (teams of 1 or 2): Apache Flink, or implementation of a publication to be announced, or a topic of interest to you that you suggest. Potential project topics will be posted around 31/10. Due by 15/12/2024. 20% Presentation (on November 11-12) of either a publication (teams of 1 or 2; at least 12 pages, from a Research Session) from SIGMOD 2024: https://2024.sigmod.org/program_sigmod.shtml , VLDB 2024: https://vldb.org/pvldb/volumes/17 , or ICDE 2024: https://icde2024.github.io/papers.html , or the Apache Flink Big Data Platform.
Note on Presentations of Publications: Whoever declares a publication chooses the date and time to present. The last ones to choose will not have a choice of date. Choose publications for which you can find slides online. You must declare the title of your publication by 04/11.
What Will We Cover Today? Large Scale Systems: SQL alone is not the solution. What does large scale mean? Course topics.
Back to the future: You work for a big telecom company, COSMOPHONE, as head of the information analysis department ($$$). The CEO asks you to report the yearly volume of calls by your subscribers to the evil competitor VONDOTE.
Raw Data: table CALLDETAILS
CustId  Call_start  Call_end  PhoneNo     CompId
100103  8:00        8:02      6995219694  COSMOPHONE
100105  7:55        8:20      6995811821  VONDOTE
100105  8:40        8:44      6991155123  VONDOTE
105812  8:55        8:59      6991058186  COSMOPHONE
SQL Query: SELECT SUM(Call_end - Call_start), COUNT(*) FROM CALLDETAILS WHERE CompId = 'VONDOTE'
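The query can be tried on the four sample rows with Python's built-in sqlite3 module. This is a minimal sketch: the slides do not specify column types, so storing call times as minutes since midnight and the helper to_minutes are assumptions made for illustration.

```python
import sqlite3


def to_minutes(hhmm: str) -> int:
    """Convert an 'H:MM' string to minutes since midnight (assumed encoding)."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# The four sample rows from the CALLDETAILS slide.
rows = [
    (100103, "8:00", "8:02", "6995219694", "COSMOPHONE"),
    (100105, "7:55", "8:20", "6995811821", "VONDOTE"),
    (100105, "8:40", "8:44", "6991155123", "VONDOTE"),
    (105812, "8:55", "8:59", "6991058186", "COSMOPHONE"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CALLDETAILS ("
    "CustId INTEGER, Call_start INTEGER, Call_end INTEGER, "
    "PhoneNo TEXT, CompId TEXT)"
)
conn.executemany(
    "INSERT INTO CALLDETAILS VALUES (?, ?, ?, ?, ?)",
    [(c, to_minutes(s), to_minutes(e), p, comp) for c, s, e, p, comp in rows],
)

# Same aggregate as on the slide, with the company passed as a parameter.
total_minutes, num_calls = conn.execute(
    "SELECT SUM(Call_end - Call_start), COUNT(*) "
    "FROM CALLDETAILS WHERE CompId = ?",
    ("VONDOTE",),
).fetchone()
conn.close()

print(total_minutes, num_calls)  # 29 2
```

On the sample rows the two VONDOTE calls last 25 and 4 minutes, hence 29 minutes over 2 calls.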
How hard is this? 10,000,000 phone calls per day; 10% to the evil company = 1,000,000/day; x 365 days = 365M records need to be aggregated (= sum of total minutes).
WAIT, we have an index on CompId: use the index to jump to the interesting records (those with CompId = VONDOTE). Mostly random I/O.
Index Lookups: Assume 2 msec disk access time. Ignore all other delays. Assume the index fits in memory. 365M records * 0.002 sec/record = 730,000 sec, about 8.5 days (roughly 9 days). Now the CEO asks: how does this differ from last year?
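The back-of-the-envelope estimate can be reproduced in a few lines. This sketch uses only the figures stated on the slides (10M calls/day, 10% to the competitor, 2 msec per random disk access); the constant and variable names are illustrative.

```python
CALLS_PER_DAY = 10_000_000
PERCENT_TO_COMPETITOR = 10
DAYS_PER_YEAR = 365
DISK_ACCESS_MS = 2  # one random I/O per matching record
SECONDS_PER_DAY = 24 * 60 * 60

# Records that the index lookup has to fetch for one year of calls.
records = CALLS_PER_DAY * PERCENT_TO_COMPETITOR // 100 * DAYS_PER_YEAR

# Total time if every record costs one random disk access.
total_seconds = records * DISK_ACCESS_MS / 1000
days = total_seconds / SECONDS_PER_DAY

print(f"{records:,} records, {total_seconds:,.0f} sec, {days:.2f} days")
```

Under these assumptions a single yearly report takes more than a week of pure disk time, which is why the follow-up question about last year is painful.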
What does Large Scale mean? Data Volume? For example, videos: 1GB/hour, ~12GB/day, ~4TB/year ... but only 365 tuples in a table.
What does Large Scale mean? Data Quantity (number of tuples)? Example: data warehouses. Walmart: 900-CPU, 2,700-disk, 23TB Teradata system. Hawkeye (AT&T): 312TB and 1.88 trillion records. The number of tuples impacts query and retrieval time (hence the requirement for indices).
Distributed Systems: Need to transfer data to the user. Challenge: bandwidth constraints. Transferring all data is impossible or infeasible. How to do query answering in real time?
Example: IP network (backbone routers, gateway routers, access routers). Lots of data, fast data rates. Cisco NetFlow: 10 billion records/day. Centralizing all data is not possible.
Approximate Query Answering: A query over GB/TB of data returns an exact answer; a query over a KB/MB summary returns an approximate answer. Exact answers are not always needed! Often we are interested in identifying tendencies and patterns.
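One simple way to trade accuracy for data size is uniform random sampling. The slide does not name a specific technique, so sampling is swapped in here purely as an illustration, and the call durations are synthetic, made-up data.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Synthetic call durations in minutes (made-up data, not from the slides).
durations = [random.randint(1, 30) for _ in range(200_000)]
exact_total = sum(durations)

# Uniform 5% sample; scale the sample sum by the inverse sampling rate.
SAMPLE_SIZE = 10_000
sample = random.sample(durations, SAMPLE_SIZE)
estimated_total = sum(sample) * len(durations) / SAMPLE_SIZE

relative_error = abs(estimated_total - exact_total) / exact_total
print(f"exact={exact_total} estimate={estimated_total:.0f} "
      f"error={relative_error:.2%}")
```

The estimate reads only 5% of the values, yet for a SUM over well-behaved data its relative error is typically small, which is often enough to spot a trend.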
Example: Network Monitoring. [Figure: DSMS architecture. Labels: Register Monitoring Queries; Network measurements, Packet traces; Archive; Scratch Store; Lookup Tables; Intrusion Warnings; Online Performance Metrics.]
What does Large Scale Mean? Data Volume; Number of Tuples; Distributed Data Volume, difficulties in acquiring; Data Integration.
Data Integration: Enterprise Databases; Legacy Databases; Services and Applications.
[Figure: data integration architecture. Design time: mediated schema, semantic mappings. Run time: query reformulation, optimization & execution. Five wrappers sit between the system and the data sources.]
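The roles of the mediated schema, the semantic mappings and the wrappers can be sketched with two toy sources. Everything here is hypothetical (source names, fields, and the mediated schema with attributes id and name); the sketch also skips query reformulation and optimization and simply pushes one selection to every wrapper.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]

# Two hypothetical sources with different local schemas.
crm_source = [
    {"cust_no": 1, "full_name": "Ann"},
    {"cust_no": 2, "full_name": "Bob"},
]
legacy_source = [(3, "CAROL"), (4, "DAVE")]


def crm_wrapper() -> Iterable[Row]:
    """Semantic mapping: cust_no -> id, full_name -> name."""
    for record in crm_source:
        yield {"id": record["cust_no"], "name": record["full_name"]}


def legacy_wrapper() -> Iterable[Row]:
    """Semantic mapping: positional tuple -> (id, name), name in title case."""
    for cust_id, name in legacy_source:
        yield {"id": cust_id, "name": name.title()}


WRAPPERS: List[Callable[[], Iterable[Row]]] = [crm_wrapper, legacy_wrapper]


def query_mediated(predicate: Callable[[Row], bool]) -> List[Row]:
    """Evaluate a selection written against the mediated schema on all sources."""
    results: List[Row] = []
    for wrapper in WRAPPERS:
        results.extend(row for row in wrapper() if predicate(row))
    return results


answer = query_mediated(lambda row: row["id"] >= 2)
print(answer)
```

The user writes the predicate once, against the mediated schema, and never sees the field names or layouts of the individual sources.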
What does Large Scale Mean? Data Volume; Number of Tuples; Distributed Data Volume, difficulties in acquiring; Data Integration; Number of Nodes: Peer-to-Peer Systems, Sensor Networks.
Super-Peer Architecture: Some peers (super-peers) have special roles. Peers that join the network connect directly to one of the super-peers. Each super-peer maintains links to simple peers and to a limited set of super-peers. Each peer holds a set of d-dimensional points (horizontally distributed dataset). Aims: contact only peers and super-peers that contribute to the query; minimize the number of data objects transferred; minimize query response time; return the exact result set. [Figure: super-peer level (SPA, SPB, SPC) above peer level (PA, PB, PC); each peer holds a local data table of points.]
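The aim of contacting only peers that contribute to a query can be illustrated with a small sketch. It assumes, purely for illustration, that a super-peer keeps one bounding box per attached peer and that the query is a range query; the slide specifies neither the summary structure nor the query type, and the points below are made up.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner)


def bounding_box(points: List[Point]) -> Box:
    """Smallest axis-aligned box that contains all points of one peer."""
    dims = range(len(points[0]))
    lower = tuple(min(p[d] for p in points) for d in dims)
    upper = tuple(max(p[d] for p in points) for d in dims)
    return lower, upper


def intersects(a: Box, b: Box) -> bool:
    """True if the two boxes overlap in every dimension."""
    return all(a[0][d] <= b[1][d] and b[0][d] <= a[1][d]
               for d in range(len(a[0])))


def inside(p: Point, box: Box) -> bool:
    return all(box[0][d] <= p[d] <= box[1][d] for d in range(len(p)))


# Local data of three peers attached to one super-peer (made-up points).
peers: Dict[str, List[Point]] = {
    "PA": [(1, 2), (2, 3), (3, 1)],
    "PB": [(8, 9), (9, 7)],
    "PC": [(4, 8), (2, 9)],
}
# Summaries kept at the super-peer: one bounding box per peer.
summaries = {name: bounding_box(pts) for name, pts in peers.items()}


def range_query(query: Box) -> Tuple[List[str], List[Point]]:
    """Contact only peers whose summary overlaps the query; result is exact."""
    contacted = [n for n, box in summaries.items() if intersects(box, query)]
    result = [p for n in contacted for p in peers[n] if inside(p, query)]
    return contacted, result


contacted, result = range_query(((0, 0), (3, 4)))
print(contacted, result)
```

Only PA is contacted for this query, so no data objects from PB or PC are transferred, while the result set is still exact.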
Syllabus: Multidimensional Indices, Query Optimization; Data Warehousing and Online Analytical Processing (OLAP); several techniques for computing the Data Cube; processing multiple concurrent queries; processing over distributed data; synopses + approximate query answering; modern Big Data platforms (Flink, Storm); Association Rules, Skyline Queries, P2P.