Silver Bullets Against Corruption: Unveiling Effective Solutions
Delve into the 24th Annual Conference of the European Society of Criminology where Professor Ricardo Rivero Ortega discusses the pressing issue of corruption, its political, economic, and social implications, and proposed solutions involving criminal law agencies and artificial intelligence. Learn about the new European Directive and how artificial intelligence and Big Data can aid in combating corruption effectively.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
21stInternational Symposium on Temporal Representation and Reasoning Lean Index Structures for Snapshot Access in Transaction-time Databases Fabio Grandi Alma Mater Studiorum - Universit degli Studi di Bologna Bologna, Italy TIME 2014 Verona, Italy, 8-10 September 2014
Introduction (1) Temporal Databases are the answer to advanced application requirements involving the representation and management of time-varying data Adopting a non-deletion policy, the full history of past database states is kept and past snapshots of database tables can be accessed for archiving, accountability or audit purposes In many cases, this can be done in a way transparent to the non-temporal users, via support of transaction time (e.g., with system-versioned tables in SQL:2011) TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
Introduction (2) In a dynamic environment, the full maintenance of past data versions leads to a fast data growth which soon ends up in a performance issue In order to efficiently provide selective access to temporal snapshots, suitable index structures must be employed Several temporal index structures have been proposed in the 90 s for accessing transaction-time data: Snapshot Index [Tsotras & Kangelaris 1995] Time-Split B-Tree [Lomet & Salzberg 1990] Multiversion B-Tree [Becker et al. 1996] Append-only Tree [Gunadhi & Segev 1993] Time Index [Elmasri, Wuu & Kim, 1990] TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
Introduction (3) Since temporal tuples may belong to multiple snapshots, a perfect clustering involving separation of snapshots is impossible without data duplication Hence, the theoretical asymptotic optimum for snapshot access time O(logbn + |s(T)|/b) can only be achieved at the expense of a (sometimes very high) data duplication For instance, the Time-Split B-Tree grants optimal shapshot access but with a very high data duplication rate (the index may become several times the size of the indexed relation!) The Snapshot index trades between data duplication and query performance via the usefulness parameter a TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
Overview of the contribution A new index structure supporting fast selective access to past snapshots of a transaction-time database table is presented The new structure, called RABTree, is lean: the index has low memory requirements with high occupancy and requires no data duplication An even leaner though less efficient variant, called RAB-Tree, is also presented Performance evaluation and comparison with competitors is presented TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
A Sample Temporal Table TID TID TID TID Key Key Key Key Value Value Value Value Start Start Start Start End End End End P1 P1 P1 P1 A A A A 10 10 10 10 T0 T0 T0 T0 UC T1 T1 T1 P2 P2 P2 P2 B B B B 20 20 20 20 T0 T0 T0 T0 UC UC UC UC P3 P3 P3 P3 C C C C 30 30 30 30 T0 T0 T0 T0 UC T1 T1 T1 Time T0: insert A,B,C,D,E,F; P4 P4 P4 P4 D D D D 50 50 50 50 T0 T0 T0 T0 UC UC T2 T2 P5 P5 P5 P5 E E E E 45 45 45 45 T0 T0 T0 T0 UC UC UC UC P6 P6 P6 P6 F F F F 25 25 25 25 T0 T0 T0 T0 UC T1 T1 T1 Time T1: update A,C; delete F; insert G,H,I P7 P7 P7 A A A 15 15 15 T1 T1 T1 UC T2 T2 P8 P8 P8 C C C 20 20 20 T1 T1 T1 UC T2 T2 P9 P9 P9 G G G 30 30 30 T1 T1 T1 UC T2 T2 P10 P10 P10 H H H 10 10 10 T1 T1 T1 UC T2 T2 P11 P11 P11 I I I 15 15 15 T1 T1 T1 UC UC UC Time T2: update C,G,H; delete A,D; insert J,K; P12 P12 C C 40 40 T2 T2 UC T3 P13 P13 G G 25 25 T2 T2 UC T3 P14 P14 H H 90 90 T2 T2 UC UC P15 P15 J J 10 10 T2 T2 UC T3 Time T3: update C,J; delete G,K; insert L,M P16 P16 K K 30 30 T2 T2 UC T3 P17 C 15 T3 UC P18 J 25 T3 UC P19 L 10 T3 UC P20 M 60 T3 UC TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
A Sample Temporal Table T0 >= Start and T0 < End TID Key Value Start End P1 A 10 T0 T1 P2 B 20 T0 UC P3 C 30 T0 T1 Snapshot T0: P1,P2,P3,P4,P5,P6 P4 D 50 T0 T2 T1 >= Start and T1 < End P5 E 45 T0 UC P6 F 25 T0 T1 Snapshot T1: P2,P4,P5,P7, P8,P9,P10,P11 T2 >= Start and T2 < End P7 A 15 T1 T2 P8 C 20 T1 T2 P9 G 30 T1 T2 P10 H 10 T1 T2 P11 I 15 T1 UC Snapshot T2: P2,P5,P11,P12, P13,P14,P15,P16 P12 C 40 T2 T3 T3 >= Start and T3 < End P13 G 25 T2 T3 P14 H 90 T2 UC P15 J 10 T2 T3 Snapshot T3: P2,P5,P11,P14 P17,P18,P19,P20 P16 K 30 T2 T3 P17 C 15 T3 UC P18 J 25 T3 UC P19 L 10 T3 UC P20 M 60 T3 UC TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
A Compression Technique for TID Lists Snapshot T0: P1%5 Snapshot T0: P1,P2,P3,P4,P5,P6 Snapshot T1: P2,P4%1,P7%4 Snapshot T1: P2,P4,P5,P7, P8,P9,P10,P11 Snapshot T2: P2,P5,P11%5 Snapshot T2: P2,P5,P11,P12, P13,P14,P15,P16 Snapshot T3: P2,P5,P11,P14 P17%3 Snapshot T3: P2,P5,P11,P14 P17,P18,P19,P20 TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
A Further Optimization TID TID Key Key Value Value Start Start End End P1 P1 A A 10 10 T0 T0 T1 T1 P2 P2 B C 20 30 T0 T0 UC T1 P3 P3 C F 30 25 T0 T0 T1 T1 Tuples are naturally clustered wrt Start values according to their creation (append) order P4 P4 D D 50 50 T0 T0 T2 T2 P5 P5 E B 45 20 T0 T0 UC UC P6 P6 F E 25 45 T0 T0 T1 UC P7 P7 A A 15 15 T1 T1 T2 T2 A further optimization can be made by superimposing a secondary order on End values P8 P8 C C 20 20 T1 T1 T2 T2 P9 P9 G G 30 30 T1 T1 T2 T2 P10 P10 H H 10 10 T1 T1 T2 T2 P11 P11 I I 15 15 T1 T1 UC UC This minimizes the number of ranges and maximizes the range length in TID lists P12 P12 C C 40 40 T2 T2 T3 T3 P13 P13 G G 25 25 T2 T2 T3 T3 P14 P14 H J 90 10 T2 T2 UC T3 Sorting required to achieve such optimization is quite inexpensive as it can be made in main memory during update operations P15 P15 J K 10 30 T2 T2 T3 T3 P16 P17 K H 30 90 T2 T2 T3 UC P17 C 15 T3 UC P18 J 25 T3 UC P19 L 10 T3 UC P20 M 60 T3 UC TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
An Optimized Temporal Table TID Key Value Start End P1 A 10 T0 T1 P2 C 30 T0 T1 P3 F 25 T0 T1 Snapshot T0: P1%5 P4 D 50 T0 T2 P5 B 20 T0 UC P6 E 45 T0 UC Snapshot T1: P4%7 P7 A 15 T1 T2 P8 C 20 T1 T2 P9 G 30 T1 T2 P10 H 10 T1 T2 P11 I 15 T1 UC Snapshot T2: P5%1,P11%5 P12 C 40 T2 T3 P13 G 25 T2 T3 P14 J 10 T2 T3 P15 K 30 T2 T3 Snapshot T3: P5%1,P11,P16%4 P16 H 90 T2 UC P17 C 15 T3 UC P18 J 25 T3 UC P19 L 10 T3 UC P20 M 60 T3 UC TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
The RABTree (1) The structure is very similar to a traditional B+-Tree The compression technique is applied to TID-lists in the leaves A tree structure is built on time (Start attribute) above the leaves for fast access to a given snapshot (road-map to the desired TID-list) T0 T8 T16 T0 T2 T4 T6 T8 T10 T12 T14 T16 T18 T0:L0; T1:L1 T2:L2; T3:L3 T4:L4; T5:L5 T6:L6; T7:L7 T8:L8; T9:L9 T10:L10 T11:L11 T12:L12;T13:L13 T14:L14;T15:L15 T16:L16; T17:L17 T18:L18 The resulting temporal index is also very similar to the Time Index [Elmasri, Wuu & Kim, 1990], from which it basically differs by the compression technique used for TID-lists TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
The RABTree (2) Owing to the semantics of transaction time, the resulting index is append-only. New entries are inserted only in the rightmost nodes of each level. This leads to the name: Right-Append B+-Tree. Nice properties of the RABTree are a high memory occupancy (near 100% versus 69% of the B+-Tree) and a low height. The quantum leap wrt to the Time Index is the new compression technique for TID lists: The Time Index stores a full (uncompressed) TID list for the first entry of each leaf and deltas (TIDs of added and deleted tuples) for the other entries This leads to a worst case of O(n2/b) memory space for leaves, where n is the number of tuples in the indexed relation This makes the Time Index a non lean structure (and even impracticable in several cases, due to an excess of memory required) The space required by the RABTree is O( n/b), with usually <1, equal to a few percents of the indexed relation; however a worst case could theoretically exist TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
Tending to The RAB-Tree TID Key Value Start End P1 A 10 T0 T1 P2 C 30 T0 T1 P3 F 25 T0 T1 Snapshot T0: P1-P6 P4 D 50 T0 T2 P5 B 20 T0 UC P6 E 45 T0 UC Snapshot T1: P4-P11 P7 A 15 T1 T2 P8 C 20 T1 T2 P9 G 30 T1 T2 P10 H 10 T1 T2 P11 I 15 T1 UC Snapshot T2: P5-P16 P12 C 40 T2 T3 P13 G 25 T2 T3 P14 J 10 T2 T3 P15 K 30 T2 T3 Snapshot T3: P5-P20 P16 H 90 T2 UC P17 C 15 T3 UC P18 J 25 T3 UC P19 L 10 T3 UC P20 M 60 T3 UC TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
Snapshot Access with RAB/RAB-Tree To Access the snaspshot S(T) with the RABTree: Retrieve in the index leaves entries Ti:Li,Ti+1:Li+1 such that Ti T<Ti+1 Access all tuples pointed by TIDs in Li with the RAB-Tree: Retrieve in the index leaves entries Ti:Pi,Ti+1:Pi+1 such that Ti T<Ti+1 Sequentially access all tuples from the one pointed by Pi to the last one with Start<Ti+1 (discard if End T) TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
Experimental Settings Stationary growth Only updates (and/or insertions balanced by deletions) are applied, so that the snapshot size stays constant; the file grows by the addition of snapshots 50% of tuples are updated by each transaction Starting from an initial snapshot with 1,000 tuples, we end up with 2,470,000 tuples after 5,000 transactions Linear growth Each transaction executes a constant number of ins/del/upd (on average), with a positive balance between insertions and deletions, so that the snapshot size grows linearly in time Number of ins/del/upd uniformly distributed in [0..100]/[0..50]/[0..200] Starting from an initial snapshot with 1,000 tuples, we end up with 1,510,000 tuples after 10,000 transactions Exponential growth Each transaction applies ins/del/upd at a constant rate (on average), with a positive balance between insertions and deletions, so that the snapshot size grows exponentially in time Rates of randomly ins/del/upd tuples are 5/2/30 % Starting from an initial snapshot with 1,000 tuples, we end up with 32,152,000 tuples after 500 transactions TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
Conclusions We presented the RABTree index (and its variant RAB-Tree) which is an I/O-suboptimal secondary structure for snapshot access in transaction-time databases. Without data duplication, with the proposed optimized storage, the RABTree index is I/O-optimal We proved our solutions to be efficient in different experimental configurarions and to show a good scale-up behaviour The compression technique adopted for TID-lists in the RABTree leaves has shown to provide excellent results, making it a lean indexing solution, suitable to settings where memory occupation is an issue TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases
Future work We will study in deeper detail the performance of the RAB/RAB-Tree indexes and, in particular, of the proposed RABTree TID-list compression technique (e.g., to characterize the worst case) We will study the definition of quasi transaction- time databases, where retro- and pro-active transactions can be allowed, at a limited extent, although the adopted time dimension has the semantics of transaction time (the RAB/RAB-Tree indexes can still be used in this kind of database) TIME 2014 - F. Grandi Lean Index Structures for Snapshot Access in Transaction-time Databases