
Enhancements to Bitemporal Data Models Support Integrity Constraints
"Learn about the advancements in bitemporal data models supporting integrity constraints and more, including practical uses like snapshot queries and time-interval queries. Discover how Postgres is incorporating time for temporal data models. References and previous work overview included."
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Enhancements to the Bitemporal Model Support: Integrity Constraints and More Henrietta Dombrovskaya, Braviant Holdings, Chicago Boris Novikov, Saint Petersburg University, Russia Chad Slaughter, Scale Genius, Inc. Chicago
Bitemporal Data: Making It Happen In Postgres Incorporating time, thereby making the data models temporal, is a hot topic. The practical usage include: Snapshot queries (This report looked differently last week! I have a print out! Can you run it as it looked at the end of November?) Time-interval queries (Why this query takes twice longer now than last month? What exactly changed in the data?) Correlated time queries (How many people where fired since X became a CTO? How many people left since the new head of Analytics started?) Many systems are already providing this kind of support (DB2, Oracle) and it is included into the SQL ANSI 2011 standard. However, Postgres never formally included the time dimensions, and we are going to fix this! Apr-25 Bitemporal Data Model - PG Open 2017 2
References C.J. Date, Hugh Darwen, and Nikos Lorentzos. 2014. Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL (2nd ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. Tom Johnston. 2014. Bitemporal Data: Theory and Practice (1st ed.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. Tom Johnston and Randall Weis. 2010. Managing Time in Relational Databases: How to Design, Update and Query Temporal Data. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. Krishna Kulkarni and Jan-Eike Michels. 2012. Temporal features in SQL:2011. SIGMOD Rec. 41, 3 (October 2012), 34-43. Richard Thomas Snodgrass. 1999. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. ISO/IEC 9075-2:2011, Information technology Database languages SQL , 2011 Martin Kaufmann, Peter M. Fischer, Norman May, Donald Kossmann: Benchmarking Bitemporal Database Systems: Ready for the Future or Stuck in the Past? EDBT 2014: 738-749 Apr-25 Bitemporal Data Model - PG Open 2017 3
Previous Work Overview What we ve presented a year ago Apr-25 Bitemporal Data Model - PG Open 2017 4
Summary The surveys (Kosman et al) indicate, that is should be easy to support time in Postgres due to the existence of the GIST indexes. And we ve utilized this advantage. Last year we presented the Postgres-based implementation of asserted versioning framework (Johnston and Weis). This model supports effective time and asserted time and differs from the ANSI 2011 standard. Apr-25 Bitemporal Data Model - PG Open 2017 5
Where Our Work Can Be Found pg_bitemporal git repo https://github.com/scalegenius/pg_bitemporal Includes: Domain definition Allen relations implementation (including the ones already implemented in Postgres) Operations: create bitemporal table bitemporal insert bitemporal update bitemporal correction inactivate bitemporal delete Support of bitemporal constraints Apr-25 Bitemporal Data Model - PG Open 2017 6
Overview Of Bitemporal Operations The next several slides present the basic bitemporal data manipulation operations and explain the differences between bitemporal and conventional operations. Terminology: - effective time - asserted time - the combination of effective and asserted times: time regions Apr-25 Bitemporal Data Model - PG Open 2017 7
Bitemporal Insert now = 2015-05-01 # Effective Interval Assertive Interval Customer No. Name Type select ll_bitemporal_insert( 'customers , ,$$ customer_no , name', type' $$, , $$'C100','John Doe', 'Silver' $$, timeperiod('2015-06-01','infinity'), timeperiod('2015-05-01','infinity') John Doe Silver 1 [2015-06-01, oo) [ 2015-05-01 , oo) C100 Effective A s s e r t e d 1 Apr-25 Bitemporal Data Model - PG Open 2017 8
Bitemporal Update now = 2015-09-15 # Effective Interval Assertive Interval Customer No. Name Type John Doe Silver 1 [2015-06-01, oo) [2015-05-01,2015-09-15) C100 select ll_bitemporal_update($$customers$$, $$customer_no$$, $$100$$, $$type$$, $$Gold$$, timeperiod( 2015-09-15 , infinity ), timeperiod( 2015-09-15 , infinity )) John Doe Silver 2 [2015-06-01,2015-09-15) [2015-09-15, oo) C100 John Doe Gold 3 [2015-09-15, oo) [ 2015-09-15, oo) C100 Effective A s s e r t e d 1 1 3 2 Apr-25 Bitemporal Data Model - PG Open 2017 9
Bitemporal Correction # Effective Interval Assertive Interval Customer No. Name Type now = 2015-09-15 John Doe Silver 1 [2015-06-01, oo) [2015-05-01,2015-09-15) C100 John Doe Silver 2 [2015-06-01,2015-09-15) [2015-09-15, oo) C100 select ll_bitemporal_correction($$customers$$, $$type $$, $$ Platinum$$, $$ customer_no $$, $$ C100$$, timeperiod('2015-09-15','infinity'), now()) John Doe Gold 3 [2015-09-15, oo) [2015-09-15, 2015-09-22) C100 John Doe Platinum 4 [2015-09-15, oo) [ 2015-09-22, oo) C100 Effective A s s e r t e d 1 1 3 3 2 2 4 Apr-25 Bitemporal Data Model - PG Open 2017 10
Bitemporal Inactivate # Effective Interval Assertive Interval Customer No. Name Type now = 2015-11-05 John Doe Silver 1 [2015-06-01, oo) [2015-05-01,2015-09-15) C100 John Doe Silver 2 [2015-06-01,2015-09-15) [2015-09-15, oo) C100 select ll_bitemporal_inactivate( $$customers$$, $$customer_no$$, $$C100$$, timeperiod('2015-12-31','infinity'), timeperiod('2015-11-05','infinity'), John Doe Gold 3 [2015-09-15, oo) [2015-09-15, 2015-09-22) C100 John Doe Platinum 4 [2015-09-15, oo) [2015-09-22, 2015-11-05) C100 John Doe Platinum 5 [2015-09-15,2015-12-31 ) [ 2015-11-05, oo) C100 Effective A s s e r t e d 1 1 3 3 4 2 2 4 5 Apr-25 Bitemporal Data Model - PG Open 2017 11
Bitemporal Delete # Effective Interval Assertive Interval Customer No. Name Type now = 2015-11-17 John Doe Silver 1 [2015-06-01, oo) [2015-05-01,2015-09-15) C100 select ll_bitemporal_delete( 'customers', $$ customer_no $$, $$ 'C100' $$, timeperiod('2015-11-17','infinity')) John Doe Silver 2 [2015-06-01,2015-09-15) [2015-09-15, oo) C100 John Doe Gold 3 [2015-09-15, oo) [2015-09-15, 2015-09-22) C100 John Doe Platinum 4 [2015-09-15, oo) [2015-09-22, 2015-11-05) C100 John Doe Platinum 5 [2015-09-15,2015-12-31 ) [2015-11-05, 2015-11-17 ) C100 Effective A s s e r t e d 1 1 3 3 4 4 2 2 5 5 Apr-25 Bitemporal Data Model - PG Open 2017 12
Bitemporal Constraints We need to support the following constraint types: Primary key supported by GIST with exclusion Unique - supported by GIST with exclusion Check no difference from regular tables IS/IS NOT NULL no difference from regular tables Foreign key most difficult to support We use metacode to record the presence of the bitemporal constraints: PK: select bitemporal_internal.pk_constraint('postgres_cluster_id'); UQ: select bitemporal_internal.unique_constraint('port'); Apr-25 Bitemporal Data Model - PG Open 2017 13
Defining Foreign Key Constraint The difficulty of verifying the bi-temporal FK is that the PK/UQ in the parent table should be effective and asserted all the time when a dependent record is effective/asserted CONSTRAINT "bitemporal fk postgres_version_database_versionsrelease_version" check (true or 'fk' <> '@postgres_version -> database_versions(release_version)@')); Function: select bitemporal_internal.fk_constraint( 'postgres_version ,'database_versions ,'release_version'); Apr-25 Bitemporal Data Model - PG Open 2017 14
FK Creation Check whether the referencing field is a PK/UQ validate_bitemporal_pk_uq Create check constraint fk_constraint Check whether the validation on the parent table field already exists ll_lookup_validation_function If not, create it ll_generate_fk_validate create trigger on insert/update and a trigger function Apr-25 Bitemporal Data Model - PG Open 2017 15
How a Validation Function Should Work? In order to define the algorithm, we need to take a closer look at how bitemporal queries work. Apr-25 Bitemporal Data Model - PG Open 2017 16
Understanding Bitemporal Queries Apr-25 Bitemporal Data Model - PG Open 2017 17
Types of Bitemporal Queries Snapshot queries how the data looked at a certain point of time What was the status of customer_no C100 on Sep 1 2015? May also use one-dimensional conditions Interval queries in selection criteria How did the status change for customer_no C100 during 2015? May return multiple rows with time regions One-dimensional conditions are also here Correlated time queries What was the status of customer C200 when customer C100 was Platinum ? Values are equal only at time when they are valid in both time regions Apr-25 Bitemporal Data Model - PG Open 2017 18
How to Define Time Regions For the Query Result? Conceptually each row has associated bitemporal time region The strategy is to perform an operation as in non-temporal DB and calculate region for each tuple of result Operations on time regions: intersection union difference Examples: Interval query: intersect query region with tuple regions: what was the customer type for customer C100 in September 2015? Join: find matching tuples and intersect their regions: what was the status of customer C300 when customer C100 was Platinum ? Apr-25 Bitemporal Data Model - PG Open 2017 19
Defining Time Regions For Updates UPDATE a_table SET attr = (expr) WHERE (cond) Time regions for: S the time region defined by UPDATE function (for new values to be set) P time region for each row identified by WHERE clause (with old values to be updated) N- time region for each already existing row with values, which we are going to assign (empty of not exists) P N = After UPDATE the DB will contain regions: Old values: P S New values: N S P s N P s N Apr-25 Bitemporal Data Model - PG Open 2017 20
Time Regions as Sets of Rectangles Operations on time regions: intersection union difference Asserted interval Effective interval = rectangle Unfortunately, union and difference of rectangles are not rectangular Sets of non-intersecting rectangles: all operations produce sets or rectangles Apr-25 Bitemporal Data Model - PG Open 2017 21
Operations on Regions (Rectangle Sets) Disjoin (auxiliary) Split both arguments into smaller rectangles such that each pair of rectangles from different arguments either coincide or do not overlap Set-theoretic operations: union, intersection, set difference: Calculate the disjoin Choose rectangles that belong to union/intersection/difference Apr-25 Bitemporal Data Model - PG Open 2017 22
Using Rectangle sets Checking foreign key constraints Rect_set (FK) rect_set (PK) Computing time regions (rectangle sets) for query results Intersection Union Difference Bulk inserts and updates Apr-25 Bitemporal Data Model - PG Open 2017 23
Finding One-Dimensional (Interval) Disjoin Sort all left and right ends of source intervals together Choose intervals that intersect with or overlap with at least one of source intervals Apr-25 Bitemporal Data Model - PG Open 2017 24
Two-Dimensional Disjoin Algorithm Project both arguments on one dimension (say, on asserted and ignore effective time) Compute on-dimensional disjoin Put back the second dimension producing stripes For each stripe, find one- dimensional disjoin. Apr-25 Bitemporal Data Model - PG Open 2017 25
Enhancements To The Existing Bitemporal Functions New features added as a result of bitemporal production system implementation Apr-25 Bitemporal Data Model - PG Open 2017 28
Bulk Operations INSERT-SELECT The power of SQL is operating on sets, but so far we only implemented single-row insert. The new function ll_bitemporal_insert_select(p_table text, p_list_of_fields text, p_select TEXT, p_effective temporal_relationships.timeperiod, p_asserted temporal_relationships.timeperiod ) Allows to pass a select statement as a parameter. Apr-25 Bitemporal Data Model - PG Open 2017 29
UPDATE/CORRECTION AS SELECT Our existing bitemporal_update and bitemporal_correction support updating multiple records in one call, but the update conditions are reduced to the lists of values New bulk update and correction will allow free-format expressions for both selection criteria and new values. The same concept will be applied to the DELETE/INACTIVATE functions. Apr-25 Bitemporal Data Model - PG Open 2017 30
Consistent Transactional Behavior Question: What should be recorded in a bitemporal table, when several operations happen in one transaction and affect the same record(s)? Transactions are atomic, thereby the intermediate states should not be visible. Also, if we materialize the record status before (and regardless) of transaction commit/rollback, we might (most likely) get inconsistent results for the snapshot queries. Apr-25 Bitemporal Data Model - PG Open 2017 31
Bitemporal Correction Immediately After Insert # Effective Interval Assertive Interval Customer No. Name Type now = 2015-09-15 John Doe Silver 1 [2015-06-01, oo) [2015-05-01,2015-09-15) C100 John Doe Silver 2 [2015-06-01,2015-09-15) [2015-09-15, oo) C100 select ll_bitemporal_correction($$customers$$, $$type $$, $$ Platinum$$, $$ customer_no $$, $$ C100$$, timeperiod('2015-09-15','infinity'), John Doe Gold 3 [2015-09-15, oo) [2015-09-15, 2015-09-15) C100 John Doe Platinum 4 [2015-09-15, oo) [ 2015-09-15, oo) C100 Effective A s s e r t e d 1 1 3 3 2 4 2 Apr-25 Bitemporal Data Model - PG Open 2017 32
Corrected Behavior Of UPDATE/CORRECTION Now we check the asserted interval of the last record, and if the last asserted start is the same as the new asserted start we do not insert a new record, but update the old one. This still allows the non-transactional behavior, of the app developers choose to ignore transactions Bitemporal correction has a new parameter: asserted_start_time (instead of now by default). One can still use the old version Apr-25 Bitemporal Data Model - PG Open 2017 33
Bitemporal Correction Correct Behavior # Effective Interval Assertive Interval Customer No. Name Type now = 2015-09-15 John Doe Silver 1 [2015-06-01, oo) [2015-05-01,2015-09-15) C100 John Doe Silver 2 [2015-06-01,2015-09-15) [2015-09-15, oo) C100 select ll_bitemporal_correction($$customers$$, $$type $$, $$ Platinum$$, $$ customer_no $$, $$ C100$$, timeperiod('2015-09-15','infinity'), John Doe Platinum 3 [2015-09-15, oo) [2015-09-15, oo) C100 John Doe Platinum 4 [2015-09-15, oo) [ 2015-09-22, oo) Cl 00 Effective A s s e r t e d 1 1 3 3 2 2 Apr-25 Bitemporal Data Model - PG Open 2017 34
High-level Functions The app developers do not care about how exactly bitemporal support is implemented. We created a high-level functions for them to perform select/insert/update/delete without knowing about the bitemporal operations Example: create or replace function common.hl_bitemporal_insert (p_schema_name text, p_table_name text, p_list_of_fields text, p_list_of_values text) returns integer Apr-25 Bitemporal Data Model - PG Open 2017 35
Performance We are using pg_bitemporal for both OLTP and OLAP systems, and the data volumes appear to be manageable Indexing Strategy: GIST indexes are efficient, but occasionally we need to build extra indexes, for example conditional index with UPPER (asserted)= infinity Apr-25 Bitemporal Data Model - PG Open 2017 36
Future Work Finalizing FK support Finalizing constraints creation Finalizing bitemporal UPDATE Continue research on performance and the choice of indexes Many thanks to Braviant Holdings leadership for their continuous support! Apr-25 Bitemporal Data Model - PG Open 2017 37