Clock-SI: Snapshot Isolation for Partitioned Data Stores
Clock-SI introduces a completely decentralized implementation of Snapshot Isolation (SI) for partitioned data stores, enhancing availability, latency, and throughput. The approach eliminates centralized components, improving the overall performance and reliability of data management systems.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
Clock-SI: Snapshot Isolation for Partitioned Data Stores Jiaqing Du, EPFL Sameh Elnikety, MSR Redmond Willy Zwaenepoel, EPFL
Key Idea Completely decentralized implementation of SI Current solutions have centralized component Better availability, latency, throughput 2
Snapshot Isolation (SI) Popular multi-version concurrency control 3
SI, Informally T start: Get snapshot T execution: Reads from snapshot Writes to private workspace T commit: Check for write-write conflicts Install updates 4
SI, More Formally Total order on commits Transactions read from consistent snapshot No write-write conflicts between // transactions 5
Consistent Snapshot Writes of all committed transactions before No other writes visible 6
Total Order on Commits Commit T1 x=x1, y=y1 Commit T2 y=y2 Commit T3 x=x3 7
Snapshots Commit T1 x=x1, y=y1 Commit T2 y=y2 Commit T3 x=x3 Start T5 x1, y2 Start T6 x1, y2 Start T4 x1, y1 8
Timestamps Commit timestamp Assigned at commit Order of commit in commit total order Snapshot timestamp Assigned at start Commit timestamp of last committed transaction 9
Commit Timestamps Commit T1 x=x1, y=y1 Commit T2 y=y2 Commit T3 x=x3 1 2 3 Start T5 x1, y2 Start T6 x1, y2 Start T4 x1, y1 10
Snapshot Timestamps Commit T1 x=x1, y=y1 Commit T2 y=y2 Commit T3 x=x3 1 2 3 Start T5 x1, y2 Start T6 x1, y2 Start T4 x1, y1 1 2 2 11
Multi-Versioning Commit T1 x=x1, y=y1 commit counter 1 x1,1 x y y1,1 12
Multi-Versioning Commit T1 x=x1, y=y1 Commit T2 y=y2 commit counter 1 2 x1,1 x y y1,1 y2,2 13
Multi-Versioning Commit T1 x=x1, y=y1 Commit T2 y=y2 Commit T3 x=x3 commit counter 1 2 3 x1,1 x3,3 x y y1,1 y2,2 14
Reads from Snapshot Commit T1 x=x1, y=y1 Commit T2 y=y2 Commit T3 x=x3 commit counter 1 2 3 x1,1 x3,3 x y y1,1 y2,2 Start T5 x1, y2 2 15
Reads from Snapshot Commit T1 x=x1, y=y1 Commit T2 y=y2 Commit T3 x=x3 commit counter 1 2 3 x1,1 x3,3 x y y1,1 y2,2 Start T5 x1, y2 r5(x) -> x1 2 16
Partitioned Data Store P2 P1 P3 17
Transaction in Partitioned Data Store P2 P1 P3 T 18
Simplification In this talk: Only single-partition update transactions In the paper: Multi-partition update transactions Using (augmented) 2PC 19
Conventional SI in Partitioned Data Store [Percolator, OSDI 2010] Timestamp Authority P2 P1 P3 T 20
Conventional SI: Transaction Start Timestamp Authority P2 P1 P3 T snapshot timestamp 21
Conventional SI: Transaction Execution Timestamp Authority P2 P1 P3 T 22
Conventional SI: Transaction Commit Timestamp Authority P2 P1 P3 T commit timestamp 23
Problems Single point of failure Latency (esp. with geo-partitioning) Throughput 24
Clock-SI: Key Idea Eliminate central timestamp authority Replace by loosely coupled clocks 25
Clock-SI Timestamp Authority P2 P1 P3 T 26
Clock-SI: Transaction Start P2 P1 P3 T snapshot timestamp 27
Clock-SI: Transaction Execution P2 P1 P3 T 28
Clock-SI: Transaction Commit P2 P1 P3 T commit timestamp 29
SI? Total order on commits Transactions read from consistent snapshot No write-write conflicts between // transactions 30
SI: Total Order on Commits? Ordered by commit timestamp (as before) Ties broken by partition-id 31
SI: Consistent Snapshot? Challenges Clock skew Commit duration 32
Clock-SI Challenges: Clock Skew P1 t P2 t Timeline 33
Clock-SI Challenges: Clock Skew T1.start() T1.read(x) P1 t P2 t Timeline 34
Clock-SI Challenges: Clock Skew T1.start() T1.read(x) P1 t P2 t Timeline 35
Clock-SI Challenges: Clock Skew T1.start() T1.read(x) P1 t delay P2 t Timeline 36
Clock-SI Challenges: Commit Duration P1 t T1.write(x) T1.commit 37
Clock-SI Challenges: Commit Duration P1 t T1.write(x) T1.commit.finished T1.commit 38
Clock-SI Challenges: Commit Duration P1 t t T1.write(x) T1.commit.finished T1.commit T2.start T2.read(x) 39
Clock-SI Challenges: Commit Duration P1 t t T1.write(x) T1.commit.finished T1.commit delay T2.start T2.read(x) 40
Reducing Delay Snapshot timestamp = Clock() If > max commit delay + max clock skew Then no delays 41
Clock-SI: Advantages No single point of failure No roundtrip latency Throughput not limited by single machine Tradeoff: May delay reads for short time May read slightly stale data 42
Evaluation Partitioned key-value store Keep index and versioned data in memory Commit operation log to disk synchronously Clocks are synchronize by NTP (peer mode) Latency numbers Synchronous disk write: 6.7ms Round-trip network latency: 0.14ms 43
Latency Roundtrip(s) eliminated WAN: important LAN: Important for short read-only T s Less important for long or update T s 44
LAN Transaction Latency Clock-SI, Delta=0 Conventional SI 0.9 0.8 Latency (millisecond) 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 1 2 3 Number of partitions read by one transaction 45
Read Scalability Clock-SI, Delta=0 Conventional SI 250 Throughput (thousand per second) 200 150 100 50 0 1 2 3 4 5 6 7 8 Number of partitions serving client requests 46
Write Scalability Clock-SI, Delta=0 Conventional SI 80 Throughput (thousand per second) 70 60 50 40 30 20 10 0 1 2 3 4 5 6 7 8 Number of partitions serving client requests 47
Delay Probability Analytical model in the paper Large data set, random access Very low probability Small data set, hotspot Some probability Choosing older snapshot helps 48
Hot Spots Read Throughput Clock-SI, Delta=0ms Clock-SI, Delta=7ms Clock-SI, Delta=14ms Clock-SI, Delta=21ms 9 Throughput (1000 per second) 8 7 6 5 4 3 2 1 0 0% 10% 20% Probability of reading updated items 30% 40% 50% 60% 70% 80% 90% 100% 49
Hot Spots Read Latency Clock-SI, Delta=0ms Clock-SI, Delta=7ms Clock-SI, Delta=14ms Clock-SI, Delta=21ms 8 7 Average latency (ms) 6 5 4 3 2 1 0 0 0.1 0.2 Probability of reading updated items 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 50