Clock-SI: Snapshot Isolation for Partitioned Data Stores

Clock-SI introduces a completely decentralized implementation of Snapshot Isolation (SI) for partitioned data stores. By eliminating centralized components, the approach improves the availability, latency, and throughput of data management systems.

  • Snapshot Isolation
  • Partitioned Data Stores
  • Decentralized Implementation
  • Multi-Version Concurrency Control
  • Consistent Snapshot


Presentation Transcript


  1. Clock-SI: Snapshot Isolation for Partitioned Data Stores. Jiaqing Du (EPFL), Sameh Elnikety (MSR Redmond), Willy Zwaenepoel (EPFL)

  2. Key Idea: a completely decentralized implementation of SI. Current solutions have a centralized component. Better availability, latency, and throughput.

  3. Snapshot Isolation (SI): popular multi-version concurrency control.

  4. SI, Informally. T start: get snapshot. T execution: reads from snapshot; writes to private workspace. T commit: check for write-write conflicts; install updates.
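
A minimal sketch of this lifecycle, assuming a single multi-version store with a commit counter; the names (MVStore, Transaction, begin, and so on) are illustrative, not the paper's code:

```python
# Illustrative single-node sketch of the SI lifecycle described on this slide.
# All names (MVStore, Transaction, ...) are hypothetical, not from the paper.

class MVStore:
    def __init__(self):
        self.versions = {}        # key -> list of (commit_ts, value), ascending
        self.commit_counter = 0   # commit timestamp of the last committed transaction

    def begin(self):
        # T start: the snapshot is identified by the last commit timestamp.
        return Transaction(self, snapshot_ts=self.commit_counter)


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}          # private workspace

    def read(self, key):
        if key in self.writes:    # read-your-own-writes
            return self.writes[key]
        # Latest committed version visible in the snapshot.
        visible = [v for ts, v in self.store.versions.get(key, [])
                   if ts <= self.snapshot_ts]
        return visible[-1] if visible else None

    def write(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self):
        store = self.store
        # Check for write-write conflicts with transactions that committed
        # after this transaction's snapshot.
        for key in self.writes:
            versions = store.versions.get(key, [])
            if versions and versions[-1][0] > self.snapshot_ts:
                raise RuntimeError("write-write conflict: abort")
        # Install updates under a new commit timestamp.
        store.commit_counter += 1
        for key, value in self.writes.items():
            store.versions.setdefault(key, []).append((store.commit_counter, value))
        return store.commit_counter
```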

  5. SI, More Formally. Total order on commits. Transactions read from a consistent snapshot. No write-write conflicts between concurrent transactions.

  6. Consistent Snapshot. Includes the writes of all transactions committed before the snapshot; no other writes are visible.

  7. Total Order on Commits [timeline: T1 commits x=x1, y=y1; then T2 commits y=y2; then T3 commits x=x3]

  8. Snapshots [timeline: T4 starts after T1's commit and sees x1, y1; T5 and T6 start after T2's commit and see x1, y2]

  9. Timestamps. Commit timestamp: assigned at commit; gives the transaction's position in the commit total order. Snapshot timestamp: assigned at start; equals the commit timestamp of the last committed transaction.

  10. Commit Timestamps [timeline: T1, T2, T3 commit with timestamps 1, 2, 3]

  11. Snapshot Timestamps [timeline: T4 starts with snapshot timestamp 1; T5 and T6 start with snapshot timestamp 2]

  12. Multi-Versioning [diagram: after T1 commits x=x1, y=y1, the commit counter is 1; the store holds versions x1,1 and y1,1]

  13. Multi-Versioning [diagram: after T2 commits y=y2, the commit counter is 2; the store holds x1,1, y1,1, y2,2]

  14. Multi-Versioning [diagram: after T3 commits x=x3, the commit counter is 3; the store holds x1,1, x3,3, y1,1, y2,2]

  15. Reads from Snapshot [diagram: T5 starts with snapshot timestamp 2 against the versioned store above]

  16. Reads from Snapshot [diagram: r5(x) -> x1, the latest version of x with commit timestamp <= 2; the newer version x3,3 is not visible to T5]
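
Continuing the sketch introduced after slide 4, the read rule (take the latest version with commit timestamp at or below the snapshot timestamp) reproduces the example on these slides:

```python
# Reproducing the slide example with the MVStore sketch from above.
store = MVStore()

t1 = store.begin(); t1.write("x", "x1"); t1.write("y", "y1"); t1.commit()  # commit ts 1
t2 = store.begin(); t2.write("y", "y2"); t2.commit()                       # commit ts 2
t5 = store.begin()                                                         # snapshot ts 2
t3 = store.begin(); t3.write("x", "x3"); t3.commit()                       # commit ts 3

print(t5.read("x"))   # -> x1: version (x3, 3) is newer than T5's snapshot
print(t5.read("y"))   # -> y2
```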

  17. Partitioned Data Store [diagram: partitions P1, P2, P3]

  18. Transaction in Partitioned Data Store [diagram: transaction T accesses partitions P1, P2, P3]

  19. Simplification. In this talk: only single-partition update transactions. In the paper: multi-partition update transactions, using (augmented) 2PC.

  20. Conventional SI in Partitioned Data Store [Percolator, OSDI 2010] [diagram: a central timestamp authority alongside partitions P1, P2, P3 and transaction T]

  21. Conventional SI: Transaction Start [diagram: T obtains its snapshot timestamp from the timestamp authority]

  22. Conventional SI: Transaction Execution [diagram: T executes against the partitions]

  23. Conventional SI: Transaction Commit [diagram: T obtains its commit timestamp from the timestamp authority]
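
The central component can be pictured as a tiny timestamp service; this is a hedged sketch of the conventional scheme, not Percolator's actual oracle:

```python
import itertools

# Hedged sketch of conventional SI in a partitioned store: a single timestamp
# authority is contacted once at transaction start and once at commit.
class TimestampAuthority:
    def __init__(self):
        self._next = itertools.count(1)
        self._last_committed = 0

    def snapshot_timestamp(self):       # round trip #1, at transaction start
        return self._last_committed

    def commit_timestamp(self):         # round trip #2, at transaction commit
        ts = next(self._next)
        self._last_committed = ts
        return ts
```

Every transaction pays these round trips to one machine, which leads directly to the problems listed on the next slide.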

  24. Problems: single point of failure; latency (especially with geo-partitioning); throughput.

  25. Clock-SI: Key Idea. Eliminate the central timestamp authority; replace it by loosely synchronized clocks.

  26. Clock-SI [diagram: the same setup without a central timestamp authority: partitions P1, P2, P3 and transaction T]

  27. Clock-SI: Transaction Start [diagram: T's snapshot timestamp is taken from the local clock of its originating partition]

  28. Clock-SI: Transaction Execution [diagram: T executes against the partitions]

  29. Clock-SI: Transaction Commit [diagram: T's commit timestamp is taken from the local clock of the partition where it commits]
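
A hedged sketch of the same steps under Clock-SI, where timestamps come from a partition's local physical clock rather than a central service; Partition and its methods are illustrative names, not the paper's code:

```python
import time

# Illustrative Clock-SI timestamp assignment: no central authority, only the
# loosely synchronized physical clock of each partition.
class Partition:
    def __init__(self, partition_id):
        self.partition_id = partition_id

    def clock(self):
        # Local physical clock, kept roughly in sync across partitions (e.g. by NTP).
        return time.time()

    def snapshot_timestamp(self):
        # Assigned at start by the partition the transaction originates on.
        return self.clock()

    def commit_timestamp(self):
        # Assigned at commit by the partition executing the (single-partition) update.
        return self.clock()
```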

  30. SI? Total order on commits. Transactions read from a consistent snapshot. No write-write conflicts between concurrent transactions.

  31. SI: Total Order on Commits? Ordered by commit timestamp (as before); ties broken by partition id.
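
As a small illustration (with made-up timestamps), this order amounts to sorting by the pair (commit timestamp, partition id):

```python
# Commit order: primary key is the commit timestamp, ties broken by partition id.
commits = [
    {"txn": "T2", "commit_ts": 10.002, "partition": 3},
    {"txn": "T3", "commit_ts": 10.002, "partition": 1},
    {"txn": "T1", "commit_ts": 10.001, "partition": 2},
]
commits.sort(key=lambda c: (c["commit_ts"], c["partition"]))
print([c["txn"] for c in commits])   # ['T1', 'T3', 'T2']
```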

  32. SI: Consistent Snapshot? Challenges: clock skew; commit duration.

  33. Clock-SI Challenges: Clock Skew [timeline: the clocks of P1 and P2, with P2's clock behind P1's]

  34. Clock-SI Challenges: Clock Skew [timeline: T1 starts on P1 and issues a read of x on P2]

  35. Clock-SI Challenges: Clock Skew [timeline: the read of x reaches P2 while P2's clock is still behind T1's snapshot timestamp]

  36. Clock-SI Challenges: Clock Skew [timeline: the read on P2 is delayed until P2's clock catches up to T1's snapshot timestamp]
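
A self-contained sketch of the rule these slides illustrate, using a hypothetical PartitionClock helper to model a lagging clock: if the serving partition's clock is still behind the transaction's snapshot timestamp, the read is delayed until the clock passes it.

```python
import time

class PartitionClock:
    """A partition's loosely synchronized clock; negative skew = running behind."""
    def __init__(self, skew=0.0):
        self.skew = skew

    def now(self):
        return time.time() + self.skew

def read_on_partition(clock, versions, snapshot_ts):
    # versions: ascending list of (commit_ts, value) for one item.
    # While this partition's clock is behind snapshot_ts, a local transaction
    # could still commit with a timestamp <= snapshot_ts, so delay the read.
    while clock.now() <= snapshot_ts:
        time.sleep(0.001)
    visible = [v for ts, v in versions if ts <= snapshot_ts]
    return visible[-1] if visible else None

# Example: P2's clock lags P1's by 5 ms; the read blocks for roughly 5 ms.
p2_clock = PartitionClock(skew=-0.005)
print(read_on_partition(p2_clock, [(0.0, "x1")], snapshot_ts=time.time()))   # -> x1
```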

  37. Clock-SI Challenges: Commit Duration [timeline: on P1, T1 writes x and starts to commit]

  38. Clock-SI Challenges: Commit Duration [timeline: T1's commit takes time; T1.commit.finished marks when it completes]

  39. Clock-SI Challenges: Commit Duration [timeline: T2 starts and reads x while T1's commit is still in progress]

  40. Clock-SI Challenges: Commit Duration [timeline: T2's read of x is delayed until T1's commit finishes]
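
A sketch of the companion rule from these slides, with an assumed version-record layout (state, commit_ts, value) that is not the paper's: if the newest version of the item was written by a transaction that is still committing with a timestamp below the reader's snapshot, the read waits for that commit to finish.

```python
import time

def read_item(versions, snapshot_ts):
    """versions: newest-first list of dicts with 'state', 'commit_ts', 'value'."""
    while True:
        newest = versions[0] if versions else None
        # T1's write belongs in T2's snapshot once T1's commit completes, so a
        # read that arrives during the commit must wait rather than miss it
        # (in a real store the committing transaction finishes concurrently).
        if (newest is not None and newest["state"] == "COMMITTING"
                and newest["commit_ts"] < snapshot_ts):
            time.sleep(0.001)
            continue
        break
    # Normal snapshot read: latest committed version at or below snapshot_ts.
    for v in versions:
        if v["state"] == "COMMITTED" and v["commit_ts"] <= snapshot_ts:
            return v["value"]
    return None

# Example with already-committed versions (newest first):
versions = [
    {"state": "COMMITTED", "commit_ts": 3.0, "value": "x3"},
    {"state": "COMMITTED", "commit_ts": 1.0, "value": "x1"},
]
print(read_item(versions, snapshot_ts=2.0))   # -> x1
```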

  41. Reducing Delay. Snapshot timestamp = Clock() - Delta. If Delta > max commit delay + max clock skew, then no delays.
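
Expressed as code under assumed bounds (the constants below are made-up values, not measurements from the paper):

```python
MAX_COMMIT_DELAY = 0.010   # assumed upper bound on commit duration (seconds)
MAX_CLOCK_SKEW   = 0.005   # assumed upper bound on clock skew (seconds)

# Any Delta larger than the sum avoids both kinds of read delay, at the cost
# of reading a slightly older (staler) snapshot.
DELTA = MAX_COMMIT_DELAY + MAX_CLOCK_SKEW + 0.001

def snapshot_timestamp(clock_now):
    return clock_now - DELTA
```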

  42. Clock-SI: Advantages. No single point of failure. No round-trip latency. Throughput not limited by a single machine. Tradeoff: may delay reads for a short time; may read slightly stale data.

  43. Evaluation. Partitioned key-value store; index and versioned data kept in memory; commit operation log written to disk synchronously; clocks synchronized by NTP (peer mode). Latency numbers: synchronous disk write 6.7 ms; round-trip network latency 0.14 ms.

  44. Latency. Round trip(s) eliminated. WAN: important. LAN: important for short read-only transactions; less important for long or update transactions.

  45. LAN Transaction Latency [chart: latency (ms) vs. number of partitions read by one transaction (1-3), for Clock-SI with Delta=0 and Conventional SI]

  46. Read Scalability [chart: throughput (thousand per second) vs. number of partitions serving client requests (1-8), for Clock-SI with Delta=0 and Conventional SI]

  47. Write Scalability [chart: throughput (thousand per second) vs. number of partitions serving client requests (1-8), for Clock-SI with Delta=0 and Conventional SI]

  48. Delay Probability. Analytical model in the paper. Large data set with random access: very low probability. Small data set with a hotspot: some probability; choosing an older snapshot helps.

  49. Hot Spots Read Throughput [chart: throughput (1000 per second) vs. probability of reading updated items (0%-100%), for Clock-SI with Delta = 0, 7, 14, and 21 ms]

  50. Hot Spots Read Latency [chart: average latency (ms) vs. probability of reading updated items (0-1), for Clock-SI with Delta = 0, 7, 14, and 21 ms]
