Solving Double-Logging Issue in LSM-Tree Based Databases

removing double logging with passive data n.w
1 / 19
Embed
Share

Explore a solution for the double-logging problem in relational databases utilizing LSM-tree storage engines. The research presents eliminating unnecessary redundancy in logging mechanisms, focusing on improving data persistence and efficiency. Discover the challenges and goals of removing Write-Ahead Logging (WAL) while maintaining data reliability upon failures.

  • LSM-Tree
  • Data Persistence
  • Relational Databases
  • Write-Ahead Logging
  • Research

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Removing Double-Logging with Passive Data Persistence in LSM-tree based Relational Databases FAST 22 Kecheng Huang , Zhaoyan Shen , Zhiping Jia , Zili Shao , Feng Chen Shandong University The Chinese University of Hong Kong Louisiana State University Presented by Yu-Ang Cao 2022/04/20

  2. Storage Engine of Relational Database (RDB) MySQL server MySQL server B-tree based storage engine LSM-tree based storage engine Traditional B-tree based RDB LSM-tree based RDB 7/4/2025 USTC-SYS Reading Group 2

  3. LSM-tree based RDB MySQL Server Layer Two layer structure RDB server layer LSM-tree based storage engine layer SQL Persistent Logging (Binlog) SQL Persistent Logging (Binlog) Storage Engine Pluggable Interface RocksDB Engine KV Persistent Logging (WAL) KV Persistent Logging (WAL) The system has unnecessary redundancy the double- logging problem. 7/4/2025 USTC-SYS Reading Group 3

  4. The Double-logging Problem Two independent logging mechanisms MySQL Server Layer SQL Persistent Logging (Binlog) Storage Engine Pluggable Interface Disadvantages: Unnecessary space usage I/O overheads Critical path blocking RocksDB Engine KV Persistent Logging (WAL) 7/4/2025 USTC-SYS Reading Group 4

  5. Solution MySQL Server Layer Remove the WAL. SQL Persistent Logging (Binlog) Why not binlog? Storage Engine Pluggable Interface Binlog has some semantic information such as transaction, but WAL doesn t. RocksDB Engine KV Persistent Logging (WAL) 7/4/2025 USTC-SYS Reading Group 5

  6. Challenges Goal: Remove WAL while retaining data reliability upon failures. MySQL Server Layer Unwarranted data persistence SQL Persistent Logging (Binlog) Storage Engine Pluggable Interface RocksDB Engine KV Persistent Logging (WAL) Committed data may still be in volatile memory buffers. Memtable Memtable SSTables Column Family-N SSTables Column Family-1 7/4/2025 USTC-SYS Reading Group 6

  7. Challenges Goal: Remove WAL while retaining data reliability upon failures. Binlog Commit Order T1 T3 T2 Unwarranted data persistence Partial persistence KVs Batch Commit Order (k31,v31) (k32,v32) (k11,v11) (k12,v12) (k21,v21) (k22,v22) Storage Engine A transaction can be partially persistent. (k11,v11)(k31,v31) (k11,v11)(k31,v31) Memtable (k12,v12)(k22,v22) Memtable Flushed and persisted Column Family2 Column Family1 7/4/2025 USTC-SYS Reading Group 7

  8. Challenges Goal: Remove WAL while retaining data reliability upon failures. MySQL Server Layer Unwarranted data persistence Partial persistence Lost track of LSN (Log Sequence Number) SQL Persistent Logging (Binlog) Storage Engine Pluggable Interface RocksDB Engine KV Persistent Logging (WAL) Replaying the binlog cannot regenerate the lost LSNs. The original ordering cannot be restored without LSNs. Memtable Memtable SSTables Column Family-N SSTables Column Family-1 7/4/2025 USTC-SYS Reading Group 8

  9. Passive Data Persistence Scheme (PASV) Aims: Data persistence and correctness Minimal and non-intrusive changes Effectiveness and efficiency MySQL Server Layer SQL Persistent Logging (Binlog) Passive Logging Manager Storage Engine Pluggable Interface Partial Recovery EBP flush_flag Three parts: Passive flushing policy Epoch-based persistence (EBP) Partial recovery RocksDB Engine Memtable Memtable SSTables Column Family-N SSTables Column Family-1 7/4/2025 USTC-SYS Reading Group 9

  10. Passive Flushing Binlog Commit Order T3 Every time the storage engine flushes a memtable, a Flush Flag is inserted into the memtable and flushed together. T2 T1 KVs Batch Commit Order (k31,v31) (k32,v32) (k11,v11) (k12,v12) (k21,v21) (k22,v22) Flush Flag (A special key-value item) Key: A magic number Value: < ??,???,????????,???????> ??: The column family ???: The last transaction in the memtable ????????: The LSN of the first KV of ??? Storage Engine (k11,v11)(k31,v31) (k11,v11)(k31,v31) Memtable (k12,v12)(k22,v22) Memtable Flush Flag Flushed and persisted CF2 CF1 7/4/2025 USTC-SYS Reading Group 10

  11. Epoch-based Persistence (EBP) Different CFs may make unequal progresses. Local epoch: < ??1,????>,< ??2,????>,< ??3,????> Global epoch: ???? The earliest transaction in the local epochs. 7/4/2025 USTC-SYS Reading Group 11

  12. Partial Recovery : Starting point for each column family The transactions that have been flushed don t need to be flushed again. Just start from the pending transactions. 7/4/2025 USTC-SYS Reading Group 12

  13. Example 7/4/2025 USTC-SYS Reading Group 13

  14. Evaluation Experiment setup Hardware: Intel i7-8700 3.2GHz processor, 32GB memory, 500GB SSD Software: Ubuntu 18.04 LTS with Linux Kernel 4.15 and Ext4 file system Workloads LinkBench and TPC-C 7/4/2025 USTC-SYS Reading Group 14

  15. Evaluation General Performance MyRocks PASV Improvement Total Loading Time (sec) 6027.3 4019.9 33.3% Data Loading Throughput (KOPS) 72.6 108.8 49.9% Storage Engine I/O (GB) 206.5 117.9 42.9% Total Execution Time (sec) 3371 2614 22.5% Query Running Throughput (KOPS) 2.97 3.82 28.6% Better performance with higher throughput and faster loading and execution. Lower storage cost by removing WAL in the storage engine layer. 7/4/2025 USTC-SYS Reading Group 15

  16. Evaluation Comparison with the Active Method Binlog Commit Order T3 T2 T1 Active method KVs Batch Commit Order (k31,v31) (k32,v32) (k11,v11) (k12,v12) (k21,v21) (k22,v22) Disadvantage Impaired system modularity. Flush memtables of different sizes. Storage Engine (k11,v11)(k31,v31) (k11,v11)(k31,v31) Memtable (k12,v12)(k22,v22) Memtable All memtables flush together and record the sync point . CF2 CF1 7/4/2025 USTC-SYS Reading Group 16

  17. Evaluation Comparison with the Active Method ACT-N means that invoke the active buffer flush process every N transactions. Smaller recovery time variance. Better performance and less I/Os. 7/4/2025 USTC-SYS Reading Group 17

  18. Evaluation with TPC-C PASV Na ve MyRocks Total Execution Time (sec) 2658.4 2651 3612 Throughput (KQPS) 35.6 35.7 26.2 Storage Engine I/O (GB) 204.2 204.2 267.6 Na ve method: Just remove the WAL in the storage engine. Performance is better than MyRocks and nearly identical to the Na ve approach. 7/4/2025 USTC-SYS Reading Group 18

  19. Conclusion Double-logging is a critical problem in LSM-tree based relational databases. We propose PASV to effectively address the double-logging problem. Our experiments show that PASV can achieve significant improvement. 7/4/2025 USTC-SYS Reading Group 19

More Related Content