Fast Databases with Fast Durability and Recovery through Multicore Parallelism

Explore the challenges and goals of developing an in-memory database with fast durability and recovery, presented by Wenting Zheng at OSDI '14. Learn about the Silo and SiloR designs, which achieve full persistence and fast transaction throughput with a focus on multicore parallelism.

  • Database
  • Multicore Parallelism
  • Fast Recovery
  • In-memory Database
  • Durability


Presentation Transcript


  1. Fast Databases with Fast Durability and Recovery Through Multicore Parallelism Wenting Zheng, OSDI '14. Presented by Youhui Bai, 2017/11/15

  2. Outline Background and Motivations Goal Silo overview SiloR design Evaluation

  3. Background and Motivations In-memory databases are popular 1. Process tens of millions of transactions per second Potential weakness is robustness 1. Durability is impacted by crashes and power failures Crash resistance mechanisms 1. Logging and checkpointing Slow down transaction execution 1. Can generate more than 50 GB of log data per minute 2. Naïve implementation: sequentially write to a single log file a) Recovering a multi-gigabyte database using a single core could take more than 90 minutes

  4. Goal Develop an in-memory database with full persistence, at relatively low cost to transaction throughput, and with fast recovery within a few minutes

  5. Challenges Avoid interference with transaction execution Fast recovery 1. Serial recovery takes too long 2. Parallel recovery constrains the logging and checkpointing designs Approach: implement SiloR based on Silo (SOSP '13)

  6. Outline Background and Motivations Goal Silo overview SiloR design Evaluation

  7. Silo Overview A fast in-memory database 1. Tables are stored in B-trees in shared memory (figure: a B-tree maps each key to a record)

  8. Silo Overview A fast in-memory database 1. Tables are stored in B-trees in shared memory 2. Concurrency control centers on transaction IDs (TIDs) Epoch 1. A designated thread advances it every 40 ms 2. Globally visible A TID is a 64-bit integer: 1. High bits: epoch number 2. Middle bits: distinguish transactions 3. Low three bits: status bits
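
To make the TID layout concrete, here is a minimal C++ sketch of packing an epoch, a per-epoch sequence number, and three status bits into one 64-bit integer. The exact bit widths are illustrative assumptions, not taken from the paper.

```cpp
#include <cstdint>

// Illustrative 64-bit TID layout: [ epoch | sequence | 3 status bits ].
// Bit widths are assumptions for this sketch, not Silo's actual split.
struct Tid {
    static constexpr uint64_t kStatusBits = 3;
    static constexpr uint64_t kSeqBits = 32;

    uint64_t raw;

    uint64_t epoch() const { return raw >> (kStatusBits + kSeqBits); }
    uint64_t seq() const { return (raw >> kStatusBits) & ((1ull << kSeqBits) - 1); }
    uint64_t status() const { return raw & ((1ull << kStatusBits) - 1); }

    static Tid make(uint64_t epoch, uint64_t seq, uint64_t status) {
        return Tid{(epoch << (kStatusBits + kSeqBits)) |
                   (seq << kStatusBits) |
                   (status & ((1ull << kStatusBits) - 1))};
    }
};
```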

  9. Silo Overview A fast in-memory database 1. Tables are stored in B-trees in shared memory 2. Concurrency control centers on transaction IDs (TIDs) 3. Workers on different cores execute transactions (figure: workers W0, W1, ..., Wn, each writing into its own buffer) Each transaction tracks a read-set (old TIDs) and a write-set (new states for written records); to commit, a worker 1. Locks the records in the write-set 2. Computes the TID 3. Compares TIDs to determine whether to abort or commit
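
A minimal sketch of the commit steps listed above, assuming a hypothetical Record type with a per-record lock and TID. Value installation, lock-bit checks during validation, and the rest of the full Silo protocol are omitted.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical record type for illustration; Silo's real structures differ.
struct Record {
    std::mutex lock;
    uint64_t tid;
    // value omitted
};

// Sketch of the commit steps on the slide: lock the write-set (in a
// fixed global order to avoid deadlock), validate read-set TIDs, then
// install the new TID on all written records.
bool try_commit(std::vector<std::pair<Record*, uint64_t>>& read_set,
                std::vector<Record*>& write_set,
                uint64_t new_tid) {
    std::sort(write_set.begin(), write_set.end());   // fixed lock order
    for (Record* r : write_set) r->lock.lock();      // lock the write-set

    for (auto& [r, old_tid] : read_set) {            // compare TIDs
        if (r->tid != old_tid) {                     // record changed: abort
            for (Record* w : write_set) w->lock.unlock();
            return false;
        }
    }

    for (Record* r : write_set) {                    // commit: install new TID
        r->tid = new_tid;
        r->lock.unlock();
    }
    return true;
}
```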

  10. Outline Background and Motivations Goal Silo overview SiloR design Evaluation

  11. SiloR Design Logging Checkpointing Recovery

  12. Logging Value logging vs. operation logging 1. Operation logging: smaller log size 2. Value logging: easier to parallelize recovery SiloR chooses value logging A log record 1. Contains a committed transaction's TID plus the table, key, and value information for all records modified by that transaction
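
A minimal sketch of such a value-log record as plain C++ structs; the field layout and serialization are assumptions, not SiloR's on-disk format.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One modified record: its table plus the full new value (value logging
// stores states, not operations).
struct LogEntry {
    uint32_t table_id;
    std::string key;
    std::string value;
};

// One log record: the committed transaction's TID plus (table, key,
// value) for every record that transaction modified.
struct LogRecord {
    uint64_t tid;
    std::vector<LogEntry> writes;
};
```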

  13. Logging structure Workers hand their buffers to loggers when 1. The buffer fills, or 2. An epoch boundary is reached (figure: workers commit into per-worker buffers; on either condition, buffers flow to loggers, which write and fsync the log files)

  14. Logging file management The current log file (data.log) is renamed to old_data.e, where e is the largest epoch seen in that particular file (figure: worker buffers flow to a logger, which writes data.log and rotates it to old_data.e) This makes it possible to extract all logs needed for recovery

  15. Logging file management (figure: workers with current epochs e1..e4, feeding two loggers) Each logger tracks the minimum epoch across its workers: e_l1 = min{e1, e2}; e_l2 = min{e3, e4} The persistence epoch is e_p = min{e_l1, e_l2} − 1 All transactions in epochs ≤ e_p are persistent Disadvantage: transaction commit latency includes two fsyncs
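
A minimal sketch of the persistence-epoch computation above, assuming each logger reports the epochs currently buffered for its workers (and that every logger has at least one worker).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Persistence epoch: each logger takes the minimum epoch across its
// workers; the global persistence epoch is the minimum over loggers,
// minus one. All transactions with epoch <= e_p are durable.
uint64_t persistence_epoch(
    const std::vector<std::vector<uint64_t>>& worker_epochs_per_logger) {
    uint64_t e_p = UINT64_MAX;
    for (const auto& workers : worker_epochs_per_logger) {
        uint64_t e_l = *std::min_element(workers.begin(), workers.end());
        e_p = std::min(e_p, e_l);
    }
    return e_p - 1;
}
```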

  16. Checkpoints: why, and the main goal Logs suffice to recover a database, but don't suffice to recover it in bounded time Main goal: produce checkpoints periodically, as quickly as possible, without disrupting worker throughput

  17. Checkpoint architecture Parallel checkpointing 1. Different checkpointers are responsible for different slices of the database 2. A checkpoint manager assigns slices to checkpointers 3. Each slice amounts to roughly 1/n of the database (n is the number of disks) (figure: logger/checkpointer pairs, one per disk)
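
A minimal sketch of the slicing idea, assuming a flat numeric key space split into n contiguous ranges; the real manager partitions the database's key ranges, and assign_slices is a hypothetical helper.

```cpp
#include <cstdint>
#include <vector>

// A slice is a half-open key range [lo, hi) owned by one checkpointer.
struct Slice {
    uint64_t lo, hi;
};

// Split the key space into n roughly equal slices, one per
// disk/checkpointer, as the slide describes.
std::vector<Slice> assign_slices(uint64_t key_space, unsigned n) {
    std::vector<Slice> slices;
    uint64_t step = key_space / n;
    for (unsigned i = 0; i < n; ++i) {
        uint64_t lo = i * step;
        uint64_t hi = (i + 1 == n) ? key_space : lo + step;
        slices.push_back({lo, hi});
    }
    return slices;
}
```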

  18. Checkpoint processing End and cleanup 1. Remove checkpoint and log files that contain only transactions with epochs < e_c (the epoch at which the current checkpoint started) (figure: a checkpointer writes m files; each file is a sequence of blocks of key/TID/value tuples)

  19. Recovery Bring database to a consistent state 1. Recovery parallelism is easy because of logging and checkpointing designs Checkpoint recovery Log recovery

  20. Checkpoint recovery Easy parallelism 1. One checkpoint recovery thread per file (figure: recovery threads R, one per file, across m files on each disk)
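
A minimal sketch of one-thread-per-file checkpoint loading; load_file is a hypothetical placeholder for parsing a file of key/TID/value tuples into the in-memory tables.

```cpp
#include <string>
#include <thread>
#include <vector>

// Placeholder: parse one checkpoint file of key/TID/value tuples and
// insert them into the in-memory tables.
static void load_file(const std::string& path) { (void)path; }

// Checkpoint files cover disjoint slices of the database, so each file
// can be loaded by its own thread with no coordination.
void recover_checkpoint(const std::vector<std::string>& files) {
    std::vector<std::thread> threads;
    threads.reserve(files.size());
    for (const auto& f : files) threads.emplace_back(load_file, f);
    for (auto& t : threads) t.join();
}
```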

  21. Log recovery Value logging lets log files be replayed in any order: the highest TID per key wins 1. Logs in later epochs are replayed first No log record from an epoch > e_p is replayed
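
A minimal sketch of why replay order doesn't matter: a logged write is applied only if its TID is newer than the installed one, so the final state is order-independent. The table types are illustrative, and callers are assumed to have already filtered out records from epochs > e_p.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

struct Version {
    uint64_t tid;
    std::string value;
};

// Apply one logged write: the highest TID per key wins, so log files
// can be replayed in any order and still yield the same final state.
void replay(std::unordered_map<std::string, Version>& table,
            uint64_t tid, const std::string& key, const std::string& value) {
    auto it = table.find(key);
    if (it == table.end() || it->second.tid < tid)
        table[key] = {tid, value};
}
```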

  22. Outline Background and Motivations Goal Silo overview SiloR design Evaluation

  23. Evaluation Experiment setup 1. Single machine with four 8-core Intel Xeon E7-4830 processors (32 physical cores) 2. 256 GB of DRAM, 64 GB attached to each socket 3. 4 disks: 3 Fusion IO drives, 1 RAID-5 disk array

  24. Evaluation goals Can SiloR keep up with high transaction throughput from Silo? Does recovery take no more than a few minutes for a large database?

  25. Evaluation: YCSB-A Key-value benchmark 400 million keys, 100-byte records 70% reads, 30% writes 28 workers, 4 loggers, 4 checkpoint threads Database does not grow

  26. Evaluation: YCSB-A

  27. Recovery for YCSB-A Simulates a crash right before the second checkpoint completes

  28. Thank you~
