Cost/Performance in Modern Data Stores: How Data Caching Systems Succeed

Traditional disk-based database systems are challenged by main memory row stores, column stores, and streaming systems, and some have declared traditional data caching systems dead. Yet the elephants still dance. This talk argues that economics explains why: money talks, and traditional database systems, which are caching data stores, deliver good cost/performance even when raw performance lags, prompting a suggested shift in focus for the field.



Presentation Transcript


  1. Cost/Performance in Modern Data Stores: How Data Caching Systems Succeed. David Lomet, Microsoft Research, Redmond, WA 98052. 2/20/2025 SoCal 2019 1

  2. Traditional Databases are Challenged: by main memory row stores, column stores, and streaming systems. And the threat is perceived to be growing.

  3. Traditional Data Caching Systems are Dead: MSFT SQL Server, IBM DB2, Oracle Database, Amazon Aurora.

  4. The Elephants Still Dance. WHY?

  5. Here's the story. WHY? The main memory row store threat became a niche market, a partnership, or was absorbed. [Diagram: traditional database systems, column store, streaming.]

  6. Economic Factors. Hardware infrastructure needs to be cheap => commodity hardware. Software infrastructure needs to be efficient => high performance on this hardware. Elastic provisioning: free up expensive resources when otherwise under-utilized; give resources to those with higher value (willing to pay more); under-provision for SLAs and pocket the difference.

  7. Money Talks! $

  8. Traditional Database Systems Are Caching Data Stores

  9. Traditional DBMSs have good cost/performance. We need to move away from being fixated only on performance: costs vary enormously between systems. We describe why traditional systems have lower performance than recent research systems, yet succeed by having better cost/performance, and suggest a change in focus for our field.

  10. Caching Data Store: uses a cache. Moves data from secondary storage into main memory when it is to be operated on. Removes data from main memory when it is no longer being operated on: writes it back to secondary storage when it has been changed, drops it from main memory when unchanged. Data lives permanently on secondary storage.
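The read/evict cycle above can be sketched as a tiny LRU cache over a backing store. This is a minimal illustration only: the class, names, and LRU policy are invented for the sketch, not Deuteronomy's actual design.

```python
from collections import OrderedDict

class CachingStore:
    """Sketch of a caching data store: data lives permanently on secondary
    storage; hot items are cached in main memory and written back on
    eviction only if they were modified."""

    def __init__(self, capacity, storage=None):
        self.capacity = capacity
        self.storage = storage if storage is not None else {}  # stand-in for SSD
        self.cache = OrderedDict()  # key -> (value, dirty flag), LRU order

    def read(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)       # MM operation: data is hot
            return self.cache[key][0]
        value = self.storage[key]             # SS operation: I/O + MM work
        self._install(key, value, dirty=False)
        return value

    def write(self, key, value):
        self._install(key, value, dirty=True)

    def _install(self, key, value, dirty):
        self.cache[key] = (value, dirty)
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:            # evict coldest entry
            old_key, (old_val, was_dirty) = self.cache.popitem(last=False)
            if was_dirty:
                self.storage[old_key] = old_val           # changed: write back
            # unchanged: just drop it from main memory
```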

  11. Result: Two Forms of Operation. MM (main memory): VERY fast, N*10**6 ops/sec. SS (secondary storage): about 5.8x the execution time of an MM operation (our system), since it must do an MM operation plus the I/O to bring data into main memory. Caching data stores look like a BAD idea.

  12. Mixed Operation Performance: a weighted average of MM and SS operations, with R = SS/MM ~ 5.8 +/- 30%. [Chart: relative performance (0 to 1.0) vs. percent of SS operations (0 to 60%), with four curves: expected perf (max), expected perf (min), actual perf (1-core), actual perf (4-core).]
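The weighted-average curve on this slide follows from a simple model: if a fraction f of operations go to secondary storage and an SS op takes R times as long as an MM op, throughput relative to pure main memory is 1/((1-f) + f*R). A small sketch (R = 5.8 per the slide; the function name is ours):

```python
def relative_performance(ss_fraction, R=5.8):
    """Throughput relative to pure main-memory operation, modeling a mixed
    workload as a weighted average of operation times (R = SS time / MM time)."""
    avg_op_time = (1.0 - ss_fraction) + ss_fraction * R  # in MM-op time units
    return 1.0 / avg_op_time

# A small SS fraction already drags throughput well below MM speed:
for f in (0.0, 0.1, 0.3, 0.6):
    print(f"{f:.0%} SS ops -> {relative_performance(f):.2f}x MM throughput")
```

This reproduces the shape of the slide's curve: performance falls steeply with the first SS operations, then flattens toward SS speed.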

  13. A Performance Disaster? Each added SS operation pushes performance lower: toward SLOW SS performance, away from FAST MM performance. Why bother??? Because costs matter in addition to performance.

  14. So let's look at Cost/Performance

  15. Two Costs for Operating on Data. Storage cost: always paid; most of the cost for cold data, or when data becomes cold. Execution cost: paid only when data is operated on; most of the cost for hot data, or when data becomes hot.

  16. It's All About Relative Costs. [Charts, approximate Bw-tree relative numbers: relative storage cost per byte, main memory ~10x flash storage; relative execution cost, a flash storage op (op execution + flash I/O + I/O path) ~12x a main memory op. New SSDs support n*100K IOPS.]

  17. Cost Analysis. We compute cost/sec and plot it against operations/sec executed. Cost is a linear function of storage and execution costs: C = A + B*X, where we plot C vs X. Cost/sec = (storage rental/sec) + (cost/operation)*(operations/sec). A: storage cost = (cost/byte)*(size of data), constant for operations on a piece of data; DRAM cost/byte ~ 10x flash cost/byte. At ops/sec = 0, all cost is storage cost [storage cost is the y-intercept]. B: cost/operation: an SS op ~ 12x an MM op; cost/operation determines the slope of the cost curve.
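The linear cost model is easy to make concrete. In this sketch the absolute dollar units are placeholders; only the relative numbers (DRAM ~10x flash per byte, SS op ~12x MM op) come from the talk:

```python
def cost_per_sec(storage_rental, cost_per_op, ops_per_sec):
    """C = A + B*X: cost/sec as a linear function of operation rate.
    A (storage rental/sec) is the y-intercept; B (cost/op) is the slope."""
    return storage_rental + cost_per_op * ops_per_sec

# Placeholder units with the talk's relative ratios:
def mm_cost(x): return cost_per_sec(10.0, 1.0, x)   # costly storage, cheap ops
def ss_cost(x): return cost_per_sec(1.0, 12.0, x)   # cheap storage, costly ops

def caching_cost(x):
    # A caching data store uses whichever operation is cheaper at rate x.
    return min(mm_cost(x), ss_cost(x))

# The lines cross where 10 + x = 1 + 12x, i.e. x = 9/11 in these units:
# below that rate the data should live on flash, above it in memory.
crossover = 9.0 / 11.0
```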

  18. Some Details. C = A + B*X. $SS = cost of secondary storage ops/sec = $SSstor + $MMindex + {$MMexec + $SSD_IO + $Proc_IO}*ops/sec. $MM = cost of main memory ops/sec = {$MMstor + $SSstor + $MMindex} + {$MMexec}*ops/sec. We set $SS = $MM and solve for (ops/sec)**(-1), the time between ops; $MMindex, $SSstor, and $MMexec all cancel out: Time (between ops) = {$SSD_IO + $Proc_IO} / {($M/byte) * Msize}. Gray's Five Minute Rule: now ~ one minute. $SSD_IO has dropped dramatically, so $Proc_IO is now important for cost. Our full analysis also tells you about costs away from break-even.
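The break-even formula is straightforward to evaluate. The inputs below are hypothetical amortized prices chosen only for illustration (they happen to land near the ~one-minute figure for a 4K page; they are not the talk's measured costs):

```python
def breakeven_interval_sec(io_dollars, mem_dollars_per_byte_sec, page_bytes):
    """Time between ops at which keeping a page in memory costs the same as
    re-reading it from secondary storage on each access:
    T = ($SSD_IO + $Proc_IO) / (($M/byte) * Msize)."""
    return io_dollars / (mem_dollars_per_byte_sec * page_bytes)

# Hypothetical amortized costs (assumptions, not measured values):
io_dollars = 1e-7     # SSD I/O + processor cost per I/O, combined
mem_rate = 4e-13      # DRAM rental, dollars per byte per second
t = breakeven_interval_sec(io_dollars, mem_rate, 4096)
# With these inputs t is about a minute: pages touched more often than
# every t seconds belong in memory, echoing the updated five-minute rule.
```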

  19. Cost/Performance. [Chart: relative cost vs. performance (ops/unit time) for three lines: data caching op, MM op, and flash op; the y-intercepts are the storage costs. The caching system uses the lowest cost op, either SS or MM. Gray's 5-minute rule is now ~1 minute for a 4K page.]

  20. How About Other Kinds of Systems?

  21. Main Memory Systems Perform Better (indeed, data caching systems perform better when fully cached) but with worse cost/performance. MassTree: a high performance main memory data store. Deuteronomy: a high performance caching data store. MassTree has higher performance than Deuteronomy's Bw-tree. Relative costs: execution cost, Deuteronomy is 2.6x MassTree, because MassTree is 2.6x faster; storage cost, MassTree is 2.3x Deuteronomy, because MassTree uses 2.3x the storage of Deuteronomy.

  22. Cost/Performance: Deuteronomy vs MassTree [not to scale]. Data caching is frequently better: Deuteronomy has better cost/performance. [Chart: relative cost vs. performance; MassTree is all or nothing (no switching between systems), at 3*10**6 ops/sec for 40GB, about 3 seconds per 4K page; the caching system mixes SS and MM ops, with break-even at about 1 minute per 4K page.]

  23. Lowering Storage Cost. The Facebook problem: a huge data volume, almost all cold, but requiring high performance and decent latency; they were buying processors just to be able to attach more SSDs to accommodate the volume of data. Solution: a data caching system + data compression. Use SSD storage for decent latency; scale out processors for good performance; compress data to lower storage cost (recall that storage cost is (cost/byte)*(size of data)), at the expense of increased execution cost.
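Since storage cost is (cost/byte)*(size of data), compression enters the model as a divisor on the size term. A back-of-envelope sketch with made-up numbers: the 4x compression ratio and the 80/20 cold/hot split are assumptions for illustration, while the DRAM ~10x flash cost/byte ratio is from the talk.

```python
def storage_cost(cost_per_byte, data_bytes, compression_ratio=1.0):
    """Storage cost = (cost/byte) * (size of data); compression shrinks the
    size term, at the price of extra execution cost per operation."""
    return cost_per_byte * (data_bytes / compression_ratio)

GB = 2**30
FLASH, DRAM = 1.0, 10.0   # relative cost/byte (talk: DRAM ~10x flash)

# Hypothetical 100 GB dataset: hot 20% cached in DRAM, cold 80% on flash.
hot_in_dram = storage_cost(DRAM, 20 * GB)
cold_plain = storage_cost(FLASH, 80 * GB)
cold_compressed = storage_cost(FLASH, 80 * GB, compression_ratio=4.0)
savings = cold_plain - cold_compressed   # 4x compression cuts cold storage 75%
```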

  24. Cost/Performance with Compressed Data [hypothetical, not to scale]. [Chart: relative cost vs. performance with three op lines: MM op, SS op, and compressed secondary storage (CSS) op; about 1 minute between SS and MM ops per 4K page. The caching system can use the lowest cost operation from a choice of three. The COLD 80% of the data falls in the range where CSS is used; the HOT 20% is served by MM ops.]

  25. NVM Cost/Performance. NVM storage used as an SSD, or used as a memory extension: a deeper memory hierarchy. [Chart: same relative cost vs. performance plot as slide 19 (data caching op, MM op, flash op; storage cost y-intercepts; Gray's 5-minute rule now ~1 minute for a 4K page); the caching system uses the lowest cost op, either SS or MM.]

  26. Making a High Performance and Low Cost Data Caching System

  27. How do we provide good cost/performance? A data caching system is part of the answer, but how do we optimize cost/performance within a data caching system? Increase in-memory multi-core performance: increased performance reduces cost, but it must be relevant to data caching; improve single-core performance and multi-core scale-out. Reduce the number of I/Os: blind writes, log structuring, a record cache. Reduce data movement cost (SSD to/from main memory): reduce the execution cost of I/O, e.g. user-level I/O; reduce SSD I/O cost, e.g. SSDs with more IOPS; modify the SSD (summer project with Jae Young Do and Ivan Picoli). Reduce the cost of secondary storage: compress data into fewer bytes; use flash, not NVRAM (NVRAM ~ 3x the cost of flash); new NVM technology?

  28. CAVEATS: two important ones.

  29. Our Numbers are APPROXIMATE. Performance is based on a set of point experiments, executed on our machines, using our Deuteronomy system. Costs are from web prices, which are variable at any one time and change over time. The general thrust of the results is solid, but your mileage may vary!

  30. Best cost/performance is NOT always best. Low costs are a plus, but not the only thing: we want max[value - cost]. If value is high enough, low cost may be secondary. Value might depend on latency (or peak performance): some of the time, lowest latency wins the game and captures all the value, e.g. some stock trading applications. But this is not the common case: most of the time, interactive system latency is adequate, and anything less than a millisecond per op should be fine, e.g. online shopping.

  31. Recap. Two costs: storage, always present; execution, only when operations execute. When data is cold, storage cost dominates; when hot, execution cost dominates. Data caching systems win on cost/performance because they move hot data to the (high cost) main memory cache for low execution cost, and move cold data to (low cost) secondary storage for low storage cost. We should be focusing on data caching improvements!

  32. Talk brought to you by the Bw-Tree in production. SQL Server Hekaton: key-sequential index; lock-free for high concurrency, consistent with Hekaton's non-blocking main memory architecture; in-memory Bw-tree. Azure DocumentDB: indexing engine; rich query processing over a schema-free JSON model, with automatic indexing; sustained document ingestion at high rates; Bw-tree + LLAMA. Bing ObjectStore: sorted key-value store; supports range queries; optimized for flash SSDs; Bw-tree + LLAMA.

  33. Some References.
     David B. Lomet: Cost/Performance in Modern Data Stores: How Data Caching Systems Succeed. DaMoN 2018: 9:1-9:10. https://dl.acm.org/citation.cfm?doid=3211922.3211927
     Justin J. Levandoski, David B. Lomet, Sudipta Sengupta: The Bw-Tree: A B-tree for New Hardware Platforms. ICDE 2013: 302-313.
     Justin J. Levandoski, David B. Lomet, Sudipta Sengupta: LLAMA: A Cache/Storage Subsystem for Modern Hardware. PVLDB 6(10): 877-888 (2013). http://www.vldb.org/pvldb/vol6/p877-levandoski.pdf
     Justin J. Levandoski, David B. Lomet, Sudipta Sengupta, Ryan Stutsman, Rui Wang: High Performance Transactions in Deuteronomy. CIDR 2015. http://cidrdb.org/cidr2015/Papers/CIDR15_Paper15.pdf

  34. QUESTIONS?
