Scaling Down Peak Loads Through I/O Off-Loading

"Explore how Everest tackles unexpected I/O peaks on servers through efficient write off-loading techniques to enhance performance and mitigate response time issues. Learn how workload properties are leveraged to optimize stores for peak loads and overcome challenges in maintaining data consistency and recoverability."

  • Everest
  • I/O off-loading
  • Peak loads
  • Workload properties
  • Data consistency

Presentation Transcript


  1. Everest: scaling down peak loads through I/O off-loading
     D. Narayanan, A. Donnelly, E. Thereska, S. Elnikety, A. Rowstron
     Microsoft Research Cambridge, UK
     (Photo: hanzler666 @ UKClimbing.com)

  2. Problem: I/O peaks on servers
     • Short, unexpected peaks in I/O load (this is not about predictable trends)
     • Uncorrelated across servers in the data center, and across volumes on a single server
     • Bad I/O response times during peaks

  3. Example: Exchange server
     • Production mail server: 5000 users, 7.2 TB across 8 volumes
     • Well provisioned: hardware RAID, NVRAM, over 100 spindles
     • 24-hour block-level I/O trace
     • At peak load, response time is 20x the mean
     • Peaks are uncorrelated across volumes

  4. Exchange server load
     [Figure: per-volume load (reqs/s, log scale) over the 24-hour trace]

  5. Write off-loading
     [Diagram: an Everest client in front of a volume, in three states. No off-loading: reads and writes go to the volume. Off-loading: writes are redirected to Everest stores, while reads go to wherever the latest data lives. Reclaiming: data is read back from the stores and written to the volume. A sketch of this client path follows below.]
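To make the three states above concrete, here is a minimal sketch (not the authors' implementation) of a client read/write path: writes are redirected to a lightly loaded store when the base volume is busy, reads always return the latest version wherever it lives, and blocks with off-loaded versions stay off-loaded until they are reclaimed. All class and method names are illustrative assumptions.

```python
# Minimal sketch of an Everest-style client, assuming hypothetical
# base-volume and store objects; not the paper's actual code or API.

class OffloadClient:
    def __init__(self, base_volume, stores):
        self.base = base_volume       # the (possibly overloaded) data volume
        self.stores = stores          # available Everest stores
        self.redirects = {}           # soft state: block -> (store, version)
        self.version = 0              # monotonically increasing version counter

    def write(self, block, data):
        # Invariant: once a block has an off-loaded version, further writes
        # to it are also off-loaded until the block is reclaimed.
        if block in self.redirects or (self.base.is_overloaded() and self.stores):
            store = min(self.stores, key=lambda s: s.load())   # least-loaded store
            self.version += 1
            store.put(block, self.version, data)
            self.redirects[block] = (store, self.version)
        else:
            self.base.write(block, data)

    def read(self, block):
        # Reads always return the latest version, wherever it lives.
        if block in self.redirects:
            store, version = self.redirects[block]
            return store.get(block, version)
        return self.base.read(block)
```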

  6. Exploits workload properties
     • Peaks are uncorrelated across volumes: a loaded volume can find less-loaded stores
     • Peaks have some writes: off-loading the writes means reads see less contention
     • Few foreground reads on off-loaded data: recently written, hence still in the buffer cache
     • Can optimize stores for writes

  7. Challenges
     • Any write, anywhere: maximize the potential for load balancing
     • Reads must always return the latest version, split across stores and the base volume if required
     • State must be consistent and recoverable: track both current and stale versions, with no meta-data writes to the base volume

  8. Design features
     • Recoverable soft state
     • Write-optimized stores
     • Reclaiming off-loaded data
     • N-way off-loading
     • Load-balancing policies

  9. Recoverable soft state
     • Need meta-data to track off-loads: block ID -> <location, version>
     • Latest version as well as old (stale) versions
     • Meta-data cached in memory, on both clients and stores
     • Off-loaded writes carry a meta-data header: 64-bit version, client ID, block range (sketch below)
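As a rough illustration of the per-write meta-data header described on this slide (64-bit version, client ID, block range), the sketch below packs those fields into a fixed binary layout; the exact field sizes and byte order are assumptions, not the paper's on-disk format.

```python
import struct

# Assumed layout: version (u64), client_id (u64), start block (u64),
# block count (u32). Little-endian, purely illustrative.
HEADER_FMT = "<QQQI"
HEADER_LEN = struct.calcsize(HEADER_FMT)

def pack_header(version, client_id, start_block, block_count):
    return struct.pack(HEADER_FMT, version, client_id, start_block, block_count)

def unpack_header(buf):
    version, client_id, start, count = struct.unpack(HEADER_FMT, buf[:HEADER_LEN])
    return {"version": version, "client_id": client_id, "range": (start, count)}

# In-memory soft state on the client: block ID -> list of (location, version),
# newest first, so stale versions are still tracked until they are deleted.
soft_state = {}    # e.g. soft_state[1234] = [("store-A", 42), ("store-B", 37)]
```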

  10. Recoverable soft state (2)
     • Meta-data also persisted on the stores: no synchronous writes to the base volume
     • Stores write data + meta-data as one record
     • The store set is persisted on the base volume: small, infrequently changing
     • Client recovery: contact the store set
     • Store recovery: read from disk (sketch below)
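A hedged sketch of the recovery path above: the only client meta-data persisted on the base volume is the small store set; after a crash the client rebuilds its block map by asking each store, and a store rebuilds its index by scanning its own log. The function and method names are invented for illustration.

```python
def recover_client(base_volume):
    # The store set is the only client meta-data on the base volume.
    store_set = base_volume.read_store_set()
    soft_state = {}                              # block -> (store, version), latest wins
    for store in store_set:
        for rec in store.enumerate_records():    # headers were persisted with the data
            blk, ver = rec["block"], rec["version"]
            if blk not in soft_state or ver > soft_state[blk][1]:
                soft_state[blk] = (store, ver)
    return soft_state

def recover_store(log_device):
    # A store recovers by reading its log from disk and re-indexing the
    # data + meta-data records it finds there.
    index = {}
    for rec in log_device.scan_log():
        index[(rec["block"], rec["version"])] = rec["offset"]
    return index
```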

  11. Everest stores
     • Short-term, write-optimized storage
     • Simple circular log: a small file or partition on an existing volume
     • Not LFS: data is reclaimed, so there is no cleaner
     • Monitors load on the underlying volume; only used by clients when lightly loaded (sketch below)
     • One store can support many clients
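The sketch below models an Everest store as described here: a small circular log that tracks how busy it is, so clients only off-load to it while the underlying volume is lightly loaded. The capacity handling, load threshold, and in-memory index are simplifying assumptions.

```python
import collections
import time

class CircularLogStore:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.head = 0                       # append position in the log
        self.index = {}                     # (block, version) -> (offset, length)
        self.recent_io = collections.deque(maxlen=10000)

    def lightly_loaded(self, threshold_iops=50):
        # Clients check this before off-loading; the threshold is illustrative.
        now = time.time()
        return sum(1 for t in self.recent_io if now - t < 1.0) < threshold_iops

    def put(self, block, version, data):
        # Data and meta-data are appended as one record; wrap when the log is full.
        if self.head + len(data) > self.capacity:
            self.head = 0
        self.index[(block, version)] = (self.head, len(data))
        self.head += len(data)
        self.recent_io.append(time.time())

    def get(self, block, version):
        return self.index.get((block, version))   # would read the record off disk

    def delete(self, block, version):
        self.index.pop((block, version), None)    # record becomes stale in the log
```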

  12. Reclaiming in the background
     [Diagram: the Everest client reads <block range, version, data> from any Everest store holding it, writes the data to the volume, then sends delete(block range, version) to the store; see the sketch below]
     • Multiple concurrent reclaim threads
     • Efficient utilization of disk/network resources
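To show how the background reclaim on this slide might look, here is a small sketch: each worker reads an off-loaded record from a store, writes it back to the base volume, and only then asks the store to delete that (block range, version). Several workers run concurrently; all names are illustrative, not the paper's API.

```python
import threading

def reclaim_worker(client, base_volume):
    while True:
        item = client.next_offloaded_range()     # None when nothing is left to reclaim
        if item is None:
            break
        block_range, version, store = item
        data = store.get(block_range, version)   # read from any store holding it
        base_volume.write(block_range, data)     # write the data back to the volume
        store.delete(block_range, version)       # safe: the data is now on the volume
        client.mark_reclaimed(block_range, version)

def reclaim_in_background(client, base_volume, nthreads=4):
    # Multiple concurrent reclaim threads keep the disk and network busy
    # without blocking foreground I/O (foreground requests take priority).
    threads = [threading.Thread(target=reclaim_worker,
                                args=(client, base_volume), daemon=True)
               for _ in range(nthreads)]
    for t in threads:
        t.start()
    return threads
```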

  13. Correctness invariants
     • I/O on an off-loaded range is always off-loaded: reads are sent to the correct location; writes ensure the latest version is recoverable
     • Foreground I/Os are never blocked by reclaim
     • Deletion of a version is only allowed if a newer version has been written to some store, or the data has been reclaimed and older versions deleted (see the sketch below)
     • All off-loaded data is eventually reclaimed
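The deletion rule in the invariants above can be written as a small predicate, sketched here under assumed bookkeeping (a per-range version history with on-store/reclaimed/deleted flags); it illustrates the rule, not the paper's implementation.

```python
def may_delete(versions, v):
    """versions: this block range's history, a list of dicts like
    {"version": int, "on_store": bool, "reclaimed": bool, "deleted": bool}."""
    entry = next(e for e in versions if e["version"] == v)
    # (a) a newer version is already durably held by some store, or
    newer_on_store = any(e["version"] > v and e["on_store"] and not e["deleted"]
                         for e in versions)
    # (b) this version has been reclaimed to the base volume and every
    #     older version has already been deleted.
    older_all_deleted = all(e["deleted"] for e in versions if e["version"] < v)
    return newer_on_store or (entry["reclaimed"] and older_all_deleted)
```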

  14. Evaluation
     • Exchange server traces
     • OLTP benchmark
     • Scaling
     • Micro-benchmarks
     • Effect of NVRAM
     • Sensitivity to parameters
     • N-way off-loading

  15. Exchange server workload
     • Replay the Exchange server trace: 5000 users, 8 volumes, 7.2 TB, 24 hours
     • Choose time segments with peaks; extend segments to cover all reclaim activity
     • Our server: 14 disks, 2 TB, can fit 3 Exchange volumes
     • Use a subset of volumes for each segment

  16. Trace segment selection
     [Figure: total I/O rate (reqs/s, log scale) over the 24-hour trace]

  17. Trace segment selection
     [Figure: the same total I/O rate plot with the three selected peaks (Peak 1, Peak 2, Peak 3) highlighted]

  18. Three volumes/segment
     [Diagram: for each segment, the minimum-, median-, and maximum-load volumes are replayed; each runs a trace-replay client and an Everest store (3%)]

  19. Mean response time
     [Figure: mean response time (ms), no off-load vs. off-load, for reads and writes in Peaks 1-3]

  20. 99th percentile response time
     [Figure: 99th-percentile response time (ms), no off-load vs. off-load, for reads and writes in Peaks 1-3]

  21. Exchange server summary
     • Substantial improvement in I/O latency on a real enterprise server workload: both reads and writes, mean and 99th percentile
     • What about application performance? An I/O trace cannot show end-to-end effects
     • Where is the benefit coming from? Extra resources, log structure, ...?

  22. OLTP benchmark
     [Diagram: an OLTP client drives the SQL Server binary; the Everest client is interposed via Detours DLL redirection between SQL Server and its Data and Log volumes, with an Everest store reachable over the LAN]
     • 10 min warmup, 10 min measurement

  23. OLTP throughput
     [Figure: throughput (tpm) for no off-load, off-load, log-structured, 2-disk striped, and striped + log-structured configurations; annotation "2x disks, 3x speedup?" pointing to the extra disk and the log layout]

  24. Off-loading not a panacea
     • Works for short-term peaks; cannot be used to improve performance 24/7
     • Data is usually reclaimed while the store is still idle; long-term off-load leads to eventual contention
     • Data is reclaimed before the store fills up; long-term, a log cleaner would become an issue

  25. Conclusion
     • Peak I/O is a problem; Everest solves it through off-loading, by modifying the workload at the block level
     • Removes writes from the overloaded volume
     • Off-loading is short term: data is reclaimed
     • Consistency and persistence are maintained; state is always correctly recoverable

  26. Questions?

  27. Why not always off-load?
     [Diagram: two OLTP clients drive SQL Server 1 (with an Everest client) and SQL Server 2; each server has a Data volume and an Everest store receiving reads and writes]

  28. 10 min off-load, 10 min contention
     [Figure: speedup for off-load vs. contention on server 1 and server 2]

  29. Mean and 99th percentile (log scale)
     [Figure: response time (ms, log scale), no off-load vs. off-load, for reads and writes in Peaks 1-3]

  30. Read/write ratio of peaks
     [Figure: CDF of the percentage of writes during peaks]

  31. Exchange server response time
     [Figure: response time (s, log scale) over the 24-hour trace]

  32. Exchange server load (volumes)
     [Figure: max, mean, and min per-volume load (reqs/s, log scale) over the 24-hour trace]

  33. Effect of volume selection
     [Figure: Peak 1 load (reqs/s/volume) over time, all volumes vs. selected volumes]

  34. Effect of volume selection
     [Figure: Peak 2 load (reqs/s/volume) over time, all volumes vs. selected volumes]

  35. Effect of volume selection
     [Figure: Peak 3 load (reqs/s/volume) over time, all volumes vs. selected volumes]

  36. Scaling with #stores
     [Diagram: the OLTP setup from slide 22 (OLTP client, SQL Server binary with Detours DLL redirection, Everest client, Data and Log volumes) with multiple Everest stores attached over the LAN]

  37. Scaling: linear until CPU-bound
     [Figure: speedup vs. number of stores (0-3)]

  38. Everest store: circular log layout
     [Diagram: a header block followed by a circular log; the active log runs from the tail to the head, stale records sit behind the tail, reclaim reads from the log, and deletes advance the tail; see the sketch below]
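A toy model of the layout named on this slide, assuming in-memory bookkeeping rather than the real on-disk format: records are appended at the head, deletes mark records stale, and the tail advances past any contiguous run of stale records, freeing space without a cleaner.

```python
class LogLayout:
    def __init__(self, capacity):
        self.capacity = capacity
        self.head = 0            # where the next record is appended
        self.tail = 0            # start of the active log
        self.records = []        # [offset, length, live] in append order

    def append(self, length):
        offset = self.head % self.capacity
        self.records.append([offset, length, True])
        self.head += length
        return offset

    def delete(self, offset):
        # Mark a reclaimed record stale, then advance the tail past the
        # contiguous stale records at the front of the active log.
        for rec in self.records:
            if rec[0] == offset:
                rec[2] = False
        while self.records and not self.records[0][2]:
            self.tail += self.records[0][1]
            self.records.pop(0)
```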

  39. Exchange server load: CDF
     [Figure: CDF of per-volume request rate (reqs/s, log scale)]

  40. Unbalanced across volumes
     [Figure: CDF of per-volume request rate (reqs/s, log scale) for the min, mean, and max volumes]
