Optimizing Cluster Scheduling with Hopper Decentralized Speculation

hopper n.w
1 / 47
Embed
Share

"Discover how Hopper, a decentralized job scheduler, enhances scheduling efficiency and speeds up analytics jobs using speculation techniques. Explore the impact of speculation-aware scheduling on performance and resource utilization in large-scale clusters."

  • Hopper
  • Decentralized
  • Cluster Scheduling
  • Speculation
  • Optimization

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. Hopper Decentralized Speculation-aware Cluster Scheduling at Scale Xiaoqi Ren, Ganesh Ananthanarayanan, Adam Wierman, Minlan Yu

  2. Hopper is a decentralized job scheduler which jointly optimizes speculation and job scheduling and speeds up analytics jobs by >50% 2

  3. Schedulers Workers A1 Job A A2 A3 B1 Job B B2 B3 3

  4. Schedulers Workers straggler Job A A1 B1 Job B B2 B3 A2 A3 4

  5. Workers straggler Common fix: Speculation A1 B1 Speculative tasks account for 21% of resource usage in Facebook Hadoop cluster B2 A1+ B3 A2 A3 5

  6. Speculation is a well-honed technique Mantri(2010) Duplication(2004) GRASS(2014) LATE(2008) Dolly(2013) 6

  7. Speculation is a well-honed technique Mantri(2010) Duplication(2004) How to joint optimize job scheduling and speculation? GRASS(2014) LATE(2008) Dolly(2013) 7

  8. A1 A2 Job A speculation A1 A3 A2 time time 0 0 5 5 8

  9. A1 A2 Job A speculation A1 A3 job scheduling A2 B1 B2 Job B time time 0 0 5 5 When should a scheduler start an unscheduled task from a new job vs. start an unscheduled task from a running job vs. start a speculative task? 9

  10. How do schedulers handle speculation today? Best effort speculation (practical) budgets no slots. No guarantee when can speculate. Budgeted speculation budgets a fixed number of slots. Slots are usually wasted. No coordination between scheduling and speculation leads to huge performance degradation. 10

  11. Example: Best effort A1 Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 A3 B1 B2 time time B3 20 20 B3 0 0 10 10 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 11

  12. Example: Best effort A1 A3+ B3+ Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 B1 B2 time time B3 20 20 0 0 10 10 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 12

  13. Example: Best effort A: 22 A: 22 A1 A3+ B3+ Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 B1 B2 time time B3 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 13

  14. Example: Best effort A: 22 A: 22 A1 A3+ B3+ Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 B4 B4 B1 B2 B5 time time B3 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 14

  15. Example: Best effort A: 22 A: 22 A1 A3+ B3+ B4+ Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 B4 B1 B2 B5 time time B3 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 15

  16. Example: Best effort A: 22 A: 22 B: 32 B: 32 A1 A3+ B3+ B4+ Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 Inefficient because speculation comes too late. A2 A3 B4 B1 B2 B5 time time B3 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 16

  17. Example: Budgeted B: 34 B: 34 A: 24 A: 24 A1 A3 Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 Inefficient because slots are wasted. A2 B1 B2 B3 B4 B5 B4+ A3+ B3+ budgeted slots time time 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 17

  18. Example: Dynamic allocation A1 Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 A3 A3+ B1 B2 time time 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 18

  19. Example: Dynamic allocation A: 12 A: 12 A1 Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 A3+ B1 B2 time time 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 19

  20. Example: Dynamic allocation A: 12 A: 12 B3 B4 B4 A1 Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 B3 A2 A3 B5 B3+ B4+ A3+ B1 B2 time time 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 20

  21. Example: Dynamic allocation A: 12 A: 12 B3 40% improvement B: 24 B: 24 A1 Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 B4 B5 B3+ B4+ A3+ B1 B2 time time 0 0 10 10 20 20 30 30 4 40 0 straggler straggler Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 21

  22. Example: Dynamic allocation A: 12 A: 12 B3 Intuition for design 40% improvement B: 24 B: 24 A1 Slot 1 Slot 2 Slot 3 Slot 4 Slot 5 Slot 6 A2 A3 speculation among jobs B4 dynamically allocate/budget slots for B5 B3+ B4+ A3+ B1 B2 time time 0 0 10 10 20 20 30 30 4 40 0 straggler straggler How valuable is one extra slot for a job? Job A Job B B2 B4 B3 B1 A3 A1 A2 B5 straggler 22

  23. How valuable is one extra slot for a job? 2.2 Two regimes: Job completion time 2 1.8 1.6 1.4 1.2 1 0.5 1 1.5 2 2.5 # of assigned slots/# of tasks 23

  24. How valuable is one extra slot for a job? 2.2 2.2 Two regimes: A: marginal return is large and constant Job completion time 2 2 1.8 1.8 1.6 1.6 1.4 1.4 1.2 1.2 1 1 0.5 0.5 1 1 1.5 1.5 2 2 2.5 2.5 # of assigned slots/# of tasks 24

  25. How valuable is one extra slot for a job? virtual size 2.2 2.2 Two regimes: A: marginal return is large and constant B: marginal return is small Job completion time 2 2 1.8 1.8 1.6 1.6 1.4 1.4 1.2 1.2 1 1 0.5 0.5 1 1 1.5 1.5 2 2 2.5 2.5 # of assigned slots/# of tasks 25

  26. How valuable is one extra slot for a job? virtual size 2.2 2.2 Job completion time Pareto task duration distribution: virtual size =2 # of tasks 2 2 1.8 1.8 1.6 1.6 : straggler likelihood 1.4 1.4 1.2 1.2 1 1 0.5 0.5 1 1 1.5 1.5 2 2 2.5 2.5 # of assigned slots/# of tasks 26

  27. How valuable is one extra slot for a job? virtual size 2.2 2.2 Job completion time 2 2 1.8 1.8 1.6 1.6 1.4 1.4 1.2 1.2 1 1 0.5 0.5 1 1 1.5 1.5 2 2 2.5 2.5 # of assigned slots/# of tasks Scheduler should make sure: all jobs get their virtual sizes, before any job gets >its virtual size. 27

  28. Two cases: 1) Jobs can not all achieve their virtual sizes Theorem: Assigning slots to smallest jobs and each given the number of slots equals to its virtual size is optimal. 2) Jobs can all achieve their virtual sizes Scheduler should make sure: all jobs get their virtual sizes, before any job gets >its virtual size. 28

  29. How should slots be shared? 1) Jobs can not all achieve their virtual sizes Theorem: Assigning the smallest jobs their virtual sizes is optimal. 2) Jobs can all achieve their virtual sizes Shortest job first S1 S2 S3 S4 Slots: Jobs: Job A (2) Job B (3) Job C (4) 29

  30. How should slots be shared? 1) Jobs can not all achieve their virtual sizes Theorem: Assigning the smallest jobs their virtual sizes is optimal. 2) Jobs can all achieve their virtual sizes Theorem: Sharing slots proportionally to virtual sizes is optimal. Shortest job first Larger jobs have more stragglers S1 S6 S2 S7 S3 S8 S4 S9 S5 S10 Slots: Jobs: Job A (2) Job B (3) 30

  31. Dynamic allocation when decentralized Job Worker Scheduler1 Worker Worker Scheduler2 Worker 31

  32. Dynamic allocation when decentralized Job Worker Scheduler1 Worker ACK Worker Scheduler2 Worker Workers schedule tasks in their waiting queues. 32

  33. Hurdles to dynamic allocation Any scheduler or worker only knows a subset of jobs 1) How to determine whether cluster is capacity constrained or not? 2) How to perform scheduling policies (assigning to smallest jobs/proportionally sharing)? 33

  34. Hurdles to dynamic allocation 1) How to determine whether cluster is capacity constrained or not? Assume capacity is constrained by default. Job constrained? Scheduler1 Worker Scheduler2 34

  35. Hurdles to dynamic allocation 1) How to determine whether cluster is capacity constrained or not? Assume capacity is constrained by default. After several refusals, believe capacity is not constrained. Job NOT constrained Scheduler1 Worker Scheduler2 35

  36. Hurdles to dynamic allocation 2) How to perform scheduling policies (assigning to smallest jobs/proportionally sharing)? Power of two many choices 36

  37. Hurdles to dynamic allocation 2) How to perform scheduling policies (assigning to smallest jobs/proportionally sharing)? Power of twomany choices. Workers get enough information to perform scheduling policies. Two choices is not enough for heavy-tailed workload [M. Bramson, et al., Sigmetrics, 2010] 37

  38. Hopper Core features Dynamically budgeting slots across jobs Scalable: easy to decentralize Other details + Independent of task-level speculation algorithms + Multiple phase jobs (DAG) + Fairness and policy constraints + Locality constraints See our paper for more details 38

  39. How well does Hopper perform? Hadoop YARN 2.3 and Spark 0.7.3 200 node private cluster Real workload from Facebook and Microsoft Cosmos from Oct-Dec 2012 Baseline: Best effort + decentralized scheduler 39

  40. Decentralized Hopper reduces job completion time by 66% Baseline: best effort speculation + decentralized scheduler - Reduction (%) in Average 80 Job Completion Time 60 40 Facebook workload Cosmos workload 20 0 60 65 70 75 80 85 90 Utilization (%) Results are consistent across speculation algorithms, # of schedulers, and centralized Hopper. 40

  41. Decentralized Hopper performs efficient scheduling with little fairness loss Fairness knob : guarantee (1- ) perfect fairness Reduction (%) in Average 70 Job Completion Time 50 30 Facebook workload Cosmos workload 10 0 2 5 10 15 20 25 30 Fairness knob (%) Unfair Fair 41

  42. Hopper: Decentralized Speculation-aware Scheduling Predictable and scalable performance for analytics in large clusters Predictability via speculative (extra) tasks Scalability via decentralized scheduling Speculative tasks cost extra slots No central info of jobs Power of many choices probe & queue Optimal #slots per job statistical & theoretical Job completion time 2 2 1.5 1.5 1 1 0.5 0.5 # of assigned slots/# of tasks 1 1 1.5 1.5 2 2 2.5 2.5 Spark & Hadoop prototype; production queries speed-up by 66%

  43. Hopper Decentralized Speculation-aware Cluster Scheduling at Scale Xiaoqi Ren, Ganesh Ananthanarayanan, Adam Wierman, Minlan Yu

  44. Back up Slides

  45. Centralized Hopper reduces job completion time by 50% Baseline: LATE (speculation) + SRPT (job scheduler) Reduction (%) in Average - 100 Job Completion Time Hadoop Spark 80 60 40 20 0 Overall < 50 51-150 151-500 > 500 Job sizes (number of tasks) 45

  46. Decentralized Hopper: CDF of gains 100 Facebook workload Cosmos workload 80 CDF 60 40 20 0 0 20 40 60 80 Reduction (%) in Average Job Duration 46

  47. Fairness 20 25 duration of Slowed Jobs Increase (%) in Job (%) Slowed Jobs 20 15 %Jobs Average Worst 15 10 10 5 5 0 0 0 10 Fairness (%) 20 30 47

More Related Content