Locality-Aware Load Balancing for Serverless Clusters

Explore the criticality of locality in serverless computing, discussing challenges, solutions like Consistent Hashing, and the impact of load balancing on resource utilization and latency reduction. Discover how provider and user preferences align for warm hits and efficient resource management.

  • Serverless
  • Load Balancing
  • Locality
  • Consistent Hashing
  • Resource Management


Presentation Transcript


  1. Locality-Aware Load-Balancing for Serverless Clusters. Alex Fuerst, Prateek Sharma. HPDC; June 29, 2022; Minneapolis, Minnesota.

  2. Serverless Function-as-a-Service Clusters. 1. A unique resource management challenge: wide diversity in applications, and many servers for the load balancer to choose between quickly. 2. Many considerations for the provider: maximizing resource utilization and minimizing user latency. INDIANA UNIVERSITY BLOOMINGTON

  3. Where to run a function? 1. Conventional policies might use least-loaded: functions all require CPU resources and occupy space in server memory, so both are highly applicable load metrics. 2. OpenWhisk uses a sticky algorithm with a memory cap.

  4. Criticality of Locality. 1. Functions see lower latency when run repeatedly on the same server. 2. The provider keeps the function's sandbox in memory to achieve this. 3. Both users and providers want such warm hits. 4. Most prior work has focused on single-server optimizations. 5. Policies like least-loaded increase cold starts.

     Function:       CNN Inference | Web Serving | Scientific Computing
     Warm Runtime:   2.5 s         | 0.4 s       | 0.5 s
     Cold Runtime:   6.5 s         | 2.4 s       | 2.5 s

  5. Outline & Significance. 1. Possible impact of load balancing. 2. Background: why is load balancing for FaaS hard? 3. Our solution: Consistent Hashing with Random Load Update. 4. Experimental analysis.

  6. Domain Challenges. Competing priorities: 1. Preserve locality. 2. Avoid server overloading. Complex nature of the workload: 1. Heterogeneous function characteristics. 2. Skewed nature of requests: frequent, bursty, or rare. 3. Inconsistent cluster information.

  7. Locality via Consistent Hashing. 1. We want a policy that encourages warm hits; locality seems like a good way to get this. 2. Functions map somewhere on a ring. 3. The local server is the next one on the ring. 4. Repeated functions run on the same server. (Diagram: workers placed at points 0-3 on a hash ring.)

  8. Locality via Consistent Hashing (continued). 1. We want a policy that encourages warm hits; locality seems like a good way to get this. 2. Functions map somewhere on a ring. 3. The local server is the next one on the ring. 4. Repeated functions run on the same server. 5. Amenable to server changes: addition and removal have low impact.
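The ring mapping described on these two slides can be sketched in a few lines of Python. This is an illustrative toy, not the paper's implementation: the server names, the virtual-node count, and the choice of MD5 as the hash are all arbitrary assumptions.

```python
import hashlib
from bisect import bisect_right

class HashRing:
    # Toy consistent-hashing ring: servers (each with a few virtual nodes)
    # and function names hash onto the same ring; a function runs on the
    # first server clockwise from its hash point.
    def __init__(self, servers, vnodes=4):
        self.ring = sorted(
            (self._hash(f"{s}#{v}"), s) for s in servers for v in range(vnodes)
        )
        self._points = [p for p, _ in self.ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, function_name):
        # bisect_right finds the next ring point clockwise; modulo wraps around.
        i = bisect_right(self._points, self._hash(function_name)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["server-0", "server-1", "server-2"])
# Repeated invocations of the same function map to the same server (warm hits).
assert ring.lookup("resize-image") == ring.lookup("resize-image")
```

Because only the ring segment adjacent to a server moves when it joins or leaves, most functions keep their assignment across membership changes, which is the "low-impact addition and removal" property the slide mentions.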

  9. Overloading Concern. 1. Functions are invoked at different frequencies. 2. High variance in resources used. 3. 20% of functions use 90% of resources.

  10. CH with Bounded Loads [*]. 1. Assumes each item creates equal load. 2. The load balancer tracks outstanding work to estimate load. 3. If a server is full, we forward to the next server on the ring. 4. When under the load bound: pure locality. 5. Forwarding has a high but decaying probability of a warm hit. Domain characteristics make this unsuitable for FaaS. [*] Mirrokni, V., Thorup, M., and Zadimoghaddam, M. Consistent Hashing with Bounded Loads. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms (2018), SIAM, pp. 587-604.
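The forwarding rule of CH with Bounded Loads can be sketched as follows. This is a minimal illustration of the idea, assuming (as the original algorithm does) that every request adds one unit of load; the function and variable names are invented for the example.

```python
def bounded_route(servers_on_ring, loads, start, load_bound):
    """Walk the ring clockwise from the function's home position until a
    server whose outstanding load is under the bound is found."""
    n = len(servers_on_ring)
    for step in range(n):
        s = servers_on_ring[(start + step) % n]
        if loads[s] < load_bound:
            loads[s] += 1  # each item is assumed to add equal load
            return s
    raise RuntimeError("all servers are at the load bound")

servers = ["s0", "s1", "s2"]
loads = {"s0": 3, "s1": 1, "s2": 0}
# Home server s0 is at the bound (3), so the request forwards to s1.
assert bounded_route(servers, loads, start=0, load_bound=3) == "s1"
```

The sketch also makes the slide's criticism concrete: the `loads[s] += 1` bookkeeping is only accurate when all requests cost the same, which does not hold for heterogeneous FaaS functions.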

  11. Challenge 1: Stale Loads. 1. Functions do not contribute to load equally or consistently, so any load metric will be stale at the load balancer. 2. We use Linux's load average metric to capture all work on the server. 3. Policies assuming perfect load information can violate the load bound. (Diagram: the estimated load stays under the bound, so work is assigned there, while the server's actual load already exceeds the bound.)

  12. Challenge 2: Outsized-Impact Functions. 1. 20% of functions account for most invocations: they are invoked with high frequency (e.g., more often than once per second), or in bursts of high frequency separated by periods of inactivity. 2. These functions create violations of the load bound. 3. We modify SHARDS reuse-distance sampling to find such functions online. 4. Tracking the inter-arrival time quickly detects popular functions and bursts.
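Inter-arrival-time tracking of the kind this slide describes might look like the sketch below. Note the hedge: the paper adapts SHARDS reuse-distance sampling, while this simpler exponentially-weighted-average version (with an invented one-second popularity threshold) is only meant to show why IATs detect popular and bursty functions quickly.

```python
class PopularityTracker:
    # Illustrative online popularity detector: a function is "popular" when
    # its smoothed inter-arrival time (IAT) drops below a threshold.
    def __init__(self, iat_threshold=1.0, alpha=0.9):
        self.iat_threshold = iat_threshold  # seconds; assumed cutoff
        self.alpha = alpha                  # smoothing factor for the average
        self.last_seen = {}
        self.avg_iat = {}

    def observe(self, fn, now):
        if fn in self.last_seen:
            iat = now - self.last_seen[fn]
            prev = self.avg_iat.get(fn, iat)
            self.avg_iat[fn] = self.alpha * prev + (1 - self.alpha) * iat
        self.last_seen[fn] = now

    def is_popular(self, fn):
        return self.avg_iat.get(fn, float("inf")) < self.iat_threshold

t = PopularityTracker()
for i in range(10):
    t.observe("hot-fn", now=i * 0.1)   # invoked every 100 ms
t.observe("cold-fn", now=0.0)
t.observe("cold-fn", now=60.0)         # invoked once per minute
assert t.is_popular("hot-fn") and not t.is_popular("cold-fn")
```

A burst shows up the same way: a run of tiny IATs pulls the average under the threshold within a few invocations, which is the quick detection the slide claims.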

  13. Solution: Aggressive Forwarding. 1. Popular and bursty functions are marked for more aggressive forwarding. 2. Even if the server is under the load bound, forwarding can happen. 3. We anticipate a function's load using a Gaussian distribution, N(μ, σ). 4. This reduces overload risk for individual servers. 5. Such functions are popular enough to maintain locality across several servers.

  14. CH with Random Load Update.

     RLU-Forward(function, server):
         L = Load(server)
         if Popular(function):
             # Popular functions have an increased chance of being forwarded;
             # the stale server load is updated based on our routing decision.
             L = Load(server) + N(μ = 1/avg_IAT, σ = 0.1)
         if L < load_bound:
             route(function, server)   # follow the hash ring for locality
         else:
             RLU-Forward(function, next(server))
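The RLU-Forward pseudocode can be turned into an executable sketch. This is illustrative Python under stated assumptions (the paper's implementation lives inside OpenWhisk, and the fallback when every server is over the bound is my invention): for popular functions, a Gaussian sample N(μ = 1/avg_IAT, σ) is added to the stale load estimate, raising their chance of being forwarded.

```python
import random

def rlu_forward(ring, loads, start, load_bound, popular, avg_iat, sigma=0.1):
    """Walk the hash ring from the function's home server, forwarding while
    the (possibly randomly inflated) load estimate is over the bound."""
    n = len(ring)
    for step in range(n):
        server = ring[(start + step) % n]
        load = loads[server]
        if popular:
            # Anticipate the load a popular function will add before the
            # load balancer's stale estimate catches up.
            load += random.gauss(1.0 / avg_iat, sigma)
        if load < load_bound:
            return server              # follow the hash ring for locality
    return ring[start]                 # assumed fallback: all servers overloaded

ring = ["s0", "s1", "s2"]
loads = {"s0": 0.5, "s1": 2.5, "s2": 0.2}
# An unpopular function whose home server s0 is under the bound stays local.
assert rlu_forward(ring, loads, start=0, load_bound=3.0,
                   popular=False, avg_iat=1.0) == "s0"
```

The random inflation is what spreads a popular function across a prefix of the ring: each decision may or may not forward, so the function accumulates warm sandboxes on several consecutive servers rather than hammering one.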

  15. Evaluation. 1. We implemented our algorithm inside OpenWhisk. 2. Workers' load is propagated to the load balancer via a Redis cache. 3. Workload: functions built with code from FunctionBench, ported to OpenWhisk; arrivals mimic Azure trace data for a realistic setup. 4. Our primary goal is to minimize global latency. 5. The optimal case is everything running warm. 6. We also ensure no server is overloaded, which would cause interference.

  16. Global Latency Evaluation. 1. We place the systems under heavy load: enough to have ~1 outstanding request per CPU in the cluster. 2. Least-loaded eschews locality entirely. 3. CH with Bounded Loads forwards poorly. 4. RLU outperforms all others by at least 2x.

  17. Load Balancing. 1. Variance between loads on workers. 2. OpenWhisk prioritizes locality, which causes excessive load on workers and creates additional latency for functions. 3. RLU is on par with least-loaded: server load is kept low while preserving locality for low latency.

  18. Bursty Workload Evaluation. 1. We routinely change the invocation frequency of functions, 2. exercising RLU's ability to detect changing popularities.

  19. Related Works. 1. Single-server optimizations. Keep-alive: FaasCache, ASPLOS 2021; Serverless in the Wild, ATC 2020. Startup: Catalyzer, ASPLOS 2020; Firecracker, NSDI 2020. 2. Cluster level: Package-Aware Scheduling, CCGRID 2019; IceBreaker, ASPLOS 2022.

  20. Our simple and practical load-balancing policy can substantially improve FaaS latency over current load-balancing policies. Please check out our paper, Locality-Aware Load-Balancing for Serverless Clusters, for more.
