Optimizing Distributed Data Locality-Aware Job Allocation

Explore a proposed approach to data-locality-aware job allocation. The bidding scheduler discussed here aims to optimize job execution time, use workers efficiently, and adapt to changing environments. Learn about different scheduler approaches and their impact on job assignment in distributed systems.

  • Optimization
  • Data Locality
  • Job Allocation
  • Distributed Systems
  • Bidding Scheduler


Presentation Transcript


  1. Distributed Data Locality-Aware Job Allocation
     Presenter: Ana Markovic
     Authors: Ana Markovic, Dimitris Kolovos, Leandro Soares Indrusiak

  2. Outline
     1. Example
     2. Data-locality scheduler approaches
     3. Proposed approach: Bidding scheduler
     4. Evaluation
     5. Conclusion & Future work

  3. Example
     Mining software repositories: examining co-occurrences of popular JavaScript libraries in large-scale GitHub projects.

  4. Data-locality scheduler approaches
     • Delay scheduling: postpones a job's assignment until a node holding its data is available; otherwise the allocation is deferred (see the sketch after this slide).
     • Matchmaking for MapReduce: when a node becomes available, it first tries to pull a task whose data it holds locally; a node may remain idle for a single heartbeat before it is bound to accept a non-local task.
     • Crossflow scheduler: relies on opinionated nodes that can accept an incoming task or decline it a fixed number of times if the necessary data is not local.
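To make the first approach concrete, here is a minimal sketch of the delay-scheduling idea: a job is only placed on a node that already holds its data and is otherwise deferred a bounded number of times before a non-local assignment is accepted. The skip budget (`max_skips`), the `data_id` argument, and the dictionaries used here are illustrative assumptions, not details taken from the cited schedulers.

```python
# Illustrative sketch of delay scheduling (assumed behaviour, not the exact
# algorithm from the cited work): prefer a node with local data, defer the
# job a bounded number of times, then accept a non-local placement.

def delay_schedule(job_id, data_id, free_nodes, local_data, skip_count, max_skips=3):
    """Return a node for the job, or None to postpone the allocation.

    local_data maps node -> set of data items cached on that node.
    skip_count maps job_id -> how many times the job has been deferred.
    """
    # Prefer any free node that already holds the job's input data.
    for node in free_nodes:
        if data_id in local_data.get(node, set()):
            return node
    # No local node is free: postpone until the skip budget is exhausted,
    # then fall back to any free node (a non-local assignment).
    if skip_count.get(job_id, 0) < max_skips:
        skip_count[job_id] = skip_count.get(job_id, 0) + 1
        return None
    return free_nodes[0] if free_nodes else None
```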

  5. Proposed approach: Bidding scheduler
     • The master opens a bidding contest for each incoming job.
     • Workers submit bids, i.e. estimates of how long they anticipate it would take to complete the job.
     • The worker with the lowest bid wins the job.
     • Currently, the master waits for all workers to submit their bids within a threshold period of time.
     • Estimates include data transfer time, processing time, and the estimates for previously won jobs.
     • Workers are actively involved in the job-allocation process (a sketch of the bidding loop follows this slide).
     Benefits:
     • Optimizes end-to-end execution time, even if that means less data locality.
     • All workers are used almost equally, even when numerous jobs involve the same data.
     • Worker differences (CPU/memory) are considered in job allocation, not only the local data.
     • Adaptable in volatile environments: worker speed/memory can change over time, and job allocation takes this into account.
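A minimal sketch of this bidding contest is shown below, assuming a simple bid formula (data transfer time when the input is not cached locally, plus processing time, plus the backlog of previously won jobs) and synchronous bid collection. Names such as `Worker`, `estimate_bid`, and the `bid_window_s` threshold are illustrative, not the authors' implementation.

```python
import time

# Hedged sketch of the bidding contest: each worker estimates its completion
# time (transfer + processing + backlog of previously won jobs), and the
# master picks the lowest bid received within a threshold window.

class Worker:
    def __init__(self, name, cached_data, proc_rate_mb_s, net_mb_s):
        self.name = name
        self.cached_data = cached_data      # data items already held locally
        self.proc_rate_mb_s = proc_rate_mb_s
        self.net_mb_s = net_mb_s
        self.backlog_s = 0.0                # estimated time for jobs already won

    def estimate_bid(self, job):
        transfer = 0.0 if job["data_id"] in self.cached_data \
            else job["size_mb"] / self.net_mb_s
        processing = job["size_mb"] / self.proc_rate_mb_s
        return transfer + processing + self.backlog_s

def run_bidding_contest(job, workers, bid_window_s=0.5):
    """Master side: collect bids within the threshold window; lowest bid wins."""
    deadline = time.monotonic() + bid_window_s
    bids = {}
    for w in workers:                       # in the real system bids arrive over
        if time.monotonic() > deadline:     # the network; here we simply poll
            break
        bids[w] = w.estimate_bid(job)
    winner = min(bids, key=bids.get)
    winner.backlog_s += bids[winner]        # future bids account for this job
    return winner, bids[winner]
```

For instance, calling run_bidding_contest({"data_id": "repo-42", "size_mb": 800}, workers) with a hypothetical job would favour a worker that already caches repo-42, unless a faster or less-loaded peer outbids it despite the transfer cost.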

  6. Evaluation
     Metrics: end-to-end execution time, data load, cache misses (a small sketch of these metrics follows this slide).
     Cloud infrastructure setup:
     • Distributed AWS infrastructure using t3.micro instances, 7 instances in total: 5 workers, 1 master, 1 for the messaging infrastructure.
     • Instances are geographically distributed (randomly determined during configuration startup).
     Simulation configurations:
     • 4 worker configurations (including fast/slow workers) and 5 job configurations.
     Real deployment experiments:
     • Network bandwidth and read/write speed determined by the AWS infrastructure.
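As a concrete reading of the three metrics, the sketch below computes them from hypothetical per-job records of a workflow run; the JobRecord fields are assumptions about what such a log might contain, not the authors' actual instrumentation.

```python
from dataclasses import dataclass

# Assumed per-job record for a single workflow run; field names are
# illustrative, chosen to match the three metrics listed above.

@dataclass
class JobRecord:
    start_s: float          # job start time (seconds)
    end_s: float            # job end time (seconds)
    data_loaded_mb: float   # data actually transferred over the network
    cache_miss: bool        # True if the input was not already local

def workflow_metrics(records):
    """End-to-end execution time, total data load, and cache-miss count."""
    end_to_end_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    data_load_mb = sum(r.data_loaded_mb for r in records)
    cache_misses = sum(r.cache_miss for r in records)
    return end_to_end_s, data_load_mb, cache_misses
```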

  7. Evaluation: simulation results
     • The Bidding Scheduler achieves a speedup of approximately 24.5% compared to the Baseline.
     • The Bidding Scheduler improves local data utilization, with approximately 49% fewer cache misses and an approximately 45.3% reduction in data load per workflow run.
     [Figures: 1. Average total execution time per workload; 2. Average cache miss count per workload; 3. Average data load per workload]

  8. Evaluation: real deployment results
     • The Bidding Scheduler completes the execution with a 10.3%-25.5% reduction in time compared to the Baseline, together with ~50% fewer cache misses, resulting in ~60% fewer MB downloaded.

  9. Conclusion & Future work
     Conclusion:
     • Effective for large resources and long-running workflows.
     • For small resources or short workflows, competing for jobs unnecessarily prolongs the execution.
     Future work:
     • Larger-scale evaluation (e.g. comparing with other scheduling techniques such as Matchmaking).
     • Additional worker intelligence: introduce the workers' ability to learn from estimation errors and correct their future bids (see the sketch after this slide).
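One way the "additional worker intelligence" item could look is sketched below: a worker keeps a smoothed correction factor derived from past estimation errors and scales its future bids by it. The exponential-moving-average scheme and the `alpha` parameter are assumptions for illustration, not part of the presented work.

```python
# Hypothetical bid self-correction: track the ratio of actual to estimated
# completion time with an exponential moving average and apply it to new bids.

class SelfCorrectingBidder:
    def __init__(self, alpha=0.3):
        self.alpha = alpha        # weight given to the most recent error
        self.correction = 1.0     # smoothed estimate of (actual time / bid)

    def corrected_bid(self, raw_estimate_s):
        # Scale the raw estimate by the learned bias before submitting it.
        return raw_estimate_s * self.correction

    def record_outcome(self, bid_s, actual_s):
        # Update the bias from the observed error on a completed job.
        error_ratio = actual_s / bid_s
        self.correction = (1 - self.alpha) * self.correction + self.alpha * error_ratio
```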
