
Planning Bulk Transfer via Internet and Shipping Networks
Explore new algorithms for efficient planning of bulk transfer through internet and shipping networks to overcome bandwidth limitations. Learn about various transfer options, cost comparisons, and cooperative transfer solutions to meet deadlines and minimize costs effectively.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
New Algorithms for Planning Bulk Transfer via Internet and Shipping Networks Brian Cho Indranil Gupta University of Illinois at Urbana-Champaign
Motivation: Ad-hoc Data Processing Data-intensive research on OpenCirrus Federated cloud: diverse geographic locations Data scale of TBs Limited wide area bandwidth is a big bottleneck : Can take days or weeks to transfer over internet [Garfinkel 07] Success story: Washington Post Hillary Clinton White House schedule Released as 17,481 pages non-searchable PDF images Convert to searchable text and deliver to newsroom within the same news cycle Done within 26 hours with Amazon AWS Pay for bandwidth and computer usage 2
Bulk Transfer Options Internet Transfer Grid: [GridFTP] PlanetLab: [CoBlitz 06] Disk Shipping Transfer [Jim Gray 03] [PostManet 04] [DOT 06] Amazon AWS Import/Export Pandora (People and networks moving data around) First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges Produce optimal transfer plans that obey time deadlines and minimize dollar cost Better than internet-only and shipping-only strategies 3
Option 1: Internet Transfer 5-20 Mbps 1TB: 5-20 days $0.10 per GB Data Source (CMU) Computation Provider (Amazon) No Cost Data Source (Illinois) 4
Option 2: Disk Shipping Transfer Overnight: $60 per Disk Two-Day: $30 per Disk Ground: $10 per Disk Disk Interface 40 MB/s $0.02 per GB $80 per Disk Data Source (CMU) Computation Provider (Amazon) Overnight: $40 per Disk Two-Day: $15 per Disk Ground: $5 per Disk Data Source (Illinois) Overnight: $50 per Disk Two-Day: $25 per Disk Ground: $5 per Disk 5
Cooperative Transfer Solutions Good solutions Meet deadlines Minimize dollar cost Complexity Global scale Many strategies Collaboration helps Open Cirrus Sites How to find the best solution? 6
Example: Minimize Dollar Cost 0.8 TB Data Source A Cloud Service Provider 15 Days Total Cost: $125 Total Time: 20 Days No Cost Data Source B Loading: $40 Handling: $80 5 Days . 1.2 TB Ground: $5 14 hours 7
Example: Meet Deadline (3 days) while Minimizing Dollar Cost 0.8 TB Data Source A 1 Day Total Cost: $210 Total Time: 3 Days Cloud Service Provider Overnight: $40 6 hours Data Source B Loading: $40 Handling: $80 1 Day . 1.2 TB Overnight: $50 . 14 hours 8
Outline Motivation Problem Formulation Graph Model Flow Over Time Solution: Pandora Experimental Results Conclusion 9
Graph Model: Internet Links Capacity (Mb/s) Cost ($/GB) Transit time (almost instantaneous) Incoming/ Outgoing BW inet_out inet_out inet_in inet_in Site A Site B 10
Graph Model: Shipment Links Capacity (Mb/s) Cost ($/GB) Transit time (almost instantaneous) Incoming/ Outgoing BW inet_out inet_out inet_in inet_in ship_in ship_in Site A Site B Capacity (almost infinite) Cost: Shipping and Handling ($/Disk) Transit time (Hrs) Disk Interface BW e.g., 40 MB/s Cost: Loading ($/GB) 11
Data Transfer Over Time Goal: Meet time deadline T while minimizing dollar cost C Hard problem on graph with both Internet and Shipment links NP-Hard Formal problem and proof in paper Solution: Pandora computes optimal and approximate solutions 12
Solution: Pandora Overview Transform into static time-expanded network Decomposition of shipping edges Solve min-cost flow on static network Mixed Integer Program Optimizations to reduce computation time 13
Time-expanded Network Intuitively, incorporate time into graph to create an extended graph representation = 3 = 1 T = 5 Make T=deadline copies of each vertex Draw edges according to transit time Draw holdover edges time [Ford Fulkerson 58] Disk shipment represented as time-expanded network 14
Decomposed Shipping Edges Decompose shipping edges to fixed cost edges 1. Transit time 2. Fixed cost 3. Capacity capacity = 2 TB cap = 2 TB cost = $130 cost = $110 cost = $100 capacity = 2 TB 15
Solution: Min-cost Flow Calculation using Mixed-Integer Program Fixed-cost edges make min-cost flow calculation NP-Hard Mixed-Integer Program (MIP) Binary variable ye defined on fixed-cost edges Goal: Minimize dollar cost Subject to Capacity constraints (flowe capacitye ye) Conservation of flow Demands of sources and sink Proof of NP-Hardness and formal MIP in paper 16
Optimizations: Overview Size of MIP grows linearly with deadline T Worst-case running time grows exponentially with T Reduce size of the MIP Reduce number of shipment edges -condensed time-expanded networks More optimizations in paper 17
Optimizations: Reduce number of shipment edges 8am Can remove redundant shipment edges Example: Overnight shipment sent anytime before 4pm will arrive at destination at 8am 7am 4pm 3pm 2pm 1pm noon 18
Optimization: -condensed Time-expanded Network Each batch of consecutive time units condensed into one virtual time unit Solution has Minimum cost Deadline approximation depending on More details in paper [Fleischer Skutella 07] = 2 19
Experimental Setup Trace-driven Wrote scripts to communicate with FedEx web services: queried package rates and destination time Internet BW from PlanetLab measurements GNU Linear Programming Kit (GLPK) 20
Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW Direct Internet Cost: $200 Time: 280 hrs Cannot take advantage of heterogeneous bandwidth Direct Overnight Cost: $1,500 Time: 38 hrs Cannot fill disks to capacity 4 5 3 6 2 7 1 8 x 8 Width proportional to BW t 0.25 TB 21
Experimental Results: 8 sources, 0.25 TB per node, Heterogeneous BW Pandora Deadline=96hrs Cost: $183 Time: < 96 hrs Direct Internet Cost: $200 Time: 280 hrs Cannot take advantage of heterogeneous bandwidth Direct Overnight Cost: $1,500 Time: 38 hrs Cannot fill disks to capacity 3 2 4 1 5 6 0.06 TB 0.08 TB 7 8 t 0.14 TB 22 1.92 TB
Experimental Results: Optimizations Reducing shipment edges decreases computation time Using -condensed time-expanded networks decreases computation time Deadlines met in our experiments 2 sources 1 source 23
Conclusion First ever solution to transfer data cooperatively between multiple sources with internet and shipping edges Produce optimal transfer plans that obey time deadlines and minimize dollar cost Better than internet-only and shipping-only strategies Reasonable computation time by using optimizations 24