Lock-Free Multi-Threaded Algorithm for Max-Flow Problem

Lock-Free Multi-Threaded Algorithm for Max-Flow Problem
Slide Note
Embed
Share

This paper discusses a lock-free multi-threaded algorithm for addressing the Max-Flow Problem efficiently, presenting existing algorithms, vertex properties, impact of locking, and a new lock-free algorithm model. The content covers various sequential and parallel algorithms, vertex characteristics, excessive flow, locking impact, and the new lock-free algorithm proposed for SMP computers with multiple processors.

  • Lock-Free Algorithm
  • Max-Flow Problem
  • Multi-Threaded
  • Parallel Algorithms
  • Vertex Properties

Uploaded on Feb 22, 2025 | 1 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. A Lock-free Multi-threaded Algorithm for the Max-flow Problem Bo Hong Electrical and Computer Engineering Department Drexel University bohong@coe.drexel.edu http://www.ece.drexel.edu/faculty/bohong

  2. The Max-flow Problem 4 a b 6 5 8 4 S c t 7 6 4 3 Find: maximum flow from s to t d Subject to: edge capacity constraints zero net-flow for u V- {s,t} Bo Hong 2

  3. Existing algorithms Sequential Algorithms Augmenting Path Ford-Fulkerson , pseudo-polynomial Edmonds and Karp, O(|V| |E|2) Dinitz, O(|V|2 |E|) Preflow Push Karzanov, O(|V|3) Push-Relabel Goldberg, O(|V|2 |E|), with dynamic trees O(|V| |E| log(|V|2 |E|) ) Parallel Algorithms Shiloach, etc. O(|V|2 log|V| ) with |V|-processor PRAM Goldberg, O(|V|2 log|V| ) with |V|-processor PRAM Anderson, etc. Global relabeling Bader, etc. Gap relabeling 3 Bo Hong

  4. Two vertex properties a b c t S a b 3 1 d c t S 3 0 d Excessive flow: the net flow into a vertex e.g. e(c) = 5 Every vertex has an integer valued height e.g. h(c) = 2 Bo Hong 4

  5. Existing parallel algorithms a b a b c t S c t S d d Push: Lift: applicable when e(a)>0 and there exists cf(a,v) > 0 and h(v)=h(a)-1 Actions: Lock a and v a->v still pushable? d = min( e(a), cf(a,v) ) e(a) = e(a) d e(v) = e(v) + d cf(a,v) = cf(a,v) d cf(v,a) = cf(v,a) + d Unlock a and v applicable when e(c)>0 and all cf(c,x) > 0 implies h(x) h(c) Actions: Lock v v = lowest such vertex x h(c) = h(v) + 1 Unlock v Bo Hong 5

  6. Impact of locking Locks protect shared accesses But locks are expensive P1 P2 Lock x x+1 Unlock Lock x x+1 Unlock 16 Lock acquisition time (us) 14 12 Read x Increase 1 Update x Actual 10 T 8 6 4 time Read x Increase 1 Update x Ideal 2 0 3 5 7 9 11 13 15 Number of processors Bo Hong 6

  7. New lock-free algoritm: model of the architecture SMP computer with multiple processors sharing the memory Multi-processor systems Multi-core systems Supports atomic fetch-and-add instruction Supports sequential consistency P1 P2 x x+c1 x x+c2 x x+c3 x x+c4 Eventual result x x+c1+c2+c3+c4 not matter how exactly the instructions were interleaved. 7 Bo Hong

  8. New algorithm: two basic lock-free operations a b a b c t S c t S d d Push: Lift: applicable when e(a)>0 and there exists cf(a,x) > 0 and h(x)<h(a) Actions: v = lowest such vertex x d = min( e(a), cf(a,v) ) e(a) = e(a) d e(v) = e(v) + d cf(a,v) = cf(a,v) d cf(v,a) = cf(v,a) + d applicable when e(c)>0 and all cf(c,x) > 0 implies h(x) h(c) Actions: v = lowest such vertex x h(c) = h(v) + 1 Bo Hong 8

  9. The algorithm Initialize h(u), e(u), and f(u,v) h(s) = |V| h(u) = 0 for u V {s} f(s,u) = c(s,u) e(u) = c(s,u) f(u,v) = 0, otherwise a b c t S d While there exists applicable push or lift operations execute the push or lift operations asynchronously Bo Hong 9

  10. Asynchronous execution of the basic operations P1 P2 while e(u) > 0 e = e(u) h = for each (u,v) s.t. cf(u, v) > 0 if h(v) < h h = h(v) v = v if h(u) > h d = min ( e , cf(u, v ) ) cf(u, v ) = cf(u, v ) + d cf(v , u) = cf(v , u) d e(u) = e(u) d e(v ) = e(v ) + d else h(u) = h + 1 while e(u) > 0 e = e(u) h = for each (u,v) s.t. cf(u, v) > 0 if h(v) < h h = h(v) v = v if h(u) > h d = min ( e , cf(u, v ) ) cf(u, v ) = cf(u, v ) + d cf(v , u) = cf(v , u) d e(u) = e(u) d e(v ) = e(v ) + d else h(u) = h + 1 Bo Hong 10

  11. Seems rather chaotic? . . . Not really P1 P2 while e(u) > 0 e = e(u) h = for each (u,v) s.t. cf(u, v) > 0 if h(v) < h h = h(v) v = v if h(u) > h d = min ( e , cf(u, v ) ) cf(u, v ) = cf(u, v ) + d cf(v , u) = cf(v , u) d e(u) = e(u) d e(v ) = e(v ) + d else h(u) = h + 1 time or time Bo Hong 11

  12. An invariant property of the algorithm As long as cf(u,v) and e(u) are updated atomically, we always have h(u) h(v) + 1 for any cf(u,v) > 0, no matter how the threads are interleaved. 12 Bo Hong

  13. Optimality of the algorithm If any e(u) > 0, then the algorithm will not terminate Property of the push and lift operations If the algorithm terminates, then there is no path from s to t in the residual graph Proof by contradiction, if such path exists, then the invariant property of function f has to be broken If the algorithm terminates, it finds a maximum flow Termination implies all e(u)=0, meaning this is a feasible flow. No path from s to t, by max-flow min-cut theorem, it has to be a maximum flow 13 Bo Hong

  14. Convergence of the algorithm (complexity bound) For any u s.t. e(u) > 0, there exists a path from u to s in the residual graph Property of network flow The height of any vertex is less than 2|V| - 1 The longest path can have at most |V| vertices The total number of lift operations is bound by 2|V|2-|V| Bound by the height of vertices The total number of saturated pushes is bound by (2|V|-1) |E| Bound by the total number of lift operations The total number of un-saturated pushes is bound by 4|V|2 |E| Bound by the number of lift and saturated pushes Therefore the algorithm terminates with O(|V|2 |E|) operations 14 Bo Hong

  15. Lock-free termination detection The algorithm terminates when e(u) = 0 for all u V {s,t} e(u) = 0 at a single thread is insufficient to terminate the thread An elegant solution: The net flow out of source s decreases monotonically The net flow into sink t increases monotonically When the two values become equal, we must have e(u) = 0 for all u V {s,t}, a necessary and sufficient termination condition. 15 Bo Hong

  16. Experimental results 1.5 1.5 1 Speedup Speedup 1 0.5 2 Threads 4 Threads Lock-Free Lock-Based 0 0.5 1 2 4 600 800 1000 1200 Number of Threads Number of Nodes Comparison Against Classical Lock-Based Algorithm Scalability of the Lock-Free Algorithm Execution results on 2-way SMP with 3.2GHz Intel Xeon Processors 4-thread results obtained when hyper-threading was enabled 16 Bo Hong

  17. Summary and future work Developed a lock-free multi-threaded algorithm for the max-flow problem having the same complexity bound as existing parallel algorithms eliminated lock usages thereby improving thread-level parallelism 20% improvement over existing lock-based parallel algorithms Results indicate the effectiveness of algorithmic method in reducing synchronization overheads Future work Load balancing across the threads: vertex to thread assignment, static or dynamic or hybrid? Optimize cache usages Reduce the number of operations via global and gap relabling What if edge capacities are floating-point? 17 Bo Hong

Related


More Related Content