
Parallel Algorithms in CS260: Last Lecture Insights
In the last lecture of CS260, a deep dive into parallel algorithms was provided, exploring concepts such as Single-Source Shortest Path (SSSP), Dijkstra's algorithm, Bellman-Ford algorithm, and various data structures. Tools and libraries essential for implementing these algorithms were highlighted, along with discussions on frontier-based and bucket-based graph algorithms. Furthermore, the lecture covered parallel trees and the Join operation for balancing schemes. Dive into this comprehensive overview of parallel algorithms and their implementations.
Download Presentation

Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.
E N D
Presentation Transcript
CS260 CS260 parallel algorithms Yihan Sun parallel algorithms Yihan Sun Parallel data structures
Last Lecture: SSSP Find a set of vertices as the frontier, update all their neighbors Dijkstra: 1 vertex in the frontier Frontier: the one with the smallest tentative distance ? rounds to finish Bellman Frontier: all vertices ? rounds to finish Find a set of vertices as the frontier, update all their neighbors Dijkstra: 1 vertex in the frontier Bellman- -ford: n vertices in the frontier ford: n vertices in the frontier 2
Last Lecture: SSSP - -stepping: a bucket of vertices in the frontier Fontier: all vertices with tentative distances in range ? to ? + 1 Run multiple rounds of Bellman-ford until no tentative distances are updated No useful bounds about #rounds Radius Shortcut each vertex to its ? nearest neighbors, call the ?-th nearest distance to ? the radius of ?, or ?(?) Frontier: all unsettled vertices within range ?? ??= min A vertex ? can update its neighbors distances as far as ? ? + ?(?) Run two rounds of Bellman-ford until no tentative distances are updated ? stepping: a bucket of vertices in the frontier Radius- -stepping: a bucket of vertices in the frontier stepping: a bucket of vertices in the frontier ????????? ? ?? ? + ?(?) after round ? 1 ? ?log?? rounds to finish, ? = max/min edge weight 3
Useful tools Almost all algorithms mentioned in the lectures about parallel algorithms are implemented in libraries: Ligra/Ligra+ [Shun and Blelloch 13] Frontier-based graph algorithms https://github.com/jshun/ligra Julienne [Dhulipala et al. 17] Bucket-based graph algorithms https://github.com/jshun/ligra/tree/master/apps/bucketing GBBS [Dhulipala et al. 18] Parallel graph algorithm library https://github.com/ldhulipala/gbbs ASPEN [Dhulipala et al. 18] Dynamic graph processing library https://github.com/ldhulipala/aspen Almost all algorithms mentioned in the lectures about parallel algorithms are implemented in libraries: 4
Last lecture: parallel trees Join ? =Join(??,?,??) : : ?? and ?? are two trees of a certain balancing scheme, ? is an entry/node (the pivot). ??< ? < ?? It returns a valid tree ?, which is ?? ? ?? Cost: roughly the difference in height Join- -based tree algorithms based tree algorithms . ?? ??2 4 ? ? = 10 2 ? ?? ?? (Rebalance if necessary) 5
Parallel trees Join Join- -based insertion Recursively insert the key into the corresponding subtree Join back using the root Join Find which subtree does the key locate Split that subtree Join back based insertion(?,?) Return ??,?,join(??,? ,?.right) Join- -based split based split(?,?) ? ? Join2 Split the last element ? in ?? Join ?? ? and ?? using ? Join2(??,??) ?.right ?? ?? ? 6
Range query (1D) Report all entries in key range Get a tree of them: Flatten them in an array: Report all entries in key range [??,??]. . Get a tree of them: ?(log?) work and depth Flatten them in an array: ?(? + log?) work, work and depth work, ?(log?) depth depth Equivalent to using two split algorithms ?(log?) related nodes / subtrees ?? ??
Parallel Algorithms Parallel algorithms on trees using divide Recursively deal with two subtrees in parallel (or split the tree into two pieces and handle them in parallel) Combine results of recursive calls and the root (e.g., using join or join2) Usually gives polylogarithmic bounds on depth Parallel algorithms on trees using divide- -and and- -conquer scheme conquer scheme func(T, ) { if (T is empty) return base_case; M = do_something(T.root); in parallel: L=func(T.left, ); R=func(T.right, ); return combine_results(L, R, M, ) }
Get the maximum value In each node we store a key and a value. The nodes are sorted by the keys. In each node we store a key and a value. The nodes are sorted by the keys. get_max(Tree T) { if (T is empty) return ; in parallel: L=get_max(T.left); R=get_max(T.right); return max(max(L, T.root.value), R); ?(?) work and ?(log?) depth Similar algorithm work on any map-reduce function
Map and reduce Maps each entry on the tree to a certain value using function map, then reduce all the mapped values using identity Reduce is associative Assume Maps each entry on the tree to a certain value using function , then reduce all the mapped values using reduce (with identity identity). (with ). Assume map and and reduce both have constant cost. both have constant cost. e.g., for all values ??in the tree, return ?? map(?) = ?2 reduce(?,?) = ? + ?; 2 map_reduce(Tree T, function map, function reduce, value_type identity) { if (T is empty) return identity; M=map(t.root); in parallel: L=map_reduce(T.left, map, reduce, identity); R=map_reduce(T.right, map, reduce, identity); return reduce(reduce(L, M), R); ?(?) work and ?(log?) depth
Filter Select all entries in the tree that satisfy function Return a tree of all these entries Select all entries in the tree that satisfy function ? Return a tree of all these entries filter(Tree T, function f) { if (T is empty) return an empty tree; in parallel: L=filter(T.left, f); R=filter(T.right, f); if (f(T.root)) return join(L, T.root, R); else return join2(L, R); } ?(?) work and ?(log2?) depth
Construction T=build(Array A, int size) { ? =parallel_sort(A, size); return build_sorted(? ,?); } ?(?) work and ?(log?) depth T=build_sorted(ArraryA, int start, int end) { if (start == end) return an empty tree; if (start == end-1) return singleton(A[start]); mid = (start+end)/2; in parallel: L = build_sorted(A, start, mid); R = build_sorted(A, mid+1, end); return join(L, A[mid], R); ?(?log?) work and ?(log?) depth, bounded by the sorting algorithm
Output to array Output the entries in a tree Assume each tree node stores its subtree size (an empty tree has size 0) Output the entries in a tree ? to an array in its in Assume each tree node stores its subtree size (an empty tree has size 0) to an array in its in- -order order ? to_array(Tree T, array A, int offset) { if (T is empty) return; A[offset+T.left.size] = get_entry(T.root); in parallel: to_array(T.left, A, offset); to_array(T.right, A, offset+T.left.size()+1); ?.???? ?.??? ? ? The size of the left subtree ?(?) work and ?(log?) depth
Parallel algorithms for ordered sets 16
Ordered Sets Ordered sets: sets with total ordering Ordered sets: sets with total ordering I am big! I am Mr. Big! < 4 6 < 10 >? <? 93 11 26 15 A set of integers (ordered) A set of citizens in Zootopia (unordered) Useful interface: Find, previous, next, size, first, last, k-th, Filter, reduce, construction, Union, intersection, difference, symmetric difference, Useful interface:
Merging Two Sets of Size n and m (nm) Solution 1: flatten trees into arrays, merge with moving pointers: 0 4 7 8 Solution 1: flatten trees into arrays, merge with moving pointers: Cost: ? ? + ? What if ? ?? O ? ? 1 2 3 5 6 9 0 1 2 3 4 5 6 7 8 9 Solution 2: insert the entries in the smaller tree into the larger tree Solution 2: insert the entries in the smaller tree into the larger tree Cost: ? ?log? 5 7 2 9 6 What if ? = ?? 4 8 9 1 3 6 7 0 ? ?log? ? 0 4 78 18 18 What is the minimum cost? What is the minimum cost?
The Lower Bound of Merging Two Ordered Sets Choose available slots in the final result. Set 1 (size ?): Choose ? slots for the elements in the first set among all available slots in the final result. slots for the elements in the first set among all ? + ? Set 2 (size ? ?): 0 4 7 8 1 2 3 5 6 9 ? + ? ? ? + ? ? = 0 1 2 3 4 5 6 7 8 9 Result: cases cases ? ?+ ? Lower bound: Lower bound: ???? ?+? ? = ? ????
The Lower Bound of Merging Two Ordered Sets The lower bound The lower bound ? ?+ ? ? ???? When When ? log? ) ) When ? = ?, it is When n ?, it is about , it is ?(?) , it is about ? ?log? ( (e.g., when e.g., when ? = 1, it is , it is Can we give an algorithm achieving this bound? Can we give an algorithm achieving this bound?
Join-based Algorithms: Union union union(??,??) if ?1= then return ?2 if ?2= then return ?1 ?2,?2,?2 = ???????(?2) ?1,?,?1 =split(?1,?2) In parallel: ??=Union ?1,?2 ??=Union(?1,?2) return Join(??,?2,??) Base case 7 5 4 8 2 9 0 1 3 6
Join-based Algorithms: Union union union(??,??) if ?1= then return ?2 if ?2= then return ?1 ?2,?2,?2 = ???????(?2) ?1,?,?1 =split(?1,?2) In parallel: ??=Union ?1,?2 ??=Union(?1,?2) return Join(??,?2,??) 5 4 7 2 9 8 0 1 3 6 ?? ?? ?? ??
Join-based Algorithms: Union union union(??,??) if ?1= then return ?2 if ?2= then return ?1 ?2,?2,?2 = ???????(?2) ?1,?,?1 =split(?1,?2) In parallel: ??=Union ?1,?2 ??=Union(?1,?2) return Join(??,?2,??) 5 7 2 4 9 8 1 3 0 6 Union Union
Join-based Algorithms: Union union union(??,??) if ?1= then return ?2 if ?2= then return ?1 ?2,?2,?2 = ???????(?2) ?1,?,?1 =split(?1,?2) In parallel: ??=Union ?1,?2 ??=Union(?1,?2) return Join(??,?2,??) 5 2 7 1 3 6 8 0 4 9 ?? ??
Join-based Algorithms: Union union union(??,??) if ?1= then return ?2 if ?2= then return ?1 ?2,?2,?2 = ???????(?2) ?1,?,?1 =split(?1,?2) In parallel: ??=Union ?1,?2 ??=Union(?1,?2) return Join(??,?2,??) 5 2 7 1 3 6 8 0 4 9 Similarly we can implement intersection and difference.
Join-based Algorithms: Union Theorem 1. For AVL trees, red-black trees, weight-balance trees and treaps, the above algorithm of merging two balanced BSTs of sizes ? and ? (? ?) have ? ?log ?(log?log?) depth (in expectation for treaps). ? ?+ 1 work and The bound also holds for intersection and difference The bound also holds for intersection and difference 26
Cost analysis of union Using AVL as an example union union(??,??) Lemma 1. The Join work can be asymptotically bounded by its corresponding Split. if ?1= then return ?2 if ?2= then return ?1 ?2,?2,?2 = ???????(?2) ?1,?,?1 =split(?1,?2) In parallel: ??=Union ?1,?2 ??=Union(?1,?2) return Join(??,?2,??) Split cost: the height of ?1 Join cost: difference of height between ?? and ?? ??= ?2.???? ? ?????? ?? ?1 ??= ?2.??? ? ? ?????? ?? ?1 Can prove by induction The depth is ? ???????? 2steps to reach the base case, O( 1) cost for each split 27
The Split Work union union(??,??) if ?1= then return ?2 if ?2= then return ?1 ?2,?2,?2 = ???????(?2) ?1,?,?1 =split(?1,?2) In parallel: ??=Union ?1,?2 ??=Union(?1,?2) return Join(??,?2,??) Union Union 8 split 4 9 ?? 5 0 Union Union Union Union split 8 4 split 2 7 ?? 9 0 ^ 1 3 6 8 0 4 ^ 9 Union Union Union Union Union Union Union Union
Concavity: ? The Split Work ?? ?log ?? log ? ?=1 8 split 4 9 ?? 5 log|?1| = log? 0 (|?|: the size of tree ?) ?21 ?22 split 8 4 log|?21| + log|?22| 2log? split 2 7 ?? 9 0 2 log|?31| + log|?32| + log ?33+ log|?34| 4log? log|? 1| + + log|? 2 | 2 log? ? ?+ 1 The height of T2 = ?(log?) ^ 1 3 6 8 4 0 4 ^ ?32 ?33 9 ?31 ?34 2 log? + 2log? 2+ 4log? 4 + 2 log? c c 2 = ? ? log (If ?2 is perfectly balanced) log2? terms c c
How can we enumerate the splitters? Height For the For Height ? ???? For the ?- -th For ????? layers, are there th layer, there are at most layers, are there ?? ??? ?= ?? nodes? layer, there are at most ?? ? nodes nodes nodes? 1 2 4 8 16? 32? AVL tree, ? = ?? 31
How can we enumerate the splitters? Idea: group all nodes bottom up Example: AVL tree: group all nodes with height into level Idea: group all nodes bottom up Example: AVL tree: group all nodes with height ?? ? and into level ?: no more than and ?? ? : no more than ?? ?of them of them AVL tree, ? = ?? 32
How can we enumerate the splitters? Idea: group all nodes bottom up Example: AVL tree: group all nodes with height into level Idea: group all nodes bottom up Example: AVL tree: group all nodes with height ?? ? and into level ?: no more than and ?? ? : no more than ?? ?of them of them Height 6, 1 of them ? Height 5, 1 of them ? ? Lemma 3. In an AVL tree of size ?, there are at most nodes in level ?. 2? 1 Height 4, 2 of them ? Height 3, 4 of them ? Similar idea is applicable to other balancing schemes Height 2, 6 of them ? Height 1, 9 of them AVL tree, ? = ?? 33
Proof sketch Every node with height 3 has at least 2 leaf descendants, so AVL invariant Remove all leaf nodes, it is still an AVL tree, so Let ??be the number of nodes with height ? Every node with height 3 has at least 2 leaf descendants, so ?? ?? AVL invariant Remove all leaf nodes, it is still an AVL tree, so ?? ?? ? ? Obviously ?1+ ?2 ? So ?3+ ?4 ? 2 ? Lemma 3. In an AVL tree of size ?, there are at most nodes in level ?. 2? 1 AVL tree, ? = ?? 34
The Split Work 8 split 4 9 ?? 5 0 ?21 ?22 split 8 4 split 2 7 ?? ?/2+? 9 0 ^ 1 3 6 ?/4+ ? ?log? ?+? ? ? ? ?+ 1 8 = ? ?log 2log 4log 0 4 ^ ?32 ?33 9 ?31 ?34 log? + 2log? 2+ 4log? 4 + 2 log? ? ?+ 1 c c 2 = ? ? log (If ?2 is perfectly balanced) log2? terms c c
Join-based Algorithms: Union Theorem 1. For AVL trees, red-black trees, weight-balance trees and treaps, the above algorithm of merging two balanced BSTs of sizes ? and ? (? ?) have ? ?log ?(log?log?) depth (in expectation for treaps). ? ?+ 1 work and The bound also holds for intersection and difference The bound also holds for intersection and difference 36
What Are Persistence and MVCC? Persistence Preserves the previous version of itself Always yields a new version when being updated Multi Let write transactions create new versions Let ongoing queries work on old versions Why Persistence and MVCC? Persistence [DSST 86] [DSST 86]: for data structures : for data structures Multi- -version Concurrency Control (MVCC): for databases version Concurrency Control (MVCC): for databases To guarantee concurrent updates and queries to work correctly Queries work on a consistent version Writers/readers do not block each other To guarantee concurrent updates and queries to work correctly and and efficiently efficiently 38
Why Persistence and MVCC? An example of a document search engine. Riverside Whales OR dog Calories room AND magic Document 4: The University of California, Riverside is a public research university in Riverside, California. Document 1: Blue whales eat half a million calories in one mouthful. Document 2: My dog probably thinks I m magical because rooms light up when I enter them. Document 3: Banging your head against a wall for one hour burns 150 calories. A Document Database For end-user experience: Queries shouldn't be delayed by updates Queries must be done on a consistent version of database Generally useful for any database systems with concurrent updates and queries. Hybrid Transactional and Analytical Processing (HTAP) Database System 39
Persistence Using Join Path Copying occur Path- -copying: Copying occur only in Join copying: copy the affected path on trees only in Join! ! insert ?1 4 5 3 8 1 9 40
Persistence Using Join Path Copying occur Path- -copying: Copying occur only in Join copying: copy the affected path on trees only in Join! ! ?1 insert 5 4 3 8 1 9 41
Persistence Using Join Path Copying occur Always All the other parts in the algorithm remain No extra cost Safe for (MVCC) 5 Path- -copying: Copying occur only in Join Always copy All the other parts in the algorithm remain unchanged No extra cost in time asymptotically, small overhead in space Safe for concurrency (MVCC) copying: copy the affected path on trees only in Join! ! copy the middle node the middle node unchanged in time asymptotically, small overhead in space concurrency multi ?1 ?2 multi- -version concurrency control version concurrency control 5 5 3 3 8 3 1 9 4 42
Persistence for MVCC Each operate on a snapshot Multiple updates can be visible atomically Each operate on a snapshot safe for concurrency Multiple updates can be visible atomically lock safe for concurrency lock- -free free Current version ? ??: ??= ?.insert(4) commit(??) ?1 ?3 ?2 5 5 5 5 5 5 5 ??: ??= ?.delete(9) 8 3 8 3 3 8 8 8 6 4 1 9 ??: ??= ?.insert(6) 43
Transactions using Multi-version Concurrency Control (MVCC) Lock-free atomic updates A series of operations A bulk of operations (e.g., union) Easy roll-back Do not affect other concurrent operations Any operation works on as if a single- versioned tree with no extra (asymptotical) cost Worst-case ?(log?) for a lookup ?(log?) for an insertion Bank: add the total balance Bank: find all with balance>10 Insurance agent: check Bob s balance Frank: check my balance current $2 Carol Wendy Wendy +$2 Carol -$2 ?temp ?1 ?2 Carol 17 Carol 15 Carol 17 Frank 7 Bob 20 Frank 7 Alice 17 David 13 Wendy 6 Wendy 4
Transactions using Multi-version Concurrency Control (MVCC) Lock-free atomic updates A series of operations A bulk of operations (e.g., union) Easy roll-back Do not affect other concurrent operations Any operation works on as if a single- versioned tree with no extra (asymptotical) cost Bank: add the total balance Bank: find all with balance>10 Insurance agent: check Bob s balance current $2 Carol Wendy Wendy +$2 Carol -$2 ?temp ?1 ?2 Concurrent writes? Concurrent transactions work on snapshots They don t come into effect on the same tree? Useless old nodes? Out-of-date nodes should be collected in time Carol 17 Carol 15 Carol 17 Frank 7 Bob 20 Frank 7 Alice 17 David 13 Wendy 6 Wendy 4
Batching Collect all concurrent writes can commit using a single writer once a while Collect all concurrent writes can commit using a single writer once a while Insert 15 delete 16 Insert 8 Delete 6 Insert 9 Insert 5 Delete 3 Database 48
Batching Collect all concurrent writes can commit using a single writer once a while Collect all concurrent writes can commit using a single writer once a while Insert 15 delete 16 Insert 8 Delete 6 Insert 9 Insert 5 Delete 3 Batching layer Union of [5,8,9,15] Difference of [3,6,16] Database 49
Garbage Collection Reference Counter Garbage Collector Each tree node records the number of other tree nodes/pointers refers to it Node 8 and 1 in the example have reference counter 2 Reference Counter Garbage Collector ?1 ?2 Reference count: 5 5 Node Count 1 2 3 1 5 1 8 2 9 1 5 1 3 1 4 1 3 8 3 1 9 4
Garbage Collection Each tree node records the number of other tree nodes/pointers refers to it Node 8 and 1 in the example have reference counter 2 Collect a node if and only if its reference count is 1 ?1 ?2 Each tree node records the number of other tree nodes/pointers refers to it Node 8 and 1 in the example have reference counter 2 Collect a node if and only if its reference count is 1 collect(node* t) { if (!t) return; if (t->ref_cnt == 1) { node* lc = t->lc, *rc = t->rc; free(t); in parallel: collect(lc); collect(rc); } else dec(t->ref_cnt); } 1 3 5 2 1 1 1 5 5 3 8 3 1 9 4 Node Count 8 2 1 9 1 5 1 3 1 4 1
Version chains An alternative way is to use version chains Stores all versions in one (tree) skeleton Readers need to check the visibility of versions Less space used, but readers can be slow An alternative way is to use version chains 36 5 Time stamp Value 9 4 11 5 20 2 21 7 34 2 55 5 60 9 3 8 A reader starting at time 36 would check the version chain and return the value 2 at time stamp 34 1 9