Efficient Parallel Sorting Methods and Issues

"Explore the benefits and challenges of parallel sorting techniques in programming. Learn about comparison-based algorithms, distribution of elements between processes, and strategies for sorting with one or multiple elements per process. Discover insights on merging sorted data efficiently in parallel computing."

  • Parallel Sorting
  • Comparison Algorithms
  • Data Distribution
  • Efficient Merging

Presentation Transcript


  1. Parallel Programming - Sorting
  David Monismith, CS599. Notes are primarily based upon Introduction to Parallel Computing, Second Edition, by Grama, Gupta, Karypis, and Kumar.

  2. Sorting
  Sorting is useful because sorted data is often easier to manipulate than unsorted data. It arranges elements into ascending or descending order. A sort may be internal or external: it can be performed entirely in main memory, or it must use secondary memory. A sort may also be comparison-based or non-comparison-based. Compare-and-swap methods such as merge sort have a time complexity of Θ(n lg n) in the best case; some comparison algorithms, like insertion sort, have O(n²) complexity. Non-comparison methods such as radix sort, which exploit a known property of the data, have a time complexity of Θ(n) in the best case. Our focus will be on comparison-based algorithms.

  3. Issues in Parallelizing Sorting
  Distributing elements between processes: assume the input data are distributed among the processes. It may be necessary to enforce a global ordering to ensure ordered results. Example: we may need to require that all elements in process P_j are less than or equal to those in process P_i if j < i. Note that it is easy to compare and swap elements within the same process; however, communication is required when the elements are not within the same process.

  4. One Element Per Process
  What if each process contains only one element from the set to be sorted? Given a process P_j with element a_j and a process P_i with element a_i, if j < i and a_j > a_i, then P_j should contain a_i and P_i should contain a_j after a compare-and-swap. Note that this method requires both elements to be sent to both processes for the comparison to occur. This means that if the communication takes more time than the comparison, the method is inefficient, and this is true for most computers today. So, in general, using one element per process is inefficient.
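  A minimal sketch of such a compare-exchange in MPI (the function and parameter names here are illustrative, not from the slides; it assumes both ranks call it with each other as partner):

  #include <mpi.h>

  //Exchange single elements with a partner rank; the lower-ranked
  //process passes keepMin = 1 and retains the minimum, while the
  //higher-ranked process passes keepMin = 0 and retains the maximum.
  void compareExchangeOneElement(int * myElem, int partner, int keepMin,
                                 MPI_Comm comm) {
    int theirElem;
    //Both elements travel in both directions, as noted above.
    MPI_Sendrecv(myElem, 1, MPI_INT, partner, 0,
                 &theirElem, 1, MPI_INT, partner, 0,
                 comm, MPI_STATUS_IGNORE);
    if(keepMin)
      *myElem = (*myElem < theirElem) ? *myElem : theirElem;
    else
      *myElem = (*myElem > theirElem) ? *myElem : theirElem;
  }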

  5. Multiple Elements Per Process
  We want to sort a large amount of data with few processes, where "few" is much less than "large". Assign n/p elements to each process, where n is the total number of elements and p is the total number of processes. If we assume the elements stored within each process are already sorted, we can have each process merge its data with its neighbor's data. In an ascending sort, the process with the smaller rank should keep the smaller elements, and the process with the larger rank should keep the larger elements. Given a bidirectional communication link, each such step takes Θ(n/p) time and Θ(p) steps are required, as shown previously with the odd-even sort. The merge routine appears on the next slide.

  6. Merging Elements

  //Assume both input arrays are already sorted. If low_or_high is odd,
  //this process keeps the lower half of the merged result; if it is
  //even, it keeps the upper half.
  void mergeArrays(int * arr, int * neighborArr, int * temp, int size,
                   int low_or_high) {
    int i, j, k;
    for(i = 0, j = 0, k = 0; i < size*2; i++) {
      if(j < size && k < size) {
        //Both arrays still have elements; take the smaller.
        if(arr[j] < neighborArr[k]) {
          temp[i] = arr[j];
          j++;
        } else {
          temp[i] = neighborArr[k];
          k++;
        }
      } else if(j < size) {
        //neighborArr is exhausted; copy the rest of arr.
        temp[i] = arr[j];
        j++;
      } else {
        //arr is exhausted; copy the rest of neighborArr.
        temp[i] = neighborArr[k];
        k++;
      }
    }
    //Keep the lower half (odd low_or_high) or the upper half (even).
    if(low_or_high % 2 != 0)
      for(i = 0; i < size; i++)
        arr[i] = temp[i];
    else
      for(i = size, j = 0; j < size; i++, j++)
        arr[j] = temp[i];
  }
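  The slides do not show the communication that surrounds mergeArrays. A minimal compare-split sketch, assuming MPI, that the neighbor's block and the scratch buffer are preallocated with length size, and that compareSplit and keepLow are illustrative names rather than code from the slides:

  #include <mpi.h>

  //Exchange whole sorted blocks with a neighbor, then merge and keep one
  //half. Pass an odd keepLow on the lower-ranked process to keep the
  //smaller elements, and an even value on the higher-ranked process.
  void compareSplit(int * arr, int * neighborArr, int * temp, int size,
                    int neighborRank, int keepLow, MPI_Comm comm) {
    MPI_Sendrecv(arr, size, MPI_INT, neighborRank, 0,
                 neighborArr, size, MPI_INT, neighborRank, 0,
                 comm, MPI_STATUS_IGNORE);
    mergeArrays(arr, neighborArr, temp, size, keepLow);
  }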

  7. Sorting Networks
  Sorting networks are designed to sort n elements in less than Θ(n lg n) time, assuming many comparisons are performed in parallel. Such networks are built from comparators, each of which may be increasing or decreasing. The depth of a network is the number of columns in the network.
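  As an illustration (a sketch, not code from the slides), a comparator can be modeled as a small function that orders two values in a chosen direction:

  //Order the two values: ascending when increasing is nonzero,
  //descending otherwise.
  void comparator(int * x, int * y, int increasing) {
    if((*x > *y) == (increasing != 0)) {
      int t = *x;
      *x = *y;
      *y = t;
    }
  }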

  8. Bitonic Sort
  Sorts n elements in Θ(lg² n) time. Rearranges a bitonic sequence into a sorted sequence. A bitonic sequence has the property that there is an index i up to which the sequence is monotonically increasing and after which the sequence is monotonically decreasing, or a cyclic index shift of the sequence can produce such an ordering. Examples: 1, 2, 3, 4, 0, -1 is bitonic; 9, 7, 2, -1, 3, 4 is bitonic because the cyclic shift -1, 3, 4, 9, 7, 2 increases and then decreases.

  9. Bitonic Sort
  A bitonic split divides a length-n bitonic sequence into two bitonic sequences defined by:
  s1 = min(a_0, a_{n/2}), min(a_1, a_{n/2+1}), ..., min(a_{n/2-1}, a_{n-1})
  s2 = max(a_0, a_{n/2}), max(a_1, a_{n/2+1}), ..., max(a_{n/2-1}, a_{n-1})
  Every element of s1 is less than or equal to every element of s2, so the two halves can be handled independently. It is possible to recursively split the shorter bitonic sequences until reaching sequences of size 1, at which point the entire sequence is monotonically increasing. This operation requires lg n splits and is referred to as a bitonic merge. An example of a bitonic merging network is provided in one of the following slides.
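  A minimal recursive sketch of the bitonic merge in C (assuming n is a power of two; the names are illustrative, not from the slides):

  //Sort the bitonic sequence a[0..n-1]: ascending when dir is nonzero,
  //descending otherwise.
  void bitonicMerge(int * a, int n, int dir) {
    int half, i, t;
    if(n <= 1)
      return;
    half = n / 2;
    //Bitonic split: the lower half receives the mins, the upper half
    //the maxes.
    for(i = 0; i < half; i++) {
      if((a[i] > a[i + half]) == (dir != 0)) {
        t = a[i];
        a[i] = a[i + half];
        a[i + half] = t;
      }
    }
    //Each half is bitonic; recursively split until size 1.
    bitonicMerge(a, half, dir);
    bitonicMerge(a + half, half, dir);
  }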

  10. Bitonic Sort
  An unsorted sequence of elements can be viewed as a concatenation of length-two bitonic sequences, alternately increasing and decreasing. By iteratively merging these into longer alternating sequences, it is possible to turn the entire input into a single bitonic sequence; then the bitonic merge from the previous slide can be applied. Notice that the full sorting algorithm requires O(n lg² n) operations. A picture of the network is provided on the next slide.

  11. Bitonic Sort
  A bitonic sorting network (figure from the source below). The left half of the network generates a bitonic sequence; arrows indicate where the larger element should be stored. Notice that the second half of the network is the bitonic merge. Source: http://en.wikipedia.org/wiki/Bitonic_sorter

  12. Example
  Using the merging (second) half of the network, sort the following bitonic sequence: 3, 8, 12, 34, 41, 47, 59, 82, 79, 64, 12, 11, 10, 8, 4, -3. After sorting this sequence, we will look at OpenMP-parallelized code for the bitonic sort (sketched below). It is non-trivial to map bitonic sort to a ring, mesh, or hypercube. Note that bitonic sort, while a better option than an O(n²) sort, is not the best sorting option in general.
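  The OpenMP code itself is not included in this transcript. A minimal sketch of an iterative bitonic sort with an OpenMP-parallel compare-exchange stage (the iterative formulation and all names here are assumptions, not the slides' code; n must be a power of two):

  #include <omp.h>

  //Order a[i] and a[j]: ascending when dir is nonzero, descending
  //otherwise.
  static void compareExchange(int * a, int i, int j, int dir) {
    if((a[i] > a[j]) == (dir != 0)) {
      int t = a[i];
      a[i] = a[j];
      a[j] = t;
    }
  }

  void bitonicSortOmp(int * a, int n) {
    int i, j, k;
    for(k = 2; k <= n; k <<= 1) {        //length of the bitonic sequences
      for(j = k >> 1; j > 0; j >>= 1) {  //compare distance in this column
        //Pairs (i, i^j) are disjoint, so all compare-exchanges in a
        //column can run in parallel.
        #pragma omp parallel for
        for(i = 0; i < n; i++) {
          int partner = i ^ j;
          if(partner > i)
            //Ascending where bit k of i is zero, descending otherwise,
            //so adjacent blocks form longer bitonic sequences.
            compareExchange(a, i, partner, (i & k) == 0);
        }
      }
    }
  }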

  13. Recall Odd-Even Sort
  Previously we looked at the odd-even sort, which is a parallelized version of bubble sort. Because odd-even sort is a variant of bubble sort, it is impossible to improve the number of operations required below O(n²). Even when using an O((n/p) lg(n/p)) local sort with MPI, odd-even sort scales poorly. It is cost-optimal only when p = O(lg n).
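  For reference, a serial sketch of the odd-even transposition pattern (an illustration, not the MPI version discussed earlier; in the parallel form each compare-swap becomes a compare-split between neighboring processes):

  //n phases of alternating even and odd compare-swaps sort a[0..n-1].
  void oddEvenSort(int * a, int n) {
    int phase, i, t;
    for(phase = 0; phase < n; phase++) {
      //Even phases compare pairs (0,1), (2,3), ...; odd phases compare
      //pairs (1,2), (3,4), ...
      for(i = phase % 2; i + 1 < n; i += 2) {
        if(a[i] > a[i + 1]) {
          t = a[i];
          a[i] = a[i + 1];
          a[i + 1] = t;
        }
      }
    }
  }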

  14. Quick Sort
  A divide-and-conquer algorithm: it recursively divides a sequence into smaller subsequences around a pivot element. The following algorithm is based on en.wikipedia.org/wiki/Quicksort.

  15. Quick Sort Algorithm

  //Sort arr[lo..hi] in place: partition around a pivot, then recurse on
  //the elements to either side of it.
  void quicksort(int * arr, int lo, int hi) {
    if(lo < hi) {
      int p = partition(arr, lo, hi);
      quicksort(arr, lo, p - 1);
      quicksort(arr, p + 1, hi);
    }
  }

  16. Quick Sort Algorithm

  //Swap helper used below (implied but not shown in the slides).
  void swap(int * a, int * b) {
    int t = *a;
    *a = *b;
    *b = t;
  }

  //Lomuto-style partition: move the pivot to arr[hi], sweep the range
  //moving smaller elements to the front, and return the pivot's final
  //index.
  int partition(int * arr, int lo, int hi) {
    int i, storeIdx = lo;
    int pivotIdx = choosePivot(arr, lo, hi);
    int pivotVal = arr[pivotIdx];
    swap(&arr[pivotIdx], &arr[hi]);
    for(i = lo; i < hi; i++) {
      if(arr[i] < pivotVal) {
        swap(&arr[i], &arr[storeIdx]);
        storeIdx++;
      }
    }
    swap(&arr[storeIdx], &arr[hi]);
    return storeIdx;
  }

  17. Quicksort Median of Three

  //Return the index of the median of arr[lo], arr[mid], and arr[hi] by
  //ordering the three candidate indices.
  int choosePivot(int * arr, int lo, int hi) {
    int mid = lo + (hi - lo)/2;
    int temp;
    if(arr[mid] < arr[lo]) { temp = mid; mid = lo; lo = temp; }
    if(arr[hi] < arr[mid]) {
      temp = mid; mid = hi; hi = temp;
      if(arr[mid] < arr[lo]) { temp = mid; mid = lo; lo = temp; }
    }
    return mid;
  }

  18. Parallelization
  Select a pivot and broadcast it to all processes. After receiving the pivot, each process rearranges its data into two local blocks: elements less than the pivot (L_i) and elements greater than the pivot (G_i). Next, a global rearrangement is performed so that elements smaller than the pivot are at the beginning of the distributed array and those larger than the pivot are at the end. Then the algorithm partitions the processes into two groups: those responsible for sorting the smaller elements and those responsible for sorting the larger elements. Recursion ends when a particular block of elements is assigned to only one process; at that point, the elements can be sorted with a serial quicksort. A sketch of the local rearrangement step follows.
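  A minimal sketch of the local rearrangement (the function name and in-place convention are assumptions, not the slides' code): after the call, the first returned-count elements of local form L_i and the remainder form G_i.

  //Partition this process's block around the broadcast pivot; returns
  //the size of L_i.
  int localPartition(int * local, int n, int pivot) {
    int i, store = 0, t;
    for(i = 0; i < n; i++) {
      if(local[i] < pivot) {
        t = local[i];
        local[i] = local[store];
        local[store] = t;
        store++;
      }
    }
    return store;
  }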

  19. Partitioning
  Partitioning processes for quicksort is done according to the relative sizes of the smaller (S) and larger (L) blocks. The first ceil(|S|·p/n + 0.5) processes are used to sort the smaller elements (S), and the remaining processes are assigned to sort the larger elements. A prefix sum over the local block sizes is used to determine the index at which each process's elements should be placed in the globally rearranged array.
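  A sketch of the prefix-sum step using MPI (assuming liSize is the size of this process's smaller block from the local rearrangement; the function name is illustrative):

  #include <mpi.h>

  //An exclusive prefix sum over the block sizes gives the global index
  //at which this process's smaller elements begin.
  int offsetForSmaller(int liSize, MPI_Comm comm) {
    int rank, offset = 0;
    MPI_Exscan(&liSize, &offset, 1, MPI_INT, MPI_SUM, comm);
    //MPI_Exscan leaves the result undefined on rank 0, whose offset is 0.
    MPI_Comm_rank(comm, &rank);
    if(rank == 0)
      offset = 0;
    return offset;
  }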

  20. Bucket and Sample Sort
  These sorting algorithms rely on prior knowledge about the data in the sequence to be sorted, typically that the values fall in a known range with a roughly known distribution. Data are placed in buckets representing different sub-ranges of the sequence's values. A prefix sum is useful in determining the indices at which the elements of each bucket should be placed, and sampling is useful in determining the range of values to be placed in each bucket.
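  A minimal sketch of the bucketing step, assuming values are known to lie in [minVal, maxVal) and are roughly uniformly distributed (all names are illustrative):

  //Map a value to one of nBuckets equal-width sub-ranges of
  //[minVal, maxVal); requires maxVal > minVal.
  int bucketIndex(int value, int minVal, int maxVal, int nBuckets) {
    long range = (long)maxVal - (long)minVal;
    int idx = (int)((((long)value - (long)minVal) * nBuckets) / range);
    //Clamp defensively in case a caller passes value == maxVal.
    return (idx < nBuckets) ? idx : nBuckets - 1;
  }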
