filmov
tv
Comparison of parallel and sequential sorting algorithms

Показать описание
Comparison of parallel and sequential sorting algorithms
.
.
Keywords:
Comparison of parallel sorting algorithms, Comparison of parallel and sequential sorting algorithms, Darko Božidar and Tomaž Dobravec, Faculty of Computer and Information Science, University of Ljubljana, Slovenia, Technical report,
November 2015, BITONIC
SORT, MULTISTEP
BITONIC
SORT, IBR
BITONIC
SORT, MERGE
SORT, QUICKSORT, RADIX
SORT, SAMPLE
SORT, X-axis: Binary logarithm of the sequence length, Y-axis: speedup, X-axis: Binary logarithm of the sequence length, Y-axis: sort rate in M/s,
...................................................
THE ALGORITHMS:
2.1
BITONIC
SORT
Bitonic sort was designed by Ken E. Batcher [3] and because of its simplicity is one of the most
studied algorithms on GPU. It falls into the group of sorting networks, which means, that the
sequence and direction of comparisons is determined in advance and is independent of input
sequence. Bitonic sort is based on a bitonic sequence [3, 4, 5].
2.2
MULTISTEP
BITONIC
SORT
Parallel implementation of bitonic sort is very efficient when sorting short sequences, but it
becomes slower when sorting very long sequences, because shared memory size is limited and long
bionic sequences cannot be saved into it. In order to merge long bionic sequences, global memory
has to be used instead of shared memory. Furthermore, global memory has to be accessed for every
step of bionic merge. In order to increase the speed of sort, multiple steps of bitonic merge have to
be executed with a single kernel invocation. This can be achieved with multistep bitonic sort [4].
2.3
IBR
BITONIC
SORT
Interval Based Rearrangement (IBR) bitonic sort [5] is based on adaptive bitonic sort implemented
by Bilardi et. al. [20]. Adaptive bitonic sort operates on the idea, that every bitonic sequence can be
merged, if first half of a bitonic sequence and second half of a bitonic sequence are ring-shifted by a
value called q. Value q can be found with a variation of a binary search. In order to exchange
elements in a bitonic merge step in time O(log n), a variation of a binary tree (bitonic tree [5, 20]) is
used.
2.4
MERGE
SORT
Merge sort is based on the divide-and-conquer approach. It follows this approach by splitting the
sequence into multiple subsequences, sorting them and then merging them into sorted sequence. If
the algorithm for merging sorted sequences is stable, than the whole merge sort algorithm is also
stable [22]. In our tests we used a variation of a parallel merge sort presented by Satish in [2].
2.5
QUICKSORT
Similarly as merge sort, quicksort is also based on the divide-and-conquer basis. In divide step, the
pivot has to be determined. This pivot is used to partition the input sequence A[p...r] into 2
subsequences. When the partitioning is done, pivot has to be placed in A[q], where all the elements
in subsequence A[p...q − 1] have to be lower or equal to A[q] and all the elements of A[q + 1...r]
have to be greater than A[q]. In step conquer, subsequences A[p...q-1] and A[q + 1...r] have to be
recursively sorted as described above, until subsequences of length 1 are obtained. This recursive
procedure sorts the entire sequence T [p...r] [22, 23].
In out tests we used the parallel quicksort designed by Cederman et. al. [1]. This algorithm first
determines an initial pivot, which is calculated as an average of a minimum and a maximum
element of an input sequence, as suggested by Singleton et. al. [24]. Minimum and maximum
values were calculated using parallel reduction kernel by Harris [7], which improved the
performance of sort [1]. Performance was further improved by Sengupta’s et al. scan [8]. We also
introduced a novel approach for determining a pivot.
2.6
RADIX
SORT
Radix sort is one of the fastest sorting algorithms for short keys and is the only sorting algorithm in
this report which is not comparison based. Its sequential variation first splits the elements being
sorted (numbers, words, dates, ...) into d r-bit digits. The elements are then sorted from least to most
significant digit. For this task, the sorting algorithm has to be stable, in order to preserve the order
of elements with duplicate digits. For a good performance an effective sorting algorithm has to be
used, which is usually counting sort [2, 19, 22]. We improved the performance of a parallel
algorithm by using Harris’s et al. binary scan [9].
2.7
SAMPLE
SORT
Sample sort is a sorting algorithm, which splits the input sequence into multiple smaller buckets,
until they are small enough to be sorted by alternative sort. For the parallel implementation we
chose variation of the sample sort by Dehne et. al. [6], because it is more robust for sorting different
types of input sequences than Leischner’s et. al. [13, 19].
.
.
Keywords:
Comparison of parallel sorting algorithms, Comparison of parallel and sequential sorting algorithms, Darko Božidar and Tomaž Dobravec, Faculty of Computer and Information Science, University of Ljubljana, Slovenia, Technical report,
November 2015, BITONIC
SORT, MULTISTEP
BITONIC
SORT, IBR
BITONIC
SORT, MERGE
SORT, QUICKSORT, RADIX
SORT, SAMPLE
SORT, X-axis: Binary logarithm of the sequence length, Y-axis: speedup, X-axis: Binary logarithm of the sequence length, Y-axis: sort rate in M/s,
...................................................
THE ALGORITHMS:
2.1
BITONIC
SORT
Bitonic sort was designed by Ken E. Batcher [3] and because of its simplicity is one of the most
studied algorithms on GPU. It falls into the group of sorting networks, which means, that the
sequence and direction of comparisons is determined in advance and is independent of input
sequence. Bitonic sort is based on a bitonic sequence [3, 4, 5].
2.2
MULTISTEP
BITONIC
SORT
Parallel implementation of bitonic sort is very efficient when sorting short sequences, but it
becomes slower when sorting very long sequences, because shared memory size is limited and long
bionic sequences cannot be saved into it. In order to merge long bionic sequences, global memory
has to be used instead of shared memory. Furthermore, global memory has to be accessed for every
step of bionic merge. In order to increase the speed of sort, multiple steps of bitonic merge have to
be executed with a single kernel invocation. This can be achieved with multistep bitonic sort [4].
2.3
IBR
BITONIC
SORT
Interval Based Rearrangement (IBR) bitonic sort [5] is based on adaptive bitonic sort implemented
by Bilardi et. al. [20]. Adaptive bitonic sort operates on the idea, that every bitonic sequence can be
merged, if first half of a bitonic sequence and second half of a bitonic sequence are ring-shifted by a
value called q. Value q can be found with a variation of a binary search. In order to exchange
elements in a bitonic merge step in time O(log n), a variation of a binary tree (bitonic tree [5, 20]) is
used.
2.4
MERGE
SORT
Merge sort is based on the divide-and-conquer approach. It follows this approach by splitting the
sequence into multiple subsequences, sorting them and then merging them into sorted sequence. If
the algorithm for merging sorted sequences is stable, than the whole merge sort algorithm is also
stable [22]. In our tests we used a variation of a parallel merge sort presented by Satish in [2].
2.5
QUICKSORT
Similarly as merge sort, quicksort is also based on the divide-and-conquer basis. In divide step, the
pivot has to be determined. This pivot is used to partition the input sequence A[p...r] into 2
subsequences. When the partitioning is done, pivot has to be placed in A[q], where all the elements
in subsequence A[p...q − 1] have to be lower or equal to A[q] and all the elements of A[q + 1...r]
have to be greater than A[q]. In step conquer, subsequences A[p...q-1] and A[q + 1...r] have to be
recursively sorted as described above, until subsequences of length 1 are obtained. This recursive
procedure sorts the entire sequence T [p...r] [22, 23].
In out tests we used the parallel quicksort designed by Cederman et. al. [1]. This algorithm first
determines an initial pivot, which is calculated as an average of a minimum and a maximum
element of an input sequence, as suggested by Singleton et. al. [24]. Minimum and maximum
values were calculated using parallel reduction kernel by Harris [7], which improved the
performance of sort [1]. Performance was further improved by Sengupta’s et al. scan [8]. We also
introduced a novel approach for determining a pivot.
2.6
RADIX
SORT
Radix sort is one of the fastest sorting algorithms for short keys and is the only sorting algorithm in
this report which is not comparison based. Its sequential variation first splits the elements being
sorted (numbers, words, dates, ...) into d r-bit digits. The elements are then sorted from least to most
significant digit. For this task, the sorting algorithm has to be stable, in order to preserve the order
of elements with duplicate digits. For a good performance an effective sorting algorithm has to be
used, which is usually counting sort [2, 19, 22]. We improved the performance of a parallel
algorithm by using Harris’s et al. binary scan [9].
2.7
SAMPLE
SORT
Sample sort is a sorting algorithm, which splits the input sequence into multiple smaller buckets,
until they are small enough to be sorted by alternative sort. For the parallel implementation we
chose variation of the sample sort by Dehne et. al. [6], because it is more robust for sorting different
types of input sequences than Leischner’s et. al. [13, 19].