Count the occurrences of numbers in a CUDA array

We compare two approaches to counting the occurrences of numbers in a CUDA array. Both use CUDA Thrust: (1) thrust::counting_iterator and thrust::upper_bound, following the Thrust histogram example; (2) thrust::unique_copy and thrust::upper_bound. A fully worked example is available on our GitHub page. The first approach proved to be the faster one. On an NVIDIA GTX 960 card, we measured the following timings for N = 1048576 array elements: First ap...
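
The counting idea behind both approaches can be sketched on the host with standard-library algorithms standing in for their Thrust counterparts. This is only an illustration under stated assumptions, not the code from the GitHub example: the function names dense_counts and sparse_counts are hypothetical, a plain loop replaces thrust::counting_iterator, and the values are assumed to be non-negative integers.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <numeric>
#include <utility>
#include <vector>

// Dense variant (hypothetical name): one bin per integer in [0, max].
// A plain loop over the bin index stands in for thrust::counting_iterator.
std::vector<int> dense_counts(std::vector<int> data) {
    if (data.empty()) return {};
    std::sort(data.begin(), data.end());
    assert(data.front() >= 0);  // assumption: non-negative values
    const int num_bins = data.back() + 1;
    std::vector<int> counts(num_bins);
    for (int bin = 0; bin < num_bins; ++bin) {
        // Cumulative histogram: number of elements <= bin.
        counts[bin] = static_cast<int>(
            std::upper_bound(data.begin(), data.end(), bin) - data.begin());
    }
    // Differences of consecutive cumulative counts give the per-bin counts.
    std::adjacent_difference(counts.begin(), counts.end(), counts.begin());
    return counts;
}

// Sparse variant (hypothetical name): only values that actually occur,
// found with unique_copy on the sorted data, then counted via upper_bound.
std::vector<std::pair<int, int>> sparse_counts(std::vector<int> data) {
    std::sort(data.begin(), data.end());
    std::vector<int> values;
    std::unique_copy(data.begin(), data.end(), std::back_inserter(values));
    std::vector<std::pair<int, int>> result;
    int previous = 0;
    for (int v : values) {
        const int cumulative = static_cast<int>(
            std::upper_bound(data.begin(), data.end(), v) - data.begin());
        result.emplace_back(v, cumulative - previous);
        previous = cumulative;
    }
    return result;
}
```

The dense variant returns a count for every integer up to the maximum, including zeros for missing values, while the sparse variant returns (value, count) pairs only for values that are present.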

Sorting 2 or 3 arrays by key with CUDA Thrust

We have compared two approaches to sorting several arrays by the same key. One approach uses thrust::zip_iterator and the other thrust::gather. We tested them on two arrays and on three arrays. In both cases, the approach using thrust::gather proved to be faster. The full code is available on our GitHub website: 2 Arrays solution; 3 Arrays solution. Some timing results follow (NVIDIA GTX 960 card): Timing in the case of 2 arrays for...
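
The gather-based idea can be illustrated on the host: sort an index sequence by the key once, then reuse the resulting permutation to reorder every array. The sketch below uses standard C++ rather than Thrust, and the names sorting_permutation and gather_with are hypothetical; it is meant to show the access pattern, not to reproduce the GitHub code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the permutation that sorts `keys` in ascending order.
// On the GPU this corresponds to sorting an index sequence by key.
template <typename Key>
std::vector<std::size_t> sorting_permutation(const std::vector<Key>& keys) {
    std::vector<std::size_t> perm(keys.size());
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::stable_sort(perm.begin(), perm.end(),
                     [&keys](std::size_t a, std::size_t b) {
                         return keys[a] < keys[b];
                     });
    return perm;
}

// out[i] = src[perm[i]], the same access pattern as thrust::gather.
template <typename T>
std::vector<T> gather_with(const std::vector<std::size_t>& perm,
                           const std::vector<T>& src) {
    assert(perm.size() == src.size());
    std::vector<T> out(src.size());
    for (std::size_t i = 0; i < perm.size(); ++i) {
        out[i] = src[perm[i]];
    }
    return out;
}
```

Because the permutation is computed once, adding a third or fourth array costs only one extra gather, and the arrays may have different element types.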

Customized Stream Compaction

Stream compaction consists of removing undesired elements from a collection according to a predicate. For example, given an array of integers and the predicate p(x)=x>5, the array A={6,3,2,11,4,5,3,7,5,77,94,0} is compacted to B={6,11,7,77,94}. The general idea behind stream compaction approaches is to assign a separate computational thread to each element of the array to be compacted. Each such thread must decide whether to write its corresponding element to the output array de...
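
As a point of reference, the textbook flag, scan and scatter formulation of stream compaction can be written sequentially as below. This is a host-side sketch, not the customized approach of the post: the function name compact is hypothetical, and each loop iteration plays the role that one thread would play on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential sketch of flag / exclusive-scan / scatter compaction.
template <typename T, typename Pred>
std::vector<T> compact(const std::vector<T>& in, Pred keep) {
    const std::size_t n = in.size();

    // Step 1: one flag per element (one thread per element on a GPU).
    std::vector<std::size_t> flag(n);
    for (std::size_t i = 0; i < n; ++i) {
        flag[i] = keep(in[i]) ? 1 : 0;
    }

    // Step 2: an exclusive prefix sum of the flags gives every kept
    // element its position in the output array.
    std::vector<std::size_t> slot(n);
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) {
        slot[i] = total;
        total += flag[i];
    }

    // Step 3: scatter the kept elements to their slots.
    std::vector<T> out(total);
    for (std::size_t i = 0; i < n; ++i) {
        if (flag[i]) {
            out[slot[i]] = in[i];
        }
    }
    assert(out.size() <= n);  // compaction never grows the array
    return out;
}
```

Applied to the array A above with the predicate x>5, the sketch yields B while preserving the original order of the kept elements.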

Sorting many small “packed” arrays by key in CUDA

One problem of interest is that of extending the approach in "Sorting many small arrays by key in CUDA" to the case when multiple arrays must be ordered according to the same key. Unfortunately, it is not possible to use cub::BlockRadixSort by "packing" the arrays using zip iterators and tuples. Accordingly, we have used a helper index approach. A fully worked example is available on our GitHub website.
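
One way to picture a helper index approach is the host-side sketch below: within each small array, a local index sequence is sorted by key, and that index is then used to reorder the keys and every payload array. The layout (small arrays of equal length stored back to back), the function name sort_packed_segments and the choice of two float payloads are all assumptions made for illustration; the CUDA version on GitHub relies on cub::BlockRadixSort instead of std::stable_sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Sorts every consecutive segment of `len` elements by key and applies
// the same reordering to two payload arrays (hypothetical layout).
void sort_packed_segments(std::vector<int>& keys, std::vector<float>& v1,
                          std::vector<float>& v2, std::size_t len) {
    assert(len > 0 && keys.size() % len == 0);
    assert(v1.size() == keys.size() && v2.size() == keys.size());

    std::vector<std::size_t> helper(len);  // local helper indices
    std::vector<int> k(len);
    std::vector<float> a(len), b(len);

    // Each outer iteration corresponds to one small array
    // (one thread block in a CUDA implementation).
    for (std::size_t base = 0; base < keys.size(); base += len) {
        std::iota(helper.begin(), helper.end(), std::size_t{0});
        std::stable_sort(helper.begin(), helper.end(),
                         [&](std::size_t i, std::size_t j) {
                             return keys[base + i] < keys[base + j];
                         });
        // Use the sorted helper index to permute keys and payloads alike.
        for (std::size_t i = 0; i < len; ++i) {
            k[i] = keys[base + helper[i]];
            a[i] = v1[base + helper[i]];
            b[i] = v2[base + helper[i]];
        }
        const auto offset = static_cast<std::ptrdiff_t>(base);
        std::copy(k.begin(), k.end(), keys.begin() + offset);
        std::copy(a.begin(), a.end(), v1.begin() + offset);
        std::copy(b.begin(), b.end(), v2.begin() + offset);
    }
}
```

Only the (key, index) pair goes through the sort, so the payloads never need to be packed into tuples.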

Sorting many small arrays in CUDA

In many applications, the problem of sorting many small arrays in CUDA arises. CUB offers a possible solution to this problem. On our GitHub website, we provide an example that can be reused for this purpose. The idea is to assign the small arrays to be sorted to different thread blocks and then use cub::BlockRadixSort to sort each array. Two versions are provided, one loading and one not loading the small arrays into shared memory.