We compare two approaches to counting the occurrences of numbers in a CUDA array.
The two approaches use CUDA Thrust:
Using thrust::counting_iterator and thrust::upper_bound, following the histogram Thrust example;
Using thrust::unique_copy and thrust::upper_bound.
A fully worked example is available on our GitHub page.
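As a rough illustration of the first approach, the sketch below follows the dense-histogram pattern of the Thrust histogram example: sort, take an upper bound per candidate value, then difference the cumulative counts. The function name `count_occurrences` and the assumption that all values lie in `[0, num_bins)` are ours, not taken from the GitHub code.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/adjacent_difference.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/copy.h>
#include <vector>

// Hypothetical helper: counts how often each value in [0, num_bins) occurs.
// Assumption: every element of `input` lies in [0, num_bins).
std::vector<int> count_occurrences(const std::vector<int>& input, int num_bins)
{
    thrust::device_vector<int> data(input.begin(), input.end());
    thrust::device_vector<int> counts(num_bins);

    // Sorting groups equal values into contiguous runs.
    thrust::sort(data.begin(), data.end());

    // For each candidate value v = 0, 1, ..., num_bins - 1, find the index
    // just past its last occurrence, i.e. the number of elements <= v.
    thrust::counting_iterator<int> bins(0);
    thrust::upper_bound(data.begin(), data.end(),
                        bins, bins + num_bins,
                        counts.begin());

    // Differences of consecutive cumulative counts are the per-value counts.
    thrust::adjacent_difference(counts.begin(), counts.end(), counts.begin());

    std::vector<int> result(num_bins);
    thrust::copy(counts.begin(), counts.end(), result.begin());
    return result;
}
```

Values that never occur simply get a count of zero, which is what distinguishes this dense variant from one based on thrust::unique_copy.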
The first approach has proven to be the faster of the two. On an NVIDIA GTX 960 card, we measured the following timings for N = 1048576 array elements:
First ap...

# Thrust

# Sorting 2 or 3 arrays by key with CUDA Thrust

We have compared two approaches to sort arrays by key, with the same key. One of those approaches uses thrust::zip_iterator and the other thrust::gather.
We have tested them when sorting two arrays and when sorting three arrays. In both cases, the approach using thrust::gather has proven faster.
The full codes are available on our GitHub website:
2 Arrays solution
3 Arrays solution
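The thrust::gather approach can be sketched as follows for two arrays: sort the keys together with an index sequence, then use the resulting permutation to reorder each array. The function name `sort_two_by_key` and the element types are assumptions made for illustration, not the code from the repository.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/sequence.h>
#include <thrust/gather.h>

// Hypothetical helper: sorts `keys` in ascending order and applies the same
// reordering to both `a` and `b`.
void sort_two_by_key(thrust::device_vector<int>& keys,
                     thrust::device_vector<float>& a,
                     thrust::device_vector<float>& b)
{
    // perm = 0, 1, 2, ...; after sort_by_key it holds, for each sorted
    // position, the original index of the element that landed there.
    thrust::device_vector<int> perm(keys.size());
    thrust::sequence(perm.begin(), perm.end());
    thrust::sort_by_key(keys.begin(), keys.end(), perm.begin());

    // Gather each array through the permutation into a scratch buffer,
    // then swap the buffer in (gather must not run in place).
    thrust::device_vector<float> tmp(keys.size());
    thrust::gather(perm.begin(), perm.end(), a.begin(), tmp.begin());
    a.swap(tmp);
    thrust::gather(perm.begin(), perm.end(), b.begin(), tmp.begin());
    b.swap(tmp);
}
```

The keys are sorted only once here, and each additional array costs one more gather.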
Some timing results follow (NVIDIA GTX 960 card):
Timing in the case of 2 arrays for...

# Customized Stream Compaction

Stream compaction consists of removing undesired elements in a collection depending on a predicate. For example, considering an array of integers and the predicate p(x)=x>5, the array A={6,3,2,11,4,5,3,7,5,77,94,0} is compacted to B={6,11,7,77,94}.
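For reference, the example above can be reproduced with the stock thrust::copy_if primitive. This is the standard library routine rather than the customized compaction discussed in this post, and the names `greater_than_five` and `compact` are ours.

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <vector>

// Predicate p(x) = x > 5 from the example in the text.
struct greater_than_five
{
    __host__ __device__ bool operator()(int x) const { return x > 5; }
};

// Hypothetical helper: keeps only the elements satisfying the predicate,
// preserving their relative order.
std::vector<int> compact(const std::vector<int>& input)
{
    thrust::device_vector<int> in(input.begin(), input.end());
    thrust::device_vector<int> out(in.size());   // worst case: everything kept

    // copy_if returns an iterator to the end of the compacted range.
    thrust::device_vector<int>::iterator out_end =
        thrust::copy_if(in.begin(), in.end(), out.begin(), greater_than_five());

    std::vector<int> result(out_end - out.begin());
    thrust::copy(out.begin(), out_end, result.begin());
    return result;
}
```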
The general idea of stream compaction approaches is to assign a separate computational thread to each element of the array to be compacted. Each such thread must decide to write its corresponding element to the output array de...

# Sorting by key with tuple key and customized comparison operator

Sorting an array by key is possible once an ordering is defined for the tuple key.
Defining an ordering is possible with CUDA Thrust by overloading the “<” comparison operator.
Accordingly, sorting by a tuple key with CUDA Thrust can be performed by a combination of thrust::sort_by_key, zip iterators and tuples.
Our GitHub web page contains an example of how this can be accomplished simply.
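A minimal sketch of the idea is given below. Instead of overloading “<”, it passes a comparison functor to thrust::sort_by_key, which is an equivalent way of supplying a custom ordering. The ordering chosen (first key component ascending, second descending) and all names are assumptions made for illustration.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/tuple.h>
#include <thrust/iterator/zip_iterator.h>

typedef thrust::tuple<int, int> Key;

// Illustrative ordering: first component ascending, ties broken by the
// second component in descending order.
struct key_less
{
    __host__ __device__ bool operator()(const Key& a, const Key& b) const
    {
        if (thrust::get<0>(a) != thrust::get<0>(b))
            return thrust::get<0>(a) < thrust::get<0>(b);
        return thrust::get<1>(a) > thrust::get<1>(b);
    }
};

// Hypothetical helper: sorts `values` by the tuple key (k0[i], k1[i]);
// the two key arrays are reordered as well.
void sort_by_tuple_key(thrust::device_vector<int>& k0,
                       thrust::device_vector<int>& k1,
                       thrust::device_vector<float>& values)
{
    // The zip iterator presents the two key arrays as one range of tuples.
    auto keys_begin = thrust::make_zip_iterator(
        thrust::make_tuple(k0.begin(), k1.begin()));

    thrust::sort_by_key(keys_begin, keys_begin + k0.size(),
                        values.begin(), key_less());
}
```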

# Sorting tuples with CUDA Thrust

Sorting tuples is possible once an ordering is defined for the tuples.
Defining an ordering is possible with CUDA Thrust by overloading the “<” comparison operator. Accordingly, sorting tuples with CUDA Thrust can be performed by a combination of thrust::sort, zip iterators and tuples.
Our GitHub web page contains an example of how this can be accomplished simply.

# CUDA Thrust saxpy with placeholders and lambda expressions

Saxpy, namely z = a * x + y, is a very common operation in scientific programming.
cuBLAS implements its own saxpy, but it overwrites y with the result (that is, it is limited to the case z = y), so in some circumstances saxpy has to be implemented using a kernel function or using CUDA Thrust.
On our GitHub website, a fully worked example shows how to implement saxpy in CUDA using Thrust and, in particular, using the placeholder technique.
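The placeholder technique amounts to writing the arithmetic expression directly as the functor argument of thrust::transform. The sketch below is a minimal version of it; the function name `saxpy` and its signature are ours, and `z` is assumed to be already sized like `x` and `y`.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>

// Hypothetical helper: z = a * x + y, written out of place so that
// neither x nor y is overwritten.
void saxpy(float a,
           const thrust::device_vector<float>& x,
           const thrust::device_vector<float>& y,
           thrust::device_vector<float>& z)
{
    using namespace thrust::placeholders;

    // _1 stands for an element of x, _2 for the matching element of y.
    thrust::transform(x.begin(), x.end(), y.begin(), z.begin(), a * _1 + _2);
}
```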
A fully worked example also shows how to implement saxpy ...

# Calling CUDA Thrust primitives from within a kernel

Starting from Thrust 1.8, CUDA Thrust primitives can be combined with the thrust::seq execution policy to run sequentially within a single CUDA thread (or sequentially within a single CPU thread).
Starting from Thrust 1.8.1, CUDA Thrust primitives can be combined with the thrust::device execution policy when called from within a single CUDA thread, in which case they run in parallel by exploiting CUDA dynamic parallelism.
An example of both is reported on our GitHub website.
The example performs reductions of the rows of a matri...
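The thrust::seq case can be sketched as a kernel in which each thread reduces one row of a row-major matrix sequentially. The kernel name `row_sums`, the host wrapper `reduce_rows`, and the launch configuration are assumptions for illustration and are not taken from the GitHub example.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/execution_policy.h>
#include <thrust/copy.h>
#include <vector>

// One thread per row; each thread runs thrust::reduce sequentially on its row.
__global__ void row_sums(const float* matrix, float* sums, int nrows, int ncols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < nrows)
        sums[row] = thrust::reduce(thrust::seq,
                                   matrix + row * ncols,
                                   matrix + (row + 1) * ncols);
}

// Hypothetical host wrapper: returns the sum of each row of a row-major matrix.
std::vector<float> reduce_rows(const std::vector<float>& matrix, int nrows, int ncols)
{
    thrust::device_vector<float> d_matrix(matrix.begin(), matrix.end());
    thrust::device_vector<float> d_sums(nrows);

    int threads = 128;                                // assumed block size
    int blocks = (nrows + threads - 1) / threads;
    row_sums<<<blocks, threads>>>(thrust::raw_pointer_cast(d_matrix.data()),
                                  thrust::raw_pointer_cast(d_sums.data()),
                                  nrows, ncols);
    cudaDeviceSynchronize();

    std::vector<float> sums(nrows);
    thrust::copy(d_sums.begin(), d_sums.end(), sums.begin());
    return sums;
}
```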

# Find the minima of the columns of a matrix along with their corresponding row indices with CUDA Thrust

Suppose you have a matrix and want to find the minima (or, of course, the maxima) of its columns along with their corresponding row indices.
For example, if you have the following matrix:
[ 0 1 12 18 20 3 10 8 ]
[ 5 15 1 11 12 17 12 10 ]
[ 18 20 15 20 6 8 18 13 ]
[ 18 20 3 18 19 6 19 8 ]
[ 6 10 8 16 14 11 12 1 ]
[ 12 9 12 17 10 16 1 4 ]
you want the following output:
Min position = 0; Min value = 0
Min position = 0; Min value = 1
Min position = 1; Min value = 1
Min positio...

# Find the minima of the rows of a matrix along with their corresponding column indices with CUDA Thrust

Suppose you have a matrix and want to find the minima (or, of course, the maxima) of its rows along with their corresponding column indices.
For example, if you have the following matrix:
[ 0 1 12 18 20 3 10 8 ]
[ 5 15 1 11 12 17 12 10 ]
[ 18 20 15 20 6 8 18 13 ]
[ 18 20 3 18 19 6 19 8 ]
[ 6 10 8 16 14 11 12 1 ]
[ 12 9 12 17 10 16 1 4 ]
you want the following output:
Min position = 0; Min value = 0
Min position = 2; Min value = 1
Min position = 4; Min value = 6
Min positio...
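One possible way to obtain such an output is a segmented reduction over (value, column index) pairs with thrust::reduce_by_key, using the row index as the key. The sketch below assumes a row-major matrix of integers; all names are ours, and the approach is not necessarily the one used in the GitHub code.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/tuple.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <vector>

typedef thrust::tuple<int, int> ValueAndColumn;   // (value, column index)

// Maps a linear index to its row (the reduction key).
struct row_index
{
    int ncols;
    explicit row_index(int n) : ncols(n) {}
    __host__ __device__ int operator()(int i) const { return i / ncols; }
};

// Maps a linear index to its column.
struct column_index
{
    int ncols;
    explicit column_index(int n) : ncols(n) {}
    __host__ __device__ int operator()(int i) const { return i % ncols; }
};

// Keeps the pair with the smaller value; ties go to the smaller column.
struct smaller_value
{
    __host__ __device__
    ValueAndColumn operator()(const ValueAndColumn& a, const ValueAndColumn& b) const
    {
        if (thrust::get<0>(b) < thrust::get<0>(a)) return b;
        if (thrust::get<0>(a) < thrust::get<0>(b)) return a;
        return thrust::get<1>(b) < thrust::get<1>(a) ? b : a;
    }
};

// Hypothetical helper: per-row minimum and its column for a row-major matrix.
void row_minima(const std::vector<int>& matrix, int nrows, int ncols,
                std::vector<int>& min_values, std::vector<int>& min_columns)
{
    thrust::device_vector<int> d_matrix(matrix.begin(), matrix.end());
    thrust::device_vector<int> d_values(nrows), d_columns(nrows);

    thrust::counting_iterator<int> linear(0);
    auto rows = thrust::make_transform_iterator(linear, row_index(ncols));
    auto cols = thrust::make_transform_iterator(linear, column_index(ncols));

    // Consecutive elements with the same row key are reduced together.
    thrust::reduce_by_key(
        rows, rows + nrows * ncols,
        thrust::make_zip_iterator(thrust::make_tuple(d_matrix.begin(), cols)),
        thrust::make_discard_iterator(),              // reduced keys not needed
        thrust::make_zip_iterator(thrust::make_tuple(d_values.begin(),
                                                     d_columns.begin())),
        thrust::equal_to<int>(), smaller_value());

    min_values.resize(nrows);
    min_columns.resize(nrows);
    thrust::copy(d_values.begin(), d_values.end(), min_values.begin());
    thrust::copy(d_columns.begin(), d_columns.end(), min_columns.begin());
}
```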

# Computing the Euclidean distances between corresponding rows of matrices with CUDA

Calculating the Euclidean distances between homologous rows of two matrices is an easy task if CUDA Thrust is employed.
Indeed, this computation is an application of thrust::reduce_by_key.
A fully worked example is available on our GitHub website.
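A minimal sketch of how thrust::reduce_by_key can be applied here: compute the squared element-wise differences, sum them row by row using the row index as the key, and take the square root. Row-major storage and every name below are assumptions made for illustration.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/discard_iterator.h>
#include <vector>

struct squared_difference
{
    __host__ __device__ float operator()(float a, float b) const
    {
        float d = a - b;
        return d * d;
    }
};

// Maps a linear index of a row-major matrix to its row (the reduction key).
struct row_of
{
    int ncols;
    explicit row_of(int n) : ncols(n) {}
    __host__ __device__ int operator()(int i) const { return i / ncols; }
};

struct square_root
{
    __host__ __device__ float operator()(float x) const { return sqrtf(x); }
};

// Hypothetical helper: Euclidean distance between row i of A and row i of B,
// for two row-major nrows x ncols matrices.
std::vector<float> row_distances(const std::vector<float>& A,
                                 const std::vector<float>& B,
                                 int nrows, int ncols)
{
    thrust::device_vector<float> a(A.begin(), A.end());
    thrust::device_vector<float> b(B.begin(), B.end());

    // Element-wise squared differences.
    thrust::device_vector<float> sq(nrows * ncols);
    thrust::transform(a.begin(), a.end(), b.begin(), sq.begin(),
                      squared_difference());

    // Sum the squared differences of each row; keys are generated on the fly.
    auto keys = thrust::make_transform_iterator(
        thrust::make_counting_iterator(0), row_of(ncols));
    thrust::device_vector<float> dist(nrows);
    thrust::reduce_by_key(keys, keys + nrows * ncols, sq.begin(),
                          thrust::make_discard_iterator(), dist.begin());

    // Square root of each row sum gives the distance.
    thrust::transform(dist.begin(), dist.end(), dist.begin(), square_root());

    std::vector<float> result(nrows);
    thrust::copy(dist.begin(), dist.end(), result.begin());
    return result;
}
```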
