Calculating the projection of a vector on a set with CUDA

It is often necessary to compute the projection of a vector v onto a set of vectors S. By "projection", we mean finding the element of S whose Euclidean distance from v is the smallest. This can be done with CUDA Thrust as follows. Assume that the vector v is

[ 0 1 12 18 20 3 10 8 5 15 ]

and suppose that the elements of S are arranged in a matrix as

[ 1 11 12 17 12 10 18 20 15 20 ]
[ 6 8 18 13 18 20 3 18 19 6 ]
[ 19 8 6 10 8 16 14 11 12 1 ]
[ 12 9 12 17 10 16 1 4 4 16 ]
[ 1 3 12 12 15 6...
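The excerpt is cut off before the method is shown, so the following is only a minimal sketch of one possible Thrust approach, not necessarily the one used in the full post. It assumes that the elements of S are stored as the rows of a row-major matrix of floats. The helper name nearest_row and the functors sq_diff and row_of are hypothetical. The sketch sums the squared differences row by row with thrust::reduce_by_key and then picks the smallest sum with thrust::min_element.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/extrema.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/discard_iterator.h>

// Squared difference between the k-th matrix element and the matching entry of v.
struct sq_diff {
    const float* S;
    const float* v;
    int ncols;
    __host__ __device__ float operator()(int k) const {
        float d = S[k] - v[k % ncols];
        return d * d;
    }
};

// Row index of the k-th element of a row-major matrix.
struct row_of {
    int ncols;
    __host__ __device__ int operator()(int k) const { return k / ncols; }
};

// Hypothetical helper: index of the row of S (nrows x ncols, row-major)
// that is closest to v in the Euclidean sense.
int nearest_row(const thrust::device_vector<float>& S,
                const thrust::device_vector<float>& v,
                int nrows, int ncols) {
    thrust::counting_iterator<int> idx(0);
    thrust::device_vector<float> dist(nrows);   // squared distance per row

    row_of keys{ncols};
    sq_diff vals{thrust::raw_pointer_cast(S.data()),
                 thrust::raw_pointer_cast(v.data()), ncols};

    // One segment per row: sum the squared differences inside each row.
    thrust::reduce_by_key(thrust::make_transform_iterator(idx, keys),
                          thrust::make_transform_iterator(idx + nrows * ncols, keys),
                          thrust::make_transform_iterator(idx, vals),
                          thrust::make_discard_iterator(),
                          dist.begin());

    // The square root is monotonic, so the smallest squared distance suffices.
    return static_cast<int>(thrust::min_element(dist.begin(), dist.end()) - dist.begin());
}
```

Comparing squared distances avoids one square root per row without changing which row is selected.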

Calculating the l2 norm of an array using CUDA Thrust

Computing the l2 norm of an array is a common need. This operation belongs to the class of reduction operations: it requires squaring the elements of an array and then summing the squared elements. Instead of coding your own reduction, you can use a very useful tool offered by CUDA Thrust, namely thrust::transform_reduce, which performs an elementwise transformation over the array elements and then a reduction (in this case, a "plus" reduction). Accordingly, calculating the norm of...
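As an illustration of the idea described above, here is a minimal sketch, assuming a single-precision array held in a thrust::device_vector. The functor name square and the helper name l2_norm are hypothetical and not taken from the original post.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <cmath>

// Elementwise transformation: x -> x * x.
struct square {
    __host__ __device__ float operator()(float x) const { return x * x; }
};

// Hypothetical helper: l2 norm of a device vector.
float l2_norm(const thrust::device_vector<float>& d_x) {
    // Square every element, then sum the squares with a "plus" reduction.
    float sum_sq = thrust::transform_reduce(d_x.begin(), d_x.end(),
                                            square(), 0.0f,
                                            thrust::plus<float>());
    return std::sqrt(sum_sq);
}
```

The transformation and the reduction are fused into one pass, so no temporary array of squared values is needed.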

Tricks and Tips: Row-wise/Column-wise operations on matrices with CUDA

It is often necessary to apply the same operation to all the rows or all the columns of a matrix with CUDA, for example adding the same row vector to all the rows of a matrix or adding the same column vector to all the columns of a matrix. This can be done easily with CUDA Thrust. A fully worked example is available on our GitHub page.
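The GitHub example is not reproduced here. The following is only a sketch of one way to do this with Thrust, assuming a row-major matrix of floats. The names col_index, row_index, add_row_vector and add_col_vector are hypothetical. A permutation iterator presents the vector as if it were repeated across the whole matrix, so a single thrust::transform performs the addition.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>

// Column index of the k-th element of a row-major matrix.
struct col_index {
    int ncols;
    __host__ __device__ int operator()(int k) const { return k % ncols; }
};

// Row index of the k-th element of a row-major matrix.
struct row_index {
    int ncols;
    __host__ __device__ int operator()(int k) const { return k / ncols; }
};

// Adds the row vector r (length ncols) to every row of A, in place.
void add_row_vector(thrust::device_vector<float>& A,
                    const thrust::device_vector<float>& r, int ncols) {
    thrust::counting_iterator<int> idx(0);
    auto r_rep = thrust::make_permutation_iterator(
        r.begin(), thrust::make_transform_iterator(idx, col_index{ncols}));
    thrust::transform(A.begin(), A.end(), r_rep, A.begin(), thrust::plus<float>());
}

// Adds the column vector c (length nrows) to every column of A, in place.
void add_col_vector(thrust::device_vector<float>& A,
                    const thrust::device_vector<float>& c, int ncols) {
    thrust::counting_iterator<int> idx(0);
    auto c_rep = thrust::make_permutation_iterator(
        c.begin(), thrust::make_transform_iterator(idx, row_index{ncols}));
    thrust::transform(A.begin(), A.end(), c_rep, A.begin(), thrust::plus<float>());
}
```

Replacing thrust::plus with another binary functor gives other row-wise or column-wise operations with the same structure.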

Tricks and Tips: Replicate a vector multiple times using CUDA Thrust

Suppose we have an array of M elements. We want to create a new array of M * N elements in which the M elements of the original array are repeated N times. In other words, if M = 3 and the original array is {1, 2, 3}, we want to end up with {1, 2, 3, 1, 2, 3, ...}. This can be easily done with CUDA Thrust as an application of the expand operator. A fully worked example is available on our GitHub page.
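The sketch below does not use the expand operator mentioned above. It shows a different and simpler technique, namely reading the source array through a permutation iterator whose index wraps around modulo M. The names mod_index and tile are hypothetical, and integer elements are assumed.

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>

// Maps an output position k to the source position k mod M.
struct mod_index {
    int M;
    __host__ __device__ int operator()(int k) const { return k % M; }
};

// Hypothetical helper: returns `in` repeated N times back to back.
thrust::device_vector<int> tile(const thrust::device_vector<int>& in, int N) {
    int M = static_cast<int>(in.size());
    thrust::device_vector<int> out(M * N);
    thrust::counting_iterator<int> idx(0);
    // src[k] reads in[k % M]; no intermediate index array is stored.
    auto src = thrust::make_permutation_iterator(
        in.begin(), thrust::make_transform_iterator(idx, mod_index{M}));
    thrust::copy(src, src + M * N, out.begin());
    return out;
}
```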

Tricks and Tips: Scaling the rows of a matrix with CUDA

Suppose that we want to scale the rows of a matrix as follows:

Besides writing your own CUDA kernel, there are two possibilities: CUDA Thrust's thrust::transform and cuBLAS's cublas<t>dgmm. A full example is available on our GitHub page. We have tested the above code on a Kepler K20c and these are the results:

Size          Thrust   cuBLAS
2500 x 1250   0.20ms   0.25ms
5000 x 2500   0.77ms   0.83ms

The cuBLAS timing excludes the cublasCreate time. Even with this, the CUDA Thrust version seems ...
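The scaling formula did not survive in this excerpt, so the sketch below rests on an assumption: that "scaling the rows" means multiplying row i of the matrix by a scalar d[i], that is, computing diag(d) * A. Under that assumption, the cuBLAS route can be sketched with cublasDdgmm, the double-precision member of the cublas<t>dgmm family. The helper name scale_rows is hypothetical, and the matrix is assumed to be stored column-major, as cuBLAS expects.

```cuda
#include <cublas_v2.h>
#include <thrust/device_vector.h>

// Hypothetical helper: C = diag(d) * A, i.e. row i of A multiplied by d[i].
// A and C are m x n, column-major; d has length m.
cublasStatus_t scale_rows(cublasHandle_t handle,
                          const thrust::device_vector<double>& A,
                          const thrust::device_vector<double>& d,
                          thrust::device_vector<double>& C,
                          int m, int n) {
    C.resize(A.size());
    // CUBLAS_SIDE_LEFT selects diag(d) * A (CUBLAS_SIDE_RIGHT would scale columns).
    return cublasDdgmm(handle, CUBLAS_SIDE_LEFT, m, n,
                       thrust::raw_pointer_cast(A.data()), m,
                       thrust::raw_pointer_cast(d.data()), 1,
                       thrust::raw_pointer_cast(C.data()), m);
}
```

The handle is created once with cublasCreate and reused, which is consistent with excluding the cublasCreate time from the timings above. The program must be linked against cuBLAS (for example with -lcublas).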

Tricks and Tips: Finding the number of occurrences of keys and the positions of first occurrences of keys by CUDA Thrust

Suppose we have a vector of keys:

    thrust::device_vector<int> keys(10);
    keys[0] = 51; // <*>
    keys[1] = 51;
    keys[2] = 72; // <*>
    keys[3] = 72;
    keys[4] = 72;
    keys[5] = 103; // <*>
    keys[6] = 103;
    keys[7] = 504; // <*>
    keys[8] = 504;
    keys[9] = 504;

We want to populate the two device arrays pidx and pnum so that: 1. The pidx array contains the first position of each distinct key in the keys vector, namely the positions marked with <*> in the code snippet ab...
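The excerpt is truncated before the solution, so what follows is a sketch of one possible approach rather than the post's own code. It assumes that equal keys are adjacent, as in the example above, and that pnum is meant to hold the number of occurrences of each distinct key, as the post title suggests. The helper name count_keys is hypothetical. Counting is done with thrust::reduce_by_key over a constant iterator of ones, and the first positions follow from an exclusive scan of the counts.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <thrust/iterator/constant_iterator.h>
#include <thrust/iterator/discard_iterator.h>

// Hypothetical helper. Assumes equal keys are stored contiguously.
// On return, pnum[i] is the number of occurrences of the i-th distinct key
// and pidx[i] is the position of its first occurrence.
// Returns the number of distinct keys.
int count_keys(const thrust::device_vector<int>& keys,
               thrust::device_vector<int>& pidx,
               thrust::device_vector<int>& pnum) {
    pnum.resize(keys.size());
    // Summing a stream of ones per key run gives the run length.
    auto ends = thrust::reduce_by_key(keys.begin(), keys.end(),
                                      thrust::make_constant_iterator(1),
                                      thrust::make_discard_iterator(),
                                      pnum.begin());
    int n = static_cast<int>(ends.second - pnum.begin());
    pnum.resize(n);
    pidx.resize(n);
    // First position of a run = total length of all the runs before it.
    thrust::exclusive_scan(pnum.begin(), pnum.end(), pidx.begin());
    return n;
}
```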

Tricks and Tips: Reordering matrix rows by key

Ordering an array by key can be achieved very simply in CUDA with CUDA Thrust's sort_by_key or stable_sort_by_key. But what happens if we want to order the rows of a matrix according to a key or "membership"? For example, consider the following matrix

[ 10 17 64 90 97 27 56 45 ]
[ 33 76 18 60 62 82 63 56 ]
[ 88 99 75 96 36 48 90 68 ]
[ 91 96 24 87 91 36 94 47 ]
[ 37 56 45 81 72 58 63 18 ]

along with the following row keys

3 2 2 4 2

We want to order the matrix rows ac...
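Since the excerpt stops before the method, here is a hedged sketch of one way to do it, assuming a row-major integer matrix and an ascending, stable ordering of the keys. The names src_index and sort_rows_by_key are hypothetical. The keys are sorted together with a sequence of row indices, and the resulting permutation is then applied to whole rows with thrust::gather.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <thrust/sort.h>
#include <thrust/gather.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>

// Maps output position k to the source position of the same column
// in the row selected by the permutation.
struct src_index {
    const int* perm;
    int ncols;
    __host__ __device__ int operator()(int k) const {
        return perm[k / ncols] * ncols + k % ncols;
    }
};

// Hypothetical helper: returns a copy of A (row-major, keys.size() x ncols)
// with the rows reordered by ascending key. Keys are taken by value because
// sorting modifies them.
thrust::device_vector<int> sort_rows_by_key(const thrust::device_vector<int>& A,
                                            thrust::device_vector<int> keys,
                                            int ncols) {
    int nrows = static_cast<int>(keys.size());
    thrust::device_vector<int> perm(nrows);
    thrust::sequence(perm.begin(), perm.end());            // 0, 1, ..., nrows-1
    // After this call, perm[i] is the original index of the i-th output row.
    thrust::stable_sort_by_key(keys.begin(), keys.end(), perm.begin());

    thrust::device_vector<int> out(A.size());
    thrust::counting_iterator<int> idx(0);
    src_index map{thrust::raw_pointer_cast(perm.data()), ncols};
    thrust::gather(thrust::make_transform_iterator(idx, map),
                   thrust::make_transform_iterator(idx + nrows * ncols, map),
                   A.begin(), out.begin());
    return out;
}
```

Only the short key and index arrays are sorted; the matrix data is moved once, in the final gather.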

Tricks and Tips: Reduction by key with tuple key

Classical examples of reduction by key with CUDA Thrust consider scalar keys. Sometimes, however, the keys are pairs or, more generally, tuples. On our GitHub page, a simple example shows reduction by key with CUDA Thrust's reduce_by_key using tuples as keys. In those examples, a tuple vector is created out of the two vectors forming the key. Both the case in which the key and value arrays are host_vectors and the case in which they are regular, cudaMalloc'ed arrays are considered.
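The following sketch illustrates the idea for pair keys. It is not the GitHub example itself: it uses thrust::device_vector containers rather than the host_vector and cudaMalloc'ed variants mentioned above, and the helper name reduce_by_pair is hypothetical. The two key vectors are combined with a zip iterator so that reduce_by_key sees a sequence of tuples.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/tuple.h>
#include <thrust/iterator/zip_iterator.h>

// Hypothetical helper: sums vals over runs of equal (k1, k2) pairs.
// Equal pairs are assumed to be adjacent. Returns the number of runs.
int reduce_by_pair(const thrust::device_vector<int>& k1,
                   const thrust::device_vector<int>& k2,
                   const thrust::device_vector<float>& vals,
                   thrust::device_vector<int>& out_k1,
                   thrust::device_vector<int>& out_k2,
                   thrust::device_vector<float>& out_vals) {
    size_t n = vals.size();
    out_k1.resize(n);
    out_k2.resize(n);
    out_vals.resize(n);

    // Zip the two key vectors so that each key is a tuple (k1[i], k2[i]).
    auto keys_in  = thrust::make_zip_iterator(thrust::make_tuple(k1.begin(), k2.begin()));
    auto keys_out = thrust::make_zip_iterator(thrust::make_tuple(out_k1.begin(), out_k2.begin()));

    // Default predicate and operator: tuple equality and addition.
    auto ends = thrust::reduce_by_key(keys_in, keys_in + n, vals.begin(),
                                      keys_out, out_vals.begin());

    int m = static_cast<int>(ends.second - out_vals.begin());
    out_k1.resize(m);
    out_k2.resize(m);
    out_vals.resize(m);
    return m;
}
```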

Tricks and Tips: Concurrently sorting many arrays with CUDA Thrust

The classical way to sort multiple arrays is the so-called back-to-back approach, which uses thrust::stable_sort_by_key twice. One needs to create a keys vector such that elements within the same array have the same key. For example:

Elements: 10.5  4.3 -2.3   0.  55.  24.  66.
Keys:        0    0    0    1    1    1    1

In this case, there are two arrays, the first with 3 elements and the second with 4 elements. One first needs to call thrust::stable_sort_by_key having the matrix values as the ke...
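A minimal sketch of the back-to-back approach is given below, assuming the arrays are concatenated in one device vector of floats and the segment keys are integers. The helper name segmented_sort is hypothetical.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>

// Hypothetical helper: sorts each segment of vals independently.
// seg[i] is the id of the array that vals[i] belongs to.
void segmented_sort(thrust::device_vector<float>& vals,
                    thrust::device_vector<int>& seg) {
    // Pass 1: sort everything by value, carrying the segment ids along.
    thrust::stable_sort_by_key(vals.begin(), vals.end(), seg.begin());
    // Pass 2: sort by segment id. Stability preserves the value order
    // established in pass 1 inside each segment.
    thrust::stable_sort_by_key(seg.begin(), seg.end(), vals.begin());
}
```

The second pass relies on the stability of the sort, which is why stable_sort_by_key is used instead of sort_by_key.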

Tricks and Tips: Determining the least element and its position in each matrix column with CUDA Thrust

In this post, we consider two approaches for determining the minimum element along each column of a matrix. The first uses Thrust's reduce_by_key in conjunction with a transform iterator which performs an implicit matrix transposition. The second effectively sorts each row together with the corresponding column indices of its elements. Of course, the second approach does more than the first: from it, not only the least but also the second least element can be determi...
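The first approach can be sketched as follows. This is an illustration under stated assumptions, not the post's own code: the matrix is assumed to be row-major with float elements, ties are resolved in favour of the smaller row index, and all names (val_idx, transposed_index, col_of, row_of, min_pair, column_min) are hypothetical. The matrix is read in column order through a permutation iterator, which is the implicit transposition, and each column is reduced with a minimum operator acting on (value, row) pairs.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <thrust/tuple.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/discard_iterator.h>

typedef thrust::tuple<float, int> val_idx;   // (element value, row index)

// Position in the row-major matrix of the k-th element in column order.
struct transposed_index {
    int nrows, ncols;
    __host__ __device__ int operator()(int k) const {
        return (k % nrows) * ncols + k / nrows;
    }
};
// Column and row of the k-th element in column order.
struct col_of {
    int nrows;
    __host__ __device__ int operator()(int k) const { return k / nrows; }
};
struct row_of {
    int nrows;
    __host__ __device__ int operator()(int k) const { return k % nrows; }
};
// Keeps the pair with the smaller value; the first argument wins ties.
struct min_pair {
    __host__ __device__ val_idx operator()(const val_idx& a, const val_idx& b) const {
        return thrust::get<0>(b) < thrust::get<0>(a) ? b : a;
    }
};

// Hypothetical helper: per-column minimum and the row where it occurs.
void column_min(const thrust::device_vector<float>& A, int nrows, int ncols,
                thrust::device_vector<float>& min_val,
                thrust::device_vector<int>& min_row) {
    min_val.resize(ncols);
    min_row.resize(ncols);
    thrust::counting_iterator<int> idx(0);
    auto elems = thrust::make_permutation_iterator(
        A.begin(), thrust::make_transform_iterator(idx, transposed_index{nrows, ncols}));
    auto rows = thrust::make_transform_iterator(idx, row_of{nrows});
    auto keys = thrust::make_transform_iterator(idx, col_of{nrows});
    thrust::reduce_by_key(
        keys, keys + nrows * ncols,
        thrust::make_zip_iterator(thrust::make_tuple(elems, rows)),
        thrust::make_discard_iterator(),
        thrust::make_zip_iterator(thrust::make_tuple(min_val.begin(), min_row.begin())),
        thrust::equal_to<int>(), min_pair());
}
```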