Tricks and Tips: Reduce matrix columns with CUDA

We report here 4 approaches to matrix column reduction: 3 of them based on CUDA Thrust and 1 based on cublas<t>gemv() with a column of 1’s.
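To see why multiplying by a column of 1’s reduces the columns, here is a minimal host-side sketch in plain C++. It does not call cuBLAS: the function names gemv_transposed and column_sums_with_ones are hypothetical, and the matrix is assumed to be stored row-major as Nrows x Ncols. It computes y = A^T * 1, which is the product the cublas<t>gemv() approach would evaluate on the GPU.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side helper (not a cuBLAS call): y = A^T * x,
// where A is an Nrows x Ncols matrix assumed to be stored row-major.
std::vector<float> gemv_transposed(const std::vector<float>& A,
                                   const std::vector<float>& x,
                                   std::size_t Nrows, std::size_t Ncols) {
    std::vector<float> y(Ncols, 0.0f);
    for (std::size_t i = 0; i < Nrows; ++i) {
        for (std::size_t j = 0; j < Ncols; ++j) {
            y[j] += A[i * Ncols + j] * x[i];
        }
    }
    return y;
}

// With x set to a vector of 1's, y[j] becomes the sum of column j.
std::vector<float> column_sums_with_ones(const std::vector<float>& A,
                                         std::size_t Nrows, std::size_t Ncols) {
    const std::vector<float> ones(Nrows, 1.0f);
    return gemv_transposed(A, ones, Nrows, Ncols);
}
```

In an actual cublas<t>gemv() call, whether the transpose operation is needed depends on how the matrix is laid out in memory, since cuBLAS assumes column-major storage.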

The CUDA Thrust approaches are analogous to those of our previous post, Reduce matrix rows with CUDA, with an implicit transposition obtained by

thrust::make_permutation_iterator(d_matrix.begin(),
   thrust::make_transform_iterator(thrust::make_counting_iterator(0), (_1 % Nrows) * Ncols + _1 / Nrows))
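The index map used above can be checked on the host with a short sketch in plain C++. It assumes a row-major Nrows x Ncols matrix; transposed_index and column_sums are hypothetical names used only for illustration. The k-th element visited is taken from row k % Nrows and column k / Nrows, so the matrix is traversed column by column and k / Nrows can serve as the column key for the reduction.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper mirroring the expression (_1 % Nrows) * Ncols + _1 / Nrows:
// maps the k-th position of a column-by-column traversal to the offset
// of that element in a row-major Nrows x Ncols matrix.
std::size_t transposed_index(std::size_t k, std::size_t Nrows, std::size_t Ncols) {
    return (k % Nrows) * Ncols + k / Nrows;
}

// Sequential stand-in for the Thrust reduction: visit the elements in
// transposed order and accumulate each one into the sum of its column.
std::vector<float> column_sums(const std::vector<float>& matrix,
                               std::size_t Nrows, std::size_t Ncols) {
    std::vector<float> sums(Ncols, 0.0f);
    for (std::size_t k = 0; k < Nrows * Ncols; ++k) {
        sums[k / Nrows] += matrix[transposed_index(k, Nrows, Ncols)];
    }
    return sums;
}
```

For a 2 x 3 matrix, positions k = 0, 1, 2, 3, 4, 5 map to offsets 0, 3, 1, 4, 2, 5, that is, the first column, then the second, then the third.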

The full code is available on our GitHub page.
