Warp reduction in CUDA

CUDA reduction examples usually deal with a single large array. In some cases, however, you need to sum a very large number of small arrays.
For this case, a warp-level reduction is a good fit: each warp of 32 threads can reduce one small array on its own, without any block-wide synchronization.

Instead of coding your own warp reduction, it is a good idea to use CUB primitives, in particular CUB's WarpReduce primitive (cub::WarpReduce).

A fully worked example is available on our GitHub page.
In that example, an array of length N is created, and each element of the result is the sum of 32 consecutive elements of the input:

result[0] = data[0] + ... + data[31];

result[1] = data[32] + ... + data[63];
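
For illustration, the computation above could be sketched with cub::WarpReduce as follows. This is a minimal sketch and not the code from the GitHub example: the names sumGroupsOf32, sumGroupsOf32Host and kWarpsPerBlock are hypothetical, the element type is assumed to be float, the block size is assumed to be 128 threads, and CUDA error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

constexpr int kWarpSize      = 32;  // number of consecutive elements summed per result
constexpr int kWarpsPerBlock = 4;   // assumption: 4 warps (128 threads) per block

// Each warp sums 32 consecutive elements of data and writes one element of result.
__global__ void sumGroupsOf32(const float* data, float* result, int n)
{
    using WarpReduce = cub::WarpReduce<float>;

    // One TempStorage instance per warp in the block.
    __shared__ typename WarpReduce::TempStorage temp[kWarpsPerBlock];

    const int tid         = blockIdx.x * blockDim.x + threadIdx.x;
    const int warpInBlock = threadIdx.x / kWarpSize;

    // Threads past the end of the array contribute 0, so that every lane
    // of the warp takes part in the reduction.
    const float value = (tid < n) ? data[tid] : 0.0f;

    // The returned sum is only valid in lane 0 of each warp.
    const float sum = WarpReduce(temp[warpInBlock]).Sum(value);

    if (threadIdx.x % kWarpSize == 0 && tid < n)
        result[tid / kWarpSize] = sum;
}

// Hypothetical host helper: copies the input to the device, launches the
// kernel and returns one sum per group of 32 elements (error checks omitted).
std::vector<float> sumGroupsOf32Host(const std::vector<float>& h_data)
{
    const int n         = static_cast<int>(h_data.size());
    const int numGroups = (n + kWarpSize - 1) / kWarpSize;
    std::vector<float> h_result(numGroups, 0.0f);
    if (n == 0) return h_result;

    float* d_data   = nullptr;
    float* d_result = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_result, numGroups * sizeof(float));
    cudaMemcpy(d_data, h_data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = kWarpSize * kWarpsPerBlock;
    const int blocks  = (n + threads - 1) / threads;
    sumGroupsOf32<<<blocks, threads>>>(d_data, d_result, n);

    // This copy runs on the default stream after the kernel and blocks
    // until the result is available on the host.
    cudaMemcpy(h_result.data(), d_result, numGroups * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_data);
    cudaFree(d_result);
    return h_result;
}
```

In this sketch, calling sumGroupsOf32Host on a vector of 64 ones would be expected to give two results, each equal to 32. If N is not a multiple of 32, the last result only covers the remaining elements, because out-of-range threads add zero.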

