Reduction examples in CUDA typically deal with a single large array. In some cases, however, one needs to sum a very large number of small arrays.
For this case, warp-level reduction is a good fit, since each small array can be reduced by a single warp.
Instead of coding your own warp reduction, a good option is to use CUB, in particular its WarpReduce primitive.
A fully worked example is available on our GitHub website.
In that example, an array of length N is created and the result is the sum of 32 consecutiv...
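
As a rough illustration of the idea (a minimal sketch, not the worked example from the GitHub repository), the code below lets each warp sum one group of 32 consecutive elements with `cub::WarpReduce`. The names `warpSumKernel`, `sumGroupsOf32`, and `kWarpsPerBlock` are hypothetical, the choice of 4 warps per block is an assumption, and error checking is omitted for brevity.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

constexpr int kWarpSize      = 32;
constexpr int kWarpsPerBlock = 4;   // assumed value; blockDim.x must equal 32 * kWarpsPerBlock

// Each warp sums one group of 32 consecutive elements of `in`
// and writes the result to out[group].
__global__ void warpSumKernel(const float* in, float* out, int numGroups)
{
    using WarpReduce = cub::WarpReduce<float>;
    // One TempStorage instance per warp in the block.
    __shared__ WarpReduce::TempStorage temp[kWarpsPerBlock];

    const int warpInBlock = threadIdx.x / kWarpSize;
    const int lane        = threadIdx.x % kWarpSize;
    const int group       = blockIdx.x * kWarpsPerBlock + warpInBlock;

    // Warps past the end of the input contribute zeros.
    const float v = (group < numGroups) ? in[group * kWarpSize + lane] : 0.0f;

    // The aggregate returned by Sum() is only valid in lane 0 of the warp.
    const float sum = WarpReduce(temp[warpInBlock]).Sum(v);

    if (lane == 0 && group < numGroups) out[group] = sum;
}

// Hypothetical host helper: returns one sum per complete group of 32 elements.
std::vector<float> sumGroupsOf32(const std::vector<float>& h_in)
{
    const int numGroups = static_cast<int>(h_in.size()) / kWarpSize;
    std::vector<float> h_out(numGroups);
    if (numGroups == 0) return h_out;

    float* d_in  = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in,  numGroups * kWarpSize * sizeof(float));
    cudaMalloc(&d_out, numGroups * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), numGroups * kWarpSize * sizeof(float),
               cudaMemcpyHostToDevice);

    const int blocks = (numGroups + kWarpsPerBlock - 1) / kWarpsPerBlock;
    warpSumKernel<<<blocks, kWarpSize * kWarpsPerBlock>>>(d_in, d_out, numGroups);

    cudaMemcpy(h_out.data(), d_out, numGroups * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `sumGroupsOf32` on a host vector of length N returns N / 32 partial sums, one per group; any trailing elements that do not fill a complete group are ignored in this sketch.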

# Reduction

# Conditional reduction in CUDA

Conditional reduction amounts to applying a reduction operation only to those elements of an array that satisfy a given predicate.
When the reduction is a summation, one can fold the condition directly into the addends by multiplying each of them by 0 (predicate false) or 1 (predicate true).
For example, suppose that the condition to be met is that the addends be smaller than 10.
In this case, borrowing the first code at Optimizing Parallel Reduction in CUDA by M. Harr...
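
Since the excerpt is cut off before the code, the following is only a sketch of how the multiply-by-predicate idea might look. It is modelled on the interleaved-addressing kernel from Harris's presentation, with a bounds check added; the names `conditionalSumKernel`, `conditionalSum`, and `kBlockSize` are hypothetical, the block size of 256 is an assumption, and the per-block partial sums are simply added on the host.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 256;   // assumed; must be a power of two for this scheme

// Sums only the elements smaller than `threshold`.
// The predicate is folded in as a multiplication by 0 (false) or 1 (true).
__global__ void conditionalSumKernel(const int* g_idata, int* g_odata,
                                     int n, int threshold)
{
    __shared__ int sdata[kBlockSize];

    const int tid = threadIdx.x;
    const int i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Out-of-range threads load 0, which does not affect the sum.
    const int x = (i < n) ? g_idata[i] : 0;
    sdata[tid] = x * (x < threshold);
    __syncthreads();

    // Interleaved-addressing tree reduction in shared memory.
    for (int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2 * s) == 0) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    // Thread 0 writes this block's partial sum.
    if (tid == 0) g_odata[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: launches the kernel and adds the per-block results.
int conditionalSum(const std::vector<int>& h_in, int threshold)
{
    const int n = static_cast<int>(h_in.size());
    if (n == 0) return 0;
    const int blocks = (n + kBlockSize - 1) / kBlockSize;

    int* d_in  = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in,  n * sizeof(int));
    cudaMalloc(&d_out, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    conditionalSumKernel<<<blocks, kBlockSize>>>(d_in, d_out, n, threshold);

    std::vector<int> partial(blocks);
    cudaMemcpy(partial.data(), d_out, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    int total = 0;
    for (int p : partial) total += p;
    return total;
}
```

With a threshold of 10, the input `{1, 2, 3, 20, 9, 10, 15}` keeps only 1, 2, 3, and 9. The multiplication avoids a divergent branch at load time, at the cost of still reading every element.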
