Block reduction in CUDA

In many CUDA reduction examples, the basic idea is to first reduce the data within each thread block, producing one partial result per block, and then to reduce the partial results from all the blocks.
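The two-phase idea can be sketched with a hand-written shared-memory tree reduction instead of a library primitive. This is a minimal illustration under stated assumptions, not the code from the worked example: it assumes `int` data and a power-of-two block size of 256. The names `block_sum_kernel` and `reduce_sum` are hypothetical. The same kernel is launched repeatedly, so each pass turns the previous pass's partial sums into fewer partial sums until one value is left.

```cuda
#include <cstdio>
#include <numeric>
#include <utility>
#include <vector>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // assumed block size; must be a power of two here

// One pass: each block sums up to BLOCK elements of `in` and writes a single
// partial sum to out[blockIdx.x].
__global__ void block_sum_kernel(const int* in, int* out, int n) {
    __shared__ int smem[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * BLOCK + tid;
    smem[tid] = (i < n) ? in[i] : 0;  // pad out-of-range threads with 0
    __syncthreads();
    // Tree reduction: the number of active threads halves at every step.
    for (int stride = BLOCK / 2; stride > 0; stride /= 2) {
        if (tid < stride) smem[tid] += smem[tid + stride];
        __syncthreads();  // reached by all threads, outside the if
    }
    if (tid == 0) out[blockIdx.x] = smem[0];
}

// Hypothetical host driver: launch passes until a single value remains.
// Error checking is omitted for brevity.
int reduce_sum(const std::vector<int>& h) {
    int n = (int)h.size();
    if (n == 0) return 0;
    int *d_in, *d_out;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, ((n + BLOCK - 1) / BLOCK) * sizeof(int));
    cudaMemcpy(d_in, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    while (n > 1) {
        int blocks = (n + BLOCK - 1) / BLOCK;
        block_sum_kernel<<<blocks, BLOCK>>>(d_in, d_out, n);
        std::swap(d_in, d_out);  // this pass's partial sums feed the next pass
        n = blocks;
    }
    int result;
    cudaMemcpy(&result, d_in, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main() {
    std::vector<int> data(1 << 20, 1);
    int gpu = reduce_sum(data);
    int cpu = std::accumulate(data.begin(), data.end(), 0);
    printf("gpu = %d, cpu = %d\n", gpu, cpu);
    return gpu == cpu ? 0 : 1;
}
```

Swapping the two buffers avoids reading and writing the same array within one pass; both buffers are always large enough because every pass produces fewer values than it consumes.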

For the first step, CUB provides a ready-made block-wide reduction primitive, cub::BlockReduce.
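BlockReduce is not limited to sums: its `Reduce` member takes any associative binary operator. The sketch below is an illustration under assumptions, not part of the original example. It uses a single block of 128 threads to find the maximum of at most 128 `int` values, and the operator `MaxOp` and the helper `block_max` are hypothetical names. As with every BlockReduce call, the aggregate is only valid in thread 0.

```cuda
#include <climits>
#include <cstdio>
#include <vector>
#include <cub/cub.cuh>

constexpr int THREADS = 128;  // assumed block size for this sketch

// Illustrative user-defined operator (hypothetical name, not part of CUB).
struct MaxOp {
    __device__ int operator()(int a, int b) const { return a > b ? a : b; }
};

// A single block reduces `n` (assumed <= THREADS) values to their maximum.
__global__ void max_kernel(const int* in, int* out, int n) {
    using BlockReduce = cub::BlockReduce<int, THREADS>;
    __shared__ typename BlockReduce::TempStorage temp_storage;
    int x = ((int)threadIdx.x < n) ? in[threadIdx.x] : INT_MIN;  // identity for max
    int m = BlockReduce(temp_storage).Reduce(x, MaxOp());
    if (threadIdx.x == 0) *out = m;  // aggregate is only valid in thread 0
}

// Hypothetical host helper: maximum of up to THREADS ints using one block.
// Error checking is omitted for brevity.
int block_max(const std::vector<int>& h) {
    int n = (int)h.size();
    int *d_in, *d_out, result;
    cudaMalloc(&d_in, THREADS * sizeof(int));
    cudaMalloc(&d_out, sizeof(int));
    cudaMemcpy(d_in, h.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    max_kernel<<<1, THREADS>>>(d_in, d_out, n);
    cudaMemcpy(&result, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}

int main() {
    std::vector<int> v = {3, -7, 42, 5, 19};
    printf("max = %d\n", block_max(v));
    return 0;
}
```

Padding idle threads with `INT_MIN` plays the same role as padding with 0 for a sum: it is the identity element of the chosen operator.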

A fully worked example is available on our GitHub.

In that example, an array of length N is created, and each entry of the result is the sum of 32 consecutive elements, 32 being the block size:

result[0] = data[0] + ... + data[31];
result[1] = data[32] + ... + data[63];
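A minimal sketch of a kernel producing this layout with cub::BlockReduce follows. It is written under assumptions and is not the code from the GitHub example: it assumes `int` data and one thread per element, and the names `block_sum` and `block_sums` are hypothetical. Each block of 32 threads loads 32 consecutive elements, BlockReduce sums them, and thread 0 writes `result[blockIdx.x]`.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cub/cub.cuh>

constexpr int BLOCK_SIZE = 32;  // 32 threads per block, one element per thread

// Each block sums BLOCK_SIZE consecutive elements of `data` into result[blockIdx.x].
__global__ void block_sum(const int* data, int* result, int n) {
    // Specialize BlockReduce for int and a 1D block of BLOCK_SIZE threads.
    using BlockReduce = cub::BlockReduce<int, BLOCK_SIZE>;
    // Shared-memory scratch space required by BlockReduce.
    __shared__ typename BlockReduce::TempStorage temp_storage;

    int i = blockIdx.x * BLOCK_SIZE + threadIdx.x;
    int x = (i < n) ? data[i] : 0;  // out-of-range threads contribute 0

    // The block-wide sum is only valid in thread 0.
    int sum = BlockReduce(temp_storage).Sum(x);
    if (threadIdx.x == 0) result[blockIdx.x] = sum;
}

// Hypothetical host helper: returns one partial sum per group of 32 elements.
// Error checking is omitted for brevity.
std::vector<int> block_sums(const std::vector<int>& h_data) {
    int n = (int)h_data.size();
    if (n == 0) return {};
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    std::vector<int> h_result(blocks);
    int *d_data, *d_result;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMalloc(&d_result, blocks * sizeof(int));
    cudaMemcpy(d_data, h_data.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    block_sum<<<blocks, BLOCK_SIZE>>>(d_data, d_result, n);
    cudaMemcpy(h_result.data(), d_result, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_result);
    return h_result;
}

int main() {
    const int N = 100;  // not a multiple of 32, so the last block is partial
    std::vector<int> data(N);
    for (int i = 0; i < N; ++i) data[i] = i;
    std::vector<int> result = block_sums(data);
    for (size_t b = 0; b < result.size(); ++b)
        printf("result[%zu] = %d\n", b, result[b]);
    return 0;
}
```

With N = 100 the last block covers only data[96] to data[99]; its remaining threads contribute 0, so the partial sum stays correct. The partial results in `result` can then be reduced in a second step, as described at the start of this post.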
