We compare two approaches to counting the occurrences of numbers in a CUDA array.
The two approaches use CUDA Thrust:
- Using thrust::counting_iterator and thrust::upper_bound, following the histogram Thrust example;
- Using thrust::unique_copy and thrust::upper_bound.
A fully worked example is available on our GitHub page.
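
As a rough illustration of the two approaches listed above, here is a minimal sketch. It is not the code from the GitHub page. It assumes integer, non-negative input that is sorted first (the binary searches require sorted data), and the tiny input array and the helper name `print_vec` are made up for the example.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/unique.h>
#include <thrust/binary_search.h>
#include <thrust/adjacent_difference.h>
#include <thrust/iterator/counting_iterator.h>

// Hypothetical helper: copy a device vector to the host and print it.
static void print_vec(const char* label, const thrust::device_vector<int>& d) {
    thrust::host_vector<int> h = d;
    printf("%s:", label);
    for (size_t i = 0; i < h.size(); ++i) printf(" %d", h[i]);
    printf("\n");
}

int main() {
    // Illustrative input only (assumption: small non-negative integers).
    int h_data[] = {3, 1, 1, 0, 3, 3, 5};
    thrust::device_vector<int> data(h_data, h_data + 7);

    // Both approaches search in sorted data.
    thrust::sort(data.begin(), data.end());

    // --- First approach: counting_iterator + upper_bound (dense bins 0..max) ---
    int num_bins = data.back() + 1;
    thrust::device_vector<int> cumulative(num_bins);
    thrust::counting_iterator<int> search_begin(0);
    // cumulative[v] = number of elements <= v
    thrust::upper_bound(data.begin(), data.end(),
                        search_begin, search_begin + num_bins,
                        cumulative.begin());
    // Optional step: turn cumulative counts into per-value counts.
    thrust::device_vector<int> dense_counts(num_bins);
    thrust::adjacent_difference(cumulative.begin(), cumulative.end(),
                                dense_counts.begin());
    print_vec("dense counts", dense_counts);    // one entry per value 0..max

    // --- Second approach: unique_copy + upper_bound (only values that occur) ---
    thrust::device_vector<int> unique_vals(data.size());
    thrust::device_vector<int>::iterator unique_end =
        thrust::unique_copy(data.begin(), data.end(), unique_vals.begin());
    unique_vals.resize(unique_end - unique_vals.begin());

    thrust::device_vector<int> sparse_cumulative(unique_vals.size());
    thrust::upper_bound(data.begin(), data.end(),
                        unique_vals.begin(), unique_vals.end(),
                        sparse_cumulative.begin());
    thrust::device_vector<int> sparse_counts(unique_vals.size());
    thrust::adjacent_difference(sparse_cumulative.begin(), sparse_cumulative.end(),
                                sparse_counts.begin());
    print_vec("unique values", unique_vals);
    print_vec("sparse counts", sparse_counts);
    return 0;
}
```

The first variant produces one bin per integer between 0 and the maximum, including empty bins, while the second only produces entries for values that actually occur, at the cost of the extra thrust::unique_copy pass.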
The first approach proved to be the faster of the two. On an NVIDIA GTX 960 card, we measured the following timings for an array of N = 1048576 elements:
- First approach: 2.35 ms
- First approach without thrust::adjacent_difference: 1.52 ms
- Second approach: 4.67 ms
Note that there is no strict need to compute the adjacent difference explicitly: if needed, it can be computed on the fly inside the kernel that consumes the counts.
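
To illustrate the last remark, the following sketch derives each count from the cumulative upper-bound array inside a kernel instead of calling thrust::adjacent_difference. The kernel name `relative_frequency`, the cumulative values, and the choice of computing relative frequencies are assumptions made only for this example.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>

// Hypothetical consumer kernel: recovers the per-bin count from the
// cumulative counts on the fly and uses it to compute a relative frequency.
__global__ void relative_frequency(const int* cumulative, int num_bins, float* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= num_bins) return;
    // Adjacent difference computed in place of a separate Thrust pass.
    int count = cumulative[i] - (i > 0 ? cumulative[i - 1] : 0);
    int total = cumulative[num_bins - 1];
    out[i] = static_cast<float>(count) / static_cast<float>(total);
}

int main() {
    // Illustrative cumulative counts, as produced by thrust::upper_bound.
    int h_cum[] = {1, 3, 3, 6, 6, 7};
    const int num_bins = 6;
    thrust::device_vector<int> cumulative(h_cum, h_cum + num_bins);
    thrust::device_vector<float> freq(num_bins);

    const int threads = 128;
    const int blocks = (num_bins + threads - 1) / threads;
    relative_frequency<<<blocks, threads>>>(
        thrust::raw_pointer_cast(cumulative.data()), num_bins,
        thrust::raw_pointer_cast(freq.data()));
    cudaDeviceSynchronize();

    thrust::host_vector<float> h_freq = freq;
    for (int i = 0; i < num_bins; ++i) printf("bin %d: %f\n", i, h_freq[i]);
    return 0;
}
```

Whether folding the difference into the consumer kernel actually saves time depends on what that kernel does; the timings reported above only show the cost of the separate thrust::adjacent_difference call.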