Sorting many small arrays in CUDA

Many CUDA applications need to sort a large number of small, independent arrays.

CUB offers one way to address this problem. On our GitHub website, we report an example that can be reused for this purpose.

The idea is to assign each small array to a different thread block and then use cub::BlockRadixSort to sort it within that block. Two versions are provided: one that loads the small arrays into shared memory and one that does not.
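The block-per-array idea can be illustrated with a minimal sketch. This is not the code from the repository mentioned above. The names sortSmallArraysKernel and sortSmallArrays, the float key type, and the sizes (64 threads per block, 4 keys per thread, hence 256 keys per array) are assumptions chosen for illustration. Keys are read from global memory straight into registers with cub::LoadDirectBlocked, so shared memory is used only for the temporary storage that cub::BlockRadixSort requires.

```cuda
#include <cub/cub.cuh>
#include <cuda_runtime.h>
#include <vector>

// Assumed sizes, chosen only for illustration.
constexpr int BLOCK_THREADS    = 64;                                // threads per block
constexpr int ITEMS_PER_THREAD = 4;                                 // keys held by each thread
constexpr int ARRAY_SIZE       = BLOCK_THREADS * ITEMS_PER_THREAD;  // keys per small array

// One thread block sorts one small array of ARRAY_SIZE keys, in place.
__global__ void sortSmallArraysKernel(float* d_data)
{
    using BlockSort = cub::BlockRadixSort<float, BLOCK_THREADS, ITEMS_PER_THREAD>;

    // Scratch space that BlockRadixSort needs for exchanging keys between threads.
    __shared__ BlockSort::TempStorage tempStorage;

    // Each block works on its own contiguous segment of the input.
    float* segment = d_data + blockIdx.x * ARRAY_SIZE;

    // Blocked arrangement: thread t holds keys [t * ITEMS_PER_THREAD, (t + 1) * ITEMS_PER_THREAD).
    float keys[ITEMS_PER_THREAD];
    cub::LoadDirectBlocked(threadIdx.x, segment, keys);

    // Collective ascending sort across all threads of the block.
    BlockSort(tempStorage).Sort(keys);

    // Write the sorted keys back to the same segment.
    cub::StoreDirectBlocked(threadIdx.x, segment, keys);
}

// Hypothetical host helper: sorts numArrays consecutive arrays of ARRAY_SIZE
// floats stored back to back in h_data. Returns false on any CUDA error.
bool sortSmallArrays(std::vector<float>& h_data, int numArrays)
{
    if (numArrays <= 0 || h_data.size() != size_t(numArrays) * ARRAY_SIZE) return false;

    const size_t bytes = h_data.size() * sizeof(float);
    float* d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d_data, h_data.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // One block per small array.
        sortSmallArraysKernel<<<numArrays, BLOCK_THREADS>>>(d_data);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(h_data.data(), d_data, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    return ok;
}
```

Calling sortSmallArrays(data, numArrays) on a vector holding numArrays arrays of 256 floats each leaves every 256-element segment sorted in ascending order. In this sketch every array must contain exactly ARRAY_SIZE keys. Shorter arrays would need to be padded, for example with a sentinel larger than any real key.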

