Count the occurrences of numbers in a CUDA array

We comparing two approaches to count the occurrences of numbers in a CUDA array. The two approaches use CUDA Thrust: Using thrust::counting_iterator and thrust::upper_bound, following the histogram Thrust example; Using thrust::unique_copy and thrust::upper_bound. A fully worked example is available on our GitHub page. The first approach has shown to be the fastest. On an NVIDIA GTX 960 card, we have had the following timings for a number of N = 1048576 array elements: First ap...
More

Radix-4 Decimation-In-Frequency Iterative FFT

On our GitHub web page, we have made available a fully worked Matlab implementation of a radix-4 Decimation in Frequency FFT algorithm. In the code, we have also provided an overall operations count in terms of complex matrix multiplications and additions. It can be indeed shown that each radix-4 butterfly involves 3 complex multiplications and 8 complex additions. Since there are log4N = log2N / 2 stages and each stage involves N / 4 butterflies, so the operations count is complex mul...
More

Understanding the radix-2 FFT recursive algorithm

The recursive implementation of the radix-2 Decimation-In-Frequency algorithm can be understood using the following two figures. The first one refers to pushing the stack phase, while the second one illustrates the popping the stack phase.   In particular, the two figures illustrate the Matlab implementation that you may find on our GitHub website: Implementation I Implementation II The above recursive implementation is the counterpart of the iterative implemen...
More

Radix-2 Decimation-In-Frequency Iterative FFT

At the github page, we prove an implementation of the radix-2 Decimation-In-Frequency FFT in Matlab. The code is an iterative one and considers the scheme in the following figure: A recursive approach is also possible. The implementation calculates also the number of performed multiplications and additions and compares it with the theoretical calculations reported in “Number of operation counts for radix-2 FFTs”. The code is obviously much slower than the highly optimized FFTW explo...
More

Radix-2 Decimation-In-Time Iterative FFT

At the github page, we prove an implementation of the radix-2 Decimation-In-Time FFT in Matlab. The code is an iterative one and considers the scheme in the following figure:   A recursive approach is also possible. The implementation calculates also the number of performed multiplications and additions and compares it with the theoretical calculations reported in “Number of operation counts for radix-2 FFTs”. The code is obviously much slower than the highly optimized FFTW ...
More

Number of operation counts for radix-2 FFTs

The actual Floating Point Operations per Second (FLOPS) depend on the particular hardware and implementation and algorithms with higher number of Floating Point Operations (FLOPs) may correspond to lower FLOPS implementations, just because with such implementations you can more effectively exploit the hardware. To compute the number of floating point operations for a Decimation In Time (DIT) radix-2 approach, refer to the following figure: Let N be the length of the sequence to be transfor...
More

Complex numbers product using only three real multiplications

The product between two complex numbers can be performed with only three real multiplications. This is an application of Karatsuba's algorithm. Indeed, x = a + i * b; y = c + i * d; real(x * y) = a * c - b * d; imag(x * y) = (a + b) * (c + d) - a * c - b * d; Of course, the question is: can we perform the product between two complex numbers with less than three real multiplications? The answer is NO and is provided by Winograd's theorem in: Winograd, "On the number of multiplications...
More

Compiling Cuda mex files with Visual Studio 2013

Configuration: Matlab 2015b, Visual Studio 2013, Intel 64bit machine. In Visual Studio do the following: 1) File -> New Project; Select location and name; in the project type, select NVIDIA -> CUDA 8.0 (choose your CUDA version as appropriate); 2) Project -> Properties -> Configuration Manager -> Active Solution Platform -> choose x64; 3) Project -> Properties -> Configuration -> Release (possibly optional); 4) Project -> Properties -> Configuration ...
More

Generate a STY triangular spherical mesh in Blender

Please, refer to the “Removing starting cube and camera in Blender” post to obtain a clean Blender startup. Refer also to “Generate a STY triangular planar mesh in Blender” for further details. Shift + s -> Cursor to Center Lower horizontal command bar. Add -> Mesh -> UV sphere Click on the “+” which is in the upper right corner of the screen. The “+” appears on the left corner with respect to the right vertical command bar. After having clicked on “+”, a new menu will appear. ...
More

Generate a STY triangular cylindrical mesh in Blender

Please, refer to the “Removing starting cube and camera in Blender”post to obtain a clean Blender startup. Refer also to “Generate a STY triangular planar mesh in Blender” for further details. Shift + s -> Cursor to Center Lower horizontal command bar. Add -> Mesh -> Cylinder Click on the “+” which is in the upper right corner of the screen. The “+” appears on the left corner with respect to the right vertical command bar. After having clicked on “+”, a new menu will appear. ...
More