Implementing a critical section in CUDA

A critical section is a sequence of operations that CUDA threads must execute one at a time. Suppose we want to write a kernel that computes the number of thread blocks in a grid. One possible idea is to let the thread with threadIdx.x == 0 in each block increment a global counter. To prevent race conditions, all the increments must occur sequentially, so they must ...

Graphical connections to Ubuntu Linux from Windows

Suppose that you have a Windows system and want to connect to a remote Ubuntu Linux machine, and that you also want to run some applications on that machine with their graphical interface available. First step: configure the Windows system. Download PuTTY. Install Xming; a simple Google search for “sourceforge xming x server windows” will find it. When asking for the fonts to b...

Tricks and Tips – Using omp_set_num_threads and omp_get_num_threads

When programming with OpenMP, note that omp_get_num_threads() returns 1 in sequential sections of the code. Accordingly, even after requesting more than one thread with omp_set_num_threads(), any call to omp_get_num_threads() returns 1 unless it is made inside a parallel region. The example on our GitHub website tries to clarify this point.
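The behaviour described above can be sketched in a few lines of C++. This is a minimal illustration, not the example from the GitHub page: the helper query_team_sizes and the struct TeamSizes are hypothetical names, and the fallback stubs are an assumption added only so that the sketch still builds when the compiler is invoked without OpenMP support.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: without OpenMP support the code runs on one thread, so these
// stand-in stubs mimic the single-threaded behaviour of the real functions.
inline void omp_set_num_threads(int) {}
inline int omp_get_num_threads() { return 1; }
#endif

// Hypothetical container for the two values we want to compare.
struct TeamSizes {
    int outside;  // omp_get_num_threads() seen in the sequential part
    int inside;   // omp_get_num_threads() seen inside a parallel region
};

// Hypothetical helper: request `requested` threads, then query the team size
// both outside and inside a parallel region.
TeamSizes query_team_sizes(int requested) {
    assert(requested > 0);
    omp_set_num_threads(requested);

    TeamSizes sizes;
    sizes.outside = omp_get_num_threads();  // sequential section: returns 1
    sizes.inside = 0;

    #pragma omp parallel
    {
        // Only one thread of the team writes the shared result.
        #pragma omp single
        sizes.inside = omp_get_num_threads();  // size of the current team
    }
    return sizes;
}
```

Calling query_team_sizes(4) from sequential code should give outside == 1, while inside is the size of the team actually created, which the runtime may make smaller than the requested 4.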

Compiling mex files with Visual Studio 2013

Configuration: Matlab 2015b, Visual Studio 2013, Intel 64bit machine. In Visual Studio do the following: 1) File -> New Project; Select location and name; in the project type, select Templates -> Visual C++ -> Win32 -> Win32 Console Application -> OK; 2) In the Win32 Application Wizard, click Next, in the Application Type choose DLL, then click Finish. 3) Project -> P...

A thing to care about when passing a struct to a CUDA kernel

Structures can be passed by value to CUDA kernels. However, care should be taken to set up a proper destructor, since the destructor is called on exit from the kernel. Consider this example with the uncommented destructor, and do not pay too much attention to what the code actually does. If you run that code, you will receive the following output: Calling destructor Counting in the lo...

Count the occurrences of numbers in a CUDA array

We compare two approaches to counting the occurrences of numbers in a CUDA array. Both approaches use CUDA Thrust: (1) using thrust::counting_iterator and thrust::upper_bound, following the histogram Thrust example; (2) using thrust::unique_copy and thrust::upper_bound. A fully worked example is available on our GitHub page. The first approach proved to be the faster of the two. On an NVIDIA GTX...
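As a rough host-side illustration of the idea behind the first approach, the sketch below sorts the data and then uses std::upper_bound, the standard-library counterpart of thrust::upper_bound, to find the cumulative count for each candidate value. It is not the code from the GitHub page: the function name count_occurrences is hypothetical, and it assumes non-negative integer input with a dense set of bins from 0 to the maximum value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts[v] is the number of times v occurs in `data`.
// Assumes all values are non-negative integers.
std::vector<std::size_t> count_occurrences(std::vector<int> data) {
    if (data.empty()) return {};

    // Sorting groups equal values together.
    std::sort(data.begin(), data.end());
    assert(data.front() >= 0);

    const std::size_t num_bins = static_cast<std::size_t>(data.back()) + 1;
    std::vector<std::size_t> counts(num_bins);

    std::size_t previous = 0;  // number of elements <= value - 1
    for (std::size_t bin = 0; bin < num_bins; ++bin) {
        const int value = static_cast<int>(bin);
        // upper_bound gives the number of elements <= value (cumulative count).
        const std::size_t cumulative = static_cast<std::size_t>(
            std::upper_bound(data.begin(), data.end(), value) - data.begin());
        // The difference of consecutive cumulative counts is the bin count.
        counts[bin] = cumulative - previous;
        previous = cumulative;
    }
    return counts;
}
```

For the input {3, 1, 1, 0, 3, 3} this returns {1, 2, 0, 3}. In a Thrust version the loop over candidate values would be replaced by a single vectorized search over a counting iterator.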

Radix-4 Decimation-In-Frequency Iterative FFT

On our GitHub page, we have made available a fully worked Matlab implementation of a radix-4 Decimation-In-Frequency FFT algorithm. In the code, we also provide an overall operation count in terms of complex matrix multiplications and additions. Indeed, it can be shown that each radix-4 butterfly involves 3 complex multiplications and 8 complex additions. Since there are log4N = l...

Understanding the radix-2 FFT recursive algorithm

The recursive implementation of the radix-2 Decimation-In-Frequency algorithm can be understood using the following two figures. The first one refers to the stack-pushing phase, while the second one illustrates the stack-popping phase. In particular, the two figures illustrate the Matlab implementation that you may find on our GitHub website: Implementation I Im...
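For readers without the figures at hand, here is a minimal C++ sketch of a textbook recursive radix-2 Decimation-In-Frequency FFT. It is not a translation of the Matlab code on the GitHub site; the function name fft_dif_recursive is hypothetical and the input length is assumed to be a power of two. The butterflies are computed on the way down the recursion (pushing), and the even- and odd-indexed outputs are interleaved on the way back up (popping).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper: DFT of x via recursive radix-2 DIF.
// Assumes x.size() is a power of two.
std::vector<Complex> fft_dif_recursive(const std::vector<Complex>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1) return x;  // a length-1 DFT is the sample itself

    const std::size_t half = n / 2;
    const double pi = std::acos(-1.0);

    // DIF butterflies: combine the two halves of the input first.
    std::vector<Complex> sums(half), diffs(half);
    for (std::size_t k = 0; k < half; ++k) {
        const Complex twiddle =
            std::polar(1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(n));
        sums[k] = x[k] + x[k + half];
        diffs[k] = (x[k] - x[k + half]) * twiddle;
    }

    // Recurse: sums yield the even-indexed outputs, diffs the odd-indexed ones.
    const std::vector<Complex> even = fft_dif_recursive(sums);
    const std::vector<Complex> odd = fft_dif_recursive(diffs);

    // Interleave the two half-size transforms while unwinding the recursion.
    std::vector<Complex> out(n);
    for (std::size_t k = 0; k < half; ++k) {
        out[2 * k] = even[k];
        out[2 * k + 1] = odd[k];
    }
    return out;
}
```

For the input {1, 2, 3, 4} the result is {10, -2+2i, -2, -2-2i}, up to rounding.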

Radix-2 Decimation-In-Frequency Iterative FFT

On our GitHub page, we provide an implementation of the radix-2 Decimation-In-Frequency FFT in Matlab. The code is iterative and follows the scheme in the following figure: A recursive approach is also possible. The implementation also calculates the number of multiplications and additions performed and compares it with the theoretical calculations reported in “Number of operation...
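Since the figure is not reproduced here, the following C++ sketch shows one common way to organize an iterative radix-2 Decimation-In-Frequency FFT: in-place butterfly stages whose span halves at each stage, followed by a bit-reversal permutation of the output. It is a textbook variant and may differ from the scheme used in the Matlab code; the function name fft_dif_iterative is hypothetical, the input length is assumed to be a power of two, and no operation counting is included.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Hypothetical helper: DFT of x via iterative radix-2 DIF.
// Assumes x.size() is a power of two.
std::vector<Complex> fft_dif_iterative(std::vector<Complex> x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    const double pi = std::acos(-1.0);

    // Butterfly stages: the distance between paired samples halves each time.
    for (std::size_t span = n / 2; span >= 1; span /= 2) {
        for (std::size_t start = 0; start < n; start += 2 * span) {
            for (std::size_t k = 0; k < span; ++k) {
                // Twiddle factor exp(-2*pi*i*k / (2*span)).
                const Complex twiddle =
                    std::polar(1.0, -pi * static_cast<double>(k) / static_cast<double>(span));
                const Complex a = x[start + k];
                const Complex b = x[start + k + span];
                x[start + k] = a + b;
                x[start + k + span] = (a - b) * twiddle;
            }
        }
    }

    // DIF leaves the output in bit-reversed order; restore natural order.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;  // increment j in reversed-bit order
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    return x;
}
```

For the input {1, 2, 3, 4} the result is {10, -2+2i, -2, -2-2i}, up to rounding.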

Radix-2 Decimation-In-Time Iterative FFT

On our GitHub page, we provide an implementation of the radix-2 Decimation-In-Time FFT in Matlab. The code is iterative and follows the scheme in the following figure: A recursive approach is also possible. The implementation also calculates the number of multiplications and additions performed and compares it with the theoretical calculations reported in “Number of oper...