# CUDA mex function using real data residing on the host and producing real results on the host

In the CUDA_mex_host_to_device GitHub directory, we provide an example of how to create a mex function that executes on the GPU when the real input data reside on the host and the final results are returned to the host. The first thing to do is to recover the pointer to the first element of the real data from the Matlab input array/matrix: `double *h_input = mxGetPr(prhs[0]);` We can also recover the number of elements of the input variable (which can also be a matrix) as: in...

# Compiling mex files with Visual Studio 2013

Configuration: Matlab 2015b, Visual Studio 2013, Intel 64-bit machine. In Visual Studio, do the following:

1) File -> New Project; select location and name; in the project type, select Templates -> Visual C++ -> Win32 -> Win32 Console Application -> OK;
2) In the Win32 Application Wizard, click Next, in the Application Type choose DLL, then click Finish.
3) Project -> Properties -> Configuration Manager -> Active Solution Platform -> New -> Type or Select ...
On our GitHub web page, we have made available a fully worked Matlab implementation of a radix-4 Decimation-In-Frequency FFT algorithm. In the code, we also provide an overall operations count in terms of complex multiplications and additions. It can be shown that each radix-4 butterfly involves 3 complex multiplications and 8 complex additions. Since there are log4(N) = log2(N) / 2 stages and each stage involves N / 4 butterflies, the operations count is complex mul...
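The figures quoted in this excerpt (3 complex multiplications and 8 complex additions per butterfly, N / 4 butterflies per stage, log4(N) stages) can be multiplied out directly. The following is a minimal C++ sketch of that product, not the Matlab code from the repository; the names `OpCount`, `radix4Stages` and `radix4OpCount` are hypothetical, and N is assumed to be a power of 4. It counts every butterfly multiplication, including those by trivial twiddle factors, which an optimized implementation might skip.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type (not from the original post): totals for one FFT.
struct OpCount {
    std::uint64_t complexMuls;
    std::uint64_t complexAdds;
};

// Number of radix-4 stages, i.e. log4(N). N is assumed to be a power of 4.
inline unsigned radix4Stages(std::uint64_t N) {
    unsigned stages = 0;
    while (N > 1) {
        assert(N % 4 == 0 && "N must be a power of 4");
        N /= 4;
        ++stages;
    }
    return stages;
}

// Straight product: stages * butterflies per stage * operations per butterfly.
inline OpCount radix4OpCount(std::uint64_t N) {
    const std::uint64_t stages = radix4Stages(N);
    const std::uint64_t butterflies = N / 4;     // butterflies in each stage
    return OpCount{3 * butterflies * stages,     // 3 complex multiplications each
                   8 * butterflies * stages};    // 8 complex additions each
}
```

For N = 16 there are 2 stages of 4 butterflies, so `radix4OpCount(16)` gives 24 complex multiplications and 64 complex additions.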

# Understanding the radix-2 FFT recursive algorithm

The recursive implementation of the radix-2 Decimation-In-Frequency algorithm can be understood using the following two figures. The first one refers to the stack-push phase, while the second one illustrates the stack-pop phase. In particular, the two figures illustrate the Matlab implementations that you may find on our GitHub website: Implementation I, Implementation II. The above recursive implementation is the counterpart of the iterative implemen...
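As a rough illustration of the recursion, here is a minimal C++ sketch of a recursive radix-2 Decimation-In-Frequency FFT. It is not a port of the Matlab implementations linked above: the function name `fftDifRecursive` is hypothetical, the input length is assumed to be a power of two, and the mapping of "push" and "pop" onto the descending and returning halves of the recursion is our reading of the excerpt.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Complex = std::complex<double>;

// Recursive radix-2 DIF FFT. x.size() is assumed to be a power of two.
std::vector<Complex> fftDifRecursive(const std::vector<Complex>& x) {
    const std::size_t n = x.size();
    assert(n > 0 && (n & (n - 1)) == 0);
    if (n == 1) return x;  // a length-1 DFT is the sample itself

    const std::size_t half = n / 2;
    const double pi = std::acos(-1.0);
    std::vector<Complex> top(half), bottom(half);

    // DIF butterflies: sums feed the even-indexed outputs, twiddled
    // differences feed the odd-indexed outputs.
    for (std::size_t k = 0; k < half; ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        const Complex w = std::polar(1.0, angle);  // W_n^k = exp(-2*pi*i*k/n)
        top[k] = x[k] + x[k + half];
        bottom[k] = (x[k] - x[k + half]) * w;
    }

    // Descending part of the recursion: two half-size sub-problems.
    const std::vector<Complex> even = fftDifRecursive(top);
    const std::vector<Complex> odd = fftDifRecursive(bottom);

    // Returning part of the recursion: interleave the half-size spectra.
    std::vector<Complex> X(n);
    for (std::size_t k = 0; k < half; ++k) {
        X[2 * k] = even[k];
        X[2 * k + 1] = odd[k];
    }
    return X;
}
```

Because the interleaving is done on the way back up, the output comes out in natural order and no separate bit-reversal pass is needed, at the cost of allocating temporary vectors at every level.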
On our GitHub page, we provide an implementation of the radix-2 Decimation-In-Frequency FFT in Matlab. The code is iterative and follows the scheme in the following figure: A recursive approach is also possible. The implementation also calculates the number of multiplications and additions performed and compares it with the theoretical counts reported in “Number of operation counts for radix-2 FFTs”. The code is obviously much slower than the highly optimized FFTW explo...
On our GitHub page, we provide an implementation of the radix-2 Decimation-In-Time FFT in Matlab. The code is iterative and follows the scheme in the following figure: A recursive approach is also possible. The implementation also calculates the number of multiplications and additions performed and compares it with the theoretical counts reported in “Number of operation counts for radix-2 FFTs”. The code is obviously much slower than the highly optimized FFTW ...

# Compiling CUDA mex files with Visual Studio 2013

Configuration: Matlab 2015b, Visual Studio 2013, Intel 64-bit machine. In Visual Studio, do the following:

1) File -> New Project; select location and name; in the project type, select NVIDIA -> CUDA 8.0 (choose your CUDA version as appropriate);
2) Project -> Properties -> Configuration Manager -> Active Solution Platform -> choose x64;
3) Project -> Properties -> Configuration -> Release (possibly optional);
4) Project -> Properties -> Configuration ...

# Sparse 3D Matrices in Matlab

Matlab does not directly support sparse 3D matrices. A workaround is to use a cell array of sparse matrices, one per slice along the third dimension. Suppose that you want to create a sparse 3D matrix A containing only the elements (1, 1, 1) and (1, 3, 10), with A(1, 1, 1) = 3 and A(1, 3, 10) = 54. You can do the following:

    mySp{1}  = sparse(3, 3);
    mySp{10} = sparse(3, 3);

    mySp{1}(1, 1) = 3;
    mySp{10}(1, 3) = 54;

In this way,

    >> mySp

    mySp =
        [3x3 double]    []    []    []    []    []...

# Emulating Matlab’s meshgrid in CUDA

On our GitHub webpage, we provide a worked example implementing Matlab's classical meshgrid function in CUDA. In Matlab:

    x = [1 2 3];
    y = [4 5 6 7];
    [X, Y] = meshgrid(x, y);

produces:

    X =
         1     2     3
         1     2     3
         1     2     3
         1     2     3

and

    Y =
         4     4     4
         5     5     5
         6     6     6
         7     7     7

X is exactly the four-fold replication of the x array, while Y is the three-fold consecutive replication of each element of the y array.
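The replication pattern described above can be sketched on the CPU as follows. This is plain C++ rather than the CUDA code from the repository, so that it runs without a GPU; the `Meshgrid` struct and the row-major storage are assumptions made for this illustration (Matlab itself stores matrices column-major). In a CUDA version, each (row, col) pair of the double loop would presumably be handled by one thread.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical container: two row-major matrices with
// rows = y.size() and cols = x.size().
struct Meshgrid {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<double> X;
    std::vector<double> Y;
};

Meshgrid meshgrid(const std::vector<double>& x, const std::vector<double>& y) {
    Meshgrid g;
    g.rows = y.size();
    g.cols = x.size();
    g.X.resize(g.rows * g.cols);
    g.Y.resize(g.rows * g.cols);

    for (std::size_t r = 0; r < g.rows; ++r) {
        for (std::size_t c = 0; c < g.cols; ++c) {
            g.X[r * g.cols + c] = x[c];  // every row of X is a copy of x
            g.Y[r * g.cols + c] = y[r];  // every column of Y is a copy of y
        }
    }
    return g;
}
```

Calling `meshgrid({1, 2, 3}, {4, 5, 6, 7})` yields a 4 x 3 grid whose rows match the X and Y matrices shown in the Matlab output.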

# Tricks and Tips: Exchanging data between Matlab and C++ in ASCII format

To save data from C++ to a text file, do the following (full worked example storing an array of N double-precision real numbers):

    #include <fstream>
    #include <cstdlib>   // malloc

    int main() {
        const int N = 6;
        double *U = (double*)malloc(N * sizeof(double));

        for (int i = 0; i < N; i++) U[i] = 2.5 * (double)i;

        std::ofstream outfile;
        outfile.open("data.txt");
        for (int i = 0; i < N; i++) {
            outfile << U[i];
            outfile << "\n";   // one value per line
        }
        outfile.close();
        r...
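The same idea can be packaged as a pair of small functions. The sketch below is a variant of the example above, not the original code: the names `writeAscii` and `readAscii` are hypothetical, `std::vector` replaces the `malloc`-ed array, and the output precision is raised to `max_digits10` so that values survive the text round trip exactly.

```cpp
#include <fstream>
#include <limits>
#include <string>
#include <vector>

// Write one value per line. Returns false if the file could not be
// opened or a write failed.
bool writeAscii(const std::string& path, const std::vector<double>& values) {
    std::ofstream out(path);
    if (!out) return false;
    // Enough significant digits to reproduce every double exactly.
    out.precision(std::numeric_limits<double>::max_digits10);
    for (double v : values) out << v << '\n';
    return static_cast<bool>(out);
}

// Read whitespace-separated values until the end of the file.
// A missing file simply yields an empty vector.
std::vector<double> readAscii(const std::string& path) {
    std::ifstream in(path);
    std::vector<double> values;
    double v = 0.0;
    while (in >> v) values.push_back(v);
    return values;
}
```

A file written this way holds one number per line, which is a layout Matlab's `load` can read as a numeric column.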