In this post we explore the possibility of using a single cufftPlanMany to perform 1D FFTs of the columns of a 3D matrix.
Transformations performed according to cufftPlanMany, that is called like
cufftPlanMany(&handle, rank, n, inembed, istride, idist, onembed, ostride, odist, CUFFT_C2C, batch);
must obey the Advanced Data Layout . In particular, 1D FFTs are worked out according to the following layout
input[b * idist + x * istride]
where b addresses the b-th signal and istride is the distan...

More
# Numerical methods

# Romberg's integration of real valued functions by CUDA Thrust

From the point of view of parallel programming, integration is basically a reduction, so that a very simple way to implement integration in CUDA is exploiting the primitives of the Thrust library.
Below is a simple example implementing the Romberg integration method by the Thrust primitives.
It is a "direct" translation of the corresponding Matlab code available at this site, so this example also shows how "simply" some Matlab codes can be ported to CUDA by Thurst.
#include <thrust/sequen...

More
# Simpson's integration of real valued functions by CUDA Thrust

Simpson's integration can be easily implemented by using CUDA Thrust. You basically need five steps:
1. Generate the Simpson's quadrature weights;
2. Generate the function sampling points;
3. Generate the function values;
4. Calculate the elementwise product between the quadrature weights and the function values;
5. Sum the above products.
Step #1 requires creating an array with elements repeated many times, namely, 1 4 2 4 2 4 ... 1 for the Simpson's case.
This can be accomplished by bor...

More
# CUDA Thrust: Computing finite differences with Thrust

Thrust is a library of parallel algorithms whose routines have been build up so to resemble calls to the C++ Standard Template Library (STL). It is very useful to quickly achieve parallelization with a high level syntax.
By Thrust, legacy (e.g., Matlab written) codes can be easily translated to a CUDA parallel approach. For example, the following Matlab syntax
d_diff2(1:n-2) = d_y(3:n) - 2. * d_y(2:n-1) + d_y(1:n-2);
to calculate second order central differences can be obtained as an applicatio...

More
# Gaussian elimination with CUDA

The VS project Gaussian elimination with CUDA, that you may find in download section, contains CPU and GPU routines for solving a linear system of equations by Gaussian elimination without pivoting.
Besides providing standard CPU Gaussian elimination and solution of an upper triangular system, sequential and parallel codes have been developed based on the paper:
Manuel Carcenac, "From tile algorithm to stripe algorithm: a CUBLAS-based parallel implementation on GPUs of Gauss method for the resol...

More
# Lagrange polynomials for interpolation

Linear interpolation consists of approximating a function [latex size="1"]f(x)[/latex] as:
[latex size="1"]f(x)=sum_{i=1}^{N}a_i phi_i(x);;;;;;;;;;;(1)[/latex]
where the [latex size="1"]a_i[/latex]'s are the interpolation coefficients and the [latex size="1"]phi_i[/latex]'s are prefixed interpolation functions.
Lagrange interpolation, which is one of the simplest and mostly employed interpolation methods, consists of finding the interpolation coefficients as the solution of the linear system
[...

More
# Finite Difference Time Domain Method

The Finite Difference Time Domain (FDTD) method is a very popular and rather intuitive method to solve partial differential equations describing time-evolving phenomena.
The relevant partial differential equations are discretized in both time and space by finite differences.
“Update equations” expressing the (unknown) future fields in terms of (known) past fields are established.
We here focus on the electromagnetic case, in which Maxwell's equations should be solved, although FDTD is very ...

More