Tricks and Tips: FFTShift in CUDA

This is an unusual way of performing an fftshift, but one that may be of interest to CUDA developers. The code for the 1D case is:


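// Multiplies element i by (-1)^i (an identifier cannot start with a digit)
__global__ void fftshift_1d(double2 *u_d, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N) {
        // (-1)^i without pow(): +1 for even i, -1 for odd i
        double a = 1 - 2 * (i & 1);
        u_d[i].x *= a;
        u_d[i].y *= a;
    }
}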

It consists of multiplying the vector to be transformed by the sequence 1, -1, 1, -1, ..., i.e. by (-1)^n = exp(-j*n*pi). By the DFT shift theorem this modulation is equivalent to a circular shift by N/2 in the conjugate domain, which for even N is exactly what fftshift does.
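For the record, here is the shift theorem behind this, written for the unnormalized DFT used by CUFFT, X[k] = sum_n x[n] exp(-j*2*pi*k*n/N), and an even length N:

```latex
\begin{aligned}
y[n] &= (-1)^n\, x[n] = e^{j\pi n}\, x[n], \\
Y[k] &= \sum_{n=0}^{N-1} x[n]\, e^{j\pi n}\, e^{-j 2\pi k n / N}
      = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi (k - N/2)\, n / N}
      = X\big[(k - N/2) \bmod N\big].
\end{aligned}
```

For even N, Y[k] = X[(k - N/2) mod N] is exactly fftshift(X). Swapping the roles of the two domains, multiplying X[k] by (-1)^k corresponds to circularly shifting x[n] by N/2.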

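To obtain a centered transform, i.e. the equivalent of fftshift(fft(ifftshift(x))), call this kernel both before and after the CUFFT execution: the call before the FFT shifts the output spectrum, while the call after it undoes the centering of the input. For even N the two approaches agree up to an overall factor (-1)^(N/2), so they are identical when N is a multiple of 4.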
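A minimal host-side sketch of the "before and after" pattern. The helper name centered_fft_1d, the block size of 256 threads and the choice of creating the plan inside the helper are assumptions made for illustration (in real code the plan would normally be created once and reused). The kernel is repeated so the block compiles on its own; link with -lcufft.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <cufft.h>

// Sign-flip kernel from above: multiplies element i by (-1)^i.
__global__ void fftshift_1d(double2 *u_d, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N) {
        double a = 1 - 2 * (i & 1);
        u_d[i].x *= a;
        u_d[i].y *= a;
    }
}

// Hypothetical helper: in-place centered forward FFT of N complex doubles
// already resident on the device. Returns the cuFFT status code.
cufftResult centered_fft_1d(cufftDoubleComplex *d_u, int N)
{
    assert(N % 2 == 0);  // the sign-flip trick requires an even length

    cufftHandle plan;
    cufftResult status = cufftPlan1d(&plan, N, CUFFT_Z2Z, 1);
    if (status != CUFFT_SUCCESS) return status;

    const int threads = 256;
    const int blocks = (N + threads - 1) / threads;

    // cufftDoubleComplex is a typedef of double2, so it can be passed as is.
    fftshift_1d<<<blocks, threads>>>(d_u, N);              // shifts the output spectrum
    status = cufftExecZ2Z(plan, d_u, d_u, CUFFT_FORWARD);  // in-place transform
    fftshift_1d<<<blocks, threads>>>(d_u, N);              // undoes the input centering

    cudaDeviceSynchronize();  // finish all queued work before releasing the plan
    cufftDestroy(plan);
    return status;
}
```

All three operations are queued on the default stream, so they execute in order without explicit synchronization between them. As a quick check, a centered impulse (x[N/2] = 1, everything else 0) with N = 8 should come back as all ones.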

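One advantage is that no memory movement or swapping is needed, since every element is simply scaled in place by ±1. The idea is also simple and extends immediately to the 2D case, where the sign pattern becomes the checkerboard (-1)^(i+j) (both N1 and N2 must be even); see below: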


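#define IDX2R(i,j,N) (((i)*(N))+(j))

// data holds N2 rows of N1 elements (row-major): i (x) is the column
// index and varies fastest in memory, j (y) is the row index.
__global__ void fftshift_2D(double2 *data, int N1, int N2)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    int j = threadIdx.y + blockDim.y * blockIdx.y;

    if (i < N1 && j < N2) {
        // checkerboard of +1/-1: (-1)^(i+j)
        double a = 1 - 2 * ((i + j) & 1);

        // row stride is N1, not blockDim.x*gridDim.x (which may exceed N1)
        data[IDX2R(j, i, N1)].x *= a;
        data[IDX2R(j, i, N1)].y *= a;
    }
}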

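A possible host-side driver for the 2D kernel, again only a sketch: the helper name centered_fft_2d and the 16×16 block size are assumptions. The detail worth noting is the argument order of cufftPlan2d: its nx argument is the slowest-varying (row) dimension and ny the contiguous one, so for an array indexed as data[j*N1 + i] the plan is created with (N2, N1). The kernel and macro are repeated so the block compiles on its own; link with -lcufft.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <cufft.h>

#define IDX2R(i,j,N) (((i)*(N))+(j))

// Checkerboard sign flip (-1)^(i+j) on an N2 x N1 row-major array.
__global__ void fftshift_2D(double2 *data, int N1, int N2)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;  // column, fastest in memory
    int j = threadIdx.y + blockDim.y * blockIdx.y;  // row
    if (i < N1 && j < N2) {
        double a = 1 - 2 * ((i + j) & 1);
        data[IDX2R(j, i, N1)].x *= a;
        data[IDX2R(j, i, N1)].y *= a;
    }
}

// Hypothetical helper: in-place centered 2D forward FFT on the device.
cufftResult centered_fft_2d(cufftDoubleComplex *d_data, int N1, int N2)
{
    assert(N1 % 2 == 0 && N2 % 2 == 0);  // both extents must be even

    cufftHandle plan;
    // nx = N2 rows (slowest varying), ny = N1 columns (contiguous)
    cufftResult status = cufftPlan2d(&plan, N2, N1, CUFFT_Z2Z);
    if (status != CUFFT_SUCCESS) return status;

    dim3 block(16, 16);
    dim3 grid((N1 + block.x - 1) / block.x, (N2 + block.y - 1) / block.y);

    fftshift_2D<<<grid, block>>>(d_data, N1, N2);
    status = cufftExecZ2Z(plan, d_data, d_data, CUFFT_FORWARD);
    fftshift_2D<<<grid, block>>>(d_data, N1, N2);

    cudaDeviceSynchronize();  // finish all queued work before releasing the plan
    cufftDestroy(plan);
    return status;
}
```

With a non-square size such as N1 = 8, N2 = 4 and an impulse at the center (row N2/2, column N1/2), the output should be all ones; swapping the two size arguments of cufftPlan2d breaks this, which makes it a handy sanity check for the layout.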

One thought on “Tricks and Tips: FFTShift in CUDA”

  1. I have implemented a library for executing the real fft shift on CUDA for 1D, 2D, and 3D flat arrays. You can find it here (

    You can also cite these papers:
    [1] Marwan Abdellah, Salah Saleh, Ayman Eldeib and Amr Shaaraw, “High Performance Multi-dimensional (2D/3D) FFT-Shift Implementation on Graphics Processing Units (GPUs)”, Proc. 6th Cairo International Biomedical Engineering Conf., Cairo, Egypt, 2012.
    [2] “CUFFTSHIFT: High Performance CUDA-accelerated FFT-Shift Library”, 22nd High Performance Computing Symposium (HPC 2014), Tampa, FL, USA, 2014.
