This is an unusual way of performing fftshift, but it may be of interest to CUDA developers. The code for the 1D case is:

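```cuda
// Multiplies element i by (-1)^i in place; launch with at least N threads in total.
// (C/C++ identifiers cannot start with a digit, so the kernel cannot be named 1dfftshift.)
__global__ void fftshift_1D(double2 *u_d, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N) {
        double a = 1 - 2 * (i & 1);   // +1 for even i, -1 for odd i (avoids calling pow)
        u_d[i].x *= a;
        u_d[i].y *= a;
    }
}
```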

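It consists of multiplying the vector to be transformed by an alternating sequence of 1s and -1s, i.e. by (-1)^n = exp(-j*pi*n) = exp(-j*2*pi*n*(N/2)/N). By the DFT shift theorem, this is equivalent to a circular shift by N/2 in the conjugate domain, which is exactly what fftshift does when N is even. For odd N the shift would be by a non-integer number of samples, so the trick does not reproduce fftshift.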

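Calling the kernel only before the CUFFT execution yields fftshift(fft(x)). To get a transform that is centered on both sides, i.e. the equivalent of fftshift(fft(ifftshift(x))), call the kernel both before and after the CUFFT execution. This is exact when N is a multiple of 4; when N is 2 modulo 4 the result differs from the shifted transform by an overall factor of -1.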
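Both statements can be checked on the host without a GPU. The following is a minimal sketch in plain C++ (it also compiles under nvcc); the helper names `dft`, `half_shift`, `alternate_signs` and `max_abs_diff` are hypothetical, and the naive O(N^2) DFT stands in for CUFFT only so that the check is self-contained. It uses the same sign convention as a forward CUFFT transform.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Vec = std::vector<cplx>;

// Naive O(N^2) forward DFT: X[k] = sum_n x[n] * exp(-2*pi*i*n*k/N)
Vec dft(const Vec& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    Vec X(N);
    for (std::size_t k = 0; k < N; ++k) {
        cplx acc = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            acc += x[n] * std::polar(1.0, -2.0 * pi * double((n * k) % N) / double(N));
        X[k] = acc;
    }
    return X;
}

// Data-moving circular shift by N/2; for even N this is both fftshift and ifftshift
Vec half_shift(const Vec& in) {
    const std::size_t N = in.size();
    assert(N % 2 == 0 && "the sign trick only matches fftshift for even N");
    Vec out(N);
    for (std::size_t k = 0; k < N; ++k) out[k] = in[(k + N / 2) % N];
    return out;
}

// Host-side equivalent of the 1D kernel: multiply element n by (-1)^n
Vec alternate_signs(Vec v) {
    for (std::size_t n = 0; n < v.size(); ++n) v[n] *= double(1 - 2 * int(n & 1));
    return v;
}

// Largest elementwise distance between two vectors of equal size
double max_abs_diff(const Vec& a, const Vec& b) {
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) m = std::fmax(m, std::abs(a[i] - b[i]));
    return m;
}
```

For a vector `x` of length 8, `max_abs_diff(dft(alternate_signs(x)), half_shift(dft(x)))` and `max_abs_diff(alternate_signs(dft(alternate_signs(x))), half_shift(dft(half_shift(x))))` are both at rounding-error level. For length 6 the second comparison only holds after negating one side, which is the (-1)^(N/2) factor.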

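One advantage is that no elements are moved or swapped between memory locations: each element is simply scaled in place. The idea is also simple and extends immediately to the 2D case, where the sign pattern becomes the checkerboard (-1)^(i+j) (again assuming even sizes along both dimensions), see below: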

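```cuda
#define IDX2R(i,j,N) (((i)*(N))+(j))   // row-major index: row i, column j, row length N

// Multiplies element (i, j) by (-1)^(i+j) in place.
// data is row-major with N2 rows of length N1.
__global__ void fftshift_2D(double2 *data, int N1, int N2)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;   // column index, 0 <= i < N1
    int j = threadIdx.y + blockDim.y * blockIdx.y;   // row index,    0 <= j < N2
    if (i < N1 && j < N2) {
        double a = 1 - 2 * ((i + j) & 1);            // +1/-1 checkerboard
        // Index with N1, not blockDim.x*gridDim.x, so that a grid wider
        // than N1 still addresses the correct element.
        data[IDX2R(j, i, N1)].x *= a;
        data[IDX2R(j, i, N1)].y *= a;
    }
}
```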
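The 2D identity can be checked the same way on the host. This is a minimal sketch in plain C++, assuming the row-major layout of the kernel above (element (i, j) at index j*N1 + i) and even N1 and N2; `dft2`, `fftshift2`, `checkerboard` and `max_abs_diff` are hypothetical helpers, and the naive 2D DFT stands in for CUFFT only to keep the example self-contained.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;
using Grid = std::vector<cplx>;   // row-major: N2 rows of length N1, element (i, j) at j*N1 + i

// Naive 2D forward DFT: X[l][k] = sum_j sum_i x[j][i] * exp(-2*pi*i*(i*k/N1 + j*l/N2))
Grid dft2(const Grid& x, std::size_t N1, std::size_t N2) {
    const double pi = std::acos(-1.0);
    Grid X(N1 * N2);
    for (std::size_t l = 0; l < N2; ++l)
        for (std::size_t k = 0; k < N1; ++k) {
            cplx acc = 0.0;
            for (std::size_t j = 0; j < N2; ++j)
                for (std::size_t i = 0; i < N1; ++i) {
                    double phase = double((i * k) % N1) / double(N1) +
                                   double((j * l) % N2) / double(N2);
                    acc += x[j * N1 + i] * std::polar(1.0, -2.0 * pi * phase);
                }
            X[l * N1 + k] = acc;
        }
    return X;
}

// Data-moving 2D fftshift for even N1, N2: swaps the quadrants diagonally
Grid fftshift2(const Grid& in, std::size_t N1, std::size_t N2) {
    assert(N1 % 2 == 0 && N2 % 2 == 0);
    Grid out(N1 * N2);
    for (std::size_t j = 0; j < N2; ++j)
        for (std::size_t i = 0; i < N1; ++i)
            out[j * N1 + i] = in[((j + N2 / 2) % N2) * N1 + (i + N1 / 2) % N1];
    return out;
}

// Host-side equivalent of fftshift_2D: multiply element (i, j) by (-1)^(i+j)
Grid checkerboard(Grid g, std::size_t N1, std::size_t N2) {
    for (std::size_t j = 0; j < N2; ++j)
        for (std::size_t i = 0; i < N1; ++i)
            g[j * N1 + i] *= double(1 - 2 * int((i + j) & 1));
    return g;
}

// Largest elementwise distance between two grids of equal size
double max_abs_diff(const Grid& a, const Grid& b) {
    double m = 0.0;
    for (std::size_t p = 0; p < a.size(); ++p) m = std::fmax(m, std::abs(a[p] - b[p]));
    return m;
}
```

Because (-1)^(i+j) = (-1)^i * (-1)^j, the 2D sign pattern factors into the 1D trick applied along each axis, so `dft2(checkerboard(x, N1, N2), N1, N2)` matches `fftshift2(dft2(x, N1, N2), N1, N2)` to rounding error.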

I have implemented a library that performs the actual FFT shift (rearranging the data in memory) on CUDA for 1D, 2D, and 3D flat arrays. You can find it here: https://github.com/marwan-abdellah/cufftShift

You can also cite these papers:

[1] Marwan Abdellah, Salah Saleh, Ayman Eldeib and Amr Shaaraw, “High Performance Multi-dimensional (2D/3D) FFT-Shift Implementation on Graphics Processing Units (GPUs)”, Proc. 6th Cairo International Biomedical Engineering Conf., Cairo, Egypt, 2012.

[2] “CUFFTSHIFT: High Performance CUDA-accelerated FFT-Shift Library”, 22nd High Performance Computing Symposium (HPC 2014), Tampa, FL, USA, 2014.