Optimizing the solution of the 2D diffusion (heat) equation in CUDA

On our GitHub page we are posting fully worked code that compares several ways of optimizing the solution of the 2D heat equation.
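As a reminder of what each kernel computes, here is a common explicit discretization: forward Euler in time and central differences in space, often called FTCS. It assumes a temperature field T on a uniform grid with spacings Δx and Δy, a time step Δt and a diffusivity α. These symbols are our own, and the posted code may use a different scheme or notation.

```latex
\frac{\partial T}{\partial t} = \alpha \left( \frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2} \right)

T^{n+1}_{i,j} = T^{n}_{i,j}
  + r_x \left( T^{n}_{i+1,j} - 2T^{n}_{i,j} + T^{n}_{i-1,j} \right)
  + r_y \left( T^{n}_{i,j+1} - 2T^{n}_{i,j} + T^{n}_{i,j-1} \right),
\qquad r_x = \frac{\alpha\,\Delta t}{\Delta x^2}, \quad r_y = \frac{\alpha\,\Delta t}{\Delta y^2}

r_x + r_y \le \tfrac{1}{2} \quad \text{(stability condition of the explicit scheme)}
```

Each update reads the point itself and its four neighbours, so adjacent threads load overlapping data. Exploiting this reuse is the aim of the shared-memory and texture variants.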

Five approaches are considered, using:

  1. Global memory, essentially the OP’s (original poster’s) approach;
  2. Shared memory of size BLOCK_SIZE_X x BLOCK_SIZE_Y, without loading the halo regions;
  3. Shared memory of size BLOCK_SIZE_X x BLOCK_SIZE_Y, loading the halo regions;
  4. Shared memory of size (BLOCK_SIZE_X + 2) x (BLOCK_SIZE_Y + 2), loading the halo regions;
  5. Texture memory.
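The sketch below covers approaches 1 and 4 only and is not the posted code. It makes several assumptions:

- an explicit five-point update with coefficients rx = αΔt/Δx² and ry = αΔt/Δy²;
- fixed (Dirichlet) boundary values;
- row-major float arrays;
- 16 x 16 tiles.

The names `heat_step_global`, `heat_step_shared_halo` and `max_error_vs_cpu` are hypothetical. Approaches 2, 3 and 5 are not shown.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <utility>
#include <vector>

// Assumed tile sizes (hypothetical values; tune them for the target GPU).
#define BLOCK_SIZE_X 16
#define BLOCK_SIZE_Y 16

// Approach 1 (global memory): every thread reads its 5-point stencil directly from
// global memory. Fields are row-major nx x ny arrays; boundary values stay fixed.
__global__ void heat_step_global(const float* __restrict__ Told, float* __restrict__ Tnew,
                                 int nx, int ny, float rx, float ry)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;   // x (column) index
    const int j = blockIdx.y * blockDim.y + threadIdx.y;   // y (row) index
    if (i >= nx || j >= ny) return;
    const int p = j * nx + i;
    if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1) { Tnew[p] = Told[p]; return; }
    Tnew[p] = Told[p] + rx * (Told[p + 1]  - 2.0f * Told[p] + Told[p - 1])
                      + ry * (Told[p + nx] - 2.0f * Told[p] + Told[p - nx]);
}

// Approach 4 (shared memory with halo): each block stages a
// (BLOCK_SIZE_X + 2) x (BLOCK_SIZE_Y + 2) tile and takes all neighbours from it.
// Must be launched with blockDim = (BLOCK_SIZE_X, BLOCK_SIZE_Y).
__global__ void heat_step_shared_halo(const float* __restrict__ Told, float* __restrict__ Tnew,
                                      int nx, int ny, float rx, float ry)
{
    __shared__ float tile[BLOCK_SIZE_Y + 2][BLOCK_SIZE_X + 2];
    const int tx = threadIdx.x, ty = threadIdx.y;
    const int i = blockIdx.x * BLOCK_SIZE_X + tx;
    const int j = blockIdx.y * BLOCK_SIZE_Y + ty;
    const bool inside = (i < nx && j < ny);

    if (inside) {
        tile[ty + 1][tx + 1] = Told[j * nx + i];                         // own point
        // Edge threads also fetch one halo point; corners are unused by a 5-point stencil.
        if (tx == 0 && i > 0)                     tile[ty + 1][0]      = Told[j * nx + i - 1];
        if (tx == BLOCK_SIZE_X - 1 && i + 1 < nx) tile[ty + 1][tx + 2] = Told[j * nx + i + 1];
        if (ty == 0 && j > 0)                     tile[0][tx + 1]      = Told[(j - 1) * nx + i];
        if (ty == BLOCK_SIZE_Y - 1 && j + 1 < ny) tile[ty + 2][tx + 1] = Told[(j + 1) * nx + i];
    }
    __syncthreads();   // placed before any early return so all threads reach it

    if (!inside) return;
    const int p = j * nx + i;
    const float c = tile[ty + 1][tx + 1];
    if (i == 0 || j == 0 || i == nx - 1 || j == ny - 1) { Tnew[p] = c; return; }
    Tnew[p] = c + rx * (tile[ty + 1][tx + 2] - 2.0f * c + tile[ty + 1][tx])
                + ry * (tile[ty + 2][tx + 1] - 2.0f * c + tile[ty][tx + 1]);
}

// Hypothetical helper: runs `steps` time steps on the GPU with the chosen kernel and
// returns the largest absolute difference from a CPU loop applying the same update.
float max_error_vs_cpu(int nx, int ny, int steps, bool use_shared)
{
    const float rx = 0.2f, ry = 0.2f;                 // rx + ry <= 1/2 keeps FTCS stable
    std::vector<float> a(nx * ny, 0.0f);
    for (int i = 0; i < nx; ++i) a[i] = 1.0f;         // hot edge along row j = 0
    std::vector<float> b = a;

    const size_t bytes = a.size() * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);

    const dim3 block(BLOCK_SIZE_X, BLOCK_SIZE_Y);
    const dim3 grid((nx + BLOCK_SIZE_X - 1) / BLOCK_SIZE_X, (ny + BLOCK_SIZE_Y - 1) / BLOCK_SIZE_Y);
    for (int s = 0; s < steps; ++s) {
        if (use_shared) heat_step_shared_halo<<<grid, block>>>(d_a, d_b, nx, ny, rx, ry);
        else            heat_step_global<<<grid, block>>>(d_a, d_b, nx, ny, rx, ry);
        std::swap(d_a, d_b);                          // ping-pong: new field becomes old
    }
    std::vector<float> gpu(a.size());
    cudaMemcpy(gpu.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_b);

    for (int s = 0; s < steps; ++s) {                 // CPU reference, same boundaries
        for (int j = 1; j < ny - 1; ++j)
            for (int i = 1; i < nx - 1; ++i) {
                const int p = j * nx + i;
                b[p] = a[p] + rx * (a[p + 1]  - 2.0f * a[p] + a[p - 1])
                            + ry * (a[p + nx] - 2.0f * a[p] + a[p - nx]);
            }
        std::swap(a, b);
    }
    float err = 0.0f;
    for (size_t k = 0; k < a.size(); ++k) err = std::fmax(err, std::fabs(a[k] - gpu[k]));
    return err;
}
```

Swapping the two device pointers after each step avoids a device-to-device copy. The halo loads are guarded so that grids whose size is not a multiple of the tile size, such as 70 x 45, are handled correctly.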

Anyone can run the code and check which approach is fastest on their own GPU architecture.
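One common way to make that comparison is to time the kernels with CUDA events. The sketch below is a minimal version of this; `time_gpu_ms` is a hypothetical helper, not part of the posted code. Its `launch` argument is any host callable that enqueues one time step, for example a lambda that launches one of the five kernels and then swaps the old and new buffers.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: average GPU time in milliseconds of one call of `launch`,
// measured with CUDA events on the default stream. `launch` is any host callable
// that enqueues work, e.g. a lambda that launches one time step and swaps buffers.
template <typename Launch>
float time_gpu_ms(Launch launch, int repetitions)
{
    launch();                                  // warm-up call, excluded from timing
    cudaDeviceSynchronize();

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);                    // default stream (stream 0)
    for (int r = 0; r < repetitions; ++r) launch();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                // block the host until `stop` is reached

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return repetitions > 0 ? ms / repetitions : 0.0f;
}
```

Events measure elapsed time on the GPU timeline, so host-side allocation and data transfers done outside `launch` are excluded. Repeating the measurement for several grid sizes is advisable, since the ranking of the approaches may change with the problem size.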
