Tricks and Tips: The NVCC -arch and -code options

A CUDA executable contains two kinds of program data: SASS, which is essentially GPU machine code, and PTX, an intermediate language that sits very close to machine code. As long as PTX is present in the executable, the driver has a fallback: if it finds no suitable SASS binary for the GPU the code will actually run on, it performs a Just-In-Time (JIT) compilation step at application launch to create binary code appropriate for that device, using the ...
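As a rough illustration of the SASS/PTX split described above, the sketch below assembles an nvcc command line that embeds both SASS for one real architecture and PTX for the matching virtual architecture, so the driver can JIT-compile on GPUs the SASS does not cover. The file names `kernel.cu` and `app` and the compute capability 75 are assumptions chosen for illustration; pick a value your installed toolkit supports.

```shell
# Hypothetical source file, output name and compute capability.
SRC=kernel.cu
OUT=app
SM=75

# -arch names the virtual architecture the PTX is generated for.
# -code lists what is embedded in the executable:
#   sm_XX      -> SASS (machine code) for that real architecture
#   compute_XX -> the PTX itself, kept so the driver can JIT it later
NVCC_CMD="nvcc -arch=compute_${SM} -code=sm_${SM},compute_${SM} -o ${OUT} ${SRC}"

echo "$NVCC_CMD"

# Only run the compiler when it is installed and the source file exists.
if command -v nvcc >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $NVCC_CMD
fi
```

Dropping `compute_${SM}` from the `-code` list would leave only SASS in the executable, so no JIT fallback would be available on other architectures.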

Tricks and Tips: Obtaining CUDA assembly

PTX is an intermediate language designed to be portable across multiple GPU architectures, but it is not the machine code that the GPU ultimately executes. Instead, the compiler component PTXAS compiles it into the final machine code, also referred to as SASS, for the particular target architecture. That machine code can be obtained by disassembling the binary with the cuobjdump utility. To do so, in a Visual Studio CUDA project go to: Project -> Properties ->...
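Outside Visual Studio, the same disassembly can be sketched from a command prompt. This is not the project-property route the post describes, only a minimal command-line alternative; the binary name `app` and the output file names are assumptions for illustration.

```shell
# Hypothetical name of an already built CUDA executable.
BIN=app

# -sass disassembles the embedded machine code (SASS);
# -ptx extracts any PTX embedded in the same binary.
SASS_CMD="cuobjdump -sass ${BIN}"
PTX_CMD="cuobjdump -ptx ${BIN}"

echo "$SASS_CMD"
echo "$PTX_CMD"

# Only run the tool when it is installed and the binary exists.
if command -v cuobjdump >/dev/null 2>&1 && [ -f "$BIN" ]; then
    $SASS_CMD > "${BIN}.sass"
    $PTX_CMD > "${BIN}.ptx"
fi
```

Comparing the two output files side by side shows how each PTX instruction sequence was lowered to SASS for the architecture the binary was built for.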