Tricks and Tips: The NVCC -arch and -code options

A CUDA executable contains two types of program data: SASS code, which is essentially GPU machine code, and PTX, an intermediate language that is very close to machine code.
As long as PTX code is present in the executable, the driver can fall back on it: if it decides that no suitable SASS binary is available for the GPU the code will actually run on, it performs a Just-In-Time (JIT) compilation step at application launch and creates binary code for that device from the PTX in the application package.
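One way to get a feel for this is to ask the runtime what it loaded for a kernel. The following is a minimal sketch, not part of the original tip: it assumes a single GPU at device index 0, and the kernel name `noop` is a placeholder invented for illustration. It prints the `ptxVersion` and `binaryVersion` fields filled in by `cudaFuncGetAttributes` next to the device's compute capability.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (hypothetical name); it only exists so that there is
// a function whose attributes can be queried.
__global__ void noop() {}

int main() {
    // Assumption: the GPU of interest is device 0.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Querying the attributes makes the runtime load the kernel, either
    // from SASS in the executable or by JIT-compiling embedded PTX.
    cudaFuncAttributes attr;
    err = cudaFuncGetAttributes(&attr, noop);
    if (err != cudaSuccess) {
        // Typically seen when neither usable SASS nor usable PTX is present.
        std::fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }

    std::printf("device compute capability: %d.%d\n", prop.major, prop.minor);
    std::printf("ptxVersion:    %d\n", attr.ptxVersion);    // e.g. 70 means compute_70
    std::printf("binaryVersion: %d\n", attr.binaryVersion); // e.g. 80 means sm_80
    return 0;
}
```

If the executable carries only PTX for an older virtual architecture, `binaryVersion` would be expected to match the device while `ptxVersion` stays at the older value, which hints that a JIT step took place. Treat this as a hint rather than proof, since a matching `binaryVersion` can also come from SASS that was already in the executable.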
With the -arch and -code options, nvcc creates device code that conforms architecturally to the -arch type (the virtual architecture, named compute_XX) but is compiled to the machine-level instructions of the -code type (the real architecture, named sm_XX).
For example, compiling for arch = 1.2 and code = 2.0 (that is, -arch=compute_12 -code=sm_20) means that the double type cannot be used: double variables will be demoted to float, because double is not supported in a 1.2 architecture. The SASS machine code generated, however, will be ready to execute on a cc 2.0 device and will not require a JIT-compile step for that kind of device.
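The compute capabilities in that example are old, and recent toolkits no longer accept compute_12 or sm_20, so the sketch below swaps in newer values as an assumption: compute_70 for -arch and sm_80 for -code. The file name `arch_probe.cu` and the kernel name `report_arch` are hypothetical. The kernel reports `__CUDA_ARCH__`, which nvcc derives from the virtual architecture, so it gives a rough view of which -arch type the device code was written against, independent of the SASS target.

```cuda
// arch_probe.cu (hypothetical file name)
//
// Possible build lines, assuming a toolkit that supports these targets:
//   SASS for sm_80 only, no PTX embedded:
//     nvcc -arch=compute_70 -code=sm_80 arch_probe.cu -o arch_probe
//   SASS for sm_80 plus compute_70 PTX for JIT on other devices:
//     nvcc -arch=compute_70 -code=sm_80,compute_70 arch_probe.cu -o arch_probe
#include <cstdio>
#include <cuda_runtime.h>

__global__ void report_arch(int *out) {
#ifdef __CUDA_ARCH__
    // Defined only while compiling device code; its value comes from the
    // virtual architecture (compute_70 gives 700).
    *out = __CUDA_ARCH__;
#endif
}

int main() {
    int *d_out = nullptr;
    int h_out = 0;

    cudaError_t err = cudaMalloc(&d_out, sizeof(int));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d_out, 0, sizeof(int));

    report_arch<<<1, 1>>>(d_out);
    err = cudaGetLastError();                 // launch errors, e.g. no usable kernel image
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();        // errors raised while the kernel ran
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(d_out);
        return 1;
    }

    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    std::printf("__CUDA_ARCH__ seen by the kernel: %d\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

Built with the first command and run on a cc 8.0 device, the program is expected to report 700 even though the SASS targets sm_80. On a device for which the executable holds neither compatible SASS nor PTX, the launch should fail and the error branch should print the reason.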
So if a kernel is compiled for one device architecture and it successfully runs on another device architecture for which the executable contains no compatible SASS, it is due to the JIT-compile mechanism.
