Tricks and Tips: Obtaining CUDA assembly

PTX is an intermediate language designed to be portable across multiple GPU architectures, but it is not the ultimate machine code executed by the GPU.
Instead, the compiler component ptxas translates it into the final machine code, also referred to as SASS, for the specific target architecture.
The SASS actually executed by the GPU can be inspected by disassembling the compiled binary with the cuobjdump utility.
To do so, in a Visual Studio CUDA project go to:

Project -> Properties -> Configuration Properties -> CUDA C/C++ -> Common -> Keep Preprocessed Files -> choose Yes (--keep)

Build the project, then open a command window, go to the Release folder of your VS project (..\Project_Name\Project_Name\Release) and type:

 cuobjdump yourkernel.sm_21.cubin --dump-sass

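yourkernel.sm_21.cubin is one of the intermediate files kept by the setting above. It is a CUDA binary (cubin) holding the device code compiled for one specific architecture, in this case sm_21. cuobjdump can also be run on a fat binary or host executable, which may contain one or more device-specific binary images as well as (optionally) PTX.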

In the command window, you will obtain something like this:

Function : _Z11simple_copyPfPKf
.headerflags    @"EF_CUDA_SM20 EF_CUDA_PTX_SM(EF_CUDA_SM20)"
/*0000*/        MOV R1, c[0x1][0x100];               /* 0x2800440400005de4 */
/*0008*/        NOP;                                 /* 0x4000000000001de4 */
/*0010*/        MOV R0, c[0x0][0x14];                /* 0x2800400050001de4 */
/*0018*/        S2R R2, SR_CTAID.Y;                  /* 0x2c00000098009c04 */
/*0020*/        SHL R0, R0, 0x5;                     /* 0x6000c00014001c03 */
/*0028*/        S2R R3, SR_TID.Y;                    /* 0x2c0000008800dc04 */
/*0030*/        ISCADD R3, R2, R3, 0x5;              /* 0x400000000c20dca3 */
/*0038*/        S2R R4, SR_CTAID.X;                  /* 0x2c00000094011c04 */
/*0040*/        S2R R5, SR_TID.X;                    /* 0x2c00000084015c04 */
/*0048*/        ISCADD R2, R4, R5, 0x5;              /* 0x4000000014409ca3 */
/*0050*/        IMAD R2, R0, R3, R2;                 /* 0x200400000c009ca3 */
/*0058*/        ISCADD R0, R2, c[0x0][0x24], 0x2;    /* 0x4000400090201c43 */
/*0060*/        ISCADD R2, R2, c[0x0][0x20], 0x2;    /* 0x4000400080209c43 */
/*0068*/        LD R0, [R0];                         /* 0x8000000000001c85 */
/*0070*/        ST [R2], R0;                         /* 0x9000000000201c85 */
/*0078*/        EXIT ;                               /* 0x8000000000001de7 */

.....................................
