I have the following function with inline assembly that works fine in debug mode in 32-bit Visual Studio 2008:
__device__ void add(int* pa, int* pb)
{
    asm(".reg .u32 s<3>;"::);
    asm(".reg .u32 r<14>;"::);
    asm("ld.global.b32 s0, [%0];"::"r"(&pa)); //load addresses of pa, pb
    printf(...);
    asm("ld.global.b32 s1, [%0];"::"r"(&pb));
    printf(...);
    asm("ld.global.b32 r1, [s0+8];"::);
    printf(...);
    asm("ld.global.b32 r2, [s1+8];"::);
    printf(...);
    ...// perform some operations
}
pa and pb are globally allocated on the device, for example:
__device__ int pa[3] = {0, 0x927c0000, 0x20000011};
__device__ int pb[3] = {0, 0xbb900000, 0x2000000b};
However, this code fails in release mode, on the line asm("ld.global.b32 r1, [s0+8];"::);
How can I load function parameters correctly with inline PTX in release mode?
P.S. Building in release mode with the -G flag (generate GPU debug info) makes the code run correctly. Thank you.
pa may be getting optimized into a register in release mode, and you can't take the address of a register. That's just a guess. It would be helpful if you defined whether the failure you are observing is at compile time or run time. If at compile time, what error exactly are you getting? If at runtime, how have you localized to this line of code?

Why do you want to take the address of pa anyway? Taking the address of a function parameter seems odd. To modify pa in the calling context, you need to pass a pointer to it, i.e. **pa

Did you mean *pa or &(pa[0]) or pa?

asm("mov.b32 s0, %0;"::"r"(pa)); ?
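
To make the last suggestion concrete, here is a minimal sketch of passing the pointer value itself (pa, not &pa) to inline PTX as an input operand and receiving the result through an output operand, so the compiler picks the registers in both debug and release builds. It is an illustration under assumptions, not a verified fix for the original code: the names PTR_REG, load_third, read_third and run_example are hypothetical, the arrays are declared unsigned int so the initializers fit, and ld.global assumes the pointers really refer to global memory.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper macro: inline-asm constraint for a pointer operand.
// "l" is a 64-bit register, "r" a 32-bit one (as in a 32-bit build).
#if defined(_WIN64) || defined(__LP64__)
#define PTR_REG "l"
#else
#define PTR_REG "r"
#endif

// Same data as in the question, declared unsigned (an assumption made here).
__device__ unsigned int pa[3] = {0, 0x927c0000u, 0x20000011u};
__device__ unsigned int pb[3] = {0, 0xbb900000u, 0x2000000bu};

// Loads the word at byte offset 8 (element [2]) from each array.
// The pointer VALUE is the input operand; no address of a parameter is taken.
__device__ void load_third(const unsigned int* a, const unsigned int* b,
                           unsigned int* out_a, unsigned int* out_b)
{
    unsigned int va, vb;
    asm volatile("ld.global.u32 %0, [%1+8];" : "=r"(va) : PTR_REG(a) : "memory");
    asm volatile("ld.global.u32 %0, [%1+8];" : "=r"(vb) : PTR_REG(b) : "memory");
    *out_a = va;
    *out_b = vb;
}

__global__ void read_third(unsigned int* out)
{
    load_third(pa, pb, &out[0], &out[1]);
}

// Hypothetical host wrapper: runs the kernel and copies both words back.
cudaError_t run_example(unsigned int out[2])
{
    unsigned int* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, 2 * sizeof(unsigned int));
    if (err != cudaSuccess) return err;
    read_third<<<1, 1>>>(d_out);
    err = cudaMemcpy(out, d_out, 2 * sizeof(unsigned int),
                     cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err;
}

int main()
{
    unsigned int out[2] = {0, 0};
    cudaError_t err = run_example(out);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("pa[2] = 0x%08x, pb[2] = 0x%08x\n", out[0], out[1]);
    return 0;
}
```

Because every value crosses the asm boundary through a declared operand, nothing depends on whether the optimizer keeps a parameter in a register or in local memory, which is the difference the first comment guesses at between debug and release builds.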