I am trying to profile an x86 assembly program on Ubuntu 12.04 and would like to use the rdtsc instruction. According to a comment, I should get the number of cycles in rdx, but with the following code the number I get is far too high:

SECTION .bss

SECTION .data

SECTION .text


global main         

main:           
nop

cpuid
rdtsc
shl rdx, 32
or rdx, rax
mov r8, rdx

xor esi,esi
mov esi,19        ; instructions to be monitored


cpuid
rdtsc
shl rdx, 32
or rdx, rax
sub rdx, r8

Running it in a debugger I get the following results on registers after the sub instruction:

rax     0xd88102bc
rbx     0x0
rcx     0xf0
rdx     0x44f3914a0
rsi     0x13
rdi     0x1
rbp     0x0
rsp     0x7fffffffdf38
r8      0x11828947ee1c

I can't figure out why the number of cycles in rdx is so high for such simple instructions. Is the right number in rcx? Isn't that too high as well?

Thanks in advance

1 Answer

I'm not sure what's happening, but when you call C functions from assembler on a platform whose C compiler prepends an underscore to symbol names (macOS and 32-bit Windows, for example), you need to add that prefix yourself, for example call _clock. Linux ELF targets add no such prefix, so there the call is simply call clock.

Additionally, as you're on a 64-bit architecture, the 64-bit result should end up in rax. Make sure you're looking at that register, not at eax and ebx.

Finally, rather than using clock, I'd suggest the rdtsc instruction. It returns a 64-bit result in edx:eax. The value is relative rather than absolute and is measured in cycles rather than in fractions of a second, but it should be exactly what you need for profiling.

Example:

cpuid
rdtsc
shl rdx, 32
or rdx, rax
mov r8, rdx
<expensive assembler code>
cpuid
rdtsc
shl rdx, 32
or rdx, rax
sub rdx, r8

This will leave the number of ticks that elapsed in rdx. The cpuid instructions are to prevent the processor from reordering instructions around the profiling points.
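The register arithmetic in the example can be checked outside the assembler. Below is a minimal sketch in C of the same two steps; the function names tsc_combine and tsc_elapsed are hypothetical, chosen here for illustration, and the code only models the arithmetic, it does not read the counter itself.

```c
#include <stdint.h>

/* Combine the two 32-bit halves that rdtsc returns (high half in edx,
   low half in eax) into one 64-bit tick count. This mirrors
   "shl rdx, 32" followed by "or rdx, rax". */
uint64_t tsc_combine(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Ticks elapsed between two combined readings; mirrors "sub rdx, r8".
   Unsigned subtraction stays correct even if the counter wrapped once. */
uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}
```

For instance, tsc_combine(0x1182, 0x8947ee1c) yields 0x11828947ee1c, the value held in r8 in the question's register dump. Keep in mind that cpuid overwrites eax, ebx, ecx and edx, so anything kept in those registers across the measurement points has to be saved elsewhere.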

Comments

I tried it the way you suggest, but I had to make some modifications because of operand-size errors. First I changed the add instruction to add edx,eax and then to add rdx,rax, because operands of different sizes give an error. I got xa5e996bc and 2c4b89fe072 in rdx, respectively. Aren't these values too high, given that I put your instructions around a simple xor and an assignment? Which one is correct? I'd say the one with eax,edx, but the number still seems too high to me.
You're right about the mistake in the assembler code. I've corrected it so that it clears rax first, then adds the whole of rax, not just eax. You can't just do add eax, edx because the carry won't propagate correctly. If you're doing this correctly then the result in rdx should be a relatively small number. Make sure you're not clobbering rcx in your code in the middle.
Excuse me, I'm probably not getting something right. I did exactly as you wrote and got a number on the order of 1000 000 000 000 000 in rdx.
...As I noticed that rcx got clobbered by the second cpuid instruction, I tried pushing it just before the second cpuid and popping it just before the sub instruction, so that the subtraction uses the old value. In this case I got a number on the order of 22 000 000 000 which, if I have understood what I'm doing, means that on a 2.86 GHz CPU the whole process should take more or less 10 seconds. In fact my process finishes in the blink of an eye. Sorry again, but I'm rather puzzled by this behavior and I'm not sure I'm doing it right...
I'm not sure. I'd suggest you post a minimal yet complete example as a new question. That'll make it easier for people to look at. If it turns out there's a problem with the code in this answer I'll edit it or you can.
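
The time estimate made in the comments can be checked with a small sketch in C. It assumes the counter ticks at a constant, known rate equal to the quoted 2.86 GHz, which is an assumption and may not hold on every CPU; the function name ticks_to_seconds and its tsc_hz parameter are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to seconds.
   ASSUMPTION: the counter advances at a constant, known rate tsc_hz
   (hypothetical parameter, in ticks per second). */
double ticks_to_seconds(uint64_t ticks, double tsc_hz)
{
    assert(tsc_hz > 0.0);   /* a zero or negative rate makes no sense */
    return (double)ticks / tsc_hz;
}
```

Under that assumption, 22 000 000 000 ticks at 2.86 GHz come to about 7.7 seconds, and the 0x44f3914a0 ticks from the question's register dump to roughly 6.5 seconds. Durations of several seconds would be consistent with the counter continuing to run while the program sits paused between single steps in a debugger, although that is an inference and nothing in the thread confirms it.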