How to create a high resolution timer in Linux to measure program performance?

Question

I'm trying to compare GPU to CPU performance. For the NVIDIA GPU I've been using cudaEvent_t events to get very precise timings.

For the CPU I've been using the following code:

// Timers
clock_t start, stop;
float elapsedTime = 0;

// Capture the start time

start = clock();

// Do something here
// ... (code being timed) ...

// Capture the stop time
stop = clock();
// Retrieve time elapsed in milliseconds
elapsedTime = (float)(stop - start) / (float)CLOCKS_PER_SEC * 1000.0f;

Apparently, that piece of code is only good if you're counting in seconds. Also, the results sometimes come out quite strange.

Does anyone know of some way to create a high resolution timer in Linux?
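For comparison, here is a minimal sketch of the same timed region written with clock_gettime and CLOCK_MONOTONIC instead of clock(). clock() reports the processor time used by the program (in CLOCKS_PER_SEC units), so time the CPU spends waiting, for example on a GPU, is not counted, which may be part of why the results look strange. The helper names elapsed_ms and time_workload_ms and the stand-in loop are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Milliseconds elapsed between two CLOCK_MONOTONIC readings.
   CLOCK_MONOTONIC measures wall-clock time and is not affected by
   changes to the system clock. */
double elapsed_ms(const struct timespec *start, const struct timespec *stop)
{
    return (double)(stop->tv_sec - start->tv_sec) * 1000.0
         + (double)(stop->tv_nsec - start->tv_nsec) / 1.0e6;
}

/* Hypothetical helper: the question's timed region rewritten with
   clock_gettime. Replace the loop with the real workload. */
double time_workload_ms(void)
{
    struct timespec start, stop;
    volatile unsigned long sink = 0;   /* volatile: keep the loop from being optimised away */

    clock_gettime(CLOCK_MONOTONIC, &start);        /* capture the start time */
    for (unsigned long i = 0; i < 1000000UL; i++)  /* stand-in workload */
        sink += i;
    clock_gettime(CLOCK_MONOTONIC, &stop);         /* capture the stop time */

    return elapsed_ms(&start, &stop);
}
```

Because the difference is computed from whole seconds plus nanoseconds, the result stays correct when tv_nsec wraps around between the two readings.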

See this question: stackoverflow.com/questions/700392/…

– Steve-o, Commented Jul 19, 2011 at 15:26

h3ct0r · Accepted Answer · 2018-05-20 21:10:48Z

70

Check out clock_gettime, which is a POSIX interface to high-resolution timers.

If, having read the manpage, you're left wondering about the difference between CLOCK_REALTIME and CLOCK_MONOTONIC, see "Difference between CLOCK_REALTIME and CLOCK_MONOTONIC?".

See the following page for a complete example: http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/

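#include <iostream>
#include <iomanip>
#include <time.h>
using namespace std;

timespec diff(timespec start, timespec end);

int main()
{
    timespec time1, time2;
    // volatile keeps the compiler from optimising the busy loop away
    volatile int temp = 0;
    // CLOCK_PROCESS_CPUTIME_ID measures the CPU time used by this process
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time1);
    for (int i = 0; i < 242000000; i++)
        temp = temp + 1;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &time2);
    timespec elapsed = diff(time1, time2);
    // pad tv_nsec to 9 digits, otherwise 5000 ns would print as 0:5000
    cout << elapsed.tv_sec << ":" << setw(9) << setfill('0') << elapsed.tv_nsec << endl;
    return 0;
}

timespec diff(timespec start, timespec end)
{
    timespec temp;
    if ((end.tv_nsec - start.tv_nsec) < 0) {
        // borrow one second from tv_sec
        temp.tv_sec = end.tv_sec - start.tv_sec - 1;
        temp.tv_nsec = 1000000000 + end.tv_nsec - start.tv_nsec;
    } else {
        temp.tv_sec = end.tv_sec - start.tv_sec;
        temp.tv_nsec = end.tv_nsec - start.tv_nsec;
    }
    return temp;
}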
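For plain C (the example above is C++), a hedged port could look like the sketch below. It uses CLOCK_MONOTONIC, so it measures wall time, which is usually what a CPU-vs-GPU comparison needs; swap in CLOCK_PROCESS_CPUTIME_ID to measure CPU time as above. The names timespec_diff, print_timespec and time_busy_loop are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* end - start, normalised so that 0 <= tv_nsec < 1000000000 */
struct timespec timespec_diff(struct timespec start, struct timespec end)
{
    struct timespec temp;
    if (end.tv_nsec - start.tv_nsec < 0) {
        /* borrow one second from tv_sec */
        temp.tv_sec = end.tv_sec - start.tv_sec - 1;
        temp.tv_nsec = 1000000000L + end.tv_nsec - start.tv_nsec;
    } else {
        temp.tv_sec = end.tv_sec - start.tv_sec;
        temp.tv_nsec = end.tv_nsec - start.tv_nsec;
    }
    return temp;
}

/* Print seconds.nanoseconds with tv_nsec zero-padded to nine digits,
   so 5000 ns prints as 0.000005000 rather than 0.5000. */
void print_timespec(struct timespec t)
{
    printf("%lld.%09ld s\n", (long long)t.tv_sec, (long)t.tv_nsec);
}

/* Time a busy loop of `iterations` steps with CLOCK_MONOTONIC. */
struct timespec time_busy_loop(int iterations)
{
    struct timespec time1, time2;
    volatile int temp = 0;   /* volatile: keep the loop from being optimised away */

    clock_gettime(CLOCK_MONOTONIC, &time1);
    for (int i = 0; i < iterations; i++)
        temp = temp + 1;
    clock_gettime(CLOCK_MONOTONIC, &time2);
    return timespec_diff(time1, time2);
}
```

A caller would do print_timespec(time_busy_loop(242000000)); to reproduce the original experiment.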

edited May 20, 2018 at 21:10

h3ct0r

answered Jul 19, 2011 at 15:27

NPE

7 Comments

sj755 Over a year ago

Just so I'm clear about what I've read, can you give me an example on how you would use clock_gettime to find the time elapsed in nanoseconds?

NPE Over a year ago

@seljuq70: I've added a link to a complete example.

Owl Over a year ago

The OP posted C, but your answer is C++. Still useful, but not on my ZedBoard, which has no C++ libs :D To fix it, prefix timespec with struct and strip out the couts.

itMaxence Over a year ago

So the answer explicitly talks about CLOCK_REALTIME and CLOCK_MONOTONIC, but the code sample uses CLOCK_PROCESS_CPUTIME_ID? Can someone clear this up? Which one should I use?

jplozier Over a year ago

@itMaxence Check this out: stackoverflow.com/a/3527632/9732482

damian · Accepted Answer · 2018-02-15 14:06:54Z

21

To summarise the information presented so far, these are the two functions required for typical applications.

#include <time.h>

// call this function to start a timer (values are in nanoseconds;
// the actual resolution depends on the clock)
struct timespec timer_start(void){
    struct timespec start_time;
    // CLOCK_PROCESS_CPUTIME_ID measures CPU time used by this process, not wall time
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);
    return start_time;
}

// call this function to end a timer, returning nanoseconds elapsed as a long
// (long is 64-bit on 64-bit Linux; a 32-bit long overflows after about 2.1 s)
long timer_end(struct timespec start_time){
    struct timespec end_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end_time);
    long diffInNanos = (end_time.tv_sec - start_time.tv_sec) * (long)1e9 + (end_time.tv_nsec - start_time.tv_nsec);
    return diffInNanos;
}

Here is an example of how to use them to time how long it takes to calculate the variance of a list of inputs.

struct timespec vartime = timer_start();  // begin a timer called 'vartime'
double variance = var(input, MAXLEN);  // perform the task we want to time
long time_elapsed_nanos = timer_end(vartime);
printf("Variance = %f, Time taken (nanoseconds): %ld\n", variance, time_elapsed_nanos);
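Picking up the wall-time point raised in the comments below, here is a hedged variant of the two helpers that takes the clock ID as a parameter and returns int64_t nanoseconds, so the result cannot overflow where long is 32 bits. The names timer_start_clock and timer_end_clock are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Start a timer on the given clock: CLOCK_MONOTONIC for wall time,
   CLOCK_PROCESS_CPUTIME_ID for CPU time consumed by the process. */
struct timespec timer_start_clock(clockid_t clk)
{
    struct timespec start_time;
    clock_gettime(clk, &start_time);
    return start_time;
}

/* End a timer started on the same clock; returns elapsed nanoseconds. */
int64_t timer_end_clock(clockid_t clk, struct timespec start_time)
{
    struct timespec end_time;
    clock_gettime(clk, &end_time);
    return (int64_t)(end_time.tv_sec - start_time.tv_sec) * 1000000000LL
         + (end_time.tv_nsec - start_time.tv_nsec);
}
```

For the variance example, you would call timer_start_clock(CLOCK_MONOTONIC) before var(...) and timer_end_clock(CLOCK_MONOTONIC, vartime) after it, printing the result with %lld after casting to long long.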

edited Feb 15, 2018 at 14:06

damian

answered Nov 11, 2013 at 3:07

Alex

4 Comments

amaurea Over a year ago

Aren't you ignoring the tv_sec of the timespec? Also, why CLOCK_PROCESS_CPUTIME_ID rather than CLOCK_MONOTONIC?

TimZaman Over a year ago

The poster is comparing CPU to GPU performance, yet your code measures CPU time (CLOCK_PROCESS_CPUTIME_ID). This means he will see (misleading) speed-ups of many orders of magnitude. For CPU/GPU performance comparisons (this question), always use wall time. Remove this answer.

Alex Over a year ago

@TimZaman Yep, realtime might be better in the poster's use case. I'm not going to take down an answer though, obviously people have found it useful. Cheers.

fredk Over a year ago

Before using CLOCK_PROCESS_CPUTIME_ID you should run grep constant_tsc /proc/cpuinfo to understand how this clock works. If your CPU does not support constant_tsc, the time reflects actual CPU clock cycles. If the flag is set, the clock is adjusted to account for current CPU frequency. I give this a -1 because time_elapsed_nanos is incorrectly calculated. This may be a better approach.

Karoly Horvath · Accepted Answer · 2011-07-19 15:28:40Z

1

struct timespec t;
clock_gettime(CLOCK_REALTIME, &t);

There is also CLOCK_REALTIME_HR, but I'm not sure whether it makes any difference.
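Whether a clock ID is available, and how fine its resolution is, can be checked at run time with clock_getres(). A minimal sketch (the helper names are hypothetical) that reports the resolution of the standard POSIX clocks; CLOCK_REALTIME_HR is left out because it may not be defined in your <time.h>.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

/* Resolution the kernel reports for a clock, in nanoseconds,
   or -1 if the clock is not supported. */
long long clock_resolution_ns(clockid_t clk)
{
    struct timespec res;
    if (clock_getres(clk, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* Print the reported resolution of a few standard clocks. */
void print_clock_resolutions(void)
{
    printf("CLOCK_REALTIME:           %lld ns\n", clock_resolution_ns(CLOCK_REALTIME));
    printf("CLOCK_MONOTONIC:          %lld ns\n", clock_resolution_ns(CLOCK_MONOTONIC));
    printf("CLOCK_PROCESS_CPUTIME_ID: %lld ns\n", clock_resolution_ns(CLOCK_PROCESS_CPUTIME_ID));
}
```

On kernels with high-resolution timers the reported value is typically small, but treat whatever your system prints as the authority.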

answered Jul 19, 2011 at 15:28

Karoly Horvath

1 Comment

gsamaras Over a year ago

And I am not sure whether CLOCK_REALTIME_HR is supported. Question.

Foo Bah · Accepted Answer · 2011-07-19 22:41:16Z

1

Are you interested in wall time (how much time actually elapses) or cycle count (how many cycles)? In the first case, you should use something like gettimeofday.

The highest resolution timer uses the RDTSC x86 assembly instruction. However, this measures clock ticks, so you should be sure that power saving mode is disabled.

The wiki page for TSC gives a few examples: http://en.wikipedia.org/wiki/Time_Stamp_Counter
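A minimal sketch of the gettimeofday approach for wall time, assuming microsecond resolution is enough. Note that POSIX.1-2008 marks gettimeofday as obsolescent and that it follows the adjustable system clock, so clock_gettime(CLOCK_MONOTONIC) is generally preferable for measuring intervals. elapsed_us and time_loop_us are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock microseconds between two gettimeofday() readings. */
long long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (end->tv_usec - start->tv_usec);
}

/* Time a stand-in busy loop of `iterations` steps. */
long long time_loop_us(unsigned long iterations)
{
    struct timeval start, end;
    volatile unsigned long sink = 0;   /* volatile: keep the loop from being optimised away */

    gettimeofday(&start, NULL);
    for (unsigned long i = 0; i < iterations; i++)
        sink += i;
    gettimeofday(&end, NULL);
    return elapsed_us(&start, &end);
}
```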

answered Jul 19, 2011 at 22:41

Foo Bah

3 Comments

Peter Cordes Over a year ago

On a modern CPU, rdtsc correlates 1:1 with wall-clock time, not core clock cycles. It doesn't pause when your process (or the whole CPU) is sleeping, and it runs at constant frequency regardless of turbo / power-saving. Use performance counters to measure actual core clock cycles. e.g. perf stat awk 'BEGIN {for (i=0 ; i<10000000; i++){}}'.

radato Over a year ago

I am actually interested in wall time. Your reply hit the spot!

radato Over a year ago

Is it possible to link your reply to my original comment?

radato · Accepted Answer · 2022-03-27 05:10:57Z

1

After reading this thread, I started testing the code for clock_gettime against C++11's chrono, and they don't seem to match.

There is a huge gap between them!

The std::chrono::seconds(1) seems to be equivalent to ~70,000 of the clock_gettime

#include <ctime>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <thread>
#include <chrono>
#include <iomanip>
#include <vector>
#include <mutex>

timespec diff(timespec start, timespec end);
timespec get_cpu_now_time();
std::vector<timespec> get_start_end_pairs();
std::vector<timespec> get_start_end_pairs2();
void output_deltas(const std::vector<timespec> &start_end_pairs);

//=============================================================
int main()
{
    std::cout << "Hello waiter" << std::endl; // flush is intentional
    std::vector<timespec> start_end_pairs = get_start_end_pairs2();
    output_deltas(start_end_pairs);

    return EXIT_SUCCESS;
}

//=============================================================
std::vector<timespec> get_start_end_pairs()
{
    std::vector<timespec> start_end_pairs;
    for (int i = 0; i < 20; ++i)
    {
        start_end_pairs.push_back(get_cpu_now_time());
        std::this_thread::sleep_for(std::chrono::seconds(1));
        start_end_pairs.push_back(get_cpu_now_time());
    }

    return start_end_pairs;
}


//=============================================================
std::vector<timespec> get_start_end_pairs2()
{
    std::mutex mu;
    std::vector<std::thread> workers;
    std::vector<timespec> start_end_pairs;
    for (int i = 0; i < 20; ++i) {
        workers.emplace_back([&]()->void {
            auto start_time = get_cpu_now_time();
            std::this_thread::sleep_for(std::chrono::seconds(1));
            auto end_time = get_cpu_now_time();
            std::lock_guard<std::mutex> locker(mu);
            start_end_pairs.emplace_back(start_time);
            start_end_pairs.emplace_back(end_time);
        });
    }

    for (auto &worker: workers) {
        if (worker.joinable()) {
            worker.join();
        }
    }

    return start_end_pairs;
}

//=============================================================
void output_deltas(const std::vector<timespec> &start_end_pairs)
{
    std::cout << "size: " << start_end_pairs.size() << std::endl;
    for (auto it_start = start_end_pairs.begin(); it_start < start_end_pairs.end(); it_start += 2)
    {
        auto it_end = it_start + 1;
        auto delta = diff(*it_start, *it_end);

        std::cout
                << std::setw(2)
                << std::setfill(' ')
                << std::distance(start_end_pairs.begin(), it_start) / 2
                << " Waited ("
                << delta.tv_sec
                << "\ts\t"
                << std::setw(9)
                << std::setfill('0')
                << delta.tv_nsec
                << "\tns)"
                << std::endl;
    }
}

//=============================================================
timespec diff(timespec start, timespec end)
{
    timespec temp;
    temp.tv_sec = end.tv_sec-start.tv_sec;
    temp.tv_nsec = end.tv_nsec-start.tv_nsec;

    if (temp.tv_nsec < 0) {
        --temp.tv_sec;
        temp.tv_nsec += 1000000000;
    }
    return temp;
}

//=============================================================
timespec get_cpu_now_time()
{
    timespec now_time;
    memset(&now_time, 0, sizeof(timespec));
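    // CPU time consumed by the whole process: it does not advance while
    // threads are blocked in sleep_for()
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &now_time);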

    return now_time;
}

output:

Hello waiter
 0 Waited (0    s       000843254       ns)
 1 Waited (0    s       000681141       ns)
 2 Waited (0    s       000685119       ns)
 3 Waited (0    s       000674252       ns)
 4 Waited (0    s       000714877       ns)
 5 Waited (0    s       000624202       ns)
 6 Waited (0    s       000746091       ns)
 7 Waited (0    s       000575267       ns)
 8 Waited (0    s       000860157       ns)
 9 Waited (0    s       000827479       ns)
10 Waited (0    s       000612959       ns)
11 Waited (0    s       000534818       ns)
12 Waited (0    s       000553728       ns)
13 Waited (0    s       000586501       ns)
14 Waited (0    s       000627116       ns)
15 Waited (0    s       000616725       ns)
16 Waited (0    s       000616507       ns)
17 Waited (0    s       000641251       ns)
18 Waited (0    s       000683380       ns)
19 Waited (0    s       000850205       ns)
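As a comment on this answer points out, CLOCK_PROCESS_CPUTIME_ID does not advance while threads sleep, so the two clocks are measuring different things. A hedged C sketch that sleeps and reads a wall clock and the CPU-time clock side by side makes the difference visible; measure_sleep and to_ns are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

static int64_t to_ns(struct timespec t)
{
    return (int64_t)t.tv_sec * 1000000000LL + t.tv_nsec;
}

/* Sleep for `ms` milliseconds and report the wall time (CLOCK_MONOTONIC)
   and process CPU time (CLOCK_PROCESS_CPUTIME_ID) that elapsed. A sleeping
   thread uses almost no CPU, so cpu_ns stays small while wall_ns is at
   least the requested sleep (unless a signal interrupts nanosleep). */
void measure_sleep(long ms, int64_t *wall_ns, int64_t *cpu_ns)
{
    struct timespec w0, w1, c0, c1;
    struct timespec nap = { .tv_sec = ms / 1000,
                            .tv_nsec = (ms % 1000) * 1000000L };

    clock_gettime(CLOCK_MONOTONIC, &w0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
    nanosleep(&nap, NULL);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
    clock_gettime(CLOCK_MONOTONIC, &w1);

    *wall_ns = to_ns(w1) - to_ns(w0);
    *cpu_ns = to_ns(c1) - to_ns(c0);
}
```

Measured this way, the wall time should come out close to the requested sleep, while the CPU time stays far smaller, matching the sub-millisecond values in the output above.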

edited Mar 27, 2022 at 5:10

answered Jan 8, 2019 at 17:43

radato

5 Comments

Simone-Cu Over a year ago

I guess ++temp.tv_sec; is a typo and you meant --temp.tv_sec; in the diff function.

radato Over a year ago

It is not a typo; when I subtract the two structs, I take into account that there might be a carry.

Simone-Cu Over a year ago

Yes, I understood that. But when you do the carry from sec to nsec, you should subtract 1 from the seconds field and add 1000000000 (1 s) to the nsec field. Let's say (10 s and 900 ns) - (5 s and 1000 ns) --> 5 s and -100 ns --> 4 s and (-100 + 10^9) ns. The last step decreases the seconds, thus doing the carry.

radato Over a year ago

Yes correct, I fixed the answer accordingly

Hunaja Apr 3 at 9:51

You're not actually measuring the same things here. You're measuring CPU time with CLOCK_PROCESS_CPUTIME_ID, but you're putting threads to sleep for 1 sec. If the threads are actually sleeping and not spinning, the thread is not taking up time on the processor. If you add a measurement of wall clock time here as well and then check the difference between wall clock time and process time, you'll see they're very close to 1s. Basically you're measuring here everything but the 1 sec of sleeping.

Nikolai Fetissov · Accepted Answer · 2011-07-19 15:27:30Z

0

clock_gettime(2)

answered Jul 19, 2011 at 15:27

Nikolai Fetissov

1 Comment

Dirk is no longer here Over a year ago

clock_gettime is preferable as it gets you nanoseconds.

Kevin Lee · Accepted Answer · 2018-03-02 03:27:13Z

An epoll-based implementation: https://github.com/ielife/simple-timer-for-c-language

Use it like this:

timer_server_handle_t *timer_handle = timer_server_init(1024);
if (NULL == timer_handle) {
    fprintf(stderr, "timer_server_init failed\n");
    return -1;
}

ctimer timer1;
timer1.count_ = 3;
timer1.timer_internal_ = 0.5;
timer1.timer_cb_ = timer_cb1;
int *user_data1 = (int *)malloc(sizeof(int));
*user_data1 = 100;
timer1.user_data_ = user_data1;
timer_server_addtimer(timer_handle, &timer1);

ctimer timer2;
timer2.count_ = -1;
timer2.timer_internal_ = 0.5;
timer2.timer_cb_ = timer_cb2;
int *user_data2 = (int *)malloc(sizeof(int));
*user_data2 = 10;
timer2.user_data_ = user_data2;
timer_server_addtimer(timer_handle, &timer2);

sleep(10);

timer_server_deltimer(timer_handle, timer1.fd);
timer_server_deltimer(timer_handle, timer2.fd);
timer_server_uninit(timer_handle);
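I can't confirm how the linked library works internally. As a hedged illustration of the same idea (a periodic timer delivered through a file descriptor and waited on with epoll) using only Linux system calls, here is a sketch built on timerfd; run_periodic_timer is a hypothetical name.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Fire a periodic timer every interval_ms milliseconds, wait on it with
   epoll, and return the number of expirations counted once at least
   `wanted` have occurred (fewer if epoll_wait or read fails, -1 on bad
   arguments or setup failure). */
int run_periodic_timer(long interval_ms, int wanted)
{
    if (interval_ms <= 0 || wanted <= 0)
        return -1;                      /* a zero it_value would disarm the timer */

    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (tfd < 0)
        return -1;

    struct timespec period = { .tv_sec = interval_ms / 1000,
                               .tv_nsec = (interval_ms % 1000) * 1000000L };
    struct itimerspec spec = { .it_interval = period, .it_value = period };
    if (timerfd_settime(tfd, 0, &spec, NULL) < 0) {
        close(tfd);
        return -1;
    }

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(tfd);
        return -1;
    }
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = tfd };
    if (epoll_ctl(ep, EPOLL_CTL_ADD, tfd, &ev) < 0) {
        close(ep);
        close(tfd);
        return -1;
    }

    int fired = 0;
    while (fired < wanted) {
        struct epoll_event out;
        if (epoll_wait(ep, &out, 1, -1) < 1)
            break;                      /* error (e.g. EINTR): stop waiting */
        uint64_t expirations;           /* a timerfd read yields a uint64_t count */
        if (read(tfd, &expirations, sizeof expirations) != (ssize_t)sizeof expirations)
            break;
        fired += (int)expirations;      /* more than 1 if we fell behind the period */
        /* a per-tick callback would be invoked here */
    }

    close(ep);
    close(tfd);
    return fired;
}
```

Using CLOCK_MONOTONIC for the timerfd means the ticks are not disturbed if the system time is changed while the timer runs.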
