
Yesterday at work, my colleague claimed that preprocessor macros were slower than writing the variables and functions out by hand. The context is that we have a class to which member variables are sometimes added, and for each of these member variables, three different methods have to be created following exactly the same pattern. We had these generated automatically using macros, as shown below.

#include <cstdint>
#include <iostream>
#include <vector>
#include <windows.h>

struct Bar
{
    long long a;
    long long b;
    long long c;
    long long d;
};

struct Foo
{
    Bar var[1300];
};

typedef std::vector<Foo> TEST_TYPE;

class A
{
private:
    TEST_TYPE container;

public:
    TEST_TYPE& getcontainer()
    {
        return container;
    }
};

#define createBMember(TYPE, NAME)         \
private:                                  \
    TYPE NAME;                            \
                                          \
public:                                   \
    TYPE& get##NAME()                     \
    {                                     \
        return NAME;                      \
    }

class B
{
    createBMember(TEST_TYPE, container);
};

double testA()
{
    A a;
    LARGE_INTEGER frequency;
    LARGE_INTEGER startA, endA;

    if (!QueryPerformanceFrequency(&frequency)) {
        std::cerr << "High-resolution timer not supported." << std::endl;
        return 1;
    }

    QueryPerformanceCounter(&startA);
    for(size_t i = 0; i < 10000; ++i)
    {
        a.getcontainer().push_back(Foo());
    }

    QueryPerformanceCounter(&endA);

    return static_cast<double>(endA.QuadPart - startA.QuadPart) / frequency.QuadPart;
}

double testB()
{
    B b;
    LARGE_INTEGER frequency;
    LARGE_INTEGER startB, endB;

    if (!QueryPerformanceFrequency(&frequency)) {
        std::cerr << "High-resolution timer not supported." << std::endl;
        return 1;
    }

    QueryPerformanceCounter(&startB);

    for(size_t i = 0; i < 10000; ++i)
    {
        b.getcontainer().push_back(Foo());
    }

    QueryPerformanceCounter(&endB);

    return static_cast<double>(endB.QuadPart - startB.QuadPart) / frequency.QuadPart;
}

//----------------------------------------------------[main]
int main()
{
    double Atest = 0;
    double Btest = 0;

    double AHigh = 0;
    double BHigh = 0;

    double ALow = 10000;
    double BLow = 10000;

    double a;
    double b;

    const uint16_t amount = 30;

    for(uint16_t i = 0; i < amount; ++i)
    {   
        a = testA();

        AHigh = a > AHigh ? a : AHigh;
        ALow = a < ALow ? a : ALow;

        Atest += a;
    }

    for(uint16_t i = 0; i < amount; ++i)
    {   
        b = testB();

        BHigh = b > BHigh ? b : BHigh;
        BLow = b < BLow ? b : BLow;

        Btest += b;
    }

    Atest /= amount; 
    Btest /= amount; 

    std::cout << "A: " << Atest << std::endl;
    std::cout << "B: " << Btest << std::endl;

    auto size = sizeof(Foo);

    return 0;
}

I tried to refute his statement with this test by using a fairly large struct, which I simply append to a vector in each test run.
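For scale (my own arithmetic, not part of the original measurements, and assuming an 8-byte long long as on MSVC): sizeof(Foo) is 1300 * 4 * 8 = 41,600 bytes, so each timed loop pushes roughly 10,000 * 41,600 bytes ≈ 416 MB into the vector, not counting the copies made whenever the vector reallocates.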

The strange thing, however, was that although the preprocessor runs before compilation and both classes should therefore be identical, I measured some speed differences. I made the following observations:

  • In debug mode without any optimization, the class that is tested first is faster
  • In release mode with "whole-program-optimization" and other settings, B is faster. The last times were: A: 0.47695, B: 0.430825

This confuses me, because as I said, both classes are identical.

I should also mention that, unfortunately, our development environment ties us to what is essentially an early snapshot of C++11 (Visual Studio 2010). That's why I can't use std::chrono for benchmarking, for example.
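For reference, here is a minimal sketch (not from our code base) of how the QueryPerformanceCounter boilerplate in testA/testB could be factored into a small helper that still compiles under Visual Studio 2010:

#include <windows.h>

// Sketch only: a small stopwatch around QueryPerformanceCounter, so the
// frequency/start/end boilerplate is not duplicated in every test function.
class Stopwatch
{
public:
    Stopwatch()
    {
        QueryPerformanceFrequency(&frequency_);  // ticks per second
        QueryPerformanceCounter(&start_);
    }

    // Seconds elapsed since construction.
    double elapsed() const
    {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return static_cast<double>(now.QuadPart - start_.QuadPart) / frequency_.QuadPart;
    }

private:
    LARGE_INTEGER frequency_;
    LARGE_INTEGER start_;
};

With that, each test function would reduce to constructing the object, starting a Stopwatch, running the push_back loop, and returning elapsed().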

I haven't been able to test it with other compilers yet. I also looked at the assembly code on godbolt.org, but didn't find anything that could make such a big difference.

Admittedly, I'm still a trainee and would classify my skills as more of an amateur. Does anyone have any idea what could be causing this difference in speed?

8
  • Did you try using the same class in both tests? Commented Aug 16, 2024 at 7:13
  • I hadn't tested this yet. But the result is as follows: I did 30 runs each in the release configuration when testing. TestA is faster with A and B as classes. Then I swapped which test is executed first - i.e. swapped the for loops. In this case, TestA is still faster in both cases. Now I swapped the positions of the two functions again and tested them. When testing with class A, TestB is faster. When testing with class B, however, TestA is faster again. Commented Aug 16, 2024 at 8:25
  • Have you tried swapping the order of the tests? The first one to run may suffer a slowdown because its memory hasn't been used recently. Test A,B,A,B to be sure that memory being allocated the first time around isn't skewing your benchmarks (a sketch of such an interleaved run follows after these comments). I can't see how the preprocessor-generated source code should be any different when compiled. I'm sure it adds something to the compile time, but it shouldn't affect runtime at all. Commented Aug 16, 2024 at 8:27
  • Yes, I tested that as well. In general the first test to run is usually faster, not slower. Commented Aug 16, 2024 at 8:43
  • The vectors will do lots of heap allocations, and of course the state of the heap is different for the first run. So, are you testing the heap or the macros? :-) Commented Aug 16, 2024 at 9:05
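A minimal sketch of the interleaved measurement suggested in these comments, reusing testA/testB and the types from the question; calling reserve(10000) on the container inside the test functions before the timed loop would additionally take vector reallocations out of the comparison:

// Sketch only: run A and B alternately (A,B,A,B,...) instead of 30x A followed
// by 30x B, so warm-up and heap-state effects hit both classes equally.
int main()
{
    const int runs = 30;
    double totalA = 0.0;
    double totalB = 0.0;

    for (int i = 0; i < runs; ++i)
    {
        totalA += testA();  // testA/testB as defined in the question
        totalB += testB();
    }

    std::cout << "A: " << totalA / runs << std::endl;
    std::cout << "B: " << totalB / runs << std::endl;
    return 0;
}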

1 Answer


Your colleague doesn't know what they're talking about.

Macros are a textual replacement performed by the preprocessor, one of the earliest phases of compilation. The actual compiler sees identical code, so any speed differences will be due to other factors. As noted in the comments, flawed test methodology is almost certainly the explanation (especially given the lack of knowledge shown by both the claim and the use of VS 2010).
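To illustrate (expansion worked out by hand; it can be verified with MSVC's /E or /P preprocess-only switches): createBMember(TEST_TYPE, container) expands to exactly the members and getter that class A spells out manually, the only textual difference being a leftover semicolon from the invocation site:

class B
{
private:
    TEST_TYPE container;

public:
    TEST_TYPE& getcontainer()
    {
        return container;
    };  // the trailing ';' comes from the macro invocation and is just an empty declaration
};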


1 Comment

I concur. I'm also puzzled why they are stuck on such a geriatric compiler version.
