Why is a program compiled with the same optimization features (AVX2, OpenMP) enabled much slower on Linux than on Windows?

Update 2:

The original code is available at the GitHub link below, if needed. The complete, exact changes I made to reproduce the problem, along with the program logs, are now in the edit history (apparently including those details made this question unfocused).

Update and highlight again: This is not a question about how to time a C++ program

As I said in the original question, I specifically measured the real elapsed time (wall-clock time), which is 20s on Windows vs. 60s on Linux. I confirmed this with a stopwatch on my phone. My only question is: why is this program, compiled with the same optimization features enabled, so much slower on Linux than on Windows?
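For reference, a minimal sketch of measuring wall-clock time and CPU time side by side. It is not taken from the project; `Timing`, `time_once`, and `busy_sum` are hypothetical names used only for illustration. The point it shows: `std::chrono::steady_clock` measures elapsed time on both systems, whereas `std::clock` reports processor time on Linux (summed over all threads of the process) and is documented by the Microsoft runtime as returning elapsed wall-clock time, so the two `std::clock` values are not directly comparable across systems.

```cpp
#include <chrono>
#include <cmath>
#include <ctime>

// Wall-clock and CPU time spent in one call, both in seconds.
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// Hypothetical helper: runs `work` once and measures it two ways.
template <typename Work>
Timing time_once(Work&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();

    work();

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    // Elapsed real time, independent of how many threads were busy.
    t.wall_seconds =
        std::chrono::duration<double>(wall_end - wall_start).count();
    // Whatever std::clock reports on this platform, converted to seconds.
    t.cpu_seconds =
        static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    return t;
}

// Placeholder workload standing in for the real computation.
double busy_sum(int n) {
    double s = 0.0;
    for (int i = 1; i <= n; ++i) {
        s += std::sqrt(static_cast<double>(i));
    }
    return s;
}
```

Wrapping the real computation in `time_once` and printing both fields on each system would make it clear which number corresponds to the stopwatch reading.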

I am trying to run this GitHub code (this is a clickable link, in case you didn't notice) on Linux, but it runs 2x to 3x slower there than on Windows. With its official example input, data/BS_1000_torus.xyz, it takes ~20s on Windows but ~60s on Linux (I confirmed this using a stopwatch on my phone). I am trying to figure out how to set up the compilation so that the Linux build matches the Windows performance. Let me explain in detail.

On Windows:

I followed the exact steps in the README (enabling AVX2, fast floating-point, and OpenMP) to compile the project with vcpkg and VS2022. Running the BS_1000_torus example takes around 20s (confirmed with a stopwatch).

On Linux

On Linux, I made the following changes to enable the key features mentioned in README:

Removed the vcpkg toolchain specification in CMakeLists.txt
Added set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2 -fopenmp -pthread -Ofast") to CMakeLists.txt, right after set(CMAKE_BUILD_TYPE RELEASE) to enable the features mentioned in the README file.

Then I used cmake and make to compile it:

mkdir build
cd build
cmake ..
make -j

The output from cmake:

-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- 3.3.9
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found version "1.71.0")  
-- BOOST FOUNDED
-- Using header-only CGAL
-- Targeting Unix Makefiles
-- Using /usr/bin/c++ compiler.
-- Found GMP: /usr/lib/x86_64-linux-gnu/libgmp.so  
-- Found MPFR: /usr/lib/x86_64-linux-gnu/libmpfr.so  
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found suitable version "1.71.0", minimum required is "1.66")  
-- Boost include dirs: /usr/include
-- Boost libraries:    
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE  
-- Using gcc version 4 or later. Adding -frounding-math
-- Build type: RELEASE
-- USING CXXFLAGS = ' -mavx2 -fopenmp -pthread -Ofast -O3 -DNDEBUG'
-- USING EXEFLAGS = ' '
-- Found OpenMP_C: -fopenmp (found version "4.5") 
-- Found OpenMP_CXX: -fopenmp (found version "4.5") 
-- Found OpenMP: TRUE (found version "4.5")  
-- Configuring done (1.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/user/3dlab/GCNO-master/build

Running with the same model as the Windows experiment above takes ~64s (confirmed by a stopwatch).
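One thing the log above makes visible: the final flag string is ' -mavx2 -fopenmp -pthread -Ofast -O3 -DNDEBUG', with -O3 appearing after -Ofast. GCC's documentation states that when several -O options are given, the last one is the effective one, so fast floating-point may not actually be active in this build; the log also reports -frounding-math being added. A minimal sketch for checking what the compiler really enabled, assuming the predefined macros of GCC, Clang, or MSVC; `build_feature_report` is a hypothetical helper, not part of the project:

```cpp
#include <string>

// Reports which relevant feature macros are defined in this translation unit.
// GCC and Clang predefine __AVX2__, __FAST_MATH__, and _OPENMP when the
// corresponding options are active. MSVC predefines __AVX2__ and _OPENMP,
// and signals /fp:fast through _M_FP_FAST.
std::string build_feature_report() {
    std::string report;
#ifdef __AVX2__
    report += "AVX2=on ";
#else
    report += "AVX2=off ";
#endif
#if defined(__FAST_MATH__) || defined(_M_FP_FAST)
    report += "fast-math=on ";
#else
    report += "fast-math=off ";
#endif
#ifdef _OPENMP
    report += "OpenMP=on";
#else
    report += "OpenMP=off";
#endif
    return report;
}
```

Printing the result of `build_feature_report()` at the start of main on both systems would show whether the two builds really have the same features enabled.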

System specs:

Both tests were run on the same computer (dual boot, not WSL), with an Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz (10 cores, 2 threads per core). I call omp_set_num_threads(20); at the beginning of main.
Windows system: Windows 10
Linux system: 5.15.0-88-generic #98~20.04.1-Ubuntu
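Since the thread count is set by hand, it may be worth confirming that the OpenMP runtime actually provides that many threads on both systems. A sketch, assuming the file is compiled with OpenMP enabled (it falls back to a single thread otherwise); `observed_thread_count` is a hypothetical helper, not part of the project:

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the team size observed inside a parallel region, or 1 when the
// file is compiled without OpenMP support.
int observed_thread_count(int requested) {
    int observed = 1;
#ifdef _OPENMP
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        // Exactly one thread of the team records the team size.
        #pragma omp single
        observed = omp_get_num_threads();
    }
#else
    (void)requested;  // Unused without OpenMP.
#endif
    return observed;
}
```

Calling `observed_thread_count(20)` on each system and comparing the results would rule out a difference in the number of threads actually used.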

Questions:

Why is the running time (actual elapsed time) so different (20s on Windows vs. 60s on Linux), even though I've enabled every optimization flag I can think of on Linux? Why does compiling with the same features enabled (AVX2, OpenMP) lead to such different running times?
How can I set up the compilation so that the program is as fast on Linux as on Windows? Is there some optimization that is automatic on Windows but has to be turned on manually on Linux?

edited Apr 30, 2024 at 1:09

asked Apr 29, 2024 at 5:58

ihdv


2

@463035818_is_not_an_ai Without these modifications it simply won't compile. I've tried to make as few changes as possible. And correct me if I'm wrong, but these modifications were made only because some headers or functions do not exist on Linux, and they should not make the program run 3x slower.

– ihdv, commented Apr 29, 2024 at 6:26

2

std::isinf should be fine on both, as well as int main(int argc, char* argv[]) and gamma_in_myrpd. Sure, they won't affect runtime, but the shorter the list of modifications, the easier it is to find the one difference that does affect runtime.

– 463035818_is_not_an_ai, commented Apr 29, 2024 at 6:29

2

@GhorbanM.Tavakoly You only have to give one optimization level (you have to choose between -O3 and -Ofast); giving two makes no sense. Which one should the compiler choose? In that case, -O3 is obsolete because it is already included in -Ofast.

– Fareanor, commented Apr 29, 2024 at 7:18

2

The question is about performance and timing ... but the behaviour in the logs seems different.

– Öö Tiib, commented Apr 29, 2024 at 7:31

3

Windows Time: 21.544, Linux Time: 1191.18. My suspicion is that those two timings use different units, so they are not an apples-to-apples comparison.

– Eljay, commented Apr 29, 2024 at 11:26
