Update 2:
You can find the original code at the GitHub link below, if needed. The complete, exact changes I made to reproduce the problem, along with the program logs, are now in the edit history (apparently including those details made this question unfocused).
Update and highlight again: this is not a question about how to time a C++ program.
Like I said in the original question, I specifically measured the real elapsed time (wall clock time), which is 20s (Windows) vs. 60s (Linux). I confirmed this with a stopwatch on my phone. My only question is why this program, compiled with the same optimization features enabled, is much slower on Linux than on Windows?
I am trying to run this GitHub code (hi this is a clickable link in case you didn't notice) on Linux, but I find it runs 2x ~ 3x slower on Linux than on Windows. Using its official input example data/BS_1000_torus.xyz, it takes ~20s on Windows but ~60s on Linux (I confirmed this using a stopwatch on my phone). I am trying to figure out how to set up the compilation so that the Linux build matches the Windows performance. Let me explain in detail.
On Windows:
I followed the exact steps in the README (enabling AVX2, fast floating-point and OpenMP) to compile the project with vcpkg and VS2022. Running the BS_1000_torus example takes around 20s (confirmed with a stopwatch).
On Linux:
On Linux, I made the following changes to enable the key features mentioned in README:
- Removed the vcpkg toolchain specification in CMakeLists.txt
- Added set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mavx2 -fopenmp -pthread -Ofast") to CMakeLists.txt, right after set(CMAKE_BUILD_TYPE RELEASE), to enable the features mentioned in the README file.
Then I used cmake and make to compile it:
mkdir build
cd build
cmake ..
make -j
The output from cmake:
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- 3.3.9
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found version "1.71.0")
-- BOOST FOUNDED
-- Using header-only CGAL
-- Targeting Unix Makefiles
-- Using /usr/bin/c++ compiler.
-- Found GMP: /usr/lib/x86_64-linux-gnu/libgmp.so
-- Found MPFR: /usr/lib/x86_64-linux-gnu/libmpfr.so
-- Found Boost: /usr/lib/x86_64-linux-gnu/cmake/Boost-1.71.0/BoostConfig.cmake (found suitable version "1.71.0", minimum required is "1.66")
-- Boost include dirs: /usr/include
-- Boost libraries:
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Using gcc version 4 or later. Adding -frounding-math
-- Build type: RELEASE
-- USING CXXFLAGS = ' -mavx2 -fopenmp -pthread -Ofast -O3 -DNDEBUG'
-- USING EXEFLAGS = ' '
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Configuring done (1.2s)
-- Generating done (0.0s)
-- Build files have been written to: /home/user/3dlab/GCNO-master/build
Running with the same model as the Windows experiment above takes ~64s (confirmed by a stopwatch).
System specs:
- Both tests are on the same computer (dual boot, not WSL), with Intel(R) Core(TM) i9-10900X CPU @ 3.70GHz (10 cores, 2 threads per core). I set omp_set_num_threads(20); at the beginning of int main.
- Windows system: Windows 10
- Linux system: 5.15.0-88-generic #98~20.04.1-Ubuntu
Questions:
- Why is the running time (actual elapsed time) so different (20s on Windows and 60s on Linux), even though I've enabled all the optimization flags I can think of on Linux? Why does compiling with the same features enabled (AVX2, OpenMP) lead to very different running times?
- How can I set up the compilation so it is as fast on Linux as on Windows? Is there some optimization that is automatic on Windows but manual on Linux that I did not turn on?
std::isinf should be fine on both, as well as int main(int argc, char* argv[]) and gamma_in_myrpd. Sure, they won't affect runtime, but the shorter the list of modifications, the easier it is to find the one difference that does affect runtime.
-O3 or -Ofast), if you give two, it makes no sense. Which one should the compiler choose? In that case, -O3 is redundant because it is already included in -Ofast.
Time: 21.544, Linux Time: 1191.18. My suspicion is that those two timings are using different units, making them not apples-to-apples comparable.