Quick question: what is the compiler flag to allow g++ to spawn multiple instances of itself in order to compile large projects quicker (for example 4 source files at a time for a multi-core CPU)?
10 Answers
You can do this with make; with GNU make it is the -j flag (this will also help on a uniprocessor machine).
For example if you want 4 parallel jobs from make:
make -j 4
You can also run gcc in a pipe with
gcc -pipe
This will pipeline the compile stages, which will also help keep the cores busy.
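As a hedged sketch of combining the two (the flags are illustrative, and this assumes your Makefile honors CXXFLAGS):
make -j 4 CXXFLAGS='-pipe -O2'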
If you have additional machines available too, you might check out distcc, which will farm compiles out to those as well.
11 Comments
There is no such flag, and having one runs against the Unix philosophy of having each tool perform just one function and perform it well. Spawning compiler processes is conceptually the job of the build system. What you are probably looking for is the -j (jobs) flag to GNU make, a la
make -j4
Or you can use pmake or similar parallel make systems.
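To make that concrete, here is a minimal Makefile sketch (file names hypothetical): the object targets do not depend on each other, so make -j4 can run the compile recipes concurrently, and only the link step waits for all of them.
# Hypothetical three-file project; no .o depends on another,
# so make -j4 compiles them in parallel.
# (Recipe lines must start with a tab.)
CXX = g++
objects = main.o parser.o util.o

app: $(objects)
	$(CXX) -o app $(objects)

%.o: %.cpp
	$(CXX) -c -o $@ $<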
3 Comments
If using make, issue it with -j. From man make:
-j [jobs], --jobs[=jobs]
Specifies the number of jobs (commands) to run simultaneously. If there is more than one -j option, the last one is effective. If the -j option is given without an argument, make will not limit the number of jobs that can run simultaneously.
Most notably, if you want to script against the number of cores you have available (which can change a lot depending on your environment, especially if you run in many environments), you may use the ubiquitous Python function cpu_count():
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.cpu_count
Like this:
make -j $(python3 -c 'import multiprocessing as mp; print(int(mp.cpu_count() * 1.5))')
If you're asking why 1.5, I'll quote user artless-noise from a comment above:
The 1.5 number is because of the noted I/O bound problem. It is a rule of thumb. About 1/3 of the jobs will be waiting for I/O, so the remaining jobs will be using the available cores. A number greater than the cores is better and you could even go as high as 2x.
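If you'd rather not depend on Python, a shell-only sketch of the same 1.5x rule, using nproc from GNU coreutils (shell arithmetic is integer, so 4 cores gives 6 jobs):
make -j $(( $(nproc) * 3 / 2 ))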
3 Comments
make -j`nproc`, with nproc in GNU Coreutils.
make -j $(( $(nproc) + 1 )) (make sure you put spaces where I have them).
Where nproc isn't available, e.g. in manylinux1 containers, this saves additional time by avoiding running yum update/yum install.
make will do this for you. Investigate the -j and -l switches in the man page. I don't think g++ itself is parallelizable.
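As a sketch of the two switches together (the numbers are illustrative): -j caps how many jobs run at once, and -l additionally holds back new jobs while the system load average is above the given limit:
make -j "$(nproc)" -l "$(nproc)"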
1 Comment
The -l option (does not start a new job unless all previous jobs have terminated) helps here. Otherwise it seems that the linker job begins before all object files are built (as some compilations are still ongoing), so the linker job fails.
People have mentioned make, but bjam also supports a similar concept. Using bjam -jx instructs bjam to build up to x concurrent commands.
We use the same build scripts on Windows and Linux and using this option halves our build times on both platforms. Nice.
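For instance, a hedged example (the toolset property is an assumption about your setup) building with eight concurrent commands:
bjam -j8 toolset=gcc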
Comments
distcc can also be used to distribute compiles not only on the current machine, but also on other machines in a farm that have distcc installed.
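A minimal sketch, assuming distcc is installed on every listed machine (the hostnames are placeholders): point DISTCC_HOSTS at the helpers, then over-subscribe make since jobs now run remotely.
# buildbox1 and buildbox2 are hypothetical hosts running the distcc daemon.
export DISTCC_HOSTS="localhost buildbox1 buildbox2"
make -j12 CC="distcc gcc" CXX="distcc g++"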
2 Comments
You can use make -j$(nproc). This builds a project with the make build system, running multiple jobs in parallel.
For example, if your system has 4 CPU cores, running make -j$(nproc) would instruct make to run 4 jobs concurrently, one on each CPU core, speeding up the build process.
You can also see how many cores you have by running this command:
echo $(nproc)
1 Comment
nproc doesn't always yield optimal performance. My machine has four cores, but compilation runs faster with -j2.
I'm not sure about g++, but if you're using GNU Make then "make -j N" (where N is the number of threads make can create) will allow make to run multiple g++ jobs at the same time (so long as the files do not depend on each other).
2 Comments
-j N tells make how many processes should be spawned at once, not threads. That's the reason why it is not as performant as MS cl -MT (really multithreaded).
What if N is too large? E.g. can -j 100 break the system, or is N merely an upper bound that is not required to be reached?
GNU parallel
I was making a synthetic compilation benchmark and couldn't be bothered to write a Makefile, so I used:
sudo apt-get install parallel
ls | grep -E '\.c$' | parallel -t --will-cite "gcc -c -o '{.}.o' '{}'"
Explanation:
{.} takes the input argument and removes its extension
-t prints out the commands being run to give us an idea of progress
--will-cite removes the request to cite the software if you publish results using it...
parallel is so convenient that I could even do a timestamp check myself:
ls | grep -E '\.c$' | parallel -t --will-cite "\
if ! [ -f '{.}.o' ] || [ '{}' -nt '{.}.o' ]; then
gcc -c -o '{.}.o' '{}'
fi
"
xargs -P can also run jobs in parallel, but it is a bit less convenient to do the extension manipulation or to run multiple commands with it; see: Running multiple commands with xargs
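For comparison, a hedged xargs -P sketch of the same compile loop; the ${1%.c} parameter expansion does by hand the extension manipulation that parallel's {.} provides:
ls | grep -E '\.c$' | xargs -P "$(nproc)" -n 1 sh -c 'gcc -c -o "${1%.c}.o" "$1"' sh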
Parallel linking was asked at: Can gcc use multiple cores when linking?
Fun note: parsing of context-free grammars can be reduced to boolean matrix multiplication, e.g. https://www.ps.uni-saarland.de/courses/seminar-ws06/papers/07_franziska_ebert.pdf so maybe it would also be theoretically possible to speed up single-file parsing for large files. Likely not of much practical use, but a fun fact.
Tested in Ubuntu 18.10.
2 Comments
There has been work to make gcc use multiple cores; see https://gcc.gnu.org/wiki/ParallelGcc. But it appears to be at an internal stage, not yet usable, and seems not to have moved in the last 5 years. I wouldn't say that's too worrisome, since compiler work is not trivial and there are not many people who can do it.
Looking through the readme you'll see that many things have already been investigated, but it would still need more work to really unlock the performance benefits (e.g. see what they write about locks that could still be removed).
But at least it has been tried, so your flag may exist at some point in the future ;-)
make -j almost always results in some improvement.