C++ - Multithreading

Question

I have the code:

for (std::size_t i = 0; i < kpts1.size(); ++i) {
    perform_operation(kpts1[i], kpts2[i]);  // result is stored in kpts2[i]
}

where kpts1 and kpts2 are std::vector<> objects. The function perform_operation takes kpts1[i], performs an operation on it, and stores the result in kpts2[i].

It seems like I should be able to multithread this. Since each iteration of the for loop is independent of the others, I should be able to run it in parallel with as many processes as there are CPU cores, right?

I've seen several SO questions that partly answer this, but they don't really show how to parallelize a simple for loop, and I'm not sure whether multiple threads can safely read from the same kpts1 vector and write to the same kpts2 vector.

Or am I misunderstanding something? Is this not parallelizable?

I'd be happy if I could find a solution in C++ or C, but right now I am stuck.

Yes, you can parallelize this trivially, but think a bit beforehand: surely you don't want thousands of threads, only as many as you have physical cores. So you really just want to partition the index space.
– Kerrek SB, Commented Mar 13, 2014 at 20:12
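Partitioning the index space, as suggested here, can be sketched as a small helper that splits [0, n) into one contiguous range per thread. This is a minimal sketch; `partition_indices` is a hypothetical name, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: split the index range [0, n) into `parts` contiguous
// half-open ranges [begin, end) whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
partition_indices(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = n / parts;   // minimum chunk size
    const std::size_t extra = n % parts;  // the first `extra` chunks get one more index
    std::size_t begin = 0;
    for (std::size_t k = 0; k < parts; ++k) {
        const std::size_t size = base + (k < extra ? 1 : 0);
        ranges.emplace_back(begin, begin + size);
        begin += size;
    }
    return ranges;
}
```

For n = 10 and 3 parts this yields [0, 4), [4, 7) and [7, 10); each thread then loops only over its own range.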
First thing that comes to mind is Boost Threads: stackoverflow.com/questions/415994/boost-thread-tutorials
– Cory Kramer, Commented Mar 13, 2014 at 20:12
C++11 has support for threads in the standard library. If kpts1.size() is large and/or perform_operation() is very expensive, then it might be worth parallelizing.
– amdn, Commented Mar 13, 2014 at 20:13
@KerrekSB I will definitely partition kpts1 so that the maximum number of threads equals the number of cores (I'm still deciding which AWS instance to use, so I don't know the core count yet). But for the sake of argument, let's say I want to split this into 8 processes. How could I do that?
– Brett, Commented Mar 13, 2014 at 20:15
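One way to do the 8-way split with the C++11 std::thread support mentioned above is sketched below. `Keypoint` and the body of `perform_operation` are hypothetical placeholders for the question's types, and the sketch assumes `kpts2` is already sized to match `kpts1`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical element type and operation standing in for the question's code.
struct Keypoint {
    double x = 0.0;
    double y = 0.0;
};

void perform_operation(const Keypoint& in, Keypoint& out) {
    out.x = in.x * 2.0;  // placeholder work
    out.y = in.y * 2.0;
}

// Runs perform_operation over every index using num_threads std::thread
// workers, each handling one contiguous block of indices.
void process_parallel(const std::vector<Keypoint>& kpts1,
                      std::vector<Keypoint>& kpts2,
                      std::size_t num_threads) {
    assert(num_threads > 0);
    assert(kpts2.size() == kpts1.size());  // sized up front; workers never resize it
    const std::size_t n = kpts1.size();
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= n) break;  // fewer items than threads: no work left
        const std::size_t end = std::min(n, begin + chunk);
        // Each worker reads kpts1[i] and writes kpts2[i] only for its own
        // indices, so no element is touched by two threads.
        workers.emplace_back([&kpts1, &kpts2, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                perform_operation(kpts1[i], kpts2[i]);
        });
    }
    for (auto& w : workers) w.join();  // wait for every worker to finish
}
```

Calling `process_parallel(kpts1, kpts2, 8);` would replace the original loop. Because each worker only reads `kpts1` and only writes its own indices of `kpts2`, no locking is needed, provided nothing resizes either vector while the threads run.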
Here's a tutorial for C++11 threads: solarianprogrammer.com/2011/12/16/cpp-11-thread-tutorial
– amdn, Commented Mar 13, 2014 at 20:44

abligh · Accepted Answer · 2014-03-13 20:13:37Z


Provided each call to perform_operation is independent of the others, then yes, this is parallelizable.

Rather than simply calling perform_operation, start a new thread (with pthread_create). You will need to wrap the parameters in a single struct (it could just hold pointers to both arguments) and pass, as start_routine, a wrapper function around perform_operation. That will create the relevant number of threads. Then, in a second for loop, use pthread_join to wait for the threads you have created to exit.

That's a rough outline. Obviously some error handling would be useful, and you might want each thread to perform a number of calls to perform_operation serially, rather than creating one thread per item. But you should get the basic idea from the above.
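The pthread outline in this answer could look roughly like the following sketch (C++ calling the POSIX API, built with `-pthread`). It uses the chunked variant, so each thread handles a range of indices rather than a single item; `ThreadArgs`, `worker`, `process_with_pthreads` and the squaring body of `perform_operation` are hypothetical.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's operation.
void perform_operation(const double& in, double& out) { out = in * in; }

// All parameters for one thread wrapped in a single struct, because
// pthread_create passes exactly one void* argument to the start routine.
struct ThreadArgs {
    const std::vector<double>* in;
    std::vector<double>* out;
    std::size_t begin, end;  // this thread's half-open index range
};

// start_routine: a wrapper around perform_operation.
void* worker(void* p) {
    ThreadArgs* a = static_cast<ThreadArgs*>(p);
    for (std::size_t i = a->begin; i < a->end; ++i)
        perform_operation((*a->in)[i], (*a->out)[i]);
    return nullptr;
}

// Returns 0 on success, or the pthread_create error code that stopped it.
int process_with_pthreads(const std::vector<double>& in,
                          std::vector<double>& out,
                          std::size_t num_threads) {
    assert(num_threads > 0 && out.size() == in.size());
    const std::size_t n = in.size();
    std::vector<pthread_t> threads(num_threads);
    std::vector<ThreadArgs> args(num_threads);  // must outlive the threads
    std::size_t created = 0;
    int err = 0;
    for (std::size_t t = 0; t < num_threads; ++t) {
        args[t] = {&in, &out, n * t / num_threads, n * (t + 1) / num_threads};
        err = pthread_create(&threads[t], nullptr, worker, &args[t]);
        if (err != 0) break;  // stop creating, but still join those already started
        ++created;
    }
    // Second loop: wait for every thread that was actually created.
    for (std::size_t t = 0; t < created; ++t)
        pthread_join(threads[t], nullptr);
    return err;
}
```

The `args` vector is sized before any thread starts and is never resized afterwards, so each pointer handed to `pthread_create` stays valid until the matching `pthread_join`.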


3 Comments

Brett Over a year ago

Thanks for the answer @abligh. This looks like a good approach, but it's exactly the kind of thing I haven't been able to find basic example code for. I'm very new to C/C++ threading, so I'm looking for some basic code to follow if possible.

abligh Over a year ago

If you can find a C++ library, that might be easier. If using pthreads, try this: computing.llnl.gov/tutorials/pthreads/#Joining; also take a look at stackoverflow.com/questions/15376908/…

juanchopanza Over a year ago

@Brett Note that you don't have to use pthreads directly in this day and age. C++11 has a threading library, and if you don't have C++11, then there's boost.thread.
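With the C++11 threading library, `std::async` is another option: each task returns a `std::future`, and calling `get()` on it both waits for the task and rethrows any exception the task raised. The sketch below assumes a hypothetical `int`-valued `perform_operation` and picks the task count with `std::thread::hardware_concurrency()`, which may report 0 when the core count is unknown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's operation.
void perform_operation(const int& in, int& out) { out = in + 1; }

// Standard library only: one std::async task per hardware thread.
void process_async(const std::vector<int>& in, std::vector<int>& out) {
    assert(out.size() == in.size());
    // hardware_concurrency() may return 0 if unknown; fall back to 1 task.
    const std::size_t tasks = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t n = in.size();
    std::vector<std::future<void>> results;
    for (std::size_t t = 0; t < tasks; ++t) {
        const std::size_t begin = n * t / tasks;
        const std::size_t end = n * (t + 1) / tasks;
        // std::launch::async requests that each task run on its own thread.
        results.push_back(std::async(std::launch::async, [&in, &out, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                perform_operation(in[i], out[i]);
        }));
    }
    for (auto& f : results) f.get();  // waits, and rethrows any task exception
}
```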

Jay · Accepted Answer · 2014-03-13 20:19:02Z


I believe you're asking whether you can operate on each element of the array in a separate thread.

You can. There are several considerations though.

As long as the separate operations don't impact each other, it's a good candidate for parallelism.
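To make "don't impact each other" concrete: reading a vector concurrently and writing different elements of a pre-sized std::vector from different threads is not a data race (std::vector<bool> is the exception, since its elements share storage). A variable that every thread updates is a different matter. The hypothetical function below squares each element in place, which needs no synchronization, but combines the per-thread sums through a std::atomic, because an ordinary shared variable there would be a data race.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical example: squares v in place and returns the sum of the squares.
long long square_all_and_sum(std::vector<long long>& v, std::size_t num_threads) {
    assert(num_threads > 0);
    std::atomic<long long> total{0};  // a plain long long here would be a data race
    std::vector<std::thread> workers;
    const std::size_t n = v.size();
    for (std::size_t t = 0; t < num_threads; ++t) {
        const std::size_t begin = n * t / num_threads;
        const std::size_t end = n * (t + 1) / num_threads;
        workers.emplace_back([&v, &total, begin, end] {
            long long local = 0;  // thread-private partial sum
            for (std::size_t i = begin; i < end; ++i) {
                v[i] *= v[i];     // independent: only this thread touches v[i]
                local += v[i];
            }
            total += local;       // shared: must be atomic (or mutex-protected)
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}
```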

As a practical matter, standard CPU threads are slow to set up and use a good amount of memory (each pthread gets its own stack, which by default is several megabytes on common platforms, typically 8 MB on Linux). If the tasks are computationally intensive, the time saved repays the setup overhead. If not, the threaded version is harder to write, bigger, and slower than the straightforward loop.
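The default pthread stack size varies by platform, and it can be queried and overridden through a thread attributes object. A minimal sketch using pthread_attr_getstacksize and pthread_attr_setstacksize; `small_task`, `run_with_stack_size`, `default_stack_size` and the 256 KiB figure are illustrative assumptions. A size below PTHREAD_STACK_MIN is rejected with EINVAL.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

// Placeholder thread body that needs very little stack.
void* small_task(void* arg) {
    int* value = static_cast<int*>(arg);
    *value *= 2;
    return nullptr;
}

// Reports the default stack size this platform's pthreads would use.
std::size_t default_stack_size() {
    pthread_attr_t attr;
    std::size_t size = 0;
    if (pthread_attr_init(&attr) == 0) {
        pthread_attr_getstacksize(&attr, &size);
        pthread_attr_destroy(&attr);
    }
    return size;
}

// Runs small_task on a thread with an explicit stack size instead of the
// platform default. Returns 0 on success or a pthread error code.
int run_with_stack_size(int* value, std::size_t stack_bytes) {
    assert(value != nullptr);
    pthread_attr_t attr;
    int err = pthread_attr_init(&attr);
    if (err != 0) return err;
    err = pthread_attr_setstacksize(&attr, stack_bytes);  // EINVAL if too small
    if (err == 0) {
        pthread_t thread;
        err = pthread_create(&thread, &attr, small_task, value);
        if (err == 0) err = pthread_join(thread, nullptr);
    }
    pthread_attr_destroy(&attr);
    return err;
}
```

For example, `run_with_stack_size(&v, 256 * 1024)` runs the task on a thread with a 256 KiB stack. Shrinking the stack is only safe if the thread body genuinely needs little stack space.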

Intel TBB is one option; NVIDIA CUDA is another.

