Memory allocation and assignment in C++ with OpenMP

Question

I am optimizing a for loop with OpenMP. Each thread temporarily uses a large array that is no longer needed once the thread finishes. Because I don't want to allocate and delete these arrays repeatedly, I plan to allocate one large block of memory and assign a part of it to each thread. To avoid conflicts, I need a unique ID for each running thread, one that does not change and is never equal to another thread's ID. So my question is: can I use the thread ID returned by omp_get_thread_num() for this purpose? Or is there a more efficient solution for this kind of memory allocation and assignment task? Thanks very much!

Yes, you can. Within a parallel region, omp_get_thread_num() returns a unique thread ID that does not change, and it is common practice to use the thread ID to partition a matrix/array.
– Aleksander Fular, Commented Nov 26, 2013 at 12:44
Do you need that memory to persist between multiple entrances into the same parallel region or into consecutive parallel regions? An example of the former case would be a serial outer loop with a parallel inner loop. Also, having the same thread ID returned by omp_get_thread_num() does not necessarily mean that the code is being executed by the same process thread.
– Hristo Iliev, Commented Nov 26, 2013 at 16:14
Thanks, I do not need it to persist. I only need each memory block not to be used by different threads at the same time, so the first solution is enough for me.
– Chai ML, Commented Nov 26, 2013 at 17:43
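
For readers who want to see the approach from the question spelled out, here is a minimal sketch. It assumes the pool is sized with omp_get_max_threads() and that the parallel region is not nested (thread numbers are only unique within one team). The function name process_all and the fill-then-sum workload are made up for illustration; the #ifdef fallback only exists so the sketch also builds without OpenMP enabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Serial fallbacks so the sketch also builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
inline int omp_get_max_threads() { return 1; }
#endif

// Hypothetical workload: each iteration fills its scratch slice, then sums it.
long process_all(int num_iterations, std::size_t scratch_len) {
    assert(num_iterations >= 0 && scratch_len > 0);

    // One slice per thread that could run, allocated once, up front.
    const std::size_t max_threads =
        static_cast<std::size_t>(omp_get_max_threads());
    std::vector<int> pool(max_threads * scratch_len);
    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < num_iterations; ++i) {
        // The thread number is unique among the threads of this team,
        // so no two running threads share a slice.
        const std::size_t tid = static_cast<std::size_t>(omp_get_thread_num());
        int* scratch = pool.data() + tid * scratch_len;

        for (std::size_t k = 0; k < scratch_len; ++k) scratch[k] = i;
        for (std::size_t k = 0; k < scratch_len; ++k) total += scratch[k];
    }
    return total;
}
```

There can be far more iterations than threads; a slice is simply reused by whichever iteration its thread picks up next. Adjacent slices may share a cache line at their boundary, so padding each slice could be worth measuring if the threads write near the edges.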

Sergey L. · Accepted Answer · 2013-11-26 13:39:20Z

5

You can start the parallel region first and then allocate variables/memory inside it. Everything declared within the parallel region is private to the thread, and automatic variables live on that thread's own stack. Example:

#pragma omp parallel
{
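    // every variable declared here is thread private
    // calloc takes (count, size); C++ needs the cast from void* (requires <cstdlib>)
    int * temp_array_pointer = static_cast<int *>(calloc(num_elements, sizeof(int)));
    // variable-length array: a compiler extension in C++, not standard
    int temp_array_on_stack[num_elements];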

    #pragma omp for
    for (...) {
         // whatever my loop does
    }

    // if you used dynamic allocation
    free(temp_array_pointer);
}
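
Since the question is about C++, the same pattern can be sketched with std::vector instead of calloc/free. This is only an illustration under assumptions: the function name sum_of_squares and the trivial loop body are placeholders, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example: sums i*i for i in [0, n) using a per-thread buffer.
long sum_of_squares(int n, std::size_t num_elements) {
    assert(num_elements > 0);
    long total = 0;

    #pragma omp parallel
    {
        // Declared inside the region, so each thread gets its own vector.
        // It is allocated once per thread, not once per iteration.
        std::vector<int> temp(num_elements);

        #pragma omp for reduction(+:total)
        for (int i = 0; i < n; ++i) {
            temp[0] = i * i;   // stand-in for real work on the scratch buffer
            total += temp[0];
        }
    }   // each thread's vector is destroyed here
    return total;
}
```

The vector's destructor releases the memory when the region ends, so there is no free call to forget.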


3 Comments

pburka Over a year ago

How do you suggest doing error handling here? Any of the calloc calls could fail. It's simpler, IMO, to allocate the scratch structures at program startup where you can more easily handle errors.

Sergey L. Over a year ago

@pburka If you use automatic variables (stack allocation) then you don't need error handling. If you call *alloc in the parallel region then yes, you should do error handling there. I would recommend allocating everything that is thread private within the parallel region: 1) it avoids potential bugs from passing the pointer around, and 2) on a NUMA machine it lets you allocate the memory hardware-local to the executing thread, which boosts performance.
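
One possible way to handle allocation failure inside the region, sketched under assumptions: a thread cannot simply return out of a parallel region, so each thread records failure in a shared flag, and after a barrier all threads take the same branch. The function name run_with_scratch and the -1 error value are invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical example: returns the sum of 0..n-1, or -1 if any thread
// could not allocate its scratch buffer.
long run_with_scratch(int n, std::size_t num_elements) {
    assert(n >= 0 && num_elements > 0);
    int alloc_failed = 0;   // shared: declared outside the region
    long total = 0;

    #pragma omp parallel
    {
        // Non-throwing allocation: exceptions must not escape the region.
        int* temp = new (std::nothrow) int[num_elements];
        if (temp == nullptr) {
            #pragma omp atomic write
            alloc_failed = 1;
        }

        // After the barrier no thread writes the flag any more and every
        // thread sees the same value, so either all threads reach the
        // worksharing loop or none do.
        #pragma omp barrier
        if (!alloc_failed) {
            #pragma omp for reduction(+:total)
            for (int i = 0; i < n; ++i) {
                temp[0] = i;   // stand-in for real work on the buffer
                total += temp[0];
            }
        }
        delete[] temp;   // deleting a null pointer is a no-op
    }
    return alloc_failed ? -1 : total;
}
```

The barrier matters because a worksharing loop has to be encountered by every thread of the team or by none of them.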

Hristo Iliev Over a year ago

If one uses large automatic variables, then ulimit -s (Un*x) / /STACK:nnn (Windows) and OMP_STACKSIZE are things one should get familiar with.

High Performance Mark · Accepted Answer · 2013-11-26 12:19:11Z

0

Once your program encounters a parallel region, that is once it hits

#pragma omp parallel

the threads (which may have been started at program initialisation or not until the first parallel construct) will become active. Inside the parallel region, any thread that allocates memory, for an array for example, allocates memory that is effectively private to it, because only that thread holds the pointer. Unless the thread deallocates the memory, it will remain allocated for the entirety of the parallel region.

If your program first, in serial, allocates memory for an array and then, on entering the parallel region, copies that array to all threads, use the firstprivate clause and let the runtime take care of giving each thread its own private copy of the array.
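
A small sketch of the firstprivate route, assuming the array is a real array object (here a std::array) and not a pointer. With a pointer, firstprivate would copy only the pointer, not the data it points to. The name sum_private_copies and the values are made up for illustration.

```cpp
#include <array>
#include <cassert>

// Hypothetical example: every thread works on its own copy of scratch.
int sum_private_copies(int n) {
    assert(n >= 0);
    std::array<int, 4> scratch = {1, 2, 3, 4};   // initialised serially
    int total = 0;

    // firstprivate gives each thread a copy of scratch, initialised from
    // the serial value above. With OpenMP enabled, the writes in the loop
    // only touch the thread's copy.
    #pragma omp parallel for firstprivate(scratch) reduction(+:total)
    for (int i = 0; i < n; ++i) {
        scratch[0] = i;
        total += scratch[0] + scratch[3];
    }
    return total;
}
```

The copy is made once per thread on entry to the region, so for a large array this costs one allocation and one copy per thread.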

Given all that, I don't see the point of allocating, presumably before encountering the parallel region, a large amount of memory then sharing it between threads using some roll-your-own approach to dividing it based on calculations on the thread id.


3 Comments

Chai ML Over a year ago

Thanks. My purpose is to avoid the overhead of allocating memory in each thread. Say the number of threads is 8 and each thread needs memory of size N; then I only need to allocate N*8 memory once and let each thread have its own private part of size N within it. Yes, I can copy the pointer to the array with the firstprivate clause, but that does not tell a thread which part is not in use by the other running threads.

High Performance Mark Over a year ago

Ahh, so you'd rather have one thread allocate 8n bytes than 8 threads allocate n bytes each. I'm not sure I see that as anything more than an unnecessary optimisation -- and to convince me that it isn't you'll have to show me the data. As for pointers, you brought them into the discussion, not me.

Chai ML Over a year ago

Actually, I want to allocate 8n at the very beginning rather than have each thread allocate n. The total number of threads is far more than 8, but there are at most 8 threads running at the same time.
