4

I am optimizing a for loop with OpenMP. Each thread temporarily uses a large array, which is no longer needed once the thread finishes. Since I don't want to allocate and free these arrays repeatedly, I plan to allocate one large block of memory and assign a part of it to each thread. To avoid conflicts, I need a unique ID for each running thread, one that does not change and is never equal to that of another thread. So my question is: can I use the thread ID returned by omp_get_thread_num() for this purpose? Or is there a more efficient solution for this kind of memory allocation and assignment task? Thanks very much!

3
  • Yes, you can. omp_get_thread_num() returns a thread ID that is unique within the current team and does not change for the duration of the parallel region; using it to partition a matrix or array is common practice. Commented Nov 26, 2013 at 12:44
  • Do you need that memory to persist between multiple entrances into the same parallel region or into consecutive parallel regions? An example of the former case would be a serial outer loop with a parallel inner loop. Also having the same thread ID returned by omp_get_thread_num() does not necessarily mean that the code is being executed by the same process thread. Commented Nov 26, 2013 at 16:14
  • Thanks, I do not need it to persist. I only need to make sure that no memory block is used by different threads at the same time, so the first solution is enough for me. Commented Nov 26, 2013 at 17:43
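
A minimal sketch of the scheme discussed in these comments, assuming the scratch memory only has to live for one parallel loop. The function name sum_with_pool and the parameter scratch_len are hypothetical, and the arithmetic is a placeholder for the real work. The pool is sized with omp_get_max_threads(), which is an upper bound on the team size of a parallel region that has no num_threads clause, and each iteration picks its slice with omp_get_thread_num().

```c
#include <stdlib.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds when OpenMP is not enabled. */
static int omp_get_thread_num(void) { return 0; }
static int omp_get_max_threads(void) { return 1; }
#endif

/* Hypothetical helper: for every i in [0, n) fill a scratch buffer with
 * i * k (k in [0, scratch_len)) and add its contents to a running total.
 * Assumes scratch_len > 0. Returns -1 if the pool cannot be allocated. */
long sum_with_pool(int n, size_t scratch_len)
{
    /* One allocation, sized for the largest team the runtime may create. */
    size_t max_threads = (size_t)omp_get_max_threads();
    long *pool = malloc(max_threads * scratch_len * sizeof *pool);
    if (pool == NULL)
        return -1;

    long total = 0;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        /* Slice owned by whichever thread executes this iteration. */
        long *scratch = pool + (size_t)omp_get_thread_num() * scratch_len;

        for (size_t k = 0; k < scratch_len; k++)
            scratch[k] = (long)i * (long)k;
        for (size_t k = 0; k < scratch_len; k++)
            total += scratch[k];
    }

    free(pool);
    return total;
}
```

Because the slice index comes from the thread number rather than the iteration number, the pool never needs more than max_threads slices, however many iterations the loop has. Error handling stays in serial code, since the only allocation happens before the parallel region.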

2 Answers

5

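You can start the parallel region first and allocate the variables/memory inside it. Every variable declared within the parallel region is private to the thread and lives on that thread's own stack; memory obtained from calloc lives on the heap, but the pointer to it is private. Example: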

#pragma omp parallel
{
    // every variable declared here is thread private
    int *temp_array_pointer = calloc(num_elements, sizeof(int));  // heap block; check for NULL before use
    int temp_array_on_stack[num_elements];                        // C99 variable-length array on the thread's stack

    #pragma omp for
    for (...) {
         // whatever my loop does
    }

    // if you used dynamic allocation
    free(temp_array_pointer);
}

3 Comments

How do you suggest doing error handling here? Any of the calloc calls could fail. It's simpler, IMO, to allocate the scratch structures at program startup where you can more easily handle errors.
@pburka If you use automatic variables (stack allocation) then you don't need error handling. If you do *alloc in the parallel section then yes, you should do error handling there. I would recommend allocating everything that is thread private within the parallel section because 1) it avoids potential bugs from passing the pointer around, and 2) on a NUMA machine it lets the memory be allocated hardware-local to the executing thread, thus boosting performance.
If one uses large automatic variables, then ulimit -s (Un*x) / /STACK:nnn (Windows) and OMP_STACKSIZE are things one should get familiar with.
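
One possible way to handle the allocation failures raised in the comments above, as a sketch rather than a definitive pattern. A thread cannot simply return out of a parallel region, and a worksharing loop must be reached by all threads of the team or by none, so the sketch records a failure in a shared flag, synchronises with a barrier, and then lets every thread take the same branch. The names scaled_sums, in, out and scratch_len are hypothetical, and the arithmetic is a placeholder.

```c
#include <stdlib.h>

/* Hypothetical helper: out[i] = sum over k in [0, scratch_len) of in[i] * k,
 * computed through a per-thread heap buffer allocated inside the region.
 * Assumes scratch_len > 0. Returns 0 on success, -1 if any allocation failed. */
int scaled_sums(const int *in, long *out, int n, size_t scratch_len)
{
    int failed = 0;  /* shared: set when any thread's allocation fails */

    #pragma omp parallel
    {
        /* Declared inside the region, so every thread has its own pointer. */
        long *scratch = malloc(scratch_len * sizeof *scratch);
        if (scratch == NULL) {
            #pragma omp atomic write
            failed = 1;
        }

        /* After the barrier all threads see the same value of failed,
         * so they either all enter the loop or all skip it. */
        #pragma omp barrier

        if (!failed) {
            #pragma omp for
            for (int i = 0; i < n; i++) {
                long sum = 0;
                for (size_t k = 0; k < scratch_len; k++)
                    scratch[k] = (long)in[i] * (long)k;
                for (size_t k = 0; k < scratch_len; k++)
                    sum += scratch[k];
                out[i] = sum;
            }
        }

        free(scratch);  /* free(NULL) is a no-op */
    }

    return failed ? -1 : 0;
}
```

The caller then checks the return value in serial code, much as it would for an allocation made at program startup.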
0

Once your program encounters a parallel region, that is once it hits

#pragma omp parallel

the threads (which may have been started at program initialisation or not until the first parallel construct) will become active. Inside the parallel region, any thread which allocates memory, for an array for example, gets a block that no other thread knows about as long as the pointer is kept in a private variable. All threads share one address space, so the block is private by convention rather than by protection. Unless the thread deallocates the memory, it will remain allocated for the entirety of the parallel region (and beyond it, as a leak, if the pointer is lost).

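If your program first, in serial, allocates memory for an array and then, on entering the parallel region, needs a copy of that array in every thread, use the firstprivate clause and let the run time take care of initialising each thread's private copy. Note that this copies the elements only when the variable is an actual array; for a pointer, firstprivate copies just the pointer value, and all threads still refer to the same memory.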
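
To illustrate the firstprivate clause, here is a small sketch, assuming a fixed-size array declared in the enclosing function. The names firstprivate_demo, table and LEN are hypothetical. Each thread starts with its own copy of table, initialised from the serial values, and may overwrite it freely.

```c
#define LEN 4

/* Hypothetical demo: every thread scales its own private copy of table,
 * then the team sums one element per loop iteration. */
int firstprivate_demo(void)
{
    int table[LEN] = {1, 2, 3, 4};  /* initialised in serial code */
    int sum = 0;

    #pragma omp parallel firstprivate(table) reduction(+:sum)
    {
        /* Modifies this thread's copy only. */
        for (int k = 0; k < LEN; k++)
            table[k] *= 10;

        /* Each iteration reads from the executing thread's copy. */
        #pragma omp for
        for (int i = 0; i < LEN; i++)
            sum += table[i];
    }

    /* With OpenMP enabled, the original table still holds 1, 2, 3, 4 here. */
    return sum;
}
```

Every thread applies the same scaling to its own copy, so the sum does not depend on how the iterations are distributed. Had table been declared as an int * instead, only the pointer would have been copied and all threads would have scaled the same memory.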

Given all that, I don't see the point of allocating, presumably before encountering the parallel region, a large amount of memory then sharing it between threads using some roll-your-own approach to dividing it based on calculations on the thread id.

3 Comments

Thanks. My purpose is to avoid the overhead of allocating memory in each thread. Say the number of threads is 8 and each thread needs memory of size N; then I only need to allocate N*8 once and let each thread have its own private part of size N within it. Yes, I can copy the pointer to the array with the firstprivate clause, but then a thread cannot know which part is not in use by the other running threads.
Ahh, so you'd rather have one thread allocate 8n bytes than 8 threads allocate n bytes each. I'm not sure I see that as anything more than an unnecessary optimisation -- and to convince me that it isn't you'll have to show me the data. As for pointers, you brought them into the discussion, not me.
Actually, I want to allocate 8n at the very beginning rather than have each thread allocate n. The total number of threads is far more than 8, but there are at most 8 threads running at the same time.
