I am optimizing a for loop with OpenMP. Each thread temporarily uses a large array, which is no longer needed once the thread finishes. Since I don't want to repeatedly allocate and delete these arrays, I plan to allocate one large block of memory and assign a part of it to each thread. To avoid conflicts, each running thread needs a unique ID that does not change and is never equal to another thread's ID. So my question is: can I use the thread ID returned by omp_get_thread_num() for this purpose? Or is there a more efficient solution for this kind of memory allocation and assignment task? Thanks very much!
2 Answers
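You can start the parallel region first and then allocate variables/memory inside it. Every variable declared within the parallel region is private to the thread that executes it; automatic variables live on that thread's own stack, while memory obtained with malloc/calloc lives on the heap and is reachable only through the thread's private pointer. Example: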
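#pragma omp parallel
{
    // every variable declared here is private to the executing thread
    // heap allocation: calloc takes (count, size); check the result for NULL
    int *temp_array_pointer = calloc(num_elements, sizeof(int));
    // variable-length array on this thread's stack (limited by the thread stack size)
    int temp_array_on_stack[num_elements];
    #pragma omp for
    for (...) {
        // whatever my loop does
    }
    // if you used dynamic allocation
    free(temp_array_pointer);
}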
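A more complete version of the idea above might look like the following. This is only a sketch: the function name `sum_squares_with_scratch`, the parameter `scratch_len`, and the sum-of-squares workload are hypothetical stand-ins for the real loop. It allocates one scratch buffer per thread inside the parallel region, checks the allocation, and frees the buffer before the region ends. A thread whose allocation fails still enters the `omp for` loop, because every thread of the team has to reach a worksharing construct.

```c
#include <assert.h>
#include <stdlib.h>

/* Sums i*i for i in [0, n), using a per-thread scratch buffer as temporary
   storage. Returns -1 if any thread failed to allocate its buffer.
   The function name and scratch_len are hypothetical. */
long long sum_squares_with_scratch(int n, size_t scratch_len)
{
    assert(scratch_len > 0);
    long long total = 0;
    int alloc_failed = 0;

    #pragma omp parallel reduction(+:total) reduction(|:alloc_failed)
    {
        /* Declared inside the region: each thread has its own pointer and
           its own allocation, made once per thread, not once per iteration. */
        long long *scratch = malloc(scratch_len * sizeof *scratch);
        if (scratch == NULL)
            alloc_failed = 1;

        /* All threads must reach the worksharing loop, so a failed
           allocation is handled inside the loop body. */
        #pragma omp for
        for (int i = 0; i < n; i++) {
            if (scratch != NULL) {
                scratch[i % scratch_len] = (long long)i * i; /* temporary use */
                total += scratch[i % scratch_len];
            }
        }

        free(scratch); /* free(NULL) is a no-op */
    }
    return alloc_failed ? -1 : total;
}
```

Calling `sum_squares_with_scratch(10, 4)` returns 285 whether or not the code is compiled with OpenMP enabled, since the pragmas are simply ignored in a serial build.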
3 Comments
*alloc in the parallel section then yes, you should do error handling here. I would recommend allocating everything that is thread private within the parallel section: 1) it avoids potential bugs from passing the pointer around, and 2) if you are working on a NUMA machine, it allows you to allocate the memory hardware-local to the executing thread, thus boosting performance.
ulimit -s (Un*x) / /STACK:nnn (Windows) and OMP_STACKSIZE are things one should get familiar with.
Once your program encounters a parallel region, that is once it hits
#pragma omp parallel
the threads (which may have been started at program initialisation or not until the first parallel construct) will become active. Inside the parallel region, any thread that allocates memory, for an array for example, gets its own allocation, reachable only through that thread's private pointer. Unless the thread deallocates the memory, it will remain allocated for the entirety of the parallel region.
If your program first allocates and fills an array serially and then, on entering the parallel region, needs a copy of it in every thread, use the firstprivate clause and let the runtime take care of copying the array into each thread's private copy.
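As a rough illustration of firstprivate, the sketch below uses a hypothetical function `weighted_sum` with a small fixed-size array `work`. Note one assumption that matters: the variable must have array type for the elements to be copied. If it is a pointer to heap memory, firstprivate copies only the pointer value, and all threads still refer to the same memory.

```c
#include <assert.h>

#define NUM_WEIGHTS 4

/* Computes the sum of weight[i % 4] * i for i in [0, n).
   The function name and the array layout are hypothetical. */
long long weighted_sum(int n)
{
    assert(n >= 0);
    /* Filled in serially: four weights followed by one scratch slot. */
    long long work[NUM_WEIGHTS + 1] = {1, 2, 3, 4, 0};
    long long total = 0;

    /* firstprivate gives every thread its own copy of the whole array,
       initialised from the serial values. Writes to the scratch slot go
       to the thread's copy, so they do not race with other threads. */
    #pragma omp parallel for firstprivate(work) reduction(+:total)
    for (int i = 0; i < n; i++) {
        work[NUM_WEIGHTS] = work[i % NUM_WEIGHTS] * i; /* private scratch */
        total += work[NUM_WEIGHTS];
    }
    return total;
}
```

With a shared array instead of firstprivate, the write to the scratch slot would be a data race; with a private (not firstprivate) array, the weights would be uninitialised in each thread.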
Given all that, I don't see the point of allocating, presumably before encountering the parallel region, a large amount of memory then sharing it between threads using some roll-your-own approach to dividing it based on calculations on the thread id.
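For comparison, the approach described in the question could be sketched roughly as follows. The function name `sum_squares_shared_block`, the parameter `slice_len`, and the workload are hypothetical. The sketch relies on two properties of the OpenMP API: within a single parallel region, omp_get_thread_num() returns a value between 0 and omp_get_num_threads() - 1 that is unique and fixed for each thread of the team, and omp_get_max_threads() is an upper bound on the team size of a parallel region that has no num_threads clause. It assumes no nested parallelism and no use of the ID outside the region.

```c
#include <assert.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks so the sketch also builds without OpenMP enabled. */
static int omp_get_max_threads(void) { return 1; }
static int omp_get_thread_num(void)  { return 0; }
#endif

/* Sums i*i for i in [0, n) using one shared block that is divided into
   per-thread slices. Returns -1 if the block cannot be allocated.
   The function name and slice_len are hypothetical. */
long long sum_squares_shared_block(int n, size_t slice_len)
{
    assert(slice_len > 0);
    int max_threads = omp_get_max_threads();

    /* One allocation, made serially, large enough for every possible thread. */
    long long *block = malloc((size_t)max_threads * slice_len * sizeof *block);
    if (block == NULL)
        return -1;

    long long total = 0;
    #pragma omp parallel num_threads(max_threads) reduction(+:total)
    {
        /* The thread number selects this thread's slice; no two threads
           of the same team get the same number. */
        long long *slice = block + (size_t)omp_get_thread_num() * slice_len;

        #pragma omp for
        for (int i = 0; i < n; i++) {
            slice[i % slice_len] = (long long)i * i; /* temporary use */
            total += slice[i % slice_len];
        }
    }

    free(block);
    return total;
}
```

Compared with allocating inside the parallel region, this saves a few allocation calls at the cost of extra index arithmetic, and adjacent slices may share a cache line at their boundaries unless `slice_len` is padded.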
3 Comments
8n bytes than 8 threads allocate n bytes each. I'm not sure I see that as anything more than an unnecessary optimisation -- and to convince me that it isn't you'll have to show me the data. As for pointers, you brought them into the discussion, not me.
omp_get_thread_num() does not necessarily mean that the code is being executed by the same process thread.