I am doing some image processing and have a nested for loop that I want to parallelize with OpenMP. The loop looks like this; I have added the pragma and declared some of the variables private.
    int a, b, j, idx;
    #pragma omp parallel for private(b,j,sumG,sumGI)
    for(a = 0; a < ny; ++a)
    {
        for(b = 0; b < nx; ++b)
        {
            idx = a*ny+b;
            if (imMask[idx] == 0)
            {
                Wshw[idx] = 0;
                continue;
            }
            sumG = 0;
            sumGI = 0;
            for(j = a; j < ny; ++j)
            {
                sumG += shadowM[j-a];
                sumGI += shadowM[j-a] * imBlurred[nx*j + b];
            }
            Wshw[idx] = sumGI / sumG;
        }
    }
Both nx and ny are large, so I expected OpenMP to give a decent decrease in execution time. Instead there is almost no difference. Am I doing something wrong in how I set up the multi-threading?
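For comparison, here is a minimal self-contained sketch of the same loop in which every per-iteration variable, including idx, is declared inside the loop body, so each thread gets its own copy and no private clause is needed. Several things here are assumptions rather than part of the original code: the function name compute_wshw is hypothetical, the arrays are taken to be float (the mask unsigned char), the image is assumed to be row-major with row width nx (hence a*nx + b rather than a*ny + b), and schedule(dynamic) is only a guess at balancing the uneven work per row, since the inner j loop gets shorter as a grows.

```c
#include <stddef.h>

/* Hypothetical wrapper around the loop from the question.
 * Assumptions: float data, row-major layout with row width nx. */
void compute_wshw(int nx, int ny,
                  const unsigned char *imMask,
                  const float *shadowM,
                  const float *imBlurred,
                  float *Wshw)
{
    /* Rows are distributed across threads. Variables declared inside
     * the loop body are automatically private to each thread. */
    #pragma omp parallel for schedule(dynamic)
    for (int a = 0; a < ny; ++a) {
        for (int b = 0; b < nx; ++b) {
            int idx = a * nx + b;   /* per-thread index, no shared write */

            if (imMask[idx] == 0) {
                Wshw[idx] = 0.0f;
                continue;
            }

            float sumG = 0.0f;
            float sumGI = 0.0f;
            /* Accumulate the weighted column sum from row a downwards. */
            for (int j = a; j < ny; ++j) {
                sumG  += shadowM[j - a];
                sumGI += shadowM[j - a] * imBlurred[(size_t)nx * j + b];
            }
            Wshw[idx] = sumGI / sumG;
        }
    }
}
```

With GCC or Clang this needs to be compiled with -fopenmp; without that flag the pragma is ignored and the loop runs on a single thread, which is one possible reason for seeing no change in timing.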