I do not understand why the following code is not vectorized by gcc 4.4.6:
int MyFunc(const float *pfTab, float *pfResult, int iSize, int iIndex)
{
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + pfTab[iIndex];
}
note: not vectorized: unhandled data-ref
However, if I write the following code
int MyFunc(const float *pfTab, float *pfResult, int iSize, int iIndex)
{
    float fTab = pfTab[iIndex];
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + fTab;
}
gcc succeeds in auto-vectorizing this loop.
If I add an OpenMP directive:
int MyFunc(const float *pfTab, float *pfResult, int iSize, int iIndex)
{
    float fTab = pfTab[iIndex];
    #pragma omp parallel for
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + fTab;
}
I get the following note again: not vectorized: unhandled data-ref
Could you please help me understand why the first and third versions are not auto-vectorized?
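For reference, here is a minimal sketch of two variants that may be worth testing. It rests on assumptions, not confirmed behaviour: that pfTab and pfResult never overlap (which is what restrict promises the compiler), and that keeping the inner loop in a plain helper function gives the vectorizer simpler data references than the loop body gcc outlines for an OpenMP region. The names AddTabValue, AddValue, AddValueParallel and CHUNK are hypothetical. Whether a given gcc version actually vectorizes either loop still has to be checked with -ftree-vectorizer-verbose.

```c
#include <assert.h>
#include <stddef.h>

/* restrict tells the compiler that pfTab and pfResult never overlap, so
   pfTab[iIndex] cannot change while the loop writes pfResult.
   (With C++ or gnu89, GCC spells the qualifier __restrict__.) */
void AddTabValue(const float *restrict pfTab, float *restrict pfResult,
                 int iSize, int iIndex)
{
    assert(pfTab != NULL && pfResult != NULL);
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + pfTab[iIndex];
}

/* Plain inner loop with no OpenMP construct in it. */
static void AddValue(float *restrict pfResult, int iSize, float fValue)
{
    for (int i = 0; i < iSize; i++)
        pfResult[i] = pfResult[i] + fValue;
}

enum { CHUNK = 4096 }; /* hypothetical chunk size, in elements */

/* OpenMP distributes whole chunks across threads; each chunk is handled
   by the plain helper above. Without -fopenmp the pragma is ignored and
   the chunks simply run one after another. */
void AddValueParallel(float *pfResult, int iSize, float fValue)
{
    int nChunks = (iSize + CHUNK - 1) / CHUNK;
    #pragma omp parallel for
    for (int c = 0; c < nChunks; c++) {
        int iStart = c * CHUNK;
        int iLen = (iSize - iStart < CHUNK) ? (iSize - iStart) : CHUNK;
        AddValue(pfResult + iStart, iLen, fValue);
    }
}
```

Compiling with something like gcc -std=c99 -O3 -fopenmp -ftree-vectorizer-verbose=2 should show whether the loops in AddTabValue and AddValue are reported as vectorized.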
Second question: math functions (exp, log, etc.) do not seem to be vectorized. This code, for example,
for (int i = 0; i < iSize; i++)
    pfResult[i] = exp(pfResult[i]);
is not vectorized. Is this due to my version of gcc?
Edit: with a newer gcc (4.8.1) and OpenMP 2011 (echo |cpp -fopenmp -dM |grep -i open), I get the following notes for every kind of loop, even one as basic as
for (iGID = 0; iGID < iSize; iGID++)
{
    pfResult[iGID] = fValue;
}
note: not consecutive access *_144 = 5.0e-1;
note: Failed to SLP the basic block.
note: not vectorized: failed to find SLP opportunities in basic block.
Edit2:
#include <stdio.h>
#include <sys/time.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
#include <unistd.h> /* getpagesize() */
#include <omp.h>

int main()
{
    int szGlobalWorkSize = 131072;
    int iGID = 0;
    int j = 0;
    omp_set_dynamic(0);
    // warmup
#if WARMUP
    #pragma omp parallel
    {
        #pragma omp master
        {
            printf("%d threads\n", omp_get_num_threads());
        }
    }
#endif
    printf("Pagesize=%d\n", getpagesize());
    float *pfResult = (float *)malloc(szGlobalWorkSize * 100 * sizeof(float));
    float fValue = 0.5f;
    struct timeval tim;
    gettimeofday(&tim, NULL);
    double tLaunch1 = tim.tv_sec + (tim.tv_usec / 1000000.0);
    double time = omp_get_wtime();
    int iChunk = getpagesize();
    int iSize = ((int)szGlobalWorkSize * 100) / iChunk;
    //#pragma omp parallel for
    for (iGID = 0; iGID < iSize; iGID++)
    {
        pfResult[iGID] = fValue;
    }
    time = omp_get_wtime() - time;
    gettimeofday(&tim, NULL);
    double tLaunch2 = tim.tv_sec + (tim.tv_usec / 1000000.0);
    printf("%.6lf Time1\n", tLaunch2 - tLaunch1);
    printf("%.6lf Time2\n", time);
    free(pfResult);
    return 0;
}
Result with:
#define _OPENMP 201107
gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)
gcc -march=native -fopenmp -O3 -ftree-vectorizer-verbose=2 test.c -lm
I get a lot of
note: Failed to SLP the basic block.
note: not vectorized: failed to find SLP opportunities in basic block.
and
note: not consecutive access *_144 = 5.0e-1;
Thanks
Without restrict, vectorization could be wrong. And add -ffast-math because the compiler is scared otherwise. For exp and log, I'm sure I've seen related questions on SO. Basically, you would need to have a library that provides vector versions of exp and log so gcc could generate calls to them.
[...] i in your loops???