Here is my very simple question. With ICC, I know it is possible to use #pragma SIMD to force vectorization of loops that the compiler chooses not to vectorize. Is there something analogous in GCC? Or is there any plan to add this feature in a future release?

On a related note, what about forcing vectorization with Graphite?

1 Answer

As long as gcc is allowed to use SSE/SSE2/etc. instructions and optimization is enabled (the auto-vectorizer is turned on by -O3, or explicitly with -ftree-vectorize), the compiler will generally emit vector instructions when it decides that doing so is "worthwhile". Like most things in compilers, this takes some luck, planning and care from the programmer, so that the compiler doesn't conclude "maybe this isn't safe" or "this is too complicated, I can't figure out what's going on". But it quite often succeeds with a reasonably modern version of gcc (all 4.x versions should do this).
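As an illustration (a minimal sketch, not from the original answer; the function name `saxpy` is just an example), this is the kind of loop GCC typically vectorizes at -O3: a simple counted loop with unit-stride accesses, and `restrict` pointers so the compiler may assume the arrays do not overlap. To see what GCC actually did, compile with `-fopt-info-vec` (GCC 4.8 and later) to list the loops it vectorized, or `-fopt-info-vec-missed` to see why it gave up on others.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]
   Vectorizer-friendly shape: known trip count n, unit stride,
   no function calls in the body, and `restrict` promises that
   x and y do not alias, so no runtime overlap check is needed. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

For example, `gcc -O3 -fopt-info-vec -c saxpy.c` should report the loop as vectorized on an SSE-capable target; adding a call to an opaque function inside the loop body is a common way to make that report disappear.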

You can make the compiler use SSE or SSE2 instructions by adding -msse or -msse2 (and similar flags for the later SSE extensions). -msse2 is the default on x86-64.
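If you are unsure which instruction sets a given build was compiled with, GCC predefines macros such as `__SSE__` and `__SSE2__` when the corresponding extension is enabled (by -msse/-msse2, by -march=..., or by default on x86-64). A small sketch; the helper names are hypothetical:

```c
/* Report, at compile time, whether this translation unit was built
   with SSE / SSE2 enabled. On non-x86 targets both return 0. */
int built_with_sse(void)
{
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}
```

Since enabling SSE2 also enables SSE, `built_with_sse2()` returning 1 implies `built_with_sse()` returns 1 as well.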

I'm not aware of any way to FORCE this, however. The compiler either does it because it's satisfied that it's a good solution, or it won't.

Sorry, I can't answer the Graphite part.

Comments

Yes, I know what you mean. I just want to force some loops to be vectorized because, when I do that with ICC, I get a performance improvement. So I'm curious to see how GCC reacts. But first I need to find out whether forcing vectorization is possible, and how. Thanks anyway.
@user2047635 If you're at the point where you think you can do better than the compiler, you might as well just manually vectorize it yourself with intrinsics.
Or better yet, write it in assembler all the way: that way, you have 100% control over which instructions come in which order, which registers are used where, etc.
You're both right, but things are not that simple. I am investigating a class of programs that share a specific feature: a loop nest with very small trip counts. Using intrinsics would therefore mean building a compiler/code translator/generator (call it whatever you prefer) to emit them, and that would be more complicated than the tool I would need to build to apply the loop transformations I am currently doing (so far manually, for experimental purposes).
Have you actually looked at what the gcc compiler produces?
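For reference, manual vectorization with intrinsics, as suggested in the comments, looks roughly like the following minimal sketch (the function `add_arrays` is a hypothetical example, not code from the thread). It processes four floats per iteration with SSE and finishes the remaining 0 to 3 elements with a scalar loop; the `#ifdef __SSE__` guard is an added assumption so the same file still compiles as plain C on targets without SSE.

```c
#include <stddef.h>
#ifdef __SSE__
#include <xmmintrin.h>   /* SSE intrinsics: __m128, _mm_loadu_ps, ... */
#endif

/* y[i] += x[i] for i in [0, n) */
void add_arrays(size_t n, const float *x, float *y)
{
    size_t i = 0;
#ifdef __SSE__
    /* Main loop: 4 floats per iteration. Unaligned loads/stores,
       so the arrays need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(vx, vy));
    }
#endif
    /* Scalar tail (or the whole loop when SSE is unavailable). */
    for (; i < n; ++i)
        y[i] += x[i];
}
```

This also shows the cost the asker mentions: with very small trip counts, the scalar tail can dominate, and every change of vector width or loop shape means rewriting the intrinsics by hand.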