167 questions
7
votes
1
answer
172
views
Why is it faster to do a branch than a lookup?
I have just been trying to benchmark some PRNG code for generating hex characters - I have an input of random values buf, which I want to turn into a series of hex characters, in-place.
Code
#define ...
2
votes
1
answer
317
views
How to auto-vectorize (SIMD) a modular multiplication in Rust
I'm trying to optimize code that performs modular multiplication so that it uses SIMD auto-vectorization. That is, I don't want to use any libraries; the compiler should do the job. Here's the smallest verifiable ...
1
vote
0
answers
107
views
Rust auto vectorization of exp function
I am trying to get good codegen for a Rust binary on x64, but it seems very hard to get SIMD operations for exp.
I have a simplified code example: https://godbolt.org/z/4MY34T1zj
The code should ...
0
votes
1
answer
321
views
Why does MSVC not vectorize?
I am using Visual Studio 2022 on Windows 10. My processor: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz 1.80 GHz.
Here is the code:
#include <vector>
#include <iostream>
#include <time.h> ...
28
votes
1
answer
2k
views
Why does GCC generate code that conditionally executes a SIMD implementation?
The following code produces assembly that conditionally executes SIMD in GCC 12.3 when compiled with -O3. For completeness, the code always executes SIMD in GCC 13.2 and never executes SIMD in clang ...
5
votes
1
answer
330
views
AVX512 auto-vectorized C++ matrix-vector functions are much slower when source = destination, in-place
I've tried to write a few functions to carry out matrix-vector multiplication using a single matrix together with an array of source vectors. I wrote those functions once in C++ and once in x86 ...
3
votes
3
answers
2k
views
Vectorizing vs interleaving loop C++
I am trying to get into optimization, and I wanted to ask whether I have understood the difference between loop vectorization and interleaving, using these examples:
void add_arrays(int* a, int* b, int n){
for (...
0
votes
1
answer
76
views
Why is the auto-vectorizer failing to find "vectorizable type information"?
I'm trying to get some of my code to vectorize, but I keep running into info C5002: loop not vectorized due to reason '1305'. According to this page:
// Code 1305 is emitted when the compiler can't ...
0
votes
0
answers
204
views
Why can LLVM not auto-vectorize comparing two arrays and writing the result to vector<bool>?
I want to compare two arrays and write the result to a bitmap. Therefore, I tried the following code:
void wtf(int *lhs, int *rhs, int len, std::vector<bool>& dst){
for (int i = 0; i < ...
5
votes
0
answers
360
views
Why does the compiler not use SIMD in my range-expression?
I have got two implementations of a dot-product:
One hand-coded https://godbolt.org/z/48EEnnY4r
int bla2(const std::vector<int>& a, const std::vector<int>& b){
int res = 0;
...
1
vote
1
answer
247
views
Understanding JIT's rewrite of a for loop
I have the following Java code (all arrays are initialized before we call "arrays" and all are of size "arraySize")
int arraySize = 64;
float[] a;
float[] b;
float[] result;
...
3
votes
1
answer
684
views
Why can't the Rust compiler auto-vectorize this FP dot product implementation?
Let's consider a simple reduction, such as a dot product:
pub fn add(a:&[f32], b:&[f32]) -> f32 {
a.iter().zip(b.iter()).fold(0.0, |c, (x, y)| c + x * y)
}
Using rustc 1.68 with -C opt-level=...
11
votes
0
answers
182
views
Is integer vectorization accuracy / precision of integer division CPU-dependent?
I tried to vectorize the premultiplication of 64-bit colors of 16-bit integer ARGB channels.
I quickly realized that due to lack of accelerated integer division support I need to convert my values to ...
0
votes
0
answers
120
views
Bash Script Using API for Online Image Vectorizing
I am a user of vectorizer.io services, but I now need to batch convert, and that requires a script utilizing their API. Their customer service sent me a script written by ChatGPT that users have put ...
2
votes
1
answer
292
views
Auto-vectorization of a loop shuffling 4 int elements and taking absolute differences vs. the previous arrangement
I'm trying, without succeeding, to make the Rust compiler (rustc 1.66.0) auto-vectorize the following function (without using any unsafe intrinsics):
pub fn get_transforms(s: &mut [i32]) -> u32 ...
1
vote
1
answer
747
views
Is `-ftree-loop-vectorize` not enabled by `-O2` in GCC v12?
Example: https://www.godbolt.org/z/ahfcaj7W8
From https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Options.html
It says
-ftree-loop-vectorize
Perform loop vectorization on trees. This flag ...
0
votes
1
answer
38
views
Fast way to generate a vector of successive powers in DolphinDB
Suppose the base value is x. I would like to create a vector [1, x, x**2, x**3, ..., x**(n-1)] where the i-th element is x**i.
I know in Python it can be implemented with a list. For x=5 and n=10:
[pow(...
0
votes
0
answers
177
views
Why do I get "info C5002: loop not vectorized due to reason '1305'" when compiling a program with Eigen in MSVC?
I wrote some code to do Eigen calculations, and I added /Qvec-report:2 to see whether AVX is activated in Eigen. But I get many "info C5002: loop not vectorized due to reason '1305'" messages. It ...
4
votes
1
answer
6k
views
Auto vectorization with Rust
I am new to Rust and SIMD, and the following code snippet is the bottleneck of my program. I am wondering whether I can leverage auto-vectorization for it.
fn is_subset(a: Vec<i64>, b:...
4
votes
1
answer
119
views
Why do different types of array subscript used to iterate affect auto-vectorization?
As the following code shows, uint32_t prevents the compiler (GCC 12.1 with -O3) from auto-vectorizing. Why? See godbolt.
#include <cstdint>
// no auto vectorization
void test32(uint32_t *...
3
votes
0
answers
900
views
Can auto-vectorization be automatically done by #pragma omp simd? [duplicate]
To get auto-vectorization for C++ code that will run on x86-64 and AArch64 processors, is just adding #pragma omp simd to the code sufficient? I plan to compile on Windows using ...
3
votes
0
answers
549
views
Using multiple pragmas on the same for-loop for auto-vectorization on GCC and ICC
When there is a simple loop running on simple arrays,
for(int i=0;i<16;i++)
{
a[i]=b[i]+c[i];
}
GCC and ICC behave differently with pragmas. So I experimented with pragmas and observed that ...
1
vote
1
answer
485
views
Weird auto-vectorization in GCC with different results on godbolt
I'm confused by an auto-vectorization result. The following code addtest.c
#include <stdio.h>
#include <stdlib.h>
#define ELEMS 1024
int
main()
{
float data1[ELEMS], data2[ELEMS];
...
1
vote
1
answer
504
views
Is `-ftree-slp-vectorize` not enabled by `-O2` in GCC?
From https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
It says "-ftree-slp-vectorize: Perform basic block vectorization on trees. This flag is enabled by default at -O2 and by -ftree-...
0
votes
2
answers
608
views
How to auto-vectorize a loop with access stride 2 with g++ without OpenCL or intrinsics
I am trying to convert a function from an implementation using intrinsics into standard C++ (to simplify maintenance, portability, etc.). Everything worked fine, except for a loop with stride 2 where ...