7 votes
1 answer
172 views

I have just been trying to benchmark some PRNG code for generating hex characters: I have an input of random values buf, which I want to turn into a series of hex characters, in-place. Code #define ...
Posted by Anon (reputation 381)
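A minimal sketch for the hex-character question above. It assumes each byte of `buf` should become one lowercase hex digit taken from that byte's low 4 bits; the asker's actual mapping is cut off in the excerpt, and the helper name `bytes_to_hex_inplace` is hypothetical. Writing the mapping as a branch-free select keeps the loop in a shape that GCC and Clang can often auto-vectorize, but the generated assembly is the only reliable confirmation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: overwrite each byte of buf with a lowercase hex digit
// derived from that byte's low 4 bits. The body is a simple select with no
// cross-iteration dependency, which is the form vectorizers handle well.
void bytes_to_hex_inplace(std::uint8_t* buf, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        std::uint8_t nib = buf[i] & 0x0F;  // keep only the low nibble
        // 0..9 -> '0'..'9', 10..15 -> 'a'..'f'
        buf[i] = static_cast<std::uint8_t>(nib < 10 ? '0' + nib : 'a' + (nib - 10));
    }
}
```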
2 votes
1 answer
317 views

I'm trying to optimize code that has modular multiplication so that it uses SIMD auto-vectorization. That is, I don't want to use any libraries; the compiler should do the job. Here's the smallest verifiable ...
Posted by Poperton (reputation 2,066)
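A sketch of the kind of kernel the modular-multiplication question describes, under the assumption of element-wise products with a compile-time modulus; the name `mulmod_arrays` and the constant `kMod` are hypothetical, not taken from the question. The 64-bit intermediate avoids overflowing the 32x32-bit product. With a constant modulus, compilers typically replace the `%` with a multiply-and-shift sequence; whether that sequence is then vectorized depends on the compiler version and target ISA.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical modulus; a compile-time constant lets the compiler avoid
// a hardware division.
constexpr std::uint32_t kMod = 1000000007u;

// out[i] = (a[i] * b[i]) mod kMod, computed with a 64-bit intermediate.
void mulmod_arrays(const std::uint32_t* a, const std::uint32_t* b,
                   std::uint32_t* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        std::uint64_t prod = static_cast<std::uint64_t>(a[i]) * b[i];
        out[i] = static_cast<std::uint32_t>(prod % kMod);
    }
}
```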
1 vote
0 answers
107 views

I am trying to get good codegen for a Rust binary on x64, but it seems very hard to get SIMD operations for exp. I have a simplified code example: https://godbolt.org/z/4MY34T1zj The code should ...
Posted by benjamin-lieser
0 votes
1 answer
321 views

I am using Visual Studio 2022 on Windows 10. My processor: Intel(R) Core(TM) i5-8265U CPU @ 1.60GHz 1.80 GHz. Here is the code: #include <vector> #include <iostream> #include <time.h&...
Posted by vev01 (reputation 5)
28 votes
1 answer
2k views

The following code produces assembly that conditionally executes SIMD in GCC 12.3 when compiled with -O3. For completeness, the code always executes SIMD in GCC 13.2 and never executes SIMD in clang ...
Posted by MarkB (reputation 2,230)
5 votes
1 answer
330 views

I've tried to write a few functions to carry out matrix-vector multiplication using a single matrix together with an array of source vectors. I've written those functions once in C++ and once in x86 ...
Posted by Loran (reputation 55)
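A plain C++ sketch of applying one matrix to an array of source vectors, as in the question above. The types `Vec4` and `Mat4`, the row-major layout, and the function name `transform_all` are assumptions for illustration, not the asker's code. The fixed-size inner loops give the compiler a regular structure it may unroll and vectorize.

```cpp
#include <cstddef>

struct Vec4 { float v[4]; };
struct Mat4 { float m[4][4]; };  // assumed row-major: m[row][col]

// dst[k] = M * src[k] for every source vector.
void transform_all(const Mat4& M, const Vec4* src, Vec4* dst, std::size_t count) {
    for (std::size_t k = 0; k < count; ++k) {
        for (int r = 0; r < 4; ++r) {
            float acc = 0.0f;
            for (int c = 0; c < 4; ++c)
                acc += M.m[r][c] * src[k].v[c];  // row r dotted with src[k]
            dst[k].v[r] = acc;
        }
    }
}
```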
3 votes
3 answers
2k views

I am trying to get into optimization, and I wanted to ask whether I understood the difference between loop vectorization and interleaving, using these examples: void add_arrays(int* a, int* b, int n){ for (...
Posted by Niccolò Tiezzi
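One way to picture the two ideas from the question above, as a sketch rather than the asker's code: the body `a[i] += b[i]` is an assumption, since the original loop is cut off. Vectorization processes several elements per SIMD instruction; interleaving (unrolling) issues several independent operations per loop iteration. The second function shows interleaving by 2 in scalar form; when a compiler combines both, each of those operations becomes a full SIMD vector. Clang exposes both knobs separately through `#pragma clang loop vectorize_width(N) interleave_count(M)`.

```cpp
// Assumed body. With vectorization, each SIMD instruction handles W elements
// (for example W = 8 ints with 256-bit AVX2).
void add_arrays(int* a, int* b, int n) {
    for (int i = 0; i < n; ++i)
        a[i] += b[i];
}

// Scalar illustration of interleaving by 2: two independent operations per
// iteration, followed by a remainder loop for odd n.
void add_arrays_interleaved(int* a, int* b, int n) {
    int i = 0;
    for (; i + 1 < n; i += 2) {
        a[i]     += b[i];
        a[i + 1] += b[i + 1];
    }
    for (; i < n; ++i)
        a[i] += b[i];
}
```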
0 votes
1 answer
76 views

I'm trying to get some of my code to vectorize, but I keep running into info C5002: loop not vectorized due to reason '1305'. According to this page: // Code 1305 is emitted when the compiler can't ...
Posted by Arkathorn
0 votes
0 answers
204 views

I want to compare two arrays and write the result to a bitmap. To do so, I tried the following code: void wtf(int *lhs, int *rhs, int len, std::vector<bool>& dst){ for (int i = 0; i < ...
Posted by YjyJeff (reputation 953)
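`std::vector<bool>` is a packed specialization whose elements are accessed through proxy objects, so each store is a read-modify-write of a single bit, a pattern that tends to get in the way of vectorization. A common alternative, sketched here under assumptions, is to build each 64-bit word of the bitmap locally and store it once. The comparison `lhs[i] == rhs[i]` and the name `compare_to_bitmap` are hypothetical, because the question's loop body is cut off.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bit (i % 64) of word (i / 64) is set when lhs[i] == rhs[i] (assumed
// comparison). Assumes len >= 0.
std::vector<std::uint64_t> compare_to_bitmap(const int* lhs, const int* rhs, int len) {
    std::vector<std::uint64_t> bits((static_cast<std::size_t>(len) + 63) / 64, 0);
    for (int w = 0; w < static_cast<int>(bits.size()); ++w) {
        std::uint64_t word = 0;
        int base = w * 64;
        int end = (base + 64 < len) ? base + 64 : len;
        for (int i = base; i < end; ++i)  // fixed-size OR-reduction per word
            word |= static_cast<std::uint64_t>(lhs[i] == rhs[i]) << (i - base);
        bits[w] = word;  // one store per 64 results
    }
    return bits;
}
```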
5 votes
0 answers
360 views

I have two implementations of a dot product: one hand-coded https://godbolt.org/z/48EEnnY4r int bla2(const std::vector<int>& a, const std::vector<int>& b){ int res = 0; ...
Posted by Stein (reputation 3,281)
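For comparison with the question above, a sketch of the two styles side by side: a hand-written loop and `std::inner_product` from `<numeric>`. The names `dot_loop` and `dot_std` are hypothetical and the loop body is an assumption, since the asker's code is truncated. Because integer addition can be reassociated by the compiler, both forms are candidates for a vectorized reduction; any codegen difference between them has to be checked in the compiler output.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hand-written reduction; assumes a.size() == b.size().
int dot_loop(const std::vector<int>& a, const std::vector<int>& b) {
    int res = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        res += a[i] * b[i];
    return res;
}

// Same computation through the standard library.
int dot_std(const std::vector<int>& a, const std::vector<int>& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0);
}
```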
1 vote
1 answer
247 views

I have the following Java code (all arrays are initialized before we call "arrays" and all are of size "arraySize") int arraySize = 64; float[] a; float[] b; float[] result; ...
Posted by DevD (reputation 23)
3 votes
1 answer
684 views

Let's consider a simple reduction, such as a dot product: pub fn add(a:&[f32], b:&[f32]) -> f32 { a.iter().zip(b.iter()).fold(0.0, |c,(x,y)| c+x*y) } Using rustc 1.68 with -C opt-level=...
Posted by benjamin-lieser
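Floating-point addition is not associative, so without flags such as GCC/Clang's `-ffast-math` or `-fassociative-math`, a compiler must keep a single sequential sum like the `fold` above and cannot split it across SIMD lanes. A common workaround, sketched here in C++ rather than Rust, is to write independent partial sums in the source so no reassociation is needed; the function name `dot_f32` and the choice of 8 accumulators are assumptions. The result may differ from the sequential sum by rounding.

```cpp
#include <cstddef>

// Dot product with 8 independent partial sums. Each acc[k] is its own
// dependency chain, which the compiler can map onto SIMD lanes.
float dot_f32(const float* a, const float* b, std::size_t n) {
    float acc[8] = {0, 0, 0, 0, 0, 0, 0, 0};
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8)
        for (std::size_t k = 0; k < 8; ++k)
            acc[k] += a[i + k] * b[i + k];
    float sum = 0.0f;
    for (std::size_t k = 0; k < 8; ++k)  // combine the partial sums
        sum += acc[k];
    for (; i < n; ++i)                   // leftover elements
        sum += a[i] * b[i];
    return sum;
}
```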
11 votes
0 answers
182 views

I tried to vectorize the premultiplication of 64-bit colors of 16-bit integer ARGB channels. I quickly realized that due to lack of accelerated integer division support I need to convert my values to ...
Posted by György Kőszeg
0 votes
0 answers
120 views

I use vectorizer.io, but I now need to batch convert, which requires a script using their API. Their customer service sent me a script written with ChatGPT that users have put ...
Posted by CeramicMonster
2 votes
1 answer
292 views

I'm trying, without succeeding, to make the Rust compiler (rustc 1.66.0) auto-vectorize the following function (without using any unsafe intrinsics): pub fn get_transforms(s: &mut [i32]) -> u32 ...
Posted by scasci (reputation 59)
1 vote
1 answer
747 views

Example: https://www.godbolt.org/z/ahfcaj7W8 From https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Options.html It says: -ftree-loop-vectorize Perform loop vectorization on trees. This flag ...
Posted by colinfang (reputation 21.9k)
0 votes
1 answer
38 views

Suppose the base value is x. I would like to create a vector [1, x, x**2, x**3, ..., x**(n-1)] where the i-th element is x**i. I know in Python it can be implemented with a list. For x=5 and n=10: [pow(...
Posted by saki (reputation 319)
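A C++ sketch of the powers vector from the question above, building each element by multiplying the previous one by x instead of calling `pow` for every index. The function name `powers` is hypothetical. Note that each element depends on the one before it, so this form is a serial chain rather than an auto-vectorizable loop; 64-bit integers keep the x = 5, n = 10 case exact, while larger inputs can overflow.

```cpp
#include <cstdint>
#include <vector>

// Returns [1, x, x^2, ..., x^(n-1)]; empty for n <= 0.
std::vector<std::int64_t> powers(std::int64_t x, int n) {
    std::vector<std::int64_t> out;
    if (n <= 0) return out;
    out.reserve(n);
    out.push_back(1);
    for (int i = 1; i < n; ++i)
        out.push_back(out.back() * x);  // x^i = x^(i-1) * x
    return out;
}
```

For x = 5 and n = 10 the last element is 5^9 = 1953125.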
0 votes
0 answers
177 views

I wrote some code to do Eigen calculations, and I added /Qvec-report:2 to see whether AVX is used inside Eigen. However, I get many "info C5002: loop not vectorized due to reason '1305'" messages. It ...
Posted by stormCommander
4 votes
1 answer
6k views

I am new to Rust/SIMD, and the following code snippet is the bottleneck of my program. I am wondering whether I can leverage auto-vectorization for it: fn is_subset(a: Vec<i64>, b:...
Posted by xxx222 (reputation 3,284)
4 votes
1 answer
119 views

As the following code shows, why does uint32_t prevent the compiler (GCC 12.1, -O3) from auto-vectorizing? See godbolt. #include <cstdint> // no auto vectorization void test32(uint32_t *...
Posted by Adonis Ling
3 votes
0 answers
900 views

To use auto-vectorization for C++ code that will run on x86-64 and aarch64 processors, is adding #pragma omp simd to the code sufficient? I plan to compile on Windows using ...
Posted by TestUser (reputation 977)
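A sketch of what `#pragma omp simd` looks like on a loop; the function `saxpy` is a hypothetical example, not the asker's code. The pragma asserts that iterations are independent so the compiler may vectorize without proving it, but it only takes effect when OpenMP SIMD support is enabled (for GCC and Clang, `-fopenmp-simd` or `-fopenmp`); otherwise it is ignored and the loop runs as ordinary scalar code. It also does not choose the instruction set: on x86-64 that still comes from target flags such as `-march`, while on aarch64 Advanced SIMD (NEON) is part of the baseline.

```cpp
#include <cstddef>

// y = a * x + y. The pragma is a vectorization hint; results are the same
// whether or not the compiler honors it.
void saxpy(float a, const float* x, float* y, std::size_t n) {
#pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```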
3 votes
0 answers
549 views

When there is a simple loop running on simple arrays, for(int i=0;i<16;i++) { a[i]=b[i]+c[i]; } GCC and ICC behave differently with pragmas. So I experimented with pragmas and observed that ...
Posted by huseyin tugrul buyukisik
1 vote
1 answer
485 views

I'm confused by an auto-vectorization result. The following code addtest.c #include <stdio.h> #include <stdlib.h> #define ELEMS 1024 int main() { float data1[ELEMS], data2[ELEMS]; ...
Posted by Ralf (reputation 1,335)
1 vote
1 answer
504 views

From https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html It says "-ftree-slp-vectorize: Perform basic block vectorization on trees. This flag is enabled by default at -O2 and by -ftree-...
Posted by colinfang (reputation 21.9k)
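To contrast the two GCC flags quoted above: loop vectorization needs a loop, while basic-block (SLP) vectorization looks for groups of similar, independent statements in straight-line code. The hypothetical `add4` below has no loop at all, just four isomorphic adds on adjacent elements, which is the kind of pattern SLP can pack into a single 4-wide SIMD add. Whether it does so for a given compiler and target is something to confirm in the assembly.

```cpp
#include <array>

// Straight-line code: four independent, identical operations on adjacent data.
std::array<float, 4> add4(const std::array<float, 4>& a, const std::array<float, 4>& b) {
    std::array<float, 4> r;
    r[0] = a[0] + b[0];
    r[1] = a[1] + b[1];
    r[2] = a[2] + b[2];
    r[3] = a[3] + b[3];
    return r;
}
```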
0 votes
2 answers
608 views

I am trying to convert a function from an implementation using intrinsics into standard C++ (to simplify maintenance, portability, etc.). Everything worked fine, except for a loop with stride 2 where ...
Posted by Come Raczy (reputation 1,700)
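A minimal stride-2 loop for reference, since the question above is cut off before describing its loop; the data layout (interleaved pairs) and the name `sum_pairs` are assumptions. The reads at `2 * i` and `2 * i + 1` form an interleaved access group, which vectorizers can handle with de-interleaving loads or shuffles. Whether that happens, and whether it beats the scalar loop or the original intrinsics, varies by compiler and target.

```cpp
#include <cstddef>

// in holds interleaved pairs (x0, y0, x1, y1, ...); out[i] = x_i + y_i.
void sum_pairs(const float* in, float* out, std::size_t n_pairs) {
    for (std::size_t i = 0; i < n_pairs; ++i)
        out[i] = in[2 * i] + in[2 * i + 1];
}
```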