461 questions with no answers
3
votes
0
answers
109
views
How well can clang 20 infer the likelihood of branches without annotations?
I have a performance-critical C++ code base, and I want to improve (or at least measure if it's worth improving) the likelihood that clang assigns to branches, and in general understand what it's ...
2
votes
0
answers
61
views
Why is sequential indexing with fixed length stride slower in Estrin's method?
In preparing to make Estrin's method vectorisable, I changed from normal linear indexing of the coefficients to bit-reversed indexing and restricted it strictly to powers of 2. Neither MSVC nor ICX can see how to ...
0
votes
0
answers
129
views
About JVM's tiered compilation sequence, does isolated method optimization occur before inlining?
The tiered steps provided by Oracle are:
It seems to me that...
It'd be a reasonable assumption that optimizations should occur with methods in isolation (detached from their call-site context),...
2
votes
0
answers
96
views
FSharp inlining downcast optimization?
I am currently reading through the F# core library source code and stumbled upon a common pattern which made me wonder a little about the performance of it, and could not find anything about it by a ...
1
vote
0
answers
114
views
dlv 'Warning: debugging optimized function' despite 'go build -gcflags "-N -l"'
I compile my binary like this: go build -gcflags '-N -l', but when I run it in dlv I get Warning: debugging optimized function. My guess is that it's a different host than the build host and the ...
1
vote
0
answers
81
views
How to implement a Swift analogue of `benchmark::DoNotOptimize`?
I would like to do some (micro)benchmarking in Swift. I have been using package-benchmark for this. It comes with a blackHole helper function that forces the compiler to assume that a variable is read ...
2
votes
0
answers
122
views
C Switch/Case compiler produces jump fiesta (GCC very suboptimal)
This question offers an alternative angle on the issue described in
Inefficient Loop Unrolling. But note, this question is completely independent of the other one.
We have a primitive switch/...
0
votes
0
answers
95
views
Specialized versions of a Fortran procedure
Consider the following minimal example:
subroutine f(x,a,b,c)
integer::x,i,j,k
integer,optional::a,b,c
! depending on present arguments, some expressions and variables are redundant
i=...
0
votes
0
answers
96
views
Vectorizing irregular and non SIMD register width multiple loop
Assume the function below. There are two extents that have been picked to be 3 and 17. We wish to vectorize SomeWork (it is in the translation unit and simple).
The naive approach I take is to flatten ...
1
vote
0
answers
311
views
VisualDSP++ Debug mode vs Release mode
I'm testing a project on a signal processor (ADSP 21489). My development software is VisualDSP++ 5.0 from Analog Devices. I use the DMA buffers to send data to a CODEC.
The ...
0
votes
0
answers
150
views
Why does LLVM IR generate `alloca` and `store` instructions for unused variables?
I'm learning LLVM IR and noticed some seemingly redundant instructions in the generated code. For example, in the following LLVM IR:
define i32 @main() #0 {
%1 = alloca i32, align 4
store i32 0, ...
0
votes
0
answers
84
views
Getter only property in C# record struct primary constructor
If I declare a struct like this:
public record struct MyStruct(bool MyValue);
and then look at the decompiled source, it looks like this:
public struct MyStruct : IEquatable<MyStruct>
{
[...
1
vote
0
answers
88
views
Will Java compiler optimize the for loop with identical method call inside?
I'm wondering if the Java compiler will optimize the run of the for-loop if there are no changes within the for-loop body?
For instance, let's suppose I have the following code:
double res = 0;
for (int i ...
1
vote
0
answers
76
views
Pass const expression to consteval function through non-consteval functions
In C++, is it possible to somehow pass a constant argument through multiple functions and still use consteval functions for them? The code below does not compile to FieldAsInt because the 'key' ...
2
votes
0
answers
188
views
Can't translating out of SSA be more trivial?
I'm reading about how to translate out of SSA form from different sources, and I'm having a hard time understanding the necessity of the different techniques.
First of all looking at Cytron's SSA paper at section 7 ...
1
vote
0
answers
67
views
Inline assembly: writing an unrolled loop to process arrays with correct constraints
Context
I'm trying to write a piece of code in inline assembly, which processes all elements of a "small" array (say ~10 elements) as a fully-unrolled loop. I want to avoid falling into the ...
1
vote
0
answers
144
views
optimize indexing while inside of a loop
I want to try optimizing this loop:
for i in 0..len {
    slots[values[i].slot].count += 1;
}
(both of these lists are extremely long and the same size)
I have already optimized it like this(...
3
votes
0
answers
153
views
GCC optimizes x + n > y as x + (n-1) >= y?
In general, the program
extern int x, y;
int main() {
    return x + N > y;
}
is optimized into something akin to x + N-1 >= y for some given N. Example below. Am I reading the assembly right? ...
0
votes
0
answers
175
views
Why doesn't -fno-exceptions reduce the size of C++ code?
My architecture is MicroBlaze, and I'm developing on the Xilinx ZCU102. The C++ version is 17. I built the code on Vitis 2022.2 version.
My original code had a .text section of only 60,000 bytes. ...
2
votes
0
answers
288
views
Why is a program compiled with the same optimization features (AVX2, OpenMP) enabled much slower on Linux than on Windows?
Update2:
You can find the original code at the GitHub link below, if needed. You can also find the complete, exact changes I made to reproduce the problem, along with program logs. But they are in ...
3
votes
0
answers
116
views
Is `__declspec(noalias)` a no-op?
gcc/clang's __attribute__((const)) is a close (but not entirely exact) analogue of msvc's __declspec(noalias): both express that "a function call doesn't modify or reference visible global state...
1
vote
0
answers
23
views
ConstantOptimiser in Solidity
While debugging in RemixIDE with 'optimize=true', I noticed that the Solidity code address(0xAAAA) generates the following opcodes, which only compute 20 bytes of 1s:
PUSH1 0x1
PUSH1 0x1
PUSH1 0xa0
SHL
...
0
votes
0
answers
147
views
Optimizing Memory-Bound Loop with Indirect Prefetching
I'm currently working on optimizing a kernel, and one of the most time-consuming loops, despite optimization efforts, still accounts for 80% of the benchmark's execution time. The loop's performance ...
4
votes
0
answers
377
views
Why is my rust program producing a Segfault
I'm creating a particle-life simulation in Rust and I'm using Nannou for rendering graphics. Everything seems to work when I run "cargo run" but when I tried doing a "cargo run --...
1
vote
0
answers
125
views
Tree-sitter: "choice" grammar does not work
I'm using tree-sitter to write a parser for comment input.
This is the code I wrote for single-line comment parsing:
seq(
"//",
optional(seq($.comment_prefix, optional(/[ ]*/))),
...