380 questions
2
votes
0
answers
280
views
Why is processing a sorted array faster? [closed]
So there is this original question that I assume most C++ developers are familiar with:
Why is processing a sorted array faster than processing an unsorted array?
Answer: branch prediction
Then I tried ...
3
votes
0
answers
109
views
How well can clang 20 infer the likelihood of branches without annotations?
I have a performance-critical C++ code base, and I want to improve (or at least measure if it's worth improving) the likelihood that clang assigns to branches, and in general understand what it's ...
4
votes
1
answer
114
views
Why prefer NOPs to unconditional jumps?
Sometimes we purposefully leave NOPs in a function for later runtime patching. Instead of:
.nops 16
Why not:
jmp 0f
.nops 14
0:
Or, if the amount that you need to patch in varies up to a maximum:
....
2
votes
1
answer
90
views
Unconditional branch behaviour in no-taken branch prediction
I have the following code in nanoMIPS:
loop: lw $t1, A($t0)
lw $t2, B($t0)
sub $t3, $t1, $t2
beq $t3, $r0, else
sw $t2, A($t0)
b end
The exercise asks me to implement the no-taken branch prediction ...
0
votes
1
answer
241
views
Can the number of wasted cycles per branch misprediction vary greatly? And why?
When learning about the basic 5-stage pipelined processor that does in-order execution, one is taught that the number of wasted cycles per branch misprediction is constant when the processor is flushed.
But what ...
0
votes
0
answers
34
views
Fastest way to check if a GUID of set size has been encountered before
There are many questions about finding a GUID in a list, etc., but I could not find any about just determining whether a message was seen before.
I have an API which receives requests with a ...
0
votes
1
answer
242
views
Optimization of branch prediction inside hot loop in C++
I have performance-critical code which calculates an inter-atomic force field. It is controlled by variables like bPBC, shifts, doBonds, doPiSigma, doPiPiI which can be switched on and off by the user which ...
0
votes
0
answers
52
views
Wrap into Combine (Promise-Future) code with 2 completions/branches?
https://developers.google.com/admob/ios/privacy
class ViewController: UIViewController {
// Use a boolean to initialize the Google Mobile Ads SDK and load ads once.
private var ...
0
votes
1
answer
129
views
Use branch prediction with no else statement
I am currently implementing selection sort, using the code below and the driver file below to test it. I am trying micro-optimizations to see what speeds it up.
public static void ...
1
vote
4
answers
397
views
Branch prediction and UB (undefined behavior)
I know a little something about branch prediction. This happens at the CPU and has nothing to do with compilation. Although you might be able to tell the compiler if one branch is more likely than the ...
1
vote
3
answers
232
views
How to avoid branching when finding runs of the same value and storing as a range (like run-length encoding)?
I have the following logic:
struct Range {
int start;
int end;
};
bool prev = false;
Range range;
std::vector<Range> result;
for (int i = 0; i < n; i++) {
bool curr = ...; // this is ...
1
vote
1
answer
416
views
How to replace nested IF/ELSE branches with SIMD (SSE or AVX)?
EDIT x 2
Added a more comprehensive function which returns an abstract register class: the function outputs a register full of floats. I don't care about the actual length - SSE, AVX... - because Google ...
2
votes
1
answer
1k
views
How much does a mispredicted conditional branch cost?
On x86-64 (whatever the microarchitecture) and ARM64 devices, how many clock cycles does a mispredicted conditional branch cost? And I suppose I should also ask what the figure is for a successfully ...
1
vote
0
answers
184
views
Does branchless programming make sense on very old x86 CPUs? (before 80486)
Modern CPUs since at least the 486 ¹) have a tightly-pipelined design, so conditional branches can cause "stalls" in which the pipeline has to be flushed and the code restarted on a ...
1
vote
0
answers
280
views
Branch Prediction: What is the BTB eviction scheme used in modern CPUs (Intel skylake for example)?
For branch prediction, the BHT (branch history table) is indexed by branch virtual address. An aliasing problem happens when two or more branches hash to the same entry in the BHT, ...
3
votes
0
answers
180
views
Why is there a connection between branch prediction failure and "rep ret" in the K8 processor?
I am currently looking for answers to why gcc generates strange instructions like "rep ret" in the generated assembly code. I came across a question on Stack Overflow where someone raised a ...
0
votes
0
answers
36
views
What is the depth of CPU branch prediction? [duplicate]
If the CPU is already speculatively executing down the path of branch A, will it continue to speculatively execute the next branch B, or wait until branch A retires?
if (A) {
/* body of branch A */
if(B) {
...
3
votes
0
answers
160
views
Branch predictor friendly tree traversal
I have an AVL tree and I need to traverse it in ascending and descending order.
I implemented a simple algorithm, where knowing the tree size in advance, I allocate an array and assign 0 to a counter, ...
3
votes
1
answer
234
views
How to view branch predictor tables of a process using a debugger (gdb)?
I know that most modern processors maintain a branch prediction table (BPT). I have read the gdb documentation but I could not find any command that gives the desired results. Based on this, I have ...
-1
votes
1
answer
478
views
Is branch prediction purely CPU behavior, or will the compiler give some hints?
In the Go standard package source src/sync/once.go, a recent revision changed the snippet
if atomic.LoadUint32(&o.done) == 1 {
return
}
//otherwise
...
to:
//if atomic.LoadUint32(&o.done) == ...
0
votes
1
answer
533
views
How debuggers deal with out-of-order execution and branch prediction
I know that modern CPUs do OoO execution and have advanced branch predictors that may fail. How does the debugger deal with that? If the CPU mispredicts a branch, how does the debugger know ...
0
votes
3
answers
545
views
How good is the Visual Studio compiler at branch-prediction for simple if-statements?
Here is some C++ pseudo-code as an example:
bool importantFlag = false;
for (SomeObject obj : arr) {
if (obj.someBool) {
importantFlag = true;
}
obj.doSomethingUnrelated();
}
...
5
votes
0
answers
150
views
If a function was entered via a near call, can it do a far tail call without breaking return address prediction?
Consider this code:
.globl _non_tail, _tail
.text
.code32
_non_tail:
lcall $0x33, $_non_tail.heavensgate
ret
.code64
_non_tail.heavensgate:
# do stuff. there's 12 bytes on the stack ...
0
votes
0
answers
330
views
Can you produce the BEQL MIPS instruction with C code?
So I have this code snippet in C
int unit_test_case08(int a, int b)
{
int success = 1336;
if(a != b)
{
success = 1337;
}
else
{
success = -1;
}
return ...
0
votes
0
answers
49
views
Branch prediction does not improve the performance [duplicate]
I executed the code from the famous question Why is processing a sorted array faster than processing an unsorted array?
On macOS Mojave:
//file test.cpp
#include <algorithm>
#include <ctime&...