17,987 questions
3
votes
1
answer
59
views
Bootloader crashing when jumping to 0x100000
I am having a problem with a bootloader I made. I mostly used code snippets from wiki.osdev.org and screeck (on GitHub and YouTube). The issue is: the bootloader cannot jump farther than 0xFFFFF, and ...
1
vote
1
answer
103
views
Does Intel CPU have instruction for paging translation result
I wonder if Intel (and Intel-compatible) CPUs have an instruction (for diagnostic/debugging purposes) which, for a given linear address, returns the result of paging translation (i.e. the ...
0
votes
1
answer
50
views
Paging in x86: what exactly is divided into pages, and why does the linear address behave differently depending on paging?
I'm trying to fully understand how paging works in the x86 architecture when segmentation is also enabled.
I have a couple of questions:
Does paging divide the logical memory (the selector + offset ...
3
votes
0
answers
150
views
The cost of non contiguous reads and writes (naive matrix transpose, power-of-2 and other sizes)
I was benchmarking a naive transposition and noticed a very large performance discrepancy between:
a naive operation where we read data contiguously and write with a large stride;
the ...
Advice
0
votes
2
replies
87
views
Determine used micro architecture level in executable linkable format (ELF) on x86
I am having some trouble with prebuilt development tools (compiler, linker, ...) on my very old workstation. Because the CPU in my old system only supports the microarchitecture level x86-64-v1, it ...
1
vote
2
answers
151
views
How exactly does recursion work in x86 assembly?
My question is focused specifically on assembly (Intel). In C, for example, recursion can be done with a simple return statement, but in assembly I feel like there are a lot more things going on, especially ...
2
votes
2
answers
142
views
Location of the first value pushed onto stack in assembly (x86 I386 assembly) (gdb) (AT&T)
Consider the assembly program below:
.section .data
.section .text
.global _start
_start:
pushl $85 #make it obvious in memory
popl %ebx
movl $1, %eax
int $0x80
It ...
0
votes
1
answer
147
views
x86_64 assembly program segfaults if push/pop rdx is removed
While writing some simple assembly code, I found that the program segfaults at the second call of the subroutine _printint. This only happens if I remove push rdx and pop rdx from either the _printint subroutine or the ...
-1
votes
1
answer
211
views
MIT OS course 6.828, boot/main.c - outw to port 0x8A00 with magic numbers? [closed]
void bootmain(void)
{
struct Proghdr *ph, *eph;
// read 1st page off disk
readseg((uint32_t) ELFHDR, SECTSIZE*8, 0);
// is this a valid ELF?
if (ELFHDR->e_magic != ELF_MAGIC)
...
2
votes
1
answer
176
views
x86 Protected Mode Jump Issue
I am currently trying to make the second stage of a bootloader in order to enable 32-bit protected mode. I have written some x86 assembly for the NASM assembler to do so, but when I compiled and ran ...
10
votes
1
answer
425
views
AVX-512 MD5 implementation: unexplained performance regression on Zen 4
I have written an implementation of the MD5 hash function using AVX-512. While it uses SIMD instructions, it is fundamentally a scalar algorithm. The point of using SIMD instructions is to access ...
15
votes
1
answer
394
views
Using OUTB to set cursor position in my minimal OS kernel causes QEMU screen to flicker
I am getting started with a minimal OS kernel (just a GDT and a placeholder IDT), using i386 assembly and freestanding C. I wanted to change the position of the cursor, for which I found several sites ...
0
votes
0
answers
84
views
x86 code generation, segfault when doing a PUSH instruction [duplicate]
Creating my own code area, jumping into it with CALL in assembly.
The RET alone works, but doing a PUSH fails with a segfault. Why?
int main() {
char* code = mmap(NULL, 4096, PROT_READ|PROT_WRITE|...
3
votes
1
answer
95
views
I am getting segmentation fault on this x86 assembly program
I am trying to write a basic string-to-integer function in x86 asm. I know there is a problem in my function str2int, but I don't know which state causes the error.
#string2integer.s
.data
number: .long 0
...
1
vote
1
answer
119
views
Can I toggle LARGEADDRESSAWARE for a 32-bit exe at runtime or via config, or does it require recompilation?
I have a 32-bit executable that I want to deploy with the LARGEADDRESSAWARE flag.
My question is:
Is it possible to put this under a toggle, e.g., via a config file or some runtime setting, so that if ...
3
votes
1
answer
151
views
INT 13, AH=42h fails with AH=1, CF=1
I am trying to write a simple bootloader which loads my program from LBA=1 into memory and jumps to its start. I emulate my bootloader in QEMU with the -drive flag passed. When I try to read blocks from ...
0
votes
0
answers
128
views
Why my x86 Asm code gets Segmentation fault?
I’m trying to make simple alloc and free functions from the ProgrammingGroundUp book but my code isn’t working and I’m kinda stuck.
I am using these commands when I compile the program.
as --32 myalloc....
0
votes
1
answer
73
views
How to write userspace or kernel application that would allow me to generate a lot of asynchronous interrupts on x86_64 Linux?
I am studying a performance (progress guarantee?) problem in x86 hypervisor software. The current hypothesis is this: there is a high intensity of interrupt requests caused by concurrently ...
3
votes
0
answers
121
views
IPC collapse with larger loop bodies despite constant I-cache miss rate, what's the bottleneck?
I'm seeing dramatic instructions-per-cycle collapse (2.08 -> 1.30) when increasing loop body size in simple arithmetic code with no branches, but instruction cache miss rate stays exactly constant ...
7
votes
1
answer
314
views
How to use plain RDTSC without using asm?
I want to use RDTSC in Rust to benchmark how many ticks my function takes.
There's a built-in std::arch::x86_64::_rdtsc, alas it always translates into:
rdtsc
shl rdx, 32
or rax, rdx
...
3
votes
1
answer
130
views
What is the overhead of jumps and call-rets for CPU front-end decoder?
How do jumps and call-ret pairs affect the CPU front-end decoder in the best-case scenario, when there are few instructions, they are well cached, and branches are well predicted?
For example, I run a ...
2
votes
1
answer
124
views
How do I reconciliate the dual array problem with the nature of hardware gather/scatter?
Say I have an array of a given object type which keeps the index to a target in the same array.
struct type_1 { float data; int target_index; };
struct type_1 first_array[1024];
first_array[0]....
1
vote
0
answers
101
views
What's the difference between label and constant x64 AT&T assembly [duplicate]
Some context behind the question: I tried writing a simple exit call like this:
.data
.equ EXIT, 60
.equ STATUS, 0
.text
movq EXIT, %rax
movq STATUS, %rdi
syscall
however the code fails with a ...
0
votes
0
answers
96
views
Bootloader flickers after enabling paging in x64 long mode
I am working on a simple bootloader in assembly. Everything works fine until I enable paging and jump to 64-bit mode, but then QEMU just flickers. I think it's a page fault, but I am not sure what's ...
10
votes
4
answers
310
views
Why does this lookup table sine estimation perform worse when using float instead of double?
I've written a simple sine estimation function which uses a lookup table. Out of curiosity, I tried both float and double types, expecting float to perform a bit better because of being able to pack ...