I am currently toying around with WebAssembly compiled through LLVM but I haven't yet managed to understand the stack / stack pointer and how it relates to the overall memory layout.

I learned that I have to use s2wasm with --allocate-stack N to make my program run, and I figured out that this basically adds (data (i32.const 4) "8\00\00\00") (with N=8) to my generated wast. The string is evidently a pointer to a memory offset, and the i32 constant is the offset in linear memory at which that pointer is stored.

What I do not quite understand, though, is why the pointer's value is 56 (again with N=8) and how this value relates to the exact region of the stack in memory, which, in my case, currently looks like:

0-3: zero
4-7: 56
7-35: other data sections
36-55: zeroes
56-59: zero

I know that I am probably more a candidate for "just use emscripten", but I'd also like to understand this.

  • Is the stack pointer always stored at offset 4 in linear memory?
  • How is its initial value calculated? (aligned to next offset%16==0 + N after data?)
  • What's stored before, and what's after the offset it points at?

1 Answer

I touched on this in another question. Values from C++'s stack can actually end up in three places:

  1. On the execution stack (each opcode pushes and pops values, so add pops 2 and then pushes 1).
  2. As a local.
  3. In the Memory.
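As a toy illustration of items 1 and 2, here is a sketch of a stack machine written in Python. It is not real WebAssembly: the `run` helper and the tuple encoding are made up for this example, and only the opcode names are borrowed. The point is that each opcode pops its operands and pushes its result, and that neither the execution stack nor the locals have addresses.

```python
def run(program, local_vars):
    """Evaluate a tiny stack program; only the opcode names mimic WebAssembly."""
    stack = []  # execution stack: holds values only, nothing here has an address
    for op, *args in program:
        if op == "i32.const":
            stack.append(args[0])
        elif op == "local.get":
            stack.append(local_vars[args[0]])
        elif op == "i32.add":
            rhs = stack.pop()           # add pops 2 ...
            lhs = stack.pop()
            stack.append((lhs + rhs) & 0xFFFFFFFF)  # ... and pushes 1 (wrapped to 32 bits)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# local 0 holds 40; the program computes local0 + 2
result = run([("local.get", 0), ("i32.const", 2), ("i32.add",)], local_vars=[40])
```

After the run, `result` holds the single value left on the execution stack.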

Notice that you can't take the address of a value held in 1. or 2. Only when a value's address is actually needed would I expect a code generator to go with 3. How this is done isn't dictated by WebAssembly; it's up to whatever ABI you choose. What Emscripten and other tools do is store the stack pointer at address 4, and then very early in the program they choose a spot where the stack should go. It doesn't always have to be 4, but it's simpler to stick to that ABI, especially if dynamic linking is involved.
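A minimal sketch of that convention, simulated in Python rather than written as real WebAssembly. It assumes the stack pointer lives at address 4 and that the stack grows downward, as it does with LLVM's WebAssembly backend. The helper names, the memory size and the initial value 1024 are made up for illustration.

```python
import struct

SP_ADDR = 4                   # where this ABI keeps the stack pointer
memory = bytearray(65536)     # stand-in for one 64 KiB page of WebAssembly.Memory


def load_i32(addr):
    return struct.unpack_from("<i", memory, addr)[0]


def store_i32(addr, value):
    struct.pack_into("<i", memory, addr, value)


def alloca(size):
    """Function prologue: reserve `size` bytes by moving the stack pointer down."""
    sp = load_i32(SP_ADDR) - size
    store_i32(SP_ADDR, sp)
    return sp


def release(size):
    """Function epilogue: hand the bytes back by moving the stack pointer up."""
    store_i32(SP_ADDR, load_i32(SP_ADDR) + size)


store_i32(SP_ADDR, 1024)      # start-up code picks a spot (arbitrary here)

slot = alloca(16)             # an address-taken local gets a slot in memory
store_i32(slot, 1234)         # write through its address
value = load_i32(slot)        # read it back
release(16)                   # the stack pointer is back at its initial value
```

Values that never have their address taken would not go through `alloca` at all; they would stay on the execution stack or in locals.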

On initial value: that location has to be big enough to hold the whole stack, and the implementation of malloc has to know about it because it can't allocate heap space over it. That's why some tooling allows you to specify a maximum stack size.
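The numbers in the question are at least consistent with the asker's guess. The sketch below assumes the rule is "round the end of the static data up to a multiple of 16, then add N"; that rule is inferred from this one example, not taken from s2wasm's documentation. It also shows why the data string reads "8\00\00\00": the character 8 is byte 0x38, so the string is the little-endian encoding of 56 and only coincidentally looks like N=8.

```python
import struct


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


data_end = 36      # static data in the question ends just before offset 36
stack_size = 8     # --allocate-stack 8

stack_base = align_up(data_end, 16)     # assumed: stack region starts 16-byte aligned
initial_sp = stack_base + stack_size    # assumed: pointer starts at the top of the region

# The data segment placed at offset 4, exactly as written in the wast
segment = b"8\x00\x00\x00"
stored_sp = struct.unpack("<i", segment)[0]
```

Under these assumptions the stack would occupy offsets 48-55 and grow down from 56, with 36-47 being alignment padding.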

Anything can be stored before / after (though after you'd likely have prior stack values). WebAssembly doesn't currently have guard pages, so exhausting the in-memory stack will clobber heap values (unless the code generator also emits stack checks). That's all "memory safe" in that it still can't escape the WebAssembly.Memory, so the browser can't get owned but the developer's own code can totally be owned. A memory-safe language built on top of WebAssembly would have to enforce memory safety within the WebAssembly.Memory.
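To make the failure mode concrete, here is a sketch of what an emitted stack check might do, again simulated in Python. The layout (stack in offsets 48-55, an unrelated value at offset 44 just below it) and all names are hypothetical.

```python
STACK_LIMIT = 48   # lowest address the stack may use (hypothetical layout)
STACK_TOP = 56     # initial stack pointer


class StackOverflow(Exception):
    pass


def reserve(sp, size, checked):
    """Move the stack pointer down by `size`, optionally checking the limit."""
    new_sp = sp - size
    if checked and new_sp < STACK_LIMIT:
        raise StackOverflow(f"need {size} bytes, only {sp - STACK_LIMIT} left")
    return new_sp


memory = bytearray(64)
memory[44] = 0xAB                            # unrelated value living below the stack

sp = reserve(STACK_TOP, 12, checked=False)   # no check: sp silently leaves the stack region
memory[sp] = 0x00                            # the store overwrites the neighbouring value

try:
    reserve(STACK_TOP, 12, checked=True)     # with a check, the same request is rejected
    overflow_caught = False
except StackOverflow:
    overflow_caught = True
```

Both stores stay inside the simulated memory, which mirrors the point above: the overflow damages the program's own data but never leaves the WebAssembly.Memory.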

Note that I haven't explained 1. and 2. in detail. Their existence means that most C++ programs will use less in-memory stack in WebAssembly than the same program would use stack when compiled natively.

4 Comments

Thank you for your detailed answer. What I am trying to do is to create a little runtime myself to run some super basic C code that I compile with cmake -> s2wasm -> wast2wasm. As you noted, malloc needs to know where the stack is so that it doesn't alloc over it. But how does it know that? Is it correct to assume that, when using the binaryen toolchain, the initial stack pointer value points to the maximum offset, exclusive, that the VM should put stack values at - and everything beyond that offset can be used by malloc?
The linker outputs __stack_pointer, and if you specify a stack allocation size it'll also output a relocation called .stack with that value. That's not the only way to go though, if you roll your own I suggest you look at tool-conventions which, among other things, proposes using a global for the stack pointer.
Is it possible to produce such a global with clang/s2wasm/wasm-as alone? I now ended up with int stacktop() { int* ptr; return (int)&ptr + sizeof(int*); }, which is probably as stupid as one would expect from a guy coming to WebAssembly from a JS background. But anyway, thanks for your patience!
I mean "global" in the WebAssembly binary sense. This doesn't have a direct mapping from C++: I expect C++ global variables to be simple heap values unless the compiler can prove that the address never escapes.
