---
title: A Rusty Stack Jump
description: Jumping into a new stack with Rust
date: 2025-02-27
featuredImage:
featuredImageDesc:
tags:
- rust
- asm
- systems
- operating systems
- async
---
import { Notes, PostImage } from "~/components/Markdown";
import { Tree } from "~/components/Tree";
In my quest to learn how to build an async runtime in Rust, I had to learn about CPU context switching. To switch from one async task to another, a runtime built on stackful coroutines has to perform a context switch. This means saving the CPU registers that the System V ABI marks as callee-saved, then loading the registers (including the stack pointer) of the task we are switching to.
In this article, I will show you what I have learned about jumping onto a new stack on an x86_64 CPU.
I'm learning about async runtimes in Rust based on the amazing book [Asynchronous Programming in Rust: Learn asynchronous programming by building working examples of futures, green threads, and runtimes](https://www.packtpub.com/en-mt/product/asynchronous-programming-in-rust-9781805128137).
It's an amazing book, don't get me wrong, but I feel like the explanation can be hand-wavy sometimes. Thus, I write this to archive my own explanation and potentially help other people who also struggle with the subject.
Most async runtimes in Rust do not use stackful coroutines (the model behind
Go's goroutines and Erlang's processes) and instead use state machines to
manage async tasks.
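To give a feel for the state-machine approach, here is a minimal hand-written sketch. The names `Task`, `Step`, and `poll` are hypothetical and only approximate the shape of what the compiler generates for an `async fn` with a single await point; this is not how any particular runtime is implemented.

```rust
// Hypothetical result of resuming a task once.
#[derive(Debug, PartialEq)]
enum Step {
    Pending,
    Ready(u32),
}

// Each variant is one place where the task can be paused.
// Values that must survive a pause live inside the variant, not on a stack.
enum Task {
    Start,
    Waiting { partial: u32 },
    Done,
}

impl Task {
    // Each call resumes from the stored state instead of from a saved stack.
    fn poll(&mut self) -> Step {
        match *self {
            Task::Start => {
                *self = Task::Waiting { partial: 20 };
                Step::Pending
            }
            Task::Waiting { partial } => {
                *self = Task::Done;
                Step::Ready(partial + 22)
            }
            Task::Done => panic!("polled after completion"),
        }
    }
}

fn main() {
    let mut task = Task::Start;
    assert_eq!(task.poll(), Step::Pending);
    assert_eq!(task.poll(), Step::Ready(42));
}
```

Because everything the task needs is stored in the enum, no separate stack has to be swapped in when it resumes. Stackful coroutines, the subject of this article, keep that state on a per-task stack instead.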
## Contents
## Setting the stage
Why do we need to swap the stack of async tasks in a runtime with stackful coroutines?
Async tasks, by nature, are paused and resumed. Every time a task is paused so that another task can run, we have to save the context of the task that is currently running and load the context of the upcoming task.
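The save-then-load idea can be sketched in safe Rust before touching any assembly. This is a simulation under stated assumptions: `Cpu`, `Task`, and `switch` are hypothetical stand-ins, no real registers are read or written, and the `Context` fields simply list the registers the System V x86-64 ABI treats as callee-saved.

```rust
// One slot per callee-saved register in the System V x86-64 ABI.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
#[repr(C)]
struct Context {
    rsp: u64,
    r15: u64,
    r14: u64,
    r13: u64,
    r12: u64,
    rbx: u64,
    rbp: u64,
}

// Hypothetical stand-in for the live register file.
#[derive(Debug, Default)]
struct Cpu {
    regs: Context,
}

// Hypothetical task: all we keep is its saved context.
struct Task {
    saved: Context,
}

// Pause `from` by saving the live registers into it,
// then resume `to` by loading its saved registers.
fn switch(cpu: &mut Cpu, from: &mut Task, to: &Task) {
    from.saved = cpu.regs;
    cpu.regs = to.saved;
}

fn main() {
    let mut cpu = Cpu { regs: Context { rsp: 0x1000, ..Default::default() } };
    let mut a = Task { saved: Context::default() };
    let mut b = Task { saved: Context { rsp: 0x2000, ..Default::default() } };

    switch(&mut cpu, &mut a, &b); // pause a, resume b
    println!("running b with rsp = {:#x}", cpu.regs.rsp);

    cpu.regs.rsp -= 0x20; // b uses some of its stack while it runs
    switch(&mut cpu, &mut b, &a); // pause b, resume a
    println!("running a with rsp = {:#x}", cpu.regs.rsp);
    println!("b will resume at rsp = {:#x}", b.saved.rsp);
}
```

A real switch does the same two steps with `mov` instructions on the actual registers, which is why it needs inline assembly.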
## Jumping into the new stack
Here is the code in its entirety. I'd recommend running it on the [Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024). I have left comments throughout the code so you can get the general idea.
Note that you have to manually stop the process.
```rust lang="rust" file="stack_swap.rs"
use core::arch::asm;
// stack size of 48 bytes so it's easy to print the stack before we switch contexts
const SSIZE: isize = 48;
// a struct that represents our CPU state
//
// This struct stores the stack pointer
#[derive(Debug, Default)]
#[repr(C)]
struct ThreadContext {
rsp: u64,
}
// Returning ! (the never type) means this function never returns:
// it either panics or runs forever
fn hello() -> ! {
println!("I LOVE WAKING UP ON A NEW STACK!");
loop {}
}
// new is a pointer to a ThreadContext
unsafe fn gt_switch(new: *const ThreadContext) {
// inline assembly
asm!(
"mov rsp, [{0} + 0x00]", // load the rsp field (offset 0x00 of the ThreadContext that `new` points to) into the rsp register
"ret", // ret pops the return address off our custom stack and jumps to it; in our example, that is the address of hello
in(reg) new,
);
}
fn main() {
// initialize
let mut ctx = ThreadContext::default();
// allocate the memory for our new stack
// e.g. assume the allocation starts at address 0x10
let mut stack = vec![0_u8; SSIZE as usize];
unsafe {
// we get the bottom of the stack
// remember that the stack grows downward, from high memory addresses to low memory addresses
// e.g. 0x40, because 0x10 + 0x30 = 0x40 and 0x30 is SSIZE (48) in hex
// NOTE: offset() counts in units of the size of the type that the pointer points to
// in our case, stack is a pointer to u8 (a byte), so offset(SSIZE) == offset(48 bytes) == offset(0x30)
let stack_bottom = stack.as_mut_ptr().offset(SSIZE);
// we align the bottom of the stack to a 16-byte boundary
// the System V ABI expects this, and some CPU instructions (e.g. aligned SSE loads and stores) fault on misaligned memory
// The technicality: 15 is 0b1111, so (stack_bottom AND !15) zeroes out the lowest 4 bits
//
// we also want the aligned bottom of the stack to be a pointer to a byte (8 bits, or u8)
let sb_aligned = (stack_bottom as usize & !15) as *mut u8;
// Here, we write the address of the hello function as 64 bits (8 bytes)
// Remember that 16 bytes = 0x10 in hex
// So we go DOWN 0x10 (16) memory addresses, e.g. from 0x40 to 0x30
// NOTE: we go 16 bytes down (0x10) even though the hello function pointer is ONLY 8 bytes
// This is because the System V ABI requires the stack pointer to stay 16-byte aligned
std::ptr::write(sb_aligned.offset(-16) as *mut u64, hello as u64);
// we write the stack pointer into the rsp inside context
ctx.rsp = sb_aligned.offset(-16) as u64;
for i in 0..SSIZE {
println!("mem: {}, val: {}",
sb_aligned.offset(-i as isize) as usize,
*sb_aligned.offset(-i as isize))
};
// we switch to the new stack
// gt_switch loads our saved stack pointer into the CPU's rsp register
// and `ret` then pops the address of hello off the new stack and jumps to it
gt_switch(&mut ctx);
}
}
```
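The alignment and offset arithmetic from the comments can be checked in isolation, without any unsafe code. In this sketch, `align_down_16` and `entry_slot` are hypothetical helper names, and the addresses are the made-up ones used in the comments (a buffer starting at `0x10`), not addresses a real allocator would return.

```rust
// Round an address down to a multiple of 16 by clearing its lowest 4 bits.
fn align_down_16(addr: usize) -> usize {
    addr & !15
}

// Hypothetical helper mirroring the listing: given where the buffer starts
// and its size in bytes, compute the slot that holds the return address.
fn entry_slot(stack_start: usize, size: usize) -> usize {
    let bottom = stack_start + size; // one past the highest byte of the buffer
    align_down_16(bottom) - 16
}

fn main() {
    // Addresses that are already aligned are unchanged.
    assert_eq!(align_down_16(0x40), 0x40);
    // Anything from 0x41 to 0x4f rounds down to 0x40.
    assert_eq!(align_down_16(0x47), 0x40);
    assert_eq!(align_down_16(0x4f), 0x40);

    // The example from the comments: buffer at 0x10, 48 (0x30) bytes long.
    assert_eq!(entry_slot(0x10, 48), 0x30);
    println!("entry slot: {:#x}", entry_slot(0x10, 48));
}
```

The value returned by `entry_slot` corresponds to what the listing stores in `ctx.rsp`: the address where `hello` was written, which is exactly where `ret` looks for its return address.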