---
title: A Rusty Stack Jump
description: Jumping into a new stack with Rust
date: 2025-02-27
featuredImage:
featuredImageDesc:
tags:
  - rust
  - asm
  - systems
  - operating systems
  - async
---

import { Notes, PostImage } from "~/components/Markdown";
import { Tree } from "~/components/Tree";

In my quest to learn how to build an async runtime in Rust, I needed to learn about CPU context switching. In a runtime built on stackful coroutines, switching from one async task to another means performing a context switch: saving the CPU registers that the System V ABI marks as *callee-saved* for the current task, then loading the saved registers of the next task, including a stack pointer that points into that task's own stack.

In this article, I will show you what I have learned about jumping onto a new stack on an x86_64 CPU.

<Notes>
I'm learning about async runtimes in Rust based on the amazing book [Asynchronous Programming in Rust: Learn asynchronous programming by building working examples of futures, green threads, and runtimes](https://www.packtpub.com/en-mt/product/asynchronous-programming-in-rust-9781805128137)

It's an amazing book, don't get me wrong, but I feel the explanations can be hand-wavy at times. So I am writing this to record my own understanding and, hopefully, to help others who also struggle with the subject.

</Notes>

<Notes>
  Most async runtimes in Rust do not use stackful coroutines (the model behind
  Go's goroutines and Erlang's processes). Instead, they use state machines to
  manage async tasks.
</Notes>
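To make that contrast concrete, here is a hand-written future in the state-machine style. It is only a sketch: `TwoStep` and `NoopWake` are made-up names, and the futures the compiler actually generates from `async fn` are more involved. The point is that everything the task needs in order to resume lives inside the enum, so resuming it is an ordinary function call (`poll`) on the current stack rather than a stack switch.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// Hypothetical stand-in for a compiler-generated future that suspends twice.
// The state a stackful coroutine would keep on its own stack is stored in
// the enum instead.
enum TwoStep {
    Start,
    Waiting { polls_left: u32 },
    Done,
}

impl Future for TwoStep {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // TwoStep holds no self-references, so it is Unpin and we can use a plain &mut.
        let this = self.get_mut();
        loop {
            match *this {
                TwoStep::Start => *this = TwoStep::Waiting { polls_left: 2 },
                TwoStep::Waiting { polls_left: 0 } => {
                    *this = TwoStep::Done;
                    return Poll::Ready("done");
                }
                TwoStep::Waiting { ref mut polls_left } => {
                    *polls_left -= 1;
                    // Ask to be polled again, like a task that yields.
                    cx.waker().wake_by_ref();
                    return Poll::Pending;
                }
                TwoStep::Done => panic!("polled after completion"),
            }
        }
    }
}

// A waker that does nothing; enough to drive the future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn main() {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = TwoStep::Start;
    loop {
        match Pin::new(&mut fut).poll(&mut cx) {
            Poll::Pending => println!("pending"),
            Poll::Ready(v) => {
                println!("ready: {v}");
                break;
            }
        }
    }
}
```

Running it prints `pending` twice and then `ready: done`.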

## Contents

<hr />

## Setting the stage

Why do we need to swap stacks between async tasks in a runtime with stackful coroutines?

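Async tasks, by nature, are paused and resumed. Every time a task is paused so that another task can run, the runtime has to save the context of the running task and load the context of the task that runs next. In a stackful runtime each task owns its own stack, so part of that context is the stack pointer: pointing `rsp` at a different stack is what actually moves execution from one task to another.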
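As a sketch of what that saved context could hold on x86_64 under the System V ABI, the struct below has a slot for `rsp` plus each callee-saved general-purpose register (`rbx`, `rbp`, `r12` to `r15`). The name `FullThreadContext` and the field order are my own assumptions for illustration; the stack-jump example in this post only stores `rsp`.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical full context: the stack pointer plus every general-purpose
/// register the System V AMD64 ABI marks as callee-saved. A context switch
/// would store the running task's registers here and load the next task's.
#[derive(Debug, Default)]
#[repr(C)]
struct FullThreadContext {
    rsp: u64,
    r15: u64,
    r14: u64,
    r13: u64,
    r12: u64,
    rbx: u64,
    rbp: u64,
}

fn main() {
    // With #[repr(C)] and all-u64 fields, the fields are laid out in
    // declaration order, 8 bytes apart, with no padding. That lets
    // assembly address them at fixed offsets, e.g. `mov [rdi + 0x08], r15`.
    println!("size = {}", size_of::<FullThreadContext>());
    println!("rsp at {:#04x}", offset_of!(FullThreadContext, rsp));
    println!("r15 at {:#04x}", offset_of!(FullThreadContext, r15));
    println!("rbp at {:#04x}", offset_of!(FullThreadContext, rbp));
}
```

It prints a size of 56 and offsets `0x00`, `0x08`, and `0x30`. The `#[repr(C)]` attribute matters here: without it, Rust is free to reorder fields, and hard-coded offsets in assembly would silently point at the wrong register.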

## Jumping into the new stack

Here is the code in its entirety. I'd recommend running it on the [Rust Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2024). I have left comments throughout the code so you can follow the general idea.

Note that `hello` never returns (it spins in `loop {}`), so you have to stop the process manually.

```rust file="stack_swap.rs"
use core::arch::asm;

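// Stack size of 48 bytes, small enough to print the whole stack before we switch contexts.
// WARNING: 48 bytes is far too little for `hello` (`println!` alone needs much more), so once
// we jump, `hello` grows the stack below this buffer into neighboring heap memory. That is
// undefined behavior that merely happens to work on the Playground; a real runtime would
// allocate a stack measured in kilobytes or megabytes.
const SSIZE: isize = 48;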

// A struct that represents our saved CPU state.
//
// For now it only stores the stack pointer.
#[derive(Debug, Default)]
#[repr(C)]
struct ThreadContext {
    rsp: u64,
}

// The `!` (never) return type means this function never returns:
// it either panics or runs forever.
fn hello() -> ! {
    println!("I LOVE WAKING UP ON A NEW STACK!");
    loop {}
}

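// `new` is a pointer to the ThreadContext we want to switch to.
unsafe fn gt_switch(new: *const ThreadContext) {
    // Edition 2024 warns about unsafe operations inside an `unsafe fn`
    // that are not wrapped in an explicit `unsafe` block.
    unsafe {
        // inline assembly
        asm!(
            // Load the value stored at `new` (i.e. `ctx.rsp`) into the rsp register.
            "mov rsp, [{0} + 0x00]",
            // `ret` pops the return address off our custom stack (here, the address of `hello`) and jumps to it.
            "ret",
            in(reg) new,
            // Control never comes back here.
            options(noreturn),
        );
    }
}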

fn main() {
    // The context we will switch into.
    let mut ctx = ThreadContext::default();

    // Allocate the memory for our new stack.
    // For the comments below, pretend the buffer starts at address 0x10.
    let mut stack = vec![0_u8; SSIZE as usize];

    unsafe {
        // Get the "bottom" of the stack, i.e. its highest address. The stack grows downward,
        // from high memory addresses to low ones, so this is where it starts.
        // e.g. 0x40, because 0x10 + 0x30 = 0x40 and 0x30 == 48 == SSIZE.
        // NOTE: offset() counts in units of the pointee type. `stack.as_mut_ptr()` is a
        // `*mut u8` (one byte), so offset(SSIZE) moves 48 bytes (0x30).
        let stack_bottom = stack.as_mut_ptr().offset(SSIZE);

        // Round the bottom of the stack down to a 16-byte boundary. The System V ABI requires
        // 16-byte stack alignment, and some instructions (e.g. SSE's `movaps`) fault on
        // misaligned memory operands, so this is about correctness, not just performance.
        //
        // The trick: 15 is 0b1111, so `stack_bottom & !15` clears the lowest 4 bits, rounding
        // down (toward lower addresses) so we stay inside the buffer.
        // We cast back to `*mut u8` so that offset() keeps counting in bytes.
        let sb_aligned = (stack_bottom as usize & !15) as *mut u8;

        // Write the address of `hello` as a 64-bit (8-byte) value.
        // 16 bytes is 0x10 in hex, so we go DOWN 16 addresses, e.g. from 0x40 to 0x30.
        // NOTE: we go down 16 bytes even though the function pointer is ONLY 8 bytes.
        // The System V ABI expects rsp + 8 to be a multiple of 16 when a function is entered;
        // once `ret` pops these 8 bytes, rsp is sb_aligned - 8, which satisfies that.
        std::ptr::write(sb_aligned.offset(-16) as *mut u64, hello as u64);

        // Save the address of that slot as the new stack pointer in our context.
        ctx.rsp = sb_aligned.offset(-16) as u64;

        // Print every byte of the buffer, from the highest address down, so the 8 bytes of
        // `hello`'s address are visible. Iterating over the Vec keeps every read in bounds.
        for (i, byte) in stack.iter().enumerate().rev() {
            println!("mem: {:#x}, val: {}", stack.as_ptr() as usize + i, byte);
        }

        // Switch: gt_switch loads ctx.rsp into the CPU's stack pointer,
        // then `ret` pops `hello`'s address off the new stack and jumps to it.
        gt_switch(&ctx);
    }
}
```
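To double-check the alignment reasoning without running any assembly, the pointer arithmetic from `main` can be replayed on plain integers. This is a sketch: `stack_layout` is a hypothetical helper, `0x10` is the example start address used in the comments above, and `0x1003` is an arbitrary unaligned start chosen to show the rounding.

```rust
// Replays the pointer arithmetic from `main` on plain integers, so no memory
// is touched and no assembly runs.
const SSIZE: usize = 48;

/// Returns (sb_aligned, slot, rsp_after_ret) for a buffer starting at `buf_start`.
fn stack_layout(buf_start: usize) -> (usize, usize, usize) {
    let stack_bottom = buf_start + SSIZE; // one past the last byte of the buffer
    let sb_aligned = stack_bottom & !15; // round DOWN to a multiple of 16
    let slot = sb_aligned - 16; // where `hello`'s address gets written
    let rsp_after_ret = slot + 8; // `ret` pops 8 bytes off the stack
    (sb_aligned, slot, rsp_after_ret)
}

fn main() {
    for start in [0x10_usize, 0x1003] {
        let (sb_aligned, slot, rsp) = stack_layout(start);
        let rem = rsp % 16;
        println!("buf {start:#x}: sb_aligned {sb_aligned:#x}, slot {slot:#x}, rsp after ret {rsp:#x} (rsp % 16 = {rem})");
        // The 8-byte slot must lie entirely inside the buffer.
        assert!(slot >= start && slot + 8 <= start + SSIZE);
        // System V ABI: on function entry, rsp + 8 is a multiple of 16.
        assert_eq!((rsp + 8) % 16, 0);
    }
}
```

For the `0x10` case it prints `sb_aligned 0x40, slot 0x30, rsp after ret 0x38`, matching the walk-through in the comments. In both cases `rsp % 16` is 8 when `hello` starts, which is exactly what the ABI expects right after a `call` has pushed a return address.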