Nested async/await in Rust: Desugaring and assembly

This article is a continuation of the previous article on the desugaring and assembly of async/await in Rust. In this article, we will look at the desugaring and assembly code of nested async/await in Rust. We will also look at how infinite loops are handled in async functions.

The patrol function

Our primary focus will be on the desugaring of the patrol function, which is a simple function that moves a unit between two positions. The patrol function is defined as follows:

async fn patrol(unit: UnitRef, poses: [i32; 2]) {
    loop {
        goto(unit.clone(), poses[0]).await;
        goto(unit.clone(), poses[1]).await;
    }
}

patrol and goto functions

Here is the complete code for the patrol and goto functions and the Unit struct. We looked at the goto function in the previous async/await article. Here we will explore the patrol function in detail.

#[derive(Default)]
struct Unit {
    /// The 1-D position of the unit. The unit can only move along this axis.
    pub pos: i32,
}
type UnitRef = Rc<RefCell<Unit>>;

/// A future that will move the unit towards `target_pos` at each step,
/// and complete when the unit has reached that position.
struct UnitGotoFuture {
    unit: UnitRef,
    target_pos: i32,
}
impl Future for UnitGotoFuture {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context) -> Poll<Self::Output> {
        let unit_pos = self.unit.borrow().pos;
        if unit_pos == self.target_pos {
            Poll::Ready(())
        } else {
            self.unit.borrow_mut().pos += (self.target_pos - unit_pos).signum();
            Poll::Pending
        }
    }
}

/// Helper async function to write unit behavior nicely
async fn goto(unit: UnitRef, pos: i32) {
    UnitGotoFuture {
        unit,
        target_pos: pos,
    }
    .await;
}

/// Let a unit go back and forth between two positions
async fn patrol(unit: UnitRef, poses: [i32; 2]) {
    loop {
        goto(unit.clone(), poses[0]).await;
        goto(unit.clone(), poses[1]).await;
    }
}

The above Rust code defines a Unit struct that contains a single field, pos, which represents the position of the unit.

The code also defines a UnitGotoFuture struct, which represents a future that moves a Unit towards a target position at each step and completes when the unit has reached that position. The UnitGotoFuture struct implements the Future trait, and its poll method is used to move the unit closer to the target_pos position by adjusting the pos field of the Unit instance. The poll method returns Poll::Pending if the unit has not yet reached the target_pos position or Poll::Ready(()) if it has.

The code also defines two async functions, goto and patrol. The goto function takes a UnitRef and a pos parameter and returns a future that awaits a new instance of the UnitGotoFuture struct. The patrol function and loops infinitely, moving the unit between the two positions in poses using the goto function.

Desugaring the patrol function

Before we proceed to understand the generated assembly, let's first understand how the patrol function could be desugared into a state machine. The patrol function is desugared into a state machine that resembles the following code:

// The state machine enum
enum PatrolFuture {
    // Initial state
    Start(UnitRef, [i32; 2]),
    // Waiting for goto to complete
    WaitingToReachPosition0(UnitRef, [i32; 2], impl Future<Output = ()>),
    WaitingToReachPosition1(UnitRef, [i32; 2], impl Future<Output = ()>),
    // Final state (which is unreachable)
    Done,
}

// Implementing Future for PatrolFuture
impl Future for PatrolFuture {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        loop {
            match *self {
                // In the start state, create a goto future and move to the waiting state
                PatrolFuture::Start(unit, poses) => {
                    let fut = goto(unit.clone(), poses[0]);
                    *self = PatrolFuture::WaitingToReachPosition0(unit, poses, fut);
                }
                // In the waiting state for goto1, poll the goto future and move to the next waiting state if it's ready
                PatrolFuture::WaitingToReachPosition0(unit, poses, ref mut fut) => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Ready(()) => {
                            let fut = goto(unit.clone(), poses[1]);
                            *self = PatrolFuture::WaitingToReachPosition1(unit, poses, fut);
                        }
                        Poll::Pending => return Poll::Pending,
                    }
                }
                // In the waiting state for goto2, poll the goto future and move back to the start state if it's ready
                PatrolFuture::WaitingToReachPosition1(unit, poses, ref mut fut) => {
                    match Pin::new(fut).poll(cx) {
                        Poll::Ready(()) => *self = PatrolFuture::Start(unit.clone(), poses),
                        Poll::Pending => return Poll::Pending,
                    }
                }
                // In the done state (which is unreachable), return ready
                PatrolFuture::Done => return Poll::Ready(()),
            }
        }
    }
}

// The original async function is equivalent to creating a new PatrolFuture instance
fn patrol(unit: UnitRef, pos: [i32; 2]) -> impl Future<Output = ()> {
    PatrolFuture::Start(unit.clone(), pos)
}

We see in the above code that the patrol function is desugared into a state machine that consists of three key states: Start, WaitingToReachPosition0, and WaitingToReachPosition1. The Start state is the initial state of the state machine, and it creates a new goto future and moves to the WaitingToReachPosition0 state. The WaitingToReachPosition0 state polls the goto future and moves to the WaitingToReachPosition1 state if the future is ready. The WaitingToReachPosition1 state polls the goto future and moves back to the Start state if the future is ready. The moving back to Start implements the infinite loop behavior of the patrol function.

Rust async under the hood

Closures and state machines

We have looked at the desugared code for the patrol function. The generated assembly of the async code uses closures to implement the state machine. We explored the mapping of async functions to closures when we looked at the goto in the previous async deep dive.

The patrol function is converted into a closure that takes two inputs: the closure environment and the task context. The closure environment is a struct that holds the state of the state machine. The task context is a struct that holds the state of the task that is executing the future. The closure returns a Poll value, which can be either Poll::Pending or Poll::Ready. The Poll::Pending value indicates that the future is not ready, and the Poll::Ready value indicates that the future is ready.

The following diagram shows the closure environment for the patrol function. The environment is a struct that contains the local variables of the patrol closure (highlighted in green).The environment also contains the closure environment for the goto closure that will be invoked from the patrol closure. The environment is passed to the closure as a mutable reference.

Async closure environment

The second input to the closure is the task context that is used to schedule waking up the task when the future is ready.

Async context

The closure also keeps track of the state of the state machine. The state is stored in the state field of the closure environment. The state field is updated whenever the state machine moves to a new state. Just as we saw with the desugared code, the state machine starts in the Start state and moves to the WaitingToReachPosition0 state when the goto future is created. The state machine then moves to the WaitingToReachPosition1 state when the goto future is ready. The state machine then moves back to the Start state when the goto future is ready. The move back to Start implements the infinite loop behavior of the patrol function.

Async closure state machine

Role of the async runtime and infinite async loops

This is a sequence diagram that shows an example of interaction between three participants: an executor, a patrol closure, and a goto closure. The executor is the async runtime that executes the future.

In the first loop iteration, the executor sends a poll message to the patrol closure, which moves to the WaitingToReachPosition0 state. The patrol closure then sends a poll message to the goto closure, which moves the unit to the first position. The goto closure returns a Poll::Pending message to the patrol closure, which in turn returns a Poll::Pending message to the executor. The process is repeated until the unit reaches the first position, at which point the goto closure returns a Poll::Ready message to the patrol closure, which then sends a poll message to the goto closure to move the unit to the second position. The goto closure returns a Poll::Pending message to the patrol closure, which in turn returns a Poll::Pending message to the executor. The process is repeated until the unit reaches the second position, at which point the goto closure returns a Poll::Ready message to the patrol closure, which then goes back to the Start state.

In the second loop iteration, the patrol closure sends a poll message to the goto closure to move the unit to the first position, and the sequence repeats. This implements the infinite loop behavior of the patrol function. The point to note here is that the executor does not need to know anything about the infinite loop behavior of the patrol function. The executor simply sends a poll message to the patrol closure, and the patrol closure handles the infinite loop behavior.

Async closure sequence diagram

Flow chart of the patrol function

The following flow chart shows the overall execution flow of the patrol closure. On being called, the function uses the closure state to determine the next action to take.

In the πŸš€ Start (0) state, the function prepares the goto closure environment and invokes the goto closure. If the goto closure returns Poll::Ready, the function invokes the goto closure again. If both the goto closures return Poll::Ready, the function sets the state to πŸš€ Start (0) and jumps to continue the infinite loop. If any of the goto closures returns Poll::Pending, the function returns Poll::Pending after setting the state to πŸ•“ WaitingToReachPosition0 (3) or πŸ•ž WaitingToReachPosition1 (4).

The handling in the πŸ•“ WaitingToReachPosition0 (3) and πŸ•ž WaitingToReachPosition1 (4) states is a subset of the processing in the πŸš€ Start (0).

Async closure flow chart

The following flow chart shows how the goto closure is invoked. The steps are:

  1. The memory needed for the goto closure is allocated along with the patrol closure environment.
  2. The goto closure environment is initialized with the unit and the target position.
  3. The goto closure's starting state is set to πŸš€ Goto::Start (0).
  4. The goto closure is invoked.

Preparing a goto future

The handling of the goto closure is covered in the article Rust async under the hood: goto closure.

Deep dive into the generated assembly

We now understand the high-level flow of the async code. Let's take a look at the generated assembly code.

Input and output registers

This assembly code is generated from a patrol closure that takes two inputs: the closure environment stored in rdi, and the task context stored in rsi. The output is stored in rax and can be either Poll::Pending or Poll::Ready.

The code starts by saving the callee-saved registers and then moving the task context and closure environment to r14 and rbx, respectively. The state is then loaded from the closure environment, and a jump table is used to determine which block of code to execute based on the state.

State machine code blocks

The first block of code (state 0 - πŸš€ Start) is executed before the first future is polled. The second block of code (state 1) throws an exception, and the third block of code (state 2) panics. The fourth block of code (state 3 - πŸ•“ WaitingToReachPosition0) waits for the first future to resolve, and the fifth block of code (state 4 - πŸ•ž WaitingToReachPosition1) waits for the second future to resolve.

Async jump table

The switching between different blocks is achieved using a jump table that is indexed by the async functions state in the closure environment. The jump table is generated by the compiler. The patrol function's jump table looks like the following:

.LJTI58_0:
	.long	.LBB58_1-.LJTI58_0 	; πŸš€ Start (0): Code executed before the first future is polled.
	.long	.LBB58_3-.LJTI58_0 	; πŸ›‘ State 1: Throw exception.
	.long	.LBB58_2-.LJTI58_0 	; πŸ›‘ State 2: Panic.
	.long	.LBB58_6-.LJTI58_0 	; πŸ•“ WaitingToReachPosition0 (3): Waiting for poses[0]
	.long	.LBB58_16-.LJTI58_0 ; πŸ•ž WaitingToReachPosition1 (4): Waiting for poses[1]

States πŸ•“WaitingToReachPosition0 and πŸ•žWaitingToReachPosition1

When state 3 or 4 is executed, the reference count for the captured unit is incremented. If the reference count is zero, the code jumps to undefined behavior. Otherwise, UnitGotoFuture is prepared, and the address of the goto closure environment is stored in the closure environment. The goto closure is then called, and if the future is not ready, the state is saved, and the function returns. If the future is ready, the goto closure's state is checked, and the reference count for the captured unit is decremented accordingly. If the reference count is zero, the unit is deallocated.

The updated state is stored in the closure environment, and the function returns with the updated state and either Poll::Pending or Poll::Ready in rax.

Annotated patrol closure assembly code

Now we are ready to dig into the generated assembly code. The assembly has been annotated with comments to map it to the Rust code.

; Input:
;   rdi: The closure environment
;   rsi: The task_context
; Output:
;   rax: Poll::Pending or Poll::Ready

playground::patrol::{{closure}}:
	push	r15 ; Save the callee-saved registers
	push	r14 
	push	rbx 
	mov	r14, rsi ; Store the task_context in r14. The task_context is passed as the second argument to the closure.
	mov	rbx, rdi ; Store the closure environment in rbx. The environment is passed as the first argument to the closure.
	movzx	eax, byte ptr [rdi + 32] ; Load the state to determine which block to execute. The state is stored in the 32nd byte of the closure environment.
	lea	rcx, [rip + .LJTI58_0] ; Load the address of the jump table rcx. The jump table is a list of offsets from the start of the jump table to each block.
	movsxd	rax, dword ptr [rcx + 4*rax] ; Get the jump offset from the entry corresponding to the state. The index in rax is multiplied by 4 because the jump table is an array of 32-bit jump offsets indexed by the state.

	add	rax, rcx ; Add the jump offset to the jump table to determine the address of the block to execute.
	jmp	rax ; Jump to the correct block. 

.LBB58_1:
; πŸš€ Start (0): Code executed before the first future is polled.
; At this point rbx contains the closure environment and r14 contains the task_context.
; Block 0: Code executed when the future is first polled.
	mov	rcx, qword ptr [rbx] ; Load the captured poses array from the closure environment.
	mov	rax, qword ptr [rbx + 24] ; Load the captured Rc<RefCell<Unit>> from the closure environment.
	mov	qword ptr [rbx + 8], rax ; Store the captured Rc<RefCell<Unit>> in a local variable saved in the closure environment.
	mov	qword ptr [rbx + 16], rcx ; Store the captured poses array in a local variable saved in the closure environment.
	jmp	.LBB58_4 ; Jump to the next block.

.LBB58_2:
; πŸ›‘ State 2: Panic.
	lea	rdi, [rip + str.1]
	lea	rdx, [rip + .L__unnamed_24]
	mov	esi, 34
	call	qword ptr [rip + core::panicking::panic@GOTPCREL]

.LBB58_3:
; πŸ›‘ State 1: Throw exception.
	ud2

.LBB58_4:

; == πŸš€ Start (0): Getting ready to wait for poses[0] ==

	; At this point rbx contains the closure environment and rax points to Rc<RefCell<Unit>>.
	; Increment the Rc<RefCell<Unit>> reference count
	inc	qword ptr [rax] ; Increment the strong reference count for Rc<RefCell<Unit>>. The strong reference count is stored in the first 8 bytes of the Rc<RefCell<Unit>>.
	je	.LBB58_28 ; The reference count is 0, jump to undefined behavior. This should never happen.
	mov	ecx, dword ptr [rbx + 16] ; Load the poses[0] from the closure environment
	; Clone the Rc<RefCell<Unit>> to get a new reference to the Unit stored in the closure environment

	; Prepare UnitGotoFuture. UnitGotoFuture is a future that will wait for the Unit to reach the specified position.
	mov	qword ptr [rbx + 56], rax ; Copy the Rc<RefCell<Unit>> to the goto::{{closure}} environment
	mov	dword ptr [rbx + 64], ecx ; Save the poses[0] to the goto::{{closure}} environment
	mov	byte ptr [rbx + 68], 0; Set the initial state for the goto::{{closure}} to πŸš€ Goto::Start (0)

.LBB58_6:

; == πŸ•“ WaitingToReachPosition0 (3): Waiting for poses[0] ==

	lea	r15, [rbx + 40] ; Load the address of the environment for the goto::{{closure}}
	mov	rdi, r15 ; Load the address of the environment to the first argument register
	mov	rsi, r14 ; Load the task_context to the second argument register
	call	playground::goto::{{closure}} ; Call the goto::{{closure}} closure
	test	al, al ; Check if the future is ready
	jne	.LBB58_25 ; The future is not ready, save state and return
	; The future is ready, continue executing the patrol::{{closure}} closure
	movzx	eax, byte ptr [rbx + 68] ; Load the state of the goto::{{closure}} closure
	test	eax, eax ; Check if the goto's state is 0
	je	.LBB58_11 ; Goto's state is 0, proceed to the next goto closure after decrementing the reference count for goto's unit at offset 56 in patrol::{{closure}}'s environment
	cmp	eax, 3 ; Check if the goto's state is 3
	jne	.LBB58_14 ; The goto's state is not 0 or 3, proceed to the next goto closure without decrementing the reference count for goto's unit
	; The goto's state is 3, decrement the reference count using the goto::{{closure}} environment's unit at offset 40 in patrol::{{closure}}'s environment
	mov	rdi, qword ptr [r15] ; Load the Rc<RefCell<Unit>> from the goto::{{closure}} environment
	dec	qword ptr [rdi] ; Decrement the reference count for unit
	jne	.LBB58_14 ; The reference count is not 0, do not deallocate the unit at offset 40 in patrol::{{closure}}'s environment
	jmp	.LBB58_12

.LBB58_11:
	mov	rdi, qword ptr [rbx + 56] ; Load Rc<RefCell<Unit>> from the environment
	dec	qword ptr [rdi] ; Decrement the reference count for unit
	jne	.LBB58_14 ; The reference count is not 0, do not deallocate the unit

.LBB58_12:
	dec	qword ptr [rdi + 8] ; Decrement the reference count for unit
	jne	.LBB58_14 ; The reference count is not 0, jump to the next block
	mov	esi, 32 ; Load the size of the allocation
	mov	edx, 8 ; Load the alignment of the allocation
	call	qword ptr [rip + __rust_dealloc@GOTPCREL] ; Deallocate the allocation

.LBB58_14:
	mov	rax, qword ptr [rbx + 8] ; Load the Rc<RefCell<Unit>> from the environment
	inc	qword ptr [rax] ; Increment the reference count for unit
	je	.LBB58_28 ; The reference count is 0, jump to undefined behavior. This should never happen.
	; Prepare to wait for goto::{{closure}} environment
	mov	ecx, dword ptr [rbx + 20] ; Load poses[1] from the environment
	mov	qword ptr [rbx + 56], rax ; Store the Rc<RefCell<Unit>> in the goto::{{closure}} environment
	mov	dword ptr [rbx + 64], ecx ; Store the poses[1] in the goto::{{closure}} environment
	mov	byte ptr [rbx + 68], 0 ; Store the goto closure's initial state - πŸš€ Goto::Start (0)

.LBB58_16:

; == πŸ•ž WaitingToReachPosition1 (4): Waiting for poses[1] ==

	lea	r15, [rbx + 40] ; Get the address of the goto closure environment
	mov	rdi, r15 ; Load the address of the goto closure environment to the first argument register
	mov	rsi, r14 ; Load the task_context to the second argument register
	call	playground::goto::{{closure}} ; Call the goto closure
	test	al, al ; Check if the future is ready
	jne	.LBB58_26 ; The future is not ready yet, save state and return

	movzx	eax, byte ptr [rbx + 68] ; Load the goto closure state
	test	eax, eax ; Check if the state is 0
	je	.LBB58_21 ; The state is 0, jump to the next block
	cmp	eax, 3 ; Check if goto's state is 3
	jne	.LBB58_24 ; The state is not 3, jump to the next block
	mov	rdi, qword ptr [r15] ; Load the goto closure environment unit
	dec	qword ptr [rdi] ; Decrement the reference count for unit
	jne	.LBB58_24 ; The reference count is not 0, jump to the next block
	jmp	.LBB58_22 ; The reference count is 0, jump to the next block

.LBB58_21:
	mov	rdi, qword ptr [rbx + 56] ; Load Rc<RefCell<Unit>> from the environment
	dec	qword ptr [rdi] ; Decrement the strong reference count for unit
	jne	.LBB58_24 ; The reference count is not 0, skip deallocating the unit

.LBB58_22:
	; Strong reference count is 0, decrement the weak reference count
	dec	qword ptr [rdi + 8] ; Decrement the weak reference count for unit
	jne	.LBB58_24 ; The weak reference count is not 0, jump to the next block
	; The weak reference count is 0, deallocate the unit
	mov	esi, 32 ; Size of unit
	mov	edx, 8 ; Alignment of unit
	call	qword ptr [rip + __rust_dealloc@GOTPCREL] ; Deallocate the unit

.LBB58_24:
	mov	rax, qword ptr [rbx + 8] ; Load the Rc<RefCell<Unit>> from the environment into rax
	jmp	.LBB58_4 ; Jump to the beginning of the loop to the code handling πŸš€ Start (0)
	; Note that the compiler does not explicitly update the state variable to 0, 
	; it just jumps to the beginning of the loop. 

.LBB58_25:
	mov	al, 3 ; Set the state to "πŸ•“ WaitingToReachPosition0 (3)" for the next call to the function.
	jmp	.LBB58_27 ; Jump to return

.LBB58_26:
	mov	al, 4 ; Set the state to "πŸ•ž WaitingToReachPosition1 (4)" for the next call to the function.

.LBB58_27:
    ; The function is about to return, store the updated state 
	; so that the next time the function is called, it will continue 
	; from the correct block.
	mov	byte ptr [rbx + 32], al ; Store the updated state
	mov	al, 1 ; Return Poll::Pending 
	pop	rbx
	pop	r14
	pop	r15
	ret

; The reference count was found to be 0 after an increment, jump to undefined behavior. This should never happen.
.LBB58_28:
	ud2 ; Undefined instruction
	ud2 
	jmp	.LBB58_31 ; Drop memory and unwind the stack

.LBB58_31:
	mov	r14, rax
	mov	rdi, r15
	call	core::ptr::drop_in_place<playground::goto::{{closure}}>
	mov	rdi, qword ptr [rbx + 8]
	call	core::ptr::drop_in_place<playground::UnitGotoFuture>
	mov	byte ptr [rbx + 32], 2
	mov	rdi, r14
	call	_Unwind_Resume@PLT
	ud2

.LJTI58_0:
	.long	.LBB58_1-.LJTI58_0 	; πŸš€ Start (0): Code executed before the first future is polled.
	.long	.LBB58_3-.LJTI58_0 	; πŸ›‘ State 1: Throw exception.
	.long	.LBB58_2-.LJTI58_0 	; πŸ›‘ State 2: Panic.
	.long	.LBB58_6-.LJTI58_0 	; πŸ•“ WaitingToReachPosition0 (3): Waiting for poses[0]
	.long	.LBB58_16-.LJTI58_0 ; πŸ•ž WaitingToReachPosition1 (4): Waiting for poses[1]

.LCPI59_0:
	.quad	1
	.quad	1

πŸ—‚οΈ Vtable for the patrol closure.
.L__unnamed_25:
	.quad	core::ptr::drop_in_place<playground::patrol::{{closure}}> ; Destructor for the FnOnce trait object
	.asciz	"H\000\000\000\000\000\000\000\b\000\000\000\000\000\000" ; Size of object: 72 bytes (H)
		                                                              ; Alignment of object: 8 bytes (\b)
	.quad	playground::patrol::{{closure}} ; call_once method of the FnOnce trait


Key takeaways

Articles in the async/await series

  1. Desugaring and assembly of async/await in Rust - goto
  2. Nested async/await in Rust: Desugaring and assembly - patrol
  3. Rust async executor - executor