Compare the Assembly Generated for Static vs Dynamic Dispatch in Rust

Traits in Rust are similar to interfaces in other languages. Traits permit the implementation of a common interface for multiple types. Developers can then write code based on the traits rather than the concrete types.

When a trait-specified method is called, the compiler will generate code to dispatch the call to the concrete method implementing the trait specification. Rust supports two types of dispatch:

Static dispatch is the default dispatch mode when the concrete type can be determined at compile time.
Dynamic dispatch is used when the concrete type implementing the trait is not known at compile time.

We will use the following example to illustrate the difference between static and dynamic dispatch.

// The Shape trait requires that the implementor have a method called `area`
// that returns the area as the associated type `Shape::T`.
pub trait Shape {
    type T;
    fn area(&self) -> Self::T;
}

// A generic Point for a specified type T.
pub struct Point<T> {
    x: T,
    y: T,
}

// A generic Rectangle for a specified type T.
pub struct Rectangle<T> {
    top_left: Point<T>,
    bottom_right: Point<T>,
}

// Implement the Shape trait for the Rectangle using a generic type T.
// The `T` is the return type of the `area` method. The where clause specifies
// that `T` must support subtraction, multiplication and the ability to copy.
impl<T> Shape for Rectangle<T>
where
    T: std::ops::Sub<Output = T> + std::ops::Mul<Output = T> + Copy,
{
    type T = T;
    fn area(&self) -> T {
        let width = self.bottom_right.x - self.top_left.x;
        let height = self.top_left.y - self.bottom_right.y;

        width * height
    }
}

// A function that calculates the area of two shapes and returns a tuple containing the areas.
// This function requires that the two shapes implement the Shape trait. The function will call
// the area function via a static dispatch. Code will be generated for the function only if concrete
// types are specified for `a` and `b`.
pub fn area_pair_static(a: impl Shape<T = f64>, b: impl Shape<T = f64>) -> (f64, f64) {
    (a.area(), b.area())
}

pub fn static_dispatch_pair(a: Rectangle<f64>, b: Rectangle<f64>) -> (f64, f64) {
    // The following line will generate code for the function as concrete types are specified for `a` and `b`.
    area_pair_static(a, b)
}

// This function performs the same function as `area_pair_static` but uses a dynamic dispatch. The compiler
// will generate code for this function. The calls to `area` are made through the `vtable`.
pub fn area_pair_dynamic(a: &dyn Shape<T = f64>, b: &dyn Shape<T = f64>) -> (f64, f64) {
    (a.area(), b.area())
}

Type of dispatch Parameters a and b (highlighted in green)

Static dispatch: The static_dispatch_pair expects a and b parameters as values with concrete types (impl Shape<T = f64> is syntactic sugar for generic parameters that implement the stated trait.). Thus the called function knows the layout of a and b at compile time. This also means any calls to the area method in the Shape trait can be made directly. These calls, as we will see, can also be inlined.

Dynamic dispatch: The area_pair_dynamic function expects &dyn Shape<T = f64> references to a and b. The function has no access to the concrete type implementing the trait. Even the calls to the area method must be made via a function pointer. The function pointer is obtained via a vtable that points to a memory block that contains pointers to the methods implemented for the trait.

Type of dispatch	Parameters a and b (highlighted in green)
Static dispatch: The `static_dispatch_pair` expects `a` and `b` parameters as values with concrete types (`impl Shape<T = f64>` is syntactic sugar for generic parameters that implement the stated trait.). Thus the called function knows the layout of `a` and `b` at compile time. This also means any calls to the `area` method in the `Shape` trait can be made directly. These calls, as we will see, can also be inlined.
Dynamic dispatch: The `area_pair_dynamic` function expects `&dyn Shape<T = f64>` references to `a` and `b`. The function has no access to the concrete type implementing the trait. Even the calls to the `area` method must be made via a function pointer. The function pointer is obtained via a `vtable` that points to a memory block that contains pointers to the methods implemented for the trait.

Static dispatch

pub fn area_pair_static(a: impl Shape<T = f64>, b: impl Shape<T = f64>) -> (f64, f64) {
    (a.area(), b.area())
}

pub fn static_dispatch_pair(a: Rectangle<f64>, b: Rectangle<f64>) -> (f64, f64) {
    area_pair_static(a, b)
}

The static dispatch code executes in the call to area_pair_static function. The function requires that a and b should belong to a type that implements the Shape<T = f64> trait. As mentioned earlier, the syntax used for impl based parameter passing is syntactic sugar to a generic-based representation.

pub fn area_pair_static<S, T> (a: S, b: T) -> (f64, f64) 
where
    S: Shape<T = f64>,
    T: Shape<T = f64>,
{
    (a.area(), b.area())
}

With the generic representation, the area_pair_static function is generic over the S and T types. The function requires that S and T should implement the Shape<T = f64> trait. The function body calls the area method on the a and b parameters. The compiler will only generate code for the function if concrete types are specified for a and b. The function static_dispatch_pair does exactly that. S and T are Rectangle<f64>.

Rectangle<f64> layout is shown below. The top_left and bottom_right fields are of type Point<f64>. The x and y fields are of type f64. The layout of Rectangle<f64> also shows the byte offset from the starting address on the left side of each field. This will help us understand the assembly code generated for the function.

struct passed by value

The following code is generated for the static_dispatch_pair function. The function is inlined, and the code for the area_pair_static and area functions has been inlined. The assembly code is annotated with comments that help map it to the Rust code. The generated code uses vector instructions to speed up the computation of the two area calculations. The two area tuples are returned in the xmm0 and xmm1 registers.

; Input parameters:
; a: Rectangle<f64> in rdi
; b: Rectangle<f64> in rsi
; Output parameters:
; (f64, f64) in (lower xmm0, lower xmm1)

example::static_dispatch_pair:
        movupd  xmm1, xmmword ptr [rdi + 8] ; Vector load a.top_left.y and a.bottom_right.x into xmm1 (unaligned load)
                                            ; xmm1 = (a.top_left.y, a.bottom_right.x)
        movsd   xmm0, qword ptr [rdi + 24]  ; Load a.bottom_right.y into lower xmm0
        movhpd  xmm0, qword ptr [rdi]       ; Load a.top_left.x into upper xmm0
                                            ; xmm0 = (a.top_left.y, a.top_left.x)
        subpd   xmm1, xmm0                  ; xmm1 = (a.top_left.y - a.bottom_right.y, a.bottom_right.x - a.top_left.x)
                                            ; xmm1 = (a:height, a:width)                                            
        movapd  xmm0, xmm1                  ; xmm0 = (a:height, a:width)
        unpckhpd xmm0, xmm1                 ; xmm0 = (a:width, a:width)
                                            ; xmm1 = (a:height, a:height)
        mulsd   xmm0, xmm1                  ; lower xmm0 = a:width * a:height = a:area

        movupd  xmm2, xmmword ptr [rsi + 8] ; Vector load b.top_left.y and b.bottom_right.x into xmm2 (unaligned load)
                                            ; xmm2 = (b.top_left.y, b.bottom_right.x)
        movsd   xmm1, qword ptr [rsi + 24]  ; Load b.bottom_right.y into lower xmm1
        movhpd  xmm1, qword ptr [rsi]       ; Load b.top_left.x into upper xmm1
                                            ; xmm1 = (b.top_left.y, b.top_left.x)
        subpd   xmm2, xmm1                  ; xmm2 = (b.top_left.y - b.bottom_right.y, b.bottom_right.x - b.top_left.x)
                                            ; xmm2 = (b:height, b:width)
        movapd  xmm1, xmm2                  ; xmm1 = (b:height, b:width)
        unpckhpd xmm1, xmm2                 ; xmm1 = (b:width, b:width)
                                            ; xmm2 = (b:height, b:height)
        mulsd   xmm1, xmm2                  ; lower xmm1 = b:width * b:height = b:area
        ret                                 ; Return (a:area, b:area) in (xmm0, xmm1)

Dynamic dispatch

pub fn area_pair_dynamic(a: &dyn Shape<T = f64>, b: &dyn Shape<T = f64>) -> (f64, f64) {
    (a.area(), b.area())
}

The area_pair_dynamic function takes two trait objects as parameters. The function body calls the area method on the a and b parameters. The compiler generates code for the function referencing the trait object. There is no reference to the concrete types.

The layout of the trait object is shown below. It is a tuple of two-pointers. The first pointer is the object's address. The second pointer is the address of the object's vtable, which contains the addresses of the trait methods implemented by the object.

In addition to the function pointers, the vtable also contains a pointer to the drop function, the size, and the alignment of the concrete type. The drop function is called when the object goes out of scope. This typically happens when a trait object is contained in a smart pointer like Box. The drop function is called on the object if the' Box' is dropped. The size and alignment of the concrete type are used to free memory for the object.

dyn trait pointer

The following code is generated for the area_pair_dynamic function. The code has been annotated with comments that help map it to the Rust code.

The compiler passes each trait object pointer via two registers. One register carries the address of the object. The other register carries the address of the object's vtable. The generated code obtains the address of the area function from the object's vtable. It sets the self parameter to the address of the object. The area function is then called using the area function pointer. The area function is called twice, once for each object.

; Input parameters:
; rdi contains the address of the first object (a).
; rsi contains the address of the vtable of the first object (a)
; rdx contains the address of the second object (b).
; rcx contains the address of the vtable of the second object (b)
; Output parameters:
; Tuple<f64, f64> in (xmm0, xmm1)

example::area_pair_dynamic:
        push    r14                     ; Save the previous r14
        push    rbx                     ; Save the previous rbx
        push    rax                     ; Save the previous rax
        mov     r14, rcx                ; r14 = address of the vtable of b
        mov     rbx, rdx                ; rbx = b (Get the address of b)

        ; Dynamic dispatch:
        ; rdi contains the address of a (self)
        ; rsi contains the address of the vtable for a's type
        call    qword ptr [rsi + 24]    ; Obtain the address of the area function from the vtable and call it.
        movsd   qword ptr [rsp], xmm0   ; Save a.area() return value in a local variable

        mov     rdi, rbx                ; rdi = address of b

        ; Dynamic dispatch:
        ; rdi contains the address of b (self)
        ; r14 contains the address of the vtable for b's type
        call    qword ptr [r14 + 24]    ; Obtain the address of the area function from the vtable and call it.
        movaps  xmm1, xmm0              ;Save b.area() return value in xmm1

        movsd   xmm0, qword ptr [rsp]   ; Get a.area() return value from a local variable
        add     rsp, 8                  ; Remove local variable from the stack
        pop     rbx                     ; Restore the previous rbx
        pop     r14                     ; Restore the previous r14
        ret                             ; Return a.area() and b.area() as (xmm0, xmm1)

Key takeaways

The key takeaways can be summarized as follows:

Static dispatch	Dynamic dispatch
The compiler can inline the static dispatch code for the concrete type.	The code generated for a dynamic dispatch is inefficient as the compiler does not know the concrete type and invokes the called trait method via a function pointer contained in the `vtable` associated with the concrete type.
The compiler generates code that is specific to the concrete type. This can result in code bloat if static dispatch is used for many types that implement the applicable trait.	The compiler generates code that works for all types that implement the referenced trait.
On a 64-bit system, the compiler generates code that passes the concrete type as a 64-bit pointer.	On a 64-bit system, the trait object is referenced via a 64-bit fat pointer that is a tuple of the 64-bit pointer to the object and a 64-bit pointer to the `vtable` associated with the concrete type.
In our example, the compiler inlined the calls to the `area` method.	In our example, the compiler generated code called the `area` method via a function pointer obtained from the `vtable`.

Experiment with the Compiler Explorer

Edit the code in the Compiler Explorer and see the assembly generated for the code.

Freeing memory for trait objects

Wrap the trait objects in a Box and see the assembly generated for the area_pair_dynamic function. You will see that the compiler generates code to free the memory for the trait objects. This is because the Box is dropped when the function returns.

pub fn area_pair_dynamic(a: Box<&dyn Shape<T = f64>>, b: Box<&dyn Shape<T = f64>>) -> (f64, f64) {
 (a.area(), b.area())
}

Auto conversion of dynamic dispatch to static dispatch

Add the following function in the code and see the assembly generated. In this case, the compiler can optimize the code and generate the same assembly as the static dispatch version. This is because the compiler knows that the type of the arguments is Rectangle<f64>, and it can inline the code for the area function.

pub fn dynamic_dispatch_pair(a: Rectangle<f64>, b: Rectangle<f64>) -> (f64, f64) {
    area_pair_dynamic(&a, &b)
}