Assembly on stable

So far we have managed to boot the device and handle interrupts without a single line of assembly. That's quite a feat! But depending on the architecture you are targeting, you may need some assembly to get to this point. There are also some operations, like context switching, that require assembly.

The problem is that, at the time of writing, both inline assembly (asm!) and free-form assembly (global_asm!) are unstable, with no estimate for when they'll be stabilized, so you can't use them on stable. (Both macros have since been stabilized, in Rust 1.59.) This is not a showstopper, because there are workarounds, which we'll document here.

To motivate this section we'll tweak the HardFault handler to provide information about the stack frame that generated the exception.

Here's what we want to do:

Instead of letting the user put their HardFault handler directly in the vector table, we'll make the rt crate put a trampoline to the user-defined HardFault handler in the vector table.

$ tail -n36 ../rt/src/lib.rs

extern "C" {
    fn NMI();
    fn HardFaultTrampoline(); // <- CHANGED!
    fn MemManage();
    fn BusFault();
    fn UsageFault();
    fn SVCall();
    fn PendSV();
    fn SysTick();
}

#[link_section = ".vector_table.exceptions"]
#[no_mangle]
pub static EXCEPTIONS: [Vector; 14] = [
    Vector { handler: NMI },
    Vector { handler: HardFaultTrampoline }, // <- CHANGED!
    Vector { handler: MemManage },
    Vector { handler: BusFault },
    Vector { handler: UsageFault },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { handler: SVCall },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { handler: PendSV },
    Vector { handler: SysTick },
];

#[no_mangle]
pub extern "C" fn DefaultExceptionHandler() {
    loop {}
}

This trampoline will read the stack pointer and then call the user HardFault handler. The trampoline will have to be written in assembly:

  mrs r0, MSP
  b HardFault

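Under the ARM calling convention (AAPCS), the first argument of a function is passed in register r0, so this makes the value of the Main Stack Pointer (MSP) the first argument of the HardFault function. Provided the exception was taken while the MSP was the active stack pointer, this value is also a pointer to the registers that the hardware pushed onto the stack on exception entry. With these changes, the user's HardFault handler must now have the signature extern "C" fn(&StackedRegisters) -> !.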
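The section doesn't define StackedRegisters. As a minimal sketch, assuming it mirrors the basic (non-FPU) exception frame that Cortex-M hardware pushes (r0 to r3, r12, lr, pc and xPSR, in that order, lowest address first), it could look like the following. The helper frame_from_sp is hypothetical: it only illustrates reinterpreting the MSP value the trampoline passes in r0, and the code compiles on a host so it can be exercised without hardware.

```rust
// Assumed definition: the basic exception frame that Cortex-M hardware
// pushes on exception entry, lowest address first. `#[repr(C)]` keeps the
// fields in declaration order so they line up with the stacked words.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub struct StackedRegisters {
    pub r0: u32,
    pub r1: u32,
    pub r2: u32,
    pub r3: u32,
    pub r12: u32,
    pub lr: u32,   // link register of the interrupted code
    pub pc: u32,   // stacked return address (for most faults, the faulting instruction)
    pub xpsr: u32, // program status register
}

// On the target, the user-defined handler would then be declared like this
// (not compiled here because it needs the rt crate and a no_std binary):
//
//     #[no_mangle]
//     pub extern "C" fn HardFault(sr: &StackedRegisters) -> ! {
//         // inspect sr.pc, sr.lr, ...
//         loop {}
//     }

// Hypothetical helper: reinterpret a stack pointer value (what the
// trampoline puts in r0) as a reference to the stacked frame.
//
// Safety: `sp` must point to at least 8 readable, 4-byte-aligned u32 words
// that stay valid for the lifetime 'a.
pub unsafe fn frame_from_sp<'a>(sp: *const u32) -> &'a StackedRegisters {
    unsafe { &*(sp as *const StackedRegisters) }
}
```

Any extended FPU state that the hardware may stack after xPSR is simply ignored by this sketch.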

.s files

One approach to stable assembly is to write the assembly in an external file:

$ cat ../rt/asm.s
  .section .text.HardFaultTrampoline
  .global HardFaultTrampoline
  .thumb_func
HardFaultTrampoline:
  mrs r0, MSP
  b HardFault

Then use the cc crate in the rt crate's build script to assemble that file into an object file (.o) and package it into an archive (.a).

$ cat ../rt/build.rs
use std::{env, error::Error, fs::File, io::Write, path::PathBuf};

use cc::Build;

fn main() -> Result<(), Box<dyn Error>> {
    // build directory for this crate
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());

    // extend the library search path
    println!("cargo:rustc-link-search={}", out_dir.display());

    // put `link.x` in the build directory
    File::create(out_dir.join("link.x"))?.write_all(include_bytes!("link.x"))?;

    // assemble the `asm.s` file
    Build::new().file("asm.s").compile("asm"); // <- NEW!

    // rebuild if `asm.s` changed
    println!("cargo:rerun-if-changed=asm.s"); // <- NEW!

    Ok(())
}

$ tail -n2 ../rt/Cargo.toml
[build-dependencies]
cc = "1.0.25"

And that's it!

We can confirm that the vector table contains a pointer to HardFaultTrampoline by writing a very simple program.

#![no_main]
#![no_std]

use rt::entry;

entry!(main);

fn main() -> ! {
    loop {}
}

#[allow(non_snake_case)]
#[no_mangle]
pub extern "C" fn HardFault(_ef: *const u32) -> ! {
    loop {}
}

Here's the disassembly. Note that it now contains HardFaultTrampoline, made up of the two instructions from asm.s.

$ cargo objdump --bin app --release -- -d -no-show-raw-insn -print-imm-hex

app:	file format ELF32-arm-little


Disassembly of section .text:

HardFault:
                b	#-0x4 <HardFault>

main:
                trap

Reset:
                bl	#-0x6
                trap

DefaultExceptionHandler:
                b	#-0x4 <DefaultExceptionHandler>

UsageFault:
                <unknown>

HardFaultTrampoline:
                mrs	r0, msp
                b	#-0x14 <HardFault>

NOTE: To make this disassembly smaller, I commented out the initialization of RAM.

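Now look at the vector table. The 4th entry (the HardFault slot) should be the address of HardFaultTrampoline plus one: on Cortex-M, bit 0 of every handler address in the vector table must be set to indicate Thumb code.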

$ cargo objdump --bin app --release -- -s -j .vector_table

app:	file format ELF32-arm-little

Contents of section .vector_table:
 0000 00000120 45000000 4b000000 4d000000  ... E...K...M...
 0010 4b000000 4b000000 4b000000 00000000  K...K...K.......
 0020 00000000 00000000 00000000 4b000000  ............K...
 0030 00000000 00000000 4b000000 4b000000  ........K...K...
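To read the dump, note that objdump -s prints bytes in memory order and the target is little-endian, so the group 4d000000 is the 32-bit value 0x4d. The sketch below (decode_row and handler_address are made-up helper names) decodes the first row of the dump and clears the Thumb bit. Entry 0 is the initial stack pointer, entry 1 the Reset handler, and entry 3 the HardFault slot.

```rust
// Hypothetical helper: decode one row of `objdump -s` output. Each group of
// 8 hex digits is 4 bytes in memory order; the target is little-endian.
pub fn decode_row(groups: &[&str]) -> Vec<u32> {
    groups
        .iter()
        .map(|group| {
            let mut bytes = [0u8; 4];
            for (i, byte) in bytes.iter_mut().enumerate() {
                *byte = u8::from_str_radix(&group[2 * i..2 * i + 2], 16)
                    .expect("invalid hex digit");
            }
            u32::from_le_bytes(bytes)
        })
        .collect()
}

// Bit 0 of a Cortex-M handler address is the Thumb bit; clearing it gives
// the address of the handler's first instruction.
pub fn handler_address(entry: u32) -> u32 {
    entry & !1
}
```

Applied to the first row (00000120 45000000 4b000000 4d000000), decode_row yields 0x2001_0000, 0x45, 0x4b and 0x4d, and handler_address(0x4d) is 0x4c, the address of HardFaultTrampoline.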

.o / .a files

The downside of using the cc crate is that it requires an assembler on the build machine. For example, when targeting ARM Cortex-M, the cc crate uses arm-none-eabi-gcc as the assembler.

Instead of assembling the file on the build machine we can ship a pre-assembled file with the rt crate. That way no assembler program is required on the build machine. However, you would still need an assembler on the machine that packages and publishes the crate.

There's not much difference between an assembly (.s) file and its assembled version, the object (.o) file. The assembler doesn't do any optimization; it simply encodes each instruction as machine code and emits the result in the right object file format for the target architecture.
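As a small illustration, here is a sketch of how an assembler could encode the trampoline's first instruction, mrs r0, MSP, using the 32-bit Thumb-2 MRS encoding (T1) from the ARMv7-M architecture. The function name is hypothetical and it only covers this one instruction form. Its output matches the f3ef 8008 halfwords that objdump shows for asm.o further down. The second instruction, b HardFault, can't be fully encoded at this point because HardFault is defined elsewhere; the assembler leaves a relocation that the linker resolves.

```rust
// SYSm values for the two stack pointers (ARMv7-M special register numbers).
pub const SYSM_MSP: u8 = 8;
pub const SYSM_PSP: u8 = 9;

// Hypothetical one-instruction "assembler": MRS <Rd>, <spec_reg>, encoding T1.
// Returns the two 16-bit halfwords in the order objdump prints them.
pub fn encode_mrs(rd: u8, sysm: u8) -> [u16; 2] {
    // r13 (SP) and r15 (PC) are UNPREDICTABLE as destinations.
    assert!(rd < 16 && rd != 13 && rd != 15, "invalid destination register");
    // First halfword is fixed; second is 1000 | Rd (4 bits) | SYSm (8 bits).
    [0xF3EF, 0x8000 | (u16::from(rd) << 8) | u16::from(sysm)]
}
```

For example, encode_mrs(0, SYSM_MSP) gives [0xF3EF, 0x8008], the encoding of mrs r0, MSP.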

Cargo provides support for bundling archives (.a) with crates. We can package object files into an archive using the ar command and then bundle the archive with the crate. In fact, this is what the cc crate does; you can see the commands it invoked by searching for a file named output in the target directory.

$ grep running $(find target -name output)
running: "arm-none-eabi-gcc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-mthumb" "-march=armv7-m" "-Wall" "-Wextra" "-o" "/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/asm.o" "-c" "asm.s"
running: "ar" "crs" "/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/libasm.a" "/home/japaric/rust-embedded/embedonomicon/ci/asm/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/asm.o"
$ grep cargo $(find target -name output)
cargo:rustc-link-search=/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out
cargo:rustc-link-lib=static=asm
cargo:rustc-link-search=native=/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out

We'll do something similar to produce an archive.

$ # most of the flags `cc` uses have no effect when assembling, so we drop them
$ arm-none-eabi-as -march=armv7-m asm.s -o asm.o

$ ar crs librt.a asm.o

$ arm-none-eabi-objdump -Cd librt.a
In archive librt.a:

asm.o:     file format elf32-littlearm


Disassembly of section .text.HardFaultTrampoline:

00000000 <HardFaultTrampoline>:
   0:	f3ef 8008 	mrs	r0, MSP
   4:	e7fe      	b.n	0 <HardFault>

Next we modify the build script to bundle this archive with the rt rlib.

$ cat ../rt/build.rs
use std::{
    env,
    error::Error,
    fs::{self, File},
    io::Write,
    path::PathBuf,
};

fn main() -> Result<(), Box<dyn Error>> {
    // build directory for this crate
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());

    // extend the library search path
    println!("cargo:rustc-link-search={}", out_dir.display());

    // put `link.x` in the build directory
    File::create(out_dir.join("link.x"))?.write_all(include_bytes!("link.x"))?;

    // link to `librt.a`
    fs::copy("librt.a", out_dir.join("librt.a"))?; // <- NEW!
    println!("cargo:rustc-link-lib=static=rt"); // <- NEW!

    // rebuild if `librt.a` changed
    println!("cargo:rerun-if-changed=librt.a"); // <- NEW!

    Ok(())
}

Now we can test this new version against the simple program from before, and we'll get the same output.

$ cargo objdump --bin app --release -- -d -no-show-raw-insn -print-imm-hex

app:	file format ELF32-arm-little


Disassembly of section .text:

HardFault:
                b	#-0x4 <HardFault>

main:
                trap

Reset:
                bl	#-0x6
                trap

DefaultExceptionHandler:
                b	#-0x4 <DefaultExceptionHandler>

UsageFault:
                <unknown>

HardFaultTrampoline:
                mrs	r0, msp
                b	#-0x14 <HardFault>

NOTE: As before, I have commented out the RAM initialization to make the disassembly smaller.

$ cargo objdump --bin app --release -- -s -j .vector_table

app:	file format ELF32-arm-little

Contents of section .vector_table:
 0000 00000120 45000000 4b000000 4d000000  ... E...K...M...
 0010 4b000000 4b000000 4b000000 00000000  K...K...K.......
 0020 00000000 00000000 00000000 4b000000  ............K...
 0030 00000000 00000000 4b000000 4b000000  ........K...K...

The downside of shipping pre-assembled archives is that, in the worst case scenario, you'll need to ship one build artifact for each compilation target your library supports.
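One way to organize this is to keep one pre-assembled archive per supported target and let the build script pick the right one, using the TARGET environment variable that Cargo sets for build scripts. This is a sketch under assumptions: the bin/<target>.a layout, the archive_for_target function and the list of targets are hypothetical, and only the lookup is shown as a standalone function, with a comment showing how it could slot into the build script above.

```rust
use std::path::PathBuf;

// Hypothetical list of targets we ship a pre-assembled archive for.
const SUPPORTED_TARGETS: &[&str] = &[
    "thumbv6m-none-eabi",
    "thumbv7m-none-eabi",
    "thumbv7em-none-eabi",
    "thumbv7em-none-eabihf",
];

// Map a target triple to its archive under a hypothetical `bin/` directory,
// e.g. `thumbv7m-none-eabi` -> `bin/thumbv7m-none-eabi.a`.
// Returns None for targets we have no archive for.
pub fn archive_for_target(target: &str) -> Option<PathBuf> {
    if SUPPORTED_TARGETS.contains(&target) {
        Some(PathBuf::from("bin").join(format!("{}.a", target)))
    } else {
        None
    }
}

// In build.rs this could replace the fixed `librt.a` copy:
//
//     let target = env::var("TARGET")?;
//     let archive = archive_for_target(&target)
//         .ok_or_else(|| format!("no pre-assembled archive for {}", target))?;
//     fs::copy(&archive, out_dir.join("librt.a"))?;
//     println!("cargo:rustc-link-lib=static=rt");
//     println!("cargo:rerun-if-changed={}", archive.display());
```

With this arrangement an unsupported target fails the build with a clear message instead of a confusing link error, but every new target still means assembling and shipping one more archive.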