Assembly on stable
So far we have managed to boot the device and handle interrupts without a single line of assembly. That's quite a feat! But depending on the architecture you are targeting you may need some assembly to get to this point. There are also some operations, like context switching, that require assembly.
The problem is that both inline assembly (asm!) and free form assembly (global_asm!) were unstable at the time of writing, with no estimate for when they would be stabilized, so you couldn't use them on stable. (Both were eventually stabilized in Rust 1.59.) Where they are unavailable, that is not a showstopper: there are some workarounds, which we'll document here.
To motivate this section we'll tweak the HardFault handler to provide information about the stack frame that generated the exception.
Here's what we want to do:
Instead of letting the user directly put their HardFault handler in the vector table we'll make the rt crate put a trampoline to the user-defined HardFault handler in the vector table.
$ tail -n36 ../rt/src/lib.rs
extern "C" {
    fn NMI();
    fn HardFaultTrampoline(); // <- CHANGED!
    fn MemManage();
    fn BusFault();
    fn UsageFault();
    fn SVCall();
    fn PendSV();
    fn SysTick();
}

#[link_section = ".vector_table.exceptions"]
#[no_mangle]
pub static EXCEPTIONS: [Vector; 14] = [
    Vector { handler: NMI },
    Vector { handler: HardFaultTrampoline }, // <- CHANGED!
    Vector { handler: MemManage },
    Vector { handler: BusFault },
    Vector {
        handler: UsageFault,
    },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { handler: SVCall },
    Vector { reserved: 0 },
    Vector { reserved: 0 },
    Vector { handler: PendSV },
    Vector { handler: SysTick },
];

#[no_mangle]
pub extern "C" fn DefaultExceptionHandler() {
    loop {}
}
This trampoline will read the stack pointer and then call the user HardFault handler. The trampoline will have to be written in assembly:
mrs r0, MSP
b HardFault
Due to how the ARM ABI works this sets the Main Stack Pointer (MSP) as the first argument of the HardFault function / routine. This MSP value also happens to be a pointer to the registers pushed to the stack by the exception. With these changes the user HardFault handler must now have signature fn(&StackedRegisters) -> !.
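To make the new signature more concrete, here is a minimal sketch of what a StackedRegisters type could look like. The field layout is an assumption based on the basic ARMv7-M exception frame (no floating-point context), where the hardware pushes r0-r3, r12, lr, pc and xPSR at ascending addresses starting at the stack pointer. The helpers frame_from_words and faulting_pc are hypothetical and exist only so the layout can be exercised on a host machine.

```rust
/// Registers pushed to the stack by the hardware on exception entry.
/// ASSUMPTION: basic ARMv7-M frame without FPU state; `#[repr(C)]` keeps
/// the fields in the same order as the words on the stack.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct StackedRegisters {
    pub r0: u32,
    pub r1: u32,
    pub r2: u32,
    pub r3: u32,
    pub r12: u32,
    pub lr: u32,
    pub pc: u32,
    pub xpsr: u32,
}

/// Hypothetical helper: build a frame from eight words, listed in the
/// order they appear in memory starting at the stacked MSP value.
pub fn frame_from_words(words: &[u32; 8]) -> StackedRegisters {
    StackedRegisters {
        r0: words[0],
        r1: words[1],
        r2: words[2],
        r3: words[3],
        r12: words[4],
        lr: words[5],
        pc: words[6],
        xpsr: words[7],
    }
}

/// Hypothetical helper: the address the core would have returned to,
/// which is usually the first thing to inspect in a HardFault handler.
pub fn faulting_pc(frame: &StackedRegisters) -> u32 {
    frame.pc
}
```

On the device, a handler declared as extern "C" fn HardFault(ef: &StackedRegisters) -> ! would receive a reference to such a frame in r0, courtesy of the trampoline.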
.s files
One approach to stable assembly is to write the assembly in an external file:
$ cat ../rt/asm.s
.section .text.HardFaultTrampoline
.global HardFaultTrampoline
.thumb_func
HardFaultTrampoline:
mrs r0, MSP
b HardFault
And use the cc crate in the build script of the rt crate to assemble that file into an object file (.o) and then into an archive (.a).
$ cat ../rt/build.rs
use std::{env, error::Error, fs::File, io::Write, path::PathBuf};

use cc::Build;

fn main() -> Result<(), Box<dyn Error>> {
    // build directory for this crate
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());

    // extend the library search path
    println!("cargo:rustc-link-search={}", out_dir.display());

    // put `link.x` in the build directory
    File::create(out_dir.join("link.x"))?.write_all(include_bytes!("link.x"))?;

    // assemble the `asm.s` file
    Build::new().file("asm.s").compile("asm"); // <- NEW!

    // rebuild if `asm.s` changed
    println!("cargo:rerun-if-changed=asm.s"); // <- NEW!

    Ok(())
}
$ tail -n2 ../rt/Cargo.toml
[build-dependencies]
cc = "1.0.25"
And that's it!
We can confirm that the vector table contains a pointer to HardFaultTrampoline by writing a very simple program.
#![no_main]
#![no_std]

use rt::entry;

entry!(main);

fn main() -> ! {
    loop {}
}

#[allow(non_snake_case)]
#[no_mangle]
pub fn HardFault(_ef: *const u32) -> ! {
    loop {}
}
Here's the disassembly. Look at the address of HardFaultTrampoline.
$ cargo objdump --bin app --release -- -d -no-show-raw-insn -print-imm-hex
app: file format ELF32-arm-little
Disassembly of section .text:
HardFault:
b #-0x4 <HardFault>
main:
trap
Reset:
bl #-0x6
trap
DefaultExceptionHandler:
b #-0x4 <DefaultExceptionHandler>
UsageFault:
<unknown>
HardFaultTrampoline:
mrs r0, msp
b #-0x14 <HardFault>
NOTE: To make this disassembly smaller I commented out the initialization of RAM.
Now look at the vector table. The 4th entry should be the address of HardFaultTrampoline plus one; the extra one is the Thumb bit, which marks the handler as Thumb code.
$ cargo objdump --bin app --release -- -s -j .vector_table
app: file format ELF32-arm-little
Contents of section .vector_table:
0000 00000120 45000000 4b000000 4d000000 ... E...K...M...
0010 4b000000 4b000000 4b000000 00000000 K...K...K.......
0020 00000000 00000000 00000000 4b000000 ............K...
0030 00000000 00000000 4b000000 4b000000 ........K...K...
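To read the dump above by hand, remember that each entry is a little-endian 32-bit word and that bit 0 of a handler entry is the Thumb bit. The following is a small host-side sketch of that decoding; the function names are hypothetical, and the trampoline address 0x4c is inferred from the dump (0x4d with the Thumb bit cleared) rather than taken from the disassembly, which does not print addresses.

```rust
/// Decode one vector table entry from the four bytes shown in the dump.
/// Entries are stored as little-endian 32-bit words.
pub fn decode_entry(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

/// Returns `true` if the entry has the Thumb bit (bit 0) set.
pub fn is_thumb(entry: u32) -> bool {
    entry & 1 == 1
}

/// Clears the Thumb bit to recover the address of the first instruction.
pub fn code_address(entry: u32) -> u32 {
    entry & !1
}
```

For example, the 4th word in the dump, 4d000000, decodes to 0x4d, which is the address 0x4c with the Thumb bit set. The 1st word, 00000120, decodes to 0x20010000, the initial stack pointer value, which is not a function pointer and so carries no Thumb bit.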
.o / .a files
The downside of using the cc crate is that it requires some assembler program on the build machine. For example when targeting ARM Cortex-M the cc crate uses arm-none-eabi-gcc as the assembler.
Instead of assembling the file on the build machine we can ship a pre-assembled file with the rt crate. That way no assembler program is required on the build machine. However, you would still need an assembler on the machine that packages and publishes the crate.
There's not much difference between an assembly (.s) file and its compiled version: the object (.o) file. The assembler doesn't do any optimization; it simply chooses the right object file format for the target architecture.
Cargo provides support for bundling archives (.a) with crates. We can package object files into an archive using the ar command and then bundle the archive with the crate. In fact, this is what the cc crate does; you can see the commands it invoked by searching for a file named output in the target directory.
$ grep running $(find target -name output)
running: "arm-none-eabi-gcc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-mthumb" "-march=armv7-m" "-Wall" "-Wextra" "-o" "/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/asm.o" "-c" "asm.s"
running: "ar" "crs" "/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/libasm.a" "/home/japaric/rust-embedded/embedonomicon/ci/asm/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out/asm.o"
$ grep cargo $(find target -name output)
cargo:rustc-link-search=/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out
cargo:rustc-link-lib=static=asm
cargo:rustc-link-search=native=/tmp/app/target/thumbv7m-none-eabi/debug/build/rt-6ee84e54724f2044/out
We'll do something similar to produce an archive.
$ # most of the flags `cc` uses have no effect when assembling so we drop them
$ arm-none-eabi-as -march=armv7-m asm.s -o asm.o
$ ar crs librt.a asm.o
$ arm-none-eabi-objdump -Cd librt.a
In archive librt.a:
asm.o: file format elf32-littlearm
Disassembly of section .text.HardFaultTrampoline:
00000000 <HardFaultTrampoline>:
0: f3ef 8008 mrs r0, MSP
4: e7fe b.n 0 <HardFault>
Next we modify the build script to bundle this archive with the rt rlib.
$ cat ../rt/build.rs
use std::{
    env,
    error::Error,
    fs::{self, File},
    io::Write,
    path::PathBuf,
};

fn main() -> Result<(), Box<dyn Error>> {
    // build directory for this crate
    let out_dir = PathBuf::from(env::var_os("OUT_DIR").unwrap());

    // extend the library search path
    println!("cargo:rustc-link-search={}", out_dir.display());

    // put `link.x` in the build directory
    File::create(out_dir.join("link.x"))?.write_all(include_bytes!("link.x"))?;

    // link to `librt.a`
    fs::copy("librt.a", out_dir.join("librt.a"))?; // <- NEW!
    println!("cargo:rustc-link-lib=static=rt"); // <- NEW!

    // rebuild if `librt.a` changed
    println!("cargo:rerun-if-changed=librt.a"); // <- NEW!

    Ok(())
}
Now we can test this new version against the simple program from before and we'll get the same output.
$ cargo objdump --bin app --release -- -d -no-show-raw-insn -print-imm-hex
app: file format ELF32-arm-little
Disassembly of section .text:
HardFault:
b #-0x4 <HardFault>
main:
trap
Reset:
bl #-0x6
trap
DefaultExceptionHandler:
b #-0x4 <DefaultExceptionHandler>
UsageFault:
<unknown>
HardFaultTrampoline:
mrs r0, msp
b #-0x14 <HardFault>
NOTE: As before I have commented out the RAM initialization to make the disassembly smaller.
$ cargo objdump --bin app --release -- -s -j .vector_table
app: file format ELF32-arm-little
Contents of section .vector_table:
0000 00000120 45000000 4b000000 4d000000 ... E...K...M...
0010 4b000000 4b000000 4b000000 00000000 K...K...K.......
0020 00000000 00000000 00000000 4b000000 ............K...
0030 00000000 00000000 4b000000 4b000000 ........K...K...
The downside of shipping pre-assembled archives is that, in the worst case scenario, you'll need to ship one build artifact for each compilation target your library supports.
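One way to cope with several artifacts is to let the build script pick the archive that matches the compilation target. The sketch below shows only the selection logic; the bin/ directory, the one-archive-per-target naming scheme and the list of supported targets are all assumptions made for illustration and are not part of the rt crate as written above.

```rust
use std::path::PathBuf;

/// Targets for which a pre-assembled archive is assumed to be shipped.
/// ASSUMPTION: this list is illustrative; a real crate would list the
/// targets it actually supports.
const SUPPORTED_TARGETS: &[&str] = &[
    "thumbv6m-none-eabi",
    "thumbv7m-none-eabi",
    "thumbv7em-none-eabi",
    "thumbv7em-none-eabihf",
];

/// Returns the path (relative to the crate root) of the archive for
/// `target`, or `None` if no archive is shipped for that target.
/// ASSUMPTION: archives live in a hypothetical `bin/` directory and are
/// named after the target triple, e.g. `bin/thumbv7m-none-eabi.a`.
pub fn archive_for_target(target: &str) -> Option<PathBuf> {
    if SUPPORTED_TARGETS.contains(&target) {
        Some(PathBuf::from("bin").join(format!("{}.a", target)))
    } else {
        None
    }
}
```

In a build script the target triple would come from the TARGET environment variable that Cargo sets, and the selected file would be copied into OUT_DIR as librt.a exactly as in the build script shown earlier. Returning None for unknown targets lets the script skip linking, or fail with a clear message, instead of linking an archive built for the wrong architecture.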