Logging with symbols

This section will show you how to utilize symbols and the ELF format to achieve super cheap logging.

Arbitrary symbols

Whenever we needed a stable symbol interface between crates we have mainly used the no_mangle attribute and sometimes the export_name attribute. The export_name attribute takes a string which becomes the name of the symbol whereas #[no_mangle] is basically sugar for #[export_name = <item-name>].

Turns out we are not limited to single word names; we can use arbitrary strings, e.g. sentences, as the argument of the export_name attribute. As least when the output format is ELF anything that doesn't contain a null byte is fine.

Let's check that out:

$ cargo new --lib foo

$ cat foo/src/lib.rs

# #![allow(unused_variables)]
#fn main() {
#[export_name = "Hello, world!"]
#[used]
static A: u8 = 0;

#[export_name = "こんにちは"]
#[used]
static B: u8 = 0;
#}
$ ( cd foo && cargo nm --lib )
foo-d26a39c34b4e80ce.3lnzqy0jbpxj4pld.rcgu.o:
0000000000000000 r Hello, world!
0000000000000000 V __rustc_debug_gdb_scripts_section__
0000000000000000 r こんにちは

Can you see where this is going?

Encoding

Here's what we'll do: we'll create one static variable per log message but instead of storing the messages in the variables we'll store the messages in the variables' symbol names. What we'll log then will not be the contents of the static variables but their addresses.

As long as the static variables are not zero sized each one will have a different address. What we're doing here is effectively encoding each message into a unique identifier, which happens to be the variable address. Some part of the log system will have to decode this id back into the message.

Let's write some code to illustrate the idea.

In this example we'll need some way to do I/O so we'll use the cortex-m-semihosting crate for that. Semihosting is a technique for having a target device borrow the host I/O capabilities; the host here usually refers to the machine that's debugging the target device. In our case, QEMU supports semihosting out of the box so there's no need for a debugger. On a real device you'll have other ways to do I/O like a serial port; we use semihosting in this case because it's the easiest way to do I/O on QEMU.

Here's the code

#![no_main]
#![no_std]

use core::fmt::Write;
use cortex_m_semihosting::{debug, hio};

use rt::entry;

entry!(main);

fn main() -> ! {
    let mut hstdout = hio::hstdout().unwrap();

    #[export_name = "Hello, world!"]
    static A: u8 = 0;

    let _ = writeln!(hstdout, "{:#x}", &A as *const u8 as usize);

    #[export_name = "Goodbye"]
    static B: u8 = 0;

    let _ = writeln!(hstdout, "{:#x}", &B as *const u8 as usize);

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

We also make use of the debug::exit API to have the program terminate the QEMU process. This is a convenience so we don't have to manually terminate the QEMU process.

And here's the dependencies section of the Cargo.toml:

[dependencies]
cortex-m-semihosting = "0.3.1"
rt = { path = "../rt" }

Now we can build the program

$ cargo build

To run it we'll have to add the --semihosting-config flag to our QEMU invocation:

$ qemu-system-arm \
      -cpu cortex-m3 \
      -machine lm3s6965evb \
      -nographic \
      -semihosting-config enable=on,target=native \
      -kernel target/thumbv7m-none-eabi/debug/app
0x1fe0
0x1fe1

NOTE: These addresses may not be the ones you get locally because addresses of static variable are not guaranteed to remain the same when the toolchain is changed (e.g. optimizations may have improved).

Now we have two addresses printed to the console.

Decoding

How do we convert these addresses into strings? The answer is in the symbol table of the ELF file.

$ cargo objdump --bin app -- -t | grep '\.rodata\s*0*1\b'
00001fe1 g       .rodata		 00000001 Goodbye
00001fe0 g       .rodata		 00000001 Hello, world!

$ # first column is the symbol address; last column is the symbol name

objdump -t prints the symbol table. This table contains all the symbols but we are only looking for the ones in the .rodata section and whose size is one byte (our variables have type u8).

It's important to note that the address of the symbols will likely change when optimizing the program. Let's check that.

PROTIP You can set target.thumbv7m-none-eabi.runner to the long QEMU command from before (qemu-system-arm -cpu (..) -kernel) in the Cargo configuration file (.cargo/conifg) to have cargo run use that runner to execute the output binary.

$ head -n2 .cargo/config
[target.thumbv7m-none-eabi]
runner = "qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel"
$ cargo run --release
     Running `qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel target/thumbv7m-none-eabi/release/app`
0xb9c
0xb9d

$ cargo objdump --bin app --release -- -t | grep '\.rodata\s*0*1\b'
00000b9d g     O .rodata	00000001 Goodbye
00000b9c g     O .rodata	00000001 Hello, world!

So make sure to always look for the strings in the ELF file you executed.

Of course, the process of looking up the strings in the ELF file can be automated using a tool that parses the symbol table (.symtab section) contained in the ELF file. Implementing such tool is out of scope for this book and it's left as an exercise for the reader.

Making it zero cost

Can we do better? Yes, we can!

The current implementation places the static variables in .rodata, which means they occupy size in Flash even though we never use their contents. Using a little bit of linker script magic we can make them occupy zero space in Flash.

$ cat log.x
SECTIONS
{
  .log 0 (INFO) : {
    *(.log);
  }
}

We'll place the static variables in this new output .log section. This linker script will collect all the symbols in the .log sections of input object files and put them in an output .log section. We have seen this pattern in the Memory layout chapter.

The new bit here is the (INFO) part; this tells the linker that this section is a non-allocatable section. Non-allocatable sections are kept in the ELF binary as metadata but they are not loaded onto the target device.

We also specified the start address of this output section: the 0 in .log 0 (INFO).

The other improvement we can do is switch from formatted I/O (fmt::Write) to binary I/O, that is send the addresses to the host as bytes rather than as strings.

Binary serialization can be hard but we'll keep things super simple by serializing each address as a single byte. With this approach we don't have to worry about endianness or framing. The downside of this format is that a single byte can only represent up to 256 different addresses.

Let's make those changes:

#![no_main]
#![no_std]

use cortex_m_semihosting::{debug, hio};

use rt::entry;

entry!(main);

fn main() -> ! {
    let mut hstdout = hio::hstdout().unwrap();

    #[export_name = "Hello, world!"]
    #[link_section = ".log"] // <- NEW!
    static A: u8 = 0;

    let address = &A as *const u8 as usize as u8;
    hstdout.write_all(&[address]).unwrap(); // <- CHANGED!

    #[export_name = "Goodbye"]
    #[link_section = ".log"] // <- NEW!
    static B: u8 = 0;

    let address = &B as *const u8 as usize as u8;
    hstdout.write_all(&[address]).unwrap(); // <- CHANGED!

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

Before you run this you'll have to append -Tlog.x to the arguments passed to the linker. That can be done in the Cargo configuration file.

$ cat .cargo/config
[target.thumbv7m-none-eabi]
runner = "qemu-system-arm -cpu cortex-m3 -machine lm3s6965evb -nographic -semihosting-config enable=on,target=native -kernel"
rustflags = [
  "-C", "link-arg=-Tlink.x",
  "-C", "link-arg=-Tlog.x", # <- NEW!
]

[build]
target = "thumbv7m-none-eabi"

Now you can run it! Since the output now has a binary format we'll pipe it through the xxd command to reformat it as a hexadecimal string.

$ cargo run | xxd -p
0001

The addresses are 0x00 and 0x01. Let's now look at the symbol table.

$ cargo objdump --bin app -- -t | grep '\.log'
00000001 g     O .log	00000001 Goodbye
00000000 g     O .log	00000001 Hello, world!

There are our strings. You'll notice that their addresses now start at zero; this is because we set a start address for the output .log section.

Each variable is 1 byte in size because we are using u8 as their type. If we used something like u16 then all address would be even and we would not be able to efficiently use all the address space (0...255).

Packaging it up

You've noticed that the steps to log a string are always the same so we can refactor them into a macro that lives in its own crate. Also, we can make the logging library more reusable by abstracting the I/O part behind a trait.

$ cargo new --lib log

$ cat log/src/lib.rs

# #![allow(unused_variables)]
#![no_std]

#fn main() {
pub trait Log {
    type Error;

    fn log(&mut self, address: u8) -> Result<(), Self::Error>;
}

#[macro_export]
macro_rules! log {
    ($logger:expr, $string:expr) => {{
        #[export_name = $string]
        #[link_section = ".log"]
        static SYMBOL: u8 = 0;

        $crate::Log::log(&mut $logger, &SYMBOL as *const u8 as usize as u8)
    }};
}

#}

Given that this library depends on the .log section it should be its responsibility to provide the log.x linker script so let's make that happen.

$ mv log.x ../log/
$ cat ../log/build.rs
use std::{env, error::Error, fs::File, io::Write, path::PathBuf};

fn main() -> Result<(), Box<dyn Error>> {
    // Put the linker script somewhere the linker can find it
    let out = PathBuf::from(env::var("OUT_DIR")?);

    File::create(out.join("log.x"))?.write_all(include_bytes!("log.x"))?;

    println!("cargo:rustc-link-search={}", out.display());

    Ok(())
}

Now we can refactor our application to use the log! macro:

$ cat src/main.rs
#![no_main]
#![no_std]

use cortex_m_semihosting::{
    debug,
    hio::{self, HStdout},
};

use log::{log, Log};
use rt::entry;

struct Logger {
    hstdout: HStdout,
}

impl Log for Logger {
    type Error = ();

    fn log(&mut self, address: u8) -> Result<(), ()> {
        self.hstdout.write_all(&[address])
    }
}

entry!(main);

fn main() -> ! {
    let hstdout = hio::hstdout().unwrap();
    let mut logger = Logger { hstdout };

    let _ = log!(logger, "Hello, world!");

    let _ = log!(logger, "Goodbye");

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

Don't forget to update the Cargo.toml file to depend on the new log crate.

$ tail -n4 Cargo.toml
[dependencies]
cortex-m-semihosting = "0.3.1"
log = { path = "../log" }
rt = { path = "../rt" }
$ cargo run | xxd -p
0001

$ cargo objdump --bin app -- -t | grep '\.log'
00000001 g     O .log	00000001 Goodbye
00000000 g     O .log	00000001 Hello, world!

Same output as before!

Bonus: Multiple log levels

Many logging frameworks provide ways to log messages at different log levels. These log levels convey the severity of the message: "this is an error", "this is just a warning", etc. These log levels can be used to filter out unimportant messages when searching for e.g. error messages.

We can extend our logging library to support log levels without increasing its footprint. Here's how we'll do that:

We have a flat address space for the messages: from 0 to 255 (inclusive). To keep things simple let's say we only want to differentiate between error messages and warning messages. We can place all the error messages at the beginning of the address space, and all the warning messages after the error messages. If the decoder knows the address of the first warning message then it can classify the messages. This idea can be extended to support more than two log levels.

Let's test the idea by replacing the log macro with two new macros: error! and warn!.

$ cat ../log/src/lib.rs

# #![allow(unused_variables)]
#![no_std]

#fn main() {
pub trait Log {
    type Error;

    fn log(&mut self, address: u8) -> Result<(), Self::Error>;
}

/// Logs messages at the ERROR log level
#[macro_export]
macro_rules! error {
    ($logger:expr, $string:expr) => {{
        #[export_name = $string]
        #[link_section = ".log.error"] // <- CHANGED!
        static SYMBOL: u8 = 0;

        $crate::Log::log(&mut $logger, &SYMBOL as *const u8 as usize as u8)
    }};
}

/// Logs messages at the WARNING log level
#[macro_export]
macro_rules! warn {
    ($logger:expr, $string:expr) => {{
        #[export_name = $string]
        #[link_section = ".log.warning"] // <- CHANGED!
        static SYMBOL: u8 = 0;

        $crate::Log::log(&mut $logger, &SYMBOL as *const u8 as usize as u8)
    }};
}

#}

We distinguish errors from warnings by placing the messages in different link sections.

The next thing we have to do is update the linker script to place error messages before the warning messages.

$ cat ../log/log.x
SECTIONS
{
  .log 0 (INFO) : {
    *(.log.error);
    __log_warning_start__ = .;
    *(.log.warning);
  }
}

We also give a name, __log_warning_start__, to the boundary between the errors and the warnings. The address of this symbol will be the address of the first warning message.

We can now update the application to make use of these new macros.

$ cat src/main.rs
#![no_main]
#![no_std]

use cortex_m_semihosting::{
    debug,
    hio::{self, HStdout},
};

use log::{error, warn, Log};
use rt::entry;

entry!(main);

fn main() -> ! {
    let hstdout = hio::hstdout().unwrap();
    let mut logger = Logger { hstdout };

    let _ = warn!(logger, "Hello, world!"); // <- CHANGED!

    let _ = error!(logger, "Goodbye"); // <- CHANGED!

    debug::exit(debug::EXIT_SUCCESS);

    loop {}
}

struct Logger {
    hstdout: HStdout,
}

impl Log for Logger {
    type Error = ();

    fn log(&mut self, address: u8) -> Result<(), ()> {
        self.hstdout.write_all(&[address])
    }
}

The output won't change much:

$ cargo run | xxd -p
0100

We still get two bytes in the output but the error is given the address 0 and the warning is given the address 1 even though the warning was logged first.

Now look at the symbol table.

$ cargo objdump --bin app -- -t | grep '\.log'
00000000 g     O .log	00000001 Goodbye
00000001 g     O .log	00000001 Hello, world!
00000001         .log	00000000 __log_warning_start__

There's now an extra symbol, __log_warning_start__, in the .log section. The address of this symbol is the address of the first warning message. Symbols with addresses lower than this value are errors, and the rest of symbols are warnings.

With an appropriate decoder you could get the following human readable output from all this information:

WARNING Hello, world!
ERROR Goodbye

If you liked this section check out the stlog logging framework which is a complete implementation of this idea.