
Discussion: A better Message trait #156

Open
GabrielDertoni opened this issue Mar 19, 2023 · 20 comments
@GabrielDertoni
Contributor

Introduction

Currently the Message trait is defined like this:

pub trait Message
where
    Self: Sized,
{
    fn message_id(&self) -> u32;
    fn message_name(&self) -> &'static str;

    /// Serialize **Message** into byte slice and return count of bytes written
    fn ser(&self, version: MavlinkVersion, bytes: &mut [u8]) -> usize;

    fn parse(
        version: MavlinkVersion,
        msgid: u32,
        payload: &[u8],
    ) -> Result<Self, error::ParserError>;

    fn message_id_from_name(name: &str) -> Result<u32, &'static str>;
    fn default_message_from_id(id: u32) -> Result<Self, &'static str>;
    fn extra_crc(id: u32) -> u8;
}

Types implementing Message are enums over several *_DATA types that store the actual data for each message. Each of those types has ser and deser methods, but there is no unifying trait to encompass this behaviour. They also have an associated constant ENCODED_LEN and implement Default.

Current limitations and issues

  1. Returning a Result<T, &'static str> isn't very helpful in this case. There is a single reason for this to fail: the message with the given name/id doesn't exist! However, the type suggests that many different error messages could be emitted. A better type would be Result<T, MessageDoesNotExist> or simply Option<T>.
  2. message_id_from_name, default_message_from_id, extra_crc, message_id and message_name all retrieve metadata about a message, though some require an instance of the message and others are static. We could instead return a &'static MessageMeta type that stores all the metadata in a single place. This would allow things like getting the message name from an id without having to construct the message. It would also avoid multiple calls when we want several pieces of metadata. The metadata struct can also be passed around, which is pretty useful.
  3. In order to get a default instance of a message we need to construct it, even if we only need a reference to it. So, this MessageMeta struct could also store the default value for the message. Then we would have one MessageMeta static for every message *_DATA.
  4. Currently there is a mandatory dependency on heapless, but very little of the library is actually used (only Vec) and those uses could just be standard Rust arrays.
  5. Instead of implementing Bytes and BytesMut, use the bytes crate. It's a tiny crate that does pretty much the same as this crate's implementation, but uses a trait instead of a type, and implements the trait for slices.

A better interface

pub struct MessageMeta<M: Message> {
    pub id: u32,
    pub name: &'static str,
    pub extra_crc: u8,
    pub serialized_len: u8,
    pub default: M,
}

pub trait Message: Sized + 'static {
    fn meta(&self) -> &'static MessageMeta<Self>;
    fn meta_from_id(id: u32) -> Option<&'static MessageMeta<Self>>;
    fn meta_from_name(name: &str) -> Option<&'static MessageMeta<Self>>;

    fn serialize(&self, version: MavlinkVersion, bytes: &mut [u8]) -> usize;

    fn deserialize(
        version: MavlinkVersion,
        msgid: u32,
        payload: &[u8],
    ) -> Result<Self, error::ParserError>;
}

Additionally, every message could implement another trait:

pub trait MessageData<M: Message>: Default + Into<M> {
    fn meta() -> &'static MessageMeta<M>;

    fn serialize_payload(&self, version: MavlinkVersion, payload: &mut [u8]) -> usize;
    fn deserialize_payload(version: MavlinkVersion, payload: &[u8]) -> Result<Self, ParserError>;
}

This trait is parameterized on M: Message since a single *_DATA type can be used in multiple message enums. Issues 4 and 5 concern the internal message implementation rather than the external interface, but switching over is pretty trivial. For serde, we can use serde_arrays, which is also tiny, at least until serde supports const generics.
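To make the proposal concrete, here is a minimal, self-contained sketch of the kind of code the generator could emit for one message under this design. The id, extra CRC and payload length mirror the real HEARTBEAT values, but everything here is illustrative, not the crate's actual generated output:

```rust
// Hypothetical sketch of generated code under this proposal; names and
// field layouts are illustrative only.

pub struct MessageMeta<M: 'static> {
    pub id: u32,
    pub name: &'static str,
    pub extra_crc: u8,
    pub serialized_len: u8,
    pub default: M,
}

#[allow(non_camel_case_types)]
pub struct HEARTBEAT_DATA {
    pub custom_mode: u32,
    pub mavtype: u8,
}

pub enum MavMessage {
    HEARTBEAT(HEARTBEAT_DATA),
}

// One static per message: every piece of metadata lives in a single place,
// including a default instance that can be borrowed without constructing one.
static HEARTBEAT_META: MessageMeta<MavMessage> = MessageMeta {
    id: 0,
    name: "HEARTBEAT",
    extra_crc: 50,
    serialized_len: 9,
    default: MavMessage::HEARTBEAT(HEARTBEAT_DATA { custom_mode: 0, mavtype: 0 }),
};

impl MavMessage {
    // Metadata lookup by id, without constructing a message first.
    pub fn meta_from_id(id: u32) -> Option<&'static MessageMeta<MavMessage>> {
        match id {
            0 => Some(&HEARTBEAT_META),
            _ => None,
        }
    }

    // Metadata lookup by name.
    pub fn meta_from_name(name: &str) -> Option<&'static MessageMeta<MavMessage>> {
        match name {
            "HEARTBEAT" => Some(&HEARTBEAT_META),
            _ => None,
        }
    }
}

fn main() {
    let meta = MavMessage::meta_from_id(0).unwrap();
    assert_eq!(meta.name, "HEARTBEAT");
    assert_eq!(MavMessage::meta_from_name("HEARTBEAT").unwrap().extra_crc, 50);
    assert!(MavMessage::meta_from_id(9999).is_none());
}
```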

Also, using bytes might make it easier to integrate with other libraries in the future, such as tokio_util::codec::Decoder.

Final notes

I have implemented pretty much all of these changes already here, but I would like some feedback on whether these are things you would be interested in changing in the library. It's a big change, but I think the API would be much better for it. Anyway, let me know what you think!

@qwerty19106
Contributor

  1. I removed the bytes dependency in #136 because it does not support no_std mode.
    At first I wanted to write a PR for bytes, but its API strongly depends on alloc. I think such a PR would be rejected to keep backward compatibility.
  2. std::Vec was replaced with heapless::Vec in #136.
    The parser.rs contains the MavType::rust_type method, which supports the mavlink Array type.
    But stable Rust cannot yet provide Default for [T; N] where N > 32.

P.S. The MessageData is a very good idea. It would allow reducing firmware flash size on microcontrollers.
Today my firmware requires 265224 bytes of flash. I expect it would be reduced to ~70000 bytes.

@GabrielDertoni
Contributor Author

GabrielDertoni commented Mar 20, 2023

As far as I can tell, bytes does support no_std: you can simply add default-features = false to its dependency in Cargo.toml. There is a std feature that enables the standard library, and it is only required for the Bytes and BytesMut data structures, which is not what we would want anyway.

The problem with arrays is that the Default trait is implemented for [T; 0] with any T, but requires T: Default when N > 0. Since there is no way to express this with const generics yet, the implementation hasn't changed since the introduction of const generics and is still limited to N <= 32. So the #[derive(Default)] macro won't work here. However, we can just generate the Default impl ourselves and use [0; N] to initialize, which is supported in stable Rust.

PS: Did you mean MessageMeta? I think this is great!

@qwerty19106
Contributor

qwerty19106 commented Mar 20, 2023

bytes supports no_std but requires alloc. alloc uses the heap, which is not recommended for most no_std targets, such as Cortex-M and RISC-V microcontrollers.

You cannot impl Default for [T; N] in this crate, because std contains both Default and [T; N]. Rust requires such an impl to be placed in std as well.

A const Default is possible on nightly (#67792).

@GabrielDertoni
Contributor Author

Oh I see, I will try to make a PR for them. As far as I can tell they only use alloc for Bytes and BytesMut, not for the Buf and BufMut traits we're using.

What I mean is generating the Default impl for the *_DATA types. For example, let's say you are going to generate a type

#[derive(Default, Clone)]
pub struct MY_SPECIAL_MESSAGE_DATA {
    pub my_field: [u8; 128],
}

instead, generate

#[derive(Clone)]
pub struct MY_SPECIAL_MESSAGE_DATA {
    pub my_field: [u8; 128],
}

impl MY_SPECIAL_MESSAGE_DATA {
    const DEFAULT: Self = MY_SPECIAL_MESSAGE_DATA {
        my_field: [0; 128],
    };
}

impl Default for MY_SPECIAL_MESSAGE_DATA {
    fn default() -> Self {
        Self::DEFAULT.clone()
    }
}

This works on stable Rust and is already implemented here, if you'd like to take a look.

@qwerty19106
Contributor

That is a good idea. Could you make a small PR which removes heapless::Vec from the parser?

I need to think more about MessageMeta and MessageData; I think they can be simplified.

@GabrielDertoni
Contributor Author

Will do.

My only concerns with MessageMeta are:

  1. Does it generate less code? I think so.
  2. Is it any slower to compile? Compile times are already pretty bad.
  3. Do we actually want a default field in MessageMeta? That requires a generic type param, which could be avoided if this was removed.

@qwerty19106
Contributor

I think ***_DATA::default() will be inlined, so the default field is not necessary.

Traits have supported associated constants in stable Rust since 1.20.
I propose this API:

pub trait MessageData {
    const ID: u32;
    const NAME: &'static str;
    const EXTRA_CRC: u8;
    const SERIALIZED_LEN: u8;

    fn serialize_payload(&self, version: MavlinkVersion, payload: &mut [u8]) -> usize;
    fn deserialize_payload(version: MavlinkVersion, payload: &[u8]) -> Result<Self, ParserError>;
}

impl MAVLinkV[1|2]MessageRaw {
    ...
    pub fn serialize_message_data(&mut self, header: MavHeader, message_data: &impl MessageData) {
        ...
    }
}

pub fn read_v[1|2]_msg_data<D: MessageData, R: Read>(
    read: &mut R,
) -> Result<(MavHeader, D), error::MessageReadError> {
    ...
}

It allows reading and writing ***_DATA without Message, so the .text size on microcontrollers would be reduced by ~200 KB.
Typical flash size (Blue Pill STM32F103C8T6) is 128 KB.
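A rough, self-contained sketch of how this associated-constant API could look in use, with a hypothetical one-field HEARTBEAT_DATA standing in for a generated type. The MavlinkVersion parameter is omitted, and the field layout and header arithmetic are assumptions for illustration, not the crate's real definitions:

```rust
// Simplified sketch of the associated-constant MessageData API; names,
// values and the serialization logic are illustrative only.

pub trait MessageData: Sized {
    const ID: u32;
    const NAME: &'static str;
    const EXTRA_CRC: u8;
    const SERIALIZED_LEN: u8;

    fn serialize_payload(&self, payload: &mut [u8]) -> usize;
}

#[allow(non_camel_case_types)]
#[derive(Default)]
pub struct HEARTBEAT_DATA {
    pub custom_mode: u32,
}

impl MessageData for HEARTBEAT_DATA {
    const ID: u32 = 0;
    const NAME: &'static str = "HEARTBEAT";
    const EXTRA_CRC: u8 = 50;
    const SERIALIZED_LEN: u8 = 9;

    fn serialize_payload(&self, payload: &mut [u8]) -> usize {
        // Only the one illustrative field is written here.
        payload[..4].copy_from_slice(&self.custom_mode.to_le_bytes());
        4
    }
}

// Generic helpers monomorphize per message type, so only the messages a
// firmware actually uses end up in .text -- no whole-enum match is pulled in.
fn max_frame_len<D: MessageData>() -> usize {
    // MAVLink v2: 10-byte header + payload + 2-byte checksum (no signature).
    10 + D::SERIALIZED_LEN as usize + 2
}

fn main() {
    let mut buf = [0u8; 16];
    let msg = HEARTBEAT_DATA { custom_mode: 7 };
    assert_eq!(msg.serialize_payload(&mut buf), 4);
    assert_eq!(&buf[..4], &7u32.to_le_bytes());
    assert_eq!(max_frame_len::<HEARTBEAT_DATA>(), 21);
}
```

Because only the monomorphized helpers for the types actually referenced are generated, the linker can drop everything else, which is the flash saving being discussed.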

@GabrielDertoni
Contributor Author

I think this is better than what we have now, but it doesn't really solve the issue of how to get message metadata from the message id or name. For that, I think using a concrete type is better, since we can also pass it around without needing dyn Trait. I don't think you can access the constants through dyn Trait either, which is possible with MessageMeta. Maybe for the DEFAULT we could make it an associated constant in the trait, but having the field with the default is more flexible. Since we can put all this metadata inside a single type, I think it's better than having it in the trait. It would also allow other operations, such as iterating through all the message metadata, to be added in the future.

@qwerty19106
Contributor

Your API requires ~16 bytes for each MessageMeta object.
For mavlink::common::Message, RAM usage will increase (if I call Message::meta in my code) by 203*16 = 32 KB.
Typical RAM size for microcontrollers (Blue Pill STM32F103C8T6) is 20 KB, so this is totally unacceptable for most microcontrollers.

My suggestion is to separate the convenient Message API from the less convenient but fast and lightweight low-level ***_DATA API. One could use just the few ***_DATA structs that are necessary in a no_std context, and skip Message when it is not needed.
The Message API would remain as is, with its code calling into the ***_DATA API. I will create a PR a little later.

@GabrielDertoni
Contributor Author

GabrielDertoni commented Mar 21, 2023

I don't understand how this can increase memory usage in comparison to the current API. Currently code needs to be generated for the Default impls anyway. All this metadata is already in the binary and will occupy RAM regardless of this change. The only difference is that by making it a static object it will live in the data segment instead of text. Do you have some concrete examples that I can use to measure this impact?

If you only want to use a few messages, just use MessageData::meta() instead. Sure, you'll pay the full size of the metadata struct for each one you use, but can't dead code elimination get rid of the rest?

@GabrielDertoni
Contributor Author

I got interested and ran some tests. I used the embedded example provided by this crate in order to get some binary size measurements. Note that the example uses no metadata information, so it should be eliminated as dead code. (The implementation of this proposal also uses the bytes crate, which may account for some noise in the measurements.)

From the results below, it's clear that the compiler can eliminate unreachable MessageMeta. I also tested using MessageData::meta() and it has the same footprint as the case that doesn't use Message::meta().

Embedded example

Impl      | Binary size | .text size | .data size | .bss size
----------|-------------|------------|------------|----------
proposed  | 92K         | 82548      | 0          | 4
proposed* | 36K         | 28592      | 0          | 4
baseline  | 28K         | 20316      | 0          | 4

Embedded example (using Message trait to get message id)

Impl      | Binary size | .text size | .data size | .bss size
----------|-------------|------------|------------|----------
proposed* | 88K         | 80592      | 0          | 4
baseline  | 100K        | 92068      | 0          | 4

(*): Without the MessageData::default field.

How to reproduce

  1. Go to examples/embedded directory
  2. Edit Cargo.toml and add strip = true below [profile.release].
  3. Build with cargo build --release
  4. Measure total binary size with du -h target/thumbv7em-none-eabihf/release/mavlink-embedded
  5. Measure text, data and bss size with size target/thumbv7em-none-eabihf/release/mavlink-embedded

Changes to the example in order to use metadata

diff --git a/examples/embedded/src/main.rs b/examples/embedded/src/main.rs
index 4a71ca8..b3adc56 100644
--- a/examples/embedded/src/main.rs
+++ b/examples/embedded/src/main.rs
@@ -9,7 +9,7 @@ use panic_halt as _;
 use cortex_m_rt::entry;
 use hal::pac;
 use hal::prelude::*;
-use mavlink;
+use mavlink::Message;
 use stm32f3xx_hal as hal;
 
 #[entry]
@@ -57,7 +57,7 @@ fn main() -> ! {
     );
 
     // Break serial in TX and RX (not used)
-    let (mut tx, _) = serial.split();
+    let (mut tx, mut rx) = serial.split();
 
     // Create our mavlink header and heartbeat message
     let header = mavlink_header();
@@ -72,6 +72,10 @@ fn main() -> ! {
         mavlink::write_versioned_msg(&mut tx, mavlink::MavlinkVersion::V2, header, &heartbeat)
             .unwrap();
 
+        let (_, msg) = mavlink::read_versioned_msg::<mavlink::common::MavMessage, _>(&mut rx, mavlink::MavlinkVersion::V2).unwrap();
+        let id = msg.meta().id.to_be_bytes();
+        tx.write(id[0]).unwrap();
+
         // Toggle the LED
         led.toggle().unwrap();

For the version on master I just used msg.message_id() instead of msg.meta().id.

@qwerty19106
Contributor

The embedded example is very simple. In real code I use an RTOS, allocators, lock-free queues, etc.
To send a message, I first get an object from the allocator:

use heapless::{mpmc::Q4, object_pool, Object};
pub const TX_POOL_SIZE: usize = 4;
object_pool!(TX_POOL: Message);

let message = TX_POOL::alloc().unwrap();

then initialize the message:

*message = mavlink_heartbeat_message();

put the message into a lock-free queue and trigger an interrupt:

pub static TX_QUEUE: Q4<Object<TX_POOL>> = Q4::new();

TX_QUEUE.enqueue(message).unwrap();
cortex_m::peripheral::NVIC::pend(Interrupt::DMA1_STREAM5);

then get the message from the lock-free queue inside the DMA1_STREAM5 interrupt,
serialize it, and send it over DMA:

#[interrupt]
fn DMA1_STREAM5() {
    ...

    if let Some(message) = TX_QUEUE.deque() {
        let mut raw_message = MAVLinkV2MessageRaw::new();
        raw_message.serialize_message(header, message);

        // Send raw_message by UART DMA
        ...
    }
}

The compiler generates the full .text code (~80 KB for TX and ~120 KB for RX), because from the compiler's point of view the message being initialized and the message being sent over DMA are unrelated to each other.
Similarly, it will not be able to optimize the .data for meta.

Setting aside the name and default fields, why can't the other fields be constants?

@GabrielDertoni
Contributor Author

Yes, it can be made constant. In fact, the entire MessageMeta could be constant and all of that would get inlined at compile time. So I think it is very reasonable to have

trait MessageData {
    const META: MessageMeta;
    /*...*/
}

But this has nothing to do with the reason your use case generates a lot of code. I can see how using MessageMeta would be harmful in this scenario, since the compiler will need to generate the entire struct when all you need is the id. However, for these constrained environments even using Message::message_id will generate around 1.3 KiB (as reported by cargo bloat). If you end up using the other metadata functions you will have to generate a lot more code anyway, but for the very common use case of getting the message id we can simply do much better! If we have an instance of Message we can just look at its enum discriminant and generate zero extra code.

For code generation this would mean we use

#[repr(u32)]
pub enum MavMessage {
    VARIANT(VARIANT_DATA) = <id>,
    /*...*/
}

impl Message for MavMessage {
    fn message_id(&self) -> u32 {
        // SAFETY: Because `Self` is marked `repr(u32)`, its layout is a `repr(C)` `union`
        // between `repr(C)` structs, each of which has the `u32` discriminant as its first
        // field, so we can read the discriminant without offsetting the pointer.
        unsafe { *<*const _>::from(self).cast::<u32>() }
    }
}

The code and safety comment were taken from the discriminant docs. This is orthogonal to the main proposal; what do you think of it? Should I open a separate PR?
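The trick can be checked in isolation. The stand-in enum below uses three real MAVLink common-set ids (HEARTBEAT = 0, PARAM_VALUE = 22, STATUSTEXT = 253) but is otherwise a made-up miniature of MavMessage; note that explicit discriminants on data-carrying enums require Rust 1.66+:

```rust
// Standalone check of the discriminant-reading trick; `Msg` is a tiny
// stand-in for MavMessage, not the generated enum.

#[repr(u32)]
pub enum Msg {
    Heartbeat(u8) = 0,
    ParamValue(u16) = 22,
    Statustext(u32) = 253,
}

impl Msg {
    pub fn message_id(&self) -> u32 {
        // SAFETY: Because `Self` is marked `repr(u32)`, its layout is a
        // `repr(C)` `union` between `repr(C)` structs, each of which has the
        // `u32` discriminant as its first field, so we can read the
        // discriminant without offsetting the pointer.
        unsafe { *(self as *const Self as *const u32) }
    }
}

fn main() {
    assert_eq!(Msg::Heartbeat(0).message_id(), 0);
    assert_eq!(Msg::ParamValue(1).message_id(), 22);
    assert_eq!(Msg::Statustext(9).message_id(), 253);
}
```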

With the current proposal this would mean we actually have two ways of accessing the message id:

  • msg.meta().id
  • msg.message_id()

But I think this is still very acceptable, since MessageMeta is something more general you can pass around, and message_id() is an optimized function for a specific (but very common) use case. I must also note that MessageMeta could hold more metadata about the message itself, taken from the message definition, which is useful in other, less constrained scenarios. These other fields could be opt-in through feature flags.

@qwerty19106
Contributor

At first it seemed like a good idea. But then I checked the generated assembly code on Cortex-M4 (ARM).

#![no_std]

#[repr(u32)]
pub enum E {
    A(u8) = 5,
    B(u16) = 18,
    C(u32) = 32,
    D(&'static str) = 180,
    E(u32) = 64,
    F(u32) = 133,
    G(u32) = 1240,
}

#[inline(never)]
pub fn match1(e: E)-> u32 {
    match e {
        E::A(v) => v as u32,
        E::B(v) => v as u32,
        E::C(v) => v,
        E::D(v) => v.len() as u32,
        E::E(v) => v,
        E::F(v) => v,
        E::G(v) => v,
    }
}
-C opt-level=3 --target thumbv7em-none-eabihf -C debuginfo=0
example::match1:
        ldr     r1, [r0]
        cmp     r1, #63
        ble     .LBB0_3
        cmp     r1, #179
        bgt     .LBB0_5
        cmp     r1, #64
        b       .LBB0_6
.LBB0_3:
        cmp     r1, #5
        itt     eq
        ldrbeq  r0, [r0, #4]
        bxeq    lr
        cmp     r1, #18
        itt     eq
        ldrheq  r0, [r0, #4]
        bxeq    lr
        b       .LBB0_6
.LBB0_5:
        cmp     r1, #180
        itt     eq
        ldreq   r0, [r0, #8]
        bxeq    lr
.LBB0_6:
        ldr     r0, [r0, #4]
        bx      lr

It is a long if-else sequence.
But for the usual case

#![no_std]

pub enum E {
    A(u8),
    B(u16),
    C(u32),
    D(&'static str),
    E(u32),
    F(u32),
    G(u32),
}

the compiler generates one tbb table (placed in FLASH):

example::match1:
        ldrb    r1, [r0]
        tbb     [pc, r1]
        .byte   (.LBB0_3-(.LCPI0_0+4))/2
        .byte   (.LBB0_4-(.LCPI0_0+4))/2
        .byte   (.LBB0_2-(.LCPI0_0+4))/2
        .byte   (.LBB0_5-(.LCPI0_0+4))/2
        .byte   (.LBB0_2-(.LCPI0_0+4))/2
        .byte   (.LBB0_2-(.LCPI0_0+4))/2
        .byte   (.LBB0_2-(.LCPI0_0+4))/2
        ldr     r0, [r0, #4]
        bx      lr
        ldrb    r0, [r0, #1]
        bx      lr
        ldrh    r0, [r0, #2]
        bx      lr
        ldr     r0, [r0, #8]
        bx      lr

because the discriminants are 0, 1, 2, ...

For mavlink::common::MavMessage the compiler generates 3 tbb tables and 5 if-else chains.
I checked the aarch64-unknown-linux-gnu target as well; the result is the same.
I guess the assembly code will be analogous for x86 and x86-64.

Summary

The msg.message_id() will be very fast, but any match msg { ... all variants ... } will be slow. It will also require more FLASH than usual.

@patrickelectric
Member

Interesting finds!

@GabrielDertoni
Contributor Author

GabrielDertoni commented Mar 24, 2023

Oh, interesting... But I guess this example isn't large enough to represent a real use case. In particular, pretty much all message ids are in the range 0..256 because of MAVLink V1 compatibility, and message ids are not sparse. So in fact you get many, many messages in that range and only a few outside of it, which won't generate code as bad as simply a bunch of if-elses. In fact, when using mavlink::common::MavMessage, the enum is so large that you need some jumps in the middle anyway because (I think) the table instruction can't reach that far. I have written a closer-to-real-world example here if you want to check it out. It will still generate tables, just like it would without the explicit discriminants, but it will generate one table for each chunk of message ids. In the example, if you compile with --cfg discriminant the explicit discriminants will be used, and if you remove this flag Rust will assign the discriminant values as it wishes. I also added the message_id function: in the --cfg discriminant case it is a single instruction (not counting the function's return), and without it there is another KiB of generated code.

Summary

match msg { ... all variants ... } is probably just as fast with explicit discriminants as without them, since in real-world cases message ids are not sparse and lie mostly in the range 0..256, so the compiler will generate jump tables too. The previous example does not represent real-world scenarios well, since it assumes sparse message ids and only a few variants. Example.

@qwerty19106
Contributor

Your example with --cfg discriminant generates 2 tables containing 216 .LBB0_210 entries (padding). That gives ~432 bytes.
Without --cfg discriminant the compiler generates 1 table with no padding.

With --cfg discriminant, fn message_id(&self) executes in 1 tick (when the function is inlined).
Without --cfg discriminant it executes in ~5 ticks (when inlined). On ARM that is already fast (but I don't know x86 assembly well enough to check there).

@GabrielDertoni
Contributor Author

Great, so can we agree that this optimization is indeed better overall? Although fn message_id(&self) is still fast without --cfg discriminant, that's not the point of the optimization; the point is saving ~1 KiB of unnecessary code. And jumping through two tables instead of one for match msg { ... all variants ... } is still acceptable.

@qwerty19106
Contributor

With --cfg discriminant the compiler generates 2 tables (836 bytes) for match msg { ... all variants ... }, and no table (~4 bytes) for fn message_id(&self).
Without --cfg discriminant the compiler generates 1 table (406 bytes) for match msg { ... all variants ... }, and 1 table (406 bytes) for fn message_id(&self).

Thus the FLASH consumption is approximately the same.

@GabrielDertoni
Contributor Author

Well, yes. But that's only if you actually use match msg { ... all variants ... }, and if you're doing that you'll have to pay for it. The optimization is for the use case where you ONLY want message_id, and in that case you should pay nothing extra! If you want to get the message name and other metadata, THEN you pay for what you use. The problem this addresses is that paying the extra 1 KiB is currently always required, even if all you need is the message id.
