
I have a lot of repetitive code in my unit tests, which looks like this:

#[rustfmt::skip]
let bytes = [
    0x00, // Byte order
    0x00, 0x00, 0x00, 0x02, // LineString
    0x00, 0x00, 0x00, 0x02, // Number of points
    0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 1.0
    0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 2.0
    0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 2.0
    0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 1.0
];

It's a concatenation of encoded primitives: u8, u32 and f64, which can be in big-endian or little-endian byte order. (For the curious: it's WKB.)
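
For reference, the hex literals are just the standard encodings of the primitives involved; for example, in big endian:

assert_eq!(2u32.to_be_bytes(), [0x00, 0x00, 0x00, 0x02]);
assert_eq!(1.0f64.to_be_bytes(), [0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]);
assert_eq!(2.0f64.to_be_bytes(), [0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]);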

Of course, this code is not very readable or maintainable. I'd like to clean it up like this:

/// 1.0f64, big endian.
const ONE_BE: [u8; 8] = [0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];
/// 2.0f64, big endian.
const TWO_BE: [u8; 8] = [0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];

let bytes = [
    0x00, // Byte order
    0x00, 0x00, 0x00, 0x02, // LineString
    0x00, 0x00, 0x00, 0x02, // Number of points
    ...ONE_BE, ...TWO_BE,
    ...TWO_BE, ...ONE_BE,
];

Unfortunately, the ... syntax is only my invention, not actual Rust. I tried using a (declarative) macro instead, but a macro invocation in this position can only expand to a single expression, not to a comma-separated sequence of array elements.

What's the most ergonomic way to accomplish this?

Keep in mind that I have only a small number of constants like ONE_BE and TWO_BE, but a large (and growing) number of tests that use these constants. So the less boilerplate in the actual tests, the better.

It's just test setup code, so performance is not a concern.

  • I'm not a Rust expert or anything, but you might look into creating a separate structure and then using std::mem::transmute to turn it into a byte slice. Commented Sep 2, 2023 at 8:27
  • I'm trying to avoid that, because it introduces additional assumptions (endianness!) and makes it less clear what's actually going into the function that's being tested. Also, the size (length) is not fixed. Commented Sep 2, 2023 at 8:28
  • transmute works by reinterpreting a memory region as another type; I'm not sure what you mean by an endianness assumption here. Though if the length is not fixed, I guess transmuting to a fixed-size type is out of the question. I don't have any better ideas. Commented Sep 2, 2023 at 8:32
  • The memory representation of u32 and f64 is not fixed; it depends on the platform. So transmute would give you a different result on, for example, x86_64 vs. ARM. Commented Sep 2, 2023 at 8:34
  • I think what @bumbread meant was transmuting (ARRAY1, ARRAY2), where ARRAY1 is a [u8; N] and ARRAY2 is a [u8; M], into [u8; N + M] (that is, "flattening" the arrays). I'm not sure whether this is safe, but that would not have endianness issues. Commented Sep 2, 2023 at 9:11

3 Answers


You can do this purely at compile-time with no unsafe by using const fn:

const fn eval(data: &[&[u8]]) -> [u8; 41] {
    let mut result = [0; 41];
    
    let mut i = 0;
    let mut result_i = 0;
    while i < data.len() {
        let mut j = 0;
        while j < data[i].len() {
            result[result_i] = data[i][j];
            result_i += 1;
            j += 1;
        }
        i += 1;
    }
    
    result
}

const BYTES: [u8; 41] = eval(&[
    &[0x00], // Byte order
    &[0x00, 0x00, 0x00, 0x02], // LineString
    &[0x00, 0x00, 0x00, 0x02], // Number of points
    &ONE_BE, &TWO_BE,
    &TWO_BE, &ONE_BE,
]);
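
With BYTES in place, each test only needs its assertions; a quick sketch (assuming ONE_BE and TWO_BE from the question are in scope):

#[test]
fn bytes_has_the_expected_layout() {
    assert_eq!(BYTES.len(), 41);        // 1 + 4 + 4 + 4 * 8
    assert_eq!(&BYTES[9..17], &ONE_BE); // first coordinate of the first point
}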

Edit: You can deduce the size with a macro, like the following:

const fn eval<const N: usize>(data: &[&[u8]]) -> [u8; N] {
    let mut result = [0; N];
    
    let mut i = 0;
    let mut result_i = 0;
    while i < data.len() {
        let mut j = 0;
        while j < data[i].len() {
            result[result_i] = data[i][j];
            result_i += 1;
            j += 1;
        }
        i += 1;
    }
    
    result
}

const fn count_len(arr: &[&[u8]]) -> usize {
    let mut result = 0;
    let mut i = 0;
    while i < arr.len() {
        result += arr[i].len();
        i += 1;
    }
    result
}

macro_rules! declare_const {
    ( const $const_name:ident = [ $($data:tt)* ] ) => {
        const $const_name: [u8; count_len(&[ $($data)* ])] = eval(&[ $($data)* ]);
    };
}

declare_const!(const BYTES = [
    &[0x00], // Byte order
    &[0x00, 0x00, 0x00, 0x02], // LineString
    &[0x00, 0x00, 0x00, 0x02], // Number of points
    &ONE_BE, &TWO_BE,
    &TWO_BE, &ONE_BE,
]);
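
Since the macro expands to an item, it can also be used for a test-local constant, which keeps the per-test boilerplate down (a sketch; parse_wkb is a hypothetical stand-in for the function under test):

#[test]
fn parses_line_string() {
    declare_const!(const BYTES = [
        &[0x00], // Byte order
        &[0x00, 0x00, 0x00, 0x02], // LineString
        &[0x00, 0x00, 0x00, 0x02], // Number of points
        &ONE_BE, &TWO_BE,
        &TWO_BE, &ONE_BE,
    ]);
    // parse_wkb(&BYTES); // hypothetical function under test
    assert_eq!(BYTES.len(), 41);
}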

2 Comments

This is somewhat nice, but hardcoding the total length of the array makes it useless. We could make eval const generic to get around that. However, I'd still have to manually calculate the length of the array at each call site, and the penalty for getting it wrong is pretty severe (playground).
@Thomas Edited to show how you can deduce the size automatically.

As pointed out in the comments, this can be achieved with a data structure that can be transmuted to a byte array; a #[repr(C)] struct containing only bytes or arrays thereof would do:

#[repr(C)]
struct Line<const N: usize> {
    byte_order: u8,
    line_string: [u8; 4],
    number_of_points: [u8; 4],
    points: [[[u8; 8]; 2]; N],
}
const ONE: [u8; 8] = [0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];
const TWO: [u8; 8] = [0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];
const BYTES: [u8; 41] = unsafe {
    std::mem::transmute(Line {
        byte_order: 0,
        line_string: [0x00, 0x00, 0x00, 0x02],
        number_of_points: [0x00, 0x00, 0x00, 0x02], // Number of points
        points: [[ONE, TWO], [TWO, ONE]],
    })
};
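
Because Line<N> is #[repr(C)] and every field is u8-based, there is no padding and the bytes appear in declaration order, so the overall size is 1 + 4 + 4 + 16 * N. For illustration, a couple of sanity checks on that layout assumption (the test name is arbitrary):

// No padding: every field is u8 or an array of u8, so Line<2> is exactly 41 bytes.
const _: () = assert!(std::mem::size_of::<Line<2>>() == 41);

#[test]
fn first_point_is_one_two() {
    assert_eq!(&BYTES[9..17], &ONE);  // x of the first point
    assert_eq!(&BYTES[17..25], &TWO); // y of the first point
}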

1 Comment

Alas, this isn't flexible enough for my use case. I need to be able to test with various kinds of invalid data, for example.

I wrote a macro:

/// Concatenates all given byte arrays into a vector.
macro_rules! concat_bytes {
    ($($array:expr),* $(,)?) => {{
        let mut vec = Vec::<u8>::new();
        $(
            vec.extend($array);
        )*
        vec
    }}
}

Now I can do:

/// 1.0f64, big endian.
const ONE_BE: [u8; 8] = [0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];
/// 2.0f64, big endian.
const TWO_BE: [u8; 8] = [0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];

let bytes = concat_bytes![
    // Individual bytes must be in brackets, but I can live with that:
    [0x00], // Byte order
    [0x00, 0x00, 0x00, 0x02], // LineString
    [0x00, 0x00, 0x00, 0x02], // Number of points
    // Constant arrays:
    ONE_BE, TWO_BE,
    // Or even this:
    2.0f64.to_be_bytes(), 1.0f64.to_be_bytes(),
];

This technique also lets me introduce helper functions if I want.
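
For example (a sketch; point_be is only an illustration, not something the tests require):

/// Hypothetical helper: one big-endian WKB coordinate pair.
fn point_be(x: f64, y: f64) -> Vec<u8> {
    concat_bytes![x.to_be_bytes(), y.to_be_bytes()]
}

let bytes = concat_bytes![
    [0x00],                   // Byte order
    [0x00, 0x00, 0x00, 0x02], // LineString
    [0x00, 0x00, 0x00, 0x02], // Number of points
    point_be(1.0, 2.0),
    point_be(2.0, 1.0),
];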

The result is a Vec built at runtime rather than an array literal, but in this case that doesn't matter. If anyone comes up with a purely compile-time answer, I'll accept that instead.

5 Comments

Probably better to use lazy initialization, like docs.rs/once_cell.
@ChayimFriedman Why?
Just because it's easier this way: you don't need to pass the parameter to each function (but it may be a little (little! very little!) slower).
Pass what parameter to which function?
The functions that need to access BYTES.
