
I have a lot of repetitive code in my unit tests, which looks like this:

#[rustfmt::skip]
let bytes = [
    0x00, // Byte order
    0x00, 0x00, 0x00, 0x02, // LineString
    0x00, 0x00, 0x00, 0x02, // Number of points
    0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 1.0
    0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 2.0
    0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 2.0
    0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 1.0
];

It's a concatenation of encoded primitives: u8, u32 and f64, which can be in big-endian or little-endian byte order. (For the curious: it's WKB.)
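
For reference, the hex literals are just the standard encodings of the primitives involved; for example, in big endian:

assert_eq!(2u32.to_be_bytes(), [0x00, 0x00, 0x00, 0x02]);
assert_eq!(1.0f64.to_be_bytes(), [0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]);
assert_eq!(2.0f64.to_be_bytes(), [0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00]);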

Of course, this code is not very readable or maintainable. I'd like to clean it up like this:

/// 1.0f64, big endian.
const ONE_BE: [u8; 8] = [0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];
/// 2.0f64, big endian.
const TWO_BE: [u8; 8] = [0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];

let bytes = [
    0x00, // Byte order
    0x00, 0x00, 0x00, 0x02, // LineString
    0x00, 0x00, 0x00, 0x02, // Number of points
    ...ONE_BE, ...TWO_BE,
    ...TWO_BE, ...ONE_BE,
];

Unfortunately, the ... syntax is only my invention, not actual Rust. I tried using a (declarative) macro instead, but a macro invocation in this position can only expand to a single expression, not to a comma-separated sequence of array elements.

What's the most ergonomic way to accomplish this?

Keep in mind that I have only a small number of constants like ONE_BE and TWO_BE, but a large (and growing) number of tests that use these constants. So the less boilerplate in the actual tests, the better.

It's just test setup code, so performance is not a concern.

  • I'm not a Rust expert or anything, but you might look into creating a separate structure and then using std::mem::transmute to turn it into a byte slice. Commented Sep 2, 2023 at 8:27
  • I'm trying to avoid that, because it introduces additional assumptions (endianness!) and makes it less clear what's actually going into the function that's being tested. Also, the size (length) is not fixed. Commented Sep 2, 2023 at 8:28
  • transmute works by reinterpreting a memory region as another type; I'm not sure what you mean by an endianness assumption here. Though if the length is not fixed, I guess transmuting to a fixed-size type is out of the question. I don't have any better ideas. Commented Sep 2, 2023 at 8:32
  • The memory representation of u32 and f64 is not fixed; it depends on the platform. So transmute would give you a different result on, for example, x86_64 vs. ARM. Commented Sep 2, 2023 at 8:34
  • I think what @bumbread meant was transmuting (ARRAY1, ARRAY2), where ARRAY1 is a [u8; N] and ARRAY2 is a [u8; M], into [u8; N + M] (that is, "flattening" the arrays). I'm not sure whether this is safe, but that would not have endianness issues. Commented Sep 2, 2023 at 9:11

3 Answers


You can do this purely at compile-time with no unsafe by using const fn:

const fn eval(data: &[&[u8]]) -> [u8; 41] {
    let mut result = [0; 41];
    
    let mut i = 0;
    let mut result_i = 0;
    while i < data.len() {
        let mut j = 0;
        while j < data[i].len() {
            result[result_i] = data[i][j];
            result_i += 1;
            j += 1;
        }
        i += 1;
    }
    
    result
}

const BYTES: [u8; 41] = eval(&[
    &[0x00], // Byte order
    &[0x00, 0x00, 0x00, 0x02], // LineString
    &[0x00, 0x00, 0x00, 0x02], // Number of points
    &ONE_BE, &TWO_BE,
    &TWO_BE, &ONE_BE,
]);
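
With BYTES in place, each test only needs its assertions; a quick sketch (assuming ONE_BE and TWO_BE from the question are in scope):

#[test]
fn bytes_has_the_expected_layout() {
    assert_eq!(BYTES.len(), 41);        // 1 + 4 + 4 + 4 * 8
    assert_eq!(&BYTES[9..17], &ONE_BE); // first coordinate of the first point
}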

Edit: You can deduce the size with a macro, like the following:

const fn eval<const N: usize>(data: &[&[u8]]) -> [u8; N] {
    let mut result = [0; N];
    
    let mut i = 0;
    let mut result_i = 0;
    while i < data.len() {
        let mut j = 0;
        while j < data[i].len() {
            result[result_i] = data[i][j];
            result_i += 1;
            j += 1;
        }
        i += 1;
    }
    
    result
}

const fn count_len(arr: &[&[u8]]) -> usize {
    let mut result = 0;
    let mut i = 0;
    while i < arr.len() {
        result += arr[i].len();
        i += 1;
    }
    result
}

macro_rules! declare_const {
    ( const $const_name:ident = [ $($data:tt)* ] ) => {
        const $const_name: [u8; count_len(&[ $($data)* ])] = eval(&[ $($data)* ]);
    };
}

declare_const!(const BYTES = [
    &[0x00], // Byte order
    &[0x00, 0x00, 0x00, 0x02], // LineString
    &[0x00, 0x00, 0x00, 0x02], // Number of points
    &ONE_BE, &TWO_BE,
    &TWO_BE, &ONE_BE,
]);
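
Since the macro expands to an item, it can also be used for a test-local constant, which keeps the per-test boilerplate down (a sketch; parse_wkb is a hypothetical stand-in for the function under test):

#[test]
fn parses_line_string() {
    declare_const!(const BYTES = [
        &[0x00], // Byte order
        &[0x00, 0x00, 0x00, 0x02], // LineString
        &[0x00, 0x00, 0x00, 0x02], // Number of points
        &ONE_BE, &TWO_BE,
        &TWO_BE, &ONE_BE,
    ]);
    // parse_wkb(&BYTES); // hypothetical function under test
    assert_eq!(BYTES.len(), 41);
}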

2 Comments

This is somewhat nice, but hardcoding the total length of the array makes it useless. We could make eval const generic to get around that. However, I'd still have to manually calculate the length of the array at each call site, and the penalty for getting it wrong is pretty severe (playground).
@Thomas Edited to show how you can deduce the size automatically.

As pointed out in the comments, this can be achieved with a data structure that can be transmuted to a byte array; a #[repr(C)] struct containing only bytes or arrays thereof would do:

#[repr(C)]
struct Line<const N: usize> {
    byte_order: u8,
    line_string: [u8; 4],
    number_of_points: [u8; 4],
    points: [[[u8; 8]; 2]; N],
}
const ONE: [u8; 8] = [0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];
const TWO: [u8; 8] = [0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];
const BYTES: [u8; 41] = unsafe {
    std::mem::transmute(Line {
        byte_order: 0,
        line_string: [0x00, 0x00, 0x00, 0x02],
        number_of_points: [0x00, 0x00, 0x00, 0x02], // Number of points
        points: [[ONE, TWO], [TWO, ONE]],
    })
};
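
Because Line<N> is #[repr(C)] and every field is u8-based, there is no padding and the bytes appear in declaration order, so the overall size is 1 + 4 + 4 + 16 * N. For illustration, a couple of sanity checks on that layout assumption (the test name is arbitrary):

// No padding: every field is u8 or an array of u8, so Line<2> is exactly 41 bytes.
const _: () = assert!(std::mem::size_of::<Line<2>>() == 41);

#[test]
fn first_point_is_one_two() {
    assert_eq!(&BYTES[9..17], &ONE);  // x of the first point
    assert_eq!(&BYTES[17..25], &TWO); // y of the first point
}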

1 Comment

Alas, this isn't flexible enough for my use case. I need to be able to test with various kinds of invalid data, for example.

I wrote a macro:

/// Concatenates all given byte arrays into a vector.
macro_rules! concat_bytes {
    ($($array:expr),* $(,)?) => {{
        let mut vec = Vec::<u8>::new();
        $(
            vec.extend($array);
        )*
        vec
    }}
}

Now I can do:

/// 1.0f64, big endian.
const ONE_BE: [u8; 8] = [0x3f, 0xf0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];
/// 2.0f64, big endian.
const TWO_BE: [u8; 8] = [0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00];

let bytes = concat_bytes![
    // Individual bytes must be in brackets, but I can live with that:
    [0x00], // Byte order
    [0x00, 0x00, 0x00, 0x02], // LineString
    [0x00, 0x00, 0x00, 0x02], // Number of points
    // Constant arrays:
    ONE_BE, TWO_BE,
    // Or even this:
    2.0f64.to_be_bytes(), 1.0f64.to_be_bytes(),
];

This technique also lets me introduce helper functions if I want.
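
For example (a sketch; point_be is only an illustration, not something the tests require):

/// Hypothetical helper: one big-endian WKB coordinate pair.
fn point_be(x: f64, y: f64) -> Vec<u8> {
    concat_bytes![x.to_be_bytes(), y.to_be_bytes()]
}

let bytes = concat_bytes![
    [0x00],                   // Byte order
    [0x00, 0x00, 0x00, 0x02], // LineString
    [0x00, 0x00, 0x00, 0x02], // Number of points
    point_be(1.0, 2.0),
    point_be(2.0, 1.0),
];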

The result is a Vec built at runtime rather than an array literal, but in this case that doesn't matter. If anyone comes up with a purely compile-time answer, I'll accept that instead.

5 Comments

Probably better to use lazy initialization, like docs.rs/once_cell.
@ChayimFriedman Why?
Just because it's easier this way: you don't need to pass the parameter to each function (but it may be a little (little! very little!) slower).
Pass what parameter to which function?
The functions that need to access BYTES.
