6

From a byte array, I want to convert a slice to a string using the ASCII encoding. The solution

fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];
    let s = String::from_iter(buffer[5..9].iter().map(|v| { *v as char }));
    println!("{}", s);
    assert_eq!("PQRS", s);
}

does not seem idiomatic, and smells of poor performance. Can we do better, without external crates?

2 Answers

5

A Rust string can be directly created from a UTF-8 encoded byte buffer like so:

fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];
    let s = std::str::from_utf8(&buffer[5..9]).expect("invalid utf-8 sequence");
    println!("{}", s);
    assert_eq!("PQRS", s);
}

The operation can fail if the input buffer contains an invalid UTF-8 sequence. However, every ASCII byte (0 to 127) is valid UTF-8 on its own, so it works in this case.

Note that the type of s here is &str, meaning that it borrows directly from buffer. No allocation takes place, so the operation is very efficient.
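If panicking on bad input is not acceptable, the Result can be returned to the caller instead of calling expect. The following is only a sketch: slice_as_str is a hypothetical helper name, not something from the standard library, and the owned String at the end is shown just to make the one allocation explicit.

```rust
use std::str::Utf8Error;

/// Hypothetical helper (not part of std): borrow `bytes` as a `&str`
/// without allocating, returning the error instead of panicking.
fn slice_as_str(bytes: &[u8]) -> Result<&str, Utf8Error> {
    std::str::from_utf8(bytes)
}

fn main() {
    let buffer: [u8; 9] = [255, 255, 255, 255, 77, 80, 81, 82, 83];

    // Bytes 5..9 are plain ASCII, so the conversion succeeds.
    match slice_as_str(&buffer[5..9]) {
        Ok(s) => println!("ok: {}", s),
        Err(e) => println!("invalid UTF-8: {}", e),
    }

    // The byte 255 never appears in valid UTF-8, so this returns Err
    // rather than panicking.
    assert!(slice_as_str(&buffer[0..4]).is_err());

    // If an owned String is needed, allocate explicitly from the borrowed &str.
    let owned: String = slice_as_str(&buffer[5..9])
        .map(str::to_owned)
        .unwrap_or_default();
    assert_eq!(owned, "PQRS");
}
```

The borrowed &str stays tied to the lifetime of buffer; the to_owned call is the only place where memory is allocated.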



3 Comments

So the solution was more about UTF-8: ASCII characters are valid UTF-8. I didn't know.
This is correct. It was even one of the original design goals of UTF-8: to be backwards compatible with ASCII.
Out of curiosity, how would one do the opposite? i.e. take an ASCII string like 'a' and convert it to its numeric value (0x61).
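For the reverse direction asked about in the last comment, one possible approach is a byte literal or an `as` cast for a single character, and as_bytes() for a whole string. This is a sketch only; ascii_codes is a hypothetical helper name introduced for illustration.

```rust
/// Hypothetical helper: return the ASCII codes of `s`, or None if `s`
/// contains any non-ASCII character.
fn ascii_codes(s: &str) -> Option<&[u8]> {
    if s.is_ascii() {
        // For ASCII-only text, the UTF-8 bytes are exactly the ASCII codes.
        Some(s.as_bytes())
    } else {
        None
    }
}

fn main() {
    // A single ASCII character: a byte literal or an `as` cast gives its code.
    assert_eq!(b'a', 0x61);
    assert_eq!('a' as u8, 0x61);

    // A whole string: the borrowed bytes, with no copy made.
    assert_eq!(ascii_codes("PQRS"), Some(&[80u8, 81, 82, 83][..]));

    // Non-ASCII text is rejected rather than silently returning UTF-8 bytes.
    assert_eq!(ascii_codes("⚠"), None);
}
```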
1

As SirDarius said, you can use core::str::from_utf8. But keep in mind that not every UTF-8 string is an ASCII string: just because a byte array can be interpreted as a UTF-8 string does not mean it can be interpreted as an ASCII string.

In other words, core::str::from_utf8 only gives you an ASCII string if you already know your byte array is truly ASCII. It accepts any valid UTF-8 and does not check for ASCII.

But in that case it's more efficient to just use core::str::from_utf8_unchecked, as the documentation on from_utf8 says:

If you are sure that the byte slice is valid UTF-8, and you don’t want to incur the overhead of the validity check, there is an unsafe version of this function, from_utf8_unchecked, which has the same behavior but skips the check.

Here's an example where from_utf8 returns a valid string from a byte array that is not ASCII:

fn main() {
    let buffer = [ 226, 154, 160 ];
    //             ^^^  ^^^  ^^^ None of these are valid ASCII characters
    let str = core::str::from_utf8(&buffer).unwrap(); // Doesn't panic
    println!("{}", str); // Prints "⚠"
}


Instead, you need to first scan the byte array for non-ASCII bytes.

Solution

fn get_ascii_str<'a>(buffer: &'a [u8]) -> Result<&'a str, ()> {
    // ASCII only covers 0..=127, so reject any byte with the high bit set
    for &byte in buffer {
        if byte >= 128 {
            return Err(());
        }
    }
    Ok(unsafe {
        // SAFETY: we verified above that every byte is ASCII, and every
        // ASCII string is also a valid UTF-8 string
        core::str::from_utf8_unchecked(buffer)
    })
}

Note: this function also works in #![no_std] environments, since it only uses core.
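If avoiding unsafe matters more than avoiding a second pass over the data, a possible variant is to combine the is_ascii method on byte slices with the checked from_utf8. This is a sketch under that assumption; get_ascii_str_safe is a hypothetical name, and it returns Option instead of Result<_, ()> purely for brevity.

```rust
/// Hypothetical safe variant of get_ascii_str: no unsafe block, at the
/// cost of from_utf8 validating the bytes a second time.
fn get_ascii_str_safe(buffer: &[u8]) -> Option<&str> {
    if buffer.is_ascii() {
        // ASCII is a subset of UTF-8, so this cannot fail here; .ok()
        // only converts the Result into the Option return type.
        core::str::from_utf8(buffer).ok()
    } else {
        None
    }
}

fn main() {
    // UTF-8 bytes for "⚠": valid UTF-8, but not ASCII.
    assert_eq!(get_ascii_str_safe(&[226, 154, 160]), None);

    // Plain ASCII input is borrowed as a &str without allocating.
    assert_eq!(get_ascii_str_safe(b"Hello, world!"), Some("Hello, world!"));
}
```

Like the unsafe version, this only relies on core, so it should also be usable without std.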

Example:

fn main() {
    let buffer = [ 226, 154, 160 ]; // UTF-8 bytes for "⚠"
    //             ^^^  ^^^  ^^^ None of these are valid ASCII characters
    assert_eq!(Err(()), get_ascii_str(&buffer)); // Correctly fails to interpret as ASCII
    // A byte string literal yields the same [u8; 13] array of ASCII codes
    let buffer = *b"Hello, world!";
    let str = get_ascii_str(&buffer).unwrap();
    println!("{}", str); // Prints "Hello, world!"
}

// get_ascii_str is the function from the Solution section above;
// paste its definition here to compile this example.

