
I am having trouble removing trailing null characters from UTF-8 encoded strings.


How would one go about removing these characters from a String?

Here is the code I use to create the String from a Vec:

let mut data: Vec<u8> = vec![0; 512];
// populate data
let res = String::from_utf8(data).expect("Found invalid UTF-8");
Comments:
  • Might be related to stackoverflow.com/questions/31101915/… (commented Mar 21, 2018 at 12:38)
  • Is the padding all \0? Usually C strings contain one \0 right after the string, and every byte after that might be garbage. (commented Mar 21, 2018 at 16:58)
  • I'm glad you got the answer, but it really seems like the right answer here might be to not put the null bytes there to begin with. I'm assuming your snippet is not your actual code, only an example? (commented Mar 21, 2018 at 17:43)
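The second comment's point matters: if the buffer follows the C convention, only the first \0 is meaningful and the bytes after it may be garbage rather than more zeros, which trimming would not remove. A minimal sketch, assuming that convention, truncates at the first null byte before decoding instead of trimming afterwards. `until_first_nul` is a hypothetical helper name, not a std API.

```rust
/// Hypothetical helper: keep only the bytes before the first NUL
/// (C-string convention), then validate what remains as UTF-8.
fn until_first_nul(mut data: Vec<u8>) -> Result<String, std::string::FromUtf8Error> {
    // Find the first 0 byte; if there is none, keep the whole buffer.
    if let Some(end) = data.iter().position(|&b| b == 0) {
        data.truncate(end);
    }
    // Garbage after the terminator is discarded before validation,
    // so it cannot make from_utf8 fail.
    String::from_utf8(data)
}

fn main() {
    let mut data: Vec<u8> = vec![0; 16];
    data[..2].copy_from_slice(b"hi");
    // Garbage after the terminator, which trimming NULs would leave in place.
    data[3..6].copy_from_slice(b"xyz");

    let s = until_first_nul(data).expect("Found invalid UTF-8");
    println!("{}: {:?}", s.len(), s); // prints: 2: "hi"
}
```

Truncating the `Vec<u8>` before decoding also avoids validating bytes that are thrown away anyway.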

1 Answer


You can trim custom patterns from a string using trim_matches. The pattern can be a null character:

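fn main() {
    let mut data: Vec<u8> = vec![0; 8];

    // Write "hi" (bytes 104 and 105) into the start of the zeroed buffer.
    data[0] = 104;
    data[1] = 105;

    let res = String::from_utf8(data).expect("Found invalid UTF-8");
    println!("{}: {:?}", res.len(), res);
    // 8: "hi\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}"  (newer Rust versions print each NUL as \0)

    // char::from(0) is the NUL character '\0'.
    let res = res.trim_matches(char::from(0));
    println!("{}: {:?}", res.len(), res);
    // 2: "hi"
}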

This removes null characters from both ends of the string. If you only want to remove trailing ones, use trim_end_matches instead.
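For the trailing-only case, a minimal sketch (assuming any leading NULs should be kept) uses `str::trim_end_matches` with the `'\0'` literal, which is the same character as `char::from(0)`. Both trim methods return a `&str` borrowed from the `String`; if an owned `String` is needed without a new allocation, `String::truncate` can shorten it in place. `strip_trailing_nuls` is a hypothetical helper name.

```rust
/// Hypothetical helper: remove trailing NULs from an owned String in place.
fn strip_trailing_nuls(s: &mut String) {
    // trim_end_matches returns a prefix of `s`, so its length always lies
    // on a char boundary, which truncate requires.
    let len = s.trim_end_matches('\0').len();
    s.truncate(len);
}

fn main() {
    // One leading NUL plus trailing padding, to contrast with trim_matches.
    let data: Vec<u8> = b"\0hi\0\0\0\0\0".to_vec();
    let mut res = String::from_utf8(data).expect("Found invalid UTF-8");

    // Borrowed view: only the trailing NULs are dropped.
    println!("{:?}", res.trim_end_matches('\0'));

    // Owned, in place: the existing String is shortened, not copied.
    strip_trailing_nuls(&mut res);
    println!("{}: {:?}", res.len(), res);
}
```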


4 Comments

Awesome! I knew there had to be an easy way to do this!
@T-Pane Btw, if the 0-bytes are there because you pre-allocate a data buffer, you may be able to avoid them altogether by initializing the data with Vec::with_capacity(512) and filling it as required.
CStr::from_bytes_until_nul(&buffer).unwrap().to_str()?.to_owned() is faster. In my tests over 100_000 iterations, from_utf8 took 43ms while from_bytes_until_nul took 11ms.
The variant by @CorrM has slightly different semantics: from_bytes_until_nul fails if the input does not contain a null byte at all. Depending on the use case, the Vec<u8> may or may not include the null byte. It certainly does not need one, since the Vec itself has a length.
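The two suggestions in these comments can be sketched as follows, assuming the data arrives through some `std::io::Read` source (a byte slice stands in for it here) and Rust 1.69 or newer, where `CStr::from_bytes_until_nul` is stable. `read_unpadded` fills a `Vec::with_capacity` buffer with only the bytes actually read, so there is no padding to strip. `decode_c_buffer` uses the CStr route but falls back to the whole buffer when no NUL is present, which addresses the semantic difference raised in the last comment. Both function names are hypothetical.

```rust
use std::ffi::CStr;
use std::io::{self, Read};

/// Hypothetical helper: read at most `max` bytes without pre-filling a
/// zeroed buffer, so there is no padding to strip afterwards.
fn read_unpadded<R: Read>(reader: R, max: u64) -> io::Result<String> {
    let mut data = Vec::with_capacity(max as usize);
    // Only bytes actually read are appended; len() grows as data arrives.
    reader.take(max).read_to_end(&mut data)?;
    String::from_utf8(data).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

/// Hypothetical helper: decode a NUL-terminated buffer via CStr
/// (from_bytes_until_nul needs Rust 1.69+). If there is no NUL at all,
/// use the whole buffer instead of failing.
fn decode_c_buffer(buf: &[u8]) -> Result<String, std::str::Utf8Error> {
    let bytes = match CStr::from_bytes_until_nul(buf) {
        Ok(cstr) => cstr.to_bytes(), // bytes before the first NUL
        Err(_) => buf,               // no NUL byte in the input
    };
    std::str::from_utf8(bytes).map(String::from)
}

fn main() -> io::Result<()> {
    let source: &[u8] = b"hello";
    let s = read_unpadded(source, 512)?;
    println!("{}: {:?}", s.len(), s);

    let mut padded = vec![0u8; 16];
    padded[..5].copy_from_slice(b"hello");
    println!("{:?}", decode_c_buffer(&padded));
    println!("{:?}", decode_c_buffer(b"no terminator"));
    Ok(())
}
```

Whether the CStr route is faster in a given program depends on the workload; the timings quoted above are from one commenter's test.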
