I'm reading a binary file that contains some text fields.

The original field value is "asdf è" (I cannot change the binary file's encoding).

The UTF-8 encoding would be "asdf \xc3\xa8"

What I'm reading instead is "asdf \xc3\x83\xc2\xa8", so NSString's stringWithUTF8String: or initWithCString: method gives me "asdf Ã¨".

How can I get back the correct "asdf è" value?

thanks

1 Answer

That is really a strange "encoding" that you read from the binary file. It looks as if the UTF-8 bytes of the text were themselves encoded as UTF-8 a second time:

  • C3 83 is the UTF-8 sequence for U+00C3 ("Ã")
  • C2 A8 is the UTF-8 sequence for U+00A8 ("¨")
  • The low bytes of these two code points, C3 A8, are the UTF-8 sequence for U+00E8 ("è")
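
The byte arithmetic behind these three bullets can be checked with a small decoder. This is only an illustrative sketch in plain C (which also compiles as Objective-C); the function name decode_utf8_2byte is made up for this example and is not part of any library.

```c
#include <stdint.h>

/* Decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx) into a code point.
   Returns 0 if the two bytes do not have that shape.
   Illustration only: overlong forms are not rejected. */
uint32_t decode_utf8_2byte(unsigned char b0, unsigned char b1)
{
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80) {
        return 0;
    }
    /* 5 payload bits from the lead byte, 6 from the continuation byte */
    return ((uint32_t)(b0 & 0x1F) << 6) | (uint32_t)(b1 & 0x3F);
}
```

Calling it with (0xC3, 0x83) gives 0xC3, with (0xC2, 0xA8) gives 0xA8, and with (0xC3, 0xA8) gives 0xE8, matching the list above.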

The following "trick" uses the ISO Latin 1 encoding, which maps each character U+0000 to U+00FF to the single byte with the same value, to convert the characters U+00C3 U+00A8 to the bytes C3 A8:

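const char *s = "\xc3\x83\xc2\xa8";   // the doubly encoded bytes read from the file
// Decode as UTF-8: yields the two characters U+00C3 U+00A8
NSString *s1 = [[NSString alloc] initWithBytes:s length:strlen(s) encoding:NSUTF8StringEncoding];
NSLog(@"%@", s1);   // Ã¨
// Encode as ISO Latin 1: each character becomes the single byte with the same value
NSData *d = [s1 dataUsingEncoding:NSISOLatin1StringEncoding];
NSLog(@"%@", d);    // <c3a8>
// Decode those bytes as UTF-8 again
NSString *s2 = [[NSString alloc] initWithData:d encoding:NSUTF8StringEncoding];
NSLog(@"%@", s2);   // è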
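
If you would rather repair the raw bytes before handing them to NSString, the same idea can be sketched in plain C. This is a hypothetical helper (the name undo_double_utf8 is invented here), and it assumes the damage is exactly as in the question: every original byte was treated as a Latin 1 character and encoded as UTF-8 again. Anything else in the input makes it give up.

```c
#include <stddef.h>

/* Undo one level of double UTF-8 encoding.
   in, in_len: the doubly encoded bytes.
   out: caller-provided buffer of at least in_len bytes.
   Returns the number of bytes written, or (size_t)-1 if the input
   contains anything other than ASCII or two-byte sequences that
   encode U+0080..U+00FF (assumption: the text was re-encoded via Latin 1). */
size_t undo_double_utf8(const unsigned char *in, size_t in_len, unsigned char *out)
{
    size_t i = 0, n = 0;
    while (i < in_len) {
        unsigned char b0 = in[i];
        if (b0 < 0x80) {
            /* ASCII is unchanged by the extra encoding step */
            out[n++] = b0;
            i += 1;
        } else if ((b0 == 0xC2 || b0 == 0xC3) && i + 1 < in_len
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* Code point U+0080..U+00FF collapses to one byte of the same value */
            out[n++] = (unsigned char)(((b0 & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        } else {
            return (size_t)-1;
        }
    }
    return n;
}
```

For the 9 bytes "asdf \xc3\x83\xc2\xa8" this writes the 7 bytes "asdf \xc3\xa8", which can then be passed to initWithBytes:length:encoding: with NSUTF8StringEncoding.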