
I have a file that I need to import. The problem is that a lot of the characters in that file are wrong.

For example, these names are wrong:

BjÃ¶rn (in file) - Should be Björn

Ã…ke (in file) - Should be Åke

Unfortunately I can't recreate the file with the correct encoding. There are also a lot of other characters that are wrong (these were just examples), so I can't simply search and replace them all (unless there is a dictionary with all the conversions).

Can I decode the strings in some way?

Thanks, Patrik

Edit: Some more info that I should have added before (I blame my tiredness): the file is an .xlsx file.

  • UTF-8? I'm not sure I understand your question: (1) do you know which encoding is used but not how to use it in .NET, or (2) are you looking for a way to determine the encoding? Commented Oct 13, 2011 at 21:06
  • You can try saving the file as Unicode: in Notepad, use File > Save As and pick Unicode. If the file was previously saved with the wrong encoding, then they will have to resend it with the correct encoding. Unicode would be preferred, as all the characters will be there. The same goes for opening: the right encoding should be used to open and read the file, otherwise not all the characters may be read correctly. Commented Oct 13, 2011 at 21:07

2 Answers


I debugged this with Notepad++. I copied the correct strings into Notepad++ and used Encoding | Convert to UTF-8. Then I selected Encoding | Encode as ANSI, which has the effect of interpreting the UTF-8 bytes as if they were ANSI. When I did this, I ended up with the same erroneous values as you. So clearly, when you read the file, you are interpreting it as ANSI rather than UTF-8.
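The same experiment can be reproduced in a few lines of Python. This is only an illustration of the byte-level effect (the question itself is about .NET), and it assumes the "ANSI" code page is Windows-1252, which Python calls `cp1252`. The helper name `misread_as_ansi` is hypothetical.

```python
def misread_as_ansi(text: str) -> str:
    # Encode the text correctly as UTF-8, then decode those bytes with the
    # wrong codec. "cp1252" is Python's name for Windows-1252, assumed here
    # to be the "ANSI" code page that Notepad++ used.
    return text.encode("utf-8").decode("cp1252")


for name in ["Björn", "Åke"]:
    print(name, "->", misread_as_ansi(name))
# Björn -> BjÃ¶rn
# Åke -> Ã…ke
```

Each non-ASCII letter here is two bytes in UTF-8 (ö is C3 B6, Å is C3 85), and Windows-1252 turns each of those bytes into a separate character, which is why every accented letter becomes two garbage characters.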

The conclusion, then, is that your file is encoded as UTF-8, so the solution is to make sure that it is interpreted as UTF-8 when you read it. I can't tell you exactly how to do that, since you didn't show how you were reading the file in the first place.
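If the strings have already been mis-decoded (for example, they come out of the .xlsx that way and you can't re-read the raw bytes), the same reasoning suggests reversing the two steps: encode the garbled text back to Windows-1252 bytes and decode those bytes as UTF-8. This is a hedged Python sketch, not part of the answer; `fix_mojibake` is a hypothetical helper, and it assumes the damage was exactly one UTF-8 to Windows-1252 mis-decode.

```python
def fix_mojibake(garbled: str) -> str:
    """Undo a single 'UTF-8 bytes read as Windows-1252' mistake, if possible.

    Assumes the text was produced by decoding UTF-8 bytes as cp1252.
    Strings that do not round-trip (for example text that is already
    correct) are returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them the right way.
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not valid mojibake of this kind: leave the string alone.
        return garbled


print(fix_mojibake("BjÃ¶rn"))  # Björn
print(fix_mojibake("Ã…ke"))    # Åke
print(fix_mojibake("Björn"))   # Björn (already correct, left unchanged)
```

This avoids a hand-written dictionary of conversions, but it is a heuristic: a genuinely intended string that happens to form valid UTF-8 after the cp1252 step would also be altered, and characters whose UTF-8 bytes fall on the few byte values Windows-1252 leaves undefined cannot be recovered this way.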

It's possible that your file does not contain a byte-order mark (BOM). If so, specify the encoding explicitly when you read the file by passing Encoding.UTF8.
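For comparison, a minimal Python sketch of "specify the encoding explicitly" (Python rather than .NET, so this is an analogy, not the Encoding.UTF8 API). It uses the `utf-8-sig` codec, which decodes UTF-8 and drops a leading BOM if one is present, so the same call works whether or not the file has a BOM. The helper names `decode_utf8` and `read_utf8` are hypothetical.

```python
from pathlib import Path


def decode_utf8(data: bytes) -> str:
    """Decode bytes as UTF-8, dropping a leading byte-order mark if present."""
    # "utf-8-sig" behaves like "utf-8" but skips an initial BOM (EF BB BF).
    return data.decode("utf-8-sig")


def read_utf8(path: Path) -> str:
    """Read a whole file as UTF-8 text, with or without a BOM."""
    return decode_utf8(path.read_bytes())


with_bom = "Björn".encode("utf-8-sig")   # encoding with utf-8-sig writes a BOM
without_bom = "Björn".encode("utf-8")    # plain UTF-8, no BOM
print(decode_utf8(with_bom) == decode_utf8(without_bom))  # True
```

The point of naming the codec explicitly is that the reader never has to guess: without a BOM there is nothing in the bytes themselves that says "UTF-8".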


1 Comment

Thanks a lot, you solved my problem! The characters looked wrong both in Excel (as I described earlier) and when I imported the content with Linq to Excel. I saved the file (in Excel) as an ordinary text file, and now the characters are correct.

I've just tried your first example, and it definitely looks like that's UTF-8.

It's unclear what you're using to look at the file in the first place, but if you load it with a text editor which understands UTF-8 and tell it that it's a UTF-8 file, it should be fine.

When you load it with .NET, you should just be able to use File.OpenText, File.ReadAllText, etc.; most IO dealing with encodings in .NET defaults to UTF-8 anyway.

1 Comment

It's probably a UTF-8 file with no BOM.
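One way to check that guess is to look at the raw bytes: a UTF-8 BOM is the three bytes EF BB BF, and a file without one can still be tested with a strict UTF-8 decode. A rough Python sketch follows; `describe_encoding` is a hypothetical helper, and passing a strict UTF-8 decode is strong evidence, not proof, that the data is UTF-8. It applies to a plain-text file such as the one exported from Excel, not to the .xlsx itself, which is a ZIP archive.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Rough guess about whether raw file bytes are UTF-8."""
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        return "UTF-8 with BOM"
    try:
        data.decode("utf-8")  # strict: raises on any invalid byte sequence
    except UnicodeDecodeError:
        return "not valid UTF-8 (perhaps an ANSI code page)"
    if data.isascii():
        return "plain ASCII (valid as UTF-8 too)"
    return "UTF-8 without BOM (probably)"


print(describe_encoding("Björn".encode("utf-8")))   # UTF-8 without BOM (probably)
print(describe_encoding("Björn".encode("cp1252")))  # not valid UTF-8 (perhaps an ANSI code page)
```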
