I am trying to process some Google AdWords CSV files. The files are available in UNICODE format. When I use the Ruby CSV parser to parse a file, I am not able to read it: the characters display as \x00a \x00b etc.

I ended up opening the file in OpenOffice, choosing UTF-8 as the encoding, and saving it again. After that, Ruby CSV can process the file. I also have to remove the first character in the CSV file, which looks like the number 8 in a black circle, because it is not a valid UTF-8 character. This special character was a result of the UNICODE to UTF-8 conversion in OpenOffice.

So what is the best way to convert the CSV file to a Ruby-friendly encoding without illegal characters?

To see what I mean, try opening this file with Ruby CSV and parsing the lines.

https://github.com/zben/encoding_test/blob/master/encoding_test.csv

  • The file command says: encoding_test.csv: Little-endian UTF-16 Unicode text (Commented Apr 13, 2013 at 22:36)

1 Answer

This page suggests using Iconv.iconv to convert:

require 'iconv'
doc = Iconv.iconv('UTF-8', 'UTF-16', doc).join  # Iconv.iconv returns an array of strings
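Iconv was deprecated in Ruby 1.9.3 and removed from the standard library in 2.0, so on a newer Ruby the built-in String encoding methods may be a better fit. Below is a minimal sketch, assuming the file really is little-endian UTF-16 (as the comment on the question reports) and is comma-separated. The helper names utf16le_to_utf8 and read_utf16le_csv are made up for illustration. It reads the raw bytes, converts them to UTF-8, and strips the leading byte order mark, which is probably the odd first character you had to delete by hand.

```ruby
require 'csv'

# Hypothetical helper: decode UTF-16LE bytes into a UTF-8 string without a BOM.
def utf16le_to_utf8(bytes)
  # Label the raw bytes as UTF-16LE, then transcode them to UTF-8.
  text = bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  # Drop the byte order mark (U+FEFF) if the file starts with one.
  text.sub(/\A\uFEFF/, '')
end

# Hypothetical helper: parse a UTF-16LE CSV file into an array of rows.
# col_sep is an assumption; pass "\t" if the export is tab-separated.
def read_utf16le_csv(path, col_sep: ',')
  CSV.parse(utf16le_to_utf8(File.binread(path)), col_sep: col_sep)
end
```

Usage would look something like rows = read_utf16le_csv('encoding_test.csv'). Reading in binary mode first avoids any newline or default-encoding surprises before the conversion happens.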