13

I would like to know how I can change the encoding of my CSV file when I import and parse it. I have this code:

csv = CSV.parse(output, :headers => true, :col_sep => ";")
csv.each do |row|
  row = row.to_hash.with_indifferent_access
  insert_data_method(row)
end

When I read my file, I get this error:

Encoding::CompatibilityError in FileImportingController#load_file
incompatible character encodings: ASCII-8BIT and UTF-8

I read about row.force_encoding('utf-8'), but it does not work:

NoMethodError in FileImportingController#load_file
undefined method `force_encoding' for #<ActiveSupport::HashWithIndifferentAccess:0x2905ad0>

Thanks.

1
  • Instead of converting it to a different encoding, would it be possible to add a step of indirection and output separate files? For example, a text file is encoded UTF-8 in some parts but UTF-16LE in others. As long as the headers are identical, output one file to filename_utf8.txt and another to filename_utf16le.txt. This way might make it possible to not force encoding. Commented Jul 17, 2014 at 13:14

3 Answers

16

I had to read CSV files encoded in ISO-8859-1. Using the documented form

CSV.foreach(filename, encoding:'iso-8859-1:utf-8', col_sep: ';', headers: true) do |row|

threw the exception

ArgumentError: invalid byte sequence in UTF-8
    from csv.rb:2027:in '=~' 
    from csv.rb:2027:in 'init_separators' 
    from csv.rb:1570:in 'initialize' 
    from csv.rb:1335:in 'new' 
    from csv.rb:1335:in 'open' 
    from csv.rb:1201:in 'foreach'

so I ended up reading the file and converting it to UTF-8 while reading, then parsing the string:

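# Read the file as ISO-8859-1 and transcode it to UTF-8, then parse the resulting string.
data = File.open(filename, 'r:iso-8859-1:utf-8') { |f| f.read }
CSV.parse(data, col_sep: ';', headers: true, header_converters: :symbol) do |row|
  pp row
end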
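
As a self-contained sketch of the approach above: the sample bytes, the temporary file, and the helper name `read_latin1_csv` are made up for illustration and are not part of the original answer. It assumes the file really is ISO-8859-1.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data: "é" and "è" stored as single ISO-8859-1 bytes.
latin1_bytes = "name;city\nRen\xE9;Li\xE8ge\n".b

# Hypothetical helper: read an ISO-8859-1 file, transcode to UTF-8, parse as CSV.
def read_latin1_csv(path)
  # 'r:iso-8859-1:utf-8' means external encoding ISO-8859-1, internal encoding UTF-8.
  text = File.open(path, 'r:iso-8859-1:utf-8') { |f| f.read }
  CSV.parse(text, col_sep: ';', headers: true, header_converters: :symbol).map(&:to_h)
end

rows = nil
Tempfile.create('latin1') do |f|
  f.binmode               # write the bytes untouched
  f.write(latin1_bytes)
  f.flush
  rows = read_latin1_csv(f.path)
end
```

Each element of `rows` is a plain hash keyed by symbol, and its string values are tagged as UTF-8, so they can be mixed with other UTF-8 strings without a compatibility error.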

7

force_encoding is meant to be run on a string, but it looks like you're calling it on a hash. You could say:

output.force_encoding('utf-8')
csv = CSV.parse(output, :headers => true, :col_sep => ";")
...
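
One caveat, sketched below with made-up sample bytes: force_encoding only relabels the string and leaves its bytes unchanged, so it helps only when the bytes are already valid UTF-8. If the data is actually in another encoding (ISO-8859-1 is assumed here purely for illustration; the question does not say), the string has to be transcoded with encode instead.

```ruby
# Hypothetical input: "café" with "é" as the single ISO-8859-1 byte 0xE9,
# tagged as binary (ASCII-8BIT), as uploaded data often is.
raw = "caf\xE9".b

# force_encoding changes the label only; the bytes stay the same.
relabeled = raw.dup.force_encoding('UTF-8')
relabeled_ok = relabeled.valid_encoding?    # 0xE9 alone is not valid UTF-8

# Label the bytes with their real encoding, then transcode to UTF-8.
transcoded = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
transcoded_ok = transcoded.valid_encoding?  # "é" is now the two bytes 0xC3 0xA9
```

Here `relabeled_ok` is false and `transcoded_ok` is true, which is consistent with the "invalid byte sequence in UTF-8" error reported in the comments.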

6 Comments

I just tried it. I get this error: ArgumentError in FileImportingController#load_file: invalid byte sequence in UTF-8
Try running this instead: Iconv.conv('utf-8//IGNORE','utf-8',output)
Unfortunately, I get this error: Encoding::CompatibilityError in FileImportingController#load_file: incompatible character encodings: ASCII-8BIT and UTF-8
I assume you don't really care about changing the encoding; your goal is to parse the file. Where are you loading your string from? Maybe there's another approach that can be taken.
I am loading it from a CSV file. It works now: I changed the encoding of the file itself.
2

I wrote a little blog post about what I did, but it's slightly more verbose than what's already been posted. For whatever reason, I couldn't get those solutions to work, and this did.

The gist is that I simply replace (or in my case, remove) the invalid/undefined characters in my file and then rewrite it. I used this method to convert the files:

require 'tempfile'

def convert_to_utf8_encoding(original_file)
  original_string = original_file.read
  # Drop invalid/undefined characters; change replace: to substitute something else.
  final_string = original_string.encode(invalid: :replace, undef: :replace, replace: '')
  final_file = Tempfile.new('import') # no need to save a real File
  final_file.write(final_string)
  final_file.close # flush to disk before the caller reopens it
  final_file
end

Hope this helps.

Edit: No destination encoding is specified here because, when none is given, encode transcodes to Encoding.default_internal, which for most Rails applications is UTF-8 (I believe).
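
Outside Rails, Encoding.default_internal may not be set, so a variant that names both encodings explicitly can be safer. The following is a minimal sketch, not the method from the blog post: the helper names and sample bytes are hypothetical, and it assumes the input arrives as binary (ASCII-8BIT) data.

```ruby
# Hypothetical helper: treat the input as raw bytes and keep only what maps to UTF-8.
# With 'binary' as the source encoding, every byte >= 0x80 is undefined in UTF-8
# and is replaced with the empty string.
def strip_to_utf8(bytes)
  bytes.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

# Hypothetical alternative for strings already labeled UTF-8 but containing bad bytes:
# String#scrub (Ruby 2.1+) replaces each invalid byte sequence.
def scrub_utf8(text)
  text.scrub('')
end

cleaned  = strip_to_utf8("caf\xE9 au lait".b)
scrubbed = scrub_utf8("caf\xE9 au lait".dup.force_encoding('UTF-8'))
```

Both calls drop the stray 0xE9 byte and return a valid UTF-8 string. Note that this discards data: if the real source encoding is known, transcoding from it preserves the characters instead.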

1 Comment

Taking the string and using 'encode' to remove the invalid and undefined characters is what worked for me. Perfect, thanks!
