
First of all, this is not a duplicate of this SO question. I have a CSV file encoded in Shift-JIS, and this is my script to parse the file:

require 'csv'
str1 = '社員番号'
str2 = 'メールアドレス'
str1.force_encoding("Shift_JIS").encode!
str2.force_encoding("Shift_JIS").encode!
file=File.open("SyainInfo.csv", "r:Shift_JIS")
csv = CSV.read(file, headers: true)
p csv[str1]
p csv [str2]

But even after specifying the encoding, I am getting invalid byte sequence in UTF-8 (ArgumentError). Any thoughts? My Ruby is 2.3.0.


1 Answer


First of all, your encoding doesn't look right:

'社員番号'.force_encoding("Shift_JIS").encode!
#=> "\x{E7A4}\xBE\x{E593}\xA1\x{E795}\xAA\x{E58F}\xB7"

force_encoding takes the bytes from str1 and interprets them as Shift JIS, whereas you probably want to convert the string to Shift JIS:

'社員番号'.encode('Shift_JIS')
#=> "\x{8ED0}\x{88F5}\x{94D4}\x{8D86}"
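To make the difference concrete, here is a minimal sketch comparing the two calls on the same literal. It assumes the source file is saved as UTF-8 (Ruby's default source encoding since 2.0); the variable names are only for illustration.

```ruby
s = '社員番号'   # UTF-8 literal: 4 characters, 3 bytes each

# force_encoding only changes the label; the bytes stay the UTF-8 ones
relabelled = s.dup.force_encoding('Shift_JIS')

# encode actually transcodes: same characters, different bytes
converted = s.encode('Shift_JIS')

p relabelled.bytes == s.bytes        # bytes untouched by force_encoding
p converted.bytes.length             # two bytes per character in Shift JIS
p converted.encode('UTF-8') == s     # round trip gives back the original
```

Only the second string holds bytes that will match a header read from a real Shift JIS file.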

Next, you can pass a filename to CSV.read, so instead of:

file = File.open(filename)
CSV.read(file)

You can just write:

CSV.read(filename)

That said, you could either work with Shift JIS encoded strings:

require 'csv'
str1 = '社員番号'.encode("Shift_JIS")
str2 = 'メールアドレス'.encode("Shift_JIS")
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS', headers: true)
csv[str1]
csv[str2]

Or – and that's what I would do – you could work with UTF-8 strings by specifying a second encoding:

require 'csv'
str1 = '社員番号'
str2 = 'メールアドレス'
csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS:UTF-8', headers: true)
csv[str1]
csv[str2]

encoding: 'Shift_JIS:UTF-8' instructs CSV to read Shift JIS data and transcode it to UTF-8. It's equivalent to passing 'r:Shift_JIS:UTF-8' to File.open.
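A small self-contained check of that equivalence, as a sketch: it writes a tiny Shift JIS file to a temp location (the sample row and the example.com address are made up, standing in for SyainInfo.csv) and reads it back both ways.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data; the real SyainInfo.csv is not available here
sample = "社員番号,メールアドレス\n001,taro@example.com\n"

# Write the sample as raw Shift JIS bytes
tmp = Tempfile.new(['syain', '.csv'])
tmp.binmode
tmp.write(sample.encode('Shift_JIS'))
tmp.close

# 1) Let CSV open the file and transcode Shift JIS -> UTF-8
via_csv = CSV.read(tmp.path, encoding: 'Shift_JIS:UTF-8', headers: true)

# 2) Open the file yourself with the equivalent mode string
via_file = File.open(tmp.path, 'r:Shift_JIS:UTF-8') do |f|
  CSV.parse(f.read, headers: true)
end

tmp.unlink

# Both tables can now be indexed with plain UTF-8 literals
p via_csv['社員番号']
p via_file['社員番号']
```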


8 Comments

Many thanks for your help, sir, but when I tried your solution I got 'in gets': "\xFB\xFC" from Shift_JIS to UTF-8 (Encoding::UndefinedConversionError)
@TonyVincent where does that error come from? I don't see a gets in your code.
That's from Ruby's CSV class: usr/local/lib/ruby/2.3.0/csv.rb:1807:in gets...
@TonyVincent can you upload your SyainInfo.csv somewhere? (or a sample file with the very same encoding)
But I don't know why csv = CSV.read('SyainInfo.csv', encoding: 'Shift_JIS:UTF-8', headers: true) throws Shift_JIS to UTF-8 (Encoding::UndefinedConversionError).
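One possible cause, not confirmed anywhere in this thread: the bytes \xFB\xFC fall in the vendor extension area that Microsoft's variant of Shift JIS (Windows-31J, also known as CP932) defines but Ruby's strict Shift_JIS converter does not. A rough sketch to check whether those two bytes convert under each label:

```ruby
# The two bytes from the error message, as raw binary
bytes = "\xFB\xFC".b

# Strict Shift_JIS: this is the conversion that failed in the comment above
begin
  bytes.dup.force_encoding('Shift_JIS').encode('UTF-8')
  strict_ok = true
rescue Encoding::UndefinedConversionError
  strict_ok = false
end

# Windows-31J (CP932): the Microsoft superset of Shift JIS
as_cp932 = bytes.dup.force_encoding('Windows-31J').encode('UTF-8')

p strict_ok
p as_cp932
```

If the file was in fact produced on Windows, reading it with encoding: 'Windows-31J:UTF-8' instead of 'Shift_JIS:UTF-8' may be worth trying; that is an assumption about the file, since it was never shared.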