
Here is a snippet showing how the data is encoded in R's memory. The CSV file was read with data.table::fread using encoding = "Latin-1". As the output below shows, the strings are stored with a mix of encodings, which is a problem because I keep the data in a SQLite database: whenever I write the data to the database and read it back, the Latin-1 entries are not decoded correctly. Is there a way to normalize the encodings? It seems that common functions like iconv won't work, since the data contain multiple encodings in different parts of the data.frame.

Encoding(Data$DESC)

 [5305] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
 [5311] "unknown" "unknown" "unknown" "latin1"  "unknown" "unknown"
 [5317] "unknown" "latin1"  "latin1"  "latin1"  "latin1"  "unknown"
 [5323] "latin1"  "latin1"  "latin1"  "latin1"  "unknown" "latin1" 
  • Please provide a reproducible example, and give your session info output including the version of data.table. Commented Feb 14, 2016 at 15:17
  • What RDBMS do you use? Maybe you can set the encoding on the client side: stackoverflow.com/a/6477516/3338646 Commented Feb 14, 2016 at 15:27
  • I don't know who downvoted this question, but sometimes proof of research effort or clarity is not just a matter of providing a reproducible example. I think this is a good question, and if you really need a dataset, try e.g. df1 <- data.frame(matrix(letters[1:24], ncol = 4), stringsAsFactors = FALSE). The command sapply(df1, Encoding) shows "unknown" for all entries. I'd be interested to see how the encoding of individual entries can be changed (see the sketch after these comments). Commented Feb 14, 2016 at 18:04
  • That sounds like a bug in RSQLite - it should always convert to UTF-8 before sending to the db. Commented Feb 17, 2016 at 4:51
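
Picking up the comment about changing the encoding of individual entries, here is a minimal sketch built on the df1 example from the comments; the "\xe9" byte is just an illustrative Latin-1 value. Encoding()<- accepts a vector, so entries can be flagged individually:

df1 <- data.frame(matrix(letters[1:24], ncol = 4), stringsAsFactors = FALSE)
df1$X1[2] <- "caf\xe9"                     # a raw latin1 byte, for illustration only
Encoding(df1$X1) <- c("unknown", "latin1", "unknown", "unknown", "unknown", "unknown")
sapply(df1, Encoding)                      # X1 now shows a per-element mix, like DESC above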
