
Here is a snippet showing how the data is encoded in R's memory. The CSV file was read with data.table::fread using encoding = "Latin-1". As the output below shows, the strings are stored with a mix of encodings, which is a problem because I keep the data in a SQLite database: whenever I write the data to the database and read it back, the Latin-1 entries are not decoded correctly. Is there a way to normalize the encodings? It seems that common functions like iconv won't work, since the data contain multiple encodings in different parts of the data.frame.

Encoding(Data$DESC)

 [5305] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
 [5311] "unknown" "unknown" "unknown" "latin1"  "unknown" "unknown"
 [5317] "unknown" "latin1"  "latin1"  "latin1"  "latin1"  "unknown"
 [5323] "latin1"  "latin1"  "latin1"  "latin1"  "unknown" "latin1" 
  • Please provide a reproducible example, and give your session info output including the version of data.table. Commented Feb 14, 2016 at 15:17
  • What RDBMS do you use? Maybe you can set the encoding on the client side: stackoverflow.com/a/6477516/3338646 Commented Feb 14, 2016 at 15:27
  • I don't know who downvoted this question, but sometimes proof of research effort or clarity is not just a matter of providing a reproducible example. I think this is a good question, and if you really need a dataset, try e.g. df1 <- data.frame(matrix(letters[1:24], ncol = 4), stringsAsFactors = FALSE). The command sapply(df1, Encoding) shows "unknown" for all entries. I'd be interested to see how the encoding of individual entries can be changed (see the sketch after these comments). Commented Feb 14, 2016 at 18:04
  • That sounds like a bug in RSQLite - it should always convert to UTF-8 before sending to the db. Commented Feb 17, 2016 at 4:51
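
Picking up the comment about changing the encoding of individual entries, here is a minimal sketch built on the df1 example from the comments; the "\xe9" byte is just an illustrative Latin-1 value. Encoding()<- accepts a vector, so entries can be flagged individually:

df1 <- data.frame(matrix(letters[1:24], ncol = 4), stringsAsFactors = FALSE)
df1$X1[2] <- "caf\xe9"                     # a raw latin1 byte, for illustration only
Encoding(df1$X1) <- c("unknown", "latin1", "unknown", "unknown", "unknown", "unknown")
sapply(df1, Encoding)                      # X1 now shows a per-element mix, like DESC above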
