Here is a snippet showing how the data is encoded in R's memory. The CSV file was read with data.table::fread using encoding = "Latin-1".
As the output below suggests, the strings are stored with mixed encodings. That is a problem because I keep the data in a SQLite database, and whenever I write the data to the database and read it back, the Latin-1 strings are not decoded correctly. Is there a way to normalize the encoding?
It seems that common functions like iconv won't work directly, since the data has multiple encodings in different parts of the data.frame.
Encoding(Data$DESC)
[5305] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[5311] "unknown" "unknown" "unknown" "latin1" "unknown" "unknown"
[5317] "unknown" "latin1" "latin1" "latin1" "latin1" "unknown"
[5323] "latin1" "latin1" "latin1" "latin1" "unknown" "latin1"
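One way to normalize, sketched under the assumption that the only marks present are "latin1" and "unknown" (as in the output above): enc2utf8() converts element by element according to each string's declared encoding, so a mixed column can be re-encoded in a single pass and stored uniformly as UTF-8 before it goes to SQLite. The helper name to_utf8 is mine, not from any package.

```r
# Sketch: re-encode every character column of a data.frame to UTF-8.
# enc2utf8() works element-wise on each string's declared encoding,
# so columns mixing "latin1" and "unknown" marks are handled in one pass.
# (Strings marked "unknown" are assumed to be in the native locale.)
to_utf8 <- function(df) {
  char_cols <- vapply(df, is.character, logical(1))
  df[char_cols] <- lapply(df[char_cols], enc2utf8)
  df
}

# Small demonstration with a deliberately mixed column:
x <- c("plain", "caf\xe9")          # second element holds a Latin-1 byte
Encoding(x) <- c("unknown", "latin1")
df <- data.frame(DESC = x, stringsAsFactors = FALSE)
out <- to_utf8(df)
Encoding(out$DESC)                  # ASCII stays "unknown", the rest "UTF-8"
```

Note that pure-ASCII strings keep the "unknown" mark even after conversion; R never attaches an encoding mark to ASCII-only strings, and that is harmless since ASCII is valid UTF-8.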
df1 <- data.frame(matrix(letters[1:24], ncol = 4), stringsAsFactors = FALSE)
The command sapply(df1, Encoding) shows "unknown" for all entries. I'd be interested to see how the encoding of individual entries can be changed.
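One thing worth noting: R never attaches an encoding mark to ASCII-only strings, so every entry of a data.frame built from letters will report "unknown" no matter what you do. For strings that do contain non-ASCII bytes, Encoding<- accepts indexed assignment, so individual entries can be re-declared one at a time. A minimal sketch (the example strings are my own, not from the question's data):

```r
# Encoding()<- only re-declares how the existing bytes should be
# interpreted; it does not change the bytes themselves.
s <- c("plain", "fa\xe7ade")   # second element holds a Latin-1 byte (0xE7)
Encoding(s)                    # both "unknown" before any declaration
Encoding(s[2]) <- "latin1"     # declare a single element via indexing
Encoding(s)                    # "unknown" "latin1"

# The same indexed form works on one cell of a data.frame column:
df <- data.frame(DESC = s, stringsAsFactors = FALSE)
Encoding(df$DESC[2]) <- "latin1"
```

The first element stays "unknown" because it is pure ASCII; only strings with non-ASCII bytes can carry a "latin1" or "UTF-8" mark.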