2

I am reading a table from a SQL database, with a column of cities in (English, Arabic and possibly other languages). The encoding type for all is Unknown, when I try to force them using Encoding, some change and some remain Unknown.

I tried using dbGetQuery but it is the same :

con <- dbConnect(RMySQL::MySQL(), host = "***",dbname="***",user = "***", password = "***")

dbGetQuery(con,"set names utf8")

Q1 <- dbSendQuery(con, "SELECT * FROM cities")

city <- fetch(Q1, n = -1)

> Encoding(city$name) %>% table()
.
unknown 
  45734 

When I force the change, some get converted, but entries with Arabic characters for example do not get converted.

> Encoding(city$name) <- "UTF-8"
> Encoding(city$name) %>% table()
.
unknown   UTF-8 
  44920     814 

And here is the result of SHOW VARIABLES LIKE 'character_set_%

dbSendQuery(con, "SET NAMES UTF8; ")
<MySQLResult:8,3,3>
> dbGetQuery(con, "SHOW VARIABLES LIKE 'character_set_%'")
             Variable_name                      Value
1     character_set_client                       utf8
2 character_set_connection                       utf8
3   character_set_database                     latin1
4 character_set_filesystem                     binary
5    character_set_results                       utf8
6     character_set_server                     latin1
7     character_set_system                       utf8
8       character_sets_dir /usr/share/mysql/charsets/

And here's the session info

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stringi_1.1.2 tidyr_0.5.1   purrr_0.2.2   dplyr_0.5.0   RMySQL_0.10.9 DBI_0.4-1    

loaded via a namespace (and not attached):
[1] magrittr_1.5   R6_2.1.2       assertthat_0.1 tools_3.3.1    tibble_1.1     Rcpp_0.12.6

And here's the encoding in the db: table info from php my admin

1 Answer 1

1

As said in the docs:

ASCII strings will never be marked with a declared encoding, since their representation is the same in all supported encodings.

Illustration:

v <- c("café", "floor", "window", "naïve")
Encoding(v) <- "UTF-8"

Encoding(v)
# [1] "UTF-8"   "unknown" "unknown" "UTF-8"

Some of your city names are in English, so they probably don't contain non-ASCII characters.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.