2

I am trying to query data from a MySQL database that contains some strings. For the connection and data retrieval I am using RMySQL in R, which works fine apart from one thing: the strings I retrieve do not seem to be in UTF-8. I need UTF-8 because the strings contain German umlauts. When I ask the database for its encoding with

dbGetQuery(db, "SHOW VARIABLES LIKE 'character_set_%';")

I get the desired answer:

             Variable_name  Value
1     character_set_client  utf8
2 character_set_connection  utf8
3   character_set_database  utf8
4 character_set_filesystem  binary
5    character_set_results  utf8
6     character_set_server  utf8
7     character_set_system  utf8
8       character_sets_dir  C:\\Program Files\\MySQL\\MySQL Server 5.7\\share\\charsets\\

But I receive, for example,

Andreas Wünsche

instead of

Andreas Wünsche

I hope somebody knows how to deal with this. If additional information is needed, just ask and I will provide it.

3
  • Have you tried changing the default encoding in R? If you are using RStudio, go to Tools -> Global Options -> Code -> Saving and set it to UTF-8. Commented Jul 13, 2016 at 9:17
  • Yes, it is already set to UTF-8 in RStudio. Commented Jul 13, 2016 at 10:39
  • OK. I have tried using iconv to convert "Wünsche" to the proper form but have not found a solution yet. I will keep looking. Commented Jul 13, 2016 at 11:40

3 Answers

3

I found something that is a bit hacky but works for me:

You have to manually mark the encoding of your data frame's character columns as UTF-8, like this:

x <- "Wünsche"
Encoding(x) <- "UTF-8"
x
[1] "Wünsche"

I think you have to do this for every character vector.
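
One way to apply this to every character column at once might look like the sketch below. `mark_utf8` is a hypothetical helper name, not part of RMySQL, and the sample data frame is built by hand to mimic a query result whose UTF-8 bytes carry no encoding mark. Note that `Encoding<-` only relabels the bytes; it does not convert them, so this assumes the server really sent UTF-8.

```r
# Hypothetical helper: mark every character column of a data frame as UTF-8.
mark_utf8 <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], function(col) {
      Encoding(col) <- "UTF-8"  # relabel only, the bytes stay untouched
      col
    })
  }
  df
}

# Stand-in for a query result: the UTF-8 bytes of "Wünsche", unmarked.
raw_name <- rawToChar(as.raw(c(0x57, 0xc3, 0xbc, 0x6e, 0x73, 0x63, 0x68, 0x65)))
df <- data.frame(id = 1L, name = raw_name, stringsAsFactors = FALSE)

Encoding(df$name)   # "unknown"
df <- mark_utf8(df)
Encoding(df$name)   # "UTF-8"
```

With a real result, the call would simply be `df <- mark_utf8(df)` right after fetching.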

EDIT:

Take a look here:
it seems to fix the same problem by sending 'set character set "utf8"' through dbSendQuery().

3 Comments

Thanks, that helps. But I would like to know how it happens that the encoding is set to "unknown".
Maybe the SQL query does not apply the encoding to the R object.
The character set approach did not work for me, so I have to keep your encoding workaround. Thanks again.
2

I took this answer from https://stat.ethz.ch/pipermail/r-sig-db/2012q1/001141.html. Before calling dbSendQuery, you have to run dbGetQuery(mydb, "SET NAMES 'utf8'"):

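library(RMySQL)

mydb <- dbConnect(MySQL(), user = db_user, password = db_password,
                  dbname = db_name, host = db_host, port = db_port)

# Switch the connection to utf8 before sending the actual query
dbGetQuery(mydb, "SET NAMES 'utf8'")

s <- paste0("select * from ", db_table)
rs <- dbSendQuery(mydb, s)
df <- fetch(rs, n = -1)  # n = -1 fetches all rows
dbClearResult(rs)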

0

When trying to use utf8/utf8mb4, if you see Mojibake, check the following. This discussion also applies to Double Encoding, which is not necessarily visible.

  • The bytes to be stored need to be utf8-encoded.
  • The connection when INSERTing and SELECTing text needs to specify utf8 or utf8mb4.
  • The column needs to be declared CHARACTER SET utf8 (or utf8mb4).
  • HTML should start with <meta charset=UTF-8>.
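
To make the first two points concrete, here is a minimal sketch in R of how Mojibake can arise and be undone. It assumes the garbling comes from UTF-8 bytes being read as latin1 somewhere between the server and R; the variable names are illustrative only.

```r
# The UTF-8 bytes of "Wünsche" (ü is 0xc3 0xbc).
utf8_bytes <- as.raw(c(0x57, 0xc3, 0xbc, 0x6e, 0x73, 0x63, 0x68, 0x65))

# Correct reading: the bytes are marked as UTF-8.
good <- rawToChar(utf8_bytes)
Encoding(good) <- "UTF-8"

# Mojibake: the same bytes are misread as latin1 and then converted to UTF-8,
# so the two bytes of ü become two separate characters.
misread <- rawToChar(utf8_bytes)
Encoding(misread) <- "latin1"
mojibake <- enc2utf8(misread)

nchar(good)      # 7
nchar(mojibake)  # 8

# Repair: convert back to the latin1 bytes, then mark those bytes as UTF-8.
repaired <- iconv(mojibake, from = "UTF-8", to = "latin1")
Encoding(repaired) <- "UTF-8"
identical(charToRaw(repaired), utf8_bytes)  # TRUE
```

If the repair step is needed on data read back from the database, the text was probably stored double-encoded, and fixing the connection settings alone will not clean up rows that are already there.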
