We have a database and R scripts running on different virtual machines. First we connect to db
con <- dbConnect(
odbc(),
Driver = "SQL Server",
Server = "server", Database = "db", UID = "uid", PWD = "pwd",
encoding = "UTF-8"
)
and gathering the data
data <- dbGetQuery(con, "SELECT * FROM TableName")
The problem is following: when different machines running the same script, for some of these we face character variables encoding issues.
For example, this is what we have on machine A
> data$char_var[1]
[1] "фамилия"
> Encoding(data$char_var[1])
[1] "UTF-8"
> Sys.getlocale()
[1] "LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251"
> Encoding(data$char_var[1]) <- "1251"
> data$char_var[1]
[1] "гревцев"
and this is what we have on machine B
> data$char_var[1]
[1] "<e3><f0><e5><e2><f6><e5><e2>"
> Encoding(data$char_var[1])
[1] "UTF-8"
> Sys.getlocale()
[1] "LC_COLLATE=Russian_Russia.1251;LC_CTYPE=Russian_Russia.1251;LC_MONETARY=Russian_Russia.1251;LC_NUMERIC=C;LC_TIME=Russian_Russia.1251"
> Encoding(data$char_var[1]) <- "1251"
> data$char_var[1]
[1] "фамилия"
First script returns gibberish, but it prints the initial value correctly. The same code running on machine B initially prints utf-8 and then returns encoded values. What could be the reason for such a difference?
As a result we want a script that would have the same "фамилия" output value to show it on a dashboard.