mariadb-dump with database name containing unicode characters?

I am trying to write a mariadb-dump.exe command line to back up our MariaDB 10.11 databases on Windows (Swedish locale, win-1252 system codepage). Unfortunately, we expected full Unicode support and gave one database a name containing the Swedish character "ö".

I tried this command line:

mariadb-dump.exe --user=root --password=XXX --opt --result-file=backup.sql --default-character-set=utf8mb4 --quote-names dbföretag FirstTable SecondTable

I get this error:

mariadb-dump.exe: Error: 'Illegal mix of collations (utf8mb3_general_ci,IMPLICIT) and (utf8mb4_general_ci,COERCIBLE) for operation '='' when trying to dump tablespaces
mariadb-dump.exe: Got error: 1300: "Invalid utf8mb4 character string: 'dbf\xF6retag'" when selecting the database

I am trying to resolve the second error, which seems to indicate that mariadb-dump.exe fails to correctly encode the database name when sending it to the server, or the server incorrectly interprets the string when received.

I tried chcp 65001 in the cmd.exe session before running mariadb-dump.exe, but I get the exact same result.

The character "ö" has unicode codepoint U+00F6, which matches \xF6 in the error message, but UTF-8 encodes it as 0xC3 0xB6. Since I get this same result regardless of which chcp I use in the cmd.exe session, I conclude that mariadb-dump.exe correctly interprets the command line and understands that the "ö" is unicode codepoint U+00F6.

But it seems to fail to convert it into the encoding that should be sent to the server. Instead of encoding U+00F6 into utf-8 \xC3\xB6 it passes the unicode codepoint without conversion, as \xF6. I fail to see how that could work regardless of encoding. Is there ANY unicode encoding that uses 1 byte per character up to and including code point U+00F6?
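For reference, the byte values involved can be checked with a short Python sketch. Python is used here only for illustration and is not part of the setup described above. It shows that ISO-8859-1 (Latin-1), which is a legacy single-byte character set rather than a Unicode transformation format, maps every code point from U+0000 to U+00FF to the single byte of the same value, and that Windows-1252 agrees with it for "ö". So the \xF6 in the error message is consistent both with an unconverted code point and with a Latin-1 or Windows-1252 encoded string; the sketch alone cannot tell which of the two happened.

```python
# Byte values of "ö" (U+00F6) under the encodings discussed above.
ch = "\u00f6"  # ö

encodings = {
    "cp1252": ch.encode("cp1252"),    # Windows-1252, the system code page in the question
    "latin-1": ch.encode("latin-1"),  # ISO-8859-1: U+0000..U+00FF map to one byte each
    "utf-8": ch.encode("utf-8"),      # what a utf8mb4 connection expects
}

for name, raw in encodings.items():
    # Print each encoding name followed by its bytes in hex.
    print(name, raw.hex(" "))
```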

As a workaround, I am able to artificially create the correct utf-8 string by passing on the command line the two characters with unicode code points U+00C3 and U+00B6, i.e. using the characters Ã¶. Since it's the unicode code points of those characters that matter and not how they are encoded in the cmd.exe session's code page, these two characters give the correct result regardless of which code page is being used in the cmd.exe session.

So, this command line works:

mariadb-dump.exe --user=root --password=XXX --opt --result-file=backup.sql --default-character-set=utf8mb4 --quote-names dbfÃ¶retag FirstTable SecondTable
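The substituted name does not have to be typed by hand. The following is a minimal Python sketch of how it could be generated, assuming the ANSI code page is Windows-1252; the helper name `workaround_argument` is hypothetical and only for illustration. It encodes the real name as UTF-8 and then reads those bytes back as Windows-1252 characters. This is not guaranteed to work for every name: a few byte values (for example 0x81 and 0x8D) have no Windows-1252 character, in which case the decode step raises an error.

```python
def workaround_argument(db_name: str) -> str:
    """Return the text to pass on the command line so that its
    Windows-1252 bytes equal the UTF-8 bytes of db_name.

    Assumption: the ANSI code page is Windows-1252 (cp1252).
    Raises UnicodeDecodeError if a UTF-8 byte has no cp1252 character.
    """
    utf8_bytes = db_name.encode("utf-8")
    return utf8_bytes.decode("cp1252")


# "dbf\u00f6retag" is "dbföretag"; escapes avoid source-encoding issues.
arg = workaround_argument("dbf\u00f6retag")
print(arg)
```

Encoding the returned string as Windows-1252 gives back exactly the UTF-8 bytes of the real name, which is why the server accepts it.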

Is there any way I can get mariadb-dump.exe to encode the database name into utf-8 correctly?

I tried adding these lines to my.ini, but it doesn't help:

[client]
character_set_connection=utf8mb4
collation_connection=utf8mb4_bin

Is this a bug in mariadb-dump.exe? In the server? In the MariaDB client library being used by mariadb-dump.exe? Or what?

UPDATE: Bug reported to MariaDB: https://jira.mariadb.org/browse/MDEV-32264

UPDATE: According to the replies on the bug report above, the current implementation is said to work on newer Windows versions. Ours (Server 2019) will soon be out of mainstream support, and fixing the problem for that older version would take significant effort, so they won't fix it. Instead, we will plan an upgrade to Server 2022 and hope that the problem goes away. In the meantime, we are using the workaround.

edited Oct 10, 2023 at 4:42

asked Sep 26, 2023 at 15:45

Kjell Rilbe


Any input re. the first error message is also welcome, although not really the subject of this question.

– Kjell Rilbe, commented Sep 26, 2023 at 15:45

You should always use the same version of the database and then upgrade, but you can open my.cnf/my.ini, look for the default character set and collation, and change them to utf8mb3 and utf8mb3_general_ci.

– nbk, commented Sep 26, 2023 at 15:49

if you have a linux system you can try stackoverflow.com/a/62309115/5193536

– nbk, commented Sep 26, 2023 at 15:55

Bug submitted to MariaDB: jira.mariadb.org/browse/MDEV-32264

– Kjell Rilbe, commented Sep 28, 2023 at 8:31

Let me clarify what happens here. mariadb-dump is a C++ program, which gets its arguments from the main() function. The arguments are encoded using the ANSI codepage, and it does not matter how much you "chcp". The ANSI codepage is likely win-1252, which is mostly latin1. In that codepage, ö corresponds to the byte 0xf6. mariadb-dump connects to the database server using an incorrectly encoded dbname: it tells the server it is using UTF-8, but the dbname is latin1. On newer Windows, the ANSI codepage for MariaDB executables is UTF-8, so 0xf6 won't be sent with the dbname; the correct UTF-8 byte sequence is sent instead.

– Vladislav Vaintroub, commented Oct 14, 2023 at 16:46
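The mismatch described in this comment can be reproduced outside MariaDB. The Python sketch below is only an illustration, not the server's actual validation code; the helper `is_valid_utf8` is a hypothetical name, and Windows-1252 is assumed as the ANSI code page. It checks whether the bytes of the database name would be acceptable to a receiver that has been told to expect UTF-8.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "dbf\u00f6retag"  # "dbföretag"

# Bytes as delivered to main() under a Windows-1252 ANSI code page (assumed).
ansi_bytes = name.encode("cp1252")
# Bytes as delivered when the process ANSI code page is UTF-8.
utf8_bytes = name.encode("utf-8")

print(ansi_bytes, is_valid_utf8(ansi_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

The lone 0xF6 byte starts a multi-byte UTF-8 sequence that is never completed, so the first check fails, which matches the "Invalid utf8mb4 character string" error in the question.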
