I compared strings using the en_US.utf8 locale and found some strange behavior.
Let's take:
std::string a = "A";
std::string b = "a";
The C locale (the default one) says that a < b, while en_US.utf8 says that a > b. However, with
std::string a = "A0";
std::string b = "ad";
both locales give the result a < b.
Code to check:
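#include <iostream>
#include <locale>   // std::locale, std::use_facet, std::collate
#include <string>

// Print the result of comparing s1 and s2 with the collate facet of locale l:
// a negative value means s1 < s2, a positive value means s1 > s2, 0 means equal.
void cmp_strs(const std::locale &l, const std::string &s1, const std::string &s2) {
    auto &f = std::use_facet<std::collate<char>>(l);
    std::cout << l.name() << ": ";
    std::cout << f.compare(s1.data(), s1.data() + s1.size(),
                           s2.data(), s2.data() + s2.size()) << ' ';
    std::cout << "\n";
}

int main() {
    std::cout << "A v a\n";
    std::string a = "A";
    std::string b = "a";
    cmp_strs(std::locale("C"), a, b);
    // std::locale("en_US.utf8") throws std::runtime_error if that locale is not installed
    cmp_strs(std::locale("en_US.utf8"), a, b);
    std::cout << '\n';

    std::cout << "A0 v ad\n";
    a = "A0";
    b = "ad";
    cmp_strs(std::locale("C"), a, b);
    cmp_strs(std::locale("en_US.utf8"), a, b);
}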
What is weird here is that "A0" and "ad" should be decided by their first characters, A and a, and en_US.utf8 returns "less". But when I compare just those first characters (the first case), the result is "greater".
The en_US.utf8 locale is (at least partially) case-insensitive and the C locale is not.
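To make "partially case-insensitive" concrete, here is a rough sketch of a two-pass comparison that reproduces the results from the question. It is only an illustration under assumptions, not the actual algorithm used by the en_US.utf8 collation tables: it handles ASCII only, and the names byte_compare, toy_collate and check_examples are made up for this example. The first pass ignores case, so "A0" vs "ad" is decided by '0' vs 'd'. Case is consulted only when the strings are otherwise equal, and the sketch assumes lowercase sorts before uppercase in that tie-break, which matches the "A" > "a" result reported above.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: byte-wise comparison, which is how the "C" locale
// orders strings. Returns -1, 0 or 1.
int byte_compare(const std::string &s1, const std::string &s2) {
    int r = s1.compare(s2);
    return (r > 0) - (r < 0);
}

// Hypothetical helper: toy two-pass collation (ASCII only).
// Returns -1, 0 or 1.
int toy_collate(const std::string &s1, const std::string &s2) {
    const std::size_t n = std::min(s1.size(), s2.size());

    // Pass 1: compare while ignoring case.
    for (std::size_t i = 0; i < n; ++i) {
        int c1 = std::tolower(static_cast<unsigned char>(s1[i]));
        int c2 = std::tolower(static_cast<unsigned char>(s2[i]));
        if (c1 != c2) return c1 < c2 ? -1 : 1;
    }
    if (s1.size() != s2.size()) return s1.size() < s2.size() ? -1 : 1;

    // Pass 2: the strings differ at most in case, so use case as a
    // tie-breaker. Assumption: lowercase sorts before uppercase.
    for (std::size_t i = 0; i < n; ++i) {
        bool u1 = std::isupper(static_cast<unsigned char>(s1[i])) != 0;
        bool u2 = std::isupper(static_cast<unsigned char>(s2[i])) != 0;
        if (u1 != u2) return u1 ? 1 : -1;
    }
    return 0;
}

// The two cases from the question, checked against both orderings.
void check_examples() {
    assert(byte_compare("A", "a") < 0);    // C locale: "A" < "a"
    assert(toy_collate("A", "a") > 0);     // tie broken by case: "A" > "a"
    assert(byte_compare("A0", "ad") < 0);  // C locale: 'A' < 'a' decides
    assert(toy_collate("A0", "ad") < 0);   // case ignored, '0' < 'd' decides
}
```

Calling check_examples() from main should pass without tripping an assertion. The point of the sketch is that the first characters A and a are never compared on their own when a later character already differs.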