
I compared strings using the en_US.utf8 locale and found some strange behavior.

Let's take:

std::string a = "A";
std::string b = "a";

The C locale (the default) says that a < b, whereas en_US.utf8 says that a > b. However, when

std::string a = "A0";
std::string b = "ad";

both locales give the result a < b.

Code to check:

#include <iostream>
#include <locale>
#include <string>

// Compare s1 and s2 with the collate facet of locale l and print the
// result: -1 (s1 < s2), 0 (equal), or 1 (s1 > s2).
void cmp_strs(const std::locale &l, const std::string &s1,
              const std::string &s2) {
  const auto &f = std::use_facet<std::collate<char>>(l);

  std::cout << l.name() << ": ";
  std::cout << f.compare(s1.data(), s1.data() + s1.size(),
                         s2.data(), s2.data() + s2.size()) << '\n';
}

int main() {
    std::cout << "A v a\n";
    std::string a = "A";
    std::string b = "a";

    cmp_strs(std::locale("C"), a, b);
    // std::locale throws std::runtime_error if en_US.utf8 is not installed.
    cmp_strs(std::locale("en_US.utf8"), a, b);

    std::cout << '\n';

    std::cout << "A0 v ad\n";
    a = "A0";
    b = "ad";

    cmp_strs(std::locale("C"), a, b);
    cmp_strs(std::locale("en_US.utf8"), a, b);
}

See the documentation for the compare function (std::collate<char>::compare).

What is weird is that "A0" and "ad" should be decided by their first characters, A and a, and for them en_US.utf8 returns "less". But when I compare just those first characters (the first case), it returns "greater".

  • Note that en_US.utf8 locale is (at least partially) case-insensitive and C locale is not. Commented Jul 21, 2024 at 10:56
  • The results are consistent with using case as the secondary tie-breaker, when the strings would be equal by case-insensitive comparison. See also: Unicode collation algorithm. It is much more elaborate than just comparing strings character by character. Commented Jul 21, 2024 at 15:37
  • @clstrfsck Thank you, I suppose it's an answer. Could you please write a full answer, not a comment so I can mark it? Can I also ask you to mention @Igor Tandetnik who provided more details? Commented Jul 21, 2024 at 19:51
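
The tie-breaker idea from the comments can be sketched in a few lines. This is only an illustration, not how the locale is actually implemented: two_level_compare is a hypothetical helper, it handles ASCII only, and it assumes lowercase sorts before uppercase when strings are otherwise equal (which matches the "A" > "a" result reported above). The real Unicode collation algorithm has more levels and many more rules.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the standard library): a simplified,
// ASCII-only, two-level comparison. Returns -1, 0, or 1, in the same
// style as std::collate<char>::compare.
int two_level_compare(const std::string &s1, const std::string &s2) {
    auto lower = [](char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    };

    // Level 1: compare the whole strings while ignoring case.
    const std::size_t n = std::min(s1.size(), s2.size());
    for (std::size_t i = 0; i < n; ++i) {
        const char c1 = lower(s1[i]);
        const char c2 = lower(s2[i]);
        if (c1 != c2) return c1 < c2 ? -1 : 1;
    }
    if (s1.size() != s2.size()) return s1.size() < s2.size() ? -1 : 1;

    // Level 2: the strings are equal ignoring case, so the first position
    // where the case differs decides. Assumption: lowercase sorts first.
    for (std::size_t i = 0; i < n; ++i) {
        if (s1[i] != s2[i]) {
            return std::islower(static_cast<unsigned char>(s1[i])) ? -1 : 1;
        }
    }
    return 0;
}
```

Under this model, two_level_compare("A", "a") returns 1 because the strings tie at level 1 and case decides, while two_level_compare("A0", "ad") returns -1 because '0' versus 'd' already settles the result at level 1 and case is never consulted. That is the same pattern as the en_US.utf8 output in the question.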
