Strange string comparison using different locales (C and en_US.utf8)

I compared strings using the en_US.utf8 locale and found some strange behavior.

Let's take:

std::string a = "A";
std::string b = "a";

The C locale (the default one) says that a < b, while en_US.utf8 says that a > b. However, when

std::string a = "A0";
std::string b = "ad";

both locales report a < b.

Code to check:

#include <iostream>
#include <locale>
#include <string>


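// Compare s1 and s2 with the collate facet of locale l and print the result:
// -1 if s1 < s2, 0 if they are equal, 1 if s1 > s2.
void cmp_strs(const std::locale &l, const std::string &s1, const std::string &s2) {
  const auto &f = std::use_facet<std::collate<char>>(l);

  std::cout << l.name() << ": ";
  std::cout << f.compare(s1.data(), s1.data() + s1.size(),
                         s2.data(), s2.data() + s2.size());
  std::cout << "\n";
}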

int main() {

    std::cout << "A v a\n";
    std::string a = "A";
    std::string b = "a";
    
    cmp_strs(std::locale("C"), a, b);    
    cmp_strs(std::locale("en_US.utf8"), a, b);

    std::cout << '\n';

    std::cout << "A0 v ad\n";
    a = "A0";
    b = "ad";
    
    cmp_strs(std::locale("C"), a, b);    
    cmp_strs(std::locale("en_US.utf8"), a, b);
    
}

Compare function

What is weird here is that the comparison of "A0" and "ad" should be decided by their first characters, A and a, and en_US.utf8 reports "less". But when I compare just those first characters (the first case), it reports "greater".

asked Jul 21, 2024 at 10:54

ксения петренко


Note that the en_US.utf8 locale is (at least partially) case-insensitive and the C locale is not.

– clstrfsck
Commented Jul 21, 2024 at 10:56

The results are consistent with using case as the secondary tie-breaker, when the strings would be equal by case-insensitive comparison. See also: Unicode collation algorithm. It is much more elaborate than just comparing strings character by character.

– Igor Tandetnik
Commented Jul 21, 2024 at 15:37

@clstrfsck Thank you, I suppose that's the answer. Could you please post it as a full answer rather than a comment, so I can accept it? Could you also mention @Igor Tandetnik, who provided more details?

– ксения петренко
Commented Jul 21, 2024 at 19:51
