I am trying to implement a C++ Translate function for localization.

// Language package, containing key-value pairs of translation, e.g.,
// g_LanguagePack["HELLO@1"] = "Hello, {}!"
// The "@N" suffix indicates that this format string has N parameters.
// When there is no parameter, the suffix can be omitted.
std::unordered_map<std::string, std::string> g_LanguagePack;

// The translator function
template <typename... VA>
std::string Translate(const std::string& key, VA&&... params);

When invoked as, e.g., Translate("HELLO@1", "FOO"), it looks up the key in the language package and returns the localized string "Hello, FOO!".

key is guaranteed to be a compile-time string (so key's type may need to change). In practice, developers may pass the wrong number of parameters, or omit the @N suffix while still passing parameters. So I think a check is needed to ensure N == sizeof...(VA).

At first, I used static_assert in Checker, and it failed because the static assertion expression is not an integral constant expression. Then I learned from the question "User-defined literal string: compile-time length check" that I can use assert directly in consteval functions, and it works.

However, GPT said that calling assert in consteval functions is not recommended (I am not sure about that, since it compiles on both Clang and MSVC). If it is not recommended, what would be a better implementation?

template <std::size_t N, typename... VA>
consteval void Checker(const char (&key)[N], VA&&... va)
{
    std::string_view string_view(key, N - 1);
    std::size_t param_cnt = 0;
    auto indicator_index = string_view.find('@');
    if (indicator_index == std::string_view::npos) // no params
    {
        assert(sizeof...(VA) == 0);
    }
    else
    {
        // parse param_cnt_
        for (auto i = indicator_index + 1; string_view.begin() + i != string_view.end(); i++)
        {
            auto digit = string_view.at(i); // get digit
            assert('0' <= digit && digit <= '9');
            param_cnt = param_cnt * 10 + digit - '0';
        }
        assert(sizeof...(va) == param_cnt);
    }
}

int main(int argc, char* argv[])
{
    Checker("foo@2", 1, 2);
    Checker("foo@1", "string");
    return 0;
}

Then comes the tricky part. I tried to use Checker in Translate, unfortunately, it did not work. It's because key, when passed as parameter, is no longer guaranteed to be a compile-time constant.

template <std::size_t N, typename... VA>
std::string Translate(const char (&key)[N], VA&&... va)
{
  Checker(key, va...); // Function parameter 'key' with unknown value cannot be used in a constant expression
  // do translation
  return "";
}
  • Imo, strongly type the keys. LocalString<1> hello_foo_key={42}; This also lets you move the key lookup to link time instead of runtime + memory. Even better: LocalString<std::string, int> hello_name_and_count_key = {43}; makes it trivial to ensure the developer passed arguments of the right types. Commented May 15 at 1:31
  • The language package is loaded at runtime. Since the function Translate is always called with a literal key, and key contains param count information, what I want to do is to mimic the compile-time check facility of std::format while maintaining compatibility so that Translate("HELLO@1", "FOO"); still works. Commented May 15 at 2:25
  • As for assert in immediate functions: consteval/constexpr functions are inline, and inline functions shouldn't use assert because it might be an ODR violation unless you ensure NDEBUG is the same before you #include <cassert> every time the function is compiled. Also, it just won't error if NDEBUG is set for your assert macro. Commented May 15 at 8:36
  • Maybe take a look at what the Qt library does with their translations - that has worked well for many people for decades. Commented May 15 at 14:48
  • @Silver I assumed that was the case. But the key lookup to an Index+metadata can be compile time. It's only the index->translated string that's deferred to runtime Commented May 17 at 23:37

1 Answer

This is a very similar problem to what fmt::format and now std::format want to do: type-check the format string:

std::format("x={}"); // compile-time error (missing argument)
std::format("x={}", 1); // ok
std::format("x={:d}", "not a number"); // compile-time error (bad specifier)

The mechanism by which this works is pretty clever. You think of the signature to format as being:

template <typename... Args>
auto format(string_view, Args&&...) -> string;

But it's really this:

template <typename... Args>
auto format(format_string<Args...>, Args&&...) -> string;

where:

template <typename... Args>
struct basic_format_string {
    template <class S> requires std::convertible_to<S, std::string_view>
    consteval basic_format_string(S s) {
        std::string_view sv = s;
        // now parse the thing
    }
};

template <typename... Args>
using format_string = basic_format_string<type_identity_t<Args>...>;

That is: when you call format("x={}"), that is going to try to initialize basic_format_string<> from "x={}". That constructor is consteval, and it's in that constructor that the format string is parsed. If that parsing fails, you just do some non-constant-expression operation (a throw, for instance), and that will cause the whole expression to fail to compile.


So you just have to do the exact same thing:

template <typename... Args>
struct basic_format_string {
    std::string_view sv;

    template <class S> requires std::convertible_to<S, std::string_view>
    consteval basic_format_string(S s) : sv(s) {
        auto idx = sv.find('@');
        if (idx == sv.npos) {
            if (sizeof...(Args) != 0) {
                throw "expected no arguments";
            }
        } else {
            int v = 0;
            // Note: std::from_chars is constexpr only since C++23.
            auto [p, ec] = std::from_chars(sv.data() + idx + 1, sv.data() + sv.size(), v);
            if (ec == std::errc() and p == sv.data() + sv.size()) {
                if (v < 0 or sizeof...(Args) != static_cast<std::size_t>(v)) {
                    throw "wrong number of arguments";
                }
            } else {
                throw "invalid arg";
            }
        }
    }
};

template <typename... Args>
using format_string = basic_format_string<std::type_identity_t<Args>...>;

template <typename... VA>
std::string Translate(format_string<VA...> fmt, VA&&... va)
{
    // use fmt.sv and va...
    return "something";
}

Which you can see work here:

int main() {
    Translate("foo@2", 1, 2); // ok
    Translate("foo@3", 1, 2); // compile-time error (wrong number of arguments)
}
2 Comments

Thanks for your explanation. I took a close look at the cppreference and STL implementation and indeed it is. I didn't notice this before.
Nice trick, for those who, like me, didn't understand why we need std::type_identity_t: stackoverflow.com/questions/68675444/…
