
Does GCC 4.7 on Linux/x86_64 have a default character encoding by which it validates and decodes the contents of string literals in C source files? Is this configurable?

Further, when it places the string data from string literals into the data section of the output, does it have a default execution character encoding? Is this configurable?

In any configuration, is it possible for the source character encoding to differ from the execution character encoding? (That is, will GCC ever transcode between character encodings?)
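
For concreteness, a minimal sketch of the scenario (the file name and byte values here are illustrative, not part of the original question): a literal containing one non-ASCII character, where the bytes that land in the object file depend both on how the file was saved and on what GCC does with it.

    /* question.c -- contains one non-ASCII character in a string literal.
     * Saved as ISO-8859-1, the "é" is the single byte 0xE9; saved as UTF-8,
     * it is the two bytes 0xC3 0xA9. Which bytes does GCC place in .rodata,
     * and is that choice configurable? */
    const char *greeting = "café";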

1 Answer


I don't know how well these options work in practice (I'm not using them at the moment; I still prefer treating string literals as ASCII-only, since localized strings come from external files anyway, so literals end up being mostly format strings or filenames), but GCC has added options like:

-fexec-charset=charset
Set the execution character set, used for string and character constants. The default
is UTF-8. charset can be any encoding supported by the system's iconv library routine. 

-fwide-exec-charset=charset
Set the wide execution character set, used for wide string and character constants.
The default is UTF-32 or UTF-16, whichever corresponds to the width of wchar_t. As
with -fexec-charset, charset can be any encoding supported by the system's iconv
library routine; however, you will have problems with encodings that do not fit
exactly in wchar_t.

-finput-charset=charset
Set the input character set, used for translation from the character set of the
input file to the source character set used by GCC. If the locale does not specify,
or GCC cannot get this information from the locale, the default is UTF-8. This can
be overridden by either the locale or this command line option. Currently the command
line option takes precedence if there's a conflict. charset can be any encoding
supported by the system's iconv library routine. 
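
To illustrate how these flags combine (a minimal sketch, not part of the original answer; it assumes a Linux/glibc system whose iconv supports ISO-8859-1): save the file below in Latin-1 and let GCC transcode the literal into the UTF-8 execution character set.

    /* latin1.c -- save this file in ISO-8859-1 (Latin-1).
     * Compile with:
     *   gcc -finput-charset=ISO-8859-1 -fexec-charset=UTF-8 latin1.c -o latin1
     * In the source file the literal below is the single byte 0xE9. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        const char *s = "é";
        const unsigned char *p;

        /* With a UTF-8 execution character set the literal should come out
         * as the two bytes 0xC3 0xA9, so strlen should report 2. */
        printf("%zu byte(s):", strlen(s));
        for (p = (const unsigned char *)s; *p != '\0'; ++p)
            printf(" 0x%02X", *p);
        printf("\n");
        return 0;
    }

Compiling the same file with -fexec-charset=ISO-8859-1 instead should leave the single 0xE9 byte in the binary, which is exactly the transcoding the question asks about; wide literals (L"...") are covered by -fwide-exec-charset in the same way.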

3 Comments

I wonder whether, when the source and execution encodings are both the default UTF-8, GCC actually validates the string literal as well-formed UTF-8 and raises an error on invalid byte sequences - or whether it just lets the invalid bytes pass through.
@AndrewTomazos I would also be very interested in this. Did you ever determine if it performs this validation?
@alrav since this is something that's not defined by the standard, it's probably free to change from one compiler version to the next. I wouldn't trust an answer someone came up with 10 years ago.
