
Why do GCC and Clang produce different output with this conforming C code:

int (puts) (); int (main) (main, puts) int main;
char *puts[(&puts) (&main["\0April 1"])]; <%%>

Neither compiler produces any warning or error even with -Wall -std=c18 -pedantic, but the program produces no output when built with GCC but prints the current date when built with Clang.

  • Use -Wstrict-prototypes -Werror too; that'll put an end to the nonsense. Commented Apr 1, 2022 at 14:05
  • @JonathanLeffler: Given today's date, I think the nonsense is the whole point :) Commented Apr 1, 2022 at 14:10
  • If anyone wants their fun spoiled, here is a less obfuscated example. Commented Apr 1, 2022 at 14:17
  • @NateEldredge The fun just begins. Commented Apr 1, 2022 at 14:20
  • In which respect godbolt.org/z/nd4oxro74, with int arg[foo()][foo()];, is similarly fun: gcc prints Hello once and clang prints it twice. I'm leaning toward gcc being wrong but careful parsing of the standard might be needed, if it has a clear answer at all. Commented Apr 1, 2022 at 15:09

2 Answers

Why do GCC and Clang produce different output with this conforming C code:

int (puts) (); int (main) (main, puts) int main;
char *puts[(&puts) (&main["\0April 1"])]; <%%>

In the first place, it is conforming code, though it does make use of a variable-length array, which is an optional language feature in C11 and C17. Some of the obfuscations are:

  • use of the obscure digraphs <% and %>, which mean the same thing as { and }, respectively.
  • parenthesizing the function identifiers in function declarations
  • a forward declaration of function puts that is not a prototype
  • a K&R-style definition of function main
    • with a VLA parameter
      • whose dimension expression contains a function call
      • and a reference to another parameter
  • use of unconventional identifiers for the parameters to function main()
  • use of the same identifiers (puts and main) to declare both functions and objects (the parameters of main())
  • use of the identifier main for something more than the program's entry-point function
  • inversion of the conventional order of the operands of the indexing operator ([])
    • plus, indexing a string literal
  • calling a function via an explicit function pointer constant expression
  • a string literal with an explicit null character within
  • unconventional placement (and omission) of line breaks

A less obfuscated equivalent would be:

int puts();

int main(
    int argc,
    char *argv[ puts("\0April 1" + argc) ]
) {
}

But the central question about the difference in behavior between the version compiled with GCC and the one built with Clang comes down to whether the expression for the size of the VLA function parameter is evaluated at runtime.

The language spec says that when a function parameter is declared with array type, its type is "adjusted" to the corresponding pointer type. That applies equally to complete, incomplete, and variable-length array types, but the spec does not explicitly say that the expression(s) for the dimension(s) are not evaluated. It does specify that expressions go unevaluated in certain other cases, and it even makes an exception to such a rule in the case of sizeof expressions involving VLAs, so the omission in this case could be interpreted as meaningful.

That makes a difference only for parameters of VLA type, because only for those can evaluation of the dimension expression(s) produce side effects on the machine state, including, but not limited to, observable program behavior.

GCC does not evaluate the VLA parameter's size expression at runtime, and I am inclined to take this as conforming to the intent of the standard. As a result, the GCC-compiled program does nothing but exit with status 0.

Clang does evaluate the VLA parameter's size expression at runtime. Although I disfavor this interpretation of the spec, I cannot rule it out. When it evaluates the size expression, it uses the passed value of the first parameter. When the program is run without arguments, the first parameter has the value 1, so the standard library's puts function is called with a pointer to the 'A' in "\0April 1" and prints April 1.


5 Comments

Interestingly, GCC also evaluates the size if the modern-style parameter declarations are used.
Interesting indeed, @HolyBlackCat. Although I am prepared to accept either interpretation of whether the dimension expression is evaluated, I think GCC is taking unwarranted liberties by applying a different interpretation for one parameter-declaration style than for the other.
In the case of an argument of type pointer to variable-length array, int (*arrayptr)[foo()], the function foo() must be called because the code needs to know the size of *arrayptr to do correct pointer arithmetic. So your interpretation seems to have the counterintuitive consequence that in void blah(int twodim[foo()][bar()]) { }, we'd have that bar() is called and foo() is not.
@NateEldredge, yes, my interpretation would have that consequence. I don't find it especially counterintuitive. At least, not more so than function calls in a function's parameter list in general.
I congratulate you on the amount of time and effort spent deciphering this code salad. (:
int (puts) ();
int (main) (main, puts)
    int main;
    char *puts[(&puts) (&main["\0April 1"])];
{
}

Somebody's got a compiler bug; I'm just not sure who anymore. I don't understand why any compiler would emit code to evaluate the size expression of a VLA parameter.

The clang output is rather bizarre. For it to work, it would have had to find main in the function's scope but puts in the global scope despite having already encountered the declaration for puts. Normally, you can access a variable in its own declaration.

If somebody did this in production code my answer would be rather: "Stop using K&R function definitions."

3 Comments

If you rewrite to use the modern parameters, GCC also prints the string: godbolt.org/z/67TYqWo9o
@HolyBlackCat: Well that's a double-take.
RE “it would have had to find main in the function's scope but puts in the global scope”: This aspect of the code is fully defined in the C standard. Per C 2018 6.2.1 7, the scope of an identifier other than a tag or enumeration constant begins after its declarator. And the declarator includes array brackets and function call parentheses. So you can use an identifier in its initializer(s), but, as long as you are still in the declarator, the identifier can refer to something in the enclosing scope.
