I'm writing a tiny HTTP server using C++ (just for fun).
When receiving a request from a client, should I worry about the charset of the HTTP headers? Is it guaranteed that all of them consist only of one-byte ASCII characters?
Is it guaranteed that all of them consist only of one-byte ASCII characters?
No. HTTP runs over TCP, which transports arbitrary octets, so octets >= 128 can appear in a request.
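Since octets >= 128 can show up, a server may want to detect them before deciding how to treat a header line. Here is a minimal sketch of such a check; the function name `is_ascii_only` is made up for illustration, and what to do with a non-ASCII line (reject it, or pass it through as opaque bytes) is left to the caller.

```cpp
#include <algorithm>
#include <string_view>

// Hypothetical helper (not from any library): returns true if every octet
// of a raw header line is 7-bit ASCII.
bool is_ascii_only(std::string_view raw) {
    return std::all_of(raw.begin(), raw.end(), [](char c) {
        // Plain char may be signed, so compare through unsigned char;
        // otherwise octets >= 128 would look like negative values.
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```

For example, `is_ascii_only("Host: example.com")` is true, while a line containing the single octet 0xE9 is not.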
Does HTTP allow non-ASCII characters?
Yes. See the ABNF for field-content (RFC 2616, Section 4.2) and quoted-string (RFC 2616, Section 2.2).
Does HTTP define the encoding?
More or less, by stating that non-ISO-8859-1 characters require an additional layer of encoding (again, from Section 2.2):
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
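If you decide to read descriptive field values as ISO-8859-1, as the quoted text implies, each octet maps directly to the Unicode code point with the same number. The sketch below re-encodes such a value as UTF-8. The name `latin1_to_utf8` is hypothetical, and the function does not attempt to decode RFC 2047 encoded-words; it only covers the plain ISO-8859-1 case.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: treat each input octet as an ISO-8859-1 code point
// (U+0000..U+00FF) and re-encode it as UTF-8.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        const unsigned char u = static_cast<unsigned char>(c);
        if (u < 0x80) {
            // ASCII range: identical in ISO-8859-1 and UTF-8.
            out.push_back(static_cast<char>(u));
        } else {
            // U+0080..U+00FF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (u >> 6)));
            out.push_back(static_cast<char>(0x80 | (u & 0x3F)));
        }
    }
    return out;
}
```

With this, the octet 0xE9 (é in ISO-8859-1) becomes the two octets 0xC3 0xA9, and pure ASCII input comes back unchanged.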
Is this used in practice?
Yes. For instance, in Content-Disposition.
Is this a good idea?
No, because many recipients and intermediaries get this wrong.
That's a great question. I don't know the answer for certain, but I believe you will find it here: http://www.w3.org/Protocols/rfc2616/rfc2616.html
That document says that headers follow RFC 822 (http://www.ietf.org/rfc/rfc0822.txt), and RFC 822 specifies ASCII, so I think you can rely on the headers being ASCII.