1

I'm writing a tiny HTTP server using C++ (just for fun).

When receiving a request from a client, should I worry about the charset of the HTTP headers? Is it guaranteed that all of them consist only of one-byte ASCII characters?

2 Answers

3

Is it guaranteed that all of them consist only of one-byte ASCII characters?

No. HTTP uses TCP, so octets >= 128 can be transferred.
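For a C++ server, one way to act on this is to treat header values as raw octets and check for bytes >= 128 explicitly rather than assuming ASCII. A minimal sketch, assuming C++17; the helper name `is_ascii_only` is hypothetical and not part of any library:

```cpp
#include <algorithm>
#include <cassert>
#include <string_view>

// Hypothetical helper: returns true if every octet of a header value is
// below 128. The cast matters because plain char may be signed, in which
// case octets >= 128 would compare as negative values.
bool is_ascii_only(std::string_view value) {
    return std::all_of(value.begin(), value.end(), [](char c) {
        return static_cast<unsigned char>(c) < 0x80;
    });
}
```

A server could use a check like this to decide whether a value can be handled as plain ASCII or needs the extra care discussed below.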

Does HTTP allow non-ASCII characters?

Yes. See the ABNF for field-content (RFC 2616, Section 4.2) and quoted-string (RFC 2616, Section 2.2).

Does HTTP define the encoding?

More or less, by stating that non-ISO-8859-1 characters require an additional layer of encoding (again, from 2.2):

The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
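If a server chooses to interpret such octets as ISO-8859-1, as the quoted rule implies, converting them to UTF-8 for internal use is mechanical, because each ISO-8859-1 byte value equals its Unicode code point. A rough sketch, assuming C++17; the name `latin1_to_utf8` is made up for illustration, and the function does not decode RFC 2047 encoded-words:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: reinterpret ISO-8859-1 octets as UTF-8.
// Octets below 0x80 are copied unchanged; octets 0x80..0xFF become
// a two-byte UTF-8 sequence for the code point of the same value.
std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        const unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));   // lead byte
            out.push_back(static_cast<char>(0x80 | (b & 0x3F))); // continuation byte
        }
    }
    return out;
}
```

For example, the octet 0xE9 (é in ISO-8859-1) becomes the two octets 0xC3 0xA9.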

Is this used in practice?

Yes. For instance, in Content-Disposition.

Is this a good idea?

No, because many recipients and intermediaries get this wrong.

8 Comments

I don't get it. According to RFC 822 (see the BNF on page 9, where CHAR is an ASCII character), headers are ASCII. Do you mean custom headers?
Comrade - How is RFC 822 relevant?
Section 4.2 calls out RFC 822: w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2. It sounds to me like the standard is being ignored.
Methinks some reference is in order here. Where have you seen non-ASCII chars in headers, and what spec are they following, so that we can write some code around this?
Bob: "follow the same generic format" doesn't mean they are the same. RFC 2616 itself defines the format, and it definitely allows code points >= 128. I'm not saying it is a good idea, but it's what the spec says. Re: "Where have you seen non-ascii chars in headers and what spec are they following such that we can write some code around this." - the spec is RFC 2616. Example header fields where I have seen non-ASCII are Content-Disposition (in the Filename parameter) and WWW-Authenticate (in the Realm parameter).
0

That's a great question. I don't know the answer but would like to. I believe you will find it here: http://www.w3.org/Protocols/rfc2616/rfc2616.html

That doc says that Headers follow RFC822 (http://www.ietf.org/rfc/rfc0822.txt) and that one says ASCII. I'm thinking that you can rely upon the ASCIIness of it all.

2 Comments

It seems that you're right: "Each header field can be viewed as a single, logical line of ASCII characters, comprising a field-name and a field-body." Thanks a lot!
The answer above is incorrect. RFC 2616 mentions RFC 822, but this is not a normative statement, just a reference to a prior, similar format. The ABNF in RFC 2616 makes it clear that you can have octets >= 128.
