Description
Background and Motivation
HTTP header values should contain only printable ASCII characters according to the HTTP RFCs, but in practice services accept or require other characters and encodings, mainly Latin1 and UTF8.
Currently, SocketsHttpHandler only allows sending ASCII characters in header values. When receiving headers, it assumes Latin1 and therefore silently corrupts the characters if UTF8 was meant.
MultipartContent also assumes Latin1 and silently corrupts anything else.
On .NET Framework, Latin1 is assumed and anything else is silently corrupted.
There is a real-world need to support sending and receiving non-ASCII headers.
This proposal adds APIs that allow the user to specify the Encoding to be used, while keeping the default behavior unchanged.
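The silent corruption described above can be reproduced with the built-in encodings alone. The following is a minimal sketch, not code from the handler itself: the class and method names are hypothetical, and it assumes .NET 5 or later, where `Encoding.Latin1` is available.

```csharp
using System;
using System.Text;

// Hypothetical demo type; not part of the proposed API.
public static class HeaderDecodingDemo
{
    // Decodes raw header bytes the way a receiver that assumes Latin1 would.
    public static string DecodeAsLatin1(byte[] raw) => Encoding.Latin1.GetString(raw);

    // Decodes the same bytes as UTF8, which is what the sender may have intended.
    public static string DecodeAsUtf8(byte[] raw) => Encoding.UTF8.GetString(raw);
}
```

For example, the UTF8 bytes of "é" are 0xC3 0xA9. `DecodeAsLatin1` maps each byte to its own character and returns the two-character string "Ã©", while `DecodeAsUtf8` returns the original "é". No exception is thrown in either case, which is why the corruption goes unnoticed.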
Proposed API
namespace System.Net.Http
{
    // "RequestHeaderEncodingSelector" in Kestrel
    public sealed class SocketsHttpHandler
    {
        public Func<string, HttpRequestMessage, Encoding?>? RequestHeaderEncodingSelector { get; set; }
        public Func<string, HttpResponseMessage, Encoding?>? ResponseHeaderEncodingSelector { get; set; }
    }

    public class MultipartContent
    {
        public Func<string, HttpContent, Encoding?>? HeaderEncodingSelector { get; set; }
    }
}

If no selector callback is specified or if null is returned for the Encoding, the current default behavior is used.
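The fallback rule above (no selector, or a selector that returns null, means the default) can be sketched as a small helper. This is an illustration only: `EncodingSelection.Resolve` and its `fallback` parameter are hypothetical names, and the actual handler may resolve the encoding differently.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical helper; not part of the proposed API surface.
public static class EncodingSelection
{
    // Returns the selector's choice for the given header name, or the
    // fallback when the selector is missing or returns null.
    public static Encoding Resolve<TContext>(
        Func<string, TContext, Encoding?>? selector,
        string headerName,
        TContext context,
        Encoding fallback)
        => selector?.Invoke(headerName, context) ?? fallback;
}
```

With this shape, a selector only needs to return a value for the headers it cares about. Every other header keeps the existing behavior, which is what keeps the default unchanged.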
Usage Examples
var handler = new SocketsHttpHandler()
{
    // Treat all request headers as UTF8
    RequestHeaderEncodingSelector = delegate { return Encoding.UTF8; }
};
var httpClient = new HttpClient(handler);

var handler = new SocketsHttpHandler()
{
    // Allow UTF8 for some custom headers
    RequestHeaderEncodingSelector = (name, request) =>
        request.RequestUri?.Host == "contoso.com" && name.StartsWith("X-Custom-")
            ? Encoding.UTF8
            : null
};
Risks
If a user provides a custom Encoding that outputs CR or LF bytes for characters other than CR/LF, the request could be malformed.
If we so choose, we can guard against this at a slight overhead when the provided Encoding isn't a built-in known-safe one (ASCII/Latin1/UTF8). Using anything other than those three is expected to be very rare.
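One way such a guard could look is sketched below. It is an assumption about how the check might be written, not the handler's implementation: `HeaderEncodingGuard` and its methods are hypothetical, and the sketch assumes the header value has already been validated to contain no CR or LF characters.

```csharp
using System;
using System.Text;

// Hypothetical guard; names and shape are illustrative only.
public static class HeaderEncodingGuard
{
    // Code pages of the built-in encodings treated as known-safe:
    // 20127 = ASCII, 28591 = Latin1, 65001 = UTF8.
    public static bool IsKnownSafe(Encoding encoding)
    {
        int codePage = encoding.CodePage;
        return codePage == 20127 || codePage == 28591 || codePage == 65001;
    }

    // Returns true when encoding the value yields a CR or LF byte.
    // Assumes the value itself contains no CR/LF characters.
    public static bool ProducesStrayNewline(Encoding encoding, string value)
    {
        if (IsKnownSafe(encoding))
        {
            return false; // Skip the scan for the common case.
        }

        foreach (byte b in encoding.GetBytes(value))
        {
            if (b == (byte)'\r' || b == (byte)'\n')
            {
                return true;
            }
        }
        return false;
    }
}
```

As an example of the risk, `Encoding.Unicode` (UTF-16LE) encodes U+010A as the bytes 0x0A 0x01, so `ProducesStrayNewline(Encoding.Unicode, "\u010A")` returns true even though the string contains no line feed character. The byte scan runs only for encodings outside the known-safe set, which matches the "slight overhead" described above.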
Note
The API (or a switch to allow non-ASCII Latin1 characters in request headers) would have to be backported to .NET Core 3.1.
Fixes #37024