API Proposal: Http header value Encoding selection #38711

@MihaZupan

Background and Motivation

According to the RFC, HTTP header values should contain only printable ASCII characters, but in practice services accept or require other characters and encodings, mainly Latin1 and UTF8.

Currently, SocketsHttpHandler only allows sending ASCII characters in header values. When receiving headers, it assumes Latin1 and therefore silently corrupts the characters if UTF8 was meant.

MultipartContent also assumes Latin1 and silently corrupts anything else.

On .NET Framework, Latin1 is assumed and anything else is silently corrupted.
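
The silent corruption described above can be reproduced without any networking. The following is a minimal sketch, assuming .NET 5 or later (for `Encoding.Latin1`); `DecodeAsLatin1` is a hypothetical helper that only simulates a receiver decoding header bytes as Latin1, it is not the handler's actual code path.

```csharp
using System;
using System.Text;

// Hypothetical helper: simulates a receiver that assumes Latin1 for header bytes.
static string DecodeAsLatin1(byte[] headerValueBytes) =>
    Encoding.Latin1.GetString(headerValueBytes);

// A sender that meant UTF8 puts these bytes on the wire: 63 61 66 C3 A9
byte[] wire = Encoding.UTF8.GetBytes("café");

// Decoding with the wrong encoding does not throw; each byte becomes one character.
string received = DecodeAsLatin1(wire);

// Decoding with the encoding the sender meant restores the original text.
string intended = Encoding.UTF8.GetString(wire);

Console.WriteLine(received == "caf\u00C3\u00A9"); // True: 'é' turned into two characters, U+00C3 and U+00A9
Console.WriteLine(intended == "café");            // True
```

No exception is thrown at any point, which is why the corruption goes unnoticed until the value is displayed or compared.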

There is a real-world need to support sending and receiving non-ASCII headers.
This proposal adds APIs that let the user specify the Encoding to use for each header, while keeping the default behavior unchanged.

Proposed API

namespace System.Net.Http
{
    // "RequestHeaderEncodingSelector" in Kestrel

    public sealed class SocketsHttpHandler
    {
        public Func<string, HttpRequestMessage, Encoding?>? RequestHeaderEncodingSelector { get; set; }
        public Func<string, HttpResponseMessage, Encoding?>? ResponseHeaderEncodingSelector { get; set; }
    }
    
    public class MultipartContent
    {
        public Func<string, HttpContent, Encoding?>? HeaderEncodingSelector { get; set; }
    }
}

If no selector callback is specified or if null is returned for the Encoding, the current default behavior is used.
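
The fallback rule above could be sketched as follows. `ResolveEncoding` and its `defaultEncoding` parameter are hypothetical names introduced only for illustration; the proposal does not specify how the handler implements the lookup, and ASCII is used as the default here only because that is the current limit for request headers described above.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper: a missing selector and a null result both fall back to the default.
static Encoding ResolveEncoding(
    Func<string, HttpRequestMessage, Encoding?>? selector,
    string headerName,
    HttpRequestMessage request,
    Encoding defaultEncoding)
{
    return selector?.Invoke(headerName, request) ?? defaultEncoding;
}

// Selector that opts in to UTF8 for some custom headers only.
Func<string, HttpRequestMessage, Encoding?> selector = (name, _) =>
    name.StartsWith("X-Custom-", StringComparison.OrdinalIgnoreCase) ? Encoding.UTF8 : null;

using var request = new HttpRequestMessage(HttpMethod.Get, "https://contoso.com/");

Encoding custom = ResolveEncoding(selector, "X-Custom-Name", request, Encoding.ASCII);
Encoding other = ResolveEncoding(selector, "Accept", request, Encoding.ASCII);
Encoding noSelector = ResolveEncoding(null, "X-Custom-Name", request, Encoding.ASCII);

Console.WriteLine(custom.WebName);     // selector returned UTF8
Console.WriteLine(other.WebName);      // selector returned null, default is used
Console.WriteLine(noSelector.WebName); // no selector set, default is used
```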

Usage Examples

var handler = new SocketsHttpHandler()
{
    // Treat all request headers as UTF8
    RequestHeaderEncodingSelector = delegate { return Encoding.UTF8; }
};

var httpClient = new HttpClient(handler);

var handler = new SocketsHttpHandler()
{
    // Allow UTF8 for some custom headers sent to a specific host
    RequestHeaderEncodingSelector = (name, request) => request.RequestUri?.Host == "contoso.com" && name.StartsWith("X-Custom-") ? Encoding.UTF8 : null
};

Risks

Should a user provide a custom Encoding that outputs new line bytes for characters other than CR/LF, the request could be malformed.

If we so choose, we can guard against this at a slight overhead when the provided Encoding isn't one of the built-in known-safe ones (ASCII/Latin1/UTF8). Using anything other than those three is expected to be very rare.
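
One possible shape for such a guard is sketched below, assuming .NET 5 or later. `IsKnownSafe` and `EncodeHeaderValue` are hypothetical names, and the reference-equality check is only one way to recognize the built-in encodings; the sketch also assumes that CR/LF characters in the string itself are rejected by separate header validation. UTF-16 (`Encoding.Unicode`) serves as the example of a risky encoding because it encodes U+010A as the bytes 0A 01.

```csharp
#nullable enable
using System;
using System.Text;

// Hypothetical check: the three built-in encodings never emit CR/LF bytes
// for characters other than CR/LF, so they can skip the scan.
static bool IsKnownSafe(Encoding encoding) =>
    ReferenceEquals(encoding, Encoding.ASCII) ||
    ReferenceEquals(encoding, Encoding.Latin1) ||
    ReferenceEquals(encoding, Encoding.UTF8);

// Hypothetical encoder: scans the output only when the encoding is not known-safe.
static byte[] EncodeHeaderValue(string value, Encoding encoding)
{
    byte[] bytes = encoding.GetBytes(value);
    if (!IsKnownSafe(encoding) &&
        bytes.AsSpan().IndexOfAny((byte)'\r', (byte)'\n') >= 0)
    {
        throw new FormatException("Encoded header value contains CR or LF bytes.");
    }
    return bytes;
}

// UTF8 takes the fast path and is never scanned.
byte[] ok = EncodeHeaderValue("café", Encoding.UTF8);
Console.WriteLine(ok.Length); // 5 bytes: 63 61 66 C3 A9

// UTF-16LE encodes U+010A as 0A 01, which would inject a line feed into the request.
try
{
    EncodeHeaderValue("\u010A", Encoding.Unicode);
}
catch (FormatException e)
{
    Console.WriteLine(e.Message);
}
```

The extra scan runs only for encodings outside the three built-in ones, which matches the expectation that the overhead would rarely be paid.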

Note

The API (or a switch to allow non-ASCII Latin1 characters in request headers) would have to be backported to .NET Core 3.1.

Fixes #37024
