I'd like to process a large JSON response (a large list of measurements) from a web server in a streaming fashion using JsonSerializer.DeserializeAsyncEnumerable(). The problem is that the array of measurements is wrapped in a header JSON document. For example:

public record Header(string Id, Measurement[] Measurements);
public record Measurement(string Timestamp, decimal Value);

The documentation for DeserializeAsyncEnumerable() specifies that it only works for root-level JSON arrays. Is it possible to still use this method and somehow skip the wrapping class?

I've looked into writing a custom JsonConverter, but that doesn't seem to solve the problem.

I've also tried to create a property of type IAsyncEnumerable<Measurement> on the Header but even if I don't iterate the collection it has already created all the measurement objects.

As for my scenario: I want to go through the file without actually loading the entire file into memory. Simple example: calculate an average over the measurement values. I don't need the header, but since it's produced by an external service I cannot change the contract. In the past, with an XML reader, I could do this relatively easily, but it appears it's not as easy with System.Text.Json.

1 Answer

As of .NET 8, System.Text.Json only implements streaming SAX-like parsing for root-level JSON arrays. As stated in Announcing .NET 6 Preview 4: Streaming deserialization:

JsonSerializer.DeserializeAsyncEnumerable... only supports reading from root-level JSON arrays, although that could be relaxed in the future based on feedback.

Unfortunately the restriction has not been relaxed as of .NET 8. For confirmation, see [API Proposal]: Support streaming deserialization of JSON objects #64182 which was closed as a duplicate of Developers should be able to pass state to custom converters. #63795 -- which is still open.
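
To make the restriction concrete, here is a minimal sketch of the one case that does work: a payload whose root token is the array itself. The in-memory stream and the sample values are assumptions made purely for illustration; in the question's scenario the array is nested under a header object, so this call cannot be pointed at that payload directly.

```csharp
using System;
using System.IO;
using System.Text.Json;

// Illustrative payload (an assumption): the root of the document is the array itself.
var bytes = JsonSerializer.SerializeToUtf8Bytes(new[]
{
    new Measurement("2024-01-01T00:00:00Z", 1.5m),
    new Measurement("2024-01-01T00:01:00Z", 2.5m),
});
using var stream = new MemoryStream(bytes);

int count = 0;
decimal total = 0;

// Each element is yielded as soon as it has been read from the stream,
// so the full list is never materialized.
await foreach (var measurement in JsonSerializer.DeserializeAsyncEnumerable<Measurement>(stream))
{
    if (measurement is null)
        continue; // Skip JSON null entries.
    count++;
    total += measurement.Value;
}

var average = count > 0 ? total / count : 0m;
Console.WriteLine($"Average over {count} measurements: {average}");

public record Measurement(string Timestamp, decimal Value);
```

The workarounds below are all ways of getting the same item-at-a-time processing when the array is not at the root.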

So what are some possible workarounds?

Firstly, you could use Utf8JsonStreamReader from this answer by mtosh to Parsing a JSON file with .NET core 3.0/System.text.Json to stream through the measurements, deserialize each one, and process it as required:

using var jsonStreamReader = new Utf8JsonStreamReader(stream, 32 * 1024);

int totalCount = 0;
decimal totalValue = 0;

while (jsonStreamReader.Read())
{
    if (jsonStreamReader.CurrentDepth == 1 && jsonStreamReader.TokenType == JsonTokenType.PropertyName)
    {
        var propertyName = jsonStreamReader.GetString();             
        if (string.Equals(propertyName, "Measurements", StringComparison.OrdinalIgnoreCase))
        {
            if (!jsonStreamReader.Read())
                throw new JsonException();
            if (jsonStreamReader.TokenType == JsonTokenType.StartArray)
            {
                while (jsonStreamReader.Read() && jsonStreamReader.TokenType != JsonTokenType.EndArray)
                {
                    var measurement = jsonStreamReader.Deserialize<Measurement>();
                    if (measurement is null)
                        continue; // Skip JSON null entries in the array.
                    // Do something with Measurement, such as compute the total measurement value and count.
                    totalCount++;
                    totalValue += measurement.Value;
                }
            }
        }
    }
}

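// Guard against an empty or missing Measurements array; dividing a decimal by zero throws.
var average = totalCount > 0 ? totalValue / totalCount : 0m;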

Demo fiddle #1 here.

Secondly, you could use a pseudo-collection that implements ICollection<Measurement> but only processes the added measurements without actually accumulating them.

E.g., define the following classes:

public class TotalMeasurementCollection : AggregatingCollection<Measurement>
{
    public int TotalAdded { get; set; }
    public decimal TotalValue { get; set; }

    // Do something with each Measurement as it is added, such as accumulating the count and total value.
    public override void Add(Measurement item)
    {
        TotalAdded++;
        TotalValue += item.Value;
    }
}

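// A collection that stores nothing: it always appears empty, and subclasses override Add() to process each item.
public class AggregatingCollection<TItem> : ICollection<TItem>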
{
    public virtual void Add(TItem item) {}
    public bool Contains(TItem item) => false;
    public void CopyTo(TItem[] array, int arrayIndex) => ArgumentNullException.ThrowIfNull(array);
    public int Count => 0;
    public bool IsReadOnly => false;
    public bool Remove(TItem item) => false;
    public void Clear() {}
    public IEnumerator<TItem> GetEnumerator() => Enumerable.Empty<TItem>().GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public record Header<TCollection> (string Id, TCollection Measurements) where TCollection : IEnumerable<Measurement>;

And now you will be able to do:

var result = await JsonSerializer.DeserializeAsync<Header<TotalMeasurementCollection>>(stream);
var average = result!.Measurements!.TotalValue / result!.Measurements!.TotalAdded;

This trick is non-obvious but has the advantage of working with any serializer that populates collections by calling Add().

Demo fiddle #2 here.

Thirdly, you could switch to Json.NET, which natively supports streaming deserialization of individual objects inside a huge JSON file by using JsonTextReader to read through the file.
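
As a rough sketch of this third option, assuming the Newtonsoft.Json NuGet package is referenced, the loop below walks the tokens with JsonTextReader until it reaches the root-level "Measurements" array and then deserializes one element at a time. The sample payload and the in-memory stream are illustrative assumptions; with a real response you would wrap the response stream instead.

```csharp
using System;
using System.IO;
using System.Text;
using Newtonsoft.Json;

// Illustrative payload (an assumption), shaped like the question's Header: an object wrapping the array.
var json = JsonConvert.SerializeObject(new
{
    Id = "abc",
    Measurements = new[]
    {
        new Measurement("2024-01-01T00:00:00Z", 1.5m),
        new Measurement("2024-01-01T00:01:00Z", 2.5m),
    },
});
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(json));
using var textReader = new StreamReader(stream);
using var reader = new JsonTextReader(textReader);

var serializer = JsonSerializer.CreateDefault();
int count = 0;
decimal total = 0;

while (reader.Read())
{
    // Property names directly under the root object are reported at depth 1.
    if (reader.Depth == 1
        && reader.TokenType == JsonToken.PropertyName
        && string.Equals((string?)reader.Value, "Measurements", StringComparison.OrdinalIgnoreCase))
    {
        // Advance to the property value; ignore it unless it is an array.
        if (!reader.Read() || reader.TokenType != JsonToken.StartArray)
            continue;

        // Deserialize one element per iteration; the reader is left on the
        // element's last token, so the next Read() moves to the next element.
        while (reader.Read() && reader.TokenType != JsonToken.EndArray)
        {
            var measurement = serializer.Deserialize<Measurement>(reader);
            if (measurement is null)
                continue; // Skip JSON null entries.
            count++;
            total += measurement.Value;
        }
    }
}

var average = count > 0 ? total / count : 0m;
Console.WriteLine($"Average over {count} measurements: {average}");

public record Measurement(string Timestamp, decimal Value);
```

Only one Measurement is alive at a time here, so memory use should stay roughly constant regardless of how many elements the array contains.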

1 Comment

Thanks for the very complete answer. I hadn't thought of using this collection trick. It's a nice one!
