4

Is there a way to get the progress of the ReadAsStringAsync() method? I am just getting the HTML content of a website and parsing.

public static async Task<returnType> GetStartup(string url = "http://")
{
    using (HttpClient client = new HttpClient())
    {
        client.DefaultRequestHeaders.Add("User-Agent",
            "Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko");
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            using (HttpContent content = response.Content)
            {
                string result = await content.ReadAsStringAsync();
            }
        }
    }
}
10
  • How long is the string? If the response size is big enough to warrant progress information (i.e. sized larger than a few megabytes) then you probably shouldn't be reading it as a String. Commented Nov 26, 2020 at 21:42
  • Also, what kind of progress information are you after, exactly? For a response sized smaller than ~10KB (i.e. a couple of TCP packets/ethernet-frames) then it's impossible to get a numerical percentage progress figure because it will jump from 0% to 100% in a single go. Commented Nov 26, 2020 at 21:43
  • @Dai the string size is between 3MB and 10MB Commented Nov 26, 2020 at 21:45
  • 1
    Without a Content-Length header it is impossible to indicate any kind of percentage progress. Commented Nov 26, 2020 at 21:59
  • 1
    None of those are relevant. Content-Length is required, without it you're SOL. Commented Nov 26, 2020 at 22:03

1 Answer

6

Is there a way to get the progress of the ReadAsStringAsync() method? I am just getting the html content of a website and parsing.

Yes and no.

HttpClient does not expose timing or progress information from the underlying network stack, but you can recover some of it by combining HttpCompletionOption.ResponseHeadersRead, the Content-Length header, and reading the response yourself with your own StreamReader (asynchronously, of course).

Do note that the Content-Length response header refers to the length of the content as transferred, i.e. the compressed content prior to decompression, not the original content length. This complicates things because most web servers today probably serve HTML (and static content) with gzip compression (as either Content-Encoding or Transfer-Encoding), so the Content-Length header will not tell you the length of the decompressed content. Unfortunately, while HttpClient can do automatic gzip decompression for you, it won't tell you what the decompressed content length is.
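
To make the compressed-versus-decompressed distinction concrete, here is a minimal offline sketch. It does not make a real request: `ContentLengthDemo` and `CreateGzipContent` are hypothetical names that build the kind of content object you would see for a gzip-encoded body when automatic decompression is disabled, so that the reported length can be compared with the original text size.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net.Http;
using System.Text;

public static class ContentLengthDemo
{
    // Hypothetical helper (an assumption for illustration, not an HttpClient API):
    // gzip-compresses `text` and wraps the compressed bytes in an HttpContent
    // labelled with Content-Encoding: gzip.
    public static HttpContent CreateGzipContent( String text )
    {
        Byte[] raw = Encoding.UTF8.GetBytes( text );
        using( MemoryStream ms = new MemoryStream() )
        {
            using( GZipStream gz = new GZipStream( ms, CompressionMode.Compress, leaveOpen: true ) )
            {
                gz.Write( raw, 0, raw.Length );
            } // Disposing the GZipStream flushes the final compressed block into `ms`.

            ByteArrayContent content = new ByteArrayContent( ms.ToArray() );
            content.Headers.ContentEncoding.Add( "gzip" );
            return content;
        }
    }

    // Prints the original size next to the Content-Length of the compressed body.
    public static void PrintSizes( String text )
    {
        using( HttpContent content = CreateGzipContent( text ) )
        {
            Console.WriteLine( "Original size:  {0:N0} bytes", Encoding.UTF8.GetByteCount( text ) );
            Console.WriteLine( "Content-Length: {0:N0} bytes", content.Headers.ContentLength );
        }
    }
}
```

Calling `ContentLengthDemo.PrintSizes( new String( 'a', 100000 ) )` should show a Content-Length far smaller than the original size, which is why a percentage based on Content-Length only makes sense when you count the bytes as transferred.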

But you can still report some kinds of progress back to your method's consumer. You should do this using the idiomatic .NET IProgress<T> interface rather than rolling your own.

Like so:

using System;
using System.Globalization;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

// NOTE: Automatic decompression is not enabled in this HttpClient so that Content-Length can be safely used. But this will drastically slow down content downloads.
private static readonly HttpClient _hc = new HttpClient()
{
    DefaultRequestHeaders =
    {
        { "User-Agent", "Mozilla/5.0 (compatible, MSIE 11, Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko" }
    }
};

public static async Task<String> GetStartupAsync( IProgress<String> progress, string url = "http://" )
{
    progress.Report( "Now making HTTP request..." );

    // ResponseHeadersRead returns as soon as the headers arrive, before the body is buffered.
    using( HttpResponseMessage response = await _hc.GetAsync( url, HttpCompletionOption.ResponseHeadersRead ).ConfigureAwait(false) )
    {
        progress.Report( "Received HTTP response. Now reading response content..." );

        String result;
        Int64? responseLength = response.Content.Headers.ContentLength;
        if( responseLength.HasValue )
        {
            using( Stream responseStream = await response.Content.ReadAsStreamAsync().ConfigureAwait(false) )
            using( StreamReader rdr = new StreamReader( responseStream ) )
            {
                // Note that `capacity` is in 16-bit UTF-16 chars, but responseLength is in bytes, though assuming UTF-8 it evens-out.
                // The initial allocation is capped at 16 MiB; the StringBuilder grows as needed beyond that.
                Int32 capacity = (Int32)Math.Min( responseLength.Value, 16 * 1024 * 1024 );
                StringBuilder sb = new StringBuilder( capacity );

                Char[] charBuffer = new Char[4096];
                while( true )
                {
                    Int32 read = await rdr.ReadAsync( charBuffer, 0, charBuffer.Length ).ConfigureAwait(false);
                    if( read == 0 )
                    {
                        // Reached end.
                        break;
                    }

                    sb.Append( charBuffer, 0, read );
                    progress.Report( String.Format( CultureInfo.CurrentCulture, "Read {0:N0} / {1:N0} chars (or bytes).", sb.Length, responseLength.Value ) );
                }

                result = sb.ToString();
            }
        }
        else
        {
            progress.Report( "No Content-Length header in response. Will read response until EOF." );

            result = await response.Content.ReadAsStringAsync().ConfigureAwait(false);
        }

        progress.Report( "Finished reading response content." );
        return result;
    }
}
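
If a numeric fraction is more useful than status strings, the same loop can count raw bytes against Content-Length. The following is only a sketch under stated assumptions: `StreamProgressDemo`, `ReadAllTextWithProgressAsync` and `SynchronousProgress<T>` are hypothetical names, the body is assumed to be UTF-8, and `totalBytes` is assumed to be the Content-Length of a response read without automatic decompression. It works on any Stream, so it can be tried with a MemoryStream before wiring it up to ReadAsStreamAsync().

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical IProgress<T> that invokes the callback inline. The built-in
// Progress<T> posts callbacks to a SynchronizationContext or the thread pool.
public sealed class SynchronousProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;

    public SynchronousProgress( Action<T> handler )
    {
        _handler = handler ?? throw new ArgumentNullException( nameof(handler) );
    }

    public void Report( T value ) { _handler( value ); }
}

public static class StreamProgressDemo
{
    // Hypothetical helper: reads `source` to the end, reporting the fraction
    // (0.0 to 1.0) of `totalBytes` read so far. Nothing is reported when
    // `totalBytes` is unknown, because no meaningful fraction exists then.
    public static async Task<String> ReadAllTextWithProgressAsync( Stream source, Int64? totalBytes, IProgress<Double> progress )
    {
        using( MemoryStream buffered = new MemoryStream() )
        {
            Byte[] buffer = new Byte[81920];
            Int64 bytesRead = 0;
            while( true )
            {
                Int32 read = await source.ReadAsync( buffer, 0, buffer.Length ).ConfigureAwait(false);
                if( read == 0 ) break; // End of stream.

                buffered.Write( buffer, 0, read );
                bytesRead += read;

                if( progress != null && totalBytes.HasValue && totalBytes.Value > 0 )
                {
                    progress.Report( Math.Min( 1.0, (Double)bytesRead / totalBytes.Value ) );
                }
            }

            // Assumption: the payload is UTF-8. A leading byte-order mark is not stripped here.
            return Encoding.UTF8.GetString( buffered.GetBuffer(), 0, (Int32)buffered.Length );
        }
    }
}
```

Counting bytes rather than chars keeps the numerator in the same unit as Content-Length, at the cost of holding the raw bytes in memory until the final decode.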

Notes:

  • In general, any async method or method returning a Task/Task<T> should be named with an Async suffix, so your method should be named GetStartupAsync, not GetStartup.
  • Unless you have an IHttpClientFactory available, you should not wrap an HttpClient in a using block because this can cause system resource exhaustion, especially in server applications.
    • (The reasons for this are complicated and may also differ depending on your .NET implementation (e.g. I believe Xamarin's HttpClient doesn't have this problem), but I won't go into details here).
    • So you can safely ignore any Code Analysis warning about not disposing of your HttpClient. This is one of the few exceptions to the rule about always disposing of any IDisposable objects that you create or own.
    • As HttpClient is thread-safe and this is a static method, consider using a cached static instance instead.
  • You don't need to wrap HttpResponseMessage.Content in a using block either, as the Content object is owned by the HttpResponseMessage.

5 Comments

As I said in the comment above, Content-Length is available. I guess I should go for ReadAsStreamAsync?
@Alejandro I have updated my answer to account for Content-Length.
You have too many typo mistakes. I cleared it up but you should also do it for the answer
@Alejandro The code in my answer is only intended as an illustrative example and it is not intended to be copied-and-pasted into production. You should never be blindly copying and pasting code from StackOverflow - or any other website for that matter.
I am not talking about me, I am talking about future references. I didn't really need any ReadAsStringAsync illustrative, my question was about ReadAsStringAsync not streamer. I am accepting the answer as I am finding out there is no solution with ReadAsStringAsync besides using a streamer. Anyway, as you wish
