I just posted this question, which was answered right away. The answer, in turn, raises the following new question:

If my understanding is correct, the StreamContent object of an HttpResponseMessage is created when an HTTP request is made via HttpClient.GetAsync. Its Headers property, or part of it, is set according to meta tags included in the HTML source.

For instance, a meta tag can tell the response object which charset to use for the file's contents.

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

Requesting a resource that contains such a line will produce an HttpResponseMessage.Content.Headers with this setting.

In the other question, referenced at the top of this one, I mention a response object being created without the correct encoding. The HTML source that produces this incorrectly encoded response does contain the setting that should make responses come out properly encoded:

<meta HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1255">

So why are responses from that site not given the charset setting included in the meta tag, and therefore rendered in an incorrect charset?

Here's a pictorial description of the question: both sites contain the meta tag with charset setting, but one, for some reason, misses it...

enter image description here


Fiddler's header details for both requests:

Working one: (removed cookie header)

Request:

GET http://www.ynet.co.il/home/0,7340,L-8,00.html HTTP/1.1
Host: www.ynet.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
If-Modified-Since: Thu, 31 Mar 2016 10:04:39 GMT

Response:

HTTP/1.1 200 OK
vg_id: 1
X-me: 06
Content-Type: text/html; charset=UTF-8
Last-Modified: Thu, 31 Mar 2016 10:38:57 GMT
Accept-Ranges: bytes
VX-Cache: HIT
WAI: 01
V-TTL: 0
backend-cache-control: 
Content-Length: 410685
Vary: Accept-Encoding
Date: Thu, 31 Mar 2016 10:38:48 GMT
Connection: keep-alive

Problematic one:

Request:

GET http://winedepot.co.il/ HTTP/1.1
Host: winedepot.co.il
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: __utma=201832727.725995063.1458660502.1459413977.1459418530.8; __utmz=201832727.1458660502.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); __utmc=201832727; ASPSESSIONIDCQTRQCAQ=FEOHEBFCBGABBKOBAHOGKBGB
Connection: keep-alive

Response:

HTTP/1.1 200 OK
Cache-Control: private
Content-Length: 118225
Content-Type: text/html
Server: Microsoft-IIS/7.5
X-Powered-By: ASP.NET
Date: Thu, 31 Mar 2016 10:36:21 GMT
  • I'm fairly sure that the HttpResponseMessage class does not parse the response HTML to read any meta tags. I might be wrong though. Are you very sure that the behavior you're seeing stems from those tags, and if so, how did you verify this? Commented Mar 31, 2016 at 9:42
  • This is an assumption, but one based on analyzing the results of the excerpt above. Commented Mar 31, 2016 at 9:50
  • Yeah but you don't show the entire HTTP response, so there is no way for us to verify that the character set doesn't actually come from a response header. Commented Mar 31, 2016 at 9:54
  • Which request header do you think could have an influence here? Don't forget that Content-Type is a response header only. I will add it to the screenshot, but I see nothing that seems related. Commented Mar 31, 2016 at 9:57
  • I'm not talking about request headers anywhere. Don't add screenshots, add it as text. Use Fiddler to obtain the request and response headers. Also, content-type can be used as a request header. Commented Mar 31, 2016 at 9:57

2 Answers

As you can see from your Fiddler screenshots, the HttpResponseMessage.Content.Headers.ContentType will contain exactly what was specified in the Content-type header of the response.

The HttpResponseMessage will not parse the response HTML and search for any <meta /> tags.
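To make this concrete with the two responses from the question: the charset can only come from the value of the Content-Type header. Here is a minimal Python sketch (not the .NET implementation; the helper name charset_from_content_type is made up for illustration) that reads the charset parameter from the two header values Fiddler captured.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_param returns None when the header has no charset parameter.
    return msg.get_param("charset")


# Header values copied from the two Fiddler responses in the question.
working = charset_from_content_type("text/html; charset=UTF-8")
problematic = charset_from_content_type("text/html")

print(working)      # UTF-8
print(problematic)  # None
```

The second site simply sends no charset in its header, so a header-based client has nothing to report, whatever the HTML body says.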

12 Comments

Thanks, but I do not see how this answers the question. I noticed the difference in the response headers in Fiddler. Why does one response header get a charset setting and the other not, when this parameter is defined in a meta tag in the HTML source, and both URIs' HTML sources contain it?
@Veverke my answer answers your question "Why do I see these content-type headers while I expect something else?". Your expectation is wrong. That this answer doesn't solve the underlying problem is not something I can change.
The HttpResponseMessage will not parse the response HTML. Fine, this means these tags have no influence on the creation of the response object. Still... here we go again: which other setting, then, is responsible for one response being created with UTF-8 and the other with none (default)?
I am sorry buddy but you will not tell me what my question is :-)
I am telling you the answer to what you're asking; you're simply not understanding it, which is not my problem. The HttpResponseMessage.Content.Headers.ContentType will contain exactly the value that the server sends in its Content-Type response header, and I have run out of ways to tell you that. You have no influence over that, and if that Content-Type header is actually wrong (i.e. the response body is actually encoded differently), then there's nothing you can do but detect or guess the actual encoding.

The content type comes from the HTTP header:

https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

<meta http-equiv='Content-Type' content='text/html; charset=utf-8' />

is part of the content and not part of the headers.

I suggest you install Fiddler to better understand what those requests actually do. Set Fiddler as your proxy and use the inspectors to see what is actually sent and received when you make HTTP requests.

A fuller explanation is beyond the scope here.
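If the header carries no charset, one possible workaround is to look for the meta tag in the raw bytes yourself before decoding. The sketch below is in Python for brevity and is only an assumption about how such a fallback could look. sniff_meta_charset and decode_body are hypothetical helpers, the 1024-byte scan window and the UTF-8 default are arbitrary choices, and the regex is a rough heuristic, not an HTML parser.

```python
import re

# Rough heuristic: find "charset=<name>" inside a <meta ...> tag in the raw bytes.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9_\-]+)""", re.IGNORECASE
)


def sniff_meta_charset(body, scan_limit=1024):
    """Return the charset named in a meta tag near the start of body, or None."""
    match = META_CHARSET.search(body[:scan_limit])
    return match.group(1).decode("ascii").lower() if match else None


def decode_body(body, header_charset=None, default="utf-8"):
    """Decode bytes using the header charset, then the meta tag, then a default."""
    charset = header_charset or sniff_meta_charset(body) or default
    try:
        return body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: fall back instead of failing.
        return body.decode(default, errors="replace")


# Simulated body of a page that declares windows-1255 only in its meta tag.
sample = (
    b'<html><head><meta HTTP-EQUIV="Content-Type" '
    b'CONTENT="text/html; charset=windows-1255"></head><body>'
    + "שלום".encode("cp1255")
    + b"</body></html>"
)
text = decode_body(sample, header_charset=None)
```

The header charset is tried first because it is what the server explicitly declared; the meta tag is consulted only when the header is silent.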

3 Comments

I did not get your point, Nahum. I am trying to figure out why one site is able to produce properly encoded HTTP responses and others are not. I gave examples of both cases. What is the reason for responses not being properly encoded? You say this has nothing to do with the meta tag? What is the reason, then?
By the way, I was aware from the beginning that Content-Type is part of the Content headers (see code sample).
Why do some people write bad code? Your browser is built to cope with people not following standards and writing bad code. That's simply what the sites return; you have no control over it. You'll have to work around it.
