
I'm trying to scrape a web page from a C# application, but the request keeps failing with

"The remote server returned an error: (404) Not Found."

The web page is accessible in a browser, but the app keeps failing. Any help is appreciated.

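using System;
using System.IO;
using System.Net;
using HtmlAgilityPack; // HtmlDocument comes from the Html Agility Pack

var d = DateTime.UtcNow.Date;
var checkout = d.AddDays(1); // d.Day + 1 would produce an invalid day (e.g. 32) at the end of a month
var addressFormat = @"http://www.booking.com/searchresults.html?src=searchresults&si=ai%2Cco%2Cci%2Cre%2Cdi&ss={0}&checkin_monthday={1}&checkin_year_month={2}&checkout_monthday={3}&checkout_year_month={4}";
var uri = String.Format(addressFormat, "Prague", d.Day, d.Year + "-" + d.Month, checkout.Day, checkout.Year + "-" + checkout.Month);
var request = (HttpWebRequest)WebRequest.Create(uri);
request.Timeout = 5000;
request.UserAgent = "Fiddler"; // I tried setting these three properties (UserAgent, Credentials, Proxy) so that none of them is null
request.Credentials = CredentialCache.DefaultCredentials;
request.Proxy = WebRequest.GetSystemWebProxy(); // replaces the obsolete WebProxy.GetDefaultProxy()
try
{
    var response = (HttpWebResponse)request.GetResponse();
}
catch (WebException e)
{
    var response = (HttpWebResponse)e.Response; // e.Response contains the web page, but it is incomplete
    if (response == null) throw; // no HTTP response at all (e.g. a timeout)
    using (var sr = new StreamReader(response.GetResponseStream()))
    {
        var doc = new HtmlDocument();
        doc.Load(sr);
        // "//" searches the whole document; "div[...]" alone only matches top-level children
        var a = doc.DocumentNode.SelectNodes("//div[@class='resut-details']"); // finds nothing: the desired nodes are not in the response
    }
}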

EDIT:

Hi guys, thanks for the suggestions.

I added the header "Accept-Encoding: gzip,deflate,sdch" following David Martin's reply, but that alone didn't help.

I used Fiddler to try to get more information about the problem, but it was the first time I had used that tool and it didn't make me any wiser. On the other hand, I tried changing request.UserAgent to the one my browser sends ("User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36") and voilà, I no longer get the 404 exception, but the document is not readable, as it is filled with characters like ¿½O~���G�. I tried setting request.TransferEncoding = "UTF-8", but to use this property, request.SendChunked must be set to true, which ends in

ProtocolViolationException

Additional information: Content-Length or Chunked Encoding cannot be set for an operation that does not write data.
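The garbled characters are consistent with a gzip-compressed body being read as text. HttpWebRequest.TransferEncoding sets the Transfer-Encoding header for a chunked *request* body, which is why it demands SendChunked and fails on a GET; it has no effect on how the response text is decoded. A minimal sketch, assuming you already hold the raw response bytes, of checking for the gzip signature (LooksLikeGzip and Gzip are hypothetical helper names):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: a gzip stream always starts with the magic bytes 0x1F 0x8B
// (RFC 1952), so a body beginning with them is compressed data, not text in the
// wrong character set.
static bool LooksLikeGzip(byte[] body) =>
    body.Length >= 2 && body[0] == 0x1F && body[1] == 0x8B;

// Hypothetical helper: builds a gzip payload in memory so the check can be tried offline.
static byte[] Gzip(string text)
{
    var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionMode.Compress, leaveOpen: true))
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        gzip.Write(raw, 0, raw.Length);
    } // disposing the GZipStream writes the gzip trailer into 'output'
    return output.ToArray();
}

Console.WriteLine(LooksLikeGzip(Gzip("<html></html>")));                   // True
Console.WriteLine(LooksLikeGzip(Encoding.UTF8.GetBytes("<html></html>"))); // False
```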

EDIT 2: I'm missing something and I can't figure out what. I'm getting an encoded response and need to decode it first to read it correctly. Even in Fiddler, when I want to see the response, I have to confirm decoding before I can inspect it. After decoding it in Fiddler, I see exactly the content I want to get into my application...
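Fiddler's decode step undoes the response's gzip Content-Encoding. A minimal sketch of doing the same by hand with GZipStream, assuming the server used gzip and the page is UTF-8 (DecodeGzipBody is a hypothetical helper name; the demo builds its payload in memory instead of calling booking.com):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: decompresses a gzip response body and decodes it as UTF-8.
// Assumes "Content-Encoding: gzip" and a UTF-8 page; disposing also closes 'compressed'.
static string DecodeGzipBody(Stream compressed)
{
    using var gzip = new GZipStream(compressed, CompressionMode.Decompress);
    using var reader = new StreamReader(gzip, Encoding.UTF8);
    return reader.ReadToEnd();
}

// Offline demo: compress a small HTML fragment, then decode it back.
var buffer = new MemoryStream();
using (var gz = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
{
    byte[] html = Encoding.UTF8.GetBytes("<p>Prague</p>");
    gz.Write(html, 0, html.Length);
}
buffer.Position = 0; // rewind so the decompressor reads from the start
Console.WriteLine(DecodeGzipBody(buffer)); // <p>Prague</p>
```

In the question's code this would be called as DecodeGzipBody(response.GetResponseStream()), and the resulting string could be passed to doc.LoadHtml(...).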

  • I suggest you look at what's coming back on the network with Fiddler2 or Wireshark. If the web server is sending a response of 404, then WebRequest is behaving entirely correctly. Commented May 2, 2014 at 11:49

1 Answer


So, after trying the suggestions from Jon Skeet and David Martin, I got further and found the relevant answer in another question. If anyone is looking for something similar, the answer is here:

.NET: Is it possible to get HttpWebRequest to automatically decompress gzip'd responses?
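The usual fix from that question is to let HttpWebRequest decompress the body itself via its AutomaticDecompression property. A minimal sketch, assuming the same request setup as the question with the browser User-Agent (CreateSearchRequest is a hypothetical helper name, and the URL is shortened for illustration):

```csharp
using System;
using System.Net;

// Hypothetical helper: builds the request like the question does, but asks
// HttpWebRequest to transparently decompress gzip/deflate response bodies.
static HttpWebRequest CreateSearchRequest(string uri)
{
    var request = (HttpWebRequest)WebRequest.Create(uri);
    request.Timeout = 5000;
    request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36";
    // With this set, the stream from GetResponse() yields already-decompressed bytes.
    request.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
    return request;
}

// Creating the request does not contact the server; only GetResponse() does.
var req = CreateSearchRequest("http://www.booking.com/searchresults.html?ss=Prague");
Console.WriteLine(req.AutomaticDecompression);
```

With decompression handled by the request, the manual Accept-Encoding header should no longer be needed. In newer code based on HttpClient, the equivalent switch is HttpClientHandler.AutomaticDecompression.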
