0

I am trying to automate a download from an HTML datasheet to generate a customized report. This is what I did with cURL:

// init cURL HTTP Client 
$header = array(); 
$header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,"; 
$header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"; 
$header[] = "Cache-Control: max-age=0"; 
$header[] = "Connection: keep-alive"; 
$header[] = "Keep-Alive: 300"; 
$header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"; 
$header[] = "Accept-Language: en-us,en;q=0.5"; 
$header[] = "Pragma: "; 

$ch = curl_init(); 
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.8.1.7) Gecko/20070914 Firefox/2.0.0.7'); 
curl_setopt($ch, CURLOPT_HTTPHEADER, $header); 
curl_setopt($ch, CURLOPT_COOKIEFILE, '/.cookies'); 
curl_setopt($ch, CURLOPT_COOKIEJAR,  '/.cookies'); 
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE); 
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE); 
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); 
curl_setopt($ch, CURLOPT_FAILONERROR, TRUE); 
curl_setopt($ch, CURLOPT_HEADER, TRUE); 
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 600); 

curl_setopt($ch, CURLOPT_URL, 'https:// ... /signin.html'); 
curl_setopt($ch, CURLOPT_POST, TRUE); 
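// http_build_query() URL-encodes the values, so special characters in the credentials do not break the POST body
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('username' => $login, 'password' => $pass)));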
$response = curl_exec($ch);

The login works fine and I can fetch many pages without any problems. Now I try to get the datasheet as follows:

curl_setopt($ch, CURLOPT_URL, 'https:// ... /data.html'); 
curl_setopt($ch, CURLOPT_POST, FALSE); 
curl_setopt($ch, CURLOPT_POSTFIELDS, ''); 
$response = curl_exec($ch);

But now I get the following response:

<html>
<head>
<script language='javascript'>function autoNavigate() {window.location="/data.html";}</script>
</head>
<body onload='autoNavigate()'></body>
</html>

The JavaScript call reloads the same page that I had just requested. In a browser this works fine, but when I request the same page again with "curl_exec($ch)", I get a 302 error.

Is there a way to refresh the page with cURL without a full reload? Or is there any other way to get the content of the page?

Thanks

2
  • 302 isn't an error, it's a redirect code. Do you have CURLOPT_FOLLOWLOCATION set in the second curl call? Commented Dec 4, 2014 at 8:09
  • I don't change CURLOPT_FOLLOWLOCATION in the second call. Yes, 302 isn't an error, but it redirects to an error page. Commented Dec 9, 2014 at 7:17

2 Answers 2

1

Try:

$postfields = '';
curl_setopt($ch, CURLOPT_URL, 'https:// ... /data.html'); 
curl_setopt($ch, CURLOPT_POST, TRUE); 
curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields); 
$response = curl_exec($ch);

Problems can arise when you set CURLOPT_POST to FALSE after having set it to TRUE earlier, because the session still holds the previous details in the cookie.

I hope this helps.


0

Did you check the link to data.html?
If the data.html in window.location="data.html"; points to the same location as the data.html in curl_setopt($ch, CURLOPT_URL, 'https:// ... /data.html'); then try calling curl_exec($ch) twice, since the page may need to be requested two times. If it is different, simply change your link.
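One way to check whether the two links really match is to pull the target out of the returned page and resolve it against the URL that was requested. The following is only a minimal sketch: the helper names `extractJsRedirect` and `resolveAgainst` are hypothetical, `example.com` stands in for the elided host, and the regular expression only covers a plain `window.location = "..."` assignment like the one in the question.

```php
<?php
// Hypothetical helper: find the target of a simple
// window.location = "..." assignment in an HTML page.
function extractJsRedirect(string $html): ?string
{
    if (preg_match('/window\.location(?:\.href)?\s*=\s*(["\'])(.*?)\1/i', $html, $m)) {
        return $m[2];
    }
    return null; // no JavaScript redirect found
}

// Hypothetical helper: turn a relative target into an absolute URL,
// using the URL that was originally requested as the base.
function resolveAgainst(string $baseUrl, string $target): string
{
    if (preg_match('#^https?://#i', $target)) {
        return $target; // already absolute
    }
    $p = parse_url($baseUrl);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($target !== '' && $target[0] === '/') {
        return $origin . $target; // root-relative path
    }
    // Relative to the directory of the base path
    $dir = isset($p['path']) ? preg_replace('#/[^/]*$#', '/', $p['path']) : '/';
    return $origin . $dir . $target;
}

// Example with the page from the question (host is a placeholder):
$html = "<html><head><script language='javascript'>"
      . "function autoNavigate() {window.location=\"/data.html\";}"
      . "</script></head><body onload='autoNavigate()'></body></html>";

$target = extractJsRedirect($html);
$next   = resolveAgainst('https://example.com/reports/data.html', $target);
// $next is now 'https://example.com/data.html'
```

If the resolved URL differs from the one passed to CURLOPT_URL, that difference would be the first thing to try changing.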

6 Comments

Yes, I checked it. It is the same link. The only difference is that the first one is an absolute URL with "https:// ... /" and the URL in the JavaScript is relative. But it is always the same.
In that case, you may need to execute cURL twice and, if needed, also set the referer like this:
curl_exec($ch);
curl_setopt($ch, CURLOPT_REFERER, 'https://..../data.html');
$response = curl_exec($ch);
A double "curl_exec($ch)" also results in a 302 error.
OK, first, HTTP 302 is not an error. It's a redirect code. Try to show the headers of the response, then get the location from the headers and change the URL to the new location, like this:
curl_setopt($ch, CURLOPT_HEADER, 1);
$response = curl_exec($ch);
In $response you will see something like this:
Location: https://.../newlink.html
Use this new location to get your page. This is because some hosts do not allow the CURLOPT_FOLLOWLOCATION option.
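Reading the Location header out of a response fetched with CURLOPT_HEADER enabled could look roughly like the sketch below. The function name `extractLocation` is hypothetical, `example.com` is a placeholder host, and the sample response is modeled on the headers quoted in this thread rather than taken from a live request.

```php
<?php
// Hypothetical helper: return the Location header of a raw HTTP response
// (headers + body in one string, as returned when CURLOPT_HEADER is on).
function extractLocation(string $rawResponse): ?string
{
    // Only look at the first header block, i.e. everything before the first blank line
    $headers = explode("\r\n\r\n", $rawResponse, 2)[0];

    if (preg_match('/^Location:[ \t]*(\S+)[ \t]*\r?$/mi', $headers, $m)) {
        return $m[1];
    }
    return null; // no redirect in this response
}

// Sample response (placeholder host):
$raw = "HTTP/1.1 302 Found\r\n"
     . "Server: Apache-Coyote/1.1\r\n"
     . "Location: https://example.com/reportingError.html\r\n"
     . "Content-Length: 0\r\n"
     . "\r\n";

$location = extractLocation($raw);
// $location is now 'https://example.com/reportingError.html'
```

The returned value can then be passed to CURLOPT_URL for the next request. With CURLOPT_FOLLOWLOCATION turned off, `curl_getinfo($ch, CURLINFO_REDIRECT_URL)` should give the same target without manual parsing.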
Yes, you are right. I forgot to explain that the 302 is not an error, but a redirection to an error page. Here is a part of the header:
HTTP/1.1 302 Found
Server: Apache-Coyote/1.1
Cache-Control: private
Location: https:// ... /reportingError.html
Content-Type: text/html;charset=windows-1252
Content-Length: 0
Set-Cookie: JSESSIONID=C9CFFE27F202765BC562274539CA11F5; Path=/; Secure; HttpOnly
Vary: User-Agent