First, let me say that I have read numerous "scraping" threads on here and none of them have helped me. I have also searched the internet for days, and now that I am getting down to the wire, I am hoping someone can shed some light on this for me.

I am using PHP Simple HTML DOM Parser to scrape some data from a page. The URL I am working with serves dynamic content, and I cannot get anything to work to pull that content in. I need to scrape the plain text from <tr id="0" class="ui-widget-content jqgrow ui-row-ltr" role="row"> to <tr id="9" class="ui-widget-content jqgrow ui-row-ltr" role="row">. I feel like once I get one to work I can get the others. Because this info is not actually in the page when it is first loaded but is filled in after the page loads, I am in a rut.

With that said, here is what I have tried:

echo file_get_html('http://sheriffclevelandcounty.com/p2c/jailinmates.aspx')->plaintext; 

The above shows me everything BUT the info I need.

I also tried the IMDb example from the plugin, modified to my needs. This is it:

// Defining the basic cURL function
    function curl($url) {
        // Assigning cURL options to an array
        $options = Array(
            CURLOPT_RETURNTRANSFER => TRUE,  // Setting cURL's option to return the webpage data
            CURLOPT_FOLLOWLOCATION => TRUE,  // Setting cURL to follow 'location' HTTP headers
            CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer where following 'location' HTTP headers
            CURLOPT_CONNECTTIMEOUT => 120,   // Setting the amount of time (in seconds) before the request times out
            CURLOPT_TIMEOUT => 120,  // Setting the maximum amount of time for cURL to execute queries
            CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
            CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",  // Setting the useragent
            CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
        );

        $ch = curl_init();  // Initialising cURL
        curl_setopt_array($ch, $options);   // Setting cURL's options using the previously assigned array data in $options
        $data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
        curl_close($ch);    // Closing cURL
        return $data;   // Returning the data from the function
    }

     // Defining the basic scraping function
    function scrape_between($data, $start, $end){
        $data = stristr($data, $start); // Stripping all data from before $start
        $data = substr($data, strlen($start));  // Stripping $start
        $stop = stripos($data, $end);   // Getting the position of the $end of the data to scrape
        $data = substr($data, 0, $stop);    // Stripping all data from after and including the $end of the data to scrape
        return $data;   // Returning the scraped data from the function
    }

    $scraped_page = curl("http://sheriffclevelandcounty.com/p2c/jailinmates.aspx");    // Downloading the jail inmates page into $scraped_page
    $scraped_data = scrape_between($scraped_page, '<table id="tblII" class="ui-jqgrid-btable" cellspacing="0" cellpadding="0" border="0" role="grid" aria-multiselectable="false" aria-labelledby="gbox_tblII" style="width: 456px;">', '</table>');   // Scraping the downloaded data in $scraped_page for the content between the opening tblII <table> tag and </table>

    echo $scraped_data; // Echoing the scraped data, which should be the contents of the inmates table

Of course, neither of these works, so my question is: how do I use PHP Simple HTML DOM Parser to get dynamic content that is loaded after the page loads? Is it possible, or am I completely on the wrong track here?


1 Answer


I understand that you need the dynamic data that is loaded into the jqGrid. For that, you can send a POST request directly to the handler URL; its response contains the data.

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://sheriffclevelandcounty.com/p2c/jqHandler.ashx?op=s");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return the response as a string instead of printing it
curl_setopt($ch, CURLOPT_POST, 1);           // Send the request as a POST
curl_setopt($ch, CURLOPT_POSTFIELDS, array(
    'rows' => 10000, // Here you can specify how many records you want
    't'    => 'ii'
));
$output = curl_exec($ch);
curl_close($ch);
echo "<pre>";
print_r(json_decode($output)); // Dump the decoded JSON response
echo "</pre>";
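
Once the response is decoded, you will probably want the individual rows rather than a raw dump. Below is a minimal sketch of that step. It assumes the handler returns a jqGrid-style JSON object with a "rows" array, which I have not confirmed for this site; extract_rows() is a hypothetical helper and the sample payload and its field names are made up for illustration.

```php
<?php
// Hypothetical helper: pull the list of rows out of a jqGrid-style JSON payload.
// ASSUMPTION: the payload is an object with a "rows" array. Check the real
// response with print_r() first and adjust the key if it differs.
function extract_rows(string $json): array
{
    $decoded = json_decode($json, true); // true => associative arrays
    if (!is_array($decoded)) {
        return []; // invalid JSON, or a scalar instead of an object
    }
    $rows = $decoded['rows'] ?? [];
    return is_array($rows) ? $rows : [];
}

// Made-up sample payload standing in for $output from curl_exec().
$sample = '{"page":1,"total":1,"records":2,"rows":['
        . '{"id":0,"name":"DOE, JOHN"},{"id":1,"name":"ROE, JANE"}]}';

foreach (extract_rows($sample) as $row) {
    // Each row is an associative array of column => value.
    echo implode(' | ', array_map('strval', $row)), PHP_EOL;
}
```

In the real script you would pass $output to extract_rows() instead of $sample. Returning an empty array on bad input keeps the foreach safe if the request fails or the site changes its response format.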