I am trying to scrape data from this website: http://ntthnue.edu.vn/tracuudiem

When I enter 'TS4740' in the SBD field in the browser, I get the result successfully. However, when I run the following PHP cURL code:

<?php

function getData($id) {
    $url = 'http://ntthnue.edu.vn/tracuudiem';
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, ['sbd' => $id]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $result = curl_exec($ch);

    curl_close($ch);

    return $result;
}

echo getData('TS4740');

I just get the original page back, without the result. Can anybody explain why? Thank you!

1 Answer

Make sure you send all the necessary headers and input data. The server processing this request can run all kinds of checks to see whether it is a "valid" form request. As such, you need to spoof the request so that it is as close to a regular browser request as possible.

Use a tool like Chrome Dev Tools to see both the request and response headers exchanged between the server and your browser, so you can better understand what your cURL setup should look like. Additionally, an app like Postman makes it easy to simulate the request and see what works and what does not.
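To make that comparison less manual, here is a minimal sketch that lists which headers the browser sent but your cURL request did not. The helper names `parseHeaderLines` and `missingHeaders` are hypothetical (they are not part of cURL or PHP). It assumes you have copied the browser's request header lines from Dev Tools and obtained cURL's outgoing header lines yourself, for example by setting `CURLINFO_HEADER_OUT` to `true` and reading `curl_getinfo($ch, CURLINFO_HEADER_OUT)` after the request.

```php
<?php

// Hypothetical helper: turn raw header lines ("Name: value") into an
// array keyed by lowercase header name, so two header sets can be compared.
function parseHeaderLines(array $lines): array {
    $headers = [];
    foreach ($lines as $line) {
        // Skip lines without a colon, such as the request line or blank lines.
        if (strpos($line, ':') === false) {
            continue;
        }
        // Split on the first colon only; header values may contain colons.
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return $headers;
}

// Hypothetical helper: names of headers present in the browser request
// but absent from the cURL request (compared case-insensitively).
function missingHeaders(array $browserLines, array $curlLines): array {
    return array_keys(array_diff_key(
        parseHeaderLines($browserLines),
        parseHeaderLines($curlLines)
    ));
}
```

Calling `missingHeaders()` with the two sets of header lines gives a short list of candidates to add through `CURLOPT_HTTPHEADER`. Not every missing header matters to the server, so it is worth adding them one at a time and retesting.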

Working example:

<?php

function getData($id) {
    $url = 'http://ntthnue.edu.vn/tracuudiem';
    $ch = curl_init($url);
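    // Send every field the form submits, not just sbd. The ID is URL-encoded in case it contains special characters.
    $postdata = 'namhoc=2015-2016&kythi_name=Tuy%E1%BB%83n+sinh+v%C3%A0o+l%E1%BB%9Bp+10&hoten=&sbd='.urlencode($id).'&btnSearch=T%C3%ACm+ki%E1%BA%BFm';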
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postdata);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'Origin: http://ntthnue.edu.vn',
        'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.130 Safari/537.36',
        'Content-Type: application/x-www-form-urlencoded',
        'Referer: http://ntthnue.edu.vn/tracuudiem',
    ));

    $result = curl_exec($ch);

    curl_close($ch);

    return $result;
}

echo getData('TS4740');
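
As an alternative to hand-encoding the `$postdata` string, the sketch below builds the same body from named fields with `http_build_query()`. The function name `buildPostData` is hypothetical, and the decoded field values are simply what the percent-encoded string in the example decodes to. One related point that may be relevant: when `CURLOPT_POSTFIELDS` is given an array, PHP sends the body as `multipart/form-data`, whereas a string is sent as `application/x-www-form-urlencoded`, so passing the string returned here keeps the request in line with the `Content-Type` header set in the example.

```php
<?php

// Hypothetical helper: build the urlencoded POST body from named fields.
// Field names and values are taken from the hand-encoded string in the example.
function buildPostData(string $id): string {
    $fields = [
        'namhoc'     => '2015-2016',
        // "Tuyển sinh vào lớp 10", written with escapes to avoid file-encoding issues.
        'kythi_name' => "Tuy\u{1EC3}n sinh v\u{E0}o l\u{1EDB}p 10",
        'hoten'      => '',
        'sbd'        => $id,
        // "Tìm kiếm", the label of the submit button.
        'btnSearch'  => "T\u{EC}m ki\u{1EBF}m",
    ];
    // Pass '&' explicitly so the result does not depend on the
    // arg_separator.output ini setting.
    return http_build_query($fields, '', '&');
}
```

The result of `buildPostData('TS4740')` can be passed directly to `CURLOPT_POSTFIELDS` in place of `$postdata`.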

4 Comments

It doesn't work. I think the problem is related to JavaScript.
I've got it working with a Postman test. I've updated my answer. Make sure you pass all the inputs from the form.
Thank you, it works. But can you explain why your code works?
Yes, I've updated my answer for a better explanation.
