I'm scraping data from a website by getting the HTML code from the website then parsing it in Java.

I'm currently using java.net.URL and java.net.URLConnection. This is the code I use to get the HTML from a given website (found on this site, slightly edited to fit my needs):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public static String getURL(String name) throws Exception {

    //Open a connection to the URL
    URL url = new URL(name);
    URLConnection spoof = url.openConnection();

    //Spoof the connection so we look like a web browser
    spoof.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0; H010818)");

    //Read the response line by line, closing the reader when done
    StringBuilder s = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new InputStreamReader(spoof.getInputStream()))) {
        String strLine;
        while ((strLine = in.readLine()) != null) {
            //Append each line of the source to the result
            s.append(strLine).append("\n");
        }
    }
    return s.toString();
}

When I run it, the HTML code is received correctly for about 100-200 webpages. However, before I am done grabbing HTML code, I get a "java.io.IOException: Server returned HTTP response code: 503 for URL" exception. I've researched this topic fully and other questions like this one do not cover the package I am using.
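One way to cope with intermittent 503s (a sketch, not the original code) is to cast to HttpURLConnection, check the response code before reading, and retry with exponential backoff when a 503 comes back. The method name, retry count, and delay values below are illustrative assumptions:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RetryFetcher {

    // Hypothetical helper: fetch a URL, retrying on HTTP 503 with
    // exponential backoff instead of letting getInputStream() throw.
    public static String getWithRetry(String name) throws Exception {
        int maxRetries = 5;
        long delayMs = 500;

        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(name).openConnection();
            conn.setRequestProperty("User-Agent",
                    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)");

            // getResponseCode() does not throw on 503, unlike getInputStream()
            int code = conn.getResponseCode();
            if (code == 503 && attempt < maxRetries) {
                conn.disconnect();
                Thread.sleep(delayMs);   // back off before retrying
                delayMs *= 2;            // double the wait each time
                continue;
            }

            StringBuilder sb = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    sb.append(line).append('\n');
                }
            }
            return sb.toString();
        }
        throw new Exception("Gave up after repeated 503 responses");
    }
}
```

The backoff gives an overloaded or rate-limiting server progressively more breathing room than a fixed delay does.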

Thanks in advance for the help!

  • A 503 is usually caused by a temporary overloading of the web server. It may be your process that's swamping it, or maybe there's something else accessing the web server. What happens if you try inserting a short sleep between each of your requests? Commented Jan 30, 2014 at 4:36
  • Running it now. With a 100-millisecond rest in between each access, there seem to be fewer long pauses in between each access, but they are still there. Waiting until it is done. Edit 1: At access 339 out of 358, it gives the same error. Adding the delay did not seem to help, so I'll run it with a 1000-millisecond delay. Commented Jan 30, 2014 at 4:40
  • Okay. Adding a full 1-second delay still puts it out at about 240 accesses. I'll try the answer below. Commented Jan 30, 2014 at 4:57
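The delay the comments describe can be sketched as a small driver loop. The `crawl` method, its fetch-function parameter, and the pause values are hypothetical; the fetch function would be the question's `getURL`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class PoliteCrawler {

    // Hypothetical driver: fetch each page via the supplied function
    // (e.g. the question's getURL), pausing between requests so the
    // server is not hammered back-to-back.
    public static List<String> crawl(List<String> urls,
                                     Function<String, String> fetch,
                                     long pauseMs) throws InterruptedException {
        List<String> pages = new ArrayList<>();
        for (String u : urls) {
            pages.add(fetch.apply(u));
            Thread.sleep(pauseMs);   // e.g. 100 or 1000 ms, as tried above
        }
        return pages;
    }
}
```

As the comments found, a fixed pause alone may not be enough if the server counts total requests rather than request rate, which is why checking the response code and backing off (or stopping) is the more robust option.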

1 Answer


The server may be enforcing a request limit. In that case you can try a raw Socket with input/output streams instead of URLConnection.
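A minimal sketch of this suggestion, writing the HTTP request by hand over a plain Socket. It assumes plain HTTP (no TLS), and the host, port, and path parameters are illustrative; note the returned string includes the status line and headers, which URLConnection would normally strip:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

public class SocketFetch {

    // Sketch: issue a GET request manually over a Socket and read the
    // raw response (status line + headers + body) until the server
    // closes the connection.
    public static String get(String host, int port, String path) throws Exception {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(socket.getOutputStream());
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {

            // Minimal HTTP/1.1 request; "Connection: close" tells the
            // server to end the stream after the response.
            out.print("GET " + path + " HTTP/1.1\r\n");
            out.print("Host: " + host + "\r\n");
            out.print("User-Agent: Mozilla/4.0 (compatible)\r\n");
            out.print("Connection: close\r\n\r\n");
            out.flush();

            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
            return sb.toString();
        }
    }
}
```

Whether this actually avoids the 503 depends on why the server is refusing requests; a raw socket only helps if URLConnection itself was the trigger (e.g. via default headers), not if the server is rate-limiting by IP.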
