I am trying to get the URLs of the top 2000 Java repositories on GitHub, sorted by number of stars.

Code:

import json
import time

import requests

urls = []

# per_page is raised to 100, so 20 iterations * 100 = 2000 repos
for i in range(20):

    # wait 75 seconds after every 10 requests to respect the GitHub rate limit
    if i % 10 == 0 and i > 0:
        time.sleep(75)
    r = requests.get('https://api.github.com/search/repositories?q=language:java&sort=stars&order=desc&per_page=100&page=' + str(i))
    print(r.ok)
    if r.ok:
        items = r.json()["items"]
        for each in items:
            urls.append(each["html_url"])

The request fails after 10 iterations (r.ok == False) every time, so I never get past 1000 repos. Any suggestions on what I am doing wrong would be great.

  • What status code are you getting? Check using r.status_code. The likely cause is that you're exceeding rate limits, but I'd check, since you're running out after 10 requests and it could be something else. Commented Jul 3, 2020 at 0:23
  • @GamesBrainiac the failure status code is 403 Commented Jul 3, 2020 at 0:29
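
To make the check suggested in the comments concrete, here is a minimal sketch that turns a status code and the response headers into a short diagnosis. The function names are made up for illustration; the X-RateLimit-Remaining and X-RateLimit-Reset headers are the ones GitHub sends with API responses. It assumes a rate-limited response arrives as 403 (or 429) with a remaining count of "0", and it takes plain values so it can be tried without a network call.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds until the rate-limit window resets, or 0.0 if the header is missing."""
    now = time.time() if now is None else now
    reset = headers.get("X-RateLimit-Reset")  # Unix timestamp, sent as a string
    if reset is None:
        return 0.0
    return max(0.0, float(reset) - now)


def diagnose(status_code, headers, now=None):
    """Return a short, human-readable guess at why a request failed."""
    remaining = headers.get("X-RateLimit-Remaining")
    if status_code in (403, 429) and remaining == "0":
        wait = seconds_until_reset(headers, now)
        return f"rate limited; retry in {wait:.0f}s"
    if status_code == 422:
        return "unprocessable request; check the query parameters"
    if 200 <= status_code < 300:
        return "ok"
    return f"unexpected status {status_code}"
```

With the code from the question this would be called as diagnose(r.status_code, r.headers) right after each request.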

1 Answer

I think you are exceeding the request limit. According to the documentation,

In rate limits,

The Search API has a custom rate limit. For requests using Basic Authentication, OAuth, or client ID and secret, you can make up to 30 requests per minute. For unauthenticated requests, the rate limit allows you to make up to 10 requests per minute.

In repository search API limit,

Find repositories via various criteria. This method returns up to 100 results per page.

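So without authentication you can fetch at most 10 pages of 100 results, i.e. 1000 results, per minute. The 11th request within the same minute exceeds the limit.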
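
One way to stay under that limit is to pause after every batch of 10 requests. The sketch below is only an illustration: paced_pages is a hypothetical helper, and the 65-second pause is an assumption (60 seconds plus a small margin). The sleep function is passed in so the pacing logic can be checked without actually waiting.

```python
import time


def paced_pages(first_page, last_page, per_window=10, window_seconds=65,
                sleep=time.sleep):
    """Yield page numbers, pausing before each new batch of `per_window` requests."""
    for count, page in enumerate(range(first_page, last_page + 1)):
        # count is the number of pages already yielded; pause once a batch is used up
        if count > 0 and count % per_window == 0:
            sleep(window_seconds)
        yield page
```

In the question's loop, "for page in paced_pages(1, 10):" would replace the manual i % 10 check, with page passed as the page parameter of the request.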

5 Comments

I did not know that they introduced just 10 requests as a rate limit. Good catch there.
I tried a workaround with time.sleep(60) every 10 requests, but even then it fails with status code 422.
Yep. You have to wait at least 60 seconds; 60 plus a few seconds would be ideal.
I have edited the solution; it still fails with status code 422.
422 means Unprocessable Entity: sending invalid fields results in a 422 response. Check that your request parameters are still correct after the sleep.
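
A likely reason for the 422 here, as far as I know, is that the Search API only exposes the first 1000 results of any single search, so with per_page=100 any page beyond 10 is rejected no matter how long you sleep. A common workaround is to run a second search restricted to repositories with no more stars than the last result already collected, using the stars: search qualifier. The sketch below only builds the request parameters and merges results; search_params and merge_unique are hypothetical helper names, and the 1000-result cap is hard-coded as an assumption.

```python
MAX_RESULTS_PER_SEARCH = 1000  # assumed cap on results reachable for one query


def search_params(page, max_stars=None, per_page=100):
    """Build query parameters for one page, refusing pages past the cap."""
    if page < 1:
        raise ValueError("pages are numbered from 1")
    if page * per_page > MAX_RESULTS_PER_SEARCH:
        raise ValueError("page lies beyond the first 1000 results")
    query = "language:java"
    if max_stars is not None:
        # restrict the search to repositories with at most `max_stars` stars
        query += f" stars:<={max_stars}"
    return {"q": query, "sort": "stars", "order": "desc",
            "per_page": per_page, "page": page}


def merge_unique(urls, new_urls):
    """Append new URLs, skipping duplicates, while keeping the original order."""
    seen = set(urls)
    merged = list(urls)
    for url in new_urls:
        if url not in seen:
            seen.add(url)
            merged.append(url)
    return merged
```

The dict can be passed as params= to requests.get('https://api.github.com/search/repositories', ...). For the second batch, max_stars would be the stargazers_count of the 1000th item from the first batch; repositories tied at that star count can show up twice, which is why the merge removes duplicates.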
