
I need to get JSON data and I'm using urllib2:

import urllib2

request = urllib2.Request(url)
request.add_header('Accept-Encoding', 'gzip')  # ask the server for compressed content
opener = urllib2.build_opener()
connection = opener.open(request)
data = connection.read()

but although the data isn't very big, it's too slow.
Is there a way to speed it up? I can use third-party libraries too.

  • This is really a poor question. What does "slow" mean? Commented Feb 26, 2011 at 9:27
  • I think the connection is what is slow. Commented Feb 26, 2011 at 9:28
  • Slow means that getting a 50-line JSON response takes 1 second... I thought the problem was in urllib's headers. Commented Feb 26, 2011 at 9:33

4 Answers


Accept-Encoding: gzip means that the client is ready to accept gzip-encoded content if the server is willing to send it. The rest of the request goes down the socket, through your operating system's TCP/IP stack, and on to the physical layer.
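
For completeness, here is a minimal sketch of that negotiation, decompressing the body only when the server actually sent gzip-encoded content (the URL is a placeholder):

import gzip
import StringIO
import urllib2

request = urllib2.Request('http://example.com/data.json')  # placeholder URL
request.add_header('Accept-Encoding', 'gzip')
response = urllib2.urlopen(request)
body = response.read()
# Decompress only if the server actually honoured the header.
if response.info().get('Content-Encoding') == 'gzip':
    body = gzip.GzipFile(fileobj=StringIO.StringIO(body)).read()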

If the server supports ETags, then you can send an If-None-Match header to check that the content has not changed and rely on a cached copy. An example is given here.
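
A minimal sketch of such a conditional request, assuming you stored the ETag and body from an earlier response (cached_etag, cached_body, and the URL are illustrative, not from the question):

import urllib2

cached_etag = None  # ETag saved from a previous response (illustrative)
cached_body = None  # body saved alongside it

request = urllib2.Request('http://example.com/data.json')  # placeholder URL
if cached_etag is not None:
    request.add_header('If-None-Match', cached_etag)
try:
    response = urllib2.urlopen(request)
    body = response.read()
    cached_etag = response.info().get('ETag')  # remember for next time
except urllib2.HTTPError, e:
    if e.code == 304:
        body = cached_body  # 304 Not Modified: reuse the stored copy
    else:
        raise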

There is not much you can do on the client side alone to improve your HTTP request speed.


1 Comment

Thank you, I will check your link!

You're dependent on a number of different things here that may not be within your control:

  1. Latency/bandwidth of your connection
  2. Latency/bandwidth of the server's connection
  3. Load on the server application and its individual processes

Items 2 and 3 are probably where the problem lies, and you won't be able to do much about them. Is the content cacheable? That depends on your application's needs and on the HTTP headers (e.g. ETag, Cache-Control, Last-Modified) that the server returns. The server may only update once a day, in which case you might be better off requesting the data only every hour.
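
A quick way to see which of those headers the server actually returns (the URL is a placeholder):

import urllib2

response = urllib2.urlopen('http://example.com/data.json')  # placeholder URL
for header in ('ETag', 'Cache-Control', 'Last-Modified', 'Expires'):
    print '%s: %s' % (header, response.info().get(header))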

2 Comments

Oh, so there is no solution... the problem is on my end... I thought there was something else because my connection is good.
rubik - it may be a problem with the server. They could use a CDN to improve their performance.

There is unlikely to be an issue with urllib. If you have network issues and performance problems, consider using tools like Wireshark to investigate at the network level. I have strong doubts that this is related to Python in any way.
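
Before reaching for Wireshark, a rough timing sketch like this (the URL is a placeholder) can at least show whether the time is spent opening the connection or reading the body:

import time
import urllib2

start = time.time()
response = urllib2.urlopen('http://example.com/data.json')  # placeholder URL
connected = time.time()  # connection made, status line and headers received
body = response.read()
finished = time.time()
print 'connect + headers: %.3fs' % (connected - start)
print 'body:              %.3fs' % (finished - connected)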



If you are making lots of requests, look into threading. Having about 10 workers making requests can speed things up: you don't grind to a halt if one of them takes too long to get a connection.
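
A minimal worker-pool sketch along those lines, using the standard library's Queue and threading modules (the URLs are placeholders):

import Queue
import threading
import urllib2

urls = Queue.Queue()
for u in ('http://example.com/1.json', 'http://example.com/2.json'):  # placeholder URLs
    urls.put(u)

results = {}

def worker():
    # Each worker pulls URLs off the queue until it is empty.
    while True:
        try:
            url = urls.get_nowait()
        except Queue.Empty:
            return
        results[url] = urllib2.urlopen(url).read()

threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()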

