204

I've been going through the Q&A on this site, for an answer to my question. However, I'm a beginner and I find it difficult to understand some of the solutions. I need a very basic solution.

Could someone please explain a simple solution to 'Downloading a file through http' and 'Saving it to disk, in Windows', to me?

I'm not sure how to use shutil and os modules, either.

The file I want to download is under 500 MB and is a .gz archive file. If someone can also explain how to extract the archive and utilise the files in it, that would be great!
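For the extraction part, the standard library's gzip plus shutil (two of the modules mentioned here) can decompress a .gz. A minimal, self-contained sketch; it first writes a tiny sample archive so it runs end to end, but in practice file.gz would be your download:

```python
import gzip
import shutil

# For demonstration only: create a tiny sample archive so the
# sketch runs end to end; in practice file.gz is the download.
with gzip.open("file.gz", "wb") as f:
    f.write(b"example contents\n")

# Decompress: gzip.open reads the compressed stream, and
# shutil.copyfileobj streams it to disk in chunks rather than
# loading the whole file into memory.
with gzip.open("file.gz", "rb") as f_in, open("file.out", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)
```

If the archive is actually a .tar.gz, the tarfile module handles both decompression and extraction in one step.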

Here's a partial solution that I wrote by combining various answers:

import requests
import os
import shutil

global dump

def download_file():
    global dump
    url = "http://randomsite.com/file.gz"
    file = requests.get(url, stream=True)
    dump = file.raw

def save_file():
    global dump
    location = os.path.abspath("D:\folder\file.gz")
    with open("file.gz", 'wb') as location:
        shutil.copyfileobj(dump, location)
    del dump

Could someone point out errors (beginner level) and explain any easier methods to do this?

1
  • Note: if you are downloading from PyCharm, be aware that it is not obvious what the "current folder" is. Commented Aug 10, 2021 at 17:45

10 Answers

225

A clean way to download a file is:

import urllib

testfile = urllib.URLopener()
testfile.retrieve("http://randomsite.com/file.gz", "file.gz")

This downloads a file from a website and names it file.gz. This is one of my favorite solutions, from Downloading a picture via urllib and python.

This example uses the urllib library, and it will directly retrieve the file from a source.


11 Comments

Ok, thanks! But is there a way to get it working through requests?
Any possibility to save in /myfolder/file.gz ?
No better possibility than trying it yourself, maybe? :) I could successfully do testfile.retrieve("http://example.com/example.rpm", "/tmp/test.rpm").
This is deprecated since Python 3.3, and the urllib.request.urlretrieve solution (see answer below) is the 'modern' way
What is the best way to add a username and password to this code? tks
222

For Python 3+, URLopener is deprecated, and using it fails with an error like:

url_opener = urllib.URLopener()
AttributeError: module 'urllib' has no attribute 'URLopener'

So, try:

import urllib.request 
urllib.request.urlretrieve(url, filename)
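On the question of which folder the file lands in: the second argument to urlretrieve is just a destination path, so you can save anywhere writable. A runnable sketch; to work offline it "downloads" a local file:// URL, so swap in your real http(s) URL in practice:

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Set up a stand-in for the remote file so the sketch runs
# offline; in real use, url would be your http(s) address.
folder = tempfile.mkdtemp()
src = os.path.join(folder, "remote.gz")
with open(src, "wb") as f:
    f.write(b"pretend archive bytes")
url = Path(src).as_uri()  # e.g. "http://randomsite.com/file.gz" in real use

# The second argument is the full destination path.
dest = os.path.join(folder, "file.gz")
urllib.request.urlretrieve(url, dest)
```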

5 Comments

Weird... why does nobody vote for this answer, now that Python 2 is deprecated and only this solution works properly?
Agreed! I was pulling my hair over the earlier solutions. Wish I could upvote 200 times!
how do indicate which folder/path to save the contents of the url?
Note: if you are downloading from PyCharm, be aware that it is not obvious what the "current folder" is.
You deserve more upvotes. I don't understand why solutions for Python 2 are still accepted.
121

As mentioned here:

import urllib
urllib.urlretrieve ("http://randomsite.com/file.gz", "file.gz")

EDIT: If you still want to use requests, take a look at this question or this one.

9 Comments

urllib will work, however, many people seem to recommend the use of requests over urllib. Why's that?
requests is extremely helpful compared to urllib when working with a REST API. Unless you are looking to do a lot more, this should be good.
Ok, now I've read the links you've provided for requests usage. I'm confused about how to declare the file path, for saving the download. How do I use os and shutil for this?
For Python3: import urllib.request urllib.request.urlretrieve(url, filename)
I am not able to extract the http status code with this if the download fails
44

Four methods using wget, urllib and requests.

#!/usr/bin/python
import requests
from io import BytesIO  # BytesIO, not StringIO: r.content is bytes
from PIL import Image
import profile as profile
import urllib
import wget


url = 'https://tinypng.com/images/social/website.jpg'

def testRequest():
    image_name = 'test1.jpg'
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(image_name, 'wb') as f:
            for chunk in r.iter_content():
                f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    r = requests.get(url)
    r.raise_for_status()
    i = Image.open(BytesIO(r.content))
    i.save(image_name)

def testUrllib():
    image_name = 'test3.jpg'
    testfile = urllib.URLopener()
    testfile.retrieve(url, image_name)

def testwget():
    image_name = 'test4.jpg'
    wget.download(url, image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')
    profile.run('testUrllib()')
    profile.run('testwget()')

testRequest - 4469882 function calls (4469842 primitive calls) in 20.236 seconds

testRequest2 - 8580 function calls (8574 primitive calls) in 0.072 seconds

testUrllib - 3810 function calls (3775 primitive calls) in 0.036 seconds

testwget - 3489 function calls in 0.020 seconds
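Most of testRequest's 4.4 million calls come from iter_content()'s default chunk_size of 1 byte, which means one read and one write per byte. Passing an explicit chunk_size closes most of the gap; a sketch of the same loop (the 1 MiB chunk size is my own choice):

```python
import requests

def download(url, path, chunk_size=1024 * 1024):
    # iter_content defaults to 1-byte chunks, which is what drives
    # testRequest's millions of calls; a 1 MiB chunk_size means one
    # read/write per megabyte instead, while still streaming.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```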

1 Comment

How did you get the number of function calls?
38

I use wget.

A simple and good library. Example:

import wget

file_url = 'http://johndoe.com/download.zip'

file_name = wget.download(file_url)

The wget module supports both Python 2 and Python 3.


6

Exotic Windows Solution

import subprocess

subprocess.run("powershell Invoke-WebRequest {} -OutFile {}".format(your_url, filename), shell=True)
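A variant of the same idea that avoids shell=True and string formatting, so spaces, quotes, or '&' in the URL cannot break or inject into the command line. A sketch (it only builds the argument list; actually running it assumes powershell is on PATH, i.e. Windows):

```python
import subprocess

def download_cmd(url, out_file):
    # Building the command as a list (no shell=True, no format())
    # means special characters in the URL are passed through as
    # literal arguments instead of being interpreted by a shell.
    return ["powershell", "-Command", "Invoke-WebRequest", url, "-OutFile", out_file]

# On Windows:
# subprocess.run(download_cmd(your_url, filename), check=True)
```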


3

For text files, you can use:

import requests

url = 'https://WEBSITE.com'
req = requests.get(url)
req.raise_for_status()
path = "C:\\YOUR\\FILE.html"

with open(path, 'wb') as f:
    f.write(req.content)

3 Comments

Don't you have to req.iter_content()? Or use the req.raw file object? See this
No, it just works, haven't you tried? @MichaelSchnerring
Note that this loads the entire file into memory before writing, so will not work for large files.
2

I started down this path because ESXi's wget is not compiled with SSL and I wanted to download an OVA from a vendor's website directly onto the ESXi host which is on the other side of the world.

I had to either disable the firewall (lazy) or enable outbound HTTPS by editing the rules (proper).

Then I created the Python script:

import ssl
import shutil
import tempfile
import urllib.request
context = ssl._create_unverified_context()

dlurl='https://somesite/path/whatever'
with urllib.request.urlopen(dlurl, context=context) as response:
    with open("file.ova", 'wb') as tmp_file:
        shutil.copyfileobj(response, tmp_file)

ESXi's libraries are rather pared down, but the open-source Weasel installer seemed to use urllib for HTTPS, so it inspired me to go down this path.


1

For those who want a solution with requests, here you go:

import requests

url = 'https://static.wikia.nocookie.net/dqw4w9wgxcq/images/0/08/Site-background-dark/revision/latest?cb=20220428173233'
path = "myfile.jpg"

with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open(path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)

Note that:

  • session.get() should also work, if you already have a requests.Session() (which is generally a good idea if you are making many requests.)
  • raise_for_status() checks whether the HTTP status code is an error. Without that you might blindly download a json or HTML error page and save it with a .jpg file extension. Without it, you won't notice what's gone wrong until you've wasted time debugging why your .jpg (or .mp3 or whatever format you're expecting) is corrupt. (The other solutions with requests didn't include this, until I edited them to add it.)
  • This stream=True approach means that you don't have to load the whole object into memory first. e.g. if you have 8GB of memory, you can still download a 16GB file with this approach
  • We open the stream with a with context manager. This helps tidy things up. Otherwise it's possible to get issues with connections not being returned to the pool.

The urllib solution above works fine too. Since urllib is pre-installed and requests is not, you should probably only use this approach if you already have requests installed and imported.
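On the first bullet's session.get() point, here is a sketch of the same download wrapped for a Session (the function name and chunk size are my own choices):

```python
import requests

def download_with_session(session, url, path, chunk_size=1024 * 1024):
    # Same streaming pattern as the answer above, but on a Session,
    # so repeated downloads from one host reuse the TCP connection
    # (and share headers, cookies, and auth configured once).
    with session.get(url, stream=True) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)

# Typical use:
# with requests.Session() as s:
#     download_with_session(s, url, path)
```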


-5

Another clean way to save the file is this:

import csv
import urllib

urllib.retrieve("your url goes here" , "output.csv")

2 Comments

This should probably be urllib.urlretrieve or urllib.URLopener().retrieve, unclear which you meant here.
Why do you import csv if you're just naming a file?
