0

I just found this great wget wrapper and I'd like to rewrite it as a Python script using the subprocess module. However, it is turning out to be quite tricky, and I keep getting all sorts of errors.

download()
{
    local url=$1
    echo -n "    "
    wget --progress=dot $url 2>&1 | grep --line-buffered "%" | \
    sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'

    echo -ne "\b\b\b\b"
    echo " DONE"
}

Then it can be called like this:

file="patch-2.6.37.gz"
echo -n "Downloading $file:"
download "http://www.kernel.org/pub/linux/kernel/v2.6/$file"

Any ideas?

Source: http://fitnr.com/showing-file-download-progress-using-wget.html

  • You'll need to show us what you have tried in Python so that we'll be able to help you. Commented Dec 5, 2013 at 7:06
  • Basically nothing yet..! I am currently lost in the subprocess documentation..! The ideal thing to do here would be an insightful explanation of a proposed solution so that I can properly grasp the concept of the subprocess module and expand on it. Commented Dec 5, 2013 at 7:12
  • All right, so far I did this: wgetExecutable = '/usr/bin/wget' grepExecutable = '/usr/grep' wgetParameters = ['--progress=dot', "link_to_file"] grepParameters = ['--line-buffered', "%"] wgetPopen = subprocess.Popen([wgetExecutable] + wgetParameters, stdout=subprocess.PIPE) Commented Dec 5, 2013 at 10:22
  • grepPopen = subprocess.Popen([grepExecutable] + grepParameters, stdin=wgetPopen.stdout) however I get an error in stdin=wgetPopen.stdout OSError: [Errno 2] No such file or directory Commented Dec 5, 2013 at 10:29
  • Note that there is also an sh module (with that name) that can take care of the bridge between bash and python! Commented Dec 9, 2013 at 7:50
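For the attempt in the comments above, the OSError: [Errno 2] most likely comes from the '/usr/grep' path rather than from the stdin= argument: Popen raises that error when the executable itself cannot be found. Below is a minimal sketch of a two-process pipeline, assuming grep is available on PATH. The pipe helper and the stand-in producer are illustrative names, not part of the original script. With wget as the producer, merging stderr matters because wget writes its progress to stderr.

```python
import subprocess
import sys


def pipe(producer_argv, consumer_argv):
    """Run `producer | consumer` without a shell and return the consumer's stdout."""
    producer = subprocess.Popen(producer_argv,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT)  # merge stderr, like 2>&1
    consumer = subprocess.Popen(consumer_argv,
                                stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    # Close our copy so the producer can get SIGPIPE if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    # Stand-in for wget: a child Python process that prints two lines.
    fake_wget = [sys.executable, "-c",
                 "print('50K ..... 10%'); print('no match')"]
    # Bare program names are looked up on PATH, so no absolute path is needed.
    sys.stdout.write(pipe(fake_wget, ["grep", "%"]).decode())
```

Passing "grep" instead of a hard-coded absolute path avoids guessing where the binary lives on a given system.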

5 Answers

5
+100

I think you're not far off. Mainly I'm wondering, why bother with running pipes into grep and sed and awk when you can do all that internally in Python?

#! /usr/bin/env python

import re
import subprocess

TARGET_FILE = "linux-2.6.0.tar.xz"
TARGET_LINK = "http://www.kernel.org/pub/linux/kernel/v2.6/%s" % TARGET_FILE

wgetExecutable = '/usr/bin/wget'
wgetParameters = ['--progress=dot', TARGET_LINK]

wgetPopen = subprocess.Popen([wgetExecutable] + wgetParameters,
                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

for line in iter(wgetPopen.stdout.readline, b''):
    match = re.search(r'\d+%', line)
    if match:
        print '\b\b\b\b' + match.group(0),

wgetPopen.stdout.close()
wgetPopen.wait()
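The script above targets Python 2. A rough Python 3 sketch of the same idea follows, assuming wget is on PATH; extract_percent and show_progress are illustrative names. It flushes stdout after every update, since no newline is ever written and a line-buffered terminal might otherwise hold the text back. That buffering may or may not be what causes delayed updates on a given machine.

```python
import re
import subprocess
import sys

PERCENT = re.compile(rb"(\d+)%")  # bytes pattern: the pipe yields bytes in Python 3


def extract_percent(line):
    """Return the percentage found in a wget progress line as an int, or None."""
    match = PERCENT.search(line)
    return int(match.group(1)) if match else None


def show_progress(url):
    """Run wget on url and redraw a 4-character percentage in place."""
    with subprocess.Popen(["wget", "--progress=dot", url],
                          stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT) as proc:
        for line in proc.stdout:
            percent = extract_percent(line)
            if percent is not None:
                sys.stdout.write("\b\b\b\b%3d%%" % percent)
                sys.stdout.flush()  # no newline is written, so flush by hand
    # Leaving the with-block closes the pipe and waits for wget to exit.
    return proc.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(show_progress(sys.argv[1]))
```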

10 Comments

It does. Try on a smaller file. Or wait a little longer. :-)
Your code seems to update on some sort of intervals and in this file for example the first progress indication is only after 25%. However I need the progress to be instantaneous from the start just like the bash script..!
On my machine the behavior of this script is identical to the behavior of the bash script you posted. They both produce line-buffered output at the same rate. I'd be happy to adjust the script to do something different but I'm not able to reproduce the behavior you're talking about. I suspect that you're just seeing different response times for different files.
Ah: I get results closer to what you describe if I use awk -W interactive in the bash script. I'll poke at this some more later and see if I need to do something special to force line-buffered output in subprocess.
+1. wgetPopen.stdout might be closed automatically when the object is destroyed (I expect so, but I don't know). As with ordinary files, it is better to close pipes explicitly (the with-statement does this for files) rather than rely on garbage collection, which is complex and hard to reason about. if not obj means "if obj is empty or zero", regardless of type (a test for None should be written as if obj is None). For example, in Python 3 pipe.readline() may return b'' or '', which are different types, and if not line works for both, so the same source supports both Python 2 and 3.
2

If you are rewriting the script in Python, you could replace wget with urllib.urlretrieve() in this case:

#!/usr/bin/env python
import os
import posixpath
import sys
import urllib
import urlparse

def url2filename(url):
    """Return basename corresponding to url.

    >>> url2filename('http://example.com/path/to/file?opt=1')
    'file'
    """
    urlpath = urlparse.urlsplit(url).path  # pylint: disable=E1103
    basename = posixpath.basename(urllib.unquote(urlpath))
    if os.path.basename(basename) != basename:
        raise ValueError  # refuse 'dir%5Cbasename.ext' on Windows
    return basename

def reporthook(blocknum, blocksize, totalsize):
    """Report download progress on stderr."""
    readsofar = blocknum * blocksize
    if totalsize > 0:
        percent = readsofar * 1e2 / totalsize
        s = "\r%5.1f%% %*d / %d" % (
            percent, len(str(totalsize)), readsofar, totalsize)
        sys.stderr.write(s)
        if readsofar >= totalsize: # near the end
            sys.stderr.write("\n")
    else: # total size is unknown
        sys.stderr.write("read %d\n" % (readsofar,))

url = sys.argv[1]
filename = sys.argv[2] if len(sys.argv) > 2 else url2filename(url)
urllib.urlretrieve(url, filename, reporthook)

Example:

$ python download-file.py http://example.com/path/to/file 

It downloads the URL to a file. If no filename is given, it uses the basename from the URL.
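The script above uses the Python 2 module layout. In Python 3 the same functions live in urllib.request and urllib.parse, so a rough equivalent might look like the sketch below. The format_progress helper is an illustrative name, and it clamps the byte count because the last block can overshoot the total; that clamp is an addition, not part of the original.

```python
import posixpath
import sys
from urllib.parse import unquote, urlsplit
from urllib.request import urlretrieve


def url2filename(url):
    """Return the last path component of url, ignoring query and fragment."""
    return posixpath.basename(unquote(urlsplit(url).path))


def format_progress(blocknum, blocksize, totalsize):
    """Build the progress text for one reporthook call."""
    readsofar = blocknum * blocksize
    if totalsize <= 0:  # total size is unknown
        return "read %d\n" % readsofar
    readsofar = min(readsofar, totalsize)  # the last block may overshoot
    percent = readsofar * 100.0 / totalsize
    text = "\r%5.1f%% %*d / %d" % (
        percent, len(str(totalsize)), readsofar, totalsize)
    return text + "\n" if readsofar >= totalsize else text


def reporthook(blocknum, blocksize, totalsize):
    """Report download progress on stderr."""
    sys.stderr.write(format_progress(blocknum, blocksize, totalsize))


if __name__ == "__main__" and len(sys.argv) > 1:
    url = sys.argv[1]
    filename = sys.argv[2] if len(sys.argv) > 2 else url2filename(url)
    urlretrieve(url, filename, reporthook)
```

Keeping the formatting separate from the write to stderr makes the progress text easy to check without downloading anything.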

You could also run wget if you need it:

#!/usr/bin/env python
import sys
from subprocess import Popen, PIPE, STDOUT

def urlretrieve(url, filename=None, width=4):
    destination = ["-O", filename] if filename is not None else []
    p = Popen(["wget"] + destination + ["--progress=dot", url],
              stdout=PIPE, stderr=STDOUT, bufsize=1) # line-buffered (out side)
    for line in iter(p.stdout.readline, b''):
        if b'%' in line: # grep "%"
            line = line.replace(b'.', b'') # sed -u -e "s,\.,,g"
            percents = line.split(None, 2)[1].decode() # awk $2
            sys.stderr.write("\b"*width + percents.rjust(width))
    p.communicate() # close stdout, wait for child's exit
    print("\b"*width + "DONE")

url = sys.argv[1]
filename = sys.argv[2] if len(sys.argv) > 2 else None
urlretrieve(url, filename)

I have not noticed any buffering issues with this code.
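The per-line work in the loop above mirrors the grep, sed and awk stages of the shell pipeline. Pulled out into a standalone function, it might look like the sketch below; percent_field is an illustrative name, and the sample lines in the demo are assumed examples of wget's dot-style output rather than captured logs. Unlike the loop above, it returns None instead of raising when a matching line has fewer than two fields.

```python
def percent_field(line, width=4):
    """Return the second whitespace-separated field of a progress line.

    Mirrors the three shell stages: keep lines containing "%", delete
    all dots, then take field 2 right-aligned to `width` characters.
    """
    if b"%" not in line:                      # grep "%"
        return None
    fields = line.replace(b".", b"").split()  # sed removes dots; awk splits fields
    if len(fields) < 2:                       # nothing to show for this line
        return None
    return fields[1].decode().rjust(width)    # awk $2 printed with %4s


if __name__ == "__main__":
    samples = [
        b"Connecting to example.com... connected.\n",
        b"     0K .......... ..........  1% 1.23M 3s\n",
        b"  3050K .......... 100% 2.1M=1.5s\n",
    ]
    for sample in samples:
        print(repr(percent_field(sample)))
```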


2

I've done something like this before, and I'd love to share my code with you :)

#!/usr/bin/python2.7
# encoding=utf-8

import sys
import os
import datetime

SHEBANG = "#!/bin/bash\n\n"

def get_cmd(editor='vim', initial_cmd=""):
    from subprocess import call
    from tempfile import NamedTemporaryFile
    # Create the initial temporary file.
    with NamedTemporaryFile(delete=False) as tf:
        tfName = tf.name
        tf.write(initial_cmd)
    # Fire up the editor.
    if call([editor, tfName], shell=False) != 0:
        # Editor died or was killed.
        return None
    # Get the modified content.
    fd = open(tfName)
    res = fd.read()
    fd.close()
    os.remove(tfName)
    return res

def main():
    initial_cmd = "wget " + sys.argv[1]
    cmd  = get_cmd(editor='vim', initial_cmd=initial_cmd)
    if len(sys.argv) > 1 and sys.argv[1] == 's':
        # keep the download information.
        t = datetime.datetime.now()
        filename = "swget_%02d%02d%02d%02d%02d" %\
                (t.month, t.day, t.hour, t.minute, t.second)
        with open(filename, 'w') as f:
            f.write(SHEBANG)
            f.write(cmd)
            f.close()
            os.chmod(filename, 0777)
    os.system(cmd)

main()


# Run this script with the optional argument 's'.
# Copy the command into the editor, then save and quit; the download
# will begin. If you used the argument 's', this script also creates
# another executable script that you can use to resume an
# interrupted download (if the server supports it).

So, basically, you just need to modify the value of initial_cmd. In your case, it's:

wget --progress=dot $url 2>&1 | grep --line-buffered "%" | \
    sed -u -e "s,\.,,g" | awk '{printf("\b\b\b\b%4s", $2)}'

This script first creates a temp file, puts the shell commands in it, and gives it execute permissions. Finally, it runs the temp file with the commands in it.

2 Comments

i'd love to give you some feedback :) You could call(filename) instead of os.system(cmd). To format datetime, you could use .strftime() method. with-statement closes files automatically that is the point of using it in the first place, no need to call f.close() by hand (unindent chmod in this case). If you want to make script executable by your user: os.chmod(filename, os.stat(filename).st_mode | stat.S_IEXEC) (or | 0111 for +x). To avoid leaking files, move code inside with Named..File() as tf: call tf.flush() before call([editor..) then tf.seek(0); res=tf.read()
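The feedback above could be applied roughly as in the following Python 3 sketch, assuming a Unix-like system. The names get_cmd, script_name and make_executable are illustrative. One deliberate difference from the comment: the edited file is reopened by name instead of using tf.seek(0), on the assumption that some editors replace the file on save rather than rewrite it in place.

```python
import os
import stat
import subprocess
import sys
import tempfile
from datetime import datetime


def get_cmd(editor_argv=("vim",), initial_cmd=""):
    """Let the user edit initial_cmd in an editor; return the text or None."""
    with tempfile.NamedTemporaryFile(mode="w", suffix=".sh") as tf:
        tf.write(initial_cmd)
        tf.flush()  # make the text visible to the editor process
        if subprocess.call(list(editor_argv) + [tf.name]) != 0:
            return None  # editor died or was killed
        # Reopen by name in case the editor replaced the file on save.
        with open(tf.name) as edited:
            return edited.read()
    # The temporary file is removed when the outer with-block exits.


def script_name(now):
    """Timestamped file name in the style of the original answer."""
    return now.strftime("swget_%m%d%H%M%S")


def make_executable(path):
    """Add the owner-execute bit while keeping the other permission bits."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)


if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = get_cmd(initial_cmd="wget " + sys.argv[1])
    if cmd is not None:
        name = script_name(datetime.now())
        with open(name, "w") as f:
            f.write("#!/bin/bash\n\n" + cmd)
        make_executable(name)
        sys.exit(subprocess.call(["./" + name]))
```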
@J.F.Sebastian wow, thank you, man! it's a script i wrote long time ago. I was a bad python programmer back then:) thank you for pointing that out!
1

vim download.py

#!/usr/bin/env python

import subprocess
import os

sh_cmd = r"""
download()
{
    local url=$1
    echo -n "    "
    wget --progress=dot $url 2>&1 |
        grep --line-buffered "%"  |
        sed -u -e "s,\.,,g"       |
        awk '{printf("\b\b\b\b%4s", $2)}'

    echo -ne "\b\b\b\b"
    echo " DONE"
}
download "http://www.kernel.org/pub/linux/kernel/v2.6/$file"
"""

cmd = 'sh'
p = subprocess.Popen(cmd, 
    shell=True,
    stdin=subprocess.PIPE,
    env=os.environ
)
p.communicate(input=sh_cmd)

# or:
# p = subprocess.Popen(cmd,
#    shell=True,
#    stdin=subprocess.PIPE,
#    env={'file':'xx'})
# 
# p.communicate(input=sh_cmd)

# or:
# p = subprocess.Popen(cmd, shell=True,
#    stdin=subprocess.PIPE,
#    stdout=subprocess.PIPE,
#    stderr=subprocess.PIPE,
#    env=os.environ)
# stdout, stderr = p.communicate(input=sh_cmd)

Then you can call it like this:

file="xxx" python download.py
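The same idea can be written without shell=True by starting the shell directly and feeding it the script on stdin. The sketch below is one way to do that in Python 3; run_shell_script is an illustrative name. stdout is captured here only so the result can be inspected, and dropping that argument would let progress output reach the terminal. Since echo -ne is not portable across sh implementations, passing shell="bash" may be needed for the download function above.

```python
import os
import subprocess


def run_shell_script(script, extra_env=None, shell="sh"):
    """Feed script text to a shell on stdin; return (exit status, stdout)."""
    # Start from the current environment and add the caller's variables.
    env = dict(os.environ, **(extra_env or {}))
    result = subprocess.run([shell], input=script, env=env,
                            stdout=subprocess.PIPE,
                            universal_newlines=True)  # str in, str out
    return result.returncode, result.stdout


if __name__ == "__main__":
    status, out = run_shell_script('echo "Downloading $file"',
                                   {"file": "patch-2.6.37.gz"})
    print(status, out, end="")
```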

3 Comments

Why use sh as the command, and use shell=True? Why not run sh_cmd directly?
@MartijnPieters Because sh_cmd is not a "shell command", so we use sh to run it. In a Linux shell, we can use sh script.sh, and we can also use a pipe or stdin to run commands, such as: cat some_file | sh or curl http://xxx.xx | sh and so on. As for shell=True, the docs say: The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.
If you set shell=True a shell is used to run the command you pass in. You quoted the documentation yourself there.
0

In very simple terms, assuming you have a script.sh file, you can execute it and print its exit status:

import subprocess
process = subprocess.Popen('/path/to/script.sh', shell=True, stdout=subprocess.PIPE)
# communicate() drains stdout and waits; wait() alone can deadlock on a full pipe
process.communicate()
print(process.returncode)

2 Comments

And ensure that script.sh has execute permission (chmod +x script.sh), or use Popen('sh /path/to/script.sh', shell=True ...)
Sure, it must have execute permission (+x); otherwise it will give you an error. With that in place, the above Python code should work like a charm!
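As a variation on the points above, passing the script path to sh as an argument list sidesteps both the execute-permission requirement and shell=True. A minimal Python 3 sketch follows; run_script is an illustrative name, and the throwaway script in the demo exists only for demonstration.

```python
import os
import subprocess
import tempfile


def run_script(path):
    """Run a shell script with sh and return (exit status, stdout text)."""
    result = subprocess.run(["sh", path],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical throwaway script, written only for this demonstration.
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as script:
        script.write("echo hello\nexit 3\n")
    try:
        print(run_script(script.name))
    finally:
        os.remove(script.name)
```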
