
Using my Linux terminal, I can run a command to download all PDFs from a website:

wget -A pdf -m -p -E -k -K -np http://site/path/

But I want to automate the process: for example, run the command for multiple URLs and then process the downloaded files later in a Python/Jupyter notebook. The wget library in Python is different, and it does not let me use the same options/parameters that I have with wget on my Linux machine. So how can I achieve the same thing using Python?
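Roughly, this is what I am after (just a sketch; the URL list, download_all_pdfs, and the processing step are placeholders for the part I do not know how to do):

urls = [
    "http://site/path/",
    "https://example.com/docs/",
]

for url in urls:
    download_all_pdfs(url)   # <-- the missing piece: the wget equivalent in Python

# later, in the notebook:
import glob
pdf_files = glob.glob("**/*.pdf", recursive=True)
# ... process pdf_files ...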


2 Answers


You can just use the built-in os module, so it would look something like this:

import os
os.system('wget -A pdf -m -p -E -k -K -np http://site/path/')

With that you are just passing the command to the shell, exactly as you would type it in a terminal.
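If a URL might contain characters the shell treats specially, subprocess.run(["wget", "-A", "pdf", "-m", "-p", "-E", "-k", "-K", "-np", url]) passes the arguments straight to wget without any shell quoting, and its returncode tells you whether the download succeeded.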


6 Comments

What if I want to use it in a loop, for example for multiple URLs in a list?
You call it once for each website in the loop. You are building the string passed to os.system, so you decide what goes inside it. Just try it, and if it does not work, ask a new question showing your failed attempt.
You can just loop through the list, something like this: gist.github.com/abodsakah/550ed2bae02e7f8c744b1062ef0b2620 (a self-contained version is sketched after these comments)
I tried this, but it does not download anything with wget as it is supposed to. Nothing happens.
You can look at this library, maybe it is better: pypi.org/project/wget
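A minimal, self-contained version of the loop suggested in the comments above (the URLs are placeholders, and it assumes GNU wget is installed and on the PATH; checking the value returned by os.system helps diagnose the "nothing happens" case):

import os

# placeholder URLs - replace with the pages you want to mirror
urls = [
    "http://site/path/",
    "https://example.com/docs/",
]

for url in urls:
    # build the same wget command for each URL and hand it to the shell
    status = os.system(f"wget -A pdf -m -p -E -k -K -np {url}")
    if status != 0:
        print(f"wget reported a problem for {url} (status {status})")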

You don't need Python for that.

#!/bin/bash
for url in "http://site/path/" "https://example.com/another"
do
    wget -A pdf -m -p -E -k -K -np "$url"
done
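Save it as, say, download_pdfs.sh (the name is arbitrary), make it executable with chmod +x download_pdfs.sh, and run it; with -m, wget creates one directory per host containing the mirrored PDFs, which you can then read from the notebook.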

