I am using Selenium WebDriver to automate downloading several PDF files. I get the PDF preview window (see below), and now I would like to download the file. How can I accomplish this using Google Chrome as the browser?
6 Answers
Try this code; it worked for me.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    "download.default_directory": "C:/Users/XXXX/Desktop",  # Change the default download directory
    "download.prompt_for_download": False,  # Download automatically, without a save dialog
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True  # Download PDFs instead of showing them in Chrome
})
driver = webdriver.Chrome(options=options)
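With these preferences Chrome saves the file in the background, so a script usually has to wait before it can use the file. Below is a minimal sketch of one way to do that, assuming Chrome's usual behaviour of giving in-progress downloads a .crdownload suffix. The helper name wait_for_new_download and its parameters are hypothetical and not part of Selenium.

```python
import os
import time


def wait_for_new_download(directory, before, timeout=30.0, poll=0.5):
    """Wait until `directory` holds new files and none of them is partial.

    `before` is the set of file names that existed before the download was
    triggered. Assumption: Chrome marks unfinished downloads with a
    .crdownload suffix.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        new_names = set(os.listdir(directory)) - set(before)
        if new_names and not any(n.endswith(".crdownload") for n in new_names):
            return sorted(new_names)
        time.sleep(poll)
    raise TimeoutError(f"no finished download in {directory} after {timeout} seconds")
```

To use it, take the snapshot with `before = set(os.listdir(download_dir))` just before the step that triggers the download, then call `wait_for_new_download(download_dir, before)` afterwards.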
5 Comments
download.directory_upgrade for?
I did it and it worked, don't ask me how :)
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    #"download.default_directory": "C:/Users/517/Download",  # Change the default download directory
    #"download.prompt_for_download": False,  # Download automatically, without a save dialog
    #"download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True  # Download PDFs instead of showing them in Chrome
})
driver = webdriver.Chrome(options=options)
I found this piece of code on Stack Overflow itself, and it does the job for me without using Selenium at all.
import urllib.request

# URL must hold the direct link to the PDF
with urllib.request.urlopen(URL) as response, open("FILENAME.pdf", 'wb') as file:
    file.write(response.read())
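A direct fetch like this can silently save an HTML error or login page under a .pdf name. As a hedged sketch, the response can be checked for the `%PDF-` marker that PDF files begin with before it is written. The function names looks_like_pdf and save_pdf are hypothetical.

```python
import urllib.request


def looks_like_pdf(data):
    """Return True if the bytes begin with the PDF header marker."""
    return data.startswith(b"%PDF-")


def save_pdf(url, file_name):
    """Download `url` and write it to `file_name` only if it looks like a PDF."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    if not looks_like_pdf(data):
        raise ValueError("response does not look like a PDF")
    with open(file_name, "wb") as f:
        f.write(data)
```

This only inspects the first bytes, so it catches the common case of an HTML page served in place of the file but does not validate the PDF itself.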
You can download a PDF (embedded or normal) from the web using Selenium.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

download_dir = "C:\\Users\\omprakashpk\\Documents"  # for linux/*nix, download_dir="/usr/Public"
options = webdriver.ChromeOptions()
profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],  # Disable Chrome's PDF Viewer
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf",
}
options.add_experimental_option("prefs", profile)
# Selenium 4 takes the driver path through Service (older versions took it as the first argument).
# The path is optional; if not specified, the driver is searched for on PATH.
driver = webdriver.Chrome(service=Service('C:\\chromedriver\\chromedriver_2_32.exe'), options=options)
driver.get(pdf_url)  # pdf_url: the address of the PDF to download
This downloads the PDF and saves it in the specified directory. Change download_dir and the ChromeDriver path to suit your setup.
You can download ChromeDriver from here.
Hope it helps!
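If Chrome is started with the headless argument, these preferences alone may not be enough, because older headless builds blocked downloads by default. One workaround is to send the DevTools command `Page.setDownloadBehavior` through Selenium's `execute_cdp_cmd`, which is available on Chromium-based drivers. This is a sketch under those assumptions, and the helper name allow_headless_downloads is hypothetical.

```python
def allow_headless_downloads(driver, download_dir):
    """Ask a Chromium-based driver to permit downloads into `download_dir`.

    Assumption: `driver` exposes execute_cdp_cmd, as Selenium's Chrome
    driver does. Newer Chrome versions may not need this step.
    """
    driver.execute_cdp_cmd(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": download_dir},
    )
```

Call it once after creating the driver and before `driver.get(...)` on the PDF address.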
6 Comments
options.add_argument('headless') it doesn't work. Any idea why?
add_argument("--headless"). It works with python3. I am sure, it will work for python 2 also.
http://spark-public.s3.amazonaws.com/nlp/slides/AdvancedMaxent.pdf . Even wget doesn't for aws links. I'm not sure how aws checks you whether you are in gui mode or not.
In my case it worked without any code modification; I just needed to disable the Chrome PDF viewer.
Here are the steps to disable it:
- Go into Chrome Settings
- Scroll to the bottom and click on Advanced
- Under Privacy and Security, click on "Site Settings"
- Scroll to PDF Documents
- Enable "Download PDF files instead of automatically opening them in Chrome"
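A browser started by Selenium normally runs with a fresh temporary profile, so a setting changed by hand in your everyday Chrome may not carry over. As far as I know, the toggle in the last step corresponds to the `plugins.always_open_pdf_externally` preference used in the other answers, so the same effect can be sketched in code. The helper name pdf_download_prefs is hypothetical.

```python
def pdf_download_prefs(download_dir=None):
    """Build Chrome prefs that download PDFs instead of previewing them."""
    prefs = {"plugins.always_open_pdf_externally": True}
    if download_dir is not None:
        # Optional: also redirect downloads to a chosen directory.
        prefs["download.default_directory"] = download_dir
    return prefs


# Usage with Selenium (requires the selenium package and a Chrome install):
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option("prefs", pdf_download_prefs())
#   driver = webdriver.Chrome(options=options)
```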
You can download the PDF file using Python's requests library:
import requests

pdf_url = driver.current_url  # URL of the PDF currently open in the Selenium-driven browser
response = requests.get(pdf_url)
file_name = 'filename.pdf'
with open(file_name, 'wb') as f:
    f.write(response.content)
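A plain `requests.get` call does not share the browser's login session, so on a site that requires signing in it may fetch an error page instead of the PDF. One possible sketch, assuming an existing Selenium `driver`, is to copy the browser's cookies into the request. The helper names below are hypothetical.

```python
def selenium_cookies_to_dict(cookies):
    """Convert the list of dicts from driver.get_cookies() to {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def download_current_pdf(driver, file_name):
    """Fetch driver.current_url with the browser's cookies and save the body."""
    import requests  # imported here so the helper above needs no third-party package

    response = requests.get(
        driver.current_url,
        cookies=selenium_cookies_to_dict(driver.get_cookies()),
        timeout=30,
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx instead of saving an error page
    with open(file_name, "wb") as f:
        f.write(response.content)
```

Sites that also check headers such as the user agent may need more than cookies; this sketch covers only the cookie part.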
