Selenium Webdriver: How to Download a PDF File with Python?

Question

I am using selenium webdriver to automate downloading several PDF files. I get the PDF preview window (see below), and now I would like to download the file. How can I accomplish this using Google Chrome as the browser?

Take a look at this answer... maybe it'll help you.

– dot.Py, Commented Jul 27, 2017 at 11:35

Celius Stingher · Accepted Answer · 2022-08-08 22:11:07Z

48

Try this code, it worked for me.

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    "download.default_directory": "C:/Users/XXXX/Desktop",  # Change the default download directory
    "download.prompt_for_download": False,  # Download automatically, without a save dialog
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True,  # Do not open PDFs in Chrome's built-in viewer
})
driver = webdriver.Chrome(options=options)
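
A note on which step performs the download: the preferences above only configure Chrome. As far as I can tell, the file is fetched when the browser navigates to the PDF's URL (or clicks a link to it), and Selenium does not report when the file has finished writing. Here is a minimal sketch of one way to wait for it, assuming Chrome keeps in-progress downloads under a .crdownload suffix. The names wait_for_download and download_pdf are hypothetical helpers, not part of Selenium.

```python
import time
from pathlib import Path


def wait_for_download(directory, before, timeout=30.0, poll=0.5):
    """Return the newest finished file that is not in `before`, or None on timeout.

    Assumption: Chrome writes unfinished downloads with a .crdownload suffix,
    so a download counts as finished once no new file with that suffix remains.
    """
    directory = Path(directory)
    deadline = time.monotonic() + timeout
    while True:
        new_files = [p for p in directory.iterdir() if p.is_file() and p not in before]
        partial = [p for p in new_files if p.suffix == ".crdownload"]
        finished = [p for p in new_files if p.suffix != ".crdownload"]
        if finished and not partial:
            return max(finished, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def download_pdf(driver, pdf_url, download_dir, timeout=30.0):
    """Navigate to the PDF and wait for the saved file to appear in download_dir."""
    before = set(Path(download_dir).iterdir())  # snapshot of files already present
    driver.get(pdf_url)  # with the prefs above, Chrome saves the PDF instead of previewing it
    return wait_for_download(download_dir, before, timeout=timeout)
```

With a driver built from the options above, download_pdf(driver, pdf_url, "C:/Users/XXXX/Desktop") would return the path of the saved file, or None if nothing finished within the timeout.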

edited Aug 8, 2022 at 22:11

Celius Stingher


answered Jan 29, 2019 at 18:12

Kumar


5 Comments

Abang F. Over a year ago

This didn't work for me until I changed the default directory to use backslash, so instead of "C:/Users/XXXX/Desktop" I use "C:\\Users\\XXXX\\Desktop".
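
If the forward-slash path is not picked up on Windows, one way to avoid hand-writing separators is to let pathlib produce the path in the platform's native form. A minimal sketch; chrome_download_prefs is a hypothetical helper name, and the preference keys are simply the ones from the answer above.

```python
from pathlib import Path


def chrome_download_prefs(download_dir):
    """Return the prefs dict from the answer, with an absolute, native-style path."""
    # str(Path) uses backslashes on Windows and forward slashes elsewhere.
    native_dir = str(Path(download_dir).expanduser().resolve())
    return {
        "download.default_directory": native_dir,
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,
    }
```

It would be passed to Chrome with options.add_experimental_option("prefs", chrome_download_prefs("~/Desktop")).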

Nam G VU Over a year ago

What is download.directory_upgrade for?

rsc05 Over a year ago

Ordinal0 [0x00A75230+1856048] BaseThreadInitThunk [0x76FDFA29+25] RtlGetAppContainerNamedObjectPath [0x77A37B5E+286] RtlGetAppContainerNamedObjectPath [0x77A37B2E+238]

Liquidgenius Over a year ago

Confirming this works using Splinter (based on Selenium) which doesn't do file downloads.

Sam Ginrich Over a year ago

Which line does the download?

Nick · Accepted Answer · 2021-05-30 01:43:50Z

6

I did it and it worked, don't ask me how :)

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option('prefs', {
    # "download.default_directory": "C:/Users/517/Download",  # Change the default download directory
    # "download.prompt_for_download": False,  # Download automatically, without a save dialog
    # "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True,  # Do not open PDFs in Chrome's built-in viewer
})
driver = webdriver.Chrome(options=options)

edited May 30, 2021 at 1:43

Nick


answered May 30, 2021 at 1:39

user16072805


1 Comment

rsc05 Over a year ago

Ordinal0 [0x00A75230+1856048] BaseThreadInitThunk [0x76FDFA29+25] RtlGetAppContainerNamedObjectPath [0x77A37B5E+286] RtlGetAppContainerNamedObjectPath [0x77A37B2E+238]

Saravana · Accepted Answer · 2021-06-18 02:39:44Z

4

I found this piece of code elsewhere on Stack Overflow, and it does the job for me without using Selenium at all.

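import urllib.request

url = "https://example.com/file.pdf"  # placeholder: replace with the URL of the PDF

# The with-statement closes both the response and the file, even on errors.
with urllib.request.urlopen(url) as response, open("FILENAME.pdf", "wb") as file:
    file.write(response.read())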

answered Jun 18, 2021 at 2:39

Saravana


1 Comment

Liquidgenius Over a year ago

This method will only work for non-authenticated sessions. It is not robust to websites which require a login. @Kumar's answer will work for both non-authenticated and authenticated sessions.
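
For sites that require a login, one possible middle ground is to log in with Selenium and then reuse the browser's cookies for a plain HTTP request. The sketch below assumes the site authenticates by cookies alone and that driver.get_cookies() returns dicts with "name" and "value" keys, which is the shape Selenium documents. cookie_header and fetch_with_browser_cookies are hypothetical helper names, and sites that also check headers or tokens would need more than this.

```python
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from the dicts returned by driver.get_cookies()."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def fetch_with_browser_cookies(driver, url, dest_path):
    """Download `url` to `dest_path`, sending the cookies of the Selenium session."""
    request = urllib.request.Request(
        url, headers={"Cookie": cookie_header(driver.get_cookies())}
    )
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

Note that get_cookies() only returns cookies visible to the page currently loaded, so the driver should be on the target site when this is called.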

Om Prakash · Accepted Answer · 2018-02-09 12:13:56Z

3

You can download a PDF (embedded or normal) from the web using Selenium.

from selenium import webdriver

download_dir = "C:\\Users\\omprakashpk\\Documents"  # for Linux/*nix, download_dir = "/usr/Public"
options = webdriver.ChromeOptions()

profile = {
    "plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],  # Disable Chrome's PDF viewer
    "download.default_directory": download_dir,
    "download.extensions_to_open": "applications/pdf",
}
options.add_experimental_option("prefs", profile)
# Written for Selenium 3; the driver path is optional and the PATH is searched if it is omitted.
# Selenium 4 uses options=options and takes the driver path through a Service object.
driver = webdriver.Chrome('C:\\chromedriver\\chromedriver_2_32.exe', chrome_options=options)

pdf_url = "https://example.com/sample.pdf"  # placeholder: replace with the URL of the PDF
driver.get(pdf_url)

It will download the PDF and save it in the specified directory. Change the download_dir and ChromeDriver locations to suit your setup.

You can download chrome driver from here.

Hope it helps!

edited Feb 9, 2018 at 12:13

answered Feb 9, 2018 at 11:50

Om Prakash


6 Comments

jaggi Over a year ago

This works with the GUI, but if I add options.add_argument('headless') it doesn't work. Any idea why?

Om Prakash Over a year ago

Try add_argument("--headless"). It works with python3. I am sure, it will work for python 2 also.

jaggi Over a year ago

I'm also using Python 3. It might work for other PDF links, but it's not working for AWS S3 links, e.g. http://spark-public.s3.amazonaws.com/nlp/slides/AdvancedMaxent.pdf. Even wget doesn't work for AWS links. I'm not sure how AWS checks whether you are in GUI mode or not.

jaggi Over a year ago

It seems that not allowing file downloads in headless mode is a security feature: bugs.chromium.org/p/chromium/issues/detail?id=696481#c39

exteral Over a year ago

@Om Prakash, have you tested your code in headless Chrome? I tested the code from your GitHub page in headless Chrome and it didn't work.
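
On the headless issue discussed in these comments: a workaround often used with older headless Chrome is to enable downloads through the DevTools protocol command Page.setDownloadBehavior, which Selenium 4 exposes on Chromium drivers via driver.execute_cdp_cmd. The following is a sketch under those assumptions. allow_headless_downloads is a hypothetical helper name, it has not been checked against the S3 link above, and newer Chrome versions may not need it at all.

```python
from pathlib import Path


def allow_headless_downloads(driver, download_dir):
    """Ask Chrome, via the DevTools protocol, to save downloads into download_dir."""
    params = {
        "behavior": "allow",
        # The protocol expects an absolute path.
        "downloadPath": str(Path(download_dir).resolve()),
    }
    return driver.execute_cdp_cmd("Page.setDownloadBehavior", params)
```

It would be called once after creating the driver and before driver.get(pdf_url).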


Umer · Accepted Answer · 2020-05-20 18:18:47Z

-2

In my case it worked without any code modification; I just needed to disable the Chrome PDF viewer.

Here are the steps to disable it:

1. Go into Chrome Settings.
2. Scroll to the bottom and click "Advanced".
3. Under "Privacy and security", click "Site Settings".
4. Scroll to "PDF documents".
5. Enable "Download PDF files instead of automatically opening them in Chrome".

answered May 20, 2020 at 18:18

Umer



Ravi Teja · Accepted Answer · 2023-02-17 07:05:09Z

-2

You can download the PDF file using Python's requests library.

import requests

pdf_url = driver.current_url  # URL of the page currently open in the Selenium driver
response = requests.get(pdf_url)
file_name = 'filename.pdf'
with open(file_name, 'wb') as f:
    f.write(response.content)

answered Feb 17, 2023 at 7:05

Ravi Teja


1 Comment

Branden Keck Over a year ago

I believe this answer was downvoted because the question concerns a website with an embedded PDF (i.e. an <embed> tag), where this wouldn't work. However, I have a use case where the PDF is displayed in-browser by the website, and this answer is a far better solution than using Selenium. I am commenting to note this for any others who see this post.
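
For the embedded case mentioned in this comment, one possible approach is to read the src (or data) attribute of the embedding element from the page source and download that URL instead of the page URL. This is a sketch that assumes the site embeds the PDF with a plain <embed>, <iframe>, or <object> tag pointing at the file; embedded_source_urls is a hypothetical helper name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _EmbedSrcParser(HTMLParser):
    """Collect src/data attributes of embed, iframe, and object tags."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("embed", "iframe", "object"):
            attr_map = dict(attrs)
            src = attr_map.get("src") or attr_map.get("data")
            if src:
                self.sources.append(src)


def embedded_source_urls(page_html, page_url):
    """Return absolute URLs of all embedded resources found in page_html.

    No filtering by content type is done; picking the PDF is left to the caller.
    """
    parser = _EmbedSrcParser()
    parser.feed(page_html)
    return [urljoin(page_url, src) for src in parser.sources]
```

With Selenium it could be called as embedded_source_urls(driver.page_source, driver.current_url), and the resulting URL passed to requests.get as in the answer above.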
