I wrote a simple Python script to download an Excel file from the internet, using Selenium with ChromeDriver. The download completes and the file is saved, but I can't open the saved file (tried with LibreOffice and MS Excel). The same file opens fine when I download it manually without Selenium. When I try to read the file with Python's xlrd, the error is zipfile.BadZipFile: Bad magic number for file header

At first I thought the download had not finished because the browser was closing too soon, so I tried increasing the wait to sleep(20), but the result is the same.
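
One way to tell an unfinished download apart from wrong content is to look at the first bytes of the saved file. An .xlsx file is a ZIP archive, so it should normally start with the bytes PK\x03\x04. The sketch below is only a diagnostic idea and is not part of the original script; the function name describe_download is hypothetical.

```python
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # local file header signature of a non-empty ZIP archive


def describe_download(path):
    """Return a short description of what the file at `path` looks like."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if not head:
        return "empty file"
    if head.startswith(ZIP_SIGNATURE):
        # The signature is right; check that the archive is complete and readable.
        try:
            with zipfile.ZipFile(path) as zf:
                bad = zf.testzip()
        except zipfile.BadZipFile:
            return "starts like a ZIP but is truncated or corrupt"
        if bad is None:
            return "valid ZIP (xlsx) container"
        return "ZIP with a corrupt member: " + bad
    if head.lstrip().startswith(b"<"):
        return "looks like HTML/XML text, not a workbook"
    return "unknown content, first bytes: " + repr(head)
```

Running this on both the Selenium download and the manual download would show whether the two files differ at the container level or only further inside.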

Is there anything I have missed in this process?

Here is my Python script.

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException

def every_downloads_chrome(driver):
    if not driver.current_url.startswith("chrome://downloads"):
        driver.get("chrome://downloads/")
    return driver.execute_script("""
        var items = downloads.Manager.get().items_;
        if (items.every(e => e.state === "COMPLETE"))
            return items.map(e => e.fileUrl || e.file_url);
        """)

uri = "https://cfs.ojk.go.id/cfs/ReportViewerForm.aspx?BankCode=PT.+BPR+Cikarang+Raharja&Month=3&Year=2019&FinancialReportPeriodTypeCode=R&FinancialReportTypeCode=BPK-900-000002"
option = webdriver.ChromeOptions()
option.add_argument("--incognito")
option.add_argument("--window-size=400,400")
option.add_argument('disable-component-cloud-policy')
option.add_experimental_option("prefs", {
  "download.prompt_for_download": False,
  "download.directory_upgrade": False,
  "safebrowsing.enabled": True
})
# chromedriver_path must be set beforehand to the location of the local ChromeDriver binary
browser = webdriver.Chrome(executable_path=chromedriver_path, options=option)
browser.get(uri)
timeout = 20
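try:
    WebDriverWait(browser, timeout).until(EC.visibility_of_element_located((By.XPATH, '//*[@id="CFSReportViewer_ctl05_ctl04_ctl00_ButtonImg"]')))
except TimeoutException:
    # The export button never appeared: close the browser and stop here,
    # otherwise the calls below would run against a closed session
    browser.quit()
    raise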

# Download the file by executing a JS command
browser.execute_script("$find('CFSReportViewer').exportReport('EXCELOPENXML');")

# Wait until every download is complete
WebDriverWait(browser, 120, 1).until(every_downloads_chrome)

# Pause for 2 seconds, then quit
time.sleep(2)
browser.quit()
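
As a cross-check that does not depend on the internals of the chrome://downloads page, the download folder itself can be polled. Chrome keeps in-progress downloads under a .crdownload suffix, so a sketch like the following waits until no such file remains. This is not what the script above does; the function name wait_for_downloads is hypothetical, and the sketch assumes the folder is empty before the download starts.

```python
import os
import time


def wait_for_downloads(folder, timeout=120, poll=1.0):
    """Return the sorted file names once `folder` holds files and none is partial.

    Raises TimeoutError if that state is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        names = os.listdir(folder)
        # Chrome renames the file to its final name only when the download finishes.
        partial = [n for n in names if n.endswith(".crdownload")]
        if names and not partial:
            return sorted(names)
        if time.monotonic() >= deadline:
            raise TimeoutError("downloads not finished, partial files: %r" % partial)
        time.sleep(poll)
```

It would be called after the export command, with the folder Chrome is configured to download into.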

I'm trying to download the Excel file from this site:

https://cfs.ojk.go.id/cfs/ReportViewerForm.aspx?BankCode=PT.+BPR+Cikarang+Raharja&Month=3&Year=2019&FinancialReportPeriodTypeCode=R&FinancialReportTypeCode=BPK-900-000002

By the way, I'm using macOS with Chrome 77 and ChromeDriver 77.0.3865.40.

Here is a video of the problem: https://drive.google.com/file/d/1N6q66AVpo4XtrZemxoD5E94xUohzcaNx/view

Update

It was my environment. I was running this script inside a virtualenv and got that error for the downloaded file, but when I ran it without the virtualenv the file could be read without a single error, which convinces me the problem was my virtualenv.
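
The update does not say what differed inside the virtualenv. A minimal sketch for narrowing that down is to print the interpreter and package versions in each environment and compare the two outputs. The function name environment_report is hypothetical, and the package list is only a guess at what is relevant here.

```python
import sys
from importlib import metadata  # available in Python 3.8+


def environment_report(packages=("selenium", "xlrd")):
    """Collect the interpreter details and installed versions of `packages`."""
    report = {"python": sys.version.split()[0], "executable": sys.executable}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this environment
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key, "=", value)
```

Running it once inside the virtualenv and once outside gives two short listings that can be compared line by line.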

  • I tried your code and the Excel file opened successfully. Maybe the problem is not that the download is unfinished. You could try sleeping for more seconds to rule that out. Commented Sep 27, 2019 at 10:00
  • I've added more seconds and the result remains the same. I don't know what's wrong yet; still trying to find out. Commented Sep 30, 2019 at 9:29

1 Answer

I have no problem if I change the JavaScript to click the a tag instead:

browser.execute_script('document.querySelector("[alt=Excel]").click();')

6 Comments

Just tried it; I still can't open the saved Excel file. By the way, I'm using Mac OS X. Does the operating system version have any effect?
Maybe, though I'm not sure why. I ran it twice on Windows with no problem. Does your file have an .xlsx extension, though?
Hmm... I may try running it on a Mac when I'm home, though I don't really want to install LibreOffice.
I don't think this has anything to do with the operating system version. I've tried on Windows as well, but the result is still the same.
I assume it is something about your environment, because I ran your code minus the executable path, with my line in place of yours for the JS, and it works in repeated runs.