Downloading CSV file from website/server with Python 3.X

Question

Programming beginner here. For my very first project I was able to write a quick Python script that downloaded files from this website: http://www.wesm.ph/inner.php/downloads/market_prices_&_schedules

I noticed that the link address of the to-be-downloaded file followed a pattern.
(http://wesm.ph/admin/downloads/download.php?download=../csv/mpas/XXXXX/XXXX.csv)

With some string concatenation and the datetime module, I was able to build the URL string of the CSV file. After that, I would just use:

urllib.request.urlopen(HTMLlink).read()

and save it with something like:

with open('output.csv', "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerows(fullList)

It used to work, but now it doesn't. However, I noticed that whenever I click the 'Generate Report' button and THEN run the script, the script produces the output file. I'm not sure why this works. Is there a way to send a request to their server to generate the actual file? Which module or commands should I use?

This is because when you click 'Generate Report', the website creates the file, which your script can then download. The website probably removes these generated reports after a while. You need to modify your script so that it first submits the form, then extracts the URL of the generated report, and finally downloads it.
– vrs, Commented Jan 1, 2016 at 18:49
@Caridorc To clarify: it used to work about 5 days ago, using Python 3.X.
– sandrosil, Commented Jan 1, 2016 at 18:55
@vrs OK, I think I get what you mean. I'll have to request the data before running the script that I made. Sorry, I'm fairly new to this, but which module/command will allow me to send that request to the website?
– sandrosil, Commented Jan 1, 2016 at 19:29
@slsilv Go ahead with urllib or requests; both can make GET and POST requests. You can then use BeautifulSoup to get the URL of the CSV file. Google for it; Stack Overflow is full of threads about these modules.
– vrs, Commented Jan 1, 2016 at 20:53
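
The three steps described in the comments above (submit the form, pull the report URL out of the reply, download it) can be sketched with the standard library alone. This is only a sketch under assumptions: `form_url` and `form_fields` are hypothetical and would have to be read from the page's HTML or the browser's network tab, and it assumes the server's reply contains an `<a>` link whose `href` ends in `.csv`. The names `CsvLinkParser`, `find_csv_links` and `generate_and_download` are made up for illustration.

```python
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class CsvLinkParser(HTMLParser):
    """Collect href values of <a> tags that end in .csv."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".csv"):
            self.links.append(href)


def find_csv_links(html, base_url):
    """Return absolute URLs for every .csv link found in the HTML text."""
    parser = CsvLinkParser()
    parser.feed(html)
    return [urllib.parse.urljoin(base_url, link) for link in parser.links]


def generate_and_download(form_url, form_fields, out_path):
    """POST the form, find the first CSV link in the reply, and save that file.

    form_url and form_fields are assumptions: take the real values from the site.
    """
    data = urllib.parse.urlencode(form_fields).encode("ascii")
    # Passing data= makes urlopen send a POST instead of a GET.
    with urllib.request.urlopen(form_url, data=data) as reply:
        html = reply.read().decode("utf-8", errors="replace")
    links = find_csv_links(html, form_url)
    if not links:
        raise RuntimeError("no CSV link found in the server's reply")
    # Write the raw bytes so the file is saved exactly as the server sent it.
    with urllib.request.urlopen(links[0]) as csv_reply, open(out_path, "wb") as f:
        f.write(csv_reply.read())
    return links[0]
```

If the site ties the generated report to a session cookie, a plain `urlopen` call like this may not be enough, and the form might be submitted with GET rather than POST; both points need checking against the real page.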

Danny_ds · Accepted Answer · 2016-01-01 18:53:58Z

Most likely those files are only temporarily stored on that webserver after you click 'Generate Report'.

In order to generate new ones, there might even be a check (in JavaScript, or using cookies or a session ID) to see whether the request for the new link/file comes from a human or a bot.

You might also want to check the HTTP return code (or even the full returned headers) to see exactly what the server is answering.
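
One way to look at the return code and headers, sketched with `urllib` only. The helper names `fetch` and `looks_like_csv` are made up for illustration, and the CSV check is a rough heuristic (status 200, no HTML content type, body not starting with an HTML tag), not something the site is known to guarantee.

```python
import urllib.error
import urllib.request


def fetch(url):
    """Return (status, headers, body), even when the server answers with an error code."""
    try:
        with urllib.request.urlopen(url) as reply:
            headers = {k.lower(): v for k, v in reply.headers.items()}
            return reply.status, headers, reply.read()
    except urllib.error.HTTPError as err:
        # HTTPError carries the error response, so its headers and body are readable too.
        headers = {k.lower(): v for k, v in err.headers.items()}
        return err.code, headers, err.read()


def looks_like_csv(status, headers, body):
    """Heuristic: True if the response is probably the CSV file and not an HTML page."""
    if status != 200:
        return False
    if "html" in headers.get("content-type", "").lower():
        return False
    start = body.lstrip().lower()
    return not start.startswith((b"<!doctype", b"<html"))
```

Printing the status and headers returned by `fetch(HTMLlink)` once before and once after clicking 'Generate Report' should show what changes on the server side, for example a 404 or an HTML page in place of the CSV.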

sandrosil Over a year ago

Thanks for the input. I think you helped me figure it out. I checked what the link was returning and found an HTML link that essentially makes the request. With some quick manipulations I think I'll be able to figure this out.
