Programming beginner here. For my very first project, I wrote a quick Python script that downloads files from this website: http://www.wesm.ph/inner.php/downloads/market_prices_&_schedules

I noticed that the link address of the to-be-downloaded file followed a pattern.
(http://wesm.ph/admin/downloads/download.php?download=../csv/mpas/XXXXX/XXXX.csv)

With some string concatenation and the datetime module, I was able to build the URL of the CSV file. I would then fetch it with:

urllib.request.urlopen(HTMLlink).read()

and save it with something like:

with open('output.csv', "w", newline='') as f:
    writer = csv.writer(f)
    writer.writerows(fullList)
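
The fetch-and-save steps above could be sketched roughly as follows. This is a minimal sketch, not the script from the question: the helper names (rows_from_bytes, fetch_rows, save_rows) are made up for illustration, and it assumes the server returns UTF-8 encoded CSV text. Note that read() returns bytes, so the data has to be decoded before the csv module can parse it.

```python
import csv
import io
import urllib.request


def rows_from_bytes(raw, encoding="utf-8"):
    """Decode downloaded bytes and parse them into a list of CSV rows."""
    # Assumption: the file is UTF-8; pass another encoding if it is not.
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


def fetch_rows(url):
    """Download the CSV at `url` and return its rows (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        return rows_from_bytes(response.read())


def save_rows(rows, path):
    """Write rows to `path`, mirroring the csv.writer snippet above."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

Keeping the parsing separate from the download makes it possible to check the parsing logic on a byte string without touching the network.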

It used to work, but now it doesn't. However, I noticed that whenever I clicked the 'Generate Report' button and THEN ran the script, the script would produce the output file. I'm not sure why this works. Is there a way to send a request to their server to generate the actual file? Which module or commands should I use?

  • "It used to work": do you mean it used to work in Python 2? Commented Jan 1, 2016 at 18:37
  • This is because when you click 'Generate Report' the website creates the file, which your script is then able to download. The website probably removes these generated reports after a while. What you need to do is modify your script so that it first submits the form, then extracts the URL of the generated report, and finally downloads it. Commented Jan 1, 2016 at 18:49
  • @Caridorc To clarify: it used to work about 5 days ago, using Python 3.x. Commented Jan 1, 2016 at 18:55
  • @vrs Ok, I think I get what you mean. I'll have to request the data before running the script that I made. Sorry, I'm fairly new to this but which module/command will allow me to request from said website? Commented Jan 1, 2016 at 19:29
  • @slsilv Go ahead with urllib or requests; they both have methods to make GET and POST requests. You may then also use BeautifulSoup to get the URL of the CSV file. Google for it. StackOverflow is full of threads about these modules. Commented Jan 1, 2016 at 20:53

1 Answer

Most likely those files are only temporarily stored on that webserver after you click 'Generate Report'.

In order to generate new ones, there might even be a check (in JavaScript, or using cookies or a session ID) to see whether the new link/file is being requested by a human or a bot.

You might also want to check the HTTP return code (or even the full returned headers to see what exactly the server is answering).
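
One way to inspect the status code and headers, as a minimal sketch using only the standard library. The function name probe and the opener parameter are made up for illustration; the opener argument is only there so the function can be exercised without a real server.

```python
import urllib.error
import urllib.request


def probe(url, opener=urllib.request.urlopen):
    """Return (status_code, headers_dict) for `url`, even on HTTP errors."""
    try:
        with opener(url) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the exception
        # still carries the status code and headers the server sent.
        return err.code, dict(err.headers.items())
```

Calling probe on the CSV URL before and after clicking 'Generate Report' should show whether the server answers differently in the two cases, for example with a different status code or Content-Type.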

1 Comment

Thanks for the input. I think you helped me figure it out. I checked what the link was returning and found an HTML link that essentially makes the request. With some quick manipulations I think I'll be able to figure this out.
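
Pulling a link out of the returned HTML might look something like the sketch below. It uses the standard library's html.parser rather than the BeautifulSoup suggested in the comments, purely to stay dependency-free. The names LinkCollector and find_csv_links are hypothetical, and it assumes the page exposes the file as an ordinary a href link ending in .csv, which the thread does not confirm.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_csv_links(html, base_url):
    """Return absolute URLs of links in `html` whose href ends in .csv."""
    collector = LinkCollector()
    collector.feed(html)
    return [
        urljoin(base_url, href)  # resolve relative links against the page URL
        for href in collector.links
        if href.lower().endswith(".csv")
    ]
```

The html argument would be the decoded body of the page the server returns, and base_url the address that page was fetched from.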
