0

I'm in the process of porting over a MySQL database to a Heroku hosted, dedicated PostgreSQL instance. I understand how to get the initial data over to Heroku. However, there is a daily "feed" of data from an external company that will need to be imported each day. It is pushed up to an FTP server and it's a zip file containing several different CSV files. Normally, I could/would just scp it over to the Postgres box and then have a cron job that does a "COPY tablename FROM path/to/file.csv" to import the data. However, using Heroku has me a bit baffled as to the best way to do this. Note: I've seen and reviewed the heroku dev article on importing data. But, this is more of a dump file. I'm just dealing with a daily import from a CSV file.

Does anyone do something similar to this on Heroku? If so, can you give any advice on what's the best way.

Just a bit more info: My application is Python/Django 1.3.3 on the Cedar stack. And my files can be a bit large. Some of them can be over 50K records. So, to loop through them and use the Django ORM is probably going to be a bit slow (but, still might be the best/only solution).

0

1 Answer 1

1

Two options:

  1. Boot up a non-heroku EC2 instance, fetch from FTP, unzip and initiate the copy from there. By making use of the COPY STDIN option (http://www.postgresql.org/docs/9.1/static/sql-copy.html) you can instruct it that the data is coming from the client connection, as opposed to from a file on the server's filesystem which you don't have access to.

  2. How large is the file? It might fit in a dyno's ephemeral filesystem, so a process or one off job can download the file from the FTP server and do the whole process from within a dyno. Once the process exits, away goes the filesystem data.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for the reply. Most files are under 1MB, but 1 daily file is approx 110MB and growing (but slowly). What's the size limit here? Thanks again.
The limit is undetermined, but quite large. It can take the 110MB+ just fine. Additionally, you can issue a \copy from the dyno referencing the temporary file (on the dyno) directly, without needing to resort to the STDIN option.
That's excellent... It will REALLY speed it up over using Django ORM.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.