I am using Python 2.7 and running a regular scraping task. I would like to use a CSV to store data between scrapes.

Currently I'm reading data in from one CSV file, writing it out row-by-row to another, and then deleting and renaming the files:

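import csv
import os

import pandas as pd

# df_links, df and do_scrape() are defined elsewhere in the script.
reader = pd.read_csv('temp1.csv')
reader.set_index('id', inplace=True)
# 'wb' is the mode the csv module expects on Python 2.7.
with open('temp2.csv', 'wb') as out:
    writer = csv.DictWriter(out, fieldnames=['id', 'links'])
    writer.writeheader()
    # After set_index('id'), the index label from iterrows() is the id itself.
    for row_id, row in reader.iterrows():
        # Check if data is already in CSV, if not scrape it.
        try:
            links = df_links.ix[row_id]['links']
        except KeyError:
            links = do_scrape(row_id)
        if links:
            df.set_value(row_id, 'pubmed_links', links)
        # Write data out to new CSV file.
        writer.writerow({'id': row_id, 'links': links})
# The output file is closed (and flushed) before the files are swapped.
os.remove('temp1.csv')
os.rename('temp2.csv', 'temp1.csv')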

Is there a better way? Specifically, can I add any new data directly to the existing file, without having to create and delete files, and safely so that if the network breaks half-way through I don't lose half the file?

I know about append mode, but I'm editing existing rows, not just adding new rows.

Thanks!

  • There is no better way. Commented Sep 28, 2016 at 13:09
  • This is the way people used to do things in the 1980s. Now we use RDBMSs. Commented Sep 28, 2016 at 13:11

1 Answer

If you want the solution to stay file-based, consider SQLite (a full database server such as MySQL would arguably be a better fit here). An SQLite database is a single file on disk that you can pass around, yet it supports most database operations, including the lookup you need for the "Check if data is already in CSV" step. You can then insert or update rows in place in the SQLite database without having to create any new files.
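
As a rough sketch of that idea, using only the standard-library sqlite3 module: the table name `links`, its two columns, and the helper names (`open_store`, `get_links`, `save_links`, `update_store`) are assumptions made up for illustration, and `scrape` stands in for the question's `do_scrape`. Each row is committed in its own transaction, so rows saved before a network failure should stay in the file.

```python
import sqlite3


def open_store(path):
    """Open (or create) the SQLite file and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links (id TEXT PRIMARY KEY, links TEXT)"
    )
    conn.commit()
    return conn


def get_links(conn, item_id):
    """Return the stored links for item_id, or None if nothing is stored."""
    row = conn.execute(
        "SELECT links FROM links WHERE id = ?", (item_id,)
    ).fetchone()
    return row[0] if row else None


def save_links(conn, item_id, links):
    """Insert or overwrite one row; the with-block commits, or rolls back on error."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO links (id, links) VALUES (?, ?)",
            (item_id, links),
        )


def update_store(conn, ids, scrape):
    """Scrape only the ids that are missing, committing after each one."""
    for item_id in ids:
        if get_links(conn, item_id) is None:
            save_links(conn, item_id, scrape(item_id))
```

Usage would look something like `conn = open_store('links.db')` followed by `update_store(conn, ids, do_scrape)`. If the scrape raises half-way through, re-running the same call picks up from the first id that has no stored links.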

If an end user wants the data as a .csv, write a small utility that reads the table with pd.read_sql() and writes it out with df.to_csv().
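
A minimal sketch of such an export utility, assuming pandas is installed and the data lives in a table called `links` (the function name `export_csv` and the table name are made up for illustration):

```python
import sqlite3

import pandas as pd


def export_csv(db_path, csv_path, table="links"):
    """Dump one table from the SQLite file to a CSV file."""
    conn = sqlite3.connect(db_path)
    try:
        # A table name cannot be bound as a query parameter, so this is only
        # safe because the name comes from our own code, not from user input.
        df = pd.read_sql("SELECT * FROM {}".format(table), conn)
    finally:
        conn.close()
    # index=False keeps the pandas row index out of the CSV.
    df.to_csv(csv_path, index=False)
    return df
```

The CSV is then just a snapshot for whoever needs it; the SQLite file remains the working copy between scrapes.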
