I'm using Python 2.7 for a recurring scraping task, and I'd like to use a CSV file to store data between scrapes.
Currently I read the data in from one CSV file, write it out row by row to a second file, and then delete the original and rename the new file:
import csv
import os

import pandas as pd

df = pd.read_csv('temp1.csv')
df.set_index('id', inplace=True)

with open('temp2.csv', 'wb') as f:  # 'wb' because this is Python 2.7's csv module
    writer = csv.DictWriter(f, fieldnames=['id', 'links'])
    writer.writeheader()
    for row_id, row in df.iterrows():
        # Check if the data is already in the CSV; if not, scrape it.
        links = row['links']
        if pd.isnull(links):
            links = do_scrape(row_id)
            df.set_value(row_id, 'links', links)  # keep the in-memory frame in sync too
        # Write the row out to the new CSV file.
        writer.writerow({'id': row_id, 'links': links})

os.remove('temp1.csv')
os.rename('temp2.csv', 'temp1.csv')
Is there a better way? Specifically, can I write new data directly into the existing file, without having to create and delete files, and in a way that's safe, so that if the network drops halfway through I don't lose half the file?
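What I'm imagining is roughly this (just a sketch of the usage I'd like; update_csv_row is a hypothetical helper I'd have to write, not something pandas or csv actually provides):

# df is the DataFrame read from temp1.csv above; do_scrape is my scraper.
for row_id, row in df.iterrows():
    if pd.isnull(row['links']):
        links = do_scrape(row_id)
        # Hypothetical: update just this one row of temp1.csv, safely, in place.
        update_csv_row('temp1.csv', key_column='id', key=row_id, column='links', value=links)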
I know about append mode, but I'm editing existing rows, not just adding new rows.
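For example (just a sketch, with new_id and new_links as placeholder values), the best append mode gives me is something like this, which tacks a fresh row onto the end instead of updating the row that already has that id:

import csv

# Append mode only ever adds rows at the end; it can't rewrite an existing row.
# (new_id / new_links are placeholders for whatever was just scraped.)
with open('temp1.csv', 'ab') as f:  # 'ab' on Python 2.7
    writer = csv.DictWriter(f, fieldnames=['id', 'links'])
    writer.writerow({'id': new_id, 'links': new_links})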
Thanks!