Python for loop to read csv using pandas

Question

I can combine 2 CSV files with the script below, and it works well.

import pandas

csv1=pandas.read_csv('1.csv')
csv2=pandas.read_csv('2.csv')
merged=csv1.merge(csv2,on='field1')
merged.to_csv('output.csv',index=False)

Now I would like to combine more than 2 CSVs using the same method. I have a list of CSV files, defined something like this:

import pandas
collection=['1.csv','2.csv','3.csv','4.csv']
for i in collection:
  csv=pandas.read_csv(i)
  merged=csv.merge(??,on='field1')
  merged.to_csv('output2.csv',index=False)

I haven't got it to work so far with more than one CSV. I guess it's just a matter of iterating over the list. Any ideas?

Are you using merge for a SQL-style inner join? Or could you possibly use concat instead?
– measureallthethings, Commented Apr 10, 2015 at 13:46
Unfortunately I don't have access to the SQL DB. The data was only given to me as CSV :(
– FRizal, Commented Apr 10, 2015 at 14:10
I see. I'm just trying to understand the type of join you are doing; if it's an inner join, then sticking with merge is good, but if you can use concat, the code would be a lot simpler.
– measureallthethings, Commented Apr 10, 2015 at 14:40
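
The distinction raised in these comments can be illustrated with a small sketch. The two frames below are made-up stand-ins for the CSV contents (only the column name field1 comes from the question): merge behaves like an inner join on the key, while concat simply stacks rows.

```python
import pandas as pd

# Made-up data standing in for two of the CSV files
left = pd.DataFrame({"field1": [1, 2, 3], "a": ["x", "y", "z"]})
right = pd.DataFrame({"field1": [2, 3, 4], "b": [20, 30, 40]})

# merge: SQL-style inner join, keeps only keys present in both frames
joined = left.merge(right, on="field1")

# concat: stacks the rows of both frames; columns missing on one side become NaN
stacked = pd.concat([left, right], ignore_index=True)

print(joined)
print(stacked)
```

If the files share a key and each contributes different columns, merge is probably what is wanted; if they share the same columns and just hold more rows, concat is likely the simpler fit.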

Aaron Digulla · Accepted Answer · 2015-04-10 10:57:18Z

You need special handling for the first loop iteration:

import pandas
collection=['1.csv','2.csv','3.csv','4.csv']

result = None
for i in collection:
  csv=pandas.read_csv(i)
  if result is None:
    result = csv
  else:
    result = result.merge(csv, on='field1')

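# "if result:" would raise ValueError (a DataFrame has no single truth value), so test against None
if result is not None: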
  result.to_csv('output2.csv',index=False)

Another alternative would be to load the first CSV outside the loop but this breaks when the collection is empty:

import pandas
collection=['1.csv','2.csv','3.csv','4.csv']

result = pandas.read_csv(collection[0])
for i in collection[1:]:
  csv = pandas.read_csv(i)
  result = result.merge(csv, on='field1')

# result is always a DataFrame here, so no "if result:" guard is needed (it would raise ValueError)
result.to_csv('output2.csv',index=False)

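You could also start from an empty frame, but I don't know of a clean way to do that in pandas: a plain pandas.DataFrame() has no 'field1' column, so the first merge fails with a KeyError. Treat this one as pseudocode:

import pandas
collection=['1.csv','2.csv','3.csv','4.csv']

result = pandas.create_empty() # hypothetical; pandas has no such function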
for i in collection:
  csv = pandas.read_csv(i)
  result = result.merge(csv, on='field1')

result.to_csv('output2.csv',index=False)
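
As a more compact variant of the loops above, the same pairwise fold can be sketched with functools.reduce. This is only a sketch under assumptions: merge_all is a hypothetical helper name, and the in-memory CSV text is made up so the example runs without any files on disk; with real data you would pass the file names from collection instead.

```python
import io
from functools import reduce

import pandas as pd


def merge_all(sources, key="field1"):
    """Inner-join every CSV in `sources` on `key`; return None if `sources` is empty."""
    frames = [pd.read_csv(src) for src in sources]
    if not frames:
        return None
    # reduce() folds the list pairwise: merge(merge(f1, f2), f3), and so on
    return reduce(lambda left, right: left.merge(right, on=key), frames)


# In-memory stand-ins for 1.csv, 2.csv, 3.csv (made-up data)
csv_a = io.StringIO("field1,a\n1,x\n2,y\n3,z\n")
csv_b = io.StringIO("field1,b\n1,10\n2,20\n")
csv_c = io.StringIO("field1,c\n2,200\n3,300\n")

merged = merge_all([csv_a, csv_b, csv_c])
if merged is not None:
    print(merged.to_csv(index=False))
```

Because every step is an inner join, only keys present in all inputs survive; the empty-list case is handled explicitly by returning None rather than by testing the frame's truth value.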


3 Comments

FRizal Over a year ago

Thanks. I think it's getting nearer. I am getting this error:

Traceback (most recent call last):
  File "merge.py", line 11, in <module>
    if result.all():
  File "/Library/Python/2.7/site-packages/pandas/core/generic.py", line 709, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

Aaron Digulla Over a year ago

I don't know enough about pandas to help you there. Ask a new question and include the data record(s) which cause the error and your code.

FRizal Over a year ago

I just removed the "if result:" statement from the second sample, and it works. Apparently this ValueError is a known pandas gotcha: pandas.pydata.org/pandas-docs/version/0.15.2/gotchas.html Thanks Aaron.
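
The error discussed in these comments can be reproduced with a short sketch. The frame below is made-up data; the point is only that pandas refuses to reduce a whole DataFrame to a single True or False, so the check has to say explicitly what is being asked.

```python
import pandas as pd

df = pd.DataFrame({"field1": [1, 2], "a": ["x", "y"]})

message = None
try:
    if df:  # pandas will not guess what "truthy" means for a whole frame
        pass
except ValueError as exc:
    message = str(exc)
    print(message)

# Explicit alternatives, depending on the question being asked:
result = None
has_result = result is not None  # was anything loaded at all?
is_empty = df.empty              # does the frame have a zero-length axis?
```

In the first sample of the answer, the intent of the guard is "was at least one file read", which corresponds to the "is not None" test rather than to .empty.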
