Reading multiple CSV files from multiple folders into pandas DataFrame

Question

I would like to go through CSV files contained in different folders inside my working directory. The folders are named:

folder1, folder2, folder3

Each of them contains CSV files with identical names: csv1.csv, csv2.csv.

I tried this code:

import os
import re
import pandas as pd
from pandas.core.frame import DataFrame

rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)

        if "csv1.csv" == fname:
            var= pd.read_csv(fname)

I can print the name of the CSV file in each folder, but I get an error: IOError: File csv1.csv does not exist
What could be the problem?

It looks like fname equals 'csv1.csv', so you're not passing the path of the file to read_csv. You probably need something like pd.read_csv(os.path.join(path, fname)).
– postelrich, Commented Sep 2, 2015 at 21:23
If your working directory is /home/user and your files are located in /home/user/dir1, /home/user/dir2, ..., you need at least the relative path of the file, e.g. dir2/csv1.csv.
– postelrich, Commented Sep 2, 2015 at 21:29
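
The point made in these comments can be checked with a small standard-library sketch. It builds a throwaway tree whose folder and file names mirror the question; the temporary directory and the helper `find_csv` are assumptions made for illustration, not part of os or pandas. It shows that os.walk yields bare file names, which only resolve once they are joined with the directory that os.walk reports.

```python
import os
import tempfile


def find_csv(root, target="csv1.csv"):
    """Return the full paths of every file named `target` below `root`.

    Hypothetical helper for this sketch; not part of os or pandas.
    """
    matches = []
    for dir_name, _subdirs, file_names in os.walk(root):
        for fname in file_names:
            if fname == target:
                # fname is only the file name; dir_name carries the folder part.
                matches.append(os.path.join(dir_name, fname))
    return sorted(matches)


# Build a small tree that mirrors the layout described in the question.
root = tempfile.mkdtemp()
for folder in ("folder1", "folder2"):
    os.makedirs(os.path.join(root, folder))
    with open(os.path.join(root, folder, "csv1.csv"), "w") as handle:
        handle.write("a,b\n1,2\n")

paths = find_csv(root)
for path in paths:
    print(path, os.path.exists(path))
```

Passing only `fname` to a reader would look for csv1.csv in the current working directory, which is why the question's code raises the "does not exist" error.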

Community · Accepted Answer · 2020-06-20 09:12:55Z


As noted in the comments, you have to join dirName and fname. The dirName values yielded by os.walk already start with rootDir, so rootDir must not be joined a second time (with a relative rootDir other than '.', doing so would duplicate the prefix).

import os
import pandas as pd

rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
        # fname is only the file name; dirName supplies the folder part
        filepath = os.path.join(dirName, fname)
        if "csv1.csv" == fname:
            var = pd.read_csv(filepath)
            print(var.head())

os.path.join(path, *paths):

Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.

On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
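
The rules quoted above can be tried out directly. This is a minimal sketch that uses the `posixpath` and `ntpath` modules (the POSIX and Windows implementations behind `os.path`) so the results do not depend on the machine it runs on; the paths themselves are made-up examples.

```python
import ntpath      # Windows flavour of os.path, importable on any platform
import posixpath   # POSIX flavour of os.path, importable on any platform

# Ordinary case: components are joined with one separator between them.
plain = posixpath.join(".", "folder1", "csv1.csv")

# An empty last component leaves a trailing separator.
trailing = posixpath.join("folder1", "")

# An absolute component discards everything that came before it.
absolute = posixpath.join("folder1", "/data", "csv1.csv")

# On Windows, "c:" plus "foo" stays relative to the current directory of drive C:.
drive_relative = ntpath.join("c:", "foo")

print(plain)
print(trailing)
print(absolute)
print(drive_relative)
```

The third case is the one to watch when joining paths from os.walk: if any later component is absolute, the earlier ones are silently dropped.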

answered Sep 3, 2015 at 6:33 by jezrael, edited Jun 20, 2020 at 9:12 by Community Bot

3 Comments

user3841581 Over a year ago

Thank you very much. I have a question though: it seems to read only about 7000 rows, but my data contains around 50000 rows for each CSV. What is the issue here?

user3841581 Over a year ago

Actually it was fine; I asked the wrong question. My var only keeps a copy of the last CSV file that was read. What I actually wanted was to concatenate the columns of every CSV read from the directory, so that var contains the columns of both CSV files.

jezrael Over a year ago

You can create an empty DataFrame and append one column of data from each CSV (see stackoverflow.com/questions/32352211/sorting-by-groups/…), but you have to change the code for adding the column: df = pd.DataFrame() and df = df.append(g['code'].tolist(), ignore_index=True)
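
One possible way to keep the data from every file, rather than only the last one read, is to collect the frames in a list and combine them once at the end. The sketch below is an assumption-laden illustration, not the method from the linked answer: the helper `load_all_csvs`, the added `source` column, and the demo data are all hypothetical. It uses pd.concat because DataFrame.append was removed in pandas 2.0.

```python
import os
import tempfile

import pandas as pd


def load_all_csvs(root):
    """Read every .csv file below `root` and stack the rows into one DataFrame.

    Hypothetical helper for this sketch; not part of pandas.
    """
    frames = []
    for dir_name, _subdirs, file_names in os.walk(root):
        for fname in sorted(file_names):
            if not fname.endswith(".csv"):
                continue
            path = os.path.join(dir_name, fname)
            frame = pd.read_csv(path)
            frame["source"] = path  # remember which file each row came from
            frames.append(frame)
    if not frames:
        return pd.DataFrame()
    # Combine once at the end; the default axis=0 stacks rows.
    return pd.concat(frames, ignore_index=True)


# Demo on a throwaway tree shaped like the folders in the question.
demo_root = tempfile.mkdtemp()
for folder in ("folder1", "folder2"):
    os.makedirs(os.path.join(demo_root, folder))
    for name in ("csv1.csv", "csv2.csv"):
        pd.DataFrame({"code": [1, 2]}).to_csv(
            os.path.join(demo_root, folder, name), index=False
        )

combined = load_all_csvs(demo_root)
print(combined.shape)
```

To place the files side by side as extra columns instead of stacking rows, pd.concat(frames, axis=1) aligns the frames on their index; in that case the `source` column would need a different name per file or should be dropped.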
