I would like to read CSV files that live in different folders under the same directory. The folders are in my working directory and are named:

folder1, folder2, folder3

Each of them contains CSV files with identical names: csv1.csv, csv2.csv.

I tried this code:

import os
import re
import pandas as pd
from pandas.core.frame import DataFrame

rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)

        if "csv1.csv" == fname:
            var= pd.read_csv(fname)

I can print the name of the CSV file in each folder, but I get an error: IOError: File csv1.csv does not exist
What could be the problem?

  • It looks like you are matching fname against 'csv1.csv' but not passing the file's path to read_csv. You probably need something like pd.read_csv(os.path.join(path, fname)). Commented Sep 2, 2015 at 21:23
  • The path of the whole folder or the path of that CSV file? Commented Sep 2, 2015 at 21:24
  • If your working directory is /home/user and your files are located in /home/user/dir1, /home/user/dir2, ..., you need at least the relative path of the file, e.g. dir2/csv1.csv. Commented Sep 2, 2015 at 21:29
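
A minimal sketch of the point made in these comments, assuming the folder layout from the question: os.walk yields bare file names alongside the directory that contains them, so each name has to be joined with its directory before it can be opened. The helper name find_files is hypothetical and not part of the original post.

```python
import os


def find_files(root_dir, target_name):
    """Return the paths of every file called target_name under root_dir."""
    matches = []
    for dir_name, _subdirs, file_list in os.walk(root_dir):
        for fname in file_list:
            if fname == target_name:
                # fname is only the bare name; dir_name supplies the folder part
                matches.append(os.path.join(dir_name, fname))
    return sorted(matches)
```

With the layout from the question, find_files('.', 'csv1.csv') should return one path per folder, each of which can be passed to pd.read_csv from the working directory.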

1 Answer

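As the comments point out, read_csv needs the file's path, not just its name. os.walk already yields each directory's path (dirName) prefixed with rootDir, so joining dirName and fname is enough.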

import os
import pandas as pd

rootDir = '.'
for dirName, subdirList, fileList in os.walk(rootDir, topdown=False):
    print('Found directory: %s' % dirName)
    for fname in fileList:
        print('\t%s' % fname)
        # dirName already starts with rootDir, so join only dirName and fname
        filepath = os.path.join(dirName, fname)
        if "csv1.csv" == fname:
            var = pd.read_csv(filepath)
            print(var.head())

os.path.join(path, *paths):

Join one or more path components intelligently. The return value is the concatenation of path and any members of *paths with exactly one directory separator (os.sep) following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component.

On Windows, the drive letter is not reset when an absolute path component (e.g., r'\foo') is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, os.path.join("c:", "foo") represents a path relative to the current directory on drive C: (c:foo), not c:\foo.
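
The rules quoted above can be checked with a short sketch. It uses posixpath and ntpath, the POSIX and Windows implementations behind os.path, so the results are the same on any platform; the example paths are made up for illustration.

```python
import ntpath      # Windows flavour of os.path, importable on any platform
import posixpath   # POSIX flavour of os.path, importable on any platform

# Exactly one separator is placed between non-empty parts.
relative = posixpath.join(".", "folder1", "csv1.csv")

# An empty last part makes the result end in a separator.
trailing = posixpath.join("folder1", "")

# An absolute component discards everything before it.
reset = posixpath.join("folder1", "/data", "csv1.csv")

# On Windows, a rooted component without a drive keeps the earlier drive letter.
kept_drive = ntpath.join("c:\\data", "\\foo")

# "c:" followed by "foo" is relative to the current directory on drive C:.
drive_relative = ntpath.join("c:", "foo")

print(relative)        # ./folder1/csv1.csv
print(trailing)        # folder1/
print(reset)           # /data/csv1.csv
print(kept_drive)      # c:\foo
print(drive_relative)  # c:foo
```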

3 Comments

Thank you very much. I have a question though: it seems to read only about 7000 rows, but each of my CSV files contains around 50000 rows. What is the issue here?
Actually, it was fine; I asked the wrong question. Since var only keeps a copy of the last CSV file, what I actually wanted was to concatenate the columns of every CSV read from the directory; that is, var should contain the columns of both CSV files.
You can create an empty DataFrame and append one column of data from each CSV, as in stackoverflow.com/questions/32352211/sorting-by-groups/…, but you have to change the code for adding a column: df = pd.DataFrame() and df = df.append(g['code'].tolist(), ignore_index=True)
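
For the follow-up in these comments (keeping the columns of every csv1.csv rather than only the last one read), one possible approach is to collect the frames and combine them with pd.concat(..., axis=1). DataFrame.append was removed in pandas 2.0, so this sketch uses pd.concat instead. The helper name collect_csvs and the choice to key each frame by its folder name are assumptions, not something from the original thread.

```python
import os

import pandas as pd


def collect_csvs(root_dir, target_name="csv1.csv"):
    """Read every file called target_name under root_dir and join them column-wise."""
    frames = {}
    for dir_name, _subdirs, file_list in os.walk(root_dir):
        if target_name in file_list:
            # Key each frame by its folder so identically named columns stay distinguishable.
            key = os.path.relpath(dir_name, root_dir)
            frames[key] = pd.read_csv(os.path.join(dir_name, target_name))
    if not frames:
        return pd.DataFrame()
    # axis=1 places the frames side by side; rows are aligned on the index.
    return pd.concat(frames, axis=1)
```

The result has two-level column labels (folder, column), so combined["folder1"] would give back the columns read from folder1/csv1.csv. If the files have different row counts, the shorter ones are padded with NaN.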
