
I have read multiple answers, but none has worked in my case so far. I want to read multiple csv files (which may not be in the same directory as my Python file) without specifying their names, as I may have to read thousands of such files. I want to do something like the last example in this, but I am not sure how to add my desktop path.

I tried the following, as given in the link:

import os
import pandas as pd

# Assign path. The folder "Healthy" contains all the csv files
path, dirs, files = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))
file_count = len(files)

# create empty list
dataframes_list = []

# append datasets to the list
for i in range(file_count):
    temp_df = pd.read_csv("./csv/" + files[i])
    dataframes_list.append(temp_df)

However, I got the following error: "FileNotFoundError: [Errno 2] No such file or directory". I am using macOS. Can someone please help? Thank you!

3 Comments
  • You are using os.walk ...; can csv files exist in subdirectories of this directory?
  • I can import them, but I wanted an algorithm to read them from any folder that I want.
  • Was that an answer to my question?

4 Answers


In your example, path is the directory that contains each entry in files, so you can do:

temp_df = pd.read_csv(os.path.join(path, files[i]))

But we really wouldn't do it this way. If the directory doesn't exist (or can't be read), next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy")) raises a StopIteration error that you don't handle. I think it would be more natural to use os.listdir, glob.glob, or even pathlib.Path. Since pathlib keeps track of the root for you, a good choice is:

from pathlib import Path 
import pandas as pd

healthy = Path("/Users/my_name/Desktop/All hypnograms/Healthy")
dataframes_list = [pd.read_csv(file) for file in healthy.iterdir()
    if file.is_file()]
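
For comparison, a minimal sketch of the same collection step with glob.glob; the path is the one from the question, and the *.csv pattern is an assumption (it only matches files with that extension):

import glob
import os
import pandas as pd

# match only names ending in .csv directly inside the Healthy folder
pattern = os.path.join("/Users/my_name/Desktop/All hypnograms/Healthy", "*.csv")
dataframes_list = [pd.read_csv(f) for f in glob.glob(pattern)]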

Many pandas errors inherit from ValueError. If you have problems with some files, you can put the read into an exception handler to find out which files are in error:

dataframes_list = []
error_files = []

for file in healthy.iterdir():
    if file.is_file():
        try:
            dataframes_list.append(pd.read_csv(file, skiprows=18))
        except ValueError as e:
            error_files.append(file)
            print(f"{file}: {e}")

6 Comments

It's giving me the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
That means that the file doesn't use the same encoding as the default selected by pandas. Do you know the encoding? You could try encoding="cp1252" or encoding="latin-1" in the read_csv call.
encoding="cp1252" worked! But in my read_csv(), I need to also skip the first 18 rows of each csv file. If in your code, I write pd.read_csv(file, encoding="cp1252",skiprows=18) it gives me the error "EmptyDataError: No columns to parse from file". Do you know if I can achieve this here?
Sounds like there were only 18 lines or perhaps the first remaining line was an empty line.
I added an example that would at least tell you which files fail. This code isn't checking whether each file is really a csv (it doesn't even check the file extension), so there could be non-csv files in there.
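
Putting the suggestions from this comment thread together, a sketch (not the answer's exact code) that applies both options and reports the files that fail; encoding="cp1252" is what worked for the asker's files and is an assumption for anyone else's data:

from pathlib import Path
import pandas as pd

healthy = Path("/Users/my_name/Desktop/All hypnograms/Healthy")
dataframes_list = []

for file in healthy.iterdir():
    if file.is_file():
        try:
            # cp1252 worked for the asker's files; skiprows=18 drops the header block
            dataframes_list.append(pd.read_csv(file, encoding="cp1252", skiprows=18))
        except ValueError as e:
            # UnicodeDecodeError and EmptyDataError are both ValueError subclasses
            print(f"{file}: {e}")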

You can use pathlib to do that easily:

import pandas as pd
import pathlib

DATA_DIR = pathlib.Path.home() / 'Desktop' / 'All hypnograms' / 'Healthy' / 'csv'

dataframes_list = []
for csvfile in DATA_DIR.glob('**/*.csv'):
    temp_df = pd.read_csv(csvfile)
    dataframes_list.append(temp_df)

5 Comments

When I run dataframes_list, it's giving me an empty list [].
Does the DATA_DIR path exist? Just try print(list(DATA_DIR.glob('**/*.*'))). Is the output correct?
It's giving me an empty list []
OK, so simply use print(list(pathlib.Path("/Users/my_name/Desktop/All hypnograms/Healthy").glob('**/*.*')))
You have to fix the path according to the location of your data.
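
Condensing the debugging steps from this comment thread into one sketch (the path is the one from the question):

import pathlib

DATA_DIR = pathlib.Path("/Users/my_name/Desktop/All hypnograms/Healthy")
print(DATA_DIR.exists())              # False: the path itself is wrong
print(list(DATA_DIR.glob('**/*.*')))  # []: the path is right but matched no files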

I guess you should specify the whole path in the read_csv method by prepending the path variable to the concatenated string. Something like:

for i in range(file_count):
    temp_df = pd.read_csv(path + "/csv/" + files[i])
    dataframes_list.append(temp_df)

You can drop the "/csv/" part if your CSV files are directly in the Healthy directory; note that path + files[i] alone would be missing a separator, so use path + "/" + files[i] or, better, os.path.join.
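
A minimal sketch of that variant with os.path.join, assuming (as in the question) that the csv files sit directly in the Healthy folder:

import os
import pandas as pd

# path and files as discovered by the question's os.walk call
path, dirs, files = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))

dataframes_list = []
for name in files:
    # os.path.join inserts the separator for us
    dataframes_list.append(pd.read_csv(os.path.join(path, name)))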

2 Comments

It doesn't work; it is giving me the same error.
Which folder contains all your csv files? Can you provide a screenshot of it?

Assuming you do indeed want to filter the files list by excluding non-.csv files before using the pandas read_csv method:

Proposed code to execute:

Since you do not provide data to work with, I deliberately commented out pd.read_csv, but you would use pd.read_csv(os.path.join(path, f)) in real code.

import os
from pathlib import Path

# Let us suppose path and files have the following values
path = '/home/Motors'
files = ['engine.html', 'engine.csv']

dataframes_list = []

for f in files:
    # Path.suffix avoids an IndexError on names without an extension
    if Path(f).suffix == '.csv':
        # temp_df = pd.read_csv(os.path.join(path, f))
        temp_df = os.path.join(path, f)
        dataframes_list.append(temp_df)
print(dataframes_list)

Result :

['/home/Motors/engine.csv']

To answer S C's comment:

What you should do is, as a first step, create an iterator containing all the names, and then read it in chunks to get short lists of names to process.

filenames = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']

def iterchunks(filenames, n):
    for i in range(0, len(filenames), n):
        yield filenames[i:i + n]

chk = iterchunks(filenames, n=2)

print(next(chk))       
# ['A', 'B']

print(next(chk))       
# ['C', 'D']
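
A hedged usage sketch tying the chunks back to the question: iterchunks is the generator above, while the os.walk call and the n=100 chunk size are assumptions carried over from the question.

import os
import pandas as pd

# discover the file names as in the question
path, _, filenames = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))

for chunk in iterchunks(filenames, n=100):
    # read one short batch of files at a time
    dfs = [pd.read_csv(os.path.join(path, f)) for f in chunk]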

1 Comment

I do not want to write the names of the files. There are hundreds of csv files that I need to read. Writing all the names of all the files isn't feasible, as mentioned in my question.
