
I have read multiple answers, but none has worked in my case so far. I want to read multiple csv files (which may not be in the same directory as my Python file) without specifying their names, as I may have to read thousands of such files. I want to do something like the last example in this, but I am not sure how to add my desktop path.

I tried the following, as given in the link:

import os
import pandas as pd

# Assign path. The folder "Healthy" contains all the csv files
path, dirs, files = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))
file_count = len(files)

# create empty list
dataframes_list = []

# append datasets to the list
for i in range(file_count):
    temp_df = pd.read_csv("./csv/" + files[i])
    dataframes_list.append(temp_df)

However, I got the following error: "FileNotFoundError: [Errno 2] No such file or directory". I am using macOS. Can someone please help? Thank you!

3 Comments
  • You are using os.walk ...; can csv files exist in subdirectories of this directory?
  • I can import them, but I wanted an algorithm to read them from any folder that I want.
  • Was that an answer to my question?

4 Answers


In your example, path is the directory that contains each entry in files, so you can do:

temp_df = pd.read_csv(os.path.join(path, files[i]))

But we really wouldn't do it this way. If the directory doesn't exist (or can't be read), next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy")) raises a StopIteration error that you don't handle. I think it would be more natural to use os.listdir, glob.glob, or even pathlib.Path. Since pathlib keeps track of the root for you, a good choice is:

from pathlib import Path 
import pandas as pd

healthy = Path("/Users/my_name/Desktop/All hypnograms/Healthy")
dataframes_list = [pd.read_csv(file) for file in healthy.iterdir()
    if file.is_file()]
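
For comparison, a minimal sketch of the same collection step with glob.glob; the path is the one from the question, and the *.csv pattern is an assumption (it only matches files with that extension):

import glob
import os
import pandas as pd

# match only names ending in .csv directly inside the Healthy folder
pattern = os.path.join("/Users/my_name/Desktop/All hypnograms/Healthy", "*.csv")
dataframes_list = [pd.read_csv(f) for f in glob.glob(pattern)]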

Many pandas errors inherit from ValueError. If you have problems with some files, you can put the read into an exception handler to find out which files are in error:

dataframes_list = []
error_files = []

for file in healthy.iterdir():
    if file.is_file():
        try:
            dataframes_list.append(pd.read_csv(file, skiprows=18))
        except ValueError as e:
            error_files.append(file)
            print(f"{file}: {e}")

6 Comments

It's giving me the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
That means that the file doesn't use the same encoding as the default selected by pandas. Do you know the encoding? You could try encoding="cp1252" or encoding="latin-1" in the read_csv call.
encoding="cp1252" worked! But in my read_csv(), I need to also skip the first 18 rows of each csv file. If in your code, I write pd.read_csv(file, encoding="cp1252",skiprows=18) it gives me the error "EmptyDataError: No columns to parse from file". Do you know if I can achieve this here?
Sounds like there were only 18 lines or perhaps the first remaining line was an empty line.
I added an example that would at least tell you which files fail. This code isn't checking whether each file is really a csv (it doesn't even check the file extension), so there could be non-csv files in there.
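
Putting the suggestions from this comment thread together, a sketch (not the answer's exact code) that applies both options and reports the files that fail; encoding="cp1252" is what worked for the asker's files and is an assumption for anyone else's data:

from pathlib import Path
import pandas as pd

healthy = Path("/Users/my_name/Desktop/All hypnograms/Healthy")
dataframes_list = []

for file in healthy.iterdir():
    if file.is_file():
        try:
            # cp1252 worked for the asker's files; skiprows=18 drops the header block
            dataframes_list.append(pd.read_csv(file, encoding="cp1252", skiprows=18))
        except ValueError as e:
            # UnicodeDecodeError and EmptyDataError are both ValueError subclasses
            print(f"{file}: {e}")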

You can use pathlib to do that easily:

import pandas as pd
import pathlib

DATA_DIR = pathlib.Path.home() / 'Desktop' / 'All hypnograms' / 'Healthy' / 'csv'

dataframes_list = []
for csvfile in DATA_DIR.glob('**/*.csv'):
    temp_df = pd.read_csv(csvfile)
    dataframes_list.append(temp_df)

5 Comments

When I run dataframes_list, it's giving me an empty list [].
Does the DATA_DIR path exist? Just try print(list(DATA_DIR.glob('**/*.*'))). Is the output correct?
It's giving me an empty list []
OK, so simply use print(list(pathlib.Path("/Users/my_name/Desktop/All hypnograms/Healthy").glob('**/*.*')))
You have to fix the path according to the location of your data.
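
Condensing the debugging steps from this comment thread into one sketch (the path is the one from the question):

import pathlib

DATA_DIR = pathlib.Path("/Users/my_name/Desktop/All hypnograms/Healthy")
print(DATA_DIR.exists())              # False: the path itself is wrong
print(list(DATA_DIR.glob('**/*.*')))  # []: the path is right but matched no files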

I guess you should specify the whole path in the read_csv method by prepending the path variable to the concatenated string. Something like:

for i in range(file_count):
    temp_df = pd.read_csv(path + "/csv/" + files[i])
    dataframes_list.append(temp_df)

You can drop the "/csv/" part if your CSV files are directly in the Healthy directory; note that path + files[i] alone would be missing a separator, so use path + "/" + files[i] or, better, os.path.join.
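
A minimal sketch of that variant with os.path.join, assuming (as in the question) that the csv files sit directly in the Healthy folder:

import os
import pandas as pd

# path and files as discovered by the question's os.walk call
path, dirs, files = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))

dataframes_list = []
for name in files:
    # os.path.join inserts the separator for us
    dataframes_list.append(pd.read_csv(os.path.join(path, name)))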

2 Comments

It doesn't work; it is giving me the same error.
Which folder contains all your csv files? Can you provide a screenshot of it?

Assuming you do indeed want to filter the files list by excluding non-.csv files before using the pandas read_csv method:

Proposed code to execute:

Since you do not provide data to work with, I deliberately commented out pd.read_csv, but you would use pd.read_csv(os.path.join(path, f)) in real code.

import os
from pathlib import Path

# Let us suppose path and files have the following values
path = '/home/Motors'
files = ['engine.html', 'engine.csv']

dataframes_list = []

for f in files:
    # Path.suffix avoids an IndexError on names without an extension
    if Path(f).suffix == '.csv':
        # temp_df = pd.read_csv(os.path.join(path, f))
        temp_df = os.path.join(path, f)
        dataframes_list.append(temp_df)
print(dataframes_list)

Result :

['/home/Motors/engine.csv']

To answer S C's comment:

What you should do is, as a first step, create an iterator containing all the names, and then read it in chunks to get short lists of names to process.

filenames = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M']

def iterchunks(filenames, n):
    for i in range(0, len(filenames), n):
        yield filenames[i:i + n]

chk = iterchunks(filenames, n=2)

print(next(chk))       
# ['A', 'B']

print(next(chk))       
# ['C', 'D']
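
A hedged usage sketch tying the chunks back to the question: iterchunks is the generator above, while the os.walk call and the n=100 chunk size are assumptions carried over from the question.

import os
import pandas as pd

# discover the file names as in the question
path, _, filenames = next(os.walk("/Users/my_name/Desktop/All hypnograms/Healthy"))

for chunk in iterchunks(filenames, n=100):
    # read one short batch of files at a time
    dfs = [pd.read_csv(os.path.join(path, f)) for f in chunk]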

1 Comment

I do not want to write the names of the files. There are hundreds of csv files that I need to read. Writing all the names of all the files isn't feasible, as mentioned in my question.
