
I'm trying to convert multiple text files into a single .csv file using Python. My current code is this:

import pandas
import glob

# Collect the names of all .txt files in the current directory.
file_names = glob.glob("./*.txt")

# [Middle step] Merge the text files into a single file, 'output_file.txt'.
with open('output_file.txt', 'w') as out_file:
    for name in file_names:
        with open(name) as in_file:
            for line in in_file:
                out_file.write(line)

# Read the merged file into a DataFrame.
data = pandas.read_csv("output_file.txt", delimiter='/')

# Write the DataFrame out as a .csv file.
data.to_csv("convert_sample.csv", index=False)

So as you can see, I'm reading from all the files and merging them into a single .txt file. Then I convert it into a single .csv file. Is there a way to accomplish this without the middle step? Is it necessary to concatenate all my .txt files into a single .txt to convert it to .csv, or is there a way to directly convert multiple .txt files to a single .csv?

Thank you very much.

  • You might want to label your "middle step" with a comment. I don't see a problem with your code, as it does everything you said you needed. Commented Jun 11, 2021 at 20:30
  • Do you know the column names ahead of time? Commented Jun 11, 2021 at 20:36
  • Yes, the column names will be known ahead of time, and are the same for all of the text files. There will be between 3 and 5 text files at a time to be converted. Commented Jun 11, 2021 at 20:37

1 Answer 1

3

Of course it is possible. And you really don't need to involve pandas here, just use the standard library csv module. If you know the column names ahead of time, the most painless way is to use csv.DictWriter and csv.DictReader objects:

import csv
import glob

column_names = ['a', 'b', 'c']  # or whatever


with open("convert_sample.csv", 'w', newline='') as target:
    writer = csv.DictWriter(target, fieldnames=column_names)
    writer.writeheader() # if you want a header
    for path in glob.glob("./*.txt"):
        with open(path, newline='') as source:
            reader = csv.DictReader(source, delimiter='/', fieldnames=column_names)
            writer.writerows(reader)
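That said, if you'd rather stay with pandas as in your original code, you can also skip the middle step there: read each .txt file directly into a DataFrame and concatenate them before writing one .csv. A minimal sketch, assuming the same '/' delimiter and known column names (the helper name `txt_to_csv` is just for illustration):

```python
import glob

import pandas as pd

def txt_to_csv(pattern, out_path, names, delimiter='/'):
    # Read each matching .txt file directly into a DataFrame, concatenate
    # them, and write a single .csv -- no merged .txt file needed.
    frames = [pd.read_csv(p, delimiter=delimiter, names=names)
              for p in sorted(glob.glob(pattern))]
    pd.concat(frames, ignore_index=True).to_csv(out_path, index=False)

# e.g. txt_to_csv("./*.txt", "convert_sample.csv", ['a', 'b', 'c'])
```

This avoids both the intermediate file and the extra disk I/O, though for plain row-by-row copying the csv module above is lighter weight.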

7 Comments

Yes! Thank you for noting that the stdlib csv module is sufficient for this. It's disturbing how often folks are willing to add pandas as a dependency solely for basic csv processing.
@MichaelRuth yeah, it really drives me up the wall.
When I try this, I get a blank row in between rows with values. Would it be something to do with the newline=''?
@thentangler are you omitting that?
No I am not. When I open in Excel I see alternate blank rows, but when I try printing in Python I see hex Unicode nulls like '\x00'. How do I decode that?
