Multiple specific text files into CSV in python

Question

I have many text files in a very specific format that need to be read into a csv. I can not seem to figure out how to get all the data in the format that I want in my csv. I can get the file name and the header for the sheet, but none of the data is active in the sheet. The text file is a s follows:

"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"

My code:

import os
import csv

dirpath = 'C:\Usersnput\\'
output = 'C:\Users\gputew Microsoft Excel Worksheet.csv'
with open(output, 'w') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
files = os.listdir(dirpath)

for filename in files:
    with open(dirpath + '/' + filename) as afile:
        csvout.writerow([filename, afile.read()])
        afile.close()

outfile.close()

Need to have spreadsheet with headers;market, mileStoneUpdates, woName, buildstage, designType, woOverwrite with the cells filled with billings...ect from each text file.

So in the whole text file, there is only one value for each header, is that correct? — kushy
– kushy, Commented Jan 4, 2019 at 13:00
But apparently, you have multiple text files in that format and you want to merge the contents of all text files into a single csv, but using the headers only once, right? What about the order of the lines in a text file? Is it always the same or can it change? — kushy
– kushy, Commented Jan 4, 2019 at 13:02
The order does not change, but your solutions working. Thank you — Gregory Putman
– Gregory Putman, Commented Jan 4, 2019 at 15:28

kushy · Accepted Answer · 2019-01-04 13:15:46Z

2

As a general advice: The pandas library is pretty useful for things like this. If I understood your problem correctly, this should basically do it:

import os
import pandas as pd

dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
csvout = pd.DataFrame()

for filename in files:
    data = pd.read_csv(filename, sep=':', index_col=0, header=None).T
        csvout = csvout.append(data)

csvout.to_csv(output)

For explanation of the code, see this question/answer which explains how to read a transposed text file with pandas.

answered Jan 4, 2019 at 13:15

kushy

3613 silver badges15 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

Serge Ballesta · Accepted Answer · 2019-01-04 13:33:17Z

You can use the csv module to parse your input file into a dict and a DictWriter to write it back:

import os
import csv

dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft Excel Worksheet.csv'
with open(output, 'w', newline='') as outfile:
    csvout = csv.DictWriter(outfile, fieldnames =
                ['market','mileStoneUpdates','woName',
                 'buildStage','designType','woOverwrite'])
    csvout.writeheader()
    files = os.listdir(dirpath)

    for filename in files:
        with open(dirpath + '/' + filename) as afile:
            csvin = csv.reader(afile, delimiter=':')
            csvout.writerow({ row[0]: row[1] for row in csvin})

quacodas · Accepted Answer · 2019-01-04 13:38:18Z

First, a remark about the "with ... as" syntax: This is designed to do all the work about opening and closing files for you, so when you leave the "with ... as" block, your file will automatically be closed. So your line "afile.close" is unnecessary. Also, you won't be able to edit your output file later on because it has already been closed. So bear that in mind.

If you are looking for a solution that doesn't require any additional libraries (depends on how often you do things like this), this should work, if all your files are in exactly the same format:

import os
import csv

dirpath = 'C:\Users\gputman\Desktop\Control_File_Tracker\Input\\'
output = 'C:\Users\gputman\Desktop\Control_File_Tracker\Output\New Microsoft 
Excel Worksheet.csv'
outfile = open(output, 'w')
csvout = csv.writer(outfile)
csvout.writerow(['market','mileStoneUpdates','woName','buildStage','designType','woOverwrite'])
files = os.listdir(dirpath)

for filename in files:
    with open(dirpath + '/' + filename) as afile:
        row=[] # list of values we will be constructing
        for line in afile: # loops through the lines in the file one by one
            value = line.split(':')[1].strip('" \n') # I will be explaining this later
            row.append(value) # adds the retrieved value to our row
        csvout.writerow(row)

outfile.close()

Now let's see what happens in the value = ... row: line.split(':') makes a list of strings that are separated by ':'. So '"market":"billing, MI"\n' becomes ['"market"','"billing, MI"\n'] [1] takes the second item of the list (remember, Python is zero-indexed), because we already know the first item (It's the name of the field). .strip(' "\n') removes the specified characters (double-quote, space or newline) from the beginning and end of the string. In a way, it "cleans up" the string so that only the actual value is left over.

Adam Puza · Accepted Answer · 2019-01-04 13:44:53Z

There are few changes needed:

All operations on file need to be inside with clause, and closing is not necessary there.
Then you need collect data from file.

Easiest solution will be:

import os
import csv
from collections import OrderedDict

HEADERS = ['market', 'mileStoneUpdates', 'woName', 'buildStage', 'designType', 'woOverwrite']

dirpath = '/tmp/input'
output = '/tmp/output'
with open(output, 'w') as outfile:
    csvout = csv.writer(outfile)
    csvout.writerow(HEADERS)
    files = os.listdir(dirpath)

    for filename in files:
        with open(dirpath + '/' + filename) as afile:
            data = OrderedDict.fromkeys(HEADERS, "")
            for line in afile:
                for header in HEADERS:
                    if line.startswith('"{}"'.format(header)):
                        value = line.split('"{}":"'.format(header)).pop()
                        value = value[:-2]
                        data[header] = value
            csvout.writerow(data.values())
            afile.close()

    outfile.close()

For given input files:

"market":"billing, MI"
"mileStoneUpdates":"N"
"woName":"Dsca_55354_55as0"
"buildStage":"CPD"
"designType":"Core"
"woOverwrite":"Y"

"market":"billing, MI2"
"mileStoneUpdates":"N2"
"woName":"Dsca_55354_55as02"
"buildStage":"CPD2"
"designType":"Cor2e"
"woOverwrite":"Y2"

Will produce:

market,mileStoneUpdates,woName,buildStage,designType,woOverwrite
"billing, MI",N,Dsca_55354_55as0,CPD,Core,Y
"billing, MI2",N2,Dsca_55354_55as02,CPD2,Cor2e,Y2

Note: if data in files is more complicated, use regexp instead of simple string split.

Collectives™ on Stack Overflow

Multiple specific text files into CSV in python

4 Answers 4

Comments

Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

Comments

Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related