1

I'm working on a Python script that should merge some columns of some CSV files (a lot, something around 200 files). All the files look like:

Timestamp; ...; ...; ...; Value; ...
date1;...;...;...; FirstValue;...
date2;...;...;...; SecondValue;...

and so on.

From the first file I want to extract the timestamp and the column Value. From the other files I need only the column Values.

My script for now is:

#!/usr/bin/python
import csv
import os, sys

# Open a file
path = "Z:/myfolder"
dirs = os.listdir( path )
#Conto il numero di file nella cartella
print len(dirs)
#Assegno il nome del primo file
file = dirs[0]

#Apro il primo file per la lettura di timestamp e primo valore (Value)
primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
timestamp, firstValue = [], []
#Per ogni riga del primofile
for row in primofile:
    #Copio timestamp
    timestamp.append(row[2])
    #e Value
    firstValue.append(row[15])

with open("provacript.csv", 'wb') as f:
    writer = csv.writer(f, delimiter=';')
    i = 0
    while i < len(timestamp):
        writer.writerow([timestamp[i]] + [firstValue[i]])
        i = i+1

So in "provascript.csv" I have the timestamp and the first column with my values from the first file. The next step is to open, one by one, the files in the list "dirs", read the column "Values" (the 15th column), save this column in an array and write it in "provascript.csv".

My code is:

for file in dirs:
data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
column = []
for row in data:
    column.append(row[15])

In the array "column" I should have the values. I have to add this values in a new column in "provascript.csv" and move on doing the same thing with all the files. How can I do that?

I would like to have something like

TimestampFromFirstFile;ValueFromFirstFile;ValueFromSecondFile;ValueFromThirdFile;...
date1;value;value,value;...
date2;value;value;value;...
date3;value;value;value;...

So far so good. I fixed it (thanks), but instead of reading and writing Value in the first row I would like to write a part of the name. Instead of having Timestamp;Value;Value;Value I would prefer Timestamp;Temperature1;Temperature2;Presence1;Presence2.

How can I do it?

2
  • 1
    Show us a sample of the desired output. Commented Aug 6, 2014 at 11:24
  • Can you provide the input as minimal example (like 3 rows, and 3 columns), and the corresponding output (don't show ... if they do not exist, and if value is actually an int, write 5 for instance.) Commented Aug 6, 2014 at 11:33

2 Answers 2

3

I should create the full structure and finally i will save it in the output file (assuming that files are ordered between them)

#create the full structure:  output_rows
primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
output_rows = []
for row in primofile:
    output_rows.append([row[2], row[15]])

Once we have an ordered list of lists, complete them with the other files

for file in dirs:
    data = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|')
    column = []
    for idx,row in enumerate(data):
        output_rows[idx].append(row[15])

Finally save it to a file

with open("output.csv", 'wb') as f:
    writer = csv.writer(f, delimiter=';')
    for row in output_rows:
        writer.writerow(row)
Sign up to request clarification or add additional context in comments.

6 Comments

Traceback (most recent call last): File "C:/Users/ButoeruG/Desktop/xelena/CsvMerger.py", line 14, in <module> primofile = csv.reader(open(file, 'rb'), delimiter=";", quotechar='|') IOError: [Errno 2] No such file or directory: 'Scheduled_Archive_(20140804-220703)_(Server_1_TrendLogExtended_PowerV1FreddaC-Ext).csv' The list of the files is dinamic and generated by the system. The file shoud exist. Am I wrong?
maybe your data files and your python file aren't in the same directory?
Ok, my bad. Now the error changed in: File "C:/Users/ButoeruG/Desktop/xelena/prova/CsvMerger.py", line 19, in <module> output_rows.append([row[2], row[15]]) IndexError: list index out of range
seems that csv primofile hasn't 15 rows, could it be? Remember index is zero based
It looks like the python file is trying to open itself as a csv file. I solved using glob instead of listdir
|
2

You can do it with Pandas :

file1 = pd.read_csv("file1", index_col=0, sep=";", skipinitialspace=1)
file2 = pd.read_csv("file2", index_col=0, sep=";", skipinitialspace=1)
file3 = pd.read_csv("file3", index_col=0, sep=";", skipinitialspace=1)

here, you have plenty of options, notably to parse date while reading your csv.

file 1 being :

           ...  ....1 ....2      Value  ....3
Timestamp                                    
date1      ...   ...   ...   FirstValue   ...
date2      ...   ...   ...  SecondValue   ...

f1 = pd.DataFrame(file1.Value)
f2 = pd.DataFrame(file2.Value)
f3 = pd.DataFrame(file3.Value)

f2

          Value
Timestamp      
date1       AAA
date2       BBB

f3
           Value
Timestamp       
date1        456
date2        123

Then you define a function for recursive merge :

def recursive_merge(list_df):

    suffixe = range(1,len(list_df)+1)
    merged  = list_df[0]
    for i in range(1,len(list_df)):
        merged = merged.merge(list_df[i], left_index=True, right_index=True,
                                          suffixes=('_%s' %suffixe[i-1], '_%s' %suffixe[i]))
    if len(list_df)%2 !=0 : 
        merged.rename(
            columns = {'Value':"Value_%s" %suffixe[i]}, 
            inplace = True) # if number of recursive merge is odd 
    return merged

and call :

recursive_merge([f1,f2,f3])

Output :

               Value_1 Value_2  Value_3
Timestamp                              
date1       FirstValue     AAA      456
date2      SecondValue     BBB      123

And then you can easily write that dataframe with :

recursive_merge([f1,f2,f3]).to_csv("output.csv")

Of course if you have more than 3 files, you can make for-loops and or functions to open files and end up with a list like [f1,f2,f3,...f200]

Hope this helps

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.