extract rows and filenames from multiple csv files

Question

I have multiple csv files in a folder, named by date (20080101.csv to 20111031.csv). They all share the same header and use ";" as the delimiter. Each file looks like this:

20080101.csv  
X ;Y; Z  
1 ; 1 ; 3  
1 ; 2 ; 6  
1 ; 3 ; 24  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080102.csv   
X ;Y; Z  
1 ; 1 ; 0.1  
1 ; 2 ; 2  
1 ; 3 ; 67  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080103.csv  
X ;Y; Z  
1 ; 1 ; 3  
1 ; 3 ; 24  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080104.csv   
X ;Y; Z  
1 ; 1 ; 34  
1 ; 2 ; 23  
1 ; 3 ; 67  
2 ; 1 ; 24  
2 ; 2 ; 24

… and so on. I want to write a script that reads the rows and, whenever a row has X=1 and Y=2, copies the whole row to a new csv file along with the filename. If a file has no such row (like 20080103.csv), Z should be NA. The output should look like this:

X ; Y ; Z  ; filename
1 ; 2 ; 6  ; 20080101
1 ; 2 ; 2  ; 20080102
1 ; 2 ; NA ; 20080103
1 ; 2 ; 23 ; 20080104

Any ideas on how this can be done? Any suggestions about modules I should look into, or any examples? Thanks for your time and help.

Cheers, Navin

You are not interested in records whose (x,y) are not (1,2)? Just throw them away? – yosukesabai, Commented Nov 4, 2011 at 23:48
@Danny Character Separated Values? I'm clutching at straws with that :) – Rob Cowie, Commented Nov 5, 2011 at 0:05
I saw one divided by bars, so why not semicolons? Somebody may want to reserve commas for whatever purpose they have. – yosukesabai, Commented Nov 5, 2011 at 0:07
Hmmm... weird homework... what can I learn from this exercise? – yosukesabai, Commented Nov 5, 2011 at 0:15

Dave · Accepted Answer · 2011-11-05 00:08:54Z

4

This is a well-formed question, from which the logic should be apparent. For someone to provide finished code would defeat the purpose of the assignment. First, add a "homework" tag to the question, then think about what you want to do:
1) loop over the files (keeping track of each filename as it's opened);
2) read lines from the current file;
3) if the selection criterion (x==1 and y==2) is met, write the line.

To get started, try:

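import csv, os

for fn in sorted(os.listdir()):  # sorted, so the dates are visited in order
    if fn.endswith(".csv"):  # endswith avoids matching names like "x.csv.bak"
        with open(fn, 'r', newline='') as f:
            reader = csv.reader(f, delimiter=";")
            for row in reader:
                ...  # fn is the current filename; row is a list of field strings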

Then extend the solution to open the output file and write the selected lines using csv.writer.
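One way that extension could look, as a sketch only: the function name `collect`, the output name `output.csv`, and writing `NA` when a file has no matching row are assumptions (the `NA` rule is taken from the desired output in the question). Note that `csv.writer` with `delimiter=";"` writes `1;2;6;20080101` without the spaces shown in the question.

```python
import csv
import os


def collect(folder=".", out_name="output.csv"):
    """Write one X=1, Y=2 row per dated CSV file in `folder` to `out_name`."""
    with open(os.path.join(folder, out_name), "w", newline="") as out:
        writer = csv.writer(out, delimiter=";")
        writer.writerow(["X", "Y", "Z", "filename"])
        for fn in sorted(os.listdir(folder)):
            # Skip non-CSV files and the output file we are writing.
            if not fn.endswith(".csv") or fn == out_name:
                continue
            found = False
            with open(os.path.join(folder, fn), newline="") as f:
                reader = csv.reader(f, delimiter=";")
                next(reader, None)  # skip the "X ;Y; Z" header row
                for row in reader:
                    fields = [field.strip() for field in row]
                    if len(fields) >= 3 and fields[0] == "1" and fields[1] == "2":
                        writer.writerow(fields[:3] + [fn[:-4]])  # drop ".csv"
                        found = True
            if not found:
                # Assumption: mirror the question's "NA" for files without a match.
                writer.writerow(["1", "2", "NA", fn[:-4]])
```

Calling `collect(".")` from the folder holding the files would produce `output.csv` there.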

answered Nov 5, 2011 at 0:08

Dave


Adam Zalcman · Accepted Answer · 2011-11-05 00:18:54Z

2

This should do the job:

import glob
import os

with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    for filename in sorted(glob.glob('*.csv')):
        if filename == 'output.csv':  # Skip the file we're writing.
            continue
        name = os.path.splitext(filename)[0]  # '20080101.csv' -> '20080101'
        with open(filename, 'r') as infile:
            count = 0
            next(infile, None)  # Skip the header line.
            for line in infile:
                fields = line.split(';')
                if len(fields) < 3:  # Skip blank or truncated lines.
                    continue
                x = int(fields[0])
                y = int(fields[1])
                z = float(fields[2])
                if x == 1 and y == 2:
                    outfile.write('%d ; %d ; %g ; %s\n' % (x, y, z, name))
                    count += 1
            if count == 0:  # Handle the case when no lines were found.
                outfile.write('1 ; 2 ; NA ; %s\n' % name)

Note that if you can't control or trust the file format, you may want to handle the exceptions (ValueError) raised by the conversions to int/float.
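A minimal sketch of that exception handling, assuming malformed lines should simply be skipped; the helper name `parse_line` is hypothetical. `int()` and `float()` accept surrounding whitespace, so fields like `"1 "` or `" 6\n"` convert without stripping.

```python
def parse_line(line):
    """Return (x, y, z) from an 'X ; Y ; Z' line, or None if it is malformed."""
    fields = line.split(";")
    if len(fields) < 3:
        return None  # blank or truncated line
    try:
        return int(fields[0]), int(fields[1]), float(fields[2])
    except ValueError:
        return None  # non-numeric field, e.g. the header row or a typo
```

Inside the loop, `parsed = parse_line(line)` followed by `if parsed is None: continue` would then replace the three bare conversions.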

edited Nov 5, 2011 at 0:18

answered Nov 4, 2011 at 23:58

Adam Zalcman


1 Comment

Navin Over a year ago

Thank you very much... Sorry for the late response. Being a beginner, I took some time to understand each of the valuable replies. Learning Python is indeed fun!!

Tyler Ferraro · Accepted Answer · 2011-11-05 00:06:30Z

2

You could read in one file at a time, line by line:

import os

files = ['20080101.csv', '20080102.csv', '20080103.csv']  # ...etc
with open('output.csv', 'w') as fout:  # open the output once, not once per match
    for f in files:
        name = os.path.splitext(f)[0]  # '20080101.csv' -> '20080101'
        with open(f, 'r') as infile:
            for line in infile:
                ray = line.split(';')
                if ray[0].strip() == '1' and ray[1].strip() == '2':
                    fout.write(ray[0].strip() + ' ; ' + ray[1].strip() + ' ; '
                               + ray[2].strip() + ' ; ' + name + '\n')

Tested and works. May need some slight modifications.
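The comparison works because `line.split(';')` returns a list of strings, so each field is stripped and compared with the string `'1'`, not the integer `1`. A small illustration using one sample line from the question:

```python
line = "1 ; 2 ; 6\n"
ray = line.split(";")         # ['1 ', ' 2 ', ' 6\n']: a list of strings
print(ray[0].strip() == "1")  # True: string compared with string
print(ray[0].strip() == 1)    # False: a str never equals an int
print(int(ray[0]) == 1)       # True: convert first, then compare
```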

edited Nov 5, 2011 at 0:06

answered Nov 4, 2011 at 23:48

Tyler Ferraro


5 Comments

Adam Zalcman Over a year ago

The if will always fail, since ray is a list of strings.

Tyler Ferraro Over a year ago

@AdamZalcman: I've checked this code and it works. ray is a list of strings. I strip each string down to the number and then compare it to '1' and '2'. Please test this and tell me you get an error before telling me it's wrong.

Adam Zalcman Over a year ago

When I was writing the comment, you were comparing against the bare integers 1 and 2.

Tyler Ferraro Over a year ago

@AdamZalcman your comment was written 2 minutes after my last edit. But I won't split hairs with you. The above code is now working as it should.

Adam Zalcman Over a year ago

You're right, your edit was before my comment. I didn't reload the page. Also, the code does work indeed.

yosukesabai · Accepted Answer · 2011-11-04 23:54:49Z

0

If you know that you have one file for each day, with no missing days, then I'd use glob('*.csv') to get the list of file names, open them one by one, and then read them like Tyler is doing.

If you know that there are days where the file is missing, I'd use datetime: start with datetime.date(2008,1,1) and loop, incrementing by one day. Then for each day, compose the file name using .strftime() + '.csv' and try to process the file (if there is no file, just write a record with NA).
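A sketch of that date-driven loop, assuming files are named `YYYYMMDD.csv` and that a missing file or a file without an X=1, Y=2 row both yield `NA`; the generator name `rows_by_day` is hypothetical.

```python
import datetime
import os


def rows_by_day(start, end, folder="."):
    """Yield (z, name) for each day from start to end inclusive.

    z is the Z value of the first X=1, Y=2 row, or "NA" when the day's
    file is missing or contains no such row.
    """
    day = start
    while day <= end:
        name = day.strftime("%Y%m%d")  # e.g. "20080101"
        z = "NA"
        try:
            with open(os.path.join(folder, name + ".csv")) as f:
                for line in f:
                    fields = [field.strip() for field in line.split(";")]
                    if len(fields) >= 3 and fields[0] == "1" and fields[1] == "2":
                        z = fields[2]
                        break
        except FileNotFoundError:
            pass  # no file for this day: keep "NA"
        yield z, name
        day += datetime.timedelta(days=1)
```

Each `(z, name)` pair can then be written as `"1 ; 2 ; %s ; %s\n" % (z, name)`, iterating from `datetime.date(2008, 1, 1)` to `datetime.date(2011, 10, 31)`.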

answered Nov 4, 2011 at 23:54

yosukesabai


Andrew Clark · Accepted Answer · 2011-11-05 00:02:17Z

0

The following should work:

import csv
with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    fmt = '1 ; 2 ; %s ; %s\n'
    files = ['20080101.csv', '20080102.csv', '20080103.csv', '20080104.csv']
    for file in files:
        with open(file) as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
                    outfile.write(fmt % (row[2].strip(), file[:-4]))
                    break
            else:
                outfile.write(fmt % ('NA', file[:-4]))
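The NA case above relies on Python's `for ... else`: the `else` block runs only when the loop finishes without hitting `break`. A stripped-down sketch of that pattern (the function name `first_match` is hypothetical):

```python
def first_match(rows):
    """Return Z of the first X=1, Y=2 row, or 'NA' if the loop never breaks."""
    for row in rows:
        if row[0] == "1" and row[1] == "2":
            z = row[2]
            break
    else:  # runs only when the loop ended without break
        z = "NA"
    return z


print(first_match([["1", "1", "3"], ["1", "2", "6"]]))  # 6
print(first_match([["1", "3", "24"]]))                  # NA
```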

answered Nov 5, 2011 at 0:02

Andrew Clark

