0

I have multiple csv files in a folder, each named by date (20080101.csv to 20111031.csv). The files share common headers and look like this:

20080101.csv  
X ;Y; Z  
1 ; 1 ; 3  
1 ; 2 ; 6  
1 ; 3 ; 24  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080102.csv   
X ;Y; Z  
1 ; 1 ; 0.1  
1 ; 2 ; 2  
1 ; 3 ; 67  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080103.csv  
X ;Y; Z  
1 ; 1 ; 3  
1 ; 3 ; 24  
2 ; 1 ; 24  
2 ; 2 ; 24  

20080104.csv   
X ;Y; Z  
1 ; 1 ; 34  
1 ; 2 ; 23  
1 ; 3 ; 67  
2 ; 1 ; 24  
2 ; 2 ; 24  

… and so on. I want to write a script that reads the rows of each file and, whenever a row has X=1 and Y=2, copies the whole row to a new csv file along with the filename (with NA for Z when a file has no such row), giving the following output:

X ;Y ; Z ; filename  
1  ; 2 ; 6 ; 20080101  
1  ; 2 ; 2 ; 20080102  
1  ; 2 ; NA; 20080103  
1  ; 2 ; 23; 20080104 

Any idea how this can be done? Suggestions about modules that I should look into, or any examples, would be welcome. Thanks for your time and help.

Cheers, Navin

5 Comments

  • Are you not interested in records whose (X, Y) are not (1, 2)? Just throw them away? Commented Nov 4, 2011 at 23:48
  • Can you really call a file separated by semicolons a csv? Commented Nov 5, 2011 at 0:02
  • @Danny Character Separated Values? I'm clutching at straws with that :) Commented Nov 5, 2011 at 0:05
  • I saw one delimited by bars, so why not semicolons? Some people want to reserve the comma for whatever purpose they have. Commented Nov 5, 2011 at 0:07
  • Hmmm... weird homework... what can I learn from this exercise? Commented Nov 5, 2011 at 0:15

5 Answers

4

This is a well-formed question, from which the logic should be apparent. For someone to provide finished code would defeat the purpose of the assignment. First, add a "homework" tag to the question, then think about what you want to do:

1) loop over the files (keeping track of each filename as it's opened)
2) read lines from the current file
3) if the selection criteria (x == 1 and y == 2) are met, write the line.

To get started, try:

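import csv
import os

# Loop over every file in the current directory, keeping only the csv files.
for fn in os.listdir():
    if fn.endswith(".csv"):
        with open(fn, 'r', newline='') as f:
            # The files are semicolon-delimited, not comma-delimited.
            reader = csv.reader(f, delimiter=";")
            for row in reader:
                ...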

Then extend the solution to open the output file and write the selected lines using csv.writer.
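One possible shape for that extension is sketched below. It is only an illustration, not the answerer's own code: the function name copy_matching_rows and its parameters are hypothetical, it assumes every input file ends in ".csv" and sits in a single folder, and it does not write an NA row for files without a match.

```python
import csv
import os


def copy_matching_rows(folder, out_path, x="1", y="2"):
    """Write every row with X == x and Y == y, plus its source filename."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=";")
        writer.writerow(["X", "Y", "Z", "filename"])
        for fn in sorted(os.listdir(folder)):
            if not fn.endswith(".csv"):
                continue
            path = os.path.join(folder, fn)
            # Skip the output file if it lives in the same folder.
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue
            with open(path, "r", newline="") as f:
                for row in csv.reader(f, delimiter=";"):
                    # Fields are padded with spaces, so strip before comparing.
                    fields = [field.strip() for field in row]
                    if len(fields) >= 3 and fields[0] == x and fields[1] == y:
                        # fn[:-4] drops the ".csv" extension.
                        writer.writerow(fields[:3] + [fn[:-4]])
```

Comparing the stripped fields as strings avoids converting the header row to numbers; the header simply never matches.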

2

This should do the job:

import glob
import os

with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    # Sort so rows come out in date order; glob's order is arbitrary.
    for filename in sorted(glob.glob('*.csv')):
        if filename == 'output.csv':  # Skip the file we're writing.
            continue
        # Strip the ".csv" extension, as in the desired output.
        name = os.path.splitext(filename)[0]
        count = 0
        with open(filename, 'r') as infile:
            next(infile, None)  # Skip the header line.
            for line in infile:
                if not line.strip():  # Skip blank lines.
                    continue
                fields = line.split(';')
                x = int(fields[0])
                y = int(fields[1])
                z = float(fields[2])
                if x == 1 and y == 2:
                    outfile.write('%d ; %d ; %g ; %s\n' % (x, y, z, name))
                    count += 1
        if count == 0:  # Handle the case when no lines were found.
            outfile.write('1 ; 2 ; NA ; %s\n' % name)

Note that if you can't control or trust the file format you may want to handle exceptions thrown by the conversions to int/float.
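A minimal sketch of that exception handling, assuming malformed lines should simply be skipped (parse_row is a hypothetical helper, not part of the code above):

```python
def parse_row(line):
    """Return (x, y, z) parsed from an 'x ; y ; z' line, or None if malformed."""
    fields = line.split(";")
    try:
        return int(fields[0]), int(fields[1]), float(fields[2])
    except (ValueError, IndexError):
        # ValueError: a field is not numeric (e.g. a header or blank line).
        # IndexError: the line has fewer than three fields.
        return None
```

The loop could then call parse_row(line) and continue whenever it returns None, which would also make the explicit header skip unnecessary.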

1 Comment

Thank you very much... sorry for the late response. Being a beginner, I took some time to understand each of the valuable replies. Learning Python is indeed fun!
2

You could read the files one at a time, line by line:

files = ['20080101.csv', '20080102.csv', '20080103.csv']  # ...etc
# Open the output once, rather than reopening it for every matching line.
with open('output.csv', 'a') as fout:
    for f in files:
        with open(f, 'r') as infile:
            for line in infile:
                # Split on the delimiter and strip the padding around each field.
                ray = [field.strip() for field in line.split(';')]
                if len(ray) > 2 and ray[0] == '1' and ray[1] == '2':
                    # f[:-4] drops the ".csv" extension from the filename.
                    fout.write(' ; '.join(ray[:3] + [f[:-4]]) + '\n')

Tested and works. May need some slight modifications.

5 Comments

The if will always fail since ray is a list of strings.
@AdamZalcman: I've checked this code and it works. ray is a list of strings. I strip each string down to the number and then compare it to '1' and '2'. Please test this and tell me you get an error before telling me it's wrong.
When I was writing the comment you were comparing against bare integer 1 and 2.
@AdamZalcman your comment was written 2 minutes after my last edit. But I won't split hairs with you. The above code is now working as it should.
You're right, your edit was before my comment. I didn't reload the page. Also, the code does work indeed.
0

If you know that you have one file for each day, with no missing days, then I'd use glob('*.csv') to get the list of file names, open them one by one, and read each like Tyler is doing.

If you know that there are days where the file is missing, I'd use datetime to start with datetime.date(2008,1,1) and loop, incrementing by one day. Then for each day I'd compose the file name using .strftime() + '.csv' and try to process the file (if there is no file, just write a record with NA).
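A rough sketch of the datetime approach, assuming the files are named YYYYMMDD.csv and sit in the current directory. The names daily_filenames and find_z are made up for illustration.

```python
import datetime
import os


def daily_filenames(start, end):
    """Yield 'YYYYMMDD.csv' for every day from start to end inclusive."""
    day = start
    while day <= end:
        yield day.strftime("%Y%m%d") + ".csv"
        day += datetime.timedelta(days=1)


def find_z(path, x="1", y="2"):
    """Return Z from the first row with X == x and Y == y, or 'NA'."""
    if not os.path.exists(path):
        return "NA"  # No file for this day.
    with open(path) as f:
        for line in f:
            fields = [field.strip() for field in line.split(";")]
            if len(fields) > 2 and fields[0] == x and fields[1] == y:
                return fields[2]
    return "NA"  # File exists but has no matching row.
```

Looping over daily_filenames(datetime.date(2008, 1, 1), datetime.date(2011, 10, 31)) and writing one line per name, with find_z(name) as the Z value, would give a record for every day, including the missing ones.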

0

The following should work:

import csv
with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    fmt = '1 ; 2 ; %s ; %s\n'
    files = ['20080101.csv', '20080102.csv', '20080103.csv', '20080104.csv']
    for file in files:
        with open(file) as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
                    outfile.write(fmt % (row[2].strip(), file[:-4]))
                    break
            else:
                outfile.write(fmt % ('NA', file[:-4]))
