
I have a large CSV file with about 5000 rows. The first column contains an identifying name for each row, e.g. LHGZZ01. The first 9 rows have LHGZZ01 as the name, the next 10 have something else, and so on. There is no pattern to the group sizes, so I used np.unique to find the indices where the name changes.

I want to write a loop that writes each row of the source CSV to a new CSV file containing only the rows with the same name.

import csv
import numpy as np

datafile = open('source.csv','rb')
reader = csv.reader(datafile)
data = []
idx = []
dataidx = []
next(reader, None)#skip headers
for row in reader:
    d = row[0]
    idx.append(d)
    data.append(row)
    dataidx.append(row[0])

index =np.sort(np.unique(idx,return_index=True)[1])

nme = []#list of unique names
for row in index:
    nm = data[row][0]
    nme.append(nm) 

for i in np.arange(0,9):
    with open(str(out_dir)+str(nme[0])+'.csv','w') as f1:
        row = data[i]
        writer=csv.writer(f1, delimiter=',')#lineterminator='\n',
        writer.writerow(row)

The code above writes the first row of the new csv and stops.

My question is: how do I loop through source.csv, splitting the data at every name change, and write the rows with the same name to their own CSV file?

Apologies for the long-winded question, but this problem is beyond my Python skills and is driving me nuts.

Any help or suggestions greatly appreciated.

Sample csv:

ID  NORTH_DMS   EAST_DMS    DIST
LHGZZ01 403921  374459  12500m
LHGZZ01 403610  353000  12500m
LHGZZ01 404640  360400  12500m
LHGZZ01 404515  361900  12500m
LHGZZ01 411240  381900  12500m
LHGZZ01 415629  400600  12500m
LHGZZ01 401503  384400  12500m
LHGZZ01 400319  382200  12500m
LHGZZ01 403921  372800  12500m
LHGZZ02 412000  353200  12500m
LHGZZ02 412749  343200  12500m
LHGZZ02 403111  353000  12500m
LHGZZ02 400600  374459  12500m
LHGZZ02 401818  400600  12500m
LHGZZ02 401525  393100  12500m
LHGZZ02 401605  392400  12500m
LHGZZ02 412000  384400  12500m
LHGZZ02 372912  382157  8400m
GPPHA01 381500  382200  8400m
GPPHA01 393000  375252  8400m
GPPHA01 395400  370602  8400m
GPPHA01 401503  372912  8400m
GPPHA01 400831  382157  8400m
GPPHA01 390651  365700  8400m
GPPHA01 372912  382954  8400m
GPPHA02 392130  370602  12500m
GPPHA02 400319  364000  12500m
GPPHA02 400831  361900  12500m
GPPHA02 390651  365700  12500m
GPPHA02 382157  400600  12500m
GPPHA02 382200  401818  12500m
GPPHA02 375252  401525  12500m
GPPHA02 385112  401605  12500m
GPPHA02 392020  400319  12500m
GPPHA02 392130  392130  12500m
GPPHA03 392020  392020  9800m
GPPHA03 385112  383000  9800m
GPPHA03 382954  400600  9800m
GPPHA03 365700  364000  9800m
GPPHA03 381900  372912  9800m
GPPHA03 383000  380700  9800m
GPPHA03 392020  373724  9800m
GPPHA03 385112  363842  7500m
VVDFB01 374459  361210  12500m
VVDFB01 353000  360002  12500m
VVDFB01 360400  360002  12500m
VVDFB01 361900  364000  12500m
VVDFB01 381900  360002  12500m
VVDFB01 400600  360002  12500m
VVDFB01 384400  361210  12500m
VVDFB01 382200  350530  12500m
VVDFB02 372800  344400  12500m
VVDFB02 353200  343100  12500m
VVDFB02 343200  351448  12500m
VVDFB02 353000  360002  12500m
VVDFB02 374459  364000  12500m
VVDFB02 400600  351448  12500m
VVDFB02 393100  345353  12500m
VVDFB02 392400  341731  12500m

    I think you've put your code in twice... (looks duplicated at a quick glance) Commented May 29, 2015 at 15:37

2 Answers

Every time you open a file in 'w' mode, it is truncated, so everything that was there is overwritten. Open the file once, then loop over calls to writerow, like this:

with open(str(out_dir)+str(nme[0])+'.csv','w') as f1:
    writer=csv.writer(f1, delimiter=',')#lineterminator='\n',
    for i in np.arange(0,9):
        row = data[i]
        writer.writerow(row)

instead of reopening the file on each iteration of the for loop.
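
To extend the same idea to every name rather than just the first nine rows, one possible sketch uses itertools.groupby so that each output file is opened exactly once per run of identical names. It assumes Python 3, a comma-delimited source file with a header row, rows with the same name sitting next to each other (as in the sample), and an existing output directory. The function name split_by_first_column and the choice to repeat the header in each output file are illustrative assumptions, not part of the original code.

```python
import csv
import os
from itertools import groupby
from operator import itemgetter


def split_by_first_column(source_path, out_dir):
    """Write each run of rows sharing a first-column name to <out_dir>/<name>.csv.

    Hypothetical helper: assumes same-name rows are contiguous and out_dir exists.
    """
    written = []
    with open(source_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader, None)  # read the header once so it can be repeated
        rows = (row for row in reader if row)  # skip blank lines
        for name, group in groupby(rows, key=itemgetter(0)):
            out_path = os.path.join(out_dir, name + '.csv')
            # Opened once per group, so 'w' mode does not clobber earlier rows.
            with open(out_path, 'w', newline='') as out:
                writer = csv.writer(out)
                if header is not None:
                    writer.writerow(header)
                writer.writerows(group)
            written.append(out_path)
    return written
```

If a name can reappear later in the file after a different name, this sketch would overwrite the earlier file for that name, so it only fits data that is grouped the way the sample is.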


1 Comment

Thanks for the answers, appreciate it. I was opening the file inside the loop because I wanted to incorporate it into a larger loop that would open a new file for each new name after some number of iterations. It's not working out that way for me. Thanks.


Just to finish off the question above.

I solved my problem (not very elegantly) by first creating all the CSV files I needed in 'w' mode, then using 'a' mode to append rows to each CSV file within a second for loop.
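
For anyone reading later, the two-pass approach described above might look roughly like the following. This is a reconstruction under assumptions, not the exact code that was used: it assumes Python 3, a comma-delimited source file with a header row, and an existing output directory. The name split_with_append and the header written in the first pass are illustrative choices.

```python
import csv
import os


def split_with_append(source_path, out_dir):
    """Two-pass split: create one file per name with 'w', then append rows with 'a'.

    Hypothetical helper: assumes out_dir exists and names are valid file names.
    """
    with open(source_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader, None)
        rows = [row for row in reader if row]  # skip blank lines

    # Pass 1: create (or truncate) one file per unique name, in first-seen order.
    names = list(dict.fromkeys(row[0] for row in rows))
    for name in names:
        with open(os.path.join(out_dir, name + '.csv'), 'w', newline='') as out:
            if header is not None:
                csv.writer(out).writerow(header)  # assumption: repeat the header

    # Pass 2: append each row to the file matching its first-column name.
    for row in rows:
        with open(os.path.join(out_dir, row[0] + '.csv'), 'a', newline='') as out:
            csv.writer(out).writerow(row)
    return names
```

Reopening a file for every row is slow on large inputs, but it does not require same-name rows to be adjacent in the source file.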

Thanks for the answers

Cheers
