I have a large CSV file with about 5000 rows. The first column contains an identifying name for each row, e.g. LHGZZ01. The first 9 rows have LHGZZ01 as their name, the next 10 have something else, and so on. There is no regular pattern to the group sizes, so I used np.unique to find the indices where the name changes.
I want to write a loop that writes each row of the source CSV to a new CSV file containing only the rows that share the same name.
import csv
import numpy as np

datafile = open('source.csv', 'rb')
reader = csv.reader(datafile)
data = []
idx = []
dataidx = []
next(reader, None)  # skip headers
for row in reader:
    d = row[0]
    idx.append(d)
    data.append(row)
    dataidx.append(row[0])
# indices of the first row of each unique name, in file order
index = np.sort(np.unique(idx, return_index=True)[1])
nme = []  # list of unique names
for row in index:
    nm = data[row][0]
    nme.append(nm)
for i in np.arange(0, 9):
    with open(str(out_dir) + str(nme[0]) + '.csv', 'w') as f1:
        row = data[i]
        writer = csv.writer(f1, delimiter=',')  # lineterminator='\n',
        writer.writerow(row)
The code above writes the first row of the new csv and stops.
My question is: how do I loop through source.csv, split the data at every name change, and write the rows that share a name to their own CSV file?
Apologies for the long-winded question, but this problem is beyond my Python skills and is driving me nuts.
Any help or suggestions greatly appreciated.
Sample csv:
ID NORTH_DMS EAST_DMS DIST
LHGZZ01 403921 374459 12500m
LHGZZ01 403610 353000 12500m
LHGZZ01 404640 360400 12500m
LHGZZ01 404515 361900 12500m
LHGZZ01 411240 381900 12500m
LHGZZ01 415629 400600 12500m
LHGZZ01 401503 384400 12500m
LHGZZ01 400319 382200 12500m
LHGZZ01 403921 372800 12500m
LHGZZ02 412000 353200 12500m
LHGZZ02 412749 343200 12500m
LHGZZ02 403111 353000 12500m
LHGZZ02 400600 374459 12500m
LHGZZ02 401818 400600 12500m
LHGZZ02 401525 393100 12500m
LHGZZ02 401605 392400 12500m
LHGZZ02 412000 384400 12500m
LHGZZ02 372912 382157 8400m
GPPHA01 381500 382200 8400m
GPPHA01 393000 375252 8400m
GPPHA01 395400 370602 8400m
GPPHA01 401503 372912 8400m
GPPHA01 400831 382157 8400m
GPPHA01 390651 365700 8400m
GPPHA01 372912 382954 8400m
GPPHA02 392130 370602 12500m
GPPHA02 400319 364000 12500m
GPPHA02 400831 361900 12500m
GPPHA02 390651 365700 12500m
GPPHA02 382157 400600 12500m
GPPHA02 382200 401818 12500m
GPPHA02 375252 401525 12500m
GPPHA02 385112 401605 12500m
GPPHA02 392020 400319 12500m
GPPHA02 392130 392130 12500m
GPPHA03 392020 392020 9800m
GPPHA03 385112 383000 9800m
GPPHA03 382954 400600 9800m
GPPHA03 365700 364000 9800m
GPPHA03 381900 372912 9800m
GPPHA03 383000 380700 9800m
GPPHA03 392020 373724 9800m
GPPHA03 385112 363842 7500m
VVDFB01 374459 361210 12500m
VVDFB01 353000 360002 12500m
VVDFB01 360400 360002 12500m
VVDFB01 361900 364000 12500m
VVDFB01 381900 360002 12500m
VVDFB01 400600 360002 12500m
VVDFB01 384400 361210 12500m
VVDFB01 382200 350530 12500m
VVDFB02 372800 344400 12500m
VVDFB02 353200 343100 12500m
VVDFB02 343200 351448 12500m
VVDFB02 353000 360002 12500m
VVDFB02 374459 364000 12500m
VVDFB02 400600 351448 12500m
VVDFB02 393100 345353 12500m
VVDFB02 392400 341731 12500m
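
For illustration, here is a minimal sketch of one way the split described above could be approached. It is not a definitive solution. It swaps the np.unique indexing for itertools.groupby from the standard library, which starts a new group each time the first column changes. The assumptions are: Python 3, a comma-delimited source file with one header row, and rows with the same name sitting next to each other (a name that reappears later in the file would overwrite its earlier output). The function name split_by_first_column and its parameters are hypothetical, chosen only for this example.

```python
import csv
import os
from itertools import groupby


def split_by_first_column(source_path, out_dir):
    """Write each run of consecutive rows sharing a first-column name to <name>.csv.

    Hypothetical helper: assumes a comma-delimited file with one header row.
    Returns the list of output paths in the order they were written.
    """
    written = []
    with open(source_path, newline='') as src:
        reader = csv.reader(src)
        header = next(reader, None)  # keep the header so each output can repeat it
        rows = (row for row in reader if row)  # ignore blank lines
        # groupby yields (name, rows) and starts a new group when the name changes
        for name, group in groupby(rows, key=lambda row: row[0]):
            out_path = os.path.join(out_dir, name + '.csv')
            # open each output file once, then write the whole group to it;
            # reopening with 'w' for every row would truncate the file each time
            with open(out_path, 'w', newline='') as dst:
                writer = csv.writer(dst)
                if header is not None:
                    writer.writerow(header)
                writer.writerows(group)
            written.append(out_path)
    return written
```

Calling split_by_first_column('source.csv', out_dir) on data shaped like the sample would be expected to produce one file per name (LHGZZ01.csv, LHGZZ02.csv, and so on), provided the real file is comma-separated rather than space-separated as displayed here.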