I am attempting to build a python dataframe to capture an initial dataset, and then update fields as csv files containing data become available in a specified folder location. I have built my initial table and saved it to a csv, but when I reopen the table and try to bring in the next text file data I am accidentally duplicating the headers for the new data everytime I run the code. What is the best way to read in the new text files and append the data to a row based on a common column name?
My current attempt is to build the new text files into their own database, match the column headers from the initial import, and merge on the common header with the starting dataset. However, like I said the columns just keep duplicating themselves and adding _x, _y etc. afterwards.
Sample of Initial Dataframe...
Line Name,Bearing,SOL X,SOL Y,EOL X,EOL Y,FSP 0.5m,FSP 1m,Length km,Length m,LSP 0.5m,LSP 1m,Date,Julian Day,Seq,Storage Db,Time Start,SOG Start,Fix Start,SOL Bearing,Line Length,SOL Easting,SOL Northing,Obs SOL X,Obs SOL Y,Time End,Fix End,SOG End,Planned EOL X,Planned EOL Y,Obs EOL X,Obs EOL Y
A901,38.67269998,568453.03,4343701.73,569156.01,4344580.05,250,125,1.125,1125,2500,1250,,,,,,,,,,,,,,,,,,,,
A902,38.67269998,568476.45,4343682.99,569179.43,4344561.31,250,125,1.125,1125,2500,1250,,,,,,,,,,,,,,,,,,,,
A903,38.67269998,568499.87,4343664.24,569202.85,4344542.56,250,125,1.125,1125,2500,1250,,,,,,,,,,,,,,,,,,,,
Sample of Data to be Merged...
,Line Name,Date,Julian Day,Seq,Storage Db,Time Start,SOG Start,Fix Start,SOL Bearing,Line Length,SOL Easting,SOL Northing,Obs SOL X,Obs SOL Y,Time End,Fix End,SOG End,Planned EOL X,Planned EOL Y,Obs EOL X,Obs EOL Y
0,A901,9/26/2019,269,37,0037_JD269_X331 - 0001.db,09:29:54,2.73,12792,128.67,1000.0,587985.95,4380278.68,587811.22,4380427.4,09:43:51,15594,3.28,588766.68,4379653.81,588901.53,4379545.64
1,A902,9/26/2019,269,38,0038_JD269_M104 - 0001.db,11:38:24,3.69,98260,218.67,42875.0,591391.62,4383593.91,591626.99,4383892.87,17:40:02,25764,3.02,564600.29,4350120.19,568980.53,4355589.75
2,A903,9/29/2019,273,80,0080_JD273_M305 - 0001.db,00:50:53,3.64,27721,38.67,1125.0,576455.88,4351038.29,576365.15,4350932.06,01:03:26,30615,3.78,577158.86,4351916.61,577275.9,4352057.35
I've tried merging, appending and concatenating with varying combinations of rules applied, but without any luck. Some error, others duplicate the columns...
df_proj = pd.concat([df_in,df_acc], axis=1,sort=False)
or
df_proj = pd.merge(df_in, df_acc, on='Line Name', how = 'right')
I've tried to find examples of this type of problem without luck, I be going about this the wrong way. Maybe I shouldn't be treating the new text files as their own dataframe to begin with. But any help would be greatly appreciated. Thank you!
EDIT:
The expected result would look like this...
Line Name,Bearing,SOL X,SOL Y,EOL X,EOL Y,FSP 0.5m,FSP 1m,Length km,Length m,LSP 0.5m,LSP 1m,Date,Julian Day,Seq,Storage Db,Time Start,SOG Start,Fix Start,SOL Bearing,Line Length,SOL Easting,SOL Northing,Obs SOL X,Obs SOL Y,Time End,Fix End,SOG End,Planned EOL X,Planned EOL Y,Obs EOL X,Obs EOL Y
A901,38.67269998,568453.03,4343701.73,569156.01,4344580.05,250,125,1.125,1125,2500,1250,9/26/2019,269,37,0037_JD269_X331 - 0001.db,09:29:54,2.73,12792,128.67,1000.0,587985.95,4380278.68,587811.22,4380427.4,09:43:51,15594,3.28,588766.68,4379653.81,588901.53,4379545.64
A902,38.67269998,568476.45,4343682.99,569179.43,4344561.31,250,125,1.125,1125,2500,1250,9/26/2019,269,38,0038_JD269_M104 - 0001.db,11:38:24,3.69,98260,218.67,42875.0,591391.62,4383593.91,591626.99,4383892.87,17:40:02,25764,3.02,564600.29,4350120.19,568980.53,4355589.75
A903,38.67269998,568499.87,4343664.24,569202.85,4344542.56,250,125,1.125,1125,2500,1250,9/29/2019,273,80,0080_JD273_M305 - 0001.db,00:50:53,3.64,27721,38.67,1125.0,576455.88,4351038.29,576365.15,4350932.06,01:03:26,30615,3.78,577158.86,4351916.61,577275.9,4352057.35