In the pandas documentation, it states:
It is worth noting, however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
frames = [ process_your_file(f) for f in files ]
result = pd.concat(frames)
My current situation is that I will be concatenating a new data frame onto a growing data frame over and over. This will result in a horrifying number of concatenations.
I'm worried about performance, and I'm not sure how to make use of list comprehension in this case. My code is as follows.
df = first_data_frame

while verify:
    # download data (new data becomes available through each iteration)
    # then turn the new data into a data frame, called 'temp'
    frames = [df, temp]
    df = pd.concat(frames)
    if condition_met:
        verify = False
I don't think the parts that download data and create the data frame are relevant; my concern is with the constant concatenation.
How do I implement list comprehension in this case?
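Is something like the following what the docs are suggesting — collect each piece in a plain Python list and call pd.concat only once, after the loop? This is just a sketch of what I'm imagining; download_new_data and the fixed loop count stand in for my real download step and stopping condition:

```python
import pandas as pd

# Toy stand-in for the real download step: each call yields one new
# DataFrame (here faked with a single row per iteration).
def download_new_data(i):
    return pd.DataFrame({"value": [i]})

frames = [pd.DataFrame({"value": [0]})]    # stand-in for first_data_frame

for i in range(1, 4):                      # stand-in for the while/verify loop
    temp = download_new_data(i)
    frames.append(temp)                    # appending to a list is cheap

df = pd.concat(frames, ignore_index=True)  # single concatenation at the end
print(len(df))                             # 4 rows total
```

If that's the right idea, the only copying happens once at the end instead of on every iteration — but I'd like to confirm this is the intended pattern.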