0

I continue to use pandas.dataframe.append, however appending in the for-loop continues to get written over and information is duplicated. The out put of the for-loop should look exactly like the original dataframe (I am not including functions to eliminate any complexity). Any help would be appreciated.

import pandas as pd
df = pd.DataFrame({
    'date': ['2019-01-01','2019-01-01','2019-01-01',
             '2019-02-01','2019-02-01','2019-02-01',
             '2019-03-01','2019-03-01','2019-03-01',],
    'Asset': ['Asset A', 'Asset A', 'Asset A', 'Asset B', 'Asset B', 'Asset B',
             'Asset C', 'Asset C', 'Asset C'],
    'Monthly Value': [2100, 8100, 1400, 1400, 3100, 1600, 2400, 2100, 2100]
})
print(df.sort_values(by=['Asset']))

         date    Asset  Monthly Value
0  2019-01-01  Asset A           2100
1  2019-01-01  Asset A           8100
2  2019-01-01  Asset A           1400
3  2019-02-01  Asset B           1400
4  2019-02-01  Asset B           3100
5  2019-02-01  Asset B           1600
6  2019-03-01  Asset C           2400
7  2019-03-01  Asset C           2100
8  2019-03-01  Asset C           2100

This for loop creates multiple appendages to the df and duplicates rows

assetlist = list(df['Asset'].unique())

for asset in assetlist:
    df_subset = df[df['Asset'] == asset]
    dfcopy = df_subset.copy()
    newdf = newdf.append(dfcopy)
print(newdf)

This output is incorrect, it should look exactly like the original dataframe.

date    Asset   Monthly Value
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
0   2019-01-01  Asset A 2100
1   2019-01-01  Asset A 8100
2   2019-01-01  Asset A 1400
3   2019-02-01  Asset B 1400
4   2019-02-01  Asset B 3100
5   2019-02-01  Asset B 1600
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
0   2019-01-01  Asset A 2100
1   2019-01-01  Asset A 8100
2   2019-01-01  Asset A 1400
3   2019-02-01  Asset B 1400
4   2019-02-01  Asset B 3100
5   2019-02-01  Asset B 1600
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
0   2019-01-01  Asset A 2100
1   2019-01-01  Asset A 8100
2   2019-01-01  Asset A 1400
3   2019-02-01  Asset B 1400
4   2019-02-01  Asset B 3100
5   2019-02-01  Asset B 1600
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
0   2019-01-01  Asset A 2100
1   2019-01-01  Asset A 8100
2   2019-01-01  Asset A 1400
3   2019-02-01  Asset B 1400
4   2019-02-01  Asset B 3100
5   2019-02-01  Asset B 1600
6   2019-03-01  Asset C 2400
7   2019-03-01  Asset C 2100
8   2019-03-01  Asset C 2100
6
  • Perhaps you meant to append to newdf instead of dfcopy? Commented Jan 15, 2020 at 21:41
  • That was a type-o. Edited. Thank you, but still doesnt work. Commented Jan 15, 2020 at 21:44
  • I'm a bit confused what you are trying to accomplish... You are iterating over "Asset"s, filtering down to distinct groups, and then recombining those groups into a single DataFrame... won't your desired result be exactly the same as df, just in a different order? Can you give an example of the output you expect? Commented Jan 15, 2020 at 21:46
  • Yes, it should look exactly like the same. I have several functions that I took out to save time. Commented Jan 15, 2020 at 21:47
  • 1
    It might be easier (and faster) to throw all of your dfcopys into a list and then use pd.concat to combine them at the end. Commented Jan 15, 2020 at 21:49

1 Answer 1

2

You are missing one line I think:

assetlist = list(df['Asset'].unique())
newdf = pd.DataFrame()   # <-- define it as a data frame
for asset in assetlist:
    df_subset = df[df['Asset'] == asset]
    dfcopy = df_subset.copy()
    newdf = newdf.append(dfcopy)
print(newdf)

         date    Asset  Monthly Value
0  2019-01-01  Asset A           2100
1  2019-01-01  Asset A           8100
2  2019-01-01  Asset A           1400
3  2019-02-01  Asset B           1400
4  2019-02-01  Asset B           3100
5  2019-02-01  Asset B           1600
6  2019-03-01  Asset C           2400
7  2019-03-01  Asset C           2100
8  2019-03-01  Asset C           2100

However, an easier way to do this is:

newdf = pd.concat([df.query("Asset == @asset") for asset in assetlist])
Sign up to request clarification or add additional context in comments.

1 Comment

Not only easier, but entirely more efficient. pd.concat+ list.append doesn't lead to the quadratic copying of DataFrame.append in a loop

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.