I am working in Jupyter Notebook with pandas, and I noticed something strange.
In one cell, I did this:
import pandas as pd
df1 = pd.DataFrame({"A":[1,2,3]})
df2 = df1
Then in another cell, I changed df2:
df2.loc[0,"A"] = 100
But when I check df1, it's also updated, even though I never touched it directly!
print(df1)
Output:
     A
0  100
1    2
2    3
I expected df1 to stay unchanged. Why is this happening? Do Jupyter cells share variables differently, or is this how pandas works with assignment?
- I tried using df2 = df1.copy() and that seems to fix it.
- I expected df1 and df2 to be two independent DataFrames since I created them separately.
- I just want to understand why the change happens and the right way to avoid it.
df2 = df1 doesn't create a new object. It just binds a second name to the same DataFrame, so a change made through df2 is visible through df1. That's it. And you already know the fix: create a copy if you need a copy. This has nothing to do with Jupyter cells.
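To make the difference concrete, here is a small sketch that compares plain assignment with .copy(). The names alias and independent are made up for illustration and are not from the question; the rest uses only the DataFrame from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]})

alias = df1               # plain assignment: a second name for the same object
independent = df1.copy()  # a new DataFrame with its own data (deep copy by default)

# "is" checks object identity, not equality of contents
print(alias is df1)        # True
print(independent is df1)  # False

alias.loc[0, "A"] = 100       # also visible through df1, since it is the same object
independent.loc[1, "A"] = -1  # only changes the copy

print(df1["A"].tolist())          # [100, 2, 3]
print(independent["A"].tolist())  # [1, -1, 3]
```

The same thing happens with lists, dicts, and any other mutable Python object, in a notebook or in a plain script.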