
I have the following code:

import numpy as np
import pandas as pd
x = pd.DataFrame(np.zeros((4, 1)), columns=['A'])
y = np.random.randn(4, 2)
x['A'] = y

I expected it to raise an exception because of the shape mismatch, but pandas silently accepts the assignment: the first column of y is assigned to x['A'].

Is this an intentional design? If so, what is the rationale behind it?

I tried both pandas 0.21 and 0.23.
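If the goal is to make a mismatched assignment fail loudly instead of relying on whatever a given pandas version does, one option is a small wrapper that checks the shape before assigning. `assign_column_strict` is a hypothetical helper, not part of pandas; this is a sketch that assumes an (n, 1) array should count as a single column, while anything wider is rejected.

```python
import numpy as np
import pandas as pd


def assign_column_strict(df, column, values):
    """Assign `values` to `df[column]`, refusing anything that is not one column.

    Hypothetical helper for illustration; pandas does not provide it.
    """
    arr = np.asarray(values)
    # Accept an (n, 1) array by taking its only column; wider arrays are ambiguous.
    if arr.ndim == 2 and arr.shape[1] == 1:
        arr = arr[:, 0]
    if arr.ndim != 1:
        raise ValueError(
            f"expected one column of values, got array with shape {arr.shape}"
        )
    if len(arr) != len(df):
        raise ValueError(f"expected {len(df)} values, got {len(arr)}")
    df[column] = arr
    return df


x = pd.DataFrame(np.zeros((4, 1)), columns=['A'])
y = np.random.randn(4, 2)

try:
    assign_column_strict(x, 'A', y)
except ValueError as exc:
    print(exc)  # expected one column of values, got array with shape (4, 2)

# Stating explicitly which column of y you mean avoids the question entirely.
x['A'] = y[:, 0]
```

Selecting `y[:, 0]` yourself keeps the intent visible in the code, whatever pandas decides to do with a 2-D right-hand side.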


Thanks to those who tried to help. However, nobody has given a satisfactory answer, and the bounty is about to expire.

Let me emphasize what I expect from an answer:

  1. Is this design intentional? Is it a bug? Is it a design flaw?
  2. What is the rationale for designing it this way?

Since the bounty is about to expire, I accepted the most upvoted answer, but it does not answer the questions above.

  • Seems to be a peculiarity of 'A' already being a column. For instance, x['B'] = y gives you the expected ValueError: Wrong number of items passed 2, placement implies 1 Commented Sep 3, 2018 at 2:37
  • I would expect this to raise a KeyError instead... Commented Sep 3, 2018 at 2:42
  • oh yeah, there is one too. Commented Sep 3, 2018 at 2:45
  • I agree that the situation is still unclear. In light of this, I don’t think that there should be an accepted answer. I opened an issue about this on the pandas repository, and it seems like it may be a bug. Commented Apr 7, 2021 at 20:24

3 Answers


The values in y form an un-indexed matrix. The assignment x['A'] = y works here because it takes the first item from the matrix and assigns it to 'A'.

Similarly,

x = pd.DataFrame(np.zeros((4, 2)), columns=['A', 'B'])
y = np.random.randn(4, 2)
x[['A', 'B']] = y

will also work, since here the shapes match exactly; in the x['A'] = y case, by contrast, the extra column is discarded by pandas. If you pass fewer columns, say:

x = pd.DataFrame(np.zeros((4, 2)), columns=['A', 'B'])
y = np.random.randn(4, 1)
x[['A', 'B']] = y

That will also work, as it assigns the same values to both columns. This case is similar to x['A'] = 0, which replaces all the data in column A with zeros.
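The "same values in both columns" case leans on NumPy broadcasting, where an axis of length 1 is stretched to match the target. Whether pandas applies that rule to x[['A', 'B']] = y can depend on the version, so this sketch only demonstrates the NumPy rule itself, plus the scalar case the answer compares it to. Note that under the same rule a (4, 2) array does not fit a (4, 1) target.

```python
import numpy as np
import pandas as pd

y = np.random.randn(4, 1)

# A (4, 1) array broadcasts to a (4, 2) target: its single column is reused.
stretched = np.broadcast_to(y, (4, 2))
print(stretched.shape)  # (4, 2)

# A (4, 2) array does NOT broadcast to a (4, 1) target; NumPy raises instead.
try:
    np.broadcast_to(np.random.randn(4, 2), (4, 1))
except ValueError as exc:
    print("cannot broadcast:", exc)

# The scalar case from the answer: 0 is broadcast to every row of column A.
x = pd.DataFrame(np.ones((4, 2)), columns=['A', 'B'])
x['A'] = 0
```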


2 Comments

what do you mean by "un-indexed matrix", and what is the first item of y? The first column?
@LiuSha DataFrame and Series have an index. np.random.randn returns a plain NumPy array, not a pandas object, so it is un-indexed.
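The "un-indexed" point can be made concrete by assigning the same numbers twice: once as a Series whose index is reversed, and once as a plain ndarray. pandas aligns a Series on index labels but places ndarray values by position. This sketch only illustrates that alignment difference; on its own it does not explain how pandas handles the 2-D case in the question.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.zeros((4, 1)), columns=['A'])  # index is 0, 1, 2, 3

# A Series carries its own index, so pandas matches values by label.
s = pd.Series([10.0, 20.0, 30.0, 40.0], index=[3, 2, 1, 0])
x['A'] = s
print(x['A'].tolist())  # [40.0, 30.0, 20.0, 10.0]

# A NumPy array has no index, so its values are placed by position.
arr = np.array([10.0, 20.0, 30.0, 40.0])
x['A'] = arr
print(x['A'].tolist())  # [10.0, 20.0, 30.0, 40.0]
```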

For

x = pd.DataFrame(np.zeros((4, 1)), columns=['A'])
y = np.random.randn(4, 2)

if we run x['A'] = y, the first column of y is copied into column A. If we repeat this with a different number of columns, such as:

x = pd.DataFrame(np.zeros((4, 3)), columns=['A','B','C'])
y = np.random.randn(4, 2)

and try x['A'] = y, the first column of y is again copied into A. But if we write x = y, then x is simply replaced by the y matrix. So I guess this ambiguity arises because we are assigning a matrix created in NumPy to a single DataFrame column. Hope this explains it.
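It helps to separate the two operations compared in this answer. `x = y` writes nothing into the DataFrame; it rebinds the name x to the NumPy array, whereas `x['A'] = y` goes through the DataFrame's item assignment (`__setitem__`). A quick check of the rebinding side, keeping a second name `df` (introduced here only for illustration) for the original DataFrame:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.zeros((4, 3)), columns=['A', 'B', 'C'])
y = np.random.randn(4, 2)

df = x    # second name for the original DataFrame
x = y     # rebinds the name x; nothing is written into the DataFrame

print(type(x).__name__)  # ndarray
print(x is y)            # True
print(df.shape)          # (4, 3): the DataFrame itself is unchanged
```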


A pandas Series is backed by a NumPy array. Since it is a single column, it is treated as one object, whose reference has changed:

>>> import numpy as np
>>> x = np.zeros((4,1))
>>> x = np.random.randn(4,2)
>>> y= np.zeros((4,1))
>>> y
array([[0.],
       [0.],
       [0.],
       [0.]])
>>> x
array([[-1.00731291, -0.37151425],
       [-0.78154847, -0.72854126],
       [-0.98566253,  1.68786232],
       [ 0.12614892,  0.41804799]])
>>> y = x
>>> y
array([[-1.00731291, -0.37151425],
       [-0.78154847, -0.72854126],
       [-0.98566253,  1.68786232],
       [ 0.12614892,  0.41804799]])
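The session shows that `y = x` makes y a second name for the same array rather than a copy. A short way to confirm the aliasing, and to get an independent array with `.copy()` when that is what you actually want:

```python
import numpy as np

x = np.random.randn(4, 2)
original = x[0, 0]   # remember the value before any writes

y = x                # alias: both names refer to the same array object
z = x.copy()         # independent copy with its own memory

y[0, 0] = 100.0      # writes through the alias, so x changes as well
print(x[0, 0])                                         # 100.0
print(z[0, 0] == original)                             # True
print(np.shares_memory(x, y), np.shares_memory(x, z))  # True False
```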

