I have a MultiIndex DataFrame:

                 predicted_y actual_y predicted_full actual_full
subj_id org_clip                                                
123     3                  2        5      [1, 2, 3]   [4, 5, 6]

I wish to add a new row to it:

                 predicted_y actual_y predicted_full   actual_full
subj_id org_clip                                                  
123     3                  2        5      [1, 2, 3]     [4, 5, 6]
321     4                 20       50   [10, 20, 30]  [40, 50, 60]    # add this row

And the following code does it:

df.loc[('321', 4),['predicted_y', 'actual_y']] = [20, 50]
df.loc[('321', 4),['predicted_full', 'actual_full']] = [[10,20,30], [40,50,60]]

But when I try to add the new row in a single statement, I get an error:

df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] = [20, 50, [10,20,30], [40,50,60]]

>>> ValueError: setting an array element with a sequence.

Notes:

I believe the problem (possibly a syntactic one) comes from adding a row that contains both scalar values and lists. All my other attempts raised the same error; see the following examples:

df.loc[('321', 4),['predicted_y', 'actual_y', ['predicted_full', 'actual_full']]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', ['predicted_full'], ['actual_full']]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', [['predicted_full'], ['actual_full']]]] = [20, 50, [10,20,30], [40,50,60]]
df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] = [20, 50, np.array([10,20,30]), np.array([40,50,60])]

The code to construct the initial DataFrame:

# Note: in pandas >= 0.24 the MultiIndex argument is `codes` (formerly `labels`).
df = pd.DataFrame(index=pd.MultiIndex(levels=[[], []], codes=[[], []], names=['subj_id', 'org_clip']),
                  columns=['predicted_y', 'actual_y', 'predicted_full', 'actual_full'])
df.loc[('123', 3),['predicted_y', 'actual_y']] = [2, 5]
df.loc[('123', 3),['predicted_full', 'actual_full']] = [[1,2,3], [4,5,6]]

2 Answers

You can let pd.Series handle the dtypes:

row_to_append = pd.Series([20, 50, [10, 20, 30], [40, 50, 60]])
cols = ['predicted_y', 'actual_y', 'predicted_full', 'actual_full']
df.loc[(321, 4), cols] = row_to_append.values

df
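
A minimal sketch of why this helps, assuming only that pandas is installed: a Series built from a mix of scalars and lists ends up with dtype object, so its .values is already a flat object array with one cell per column and does not need to be coerced into a rectangular numeric array.

```python
import pandas as pd

# Mixed scalars and lists: the Series stores each item as a Python object.
row_to_append = pd.Series([20, 50, [10, 20, 30], [40, 50, 60]])

# .values is a 1-D object array; each list stays intact in its own cell.
values = row_to_append.values

print(row_to_append.dtype)  # dtype of the Series
print(values.shape)         # one entry per target column
print(type(values[2]))      # the third cell is still a Python list
```

The four-element object array lines up one-to-one with the four target columns in cols.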


Make at least one of the sublists an array of dtype object:

In [27]: df.loc[('321', 4),['predicted_y', 'actual_y', 'predicted_full', 'actual_full']] =  (
           [20, 50, np.array((10, 20, 30), dtype='O'), [40, 50, 60]])

In [28]: df
Out[28]: 
                 predicted_y actual_y predicted_full   actual_full
subj_id org_clip                                                  
123     3                  2        5      [1, 2, 3]     [4, 5, 6]
321     4                 20       50   [10, 20, 30]  [40, 50, 60]

Notice that the error

ValueError: setting an array element with a sequence.

occurs on this line:

--> 643         arr_value = np.array(value)

and can be reproduced like this:

In [12]: np.array([20, 50, [10, 20, 30], [40, 50, 60]])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-f6122275ab9f> in <module>()
----> 1 np.array([20, 50, [10, 20, 30], [40, 50, 60]])

ValueError: setting an array element with a sequence.

But if one of the sublists is an array of dtype object, then the result is an array of dtype object:

In [16]: np.array((20, 50, np.array((10, 20, 30), dtype='O'), (40, 50, 60)))
Out[16]: array([20, 50, array([10, 20, 30], dtype=object), (40, 50, 60)], dtype=object)

Thus the ValueError can be avoided.
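
How np.array treats ragged input has varied across NumPy releases, so as a hedged alternative you can build the object array explicitly instead of relying on dtype inference. The sketch below uses a hypothetical helper, as_object_row (not part of NumPy or pandas), that preallocates a 1-D object array and fills it cell by cell:

```python
import numpy as np

def as_object_row(items):
    """Hypothetical helper: pack items into a 1-D object array, one cell per item."""
    row = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Integer indexing stores the item itself, so lists are not unpacked.
        row[i] = item
    return row

row = as_object_row([20, 50, [10, 20, 30], [40, 50, 60]])
print(row.dtype, row.shape)
```

Because the array is created with dtype=object up front, NumPy never tries to infer a rectangular shape from the nested lists.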
