Store numpy array in multiple cells of pandas dataframe (Python)

Question

I'm pretty new here. I have a pandas dataframe like this:

                    078401115X            0790747324            0790750708

A10ODC971MDHV8          0                     0             [(354, 1), (393, 1)]
A16CZRQL23NOIW          0              [(124, 1), (697, 1)]          0
A19ZXK9HHVRV1X          0                     0                      0

And I have the labels of the columns that are zero (for the first row):

['078401115X',
'0790747324']

Now I'm trying to store numpy arrays of zeros in those positions of the pandas dataframe. Is there any way to do that directly, without a 'for loop'? I managed to do it with scalar values, but I can't do it with numpy arrays.

Thank you very much for your help.

Whatever your column is, you should be able to do df.loc[df['col']==0, 'col'] = df.index
– EdChum, Commented Apr 29, 2016 at 12:28

tmthydvnprt · Accepted Answer · 2016-04-29 13:50:52Z

Multi row assignment with `.loc` and `DataFrame` dimension matching

Here is a full solution that uses .loc with the zero indexes and overcomes your dimension/length error:

error: 'cannot set using a list-like indexer with a different length than the value'

To match the dimensions, create a DataFrame of the zero arrays in the shape you want/need when you assign to the zero indexes instead of assigning the raw arrays.

import numpy as np
import pandas as pd
from io import StringIO

# Create example DataFrame
df_text = '''
078401115X|                                                0
0790747324|                                                0
0790750708|[(354, 1), (393, 1), (447, 1), (642, 1), (886,1)]
0800103688|                                                0
5556167281|[(41, 1), (86, 1), (341, 1), (362, 1), (419, 10)]
6300157423|                                                0
6300266850|                                                0
6301699599|                                                0
6301723465|                                                0
'''
df = pd.read_table(StringIO(df_text), sep='|', index_col=0, header=None, skipinitialspace=True)

print('Original DataFrame:')
print(df)
print()

# Find indexes with zero data in first column
# (the column was read as text, so the zeros are the string '0')
zero_indexes = df[df[1] == '0'].index

print('Zero Indexes:')
print(zero_indexes.tolist())
print()

# Assign numpy zero array to indexes, wrapped in a DataFrame
# that has the same index and column as the cells being set
df.loc[zero_indexes] = pd.DataFrame([[np.zeros(4)]], index=zero_indexes, columns=[1])

print('New DataFrame:')
print(df)

Original DataFrame:
                                                            1
0                                                            
078401115X                                                  0
0790747324                                                  0
0790750708  [(354, 1), (393, 1), (447, 1), (642, 1), (886,1)]
0800103688                                                  0
5556167281  [(41, 1), (86, 1), (341, 1), (362, 1), (419, 10)]
6300157423                                                  0
6300266850                                                  0
6301699599                                                  0
6301723465                                                  0

Zero Indexes:
['078401115X', '0790747324', '0800103688', '6300157423', '6300266850', '6301699599', '6301723465']

New DataFrame:
                                                            1
0                                                            
078401115X                               [0.0, 0.0, 0.0, 0.0]
0790747324                               [0.0, 0.0, 0.0, 0.0]
0790750708  [(354, 1), (393, 1), (447, 1), (642, 1), (886,1)]
0800103688                               [0.0, 0.0, 0.0, 0.0]
5556167281  [(41, 1), (86, 1), (341, 1), (362, 1), (419, 10)]
6300157423                               [0.0, 0.0, 0.0, 0.0]
6300266850                               [0.0, 0.0, 0.0, 0.0]
6301699599                               [0.0, 0.0, 0.0, 0.0]
6301723465                               [0.0, 0.0, 0.0, 0.0]
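
As a possible alternative on current Python 3 / pandas, here is a minimal sketch that replaces the zero cells element by element with `Series.apply` instead of assigning through `.loc`. The small DataFrame and the helper name `zero_to_array` are made up for illustration and are not part of the code above; the sketch assumes, as in the example, that empty cells hold the string '0'.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the example data: '0' marks an empty cell.
df = pd.DataFrame(
    {1: ['0', '[(354, 1), (393, 1)]', '0']},
    index=['078401115X', '0790750708', '0800103688'],
)

def zero_to_array(value, length=4):
    """Return a new zero array for '0' cells; leave any other cell untouched."""
    return np.zeros(length) if value == '0' else value

# apply() calls the helper once per cell, so every replaced cell
# receives its own array object.
df[1] = df[1].apply(zero_to_array)

print(df)
```

There is still a Python-level call per cell, so this is not faster than a loop; it only avoids the length matching that `.loc` assignment requires.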

That's the solution! But I was confused: I have to do the same thing with rows instead of columns. How can I do that? What do I have to change in your code? Thank you very much.
I'm not sure I understand what you are asking. Can you post the data that you would like to see...?
I have updated the post; I hope it is understandable now, @tmthydvnprt.

Mathias711 · Answer · 2016-04-29 13:27:24Z


df.loc[list_indices, column_name] = np.zeros(4)

is what you want. df is your dataframe, list_indices is the list of indices where the rows are 0, and np.zeros(4) creates a NumPy array of four zeros. Change the 4 if you want a different length, of course.

The expression df.loc[list_indices, column_name] selects the rows whose index is in list_indices and the column named column_name.

edited Apr 29, 2016 at 13:27

answered Apr 29, 2016 at 12:28


3 Comments

AAG Over a year ago

Thanks for your answer, Mathias, but I have a problem: this only affects the first column. Is there any way to also specify the column?

Mathias711 Over a year ago

@AAG Sorry for my late reply. Yes there is! See my updated answer. It is basically adding column_name in the .loc[] indexer.

AAG Over a year ago

Thanks so much. I got this error: 'shape mismatch: value array of shape (4,) could not be broadcast to indexing result of shape (86,)' when I tried this: pd.loc[index_row, zeros_columns] = np.zeros(4)
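
The error in the last comment looks like a broadcasting problem: the four zeros are treated as four separate values to spread over the 86 selected cells, not as one object per cell. A minimal sketch of one way around it, assuming the cells hold the integer 0 as in the question, is to map a helper over every column so that each zero cell gets its own array. The three-row DataFrame copies the question's sample, and `zero_cell_to_array` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Sample shaped like the question's data: integer 0 marks an empty cell.
df = pd.DataFrame(
    {
        '078401115X': [0, 0, 0],
        '0790747324': [0, [(124, 1), (697, 1)], 0],
        '0790750708': [[(354, 1), (393, 1)], 0, 0],
    },
    index=['A10ODC971MDHV8', 'A16CZRQL23NOIW', 'A19ZXK9HHVRV1X'],
)

def zero_cell_to_array(value, length=4):
    """Return a new zero array for integer-0 cells; keep every other cell."""
    if isinstance(value, (int, np.integer)) and value == 0:
        return np.zeros(length)
    return value

# Map the helper over each column in turn; the result is a new DataFrame
# and the original df is left unchanged.
result = df.apply(lambda col: col.map(zero_cell_to_array))

print(result)
```

To limit this to some columns only, the same call can be applied to a column subset such as df[zeros_columns] (name taken from the comment above) and the result assigned back to those columns.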
