
I'm pretty new here. I have a pandas DataFrame like this:

                078401115X            0790747324            0790750708
A10ODC971MDHV8           0                     0  [(354, 1), (393, 1)]
A16CZRQL23NOIW           0  [(124, 1), (697, 1)]                     0
A19ZXK9HHVRV1X           0                     0                     0

I also have the labels of the columns that are zero (for the first row):

['078401115X',
'0790747324']

Now I'm trying to store NumPy arrays of zeros in those positions of the DataFrame. Is there any way to do that directly, without a for loop? I managed to do it with scalar values, but I can't do it with NumPy arrays.

Thank you very much for your help.

  • Whatever your column is, you should be able to do df.loc[df['col']==0, 'col'] = df.index (commented Apr 29, 2016 at 12:28)
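A minimal sketch of the pattern suggested in that comment, using a hypothetical one-column frame (the column name 'col' and the labels 'a', 'b', 'c' are made up for illustration). A scalar on the right-hand side is broadcast to every row the boolean mask selects, while a Series is aligned on the index labels, so each selected row receives the entry carrying its own label.

```python
import pandas as pd

# Hypothetical stand-in data: a single object column named 'col'
df = pd.DataFrame({"col": [0, 5, 0]}, index=["a", "b", "c"], dtype=object)

# Boolean mask of the rows whose value is 0
mask = df["col"] == 0

# A scalar right-hand side is broadcast to every selected row
flagged = df.copy()
flagged.loc[mask, "col"] = -1

# A Series right-hand side is aligned on the index labels, so each selected
# row receives its own label (the idea behind `= df.index` in the comment)
labelled = df.copy()
labelled.loc[mask, "col"] = pd.Series(df.index, index=df.index)

print(flagged["col"].tolist())   # [-1, 5, -1]
print(labelled["col"].tolist())  # ['a', 5, 'c']
```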

2 Answers


Multi-row assignment with .loc and DataFrame dimension matching

Here is a full solution that uses .loc with the zero indexes and avoids your dimension/length error:

error: 'cannot set using a list-like indexer with a different length than the value'

To match the dimensions, do not assign the raw array. Instead, create a DataFrame of zero arrays with the shape of the target (indexed by the zero indexes) and assign that to the zero indexes.

import numpy as np
import pandas as pd
from io import StringIO  # Python 3 replacement for cStringIO

# Create example DataFrame
df_text = '''
078401115X|                                                0
0790747324|                                                0
0790750708|[(354, 1), (393, 1), (447, 1), (642, 1), (886,1)]
0800103688|                                                0
5556167281|[(41, 1), (86, 1), (341, 1), (362, 1), (419, 10)]
6300157423|                                                0
6300266850|                                                0
6301699599|                                                0
6301723465|                                                0
'''
# dtype=object keeps every cell a plain Python object, so whole arrays can be stored later
df = pd.read_csv(StringIO(df_text), sep='|', index_col=0, header=None,
                 skipinitialspace=True, dtype=object)

print('Original DataFrame:')
print(df)
print()

# Find indexes with zero data in first column
zero_indexes = df[df[1] == '0'].index

print('Zero Indexes:')
print(zero_indexes.tolist())
print()

# Build one zero array per zero index, so the value's shape matches the rows being set
zeros_df = pd.DataFrame({1: [np.zeros(4) for _ in zero_indexes]}, index=zero_indexes)

# Assign the numpy zero arrays to those indexes
df.loc[zero_indexes] = zeros_df

print('New DataFrame:')
print(df)

Original DataFrame:
                                                            1
0                                                            
078401115X                                                  0
0790747324                                                  0
0790750708  [(354, 1), (393, 1), (447, 1), (642, 1), (886,1)]
0800103688                                                  0
5556167281  [(41, 1), (86, 1), (341, 1), (362, 1), (419, 10)]
6300157423                                                  0
6300266850                                                  0
6301699599                                                  0
6301723465                                                  0

Zero Indexes:
['078401115X', '0790747324', '0800103688', '6300157423', '6300266850', '6301699599', '6301723465']

New DataFrame:
                                                            1
0                                                            
078401115X                               [0.0, 0.0, 0.0, 0.0]
0790747324                               [0.0, 0.0, 0.0, 0.0]
0790750708  [(354, 1), (393, 1), (447, 1), (642, 1), (886,1)]
0800103688                               [0.0, 0.0, 0.0, 0.0]
5556167281  [(41, 1), (86, 1), (341, 1), (362, 1), (419, 10)]
6300157423                               [0.0, 0.0, 0.0, 0.0]
6300266850                               [0.0, 0.0, 0.0, 0.0]
6301699599                               [0.0, 0.0, 0.0, 0.0]
6301723465                               [0.0, 0.0, 0.0, 0.0]
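If only one column has to be filled, a variant of the same idea is to target that column explicitly and assign a Series holding one zero array per selected label. This is a sketch, not the answer's code: the three-row frame below is a hypothetical stand-in for the one read from df_text, and dtype=object is assumed so the column can hold arrays.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the answer's frame: column 1 holds either the
# string '0' or the string form of a list of tuples, as read from text
df = pd.DataFrame(
    {1: ["0", "[(354, 1), (393, 1)]", "0"]},
    index=["078401115X", "0790750708", "0800103688"],
    dtype=object,  # object dtype so a cell may hold a whole NumPy array
)

# Labels of the rows whose value is the string '0'
zero_indexes = df[df[1] == "0"].index

# One separate length-4 zero array per selected label; wrapping them in a
# Series makes pandas treat each array as a single cell and align on labels
zeros = pd.Series([np.zeros(4) for _ in zero_indexes], index=zero_indexes, dtype=object)
df.loc[zero_indexes, 1] = zeros

print(df)
```

Building the arrays with a list comprehension gives every cell its own array, so modifying one cell in place later does not change the others.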

3 Comments

That's the solution! But I was confused: I have to do the same thing with rows instead of columns. How can I do that? What do I have to change in your code? Thank you very much.
I'm not sure I understand what you are asking. Can you post the data you would like to see?
I have updated the post; I hope it is understandable now. @tmthydvnprt
df.loc[list_indices, column_name] = np.zeros(4)

is what you want. df is your DataFrame, list_indices is the list of indices of the rows that are 0, and np.zeros(4) makes an array of four zeros. Change the 4 if you want a different length, of course.

df.loc[list_indices, column_name] selects the rows whose index is in list_indices and the column named column_name.

3 Comments

Thanks for your answer, Mathias, but I have a problem: it only works for the first column. Is there any way to also specify the column?
@AAG Sorry for my late reply. Yes, there is! See my updated answer. It basically adds column_name to the .loc[] indexer.
Thanks so much. I got this error: 'shape mismatch: value array of shape (4,) could not be broadcast to indexing result of shape (86,)' when I tried this: pd.loc[index_row, zeros_columns] = np.zeros(4)
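The shape-mismatch error in the last comment is consistent with the bare np.zeros(4) being broadcast element-wise: its 4 elements are spread over the selected cells (86 in that case) instead of being stored whole in each cell. A minimal sketch of the row-wise case from the question, assuming a hypothetical object-dtype frame rebuilt from the question's sample data: find the zero columns of one row, then assign a Series with one array per target column, indexed by those column labels.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mirroring the question: rows are users, columns are items,
# and each cell is 0 or a list of (id, count) tuples; dtype=object keeps them as-is
df = pd.DataFrame(
    {
        "078401115X": [0, 0],
        "0790747324": [0, [(124, 1), (697, 1)]],
        "0790750708": [[(354, 1), (393, 1)], 0],
    },
    index=["A10ODC971MDHV8", "A16CZRQL23NOIW"],
    dtype=object,
)

row = "A10ODC971MDHV8"

# Column labels whose cell is 0 in this row (a vectorised comparison, no loop)
zero_cols = df.columns[(df.loc[row] == 0).to_numpy()]

# Passing np.zeros(4) directly would be broadcast element-wise over the selected
# cells; a Series with one array per target column is aligned on the labels instead
df.loc[row, zero_cols] = pd.Series(
    [np.zeros(4) for _ in zero_cols], index=zero_cols, dtype=object
)

print(df.loc[row])
```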
