Converting a pandas object to a NumPy array

Question

I have some simple code to find similar rows in a dataset.

import numpy as np

h = 0
count = 0
# 227690
similarIndexes = np.zeros((143,))  # originally named deletedIndexes, but the loop writes to similarIndexes
for i in np.arange(len(data)):
    if data[i-1, 2] == data[i, 2]:
        similarIndexes[h] = int(i)
        h = h + 1
        count = count + 1
        print("similar found in -->", i, " there are--->", count)

It works correctly when data is a numpy.ndarray, but if data is a pandas object, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in smilarData
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1658, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1665, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1005, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2874, in get
    _, block = self._find_block(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3186, in _find_block
    self._check_have(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3193, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named (-1, 2)'

What should I do to use this code? If converting the pandas object to a NumPy array would help, how can I do that?

You can just call .values on the df to get it as a np array; df.values will work.
– EdChum, Commented Oct 24, 2015 at 20:32

redacted · Accepted Answer · 2015-10-24 08:03:25Z

1

I cannot comment on Adrienne's answer yet, so I would like to add that DataFrames have a built-in method to convert a df to an array, i.e. a matrix:

>>> import pandas as pd
>>> df = pd.DataFrame({"a":range(5),"b":range(5,10)})
>>> df
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
>>> mat = df.as_matrix()  # removed in pandas 1.0; use df.to_numpy() or df.values there
>>> mat
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
>>> col = [x[0] for x in mat]  # to get a certain column
>>> col
[0, 1, 2, 3, 4]
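
On recent pandas versions the same conversion can be sketched with to_numpy() (available since pandas 0.24) or the .values attribute mentioned in the comment above. The frame mirrors the example above; the slice mat[:, 0] is offered as a NumPy alternative to the list comprehension, not as the only way to pick a column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# to_numpy() returns the frame's data as a 2-D numpy.ndarray
mat = df.to_numpy()

# NumPy slicing: all rows, first column
col = mat[:, 0]

print(type(mat).__name__, mat.shape)
print(col.tolist())
```

The result is an ordinary ndarray, so the (row, column) indexing from the question applies to it directly.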

Also, to find duplicated rows you can do:

>>> df2
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
5  0  5
>>> df2[df2.duplicated()]
   a  b
5  0  5
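
Note that duplicated() looks at whole rows anywhere in the frame, while the loop in the question compares each row only with the one before it, on the third column. A rough sketch of that comparison without an explicit loop, assuming a small made-up frame (df3 and its values are illustrative, not from the question):

```python
import pandas as pd

# Illustrative data; the third column (position 2) is the one compared
df3 = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                    "b": [9, 8, 7, 6, 5],
                    "c": [10, 10, 20, 30, 30]})

third = df3.iloc[:, 2]

# shift() moves values down one row, so each row is compared with the
# previous one; the first row is compared with NaN and never matches
same_as_previous = third.eq(third.shift())

# Index labels of the rows that repeat the previous row's value
similar_indexes = df3.index[same_as_previous.to_numpy()].tolist()
print(similar_indexes, len(similar_indexes))
```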

edited Oct 24, 2015 at 8:03

answered Oct 24, 2015 at 7:56


Adrienne · Accepted Answer · 2015-10-24 06:32:56Z

1

To convert a pandas dataframe to a numpy array:

import numpy as np
np.array(dataFrame)
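
As a small illustration of what this gives you, here is a sketch with a made-up three-column frame (data_frame and its values are assumptions for the example). It also shows a detail worth knowing about the question's loop: on an ndarray a negative row index counts from the end, so starting at i = 0 compares the first row with the last one rather than raising an error.

```python
import numpy as np
import pandas as pd

# Illustrative frame with three columns (values are made up)
data_frame = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 7, 8]})

data = np.array(data_frame)  # 2-D ndarray, one row per DataFrame row

# Tuple indexing (row, column) works on the array as in the question's code
print(data[1, 2] == data[0, 2])

# A negative row index wraps around: data[-1, 2] is the last row's value
print(data[-1, 2] == data[2, 2])
```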

answered Oct 24, 2015 at 6:32


Yannis P. · Accepted Answer · 2015-10-24 08:14:50Z

0

I agree with the previous answers, but if you want to work directly with pandas objects, note that DataFrame elements are accessed by position through .iloc rather than by plain square brackets. In your code you should write, e.g.:

if(data.iloc[i-1,2]==data.iloc[i,2]):

See the documentation for more
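
For context, data[i-1, 2] on a DataFrame is treated as a lookup of a column labelled (i-1, 2), which is why the traceback ends in a KeyError. A sketch of the question's loop rewritten with .iloc, assuming a made-up frame and starting at row 1 so that the first row is not compared with the last one (a list replaces the preallocated array purely for brevity):

```python
import pandas as pd

# Illustrative data; only the third column (position 2) is compared
data = pd.DataFrame({"a": [1, 2, 3, 4],
                     "b": [5, 6, 7, 8],
                     "c": [0, 0, 1, 1]})

similar_indexes = []
for i in range(1, len(data)):
    # .iloc[row, column] selects by integer position, like ndarray indexing
    if data.iloc[i - 1, 2] == data.iloc[i, 2]:
        similar_indexes.append(i)

print("similar found in -->", similar_indexes, " there are --->", len(similar_indexes))
```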

answered Oct 24, 2015 at 8:14
