2

I have some simple code to find similar rows in a dataset:

import numpy as np

h = 0
count = 0
#227690
similarIndexes = np.zeros((143,))  # indexes of rows whose column 2 matches the previous row
for i in np.arange(len(data)):
    if data[i-1, 2] == data[i, 2]:
        similarIndexes[h] = int(i)
        h = h + 1
        count = count + 1
        print("similar found in -->", i, " there are--->", count)

It works correctly when data is a numpy.ndarray, but if data is a pandas object, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in smilarData
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1658, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1665, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1005, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2874, in get
    _, block = self._find_block(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3186, in _find_block
    self._check_have(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3193, in _check_have
    raise KeyError('no item named %s' % com.pprint_thing(item))
KeyError: u'no item named (-1, 2)'

What should I do to use this code with a pandas object? If converting the pandas object to a NumPy array would help, how can I do that?

You can just call .values on the df to get it as a NumPy array; df.values will work. Commented Oct 24, 2015 at 20:32

3 Answers

1

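I cannot comment on Adrienne's answer yet, so I would like to add that DataFrames have a built-in method to convert a df to an array (i.e. a matrix). Note that as_matrix() was removed in later pandas releases; there, use df.values or df.to_numpy() instead.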

>>> df = pd.DataFrame({"a":range(5),"b":range(5,10)})
>>> df
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
>>> mat = df.as_matrix()
>>> mat
array([[0, 5],
       [1, 6],
       [2, 7],
       [3, 8],
       [4, 9]])
>>> col = [x[0] for x in mat]  # pick out the first column as a list
>>> col
[0, 1, 2, 3, 4]

Also, to find duplicated rows you can do:

>>> df2
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9
5  0  5
>>> df2[df2.duplicated()]
   a  b
5  0  5
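
As a small, hedged extension of the idea above: `duplicated()` also accepts `subset` and `keep` arguments, which can help when only some columns matter or when you want every member of a duplicate group. The sketch below rebuilds the `df2` shown above; the names `dup_in_a` and `all_dups` are only illustrative.

```python
import pandas as pd

# Same contents as the df2 printed above.
df2 = pd.DataFrame({"a": [0, 1, 2, 3, 4, 0], "b": [5, 6, 7, 8, 9, 5]})

# Rows that repeat an earlier row when only column "a" is considered.
dup_in_a = df2[df2.duplicated(subset=["a"])]

# keep=False flags every row of a duplicate group, including the first one.
all_dups = df2[df2.duplicated(keep=False)]

print(dup_in_a)
print(all_dups)
```

Note that `duplicated()` looks for repeats anywhere in the frame, not only in adjacent rows, so it answers a slightly different question than the loop in the question.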

1

To convert a pandas dataframe to a numpy array:

import numpy as np
np.array(dataFrame)
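
Once the data is a NumPy array, the adjacent-row comparison from the question can also be written without an explicit loop. This is only a sketch under assumptions: the sample DataFrame and the names `same_as_prev` and `similar_indexes` are made up for illustration, and it compares each row only with the row directly before it (so row 0 is never compared with the last row).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column position 2 ("z") is the one being compared.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [9, 8, 7, 6], "z": [5, 5, 6, 6]})

data = np.array(df)  # 2-D ndarray with the same contents as the DataFrame

# Element-wise comparison of rows 1..n-1 with rows 0..n-2 in column 2.
same_as_prev = data[1:, 2] == data[:-1, 2]

# Shift by one so the result holds the index of the later row of each pair.
similar_indexes = np.flatnonzero(same_as_prev) + 1

print(similar_indexes)
```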


0

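I agree with the previous answers, but in case you want to work directly with pandas objects, note that DataFrame items are accessed in their own way. With a DataFrame, data[i-1, 2] is treated as a lookup of a column labelled (i-1, 2), which is why you get the KeyError; positional access goes through .iloc. In your code you should write, e.g.: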

if(data.iloc[i-1,2]==data.iloc[i,2]):

See the documentation for more
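
To make the difference concrete, here is a minimal sketch, assuming a small made-up DataFrame (the column names and the variables `col`, `same_as_prev`, and `similar_positions` are hypothetical). It shows the plain-bracket lookup failing, `.iloc` working by position, and a `shift`-based comparison that stays entirely in pandas.

```python
import pandas as pd

# Hypothetical sample data; the third column (position 2) is compared.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [9, 8, 7, 6], "z": [5, 5, 6, 6]})

# Plain brackets look for a column labelled (0, 2), which does not exist.
try:
    df[0, 2]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# .iloc selects by integer position: row 0, column 2.
first_value = df.iloc[0, 2]

# Compare the column with itself shifted down by one row.
col = df.iloc[:, 2]
same_as_prev = col == col.shift(1)  # the first row is compared with NaN, so it is False
similar_positions = [i for i, flag in enumerate(same_as_prev) if flag]

print(similar_positions)
```

Unlike the original loop, the shifted comparison never pairs the first row with the last one.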
