I have some simple code to find similar rows in a dataset.
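    import numpy as np

    h = 0
    count = 0
    #227690
    similarIndexes = np.zeros((143,))  # was created as deletedIndexes but filled as similarIndexes
    for i in np.arange(len(data)):
        # when i == 0, i-1 is -1, so row 0 is compared with the last row
        if data[i-1, 2] == data[i, 2]:
            similarIndexes[h] = int(i)
            h = h + 1
            count = count + 1
            print("similar found in -->", i, " there are--->", count)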
It works correctly when data is a numpy.ndarray, but if data is a pandas object, I get the following error:
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 7, in smilarData
      File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1658, in __getitem__
        return self._getitem_column(key)
      File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1665, in _getitem_column
        return self._get_item_cache(key)
      File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1005, in _get_item_cache
        values = self._data.get(item)
      File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 2874, in get
        _, block = self._find_block(item)
      File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3186, in _find_block
        self._check_have(item)
      File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3193, in _check_have
        raise KeyError('no item named %s' % com.pprint_thing(item))
    KeyError: u'no item named (-1, 2)'
What should I do to use this code with a pandas object? If converting the pandas object to a numpy array would help, how can I do that?
Call .values on the df to get the df as a np array; df.values will work.
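The reason for the KeyError is that indexing a DataFrame with square brackets looks up a column label, so data[i-1, 2] asks for a column named (-1, 2) instead of row -1, column 2. Here is a minimal sketch of the conversion, assuming a small made-up DataFrame (the sample values and the names similar_indexes, same_as_previous and shift_indexes are only for illustration). Unlike the loop in the question, it starts at 1 so the first row is not compared with the last one. The last lines show one possible way to do the same comparison positionally with iloc and shift, without converting at all.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column 2 is the one being compared.
df = pd.DataFrame([[1, "a", 10],
                   [2, "b", 10],
                   [3, "c", 20],
                   [4, "d", 20],
                   [5, "e", 30]])

# .values returns a plain numpy.ndarray, so data[i, 2] means row i, column 2.
data = df.values

# Start at 1 so row 0 is not compared with data[-1] (the last row).
similar_indexes = [i for i in range(1, len(data))
                   if data[i - 1, 2] == data[i, 2]]
print(similar_indexes)  # [1, 3]

# Alternative without converting: compare the column with itself shifted
# down by one row, then collect the positions where they match.
same_as_previous = df.iloc[:, 2] == df.iloc[:, 2].shift()
shift_indexes = list(np.flatnonzero(same_as_previous.values))
```

With mixed column types, df.values gives an object array, which is fine for this equality check but slower than a purely numeric array.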