I am trying to find rows whose column values are equal to each other, or differ only slightly from each other (by 1, 0.5, etc.), or in which at least 2 columns are equal.

df.head(10)

         a        b        c        d
0  1128.70  1137.00  1121.30  1132.05
1  1130.20  1142.30  1109.10  1114.90
2  1113.40  1127.90  1109.85  1124.55
3  1126.25  1129.30  1111.20  1124.50
4  1124.45  1141.10  1121.00  1137.95
5  1137.90  1141.90  1094.50  1098.25
6  1097.60  1117.00  1095.65  1112.50
7  1111.05  1119.10  1089.85  1092.10
8  1092.75  1097.60  1074.10  1083.75
9  1083.60  1096.05  1079.10  1087.20

In the table above, I am trying to find rows whose values are equal (or close) to each other. For example:

125  1020.50  1020.50  1020.50  1020.50
452  1047.88  1047.88  1046.95  1048.01
  • Can you define what the tolerance of similar numbers is? A difference of 1, 10, etc? Commented Apr 25, 2018 at 8:11
  • Actually "n" is the tolerance, so it can be changed. Commented Apr 25, 2018 at 8:27

2 Answers

1

Can you just look at the standard deviation?

>>> import pandas as pd
>>> df = pd.DataFrame({'a': {0: 1128.7, 1: 1130.2, 2: 1113.4, 3: 1126.25, 4: 1124.45, 5: 1137.9, 6: 1097.6, 7: 1111.05, 8: 1092.75, 9: 1083.6, 125: 1020.5, 452: 1047.88}, 'b': {0: 1137.0, 1: 1142.3, 2: 1127.9, 3: 1129.3, 4: 1141.1, 5: 1141.9, 6: 1117.0, 7: 1119.1, 8: 1097.6, 9: 1096.05, 125: 1020.5, 452: 1047.88}, 'c': {0: 1121.3, 1: 1109.1, 2: 1109.85, 3: 1111.2, 4: 1121.0, 5: 1094.5, 6: 1095.65, 7: 1089.85, 8: 1074.1, 9: 1079.1, 125: 1020.5, 452: 1046.95}, 'd': {0: 1132.05, 1: 1114.9, 2: 1124.55, 3: 1124.5, 4: 1137.95, 5: 1098.25, 6: 1112.5, 7: 1092.1, 8: 1083.75, 9: 1087.2, 125: 1020.5, 452: 1048.01}})
>>> df

           a        b        c        d
0    1128.70  1137.00  1121.30  1132.05
1    1130.20  1142.30  1109.10  1114.90
2    1113.40  1127.90  1109.85  1124.55
3    1126.25  1129.30  1111.20  1124.50
4    1124.45  1141.10  1121.00  1137.95
5    1137.90  1141.90  1094.50  1098.25
6    1097.60  1117.00  1095.65  1112.50
7    1111.05  1119.10  1089.85  1092.10
8    1092.75  1097.60  1074.10  1083.75
9    1083.60  1096.05  1079.10  1087.20
125  1020.50  1020.50  1020.50  1020.50
452  1047.88  1047.88  1046.95  1048.01

>>> import numpy as np
>>> np.std(df.values, axis=1)

array([  5.70869676,  13.02005664,   7.50120824,   6.92101645,
         8.56084838,  21.84866629,   9.22688836,  12.40707963,
         8.97754142,   6.22217556,   0.        ,   0.42479407])

You can see that your last two example rows have much lower standard deviations than the others; the standard deviation is exactly 0 when all the values in a row are equal. Now you can just compare against a threshold:

>>> n = 1
>>> np.std(df.values, axis=1) < n

array([False, False, False, False, False, False, False, False, False,
       False,  True,  True], dtype=bool)
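
To pull the matching rows out of the DataFrame, the boolean array can be used directly as a row mask. A minimal sketch, assuming `n` is the tolerance on the per-row standard deviation; only four of the rows shown above are repeated here to keep it short, and the name `close_rows` is just illustrative:

```python
import numpy as np
import pandas as pd

# A small subset of the rows shown above, for illustration only.
df = pd.DataFrame(
    {
        "a": [1128.70, 1130.20, 1020.50, 1047.88],
        "b": [1137.00, 1142.30, 1020.50, 1047.88],
        "c": [1121.30, 1109.10, 1020.50, 1046.95],
        "d": [1132.05, 1114.90, 1020.50, 1048.01],
    },
    index=[0, 1, 125, 452],
)

n = 1  # tolerance on the per-row standard deviation

# Boolean mask with one entry per row, as in the comparison above.
mask = np.std(df.values, axis=1) < n

# Boolean indexing keeps only the rows where the mask is True.
close_rows = df[mask]
print(close_rows)
```

Because the mask has one entry per row in the same order as `df`, the original index labels (125 and 452 here) are preserved in the result.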

2 Comments

Sorry, I was away for a while. But how can I get these rows from the DataFrame? Let's say I have assigned the returned array to "arr". How can I get the DataFrame rows that correspond to the True entries of this array?
I have got the idea. That's a perfect solution for me. Thanks a million!
0

You could convert your data to a NumPy array, npData,

then rowIndex = [i for i in range(npData.shape[0]) if np.std(npData[i, :]) <= threshold]

Something like that? You may need to normalize your data before thresholding on the standard deviation.
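
The answer does not say which normalization it has in mind. One possible reading, shown here purely as an assumption, is to divide each row's standard deviation by the row's mean so that the threshold becomes relative to the size of the values. In this sketch `rel_threshold` and its value are hypothetical, and the three rows are taken from the question:

```python
import numpy as np

# Three rows from the question, for illustration only.
npData = np.array([
    [1128.70, 1137.00, 1121.30, 1132.05],
    [1020.50, 1020.50, 1020.50, 1020.50],
    [1047.88, 1047.88, 1046.95, 1048.01],
])

rel_threshold = 0.001  # hypothetical: std at most 0.1% of the row mean

# Relative spread of each row: standard deviation divided by the mean.
rel_std = np.std(npData, axis=1) / np.abs(np.mean(npData, axis=1))

# Positions of the rows whose relative spread is within the threshold.
rowIndex = np.flatnonzero(rel_std <= rel_threshold)
print(rowIndex)
```

A relative threshold like this behaves the same whether the values are around 10 or around 1000, which an absolute threshold on the standard deviation does not.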

Or use zscore from scipy.stats on the npData array along the appropriate axis (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zscore.html), and then use the z-scored array to find rows where the sum of the absolute values across the columns is <= threshold.

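The best solution I found for the "at least 2 columns equal" case:

from itertools import combinations
import numpy

n = 2  # tolerance
test = numpy.array([], dtype=int)
# compare every pair of columns, row by row
for (x1, x2) in combinations(df.values.T, 2):
    diff = numpy.where(abs(x1 - x2) < n)  # row positions where this pair is within n
    test = numpy.union1d(test, diff[0])   # accumulate the matching row positions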

Alternatively, append each diff[0] to test instead of taking the union, and then use numpy.histogram to count how often each row appears, which finds the rows where three or more columns are equal.
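
The counting idea can be sketched without a histogram by keeping a per-row tally of close column pairs. This is an illustrative variant rather than the answer's exact code: the name `pair_counts` and the tolerance value are assumptions, and only three rows from the question are used.

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Three rows from the question, for illustration only.
df = pd.DataFrame(
    {
        "a": [1128.70, 1020.50, 1047.88],
        "b": [1137.00, 1020.50, 1047.88],
        "c": [1121.30, 1020.50, 1046.95],
        "d": [1132.05, 1020.50, 1048.01],
    },
    index=[0, 125, 452],
)

n = 0.5  # tolerance (hypothetical value)

values = df.values
# For each row, count the column pairs whose values differ by less than n.
pair_counts = np.zeros(len(df), dtype=int)
for i, j in combinations(range(values.shape[1]), 2):
    pair_counts += np.abs(values[:, i] - values[:, j]) < n

# Rows with at least one close pair, i.e. at least 2 columns within n.
at_least_two = df[pair_counts >= 1]
print(pair_counts)
print(at_least_two)
```

Note that the tally counts pairs, not columns: three mutually close columns contribute three pairs and four contribute six. With a nonzero tolerance, closeness is also not transitive, so the mapping from pair counts to "k columns are equal" is only approximate.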
