
I want to write an if statement that shows all REF_INT values that are duplicated. I tried this:

(df_picru['REF_INT'].value_counts()==1)

It shows every value with True or False, but I want to do something like this:

if (df_picru['REF_INT'].value_counts()==1):
    print(df_picru['REF_INT'])

3 Answers

df_picru['new'] = (df_picru['REF_INT']
                   .duplicated(keep=False)
                   .map({True: 'duplicates', False: 'unique'}))

print(df_picru)
   REF_INT         new
0        1      unique
1        2  duplicates
2        3      unique
3        8  duplicates
4        8  duplicates
5        2  duplicates
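The keep=False argument is what makes every occurrence of a repeated value count as a duplicate. A minimal sketch, assuming the same sample data as the other answer, comparing it with the default keep='first':

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# keep='first' (the default) flags only the second and later occurrences
first_only = df_picru['REF_INT'].duplicated()

# keep=False flags every occurrence of a value that appears more than once
all_dups = df_picru['REF_INT'].duplicated(keep=False)

print(first_only.tolist())  # [False, False, False, False, True, True]
print(all_dups.tolist())    # [False, True, False, True, True, True]
```

With the default, the first 2 and the first 8 would be labelled unique, which is why keep=False is used for the labelling above.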

1 Comment

I only wanted to see whether any value is duplicated, because I have a lot of data, and when I applied this it didn't show me all the columns.

I think you need duplicated to build a boolean mask, and numpy.where to create the new column:

mask = df_picru['REF_INT'].duplicated(keep=False)

Sample:

import numpy as np
import pandas as pd

df_picru = pd.DataFrame({'REF_INT':[1,2,3,8,8,2]})

mask = df_picru['REF_INT'].duplicated(keep=False)
print (mask)
0    False
1     True
2    False
3     True
4     True
5     True
Name: REF_INT, dtype: bool

df_picru['new'] = np.where(mask, 'duplicates', 'unique')
print (df_picru)
   REF_INT         new
0        1      unique
1        2  duplicates
2        3      unique
3        8  duplicates
4        8  duplicates
5        2  duplicates
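If the goal is to see the duplicated rows themselves with all their columns, the same mask can be used for boolean indexing. A minimal sketch; the extra column other is hypothetical and only there to show that whole rows are returned:

```python
import pandas as pd

# 'other' is a hypothetical extra column, used only to show that whole rows are kept
df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2],
                         'other': ['a', 'b', 'c', 'd', 'e', 'f']})

mask = df_picru['REF_INT'].duplicated(keep=False)

# Boolean indexing keeps every column of the rows where mask is True
dup_rows = df_picru[mask]
print(dup_rows)
```

With many rows, dup_rows.sort_values('REF_INT') puts equal values next to each other, which makes the duplicates easier to inspect.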

If you need to check whether at least one value is duplicated, use any to reduce the boolean mask (an array) to a single scalar True or False:

if mask.any():
    print ('at least one duplicated')
at least one duplicated
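Putting this together gives the if statement the question asks for. A minimal sketch, assuming the sample data above, that reports the duplicated values only when there are any; Series.is_unique is shown as a shorter way to get the same yes/no answer:

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

mask = df_picru['REF_INT'].duplicated(keep=False)

# mask.any() collapses the boolean Series to a single True/False,
# so it can be used directly in an if statement
if mask.any():
    # unique() lists each duplicated value once, in order of first appearance
    dup_values = df_picru.loc[mask, 'REF_INT'].unique()
    print('duplicated values:', dup_values)
else:
    dup_values = []
    print('all values are unique')

# Series.is_unique answers "is anything repeated?" without building a mask
has_duplicates = not df_picru['REF_INT'].is_unique
```

A bare if on the Series itself, as in the question, raises ValueError because a Series has no single truth value; any() (or is_unique) is what turns it into one.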

1 Comment

Thank you @jezrael. I was looking for "at least one duplicated", because I wanted to know whether any value in my column is duplicated.

Another solution using groupby.

# group by REF_INT, count the rows in each group, and label the value 'Duplicated' if the count is greater than 1
df_picru.groupby('REF_INT').apply(lambda x: 'Duplicated' if len(x)> 1 else 'Unique')
Out[21]: 
REF_INT
1        Unique
2    Duplicated
3        Unique
8    Duplicated
dtype: object
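The groupby result above has one label per distinct value. A minimal sketch, assuming the same sample data, that gets the same per-value labels with size() and also broadcasts the group counts back to each row with transform('size'), so the labels line up with the original rows:

```python
import numpy as np
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

# Per-value view: size() counts the rows in each REF_INT group
counts = df_picru.groupby('REF_INT').size()
per_value = counts.gt(1).map({True: 'Duplicated', False: 'Unique'})

# Per-row view: transform('size') repeats each group's count on every row of that group
row_counts = df_picru.groupby('REF_INT')['REF_INT'].transform('size')
df_picru['new'] = np.where(row_counts > 1, 'Duplicated', 'Unique')
print(df_picru)
```

size() avoids calling a Python lambda once per group, which tends to matter on large data.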

value_counts can actually work if you make a minor change:

df_picru.REF_INT.value_counts()[lambda x: x>1]
Out[31]: 
2    2
8    2
Name: REF_INT, dtype: int64
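The filtered value_counts result can also drive the if statement and be turned back into a row filter. A minimal sketch, assuming the same sample data, using isin to select the rows whose value occurs more than once:

```python
import pandas as pd

df_picru = pd.DataFrame({'REF_INT': [1, 2, 3, 8, 8, 2]})

counts = df_picru['REF_INT'].value_counts()
dup_counts = counts[counts > 1]  # values that occur more than once

# An empty result means there are no duplicates, so test it with .empty
if not dup_counts.empty:
    # isin() turns the duplicated values back into a row mask over df_picru
    dup_rows = df_picru[df_picru['REF_INT'].isin(dup_counts.index)]
    print(dup_rows)
```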
