1

Suppose I have the following pandas data frame, df1, in a jupyter notebook from an excel file:

Name    ID      Password
A       User_1  PW_1
A       User_2  PW_2
A       User_3  PW_3
B       User_4  PW_4
B       User_5  PW_5
C       User_6  PW_6

I'd like to add a new column, called STAT, that goes through the Name column, and for every item in Name, if the previous cell in Name contained the same item, print dup (for duplicate) in STAT; otherwise, don't put anything. In my example above, Users 2,3, and 5 should have dup in the SRC column after my operation.

Here is my attempt. I add a new blank column called STAT using df1.insert, and then I run:

for index, name in enumerate(df1['Name']):
    if index > 0:
        if df1['Name'][index - 1] == name:
            df1.ix[index, 'STAT'] = 'dup'`

This works fine, but I'd like to know

a) if it can be improved

and more importantly

b) Why it's throwing a A value is trying to be set on a copy of a slice from a DataFrame warning despite my using .ix. Even .loc throws the warning.

It would be easy to check ordinarily, but I'm using jupyter notebook in PyCharm, and every time I reload the file I get a _xrsf argument missing from POST.

Relevant snippet of code, applied to my actual example. df names will differ:

sort_full = full_set.sort_values(['Name','SRC'])
dupless_full = sort_full.drop_duplicates(subset = ['Name', 'ER', 'ID', 
'PW'], keep = 'last')
dupless_full.reset_index(drop = True, inplace = True)

dupless_full['STAT'] = np.where(dupless_full['Name'] == 
dupless_full['Name'].shift(), 'dup', "")
2
  • Are values in Name column sorted? Commented Sep 18, 2017 at 5:11
  • Yes. In fact, they are sorted by another column, called SRC, taking values A or B, after they have been sorted by name. I chose to not include that information. Commented Sep 18, 2017 at 22:02

2 Answers 2

4

You can use np.where

df1['Stat'] = np.where(df['Name'] == df['Name'].shift(), 'Dupe', np.nan)

    Name    ID      Password    Stat
0   A       User_1  PW_1        nan
1   A       User_2  PW_2        Dupe
2   B       User_3  PW_3        nan
3   C       User_4  PW_4        nan
Sign up to request clarification or add additional context in comments.

7 Comments

Well, again, this works, and is faster than my solution, but it still throws the same warning.
How did you generate the dataframe? If its not generated using .copy(), it would lead to copy warning.
I'm not using .copy(), because I was told that was never the correct way to generate a data frame. Is that nonsense?
Can you post the code that you are using to create the dataframe?
Tacked it on to the end.
|
0

If values in column Name are sorted is possible use duplicated for boolean mask:

df1['Stat'] = np.where(df1['Name'].duplicated(), 'Dupe', '')
print (df1)
  Name      ID Password  Stat
0    A  User_1     PW_1      
1    A  User_2     PW_2  Dupe
2    B  User_3     PW_3      
3    C  User_4     PW_4    

If values are not sorted, I add comparison with another answer:

df1['Stat_shift'] = np.where(df1['Name'] == df1['Name'].shift(), 'Dupe', np.nan)
df1['Stat_duplicated'] = np.where(df1['Name'].duplicated(), 'Dupe', '')
print (df1)
  Name      ID Password Stat_shift Stat_duplicated
0    A  User_1     PW_1        nan                
1    A  User_2     PW_2       Dupe            Dupe
2    B  User_3     PW_3        nan                
3    A  User_2     PW_2        nan            Dupe
4    C  User_4     PW_4        nan                
5    B  User_3     PW_3        nan            Dupe
6    B  User_3     PW_3       Dupe            Dupe

6 Comments

I think, OP wants to check only previous value, not all duplicates in frame.
It is possible, but if values are sorted this should workling too.
I meant, only if values are sorted, this'll work, but I'm not sure if OP stated that anywhere.
Yes, exactly. So I add this to answer.
I want to list all duplicates as 'dup' except the first one. Let me edit my example, to clarify.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.