Suppose I have the following pandas DataFrame, df1, in a Jupyter notebook, loaded from an Excel file:
Name  ID      Password
A     User_1  PW_1
A     User_2  PW_2
A     User_3  PW_3
B     User_4  PW_4
B     User_5  PW_5
C     User_6  PW_6
I'd like to add a new column, called STAT, that goes through the Name column and, for every item in Name, puts dup (for duplicate) in STAT if the previous cell in Name contained the same item; otherwise it should stay empty. In my example above, Users 2, 3, and 5 should have dup in the STAT column after my operation.
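To make the expected result concrete, here is a minimal sketch that rebuilds the example frame and the STAT values described above. It assumes the frame is constructed by hand rather than read from Excel, and the names `expected_stat` and `expected` are only illustrative.

```python
import pandas as pd

# Example frame built by hand (assumption: stands in for the Excel import)
df1 = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "C"],
    "ID": [f"User_{i}" for i in range(1, 7)],
    "Password": [f"PW_{i}" for i in range(1, 7)],
})

# STAT is "dup" wherever Name equals the Name in the row directly above
expected_stat = ["", "dup", "dup", "", "dup", ""]
expected = df1.assign(STAT=expected_stat)
print(expected)
```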
Here is my attempt. I add a new blank column called STAT using df1.insert, and then I run:
for index, name in enumerate(df1['Name']):
    if index > 0:
        if df1['Name'][index - 1] == name:
            df1.ix[index, 'STAT'] = 'dup'
This works fine, but I'd like to know
a) if it can be improved
and more importantly
b) why it's throwing an "A value is trying to be set on a copy of a slice from a DataFrame" warning (SettingWithCopyWarning) even though I'm using .ix. Even .loc throws the warning.
Ordinarily this would be easy to check, but I'm using a Jupyter notebook in PyCharm, and every time I reload the file I get an "'_xsrf' argument missing from POST" error.
Here is the relevant snippet from my actual code; the DataFrame names differ from the example above:
sort_full = full_set.sort_values(['Name', 'SRC'])
dupless_full = sort_full.drop_duplicates(subset=['Name', 'ER', 'ID', 'PW'],
                                         keep='last')
dupless_full.reset_index(drop=True, inplace=True)
dupless_full['STAT'] = np.where(dupless_full['Name'] ==
                                dupless_full['Name'].shift(), 'dup', "")
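For anyone trying to reproduce this, here is a self-contained sketch of the same pipeline. The rows in `full_set` are hypothetical stand-ins for the real data. The steps are chained and finished with an explicit `.copy()`, which is one common way to make sure the result is an independent frame before a column is assigned to it. Whether that is what removes the warning in this particular setup is an assumption, not something confirmed here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the real full_set
full_set = pd.DataFrame({
    "Name": ["B", "A", "A", "C", "A", "B"],
    "SRC": [1, 2, 1, 1, 3, 2],
    "ER": ["e4", "e2", "e1", "e6", "e3", "e5"],
    "ID": ["User_4", "User_2", "User_1", "User_6", "User_3", "User_5"],
    "PW": ["PW_4", "PW_2", "PW_1", "PW_6", "PW_3", "PW_5"],
})

# Sort, drop exact repeats, renumber the index, and take an explicit copy
dupless_full = (
    full_set.sort_values(["Name", "SRC"])
    .drop_duplicates(subset=["Name", "ER", "ID", "PW"], keep="last")
    .reset_index(drop=True)
    .copy()
)

# Mark a row as "dup" when its Name matches the Name in the previous row
dupless_full["STAT"] = np.where(
    dupless_full["Name"] == dupless_full["Name"].shift(), "dup", ""
)
print(dupless_full)
```

Comparing against `shift()` only looks at the row directly above, so the result depends on the frame being sorted by Name first.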
Is the Name column sorted?