
I want to drop rows that are duplicates with respect to column 'a' in a DataFrame, keeping the last occurrence (take_last=True) unless some condition holds. For instance, given the following DataFrame:

 a | b | c
 1 | S | Blue 
 2 | M | Black
 2 | L | Blue
 1 | L | Green

I want to drop duplicate rows with respect to column 'a', using take_last=True as the general rule, unless some condition holds (say, c = 'Blue'), in which case I want take_last=False.

The resulting df should be:

 a | b | c
 1 | L | Green
 2 | M | Black
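
For context, here is a minimal sketch of the unconditional rule on the example data. It assumes a current pandas release, where the older `take_last=True` argument is spelled `keep='last'`. The variable name `last_only` is only illustrative. It shows why the plain call is not enough: the group a = 2 keeps its 'Blue' row.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({'a': [1, 2, 2, 1],
                   'b': ['S', 'M', 'L', 'L'],
                   'c': ['Blue', 'Black', 'Blue', 'Green']})

# Unconditional rule: keep the last row for every value of 'a'
last_only = df.drop_duplicates(subset='a', keep='last')
print(last_only)
#    a  b      c
# 2  2  L   Blue
# 3  1  L  Green
```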
  • I don't understand. Why does take_last=True mean take only Green? Commented Oct 7, 2015 at 15:11
  • That's not what I was trying to do. I edited the question now. I was just trying to give an example of my situation which is, I want to retain the last duplicate row unless some condition is True. Commented Oct 7, 2015 at 15:16
  • yes, with respect to column 'a' Commented Oct 7, 2015 at 15:22
  • Okay, now you need to explain your condition. "c = 'Blue'" is a condition which applies to a row, not a group. If I assemble all the a = 1 rows into a group, how do I determine whether or not you want the last or the first? Do you want the last unless any of the rows in the group have c = Blue? Commented Oct 7, 2015 at 15:24
  • I don't understand the example resulting df. From your description, I assumed you want to keep the last row with a given value of a, except if there are rows where c is 'Blue' - then you'd like to keep the first one of those rows, but that's not what the example df shows. Commented Oct 7, 2015 at 15:26

1 Answer

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 2, 1],
                   'b': ['S', 'M', 'L', 'L'],
                   'c': ['Blue', 'Black', 'Blue', 'Green']})
print(df)
#   a  b      c
#0  1  S   Blue
#1  2  M  Black
#2  2  L   Blue
#3  1  L  Green

#get first rows of groups, sort them by 'a' and reset the index (dropping the old one)
df1 = df.groupby('a').head(1).sort_values('a').reset_index(drop=True)

#get last rows of groups, sort them by 'a' and reset the index (dropping the old one)
df2 = df.groupby('a').tail(1).sort_values('a').reset_index(drop=True)
print(df1)
#   a  b      c
#0  1  S   Blue
#1  2  M  Black
print(df2)
#   a  b      c
#0  1  L  Green
#1  2  L   Blue

#if value in col c in df1 is 'Blue' replace this row with row from df2 (indexes are same)
mask = df1['c'] == 'Blue'
df1.loc[mask] = df2.loc[mask]
print(df1)
#   a  b      c
#0  1  L  Green
#1  2  M  Black
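
The same idea can be wrapped in a small helper. This is a sketch only, and it follows the literal wording of the question (keep the last row of each group unless that last row matches the condition, in which case keep the first row), which is one possible reading of the rule discussed in the comments. The function name `drop_duplicates_unless` and its parameters are hypothetical, not part of pandas.

```python
import pandas as pd


def drop_duplicates_unless(df, key, cond_col, cond_val):
    """Keep the last row per `key`, unless that row has
    cond_col == cond_val; then keep the group's first row instead.
    (Hypothetical helper; one reading of the rule in the question.)"""
    first = df.drop_duplicates(subset=key, keep='first').set_index(key)
    last = df.drop_duplicates(subset=key, keep='last').set_index(key)
    # Put the first rows in the same key order as the last rows
    first = first.loc[last.index]
    # True for groups whose last row matches the condition
    use_first = last[cond_col].eq(cond_val)
    # Take the first row where the condition holds, the last row elsewhere
    result = pd.concat([first[use_first], last[~use_first]])
    return result.sort_index().rename_axis(key).reset_index()


df = pd.DataFrame({'a': [1, 2, 2, 1],
                   'b': ['S', 'M', 'L', 'L'],
                   'c': ['Blue', 'Black', 'Blue', 'Green']})

result = drop_duplicates_unless(df, 'a', 'c', 'Blue')
print(result)
#    a  b      c
# 0  1  L  Green
# 1  2  M  Black
```

On the example data both readings (start from the first row and swap when it is 'Blue', or start from the last row and swap when it is 'Blue') give the same output, but they can differ on other inputs, so it is worth checking which one matches the intended rule.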