I have this code:

import pandas as pd

test = {"number": ['1555', '1666', '1777', '1888', '1999'],
        "order_amount": ['100.00', '200.00', '-200.00', '300.00', '-150.00'],
        "number_of_refund": ['', '', '1666', '', '1888']}

df = pd.DataFrame(test)

Which returns the following dataframe:

  number order_amount number_of_refund
0   1555       100.00                 
1   1666       200.00                 
2   1777      -200.00             1666
3   1888       300.00                 
4   1999      -150.00             1888    

I want to remove an order and its refund entry if:

  • "number_of_refund" matches a value in the "number" column (the matched order may be missing from the dataframe if the order was placed last month and refunded during the current month)
  • the refund's amount is the exact negative of the matched order's amount (here order 1666 has 200.00 and the refund referencing 1666 has -200.00, so both rows should be removed)

So the result in this case should be:

  number order_amount number_of_refund
0   1555       100.00
3   1888       300.00
4   1999      -150.00             1888

How do I check whether one column's amount appears in another row with the opposite (negated) amount?
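To spell out the rule I'm after, here is a minimal plain-Python sketch (dictionaries instead of a DataFrame, purely illustrative) of the pairing logic:

```python
# Each row of the example dataframe as a plain dict.
orders = [
    {"number": "1555", "order_amount": 100.00, "number_of_refund": ""},
    {"number": "1666", "order_amount": 200.00, "number_of_refund": ""},
    {"number": "1777", "order_amount": -200.00, "number_of_refund": "1666"},
    {"number": "1888", "order_amount": 300.00, "number_of_refund": ""},
    {"number": "1999", "order_amount": -150.00, "number_of_refund": "1888"},
]

# amounts of the original orders, keyed by order number
by_number = {row["number"]: row["order_amount"] for row in orders}

def cancels_out(row):
    """True if this refund row exactly negates the order it refers to."""
    ref = row["number_of_refund"]
    return ref in by_number and row["order_amount"] == -by_number[ref]

# drop every refund row that cancels out, and the order it cancelled
cancelled = {row["number_of_refund"] for row in orders if cancels_out(row)}
kept = [row for row in orders
        if row["number"] not in cancelled and not cancels_out(row)]
print([row["number"] for row in kept])  # ['1555', '1888', '1999']
```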

2 Answers


IIUC, you can use a boolean indexing approach:

# ensure numeric values
df['order_amount'] = pd.to_numeric(df['order_amount'], errors='coerce')

# is the row a refund?
m1 = df['number_of_refund'].ne('')
# map refunded order number -> refund amount
s = df[m1].set_index('number_of_refund')['order_amount']

# for each order, look up its refund amount and flag orders
# whose refund exactly negates the original amount
reimb = df['number'].map(s)
m2 = reimb.eq(-df['order_amount'])
# flag the refund rows that point at one of those orders
m3 = df['number_of_refund'].isin(df.loc[m2, 'number'])

# keep rows flagged by neither mask
df = df[~(m2|m3)]

output:

  number  order_amount number_of_refund
0   1555         100.0                 
3   1888         300.0                 
4   1999        -150.0             1888

4 Comments

Hello, I have a problem with this solution: in my data the number_of_refund field is NULL rather than an empty string when it has no value, so I am using m1 = df['number_of_refund'].notna(), and it shows an error on the s = df[m1].set_index('number_of_refund')['order_amount'] step. How could I fix this?
If you want to replace NaNs with empty strings do df['number_of_refund'] = df['number_of_refund'].fillna('')
I did that but it still shows an error: InvalidIndexError: Reindexing only valid with uniquely valued Index objects. Please wait, there might be a mistake in my field formatting.
Did you have an initially duplicated index? You can drop it if it's not important: df = df.reset_index(drop=True), then fillna
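Putting the comment thread together, a sketch of the suggested preprocessing (the data here is invented to reproduce NaN refunds and a duplicated index; the rest of the solution then works unchanged):

```python
import numpy as np
import pandas as pd

# Hypothetical data with NaN instead of '' and a duplicated index,
# mimicking the situation described in the comments.
df = pd.DataFrame(
    {"number": ["1555", "1666", "1777"],
     "order_amount": ["100.00", "200.00", "-200.00"],
     "number_of_refund": [np.nan, np.nan, "1666"]},
    index=[0, 0, 1],
)

df = df.reset_index(drop=True)                              # drop the duplicated index
df["number_of_refund"] = df["number_of_refund"].fillna("")  # NaN -> ''
df["order_amount"] = pd.to_numeric(df["order_amount"], errors="coerce")

# the original solution, unchanged
m1 = df["number_of_refund"].ne("")
s = df[m1].set_index("number_of_refund")["order_amount"]
reimb = df["number"].map(s)
m2 = reimb.eq(-df["order_amount"])
m3 = df["number_of_refund"].isin(df.loc[m2, "number"])
print(df[~(m2 | m3)]["number"].tolist())  # ['1555']
```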

Let's say I change the refunded amount for 1999 to be -200.00

test = {"number": ['1555','1666','1777', '1888', '1999'],
        "order_amount": ['100.00','200.00','-200.00', '300.00', '-200.00'],
        "number_of_refund": ['','','1666', '', '1888']  }
df = pd.DataFrame(test)
print(df)

  number order_amount number_of_refund
0   1555       100.00                 
1   1666       200.00                 
2   1777      -200.00             1666
3   1888       300.00                 
4   1999      -200.00             1888

Here's another way to do it. I create a unique key by concatenating number_of_refund (filled with the number column where blank) and the absolute order_amount (i.e. without the minus sign), then drop both rows of each duplicate pair.

df['unique'] = df.apply(
    lambda x: x['order_amount'].replace('-', '') + '|'
              + (x['number'] if x['number_of_refund'] == '' else x['number_of_refund']),
    axis=1)
# the same, vectorized:
# df['unique'] = df['order_amount'].str.replace('-','') + '|' + df['number_of_refund'].mask(df['number_of_refund'].eq(''), df['number'])
print(df)

  number order_amount number_of_refund       unique
0   1555       100.00                   100.00|1555
1   1666       200.00                   200.00|1666    #duplicate
2   1777      -200.00             1666  200.00|1666    #duplicate
3   1888       300.00                   300.00|1888
4   1999      -200.00             1888  200.00|1888

The duplicate rows are easily identified, and ready to be dropped (including the column unique)

df = df.drop_duplicates(['unique'], keep=False).drop(columns=['unique'])
print(df)

  number order_amount number_of_refund
0   1555       100.00                 
3   1888       300.00                 
4   1999      -200.00             1888

8 Comments

It's a bit dangerous however to rely on unique amounts. There could be the same amount for 2 different orders by coincidence.
but the order number will differentiate them, right? That is, both the order number and the order amount form the unique string
@perpetualstudent not directly because the mapping id is split in two different columns (see. here 1666 is not duplicated in any column). You could use number_of_refund filled with the number column on the blanks though. ;)
I took the liberty to update to fix the flaw, feel free to revert if you want ;)
sure, separator added! Is this better? Cheers!
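To illustrate the collision concern from this thread, a small sketch (with invented order numbers) showing that the amount alone would collide, while the amount-plus-number key keeps an unrelated order apart:

```python
import pandas as pd

# Two unrelated orders both worth 200.00, plus one genuine refund pair.
df = pd.DataFrame({
    "number": ["1666", "1777", "4242"],
    "order_amount": ["200.00", "-200.00", "200.00"],
    "number_of_refund": ["", "1666", ""],
})

# key = absolute amount + '|' + refund number (falling back to the order number)
key = (df["order_amount"].str.replace("-", "", regex=False)
       + "|"
       + df["number_of_refund"].mask(df["number_of_refund"].eq(""), df["number"]))
print(key.tolist())
# only the genuine refund pair shares a key; 4242 is not a duplicate
out = df[~key.duplicated(keep=False)]
print(out["number"].tolist())  # ['4242']
```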