Drop row if lambda returns None Pandas

Question

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))

How can I drop those rows where my_map.get(x) returns None?

I am looking for a solution where I do not have to iterate over the column again to drop rows.

Thanks

This sounds like you might be better served doing a left join from a dataframe made from mymap?
– jeremycg, Commented Nov 3, 2017 at 13:08

jezrael · Accepted Answer · 2017-11-03 13:04:32Z

4

I think you need dropna. Assigning the result to a new column leaves None (or NaN) in the unmatched rows, and those rows can then be removed in one step:

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))
self.df = self.df.dropna(subset=['X'])

Or:

self.df = self.df[self.df['X'].notnull()]
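As a minimal sketch of the two variants above, the snippet below uses hypothetical sample data (the contents of my_map and df are assumptions, not from the question). It also swaps the apply/lambda call for Series.map with a dict, which yields NaN for keys that are missing from the dict.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "b", "c", "a", "d"]})

# Series.map with a plain dict yields NaN for keys that are not in the dict.
df["X"] = df["x"].map(my_map)

# Variant 1: drop rows with a missing value, looking only at column X.
result = df.dropna(subset=["X"])

# Variant 2: the equivalent boolean-mask form.
result_mask = df[df["X"].notnull()]

print(result)
```

Both variants only build a boolean mask over the already computed column, so the dictionary lookup itself runs once.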

edited Nov 3, 2017 at 13:04

answered Nov 3, 2017 at 13:00

jezrael


8 Comments

Raheel Over a year ago

I understand this logic, but I am worried because a single df is 250k rows, and this is one chunk out of 60 million.

jezrael Over a year ago

@RaheelKhan - I added another solution; it is a bit faster

piRSquared Over a year ago

I suspect that if you ask a different question where you share what you are trying to do with the entire dataframe and what your lambda is, you’d get a much better answer

Raheel Over a year ago

I tried it with a 250k-row df and it didn't make any real difference. Thanks

Raheel Over a year ago

@piRSquared my_map is just a dictionary whose keys are the values of x; I am assigning the value for that key to a new column X. But since the data is huge, there will be many x values with no match in my dict, so I don't want those records in my df.

piRSquared · Accepted Answer · 2017-11-03 12:58:30Z

3

Both loc and pd.Series.compress accept a callable. The callable receives the whole Series (not a single element) and must return a boolean mask; the entries where the mask is True are kept.

compress (removed in pandas 1.0)

self.df['x'].compress(lambda s: s.map(my_map).notnull())

loc

self.df['x'].loc[lambda s: s.map(my_map).notnull()]
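A hedged sketch of the loc-with-callable idea applied to the whole DataFrame, so that the new column and the row filter come out of one chain. The sample data and the extra column v are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "b", "c", "a", "d"], "v": range(5)})

# assign adds the looked-up column; the callable passed to loc then receives
# the whole intermediate DataFrame and returns a boolean mask over its rows.
filtered = (
    df.assign(X=df["x"].map(my_map))
      .loc[lambda d: d["X"].notnull()]
)

print(filtered)
```

Because the callable is evaluated on the intermediate frame, no temporary variable is needed and the lookup is performed only once.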

answered Nov 3, 2017 at 12:58

piRSquared


3 Comments

jezrael Over a year ago

So you need self.df = self.df.dropna(subset=['X'])

piRSquared Over a year ago

@jezrael thinking

Pyd Over a year ago

can you please check this one stackoverflow.com/questions/47096797/…

Fabian Ying · Accepted Answer · 2017-11-03 13:01:33Z

1

You can find the indices as follows

idxs = self.df.index[self.df['X'].isnull()]  # find all indices with None in df.X

Full code:

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))
idxs = self.df.index[self.df['X'].isnull()]  # find all indices with None in df.X
self.df = self.df.drop(idxs)
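A small sketch of why the index lookup uses isnull() rather than a comparison with None. The sample data is hypothetical, and Series.map stands in for the apply call, assuming numeric dictionary values so that missing lookups become NaN.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "c", "b"]})
df["X"] = df["x"].map(my_map)  # the unmatched key "c" becomes NaN

# An elementwise comparison with None does not flag the missing entry.
eq_none = df["X"] == None  # noqa: E711

# isnull() is the reliable missing-value test.
is_missing = df["X"].isnull()

# Collect the index labels of the missing rows and drop them.
idxs = df.index[is_missing]
dropped = df.drop(idxs)

print(dropped)
```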

edited Nov 3, 2017 at 13:01

answered Nov 3, 2017 at 12:53

Fabian Ying


3 Comments

Raheel Over a year ago

Do you think self.df = self.df.dropna(subset=['X']) would be a more optimized way?

jezrael Over a year ago

self.df['X'] == None returns False :(

Fabian Ying Over a year ago

@jezrael Yes, you're right. You can use .isnull(), but I think self.df.dropna(subset=['X']) is a cleaner solution.

jeremycg · Accepted Answer · 2017-11-03 13:26:39Z

0

You can do this as a merge, if you convert your mymap dict to a DataFrame:

mymerge = pd.DataFrame.from_dict(mymap, orient = 'index')

Then merge it with df. merge defaults to an inner join, so only the rows whose x appears in mymap are kept:

mymerge.merge(df, left_index = True, right_on = 'x')

In one line:

pd.DataFrame.from_dict(mymap, orient = 'index').merge(df, left_index = True, right_on = 'x')
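The merge approach can be sketched end to end as follows. The sample data, the column name X for the looked-up values, and the explicit how="inner" are assumptions added for illustration; the answer itself relies on the default join type and the default column name.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
mymap = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "b", "c", "a", "d"]})

# Turn the dict into a one-column lookup table indexed by its keys.
lookup = pd.DataFrame.from_dict(mymap, orient="index", columns=["X"])

# An inner join keeps only the rows of df whose x has an entry in the lookup,
# so the unmatched rows never appear in the result.
merged = df.merge(lookup, left_on="x", right_index=True, how="inner")

print(merged)
```

Putting df on the left keeps its column order, with the looked-up column appended at the end.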

answered Nov 3, 2017 at 13:26

jeremycg

