1
self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))

How can I drop the rows where my_map.get(x) returns None?

I am looking for a solution where I do not have to iterate over the column again to drop the rows.

Thanks

3
  • Does this mean I first apply and then run dropna? Commented Nov 3, 2017 at 12:54
  • Yes, exactly. I think it is not possible in one step. Commented Nov 3, 2017 at 12:55
  • This sounds like you might be better served doing a left join from a dataframe made from mymap? Commented Nov 3, 2017 at 13:08

4 Answers

4

I think you need dropna, because it is not possible to remove the None values in the first step: assigning the result to a new column leaves missing values (None/NaN) for the keys that are not in my_map:

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))
self.df = self.df.dropna(subset=['X'])  # the column goes in subset=; the first positional argument of dropna is axis

Or:

self.df = self.df[self.df['X'].notnull()]
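
A minimal sketch of the two-step approach on toy data. The small `my_map` and `df` below are hypothetical stand-ins, not the data from the question. It uses `Series.map`, which does the same dictionary lookup as `apply` with a lambda, and passes the column through `subset=`:

```python
import pandas as pd

# Hypothetical toy data; the real frame and dictionary are not shown in the question.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "b", "c", "a"]})

# Look up each key; keys missing from my_map become NaN.
df["X"] = df["x"].map(my_map)

# Drop the rows whose lookup failed, checking only column X.
result = df.dropna(subset=["X"])
print(result)
```

The row with `x == "c"` has no entry in the dictionary, so it is the only one removed.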

8 Comments

I understand this logic, but I am worried because one df is 250k rows, and this is one chunk out of 60 million.
@RaheelKhan - I added another solution; it is a bit faster.
I suspect that if you ask a different question where you share what you are trying to do with the entire dataframe and what your lambda is, you’d get a much better answer.
I tried it with a 250k-row df; it didn't make any noticeable difference. Thanks.
@piRSquared my_map is just a dictionary whose keys are the x values; I am assigning the value stored under each key to a new column X. But since the data is huge, there will be many x values that have no match in my dict, so I don't want those records in my df.
3

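Both loc and pd.Series.compress accept a callable and return the subset where it evaluates to True. The callable is called once with the whole Series, not once per element, so it must return a boolean mask. Note that pd.Series.compress was deprecated and then removed in pandas 1.0, so only loc works in current versions.

compress

self.df['x'].compress(lambda s: s.map(my_map).notnull())  # older pandas only

loc

self.df['x'].loc[lambda s: s.map(my_map).notnull()]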
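
As a sketch on hypothetical toy data (the names `my_map`, `df` and the `other` column are illustrative): the callable handed to `.loc` is called once with the whole object, so it should build a boolean mask rather than test a single value. Applying it to the DataFrame instead of one column keeps the other columns as well:

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "b", "c", "a"], "other": [10, 20, 30, 40]})

# The callable receives the whole DataFrame and returns a boolean mask:
# True where the key in column x has a non-null entry in my_map.
kept = df.loc[lambda d: d["x"].map(my_map).notnull()]

# Add the mapped column on the already filtered rows.
kept = kept.assign(X=kept["x"].map(my_map))
print(kept)
```

This still performs the lookup twice (once for the mask, once for the new column), so it is not obviously cheaper than mapping first and dropping the missing rows afterwards.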

3 Comments

So you need self.df = self.df.dropna(subset=['X'])
@jezrael thinking
can you please check this one stackoverflow.com/questions/47096797/…
1

You can find the indices as follows

idxs = self.df.index[self.df['X'].isnull()]  # find all indices with None in df.X

Full code:

self.df['X'] = self.df['x'].apply(lambda x: my_map.get(x))
idxs = self.df.index[self.df['X'].isnull()]  # find all indices with None in df.X
self.df = self.df.drop(idxs)

3 Comments

Do you think self.df = self.df.dropna(subset=['X']) would be a more optimized way?
self.df['X'] == None returns False :(
@jezrael Yes, you're right. You can use .isnull(), but I think self.df.dropna(subset=['X']) is a cleaner solution.
0

You can do this as a merge, if you convert your mymap to a DataFrame:

mymerge = pd.DataFrame.from_dict(mymap, orient = 'index')

Then merge it with df on x. merge defaults to an inner join, so only the rows whose x has an entry in mymap are kept:

mymerge.merge(df, left_index = True, right_on = 'x')

In one line:

pd.DataFrame.from_dict(mymap, orient = 'index').merge(df, left_index = True, right_on = 'x')
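
A minimal sketch of the merge idea on hypothetical toy data. The lookup frame is built with explicit column names (`x` and `X` are chosen here for readability; the answer above uses the dictionary keys as the index instead), and `how="inner"` is spelled out even though it is the default:

```python
import pandas as pd

# Hypothetical toy data, not taken from the question.
my_map = {"a": 1, "b": 2}
df = pd.DataFrame({"x": ["a", "b", "c", "a"]})

# One row per dictionary entry: key in column x, value in column X.
lookup = pd.DataFrame({"x": list(my_map), "X": list(my_map.values())})

# An inner join keeps only the rows of df whose x appears in the lookup table,
# so unmatched rows are dropped in the same operation that adds X.
merged = df.merge(lookup, on="x", how="inner")
print(merged)
```

The row order of the merged result can differ between pandas versions, so sort explicitly if the order matters.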
