Lambda function for creating two different columns in pandas dataframes

Question

I have a pandas DataFrame with an HTML-based text field from which I want to derive two fields: the count of each tag in it, and the clean text without any tags. I am using BeautifulSoup for both. For example:

df_ads['content_elements_cnt'] = df_ads['content'].apply(lambda x: dict(Counter([element.name for element in BeautifulSoup(x).html if element.name is not None])))
df_ads['content_refined'] = df_ads['content'].apply(lambda x : BeautifulSoup(x).text)

Is it possible to encapsulate the two statements above in one function and call it through apply to generate both columns? I want to instantiate BeautifulSoup and loop over the content only once per row. In other words, is there an efficient way of doing these two operations?

Can you provide a minimal reproducible example of the dataset?
– mozway, Commented Jan 23, 2022 at 22:09

mozway · Accepted Answer · 2022-01-23 22:14:59Z

You could use a helper function and return a Series:

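from collections import Counter
import pandas as pd
from bs4 import BeautifulSoup

def bs_extract(x):
    soup = BeautifulSoup(x)  # parse once, reuse for both columns
    return pd.Series({'content_elements_cnt': dict(Counter([element.name for element in soup.html if element.name is not None])),
                      'content_refined': soup.text})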

df_ads[['content_elements_cnt', 'content_refined']] = df_ads['content'].apply(bs_extract)

NB. the code is untested (no input provided)
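
For anyone who wants to try the pattern on real input, here is a minimal sketch with made-up sample rows. It makes two assumptions that are not in the question: it names the stdlib "html.parser" backend explicitly (other backends such as lxml wrap fragments in html/body tags, which changes the counts), and it counts every tag at any depth with find_all(True) rather than iterating over the direct children of soup.html. The sample DataFrame is hypothetical.

```python
from collections import Counter

import pandas as pd
from bs4 import BeautifulSoup


def bs_extract(html):
    """Parse the HTML once and return both derived values as a Series."""
    # Assumption: stdlib "html.parser" backend, which does not add
    # <html>/<body> wrappers around fragments.
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: count all tags at any depth; find_all(True) matches every tag.
    counts = Counter(tag.name for tag in soup.find_all(True))
    return pd.Series({
        "content_elements_cnt": dict(counts),
        "content_refined": soup.get_text(),
    })


# Hypothetical sample data standing in for df_ads
df_ads = pd.DataFrame({"content": [
    "<p>Hello <b>world</b></p>",
    "<div><p>One</p><p>Two</p></div>",
]})

# apply() returns a DataFrame because bs_extract returns a Series per row
df_ads[["content_elements_cnt", "content_refined"]] = df_ads["content"].apply(bs_extract)
print(df_ads[["content_elements_cnt", "content_refined"]])
```

Each row is parsed only once, which is what the question asks for. Building a Series per row has its own overhead, so on a large frame it may be worth returning a plain tuple from the helper and assigning the two columns from a list of results instead.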
