There are approximately 13,000 values in a given column. The function below takes a list of strings as input and performs NER tagging for each word in the list; on average each list has around 300 words across those 13,000 values. The function currently takes more than an hour to process the column, so I would like a solution that processes it faster. I am running on an Azure ML notebook with a standard CPU compute.
Function:
def perform_ner_batch(texts):
    if not texts:  # Check if texts is empty
        return []
    # Perform NER on the provided texts, one string at a time
    list_entity = []
    for text in texts:
        ner_result = ner_pipeline(text)
        if not ner_result:
            # No entities found for this string
            list_entity.append('O')
        for result in ner_result:
            list_entity.append(result['entity_group'])
    return list_entity
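If ner_pipeline is a Hugging Face transformers token-classification pipeline (an assumption; the post does not show how it is created), the loop above runs one forward pass per string. The pipeline also accepts a list of strings together with a batch_size argument, letting it batch the forward passes internally. A minimal sketch of the same function written that way, keeping the 'O' placeholder for strings with no entities:

def perform_ner_batch(texts, batch_size=32):
    if not texts:  # Check if texts is empty
        return []
    # One call over the whole list: the pipeline batches the forward
    # passes and returns one result list per input string.
    batched_results = ner_pipeline(texts, batch_size=batch_size)
    list_entity = []
    for ner_result in batched_results:
        if not ner_result:
            list_entity.append('O')  # no entities found for this string
        for result in ner_result:
            list_entity.append(result['entity_group'])
    return list_entity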
Calling the function:
df['entities'] = df['Tokenized_Abstract_list'].apply(perform_ner_batch)
pandas is irrelevant here; the bottleneck is the ner_pipeline(i) call.
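Since the cost is per ner_pipeline call, a further option (again a sketch, assuming the same transformers pipeline) is to flatten every row's list into one long list, run a single batched pass over the whole column, and then split the results back per row, so batching is amortized across all 13,000 rows instead of restarting for each one:

# Hypothetical sketch: one batched pass over the entire column.
all_texts = [t for row in df['Tokenized_Abstract_list'] for t in row]
all_results = ner_pipeline(all_texts, batch_size=64)

# Split the flat result list back into per-row entity lists.
entities = []
pos = 0
for row in df['Tokenized_Abstract_list']:
    row_entities = []
    for ner_result in all_results[pos:pos + len(row)]:
        if not ner_result:
            row_entities.append('O')
        for result in ner_result:
            row_entities.append(result['entity_group'])
    pos += len(row)
    entities.append(row_entities)
df['entities'] = entities

On CPU the best batch_size is workload-dependent, so it is worth benchmarking a few values; the larger win usually comes from avoiding roughly 300 separate pipeline calls per row.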