I am working on a dataset of tweets and trying to find the mentions of other users in each tweet. A tweet can mention no users, a single user, or several users.

Here is the head of the DataFrame:

[Image: head of the DataFrame]

The following is the function that I created to extract the list of mentions in a tweet:

import re

def getMention(text):
    mention = re.findall('(^|[^@\w])@(\w{1,15})', text)
    if len(mention) > 0:
        return [x[1] for x in mention]
    else:
        return None

I'm trying to create a new column in the DataFrame and apply the function with the following code:

df['mention'] = df['text'].apply(getMention)

On running this code I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-426da09a8770> in <module>
----> 1 df['mention'] = df['text'].apply(getMention)

~/anaconda3_501/lib/python3.6/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3192             else:
   3193                 values = self.astype(object).values
-> 3194                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195 
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-42-d27373022afd> in getMention(text)
      1 def getMention(text):
      2 
----> 3     mention = re.findall('(^|[^@\w])@(\w{1,15})', text)
      4     if len(mention) > 0:
      5         return [x[1] for x in mention]

~/anaconda3_501/lib/python3.6/re.py in findall(pattern, string, flags)
    220 
    221     Empty matches are included in the result."""
--> 222     return _compile(pattern, flags).findall(string)
    223 
    224 def finditer(pattern, string, flags=0):

TypeError: expected string or bytes-like object

1 Answer

I can't comment (not enough rep), so here is what I suggest to troubleshoot the error. It seems findall raises the exception because text is not a string, so you might want to check which type text actually has, like this:

def getMention(text):
    print(type(text))
    mention = re.findall(r'(^|[^@\w])@(\w{1,15})', text)
    if len(mention) > 0:
        return [x[1] for x in mention]
    else:
        return None

(or use a debugger, if you are comfortable with one)

If text can be converted to a string, you could also try this:

def getMention(text):
    mention = re.findall(r'(^|[^@\w])@(\w{1,15})', str(text))
    if len(mention) > 0:
        return [x[1] for x in mention]
    else:
        return None

P.S.: don't forget the r'...' prefix on your regexp, so that backslashes reach the regex engine unchanged instead of being interpreted as string escapes.
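
To illustrate the point about the r prefix: in an ordinary string literal, Python processes backslash escapes before the regex engine ever sees the pattern. \w happens not to be a recognised string escape, so the pattern in the question still works (recent Python versions warn about it), but an escape such as \b silently turns into a backspace character. A small sketch, using a made-up pattern and input that are not from the question:

```python
import re

# In a normal literal, '\b' is the backspace character (a single character).
plain = '\bcat\b'
# In a raw literal the backslash is kept, so the regex engine sees \b (word boundary).
raw = r'\bcat\b'

print(len(plain), len(raw))            # 5 7
print(re.findall(plain, 'a cat sat'))  # []
print(re.findall(raw, 'a cat sat'))    # ['cat']
```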

1 Comment

Thanks! There was a blank value in one of the rows, and it was being treated as a float. I had printed the column values and did notice a blank row, but I assumed it would still be treated as an empty string rather than a NaN value. I made a slight change to the code so that it prints only the types of the values that are not str, which makes the rows causing the type error easier to find.
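
The check described in this comment could be sketched roughly as follows. This is an illustration rather than the commenter's actual code: find_non_strings and get_mention_safe are hypothetical names, and the sample list stands in for the df['text'] column, with float('nan') playing the role of the NaN that pandas stores for an empty cell.

```python
import re

# Same pattern as in the question, written as a raw string.
MENTION_RE = re.compile(r'(^|[^@\w])@(\w{1,15})')

def find_non_strings(values):
    """Return (position, type name, value) for every item that is not a str."""
    return [(i, type(v).__name__, v)
            for i, v in enumerate(values)
            if not isinstance(v, str)]

def get_mention_safe(text):
    """Like getMention, but return None for non-string input such as NaN."""
    if not isinstance(text, str):
        return None
    mentions = [m[1] for m in MENTION_RE.findall(text)]
    return mentions if mentions else None

# Stand-in for df['text']; the NaN mimics an empty cell.
sample = ["hello @alice and @bob", float("nan"), "no mentions here"]

# Report only the values that are not strings.
for position, type_name, value in find_non_strings(sample):
    print(position, type_name, value)

print([get_mention_safe(t) for t in sample])
```

With the DataFrame from the question, the guarded function could presumably be applied in the same way as the original one: df['mention'] = df['text'].apply(get_mention_safe).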
