I have a dataframe with a column of phone numbers. I wrote the following function to replace any NaN with an empty string, and then prepend a '+' and a '1' to any phone number that needs them.

def fixCampaignerPhone(phone):
    if phone.isnull():
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone

I tried to apply this function to a column of a dataframe as follows:

df['phone'] = df.apply(lambda row: fixCampaignerPhone(row['phone']), axis=1)

My function was not correctly identifying and replacing NaN values; it raised the error "object of type 'float' has no len()". I worked around it with a .fillna() on a separate line, but I would like to understand why this didn't work. The function works if I manually pass a NaN value, so I assume it has to do with pandas passing the argument as a float object rather than a regular float.

EDIT: full working code with sample data for debugging.

import pandas as pd
import numpy as np

def fixCampaignerPhone(phone):# adds + and 1 to front of phone numbers if necessary
    if phone.isnull():
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone

d = {0: float("NaN"), 1:"2025676789"}
sampledata = pd.Series(data = d, index = [0 , 1])
sampledata.apply(lambda row: fixCampaignerPhone(row))

EDIT 2: changing phone.isnull() to pd.isna(phone) works for my sample data, but not for my production data set, so it must be a quirk somewhere in my data. For context, each phone number in my production dataset must be either NaN, an 11-digit string starting with 1, or a 10-digit string. However, when I run my lambda function on my production dataset, I get the error "object of type 'float' has no len()", so somehow some floats/NaNs are slipping past my if statement.
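One way to track down values that slip past the check is to count the Python type of every element in the column. The sketch below uses a made-up stand-in for the production column (the `phones` Series and its contents are assumptions, not real data) and lists any non-missing entries that are not strings.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the production column: a NaN, two strings,
# and one number that was stored as a float instead of a string.
phones = pd.Series([np.nan, "2025676789", "12025676789", 2025676789.0], dtype=object)

# Count how many elements there are of each Python type.
type_counts = phones.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Non-missing values that are not strings are the likely troublemakers.
is_string = phones.map(lambda v: isinstance(v, str)).astype(bool)
suspects = phones[phones.notna() & ~is_string]
print(suspects)
```

If the counts show anything other than str plus the floats that are NaN, those rows are worth inspecting before applying the function.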

  • Please give a full working code example so we can reproduce and help you debug. Commented Dec 17, 2021 at 23:03
  • @Malo added, thanks Commented Dec 17, 2021 at 23:17
  • You have to decide the type of the phone variable. Is it a string with a phone number inside, or a float? Also, "isnull" is neither a string method nor a float method. You have to change this. Commented Dec 17, 2021 at 23:25
  • @Malo Updated my post, but I did realize the problem with "isnull". The data is production data, and each value in the series is either a float NaN or a string. Unfortunately I can't enforce a single data type. Commented Dec 17, 2021 at 23:28
  • Please have a look at my answer, I made it work. You have to write pd.isnull(phone). Commented Dec 17, 2021 at 23:29

2 Answers

Starting from this made-up DataFrame:

>>> import pandas as pd
>>> from io import StringIO

>>> df = pd.read_csv(StringIO("""
A,phone
L,3453454564
L,345345
R,345345
h,3
A,345345
L,345345
R,3453434543
R,345345
R,345345
R,345345
"""), sep=',')
>>> df
    A   phone
0   L   3453454564
1   L   345345
2   R   345345
3   h   3
4   A   345345
5   L   345345
6   R   3453434543
7   R   345345
8   R   345345
9   R   345345

We can use numpy's select to express the if conditions in vectorized form and get the expected result:

import numpy as np

df['phone'] = df['phone'].astype(str)

condlist = [df['phone'].str.len() == 10, 
            df['phone'].str.len() > 1]

# np.select takes the first condition that matches, so the 10-digit
# choice has to carry both the '+' and the '1'
choicelist = ['+1' + df['phone'],
              '+' + df['phone']]

df['phone'] = np.select(condlist, choicelist, default='')

Output:

    A   phone
0   L   +13453454564
1   L   +345345
2   R   +345345
3   h   
4   A   +345345
5   L   +345345
6   R   +13453434543
7   R   +345345
8   R   +345345
9   R   +345345
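
The sample above has no missing values, while the column in the question does. A minimal sketch of the same np.select idea with NaN handled first, assuming missing values should become empty strings and that 10-digit numbers should get '+1' as in the question's function (the `phones` Series is made-up sample data, and keeping one-character values unchanged is a choice made here to mirror that function):

```python
import numpy as np
import pandas as pd

# Made-up sample data: a NaN, a 10-digit string, an 11-digit string, a stray value.
phones = pd.Series([np.nan, "2025676789", "12025676789", "3"], dtype=object)

# Replace missing values before converting to str, so len() is always defined.
cleaned = phones.fillna("").astype(str)
lengths = cleaned.str.len()

# np.select takes the first condition that matches for each element.
condlist = [(lengths == 10).to_numpy(dtype=bool),
            (lengths > 1).to_numpy(dtype=bool)]
choicelist = [("+1" + cleaned).to_numpy(dtype=object),
              ("+" + cleaned).to_numpy(dtype=object)]

# Anything matching neither condition (empty or one character) is kept as is.
result = pd.Series(
    np.select(condlist, choicelist, default=cleaned.to_numpy(dtype=object)),
    index=phones.index,
)
print(result)
```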

Here is a working piece of code. You have to use pd.isnull(phone) instead of phone.isnull():

import pandas as pd
import numpy as np

def fixCampaignerPhone(phone):# adds + and 1 to front of phone numbers if necessary
    if pd.isnull(phone):
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone

d = {0: float("NaN"), 1:"2025676789"}
sampledata = pd.Series(data = d, index = [0 , 1])
r=sampledata.apply(lambda row: fixCampaignerPhone(row))

print(r)

result is:

0                
1    +12025676789
dtype: object
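
As a small illustration of why the change matters: Series.apply hands the function one element at a time, and a plain float or str has no isnull method, whereas the top-level pd.isnull accepts scalars. The variable names below are only for this sketch.

```python
import pandas as pd

nan_value = float("nan")

# Plain Python scalars do not have an .isnull() method ...
float_has_method = hasattr(nan_value, "isnull")
str_has_method = hasattr("2025676789", "isnull")
print(float_has_method, str_has_method)

# ... but the module-level function works on any scalar.
nan_is_null = pd.isnull(nan_value)
str_is_null = pd.isnull("2025676789")
print(nan_is_null, str_is_null)

# The method form exists on the Series itself, not on its elements.
series_mask = pd.Series([nan_value, "2025676789"]).isnull().tolist()
print(series_mask)
```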

2 Comments

Since my problem is almost assuredly with my dataset, and this worked with the sample, I will go ahead and accept this one, thank you
Do you have any more faulty samples?
