Filtering out null values with a lambda function in pandas

Question

I have a dataframe with a column for phone numbers. I wrote the following function to fill any NaNs with an empty string, and then add a '+' and a '1' to any phone numbers that need them.

def fixCampaignerPhone(phone):
    if phone.isnull():
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone

I tried to apply this function to a column of a dataframe as follows:

df['phone'] = df.apply(lambda row: fixCampaignerPhone(row['phone']), axis=1)

My function was not correctly identifying and replacing NaN values; it raised the error "object of type 'float' has no len()". I worked around it with a .fillna() on a separate line, but I would like to understand why this didn't work. The function works if I manually pass a NaN value, so I assume it has to do with pandas passing the argument as a float object and not just a regular float.

EDIT: full working code with sample data for debugging.

import pandas as pd
import numpy as np

def fixCampaignerPhone(phone):# adds + and 1 to front of phone numbers if necessary
    if phone.isnull():
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone

d = {0: float("NaN"), 1:"2025676789"}
sampledata = pd.Series(data = d, index = [0 , 1])
sampledata.apply(lambda row: fixCampaignerPhone(row))

EDIT 2: changing phone.isnull() to pd.isna(phone) works for my sample data, but not for my production data set, so there must be a quirk somewhere in my data. For context, each phone number in my production dataset must be either NaN, an 11-digit string starting with 1, or a 10-digit string. However, when I run my lambda function on the production dataset, I get the error "object of type 'float' has no len()", so somehow some floats/NaNs are slipping past my if statement.

Please give a full working code example so we can reproduce and help you debug.
– Malo, Commented Dec 17, 2021 at 23:03
You have to decide the type of the phone variable. Is it a string containing a phone number, or a float? Also, "isnull" is neither a string method nor a float method. You have to change this.
– Malo, Commented Dec 17, 2021 at 23:25
@Malo Updated my post, but I did realize the problem with "isnull". The data is production data, and each value in the series is either a float NaN or a string. Unfortunately I can't enforce a single data type.
– Joel Olazagasti, Commented Dec 17, 2021 at 23:28
Please have a look at my answer; I made it work. You have to write pd.isnull(phone).
– Malo, Commented Dec 17, 2021 at 23:29

tlentali · Answer · 2021-12-17 23:17:09Z

Starting from this made-up DataFrame:

>>> import pandas as pd
>>> from io import StringIO

>>> df = pd.read_csv(StringIO("""
A,phone
L,3453454564
L,345345
R,345345
h,3
A,345345
L,345345
R,3453434543
R,345345
R,345345
R,345345
"""), sep=',')
>>> df
    A   phone
0   L   3453454564
1   L   345345
2   R   345345
3   h   3
4   A   345345
5   L   345345
6   R   3453434543
7   R   345345
8   R   345345
9   R   345345

We can use numpy's select to express the if branches as vectorized conditions and get the result:

import numpy as np

df['phone'] = df['phone'].astype(str)

condlist = [df['phone'].str.len() == 10, 
            df['phone'].str.len() > 1]

choicelist = ['1' + df['phone'],
              '+' + df['phone']]            

df['phone'] = np.select(condlist, choicelist, default='')

Output:

    A   phone
0   L   13453454564
1   L   +345345
2   R   +345345
3   h   
4   A   +345345
5   L   +345345
6   R   13453434543
7   R   +345345
8   R   +345345
9   R   +345345
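
Note that np.select takes the choice for the first condition that matches, so in the output above the 10-digit numbers receive the '1' but not the '+'. Depending on the pandas version, astype(str) may also turn a NaN into the text 'nan'. A minimal sketch of a vectorized variant that follows the two-step logic of the function in the question might look like this; the helper name fix_phone_column and the sample values are assumptions made for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd


def fix_phone_column(phones: pd.Series) -> pd.Series:
    """Hypothetical helper mirroring the question's per-value function."""
    # Replace missing values first so that no NaN reaches the string steps.
    s = phones.fillna('').astype(str)
    # Step 1: values that are exactly 10 characters long get a leading '1'.
    s = s.where(s.str.len() != 10, '1' + s)
    # Step 2: values longer than one character get a leading '+'.
    s = s.where(s.str.len() <= 1, '+' + s)
    return s


# Made-up sample: a NaN, a 10-digit string, an 11-digit string, a 1-character string.
sample = pd.Series([np.nan, "2025676789", "12025676789", "3"], dtype=object)
fixed = fix_phone_column(sample)
print(fixed)
```

Series.where keeps a value where the condition is True and substitutes the alternative elsewhere, which is why each condition is written as the negation of the corresponding if test.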

Malo · Accepted Answer · 2021-12-17 23:28:10Z

Here is a working piece of code. You have to use pd.isnull(phone) instead of phone.isnull(), because phone is a single value here, not a Series:

import pandas as pd
import numpy as np

def fixCampaignerPhone(phone):  # adds + and 1 to the front of phone numbers if necessary
    if pd.isnull(phone):
        phone = ''
    phone = str(phone)
    if len(phone) == 10:
        phone = ('1' + phone)
    if len(phone) > 1:
        phone = ('+' + phone)
    return phone

d = {0: float("NaN"), 1: "2025676789"}
sampledata = pd.Series(data=d, index=[0, 1])
r = sampledata.apply(lambda row: fixCampaignerPhone(row))

print(r)

The result is:

0                
1    +12025676789
dtype: object
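
The reason phone.isnull() fails is that Series.apply calls the function once per element, so phone is a single value (a float NaN or a str), and neither type has an .isnull() method. A small sketch of the difference, using only the sample values from the question:

```python
import pandas as pd

value = float("nan")  # what the function receives for a missing entry

# A plain float has no .isnull() method; that method belongs to Series and DataFrame.
print(hasattr(value, "isnull"))               # False
print(hasattr(pd.Series([value]), "isnull"))  # True

# The top-level functions accept single values as well as array-likes.
print(pd.isna(value))         # True
print(pd.isnull(value))       # True (pd.isnull is an alias of pd.isna)
print(pd.isna("2025676789"))  # False
```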

2 Comments

Joel Olazagasti Over a year ago

Since my problem is almost assuredly with my dataset, and this worked with the sample, I will go ahead and accept this one, thank you

Malo Over a year ago

Do you have any more faulty samples?
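
One possible way to look for such faulty samples is to count the Python types actually stored in the column. The sketch below is only an illustration: the phones Series is made-up data standing in for the production column, which is not shown in this thread, and the stray numeric value in it is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the production column: strings, missing values,
# and one stray numeric entry.
phones = pd.Series(["2025676789", np.nan, 12025676789.0, None], dtype=object)

# Count the Python types stored in the column.
type_counts = phones.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# List the entries that are present but are not strings.
is_string = phones.map(lambda v: isinstance(v, str)).astype(bool)
suspects = phones[phones.notna() & ~is_string]
print(suspects)
```

Any row that shows up in suspects is a value that is neither missing nor a string, which is the kind of entry a NaN check alone would let through.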
