
I have a Python program that processes multiple files. Each file has a customer id column whose format depends on the department: some files have 8-digit customer ids and some have 9-digit customer ids. I need to do this:

if length of customer id column value == 8:
    input_file['customer_id'] = input_file['customer_id'].str[0:2] + '-' + input_file['customer_id'].str[2:8]
if length of customer id column value == 9:
    input_file['customer_id'] = 'K' + '-' + input_file['customer_id'].str[0:3] + '-' + input_file['customer_id'].str[3:9]

Input

id cusotmerid
1  89898988
2  898989889

Output

id cusotmerid
1  89-898988
2  K-898-989889

How can I achieve this? I have been unable to find anything that does it.

2 Answers


You can also do something like this.

You can use pd.Series.map with len built-in to find out the length of the column value. With that you can determine how to assign the value.

Using astype(str) converts the numeric value to string so you can do string concatenation.

input_file.loc[input_file['cusotmerid'].astype(str).map(len) == 8, 'new_customer_id'] = input_file['cusotmerid'].astype(str).str[:2]+ '-' + input_file['cusotmerid'].astype(str).str[2:]

input_file.loc[input_file['cusotmerid'].astype(str).map(len) == 9, 'new_customer_id'] = 'K-' + input_file['cusotmerid'].astype(str).str[:3] + '-' + input_file['cusotmerid'].astype(str).str[3:]

This should take care of the new value assignments.

The output of this is:

   cusotmerid new_customer_id
0    89898988       89-898988
1   898989889    K-898-989889
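The same idea can be written a little more compactly by converting the column once and reusing it. This is only a sketch: the sample frame is rebuilt from the question's input, the helper names `cid` and `length` are made up for illustration, and leaving ids of any other length unchanged is an assumption, not something the question specifies. It uses `.str.len()`, which gives the same lengths as `.map(len)` on a string column.

```python
import pandas as pd

# Sample data mirroring the question (the column name keeps the question's spelling).
input_file = pd.DataFrame({"id": [1, 2], "cusotmerid": [89898988, 898989889]})

# Convert to string once instead of calling astype(str) in every expression.
cid = input_file["cusotmerid"].astype(str)
length = cid.str.len()

# Assumption: ids that are neither 8 nor 9 characters long are kept unchanged.
input_file["new_customer_id"] = cid

# Overwrite only the rows whose length matches; the right-hand side aligns on the index.
input_file.loc[length == 8, "new_customer_id"] = cid.str[:2] + "-" + cid.str[2:]
input_file.loc[length == 9, "new_customer_id"] = "K-" + cid.str[:3] + "-" + cid.str[3:]

print(input_file)
```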

3 Comments

I use .apply(lambda x: len(x) == all the time. .map(len) is MUCH better. +1
apply() is a bit slower than built-in functions, so I try to use what's already available.
exactly, I wasn't aware of this one. Apply should be a last resort if there are no available methods.

You can use np.select. To check the length of each value, you first have to make sure the column holds strings, hence .astype(str). Then you can use .apply(lambda x: len(x) == condition) to build a boolean mask for each condition:

import numpy as np
input_file['cusotmerid'] = input_file['cusotmerid'].astype(str)
input_file['cusotmerid'] = np.select([input_file['cusotmerid'].apply(lambda x: len(x) == 8),
                                     input_file['cusotmerid'].apply(lambda x: len(x) == 9)],
                                     [input_file['cusotmerid'].str[0:2] + '-' + input_file['cusotmerid'].str[2:8],
                                     'K' + '-' + input_file['cusotmerid'].str[0:3] + '-' + input_file['cusotmerid'].str[3:9]],
                                      input_file['cusotmerid'])
input_file

    id  cusotmerid
0   1   89-898988
1   2   K-898-989889

It may be easier to break the np.select statement down into conditions and results. The 3 parameters I pass are conditions, results, and default value if no conditions are met.

input_file['cusotmerid'] = input_file['cusotmerid'].astype(str)

c1 = input_file['cusotmerid'].apply(lambda x: len(x) == 8)
c2 = input_file['cusotmerid'].apply(lambda x: len(x) == 9)

conditions = [c1, c2]

r1 = input_file['cusotmerid'].str[0:2] + '-' + input_file['cusotmerid'].str[2:8]
r2 = 'K' + '-' + input_file['cusotmerid'].str[0:3] + '-' + input_file['cusotmerid'].str[3:9]

results = [r1,r2]

input_file['cusotmerid'] = np.select(conditions, results, input_file['cusotmerid'])
input_file

    id  cusotmerid
0   1   89-898988
1   2   K-898-989889
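Following the comments on the other answer, the conditions can also be built with `.str.len()` instead of `.apply`. The sketch below is self-contained; the third row (`1234`) is a made-up value added only to show the default branch, and `cid` and `length` are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data from the question plus one hypothetical id of another length.
input_file = pd.DataFrame({"id": [1, 2, 3],
                           "cusotmerid": [89898988, 898989889, 1234]})

cid = input_file["cusotmerid"].astype(str)
length = cid.str.len()

# One boolean mask per rule, and the formatted result for each rule.
conditions = [length == 8, length == 9]
results = [cid.str[:2] + "-" + cid.str[2:],
           "K-" + cid.str[:3] + "-" + cid.str[3:]]

# Rows matching neither condition fall back to the unchanged string.
input_file["cusotmerid"] = np.select(conditions, results, default=cid)

print(input_file)
```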
