Python 3 pandas add a column with if then statement using length

Question

I'm working on a dataframe in Python 3 pandas that needs a new column. I have two similar columns containing strings of different lengths. The new column should return whichever of column 1 or column 2 holds a string that is 13 characters long. In Excel I would write it as: c2=if(len(b2)=13,b2,a2) and then copy the formula down.

The code I need interpreted is:

df = pd.read_csv("example15.csv")

#create a new column with if-then statement
df['13_digit_#'] = (df.column1 len = 13 or df.column2 len = 13)

How would I rewrite the last line? Thanks much!

All the columns of a dataframe have the same length, so len(col) returns the same value for each of them. That is, it's not possible to have a dataframe with columns of different lengths. Do you mean that some of the columns have missing observations and others do not? e.g. df[col1] = [a,b,c,d, N/A], df[col2] = [a,b,c,d, e]?
– measure_theory, Commented Oct 3, 2016 at 13:51
measure_theory - I meant that the values in each of those columns are either blank, one or two digits long, or 13 digits long. I'd like the new column to "clean up the data" by returning only the value that is 13 characters long.
– Arthur D. Howland, Commented Oct 3, 2016 at 14:06

jezrael · Accepted Answer · 2016-10-03 14:26:25Z

3

I think you can use numpy.where with str.len or apply(len):

df['13_digit_#'] = np.where((df.column1.str.len() == 13) | 
                            (df.column2.str.len() == 13), 'a', 'b')

Or, if you want the new column to hold the value itself (column1 when it is 13 characters long, otherwise column2):

df['13_digit_#'] = np.where(df.column1.str.len() == 13, df.column1, df.column2)

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'column1':['0123456789abc','a','b'],
                   'column2':['abcabcabcabca','c','d']})

print(df)
         column1        column2
0  0123456789abc  abcabcabcabca
1              a              c
2              b              d

# Take column1 where it is 13 characters long, otherwise fall back to column2
df['13_digit_#'] = np.where(df.column1.str.len() == 13, df.column1, df.column2)
#df['13_digit_#'] = np.where(df.column1.apply(len) == 13, df.column1, df.column2)
print(df)
         column1        column2     13_digit_#
0  0123456789abc  abcabcabcabca  0123456789abc
1              a              c              c
2              b              d              d
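If the columns come straight from read_csv, digit-only values may be parsed as numbers and blanks as NaN, in which case .str.len() can fail or return NaN. The following is only a minimal sketch, assuming the data looks the way the asker described it (blank, one or two digits, or 13 digits). The sample values and the choice to read everything as text with dtype=str are assumptions, not part of the original answer:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for example15.csv; the real file's contents are unknown.
csv_text = """column1,column2
9780306406157,12
7,9780131103627
,9780262033848
"""

# dtype=str keeps digit strings as text, so .str methods work on them.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Blank cells are read as NaN. .str.len() gives NaN for them, and the
# comparison with 13 is then False, so those rows fall back to column2.
df['13_digit_#'] = np.where(df['column1'].str.len() == 13,
                            df['column1'], df['column2'])
print(df)
```

Reading as text also preserves any leading zeros, which would be lost if the values were parsed as integers.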

edited Oct 3, 2016 at 14:26

answered Oct 3, 2016 at 13:48

jezrael

2 Comments

Arthur D. Howland Over a year ago

I used the second version (the "other condition"), and it checks out. Thanks again, jezrael! It's a huge dataset and I got warnings: "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer, col_indexer] = value instead." It's OK, though: the results are correct and export nicely.

jezrael Over a year ago

Glad I could help!
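The warning quoted in the comment above is pandas' SettingWithCopyWarning. It typically appears when the dataframe being modified was itself produced by slicing or filtering another dataframe. Below is a minimal sketch of one common way to avoid it, assuming df was created by filtering a larger frame. The name raw, the keep column, and the filter are hypothetical, since the original post does not show how df was built:

```python
import numpy as np
import pandas as pd

# Hypothetical larger frame; stands in for the asker's real data.
raw = pd.DataFrame({'column1': ['0123456789abc', 'a', 'b'],
                    'column2': ['abcabcabcabca', 'c', '0123456789xyz'],
                    'keep': [True, False, True]})

# .copy() makes df an independent dataframe rather than a slice of raw,
# so adding a column to it does not trigger the warning.
df = raw[raw['keep']].copy()

df['13_digit_#'] = np.where(df['column1'].str.len() == 13,
                            df['column1'], df['column2'])
print(df)
```

The original frame raw is left untouched; only the copy gains the new column.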

measure_theory · Answer · 2016-10-03 14:22:18Z

0

Assuming the blank, or missing, elements of each column are NaN, the following drops the column that doesn't have the full number of observations and saves the remaining column as a new variable in your dataframe:

import pandas as pd
import numpy as np

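# The third key was a duplicate 'b' in the original; renamed to 'c' so all three columns exist
df = pd.DataFrame({'a':[1,2,3], 'b':[1,2,np.nan], 'c':[1, np.nan, np.nan]})

# Drop the columns of ['a','b'] that contain NaN, then take the surviving column as a Series
df['newcol'] = df[['a','b']].dropna(axis = 1, how = 'any').iloc[:, 0]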

In the last line, axis = 1 tells dropna to operate on columns (a and b) rather than rows, and how = 'any' tells it to drop every column that contains at least one missing value. The column that remains is saved as 'newcol'.

edited Oct 3, 2016 at 14:22

answered Oct 3, 2016 at 14:11

measure_theory


1 Comment

Arthur D. Howland Over a year ago

Oh no, I don't want to drop any data. Either column may hold the 13-digit string; I just want the new column to look at both old columns and use whichever value is the 13-digit string.
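The requirement described in this comment is a choice made per row rather than per column. A minimal sketch using np.select, which checks both columns in turn, might look like the following. It assumes that rows where neither column has 13 characters should be left empty; that fallback is an assumption, since the asker did not say what should happen in that case:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': ['0123456789abc', 'a', 'b'],
                   'column2': ['12', 'abcabcabcabca', 'd']})

# Conditions are checked in order; for each row the first True one wins.
conditions = [df['column1'].str.len() == 13,
              df['column2'].str.len() == 13]
choices = [df['column1'], df['column2']]

# default=None leaves rows with no 13-character value empty.
df['13_digit_#'] = np.select(conditions, choices, default=None)
print(df)
```

Unlike the two-way np.where version, this does not fall back to column2 when column2 is also the wrong length.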
