Can someone help me create two new columns in this DataFrame?

The goal is to parse the state out of the title (s in the code below) and make sure the state is removed from the original title string. The result should contain the original title, the cleaned title (without the trailing state), and the state name.

import pandas as pd

df=pd.Series(['Accommodation Payroll Employment in Texas',
          'Accounting, Tax Preparation, Bookkeeping, and Payroll Services    Payroll Employment in Texas']).to_frame()
df.columns=['title']

def state_code(row):
    t=None
    s=None
    if len(row['title'].split(' in '))==2: 
        s=str(row['title'].split(' in ')[1])
        t=str(row['title'].split(' in ')[0])
    elif len(row['title'].split(' in '))==3:
        s=str(row['title'].split(' in ')[2])
        # re-insert the ' in ' separator that split() removed between the first two parts
        t=str(row['title'].split(' in ')[0]+' in '+row['title'].split(' in ')[1])
    elif len(row['title'].split(' for '))==2: 
        s=str(row['title'].split(' for ')[1])
        t=str(row['title'].split(' for ')[0])

    return t,s
df[['title_clean','state']]=df.apply(state_code,axis=1)

1 Answer

Instead of

return t, s

try

return pd.Series(dict(state=s, title_clean=t))

and instead of

df[['title_clean','state']]=df.apply(state_code,axis=1)

use

df = pd.concat([df, df.apply(state_code,axis=1)], axis=1)
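
Putting the two changes together, a minimal end-to-end sketch might look like the following. It reuses the sample titles from the question; the simplified body of state_code (splitting on the last ' in ' or ' for ' with rsplit) and the name result are assumptions made for illustration, not the original poster's exact logic.

```python
import pandas as pd

df = pd.Series([
    'Accommodation Payroll Employment in Texas',
    'Accounting, Tax Preparation, Bookkeeping, and Payroll Services    Payroll Employment in Texas',
]).to_frame()
df.columns = ['title']

def state_code(row):
    # Defaults used when neither separator is present in the title.
    t = None
    s = None
    title = row['title']
    # Assumption: the state always follows the LAST ' in ' or ' for '.
    if ' in ' in title:
        t, s = title.rsplit(' in ', 1)
    elif ' for ' in title:
        t, s = title.rsplit(' for ', 1)
    # Returning a Series makes apply(axis=1) produce one column per key.
    return pd.Series(dict(state=s, title_clean=t))

# concat returns a new frame, so the result has to be assigned.
result = pd.concat([df, df.apply(state_code, axis=1)], axis=1)
print(result.columns.tolist())
```

Because the function returns a pd.Series, apply builds a DataFrame whose columns are the Series keys, and concat with axis=1 places them next to the original title column.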

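Incidentally, your

t = None
s = None

is only needed for titles that match none of the branches. For those rows it supplies the values that get returned; without it the function would raise UnboundLocalError.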
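
One caveat about the t = None / s = None lines, sketched here with two hypothetical helper functions (split_without_defaults and split_with_defaults are illustrative names, not from the question): the initial values only matter when a title matches none of the branches, and in that case leaving them out makes the return statement fail.

```python
def split_without_defaults(title):
    # t and s are only assigned when the separator is found.
    if ' in ' in title:
        t, s = title.rsplit(' in ', 1)
    return t, s

def split_with_defaults(title):
    # Defaults cover titles that contain no separator.
    t = None
    s = None
    if ' in ' in title:
        t, s = title.rsplit(' in ', 1)
    return t, s

try:
    split_without_defaults('Payroll Employment')
    raised = False
except UnboundLocalError:
    # The return statement referenced names that were never assigned.
    raised = True

print(raised)
print(split_with_defaults('Payroll Employment'))
```

If every title in the data is guaranteed to match one of the branches, the defaults never come into play; otherwise they are what turns an exception into a row of missing values.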

1 Comment

Works like a charm... it's depressing to think how much time I spent trying to figure that out this morning. Many thanks!
