I have the table below with codes. I would like to create two new columns: one that flags any code containing the letters YYY and another that flags any code containing WWW, as shown in the intermediate table. After that, I would like to aggregate these into an IDs column that lists all the YYY codes and all the WWW codes, together with their totals.

I am a bit new to Python. I am trying to get to the final table below but am stuck on the intermediate table; my code is below. I am receiving a KeyError: 'code':

#for YYY

def categorise(y):  
    if y['Code'].str.contains('YYY'):
        return 1
    return 0

df1['Code'] = df.apply(lambda y: categorise(y), axis=1)

#for WWW

def categorise(w):  
    if w['Code'].str.contains('WWW'):
        return 1
    return 0

df1['Code'] = df.apply(lambda w: categorise(w), axis=1)

Any help would be appreciated on this.

Current Table:

Code
001,ABC,123,YYY
002,ABC,546,WWW
003,ABC,342,WWW
004,ABC,635,YYY

Intermediate Table:

Code             Location_Y  Location_W
001,ABC,123,YYY  1           0
002,ABC,546,WWW  0           1
003,ABC,342,WWW  0           1
004,ABC,635,YYY  1           0

Final Table:

IDs                                Location_Y  Location_W
001,ABC,123,YYY - 004,ABC,635,YYY  2           0
002,ABC,546,WWW - 003,ABC,342,WWW  0           2

  • You're getting the KeyError because you are using lowercase code as the key, but your column is named Code with an uppercase C. Commented Nov 1, 2022 at 20:40
  • @sunnytown getting the same error when I make that change Commented Nov 1, 2022 at 20:43
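
For reference, a minimal sketch of the intermediate step, assuming the column is named `Code` exactly as in the table above and rebuilding the sample data by hand. Inside `apply(..., axis=1)` each `y['Code']` is a plain Python string, so the `.str` accessor is not available there; calling `.str.contains` on the whole column avoids the row-wise function altogether.

```python
import pandas as pd

# Sample data copied from the "Current Table" in the question.
df = pd.DataFrame({
    "Code": [
        "001,ABC,123,YYY",
        "002,ABC,546,WWW",
        "003,ABC,342,WWW",
        "004,ABC,635,YYY",
    ]
})

# Vectorised substring test on the whole column. regex=False treats the
# pattern as a literal string; astype(int) turns True/False into 1/0.
df["Location_Y"] = df["Code"].str.contains("YYY", regex=False).astype(int)
df["Location_W"] = df["Code"].str.contains("WWW", regex=False).astype(int)

print(df)
```

Each flag goes into its own new column here, so the original `Code` column is left untouched.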

3 Answers

1
# Assuming the string of interest is the last comma-separated field in the Code column
df['id'] = df['Code'].str.rsplit(',', n=1, expand=True)[1]

# Create columns holding 1 or 0 depending on whether the string is present in Code
df['Location_Y'] = df['id'].eq('YYY').astype(int)
df['Location_W'] = df['id'].eq('WWW').astype(int)

# Group by the extracted id to get the aggregates; the string 'sum' is used
# rather than the builtin sum, which newer pandas versions warn about
df.groupby('id', as_index=False).agg({'Code': ' - '.join,
                                      'Location_Y': 'sum',
                                      'Location_W': 'sum'
                                     })[['Code', 'Location_Y', 'Location_W']]
                                 Code   Location_Y  Location_W
0   002,ABC,546,WWW - 003,ABC,342,WWW            0           2
1   001,ABC,123,YYY - 004,ABC,635,YYY            2           0

1

Extract the last element of each code, then use get_dummies. Group by that element and aggregate, summing the indicator columns and joining the codes as required. Code below:

# Intermediate
# Split on the literal comma and keep the last field
df = df.assign(coded=df['Code'].str.split(',').str[-1])
# dtype=int keeps the indicators as 0/1 (pandas 2.x defaults to booleans)
s = df.join(pd.get_dummies(df['coded'], dtype=int))

          Code      coded  WWW  YYY
0  001,ABC,123,YYY   YYY    0    1
1  002,ABC,546,WWW   WWW    1    0
2  003,ABC,342,WWW   WWW    1    0
3  004,ABC,635,YYY   YYY    0    1

# Final

s.groupby('coded').agg(
    Code=('Code', lambda x: x.str.cat(sep='-')),
    Y=('YYY', 'sum'),
    W=('WWW', 'sum'),
).reset_index().drop(columns='coded')

     Code                           Y  W
0  002,ABC,546,WWW-003,ABC,342,WWW  0  2
1  001,ABC,123,YYY-004,ABC,635,YYY  2  0

1

Unlike the other answers, I would suggest a simpler approach using regex, which may be easier to follow if you are a beginner.

For the intermediate table, do something like this:

import pandas as pd
import re

df = pd.read_csv('test_table.csv')

yyy = []
www = []

for index, row in df.iterrows():
    val_y = re.search('YYY', row['test data'])
    if val_y is None:
        yyy.append(0)
    else:
        yyy.append(1)

    val_w = re.search('WWW', row['test data'])
    if val_w is None:
        www.append(0)
    else:
        www.append(1)

df['Location_Y'] = yyy
df['Location_W'] = www

print(df)

For the final table, change the for loop like this:

for index, row in df.iterrows():
    val_y = row['test data'].count('YYY')
    yyy.append(val_y)
    val_w = row['test data'].count('WWW')
    www.append(val_w)
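
The per-row counts above still need to be collapsed into one row per code type to match the Final Table in the question. A minimal sketch of that step, assuming the column is named `Code` (rather than `test data`) and that the two flag columns already exist; `rows` and `final` are illustrative names, not part of the original answer.

```python
import pandas as pd

# Intermediate table from the question, rebuilt by hand for this sketch.
df = pd.DataFrame({
    "Code": [
        "001,ABC,123,YYY",
        "002,ABC,546,WWW",
        "003,ABC,342,WWW",
        "004,ABC,635,YYY",
    ],
    "Location_Y": [1, 0, 0, 1],
    "Location_W": [0, 1, 1, 0],
})

rows = []
for col in ["Location_Y", "Location_W"]:
    # Keep only the rows flagged for this code type.
    matched = df[df[col] == 1]
    rows.append({
        # Join the matching codes into a single string.
        "IDs": " - ".join(matched["Code"]),
        # Total each flag column over the matching rows.
        "Location_Y": int(matched["Location_Y"].sum()),
        "Location_W": int(matched["Location_W"].sum()),
    })

final = pd.DataFrame(rows)
print(final)
```

This keeps the explicit-loop style of the answer; the groupby-based answers reach the same table in a single expression.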
