I have the table below, which contains codes. I would like to create two new columns: one that flags any code containing the letters YYY and another that flags any code containing WWW, as shown in the intermediate table. After that, I would like to aggregate the rows so that an IDs column lists all the YYY codes and all the WWW codes, alongside their totals.
I am fairly new to Python. I am trying to get to the final table below, but I am stuck on the intermediate table. My code is below; it raises `KeyError: 'code'`:

    # for YYY
    def categorise(y):
        if y['Code'].str.contains('YYY'):
            return 1
        return 0

    df1['Code'] = df.apply(lambda y: categorise(y), axis=1)

    # for WWW
    def categorise(w):
        if w['Code'].str.contains('WWW'):
            return 1
        return 0

    df1['Code'] = df.apply(lambda w: categorise(w), axis=1)

Any help would be appreciated on this.
Current Table:
| Code |
|---|
| 001,ABC,123,YYY |
| 002,ABC,546,WWW |
| 003,ABC,342,WWW |
| 004,ABC,635,YYY |
Intermediate Table:
| Code | Location_Y | Location_W |
|---|---|---|
| 001,ABC,123,YYY | 1 | 0 |
| 002,ABC,546,WWW | 0 | 1 |
| 003,ABC,342,WWW | 0 | 1 |
| 004,ABC,635,YYY | 1 | 0 |
Final Table:
| IDs | Location_Y | Location_W |
|---|---|---|
| 001,ABC,123,YYY - 004,ABC,635,YYY | 2 | 0 |
| 002,ABC,546,WWW - 003,ABC,342,WWW | 0 | 2 |
You are using `code` as a key, but your column is named `Code`, with an upper-case C.
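
Beyond the key name, two other things in the posted snippet would likely cause trouble: inside a row-wise `apply`, `y['Code']` is a plain Python string, which has no `.str` accessor, and both results are assigned to the same `df1['Code']` column. Here is a minimal sketch of one way to get both tables, assuming pandas and the sample data from the question. The `group` key built with `str.extract` is a hypothetical helper of mine, not something from the original code, and it assumes every row contains exactly one of `YYY` or `WWW` (rows matching neither would be dropped by `groupby`).

```python
import pandas as pd

# Sample data copied from the "Current Table" in the question.
df = pd.DataFrame({
    "Code": [
        "001,ABC,123,YYY",
        "002,ABC,546,WWW",
        "003,ABC,342,WWW",
        "004,ABC,635,YYY",
    ]
})

# Intermediate table: call .str.contains on the whole column (vectorised),
# then convert the True/False result to 1/0.
df["Location_Y"] = df["Code"].str.contains("YYY", regex=False).astype(int)
df["Location_W"] = df["Code"].str.contains("WWW", regex=False).astype(int)

# Hypothetical helper: pull out which marker each row carries so that
# rows can be grouped by it. Assumes one marker per row.
group = df["Code"].str.extract(r"(YYY|WWW)", expand=False).rename("group")

# Final table: join the codes of each group and sum the indicator columns.
# sort=False keeps the groups in order of first appearance.
final = (
    df.groupby(group, sort=False)
      .agg(
          IDs=("Code", " - ".join),
          Location_Y=("Location_Y", "sum"),
          Location_W=("Location_W", "sum"),
      )
      .reset_index(drop=True)
)

print(df)
print(final)
```

If you prefer to keep a row-wise function, `'YYY' in row['Code']` would do the same check on a single string, but the column-wise version is usually shorter and faster.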