Replacing multiple values within a pandas dataframe cell - python

Question

My problem: I have a pandas DataFrame, and one column that I need to process contains values separated by ":". Some of the values between the colons take the form value=value, and these can appear at the start, middle, or end of the string. The length of the string differs from cell to cell, e.g.:

clickstream['events']  
1:3:5:7=23  
23=1:5:1:5:3  
9:0:8:6=5:65:3:44:56  
1:3:5:4

I have a file which contains the lookup values for these numbers, e.g.:

event_no,description,event
1,xxxxxx,login
3,ffffff,logout
5,eeeeee,button_click
7,tttttt,interaction
23,ferfef,click1

output required:

clickstream['events']  
login:logout:button_click:interaction=23
click1=1:button_click:login:button_click:logout

Is there a Pythonic way of looking up these individual values and replacing each one with the event column from the row with the matching event_no, as shown in the output? I have hundreds of events and am trying to work out a smart way of doing this. pd.merge would have done the trick if each cell held a single value, but I'm struggling to work out how to process every value in the string while ignoring the "=value" part.

Liam Foley · Accepted Answer · 2015-02-11 01:51:06Z

Edit to ignore missing keys in the dict:

import pandas as pd

EventsDict = {1:'1:3:5:7',2:'23:45:1:5:3',39:'0:8:46:65:3:44:56',4:'1:3:5:4'}
clickstream = pd.Series(EventsDict)
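# Keep the lookup as a plain dict so .get() can supply a default for missing keys
EventsLookup = {1:'login',3:'logout',5:'button_click',7:'interaction'}

def EventLookup(x):
    # Split the cell on ':', look up each number, and fall back to 'Missing'.
    # Note: int(item) raises ValueError for tokens such as '7=23';
    # the sample data above contains none.
    list1 = [EventsLookup.get(int(item),'Missing') for item in x.split(':')]
    return ":".join(list1)

clickstream.apply(EventLookup)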

Since you are using a full DF and not just a series, use:

clickstream['events'].apply(EventLookup)
Output:
1                 login:logout:button_click:interaction
2             Missing:Missing:login:button_click:logout
4                     login:logout:button_click:Missing
39    Missing:Missing:Missing:Missing:logout:Missing...
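The question's strings also contain "=value" tokens, which the sample data above leaves out. The following is a minimal sketch of one way to keep that part unchanged, not the method from the answer: it splits each token on "=" with str.partition and looks up only the left-hand side. The names translate_events, events_lookup, and events_named are hypothetical, and the lookup uses string keys as an assumption so that no int() conversion is needed.

```python
import pandas as pd

# Sample rows from the question, including "=value" tokens.
clickstream = pd.DataFrame(
    {"events": ["1:3:5:7=23", "23=1:5:1:5:3", "9:0:8:6=5:65:3:44:56", "1:3:5:4"]}
)

# Assumption: string keys, so they match the split tokens directly.
events_lookup = {"1": "login", "3": "logout", "5": "button_click",
                 "7": "interaction", "23": "click1"}


def translate_events(cell, lookup=events_lookup, missing="Missing"):
    """Replace each event number in a ':'-separated cell, keeping any '=value' part."""
    out = []
    for token in cell.split(":"):
        # partition gives (key, "=", value), or (token, "", "") when there is no "=".
        key, sep, value = token.partition("=")
        out.append(lookup.get(key, missing) + sep + value)
    return ":".join(out)


# Hypothetical new column; assign back to 'events' to replace in place.
clickstream["events_named"] = clickstream["events"].apply(translate_events)
print(clickstream["events_named"].tolist())
```

With this lookup, the first two rows come out as in the question's required output. Numbers that are not in the lookup become 'Missing', following the answer's convention.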

edited Feb 11, 2015 at 1:51

answered Feb 2, 2015 at 17:54


11 Comments

Maruhk Over a year ago

Hi @Liam Foley, thanks for answering. I have tried to replicate the above but get the following error: AttributeError: ("'Series' object has no attribute 'split'", u'occurred at index 1'). The only change to your statements was to replace clickstream = pd.DataFrame(EventsDict) with clickstream = pd.DataFrame([EventsDict]), to avoid the error ValueError: If using all scalar values, you must pass an index. Any ideas? Thanks.

Liam Foley Over a year ago

@Maruhk Sounds like the apply isn't working. Can you post the exact code you have? If you're using a full dataframe, you would have to do something like: DF['COL'] = DF['COL'].apply(lambda x: .......
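A likely cause of the AttributeError: pd.DataFrame([EventsDict]) builds a single row with one column per dict key, and DataFrame.apply passes a whole column (a Series) to the function rather than one string. A minimal sketch of one alternative, assuming the goal is one row per key with the strings in a single 'events' column (the names clickstream_df and lookup_cell are hypothetical):

```python
import pandas as pd

EventsDict = {1: '1:3:5:7', 2: '23:45:1:5:3', 39: '0:8:46:65:3:44:56', 4: '1:3:5:4'}
EventsLookup = {1: 'login', 3: 'logout', 5: 'button_click', 7: 'interaction'}

# One row per dict key, with the strings in a single 'events' column.
clickstream_df = pd.Series(EventsDict, name='events').to_frame()


def lookup_cell(x):
    # x is a single string here because apply is called on a column (a Series).
    return ":".join(EventsLookup.get(int(item), 'Missing') for item in x.split(':'))


# Apply to the column, not to the whole DataFrame.
clickstream_df['events'] = clickstream_df['events'].apply(lookup_cell)
print(clickstream_df)
```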

Maruhk Over a year ago

It seems my lookup dictionary was not created properly. Following your method, I replicated the transformation and ran the function, but I get an error on the first value in the clickstream dataset. I have copied the code to the following location: ClickStream - Code & Output Error Link. Thanks.

Liam Foley Over a year ago

You make the dict, but then turn the object back into a Series right away: eventlookup = eventlookup.set_index('no')['value'].to_dict() followed by eventlookup = pd.Series(eventlookup). Don't do the second part, eventlookup = pd.Series(eventlookup). What does the dict look like?
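As a rough sketch of building the lookup dict and stopping there, the following assumes the column names event_no and event from the question's sample file (the comment above uses 'no' and 'value'). An in-memory string stands in for the real file, whose path is not given.

```python
import io

import pandas as pd

# Stand-in for the lookup file from the question (hypothetical in-memory copy).
lookup_csv = io.StringIO(
    "event_no,description,event\n"
    "1,xxxxxx,login\n"
    "3,ffffff,logout\n"
    "5,eeeeee,button_click\n"
    "7,tttttt,interaction\n"
    "23,ferfef,click1\n"
)

lookup_df = pd.read_csv(lookup_csv)

# Index by event number, select the event name column, and convert to a dict.
eventlookup = lookup_df.set_index('event_no')['event'].to_dict()
print(eventlookup)
```

Here read_csv parses event_no as integers, so the dict keys are integers too.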

Liam Foley Over a year ago

Great, it's a dictionary now, which it needs to be. Try the other part now: clickstream['events'] = clickstream['events'].apply(lambda x: ":".join([eventlookup[str(item)] for item in x.split(':')])). If that doesn't work, check whether your dictionary keys are strings or ints. The datatype of your dict keys needs to match the datatype of the values in clickstream['events']. They all need to be either ints or strs.
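To illustrate the key-type point: str.split always returns strings, so a dict with integer keys never matches the tokens unless one side is converted. A minimal sketch, assuming integer keys as in the answer's example and casting them to str once (eventlookup_str and replace_events are hypothetical names). Unlike the answer, unknown numbers are left unchanged here instead of becoming 'Missing'.

```python
import pandas as pd

eventlookup = {1: 'login', 3: 'logout', 5: 'button_click', 7: 'interaction'}

# Tokens produced by str.split are strings, so cast the keys once up front.
eventlookup_str = {str(k): v for k, v in eventlookup.items()}


def replace_events(x):
    # .get(item, item) leaves a token unchanged when it has no lookup entry.
    return ":".join(eventlookup_str.get(item, item) for item in x.split(':'))


events = pd.Series(['1:3:5:7', '1:3:5:4'])
print(events.apply(replace_events).tolist())
```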
