Add Multiple Columns to Pandas Dataframe from Function

Question

I have a pandas DataFrame mydf with two columns, mydate and mytime, both of datetime dtype. I want to add three more columns: hour, weekday, and weeknum.

def getH(t): #gives the hour
    return t.hour
def getW(d): #gives the week number
    return d.isocalendar()[1] 
def getD(d): #gives the weekday
    return d.weekday() # 0 for Monday, 6 for Sunday

mydf["hour"] = mydf.apply(lambda row:getH(row["mytime"]), axis=1)
mydf["weekday"] = mydf.apply(lambda row:getD(row["mydate"]), axis=1)
mydf["weeknum"] = mydf.apply(lambda row:getW(row["mydate"]), axis=1)

The snippet works, but it is not computationally efficient because it loops through the data frame three times, once per new column. Is there a faster or more efficient way to do this, for example using zip or merge? If I instead write a single function that returns three elements, how should I apply it? To illustrate, the function would be:

def getHWd(d,t):
    return t.hour, d.isocalendar()[1], d.weekday()

Possible duplicate of "Is it possible to add several columns at once to a pandas DataFrame?"
– finiteautomata, Commented Nov 3, 2016 at 14:36

Zero · Accepted Answer · 2015-05-04 09:57:09Z

67

Here's one approach that does it with a single apply

Say, df is like

In [64]: df
Out[64]:
       mydate     mytime
0  2011-01-01 2011-11-14
1  2011-01-02 2011-11-15
2  2011-01-03 2011-11-16
3  2011-01-04 2011-11-17
4  2011-01-05 2011-11-18
5  2011-01-06 2011-11-19
6  2011-01-07 2011-11-20
7  2011-01-08 2011-11-21
8  2011-01-09 2011-11-22
9  2011-01-10 2011-11-23
10 2011-01-11 2011-11-24
11 2011-01-12 2011-11-25

We'll move the lambda function onto a separate line for readability and define it like

In [65]: lambdafunc = lambda x: pd.Series([x['mytime'].hour,
                                           x['mydate'].isocalendar()[1],
                                           x['mydate'].weekday()])

And apply it, storing the result in df[['hour', 'weeknum', 'weekday']] (the same order as the values returned by lambdafunc)

In [66]: df[['hour', 'weeknum', 'weekday']] = df.apply(lambdafunc, axis=1)

And the output looks like

In [67]: df
Out[67]:
       mydate     mytime  hour  weeknum  weekday
0  2011-01-01 2011-11-14     0       52        5
1  2011-01-02 2011-11-15     0       52        6
2  2011-01-03 2011-11-16     0        1        0
3  2011-01-04 2011-11-17     0        1        1
4  2011-01-05 2011-11-18     0        1        2
5  2011-01-06 2011-11-19     0        1        3
6  2011-01-07 2011-11-20     0        1        4
7  2011-01-08 2011-11-21     0        1        5
8  2011-01-09 2011-11-22     0        1        6
9  2011-01-10 2011-11-23     0        2        0
10 2011-01-11 2011-11-24     0        2        1
11 2011-01-12 2011-11-25     0        2        2
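
The question also asks how to use one function that returns three elements. A minimal sketch of that variant follows, assuming a pandas version in which DataFrame.apply accepts result_type="expand" (0.23 or later). The three-row frame and its times are made up for illustration; getHWd is the function from the question.

```python
import pandas as pd

def getHWd(d, t):
    # Returns (hour, ISO week number, weekday with Monday == 0).
    return t.hour, d.isocalendar()[1], d.weekday()

# Illustrative data only: three dates and three made-up times.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
    "mytime": pd.to_datetime(["2011-11-14 08:30", "2011-11-15 13:00",
                              "2011-11-16 23:59"]),
})

# result_type="expand" turns each returned tuple into one column per element,
# so a single pass over the rows fills all three new columns.
df[["hour", "weeknum", "weekday"]] = df.apply(
    lambda row: getHWd(row["mydate"], row["mytime"]),
    axis=1,
    result_type="expand",
)
print(df)
```

The column names on the left-hand side have to follow the order of the tuple, since the expanded columns are matched by position.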

3 Comments

EFL Over a year ago

Thanks, John. Looks good. This approach performed faster than the one in the original post. For a data frame with ~500 rows, your approach took 0.1446926 seconds on average, while the original took 0.15949020 seconds (averaged over 10 runs).

Jason S Over a year ago

lambdafunc = lambda x: -- why not just use def lambdafunc(x): instead? there's not much point in using an anonymous function if you're immediately going to name it.

Andras Vanyolos Over a year ago

Very minor correction: column names order on the left hand side of the assignment should read: df[['hour', 'weeknum', 'weekday']] to match columns order returned by lambdafunc
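
The timings reported in the comments differ only slightly, which is plausible because apply with axis=1 still calls a Python function once per row. As a different technique from the answer above, here is a sketch that drops apply and uses the vectorized .dt accessor. It assumes both columns have datetime64 dtype and that pandas is 1.1 or later, which is where Series.dt.isocalendar() was added. The sample data is made up.

```python
import pandas as pd

# Illustrative data only; both columns must be datetime64 for .dt to work.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
    "mytime": pd.to_datetime(["2011-11-14 08:30", "2011-11-15 13:00",
                              "2011-11-16 23:59"]),
})

# Each assignment operates on the whole column at once, with no per-row
# Python function call.
df["hour"] = df["mytime"].dt.hour
df["weeknum"] = df["mydate"].dt.isocalendar().week  # ISO week number
df["weekday"] = df["mydate"].dt.weekday             # Monday == 0, Sunday == 6
print(df)
```

Whether this is faster on a given data set is something to measure rather than assume, but it avoids the row-wise loop that both versions in the thread share.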

E. Ducateme · Answer · 2017-12-05 05:35:31Z

To complement John Galt's answer:

Depending on the task that is performed by lambdafunc, you may experience some speedup by storing the result of apply in a new DataFrame and then joining with the original:

lambdafunc = lambda x: pd.Series([x['mytime'].hour,
                                  x['mydate'].isocalendar()[1],
                                  x['mydate'].weekday()])

newcols = df.apply(lambdafunc, axis=1)
newcols.columns = ['hour', 'weeknum', 'weekday']  # same order as the values returned by lambdafunc
newdf = df.join(newcols)

Even if you do not see a speed improvement, I would recommend using the join. You will be able to avoid the (always annoying) SettingWithCopyWarning that may pop up when assigning directly on the columns:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
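
A self-contained sketch of the join approach, using a made-up three-row frame and a hypothetical helper named extract in place of lambdafunc. It is meant to show that join returns a new DataFrame and leaves the original untouched, so nothing is assigned onto df itself.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
    "mytime": pd.to_datetime(["2011-11-14 08:30", "2011-11-15 13:00",
                              "2011-11-16 23:59"]),
})

def extract(row):
    # Hypothetical helper: hour, ISO week number, weekday (Monday == 0).
    return pd.Series([row["mytime"].hour,
                      row["mydate"].isocalendar()[1],
                      row["mydate"].weekday()])

newcols = df.apply(extract, axis=1)               # columns are 0, 1, 2 here
newcols.columns = ["hour", "weeknum", "weekday"]  # name them in return order
newdf = df.join(newcols)                          # joins on the shared index

print(list(df.columns))     # the original frame still has only two columns
print(list(newdf.columns))
```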

samsamoa · Answer · 2019-02-26 01:45:10Z

13

You can do this somewhat more cleanly by having the function you apply return a pd.Series with named elements:

import pandas as pd

def process(row):
    # The dict keys become the names of the new columns.
    return pd.Series(dict(b=row["a"] * 2, c=row["a"] + 2))

my_df = pd.DataFrame(dict(a=range(10)))
new_df = my_df.join(my_df.apply(process, axis="columns"))

The result is:

   a   b   c
0  0   0   2
1  1   2   3
2  2   4   4
3  3   6   5
4  4   8   6
5  5  10   7
6  6  12   8
7  7  14   9
8  8  16  10
9  9  18  11
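
The same pattern can be applied to the date columns from the question. This is only a sketch: the three-row frame is made up, and extract is a hypothetical name. Because each value is labelled where it is computed, there is no separate list of column names that could fall out of order.

```python
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({
    "mydate": pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
    "mytime": pd.to_datetime(["2011-11-14 08:30", "2011-11-15 13:00",
                              "2011-11-16 23:59"]),
})

def extract(row):
    # Keys name the resulting columns directly.
    return pd.Series({
        "hour": row["mytime"].hour,
        "weeknum": row["mydate"].isocalendar()[1],
        "weekday": row["mydate"].weekday(),  # Monday == 0, Sunday == 6
    })

new_df = df.join(df.apply(extract, axis="columns"))
print(new_df)
```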

Venkat R · Answer · 2015-05-04 10:12:52Z

3

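def getWd(d):
    # (week number, weekday); 0 for Monday, 6 for Sunday
    return d.isocalendar()[1], d.weekday()
def getH(t):
    return t.hour
# map already yields one scalar per row, so the hour needs no zip(*...)
mydf["hour"] = mydf["mytime"].map(getH)
# zip(*...) splits the (weeknum, weekday) tuples into two sequences
mydf["weeknum"], mydf["weekday"] = zip(*mydf["mydate"].map(getWd))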

1 Comment

EFL Over a year ago

Venkat, hi. The snippet returns a TypeError: zip argument #1 must support iteration
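
A likely cause of that TypeError is that zip(*...) only works when every mapped value is itself iterable, such as a tuple. Unpacking a column of plain integers hands zip scalar arguments, which it cannot iterate. The sketch below shows both cases on made-up data. The raised flag is only there to make the failure visible, and getWd is assumed to return its tuple explicitly.

```python
import pandas as pd

def getWd(d):
    # (ISO week number, weekday with Monday == 0)
    return d.isocalendar()[1], d.weekday()

# Illustrative data only.
mydf = pd.DataFrame({
    "mydate": pd.to_datetime(["2011-01-01", "2011-01-02", "2011-01-03"]),
    "mytime": pd.to_datetime(["2011-11-14 08:30", "2011-11-15 13:00",
                              "2011-11-16 23:59"]),
})

hours = mydf["mytime"].map(lambda t: t.hour)  # one integer per row

# Unpacking scalars into zip fails: each argument to zip must be iterable.
raised = False
try:
    zip(*hours)
except TypeError:
    raised = True

# Scalars can be assigned directly; tuples are what zip(*...) is for.
mydf["hour"] = hours
mydf["weeknum"], mydf["weekday"] = zip(*mydf["mydate"].map(getWd))
print(mydf)
```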
