How to add a column to a DataFrame after applying groupby

Question

I have a DataFrame like this:

    id             Date
    546451991   2018-07-31 00:00:00
    546451991   2018-08-02 00:00:00
    5441440119  2018-08-13 00:00:00
    5441440119  2018-08-13 00:00:00
    5441440119  2018-08-14 00:00:00
    5344265358  2018-07-13 00:00:00
    5344265358  2018-07-15 00:00:00
    5441438884  2018-07-19 00:00:00

I want to group by 'id', sort each group by date, and then add a column containing the date of the next row.

E.g., I want output like this:

    id          Date                 Date1
    546451991   2018-07-31 00:00:00  2018-08-02 00:00:00
    546451991   2018-08-02 00:00:00  NULL
    5441440119  2018-08-13 00:00:00  2018-08-14 00:00:00
    5441440119  2018-08-14 00:00:00  2018-08-15 00:00:00
    5441440119  2018-08-15 00:00:00  NULL
    5344265358  2018-07-13 00:00:00  2018-07-15 00:00:00
    5344265358  2018-07-15 00:00:00  NULL
    5441438884  2018-07-19 00:00:00  NULL

I have tried df.groupby('id')['Date'].sort_values(), but it does not work.

You're not clear about what kind of sorting you would like to do in your df: is it ascending or descending?
– user2906838, Commented Sep 8, 2018 at 10:10
Can you please post a pre-made DF? I've been faffing around for quite a while trying to get this into a testable format from the clipboard, when I should be trying to answer the question.
– roganjosh, Commented Sep 8, 2018 at 10:13

Naga kiran · Accepted Answer · 2018-11-09 06:58:53Z

2

    df['Date1'] = df.groupby('id')['Date'].apply(lambda x: x.sort_values().shift(-1))

Out:

    Date                 id          Date1
0   2018-07-31 00:00:00  546451991   2018-08-02 00:00:00
1   2018-08-02 00:00:00  546451991   NaN
2   2018-08-13 00:00:00  5441440119  2018-08-13 00:00:00
3   2018-08-13 00:00:00  5441440119  2018-08-14 00:00:00
4   2018-08-14 00:00:00  5441440119  NaN
5   2018-07-13 00:00:00  5344265358  2018-07-15 00:00:00
6   2018-07-15 00:00:00  5344265358  NaN
7   2018-07-19 00:00:00  5441438884  NaN

Edit, based on Sandeep's input: if the rows are already in date order within each id, a plain grouped shift is enough.

    df['Date1'] = df.groupby('id')['Date'].shift(-1)
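
A grouped shift(-1) takes the next row in whatever order the rows currently have, so it only gives the "next date" if each id's rows are already chronological. Here is a minimal sketch of sorting first and then shifting. The sample frame is made up for illustration (modelled on the question's data, deliberately out of order), and it assumes Date can be parsed with pd.to_datetime.

```python
import pandas as pd

# Hypothetical sample data modelled on the question; rows are out of date order.
df = pd.DataFrame({
    "id": [546451991, 546451991, 5441440119, 5441440119, 5344265358],
    "Date": pd.to_datetime([
        "2018-08-02", "2018-07-31", "2018-08-14", "2018-08-13", "2018-07-13",
    ]),
})

# Sort by id and date so each group's rows are in chronological order.
df = df.sort_values(["id", "Date"]).reset_index(drop=True)

# Within each id, pull the next row's date up by one position;
# the last row of every id has no successor and gets NaT.
df["Date1"] = df.groupby("id")["Date"].shift(-1)

print(df)
```

Because Date is a datetime column here, the missing values show up as NaT rather than NaN.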

edited Nov 9, 2018 at 6:58

answered Sep 8, 2018 at 10:17


5 Comments

Muhammad Waleed Over a year ago

I tried your code, but it gives me the following error: TypeError: incompatible index of inserted column with frame index

Naga kiran Over a year ago

I checked the code again and it gives the same results. Are you using id as the index?

Muhammad Waleed Over a year ago

No, but my index values are not consecutive; some values are missing, e.g. 1, 2, 4, 7, 8, 9, 10.

Naga kiran Over a year ago

Please reset the index: df.reset_index(drop=True, inplace=True)

Naga kiran Over a year ago

Yes @SandeepKadapa, I didn't think of that ;-)
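
One possible cause of the TypeError discussed in these comments (an assumption, since the asker's real data is not shown) is that groupby(...).apply(...) can return a result indexed by both the group key and the original row label, which no longer lines up with the frame's index. A sketch of one way around it is to pass group_keys=False so the result keeps the original labels. The frame below is hypothetical, with a gappy index like the one described.

```python
import pandas as pd

# Hypothetical frame with a non-consecutive index and unsorted dates.
df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2],
        "Date": pd.to_datetime(
            ["2018-08-02", "2018-07-31", "2018-07-15", "2018-07-13"]
        ),
    },
    index=[1, 2, 4, 7],
)

# group_keys=False stops apply from prepending the group key to the index,
# so the result is labelled with the original row labels only.
next_date = df.groupby("id", group_keys=False)["Date"].apply(
    lambda s: s.sort_values().shift(-1)
)

# Assignment aligns on those labels, so the gaps in the index do not matter.
df["Date1"] = next_date

print(df)
```

With this approach the index is left as it is, so resetting it is optional rather than required.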

user2906838 · Accepted Answer · 2018-09-08 10:25:10Z

0

This is probably what you're looking for. While @Naga Kiran's answer does it in a one-liner, I'm keeping things simple by going step by step.

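    import pandas as pd
    df = pd.DataFrame({"id":[1, 2, 3, 4], "Date":["2018-07-01", "2018-08-01", "2018-09-02", "2018-10-03"]})
    # Sort the whole frame by date, newest first
    newdf = df.sort_values(["Date"], ascending=False)
    # Shift the Date column up by one row (across the whole frame, not per id)
    newdf["Date1"] = newdf["Date"].shift(-1)
    # Show up to the first three rows of each id
    newdf.groupby("id").head(3)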

I first sorted the DataFrame, then added the Date1 column with shift(-1), which shifts the column values up by one row, and then did the groupby("id").
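
Note that the order of these steps matters: a shift applied to the whole column before grouping carries dates across id boundaries. The sketch below uses made-up data in which one id repeats, and compares a plain shift with a grouped shift. The column names next_any and next_in_group are illustrative only.

```python
import pandas as pd

# Hypothetical data: id 1 has two rows, id 2 has one.
df = pd.DataFrame({
    "id": [1, 1, 2],
    "Date": pd.to_datetime(["2018-07-01", "2018-08-01", "2018-09-02"]),
})
df = df.sort_values(["id", "Date"])

# Plain shift ignores ids: the last row of id 1 picks up id 2's date.
df["next_any"] = df["Date"].shift(-1)

# Grouped shift stays inside each id: the last row of every id gets NaT.
df["next_in_group"] = df.groupby("id")["Date"].shift(-1)

print(df)
```

For the output requested in the question, the grouped variant is the one that leaves the last row of each id empty.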

Hope this helps.

answered Sep 8, 2018 at 10:25

