How to populate a column based on another column in Python?

Question

I have a df that looks like this:

ID       Test Done      Test Action    Test Date
1234     Happy Test     Decline        2021-11-30
1234     None           Decline        None
1235     Sad Test       Decline        2022-03-24
1235     None           Decline        2022-03-04
1235     None           Decline        2022-03-04
1236     Lonely Test    Decline        2022-05-06
1236     Lonely Test    Decline        2022-05-06
1236     Lonely Test    Decline        2022-05-06

I am trying to populate all the None or empty fields in Test Done with the value associated with the same ID number. I want my df to look like this:

ID       Test Done      Test Action    Test Date
1234     Happy Test     Decline        2021-11-30
1234     Happy Test     Decline        None
1235     Sad Test       Decline        2022-03-24
1235     Sad Test       Decline        2022-03-04
1235     Sad Test       Decline        2022-03-04
1236     Lonely Test    Decline        2022-05-06
1236     Lonely Test    Decline        2022-05-06
1236     Lonely Test    Decline        2022-05-06

I am not sure how to go about this. I searched the web but did not find anything related to this specific question, or any function that does this.

Edit:

I want to populate just the None values with the first value that appears in Test Done for that ID. For instance, in the example the first value for ID 1234 is Happy Test, so I want the None value to become Happy Test; the same goes for Sad Test with ID 1235. If an ID already has Test Done populated, then we can skip it. Hope this makes sense.

How do you want to populate those fields? What happens if there is more than one value associated with an ID? Which one do you choose, the most common? Please be more specific.
– Dani Mesejo, Commented Jan 6, 2023 at 17:43
@Dani Mesejo I edited the question to explain what I am looking for. Please let me know if you need more clarification.
– Astro_raf, Commented Jan 6, 2023 at 17:51

Stu Sztukowski · Accepted Answer · 2023-01-06 17:58:23Z

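Use a groupby with ffill(). Grouping by the ID and forward-filling within each group copies the last non-missing value down into the missing rows that follow it, without carrying values across IDs.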

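import pandas as pd

# Sample data: 'id' stands in for ID and 'test' for Test Done
data = {'id': [1234, 1234, 1235, 1235, 1235, 1236, 1236, 1236],
        'test': ['Happy Test',
                 None,
                 'Sad Test',
                 None,
                 None,
                 'Lonely Test',
                 'Lonely Test',
                 'Lonely Test']
       }

df = pd.DataFrame(data)

# Forward-fill missing values within each id group
df['test'] = df.groupby('id')['test'].ffill()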

Output:

     id         test
0  1234   Happy Test
1  1234   Happy Test
2  1235     Sad Test
3  1235     Sad Test
4  1235     Sad Test
5  1236  Lonely Test
6  1236  Lonely Test
7  1236  Lonely Test
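
Note that ffill() only fills rows that come after a populated row in the same group, and it leaves empty strings untouched because they are not missing values. The question mentions "None or empty fields" and asks for the first value per ID, so a variant may be useful. The following is a minimal sketch, assuming that empty fields are empty or whitespace-only strings and that each ID should take its first non-missing value; the sample rows are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one id starts with a missing value, another has an empty string
df = pd.DataFrame({
    "id": [1234, 1234, 1235, 1235, 1235],
    "test": [None, "Happy Test", "", "Sad Test", None],
})

# Assumption: empty or whitespace-only strings should count as missing
df["test"] = df["test"].replace(r"^\s*$", np.nan, regex=True)

# GroupBy "first" returns the first non-null value per group;
# transform broadcasts it back to every row of that group
first_per_id = df.groupby("id")["test"].transform("first")

# Fill only the missing entries, leaving populated rows as they are
df["test"] = df["test"].fillna(first_per_id)

print(df)
```

Unlike ffill(), this also fills a missing value that appears before the first populated row of its ID. If an ID can have several different tests, the two approaches differ: ffill() repeats the most recent preceding value, while this sketch always uses the first one.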
