I have a df that looks like this:

ID       Test Done      Test Action    Test Date
1234     Happy Test     Decline        2021-11-30
1234     None           Decline        None
1235     Sad Test       Decline        2022-03-24
1235     None           Decline        2022-03-04
1235     None           Decline        2022-03-04
1236     Lonely Test    Decline        2022-05-06
1236     Lonely Test    Decline        2022-05-06
1236     Lonely Test    Decline        2022-05-06

I am trying to populate all the None or empty fields in Test Done with the value that belongs to the same ID number. So I want my df to look like this:

ID       Test Done      Test Action    Test Date
1234     Happy Test     Decline        2021-11-30
1234     Happy Test     Decline        None
1235     Sad Test       Decline        2022-03-24
1235     Sad Test       Decline        2022-03-04
1235     Sad Test       Decline        2022-03-04
1236     Lonely Test    Decline        2022-05-06
1236     Lonely Test    Decline        2022-05-06
1236     Lonely Test    Decline        2022-05-06

I am not sure how to go about this. I searched the web but did not find anything that addresses this specific question, or any function that does what I need.

Edit:

I want to populate just the None values with the first value that is shown in Test Done for that ID. For instance, in the example the first value for ID 1234 is Happy Test, so I want the None value for 1234 to become Happy Test; the same goes for Sad Test with ID 1235. If an ID already has Test Done populated in every row, we can skip it. Hope this makes sense.

3 Comments

  • How do you want to populate those fields? What happens if there is more than one value associated with an ID? Which one do you choose, the most common? Please be more specific. Commented Jan 6, 2023 at 17:43
  • It looks like you'd like LOCF but grouped by ID. Commented Jan 6, 2023 at 17:45
  • @Dani Mesejo I made edits to what I am looking for. Please let me know if you need more clarification. Commented Jan 6, 2023 at 17:51

1 Answer

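Use a groupby with ffill(): group the rows by ID and forward-fill the missing values within each group. This relies on the missing entries being real None/NaN values (not the string "None") and on the populated row coming first within each ID.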

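import pandas as pd

# Sample data: None marks the missing 'test' values
data = {'id': [1234, 1234, 1235, 1235, 1235, 1236, 1236, 1236],
        'test': ['Happy Test',
                 None,
                 'Sad Test',
                 None,
                 None,
                 'Lonely Test',
                 'Lonely Test',
                 'Lonely Test']
        }

df = pd.DataFrame(data)

# Forward-fill missing values separately within each id group
df['test'] = df.groupby('id')['test'].ffill()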

Output:

     id         test
0  1234   Happy Test
1  1234   Happy Test
2  1235     Sad Test
3  1235     Sad Test
4  1235     Sad Test
5  1236  Lonely Test
6  1236  Lonely Test
7  1236  Lonely Test
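
If the missing row can come before the first populated row of an ID, ffill() has nothing to carry forward and leaves it empty. A possible variant, sketched here on made-up data (the row order for id 1235 is an assumption, not taken from the question), fills only the missing cells with each ID's first non-null value:

```python
import pandas as pd

# Hypothetical data: for id 1235 the missing value comes BEFORE the real one,
# which a plain forward-fill would leave unfilled.
df = pd.DataFrame({
    "id":   [1234, 1234, 1235, 1235, 1236],
    "test": ["Happy Test", None, None, "Sad Test", "Lonely Test"],
})

# transform("first") broadcasts each group's first non-null value to every row
first_per_id = df.groupby("id")["test"].transform("first")

# fillna only touches the missing cells; existing values are left unchanged
df["test"] = df["test"].fillna(first_per_id)

print(df)
```

Both approaches only treat real missing values as gaps. If the column holds the literal string "None", it would need to be converted to a missing value first.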