I have a data frame that is formatted like this:

details               col_1  col2  col3
ex1 2019 test             1     1     1
ex1 2020 review           2     2     2
example2 2021 survey      3     3     3
row3 2019 data            4     4     4

I want to append a new column called "Year" to the end of this data frame, holding the year extracted from each row's "details" value. I want it to look like this:

details               col_1  col2  col3  Year
ex1 2019 test             1     1     1  2019
ex1 2020 review           2     2     2  2020
example2 2021 survey      3     3     3  2021
row3 2019 data            4     4     4  2019

The "details" values are deliberately unstandardized to reflect my actual data. Thanks in advance for the help!

2 Answers

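You can extract the first standalone four-digit number from "details" with a regular expression and convert it to an integer. Note that astype(int) raises an error if any row has no four-digit number:

df['Year'] = df['details'].str.extract(r'\b(\d{4})\b', expand=False).astype(int)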

Output:

                details  col_1  col2  col3  Year
0         ex1 2019 test      1     1     1  2019
1       ex1 2020 review      2     2     2  2020
2  example2 2021 survey      3     3     3  2021
3        row3 2019 data      4     4     4  2019
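
If some rows might not contain a year at all, a variant along these lines may be safer. This is only a sketch: the fifth row ("no year here") is a hypothetical value added for illustration and is not part of the question's data, and the nullable "Int64" dtype is one possible choice for holding missing years.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row without a year.
df = pd.DataFrame({
    "details": [
        "ex1 2019 test",
        "ex1 2020 review",
        "example2 2021 survey",
        "row3 2019 data",
        "no year here",  # hypothetical row, not in the original data
    ],
    "col_1": [1, 2, 3, 4, 5],
    "col2": [1, 2, 3, 4, 5],
    "col3": [1, 2, 3, 4, 5],
})

# expand=False returns a Series; rows with no match become NaN.
year_text = df["details"].str.extract(r"\b(\d{4})\b", expand=False)

# Convert to numbers, then to the nullable integer dtype so that
# missing years are kept as <NA> instead of raising an error.
df["Year"] = pd.to_numeric(year_text).astype("Int64")

print(df)
```

The \b word boundaries keep the pattern from matching four digits that are part of a longer run of digits or letters, such as an ID like "ab20191".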

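Alternatively, dateutil's fuzzy parser can pull the year out of each string. It may misread other digits in the text as date parts, so check the results against your real data:

from dateutil.parser import parse
df['Year'] = df['details'].apply(lambda s: parse(s, fuzzy=True).year)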
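
The fuzzy parser raises an exception when it finds no date in a string, so on messy data it may help to wrap the call. The sketch below assumes a helper named year_or_none (a hypothetical name) and a hypothetical extra value "missing" that is not in the question's data.

```python
import pandas as pd
from dateutil.parser import parse


def year_or_none(text):
    """Return the year dateutil finds in text, or None if parsing fails."""
    try:
        return parse(text, fuzzy=True).year
    except (ValueError, OverflowError):
        # dateutil signals "no date found" with a ValueError subclass.
        return None


details = [
    "ex1 2019 test",
    "ex1 2020 review",
    "example2 2021 survey",
    "row3 2019 data",
    "missing",  # hypothetical value with no date in it
]
df = pd.DataFrame({"details": details})

# Build a nullable integer column so None is stored as <NA>.
df["Year"] = pd.array([year_or_none(t) for t in df["details"]], dtype="Int64")

print(df)
```

Compared with the regular expression, this approach is slower because it calls Python code per row, and it leaves the choice of which number counts as the year to dateutil's heuristics.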
