I have a pandas dataframe that I am trying to manipulate so that the 'amount' column converts from a string ("hh.hh 'hours'") to an int or float.

I am wondering what the best practice is in this situation. I have tried pd.to_numeric(), but without success. I think my problem is with the 'hours' tacked on the end of each string.

Is there another function that will pick out the numeric characters and simply ignore the 'hours' portion of the string, or do I first need to trim the last 5 characters before using the built-in dtype conversion function (pd.to_numeric)? Thanks!

           day  amount
2018-08-23  3   24.00 hours
2018-08-24  4   8.00 hours
2018-08-25  5   32.00 hours
2018-08-26  6   24.00 hours
2018-08-27  0   24.00 hours
  • Trimming 'hours' would be easiest, I think. It should be pretty quick to do. Commented Sep 10, 2018 at 14:57
  • Thanks. I had encountered a parser function imported from dateutil in the past when working with time series data (used for converting something along the lines of 'Jul 4th, 2018 10:30 AM' into a datetime object). I was curious to know whether there was something similar built into pandas that I hadn't stumbled upon. Commented Sep 10, 2018 at 15:03

2 Answers

Just use the string methods to get only the numbers that matter. There are a lot of options available depending upon how messy or formatted your column is:

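import pandas as pd

# remove the literal 'hours' suffix, then convert
df['amount'] = pd.to_numeric(df.amount.str.replace('hours', '', regex=False), downcast='integer')
# or slice off the last 5 characters ('hours')
df['amount'] = pd.to_numeric(df.amount.str[:-5], downcast='integer')
# or extract the numeric part with a regex (raw string avoids invalid-escape warnings)
df['amount'] = pd.to_numeric(df.amount.str.extract(r'(\d+\.?\d*)')[0], downcast='integer')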

All three produce the same output:

            day  amount
2018-08-23    3      24
2018-08-24    4       8
2018-08-25    5      32
2018-08-26    6      24
2018-08-27    0      24
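
For a column that may be messier than the sample, here is a minimal, self-contained sketch of the regex approach. It rebuilds the question's frame from the posted values; the `messy` series and the choice of `errors='coerce'` are assumptions added for illustration, not part of the original data.

```python
import pandas as pd

# Rebuild the frame from the question (values copied from the post).
df = pd.DataFrame(
    {
        "day": [3, 4, 5, 6, 0],
        "amount": ["24.00 hours", "8.00 hours", "32.00 hours",
                   "24.00 hours", "24.00 hours"],
    },
    index=pd.to_datetime(
        ["2018-08-23", "2018-08-24", "2018-08-25", "2018-08-26", "2018-08-27"]
    ),
)

# Capture the first run of digits with an optional decimal part.
# expand=False returns a Series instead of a one-column DataFrame.
number = df["amount"].str.extract(r"(\d+\.?\d*)", expand=False)

# Without downcast, the result stays a float column.
hours = pd.to_numeric(number, errors="coerce")

# Hypothetical messy input: a row with no number at all.
messy = pd.Series(["24.00 hours", "n/a"])
messy_hours = pd.to_numeric(
    messy.str.extract(r"(\d+\.?\d*)", expand=False), errors="coerce"
)
```

With `errors='coerce'`, a row that has no numeric part ends up as NaN rather than raising, so bad rows can be inspected afterwards with `messy_hours.isna()`.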

Depending on the integrity of your data, you can use pd.to_timedelta to convert it to a duration (Timedelta):

>>> df.amount = pd.to_timedelta(df.amount)
>>> df

    date      day   amount
0   2018-08-23  3   1 days 00:00:00
1   2018-08-24  4   0 days 08:00:00
2   2018-08-25  5   1 days 08:00:00
3   2018-08-26  6   1 days 00:00:00
4   2018-08-27  0   1 days 00:00:00
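
Since the question asks for an int or float, the timedelta column can be turned back into a plain number of hours. This is a small sketch, assuming every value parses as a duration; the variable names are illustrative only.

```python
import pandas as pd

# A few values from the question's 'amount' column.
amount = pd.Series(["24.00 hours", "8.00 hours", "32.00 hours"])

# Parse the strings as durations.
as_delta = pd.to_timedelta(amount)

# Dividing by a one-hour Timedelta yields the duration in hours as floats.
hours = as_delta / pd.Timedelta(hours=1)
```

Dividing by `pd.Timedelta(hours=1)` keeps the unit explicit; `as_delta.dt.total_seconds() / 3600` is an equivalent alternative.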
