Issue while converting date to datetime format in pandas dataframe

Question

Here is the dataframe. I want the dates here in '%Y-%m-%d %H:%M:%S' format.

import pandas as pd
df2 = pd.DataFrame([['2017-18','','','','','','','','','','','',''], ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', 
                    '01-JULY-2017', '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017', '01-JAN-2018', '01-FEB-2018', '01-MAR-2018']])

I tried:

df2.iloc[1, 1:] = df2.iloc[1, 1:].str.replace("JULY", "JUL")
df2.iloc[1, 1:] = df2.iloc[1, 1:].apply(pd.to_datetime, format = '%d-%b-%Y')

but it gives:

          0                    1                    2                    3   \
0    2017-18                                                                  
1  COMPANIES  1491004800000000000  1493596800000000000  1496275200000000000   

                    4                    5                    6   \
0                                                                  
1  1498867200000000000  1501545600000000000  1504224000000000000   

                    7                    8                    9   \
0                                                                  
1  1506816000000000000  1509494400000000000  1512086400000000000   

                    10                   11                   12  
0                                                                 
1  1514764800000000000  1517443200000000000  1519862400000000000

Am I missing something? Is there any other way to achieve the dates in required format?

I even tried:

for i in df2.iloc[1, 1:]:
    i = datetime.datetime.fromtimestamp(int(i)).strftime('%Y-%m-%d %H:%M:%S')

but this raises a ValueError: timestamp out of range for platform localtime()/gmtime() function

jpp · Accepted Answer · 2018-05-15 09:56:28Z


In my opinion, you should transpose your dataframe and use dateutil.parser, which is more flexible with regard to the date input format.

Structurally, pandas works best and most intuitively when you have series (or columns) of fixed types.

Setup

import pandas as pd
from dateutil import parser

df2 = pd.DataFrame([['2017-18','','','','','','','','','','','',''], ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', 
                    '01-JULY-2017', '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017', '01-JAN-2018', '01-FEB-2018', '01-MAR-2018']])

Solution

res = df2.T.iloc[1:, 1].apply(parser.parse)

Result

print(res)

1    2017-04-01
2    2017-05-01
3    2017-06-01
4    2017-07-01
5    2017-08-01
6    2017-09-01
7    2017-10-01
8    2017-11-01
9    2017-12-01
10   2018-01-01
11   2018-02-01
12   2018-03-01
Name: 1, dtype: datetime64[ns]
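
If the goal is the '%Y-%m-%d %H:%M:%S' string layout from the question, the parsed series can be formatted with .dt.strftime. This is a minimal sketch that assumes the same df2 layout as above; the names parsed and formatted are illustrative only, and the pd.to_datetime wrapper is added here just to make sure the result has a datetime dtype.

```python
import pandas as pd
from dateutil import parser

df2 = pd.DataFrame([
    ['2017-18'] + [''] * 12,
    ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', '01-JULY-2017',
     '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017',
     '01-JAN-2018', '01-FEB-2018', '01-MAR-2018'],
])

# Transpose so the dates form a column, then parse each string with dateutil.
# dateutil accepts both 'JUL' and 'JULY', so no string replacement is needed.
parsed = pd.to_datetime(df2.T.iloc[1:, 1].apply(parser.parse))

# Format the datetime column as strings in the requested layout.
formatted = parsed.dt.strftime('%Y-%m-%d %H:%M:%S')
print(formatted.iloc[0])
```

Note that the formatted values are plain strings again; keep parsed around if you still need to do date arithmetic.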


3 Comments

JE_Muc Over a year ago

I would avoid using apply() and parser. Calling the parsing function directly on the data (I recommend pd.to_datetime()), as shown in my example, is more intuitive, faster, and easier to debug.

jpp Over a year ago

@Scotty1-, that's what I'd normally say, but I find the other answers (convert JULY to JUL to fit datetime conversion format) a poor alternative. Test it with OP's data - it won't work without this replacement.

JE_Muc Over a year ago

That is true, I had already forgotten about that conversion. But I'd still try to avoid using .apply()... I'll upvote your answer.

Rakesh · Accepted Answer · 2018-05-15 09:49:44Z

You can access strftime through the .dt accessor.

Example:

import pandas as pd
df2 = pd.DataFrame([['2017-18','','','','','','','','','','','',''], ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', 
                    '01-JULY-2017', '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017', '01-JAN-2018', '01-FEB-2018', '01-MAR-2018']])


df2.iloc[1, 1:] = df2.iloc[1, 1:].str.replace("JULY", "JUL")
df2.iloc[1, 1:] = df2.iloc[1, 1:].apply(pd.to_datetime, format = '%d-%b-%Y').dt.strftime('%Y-%m-%d %H:%M:%S')

print(df2)

Output:

          0                    1                    2                    3   \
0    2017-18                                                                  
1  COMPANIES  2017-04-01 00:00:00  2017-05-01 00:00:00  2017-06-01 00:00:00   

                    4                    5                    6   \
0                                                                  
1  2017-07-01 00:00:00  2017-08-01 00:00:00  2017-09-01 00:00:00   

                    7                    8                    9   \
0                                                                  
1  2017-10-01 00:00:00  2017-11-01 00:00:00  2017-12-01 00:00:00   

                    10                   11                   12  
0                                                                 
1  2018-01-01 00:00:00  2018-02-01 00:00:00  2018-03-01 00:00:00

JE_Muc · Accepted Answer · 2018-05-15 10:09:47Z

Your data is spread over two rows. The first row contains one value ('2017-18') and several empty strings. The second row contains the string 'COMPANIES' and the dates as strings. When you parse these date strings and assign the result back into the dataframe, the values are parsed correctly but then stored in their raw numeric form (nanoseconds since the epoch), like 1506816000000000000.

This happens because the cells you assign into have dtype=object: they sit next to plain strings, so pandas cannot store them with a datetime dtype.
To represent the datetimes correctly, they need to be stored in a row/column with a proper datetime dtype. To show you the effect of storing them separately:

dates = pd.to_datetime(df2.iloc[1, 1:], format = '%d-%b-%Y')
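
The large integers in the question's output are nanoseconds since the Unix epoch, which is presumably also why datetime.fromtimestamp (it expects seconds) reported an out-of-range timestamp. A small sketch of decoding one such value; the variable names are illustrative only.

```python
from datetime import datetime, timezone

import pandas as pd

ns = 1491004800000000000  # value shown in column 1 of the question's output

# pandas interprets a bare integer as nanoseconds since the epoch by default.
ts = pd.to_datetime(ns)
print(ts.strftime('%Y-%m-%d %H:%M:%S'))

# The standard library expects seconds, so convert from nanoseconds first.
dt = datetime.fromtimestamp(ns // 10**9, tz=timezone.utc)
print(dt.strftime('%Y-%m-%d %H:%M:%S'))
```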

Btw.: Why is everyone using apply()? Just calling a function on a row/column directly is a lot faster and more intuitive.
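
The direct call can be sketched as follows, assuming the df2 from the question. The names row, dates, and as_text are illustrative only, and the result is kept in its own series rather than written back into the object-typed dataframe.

```python
import pandas as pd

df2 = pd.DataFrame([
    ['2017-18'] + [''] * 12,
    ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', '01-JULY-2017',
     '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017',
     '01-JAN-2018', '01-FEB-2018', '01-MAR-2018'],
])

# %b matches three-letter month abbreviations, so shorten JULY first.
row = df2.iloc[1, 1:].str.replace('JULY', 'JUL')

# Convert the whole row in one vectorized call instead of using apply().
dates = pd.to_datetime(row, format='%d-%b-%Y')

# Kept in its own series, the data has a datetime dtype and can be formatted.
as_text = dates.dt.strftime('%Y-%m-%d %H:%M:%S')
print(as_text.tolist()[:3])
```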
