Here is the dataframe. I want the dates here in '%Y-%m-%d %H:%M:%S' format.

import pandas as pd
df2 = pd.DataFrame([['2017-18','','','','','','','','','','','',''], ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', 
                    '01-JULY-2017', '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017', '01-JAN-2018', '01-FEB-2018', '01-MAR-2018']])

I tried:

df2.iloc[1, 1:] = df2.iloc[1, 1:].str.replace("JULY", "JUL")
df2.iloc[1, 1:] = df2.iloc[1, 1:].apply(pd.to_datetime, format = '%d-%b-%Y')

but it gives:

          0                    1                    2                    3   \
0    2017-18                                                                  
1  COMPANIES  1491004800000000000  1493596800000000000  1496275200000000000   

                    4                    5                    6   \
0                                                                  
1  1498867200000000000  1501545600000000000  1504224000000000000   

                    7                    8                    9   \
0                                                                  
1  1506816000000000000  1509494400000000000  1512086400000000000   

                    10                   11                   12  
0                                                                 
1  1514764800000000000  1517443200000000000  1519862400000000000  

Am I missing something? Is there another way to get the dates in the required format?

I even tried:

import datetime

for i in df2.iloc[1, 1:]:
    i = datetime.datetime.fromtimestamp(int(i)).strftime('%Y-%m-%d %H:%M:%S')

but that gives a ValueError: timestamp out of range for platform localtime()/gmtime() function

3 Answers

In my opinion, you should transpose your dataframe and use dateutil.parser, which is more flexible with regard to the date input format.

Structurally, pandas works best and most intuitively when you have series (or columns) of fixed types.

Setup

import pandas as pd
from dateutil import parser

df2 = pd.DataFrame([['2017-18','','','','','','','','','','','',''], ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', 
                    '01-JULY-2017', '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017', '01-JAN-2018', '01-FEB-2018', '01-MAR-2018']])

Solution

res = df2.T.iloc[1:, 1].apply(parser.parse)

Result

print(res)

1    2017-04-01
2    2017-05-01
3    2017-06-01
4    2017-07-01
5    2017-08-01
6    2017-09-01
7    2017-10-01
8    2017-11-01
9    2017-12-01
10   2018-01-01
11   2018-02-01
12   2018-03-01
Name: 1, dtype: datetime64[ns]
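The parsed values above are still datetimes, not strings in the requested '%Y-%m-%d %H:%M:%S' format. Here is a minimal sketch of one way to finish the job, assuming the same df2 as in Setup. Wrapping the result in pd.to_datetime is an extra step added here so that the Series is certain to have a datetime dtype before .dt is used; the names parsed and formatted are only illustrative.

```python
import pandas as pd
from dateutil import parser

df2 = pd.DataFrame([
    ['2017-18'] + [''] * 12,
    ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', '01-JULY-2017',
     '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017',
     '01-JAN-2018', '01-FEB-2018', '01-MAR-2018'],
])

# Transpose so the dates form a column, then parse each string with dateutil,
# which accepts both abbreviated ("JUL") and full ("JULY") month names.
parsed = df2.T.iloc[1:, 1].apply(parser.parse)

# Make sure the Series has a datetime dtype so the .dt accessor is available,
# then render every value in the requested string format.
formatted = pd.to_datetime(parsed).dt.strftime('%Y-%m-%d %H:%M:%S')

print(formatted.tolist()[:4])
```

The result is a Series of plain strings, so it can be written back into the original frame without any dtype surprises.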

3 Comments

I would avoid using apply() and parser. Instead, just calling the parsing function (I recommend pd.to_datetime()) directly on the data, as shown in my example, is more intuitive, faster, and easier to debug.
@Scotty1-, that's what I'd normally say, but I find the other answers (convert JULY to JUL to fit datetime conversion format) a poor alternative. Test it with OP's data - it won't work without this replacement.
That is true. I had already forgotten about that conversion. But I'd still try to avoid using .apply()... I'll upvote your answer.
1

You can access strftime through the .dt accessor.

Example:

import pandas as pd
df2 = pd.DataFrame([['2017-18','','','','','','','','','','','',''], ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', 
                    '01-JULY-2017', '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017', '01-JAN-2018', '01-FEB-2018', '01-MAR-2018']])


df2.iloc[1, 1:] = df2.iloc[1, 1:].str.replace("JULY", "JUL")
df2.iloc[1, 1:] = df2.iloc[1, 1:].apply(pd.to_datetime, format = '%d-%b-%Y').dt.strftime('%Y-%m-%d %H:%M:%S')

print(df2)

Output:

          0                    1                    2                    3   \
0    2017-18                                                                  
1  COMPANIES  2017-04-01 00:00:00  2017-05-01 00:00:00  2017-06-01 00:00:00   

                    4                    5                    6   \
0                                                                  
1  2017-07-01 00:00:00  2017-08-01 00:00:00  2017-09-01 00:00:00   

                    7                    8                    9   \
0                                                                  
1  2017-10-01 00:00:00  2017-11-01 00:00:00  2017-12-01 00:00:00   

                    10                   11                   12  
0                                                                 
1  2018-01-01 00:00:00  2018-02-01 00:00:00  2018-03-01 00:00:00  

1

Your data is spread over two rows. The first row contains the label '2017-18' and several empty entries. The second row contains the string 'COMPANIES' and the dates as strings. When you parse these date strings and assign the result back, the values are parsed correctly but then stored as raw integers (nanoseconds since the Unix epoch), like 1506816000000000000.

This is because pandas stores the second row as dtype=object, since it holds mixed types: strings and datetimes.
To represent the datetimes correctly, they need to be stored in a row or column with a datetime dtype. To show you the effect of storing them separately:

# Assumes "JULY" has already been replaced with "JUL", as in the question
dates = pd.to_datetime(df2.iloc[1, 1:], format = '%d-%b-%Y')
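To make this concrete: the large integers in the question's output are nanoseconds since the Unix epoch, which is also why datetime.fromtimestamp (it expects seconds) rejects them as out of range. Below is a rough sketch of one way around that, assuming the question's df2 and its JULY to JUL replacement. It keeps the dates in their own Series, formats them as strings, and only then writes them back. The names raw, dates, and as_text are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame([
    ['2017-18'] + [''] * 12,
    ['COMPANIES', '01-APR-2017', '01-MAY-2017', '01-JUN-2017', '01-JULY-2017',
     '01-AUG-2017', '01-SEP-2017', '01-OCT-2017', '01-NOV-2017', '01-DEC-2017',
     '01-JAN-2018', '01-FEB-2018', '01-MAR-2018'],
])

# One of the integers from the question, read as nanoseconds since the epoch.
print(pd.to_datetime(1506816000000000000))  # 2017-10-01 00:00:00

# Normalize the one non-abbreviated month, then parse the row on its own so
# the result is a Series with a proper datetime dtype.
raw = df2.iloc[1, 1:].str.replace('JULY', 'JUL')
dates = pd.to_datetime(raw, format='%d-%b-%Y')

# Format as strings first, then write plain strings back into the mixed row.
as_text = dates.dt.strftime('%Y-%m-%d %H:%M:%S')
df2.iloc[1, 1:] = as_text.tolist()

print(df2.iloc[1, 1:4].tolist())
```

Writing strings rather than datetimes into the mixed row means pandas has nothing to convert, so the values display exactly as formatted.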

Btw.: Why is everyone using apply()? Just calling a function on a row/column directly is a lot faster and more intuitive.
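A small sketch of the two call styles side by side, using a shortened, made-up Series s rather than the full df2. It only illustrates that both produce the same values; it makes no timing measurement.

```python
import pandas as pd

# Hypothetical three-element sample in the same format as the question's dates.
s = pd.Series(['01-APR-2017', '01-MAY-2017', '01-JUN-2017'])

# Direct call: the whole Series is handed to pd.to_datetime at once.
direct = pd.to_datetime(s, format='%d-%b-%Y')

# apply: pd.to_datetime is invoked separately for every element.
applied = s.apply(pd.to_datetime, format='%d-%b-%Y')

# Compare the parsed values element by element.
print(direct.tolist() == applied.tolist())
```

The direct call also returns a Series with a datetime dtype, so .dt.strftime can be chained onto it straight away.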
