I am reading from an Excel sheet. The header row contains dates in Month-Year format, and I want to keep it that way. But when pandas reads the sheet, it changes the format to "2014-01-01 00:00:00". I wrote the following piece to fix it, but it doesn't work.

import pandas as pd
import numpy as np
import datetime
from datetime import date
import time
file_loc = "path.xlsx"
df = pd.read_excel(file_loc, index_col=None, na_values=['NA'], parse_cols = 37)
df.columns=pd.to_datetime(df.columns, format='%b-%y')

Which didn't do anything. On another try, I did the following:

df.columns = datetime.datetime.strptime(df.columns, '%Y-%m-%d %H:%M:%S').strftime('%b-%y')

This raises the "must be str, not datetime.datetime" error. I don't know how to make it go through the header row cell by cell to read the strings!

Here is a sample data:

NaT 11/14/2015 00:00:00 12/15/2015 00:00:00 1/15/2016 00:00:00
A   5                   1                   6
B   6                   3                   3   

My main problem is that pandas does not recognize these values as the header, e.g., df['11/14/2015 00:00:00'] returns a KeyError.

Any help is appreciated.

UPDATE: Here is a photo to illustrate what I keep getting! Box 6 is the implementation of apply, and box 7 is what my data looks like.

enter image description here

  • @EdChum, I posted the problem with date formatting here. Thanks. Commented Nov 11, 2015 at 18:12
  • I think it's because your format string doesn't match: 11/14/2015 00:00:00 should be parsed with '%Y/%m/%d %H:%M:%S', not with '-' symbols between month and day. Commented Nov 11, 2015 at 21:06
  • @Anton Protopopov, I tried it with /, but the error still says "must be str, not Index". Commented Nov 11, 2015 at 23:38

2 Answers

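import datetime
import pandas as pd
df = pd.DataFrame({'data': ["11/14/2015 00:00:00", "11/14/2015 00:10:00", "11/14/2015 00:20:00"]})
# Parse each month/day/year string, then reformat it as abbreviated month and two-digit year
df["data"].apply(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %H:%M:%S').strftime('%b-%y'))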

EDIT

If you'd like to work with df.columns, you could use the map function:

df.columns = list(map(lambda x: datetime.datetime.strptime(x, '%m/%d/%Y %H:%M:%S').strftime('%b-%y'), df.columns))

You need list if you are using Python 3.x, because map returns an iterator there rather than a list.
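The comments below report that the header labels arrive as Timestamp objects rather than strings, in which case strptime is not needed at all. As a minimal sketch under that assumption (the small frame here is a hypothetical stand-in for what read_excel returns, built from the sample data in the question), the labels can be formatted directly:

```python
import pandas as pd

# Hypothetical stand-in for the frame read from Excel: the column labels
# are datetime values, not strings.
df = pd.DataFrame(
    [[5, 1, 6], [6, 3, 3]],
    index=["A", "B"],
    columns=pd.to_datetime(["2015-11-14", "2015-12-15", "2016-01-15"]),
)

# DatetimeIndex.strftime returns an Index of plain strings.
# Assigning it back is what actually changes the header.
df.columns = df.columns.strftime("%b-%y")

# Month abbreviations follow the current locale (English assumed here).
print(df.columns.tolist())
print(df["Nov-15"])
```

Note that the result has to be assigned back to df.columns; computing the formatted labels without assigning them leaves the frame printing the old format.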


6 Comments

You could use the apply method on a pd.Series of the dataframe.
Thanks, but it didn't work. I have attached a picture in the question body to show the code and result. The next issue is that my data is not in a column; it is the header.
So you could convert your df.columns to a pd.Series with pd.Series(df.columns) and then use the apply method. Look at the last edit.
The list worked! I just had to change the inner x to str(x) to avoid the "must be str, not Timestamp" error. Thanks.
But the problem is that when I print the dataframe, it still displays the old format.

The problem might be that the data in Excel isn't stored in the string format you think it is. Perhaps it is stored as a number and just displayed as a date string in Excel.

Excel stores dates as serial numbers counted from an epoch, not as text. Check the actual values you see in df.

What does this show?

from pprint import pprint
pprint(df)
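Beyond printing the frame, it can help to look at the type of each header label, since that determines which key a lookup needs. The following is only a sketch: the frame is a hypothetical stand-in built from the question's sample dates, assuming the labels were parsed as datetimes.

```python
import pandas as pd

# Hypothetical stand-in for the frame read from Excel.
df = pd.DataFrame(
    [[5, 1, 6]],
    columns=pd.to_datetime(["2015-11-14", "2015-12-15", "2016-01-15"]),
)

# Show what kind of index holds the header and the type of each label.
print(type(df.columns).__name__)
print([type(label).__name__ for label in df.columns])

# When the labels are Timestamps, a Timestamp key selects the column.
key = pd.Timestamp("2015-11-14")
print(df[key])
```

If the labels turn out to be Timestamp objects, that would be consistent with the "must be str" errors raised by strptime in the question.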

1 Comment

In the Excel file, the very first cell is 1/1/2014, and each following cell is =previous cell + 31. pprint(df) prints 2015-11-14 00:00:00.
