5

I try to parse a CSV file which looks like this:

dd.mm.yyyy   value

01.01.2000   1
02.01.2000   2
01.02.2000   3

I need to add missing dates and fill according values with NaN. I used Series.reindex like in this question:

import pandas as pd

ts=pd.read_csv(file, sep=';', parse_dates='True', index_col=0)

idx = pd.date_range('01.01.2000', '02.01.2000')

ts.index = pd.DatetimeIndex(ts.index)
ts = ts.reindex(idx, fill_value='NaN')

But in result, values for certain dates are swapped due to date format (i.e. mm/dd instead of dd/mm):

01.01.2000   1
02.01.2000   3
03.01.2000   NaN
...
...
31.01.2000   NaN
01.02.2000   2

I tried several ways (i.e. add dayfirst=True to read_csv) to do it right but still can't figure it out. Please, help.

2
  • You could try adding dayfirst=True to read_csv(). Commented Dec 17, 2015 at 4:32
  • Yes i have tried it, but result is the same Commented Dec 17, 2015 at 5:34

1 Answer 1

6

Set parse_dates to the first column with parse_dates=[0]:

ts = pd.read_csv(file, sep=';', parse_dates=[0], index_col=0, dayfirst=True)

idx = pd.date_range('01.01.2000', '02.01.2000')

ts.index = pd.DatetimeIndex(ts.index)
ts = ts.reindex(idx, fill_value='NaN')
print(ts)

prints:

              value
2000-01-01        1
2000-01-02        2
2000-01-03      NaN
...
2000-01-31      NaN
2000-02-01        3

parse_dates=[0] tells pandas to explicitly parse the first column as dates. From the docs:

parse_dates : boolean, list of ints or names, list of lists, or dict

If True -> try parsing the index.

If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.

If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.

{'foo' : [1, 3]} -> parse columns 1, 3 as date and call result 'foo'

A fast-path exists for iso8601-formatted dates.

Sign up to request clarification or add additional context in comments.

3 Comments

@MikeMüller, Trying to use this but I have multiple entries per date, it's absence records from factory shifts. So, we could have 2 records of people calling in sick on 8/5/2015 but then no more sick days till 8/9/2015 when we could have 3. My code breaks at the line: ts = ts.reindex(idx, fill_value='NaN') Can you answer that in this question or should I create a new one? Thank you!
Go ahead and create a new question. Please and an example of what you try to achieve.
@MikeMüller, question posted here: stackoverflow.com/questions/45506038/… Thank you for any insight you can provide!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.