
I am trying to build a DataFrame that I can easily write to a CSV; otherwise I have to do this process manually.

I'd like this to be my final output. Each person gets one row per month, starting at 1/1/2014 and going through 12/1/2016:

      Name    date
0     ben     1/1/2014
1     ben     2/1/2014
2     ben     3/1/2014
3     ben     4/1/2014
....

12    dan     1/1/2014
13    dan     2/1/2014
14    dan     3/1/2014

Code so far:

import pandas as pd

days = [1]
months = list(range(1, 13))
years = ['2014', '2015', '2016']
listof_people = ['ben','dan','nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

df = pd.DataFrame({"Name": listof_people})
for month in months:
    df.append({'date': month}, ignore_index=True)
print(df)

When I try looping to create the DataFrame, it either does not work or I get index errors (because the lists have different lengths), and I'm at a loss.

I've done a good bit of searching and found the following similar questions, but I can't reverse-engineer the answers to fit my case.

Filling empty python dataframe using loops

How to build and fill pandas dataframe from for loop?

I don't want anyone to feel like they are "doing my homework", so if I'm derping on something simple, please let me know.

  • append is not an in-place operation, so you need to reassign: df = df.append({'date': month}, ignore_index=True). Commented Jan 17, 2017 at 18:57
  • @root Thank you! This gets me closer, but still not where I need to be. With reassigning, the months come in after the last name in the list (Chris). Adding for index, row in df.iterrows(): before the month loop helps, but how do I do this for each person? Commented Jan 17, 2017 at 18:59
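One way to answer the follow-up in the comment above is to skip append altogether: collect one row per person and date in a plain list, then build the frame once. This is a minimal sketch, not the commenter's code. It reuses the list names from the question, and `rows` is a name introduced here for illustration. `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so this pattern also runs on current versions.

```python
import pandas as pd

months = list(range(1, 13))
years = ['2014', '2015', '2016']
listof_people = ['ben', 'dan', 'nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

# One dict per (person, year, month). Years are the outer loop so that
# each person's dates run 1/1/2014, 2/1/2014, ..., 12/1/2016.
rows = [
    {'Name': name, 'date': f'{month}/1/{year}'}
    for name in listof_people
    for year in years
    for month in months
]

# Build the DataFrame in one step instead of growing it row by row.
df = pd.DataFrame(rows)
print(df.head())
```

Here the dates are plain strings in the month/day/year layout the question shows. If real datetime values are needed, they can be converted afterwards with `pd.to_datetime`.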

2 Answers

I think you can use itertools.product to generate all combinations, then to_datetime to build the date column:

from itertools import product
import pandas as pd

days = [1]
months = list(range(1, 13))
years = ['2014', '2015', '2016']
listof_people = ['ben','dan','nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

df1 = pd.DataFrame(list(product(listof_people, months, days, years)))
df1.columns = ['Name', 'month','day','year']
print (df1)
      Name  month  day  year
0      ben      1    1  2014
1      ben      1    1  2015
2      ben      1    1  2016
3      ben      2    1  2014
4      ben      2    1  2015
5      ben      2    1  2016
6      ben      3    1  2014
7      ben      3    1  2015
8      ben      3    1  2016
9      ben      4    1  2014
10     ben      4    1  2015
...
...
df1['date'] = pd.to_datetime(df1[['month','day','year']])
df1 = df1[['Name','date']]
print (df1)
      Name       date
0      ben 2014-01-01
1      ben 2015-01-01
2      ben 2016-01-01
3      ben 2014-02-01
4      ben 2015-02-01
5      ben 2016-02-01
6      ben 2014-03-01
7      ben 2015-03-01
...
...
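The output above cycles through the years before the months, whereas the question asks for each person's dates in calendar order. Here is a minimal sketch of one way to get that order, assuming it is acceptable to replace the separate month, day, and year lists with `pd.date_range` and the month-start frequency `'MS'`. The name `dates` is introduced here for illustration.

```python
from itertools import product

import pandas as pd

listof_people = ['ben', 'dan', 'nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

# First day of every month from January 2014 through December 2016.
dates = pd.date_range('2014-01-01', '2016-12-01', freq='MS')

# product varies its last argument fastest, so each person is paired
# with all 36 dates in chronological order before the next person starts.
df1 = pd.DataFrame(list(product(listof_people, dates)), columns=['Name', 'date'])
print(df1.head())
```

Swapping the argument order in the original call to `product(listof_people, years, months, days)` should give the same row order while keeping the separate lists.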
1 Comment

This helps very much, thank you! Do you know of any resources (even documentation) that explain how you arrived at this answer? I understand the solution, but I'm sure such a resource would save me from asking other questions like this one.
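# Reuses listof_people, years and months from the question.
mux = pd.MultiIndex.from_product(
    [listof_people, years, months],
    names=['Name', 'Year', 'Month'])

# assign accepts a callable, so the date is built from the frame
# produced by reset_index rather than from an unrelated df.
pd.Series(
    1, mux, name='Day'
).reset_index().assign(
    date=lambda d: pd.to_datetime(d[['Year', 'Month', 'Day']])
)[['Name', 'date']]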
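Since the goal in the question is a CSV file, here is a hedged sketch of the last step. It rebuilds the frame with the MultiIndex approach, using a `lambda` inside `assign`, and then serialises it with `DataFrame.to_csv`. The names `result` and `csv_text` and the `%m/%d/%Y` format are assumptions for illustration. Note that this format zero-pads, giving 01/01/2014 rather than 1/1/2014.

```python
import pandas as pd

months = list(range(1, 13))
years = ['2014', '2015', '2016']
listof_people = ['ben', 'dan', 'nathan', 'gary', 'Mark', 'Sean', 'Tim', 'Chris']

# Every (Name, Year, Month) combination, with months varying fastest.
mux = pd.MultiIndex.from_product(
    [listof_people, years, months],
    names=['Name', 'Year', 'Month'])

result = (
    pd.Series(1, index=mux, name='Day')
    .reset_index()
    # The lambda receives the frame returned by reset_index.
    .assign(date=lambda d: pd.to_datetime(d[['Year', 'Month', 'Day']]))
    [['Name', 'date']]
)

# With no path argument, to_csv returns the CSV text as a string.
# Passing a path as the first argument writes a file instead.
csv_text = result.to_csv(index=False, date_format='%m/%d/%Y')
print(csv_text[:60])
```

Dropping the index with `index=False` keeps the file to the two columns shown in the desired output.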
