python pandas dataframe predict values based on date

Question

I have a python pandas dataframe df:

Group   date           Value
  A     01-02-2016     16 
  A     01-03-2016     15 
  A     01-04-2016     14 
  A     01-05-2016     17 
  A     01-06-2016     19 
  A     01-07-2016     20 
  B     01-02-2016     16 
  B     01-03-2016     13 
  B     01-04-2016     13 
  C     01-02-2016     16 
  C     01-03-2016     16

I want to predict the value based on the date; specifically, I want to predict the value on 01-08-2016.

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

#I change the dates to be integers, I am not sure this is the best way    
df['date'] = pd.to_datetime(df['date'])  
df['date_delta'] = (df['date'] - df['date'].min())  / np.timedelta64(1,'D')

#Is this correct? 
model = LinearRegression()
X = df[['date_delta']]
y = df.Value
model.fit(X, y)
model.score(X, y)
coefs = zip(model.coef_, X.columns)
# print() as a function so the snippet also runs on Python 3
print("sl = %.1f + " % model.intercept_ +
      " + ".join("%.1f %s" % coef for coef in coefs))

I am not sure if I am treating the date correctly. Is there a better way?

Can you make sure your code matches the df you provided? (e.g. date and not Date, also df.shown instead of df.Value)
– Julien Marrec, Commented Jan 5, 2017 at 22:14

Community · Accepted Answer · 2017-05-23 12:13:57Z


I don't see any problem with what you're doing. You could use datetime.toordinal instead, but that will give you the same result: the slope is identical, and only the intercept changes because the origin of the date axis is different (that's normal).

# Column names match the df in the question: 'date' and 'Value'
df['date_ordinal'] = df['date'].apply(lambda x: x.toordinal())
model = LinearRegression()
X = df[['date_ordinal']]
y = df.Value
model.fit(X, y)
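
The question asks for the value on 01-08-2016, which the snippet above stops short of. A minimal sketch of that last step follows. It rebuilds the sample frame so it runs on its own, and it assumes the dates are month-first (so 01-08-2016 means 8 January 2016), which is how pd.to_datetime reads them by default.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample data from the question (dates read as month-day-year; an assumption)
df = pd.DataFrame({
    "Group": list("AAAAAABBBCC"),
    "date": ["01-02-2016", "01-03-2016", "01-04-2016", "01-05-2016",
             "01-06-2016", "01-07-2016", "01-02-2016", "01-03-2016",
             "01-04-2016", "01-02-2016", "01-03-2016"],
    "Value": [16, 15, 14, 17, 19, 20, 16, 13, 13, 16, 16],
})
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")
df["date_ordinal"] = df["date"].apply(lambda d: d.toordinal())

# One pooled model over all groups, as in the snippet above
model = LinearRegression()
model.fit(df[["date_ordinal"]], df["Value"])

# The new date must go through the same transformation as the training dates
target = pd.Timestamp("2016-01-08")
X_new = pd.DataFrame({"date_ordinal": [target.toordinal()]})
predicted = model.predict(X_new)[0]
print("Predicted value on {}: {:.2f}".format(target.strftime("%m-%d-%Y"), predicted))
```

Passing a DataFrame with the same column name as the training data keeps scikit-learn from warning about missing feature names.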

If you think there might be daily, weekly, monthly, or seasonal variations, you could add 1-of-K (one-hot) encoded features for them. See this question for example.
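
As a rough illustration of the 1-of-K idea, the sketch below adds day-of-week indicator columns next to the linear trend. The data is synthetic and invented only for this example (four weeks of daily values with a weekend bump), and pd.get_dummies is just one way to build the indicators, not necessarily what the linked question uses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): four weeks of daily values with a trend
# plus a bump on weekends, so there is a weekly pattern to pick up.
dates = pd.date_range("2016-01-04", periods=28, freq="D")
trend = 0.5 * np.arange(len(dates))
weekend_bump = np.where(dates.dayofweek >= 5, 3.0, 0.0)
demo = pd.DataFrame({"date": dates, "Value": 10 + trend + weekend_bump})

# Linear trend feature, as in the answer
demo["date_ordinal"] = demo["date"].apply(lambda d: d.toordinal())

# 1-of-K encoding: one indicator column per day of week (0 = Monday)
dow = pd.get_dummies(demo["date"].dt.dayofweek, prefix="dow", dtype=float)

X_trend = demo[["date_ordinal"]]
X_seasonal = pd.concat([X_trend, dow], axis=1)
y = demo["Value"]

# Compare the fit with and without the day-of-week indicators
score_trend = LinearRegression().fit(X_trend, y).score(X_trend, y)
score_seasonal = LinearRegression().fit(X_seasonal, y).score(X_seasonal, y)
print("R^2, trend only:          {:.3f}".format(score_trend))
print("R^2, trend + day of week: {:.3f}".format(score_seasonal))
```

With only a handful of rows per group, as in the question, there is not enough data to estimate such indicators, so this only becomes useful on longer series.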

Update given your comment

You say you want to get one equation per Group:

In [2]:
results = {}
for (group, df_gp) in df.groupby('Group'):
    print("Dealing with group {}".format(group))
    print("----------------------")
    X = df_gp[['date_ordinal']]
    y = df_gp.Value
    model.fit(X, y)
    print("Score: {:.2f}%".format(100 * model.score(X, y)))

    # Store (name, value) pairs so the coefficients can be reused later
    coefs = list(zip(X.columns, model.coef_))
    results[group] = [('intercept', model.intercept_)] + coefs

    # Print the fitted equation as "intercept + slope feature"
    coefs = zip(model.coef_, X.columns)
    print("sl = %.1f + " % model.intercept_ +
          " + ".join("%.1f %s" % coef for coef in coefs))

    print("\n")

Out[2]:
Dealing with group A
----------------------
Score: 65.22%
sl = -735950.7 + 1.0 date_ordinal


Dealing with group B
----------------------
Score: 75.00%
sl = 1103963.0 + -1.5 date_ordinal


Dealing with group C
----------------------
Score: 100.00%
sl = 16.0 + 0.0 date_ordinal

You also have them in a convenient dict:

In [3]: results
Out[3]:
{'A': [('intercept', -735950.66666666663), ('date_ordinal', 1.0)],
 'B': [('intercept', 1103962.9999999995),
  ('date_ordinal', -1.4999999999999993)],
 'C': [('intercept', 16.0), ('date_ordinal', 0.0)]}
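
One possible way to use the stored coefficients is sketched below. The dict is copied from Out[3] above. The target date 01-10-2016 is assumed to be month-first (10 January 2016), and the helper predict_from_results and the column name predicted are made up for the example.

```python
import pandas as pd

# Coefficients as printed in Out[3] above
results = {
    'A': [('intercept', -735950.66666666663), ('date_ordinal', 1.0)],
    'B': [('intercept', 1103962.9999999995),
          ('date_ordinal', -1.4999999999999993)],
    'C': [('intercept', 16.0), ('date_ordinal', 0.0)],
}

target = pd.Timestamp("2016-01-10")  # assumes month-first: 10 January 2016
x_new = target.toordinal()

def predict_from_results(entries, x):
    """Evaluate intercept + slope * x from one group's (name, value) pairs."""
    params = dict(entries)
    return params["intercept"] + params["date_ordinal"] * x

predictions = {group: predict_from_results(entries, x_new)
               for group, entries in results.items()}

# Attach the per-group prediction to every row of that group
df = pd.DataFrame({"Group": list("AAAAAABBBCC")})
df["predicted"] = df["Group"].map(predictions)
print(df.drop_duplicates("Group"))
```

With only two or three points in groups B and C, these extrapolations are illustrative rather than reliable.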

edited May 23, 2017 at 12:13


answered Jan 5, 2017 at 22:21

Julien Marrec


7 Comments

jeangelj Over a year ago

Thank you very much, Julien. Is there a way to use this model by group?

Julien Marrec Over a year ago

What's the result you expect? One equation per group? Or one equation with a representation of group in it?

jeangelj Over a year ago

I am looking for one equation per group.

Julien Marrec Over a year ago

Did that answer your question? If so please upvote and accept my answer.

jeangelj Over a year ago

Thank you. Is there a way to use the formulas and put the results for 01-10-2016, by group, in a new column?
