I have time series data which are multi-indexed on (Year, Month) as seen here:

print(df.index)
print(df)
MultiIndex(levels=[[2016, 2017], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0], [2, 3, 4, 5, 6, 7, 8, 9]],
           names=['Year', 'Month'])
            Value
Year Month            
2016 3       65.018150
     4       63.130035
     5       71.071254
     6       72.127967
     7       67.357795
     8       66.639228
     9       64.815232
     10      68.387698

I want to do very basic linear regression on these time series data. Because pandas.DataFrame.plot does not do any regression, I intend to use Seaborn to do my plotting.

I attempted to do this by using lmplot:

sns.lmplot(x=("Year", "Month"), y="Value", data=df, fit_reg=True) 

but I get an error:

TypeError: '>' not supported between instances of 'str' and 'tuple'

This is particularly interesting to me because all elements in df.index.levels[:] are of type numpy.int64, all elements in df.index.labels[:] are of type numpy.int8.

Why am I receiving this error? How can I resolve it?

2 Answers

10

lmplot expects x and y to name columns of data. Index levels are not columns, so the tuple ("Year", "Month") cannot be resolved. You can use reset_index to turn the DataFrame's index levels into columns. Plotting DataFrame columns is then straightforward with seaborn.

I assume the reason for using lmplot is to show a separate regression for each year (otherwise regplot may be better suited), so the "Year" column can be used as hue.

import numpy as np
import pandas as pd
import seaborn as sns  # seaborn.apionly was removed in later seaborn releases
import matplotlib.pyplot as plt

iterables = [[2016, 2017], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]]
index = pd.MultiIndex.from_product(iterables, names=['Year', 'Month'])
df = pd.DataFrame({"values":np.random.rand(24)}, index=index)

df2 = df.reset_index()  # or, df.reset_index(inplace=True) if df is not required otherwise 

g = sns.lmplot(x="Month", y="values", data=df2, hue="Year")

plt.show()
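If a single regression over the whole series is wanted instead of one per year, the regplot route mentioned above could look roughly like the following. This is only a sketch: the column name `t` (months elapsed since the first observation) is an assumption introduced here for illustration, and the data are the eight rows from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import pandas as pd
import seaborn as sns

# Rebuild the frame from the question: 2016, months 3 through 10
index = pd.MultiIndex.from_product([[2016], range(3, 11)], names=["Year", "Month"])
values = [65.018150, 63.130035, 71.071254, 72.127967,
          67.357795, 66.639228, 64.815232, 68.387698]
df = pd.DataFrame({"Value": values}, index=index)

# Move the index levels into ordinary columns
flat = df.reset_index()

# Hypothetical helper column: months elapsed since the first row,
# so the x axis is evenly spaced even across a year boundary
flat["t"] = ((flat["Year"] - flat["Year"].iloc[0]) * 12
             + (flat["Month"] - flat["Month"].iloc[0]))

# regplot draws the scatter plus one fitted line on a single Axes
ax = sns.regplot(x="t", y="Value", data=flat)
```

Unlike lmplot, regplot returns a plain matplotlib Axes, which makes it easier to combine with other plots on the same figure.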

4

Consider the following approach:

df['x'] = df.index.get_level_values(0) + df.index.get_level_values(1)/100

yields:

In [49]: df
Out[49]:
                Value        x
Year Month
2016 3      65.018150  2016.03
     4      63.130035  2016.04
     5      71.071254  2016.05
     6      72.127967  2016.06
     7      67.357795  2016.07
     8      66.639228  2016.08
     9      64.815232  2016.09
     10     68.387698  2016.10

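Let's prepare the x-tick labels and place one tick at each data point so that the labels line up with the data:

labels = df.index.get_level_values(0).astype(str) + '-' + \
         df.index.get_level_values(1).astype(str).str.zfill(2)

sns.lmplot(x='x', y='Value', data=df, fit_reg=True)
ax = plt.gca()
ax.set_xticks(df['x'])  # without explicit positions, the labels would be attached to the default ticks
ax.set_xticklabels(labels)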
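One caveat with `Year + Month/100`: consecutive months within a year are 0.01 apart, but December to January jumps by 0.89, which would distort a regression over data spanning more than one year. A possible variant, sketched here with made-up random data, encodes the month as a fraction of a year so every step is 1/12. The use of `np.polyfit` to read off the fitted line is an addition for illustration and not part of the approach above.

```python
import numpy as np
import pandas as pd

# Made-up data covering two full years, indexed like the question's frame
index = pd.MultiIndex.from_product([[2016, 2017], range(1, 13)],
                                   names=["Year", "Month"])
rng = np.random.default_rng(0)
df = pd.DataFrame({"Value": rng.random(24)}, index=index)

years = df.index.get_level_values("Year").to_numpy()
months = df.index.get_level_values("Month").to_numpy()

# Fractional year: January 2016 -> 2016.0, February 2016 -> 2016.0833...
df["x"] = years + (months - 1) / 12

# Least-squares line through (x, Value); polyfit returns the highest degree first
slope, intercept = np.polyfit(df["x"].to_numpy(), df["Value"].to_numpy(), deg=1)

# Every consecutive pair of rows is now exactly one month (1/12 year) apart
steps = np.diff(df["x"].to_numpy())
```

The same `x` column can be passed to `sns.lmplot(x='x', y='Value', data=df)` in place of the `Year + Month/100` version.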

