2,852 questions
138
votes
6
answers
290k
views
Run an OLS regression with Pandas Data Frame
I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example:
import pandas as pd
df = pd.DataFrame({"A": [10,20,30,...
88
votes
10
answers
101k
views
auto.arima() equivalent for python
I am trying to predict weekly sales using ARMA/ARIMA models. I could not find a function for tuning the order (p, d, q) in statsmodels. Currently R has a function forecast::auto.arima() which will tune ...
69
votes
9
answers
135k
views
Variance Inflation Factor in Python
I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in python:
a b c d
1 2 4 4
1 2 6 3
2 3 7 4
3 2 8 5
4 1 9 4
I have already done this in R using the vif ...
61
votes
6
answers
89k
views
Why do I get only one parameter from a statsmodels OLS fit
Here is what I am doing:
$ python
Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
>>> import statsmodels.api as sm
>>> ...
51
votes
5
answers
144k
views
How to extract the regression coefficient from statsmodels.api?
result = sm.OLS(gold_lookback, silver_lookback ).fit()
After I get the result, how can I get the coefficient and the constant?
In other words, if
y = ax + c
how to get the values a and c?
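The arithmetic behind a and c in y = ax + c can be sketched without any library. This is a minimal illustration of the closed-form least-squares solution, not the asker's code: the function name `fit_line` and the sample data are hypothetical. In statsmodels itself, the fitted values are read from `result.params`, and an intercept is only estimated when a constant column has been added to the regressors (for example with `sm.add_constant`).

```python
def fit_line(x, y):
    """Return (a, c) minimising the squared error of y = a*x + c."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: covariance of x and y divided by the variance of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    a = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - a * mean_x
    return a, c


# Hypothetical data lying exactly on y = 2x + 1.
a, c = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, c)
```

Fitting without a constant column forces the line through the origin, which is one common reason for getting a single parameter back.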
49
votes
5
answers
50k
views
Print 'std err' value from statsmodels OLS results
(Sorry to ask but http://statsmodels.sourceforge.net/ is currently down and I can't access the docs)
I'm doing a linear regression using statsmodels, basically:
import statsmodels.api as sm
model = ...
48
votes
11
answers
64k
views
Where can I find mad (mean absolute deviation) in scipy?
It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers:
http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473
However, ...
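Since "mad" is used for both the mean absolute deviation and the median absolute deviation, a small standard-library sketch of each may help tell them apart. The function names and sample data below are hypothetical and are not taken from scipy or statsmodels.

```python
from statistics import mean, median


def mean_abs_deviation(values):
    """Average absolute distance of each value from the mean."""
    centre = mean(values)
    return mean(abs(v - centre) for v in values)


def median_abs_deviation(values):
    """Median absolute distance of each value from the median (unscaled)."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


data = [1, 2, 3, 4, 5]  # hypothetical sample
print(mean_abs_deviation(data))
print(median_abs_deviation(data))
```

The median-based version is the more robust of the two; library implementations often multiply it by a normalising constant, which this sketch leaves out.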
41
votes
7
answers
24k
views
Highest Posterior Density Region and Central Credible Region
Given a posterior p(Θ|D) over some parameters Θ, one can define the following:
Highest Posterior Density Region:
The Highest Posterior Density Region is the set of most probable values of Θ that, in ...
38
votes
1
answer
44k
views
ANOVA in python using pandas dataframe with statsmodels or scipy?
I want to use the Pandas dataframe to break down the variance in one variable.
For example, if I have a column called 'Degrees', and I have this indexed for various dates, cities, and night vs. day, I ...
35
votes
2
answers
15k
views
Pandas rolling regression: alternatives to looping
I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20.
The question of how to run rolling OLS ...
33
votes
2
answers
69k
views
How to plot statsmodels linear regression (OLS) cleanly
Problem Statement:
I have some nice data in a pandas dataframe. I'd like to run simple linear regression on it:
Using statsmodels, I perform my regression. Now, how do I get my plot? I've tried ...
32
votes
3
answers
36k
views
OLS Regression: Scikit vs. Statsmodels? [closed]
Short version: I was using the scikit LinearRegression on some data, but I'm used to p-values so put the data into the statsmodels OLS, and although the R^2 is about the same the variable coefficients ...
31
votes
3
answers
19k
views
statsmodels linear regression - patsy formula to include all predictors in model
Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include ...
30
votes
2
answers
46k
views
Error: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting
So I have a CSV file with two columns: date and price, but when I tried to use ARIMA on that time series I encountered this error:
ValueWarning: A date index has been provided, but it has no ...
29
votes
2
answers
34k
views
Capturing high multi-collinearity in statsmodels
Say I fit a model in statsmodels
mod = smf.ols('dependent ~ first_category + second_category + other', data=df).fit()
When I do mod.summary() I may see the following:
Warnings:
[1] The condition ...
27
votes
4
answers
22k
views
Specifying which category to treat as the base with 'statsmodels'
I understand that when I have a category variable in a model passed to a statsmodels fit, dummy variables will automatically be generated for the categories. For example if I have a variable '...
25
votes
4
answers
54k
views
ImportError: cannot import name 'factorial'
I want to use a logit model and am trying to import the statsmodels library.
My Version: Python 3.6.8
The best suggestion I got is to downgrade scipy but unclear how to and to what version should I ...
25
votes
2
answers
58k
views
How to get the regression intercept using Statsmodels.api
I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use the library:
import statsmodels.api as sm
It prints all the regression analysis ...
25
votes
2
answers
30k
views
Statsmodels ARIMA - Different results using predict() and forecast()
I use ARIMA from statsmodels package in order to predict values from a series:
plt.plot(ind, final_results.predict(start=0 ,end=26))
plt.plot(ind, forecast.values)
plt.show()
I thought that I would ...
24
votes
3
answers
17k
views
Any Python Library Produces Publication Style Regression Tables
I've been using Python for regression analysis. After getting the regression results, I need to summarize all the results into one single table and convert them to LaTeX (for publication). Is there ...
24
votes
3
answers
66k
views
logit regression and singular Matrix error in Python
I am trying to run a logit regression on the German credit data (www4.stat.ncsu.edu/~boos/var.select/german.credit.html). To test the code, I have used only numerical variables and tried regressing it with ...
22
votes
5
answers
23k
views
Changing fig size with statsmodel
I am trying to make QQ-plots using the statsmodel package. However, the resolution of the figure is so low that I could not possibly use the results in a presentation.
I know that to make networkX ...
22
votes
1
answer
23k
views
add_constant() in statsmodels not working
I try to use the add_constant() function with an array of dataset. At index 59 it works (the column is created) but at index 60 it isn't created. Initially, testmat[59] returns a shape of (24, 54) and ...
22
votes
2
answers
10k
views
Poisson Regression in statsmodels and R
Given some randomly generated data with
2 columns,
50 rows and
integer range between 0-100
With R, the poisson glm and diagnostics plot can be achieved as such:
> col=2
> row=50
> ...
21
votes
3
answers
43k
views
How to interpret adfuller test results? [closed]
I am struggling to understand the concept of p-value and the various other results of adfuller test.
The code I am using:
(I found this code in Stack Overflow)
import numpy as np
import os
import ...