138 votes
6 answers
290k views

I have a pandas data frame and I would like to be able to predict the values of column A from the values in columns B and C. Here is a toy example: import pandas as pd df = pd.DataFrame({"A": [10,20,30,...
Michael (14k reputation)
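For the question above, here is a minimal sketch that assumes an ordinary least-squares fit is what's wanted. The question's toy data is truncated, so the values here are hypothetical. They are chosen so that A = 1 + 2*B + 3*C exactly. The statsmodels formula interface adds an intercept automatically, and `predict` takes a DataFrame with the same predictor columns.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data (the question's values are truncated): A = 1 + 2*B + 3*C exactly.
df = pd.DataFrame({
    "A": [9.0, 8.0, 19.0, 18.0, 26.0],
    "B": [1.0, 2.0, 3.0, 4.0, 5.0],
    "C": [2.0, 1.0, 4.0, 3.0, 5.0],
})

# The formula interface adds an intercept term automatically.
model = smf.ols("A ~ B + C", data=df).fit()
print(model.params)  # indexed by Intercept, B, C

# Predict A for new rows that have the same predictor columns.
new_rows = pd.DataFrame({"B": [6.0], "C": [1.0]})
predictions = model.predict(new_rows)
print(predictions)
```

With real, noisy data the fitted coefficients will not be exact, and a linear model is only one possible choice of predictor.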
88 votes
10 answers
101k views

I am trying to predict weekly sales using ARMA/ARIMA models. I could not find a function for tuning the order(p,d,q) in statsmodels. Currently R has a function forecast::auto.arima() which will tune ...
Ajax (1,749 reputation)
69 votes
9 answers
135k views

I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in Python: a b c d 1 2 4 4 1 2 6 3 2 3 7 4 3 2 8 5 4 1 9 4 I have already done this in R using the vif ...
Nizag (999 reputation)
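The VIF for column j is defined as 1 / (1 - R^2_j), where R^2_j is the R^2 from regressing column j on all the other columns. Below is a minimal NumPy sketch that computes it from that definition on the question's data. It assumes each auxiliary regression includes an intercept. If you use statsmodels' `variance_inflation_factor` instead, note that it regresses on the exog columns exactly as passed. In most cases that means a constant column has to be added to the matrix first.

```python
import numpy as np
import pandas as pd

# The question's five rows, one column per variable.
df = pd.DataFrame({
    "a": [1, 1, 2, 3, 4],
    "b": [2, 2, 3, 2, 1],
    "c": [4, 6, 7, 8, 9],
    "d": [4, 3, 4, 5, 4],
})

def vif(frame: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j
    on all other columns plus an intercept (ordinary least squares)."""
    X = frame.to_numpy(dtype=float)
    n = X.shape[0]
    out = {}
    for j, name in enumerate(frame.columns):
        y = X[:, j]
        # Design matrix: intercept column followed by every other variable.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        centered = y - y.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[name] = 1.0 / (1.0 - r2)
    return pd.Series(out)

print(vif(df))
```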
61 votes
6 answers
89k views

Here is what I am doing: $ python Python 2.7.6 (v2.7.6:3a1db0d2747e, Nov 10 2013, 00:42:54) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin >>> import statsmodels.api as sm >>&...
Tom (2,999 reputation)
51 votes
5 answers
144k views

result = sm.OLS(gold_lookback, silver_lookback).fit() After I get the result, how can I get the coefficient and the constant? In other words, if y = ax + c, how do I get the values a and c?
JOHN (1,521 reputation)
49 votes
5 answers
50k views

(Sorry to ask but http://statsmodels.sourceforge.net/ is currently down and I can't access the docs) I'm doing a linear regression using statsmodels, basically: import statsmodels.api as sm model = ...
Gabriel (43k reputation)
48 votes
11 answers
64k views

It seems scipy once provided a function mad to calculate the mean absolute deviation for a set of numbers: http://projects.scipy.org/scipy/browser/trunk/scipy/stats/models/utils.py?rev=3473 However, ...
Ton van den Heuvel
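"MAD" can mean either the mean absolute deviation or the median absolute deviation, so it helps to be explicit about which one you need. Below is a minimal NumPy sketch of both, computed from their definitions with no scaling constant. As far as I know, `statsmodels.robust.scale.mad` computes the median version but by default divides by about 0.6745 so that it estimates the standard deviation for normal data.

```python
import numpy as np

def mean_abs_dev(x):
    """Mean absolute deviation around the mean: mean(|x - mean(x)|)."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(np.abs(x - x.mean())))

def median_abs_dev(x):
    """Median absolute deviation around the median (unscaled)."""
    x = np.asarray(x, dtype=float)
    return float(np.median(np.abs(x - np.median(x))))

data = [1, 2, 3, 4, 10]  # hypothetical sample with one outlier
print(mean_abs_dev(data))
print(median_abs_dev(data))
```

On this sample the mean absolute deviation is 2.4 and the median absolute deviation is 1.0. The gap shows how much less the median version reacts to the outlier.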
41 votes
7 answers
24k views

Given a posterior p(Θ|D) over some parameters Θ, one can define the following: Highest Posterior Density Region: The Highest Posterior Density Region is the set of most probable values of Θ that, in ...
Amelio Vazquez-Reina
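When the posterior is only available as samples and is unimodal, a common estimate of the HPD region is the shortest interval that contains the requested fraction of the sorted samples. The sketch below assumes unimodality. For a multimodal posterior the HPD region can be a union of disjoint intervals, which this approach does not capture. `hpd_interval` is a hypothetical helper, and the gamma draws stand in for real posterior samples.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the samples (unimodal assumption)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))  # number of samples the interval must cover
    if not 1 <= k <= n:
        raise ValueError("mass must be in (0, 1]")
    # Width of every window of k consecutive sorted samples.
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return x[i], x[i + k - 1]

rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # hypothetical skewed posterior
print(hpd_interval(draws, 0.95))
print(np.quantile(draws, [0.025, 0.975]))  # equal-tailed interval, for comparison
```

For a skewed posterior like this one, the HPD interval shifts toward the mode compared with the equal-tailed interval.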
38 votes
1 answer
44k views

I want to use the Pandas dataframe to break down the variance in one variable. For example, if I have a column called 'Degrees', and I have this indexed for various dates, cities, and night vs. day, I ...
wolfsatthedoor
35 votes
2 answers
15k views

I got good use out of pandas' MovingOLS class (source here) within the deprecated stats/ols module. Unfortunately, it was gutted completely with pandas 0.20. The question of how to run rolling OLS ...
Brad Solomon (41.2k reputation)
33 votes
2 answers
69k views

Problem Statement: I have some nice data in a pandas dataframe. I'd like to run simple linear regression on it: Using statsmodels, I perform my regression. Now, how do I get my plot? I've tried ...
Alex Lenail (14.7k reputation)
32 votes
3 answers
36k views

Short version: I was using the scikit-learn LinearRegression on some data, but I'm used to p-values so put the data into the statsmodels OLS, and although the R^2 is about the same the variable coefficients ...
Nat Poor (461 reputation)
31 votes
3 answers
19k views

Say I have a dataframe (let's call it DF) where y is the dependent variable and x1, x2, x3 are my independent variables. In R I can fit a linear model using the following code, and the . will include ...
Greg (7,161 reputation)
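To my knowledge, the patsy formulas that statsmodels uses do not support R's `.` shorthand. A common workaround is to build the formula string from the column names. Below is a minimal sketch. `all_predictors_formula` is a hypothetical helper, and the data are placeholders for the question's DF.

```python
import pandas as pd

def all_predictors_formula(frame: pd.DataFrame, response: str) -> str:
    """Build 'response ~ x1 + x2 + ...' from every column except the response.

    Column names must be valid Python identifiers; otherwise wrap them in Q("...").
    """
    predictors = [col for col in frame.columns if col != response]
    return f"{response} ~ " + " + ".join(predictors)

# Hypothetical stand-in for the question's DF.
DF = pd.DataFrame({
    "y": [1.0, 2.0, 3.5, 4.0, 5.5],
    "x1": [0.5, 1.0, 1.5, 2.0, 2.5],
    "x2": [3.0, 1.0, 4.0, 1.0, 5.0],
    "x3": [2.0, 7.0, 1.0, 8.0, 2.0],
})

formula = all_predictors_formula(DF, "y")
print(formula)
```

You can then pass the string to the formula API, for example `smf.ols(formula, data=DF).fit()` with `import statsmodels.formula.api as smf`.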
30 votes
2 answers
46k views

So I have a CSV file with two columns: date and price, but when I tried to use ARIMA on that time series I encountered this error: ValueWarning: A date index has been provided, but it has no ...
Dorki (1,199 reputation)
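This warning usually means that the DatetimeIndex has no `freq` set. The minimal pandas sketch below assumes daily observations and the question's `date`/`price` columns, with hypothetical values. It parses the dates, sets them as the index, and gives the index an explicit frequency with `asfreq`. Any missing dates become NaN rows, so they have to be filled deliberately before fitting. Forward-filling, shown below, is only one option.

```python
import pandas as pd

# Hypothetical stand-in for the CSV with `date` and `price` columns.
df = pd.DataFrame({
    "date": ["2021-01-01", "2021-01-02", "2021-01-04"],
    "price": [10.0, 10.5, 11.0],
})

df["date"] = pd.to_datetime(df["date"])
series = df.set_index("date")["price"]
print(series.index.freq)  # None: no frequency information on the index

# Give the index an explicit daily frequency; the missing 2021-01-03 becomes NaN.
series = series.asfreq("D")
series = series.ffill()  # one way to fill the gap; choose what suits the data
print(series.index.freq)
```

The resulting `series` can then be passed to the ARIMA model in place of the raw column.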
29 votes
2 answers
34k views

Say I fit a model in statsmodels mod = smf.ols('dependent ~ first_category + second_category + other', data=df).fit() When I do mod.summary() I may see the following: Warnings: [1] The condition ...
Amelio Vazquez-Reina
27 votes
4 answers
22k views

I understand that when I have a category variable in a model passed to a statsmodels fit, dummy variables will automatically be generated for the categories. For example if I have a variable '...
orome (49.2k reputation)
25 votes
4 answers
54k views

I want to use a logit model and am trying to import the statsmodels library. My version: Python 3.6.8. The best suggestion I got is to downgrade scipy but unclear how to and to what version should I ...
Bhavya Geethika
25 votes
2 answers
58k views

I am trying to calculate a regression output using a Python library, but I am unable to get the intercept value when I use the library: import statsmodels.api as sm It prints all the regression analysis ...
Shank (685 reputation)
25 votes
2 answers
30k views

I use ARIMA from statsmodels package in order to predict values from a series: plt.plot(ind, final_results.predict(start=0 ,end=26)) plt.plot(ind, forecast.values) plt.show() I thought that I would ...
Simone (4,980 reputation)
24 votes
3 answers
17k views

I've been using Python for regression analysis. After getting the regression results, I need to summarize all the results into one single table and convert them to LaTeX (for publication). Is there ...
Titanic (557 reputation)
24 votes
3 answers
66k views

I am trying to run a logit regression for the German credit data (www4.stat.ncsu.edu/~boos/var.select/german.credit.html). To test the code, I have used only numerical variables and tried regressing it with ...
user3122731
22 votes
5 answers
23k views

I am trying to make QQ-plots using the statsmodels package. However, the resolution of the figure is so low that I could not possibly use the results in a presentation. I know that to make NetworkX ...
mlg4080 (423 reputation)
22 votes
1 answer
23k views

I am trying to use the add_constant() function on an array of datasets. At index 59 it works (the column is created) but at index 60 it isn't created. Initially, testmat[59] returns a shape of (24, 54) and ...
florian (1,021 reputation)
22 votes
2 answers
10k views

Given some randomly generated data with 2 columns, 50 rows and integers in the range 0-100: With R, the poisson glm and diagnostics plot can be achieved as such: > col=2 > row=50 > ...
alvas (123k reputation)
21 votes
3 answers
43k views

I am struggling to understand the concept of p-value and the various other results of the adfuller test. The code I am using (I found this code on Stack Overflow): import numpy as np import os import ...
Sid (4,075 reputation)
