I have a logistic regression model built with the statsmodels library. I want to get its marginal effects and then plot them. However, get_margeff() gives me a table that is all NaN.
import statsmodels.api as sm

Xtrain = df_3[["USGoal", "videolink", "min_USPledge", "max_USPledge", "npledges",
               "nbackers", "ProjectDuration", "Staff_pick1", "art", "comics",
               "crafts", "dance", "design", "fashion", "film & video", "food",
               "games", "journalism", "music", "photography", "publishing",
               "theater", "OC", "EU", "AS", "SA", "AF"]]
ytrain = df_3[["Success"]]
Xtrain = Xtrain.astype(int)
ytrain = ytrain.astype(int)
Xtrain_with_constant = sm.add_constant(Xtrain)

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain_with_constant).fit(cov_type="hc0")
Some of my independent variables are continuous and some are binary.
When I try to get marginal effects with the get_margeff() function, it gives me this table:
# Compute the marginal effects
marginal_effects = log_reg.get_margeff()
print(marginal_effects.summary())
Logit Marginal Effects
=====================================
Dep. Variable: Success
Method: dydx
At: overall
===================================================================================
dy/dx std err z P>|z| [0.025 0.975]
-----------------------------------------------------------------------------------
USGoal nan nan nan nan nan nan
videolink nan nan nan nan nan nan
min_USPledge nan nan nan nan nan nan
max_USPledge nan nan nan nan nan nan
npledges nan nan nan nan nan nan
nbackers nan nan nan nan nan nan
ProjectDuration nan nan nan nan nan nan
Staff_pick1 nan nan nan nan nan nan
art nan nan nan nan nan nan
comics nan nan nan nan nan nan
crafts nan nan nan nan nan nan
dance nan nan nan nan nan nan
design nan nan nan nan nan nan
fashion nan nan nan nan nan nan
film & video nan nan nan nan nan nan
food nan nan nan nan nan nan
games nan nan nan nan nan nan
journalism nan nan nan nan nan nan
music nan nan nan nan nan nan
photography nan nan nan nan nan nan
publishing nan nan nan nan nan nan
theater nan nan nan nan nan nan
OC nan nan nan nan nan nan
EU nan nan nan nan nan nan
AS nan nan nan nan nan nan
SA nan nan nan nan nan nan
AF nan nan nan nan nan nan
===================================================================================
Every example on the internet gets results directly. I found a half-solution: if I add the parameters count=True for the continuous variables and dummy=True for the dummy variables, I do get dy/dx values (and I'm not sure that is the correct method), but I still can't get any of the other columns like the std err or z. I don't have any problems with the output of my logistic regression itself, or with other tests/calculations on it:
Logit Regression Results
==============================================================================
Dep. Variable: Success No. Observations: 1996
Model: Logit Df Residuals: 1968
Method: MLE Df Model: 27
Date: Wed, 07 Jun 2023 Pseudo R-squ.: 0.2144
Time: 22:18:07 Log-Likelihood: -1058.3
converged: True LL-Null: -1347.1
Covariance Type: hc0 LLR p-value: 1.244e-104
===================================================================================
coef std err z P>|z| [0.025 0.975]
-----------------------------------------------------------------------------------
const -0.2847 0.300 -0.950 0.342 -0.872 0.303
USGoal -2.15e-05 1.34e-05 -1.609 0.108 -4.77e-05 4.69e-06
videolink 0.2493 0.128 1.953 0.051 -0.001 0.499
min_USPledge 0.0018 0.002 0.771 0.441 -0.003 0.006
max_USPledge -9.086e-05 2.97e-05 -3.064 0.002 -0.000 -3.27e-05
npledges 0.0479 0.012 3.978 0.000 0.024 0.072
nbackers 0.0015 0.002 0.917 0.359 -0.002 0.005
ProjectDuration -0.0173 0.004 -3.881 0.000 -0.026 -0.009
Staff_pick1 1.8564 0.266 6.977 0.000 1.335 2.378
art 0.7794 0.296 2.629 0.009 0.198 1.360
comics 1.2965 0.412 3.147 0.002 0.489 2.104
crafts 0.7756 0.463 1.676 0.094 -0.131 1.682
dance 2.2466 0.615 3.654 0.000 1.042 3.452
design 2.2163 0.437 5.074 0.000 1.360 3.072
fashion 0.2082 0.321 0.649 0.516 -0.420 0.837
film & video 1.2599 0.246 5.118 0.000 0.777 1.742
food 0.0235 0.286 0.082 0.935 -0.537 0.584
games 0.8746 0.358 2.440 0.015 0.172 1.577
journalism -0.4274 0.484 -0.884 0.377 -1.375 0.520
music 0.9218 0.293 3.150 0.002 0.348 1.495
photography 1.2375 0.431 2.871 0.004 0.393 2.082
publishing -0.1861 0.280 -0.665 0.506 -0.734 0.362
theater 2.1847 0.504 4.336 0.000 1.197 3.172
OC -0.3697 0.357 -1.036 0.300 -1.069 0.330
EU -0.1767 0.147 -1.204 0.228 -0.464 0.111
AS 0.9400 0.508 1.851 0.064 -0.056 1.935
SA 0.0634 1.074 0.059 0.953 -2.042 2.169
AF -0.6389 1.186 -0.539 0.590 -2.963 1.685
===================================================================================
Also, I got 4 different RuntimeWarnings:

RuntimeWarning: invalid value encountered in divide: return np.exp(-X)/(1+np.exp(-X))**2
RuntimeWarning: invalid value encountered in square: return np.exp(-X)/(1+np.exp(-X))**2
RuntimeWarning: overflow encountered in exp: return 1/(1+np.exp(-X))
RuntimeWarning: overflow encountered in exp: return np.exp(-X)/(1+np.exp(-X))**2
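These warnings come from the logistic pdf formula quoted in the tracebacks, np.exp(-X)/(1+np.exp(-X))**2, which evaluates to inf/inf = nan once the linear predictor X is a large negative number, e.g. when a regressor like USGoal takes values in the hundreds of thousands. A minimal numpy reproduction of the same arithmetic (not statsmodels code):

```python
import numpy as np

x = np.array([0.0, 50.0, -800.0])  # -800 mimics a huge linear predictor

# the formula from the warnings: overflows for large negative x
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(-x) / (1 + np.exp(-x)) ** 2

# numerically stable equivalent: compute p = sigmoid(x) without
# ever exponentiating a large positive number, then use p * (1 - p)
e = np.exp(-np.abs(x))                       # always in (0, 1]
p = np.where(x >= 0, 1 / (1 + e), e / (1 + e))
stable = p * (1 - p)

print(naive)   # last entry is nan: exp(800) -> inf, then inf/inf
print(stable)  # finite everywhere
```

This is exactly the failure mode the comment below guesses at: one overflow produces a nan that then propagates through the covariance calculation into every row of the marginal-effects table.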
Does summary() look ok and not have nans? Do you get the same nans when you use the standard nonrobust cov_type? Do you get any RuntimeWarnings, e.g. zero division, ...? (It's difficult to guess without having a reproducible example.) Check log_reg.predict() for nans and for values close to zero or one. Also, try margeff at the means instead of "overall". My guess is that an overflow occurs when computing the prediction or the margeff, maybe at points where the predicted variance is zero, which causes a nan that propagates to all results.