
I have a logistic regression model built with the statsmodels library. I want to compute its marginal effects and then plot them. However, get_margeff() returns a table in which every value is NaN.


import statsmodels.api as sm

Xtrain = df_3[["USGoal", "videolink", "min_USPledge", "max_USPledge", "npledges", "nbackers", "ProjectDuration", "Staff_pick1",
               "art", "comics", "crafts", "dance", "design", "fashion", "film & video", "food", "games", "journalism", "music", "photography", "publishing",
               "theater", "OC", "EU", "AS", "SA", "AF"]]
ytrain = df_3[['Success']]
Xtrain = Xtrain.astype(int)
ytrain = ytrain.astype(int)

Xtrain_with_constant = sm.add_constant(Xtrain)


# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain_with_constant).fit(cov_type="hc0")


Some of my independent variables are continuous and some are binary.

When I try to get the marginal effects with get_margeff(), it gives me this table:

# Compute the marginal effects
marginal_effects = log_reg.get_margeff()
print(marginal_effects.summary())
        Logit Marginal Effects       
=====================================
Dep. Variable:                Success
Method:                          dydx
At:                           overall
===================================================================================
                     dy/dx    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
USGoal                 nan        nan        nan        nan         nan         nan
videolink              nan        nan        nan        nan         nan         nan
min_USPledge           nan        nan        nan        nan         nan         nan
max_USPledge           nan        nan        nan        nan         nan         nan
npledges               nan        nan        nan        nan         nan         nan
nbackers               nan        nan        nan        nan         nan         nan
ProjectDuration        nan        nan        nan        nan         nan         nan
Staff_pick1            nan        nan        nan        nan         nan         nan
art                    nan        nan        nan        nan         nan         nan
comics                 nan        nan        nan        nan         nan         nan
crafts                 nan        nan        nan        nan         nan         nan
dance                  nan        nan        nan        nan         nan         nan
design                 nan        nan        nan        nan         nan         nan
fashion                nan        nan        nan        nan         nan         nan
film & video           nan        nan        nan        nan         nan         nan
food                   nan        nan        nan        nan         nan         nan
games                  nan        nan        nan        nan         nan         nan
journalism             nan        nan        nan        nan         nan         nan
music                  nan        nan        nan        nan         nan         nan
photography            nan        nan        nan        nan         nan         nan
publishing             nan        nan        nan        nan         nan         nan
theater                nan        nan        nan        nan         nan         nan
OC                     nan        nan        nan        nan         nan         nan
EU                     nan        nan        nan        nan         nan         nan
AS                     nan        nan        nan        nan         nan         nan
SA                     nan        nan        nan        nan         nan         nan
AF                     nan        nan        nan        nan         nan         nan
===================================================================================

Every example I find online gets results directly. I found a partial workaround: adding the parameters count=True for continuous and dummy=True for dummy variables. However, even though I then get dy/dx results (and I'm not sure this is the correct method), I still get nothing in the other columns, such as std err or z. The output of my logistic regression looks fine, and so do the other tests and calculations I ran on it.

                           Logit Regression Results
==============================================================================
Dep. Variable:                Success   No. Observations:                 1996
Model:                          Logit   Df Residuals:                     1968
Method:                           MLE   Df Model:                           27
Date:                Wed, 07 Jun 2023   Pseudo R-squ.:                  0.2144
Time:                        22:18:07   Log-Likelihood:                -1058.3
converged:                       True   LL-Null:                       -1347.1
Covariance Type:                  hc0   LLR p-value:                1.244e-104
===================================================================================
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
const              -0.2847      0.300     -0.950      0.342      -0.872       0.303
USGoal           -2.15e-05   1.34e-05     -1.609      0.108   -4.77e-05    4.69e-06
videolink           0.2493      0.128      1.953      0.051      -0.001       0.499
min_USPledge        0.0018      0.002      0.771      0.441      -0.003       0.006
max_USPledge    -9.086e-05   2.97e-05     -3.064      0.002      -0.000   -3.27e-05
npledges            0.0479      0.012      3.978      0.000       0.024       0.072
nbackers            0.0015      0.002      0.917      0.359      -0.002       0.005
ProjectDuration    -0.0173      0.004     -3.881      0.000      -0.026      -0.009
Staff_pick1         1.8564      0.266      6.977      0.000       1.335       2.378
art                 0.7794      0.296      2.629      0.009       0.198       1.360
comics              1.2965      0.412      3.147      0.002       0.489       2.104
crafts              0.7756      0.463      1.676      0.094      -0.131       1.682
dance               2.2466      0.615      3.654      0.000       1.042       3.452
design              2.2163      0.437      5.074      0.000       1.360       3.072
fashion             0.2082      0.321      0.649      0.516      -0.420       0.837
film & video        1.2599      0.246      5.118      0.000       0.777       1.742
food                0.0235      0.286      0.082      0.935      -0.537       0.584
games               0.8746      0.358      2.440      0.015       0.172       1.577
journalism         -0.4274      0.484     -0.884      0.377      -1.375       0.520
music               0.9218      0.293      3.150      0.002       0.348       1.495
photography         1.2375      0.431      2.871      0.004       0.393       2.082
publishing         -0.1861      0.280     -0.665      0.506      -0.734       0.362
theater             2.1847      0.504      4.336      0.000       1.197       3.172
OC                 -0.3697      0.357     -1.036      0.300      -1.069       0.330
EU                 -0.1767      0.147     -1.204      0.228      -0.464       0.111
AS                  0.9400      0.508      1.851      0.064      -0.056       1.935
SA                  0.0634      1.074      0.059      0.953      -2.042       2.169
AF                 -0.6389      1.186     -0.539      0.590      -2.963       1.685
===================================================================================

Also, I get four different RuntimeWarnings:

RuntimeWarning: invalid value encountered in divide
  return np.exp(-X)/(1+np.exp(-X))**2

RuntimeWarning: invalid value encountered in square
  return np.exp(-X)/(1+np.exp(-X))**2

RuntimeWarning: overflow encountered in exp
  return 1/(1+np.exp(-X))

RuntimeWarning: overflow encountered in exp
  return np.exp(-X)/(1+np.exp(-X))**2
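
The warnings all point at the expression np.exp(-X)/(1+np.exp(-X))**2. Here is a minimal sketch of how that expression can yield nan, assuming the linear predictor is very negative for at least one observation. The values in xb are made up for illustration, and stable_logistic_pdf is a hypothetical helper, not part of statsmodels.

```python
import numpy as np


def naive_logistic_pdf(x):
    # Same expression that appears in the RuntimeWarning messages.
    return np.exp(-x) / (1 + np.exp(-x)) ** 2


def stable_logistic_pdf(x):
    # Hypothetical alternative: the logistic density is symmetric in x,
    # so exp(-|x|) can be used, which stays in (0, 1] and cannot overflow.
    e = np.exp(-np.abs(x))
    return e / (1 + e) ** 2


# Made-up linear predictors; the first one is extreme on purpose.
xb = np.array([-800.0, -2.0, 0.0, 3.0])

# Silence the overflow/invalid warnings so only the values are shown.
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_logistic_pdf(xb)

stable = stable_logistic_pdf(xb)

print(naive)          # first entry is nan: exp(800) overflows to inf, inf/inf is nan
print(naive.mean())   # nan: one bad observation poisons the average
print(stable.mean())  # finite
```

If something like this happens inside get_margeff(), a single such observation could be enough to turn the "overall" average into nan for every variable, which would be consistent with the table above.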
  • Did the estimation converge successfully? Does the summary() look OK, without nans? Do you get the same nans when you use the standard nonrobust cov_type? Do you get any RuntimeWarnings, e.g. zero division? (It's difficult to guess without a reproducible example.) Commented Jun 7, 2023 at 15:38
  • @Josef Thanks for your help. My summary() looks OK and doesn't have nans. Also, I don't have any nans in my dataset. However, I checked after your comment and I do have some RuntimeWarnings. I updated my question with the RuntimeWarnings and the result of my model. However, I still couldn't fix them. Commented Jun 7, 2023 at 19:41
  • Check log_reg.predict() for nans and values close to zero or one. Also, try margeff at the mean instead of "overall". My guess is that an overflow occurs in computing the prediction or margeff, maybe at points where the predicted variance is zero, which causes a nan that propagates to all results. Commented Jun 8, 2023 at 14:09
  • @Josef Thank you so much! I had checked the prediction results before. However, after I tried margeff at the mean, the table works! Commented Jun 8, 2023 at 18:44
  • If it works for margeff at the mean, then the margeff computation produces the nans (overflow) for some observations. You could find the observations that cause the nan problem and check what might be the reason, e.g. strange prediction or x values. Commented Jun 8, 2023 at 18:51
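
The suggestion in the last comment, finding the observations that cause the nans, might be sketched as follows. This uses plain NumPy on made-up numbers, and find_overflow_rows is a hypothetical helper, not a statsmodels function. With the fitted model from the question, one would presumably pass log_reg.model.exog and log_reg.params instead of the toy arrays.

```python
import numpy as np


def find_overflow_rows(exog, params):
    """Return the row indices whose logistic density is not finite,
    together with the linear predictor for every row."""
    exog = np.asarray(exog, dtype=float)
    params = np.asarray(params, dtype=float)
    xb = exog @ params  # linear predictor, one value per observation
    with np.errstate(over="ignore", invalid="ignore"):
        # Same expression as in the RuntimeWarning messages.
        pdf = np.exp(-xb) / (1 + np.exp(-xb)) ** 2
    bad = np.flatnonzero(~np.isfinite(pdf))
    return bad, xb


# Made-up design matrix: a constant and one "goal"-like column,
# where the third row has an extreme value.
exog = np.array([
    [1.0, 5_000.0],
    [1.0, 20_000.0],
    [1.0, 100_000_000.0],
    [1.0, 1_000.0],
])
# Made-up coefficients, chosen only for illustration.
params = np.array([-0.28, -2.15e-05])

bad, xb = find_overflow_rows(exog, params)
print(bad)      # -> [2]
print(xb[bad])  # the linear predictor that overflows in exp(-xb)
```

Once the offending rows are known, their x values can be inspected directly. Evaluating the marginal effects at the mean, as the comments report, sidesteps the problem because it only uses one averaged observation.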
