I'm building a logistic regression model to predict female presence on boards in tech SMEs. I was going to remove companies with only 1 employee, as they don't have boards, but my supervisor told me to keep them in and account for their effect with a binary self_employed variable.

I have done this, but I'm worried about correlation with the log_employees variable: self-employed companies always have 1 employee, so log_employees is always 0 for them. The pairwise correlation is -0.71, but the GVIFs are around 2, with GVIF^(1/(2*Df)) of 1.5 for log_employees and 1.43 for self_employed. I feel like this may still be causing issues, as over 50% of the companies in the data are self-employed.
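As a rough sanity check on those figures, here is a minimal sketch of how a pairwise correlation relates to a VIF. It assumes a simplified model with only these two predictors, where VIF = 1/(1 - r^2); the real model has other predictors, so the actual GVIFs will differ somewhat. The function names are hypothetical and not taken from any library.

```python
import math

def vif_two_predictors(r: float) -> float:
    """VIF for either predictor when the model has exactly two predictors
    with pairwise correlation r (simplifying assumption)."""
    return 1.0 / (1.0 - r ** 2)

def gvif_scaled(gvif: float, df: int = 1) -> float:
    """GVIF^(1/(2*Df)); for a term with 1 degree of freedom this is sqrt(GVIF)."""
    return gvif ** (1.0 / (2 * df))

r = -0.71                      # pairwise correlation quoted above
vif = vif_two_predictors(r)    # roughly 2
scaled = gvif_scaled(vif)      # roughly 1.42

print(f"VIF = {vif:.2f}, GVIF^(1/(2*Df)) = {scaled:.2f}")
```

Under that assumption, a correlation of -0.71 alone already implies a VIF of about 2, which is in line with the GVIFs I am seeing.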

Is it enough to account for them with the variable, or should I remove them? I have also tried binning employee count into self-employed, micro, small, and medium, but I think this is overcomplicated, and the reference category makes interpretation confusing. Any help would be greatly appreciated! Thank you.

(I can no longer contact my supervisor for advice)
