I'm building a logistic regression model to predict female presence on boards in tech SMEs. I was going to remove companies with only one employee, as they don't have boards, but my supervisor told me to keep them in and account for their effect with a binary self_employed variable.
I have done this, but I'm worried about correlation with the log_employees variable: self-employed companies always have 1 employee, so log_employees is always 0 for them. The pairwise correlation is -0.71, but the GVIFs are around 2, with GVIF^(1/(2*Df)) values of 1.5 for log_employees and 1.43 for self_employed. I feel like this may still be causing issues, as over 50% of the companies in the data are self-employed.
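As a rough sanity check on those numbers, here is a minimal sketch of how the pairwise correlation relates to the VIF. It assumes, purely for illustration, that log_employees and self_employed are the only two predictors, in which case the VIF of either one is 1 / (1 - r^2). The function name is hypothetical and the real model may well include other predictors.

```python
import math


def vif_from_pairwise_r(r: float) -> float:
    """VIF of either predictor when the model has exactly two predictors.

    Assumption: no other covariates are in the model.
    """
    return 1.0 / (1.0 - r ** 2)


r = -0.71                      # pairwise correlation reported above
vif = vif_from_pairwise_r(r)   # equals the GVIF when Df = 1
adjusted = math.sqrt(vif)      # GVIF^(1/(2*Df)) with Df = 1

print(round(vif, 2), round(adjusted, 2))  # 2.02 1.42
```

Under that assumption the correlation of -0.71 alone gives a VIF of about 2 and an adjusted value of about 1.42, close to the 1.43 reported for self_employed. The slightly higher 1.5 for log_employees would be consistent with other predictors in the model also being related to it.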
Is it enough to account for them with the variable, or should I remove them? I have also tried binning employee count into self-employed, micro, small, and medium, but I think this is overcomplicated, and the reference category makes interpretation confusing. Any help would be greatly appreciated! Thank you.
(I can no longer contact my supervisor for advice)
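For the binned alternative mentioned above, a minimal sketch of treatment (dummy) coding may make the reference-category issue concrete. The cut-offs (fewer than 10 for micro, fewer than 50 for small) and the choice of "micro" as the reference are assumptions for illustration, not the values used in the actual analysis, and the function and column names are hypothetical.

```python
BANDS = ["self_employed", "micro", "small", "medium"]


def size_band(employees: int) -> str:
    """Map an employee count to a size band (assumed cut-offs)."""
    if employees == 1:
        return "self_employed"
    if employees < 10:
        return "micro"
    if employees < 50:
        return "small"
    return "medium"


def treatment_dummies(band: str, reference: str = "micro") -> dict:
    """One 0/1 column per band except the reference, which is all zeros."""
    return {f"size_{b}": int(b == band) for b in BANDS if b != reference}


for n in (1, 5, 25, 120):
    print(n, size_band(n), treatment_dummies(size_band(n)))
```

With this coding each size coefficient is a log-odds difference relative to the omitted band, so changing the reference changes what every coefficient is compared against but not the fitted probabilities.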