Timeline for Why does the coefficient of a regressor increase with sample size when using poly() in lm() and glm()?
Current License: CC BY-SA 4.0
10 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| 5 hours ago | history | became hot network question | |||
| 12 hours ago | comment | added | Aaron - mostly inactive | A request: I've answered the question "How do I estimate the linear effect for a factor so that my estimate doesn't depend on the sample size?" If that (or some version of it) better reflects what you actually want to know, please help out future readers by editing your post to revise the title and to state that more clearly in the text. If you actually are interested in how poly works, then leave it as is; my answer can stay as an alternate answer for others who find it. | |
| 12 hours ago | comment | added | Guillaume | I think you're right, flattening to linear is probably the issue here, but I still don't get why poly(,1) scales with the sample size. I'm reading: stackoverflow.com/questions/19484053/… ; maybe the answer's in there. | |
| 12 hours ago | comment | added | Aaron - mostly inactive | Hmm. Anything you do to make this linear will be equivalent to 1,2,3. And I'd disagree that the idea of doing this is highly frowned upon; what matters is whether it makes sense in your particular situation. That is, what is definitely frowned upon is people looking only at the linear effect of an ordered factor without thinking about whether it actually makes sense for them. | |
| 12 hours ago | answer | added | Aaron - mostly inactive | timeline score: 3 | |
| 13 hours ago | comment | added | Guillaume | People recommend using poly(a, 1, raw=TRUE), which keeps it the way it is. That sounds good for a numeric value, but it just transforms ordered("sometime", "often", "always") into 1,2,3, which is, as I understand it, highly frowned upon in statistical circles. My question is more: how do you get the "true" coefficient, as there should be one that is mostly independent of the scale of the initial variable? | |
| 13 hours ago | comment | added | Aaron - mostly inactive | Ah, thanks. The easy fix is to make `a` into a numeric integer-based value, though you could also use polynomial contrasts (instead of poly, which treats variables as numeric) or roll your own contrast function. I'll put a quick answer together. | |
| 13 hours ago | comment | added | Guillaume | Yes, I'm trying to understand why poly scales the effect of the regressor according to the sample size. This behavior makes it difficult to compare the importance of the regressors. | |
| 13 hours ago | comment | added | Aaron - mostly inactive | It seems you've answered your own question: the coefficients increase with sample size because, given the way poly works, the values of the linear predictor decrease with sample size. Are you interested in the mechanism of what poly is doing behind the scenes? | |
| 13 hours ago | history | asked | Guillaume | CC BY-SA 4.0 |
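
The mechanism mentioned in the comments above (the values of poly's linear term shrinking as the sample grows) can be sketched as follows. This assumes that poly(x, 1) returns the centered regressor rescaled to unit length, which is how R normalizes its orthogonal polynomials. The symbols $z_i$, $s_x$, $\beta_{\text{raw}}$ and $\beta_{\text{poly}}$ are notation introduced here for illustration, not taken from the question or the answer.

$$
z_i = \frac{x_i - \bar{x}}{\sqrt{\sum_{j=1}^{n} (x_j - \bar{x})^2}} = \frac{x_i - \bar{x}}{s_x \sqrt{n-1}}
$$

where $s_x$ is the sample standard deviation of $x$. Rescaling a regressor does not change the fitted values, so both parameterizations must produce the same slope term in the linear predictor:

$$
\beta_{\text{poly}} \, z_i = \beta_{\text{raw}} \, (x_i - \bar{x})
\quad\Longrightarrow\quad
\beta_{\text{poly}} = \beta_{\text{raw}} \, s_x \sqrt{n-1}
$$

If the spread of $x$ stays roughly the same as observations are added, each $z_i$ shrinks like $1/\sqrt{n}$ and $\beta_{\text{poly}}$ grows like $\sqrt{n}$, whereas $\beta_{\text{raw}}$ (the coefficient reported with raw=TRUE) has no systematic dependence on $n$. The same reparameterization argument applies to the linear predictor in glm().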