Reviewing articles makes me realize that people (including people who appear to be otherwise quite sophisticated in their methods) don’t know how to read tables for error and instability. Obviously, I just found a zinger. Details suppressed in the interest of the integrity of the peer review process. But if the author had really looked carefully at the tables instead of just coming up with stories to explain the coefficients, s/he should have realized something was amiss.
When you are comparing different model specifications on the same data, don’t just look at what is significant, and don’t just look at the variables you are interested in. Pay attention to whether the coefficient on each variable is relatively stable across models or fluctuates with the addition or subtraction of other variables. The coefficients on the same variables on the same sample normally stay pretty similar as other variables come and go from alternate specifications. If the coefficients are relatively stable (roughly the same magnitude, roughly the same standard error) in different models, this is good. They may go in and out of statistical significance depending on what else is in the model, but if the effect size stays about the same and the standard error stays about the same, that’s stable, that’s good.
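If you want to make this check systematic instead of squinting at printed tables, it takes only a few lines to refit the alternate specifications and line up each coefficient and its standard error side by side. Here is a minimal sketch in Python with pandas and statsmodels, on simulated data; the variable names and specifications are hypothetical stand-ins for your own:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data standing in for your real sample (hypothetical variables).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"income": rng.normal(size=n), "age": rng.normal(size=n)})
df["education"] = 0.6 * df["income"] + rng.normal(size=n)  # correlated with income
df["y"] = 0.5 * df["income"] + 0.3 * df["education"] + rng.normal(size=n)

# Alternate specifications of the same model on the same data.
specs = {
    "M1": "y ~ income",
    "M2": "y ~ income + education",
    "M3": "y ~ income + education + age",
}

# Line up the coefficient and standard error of every variable across models.
rows = []
for name, formula in specs.items():
    fit = smf.ols(formula, data=df).fit()
    for var in fit.params.index:
        rows.append({"model": name, "variable": var,
                     "coef": round(fit.params[var], 3),
                     "se": round(fit.bse[var], 3)})
table = pd.DataFrame(rows).pivot(index="variable", columns="model",
                                 values=["coef", "se"])
print(table)
```

Read across the rows: a stable coefficient looks roughly the same, with roughly the same standard error, in every column.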
If they are not stable, you need to know why before you mail the article off to the journal. In the worst case, unstable coefficients change between significantly positive and significantly negative, or between close to zero and large in either the positive or negative direction. But also pay attention if they keep the same sign but get a lot bigger or smaller.
What if coefficients are not stable? If the coefficient on variable X changes when you add other variables, at least one of three things is true: (1) the other variables are correlated with X and overlap or interact with it in explaining the dependent variable; (2) the sample differs between the two models; or (3) you made a mistake in running the models or copying the tables.
Some correlations or interactions among independent variables are substantively meaningful or otherwise unproblematic. It is normal for the coefficients of each of a set of correlated variables like income and education to be smaller when they are together in a model. Sometimes the whole point of an article is that a coefficient goes to zero or changes from zero to significant when something else is controlled. Similarly, sometimes the point is that some factor is salient only for a subset of the sample.
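When you suspect correlated independent variables are what is moving your coefficients, check the correlations, or the variance inflation factors, among the predictors directly. A sketch under the same simulated setup as above (again, hypothetical variable names); `variance_inflation_factor` ships with statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Same simulated predictors as in the earlier sketch (hypothetical names).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"income": rng.normal(size=n), "age": rng.normal(size=n)})
df["education"] = 0.6 * df["income"] + rng.normal(size=n)

X = df[["income", "education", "age"]]

# Pairwise correlations: big entries flag pairs whose coefficients will
# shrink or flip sign when both enter the model together.
print(X.corr().round(2))

# Variance inflation factors: a VIF well above 1 means that predictor is
# largely explained by the others, so its coefficient will be unstable.
X_const = sm.add_constant(X)
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(col, round(variance_inflation_factor(X_const.values, i), 2))
```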
But before you hang your whole theory or interpretation on a fluctuating coefficient, you want to make sure it isn't just a mistake. Make sure there are no typos in the code that produced the results. Make sure the table is copied properly. Check the sample sizes to be sure cases were not dropped for some unexpected reason.

And especially check for specification error: explicitly test whether coefficients bounce with minor changes in model specification. Very often, you will see that the explanatory power of a model does not change at all when you add more variables, even though the coefficients change. This is a symptom that your sample is too small to make the distinctions you are trying to make. This is especially likely in fields where samples are necessarily relatively small, as is often true in research on organizations or political units or annual time series. Do your variables of interest have strong bivariate effects without controls? If not, exactly which control variables are needed in the model for the variable to have a significant effect? At what point do you stop adding explained variance and just change coefficients? In particular, watch out for pairs of correlated variables like income and education that take opposite signs in models with lots of other independent variables: this is frequently an artifact.
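One way to run these checks explicitly is to add the controls one at a time and track the focal coefficient, its standard error, R-squared, and N at each step. If R-squared stops growing while the coefficient keeps moving, you have passed the point of adding explained variance. A sketch, under the same hypothetical setup:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Same simulated data as in the earlier sketches (hypothetical variables).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"income": rng.normal(size=n), "age": rng.normal(size=n)})
df["education"] = 0.6 * df["income"] + rng.normal(size=n)
df["y"] = 0.5 * df["income"] + 0.3 * df["education"] + rng.normal(size=n)

focal = "income"                  # the variable whose effect you care about
controls = ["education", "age"]   # added to the model one at a time

formula = f"y ~ {focal}"
for step in range(len(controls) + 1):
    fit = smf.ols(formula, data=df).fit()
    print(f"{formula:35s} b={fit.params[focal]: .3f} "
          f"se={fit.bse[focal]:.3f} R2={fit.rsquared:.3f} n={int(fit.nobs)}")
    if step < len(controls):
        formula += " + " + controls[step]
```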
The problem of ignoring coefficient fluctuations is especially likely when the coefficients for "control variables" are suppressed. I have reviewed quite a few articles in which coefficients on control variables fluctuate quite suspiciously with nary a mention from the author, and I am never happy when control variable coefficients are omitted entirely. (If they are going to be suppressed from the main table in the interest of space and readability, I still want to see them in an appendix as a reviewer, even if the appendix will end up on a web site instead of in print.)
Also pay attention to the number of cases in each model, to be sure you are not losing cases unexpectedly to missing data or some other anomaly. If patterns of missing data are not a problem, the coefficients will stay pretty stable despite fluctuations in sample size. But if a coefficient changes markedly when the sample size changes, that's another sign of trouble.
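A cheap guard here is to count missing values by variable and print each model's N next to its coefficients, since regression routines typically drop incomplete cases listwise. A sketch with simulated missingness (hypothetical names; the statsmodels formula interface drops missing rows by default):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with missingness on one control (hypothetical variables).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({"income": rng.normal(size=n), "age": rng.normal(size=n)})
df["education"] = 0.6 * df["income"] + rng.normal(size=n)
df.loc[rng.random(n) < 0.2, "education"] = np.nan  # 20% missing
df["y"] = 0.5 * df["income"] + rng.normal(size=n)

# How many cases does each variable cost you?
print(df.isna().sum())

# Compare N across specifications: adding a control with missing values
# silently shrinks the sample under listwise deletion.
for formula in ["y ~ income", "y ~ income + education"]:
    fit = smf.ols(formula, data=df).fit()
    print(f"{formula:25s} n={int(fit.nobs)} b_income={fit.params['income']:.3f}")
```

If the income coefficient moves when education enters the model, this check tells you whether that is the control doing its job or a fifth of the sample quietly disappearing.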