(Mis)understanding Selection on Observables
On a recent exam I asked students to extend the logic of propensity score weighting to handle a treatment that takes on three rather than two values: basically a stripped-down version of Imbens (2000). Nearly everyone figured this out without much trouble, which is good news! At the same time, I noticed some common misconceptions about the all-important selection-on-observables assumption:
Two Misconceptions
The following two statements about selection on observables are false:
- Under selection on observables, if I know the value of someone’s covariate vector $X$, then learning her treatment status $D$ provides no additional information about the average value of her observed outcome $Y$.
- Selection on observables requires the treatment $D$ and potential outcomes $(Y_0, Y_1)$ to be conditionally independent given covariates $X$.
If you’ve studied treatment effects, pause for a moment and see if you can figure out what’s wrong with each of them before reading further.
The First Misconception
The first statement:
Under selection on observables, if I know the value of someone’s covariate vector $X$, then learning her treatment status $D$ provides no additional information about the average value of her observed outcome $Y$.
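is a verbal description of the following conditional mean independence condition:
$$E[Y|X,D] = E[Y|X]$$
where $Y$ is the observed outcome, $D$ the binary treatment, and $X$ the covariates. This condition is not implied by selection on observables. To see why, let $p(X) \equiv P(D=1|X)$ denote the propensity score. By the law of iterated expectations,
$$E[Y|X] = p(X) \, E[Y|X,D=1] + [1 - p(X)] \, E[Y|X,D=0]$$
and rearranging gives
$$E[Y|X,D=1] - E[Y|X] = [1 - p(X)] \left\{ E[Y|X,D=1] - E[Y|X,D=0] \right\}.$$
So how could the RHS equal zero? One way is if $p(X) = 1$, but this would violate the overlap assumption $0 < p(X) < 1$. The other way is if $E[Y|X,D=1] = E[Y|X,D=0]$. Under selection on observables, however, $E[Y|X,D=1] - E[Y|X,D=0] = E[Y_1 - Y_0|X]$, the conditional average treatment effect.
So we can’t have $E[Y|X,D] = E[Y|X]$ unless the conditional average treatment effect is zero for every value of $X$.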
To summarize: the first statement above cannot be an implication of selection on observables because it would either require a violation of the overlap assumption, or imply that there is no treatment effect whatsoever. To correct the statement, we simply need to change the last three words:
Under selection on observables, if I know the value of someone’s covariate vector $X$, then learning her treatment status $D$ provides no additional information about the average values of her potential outcomes $(Y_0, Y_1)$.
This is a correct verbal statement of the mean exclusion restriction $E[Y_0|X,D] = E[Y_0|X]$ and $E[Y_1|X,D] = E[Y_1|X]$.
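The gap between the two versions of the first statement can be checked in a small simulation. The sketch below is illustrative only: the data-generating process (a binary covariate, propensity scores of 0.3 and 0.7, a constant treatment effect of 2) and the function name `first_misconception_check` are assumptions made for this example, not something taken from the exam or the lecture notes. Within the cell $X = 1$ it compares $E[Y|X,D=1] - E[Y|X]$ with $[1 - p(X)]\{E[Y|X,D=1] - E[Y|X,D=0]\}$, which coincide by the law of iterated expectations, and then computes the analogous gap for the potential outcome $Y_0$.

```python
import random
from statistics import fmean


def first_misconception_check(n=200_000, seed=0):
    """Simulate selection on observables; compare gaps within the cell X = 1."""
    rng = random.Random(seed)
    y_treated, y_untreated = [], []      # observed outcome Y, split by D
    y0_treated, y0_untreated = [], []    # potential outcome Y0, split by D
    for _ in range(n):
        x = rng.random() < 0.5                 # binary covariate (assumed)
        p = 0.7 if x else 0.3                  # propensity score p(X) (assumed)
        d = rng.random() < p                   # treatment depends on X alone
        y0 = float(x) + rng.gauss(0.0, 1.0)    # untreated potential outcome
        y1 = y0 + 2.0                          # constant treatment effect of 2
        if not x:
            continue                           # keep only the cell X = 1
        if d:
            y_treated.append(y1)               # observed Y equals Y1 if treated
            y0_treated.append(y0)
        else:
            y_untreated.append(y0)             # observed Y equals Y0 if untreated
            y0_untreated.append(y0)

    # Sample analogue of p(X) within the cell.
    p_hat = len(y_treated) / (len(y_treated) + len(y_untreated))
    # E[Y | X=1, D=1] - E[Y | X=1]
    lhs = fmean(y_treated) - fmean(y_treated + y_untreated)
    # [1 - p(X)] * (E[Y | X=1, D=1] - E[Y | X=1, D=0])
    rhs = (1 - p_hat) * (fmean(y_treated) - fmean(y_untreated))
    # E[Y0 | X=1, D=1] - E[Y0 | X=1]
    gap_y0 = fmean(y0_treated) - fmean(y0_treated + y0_untreated)
    return {"lhs": lhs, "rhs": rhs, "gap_y0": gap_y0}


print(first_misconception_check())
```

With these assumed values the observed-outcome gap should be close to $(1 - 0.7) \times 2 = 0.6$, while the potential-outcome gap should be close to zero: treatment status is informative about the mean of $Y$ but not about the mean of $Y_0$.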
The Second Misconception
And this leads nicely to the second misconception:
Selection on observables requires the treatment $D$ and potential outcomes $(Y_0, Y_1)$ to be conditionally independent given covariates $X$.
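To see why this is false, consider an example in which the conditional mean of each potential outcome given $X$ is the same for the treated and the untreated, but the conditional variance is not: say, $\text{Var}(Y_0|X,D=1) \neq \text{Var}(Y_0|X,D=0)$. Then the mean exclusion restriction holds even though $D$ and $(Y_0, Y_1)$ are not conditionally independent given $X$. To correct the second statement, we only need to add one word: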
Selection on observables requires the treatment $D$ and potential outcomes $(Y_0, Y_1)$ to be conditionally mean independent given covariates $X$.
Conditional independence implies conditional mean independence, but the converse is false.
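To make the distinction concrete, here is a minimal simulation sketch in which mean independence holds but full independence fails. Everything about the design is an assumption chosen for illustration: covariates are suppressed (think of a single covariate cell), $Y_0$ has mean zero in both treatment arms but standard deviation 2 among the treated and 1 among the untreated, and the treatment effect is a constant 1. The names `variance` and `second_misconception_check` are hypothetical helpers.

```python
import random
from statistics import fmean


def variance(values):
    """Population variance, computed directly from the definition."""
    m = fmean(values)
    return fmean((v - m) ** 2 for v in values)


def second_misconception_check(n=200_000, seed=1):
    """Y0 has the same mean in both arms, but its spread depends on D."""
    rng = random.Random(seed)
    y0_by_d = {True: [], False: []}   # untreated potential outcome, split by D
    y_by_d = {True: [], False: []}    # observed outcome, split by D
    for _ in range(n):
        d = rng.random() < 0.5
        # Assumed design: mean 0 in both arms; sd 2 if treated, sd 1 if not.
        y0 = rng.gauss(0.0, 2.0 if d else 1.0)
        y1 = y0 + 1.0                 # constant treatment effect of 1
        y0_by_d[d].append(y0)
        y_by_d[d].append(y1 if d else y0)
    return {
        # Near zero: D carries no information about the mean of Y0.
        "mean_gap_y0": fmean(y0_by_d[True]) - fmean(y0_by_d[False]),
        # Far from one: D does carry information about the distribution of Y0.
        "var_ratio_y0": variance(y0_by_d[True]) / variance(y0_by_d[False]),
        # Difference in observed means, which here targets the ATE of 1.
        "ate_hat": fmean(y_by_d[True]) - fmean(y_by_d[False]),
    }


print(second_misconception_check())
```

In this design the difference in observed means should land close to the true average treatment effect of 1, even though the variance ratio of roughly 4 shows that the treatment and the potential outcomes are not independent.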
Epilogue
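So what’s the moral here? First, it’s crucial to distinguish between the observed outcome $Y$ and the potential outcomes $(Y_0, Y_1)$. Second, selection on observables only requires conditional mean independence, which is weaker than full conditional independence.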
For more details see my lecture notes on treatment effects.
You might object that in the real world it is difficult to think of settings in which conditional mean independence is plausible but full independence is not. This is a fair point. Nevertheless, it’s important to be clear about which assumptions are actually used in a given derivation, and here we only rely on conditional mean independence.