From Estimands to Statistical Inference
June 21, 2025
Association and causation are distinct concepts, with different types of tools and assumptions necessary for their study.
Example 1 How should we interpret the linear regression parameter
Question: What is “the effect” of drug A versus B on illness?
Patient ID | Age | Drug | Illness |
---|---|---|---|
1 | 19 | A | 1 |
2 | 45 | B | 0 |
… | … | … | … |
199 | 57 | B | 0 |
200 | 32 | A | 1 |
Question: What is “the effect” of drug A versus B on illness?
Patient ID | Age | Drug | Illness under A | Illness under B | Difference (B - A) |
---|---|---|---|---|---|
1 | 19 | A | 1 | ? | ? - 1 |
2 | 45 | B | ? | 0 | 0 - ? |
… | … | … | … | … | … |
199 | 57 | B | ? | 0 | 0 - ? |
200 | 32 | A | 1 | ? | ? - 1 |
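The question marks in the table can be made concrete with a small simulation. The sketch below is only illustrative: the illness probabilities, the random drug assignment, and all variable names are assumptions, not values from the study above. It generates both potential outcomes for each hypothetical patient and then hides the one belonging to the drug the patient did not receive, which is all an analyst ever gets to see.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # same number of patients as in the table

# Hypothetical potential outcomes (1 = ill, 0 = not ill) under each drug.
# The probabilities 0.6 and 0.3 are made up for illustration.
y_a = rng.binomial(1, 0.6, size=n)  # outcome if the patient takes drug A
y_b = rng.binomial(1, 0.3, size=n)  # outcome if the patient takes drug B

# Each patient actually receives exactly one drug.
drug = rng.choice(["A", "B"], size=n)

# The observed outcome is the potential outcome for the drug received.
y_obs = np.where(drug == "A", y_a, y_b)

# What the analyst sees: the other potential outcome is missing (NaN).
y_a_seen = np.where(drug == "A", y_a, np.nan)
y_b_seen = np.where(drug == "B", y_b, np.nan)

# The individual-level difference is therefore missing for every patient.
effect_seen = y_b_seen - y_a_seen
print(np.isnan(effect_seen).all())
```

Because one of the two potential outcomes is always missing, no patient-level difference can be computed from the observed data alone; only population-level summaries are candidates for estimation.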
Proposition 1 (Identification) Let
Then,
With Proposition 1, we can re-express
While
Since we have that
The difference in means estimator can be expressed
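As a rough computational sketch, the difference in means estimator subtracts the average observed outcome in one arm from the average in the other. The function below assumes a binary arm indicator `a` (1 for one drug, 0 for the other) and observed outcomes `y`; these names, and the unpooled two-sample standard error it also returns, are illustrative choices rather than notation taken from the lecture.

```python
import numpy as np

def difference_in_means(y, a):
    """Return (estimate, standard error) for mean(y | a=1) - mean(y | a=0)."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=int)

    arm1 = y[a == 1]
    arm0 = y[a == 0]
    if arm1.size == 0 or arm0.size == 0:
        raise ValueError("both arms need at least one observation")

    estimate = arm1.mean() - arm0.mean()

    # Unpooled standard error: sum of per-arm sample variances over arm sizes.
    # (With a single observation in an arm the sample variance is undefined.)
    se = np.sqrt(arm1.var(ddof=1) / arm1.size + arm0.var(ddof=1) / arm0.size)
    return estimate, se

# Toy data: four patients, two per arm.
est, se = difference_in_means(y=[1, 0, 1, 1], a=[1, 0, 1, 0])
print(est, se)
```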
What if we knew the probability of treatment assignment for each arm?
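One common answer, when the assignment probability is known by design, is an inverse probability weighted (Horvitz-Thompson style) estimator: each observed outcome is weighted by the reciprocal of the probability of landing in its arm. The sketch below assumes a binary arm indicator `a`, observed outcomes `y`, and a known probability `p_treat` of assignment to arm 1; the function and argument names are hypothetical.

```python
import numpy as np

def ipw_estimate(y, a, p_treat):
    """Inverse probability weighted estimate of E[Y(1)] - E[Y(0)].

    p_treat is the known probability of assignment to arm 1; it may be a
    single number or one value per patient.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    p = np.broadcast_to(np.asarray(p_treat, dtype=float), y.shape)

    if np.any((p <= 0) | (p >= 1)):
        raise ValueError("assignment probabilities must lie strictly in (0, 1)")

    # Weight arm-1 outcomes by 1/p and arm-0 outcomes by 1/(1 - p).
    return np.mean(a * y / p - (1 - a) * y / (1 - p))

# Toy data: four patients, each assigned to arm 1 with probability 0.5.
print(ipw_estimate(y=[1, 0, 1, 1], a=[1, 0, 1, 0], p_treat=0.5))
```

If the known probability is replaced by the observed fraction of patients in arm 1, this expression reduces algebraically to the difference in means, so the two estimators differ only in whether the design probability or the realized arm sizes are used for weighting.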
In 1965, the (bio)statistician and epidemiologist Sir Austin Bradford Hill outlined a set of criteria for making causal judgments on the basis of observational evidence. An excerpt from Hill (1965) appears below:
When our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance, what aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?
- Strength: First upon my list I would put the strength of the association.
- Consistency: Next on my list of features to be specially considered I would place the consistency of the observed association. Has it been repeatedly observed by different persons, in different places, circumstances and times?
- Specificity: One reason, needless to say, is the specificity of the association, the third characteristic which invariably we must consider.
- Temporality: My fourth characteristic is the temporal relationship of the association—which is the cart and which the horse?
- Biological gradient: Fifthly, if the association is one which can reveal a biological gradient, or dose-response curve, then we should look most carefully for such evidence.
- Plausibility: It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand. What is biologically plausible depends upon the biological knowledge of the day.
- Coherence: On the other hand the cause-and-effect interpretation of our data should not seriously conflict with the generally known facts of the natural history and biology of the disease; in the expression of the Advisory Committee to the Surgeon-General it should have coherence.
- Experiment: Occasionally it is possible to appeal to experimental, or semi-experimental, evidence. For example, because of an observed association some preventive action is taken. Does it in fact prevent?
- Analogy: In some circumstances it would be fair to judge by analogy. With the effects of thalidomide and rubella before us we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.
Proposition 2 (Identification) Let
Then,
With Proposition 2, we can re-express
HST 190: Introduction to Biostatistics