Causal Inference

From Estimands to Statistical Inference

Nima Hejazi
nhejazi@hsph.harvard.edu

Harvard Biostatistics

September 3, 2025

Causal inference and (bio)statistics

  • Study designs: Randomized controlled trials (RCTs), observational studies, “natural” experiments
  • Why do study designs matter?
    • Sampling: Are our inferences generalizable?
    • Confounding: Are we learning what we think we are?
    • What type of study should we aim for? What can we learn?
  • When can a statistical inference be interpreted as causal?
    • Randomization of treatment assignment, no (unmeasured) confounding (in observational studies)
    • Sufficient experimentation in treatment assignment (positivity)

Questions…association or causation

  • “Is risk of symptomatic disease higher/lower in vaccinated or unvaccinated groups?”
  • Questions of association inquire about the actual (or realized) state of the system under study; they do not require conceiving of a manipulation of the system, only the ability to observe it.
  • “Would the risk of symptomatic disease be increased or decreased by vaccination?”
  • Questions of causality inquire about a counterfactual state; they require that we conceive of how the system would have behaved had it been subjected to an intervention (by us or others).

Association and causation are distinct concepts, with different types of tools and assumptions necessary for their study.

Regression and causality

Example 1 How should we interpret the linear regression parameter β1:

E[Y∣X] = β0 + β1X1 + … + βpXp

  • Tempting to call β1 “the expected change in outcome Y if covariate X1 was to be increased by one unit, keeping other covariates constant.” (It is tempting also to teach others to say so.)
  • With only the statement about the regression model above, is this true? Are there any key assumptions missing?
  • The linear form above makes no statement about mechanism; it does not claim to do any more than describe an observed reality—that is, it is not a statement about causality.
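The gap between a regression coefficient and a causal effect can be made concrete with a small simulation (entirely hypothetical: the confounder U, the effect sizes, and the seed are invented for this sketch). Here the true causal effect of X1 on Y is 1.0, but a shared cause U inflates the unadjusted slope:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# U is a common cause of X1 and Y; the true causal effect of X1 on Y is 1.0.
u = rng.normal(size=n)
x1 = u + rng.normal(size=n)
y = 1.0 * x1 + 2.0 * u + rng.normal(size=n)

# Regression of Y on X1 alone: slope = Cov(Y, X1) / Var(X1).
slope_unadjusted = np.cov(y, x1)[0, 1] / np.var(x1)

# Adjusting for U recovers the causal coefficient (least squares on [X1, U]).
X = np.column_stack([x1, u])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

print(slope_unadjusted)  # ≈ 2.0: describes the observed association
print(beta[0])           # ≈ 1.0: the causal coefficient, after adjustment
```

Both regressions are perfectly valid descriptions of the data; only background assumptions about U decide which slope deserves a causal reading.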

An early example of an RCT (per Senn (2022))

  • In 1747, James Lind, a surgeon on the HMS Salisbury, conducted one of the first medical trials—to assess the causal relationship, if any, between eating oranges and lemons and recovering from scurvy.
  • Lind took aside twelve men with advanced symptoms of scurvy “as similar as [he] could have them,” for a matched pairs experiment.
    • The first pair were given slightly alcoholic cider and the second an elixir of vitriol. The third pair took vinegar while the fourth drank sea water. The fifth were fed two oranges and one lemon daily for six days, and the sixth were given a medicinal paste and a mild laxative.
    • Of the six pairs, the pair who were fed the oranges and lemons were nearly recovered after only a week, and those who had drunk the cider responded favorably but were too weak to return to duty after two weeks. The other four pairs all improved little—or not at all.

The first RCT

  • In 1947, Great Britain’s Medical Research Council (MRC) conducted the first published instance of a blinded, randomized controlled trial (RCT).
  • Clinical setting: n=107 patients suffering from TB were assigned to experimental (streptomycin) or control regimens via a (manual) system of random number assignments devised by Sir Austin Bradford Hill.
    • Prior to this, alternative assignment strategies (e.g., “alternating allocation”) had been preferred over randomization
    • Hill himself harbored doubts about the ethics of randomization
    • Hill became persuaded, as a limited supply of streptomycin meant it was the sole way in which most patients could receive the drug
  • The MRC streptomycin trial was groundbreaking, pioneering the use of randomized treatment assignment in clinical settings.

The blessing of randomization

  • Today, RCTs are considered a “gold standard” for medical research (and causal inference!), due to the inferential safeguards they provide.
  • Randomization allows for the causal effect of a candidate treatment to be isolated from that of other variables (potential confounders).
  • Without randomization, any association between candidate treatment and the outcome could not be disentangled from the association of the confounders with either.
  • Randomization is not a panacea, however, and the validity of an RCT’s findings depends on the quality of its design and execution.
  • As a result of modern advances (ongoing since the 1980s), causal inference is now possible even without randomization (that is, in observational studies)—but this requires great care…

Learning from data…or trying to…

Question: What is “the effect” of drug X (1) versus Z (0) on illness Y?

Mock dataset from a hypothetical study:

Patient ID | L (age) | A (drug) | Y (illness)
-----------|---------|----------|------------
1          | 19      | 1        | 1
2          | 45      | 0        | 0
…          | …       | …        | …
199        | 57      | 0        | 0
200        | 32      | 1        | 1
  • Drug A: standard-of-care Z (0) or an investigational drug X (1).
  • Illness Y: some patients recover (Y=1), others don’t (Y=0).
  • Does the age (L) of the patients matter in our analysis?

When the ideal data are missing

Question: What is “the effect” of drug Z versus X on illness Y?

Potential outcomes (“science”) table:

Patient ID | L  | A | Y1 | Y0 | Y1 − Y0
-----------|----|---|----|----|--------
1          | 19 | 1 | 1  | ?  | 1 − ?
2          | 45 | 0 | ?  | 0  | ? − 0
…          | …  | … | …  | …  | …
199        | 57 | 0 | ?  | 0  | ? − 0
200        | 32 | 1 | 1  | ?  | 1 − ?
  • Potential outcomes: Yi1 is the outcome of patient i had they taken drug X (A=1) and Yi0 the outcome had they taken drug Z (A=0).
  • Yi1 − Yi0 is the individual causal effect (ICE) of patient i.

The fundamental problem of causal inference

  • Yi1, Yi0 are potential outcomes or counterfactual RVs (Imbens and Rubin 2015; Hernán and Robins 2025).
  • For a given study unit i, we can observe at most one of the two: Yi1 when drug X is assigned, or Yi0 when drug Z is taken…
  • The ICE is Yi1−Yi0, and we cannot observe both potential outcomes—this is the fundamental problem of causal inference (Holland 1986).
  • The ICE cannot be identified, but what about its average? θATE = E[Y1 − Y0], where θATE is the average treatment effect (ATE).
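The fundamental problem can be sketched in a few lines of simulated data (the recovery probabilities and seed are invented for illustration): only with the full “science” table is the ATE directly computable, while the observed data reveal just one potential outcome per patient.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200  # matching the size of the mock dataset

# Hypothetical "science" table: both potential outcomes for every patient.
y0 = rng.binomial(1, 0.30, size=n)   # recovery had patient taken drug Z
y1 = rng.binomial(1, 0.55, size=n)   # recovery had patient taken drug X
ice = y1 - y0                        # individual causal effects
ate = ice.mean()                     # computable only with the full table

# In reality each patient reveals only the potential outcome matching the
# treatment actually received; the other column stays a "?" forever.
a = rng.binomial(1, 0.5, size=n)
y_observed = np.where(a == 1, y1, y0)

print(f"ATE from the (unobservable) science table: {ate:.3f}")
print(f"{(a == 1).sum()} patients reveal Y1; {(a == 0).sum()} reveal Y0")
```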

From causality to statistics in randomized studies

Proposition 1 (Identification) Let (A,Y) ∼ P, i.i.d., and assume:

  1. Consistency: Y=Ya whenever A=a.
  2. Randomization: A⊥⊥Ya for each a∈A.
  3. Non-interference: Yia⊥⊥Aj for all i≠j.

Then, E[Y∣A=a]=E[Ya].

With Proposition 1, we can re-express θATE = E[Y1] − E[Y0] as ψATE = E[Y∣A=1] − E[Y∣A=0].

While ψATE is a statistical estimand that can be evaluated using data, θATE is a causal estimand, defined by unobservable potential outcomes.

The difference-in-means estimator

Since ψATE = E[Y∣A=1] − E[Y∣A=0] is equivalent to θATE = E[Y1] − E[Y0] by Proposition 1, we can estimate the ATE with the difference-in-means (DM) estimator:

ψ̂ATE,DM = Ê[Y∣A=1] − Ê[Y∣A=0] = (1/n1) ∑{i: Ai=1} Yi − (1/n0) ∑{i: Ai=0} Yi ,

whose asymptotic variance is

Var(ψ̂ATE,DM) = V[Y∣A=1]/p + V[Y∣A=0]/(1−p) .
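A minimal sketch of the DM estimator on simulated trial data (the data-generating process, true ATE of 1.0, and seed are invented for this example). The plug-in standard error divides each arm's sample variance by its arm size:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
a = rng.binomial(1, 0.5, size=n)          # randomized assignment
y = rng.normal(loc=1.0 * a, scale=1.0)    # hypothetical outcomes; true ATE = 1.0

y_trt, y_ctl = y[a == 1], y[a == 0]
psi_dm = y_trt.mean() - y_ctl.mean()      # difference-in-means estimate

# Plug-in variance estimate: V[Y|A=1]/n1 + V[Y|A=0]/n0.
var_dm = y_trt.var(ddof=1) / len(y_trt) + y_ctl.var(ddof=1) / len(y_ctl)
se_dm = np.sqrt(var_dm)

print(f"estimate: {psi_dm:.3f}, standard error: {se_dm:.3f}")
```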

The Horvitz-Thompson (or IPW) estimator

The difference-in-means estimator can be expressed as

ψ̂ATE,DM = Ê[Y∣A=1] − Ê[Y∣A=0] = Ê[YA]/Ê[A] − Ê[Y(1−A)]/Ê[1−A] = Ê[YA]/p̂1 − Ê[Y(1−A)]/p̂0 .

What if we knew the probability of treatment assignment for each arm? Replacing the estimated probabilities with the known ones yields the Horvitz-Thompson (HT) estimator:

ψ̂ATE,HT = Ê[YA]/E[A] − Ê[Y(1−A)]/E[1−A] = Ê[YA]/p1 − Ê[Y(1−A)]/p0 .
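A sketch contrasting the two estimators on one simulated dataset (hypothetical outcome model and seed; p1 is treated as known for HT, estimated for DM):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000
p1 = 0.5                                   # known assignment probability
a = rng.binomial(1, p1, size=n)
y = rng.normal(loc=2.0 + 1.0 * a)          # hypothetical outcomes; true ATE = 1.0

# Horvitz-Thompson / IPW with the KNOWN probabilities p1 and p0 = 1 - p1.
psi_ht = np.mean(y * a) / p1 - np.mean(y * (1 - a)) / (1 - p1)

# Difference-in-means: identical form, but with ESTIMATED probabilities.
p1_hat = a.mean()
psi_dm = np.mean(y * a) / p1_hat - np.mean(y * (1 - a)) / (1 - p1_hat)

print(f"HT: {psi_ht:.3f}, DM: {psi_dm:.3f}")
```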

Efficiency, or how unsure should we be?

  • When comparing two estimators of the same parameter (target quantity), we care about bias and efficiency
  • Assuming both estimators are unbiased (accurately recover the target parameter), then our comparison boils down to relative efficiency
    • Relative efficiency is a measure of the quality of two unbiased estimators: it is the ratio of their (asymptotic) variances
  • It turns out that, between the Horvitz-Thompson (HT) and difference-in-means (DM) estimators, DM is more efficient (smaller variance)
    • this is unintuitive…HT uses the known assignment probabilities (p1, p0) while DM uses the estimated probabilities (p̂1, p̂0)
    • Why does estimating the probabilities improve efficiency? When does the observed np̂ get close to the theoretical np?
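One way to see the efficiency gap is a repeated-sampling simulation (the data-generating process is invented for this sketch; the large outcome mean is chosen to make the contrast stark, since HT's variance grows with E[Y²] while DM's depends only on the within-arm variances):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, p1 = 500, 4_000, 0.5
mu = 5.0  # large outcome mean: hurts HT, irrelevant to DM

ht, dm = [], []
for _ in range(reps):
    a = rng.binomial(1, p1, size=n)
    y = rng.normal(loc=mu + a)           # true ATE = 1.0
    # HT divides by the known p1; DM is equivalent to dividing by p1-hat.
    ht.append(np.mean(y * a) / p1 - np.mean(y * (1 - a)) / (1 - p1))
    dm.append(y[a == 1].mean() - y[a == 0].mean())

print(f"Var(HT) = {np.var(ht):.4f}, Var(DM) = {np.var(dm):.4f}")
# HT's variance is far larger here, despite using the true p1.
```

Intuitively, DM adjusts for the chance imbalance in realized arm sizes (np̂ versus np), which HT ignores.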

Observational studies: “Thank you for smoking”

  • Observational studies and randomized experiments are not uniformly bad/good
    • Both types of studies range in quality, and it is not true that one type obviously dominates the other.
    • Their evidentiary status is context-dependent and needs to be evaluated case-by-case, based on specific merits or faults carefully considered.
  • R.A. Fisher was a major proponent of randomization for drawing causal inferences…and happened to enjoy smoking tobacco…
    • He was a prominent critic of any evidence linking smoking to cancer and other diseases; see, e.g., Fisher (1957), a commentary in The BMJ.
    • Since randomization of tobacco smoking is neither ethical nor feasible, overwhelming observational evidence was used to link smoking to cancer
    • How to reconcile the lack of experimental evidence with the (very) strong observational evidence?

Observational studies and causal inference

The (bio)statistician and epidemiologist Sir Austin Bradford Hill outlined a set of criteria for making causal judgments on the basis of observational evidence. Here’s an excerpt from Hill (1965):

When our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance, what aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?

  1. Strength: First upon my list I would put the strength of the association.
  2. Consistency: Next on my list of features to be specially considered I would place the consistency of the observed association. Has it been repeatedly observed by different persons, in different places, circumstances and times?
  3. Specificity: One reason, needless to say, is the specificity of the association, the third characteristic which invariably we must consider.
  4. Temporality: My fourth characteristic is the temporal relationship of the association—which is the cart and which the horse?
  5. Biological gradient: Fifthly, if the association is one which can reveal a biological gradient, or dose-response curve, then we should look most carefully for such evidence.
  6. Plausibility: It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand. What is biologically plausible depends upon the biological knowledge of the day.
  7. Coherence: On the other hand the cause-and-effect interpretation of our data should not seriously conflict with the generally known facts of the natural history and biology of the disease—in the expression of the Advisory Committee to the Surgeon-General it should have coherence.
  8. Experiment: Occasionally it is possible to appeal to experimental, or semi-experimental, evidence. For example, because of an observed association some preventive action is taken. Does it in fact prevent?
  9. Analogy: In some circumstances it would be fair to judge by analogy. With the effects of thalidomide and rubella before us we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.

From causality to statistics in observational studies

Proposition 2 (Identification) Let (L,A,Y) ∼ P, i.i.d., and assume:

  1. Consistency: Y=Ya whenever A=a.
  2. Non-interference: Yia⊥⊥Aj for all i≠j.
  3. No unmeasured confounding: Ya⊥⊥A∣L for each a∈A.
  4. Positivity: 0<P(A=a∣L=l)<1 for all l∈L.

Then, E[E[Y∣A=a,L]]=E[Ya].

With Proposition 2, we can re-express θATE = E[Y1] − E[Y0] as ψATE = E[E[Y∣A=1,L] − E[Y∣A=0,L]].
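Proposition 2 suggests a plug-in (“g-computation”) estimator: fit the outcome regression E[Y∣A,L], predict for every unit under a=1 and under a=0, and average the difference. A sketch on simulated confounded data (the model, effect sizes, and seed are invented for illustration; here the true ATE is 1.0):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

# L confounds A and Y; the true ATE of A on Y is 1.0.
l = rng.normal(size=n)
p_a = 1.0 / (1.0 + np.exp(-l))       # treatment depends on L (positivity holds)
a = rng.binomial(1, p_a)
y = 1.0 * a + 2.0 * l + rng.normal(size=n)

# Naive difference in means is confounded by L.
naive = y[a == 1].mean() - y[a == 0].mean()

# G-computation: fit E[Y|A,L] by OLS, predict under a=1 and a=0, average.
X = np.column_stack([np.ones(n), a, l])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
mu1 = np.column_stack([np.ones(n), np.ones(n), l]) @ beta
mu0 = np.column_stack([np.ones(n), np.zeros(n), l]) @ beta
psi_gcomp = (mu1 - mu0).mean()

print(f"naive: {naive:.3f}, g-computation: {psi_gcomp:.3f}")
# The naive contrast is biased upward; g-computation recovers ≈ 1.0.
```

Inverse probability weighting, mentioned later, offers an alternative route via the fitted treatment mechanism P(A=1∣L).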

Observational studies: perils and promises

  • Overcome several limitations of RCTs, including by
    • increasing sample sizes and broadening eligibility criteria (and thus improving generalizability, possibly vastly)
    • limiting costs and logistic challenges and allowing for greater follow-up time
    • allowing retrospective construction from large registries and databases
  • Prone to critical drawbacks, requiring care to
    • appropriately handle confounding of the treatment–outcome relationship (e.g., via inverse probability weighting, regression adjustment)
    • assess reliability of conclusions, given the ubiquitous possibility of unmeasured confounding (via sensitivity analysis)
  • Recall that both RCTs and observational studies range in quality—and it is not true that one type obviously dominates the other

Target trial emulation protocols

Goal: Construct an observational study—or emulated trial—based on a hypothetical target trial.

  • Let’s imagine a hypothetical target trial: What desiderata would we seek if we were running an RCT de novo to answer questions of interest?
  • We cannot run an RCT de novo but contemplating doing so can guide design of an observational study (Hernán and Robins 2016).
    • Who would be eligible for the target trial?
    • What are the treatment strategies of interest?
    • How would treatment be assigned?
    • What serves as the index time (or time zero)?
    • What marks the end of follow-up?

Emulating the target trial

Outlining essential features of the target trial clarifies how the emulated trial (observational study) may differ. Essential features include

  • Eligibility criteria: Who would be included in the study?
  • Treatment strategies: What are the investigational regimens of interest?
  • Assignment mechanism: How would treatment be assigned?
  • Time zero, follow-up: When would monitoring start and end?
  • Outcomes of interest: What is the outcome being evaluated?
  • Causal contrasts: What is the effect of interest (e.g., risk ratio)?
  • Statistical analysis: How will the causal contrast be recovered?

Enumerate how the target and emulated trials differ along these criteria in order to minimize and account for differences.

References

Fisher, Ronald A. 1957. “Dangers of Cigarette-Smoking.” British Medical Journal 2 (5039): 297–98.
Hernán, Miguel A, and James M Robins. 2016. “Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available.” American Journal of Epidemiology 183 (8): 758–64. https://doi.org/10.1093/aje/kwv254.
———. 2025. Causal Inference: What If. CRC Press.
Hill, Austin Bradford. 1965. “The Environment and Disease: Association or Causation?” Journal of the Royal Society of Medicine 58 (5): 295–300. https://doi.org/10.1177/0141076814562718.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81 (396): 945–60. https://doi.org/10.1080/01621459.1986.10478354.
Imbens, Guido W, and Donald B Rubin. 2015. Causal Inference in Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge University Press. https://doi.org/10.1017/CBO9781139025751.
Senn, Stephen. 2022. Dicing with Death: Living by Data. Cambridge University Press. https://doi.org/10.1017/9781009000185.

The difference-in-means estimator…another look…

ψ̂ATE,DM = Ê[Y∣A=1] − Ê[Y∣A=0]
        = Ê[YA]/Ê[A] − Ê[Y(1−A)]/Ê[1−A]
        = [(1/n)∑i YiAi] / [(1/n)∑i Ai] − [(1/n)∑i Yi(1−Ai)] / [(1/n)∑i (1−Ai)]
        = (1/n)∑i YiAi · (n/n1) − (1/n)∑i Yi(1−Ai) · (n/n0)
        = (1/n1) ∑{i: Ai=1} Yi − (1/n0) ∑{i: Ai=0} Yi
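The chain of equalities above can be checked numerically on arbitrary simulated data (values and seed are immaterial, since this is an algebraic identity):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
a = rng.binomial(1, 0.5, size=n)
y = rng.normal(size=n)

# First form: sample moments with estimated assignment probabilities.
lhs = np.mean(y * a) / np.mean(a) - np.mean(y * (1 - a)) / np.mean(1 - a)
# Last form: within-arm sample means.
rhs = y[a == 1].mean() - y[a == 0].mean()

print(np.isclose(lhs, rhs))  # True: the two forms coincide
```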

HST 190: Introduction to Biostatistics
