Matching and Weighting for Causal Inference: A Primer and Tutorial
Introduction
Matching and weighting, a special case of which is sometimes known as propensity score analysis, are popular methods of adjusting for confounding in observational studies, i.e., studies in which patients are not randomly assigned to treatment groups. Despite their popularity in applied research, the methods have many nuances that researchers often miss, including the assumptions required, the quantities that can be estimated, and the correct procedures for performing and reporting an analysis. The goal of this guide is to summarize best practices in matching and weighting for medical and social science researchers, highlighting the decisions researchers must make to validly perform and interpret an analysis. This guide is not a substitute for a PhD in biostatistics or even a course in causal inference or propensity score analysis; it should be seen as a starting point that synthesizes the existing literature and provides references for further reading to deepen one’s understanding of the methods involved.
Although matching and weighting are analytic methods, the key to performing them well is to understand the theoretical and substantive considerations relevant to the methods; therefore, this guide will focus on those aspects. For the more applied parts of the tutorial, R code and output will be presented to demonstrate the procedure.
This document will describe the three basic steps of performing a matching or weighting analysis: 1) planning the analysis, 2) running the analysis, and 3) reporting the analysis. In Chapter 1, Planning the Analysis, we describe the conceptual steps required to determine which options will be selected in subsequent steps and to align the analysis with the quantity of interest. In Chapter 2, Basic Steps, and the chapters that follow, we provide step-by-step instructions for implementing the computational steps of the analysis, including fitting models and estimating quantities of interest. In Chapter 10, Reporting the Analysis, we provide guidelines for reporting the results of an analysis, including the information to include in tables and figures and the information necessary for other researchers to replicate your findings. Finally, in Chapter 11, Example Data, and the chapters that follow, we present an example analysis that includes R code to implement the methods.
What are Matching and Weighting?
Matching and weighting are two of many methods that researchers can use to adjust for confounding when estimating the effect of a variable (e.g., a treatment) on an outcome. What distinguishes them from other methods are the assumptions they require in order to be valid and their robustness to unknown features of the data. Broadly, matching and weighting seek to make the treatment independent of confounders so that the association between the treatment and outcome is unbiased for the causal effect of the treatment on the outcome. Common methods of matching and weighting often work by using a new variable constructed as part of the analysis called the “propensity score”, which is why the methods are often known collectively as “propensity score analysis”. There are, however, variations of matching and weighting that serve the same function but do not involve the propensity score (and often perform better than those that do). Many introductory articles have been written about matching and weighting, including Austin (2011), Harder, Stuart, and Anthony (2010), Caliendo and Kopeinig (2008), Shadish and Steiner (2010), and Benedetto et al. (2018).
Consider a situation with a two-level treatment (e.g., treated and control) in which participants were free to select the treatment they received. In order to estimate the causal effect of the treatment on an outcome (e.g., death at 24 months), one needs to make these groups comparable on the variables that induce confounding (i.e., confounders, which are common causes of the treatment and outcome). Matching and weighting involve dropping, up- and down-weighting, or stratifying units in the sample so that in the adjusted sample, the treatment is independent of the variables used in the adjustment.
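To make the idea of an adjusted sample concrete, the following is a minimal base-R sketch of weighting with a single binary confounder. The data are simulated and the variable names (`smoker`, `treat`) and probabilities are illustrative assumptions, not part of this guide's example. Each unit is weighted by the inverse of the observed proportion of units in its confounder stratum that received the same treatment it did, which is one simple way (not the only way) to construct such weights.

```r
# Simulated data (hypothetical): smokers are more likely to select treatment
set.seed(42)
n <- 1000
smoker <- rbinom(n, 1, 0.4)
treat  <- rbinom(n, 1, ifelse(smoker == 1, 0.7, 0.3))

# Observed proportion treated within each level of the confounder
p_treat <- ave(as.numeric(treat), smoker)

# Weight each unit by 1 / (proportion of its stratum receiving its treatment)
w <- ifelse(treat == 1, 1 / p_treat, 1 / (1 - p_treat))

# Difference in the proportion of smokers between treated and control units
raw_gap <- mean(smoker[treat == 1]) - mean(smoker[treat == 0])
wtd_gap <- weighted.mean(smoker[treat == 1], w[treat == 1]) -
  weighted.mean(smoker[treat == 0], w[treat == 0])

print(c(unadjusted = raw_gap, weighted = wtd_gap))
```

In the unweighted sample, smokers are over-represented among the treated; in the weighted sample, the proportion of smokers is the same in both groups, so treatment no longer carries information about smoking status. With many or continuous confounders, the stratum proportions cannot be computed directly in this way and must instead be estimated.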
The most commonly used method of matching and weighting is propensity score matching, described originally in Rosenbaum and Rubin (1983), which involves computing the propensity score for each unit (more details on that later), finding pairs of units that have similar values of the propensity score, and discarding from the sample all units without pairs. What is left is (ideally) a sample in which the distributions of the variables used to compute the propensity score are close to identical between the treated and control groups, just as they would be in a randomized experiment. Despite its popularity, propensity score matching can do more harm than good (King and Nielsen 2019); this guide also explains alternative methods that avoid many of the problems propensity score matching faces.
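The three steps just described (computing the propensity score, pairing units, and discarding unpaired units) can be sketched in base R as follows. This is an illustration under stated assumptions, not a recommended procedure: the data are simulated, the variable names (`age`, `comorbid`, `treat`) and coefficients are hypothetical, the propensity score is estimated with a plain logistic regression, and pairs are formed by a simple greedy 1:1 nearest-neighbor search without replacement in data order.

```r
# Simulated data (hypothetical): older and sicker patients select treatment more often
set.seed(123)
n <- 2000
age      <- rnorm(n, mean = 60, sd = 10)
comorbid <- rbinom(n, 1, 0.3)
treat    <- rbinom(n, 1, plogis(-2 + 0.03 * (age - 60) + 0.8 * comorbid))
d <- data.frame(age, comorbid, treat)

# Step 1: estimate each unit's propensity score, P(treat = 1 | covariates)
ps_fit <- glm(treat ~ age + comorbid, data = d, family = binomial)
d$ps <- fitted(ps_fit)

# Step 2: pair each treated unit with the closest still-unused control
available <- which(d$treat == 0)
pair_id <- rep(NA_integer_, n)
pair <- 0L
for (i in which(d$treat == 1)) {
  if (length(available) == 0) break
  j <- available[which.min(abs(d$ps[available] - d$ps[i]))]
  pair <- pair + 1L
  pair_id[c(i, j)] <- pair
  available <- setdiff(available, j)
}

# Step 3: discard all units without a pair
matched <- d[!is.na(pair_id), ]

# Standardized mean difference, scaled by the treated-group standard deviation
smd <- function(x, t) (mean(x[t == 1]) - mean(x[t == 0])) / sd(x[t == 1])

balance <- rbind(
  before = c(age = smd(d$age, d$treat),
             comorbid = smd(d$comorbid, d$treat)),
  after  = c(age = smd(matched$age, matched$treat),
             comorbid = smd(matched$comorbid, matched$treat))
)
print(round(balance, 3))
```

Comparing the `before` and `after` rows shows how far the covariate distributions move toward each other once unpaired controls are dropped. The sketch omits choices that matter in practice, such as the order in which treated units are matched and whether to restrict how far apart paired units may be.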
Matching and weighting are common in medical and social science research because researchers are often interested in the causal effect of a drug, vaccine, procedure, policy, or behavior on individual outcomes (e.g., survival, hospital costs, or earnings) but are not in a position to run a randomized controlled trial and instead have retrospective administrative data or large surveys on individuals receiving those treatments. Without adjustment, individuals receiving treatment may be quite different from those not receiving treatment; they may be older, more likely to be a certain sex, more likely to be a certain race, more likely to have certain comorbidities, etc. Comparing treated and untreated individuals on the specified outcome would not yield a valid estimate of the causal effect of the treatment on the outcome because any observed differences could be driven instead by differences in the distributions of the confounders. Matching and weighting help to isolate the effect of the treatment so that any observed difference in the outcome can only be attributed to treatment.
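A small simulation can illustrate why the unadjusted comparison fails. In the base-R sketch below, all names and probabilities are illustrative assumptions: the treatment has no effect on death at all, but older patients are both more likely to be treated and more likely to die. The adjusted estimate compares treated and control units within each age group and then averages those comparisons over the sample, which is a simple form of the stratification mentioned earlier.

```r
# Simulated data (hypothetical): treatment has NO effect on death
set.seed(2024)
n <- 100000
older <- rbinom(n, 1, 0.5)
treat <- rbinom(n, 1, ifelse(older == 1, 0.8, 0.2))  # older patients treated more often
death <- rbinom(n, 1, ifelse(older == 1, 0.4, 0.1))  # depends on age only, not on treat

# Unadjusted comparison: difference in death risk, treated vs. control
naive <- mean(death[treat == 1]) - mean(death[treat == 0])

# Adjusted comparison: risk difference within each age group,
# averaged using each group's share of the sample
rd_by_stratum <- sapply(0:1, function(x) {
  mean(death[treat == 1 & older == x]) - mean(death[treat == 0 & older == x])
})
adjusted <- sum(rd_by_stratum * c(mean(older == 0), mean(older == 1)))

print(c(naive = naive, adjusted = adjusted))
```

The unadjusted difference is far from zero even though the true effect is zero, because the treated group contains many more older patients. The adjusted difference is close to zero because, within each age group, treated and control units have the same underlying risk.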