Propensity Score Analysis: A Primer and Tutorial

Author

Noah Greifer

Published

December 8, 2023

Introduction

Propensity score analysis is a popular method of adjusting for confounding in observational studies, i.e., studies where patients are not randomly assigned into treatment groups. Despite its popularity in applied research, researchers often miss nuances of the method, including the assumptions it requires, the quantities that can be estimated, and the correct procedures for performing and reporting an analysis. The goal of this guide is to summarize best practices in propensity score analysis for medical and social science researchers, highlighting the decisions researchers must make to validly perform and interpret an analysis. This guide is not a substitute for a PhD in biostatistics or even a course in propensity score analysis; it should be seen as a starting point that synthesizes the existing literature and provides references for further reading to deepen one’s understanding of the methods involved.

Although propensity score analysis is an analytic method, the key to performing it well is to understand the theoretical and substantive considerations relevant to the method; therefore, this guide will focus on those aspects. For the more applied parts of the tutorial, R code and output will be presented to demonstrate the procedure.

This document will describe the three basic steps of performing a propensity score analysis: 1) planning the analysis, 2) running the analysis, and 3) reporting the analysis. In Chapter 1, we describe the conceptual steps that determine which options are selected in subsequent steps and that align the analysis with the quantity of interest. In Chapter 2 and the chapters that follow, we provide step-by-step instructions for implementing the computational steps of the analysis, including fitting models and estimating quantities of interest. In Chapter 10, we provide guidelines for reporting the results of an analysis, including what to include in tables and figures and the information other researchers need to replicate your findings. Finally, in Chapter 11 and the chapters that follow, we present an example analysis that includes R code to implement the methods.

What is Propensity Score Analysis?

Propensity score analysis is one of many methods that researchers can use to adjust for confounding when estimating the effect of a variable (e.g., a treatment) on an outcome. What distinguishes it from other methods are the assumptions it requires in order to be valid and its robustness to unknown features of the data. Broadly, propensity score analysis seeks to make the treatment independent of confounders so that the association between the treatment and outcome is unbiased for the causal effect of the treatment on the outcome. It does so using a new variable constructed as part of the analysis called the “propensity score”, though there are modern variations that serve the same function but do not require the propensity score (see note 1). Many introductory articles have been written about propensity score analysis, including Austin (2011), Harder, Stuart, and Anthony (2010), Caliendo and Kopeinig (2008), Shadish and Steiner (2010), and Benedetto et al. (2018).

Consider a situation with a two-level treatment (e.g., treated and control) in which participants were free to select the treatment they received. In order to estimate the causal effect of the treatment on an outcome (e.g., death at 24 months), one needs to make these groups comparable on the variables that induce confounding (i.e., confounders, the common causes of the treatment and outcome). Propensity score analysis involves dropping, weighting, or stratifying units in the sample so that, in the adjusted sample, the treatment is independent of the variables used in the adjustment.

Propensity score matching, the most popular method of propensity score analysis in medical research, described in Rosenbaum and Rubin (1983), involves computing the propensity score for each unit (more details on that later), finding pairs of units that have similar values of the propensity score, and discarding from the sample all units without pairs. What is left is (ideally) a sample in which the distributions of the variables used to compute the propensity score are close to identical between the treated and control groups, just as they would be in a randomized experiment.
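The matching procedure just described can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the procedure recommended later in this guide: the data are simulated, the names `x`, `treat`, and `smd()` are hypothetical, the propensity scores are estimated here with a simple logistic regression, and pairs are formed by greedy 1:1 nearest-neighbor matching without replacement, which is one of many possible matching algorithms.

```r
# Simulated data with a single hypothetical confounder x.
set.seed(42)
n <- 500
x <- rnorm(n)
treat <- rbinom(n, 1, plogis(-1 + x))  # units with larger x are more often treated

# Estimated propensity score for each unit (logistic regression).
ps <- fitted(glm(treat ~ x, family = binomial()))

treated_idx <- which(treat == 1)
available <- which(treat == 0)   # controls not yet used in a pair
matched_control <- integer(0)

# Greedy 1:1 nearest-neighbor matching on the propensity score,
# without replacement: each control can be paired at most once.
for (i in treated_idx) {
  if (length(available) == 0) break
  j <- available[which.min(abs(ps[available] - ps[i]))]
  matched_control <- c(matched_control, j)
  available <- setdiff(available, j)
}

# Keep only paired units; all other units are discarded.
matched_treated <- treated_idx[seq_along(matched_control)]
keep <- c(matched_treated, matched_control)

# Standardized mean difference of x between groups (illustrative helper).
smd <- function(x, treat) {
  (mean(x[treat == 1]) - mean(x[treat == 0])) / sd(x[treat == 1])
}
smd_before <- smd(x, treat)
smd_after <- smd(x[keep], treat[keep])
c(before = smd_before, after = smd_after)
```

In this simulated example the standardized mean difference of `x` should shrink toward zero after matching, which is the sense in which the matched sample resembles a randomized experiment with respect to the covariate used.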

The propensity score itself is a one-dimensional summary of the variables to be adjusted for, computed as the predicted probability of receiving treatment given the variables (i.e., covariates). The simplest and most common way to compute propensity scores is to run a logistic regression of treatment membership as the outcome and the covariates as the predictors, and use the predicted probability of being treated as the propensity score for each unit. Rosenbaum and Rubin (1983) proved that adjusting for the (true) propensity score is equivalent to adjusting for the covariates used to compute the propensity score, which is what makes it such a powerful technique. In practice, though, the performance of the adjustment must be evaluated (Ho et al. 2007), a process we describe in detail later.
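As a minimal sketch of the logistic regression approach described above, the following base R code estimates propensity scores on simulated data. The variable names (`age`, `comorbid`, `treat`) and the data-generating values are assumptions made up for illustration; they do not come from any real dataset or from the example analysis presented later.

```r
# Simulated data; all variable names and coefficients are hypothetical.
set.seed(123)
n <- 1000
age <- rnorm(n, mean = 60, sd = 10)
comorbid <- rbinom(n, size = 1, prob = 0.3)

# Treatment is not randomized: its probability depends on the covariates.
true_ps <- plogis(-0.5 + 0.04 * (age - 60) + 0.8 * comorbid)
treat <- rbinom(n, size = 1, prob = true_ps)
dat <- data.frame(treat, age, comorbid)

# Logistic regression with treatment as the outcome and the
# covariates as the predictors.
ps_fit <- glm(treat ~ age + comorbid, data = dat, family = binomial())

# The predicted probability of treatment is the estimated propensity score.
dat$ps <- predict(ps_fit, type = "response")

summary(dat$ps)
```

Each unit now carries a single number between 0 and 1 that summarizes both covariates. Because the simulation knows `true_ps`, the estimated scores can be compared against it, something that is not possible with real data, where the true propensity score is unknown.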

Propensity score analysis is common in medical and social science research because researchers are often interested in the effect of a drug, vaccine, procedure, policy, or behavior on individual outcomes (e.g., survival, hospital costs, earnings, etc.) but are not in a position to run a randomized controlled trial and instead have retrospective administrative data or large surveys on individuals receiving those treatments. Without adjustment, individuals receiving treatment may be quite different from those not receiving treatment; they may be older, more likely to be a certain sex, more likely to be a certain race, more likely to have certain comorbidities, etc. Comparing treated and untreated individuals on the specified outcome would not yield a valid estimate of the causal effect of the treatment on the outcome because any observed differences could be driven instead by differences in the distributions of the confounders. Propensity score analysis helps to isolate the effect of the treatment so that any observed difference in the outcome can only be attributed to treatment.
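A small simulation can make the problem with the unadjusted comparison concrete. The sketch below assumes a single measured confounder (`age`), a true treatment effect of 2 that is fixed by construction, and adjustment by weighting each unit by the inverse of its estimated probability of receiving the treatment it actually received. All names and numbers are hypothetical, and inverse probability weighting is used here only as one convenient instance of the weighting approach mentioned earlier.

```r
# Simulated data in which older individuals are more likely to be treated
# and also tend to have higher outcomes; the true treatment effect is 2.
set.seed(2023)
n <- 5000
age <- rnorm(n, mean = 60, sd = 10)
treat <- rbinom(n, 1, plogis(0.08 * (age - 60)))
y <- 2 * treat + 0.3 * (age - 60) + rnorm(n)

# Unadjusted comparison: mixes the treatment effect with the age difference.
naive <- mean(y[treat == 1]) - mean(y[treat == 0])

# Propensity scores from a logistic regression of treatment on age.
ps <- fitted(glm(treat ~ age, family = binomial()))

# Inverse probability weights: 1 / ps for treated, 1 / (1 - ps) for controls.
w <- ifelse(treat == 1, 1 / ps, 1 / (1 - ps))

# Weighted difference in mean outcomes between the groups.
weighted <- weighted.mean(y[treat == 1], w[treat == 1]) -
  weighted.mean(y[treat == 0], w[treat == 0])

c(naive = naive, weighted = weighted, truth = 2)
```

In this setup the unadjusted difference should overstate the effect because the treated group is older, while the weighted difference should land much closer to 2. The adjustment works here only because the sole confounder is measured and the propensity score model is correctly specified, conditions that cannot be taken for granted in real data.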


  1. For the sake of this tutorial, we will classify all methods that manipulate the sample using observed variables to remove confounding as falling under the scope of “propensity score analysis”, even though propensity scores are not always used to do this. Many of the best-performing methods for this task do not use propensity scores at all.