6  Respecification

If a given specification is inadequate because balance is too poor, the effective sample size (ESS) is too small, or the sample is no longer representative of the target population, one must respecify. Respecification can involve changing some aspect of the conditioning strategy, such as a parameter of the matching or weighting method or, if propensity scores are used, the model used to estimate them. Because there are so many parameters that can be changed, and so many ways to change them, it is impossible to give a complete account of the best way to respecify. One should try many specifications and look for patterns in how each change affects the quality of the resulting sample. As long as the outcome is not involved in this process, doing so will not invalidate the inferences made at the end.
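Two of the evaluation criteria named above, balance and ESS, can be computed by hand to compare candidate specifications. The following is a minimal Python sketch, not the code used by MatchIt or WeightIt; the function names are hypothetical, and the SMD denominator uses the unweighted pooled standard deviation so that it stays fixed across specifications (one common convention, assumed here). The ESS is Kish's formula, (sum of w)^2 / (sum of w^2), computed within each treatment group. The "candidate weights" are made-up numbers standing in for the output of some matching or weighting specification.

```python
import numpy as np

def weighted_smd(x, treat, w):
    """SMD of covariate x: weighted mean difference (treated minus control)
    divided by the unweighted pooled SD (assumed convention)."""
    x, treat, w = (np.asarray(a, dtype=float) for a in (x, treat, w))
    t, c = treat == 1, treat == 0
    diff = np.average(x[t], weights=w[t]) - np.average(x[c], weights=w[c])
    pooled_sd = np.sqrt((x[t].var(ddof=1) + x[c].var(ddof=1)) / 2)
    return diff / pooled_sd

def kish_ess(w):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def summarize(x, treat, w):
    """Balance (SMD) and per-group ESS for one candidate specification."""
    treat = np.asarray(treat)
    w = np.asarray(w, dtype=float)
    return {
        "smd": weighted_smd(x, treat, w),
        "ess_treated": kish_ess(w[treat == 1]),
        "ess_control": kish_ess(w[treat == 0]),
    }

# Toy data: one covariate, four control units followed by four treated units.
x = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 5.0, 6.0])
treat = np.array([0, 0, 0, 0, 1, 1, 1, 1])
candidates = {
    "unadjusted": np.ones(8),
    "candidate weights": np.array([0.2, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 1.0]),
}
for name, w in candidates.items():
    s = summarize(x, treat, w)
    print(f"{name}: SMD = {s['smd']:.3f}, control ESS = {s['ess_control']:.2f}")
```

In this toy example the candidate weights cut the SMD roughly in half (from about 0.95 to about 0.44) but lower the control-group ESS from 4 to about 2.4, which is the kind of trade-off to track across specifications.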

There are some common tricks that can nudge the respecification process in the right direction. Below are some frequent issues and potential solutions.

Having broad experience with the variety of matching and weighting methods available makes this process quick. Fortunately, the software we recommend and use in the examples, the R packages MatchIt and WeightIt, make switching between various specifications easy.

To avoid endless respecification, it is a good idea to use methods designed to optimize the evaluation criteria directly. Often, the oldest and most commonly used methods are the worst: they perform poorly and require manual respecification to get right. For example, 1:1 propensity score matching with a caliper is the most commonly used propensity score method in medical research, but it is widely known to have many problems: it hampers representativeness because the caliper discards units from both treatment groups (Rosenbaum and Rubin 1985); it reduces the effective sample size by dropping many units from the sample; it can make balance worse when used thoughtlessly (King and Nielsen 2019); and it has many specification parameters that must be set arbitrarily (e.g., the propensity score model, caliper width, and matching order). Propensity score weighting, another popular but older method, also has many problems, including failure to achieve balance, low ESS due to extreme weights, and reduced representativeness when measures are taken to rectify the other problems.
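To see how a caliper discards treated units, and how arbitrary choices such as caliper width and matching order change which units survive, consider the following minimal sketch of greedy 1:1 nearest-neighbor propensity score matching without replacement. It is an illustration under stated assumptions, not MatchIt's implementation: the function name is hypothetical, the propensity scores are supplied directly rather than estimated, and the caliper is expressed in standard deviations of the propensity score (an assumption; the scale used varies across software and guidance).

```python
import numpy as np

def greedy_caliper_match(ps, treat, caliper_sd=0.2, order="data"):
    """Greedy 1:1 nearest-neighbor matching on the propensity score, without
    replacement, with a caliper in SD units of the score (hypothetical helper).
    order="data" matches treated units in data order; "largest" matches the
    highest scores first. Returns (treated_index, control_index) pairs."""
    ps = np.asarray(ps, dtype=float)
    treat = np.asarray(treat)
    caliper = caliper_sd * ps.std(ddof=1)
    treated = np.flatnonzero(treat == 1)
    if order == "largest":
        treated = treated[np.argsort(-ps[treated])]
    available = set(np.flatnonzero(treat == 0).tolist())
    pairs = []
    for i in treated:
        if not available:
            break  # remaining treated units stay unmatched
        j = min(available, key=lambda k: abs(ps[k] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:
            pairs.append((int(i), int(j)))
            available.remove(j)
        # otherwise treated unit i is discarded by the caliper
    return pairs

# Toy scores: controls cluster low, treated units mostly high.
ps = np.array([0.10, 0.15, 0.20, 0.30, 0.35, 0.60, 0.80, 0.90])
treat = np.array([0, 0, 0, 0, 1, 1, 1, 1])
for order in ("data", "largest"):
    for cal in (0.2, 2.0):
        pairs = greedy_caliper_match(ps, treat, caliper_sd=cal, order=order)
        print(f"order={order}, caliper={cal} SD: {len(pairs)} of 4 treated matched")
```

With a 0.2 SD caliper only one of the four treated units is retained under either order, so the matched treated group no longer represents the original treated population. With a 2 SD caliper, matching in data order retains two treated units while matching the largest scores first retains all four, showing how much the result can hinge on settings with no principled default.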

Methods that consistently perform well include entropy balancing (Hainmueller 2012) and energy balancing (Huling and Mak, n.d.), as they ensure balance and representativeness without requiring major respecification. Entropy balancing guarantees exact balance on the means of the included covariates (i.e., an SMD of exactly zero for each), but it may be necessary to include additional terms, such as squares or interactions, to balance the full covariate distributions. Energy balancing balances the full covariate distribution but can decrease the ESS (though the trade-off between the two can be managed with a single parameter). Though these methods are newer, they are beginning to see use in applied research (e.g., Bramante et al. 2022; Sharma et al. 2023) and should be the first line of defense when adjusting for confounders, ahead of older and more familiar but poorly performing methods.
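The core of entropy balancing can be sketched as a small convex optimization. The version below targets the ATT with uniform base weights: it finds control weights of the form w_i proportional to exp(lambda' x_i) whose weighted covariate means exactly equal the treated-group means, by minimizing the dual objective log sum_i exp(lambda'(x_i - target)) with a damped Newton method. This is a minimal sketch, not Hainmueller's or WeightIt's implementation; the function name is hypothetical, and it assumes the target means lie strictly inside the convex hull of the control covariates and that no covariate columns are collinear (otherwise no solution exists or the Newton system is singular). Adding squared terms as extra columns illustrates the "include other terms" advice above.

```python
import numpy as np

def entropy_balance_att(X_control, target_means, max_iter=100, tol=1e-10):
    """Entropy-balancing weights for control units (ATT, uniform base weights).

    Minimizes the convex dual f(lam) = log(sum_i exp(z_i @ lam)) with
    z_i = x_i - target. Its gradient is the weighted mean of z, so at the
    optimum the weighted control means equal target_means exactly.
    Returned weights sum to 1 (rescale, e.g. to the treated count, if desired).
    """
    Z = np.asarray(X_control, dtype=float) - np.asarray(target_means, dtype=float)
    lam = np.zeros(Z.shape[1])

    def dual(l):
        a = Z @ l
        amax = a.max()  # log-sum-exp shift for numerical stability
        return amax + np.log(np.exp(a - amax).sum())

    for _ in range(max_iter):
        a = Z @ lam
        p = np.exp(a - a.max())
        p /= p.sum()                      # current normalized weights
        grad = p @ Z                      # weighted mean of z (imbalance)
        if np.max(np.abs(grad)) < tol:
            break
        hess = (Z * p[:, None]).T @ Z - np.outer(grad, grad)  # weighted covariance
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, dual(lam)
        # Backtracking (Armijo) line search keeps each Newton step a descent step.
        while dual(lam - t * step) > f0 - 1e-4 * t * grad @ step and t > 1e-10:
            t /= 2
        lam = lam - t * step
    a = Z @ lam
    w = np.exp(a - a.max())
    return w / w.sum()

# Simulated data: treated units are shifted relative to controls.
rng = np.random.default_rng(0)
X_control = rng.normal(size=(200, 2))
X_treated = rng.normal(loc=0.5, size=(50, 2))
w = entropy_balance_att(X_control, X_treated.mean(axis=0))
print(w @ X_control, X_treated.mean(axis=0))  # weighted control means vs treated means

# Balancing second moments too, by adding squared terms as extra columns.
Xc2 = np.column_stack([X_control, X_control ** 2])
Xt2 = np.column_stack([X_treated, X_treated ** 2])
w2 = entropy_balance_att(Xc2, Xt2.mean(axis=0))
print(w2 @ Xc2, Xt2.mean(axis=0))
```

Because the constraints are satisfied exactly, the SMD for every balanced term is zero by construction; what remains to check is the ESS of the resulting weights, which is where the extreme-weight concern discussed above reappears.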