modelDependence.Rd
modelDependence()
computes model dependence bounds for a single dataset using either the extreme bounds procedure or Athey-Imbens procedure. The width of the bounds indicates the degree of model dependence present in the dataset. Wide bounds imply that treatment effects could vary greatly depending on the outcome model used and therefore depend heavily on it.
modelDependence(base.form = NULL, data, treatment,
outcome, covariates, weights = NULL,
method = c("extreme-bounds", "athey-imbens"),
model.dependence.ests = 101,
specifications = NULL, cutpoints = NULL,
cutpoint.method = c("mean", "median", "segmented"),
verbose = TRUE, seed = NULL)
# S3 method for modelDependenceBounds
plot(x, ...)
# S3 method for modelDependenceBounds
print(x, ...)
a formula
relating the outcome to the treatment and covariates, e.g., Y ~ A + X1 + X2 + ...
where Y
represents the outcome variable, A
the treatment variable, and X1
and X2
the covariates. This should correspond to the "default" or originally intended outcome model whose sensitivity is being assessed. If omitted, will just use a simple regression of the outcome on the treatment.
a data frame containing the variables named in base.form
, treatment
, outcome
, and covariates
.
the name of the treatment variable. If missing and base.form
is specified and only has one right-hand side variable, that variable will be considered the treatment.
the name of the outcome variable. Ignored if base.form
is supplied and required otherwise.
a character vector containing the names of the covariates over which model dependence is to be assessed. Can also be a one-sided formula with the covariates on the right-hand side. If omitted, all variables in base.form
that are not the treatment or outcome will be considered covariates.
an optional vector of weights (e.g., matching weights) to be included in the fitted models.
the method of computing model dependence bounds. Can be one of "extreme-bounds"
or "athey-imbens"
for the extreme bounds or Athey-Imbens procedures, respectively. Abbreviations allowed. Default is extreme bounds. See Details.
when method = "extreme-bounds"
, the number of different model specifications to use. Higher numbers give more accurate bounds but take longer. Exactly one of model.dependence.ests
or specifications
must be supplied.
when method = "extreme-bounds"
, a list of formula
s containing all the model specifications to consider in computing the extreme bounds. If omitted, these will be randomly generated using up to third degree polynomials and two-way interactions between covariates. Exactly one of model.dependence.ests
or specifications
must be supplied.
when method = "athey-imbens"
, a named list of the cutpoints for each covariate that has more than two unique values. A cutpoint of NA
means the unique values of the variable will be used. For any covariates without cutpoints specified, the cutpoints will computed using the method supplied to cutpoint.method
.
when method = "athey-imbens"
, the name of the method to create cutpoints for covariates without entries in cutpoints
. Options include "mean"
(the default) to use the covariate mean as the cutpoint, "median"
to use the covariate median as the cutpoint, and "segmented"
to estimate the cutpoints with segmented (piecewise) regression. Abbreviations allowed.
logical
; whether to print additional information about the procedure.
the seed for the random number generator. If NULL
, the current seed is used.
a modelDependenceBounds
object, the output of a call to modelDependence()
.
for print.modelDependenceBounds
, further arguments passed to print()
. Ignored otherwise.
The simplest way to estimate the model dependence bounds is to supply base.form
with the treatment and covariates on the right hand side and treatment
supplied.
The extreme bounds procedure (Leamer, 1983) works by fitting many different models of various functional forms with many combinations of the covariates. When specifications
is not supplied, these models are randomly generated by appending to the model in base.form
sets of squares, cubes, and two-way interactions of the covariates with model.dependence.ests
controlling how many such models are estimated. More is always better but will take longer. The 5th and 95th percentiles of the treatment effects from all the specifications are used as the model dependence bounds.
The Athey-Imbens procedure (Athey & Imbens, 2015) works by splitting covariates at points determined by cutpoints
or cutpoint.method
, fitting the base.form
model within each subset, averaging the two effect estimates, and computing the standard deviation of the resulting effect estimates. The base treatment effect estimate (fit in the full sample) minus and plus this standard deviation are used as the model dependence bounds. cutpoints
can be supplied or estimated for each covariate. For covariates with exactly two unique values, the data will be split according to those values. For covariates with more than two values, if a cutpoint is supplied, it will be used to split the dataset; otherwise, a cutpoint will be estimated as the mean or median of the covariate or using segmented regression as implemented in segmented::segmented()
depending on the argument to cutpoint.method
.
It is important to consider the interpretation of base.form
. If there is no confounding by the covariates (e.g., because of the result of a randomized experiment or successful matching), base.form
can specified simply as Y ~ A
, with model dependence referring to all the possible adjustment sets and their functional forms. If there is confounding and base.form
is specified as Y ~ A
(or with only a subset of the confounding covariates included in the model), then model dependence concerns different adjustment sets, and therefore muddles dependence on model specification and covariate choice, which is undesirable. base.form
should therefore contain all the covariates required to eliminate confounding, with the model dependence referring to the possible additional terms that could be added to the model to change its functional form. Put simply, the adjustment set should always be in base.form
, but after randomization or successful matching the adjustment set might be empty.
A modelDependenceBounds
object, which is numeric vector with the lower and upper model dependence bounds and attributes concerning the fit. The attribute "method"
will contain the method supplied to method
. The attribute base.est
will contain the effect estimated using base.form
with no modification. The attribute "estimates"
will contain the effect estimates used in computing the bounds. For the extreme bounds procedure, the attribute "sepcifications"
contains a list of the model specifications used, each as a string representing the model formula. This can be supplied to future calls to modelDependence()
or estimateEffects()
. For the Athey-Imbens procedure, the attribute "sigma"
contains the standard deviation of the effect estimates that, when added to and subtracted from the estimate resulting from the model specified in base.form
, forms the bounds.
The print()
method displays the bounds as well as their difference. Note that the difference is not actually stored in the modelDependenceBounds
object but can be computed using diff()
.
The plot()
method displays a histogram of the effect estimates when method = "extreme-bounds"
and a dot plot of the estimates when method = "athey-imbens"
. Red vertical lines indicate the reported bounds, and the blue vertical line indicates the base estimate. The resulting object is a ggplot
object that can be further manipulated using ggplot2 functionality.
Athey, S., & Imbens, G. (2015). A measure of robustness to misspecification. American Economic Review, 105(5), 476-80.
Leamer, E. E. (1983). Let's take the con out of econometrics. The American Economic Review, 73(1), 31-43.
library(MatchIt)
data("lalonde", package = "MatchIt")
#Model dependence prior to matching
b1 <- modelDependence(re78 ~ treat + age + educ + married +
race + nodegree + re74 + re75,
data = lalonde, treatment = "treat",
method = "extreme-bounds", seed = 12345)
b1 # diff = 823.8418
#> Extreme model dependence bounds:
#> lower upper diff
#> 885.3360 1668.0174 782.6814
plot(b1)
#Model dependence after matching
m.out <- matchit(treat ~ age + educ + married + race +
nodegree + re74 + re75, data = lalonde,
distance = "mahalanobis", replace = TRUE)
b2 <- modelDependence(re78 ~ treat + age + educ + married +
race + nodegree + re74 + re75,
data = lalonde, treatment = "treat",
method = "extreme-bounds", seed = 12345,
weights = m.out$weights)
b2 # diff = 684.6681
#> Extreme model dependence bounds:
#> lower upper diff
#> 63.83596 544.24800 480.41204
plot(b2)
#Athey-Imbens bounds using various cutpoints
# Here we use cutpoints of 100 for re74 and re75 and split
# other covariates at their means
b3 <- modelDependence(re78 ~ treat + age + educ + married +
race + nodegree + re74 + re75,
data = lalonde, treatment = "treat",
method = "athey-imbens",
cutpoints = list(re74 = 100, re75 = 100),
cutpoint.method = "mean")
b3
#> Athey-Imbens model dependence bounds:
#> lower upper diff
#> 1043.574 2052.914 1009.340
plot(b3)