Fredrik Sävje
 Assistant Professor
 Political Science and Statistics & Data Science
 Yale University
Working papers

Unbiased and consistent variance estimators generally do not exist for design-based treatment effect estimators because experimenters never observe more than one potential outcome for any unit. The problem is exacerbated by interference and complex experimental designs. In this paper, we consider variance estimation for linear treatment effect estimators under interference and arbitrary experimental designs. Experimenters must accept conservative estimators in this setting, but they can strive to minimize the conservativeness. We show that this task can be interpreted as an optimization problem in which one aims to find the lowest estimable upper bound of the true variance given one's risk preference and knowledge of the potential outcomes. We characterize the set of admissible bounds in the class of quadratic forms, and we demonstrate that the optimization problem is a convex program for many natural objectives. This allows experimenters to construct less conservative variance estimators, making inferences about treatment effects more informative. The resulting estimators are guaranteed to be conservative regardless of whether the background knowledge used to construct the bound is correct, but the estimators are less conservative if the knowledge is reasonably accurate.
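
As a point of reference only, the best-known conservative bound is the textbook Neyman bound for the difference-in-means estimator under complete randomization without interference. It is not the bound constructed in the paper, which covers interference and arbitrary designs. The notation is an assumption introduced here: N units, n_1 treated and n_0 control, S_1^2 and S_0^2 the finite-population variances of the two potential outcomes, and S_\tau^2 the finite-population variance of the unit-level effects.

```latex
\[
\operatorname{Var}(\hat{\tau})
  = \frac{S_1^2}{n_1} + \frac{S_0^2}{n_0} - \frac{S_\tau^2}{N}
  \;\le\; \frac{S_1^2}{n_1} + \frac{S_0^2}{n_0}
\]
```

The term S_\tau^2 depends on both potential outcomes of the same unit and so cannot be estimated from the data. Dropping it gives an upper bound that can be estimated, which is what makes the usual variance estimator conservative.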

The regression discontinuity (RD) design offers identification of causal effects under weak assumptions, earning it the position as a standard method in modern political science research. But identification does not necessarily imply that the causal effects can be estimated accurately with limited data. In this paper, we highlight that estimation is particularly challenging with the RD design and investigate how these challenges manifest themselves in the empirical literature. We collect all RD-based findings published in top political science journals from 2009-2018. The findings exhibit pathological features; estimates tend to bunch just above the conventional level of statistical significance. A reanalysis of all studies with available data suggests that researchers' discretion is not a major driver of these pathological features, but researchers tend to use inappropriate methods for inference, rendering standard errors artificially small. A retrospective power analysis reveals that most of these studies were underpowered to detect all but large effects. The issues we uncover, combined with well-documented selection pressures in academic publishing, cause concern that many published findings using the RD design are exaggerated, if not entirely spurious.

We argue that randomized controlled trials (RCTs) are special even among settings where average treatment effects are identified by a nonparametric unconfoundedness assumption. This claim follows from two results of Robins and Ritov (1997): (1) with at least one continuous covariate control, no estimator of the average treatment effect exists which is uniformly consistent without further assumptions, and (2) knowledge of the propensity score yields a uniformly consistent estimator and honest confidence intervals that shrink at parametric rates with increasing sample size, regardless of how complicated the propensity score function is. We emphasize the latter point, and note that successfully conducted RCTs provide knowledge of the propensity score to the researcher. We discuss modern developments in covariate adjustment for RCTs, noting that statistical models and machine learning methods can be used to improve efficiency while preserving finite-sample unbiasedness. We conclude that statistical inference has the potential to be fundamentally more difficult in observational settings than it is in RCTs, even when all confounders are measured.

In a bipartite experiment, units that are assigned treatments differ from the units for which we measure outcomes. The two groups of units are connected by a bipartite graph, governing how the treated units can affect the outcome units. Often motivated by experiments in marketplaces, the bipartite experimental framework has been used, for example, to investigate the causal effects of supply-side changes on demand-side behavior. In this paper, we consider the problem of estimating the average total treatment effect in the bipartite experimental framework under a linear exposure-response model. We introduce the Exposure Reweighted Linear (ERL) Estimator, an unbiased linear estimator of the average treatment effect in this setting. We show that the estimator is consistent and asymptotically normal, provided that the bipartite graph is sufficiently sparse. We derive a variance estimator which facilitates confidence intervals based on a normal approximation. In addition, we introduce Exposure-Design, a cluster-based design which aims to increase the precision of the ERL estimator by realizing desirable exposure distributions. Finally, we demonstrate the effectiveness of the described estimator and design with an application using a publicly available Amazon user-item review graph.

The paper introduces a class of experimental designs that allows experimenters to control the robustness and efficiency of their experiments. The designs build on a recently introduced algorithm in discrepancy theory, the Gram-Schmidt walk. We provide a tight analysis of this algorithm, allowing us to prove important properties of the designs it produces. These designs aim to simultaneously balance all linear functions of the covariates, and the variance of an estimator of the average treatment effect is shown to be bounded by a quantity that is proportional to the loss function of a ridge regression of the potential outcomes on the covariates. No regression is actually conducted, and one may see the procedure as regression adjustment by design. The class of designs is parameterized so as to give experimenters control over the worst-case performance of the treatment effect estimator. Greater covariate balance is attained by allowing for a less robust design in terms of worst-case variance. We argue that the trade-off between robustness and efficiency is an inherent aspect of experimental design. Finally, we provide non-asymptotic tail bounds for the treatment effect estimator under the class of designs we describe.

Exposure mappings facilitate investigations of complex causal effects when units interact in experiments. Current methods assume that the exposures are correctly specified, but such an assumption cannot be verified, and its validity is often questionable. This paper describes conditions under which one can draw inferences about exposure effects when the exposures are misspecified. The main result is a proof of consistency under mild conditions on the errors introduced by the misspecification. The rate of convergence is determined by the dependence between units' specification errors, and consistency is achieved even if the errors are large as long as they are sufficiently weakly dependent. In other words, exposure effects can be precisely estimated also under misspecification as long as the units' exposures are not misspecified in the same way. The limiting distribution of the estimator is discussed. Asymptotic normality is achieved under stronger conditions than those needed for consistency. Similar conditions also facilitate conservative variance estimation.

Recent studies of the effects of political incumbency on election outcomes have almost exclusively used regression discontinuity designs. This shift from the past methods has provided credible identification, but only for a specific type of incumbency effect: the effect for parties. The other effects in the literature, most notably the personal incumbency effect, have largely been abandoned together with the methods previously used to estimate them. This study aims to connect the new methodological strides with the effects discussed in the past literature. A causal model is first introduced which allows for formal definitions of several effects that have previously only been discussed informally. The model also allows us to revisit previous methods and derive how their estimated effects are related. Several strategies are then introduced which, under suitable assumptions, can identify some of the newly defined effects. Last, using these strategies, the incumbency effects in Brazilian mayoral elections are investigated.
Recent publications

Biometrika (2021), in print.
The paper shows that matching without replacement on propensity scores produces estimators that generally are inconsistent for the average treatment effect of the treated. To achieve consistency, practitioners must either assume that no units exist with propensity scores greater than one-half or assume that there is no confounding among such units. The result is not driven by the use of propensity scores, and similar artifacts arise when matching on other scores as long as it is without replacement.

Political Analysis (2021), 29(4), 423–447.
Matching is a conceptually straightforward method to make groups of units comparable on observed characteristics. The method is, however, limited to settings where the study design is simple and the sample is moderately sized. We illustrate these limitations by asking what the causal effects would have been if a large-scale voter mobilization experiment that took place in Michigan for the 2006 election were scaled up to the full population of registered voters. Matching could help us answer this question, but no existing matching method can accommodate the six treatment arms and the 6,762,701 observations involved in the study. To offer a solution for this and similar empirical problems, we introduce a generalization of the full matching method that can be used with any number of treatment conditions and complex compositional constraints. The associated algorithm produces near-optimal matchings; the worst-case maximum within-group dissimilarity is guaranteed to be no more than four times greater than the optimal solution, and simulation results indicate that it comes considerably closer to the optimal solution on average. The algorithm’s ability to balance the treatment groups does not sacrifice speed, and it uses little memory, terminating in linearithmic time using linear space. This enables investigators to construct well-performing matchings within minutes even in complex studies with samples of several million units.

Annals of Statistics (2021), 49(2), 673–701.
We investigate large-sample properties of treatment effect estimators under unknown interference in randomized experiments. The inferential target is a generalization of the average treatment effect estimand that marginalizes over potential spillover effects. We show that estimators commonly used to estimate treatment effects under no interference are consistent for the generalized estimand for several common experimental designs under limited but otherwise arbitrary and unknown interference. The rates of convergence depend on the growth rate of the unit-average amount of interference and the degree to which the interference aligns with dependencies in treatment assignment. Importantly for practitioners, the results imply that even if one erroneously assumes that units do not interfere in a setting with moderate interference, standard estimators are nevertheless likely to be close to an average treatment effect if the sample is sufficiently large. Conventional confidence statements may, however, not be accurate.

Statistical Science (2020), 35(3), 356–360.

Journal of Statistical Planning and Inference (2020), 207, 190–197.
We extend current concentration results for the Horvitz-Thompson estimator in finite population settings. The estimator is demonstrated to converge in quadratic mean to its target under weaker and more general conditions than previously known. Specifically, we require neither the variables of interest nor the normalized inclusion probabilities to be bounded. Rates of convergence are provided.
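
For readers unfamiliar with the estimator, the following is a minimal sketch of its standard form: each sampled value is weighted by the inverse of its inclusion probability. The function names and the toy numbers are assumptions made for illustration and are not taken from the paper.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Estimate a population total from sampled values.

    values: outcomes observed for the sampled units.
    inclusion_probs: probability that each sampled unit was included
        in the sample (must lie in (0, 1]).
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    total = 0.0
    for y, p in zip(values, inclusion_probs):
        if not 0 < p <= 1:
            raise ValueError("inclusion probabilities must lie in (0, 1]")
        # Units that were unlikely to be sampled stand in for more of
        # the population, so they receive a larger weight.
        total += y / p
    return total


def horvitz_thompson_mean(values, inclusion_probs, population_size):
    """Estimate a population mean by dividing the estimated total by N."""
    return horvitz_thompson_total(values, inclusion_probs) / population_size


# Hypothetical sample of three units drawn from a population of eight.
sample_values = [10.0, 20.0, 30.0]
sample_probs = [0.5, 0.25, 0.5]
estimated_total = horvitz_thompson_total(sample_values, sample_probs)
estimated_mean = horvitz_thompson_mean(sample_values, sample_probs, 8)
```

Small inclusion probabilities produce large weights, which is why boundedness of the normalized inclusion probabilities is a natural condition to impose and a notable one to relax.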

Journal of the American Statistical Association (2020), 115(529), 482–485.
Software

Julia package with a fast implementation of the Gram-Schmidt Walk for balancing covariates in randomized experiments (also R wrapper).

R package with tools for distance metrics.

Quick Generalized Full Matching in R.

Quick Threshold Blocking in R.

C library for size-constrained clustering.
Last updated December 6, 2021.