# Abstracts

### Invited Talks

#### Causal Discovery with Latent Variables: the Measurement Problem

*Richard Scheines*

Causal discovery is challenging, even with large samples involving variables that are relatively easy to measure like Body Mass Index and Lung Capacity. As many scientific contexts involve quantities that are hard to measure, like Impulsiveness or Social Intelligence or Cumulative Exposure to Formaldehyde, it is important to understand how measurement challenges ramify into causal discovery challenges, and how we might use data to address these challenges. In this talk I give a simple account of how challenges in measurement influence causal discovery, overview a constraint-based approach to how we might use data to help, and present some recent results that utilize this approach.

#### Discovering Dynamical Kinds

*Benjamin Jantzen*

Learning the causal structure of the natural world would be easier if one could know in advance whether two systems share a common causal dynamical structure, or in other words, whether the dynamics governing any two systems are of the same form. For example, a direct test of 'dynamical sameness' would allow one to pool data for learning an explicit model of the dynamics, validate complex models directly without an explicit model, and determine how many kinds of dynamical models are needed to explain the behavior of systems within a given domain. Developing such a test requires two things: (i) a precise theoretical account of what it means to share dynamical form, and (ii) an algorithm for applying that account. In this talk, I summarize how the theory of "dynamical kinds" meets the first requirement, and present algorithms that satisfy the second. Specifically, I present a robust algorithm for testing whether two deterministic systems are of the same dynamical kind on the basis of noisy samples. I then suggest how this algorithm can be extended to the more general case of stochastic causal systems describable by nonlinear structural equation models.

### Contributed Talks and Posters

#### Score-based vs Constraint-based Causal Learning in the Presence of Confounders

*Sofia Triantafillou and Ioannis Tsamardinos*

We compare score-based and constraint-based learning in the presence of latent confounders. We use a greedy search strategy to identify the best fitting maximal ancestral graph (MAG) from continuous data, under the assumption of multivariate normality. Scoring maximal ancestral graphs is based on (a) recursive iterative conditional fitting [Drton et al., 2009] for obtaining maximum likelihood estimates for the parameters of a given MAG and (b) factorization and score decomposition results for mixed causal graphs [Richardson, 2009, Nowzohour et al., 2015]. We compare the score-based approach in simulated settings with two standard constraint-based algorithms: FCI and conservative FCI. Results show promising performance for the greedy search algorithm.

#### Causal Inference by Minimizing the Dual Norm of Bias: Kernel Matching & Weighting Estimators for Causal Effects

*Nathan Kallus*

We consider the problem of estimating causal effects from observational data and propose a novel framework for matching- and weighting-based causal estimators. The framework is based on expressing the bias of a causal estimator as an operator on the unknown conditional expectation function of outcomes and formulating the dual norm of the bias as the norm of this operator with respect to a function space that represents the potential structure for outcomes. We give the term worst-case bias minimizing (WCBM) to estimators that minimize this quantity for some function space and show that a great variety of existing causal estimators belong to this family, including one-to-one matching (with or without replacement), coarsened exact matching, and mean-matched sampling. We propose a range of new, kernel-based matching and weighting estimators that arise when one minimizes the dual norm of the bias with respect to a reproducing kernel Hilbert space. Depending on the case, these estimators can be solved either in closed form, using quadratic optimization, or using integer optimization. We show that estimators based on universal kernels are consistent for the causal effect. In numerical experiments, the new, kernel-based estimators outperform all standard causal estimators in estimation error, providing a successful balance between generality and efficiency.
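The flavor of the kernel-based weighting estimators can be illustrated in a simplified form. The sketch below is not the authors' WCBM estimator; it is a closely related kernel-mean-matching-style construction, assuming synthetic two-dimensional covariates, an RBF kernel, and a ridge-regularized closed-form solve. All names, parameters, and data are illustrative:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def kernel_matching_weights(X_control, X_treated, gamma=1.0, ridge=1e-3):
    """Weights on control units whose weighted kernel mean embedding
    approximates the treated group's embedding (closed-form ridge solve)."""
    K_cc = rbf_kernel(X_control, X_control, gamma)
    K_ct = rbf_kernel(X_control, X_treated, gamma)
    target = K_ct.mean(axis=1)  # kernel mean embedding of the treated group
    n = len(X_control)
    return np.linalg.solve(K_cc + ridge * np.eye(n), target)

rng = np.random.default_rng(0)
X_c = rng.normal(0.0, 1.0, size=(200, 2))   # controls centered at 0
X_t = rng.normal(0.5, 1.0, size=(100, 2))   # treated shifted by 0.5
w = kernel_matching_weights(X_c, X_t)
w = np.clip(w, 0, None)
w /= w.sum()                                # normalize to a distribution

print("unweighted covariate gap:", np.abs(X_c.mean(0) - X_t.mean(0)).max())
print("weighted covariate gap:  ", np.abs(w @ X_c - X_t.mean(0)).max())
```

Weighting control outcomes by `w` then balances the covariate distributions of the two groups, which is the role such weights play in the causal effect estimate.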

#### Separating Sparse Signals from Correlated Noise in Binary Classification

*Stephan Mandt, Florian Wenzel, Shinichi Nakajima, Christoph Lippert and Marius Kloft*

Among the goals of statistical genetics is to find associations of genetic data with binary phenotypes, such as heritable diseases. Often, the data are obfuscated by confounders such as age, ethnicity, or population structure. Linear mixed models are linear regression models that correct for confounding by means of correlated label noise; they are widely appreciated in the field of statistical genetics. We generalize this modeling paradigm to binary classification, where we face the problem that marginalizing over the noise leads to an intractable, high-dimensional integral. We present a scalable, approximate inference algorithm that lets us fit the model to high-dimensional data sets. The algorithm selects features based on an ℓ1-norm regularizer; as we show, the selected features are up to 40% less confounded than the outcomes of uncorrected feature selection. The proposed method also outperforms Gaussian process classification and uncorrelated probit regression in terms of prediction performance. In addition, we discuss ongoing work on employing stochastic gradient MCMC for this problem class.

#### Marginal Causal Consistency in Constraint-based Causal Learning

*Anna Roumpelaki, Giorgos Borboudakis, Sofia Triantafillou and Ioannis Tsamardinos*

Maximal Ancestral Graphs (MAGs) are probabilistic graphical models that can model the distribution and causal properties of a set of variables in the presence of latent confounders. They are closed under marginalization. Invariant pairwise features of a class of Markov equivalent MAGs can be learnt from observational data sets using the FCI algorithm and its variations (such as conservative FCI and order independent FCI). We investigate the consistency of causal features (causal ancestry relations) obtained by FCI in different marginals of a single data set. In principle, the causal relationships identified by FCI on a data set D measuring a set of variables V should not conflict with the output of FCI on marginal data sets including only subsets of V. In practice, however, FCI is prone to error propagation, and running FCI in different marginals results in inconsistent causal predictions. We introduce the term marginal causal consistency to denote the consistency of causal relationships when learning marginal distributions, and investigate the marginal causal consistency of different FCI variations.

#### Causal Inference for Recommendation

*Dawen Liang, Laurent Charlin and David Blei*

The goal of recommendation systems is to infer users’ preferences for items and then to predict items that users will like. We develop a causal inference approach to this problem. Here is the idea. Observational recommendation data contains two sources of information: which items each user decided to look at and which of those items each user liked. For example, one of the data sets we analyze contains which movies each user watched and which of them each user liked; another contains which scientific abstracts each user saw and which PDFs each decided to download. We assume these two types of data come from different models—the exposure data comes from a model by which users discover items to consider; the click data comes from a model by which users decide which items they like. Traditionally, recommendation systems use the click data alone (or ratings data) to infer user preferences. But this inference is biased by the exposure data, i.e., by the fact that users do not consider each item independently at random. We use causal inference to correct for this bias. First, we estimate the exposure model from the exposure data, a model of which items each user is likely to consider. Then we fit the preferences with weighted click data, where each click (or skip) is weighted by the inverse probability of exposure (from the exposure model). On three data sets, we demonstrate that causal inference for recommendation systems leads to improved generalization to new data.
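The two-stage procedure can be sketched on synthetic data. Everything below is illustrative (a toy exposure model with five items and a flat true preference, not the paper's data or models); the weighting step is plain inverse-probability-of-exposure reweighting:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items = 1000, 5

# Hypothetical ground truth: every item is equally liked (preference 0.5),
# but exposure is heavily skewed toward the first items.
true_pref = np.full(n_items, 0.5)
exposure_prob = np.array([0.9, 0.6, 0.3, 0.1, 0.05])

exposed = rng.random((n_users, n_items)) < exposure_prob
liked = exposed & (rng.random((n_users, n_items)) < true_pref)

# Naive estimate from click data alone: tracks exposure, not preference.
naive = liked.sum(0) / n_users

# Step 1: estimate the exposure model from the exposure data.
p_hat = exposed.mean(0)

# Step 2: weight each click by the inverse probability of exposure.
ips = (liked / p_hat).sum(0) / n_users

print(naive.round(2))  # skewed by the exposure distribution
print(ips.round(2))    # approximately flat, near the true preference 0.5
```

The corrected estimates are roughly equal across items, reflecting the flat true preference that the naive counts hide.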

#### Pairwise Cluster Comparison for Learning Latent Variable Models

*Nuaman Asbeh and Boaz Lerner*

Identification of latent variables that govern a problem and the relationships among them, given measurements in the observed world, is important for causal discovery. This identification can be accomplished by analyzing the constraints imposed by the latents on the measurements. We introduce the concept of pairwise cluster comparison (PCC) to identify causal relationships from clusters of data points and provide a two-stage algorithm called learning PCC (LPCC) that learns a latent variable model (LVM) using PCC. LPCC learns exogenous latents and latent colliders before also learning latent non-colliders, all together with their observed descendants, by using pairwise comparisons between data clusters in the measurement space that explain latent causes. If the true graph has no serial connections, LPCC returns the true graph, and if the true graph has at least a single serial connection, LPCC returns a pattern of the true graph. LPCC’s most important advantage is that it is not limited to linear or latent-tree models and makes only mild assumptions about the distribution. The code and evaluation results for synthetic and real domains are available on the authors’ webpage, and the technical details and all proofs are in a supplementary material file.

#### Split-door Criterion for Causal Identification: Natural Experiments with Testable Assumptions

*Amit Sharma, Jake Hofman and Duncan Watts*

Unobserved or unknown confounders complicate even the simplest attempts to estimate the effect of one variable on another using observational data. While there are a number of different approaches to eliminate confounds in the causal inference literature, each has its own set of assumptions, many of which are difficult to verify precisely because they involve statements about variables that, by definition, cannot be measured. In this paper we investigate a particular scenario that both permits causal identification in the presence of unobserved confounders and has explicitly testable assumptions stated only in terms of observable variables. Specifically, we examine what we call the Split-door setting, in which the effect variable can be split up into two parts: one part that is potentially affected by the cause, and another that is independent of it. We show that when both of these variables are caused by the same (unobserved) confounders, the problem of identification reduces to that of testing for independence among observed variables. We discuss various situations in which Split-door variables are commonly recorded in both online and offline settings, and demonstrate the method by estimating the causal impact of Amazon’s recommender system, obtaining similar—but more precise—estimates than past studies.
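The reduction to an observable independence test can be illustrated on simulated data. The structural model, coefficients, and z-score threshold below are all hypothetical choices for the sketch, not the paper's setting:

```python
import numpy as np

def z_stat(a, b):
    """Large-sample z-score for the correlation between a and b."""
    return abs(np.corrcoef(a, b)[0, 1]) * np.sqrt(len(a))

rng = np.random.default_rng(2)
n = 20000
beta = 0.7                          # hypothetical causal effect of X on Y_R

U = rng.normal(size=n)              # unobserved confounder
Y_D = U + 0.5 * rng.normal(size=n)  # split-off outcome part, driven only by U

# Regime 1: X is driven by U, so X and Y_D are dependent and the
# observable check correctly rejects the split-door assumption.
X_bad = 0.8 * U + rng.normal(size=n)
print(z_stat(X_bad, Y_D) > 3.0)     # True: dependence detected

# Regime 2: X varies exogenously; X and Y_D pass the independence check,
# and a naive regression of Y_R on X recovers beta despite the confounder.
X_ok = rng.normal(size=n)
Y_R = beta * X_ok + U + 0.5 * rng.normal(size=n)
print(z_stat(X_ok, Y_D))            # independence check statistic
beta_hat = np.cov(X_ok, Y_R)[0, 1] / np.var(X_ok)
print(round(beta_hat, 2))           # close to the true effect 0.7
```

The observable test on (X, Y_D) thus acts as a filter: only when it passes is the naive estimate on (X, Y_R) treated as a valid natural experiment.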

#### Validating Causal Models

*Dustin Tran, Francisco J. R. Ruiz, Susan Athey and David M. Blei*

The goal of causal inference is to understand the outcome of alternative courses of action. However, all causal inference requires assumptions—more so than for standard tasks in probabilistic modeling—and testing those assumptions is important to assess the validity of a causal model. We develop Bayesian model criticism for causal inference, building on the idea of posterior predictive checks to assess model fit. Our approach involves decomposing the problem, separately criticizing the model of treatment assignments and the model of outcomes. Further, we discuss how and when we can check the central assumption of unconfoundedness, which enables causal statements from observational data. Our approach provides a foundation for diagnosing causal inferences from observational data.
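The idea of separately criticizing the treatment-assignment model can be sketched with a toy posterior predictive check. The assignment model, conjugate prior, and check statistic below are illustrative assumptions for the sketch, not the authors' exact construction:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

# Hypothetical data: treatment actually depends on a covariate x, but the
# analyst's assignment model says T ~ Bernoulli(p), ignoring x entirely.
x = rng.normal(size=n)
T = (rng.random(n) < 1 / (1 + np.exp(-1.5 * x))).astype(float)

# Conjugate posterior for p under a Beta(1, 1) prior.
a, b = 1 + T.sum(), 1 + n - T.sum()

def discrepancy(t):
    """Check statistic: covariate-treatment correlation (0 under the model)."""
    return abs(np.corrcoef(x, t)[0, 1])

obs = discrepancy(T)
reps = []
for _ in range(500):
    p = rng.beta(a, b)                    # draw p from the posterior
    t_rep = (rng.random(n) < p).astype(float)
    reps.append(discrepancy(t_rep))       # discrepancy on replicated data

ppp = np.mean([r >= obs for r in reps])   # posterior predictive p-value
print(ppp)  # near 0: the constant-probability assignment model misfits
```

A posterior predictive p-value near 0 or 1 flags the assignment model as inadequate, which is exactly the kind of diagnosis the decomposition is meant to enable before any outcome model is trusted.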