Abstract

Spurious correlations are one of the biggest pain points for users of modern machine learning. To handle this issue, many approaches attempt to learn features that are causally linked to the prediction variable. Such techniques, however, suffer from various flaws: they are often prohibitively complex or based on heuristics and strong assumptions that may fail in practice. There is no one-size-fits-all causal feature identification approach. To address this challenge, we propose a simple way to fuse multiple noisy estimates of causal features. Our approach treats the underlying causal structure as a latent variable and exploits recent developments in estimating latent structures without access to ground truth. In addition, our approach omnivorously integrates any source of causal signal. We propose new sources, including an automated way to extract causal insights from existing ontologies or foundation models. On three benchmark environmental-shift datasets, models trained via vanilla empirical risk minimization on our discovered features outperform multiple baselines, including automated causal feature discovery techniques such as invariant risk minimization.