BGM-IV: an AI-powered Bayesian generative modeling approach for instrumental variable analysis

Abstract

Instrumental-variable (IV) regression enables causal estimation under endogeneity, but modern IV problems often involve nonlinear structural effects and high-dimensional covariates. Existing nonlinear IV methods either learn the causal relation directly in the observed feature space or rely on learned representations within two-stage or moment-based procedures, both of which can struggle when the causal information is embedded in a high-dimensional representation. We propose BGM-IV, a latent Bayesian generative modeling approach that reframes nonlinear IV regression as posterior inference in a causally structured latent space. BGM-IV infers latent components that separately capture shared confounding structure, outcome-specific variation, treatment-specific variation, and covariate-only nuisance information. To account for endogeneity, BGM-IV replaces the confounded outcome likelihood with an IV-integrated pseudo-likelihood that averages over instrument-induced treatment values within the latent model. Across various benchmark datasets, BGM-IV remains competitive in the classical low-dimensional regime and performs best in high-dimensional covariate regimes. Together, these results show that structured latent generative modeling provides a principled and effective strategy for nonlinear IV estimation with rich covariates.
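
The idea of an IV-integrated pseudo-likelihood can be illustrated with a small Monte Carlo sketch. This is not the paper's implementation: the function names (`iv_integrated_loglik`, `treatment_sampler`, `outcome_mean`), the Gaussian outcome noise, and the scalar latent variable are all assumptions made for illustration. The sketch only shows the general pattern of replacing the likelihood at the observed treatment with an average over treatment values drawn from an instrument-driven model.

```python
import numpy as np


def gaussian_logpdf(y, mean, sigma):
    """Log density of N(mean, sigma^2) evaluated at y (elementwise)."""
    return -0.5 * np.log(2.0 * np.pi * sigma**2) - 0.5 * ((y - mean) / sigma) ** 2


def iv_integrated_loglik(y, instrument, latent, treatment_sampler,
                         outcome_mean, sigma_y, n_draws=256, rng=None):
    """Monte Carlo estimate of log E_t[ p(y | t, latent) ].

    The expectation is over treatment values t drawn from a hypothetical
    instrument-driven model `treatment_sampler`, rather than conditioning on
    the observed (endogenous) treatment. All names here are illustrative.
    """
    rng = np.random.default_rng(rng)
    # Draw treatment values induced by the instrument (shape: (n_draws,)).
    t_draws = treatment_sampler(instrument, latent, n_draws, rng)
    # Outcome log-likelihood at each drawn treatment value.
    logp = gaussian_logpdf(y, outcome_mean(t_draws, latent), sigma_y)
    # Numerically stable log-mean-exp over the draws.
    m = np.max(logp)
    return m + np.log(np.mean(np.exp(logp - m)))


# Toy usage with made-up models (assumptions, not taken from the paper).
def toy_treatment_sampler(z, u, n, rng):
    # Treatment depends on the instrument z plus independent noise.
    return 0.8 * z + 0.3 * rng.standard_normal(n)


def toy_outcome_mean(t, u):
    # Nonlinear structural effect of treatment plus a latent shift u.
    return np.sin(t) + u


loglik = iv_integrated_loglik(
    y=0.7, instrument=1.0, latent=0.2,
    treatment_sampler=toy_treatment_sampler,
    outcome_mean=toy_outcome_mean,
    sigma_y=0.5, n_draws=512, rng=0,
)
```

In a full latent-variable model, a term of this kind would be evaluated per observation and combined with priors over the latent components during posterior inference; how BGM-IV does this in detail is described in the paper rather than here.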

Publication
arXiv preprint arXiv:2605.07029
Guyue Luo
Master's student

Guyue Luo is an M.S. candidate in Biostatistics (Data Science Pathway) at Yale University, advised by Prof. Qiao Liu. His research centers on causal inference and Causal AI/ML. He is also interested in marketplace applications of causal methods and small-sample learning. Outside of research, he enjoys hiking and is training for the Grand Teton Crest Trail.

Qiao Liu
Assistant Professor of Biostatistics

My research interests include generative AI, high-dimensional data analysis, and computational biology.