Research Statement
Artificial intelligence (AI) and statistics are two of the most transformative forces shaping modern data science. AI offers unprecedented power to extract patterns, generate realistic data, and model complex systems, while statistics provides the theoretical rigor and principled frameworks necessary for trustworthy, interpretable, and reproducible scientific discovery. Our lab's research lies at the intersection of these two fields: we develop AI-powered methodologies that decipher the complex relationships hidden within massive biomedical datasets with the principled rigor of statistical science. Through this interdisciplinary research agenda, our lab aims to tackle fundamental challenges in computational biology and biomedical informatics.
Our overarching research goal is to create next-generation computational frameworks that are computationally efficient, theoretically sound, and powered by advances in generative AI. To this end, we have been developing novel frameworks to address challenges in high-dimensional data analysis, causal inference, and Bayesian computation, with a strong focus on impactful applications in computational biology. We work with diverse and complex datasets, from multiomics and pharmacogenomics to large-scale clinical data, to address pressing questions in biomedicine.
Highlights
AI-powered Frameworks for High-dimensional Data
High-dimensional data are now ubiquitous, yet traditional statistical methods often struggle to adapt to their complexity. We develop AI-powered computational frameworks that go beyond prediction, enabling robust inference and discovery. For example, our Encoding Generative Modeling (EGM) paradigm leverages generative AI to represent high-dimensional data on a structured manifold that supports both flexible modeling and rigorous inference. These frameworks have been used to tackle problems ranging from density estimation (Liu et al., PNAS. 2021) and unsupervised learning (Liu et al., Nature Machine Intelligence. 2021) to causal inference (Liu et al., PNAS. 2024; Liu et al., arXiv. 2025), leading to advances that have been recognized in international competitions (winner of the NeurIPS 2021 Multimodal Single-Cell Data Integration competition) and adopted in real-world biomedical studies.
Developing Transformative AI Techniques in Computational Biology
Our methodological advances are closely tied to impactful applications in computational biology. We have developed AI-driven frameworks for modeling chromatin accessibility (Liu et al., Bioinformatics. 2017), 3D genome architecture (Liu et al., ISMB/Bioinformatics. 2019), and multiomics data (Luo*, Liu* et al., bioRxiv. 2025), and for predicting cancer drug response (Liu et al., ECCB/Bioinformatics. 2020). These open-source tools enable other researchers to reproduce, apply, and extend our methods. This commitment to open science allows our models to be widely used in both academic and industry settings.
Most recently, inspired by the transformative impact of large language models, we have pioneered genomic language models that learn the “grammar” of the genome. These models (Gao*, Liu* et al., Genome Biology. 2024; Liu et al., medRxiv. 2024) promise to deepen our understanding of gene regulation and open the door to new discoveries in genomics, genetics, and precision medicine.
Vision
The next decades will witness data science advance not just through more powerful AI, but also through the principled integration of AI and statistics. By combining the flexibility and scale of modern AI techniques with the rigor and interpretability of statistics, our lab is committed to building this bridge, with a dual focus on methodological innovation and scientific impact. We aim to create tools that are not only practically powerful but also trustworthy, reproducible, and transformative for the biomedical sciences.