Lei Xiong
Postdoctoral Scholar
Genetics
Stanford University
My research centers on developing and applying deep learning approaches to complex biological questions, with a primary focus on advancing our understanding of gene regulation and cellular diversity. To this end, I build models that capture and interpret complex features from biological datasets, providing new insights into fundamental biological processes. Through my work, I aim to advance single-cell multiomics and the modeling of regulatory DNA sequences, which have the potential to contribute to a better understanding of human health and disease.
Feel free to reach out by email at jsxlei at gmail.com or through the social media listed above!
Education
Tsinghua University
Ph.D. in Computational Biology
Advisor: Prof. Qiangfeng Cliff Zhang
Thesis: Artificial intelligence method for single-cell ATAC-seq data via feature extraction
University of Science and Technology of China (USTC)
B.S. in Biology, Shitsan Pai Talent Program in Life Sciences
Advisor: Prof. Nieng Yan
Thesis: Structural basis and transport mechanism of membrane protein GLUT3
News
Awarded School of Medicine Dean’s Postdoctoral Fellowship, Stanford.
Joined Prof. Anshul Kundaje's lab in Genetics at Stanford, co-supervised by Prof. Mike Bassik.
Our workshop paper scCLIP: Multi-modal Single-cell Contrastive Learning Integration Pre-training was presented at the NeurIPS 2023 AI4Science workshop.
Our work Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space was published in Nature Communications.
Joined Prof. Manolis Kellis's lab at CSAIL at MIT and the Broad Institute of MIT and Harvard.
Awarded Outstanding Doctoral Dissertation Award of Tsinghua University.
Awarded Outstanding Graduate of Beijing.
Ph.D. thesis defense.
Our work, SCALE method for single-cell ATAC-seq analysis via latent feature extraction, was selected as one of the Top 10 Advances in Bioinformatics in China in 2019 and one of the Top 10 Algorithms and Tools for Bioinformatics in China in 2019 by Genomics, Proteomics & Bioinformatics.
Our work SCALE method for single-cell ATAC-seq analysis via latent feature extraction was published in Nature Communications.
Publications
2024
Deep learning modeling of ribosome profiling reveals regulatory underpinnings of translatome and interprets disease variants.
bioRxiv 2024. Gene expression involves transcription and translation. Despite large datasets and increasingly powerful methods devoted to calculating genetic variants’ effects on transcription, discrepancy between mRNA and protein levels hinders the systematic interpretation of the regulatory effects of disease-associated variants. Accurate models of the sequence determinants of translation are needed to close this gap and to interpret disease-associated variants that act on translation. Here, we present Translatomer, a multimodal transformer framework that predicts cell-type-specific translation from mRNA expression and gene sequence. We train Translatomer on 33 tissues and cell lines, and show that the inclusion of sequence substantially improves the prediction of ribosome profiling signal, indicating that Translatomer captures sequence-dependent translational regulatory information. Translatomer achieves accuracies of 0.72 to 0.80 for de novo prediction of cell-type-specific ribosome profiling. We develop an in silico mutagenesis tool to estimate mutational effects on translation and demonstrate that variants associated with translation regulation are evolutionarily constrained, both within the human population and across species. Notably, we identify cell-type-specific translational regulatory mechanisms independent of eQTLs for 3,041 non-coding and synonymous variants associated with complex diseases, including Alzheimer’s disease, schizophrenia, and congenital heart disease. Translatomer accurately models the genetic underpinnings of translation, bridging the gap between mRNA and protein levels, and providing valuable mechanistic insights toward mapping “missing regulation” in disease genetics.
2023
scCLIP: Multi-modal Single-cell Contrastive Learning Integration Pre-training.
NeurIPS AI for Science workshop 2023. Recent advances in multi-modal single-cell sequencing technologies enable the simultaneous profiling of chromatin accessibility and transcriptome in individual cells. Integrative analysis of multi-modal single-cell data offers a more comprehensive understanding of the regulatory mechanisms linking chromatin status and gene expression, which drive cellular processes and diseases. To acquire features that align peaks and genes within the same embedding space and facilitate seamless zero-shot transfer to new data, we introduce scCLIP (single-cell Contrastive Learning Integration Pretraining), a generalized multi-modal transformer model with contrastive learning. We show that this model outperforms competing methods. Beyond this, scCLIP learns transferable features across modalities and generalizes to unseen datasets, which shows great potential to bridge the vast number of unpaired unimodal datasets, both existing and yet to be generated. Specifically, we propose the first large-scale transformer model designed for single-cell ATAC-seq data by patching peaks across the genome and representing each patch as a token. This approach enables us to effectively address the scalability challenges posed by scATAC-seq, even when dealing with datasets of up to one million dimensions. Code is available at: https://github.com/jsxlei/scCLIP.
2022
Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space.
Nat. Commun. 2022. Computational tools for integrative analyses of diverse single-cell experiments are facing formidable new challenges including dramatic increases in data scale, sample heterogeneity, and the need to informatively cross-reference new data with foundational datasets. Here, we present SCALEX, a deep-learning method that integrates single-cell data by projecting cells into a batch-invariant, common cell-embedding space in a truly online manner (i.e., without retraining the model). SCALEX substantially outperforms online iNMF and other state-of-the-art non-online integration methods on benchmark single-cell datasets of diverse modalities (e.g., single-cell RNA sequencing, scRNA-seq; single-cell assay for transposase-accessible chromatin using sequencing, scATAC-seq), especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We showcase SCALEX’s advantages by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19 patients, each assembled from diverse data sources and growing with each new dataset. The online data integration capacity and superior performance make SCALEX particularly appropriate for large-scale single-cell applications to build upon previous scientific insights.
CD127 imprints functional heterogeneity to diversify monocyte responses in inflammatory diseases.
J. Exp. Med. 2022. Inflammatory monocytes are key mediators of acute and chronic inflammation; yet, their functional diversity remains obscure. Single-cell transcriptome analyses of human inflammatory monocytes from COVID-19 and rheumatoid arthritis patients revealed a subset of cells positive for CD127, an IL-7 receptor subunit, and such positivity rendered otherwise inert monocytes responsive to IL-7. Active IL-7 signaling engaged epigenetically coupled, STAT5-coordinated transcriptional programs to restrain inflammatory gene expression, resulting in inverse correlation between CD127 expression and inflammatory phenotypes in a seemingly homogeneous monocyte population. In COVID-19 and rheumatoid arthritis, CD127 marked a subset of monocytes/macrophages that retained hypoinflammatory phenotypes within the highly inflammatory tissue environments. Furthermore, generation of an integrated expression atlas revealed unified features of human inflammatory monocytes across different diseases and different tissues, exemplified by those of the CD127high subset. Overall, we phenotypically and molecularly characterized CD127-imprinted functional heterogeneity of human inflammatory monocytes with direct relevance for inflammatory diseases.
2019
SCALE method for single-cell ATAC-seq analysis via latent feature extraction.
Nat. Commun. 2019. Single-cell ATAC-seq (scATAC-seq) profiles the chromatin accessibility landscape at single cell level, thus revealing cell-to-cell variability in gene regulation. However, the high dimensionality and sparsity of scATAC-seq data often complicate the analysis. Here, we introduce a method for analyzing scATAC-seq data, called Single-Cell ATAC-seq analysis via Latent feature Extraction (SCALE). SCALE combines a deep generative framework and a probabilistic Gaussian Mixture Model to learn latent features that accurately characterize scATAC-seq data. We validate SCALE on datasets generated on different platforms with different protocols, and having different overall data qualities. SCALE substantially outperforms the other tools in all aspects of scATAC-seq data analysis, including visualization, clustering, and denoising and imputation. Importantly, SCALE also generates interpretable features that directly link to cell populations, and can potentially reveal batch effects in scATAC-seq experiments.
2015
Molecular basis of ligand recognition and transport by glucose transporters.
Nature 2015. The major facilitator superfamily glucose transporters, exemplified by human GLUT1-4, have been central to the study of solute transport. Using lipidic cubic phase crystallization and microfocus X-ray diffraction, we determined the structure of human GLUT3 in complex with D-glucose at 1.5 Å resolution in an outward-occluded conformation. The high-resolution structure allows discrimination of both α- and β-anomers of D-glucose. Two additional structures of GLUT3 bound to the exofacial inhibitor maltose were obtained at 2.6 Å in the outward-open and 2.4 Å in the outward-occluded states. In all three structures, the ligands are predominantly coordinated by polar residues from the carboxy terminal domain. Conformational transition from outward-open to outward-occluded entails a prominent local rearrangement of the extracellular part of transmembrane segment TM7. Comparison of the outward-facing GLUT3 structures with the inward-open GLUT1 provides insights into the alternating access cycle for GLUTs, whereby the C-terminal domain provides the primary substrate-binding site and the amino-terminal domain undergoes rigid-body rotation with respect to the C-terminal domain. Our studies provide an important framework for the mechanistic and kinetic understanding of GLUTs and shed light on structure-guided ligand design.
Software
Honors and Awards
School of Medicine Dean’s Postdoctoral Fellowship, Stanford.
Outstanding Doctoral Dissertation of Tsinghua University (Top 5%).
Outstanding Graduate of Beijing (Top 5%).
Top 10 Advances in Bioinformatics in China in 2019.
Top 10 Algorithms and Tools for Bioinformatics in China in 2019.
Outstanding Fellowship, Advanced Innovation Center of Structural Biology, Tsinghua.
Innovation Fellowship, Advanced Innovation Center of Structural Biology, Tsinghua.
iGEM Gold Medal, USTC-China.
2012-2013 Student Scholarship, USTC.
2011-2012 Student Scholarship, USTC.
Freshman Scholarship, USTC.