Jingyi Jessica Li
Dr. Jingyi Jessica Li is a Professor in the Department of Biostatistics, UCLA Fielding School of Public Health. At the University of California, Los Angeles (UCLA), her primary appointment is in the Department of Statistics, with secondary appointments in the Departments of Biostatistics, Computational Medicine, and Human Genetics. She is also a faculty member in the Interdepartmental Ph.D. Program in Bioinformatics.
Before joining UCLA in 2013, Jessica obtained her Ph.D. from the Interdepartmental Group in Biostatistics at the University of California, Berkeley, where she worked with Profs. Peter J. Bickel and Haiyan Huang. She received her B.S. (summa cum laude) from the Department of Biological Sciences and Technology at Tsinghua University, China, in 2007.
Jessica and her students develop statistical and computational methods motivated by important questions in the biomedical sciences and by the abundant information in large-scale genomics and health-related data. On the statistical methodology side, her research interests include association measures, asymmetric classification, multiple testing, and high-dimensional variable selection. On the biomedical application side, her research interests include bulk and single-cell RNA sequencing, comparative genomics, and information flow in the central dogma. To bridge statistics and biology, Jessica is interested in enhancing the rigor of genomics data analysis.
Jessica is the recipient of the Hellman Fellowship (2015), the PhRMA Foundation Research Starter Grant in Informatics (2017), the Alfred P. Sloan Research Fellowship (2018), the Johnson & Johnson WiSTEM2D Math Scholar Award (2018), the NSF CAREER Award (2019), the UCLA DGSOM W. M. Keck Foundation Junior Faculty Award (2020), the MIT Technology Review 35 Innovators Under 35 China honor (2020), the Harvard Radcliffe Fellowship (2022), and the COPSS Emerging Leader Award (2023).
Education
- PhD, Biostatistics, University of California, Berkeley, CA
- BS, Biological Sciences, Tsinghua University, Beijing, China
Areas of Interest
Dr. Li’s research is at the interface between statistics and biology. Her primary research interest lies in developing new statistical methods to answer biological questions, especially those involving large-scale genomic and transcriptomic data. Specific topics she has examined include:
Bioinformatics / Statistical Genomics:
- Enhancing the rigor of data analysis
- Development of realistic generative models
- Statistical methods for analyzing next-generation bulk and single-cell RNA sequencing data
- Using statistics to quantitate the Central Dogma, a fundamental principle in molecular biology
- Comparative genomics: developing novel statistical methods to investigate conserved or divergent biological phenomena in different tissue and cell types across multiple species
- Novel statistical methods for imputing missing data or extracting hidden information from various types of genomics data
- Identification of gene-gene co-expression and protein-DNA and protein-RNA interactions using diverse genomic data
Statistics:
- Measures of association
- Neyman-Pearson classification, which controls the prioritized type of error in binary classification
- High-dimensional linear model inference and variable selection
- Community detection in a bipartite network with node covariates
- P-value-free control of the false discovery rate
- Label ambiguity in multi-class classification
Selected Publications
Google Scholar Profile
Statistical rigor in omics data analysis
- Zhou, H.J., Li, L., Li, Y., Li, W., and Li, J.J. (2022). PCA outperforms popular hidden variable inference methods for QTL mapping. Genome Biology 23:210. [ SOFTWARE ] | [ PDF ]
- Li, Y.*, Ge, X.*, Peng, F., Li, W., and Li, J.J. (2022). Exaggerated false positives by popular differential expression methods when analyzing human population samples. Genome Biology 23:79. [ CODE ] | [ PDF ]
- Ge, X.*, Chen, Y.E.*, Song, D., McDermott, M., Woyshner, K., Manousopoulou, A., Wang, N., Li, W., Wang, L.D., and Li, J.J. (2021). Clipper: p-value-free FDR control on high-throughput data from two conditions. Genome Biology 22:288. [ UCLA News ] [ SOFTWARE ] [ CODE ] [ VIDEO ] | [ PDF ]
- Li, J.J. and Tong, X. (2020). Statistical hypothesis testing versus machine-learning binary classification: distinctions and guidelines. Patterns 1(7):100115. [ UCLA News ] | [ PDF ]
Single-cell RNA-seq
- Jiang, R., Sun, T., Song, D., and Li, J.J. (2022). Statistics or biology: the zero-inflation controversy about scRNA-seq data. Genome Biology 23:31. [ CODE ] | [ PDF ]
- Song, D., Li, K., Hemminger, Z., Wollman, R., and Li, J.J. (2021). scPNMF: sparse gene encoding of single cells to facilitate gene selection for targeted gene profiling. Bioinformatics 37(Supplement_1):i358-i366. [ ISMB/ECCB 2021 ] [ SOFTWARE ] | [ PDF ]
- Sun, T., Song, D., Li, W.V., and Li, J.J. (2021). scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. Genome Biology 22:163. [ RECOMB 2021 ] [ SOFTWARE ] [ CODE ] | [ PDF ]
- Song, D. and Li, J.J. (2021). PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data. Genome Biology 22:124. [ SOFTWARE ] [ CODE ] | [ PDF ]
- Xi, N.M. and Li, J.J. (2021). Benchmarking computational doublet-detection methods for single-cell RNA sequencing data. Cell Systems 12:1-19. [ CODE ] [ DATA ] | [ PDF ]
- Li, W.V. and Li, J.J. (2019). A statistical simulator scDesign for rational scRNA-seq experimental design. Bioinformatics 35(14):i41-i50. [ ISMB/ECCB 2019 ] [ SOFTWARE ] | [ PDF ]
- Li, W.V. and Li, J.J. (2018). An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nature Communications 9:997. [ UCLA News ] [ SOFTWARE ] | [ PDF ]
Bulk RNA-seq isoform discovery and quantification
- Li, W.V.*, Li, S.*, Tong, X., Deng, L., Shi, H., and Li, J.J. (2019). AIDE: annotation-assisted isoform discovery with high precision. Genome Research 29:2056-2072. [ SOFTWARE ] [ COVER ART ] [ UCLA News ] | [ PDF ]
- Li, W.V. and Li, J.J. (2018). Modeling and analysis of RNA-seq data: a review from a statistical perspective. Quantitative Biology 6(3):195-209. | [ PDF ]
- Li, W.V.*, Zhao, A., Zhang, S., and Li, J.J.* (2018). MSIQ: joint modeling of multiple RNA-seq samples for accurate isoform quantification. Annals of Applied Statistics 12(1):510-539. [ SOFTWARE ] [ COLOR PDF ] | [ PDF ]
- Ye, Y. and Li, J.J. (2016). NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data. BMC Genomics 17(Suppl 1):11. [ SOFTWARE ] | [ PDF ]
- Li, J.J., Jiang, C.-R., Brown, B.J., Huang, H., and Bickel, P.J. (2011). Sparse linear modeling of RNA-seq data for isoform discovery and abundance estimation. Proc Natl Acad Sci USA 108(50):19867-19872. [ SOFTWARE ] | [ PDF ]
Central dogma and translational control
- Li, J.J., Chew, G.-L., and Biggin, M.D. (2019). Quantitative principles of cis-translational control by general mRNA sequence features in eukaryotes. Genome Biology 20:162. [ CODE ] | [ PDF ]
- Li, J.J., Chew, G.-L., and Biggin, M.D. (2017). Quantitating translational control: mRNA abundance-dependent and independent contributions and the mRNA sequences that specify them. Nucleic Acids Research 45(20):11821-11836. [ Highlight talk at RECOMB ] | [ PDF ]
- Li, J.J. and Biggin, M.D. (2015). Statistics requantitates the central dogma. Science 347(6226):1066-1067. [ UCLA News ] [ Interview in Significance 12(3):8 ] | [ PDF ]
- Li, J.J., Bickel, P.J., and Biggin, M.D. (2014). System wide analyses have underestimated protein abundances and transcriptional importance in animals. PeerJ 2:e270. [ Press release ] [ Guest post on "Bits of DNA" blog ] [ "PeerJ Picks 2015" Collection ] [ "Top Bioinformatics Papers - June 2015" Collection ] [ Top 5 most cited PeerJ articles ] | [ PDF ]
Classification methodologies and applications
- Zhang, C., Chen, Y.E., Zhang, S., and Li, J.J. (2021). Information-theoretic classification accuracy: a criterion that guides data-driven combination of ambiguous outcome labels in multi-class classification. Journal of Machine Learning Research, accepted. [ CODE ]
- Li, J.J., Chen, Y.E., and Tong, X. (2021). A flexible model-free prediction-based framework for feature ranking. Journal of Machine Learning Research 22(124):1-54. [ SOFTWARE ] | [ PDF ]
- Lyu, J.*, Li, J.J.*, Su, J., Peng, F., Chen, Y.E., Ge, X., and Li, W. (2020). DORGE: Discovery of Oncogenes and tumor suppressoR genes using Genetic and Epigenetic features. Science Advances 6(46):eaba6784. [ VIDEO ] | [ PDF ]
- Tong, X.*, Feng, Y.*, and Li, J.J. (2018). Neyman-Pearson classification algorithms and NP receiver operating characteristics. Science Advances 4(2):eaao1659. [ SOFTWARE ] [ VIDEO ] [ Francis X. Diebold's Blog on NP Classification ] | [ PDF ]
Microbiome sequencing data imputation
- Jiang, R., Li, W.V., and Li, J.J. (2021). mbImpute: an accurate and robust imputation method for microbiome data. Genome Biology 22:192. [ SOFTWARE ] | [ PDF ]
Networks
- Wang, Y.X.R., Li, L., Li, J.J., and Huang, H. (2021). Network modeling in biology: statistical methods for gene and brain networks. Statistical Science 36(1):89-108. | [ PDF ]
- Sun, Y.E., Zhou, H.J., and Li, J.J. (2020). Bipartite tight spectral clustering (BiTSC) algorithm for identifying conserved gene co-clusters in two species. Bioinformatics 37(9):1225-1233. [ SOFTWARE ] | [ PDF ]
- Razaee, Z.S., Amini, A.A., and Li, J.J. (2019). Matched bipartite block model with covariates. Journal of Machine Learning Research 20(34):1-44. | [ PDF ]
High-dimensional model inference
- Liu, H., Xu, X., and Li, J.J. (2020). A bootstrap lasso + partial ridge method to construct confidence intervals for parameters in high-dimensional sparse linear models. Statistica Sinica 30:1333-1355. [ SOFTWARE ] | [ PDF ]
Comparative genomics
- Ge, X.*, Zhang, H.*, Xie, L., Li, W.V., Kwon, S.B., and Li, J.J. (2019). EpiAlign: an alignment-based bioinformatic tool for comparing chromatin state sequences. Nucleic Acids Research 47(13):e77. [ SOFTWARE ] [ WEBSITE ] | [ PDF ]
- Duong, D., Ahmad, W.U., Eskin, E., Chang, K.-W., and Li, J.J. (2019). Word and sentence embedding tools to measure semantic similarity of Gene Ontology terms by their definitions. Journal of Computational Biology 26(1):38-52. [ SOFTWARE ] | [ PDF ]
- Li, W.V., Chen, Y., and Li, J.J. (2017). TROM: a testing-based method for finding transcriptomic similarity of biological samples. Statistics in Biosciences 9(1):105-136. [ SOFTWARE ] | [ PDF ]
- Gao, R. and Li, J.J. (2017). Correspondence of D. melanogaster and C. elegans developmental stages revealed by alternative splicing characteristics of conserved exons. BMC Genomics 18:234. | [ PDF ]
- Yang, Y.*, Yang, Y.T.*, Yuan, J., Lu, Z.J., and Li, J.J. (2017). Large-scale mapping of mammalian transcriptomes identifies conserved genes associated with different cell states. Nucleic Acids Research 45(4):1657-1672. [ DATA ] | [ PDF ]
- Li, W.V., Razaee, Z.S., and Li, J.J. (2016). Epigenome overlap measure (EPOM) for comparing tissue/cell types based on chromatin states. BMC Genomics 17(Suppl 1):10. [ SOFTWARE ] | [ PDF ]
- Gerstein, M.B.*, Rozowsky, J.*, Yan, K.K.*, Wang, D.*, Cheng, C.*, Brown, J.B.*, Davis, C.A.*, Hillier, L.*, Sisu, C.*, Li, J.J.*, Pei, B.*, Harmanci, A.O.*, Duff, M.O.*, Djebali, S.*, and 82 other authors from the modENCODE consortium (2014). Comparative analysis of the transcriptome across distant species. Nature 512(7515):445-448. [ NIH news ] | [ PDF ]
- Li, J.J., Huang, H., Bickel, P.J., and Brenner, S.E. (2014). Comparison of D. melanogaster and C. elegans developmental stages, tissues, and cells by modENCODE RNA-seq data. Genome Research 24(7):1086-1101. [ Press release ] [ Top 10 papers selected at the 2014 RECOMB/ISCB Conference on Regulatory & Systems Genomics ] [ DATA ] [ SOFTWARE ] | [ PDF ]
Gene regulation
- MacArthur, S.*, Li, X.Y.*, Li, J.*, Brown, J.B., Chu, H.C., Zeng, L., Grondona, B.P., Hechmer, A., Simirenko, L., Keranen, S.V., Knowles, D.W., Stapleton, M., Bickel, P., Biggin, M.D., and Eisen, M.B. (2009). Developmental roles of 21 Drosophila transcription factors are determined by quantitative differences in binding to an overlapping set of thousands of genomic regions. Genome Biology 10:R80. [ Faculty of 1000 recommendation ] | [ PDF ]