My research focuses on developing novel statistical methods for big data, meaning extremely large data sets with complex structures and a large sample size n, a large number of predictors p, or both. High-dimensional inference concerns uncertainty measures for statistical models, including asymptotic convergence, confidence intervals, and hypothesis testing; it poses unique challenges that have drawn substantial research attention in recent years. My work aims to provide accurate and robust point estimation and inference for regression parameters simultaneously when the number of predictors is allowed to be much larger than the sample size. Our methods have been applied with prominent success to cancer studies involving different types of genomic predictors.
I have also pursued a variety of research topics through collaborations. I have worked with scholars in Biomedical Informatics at Northwestern University on microRNA expression data from an air pollution study conducted in Beijing, China. I have also worked on single-cell sequencing data, a classification and clustering problem made challenging by the highly sparse reads from individual cells.
I am experienced in survival analysis through multiple projects involving national-level health data on patients and health centers, for example, patient survival after kidney transplantation and quality evaluation of dialysis facilities nationwide.