Research
I develop statistical machine learning and deep learning methodologies for modeling complex datasets with high dimensionality, multi-modality, and limited supervision. Most of these methods are motivated by biomedical applications but are generalizable to other application domains.
My research has three main directions:
- Machine learning with limited supervision and knowledge integration
- Disentanglement and fusion of multi-modal/high-dimensional datasets
- Data mining and subgroup identification
1. Machine Learning with Limited Supervision and Knowledge Integration
Labeled data is often scarce in biomedical applications, which raises the question of how to learn with limited supervision. One common strategy is weakly supervised learning, where models are trained with incomplete or noisy labels. Another is to integrate domain knowledge, improving model performance by incorporating expert insights or external data sources into the learning process.
2. Disentanglement and Fusion of Multi-modal/High-dimensional Datasets
With advances in technology, high-dimensional and multi-source data are increasingly collected for biomedical applications, including imaging, genomics, clinical questionnaires, and molecular dynamics (MD) simulations. Learning from multi-modal datasets can leverage complementary information across modalities and improve performance on prediction tasks.
Analyzing these datasets presents interesting challenges. First, precise labels are rarely available in biomedical contexts. Second, some datasets are missing modalities for a portion of the samples; for instance, not every patient has every imaging modality collected, due to accessibility or financial constraints. Third, complex datasets often contain a mix of signals influenced by environmental constraints, which obscures the true patterns of interest to researchers. Models must therefore disentangle the various sources of signal and fuse information across these datasets for predictive modeling and knowledge discovery.
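As a minimal illustration of the missing-modality setting described above, the sketch below fuses per-modality embeddings by averaging only the modalities that were actually observed for a sample. It is a generic baseline chosen for illustration, not the specific method developed in this research. The function name `fuse_available`, the two-dimensional embeddings, and the example patient are all hypothetical.

```python
from typing import List, Optional, Sequence

Embedding = Sequence[float]


def fuse_available(modalities: Sequence[Optional[Embedding]]) -> List[float]:
    """Average the embeddings that are present; a missing modality is None."""
    present = [m for m in modalities if m is not None]
    if not present:
        raise ValueError("at least one modality must be observed")
    dim = len(present[0])
    if any(len(m) != dim for m in present):
        raise ValueError("all embeddings must share the same dimension")
    # Element-wise mean over the observed modalities only.
    return [sum(m[i] for m in present) / len(present) for i in range(dim)]


# Hypothetical patient: MRI and genomics embeddings exist, the PET scan is missing.
mri = [1.0, 3.0]
pet = None
genomics = [3.0, 5.0]
fused = fuse_available([mri, pet, genomics])
```

Averaging over observed modalities keeps the fused representation on the same scale regardless of how many modalities a sample has, which is one simple way to avoid discarding patients with incomplete data.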
3. Data Mining and Predictive Modeling
Data mining is a powerful technique for knowledge discovery and information analysis in large, complex datasets. For example, we analyzed millions of tweets to identify influencers in a social network. In healthcare, we mined large-scale medical claims data to discover groups of patients with similar risk of hospital readmission.