publications
2024
- Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review. Lingchao Mao, Hairong Wang, Leland S Hu, and 4 more authors. arXiv preprint arXiv:2401.06406, 2024
Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning has enabled in-depth analysis of rich multi-omics profiles and medical imaging for cancer diagnosis and prognosis. Despite these advancements, machine learning models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensionality data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to surmount these challenges is to integrate biomedical knowledge into data-driven models, which has proven potential to improve the accuracy, robustness, and interpretability of model results. Here, we review the state-of-the-art machine learning studies that adopted the fusion of biomedical knowledge and data, termed knowledge-informed machine learning, for cancer diagnosis and prognosis. Emphasizing the properties inherent in four primary data types including clinical, imaging, molecular, and treatment data, we highlight modeling considerations relevant to these contexts. We provide an overview of diverse forms of knowledge representation and current strategies of knowledge integration into machine learning pipelines with concrete examples. We conclude the review article by discussing future directions to advance cancer research through knowledge-informed machine learning.
- A Holistic Weakly Supervised Approach for Liver Tumor Segmentation. Hairong Wang, Lingchao Mao, Zihan Zhang, and 1 more author. arXiv preprint arXiv:2410.10005, 2024
Liver cancer is a leading cause of mortality worldwide, and accurate CT-based tumor segmentation is essential for diagnosis and treatment. Manual delineation is time-intensive and prone to variability, which highlights the need for reliable automation. While deep learning has shown promise for automated liver segmentation, precise liver tumor segmentation remains challenging due to the heterogeneous nature of tumors, imprecise tumor margins, and limited labeled data. We present a novel holistic weakly supervised framework that integrates clinical knowledge to address these challenges with (1) a knowledge-informed label smoothing technique that leverages clinical data to generate smooth labels, which regularizes model training, reducing the risk of overfitting and enhancing model performance; (2) a global and local-view segmentation framework that breaks the task down into two simpler sub-tasks, allowing optimized preprocessing and training for each; and (3) pre- and post-processing pipelines customized to the challenges of each subtask, which enhance tumor visibility and refine tumor boundaries. We evaluated the proposed method on the HCC-TACE-Seg dataset and showed that these three key components contribute complementarily to the improved performance. Lastly, we prototyped a tool for automated liver tumor segmentation and diagnosis summary generation called MedAssistLiver. The app and code are published at https://github.com/lingchm/medassistliver-cancer.
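As a rough illustration of the label smoothing idea mentioned in this abstract, the sketch below implements standard uniform label smoothing. It is not the paper's knowledge-informed variant: the abstract does not say how clinical data shapes the smooth labels, so the fixed `epsilon` parameter and the function names here are assumptions made for illustration only.

```python
import math


def smooth_labels(hard_label, num_classes, epsilon=0.1):
    """Uniform label smoothing (generic technique, not the paper's variant).

    Moves a fraction `epsilon` of the probability mass off the hard label
    and spreads it evenly over all classes.
    """
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    soft = [epsilon / num_classes] * num_classes
    soft[hard_label] += 1.0 - epsilon
    return soft


def cross_entropy(soft_target, predicted_probs):
    """Cross-entropy between a soft target and predicted class probabilities."""
    return -sum(
        t * math.log(p) for t, p in zip(soft_target, predicted_probs) if t > 0
    )
```

Training against `smooth_labels(...)` instead of one-hot targets penalizes over-confident predictions, which is the regularizing effect the abstract refers to.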
- A Cross-Modal Mutual Knowledge Distillation Framework for Alzheimer’s Disease: Addressing Incomplete Modalities. Mingu Kwak, Lingchao Mao, Zhiyang Zheng, and 3 more authors. IEEE Transactions on Automation Science and Engineering (major revision), 2024
Early detection of Alzheimer’s Disease (AD) is crucial for timely interventions and optimizing treatment outcomes. Despite the promise of integrating multimodal neuroimages such as MRI and PET, handling datasets with incomplete modalities remains under-researched. This phenomenon, however, is common in real-world scenarios as not every patient has all modalities due to practical constraints such as cost, access, and safety concerns. We propose a deep learning framework employing cross-modal Mutual Knowledge Distillation (MKD) to model different sub-cohorts of patients based on their available modalities. In MKD, the multimodal model (e.g., MRI and PET) serves as a teacher, while the single-modality model (e.g., MRI only) is the student. Our MKD framework features three components: a Modality-Disentangling Teacher (MDT) model designed through information disentanglement, a student model that learns from classification errors and MDT’s knowledge, and the teacher model enhanced via distilling the student’s single-modal feature extraction capabilities. Moreover, we show the effectiveness of the proposed method through theoretical analysis and validate its performance with simulation studies. In addition, our method is demonstrated through a case study with Alzheimer’s Disease Neuroimaging Initiative (ADNI) datasets, underscoring the potential of artificial intelligence in addressing incomplete multimodal neuroimaging datasets and advancing early AD detection.
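The teacher-student idea in this abstract can be illustrated with a generic knowledge distillation loss. This is the textbook temperature-scaled formulation, not the MKD objective from the paper: it covers only the teacher-to-student direction, and `alpha`, `temperature`, and all function names are assumptions chosen for the sketch.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_label,
                      alpha=0.5, temperature=2.0):
    """Weighted sum of a hard-label loss and a soft teacher-matching loss."""
    # Hard part: cross-entropy of the student against the true label.
    hard_loss = -math.log(softmax(student_logits)[true_label])
    # Soft part: the student matches the teacher's softened distribution.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2
    return alpha * hard_loss + (1.0 - alpha) * soft_loss
```

In the setting the abstract describes, the teacher logits would come from the multimodal model and the student logits from the single-modality model; the reverse (student-to-teacher) distillation step is not shown here.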
- Supervised Multi-Modal Fission Learning. Lingchao Mao, Qi Wang, Yi Su, and 2 more authors. arXiv preprint arXiv:2409.20559, 2024
Learning from multi-modal datasets can leverage complementary information and lead to improved performance for prediction tasks. To account for feature correlations in high-dimensional datasets, a commonly used strategy is the latent variable approach. Several latent variable methods have been proposed in the literature for multi-modal datasets; however, these methods focus on either extracting the shared component across all modalities or extracting a shared component and individual components specific to each modality. To address this gap, we propose a Multi-Modal Fission Learning (MMFL) model that simultaneously identifies globally joint, partially joint, and individual components underlying the features of multi-modal datasets. Unlike existing latent variable methods, MMFL uses supervision from labels to identify predictive latent components and has a natural extension to incorporate incomplete multi-modal data. In our simulation studies, MMFL outperformed a variety of existing multi-modal algorithms under both complete modality and incomplete modality settings. We applied MMFL to a real-world case study for early prediction of Alzheimer’s Disease using multi-modal neuroimaging (MRI and PET) and genetic data (SNP) from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset. MMFL provided more accurate predictions and better insights for understanding within- and across-modality correlations compared to existing methods.
- Identifying and Predicting Headache Trajectories Amongst Those with Acute Post-Traumatic Headache. Lingchao Mao, Jing Li, Todd Schwedt, and 5 more authors. Headache: The Journal of Head and Face Pain (under review), 2024
Background: Post-traumatic headache (PTH) is a common symptom following mild traumatic brain injury (mTBI). Currently, there is no identified way to accurately predict if, when, and at what pace a person will have PTH improvement. In our prior studies, we focused on predicting headache improvement at three months post-mTBI. However, that approach may overlook individual differences in how headaches evolve over time. Objective: This study aims to identify individual subgroups based on their headache trajectories and to develop machine learning (ML) models for early prediction of headache evolution. Methods: Participants with acute PTH completed a daily electronic headache diary (eDIARY) over three months, recording their headache-related symptoms. Tensor decomposition was utilized to extract latent factors underlying the time-varying symptoms. We applied clustering techniques on the latent factors to identify patient subgroups with varying headache improvement trajectories. Next, we developed an ML method to classify each individual into a headache trajectory subgroup as early as possible within the three-month interval. Results: 73 individuals with acute PTH (mean age=44.8, SD=14.0; 50 females/23 males) were enrolled between 0-59 days post-mTBI. Data from 54 individuals were used as the developmental cohort for model training and 19 individuals were used as the test cohort for model evaluation. Tensor decomposition extracted two latent factors: one factor representing the ‘overall state’ of PTH severity and disability and the other representing the ‘improvement state’ of these symptoms over three months. 
Clustering identified four patient subgroups with distinct headache evolution trajectories: 1) severe symptoms without improvement, 2) severe symptoms with mild improvement, 3) milder symptoms with substantial improvement, 4) mildest symptoms with minimal improvement. The proposed ML model achieved 0.80 cross-validation accuracy in classifying individuals with PTH into subgroups for the training cohort and 0.84 accuracy for the test cohort. Notably, the model required only the first two weeks of headache data to accurately identify the subgroup with the mildest headaches, three additional weeks to identify the subgroup with the most severe headaches and no improvement in three months, and two additional weeks to distinguish the remaining subgroups. Conclusions: This study identified subgroups of individuals with acute PTH with distinct headache improvement trajectories. The proposed ML method accurately classified individuals into these subgroups using the minimally necessary early headache data for each person, including detecting the subgroup with the mildest headaches at two weeks. This approach could offer an estimated forecast of headache burden over time and could assist clinicians with determining treatment needs and eligibility for PTH clinical trials.
- Revealing the biology behind MRI signatures in high grade glioma. Erika M Lewis, Lingchao Mao, Lujia Wang, and 6 more authors. medRxiv, 2024
Magnetic resonance imaging (MRI) measurements are routinely collected during the treatment of high-grade gliomas (HGGs) to characterize tumor boundaries and guide surgical tumor resection. Using spatially matched MRI and transcriptomics we discovered HGG tumor biology captured by MRI measurements. We strategically overlaid the spatially matched omics characterizations onto a pre-existing transcriptional map of glioblastoma multiforme (GBM) to enhance the robustness of our analyses. We discovered that T1+C measurements, designed to capture vasculature and blood brain barrier (BBB) breakdown and subsequent contrast extravasation, also indirectly reveal immune cell infiltration. The disruption of the vasculature and BBB within the tumor creates a permissive infiltrative environment that enables the transmigration of anti-inflammatory macrophages into tumors. These relationships were validated through histology and enrichment of genes associated with immune cell transmigration and proliferation. Additionally, T2-weighted (T2W) and mean diffusivity (MD) measurements were associated with angiogenesis and validated using histology and enrichment of genes involved in neovascularization. Furthermore, we establish an unbiased approach for identifying additional linkages between MRI measurements and tumor biology in future studies, particularly with the integration of novel MRI techniques. Lastly, we illustrated how noninvasive MRI can be used to map HGG biology spatially across a tumor, and this provides a platform to develop diagnostics, prognostics, or treatment efficacy biomarkers to improve patient outcomes.
- Tracking Influencers in Policy Field on Social Media: A Global Longitudinal Study of Dietary Sodium Reduction Posts, 2006-2022. Alana Montoya*, Lingchao Mao*, Adam Drewnowski, and 5 more authors. Journal of Medical Internet Research, 2024
Background: Excessive sodium intake is a major concern for global public health. Despite multiple dietary guidelines, population sodium intakes are above recommended levels. Lack of health literacy could be one contributing issue and contemporary health literacy is largely shaped by social media. Objective: We aim to quantify the posting behaviors and influence patterns on dietary sodium-related content by influencers in policy field on X (formerly Twitter) across time. Methods: We first identified X users with a scope of work related to dietary sodium and retrieved their posts (formerly Tweets) from 2006 to 2022. Users were categorized into the policy groups of outer-setting organization, inner-setting organization, or individual, based on their role in the conceptual policy field. Network analysis was used to analyze interactions among users and identify the top influencers in each policy group. A four-dimensional influence framework was applied to measure the overall influence, activity, priority, originality, and popularity scores. These measures were used to reveal the user-level, group-level, and temporal patterns of sodium-related influence. Results: We identified 78 users with content related to dietary sodium, with 1,099,605 posts in total and 14,732 dietary sodium posts. There was an increasing volume of sodium posts from 2010 to 2015; however, the trend has been decreasing since 2016, especially among outer-setting organizations. The top influencers from the three policy groups were the World Health Organization (WHO), American Heart Association (AHA), and Tom Frieden, with a total public engagement of 55,593, 26,395, and 12,672, respectively. 
The WHO and Simon Capewell ranked the highest in activity; the World Action on Salt, Sugar and Health (WASSH) and Action on Salt had the highest priority for dietary sodium content; General Mills, Tom Frieden, and Dariush Mozaffarian had the highest originality; WHO, Tom Frieden, and Harvard University School of Medicine received the highest popularity. The top influencers frequently interacted among themselves, especially those with similar initiatives and partnerships. Outer-setting organizations tend to interact with more users in the network compared to inner-setting organizations and individuals, while inner-setting organizations tend to receive more interactions from other users in the network than the other two groups. Monthly patterns showed a significant peak in the number of sodium posts in March compared with other months. Conclusion: Despite the increased usage of social media, recent trends of sodium intake education on social media are decreasing and the priority of sodium among other topics is low. To improve policy implementation effectiveness and meet recommended dietary targets, there is an increasing need for health leaders to consistently and collectively advocate for sodium intake reduction on social media.
2023
- Weakly Supervised Transfer Learning with Application in Precision Medicine. Lingchao Mao, Lujia Wang, Leland Hu, and 10 more authors. IEEE Transactions on Automation Science and Engineering, 2023
Precision medicine aims to provide diagnosis and treatment accounting for individual differences. To develop machine learning models in support of precision medicine, personalized models are expected to have better performance than one-model-fits-all approaches. A significant challenge, however, is the limited number of labeled samples that can be collected from each individual due to practical constraints. Transfer Learning (TL) addresses this challenge by leveraging the information of other patients with the same disease (i.e., the source domain) when building a personalized model for each patient (i.e., the target domain). We propose Weakly-Supervised Transfer Learning (WS-TL) to tackle two challenges that existing TL algorithms do not address well: (i) the target domain has only a few or even no labeled samples; (ii) how to integrate domain knowledge into the TL design. We design a novel mathematical framework of WS-TL to learn a model for the target domain based on paired samples whose order relationships are inferred from domain knowledge, while at the same time integrating labeled samples in the source domain for transfer learning. Also, we propose an efficient active sampling strategy to select informative paired samples. Theoretical properties were investigated. Finally, we present a real-world application in precision medicine of brain cancer, where WS-TL is used to build personalized patient models to predict Tumor Cell Density (TCD) distribution across the brain based on MRI images. WS-TL has the highest accuracy compared to a variety of existing TL algorithms. The predicted TCD map for each patient can help facilitate individually optimized treatment.
- Questionnaire and structural imaging data accurately predict headache improvement in patients with acute post-traumatic headache attributed to mild traumatic brain injury. Lingchao Mao, Jing Li, Todd J Schwedt, and 6 more authors. Cephalalgia, 2023
Our prior work demonstrated that questionnaires assessing psychosocial symptoms have utility for predicting improvement in patients with acute post-traumatic headache following mild traumatic brain injury. In this cohort study, we aimed to determine whether prediction accuracy can be refined by adding structural magnetic resonance imaging (MRI) brain measures to the model. Adults with acute post-traumatic headache (enrolled 0–59 days post-mild traumatic brain injury) underwent T1-weighted brain MRI and completed three questionnaires (Sports Concussion Assessment Tool, Pain Catastrophizing Scale, and the Trait Anxiety Inventory Scale). Individuals with post-traumatic headache completed an electronic headache diary allowing for determination of headache improvement at three- and at six-month follow-up. Questionnaire and MRI measures were used to train prediction models of headache improvement and headache trajectory. Forty-three patients with post-traumatic headache (mean age = 43.0, SD = 12.4; 27 females/16 males) and 61 healthy controls were enrolled (mean age = 39.1, SD = 12.8; 39 females/22 males). The best model achieved cross-validation Area Under the Curve of 0.801 and 0.805 for predicting headache improvement at three and at six months. The top contributing MRI features for the prediction included curvature and thickness of superior, middle, and inferior temporal, fusiform, inferior parietal, and lateral occipital regions. Patients with post-traumatic headache who did not improve by three months had less thickness and higher curvature measures and notably greater baseline differences in brain structure vs. healthy controls (thickness: p < 0.001, curvature: p = 0.012) than those who had headache improvement. A model including clinical questionnaire data and measures of brain structure accurately predicted headache improvement in patients with post-traumatic headache and achieved improvement compared to a model developed using questionnaire data alone.
- A high-dimensional incomplete-modality transfer learning method for early prediction of Alzheimer’s disease. Dohyun Ku, Zhiyang Zheng, Lingchao Mao, and 8 more authors. Alzheimer’s & Dementia, 2023
Prediction of Alzheimer’s disease (AD) risk for individuals with mild cognitive impairment (MCI) provides an opportunity for early intervention. Neuroimaging of different types/modalities has shown promise, but not every patient has all the modalities due to the cost and accessibility constraints. To integrate incomplete multi-modality datasets, we previously developed a machine learning (ML) model called incomplete-modality transfer learning (IMTL). We extended the capacity of IMTL to handle high-dimensional feature sets, namely, HD-IMTL, to further improve accuracy and robustness. Our dataset included 1319 T1-MRI scans from MCI patients in ADNI; among them, 1002 had FDG-PET and 612 had amyloid-PET. 156 regional volumetric and thickness features were computed from MRI and 83 and 83 regional SUVR features from FDG-PET and amyloid-PET, respectively. The dataset is randomly split into training and test sets. The goal of HD-IMTL was to jointly train 4 ML models to predict MCI conversion to AD in 36 months, with each model based on a certain combination of available modalities, namely, MRI, MRI+FDG, MRI+amyloid, and MRI+FDG+amyloid. These correspond to patient sub-cohorts that differ in their access to imaging modalities. To handle high-dimensional features, we employed feature screening to remove uninformative features, performed modality-wise partial least squares (PLS) to condense remaining features into PLS components, and used correlation tests to select components. To jointly train the 4 ML prediction models, IMTL was used, which is a generative model that uses expectation-maximization (EM) in joint parameter estimation to facilitate transfer learning. To account for sample imbalance in training, the Synthetic Minority Over-sampling Technique (SMOTE) was used. The trained models were applied to the test set. 20 training/test splits were repeated and AUCs on the test set were averaged. 
For comparison, three existing ML models for incomplete-modality fusion were applied to the same dataset. The AUCs by HD-IMTL were 0.802, 0.840, 0.868, and 0.880 for sub-cohorts with MRI, MRI+FDG, MRI+amyloid, and MRI+FDG+amyloid, respectively. The AUCs by existing methods were lower, with ranges of 0.749-0.793, 0.769-0.826, 0.816-0.863, and 0.832-0.868.
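The evaluation protocol described in this abstract (AUC on the test set, averaged over repeated training/test splits) can be sketched as follows. This is a minimal pure-Python illustration using the pairwise definition of AUC; the function names and the input layout are assumptions, and it does not reproduce the HD-IMTL model itself.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auc_over_splits(split_results):
    """Average test-set AUC over repeated splits.

    `split_results` is a list of (scores, labels) pairs, one per split.
    """
    aucs = [auc(scores, labels) for scores, labels in split_results]
    return sum(aucs) / len(aucs)
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch but would usually be replaced by a rank-based computation on larger test sets.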
- Developing multivariable models for predicting headache improvement in patients with acute post-traumatic headache attributed to mild traumatic brain injury: A preliminary study. Lingchao Mao, Gina Dumkrieger, Dohyun Ku, and 5 more authors. Headache: The Journal of Head and Face Pain, 2023
Post‐traumatic headache (PTH) is a common symptom after mild traumatic brain injury (mTBI). Although there have been several studies that have used clinical features of PTH to attempt to predict headache recovery, currently no accurate methods exist for predicting individuals’ improvement from acute PTH. This study investigated the utility of clinical questionnaires for predicting (i) headache improvement at 3 and 6 months, and (ii) headache trajectories over the first 3 months. We conducted a clinic-based observational longitudinal study of patients with acute PTH who completed a battery of clinical questionnaires within 0–59 days post-mTBI. The battery included headache history, symptom evaluation, cognitive tests, psychological tests, and scales assessing photosensitivity, hyperacusis, insomnia, cutaneous allodynia, and substance use. Each participant completed a web-based headache diary, which was used to determine headache improvement. Thirty-seven participants with acute PTH (mean age = 42.7, standard deviation [SD] = 12.0; 25 females/12 males) completed questionnaires at an average of 21.7 (SD = 13.1) days post-mTBI. The classification of headache improvement or non-improvement at 3 and 6 months achieved cross-validation area under the curve (AUC) of 0.72 and 0.84. Sub-models trained using only the top five features still achieved 0.72 and 0.77 AUC. The top five contributing features were from three questionnaires: Pain Catastrophizing Scale total score and helplessness sub-domain score; Sports Concussion Assessment Tool Symptom Evaluation total score and number of symptoms; and the State-Trait Anxiety Inventory score. The functional regression model achieved for modeling headache trajectory over the first 3 months. Questionnaires completed following mTBI have good utility for predicting headache improvement at 3 and 6 months in the future as well as the evolving headache trajectory. 
Reducing the battery to only three questionnaires, which assess post-concussive symptom load and biopsychosocialecologic factors, was helpful to determine a reasonable prediction accuracy for headache improvement.
- A 4D Theoretical Framework for Measuring Topic-Specific Influence on Twitter: Development and Usability Study on Dietary Sodium Tweets. Lingchao Mao, Emily Chu, Jinghong Gu, and 3 more authors. Journal of Medical Internet Research, 2023
Background: Social media has emerged as a prominent approach for health education and promotion. However, it is challenging to understand how to best promote health-related information on social media platforms such as Twitter. Despite commercial tools and prior studies attempting to analyze influence, there is a gap to fill in developing a publicly accessible and consolidated framework to measure influence and analyze dissemination strategies. Objective: We aimed to develop a theoretical framework to measure topic-specific user influence on Twitter and to examine its usability by analyzing dietary sodium tweets to support public health agencies in improving their dissemination strategies. Methods: We designed a consolidated framework for measuring influence that can capture topic-specific tweeting behaviors. The core of the framework is a summary indicator of influence decomposable into 4 dimensions: activity, priority, originality, and popularity. These measures can be easily visualized and efficiently computed for any Twitter account without the need for private access. We demonstrated the proposed methods by using a case study on dietary sodium tweets with sampled stakeholders and then compared the framework with a traditional measure of influence. Results: More than half a million dietary sodium tweets from 2006 to 2022 were retrieved for 16 US domestic and international stakeholders in 4 categories, that is, public agencies, academic institutions, professional associations, and experts. We discovered that World Health Organization, American Heart Association, Food and Agriculture Organization of the United Nations (UN-FAO), and World Action on Salt (WASH) were the top 4 sodium influencers in the sample. Each had different strengths and weaknesses in their dissemination strategies, and 2 stakeholders with similar overall influence, that is, UN-FAO and WASH, could have significantly different tweeting patterns. 
In addition, we identified exemplars in each dimension of influence. Regarding tweeting activity, a dedicated expert published more sodium tweets than any organization in the sample in the past 16 years. In terms of priority, WASH had more than half of its tweets dedicated to sodium. UN-FAO had both the highest proportion of original sodium tweets and posted the most popular sodium tweets among all sampled stakeholders. Regardless of excellence in 1 dimension, the 4 most influential stakeholders excelled in at least 2 out of 4 dimensions of influence. Conclusions: Our findings demonstrate that our method not only aligned with a traditional measure of influence but also advanced influence analysis by analyzing the 4 dimensions that contribute to topic-specific influence. This consolidated framework provides quantifiable measures for public health entities to understand their bottleneck of influence and refine their social media campaign strategies. Our framework can be applied to improve the dissemination of other health topics as well as assist policy makers and public campaign experts to maximize population impact.
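The four dimensions named in this abstract can be sketched for a single account as below. The abstract describes priority as the share of an account's tweets devoted to the topic and originality as the proportion of original topic tweets; the exact formulas, the choice of mean engagement for popularity, and the `Post` record are assumptions made for illustration. How the paper combines the four dimensions into its summary indicator is not stated here, so that step is left out.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """Hypothetical record for one post by an account."""
    on_topic: bool      # e.g. mentions dietary sodium
    is_original: bool   # authored by the account, not a repost
    engagement: int     # e.g. likes plus reposts plus replies


def influence_dimensions(posts):
    """Compute activity, priority, originality, and popularity for one account."""
    topic_posts = [p for p in posts if p.on_topic]
    n_topic = len(topic_posts)
    return {
        # Activity: how many on-topic posts the account published.
        "activity": n_topic,
        # Priority: share of all posts that are on topic.
        "priority": n_topic / len(posts) if posts else 0.0,
        # Originality: share of on-topic posts that are original.
        "originality": (
            sum(p.is_original for p in topic_posts) / n_topic if n_topic else 0.0
        ),
        # Popularity (assumed definition): mean engagement per on-topic post.
        "popularity": (
            sum(p.engagement for p in topic_posts) / n_topic if n_topic else 0.0
        ),
    }
```

Keeping the dimensions separate makes it possible to see, for example, that two accounts with similar overall reach differ in priority or originality, which is the kind of comparison the abstract reports.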
2022
- Personalized Predictions for Unplanned Urinary Tract Infection Hospitalizations with Hierarchical Clustering. Lingchao Mao, Kimia Vahdat, Sara Shashaani, and 1 more author. In AI and Analytics for Public Health: Proceedings of the 2020 INFORMS International Conference on Service Science, 2022
Urinary Tract Infection (UTI) is one of the most frequent and preventable healthcare-associated infections in the US and an important cause of morbidity and excess healthcare costs. This study aims to predict the 30-day risk of a beneficiary for unplanned hospitalization for UTI. Using 2008–12 Medicare fee-for-service claims and several public sources, we extracted 784 features, including patient demographics, clinical conditions, healthcare utilization, provider quality metrics, and community safety indicators. To address the challenge of high heterogeneity and imbalance in data, we propose a hierarchical clustering approach that leverages existing knowledge and data-driven algorithms to partition the population into groups of similar risk, followed by building a LASSO-Logistic Regression (LLR) model for each group. Our prediction models are trained on 237,675 2011 Medicare beneficiaries and tested on 230,042 2012 Medicare beneficiaries. We compare the clustering-based approach to a baseline LLR model using five performance metrics, including the area under the curve (AUC), the True Positive Rate (TPR), and the False Positive Rate (FPR). Results show that the hierarchical clustering approach achieves more accurate and precise predictions (AUC 0.72) than the benchmark model and offers more granular feature importance insights for each patient group.
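Two of the metrics named in this abstract, TPR and FPR, can be computed per patient group as in the sketch below. It assumes group assignments and binary predictions are already available; the clustering step and the LLR models are not shown, and the function name and record layout are hypothetical.

```python
from collections import defaultdict


def rates_by_group(records):
    """Compute TPR and FPR separately for each group.

    `records` is an iterable of (group, y_true, y_pred) with binary 0/1 labels.
    A rate is None when the group has no samples of the relevant class.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            key = "tp" if y_pred == 1 else "fn"
        else:
            key = "fp" if y_pred == 1 else "tn"
        counts[group][key] += 1

    result = {}
    for group, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        result[group] = {
            "tpr": c["tp"] / positives if positives else None,
            "fpr": c["fp"] / negatives if negatives else None,
        }
    return result
```

Reporting the rates per group rather than pooled is one way to check whether a partitioned model behaves differently across risk groups in imbalanced data.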