publications
2024
- Knowledge-Informed Machine Learning for Cancer Diagnosis and Prognosis: A review. Lingchao Mao, Hairong Wang, Leland S Hu, and 4 more authors. arXiv preprint arXiv:2401.06406, 2024
Cancer remains one of the most challenging diseases to treat in the medical field. Machine learning has enabled in-depth analysis of rich multi-omics profiles and medical imaging for cancer diagnosis and prognosis. Despite these advancements, machine learning models face challenges stemming from limited labeled sample sizes, the intricate interplay of high-dimensionality data types, the inherent heterogeneity observed among patients and within tumors, and concerns about interpretability and consistency with existing biomedical knowledge. One approach to surmount these challenges is to integrate biomedical knowledge into data-driven models, which has proven potential to improve the accuracy, robustness, and interpretability of model results. Here, we review the state-of-the-art machine learning studies that adopted the fusion of biomedical knowledge and data, termed knowledge-informed machine learning, for cancer diagnosis and prognosis. Emphasizing the properties inherent in four primary data types including clinical, imaging, molecular, and treatment data, we highlight modeling considerations relevant to these contexts. We provide an overview of diverse forms of knowledge representation and current strategies of knowledge integration into machine learning pipelines with concrete examples. We conclude the review article by discussing future directions to advance cancer research through knowledge-informed machine learning.
2023
- Revealing the biology behind MRI signatures in high grade glioma. Erika M Lewis, Lingchao Mao, Lujia Wang, and 6 more authors. medRxiv, 2023
Magnetic resonance imaging (MRI) measurements are routinely collected during the treatment of high-grade gliomas (HGGs) to characterize tumor boundaries and guide surgical tumor resection. Using spatially matched MRI and transcriptomics we discovered HGG tumor biology captured by MRI measurements. We strategically overlaid the spatially matched omics characterizations onto a pre-existing transcriptional map of glioblastoma multiforme (GBM) to enhance the robustness of our analyses. We discovered that T1+C measurements, designed to capture vasculature and blood brain barrier (BBB) breakdown and subsequent contrast extravasation, also indirectly reveal immune cell infiltration. The disruption of the vasculature and BBB within the tumor creates a permissive infiltrative environment that enables the transmigration of anti-inflammatory macrophages into tumors. These relationships were validated through histology and enrichment of genes associated with immune cell transmigration and proliferation. Additionally, T2-weighted (T2W) and mean diffusivity (MD) measurements were associated with angiogenesis and validated using histology and enrichment of genes involved in neovascularization. Furthermore, we establish an unbiased approach for identifying additional linkages between MRI measurements and tumor biology in future studies, particularly with the integration of novel MRI techniques. Lastly, we illustrated how noninvasive MRI can be used to map HGG biology spatially across a tumor, and this provides a platform to develop diagnostics, prognostics, or treatment efficacy biomarkers to improve patient outcomes.
- Weakly Supervised Transfer Learning with Application in Precision Medicine. Lingchao Mao, Lujia Wang, Leland Hu, and 10 more authors. IEEE Transactions on Automation Science and Engineering (in press), 2023
Precision medicine aims to provide diagnosis and treatment accounting for individual differences. To develop machine learning models in support of precision medicine, personalized models are expected to have better performance than one-model-fits-all approaches. A significant challenge, however, is the limited number of labeled samples that can be collected from each individual due to practical constraints. Transfer Learning (TL) addresses this challenge by leveraging the information of other patients with the same disease (i.e., the source domain) when building a personalized model for each patient (i.e., the target domain). We propose Weakly-Supervised Transfer Learning (WS-TL) to tackle two challenges that existing TL algorithms do not address well: (i) the target domain has only a few or even no labeled samples; (ii) how to integrate domain knowledge into the TL design. We design a novel mathematical framework of WS-TL to learn a model for the target domain based on paired samples whose order relationships are inferred from domain knowledge, while at the same time integrating labeled samples in the source domain for transfer learning. Also, we propose an efficient active sampling strategy to select informative paired samples. Theoretical properties were investigated. Finally, we present a real-world application in precision medicine of brain cancer, where WS-TL is used to build personalized patient models to predict Tumor Cell Density (TCD) distribution across the brain based on MRI images. WS-TL has the highest accuracy compared to a variety of existing TL algorithms. The predicted TCD map for each patient can help facilitate individually optimized treatment.
- Questionnaire and structural imaging data accurately predict headache improvement in patients with acute post-traumatic headache attributed to mild traumatic brain injury. Lingchao Mao, Jing Li, Todd J Schwedt, and 6 more authors. Cephalalgia, 2023
Our prior work demonstrated that questionnaires assessing psychosocial symptoms have utility for predicting improvement in patients with acute post-traumatic headache following mild traumatic brain injury. In this cohort study, we aimed to determine whether prediction accuracy can be refined by adding structural magnetic resonance imaging (MRI) brain measures to the model. Adults with acute post-traumatic headache (enrolled 0–59 days post-mild traumatic brain injury) underwent T1-weighted brain MRI and completed three questionnaires (Sports Concussion Assessment Tool, Pain Catastrophizing Scale, and the Trait Anxiety Inventory Scale). Individuals with post-traumatic headache completed an electronic headache diary allowing for determination of headache improvement at three- and at six-month follow-up. Questionnaire and MRI measures were used to train prediction models of headache improvement and headache trajectory. Forty-three patients with post-traumatic headache (mean age = 43.0, SD = 12.4; 27 females/16 males) and 61 healthy controls were enrolled (mean age = 39.1, SD = 12.8; 39 females/22 males). The best model achieved cross-validation Area Under the Curve of 0.801 and 0.805 for predicting headache improvement at three and at six months. The top contributing MRI features for the prediction included curvature and thickness of superior, middle, and inferior temporal, fusiform, inferior parietal, and lateral occipital regions. Patients with post-traumatic headache who did not improve by three months had less thickness and higher curvature measures and notably greater baseline differences in brain structure vs. healthy controls (thickness: p < 0.001, curvature: p = 0.012) than those who had headache improvement. A model including clinical questionnaire data and measures of brain structure accurately predicted headache improvement in patients with post-traumatic headache and achieved improvement compared to a model developed using questionnaire data alone.
- A high-dimensional incomplete-modality transfer learning method for early prediction of Alzheimer’s disease. Dohyun Ku, Zhiyang Zheng, Lingchao Mao, and 8 more authors. Alzheimer’s & Dementia, 2023
- Developing multivariable models for predicting headache improvement in patients with acute post-traumatic headache attributed to mild traumatic brain injury: A preliminary study. Lingchao Mao, Gina Dumkrieger, Dohyun Ku, and 5 more authors. Headache: The Journal of Head and Face Pain, 2023
Post‐traumatic headache (PTH) is a common symptom after mild traumatic brain injury (mTBI). Although there have been several studies that have used clinical features of PTH to attempt to predict headache recovery, currently no accurate methods exist for predicting individuals’ improvement from acute PTH. This study investigated the utility of clinical questionnaires for predicting (i) headache improvement at 3 and 6 months, and (ii) headache trajectories over the first 3 months. We conducted a clinic-based observational longitudinal study of patients with acute PTH who completed a battery of clinical questionnaires within 0–59 days post-mTBI. The battery included headache history, symptom evaluation, cognitive tests, psychological tests, and scales assessing photosensitivity, hyperacusis, insomnia, cutaneous allodynia, and substance use. Each participant completed a web-based headache diary, which was used to determine headache improvement. Thirty-seven participants with acute PTH (mean age = 42.7, standard deviation [SD] = 12.0; 25 females/12 males) completed questionnaires at an average of 21.7 (SD = 13.1) days post-mTBI. The classification of headache improvement or non-improvement at 3 and 6 months achieved cross-validation area under the curve (AUC) of 0.72 and 0.84. Sub-models trained using only the top five features still achieved 0.72 and 0.77 AUC. The top five contributing features were from three questionnaires: Pain Catastrophizing Scale total score and helplessness sub-domain score; Sports Concussion Assessment Tool Symptom Evaluation total score and number of symptoms; and the State-Trait Anxiety Inventory score. The functional regression model achieved for modeling headache trajectory over the first 3 months. Questionnaires completed following mTBI have good utility for predicting headache improvement at 3 and 6 months in the future as well as the evolving headache trajectory. 
Reducing the battery to only three questionnaires, which assess post-concussive symptom load and biopsychosocialecologic factors, was helpful to determine a reasonable prediction accuracy for headache improvement.
- A 4D Theoretical Framework for Measuring Topic-Specific Influence on Twitter: Development and Usability Study on Dietary Sodium Tweets. Lingchao Mao, Emily Chu, Jinghong Gu, and 3 more authors. Journal of Medical Internet Research, 2023
Background: Social media has emerged as a prominent approach for health education and promotion. However, it is challenging to understand how to best promote health-related information on social media platforms such as Twitter. Despite commercial tools and prior studies attempting to analyze influence, there is a gap to fill in developing a publicly accessible and consolidated framework to measure influence and analyze dissemination strategies. Objective: We aimed to develop a theoretical framework to measure topic-specific user influence on Twitter and to examine its usability by analyzing dietary sodium tweets to support public health agencies in improving their dissemination strategies. Methods: We designed a consolidated framework for measuring influence that can capture topic-specific tweeting behaviors. The core of the framework is a summary indicator of influence decomposable into 4 dimensions: activity, priority, originality, and popularity. These measures can be easily visualized and efficiently computed for any Twitter account without the need for private access. We demonstrated the proposed methods by using a case study on dietary sodium tweets with sampled stakeholders and then compared the framework with a traditional measure of influence. Results: More than half a million dietary sodium tweets from 2006 to 2022 were retrieved for 16 US domestic and international stakeholders in 4 categories, that is, public agencies, academic institutions, professional associations, and experts. We discovered that World Health Organization, American Heart Association, Food and Agriculture Organization of the United Nations (UN-FAO), and World Action on Salt (WASH) were the top 4 sodium influencers in the sample. Each had different strengths and weaknesses in their dissemination strategies, and 2 stakeholders with similar overall influence, that is, UN-FAO and WASH, could have significantly different tweeting patterns. 
In addition, we identified exemplars in each dimension of influence. Regarding tweeting activity, a dedicated expert published more sodium tweets than any organization in the sample in the past 16 years. In terms of priority, WASH had more than half of its tweets dedicated to sodium. UN-FAO had both the highest proportion of original sodium tweets and posted the most popular sodium tweets among all sampled stakeholders. Regardless of excellence in 1 dimension, the 4 most influential stakeholders excelled in at least 2 out of 4 dimensions of influence. Conclusions: Our findings demonstrate that our method not only aligned with a traditional measure of influence but also advanced influence analysis by analyzing the 4 dimensions that contribute to topic-specific influence. This consolidated framework provides quantifiable measures for public health entities to understand their bottleneck of influence and refine their social media campaign strategies. Our framework can be applied to improve the dissemination of other health topics as well as assist policy makers and public campaign experts to maximize population impact.
2022
- Personalized Predictions for Unplanned Urinary Tract Infection Hospitalizations with Hierarchical Clustering. Lingchao Mao, Kimia Vahdat, Sara Shashaani, and 1 more author. In AI and Analytics for Public Health: Proceedings of the 2020 INFORMS International Conference on Service Science, 2022
Urinary Tract Infection (UTI) is one of the most frequent and preventable healthcare-associated infections in the US and an important cause of morbidity and excess healthcare costs. This study aims to predict the 30-day risk of a beneficiary for unplanned hospitalization for UTI. Using 2008–12 Medicare fee-for-service claims and several public sources, we extracted 784 features, including patient demographics, clinical conditions, healthcare utilization, provider quality metrics, and community safety indicators. To address the challenge of high heterogeneity and imbalance in data, we propose a hierarchical clustering approach that leverages existing knowledge and data-driven algorithms to partition the population into groups of similar risk, followed by building a LASSO-Logistic Regression (LLR) model for each group. Our prediction models are trained on 237,675 2011 Medicare beneficiaries and tested on 230,042 2012 Medicare beneficiaries. We compare the clustering-based approach to a baseline LLR model using five performance metrics, including the area under the curve (AUC), the True Positive Rate (TPR), and the False Positive Rate (FPR). Results show that the hierarchical clustering approach achieves more accurate and precise predictions (AUC 0.72) than the benchmark model and offers more granular feature importance insights for each patient group.