Search | VHL Regional Portal

1.

BarlowTwins-CXR: enhancing chest X-ray abnormality localization in heterogeneous data with cross-domain self-supervised learning.

Sheng, Haoyue; Ma, Linrui; Samson, Jean-François; Liu, Dianbo.

BMC Med Inform Decis Mak ; 24(1): 126, 2024 May 16.

Article in English | MEDLINE | ID: mdl-38755563

ABSTRACT

BACKGROUND: Chest X-ray imaging-based abnormality localization, essential in diagnosing various diseases, faces significant clinical challenges due to complex interpretations and the growing workload of radiologists. While recent advances in deep learning offer promising solutions, there is still a critical issue of domain inconsistency in cross-domain transfer learning, which hampers the efficiency and accuracy of diagnostic processes. This study aims to address the domain inconsistency problem and improve automatic abnormality localization performance of heterogeneous chest X-ray image analysis, particularly in detecting abnormalities, by developing a self-supervised learning strategy called "BarlowTwins-CXR". METHODS: We utilized two publicly available datasets: the NIH Chest X-ray Dataset and the VinDr-CXR. The BarlowTwins-CXR approach was conducted in a two-stage training process. Initially, self-supervised pre-training was performed using an adjusted Barlow Twins algorithm on the NIH dataset with a ResNet50 backbone pre-trained on ImageNet. This was followed by supervised fine-tuning on the VinDr-CXR dataset using Faster R-CNN with Feature Pyramid Network (FPN). The study employed mean Average Precision (mAP) at an Intersection over Union (IoU) of 50% and Area Under the Curve (AUC) for performance evaluation. RESULTS: Our experiments showed a significant improvement in model performance with BarlowTwins-CXR. The approach achieved a 3% increase in mAP50 accuracy compared to traditional ImageNet pre-trained models. In addition, the Ablation CAM method revealed enhanced precision in localizing chest abnormalities. The study involved 112,120 images from the NIH dataset and 18,000 images from the VinDr-CXR dataset, indicating robust training and testing samples. CONCLUSION: BarlowTwins-CXR significantly enhances the efficiency and accuracy of chest X-ray image-based abnormality localization, outperforming traditional transfer learning methods and effectively overcoming domain inconsistency in cross-domain scenarios. Our experiment results demonstrate the potential of using self-supervised learning to improve the generalizability of models in medical settings with limited amounts of heterogeneous data. This approach can be instrumental in aiding radiologists, particularly in high-workload environments, offering a promising direction for future AI-driven healthcare solutions.
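The abstract above evaluates localization with mAP at an IoU threshold of 50%. As a rough illustration of that matching criterion only (not the authors' evaluation code), the sketch below computes IoU for two axis-aligned boxes. The function names and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x_min, y_min, x_max, y_max).

    The box layout is an assumption for this sketch, not taken from the paper.
    """
    # Corners of the overlapping rectangle (if any).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred_box, true_box, threshold=0.5):
    """Hypothetical helper: a prediction counts as a hit when IoU >= threshold."""
    return iou(pred_box, true_box) >= threshold
```

Under this criterion, a predicted box covering three quarters of the union with the ground-truth box would count as a correct localization, while a box overlapping only one third would not.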

Subjects

Radiography, Thoracic; Supervised Machine Learning; Humans; Deep Learning; Radiographic Image Interpretation, Computer-Assisted/methods; Datasets as Topic

2.

Natural Language Processing Methods to Empirically Explore Social Contexts and Needs in Cancer Patient Notes.

Derton, Abigail; Guevara, Marco; Chen, Shan; Moningi, Shalini; Kozono, David E; Liu, Dianbo; Miller, Timothy A; Savova, Guergana K; Mak, Raymond H; Bitterman, Danielle S.

JCO Clin Cancer Inform ; 7: e2200196, 2023 05.

Article in English | MEDLINE | ID: mdl-37235847

ABSTRACT

PURPOSE: There is an unmet need to empirically explore and understand drivers of cancer disparities, particularly social determinants of health. We explored natural language processing methods to automatically and empirically extract clinical documentation of social contexts and needs that may underlie disparities. METHODS: This was a retrospective analysis of 230,325 clinical notes from 5,285 patients treated with radiotherapy from 2007 to 2019. We compared linguistic features among White versus non-White, low-income insurance versus other insurance, and male versus female patients' notes. Log odds ratios with an informative Dirichlet prior were calculated to compare words over-represented in each group. A variational autoencoder topic model was applied, and topic probability was compared between groups. The presence of machine-learnable bias was explored by developing statistical and neural demographic group classifiers. RESULTS: Terms associated with varied social contexts and needs were identified for all demographic group comparisons. For example, notes of non-White and low-income insurance patients were over-represented with terms associated with housing and transportation, whereas notes of White and other insurance patients were over-represented with terms related to physical activity. Topic models identified a social history topic, and topic probability varied significantly between the demographic group comparisons. Classification models performed poorly at classifying notes of non-White and low-income insurance patients (F1 of 0.30 and 0.23, respectively). CONCLUSION: Exploration of linguistic differences in clinical notes between patients of different race/ethnicity, insurance status, and sex identified social contexts and needs in patients with cancer and revealed high-level differences in notes. Future work is needed to validate whether these findings may play a role in cancer disparities.
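The methods above compare word usage between groups with log odds ratios under an informative Dirichlet prior. The following is a minimal sketch of that statistic in its commonly used z-scored form; it is an illustration under stated assumptions, not the study's code. The function name, the toy counts, and the prior pseudo-counts are all made up for the example.

```python
import math


def log_odds_dirichlet(counts_a, counts_b, prior):
    """Z-scored log odds ratio per word, smoothed by Dirichlet pseudo-counts.

    counts_a, counts_b: word -> count in each group's notes (toy data here).
    prior: word -> positive pseudo-count, e.g. scaled background frequencies.
    Positive z means the word is over-represented in group A.
    """
    a0 = sum(prior.values())
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    z_scores = {}
    for word, a_w in prior.items():
        y_a = counts_a.get(word, 0)
        y_b = counts_b.get(word, 0)
        # Difference of smoothed log odds between the two groups.
        delta = (math.log((y_a + a_w) / (n_a + a0 - y_a - a_w))
                 - math.log((y_b + a_w) / (n_b + a0 - y_b - a_w)))
        # Approximate variance of that difference.
        variance = 1.0 / (y_a + a_w) + 1.0 / (y_b + a_w)
        z_scores[word] = delta / math.sqrt(variance)
    return z_scores


# Toy counts, invented purely for illustration.
group_a = {"housing": 30, "exercise": 10, "pain": 60}
group_b = {"housing": 10, "exercise": 30, "pain": 60}
background = {"housing": 2.0, "exercise": 2.0, "pain": 6.0}
scores = log_odds_dirichlet(group_a, group_b, background)
```

The prior keeps rare words from producing extreme scores, which is the usual reason to prefer it over a raw odds ratio when vocabularies are large and sparse.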

Subjects

Natural Language Processing; Neoplasms; Humans; Male; Female; Retrospective Studies; Social Environment; Neoplasms/diagnosis; Neoplasms/epidemiology; Neoplasms/therapy

3.

Spatio-temporal heterogeneity in the international trade resilience during COVID-19.

Luo, Wei; He, Lingfeng; Yang, Zihui; Zhang, Shirui; Wang, Yong; Liu, Dianbo; Hu, Sheng; He, Li; Xia, Jizhe; Chen, Min.

Appl Geogr ; 154: 102923, 2023 May.

Article in English | MEDLINE | ID: mdl-36915293

ABSTRACT

The COVID-19 pandemic and subsequent lockdowns have created immeasurable health and economic crises, leading to unprecedented disruptions to world trade. The COVID-19 pandemic shows diverse impacts on different economies that suffer and recover at different rates and degrees. This research aims to evaluate the spatio-temporal heterogeneity of international trade network vulnerabilities in the current crisis to understand global production resilience and prepare for future crises. We applied a series of complex network analysis approaches to the monthly international trade networks at the world, regional, and country scales for the pre- and post-COVID-19 outbreak periods. The spatio-temporal patterns indicate that countries and regions with effective COVID-19 containment, such as East Asia, show the strongest resilience, especially Mainland China, followed by high-income countries with fast vaccine roll-out (e.g., U.S.), whereas low-income countries (e.g., in Africa) show high vulnerability. Our results encourage a comprehensive strategy to enhance international trade resilience when facing future pandemic threats, including effective non-pharmaceutical measures, timely development and rollout of vaccines, strong governance capacity, robust healthcare systems, and equality via international cooperation. The overall findings elicit the hidden global trading disruption, recovery, and growth due to the adverse impact of the COVID-19 pandemic.

4.

Confederated learning in healthcare: Training machine learning models using disconnected data separated by individual, data type and identity for Large-Scale health system Intelligence.

Liu, Dianbo; Fox, Kathe; Weber, Griffin; Miller, Tim.

J Biomed Inform ; 134: 104151, 2022 10.

Article in English | MEDLINE | ID: mdl-35872264

ABSTRACT

BACKGROUND: A patient's health information is generally fragmented across silos because it follows how care is delivered: multiple providers in multiple settings. Though it is technically feasible to reunite data for analysis in a manner that underpins a rapid learning healthcare system, privacy concerns and regulatory barriers limit data centralization for this purpose. OBJECTIVES: Machine learning can be conducted in a federated manner on patient datasets with the same set of variables but separated across storage. But federated learning cannot handle the situation where different data types for a given patient are separated vertically across different organizations and when patient ID matching across different institutions is difficult. We call methods that enable machine learning model training on data separated by two or more dimensions "confederated machine learning", which we aim to develop in this study. METHODS: We propose and evaluate confederated learning for training machine learning models to stratify the risk of several diseases among silos when data are horizontally separated by individual, vertically separated by data type, and separated by identity without patient ID matching. The confederated learning method can be intuitively understood as a distributed learning method with representation learning, generative model, imputation method and data augmentation elements. RESULTS: Our confederated learning method achieves AUCROC (Area Under The Curve Receiver Operating Characteristics) of 0.787 for diabetes prediction, 0.718 for psychological disorders prediction, and 0.698 for Ischemic heart disease prediction using nationwide health insurance claims. CONCLUSION: Our proposed confederated learning method successfully trained machine learning models on health insurance data separated by two or more dimensions.
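The results above are reported as AUCROC values. For readers unfamiliar with the metric, here is a small sketch of one standard way to compute it: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The function name and toy data are assumptions for illustration, and this is not the authors' evaluation code.

```python
def auc_roc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes; scores: predicted risk for each case.
    Quadratic in the number of cases, which is fine for a small illustration.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))
```

Read this way, an AUCROC of 0.787 means that a randomly chosen patient with the condition is scored above a randomly chosen patient without it about 79% of the time.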

Subjects

Delivery of Health Care; Machine Learning; Humans; Intelligence; Privacy; ROC Curve

5.

Machine learning approaches to predicting no-shows in pediatric medical appointment.

Liu, Dianbo; Shin, Won-Yong; Sprecher, Eli; Conroy, Kathleen; Santiago, Omar; Wachtel, Gal; Santillana, Mauricio.

NPJ Digit Med ; 5(1): 50, 2022 Apr 20.

Article in English | MEDLINE | ID: mdl-35444260

ABSTRACT

Patients' no-shows, scheduled but unattended medical appointments, have a direct negative impact on patients' health, due to discontinuity of treatment and late presentation to care. They also lead to inefficient use of medical resources in hospitals and clinics. The ability to predict a likely no-show in advance could enable the design and implementation of interventions to reduce the risk of it happening, thus improving patients' care and clinical resource allocation. In this study, we develop a new interpretable deep learning-based approach for predicting the risk of no-shows at the time when a medical appointment is first scheduled. The retrospective study was conducted in an academic pediatric teaching hospital with a 20% no-show rate. Our approach tackles several challenges in the design of a predictive model by (1) adopting a data imputation method for patients with missing information in their records (77% of the population), (2) exploiting local weather information to improve predictive accuracy, and (3) developing an interpretable approach that explains how a prediction is made for each individual patient. Our proposed neural network-based and logistic regression-based methods outperformed persistence baselines. In an unobserved set of patients, our method correctly identified 83% of no-shows at the time of scheduling and led to a false alert rate of less than 17%. Our method is capable of producing meaningful predictions even when some information in a patient's records is missing. We find that patients' past no-show record is the strongest predictor. Finally, we discuss several potential interventions to reduce no-shows, such as scheduling appointments of high-risk patients at off-peak times, which can serve as a starting point for further studies on no-show interventions.

6.

Using Artificial Neural Network Condensation to Facilitate Adaptation of Machine Learning in Medical Settings by Reducing Computational Burden: Model Design and Evaluation Study.

Liu, Dianbo; Zheng, Ming; Sepulveda, Nestor Andres.

JMIR Form Res ; 5(12): e20767, 2021 Dec 08.

Article in English | MEDLINE | ID: mdl-34889747

ABSTRACT

BACKGROUND: Machine learning applications in the health care domain can have a great impact on people's lives. At the same time, medical data is usually big, requiring a significant amount of computational resources. Although this might not be a problem for the wide adoption of machine learning tools in high-income countries, the availability of computational resources can be limited in low-income countries and on mobile devices. This can prevent many people from benefiting from the advancement in machine learning applications in the field of health care. OBJECTIVE: In this study, we explore three methods to increase the computational efficiency and reduce model sizes of either recurrent neural networks (RNNs) or feedforward deep neural networks (DNNs) without compromising their accuracy. METHODS: We used inpatient mortality prediction as our case analysis upon review of an intensive care unit dataset. We reduced the size of RNN and DNN by applying pruning of "unused" neurons. Additionally, we modified the RNN structure by adding a hidden layer to the RNN cell but reducing the total number of recurrent layers to accomplish a reduction of the total parameters used in the network. Finally, we implemented quantization on DNN by forcing the weights to be 8 bits instead of 32 bits. RESULTS: We found that all methods increased implementation efficiency, including training speed, memory size, and inference speed, without reducing the accuracy of mortality prediction. CONCLUSIONS: Our findings suggest that neural network condensation allows for the implementation of sophisticated neural network algorithms on devices with lower computational resources.
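The methods above mention quantizing DNN weights to 8 bits instead of 32 bits. The abstract does not say which quantization scheme was used, so the sketch below assumes a simple affine (min-max) mapping to integer codes 0 to 255, purely to show the idea. The function names are hypothetical.

```python
def quantize_8bit(weights):
    """Map float weights to integer codes in [0, 255] with a min-max affine scheme.

    This scheme is an assumption for illustration; the study's exact method
    is not described in the abstract.
    """
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255.0
    if scale == 0.0:
        scale = 1.0  # all weights equal: any scale works, avoid division by zero
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min


def dequantize(codes, scale, w_min):
    """Recover approximate float weights from 8-bit codes."""
    return [c * scale + w_min for c in codes]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
codes, scale, w_min = quantize_8bit(weights)
restored = dequantize(codes, scale, w_min)
```

Each weight is then stored in one byte rather than four, and the reconstruction error per weight is bounded by half the step size `scale`, which is why accuracy can often be preserved.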

7.

High-throughput 5' UTR engineering for enhanced protein production in non-viral gene therapies.

Cao, Jicong; Novoa, Eva Maria; Zhang, Zhizhuo; Chen, William C W; Liu, Dianbo; Choi, Gigi C G; Wong, Alan S L; Wehrspaun, Claudia; Kellis, Manolis; Lu, Timothy K.

Nat Commun ; 12(1): 4138, 2021 07 06.

Article in English | MEDLINE | ID: mdl-34230498

ABSTRACT

Despite significant clinical progress in cell and gene therapies, maximizing protein expression in order to enhance potency remains a major technical challenge. Here, we develop a high-throughput strategy to design, screen, and optimize 5' UTRs that enhance protein expression from a strong human cytomegalovirus (CMV) promoter. We first identify naturally occurring 5' UTRs with high translation efficiencies and use this information with in silico genetic algorithms to generate synthetic 5' UTRs. A total of ~12,000 5' UTRs are then screened using a recombinase-mediated integration strategy that greatly enhances the sensitivity of high-throughput screens by eliminating copy number and position effects that limit lentiviral approaches. Using this approach, we identify three synthetic 5' UTRs that outperform commonly used non-viral gene therapy plasmids in expressing protein payloads. In summary, we demonstrate that high-throughput screening of 5' UTR libraries with recombinase-mediated integration can identify genetic elements that enhance protein expression, which should have numerous applications for engineered cell and gene therapies.

Subjects

5' Untranslated Regions/genetics; Genetic Engineering; Genetic Therapy; Algorithms; Cell Line; Gene Expression; HEK293 Cells; High-Throughput Screening Assays; Humans; Plasmids; Promoter Regions, Genetic; Recombinases

8.

FeARH: Federated machine learning with anonymous random hybridization on electronic medical records.

Cui, Jianfei; Zhu, He; Deng, Hao; Chen, Ziwei; Liu, Dianbo.

J Biomed Inform ; 117: 103735, 2021 05.

Article in English | MEDLINE | ID: mdl-33711540

ABSTRACT

Electronic medical records are restricted and difficult to centralize for machine learning model training due to privacy and regulatory issues. One solution is to train models in a distributed manner that involves many parties in the process. However, certain parties are sometimes not trustworthy. In this project, we propose an alternative to traditional federated learning with a central analyzer, so that training can be conducted when no trustworthy central analyzer is available. The proposed algorithm, called "federated machine learning with anonymous random hybridization" (abbreviated as "FeARH"), mainly uses a hybridization algorithm to weaken the connections between medical record data and models' parameters by adding randomization to the parameter sets shared with other parties. In our experiments, the new algorithm achieved AUCROC and AUCPR results similar to those of machine learning in a centralized manner and of the original federated machine learning.

Subjects

Electronic Health Records; Machine Learning; Algorithms; Privacy; Research Design

9.

Patients dispensed medications with actionable pharmacogenomic biomarkers: rates and characteristics.

Liu, Dianbo; Olson, Karen L; Manzi, Shannon F; Mandl, Kenneth D.

Genet Med ; 23(4): 782-786, 2021 04.

Article in English | MEDLINE | ID: mdl-33420348

ABSTRACT

PURPOSE: Pharmacogenomic biomarkers are increasingly listed on medication labels and authoritative guidelines but pharmacogenomic-guided prescribing is not yet common. Our objective was to assess the potential for incorporating knowledge of patients' genomic characteristics into prescribing practices. METHODS: We performed a retrospective analysis of claims data for 2,096,971 beneficiaries with pharmacy coverage from a national, commercial health insurance plan between January 2017 and December 2019. Children between 0 and 17 years comprised 21% of the cohort. Adults were age 18 to 64. Medications with actionable pharmacogenomic biomarkers (MAPBs) were identified using public information from the US Food and Drug Administration (FDA), Clinical Pharmacogenomics Implementation Consortium (CPIC), and PharmGKB. RESULTS: MAPBs were dispensed to 63% of the adults and 29% of the children in the cohort. Most frequently dispensed were ibuprofen, ondansetron, codeine, and oxycodone. Most common were medications with CYP2D6, G6PD, or CYP2C19 pharmacogenomic biomarkers. Ten percent of the cohort were codispensed more than one MAPB for at least 30 days. CONCLUSION: The number of people who might benefit from pharmacogenomic-guided prescribing is substantial. Future work should address obstacles to integrating genomic data into prescriber workflows, complex factors contributing to the magnitude of benefit, and the clinical availability of reliable on-demand or pre-emptive pharmacogenomic testing.

Subjects

Pharmacogenetics; Pharmacogenomic Testing; Adolescent; Adult; Biomarkers; Child; Drug Labeling; Humans; Middle Aged; Retrospective Studies; Young Adult

10.

Stochastic Channel-Based Federated Learning With Neural Network Pruning for Medical Data Privacy Preservation: Model Development and Experimental Validation.

Shao, Rulin; He, Hongyu; Chen, Ziwei; Liu, Hui; Liu, Dianbo.

JMIR Form Res ; 4(12): e17265, 2020 Dec 22.

Article in English | MEDLINE | ID: mdl-33350391

ABSTRACT

BACKGROUND: Artificial neural networks have achieved unprecedented success in the medical domain. This success depends on the availability of massive and representative datasets. However, data collection is often prevented by privacy concerns, and people want to take control over their sensitive information during both the training and using processes. OBJECTIVE: To address security and privacy issues, we propose a privacy-preserving method for the analysis of distributed medical data. The proposed method, termed stochastic channel-based federated learning (SCBFL), enables participants to train a high-performance model cooperatively and in a distributed manner without sharing their inputs. METHODS: We designed, implemented, and evaluated a channel-based update algorithm for a central server in a distributed system. The update algorithm will select the channels with regard to the most active features in a training loop, and then upload them as learned information from local datasets. A pruning process, which serves as a model accelerator, was further applied to the algorithm based on the validation set. RESULTS: We constructed a distributed system consisting of 5 clients and 1 server. Our trials showed that the SCBFL method can achieve an area under the receiver operating characteristic curve (AUC-ROC) of 0.9776 and an area under the precision-recall curve (AUC-PR) of 0.9695 with only 10% of channels shared with the server. Compared with the federated averaging algorithm, the proposed SCBFL method achieved a 0.05388 higher AUC-ROC and 0.09695 higher AUC-PR. In addition, our experiment showed that 57% of the time is saved by the pruning process with only a reduction of 0.0047 in AUC-ROC performance and a reduction of 0.0068 in AUC-PR performance. CONCLUSIONS: In this experiment, our model demonstrated better performance and a higher saturating speed than the federated averaging method, which reveals all of the parameters of local models to the server. The saturation rate of performance could be promoted by introducing a pruning process and further improvement could be achieved by tuning the pruning rate.

11.

The role of environmental factors on transmission rates of the COVID-19 outbreak: an initial assessment in two spatial scales.

Poirier, Canelle; Luo, Wei; Majumder, Maimuna S; Liu, Dianbo; Mandl, Kenneth D; Mooring, Todd A; Santillana, Mauricio.

Sci Rep ; 10(1): 17002, 2020 10 12.

Article in English | MEDLINE | ID: mdl-33046802

ABSTRACT

First identified in Wuhan, China, in December 2019, a novel coronavirus (SARS-CoV-2) has affected over 16,800,000 people worldwide as of July 29, 2020 and was declared a pandemic by the World Health Organization on March 11, 2020. Influenza studies have shown that influenza viruses survive longer on surfaces or in droplets in cold and dry air, thus increasing the likelihood of subsequent transmission. A similar hypothesis has been postulated for the transmission of COVID-19, the disease caused by SARS-CoV-2. It is important to propose methodologies to understand the effects of environmental factors on this ongoing outbreak to support decision-making pertaining to disease control. Here, we examine the spatial variability of the basic reproductive numbers of COVID-19 across provinces and cities in China and show that environmental variables alone cannot explain this variability. Our findings suggest that changes in weather (i.e., increase of temperature and humidity as spring and summer months arrive in the Northern Hemisphere) will not necessarily lead to declines in case counts without the implementation of drastic public health interventions.

Subjects

Coronavirus Infections/epidemiology; Coronavirus Infections/transmission; Humidity; Pneumonia, Viral/epidemiology; Pneumonia, Viral/transmission; Betacoronavirus; COVID-19; Cold Temperature; Environment; Hot Temperature; Humans; Pandemics; Population Dynamics; SARS-CoV-2

12.

Correction: Real-Time Forecasting of the COVID-19 Outbreak in Chinese Provinces: Machine Learning Approach Using Novel Digital Data and Estimates From Mechanistic Models.

Liu, Dianbo; Clemente, Leonardo; Poirier, Canelle; Ding, Xiyu; Chinazzi, Matteo; Davis, Jessica; Vespignani, Alessandro; Santillana, Mauricio.

J Med Internet Res ; 22(9): e23996, 2020 Sep 22.

Article in English | MEDLINE | ID: mdl-32960774

ABSTRACT

[This corrects the article DOI: 10.2196/20285.].

13.

The Role of Environmental Factors on Transmission Rates of the COVID-19 Outbreak: An Initial Assessment in Two Spatial Scales.

Poirier, Canelle; Luo, Wei; Majumder, Maimuna S; Liu, Dianbo; Mandl, Kenneth D; Mooring, Todd A; Santillana, Mauricio.

SSRN ; : 3552677, 2020 Mar 12.

Article in English | MEDLINE | ID: mdl-32714106

ABSTRACT

A novel coronavirus (SARS-CoV-2) was identified in Wuhan, Hubei Province, China, in December 2019 and has caused over 240,000 cases of COVID-19 worldwide as of March 19, 2020. Previous studies have supported an epidemiological hypothesis that cold and dry environments facilitate the survival and spread of droplet-mediated viral diseases, and warm and humid environments see attenuated viral transmission (e.g., influenza). However, the role of temperature and humidity in transmission of COVID-19 has not yet been established. Here, we examine the spatial variability of the basic reproductive numbers of COVID-19 across provinces and cities in China and show that environmental variables alone cannot explain this variability. Our findings suggest that changes in weather alone (i.e., increase of temperature and humidity as spring and summer months arrive in the Northern Hemisphere) will not necessarily lead to declines in case count without the implementation of extensive public health interventions.

14.

Real-Time Forecasting of the COVID-19 Outbreak in Chinese Provinces: Machine Learning Approach Using Novel Digital Data and Estimates From Mechanistic Models.

Liu, Dianbo; Clemente, Leonardo; Poirier, Canelle; Ding, Xiyu; Chinazzi, Matteo; Davis, Jessica; Vespignani, Alessandro; Santillana, Mauricio.

J Med Internet Res ; 22(8): e20285, 2020 08 17.

Article in English | MEDLINE | ID: mdl-32730217

ABSTRACT

BACKGROUND: The inherent difficulty of identifying and monitoring emerging outbreaks caused by novel pathogens can lead to their rapid spread; and if left unchecked, they may become major public health threats to the planet. The ongoing coronavirus disease (COVID-19) outbreak, which has infected over 2,300,000 individuals and caused over 150,000 deaths, is an example of one of these catastrophic events. OBJECTIVE: We present a timely and novel methodology that combines disease estimates from mechanistic models and digital traces, via interpretable machine learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real time. METHODS: Our method uses the following as inputs: (a) official health reports, (b) COVID-19-related internet search activity, (c) news media activity, and (d) daily forecasts of COVID-19 activity from a metapopulation mechanistic model. Our machine learning methodology uses a clustering technique that enables the exploitation of geospatial synchronicities of COVID-19 activity across Chinese provinces and a data augmentation technique to deal with the small number of historical disease observations characteristic of emerging outbreaks. RESULTS: Our model is able to produce stable and accurate forecasts 2 days ahead of the current time and outperforms a collection of baseline models in 27 out of 32 Chinese provinces. CONCLUSIONS: Our methodology could be easily extended to other geographies currently affected by COVID-19 to aid decision makers with monitoring and possibly prevention.

Subjects

Coronavirus Infections/epidemiology; Coronavirus Infections/transmission; Data Analysis; Forecasting/methods; Machine Learning; Models, Biological; Pneumonia, Viral/epidemiology; Pneumonia, Viral/transmission; COVID-19; China/epidemiology; Disease Outbreaks; Humans; Internet; Mass Media; Models, Statistical; Pandemics; Public Health/methods

15.

A machine learning methodology for real-time forecasting of the 2019-2020 COVID-19 outbreak using Internet searches, news alerts, and estimates from mechanistic models.

Liu, Dianbo; Clemente, Leonardo; Poirier, Canelle; Ding, Xiyu; Chinazzi, Matteo; Davis, Jessica T; Vespignani, Alessandro; Santillana, Mauricio.

ArXiv ; 2020 Apr 08.

Article in English | MEDLINE | ID: mdl-32550248

ABSTRACT

We present a timely and novel methodology that combines disease estimates from mechanistic models with digital traces, via interpretable machine-learning methodologies, to reliably forecast COVID-19 activity in Chinese provinces in real-time. Specifically, our method is able to produce stable and accurate forecasts 2 days ahead of current time, and uses as inputs (a) official health reports from the Chinese Center for Disease Control and Prevention (China CDC), (b) COVID-19-related internet search activity from Baidu, (c) news media activity reported by Media Cloud, and (d) daily forecasts of COVID-19 activity from GLEAM, an agent-based mechanistic model. Our machine-learning methodology uses a clustering technique that enables the exploitation of geo-spatial synchronicities of COVID-19 activity across Chinese provinces, and a data augmentation technique to deal with the small number of historical disease activity observations, characteristic of emerging outbreaks. Our model's predictive power outperforms a collection of baseline models in 27 out of the 32 Chinese provinces, and could be easily extended to other geographies currently affected by the COVID-19 outbreak to help decision makers.

16.

LoAdaBoost: Loss-based AdaBoost federated machine learning with reduced computational complexity on IID and non-IID intensive care data.

Huang, Li; Yin, Yifeng; Fu, Zeng; Zhang, Shifa; Deng, Hao; Liu, Dianbo.

PLoS One ; 15(4): e0230706, 2020.

Article in English | MEDLINE | ID: mdl-32302316

ABSTRACT

Intensive care data are valuable for improvement of health care, policy making and many other purposes. Vast amounts of such data are stored in different locations, on many different devices and in different data silos. Sharing data among different sources is a big challenge due to regulatory, operational and security reasons. One potential solution is federated machine learning, which is a method that sends machine learning algorithms simultaneously to all data sources, trains models in each source and aggregates the learned models. This strategy allows utilization of valuable data without moving them. One challenge in applying federated machine learning is the possibly different distributions of data from diverse sources. To tackle this problem, we proposed an adaptive boosting method named LoAdaBoost that increases the efficiency of federated machine learning. Using intensive care unit data from hospitals, we investigated the performance of learning in IID and non-IID data distribution scenarios, and showed that the proposed LoAdaBoost method achieved higher predictive accuracy with lower computational complexity than the baseline method.
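The abstract above describes federated learning as training a model at each data source and then aggregating the learned models. The sketch below shows only a generic aggregation step, a sample-size-weighted average of client parameters as in plain federated averaging. It is not the LoAdaBoost method itself, and the function name and flat parameter lists are assumptions for illustration.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of per-client parameter vectors.

    client_weights: list of equal-length lists of floats, one per data source.
    client_sizes: number of training samples at each source, used as weights.
    Only parameters are combined; no patient records leave their source.
    """
    if len(client_weights) != len(client_sizes):
        raise ValueError("one size is needed per client")
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical hospitals with 1 and 3 training samples respectively.
global_model = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by sample count lets larger sources contribute proportionally more, which is one place where non-IID data across sources can pull the global model in conflicting directions.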

Subjects

Intensive Care Units/statistics & numerical data; Machine Learning; Medical Informatics/methods; Databases, Factual; Humans

17.

Patients with Cancer Appear More Vulnerable to SARS-CoV-2: A Multicenter Study during the COVID-19 Outbreak.

Dai, Mengyuan; Liu, Dianbo; Liu, Miao; Zhou, Fuxiang; Li, Guiling; Chen, Zhen; Zhang, Zhian; You, Hua; Wu, Meng; Zheng, Qichao; Xiong, Yong; Xiong, Huihua; Wang, Chun; Chen, Changchun; Xiong, Fei; Zhang, Yan; Peng, Yaqin; Ge, Siping; Zhen, Bo; Yu, Tingting; Wang, Ling; Wang, Hua; Liu, Yu; Chen, Yeshan; Mei, Junhua; Gao, Xiaojia; Li, Zhuyan; Gan, Lijuan; He, Can; Li, Zhen; Shi, Yuying; Qi, Yuwen; Yang, Jing; Tenen, Daniel G; Chai, Li; Mucci, Lorelei A; Santillana, Mauricio; Cai, Hongbing.

Cancer Discov ; 10(6): 783-791, 2020 06.

Article in English | MEDLINE | ID: mdl-32345594

ABSTRACT

The novel COVID-19 outbreak has affected more than 200 countries and territories as of March 2020. Given that patients with cancer are generally more vulnerable to infections, systematic analysis of diverse cohorts of patients with cancer affected by COVID-19 is needed. We performed a multicenter study including 105 patients with cancer and 536 age-matched noncancer patients confirmed with COVID-19. Our results showed COVID-19 patients with cancer had higher risks in all severe outcomes. Patients with hematologic cancer, lung cancer, or with metastatic cancer (stage IV) had the highest frequency of severe events. Patients with nonmetastatic cancer experienced similar frequencies of severe conditions to those observed in patients without cancer. Patients who received surgery had higher risks of having severe events, whereas patients who underwent only radiotherapy did not demonstrate significant differences in severe events when compared with patients without cancer. These findings indicate that patients with cancer appear more vulnerable to the SARS-CoV-2 outbreak. SIGNIFICANCE: Because this is the first large cohort study on this topic, our report will provide much-needed information that will benefit patients with cancer globally. As such, we believe it is extremely important that our study be disseminated widely to alert clinicians and patients. This article is highlighted in the In This Issue feature, p. 747.

Subjects

Betacoronavirus , Coronavirus Infections/therapy , Neoplasms , Viral Pneumonia/therapy , Aged , COVID-19 , China/epidemiology , Cohort Studies , Coronavirus Infections/complications , Coronavirus Infections/epidemiology , Disease Outbreaks , Female , Humans , Intensive Care Units , Male , Middle Aged , Neoplasms/complications , Neoplasms/pathology , Neoplasms/therapy , Neoplasms/virology , Pandemics , Viral Pneumonia/complications , Viral Pneumonia/epidemiology , Artificial Respiration , SARS-CoV-2

18.

Current state of science in machine learning methods for automatic infant pain evaluation using facial expression information: study protocol of a systematic review and meta-analysis.

Cheng, Dan; Liu, Dianbo; Philpotts, Lisa Liang; Turner, Dana P; Houle, Timothy T; Chen, Lucy; Zhang, Miaomiao; Yang, Jianjun; Zhang, Wei; Deng, Hao.

BMJ Open ; 9(12): e030482, 2019 12 11.

Article in English | MEDLINE | ID: mdl-31831532

ABSTRACT

INTRODUCTION: Infants can experience pain similar to adults, and improperly controlled pain stimuli could have a long-term adverse impact on their cognitive and neurological function development. The biggest challenge of achieving good infant pain control is obtaining objective pain assessment when direct communication is lacking. For years, computer scientists have developed many different facial expression-centred machine learning (ML) methods for automatic infant pain assessment. Many of these ML algorithms showed rather satisfactory performance and have demonstrated good potential to be further enhanced for implementation in real-world clinical settings. To date, there is no prior research that has systematically summarised and compared the performance of these ML algorithms. Our proposed meta-analysis will provide the first comprehensive evidence on this topic to guide further ML algorithm development and clinical implementation. METHODS AND ANALYSIS: We will search four major public electronic medical and computer science databases including Web of Science, PubMed, Embase and IEEE Xplore Digital Library from January 2008 to present. All the articles will be imported into the Covidence platform for study eligibility screening and inclusion. Study-level extracted data will be stored in the Systematic Review Data Repository online platform. The primary outcome will be the prediction accuracy of the ML model. The secondary outcomes will be model utility measures including generalisability, interpretability and computational efficiency. All extracted outcome data will be imported into RevMan V.5.2.1 software and R V3.3.2 for analysis. Risk of bias will be summarised using the latest Prediction Model Study Risk of Bias Assessment Tool. ETHICS AND DISSEMINATION: This systematic review and meta-analysis will only use study-level data from public databases, thus formal ethical approval is not required. The results will be disseminated in the form of an official publication in a peer-reviewed journal and/or presentation at relevant conferences. PROSPERO REGISTRATION NUMBER: CRD42019118784.

Subjects

Facial Expression , Machine Learning , Meta-Analysis as Topic , Pain Measurement/methods , Research Design , Systematic Reviews as Topic/methods , Humans , Infant

19.

Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records.

Huang, Li; Shea, Andrew L; Qian, Huining; Masurkar, Aditya; Deng, Hao; Liu, Dianbo.

J Biomed Inform ; 99: 103291, 2019 11.

Article in English | MEDLINE | ID: mdl-31560949

ABSTRACT

Electronic medical records (EMRs) support the development of machine learning algorithms for predicting disease incidence, patient response to treatment, and other healthcare events. But so far most algorithms have been centralized, taking little account of the decentralized, non-identically independently distributed (non-IID), and privacy-sensitive characteristics of EMRs that can complicate data collection, sharing and learning. To address this challenge, we introduced a community-based federated machine learning (CBFL) algorithm and evaluated it on non-IID ICU EMRs. Our algorithm clustered the distributed data into clinically meaningful communities that captured similar diagnoses and geographical locations, and learnt one model for each community. Throughout the learning process, the data was kept local at hospitals, while locally-computed results were aggregated on a server. Evaluation results show that CBFL outperformed the baseline federated machine learning (FL) algorithm in terms of Area Under the Receiver Operating Characteristic Curve (ROC AUC), Area Under the Precision-Recall Curve (PR AUC), and communication cost between hospitals and the server. Furthermore, communities' performance difference could be explained by how dissimilar one community was to others.
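
The aggregation step described in the abstract above (data stay at each hospital, locally computed results are combined on a server, one model per community) can be illustrated with a minimal sketch. It assumes that each hospital already has a community label and that the server combines local weight vectors by a sample-size-weighted average, as in plain federated averaging; the abstract does not state the aggregation rule, and the clustering step that produces the communities is not reproduced here. The function names and the toy numbers are hypothetical.

```python
from collections import defaultdict


def federated_average(updates):
    """Sample-size-weighted mean of local weight vectors.

    updates: list of (weights, n_samples) pairs, one per hospital.
    Assumption: plain federated averaging; not confirmed as the paper's rule.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def aggregate_by_community(hospital_updates):
    """Build one aggregated model per community.

    hospital_updates: list of (community_id, weights, n_samples) triples.
    Community labels are taken as given (hypothetical input).
    """
    groups = defaultdict(list)
    for community, weights, n in hospital_updates:
        groups[community].append((weights, n))
    return {c: federated_average(u) for c, u in groups.items()}


# Toy example: two hospitals in community "A", one in community "B".
updates = [
    ("A", [1.0, 0.0], 100),
    ("A", [3.0, 2.0], 300),
    ("B", [0.5, 0.5], 50),
]
models = aggregate_by_community(updates)
```

Only the weight vectors and sample counts reach the server in this sketch; the patient records themselves never leave the hospital lists that produced them.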

Subjects

Critical Illness/mortality , Electronic Health Records/statistics & numerical data , Length of Stay/statistics & numerical data , Machine Learning , Algorithms , Cluster Analysis , Female , Humans , Male , Middle Aged

20.

Integrative construction of regulatory region networks in 127 human reference epigenomes by matrix factorization.

Liu, Dianbo; Davila-Velderrain, Jose; Zhang, Zhizhuo; Kellis, Manolis.

Nucleic Acids Res ; 47(14): 7235-7246, 2019 08 22.

Article in English | MEDLINE | ID: mdl-31265076

ABSTRACT

Despite large experimental and computational efforts aiming to dissect the mechanisms underlying disease risk, mapping cis-regulatory elements to target genes remains a challenge. Here, we introduce a matrix factorization framework to integrate physical and functional interaction data of genomic segments. The framework was used to predict a regulatory network of chromatin interaction edges linking more than 20 000 promoters and 1.8 million enhancers across 127 human reference epigenomes, including edges that are present in any of the input datasets. Our network integrates functional evidence of correlated activity patterns from epigenomic data and physical evidence of chromatin interactions. An important contribution of this work is the representation of heterogeneous data with different qualities as networks. We show that the unbiased integration of independent data sources suggestive of regulatory interactions produces meaningful associations supported by existing functional and physical evidence, correlating with expected independent biological features.

Subjects

Algorithms , Computational Biology/methods , Epigenomics/methods , Gene Regulatory Networks , Genetic Promoter Regions/genetics , Transcriptional Regulatory Elements/genetics , Chromatin/genetics , Chromatin/metabolism , Gene Expression Profiling/methods , Gene Ontology , Humans , K562 Cells , Single Nucleotide Polymorphism , Protein Interaction Mapping/methods , Reproducibility of Results
