Search | BVS Regional Portal (test)

1.

Automatically pre-screening patients for the rare disease aromatic l-amino acid decarboxylase deficiency using knowledge engineering, natural language processing, and machine learning on a large EHR population.

Cohen, Aaron M; Kaner, Jolie; Miller, Ryan; Kopesky, Jeffrey W; Hersh, William.

J Am Med Inform Assoc ; 31(3): 692-704, 2024 Feb 16.

Article in English | MEDLINE | ID: mdl-38134953

ABSTRACT

OBJECTIVES: Electronic health record (EHR) data may facilitate the identification of rare diseases in patients, such as aromatic l-amino acid decarboxylase deficiency (AADCd), an autosomal recessive disease caused by pathogenic variants in the dopa decarboxylase gene. Deficiency of the AADC enzyme results in combined severe reductions in the monoamine neurotransmitters dopamine, serotonin, epinephrine, and norepinephrine. This leads to widespread neurological complications affecting motor, behavioral, and autonomic function. The goal of this study was to use EHR data to identify previously undiagnosed patients who may have AADCd, even though no training cases were available for the disease. MATERIALS AND METHODS: A dataset annotated for multiple symptoms and related diseases was created and used to train individual concept classifiers on annotated sentence data. A multistep algorithm then combined the concept predictions into a single patient rank value. RESULTS: Using an 8000-patient dataset that the algorithms had not seen before ranking, the top and bottom 200 ranked patients were manually reviewed for clinical indications for performing an AADCd diagnostic screening test. Of the top-ranked patients, 22.5% were positively assessed for diagnostic screening, compared with 0% of the bottom-ranked patients. This result is statistically significant at P < .0001. CONCLUSION: This work validates the approach that large-scale rare-disease screening can be accomplished by combining predictions for relevant individual symptoms and related conditions that are much more common and for which training data are easier to create.

Subjects

Amino Acid Metabolism, Inborn Errors , Aromatic-L-Amino-Acid Decarboxylases/deficiency , Natural Language Processing , Rare Diseases , Humans , Dopamine , Machine Learning
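The record above reports that 22.5% of the 200 top-ranked patients (that is, 45 patients) versus 0% of the 200 bottom-ranked patients were positively assessed, with P < .0001. The abstract does not name the test used, so the sketch below simply assumes a two-sided Fisher's exact test on the resulting 2x2 table and shows how a p-value of that order can be checked with the Python standard library. The function name is illustrative, not taken from the paper.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Probability that x of the col1 positives fall in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        px = prob(x)
        # Small relative tolerance so floating-point ties are counted.
        if px <= p_obs * (1 + 1e-7):
            total += px
    return min(total, 1.0)


# Assumed table: top 200 -> 45 positive, 155 negative; bottom 200 -> 0 positive.
p = fisher_exact_two_sided(45, 155, 0, 200)
print(f"p = {p:.3g}")
```

Fisher's exact test is chosen here only because one cell is zero, where large-sample approximations are least trustworthy; the authors may have used a different test.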

2.

Clinical study applying machine learning to detect a rare disease: results and lessons learned.

Hersh, William R; Cohen, Aaron M; Nguyen, Michelle M; Bensching, Katherine L; Deloughery, Thomas G.

JAMIA Open ; 5(2): ooac053, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35783073

ABSTRACT

Machine learning has the potential to improve identification of patients for appropriate diagnostic testing and treatment, including those who have rare diseases for which effective treatments are available, such as acute hepatic porphyria (AHP). We trained a machine learning model on 205,571 complete electronic health records from a single medical center based on 30 known cases to identify 22 patients with classic symptoms of AHP that had neither been diagnosed nor tested for AHP. We offered urine porphobilinogen testing to these patients via their clinicians. Of the 7 who agreed to testing, none were positive for AHP. We explore the reasons for this and provide lessons learned for further work evaluating machine learning to detect AHP and other rare diseases.

3.

Testing a filtering strategy for systematic reviews: evaluating work savings and recall.

Proescholdt, Randi; Hsiao, Tzu-Kun; Schneider, Jodi; Cohen, Aaron M; McDonagh, Marian S; Smalheiser, Neil R.

AMIA Jt Summits Transl Sci Proc ; 2022: 406-413, 2022.

Article in English | MEDLINE | ID: mdl-35854734

ABSTRACT

Systematic reviews are extremely time-consuming. The goal of this work is to assess work savings and recall for a publication type filtering strategy that uses the output of two machine learning models, Multi-Tagger and web RCT Tagger, applied retrospectively to 10 systematic reviews on drug effectiveness. Our filtering strategy resulted in mean work savings of 33.6% and recall of 98.3%. Of 363 articles finally included in any of the systematic reviews, 7 were filtered out by our strategy, but 1 "error" was actually an article using a publication type that the SR team had not pre-specified as relevant for inclusion. Our analysis suggests that automated publication type filtering can potentially provide substantial work savings with minimal loss of included articles. Publication type filtering should be personalized for each systematic review and might be combined with other filtering or ranking methods to provide additional work savings for manual triage.
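The record above reports mean work savings and recall for a publication-type filter but does not spell out the formulas. A minimal sketch, assuming the common reading that work savings is the share of candidate articles the filter removes before manual triage and recall is the share of finally included articles that survive the filter. The article IDs and the `evaluate_filter` helper are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FilterOutcome:
    work_savings: float  # share of candidate articles removed before manual triage
    recall: float        # share of finally included articles that survived the filter


def evaluate_filter(candidates: set[str], kept: set[str], included: set[str]) -> FilterOutcome:
    """Score one systematic review's filter run.

    candidates: every article retrieved for the review (hypothetical IDs)
    kept:       the subset the publication-type filter passed on to reviewers
    included:   the articles the review team finally included
    """
    if not candidates or not included:
        raise ValueError("need at least one candidate and one included article")
    removed = candidates - kept
    return FilterOutcome(
        work_savings=len(removed) / len(candidates),
        recall=len(included & kept) / len(included),
    )


# Two toy reviews with made-up article IDs.
review_a = evaluate_filter(
    candidates={f"a{i}" for i in range(10)},
    kept={"a0", "a1", "a2", "a3", "a4", "a5", "a6"},
    included={"a0", "a1", "a9"},
)
review_b = evaluate_filter(
    candidates={f"b{i}" for i in range(4)},
    kept={"b0", "b1"},
    included={"b0"},
)
print(review_a)
print("mean work savings:", mean([review_a.work_savings, review_b.work_savings]))
print("mean recall:", mean([review_a.recall, review_b.recall]))
```

Averaging per review weights every review equally. Pooling instead, the abstract's counts (7 of 363 included articles filtered out) give 356/363, about 98.1%, slightly below the reported mean of 98.3%, which is consistent with the reported figure being a per-review mean.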

4.

Integrative analysis of drug response and clinical outcome in acute myeloid leukemia.

Bottomly, Daniel; Long, Nicola; Schultz, Anna Reister; Kurtz, Stephen E; Tognon, Cristina E; Johnson, Kara; Abel, Melissa; Agarwal, Anupriya; Avaylon, Sammantha; Benton, Erik; Blucher, Aurora; Borate, Uma; Braun, Theodore P; Brown, Jordana; Bryant, Jade; Burke, Russell; Carlos, Amy; Chang, Bill H; Cho, Hyun Jun; Christy, Stephen; Coblentz, Cody; Cohen, Aaron M; d'Almeida, Amanda; Cook, Rachel; Danilov, Alexey; Dao, Kim-Hien T; Degnin, Michie; Dibb, James; Eide, Christopher A; English, Isabel; Hagler, Stuart; Harrelson, Heath; Henson, Rachel; Ho, Hibery; Joshi, Sunil K; Junio, Brian; Kaempf, Andy; Kosaka, Yoko; Laderas, Ted; Lawhead, Matt; Lee, Hyunjung; Leonard, Jessica T; Lin, Chenwei; Lind, Evan F; Liu, Selina Qiuying; Lo, Pierrette; Loriaux, Marc M; Luty, Samuel; Maxson, Julia E; Macey, Tara.

Cancer Cell ; 40(8): 850-864.e9, 2022 08 08.

Article in English | MEDLINE | ID: mdl-35868306

ABSTRACT

Acute myeloid leukemia (AML) is a cancer of myeloid-lineage cells with limited therapeutic options. We previously combined ex vivo drug sensitivity with genomic, transcriptomic, and clinical annotations for a large cohort of AML patients, which facilitated discovery of functional genomic correlates. Here, we present a dataset that has been harmonized with our initial report to yield a cumulative cohort of 805 patients (942 specimens). We show strong cross-cohort concordance and identify features of drug response. Further, deconvoluting transcriptomic data shows that drug sensitivity is governed broadly by AML cell differentiation state, sometimes conditionally affecting other correlates of response. Finally, modeling of clinical outcome reveals a single gene, PEAR1, to be among the strongest predictors of patient survival, especially for young patients. Collectively, this report expands a large functional genomic resource, offers avenues for mechanistic exploration and drug development, and reveals tools for predicting outcome in AML.

Subjects

Leukemia, Myeloid, Acute , Cell Differentiation , Cohort Studies , Humans , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics , Receptors, Cell Surface/genetics , Transcriptome

5.

Evaluation of publication type tagging as a strategy to screen randomized controlled trial articles in preparing systematic reviews.

Schneider, Jodi; Hoang, Linh; Kansara, Yogeshwar; Cohen, Aaron M; Smalheiser, Neil R.

JAMIA Open ; 5(1): ooac015, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35571360

ABSTRACT

Objectives: To produce a systematic review (SR), reviewers typically screen thousands of titles and abstracts of articles manually to find a small number that are read in full text to identify the relevant articles included in the final SR. Here, we evaluate a proposed automated probabilistic publication type screening strategy applied to the randomized controlled trial (RCT) articles (i.e., those that present clinical outcome results of RCT studies) included in a corpus of previously published Cochrane reviews. Materials and Methods: We selected a random subset of 558 published Cochrane reviews that specified RCT-only inclusion criteria, containing 7113 included articles that could be matched to PubMed identifiers. These were processed by our automated RCT Tagger tool to estimate the probability that each article reports clinical outcomes of an RCT. Results: Removing articles with low predictive scores (P < 0.01) eliminated 288 included articles, of which only 22 were actually typical RCT articles, and only 18 were actually typical RCT articles that MEDLINE indexed as such. Based on our sample set, this screening strategy led to fewer than 0.05 relevant RCT articles being missed on average per Cochrane SR. Discussion: This scenario, based on real SRs, demonstrates that automated tagging can identify RCT articles accurately while maintaining very high recall. However, we also found that even SRs whose inclusion criteria are restricted to RCT studies include not only clinical outcome articles per se, but a variety of ancillary article types as well. Conclusions: This encourages further studies on how best to incorporate automated tagging of additional publication types into SR triage workflows.

6.

An Analysis of Two Sources of Cardiology Patient Data to Measure Medication Agreement.

Goueth, Rose C; Cohen, Aaron M; Weiskopf, Nicole G.

AMIA Jt Summits Transl Sci Proc ; 2021: 267-275, 2021.

Article in English | MEDLINE | ID: mdl-34457141

ABSTRACT

Errors and incompleteness in electronic health record (EHR) medication lists can result in medical errors. To reduce errors in these medication lists, clinicians use patient self-reported data to reconcile EHR data. We assessed the agreement between patient self-reported medications and medications recorded in the EHR for six medication classes related to cardiovascular care and used logistic regression models to determine which patient-related factors were associated with the disagreement between these two information sources. From our 297 patients, we found self-reported medications had an overall above-average agreement with the EHR (κ = .727). We observed the highest agreement level for statins (κ = .831) and the lowest for other antihypertensives (κ = .465). Agreement was less likely for Hispanic and male patients. We also performed an in-depth error analysis of different types of disagreement beyond medication names, which revealed that the most frequent type of disagreement was mismatched dosages.

Subjects

Cardiology , Electronic Health Records , Antihypertensive Agents , Humans , Male
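The agreement figures in the record above (.727 overall, .831 for statins, .465 for other antihypertensives) are presumably Cohen's kappa values; the statistic's symbol is not reliably rendered in this listing, so that is an assumption. A minimal sketch of Cohen's kappa for one medication class, comparing a hypothetical per-patient "taking it" flag from self-report against the EHR list:

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two label sources."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty rating sequences")
    n = len(rater_a)
    # Observed agreement: share of patients where both sources match.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if each source labelled independently at its own rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a.keys() | freq_b.keys()) / n**2
    if expected == 1.0:
        return 1.0  # degenerate case: both sources used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical data for 10 patients: 1 = medication class present, 0 = absent.
self_report = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
ehr_list = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
kappa = cohens_kappa(self_report, ehr_list)
print(round(kappa, 3))  # 0.583
```

Here the sources agree on 8 of 10 patients (0.80), but 0.52 agreement is expected by chance from the label frequencies alone, so kappa is (0.80 - 0.52) / (1 - 0.52), roughly 0.58.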

7.

Evaluation of patient-level retrieval from electronic health record data for a cohort discovery task.

Chamberlin, Steven R; Bedrick, Steven D; Cohen, Aaron M; Wang, Yanshan; Wen, Andrew; Liu, Sijia; Liu, Hongfang; Hersh, William R.

JAMIA Open ; 3(3): 395-404, 2020 Oct.

Article in English | MEDLINE | ID: mdl-33215074

ABSTRACT

OBJECTIVE: Growing numbers of academic medical centers offer patient cohort discovery tools to their researchers, yet the performance of systems for this use case is not well understood. The objective of this research was to assess patient-level information retrieval methods using electronic health records for different types of cohort definition retrieval. MATERIALS AND METHODS: We developed a test collection consisting of about 100,000 patient records and 56 test topics that characterized patient cohort requests for various clinical studies. Automated information retrieval tasks using word-based approaches were performed, varying 4 different parameters for a total of 48 permutations, with performance measured using B-Pref. We subsequently created structured Boolean queries for the 56 topics for performance comparisons. In addition, we performed a more detailed analysis of 10 topics. RESULTS: The best-performing word-based automated query parameter settings achieved a mean B-Pref of 0.167 across all 56 topics. The way a topic was structured (topic representation) had the largest impact on performance. Performance not only varied widely across topics, but there was also a large variance in sensitivity to parameter settings across the topics. Structured queries generally performed better than automated queries on measures of recall and precision but were still not able to recall all relevant patients found by the automated queries. CONCLUSION: While word-based automated methods of cohort retrieval offer an attractive alternative to the labor-intensive approaches currently used for this task at many medical centers, we generally found suboptimal performance in those approaches, with better performance obtained from structured Boolean queries. Future work will focus on using the test collection to develop and evaluate new approaches to query structure, weighting algorithms, and application of semantic methods.
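The record above scores patient-level retrieval with B-Pref, a measure designed for incomplete relevance judgments. A minimal sketch of the standard bpref definition used in TREC-style evaluation: for each retrieved relevant patient, subtract the fraction of judged-nonrelevant patients ranked above it (counting at most R of them), normalize by min(R, N), and average over the R relevant patients; unjudged patients are ignored. The patient IDs and topics are hypothetical, and this is not the authors' evaluation code.

```python
from statistics import mean


def bpref(ranking: list[str], relevant: set[str], nonrelevant: set[str]) -> float:
    """bpref for one topic, given a ranked list of patient IDs.

    Unjudged patients in the ranking are skipped. For each retrieved relevant
    patient, the penalty counts judged-nonrelevant patients ranked above it,
    capped at R (only the first R nonrelevant ones count).
    """
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        raise ValueError("bpref is undefined for a topic with no relevant patients")
    denom = min(R, N)
    nonrel_above = 0
    total = 0.0
    for patient in ranking:
        if patient in relevant:
            penalty = min(nonrel_above, R) / denom if denom else 0.0
            total += 1.0 - penalty
        elif patient in nonrelevant:
            nonrel_above += 1
        # Anything else is unjudged and does not affect the score.
    return total / R


topic_1 = bpref(
    ranking=["p1", "p7", "p3", "p9", "p2"],  # p9 is unjudged
    relevant={"p1", "p2", "p3"},
    nonrelevant={"p7", "p8"},
)
topic_2 = bpref(ranking=["q2", "q1"], relevant={"q1"}, nonrelevant={"q2"})
print(round(topic_1, 3))                     # 0.667
print(round(mean([topic_1, topic_2]), 3))   # 0.333, mean over topics
```

A "mean B-Pref across all 56 topics", as reported, is the same per-topic score averaged over every topic.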

8.

Correction: Detecting rare diseases in electronic health records using machine learning and knowledge engineering: Case study of acute hepatic porphyria.

Cohen, Aaron M; Chamberlin, Steven; Deloughery, Thomas; Nguyen, Michelle; Bedrick, Steven; Meninger, Stephen; Ko, John J; Amin, Jigar J; Wei, Alex H; Hersh, William.

PLoS One ; 15(8): e0238277, 2020.

Article in English | MEDLINE | ID: mdl-32817711

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0235574.].

9.

Detecting rare diseases in electronic health records using machine learning and knowledge engineering: Case study of acute hepatic porphyria.

Cohen, Aaron M; Chamberlin, Steven; Deloughery, Thomas; Nguyen, Michelle; Bedrick, Steven; Meninger, Stephen; Ko, John J; Amin, Jigar J; Wei, Alex J; Hersh, William.

PLoS One ; 15(7): e0235574, 2020.

Article in English | MEDLINE | ID: mdl-32614911

ABSTRACT

BACKGROUND: With the growing adoption of the electronic health record (EHR) worldwide over the last decade, new opportunities exist for leveraging EHR data for detection of rare diseases. Rare diseases are often undiagnosed, or diagnosed late, by clinicians who encounter them infrequently. One such rare disease that may be amenable to EHR-based detection is acute hepatic porphyria (AHP). AHP consists of a family of rare, metabolic diseases characterized by potentially life-threatening acute attacks and chronic debilitating symptoms. The goal of this study was to apply machine learning and knowledge engineering to a large extract of EHR data to determine whether they could be effective in identifying patients not previously tested for AHP who should receive a proper diagnostic workup for AHP. METHODS AND FINDINGS: We used an extract of the complete EHR data of 200,000 patients from an academic medical center and enriched it with records from an additional 5,571 patients containing any mention of porphyria in the record. After manually reviewing the records of all 47 unique patients with the ICD-10-CM code E80.21 (Acute intermittent [hepatic] porphyria), we identified 30 patients who were positive cases for our machine learning models, with the rest of the patients used as negative cases. We parsed the records into features, which were scored by frequency of appearance and filtered using univariate feature analysis. We manually chose features not directly tied to provider attributes or suspicion of the patient having AHP. We trained on the full dataset, with the best cross-validation performance coming from a support vector machine (SVM) algorithm using a radial basis function (RBF) kernel. The trained model was applied back to the full data set and patients were ranked by margin distance. The top 100 ranked negative cases were manually reviewed for symptom complexes similar to AHP, finding four patients for whom AHP diagnostic testing was likely indicated and 18 patients for whom it was possibly indicated. From the top 100 ranked cases of patients with mention of porphyria in their record, we identified four patients for whom AHP diagnostic testing was possibly indicated and had not been previously performed. Based solely on the reported prevalence of AHP, we would have expected only 0.002 cases out of the 200 patients manually reviewed. CONCLUSIONS: The application of machine learning and knowledge engineering to EHR data may facilitate the diagnosis of rare diseases such as AHP. Further work will recommend clinical investigation to identified patients' clinicians, evaluate more patients, assess additional feature selection and machine learning algorithms, and apply this methodology to other rare diseases. This work provides strong evidence that population-level informatics can be applied to rare diseases, greatly improving our ability to identify undiagnosed patients, and in the future improve the care of these patients and our ability to study these diseases. The next step is to learn how best to apply these EHR-based machine learning approaches to benefit individual patients with a clinical study that provides diagnostic testing and clinical follow-up for those identified as possibly having undiagnosed AHP.

Subjects

Knowledge , Machine Learning , Porphobilinogen Synthase/deficiency , Porphyrias, Hepatic/diagnosis , Databases, Factual , Electronic Health Records , Female , Humans , Male , Porphyrias, Hepatic/pathology

10.

Identifying main finding sentences in clinical case reports.

Luo, Mengqi; Cohen, Aaron M; Addepalli, Sidharth; Smalheiser, Neil R.

Database (Oxford) ; 2020, 2020 01 01.

Article in English | MEDLINE | ID: mdl-32525207

ABSTRACT

Clinical case reports are the 'eyewitness reports' of medicine and provide a valuable, unique, albeit noisy and underutilized type of evidence. Generally, a case report has a single main finding that represents the reason for writing up the report in the first place. However, no one has previously created an automatic way of identifying main finding sentences in case reports. We previously created a manual corpus of main finding sentences extracted from the abstracts and full text of clinical case reports. Here, we have utilized the corpus to create a machine learning-based model that automatically predicts which sentence(s) from abstracts state the main finding. The model has been evaluated on a separate manual corpus of clinical case reports and found to have good performance. This is a step toward setting up a retrieval system in which, given one case report, one can find other case reports that report the same or very similar main findings. The code and necessary files to run the main finding model can be downloaded from https://github.com/qi29/main_finding_recognition, released under the Apache License, Version 2.0.

Subjects

Data Mining/methods , Machine Learning , Medical Records/classification , Humans , Natural Language Processing , Software

11.

Modelling disease risk for amyloid A (AA) amyloidosis in non-human primates using machine learning.

Leung, Eric T; Raboin, Michael J; McKelvey, Jessica; Graham, Adam; Lewis, Anne; Prongay, Kamm; Cohen, Aaron M; Vinson, Amanda.

Amyloid ; 26(3): 139-147, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31210531

ABSTRACT

Objective: Amyloid A (AA) amyloidosis is found in humans and non-human primates, but quantifying disease risk prior to clinical symptoms is challenging. We applied machine learning to identify the best predictors of amyloidosis in rhesus macaques from available clinical and pathology records. To explore potential biomarkers, we also assessed whether changes in circulating serum amyloid A (SAA) or lipoprotein profiles accompany the disease. Methods: We conducted a retrospective study using 86 cases and 163 controls matched for age and sex. We performed data reduction on 62 clinical, pathological and demographic variables, and applied multivariate modelling and model selection with cross-validation. To test the performance of our final model, we applied it to a replication cohort of 2,775 macaques. Results: The strongest predictors of disease were colitis, gastrointestinal adenocarcinoma, endometriosis, arthritis, trauma, diarrhoea and number of pregnancies. Sensitivity and specificity of the risk model were predicted to be 82%, and were assessed at 79% and 72%, respectively. Total, low density lipoprotein and high density lipoprotein cholesterol levels were significantly lower, and SAA levels and triglyceride-to-HDL ratios were significantly higher in cases versus controls. Conclusion: Machine learning is a powerful approach to identifying macaques at risk of AA amyloidosis, which is accompanied by increased circulating SAA and altered lipoprotein profiles.

Subjects

Amyloidosis/diagnosis , Machine Learning/statistics & numerical data , Models, Statistical , Serum Amyloid A Protein/metabolism , Adenocarcinoma/diagnosis , Adenocarcinoma/physiopathology , Amyloidosis/blood , Amyloidosis/physiopathology , Animals , Arthritis/diagnosis , Arthritis/physiopathology , Biomarkers/blood , Case-Control Studies , Cholesterol, HDL/blood , Cholesterol, LDL/blood , Colitis/diagnosis , Colitis/physiopathology , Diarrhea/diagnosis , Diarrhea/physiopathology , Disease Models, Animal , Endometriosis/diagnosis , Endometriosis/physiopathology , Female , Gastrointestinal Neoplasms/diagnosis , Gastrointestinal Neoplasms/physiopathology , Humans , Macaca mulatta , Male , Retrospective Studies , Risk Factors , Triglycerides/blood , Wounds and Injuries/diagnosis , Wounds and Injuries/physiopathology

12.

Unsupervised low-dimensional vector representations for words, phrases and text that are transparent, scalable, and produce similarity metrics that are not redundant with neural embeddings.

Smalheiser, Neil R; Cohen, Aaron M; Bonifield, Gary.

J Biomed Inform ; 90: 103096, 2019 02.

Article in English | MEDLINE | ID: mdl-30654030

ABSTRACT

Neural embeddings are a popular set of methods for representing words, phrases or text as a low dimensional vector (typically 50-500 dimensions). However, it is difficult to interpret these dimensions in a meaningful manner, and creating neural embeddings requires extensive training and tuning of multiple parameters and hyperparameters. We present here a simple unsupervised method for representing words, phrases or text as a low dimensional vector, in which the meaning and relative importance of dimensions is transparent to inspection. We have created a near-comprehensive vector representation of words, and selected bigrams, trigrams and abbreviations, using the set of titles and abstracts in PubMed as a corpus. This vector is used to create several novel implicit word-word and text-text similarity metrics. The implicit word-word similarity metrics correlate well with human judgement of word pair similarity and relatedness, and outperform or equal all other reported methods on a variety of biomedical benchmarks, including several implementations of neural embeddings trained on PubMed corpora. Our implicit word-word metrics capture different aspects of word-word relatedness than word2vec-based metrics and are only partially correlated (rho = 0.5-0.8 depending on task and corpus). The vector representations of words, bigrams, trigrams, abbreviations, and PubMed title + abstracts are all publicly available from http://arrowsmith.psych.uic.edu/arrowsmith_uic/word_similarity_metrics.html for release under CC-BY-NC license. Several public web query interfaces are also available at the same site, including one which allows the user to specify a given word and view its most closely related terms according to direct co-occurrence as well as different implicit similarity metrics.

Subjects

Data Mining , PubMed , Semantics

13.

Towards augmenting structured EHR data: a comparison of manual chart review and patient self-report.

Weiskopf, Nicole G; Cohen, Aaron M; Hannan, Joely; Jarmon, Thad; Dorr, David A.

AMIA Annu Symp Proc ; 2019: 903-912, 2019.

Article in English | MEDLINE | ID: mdl-32308887

ABSTRACT

Structured electronic health record (EHR) data are often used for quality measurement and improvement, clinical research, and other secondary uses. These data, however, are known to suffer from quality problems. There may be value in augmenting structured EHR data to improve data quality, thereby improving the reliability and validity of the conclusions drawn from those data. Focusing on five diagnoses related to cardiovascular care, this paper considers the added value of two alternative data sources: manual chart abstraction and patient self-report. We assess the overall agreement between structured EHR problem list data, abstracted EHR data, and patient self-report; and explore possible causes of disagreement between those sources. Our findings suggest that both chart abstraction and patient self-report contain significantly more diagnoses than the problem list, but that the information they capture is different. Methods for collecting and validating self-reported medical data require further consideration and exploration.

Subjects

Electronic Health Records , Information Storage and Retrieval , Self Report , Adult , Aged , Aged, 80 and over , Data Accuracy , Female , Humans , Male , Medical Records, Problem-Oriented , Middle Aged , Reproducibility of Results , Young Adult

14.

A probabilistic automated tagger to identify human-related publications.

Cohen, Aaron M; Dunivin, Zackary O; Smalheiser, Neil R.

Database (Oxford) ; 2018: 1-8, 2018 01 01.

Article in English | MEDLINE | ID: mdl-30184195

ABSTRACT

The Medical Subject Heading 'Humans' is manually curated and indicates human-related studies within MEDLINE. However, newly published MEDLINE articles may take months to be indexed and non-MEDLINE articles lack consistent, transparent indexing of this feature. Therefore, for up-to-date and broad literature searches, there is a need for an independent automated system to identify whether a given publication is human-related, particularly when articles lack Medical Subject Headings. One million MEDLINE records published in 1987-2014 were randomly selected. Text-based features from the title, abstract, author name and journal fields were extracted. A linear support vector machine was trained to estimate the probability that a given article should be indexed as Humans and was evaluated on records from 2015 to 2016. Overall accuracy was high: area under the receiver operating characteristic curve = 0.976, F1 = 95% relative to MeSH indexing. Manual review of cases of extreme disagreement with MEDLINE showed 73.5% agreement with the automated prediction. We have tagged all articles indexed in PubMed with predictive scores and have made the information publicly available at http://arrowsmith.psych.uic.edu/evidence_based_medicine/index.html. We have also made available a web-based interface to allow users to obtain predictive scores for non-MEDLINE articles. This will assist in the triage of clinical evidence for writing systematic reviews.

Subjects

Automation , Probability , Publications , Calibration , Databases as Topic , Humans , Reproducibility of Results

15.

Design of a generic, open platform for machine learning-assisted indexing and clustering of articles in PubMed, a biomedical bibliographic database.

Smalheiser, Neil R; Cohen, Aaron M.

Data Inf Manag ; 2(1): 27-36, 2018 Jun.

Article in English | MEDLINE | ID: mdl-30766970

ABSTRACT

Many investigators have carried out text mining of the biomedical literature for a variety of purposes, ranging from the assignment of indexing terms to the disambiguation of author names. A common approach is to define positive and negative training examples, extract features from article metadata, and employ machine learning algorithms. At present, each research group tackles each problem from scratch, and in isolation of other projects, which causes redundancy and great waste of effort. Here, we propose and describe the design of a generic platform for biomedical text mining, which can serve as a shared resource for machine learning projects, and can serve as a public repository for their outputs. We will initially focus on a specific goal, namely, classifying articles according to Publication Type, and emphasize how feature sets can be made more powerful and robust through the use of multiple, heterogeneous similarity measures as input to machine learning models. We then discuss how the generic platform can be extended to include a wide variety of other machine learning based goals and projects, and can be used as a public platform for disseminating the results of NLP tools to end-users as well.

16.

Identifying reports of randomized controlled trials (RCTs) via a hybrid machine learning and crowdsourcing approach.

Wallace, Byron C; Noel-Storr, Anna; Marshall, Iain J; Cohen, Aaron M; Smalheiser, Neil R; Thomas, James.

J Am Med Inform Assoc ; 24(6): 1165-1168, 2017 Nov 01.

Article in English | MEDLINE | ID: mdl-28541493

ABSTRACT

OBJECTIVES: Identifying all published reports of randomized controlled trials (RCTs) is an important aim, but it requires extensive manual effort to separate RCTs from non-RCTs, even using current machine learning (ML) approaches. We aimed to make this process more efficient via a hybrid approach using both crowdsourcing and ML. METHODS: We trained a classifier to discriminate between citations that describe RCTs and those that do not. We then adopted a simple strategy of automatically excluding citations deemed very unlikely to be RCTs by the classifier and deferring to crowdworkers otherwise. RESULTS: Combining ML and crowdsourcing provides a highly sensitive RCT identification strategy (our estimates suggest 95%-99% recall) with substantially less effort (we observed a reduction of around 60%-80%) than relying on manual screening alone. CONCLUSIONS: Hybrid crowd-ML strategies warrant further exploration for biomedical curation/annotation tasks.

Subjects

Crowdsourcing , Information Storage and Retrieval/methods , Machine Learning , Randomized Controlled Trials as Topic , Biomedical Research , Databases, Bibliographic , Natural Language Processing , ROC Curve , Review Literature as Topic , Support Vector Machine
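The record above describes a simple hybrid rule: citations the classifier deems very unlikely to be RCTs are excluded automatically, and all others are deferred to crowdworkers. A minimal sketch of that routing, where the classifier scores, the crowd labels, and the cut-off of 0.1 are all hypothetical placeholders (the abstract does not state the threshold used):

```python
from typing import Callable, Iterable


def hybrid_triage(
    citations: Iterable[str],
    rct_probability: Callable[[str], float],
    crowd_says_rct: Callable[[str], bool],
    threshold: float = 0.1,  # hypothetical cut-off, not taken from the paper
) -> tuple[list[str], float]:
    """Return (citations judged to be RCTs, fraction of manual screening avoided).

    Citations scoring below `threshold` are excluded automatically; everything
    else is deferred to crowdworkers for a human judgment.
    """
    items = list(citations)
    deferred = [c for c in items if rct_probability(c) >= threshold]
    accepted = [c for c in deferred if crowd_says_rct(c)]
    saved = 1 - len(deferred) / len(items) if items else 0.0
    return accepted, saved


# Hypothetical classifier scores and crowd judgments.
scores = {"c1": 0.92, "c2": 0.03, "c3": 0.40, "c4": 0.01, "c5": 0.07}
crowd_labels = {"c1": True, "c3": False}  # only deferred citations need a label

accepted, saved = hybrid_triage(
    scores,
    rct_probability=lambda c: scores[c],
    crowd_says_rct=lambda c: crowd_labels[c],
)
print(accepted, round(saved, 2))  # ['c1'] 0.6
```

Raising the threshold saves more crowd effort but risks excluding true RCTs automatically; the recall and effort-reduction ranges in the abstract describe that trade-off for the authors' own classifier.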

17.

Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval.

Edinger, Tracy; Demner-Fushman, Dina; Cohen, Aaron M; Bedrick, Steven; Hersh, William.

AMIA Annu Symp Proc ; 2017: 660-669, 2017.

Article in English | MEDLINE | ID: mdl-29854131

ABSTRACT

Objective: Secondary use of electronic health record (EHR) data is enabled by accurate and complete retrieval of the relevant patient cohort, which requires searching both structured and unstructured data. Clinical text poses difficulties to searching, although chart notes incorporate structure that may facilitate accurate retrieval. Methods: We developed rules identifying clinical document sections, which can be indexed in search engines that allow faceted searches, such as Lucene or Essie, an NLM search engine. We developed 22 clinical cohorts and two queries for each cohort, one utilizing section headings and the other searching the whole document. We manually evaluated a subset of retrieved documents to compare query performance. Results: Querying by section had lower recall than whole-document queries (0.83 vs 0.95), higher precision (0.73 vs 0.54), and higher F1 (0.78 vs 0.69). Conclusion: This evaluation suggests that searching specific sections may improve precision under certain conditions and often with loss of recall.

Subjects

Electronic Health Records , Information Storage and Retrieval/methods , Search Engine , Abstracting and Indexing , Humans
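The F1 values in the record above follow directly from the reported precision and recall, since F1 is their harmonic mean. A short check using only the numbers in the abstract:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


section_f1 = f1(precision=0.73, recall=0.83)   # section-heading queries
document_f1 = f1(precision=0.54, recall=0.95)  # whole-document queries
print(round(section_f1, 2), round(document_f1, 2))  # 0.78 0.69
```

The harmonic mean is pulled toward the weaker of the two values, which is why the whole-document queries, despite recall of 0.95, end up with the lower F1.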

18.

A Mixed Methods Task Analysis of the Implementation and Validation of EHR-Based Clinical Quality Measures.

Weiskopf, Nicole G; Khan, Faiza J; Woodcock, Deborah; Dorr, David A; Cigarroa, Joaquin E; Cohen, Aaron M.

AMIA Annu Symp Proc ; 2016: 1229-1237, 2016.

Article in English | MEDLINE | ID: mdl-28269920

ABSTRACT

Clinical quality measures (CQMs) are important tools for the assessment and improvement of health care quality. Federal requirements initially set forth in the American Recovery and Reinvestment Act, and advanced in subsequent stages of the requirements, codified electronic health record (EHR)-based CQM reporting, and have made automated CQM implementation a priority amongst the clinical and informatics communities. Nevertheless, the processes surrounding CQM implementation and validation remain complex, time-consuming, and largely undefined. We collected issue-tracking data during the course of an agile and rigorous collaborative project to build an analytics platform for the Knight Cardiovascular Institute at OHSU, with nine heart failure CQMs defined by the American College of Cardiology (ACC) as an exemplar. Using a mixed methods approach we provide an overview of our CQM implementation and validation process, identify major roadblocks and bottlenecks, and make recommendations for other professionals working in the area of health care quality assessment and improvement.

Subjects

Electronic Health Records, Quality Assurance, Health Care/methods, Heart Failure, Humans, Quality of Health Care, United States, Validation Studies as Topic

19.

Plasma Exosomal miRNAs in Persons with and without Alzheimer Disease: Altered Expression and Prospects for Biomarkers.

Lugli, Giovanni; Cohen, Aaron M; Bennett, David A; Shah, Raj C; Fields, Christopher J; Hernandez, Alvaro G; Smalheiser, Neil R.

PLoS One ; 10(10): e0139233, 2015.

Article in English | MEDLINE | ID: mdl-26426747

ABSTRACT

To assess the value of exosomal miRNAs as biomarkers for Alzheimer disease (AD), the expression of microRNAs was measured in a plasma fraction enriched in exosomes by differential centrifugation, using Illumina deep sequencing. Samples from 35 persons with a clinical diagnosis of AD dementia were compared to 35 age- and sex-matched controls. Although these samples contained less than 0.1 microgram of total RNA, deep sequencing gave reliable and informative results. Twenty miRNAs showed significant differences in the AD group in initial screening (miR-23b-3p, miR-24-3p, miR-29b-3p, miR-125b-5p, miR-138-5p, miR-139-5p, miR-141-3p, miR-150-5p, miR-152-3p, miR-185-5p, miR-338-3p, miR-342-3p, miR-342-5p, miR-548at-5p, miR-659-5p, miR-3065-5p, miR-3613-3p, miR-3916, miR-4772-3p, miR-5001-3p), many of which satisfied additional biological and statistical criteria, and among which a panel of seven miRNAs was highly informative in a machine learning model for predicting the AD status of individual samples with 83-89% accuracy. This performance is not due to over-fitting, because a) we used separate samples for training and testing, and b) similar performance was achieved when tested on technical replicate data. Perhaps the most interesting single miRNA was miR-342-3p, which was a) expressed in the AD group at about 60% of control levels, b) highly correlated with several of the other miRNAs that were significantly down-regulated in AD, and c) also reported to be down-regulated in AD in two previous studies. The findings warrant replication and follow-up with a larger cohort of patients and controls who have been carefully characterized in terms of cognitive and imaging data, other biomarkers (e.g., CSF amyloid and tau levels) and risk factors (e.g., apoE4 status), and who are sampled repeatedly over time. Integrating miRNA expression data with other data is likely to provide informative and robust biomarkers in Alzheimer disease.

Subjects

Alzheimer Disease/genetics, Biomarkers, Tumor/metabolism, Exosomes/genetics, Gene Expression Regulation, Neoplastic, MicroRNAs/genetics, Plasma/metabolism, Animals, Case-Control Studies, Female, Gene Expression Profiling, High-Throughput Nucleotide Sequencing, Humans, Male, Mice

20.

Automated confidence ranked classification of randomized controlled trial articles: an aid to evidence-based medicine.

Cohen, Aaron M; Smalheiser, Neil R; McDonagh, Marian S; Yu, Clement; Adams, Clive E; Davis, John M; Yu, Philip S.

J Am Med Inform Assoc ; 22(3): 707-17, 2015 May.

Article in English | MEDLINE | ID: mdl-25656516

ABSTRACT

OBJECTIVE: For many literature review tasks, including systematic review (SR) and other aspects of evidence-based medicine, it is important to know whether an article describes a randomized controlled trial (RCT). Current manual annotation is not complete or flexible enough for the SR process. In this work, highly accurate machine learning predictive models were built that include confidence predictions of whether an article is an RCT. MATERIALS AND METHODS: The LibSVM classifier was used with forward selection of potential feature sets on a large human-related subset of MEDLINE to create a classification model requiring only the citation, abstract, and MeSH terms for each article. RESULTS: The model achieved an area under the receiver operating characteristic curve of 0.973 and a mean squared error of 0.013 on the held-out 2011 data. Accurate confidence estimates were confirmed on a manually reviewed set of test articles. A second model not requiring MeSH terms was also created and performs almost as well. DISCUSSION: Both models accurately rank and predict article RCT confidence. Using the model and the manually reviewed samples, it is estimated that about 8000 (3%) additional RCTs can be identified in MEDLINE, and that 5% of articles tagged as RCTs in MEDLINE may not be identified. CONCLUSION: Retagging human-related studies with a continuously valued RCT confidence is potentially more useful for article ranking and review than a simple yes/no prediction. The automated RCT tagging tool should offer significant savings of time and effort during the process of writing SRs, and is a key component of a multistep text mining pipeline that we are building to streamline SR workflow. In addition, the model may be useful for identifying errors in MEDLINE publication types.
The RCT confidence predictions described here have been made available to users as a web service with a user query form front end at: http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/RCT_Tagger.cgi.
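The two metrics reported for the RCT model, area under the ROC curve and mean squared error, both operate on continuous confidence scores rather than yes/no labels, which is why a confidence-valued tag supports ranking. A minimal Python sketch on invented toy data, assuming 0/1 labels (1 = RCT) and confidences in [0, 1], and assuming the MSE is taken between confidence and label; the function names and data are hypothetical and this is not the authors' LibSVM pipeline:

```python
from itertools import product


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outranks a random
    negative (Mann-Whitney formulation); tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_squared_error(labels: list[int], scores: list[float]) -> float:
    """Mean squared difference between confidence and the 0/1 label
    (for probabilistic predictions this is the Brier score)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


# Hypothetical toy data: 1 = article is an RCT, 0 = not an RCT.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.50, 0.10, 0.05]

print(f"AUC = {roc_auc(labels, scores):.3f}")
print(f"MSE = {mean_squared_error(labels, scores):.3f}")
```

Here one RCT (score 0.40) is ranked below one non-RCT (score 0.50), so 8 of the 9 positive/negative pairs are ordered correctly and the AUC is 8/9. A hard yes/no threshold would discard this ordering information entirely.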

Subjects

Artificial Intelligence, Information Storage and Retrieval/methods, Randomized Controlled Trials as Topic, Review Literature as Topic, Support Vector Machine, Evidence-Based Medicine, Humans, MEDLINE, ROC Curve
