Search | BVS Regional Portal

Customizable Natural Language Processing Biomarker Extraction Tool.

Holmes, Benjamin; Chitale, Dhananjay; Loving, Joshua; Tran, Mary; Subramanian, Vinod; Berry, Anna; Rioth, Matthew; Warrier, Raghu; Brown, Thomas.

JCO Clin Cancer Inform; 5: 833-841, 2021 Aug.

Article in English | MEDLINE | ID: mdl-34406803

ABSTRACT

PURPOSE: Natural language processing (NLP) in pathology reports to extract biomarker information is an ongoing area of research. MetaMap is a natural language processing tool developed and funded by the National Library of Medicine to map biomedical text to the Unified Medical Language System Metathesaurus by applying specific tags to clinically relevant terms. Although results are useful without additional postprocessing, these tags lack important contextual information. METHODS: Our novel method takes terminology-driven semantic tags and incorporates those into a semantic frame that is task-specific to add necessary context to MetaMap. We use important contextual information to capture biomarker results to support Community Health System's use of Precision Medicine treatments for patients with cancer. For each biomarker, the name, type, numeric quantifiers, non-numeric qualifiers, and the time frame are extracted. These fields then associate biomarkers with their context in the pathology report such as test type, probe intensity, copy-number changes, and even failed results. A selection of 6,713 relevant reports contained the following standard-of-care biomarkers for metastatic breast cancer: breast cancer gene 1 and 2, estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and programmed death-ligand 1. RESULTS: The method was tested on pathology reports from the internal pathology laboratory at Henry Ford Health System. A certified tumor registrar reviewed 400 tests, which showed > 95% accuracy for all extracted biomarker types. CONCLUSION: Using this new method, it is possible to extract high-quality, contextual biomarker information, and this represents a significant advance in biomarker extraction.

Subjects

Natural Language Processing; Neoplasms; Biomarkers; Humans; Research Report
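
The semantic frame described in the abstract above (name, type, numeric quantifiers, non-numeric qualifiers, time frame) can be pictured as a small record filled in per biomarker mention. The sketch below is only an illustration under loud assumptions: the class `BiomarkerFrame`, the function `extract_frames`, and all regular expressions are hypothetical, and simple patterns stand in for the MetaMap semantic tags the authors actually use.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BiomarkerFrame:
    """Hypothetical frame; the fields mirror the list given in the abstract."""
    name: str
    test_type: Optional[str] = None
    numeric_quantifier: Optional[str] = None
    qualifier: Optional[str] = None
    time_frame: Optional[str] = None  # left unfilled in this toy example


# Toy patterns standing in for terminology-driven tags (assumptions, not MetaMap output).
NAME_PATTERNS = {
    "ER": re.compile(r"\b(?:ER|estrogen receptor)\b", re.I),
    "PR": re.compile(r"\b(?:PR|progesterone receptor)\b", re.I),
    "HER2": re.compile(r"\bHER-?2\b", re.I),
    "PD-L1": re.compile(r"\bPD-?L1\b", re.I),
}
TEST_TYPE = re.compile(r"\b(IHC|FISH)\b", re.I)
QUALIFIER = re.compile(r"\b(positive|negative|equivocal|failed)\b", re.I)
NUMERIC = re.compile(r"(\d+(?:\.\d+)?\s*%|\b[0-3]\+)")


def extract_frames(sentence: str) -> List[BiomarkerFrame]:
    """Build one frame per biomarker name found in a single report sentence."""
    test = TEST_TYPE.search(sentence)
    qual = QUALIFIER.search(sentence)
    num = NUMERIC.search(sentence)
    frames = []
    for name, pattern in NAME_PATTERNS.items():
        if pattern.search(sentence):
            frames.append(
                BiomarkerFrame(
                    name=name,
                    test_type=test.group(1).upper() if test else None,
                    numeric_quantifier=num.group(1) if num else None,
                    qualifier=qual.group(1).lower() if qual else None,
                )
            )
    return frames


if __name__ == "__main__":
    for frame in extract_frames("HER2 IHC 3+ (positive)"):
        print(frame)
```

A sentence-level frame like this keeps a result attached to its context (test type, score, qualifier), which is the contextual information the abstract says plain tags lack.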

A core genome approach that enables prospective and dynamic monitoring of infectious outbreaks.

Aggelen, Helen van; Kolde, Raivo; Chamarthi, Hareesh; Loving, Joshua; Fan, Yu; Fallon, John T; Huang, Weihua; Wang, Guiqing; Fortunato-Habib, Mary M; Carmona, Juan J; Gross, Brian D.

Sci Rep; 9(1): 7808, 2019 May 24.

Article in English | MEDLINE | ID: mdl-31127153

ABSTRACT

Whole-genome sequencing is increasingly adopted in clinical settings to identify pathogen transmissions, though largely as a retrospective tool. Prospective monitoring, in which samples are continuously added and compared to previous samples, can generate more actionable information. To enable prospective pathogen comparison, genomic relatedness metrics based on single-nucleotide differences must be consistent across time, efficient to compute and reliable for a large variety of samples. The choice of genomic regions to compare, i.e., the core genome, is critical to obtain a good metric. We propose a novel core genome method that selects conserved sequences in the reference genome by comparing its k-mer content to that of publicly available genome assemblies. The conserved-sequence genome is sample set-independent, which enables prospective pathogen monitoring. Based on clinical data sets of 3436 S. aureus, 1362 K. pneumoniae and 348 E. faecium samples, ROC curves demonstrate that the conserved-sequence genome disambiguates same-patient samples better than a core genome consisting of conserved genes. The conserved-sequence genome confirms outbreak samples with high sensitivity: in a set of 2335 S. aureus samples, it correctly identifies 44 out of 44 known outbreak samples, whereas the conserved-gene method confirms 38 known outbreak samples.

Subjects

Bacterial Infections/microbiology; Communicable Diseases/microbiology; Genome, Bacterial; Genomics/methods; Bacteria/genetics; Bacterial Infections/epidemiology; Communicable Diseases/epidemiology; Disease Outbreaks; Enterococcus faecium/genetics; Humans; Klebsiella pneumoniae/genetics; Molecular Epidemiology; Staphylococcus aureus/genetics; Whole Genome Sequencing
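
The idea in the abstract above, selecting conserved reference sequence by comparing its k-mer content to public assemblies, can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names, the `min_fraction` threshold, and the choice to ignore reverse complements are all assumptions made for brevity.

```python
from typing import Iterable, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All distinct k-mers of a sequence (forward strand only, an assumption)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_regions(
    reference: str,
    assemblies: Iterable[str],
    k: int,
    min_fraction: float = 1.0,
) -> List[Tuple[int, int]]:
    """Return half-open reference intervals covered by conserved k-mers.

    A reference k-mer counts as conserved when it occurs in at least
    `min_fraction` of the assemblies (hypothetical criterion).
    """
    assembly_kmers = [kmers(a, k) for a in assemblies]
    n = len(assembly_kmers)
    covered = [False] * len(reference)
    for i in range(len(reference) - k + 1):
        kmer = reference[i:i + k]
        hits = sum(kmer in s for s in assembly_kmers)
        if n and hits / n >= min_fraction:
            for j in range(i, i + k):
                covered[j] = True

    # Merge consecutive covered positions into intervals.
    regions: List[Tuple[int, int]] = []
    start = None
    for pos, flag in enumerate(covered):
        if flag and start is None:
            start = pos
        elif not flag and start is not None:
            regions.append((start, pos))
            start = None
    if start is not None:
        regions.append((start, len(reference)))
    return regions


if __name__ == "__main__":
    ref = "ACGTACGTTTGA"
    public = ["ACGTACGAAA", "GGACGTACGCC"]
    print(conserved_regions(ref, public, k=4))
```

Because the intervals depend only on the reference and the public assemblies, they do not change as new clinical samples arrive, which is the sample set-independence the abstract relies on for prospective monitoring.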

Integration of genomic and clinical data augments surveillance of healthcare-acquired infections.

Ward, Doyle V; Hoss, Andrew G; Kolde, Raivo; van Aggelen, Helen C; Loving, Joshua; Smith, Stephen A; Mack, Deborah A; Kathirvel, Raja; Halperin, Jeffery A; Buell, Douglas J; Wong, Brian E; Ashworth, Judy L; Fortunato-Habib, Mary M; Xu, Liyi; Barton, Bruce A; Lazar, Peter; Carmona, Juan J; Mathew, Jomol; Salgo, Ivan S; Gross, Brian D; Ellison, Richard T.

Infect Control Hosp Epidemiol; 40(6): 649-655, 2019 Jun.

Article in English | MEDLINE | ID: mdl-31012399

ABSTRACT

BACKGROUND: Determining infectious cross-transmission events in healthcare settings involves manual surveillance of case clusters by infection control personnel, followed by strain typing of clinical/environmental isolates suspected in said clusters. Recent advances in genomic sequencing and cloud computing now allow for the rapid molecular typing of infecting isolates. OBJECTIVE: To facilitate rapid recognition of transmission clusters, we aimed to assess infection control surveillance using whole-genome sequencing (WGS) of microbial pathogens to identify cross-transmission events for epidemiologic review. METHODS: Clinical isolates of Staphylococcus aureus, Enterococcus faecium, Pseudomonas aeruginosa, and Klebsiella pneumoniae were obtained prospectively at an academic medical center, from September 1, 2016, to September 30, 2017. Isolate genomes were sequenced, followed by single-nucleotide variant analysis; a cloud-computing platform was used for whole-genome sequence analysis and cluster identification. RESULTS: Most strains of the 4 studied pathogens were unrelated, and 34 potential transmission clusters were present. The characteristics of the potential clusters were complex and likely not identifiable by traditional surveillance alone. Notably, only 1 cluster had been suspected by routine manual surveillance. CONCLUSIONS: Our work supports the assertion that integration of genomic and clinical epidemiologic data can augment infection control surveillance for both the identification of cross-transmission events and the inclusion of missed and exclusion of misidentified outbreaks (ie, false alarms). The integration of clinical data is essential to prioritize suspect clusters for investigation, and for existing infections, a timely review of both the clinical and WGS results can hold promise to reduce HAIs. A richer understanding of cross-transmission events within healthcare settings will require the expansion of current surveillance approaches.

Subjects

Cross Infection/epidemiology; Genome, Bacterial; Infection Control/methods; Molecular Typing; Whole Genome Sequencing; Adolescent; Adult; Aged; Aged, 80 and over; Child; Child, Preschool; Cluster Analysis; Cross Infection/microbiology; Cross Infection/prevention & control; Disease Outbreaks; Female; Humans; Infant; Infant, Newborn; Male; Massachusetts; Middle Aged; Molecular Epidemiology/methods; Young Adult

BitPAl: a bit-parallel, general integer-scoring sequence alignment algorithm.

Loving, Joshua; Hernandez, Yozen; Benson, Gary.

Bioinformatics; 30(22): 3166-73, 2014 Nov 15.

Article in English | MEDLINE | ID: mdl-25075119

ABSTRACT

MOTIVATION: Mapping of high-throughput sequencing data and other bulk sequence comparison applications have motivated a search for high-efficiency sequence alignment algorithms. The bit-parallel approach represents individual cells in an alignment scoring matrix as bits in computer words and emulates the calculation of scores by a series of logic operations composed of AND, OR, XOR, complement, shift and addition. Bit-parallelism has been successfully applied to the longest common subsequence (LCS) and edit-distance problems, producing fast algorithms in practice. RESULTS: We have developed BitPAl, a bit-parallel algorithm for general, integer-scoring global alignment. Integer-scoring schemes assign integer weights for match, mismatch and insertion/deletion. The BitPAl method uses structural properties in the relationship between adjacent scores in the scoring matrix to construct classes of efficient algorithms, each designed for a particular set of weights. In timed tests, we show that BitPAl runs 7-25 times faster than a standard iterative algorithm. AVAILABILITY AND IMPLEMENTATION: Source code is freely available for download at http://lobstah.bu.edu/BitPAl/BitPAl.html. BitPAl is implemented in C and runs on all major operating systems. CONTACT: jloving@bu.edu or yhernand@bu.edu or gbenson@bu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms; Sequence Alignment/methods; High-Throughput Nucleotide Sequencing; Software
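
To make the bit-parallel approach from the abstract above concrete, the sketch below shows the simpler longest common subsequence (LCS) case that the abstract cites as prior work, not BitPAl itself. It packs one column of the scoring matrix into a single integer and updates it with AND, OR, addition, and subtraction. Python's unbounded integers stand in for machine words here, and the function names are illustrative; a plain dynamic-programming version is included only as a cross-check.

```python
def lcs_length_bitparallel(a: str, b: str) -> int:
    """LCS length using one big-integer bit vector per column.

    Bit i of `v` is 0 exactly where the LCS score increases at row i,
    so the answer is the number of zero bits among the low len(a) bits.
    """
    m = len(a)
    mask = (1 << m) - 1
    # match[ch] has bit i set when a[i] == ch.
    match = {}
    for i, ch in enumerate(a):
        match[ch] = match.get(ch, 0) | (1 << i)

    v = mask  # all ones: no matches recorded yet
    for ch in b:
        u = v & match.get(ch, 0)
        # Addition propagates carries through runs of ones; v - u clears
        # the matched bits (equivalent to v AND NOT match[ch]).
        v = ((v + u) | (v - u)) & mask
    return m - bin(v).count("1")


def lcs_length_dp(a: str, b: str) -> int:
    """Standard iterative dynamic programming, for comparison."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    s, t = "AGCAT", "GAC"
    print(lcs_length_bitparallel(s, t), lcs_length_dp(s, t))
```

The inner loop touches each character of `b` once and does a constant number of word-level operations per column, which is where the practical speedup over cell-by-cell iteration comes from. General integer-scoring alignment, as in BitPAl, requires tracking more score differences than this single vector.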

VNTRseek-a computational tool to detect tandem repeat variants in high-throughput sequencing data.

Gelfand, Yevgeniy; Hernandez, Yozen; Loving, Joshua; Benson, Gary.

Nucleic Acids Res; 42(14): 8884-94, 2014 Aug.

Article in English | MEDLINE | ID: mdl-25056320

ABSTRACT

DNA tandem repeats (TRs) are ubiquitous genomic features which consist of two or more adjacent copies of an underlying pattern sequence. The copies may be identical or approximate. Variable number of tandem repeats or VNTRs are polymorphic TR loci in which the number of pattern copies is variable. In this paper we describe VNTRseek, our software for discovery of minisatellite VNTRs (pattern size ≥ 7 nucleotides) using whole genome sequencing data. VNTRseek maps sequencing reads to a set of reference TRs and then identifies putative VNTRs based on a discrepancy between the copy number of a reference and its mapped reads. VNTRseek was used to analyze the Watson and Khoisan genomes (454 technology) and two 1000 Genomes family trios (Illumina). In the Watson genome, we identified 752 VNTRs with pattern sizes ranging from 7 to 84 nt. In the Khoisan genome, we identified 2572 VNTRs with pattern sizes ranging from 7 to 105 nt. In the trios, we identified between 2660 and 3822 VNTRs per individual and found nearly 100% consistency with Mendelian inheritance. VNTRseek is, to the best of our knowledge, the first software for genome-wide detection of minisatellite VNTRs. It is available at http://orca.bu.edu/vntrseek/.

Subjects

High-Throughput Nucleotide Sequencing/methods; Minisatellite Repeats; Sequence Analysis, DNA/methods; Software; Genome, Human; Genomics/methods; Humans; INDEL Mutation
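
The core test described in the abstract above, flagging a putative VNTR when a mapped read's copy number differs from the reference's, can be illustrated with a toy example. This is a sketch under strong assumptions: copies are matched exactly (VNTRseek also handles approximate copies), the read is assumed to span the whole repeat array, and the function names are hypothetical.

```python
def max_tandem_copies(seq: str, pattern: str) -> int:
    """Largest number of back-to-back exact copies of `pattern` in `seq`."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    best = 0
    step = len(pattern)
    i = seq.find(pattern)
    while i != -1:
        copies = 0
        j = i
        # Count consecutive copies starting at position i.
        while seq.startswith(pattern, j):
            copies += 1
            j += step
        best = max(best, copies)
        i = seq.find(pattern, i + 1)
    return best


def copy_number_change(reference_tr: str, read: str, pattern: str) -> int:
    """Read copies minus reference copies; nonzero suggests a putative VNTR."""
    return max_tandem_copies(read, pattern) - max_tandem_copies(reference_tr, pattern)


if __name__ == "__main__":
    unit = "ACGTTGC"  # 7 nt, the minimum minisatellite pattern size in the abstract
    reference = "TT" + unit * 3 + "GG"
    read = "TT" + unit * 5 + "GG"
    print(copy_number_change(reference, read, unit))
```

Requiring flanking sequence on both sides of the array in the read, as in the example strings, is what lets a copy count from a read be compared with the reference's count at all.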
