1.
Bioinformatics ; 38(19): 4466-4473, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35929780

ABSTRACT

MOTIVATION: Whole-genome sequencing has revolutionized biosciences by providing tools for constructing complete DNA sequences of individuals. With entire genomes at hand, scientists can pinpoint DNA fragments responsible for oncogenesis and predict patient responses to cancer treatments. Machine learning plays a paramount role in this process. However, the sheer volume of whole-genome data makes it difficult to encode the characteristics of genomic variants as features for learning algorithms.

RESULTS: In this article, we propose three feature extraction methods that facilitate classifier learning from sets of genomic variants. The core contributions of this work include: (i) strategies for determining features using variant length binning, clustering and density estimation; (ii) a programming library for automating distribution-based feature extraction in machine learning pipelines. The proposed methods have been validated on five real-world datasets using four different classification algorithms and a clustering approach. Experiments on genomes of 219 ovarian, 61 lung and 929 breast cancer patients show that the proposed approaches automatically identify genomic biomarkers associated with cancer subtypes and clinical response to oncological treatment. Finally, we show that the extracted features can be used alongside unsupervised learning methods to analyze genomic samples.

AVAILABILITY AND IMPLEMENTATION: The source code of the presented algorithms and reproducible experimental scripts are available on GitHub at https://github.com/MNMdiagnostics/dbfe.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
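The idea of variant length binning mentioned in the abstract can be illustrated with a minimal sketch. This is not the API of the dbfe library; the function name `length_bin_features`, the bin edges, and the sample data are all hypothetical and chosen only to show how a set of variant lengths might be turned into a fixed-length feature vector for a classifier.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical bin edges in base pairs; the abstract does not specify any.
BIN_EDGES = [1, 10, 100, 1_000, 10_000, 100_000]


def length_bin_features(
    variant_lengths: Sequence[int], edges: Sequence[int] = BIN_EDGES
) -> List[int]:
    """Count variants per length bin [edges[i], edges[i+1]).

    The last bin is open-ended; lengths below edges[0] are ignored.
    """
    counts = [0] * len(edges)
    for length in variant_lengths:
        idx = bisect_right(edges, length) - 1  # index of the bin's left edge
        if idx >= 0:
            counts[idx] += 1
    return counts


# Made-up variant lengths for two illustrative samples.
samples = {
    "sample_A": [3, 15, 250, 250, 12_000],
    "sample_B": [1, 2, 5, 80_000, 150_000],
}

# One row per sample, one column per length bin.
feature_matrix = {name: length_bin_features(v) for name, v in samples.items()}
for name, row in feature_matrix.items():
    print(name, row)
```

Each sample yields one count per bin, so genomes with different numbers of variants map to vectors of equal length that a standard classifier can consume.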


Subject(s)
Genome , Software , Humans , Genomics/methods , Algorithms , Machine Learning
2.
Mol Diagn Ther ; 26(1): 105-116, 2022 01.
Article in English | MEDLINE | ID: mdl-34932189

ABSTRACT

BACKGROUND AND OBJECTIVE: Human epidermal growth factor receptor 2 (HER2) protein overexpression is one of the most significant biomarkers for breast cancer diagnostics, treatment prediction, and prognostics. The high accessibility of HER2 inhibitors in routine clinical practice directly translates into the diagnostic need for precise and robust marker identification. Even though multigene next-generation sequencing methodologies have slowly taken over the field of single-biomarker molecular tests, copy number alterations such as amplification of the HER2-coding ERBB2 gene are hard to validate on next-generation sequencing platforms because they are characterized by chromosomal structural heterogeneity, polysomy, and the genomic context of ploidy. In our study, we tested the approach of using whole genome sequencing instead of next-generation sequencing panels to determine HER2 status in a clinical setting.

METHODS: We used a large dataset of whole genomes from 876 patients with breast cancer, with curated clinical data, and an additional external set of genomic data from 551 patients. We used a decision-tree-based algorithm to optimize the diagnostic tool for HER2 status assessment by whole genome sequencing.

RESULTS: The most efficient approach to assess HER2 status in whole genome sequencing data was the ploidy-corrected copy number, utilizing ERBB2 copy number and mean tumor ploidy. The classifier achieved a sensitivity of 91.18% and a specificity of 98.69% on the internal validation dataset, and 89.86% and 96.06%, respectively, on the external data, which is similar to other next-generation sequencing methods currently tested in the clinic.

CONCLUSIONS: We provide evidence that HER2 status may be reliably determined by whole genome sequencing and that the approach is applicable across different laboratory protocols and pipelines. We suggest using the ploidy-corrected copy number for diagnostic purposes.
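The abstract names the two inputs of the ploidy-corrected copy number (ERBB2 copy number and mean tumor ploidy) but gives neither the formula nor the decision threshold. The sketch below assumes the simplest reading, a ratio of the two, and uses a hypothetical cut-off of 2.0 and made-up cases. It only illustrates how such a score could be thresholded and how sensitivity and specificity are computed; it does not reproduce the study's classifier.

```python
from typing import List, Tuple

# Hypothetical cut-off; the abstract does not report the threshold used.
HYPOTHETICAL_THRESHOLD = 2.0


def ploidy_corrected_cn(erbb2_copy_number: float, mean_tumor_ploidy: float) -> float:
    """Assumed definition: ERBB2 copy number divided by mean tumor ploidy."""
    if mean_tumor_ploidy <= 0:
        raise ValueError("mean tumor ploidy must be positive")
    return erbb2_copy_number / mean_tumor_ploidy


def sensitivity_specificity(
    truth: List[bool], predicted: List[bool]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    return tp / (tp + fn), tn / (tn + fp)


# Made-up cases: (ERBB2 copy number, mean tumor ploidy, reference HER2 status).
cases = [
    (12.0, 3.0, True),
    (4.0, 4.0, False),
    (5.0, 2.0, True),
    (3.0, 2.0, False),
    (3.0, 2.0, True),
]

truth = [status for _, _, status in cases]
predicted = [
    ploidy_corrected_cn(cn, ploidy) >= HYPOTHETICAL_THRESHOLD
    for cn, ploidy, _ in cases
]
sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Dividing by ploidy is meant to keep a tumor with a globally elevated chromosome count from being called amplified merely because every gene, ERBB2 included, is present in extra copies.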


Subject(s)
Breast Neoplasms , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Breast Neoplasms/pathology , DNA Copy Number Variations , Female , Genes, erbB-2 , Humans , Ploidies , Receptor, ErbB-2/genetics , Whole Genome Sequencing