1.
PLoS One ; 16(4): e0249410, 2021.
Article in English | MEDLINE | ID: mdl-33886589

ABSTRACT

Local alignment search tools report the quality of a result using statistical scores that are sensitive to the size of the database. For example, NCBI BLAST reports the best matches using similarity scores and expect values (i.e., e-values) calculated against the database size. Because genomics data grow rapidly, sequence databases keep growing over the course of a genomic research investigation as new sequences are continuously added. As a consequence, the results (e.g., best hits) and associated statistics (e.g., e-values) for a specific set of queries may change over the course of a genomic investigation. Thus, to update the results of a previously conducted BLAST search to find the best matches on an updated database, scientists must currently rerun the BLAST search against the entire updated database, which wastes execution time, money, and computational resources that cannot be recovered. To address this issue, we devise a novel and efficient method to redeem past BLAST searches by introducing iBLAST. iBLAST leverages previous BLAST search results to conduct the same query search but only on the incremental (i.e., newly added) part of the database, recomputes the associated critical statistics such as e-values, and combines these results to produce updated search results. Our experimental results and fidelity analyses show that iBLAST delivers search results that are identical to NCBI BLAST at a substantially reduced computational cost, i.e., iBLAST performs (1 + δ)/δ times faster than NCBI BLAST, where δ represents the fraction of database growth. We then present three different use cases to demonstrate that iBLAST can enable efficient biological discovery at a much faster speed with a substantially reduced computational cost.
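The incremental idea in this abstract can be illustrated with a minimal sketch. It is not iBLAST's implementation: it assumes the simplified Karlin-Altschul form E = K·m·n·e^(−λS), under which an e-value scales linearly with the database size n, and it ignores the effective-length corrections that real BLAST applies. The function names `rescale_evalue`, `merge_hits`, and `speedup` are hypothetical. The speedup function assumes search time is proportional to database size, so a full rerun costs (1 + δ) units while searching only the new part costs δ units.

```python
def rescale_evalue(evalue, searched_size, total_size):
    """Rescale an e-value to a larger database.

    Assumption: e-values grow linearly with database size
    (E = K*m*n*exp(-lambda*S)), ignoring edge-effect corrections.
    """
    return evalue * total_size / searched_size


def merge_hits(old_hits, new_hits, old_size, delta_size, top_k=5):
    """Combine hits from a past search with hits from the newly added part.

    old_hits: list of (subject_id, evalue) computed against old_size.
    new_hits: list of (subject_id, evalue) computed against delta_size.
    Returns the top_k hits with e-values expressed against the full database.
    """
    total = old_size + delta_size
    merged = [(s, rescale_evalue(e, old_size, total)) for s, e in old_hits]
    merged += [(s, rescale_evalue(e, delta_size, total)) for s, e in new_hits]
    merged.sort(key=lambda hit: hit[1])  # smaller e-value = better hit
    return merged[:top_k]


def speedup(delta):
    """Speedup of an incremental search over a full rerun.

    Assumption: run time is proportional to the number of residues searched,
    so the ratio is (old + new) / new = (1 + delta) / delta.
    """
    return (1 + delta) / delta


if __name__ == "__main__":
    old = [("seqA", 1e-10)]
    new = [("seqB", 1e-12)]
    print(merge_hits(old, new, old_size=1000, delta_size=1000))
    print(speedup(0.25))
```

Under these assumptions, a database that has grown by a quarter of its original size (δ = 0.25) would give a fivefold speedup, and smaller growth fractions give proportionally larger gains.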


Subject(s)
Computational Biology , Sequence Analysis, Protein/methods , Algorithms , Automation , Databases, Protein , Sequence Alignment , Software
2.
Sci Rep ; 10(1): 2022, 2020 Feb 06.
Article in English | MEDLINE | ID: mdl-32029803

ABSTRACT

Despite decades of research, effective treatments for most cancers remain elusive. One reason is that different instances of cancer result from different combinations of multiple genetic mutations (hits). Therefore, treatments that may be effective in some cases are not effective in others. We previously developed an algorithm for identifying combinations of carcinogenic genes with mutations (multi-hit combinations), which could suggest a likely cause for individual instances of cancer. Most cancers are estimated to require three or more hits. However, the computational complexity of the algorithm scales exponentially with the number of hits, making it impractical for identifying combinations of more than two hits. To identify combinations of more than two hits, we used a compressed binary matrix representation and optimized the algorithm for parallel execution on an NVIDIA V100 graphics processing unit (GPU). With these enhancements, the optimized GPU implementation of the 3-hit algorithm was on average an estimated 12,144 times faster than the original integer-matrix-based CPU implementation, allowing us to identify 3-hit combinations. The 3-hit combinations identified using a training set were able to differentiate between tumor and normal samples in a separate test set with 90% overall sensitivity and 93% overall specificity. We illustrate how the distribution of mutations in tumor and normal samples in the multi-hit gene combinations can suggest potential driver mutations for further investigation. With experimental validation, these combinations may provide insight into the etiology of cancer and a rational basis for targeted combination therapy.
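To make the compressed binary matrix idea concrete, here is a small CPU-only sketch in Python. It is an illustration under stated assumptions, not the authors' GPU code: each gene's row of 0/1 mutation flags (one per sample) is packed into a single integer, so testing which samples carry mutations in all three genes of a combination becomes two bitwise ANDs and a bit count. The scoring rule (tumor samples covered minus normal samples covered) and the names `pack_rows`, `count_samples_with_all`, and `best_three_hit` are hypothetical.

```python
from itertools import combinations


def pack_rows(matrix):
    """Pack each gene's 0/1 row (one entry per sample) into an int bitmask."""
    packed = []
    for row in matrix:
        bits = 0
        for sample_index, mutated in enumerate(row):
            if mutated:
                bits |= 1 << sample_index
        packed.append(bits)
    return packed


def count_samples_with_all(packed, genes):
    """Count samples mutated in every gene of the combination."""
    mask = packed[genes[0]]
    for gene in genes[1:]:
        mask &= packed[gene]
    return bin(mask).count("1")


def best_three_hit(tumor, normal):
    """Exhaustively score all 3-gene combinations.

    Assumed score: tumor samples covered minus normal samples covered.
    """
    tumor_bits = pack_rows(tumor)
    normal_bits = pack_rows(normal)

    def score(genes):
        return (count_samples_with_all(tumor_bits, genes)
                - count_samples_with_all(normal_bits, genes))

    return max(combinations(range(len(tumor)), 3), key=score)


if __name__ == "__main__":
    # Rows are genes, columns are samples; the data are made up.
    tumor = [[1, 1, 1, 0], [1, 1, 0, 1], [1, 1, 1, 1], [0, 0, 1, 1]]
    normal = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 1]]
    print(best_three_hit(tumor, normal))
```

The number of 3-gene combinations grows with the cube of the gene count, which is why the per-combination work needs to be this cheap and why the search lends itself to parallel execution.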


Subject(s)
Algorithms , Biomarkers, Tumor/genetics , Computational Biology/instrumentation , Computer Graphics , Neoplasms/genetics , Antineoplastic Combined Chemotherapy Protocols/pharmacology , Antineoplastic Combined Chemotherapy Protocols/therapeutic use , Biomarkers, Tumor/antagonists & inhibitors , Carcinogenesis/genetics , Computational Biology/methods , Datasets as Topic , Humans , Molecular Targeted Therapy/methods , Mutation , Neoplasms/drug therapy , Oligonucleotide Array Sequence Analysis/instrumentation , Oligonucleotide Array Sequence Analysis/methods , Precision Medicine/methods , Time Factors
3.
Sci Rep ; 9(1): 18928, 2019 Dec 09.
Article in English | MEDLINE | ID: mdl-31819072

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

4.
Sci Rep ; 9(1): 1005, 2019 Jan 30.
Article in English | MEDLINE | ID: mdl-30700767

ABSTRACT

Cancer is known to result from a combination of a small number of genetic defects. However, the specific combinations of mutations responsible for the vast majority of cancers have not been identified. Current computational approaches focus on identifying driver genes and mutations. Although individually these mutations can increase the risk of cancer, they do not result in cancer without additional mutations. We present a fundamentally different approach for identifying the cause of individual instances of cancer: we search for combinations of genes with carcinogenic mutations (multi-hit combinations) instead of individual driver genes or mutations. We developed an algorithm that identified a set of multi-hit combinations that differentiate between tumor and normal tissue samples with 91% sensitivity (95% Confidence Interval (CI) = 89-92%) and 93% specificity (95% CI = 91-94%) on average for seventeen cancer types. We then present an approach based on mutational profile that can be used to distinguish between driver and passenger mutations within these genes. These combinations, with experimental validation, can aid in better diagnosis, provide insights into the etiology of cancer, and provide a rational basis for designing targeted combination therapies.
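The sensitivity and specificity figures quoted above can be read through a short sketch. It assumes one plausible classification rule that the abstract does not spell out: a sample is called "tumor" if it carries mutations in every gene of at least one multi-hit combination. The gene labels and sample data are made up, and the function names are hypothetical.

```python
def predicted_tumor(sample_genes, combos):
    """Assumed rule: tumor if all genes of any one combination are mutated."""
    return any(set(combo) <= sample_genes for combo in combos)


def sensitivity_specificity(tumor_samples, normal_samples, combos):
    """Return (sensitivity, specificity) of the rule on labeled samples.

    sensitivity = true positives / all tumor samples
    specificity = true negatives / all normal samples
    """
    true_pos = sum(predicted_tumor(s, combos) for s in tumor_samples)
    true_neg = sum(not predicted_tumor(s, combos) for s in normal_samples)
    return true_pos / len(tumor_samples), true_neg / len(normal_samples)


if __name__ == "__main__":
    # Each sample is the set of genes in which it carries a mutation.
    combos = [("A", "B"), ("C", "D")]
    tumor_samples = [{"A", "B"}, {"C", "D", "E"}, {"A"}, {"A", "B", "C"}]
    normal_samples = [{"A"}, {"C", "D"}, set(), {"B"}]
    print(sensitivity_specificity(tumor_samples, normal_samples, combos))
```

In this toy data three of four tumor samples match a combination and three of four normal samples match none, so both measures come out at 0.75.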


Subject(s)
Algorithms , Carcinogenesis/genetics , Databases, Genetic , Models, Genetic , Neoplasms/genetics , Computational Biology , Humans , Mutation