Results 1 - 10 of 10
1.
BMC Bioinformatics ; 24(1): 117, 2023 Mar 26.
Article in English | MEDLINE | ID: mdl-36967390

ABSTRACT

BACKGROUND: Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform. RESULTS: We present ElasticBLAST, a cloud-native application to perform BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and run the searches on thousands of virtual CPUs (if desired), deleting resources when it is done. It uses cloud-native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are user provided or from the National Center for Biotechnology Information. CONCLUSION: We show that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud, and we demonstrate this with two examples. At the same time, it hides much of the complexity of working in the cloud, lowering the barrier to moving work to the cloud.


Subject(s)
Cloud Computing , Software , Computational Biology/methods , Databases, Factual , Costs and Cost Analysis
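ElasticBLAST is described as scaling from a few to many thousands of queries by spreading the searches over many virtual CPUs. A common way to parallelize such a workload is to split the query FASTA file into fixed-size batches, each of which can then be searched as an independent job. The Python sketch below shows only that splitting step; the function names and the batch size are hypothetical illustrations and are not taken from ElasticBLAST's code or configuration.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Tuple[str, str]] = []
    header = None
    seq_parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(seq_parts)))
            header = line[1:]
            seq_parts = []
        else:
            seq_parts.append(line)
    if header is not None:
        records.append((header, "".join(seq_parts)))
    return records


def batch_queries(records: List[Tuple[str, str]], batch_size: int) -> Iterator[str]:
    """Yield FASTA-formatted batches holding at most batch_size records each.

    Hypothetical helper: each yielded batch would become one independent search job.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        yield "".join(f">{header}\n{seq}\n" for header, seq in chunk)
```

For example, three queries with `batch_size=2` yield two batches, the second holding a single query. Larger batches mean fewer jobs with more work each; how ElasticBLAST itself sizes and schedules its work is not reproduced here.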
2.
bioRxiv ; 2023 Jan 04.
Article in English | MEDLINE | ID: mdl-36789435

ABSTRACT

Background: Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform. Results: We present ElasticBLAST, a cloud-native application to perform BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and run the searches on thousands of virtual CPUs (if desired), deleting resources when it is done. It uses cloud-native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are user provided or from the National Center for Biotechnology Information. Conclusion: We show that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud, and we demonstrate this with two examples. At the same time, it hides much of the complexity of working in the cloud, lowering the barrier to moving work to the cloud.

3.
BMC Bioinformatics ; 20(1): 405, 2019 Jul 25.
Article in English | MEDLINE | ID: mdl-31345161

ABSTRACT

BACKGROUND: Next-generation sequencing technologies can produce tens of millions of reads, often paired-end, from transcripts or genomes. But few programs can align RNA on the genome and accurately discover introns, especially with long reads. We introduce Magic-BLAST, a new aligner based on ideas from the Magic pipeline. RESULTS: Magic-BLAST uses innovative techniques that include the optimization of a spliced alignment score and selective masking during seed selection. We evaluate how accurately Magic-BLAST maps short or long sequences and how well it discovers introns on real RNA-seq data sets from PacBio, Roche and Illumina runs and on six benchmarks, and we compare it with other popular aligners. Additionally, we look at alignments of human idealized RefSeq mRNA sequences perfectly matching the genome. CONCLUSIONS: We show that Magic-BLAST is the best at intron discovery over a wide range of conditions and the best at mapping reads longer than 250 bases, from any platform. It is versatile and robust to high levels of mismatches or extreme base composition, and reasonably fast. It can align reads to a BLAST database or a FASTA file. It can accept a FASTQ file as input or automatically retrieve an accession from the SRA repository at the NCBI.


Subject(s)
RNA/genetics , Sequence Alignment , Sequence Analysis, RNA/methods , Software , Algorithms , Base Sequence , Databases, Nucleic Acid , Humans , Introns/genetics , ROC Curve , Time Factors
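The Magic-BLAST abstract centers on intron discovery from spliced alignments. To make "discovering an intron" concrete, the sketch below takes the aligned segments of one read (read and genome coordinates, assumed already computed and sorted by read position) and reports each genomic gap between read-contiguous segments as a candidate intron, flagging whether it carries the canonical GT...AG dinucleotides. The `Segment` format, the `min_intron` threshold and the function name are assumptions for illustration; this is not Magic-BLAST's method, which optimizes a spliced alignment score.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    q_start: int  # 0-based inclusive start on the read
    q_end: int    # 0-based exclusive end on the read
    g_start: int  # 0-based inclusive start on the genome (plus strand assumed)
    g_end: int    # 0-based exclusive end on the genome


def candidate_introns(segments: List[Segment], genome: str,
                      min_intron: int = 20) -> List[Tuple[int, int, bool]]:
    """Return (intron_start, intron_end, is_gt_ag) for each genomic gap of at least
    min_intron bases between consecutive segments that are contiguous on the read."""
    introns = []
    for left, right in zip(segments, segments[1:]):
        # Only a gap on the genome with no gap on the read looks like a spliced intron.
        if left.q_end != right.q_start:
            continue
        start, end = left.g_end, right.g_start
        if end - start < min_intron:
            continue
        intron = genome[start:end]
        canonical = intron.startswith("GT") and intron.endswith("AG")
        introns.append((start, end, canonical))
    return introns
```

With a genome made of a 10-base exon, a 24-base GT...AG intron and a second 10-base exon, a read aligned as two adjacent 10-base segments yields one canonical candidate spanning genome positions 10 to 34.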
4.
Nucleic Acids Res ; 41(Web Server issue): W29-33, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23609542

ABSTRACT

The Basic Local Alignment Search Tool (BLAST) website at the National Center for Biotechnology Information (NCBI) is an important resource for searching and aligning sequences. A new BLAST report allows faster loading of alignments, adds navigation aids, allows easy downloading of subject sequences and reports, and has improved usability. Here, we describe these improvements to the BLAST report, discuss design decisions, describe other improvements to the search page and database documentation, and outline plans for future development. The NCBI BLAST URL is http://blast.ncbi.nlm.nih.gov.


Subject(s)
Sequence Alignment/methods , Software , Animals , Genomics , Internet , L-Gulonolactone Oxidase/genetics , Rats
5.
Biol Direct ; 7: 12, 2012 Apr 17.
Article in English | MEDLINE | ID: mdl-22510480

ABSTRACT

BACKGROUND: BLAST is a commonly used software package for comparing a query sequence to a database of known sequences; in this study, we focus on protein sequences. Position-specific-iterated BLAST (PSI-BLAST) iteratively searches a protein sequence database, using the matches in round i to construct a position-specific score matrix (PSSM) for searching the database in round i + 1. Biegert and Söding developed Context-sensitive BLAST (CS-BLAST), which combines information from searching the sequence database with information derived from a library of short protein profiles to achieve better homology detection than PSI-BLAST, which builds its PSSMs from scratch. RESULTS: We describe a new method, called domain enhanced lookup time accelerated BLAST (DELTA-BLAST), which searches a database of pre-constructed PSSMs before searching a protein-sequence database, to yield better homology detection. For its PSSMs, DELTA-BLAST employs a subset of NCBI's Conserved Domain Database (CDD). On a test set derived from ASTRAL, with one round of searching, DELTA-BLAST achieves a ROC5000 of 0.270 vs. 0.116 for CS-BLAST. The performance advantage diminishes in iterated searches, but DELTA-BLAST continues to achieve better ROC scores than CS-BLAST. CONCLUSIONS: DELTA-BLAST is a useful program for the detection of remote protein homologs. It is available under the "Protein BLAST" link at http://blast.ncbi.nlm.nih.gov.


Subject(s)
Databases, Protein , Protein Structure, Tertiary , Search Engine/methods , Software , Algorithms , Computational Biology/methods , Internet , ROC Curve , Reproducibility of Results , Sensitivity and Specificity , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Time Factors
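The abstract describes PSI-BLAST building a position-specific score matrix (PSSM) from the matches of one round and using it to search in the next. The sketch below shows that core idea in a deliberately simplified form, assuming gapless aligned matches, a uniform background distribution and a flat pseudocount; real PSI-BLAST also weights sequences and derives pseudocounts from a substitution matrix, and DELTA-BLAST instead starts from pre-constructed CDD PSSMs. All names and parameters here are hypothetical.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log2-odds PSSM from equal-length, gapless aligned protein sequences.

    Simplifications (assumptions): uniform background, flat pseudocount, no sequence weighting.
    """
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Start every residue type at the pseudocount, then add observed counts.
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            if seq[col] in counts:
                counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2((counts[aa] / total) / background) for aa in AMINO_ACIDS})
    return pssm


def score(pssm: List[Dict[str, float]], query: str) -> float:
    """Score a gapless query segment of the same length as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, query))
```

Iterating in the PSI-BLAST sense would mean collecting the new hits that score well against this matrix and rebuilding the matrix from them, which is the round i to round i + 1 step described above.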
6.
J Am Soc Nephrol ; 20(9): 2065-74, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19643930

ABSTRACT

One third of patients with type 1 diabetes and microalbuminuria experience an early, progressive decline in renal function that leads to advanced stages of chronic kidney disease and ESRD. We hypothesized that the urinary proteome may distinguish between stable renal function and early renal function decline among patients with type 1 diabetes and microalbuminuria. We followed patients with normal renal function and microalbuminuria for 10 to 12 yr and classified them into case patients (n = 21) with progressive early renal function decline and control subjects (n = 40) with stable renal function. Using liquid chromatography matrix-assisted laser desorption/ionization time-of-flight mass spectrometry, we identified three peptides that decreased in the urine of patients with early renal function decline [fragments of alpha1(IV) and alpha1(V) collagens and tenascin-X] and three peptides that increased (fragments of inositol pentakisphosphate 2-kinase, zona occludens 3, and FAT tumor suppressor 2). In renal biopsies from patients with early nephropathy from type 1 diabetes, we observed increased expression of inositol pentakisphosphate 2-kinase, which was present in granule-like cytoplasmic structures, and zona occludens 3. These results indicate that urinary peptide fragments reflect changes in expression of intact protein in the kidney, suggesting new potential mediators of diabetic nephropathy and candidate biomarkers for progressive renal function decline.


Subject(s)
Albuminuria/urine , Diabetes Mellitus, Type 1/urine , Diabetic Nephropathies/urine , Peptides/urine , Adult , Albuminuria/pathology , Albuminuria/physiopathology , Biomarkers/urine , Biopsy , Cadherins/urine , Carrier Proteins/urine , Diabetes Mellitus, Type 1/physiopathology , Diabetic Nephropathies/pathology , Diabetic Nephropathies/physiopathology , Disease Progression , Humans , Kidney/metabolism , Kidney/pathology , Membrane Proteins/urine , Phosphotransferases (Alcohol Group Acceptor)/urine , Poly(A)-Binding Proteins/metabolism , Predictive Value of Tests , Signal Transduction/physiology , T-Cell Intracellular Antigen-1 , Young Adult , Zonula Occludens Proteins
7.
Bioinformation ; 1(10): 396-405, 2007 Apr 10.
Article in English | MEDLINE | ID: mdl-17597929

ABSTRACT

UNLABELLED: In this paper we propose a data-based algorithm to marry existing biological knowledge (e.g., functional annotations of genes) with experimental data (gene expression profiles) in creating an overall dissimilarity that can be used with any clustering algorithm that uses a general dissimilarity matrix. We explore this idea with two publicly available gene expression data sets and functional annotations, and compare the results with clustering results that use only the experimental data. Although more elaborate evaluations might be called for, the present paper makes a strong case for utilizing existing biological information in the clustering process. AVAILABILITY: Supplement is available at www.somnathdatta.org/Supp/Bioinformation/appendix.pdf.
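The abstract describes merging expression profiles and functional annotations into one dissimilarity matrix that any dissimilarity-based clustering method can consume. A minimal sketch of such a combination follows, assuming a fixed convex weight between an expression term, (1 - Pearson correlation) / 2, and an annotation term, the Jaccard distance between annotation sets; both terms lie in [0, 1]. The weighting scheme, the annotation IDs and all names are hypothetical, and the paper's own algorithm (described as data-based) is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Set


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 for a constant profile (an assumed convention)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A & B| / |A | B|; two empty annotation sets are treated as identical."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def combined_dissimilarity(genes: List[str],
                           expr: Dict[str, List[float]],
                           annot: Dict[str, Set[str]],
                           weight: float = 0.5) -> List[List[float]]:
    """Symmetric matrix: weight * expression term + (1 - weight) * annotation term."""
    n = len(genes)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d_expr = (1.0 - pearson(expr[genes[i]], expr[genes[j]])) / 2.0
            d_annot = jaccard_distance(annot.get(genes[i], set()),
                                       annot.get(genes[j], set()))
            d[i][j] = d[j][i] = weight * d_expr + (1.0 - weight) * d_annot
    return d
```

Setting `weight=1.0` recovers an expression-only dissimilarity, which corresponds to the baseline the abstract compares against.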

8.
BMC Bioinformatics ; 7 Suppl 2: S8, 2006 Sep 06.
Article in English | MEDLINE | ID: mdl-17118151

ABSTRACT

BACKGROUND: Independent Component Analysis (ICA) proves to be useful in the analysis of neural activity, as it allows for identification of distinct sources of activity. Applied to measurements registered in a controlled setting and under exposure to an external stimulus, it can facilitate analysis of the impact of the stimulus on those sources. The link between the stimulus and a given source can be verified by a classifier that is able to "predict" the condition a given signal was registered under, solely based on the components. However, ICA's assumption about statistical independence of sources is often unrealistic and turns out to be insufficient to build an accurate classifier. Therefore, we propose to utilize a novel method, based on hybridization of ICA, multi-objective evolutionary algorithms (MOEA), and rough sets (RS), that attempts to improve the effectiveness of signal decomposition techniques by providing them with "classification-awareness." RESULTS: The preliminary results described here are very promising, and further investigation of other MOEAs and/or RS-based classification accuracy measures should be pursued. Even a quick visual analysis of those results can provide an interesting insight into the problem of neural activity analysis. CONCLUSION: We present a methodology of classificatory decomposition of signals. One of the main advantages of our approach is that, rather than solely relying on often unrealistic assumptions about statistical independence of sources, components are generated in light of the underlying classification problem itself.


Subject(s)
Brain/physiology , Computational Biology/methods , Evoked Potentials , Algorithms , Animals , Rats
9.
Conf Proc IEEE Eng Med Biol Soc ; 2006: 5515-8, 2006.
Article in English | MEDLINE | ID: mdl-17947147

ABSTRACT

Cluster analysis has become a standard part of gene expression analysis. In this paper, we propose a novel semi-supervised approach that offers the same flexibility as hierarchical clustering. Yet, along with the experimental gene expression data, it utilizes common biological information about different genes that is being compiled in various public, Web-accessible databases. We argue that such an approach is inherently superior to the standard unsupervised approach of grouping genes based on expression data alone. It is shown that our biologically supervised methods produce better clustering results than the corresponding unsupervised methods, as judged by the distance from the model temporal profiles. R code for the clustering algorithm is available from the authors upon request.


Subject(s)
Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/instrumentation , Gene Expression Profiling/methods , Algorithms , Artificial Intelligence , Computer Simulation , Gene Expression , Gene Expression Regulation , Humans , Models, Statistical , Models, Theoretical , Oligonucleotide Array Sequence Analysis , Pattern Recognition, Automated , Sequence Analysis, DNA
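The abstract positions its method as keeping the flexibility of hierarchical clustering. For reference, the sketch below implements plain, unsupervised average-linkage agglomerative clustering over a precomputed dissimilarity matrix, stopping when `k` clusters remain. It is not the authors' semi-supervised method or their R code; as an assumption, biological information would have to enter through the dissimilarity matrix (or a modified merge rule), and the names and stopping criterion are hypothetical. The naive pairwise search is cubic per merge and is meant only for small examples.

```python
from typing import List, Optional, Tuple


def average_linkage(d: List[List[float]], k: int) -> List[List[int]]:
    """Agglomerative clustering: repeatedly merge the two clusters with the smallest
    average pairwise dissimilarity until k clusters remain."""
    n = len(d)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best: Optional[Tuple[float, int, int]] = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        # b > a, so deleting b after extending a keeps index a valid.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

With two tight pairs of items (dissimilarity 1 within each pair, 9 across pairs), asking for two clusters returns the two pairs.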
10.
Conf Proc IEEE Eng Med Biol Soc ; 2006: 5798-801, 2006.
Article in English | MEDLINE | ID: mdl-17947169

ABSTRACT

Interpretation and classification of mass spectra are usually performed using a list of peaks as their mathematical representation. This makes peak detection a bottleneck of mass spectra analysis, since the quality and biological relevance of any results strongly depend on the accuracy of the peak detection process. Many algorithms use intensity to differentiate between peaks and noise and thus bias the detection process toward high-abundance peaks. However, important information may also be contained in the lower-intensity peaks that are more difficult to discover. We present an algorithm specifically designed for the detection of low-abundance peaks.


Subject(s)
Mass Spectrometry/methods , Pattern Recognition, Automated , Signal Processing, Computer-Assisted , Algorithms , Calibration , Humans , Models, Statistical , Models, Theoretical , Peptide Mapping , Protein Array Analysis , Proteins , Proteomics , Reproducibility of Results , Sensitivity and Specificity , Software
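The abstract argues that intensity-based criteria bias peak detection toward high-abundance peaks, but it does not detail the authors' algorithm. One widely used alternative, shown here as an assumption rather than as the paper's method, judges each local maximum against a local noise estimate: the median absolute deviation (MAD) within a sliding window. The window size, signal-to-noise threshold `snr` and noise floor `min_noise` (in intensity units) are hypothetical defaults.

```python
from statistics import median
from typing import List


def detect_peaks(intensity: List[float], half_window: int = 5,
                 snr: float = 3.0, min_noise: float = 0.05) -> List[int]:
    """Return indices of local maxima whose height above the local median exceeds
    snr times the local noise, estimated as the window's MAD floored at min_noise."""
    peaks = []
    n = len(intensity)
    for i in range(1, n - 1):
        # Candidate must be a local maximum (strictly above its left neighbour).
        if not (intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]):
            continue
        window = intensity[max(0, i - half_window):min(n, i + half_window + 1)]
        baseline = median(window)
        noise = max(median([abs(v - baseline) for v in window]), min_noise)
        # Height is judged relative to local noise, not to absolute intensity.
        if intensity[i] - baseline > snr * noise:
            peaks.append(i)
    return peaks
```

On a flat baseline of 1.0, a small bump to 2.0 is reported alongside a large peak of 100.0, while in a noisier stretch cycling through 1.0, 4.0 and 2.0 the 4.0 maxima are rejected, even though any single global intensity cutoff low enough to keep the 2.0 peak would keep them all.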