Results 1 - 14 of 14
1.
Mol Ther Nucleic Acids ; 35(2): 102202, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38846999

ABSTRACT

Splicing factor 3b subunit 1 (SF3B1) is the largest subunit and core component of the spliceosome. Inhibition of SF3B1 was associated with an increase in broad intron retention (IR) on most transcripts, suggesting that IR can be used as a marker of spliceosome inhibition in chronic lymphocytic leukemia (CLL) cells. Furthermore, we separately analyzed exonic and intronic mapped reads on annotated RNA-sequencing transcripts obtained from B cells (n = 98 CLL patients) and healthy volunteers (n = 9). We measured intron/exon ratio to use that as a surrogate for alternative RNA splicing (ARS) and found that 66% of CLL-B cell transcripts had significant IR elevation compared with normal B cells (NBCs) and that correlated with mRNA downregulation and low expression levels. Transcripts with the highest IR levels belonged to biological pathways associated with gene expression and RNA splicing. A >2-fold increase of active pSF3B1 was observed in CLL-B cells compared with NBCs. Additionally, when the CLL-B cells were treated with macrolides (pladienolide-B), a significant decrease in pSF3B1, but not total SF3B1 protein, was observed. These findings suggest that IR/ARS is increased in CLL, which is associated with SF3B1 phosphorylation and susceptibility to SF3B1 inhibitors. These data provide additional support to the relevance of ARS in carcinogenesis and evidence of pSF3B1 participation in this process.
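
The intron/exon ratio used above as a surrogate for intron retention can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `intron_exon_ratio`, the length normalization, and the example numbers are all assumptions, since the abstract does not state how the ratio was normalized.

```python
def intron_exon_ratio(intron_reads, intron_len, exon_reads, exon_len):
    """Return intronic read density divided by exonic read density.

    Densities are reads per base (an assumed normalization).
    Returns None when the ratio is undefined.
    """
    if intron_len <= 0 or exon_len <= 0 or exon_reads <= 0:
        return None
    intron_density = intron_reads / intron_len
    exon_density = exon_reads / exon_len
    return intron_density / exon_density


# Hypothetical transcript: 50 reads over 10 kb of introns,
# 400 reads over 2 kb of exons.
ratio = intron_exon_ratio(intron_reads=50, intron_len=10_000,
                          exon_reads=400, exon_len=2_000)
print(ratio)
```

A higher ratio in one sample group than in another, for the same transcript, would indicate relatively more intronic signal under these assumptions.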

2.
BMC Bioinformatics ; 24(1): 117, 2023 Mar 26.
Article in English | MEDLINE | ID: mdl-36967390

ABSTRACT

BACKGROUND: Biomedical researchers use alignments produced by BLAST (Basic Local Alignment Search Tool) to categorize their query sequences. Producing such alignments is an essential bioinformatics task that is well suited for the cloud. The cloud can perform many calculations quickly as well as store and access large volumes of data. Bioinformaticians can also use it to collaborate with other researchers, sharing their results, datasets and even their pipelines on a common platform. RESULTS: We present ElasticBLAST, a cloud native application to perform BLAST alignments in the cloud. ElasticBLAST can handle anywhere from a few to many thousands of queries and run the searches on thousands of virtual CPUs (if desired), deleting resources when it is done. It uses cloud native tools for orchestration and can request discounted instances, lowering cloud costs for users. It is supported on Amazon Web Services and Google Cloud Platform. It can search BLAST databases that are user provided or from the National Center for Biotechnology Information. CONCLUSION: We show that ElasticBLAST is a useful application that can efficiently perform BLAST searches for the user in the cloud, demonstrating that with two examples. At the same time, it hides much of the complexity of working in the cloud, lowering the threshold to move work to the cloud.


Subject(s)
Cloud Computing , Software , Computational Biology/methods , Databases, Factual , Costs and Cost Analysis
3.
J Bacteriol ; 204(6): e0007922, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35638784

ABSTRACT

The current classification of the phylum Firmicutes (new name, Bacillota) features eight distinct classes, six of which include known spore-forming bacteria. In Bacillus subtilis, sporulation involves up to 500 genes, many of which do not have orthologs in other bacilli and/or clostridia. Previous studies identified about 60 sporulation genes of B. subtilis that were shared by all spore-forming members of the Firmicutes. These genes are referred to as the sporulation core or signature, although many of these are also found in genomes of nonsporeformers. Using an expanded set of 180 firmicute genomes from 160 genera, including 76 spore-forming species, we investigated the conservation of the sporulation genes, in particular seeking to identify lineages that lack some of the genes from the conserved sporulation core. The results of this analysis confirmed that many small acid-soluble spore proteins (SASPs), spore coat proteins, and germination proteins, which were previously characterized in bacilli, are missing in spore-forming members of Clostridia and other classes of Firmicutes. A particularly dramatic loss of sporulation genes was observed in the spore-forming members of the families Planococcaceae and Erysipelotrichaceae. Fifteen species from diverse lineages were found to carry skin (sigK-interrupting) elements of different sizes that all encoded SpoIVCA-like recombinases but did not share any other genes. Phylogenetic trees built from concatenated alignments of sporulation proteins and ribosomal proteins showed similar topology, indicating an early origin and subsequent vertical inheritance of the sporulation genes. IMPORTANCE: Many members of the phylum Firmicutes (Bacillota) are capable of producing endospores, which enhance the survival of important Gram-positive pathogens that cause such diseases as anthrax, botulism, colitis, gas gangrene, and tetanus. We show that the core set of sporulation genes, defined previously through genome comparisons of several bacilli and clostridia, is conserved in a wide variety of sporeformers from several distinct lineages of Firmicutes. We also detected widespread loss of sporulation genes in many organisms, particularly within the families Planococcaceae and Erysipelotrichaceae. Members of these families, such as Lysinibacillus sphaericus and Clostridium innocuum, could be excellent model organisms for studying sporulation mechanisms, such as engulfment, formation of the spore coat, and spore germination.


Subject(s)
Bacillus , Spores, Bacterial , Bacillus subtilis/genetics , Bacterial Proteins/genetics , Clostridium/genetics , Firmicutes , Humans , Phylogeny , Spores, Bacterial/genetics
4.
J Bacteriol ; 203(11), 2021 06 01.
Article in English | MEDLINE | ID: mdl-33753464

ABSTRACT

Ribosomal proteins (RPs) are highly conserved across the bacterial and archaeal domains. Although many RPs are essential for survival, genome analysis demonstrates the absence of some RP genes in many bacterial and archaeal genomes. Furthermore, global transposon mutagenesis and/or targeted deletion showed that elimination of some RP genes had only a moderate effect on the bacterial growth rate. Here, we systematically analyze the evolutionary conservation of RPs in prokaryotes by compiling the list of the ribosomal genes that are missing from one or more genomes in the recently updated version of the Clusters of Orthologous Genes (COG) database. Some of these absences occurred because the respective genes carried frameshifts, presumably resulting from sequencing errors, while others were overlooked and not translated during genome annotation. Apart from these annotation errors, we identified multiple genuine losses of RP genes in a variety of bacteria and archaea. Some of these losses are clade-specific, whereas others occur in symbionts and parasites with dramatically reduced genomes. The lists of computationally and experimentally defined non-essential ribosomal genes show a substantial overlap, revealing a common trend in prokaryote ribosome evolution that could be linked to the architecture and assembly of the ribosomes. Thus, RPs that are located at the surface of the ribosome and/or are incorporated at a late stage of ribosome assembly are more likely to be non-essential and to be lost during microbial evolution, particularly in the course of genome compaction. IMPORTANCE: In many prokaryote genomes, one or more ribosomal protein (RP) genes are missing. Analysis of 1,309 prokaryote genomes included in the COG database shows that only about half of the RPs are universally conserved in bacteria and archaea. In contrast, up to 16 other RPs are missing in some genomes, primarily in tiny (<1 Mb) genomes of host-associated bacteria and archaea. Ten universal and nine archaea-specific ribosomal proteins show clear patterns of lineage-specific gene loss. Most of the RPs that are frequently lost from bacterial genomes are located on the ribosome periphery and are non-essential in Escherichia coli and Bacillus subtilis. These results reveal general trends and common constraints in the architecture and evolution of ribosomes in prokaryotes.

5.
J Proteome Res ; 20(4): 2056-2061, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33625229

ABSTRACT

BioContainers is an open-source project that aims to create, store, and distribute bioinformatics software containers and packages. The BioContainers community has developed a set of guidelines to standardize software containers including the metadata, versions, licenses, and software dependencies. BioContainers supports multiple packaging and container technologies such as Conda, Docker, and Singularity. The BioContainers provide over 9000 bioinformatics tools, including more than 200 proteomics and mass spectrometry tools. Here we introduce the BioContainers Registry and Restful API to make containerized bioinformatics tools more findable, accessible, interoperable, and reusable (FAIR). The BioContainers Registry provides a fast and convenient way to find and retrieve bioinformatics tool packages and containers. By doing so, it will increase the use of bioinformatics packages and containers while promoting replicability and reproducibility in research.


Subject(s)
Computational Biology , Proteomics , Registries , Reproducibility of Results , Software
6.
Gigascience ; 10(1), 2021 01 07.
Article in English | MEDLINE | ID: mdl-33410471

ABSTRACT

BACKGROUND: FAIR (Findability, Accessibility, Interoperability, and Reusability) next-generation sequencing (NGS) data analysis relies on complex computational biology workflows and pipelines to guarantee reproducibility, portability, and scalability. Moreover, workflow languages, managers, and container technologies have helped address the problem of data analysis pipeline execution across multiple platforms in scalable ways. FINDINGS: Here, we present a project management framework for NGS data analysis called PM4NGS. This framework is composed of an automatic creation of a standard organizational structure of directories and files, bioinformatics tool management using Docker or Bioconda, and data analysis pipelines in CWL format. Pre-configured Jupyter notebooks with minimum Python code are included in PM4NGS to produce a project report and publication-ready figures. We present 3 pipelines for demonstration purposes including the analysis of RNA-Seq, ChIP-Seq, and ChIP-exo datasets. CONCLUSIONS: PM4NGS is an open source framework that creates a standard organizational structure for NGS data analysis projects. PM4NGS is easy to install, configure, and use by non-bioinformaticians on personal computers and laptops. It permits execution of the NGS data analysis on Windows 10 with the Windows Subsystem for Linux feature activated. The framework aims to reduce the gap between researchers in experimental laboratories producing NGS data and workflows for data analysis. PM4NGS documentation can be accessed at https://pm4ngs.readthedocs.io/.


Subject(s)
Data Analysis , Software , Computational Biology , High-Throughput Nucleotide Sequencing , Reproducibility of Results
7.
Nucleic Acids Res ; 49(D1): D274-D281, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33167031

ABSTRACT

The Clusters of Orthologous Genes (COG) database, also referred to as the Clusters of Orthologous Groups of proteins, was created in 1997 and went through several rounds of updates, most recently in 2014. The current update, available at https://www.ncbi.nlm.nih.gov/research/COG, substantially expands the scope of the database to include complete genomes of 1187 bacteria and 122 archaea, typically with a single genome per genus. In addition, the current version of the COGs includes the following new features: (i) NCBI's recently deprecated gene index (gi) numbers for the encoded proteins are replaced with stable RefSeq or GenBank/ENA/DDBJ coding sequence (CDS) accession numbers; (ii) COG annotations are updated for >200 newly characterized protein families with corresponding references and PDB links, where available; (iii) lists of COGs grouped by pathways and functional systems are added; (iv) 266 new COGs for proteins involved in CRISPR-Cas immunity, sporulation in Firmicutes and photosynthesis in cyanobacteria are included; and (v) the database is made available as a web page, in addition to FTP. The current release includes 4877 COGs. Future plans include further expansion of the COG collection by adding archaeal COGs (arCOGs), splitting the COGs containing multiple paralogs, and continued refinement of COG annotations.


Subject(s)
Archaea/genetics , Bacteria/genetics , Databases, Genetic , Genome, Archaeal , Genome, Bacterial , Archaea/metabolism , Archaeal Proteins/classification , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Bacteria/immunology , Bacteria/metabolism , Bacterial Proteins/classification , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , CRISPR-Cas Systems , Gene Ontology , Humans , Molecular Sequence Annotation , Spores, Bacterial/genetics , Spores, Bacterial/growth & development
8.
Epigenetics Chromatin ; 13(1): 21, 2020 04 22.
Article in English | MEDLINE | ID: mdl-32321568

ABSTRACT

BACKGROUND: Next-generation sequencing allows genome-wide analysis of changes in chromatin states and gene expression. Data analysis of these increasingly used methods either requires multiple analysis steps, or extensive computational time. We sought to develop a tool for rapid quantification of sequencing peaks from diverse experimental sources and an efficient method to produce coverage tracks for accurate visualization that can be intuitively displayed and interpreted by experimentalists with minimal bioinformatics background. We demonstrate its strength and usability by integrating data from several types of sequencing approaches. RESULTS: We have developed BAMscale, a one-step tool that processes a wide set of sequencing datasets. To demonstrate the usefulness of BAMscale, we analyzed multiple sequencing datasets from chromatin immunoprecipitation sequencing data (ChIP-seq), chromatin state change data (assay for transposase-accessible chromatin using sequencing: ATAC-seq, DNA double-strand break mapping sequencing: END-seq), DNA replication data (Okazaki fragments sequencing: OK-seq, nascent-strand sequencing: NS-seq, single-cell replication timing sequencing: scRepli-seq) and RNA-seq data. The outputs consist of raw and normalized peak scores (multiple normalizations) in text format and scaled bigWig coverage tracks that are directly accessible to data visualization programs. BAMscale also includes a visualization module facilitating direct, on-demand quantitative peak comparisons that can be used by experimentalists. Our tool can effectively analyze large sequencing datasets (~100 Gb size) in minutes, outperforming currently available tools. CONCLUSIONS: BAMscale accurately quantifies and normalizes identified peaks directly from BAM files, and creates coverage tracks for visualization in genome browsers. BAMscale can be implemented for a wide set of methods for calculating coverage tracks, including ChIP-seq and ATAC-seq, as well as methods that currently require specialized, separate tools for analyses, such as splice-aware RNA-seq, END-seq and OK-seq for which no dedicated software is available. BAMscale is freely available on GitHub (https://github.com/ncbi/BAMscale).


Subject(s)
Chromatin Immunoprecipitation Sequencing/methods , RNA-Seq/methods , Chromatin Assembly and Disassembly , DNA , DNA Breaks, Double-Stranded , Humans , K562 Cells , Software
9.
BMC Genomics ; 20(1): 378, 2019 May 14.
Article in English | MEDLINE | ID: mdl-31088352

ABSTRACT

BACKGROUND: Banana is one of the most important crops in tropical and sub-tropical regions. To meet the demands of international markets, banana plantations require high amounts of chemical fertilizers which translate into high farming costs and are hazardous to the environment when used excessively. Beneficial free-living soil bacteria that colonize the rhizosphere are known as plant growth-promoting rhizobacteria (PGPR). PGPR affect plant growth in direct or indirect ways and hold great promise for sustainable agriculture. RESULTS: PGPR of the genera Bacillus and Pseudomonas in banana cv. Williams were evaluated. These plants were produced through in vitro culture and inoculated individually with two rhizobacteria, Bacillus amyloliquefaciens strain Bs006 and Pseudomonas fluorescens strain Ps006. Control plants without microbial inoculum were also evaluated. These plants were kept in a controlled climate growth room with conditions required to favor plant-microorganism interactions. These interactions were evaluated at 1-, 48- and 96-h using transcriptome sequencing after inoculation to establish differentially expressed genes (DEGs) in plants elicited by the interaction with the two rhizobacteria. Additionally, droplet digital PCR was performed at 1, 48, 96 h, and also at 15 and 30 days to validate the expression patterns of selected DEGs. The banana cv. Williams transcriptome reported differential expression in a large number of genes of which 22 were experimentally validated. Genes validated experimentally correspond to growth promotion and regulation of specific functions (flowering, photosynthesis, glucose catabolism and root growth) as well as plant defense genes. This study focused on the analysis of 18 genes involved in growth promotion, defense and response to biotic or abiotic stress. 
CONCLUSIONS: Differences in banana gene expression profiles in response to the rhizobacteria evaluated here (Bacillus amyloliquefaciens Bs006 and Pseudomonas fluorescens Ps006) are influenced by separate bacterial colonization processes and levels that stimulate distinct groups of genes at various points in time.


Subject(s)
Bacillus amyloliquefaciens/physiology , Gene Expression Profiling/methods , Musa/growth & development , Plant Proteins/genetics , Pseudomonas fluorescens/physiology , Gene Expression Regulation, Plant , Gene Ontology , Musa/genetics , Musa/microbiology , Sequence Analysis, RNA , Soil Microbiology , Stress, Physiological
10.
Nucleic Acids Res ; 47(W1): W594-W599, 2019 07 02.
Article in English | MEDLINE | ID: mdl-31020319

ABSTRACT

Literature search is a routine practice for scientific studies as new discoveries build on knowledge from the past. Current tools (e.g. PubMed, PubMed Central), however, generally require significant effort in query formulation and optimization (especially in searching the full-length articles) and do not allow direct retrieval of specific statements, which is key for tasks such as comparing/validating new findings with previous knowledge and performing evidence attribution in biocuration. Thus, we introduce LitSense, which is the first web-based system that specializes in sentence retrieval for biomedical literature. LitSense provides unified access to PubMed and PMC content with over a half-billion sentences in total. Given a query, LitSense returns best-matching sentences using both a traditional term-weighting approach that up-weights sentences that contain more of the rare terms in the user query as well as a novel neural embedding approach that enables the retrieval of semantically relevant results without explicit keyword match. LitSense provides a user-friendly interface that assists its users to quickly browse the returned sentences in context and/or further filter search results by section or publication date. LitSense also employs PubTator to highlight biomedical entities (e.g. gene/proteins) in the sentences for better result visualization. LitSense is freely available at https://www.ncbi.nlm.nih.gov/research/litsense.


Subject(s)
Data Mining/methods , Software , Abstracting and Indexing , PubMed , Publications
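
The term-weighting idea described above (sentences containing more of the rare query terms rank higher) can be illustrated with a small inverse-document-frequency score. This is only a sketch of the general technique, not LitSense's actual scoring function; the names `idf_weights`, `score`, and `rank`, the whitespace tokenization, and the toy corpus are assumptions, and the neural embedding component is not covered.

```python
import math


def idf_weights(sentences):
    """Map each term to log(N / document_frequency) + 1 over the corpus."""
    n = len(sentences)
    doc_freq = {}
    for sentence in sentences:
        for term in set(sentence.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n / df) + 1.0 for term, df in doc_freq.items()}


def score(query, sentence, idf):
    """Sum the IDF weights of query terms that occur in the sentence."""
    sentence_terms = set(sentence.lower().split())
    query_terms = set(query.lower().split())
    return sum(idf.get(t, 0.0) for t in query_terms if t in sentence_terms)


def rank(query, sentences):
    """Return sentences ordered from best to worst match."""
    idf = idf_weights(sentences)
    return sorted(sentences, key=lambda s: score(query, s, idf), reverse=True)


# Toy corpus (hypothetical sentences).
corpus = [
    "the protein binds dna",
    "the spliceosome removes introns",
    "the cell divides",
]
best = rank("the spliceosome", corpus)[0]
print(best)
```

Here the common term "the" contributes little to any sentence's score, while the rare term "spliceosome" decides the ranking.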
11.
Bioinformatics ; 35(11): 1960-1962, 2019 06 01.
Article in English | MEDLINE | ID: mdl-30379987

ABSTRACT

SUMMARY: The quantification of RNA sequencing (RNA-seq) abundance using a normalization method that calculates transcripts per million (TPM) is a key step to compare multiple samples from different experiments. TPMCalculator is a one-step software to process RNA-seq alignments in BAM format and reports TPM values, raw read counts and feature lengths for genes, transcripts, exons and introns. The program describes the genomic features through a model generated from the gene transfer format file used during alignments reporting of the TPM values and the raw read counts for each feature. In this paper, we show the correlation for 1256 samples from the TCGA-BRCA project between TPM and FPKM reported by TPMCalculator and RSeQC. We also show the correlation for raw read counts reported by TPMCalculator, HTSeq and featureCounts. AVAILABILITY AND IMPLEMENTATION: TPMCalculator is freely available at https://github.com/ncbi/TPMCalculator. It is implemented in C++14 and supported on Mac OS X, Linux and MS Windows. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Software , Exons , RNA, Messenger , Sequence Analysis, RNA
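
For readers unfamiliar with the normalization, the standard TPM definition divides each feature's read count by its length, then rescales so that the values sum to one million: TPM_i = (c_i / l_i) / (sum over j of c_j / l_j) * 10^6. The sketch below implements that textbook formula; it is not TPMCalculator's code, and the function name `tpm` and the example counts and lengths are hypothetical.

```python
def tpm(counts, lengths):
    """Compute transcripts per million from raw counts and feature lengths.

    counts  -- dict mapping feature name to raw read count
    lengths -- dict mapping feature name to feature length in bases
    """
    rates = {name: counts[name] / lengths[name] for name in counts}
    total = sum(rates.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Two hypothetical genes with the same read density.
values = tpm(counts={"geneA": 100, "geneB": 300},
             lengths={"geneA": 1000, "geneB": 3000})
print(values)
```

Because geneB has three times the reads but also three times the length, both genes receive the same TPM value, which is the point of the length normalization.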
12.
Database (Oxford) ; 2017, 2017 01 01.
Article in English | MEDLINE | ID: mdl-28605765

ABSTRACT

The volume of transcriptome data is growing exponentially due to rapid improvement of experimental technologies. In response, large central resources such as those of the National Center for Biotechnology Information (NCBI) are continually adapting their computational infrastructure to accommodate this large influx of data. New and specialized databases, such as Transcriptome Shotgun Assembly Sequence Database (TSA) and Sequence Read Archive (SRA), have been created to aid the development and expansion of centralized repositories. Although the central resource databases are under continual development, they do not include automatic pipelines to increase annotation of newly deposited data. Therefore, third-party applications are required to achieve that aim. Here, we present an automatic workflow and web application for the annotation of transcriptome data. The workflow creates secondary data such as sequencing reads and BLAST alignments, which are available through the web application. They are based on freely available bioinformatics tools and scripts developed in-house. The interactive web application provides a search engine and several browser utilities. Graphical views of transcript alignments are available through SeqViewer, an embedded tool developed by NCBI for viewing biological sequence data. The web application is tightly integrated with other NCBI web applications and tools to extend the functionality of data processing and interconnectivity. We present a case study for the species Physalis peruviana with data generated from BioProject ID 67621. Database URL: http://www.ncbi.nlm.nih.gov/projects/physalis/.


Subject(s)
Databases, Genetic , Internet , Physalis/genetics , Sequence Alignment/methods , User-Computer Interface , Workflow , Animals , Humans
13.
Bioinformatics ; 33(16): 2580-2582, 2017 Aug 15.
Article in English | MEDLINE | ID: mdl-28379341

ABSTRACT

MOTIVATION: BioContainers (biocontainers.pro) is an open-source and community-driven framework which provides platform independent executable environments for bioinformatics software. BioContainers allows labs of all sizes to easily install bioinformatics software, maintain multiple versions of the same software and combine tools into powerful analysis pipelines. BioContainers is based on the popular open-source frameworks Docker and rkt, which allow software to be installed and executed under an isolated and controlled environment. Also, it provides infrastructure and basic guidelines to create, manage and distribute bioinformatics containers with a special focus on omics technologies. These containers can be integrated into more comprehensive bioinformatics pipelines and different architectures (local desktop, cloud environments or HPC clusters). AVAILABILITY AND IMPLEMENTATION: The software is freely available at github.com/BioContainers/. CONTACT: yperez@ebi.ac.uk.


Subject(s)
Computational Biology/methods , Software , Genomics/methods , Metabolomics/methods , Proteomics/methods
14.
Article in English | MEDLINE | ID: mdl-25815274

ABSTRACT

luxR genes encode transcriptional regulators that control acyl homoserine lactone-based quorum sensing (AHL QS) in Gram-negative bacteria. On the bacterial chromosome, luxR genes are usually found next to or near a luxI gene encoding the AHL signal synthase. Recently, a number of luxR genes were described that have no luxI genes in their vicinity on the chromosome. These so-called solo luxR genes may either respond to internal AHL signals produced by a non-adjacent luxI in the chromosome, or respond to exogenous signals. Here we present a survey of solo luxR genes found in complete and draft bacterial genomes in the NCBI databases using HMMs. We found that 2698 of the 3550 luxR genes found are solos, which is an unexpectedly high number even if some of the hits may be false positives. We also found that solo LuxR sequences form distinct clusters that are different from the clusters of LuxR sequences that are part of the known luxR-luxI topological arrangements. We also found a number of cases that we termed twin luxR topologies, in which two adjacent luxR genes were in tandem or divergent orientation. Many of the luxR solo clusters were devoid of the sequence motifs characteristic of AHL-binding LuxR proteins, so there is room to speculate that the solos may be involved in sensing hitherto unknown signals. It was noted that only some of the LuxR clades are rich in conserved cysteine residues. Molecular modeling suggests that some of the cysteines may be involved in disulfide formation, which makes us speculate that some LuxR proteins, including some of the solos, may be involved in redox regulation.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Repressor Proteins/genetics , Trans-Activators/genetics , Amino Acid Motifs , Bacteria/classification , Bacteria/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Databases, Genetic , Models, Molecular , Repressor Proteins/chemistry , Repressor Proteins/metabolism , Trans-Activators/chemistry , Trans-Activators/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
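
The solo/paired distinction described above can be sketched as a neighborhood check on an ordered gene list. This is an illustration under stated assumptions, not the survey's method: the abstract reports an HMM-based search but gives no vicinity threshold, so the function name `classify_luxr`, the `max_gap` window of two genes, and the example chromosome are all hypothetical. The final line only restates the reported counts as a percentage.

```python
def classify_luxr(genes, max_gap=2):
    """Label each luxR in an ordered gene list as 'paired' or 'solo'.

    A luxR is 'paired' if a luxI lies within max_gap genes on either
    side (an assumed definition of vicinity), otherwise 'solo'.
    """
    labels = []
    for i, name in enumerate(genes):
        if name != "luxR":
            continue
        start = max(0, i - max_gap)
        neighbors = genes[start:i] + genes[i + 1:i + 1 + max_gap]
        labels.append((i, "paired" if "luxI" in neighbors else "solo"))
    return labels


# Hypothetical gene order along one chromosome.
chromosome = ["luxI", "luxR", "geneA", "geneB", "geneC", "luxR", "geneD"]
labels = classify_luxr(chromosome)
print(labels)

# Share of solos reported in the abstract: 2698 of 3550 luxR genes.
solo_percent = round(100 * 2698 / 3550, 1)
print(solo_percent)
```

Changing `max_gap` changes which genes count as solos, which is one reason a reported solo count depends on how vicinity is defined.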