ABSTRACT
OBJECTIVE: We developed an in-house bioinformatics pipeline to improve the detection of respiratory pathogens in metagenomic sequencing data. The pipeline addresses the need for rapid analysis, high accuracy, scalability, and reproducibility in a high-performance computing environment. RESULTS: We evaluated the pipeline using ninety synthetic metagenomes designed to simulate nasopharyngeal swab samples. The pipeline successfully identified 177 of the 204 respiratory pathogens present in the compositions, with an average processing time of approximately 4 min per sample (processing 1 million paired-end reads of 150 base pairs). Across all 470 taxa included in the compositions, the pipeline demonstrated high accuracy, identifying 420 of them and achieving a correlation of 0.9 between actual and predicted relative abundances. Among the identified taxa, 27 were significantly underestimated or overestimated, including only three clinically relevant pathogens. We also validated the pipeline on a clinical dataset from a study of metagenomic pathogen characterization in patients with acute respiratory infections and successfully identified all pathogens responsible for the diagnosed infections. These findings underscore the pipeline's effectiveness in pathogen detection and highlight its potential utility in respiratory pathogen surveillance.
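As an illustrative aside, the reported correlation of 0.9 between actual and predicted relative abundances is a correlation computed over per-taxon abundance pairs; a minimal sketch of that kind of check, with made-up abundance values rather than the paper's data, might look like:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length abundance vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical relative abundances (fractions of reads) for five taxa
actual    = [0.40, 0.25, 0.20, 0.10, 0.05]
predicted = [0.38, 0.27, 0.18, 0.12, 0.05]
r = pearson(actual, predicted)  # close to 1 when predictions track reality
```

This is only the accuracy metric, not the pipeline itself; the detection step that produces `predicted` is the substance of the work.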
Subjects
Metagenomics, Respiratory Tract Infections, Metagenomics/methods, Humans, Respiratory Tract Infections/microbiology, Respiratory Tract Infections/diagnosis, Metagenome/genetics, Computational Biology/methods, Reproducibility of Results, Nasopharynx/microbiology, Nasopharynx/virology
ABSTRACT
Background: The Neotropics harbors the largest species richness on the planet; however, even in well-studied groups there are potentially hundreds of species that lack a formal description, and many already described taxa are difficult to identify from morphology. In small mammals specifically, complex morphological diagnoses have been facilitated by the use of molecular data, particularly mitochondrial sequences, to obtain accurate species identifications. Obtaining mitochondrial markers requires PCR with specific primers, which are largely unavailable for non-model organisms. Oxford Nanopore Technologies (ONT) sequencing offers a new alternative for obtaining the entire mitochondrial genome without the need for specific primers. Only a limited number of studies have employed exclusively ONT long reads to assemble mitochondrial genomes, and few have evaluated the usefulness of such reads across multiple non-model organisms. Methods: We conducted fieldwork to collect small mammals, including rodents, bats, and marsupials, in five localities at the northern end of the Cordillera Central of Colombia. DNA samples were sequenced using the MinION device and Flongle flow cells. Shotgun sequencing data were used to reconstruct the mitochondrial genome of each sample. In parallel, using a customized computational pipeline, species-level identifications were obtained from the raw whole-genome sequencing reads. ONT-based identifications were corroborated using traditional morphological characters and phylogenetic analyses. Results: A total of 24 individuals from 18 species were collected, morphologically identified, and deposited in the biological collection of Universidad EAFIT. Our computational pipelines were able to reconstruct mitochondrial genomes from ONT reads alone. We obtained three new mitochondrial genomes and eight new mitochondrial sequences for six species.
Our species identification pipeline obtained accurate species-level identifications for up to 75% of the individuals in as little as 5 s. Finally, our phylogenetic analyses corroborated the identifications from the automated pipeline and contributed to the knowledge of the diversity of Neotropical small mammals. Discussion: This study evaluated different pipelines for reconstructing mitochondrial genomes of non-model organisms using exclusively ONT reads, benchmarking these protocols on a multi-species dataset. The proposed methodology can be applied by non-expert taxonomists and has the potential to be implemented in real time, without the need to euthanize the organisms, and under field conditions. It therefore stands as a relevant tool for increasing the data available for non-model organisms and the rate at which researchers can characterize life, especially in highly biodiverse regions such as the Neotropics.
Subjects
Mitochondrial Genome, Mammals, DNA Sequence Analysis, Animals, Mammals/genetics, Mitochondrial Genome/genetics, DNA Sequence Analysis/methods, Nanopores, Colombia, Mitochondrial DNA/genetics, Phylogeny, Chiroptera/genetics, Nanopore Sequencing/methods
ABSTRACT
Optimizing the distribution of refined oil products through pipeline systems poses a significant operations research challenge of exceptional economic importance for the oil and gas industry. The solution of this problem lies at the interface of chemical engineering and operations research: the former has been the most important contributor, while the latter has paid increasing attention to the problem over roughly the last decade. The goal of this work is fourfold: to characterize the research accomplished in this area through descriptive analytics, to present and discuss its modeling perspectives, to outline its emerging research trends, and to trigger discussion of future research avenues. A future research agenda should pursue more realistic mathematical models of the system and solution procedures that exploit their algebraic structure.
ABSTRACT
Bioinformatics tools are essential for performing analyses in the omics sciences. Given the numerous experimental opportunities arising from advances in the field of omics and easier access to high-throughput sequencing platforms, these tools play a fundamental role in research projects. Despite the considerable progress made possible by the development of bioinformatics tools, many tools are tailored to specific analytical goals, which creates challenges for non-bioinformaticians who need to integrate their results into a customized pipeline. To solve this problem, we developed BioPipeline Creator, a user-friendly Java-based GUI that allows different software tools to be integrated into the repertoire while ensuring easy user interaction through an accessible graphical interface. Consisting of client and server components, BioPipeline Creator provides an intuitive graphical interface that simplifies the use of various bioinformatics tools for users without advanced computer skills. It can run on less sophisticated devices or workstations, allowing users to keep their operating system without having to switch to another compatible system. The server is responsible for the processing tasks and can perform the analysis on the user's local or remote network infrastructure. The tool is compatible with the major operating systems and is available at https://github.com/allanverasce/bpc.git.
Subjects
Computational Biology, Software, User-Computer Interface, Computational Biology/methods, Programming Languages, High-Throughput Nucleotide Sequencing/methods, Humans
ABSTRACT
Nickel and cobalt are frequently found in the metallic alloys used to manufacture aneurysm clips and endovascular prostheses, such as the pipeline embolization device (PED). Nickel hypersensitivity affects up to 15% of the population; however, it is very rarely overt in patients who undergo endovascular stent placement. Here, we present the case of a 35-year-old woman who developed allergic symptoms after PED placement and was later confirmed by patch testing to be allergic to both nickel and cobalt. Fortunately, she responded well to pharmacologic treatment, rendering surgical intervention unnecessary. To the best of our knowledge, this is the first report of symptomatic nickel hypersensitivity, and the second of symptomatic cobalt allergy, caused by the PED. Despite its low prevalence, we believe surgeons should actively ask patients about allergic symptoms in the postoperative period to facilitate early diagnosis and treatment.
ABSTRACT
Mouse tumour models are extensively used as a pre-clinical research tool in oncology, playing an important role in anticancer drug discovery. Accordingly, the demand for next-generation sequencing (NGS) in cancer genomics research is increasing, and the need for data analysis pipelines is growing with it. Most NGS data analysis solutions to date do not support mouse data or require highly specific configuration. Here, we present a genome analysis pipeline for mouse tumour NGS data, including a whole-genome sequencing (WGS) flow for somatic variant discovery and an RNA-seq flow for differential expression, functional analysis, and neoantigen prediction. The pipeline is based on standards and best practices and integrates mouse genome references and annotations. In a recent study, the pipeline was applied to demonstrate the efficacy of low-dose 6-thioguanine (6TG) treatment of low-mutation melanoma in a pre-clinical mouse model. Here, we extend that study, describe the pipeline in detail together with the results obtained in terms of tumour mutational burden (TMB) and number of predicted neoantigens, and correlate these with 6TG effects on tumour volume. The pipeline was expanded to include a neoantigen analysis, yielding neopeptide predictions and MHC class I antigen presentation evaluation. We observed that the number of predicted neoepitopes was a more accurate indicator of tumour immune control than TMB. In conclusion, this study demonstrates the usability of the proposed pipeline and suggests it could serve as a robust genome analysis platform for future mouse genomic analyses.
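For readers unfamiliar with the TMB metric compared above, it is conventionally reported as somatic mutations per megabase of callable genome; a minimal sketch with hypothetical numbers (not values from the study):

```python
def tumour_mutational_burden(n_somatic_mutations, callable_bases):
    """TMB expressed as somatic mutations per megabase of callable genome."""
    return n_somatic_mutations / (callable_bases / 1_000_000)

# Hypothetical low-mutation sample: 120 somatic variants called over
# 2.5 Gb of callable genome
tmb = tumour_mutational_burden(120, 2_500_000_000)  # 0.048 mutations/Mb
```

The study's point is precisely that this single scalar can be a weaker predictor of immune control than the count of predicted neoepitopes.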
Subjects
Melanoma, Thioguanine, Animals, Mice, Thioguanine/pharmacology, Genomics/methods, Mutation, RNA-Seq
ABSTRACT
Teratogenesis testing can be challenging due to the limitations of both in vitro and in vivo models. Test systems based on human embryonic cells, combined with omics strategies such as transcriptomics, have helped overcome these difficulties. In these test systems, cells exposed to different compounds are analyzed on microarray or RNA-seq platforms to assess the impact of potential teratogens on gene expression. Nevertheless, processing microarray and RNA-seq datasets requires computational resources and bioinformatics knowledge. Here, a pipeline for microarray and RNA-seq processing is presented, aiming to help researchers from any field interpret the main transcriptome results, such as differential gene expression, enrichment analysis, and statistical interpretation. This chapter also discusses the main difficulties that can be encountered in a transcriptome analysis and the best alternatives for overcoming them, describing both programming code and user-friendly tools. Finally, issues specific to the teratogenesis field, such as time-course analysis, are also described, demonstrating how the pipeline can be applied in these studies.
Subjects
Teratogenesis, Humans, Teratogenesis/genetics, Gene Expression Profiling, RNA-Seq, Transcriptome, Computational Biology
ABSTRACT
Tuberculosis (TB) continues to be the world's leading killer among infectious diseases. Despite global efforts to gradually reduce the number of annual deaths and the incidence of this disease, the coronavirus disease 2019 (COVID-19) pandemic caused a decrease in TB detection and disrupted prompt TB treatment, setting progress back to 2019 levels. However, the development and testing of new TB vaccines has not stopped, and there is now the prospect of deploying within the next five years a new vaccine that is affordable and can be used in the various key vulnerable populations affected by TB. This essay therefore discusses the main vaccines developed against TB that could soon be selected and used worldwide and, additionally, highlights the candidate vaccines under development in Brazil that could be considered among those at an advanced stage toward ending TB.
ABSTRACT
The following protocol introduces a targeted methodological approach to differential gene expression analysis, which is particularly beneficial in the context of non-model species. While we acknowledge that biological complexity often involves the interplay of multiple genes in any given biological response, our method provides a strategy to streamline this complexity, enabling researchers to focus on a more manageable subset of genes of interest. In this context, the transcriptome of red cedar (Cedrela odorata L.) and known or hypothetical genes related to the response to herbivory were used as references. The key points of the protocol are:
• Implementation of a transcriptome thinning process to eliminate redundant and non-coding sequences, optimizing the analysis and reducing processing time.
• Use of a custom gene database to identify and retain coding sequences with high precision.
• Focus on specific genes of interest, allowing a more targeted analysis for specific experimental conditions.
This approach holds particular value for pilot studies, research with limited resources, or when rapid identification and validation of candidate genes are needed in species without a reference genome.
ABSTRACT
The present study investigates the hourly activity patterns of mammals inhabiting the area surrounding the Camisea gas pipeline that crosses the Machiguenga Communal Reserve. From February 2020 to January 2021, a photographic record was conducted using camera traps placed along the gas pipeline. Activity patterns were estimated using Kernel density functions. During the study period, 25 mammal species were recorded. It was found that Dasyprocta kalinowskii and Eira barbara exhibit a diurnal activity pattern, whereas Cuniculus paca, Tapirus terrestris, Dasypus spp., and Mazama spp. display predominantly nocturnal behavior. It is suggested that observed activity patterns could be influenced by various factors such as competitive exclusion between D. kalinowskii and C. paca, seasonal food availability for T. terrestris, temperature and precipitation variations for Dasypus spp., phylogenetic constraints in Mazama spp., and temporal segregation with other carnivores for E. barbara. The significance of collaboration between energy industry companies, native communities, and governmental organizations is emphasized.
ABSTRACT
The house sparrow (Passer domesticus) is a valuable avian model for studying evolutionary genetics, development, neurobiology, physiology, behavior, and ecology, both in laboratory and field-based settings. The current annotation of the P. domesticus genome available at the Ensembl Rapid Release site is primarily focused on gene set building and lacks functional information. In this study, we present the first comprehensive functional reannotation of the P. domesticus genome using intestinal Illumina RNA sequencing (RNA-Seq) libraries. Our revised annotation provides an expanded view of the genome, encompassing 38,592 transcripts compared to the current 23,574 transcripts in Ensembl. We also predicted 14,717 protein-coding genes, achieving 96.4% completeness for Passeriformes-lineage BUSCOs. A substantial improvement in this reannotation is the accurate delineation of untranslated region (UTR) sequences. We identified 5'- and 3'-UTRs for 82.7% and 93.8% of the transcripts, respectively. These UTR annotations are crucial for understanding post-transcriptional regulatory processes. Our findings underscore the advantages of incorporating additional specific RNA-Seq data into genome annotation, particularly when leveraging fast and efficient data processing capabilities. This functional reannotation enhances our understanding of the P. domesticus genome, providing valuable resources for future investigations in various research fields.
ABSTRACT
ASGARD+ (Accelerated Sequential Genome-analysis and Antibiotic Resistance Detection) is a command-line platform for the automatic identification of antibiotic-resistance genes in bacterial genomes, providing an easy-to-use interface to process large batches of sequence files from whole-genome sequencing with minimal configuration. It also provides a CPU-optimization algorithm that reduces processing time. The tool consists of two main protocols. The first, ASGARD, identifies and annotates antimicrobial resistance elements directly from short reads using several public databases. The second, SAGA, enables the alignment, indexing, and mapping of whole-genome samples against a reference genome for variant calling, as well as visualization of the results through the construction of an SNP tree. Both protocols are run with a single short command and one JSON configuration file, which modulates each pipeline step and allows the user to intervene as needed in the different software tools adapted to the pipeline. The modular design of ASGARD+ allows researchers with little experience in bioinformatic analysis and command-line use to quickly explore bacterial genomes in depth, optimizing analysis times and obtaining accurate results. © 2023 Wiley Periodicals LLC. Basic Protocol 1: ASGARD+ installation Basic Protocol 2: Configuration files general setup Basic Protocol 3: ASGARD execution Support Protocol: Results visualization with Phandango Basic Protocol 4: SAGA execution Alternative Protocol 1: Container installation Alternative Protocol 2: Run ASGARD and SAGA in container.
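As a rough illustration of the JSON-driven configuration style the protocols describe, the sketch below builds and round-trips a hypothetical configuration; every key name and option here is invented for illustration, and the real ones are defined by the ASGARD+ documentation:

```python
import json

# Hypothetical configuration mimicking a one-file, JSON-syntax setup
# that modulates each pipeline step. Keys are NOT the real ASGARD+ keys.
config = {
    "input_dir": "reads/",
    "output_dir": "results/",
    "threads": 8,
    "asgard": {"databases": ["card", "resfinder"], "min_identity": 90},
    "saga": {"reference": "ref.fasta", "call_variants": True},
}

# Serialize to the file a run would consume, then parse it back
text = json.dumps(config, indent=2)
parsed = json.loads(text)
```

The appeal of this pattern is that a single declarative file documents the whole run, so re-executions and interventions on individual tools stay reproducible.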
Subjects
Algorithms, Software, Bacterial Genome, Whole Genome Sequencing, Microbial Drug Resistance
ABSTRACT
The reconstruction of phylogenomic trees containing multiple genes is best achieved using a supermatrix. The advent of NGS technology has made it easier and cheaper to obtain data for multiple genes in one sequencing run. When numerous genes and organisms are used in a phylogenomic analysis, it is difficult to organize all the information and manually align the gene sequences before concatenating them. This study describes SPLACE, a tool to automatically SPLit, Align, and ConcatenatE the genes of all species of interest to generate a supermatrix file, and consequently a phylogenetic tree, while handling possible missing data. In our tests, SPLACE was the only tool that could automatically align gene sequences and also handle missing data, and it required only a few minutes to produce a supermatrix FASTA file containing 83 aligned and concatenated genes from the chloroplast genomes of 270 plant species. It is open source and publicly available at https://github.com/reinator/splace.
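The concatenation-with-missing-data idea can be sketched in a few lines: per-gene alignments are joined species by species, and a species absent from a gene is padded with gaps so all rows stay equal length. This is an illustrative toy, not SPLACE's implementation (which also performs the alignment step):

```python
def concatenate(alignments):
    """Join per-gene alignments (dicts of species -> aligned sequence)
    into a supermatrix, padding missing species with gaps ('-')."""
    species = sorted({sp for aln in alignments for sp in aln})
    supermatrix = {sp: "" for sp in species}
    for aln in alignments:
        length = len(next(iter(aln.values())))  # all rows share this length
        for sp in species:
            supermatrix[sp] += aln.get(sp, "-" * length)
    return supermatrix

gene1 = {"A": "ATG-", "B": "ATGC"}               # species C missing here
gene2 = {"A": "GGTT", "B": "GCTT", "C": "GCTA"}
matrix = concatenate([gene1, gene2])             # C becomes "----GCTA"
```

Keeping every row the same length is what lets the supermatrix feed directly into standard phylogenetic inference software.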
ABSTRACT
Background: Genome skimming is a popular method in plant phylogenomics that does not include a biased enrichment step, relying instead on random shallow sequencing of total genomic DNA. From these data the plastome is usually readily assembled and constitutes the bulk of the phylogenetic information generated in such studies. Despite a few attempts to use genome skims to recover low-copy nuclear loci for direct phylogenetic use, this endeavor remains neglected. Causes might include the trade-off between libraries with few reads and species with large genomes (i.e., missing data caused by low coverage), but also the lack of pipelines for data assembly. Methods: We present a pipeline and its companion R package designed to automate the recovery of low-copy nuclear markers from genome skimming libraries, together with a series of analyses evaluating the impact of key assembly parameters, reference selection, and missing data. Results: A substantial number of putative low-copy nuclear loci were assembled and proved useful for phylogenetic inference across the libraries tested (4 to 11 times more data than the plastomes previously assembled from the same libraries). Discussion: Critical aspects of assembling low-copy nuclear markers from genome skims include the minimum coverage and depth required for a sequence to be used. More stringent values for these parameters reduce the amount of assembled data and increase the relative amount of missing data, which can compromise phylogenetic inference, while relaxing them might increase sequence error. These issues are discussed in the text, and parameter tuning through multiple comparisons, tracking their effects on support and congruence, is highly recommended when using this pipeline.
The skimmingLoci pipeline (https://github.com/mreginato/skimmingLoci) might stimulate the use of genome skims to recover nuclear loci for direct phylogenetic use, increasing the power of genome skimming data to resolve phylogenetic relationships, while reducing the amount of sequenced DNA that is commonly wasted.
Subjects
DNA, Plant Genome, Phylogeny, Plant Genome/genetics, DNA Sequence Analysis/methods, Genomic Library
ABSTRACT
BACKGROUND: The advancement of hybrid sequencing technologies is steadily expanding the number of genome assemblies, which are often annotated using hybrid sequencing transcriptomics, leading to improved genome characterization and the identification of novel genes and isoforms in a wide variety of organisms. RESULTS: We developed an easy-to-use genome-guided transcriptome annotation pipeline that takes assembled transcripts from hybrid sequencing data as input and distinguishes between coding and long non-coding RNAs by integrating several bioinformatic approaches, including gene reconciliation with previous annotations in GTF format. We demonstrated the efficiency of this approach by correctly assembling and annotating all exons of the chicken SCO-spondin gene (containing more than 105 exons), including the identification, by homology assignment, of genes missing from the chicken reference annotations. CONCLUSIONS: Our method helps to improve the current transcriptome annotation of the chicken brain. The pipeline, implemented with Anaconda/Nextflow and Docker, is an easy-to-use package that can be applied to a broad range of species, tissues, and research areas, helping to improve and reconcile current annotations. The code and datasets are publicly available at https://github.com/cfarkas/annotate_my_genomes.
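The pipeline above integrates several bioinformatic approaches to separate coding from long non-coding transcripts; one crude ingredient of such classifiers is an open-reading-frame length heuristic, sketched below purely as an illustration (the pipeline's actual method is more sophisticated, and the 300 nt cutoff is an assumed example value):

```python
import re

def longest_orf(seq):
    """Length in nt of the longest ATG..stop open reading frame,
    scanning the three forward frames of an uppercase DNA string."""
    best = 0
    for frame in range(3):
        codons = re.findall("...", seq[frame:])  # non-overlapping triplets
        start = None
        for i, codon in enumerate(codons):
            if codon == "ATG" and start is None:
                start = i
            if codon in ("TAA", "TAG", "TGA") and start is not None:
                best = max(best, (i - start + 1) * 3)
                start = None
    return best

def looks_coding(seq, min_orf=300):
    """Naive call: transcripts whose longest ORF clears a length cutoff."""
    return longest_orf(seq) >= min_orf
```

Real classifiers combine such ORF features with codon usage, homology searches, and reconciliation against existing annotations, which is why a single heuristic is only a starting point.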
Subjects
RNA Sequence Analysis
ABSTRACT
A pipeline inspection gauge (PIG) is a device that runs through oil and gas pipelines performing various maintenance operations in the oil and gas industry. The PIG's velocity, which plays a role in the efficiency of these operations, is usually determined indirectly from odometers installed on it. Although this is a relatively simple technique, loss of contact between the odometer wheel and the pipeline results in measurement errors. To help reduce these errors, this investigation employed neural networks to estimate the speed of a prototype PIG, using the pressure difference acting on the device inside the pipeline and its acceleration instead of odometers. Static networks (e.g., multilayer perceptron) and recurrent networks (e.g., long short-term memory) were built, and a prototype PIG was developed with an embedded system based on a Raspberry Pi 3 to collect speed, acceleration, and pressure data for model training. The supervised neural networks were implemented with the TensorFlow Python library. To train and evaluate the models, we used the PIG testing pipeline facilities available at the Petroleum Evaluation and Measurement Laboratory of the Federal University of Rio Grande do Norte (LAMP/UFRN). The results showed that the models were able to learn the relationship among the differential pressure, acceleration, and speed of the PIG. The proposed approach can complement odometer-based systems, increasing the reliability of speed measurements.
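The regression task can be sketched with a toy network. The study itself used TensorFlow on real sensor logs; the self-contained NumPy example below only illustrates the idea of learning speed from differential pressure and acceleration, with synthetic data and a made-up network size:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: "speed" as an unknown function of normalized
# differential pressure and acceleration (all values are invented).
X = rng.uniform(-1.0, 1.0, size=(256, 2))   # columns: [delta_p, accel]
y = 0.8 * X[:, :1] + 0.3 * X[:, 1:]         # toy speed target

# One-hidden-layer MLP trained by full-batch gradient descent
W1 = rng.normal(0.0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, (8, 1)); b2 = np.zeros(1)
lr, losses = 0.1, []
for _ in range(500):
    h = np.tanh(X @ W1 + b1)                 # hidden activations
    pred = h @ W2 + b2                       # predicted speed
    err = pred - y
    losses.append(float((err ** 2).mean()))  # mean squared error
    g = 2.0 * err / len(X)                   # d(loss)/d(pred)
    gW2, gb2 = h.T @ g, g.sum(0)
    gh = (g @ W2.T) * (1.0 - h ** 2)         # backprop through tanh
    gW1, gb1 = X.T @ gh, gh.sum(0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

The same shape of model, given logged (pressure difference, acceleration) inputs and odometer-derived speeds as training labels, is what lets the estimator stand in for the odometers when wheel contact is lost.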
Subjects
Machine Learning, Neural Networks (Computer), Reproducibility of Results
ABSTRACT
Positive selection analysis has multiple applications, including vaccine design and the detection of circulating drug-resistant pathogen variants in populations. However, applying the available tools to a large number of protein families, or as part of a comprehensive phylogenomics pipeline, can be challenging. Since many standard bioinformatics tools are only available as executables, integrating them into complex bioinformatics pipelines may not be possible. We have developed OBI, an open-source tool that facilitates positive selection analysis at large scale. It can be used as a stand-alone command-line app that is easily installed as a Conda package. Some advantages of using OBI are:
• It speeds up the analysis by automating the entire process.
• It allows multiple starting points and customization of the analysis.
• It allows the retrieval and linkage of structural and evolutionary data for a protein.
We hope OBI provides a solution for reliably speeding up large-scale protein evolutionary and structural analysis.
ABSTRACT
In emergent technologies, data integrity is critical for message-passing communications, where security measures and validations must be considered to prevent the entry of invalid data, detect transmission errors, and prevent data loss. The SHA-256 algorithm is used to address these requirements. Existing hardware architectures present issues regarding the real-time balance among processing, efficiency, and cost, because some introduce significant critical paths. Moreover, the SHA-256 algorithm itself includes no verification mechanism for internal calculations or failure prevention. Hardware implementations can be affected by diverse problems, ranging from physical phenomena to interference or faults inherent to the data spectra. Previous works have mainly addressed this problem through three kinds of redundancy: information, hardware, or time. To the best of our knowledge, pipelining has not previously been used to perform different hash calculations for redundancy. In this work, we therefore present a novel hybrid architecture implemented on a 3-stage pipeline structure. Such structures are traditionally used to improve performance by processing several blocks simultaneously; instead, we propose using the pipeline technique to implement hardware and time redundancy, analyzing hardware resources and performance to balance the critical path. We improved performance at a given clock speed by defining a data-flow transformation across several sequential phases. Our architecture reported a throughput of 441.72 Mbps with 2255 LUTs, for an efficiency of 195.8 Kbps/LUT.
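The time-redundancy idea, computing the same digest more than once and comparing the results to catch transient faults, can be mimicked in software. The sketch below is only an analogy for the concept; the paper implements redundancy in hardware on a 3-stage pipeline, not in Python:

```python
import hashlib

def redundant_sha256(data: bytes, copies: int = 2) -> str:
    """Compute the SHA-256 digest several times and compare the results,
    a software analogue of time redundancy: a transient fault in one
    computation would make the copies disagree."""
    digests = [hashlib.sha256(data).hexdigest() for _ in range(copies)]
    if len(set(digests)) != 1:
        raise RuntimeError("redundant SHA-256 computations disagree")
    return digests[0]

digest = redundant_sha256(b"abc")
```

In hardware, the pipeline stages let these repeated computations overlap in time, which is how the proposed architecture buys fault detection without serializing the extra hash passes.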
ABSTRACT
Due to recent developments in NGS technologies, genome sequencing is generating large volumes of new data containing a wealth of biological information. Understanding sequenced genomes in a biologically meaningful way and delineating their functional and metabolic landscapes is a first-level challenge. Considering the global antimicrobial resistance (AMR) problem, investments to expand surveillance and improve existing genome analysis technologies are pressing. In addition, the speed at which new genomic data are generated surpasses our capacity to analyze them with available bioinformatics methods, creating a need for new, user-friendly, and comprehensive analytical tools. To this end, we propose a new web application, CABGen, developed with open-source software. CABGen allows storing, organizing, analyzing, and interpreting bioinformatics data in a friendly, scalable, easy-to-use environment and can process data from bacterial isolates of different species and origins. CABGen has three modules: Upload Sequences, Analyze Sequences, and Verify Results. Functionalities include coverage estimation, species identification, de novo genome assembly and assembly quality assessment, genome annotation, MLST mapping, searches for genes related to AMR, virulence, and plasmids, and detection of point mutations in specific AMR genes. Visualization tools are also available, greatly facilitating the handling of biological data. The reports include the results that are clinically relevant. To illustrate the use of CABGen, whole-genome shotgun data from 181 bacterial isolates of different species, collected in five Brazilian regions between 2018 and 2020, were uploaded and processed with the platform's modules.