ABSTRACT
The falling cost and turnaround time of DNA sequencing have significantly increased the deposit of biological information in public databases such as the NCBI (National Center for Biotechnology Information). The large volume of data produced per run has created a need for algorithms capable of handling data at this scale and of assisting in analyses such as the assembly and annotation of prokaryotic genomes. Over the years, several pipelines and computational tools have been developed to automate this task and consequently reduce the total time needed to determine the genetic content of a given organism, especially non-model organisms, contributing to the identification of possible targets with biotechnological applicability. The accuracy of automatic annotation tools is widely reported in the literature; however, this does not exclude the manual curation process, in which the information inferred automatically is verified and enriched by curators. The time this task requires is directly proportional to the number of gene products of the organism under study. To assist in this process, we present ReNoteWeb, a web tool with a simple and intuitive interface that performs assembly enhancement, with the possibility of identifying products missing from the original genomic sequence. In addition, ReNoteWeb can annotate all products based on information obtained from highly accurate external databases. The engine responsible for data processing was developed in Java, and the web platform is built on the Yii framework. The annotation produced by this platform aims to reduce the overall time spent on manual curation. Twenty-three organisms were used to validate the tool. Its efficiency was verified by comparing its annotation with the annotation of the same organisms available in the NCBI database and with the annotation performed on the RAST platform. The tool is available at: http://biod.ufpa.br/renoteweb/.
Subject(s)
Genome , Genomics , Databases, Genetic , Genomics/methods , Molecular Sequence Annotation , Sequence Analysis, DNA , Software
ABSTRACT
PAN2HGENE is a computational tool that enables two main analyses. First, it can identify gene products absent from the original prokaryotic genome sequence. Second, it enables automated comparative analysis of both complete and draft genomes. All analyses are performed through a simple and intuitive graphical user interface, without the need for long and complex command lines. For complete details on the use and execution of this protocol, please refer to Silva de Oliveira (2021).
Subject(s)
Bacteria , Software , Genome , Prokaryotic Cells
ABSTRACT
Genome annotation consists of inferring and assigning biological information to gene products. Over the years, numerous pipelines and computational tools have been developed to automate this task and help researchers gain knowledge about their target genes. However, even with these technological advances, manual annotation, or manual curation, remains necessary to verify and enrich the information attributed to the gene products. Despite being considered the gold standard for depositing data in a biological database, manual curation requires significant time and effort from researchers, who sometimes have to search for numerous products across various public databases. To assist with this problem, we present CODON, a tool for manual curation of genomic data that is capable of performing both prediction and annotation. This software uses a finite state machine in the prediction process and automatically annotates products based on information obtained from the UniProt database. CODON has a simple and intuitive graphical interface that assists in manual curation, enabling the user to make decisions based on the identity, the length of the alignment, and the name of the organism in which the product obtained a match. Further, all matches found in the database can be analyzed visually, which significantly benefits the curation task because the user has at their disposal all the information available for a given product. The efficiency of the tool was tested on eleven organisms by comparing the prediction and annotation results from CODON with those from the NCBI and RAST platforms.
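
To make the idea of finite-state-machine prediction concrete, the sketch below scans a sequence codon by codon with a two-state machine. It is an illustration under stated assumptions, not CODON's actual automaton: the state names, the function find_orfs, the min_codons threshold, the use of ATG as the only start codon, and the restriction to the three forward reading frames are all hypothetical simplifications.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative start codons are ignored


def find_orfs(sequence, min_codons=2):
    """Return (start, end, orf) tuples found in the three forward frames.

    Hypothetical two-state machine:
      SEARCH -> IN_ORF  when a start codon is read
      IN_ORF -> SEARCH  when a stop codon is read (the ORF is emitted)
    Coordinates are 0-based, end-exclusive; the stop codon is included.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        state = "SEARCH"
        start = None
        # Step through complete codons only.
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if state == "SEARCH":
                if codon == START_CODON:
                    state, start = "IN_ORF", pos
            elif codon in STOP_CODONS:
                end = pos + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                state = "SEARCH"
    return sorted(orfs)
```

For example, find_orfs("CCATGAAATAGGG") returns [(2, 11, "ATGAAATAG")]: the start codon is found in the third frame and the machine returns to the search state at the TAG stop codon. A real predictor would also handle the reverse strand and ORFs that run off the end of a contig.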
Subject(s)
Bacteria/genetics , Genomics/methods , Molecular Sequence Annotation/methods , Software , Databases, Genetic , User-Computer Interface
ABSTRACT
Next-Generation Sequencing (NGS) platforms provide a major approach to obtaining millions of short reads from samples. NGS has been used in a wide range of analyses, such as determining genome sequences, analyzing evolutionary processes, identifying gene expression and resolving metagenomic analyses. The quality of NGS data usually affects the final study conclusions. Quality assessment is therefore generally considered the first step in data analysis, ensuring that only reliable reads are used in further studies. In NGS platforms, the presence of duplicated reads (redundancy), usually introduced during library sequencing, is a major issue. Redundant reads can seriously affect research applications by causing difficulties in subsequent analyses (e.g., de novo genome assembly). Herein, we present NGSReadsTreatment, a computational tool for the removal of duplicated reads in paired-end or single-end datasets. NGSReadsTreatment can handle reads from any platform, with the same or different sequence lengths. Using the probabilistic data structure Cuckoo Filter, redundant reads are identified and removed by comparing the reads against one another. Thus, nothing is required beyond the set of reads itself. NGSReadsTreatment was compared with other redundancy removal tools on different sets of reads. The results demonstrated that NGSReadsTreatment outperformed the other tools in both the amount of redundancy removed and computational memory usage in all analyses performed. Available at https://sourceforge.net/projects/ngsreadstreatment/.
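
The approach of filtering reads against one another with a Cuckoo Filter can be sketched as follows. This is a minimal illustration, not the NGSReadsTreatment implementation: the class CuckooFilter, the function remove_duplicate_reads, the 32-bit fingerprint size, the bucket parameters and the choice of BLAKE2b as hash are all assumptions made for the example, and only exact duplicates in a single-end list are handled.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal cuckoo filter storing 32-bit fingerprints (illustrative only)."""

    def __init__(self, num_buckets=1 << 16, bucket_size=4, max_kicks=500):
        # num_buckets must be a power of two so the XOR trick stays in range.
        assert num_buckets & (num_buckets - 1) == 0
        self.mask = num_buckets - 1
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(0)  # fixed seed keeps evictions reproducible

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

    def _fingerprint(self, item):
        return self._hash(b"fp:" + item) & 0xFFFFFFFF

    def _alt_index(self, index, fp):
        # Partial-key cuckoo hashing: the other bucket depends only on (index, fp).
        return (index ^ self._hash(fp.to_bytes(4, "big"))) & self.mask

    def contains(self, item):
        fp = self._fingerprint(item)
        i1 = self._hash(item) & self.mask
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def insert(self, item):
        fp = self._fingerprint(item)
        i1 = self._hash(item) & self.mask
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random resident and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # the filter is considered full


def remove_duplicate_reads(reads, cuckoo=None):
    """Keep the first occurrence of each read (case-insensitive), in order."""
    if cuckoo is None:
        cuckoo = CuckooFilter()
    unique = []
    for read in reads:
        key = read.upper().encode("ascii")
        if cuckoo.contains(key):
            continue  # probably seen before (false positives are possible)
        if not cuckoo.insert(key):
            raise RuntimeError("cuckoo filter is full; use more buckets")
        unique.append(read)
    return unique
```

Calling remove_duplicate_reads(["ACGT", "TTGA", "ACGT"]) keeps the first occurrence of each read in input order. Because the filter stores only short fingerprints rather than whole reads, memory use stays small, at the price of a small false-positive rate: a unique read can occasionally be mistaken for a duplicate and dropped.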
Subject(s)
Algorithms , DNA, Bacterial/genetics , DNA, Fungal/genetics , Sequence Analysis, DNA/statistics & numerical data , Software , Arcobacter/genetics , Escherichia coli/genetics , High-Throughput Nucleotide Sequencing/methods , Humans , Internet , Mycobacterium tuberculosis/genetics
ABSTRACT
The availability of biological information in public databases has increased exponentially. To ensure the accuracy of this information, researchers have adopted several methods and refinements to avoid the dissemination of incorrect information; for example, several automated tools are available for annotation processes. However, manual curation is still what verifies and enriches biological information. Additionally, the genomic finishing process is complex, which has resulted in increased deposition of draft genomes. This introduces bias in other omics analyses because incomplete genomic content is used. The same problem is observed for complete genomes. For example, genomes generated by reference assembly may not include products that are new to the sequenced organism, and errors or bias can occur during the assembly process. Thus, we developed ImproveAssembly, a tool capable of identifying new products missing from genomic sequences, which can be used for complete and draft genomes. The identified products can improve the annotation of complete and draft genomes while significantly reducing the bias when the information is used in other omics analyses.
Subject(s)
Genome , Sequence Analysis, DNA/methods , Software , Escherichia coli/genetics , Genetic Loci , Reproducibility of Results , Workflow
ABSTRACT
Next-generation sequencing technologies have increased the amount of biological data generated. Thus, bioinformatics has become important because new methods and algorithms are necessary to manipulate and process such data. However, certain challenges have emerged, such as genome assembly using short reads and high-throughput platforms. In this context, several algorithms have been developed, such as Velvet, Abyss, Euler-SR, Mira, Edna, Maq, SHRiMP, Newbler, ALLPATHS, Bowtie and BWA. However, most such assemblers do not have a graphical interface, and the complexity of their command syntax makes them difficult to use for users without computing experience. Thus, to make the operation of such assemblers accessible to users without a computing background, we developed AutoAssemblyD, a graphical tool for submitting genome assemblies to multiple assemblers and managing them remotely through XML templates. AVAILABILITY: AutoAssemblyD is freely available at https://sourceforge.net/projects/autoassemblyd. It requires Sun JDK 6 or higher.