1.
Sci Rep ; 14(1): 15145, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956134

ABSTRACT

Hepatitis C virus (HCV) is a plus-stranded RNA virus that often chronically infects liver hepatocytes and causes liver cirrhosis and cancer. These viruses replicate their genomes employing error-prone replicases. Thereby, they routinely generate a large 'cloud' of RNA genomes (quasispecies) which, by trial and error, comprehensively explore the sequence space available for functional RNA genomes that maintain the ability for efficient replication and immune escape. In this context, it is important to identify which RNA secondary structures in the sequence space of the HCV genome are conserved, likely due to functional requirements. Here, we provide the first genome-wide multiple sequence alignment (MSA) with the prediction of RNA secondary structures throughout all representative full-length HCV genomes. We selected 57 representative genomes by clustering all complete HCV genomes from the BV-BRC database based on k-mer distributions and dimension reduction and adding RefSeq sequences. We include annotations of previously recognized features for easy comparison to other studies. Our results indicate that mainly the core coding region, the C-terminal NS5A region, and the NS5B region contain secondary structure elements that are conserved beyond coding sequence requirements, indicating functionality on the RNA level. In contrast, the genome regions in between contain less highly conserved structures. The results provide a complete description of all conserved RNA secondary structures and make clear that functionally important RNA secondary structures are present in certain HCV genome regions but are largely absent from other regions. Full-genome alignments of all branches of Hepacivirus C are provided in the supplement.


Subjects
Conserved Sequence, Viral Genome, Hepacivirus, Nucleic Acid Conformation, Viral RNA, Hepacivirus/genetics, Viral RNA/genetics, Viral RNA/chemistry, Humans, Sequence Alignment, Hepatitis C/virology, Hepatitis C/genetics
2.
Nat Struct Mol Biol ; 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38867113

ABSTRACT

G-protein-coupled receptors (GPCRs) activate heterotrimeric G proteins by promoting guanine nucleotide exchange. Here, we investigate the coupling of G proteins with GPCRs and describe the events that ultimately lead to the ejection of GDP from its binding pocket in the Gα subunit, the rate-limiting step during G-protein activation. Using molecular dynamics simulations, we investigate the temporal progression of structural rearrangements of GDP-bound Gs protein (Gs·GDP; hereafter GsGDP) upon coupling to the β2-adrenergic receptor (β2AR) in atomic detail. The binding of GsGDP to the β2AR is followed by long-range allosteric effects that significantly reduce the energy needed for GDP release: the opening of α1-αF helices, the displacement of the αG helix and the opening of the α-helical domain. Signal propagation to the Gs occurs through an extended receptor interface, including a lysine-rich motif at the intracellular end of a kinked transmembrane helix 6, which was confirmed by site-directed mutagenesis and functional assays. From this β2AR-GsGDP intermediate, Gs undergoes an in-plane rotation along the receptor axis to approach the β2AR-Gsempty state. The simulations shed light on how the structural elements at the receptor-G-protein interface may interact to transmit the signal over 30 Å to the nucleotide-binding site. Our analysis extends the current limited view of nucleotide-free snapshots to include additional states and structural features responsible for signaling and G-protein coupling specificity.

3.
J Comput Biol ; 31(6): 549-563, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38935442

ABSTRACT

Extrinsic, experimental information can be incorporated into thermodynamics-based RNA folding algorithms in the form of pseudo-energies. Evolutionary conservation of RNA secondary structure elements is detectable in alignments of phylogenetically related sequences and provides evidence for the presence of certain base pairs that can also be converted into pseudo-energy contributions. We show that the centroid base pairs computed from a consensus folding model such as RNAalifold result in a substantial improvement of the prediction accuracy for single sequences. Evidence for specific base pairs turns out to be more informative than a position-wise profile for the conservation of the pairing status. A comparison with chemical probing data, furthermore, strongly suggests that phylogenetic base pairing data are more informative than position-specific data on (un)pairedness as obtained from chemical probing experiments. In this context we demonstrate, in addition, that the conversion of signal from probing data into pseudo-energies is possible using thermodynamic structure predictions as a reference instead of known RNA structures.


Subjects
Algorithms, Nucleic Acid Conformation, Phylogeny, RNA, Thermodynamics, RNA/chemistry, RNA/genetics, Base Pairing, RNA Folding, Base Sequence, Computational Biology/methods
4.
Methods Mol Biol ; 2802: 347-393, 2024.
Article in English | MEDLINE | ID: mdl-38819565

ABSTRACT

Over the last quarter of a century it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in the core information metabolism and comprise a surprisingly large component in bacterial gene regulation. A common theme with these types of mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. Pervasively transcribed, they give rise to a plethora of large and evolutionarily extremely flexible non-coding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a necessarily incomplete overview of the current state of comparative analysis of non-coding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.


Subjects
Computational Biology, Genomics, Humans, Computational Biology/methods, Genomics/methods, Nucleic Acid Conformation, RNA/genetics, Untranslated RNA/genetics, RNA Sequence Analysis/methods
5.
Methods Mol Biol ; 2726: 347-376, 2024.
Article in English | MEDLINE | ID: mdl-38780738

ABSTRACT

Structural changes in RNAs are an important contributor to controlling gene expression not only at the posttranscriptional stage but also during transcription. A subclass of riboswitches and RNA thermometers located in the 5' region of the primary transcript regulates the downstream functional unit, usually an ORF, through premature termination of transcription. Not only do such elements occur naturally, but they are also attractive devices in synthetic biology. The possibility to design such riboswitches or RNA thermometers is thus of considerable practical interest. Since these functional RNA elements act already during transcription, it is important to model and understand the dynamics of folding and, in particular, the formation of intermediate structures concurrently with transcription. Cotranscriptional folding simulations are therefore an important step to verify the functionality of design constructs before conducting expensive and labor-intensive wet lab experiments. For RNAs, full-fledged molecular dynamics simulations are far beyond practical reach because of both the size of the molecules and the timescales of interest. Even at the simplified level of secondary structures, further approximations are necessary. The BarMap approach is based on representing the secondary structure landscape for each individual transcription step by a coarse-grained representation that only retains a small set of low-energy local minima and the energy barriers between them. The folding dynamics between two transcriptional elongation steps is modeled as a Markov process on this representation. Maps between pairs of consecutive coarse-grained landscapes make it possible to follow the folding process as it changes in response to transcription elongation. In its original implementation, the BarMap software provides a general framework to investigate RNA folding dynamics on temporally changing landscapes. It is, however, difficult to use, in particular for specific scenarios such as cotranscriptional folding. To overcome this limitation, we developed the user-friendly BarMap-QA pipeline described in detail in this contribution. It is illustrated here by an elaborate example that emphasizes the careful monitoring of several quality measures. Using an iterative workflow, a reliable and complete kinetics simulation of a synthetic, transcription-regulating riboswitch is obtained using minimal computational resources. All programs and scripts used in this contribution are free software and available for download as a source distribution for Linux® or as a platform-independent Docker® image including support for Apple macOS® and Microsoft Windows®.


Subjects
Molecular Dynamics Simulation, Nucleic Acid Conformation, RNA Folding, Genetic Transcription, Riboswitch/genetics, RNA/chemistry, RNA/genetics, Software
6.
Methods Mol Biol ; 2802: 1-32, 2024.
Article in English | MEDLINE | ID: mdl-38819554

ABSTRACT

Most genes are part of larger families of evolutionarily related genes. The history of gene families typically involves duplications and losses of genes as well as horizontal transfers into other organisms. The reconstruction of detailed gene family histories, i.e., the precise dating of evolutionary events relative to the phylogenetic tree of the underlying species, has remained a challenging topic despite its importance as a basis for detailed investigations into adaptation and functional evolution of individual members of the gene family. The identification of orthologs, moreover, is a particularly important subproblem of the more general setting considered here. In the last few years, an extensive body of mathematical results has appeared that tightly links orthology, a formal notion of best matches among genes, and horizontal gene transfer. The purpose of this chapter is to broadly outline some of the key mathematical insights and to discuss their implications for practical applications. In particular, we focus on tree-free methods, i.e., methods to infer orthology or horizontal gene transfer as well as gene trees, species trees, and reconciliations between them without using a priori knowledge of the underlying trees or statistical models for the inference of phylogenetic trees. Instead, the initial step aims to extract binary relations among genes.


Subjects
Molecular Evolution, Horizontal Gene Transfer, Multigene Family, Phylogeny, Genetic Models, Computational Biology/methods
7.
Phys Rev E ; 109(3-1): 034303, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38632720

ABSTRACT

Graphs have become widely used to represent and study social, biological, and technological systems. Statistical methods to analyze empirical graphs were proposed based on the graph's spectral density. However, their running time is cubic in the number of vertices, precluding direct application to large instances. Thus, efficient algorithms to calculate the spectral density become necessary. For sparse graphs, the cavity method can efficiently approximate the spectral density of locally treelike undirected and directed graphs. However, it does not apply to most empirical graphs because they have heterogeneous structures. Thus, we propose methods for undirected and directed graphs with heterogeneous structures using a new definition of a vertex's neighborhood and the cavity approach. Our methods' time and space complexities are O(|E|h_{max}^{3}t) and O(|E|h_{max}^{2}t), respectively, where |E| is the number of edges, h_{max} is the size of the largest local neighborhood of a vertex, and t is the number of iterations required for convergence. We demonstrate the practical efficacy by estimating the spectral density of simulated and real-world undirected and directed graphs.

8.
Nat Genet ; 56(4): 721-731, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38622339

ABSTRACT

Coffea arabica, an allotetraploid hybrid of Coffea eugenioides and Coffea canephora, is the source of approximately 60% of coffee products worldwide, and its cultivated accessions have undergone several population bottlenecks. We present chromosome-level assemblies of a di-haploid C. arabica accession and modern representatives of its diploid progenitors, C. eugenioides and C. canephora. The three species exhibit largely conserved genome structures between diploid parents and descendant subgenomes, with no obvious global subgenome dominance. We find evidence for a founding polyploidy event 350,000-610,000 years ago, followed by several pre-domestication bottlenecks, resulting in narrow genetic variation. A split between wild accessions and cultivar progenitors occurred ~30.5 thousand years ago, followed by a period of migration between the two populations. Analysis of modern varieties, including lines historically introgressed with C. canephora, highlights their breeding histories and loci that may contribute to pathogen resistance, laying the groundwork for future genomics-based breeding of C. arabica.


Subjects
Coffea, Coffea/genetics, Coffee, Plant Genome/genetics, Metagenomics, Plant Breeding
10.
RNA Biol ; 21(1): 1-12, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38528797

ABSTRACT

The accurate classification of non-coding RNA (ncRNA) sequences is pivotal for advanced non-coding genome annotation and analysis, a fundamental aspect of genomics that facilitates understanding of ncRNA functions and regulatory mechanisms in various biological processes. While traditional machine learning approaches have been employed for distinguishing ncRNA, these often necessitate extensive feature engineering. Recently, deep learning algorithms have provided advancements in ncRNA classification. This study presents BioDeepFuse, a hybrid deep learning framework integrating convolutional neural networks (CNN) or bidirectional long short-term memory (BiLSTM) networks with handcrafted features for enhanced accuracy. This framework employs a combination of k-mer one-hot, k-mer dictionary, and feature extraction techniques for input representation. Extracted features, when embedded into the deep network, enable optimal utilization of spatial and sequential nuances of ncRNA sequences. Using benchmark datasets and real-world RNA samples from bacterial organisms, we evaluated the performance of BioDeepFuse. Results exhibited high accuracy in ncRNA classification, underscoring the robustness of our tool in addressing complex ncRNA sequence data challenges. The effective melding of CNN or BiLSTM with external features heralds promising directions for future research, particularly in refining ncRNA classifiers and deepening insights into ncRNAs in cellular processes and disease manifestations. In addition to its original application in the context of bacterial organisms, the methodologies and techniques integrated into our framework can potentially render BioDeepFuse effective in various and broader domains.


Subjects
Deep Learning, Untranslated RNA/genetics, Algorithms, RNA, Computer Neural Networks
11.
bioRxiv ; 2024 Jan 11.
Article in English | MEDLINE | ID: mdl-38260273

ABSTRACT

Biological relatedness is a key consideration in studies of behavior, population structure, and trait evolution. Except for parent-offspring dyads, pedigrees capture relatedness imperfectly. The number and length of DNA segments that are identical-by-descent (IBD) yield the most precise estimates of relatedness. Here, we leverage novel methods for estimating locus-specific IBD from low coverage whole genome resequencing data to demonstrate the feasibility and value of resolving fine-scaled gradients of relatedness in free-living animals. Using primarily 4-6× coverage data from a rhesus macaque (Macaca mulatta) population with available long-term pedigree data, we show that we can call the number and length of IBD segments across the genome with high accuracy even at 0.5× coverage. The resulting estimates demonstrate substantial variation in genetic relatedness within kin classes, leading to overlapping distributions between kin classes. They identify cryptic genetic relatives that are not represented in the pedigree and reveal elevated recombination rates in females relative to males, which allows us to discriminate maternal and paternal kin using genotype data alone. Our findings represent a breakthrough in the ability to understand the predictors and consequences of genetic relatedness in natural populations, contributing to our understanding of a fundamental component of population structure in the wild.

12.
Mol Ecol Resour ; 24(2): e13904, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37994269

ABSTRACT

Several computational frameworks and workflows that recover genomes from prokaryotes, eukaryotes and viruses from metagenomes exist. Yet, it is difficult for scientists with little bioinformatics experience to evaluate quality, annotate genes, dereplicate, assign taxonomy and calculate relative abundance and coverage of genomes belonging to different domains. MuDoGeR is a user-friendly tool tailored for those familiar with the Unix command-line environment that makes it easy to recover genomes of prokaryotes, eukaryotes and viruses from metagenomes, either alone or in combination. We tested MuDoGeR using 24 individual-isolated genomes and 574 metagenomes, demonstrating the applicability for a few samples and high throughput. While MuDoGeR can recover eukaryotic viral sequences, its characterization is predominantly skewed towards bacterial and archaeal viruses, reflecting the field's current state. However, acting as a dynamic wrapper, MuDoGeR is designed to constantly incorporate updates and integrate new tools, ensuring its ongoing relevance in the rapidly evolving field. MuDoGeR is open-source software available at https://github.com/mdsufz/MuDoGeR. MuDoGeR is also available as a Singularity container.


Subjects
Metagenome, Viruses, Metagenomics, Software, Bacteria/genetics, Phylogeny, Viruses/genetics
13.
Front Bioinform ; 3: 1322477, 2023.
Article in English | MEDLINE | ID: mdl-38152702

ABSTRACT

Proteinortho is a widely used tool to predict (co-)orthologous groups of genes for any set of species. It finds application in comparative and functional genomics, phylogenomics, and evolutionary reconstructions. With a rapidly increasing number of available genomes, the demand for large-scale predictions is also growing. In this contribution, we evaluate and implement major algorithmic improvements that significantly enhance the speed of the analysis without reducing precision. Graph-based detection of (co-)orthologs is typically based on a reciprocal best alignment heuristic that requires an all vs. all comparison of proteins from all species under study. The initial identification of similar proteins is accelerated by introducing an alternative search tool along with a revised search strategy, the pseudo-reciprocal best alignment heuristic, that reduces the number of required sequence comparisons by one-half. The clustering algorithm was reworked to efficiently decompose very large clusters and accelerate processing. Proteinortho6 reduces the overall processing time by an order of magnitude compared to its predecessor while maintaining its small memory footprint and good predictive quality.

14.
ACS Chem Biol ; 18(12): 2441-2449, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-37962075

ABSTRACT

The chemical biology of native nucleic acid modifications has seen an intense upswing, first concerning DNA modifications in the field of epigenetics and then concerning RNA modifications in a field that was correspondingly rebaptized epitranscriptomics by analogy. The German Research Foundation (DFG) has funded several consortia with a scientific focus in these fields, strengthening the traditionally well-developed nucleic acid chemistry community and inciting it to team up with colleagues from the life sciences and data science to tackle interdisciplinary challenges. This Perspective focuses on the genesis, scientific outcome, and downstream impact of the DFG priority program SPP1784 and offers insight into how it fecundated further consortia in the field. Pertinent research was funded from mid-2015 to 2022, including an extension related to the coronavirus pandemic. Despite being a detriment to research activity in general, the pandemic has resulted in tremendously boosted interest in the field of RNA and RNA modifications as a consequence of their widespread and successful use in vaccination campaigns against SARS-CoV-2. Funded principal investigators published over 250 pertinent papers with a very substantial impact on the field. The program also helped to redirect numerous laboratories toward this dynamic field. Finally, SPP1784 spawned initiatives for several funded consortia that continue to drive the fields of nucleic acid modification.


Subjects
Nucleic Acids, RNA, Genetic Epigenesis, Biology
15.
Bioinformatics ; 39(11), 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37944046

ABSTRACT

SUMMARY: RNA molecules play crucial roles in various biological processes. They mediate their function mainly by interacting with other RNAs or proteins. At present, information about these interactions is distributed over different resources, often providing the data in simple tab-delimited formats that differ between the databases. There is no standardized data format that can capture the nature of all these different interactions in detail. AVAILABILITY AND IMPLEMENTATION: Here, we propose the RNA interaction format (RIF) for the detailed representation of RNA-RNA and RNA-Protein interactions and provide reference implementations in C/C++, Python, and JavaScript. RIF is released under the GNU General Public License version 3 (GNU GPLv3) and is available at https://github.com/RNABioInfo/rna-interaction-format.


Subjects
RNA, Software, Factual Databases, Proteins
16.
Algorithms Mol Biol ; 18(1): 16, 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37940998

ABSTRACT

BACKGROUND: Evolutionary scenarios describing the evolution of a family of genes within a collection of species comprise the mapping of the vertices of a gene tree T to vertices and edges of a species tree S. The relative timing of the last common ancestors of two extant genes (leaves of T) and the last common ancestors of the two species (leaves of S) in which they reside is indicative of horizontal gene transfers (HGT) and ancient duplications. Orthologous gene pairs, on the other hand, require that their last common ancestor coincides with a corresponding speciation event. The relative timing information of gene and species divergences is captured by three colored graphs that have the extant genes as vertices and the species in which the genes are found as vertex colors: the equal-divergence-time (EDT) graph, the later-divergence-time (LDT) graph and the prior-divergence-time (PDT) graph, which together form an edge partition of the complete graph. RESULTS: Here we give a complete characterization in terms of informative and forbidden triples that can be read off the three graphs and provide a polynomial time algorithm for constructing an evolutionary scenario that explains the graphs, provided such a scenario exists. While both LDT and PDT graphs are cographs, this is not true for the EDT graph in general. We show that every EDT graph is perfect. While the information about LDT and PDT graphs is necessary to recognize EDT graphs in polynomial-time for general scenarios, this extra information can be dropped in the HGT-free case. However, recognition of EDT graphs without knowledge of putative LDT and PDT graphs is NP-complete for general scenarios. In contrast, PDT graphs can be recognized in polynomial-time. We finally connect the EDT graph to the alternative definitions of orthology that have been proposed for scenarios with horizontal gene transfer. With one exception, the corresponding graphs are shown to be colored cographs.

17.
Anim Microbiome ; 5(1): 48, 2023 Oct 05.
Article in English | MEDLINE | ID: mdl-37798675

ABSTRACT

BACKGROUND: Metagenomic data can shed light on animal-microbiome relationships and the functional potential of these communities. Over the past years, the generation of metagenomics data has increased exponentially, and so has the availability and reusability of data present in public repositories. However, identifying which datasets and associated metadata are available is not straightforward. We created the Animal-Associated Metagenome Metadata Database (AnimalAssociatedMetagenomeDB - AAMDB) to facilitate the identification and reuse of publicly available non-human, animal-associated metagenomic data, and metadata. Further, we used the AAMDB to (i) annotate common and scientific names of the species; (ii) determine the fraction of vertebrates and invertebrates; (iii) study their biogeography; and (iv) specify whether the animals were wild, pets, livestock or used for medical research. RESULTS: We manually selected metagenomes associated with non-human animals from SRA and MG-RAST. Next, we standardized and curated 51 metadata attributes (e.g., host, compartment, geographic coordinates, and country). The AAMDB version 1.0 contains 10,885 metagenomes associated with 165 different species from 65 different countries. From the collected metagenomes, 51.1% were recovered from animals associated with medical research or grown for human consumption (i.e., mice, rats, cattle, pigs, and poultry). Further, we observed an over-representation of animals collected in temperate regions (89.2%) and a lower representation of samples from the polar zones, with only 11 samples in total. The most common genus among invertebrate animals was Trichocerca (rotifers). CONCLUSION: Our work may guide host species selection in novel animal-associated metagenome research, especially in biodiversity and conservation studies. The data available in our database will allow scientists to perform meta-analyses and test new hypotheses (e.g., host-specificity, strain heterogeneity, and biogeography of animal-associated metagenomes), leveraging existing data. The AAMDB WebApp is a user-friendly interface that is publicly available at https://webapp.ufz.de/aamdb/.

18.
Evolution ; 77(11): 2378-2391, 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37724883

ABSTRACT

Some selection-based theories propose that genome streamlining, favoring smaller genome sizes, is advantageous in nutritionally limited environments, particularly under P-limitation. To test this prediction, we conducted several experimental evolution trials on clonal populations of a facultatively asexual rotifer that exhibits intraspecific variation in genome size. Most trials showed a rapid decline in clonal diversity, which was accelerated in populations that were initially nonadapted. Populations consisting of three rotifer clones often became monoclonal within a few weeks, while populations starting with 120 clones eroded to 10 multilocus genotypes, of which only five were abundant in higher numbers. While P-limitation affected population growth during the experiments, it did not affect the outcome of clonal competition or the speed at which clonal diversity was lost. Common garden transplant experiments revealed that the evolved populations were better adapted to the experimental conditions than the ancestral controls. However, contrary to expectations, the evolved populations did not show an overrepresentation of small genomes. Intermediate genomes were also frequently abundant, although very large genomes were rare. Our findings suggest that fitness is more influenced by genotypic differences among clones than by differences in genome size, and indicate that such differences might hinder genome streamlining during early adaptation to a new environment.


Subjects
Genetic Variation, Genome Size, Genotype
19.
NAR Genom Bioinform ; 5(3): lqad072, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37608800

ABSTRACT

The in silico prediction of non-coding and protein-coding genetic loci has received considerable attention in comparative genomics, aiming in particular at the identification of properties of nucleotide sequences that are informative of their biological role in the cell. We present here a software framework for the alignment-based training, evaluation and application of machine learning models with user-defined parameters. Instead of focusing on the one-size-fits-all approach of pervasive in silico annotation pipelines, we offer a framework for the structured generation and evaluation of models based on arbitrary features and input data, focusing on stable and explainable results. Furthermore, we showcase the usage of our software package in a full-genome screen of Drosophila melanogaster and evaluate our results against the well-known but much less flexible program RNAz.

20.
J Integr Bioinform ; 20(3), 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37615674

ABSTRACT

The differentiation of regions with coding potential from non-coding regions remains a key task in computational biology. Methods such as RNAcode that exploit patterns of sequence conservation for this task have a substantial advantage in classification accuracy, in particular for short coding sequences, compared to methods that rely on a single input sequence. However, they require sequence alignments as input. Frequently, suitable multiple sequence alignments are not readily available and are tedious and sometimes difficult to construct. We therefore introduce here a new web service that provides access to the well-known coding sequence detector RNAcode with minimal user overhead. It requires as input only a single target nucleotide sequence. The service automates the collection, selection, and preparation of homologous sequences from the NCBI database, as well as the construction of the multiple sequence alignments that are needed as input for RNAcode. The service automates the entire pre- and postprocessing and thus makes the investigation of specific genomic regions for previously unannotated coding regions, such as small peptides or additional introns, a simple task that is easily accessible to non-expert users. RNAcode_Web is accessible online at rnacode.bioinf.uni-leipzig.de.


Subjects
Genomics, Software, Open Reading Frames, Sequence Alignment, Computational Biology/methods