ABSTRACT
Genomic studies of sequence composition employ several approaches, such as calculating the proportion of guanine and cytosine in a given sequence (GC content, GC%), which can shed light on many aspects of an organism's biology. In this context, GC% can provide insights into virus-host relationships and evolution. Here, we present a comprehensive gene-by-gene analysis of 61 representatives of the phylum Nucleocytoviricota, which comprises the viruses with the largest genomes known in the virosphere. Parameters were evaluated not only on the basis of the average GC% of a given viral species compared with the entire phylum, but also considering gene position and phylogenetic history. Our results reveal that while some families exhibit similar GC% among their representatives (e.g., Marseilleviridae), others, such as Poxviridae, Phycodnaviridae, and Mimiviridae, have members with discrepant GC% values, likely reflecting adaptation to specific biological cycles and hosts. Interestingly, certain genes located in terminal regions or within specific genomic clusters show GC% values distinct from the average, suggesting recent acquisition or unique evolutionary pressures. Horizontal gene transfer and the presence of potential paralogs were also assessed in the genes with the most discrepant GC% values, indicating multiple evolutionary histories. Taken together, to the best of our knowledge, this study represents the first global, gene-by-gene analysis of GC% distribution and profiles within the genomes of Nucleocytoviricota members, highlighting their diversity and identifying potential new targets for future studies.
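The gene-by-gene comparison described above can be illustrated with a minimal Python sketch. It assumes each genome is supplied as a mapping from gene name to nucleotide sequence, uses the mean GC% across genes as a stand-in for the species average, and flags genes whose GC% lies at least 2 standard deviations from that mean. The function names and the cutoff are hypothetical choices, not the authors' pipeline.

```python
from statistics import mean, pstdev


def gc_percent(seq: str) -> float:
    """GC% of a nucleotide sequence, ignoring characters other than A, C, G, T/U."""
    seq = seq.upper()
    informative = sum(seq.count(base) for base in "ACGTU")
    if informative == 0:
        raise ValueError("sequence contains no A/C/G/T/U characters")
    return 100.0 * (seq.count("G") + seq.count("C")) / informative


def flag_discrepant_genes(genes: dict[str, str], z_cutoff: float = 2.0) -> dict[str, float]:
    """Return {gene: z-score} for genes whose GC% deviates from the mean gene GC% by >= z_cutoff SDs.

    The mean over genes is a hypothetical proxy for the species-level average GC%.
    """
    gc = {name: gc_percent(seq) for name, seq in genes.items()}
    values = list(gc.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {}  # every gene has the same GC%, so nothing stands out
    z_scores = {name: (value - mu) / sigma for name, value in gc.items()}
    return {name: z for name, z in z_scores.items() if abs(z) >= z_cutoff}
```

Genes flagged this way are only candidates; as the abstract notes, their genomic position and phylogenetic history (for example, horizontal transfer or paralogy) are needed to interpret the discrepancy.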
ABSTRACT
The standard genetic code specifies that in most species, including viruses, 20 amino acids are encoded by 61 codons, while the remaining three codons are stop triplets. Considered across the whole proteome, each species has its own amino acid frequencies; because these frequencies change slowly, closely related species display similar GC content and amino acid usage. In contrast, distantly related species display different amino acid frequencies. Furthermore, within certain multicellular species, such as mammals, intragenomic differences in amino acid usage are evident. In this communication, we summarize some of the most prominent and well-established factors that determine differences in amino acid usage, both across evolution and within genomes.
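As a sketch of the counting behind the first sentence, the snippet below builds the standard genetic code (NCBI translation table 1, written in the usual TCAG codon order) and computes amino acid frequencies from an in-frame coding sequence. The helper name `amino_acid_frequencies` is hypothetical, and a proteome-wide analysis would aggregate over all coding sequences of a species.

```python
from collections import Counter
from itertools import product

# Standard code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def amino_acid_frequencies(cds: str) -> dict[str, float]:
    """Relative amino acid frequencies of an in-frame coding sequence (stop codons skipped)."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]  # drops a trailing partial codon
    residues = [STANDARD_CODE[c] for c in codons if STANDARD_CODE.get(c, "*") != "*"]
    total = len(residues)
    return {aa: n / total for aa, n in Counter(residues).items()} if total else {}
```

Iterating over `STANDARD_CODE` recovers the figures quoted in the text: 61 sense codons, 3 stop codons, and 20 distinct amino acids.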
Subject(s)
Amino Acids , Genetic Code , Animals , Amino Acids/genetics , Codon/genetics , Base Composition , Proteome/genetics , Evolution, Molecular , Mammals/genetics
ABSTRACT
MicroRNAs are small RNAs that regulate gene expression through complementary base pairing with their target mRNAs. A substantial understanding of microRNA target recognition and repression mechanisms has been reached using diverse empirical and bioinformatic approaches, primarily in in vitro biochemical or cell culture perturbation settings. We sought to determine whether rules of microRNA target efficacy could be inferred from extensive gene expression data from human tissues. A transcriptome-wide assessment of the efficacy of all canonical microRNA-mRNA interactions was performed using a normalized Spearman correlation (Z-score) between transcript abundances in the PRAD-TCGA dataset tissues (RNA-seq for mRNAs and small RNA-seq for microRNAs, 546 samples). Using the Z-score of correlation as a surrogate marker of microRNA target efficacy, we confirmed hallmarks of microRNAs, such as repression of their targets, the hierarchy of preference for gene regions (3'UTR > CDS > 5'UTR) and seed length (6mer < 7mer < 8mer), as well as the contribution of 3'-supplementary pairing at nucleotides 13-16 of the microRNA. Interactions mediated by 6mer + supplementary sites showed inferred repression similar to that of 7mer sites, suggesting that 6mer + supplementary sites may be relevant in vivo. However, aggregated 7mer-A1 seeds appear more repressive than 7mer-m8 seeds, whereas the two appear similar when 3'-supplementary pairing is possible. We then examined 3'-supplementary pairing using 39 microRNAs with Z-score-inferred repressive 3'-supplementary interactions. The approach was sensitive to the offset of the bridge between the seed and 3'-supplementary pairing sites, and the pattern of offset-associated repression supports previous findings. The 39 microRNAs with effective repressive 3'-supplementary sites show low GC content at positions 13-16.
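The correlation-based efficacy score can be sketched in plain Python. This version assumes that one microRNA's abundance and each candidate target mRNA's abundance are given as equal-length lists over the same samples, computes Spearman's rho from average ranks, and then normalizes each rho as a z-score against all rhos for that microRNA. That normalization, and all function names, are assumptions for illustration; the abstract does not specify exactly how the Z-score was derived.

```python
from statistics import mean, pstdev


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def spearman_z_scores(mirna: list[float], mrnas: dict[str, list[float]]) -> dict[str, float]:
    """Z-score of each mRNA's rho relative to all rhos for this microRNA (hypothetical normalization)."""
    rhos = {gene: spearman(mirna, expr) for gene, expr in mrnas.items()}
    values = list(rhos.values())
    mu, sd = mean(values), pstdev(values)
    return {gene: (rho - mu) / sd for gene, rho in rhos.items()}
```

Under this scheme, the most negative z-scores mark the mRNAs whose abundance is most anticorrelated with the microRNA, which is the signal a repressive interaction would be expected to leave.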
Our study suggests that the transcriptome-wide analysis of microRNA-mRNA correlations may uncover hints of microRNA targeting determinants. Finally, we provide a bioinformatic tool to identify microRNA-mRNA candidate interactions based on the sequence complementarity of the seed and 3'-supplementary regions.
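The seed-site hierarchy follows the usual canonical definitions: a 6mer pairs with microRNA nucleotides 2-7, a 7mer-m8 extends that pairing to nucleotide 8, a 7mer-A1 adds an A opposite nucleotide 1, and an 8mer has both. The sketch below classifies such sites in an mRNA region and checks, within a hypothetical upstream window, for pairing to nucleotides 13-16. It is an illustration under these assumptions, not the tool released with the study, and it does not model the offset scoring examined there.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def canonical_sites(mirna: str, utr: str) -> list[tuple[int, str]]:
    """Classify canonical seed sites; both sequences are 5'->3' in the RNA alphabet.

    Returns (index of the 6mer core in `utr`, site type) for each match.
    """
    core = reverse_complement(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8_base = COMPLEMENT[mirna[7]]         # mRNA base that pairs with miRNA nucleotide 8
    sites = []
    start = utr.find(core)
    while start != -1:
        end = start + len(core)
        has_m8 = start > 0 and utr[start - 1] == m8_base
        has_a1 = end < len(utr) and utr[end] == "A"  # A opposite miRNA nucleotide 1
        if has_m8 and has_a1:
            kind = "8mer"
        elif has_m8:
            kind = "7mer-m8"
        elif has_a1:
            kind = "7mer-A1"
        else:
            kind = "6mer"
        sites.append((start, kind))
        start = utr.find(core, start + 1)
    return sites


def has_supplementary_pairing(mirna: str, utr: str, core_start: int, max_offset: int = 10) -> bool:
    """Look for a match to miRNA nucleotides 13-16 on the 5' side of a seed site.

    `max_offset` is a hypothetical window parameter, not the study's setting.
    """
    supp = reverse_complement(mirna[12:16])
    window_end = max(0, core_start - 1)  # skip the base opposite miRNA nucleotide 8
    window_start = max(0, window_end - max_offset - len(supp))
    return supp in utr[window_start:window_end]
```

For example, with let-7a (`UGAGGUAGUAGGUUGUAUAGUU`) the 6mer core in the mRNA is `UACCUC`, and `CUACCUCA` is an 8mer site.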
ABSTRACT
The genetic material of the three domains of life (Bacteria, Archaea, and Eukaryota) is always double-stranded DNA, and their GC content (molar content of guanine plus cytosine) varies between ≈13% and ≈75%. Nucleotide composition is the simplest way of characterizing genomes. Despite this simplicity, it has several implications. Indeed, it is the main factor that determines, among other features, dinucleotide frequencies, repeated short DNA sequences, and codon and amino acid usage. Which forces drive this strong variation is still a matter of controversy. For rather obvious reasons, most studies of this huge variation and its consequences have been done in free-living organisms. However, no recent comprehensive study of all known viruses (that is, of all available sequences) has been done. Viruses, by far the most abundant biological entities on Earth, are the causative agents of many diseases. An overview of these entities is also important because their genetic material is not always double-stranded DNA: certain viruses have single-stranded DNA, double-stranded RNA, or single-stranded RNA genomes, and some are retro-transcribing. Therefore, one may wonder whether what we have learned about the evolution of GC content and its implications in prokaryotes and eukaryotes also applies to viruses. In this contribution, we describe the compositional properties of ~10,000 viral species: base composition (globally and according to the Baltimore classification), correlations among non-coding regions and the three codon positions, and the relationship between the nucleotide frequencies and codon usage of viruses and the same features of their hosts. This allowed us to determine that the base composition of phages correlates strongly with that of their respective hosts, whereas that of eukaryotic viruses does not (viruses of fungi and protists being the exceptions).
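GC content at the three codon positions (GC1, GC2, GC3), one of the compositional properties compared here, can be computed as in the following sketch. It assumes an in-frame coding sequence that starts at its first codon and discards any trailing partial codon; the function name is hypothetical. GC content of a non-coding region would simply be the GC fraction over the whole region.

```python
def gc_by_codon_position(cds: str) -> tuple[float, ...]:
    """Return (GC1, GC2, GC3) in percent for an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    n_codons = len(cds) // 3
    if n_codons == 0:
        raise ValueError("need at least one complete codon")
    percentages = []
    for position in range(3):
        # Every third base starting at this codon position, within complete codons only.
        bases = cds[position:3 * n_codons:3]
        percentages.append(100.0 * sum(base in "GC" for base in bases) / n_codons)
    return tuple(percentages)
```

Comparing such per-position values between a virus and its host, or against the GC content of non-coding regions, gives the kind of correlations summarized above.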
Finally, we discuss some of these results concerning codon usage: reinforcing previous results, we found that phages and their hosts exhibit moderate to high correlations, whereas correlations between eukaryotes and their viruses are weak or absent.
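One simple way to quantify how similar a virus's codon usage is to its host's is to correlate their relative frequencies over the 61 sense codons. The sketch below does this with a Pearson correlation; the choice of statistic, the function names, and the assumption that each input is a concatenation of in-frame coding sequences are illustrative, not necessarily the method used in the study.

```python
from collections import Counter
from itertools import product
from math import sqrt

STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = [c for c in ("".join(p) for p in product("TCAG", repeat=3)) if c not in STOP_CODONS]


def codon_frequencies(cds: str) -> list[float]:
    """Relative frequency of each of the 61 sense codons in an in-frame sequence."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    total = sum(counts[c] for c in SENSE_CODONS)
    if total == 0:
        raise ValueError("no sense codons found")
    return [counts[c] / total for c in SENSE_CODONS]


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def codon_usage_correlation(virus_cds: str, host_cds: str) -> float:
    """Pearson r between virus and host sense-codon frequency vectors."""
    return pearson(codon_frequencies(virus_cds), codon_frequencies(host_cds))
```

A value near 1 means the virus uses codons in nearly the same proportions as its host; across many virus-host pairs, such values can be summarized per host group.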
ABSTRACT
Several universal genomic traits affect trade-offs in the capacity, cost, and efficiency of the biochemical information processing that underpins metabolism and reproduction. We analyzed the role of these traits in mediating the responses of a planktonic microbial community to nutrient enrichment in an oligotrophic, phosphorus-deficient pond in Cuatro Ciénegas, Mexico. This is one of the first whole-ecosystem experiments to involve replicated metagenomic assessment. Mean bacterial genome size, GC content, total number of tRNA genes, total number of rRNA genes, and codon usage bias in ribosomal protein sequences were all higher in the fertilized treatment, as predicted from the assumption that oligotrophy favors lower information-processing costs whereas copiotrophy favors higher processing rates. Contrasting changes in trait variances also suggested that traits differ in how they mediate community assembly under copiotrophic versus oligotrophic conditions. Trade-offs in information-processing traits are apparently pronounced enough to play a role in community assembly because the major components of metabolism (information, energy, and nutrient requirements) are fine-tuned to an organism's growth and trophic strategy.
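The abstract does not name the codon usage bias statistic, so as a stand-in the sketch below computes one standard measure, Wright's (1990) effective number of codons (ENC or Nc), from an in-frame coding sequence such as concatenated ribosomal protein genes. Nc ranges from 20 (one codon per amino acid) to 61 (all synonyms used equally), so stronger bias gives a lower value. Wright's special rules for degeneracy classes without data are not implemented: the hypothetical function simply returns `None` in that case.

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Optional

# Standard code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

SYNONYMS = defaultdict(list)
for codon, aa in STANDARD_CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)

# Number of amino acids in each degeneracy class (2-, 3-, 4- and 6-fold) of the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def effective_number_of_codons(cds: str) -> Optional[float]:
    """Wright's Nc from per-amino-acid homozygosity F = (n * sum(p^2) - 1) / (n - 1)."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    homozygosities = defaultdict(list)
    for aa, codons in SYNONYMS.items():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F is undefined for n < 2
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        homozygosities[k].append((n * sum_p2 - 1) / (n - 1))
    nc = 2.0  # one codon each for Met and Trp
    for k, n_aa in CLASS_SIZES.items():
        values = homozygosities.get(k)
        if not values:
            return None  # simplification: missing-class corrections not implemented
        mean_f = sum(values) / len(values)
        if mean_f <= 0:
            return None
        nc += n_aa / mean_f
    return min(nc, 61.0)  # values above 61 are conventionally capped
```

Under this measure, the "higher codon usage bias" reported for the fertilized treatment would correspond to lower Nc values.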
Subject(s)
Bacteria/genetics , Bacteria/metabolism , Ecosystem , Genome, Bacterial/genetics , Metagenome/genetics , Base Composition/genetics , Codon Usage/genetics , Fertilizers , Mexico , Plankton/genetics , Plankton/metabolism , Plankton/microbiology , Ponds/microbiology , Protein Biosynthesis/genetics , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism , RNA, Transfer/genetics , RNA, Transfer/metabolism
ABSTRACT
INTRODUCTION: Tuberculosis is listed among the top 10 causes of death worldwide. Drug-resistant strains of its causative agent are considered responsible for public health emergencies and health security threats. According to the World Health Organization (WHO), an estimated 558,000 cases with resistance to rifampicin (the most effective first-line drug) have occurred to date. Therefore, to detect resistant strains from the genomes of Mycobacterium tuberculosis (MTB), we propose a new methodology for the analysis of genomic similarity that combines different levels of genome decomposition (discrete non-decimated wavelet transform) with the Hurst exponent. METHODS: Signals for the ten analyzed sequences were obtained from their GC content and then decomposed using the discrete non-decimated wavelet transform with the Daubechies wavelet with four vanishing moments at five levels of decomposition. The Hurst exponent was calculated at each decomposition level using five different methods. Cluster analysis was performed on the resulting Hurst exponents. RESULTS: The aggregated variance, differenced aggregated variance, and aggregated absolute value methods yielded three groups, whereas the Peng and R/S methods yielded two groups. The aggregated variance method gave the best grouping of similar strains. CONCLUSION: The Hurst exponent combined with the discrete non-decimated wavelet transform can be used as a measure of similarity between genome sequences, allowing a more refined analysis.
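The aggregated variance method, which gave the best grouping here, estimates the Hurst exponent H from how the variance of block means shrinks with block size m: for a long-range dependent signal, Var ∝ m^(2H - 2), so H = 1 + slope/2 on a log-log fit. The sketch below assumes a simple 0/1 GC indicator as the signal and omits the non-decimated wavelet step (which could, for instance, use PyWavelets' `pywt.swt` with a `db4` wavelet); the encoding and function names are hypothetical.

```python
import math
import random


def gc_indicator(seq: str) -> list[float]:
    """Map a DNA sequence to a 0/1 signal: 1.0 for G or C, 0.0 otherwise (hypothetical encoding)."""
    return [1.0 if base in "GCgc" else 0.0 for base in seq]


def hurst_aggregated_variance(signal: list[float], block_sizes: list[int]) -> float:
    """Estimate H from Var(block means) ~ m**(2H - 2) by least squares on the log-log scale."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(signal) // m
        if n_blocks < 2:
            continue  # too few blocks to estimate a variance
        block_means = [sum(signal[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
        grand = sum(block_means) / n_blocks
        var = sum((b - grand) ** 2 for b in block_means) / n_blocks
        if var > 0:
            log_m.append(math.log(m))
            log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    mx = sum(log_m) / len(log_m)
    my = sum(log_var) / len(log_var)
    slope = (sum((a - mx) * (b - my) for a, b in zip(log_m, log_var))
             / sum((a - mx) ** 2 for a in log_m))
    return 1.0 + slope / 2.0


if __name__ == "__main__":
    random.seed(1)
    noise = [random.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    # Uncorrelated noise: the estimate should land near 0.5.
    print(round(hurst_aggregated_variance(noise, [2 ** k for k in range(2, 10)]), 2))
```

For uncorrelated noise H is close to 0.5; values above 0.5 indicate persistent long-range correlations, and per-level H values of this kind are what the clustering step compares across strains.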
Subject(s)
Humans , Genome, Bacterial/genetics , Wavelet Analysis , Models, Genetic , Mycobacterium tuberculosis/genetics
ABSTRACT
In the present work, we performed a comparative genome-wide analysis of 22 species representative of the main clades and lifestyles of the phylum Platyhelminthes. We selected a set of 700 orthologous genes conserved in all species and measured changes in GC content, codon usage, and amino acid usage at orthologous positions. GC content at the third codon position spanned a wide range, allowing us to discriminate two distinct clusters within freshwater turbellarians, cestodes, and trematodes, respectively. Furthermore, hierarchical clustering of codon usage data differs remarkably from the phylogenetic tree. Additionally, we detected a synonymous codon usage bias that was more pronounced in extremely GC-poor or GC-rich genomes; i.e., GC-poor schistosomes preferred A/T-ending synonymous codons, while GC-rich M. lignano showed the opposite behavior. Interestingly, these biases affected amino acid usage, with preferred amino acids being those encoded by codons that follow the GC content trend. These preferences are associated with non-synonymous substitutions at orthologous positions. Detailed analysis of synonymous and non-synonymous changes provides evidence for a two-hit mechanism in which both mutation and selection drive the diverse coding strategies of flatworms.
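Preference for A/T- versus G/C-ending synonymous codons is commonly summarized with relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonyms of its amino acid were used equally. The sketch below computes RSCU for an in-frame sequence and the share of preferred codons (RSCU > 1) that end in G or C. This is an illustrative measure, not necessarily the one used in the study, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard code (NCBI table 1); codons enumerated TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

SYNONYMS = defaultdict(list)
for codon, aa in STANDARD_CODE.items():
    if aa != "*":
        SYNONYMS[aa].append(codon)


def rscu(cds: str) -> dict[str, float]:
    """Relative synonymous codon usage: observed count / count expected under equal synonym use."""
    cds = cds.upper().replace("U", "T")
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for aa, codons in SYNONYMS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent, so RSCU is undefined for its codons
        expected = total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


def gc_ending_share_of_preferred(values: dict[str, float]) -> float:
    """Fraction of preferred codons (RSCU > 1) whose third base is G or C."""
    preferred = [c for c, v in values.items() if v > 1.0]
    if not preferred:
        raise ValueError("no preferred codons")
    return sum(c[-1] in "GC" for c in preferred) / len(preferred)
```

A share near 0 would match the A/T-ending preference described for schistosomes, and a share near 1 the opposite pattern described for M. lignano.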
ABSTRACT
The human genome, like the genomes of all mammals and birds, is a mosaic of isochores, which are very long stretches (≫100 kb) of DNA that are homogeneous in base composition. Isochores can be divided into a small number of families that cover a broad range of GC levels (GC is the molar ratio of guanine + cytosine in DNA). In the human genome, we find five families, which are (going from GC-poor to GC-rich) L1, L2, H1, H2, and H3. This organization has important functional consequences, for example for gene concentration, gene regulation, transcription levels, recombination rates, replication timing, etc. Furthermore, the existence of isochores leads to the so-called "compositional correlations", which means that, insofar as sequences are located in different isochore families, all of their regions (exons and their three codon positions, introns, etc.) change their GC content, and as a consequence both codon and amino acid usage change in each isochore family. Finally, we discuss the origin of isochores within an evolutionary framework.
Subject(s)
Humans , Genome, Human/genetics , Isochores/genetics , Base Composition , Introns/genetics
ABSTRACT
The digestive tract of triatomines (DTT) is an ecological niche favored by microbiota whose enzymatic profile is adapted to the specific substrates available in this medium. This report describes the molecular enzymatic properties that promote bacterial prominence in the DTT. The microbiota composition was assessed previously on the basis of 16S ribosomal DNA, and whole-genome sequences of bacteria from the same genera were used to calculate the GC content of rare and prominent bacterial species in the DTT. The enzymatic reactions encoded by the coding sequences of rare and common bacterial species were then compared, revealing key functions that explain why some genera outcompete others in the DTT. The representativeness of the DTT microbiota was investigated by shotgun sequencing of DNA extracted from bacteria grown in liquid Luria-Bertani (LB) broth. Results showed that GC-rich bacteria outcompete GC-poor bacteria and are the dominant components of the DTT microbiota. In addition, oxidoreductases are the main enzymatic components of these bacteria. In particular, nitrate reductases (anaerobic respiration), oxygenases (catabolism of complex substrates), acetate-CoA ligase (tricarboxylic acid cycle and energy metabolism), and kinases (signaling pathways) were the major enzymatic determinants, together with a large group of minor enzymes, including hydrogenases involved in energy and amino acid metabolism. In conclusion, despite their slower growth in liquid LB medium, bacteria from GC-rich genera outcompete GC-poor bacteria because their specific enzymatic abilities confer a selective advantage in the DTT.
ABSTRACT
We characterised and report the first full-length genomes of Human T-cell Lymphotropic Virus Type 1 subgroup HTLV-1aD (CV21 and CV79). This subgroup is one of the major determinants of HTLV-1 infections in North and West Africa, and recombinant strains involving this subgroup have recently been demonstrated. The CV21 and CV79 strains from Cape Verde, Africa, were characterised as pure HTLV-1aD genomes; comparative analyses including HTLV-1 subtypes and subgroups revealed HTLV-1aD signatures in the envelope, pol, and pX regions. These genomes provide original information that will contribute to further studies of HTLV-1a epidemiology and evolution.
Subject(s)
Humans , Genome-Wide Association Study , Human T-lymphotropic virus 1/genetics , Cabo Verde , Phylogeny
ABSTRACT
Accessions of Plectranthus barbatus (Lamiaceae), a medicinal plant, were investigated using a cytogenetic approach and flow cytometry (FCM). Here, we describe for the first time details of the karyotype, including chromosome morphology, physical mapping of GC-rich bands (CMA3 banding), and the mapping of 45S and 5S rDNA sites. All accessions studied showed karyotypes with 2n = 30 small metacentric and submetacentric chromosomes. CMA3 banding and fluorescent in situ hybridization revealed coincidence between CMA3 bands and 45S rDNA sites (six terminal marks), whereas the 5S rDNA showed four subterminal marks that did not coincide with CMA3 bands. For nuclear genome size measurement, the FCM procedure provided histograms with G0/G1 peaks exhibiting coefficients of variation (CV) between 2.0 and 4.9%, and the mean value obtained for the species was 2C = 2.78 pg, with AT% = 61.08 and GC% = 38.92. The cytogenetic data obtained here provide new and important information for the characterization of Plectranthus barbatus.
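To put the reported 2C value in sequence-length terms, the widely used conversion 1 pg of DNA ≈ 978 Mbp (Doležel et al. 2003) can be applied, as in this small worked calculation. The conversion and the helper name are additions for illustration, not part of the original study.

```python
# Conversion factor from Doležel et al. (2003): 1 pg of DNA corresponds to about 978 Mbp.
MBP_PER_PG = 978.0


def one_c_value_mbp(two_c_pg: float) -> float:
    """Convert a 2C nuclear DNA content in picograms to an approximate 1C genome size in Mbp."""
    return (two_c_pg / 2.0) * MBP_PER_PG


two_c = 2.78                      # mean 2C value reported for P. barbatus
at_percent = 61.08                # reported AT%
gc_percent = 100.0 - at_percent   # about 38.92, consistent with the reported GC%
size = one_c_value_mbp(two_c)     # about 1359.42 Mbp per 1C genome
```

The estimate inherits the uncertainty of the FCM measurement and of the conversion factor, so it should be read as approximate.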
ABSTRACT
Aiming to generate a comprehensive genomic database on Elaeis spp., our group is leading several R&D initiatives with Elaeis guineensis (African oil palm) and Elaeis oleifera (American oil palm), including whole-genome sequencing of the latter. Genome size estimates currently available for this genus are controversial, as they indicate that the American oil palm genome is about half the size of the African oil palm genome and that the genome of the interspecific hybrid is larger than both parental genomes. We estimated the genome size of three E. guineensis genotypes, five E. oleifera genotypes, and two interspecific hybrid genotypes. On average, the genome size of E. guineensis is 4.32 ± 0.173 pg, while that of E. oleifera is 4.43 ± 0.018 pg. This indicates that both genomes are similar in size, although that of E. oleifera is in fact larger. As expected, the hybrid genome size, 4.40 ± 0.016 pg, is close to the average of the two parental genomes. Additionally, we demonstrate that both species have a GC content of around 38%. As our results contradict the currently available data on Elaeis spp. genome sizes, we propose that the actual genome size of the Elaeis species is around 4 pg and that the American oil palm has a larger genome than the African oil palm.
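The statement that the hybrid genome lies close to the average of the two parental genomes can be checked with a short calculation on the reported means. Standard deviations are ignored here, and the helper name is hypothetical.

```python
def midparent(a: float, b: float) -> float:
    """Arithmetic mean of two parental values."""
    return (a + b) / 2.0


guineensis_pg = 4.32   # mean 2C value, E. guineensis
oleifera_pg = 4.43     # mean 2C value, E. oleifera
hybrid_pg = 4.40       # mean 2C value, interspecific hybrid

expected = midparent(guineensis_pg, oleifera_pg)            # about 4.375 pg
deviation_pct = 100.0 * (hybrid_pg - expected) / expected   # about +0.57%
```

A deviation well under 1% of the midparent value is consistent with the hybrid genome size being intermediate between the parents, in contrast to earlier estimates that placed it above both.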
ABSTRACT
The concept of a 'proteomic constraint' proposes that the information content of the proteome exerts a selective pressure to reduce mutation rates, implying that larger proteomes produce a greater selective pressure to evolve or maintain DNA repair, resulting in a lower mutational load. Here, the distribution of 21 recombination repair genes was characterized across 900 bacterial genomes. Consistent with this prediction, the presence of 17 of these genes correlated with proteome size. Intracellular bacteria were marked by a pervasive absence of recombination repair genes, consistent with their small proteome sizes but also with alternative explanations, namely that reduced effective population size or a lack of recombination may decrease selection pressure. However, when only non-intracellular bacteria were examined, the relationship between proteome size and gene presence was maintained. In addition, the more widely distributed (i.e., conserved) a gene, the smaller the average size of the proteomes from which it was absent. Together, these observations are consistent with the operation of a proteomic constraint on DNA repair. Lastly, a correlation between gene absence and genome AT content was found, indicating a link between the absence of DNA repair and elevated genome AT content.
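The abstract does not state which test linked gene presence to proteome size. One simple option, sketched below with hypothetical numbers, is the point-biserial correlation, i.e. the Pearson correlation between a 0/1 presence vector and proteome size across genomes; the function name is an illustrative choice.

```python
from math import sqrt


def point_biserial(presence: list[int], proteome_sizes: list[float]) -> float:
    """Pearson correlation between a 0/1 gene-presence vector and proteome size (point-biserial r)."""
    if len(presence) != len(proteome_sizes) or len(presence) < 2:
        raise ValueError("need two equal-length vectors covering at least two genomes")
    mx = sum(presence) / len(presence)
    my = sum(proteome_sizes) / len(proteome_sizes)
    sxy = sum((p - mx) * (s - my) for p, s in zip(presence, proteome_sizes))
    sxx = sum((p - mx) ** 2 for p in presence)
    syy = sum((s - my) ** 2 for s in proteome_sizes)
    if sxx == 0 or syy == 0:
        raise ValueError("gene present in all or no genomes, or proteome size is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical example: the gene is absent from the two smallest proteomes.
presence = [0, 0, 1, 1]
sizes = [1000.0, 1200.0, 3000.0, 3200.0]   # number of proteins per genome (made-up values)
r = point_biserial(presence, sizes)        # strongly positive, close to 1
```

A positive r means the gene tends to be present in genomes with larger proteomes, which is the direction the proteomic-constraint hypothesis predicts.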
Subject(s)
Bacteria/genetics , Recombinational DNA Repair/genetics , Bacterial Proteins/genetics , Base Composition , Cluster Analysis , DNA Repair Enzymes/genetics , Genome, Bacterial , Models, Genetic , Proteome/genetics
ABSTRACT
The Naica Mine in northern Mexico is famous for its giant gypsum crystals, which reach up to 11 m in length and contain fluid inclusions that might have captured microorganisms during their formation. These crystals formed under particularly stable geochemical conditions in cavities filled with low-salinity hydrothermal water at 54-58°C. We explored the microbial diversity associated with these deep, saline hydrothermal waters, collected in the deepest (ca. 700-760 m) mineshafts, by amplifying, cloning, and sequencing small-subunit ribosomal RNA genes using primers specific for archaea, bacteria, and eukaryotes. Eukaryotes were not detectable in the samples, and the prokaryotic diversity identified was very low. Two archaeal operational taxonomic units (OTUs) were detected in one sample: one clustered with basal Thaumarchaeota lineages and the other with a large clade of environmental sequences branching at the base of the Thermoplasmatales within the Euryarchaeota. Bacterial sequences belonged to the Candidate Division OP3, Firmicutes, and the Alpha- and Beta-proteobacteria. Most of the lineages detected appear to be autochthonous to the Naica system, since their closest relatives were environmental sequences retrieved from deep sediments or the deep subsurface. In addition, the high GC content of the 16S rRNA gene sequences of the archaea and of some OP3 OTUs suggests that at least these lineages are thermophilic. Attempts to amplify diagnostic functional genes for methanogenesis (mcrA) and sulfate reduction (dsrAB) were unsuccessful, suggesting that those activities, if present, are not important in the aquifer. By contrast, genes encoding archaeal ammonia monooxygenase (AamoA) were amplified, suggesting that Naica Thaumarchaeota are involved in nitrification. These organisms are likely thermophilic chemolithoautotrophs adapted to thrive in an extremely energy-limited environment.