Results 1 - 20 of 24
1.
J Transl Med ; 22(1): 756, 2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39135093

ABSTRACT

BACKGROUND: Decoding human genomic sequences requires comprehensive analysis of DNA sequence functionality. Through computational and experimental approaches, researchers have studied the genotype-phenotype relationship and generated important datasets that help unravel complicated genetic blueprints. Thus, the recently developed artificial intelligence methods can be used to interpret the functions of those DNA sequences. METHODS: This study explores the use of deep learning, particularly pre-trained genomic models like DNA_bert_6 and human_gpt2-v1, in interpreting and representing human genome sequences. Initially, we meticulously constructed multiple datasets linking genotypes and phenotypes to fine-tune those models for precise DNA sequence classification. Additionally, we evaluate the influence of sequence length on classification results and analyze the impact of feature extraction in the hidden layers of our model using the HERV dataset. To enhance our understanding of phenotype-specific patterns recognized by the model, we perform enrichment, pathogenicity and conservation analyses of specific motifs in the human endogenous retrovirus (HERV) sequence with high average local representation weight (ALRW) scores. RESULTS: We have constructed multiple genotype-phenotype datasets displaying commendable classification performance in comparison with random genomic sequences, particularly in the HERV dataset, which achieved binary and multi-classification accuracies and F1 values exceeding 0.935 and 0.888, respectively. Notably, the fine-tuning of the HERV dataset not only improved our ability to identify and distinguish diverse information types within DNA sequences but also successfully identified specific motifs associated with neurological disorders and cancers in regions with high ALRW scores. Subsequent analysis of these motifs shed light on the adaptive responses of species to environmental pressures and their co-evolution with pathogens.
CONCLUSIONS: These findings highlight the potential of pre-trained genomic models in learning DNA sequence representations, particularly when utilizing the HERV dataset, and provide valuable insights for future research endeavors. This study represents an innovative strategy that combines pre-trained genomic model representations with classical methods for analyzing the functionality of genome sequences, thereby promoting cross-fertilization between genomics and artificial intelligence.


Subject(s)
Genome, Human , Genomics , Phenotype , Humans , Genomics/methods , Models, Genetic , Endogenous Retroviruses/genetics , Deep Learning , Genotype
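
Note on record 1: the abstract names the pre-trained model DNA_bert_6 but does not describe its input format. The sketch below assumes, based on the "6" in the model name, that sequences are split into overlapping 6-mers before being passed to the model. The function name `kmer_tokens` and the stride of 1 are illustrative assumptions, not details taken from the paper.

```python
def kmer_tokens(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1.

    Assumption: k-mer tokenization of this kind is the input format;
    the abstract itself does not specify it.
    """
    seq = sequence.upper()
    # A sequence shorter than k yields no tokens.
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    # Hypothetical 10-nucleotide input, used only for illustration.
    print(" ".join(kmer_tokens("ATGCGTACGT")))
```

A sequence of length n gives n - k + 1 tokens, so the sequence-length effects mentioned in the abstract translate directly into token counts under this assumption.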
2.
J Infect Dis ; 2024 Jun 17.
Article in English | MEDLINE | ID: mdl-38884588

ABSTRACT

BACKGROUND: The global resurgence of syphilis necessitates vaccine development. METHODS: We collected ulcer exudates and blood from 17 primary syphilis (PS) participants and skin biopsies and blood from 51 secondary syphilis (SS) participants in Guangzhou, China for Treponema pallidum subsp. pallidum (TPA) qPCR, whole genome sequencing (WGS), and isolation of TPA in rabbits. RESULTS: TPA DNA was detected in 15 of 17 ulcer exudates and 3 of 17 blood PS specimens. TPA DNA was detected in 50 of 51 SS skin biopsies and 27 of 51 blood specimens. TPA was isolated from 47 rabbits with success rates of 71% (12/17) and 69% (35/51), respectively, from ulcer exudates and SS bloods. We obtained paired genomic sequences from 24 clinical samples and corresponding rabbit isolates. Six SS14- and two Nichols-clade genome pairs contained rare discordances. Forty-one of the 51 unique TPA genomes clustered within SS14 subgroups largely from East Asia, while 10 fell into Nichols C and E subgroups. CONCLUSIONS: Our TPA detection rate was high from PS ulcer exudates and SS skin biopsies and over 50% from SS blood, with TPA isolation in over two-thirds of samples. Our results support the use of WGS from rabbit isolates to inform vaccine development.


The incidence of new cases of syphilis has skyrocketed globally in the twenty-first century. This global resurgence requires new strategies, including vaccine development. As part of an NIH funded Cooperative Research Center to develop a syphilis vaccine, we established a clinical research site in Guangzhou, China to better define the local syphilis epidemic and obtain samples from patients with primary and secondary syphilis for whole genome sequencing (WGS) of circulating Treponema pallidum strains. Inoculation of rabbits enabled us to obtain T. pallidum genomic sequences from spirochetes disseminating in blood, a compartment of immense importance for syphilis pathogenesis. Collectively, our results further clarify the molecular epidemiology of syphilis in southern China, enrich our understanding of the manifestations of early syphilis, and demonstrate that the genomic sequences of spirochetes obtained by rabbit inoculation accurately represent those of the spirochetes infecting the corresponding patients.

3.
IJID Reg ; 11: 100356, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38655560

ABSTRACT

Objectives: This study aimed to construct geographically, temporally, and epidemiologically representative data sets for SARS-CoV-2 in North Africa, focusing on Variants of Concern (VOCs), Variants of Interest (VOIs), and Variants Under Monitoring (VUMs). Methods: SARS-CoV-2 genomic sequences and metadata from the EpiCoV database via the Global Initiative on Sharing All Influenza Data platform were analyzed. Data analysis included cases, deaths, demographics, patient status, sequencing technologies, and variant analysis. Results: A comprehensive analysis of 10,783 viral genomic sequences from six North African countries revealed notable insights. SARS-CoV-2 sampling methods lack standardization, with a majority of countries lacking clear strategies. Over 59% of analyzed genomes lack essential clinical and demographic metadata, including patient age, sex, underlying health conditions, and clinical outcomes, which are essential for comprehensive genomic analysis and epidemiological studies, as submitted to the Global Initiative on Sharing All Influenza Data. Morocco reported the highest number of confirmed COVID-19 cases (1,272,490), whereas Tunisia leads in reported deaths (29,341), emphasizing regional variations in the pandemic's impact. The GRA clade emerged as predominant in North African countries. The lineage analysis showcased a diversity of 190 lineages in Egypt, 26 in Libya, 121 in Tunisia, 90 in Algeria, 146 in Morocco, and 10 in Mauritania. The temporal dynamics of SARS-CoV-2 variants revealed distinct waves driven by different variants. Conclusions: This study contributes valuable insights into the genomic landscape of SARS-CoV-2 in North Africa, highlighting the importance of genomic surveillance in understanding viral dynamics and informing public health strategies.

4.
mBio ; 15(5): e0069224, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38567955

ABSTRACT

Defective viral genomes (DVGs) are truncated derivatives of their parental viral genomes generated during an aberrant round of viral genomic replication. Distinct classes of DVGs have been identified in most families of both positive- and negative-sense RNA viruses. Importantly, DVGs have been detected in clinical samples from virally infected individuals and an emerging body of association studies implicates DVGs in shaping the severity of disease caused by viral infections in humans. Consequently, there is growing interest in understanding the molecular mechanisms of de novo DVG generation, how DVGs interact with the innate immune system, and harnessing DVGs as novel therapeutics and vaccine adjuvants to attenuate viral pathogenesis. This minireview focuses on single-stranded RNA viruses (excluding retroviridae), and summarizes the current knowledge of DVG generation, the functions and diversity of DVG species, the roles DVGs play in influencing disease progression, and their application as antivirals and vaccine adjuvants.


Subject(s)
Defective Viruses , Genome, Viral , Humans , Defective Viruses/genetics , Virus Replication , Animals , RNA Viruses/genetics , Immunity, Innate , Virus Diseases/virology , Virus Diseases/genetics , Virus Diseases/immunology
5.
Plant Dis ; 2023 Oct 26.
Article in English | MEDLINE | ID: mdl-37884481

ABSTRACT

Phytophthora parasitica is a highly destructive oomycete plant pathogen that is capable of infecting a wide range of hosts including many agricultural cash crops, fruit trees, and ornamental garden plants. One of the most important diseases caused by P. parasitica worldwide is black shank of tobacco. Rapid, sensitive, and specific pathogen detection is crucial for early diagnosis, which can facilitate effective disease management. In this study, we used a genomics approach to identify repeated sequences in the genome of P. parasitica by genome sequence alignment, and identified a 203 bp P. parasitica-specific sequence, PpM34, that is present in 31-60 copies in the genome. The P. parasitica genome-specificity of PpM34 was supported by PCR amplification of 24 genetically diverse strains of P. parasitica, 32 strains representing twelve other Phytophthora species, one Pythium species, six fungal species and three bacterial species, all of which are plant pathogens. Our PCR and real-time PCR assays showed that the PpM34 sequence was highly sensitive in specifically detecting P. parasitica. Finally, we developed a PpM34-based high-efficiency Recombinase Polymerase Amplification (RPA) assay, which allowed us to specifically detect as little as 1 pg of P. parasitica total DNA from both pure cultures and infected Nicotiana benthamiana at 39°C using a fluorometric thermal cycler. The sensitivity, specificity, convenience and rapidity of this assay represent a major improvement for early diagnosis of P. parasitica infection.

6.
Front Plant Sci ; 14: 1232588, 2023.
Article in English | MEDLINE | ID: mdl-37868307

ABSTRACT

Introduction: The garden petunia, Petunia hybrida (Solanaceae), is a fertile, diploid, annual hybrid species (2n=14) originating from P. axillaris and P. inflata 200 years ago. To understand the recent evolution of the P. hybrida genome, we examined tandemly repeated or satellite sequences using bioinformatic and molecular cytogenetic analysis. Methods: Raw reads from available genomic assemblies and survey sequences of P. axillaris N (PaxiN), P. inflata S6 (PinfS6), P. hybrida (PhybR27) and the P. parodii S7 (PparS7) sequenced here were used for graph- and k-mer-based cluster analysis with TAREAN and RepeatExplorer. Analyses of repeat-specific monomer lengths and sequence heterogeneity of the major tandem repeat families with more than 0.01% genome proportion were complemented by fluorescent in situ hybridization (FISH) using consensus sequences as probes to chromosomes of all four species. Results: Seven repeat families, PSAT1, PSAT3, PSAT4, PSAT5, PSAT6, PSAT7 and PSAT8, shared high consensus sequence similarity and organisation between the four genomes. Additionally, many degenerate copies were present. FISH in P. hybrida and in the three wild petunias confirmed the bioinformatics data and gave corresponding signals on all or some chromosomes. PSAT1 is located at the ends of all chromosomes except the 45S rDNA-bearing short arms of chromosomes II and III, and we classify it as a telomere-associated sequence (TAS). It is the most abundant satellite repeat with over 300,000 copies, 0.2% of the genomes. PSAT3 and the variant PSAT7 are located adjacent to the centromere or mid-arm of one to three chromosome pairs. PSAT5 has a strong signal at the end of the short arm of chromosome III in P. axillaris and P. inflata, while in P. hybrida additional interstitial sites were present. PSAT6 is located at the centromeres of chromosomes II and III. PSAT4 and PSAT8 were found with only short arrays.
Discussion: These results demonstrate that (i) repeat families occupy distinct niches within chromosomes, (ii) they differ in the copy number, cluster organization and homogenization events, and that (iii) the recent genome hybridization in breeding P. hybrida preserved the chromosomal position of repeats but affected the copy number of repetitive DNA.

7.
J Comput Biol ; 30(9): 1009-1018, 2023 09.
Article in English | MEDLINE | ID: mdl-37695837

ABSTRACT

Identifying viral variants through clustering is essential for understanding the composition and structure of viral populations within and between hosts, which play a crucial role in disease progression and epidemic spread. This article proposes and validates novel Monte Carlo (MC) methods for clustering aligned viral sequences by minimizing either entropy or Hamming distance from consensuses. We validate these methods on four benchmarks: two SARS-CoV-2 interhost data sets and two HIV intrahost data sets. A parallelized version of our tool is scalable to very large data sets. We show that both entropy and Hamming distance-based MC clusterings discern the meaningful information from sequencing data. The proposed clustering methods consistently converge to similar clusterings across different runs. Finally, we show that MC clustering improves reconstruction of intrahost viral population from sequencing data.


Subject(s)
COVID-19 , Humans , COVID-19/genetics , SARS-CoV-2/genetics , Benchmarking , Cluster Analysis , Disease Progression
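
Note on record 7: the abstract describes Monte Carlo clustering of aligned sequences by minimizing Hamming distance from cluster consensuses, without giving the procedure. The sketch below is one simple way such a search could look, not the authors' tool: it proposes random single-sequence moves and keeps a move only if the total distance does not increase. The function names, the acceptance rule, the step count and the fixed seed are all assumptions made for illustration.

```python
import random
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs: list[str]) -> str:
    """Column-wise majority character of a non-empty set of aligned sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def total_cost(seqs: list[str], labels: list[int], k: int) -> int:
    """Sum of Hamming distances from each sequence to its cluster consensus."""
    cost = 0
    for c in range(k):
        members = [s for s, lab in zip(seqs, labels) if lab == c]
        if members:  # empty clusters contribute nothing
            cons = consensus(members)
            cost += sum(hamming(s, cons) for s in members)
    return cost


def mc_cluster(seqs: list[str], k: int, steps: int = 2000, seed: int = 0):
    """Randomized local search (assumed acceptance rule: keep non-worsening moves)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in seqs]
    best = total_cost(seqs, labels, k)
    for _ in range(steps):
        i = rng.randrange(len(seqs))
        old = labels[i]
        labels[i] = rng.randrange(k)  # propose moving one sequence
        cost = total_cost(seqs, labels, k)
        if cost <= best:
            best = cost
        else:
            labels[i] = old  # reject the move
    return labels, best


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the paper.
    toy = ["AAAAAA", "AAAAAT", "AAAATA", "CCCCCC", "CCCCCG", "CCCGCC"]
    print(mc_cluster(toy, k=2))
```

Recomputing the full cost at every step keeps the sketch short; a scalable implementation would update per-column counts incrementally instead.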
8.
Genome Biol ; 24(1): 136, 2023 Jun 09.
Article in English | MEDLINE | ID: mdl-37296461

ABSTRACT

We propose a polynomial algorithm computing a minimum plain-text representation of k-mer sets, as well as an efficient near-minimum greedy heuristic. When compressing read sets of large model organisms or bacterial pangenomes, with only a minor runtime increase, we shrink the representation by up to 59% over unitigs and 26% over previous work. Additionally, the number of strings is decreased by up to 97% over unitigs and 90% over previous work. Finally, a small representation has advantages in downstream applications, as it speeds up SSHash-Lite queries by up to 4.26× over unitigs and 2.10× over previous work.


Subject(s)
Algorithms , Software , Sequence Analysis, DNA , Bacteria
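
Note on record 8: a plain-text representation of a k-mer set is a collection of strings whose k-mers are exactly the set, each occurring once. The sketch below illustrates the idea with a naive greedy path extension. It is not the paper's polynomial algorithm or its heuristic, and it ignores reverse complements, which real tools treat as equivalent. The names `kmer_set` and `greedy_cover` are hypothetical.

```python
def kmer_set(seqs: list[str], k: int) -> set[str]:
    """All distinct k-mers occurring in the given strings."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def greedy_cover(kmers: set[str], k: int) -> list[str]:
    """Greedily merge overlapping k-mers into strings; each k-mer is used once.

    Simplification (assumption): forward strand only, no reverse complements.
    """
    unused = set(kmers)
    out = []
    while unused:
        s = min(unused)  # deterministic choice of a starting k-mer
        unused.remove(s)
        extended = True
        while extended:  # extend to the right while a successor is unused
            extended = False
            for b in "ACGT":
                nxt = s[len(s) - k + 1:] + b
                if nxt in unused:
                    unused.remove(nxt)
                    s += b
                    extended = True
                    break
        extended = True
        while extended:  # then extend to the left
            extended = False
            for b in "ACGT":
                prv = b + s[:k - 1]
                if prv in unused:
                    unused.remove(prv)
                    s = b + s
                    extended = True
                    break
        out.append(s)
    return out


if __name__ == "__main__":
    ks = kmer_set(["ACGTACGA"], 3)  # hypothetical toy input
    strings = greedy_cover(ks, 3)
    print(strings, sum(map(len, strings)), "characters for", len(ks), "k-mers")
```

Each string of length m stores m - k + 1 k-mers, so fewer and longer strings mean fewer characters overall, which is the quantity the abstract reports shrinking.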
9.
Diseases ; 11(2)2023 Mar 31.
Article in English | MEDLINE | ID: mdl-37092436

ABSTRACT

During the COVID-19 pandemic caused by SARS-CoV-2, new waves have been associated with new variants that have the potential to escape vaccination. Therefore, it is useful to conduct retrospective genomic surveillance research. Herein, we present a detailed analysis of 88 SARS-CoV-2 genomes from samples taken from COVID-19 patients between October 2020 and April 2021 at the "Reina Sofía" Hospital (Murcia, Spain), focusing on variants that appeared later. The results for this period show a turning point: the 20E (EU1) variant was still prevalent (71.6%), but Alpha was rising rapidly and had reached 14.8%. Mutations of concern were found in 5 genomes classified as 20E (EU1); these mutations were not characteristic of this still little-evolved variant. Most of those mutations are found in the spike protein, namely Δ69-70, E484K, Q675H and P681H. However, a relevant deletion in ORF1a at positions 3675-3677 was also identified. These mutations have been reported in many later SARS-CoV-2 lineages, including Omicron. Taken together, our data suggest that preferentially emerging mutations could already have been present at an early stage of convergent evolution. In addition, the molecular information was compared with clinical data. Statistical analyses suggest that the correlation between age and severity criteria is significantly higher in the viral samples with more accumulated changes.

10.
Biosystems ; 226: 104869, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36858110

ABSTRACT

The sequencing of eukaryotic genomes has shown that tandem repeats are abundant in their sequences. In addition to affecting some cellular processes, tandem repeats in the genome may be associated with specific diseases and have been the key to resolving criminal cases. Any tool developed for detecting tandem repeats must be accurate, fast, and useable in thousands of laboratories worldwide, including those with not very advanced computing capabilities. The proposed method, the Rapid Perfect Tandem Repeat Finder (RPTRF), minimizes the need for excess character comparison processing by indexing the input file and significantly helps to accelerate and prepare the output without artifacts by using an interval tree in the filtering section. The experiments demonstrated that the RPTRF is very fast in discovering all perfect tandem repeats of all categories of any genomic sequences. Although the detection of imperfect TRs is not the focus of the RPTRF, comparisons show that it even outperforms some other tools (in five selected gold standards) designed explicitly for this purpose. The implemented tool and how to use it are available on GitHub.


Subject(s)
Genomics , Tandem Repeat Sequences , Base Sequence , Tandem Repeat Sequences/genetics , Sequence Analysis, DNA
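
Note on record 10: a perfect tandem repeat is a unit copied back to back with no mismatches. The sketch below is a naive scan that makes the definition concrete. It does not reproduce RPTRF, which the abstract says relies on indexing the input and on an interval tree for filtering. The function name, the parameters `max_period` and `min_copies`, and the choice to report only primitive units are assumptions.

```python
def perfect_tandem_repeats(seq: str, max_period: int = 6, min_copies: int = 2):
    """Return (start, unit, copies) for perfect tandem repeats found by a naive scan.

    Units that are themselves repeats of a shorter unit (e.g. "AA") are
    skipped so that a run is reported under its shortest period only.
    """
    hits = []
    n = len(seq)
    for p in range(1, max_period + 1):
        i = 0
        while i + 2 * p <= n:
            unit = seq[i:i + p]
            # A unit is primitive if it is not a rotation-free repeat of a shorter string.
            primitive = (unit + unit).find(unit, 1) == p
            copies = 1
            if primitive:
                while seq[i + copies * p:i + (copies + 1) * p] == unit:
                    copies += 1
            if primitive and copies >= min_copies:
                hits.append((i, unit, copies))
                i += copies * p  # jump past the reported run
            else:
                i += 1
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence for illustration.
    for start, unit, copies in perfect_tandem_repeats("GGACACACTT", max_period=3, min_copies=3):
        print(start, unit, copies)
```

The scan compares characters repeatedly at every position and period, which is the kind of excess comparison the abstract says an indexed approach avoids.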
11.
Biosystems ; 221: 104760, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36031064

ABSTRACT

The article is devoted to the author's results of the algebraic analysis of molecular genetic systems, including a set of structured DNA alphabets and long nucleotide sequences in single-stranded DNA of eukaryotic and prokaryotic genomes. A connection of the system of DNA n-plets alphabets with principles of algebraic holography is shown, which concerns a popular theme of holography principles in genetically inherited physiology. In addition, a relation between DNA n-plets alphabets and the Poincaré disk model of Lobachevski hyperbolic geometry is revealed. This relation can explain known facts of the relationship of physiological phenomena with hyperbolic geometry. Considering long DNA sequences as a bunch of many parallel texts written in different n-plets alphabets led to the discovery of some universal rules of the stochastic organization of genomic DNAs. These rules are discussed concerning the general problem of the biological dualism "probability-vs-determinism". In general, the presented results give pieces of evidence in favor of the efficiency of a model approach to living organisms as quantum-informational algebraic-harmonic essences.


Subject(s)
Holography , DNA , DNA, Single-Stranded , Informatics , Prokaryotic Cells
12.
J Comput Biol ; 29(5): 453-464, 2022 05.
Article in English | MEDLINE | ID: mdl-35325549

ABSTRACT

In this work, we investigate using Fourier coefficients (FCs) for capturing useful information about viral sequences in a computationally efficient and compact manner. Specifically, we extract geographic submission location from SARS-CoV-2 sequence headers submitted to the GISAID Initiative, calculate corresponding FCs, and use the FCs to classify these sequences according to geographic location. We show that the FCs serve as useful numerical summaries for sequences that allow manipulation, identification, and differentiation via classical mathematical and statistical methods that are not readily applicable for character strings. Further, we argue that subsets of the FCs may be usable for the same purposes, which results in a reduction in storage requirements. We conclude by offering extensions of the research and potential future directions for subsequent analyses, such as the use of other series transforms for discretely indexed signals such as genomes.


Subject(s)
COVID-19 , SARS-CoV-2 , Benchmarking , Genome, Viral , Humans , Phylogeny , SARS-CoV-2/genetics
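
Note on record 12: the abstract uses Fourier coefficients as numerical summaries of sequences but does not say how nucleotides are mapped to numbers. The sketch below assumes one common choice, a 0/1 indicator signal per base, and keeps the magnitudes of the first few discrete Fourier transform coefficients of each signal. The function names and the `n_coeffs` parameter are hypothetical.

```python
import cmath


def dft(signal: list[float]) -> list[complex]:
    """Direct O(n^2) discrete Fourier transform; adequate for short examples."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def fourier_features(seq: str, n_coeffs: int = 4) -> list[float]:
    """Magnitudes of the first n_coeffs coefficients for each base indicator.

    Assumption: indicator encoding (1 where the base occurs, else 0);
    the paper's actual numeric encoding is not given in the abstract.
    """
    seq = seq.upper()
    feats = []
    for base in "ACGT":
        indicator = [1.0 if ch == base else 0.0 for ch in seq]
        feats.extend(abs(c) for c in dft(indicator)[:n_coeffs])
    return feats


if __name__ == "__main__":
    # Hypothetical toy sequence; real genomes would call for an FFT routine.
    print(fourier_features("ACGTACGT", n_coeffs=2))
```

Keeping only the first `n_coeffs` values per base mirrors the abstract's point that a subset of the coefficients may suffice, which reduces storage.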
13.
Comput Biol Chem ; 92: 107480, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33826970

ABSTRACT

Epigenetics and DNA methylation play a pivotal role in many cellular processes, and aberrant methylation patterns often characterize pathologies. In this work we investigate the role that the flanking sequences of CGs play in the methylation process in humans. We built four different CG datasets: methylated, unmethylated, and two randomly extracted ones. We evaluated features associated with the flanking sequences of those CG sets, for windows of different sizes around the CG, through five measures accounting for different aspects of sequence composition complexity and structure. The analysis performed through those measures revealed clearly different behaviors between methylated and unmethylated probe sets. Major differences were observed for GC content and CG dinucleotide frequency in a window size of 300-400 bp and for CG self-attraction in a window of 3 kbp. Remarkably, the effect of a methylated CG extends much farther from the CG than expected.


Subject(s)
CpG Islands/genetics , DNA/genetics , DNA/metabolism , DNA Methylation/genetics , Entropy , Humans
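
Note on record 13: two of the measures named in the abstract, GC content and CG dinucleotide frequency, are simple to state precisely. The sketch below computes both in a window centered on a CG site. The function name, the `half_window` parameter, and the decision to include the central CG in the window are assumptions, since the abstract does not define the windows exactly.

```python
def flank_stats(seq: str, cg_pos: int, half_window: int) -> tuple[float, float]:
    """GC fraction and CG dinucleotide frequency around the CG starting at cg_pos.

    The window spans half_window bases on each side of the CG and is
    clipped at the sequence ends (assumed convention).
    """
    start = max(0, cg_pos - half_window)
    end = min(len(seq), cg_pos + 2 + half_window)
    window = seq[start:end].upper()
    gc = sum(ch in "GC" for ch in window) / len(window)
    # Frequency over all overlapping dinucleotide positions in the window.
    pairs = len(window) - 1
    cpg = sum(window[i:i + 2] == "CG" for i in range(pairs)) / pairs
    return gc, cpg


if __name__ == "__main__":
    # Hypothetical toy sequence with a single CG at index 4.
    print(flank_stats("ATATCGATAT", cg_pos=4, half_window=2))
```

Calling the function with `half_window` values of about 150-200 and 1500 would correspond to the 300-400 bp and 3 kbp window sizes mentioned in the abstract.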
14.
Infect Genet Evol ; 88: 104708, 2021 03.
Article in English | MEDLINE | ID: mdl-33421654

ABSTRACT

The pandemic caused by the novel coronavirus SARS-CoV-2 is a serious global concern. More than a thousand new COVID-19 infections are reported daily across the globe. Medical research communities are therefore trying to find a remedy to restrict the spread of this virus, while vaccine development is still under way in parallel. In this critical situation, not only the medical research community but also scientists in fields such as microbiology, pharmacy, bioinformatics and data science are contributing to accelerate vaccine development, virus prediction, and forecasting of the transmission probability and reproduction cases of the virus for social awareness. In this context, we have studied the sequence variability of the virus, focusing primarily on three aspects: (a) sequence variability among SARS-CoV-1, MERS-CoV and SARS-CoV-2 in the human host, which are in the same coronavirus family, (b) sequence variability of SARS-CoV-2 in the human host across 54 different countries and (c) sequence variability between the coronavirus family and country-specific SARS-CoV-2 sequences in the human host. For this purpose, as a case study, we have performed topological analysis of 2391 global genomic sequences of SARS-CoV-2 in association with SARS-CoV-1 and MERS-CoV using an integrated semi-alignment-based computational technique. The results of the semi-alignment-based technique are found, experimentally and statistically, to be similar to those of an alignment-based technique, and the technique is computationally faster. Moreover, the outcome of this analysis can help to identify nations with homogeneous SARS-CoV-2 sequences, so that the same vaccine can be applied to their heterogeneous human populations.


Subject(s)
COVID-19/epidemiology , Coronavirus Infections/epidemiology , Genetic Variation , Genome, Viral , Pandemics , SARS-CoV-2/genetics , Severe Acute Respiratory Syndrome/epidemiology , Africa/epidemiology , Americas/epidemiology , Asia/epidemiology , Australia/epidemiology , Base Sequence , COVID-19/transmission , COVID-19/virology , Computational Biology/methods , Coronavirus Infections/transmission , Coronavirus Infections/virology , Europe/epidemiology , Host-Pathogen Interactions/genetics , Humans , Middle East Respiratory Syndrome Coronavirus/genetics , Middle East Respiratory Syndrome Coronavirus/pathogenicity , Severe acute respiratory syndrome-related coronavirus/genetics , Severe acute respiratory syndrome-related coronavirus/pathogenicity , SARS-CoV-2/pathogenicity , Sequence Alignment , Severe Acute Respiratory Syndrome/transmission , Severe Acute Respiratory Syndrome/virology
15.
J Gen Virol ; 100(11): 1523-1529, 2019 11.
Article in English | MEDLINE | ID: mdl-31592752

ABSTRACT

Middle East respiratory syndrome (MERS) is a viral respiratory illness first reported in Saudi Arabia in September 2012 caused by the human coronavirus (CoV), MERS-CoV. Using full-genome sequencing and phylogenetic analysis, scientists have identified three clades and multiple lineages of MERS-CoV in humans and the zoonotic host, dromedary camels. In this study, we have characterized eight MERS-CoV isolates collected from patients in Saudi Arabia in 2015. We have performed full-genome sequencing on the viral isolates, and compared them to the corresponding clinical specimens. All isolates were clade B, lineages 4 and 5. Three of the isolates carry deletions located on three independent regions of the genome in the 5'UTR, ORF1a and ORF3. All novel MERS-CoV strains replicated efficiently in Vero and Huh7 cells. Viruses with deletions in the 5'UTR and ORF1a exhibited impaired viral release in Vero cells. These data emphasize the plasticity of the MERS-CoV genome during human infection.


Subject(s)
Middle East Respiratory Syndrome Coronavirus/growth & development , Middle East Respiratory Syndrome Coronavirus/genetics , Sequence Deletion , Virus Replication , 5' Untranslated Regions , Animals , Cell Line , Chlorocebus aethiops , Coronavirus Infections/virology , Genotype , Humans , Middle East Respiratory Syndrome Coronavirus/classification , Middle East Respiratory Syndrome Coronavirus/isolation & purification , Open Reading Frames , Saudi Arabia , Whole Genome Sequencing
16.
Methods Mol Biol ; 2035: 25-44, 2019.
Article in English | MEDLINE | ID: mdl-31444742

ABSTRACT

Circular Dichroic (CD) spectroscopy is one of the most frequently used methods for guanine quadruplex studies and in general for studies of conformational properties of nucleic acids. The reason is its high sensitivity to even slight changes in mutual orientation of absorbing bases of DNA. CD can reveal formation of particular structural DNA arrangements and can be used to search for the conditions stabilizing the structures, to follow the transitions between various structural states, to explore kinetics of their appearance, to determine thermodynamic parameters, and also to detect formation of higher order structures. CD spectroscopy is an important complementary technique to NMR spectroscopy and X-ray diffraction in quadruplex studies due to its sensitivity, easy manipulation of studied samples, and relative inexpensiveness. In this part, we present the protocol for the use of CD spectroscopy in the study of guanine quadruplexes, together with practical advice and cautions about various, particularly interpretation, difficulties.


Subject(s)
DNA/chemistry , G-Quadruplexes , Circular Dichroism , Magnetic Resonance Spectroscopy , Nucleic Acid Conformation , X-Ray Diffraction
17.
J Biomol Struct Dyn ; 37(9): 2322-2338, 2019 Jun.
Article in English | MEDLINE | ID: mdl-30044190

ABSTRACT

The assembly and maturation of viruses with icosahedral capsids must be coordinated with icosahedral symmetry. The icosahedral symmetry also imposes restrictions on the cooperative specific interactions between genomic RNA/DNA and coat proteins, which should be reflected in quasi-regular segmentation of viral genomic sequences. Combining discrete direct and double Fourier transforms, we studied the quasi-regular large-scale segmentation in genomic sequences of different ssRNA, ssDNA, and dsDNA viruses. The particular representatives included satellite tobacco mosaic virus (STMV) and the strains of satellite tobacco necrosis virus (STNV), STNV-C, STNV-1, STNV-2, Escherichia phages MS2, ϕX174, α3, and HK97, and Simian virus 40. In all their genomes, we found significant quasi-regular segmentation of genomic sequences related to virion assembly and genome packaging within the icosahedral capsid. We also found good correspondence between our results and available cryo-electron microscopy data on capsid structures and genome packaging in these viruses. Fourier analysis of genomic sequences provides additional insight into mechanisms of hierarchical genome packaging and may be used for verification of the concepts of 3-fold or 5-fold intermediates in virion assembly. The results of sequence analysis should be taken into account in the choice of models and in data interpretation. They may also be helpful for the development of antiviral drugs.


Subject(s)
Capsid Proteins/chemistry , Capsid/metabolism , Genome, Viral/genetics , Nucleic Acid Conformation , Protein Conformation , RNA, Viral/chemistry , Virus Assembly/genetics , Algorithms , Capsid Proteins/genetics , Genomics/methods , Models, Molecular , Models, Theoretical , RNA, Viral/genetics
18.
Proc Natl Acad Sci U S A ; 115(26): 6703-6708, 2018 06 26.
Article in English | MEDLINE | ID: mdl-29895692

ABSTRACT

Between 2009 and 2016 the number of protein sequences from known species increased 10-fold from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function but CDART often matches the same region of a protein by two or more profiles. Such synonyms complicate estimates of functional complexity. We do full-linkage clustering of redundant profiles by finding maximum disjoint cliques: Each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs in sequences with multiple domain architectures (MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of "words" or UFWs (57% shared), the "sentences" (MDAs) are different (1.3% shared).


Subject(s)
Conserved Sequence , Genomics , Protein Domains , Software , Algorithms , Amino Acid Sequence , Cluster Analysis , Databases, Factual , Humans , Sequence Alignment , Sequence Homology, Amino Acid , Structure-Activity Relationship
19.
Virology ; 512: 124-131, 2017 12.
Article in English | MEDLINE | ID: mdl-28957690

ABSTRACT

Herpes simplex virus 1 (HSV-1) is a widespread pathogen that persists for life, replicating in surface tissues and establishing latency in peripheral ganglia. Increasingly, molecular studies of latency use cultured neuron models developed using recombinant viruses such as HSV-1 GFP-US11, a derivative of strain Patton expressing green fluorescent protein (GFP) fused to the viral US11 protein. Visible fluorescence follows viral DNA replication, providing a real time indicator of productive infection and reactivation. Patton was isolated in Houston, Texas, prior to 1973, and distributed to many laboratories. Although used extensively, the genomic structure and phylogenetic relationship to other strains is poorly known. We report that wild type Patton and the GFP-US11 recombinant contain the full complement of HSV-1 genes and differ within the unique regions at only eight nucleotides, changing only two amino acids. Although isolated in North America, Patton is most closely related to Asian viruses, including KOS63.


Subject(s)
Herpes Simplex/virology , Herpesvirus 1, Human/genetics , Asia/epidemiology , Conserved Sequence , DNA, Viral , Gene Expression Regulation, Viral , Herpes Simplex/epidemiology , Humans , Phylogeny , Virus Replication
20.
Sheng Wu Gong Cheng Xue Bao ; 33(8): 1292-1303, 2017 Aug 25.
Article in Chinese | MEDLINE | ID: mdl-28853257

ABSTRACT

In this study, a multiplex RT-PCR method was developed for the detection of seven diarrhea-associated porcine viruses: porcine teschovirus (PTV), porcine sapelovirus (PSV), porcine deltacoronavirus (PDCoV), porcine kobuvirus (PKV), porcine sapovirus (PSaV), porcine astrovirus (PAstV) and porcine torovirus (PToV). A total of 419 samples were screened by this method, and the results showed that PKV had the highest positive rate, 26.98%-45.79%, and that its mixed infection rate reached 9.52%-18.54%. Given the high positive rate of PKV and its important role in diarrheal disease, the complete genomes of three PKV-positive samples were sequenced. The three PKVs, labeled PD-PKV, JS-PKV and CM-PKV, were classified as porcine kobuviruses and were genetically distant from other kobuviruses. The complete genome homologies among them were 88.1%-89.1%. CM-PKV had the highest identity with the Chinese strain JS-02a-CHN/2013 reported in 2013, while JS-PKV and PD-PKV were closest to the K-30-HUN/2008/HUN strain reported in Hungary in 2008. This illustrates the significant genetic differences among the PKV isolates in Shanghai, while their relationship with viral pathogenicity still needs to be explored. This research provides a reference for further understanding the prevalence of PKV and its role in swine diarrhea.


Subject(s)
Kobuvirus/genetics , Kobuvirus/isolation & purification , Phylogeny , Picornaviridae Infections/veterinary , Swine Diseases/diagnosis , Animals , China , Diarrhea/veterinary , Diarrhea/virology , Picornaviridae Infections/diagnosis , RNA, Viral , Reverse Transcriptase Polymerase Chain Reaction/veterinary , Swine/virology , Swine Diseases/virology