1.
Genomics, Proteomics & Bioinformatics ; (4): 43-51, 2003.
Article in English | WPRIM | ID: wpr-339525

Abstract

The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. To address this problem, we tried to identify repeats and mask them prior to assembly, even at the genome survey stage. Repeats of different copy number have different probabilities of appearing in shotgun data; based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, assembly was faster and had a lower error probability. In addition, because clone-insert size affects the accuracy of repeat assembly and scaffold construction, we also designed the length distribution of clone-inserts using our model. In our simulated genomes of human and rice, the length distributions of repeats differ, so the optimal length distributions of clone-inserts were not the same. Thus, with an optimal length distribution of clone-inserts, a given genome could be assembled better at lower coverage.
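
The idea that copy number changes how often a sequence shows up in shotgun reads can be illustrated with a simple Poisson sketch. This is not the MDRmasker model, whose details the abstract does not give; the uniform-sampling assumption, the k-mer framing, and the names `expected_hits`, `poisson_tail`, and `repeat_threshold` are all hypothetical.

```python
import math

def expected_hits(copy_number: int, coverage: float, read_len: int, k: int) -> float:
    """Expected number of reads that fully contain a given k-mer.

    Assumes reads of fixed length sampled uniformly from the genome,
    so a single-copy k-mer is hit about coverage * (read_len - k + 1) / read_len times.
    """
    return copy_number * coverage * (read_len - k + 1) / read_len

def poisson_tail(lam: float, threshold: int) -> float:
    """P(X >= threshold) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(threshold))
    return 1.0 - below

def repeat_threshold(coverage: float, read_len: int, k: int, alpha: float = 1e-3) -> int:
    """Smallest count t that a single-copy k-mer reaches with probability below alpha.

    k-mers seen at least t times are then flagged as likely repeats.
    """
    lam = expected_hits(1, coverage, read_len, k)
    t = 1
    while poisson_tail(lam, t) >= alpha:
        t += 1
    return t
```

Under these assumptions the cut-off rises with coverage, which is why a criterion of this kind has to be derived separately for each shotgun coverage.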


Subjects
Animals , Humans , Molecular cloning , Genome , Human genome , Genomics , Methods , Genetic models , Statistical models , Theoretical models , Oryza , Genetics , DNA sequence analysis
2.
Genomics, Proteomics & Bioinformatics ; (4): 101-107, 2003.
Article in English | WPRIM | ID: wpr-339517

Abstract

We report a complete genomic sequence of rare isolates (minor genotype) of the SARS-CoV from SARS patients in Guangdong, China, where the first few cases emerged. The most striking discovery from the isolate is an extra 29-nucleotide sequence located between nucleotide positions 27,863 and 27,864 (with reference to the complete sequence of BJ01), within an overlapping region composed of BGI-PUP5 (BGI-postulated uncharacterized protein 5) and BGI-PUP6, upstream of the N (nucleocapsid) protein. The discovery of this minor genotype, GD-Ins29, suggests a significant genetic event and differentiates it from the previously reported genotype, the dominant form among all sequenced SARS-CoV isolates. A 17-nt segment of this extra sequence is identical to a segment of the same size in two human mRNA sequences that may interfere with viral replication and transcription in the cytosol of the infected cells. This finding provides a new avenue for exploring the virus-host interaction in viral evolution, host pathogenesis, and vaccine development.
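
The coordinate convention in the abstract (an insertion "between positions 27,863 and 27,864") and the search for an identical 17-nt segment can be sketched as follows. The helper names `insert_between` and `shared_kmers` are hypothetical, and the toy sequences in the usage note are not real viral or human data.

```python
def insert_between(seq: str, left_pos: int, insert: str) -> str:
    """Insert `insert` between 1-based positions left_pos and left_pos + 1 of seq."""
    if not 0 <= left_pos <= len(seq):
        raise ValueError("left_pos must lie within the sequence")
    # With 1-based coordinates, the first left_pos bases stay to the left.
    return seq[:left_pos] + insert + seq[left_pos:]

def shared_kmers(a: str, b: str, k: int) -> set:
    """Return every length-k substring that occurs in both a and b."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    return {b[i:i + k] for i in range(len(b) - k + 1) if b[i:i + k] in kmers_a}
```

For example, `insert_between("ACGT", 2, "NN")` gives `"ACNNGT"`; calling `shared_kmers(insertion, mrna, 17)` on real sequences would list any exact 17-nt matches on the same strand.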


Subjects
Base sequence , China , Cluster analysis , Gene components , Genetic variation , Viral genome , Genotype , Molecular sequence data , Phylogeny , RT-PCR , SARS virus , Genetics , DNA sequence analysis , Severe acute respiratory syndrome , Genetics
3.
Genomics, Proteomics & Bioinformatics ; (4): 108-117, 2003.
Article in English | WPRIM | ID: wpr-339516

Abstract

The corona-like spikes or peplomers on the surface of the virion under the electron microscope are the most striking features of coronaviruses. The S (spike) protein, with 1,255 amino acids, is the largest structural protein encoded in the viral genome. Its structure can be divided into three regions: a long N-terminal region in the exterior, a characteristic transmembrane (TM) region, and a short C-terminus in the interior of the virion. We detected fifteen nucleotide substitutions by comparison with the seventeen published SARS-CoV genome sequences, eight (53.3%) of which are non-synonymous mutations leading to amino acid alterations with predicted physicochemical changes. The possible antigenic determinants of the S protein were predicted, and the result was confirmed by ELISA (enzyme-linked immunosorbent assay) with synthesized peptides. Another notable finding is that three disulfide bonds are defined between the C-terminus of the S protein and the N-terminus of the E (envelope) protein, based on the typical sequence and positions; if confirmed, this would establish a structural connection between these two important structural proteins. Phylogenetic analysis reveals several conserved regions that might be potent drug targets.
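
The distinction between synonymous and non-synonymous substitutions used above can be sketched with the standard genetic code. This is only an illustration, not the authors' pipeline; `classify_substitution` is a hypothetical helper that assumes an in-frame coding sequence in DNA letters and a single-base change.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_substitution(ref_cds: str, pos: int, alt_base: str) -> str:
    """Classify a single-base change at 0-based position `pos` of a coding sequence."""
    start = pos - pos % 3                      # first base of the affected codon
    ref_codon = ref_cds[start:start + 3]
    offset = pos - start
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"
```

In the toy sequence `"ATGCTT"`, changing the last base to C turns CTT into CTC (both leucine), so the change is synonymous, whereas changing position 3 to A turns CTT into ATT (isoleucine), which is non-synonymous.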


Subjects
Amino acid sequence , Viral antigens , Allergy and immunology , Base composition , Computational biology , ELISA , Membrane glycoproteins , Genetics , Molecular sequence data , Mutation , Genetics , Phylogeny , Protein tertiary structure , SARS virus , Genetics , Allergy and immunology , DNA sequence analysis , Sequence homology , Coronavirus spike glycoprotein , Viral envelope proteins , Genetics , Metabolism
4.
Genomics, Proteomics & Bioinformatics ; (4): 118-130, 2003.
Article in English | WPRIM | ID: wpr-339515

Abstract

We studied structural and immunological properties of the SARS-CoV M (membrane) protein, based on comparative analyses of sequence features, phylogenetic investigation, and experimental results. The M protein is predicted to contain a triple-spanning transmembrane (TM) region, a single N-glycosylation site near its N-terminus, which is in the exterior of the virion, and a long C-terminal region in the interior. The M protein has a relatively high substitution rate (0.6% relative to its size) among the viral open reading frames (ORFs) in published data. The four substitutions detected in the M protein, all of which cause non-synonymous changes, can be classified into three types. The first results in changes of pI (isoelectric point) and charge, affecting antigenicity. The second changes the hydrophobicity of the TM region, and the third relates to the hydrophilicity of the interior structure. Phylogenetic tree building based on the variations of the M protein appears to support the non-human origin of SARS-CoV. To investigate its immunogenicity, we synthesized eight oligopeptides covering 69.2% of the entire ORF and screened them by ELISA (enzyme-linked immunosorbent assay) with sera from SARS patients. The results confirmed our predictions of antigenic sites.
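
A figure such as "eight oligopeptides covering 69.2% of the entire ORF" is a union-of-intervals calculation. The sketch below shows one way to compute it; `coverage_fraction` is a hypothetical helper, and the peptide coordinates in the usage note are invented, since the abstract does not list the real ones.

```python
def coverage_fraction(intervals, length: int) -> float:
    """Fraction of positions 1..length covered by the union of intervals.

    `intervals` holds (start, end) pairs, 1-based and inclusive; overlaps
    are counted once.
    """
    covered = 0
    current_end = 0  # rightmost position already counted
    for start, end in sorted(intervals):
        start = max(start, current_end + 1, 1)  # skip what is already covered
        end = min(end, length)
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / length
```

For instance, `coverage_fraction([(1, 10), (5, 15), (20, 25)], 30)` counts positions 1 to 15 and 20 to 25, giving 21 of 30 positions, or 0.7.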


Subjects
Amino acid sequence , Base sequence , Cluster analysis , ELISA , Immunoassay , Molecular sequence data , Mutation , Genetics , Oligopeptides , Phylogeny , Protein tertiary structure , SARS virus , Genetics , Sequence alignment , DNA sequence analysis , Viral matrix proteins , Chemistry , Genetics , Allergy and immunology
5.
Genomics, Proteomics & Bioinformatics ; (4): 155-165, 2003.
Article in English | WPRIM | ID: wpr-339512

Abstract

The R (replicase) protein is the uniquely defined non-structural protein (NSP) responsible for RNA replication, mutation rate or fidelity, and regulation of transcription in coronaviruses and many other ssRNA viruses. Based on our complete genome sequences of four isolates (BJ01-BJ04) of SARS-CoV from Beijing, China, we analyzed the structure and predicted functions of the R protein in comparison with 13 other isolates of SARS-CoV and 6 other coronaviruses. The entire ORF (open reading frame) encodes two major enzyme activities: RNA-dependent RNA polymerase (RdRp) and proteinase. The R polyprotein undergoes a complex proteolytic process to produce 15 function-related peptides. A hydrophobic domain (HOD) and a hydrophilic domain (HID) are newly identified within NSP1. The substitution rate of the R protein is close to the average of the SARS-CoV genome. The functional domains in all NSPs of the R protein give different phylogenetic results, which suggests that they have different mutation rates under selective pressure. Eleven highly conserved regions in RdRp and twelve cleavage sites of 3CLP (chymotrypsin-like protein) have been identified as potential drug targets. These findings suggest that it is possible to obtain information about the phylogeny of SARS-CoV, as well as potential tools for drug design, genotyping, and diagnostics of SARS.
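
One simple way to look for "highly conserved regions" is to scan a multiple alignment for runs of columns in which every sequence carries the same residue. The abstract does not say how the authors defined conservation, so the sketch below is an assumption; `conserved_regions` and its `min_len` cut-off are hypothetical.

```python
def conserved_regions(aligned, min_len: int):
    """Return (start, end) column ranges where all aligned sequences agree.

    Ranges are 0-based and half-open; gap columns ('-') never count as
    conserved, and only runs of at least min_len columns are reported.
    """
    if not aligned:
        return []
    n = len(aligned[0])
    if any(len(s) != n for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    regions, run_start = [], None
    for col in range(n + 1):  # one extra step closes a run at the end
        ref = aligned[0][col] if col < n else None
        same = col < n and ref != "-" and all(s[col] == ref for s in aligned)
        if same and run_start is None:
            run_start = col
        elif not same and run_start is not None:
            if col - run_start >= min_len:
                regions.append((run_start, col))
            run_start = None
    return regions
```

A stricter or looser definition (for example, tolerating conservative amino acid changes) would need a scoring matrix instead of exact identity.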


Subjects
Amino acid sequence , Base composition , Base sequence , Cluster analysis , Computational biology , Conserved sequence , Genetics , Molecular evolution , Gene components , Viral genome , Molecular sequence data , Mutation , Genetics , Phylogeny , Protein tertiary structure , RNA replicase , Genetics , SARS virus , Genetics , DNA sequence analysis
6.
Genomics, Proteomics & Bioinformatics ; (4): 180-192, 2003.
Article in English | WPRIM | ID: wpr-339508

Abstract

Beijing has been one of the epicenters most severely affected by the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) since the first patient was diagnosed in one of the city's hospitals. We now report complete genome sequences of the BJ Group, including four isolates (Isolates BJ01, BJ02, BJ03, and BJ04) of the SARS-CoV. It is remarkable that all members of the BJ Group share a common haplotype, consisting of seven loci that differentiate the group from other isolates published to date. Among 42 substitutions uniquely identified in the BJ Group, 32 are non-synonymous changes at the amino acid level. Rooted phylogenetic trees, proposed on the basis of haplotypes and other sequence variations of SARS-CoV isolates from Canada, the USA, Singapore, and China, gave rise to different paradigms but positioned the BJ Group, together with the newly discovered GD01 (GD-Ins29), in the same clade, followed by the H-U Group (from Hong Kong to the USA) and the H-T Group (from Hong Kong to Toronto), leaving the SP Group (Singapore) more distant. This result appears to suggest a possible transmission path from Guangdong to Beijing/Hong Kong, and then to other countries and regions.
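
Grouping isolates by a shared haplotype amounts to reading the alleles at a fixed set of variable loci and collecting isolates whose allele strings match. The sketch below assumes pre-aligned sequences; `group_by_haplotype`, the locus indices, and the isolate names in the usage note are hypothetical and not taken from the paper.

```python
from collections import defaultdict

def group_by_haplotype(aligned: dict, loci) -> dict:
    """Group isolates by the bases they carry at the given alignment columns.

    `aligned` maps isolate name -> aligned sequence; `loci` lists 0-based
    columns. Returns haplotype string -> list of isolate names.
    """
    groups = defaultdict(list)
    for name, seq in aligned.items():
        haplotype = "".join(seq[i] for i in loci)
        groups[haplotype].append(name)
    return dict(groups)
```

With toy input `{"iso_A": "ACGT", "iso_B": "ACGA", "iso_C": "TCGT"}` and loci `[0, 3]`, the three isolates fall into three separate haplotypes: `AT`, `AA`, and `TT`.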


Subjects
Humans , Viral genome , Haplotypes , Mutation , Open reading frames , Phylogeny , SARS virus , Genetics
7.
Genomics, Proteomics & Bioinformatics ; (4): 216-225, 2003.
Article in English | WPRIM | ID: wpr-339504

Abstract

Knowledge of the evolution of pathogens is of great medical and biological significance for the prevention, diagnosis, and therapy of infectious diseases. To understand the origin and evolution of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus), we collected the complete genome sequences of all viruses available in GenBank and compared them with the SARS-CoV. Genomic signature analysis demonstrates that all coronaviruses except the SARS-CoV have TGTT as their most abundant tetranucleotide. A detailed analysis of the forty-two complete SARS-CoV genome sequences revealed the existence of two distinct genotypes and showed that these isolates could be classified into four groups. Our manual analysis of the BLASTN results demonstrates that the HE (hemagglutinin-esterase) gene exists in the SARS-CoV, although many mutations have made it difficult to recognize.
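
Finding the most abundant tetranucleotide in a genome is a sliding-window count. The sketch below counts overlapping 4-mers on a single strand and skips windows with ambiguous bases; whether the authors counted one strand or both, and how they normalized, is not stated, so those choices and the names `kmer_counts` and `richest_kmer` are assumptions.

```python
from collections import Counter

def kmer_counts(seq: str, k: int = 4) -> Counter:
    """Count overlapping k-mers, ignoring windows with non-ACGT characters."""
    seq = seq.upper()
    valid = set("ACGT")
    return Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= valid
    )

def richest_kmer(seq: str, k: int = 4):
    """Return (k-mer, count) for the most frequent k-mer, or None if there is none."""
    counts = kmer_counts(seq, k)
    return counts.most_common(1)[0] if counts else None
```

On the toy string `"TGTTTGTTACGT"`, `richest_kmer` returns `("TGTT", 2)`; on a real genome the sequence would be read from a FASTA file first.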


Subjects
Amino acid motifs , Amino acid substitution , Base composition , Codon , Genetics , Computational biology , DNA mutational analysis , Molecular evolution , Horizontal gene transfer , Genetic variation , Viral genome , Phylogeny , SARS virus , Genetics
8.
Genomics, Proteomics & Bioinformatics ; (4): 226-235, 2003.
Article in English | WPRIM | ID: wpr-339503

Abstract

Annotation of the genome sequence of the SARS-CoV (severe acute respiratory syndrome-associated coronavirus) is indispensable for understanding its evolution and pathogenesis. We have performed a full annotation of the SARS-CoV genome sequences using annotation programs that are publicly available or that we developed ourselves. In total, 21 open reading frames (ORFs) of genes or putative uncharacterized proteins (PUPs) were predicted. Seven PUPs had not been reported previously, and two of them were predicted to contain transmembrane regions. Eight ORFs partially overlapped with, or were embedded in, those of known genes, revealing that the SARS-CoV genome is small and compact, with overlapping coding regions. The most striking discovery is that one ORF is located on the minus strand. We have also annotated non-coding regions and identified the transcription regulating sequences (TRS) in the intergenic regions. The analysis of TRS supports the minus-strand extending transcription mechanism of coronaviruses. The SNP analysis of different isolates reveals that mutations in the sequences do not affect the ORF prediction results.
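
ORF prediction on both strands, including the minus strand mentioned above, can be sketched as a six-frame scan for ATG-to-stop stretches. This is a deliberately naive illustration and not one of the annotation programs the authors used; `find_orfs`, the `min_codons` cut-off, and the choice to report coordinates on the scanned strand are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 3):
    """List (strand, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based, half-open, and given on the strand that was
    scanned; min_codons counts the start and stop codons.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i                      # first in-frame start codon
                elif start is not None and codon in STOPS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

On the toy sequence `"CCATGAAATAGCC"` this reports a single plus-strand ORF at `(2, 11)`; scanning its reverse complement reports the same ORF on the minus strand.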


Subjects
Amino acid substitution , Base composition , Base sequence , Computational biology , Methods , Viral genome , Isoelectric point , Genetic models , Molecular sequence data , Molecular weight , Open reading frames , SARS virus , Genetics , Sequence analysis , Genetic transcription