A statistical approach designed for finding mathematically defined repeats in shotgun data and determining the length distribution of clone-inserts / Genomics, Proteomics & Bioinformatics (English Edition)
Genomics, Proteomics & Bioinformatics ; (4): 43-51, 2003.
Article in English | WPRIM | ID: wpr-339525
ABSTRACT
The large number of repeats, especially high-copy repeats, in the genomes of higher animals and plants makes whole genome assembly (WGA) quite difficult. To address this problem, we tried to identify repeats and mask them prior to assembly, even at the genome survey stage. Repeats of different copy number are known to have different probabilities of appearing in shotgun data; based on this principle, we constructed a statistical model and inferred criteria for mathematically defined repeats (MDRs) at different shotgun coverages. According to these criteria, we developed the software MDRmasker to identify and mask MDRs in shotgun data. With repeats masked prior to assembly, assembly was faster and had a lower error probability. In addition, because clone-insert size affects the accuracy of repeat assembly and scaffold construction, we also used our model to design the length distribution of clone-inserts. In our simulated human and rice genomes, the length distributions of repeats differ, so the optimal length distributions of clone-inserts were not the same. Thus, with an optimal length distribution of clone-inserts, a given genome could be assembled better at lower coverage.
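The abstract does not give the statistical model itself, so the following is only a minimal sketch of the general idea that coverage-dependent occurrence counts can separate repeats from unique sequence. It is not the MDRmasker algorithm. It assumes, purely for illustration, that the number of times a unique k-mer appears in shotgun reads is roughly Poisson-distributed, and it ignores reverse complements and sequencing errors. The function names, the parameter `alpha`, and the choice of k-mers as the counting unit are all hypothetical.

```python
import math
from collections import Counter


def poisson_tail(t, lam):
    """Return P(X >= t) for X ~ Poisson(lam)."""
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(t):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return max(0.0, 1.0 - cdf)


def repeat_threshold(coverage, read_len, k, alpha=1e-3):
    """Smallest count t that a unique k-mer reaches with probability < alpha.

    Assumption (not from the source): a single-copy k-mer is seen
    Poisson(lam) times, with lam = coverage * (read_len - k + 1) / read_len.
    """
    lam = coverage * (read_len - k + 1) / read_len
    t = 1
    while poisson_tail(t, lam) >= alpha:
        t += 1
    return t


def count_kmers(reads, k):
    """Count every k-mer occurrence across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_reads(reads, k, threshold):
    """Replace bases covered by k-mers seen >= threshold times with 'N'."""
    counts = count_kmers(reads, k)
    masked = []
    for read in reads:
        chars = list(read)
        for i in range(len(read) - k + 1):
            if counts[read[i:i + k]] >= threshold:
                chars[i:i + k] = "N" * k
        masked.append("".join(chars))
    return masked
```

Under these assumptions the threshold rises with shotgun coverage, which mirrors the abstract's point that the criteria for repeats depend on coverage. In practice one would call `repeat_threshold` with the survey's coverage and read length, then pass the result to `mask_reads` before assembly.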
Subject(s)
Full text: Available Index: WPRIM (Western Pacific) Main subject: Oryza / Genome, Human / Models, Statistical / Genome / Cloning, Molecular / Sequence Analysis, DNA / Genomics / Genetics / Methods / Models, Genetic Type of study: Diagnostic study / Prognostic study / Risk factors Limits: Animals / Humans Language: English Journal: Genomics, Proteomics & Bioinformatics Year: 2003 Type: Article
