Assembler for de novo assembly of large genomes.

Chu, Te-Chin; Lu, Chen-Hua; Liu, Tsunglin; Lee, Greg C; Li, Wen-Hsiung; Shih, Arthur Chun-Chieh.

Proc Natl Acad Sci U S A ; 110(36): E3417-24, 2013 Sep 03.

Article in English | MEDLINE | ID: mdl-23966565

ABSTRACT

Assembling a large genome using next-generation sequencing reads requires large computer memory and a long execution time. To reduce these requirements, we propose an extension-based assembler, called JR-Assembler, where J and R stand for "jumping" extension and read "remapping." First, it uses the read count to select good-quality reads as seeds. Second, it extends each seed by a whole-read extension process, which speeds up extension and can jump over short repeats. Third, it uses a dynamic back trimming process to avoid extension termination due to sequencing errors. Fourth, it remaps reads to each assembled sequence, and if an assembly error is caused by the presence of a repeat, it breaks the contig at the repeat boundaries. Fifth, it applies a less stringent extension criterion to connect low-coverage regions. Finally, it uses unused reads to merge contigs. An extensive comparison of JR-Assembler with current assemblers using datasets from small, medium, and large genomes shows that JR-Assembler achieves a better or comparable overall assembly quality and requires lower memory use and less central processing unit time, especially for large genomes. In addition, a simulation study shows that JR-Assembler outperforms most current assemblers in memory use and central processing unit time when the read length is 150 bp or longer, indicating that the advantages of JR-Assembler over current assemblers will increase as the read length increases with advances in next-generation sequencing technology.
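The first two steps of the abstract (seed selection by read count, then whole-read extension) can be illustrated with a toy sketch. This is not the JR-Assembler implementation: the function names `pick_seed` and `extend_right`, the `min_overlap` parameter, and the error-free example reads are all assumptions made for illustration, and the sketch omits back trimming, remapping, and contig merging.

```python
from collections import Counter

def pick_seed(reads):
    """Return the most frequent read (hypothetical stand-in for seed selection)."""
    counts = Counter(reads)
    # max() returns the first key with the highest count (insertion order).
    return max(counts, key=counts.get)

def extend_right(seed, reads, min_overlap):
    """Greedily extend a seed to the right by appending whole reads.

    At each round, pick the unused read whose prefix has the longest exact
    overlap (>= min_overlap) with the contig suffix, then append the part
    of the read that sticks out past the contig.
    """
    contig = seed
    # Keep first-seen order so ties are resolved deterministically.
    unused = [r for r in dict.fromkeys(reads) if r != seed]
    while True:
        best_k, best_read = 0, None
        for read in unused:
            # Require at least one new base, so overlap < len(read).
            max_k = min(len(contig), len(read) - 1)
            for k in range(max_k, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    if k > best_k:
                        best_k, best_read = k, read
                    break
        if best_read is None:
            return contig
        contig += best_read[best_k:]
        unused.remove(best_read)

# Toy data (assumed): error-free 6-bp reads taken every 3 bp from a short genome.
genome = "ATGGCGTACGTTAGC"
reads = [genome[i:i + 6] for i in range(0, 10, 3)]
reads += [reads[0]] * 2  # the first read is seen three times, so it becomes the seed

seed = pick_seed(reads)
contig = extend_right(seed, reads, min_overlap=3)
```

Each round appends a whole read remainder rather than a single base, which is the sense in which this kind of extension moves in jumps; a real assembler would also have to handle sequencing errors, reverse complements, and repeats.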

Subject(s)

Computational Biology/methods; Genome, Bacterial; Genome, Fungal; High-Throughput Nucleotide Sequencing/methods; Computational Biology/instrumentation; High-Throughput Nucleotide Sequencing/statistics & numerical data; Reproducibility of Results; Software

GR-Aligner: an algorithm for aligning pairwise genomic sequences containing rearrangement events.

Chu, Te-Chin; Liu, Tsunglin; Lee, D T; Lee, Greg C; Shih, Arthur Chun-Chieh.

Bioinformatics ; 25(17): 2188-93, 2009 Sep 01.

Article in English | MEDLINE | ID: mdl-19542149

ABSTRACT

MOTIVATION: Homologous genomic sequences between species usually contain different rearrangement events. It is still unclear whether specific patterns in the breakpoint regions caused such events to occur. To resolve this question, it is necessary to determine the location of breakpoints at the nucleotide level. The availability of sequences near breakpoints would further facilitate the related studies. We thus need a tool that can identify breakpoints and align the neighboring sequences. Although local alignment tools can detect rearrangement events, they only report a set of discontinuous alignments, where the detailed alignments in the breakpoint regions are usually missing. Global alignment tools are even less appropriate for these tasks since most of them are designed to align the conserved regions between sequences in a consistent order, i.e. they do not consider rearrangement events.

RESULTS: We propose an effective and efficient pairwise sequence alignment algorithm, called GR-Aligner (Genomic Rearrangement Aligner), which can find breakpoints of rearrangement events by integrating the forward and reverse alignments of the breakpoint regions flanked by homologously rearranged sequences. GR-Aligner also provides an option to view the alignments of sequences extended to the breakpoints. These outputs provide materials for studying possible evolutionary mechanisms and biological functionalities of the rearrangement.
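The idea of locating a breakpoint by combining a forward and a reverse alignment can be sketched in a much simplified, ungapped form. This is an illustrative assumption, not GR-Aligner's algorithm: the names `breakpoint_interval`, `left_src`, and `right_src` are hypothetical, and exact matching stands in for real alignment. The query is assumed to start with the beginning of `left_src` and end with the end of `right_src`.

```python
def common_prefix_len(a, b):
    """Length of the longest exact common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def common_suffix_len(a, b):
    """Length of the longest exact common suffix of a and b."""
    return common_prefix_len(a[::-1], b[::-1])

def breakpoint_interval(query, left_src, right_src):
    """Bound the breakpoint in `query` from both sides.

    The forward pass extends from the left end while query matches
    left_src; the reverse pass extends from the right end while query
    matches right_src. Returns (lo, hi): if lo <= hi, any cut position p
    in [lo, hi] explains the query as query[:p] from left_src and
    query[p:] from right_src. If lo > hi, query[hi:lo] is explained by
    neither source.
    """
    fwd = common_prefix_len(query, left_src)
    rev = common_suffix_len(query, right_src)
    return len(query) - rev, fwd

# Toy data (assumed): the junction shares the bases "GGT" with both sources.
left_src = "AAACCCGGT"
right_src = "TTGGTCATG"
query = left_src[:7] + right_src[3:]  # "AAACCCGGTCATG", joined at position 7

lo, hi = breakpoint_interval(query, left_src, right_src)
```

When the two passes overlap, as they do here, the breakpoint cannot be pinned to a single nucleotide from sequence alone; the interval width reflects the bases shared by both flanks at the junction.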

Subject(s)

Algorithms; Gene Rearrangement/genetics; Genome/genetics; Sequence Alignment/methods; Animals; Base Sequence; Computer Simulation; Humans; Pan troglodytes/genetics