Search | VHL Regional Portal

Taking a color photo: A homozygous 25-bp deletion in Bace2 may cause brown-and-white coat color in giant pandas.

Guan, Dengfeng; Sun, Shuyan; Song, Lingyun; Zhao, Pengpeng; Nie, Yonggang; Huang, Xin; Zhou, Wenliang; Yan, Li; Lei, Yinghu; Hu, Yibo; Wei, Fuwen.

Proc Natl Acad Sci U S A ; 121(11): e2317430121, 2024 Mar 12.

Article in English | MEDLINE | ID: mdl-38437540

ABSTRACT

Brown-and-white giant pandas (hereafter brown pandas) are distinct coat color mutants found exclusively in the Qinling Mountains, Shaanxi, China. However, the genetic mechanism behind their coloration has remained unclear since their discovery in 1985. Here, we identified the genetic basis for this coat color variation using a combination of field ecological data, population genomic data, and a CRISPR-Cas9 knockout mouse model. We de novo assembled a long-read-based giant panda genome and resequenced the genomes of 35 giant pandas, including two brown pandas and two family trios associated with a brown panda. We identified a homozygous 25-bp deletion in the first exon of Bace2, a gene encoding amyloid precursor protein cleaving enzyme, as the most likely genetic basis for brown-and-white coat color. This deletion was further validated using PCR and Sanger sequencing of another 192 black giant pandas and CRISPR-Cas9 edited knockout mice. Our investigation revealed that this mutation reduced the number and size of melanosomes in the hairs of knockout mice, and possibly of the brown panda, leading to hypopigmentation. These findings provide unique insights into the genetic basis of coat color variation in wild animals.

Subject(s)

Ursidae , Animals , Mice , Ursidae/genetics , Peptide Hydrolases , Amyloid beta-Protein Precursor , Animals, Wild , Mice, Knockout

Digital Noah's Ark: last chance to save the endangered species.

Wei, Fuwen; Huang, Guangping; Guan, Dengfeng; Fan, Huizhong; Zhou, Wenliang; Wang, Depeng; Hu, Yibo.

Sci China Life Sci ; 65(11): 2325-2327, 2022 11.

Article in English | MEDLINE | ID: mdl-36223043

Subject(s)

Endangered Species , Animals

Efficient iterative Hi-C scaffolder based on N-best neighbors.

Guan, Dengfeng; McCarthy, Shane A; Ning, Zemin; Wang, Guohua; Wang, Yadong; Durbin, Richard.

BMC Bioinformatics ; 22(1): 569, 2021 Nov 27.

Article in English | MEDLINE | ID: mdl-34837944

ABSTRACT

BACKGROUND: Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long-read data alone is unlikely to yield a chromosome-scale assembly for most eukaryotic species, inexpensive Hi-C sequencing technology, which captures the chromosomal profile of a genome, is now widely used to complete the task. However, existing Hi-C based scaffolding tools either require an a priori chromosome number as input or lack the ability to build highly continuous scaffolds. RESULTS: We design and develop a novel Hi-C based scaffolding tool, pin_hic, which takes advantage of contact information from Hi-C reads to construct a scaffolding graph iteratively based on N-best neighbors of contigs. After scaffolding, it identifies potential misjoins and breaks them to maintain scaffolding accuracy. Through our tests on three long-read-based de novo assemblies from three different species, we demonstrate that pin_hic is more efficient than current state-of-the-art tools and can generate much more continuous scaffolds while achieving higher or comparable accuracy. CONCLUSIONS: Pin_hic is an efficient Hi-C based scaffolding tool that can be useful for building chromosome-scale assemblies. As many sequencing projects have been launched in recent years, we believe pin_hic has the potential to be applied in these projects and to make a meaningful contribution.
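The N-best-neighbors idea mentioned in the abstract can be illustrated with a small sketch. This is not pin_hic's implementation: the function name, the use of raw link counts, and the rule of keeping only mutual neighbors are all assumptions made for illustration. It shows one plausible way to turn Hi-C contact counts between contigs into candidate scaffolding edges.

```python
from collections import defaultdict


def n_best_neighbor_edges(contacts, n_best=2):
    """Return candidate scaffolding edges from Hi-C link counts.

    contacts maps a (contig_a, contig_b) pair to a link count.
    Hypothetical rule: keep an edge only if each contig is among
    the other's n_best highest-count neighbors.
    """
    neighbors = defaultdict(list)
    for (a, b), count in contacts.items():
        neighbors[a].append((count, b))
        neighbors[b].append((count, a))

    top = {}
    for contig, links in neighbors.items():
        # Highest count first; contig name breaks ties deterministically.
        links.sort(key=lambda item: (-item[0], item[1]))
        top[contig] = {name for _, name in links[:n_best]}

    edges = set()
    for a, best in top.items():
        for b in best:
            if a in top[b]:
                edges.add(tuple(sorted((a, b))))
    return sorted(edges)


# Toy link counts for four contigs (invented numbers).
contacts = {
    ("c1", "c2"): 90,
    ("c2", "c3"): 80,
    ("c1", "c3"): 10,
    ("c3", "c4"): 70,
    ("c1", "c4"): 5,
}
print(n_best_neighbor_edges(contacts, n_best=2))
```

With these toy counts, a larger `n_best` admits more edges; a real scaffolder would also normalize counts, orient contigs, and iterate, none of which is modeled here.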

Subject(s)

Genome , Genomics , Chromosomes/genetics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA

Correction to: Efficient iterative Hi-C scaffolder based on N-best neighbors.

Guan, Dengfeng; McCarthy, Shane A; Ning, Zemin; Wang, Guohua; Wang, Yadong; Durbin, Richard.

BMC Bioinformatics ; 22(1): 612, 2021 Dec 31.

Article in English | MEDLINE | ID: mdl-34972500

Short Read Alignment Based on Maximal Approximate Match Seeds.

Quan, Wei; Guan, Dengfeng; Quan, Guangri; Liu, Bo; Wang, Yadong.

Front Mol Biosci ; 7: 572934, 2020.

Article in English | MEDLINE | ID: mdl-33251246

ABSTRACT

Sequence alignment is a critical step in many genomic studies, such as variant calling, quantitative transcriptome analysis (RNA-seq), and metagenomic sequence classification. However, alignment performance is strongly affected by repetitive sequences in the reference genome, which are widespread in species from bacteria to mammals. Aligning repetitive sequences can produce a tremendous number of candidate locations, creating a heavy computational burden. Thus, most alignment tools simply discard highly repetitive seeds, but this may cause the true alignment to be missed. Using maximal exact matches (MEMs) as seeds is an option, but MEM seeds may fail when sequencing errors or genomic variations fall within them. Here, we propose a novel sequence alignment algorithm, named MAM, which can efficiently align short DNA sequences. MAM first builds a modified Burrows-Wheeler transform (BWT) structure of a reference genome to accelerate approximate seed matching. Then, MAM uses maximal approximate match (MAM) seeds to reduce the candidate locations. Finally, MAM applies affine-gap-penalty dynamic programming to extend the MAM seeds. Experimental results on simulated and real sequencing datasets show that MAM is faster than other state-of-the-art alignment tools. The source code is available at https://github.com/weiquan/mam.
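The extension step named in the abstract, dynamic programming with an affine gap penalty, can be sketched in its textbook (Gotoh-style) form. This is a generic illustration, not code from the MAM tool: the function name and the scoring values are assumptions, and the sketch computes only a global alignment score rather than a seed extension.

```python
NEG_INF = float("-inf")


def affine_gap_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Global alignment score; a gap of length k costs gap_open + k * gap_extend."""
    n, m = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]
    # X: last column aligns a[i-1] with a gap
    # Y: last column aligns a gap with b[j-1]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            # Opening a gap pays gap_open once; extending pays only gap_extend.
            X[i][j] = max(
                M[i - 1][j] + gap_open + gap_extend,
                X[i - 1][j] + gap_extend,
                Y[i - 1][j] + gap_open + gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open + gap_extend,
                Y[i][j - 1] + gap_extend,
                X[i][j - 1] + gap_open + gap_extend,
            )
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_gap_score("ACGTT", "ACT"))
```

Because opening a gap is charged separately from extending it, one gap of length two scores better than two separate gaps of length one, which is the behavior an affine penalty is meant to capture.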

Identifying and removing haplotypic duplication in primary genome assemblies.

Guan, Dengfeng; McCarthy, Shane A; Wood, Jonathan; Howe, Kerstin; Wang, Yadong; Durbin, Richard.

Bioinformatics ; 36(9): 2896-2898, 2020 05 01.

Article in English | MEDLINE | ID: mdl-31971576

ABSTRACT

MOTIVATION: Rapid development in long-read sequencing and scaffolding technologies is accelerating the production of reference-quality assemblies for large eukaryotic genomes. However, haplotype divergence in regions of high heterozygosity often results in assemblers creating two copies rather than one copy of a region, leading to breaks in contiguity and compromising downstream steps such as gene annotation. Several tools have been developed to resolve this problem. However, they either focus only on removing contained duplicate regions, also known as haplotigs, or fail to use all the relevant information and hence make errors. RESULTS: Here we present a novel tool, purge_dups, that uses sequence similarity and read depth to automatically identify and remove both haplotigs and heterozygous overlaps. In comparison with current tools, we demonstrate that purge_dups can reduce heterozygous duplication and increase assembly continuity while maintaining completeness of the primary assembly. Moreover, purge_dups is fully automatic and can easily be integrated into assembly pipelines. AVAILABILITY AND IMPLEMENTATION: The source code is written in C and is available at https://github.com/dfguan/purge_dups. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

High-Throughput Nucleotide Sequencing , Software , Genome , Haplotypes , Sequence Analysis, DNA

Fast read alignment with incorporation of known genomic variants.

Guo, Hongzhe; Liu, Bo; Guan, Dengfeng; Fu, Yilei; Wang, Yadong.

BMC Med Inform Decis Mak ; 19(Suppl 6): 265, 2019 12 19.

Article in English | MEDLINE | ID: mdl-31856811

ABSTRACT

BACKGROUND: Many genetic variants have been reported from sequencing projects due to decreasing experimental costs. Compared to the current typical paradigm, read mapping that incorporates existing variants can improve the performance of subsequent analysis. This method is supposed to map sequencing reads efficiently to a graphical index built from a reference genome and known variation, increasing alignment quality and variant calling accuracy. However, storing and indexing various types of variation requires costly RAM space. METHODS: Aligning reads to a graph model-based index that includes the whole set of variants is ultimately an NP-hard problem in theory. Here, we propose a variation-aware read alignment algorithm (VARA), which generates the alignment between a read and multiple genomic sequences simultaneously, utilizing the schema of the Landau-Vishkin algorithm. VARA dynamically extracts regional variants to construct a pseudo tree-based structure on the fly for seed extension, without loading the whole genome variation into memory. RESULTS: We developed the novel high-throughput sequencing read aligner deBGA-VARA by integrating VARA into deBGA. deBGA-VARA was benchmarked on both simulated reads and the NA12878 sequencing dataset. The experimental results demonstrate that read alignment incorporating genetic variation knowledge can achieve high sensitivity and accuracy. CONCLUSIONS: Due to its efficiency, VARA provides a promising solution for further improvement of variant calling while maintaining a small memory footprint. deBGA-VARA is available at: https://github.com/hitbc/deBGA-VARA.

Subject(s)

Algorithms , Genetic Variation/genetics , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA/classification , Benchmarking , Humans , Sequence Analysis, DNA/methods , Software

rHAT: fast alignment of noisy long reads with regional hashing.

Liu, Bo; Guan, Dengfeng; Teng, Mingxiang; Wang, Yadong.

Bioinformatics ; 32(11): 1625-31, 2016 06 01.

Article in English | MEDLINE | ID: mdl-26568628

ABSTRACT

MOTIVATION: Single Molecule Real-Time (SMRT) sequencing has been widely applied in cutting-edge genomic studies. However, aligning noisy long SMRT reads to a reference genome with state-of-the-art aligners is still expensive, and this is becoming a bottleneck in applications of SMRT sequencing. A novel approach is needed to improve the efficiency and effectiveness of SMRT read alignment. RESULTS: We propose the Regional Hashing-based Alignment Tool (rHAT), a seed-and-extension-based read alignment approach specifically designed for noisy long reads. rHAT indexes the reference genome with a regional hash table (RHT), a hash table-based index that describes the short tokens within local windows of the reference genome. In the seeding phase, rHAT uses the RHT to efficiently count the occurrences of short token matches between part of a read and local genomic windows, in order to find highly probable candidate sites. In the extension phase, a sparse dynamic programming-based heuristic approach is used to reduce the cost of aligning the read to the candidate sites. By benchmarking on real and simulated datasets from various prokaryote and eukaryote genomes, we demonstrated that rHAT can effectively align SMRT reads with outstanding throughput. AVAILABILITY AND IMPLEMENTATION: rHAT is implemented in C++; the source code is available at https://github.com/HIT-Bioinformatics/rHAT. CONTACT: ydwang@hit.edu.cn. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
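The seeding idea described in the abstract, counting short token matches between a read and local reference windows, can be sketched as follows. This is a simplified illustration and not rHAT's actual index: the function names, the window size, the token length, and the rule of assigning a token to the window containing its start position are all assumptions.

```python
from collections import defaultdict


def build_regional_index(reference, window=16, k=4):
    """Map each k-mer (token) to the ids of the windows where it starts."""
    index = defaultdict(set)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].add(pos // window)
    return index


def candidate_windows(read, index, k=4, top=1):
    """Rank reference windows by how many read tokens they contain."""
    hits = defaultdict(int)
    for pos in range(len(read) - k + 1):
        # .get avoids inserting empty entries into the defaultdict index.
        for win in index.get(read[pos:pos + k], ()):
            hits[win] += 1
    # Most hits first; lower window id breaks ties.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [win for win, _ in ranked[:top]]


# Toy reference with three 16-bp windows (invented sequence).
reference = "A" * 16 + "ACGTTGCAGGCTAGCT" + "C" * 16
index = build_regional_index(reference)
print(candidate_windows("TTGCAGGC", index))
```

A read with a few errors still shares some tokens with its true window, so counting hits per window can tolerate noise; the subsequent extension step from the abstract is not modeled here.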

Subject(s)

Software , Algorithms , Genomics , Sequence Alignment , Sequence Analysis, DNA
