
Whole genome SNP genotype piecemeal imputation.

Wang, Yining; Wylie, Tim; Stothard, Paul; Lin, Guohui.

BMC Bioinformatics ; 16: 340, 2015 Oct 23.

Article in English | MEDLINE | ID: mdl-26498158

ABSTRACT

BACKGROUND: Despite ongoing reductions in the cost of sequencing technologies, whole genome SNP genotype imputation is often used as an alternative for obtaining abundant SNP genotypes for genome-wide association studies. Several existing genotype imputation methods can be efficient for this purpose, while achieving various levels of imputation accuracy. Recent empirical results have shown that two-step imputation may improve accuracy by first imputing the low-density genotyped study animals to a medium-density array and then to the target density. We are interested in building a series of staircase arrays that lead from the low-density array to the high-density array or even the whole genome, such that genotype imputation along these staircases can achieve the highest accuracy.

RESULTS: For genotype imputation from a lower density to a higher density, we first show how to select untyped SNPs to construct a medium-density array. Subsequently, we determine for each selected SNP those untyped SNPs to be imputed in the add-one two-step imputation, and lastly how the clusters of imputed genotypes are pieced together as the final imputation result. We design extensive empirical experiments using several hundred sequenced and genotyped animals to demonstrate that our novel two-step piecemeal imputation always achieves an improvement compared to one-step imputation by the state-of-the-art methods Beagle and FImpute. Using the two-step piecemeal imputation, we present some preliminary success on whole genome SNP genotype imputation for genotyped animals via a series of staircase arrays.

CONCLUSIONS: From a low SNP density to the whole genome, intermediate pseudo-arrays can be computationally constructed by selecting the most informative SNPs for untyped SNP genotype imputation. Such pseudo-array staircases are able to impute more accurately than the classic one-step imputation.

Subject(s)

Genome-Wide Association Study , Animals , Cattle , Cluster Analysis , Genome , Genotype , Polymorphism, Single Nucleotide , Software

Protein chain pair simplification under the discrete Fréchet distance.

Wylie, Tim; Zhu, Binhai.

IEEE/ACM Trans Comput Biol Bioinform ; 10(6): 1372-83, 2013.

Article in English | MEDLINE | ID: mdl-24407296

ABSTRACT

For protein structure alignment and comparison, a lot of work has been done using RMSD as the distance measure, which has drawbacks under certain circumstances. Thus, the discrete Fréchet distance was recently applied to the problem of protein (backbone) structure alignment and comparison with promising results. For this problem, visualization is also important because protein chain backbones can have as many as 500-600 α-carbon atoms, which constitute the vertices in the comparison. Even with an excellent alignment, the similarity of two polygonal chains can be difficult to visualize unless the chains are nearly identical. Thus, the chain pair simplification problem (CPS-3F) was proposed in 2008 to simultaneously simplify both chains with respect to each other under the discrete Fréchet distance. The complexity of CPS-3F is unknown, so heuristic methods have been developed. Here, we define a variation of CPS-3F, called the constrained CPS-3F problem (CPS-3F+), and prove that it is polynomially solvable by presenting a dynamic programming solution, which we then prove is a factor-2 approximation for CPS-3F. We then compare the CPS-3F+ solutions with previous empirical results, and further demonstrate some of the benefits of the simplified comparisons. Chain pair simplification based on the Hausdorff distance (CPS-2H) is known to be NP-complete, and here we prove that the constrained version (CPS-2H+) is also NP-complete. Finally, we discuss future work and implications along with a software library implementation, named the Fréchet-based Protein Alignment & Comparison Toolkit (FPACT).
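For readers unfamiliar with the distance measure named in this abstract, the following is a minimal sketch of the standard dynamic program for the discrete Fréchet distance between two polygonal chains. It is not the CPS-3F+ algorithm and is not taken from FPACT; the function name `discrete_frechet` and the representation of a chain as a list of coordinate tuples (for example, α-carbon positions) are assumptions made for illustration.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def discrete_frechet(p, q):
    """Discrete Fréchet distance between chains p and q (lists of points).

    Hypothetical helper for illustration; points are equal-length tuples.
    """
    if not p or not q:
        raise ValueError("both chains must be non-empty")
    n, m = len(p), len(q)
    # ca[i][j] holds the discrete Fréchet distance between the
    # prefixes p[0..i] and q[0..j].
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                # Only q can advance along the first row.
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                # Only p can advance along the first column.
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Advance on p, on q, or on both; keep the best predecessor.
                best_prev = min(ca[i - 1][j], ca[i][j - 1], ca[i - 1][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]
```

The table has one cell per vertex pair, so the sketch runs in time and space proportional to the product of the chain lengths, which is manageable for backbones of the 500-600 vertex size mentioned above.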

Subject(s)

Computational Biology/methods , Proteins/chemistry , Algorithms , Carbon/chemistry , Computer Simulation , Protein Conformation , Sequence Alignment/methods , Software , Structural Homology, Protein
