Search | VHL Regional Portal

Kinpute: using identity by descent to improve genotype imputation.

Abney, Mark; ElSherbiny, Aisha.

Bioinformatics ; 35(21): 4321-4326, 2019 Nov 01.

Article in English | MEDLINE | ID: mdl-30918937

ABSTRACT

MOTIVATION: Genotype imputation, though generally accurate, often leaves many genotypes poorly imputed, particularly in studies whose individuals are not well represented by standard reference panels. When individuals in the study share regions of the genome identical by descent (IBD), this information can be combined with a study-specific reference panel (SSRP) to improve the imputation results. Kinpute uses IBD information (arising from recent, familial relatedness or from distant, unknown ancestors) together with the output of linkage disequilibrium (LD)-based imputation methods to compute more accurate genotype probabilities. Kinpute uses a novel method for IBD imputation that works even in the absence of a pedigree and substantially improves imputation quality.

RESULTS: Given initial estimates of average IBD between subjects in the study sample, Kinpute uses a novel algorithm to select an optimal set of individuals to sequence and use as an SSRP. Kinpute takes as input both this SSRP and the genotype probabilities output by other LD-based imputation software, and uses a new method to combine the LD-imputed genotype probabilities with IBD configurations to substantially improve imputation. We tested Kinpute on a human population isolate in which 98 individuals have been sequenced. In half of this sample, whose sequence data were masked, we used Impute2 to perform LD-based imputation and then used Kinpute to obtain higher-accuracy genotype probabilities. Measures of imputation accuracy improved significantly, particularly for genotypes that Impute2 imputed with low certainty.

AVAILABILITY AND IMPLEMENTATION: Kinpute is an open-source and freely available C++ software package that can be downloaded from.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
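
The abstract says Kinpute combines LD-imputed genotype probabilities with IBD information but gives no formulas. As a rough illustration only, the sketch below applies Bayes' rule at a single biallelic SNP: the LD-imputed probabilities act as a prior over the target's genotype g (0, 1 or 2 copies of the alternate allele), and the evidence is the allele carried by one phased, sequenced SSRP haplotype that is IBD with the target with probability `ibd_prob`. This single-reference, single-SNP toy model is an assumption and is not Kinpute's published algorithm; the function names and numbers are hypothetical.

```python
from typing import List, Sequence


def ibd_likelihood(shared_allele: int, ibd_prob: float,
                   freq_of_shared: float) -> List[float]:
    """Likelihood of the reference allele for target genotypes g = 0, 1, 2.

    Hypothetical toy model: g counts alternate alleles. With probability
    ibd_prob the reference haplotype is IBD with one of the target's two
    haplotypes (each equally likely); otherwise the reference allele is
    unrelated to the target and occurs with population frequency
    freq_of_shared.
    """
    if shared_allele not in (0, 1):
        raise ValueError("shared_allele must be 0 (reference) or 1 (alternate)")
    likelihoods = []
    for g in range(3):
        # Fraction of the target's two haplotypes carrying shared_allele.
        p_match = g / 2 if shared_allele == 1 else (2 - g) / 2
        likelihoods.append(ibd_prob * p_match + (1 - ibd_prob) * freq_of_shared)
    return likelihoods


def combine_ld_and_ibd(ld_probs: Sequence[float], shared_allele: int,
                       ibd_prob: float, freq_of_shared: float) -> List[float]:
    """Posterior genotype probabilities: LD prior times IBD likelihood, renormalized."""
    if len(ld_probs) != 3:
        raise ValueError("expected probabilities for genotypes 0, 1 and 2")
    lik = ibd_likelihood(shared_allele, ibd_prob, freq_of_shared)
    unnormalized = [p * lk for p, lk in zip(ld_probs, lik)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("LD prior and IBD evidence are incompatible")
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    # Toy numbers for illustration, not data from the paper.
    print(combine_ld_and_ibd([0.4, 0.4, 0.2], shared_allele=1,
                             ibd_prob=1.0, freq_of_shared=0.3))
```

With `ibd_prob=1.0` and a shared alternate allele, genotype 0 becomes impossible, so the toy call above returns `[0.0, 0.5, 0.5]`; with `ibd_prob=0.0` the evidence is uninformative and the LD prior is returned unchanged. A real method would also have to estimate the IBD state along the genome and handle many relatives at once, which this sketch does not attempt.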

Subject(s)

Software , Genotype , Humans , Linkage Disequilibrium , Pedigree , Polymorphism, Single Nucleotide

Phylogenomic clustering for selecting non-redundant genomes for comparative genomics.

Moreno-Hagelsieb, Gabriel; Wang, Zilin; Walsh, Stephanie; ElSherbiny, Aisha.

Bioinformatics ; 29(7): 947-9, 2013 Apr 01.

Article in English | MEDLINE | ID: mdl-23396122

ABSTRACT

MOTIVATION: Analyses in comparative genomics often require non-redundant genome datasets. Eliminating redundancy is not as simple as keeping one strain for each named species because genomes might be redundant at a higher taxonomic level than that of species for some analyses; some strains with different species names can be as similar as most strains sharing a species name, whereas some strains sharing a species name can be so different that they should be put into different groups; and some genomes lack a species name.

RESULTS: We have implemented a method and Web server that clusters a genome dataset into groups of redundant genomes at different thresholds based on a few phylogenomic distance measures.

AVAILABILITY: The Web interface, similarity and distance data and R-scripts can be accessed at http://microbiome.wlu.ca/research/redundancy/.
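
The abstract does not specify the clustering rule. One simple way to group genomes into redundant sets at a chosen threshold is single-linkage clustering: two genomes fall into the same group whenever a chain of pairwise distances at or below the threshold connects them. The sketch below implements that with a union-find structure and keeps the first genome of each group as its representative. Single linkage, the function names and the toy distances are assumptions for illustration; the Web server's own procedure may differ.

```python
from typing import Dict, List, Sequence


def cluster_at_threshold(names: Sequence[str],
                         dist: Sequence[Sequence[float]],
                         threshold: float) -> List[List[str]]:
    """Single-linkage groups: genomes linked by any chain of distances <= threshold."""
    n = len(names)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two groups

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


def representatives(groups: List[List[str]]) -> List[str]:
    """Keep one genome per redundant group (here simply the first listed)."""
    return [group[0] for group in groups]


if __name__ == "__main__":
    # Hypothetical phylogenomic distances between four genomes.
    names = ["A", "B", "C", "D"]
    dist = [
        [0.00, 0.01, 0.30, 0.30],
        [0.01, 0.00, 0.30, 0.30],
        [0.30, 0.30, 0.00, 0.02],
        [0.30, 0.30, 0.02, 0.00],
    ]
    groups = cluster_at_threshold(names, dist, threshold=0.05)
    print(groups, representatives(groups))
```

At a threshold of 0.05 the toy genomes form `[['A', 'B'], ['C', 'D']]` with representatives `['A', 'C']`; raising the threshold to 0.5 merges all four into one group, which mirrors the idea of computing groups at several thresholds.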

Subject(s)

Genomics/methods , Phylogeny , Genome , Internet , Software
