Search | Portal Regional da BVS (test)

Evaluation of vicinity-based hidden Markov models for genotype imputation.

Wang, Su; Kim, Miran; Jiang, Xiaoqian; Harmanci, Arif Ozgun.

BMC Bioinformatics ; 23(1): 356, 2022 Aug 29.

Article in English | MEDLINE | ID: mdl-36038834

ABSTRACT

BACKGROUND: The decreasing cost of DNA sequencing has led to a great increase in our knowledge about genetic variation. While population-scale projects bring important insight into genotype-phenotype relationships, the cost of performing whole-genome sequencing on large samples is still prohibitive. In-silico genotype imputation coupled with genotyping-by-arrays is a cost-effective and accurate alternative for genotyping of common and uncommon variants. Imputation methods compare the genotypes of the typed variants with large population-specific reference panels and estimate the genotypes of untyped variants by making use of linkage disequilibrium patterns. The most accurate imputation methods are based on the Li-Stephens hidden Markov model (HMM), which treats the sequence of each chromosome as a mosaic of the haplotypes from the reference panel. RESULTS: Here we assess the accuracy of vicinity-based HMMs, where each untyped variant is imputed using the typed variants in a small window around itself (as small as 1 centimorgan). Locality-based imputation has recently been used by machine learning-based genotype imputation approaches. We assess how the parameters of the vicinity-based HMMs impact the imputation accuracy in a comprehensive set of benchmarks and show that vicinity-based HMMs can accurately impute common and uncommon variants. CONCLUSIONS: Our results indicate that locality-based imputation models can be effectively used for genotype imputation. The parameter settings that we identified can be used in future methods, and vicinity-based HMMs can be used for re-structuring and parallelizing new imputation methods. The source code for the vicinity-based HMM implementations is publicly available at https://github.com/harmancilab/LoHaMMer.

Subjects

Polymorphism, Single Nucleotide; Software; Genome-Wide Association Study/methods; Genotype; Haplotypes; Linkage Disequilibrium; Sequence Analysis, DNA/methods
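
The windowed imputation idea described in the abstract above can be illustrated with a minimal sketch. It is not the LoHaMMer implementation. It assumes a haploid Li-Stephens-style HMM whose hidden states are the reference haplotypes, and the names `impute_alt_prob`, `half_window_cm`, `switch_rate` and `err` are hypothetical, as are their default values.

```python
import math

def impute_alt_prob(ref_haps, gen_pos_cm, typed, target,
                    half_window_cm=0.5, switch_rate=1.0, err=0.01):
    """Posterior probability of the alternate allele (1) at an untyped variant.

    ref_haps:   list of K reference haplotypes, each a list of 0/1 alleles
    gen_pos_cm: genetic position (cM) of every variant
    typed:      dict {variant index: observed allele} for the typed variants
    target:     index of the untyped variant (assumed not to be in `typed`)
    All parameter names and defaults are illustrative assumptions.
    """
    K = len(ref_haps)
    # Use only typed variants inside the window, plus the target itself.
    sites = sorted([i for i in typed
                    if abs(gen_pos_cm[i] - gen_pos_cm[target]) <= half_window_cm]
                   + [target])

    def emission(k, site):
        if site == target:          # nothing is observed at the target
            return 1.0
        return 1.0 - err if ref_haps[k][site] == typed[site] else err

    def switch_prob(a, b):          # chance of switching haplotype between sites
        return 1.0 - math.exp(-switch_rate * (gen_pos_cm[b] - gen_pos_cm[a]))

    # Forward pass, normalised at every site for numerical stability.
    fwd, prev = [], None
    for n, s in enumerate(sites):
        if n == 0:
            cur = [emission(k, s) / K for k in range(K)]
        else:
            r, tot = switch_prob(sites[n - 1], s), sum(prev)
            cur = [emission(k, s) * ((1 - r) * prev[k] + r * tot / K)
                   for k in range(K)]
        z = sum(cur)
        prev = [c / z for c in cur]
        fwd.append(prev)

    # Backward pass.
    bwd = [None] * len(sites)
    bwd[-1] = [1.0] * K
    for n in range(len(sites) - 2, -1, -1):
        r = switch_prob(sites[n], sites[n + 1])
        nxt = [emission(k, sites[n + 1]) * bwd[n + 1][k] for k in range(K)]
        tot = sum(nxt)
        cur = [(1 - r) * nxt[k] + r * tot / K for k in range(K)]
        z = sum(cur)
        bwd[n] = [c / z for c in cur]

    # Combine both passes at the target and sum over haplotypes carrying allele 1.
    t = sites.index(target)
    post = [fwd[t][k] * bwd[t][k] for k in range(K)]
    return sum(p for k, p in enumerate(post) if ref_haps[k][target] == 1) / sum(post)
```

Because each target variant only reads the typed variants in its own window, calls for different targets are independent of one another, which is the property the abstract associates with parallelizing imputation.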

Ultrafast homomorphic encryption models enable secure outsourcing of genotype imputation.

Kim, Miran; Harmanci, Arif Ozgun; Bossuat, Jean-Philippe; Carpov, Sergiu; Cheon, Jung Hee; Chillotti, Ilaria; Cho, Wonhee; Froelicher, David; Gama, Nicolas; Georgieva, Mariya; Hong, Seungwan; Hubaux, Jean-Pierre; Kim, Duhyeong; Lauter, Kristin; Ma, Yiping; Ohno-Machado, Lucila; Sofia, Heidi; Son, Yongha; Song, Yongsoo; Troncoso-Pastoriza, Juan; Jiang, Xiaoqian.

Cell Syst ; 12(11): 1108-1120.e4, 2021 Nov 17.

Article in English | MEDLINE | ID: mdl-34464590

ABSTRACT

Genotype imputation is a fundamental step in genomic data analysis, where missing variant genotypes are predicted using the existing genotypes of nearby "tag" variants. Although researchers can outsource genotype imputation, privacy concerns may prohibit genetic data sharing with an untrusted imputation service. Here, we developed secure genotype imputation using efficient homomorphic encryption (HE) techniques. In HE-based methods, the genotype data are secure while in transit, at rest, and in analysis, and can only be decrypted by the owner. We compared secure imputation with three state-of-the-art non-secure methods and found that HE-based methods provide genetic data security with comparable accuracy for common variants. HE-based methods have time and memory requirements that are comparable to or lower than those for the non-secure methods. Our results provide evidence that HE-based methods can practically perform resource-intensive computations for high-throughput genetic data analysis. The source code is freely available for download at https://github.com/K-miran/secure-imputation.

Subjects

Outsourced Services; Computer Security; Genome-Wide Association Study; Genotype; Privacy
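
To make the outsourcing idea in the abstract above concrete, here is a toy sketch of computing on encrypted genotypes. It deliberately swaps in a textbook Paillier cryptosystem (additively homomorphic, tiny insecure primes) rather than the HE schemes used by the authors, and it assumes the imputation model is an integer-weighted linear score over tag genotypes. All function names, weights and the bias are hypothetical.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair. The primes are far too small for real use."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # needs Python 3.8+
    return n, (lam, mu)                                 # public n, private (lam, mu)

def encrypt(n, m):
    """Encrypt integer m under public key n, using generator g = n + 1."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def encrypted_score(n, enc_tags, weights, bias):
    """Server side: Enc(bias + sum_i w_i * g_i), computed without decrypting."""
    n2 = n * n
    acc = encrypt(n, bias)
    for c, w in zip(enc_tags, weights):
        # Multiplying ciphertexts adds plaintexts; exponentiation scales them.
        acc = acc * pow(c, w % n, n2) % n2
    return acc

# Data owner encrypts tag genotypes (0/1/2) and keeps the private key.
n, priv = keygen()
tags = [0, 1, 2, 1]
enc_tags = [encrypt(n, g) for g in tags]

# Untrusted service evaluates a hypothetical integer-weight linear model.
weights, bias = [3, -2, 5, 1], -4
enc_result = encrypted_score(n, enc_tags, weights, bias)

# Only the data owner can recover the score.
score = decrypt(n, priv, enc_result)
```

The service only ever handles ciphertexts and the public key, so the genotypes stay encrypted during analysis, as the abstract describes. Real-valued model weights would need fixed-point scaling to integers before use in a scheme like this.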

Stochastic sampling of the RNA structural alignment space.

Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.

Nucleic Acids Res ; 37(12): 4063-75, 2009 Jul.

Article in English | MEDLINE | ID: mdl-19429694

ABSTRACT

A novel method is presented for predicting the common secondary structures and alignment of two homologous RNA sequences by sampling the 'structural alignment' space, i.e. the joint space of their alignments and common secondary structures. The structural alignment space is sampled according to a pseudo-Boltzmann distribution based on a pseudo-free energy change that combines base pairing probabilities from a thermodynamic model and alignment probabilities from a hidden Markov model. By virtue of the implicit comparative analysis between the two sequences, the method offers an improvement over single sequence sampling of the Boltzmann ensemble. A cluster analysis shows that the samples obtained from joint sampling of the structural alignment space cluster more closely than samples generated by the single sequence method. On average, the representative (centroid) structure and alignment of the most populated cluster in the sample of structures and alignments generated by joint sampling are more accurate than single sequence sampling and alignment based on sequence alone, respectively. The 'best' centroid structure that is closest to the known structure among all the centroids is, on average, more accurate than structure predictions of other methods. Additionally, cluster analysis identifies, on average, a few clusters, whose centroids can be presented as alternative candidates. The source code for the proposed method can be downloaded at http://rna.urmc.rochester.edu.

Subjects

RNA/chemistry; Sequence Alignment/methods; Sequence Analysis, RNA; Algorithms; Cluster Analysis; Nucleic Acid Conformation; Stochastic Processes
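
The pseudo-Boltzmann sampling described in the abstract above can be sketched on a small, explicitly enumerated set of candidates. This is an illustration only: the published method samples from the full structural alignment space, whereas this sketch assumes a hand-listed candidate set, and it assumes the pseudo-free energy is the negative sum of log probabilities in units where RT = 1. The dictionary keys and function names are hypothetical.

```python
import math
import random

def pseudo_energy(pair_probs, align_probs):
    """Pseudo-free energy of one candidate (assumed form, RT = 1):
    minus the summed log base-pairing and log alignment probabilities."""
    return (-sum(math.log(p) for p in pair_probs)
            - sum(math.log(p) for p in align_probs))

def boltzmann_probs(candidates):
    """Normalised pseudo-Boltzmann probabilities over the candidates."""
    energies = [pseudo_energy(c["pair_probs"], c["align_probs"])
                for c in candidates]
    e_min = min(energies)                     # shift for numerical stability
    weights = [math.exp(-(e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]

def sample_indices(candidates, k, rng):
    """Draw k candidate indices according to the pseudo-Boltzmann distribution."""
    return rng.choices(range(len(candidates)),
                       weights=boltzmann_probs(candidates), k=k)

# Two hypothetical structural alignments with made-up probabilities.
candidates = [
    {"pair_probs": [0.9], "align_probs": [0.8]},
    {"pair_probs": [0.3], "align_probs": [0.2]},
]
probs = boltzmann_probs(candidates)
draws = sample_indices(candidates, 1000, random.Random(0))
```

Candidates supported by both the base-pairing and the alignment probabilities receive a low pseudo-free energy and are drawn more often, which is the sense in which the two information sources are combined.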

PARTS: probabilistic alignment for RNA joinT secondary structure prediction.

Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.

Nucleic Acids Res ; 36(7): 2406-17, 2008 Apr.

Article in English | MEDLINE | ID: mdl-18304945

ABSTRACT

A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is better for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu.

Subjects

Algorithms; RNA/chemistry; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Nucleic Acid Conformation; Probability

Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign.

Harmanci, Arif Ozgun; Sharma, Gaurav; Mathews, David H.

BMC Bioinformatics ; 8: 130, 2007 Apr 19.

Article in English | MEDLINE | ID: mdl-17445273

ABSTRACT

BACKGROUND: Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction. RESULTS: The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction, yielding better accuracy, on average, and requiring significantly fewer computational resources. CONCLUSION: Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer sequences. The revised Dynalign code is freely available for download.

Subjects

Models, Statistical; RNA/genetics; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Predictive Value of Tests; RNA/chemistry; Software
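
The constraint construction described in the abstract above (adding alignment and insertion posteriors into co-incidence probabilities, then thresholding) can be sketched as follows. This is not the Dynalign code. It assumes the posteriors are already available as matrices indexed by position pairs (i, k), that `p_insert` holds the total insertion posterior for each pair, and that the threshold value is chosen by the caller. The function names are hypothetical.

```python
def coincidence_probs(p_align, p_insert):
    """Co-incidence probability for each position pair (i, k):
    the alignment posterior plus the insertion posterior."""
    return [[a + b for a, b in zip(row_a, row_i)]
            for row_a, row_i in zip(p_align, p_insert)]

def alignment_constraint(p_align, p_insert, threshold):
    """Boolean mask: True where pair (i, k) stays permissible
    because its co-incidence probability reaches the threshold."""
    return [[p >= threshold for p in row]
            for row in coincidence_probs(p_align, p_insert)]

# Made-up posteriors for two sequences of length 2.
p_align = [[0.70, 0.01],
           [0.02, 0.60]]
p_insert = [[0.20, 0.00],
            [0.01, 0.30]]
mask = alignment_constraint(p_align, p_insert, threshold=0.05)
allowed = sum(cell for row in mask for cell in row)
```

A downstream structural alignment would then skip every pair whose mask entry is False, which is how a constraint of this kind reduces computation and memory.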
