1.
Bioinformatics ; 39(4), 2023 Apr 03.
Article in English | MEDLINE | ID: mdl-37039826

ABSTRACT

MOTIVATION: This work is motivated by the problem of identifying homozygosity islands on the genome of individuals in a population. Our method identifies homozygosity islands directly at the population level, without the need to analyse single individuals and then combine the results, as current state-of-the-art approaches do. RESULTS: We propose regularized offline change-point methods to detect changes in the parameters of a multidimensional distribution when several aligned, independent samples of fixed resolution are available. We present a penalized maximum likelihood approach that can be computed efficiently by a dynamic programming algorithm or approximated by a fast binary segmentation algorithm. Both estimators are shown to converge almost surely to the set of change-points without the need to specify the number of change-points a priori. In simulations, we observed similar performance from the exact and greedy estimators. Moreover, we provide a new methodology for selecting the regularization constant, which has the advantage of being automatic, consistent, and less prone to subjective analysis. AVAILABILITY AND IMPLEMENTATION: The data used in the application are from the Human Genome Diversity Project (HGDP) and are publicly available. Algorithms were implemented in R (R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, 2020) in the R package blockcpd, found at https://github.com/Lucas-Prates/blockcpd.
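
The greedy estimator mentioned in the abstract can be illustrated with a small sketch. This is not the blockcpd implementation or its API. It assumes, purely for illustration, that each sample is a binary sequence whose entries within a block share one Bernoulli parameter, and that the penalty is a constant charged per change-point. The function names `block_nll` and `binary_segmentation` are hypothetical.

```python
import math


def block_nll(ones, total):
    """Negative log-likelihood of `total` Bernoulli draws with `ones` successes,
    evaluated at the maximum likelihood estimate p = ones / total."""
    if ones == 0 or ones == total:
        return 0.0
    p = ones / total
    return -(ones * math.log(p) + (total - ones) * math.log(1 - p))


def binary_segmentation(samples, penalty):
    """Greedy penalized change-point detection on aligned binary samples.

    samples: list of equal-length lists of 0/1 values (assumed independent).
    penalty: assumed constant cost per change-point (the regularization constant).
    Returns the sorted list of detected change-point positions.
    """
    n = len(samples)
    m = len(samples[0])
    # prefix[j] = number of ones in columns 0..j-1, summed over all samples
    prefix = [0] * (m + 1)
    for j in range(m):
        prefix[j + 1] = prefix[j] + sum(row[j] for row in samples)

    def cost(s, e):
        # Cost of treating columns s..e-1 as a single block
        return block_nll(prefix[e] - prefix[s], n * (e - s))

    change_points = []
    stack = [(0, m)]
    while stack:
        s, e = stack.pop()
        if e - s < 2:
            continue
        # Best single split of the current segment
        best_t = min(range(s + 1, e), key=lambda t: cost(s, t) + cost(t, e))
        gain = cost(s, e) - (cost(s, best_t) + cost(best_t, e))
        # Split only when the likelihood gain outweighs the penalty
        if gain > penalty:
            change_points.append(best_t)
            stack.append((s, best_t))
            stack.append((best_t, e))
    return sorted(change_points)
```

Pooling all samples into one column-wise prefix sum is what lets the sketch work at the population level rather than segmenting each individual separately. The exact estimator described in the abstract would replace the greedy recursion with dynamic programming over all segmentations.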


Subjects
Algorithms, Software, Humans, Likelihood Functions, Islands, Computer Simulation
2.
Bioinformatics ; 22(11): 1302-7, 2006 Jun 01.
Article in English | MEDLINE | ID: mdl-16527830

ABSTRACT

MOTIVATION: A central problem in genomics is to determine the function of a protein using the information contained in its amino acid sequence. Variable length Markov chains (VLMC) are a promising class of models that can effectively classify proteins into families, and they can be estimated in linear time and space. RESULTS: We introduce a new algorithm, called Sparse Probabilistic Suffix Trees (SPST), that identifies equivalences between the contexts of a VLMC. We show that, in many cases, identifying these equivalences can improve the classification rate of the classical Probabilistic Suffix Trees (PST) algorithm. We also show that better classification can be achieved by identifying representative fingerprints in the amino acid chains; this variation of the SPST algorithm is called F-SPST.
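
To make the idea of context-based classification concrete, here is a minimal sketch of a variable-length context model in the spirit of the classical PST approach. It is not the SPST or F-SPST algorithm: it does not merge equivalent contexts, does no pruning, and uses add-one smoothing as an assumption of this example. The function names `train_contexts`, `log_likelihood`, and `classify` are hypothetical.

```python
import math
from collections import defaultdict


def train_contexts(sequences, max_depth):
    """Count next-symbol occurrences after every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i, sym in enumerate(seq):
            for d in range(min(i, max_depth) + 1):
                counts[seq[i - d:i]][sym] += 1
    # Return plain dicts so later lookups never create new entries
    return {ctx: dict(nxt) for ctx, nxt in counts.items()}


def log_likelihood(seq, counts, max_depth, alphabet_size):
    """Score a sequence, predicting each symbol from the longest context seen in training."""
    total = 0.0
    for i, sym in enumerate(seq):
        # Walk from the longest candidate context down to the empty context
        for d in range(min(i, max_depth), -1, -1):
            ctx = seq[i - d:i]
            if ctx in counts:
                break
        nxt = counts.get(ctx, {})
        n = sum(nxt.values())
        # Add-one (Laplace) smoothing, an assumption of this sketch
        total += math.log((nxt.get(sym, 0) + 1) / (n + alphabet_size))
    return total


def classify(seq, models, max_depth, alphabet_size):
    """Assign the sequence to the family whose model gives the highest log-likelihood."""
    return max(
        models,
        key=lambda name: log_likelihood(seq, models[name], max_depth, alphabet_size),
    )
```

For protein data, `alphabet_size` would typically be 20 and each family's model would be trained on that family's known sequences. A sparse variant along the lines the abstract describes would additionally group contexts that induce the same next-symbol distribution, which this sketch does not attempt.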


Subjects
Computational Biology/methods, Protein Sequence Analysis/methods, Algorithms, Genomics/methods, Internet, Markov Chains, Statistical Models, Probability, Sequence Alignment, Software