1.
Anal Chem ; 94(8): 3467-3475, 2022 03 01.
Article in English | MEDLINE | ID: mdl-35171581

ABSTRACT

Accurate full-length sequencing of a purified unknown protein remains challenging because mass spectrometry (MS)-based methods are error-prone. De novo identified peptide sequences frequently contain errors, which undermine the accuracy of assembly. Bias in the detectability of peptides also creates low-coverage regions, resulting in gaps. Although recent advances in multi-enzyme hydrolysis and algorithms have achieved complete assembly of full-length protein sequences in a few examples, robustness in practical application still needs improvement. Here, inspired by genome assembly strategies, we demonstrate a contig-scaffolding strategy that assembles protein sequences with high robustness and accuracy. This strategy integrates multiple unspecific hydrolysis methods to minimize bias in the hydrolysis process. After de novo identification of the peptides, our assembly algorithm, named Multiple Contigs & Scaffolding (MuCS), assembles the peptide sequences in a multistep manner (contigs first, then scaffolds), with error correction at each step. MS data from different hydrolysis experiments complement each other for robust contig extension and error correction. In tests on three proteins with three replicates, our strategy reached 100% coverage (except one case with 98.85%) and 98.69-100% accuracy. It can also handle a membrane protein efficiently, although the transmembrane region was missing owing to the limitations of MS; the three replicates reached 88.85-92.57% coverage and 97.57-100% accuracy. In sum, we provide a practical, robust, and accurate solution for full-length protein sequencing. The MuCS software is available at http://chi-biotech.com/mucs/.
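
The contig step described in this abstract can be illustrated with a generic greedy overlap merge. The sketch below is not the MuCS algorithm: it assumes error-free peptides, exact suffix-prefix overlaps, and a hypothetical minimum overlap parameter `min_len`, and it omits the error-correction and scaffolding stages that the abstract mentions. The function names are illustrative only.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no such overlap has at least `min_len` residues.
    """
    max_k = min(len(a), len(b))
    for k in range(max_k, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(peptides, min_len=3):
    """Merge peptides into contigs by repeatedly joining the best-overlapping pair.

    Assumption: peptides are error-free strings of one-letter amino acid codes.
    """
    seqs = list(dict.fromkeys(peptides))  # drop exact duplicates, keep order
    while True:
        # Discard any sequence that is fully contained in another one.
        seqs = [s for s in seqs if not any(s != t and s in t for t in seqs)]
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return seqs  # no remaining overlaps: these are the contigs
        merged = seqs[best_i] + seqs[best_j][best_k:]
        seqs = [s for idx, s in enumerate(seqs) if idx not in (best_i, best_j)]
        seqs.append(merged)
```

Peptides that share no sufficiently long overlap with any other remain as separate contigs, which corresponds to the gaps that a later scaffolding step would need to bridge.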


Subjects
Protein Sequence Analysis , Software , Algorithms , Amino Acid Sequence , High-Throughput Nucleotide Sequencing/methods , Proteins/genetics , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods
2.
PLoS One ; 9(4): e94250, 2014.
Article in English | MEDLINE | ID: mdl-24743329

ABSTRACT

Correct and bias-free interpretation of deep sequencing data inevitably depends on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications. Seed-based algorithms are generally slow but robust, whereas Burrows-Wheeler Transform (BWT)-based algorithms are fast but less robust. To combine both advantages, we developed an algorithm, FANSe2, with an iterative mapping strategy based on the statistics of real-world sequencing error distributions, which substantially accelerates mapping without compromising accuracy. Its sensitivity and accuracy are higher than those of the BWT-based algorithms in tests using both prokaryotic and eukaryotic sequencing datasets. The gene identification results of FANSe2 were experimentally validated, whereas the previous algorithms produced false positives and false negatives. FANSe2 showed remarkably better consistency with microarray data than most other algorithms in terms of gene expression quantification. We implemented a scalable and almost maintenance-free parallelization method that can utilize the computational power of multiple office computers, a novel feature not present in any other mainstream algorithm. With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to the masked human genome within 4.1 hours, with higher sensitivity than Bowtie/Bowtie2. FANSe2 thus provides robust accuracy, full indel sensitivity, fast speed, versatile compatibility, and economical computational utilization, making it a useful and practical tool for deep sequencing applications. FANSe2 is freely available at http://bioinformatics.jnu.edu.cn/software/fanse2/.
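
The seed-based approach contrasted with BWT in this abstract can be illustrated with a generic seed-and-verify mapper. The sketch below is not the FANSe2 algorithm: it assumes a simple k-mer hash index, non-overlapping seeds, and mismatch-only verification (no indels, no iterative strategy). The names `build_index` and `map_read` and the parameters `k` and `max_mismatches` are hypothetical.

```python
from collections import defaultdict


def build_index(ref: str, k: int):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def map_read(read: str, ref: str, index, k: int, max_mismatches: int = 2):
    """Return (start, mismatches) for each reference position the read maps to.

    Seeds are non-overlapping k-mers of the read; each seed hit proposes a
    candidate start, which is then verified by counting mismatches over the
    full read length.
    """
    candidates = set()
    for offset in range(0, len(read) - k + 1, k):
        # .get avoids inserting empty entries into the defaultdict
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if 0 <= start <= len(ref) - len(read):
                candidates.add(start)
    hits = []
    for start in sorted(candidates):
        window = ref[start:start + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits
```

Because a read is found as long as at least one seed matches exactly, a sequencing error in one seed does not prevent mapping. This illustrates the robustness attributed to seed-based methods, at the cost of verifying every candidate position.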


Subjects
High-Throughput Nucleotide Sequencing/methods , Sequence Alignment/methods , Algorithms , Genetic Databases , Gene Expression Profiling , Human Genome/genetics , Humans , Oligonucleotide Array Sequence Analysis , DNA Sequence Analysis , RNA Sequence Analysis , Time Factors