1.
Nucleic Acids Res ; 43(20): e129, 2015 Nov 16.
Article in English | MEDLINE | ID: mdl-26101252

ABSTRACT

Single Molecule, Real-Time (SMRT) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution.
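
The ">QV50" figure above can be made concrete with a short calculation. The sketch below assumes that QV denotes a Phred-scaled quality value (QV = -10 log10 of the error probability), which the abstract does not state explicitly, and it uses a hypothetical genome length of 9,000 bases to stand in for ">9 kb". The function names are illustrative only.

```python
import math


def qv_to_error_probability(qv: float) -> float:
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_probability_to_qv(p: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(p)


def expected_errors_per_genome(qv: float, genome_length: int) -> float:
    """Expected number of erroneous bases in one genome at the given QV."""
    return qv_to_error_probability(qv) * genome_length


if __name__ == "__main__":
    # 9_000 is an assumed length standing in for the ">9 kb" HIV-1 genomes.
    print(qv_to_error_probability(50))
    print(expected_errors_per_genome(50, 9_000))
```

Under these assumptions, QV50 corresponds to one error per 100,000 bases, or roughly 0.09 expected errors across a 9,000-base genome.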


Subjects
Genetic Variation , Viral Genome , HIV-1/genetics , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Algorithms , Cluster Analysis , Humans , Sequence Alignment
2.
Biostatistics ; 10(3): 424-35, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19234308

ABSTRACT

Classification studies with high-dimensional measurements and relatively small sample sizes are increasingly common. Prospective analysis of the role of sample sizes in the performance of such studies is important for study design and interpretation of results, but the complexity of typical pattern discovery methods makes this problem challenging. The approach developed here combines Monte Carlo methods and new approximations for linear discriminant analysis, assuming multivariate normal distributions. Monte Carlo methods are used to sample the distribution of which features are selected for a classifier and the mean and variance of features given that they are selected. Given selected features, the linear discriminant problem involves different distributions of training data and generalization data, for which 2 approximations are compared: one based on Taylor series approximation of the generalization error and the other on approximating the discriminant scores as normally distributed. Combining the Monte Carlo and approximation approaches to different aspects of the problem allows efficient estimation of expected generalization error without full simulations of the entire sampling and analysis process. To evaluate the method and investigate realistic study design questions, full simulations are used to ask how validation error rate depends on the strength and number of informative features, the number of noninformative features, the sample size, and the number of features allowed into the pattern. Both approximation methods perform well for most cases but only the normal discriminant score approximation performs well for cases of very many weakly informative or uninformative dimensions. The simulated cases show that many realistic study designs will typically estimate substantially suboptimal patterns and may have low probability of statistically significant validation results.
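
To illustrate the kind of design question the abstract raises, here is a minimal Monte Carlo sketch. It is not the authors' approximation method. It assumes two classes with independent, unit-variance normal features and means of plus or minus `effect / 2` on the informative features, selects the features with the largest training mean difference, and scores the resulting linear discriminant exactly against the true distributions. Because the variance is assumed known, the classifier depends on the training data only through the class means, so the sketch samples those means directly. All names and parameter values are hypothetical.

```python
import math
import random


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_generalization_error(n_per_class, n_informative, n_noise,
                                  effect, n_selected, n_reps=200, seed=0):
    """Average true error of a mean-difference discriminant over training sets.

    Assumption: features are independent N(+/-mu_j, 1), so each training
    class mean is distributed as N(+/-mu_j, 1 / n_per_class).
    """
    rng = random.Random(seed)
    p = n_informative + n_noise
    mu = [effect / 2.0] * n_informative + [0.0] * n_noise
    sd_mean = 1.0 / math.sqrt(n_per_class)
    total = 0.0
    for _ in range(n_reps):
        mean_pos = [rng.gauss(m, sd_mean) for m in mu]
        mean_neg = [rng.gauss(-m, sd_mean) for m in mu]
        diff = [a - b for a, b in zip(mean_pos, mean_neg)]
        # Feature selection: keep the largest absolute mean differences.
        selected = sorted(range(p), key=lambda j: -abs(diff[j]))[:n_selected]
        # Discriminant weights are the selected differences; the threshold
        # is the score of the midpoint between the two training means.
        w_dot_mu = sum(diff[j] * mu[j] for j in selected)
        threshold = sum(diff[j] * 0.5 * (mean_pos[j] + mean_neg[j])
                        for j in selected)
        norm = math.sqrt(sum(diff[j] ** 2 for j in selected))
        # For a new sample, the score is normal with mean +/- w.mu and
        # standard deviation ||w||, so each class error is a normal tail.
        err_pos = normal_cdf((threshold - w_dot_mu) / norm)
        err_neg = normal_cdf(-(threshold + w_dot_mu) / norm)
        total += 0.5 * (err_pos + err_neg)
    return total / n_reps


if __name__ == "__main__":
    for n_noise in (0, 100, 1000):
        err = expected_generalization_error(
            n_per_class=10, n_informative=5, n_noise=n_noise,
            effect=1.0, n_selected=5)
        print(n_noise, round(err, 3))
```

Varying `n_per_class`, `n_noise`, and `n_selected` in such a loop gives a rough feel for how sample size and uninformative dimensions affect the expected error, under the simplifying assumptions stated above.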


Subjects
Biometry/methods , Classification/methods , Sample Size , Algorithms , Genomics/statistics & numerical data , Humans , Linear Models , Monte Carlo Method , Multivariate Analysis , Proteomics/statistics & numerical data
3.
Eukaryot Cell ; 6(6): 940-8, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17468393

ABSTRACT

Pre-mRNA splicing is essential to ensure accurate expression of many genes in eukaryotic organisms. In Entamoeba histolytica, a deep-branching eukaryote, approximately 30% of the annotated genes are predicted to contain introns; however, the accuracy of these predictions has not been tested. In this study, we mined an expressed sequence tag (EST) library representing 7% of amoebic genes and found evidence supporting splicing of 60% of the testable intron predictions, the majority of which contain a GUUUGU 5' splice site and a UAG 3' splice site. Additionally, we identified several splice site misannotations, evidence for the existence of 30 novel introns in previously annotated genes, and identified novel genes through uncovering their spliced ESTs. Finally, we provided molecular evidence for the E. histolytica U2, U4, and U5 snRNAs. These data lay the foundation for further dissection of the role of RNA processing in E. histolytica gene expression.
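
The splice-site motifs reported above lend themselves to a simple check of a predicted intron. The sketch below assumes 0-based, half-open intron coordinates on a genomic DNA string and uses the DNA equivalents of the reported motifs (GTTTGT for the GUUUGU 5' splice site, TAG for the UAG 3' splice site). The toy sequence and function names are invented for illustration and are not taken from the study.

```python
def has_reported_splice_sites(gene_dna, intron_start, intron_end,
                              donor="GTTTGT", acceptor="TAG"):
    """Return True if the intron starts with the donor and ends with the acceptor.

    Coordinates are 0-based and half-open: gene_dna[intron_start:intron_end].
    """
    intron = gene_dna[intron_start:intron_end].upper()
    return intron.startswith(donor) and intron.endswith(acceptor)


def splice(gene_dna, introns):
    """Remove the given (start, end) introns and return the joined exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(gene_dna[position:start])
        position = end
    exons.append(gene_dna[position:])
    return "".join(exons)


if __name__ == "__main__":
    # Toy gene: exon "ATGGCC", intron "GTTTGTAAATTTTAG", exon "GGTTAA".
    toy_gene = "ATGGCC" + "GTTTGTAAATTTTAG" + "GGTTAA"
    predicted_intron = (6, 21)
    print(has_reported_splice_sites(toy_gene, *predicted_intron))
    print(splice(toy_gene, [predicted_intron]))
```

Comparing the spliced sequence against an EST is then a plain string comparison or alignment, which is the sense in which an EST can support or contradict a predicted intron.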


Subjects
Entamoeba histolytica , Introns , Small Nuclear RNA/metabolism , Spliceosomes/metabolism , Animals , Base Sequence , Entamoeba histolytica/genetics , Entamoeba histolytica/metabolism , Expressed Sequence Tags , Gene Expression Regulation , Molecular Sequence Data , Nucleic Acid Conformation , RNA Splicing , Small Nuclear RNA/chemistry , Small Nuclear RNA/genetics , Spliceosomes/genetics
4.
Electrophoresis ; 26(7-8): 1500-12, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15765480

ABSTRACT

A capillary electrophoresis-mass spectrometry (CE-MS) method has been developed to perform routine, automated analysis of low-molecular-weight peptides in human serum. The method incorporates transient isotachophoresis for in-line preconcentration and a sheathless electrospray interface. To evaluate the performance of the method and demonstrate the utility of the approach, an experiment was designed in which peptides were added to sera from individuals at each of two different concentrations, artificially creating two groups of samples. The CE-MS data from the serum samples were divided into separate training and test sets. A pattern-recognition/feature-selection algorithm based on support vector machines was used to select the mass-to-charge (m/z) values from the training set data that distinguished the two groups of samples from each other. The added peptides were identified correctly as the distinguishing features, and pattern recognition based on these peptides was used to assign each sample in the independent test set to its respective group. A twofold difference in peptide concentration could be detected with statistical significance (p-value < 0.0001). The accuracy of the assignment was 95%, demonstrating the utility of this technique for the discovery of patterns of biomarkers in serum.


Subjects
Biomarkers/blood , Capillary Electrophoresis/methods , Electrospray Ionization Mass Spectrometry/methods , Automation , Two-Dimensional Gel Electrophoresis , Humans