1.
BioData Min ; 10: 13, 2017.
Article in English | MEDLINE | ID: mdl-28450890

ABSTRACT

BACKGROUND: A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy with model complexity when evolving classifiers. Using Pareto optimization, a CES is able to identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem in order to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs have been developed only for binary class datasets. In this study, we developed a multi-class CES. RESULTS: The multi-class CES was compared to three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run-time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, indicating that CES has a unique advantage, since the accuracy of most classification methods suffers when sample size is small. CONCLUSION: The multi-class extension of CES increases the appeal of its application to complex, multi-class datasets in order to identify important biomarkers and features.
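
The abstract names the Tanimoto distance as its stability measure but does not give the exact formulation. The following is a minimal sketch, assuming the common set-based form (one minus the ratio of shared features to all features selected in either run); the function names and the toy gene labels are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """Set-based Tanimoto (Jaccard) distance: 1 - |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty selections count as identical
    return 1.0 - len(a & b) / len(union)


def mean_pairwise_distance(feature_sets):
    """Average Tanimoto distance over all pairs of selected feature sets."""
    pairs = list(combinations(feature_sets, 2))
    if not pairs:
        raise ValueError("need at least two feature sets")
    return sum(tanimoto_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature sets selected in three repeated runs of one algorithm.
runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g1", "g2", "g3"}]
stability = mean_pairwise_distance(runs)  # lower means more stable selection
```

A distance of 0 means two runs selected exactly the same features, and 1 means they share none, so a lower average indicates a more stable selection.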

2.
Environ Mol Mutagen ; 57(2): 114-24, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26683280

ABSTRACT

Identification of mutations induced by xenotoxins is a common task in the field of genetic toxicology. Mutations are often detected by clonally expanding potential mutant cells and genotyping each viable clone by Sanger sequencing. Such a "clone-by-clone" approach requires significant time and effort, and sometimes is even impossible to implement. Alternative techniques for efficient mutation identification would greatly benefit both basic and regulatory genetic toxicology research. Here, we report the development of Mutation Analysis with Random DNA Identifiers (MARDI), a novel high-fidelity Next Generation Sequencing (NGS) approach that circumvents clonal expansion and directly catalogs mutations in pools of mutant cells. MARDI uses oligonucleotides carrying Random DNA Identifiers (RDIs) to tag progenitor DNA molecules before PCR amplification, enabling clustering of descendant DNA molecules and eliminating NGS- and PCR-induced sequencing artifacts. When applied to the Pig-a cDNA analysis of heterogeneous pools of CD48-deficient T cells derived from DMBA-treated rats, MARDI detected nearly all Pig-a mutations that were previously identified by conventional clone-by-clone analysis and discovered many additional ones consistent with DMBA exposure: mostly A to T transversions, with the mutated A located on the non-transcribed DNA strand.
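
The idea of tagging progenitor molecules and clustering their descendants can be illustrated with a small sketch. It assumes reads have already been reduced to (identifier, sequence) pairs of equal length; the function name, the thresholds, and the toy reads are hypothetical and do not reproduce the MARDI pipeline itself.

```python
from collections import Counter, defaultdict


def consensus_by_identifier(reads, min_reads=3, min_agreement=0.6):
    """Group reads by random identifier and call a per-position consensus.

    A change seen in only a minority of reads within one cluster is treated
    as a PCR or sequencing artifact; a change shared by the cluster survives.
    """
    clusters = defaultdict(list)
    for identifier, sequence in reads:
        clusters[identifier].append(sequence)

    consensus = {}
    for identifier, sequences in clusters.items():
        if len(sequences) < min_reads:
            continue  # too few descendants to call a consensus
        if len({len(s) for s in sequences}) != 1:
            continue  # sketch assumes equal-length, pre-aligned reads
        bases = []
        for column in zip(*sequences):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(sequences) >= min_agreement else "N")
        consensus[identifier] = "".join(bases)
    return consensus


# Toy data: identifiers and sequences are made up for illustration.
reads = [
    ("AAGT", "ACGTAC"), ("AAGT", "ACGTAC"), ("AAGT", "ACGAAC"),  # one read has an error
    ("CCTA", "ACTTAC"), ("CCTA", "ACTTAC"), ("CCTA", "ACTTAC"),  # variant in every read
    ("GGGA", "ACGTAC"),                                          # under-sampled cluster
]
result = consensus_by_identifier(reads)
```

In this toy input the isolated error in the first cluster is voted out, while the change shared by every read of the second cluster is retained as a candidate mutation.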


Subjects
9,10-Dimethyl-1,2-benzanthracene/toxicity , DNA Mutational Analysis/methods , Mutation , T-Lymphocytes/drug effects , Animals , Antigens, CD/genetics , Antigens, CD/metabolism , CD48 Antigen , High-Throughput Nucleotide Sequencing/methods , Male , Polymerase Chain Reaction/methods , Rats, Inbred F344
3.
PLoS One ; 10(7): e0133315, 2015.
Article in English | MEDLINE | ID: mdl-26177368

ABSTRACT

Methods were developed to evaluate the stability of rat whole blood expression obtained from RNA sequencing (RNA-seq) and assess changes in whole blood transcriptome profiles in experiments replicated over time. Expression was measured in globin-depleted RNA extracted from the whole blood of Sprague-Dawley rats, given either saline (control) or neurotoxic doses of amphetamine (AMPH). The experiment was repeated four times (paired control and AMPH groups) over a 2-year span. The transcriptome of the control and AMPH-treated groups was evaluated on: 1) transcript levels for ribosomal protein subunits; 2) relative expression of immune-related genes; 3) stability of the control transcriptome over 2 years; and 4) stability of the effects of AMPH on immune-related genes over 2 years. All but one of the 70 genes that encode the 80S ribosome had levels that ranked in the top 5% of all mean expression levels. Deviations in sequencing performance led to significant changes in the ribosomal transcripts. Immune-related genes overall, and genes specific to monocytes, T-cells or B-cells, were well represented and their expression profiles were consistent within treatment groups. There were no differences between the levels of ribosomal transcripts in time-matched control and AMPH groups, but there were significant differences in the expression of immune-related genes between control and AMPH groups. AMPH significantly increased expression of some genes related to monocytes but down-regulated those specific to T-cells. These changes were partially due to changes in the two types of leukocytes present in blood, which indicate an activation of the innate immune system by AMPH. Thus, the stability of the RNA-seq whole blood transcriptome can be verified by assessing ribosomal protein subunits and immune-related gene expression. Such stability enables the pooling of samples from replicate experiments to carry out differential expression analysis with acceptable power.
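
The ribosomal-transcript check described above (whether a gene set ranks in the top 5% of mean expression) can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the expression values are invented, and the abstract does not specify how ties or the cutoff were handled.

```python
def fraction_in_top(mean_expression, gene_set, top_fraction=0.05):
    """Fraction of gene_set whose mean expression ranks in the top slice of all genes."""
    ranked = sorted(mean_expression, key=mean_expression.get, reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    top = set(ranked[:cutoff])
    measured = [gene for gene in gene_set if gene in mean_expression]
    if not measured:
        raise ValueError("none of the genes in gene_set were measured")
    return sum(gene in top for gene in measured) / len(measured)


# Invented values: 98 background genes plus two highly expressed ribosomal genes.
mean_expression = {f"gene{i}": float(i) for i in range(98)}
mean_expression.update({"Rpl3": 5000.0, "Rps6": 4000.0})

ribosomal_fraction = fraction_in_top(mean_expression, {"Rpl3", "Rps6"})
```

Tracking this fraction across sequencing runs gives a simple numeric flag for the kind of run-to-run deviation the abstract reports.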


Subjects
Amphetamine/pharmacology , Blood/metabolism , Gene Expression Profiling , Immunity/genetics , RNA Stability/drug effects , Sequence Analysis, RNA , Transcriptome/genetics , Animals , Blood/drug effects , Gene Expression Regulation/drug effects , Immunity/drug effects , Leukocytes/drug effects , Leukocytes/metabolism , Male , RNA Stability/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Rats, Sprague-Dawley , Ribosome Subunits/genetics , Ribosome Subunits/metabolism , Transcriptome/drug effects
4.
PLoS One ; 10(6): e0125224, 2015.
Article in English | MEDLINE | ID: mdl-26039068

ABSTRACT

The discrete data structure and large sequencing depth of RNA sequencing (RNA-seq) experiments can often generate outlier read counts in one or more RNA samples within a homogeneous group. Thus, how to identify and manage outlier observations in RNA-seq data is an emerging topic of interest. One of the main objectives in these research efforts is to develop statistical methodology that effectively balances the impact of outlier observations and achieves maximal power for statistical testing. To reach that goal, strengthening the accuracy of outlier detection is an important precursor. Current outlier detection algorithms for RNA-seq data are executed within a testing framework and may be sensitive to sparse data and heavy-tailed distributions. Therefore, we propose a univariate algorithm that utilizes a probabilistic approach to measure the deviation between an observation and the distribution generating the remaining data, and we implement it within an iterative leave-one-out design strategy. Analyses of real and simulated RNA-seq data show that the proposed methodology has higher outlier detection rates for both non-normalized and normalized negative binomial distributed data.
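
A leave-one-out scheme of this kind can be sketched in a few lines. The version below is not the published algorithm: it assumes a method-of-moments negative binomial fit to the remaining counts, a Poisson fallback when those counts are not overdispersed, an upper-tail probability as the deviation measure, and a hypothetical threshold `alpha`. It only looks for unusually high counts.

```python
import math


def nb_upper_tail(x, mean, var):
    """P(X >= x) under a negative binomial with the given mean and variance.

    Falls back to a Poisson when var <= mean (assumption of this sketch).
    """
    if mean == 0:
        return 1.0 if x == 0 else 0.0
    if var > mean:
        r = mean * mean / (var - mean)  # size parameter
        p = r / (r + mean)              # success probability

        def log_pmf(k):
            return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
                    + r * math.log(p) + k * math.log1p(-p))
    else:
        def log_pmf(k):
            return -mean + k * math.log(mean) - math.lgamma(k + 1)

    below = sum(math.exp(log_pmf(k)) for k in range(x))
    return max(0.0, 1.0 - below)


def iterative_loo_outliers(counts, alpha=1e-3):
    """Repeat leave-one-out passes, dropping flagged counts, until none are flagged."""
    active = list(range(len(counts)))
    outliers = []
    while len(active) > 2:
        flagged = []
        for i in active:
            rest = [counts[j] for j in active if j != i]
            mean = sum(rest) / len(rest)
            var = sum((c - mean) ** 2 for c in rest) / (len(rest) - 1)
            if nb_upper_tail(counts[i], mean, var) < alpha:
                flagged.append(i)
        if not flagged:
            break
        outliers.extend(flagged)
        active = [i for i in active if i not in flagged]
    return sorted(outliers)


# Invented read counts for one gene across eight replicate samples.
counts = [12, 15, 9, 14, 11, 13, 10, 950]
outlier_indices = iterative_loo_outliers(counts)
```

Because each count is scored against a distribution fitted without it, a single extreme value cannot inflate the variance estimate that is used to judge it.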


Subjects
DNA/genetics , Databases, Nucleic Acid , RNA/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
5.
BMC Bioinformatics ; 15 Suppl 11: S16, 2014.
Article in English | MEDLINE | ID: mdl-25350700

ABSTRACT

BACKGROUND: Chemical cross-linking is used for mapping protein-protein contacts and for structural analysis. One of the difficulties in cross-linking studies is the analysis of mass-spectrometry data and the assignment of the site of cross-link incorporation. The difficulties are due to the higher charges of fragment ions and to the overall low abundance of cross-linked species against the background of linear peptides. Cross-linkers that are non-specific at one end, such as photo-inducible diazirines, may complicate the analysis further. In this report, we design and validate a novel cross-linked peptide mapping algorithm (XLPM) and compare it to StavroX, which is currently one of the best algorithms in this class. RESULTS: We have designed a novel cross-link search algorithm, XLPM, and implemented it both as an online tool and as a downloadable archive of scripts. We designed a filter (the b-y filter) based on the observation that detection of a b-ion implies, with high probability, detection of the complementary y-ion. We validated the b-y filter on the set of linear peptides from the NIST library and demonstrated that it is an effective way to find high-quality mass spectra. Next, we generated cross-linked data from an ssDNA-binding protein, Rim1, with a specific cross-linker, disuccinimidyl suberate, and a semi-specific cross-linker, NHS-Diazirine, followed by analysis of the cross-linked products by nanoLC-LTQ-Orbitrap mass spectrometry. The cross-linked data were searched by XLPM and StavroX, and the performance of the two algorithms was compared. The cross-links were mapped to the X-ray structure of the Rim1 tetramer. Analysis of the mixture of NHS-Diazirine cross-linked ¹⁵N- and ¹⁴N-labeled Rim1 tetramers yielded ¹⁵N-labeled to ¹⁴N-labeled cross-linked peptide pairs, corresponding to C-terminus-to-N-terminus cross-linking, demonstrating interaction between two different Rim1 tetramers. Both XLPM and StavroX were successful in identifying this interaction, with XLPM leading to a better annotation of higher-charged fragments. We also put forward a new method of estimating the specificity and sensitivity of identification of a cross-linked residue in the case of a non-specific cross-linker. CONCLUSIONS: The novel cross-link mapping algorithm, XLPM, considerably improves the speed and accuracy of the analysis compared to other methods. The quality selection filter based on the b-to-y ion ratio proved to be an effective way to select high-quality cross-linked spectra.
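
The reasoning behind a b-y filter can be illustrated for the simple case of a linear, unmodified, singly charged peptide, where the b-ion at position i and the y-ion at position n - i together account for the whole peptide. The sketch below is an assumption-laden illustration, not the XLPM implementation: the function names, the 0.02 m/z tolerance, and the toy peak list are hypothetical, and only standard monoisotopic residue masses are used.

```python
PROTON = 1.007276
WATER = 18.010565
RESIDUE_MASS = {  # monoisotopic residue masses in Da
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_y_ions(peptide):
    """Singly charged b- and y-ion m/z lists (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - j:]) + WATER + PROTON for j in range(1, n)]
    return b, y


def is_observed(mz, peaks, tol=0.02):
    """True if any peak lies within tol of the theoretical m/z."""
    return any(abs(mz - peak) <= tol for peak in peaks)


def b_y_pair_fraction(peptide, peaks, tol=0.02):
    """Among observed b-ions, the fraction whose complementary y-ion is also observed."""
    b, y = b_y_ions(peptide)
    n = len(peptide)
    observed = [i for i in range(1, n) if is_observed(b[i - 1], peaks, tol)]
    if not observed:
        return 0.0
    # b_i pairs with y_(n-i), which sits at index n - i - 1 of the y list.
    paired = sum(is_observed(y[n - i - 1], peaks, tol) for i in observed)
    return paired / len(observed)


# Toy spectrum: b2 with its partner y5, plus b4 without its partner y3.
b_ions, y_ions = b_y_ions("PEPTIDE")
peaks = [b_ions[1], y_ions[4], b_ions[3]]
pair_fraction = b_y_pair_fraction("PEPTIDE", peaks)
```

A spectrum in which most observed b-ions have their complementary y-ion scores close to 1, which is the kind of signal a quality filter could threshold on.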


Subjects
Algorithms , Cross-Linking Reagents , Mass Spectrometry , Protein Interaction Mapping/methods , DNA-Binding Proteins/chemistry , Humans , Peptides/chemistry , Protein Multimerization , Software , Succinimides