Results 1 - 20 of 20
1.
EMBO Rep ; 22(11): e52061, 2021 Nov 04.
Article in English | MEDLINE | ID: mdl-34423893

ABSTRACT

H2A.Z is an H2A-type histone variant essential for many aspects of cell biology, ranging from gene expression to genome stability. From deuterostomes, H2A.Z evolved into two paralogues, H2A.Z.1 and H2A.Z.2, that differ by only three amino acids and are encoded by different genes (H2AFZ and H2AFV, respectively). Despite the importance of this histone variant in development and cellular homeostasis, very little is known about the individual functions of each paralogue in mammals. Here, we have investigated the distinct roles of the two paralogues in cell cycle regulation and unveiled non-redundant functions for H2A.Z.1 and H2A.Z.2 in cell division. Our findings show that H2A.Z.1 regulates the expression of cell cycle genes such as Myc and Ki-67, and that its depletion leads to a G1 arrest and cellular senescence. In contrast, H2A.Z.2, in a transcription-independent manner, is essential for centromere integrity and the regulation of sister chromatid cohesion, thus playing a key role in chromosome segregation.


Subjects
Chromosome Segregation, Histones, Animals, Centromere/metabolism, Genomic Instability, Histones/genetics, Histones/metabolism
2.
Biostatistics ; 21(2): e1-e16, 2020 Apr 01.
Article in English | MEDLINE | ID: mdl-30203001

ABSTRACT

Graphical lasso is one of the most widely used estimators for inferring genetic networks. Despite its widespread use, there are several fields in applied research where the limits of detection of modern measurement technologies make the use of this estimator theoretically unfounded, even when the assumption of a multivariate Gaussian distribution is satisfied. Typical examples are data generated by polymerase chain reactions and flow cytometry. The combination of censoring and high dimensionality makes inference of the underlying genetic networks from these data very challenging. In this article, we propose an $\ell_1$-penalized Gaussian graphical model for censored data and derive two EM-like algorithms for inference. We evaluate the computational efficiency of the proposed algorithms by an extensive simulation study and show that, when censored data are available, our proposal is superior to existing competitors both in terms of network recovery and parameter estimation. We apply the proposed method to gene expression data generated by microfluidic Reverse Transcription quantitative Polymerase Chain Reaction technology in order to make inference on the regulatory mechanisms of blood development. A software implementation of our method is available on GitHub (https://github.com/LuigiAugugliaro/cglasso).


Subjects
Algorithms, Gene Regulatory Networks, Normal Distribution, Computer Simulation, Humans, Reverse Transcriptase Polymerase Chain Reaction
3.
Entropy (Basel) ; 20(2), 2018 Feb 22.
Article in English | MEDLINE | ID: mdl-33265233

ABSTRACT

Regression for count data is widely performed by models such as Poisson, negative binomial (NB) and zero-inflated regression. A challenge often faced by practitioners is the selection of the right model to take into account dispersion, which typically occurs in count datasets. It is highly desirable to have a unified model that can automatically adapt to the underlying dispersion and that can be easily implemented in practice. In this paper, a discrete Weibull regression model is shown to be able to adapt in a simple way to different types of dispersions relative to Poisson regression: overdispersion, underdispersion and covariate-specific dispersion. Maximum likelihood can be used for efficient parameter estimation. The description of the model, parameter inference and model diagnostics is accompanied by simulated and real data analyses.
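
As a rough illustration of the dispersion behaviour described in this abstract, the sketch below evaluates a discrete Weibull probability mass function and its variance-to-mean ratio. It assumes the type I parameterisation, P(X = x) = q^(x^beta) - q^((x+1)^beta) for x = 0, 1, 2, ...; the abstract does not state which parameterisation the paper uses, and the function names and parameter values here are illustrative only.

```python
def dw_pmf(x, q, beta):
    """P(X = x) for a type I discrete Weibull (assumed form), 0 < q < 1, beta > 0."""
    return q ** (x ** beta) - q ** ((x + 1) ** beta)


def dw_dispersion(q, beta, x_max=2000):
    """Variance-to-mean ratio, approximated by truncating the support at x_max."""
    probs = [dw_pmf(x, q, beta) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    second_moment = sum(x * x * p for x, p in enumerate(probs))
    return (second_moment - mean ** 2) / mean


if __name__ == "__main__":
    # A ratio above 1 indicates overdispersion relative to Poisson, below 1 underdispersion.
    for beta in (0.8, 1.0, 3.0):
        print(beta, dw_dispersion(0.7, beta))
```

Under this assumed form, beta = 1 reduces to a geometric distribution, which is overdispersed relative to Poisson, while larger values of beta concentrate the mass and can give a ratio below 1.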

4.
Methods Mol Biol ; 1552: 115-122, 2017.
Article in English | MEDLINE | ID: mdl-28224494

ABSTRACT

Chromatin ImmunoPrecipitation-sequencing (ChIP-seq) experiments have now become routine in biology for the detection of protein binding sites. In this chapter, we show how hidden Markov models can be used for the analysis of data generated by ChIP-seq experiments. We show how a hidden Markov model can naturally account for spatial dependencies in the ChIP-seq data, how it can be used in the presence of data from multiple ChIP-seq experiments under the same biological condition, and how it naturally accounts for the different IP efficiencies of individual ChIP-seq experiments.


Subjects
Chromatin Immunoprecipitation/methods, Markov Chains, Statistical Models, DNA Sequence Analysis, Transcription Factors/metabolism, Humans, Protein Binding
5.
Nat Commun ; 8: 14048, 2017 Jan 16.
Article in English | MEDLINE | ID: mdl-28091603

ABSTRACT

Repo-Man is a protein phosphatase 1 (PP1) targeting subunit that regulates mitotic progression and chromatin remodelling. After mitosis, Repo-Man/PP1 remains associated with chromatin but its function in interphase is not known. Here we show that Repo-Man, via Nup153, is enriched on condensed chromatin at the nuclear periphery and at the edge of the nucleopore basket. Repo-Man/PP1 regulates the formation of heterochromatin, dephosphorylates H3S28 and it is necessary and sufficient for heterochromatin protein 1 binding and H3K27me3 recruitment. Using a novel proteogenomic approach, we show that Repo-Man is enriched at subtelomeric regions together with H2AZ and H3.3 and that depletion of Repo-Man alters the peripheral localization of a subset of these regions and alleviates repression of some polycomb telomeric genes. This study shows a role for a mitotic phosphatase in the regulation of the epigenetic landscape and gene expression in interphase.


Subjects
Carrier Proteins/metabolism, Cell Cycle Proteins/metabolism, Heterochromatin/metabolism, Interphase, Nuclear Proteins/metabolism, Carrier Proteins/genetics, Cell Cycle Proteins/genetics, Cell Line, Chromatin/genetics, Chromatin/metabolism, Chromatin Assembly and Disassembly, Heterochromatin/genetics, Histones/genetics, Histones/metabolism, Humans, Nuclear Proteins/genetics, Phosphorylation
6.
BMC Bioinformatics ; 17(1): 352, 2016 Sep 05.
Article in English | MEDLINE | ID: mdl-27597310

ABSTRACT

BACKGROUND: Network enrichment analysis is a powerful method that integrates gene enrichment analysis with the information on relationships between genes that is provided by gene networks. Existing tests for network enrichment analysis deal only with undirected networks, can be computationally slow and are based on normality assumptions. RESULTS: We propose NEAT, a test for network enrichment analysis. The test is based on the hypergeometric distribution, which naturally arises as the null distribution in this context. NEAT can be applied not only to undirected, but to directed and partially directed networks as well. Our simulations indicate that NEAT is considerably faster than alternative resampling-based methods, and that its capacity to detect enrichments is at least as good as that of alternative tests. We discuss applications of NEAT to network analyses in yeast by testing for enrichment of the Environmental Stress Response target gene set with GO Slim and KEGG functional gene sets, and also by inspecting associations between functional sets themselves. CONCLUSIONS: NEAT is a flexible and efficient test for network enrichment analysis that aims to overcome some limitations of existing resampling-based tests. The method is implemented in the R package neat, which can be freely downloaded from CRAN (https://cran.r-project.org/package=neat).
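
To make the hypergeometric null mentioned in this abstract concrete, the following sketch computes an upper-tail p-value for the number of links observed from a gene set A to a gene set B. The mapping of network quantities onto the hypergeometric parameters (out-degree of A as the number of draws, in-degree of B as the number of successes, total in-degree of the network as the population size) is an assumption made for illustration, not a description of how the neat package is implemented; the function name and argument names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(n_ab, out_a, in_b, in_total):
    """P(N >= n_ab) for N ~ Hypergeometric(population=in_total, successes=in_b, draws=out_a).

    n_ab:     observed number of links from set A to set B (assumed non-negative)
    out_a:    total out-degree of set A (number of draws)
    in_b:     total in-degree of set B (number of "successes" in the population)
    in_total: total in-degree of the whole network (population size)
    """
    largest = min(out_a, in_b)
    denominator = comb(in_total, out_a)
    tail = sum(
        comb(in_b, k) * comb(in_total - in_b, out_a - k)
        for k in range(n_ab, largest + 1)
    )
    return tail / denominator


if __name__ == "__main__":
    # Hypothetical counts: 8 links observed from A to B in a network with 400 arrows.
    print(hypergeom_upper_tail(8, 20, 30, 400))
```

A small p-value would suggest more links between the two sets than expected by chance under this null; the exact arithmetic uses integer binomial coefficients, so no resampling is needed.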


Subjects
Gene Regulatory Networks, Saccharomyces cerevisiae/genetics, Software, Computer Simulation, Gene Expression Profiling/methods, Gene Ontology, Fungal Genes, Physiological Stress/genetics
7.
BMC Bioinformatics ; 17: 254, 2016 Jun 24.
Article in English | MEDLINE | ID: mdl-27342572

ABSTRACT

BACKGROUND: Sparse Gaussian graphical models are popular for inferring biological networks, such as gene regulatory networks. In this paper, we investigate the consistency of these models across different data platforms, such as microarray and next generation sequencing, on the basis of a rich dataset containing samples that are profiled under both techniques as well as a large set of independent samples. RESULTS: Our analysis shows that individual node variances can have a remarkable effect on the connectivity of the resulting network. Their inconsistency across platforms, and the fact that the variability level of a node may not be linked to its regulatory role, mean that failing to scale the data prior to network analysis leads to networks that are not reproducible across different platforms and that may be misleading. Moreover, we show that the reproducibility of networks across different platforms is significantly higher if networks are summarised in terms of enrichment amongst functional groups of interest, such as pathways, rather than at the level of individual edges. CONCLUSIONS: Careful pre-processing of transcriptional data and summaries of networks beyond individual edges can improve the consistency of network inference across platforms. However, caution is needed at this stage in the (over)interpretation of gene regulatory networks inferred from biological data.


Subjects
Gene Regulatory Networks, High-Throughput Nucleotide Sequencing, Oligonucleotide Array Sequence Analysis, RNA Sequence Analysis, Anxiety/genetics, Depression/genetics, Female, Humans, Male, Reproducibility of Results, Twin Studies as Topic
8.
Stat Appl Genet Mol Biol ; 15(3): 193-212, 2016 Jun 01.
Article in English | MEDLINE | ID: mdl-27023322

ABSTRACT

Factorial Gaussian graphical Models (fGGMs) have recently been proposed for inferring dynamic gene regulatory networks from genomic high-throughput data. In the search for true regulatory relationships amongst the vast space of possible networks, these models allow the imposition of certain restrictions on the dynamic nature of these relationships, such as Markov dependencies of low order - some entries of the precision matrix are a priori zeros - or equal dependency strengths across time lags - some entries of the precision matrix are assumed to be equal. The precision matrix is then estimated by $\ell_1$-penalized maximum likelihood, imposing a further constraint on the absolute value of its entries, which results in sparse networks. Selecting the optimal sparsity level is a major challenge for this type of approach. In this paper, we evaluate the performance of a number of model selection criteria for fGGMs by means of two simulated regulatory networks from realistic biological processes. The analysis reveals a good performance of fGGMs in comparison with other methods for inferring dynamic networks and of the KLCV criterion in particular for model selection. Finally, we present an application to high-resolution time-course microarray data from the Neisseria meningitidis bacterium, a causative agent of life-threatening infections such as meningitis. The methodology described in this paper is implemented in the R package sglasso, freely available at CRAN, http://CRAN.R-project.org/package=sglasso.


Subjects
Gene Regulatory Networks, Genetic Models, Algorithms, Computer Simulation, Neisseria/genetics, Normal Distribution, Probability
9.
Biostatistics ; 15(2): 296-310, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24178187

ABSTRACT

Chromatin ImmunoPrecipitation-sequencing (ChIP-seq) experiments have now become routine in biology for the detection of protein-binding sites. In this paper, we present a Markov random field model for the joint analysis of multiple ChIP-seq experiments. The proposed model naturally accounts for spatial dependencies in the data, by assuming first-order Markov dependence and, for the large proportion of zero counts, by using zero-inflated mixture distributions. In contrast to all other available implementations, the model allows for the joint modeling of multiple experiments, by incorporating key aspects of the experimental design. In particular, the model uses the information about replicates and about the different antibodies used in the experiments. An extensive simulation study shows a lower false non-discovery rate for the proposed method, compared with existing methods, at the same false discovery rate. Finally, we present an analysis on real data for the detection of histone modifications of two chromatin modifiers from eight ChIP-seq experiments, including technical replicates with different IP efficiencies.


Subjects
Chromatin Immunoprecipitation/standards, Markov Chains, Statistical Models, DNA Sequence Analysis/standards, Protein Binding, Statistical Distributions
10.
BMC Bioinformatics ; 14: 169, 2013 May 30.
Article in English | MEDLINE | ID: mdl-23721376

ABSTRACT

BACKGROUND: ImmunoPrecipitation (IP) efficiencies may vary largely between different antibodies and between repeated experiments with the same antibody. These differences have a large impact on the quality of ChIP-seq data: a more efficient experiment will necessarily lead to a higher signal-to-background ratio, and therefore to an apparently larger number of enriched regions, compared to a less efficient experiment. In this paper, we show how IP efficiencies can be explicitly accounted for in the joint statistical modelling of ChIP-seq data. RESULTS: We fit a latent mixture model to eight experiments on two proteins, from two laboratories where different antibodies are used for the two proteins. We use the model parameters to estimate the efficiencies of individual experiments, and find that these are clearly different for the different laboratories, and amongst technical replicates from the same lab. When we account for ChIP efficiency, we find more regions bound in the more efficient experiments than in the less efficient ones, at the same false discovery rate. A priori knowledge of the same number of binding sites across experiments can also be included in the model for a more robust detection of differentially bound regions between two different proteins. CONCLUSIONS: We propose a statistical model for the detection of enriched and differentially bound regions from multiple ChIP-seq data sets. The framework that we present accounts explicitly for IP efficiencies in ChIP-seq data, and allows replicates and experiments from different proteins to be modelled jointly, rather than individually, leading to more robust biological conclusions.


Subjects
Chromatin Immunoprecipitation/methods, DNA-Binding Proteins/analysis, Statistical Models, Binding Sites, DNA-Binding Proteins/metabolism, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis, Transcription Factors/analysis, Transcription Factors/metabolism
11.
PLoS Comput Biol ; 7(11): e1002258, 2011 Nov.
Article in English | MEDLINE | ID: mdl-22072955

ABSTRACT

Gene regulatory networks give important insights into the mechanisms underlying physiology and pathophysiology. The derivation of gene regulatory networks from high-throughput expression data via machine learning strategies is problematic as the reliability of these models is often compromised by limited and highly variable samples, heterogeneity in transcript isoforms, noise, and other artifacts. Here, we develop a novel algorithm, dubbed Dandelion, in which we construct and train intraspecies Bayesian networks that are translated and assessed on independent test sets from other species in a reiterative procedure. The interspecies disease networks are subjected to multiple layers of analysis and evaluation, leading to the identification of the most consistent relationships within the network structure. In this study, we demonstrate the performance of our algorithms on datasets from animal models of oculopharyngeal muscular dystrophy (OPMD) and patient materials. We show that the interspecies network of genes coding for the proteasome provides highly accurate predictions of gene expression levels and disease phenotype. Moreover, the cross-species translation increases the stability and robustness of these networks. Unlike existing modeling approaches, our algorithms do not require assumptions on notoriously difficult one-to-one mapping of protein orthologues or alternative transcripts and can deal with missing data. We show that the identified key components of the OPMD disease network can be confirmed in an unseen and independent disease model. This study presents a state-of-the-art strategy for constructing interspecies disease networks that provide crucial information on regulatory relationships among genes, leading to a better understanding of the molecular mechanisms of the disease.


Subjects
Disease/genetics, Gene Regulatory Networks, Algorithms, Animals, Artificial Intelligence, Bayes Theorem, Computational Biology, Genetic Databases, Animal Disease Models, Drosophila/genetics, Gene Expression, Humans, Mice, Genetic Models, Animal Muscular Dystrophy/genetics, Oculopharyngeal Muscular Dystrophy/genetics, Phenotype, Species Specificity, Transcriptome
12.
Environ Health Perspect ; 119(3): 306-11, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21362587

ABSTRACT

BACKGROUND: The feminization of nature by endocrine-disrupting chemicals (EDCs) is a key environmental issue affecting both terrestrial and aquatic wildlife. A crucial and as yet unanswered question is whether EDCs have adverse impacts on the sustainability of wildlife populations. There is widespread concern that intersex fish are reproductively compromised, with potential population-level consequences. However, to date, only in vitro sperm quality data are available in support of this hypothesis. OBJECTIVE: The aim of this study was to examine whether wild endocrine-disrupted fish can compete successfully in a realistic breeding scenario. METHODS: In two competitive breeding experiments using wild roach (Rutilus rutilus), we used DNA microsatellites to assign parentage and thus determine reproductive success of the adults. RESULTS: In both studies, the majority of intersex fish were able to breed, albeit with varying degrees of success. In the first study, where most intersex fish were only mildly feminized, body length was the only factor correlated with reproductive success. In the second study, which included a higher number of more severely intersex fish, reproductive performance was negatively correlated with severity of intersex. The intersex condition reduced reproductive performance by up to 76% for the most feminized individuals in this study, demonstrating a significant adverse effect of intersex on reproductive performance. CONCLUSION: Feminization of male fish is likely to be an important determinant of reproductive performance in rivers where there is a high prevalence of moderately to severely feminized males.


Subjects
Cyprinidae/physiology, Endocrine Disruptors/toxicity, Feminization/epidemiology, Reproduction/drug effects, Chemical Water Pollutants/toxicity, Animals, Competitive Behavior, Male, Population Dynamics, Spermatozoa/drug effects, Testis/drug effects
13.
Stat Appl Genet Mol Biol ; 8: Article 41, 2009.
Article in English | MEDLINE | ID: mdl-19799560

ABSTRACT

In this paper, we explore the use of M-quantile regression and M-quantile coefficients to detect statistical differences between temporal curves that belong to different experimental conditions. In particular, we consider the application to temporal gene expression data. Here, the aim is to detect genes whose temporal expression is significantly different across a number of biological conditions. We present a new method to approach this problem. Firstly, the temporal profiles of the genes are modelled by a parametric M-quantile regression model. This model is particularly appealing for small-sample gene expression data, as it is very robust against outliers and it does not make any assumption on the error distribution. Secondly, we further increase the robustness of the method by summarising the M-quantile regression models for a large range of quantile values into an M-quantile coefficient. Finally, we fit a polynomial M-quantile regression model to the M-quantile coefficients over time and employ a Hotelling $T^2$-test to detect significant differences of the temporal M-quantile coefficient profiles across conditions. Extensive simulations show the increased power and robustness of M-quantile regression methods over standard regression methods and over some of the previously published methods. We conclude by applying the method to detect differentially expressed genes from time-course microarray data on muscular dystrophy.


Subjects
Gene Expression, Regression Analysis, Computer Simulation, Oligonucleotide Array Sequence Analysis
14.
Article in English | MEDLINE | ID: mdl-19644169

ABSTRACT

In this paper, the extended Kalman filter (EKF) algorithm is applied to model the gene regulatory network from gene time series data. The gene regulatory network is considered as a nonlinear dynamic stochastic model that consists of the gene measurement equation and the gene regulation equation. After specifying the model structure, we apply the EKF algorithm to identify both the model parameters and the actual values of the gene expression levels. It is shown that the EKF algorithm is an online estimation algorithm that can identify a large number of parameters (including parameters of nonlinear functions) through an iterative procedure using a small number of observations. Four real-world gene expression data sets are employed to demonstrate the effectiveness of the EKF algorithm, and the obtained models are evaluated from the viewpoint of bioinformatics.


Subjects
Gene Expression, Gene Regulatory Networks, Genetic Models, Nonlinear Dynamics, Algorithms, Animals, Cluster Analysis, Genomics, Normal Distribution, Oligonucleotide Array Sequence Analysis, Viruses/genetics, Yeasts/genetics
15.
J Comput Biol ; 15(3): 305-16, 2008 Apr.
Article in English | MEDLINE | ID: mdl-18333757

ABSTRACT

MicroRNAs (miRNAs) have recently emerged as a new complex layer of gene regulation. MiRNAs act post-transcriptionally, influencing the stability, compartmentalization, and translation of their target mRNAs. Computational efforts to understand post-transcriptional gene regulation by miRNAs have focused on target prediction tools, while quantitative kinetic models of gene regulation by miRNAs have so far largely been overlooked. Here we develop a kinetic model of post-transcriptional gene regulation by miRNAs, focusing on the effect of miRNAs in increasing the degradation rates of their target mRNAs. The model is fitted to a temporal microarray dataset where human mRNAs are measured upon transfection with a specific miRNA (miRNA124a). The proposed model exhibits a good fit with many target mRNA profiles, indicating that this type of model can be used for studying post-transcriptional gene regulation by miRNA. In particular, the proposed kinetic model can be used for quantifying the miRNA-mediated effects on its targets in miRNA mis-expression experiments. The model makes an experimentally verifiable prediction of the miRNA124a decay rate, quantifies the miRNA-mediated effect on target mRNA degradation, and yields a good correspondence between the inferred and experimentally measured decay rates of human target mRNAs.


Subjects
Computational Biology/methods, Gene Expression Regulation, MicroRNAs/genetics, Genetic Models, Genetic Transcription, Down-Regulation/genetics, Half-Life, Kinetics, Likelihood Functions
16.
Invest Ophthalmol Vis Sci ; 47(12): 5356-62, 2006 Dec.
Article in English | MEDLINE | ID: mdl-17122124

ABSTRACT

PURPOSE: To examine the relationship between an anatomic map relating the retinal nerve fiber layer (RNFL) distribution to the optic nerve head and a functional map derived from the interpoint correlation of raw sensitivities in visual field (VF) testing. METHODS: Previously, interpoint correlations were generated for all possible pairs of VF test points in a dataset of 98,821 Humphrey VF test results taken from the Moorfields Eye Hospital archive. The relationship between these correlations and the physical distance between the VF test point pairs was evaluated by Pearson's correlation coefficient and multiple regression analysis. The distance between the pairs of VF test points was calculated in two ways. First, the anatomic map was used to estimate the angular distance at the optic nerve head (ONH), between the RNFL bundles corresponding to the VF test points in each pair (ONHd). Second, the retinal distance between pairs of test points was calculated from the Humphrey VF template (RETd). A best-fit model for predicting functional correlation (FC) from ONHd and RETd was constructed and used to formulate a filter incorporating the anatomic-functional correlation data. RESULTS: All scatterplots showed a negative association between interpoint retinal sensitivity correlation values and distance between points: ONHd ($R^2 = 0.60$) and RETd ($R^2 = 0.33$). The raw sensitivity correlation values could be predicted from a multiple regression model using ONHd, RETd, and a combined interaction of ONHd and RETd ($R^2 = 0.75$, $P < 0.00001$). The construction of a new filter was based on the equation $\mathrm{FC} = 0.9325 - 0.0029\,\mathrm{ONHd} - 0.0077\,\mathrm{RETd} + 0.0001\,\mathrm{ONHd}\cdot\mathrm{RETd}$. CONCLUSIONS: A good level of association was observed between the strength of correlation between points in the VF and the relative location of those test points in the peripheral retina and in corresponding RNFL bundles at the ONH. These results help to validate the relationship between structure and function and may be of use in the further refinement of physiologically derived VF filters to reduce measurement noise.
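
The fitted equation reported in this abstract can be evaluated directly. The short sketch below is only a transcription of the published coefficients as printed (they are rounded, and the abstract does not give the units of ONHd and RETd); the function name is hypothetical.

```python
def functional_correlation(onhd, retd):
    """Predicted interpoint correlation (FC) from the regression equation quoted in the abstract.

    onhd: angular distance at the optic nerve head between the two RNFL bundles
    retd: retinal distance between the two visual field test points
    Units follow the original study and are not stated in the abstract.
    """
    return 0.9325 - 0.0029 * onhd - 0.0077 * retd + 0.0001 * onhd * retd


if __name__ == "__main__":
    # Hypothetical distances, chosen only to show the call.
    print(functional_correlation(10.0, 10.0))
```

The negative main-effect coefficients reflect the reported decline in correlation with distance, while the small positive interaction term partly offsets that decline when both distances are large.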


Subjects
Open-Angle Glaucoma/physiopathology, Optic Disk/pathology, Optic Nerve Diseases/physiopathology, Retinal Ganglion Cells/pathology, Visual Fields/physiology, Humans, Intraocular Pressure, Nerve Fibers/pathology
17.
BMC Bioinformatics ; 7: 183, 2006 Apr 03.
Article in English | MEDLINE | ID: mdl-16584545

ABSTRACT

BACKGROUND: The identification of biologically interesting genes in a temporal expression profiling dataset is challenging and complicated by high levels of experimental noise. Most statistical methods used in the literature do not fully exploit the temporal ordering in the dataset and are not suited to the case where temporal profiles are measured for a number of different biological conditions. We present a statistical test that makes explicit use of the temporal order in the data by fitting polynomial functions to the temporal profile of each gene and for each biological condition. A Hotelling $T^2$-statistic is derived to detect the genes for which the parameters of these polynomials are significantly different from each other. RESULTS: We validate the temporal Hotelling $T^2$-test on muscular gene expression data from four mouse strains which were profiled at different ages: dystrophin-, beta-sarcoglycan- and gamma-sarcoglycan-deficient mice, and wild-type mice. The first three are animal models for different muscular dystrophies. Extensive biological validation shows that the method is capable of finding genes with temporal profiles significantly different across the four strains, as well as identifying potential biomarkers for each form of the disease. The added value of the temporal test compared to an identical test which does not make use of temporal ordering is demonstrated via a simulation study, and through confirmation of the expression profiles from selected genes by quantitative PCR experiments. The proposed method maximises the detection of the biologically interesting genes, whilst minimising false detections. CONCLUSION: The temporal Hotelling $T^2$-test is capable of finding relatively small and robust sets of genes that display different temporal profiles between the conditions of interest. The test is simple, it can be used on gene expression data generated from any experimental design and for any number of conditions, and it allows fast interpretation of the temporal behaviour of genes. The R code is available from V.V. The microarray data have been submitted to GEO under series GSE1574 and GSE3523.


Subjects
Genetic Databases/statistics & numerical data, Gene Expression Profiling/methods, Muscular Dystrophies/genetics, Animals, Computer Simulation/statistics & numerical data, Gene Expression Profiling/statistics & numerical data, Mice, Inbred C57BL Mice, Inbred mdx Mice
18.
Appl Bioinformatics ; 5(1): 1-11, 2006.
Article in English | MEDLINE | ID: mdl-16539532

ABSTRACT

Identifying genes that direct the mechanism of a disease from expression data is extremely useful in understanding how that mechanism works. This in turn may lead to better diagnoses and potentially could lead to a cure for that disease. This task becomes extremely challenging when the data are characterised by only a small number of samples and a high number of dimensions, as is often the case with gene expression data. Motivated by this challenge, we present a general framework that focuses on simplicity and data perturbation. These are the keys for robust identification of the most predictive features in such data. Within this framework, we propose a simple selective naive Bayes classifier discovered using a global search technique, and combine it with data perturbation to increase its robustness for small sample sizes. An extensive validation of the method was carried out using two applied datasets from the field of microarrays and a simulated dataset, all confounded by small sample sizes and high dimensionality. The method has been shown to be capable of selecting genes known to be associated with prostate cancer and viral infections.


Subjects
Tumor Biomarkers/analysis, Computer-Assisted Diagnosis/methods, Gene Expression Profiling/methods, Neoplasm Proteins/analysis, Neoplasms/diagnosis, Oligonucleotide Array Sequence Analysis/methods, Automated Pattern Recognition/methods, Artificial Intelligence, Genetic Markers/genetics, Humans, Neoplasms/genetics, Neoplasms/metabolism, Reproducibility of Results, Sensitivity and Specificity
19.
Artif Intell Med ; 34(2): 163-77, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15894180

ABSTRACT

OBJECTIVE: Progressive loss of the field of vision is characteristic of a number of eye diseases such as glaucoma, which is a leading cause of irreversible blindness in the world. Recently, there has been an explosion in the amount of data being stored on patients who suffer from visual deterioration, including field test data, retinal image data and patient demographic data. However, there has been relatively little work on modelling the spatial and temporal relationships common to such data. In this paper we introduce a novel method for classifying visual field (VF) data that explicitly models these spatial and temporal relationships. METHODOLOGY: We carry out an analysis of our proposed spatio-temporal Bayesian classifier and compare it to a number of classifiers from the machine learning and statistical communities. These are all tested on two datasets of VF and clinical data. We investigate the receiver operating characteristic curves and the resulting network structures, and also make use of existing anatomical knowledge of the eye in order to validate the discovered models. RESULTS: The results are very encouraging, showing that our classifiers are comparable to existing statistical models whilst also facilitating the understanding of underlying spatial and temporal relationships within VF data. The results reveal the potential of using such models for knowledge discovery within ophthalmic databases, such as networks reflecting the 'nasal step', an early indicator of the onset of glaucoma. CONCLUSION: The results outlined in this paper pave the way for a substantial program of study involving many other spatial and temporal datasets, including retinal image and clinical data.


Subjects
Retina/pathology, Vision Disorders/diagnosis, Vision Disorders/physiopathology, Algorithms, Artificial Intelligence, Bayes Theorem, Humans
20.
Genome Biol ; 5(11): R94, 2004.
Article in English | MEDLINE | ID: mdl-15535870

ABSTRACT

Microarray analysis using clustering algorithms can suffer from lack of inter-method consistency in assigning related gene-expression profiles to clusters. Obtaining a consensus set of clusters from a number of clustering methods should improve confidence in gene-expression analysis. Here we introduce consensus clustering, which provides such an advantage. When coupled with a statistically based gene functional analysis, our method allowed the identification of novel genes regulated by NFkappaB and the unfolded protein response in certain B-cell lymphomas.


Subjects
Consensus Sequence/physiology, Gene Expression Profiling/statistics & numerical data, Gene Expression Regulation/physiology, Microarray Analysis/statistics & numerical data, Genetic Models, Cluster Analysis, Computer Simulation, Gene Expression Profiling/methods, Microarray Analysis/methods