Search | BVS Regional Portal (test)

New Distance-Based approach for Genome-Wide Association Studies.

Irigoien, Itziar; Cormand, Bru; Soler-Artigas, Maria; Sanchez-Mora, Cristina; Ramos-Quiroga, Josep-Antoni; Arenas, Concepcion.

IEEE/ACM Trans Comput Biol Bioinform ; 19(5): 2938-2949, 2022.

Article in English | MEDLINE | ID: mdl-34181548

ABSTRACT

With the rise of genome-wide association studies (GWAS), the analysis of typical GWAS data sets with thousands of single-nucleotide polymorphisms (SNPs) has become crucial in biomedical research. Here, we propose a new method to identify SNPs related to disease in case-control studies. The method, based on genetic distances between individuals, takes into account possible population substructure and avoids the issues of multiple testing. The method provides two ordered lists of SNPs: one with SNPs whose minor alleles can be considered risk alleles for the disease, and another with SNPs whose minor alleles can be considered protective. These two lists provide a useful tool to help the researcher decide where to focus attention in a first stage.
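The abstract does not say which genetic distance the method uses. As a minimal sketch of what a distance between individuals can look like, the following computes the allele-sharing distance, a standard choice swapped in here purely for illustration. The 0/1/2 minor-allele coding and the function names `allele_sharing_distance` and `distance_matrix` are assumptions, not taken from the paper.

```python
def allele_sharing_distance(g1, g2):
    """Allele-sharing distance between two individuals.

    g1, g2: equal-length sequences of minor-allele counts (0, 1 or 2),
    one entry per SNP. This coding is an assumption for illustration.
    Returns a value in [0, 1]: 0 means identical genotypes at every SNP,
    1 means opposite homozygotes at every SNP.
    """
    if len(g1) != len(g2) or not g1:
        raise ValueError("genotype vectors must be non-empty and equal in length")
    # Each SNP contributes |difference in minor-allele count| / 2.
    return sum(abs(a - b) for a, b in zip(g1, g2)) / (2 * len(g1))


def distance_matrix(genotypes):
    """Symmetric matrix of pairwise distances for a list of individuals."""
    n = len(genotypes)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = allele_sharing_distance(genotypes[i], genotypes[j])
    return d
```

A matrix like this could then feed any distance-based analysis of cases and controls; how the paper itself ranks SNPs from such distances is not described in the abstract and is not reproduced here.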

Subjects

Genome-Wide Association Study; Polymorphism, Single Nucleotide; Humans; Alleles; Case-Control Studies; Genetic Predisposition to Disease/genetics; Genome-Wide Association Study/methods; Genotype; Polymorphism, Single Nucleotide/genetics

ORdensity: user-friendly R package to identify differentially expressed genes.

Martínez-Otzeta, José María; Irigoien, Itziar; Sierra, Basilio; Arenas, Concepción.

BMC Bioinformatics ; 21(1): 135, 2020 Apr 07.

Article in English | MEDLINE | ID: mdl-32264950

ABSTRACT

BACKGROUND: Microarray technology provides the expression level of many genes. Nowadays, an important issue is to select a small number of informative differentially expressed genes that provide biological knowledge and may be key elements for a disease. With the increasing volume of data generated by modern biomedical studies, software is required for effective identification of differentially expressed genes. Here, we describe an R package, called ORdensity, that implements a recent methodology (Irigoien and Arenas, 2018) developed to identify differentially expressed genes. The benefits of parallel implementation are discussed. RESULTS: ORdensity gives the user the list of genes identified as differentially expressed in an easy and comprehensible way. Experiments carried out on an off-the-shelf computer with parallel execution enabled show an improvement in run time. This implementation may also lead to substantial memory use. Results previously obtained with simulated and real data indicated that the procedure implemented in the package is robust and suitable for identifying differentially expressed genes. CONCLUSIONS: The new package, ORdensity, offers a friendly and easy way to identify differentially expressed genes, which is very useful for users not familiar with programming. AVAILABILITY: https://github.com/rsait/ORdensity.

Subjects

User-Computer Interface; Disease/genetics; Gene Expression Regulation; Humans; Oligonucleotide Array Sequence Analysis/methods; RNA-Seq/methods

Identification of differentially expressed genes by means of outlier detection.

Irigoien, Itziar; Arenas, Concepción.

BMC Bioinformatics ; 19(1): 317, 2018 Sep 10.

Article in English | MEDLINE | ID: mdl-30200879

ABSTRACT

BACKGROUND: An important issue in microarray data is to select, from thousands of genes, a small number of informative differentially expressed (DE) genes which may be key elements for a disease. If each gene is analyzed individually, there is a large number of hypotheses to test and a multiple comparison correction method must be used. Consequently, the resulting cut-off value may be too small. Moreover, an important issue is the replicability of the DE gene selection. We present a new method, called ORdensity, to obtain a reproducible selection of DE genes. It takes into account the relation between all genes and is not a gene-by-gene approach, unlike the techniques usually applied to DE gene selection. RESULTS: The proposed method returns three measures, related to the concepts of outlier and density of false positives in a neighbourhood, which allow us to identify the DE genes with high classification accuracy. To assess the performance of ORdensity, we used simulated microarray data and four real microarray cancer data sets. The results indicated that the method correctly detects the DE genes; it is competitive with other well-accepted methods; the list of DE genes that it obtains is useful for the correct classification or diagnosis of new future samples; and, in general, it is more stable than other procedures. CONCLUSIONS: ORdensity is a new method for identifying DE genes that avoids some of the shortcomings of individual gene identification, and it is stable when the original sample is replaced by subsamples.

Subjects

Biomarkers/metabolism; Gene Expression Profiling/methods; Neoplasms/genetics; Humans; Oligonucleotide Array Sequence Analysis/methods

Diagnosis using clinical/pathological and molecular information.

Irigoien, Itziar; Arenas, Concepción.

Stat Methods Med Res ; 25(6): 2878-2894, 2016 Dec.

Article in English | MEDLINE | ID: mdl-24821003

ABSTRACT

In the diagnosis and classification of diseases, multiple outcomes, both molecular and clinical/pathological, are routinely gathered on patients. In recent years, many approaches have been suggested for integrating gene expression (continuous data) with clinical/pathological data (usually categorical and ordinal data). This new area of research integrates both clinical and genomic data in order to improve our knowledge about diseases, and to capture the information which is lost in independent clinical or genomic studies. The related metric scaling distance is not well known, but it is a very valuable distance for integrating clinical/pathological and molecular information. In this article, we present the use of the related metric scaling distance in biomedical research. We describe how this distance works, and we also explain why it may sometimes be preferred. We discuss the choice of the related metric scaling distance and compare it with other proximity measures that include both clinical and genetic information. Furthermore, we comment on the choice of the related metric scaling distance when classical clustering or discriminant analysis based on distances is performed, and compare the results with more complex cluster or discriminant procedures specially constructed for integrating clinical and molecular information. The use of the related metric scaling distance is illustrated on simulated data and four real data sets: one heart disease study and three cancer studies. The results show the flexibility and availability of this distance, which gives competitive results.

Subjects

Heart Diseases/diagnosis; Heart Diseases/genetics; Neoplasms/diagnosis; Neoplasms/genetics; Biomedical Research; Cluster Analysis; Discriminant Analysis; Gene Expression Profiling; Heart Diseases/pathology; Humans; Neoplasms/pathology

Development of a prediction model for fatal and non-fatal coronary heart disease and cardiovascular disease in patients with newly diagnosed type 2 diabetes mellitus: the Basque Country Prospective Complications and Mortality Study risk engine (BASCORE).

Piniés, José A; González-Carril, Fernando; Arteagoitia, José M; Irigoien, Itziar; Altzibar, Jone M; Rodriguez-Murua, José L; Echevarriarteun, Larraitz.

Diabetologia ; 57(11): 2324-33, 2014 Nov.

Article in English | MEDLINE | ID: mdl-25212259

ABSTRACT

AIMS/HYPOTHESIS: The aim of this study was to construct a model for predicting CHD and cardiovascular disease (CVD) risk in patients with newly diagnosed type 2 diabetes in a southern European region. External validation of two other cardiovascular risk models and internal validation of our model were assessed. METHODS: We studied 65,651 people attending a primary care setting in the Basque Country Health Service. A 10-year prospective population-based cohort study was performed with 777 patients newly diagnosed with type 2 diabetes older than 24 years in a Sentinel Practice Network. Cardiovascular risk factors, CVD events and mortality were registered. Coefficients for the significant predictors of CHD and CVD were estimated using Cox models. We assessed the discrimination and calibration of the UK Prospective Diabetes Study risk engine (UKPDS-RE), the Framingham Risk Score-Regicor Study (FRS-RS) and the cardiovascular risk model we developed. RESULTS: The incidence rate per 1,000 patients/year was calculated for microvascular and cardiovascular complications, and death. Age, the ratio of non-HDL- to HDL-cholesterol, HbA1c, systolic blood pressure and smoking were significant predictors of cardiovascular events. A risk model was developed using these predictors. The UKPDS-RE and FRS-RS showed inadequate discrimination (Uno's C statistics 0.62 and 0.58, respectively) and calibration (24% overestimation and 51% underestimation, respectively) for predicting CHD risk. The internal discrimination and calibration of the developed model were acceptable for predicting fatal/non-fatal 2- and 5-, but not 10-year CHD and CVD risk. CONCLUSIONS/INTERPRETATION: This study is the first southern European validated population-derived model for predicting 5-year fatal/non-fatal CHD and CVD risk in patients with newly diagnosed type 2 diabetes.

Subjects

Cardiovascular Diseases/etiology; Cardiovascular Diseases/mortality; Coronary Disease/etiology; Coronary Disease/mortality; Diabetes Mellitus, Type 2/complications; Diabetes Mellitus, Type 2/mortality; Models, Theoretical; Aged; Cohort Studies; Female; Humans; Male; Middle Aged; Prospective Studies

Towards application of one-class classification methods to medical data.

Irigoien, Itziar; Sierra, Basilio; Arenas, Concepción.

ScientificWorldJournal ; 2014: 730712, 2014.

Article in English | MEDLINE | ID: mdl-24778600

ABSTRACT

In the problem of one-class classification (OCC), one of the classes, the target class, has to be distinguished from all other possible objects, considered as nontargets. This situation arises in many biomedical problems, for example, in diagnosis, image-based tumor recognition or analysis of electrocardiogram data. In this paper, an approach to OCC based on a typicality test is experimentally compared with reference state-of-the-art OCC techniques (Gaussian, mixture of Gaussians, naive Parzen, Parzen, and support vector data description) using biomedical data sets. We evaluate the ability of the procedures using twelve experimental data sets with not necessarily continuous data. As there are few benchmark data sets for one-class classification, all data sets considered in the evaluation have multiple classes. Each class in turn is considered as the target class and the units in the other classes are considered as new units to be classified. The results of the comparison show the good performance of the typicality approach, which is available for high-dimensional data; it is worth mentioning that it can be used for any kind of data (continuous, discrete, or nominal), whereas the application of state-of-the-art approaches is not straightforward when nominal variables are present.
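To make the one-class setting concrete, here is a minimal sketch of a distance-based accept/reject rule that works on nominal data. It is not the typicality test evaluated in the paper, whose details the abstract does not give. The Hamming distance, the quantile threshold, and the names `hamming`, `fit_one_class` and `is_target` are all assumptions chosen for illustration.

```python
from statistics import mean


def hamming(a, b):
    """Fraction of attributes on which two units differ (suits nominal data)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def fit_one_class(target, dist, quantile=0.95):
    """Learn an acceptance threshold from the target class only.

    For each target unit, compute its mean distance to the other target
    units (leave-one-out), then take the requested quantile of those scores.
    """
    scores = sorted(
        mean(dist(u, v) for j, v in enumerate(target) if j != i)
        for i, u in enumerate(target)
    )
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


def is_target(unit, target, threshold, dist):
    """Accept a new unit if it is no farther from the target class than the threshold."""
    return mean(dist(unit, v) for v in target) <= threshold


# Hypothetical target class with three nominal attributes per unit.
target = [("a", "x", "1"), ("a", "x", "2"), ("a", "y", "1"), ("a", "x", "1")]
threshold = fit_one_class(target, hamming)
```

Under the evaluation protocol described in the abstract, each class of a multi-class data set would in turn play the role of `target`, and units from the remaining classes would be passed to `is_target` as new units.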

Subjects

Algorithms; Biomedical Research/statistics & numerical data; Data Interpretation, Statistical; Models, Biological; Humans; Reproducibility of Results

The depth problem: identifying the most representative units in a data group.

Irigoien, Itziar; Mestres, Francesc; Arenas, Concepción.

IEEE/ACM Trans Comput Biol Bioinform ; 10(1): 161-72, 2013.

Article in English | MEDLINE | ID: mdl-23702552

ABSTRACT

This paper presents a solution to the problem of how to identify the units in groups or clusters that have the greatest degree of centrality and best characterize each group. This problem frequently arises in the classification of data such as types of tumor, gene expression profiles or general biomedical data. It is particularly important in the common context that many units do not properly belong to any cluster. Furthermore, in gene expression data classification, good identification of the most central units in a cluster enables recognition of the most important samples in a particular pathological process. We propose a new depth function that allows us to identify central units. As our approach is based on a measure of distance or dissimilarity between any pair of units, it can be applied to any kind of multivariate data (continuous, binary or multiattribute data). Therefore, it is very valuable in many biomedical applications, which usually involve noncontinuous data, such as clinical, pathological, or biological data sources. We validate the approach using artificial examples and apply it to empirical data. The results show the good performance of our statistical approach.
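The abstract does not define the proposed depth function. As a rough illustration of the general idea of ranking units by centrality using only pairwise distances, the sketch below orders units by their mean distance to all other units, so the first unit is a medoid-like representative. This is a simple stand-in, not the authors' depth function, and the name `centrality_ranking` is hypothetical.

```python
def centrality_ranking(units, dist):
    """Return unit indices ordered from most central to least central.

    Centrality here is the mean distance to all other units: the smaller
    the mean, the more central the unit. Any distance or dissimilarity
    function can be supplied, so the data need not be continuous.
    """
    n = len(units)
    if n < 2:
        raise ValueError("need at least two units")
    scores = []
    for i, u in enumerate(units):
        total = sum(dist(u, v) for j, v in enumerate(units) if j != i)
        scores.append((total / (n - 1), i))
    # Sort by mean distance; ties are broken by original index.
    return [i for _, i in sorted(scores)]


# Hypothetical one-dimensional example with one clearly atypical unit.
values = [0.0, 1.0, 2.0, 3.0, 10.0]
ranking = centrality_ranking(values, lambda a, b: abs(a - b))
```

In this toy example the unit with value 2.0 comes first and the unit with value 10.0 comes last, which matches the intuition that atypical units should receive the lowest centrality.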

Subjects

Algorithms; Cluster Analysis; Computational Biology/methods; Computer Simulation; Databases, Factual; Gene Expression Profiling; Humans; Models, Biological; Neoplasms/genetics; Neoplasms/metabolism; Reproducibility of Results

ICGE: an R package for detecting relevant clusters and atypical units in gene expression.

Irigoien, Itziar; Sierra, Basilio; Arenas, Concepcion.

BMC Bioinformatics ; 13: 30, 2012 Feb 13.

Article in English | MEDLINE | ID: mdl-22330431

ABSTRACT

BACKGROUND: Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach with which to analyze genome expression data. They attempt to partition the genes into groups exhibiting similar patterns of variation in expression level. An important problem associated with gene classification is to discern whether the clustering process can find a relevant partition, as well as to identify new gene classes. There are two key aspects to classification: the estimation of the number of clusters, and the decision as to whether a new unit (gene, tumor sample, etc.) belongs to one of these previously identified clusters or to a new group. RESULTS: ICGE is a user-friendly R package which provides many functions related to this problem: identify the number of clusters using mixed variables, usually found by applied biomedical researchers; detect whether the data have a cluster structure; identify whether a new unit belongs to one of the pre-identified clusters or to a novel group; and classify new units into the corresponding cluster. The functions in the ICGE package are accompanied by help files and easy examples to facilitate its use. CONCLUSIONS: We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.

Subjects

Cluster Analysis; Gene Expression Profiling/methods; Neoplasms/genetics; Software; Algorithms; Computer Simulation; Humans; Lymphoproliferative Disorders/genetics

Microarray time course experiments: finding profiles.

Irigoien, Itziar; Vives, Sergi; Arenas, Concepción.

IEEE/ACM Trans Comput Biol Bioinform ; 8(2): 464-75, 2011.

Article in English | MEDLINE | ID: mdl-21233526

ABSTRACT

Time course studies with microarray techniques and experimental replicates are very useful in biomedical research. We present, in replicate experiments, an alternative approach to select and cluster genes according to a new measure for association between genes. First, the procedure normalizes and standardizes the expression profile of each gene, and then, identifies scaling parameters that will further minimize the distance between replicates of the same gene. Then, the procedure filters out genes with a flat profile, detects differences between replicates, and separates genes without significant differences from the rest. For this last group of genes, we define a mean profile for each gene and use it to compute the distance between two genes. Next, a hierarchical clustering procedure is proposed, a statistic is computed for each cluster to determine its compactness, and the total number of classes is determined. For the rest of the genes, those with significant differences between replicates, the procedure detects where the differences between replicates lie, and assigns each gene to the best fitting previously identified profile or defines a new profile. We illustrate this new procedure using simulated data and a representative data set arising from a microarray experiment with replication, and report interesting results.
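The first steps of the pipeline described above (standardizing profiles, filtering flat ones, averaging replicates, and measuring the distance between genes) can be sketched as follows. This is only an outline under simplifying assumptions: the flatness test is a plain range threshold applied to raw values, the distance is Euclidean, and no scaling parameters or replicate tests are included. None of these choices, nor the function names, come from the paper.

```python
from statistics import mean, pstdev


def is_flat(profile, min_range=1.0):
    """Flag a raw profile whose expression barely changes over time.

    The range threshold is a hypothetical stand-in for the paper's filter.
    """
    return max(profile) - min(profile) < min_range


def standardize(profile):
    """Center a profile to mean 0 and scale it to unit standard deviation."""
    m, s = mean(profile), pstdev(profile)
    if s == 0:
        return [0.0 for _ in profile]
    return [(x - m) / s for x in profile]


def mean_profile(replicates):
    """Average replicate profiles of one gene, time point by time point."""
    return [mean(column) for column in zip(*replicates)]


def profile_distance(p, q):
    """Euclidean distance between two profiles of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


# Two hypothetical genes, each with two replicates over three time points.
gene_a = standardize(mean_profile([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]))
gene_b = standardize(mean_profile([[10.0, 20.0, 30.0], [10.0, 20.0, 30.0]]))
```

Because both genes rise linearly over time, their standardized mean profiles coincide and `profile_distance(gene_a, gene_b)` is zero up to rounding. A distance matrix built this way could then be passed to a hierarchical clustering routine.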

Subjects

Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Cluster Analysis; Kinetics
