Búsqueda | Portal Regional de la BVS

1.

Zero-covid advocacy during the COVID-19 pandemic: a case study of views on Twitter/X.

Kepp, Kasper P; Bardosh, Kevin; De Bie, Tijl; Emilsson, Louise; Greaves, Justin; Lallukka, Tea; Muka, Taulant; Rangel, J Christian; Sandström, Niclas; Schippers, Michaéla C; Schmidt-Chanasit, Jonas; Vaillancourt, Tracy.

Monash Bioeth Rev ; 2024 Sep 03.

Artículo en Inglés | MEDLINE | ID: mdl-39225854

RESUMEN

During the COVID-19 pandemic, many advocacy groups and individuals criticized governments on social media for doing either too much or too little to mitigate the pandemic. In this article, we review advocacy for COVID-19 elimination or "zero-covid" on the social media platform X (Twitter). We present a thematic analysis of tweets by 20 influential co-signatories of the World Health Network letter on ten themes, covering six topics of science and mitigation (zero-covid, epidemiological data on variants, long-term post-acute sequelae (Long COVID), vaccines, schools and children, views on monkeypox/Mpox) and four advocacy methods (personal advice and promoting remedies, use of anecdotes, criticism of other scientists, and of authorities). The advocacy, although timely and informative, often appealed to emotions and values using anecdotes and strong criticism of authorities and other scientists. Many tweets received hundreds or thousands of likes. Risks were emphasized about children's vulnerability, Long COVID, variant severity, and Mpox, and via comparisons with human immunodeficiency viruses (HIV). Far-reaching policies and promotion of remedies were advocated without systematic evidence review, or sometimes, core field expertise. We identified potential conflicts of interest connected to private companies. Our study documents a need for public health debates to be less polarizing and judgmental, and more factual. In order to protect public trust in science during a crisis, we suggest the development of mechanisms to ensure ethical guidelines for engagement in "science-based" advocacy, and consideration of cost-benefit analysis of recommendations for public health decision-making.

2.

Evaluating Representation Learning and Graph Layout Methods for Visualization.

Heiter, Edith; Kang, Bo; De Bie, Tijl; Lijffijt, Jefrey; Potel, Mike.

IEEE Comput Graph Appl ; 42(3): 19-28, 2022.

Artículo en Inglés | MEDLINE | ID: mdl-35671278

RESUMEN

Graphs and other structured data have come to the forefront in machine learning over the past few years due to the efficacy of novel representation learning methods boosting the prediction performance in various tasks. Representation learning methods embed the nodes in a low-dimensional real-valued space, enabling the application of traditional machine learning methods on graphs. These representations have been widely premised to be also suited for graph visualization. However, no benchmarks or encompassing studies on this topic exist. We present an empirical study comparing several state-of-the-art representation learning methods with two recent graph layout algorithms, using readability and distance-based measures as well as the link prediction performance. Generally, no method consistently outperformed the others across quality measures. The graph layout methods provided qualitatively superior layouts when compared to representation learning methods. Embedding graphs in a higher dimensional space and applying t-distributed stochastic neighbor embedding for visualization improved the preservation of local neighborhoods, albeit at substantially higher computational cost.

Asunto(s)

Algoritmos , Aprendizaje Automático , Benchmarking , Investigación Empírica , Proyectos de Investigación

3.

An Empirical Evaluation of Network Representation Learning Methods.

Mara, Alexandru Cristian; Lijffijt, Jefrey; De Bie, Tijl.

Big Data ; 2022 Mar 10.

Artículo en Inglés | MEDLINE | ID: mdl-35271383

RESUMEN

Network representation learning methods map network nodes to vectors in an embedding space that can preserve specific properties and enable traditional downstream prediction tasks. The quality of the representations learned is then generally showcased through results on these downstream tasks. Commonly used benchmark tasks such as link prediction or network reconstruction, however, present complex evaluation pipelines and an abundance of design choices. This, together with a lack of standardized evaluation setups, can obscure the real progress in the field. In this article, we aim at investigating the impact on the performance of a variety of such design choices and perform an extensive and consistent evaluation that can shed light on the state-of-the-art on network representation learning. Our evaluation reveals that only limited progress has been made in recent years, with embedding-based approaches struggling to outperform basic heuristics in many scenarios.

4.

Conditional t-SNE: more informative t-SNE embeddings.

Kang, Bo; García García, Darío; Lijffijt, Jefrey; Santos-Rodríguez, Raúl; De Bie, Tijl.

Mach Learn ; 110(10): 2905-2940, 2021.

Artículo en Inglés | MEDLINE | ID: mdl-34840420

RESUMEN

Dimensionality reduction and manifold learning methods such as t-distributed stochastic neighbor embedding (t-SNE) are frequently used to map high-dimensional data into a two-dimensional space to visualize and explore that data. Going beyond the specifics of t-SNE, there are two substantial limitations of any such approach: (1) not all information can be captured in a single two-dimensional embedding, and (2) to well-informed users, the salient structure of such an embedding is often already known, preventing that any real new insights can be obtained. Currently, it is not known how to extract the remaining information in a similarly effective manner. We introduce conditional t-SNE (ct-SNE), a generalization of t-SNE that discounts prior information in the form of labels. This enables obtaining more informative and more relevant embeddings. To achieve this, we propose a conditioned version of the t-SNE objective, obtaining an elegant method with a single integrated objective. We show how to efficiently optimize the objective and study the effects of the extra parameter that ct-SNE has over t-SNE. Qualitative and quantitative empirical results on synthetic and real data show ct-SNE is scalable, effective, and achieves its goal: it allows complementary structure to be captured in the embedding and provided new insights into real data.

5.

Opinion dynamics with backfire effect and biased assimilation.

Chen, Xi; Tsaparas, Panayiotis; Lijffijt, Jefrey; De Bie, Tijl.

PLoS One ; 16(9): e0256922, 2021.

Artículo en Inglés | MEDLINE | ID: mdl-34469486

RESUMEN

The democratization of AI tools for content generation, combined with unrestricted access to mass media for all (e.g. through microblogging and social media), makes it increasingly hard for people to distinguish fact from fiction. This raises the question of how individual opinions evolve in such a networked environment without grounding in a known reality. The dominant approach to studying this problem uses simple models from the social sciences on how individuals change their opinions when exposed to their social neighborhood, and applies them on large social networks. We propose a novel model that incorporates two known social phenomena: (i) Biased Assimilation: the tendency of individuals to adopt other opinions if they are similar to their own; (ii) Backfire Effect: the fact that an opposite opinion may further entrench people in their stances, making their opinions more extreme instead of moderating them. To the best of our knowledge, this is the first DeGroot-type opinion formation model that captures the Backfire Effect. A thorough theoretical and empirical analysis of the proposed model reveals intuitive conditions for polarization and consensus to exist, as well as the properties of the resulting opinions.

Asunto(s)

Actitud , Modelos Psicológicos , Redes Sociales en Línea , Prejuicio/psicología , Humanos , Medios de Comunicación Sociales

6.

SIMIT: Subjectively Interesting Motifs in Time Series.

Deng, Junning; Lijffijt, Jefrey; Kang, Bo; De Bie, Tijl.

Entropy (Basel) ; 21(6)2019 Jun 05.

Artículo en Inglés | MEDLINE | ID: mdl-33267280

RESUMEN

Numerical time series data are pervasive, originating from sources as diverse as wearable devices, medical equipment, to sensors in industrial plants. In many cases, time series contain interesting information in terms of subsequences that recur in approximate form, so-called motifs. Major open challenges in this area include how one can formalize the interestingness of such motifs and how the most interesting ones can be found. We introduce a novel approach that tackles these issues. We formalize the notion of such subsequence patterns in an intuitive manner and present an information-theoretic approach for quantifying their interestingness with respect to any prior expectation a user may have about the time series. The resulting interestingness measure is thus a subjective measure, enabling a user to find motifs that are truly interesting to them. Although finding the best motif appears computationally intractable, we develop relaxations and a branch-and-bound approach implemented in a constraint programming solver. As shown in experiments on synthetic data and two real-world datasets, this enables us to mine interesting patterns in small or mid-sized time series.

7.

The structure of the EU mediasphere.

Flaounas, Ilias; Turchi, Marco; Ali, Omar; Fyson, Nick; De Bie, Tijl; Mosdell, Nick; Lewis, Justin; Cristianini, Nello.

PLoS One ; 5(12): e14243, 2010 Dec 08.

Artículo en Inglés | MEDLINE | ID: mdl-21170383

RESUMEN

BACKGROUND: A trend towards automation of scientific research has recently resulted in what has been termed "data-driven inquiry" in various disciplines, including physics and biology. The automation of many tasks has been identified as a possible future also for the humanities and the social sciences, particularly in those disciplines concerned with the analysis of text, due to the recent availability of millions of books and news articles in digital format. In the social sciences, the analysis of news media is done largely by hand and in a hypothesis-driven fashion: the scholar needs to formulate a very specific assumption about the patterns that might be in the data, and then set out to verify if they are present or not. METHODOLOGY/PRINCIPAL FINDINGS: In this study, we report what we think is the first large scale content-analysis of cross-linguistic text in the social sciences, by using various artificial intelligence techniques. We analyse 1.3 M news articles in 22 languages detecting a clear structure in the choice of stories covered by the various outlets. This is significantly affected by objective national, geographic, economic and cultural relations among outlets and countries, e.g., outlets from countries sharing strong economic ties are more likely to cover the same stories. We also show that the deviation from average content is significantly correlated with membership to the eurozone, as well as with the year of accession to the EU. CONCLUSIONS/SIGNIFICANCE: While independently making a multitude of small editorial decisions, the leading media of the 27 EU countries, over a period of six months, shaped the contents of the EU mediasphere in a way that reflects its deep geographic, economic and cultural relations. Detecting these subtle signals in a statistically rigorous way would be out of the reach of traditional methods. This analysis demonstrates the power of the available methods for significant automation of media content analysis.

Asunto(s)

Cultura , Recolección de Datos , Medios de Comunicación de Masas , Automatización , Libros , Unión Europea , Humanos , Investigación , Ciencias Sociales

8.

The condition-dependent transcriptional network in Escherichia coli.

Lemmens, Karen; De Bie, Tijl; Dhollander, Thomas; Monsieurs, Pieter; De Moor, Bart; Collado-Vides, Julio; Engelen, Kristof; Marchal, Kathleen.

Ann N Y Acad Sci ; 1158: 29-35, 2009 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-19348629

RESUMEN

Thanks to the availability of high-throughput omics data, bioinformatics approaches are able to hypothesize thus-far undocumented genetic interactions. However, due to the amount of noise in these data, inferences based on a single data source are often unreliable. A popular approach to overcome this problem is to integrate different data sources. In this study, we describe DISTILLER, a novel framework for data integration that simultaneously analyzes microarray and motif information to find modules that consist of genes that are co-expressed in a subset of conditions, and their corresponding regulators. By applying our method on publicly available data, we evaluated the condition-specific transcriptional network of Escherichia coli. DISTILLER confirmed 62% of 736 interactions described in RegulonDB, and 278 novel interactions were predicted.

Asunto(s)

Biología Computacional/métodos , Escherichia coli/genética , Regulación Bacteriana de la Expresión Génica , Redes Reguladoras de Genes , Algoritmos , Bases de Datos Genéticas , Perfilación de la Expresión Génica , Modelos Genéticos , Análisis de Secuencia por Matrices de Oligonucleótidos

9.

DISTILLER: a data integration framework to reveal condition dependency of complex regulons in Escherichia coli.

Lemmens, Karen; De Bie, Tijl; Dhollander, Thomas; De Keersmaecker, Sigrid C; Thijs, Inge M; Schoofs, Geert; De Weerdt, Ami; De Moor, Bart; Vanderleyden, Jos; Collado-Vides, Julio; Engelen, Kristof; Marchal, Kathleen.

Genome Biol ; 10(3): R27, 2009.

Artículo en Inglés | MEDLINE | ID: mdl-19265557

RESUMEN

We present DISTILLER, a data integration framework for the inference of transcriptional module networks. Experimental validation of predicted targets for the well-studied fumarate nitrate reductase regulator showed the effectiveness of our approach in Escherichia coli. In addition, the condition dependency and modularity of the inferred transcriptional network was studied. Surprisingly, the level of regulatory complexity seemed lower than that which would be expected from RegulonDB, indicating that complex regulatory programs tend to decrease the degree of modularity.

Asunto(s)

Biología Computacional/métodos , Escherichia coli/genética , Redes Reguladoras de Genes , Regulón/genética , Programas Informáticos , Inmunoprecipitación de Cromatina , Regulación Bacteriana de la Expresión Génica , Factores de Transcripción

10.

ModuleDigger: an itemset mining framework for the detection of cis-regulatory modules.

Sun, Hong; De Bie, Tijl; Storms, Valerie; Fu, Qiang; Dhollander, Thomas; Lemmens, Karen; Verstuyf, Annemieke; De Moor, Bart; Marchal, Kathleen.

BMC Bioinformatics ; 10 Suppl 1: S30, 2009 Jan 30.

Artículo en Inglés | MEDLINE | ID: mdl-19208131

RESUMEN

BACKGROUND: The detection of cis-regulatory modules (CRMs) that mediate transcriptional responses in eukaryotes remains a key challenge in the postgenomic era. A CRM is characterized by a set of co-occurring transcription factor binding sites (TFBS). In silico methods have been developed to search for CRMs by determining the combination of TFBS that are statistically overrepresented in a certain geneset. Most of these methods solve this combinatorial problem by relying on computational intensive optimization methods. As a result their usage is limited to finding CRMs in small datasets (containing a few genes only) and using binding sites for a restricted number of transcription factors (TFs) out of which the optimal module will be selected. RESULTS: We present an itemset mining based strategy for computationally detecting cis-regulatory modules (CRMs) in a set of genes. We tested our method by applying it on a large benchmark data set, derived from a ChIP-Chip analysis and compared its performance with other well known cis-regulatory module detection tools. CONCLUSION: We show that by exploiting the computational efficiency of an itemset mining approach and combining it with a well-designed statistical scoring scheme, we were able to prioritize the biologically valid CRMs in a large set of coregulated genes using binding sites for a large number of potential TFs as input.

Asunto(s)

Algoritmos , Elementos Reguladores de la Transcripción , Programas Informáticos , Factores de Transcripción/metabolismo , Sitios de Unión , Bases de Datos Genéticas , Modelos Genéticos

11.

Integrating microarray and proteomics data to predict the response on cetuximab in patients with rectal cancer.

Daemen, Anneleen; Gevaert, Olivier; De Bie, Tijl; Debucquoy, Annelies; Machiels, Jean-Pascal; De Moor, Bart; Haustermans, Karin.

Pac Symp Biocomput ; : 166-77, 2008.

Artículo en Inglés | MEDLINE | ID: mdl-18229684

RESUMEN

To investigate the combination of cetuximab, capecitabine and radiotherapy in the preoperative treatment of patients with rectal cancer, fourty tumour samples were gathered before treatment (T0), after one dose of cetuximab but before radiotherapy with capecitabine (T1) and at moment of surgery (T2). The tumour and plasma samples were subjected at all timepoints to Affymetrix microarray and Luminex proteomics analysis, respectively. At surgery, the Rectal Cancer Regression Grade (RCRG) was registered. We used a kernel-based method with Least Squares Support Vector Machines to predict RCRG based on the integration of microarray and proteomics data on To and T1. We demonstrated that combining multiple data sources improves the predictive power. The best model was based on 5 genes and 10 proteins at T0 and T1 and could predict the RCRG with an accuracy of 91.7%, sensitivity of 96.2% and specificity of 80%.

Asunto(s)

Anticuerpos Monoclonales/uso terapéutico , Antineoplásicos/uso terapéutico , Análisis de Secuencia por Matrices de Oligonucleótidos/estadística & datos numéricos , Proteómica/estadística & datos numéricos , Neoplasias del Recto/terapia , Algoritmos , Anticuerpos Monoclonales Humanizados , Inteligencia Artificial , Capecitabina , Cetuximab , Terapia Combinada , Biología Computacional , Interpretación Estadística de Datos , Bases de Datos Factuales , Desoxicitidina/análogos & derivados , Desoxicitidina/uso terapéutico , Fluorouracilo/análogos & derivados , Fluorouracilo/uso terapéutico , Humanos , Análisis de los Mínimos Cuadrados , Modelos Estadísticos , Neoplasias del Recto/genética , Neoplasias del Recto/metabolismo

12.

Kernel-based data fusion for gene prioritization.

De Bie, Tijl; Tranchevent, Léon-Charles; van Oeffelen, Liesbeth M M; Moreau, Yves.

Bioinformatics ; 23(13): i125-32, 2007 Jul 01.

Artículo en Inglés | MEDLINE | ID: mdl-17646288

RESUMEN

MOTIVATION: Hunting disease genes is a problem of primary importance in biomedical research. Biologists usually approach this problem in two steps: first a set of candidate genes is identified using traditional positional cloning or high-throughput genomics techniques; second, these genes are further investigated and validated in the wet lab, one by one. To speed up discovery and limit the number of costly wet lab experiments, biologists must test the candidate genes starting with the most probable candidates. So far, biologists have relied on literature studies, extensive queries to multiple databases and hunches about expected properties of the disease gene to determine such an ordering. Recently, we have introduced the data mining tool ENDEAVOUR (Aerts et al., 2006), which performs this task automatically by relying on different genome-wide data sources, such as Gene Ontology, literature, microarray, sequence and more. RESULTS: In this article, we present a novel kernel method that operates in the same setting: based on a number of different views on a set of training genes, a prioritization of test genes is obtained. We furthermore provide a thorough learning theoretical analysis of the method's guaranteed performance. Finally, we apply the method to the disease data sets on which ENDEAVOUR (Aerts et al., 2006) has been benchmarked, and report a considerable improvement in empirical performance. AVAILABILITY: The MATLAB code used in the empirical results will be made publicly available.

Asunto(s)

Biomarcadores/metabolismo , Mapeo Cromosómico/métodos , Sistemas de Administración de Bases de Datos , Bases de Datos Factuales , Susceptibilidad a Enfermedades/metabolismo , Almacenamiento y Recuperación de la Información/métodos , Modelos Biológicos , Simulación por Computador , Predisposición Genética a la Enfermedad/genética , Humanos

13.

The evolution of mammalian gene families.

Demuth, Jeffery P; De Bie, Tijl; Stajich, Jason E; Cristianini, Nello; Hahn, Matthew W.

PLoS One ; 1: e85, 2006 Dec 20.

Artículo en Inglés | MEDLINE | ID: mdl-17183716

RESUMEN

Gene families are groups of homologous genes that are likely to have highly similar functions. Differences in family size due to lineage-specific gene duplication and gene loss may provide clues to the evolutionary forces that have shaped mammalian genomes. Here we analyze the gene families contained within the whole genomes of human, chimpanzee, mouse, rat, and dog. In total we find that more than half of the 9,990 families present in the mammalian common ancestor have either expanded or contracted along at least one lineage. Additionally, we find that a large number of families are completely lost from one or more mammalian genomes, and a similar number of gene families have arisen subsequent to the mammalian common ancestor. Along the lineage leading to modern humans we infer the gain of 689 genes and the loss of 86 genes since the split from chimpanzees, including changes likely driven by adaptive natural selection. Our results imply that humans and chimpanzees differ by at least 6% (1,418 of 22,000 genes) in their complement of genes, which stands in stark contrast to the oft-cited 1.5% difference between orthologous nucleotide sequences. This genomic "revolving door" of gene gain and loss represents a large number of genetic differences separating humans from our closest relatives.

Asunto(s)

Evolución Biológica , Mamíferos/genética , Familia de Multigenes , Animales , Perros , Humanos , Ratones , Pan troglodytes/genética , Filogenia , Primates/genética , Ratas , Roedores/genética , Selección Genética

14.

Inferring transcriptional modules from ChIP-chip, motif and microarray data.

Lemmens, Karen; Dhollander, Thomas; De Bie, Tijl; Monsieurs, Pieter; Engelen, Kristof; Smets, Bart; Winderickx, Joris; De Moor, Bart; Marchal, Kathleen.

Genome Biol ; 7(5): R37, 2006.

Artículo en Inglés | MEDLINE | ID: mdl-16677396

RESUMEN

'ReMoDiscovery' is an intuitive algorithm to correlate regulatory programs with regulators and corresponding motifs to a set of co-expressed genes. It exploits in a concurrent way three independent data sources: ChIP-chip data, motif information and gene expression profiles. When compared to published module discovery algorithms, ReMoDiscovery is fast and easily tunable. We evaluated our method on yeast data, where it was shown to generate biologically meaningful findings and allowed the prediction of potential novel roles of transcriptional regulators.

Asunto(s)

Algoritmos , Inmunoprecipitación de Cromatina , Perfilación de la Expresión Génica , Regulación de la Expresión Génica , Análisis de Secuencia por Matrices de Oligonucleótidos , Aminoácidos/metabolismo , Ciclo Celular , Galactosa/metabolismo , Elementos Reguladores de la Transcripción , Ribosomas/metabolismo , Programas Informáticos , Factores de Transcripción/metabolismo , Levaduras/genética , Levaduras/crecimiento & desarrollo , Levaduras/metabolismo

15.

CAFE: a computational tool for the study of gene family evolution.

De Bie, Tijl; Cristianini, Nello; Demuth, Jeffery P; Hahn, Matthew W.

Bioinformatics ; 22(10): 1269-71, 2006 May 15.

Artículo en Inglés | MEDLINE | ID: mdl-16543274

RESUMEN

SUMMARY: We present CAFE (Computational Analysis of gene Family Evolution), a tool for the statistical analysis of the evolution of the size of gene families. It uses a stochastic birth and death process to model the evolution of gene family sizes over a phylogeny. For a specified phylogenetic tree, and given the gene family sizes in the extant species, CAFE can estimate the global birth and death rate of gene families, infer the most likely gene family size at all internal nodes, identify gene families that have accelerated rates of gain and loss (quantified by a p-value) and identify which branches cause the p-value to be small for significant families. AVAILABILITY: Software is available from http://www.bio.indiana.edu/~hahnlab/Software.html

Asunto(s)

Algoritmos , Mapeo Cromosómico/métodos , Análisis Mutacional de ADN/métodos , Evolución Molecular , Familia de Multigenes/genética , Programas Informáticos , Variación Genética/genética , Filogenia , Interfaz Usuario-Computador

16.

Estimating the tempo and mode of gene family evolution from comparative genomic data.

Hahn, Matthew W; De Bie, Tijl; Stajich, Jason E; Nguyen, Chi; Cristianini, Nello.

Genome Res ; 15(8): 1153-60, 2005 Aug.

Artículo en Inglés | MEDLINE | ID: mdl-16077014

RESUMEN

Comparison of whole genomes has revealed that changes in the size of gene families among organisms is quite common. However, there are as yet no models of gene family evolution that make it possible to estimate ancestral states or to infer upon which lineages gene families have contracted or expanded. In addition, large differences in family size have generally been attributed to the effects of natural selection, without a strong statistical basis for these conclusions. Here we use a model of stochastic birth and death for gene family evolution and show that it can be efficiently applied to multispecies genome comparisons. This model takes into account the lengths of branches on phylogenetic trees, as well as duplication and deletion rates, and hence provides expectations for divergence in gene family size among lineages. The model offers both the opportunity to identify large-scale patterns in genome evolution and the ability to make stronger inferences regarding the role of natural selection in gene family expansion or contraction. We apply our method to data from the genomes of five yeast species to show its applicability.

Asunto(s)

Evolución Molecular , Genómica/métodos , Modelos Genéticos , Familia de Multigenes/genética , Funciones de Verosimilitud , Filogenia , Saccharomyces/genética , Procesos Estocásticos

17.

Discovering transcriptional modules from motif, chip-chip and microarray data.

De Bie, Tijl; Monsieurs, Pieter; Engelen, Kristof; De Moor, Bart; Cristianini, Nello; Marchal, Kathleen.

Pac Symp Biocomput ; : 483-94, 2005.

Artículo en Inglés | MEDLINE | ID: mdl-15759653

RESUMEN

We present a method for inference of transcriptional modules from heterogeneous data sources. It allows identifying the responsible set of regulators in combination with their corresponding DNA recognition sites (motifs) and target genes. Our approach distinguishes itself from previous work in literature because it fully exploits the knowledge of three independently acquired data sources: ChIP-chip data; motif information as obtained by phylogenetic shadowing; and gene expression profiles obtained using microarray experiments. Moreover, these three data sources are dealt with in a new and fully integrated manner. By avoiding approaches that take the different data sources into account sequentially or iteratively, the transparency of the method and the interpretability of the results are ensured. Using our method on biological data demonstrated the biological relevance of the inference.

Asunto(s)

Análisis de Secuencia por Matrices de Oligonucleótidos , Saccharomyces cerevisiae/genética , Transcripción Genética , Algoritmos , Ciclo Celular/genética , Proteínas Fúngicas/genética , Modelos Genéticos , Reproducibilidad de los Resultados , Ribosomas/genética , Saccharomyces cerevisiae/citología

18.

A statistical framework for genomic data fusion.

Lanckriet, Gert R G; De Bie, Tijl; Cristianini, Nello; Jordan, Michael I; Noble, William Stafford.

Bioinformatics ; 20(16): 2626-35, 2004 Nov 01.

Artículo en Inglés | MEDLINE | ID: mdl-15130933

RESUMEN

MOTIVATION: During the past decade, the new focus on genomics has highlighted a particular challenge: to integrate the different views of the genome that are provided by various types of experimental data. RESULTS: This paper describes a computational framework for integrating and drawing inferences from a collection of genome-wide measurements. Each dataset is represented via a kernel function, which defines generalized similarity relationships between pairs of entities, such as genes or proteins. The kernel representation is both flexible and efficient, and can be applied to many different types of data. Furthermore, kernel functions derived from different types of data can be combined in a straightforward fashion. Recent advances in the theory of kernel methods have provided efficient algorithms to perform such combinations in a way that minimizes a statistical loss function. These methods exploit semidefinite programming techniques to reduce the problem of finding optimizing kernel combinations to a convex optimization problem. Computational experiments performed using yeast genome-wide datasets, including amino acid sequences, hydropathy profiles, gene expression data and known protein-protein interactions, demonstrate the utility of this approach. A statistical learning algorithm trained from all of these data to recognize particular classes of proteins--membrane proteins and ribosomal proteins--performs significantly better than the same algorithm trained on any single type of data. AVAILABILITY: Supplementary data at http://noble.gs.washington.edu/proj/sdp-svm

Asunto(s)

Algoritmos , Mapeo Cromosómico/métodos , Bases de Datos de Proteínas , Perfilación de la Expresión Génica/métodos , Modelos Genéticos , Proteínas/genética , Análisis de Secuencia de Proteína/métodos , Inteligencia Artificial , Bases de Datos Genéticas , Proteínas Fúngicas/química , Proteínas Fúngicas/genética , Genómica/métodos , Almacenamiento y Recuperación de la Información/métodos , Proteínas de la Membrana/genética , Proteínas de la Membrana/metabolismo , Modelos Estadísticos , Reconocimiento de Normas Patrones Automatizadas , Proteínas/análisis , Proteínas/química , Proteínas/clasificación , Proteínas Ribosómicas/química , Proteínas Ribosómicas/genética , Alineación de Secuencia , Homología de Secuencia de Aminoácido , Integración de Sistemas

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA