Results 1-20 of 25
1.
BMC Genomics ; 23(1): 198, 2022 Mar 12.
Article in English | MEDLINE | ID: mdl-35279098

ABSTRACT

BACKGROUND: Sphaerophoria rueppellii, a European species of hoverfly, is a highly effective beneficial predator of crop pests, including hemipteran pests such as aphids, as well as thrips and coleopteran/lepidopteran larvae, in integrated pest management (IPM) programmes. It is also a key pollinator of a wide variety of important agricultural crops. No genomic information is currently available for S. rueppellii. Without genomic information for such beneficial predators, we cannot compare insecticide target sites, or the genes encoding metabolic enzymes potentially responsible for insecticide resistance, between crop pests and their predators. These metabolic mechanisms involve several gene families: cytochrome P450 monooxygenases (P450s), ATP binding cassette transporters (ABCs), glutathione-S-transferases (GSTs), UDP-glycosyltransferases (UGTs) and carboxyl/choline esterases (CCEs). METHODS AND FINDINGS: In this study, we generated a high-quality, near-chromosome-level de novo genome assembly (as well as a mitochondrial genome assembly) for S. rueppellii using a hybrid approach with PacBio long-read and Illumina short-read data, followed by super-scaffolding with Hi-C data. The final assembly has a scaffold N50 of 87 Mb and a total genome size of 537.6 Mb, and 96% of a set of 1,658 core insect genes are present in it as full-length genes. The assembly was annotated with 14,249 protein-coding genes. Comparative analysis revealed gene expansions of CYP6Zx P450s, epsilon-class GSTs, dietary CCEs and multiple UGT families (UGT37/302/308/430/431). Conversely, ABCs, delta-class GSTs and non-CYP6Zx P450s showed limited expansion. The distributions of resistance-associated gene families across subfamilies differed between S. rueppellii and some hemipteran crop pests. Additionally, S. rueppellii had more detoxification genes than other pollinator species.
CONCLUSION AND SIGNIFICANCE: This assembly is the first published genome for a predatory member of the Syrphidae family and will serve as a useful resource for further research into insecticide selectivity and potential insecticide tolerance in beneficial predators. Furthermore, the expansion of some gene families often linked to insecticide resistance and selectivity may indicate a capacity of this predator to detoxify IPM-selective insecticides. These findings could be exploited in targeted insecticide screens and functional studies to increase the effectiveness of IPM strategies, which aim to increase crop yields by sustainably and effectively controlling pests without impacting beneficial predator populations.


Subjects
Diptera, Insecticides, Animals, Chromosomes, Diptera/genetics, Genome Size, Humans, Insecticide Resistance/genetics, Insecticides/pharmacology
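Records 1 and 2 report assembly contiguity as a scaffold N50 (87 Mb and 125,649 bp). By the standard definition, N50 is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. A minimal sketch of that definition, using made-up scaffold lengths rather than data from either assembly:

```python
def n50(lengths):
    """Return the N50 of a list of contig/scaffold lengths.

    Lengths are sorted in descending order; the N50 is the first length at
    which the running total reaches at least half of the total assembly size.
    """
    if not lengths:
        raise ValueError("no sequences")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

# Hypothetical scaffold lengths in bp (not taken from either assembly).
print(n50([100, 80, 50, 30, 20, 10]))  # 290 bp total; 100 + 80 >= 145, so prints 80
```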
2.
BMC Genomics ; 23(1): 45, 2022 Jan 11.
Article in English | MEDLINE | ID: mdl-35012450

ABSTRACT

BACKGROUND: Orius laevigatus, a minute pirate bug, is a highly effective beneficial predator of crop pests, including aphids, spider mites and thrips, in integrated pest management (IPM) programmes. No genomic information is currently available for O. laevigatus, as is the case for the majority of beneficial predators that feed on crop pests. In contrast, genomic information for crop pests is far more readily available. The lack of publicly available genomes for beneficial predators has so far limited our ability to compare genes encoding potential insecticide resistance mechanisms between crop pests and their predators. These mechanisms involve several gene/protein families, including cytochrome P450s (P450s), ATP binding cassette transporters (ABCs), glutathione S-transferases (GSTs), UDP-glucosyltransferases (UGTs) and carboxyl/cholinesterases (CCEs). METHODS AND FINDINGS: In this study, we generated a high-quality, scaffold-level de novo genome assembly for O. laevigatus using a hybrid approach with PacBio long-read and Illumina short-read data. The final assembly has a scaffold N50 of 125,649 bp and a total genome size of 150.98 Mb, and 93.6% of a set of 1,658 core insect genes are present in it as full-length genes. Genome annotation identified 15,102 protein-coding genes, 87% of which were assigned a putative function. Comparative analyses revealed gene expansions of sigma-class GSTs and CYP3 P450s. Conversely, the UGT gene family showed limited expansion. The distributions of resistance-associated gene families at the subfamily level differed between O. laevigatus and some of the crop pests it targets. A target-site mutation in ryanodine receptors (I4790M, PxRyR), which is strongly linked to diamide resistance in crop pests and had previously been identified only in lepidopteran species, was also found in hemipteran species, including O. laevigatus.
CONCLUSION AND SIGNIFICANCE: This assembly is the first published genome for the Anthocoridae family and will serve as a useful resource for further research into target-site selectivity and potential resistance mechanisms in beneficial predators. Furthermore, the expansion of gene families often linked to insecticide resistance may indicate a capacity of this predator to detoxify selective insecticides. These findings could be exploited in targeted pesticide screens and functional studies to increase the effectiveness of IPM strategies, which aim to increase crop yields by controlling pests sustainably, effectively and in an environmentally friendly way, without impacting beneficial predator populations.


Subjects
Heteroptera, Insecticides, Thysanoptera, Animals, Genome, Humans, Insecticide Resistance
3.
iScience ; 24(6): 102499, 2021 Jun 25.
Article in English | MEDLINE | ID: mdl-34308279

ABSTRACT

Male honeybees (drones) are thought to congregate in large numbers in particular "drone congregation areas" to mate. We used harmonic radar to record the flight paths of individual drones and found that drones favored certain locations in the landscape that remained stable over two years. Drones often visit multiple potential lekking sites within a single flight and take shared flight paths between them. Flights between such sites are relatively straight and begin as early as the drone's second flight, indicating familiarity with the sites acquired during initial learning flights. On arriving at congregation areas, drones display convoluted, looping flight patterns. We found a correlation between a drone's distance from the center of each area and its acceleration toward the center, a signature of collective behavior leading to congregation in these areas. Our study reveals the behavior of individual drones as they navigate between and within multiple aerial leks.
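Record 3 reports a correlation between a drone's distance from the center of a congregation area and its acceleration toward that center. The sketch below shows one way such a quantity could be computed from a sampled track; it is not the authors' analysis. It assumes 2-D positions sampled at a fixed interval `dt` and a known center, estimates acceleration with central finite differences, and uses a plain Pearson correlation. The test track is synthetic.

```python
import math

def radial_accel_vs_distance(track, center, dt):
    """For interior points of a 2-D track, return (distances, radial accelerations).

    Acceleration is the central second difference of position; its component
    along the unit vector pointing toward `center` is reported (positive means
    accelerating toward the center).
    """
    cx, cy = center
    dists, radial = [], []
    for i in range(1, len(track) - 1):
        (x0, y0), (x1, y1), (x2, y2) = track[i - 1], track[i], track[i + 1]
        ax = (x2 - 2 * x1 + x0) / dt ** 2
        ay = (y2 - 2 * y1 + y0) / dt ** 2
        dx, dy = cx - x1, cy - y1
        d = math.hypot(dx, dy)
        if d == 0:
            continue  # direction to the center is undefined at the center itself
        dists.append(d)
        radial.append((ax * dx + ay * dy) / d)
    return dists, radial

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Synthetic track: an elliptical orbit around (0, 0), i.e. motion under a pull
# toward the center proportional to distance (purely illustrative, not radar data).
dt = 0.1
track = [(5 * math.cos(dt * i), 2 * math.sin(dt * i)) for i in range(63)]
d, a = radial_accel_vs_distance(track, (0.0, 0.0), dt)
print(round(pearson(d, a), 3))  # close to 1.0 for this synthetic track
```

Real radar tracks are noisy, so in practice the positions would likely need smoothing before differencing twice.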

4.
Sci Rep ; 11(1): 4087, 2021 Feb 18.
Article in English | MEDLINE | ID: mdl-33602999

ABSTRACT

Despite intensive research, the aetiology of multiple sclerosis (MS) remains unknown. Cerebrospinal fluid proteomics has the potential to reveal mechanisms of MS pathogenesis, but analyses must account for disease heterogeneity. We previously reported an exploratory multivariate analysis, using hierarchical clustering, of proteomics data from MS patients and controls, which separated individuals into two groups. The grouping reflected increased levels of intrathecal inflammatory response proteins and decreased levels of proteins involved in neural development in one group relative to the other. MS patients and controls were present in both groups. Here we reanalysed these data, together with data from an independent cohort of patients diagnosed with clinically isolated syndrome (CIS), who have symptoms of MS without evidence of dissemination in space and/or time. Some, but not all, CIS patients had intrathecal inflammation. The analyses reported here identified a common protein signature of MS/CIS that was not linked to elevated intrathecal inflammation. The signature included low levels of complement proteins, semaphorin-7A, reelin, neural cell adhesion molecules, inter-alpha-trypsin inhibitor heavy chain H2, transforming growth factor beta 1, follistatin-related protein 1, malate dehydrogenase 1 cytoplasmic, plasma retinol-binding protein, biotinidase, and transferrin, all known to play roles in neural development. Low levels of these proteins suggest that MS/CIS patients suffer from abnormally low oxidative capacity that results in disrupted neural development from an early stage of the disease.


Subjects
Cerebrospinal Fluid Proteins/analysis, Multiple Sclerosis/cerebrospinal fluid, Proteome/analysis, Adolescent, Adult, Biomarkers/cerebrospinal fluid, Case-Control Studies, Female, Humans, Male, Middle Aged, Multiple Sclerosis/pathology, Young Adult
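Record 4 describes an exploratory hierarchical clustering of cerebrospinal fluid proteomics profiles that split individuals into two groups. The abstract does not state the distance or linkage used, so this minimal agglomerative sketch assumes Euclidean distance and average linkage, and repeatedly merges the two closest clusters until `k` remain. The protein profiles are invented.

```python
import math
from itertools import combinations

def average_linkage(samples, k):
    """Agglomerative clustering with average linkage and Euclidean distance.

    `samples` is a list of equal-length numeric vectors (e.g. protein levels
    per individual). Returns the clusters, each a sorted list of sample
    indices, once only `k` clusters remain.
    """
    clusters = [[i] for i in range(len(samples))]

    def linkage(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        total = sum(math.dist(samples[a], samples[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
    return clusters

# Hypothetical two-protein profiles for six individuals.
profiles = [[1.0, 0.9], [1.1, 1.0], [0.9, 1.1], [4.0, 0.2], [4.2, 0.1], [3.9, 0.3]]
print(average_linkage(profiles, 2))  # [[0, 1, 2], [3, 4, 5]]
```

This naive version recomputes linkages on every pass, which is fine for a handful of samples; larger datasets would normally use a library implementation.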
5.
J Integr Bioinform ; 15(3)2018 Aug 07.
Article in English | MEDLINE | ID: mdl-30085931

ABSTRACT

The speed and accuracy of new scientific discoveries, whether by humans or artificial intelligence, depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We use the ontology to power data access services such as resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIR data principles).


Subjects
Computational Biology/methods, Computer Graphics, Gene Regulatory Networks, Genome, Human, Software, Databases, Factual, Genome-Wide Association Study, Humans, Knowledge
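Record 5 mentions JSON-LD web APIs backed by an OWL/RDF application ontology with mappings to standard schemas. The sketch below only illustrates the general JSON-LD pattern of an `@context` that maps short keys to URIs and gives each node a resolvable-style `@id`. The `http://example.org/gskn/` namespace, the `Gene` type and the gene identifier are hypothetical placeholders, not the authors' ontology; `http://schema.org/name` and `http://schema.org/identifier` are real schema.org properties.

```python
import json

# Hypothetical vocabulary: schema.org terms are real, the example.org namespace is a placeholder.
CONTEXT = {
    "name": "http://schema.org/name",
    "identifier": "http://schema.org/identifier",
    "gskn": "http://example.org/gskn/",
}

def gene_to_jsonld(accession, name):
    """Build a JSON-LD node whose @id is a resolvable-style URI for the gene."""
    return {
        "@context": CONTEXT,
        "@id": f"http://example.org/gskn/gene/{accession}",
        "@type": "gskn:Gene",  # compact IRI, expanded via the "gskn" prefix
        "identifier": accession,
        "name": name,
    }

doc = gene_to_jsonld("GENE0001", "example gene")
print(json.dumps(doc, indent=2))
```

A JSON-LD processor would expand `name` to `http://schema.org/name` and `gskn:Gene` to `http://example.org/gskn/Gene`, which is what lets plain JSON clients and RDF tooling share the same payload.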
6.
Sci Data ; 5: 180072, 2018 May 15.
Article in English | MEDLINE | ID: mdl-29762552

ABSTRACT

The electronic Rothamsted Archive, e-RA (www.era.rothamsted.ac.uk) provides a permanent managed database to both securely store and disseminate data from Rothamsted Research's long-term field experiments (since 1843) and meteorological stations (since 1853). Both historical and contemporary data are made available via this online database, which provides the scientific community with access to a unique continuous record of agricultural experiments and weather measured since the mid-19th century. Qualitative information, such as treatment and management practices, plans and soil information, accompanies the data and is made available on the e-RA website. e-RA was released externally to the wider scientific community in 2013, and this paper describes its development, content, curation and the access process for data users. Case studies illustrate the diverse applications of the data, including its original intended purposes and recent unforeseen applications. Usage monitoring demonstrates the data are of increasing interest. Future developments, including adopting FAIR data principles, are proposed as the resource is increasingly recognised as a unique archive of data relevant to sustainable agriculture, agroecology and the environment.

7.
F1000Res ; 7: 1651, 2018.
Article in English | MEDLINE | ID: mdl-30755790

ABSTRACT

KnetMaps is a BioJS component for the interactive visualization of biological knowledge networks. It is well suited for applications that need to visualize complementary, connected and content-rich data in a single view to help users traverse pathways linking entities of interest, for example from genotype to phenotype. KnetMaps loads data in JSON format, visualizes the structure and content of knowledge networks using lightweight JavaScript libraries, and supports interactive touch gestures. KnetMaps uses effective visualization techniques to prevent information overload and to allow researchers to progressively build their knowledge.


Subjects
Biology, Knowledge, Software, User-Computer Interface
8.
J Integr Bioinform ; 14(1)2017 Jun 13.
Article in English | MEDLINE | ID: mdl-28609292

ABSTRACT

Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.


Subjects
Computational Biology/methods, Databases, Factual, Genes, Genetic Association Studies/methods, Genotype, Phenotype, Animals, Humans
9.
Bioinformatics ; 33(7): 1096-1098, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-27993779

ABSTRACT

Summary: The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology, and this paper describes key features such as efficient management of the network data, examples of network queries for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in JSON format) into SBML and SIF formats to facilitate further exploration or enhancement of the results, or network sharing. Availability and Implementation: The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/ . The Java code files developed for this work are available from: https://github.com/ibalaur/MetabolicFramework . Contact: ibalaur@eisbm.org. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Metabolic Networks and Pathways, Software, Computer Graphics, Database Management Systems, Databases, Factual, Genome, Humans, Metabolic Networks and Pathways/genetics, Models, Biological
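Record 9 describes converting JSON query results into SIF. SIF (Simple Interaction Format) is a plain-text format with one interaction per line: source node, relationship type, target node, with nodes that have no interactions listed on a line of their own. The input layout below (`nodes` with `id`, `edges` with `source`, `type`, `target`) is a hypothetical stand-in rather than the actual Neo4j result schema, and this Python sketch is not the project's Java parser.

```python
import json

def json_to_sif(result_json):
    """Convert a {"nodes": [...], "edges": [...]} JSON string into SIF text.

    Each edge becomes "source<TAB>type<TAB>target"; nodes that take part in no
    edge are written on a line of their own, which SIF allows.
    """
    data = json.loads(result_json)
    lines, connected = [], set()
    for edge in data.get("edges", []):
        lines.append(f"{edge['source']}\t{edge['type']}\t{edge['target']}")
        connected.update((edge["source"], edge["target"]))
    for node in data.get("nodes", []):
        if node["id"] not in connected:
            lines.append(node["id"])
    return "\n".join(lines)

# Hypothetical query result: two reactions linked to a metabolite, plus an isolated node.
example = json.dumps({
    "nodes": [{"id": "glucose"}, {"id": "R1"}, {"id": "R2"}, {"id": "orphan"}],
    "edges": [
        {"source": "glucose", "type": "consumed_by", "target": "R1"},
        {"source": "R2", "type": "produces", "target": "glucose"},
    ],
})
print(json_to_sif(example))
```

Tab separators are used so that node names containing spaces survive the round trip into tools that read SIF.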
10.
J Comput Biol ; 24(10): 969-980, 2017 Oct.
Article in English | MEDLINE | ID: mdl-27627442

ABSTRACT

The development of colorectal cancer (CRC), the third most common cancer type, has been associated with deregulation of cellular mechanisms stimulated by both genetic and epigenetic events. StatEpigen is a manually curated and annotated database containing information on interdependencies between genetic and epigenetic signals, currently specialized for CRC research. Although StatEpigen provides a well-developed graphical user interface for information retrieval, advanced queries involving associations between multiple concepts can benefit from a more detailed graph representation of the integrated data. This can be achieved using a graph database (NoSQL) approach. Data were extracted from StatEpigen and imported into our newly developed EpiGeNet, a graph database for storing and querying conditional relationships between molecular (genetic and epigenetic) events observed at different stages of colorectal oncogenesis. We illustrate the enhanced capability of EpiGeNet for exploring different queries related to colorectal tumor progression; specifically, we demonstrate the query process for (i) stage-specific molecular events, (ii) the most frequently observed genetic and epigenetic interdependencies in colon adenoma, and (iii) paths connecting key genes reported in CRC and associated events. The EpiGeNet framework offers improved capability for management and visualization of data on molecular events specific to CRC initiation and progression.


Subjects
Colorectal Neoplasms/genetics, Computational Biology/methods, Computer Graphics, Epigenesis, Genetic, Gene Regulatory Networks, Software, Databases, Factual, Humans
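Record 10 lists queries for stage-specific molecular events and for the most frequently observed genetic/epigenetic interdependencies in colon adenoma. Outside a graph database, the core of such a query reduces to filtering relationship records by stage and counting them. The record layout and the events below are hypothetical placeholders, not StatEpigen or EpiGeNet data or schema.

```python
from collections import Counter
from typing import NamedTuple

class Interdependency(NamedTuple):
    """Hypothetical record: event_b observed conditional on event_a at a given stage."""
    event_a: str
    event_b: str
    stage: str

def most_frequent(records, stage, top=3):
    """Return the `top` most frequent (event_a, event_b) pairs observed at `stage`."""
    counts = Counter((r.event_a, r.event_b) for r in records if r.stage == stage)
    return counts.most_common(top)

# Placeholder events for illustration only.
records = [
    Interdependency("geneA mutation", "geneB methylation", "adenoma"),
    Interdependency("geneA mutation", "geneB methylation", "adenoma"),
    Interdependency("geneC mutation", "geneD methylation", "adenoma"),
    Interdependency("geneA mutation", "geneB methylation", "carcinoma"),
]
print(most_frequent(records, "adenoma"))
# [(('geneA mutation', 'geneB methylation'), 2), (('geneC mutation', 'geneD methylation'), 1)]
```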
11.
Appl Transl Genom ; 11: 18-26, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28018846

ABSTRACT

The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpin traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer to having the basic information, at the gene level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed across myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species wheat and barley. We present the global characteristics of such knowledge networks and, with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources we have created (http://knetminer.rothamsted.ac.uk) are all open-source and provide a first step towards systematic and evidence-based gene discovery to facilitate crop improvement.
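Record 11 illustrates knowledge discovery by linking a seed size phenotype to a barley WRKY transcription factor through its Arabidopsis orthologue TTG2. In a knowledge network this amounts to finding a path between typed nodes. A breadth-first search sketch over a tiny network whose node names, relation labels and edges are invented for illustration; this is not the Ondex/KnetMiner implementation.

```python
from collections import deque

def shortest_path(edges, start, goal):
    """Breadth-first search for a shortest path in an undirected, labelled graph.

    `edges` is a list of (node_a, relation, node_b) triples. Returns the path
    as an alternating list [node, relation, node, ...], or None if unreachable.
    """
    adj = {}
    for a, rel, b in edges:
        adj.setdefault(a, []).append((rel, b))
        adj.setdefault(b, []).append((rel, a))
    previous = {start: None}  # node -> (predecessor, relation used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = [node]
            while previous[node] is not None:
                node, rel = previous[node]
                path[:0] = [node, rel]
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in previous:
                previous[nxt] = (node, rel)
                queue.append(nxt)
    return None

# Invented example network (identifiers are placeholders).
edges = [
    ("Trait:seed_size", "associated_with", "Gene:AtTTG2"),
    ("Gene:AtTTG2", "ortholog_of", "Gene:HvWRKY_example"),
    ("Gene:AtTTG2", "cited_in", "Publication:P1"),
    ("Publication:P1", "mentions", "Trait:seed_size"),
]
print(shortest_path(edges, "Trait:seed_size", "Gene:HvWRKY_example"))
# ['Trait:seed_size', 'associated_with', 'Gene:AtTTG2', 'ortholog_of', 'Gene:HvWRKY_example']
```

Keeping the relation labels in the returned path is what makes such a path readable as evidence (trait, association, gene, orthology, gene) rather than just a list of nodes.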

12.
BioData Min ; 9: 23, 2016.
Article in English | MEDLINE | ID: mdl-27462371

ABSTRACT

BACKGROUND: Systems biology experiments generate large volumes of data of multiple modalities, and integrating this information is challenging because it combines complexity with rich semantics. Here, we describe how graph databases provide a powerful framework for storing, querying and visualizing biological data. RESULTS: We show how graph databases are well suited for the representation of biological information, which is typically highly connected, semi-structured and unpredictable. We outline an application case that uses the Neo4j graph database for building and querying a prototype network to provide biological context to asthma-related genes. CONCLUSIONS: Our study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.

13.
Front Genet ; 5: 21, 2014.
Article in English | MEDLINE | ID: mdl-24600467

ABSTRACT

Network inference uses experimental high-throughput data to reconstruct molecular interaction networks in which new relationships between network entities can be predicted. Despite the increasing amount of experimental data, the parameters of each modeling technique cannot be optimized on the experimental data alone; the resulting network must also be assessed qualitatively to determine whether its components describe the experimental setting. Candidate list prioritization and validation build on data integration and data visualization. Tools supporting this procedure are limited to exploring smaller information networks, because displaying and interpreting large amounts of information is demanding both computationally and for users. The Ondex software framework was extended with customizable context-sensitive menus which allow additional integration and data analysis options for a selected set of candidates during interactive data exploration. We provide new functionalities for on-the-fly data integration using InterProScan, PubMed Central literature search, and sequence-based homology search. We applied the Ondex system to the integration of publicly available data for Aspergillus nidulans and analyzed transcriptome data. We demonstrate the advantages of our approach by proposing new hypotheses for the functional annotation of specific genes of differentially expressed fungal gene clusters. Our extension of the Ondex framework makes it possible to overcome the separation between data integration and interactive analysis. More specifically, computationally demanding calculations can be performed on selected sub-networks without losing any information from the whole network. Furthermore, our extensions allow direct access to online biological databases, which helps to keep the integrated information up to date.
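Record 13 describes running computationally demanding analyses on selected sub-networks without losing information from the whole network. One simple way to get that behavior, offered here as an illustrative assumption rather than the Ondex mechanism, is to copy out the sub-network induced by the selection (optionally widened by one hop) and leave the full network untouched. The adjacency-set representation and node names are hypothetical.

```python
def induced_subnetwork(adjacency, selected, include_neighbors=False):
    """Copy out the sub-network induced by `selected` nodes.

    `adjacency` maps each node to the set of its neighbors (undirected graph)
    and is never modified. With include_neighbors=True the selection is first
    widened by one hop.
    """
    nodes = set(selected)
    if include_neighbors:
        for node in selected:
            nodes |= adjacency.get(node, set())
    # `&` builds new sets, so the full network stays untouched.
    return {node: adjacency.get(node, set()) & nodes for node in nodes}

network = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
sub = induced_subnetwork(network, ["A", "C"])
print({node: sorted(nbrs) for node, nbrs in sorted(sub.items())})  # {'A': ['C'], 'C': ['A']}
```

Because the copy holds only the selected nodes, an expensive analysis can run on it while the original network remains available for context.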

14.
Bioinformatics ; 30(7): 1034-5, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24363379

ABSTRACT

SUMMARY: Ondex Web is a new web-based implementation of the network visualization and exploration tools from the Ondex data integration platform. New features such as context-sensitive menus and annotation tools provide users with intuitive ways to explore and manipulate the appearance of heterogeneous biological networks. Ondex Web is open source, written in Java and can be easily embedded into Web sites as an applet. Ondex Web supports loading data from a variety of network formats, such as XGMML, NWB, Pajek and OXL. AVAILABILITY AND IMPLEMENTATION: http://ondex.rothamsted.ac.uk/OndexWeb.


Subjects
Biology/methods, Software, Data Mining, Internet, Metabolic Networks and Pathways
15.
J Mol Biol ; 425(1): 186-97, 2013 Jan 09.
Article in English | MEDLINE | ID: mdl-23103756

ABSTRACT

Increasingly, experimental data on biological systems are obtained from several sources, and computational approaches are required to integrate this information and derive models for the function of the system. Here, we demonstrate the power of a logic-based machine learning approach to propose hypotheses for gene function by integrating information from two diverse experimental approaches. Specifically, we use inductive logic programming, which automatically proposes hypotheses explaining the empirical data with respect to logically encoded background knowledge. We study the capsular polysaccharide biosynthetic pathway of the major human gastrointestinal pathogen Campylobacter jejuni. We consider several key steps in the formation of capsular polysaccharide, which involves 15 genes, of which 8 have an assigned function, and we explore the extent to which functions can be hypothesised for the remaining 7. Two sources of experimental data provide the information for learning: the results of knockout experiments on the genes involved in capsule formation, and the absence/presence of capsule genes in a multitude of strains of different serotypes. The machine learning uses the pathway structure as background knowledge. We propose assignments of specific genes to five previously unassigned reaction steps. For four of these steps there was an unambiguous optimal assignment of gene to reaction, and for the fifth there were three candidate genes. Several of these assignments were consistent with additional experimental results. We therefore show that the logic-based methodology provides a robust strategy to integrate results from different experimental approaches and propose hypotheses for the behaviour of a biological system.


Subjects
Artificial Intelligence, Campylobacter jejuni/metabolism, Logic, Models, Biological, Polysaccharides, Bacterial/genetics, Systems Biology/methods, Bacterial Capsules/genetics, Bacterial Capsules/metabolism, Biosynthetic Pathways/genetics, Campylobacter jejuni/genetics, Gene Knockout Techniques, Genes, Bacterial/genetics, Genes, Bacterial/physiology, Glycomics, Metabolomics, Molecular Sequence Annotation, Mutation, Oligonucleotide Array Sequence Analysis, Phenotype, Polysaccharides, Bacterial/metabolism
16.
J Integr Plant Biol ; 54(5): 345-55, 2012 May.
Article in English | MEDLINE | ID: mdl-22494395

ABSTRACT

Associating phenotypic traits and quantitative trait loci (QTL) to causative regions of the underlying genome is a key goal in agricultural research. InterStoreDB is a suite of integrated databases designed to assist in this process. The individual databases are species independent and generic in design, providing access to curated datasets relating to plant populations, phenotypic traits, genetic maps, marker loci and QTL, with links to functional gene annotation and genomic sequence data. Each component database provides access to associated metadata, including data provenance and parameters used in analyses, thus providing users with information to evaluate the relative worth of any associations identified. The databases include CropStoreDB, for management of population, genetic map, QTL and trait measurement data, SeqStoreDB for sequence-related data and AlignStoreDB, which stores sequence alignment information, and allows navigation between genetic and genomic datasets. Genetic maps are visualized and compared using the CMAP tool, and functional annotation from sequenced genomes is provided via an EnsEMBL-based genome browser. This framework facilitates navigation of the multiple biological domains involved in genetics and genomics research in a transparent manner within a single portal. We demonstrate the value of InterStoreDB as a tool for Brassica research. InterStoreDB is available from: http://www.interstoredb.org.


Subjects
Databases, Genetic, Genomics, Software, Brassica/genetics, Crops, Agricultural/genetics, Genes, Plant/genetics, Quantitative Trait Loci/genetics, Sequence Alignment
17.
BMC Bioinformatics ; 12: 431, 2011 Nov 03.
Article in English | MEDLINE | ID: mdl-22054122

ABSTRACT

BACKGROUND: In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to characterize them functionally. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given there is often no existing gold standard against which to evaluate precision and recall. RESULTS: In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats. CONCLUSIONS: This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way.


Subjects
Cattle/genetics, Genomics/methods, Molecular Sequence Annotation, Vocabulary, Controlled, Algorithms, Animals, Chromosome Mapping, Databases, Genetic, Genome, Humans
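Record 17 notes that annotation quality is hard to evaluate because there is often no gold standard against which to measure precision and recall. Where a reference set does exist, the two measures reduce to set comparisons over (gene, GO term) pairs. This sketch uses exact term matching only, ignores the GO hierarchy, and is not one of the 12 AIGO metrics; gene and term identifiers are made up.

```python
def precision_recall(predicted, reference):
    """Precision and recall of predicted (gene, term) pairs against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_pos = len(predicted & reference)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    return precision, recall

# Hypothetical annotations: (gene, term) pairs with placeholder identifiers.
predicted = {("g1", "T1"), ("g1", "T2"), ("g2", "T3"), ("g3", "T1")}
reference = {("g1", "T1"), ("g2", "T3"), ("g2", "T4")}
print(precision_recall(predicted, reference))  # (0.5, 0.6666666666666666)
```

Exact matching penalizes a prediction that is a correct but more general ancestor term, which is one reason hierarchy-aware metrics are used for GO annotations.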
18.
BMC Bioinformatics ; 12: 203, 2011 May 25.
Article in English | MEDLINE | ID: mdl-21612636

ABSTRACT

BACKGROUND: Combining multiple evidence types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer the function of as-yet unannotated proteins, to discover previously unknown roles of proteins in diseases, and to better understand the regulation and interrelationship of the different elements of complex biological systems. RESULTS: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we constructed five relationship networks: four based on single types of data (PPI, co-expression, co-occurrence of protein names in scientific literature abstracts, and sequence similarity) and a fifth combining these four evidence types. The ability of these networks to suggest biologically meaningful groupings of proteins was explored by applying Markov clustering and then measuring the functional coherence of the clusters. CONCLUSIONS: Relationship networks integrating multiple evidence types are biologically informative and allow more proteins to be assigned to a putative functional module. Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integrating more data sources improves the ability to uncover functional associations between proteins, both by allowing more proteins to be linked and by producing a network whose modular structure more closely reflects the hierarchy of the Gene Ontology.


Subjects
Arabidopsis Proteins/metabolism, Arabidopsis/genetics, Arabidopsis/metabolism, Metabolomics/methods, Algorithms, Arabidopsis Proteins/genetics, Cluster Analysis, Databases, Genetic, Markov Chains, Metabolic Networks and Pathways
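Record 18 clusters relationship networks with Markov clustering (MCL). MCL alternates expansion (squaring a column-stochastic matrix, which spreads flow along paths) and inflation (raising entries to a power and re-normalizing columns, which strengthens strong flows and weakens weak ones) until the matrix stops changing; rows that keep non-zero entries act as attractors, and the columns each one holds form a cluster. A compact pure-Python sketch, assuming an unweighted undirected network with self-loops added, expansion power 2 and inflation 2; these are common defaults, not necessarily the settings used in the study.

```python
def _matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def _normalise_columns(m):
    """Scale each column so it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]

def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Markov clustering of an undirected, unweighted graph (0/1 adjacency matrix).

    Returns the clusters as a sorted list of tuples of node indices.
    """
    n = len(adjacency)
    # Add self-loops, then make every column sum to 1.
    m = _normalise_columns([[1.0 if adjacency[i][j] or i == j else 0.0
                             for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = _matmul(m, m)  # expansion with power 2
        new = _normalise_columns([[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    # Attractor rows keep non-zero entries; the columns each one holds form a cluster.
    clusters = {tuple(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    clusters.discard(())
    return sorted(clusters)

# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
adj = [[0] * 6 for _ in range(6)]
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[a][b] = adj[b][a] = 1
print(mcl(adj))  # [(0, 1, 2), (3, 4, 5)]
```

Higher inflation values generally yield more, smaller clusters; production MCL implementations also prune tiny entries and use sparse matrices.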
19.
J Integr Bioinform ; 7(3)2010 Mar 25.
Article in English | MEDLINE | ID: mdl-20375451

ABSTRACT

High-throughput genomic studies can identify large numbers of potential candidate genes, which must be interpreted and filtered by investigators to select the best ones for further analysis. Prioritization is generally based on evidence that supports the role of a gene product in the biological process being investigated. The two most important bodies of information providing such evidence are bioinformatics databases and the scientific literature. In this paper we present an extension to the Ondex data integration framework that uses text mining techniques over Medline abstracts as a method for accessing both these bodies of evidence in a consistent way. In an example use case, we apply our method to create a knowledge base of Arabidopsis proteins implicated in plant stress response and use various scoring metrics to identify key protein-stress associations. In conclusion, we show that the additional text mining features can use the scientific literature to highlight proteins that would not have been found by data integration alone. Ondex is an open-source software project and can be downloaded, together with the text mining features described here, from www.ondex.org.


Subjects
Arabidopsis Proteins/metabolism, Arabidopsis/metabolism, Data Mining, Stress, Physiological, Ethylenes/metabolism, Reproducibility of Results, Statistics as Topic
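Record 19 scores protein-stress associations mined from Medline abstracts with "various scoring metrics" that the abstract does not spell out. The sketch below swaps in one simple, commonly used option: sentence-level co-occurrence counts with pointwise mutual information (PMI), PMI(p, s) = log2(c · N / (n_p · n_s)) over N sentences. Dictionary matching is naive whole-word lookup, and the protein names, stress terms and sentences are all invented.

```python
import math
import re
from collections import Counter
from itertools import product

def cooccurrence_pmi(sentences, proteins, stresses):
    """Score protein-stress pairs by sentence-level co-occurrence and PMI.

    Terms are matched as whole lowercase words. Returns
    {(protein, stress): (count, pmi)} for pairs seen together at least once.
    """
    n = len(sentences)
    single, pair = Counter(), Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        found_p = {p for p in proteins if p.lower() in words}
        found_s = {t for t in stresses if t.lower() in words}
        single.update(found_p | found_s)
        pair.update(product(found_p, found_s))
    return {
        (p, t): (c, math.log2(c * n / (single[p] * single[t])))
        for (p, t), c in pair.items()
    }

# Invented sentences and hypothetical protein names.
sentences = [
    "protA expression rises under drought.",
    "protA mutants are drought sensitive.",
    "protB responds to salt.",
    "Cold treatment had no effect on protB.",
    "Drought and salt both reduced growth.",
]
scores = cooccurrence_pmi(sentences, ["protA", "protB"], ["drought", "salt", "cold"])
for (protein, stress), (count, pmi) in sorted(scores.items()):
    print(protein, stress, count, round(pmi, 2))
```

PMI rewards pairs that co-occur more often than their individual frequencies would predict, but it is unstable for rare terms, so raw counts are usually reported alongside it.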
20.
Brief Bioinform ; 10(6): 676-93, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19933213

ABSTRACT

The development of a systems-based approach to problems in plant sciences requires integration of existing information resources. However, the available information is often incomplete and dispersed across many sources, and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and we use a graph-based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions, and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks were used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches for analysing groups of coexpressed genes from an individual microarray experiment, in the context of pathway information, and for combining coexpression data with an integrated protein interaction network.


Subjects
Arabidopsis Proteins/genetics, Arabidopsis/genetics, Chromosome Mapping/methods, Database Management Systems, Databases, Genetic, Genome, Plant/genetics, Information Storage and Retrieval/methods, Protein Interaction Mapping/methods, Systems Integration
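Record 20 quantifies the overlap between three pathway or protein-interaction sources, including each source's unique contribution and the data common to all three. Once each source is reduced to a set of comparable identifiers (in practice the hard part is mapping identifiers so that they are comparable), those quantities are plain set operations. The source names and contents below are placeholders, not the sources or figures from the study; interactions are modelled as unordered pairs so that A-B and B-A count as the same item.

```python
def overlap_summary(sources):
    """Given {source_name: set_of_items}, report union size, common core and unique items."""
    union = set().union(*sources.values())
    common = set.intersection(*sources.values())
    unique = {
        name: items - set().union(*(s for other, s in sources.items() if other != name))
        for name, items in sources.items()
    }
    return {"union": len(union), "common_to_all": len(common),
            "unique": {name: len(u) for name, u in unique.items()}}

# Placeholder interaction sets; frozenset makes each pair order-independent.
sources = {
    "source_A": {frozenset(p) for p in [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P9", "P10")]},
    "source_B": {frozenset(p) for p in [("P2", "P1"), ("P3", "P4"), ("P5", "P6")]},
    "source_C": {frozenset(p) for p in [("P1", "P2"), ("P7", "P8")]},
}
print(overlap_summary(sources))
# {'union': 6, 'common_to_all': 1, 'unique': {'source_A': 2, 'source_B': 1, 'source_C': 1}}
```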