Search | BVS Regional Portal (test)

1.

Large language model based framework for automated extraction of genetic interactions from unstructured data.

Gill, Jaskaran Kaur; Chetty, Madhu; Lim, Suryani; Hallinan, Jennifer.

PLoS One ; 19(5): e0303231, 2024.

Article in English | MEDLINE | ID: mdl-38771886

ABSTRACT

Extracting biological interactions from published literature helps us understand complex biological systems, accelerate research, and support decision-making in drug or treatment development. Despite efforts to automate the extraction of biological relations using text mining tools and machine learning pipelines, manual curation continues to serve as the gold standard. However, the rapidly increasing volume of literature pertaining to biological relations poses challenges in its manual curation and refinement. These challenges are further compounded because only a small fraction of the published literature is relevant to biological relation extraction, and the embedded sentences of relevant sections have complex structures, which can lead to incorrect inference of relationships. To overcome these challenges, we propose GIX, an automated and robust Gene Interaction Extraction framework, based on pre-trained Large Language models fine-tuned through extensive evaluations on various gene/protein interaction corpora including LLL and RegulonDB. GIX identifies relevant publications with minimal keywords, optimises sentence selection to reduce computational overhead, simplifies sentence structure while preserving meaning, and provides a confidence factor indicating the reliability of extracted relations. GIX's Stage-2 relation extraction method performed well on benchmark protein/gene interaction datasets, assessed using 10-fold cross-validation, surpassing state-of-the-art approaches. We demonstrated that the proposed method, although fully automated, performs as well as manual relation extraction, with enhanced robustness. We also observed GIX's capability to augment existing datasets with new sentences, incorporating newly discovered biological terms and processes. Further, we demonstrated GIX's real-world applicability in inferring E. coli gene circuits.

Subjects

Data Mining , Data Mining/methods , Natural Language Processing , Machine Learning , Computational Biology/methods , Humans , Algorithms

2.

MICFuzzy: A maximal information content based fuzzy approach for reconstructing genetic networks.

Nakulugamuwa Gamage, Hasini; Chetty, Madhu; Lim, Suryani; Hallinan, Jennifer.

PLoS One ; 18(7): e0288174, 2023.

Article in English | MEDLINE | ID: mdl-37418430

ABSTRACT

In systems biology, the accurate reconstruction of Gene Regulatory Networks (GRNs) is crucial since these networks can facilitate the solving of complex biological problems. Amongst the plethora of methods available for GRN reconstruction, information theory and fuzzy concepts-based methods have abiding popularity. However, most of these methods are not only complex, incurring a high computational burden, but they may also produce a high number of false positives, leading to inaccurate inferred networks. In this paper, we propose a novel hybrid fuzzy GRN inference model called MICFuzzy which involves the aggregation of the effects of Maximal Information Coefficient (MIC). This model has an information theory-based pre-processing stage, the output of which is applied as an input to the novel fuzzy model. In this preprocessing stage, the MIC component filters relevant genes for each target gene to significantly reduce the computational burden of the fuzzy model when selecting the regulatory genes from these filtered gene lists. The novel fuzzy model uses the regulatory effect of the identified activator-repressor gene pairs to determine target gene expression levels. This approach facilitates accurate network inference by generating a high number of true regulatory interactions while significantly reducing false regulatory predictions. The performance of MICFuzzy was evaluated using DREAM3 and DREAM4 challenge data, and the SOS real gene expression dataset. MICFuzzy outperformed the other state-of-the-art methods in terms of F-score, Matthews Correlation Coefficient, Structural Accuracy, and SS_mean, and outperformed most of them in terms of efficiency. MICFuzzy also had improved efficiency compared with the classical fuzzy model since the design of MICFuzzy leads to a reduction in combinatorial computation.

Subjects

Algorithms , Computational Biology , Computational Biology/methods , Gene Regulatory Networks , Systems Biology , Regulator Genes , Genetic Models
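
Two of the evaluation measures named in the abstract above, F-score and Matthews Correlation Coefficient, have standard definitions over a confusion matrix of predicted versus true regulatory edges. The following is a minimal sketch of those definitions only, not the authors' evaluation code. The edge lists, the function names, and the choice to count all ordered gene pairs except self-loops as candidate edges are assumptions made for illustration.

```python
from math import sqrt


def confusion_counts(true_edges, predicted_edges, n_genes):
    """Return (TP, FP, FN, TN) over all ordered gene pairs.

    Assumption: self-loops are excluded, so there are n*(n-1) candidate edges.
    """
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    tp = len(true_edges & predicted_edges)
    fp = len(predicted_edges - true_edges)
    fn = len(true_edges - predicted_edges)
    tn = n_genes * (n_genes - 1) - tp - fp - fn
    return tp, fp, fn, tn


def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews Correlation Coefficient; 0.0 when a marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical 4-gene network: edges are (regulator, target) pairs.
true_edges = [(0, 1), (1, 2), (2, 3)]
predicted_edges = [(0, 1), (1, 2), (3, 0)]

tp, fp, fn, tn = confusion_counts(true_edges, predicted_edges, n_genes=4)
print(f_score(tp, fp, fn), mcc(tp, fp, fn, tn))
```

Because true negatives dominate in sparse networks, MCC is often reported alongside F-score: it uses all four cells of the confusion matrix.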

3.

Computational intelligence and machine learning in bioinformatics and computational biology.

Chetty, Madhu; Hallinan, Jennifer; Ruz, Gonzalo A; Wipat, Anil.

Biosystems ; 222: 104792, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36209915

Subjects

Artificial Intelligence , Computational Biology , Machine Learning

4.

Filter feature selection based Boolean Modelling for Genetic Network Inference.

Gamage, Hasini Nakulugamuwa; Chetty, Madhu; Shatte, Adrian; Hallinan, Jennifer.

Biosystems ; 221: 104757, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36007675

ABSTRACT

The reconstruction of Gene Regulatory Networks (GRNs) from time series gene expression data is highly relevant for the discovery of complex biological interactions and dynamics. Various computational strategies have been developed for this task, but most approaches have low computational efficiency and are not able to cope with high-dimensional, low sample-number, gene expression data. In this paper, we introduce a novel combined filter feature selection approach for efficient and accurate inference of GRNs. A Boolean framework for network modelling is used to demonstrate the efficacy of the proposed approach. Using discretized microarray expression data, the genes most relevant to each target gene are first filtered using ReliefF, an instance-based feature ranking method that is here applied for the first time to GRN inference. Then, further gene selection from the filtered-gene list is done using a mutual information-based min-redundancy max-relevance criterion by eliminating irrelevant genes. This combined method is executed on resampled datasets to finalize the optimal set of regulatory genes. Building upon our previous research, a Pearson correlation coefficient-based Boolean modelling approach is utilized for the efficient identification of the optimal regulatory rules associated with selected regulatory genes. The proposed approach was evaluated using gene expression datasets from small-scale and medium-scale real gene networks, and was observed to be more effective than Linear Discriminant Analysis, performed better than the individual feature selection methods, and obtained improved Structural Accuracy with a higher number of true positives than other state-of-the-art methods, while outperforming these methods with respect to Dynamic Accuracy and efficiency.

Subjects

Algorithms , Gene Regulatory Networks , Computational Biology/methods , Gene Regulatory Networks/genetics , Time Factors
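
The mutual information-based min-redundancy max-relevance (mRMR) step described in the abstract above can be illustrated on discretized expression data. This is a minimal sketch under stated assumptions: it uses the common "difference" form of mRMR (relevance minus mean redundancy), which the abstract does not confirm, and it omits the ReliefF pre-filter and the resampling. The gene names, expression vectors, and function names are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def mrmr_select(candidates, target, k):
    """Greedily pick k genes maximising relevance minus mean redundancy."""
    selected = []
    remaining = dict(candidates)

    def score(name):
        relevance = mutual_information(remaining[name], target)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(remaining[name], candidates[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the first gene listed
        selected.append(best)
        del remaining[best]
    return selected


# Hypothetical binarised expression profiles over 8 time points.
candidates = {
    "geneA": [0, 0, 0, 0, 1, 1, 1, 1],
    "geneB": [0, 0, 0, 0, 1, 1, 1, 1],  # duplicate of geneA, fully redundant
    "geneC": [0, 1, 0, 1, 0, 1, 0, 1],
}
target = [0, 0, 0, 1, 1, 1, 1, 1]

print(mrmr_select(candidates, target, k=2))
```

In this toy data geneB is as relevant as geneA but adds no new information once geneA is chosen, so the redundancy term steers the second pick to geneC.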

5.

Combining kinetic orders for efficient S-System modelling of gene regulatory network.

Gill, Jaskaran; Chetty, Madhu; Shatte, Adrian; Hallinan, Jennifer.

Biosystems ; 220: 104736, 2022 Oct.

Article in English | MEDLINE | ID: mdl-35863700

ABSTRACT

S-System models, non-linear differential equation models, are widely used for reconstructing gene regulatory networks from temporal gene expression data. An S-System model involves two states, generation and degeneration, and uses the kinetic parameters gij and hij, to represent the direction, nature, and intensity of the genetic interactions. The need for learning a large number of model parameters results in increased computational expense. Previously, we improved the performance of the algorithm using dynamic allocation of the maximum in-degree for each gene. While the method was effective for smaller networks, a large amount of computation was still needed for larger networks. This problem arose mainly due to the increased occurrence of invalid networks during optimization, primarily because the two kinetic parameters (gij and hij) of the S-System model converge independently during optimization. Being independent, these two parameters can converge to values that can indicate contradictory gene interactions, specifically inhibition or activation. In this study, to address this major challenge in S-System modelling, we developed a novel method that includes two features: a penalty term that penalizes those networks with invalid kinetic orders, and a parameter, wij, derived by combining the kinetic parameters gij and hij. The novel penalty term was used for candidate selection during the process of optimizing the DRNI (Dynamically Regulated Network Initialization) algorithm. Rather than remaining constant, it is dynamic, with its magnitude dependent on the number of invalid interactions in the given network. This approach encourages the generation of valid candidate solutions, and eliminates invalid networks in a systematic manner. 
The previous DRNI method, a two-stage approach which uses dynamic allocation of the maximum in-degree for each gene, was further improved by adding a third stage which applies the proposed wij to handle the invalid regulations that may still exist in the candidate solutions. The method was tested on different gene expression datasets, and was able to reduce the number of iterations and produce improved network accuracies. For a 20 gene network, the number of generations required for convergence was reduced by 300, and the F-score improved by 0.05 compared to our previously reported DRNI approach. For the well-known 10 gene networks of the DREAM challenge, our method produced an improvement in the average area under the ROC curve of the DREAM4 10 gene networks.

Subjects

Computational Biology , Gene Regulatory Networks , Algorithms , Computational Biology/methods , Gene Regulatory Networks/genetics , Kinetics
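
For readers unfamiliar with the model in the abstract above, the standard S-System form gives each gene a production ("generation") term and a degradation ("degeneration") term, each a power-law product over all genes with kinetic orders gij and hij. The sketch below evaluates that standard form and takes one explicit Euler step. It does not implement the paper's penalty term, the combined parameter wij, or the DRNI algorithm, none of which are specified in the abstract. The rate constants `alpha` and `beta`, the example values, and the function names are assumptions.

```python
def s_system_rates(x, alpha, beta, g, h):
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij."""
    n = len(x)
    rates = []
    for i in range(n):
        production = alpha[i]
        degradation = beta[i]
        for j in range(n):
            production *= x[j] ** g[i][j]
            degradation *= x[j] ** h[i][j]
        rates.append(production - degradation)
    return rates


def euler_step(x, dt, alpha, beta, g, h):
    """Advance expression levels by one explicit Euler step of size dt."""
    rates = s_system_rates(x, alpha, beta, g, h)
    return [xi + dt * ri for xi, ri in zip(x, rates)]


# Hypothetical 2-gene system: gene 1 represses production of gene 0 (g[0][1] < 0),
# gene 0 activates production of gene 1 (g[1][0] > 0); each gene degrades itself.
x = [1.0, 2.0]
alpha = [2.0, 1.0]
beta = [1.0, 1.0]
g = [[0, -1], [1, 0]]
h = [[1, 0], [0, 1]]

print(s_system_rates(x, alpha, beta, g, h))
print(euler_step(x, 0.1, alpha, beta, g, h))
```

With n genes the model has 2n(n + 1) parameters (two n-by-n matrices of kinetic orders plus 2n rate constants), which is the source of the computational expense the abstract mentions.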

6.

Modelling the fitness landscapes of a SCRaMbLEd yeast genome.

Yang, Bill; Misirli, Goksel; Wipat, Anil; Hallinan, Jennifer.

Biosystems ; 219: 104730, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35772570

ABSTRACT

The use of microorganisms for the production of industrially important compounds and enzymes is becoming increasingly important. Eukaryotes have been less widely used than prokaryotes in biotechnology, because of the complexity of their genomic structure and biology. The Yeast2.0 project is an international effort to engineer the yeast Saccharomyces cerevisiae to make it easy to manipulate, and to generate random variants using a system called SCRaMbLE. SCRaMbLE relies on artificial evolution in vitro to identify useful variants, an approach which is time consuming and expensive. We developed an in silico simulator for the SCRaMbLE system, using an evolutionary computing approach, which can be used to investigate and optimize the fitness landscape of the system. We applied the system to the investigation of the fitness landscape of one of the S. cerevisiae chromosomes, and found that our results fitted well with those previously published. We then simulated directed evolution with or without manipulation of SCRaMbLE, and revealed that controlling the SCRaMbLE process could effectively impact directed evolution. Our simulator can be applied to the analysis of the fitness landscapes of any organism for which SCRaMbLE has been implemented.

Subjects

Fungal Genome , Saccharomyces cerevisiae , Chromosomes , Genetic Fitness/genetics , Fungal Genome/genetics , Genomics , Saccharomyces cerevisiae/genetics

7.

B-cell activity markers are associated with different disease activity domains in primary Sjögren's syndrome.

James, Katherine; Chipeta, Chimwemwe; Parker, Antony; Harding, Stephen; Cockell, Simon J; Gillespie, Colin S; Hallinan, Jennifer; Barone, Francesca; Bowman, Simon J; Ng, Wan-Fai; Fisher, Benjamin A.

Rheumatology (Oxford) ; 57(7): 1222-1227, 2018 Jul 01.

Article in English | MEDLINE | ID: mdl-29608774

ABSTRACT

OBJECTIVES: B-cell activating factor (BAFF), β-2 microglobulin (β2M) and serum free light chains (FLCs) are elevated in primary SS (pSS) and associated with disease activity. We aimed to investigate their association with the individual disease activity domains of the EULAR Sjögren's Syndrome Disease Activity Index (ESSDAI) in a large well-characterized pSS cohort. METHODS: Sera from pSS patients enrolled in the UK Primary Sjögren's Syndrome Registry (UKPSSR) (n = 553) and healthy controls (n = 286) were analysed for FLC (κ and λ), BAFF and β2M. Pearson correlation coefficients were calculated for patient clinical characteristics, including salivary flow, Schirmer's test, EULAR Sjögren's Syndrome Patient Reported Index and serum IgG levels. Poisson regression was performed to identify independent predictors of total ESSDAI and ClinESSDAI (validated ESSDAI minus the biological domain) scores and their domains. RESULTS: Levels of BAFF, β2M and FLCs were higher in pSS patients compared to controls. All three biomarkers associated significantly with the ESSDAI and the ClinESSDAI. BAFF associated with the peripheral nervous system domain of the ESSDAI, whereas β2M and FLCs associated with the cutaneous, biological and renal domains. Multivariate analysis showed BAFF, β2M and their interaction to be independent predictors of ESSDAI/ClinESSDAI. FLCs were also shown to associate with the ESSDAI/ClinESSDAI but not independent of serum IgG. CONCLUSION: All biomarkers were associated with total ESSDAI scores but with differing domain associations. These findings should encourage further investigation of these biomarkers in longitudinal studies and against other disease activity measures.

8.

The SBOL Stack: A Platform for Storing, Publishing, and Sharing Synthetic Biology Designs.

Madsen, Curtis; McLaughlin, James Alastair; Misirli, Göksel; Pocock, Matthew; Flanagan, Keith; Hallinan, Jennifer; Wipat, Anil.

ACS Synth Biol ; 5(6): 487-97, 2016 Jun 17.

Article in English | MEDLINE | ID: mdl-27268205

ABSTRACT

Recently, synthetic biologists have developed the Synthetic Biology Open Language (SBOL), a data exchange standard for descriptions of genetic parts, devices, modules, and systems. The goals of this standard are to allow scientists to exchange designs of biological parts and systems, to facilitate the storage of genetic designs in repositories, and to facilitate the description of genetic designs in publications. In order to achieve these goals, the development of an infrastructure to store, retrieve, and exchange SBOL data is necessary. To address this problem, we have developed the SBOL Stack, a Resource Description Framework (RDF) database specifically designed for the storage, integration, and publication of SBOL data. This database allows users to define a library of synthetic parts and designs as a service, to share SBOL data with collaborators, and to store designs of biological systems locally. The database also allows external data sources to be integrated by mapping them to the SBOL data model. The SBOL Stack includes two Web interfaces: the SBOL Stack API and SynBioHub. While the former is designed for developers, the latter allows users to upload new SBOL biological designs, download SBOL documents, search by keyword, and visualize SBOL data. Since the SBOL Stack is based on semantic Web technology, the inherent distributed querying functionality of RDF databases can be used to allow different SBOL stack databases to be queried simultaneously, and therefore, data can be shared between different institutes, centers, or other users.

Subjects

Database Management Systems , Programming Languages , Synthetic Biology , Factual Databases , Publishing

9.

Data Integration and Mining for Synthetic Biology Design.

Misirli, Göksel; Hallinan, Jennifer; Pocock, Matthew; Lord, Phillip; McLaughlin, James Alastair; Sauro, Herbert; Wipat, Anil.

ACS Synth Biol ; 5(10): 1086-1097, 2016 Oct 21.

Article in English | MEDLINE | ID: mdl-27110921

ABSTRACT

One aim of synthetic biologists is to create novel and predictable biological systems from simpler modular parts. This approach is currently hampered by a lack of well-defined and characterized parts and devices. However, there is a wealth of existing biological information, which can be used to identify and characterize biological parts, and their design constraints in the literature and numerous biological databases. However, this information is spread among these databases in many different formats. New computational approaches are required to make this information available in an integrated format that is more amenable to data mining. A tried and tested approach to this problem is to map disparate data sources into a single data set, with common syntax and semantics, to produce a data warehouse or knowledge base. Ontologies have been used extensively in the life sciences, providing this common syntax and semantics as a model for a given biological domain, in a fashion that is amenable to computational analysis and reasoning. Here, we present an ontology for applications in synthetic biology design, SyBiOnt, which facilitates the modeling of information about biological parts and their relationships. SyBiOnt was used to create the SyBiOntKB knowledge base, incorporating and building upon existing life sciences ontologies and standards. The reasoning capabilities of ontologies were then applied to automate the mining of biological parts from this knowledge base. We propose that this approach will be useful to speed up synthetic biology design and ultimately help facilitate the automation of the biological engineering life cycle.

Subjects

Data Mining , Genetic Databases , Synthetic Biology , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Computational Biology , Bacterial DNA/genetics , Knowledge Bases , Genetic Promoter Regions , DNA Sequence Analysis , Software

10.

A Transcriptional Signature of Fatigue Derived from Patients with Primary Sjögren's Syndrome.

James, Katherine; Al-Ali, Shereen; Tarn, Jessica; Cockell, Simon J; Gillespie, Colin S; Hindmarsh, Victoria; Locke, James; Mitchell, Sheryl; Lendrem, Dennis; Bowman, Simon; Price, Elizabeth; Pease, Colin T; Emery, Paul; Lanyon, Peter; Hunter, John A; Gupta, Monica; Bombardieri, Michele; Sutcliffe, Nurhan; Pitzalis, Costantino; McLaren, John; Cooper, Annie; Regan, Marian; Giles, Ian; Isenberg, David; Saravanan, Vadivelu; Coady, David; Dasgupta, Bhaskar; McHugh, Neil; Young-Min, Steven; Moots, Robert; Gendi, Nagui; Akil, Mohammed; Griffiths, Bridget; Wipat, Anil; Newton, Julia; Jones, David E; Isaacs, John; Hallinan, Jennifer; Ng, Wan-Fai.

PLoS One ; 10(12): e0143970, 2015.

Article in English | MEDLINE | ID: mdl-26694930

ABSTRACT

BACKGROUND: Fatigue is a debilitating condition with a significant impact on patients' quality of life. Fatigue is frequently reported by patients suffering from primary Sjögren's Syndrome (pSS), a chronic autoimmune condition characterised by dryness of the eyes and the mouth. However, although fatigue is common in pSS, it does not manifest in all sufferers, providing an excellent model with which to explore the potential underpinning biological mechanisms. METHODS: Whole blood samples from 133 fully-phenotyped pSS patients stratified for the presence of fatigue, collected by the UK primary Sjögren's Syndrome Registry, were used for whole genome microarray. The resulting data were analysed both on a gene by gene basis and using pre-defined groups of genes. Finally, gene set enrichment analysis (GSEA) was used as a feature selection technique for input into a support vector machine (SVM) classifier. Classification was assessed using area under curve (AUC) of receiver operator characteristic and standard error of Wilcoxon statistic, SE(W). RESULTS: Although no genes were individually found to be associated with fatigue, 19 metabolic pathways were enriched in the high fatigue patient group using GSEA. Analysis revealed that these enrichments arose from the presence of a subset of 55 genes. A radial kernel SVM classifier with this subset of genes as input displayed significantly improved performance over classifiers using all pathway genes as input. The classifiers had AUCs of 0.866 (SE(W) 0.002) and 0.525 (SE(W) 0.006), respectively. CONCLUSIONS: Systematic analysis of gene expression data from pSS patients discordant for fatigue identified 55 genes which are predictive of fatigue level using SVM classification. This list represents the first step in understanding the underlying pathophysiological mechanisms of fatigue in patients with pSS.

Subjects

Fatigue/genetics , Oligonucleotide Array Sequence Analysis/methods , Sjogren's Syndrome/complications , Transcriptome , Adult , Aged , Area Under Curve , Fatigue/blood , Fatigue/etiology , Female , Genetic Predisposition to Disease , Humans , Male , Middle Aged , Severity of Illness Index , Sjogren's Syndrome/blood
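
The abstract above reports classifier performance as AUC with a standard error of the Wilcoxon statistic, SE(W). As a rough illustration of how such a pair of numbers can be computed, the sketch below obtains the AUC as the Wilcoxon/Mann-Whitney statistic and estimates its standard error with the Hanley and McNeil (1982) approximation. Whether the authors used this particular estimator is an assumption, since the abstract does not say. The scores and function names are hypothetical.

```python
from math import sqrt


def auc_wilcoxon(pos_scores, neg_scores):
    """AUC as P(positive score > negative score), counting ties as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley-McNeil approximation to the standard error of the AUC."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return sqrt(variance)


# Hypothetical classifier scores for high-fatigue (positive) and
# low-fatigue (negative) patients.
pos = [0.9, 0.8, 0.6, 0.55]
neg = [0.7, 0.5, 0.4, 0.3]

auc = auc_wilcoxon(pos, neg)
print(auc, hanley_mcneil_se(auc, len(pos), len(neg)))
```

The standard error shrinks as either group grows, so small values such as those quoted in the abstract would typically reflect averaging over many samples or repeated evaluations.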

11.

A distributed computational search strategy for the identification of diagnostics targets: application to finding aptamer targets for methicillin-resistant staphylococci.

Flanagan, Keith; Cockell, Simon; Harwood, Colin; Hallinan, Jennifer; Nakjang, Sirintra; Lawry, Beth; Wipat, Anil.

J Integr Bioinform ; 11(2): 242, 2014 Jun 30.

Article in English | MEDLINE | ID: mdl-24980620

ABSTRACT

The rapid and cost-effective identification of bacterial species is crucial, especially for clinical diagnosis and treatment. Peptide aptamers have been shown to be valuable for use as a component of novel, direct detection methods. These small peptides have a number of advantages over antibodies, including greater specificity and longer shelf life. These properties facilitate their use as the detector components of biosensor devices. However, the identification of suitable aptamer targets for particular groups of organisms is challenging. We present a semi-automated processing pipeline for the identification of candidate aptamer targets from whole bacterial genome sequences. The pipeline can be configured to search for protein sequence fragments that uniquely identify a set of strains of interest. The system is also capable of identifying additional organisms that may be of interest due to their possession of protein fragments in common with the initial set. Through the use of Cloud computing technology and distributed databases, our system is capable of scaling with the rapidly growing genome repositories, and consequently of keeping the resulting data sets up-to-date. The system described is also more generically applicable to the discovery of specific targets for other diagnostic approaches such as DNA probes, PCR primers and antibodies.

Subjects

Computational Biology/methods , Methicillin-Resistant Staphylococcus aureus/genetics , Staphylococcal Infections/microbiology , Algorithms , Automation , Bacterial Proteins/genetics , Computer Communication Networks , Computer Systems , DNA/chemistry , Epitopes/chemistry , Bacterial Genome , Ligands , Methicillin-Resistant Staphylococcus aureus/drug effects , Peptides/chemistry , RNA/chemistry , Staphylococcal Infections/diagnosis

12.

Probabilistic latent semantic analysis applied to whole bacterial genomes identifies common genomic features.

Rusakovica, Julija; Hallinan, Jennifer; Wipat, Anil; Zuliani, Paolo.

J Integr Bioinform ; 11(2): 243, 2014 Jun 30.

Article in English | MEDLINE | ID: mdl-24980693

ABSTRACT

The spread of drug resistance amongst clinically-important bacteria is a serious, and growing, problem [1]. However, the analysis of entire genomes requires considerable computational effort, usually including the assembly of the genome and subsequent identification of genes known to be important in pathology. An alternative approach is to use computational algorithms to identify genomic differences between pathogenic and non-pathogenic bacteria, even without knowing the biological meaning of those differences. To overcome this problem, a range of techniques for dimensionality reduction have been developed. One such approach is known as latent-variable models [2]. In latent-variable models, dimensionality reduction is achieved by representing high-dimensional data by a few hidden or latent variables, which are not directly observed but inferred from the observed variables present in the model. Probabilistic Latent Semantic Analysis (PLSA) is an extension of Latent Semantic Analysis (LSA) [3]. PLSA is based on a mixture decomposition derived from a latent class model. The main objective of the algorithm, as in LSA, is to represent high-dimensional co-occurrence information in a lower-dimensional way in order to discover the hidden semantic structure of the data using a probabilistic framework. In this work we applied the PLSA approach to analyse the common genomic features in methicillin-resistant Staphylococcus aureus, using tokens derived from amino acid sequences rather than DNA. We characterised genome-scale amino acid sequences in terms of their components, and then investigated the relationships between genomes and tokens and the phenotypes they generated. As a control we used the non-pathogenic model Gram-positive bacterium Bacillus subtilis.

Subjects

Bacillus subtilis/genetics , Computational Biology/methods , Bacterial Genome , Genomics , Methicillin-Resistant Staphylococcus aureus/genetics , Algorithms , Automated Pattern Recognition , Phenotype , Probability , Semantics , Software

13.

BacillusRegNet: a transcriptional regulation database and analysis platform for Bacillus species.

Misirli, Goksel; Hallinan, Jennifer; Röttger, Richard; Baumbach, Jan; Wipat, Anil.

J Integr Bioinform ; 11(2): 244, 2014 Jul 08.

Article in English | MEDLINE | ID: mdl-25001169

ABSTRACT

As high-throughput technologies become cheaper and easier to use, raw sequence data and corresponding annotations for many organisms are becoming available. However, sequence data alone is not sufficient to explain the biological behaviour of organisms, which arises largely from complex molecular interactions. There is a need to develop new platform technologies that can be applied to the investigation of whole-genome datasets in an efficient and cost-effective manner. One such approach is the transfer of existing knowledge from well-studied organisms to closely-related organisms. In this paper, we describe a system, BacillusRegNet, for the use of a model organism, Bacillus subtilis, to infer genome-wide regulatory networks in less well-studied close relatives. The putative transcription factors, their binding sequences and predicted promoter sequences along with annotations are available from the associated BacillusRegNet website (http://bacillus.ncl.ac.uk).

Subjects

Bacillus subtilis/genetics , Computational Biology/methods , Gene Regulatory Networks , High-Throughput Nucleotide Sequencing/methods , Algorithms , Bacillus subtilis/metabolism , Binding Sites , Computer Simulation , Computer Systems , Genetic Databases , Protein Databases , Bacterial Gene Expression Regulation , Bacterial Genome , Genetic Promoter Regions , Software , Systems Biology , Transcription Factors/metabolism , Genetic Transcription

14.

The Synthetic Biology Open Language (SBOL) provides a community standard for communicating designs in synthetic biology.

Galdzicki, Michal; Clancy, Kevin P; Oberortner, Ernst; Pocock, Matthew; Quinn, Jacqueline Y; Rodriguez, Cesar A; Roehner, Nicholas; Wilson, Mandy L; Adam, Laura; Anderson, J Christopher; Bartley, Bryan A; Beal, Jacob; Chandran, Deepak; Chen, Joanna; Densmore, Douglas; Endy, Drew; Grünberg, Raik; Hallinan, Jennifer; Hillson, Nathan J; Johnson, Jeffrey D; Kuchinsky, Allan; Lux, Matthew; Misirli, Goksel; Peccoud, Jean; Plahar, Hector A; Sirin, Evren; Stan, Guy-Bart; Villalobos, Alan; Wipat, Anil; Gennari, John H; Myers, Chris J; Sauro, Herbert M.

Nat Biotechnol ; 32(6): 545-50, 2014 Jun.

Article in English | MEDLINE | ID: mdl-24911500

ABSTRACT

The re-use of previously validated designs is critical to the evolution of synthetic biology from a research discipline to an engineering practice. Here we describe the Synthetic Biology Open Language (SBOL), a proposed data standard for exchanging designs within the synthetic biology community. SBOL represents synthetic biology designs in a community-driven, formalized format for exchange between software tools, research groups and commercial service providers. The SBOL Developers Group has implemented SBOL as an XML/RDF serialization and provides software libraries and specification documentation to help developers implement SBOL in their own software. We describe early successes, including a demonstration of the utility of SBOL for information exchange between several different software tools and repositories from both academic and industrial partners. As a community-driven standard, SBOL will be updated as synthetic biology evolves to provide specific capabilities for different aspects of the synthetic biology workflow.

Subjects

Information Dissemination/methods , Research Design/standards , Software/standards , Synthetic Biology/standards , Terminology as Topic , Controlled Vocabulary , Internationality , Reference Standards

15.

BacillOndex: an integrated data resource for systems and synthetic biology.

Misirli, Goksel; Wipat, Anil; Mullen, Joseph; James, Katherine; Pocock, Matthew; Smith, Wendy; Allenby, Nick; Hallinan, Jennifer S.

J Integr Bioinform ; 10(2): 224, 2013 Apr 10.

Article in English | MEDLINE | ID: mdl-23571273

ABSTRACT

BacillOndex is an extension of the Ondex data integration system, providing a semantically annotated, integrated knowledge base for the model Gram-positive bacterium Bacillus subtilis. This application allows a user to mine a variety of B. subtilis data sources, and analyse the resulting integrated dataset, which contains data about genes, gene products and their interactions. The data can be analysed either manually, by browsing using Ondex, or computationally via a Web services interface. We describe the process of creating a BacillOndex instance, and describe the use of the system for the analysis of single nucleotide polymorphisms in B. subtilis Marburg. The Marburg strain is the progenitor of the widely-used laboratory strain B. subtilis 168. We identified 27 SNPs with predictable phenotypic effects, including genetic traits for known phenotypes. We conclude that BacillOndex is a valuable tool for the systems-level investigation of, and hypothesis generation about, this important biotechnology workhorse. Such understanding contributes to our ability to construct synthetic genetic circuits in this organism.

Subjects

Bacillus subtilis/genetics , Genetic Databases , Synthetic Biology , Systems Biology , Phenotype , Single Nucleotide Polymorphism/genetics , Software

16.

Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud.

Flanagan, Keith; Nakjang, Sirintra; Hallinan, Jennifer; Harwood, Colin; Hirt, Robert P; Pocock, Matthew R; Wipat, Anil.

J Integr Bioinform ; 9(2): 212, 2012 Sep 24.

Article in English | MEDLINE | ID: mdl-23001322

ABSTRACT

As bioinformatics datasets grow ever larger, and analyses become increasingly complex, there is a need for data handling infrastructures to keep pace with developing technology. One solution is to apply Grid and Cloud technologies to address the computational requirements of analysing high throughput datasets. We present an approach for writing new, or wrapping existing applications, and a reference implementation of a framework, Microbase2.0, for executing those applications using Grid and Cloud technologies. We used Microbase2.0 to develop an automated Cloud-based bioinformatics workflow executing simultaneously on two different Amazon EC2 data centres and the Newcastle University Condor Grid. Several CPU years' worth of computational work was performed by this system in less than two months. The workflow produced a detailed dataset characterising the cellular localisation of 3,021,490 proteins from 867 taxa, including bacteria, archaea and unicellular eukaryotes. Microbase2.0 is freely available from http://www.microbase.org.uk/.

Subjects

Computational Biology/methods; Software; Databases, Factual; User-Computer Interface; Workflow

17.

Bayesian integration of networks without gold standards.

Weile, Jochen; James, Katherine; Hallinan, Jennifer; Cockell, Simon J; Lord, Phillip; Wipat, Anil; Wilkinson, Darren J.

Bioinformatics ; 28(11): 1495-500, 2012 Jun 01.

Article in English | MEDLINE | ID: mdl-22492647

ABSTRACT

MOTIVATION: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality 'gold standard' reference networks, but such reference networks are not always available. RESULTS: We present a novel algorithm for computing the probability of network interactions that operates without gold standard reference data. We show that our algorithm outperforms existing gold standard-based methods. Finally, we apply the new algorithm to a large collection of genetic interaction and protein-protein interaction experiments. AVAILABILITY: The integrated dataset and a reference implementation of the algorithm as a plug-in for the Ondex data integration framework are available for download at http://bio-nexus.ncl.ac.uk/projects/nogold/

Subjects

Algorithms; Bayes Theorem; Epistasis, Genetic; Protein Interaction Mapping/standards; Likelihood Functions; Protein Interaction Mapping/methods; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism

18.

Is newer better?--evaluating the effects of data curation on integrated analyses in Saccharomyces cerevisiae.

James, Katherine; Wipat, Anil; Hallinan, Jennifer.

Integr Biol (Camb) ; 4(7): 715-27, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22526920

ABSTRACT

Recent high-throughput experiments have produced a wealth of heterogeneous datasets, each of which provides information about different aspects of the cell. Consequently, integration of diverse data types is essential in order to address many biological questions. The quality of any integrated analysis system is dependent upon the quality of its component data, and upon the Gold Standard data used to evaluate it. It is commonly assumed that the quality of data improves as databases grow and change, particularly for manually curated databases. However, the validity of this assumption can be questioned, given the constant changes in the data coupled with the high level of noise associated with high-throughput experimental techniques. One of the most powerful approaches to data integration is the use of Probabilistic Functional Integrated Networks (PFINs). Here, we systematically analyse the changes in four highly-curated and widely-used online databases and evaluate the extent to which these changes affect the protein function prediction performance of PFINs in the yeast Saccharomyces cerevisiae. We find that the global trend in network performance improves over time. Where individual areas of biology are concerned, however, the most recent files do not always produce the best results. Individual datasets have unique biases towards different biological processes and by selecting and integrating relevant datasets performance can be improved. When using any type of integrated system to answer a specific biological question careful selection of raw data and Gold Standard is vital, since the most recent data may not be the most appropriate.

Subjects

Computational Biology/methods; Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/physiology; Algorithms; Area Under Curve; DNA Repair; Data Interpretation, Statistical; Databases, Protein; False Positive Reactions; Models, Statistical; Protein Interaction Mapping/methods; Reproducibility of Results; Sensitivity and Specificity

19.

Keratinocyte apoptosis in epidermal remodeling and clearance of psoriasis induced by UV radiation.

Weatherhead, Sophie C; Farr, Peter M; Jamieson, David; Hallinan, Jennifer S; Lloyd, James J; Wipat, Anil; Reynolds, Nick J.

J Invest Dermatol ; 131(9): 1916-26, 2011 Sep.

Article in English | MEDLINE | ID: mdl-21614017

ABSTRACT

Psoriasis is a common chronic skin disorder, but the mechanisms involved in the resolution and clearance of plaques remain poorly defined. We investigated the mechanism of action of UVB, which is highly effective in clearing psoriasis and inducing remission, and tested the hypothesis that apoptosis is a key mechanism. To distinguish bystander effects, equal erythemal doses of two UVB wavelengths were compared following in vivo irradiation of psoriatic plaques; one is clinically effective (311 nm) and one has no therapeutic effect on psoriasis (290 nm). Only 311 nm UVB induced significant apoptosis in lesional epidermis, and most apoptotic cells were keratinocytes. To determine clinical relevance, we created a computational model of psoriatic epidermis. Modeling predicted apoptosis would occur in both stem and transit-amplifying cells to account for plaque clearance; this was confirmed and quantified experimentally. The median rate of keratinocyte apoptosis from onset to cell death was 20 minutes. These data were fed back into the model and demonstrated that the observed level of keratinocyte apoptosis was sufficient to explain UVB-induced plaque resolution. Our human studies combined with a systems biology approach demonstrate that keratinocyte apoptosis is a key mechanism in psoriatic plaque clearance, providing the basis for future molecular investigation and therapeutic development.

Subjects

Apoptosis/radiation effects; Keratinocytes/radiation effects; Psoriasis/pathology; Psoriasis/radiotherapy; Ultraviolet Therapy/methods; Adult; Biopsy; Cell Division/radiation effects; Cells, Cultured; Dermis/pathology; Dermis/radiation effects; Dose-Response Relationship, Radiation; Epidermis/pathology; Epidermis/radiation effects; Female; Humans; Keratinocytes/cytology; Keratinocytes/pathology; Male; Models, Biological; Treatment Outcome

20.

Customizable views on semantically integrated networks for systems biology.

Weile, Jochen; Pocock, Matthew; Cockell, Simon J; Lord, Phillip; Dewar, James M; Holstein, Eva-Maria; Wilkinson, Darren; Lydall, David; Hallinan, Jennifer; Wipat, Anil.

Bioinformatics ; 27(9): 1299-306, 2011 May 01.

Article in English | MEDLINE | ID: mdl-21414991

ABSTRACT

MOTIVATION: The rise of high-throughput technologies in the post-genomic era has led to the production of large amounts of biological data. Many of these datasets are freely available on the Internet. Making optimal use of these data is a significant challenge for bioinformaticians. Various strategies for integrating data have been proposed to address this challenge. One of the most promising approaches is the development of semantically rich integrated datasets. Although well suited to computational manipulation, such integrated datasets are typically too large and complex for easy visualization and interactive exploration. RESULTS: We have created an integrated dataset for Saccharomyces cerevisiae using the semantic data integration tool Ondex, and have developed a view-based visualization technique that allows for concise graphical representations of the integrated data. The technique was implemented in a plug-in for Cytoscape, called OndexView. We used OndexView to investigate telomere maintenance in S. cerevisiae. AVAILABILITY: The Ondex yeast dataset and the OndexView plug-in for Cytoscape are accessible at http://bsu.ncl.ac.uk/ondexview.

Subjects

Computational Biology/methods; Databases, Genetic; Information Storage and Retrieval/methods; Systems Biology/methods; Internet; Saccharomyces cerevisiae/genetics; Telomere/genetics
