Search | BVS Regional Portal

Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses.

Koca, Mehmet Burak; Nourani, Esmaeil; Abbasoglu, Ferda; Karadeniz, Ilknur; Sevilgen, Fatih Erdogan.

Comput Biol Chem ; 101: 107755, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36037723

ABSTRACT

Computational identification of human-virus protein-protein interactions (PHIs) is a worthwhile step towards understanding infection mechanisms. Analysis of PHI networks is important for the determination of pathogenic diseases. Prediction of these interactions is a popular problem since experimental detection of PHIs is both time-consuming and expensive. The available methods use biological features such as amino acid sequences, molecular structure, or biological activities for prediction. Recent studies show that the topological properties of proteins in protein-protein interaction (PPI) networks increase the performance of the predictions. Basic network projections, random-walk-based models, or graph neural networks are used for generating topologically enriched (hybrid) protein embeddings. In this study, we propose a three-stage machine learning pipeline that generates and uses hybrid embeddings for PHI prediction. In the first stage, numerical features are extracted from the amino acid sequences using the Doc2Vec and Byte Pair Encoding methods. In the second stage, these amino acid embeddings are used as node features while training a modified GraphSAGE model, which is an improved version of the graph convolutional network. Lastly, the hybrid protein embeddings are used for training a binary interaction classifier that predicts whether two given proteins interact. The proposed method is evaluated with comprehensive experiments to test its functionality and compare it with state-of-the-art methods. The experimental results on the benchmark dataset demonstrate the effectiveness of the proposed model, which achieves a 3-23% better area under the curve (AUC) score than its competitors.
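The three-stage pipeline described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the amino acid frequency vector is a simple stand-in for the Doc2Vec and Byte Pair Encoding features, the single mean-aggregation layer only approximates the general GraphSAGE idea (the paper's modifications are not specified here), the weights are random rather than trained, and the sequences, graph, and all function names are hypothetical toy choices. Stage three is shown only up to building the classifier input.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sequence_features(seq):
    """Stage 1 stand-in: amino acid frequencies (NOT Doc2Vec/BPE)."""
    counts = Counter(seq)
    total = max(len(seq), 1)
    return [counts[a] / total for a in AMINO_ACIDS]

def mean_vector(vectors, dim):
    """Element-wise mean of a list of vectors; zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def sage_layer(features, neighbors, weights):
    """Stage 2: one GraphSAGE-style layer with a mean aggregator.

    h_v = ReLU(W . concat(x_v, mean of x_u over neighbors u of v))
    """
    dim = len(next(iter(features.values())))
    out = {}
    for node, x in features.items():
        agg = mean_vector([features[u] for u in neighbors.get(node, [])], dim)
        concat = x + agg  # list concatenation: own features, then neighborhood mean
        out[node] = [max(0.0, sum(w * c for w, c in zip(row, concat)))
                     for row in weights]
    return out

def pair_features(embeddings, a, b):
    """Stage 3 input: concatenated embeddings of a candidate protein pair."""
    return embeddings[a] + embeddings[b]

# Toy data: made-up sequences and edges, for illustration only.
sequences = {"human_1": "MKTAYIAKQR", "human_2": "MKLVINGKTL", "virus_1": "MSDNGPQNQR"}
neighbors = {"human_1": ["human_2", "virus_1"],
             "human_2": ["human_1"],
             "virus_1": ["human_1"]}

rng = random.Random(0)  # untrained random weights; a real model would learn these
out_dim = 4
weights = [[rng.uniform(-1, 1) for _ in range(2 * len(AMINO_ACIDS))]
           for _ in range(out_dim)]

features = {p: sequence_features(s) for p, s in sequences.items()}
embeddings = sage_layer(features, neighbors, weights)
candidate = pair_features(embeddings, "human_2", "virus_1")
```

In a full pipeline, vectors such as `candidate` would be passed with interaction labels to a binary classifier; which classifier the authors used is not stated in the abstract.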

Subjects

Neural Networks, Computer; Viruses; Humans; Machine Learning; Proteins; Area Under Curve

Linking entities through an ontology using word embeddings and syntactic re-ranking.

Karadeniz, Ilknur; Özgür, Arzucan.

BMC Bioinformatics ; 20(1): 156, 2019 Mar 27.

Article in English | MEDLINE | ID: mdl-30917789

ABSTRACT

BACKGROUND: Although there is an enormous number of textual resources in the biomedical domain, currently, manually curated resources cover only a small part of the existing knowledge. The vast majority of this information is in unstructured form and contains nonstandard naming conventions. The task of named entity recognition, which is the identification of entity names from text, is not adequate without a standardization step. Linking each identified entity mention in text to an ontology/dictionary concept is an essential task to make sense of the identified entities. This paper presents an unsupervised approach for the linking of named entities to concepts in an ontology/dictionary. We propose an approach for the normalization of biomedical entities through an ontology/dictionary by using word embeddings to represent semantic spaces, and a syntactic parser to give higher weight to the most informative word in the named entity mentions. RESULTS: We applied the proposed method to two different normalization tasks: the normalization of bacteria biotope entities through the Onto-Biotope ontology and the normalization of adverse drug reaction entities through the Medical Dictionary for Regulatory Activities (MedDRA). The proposed method achieved a precision score of 65.9%, which is 2.9 percentage points above the state-of-the-art result on the BioNLP Shared Task 2016 Bacteria Biotope test data, and a macro-averaged precision score of 68.7% on the Text Analysis Conference 2017 Adverse Drug Reaction test data. CONCLUSIONS: The core contribution of this paper is a syntax-based way of combining the individual word vectors to form vectors for the named entity mentions and ontology concepts, which can then be used to measure the similarity between them. The proposed approach is unsupervised and does not require labeled data, making it easily applicable to different domains.
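The idea of combining word vectors with extra weight on the most informative word can be sketched in Python as follows. This is an illustrative assumption-laden example, not the paper's method: the three-dimensional vectors are invented toy values, the head word is simply taken to be the last token instead of being chosen by a syntactic parser, and `head_weight=2.0`, the concept labels, and all function names are hypothetical.

```python
import math

# Invented toy word vectors; real systems would load trained embeddings.
TOY_VECTORS = {
    "human": [0.9, 0.1, 0.0],
    "gut": [0.1, 0.9, 0.1],
    "intestine": [0.2, 0.8, 0.1],
    "soil": [0.0, 0.1, 0.9],
}
DIM = 3

def phrase_vector(tokens, head_index, head_weight=2.0):
    """Weighted average of word vectors; the head word counts more."""
    total = [0.0] * DIM
    weight_sum = 0.0
    for i, token in enumerate(tokens):
        vec = TOY_VECTORS.get(token)
        if vec is None:
            continue  # skip out-of-vocabulary words
        w = head_weight if i == head_index else 1.0
        total = [t + w * v for t, v in zip(total, vec)]
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [t / weight_sum for t in total]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def link(mention_tokens, concepts):
    """Return the concept label whose vector is most similar to the mention.

    Assumption: the head is the last token (a stand-in for a parser).
    """
    mention = phrase_vector(mention_tokens, len(mention_tokens) - 1)
    scores = {
        label: cosine(mention, phrase_vector(tokens, len(tokens) - 1))
        for label, tokens in concepts.items()
    }
    return max(scores, key=scores.get)

concepts = {"intestine": ["intestine"], "soil": ["soil"], "human": ["human"]}
best = link(["human", "gut"], concepts)
```

With these toy vectors, weighting the head word "gut" pulls the mention "human gut" towards the "intestine" concept instead of "human", which is the effect the abstract attributes to the syntactic weighting.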

Subjects

Data Mining; Semantics; Algorithms; Bacteria/metabolism; Drug-Related Side Effects and Adverse Reactions; Reference Standards; Software

Literature Mining and Ontology based Analysis of Host-Brucella Gene-Gene Interaction Network.

Karadeniz, Ilknur; Hur, Junguk; He, Yongqun; Özgür, Arzucan.

Front Microbiol ; 6: 1386, 2015.

Article in English | MEDLINE | ID: mdl-26696993

ABSTRACT

Brucella is an intracellular bacterium that causes chronic brucellosis in humans and various mammals. The identification of host-Brucella interaction is crucial to understand host immunity against Brucella infection and Brucella pathogenesis against host immune responses. Most of the information about the inter-species interactions between host and Brucella genes is only available in the text of the scientific publications. Many text-mining systems for extracting gene and protein interactions have been proposed. However, only a few of them have been designed by considering the peculiarities of host-pathogen interactions. In this paper, we used a text mining approach for extracting host-Brucella gene-gene interactions from the abstracts of articles in PubMed. The gene-gene interactions here represent the interactions between genes and/or gene products (e.g., proteins). The SciMiner tool, originally designed for detecting mammalian gene/protein names in text, was extended to identify host and Brucella gene/protein names in the abstracts. Next, sentence-level and abstract-level co-occurrence based approaches, as well as sentence-level machine learning based methods, originally designed for extracting intra-species gene interactions, were utilized to extract the interactions among the identified host and Brucella genes. The extracted interactions were manually evaluated. A total of 46 host-Brucella gene interactions were identified and represented as an interaction network. Twenty-four of these interactions were identified from sentence-level processing. Twenty-two additional interactions were identified when abstract-level processing was performed. The Interaction Network Ontology (INO) was used to represent the identified interaction types in a hierarchical ontology structure. Ontological modeling of specific gene-gene interactions demonstrates that host-pathogen gene-gene interactions occur under experimental conditions which can be ontologically represented. Our results show that the introduced literature mining and ontology-based modeling approach is effective in retrieving and analyzing host-pathogen gene-gene interaction networks.
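The difference between sentence-level and abstract-level co-occurrence can be shown with a short Python sketch. It is a simplified illustration, not the SciMiner-based system: gene names are matched by exact token lookup against small dictionaries, sentences are split with a naive regular expression, and the names `HOSTA`, `HOSTB`, and `BRUX` are hypothetical placeholders, not real genes.

```python
import re
from itertools import product

def find_mentions(text, names):
    """Return the dictionary names that appear as whole tokens in the text."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return {name for name in names if name in tokens}

def cooccurring_pairs(text, host_names, pathogen_names):
    """All (host, pathogen) name pairs mentioned together in the text."""
    hosts = find_mentions(text, host_names)
    pathogens = find_mentions(text, pathogen_names)
    return set(product(hosts, pathogens))

def extract(abstract, host_names, pathogen_names):
    """Return (sentence-level pairs, pairs found only at abstract level)."""
    # Naive sentence splitting on end punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", abstract) if s]
    sentence_level = set()
    for sentence in sentences:
        sentence_level |= cooccurring_pairs(sentence, host_names, pathogen_names)
    abstract_level = cooccurring_pairs(abstract, host_names, pathogen_names)
    return sentence_level, abstract_level - sentence_level

# Placeholder text and names, for illustration only.
toy_abstract = ("HOSTA expression rose after infection with a BRUX mutant. "
                "HOSTB was unchanged.")
in_sentence, abstract_only = extract(toy_abstract, {"HOSTA", "HOSTB"}, {"BRUX"})
```

Here `in_sentence` holds the pair found within a single sentence, and `abstract_only` holds the extra pair that appears only when the whole abstract is treated as one unit. Abstract-level pairs are noisier, which is consistent with the abstract's note that extracted interactions were manually evaluated.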

Detection and categorization of bacteria habitats using shallow linguistic analysis.

Karadeniz, Ilknur; Özgür, Arzucan.

BMC Bioinformatics ; 16 Suppl 10: S5, 2015.

Article in English | MEDLINE | ID: mdl-26201262

ABSTRACT

BACKGROUND: Information regarding bacteria biotopes is important for several research areas including health sciences, microbiology, and food processing and preservation. One of the challenges for scientists in these domains is the huge amount of information buried in the text of electronic resources. Developing methods to automatically extract bacteria habitat relations from the text of these electronic resources is crucial for facilitating research in these areas. METHODS: We introduce a linguistically motivated rule-based approach for recognizing and normalizing names of bacteria habitats in biomedical text by using an ontology. Our approach is based on the shallow syntactic analysis of the text, which includes sentence segmentation, part-of-speech (POS) tagging, partial parsing, and lemmatization. In addition, we propose two methods for identifying bacteria habitat localization relations. The underlying assumption for the first method is that discourse changes with a new paragraph. Therefore, it operates on a paragraph basis. The second method performs a more fine-grained analysis of the text and operates on a sentence basis. We also develop a novel anaphora resolution method for bacteria coreferences and incorporate it into the sentence-based relation extraction approach. RESULTS: We participated in the Bacteria Biotope (BB) Task of the BioNLP Shared Task 2013. Our system (Boun) achieved the second-best performance with 68% Slot Error Rate (SER) in Sub-task 1 (Entity Detection and Categorization), and ranked third with an F-score of 27% in Sub-task 2 (Localization Event Extraction). This paper reports the system that is implemented for the shared task, including the novel methods developed and the improvements obtained after the official evaluation. The extensions include the expansion of the OntoBiotope ontology using the training set for Sub-task 1, and the novel sentence-based relation extraction method incorporated with anaphora resolution for Sub-task 2. These extensions resulted in promising results for Sub-task 1 with a SER of 68%, and state-of-the-art performance for Sub-task 2 with an F-score of 53%. CONCLUSIONS: Our results show that a linguistically oriented approach based on the shallow syntactic analysis of the text is as effective as machine learning approaches for the detection and ontology-based normalization of habitat entities. Furthermore, the newly developed sentence-based relation extraction system with the anaphora resolution module significantly outperforms the paragraph-based one, as well as the other systems that participated in the BB Shared Task 2013.
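For readers unfamiliar with the two metrics reported in this abstract, the Python sketch below shows their commonly used definitions: SER as the sum of substitutions, deletions, and insertions divided by the number of reference slots (lower is better), and the F-score as the harmonic mean of precision and recall (higher is better). The exact matching rules of the shared task are not reproduced here, and the counts in the example are hypothetical, not the paper's data.

```python
def slot_error_rate(substitutions, deletions, insertions, reference_slots):
    """SER = (S + D + I) / N; lower is better and it can exceed 1.0."""
    if reference_slots <= 0:
        raise ValueError("reference_slots must be positive")
    return (substitutions + deletions + insertions) / reference_slots

def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only.
example_ser = slot_error_rate(substitutions=10, deletions=5, insertions=5,
                              reference_slots=100)
example_f = f_score(precision=0.5, recall=0.5)
```

Because SER counts errors, the 68% SER and the 53% F-score in the abstract point in opposite directions: a lower SER and a higher F-score both indicate a better system.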

Subjects

Bacteria/classification; Ecosystem; Environmental Microbiology; Linguistics; Natural Language Processing; Software; Bacteria/genetics; Humans; Machine Learning

PHISTO: pathogen-host interaction search tool.

Durmus Tekir, Saliha; Çakir, Tunahan; Ardiç, Emre; Sayilirbas, Ali Semih; Konuk, Gökhan; Konuk, Mithat; Sariyer, Hasret; Ugurlu, Azat; Karadeniz, Ilknur; Özgür, Arzucan; Sevilgen, Fatih Erdogan; Ülgen, Kutlu Ö.

Bioinformatics ; 29(10): 1357-8, 2013 May 15.

Article in English | MEDLINE | ID: mdl-23515528

ABSTRACT

SUMMARY: Knowledge of pathogen-host protein interactions is required to better understand infection mechanisms. The pathogen-host interaction search tool (PHISTO) is a web-accessible platform that provides relevant information about pathogen-host interactions (PHIs). It enables access to the most up-to-date PHI data for all pathogen types for which experimentally verified protein interactions with humans are available. The platform also offers integrated tools for visualization of PHI networks, graph-theoretical analysis of targeted human proteins, BLAST search, and text mining for detecting missing experimental methods. PHISTO will facilitate PHI studies that provide potential therapeutic targets for infectious diseases. AVAILABILITY: http://www.phisto.org. CONTACT: saliha.durmus@boun.edu.tr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Data Mining; Host-Pathogen Interactions; Search Engine; Communicable Diseases; Databases, Protein; Humans; Internet; Protein Interaction Domains and Motifs
