Search | VHL Regional Portal

Formalizing biomedical concepts from textual definitions.

Petrova, Alina; Ma, Yue; Tsatsaronis, George; Kissa, Maria; Distel, Felix; Baader, Franz; Schroeder, Michael.

J Biomed Semantics ; 6: 22, 2015.

Article in English | MEDLINE | ID: mdl-25949785

ABSTRACT

BACKGROUND: Ontologies play a major role in life sciences, enabling a number of applications, from new data integration to knowledge verification. SNOMED CT is a large medical ontology that is formally defined, so that it ensures global consistency and supports complex reasoning tasks. Most biomedical ontologies and taxonomies, on the other hand, define concepts only textually, without the use of logic. Here, we investigate how to automatically generate formal concept definitions from textual ones. We develop a method that uses machine learning in combination with several types of lexical and semantic features and outputs formal definitions that follow the structure of SNOMED CT concept definitions.
RESULTS: We evaluate our method on three benchmarks and test both the underlying relation extraction component and the overall quality of the output concept definitions. In addition, we provide an analysis of the following aspects: (1) How do definitions mined from the Web and literature differ from the ones mined from manually created definitions, e.g., MeSH? (2) How do different feature representations, e.g., the restrictions of relations' domain and range, affect the quality of the generated definitions? (3) How do different machine learning algorithms compare to each other for the task of formal definition generation? (4) What is the influence of the learning data size on the task? We discuss all of these settings in detail and show that the suggested approach can achieve success rates of over 90%. In addition, the results show that the choice of corpora, lexical features, learning algorithm and data size does not affect the performance as strongly as semantic types do. Semantic types limit the domain and range of a predicted relation, and as long as relations' domain and range pairs do not overlap, this information is most valuable in formalizing textual definitions.
CONCLUSIONS: The analysis presented in this manuscript implies that automated methods can provide a valuable contribution to the formalization of biomedical knowledge, thus paving the way for future applications that go beyond retrieval and into complex reasoning. The method is implemented and accessible to the public from: https://github.com/alifahsyamsiyah/learningDL.
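
The role of semantic types described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the relation names, the semantic type labels, and the function `admissible_relations` are hypothetical placeholders chosen for illustration. The sketch only shows why non-overlapping domain and range pairs are so informative: when each pair of types maps to a single relation, the types alone identify the relation.

```python
from typing import Dict, List, Tuple

# Hypothetical relation signatures: relation -> (domain type, range type).
# These names and types are illustrative assumptions, not taken from the paper.
RELATION_SIGNATURES: Dict[str, Tuple[str, str]] = {
    "finding_site": ("Disorder", "BodyStructure"),
    "causative_agent": ("Disorder", "Organism"),
    "associated_morphology": ("Disorder", "MorphologicAbnormality"),
}


def admissible_relations(domain_type: str, range_type: str) -> List[str]:
    """Return all relations whose domain and range match the given semantic types."""
    return sorted(
        name
        for name, (dom, rng) in RELATION_SIGNATURES.items()
        if dom == domain_type and rng == range_type
    )


if __name__ == "__main__":
    # With non-overlapping signatures, a type pair selects at most one relation.
    print(admissible_relations("Disorder", "Organism"))
    # A type pair that matches no signature rules out every relation.
    print(admissible_relations("Organism", "Disorder"))
```

In a full system such a filter would presumably be combined with a learned classifier over lexical features; here it stands alone to isolate the effect of the type restriction.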

Prediction of drug gene associations via ontological profile similarity with application to drug repositioning.

Kissa, Maria; Tsatsaronis, George; Schroeder, Michael.

Methods ; 74: 71-82, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25498216

ABSTRACT

The amount of biomedical literature has been increasing rapidly during the last decade. Text mining techniques can harness this large-scale data, shed light on complex drug mechanisms, and extract relation information that can support computational polypharmacology. In this work, we introduce a fully corpus-based and unsupervised method which utilizes the MEDLINE-indexed titles and abstracts to infer drug gene associations and assist drug repositioning. The method measures the Pointwise Mutual Information (PMI) between biomedical terms derived from the Gene Ontology and the Medical Subject Headings. Based on the PMI scores, drug and gene profiles are generated, and candidate drug gene associations are inferred by computing the relatedness of their profiles. Results show that an Area Under the Curve (AUC) of up to 0.88 can be achieved. The method can successfully identify direct drug gene associations with high precision and prioritize them. Validation shows that the statistically derived profiles from literature perform as well as manually curated profiles. In addition, we examine the potential application of our approach to drug repositioning. For all FDA-approved drugs repositioned over the last 5 years, we generate profiles from publications before 2009 and show that new indications rank high in the profiles. In summary, literature-mined profiles can accurately predict drug gene associations and provide insights into potential repositioning cases.

Subject(s)

Data Mining/methods, Drug Repositioning/methods, Gene Ontology, Genetic Association Studies/methods, Pharmaceutical Preparations, Pharmacogenetics/methods, Forecasting, Humans
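
The PMI-based profiles mentioned in the abstract of this record can be sketched as follows. This is a minimal illustration under stated assumptions, not the published method: PMI is computed from document co-occurrence counts using the standard definition, and cosine similarity is assumed as the relatedness measure between profiles, since the abstract does not name one. The function names `pmi` and `cosine` are hypothetical.

```python
import math
from typing import Dict


def pmi(n_xy: int, n_x: int, n_y: int, n_docs: int) -> float:
    """Pointwise Mutual Information from document counts.

    n_xy: documents mentioning both terms; n_x, n_y: documents mentioning each
    term; n_docs: corpus size. Returns log2(p(x, y) / (p(x) * p(y))).
    """
    if n_xy == 0:
        return float("-inf")  # the terms never co-occur
    return math.log2((n_xy * n_docs) / (n_x * n_y))


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse profiles (term -> PMI score).

    Assumption: cosine is used here only as one plausible relatedness measure.
    """
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy profiles over made-up ontology terms; the scores are invented.
    drug_profile = {"term_1": 2.0, "term_2": 1.0}
    gene_profile = {"term_1": 1.5, "term_3": 0.5}
    print(pmi(5, 10, 50, 100), cosine(drug_profile, gene_profile))
```

A drug and a gene whose profiles share highly scored terms receive a high similarity, which is the intuition behind ranking candidate associations.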

SAFE Software and FED Database to Uncover Protein-Protein Interactions using Gene Fusion Analysis.

Tsagrasoulis, Dimosthenis; Danos, Vasilis; Kissa, Maria; Trimpalis, Philip; Koumandou, V Lila; Karagouni, Amalia D; Tsakalidis, Athanasios; Kossida, Sophia.

Evol Bioinform Online ; 8: 47-60, 2012.

Article in English | MEDLINE | ID: mdl-22267904

ABSTRACT

Domain Fusion Analysis takes advantage of the fact that certain proteins in a given proteome A are found to have statistically significant similarity with two separate proteins in another proteome B. In other words, the result of a fusion event between two separate proteins in proteome B is a specific full-length protein in proteome A. In such a case, it can be safely concluded that the protein pair has a common biological function or even interacts physically. In this paper, we present the Fusion Events Database (FED), a database for the maintenance and retrieval of fusion data in both prokaryotic and eukaryotic organisms, and the Software for the Analysis of Fusion Events (SAFE), a computational platform implemented for the automated detection, filtering and visualization of fusion events (both available at: http://www.bioacademy.gr/bioinformatics/projects/ProteinFusion/index.htm). Finally, we analyze the proteomes of three microorganisms using these tools in order to demonstrate their functionality.
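
The detection idea described in the abstract above can be sketched in a few lines. This is not the SAFE implementation: the `Hit` record, the function `fusion_candidates`, and the requirement that the two aligned regions on the proteome A protein do not overlap are assumptions made for illustration. The input is taken to be a list of similarity hits that have already passed a significance filter.

```python
from itertools import combinations
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    query: str    # protein in proteome A
    subject: str  # similar protein in proteome B
    q_start: int  # start of the aligned region on the query (inclusive)
    q_end: int    # end of the aligned region on the query (inclusive)


def fusion_candidates(hits: List[Hit]) -> List[Tuple[str, str, str]]:
    """Return (A protein, B protein 1, B protein 2) triples.

    A triple is reported when the A protein is similar to two different
    B proteins over non-overlapping regions (an assumed filtering criterion).
    """
    by_query: Dict[str, List[Hit]] = {}
    for hit in hits:
        by_query.setdefault(hit.query, []).append(hit)

    found = set()
    for query, query_hits in by_query.items():
        for h1, h2 in combinations(query_hits, 2):
            if h1.subject == h2.subject:
                continue  # both hits point to the same B protein
            if h1.q_end < h2.q_start or h2.q_end < h1.q_start:
                first, second = sorted((h1.subject, h2.subject))
                found.add((query, first, second))
    return sorted(found)


if __name__ == "__main__":
    example = [
        Hit("A1", "B1", 1, 100),
        Hit("A1", "B2", 120, 250),  # disjoint from the B1 region: candidate
        Hit("A2", "B3", 1, 80),
        Hit("A2", "B4", 50, 150),   # overlaps the B3 region: not reported
    ]
    print(fusion_candidates(example))
```

The non-overlap check is one simple way to separate a composite protein from a protein that merely resembles two paralogs; real pipelines typically apply further filters.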
