1.
Entropy (Basel) ; 24(9)2022 Sep 02.
Article in English | MEDLINE | ID: mdl-36141120

ABSTRACT

In this study, we focus on mixed data, which are either observations of univariate random variables that can be quantitative or qualitative, or observations of multivariate random variables in which each variable can include both quantitative and qualitative components. We first propose a novel method, called CMIh, to estimate conditional mutual information, taking advantage of previously proposed approaches for qualitative and quantitative data. We then introduce a new local permutation test, called LocAT (for local adaptive test), which is well adapted to mixed data. Our experiments illustrate the good behaviour of CMIh and LocAT, and show their respective abilities to accurately estimate conditional mutual information and to detect conditional (in)dependence for mixed data.
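
For orientation, the quantity being estimated can be illustrated on purely qualitative data. The sketch below is not CMIh, whose details the abstract does not give; it is only the standard plug-in (frequency-count) estimate of I(X; Y | Z) for discrete samples, and the function name and input layout are assumptions made for this example.

```python
from collections import Counter
from math import log


def conditional_mutual_information(samples):
    """Plug-in estimate of I(X; Y | Z) in nats.

    `samples` is a list of (x, y, z) tuples of hashable, discrete values.
    This is the textbook frequency estimator, not the CMIh method.
    """
    n = len(samples)
    c_xyz = Counter(samples)
    c_xz = Counter((x, z) for x, _, z in samples)
    c_yz = Counter((y, z) for _, y, z in samples)
    c_z = Counter(z for _, _, z in samples)

    cmi = 0.0
    for (x, y, z), n_xyz in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ];
        # the 1/n factors cancel inside the logarithm.
        ratio = (c_z[z] * n_xyz) / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (n_xyz / n) * log(ratio)
    return cmi
```

With samples in which X always equals Y and both are uniform on two values within each value of Z, the estimate is log 2; with all eight combinations of binary (x, y, z) observed once, it is 0.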

2.
Entropy (Basel) ; 24(8)2022 Aug 19.
Article in English | MEDLINE | ID: mdl-36010820

ABSTRACT

This study addresses the problem of learning a summary causal graph on time series with potentially different sampling rates. To do so, we first propose a new causal temporal mutual information measure for time series. We then show how this measure relates to an entropy reduction principle that can be seen as a special case of the probability raising principle. We finally combine these two ingredients in PC-like and FCI-like algorithms to construct the summary causal graph. These algorithms are evaluated on several datasets, which shows both their efficacy and efficiency.

3.
IEEE Trans Cybern ; 52(4): 2059-2069, 2022 Apr.
Article in English | MEDLINE | ID: mdl-32697727

ABSTRACT

Metric learning has been successful in learning new metrics adapted to numerical datasets. However, its development for categorical data still needs further exploration. In this article, we propose a method, called CPML for categorical projected metric learning, which tries to efficiently (i.e., with less computational time and better prediction accuracy) address the problem of metric learning in categorical data. We make use of the value distance metric to represent our data and propose new distances based on this representation. We then show how to efficiently learn new metrics. We also generalize several previous regularizers through the Schatten p-norm and provide a generalization bound for it that complements the standard generalization bound for metric learning. The experimental results show that our method provides state-of-the-art results while being faster.


Subject(s)
Algorithms
4.
BMC Bioinformatics ; 16: 138, 2015 Apr 30.
Article in English | MEDLINE | ID: mdl-25925131

ABSTRACT

BACKGROUND: This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. RESULTS: The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PUBMED documents with MESH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performing consistently better than the MTI indexer used by NLM to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe, and participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BIOASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. CONCLUSIONS: A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, relevant articles and snippets from PUBMED Central; produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a one of the systems performed consistently better than the NLM's MTI indexer. In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.


Subject(s)
Abstracting and Indexing/methods , Algorithms , Medical Subject Headings , Natural Language Processing , PubMed , Semantics , Software , Humans , National Library of Medicine (U.S.) , United States
5.
Int J Med Inform ; 74(2-4): 317-24, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15694638

ABSTRACT

Bio-medical knowledge bases are valuable resources for the research community. Original scientific publications are the main source used to annotate them. Medical annotation in Swiss-Prot is specifically targeted at finding and extracting data about human genetic diseases and polymorphisms. Curators have to scan through hundreds of publications to select the relevant ones. This workload can be greatly reduced by using bio-text mining techniques. Using a combination of natural language processing (NLP) techniques and statistical classifiers, we achieve recall points of up to 84% on the potentially interesting documents and a precision of more than 96% in detecting irrelevant documents. Careful analysis of the document pre-processing chain allows us to measure the impact of some steps on the overall result, as well as test different classifier configurations. The best combination was used to create a prototype of a search and classification tool that is currently being tested by the database curators.
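
The two figures quoted above are computed for different classes: recall on the relevant documents and precision on the irrelevant ones. A minimal sketch of per-class precision and recall is given here to make that distinction concrete; the label strings and the function name are hypothetical and are not taken from the article.

```python
def precision_recall(gold, predicted, target):
    """Precision and recall for one class label.

    `gold` and `predicted` are equal-length sequences of labels;
    `target` is the class being scored (for example "relevant").
    """
    true_pos = sum(1 for g, p in zip(gold, predicted) if g == target and p == target)
    predicted_pos = sum(1 for p in predicted if p == target)
    actual_pos = sum(1 for g in gold if g == target)

    # Guard against division by zero when a class is never predicted or never present.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall
```

Calling the function once with the relevant label and once with the irrelevant label yields the two kinds of scores the abstract reports.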


Subject(s)
Databases, Protein , Statistics as Topic , Genetic Diseases, Inborn/genetics , Humans , Polymorphism, Genetic
6.
Stud Health Technol Inform ; 95: 421-6, 2003.
Article in English | MEDLINE | ID: mdl-14664023

ABSTRACT

The goal of medical annotation of human proteins in Swiss-Prot is to add features specifically intended for researchers working on genetic diseases and polymorphisms. For this purpose, it is necessary to search through a vast number of publications containing relevant information. Promising results have been obtained by applying natural language processing and machine learning techniques to solve this problem. By using the Probabilistic Latent Categorizer on representative query sets, 69% recall and 59% precision were achieved for relevant documents. This classifier also rejected irrelevant abstracts with more than 96% precision. Better linguistic pre-processing of source documents can further improve such a computer approach.


Subject(s)
Databases, Protein , Information Storage and Retrieval/statistics & numerical data , Probability , Switzerland
7.
Bioinformatics ; 19 Suppl 1: i91-4, 2003.
Article in English | MEDLINE | ID: mdl-12855443

ABSTRACT

MOTIVATION: Searching for relevant publications for manual database annotation is a tedious task. In this paper, we apply a combination of Natural Language Processing (NLP) and probabilistic classification to re-rank documents returned by PubMed according to their relevance to Swiss-Prot annotation, and to identify significant terms in the documents. RESULTS: With a Probabilistic Latent Categoriser (PLC) we obtained 69% recall and 59% precision for relevant documents in a representative query. As the PLC technique provides the relative contribution of each term to the final document score, we used the Kullback-Leibler symmetric divergence to determine the most discriminating words for Swiss-Prot medical annotation. This information should allow curators to understand classification results better. It also has great value for fine-tuning the linguistic pre-processing of documents, which in turn can improve the overall classifier performance.
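
One way to read the term-ranking step is through the per-term summands of the symmetric Kullback-Leibler divergence between the term distributions of two document classes. The sketch below illustrates that idea only; the abstract does not say how the distributions were obtained from the PLC model, so the add-one smoothing, the tokenised-document input, and both function names are assumptions.

```python
from collections import Counter
from math import log


def term_distribution(docs, vocabulary, alpha=1.0):
    """Smoothed term probabilities over `vocabulary` for a list of token lists.

    `alpha` is an additive smoothing constant (an assumption of this sketch)
    that keeps every probability strictly positive.
    """
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def symmetric_kl_contributions(p, q):
    """Per-term summands of KL(p || q) + KL(q || p).

    Each summand (p_w - q_w) * log(p_w / q_w) is non-negative, so terms can
    be ranked by how much they separate the two distributions.
    """
    return {w: (p[w] - q[w]) * log(p[w] / q[w]) for w in p}
```

Sorting the returned dictionary by value in descending order gives a list of the terms that most separate the two classes under these assumptions.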


Subject(s)
Abstracting and Indexing/methods , Databases, Protein , Models, Statistical , Natural Language Processing , Periodicals as Topic/classification , Proteins/chemistry , PubMed , Algorithms , Artificial Intelligence , Documentation/methods , Pattern Recognition, Automated , Proteins/genetics