Results 1 - 9 of 9
1.
Neural Netw ; 170: 215-226, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37992509

ABSTRACT

This paper shows that text-only Language Models (LM) can learn to ground spatial relations like left of or below if they are provided with explicit location information of objects and they are properly trained to leverage those locations. We perform experiments on a verbalized version of the Visual Spatial Reasoning (VSR) dataset, where images are coupled with textual statements which contain real or fake spatial relations between two objects of the image. We verbalize the images using an off-the-shelf object detector, adding location tokens to every object label to represent their bounding boxes in textual form. Given the small size of VSR, we do not observe any improvement when using locations, but pretraining the LM over a synthetic dataset automatically derived by us improves results significantly when using location tokens. We thus show that locations allow LMs to ground spatial relations, with our text-only LMs outperforming Vision-and-Language Models and setting the new state-of-the-art for the VSR dataset. Our analysis shows that our text-only LMs can generalize beyond the relations seen in the synthetic dataset to some extent, learning also more useful information than that encoded in the spatial rules we used to create the synthetic dataset itself.


Subject(s)
Language , Learning , Problem Solving
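
The verbalization step in the abstract above (object labels followed by location tokens for their bounding boxes) might look roughly like the following sketch. The token format `<loc_N>`, the 32-bin quantization, and the names `Detection`, `location_tokens` and `verbalize` are illustrative assumptions; the abstract does not specify the format the authors actually used.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    box: tuple  # (x_min, y_min, x_max, y_max) in pixels; hypothetical layout


def location_tokens(box, width, height, bins=32):
    """Quantize the four box coordinates into `bins` buckets and render them as text."""
    x0, y0, x1, y1 = box

    def bucket(value, size):
        # Clamp so that a coordinate on the image border falls in the last bucket.
        return min(bins - 1, max(0, int(value / size * bins)))

    coords = (bucket(x0, width), bucket(y0, height),
              bucket(x1, width), bucket(y1, height))
    return " ".join(f"<loc_{c}>" for c in coords)


def verbalize(detections, width, height, bins=32):
    """Turn detector output into one text string a text-only LM can read."""
    return " ; ".join(
        f"{d.label} {location_tokens(d.box, width, height, bins)}"
        for d in detections
    )


# Hypothetical detector output for a 640x480 image.
scene = [
    Detection("cat", (0, 120, 160, 240)),
    Detection("sofa", (320, 0, 640, 480)),
]
text = verbalize(scene, width=640, height=480)
print(text)
```

Quantizing coordinates keeps the location vocabulary small, so the same tokens recur across images and the model can associate them with relations such as left of or below.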
2.
Neuroimage ; 273: 120072, 2023 06.
Article in English | MEDLINE | ID: mdl-37004829

ABSTRACT

Early research proposed that individuals with developmental dyslexia use contextual information to facilitate lexical access and compensate for phonological deficits. Yet at present there is no corroborating neuro-cognitive evidence. We explored this with a novel combination of magnetoencephalography (MEG), neural encoding and grey matter volume analyses. We analysed MEG data from 41 adult native Spanish speakers (14 with dyslexic symptoms) who passively listened to naturalistic sentences. We used multivariate Temporal Response Function analysis to capture online cortical tracking of both auditory (speech envelope) and contextual information. To compute contextual information tracking we used word-level Semantic Surprisal derived using a Transformer Neural Network language model. We related online information tracking to participants' reading scores and grey matter volumes within the reading-linked cortical network. We found that right hemisphere envelope tracking was related to better phonological decoding (pseudoword reading) for both groups, with dyslexic readers performing worse overall at this task. Consistently, grey matter volume in the superior temporal and bilateral inferior frontal areas increased with better envelope tracking abilities. Critically, for dyslexic readers only, stronger Semantic Surprisal tracking in the right hemisphere was related to better word reading. These findings further support the notion of a speech envelope tracking deficit in dyslexia and provide novel evidence for top-down semantic compensatory mechanisms.


Subject(s)
Dyslexia , Speech Perception , Adult , Humans , Reading , Speech , Semantics , Magnetoencephalography , Speech Perception/physiology
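
Word-level surprisal is conventionally the negative log probability of a word given its preceding context. The sketch below computes it from hand-picked, hypothetical probabilities rather than from a Transformer language model, and it may differ in detail from the Semantic Surprisal measure used in the study above.

```python
import math


def surprisal_bits(probability):
    """Surprisal in bits: -log2 P(word | context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Hypothetical P(word | preceding words) values; a real pipeline would take
# these from a language model's output distribution.
sentence = [("the", 0.5), ("dog", 0.125), ("barked", 0.25)]
surprisals = [(word, surprisal_bits(p)) for word, p in sentence]

for word, bits in surprisals:
    print(f"{word}: {bits:.1f} bits")
```

Less predictable words get higher values, which is what makes the per-word series usable as a regressor alongside the speech envelope.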
3.
Knowl Based Syst ; 240: 108072, 2022 Mar 15.
Article in English | MEDLINE | ID: mdl-35002094

ABSTRACT

Biosanitary experts around the world are directing their efforts towards the study of COVID-19. This effort generates a large volume of scientific publications at a speed that makes the effective acquisition of new knowledge difficult. Therefore, Information Systems are needed to assist biosanitary experts in accessing, consulting and analyzing these publications. In this work we develop a study of the variables involved in the development of a Question Answering system that receives a set of questions asked by experts about the disease COVID-19 and its causal virus SARS-CoV-2, and provides a ranked list of expert-level answers to each question. In particular, we address the interrelation of the Information Retrieval and the Answer Extraction steps. We found that recall-based document retrieval, which leaves the scanning of whole documents to a neural answer extraction module to find the best answer, is a better strategy than relying on precise passage retrieval before extracting the answer span.

4.
Cereb Cortex ; 31(9): 4092-4103, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33825884

ABSTRACT

Cortical circuits rely on the temporal regularities of speech to optimize signal parsing for sound-to-meaning mapping. Bottom-up speech analysis is accelerated by top-down predictions about upcoming words. In everyday communications, however, listeners are regularly presented with challenging input: fluctuations of speech rate or semantic content. In this study, we asked how reducing speech temporal regularity affects its processing: parsing, phonological analysis, and ability to generate context-based predictions. To ensure that spoken sentences were natural and approximated semantic constraints of spontaneous speech, we built a neural network to select stimuli from large corpora. We analyzed brain activity recorded with magnetoencephalography during sentence listening using evoked responses, speech-to-brain synchronization and representational similarity analysis. For normal speech, theta-band (6.5-8 Hz) speech-to-brain synchronization was increased and the left fronto-temporal areas generated stronger contextual predictions. The reverse was true for temporally irregular speech: weaker theta synchronization and reduced top-down effects. Interestingly, delta-band (0.5 Hz) speech tracking was greater when contextual/semantic predictions were lower or if speech was temporally jittered. We conclude that speech temporal regularity is relevant for (theta) syllabic tracking and robust semantic predictions while the joint support of temporal and contextual predictability reduces word and phrase-level cortical tracking (delta).


Subject(s)
Cerebral Cortex/physiology , Language , Speech Perception/physiology , Adaptation, Psychological/physiology , Adolescent , Adult , Anticipation, Psychological , Electroencephalography Phase Synchronization , Evoked Potentials , Female , Humans , Magnetoencephalography , Male , Middle Aged , Nerve Net/physiology , Speech/physiology , Theta Rhythm/physiology , Young Adult
5.
Artif Intell Rev ; 54(1): 755-810, 2021.
Article in English | MEDLINE | ID: mdl-33505103

ABSTRACT

In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation, in and of itself, is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods which allow a reduction in the involvement of human labour. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for the dialogue systems and then present the evaluation methods regarding that class.

6.
Data Brief ; 26: 104432, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31516953

ABSTRACT

This data article introduces a reproducibility dataset with the aim of allowing the exact replication of all experiments, results and data tables introduced in our companion paper (Lastra-Díaz et al., 2019), which introduces the largest experimental survey on ontology-based semantic similarity methods and Word Embeddings (WE) for word similarity reported in the literature. The implementation of all our experiments, as well as the gathering of all raw data derived from them, was based on the software implementation and evaluation of all methods in the HESML library (Lastra-Díaz et al., 2017), and their subsequent recording with Reprozip (Chirigati et al., 2016). Raw data is made up of a collection of data files gathering the raw word-similarity values returned by each method for each word pair evaluated in any benchmark. Raw data files were processed by running an R-language script with the aim of computing all evaluation metrics reported in (Lastra-Díaz et al., 2019), such as Pearson and Spearman correlation, harmonic score and statistical significance p-values, as well as automatically generating all data tables shown in our companion paper. Our dataset provides all input data files, resources and complementary software tools to reproduce from scratch all our experimental data, statistical analysis and reported data. Finally, our reproducibility dataset provides a self-contained experimentation platform which allows running new word similarity benchmarks by setting up new experiments including other unconsidered methods or word similarity benchmarks.
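
The evaluation metrics named above can be illustrated with a small sketch. The authors used an R script; Python is used here purely for illustration, the scores are made up, and the harmonic score is assumed to be the harmonic mean of the Pearson and Spearman correlations, since the abstract does not define it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


def harmonic_score(r, rho):
    """Assumed definition: harmonic mean of Pearson r and Spearman rho."""
    return 2 * r * rho / (r + rho)


# Made-up human judgements and method outputs for four word pairs.
human = [1.0, 2.0, 3.0, 4.0]
method = [0.1, 0.4, 0.35, 0.9]

r = pearson(human, method)
rho = spearman(human, method)
print(f"Pearson={r:.3f} Spearman={rho:.3f} harmonic={harmonic_score(r, rho):.3f}")
```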

7.
J Biomed Inform ; 51: 100-6, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24768598

ABSTRACT

OBJECTIVE: Most of the information in Electronic Health Records (EHRs) is represented in free textual form. Practitioners searching EHRs need to phrase their queries carefully, as the record might use synonyms or other related words. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves the results of a robust baseline when searching EHRs. MATERIALS AND METHODS: The method uses a graph representation of the lexical units, concepts and relations in the UMLS Metathesaurus. It is based on random walks over the graph, which start on the query terms. Random walks are a well-studied discipline in both Web and Knowledge Base datasets. RESULTS: Our experiments over the TREC Medical Record track show improvements in both the 2011 and 2012 datasets over a strong baseline. DISCUSSION: Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related in the UMLS Metathesaurus. The terms added in the expansion go beyond simple synonyms, and also add other kinds of topically related terms. CONCLUSIONS: Expansion of queries using related terms in the UMLS Metathesaurus beyond synonymy is an effective way to overcome the gap between query and document vocabularies when searching for patient cohorts.


Subject(s)
Artificial Intelligence , Data Interpretation, Statistical , Data Mining/methods , Electronic Health Records/organization & administration , Natural Language Processing , Pattern Recognition, Automated/methods , Unified Medical Language System , Computer Simulation , Models, Statistical
8.
J Am Med Inform Assoc ; 19(2): 235-40, 2012.
Article in English | MEDLINE | ID: mdl-21900701

ABSTRACT

OBJECTIVE: Current techniques for knowledge-based Word Sense Disambiguation (WSD) of ambiguous biomedical terms rely on relations in the Unified Medical Language System Metathesaurus but do not take into account the domain of the target documents. The authors' goal is to improve these methods by using information about the topic of the document in which the ambiguous term appears. DESIGN: The authors proposed and implemented several methods to extract lists of key terms associated with Medical Subject Heading terms. These key terms are used to represent the document topic in a knowledge-based WSD system. They are applied both alone and in combination with local context. MEASUREMENTS: A standard measure of accuracy was calculated over the set of target words in the widely used National Library of Medicine WSD dataset. RESULTS AND DISCUSSION: The authors report a significant improvement when combining those key terms with local context, showing that domain information improves the results of a WSD system based on the Unified Medical Language System Metathesaurus alone. The best results were obtained using key terms obtained by relevance feedback and weighted by inverse document frequency.


Subject(s)
Information Storage and Retrieval/methods , Medical Subject Headings , Natural Language Processing , Terminology as Topic , Unified Medical Language System , Knowledge Bases , Medical Informatics/methods
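
The inverse document frequency weighting mentioned in the results can be sketched as follows. The plain `log(N / df)` variant, the toy documents and the function name are assumptions made for illustration; the abstract does not say which IDF formula or corpus was used.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each term to log(N / df), where df counts documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in doc_freq.items()}


# Toy tokenized documents (hypothetical).
documents = [
    ["heart", "cold"],
    ["cold", "virus"],
    ["cold", "temperature"],
    ["heart", "attack"],
]
idf = inverse_document_frequency(documents)

# Rank candidate key terms so that rarer, more topic-specific ones come first.
key_terms = sorted(["cold", "heart", "virus"], key=lambda t: idf[t], reverse=True)
print(key_terms)
```

Terms that occur in most documents receive weights near zero, so they contribute little to the representation of the document topic.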
9.
Bioinformatics ; 26(22): 2889-96, 2010 Nov 15.
Article in English | MEDLINE | ID: mdl-20934991

ABSTRACT

MOTIVATION: Word Sense Disambiguation (WSD), automatically identifying the meaning of ambiguous words in context, is an important stage of text processing. This article presents a graph-based approach to WSD in the biomedical domain. The method is unsupervised and does not require any labeled training data. It makes use of knowledge from the Unified Medical Language System (UMLS) Metathesaurus which is represented as a graph. A state-of-the-art algorithm, Personalized PageRank, is used to perform WSD. RESULTS: When evaluated on the NLM-WSD dataset, the algorithm outperforms other methods that rely on the UMLS Metathesaurus alone. AVAILABILITY: The WSD system is open source licensed and available from http://ixa2.si.ehu.es/ukb/. The UMLS, MetaMap program and NLM-WSD corpus are available from the National Library of Medicine https://www.nlm.nih.gov/research/umls/, http://mmtx.nlm.nih.gov and http://wsd.nlm.nih.gov. Software to convert the NLM-WSD corpus into a format that can be used by our WSD system is available from http://www.dcs.shef.ac.uk/~marks/biomedical_wsd under open source license.


Subject(s)
Algorithms , Computational Biology/methods , Pattern Recognition, Automated/methods , Unified Medical Language System , Databases, Factual , Vocabulary, Controlled
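
A minimal sketch of Personalized PageRank applied to sense selection, assuming a tiny hand-made graph in place of the UMLS Metathesaurus. The node names, the damping factor of 0.85 and the fixed iteration count are illustrative assumptions, and this is not the code of the system released by the authors.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power iteration in which the random walk restarts only at seed nodes.

    `graph` maps each node to the list of its neighbours.
    """
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        updated = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            neighbours = graph[node]
            if not neighbours:
                continue  # no outgoing edges: nothing to distribute
            share = damping * rank[node] / len(neighbours)
            for neighbour in neighbours:
                updated[neighbour] += share
        rank = updated
    return rank


# Hypothetical undirected concept graph: two senses of "cold" plus related concepts.
graph = {
    "virus": ["cold#illness"],
    "fever": ["cold#illness", "temperature"],
    "cold#illness": ["virus", "fever"],
    "temperature": ["fever", "cold#temperature", "weather"],
    "cold#temperature": ["temperature"],
    "weather": ["temperature"],
}

# Context words of the ambiguous term act as restart nodes.
rank = personalized_pagerank(graph, seeds={"virus", "fever"})
senses = ["cold#illness", "cold#temperature"]
best = max(senses, key=lambda s: rank[s])
print(best, {s: round(rank[s], 3) for s in senses})
```

Restarting the walk at the context concepts concentrates probability mass on the senses that are closest to them in the graph, so the highest-ranked candidate sense is chosen without any labeled training data.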