1.
Front Artif Intell ; 6: 1225791, 2023.
Article in English | MEDLINE | ID: mdl-37899964

ABSTRACT

Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasizing the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step toward assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining the classification accuracy of a syntactic probe on the one hand and the models' behavior in a semantic application task on the other, with BERT, RoBERTa, and DeBERTa as the example PLMs. Our results show that all three investigated PLMs, as well as OPT, are able to recognize the structure of the CC but fail to use its meaning. While human-like performance of PLMs on many NLP tasks has been alleged, this indicates that PLMs still suffer from substantial shortcomings in central domains of linguistic knowledge.

2.
Proc Natl Acad Sci U S A ; 117(42): 25966-25974, 2020 Oct 20.
Article in English | MEDLINE | ID: mdl-32989131

ABSTRACT

Language is crucial for human intelligence, but what exactly is its role? We take language to be a part of a system for understanding and communicating about situations. In humans, these abilities emerge gradually from experience and depend on domain-general principles of biological neural networks: connection-based learning, distributed representation, and context-sensitive, mutual constraint satisfaction-based processing. Current artificial language processing systems rely on the same domain general principles, embodied in artificial neural networks. Indeed, recent progress in this field depends on query-based attention, which extends the ability of these systems to exploit context and has contributed to remarkable breakthroughs. Nevertheless, most current models focus exclusively on language-internal tasks, limiting their ability to perform tasks that depend on understanding situations. These systems also lack memory for the contents of prior situations outside of a fixed contextual span. We describe the organization of the brain's distributed understanding system, which includes a fast learning system that addresses the memory problem. We sketch a framework for future models of understanding drawing equally on cognitive neuroscience and artificial intelligence and exploiting query-based attention. We highlight relevant current directions and consider further developments needed to fully capture human-level language understanding in a computational system.
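The abstract does not spell out what form query-based attention takes. As a minimal sketch, assuming the common scaled dot-product formulation, a query vector is compared against a set of keys and the resulting weights are used to average the associated values. The function names and the toy vectors below are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query (assumed formulation).

    Returns the context vector and the attention weights over the keys.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Hypothetical toy context of three items, each with a key and a value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
context, weights = attend([2.0, 0.0], keys, values)
print(weights)
print(context)
```

Items whose keys align with the query receive larger weights, which is one way a system can select the relevant parts of its context.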


Subjects
Artificial Intelligence , Brain/physiology , Comprehension/physiology , Intelligence/physiology , Language , Neural Networks, Computer , Neural Pathways/physiology , Computer Simulation , Humans
3.
Cogn Sci ; 34(4): 537-82, 2010 May.
Article in English | MEDLINE | ID: mdl-21564224

ABSTRACT

This paper presents recent research that provides an overarching model of exemplar theory capable of explaining phenomena across the phonetic and syntactic strata. The model represents a unique exemplar-based account of constituency interactions encompassing both linguistic domains. It yields simulation and experimental results in keeping with experimental findings in the literature on syllable duration variability and offers an exemplar-theoretic account of local grammaticality. In addition, it provides some insights into the nature of exemplar cloud formation and demonstrates experimentally the potential gains that can be enjoyed via the use of rich exemplar representations.

4.
Bioinformatics ; 20(2): 216-25, 2004 Jan 22.
Article in English | MEDLINE | ID: mdl-14734313

ABSTRACT

MOTIVATION: New high-throughput technologies have accelerated the accumulation of knowledge about genes and proteins. However, much knowledge is still stored as written natural language text. Therefore, we have developed a new method, GAPSCORE, to identify gene and protein names in text. GAPSCORE scores words based on a statistical model of gene names that quantifies their appearance, morphology and context. RESULTS: We evaluated GAPSCORE against the Yapex data set and achieved an F-score of 82.5% (83.3% recall, 81.5% precision) for partial matches and 57.6% (58.5% recall, 56.7% precision) for exact matches. Since the method is statistical, users can choose score cutoffs that adjust the performance according to their needs. AVAILABILITY: GAPSCORE is available at http://bionlp.stanford.edu/gapscore/
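To illustrate how a score cutoff trades precision against recall, here is a minimal sketch. It assumes the F-score is the balanced harmonic mean of precision and recall, which the abstract does not state explicitly; under that assumption the exact-match figures (58.5% recall, 56.7% precision) give roughly 57.6%. The word scores below are hypothetical and are not GAPSCORE output.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_at(cutoff, scored_words):
    """Precision and recall when every word scoring >= cutoff is called a gene name.

    scored_words is a list of (score, is_gene_name) pairs.
    """
    predicted = [is_gene for score, is_gene in scored_words if score >= cutoff]
    true_pos = sum(predicted)
    total_pos = sum(is_gene for _, is_gene in scored_words)
    # With no predictions, precision is set to 0.0 here as a convention of this sketch.
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / total_pos if total_pos else 0.0
    return precision, recall


# Hypothetical scored words: (score, whether the word really is a gene name).
toy = [(0.95, True), (0.80, True), (0.60, False),
       (0.55, True), (0.30, False), (0.10, False)]

for cutoff in (0.9, 0.5, 0.2):
    p, r = precision_recall_at(cutoff, toy)
    print(cutoff, round(p, 3), round(r, 3), round(f_score(p, r), 3))
```

Raising the cutoff favors precision and lowering it favors recall, which is the adjustment the abstract leaves to the user.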


Subjects
Algorithms , Genes , Information Storage and Retrieval/methods , Natural Language Processing , Pattern Recognition, Automated , Periodicals as Topic , Proteins , Terminology as Topic , Abstracting and Indexing , Artificial Intelligence , Database Management Systems , Dictionaries as Topic , Reproducibility of Results , Sensitivity and Specificity , Software
5.
J Am Med Inform Assoc ; 9(6): 612-20, 2002.
Article in English | MEDLINE | ID: mdl-12386112

ABSTRACT

OBJECTIVE: The growth of the biomedical literature presents special challenges for both human readers and automatic algorithms. One such challenge derives from the common and uncontrolled use of abbreviations in the literature. Each additional abbreviation increases the effective size of the vocabulary for a field. Therefore, to create an automatically generated and maintained lexicon of abbreviations, we have developed an algorithm to match abbreviations in text with their expansions. DESIGN: Our method uses a statistical learning algorithm, logistic regression, to score abbreviation expansions based on their resemblance to a training set of human-annotated abbreviations. We applied it to Medstract, a corpus of MEDLINE abstracts in which abbreviations and their expansions have been manually annotated. We then ran the algorithm on all abstracts in MEDLINE, creating a dictionary of biomedical abbreviations. To test the coverage of the database, we used an independently created list of abbreviations from the China Medical Tribune. MEASUREMENTS: We measured the recall and precision of the algorithm in identifying abbreviations from the Medstract corpus. We also measured the recall when searching for abbreviations from the China Medical Tribune against the database. RESULTS: On the Medstract corpus, our algorithm achieves up to 83% recall at 80% precision. Applying the algorithm to all of MEDLINE yielded a database of 781,632 high-scoring abbreviations. Of all the abbreviations in the list from the China Medical Tribune, 88% were in the database. CONCLUSION: We have developed an algorithm to identify abbreviations from text. We are making this available as a public abbreviation server at http://abbreviation.stanford.edu/.
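The scoring step can be sketched as a logistic function applied to a weighted sum of features of an abbreviation and a candidate expansion. The abstract does not list the features or the fitted coefficients, so the two features and the hand-set weights below are hypothetical; in the described method the weights would be fit by logistic regression on the human-annotated training set.

```python
import math

# Hypothetical weights, set by hand for illustration only.
WEIGHTS = {"bias": -2.0, "initials_matched": 4.0, "length_ratio": -0.5}


def initials_matched(abbrev, expansion):
    """Fraction of abbreviation letters found, in order, among the word initials."""
    initials = [word[0].lower() for word in expansion.split()]
    pos = 0
    hits = 0
    for ch in abbrev.lower():
        # Search the remaining initials for this letter.
        for j in range(pos, len(initials)):
            if initials[j] == ch:
                hits += 1
                pos = j + 1
                break
    return hits / len(abbrev) if abbrev else 0.0


def score(abbrev, expansion):
    """Logistic score in (0, 1) for a candidate abbreviation-expansion pair."""
    length_ratio = len(expansion.split()) / len(abbrev) if abbrev else 0.0
    z = (WEIGHTS["bias"]
         + WEIGHTS["initials_matched"] * initials_matched(abbrev, expansion)
         + WEIGHTS["length_ratio"] * length_ratio)
    return 1.0 / (1.0 + math.exp(-z))


good = score("NLP", "natural language processing")
bad = score("NLP", "new high throughput")
print(round(good, 3), round(bad, 3))
```

Candidates scoring above a chosen threshold would be kept, which is how a precision and recall trade-off such as the one reported could be tuned.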


Subjects
Abbreviations as Topic , Algorithms , MEDLINE , Abstracting and Indexing/trends , Dictionaries as Topic , Logistic Models , MEDLINE/trends
6.
Genome Res ; 12(10): 1582-90, 2002 Oct.
Article in English | MEDLINE | ID: mdl-12368251

ABSTRACT

The analysis of large-scale genomic information (such as sequence data or expression patterns) frequently involves grouping genes on the basis of common experimental features. Often, as with gene expression clustering, there are too many groups to easily identify the functionally relevant ones. One valuable source of information about gene function is the published literature. We present a method, neighbor divergence, for assessing whether the genes within a group share a common biological function based on their associated scientific literature. The method uses statistical natural language processing techniques to interpret biological text. It requires only a corpus of documents relevant to the genes being studied (e.g., all genes in an organism) and an index connecting the documents to appropriate genes. Given a group of genes, neighbor divergence assigns a numerical score indicating how "functionally coherent" the gene group is from the perspective of the published literature. We evaluate our method by testing its ability to distinguish 19 known functional gene groups from 1900 randomly assembled groups. Neighbor divergence achieves 79% sensitivity at 100% specificity, comparing favorably to other tested methods. We also apply neighbor divergence to previously published gene expression clusters to assess its ability to recognize gene groups that had been manually identified as representative of a common function.
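The evaluation figure, sensitivity at 100% specificity, can be sketched as follows: pick the cutoff so that no randomly assembled group passes, then count the fraction of known functional groups that still pass. This sketch assumes higher scores mean greater functional coherence, which the abstract does not state, and the scores are hypothetical, not neighbor divergence output.

```python
def sensitivity_at_full_specificity(functional_scores, random_scores):
    """Fraction of functional groups scoring strictly above every random group.

    Using the highest random score as the cutoff rejects all random groups,
    which corresponds to 100% specificity.
    """
    cutoff = max(random_scores)
    hits = sum(1 for s in functional_scores if s > cutoff)
    return hits / len(functional_scores)


# Hypothetical scores for five functional groups and five random groups.
functional = [9.1, 7.4, 6.8, 3.2, 8.5]
random_groups = [1.0, 2.5, 4.0, 3.9, 0.7]

print(sensitivity_at_full_specificity(functional, random_groups))
```

In this toy case four of the five functional groups exceed the highest random score of 4.0, so the function returns 0.8.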


Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Genes, Fungal/physiology , Natural Language Processing , Research Design/statistics & numerical data , Algorithms , Artificial Intelligence , Cluster Analysis , Computational Biology/trends , Databases, Genetic/statistics & numerical data , Discriminant Analysis , Gene Expression Profiling/statistics & numerical data , Gene Expression Profiling/trends , Genome, Fungal , Information Services , Research Design/trends , Saccharomyces cerevisiae/genetics