Search | BVS Regional Portal (test)

Automatic Search-and-Replace From Examples With Coevolutionary Genetic Programming.

Bartoli, Alberto; De Lorenzo, Andrea; Medvet, Eric; Tarlao, Fabiano.

IEEE Trans Cybern ; 51(5): 2612-2624, 2021 May.

Article in English | MEDLINE | ID: mdl-31199282

ABSTRACT

We describe the design and implementation of a system for executing search-and-replace text processing tasks automatically, based only on examples of the desired behavior. The examples consist of pairs describing the original string and the desired modified string. Their construction, thus, does not require any specific technical skill. The system constructs a solution to the specified task that can be used unchanged on popular existing software for text processing. The solution consists of a search pattern coupled with a replacement expression: the former is a regular expression which describes both the strings to be replaced and their portions to be reused in the latter, which describes how to build the modified strings. Our proposed system is internally based on genetic programming and implements a form of cooperative coevolution in which two separate populations are evolved independently, one for search patterns and the other for replacement expressions. We assess our proposal on six tasks of realistic complexity obtaining very good results, both in terms of absolute quality of the solutions and with respect to the challenging baselines considered.
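As a minimal illustration of what such a solution looks like, the sketch below pairs a search pattern with a replacement expression and checks the pair against example strings. The date-reordering task, the example pairs, and the helper names `apply_solution` and `solves_all` are assumptions made here for illustration; they are not taken from the paper, and the pattern is written by hand rather than evolved.

```python
import re

# Hypothetical examples: (original string, desired modified string).
EXAMPLES = [
    ("2021-05-17", "17/05/2021"),
    ("due 1999-12-31.", "due 31/12/1999."),
]

# A hand-written candidate solution (not produced by genetic programming):
# the search pattern captures the portions to reuse, and the replacement
# expression rebuilds the string from those captured groups.
SEARCH_PATTERN = r"(\d{4})-(\d{2})-(\d{2})"
REPLACEMENT = r"\3/\2/\1"


def apply_solution(pattern: str, replacement: str, text: str) -> str:
    """Replace every match of `pattern` in `text` using `replacement`."""
    return re.sub(pattern, replacement, text)


def solves_all(pattern: str, replacement: str, examples) -> bool:
    """Return True if the candidate reproduces every desired output."""
    return all(
        apply_solution(pattern, replacement, original) == desired
        for original, desired in examples
    )


if __name__ == "__main__":
    print(solves_all(SEARCH_PATTERN, REPLACEMENT, EXAMPLES))
```

Because the solution is only a pattern and a replacement string, the same pair can in principle be pasted into any tool that supports this regular expression dialect and backreference syntax.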

Subjects

Algorithms, Data Mining/methods, Machine Learning, Genetic Models, Software, Molecular Evolution

Active Learning of Regular Expressions for Entity Extraction.

Bartoli, Alberto; De Lorenzo, Andrea; Medvet, Eric; Tarlao, Fabiano.

IEEE Trans Cybern ; 48(3): 1067-1080, 2018 Mar.

Article in English | MEDLINE | ID: mdl-28358694

ABSTRACT

We consider the automatic synthesis of an entity extractor, in the form of a regular expression, from examples of the desired extractions in an unstructured text stream. This is a long-standing problem for which many different approaches have been proposed, which all require the preliminary construction of a large dataset fully annotated by the user. In this paper, we propose an active learning approach aimed at minimizing the user annotation effort: the user annotates only one desired extraction and then merely answers extraction queries generated by the system. During the learning process, the system digs into the input text for selecting the most appropriate extraction query to be submitted to the user in order to improve the current extractor. We construct candidate solutions with genetic programming (GP) and select queries with a form of querying-by-committee, i.e., based on a measure of disagreement within the best candidate solutions. All the components of our system are carefully tailored to the peculiarities of active learning with GP and of entity extraction from unstructured text. We evaluate our proposal in depth, on a number of challenging datasets and based on a realistic estimate of the user effort involved in answering each single query. The results demonstrate high accuracy with significant savings in terms of computational effort, annotated characters, and execution time over a state-of-the-art baseline.
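The query-selection idea can be sketched roughly as follows: each regular expression in a small committee votes on which spans of the text it would extract, and the span on which the votes are most evenly split becomes the next query for the user. The disagreement measure used here (the smaller of the "extract" and "do not extract" vote fractions), the committee patterns, and the function names are assumptions for illustration only; the abstract does not specify the measure the authors use.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets in the text


def extractions(pattern: str, text: str) -> Set[Span]:
    """Return the non-empty spans that `pattern` extracts from `text`."""
    return {m.span() for m in re.finditer(pattern, text) if m.end() > m.start()}


def pick_query(committee: List[str], text: str) -> Optional[Span]:
    """Pick the candidate span on which the committee disagrees most.

    Assumed measure: with p the fraction of committee members that
    extract a span, disagreement is min(p, 1 - p), which is largest
    when the committee is split evenly.
    """
    votes: Dict[Span, int] = {}
    for pattern in committee:
        for span in extractions(pattern, text):
            votes[span] = votes.get(span, 0) + 1
    if not votes:
        return None
    size = len(committee)

    def disagreement(span: Span) -> float:
        p = votes[span] / size
        return min(p, 1.0 - p)

    # Sorting first makes tie-breaking deterministic (earliest span wins).
    return max(sorted(votes), key=disagreement)


if __name__ == "__main__":
    committee = [r"\d+", r"\d{4}", r"\d{2,}"]  # hypothetical candidates
    text = "call 42 or 2021"
    span = pick_query(committee, text)
    if span is not None:
        print(text[span[0]:span[1]])
```

In this toy run all three patterns agree on "2021", while only two of them extract "42", so "42" is the span the user would be asked about.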
