1.
Entropy (Basel); 22(4), 2020 Apr 24.
Article in English | MEDLINE | ID: mdl-33286261

ABSTRACT

In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimal for a given cost function. One way to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs along chains of that lattice describe U-shaped curves. Minimization of such a cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite the algorithm's optimality, the running time UCS required in computational assays was exponential in the number of features. Here, we show that this scalability issue arises because the U-curve problem is NP-hard. We then introduce Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, thus rendering the algorithm highly parallelizable. We also provide computational assays with both synthetic data and Machine Learning datasets, in which the performance of PUCS was assessed against UCS and other gold-standard algorithms for feature selection.
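To make the setting concrete, here is a minimal, self-contained Python sketch of the two ingredients the abstract relies on: the costs of nested feature subsets along one chain of the Boolean lattice, which under the U-curve assumption fall and then rise, and a generic way to split that lattice into smaller Boolean lattices that could be searched in parallel. The toy cost function and the partition-by-fixed-features scheme are illustrative assumptions, not the cost function or the partition actually used by PUCS.

from itertools import combinations, product

def toy_cost(subset, relevant=frozenset({0, 1}), penalty=0.1):
    """Hypothetical cost: number of missing 'relevant' features plus a
    size penalty, chosen so that costs along nested chains are U-shaped."""
    return len(relevant - subset) + penalty * len(subset)

def chain_costs(n_features, cost, order=None):
    """Costs along one chain of the Boolean lattice: the nested subsets
    obtained by adding one feature at a time, from {} to the full set."""
    order = list(order) if order is not None else list(range(n_features))
    subset, costs = frozenset(), [cost(frozenset())]
    for feature in order:
        subset = subset | {feature}
        costs.append(cost(subset))
    return costs

def lattice_partition(n_features, fixed):
    """Split the Boolean lattice of all feature subsets into 2**len(fixed)
    parts by fixing the presence or absence of a few features; each part is
    itself a Boolean lattice on the remaining features and can be searched
    independently.  This is a generic scheme for illustration only."""
    free = [f for f in range(n_features) if f not in fixed]
    parts = []
    for bits in product((False, True), repeat=len(fixed)):
        base = frozenset(f for f, keep in zip(fixed, bits) if keep)
        parts.append([base | frozenset(c)
                      for k in range(len(free) + 1)
                      for c in combinations(free, k)])
    return parts

if __name__ == "__main__":
    print(chain_costs(5, toy_cost))              # falls, then rises: U-shaped
    parts = lattice_partition(5, fixed=[0, 1])
    assert sum(len(p) for p in parts) == 2 ** 5  # the parts cover the lattice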

2.
Inf Sci, v. 471, p. 97-114, Jan. 2019
Article in English | Sec. Est. Saúde SP, SESSP-IBPROD, Sec. Est. Saúde SP | ID: bud-2567

ABSTRACT

The U-curve optimization problem is characterized by a cost function that decomposes into U-shaped curves over the chains of a Boolean lattice. This problem can be used to model the classical feature selection problem in Machine Learning. In this paper, we point out that the first algorithm proposed to tackle the U-curve problem, the RBM algorithm, is in fact suboptimal. We also present two new algorithms: UCS, which is indeed optimal for this problem; and UCSR, a variation of UCS that solves a special case of the U-curve problem and relies on a reduced, ordered binary decision diagram to control the search space. We provide results of two computational assays with these new algorithms: first, W-operator design for filtering of binary images; second, linear SVM design for classification of data sets from the UCI Machine Learning Repository. We show that, in these assays, UCS and UCSR outperformed an exhaustive search as well as three widely used heuristics: SFFS sequential selection, BFS graph-based search, and the CHCGA genetic algorithm. Finally, we analyze the results obtained and point out improvements that might enhance the performance of these two novel algorithms.
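As a rough illustration of why the U-curve property matters, the sketch below compares an exhaustive search over all 2**n feature subsets with a search along a single chain that stops as soon as the cost rises. The stopping rule is safe only because chain costs are assumed to be U-shaped; the toy cost function is a made-up example, and the chain-pruning routine is not the published UCS (which organizes and prunes many chains of the lattice, with UCSR additionally using an OBDD), only the basic pruning idea behind it.

from itertools import combinations

def toy_cost(subset, relevant=frozenset({0, 1}), penalty=0.1):
    """Hypothetical U-shaped cost: missing 'relevant' features plus a size penalty."""
    return len(relevant - subset) + penalty * len(subset)

def exhaustive_minimum(n_features, cost):
    """Baseline: score all 2**n feature subsets and keep the cheapest one."""
    best = (frozenset(), cost(frozenset()))
    for k in range(1, n_features + 1):
        for combo in combinations(range(n_features), k):
            s = frozenset(combo)
            c = cost(s)
            if c < best[1]:
                best = (s, c)
    return best

def chain_pruned_minimum(n_features, cost, order=None):
    """Minimum along a single chain of nested subsets, pruned by the U-curve
    property: once the cost rises, the rest of the chain cannot contain the
    chain's minimum, so it is skipped.  Illustration only, not the published UCS."""
    order = list(order) if order is not None else list(range(n_features))
    subset = frozenset()
    best = (subset, cost(subset))
    prev = best[1]
    for feature in order:
        subset = subset | {feature}
        c = cost(subset)
        if c > prev:
            break                        # U-shaped chain: it only rises from here
        if c < best[1]:
            best = (subset, c)
        prev = c
    return best

if __name__ == "__main__":
    print(exhaustive_minimum(5, toy_cost))    # evaluates all 32 subsets
    print(chain_pruned_minimum(5, toy_cost))  # stops early on the same instance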
