Search | Portal Regional da BVS

How to Query an Oracle? Efficient Strategies to Label Data.

Lahouti, Farshad; Kostina, Victoria; Hassibi, Babak.

IEEE Trans Pattern Anal Mach Intell; 44(11): 7597-7609, 2022 Nov.

Article in English | MEDLINE | ID: mdl-34618669

ABSTRACT

We consider the basic problem of querying an expert oracle to label a dataset in machine learning. This is typically an expensive and time-consuming process, so we seek ways to do it efficiently. The conventional approach compares each sample with (the representative of) each class until a match is found. With N equally likely classes, this requires about N/2 pairwise comparisons (queries per sample) on average. We consider a k-ary query scheme, with k ≥ 2 samples per query, that identifies (dis)similar items in the set while effectively exploiting the associated transitive relations. We present a randomized batch algorithm that labels the samples round by round and achieves a query rate of [Formula: see text]. In addition, we present an adaptive greedy query scheme that achieves an average rate of ≈ 0.2N queries per sample with triplet queries. For the proposed algorithms, we investigate the query rate analytically and with simulations. Empirical studies suggest that a triplet query takes an expert at most 50% more time than a pairwise query, indicating the effectiveness of the proposed k-ary query schemes. Where possible, we generalize the analyses to nonuniform class distributions.
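To make the N/2 baseline concrete, here is a small Python simulation of the conventional pairwise scheme. It is a sketch under our own assumptions, not the authors' code: the sample is compared with the class representatives in a fixed order, each comparison costs one query, and once N − 1 classes have been ruled out the last label is inferred without a query. The function names `pairwise_queries`, `expected_queries`, and `simulated_queries` are hypothetical.

```python
import random


def pairwise_queries(true_class: int, n_classes: int) -> int:
    """Queries needed to label one sample by sequential pairwise comparison.

    Hypothetical baseline: compare the sample with class 0, 1, 2, ... in order
    and stop at the first match. If classes 0..N-2 all mismatch, the sample
    must belong to class N-1, so no final query is needed.
    """
    for queries, candidate in enumerate(range(n_classes - 1), start=1):
        if candidate == true_class:
            return queries
    return max(n_classes - 1, 0)


def expected_queries(n_classes: int) -> float:
    """Exact mean for N equally likely classes: (N-1)/2 + (N-1)/N."""
    n = n_classes
    return (n - 1) / 2 + (n - 1) / n


def simulated_queries(n_classes: int, n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the average number of queries per sample."""
    rng = random.Random(seed)
    total = sum(
        pairwise_queries(rng.randrange(n_classes), n_classes)
        for _ in range(n_samples)
    )
    return total / n_samples


if __name__ == "__main__":
    for n in (4, 10, 100):
        print(n, expected_queries(n), round(simulated_queries(n, 20_000), 3))
```

Under these assumptions the exact mean is 5.4 queries per sample for N = 10 and 50.49 for N = 100, so the count approaches N/2 as N grows. This is the per-sample cost that the triplet scheme in the abstract is reported to reduce to about 0.2N.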

A Lower Bound on the Differential Entropy of Log-Concave Random Vectors with Applications.

Marsiglietti, Arnaud; Kostina, Victoria.

Entropy (Basel); 20(3), 2018 Mar 09.

Article in English | MEDLINE | ID: mdl-33265276

ABSTRACT

We derive a lower bound on the differential entropy of a log-concave random variable $X$ in terms of the $p$-th absolute moment of $X$. The new bound leads to a reverse entropy power inequality with an explicit constant, and to new bounds on the rate-distortion function and the channel capacity. Specifically, we study the rate-distortion function for log-concave sources and distortion measure $d(x, \hat{x}) = |x - \hat{x}|^r$, with $r \geq 1$, and we establish that the difference between the rate-distortion function and the Shannon lower bound is at most $\log\sqrt{\pi e} \approx 1.5$ bits, independently of $r$ and the target distortion $d$. For mean-square error distortion, the difference is at most $\log\sqrt{\pi e/2} \approx 1$ bit, regardless of $d$. We also provide bounds on the capacity of memoryless additive noise channels when the noise is log-concave. We show that the difference between the capacity of such channels and the capacity of the Gaussian channel with the same noise power is at most $\log\sqrt{\pi e/2} \approx 1$ bit. Our results generalize to the case of a random vector $X$ with possibly dependent coordinates. Our proof technique leverages tools from convex geometry.
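The ≈1.5-bit and ≈1-bit gaps quoted above can be checked numerically. The sketch below assumes base-2 logarithms (the gaps are stated in bits) and takes the constants to be log√(πe) and log√(πe/2), the values consistent with the stated approximations; without the square root, log2(πe) is about 3.09 bits, which does not match. The helper `gap_bits` and the constant names are hypothetical.

```python
import math


def gap_bits(ratio: float) -> float:
    """Return log2(sqrt(ratio)), i.e. half of log2(ratio), in bits."""
    return 0.5 * math.log2(ratio)


# Gap to the Shannon lower bound for |x - x_hat|^r distortion, any r >= 1.
GENERAL_GAP = gap_bits(math.pi * math.e)
# Gap for mean-square error distortion, and for the capacity comparison
# against the Gaussian channel with the same noise power.
MSE_GAP = gap_bits(math.pi * math.e / 2)

if __name__ == "__main__":
    print(f"log2 sqrt(pi*e)   = {GENERAL_GAP:.3f} bits")  # 1.547
    print(f"log2 sqrt(pi*e/2) = {MSE_GAP:.3f} bits")      # 1.047
    # Without the square root the value is far from the quoted 1.5 bits.
    print(f"log2(pi*e)        = {math.log2(math.pi * math.e):.3f} bits")  # 3.094
```

Since log2√2 = 1/2, the mean-square-error gap is exactly half a bit smaller than the general gap, which matches the drop from about 1.5 bits to about 1 bit.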
