Ensuring generalized fairness in batch classification.

Pal, Manjish; Pokhriyal, Subham; Sikdar, Sandipan; Ganguly, Niloy.

Sci Rep ; 13(1): 18892, 2023 Nov 02.

Article in English | MEDLINE | ID: mdl-37919372

ABSTRACT

In this paper, we consider the problem of batch classification and propose a novel framework for achieving fairness in such settings. Batch classification involves selecting a set of individuals and is often encountered in real-world scenarios such as job recruitment and college admissions. This is in contrast to a typical classification problem, where each candidate in the test set is considered separately and independently. In batch scenarios, achieving the same acceptance rate (i.e., the probability of the classifier assigning the positive class) for each group (membership determined by the value of sensitive attributes such as gender or race) is often not desirable, and the regulatory body specifies a different acceptance rate for each group. Existing fairness-enhancing methods do not allow for such specifications and hence are unsuited to such scenarios. In this paper, we define a configuration model whereby the acceptance rate of each group can be regulated and further introduce a novel batch-wise fairness post-processing framework using the classifier confidence scores. We deploy our framework across four real-world datasets and two popular notions of fairness, namely demographic parity and equalized odds. In addition to consistent performance improvements over the competing baselines, the proposed framework allows flexibility and significant speed-up. It can also seamlessly incorporate multiple overlapping sensitive attributes. To further demonstrate the generalizability of our framework, we deploy it to the problem of fair gerrymandering, where it achieves a better fairness-accuracy trade-off than the existing baseline method.
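As a rough illustration of what regulating a per-group acceptance rate from confidence scores can look like, the sketch below accepts the highest-scoring candidates within each group until that group's target rate is met. This is not the framework proposed in the paper, only a simple per-group top-k baseline; the function name `select_batch` and the arguments `scores`, `groups`, and `rates` are hypothetical, and each candidate is assumed to belong to exactly one group.

```python
from collections import defaultdict


def select_batch(scores, groups, rates):
    """Return a 0/1 decision per candidate (hypothetical helper).

    scores: classifier confidence for the positive class, one per candidate
    groups: group label of each candidate (assumed non-overlapping)
    rates:  target acceptance rate for each group label, in [0, 1]
    """
    # Collect (score, index) pairs for every group in the batch.
    by_group = defaultdict(list)
    for idx, (score, group) in enumerate(zip(scores, groups)):
        by_group[group].append((score, idx))

    accepted = [0] * len(scores)
    for group, members in by_group.items():
        # Number of acceptances implied by the target rate, rounded half up.
        k = int(rates[group] * len(members) + 0.5)
        # Highest confidence first; ties are broken by original position.
        members.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, idx in members[:k]:
            accepted[idx] = 1
    return accepted


scores = [0.9, 0.2, 0.6, 0.4, 0.8, 0.7, 0.1, 0.3]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
decisions = select_batch(scores, groups, {"a": 0.5, "b": 0.25})
```

Because the decision for one candidate depends on the scores of everyone else in the same group, the selection has to be made over the whole batch rather than candidate by candidate, which is the distinction from ordinary classification that the abstract draws.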

The Effects of Gender Signals and Performance in Online Product Reviews.

Sikdar, Sandipan; Sachdeva, Rachneet; Wachs, Johannes; Lemmerich, Florian; Strohmaier, Markus.

Front Big Data ; 4: 771404, 2021.

Article in English | MEDLINE | ID: mdl-35072061

ABSTRACT

This work quantifies the effects of signaling gender through gender-specific user names on the success of reviews written on the popular amazon.com shopping platform. Highly rated reviews play an important role in e-commerce since they are prominently displayed next to products. Differences in how reviews are perceived, consciously or unconsciously, with respect to gender signals can lead to crucial biases in determining what content and perspectives are represented among top reviews. To investigate this, we extract signals of author gender from user names to select reviews where the author's likely gender can be inferred. Using reviews authored by these gender-signaling authors, we train a deep learning classifier to quantify the gendered writing style (i.e., gendered performance) of reviews written by authors who do not send clear gender signals via their user name. We contrast the effects of gender signaling and performance on the review helpfulness ratings using matching experiments. This is aimed at understanding whether an advantage is to be gained by (not) signaling one's gender when posting reviews. While we find no general trend that gendered signals or performances influence overall review success, we find strong context-specific effects. For example, reviews in product categories such as Electronics or Computers are perceived as less helpful when authors signal that they are likely women, but are received as more helpful in categories such as Beauty or Clothing. In addition to these findings, we believe this general chain of tools could be deployed across various social media platforms.

Unsupervised ranking of clustering algorithms by INFOMAX.

Sikdar, Sandipan; Mukherjee, Animesh; Marsili, Matteo.

PLoS One ; 15(10): e0239331, 2020.

Article in English | MEDLINE | ID: mdl-33104709

ABSTRACT

Clustering and community detection provide a concise way of extracting meaningful information from large datasets. An ever-growing plethora of data clustering and community detection algorithms has been proposed. In this paper, we address the question of ranking the performance of clustering algorithms for a given dataset. We show that, for hard clustering and community detection, Linsker's Infomax principle can be used to rank clustering algorithms. In brief, the algorithm that yields the highest value of the entropy of the partition, for a given number of clusters, is the best one. Indeed, we show on a wide range of datasets of various sizes and topological structures that the ranking provided by the entropy of the partition over a variety of partitioning algorithms is strongly correlated with the overlap with a ground truth partition. The code related to the project is available at https://github.com/Sandipan99/Ranking_cluster_algorithms.
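The ranking rule described above can be sketched in a few lines, under the assumption that "entropy of the partition" means the Shannon entropy of the cluster-size distribution, H = -Σ_c (n_c / N) log(n_c / N), where n_c is the size of cluster c and N the number of items. The function names `partition_entropy` and `rank_by_entropy` are hypothetical and are not taken from the linked repository.

```python
import math
from collections import Counter


def partition_entropy(labels):
    """Shannon entropy (in nats) of the cluster-size distribution."""
    n = len(labels)
    sizes = Counter(labels).values()
    return -sum((size / n) * math.log(size / n) for size in sizes)


def rank_by_entropy(partitions):
    """Order algorithm names from highest to lowest partition entropy.

    partitions: dict mapping an algorithm name to its hard cluster labels.
    All partitions are assumed to use the same number of clusters.
    """
    return sorted(
        partitions,
        key=lambda name: partition_entropy(partitions[name]),
        reverse=True,
    )


# Two toy partitions of four items into two clusters each.
candidates = {
    "skewed": [0, 0, 0, 1],
    "balanced": [0, 0, 1, 1],
}
ranking = rank_by_entropy(candidates)
```

The comparison is only meaningful at a fixed number of clusters, as the abstract states, since entropy generally grows as a partition is split into more groups.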

Subjects

Algorithms; User-Computer Interface; Cluster Analysis; Databases, Factual
