ABSTRACT
The area under the receiver operating characteristic curve (AUC) is an important metric for a wide range of machine-learning problems, and scalable methods for optimizing AUC have recently been proposed. However, handling very large data sets remains an open challenge for this problem. This article proposes a novel approach to AUC maximization based on sampling mini-batches of positive/negative instance pairs and computing U-statistics to approximate a global risk minimization problem. The resulting algorithm is simple, fast, and learning-rate free. We show that the number of samples required for good performance is independent of the number of available pairs, which grows as the product of the numbers of positive and negative instances. Extensive experiments show the practical utility of the proposed method.
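To make the pair-sampling idea concrete, the following minimal sketch compares the empirical AUC, written as a two-sample U-statistic over all positive/negative pairs, with an estimate computed from a fixed number of randomly sampled pairs. It illustrates only the estimation step and is not the optimization algorithm proposed in this article. The function names, the synthetic Gaussian scores, and the sample size of 5000 pairs are assumptions chosen for illustration.

```python
import random


def pair_value(pos_score, neg_score):
    """Kernel of the AUC U-statistic: 1 if the pair is ranked correctly, 0.5 on a tie."""
    if pos_score > neg_score:
        return 1.0
    if pos_score == neg_score:
        return 0.5
    return 0.0


def exact_auc(pos_scores, neg_scores):
    """Empirical AUC: average of the kernel over every positive/negative pair."""
    total = sum(pair_value(p, n) for p in pos_scores for n in neg_scores)
    return total / (len(pos_scores) * len(neg_scores))


def sampled_auc(pos_scores, neg_scores, num_pairs, rng):
    """Monte Carlo estimate from num_pairs pairs drawn uniformly with replacement."""
    total = 0.0
    for _ in range(num_pairs):
        total += pair_value(rng.choice(pos_scores), rng.choice(neg_scores))
    return total / num_pairs


# Hypothetical data: scores of positives are shifted up by one unit.
rng = random.Random(0)
pos = [rng.gauss(1.0, 1.0) for _ in range(300)]
neg = [rng.gauss(0.0, 1.0) for _ in range(700)]

full = exact_auc(pos, neg)                # uses all 300 * 700 = 210000 pairs
approx = sampled_auc(pos, neg, 5000, rng)  # uses 5000 sampled pairs
print(f"exact AUC: {full:.4f}, sampled estimate: {approx:.4f}")
```

The cost of the sampled estimate depends on the number of pairs drawn, not on the total number of pairs, which is the property the abstract refers to.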