We propose a novel data-driven framework for assessing the a-priori epidemic risk of a geographical area and for identifying high-risk areas within a country. Our risk index is evaluated as a function of three different components: the hazard of the disease, the exposure of the area and the vulnerability of its inhabitants. As an application, we discuss the case of COVID-19 outbreak in Italy. We characterize each of the twenty Italian regions by using available historical data on air pollution, human mobility, winter temperature, housing concentration, health care density, population size and age. We find that the epidemic risk is higher in some of the Northern regions with respect to Central and Southern Italy. The corresponding risk index shows correlations with the available official data on the number of infected individuals, patients in intensive care and deceased patients, and can help explaining why regions such as Lombardia, Emilia-Romagna, Piemonte and Veneto have suffered much more than the rest of the country. Although the COVID-19 outbreak started in both North (Lombardia) and Central Italy (Lazio) almost at the same time, when the first cases were officially certified at the beginning of 2020, the disease has spread faster and with heavier consequences in regions with higher epidemic risk. Our framework can be extended and tested on other epidemic data, such as those on seasonal flu, and applied to other countries. We also present a policy model connected with our methodology, which might help policy-makers to take informed decisions.