1.
Data Brief ; 46: 108779, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36478687

ABSTRACT

Open Government Data (OGD) are data published by the public sector for free reuse; they include statistical data such as economic, environmental, and social indicators. These data have considerable potential when exploited with Machine Learning methods. Linked Data technologies facilitate the retrieval of integrated statistical indicators by defining and executing SPARQL queries. However, statistical indicators are available at different levels of temporal and spatial granularity, and they use different units of measurement. This data article describes the integrated statistical indicators that were retrieved from the official Scottish data portal in order to facilitate the use of Machine Learning methods on OGD. Multiple SPARQL queries, as well as manual searches of the data portal, were employed to this end. The resulting dataset comprises the maximum number of compatible datasets, i.e., datasets with matching temporal and spatial characteristics. In particular, the data include 60 statistical indicators from seven categories, such as health and social care, housing, and crime and justice. The indicators refer to the 6,976 "2011 data zones" of Scotland, and the reference year is 2015. The data are ready to be used by the research community, students, policy makers, and journalists, and they give rise to many social, business, and research scenarios that can be addressed using Machine Learning technologies and methods.
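
The idea of "compatible datasets" in this abstract (indicators that share the same spatial unit and the same reference year) can be illustrated with a small join. The following is only a sketch under stated assumptions, not the authors' pipeline: the indicator names, data zone codes, and values are invented for the example, and the `integrate` helper is hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: (indicator, data_zone, year, value).
# None of these names or numbers come from the Scottish data portal.
OBSERVATIONS = [
    ("indicator_a", "DZ-001", 2015, 12.0),
    ("indicator_a", "DZ-002", 2015, 7.5),
    ("indicator_b", "DZ-001", 2015, 340.0),
    ("indicator_b", "DZ-002", 2015, 410.0),
    ("indicator_c", "DZ-001", 2014, 3.0),  # different year, so not compatible
]


def integrate(observations, year):
    """Build one row per data zone with one column per indicator for `year`.

    Only indicators observed in every data zone for that year are kept,
    so the resulting table has no gaps.
    """
    by_zone = defaultdict(dict)
    for indicator, zone, obs_year, value in observations:
        if obs_year == year:
            by_zone[zone][indicator] = value
    if not by_zone:
        return {}
    # Indicators that are present in all zones for the chosen year.
    common = set.intersection(*(set(row) for row in by_zone.values()))
    return {
        zone: {name: row[name] for name in sorted(common)}
        for zone, row in sorted(by_zone.items())
    }


table = integrate(OBSERVATIONS, 2015)
```

Each row of `table` then describes one data zone by the same set of indicators, which is the shape most Machine Learning methods expect as input.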

2.
Sensors (Basel) ; 22(24), 2022 Dec 10.
Article in English | MEDLINE | ID: mdl-36560054

ABSTRACT

Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., sensor failures and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised Isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as either sensor faults or unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. Of the traffic observations, 20.16% are missing, and 50% of the sensors have 15.5% to 33.43% missing values. The average percentage of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors' knowledge, this is the first time a study has explored the quality of dynamic OGD.
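
Two of the quality checks named in this abstract, the share of missing values and the detection of anomalies against a seasonal pattern, can be sketched in a few lines. This is a deliberately simplified stand-in and not the paper's method: it replaces full seasonal-trend decomposition with a per-phase median and scores residuals with a modified z-score based on the median absolute deviation (the 0.6745 constant and 3.5 cutoff are a conventional choice, assumed here). The `flow` series is invented for the example.

```python
from statistics import median


def missing_share(values):
    """Fraction of observations that are missing (represented as None)."""
    return sum(v is None for v in values) / len(values)


def seasonal_residual_anomalies(values, period, threshold=3.5):
    """Return indices whose deviation from the seasonal median is large.

    The seasonal component is the median of each position in the cycle;
    residuals are scored with a MAD-based modified z-score.
    """
    # Seasonal profile: one median per phase of the cycle.
    seasonal = []
    for phase in range(period):
        present = [v for v in values[phase::period] if v is not None]
        seasonal.append(median(present) if present else None)

    # Residual of every non-missing observation against its phase median.
    residuals = {
        i: v - seasonal[i % period]
        for i, v in enumerate(values)
        if v is not None
    }
    center = median(residuals.values())
    mad = median(abs(r - center) for r in residuals.values())
    if mad == 0:
        # Degenerate case: any residual that differs from the center stands out.
        return [i for i, r in residuals.items() if r != center]
    return [
        i for i, r in residuals.items()
        if 0.6745 * abs(r - center) / mad > threshold
    ]


# Hypothetical traffic-flow readings with a cycle of 4 steps,
# one missing value (index 6) and one spike (index 9).
flow = [10, 20, 30, 20, 11, 21, None, 19, 10, 95, 30, 21, 9, 20, 31, 20]

share = missing_share(flow)
anomalies = seasonal_residual_anomalies(flow, period=4)
```

On this toy series, one of the 16 readings is missing and only the spike at index 9 is flagged. Whether a flagged reading reflects a sensor fault or an unusual traffic condition still has to be decided separately, as the abstract notes.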


Subject(s)
Artificial Intelligence , Machine Learning , Algorithms , Government