Results 1 - 5 of 5
1.
Data Brief ; 46: 108779, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36478687

ABSTRACT

Open Government Data (OGD), including statistical data such as economic, environmental, and social indicators, are data published by the public sector for free reuse. These data have huge potential when exploited using Machine Learning methods. Linked Data technologies facilitate retrieving integrated statistical indicators by defining and executing SPARQL queries. However, statistical indicators are available at different levels of temporal and spatial granularity and use different units of measurement. This data article describes the integrated statistical indicators that were retrieved from the official Scottish data portal in order to facilitate the application of Machine Learning methods to OGD. Multiple SPARQL queries as well as manual searches in the data portal were employed to this end. The resulting dataset comprises the maximum number of compatible datasets, i.e., datasets with matching temporal and spatial characteristics. In particular, the data include 60 statistical indicators from seven categories such as health and social care, housing, and crime and justice. The indicators refer to the 6,976 "2011 data zones" of Scotland, and the reference year is 2015. The data are ready to be used by the research community, students, policy makers, and journalists, and give rise to many social, business, and research scenarios that can be addressed using Machine Learning technologies and methods.
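
The notion of "compatible datasets" in the abstract above can be illustrated with a minimal sketch. It assumes each dataset is described only by a reference year and a spatial granularity level, and that compatibility means both match exactly; the abstract does not describe the authors' actual selection procedure. The dataset names and the function `largest_compatible_group` are hypothetical.

```python
from collections import defaultdict


def largest_compatible_group(datasets):
    """Group datasets by (year, spatial level) and return the largest group.

    `datasets` is an iterable of (name, year, spatial_level) tuples.
    Returns the winning (year, spatial_level) key and the sorted names in it.
    """
    groups = defaultdict(list)
    for name, year, spatial_level in datasets:
        groups[(year, spatial_level)].append(name)
    # Pick the key whose group holds the most datasets.
    best_key = max(groups, key=lambda k: len(groups[k]))
    return best_key, sorted(groups[best_key])


# Hypothetical catalogue entries: (name, reference year, spatial granularity).
catalogue = [
    ("alcohol-related-discharges", 2015, "2011 data zone"),
    ("house-sales", 2015, "2011 data zone"),
    ("recorded-crime", 2015, "council area"),
    ("house-sales-2014", 2014, "2011 data zone"),
    ("dwellings-by-type", 2015, "2011 data zone"),
]

key, names = largest_compatible_group(catalogue)
print(key, names)
```

Grouping on an exact (year, granularity) pair is the simplest possible criterion; a real pipeline would probably also need to reconcile units of measurement, which the abstract names as a further source of incompatibility.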

2.
Sensors (Basel) ; 22(24)2022 Dec 10.
Article in English | MEDLINE | ID: mdl-36560054

ABSTRACT

Dynamic data (including environmental, traffic, and sensor data) were recently recognized as an important part of Open Government Data (OGD). Although these data are of vital importance in the development of data intelligence applications, such as business applications that exploit traffic data to predict traffic demand, they are prone to data quality errors produced by, e.g., sensor failures and network faults. This paper explores the quality of Dynamic Open Government Data. To that end, a single case is studied using traffic data from the official Greek OGD portal. The portal uses an Application Programming Interface (API), which is essential for effective dynamic data dissemination. Our research approach includes assessing data quality using statistical and machine learning methods to detect missing values and anomalies. Traffic flow-speed correlation analysis, seasonal-trend decomposition, and unsupervised Isolation Forest (iForest) are used to detect anomalies. iForest anomalies are classified as sensor faults and unusual traffic conditions. The iForest algorithm is also trained on additional features, and the model is explained using explainable artificial intelligence. There are 20.16% missing traffic observations, and 50% of the sensors have 15.5% to 33.43% missing values. The average percentage of anomalies per sensor is 71.1%, with only a few sensors having less than 10% anomalies. Seasonal-trend decomposition detected 12.6% anomalies in the data of these sensors, and iForest 11.6%, with very few overlaps. To the authors' knowledge, this is the first time a study has explored the quality of dynamic OGD.
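
The missing-value figures reported above are percentages of absent observations, overall and per sensor. A minimal sketch of that calculation follows, assuming a missing observation is represented as `None`; the sensor names and readings are made up for illustration and are not taken from the Greek OGD portal, and `missing_share` is a hypothetical helper.

```python
def missing_share(readings):
    """Return the percentage of readings that are missing (None)."""
    if not readings:
        return 0.0
    missing = sum(r is None for r in readings)
    return 100.0 * missing / len(readings)


# Hypothetical traffic-speed readings per sensor; None marks a missing value.
sensors = {
    "sensor_a": [42, None, 38, 40],
    "sensor_b": [None, None, 55, 60],
    "sensor_c": [30, 31, 29, 33],
}

# Share of missing values for each sensor.
per_sensor = {sid: missing_share(vals) for sid, vals in sensors.items()}

# Share of missing values across all observations pooled together.
overall = missing_share([r for vals in sensors.values() for r in vals])

print(per_sensor, overall)
```

The overall figure pools every observation, so it need not equal the mean of the per-sensor figures when sensors report different numbers of readings.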


Subject(s)
Artificial Intelligence , Machine Learning , Algorithms , Government
3.
Educ Inf Technol (Dordr) ; 27(6): 8859-8882, 2022.
Article in English | MEDLINE | ID: mdl-35340533

ABSTRACT

With Open Data becoming more popular and more public bodies publishing their datasets, the need for educating prospective graduates on how they can use them has become prominent. This study examines the use of the Problem Based Learning (PBL) method and educational technologies to support the development of Open Data skills in university students. The study follows a Design Based Research approach and consists of three phases: a) examination of stakeholders' needs, b) design of an Open Data module, and c) re-design of the module based on the outcomes of its first run. The data collected throughout the three phases come from various sources, namely interviews with practitioners, focus groups with students, and tutors' reflection. The findings suggest that while the PBL method is suitable for Open Data education, special care should be taken to ensure that the potential of educational technologies is fully realised. The study concludes with design principles that aim to guide instructors on how they can incorporate the PBL method and digital tools into Open Data education effectively. Supplementary Information: The online version contains supplementary material available at 10.1007/s10639-022-10995-9.

4.
J Biomed Inform ; 50: 213-25, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24632296

ABSTRACT

The integration of medical data coming from multiple sources is important in clinical research. Among other things, it enables the discovery of appropriate subjects in patient-oriented research and the identification of innovative results in epidemiological studies. At the same time, the integration of medical data faces significant ethical and legal challenges that impose access constraints. Some of these issues can be addressed by making available aggregated instead of raw record-level data. In many cases, however, there is still a need for controlling access even to the resulting aggregated data, e.g., due to data providers' policies. In this paper we present the Linked Medical Data Access Control (LiMDAC) framework that capitalizes on Linked Data technologies to enable controlling access to medical data across distributed sources with diverse access constraints. The LiMDAC framework consists of three Linked Data models, namely the LiMDAC metadata model, the LiMDAC user profile model, and the LiMDAC access policy model. It also includes an architecture that exploits these models. Based on the framework, a proof-of-concept platform is developed and its performance and functionality are evaluated by employing two usage scenarios.


Subject(s)
Access to Information , Medical Record Linkage
5.
AMIA Annu Symp Proc ; 2013: 581-90, 2013.
Article in English | MEDLINE | ID: mdl-24551360

ABSTRACT

BioPortal contains over 300 ontologies, for which quality assurance (QA) is critical. Abstraction networks (ANs), compact summarizations of ontology structure and content, have been used in such QA efforts, typically in a "one-off" manner for a single ontology. Ontologies can be characterized, independently of knowledge-content focus, from a structural standpoint, leading to the formulation of ontology families. A family is defined as a set of ontologies satisfying some overarching condition regarding their structural features. Seven such families, comprising 186 ontologies, are identified. To increase efficiency, a new family-based QA framework is introduced in which an automated, uniform AN derivation technique and accompanying semi-automated, uniform QA regimen are applicable to the ontologies of a given family. Specifically, across an entire family, the QA efforts exploit family-wide AN features in the characterization of sets of classes that are more likely to harbor errors. The approach is demonstrated on the Cancer Chemoprevention BioPortal ontology.


Subject(s)
Biological Ontologies , Quality Assurance, Health Care , Abstracting and Indexing , Antineoplastic Agents/therapeutic use , Humans , Neoplasms/prevention & control , Programming Languages