Reptile: Aggregation-level Explanations for Hierarchical Data
Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22): 399-413, 2022.
Article in English | Web of Science | ID: covidwho-2042879
ABSTRACT
Users often can see from overview-level statistics that some results look "off", but are rarely able to characterize even the type of error. Reptile is an iterative, human-in-the-loop explanation and cleaning system for errors in hierarchical data. Users specify an anomalous distributive aggregation result (a complaint), and Reptile recommends drill-down operations that help the user "zoom in" on the underlying errors. Unlike prior explanation systems, which intervene on raw records, Reptile intervenes by learning a group's expected statistics, and it ranks drill-down sub-groups by how much the intervention fixes the complaint. This group-level formulation supports a wide range of error types (missing, duplicate, and value errors) and uniquely leverages the distributive properties of the user complaint. Further, the learning-based intervention lets users provide domain expertise that Reptile learns from. In each drill-down iteration, Reptile must train a large number of predictive models. We thus extend factorised learning from count-join queries to aggregation-join queries, and develop a suite of optimizations that leverage the data's hierarchical structure. These optimizations reduce runtimes by more than 6x compared to a LAPACK-based implementation. When applied to real-world Covid-19 and African farmer survey data, Reptile correctly identifies 21 of 30 errors (vs. 2 using existing explanation approaches) and 20 of 22 errors, respectively. Reptile has been deployed in Ethiopia and Zambia and used to clean nationwide farmer survey data; the cleaned data has been used to design national drought insurance policies.
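The core ranking idea in the abstract (replace one sub-group's statistics with its expected statistics, recombine the parent aggregate using the distributive property, and rank sub-groups by how much that fixes the complaint) can be illustrated with a minimal sketch. This is not Reptile's implementation: the names `Stats`, `combine`, and `rank_drilldown` are hypothetical, the expected statistics are hard-coded where Reptile would learn them with predictive models, and the complaint is assumed to be a numeric target value for the parent aggregate, scored by absolute distance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Stats:
    """Distributive partial state for one group: COUNT and SUM both merge by addition."""
    count: float
    total: float

    def merge(self, other: "Stats") -> "Stats":
        return Stats(self.count + other.count, self.total + other.total)


def combine(children: Iterable[Stats]) -> Stats:
    """Recompute a parent group's state from its children (the distributive property)."""
    acc = Stats(0.0, 0.0)
    for s in children:
        acc = acc.merge(s)
    return acc


def rank_drilldown(
    observed: Dict[str, Stats],
    expected: Dict[str, Stats],
    target: float,
    agg: Callable[[Stats], float],
) -> List[Tuple[str, float]]:
    """Hypothetical scoring: how much does swapping one child's observed stats
    for its expected stats move the parent aggregate toward the target?"""
    baseline_error = abs(agg(combine(observed.values())) - target)
    scores = {}
    for name in observed:
        repaired = dict(observed)
        repaired[name] = expected[name]  # group-level intervention on one child
        error = abs(agg(combine(repaired.values())) - target)
        scores[name] = baseline_error - error  # reduction in complaint error
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical data: district B was ingested twice (duplicates), doubling its stats.
observed = {"A": Stats(100, 500.0), "B": Stats(200, 1000.0), "C": Stats(120, 600.0)}
# Assumption: hard-coded stand-ins for the statistics Reptile would learn.
expected = {"A": Stats(100, 500.0), "B": Stats(100, 500.0), "C": Stats(118, 590.0)}

# Complaint: the national SUM is 2100, but the user believes it should be about 1600.
ranking = rank_drilldown(observed, expected, target=1600.0, agg=lambda s: s.total)
print(ranking)  # [('B', 500.0), ('C', 10.0), ('A', 0.0)]
```

Because the parent state is rebuilt from child states by addition, each candidate intervention costs only a recombination rather than a rescan of raw records. Passing `agg=lambda s: s.total / s.count` would score an AVG complaint instead: AVG is not itself distributive, but it can be derived from the distributive (count, sum) pair.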

Full text: Available | Collection: Databases of international organizations | Database: Web of Science | Language: English | Journal: Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22) | Year: 2022 | Document Type: Article
