Anomaly Detection in COVID-19 Time-Series Data.

Homayouni, Hajar; Ray, Indrakshi; Ghosh, Sudipto; Gondalia, Shlok; Kahn, Michael G

Homayouni H; Computer Science Department, Colorado State University, Fort Collins, CO 80523 USA.
Ray I; Computer Science Department, Colorado State University, Fort Collins, CO 80523 USA.
Ghosh S; Computer Science Department, Colorado State University, Fort Collins, CO 80523 USA.
Gondalia S; Computer Science Department, Colorado State University, Fort Collins, CO 80523 USA.
Kahn MG; Anschutz Medical Campus, University of Colorado Denver, Aurora, CO 80045 USA.

SN Comput Sci ; 2(4): 279, 2021.

Article in English | MEDLINE | ID: covidwho-1240116

ABSTRACT

Anomaly detection and explanation in large volumes of real-world medical data, such as those pertaining to COVID-19, pose two challenges. First, the data are time series. Typical time-series data describe the behavior of a single object over time, whereas medical time-series data belong to multiple entities. There may therefore be multiple subsets of records such that the records in each subset, which belong to a single entity, are temporally dependent, while the records in different subsets are unrelated. Moreover, the records in a subset contain different types of attributes, some of which must be grouped in a particular manner to make the analysis meaningful. Anomaly detection techniques need to be customized for time-series data belonging to multiple entities. Second, anomaly detection techniques fail to explain the cause of outliers to domain experts. Such explanations are critical for new diseases and pandemics, where current knowledge is insufficient. We propose to address these issues by extending our existing work, IDEAL, an LSTM-autoencoder-based approach for data quality testing of sequential records that explains constraint violations in a manner understandable to end users. The extension (1) uses a novel two-level reshaping technique that splits COVID-19 data sets into multiple temporally dependent subsequences and (2) adds a data visualization plot to further explain the anomalies and evaluate the level of abnormality of the subsequences detected by IDEAL. We performed two systematic evaluation studies of our anomalous subsequence detection. One study uses aggregate data, including the number of cases, deaths, and recoveries and the hospitalization rate (as a percentage), collected from a COVID tracking project, the New York Times, and Johns Hopkins for the same time period. The other study uses COVID-19 patient medical records obtained from the Anschutz Medical Center health data warehouse. The results are promising and indicate that our techniques can be used to detect anomalies in large volumes of real-world unlabeled data whose accuracy or validity is unknown.
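
The abstract does not spell out how the two-level reshaping works, so the following is only a minimal sketch of one plausible reading: first group records by entity and order each group by time, then cut each entity's series into fixed-length, overlapping subsequences so that no subsequence mixes records from unrelated entities. The function names, the record layout (entity_id, time_index, features), the window length, and the sample values are all hypothetical illustrations, not taken from IDEAL or from the data sets named above.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

# Assumed record layout (hypothetical): (entity_id, time_index, features)
Record = Tuple[Hashable, int, Sequence[float]]
Subsequence = List[List[float]]


def group_by_entity(records: List[Record]) -> Dict[Hashable, List[Sequence[float]]]:
    """Level 1: split records by entity and order each group by time."""
    groups = defaultdict(list)
    for entity_id, t, features in records:
        groups[entity_id].append((t, features))
    return {
        entity_id: [features for _, features in sorted(rows, key=lambda r: r[0])]
        for entity_id, rows in groups.items()
    }


def sliding_windows(
    series: List[Sequence[float]], window: int, step: int = 1
) -> List[Subsequence]:
    """Level 2: cut one entity's ordered series into fixed-length subsequences."""
    return [
        [list(row) for row in series[start:start + window]]
        for start in range(0, len(series) - window + 1, step)
    ]


def two_level_reshape(
    records: List[Record], window: int, step: int = 1
) -> List[Tuple[Hashable, Subsequence]]:
    """Return (entity_id, subsequence) pairs; no window spans two entities."""
    out = []
    for entity_id, series in group_by_entity(records).items():
        for sub in sliding_windows(series, window, step):
            out.append((entity_id, sub))
    return out


if __name__ == "__main__":
    # Made-up daily (cases, deaths) counts for two made-up regions.
    records = [
        ("A", 2, [14.0, 1.0]),
        ("A", 0, [10.0, 0.0]),
        ("B", 0, [5.0, 0.0]),
        ("A", 1, [12.0, 0.0]),
        ("B", 1, [6.0, 1.0]),
        ("A", 3, [18.0, 2.0]),
    ]
    for entity_id, sub in two_level_reshape(records, window=3):
        print(entity_id, sub)
```

With a window of 3, the sample input yields two subsequences for region A and none for region B, which has only two records. Each resulting subsequence would be a candidate input to a sequence model such as an LSTM autoencoder; that step is outside the scope of this sketch.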

Keywords

Anomaly detection; COVID-19 data; Data quality tests; Explainability; LSTM-autoencoder; Time series

Full text: Available
Collection: International databases
Database: MEDLINE
Type of study: Experimental Studies / Systematic review/Meta Analysis
Language: English
Journal: SN Comput Sci
Year: 2021
Document Type: Article
