Search | VHL Regional Portal

Developing a Natural Language Processing tool to identify perinatal self-harm in electronic healthcare records.

Ayre, Karyn; Bittar, André; Kam, Joyce; Verma, Somain; Howard, Louise M; Dutta, Rina.

PLoS One ; 16(8): e0253809, 2021.

Article in English | MEDLINE | ID: mdl-34347787

ABSTRACT

BACKGROUND: Self-harm occurring within pregnancy and the postnatal year ("perinatal self-harm") is a clinically important yet under-researched topic. Current research likely underestimates prevalence because of methodological limitations. Electronic healthcare records (EHRs) provide a source of clinically rich data on perinatal self-harm.

AIMS: (1) To create a Natural Language Processing (NLP) tool that can identify, with acceptable precision and recall, mentions of acts of perinatal self-harm within EHRs. (2) To use this tool to identify service-users who have self-harmed perinatally, based on their EHRs.

METHODS: We used the Clinical Record Interactive Search system to extract de-identified EHRs of secondary mental healthcare service-users at South London and Maudsley NHS Foundation Trust. We developed a tool that applied several layers of linguistic processing based on the spaCy NLP library for Python. We evaluated mention-level performance in the following domains: span, status, temporality and polarity. Evaluation was performed against a manually coded reference standard. Mention-level performance was reported as precision, recall, F-score and Cohen's kappa for each domain. Performance was also assessed at service-user level, and we explored whether a heuristic rule improved it. We report per-class statistics for service-user-level performance, as well as likelihood ratios and post-test probabilities.

RESULTS: Mention-level performance: micro-averaged F-score, precision and recall for span, polarity and temporality were all >0.8. Kappa was 0.68 for status, 0.62 for temporality and 0.91 for polarity. Service-user-level performance with the heuristic: F-score, precision and recall of the minority class 0.69; macro-averaged F-score 0.81; positive LR 9.4 (4.8-19); post-test probability 69.0% (53-82%). Considering the difficulty of the task, the tool performs well, although temporality was the attribute with the lowest level of annotator agreement.

CONCLUSIONS: It is feasible to develop an NLP tool that identifies, with acceptable validity, mentions of perinatal self-harm within EHRs, although with limitations regarding temporality. Using a heuristic rule, it can also function at service-user level.

Subject(s)

Electronic Health Records , Natural Language Processing , Self-Injurious Behavior , Adolescent , Adult , Female , Humans , Perinatal Care , Pregnancy , Young Adult
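The evaluation statistics named in the abstract above (precision, recall, F-score, macro-averaged F-score, Cohen's kappa, positive likelihood ratio and post-test probability) follow standard definitions. The sketch below shows how they can be computed from a single 2x2 confusion matrix of tool output against a manually coded reference standard. It is an illustration under stated assumptions, not the authors' code: the `Confusion` class and all function names are hypothetical, the counts are made up (they are not the study's data), and the confidence intervals reported in the abstract are not computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical 2x2 confusion matrix: system output vs. reference standard."""
    tp: int  # system positive, reference positive
    fp: int  # system positive, reference negative
    fn: int  # system negative, reference positive
    tn: int  # system negative, reference negative

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def swapped(self) -> "Confusion":
        # Treat the negative class as the class of interest (used for macro-averaging).
        return Confusion(tp=self.tn, fp=self.fn, fn=self.fp, tn=self.tp)


def precision(c: Confusion) -> float:
    return c.tp / (c.tp + c.fp)


def recall(c: Confusion) -> float:
    # Recall is the sensitivity for the class of interest.
    return c.tp / (c.tp + c.fn)


def f_score(c: Confusion) -> float:
    # Harmonic mean of precision and recall (F1).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


def macro_f_score(c: Confusion) -> float:
    # Unweighted mean of the per-class F-scores (positive and negative class).
    return (f_score(c) + f_score(c.swapped())) / 2


def cohens_kappa(c: Confusion) -> float:
    # Observed agreement corrected for the agreement expected by chance.
    p_observed = (c.tp + c.tn) / c.n
    p_yes = ((c.tp + c.fp) / c.n) * ((c.tp + c.fn) / c.n)
    p_no = ((c.fn + c.tn) / c.n) * ((c.fp + c.tn) / c.n)
    p_chance = p_yes + p_no
    return (p_observed - p_chance) / (1 - p_chance)


def positive_likelihood_ratio(c: Confusion) -> float:
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return sensitivity / (1 - specificity)


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    # Probability -> odds, multiply by the likelihood ratio, odds -> probability.
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Made-up counts for illustration only (not taken from the study).
example = Confusion(tp=40, fp=10, fn=20, tn=130)
prevalence = (example.tp + example.fn) / example.n  # used as the pre-test probability
print(f"precision={precision(example):.3f} recall={recall(example):.3f} F={f_score(example):.3f}")
print(f"macro-F={macro_f_score(example):.3f} kappa={cohens_kappa(example):.3f}")
lr = positive_likelihood_ratio(example)
print(f"LR+={lr:.2f} post-test probability={post_test_probability(prevalence, lr):.3f}")
```

When the pre-test probability is set to the sample prevalence, the post-test probability after a positive result equals precision (positive predictive value), which is a convenient sanity check on the arithmetic.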

Generation and evaluation of artificial mental health records for Natural Language Processing.

Ive, Julia; Viani, Natalia; Kam, Joyce; Yin, Lucia; Verma, Somain; Puntis, Stephen; Cardinal, Rudolf N; Roberts, Angus; Stewart, Robert; Velupillai, Sumithra.

NPJ Digit Med ; 3: 69, 2020.

Article in English | MEDLINE | ID: mdl-32435697

ABSTRACT

A serious obstacle to the development of Natural Language Processing (NLP) methods in the clinical domain is the accessibility of textual data. The mental health domain is particularly challenging, partly because clinical documentation relies heavily on free text that is difficult to de-identify completely. This problem could be tackled by using artificial medical data. In this work, we present an approach to generating artificial clinical documents. We apply this approach to discharge summaries from a large mental healthcare provider and to discharge summaries from an intensive care unit. We perform an extensive intrinsic evaluation in which we (1) apply several measures of text preservation; (2) measure how much the model memorises the training data; and (3) estimate the clinical validity of the generated text through a human evaluation task. Furthermore, we perform an extrinsic evaluation by studying the impact of using artificial text in a downstream NLP text classification task. We found that using these artificial data as training data can yield classification results comparable to those obtained with the original data. Additionally, using only a small amount of information from the original data to condition the generation of the artificial data is successful, which holds promise for reducing the risk of the artificial data retaining rare information from the original data. This is an important finding for our long-term goal of being able to generate artificial clinical data that can be released to the wider research community and accelerate advances in developing computational methods that use healthcare data.
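The abstract above mentions measuring how much a generative model memorises its training data. One simple proxy, assumed here and not necessarily the measure used by the authors, is the share of distinct word n-grams in a generated document that also appear verbatim in the training corpus. The function names, the default `n=4` and the naive whitespace tokeniser below are hypothetical choices for illustration.

```python
from typing import Iterable


def tokenize(text: str) -> list[str]:
    # Naive lower-cased whitespace tokenisation (an assumption; a real pipeline
    # would use a proper tokeniser and handle punctuation).
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    # All distinct contiguous n-grams in the token sequence.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copied_ngram_ratio(generated: str, training_docs: Iterable[str], n: int = 4) -> float:
    """Fraction of distinct n-grams in `generated` that occur verbatim in any training document."""
    train_ngrams: set[tuple[str, ...]] = set()
    for doc in training_docs:
        train_ngrams |= ngrams(tokenize(doc), n)
    generated_ngrams = ngrams(tokenize(generated), n)
    if not generated_ngrams:
        return 0.0  # text shorter than n tokens: nothing to compare
    return len(generated_ngrams & train_ngrams) / len(generated_ngrams)


# Made-up sentences for illustration only.
training = ["patient was admitted with low mood and poor sleep"]
generated = "patient was admitted with low mood after a relapse"
print(copied_ngram_ratio(generated, training))
```

A higher ratio suggests more verbatim copying. In practice one would likely compare several values of `n` and inspect the longest copied spans, since short n-grams are shared by chance in any clinical text.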

Annotating Temporal Relations to Determine the Onset of Psychosis Symptoms.

Viani, Natalia; Kam, Joyce; Yin, Lucia; Verma, Somain; Stewart, Robert; Patel, Rashmi; Velupillai, Sumithra.

Stud Health Technol Inform ; 264: 418-422, 2019 Aug 21.

Article in English | MEDLINE | ID: mdl-31437957

ABSTRACT

For patients with a diagnosis of schizophrenia, determining symptom onset is crucial for timely and successful intervention. In mental health records, information about early symptoms is often documented only in free text, and thus needs to be extracted to support clinical research. To achieve this, natural language processing (NLP) methods can be used. Development and evaluation of NLP systems requires manually annotated corpora. We present a corpus of mental health records annotated with temporal relations for psychosis symptoms. We propose a methodology for document selection and manual annotation aimed at capturing symptom onset information, and use it to develop an annotated corpus. To assess the utility of the resulting corpus, we propose a pilot NLP system. To the best of our knowledge, this is the first temporally-annotated corpus tailored to a specific clinical use-case.

Subject(s)

Natural Language Processing , Psychotic Disorders , Electronic Health Records , Humans , Records
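To illustrate how temporal-relation annotations can support onset detection, the sketch below assumes a simplified scheme in which each symptom mention is linked to a normalised calendar date with one of three relation labels. The labels, the `TemporalLink` and `onset_bounds` names, and the bounding rule are hypothetical assumptions for illustration, not the annotation scheme described in the paper.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Relation(Enum):
    # Hypothetical relation labels between a symptom and a time expression.
    OVERLAP = "overlap"  # symptom documented as present at the time expression
    BEFORE = "before"    # symptom began before the time expression
    AFTER = "after"      # symptom began after the time expression


@dataclass(frozen=True)
class TemporalLink:
    symptom: str          # e.g. "hallucinations"
    time_expression: str  # the annotated text span
    anchor: date          # the time expression normalised to a calendar date
    relation: Relation


def onset_bounds(links: list[TemporalLink], symptom: str) -> tuple[Optional[date], Optional[date]]:
    """Return (lower, upper) bounds on the onset date of `symptom`; None means unbounded."""
    lower: Optional[date] = None
    upper: Optional[date] = None
    for link in links:
        if link.symptom != symptom:
            continue
        if link.relation is Relation.AFTER:
            # Onset is later than the anchor, so the anchor is a lower bound.
            lower = link.anchor if lower is None else max(lower, link.anchor)
        else:
            # OVERLAP or BEFORE: onset is no later than the anchor (upper bound).
            upper = link.anchor if upper is None else min(upper, link.anchor)
    return lower, upper


# Made-up annotations for illustration only.
links = [
    TemporalLink("hallucinations", "last May", date(2016, 5, 1), Relation.OVERLAP),
    TemporalLink("hallucinations", "before starting university", date(2015, 9, 1), Relation.BEFORE),
    TemporalLink("hallucinations", "after leaving school", date(2014, 1, 1), Relation.AFTER),
    TemporalLink("paranoia", "in 2013", date(2013, 1, 1), Relation.OVERLAP),
]
print(onset_bounds(links, "hallucinations"))
```

Treating each link as a bound on the onset date keeps the logic simple: a lower bound later than the upper bound would flag conflicting annotations for review.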
