Results 1 - 6 of 6
1.
J Biomed Semantics ; 12(1): 11, 2021 07 14.
Article in English | MEDLINE | ID: mdl-34261535

ABSTRACT

BACKGROUND: The limited availability of clinical texts for Natural Language Processing purposes is hindering the progress of the field. This article investigates the use of synthetic data for the annotation and automated extraction of family history information from Norwegian clinical text. We make use of incrementally developed synthetic clinical text describing patients' family history relating to cases of cardiac disease and present a general methodology which integrates the synthetically produced clinical statements and annotation guideline development. The resulting synthetic corpus contains 477 sentences and 6030 tokens. In this work we experimentally assess the validity and applicability of the annotated synthetic corpus using machine learning techniques and furthermore evaluate the system trained on synthetic text on a corpus of real clinical text, consisting of de-identified records for patients with genetic heart disease. RESULTS: For entity recognition, an SVM trained on synthetic data had class-weighted precision, recall and F1-scores of 0.83, 0.81 and 0.82, respectively. For relation extraction, precision, recall and F1-scores were 0.74, 0.75 and 0.74. CONCLUSIONS: A system for extraction of family history information developed on synthetic data generalizes well to real clinical notes with a small loss of accuracy. The methodology outlined in this paper may be useful in other situations where limited availability of clinical text hinders NLP tasks. Both the annotation guidelines and the annotated synthetic corpus are made freely available and as such constitute the first publicly available resource of Norwegian clinical text.
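The class-weighted scores reported above can be illustrated with a minimal sketch. The abstract does not say exactly how the weighting was done; the version below assumes each class's score is weighted by its share of the gold labels (the same idea as the `average="weighted"` option in scikit-learn). The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def weighted_prf(gold, pred):
    """Return support-weighted precision, recall and F1 over the gold classes.

    Assumption: each class contributes in proportion to how often it
    occurs in the gold labels. The paper may have used a different scheme.
    """
    support = Counter(gold)
    total = len(gold)
    p_sum = r_sum = f_sum = 0.0
    for label, n in support.items():
        # True positives: items where gold and prediction both equal this label.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        predicted = sum(1 for p in pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / n
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = n / total
        p_sum += weight * precision
        r_sum += weight * recall
        f_sum += weight * f1
    return p_sum, r_sum, f_sum


# Hypothetical toy labels, not data from the study.
gold = ["A", "A", "B", "B"]
pred = ["A", "B", "B", "B"]
precision, recall, f1 = weighted_prf(gold, pred)
```

With this weighting, the weighted recall equals plain accuracy, which is one reason the weighted F1 need not lie between the weighted precision and recall.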


Subject(s)
Machine Learning , Natural Language Processing , Humans , Language
2.
Philos Trans R Soc Lond B Biol Sci ; 376(1824): 20200202, 2021 05 10.
Article in English | MEDLINE | ID: mdl-33745308

ABSTRACT

Two families of quantitative methods have been used to infer geographical homelands of language families: Bayesian phylogeography and the 'diversity method'. Bayesian methods model how populations may have moved using a phylogenetic tree as a backbone, while the diversity method assumes that the geographical area where linguistic diversity is highest likely corresponds to the homeland. No systematic tests of the performances of the different methods in a linguistic context have so far been published. Here, we carry out performance testing by simulating language families, including branching structures and word lists, along with speaker populations moving in space. We test six different methods: two versions of BayesTraits; the relaxed random walk model of BEAST 2; our own RevBayes implementations of a fixed rate and a variable rates random walk model; and the diversity method. As a result of the tests, we propose a hierarchy of performance of the different methods. Factors such as geographical idiosyncrasies, incomplete sampling, tree imbalance and small family sizes all have a negative impact on performance, but they do so mostly across the board, so the performance hierarchy is generally unaffected by such factors. This article is part of the theme issue 'Reconstructing prehistoric languages'.


Subject(s)
Cultural Evolution , Human Migration , Language , Linguistics/methods , Bayes Theorem , Humans , Phylogeny , Phylogeography
3.
BMC Med Inform Decis Mak ; 21(1): 84, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33663479

ABSTRACT

BACKGROUND: With a motivation of quality assurance, machine learning techniques were trained to classify Norwegian radiology reports of paediatric CT examinations according to their description of abnormal findings. METHODS: 13,506 reports from CT scans of children, 1000 reports from CT scans of adults and 1000 reports from X-ray examinations of adults were classified as positive or negative by a radiologist, according to the presence of abnormal findings. Inter-rater reliability was evaluated by comparison with a clinician's classifications of 500 reports. Test-retest reliability of the radiologist was assessed on the same 500 reports. A convolutional neural network model (CNN), a bidirectional recurrent neural network model (bi-LSTM) and a support vector machine model (SVM) were trained on a random selection of the children's data set. Models were evaluated on the remaining CT-children reports and the adult data sets. RESULTS: Test-retest reliability: Cohen's Kappa = 0.86 and F1 = 0.919. Inter-rater reliability: Kappa = 0.80 and F1 = 0.885. Model performances on the Children-CT data were as follows. CNN: (AUC = 0.981, F1 = 0.930), bi-LSTM: (AUC = 0.978, F1 = 0.927), SVM: (AUC = 0.975, F1 = 0.912). On the adult data sets, the models had AUC around 0.95 and F1 around 0.91. CONCLUSIONS: The models performed close to perfectly on their defined domain, and also performed convincingly on reports pertaining to a different patient group and a different modality. The models were deemed suitable for classifying radiology reports for future quality assurance purposes, where the fraction of the examinations with abnormal findings for different sub-groups of patients is a parameter of interest.
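The reliability figures above are Cohen's Kappa values, which correct raw agreement for the agreement expected by chance. As a rough sketch of the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is chance agreement computed from each rater's label frequencies. The function name and the toy ratings are hypothetical and do not come from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = set(count_a) | set(count_b)
    # Chance agreement: product of the two raters' marginal label rates.
    expected = sum(count_a[label] * count_b[label] for label in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # Both raters used one identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical toy ratings (1 = abnormal finding, 0 = none).
radiologist = [1, 1, 0, 0]
clinician = [1, 0, 0, 0]
kappa = cohens_kappa(radiologist, clinician)
```

In this toy case the raters agree on three of four reports (p_o = 0.75) while chance agreement is 0.5, so kappa is 0.5, noticeably lower than the raw agreement.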


Subject(s)
Radiology , Tomography, X-Ray Computed , Adult , Child , Humans , Neural Networks, Computer , Radiography , Reproducibility of Results
4.
J Healthc Inform Res ; 5(1): 114-131, 2021.
Article in English | MEDLINE | ID: mdl-33437913

ABSTRACT

This paper reports on our efforts to collect daily COVID-19-related symptoms for a large public university population, as well as to study the relationship between reported symptoms and individual movements. We developed a set of tools to collect and integrate individual-level data. COVID-19-related symptoms are collected using a self-reporting tool initially implemented in the Qualtrics survey system and subsequently moved to the .NET framework. Individual movement data are collected using off-the-shelf tracking apps available for iPhone and Android phones. Data integration and analysis are done in PostgreSQL, Python, and R. As of September 2020, we collected about 184,000 daily symptom responses for 20,000 individuals, as well as over 15,000 days of GPS movement data for 175 individuals. The analysis of the data indicates that headache is the most frequently reported symptom, present almost always when any other symptoms are reported, as indicated by derived association rules. It is followed by cough, sore throat, and aches. The study participants traveled on average 223.61 km every week with a large standard deviation of 254.53 and visited on average 5.77 ± 4.75 locations each week for at least 10 min. However, there is no evidence that reported symptoms or prior COVID-19 contact affects movements (p > 0.3 in most models). The evidence suggests that although some individuals limit their movements during pandemics, the overall study population does not change its movements as suggested by guidelines.
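The association rules mentioned above are usually described by two quantities, support and confidence. The abstract does not say which mining algorithm or thresholds were used, so the following is only a sketch of how those two measures could be computed for one candidate rule such as {cough} -> {headache}. The function name and the sample reports are hypothetical.

```python
def rule_metrics(reports, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    support    = share of reports containing antecedent and consequent
    confidence = share of antecedent-containing reports that also
                 contain the consequent
    """
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n = len(reports)
    n_antecedent = sum(1 for r in reports if antecedent <= set(r))
    n_both = sum(1 for r in reports if both <= set(r))
    support = n_both / n if n else 0.0
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical daily symptom reports, not data from the study.
reports = [
    {"headache", "cough"},
    {"headache"},
    {"cough", "headache", "aches"},
    {"sore throat"},
]
support, confidence = rule_metrics(reports, {"cough"}, {"headache"})
```

A confidence close to 1 for rules of the form {other symptom} -> {headache} is the pattern the abstract describes when it says headache is present almost always when other symptoms are reported.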

5.
PLoS One ; 15(8): e0236522, 2020.
Article in English | MEDLINE | ID: mdl-32785236

ABSTRACT

In current practice, when dating the root of a Bayesian language phylogeny the researcher is required to supply some of the information beforehand, including a distribution of root ages and dates for some nodes serving as calibration points. In addition to the potential subjectivity that this leaves room for, the problem arises that for many of the language families of the world there are no available internal calibration points. Here we address the following questions: Can a new Bayesian framework which overcomes these problems be introduced and how well does it perform? The new framework that we present is generalized in the sense that no family-specific priors or calibration points are needed. We moreover introduce a way to overcome another potential source of subjectivity in Bayesian tree inference as commonly practiced, namely that of manual cognate identification; instead, we apply an automated approach. Dates are obtained by fitting a Gamma regression model to tree lengths and known time depths for 30 phylogenetically independent calibration points. This model is used to predict the time depths of both the root and the internal nodes for 116 language families, producing a total of 1,287 dates for families and subgroups. It turns out that results are similar to those of published Bayesian studies of individual language families. The performance of the method is compared to automated glottochronology, which is an update of the classical method of Swadesh drawing upon automated cognate recognition and a new formula for deriving a time depth from percentages of shared cognates. It is also compared to a third dating method, that of the Automated Similarity Judgment Program (ASJP). In terms of errors and correlations with known dates, ASJP works better than the new method and both work better than automated glottochronology.


Subject(s)
Bayes Theorem , Language/history , Phylogeny , Fossils , History, Ancient , Humans , Linguistics
6.
PLoS One ; 8(5): e63238, 2013.
Article in English | MEDLINE | ID: mdl-23691003

ABSTRACT

The ASJP (Automated Similarity Judgment Program) described an automated, lexical similarity-based method for dating the world's language groups using 52 archaeological, epigraphic and historical calibration date points. The present paper describes a new automated dating method, based on phonotactic diversity. Unlike ASJP, our method does not require any information on the internal classification of a language group. Also, the method can use all the available word lists for a language and its dialects, eschewing the debate on 'language' vs. 'dialect'. We further combine these dates and provide a new baseline which, to our knowledge, is the best one. We make a systematic comparison of our method, ASJP's dating procedure, and combined dates. We predict time depths for the world's language families and sub-families using this new baseline. Finally, we explain our results in the model of language change given by Nettle.


Subject(s)
Algorithms , Cultural Evolution , Language/history , Linguistics/methods , Models, Theoretical , Phonation , History, Ancient , Humans , Time Factors