Results 1 - 20 of 23
1.
Sensors (Basel) ; 24(8)2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38676224

ABSTRACT

Patient care and management have entered a new arena, where intelligent technology can assist clinicians in both diagnosis and treatment [...].


Subject(s)
Artificial Intelligence , Delivery of Health Care , Internet of Things , Humans
2.
NPJ Digit Med ; 7(1): 11, 2024 Jan 13.
Article in English | MEDLINE | ID: mdl-38218738

ABSTRACT

Urinary Tract Infections (UTIs) are one of the most prevalent bacterial infections in older adults and a significant contributor to unplanned hospital admissions in People Living with Dementia (PLWD); early detection is crucial because symptom reporting and help-seeking behaviour are often limited in this group. The most common diagnostic tool is urine sample analysis, which can be time-consuming and is only employed where clinical suspicion of UTI exists. In this method development and proof-of-concept study, participants living with dementia were monitored via low-cost devices in the home that passively measure activity, sleep, and nocturnal physiology. Using 27,828 person-days of remote monitoring data (from 117 participants), we engineered features representing the symptoms used to diagnose a UTI. We then evaluated explainable machine learning techniques for passively estimating UTI risk and stratified the resulting scores to support clinical translation, allowing control over the balance between alert rate, sensitivity, and specificity. The proposed UTI algorithm achieves a sensitivity of 65.3% (95% Confidence Interval (CI) = 64.3-66.2) and specificity of 70.9% (68.6-73.1) when predicting UTIs in unseen participants and, after risk stratification, a sensitivity of 74.7% (67.9-81.5) and specificity of 87.9% (85.0-90.9). In addition, feature importance methods reveal that the largest contributions to the predictions came from bathroom visit statistics, night-time respiratory rate, and the number of previous UTI events, aligning with the literature. Our machine learning method alerts clinicians to UTI risk, enabling earlier detection and enhanced screening when considering treatment.
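The two-stage idea described above, learning a continuous risk score from engineered features and then stratifying it into bands so the alert rate can be tuned, can be sketched as follows. The feature names, synthetic data, classifier choice, and band thresholds are illustrative assumptions, not the study's actual pipeline or cut-offs.

```python
# Hypothetical sketch: score UTI risk from remote-monitoring features,
# then stratify scores into clinical bands to trade alert rate against
# sensitivity/specificity. All names and thresholds are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic person-days: [bathroom visits, night respiratory rate, prior UTIs]
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 1).astype(int)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
scores = clf.predict_proba(X)[:, 1]  # continuous risk score per person-day

def stratify(score, low=0.3, high=0.7):
    """Map a continuous risk score to a clinical band (assumed cut-offs)."""
    if score >= high:
        return "high"       # always alert
    if score >= low:
        return "medium"     # alert only if capacity allows
    return "low"            # no alert

bands = [stratify(s) for s in scores]
```

Raising the `high` threshold lowers the alert rate at the cost of sensitivity, which is the trade-off the stratification is meant to expose.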

3.
Sci Data ; 10(1): 606, 2023 09 09.
Article in English | MEDLINE | ID: mdl-37689815

ABSTRACT

Dementia is a progressive condition that affects cognitive and functional abilities. There is a need for reliable and continuous health monitoring of People Living with Dementia (PLWD) to improve their quality of life and support their independent living. Healthcare services often focus on addressing and treating already established health conditions that affect PLWD. Managing these conditions continuously can inform earlier, better decision-making and higher-quality care management for PLWD. The Technology Integrated Health Management (TIHM) project developed a new digital platform to routinely collect longitudinal, observational, and measurement data within the home, and to apply machine learning and analytical models for the detection and prediction of adverse health events affecting the well-being of PLWD. This work describes the TIHM dataset collected during the second phase (i.e., feasibility study) of the TIHM project. The data were collected from the homes of 56 PLWD and associated with events and clinical observations (daily activity, physiological monitoring, and labels for health-related conditions). The study recorded an average of 50 days of data per participant, totalling 2803 days.


Subject(s)
Dementia , Quality of Life , Humans , Activities of Daily Living , Delivery of Health Care , Health Facilities
4.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36825820

ABSTRACT

MOTIVATION: Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, however, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension, and number of layers. The natural language processing community has developed numerous strategies to compress these models using techniques such as pruning, quantization, and knowledge distillation, resulting in models that are considerably faster, smaller, and consequently easier to use in practice. In the same spirit, in this article we introduce six lightweight models, namely BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT, and CompactBioBERT, obtained either by knowledge distillation from a biomedical teacher or by continual learning on the PubMed dataset. We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1 to identify efficient lightweight models that perform on par with their larger counterparts. RESULTS: We trained six different models in total, with the largest having 65 million parameters and the smallest 15 million; a far lower range of parameters than BioBERT's 110M. Based on our experiments on three different biomedical tasks, we found that models distilled from a biomedical teacher and models additionally pre-trained on the PubMed dataset can retain up to 98.8% and 98.6% of the performance of BioBERT-v1.1, respectively. Overall, our best model below 30M parameters is BioMobileBERT, while our best models over 30M parameters are DistilBioBERT and CompactBioBERT, which keep up to 98.2% and 98.8% of the performance of BioBERT-v1.1, respectively. AVAILABILITY AND IMPLEMENTATION: Code is available at: https://github.com/nlpie-research/Compact-Biomedical-Transformers.
Trained models can be accessed at: https://huggingface.co/nlpie.
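As a concrete illustration of the technique named above, the core knowledge-distillation objective, matching temperature-softened teacher probabilities while also fitting the hard labels, can be sketched in a few lines of NumPy. This is the generic formulation under assumed hyperparameters (`T`, `alpha`), not the paper's exact training recipe or layer-mapping scheme.

```python
# Generic knowledge-distillation loss: soft KL(teacher || student) at
# temperature T, blended with the usual hard-label cross-entropy.
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * hard CE."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    hard = softmax(student_logits)       # temperature 1 for the hard loss
    ce = -np.log(hard[np.arange(len(labels)), labels] + 1e-12)
    return float(np.mean(alpha * (T ** 2) * kl + (1 - alpha) * ce))
```

The `T ** 2` factor is the standard correction that keeps the soft-loss gradients on the same scale as the hard loss as the temperature changes.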


Subject(s)
Natural Language Processing , PubMed , Datasets as Topic
5.
Genes (Basel) ; 15(1)2023 12 25.
Article in English | MEDLINE | ID: mdl-38254924

ABSTRACT

Machine learning methods, including deep learning, reinforcement learning, and generative artificial intelligence, are revolutionising every area of our lives when data are made available. With the help of these methods, we can decipher information from larger datasets while addressing the complex nature of biological systems more efficiently. Although machine learning methods were introduced to human genetic epidemiological research as early as 2004, they have never been used to their full capacity. In this review, we outline some of the main applications of machine learning to assigning human genetic loci to health outcomes. We summarise widely used methods and discuss their advantages and challenges. We also identify several tools, such as Combi, GenNet, and GMSTool, specifically designed to integrate these methods for hypothesis-free analysis of genetic variation data. We elaborate on the additional value and limitations of these tools from a geneticist's perspective. Finally, we discuss the fast-moving field of foundation models and large multi-modal omics biobank initiatives.


Subject(s)
Artificial Intelligence , Genome-Wide Association Study , Humans , Machine Learning , Genetic Loci , Genetic Research
6.
Sci Rep ; 12(1): 17052, 2022 10 12.
Article in English | MEDLINE | ID: mdl-36224203

ABSTRACT

Oncology patients experience numerous co-occurring symptoms during their treatment. The identification of sentinel/core symptoms is a vital prerequisite for therapeutic interventions. In this study, using Network Analysis, we investigated the inter-relationships among 38 common symptoms over time (i.e., a total of six time points over two cycles of chemotherapy) in 987 oncology patients with four different types of cancer (i.e., breast, gastrointestinal, gynaecological, and lung). In addition, we evaluated the associations between and among symptoms and symptom clusters and examined the strength of these interactions over time. Eight unique symptom clusters were identified within the networks. Findings from this research suggest that changes occur in the relationships and interconnections between and among co-occurring symptoms and symptom clusters, depending on the time point in the chemotherapy cycle and the type of cancer. The evaluation of the centrality measures provides new insights into the relative importance of individual symptoms within various networks that can be considered as potential targets for symptom management interventions.
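The centrality idea behind this kind of symptom network can be illustrated with a toy example: build a network from pairwise association strengths and rank symptoms by "strength" centrality (the sum of absolute edge weights), one common way to flag candidate core symptoms. The matrix below is made up for illustration, not study data.

```python
# Toy symptom network: strength centrality from a symmetric weight matrix.
import numpy as np

symptoms = ["fatigue", "nausea", "pain", "insomnia"]
W = np.array([  # illustrative association weights, zero diagonal
    [0.0, 0.6, 0.5, 0.4],
    [0.6, 0.0, 0.2, 0.1],
    [0.5, 0.2, 0.0, 0.3],
    [0.4, 0.1, 0.3, 0.0],
])

strength = np.abs(W).sum(axis=1)          # one centrality value per symptom
core = symptoms[int(strength.argmax())]   # most central (candidate core) symptom
```

Other centrality measures (closeness, betweenness, expected influence) can be computed from the same matrix; strength is simply the easiest to read off.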


Subject(s)
Neoplasms , Humans , Longitudinal Studies , Neoplasms/drug therapy , Palliative Care , Syndrome
7.
JMIR Aging ; 5(3): e38211, 2022 Sep 19.
Article in English | MEDLINE | ID: mdl-36121687

ABSTRACT

BACKGROUND: Sensor-based remote health monitoring can be used for the timely detection of health deterioration in people living with dementia with minimal impact on their day-to-day living. Anomaly detection approaches have been widely applied in various domains, including remote health monitoring. However, current approaches are challenged by noisy, multivariate data and low generalizability. OBJECTIVE: This study aims to develop an online, lightweight unsupervised learning-based approach to detect anomalies representing adverse health conditions using activity changes in people living with dementia. We demonstrated its effectiveness over state-of-the-art methods on a real-world data set of 9363 days collected from 15 participant households by the UK Dementia Research Institute between August 2019 and July 2021. Our approach was applied to household movement data to detect urinary tract infections (UTIs) and hospitalizations. METHODS: We propose and evaluate a solution based on Contextual Matrix Profile (CMP), an exact, ultrafast distance-based anomaly detection algorithm. Using daily aggregated household movement data collected via passive infrared sensors, we generated CMPs for location-wise sensor counts, duration, and change in hourly movement patterns for each patient. We computed a normalized anomaly score in 2 ways: by combining univariate CMPs and by developing a multidimensional CMP. The performance of our method was evaluated relative to Angle-Based Outlier Detection, Copula-Based Outlier Detection, and Lightweight Online Detector of Anomalies. We used the multidimensional CMP to discover and present the important features associated with adverse health conditions in people living with dementia. 
RESULTS: The multidimensional CMP yielded, on average, 84.3% recall with 32.1 alerts, or a 5.1% alert rate, offering the best balance of recall and relative precision compared with Copula-Based and Angle-Based Outlier Detection and Lightweight Online Detector of Anomalies when evaluated for UTI and hospitalization. Midnight to 6 AM bathroom activity was shown to be the most important cross-patient digital biomarker of anomalies indicative of UTI, contributing approximately 30% to the anomaly score. We also demonstrated how CMP-based anomaly scoring can be used for a cross-patient view of anomaly patterns. CONCLUSIONS: To the best of our knowledge, this is the first real-world study to adapt the CMP to continuous anomaly detection in a health care scenario. The CMP inherits the speed, accuracy, and simplicity of the Matrix Profile, providing configurability, the ability to denoise and detect patterns, and explainability to clinical practitioners. We addressed the need for anomaly scoring in multivariate time series health care data by developing the multidimensional CMP. With high sensitivity, a low alert rate, better overall performance than state-of-the-art methods, and the ability to discover digital biomarkers of anomalies, the CMP is a clinically meaningful unsupervised anomaly detection technique extensible to multimodal data for dementia and other health care scenarios.
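A stripped-down sketch of the distance-based scoring idea in the abstract above is given below: score each day's activity profile by its z-normalized Euclidean distance to the most similar other day, so days unlike any other behaviour get high scores. The real Contextual Matrix Profile additionally operates over subsequence windows and user-defined contexts, so treat this purely as intuition.

```python
# Nearest-neighbour anomaly scoring over z-normalized daily activity profiles,
# a simplified stand-in for Contextual-Matrix-Profile-style scoring.
import numpy as np

def znorm(x):
    x = np.asarray(x, dtype=float)
    s = x.std()
    return (x - x.mean()) / s if s > 0 else x - x.mean()

def nn_anomaly_scores(days):
    """days: (n_days, n_bins) activity counts -> per-day anomaly score in [0, 1]."""
    Z = np.array([znorm(d) for d in days])
    n = len(Z)
    scores = np.empty(n)
    for i in range(n):
        dists = [np.linalg.norm(Z[i] - Z[j]) for j in range(n) if j != i]
        scores[i] = min(dists)              # distance to nearest neighbour
    return scores / (scores.max() + 1e-12)  # normalize for thresholding
```

A day whose profile matches no other day scores near 1; thresholding these scores is what turns them into alerts.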

8.
Elife ; 11, 2022 05 19.
Article in English | MEDLINE | ID: mdl-35588296

ABSTRACT

Tuberculosis is a respiratory disease that is treatable with antibiotics. An increasing prevalence of resistance means that, to ensure a good treatment outcome, it is desirable to test the susceptibility of each infection to different antibiotics. Conventionally, this is done by culturing a clinical sample and then exposing aliquots to a panel of antibiotics, each present at a pre-determined concentration, thereby determining whether the sample is resistant or susceptible to each drug. The minimum inhibitory concentration (MIC) of a drug is the lowest concentration that inhibits growth; it is a more useful quantity but requires each sample to be tested at a range of concentrations for each drug. Using 96-well broth microdilution plates, with each well containing a lyophilised pre-determined amount of an antibiotic, is a convenient and cost-effective way to measure the MICs of several drugs at once for a clinical sample. Although accurate, this is still an expensive and slow process that requires highly skilled and experienced laboratory scientists. Here we show that, through the BashTheBug project hosted on the Zooniverse citizen science platform, a crowd of volunteers can reproducibly and accurately determine the MICs for 13 drugs, and that simply taking the median or mode of 11-17 independent classifications is sufficient. There is therefore a potential role for crowds to support (but not supplant) the role of experts in antibiotic susceptibility testing.
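The consensus rule reported above, a robust median over a small number of independent volunteer classifications, is deliberately simple. The sketch below illustrates it on made-up dilution-well indices (the readings are not study data).

```python
# Median consensus over independent volunteer classifications of one well
# image. Eleven made-up readings; the median is robust to a few outliers.
from statistics import median

volunteer_readings = [4, 4, 5, 4, 3, 4, 6, 4, 4, 5, 4]
consensus = median(volunteer_readings)
```

Because the median ignores extreme readings, a handful of mistaken classifications does not shift the consensus MIC.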


Tuberculosis is a bacterial respiratory infection that kills about 1.4 million people worldwide each year. While antibiotics can cure the condition, the bacterium responsible for this disease, Mycobacterium tuberculosis, is developing resistance to these treatments. Choosing which antibiotics to use to treat the infection more carefully may help to combat the growing threat of drug-resistant bacteria. One way to find the best choice is to test how an antibiotic affects the growth of M. tuberculosis in the laboratory. To speed up this process, laboratories test multiple drugs simultaneously. They do this by growing bacteria on plates with 96 wells and injecting individual antibiotics into each well at different concentrations. The Comprehensive Resistance Prediction for Tuberculosis (CRyPTIC) consortium has used this approach to collect and analyse bacteria from over 20,000 tuberculosis patients. An image of the 96-well plate is then captured and the level of bacterial growth in each well is assessed by laboratory scientists. But this work is difficult, time-consuming, and subjective, even for tuberculosis experts. Here, Fowler et al. show that enlisting citizen scientists may help speed up this process and reduce errors that arise from analysing such a large dataset. In April 2017, Fowler et al. launched the project 'BashTheBug' on the Zooniverse citizen science platform, where anyone can access and analyse the images from the CRyPTIC consortium. They found that a crowd of inexperienced volunteers were able to consistently and accurately measure the concentration of antibiotics necessary to inhibit the growth of M. tuberculosis. If the concentration is above a pre-defined threshold, the bacteria are considered to be resistant to the treatment. A consensus result could be reached by calculating the median value of the classifications provided by as few as 17 different BashTheBug participants.
The work of BashTheBug volunteers has reduced errors in the CRyPTIC project data, which has been used for several other studies. For instance, the World Health Organization (WHO) has also used the data to create a catalogue of genetic mutations associated with antibiotics resistance in M. tuberculosis. Enlisting citizen scientists has accelerated research on tuberculosis and may help with other pressing public health concerns.


Subject(s)
Mycobacterium tuberculosis , Tuberculosis , Antitubercular Agents/pharmacology , Humans , Microbial Sensitivity Tests , Tuberculosis/drug therapy , Volunteers
9.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35152280

ABSTRACT

Phosphorylation of proteins is one of the most significant post-translational modifications (PTMs) and plays a crucial role in plant functionality due to its impact on signaling, gene expression, enzyme kinetics, protein stability and interactions. Accurate prediction of plant phosphorylation sites (p-sites) is vital, as abnormal regulation of phosphorylation usually leads to plant diseases. However, current experimental methods for PTM identification suffer from high cost and are error-prone. The present study develops machine learning-based prediction techniques, including a high-performance interpretable deep tabular learning network (TabNet), to improve the prediction of protein p-sites in soybean. Moreover, we use a hybrid feature set of sequence-based features, physicochemical properties and position-specific scoring matrices to predict serine (Ser/S), threonine (Thr/T) and tyrosine (Tyr/Y) p-sites in soybean for the first time. The experimentally verified p-sites data of soybean proteins are collected from the eukaryotic phosphorylation sites database and database post-translational modification. We then remove the redundant set of positive and negative samples by dropping protein sequences with >40% similarity. The developed techniques all achieve better than 70% accuracy. The results demonstrate that the TabNet model is the best-performing classifier, using hybrid features and a window size of 13, achieving 78.96% sensitivity and 77.24% specificity. These results indicate that the TabNet method has advantages in terms of high performance and interpretability. The proposed technique can automatically analyze the data without measurement errors or human intervention. Furthermore, it can be used to predict putative protein p-sites in plants effectively. The collected dataset and source code are publicly deposited at https://github.com/Elham-khalili/Soybean-P-sites-Prediction.
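One preprocessing step implied by the abstract above, cutting a fixed-size residue window (the best model used size 13) around each candidate S/T/Y site, might look like the sketch below. The padding character and function name are assumptions; the actual feature encodings and the TabNet classifier itself are not reproduced.

```python
# Extract fixed-size residue windows centred on candidate phospho-sites,
# padding sequence ends with 'X' (an assumed padding convention).
def site_windows(seq, window=13, residues="STY"):
    half = window // 2
    padded = "X" * half + seq + "X" * half
    out = []
    for i, aa in enumerate(seq):
        if aa in residues:
            out.append((i, padded[i:i + window]))  # (site index, window)
    return out
```

Each window would then be expanded into the hybrid feature vector (sequence, physicochemical, PSSM) that the classifier consumes.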


Subject(s)
Glycine max , Protein Processing, Post-Translational , Amino Acid Sequence , Computational Biology/methods , Humans , Machine Learning , Phosphorylation , Glycine max/genetics
10.
Article in English | MEDLINE | ID: mdl-37015447

ABSTRACT

Early detection of COVID-19 is an ongoing area of research that can help with triage, monitoring and general health assessment of potential patients and may reduce operational strain on hospitals that cope with the coronavirus pandemic. Different machine learning techniques have been used in the literature to detect potential cases of coronavirus using routine clinical data (blood tests, and vital signs measurements). Data breaches and information leakage when using these models can bring reputational damage and cause legal issues for hospitals. In spite of this, protecting healthcare models against leakage of potentially sensitive information is an understudied research area. In this study, two machine learning techniques that aim to predict a patient's COVID-19 status are examined. Using adversarial training, robust deep learning architectures are explored with the aim to protect attributes related to demographic information about the patients. The two models examined in this work are intended to preserve sensitive information against adversarial attacks and information leakage. In a series of experiments using datasets from the Oxford University Hospitals (OUH), Bedfordshire Hospitals NHS Foundation Trust (BH), University Hospitals Birmingham NHS Foundation Trust (UHB), and Portsmouth Hospitals University NHS Trust (PUH), two neural networks are trained and evaluated. These networks predict PCR test results using information from basic laboratory blood tests, and vital signs collected from a patient upon arrival to the hospital. The level of privacy each one of the models can provide is assessed and the efficacy and robustness of the proposed architectures are compared with a relevant baseline. One of the main contributions in this work is the particular focus on the development of effective COVID-19 detection models with built-in mechanisms in order to selectively protect sensitive attributes against adversarial attacks. 
The results on the hold-out test set and external validation confirmed that adversarial learning had no impact on the generalisability of the model.

11.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34414415

ABSTRACT

Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method-heterogeneous graph attention network ('HGAT-AMR')-to predict anti-tuberculosis (TB) drug resistance. The HGAT-AMR model is able to accommodate incomplete phenotypic profiles, as well as provide 'attention scores' of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores encode the inputs, which the model is 'paying attention to' in making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic (AUROC) for isoniazid and rifampicin (98.53 and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the impact of resistance profile while considering particular drug resistance, which is consistent with domain knowledge.
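The "attention score" idea at the centre of the model's explainability can be illustrated with a minimal single-head, dot-product attention pooling step. The real HGAT-AMR uses learned, multi-head attention over a heterogeneous graph of genes and SNPs, so this is intuition only.

```python
# Minimal dot-product attention: score each node, normalize the scores to
# weights, and pool node features by those weights. The weights are what
# "attention score" interpretability reads off.
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def attend(query, node_feats):
    """query: (d,), node_feats: (n, d) -> (weights, pooled representation)."""
    scores = node_feats @ query      # one relevance score per node
    w = softmax(scores)              # attention weights, sum to 1
    return w, w @ node_feats         # weighted pooling of node features
```

In the paper's setting, high-weight nodes correspond to the genes/SNPs the model is "paying attention to" for a given drug-resistance prediction.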


Subject(s)
Antitubercular Agents/pharmacology , Drug Resistance, Bacterial/genetics , Mycobacterium tuberculosis/drug effects , Microbial Sensitivity Tests , Mycobacterium tuberculosis/genetics , Polymorphism, Single Nucleotide
12.
Healthc Technol Lett ; 8(5): 105-117, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34221413

ABSTRACT

COVID-19 is a major, urgent, and ongoing threat to global health. Globally, more than 24 million people have been infected and the disease has claimed more than a million lives as of November 2020. Predicting which patients will need respiratory support is important for guiding individual patient treatment and for ensuring sufficient resources are available. The ability of six common Early Warning Scores (EWS) to identify respiratory deterioration, defined as the need for advanced respiratory support (high-flow nasal oxygen, continuous positive airways pressure, non-invasive ventilation, intubation) within a prediction window of 24 h, is evaluated. It is shown that these scores perform sub-optimally at this specific task. Therefore, an alternative EWS based on the Gradient Boosting Trees (GBT) algorithm is developed that is able to predict deterioration within the next 24 h with a high AUROC of 94% and an accuracy, sensitivity, and specificity of 70%, 96%, and 70%, respectively. The GBT model outperformed the best EWS (LDTEWS:NEWS), increasing the AUROC by 14%. Our GBT model makes its predictions based on the current and baseline measures of routinely available vital signs and blood tests.
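The gradient-boosted-trees early-warning idea, training on routine vitals/bloods to output a deterioration risk, can be sketched as below. The features, labels, and data are synthetic placeholders chosen for the example, not the study's variables or model configuration.

```python
# Illustrative GBT early warning score on synthetic vitals. The toy label
# rule (tachypnoea or low SpO2) is an assumption for demonstration only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
# columns: [resp_rate, SpO2, heart_rate, CRP] (illustrative)
X = rng.normal(loc=[18, 96, 80, 20], scale=[4, 3, 15, 30], size=(400, 4))
y = ((X[:, 0] > 22) | (X[:, 1] < 93)).astype(int)  # toy deterioration label

ews = GradientBoostingClassifier(random_state=0).fit(X, y)
risk = ews.predict_proba([[30.0, 88.0, 120.0, 150.0]])[0, 1]  # clearly unwell
```

In deployment the continuous `risk` would be thresholded, like a conventional EWS, to trigger escalation.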

13.
Lancet Digit Health ; 3(2): e78-e87, 2021 02.
Article in English | MEDLINE | ID: mdl-33509388

ABSTRACT

BACKGROUND: The early clinical course of COVID-19 can be difficult to distinguish from other illnesses driving presentation to hospital. However, viral-specific PCR testing has limited sensitivity and results can take up to 72 h for operational reasons. We aimed to develop and validate two early-detection models for COVID-19, screening for the disease among patients attending the emergency department and the subset being admitted to hospital, using routinely collected health-care data (laboratory tests, blood gas measurements, and vital signs). These data are typically available within the first hour of presentation to hospitals in high-income and middle-income countries, within the existing laboratory infrastructure. METHODS: We trained linear and non-linear machine learning classifiers to distinguish patients with COVID-19 from pre-pandemic controls, using electronic health record data for patients presenting to the emergency department and admitted across a group of four teaching hospitals in Oxfordshire, UK (Oxford University Hospitals). Data extracted included presentation blood tests, blood gas testing, vital signs, and results of PCR testing for respiratory viruses. Adult patients (>18 years) presenting to hospital before Dec 1, 2019 (before the first COVID-19 outbreak), were included in the COVID-19-negative cohort; those presenting to hospital between Dec 1, 2019, and April 19, 2020, with PCR-confirmed severe acute respiratory syndrome coronavirus 2 infection were included in the COVID-19-positive cohort. Patients who were subsequently admitted to hospital were included in their respective COVID-19-negative or COVID-19-positive admissions cohorts. Models were calibrated to sensitivities of 70%, 80%, and 90% during training, and performance was initially assessed on a held-out test set generated by an 80:20 split stratified by patients with COVID-19 and balanced equally with pre-pandemic controls. 
To simulate real-world performance at different stages of an epidemic, we generated test sets with varying prevalences of COVID-19 and assessed predictive values for our models. We prospectively validated our 80% sensitivity models for all patients presenting or admitted to the Oxford University Hospitals between April 20 and May 6, 2020, comparing model predictions with PCR test results. FINDINGS: We assessed 155 689 adult patients presenting to hospital between Dec 1, 2017, and April 19, 2020. 114 957 patients were included in the COVID-negative cohort and 437 in the COVID-positive cohort, for a full study population of 115 394 patients, with 72 310 admitted to hospital. With a sensitive configuration of 80%, our emergency department (ED) model achieved 77·4% sensitivity and 95·7% specificity (area under the receiver operating characteristic curve [AUROC] 0·939) for COVID-19 among all patients attending hospital, and the admissions model achieved 77·4% sensitivity and 94·8% specificity (AUROC 0·940) for the subset of patients admitted to hospital. Both models achieved high negative predictive values (NPV; >98·5%) across a range of prevalences (≤5%). We prospectively validated our models for all patients presenting and admitted to Oxford University Hospitals in a 2-week test period. The ED model (3326 patients) achieved 92·3% accuracy (NPV 97·6%, AUROC 0·881), and the admissions model (1715 patients) achieved 92·5% accuracy (97·7%, 0·871) in comparison with PCR results. Sensitivity analyses to account for uncertainty in negative PCR results improved apparent accuracy (ED model 95·1%, admissions model 94·1%) and NPV (ED model 99·0%, admissions model 98·5%). INTERPRETATION: Our models performed effectively as a screening test for COVID-19, excluding the illness with high-confidence by use of clinical data routinely available within 1 h of presentation to hospital. 
Our approach is rapidly scalable, fitting within the existing laboratory testing infrastructure and standard of care of hospitals in high-income and middle-income countries. FUNDING: Wellcome Trust, University of Oxford, Engineering and Physical Sciences Research Council, National Institute for Health Research Oxford Biomedical Research Centre.
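The sensitivity-targeted calibration described in the METHODS above (models calibrated to 70%, 80%, and 90% sensitivity) amounts to choosing a decision threshold on validation scores. A hedged sketch of that step, with made-up scores and a helper name of my own choosing, is below.

```python
# Pick the decision threshold so the classifier operates at (at least) a
# target sensitivity on validation data. Scores/labels here are examples.
import numpy as np

def threshold_for_sensitivity(scores, labels, target=0.80):
    """Return the highest threshold whose sensitivity >= target."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = np.sort(scores[labels == 1])[::-1]   # positive scores, descending
    k = int(np.ceil(target * len(pos)))        # positives we must capture
    return pos[k - 1]                          # k-th highest positive score
```

Classifying `score >= threshold` as positive then captures at least the target fraction of true positives on the validation set; specificity follows from whatever the negatives do at that threshold.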


Subject(s)
Artificial Intelligence , COVID-19 , Hematologic Tests , Mass Screening , Predictive Value of Tests , Triage , Adult , Emergency Service, Hospital , Hospitalization , Hospitals , Humans , Middle Aged , Prospective Studies
14.
Front Plant Sci ; 11: 590529, 2020.
Article in English | MEDLINE | ID: mdl-33381132

ABSTRACT

Early prediction of pathogen infestation is a key factor in reducing disease spread in plants. Macrophomina phaseolina (Tassi) Goid, one of the main causes of charcoal rot disease, suppresses plant productivity significantly. Charcoal rot disease is one of the most severe threats to soybean productivity. Prediction of this disease in soybeans is tedious and impractical using traditional approaches. Machine learning (ML) techniques have recently gained substantial traction across numerous domains. ML methods can be applied to detect plant diseases prior to the full appearance of symptoms. In this paper, several ML techniques were developed and examined for the prediction of charcoal rot disease in soybean for a cohort of 2,000 healthy and infected plants. A hybrid set of physiological and morphological features was suggested as input to the ML models. All developed ML models performed better than 90% in terms of accuracy. Gradient Tree Boosting (GBT) was the best-performing classifier, obtaining 96.25% sensitivity and 97.33% specificity. Our findings support the applicability of ML, especially GBT, for charcoal rot disease prediction in a real environment. Moreover, our analysis demonstrates the importance of including physiological features in the learning. The collected dataset and source code can be found at https://github.com/Elham-khalili/Soybean-Charcoal-Rot-Disease-Prediction-Dataset-code.

15.
Front Microbiol ; 11: 667, 2020.
Article in English | MEDLINE | ID: mdl-32390972

ABSTRACT

Resistance prediction and mutation ranking are important tasks in the analysis of Tuberculosis sequence data. Due to standard regimens for the use of first-line antibiotics, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously should therefore enable patterns reflecting resistance co-occurrence to be exploited for resistance prediction. Here, multi-label random forest (MLRF) models are compared with single-label random forest (SLRF) models, both for predicting phenotypic resistance from whole genome sequences and for identifying important mutations for better prediction, for four first-line drugs in a dataset of 13402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared to conventional clinical methods (by 18.10%) and SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that retraining on a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php.
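The multi-label set-up described above, one forest predicting resistance to several drugs at once so that co-occurrence patterns are shared across labels, can be sketched with scikit-learn's native multi-output support. The mutation matrix and label rules below are random placeholders, not M. tuberculosis genotypes.

```python
# Multi-label random forest: fit a single forest against a 2-D label matrix
# (one column per drug). Data are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 50))     # toy presence/absence of mutations
# two correlated resistance labels driven by overlapping toy mutations
y = np.column_stack([
    X[:, 0] | X[:, 1],
    X[:, 0] | X[:, 2],
])

mlrf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
pred = mlrf.predict(X[:5])                 # one row per isolate, one column per drug
```

Because both labels depend on the shared mutation in column 0, the trees can reuse that split for both drugs, which is the multi-label advantage the paper exploits.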

16.
Viruses ; 11(5)2019 04 26.
Article in English | MEDLINE | ID: mdl-31035503

ABSTRACT

Advances in DNA sequencing technology are facilitating genomic analyses of unprecedented scope and scale, widening the gap between our abilities to generate and fully exploit biological sequence data. Comparable analytical challenges are encountered in other data-intensive fields involving sequential data, such as signal processing, in which dimensionality reduction (i.e., compression) methods are routinely used to lessen the computational burden of analyses. In this work, we explored the application of dimensionality reduction methods to numerically represent high-throughput sequence data for three important biological applications of virus sequence data: reference-based mapping, short sequence classification and de novo assembly. Leveraging highly compressed sequence transformations to accelerate sequence comparison, our approach yielded comparable accuracy to existing approaches, further demonstrating its suitability for sequences originating from diverse virus populations. We assessed the application of our methodology using both synthetic and real viral pathogen sequences. Our results show that the use of highly compressed sequence approximations can provide accurate results, with analytical performance retained and even enhanced through appropriate dimensionality reduction of sequence data.
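The general compression idea above, numerically encode sequences and compare them in a much lower-dimensional space, can be sketched with a Johnson-Lindenstrauss-style random projection. This is one standard stand-in chosen for illustration; the paper's specific transformations differ, and the read length and target dimension below are assumptions.

```python
# One-hot encode fixed-length reads, then project with a random matrix;
# distances are approximately preserved while comparisons get cheaper.
import numpy as np

BASE = {"A": 0, "C": 1, "G": 2, "T": 3}
READ_LEN, K = 60, 16                     # assumed read length and target dim

rng = np.random.default_rng(0)
R = rng.normal(size=(4 * READ_LEN, K)) / np.sqrt(K)  # projection matrix

def one_hot(seq):
    v = np.zeros((len(seq), 4))
    v[np.arange(len(seq)), [BASE[b] for b in seq]] = 1.0
    return v.ravel()

def compress(seq):
    """Map a READ_LEN-base read to a K-dimensional vector."""
    return one_hot(seq) @ R
```

Similar reads stay close after projection while unrelated reads stay far apart, so nearest-neighbour queries (the core of mapping and classification) can run in the compressed space.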


Subject(s)
Computational Biology , DNA Viruses/classification , DNA Viruses/genetics , Genome, Viral , Genomics , Computational Biology/methods , Genomics/methods , Humans
17.
Sci Rep ; 9(1): 2159, 2019 02 15.
Article in English | MEDLINE | ID: mdl-30770850

ABSTRACT

Algorithms in bioinformatics use textual representations of genetic information: sequences of the characters A, T, G and C represented computationally as strings or sub-strings. Signal and image processing methods offer a rich source of alternative descriptors, as they are designed to work with noisy data without the need for exact matching. Here we introduce a method, multi-resolution local binary patterns (MLBP), adapted from image processing to extract local 'texture' changes from nucleotide sequence data. We apply this feature space to the alignment-free binning of metagenomic data. The effectiveness of MLBP is demonstrated using both simulated and real human gut microbial communities. Sequence reads or contigs can be represented as vectors and their 'texture' compared efficiently using machine learning algorithms that perform dimensionality reduction to capture eigengenome information and clustering (here, randomized singular value decomposition and BH-tSNE). The intuition behind our method is that MLBP feature vectors permit sequence comparison without the need for explicit pairwise matching. We demonstrate that this approach outperforms existing methods based on k-mer frequencies. The signal processing method MLBP thus offers a viable alternative feature space to textual representations of sequence data. The source code for our Multi-resolution Genomic Binary Patterns method can be found at https://github.com/skouchaki/MrGBP.
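A one-dimensional local binary pattern over an encoded nucleotide sequence can be sketched as below: each position is compared against its neighbours at a given radius to form a bit code, and the histogram of codes over the sequence becomes the 'texture' feature; concatenating histograms at several radii gives the multi-resolution vector. This is a toy illustration of the idea, not the published MrGBP implementation.

```python
# Toy 1-D multi-resolution local binary patterns (MLBP) for sequences;
# encoding and code layout are illustrative assumptions.
import numpy as np

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def lbp_histogram(seq: str, radius: int) -> np.ndarray:
    """Normalised histogram of 2*radius-bit LBP codes along the sequence."""
    x = np.array([ENCODE[b] for b in seq])
    hist = np.zeros(2 ** (2 * radius))
    for i in range(radius, len(x) - radius):
        code = 0
        for off in range(-radius, radius + 1):
            if off == 0:
                continue  # skip the centre base itself
            code = (code << 1) | (1 if x[i + off] >= x[i] else 0)
        hist[code] += 1
    return hist / (hist.sum() + 1e-12)

def mlbp(seq: str, radii=(1, 2)) -> np.ndarray:
    """Concatenate LBP histograms at several radii ('multi-resolution')."""
    return np.concatenate([lbp_histogram(seq, r) for r in radii])

a = mlbp("ACGTACGTACGTACGT")
b = mlbp("ACGTACGAACGTACGT")   # near-identical local texture
c = mlbp("AAAACCCCGGGGTTTT")   # very different local texture
print(np.linalg.norm(a - b), np.linalg.norm(a - c))
```

As in the abstract, no pairwise alignment is needed: reads with similar local texture land close together in this vector space, so standard dimensionality reduction and clustering can then bin them.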


Subject(s)
Bacteria/classification , Bacteria/genetics , Computational Biology/methods , Metagenomics/methods , Sequence Homology, Nucleic Acid , Algorithms , Gastrointestinal Microbiome , Microbiota
18.
Annu Int Conf IEEE Eng Med Biol Soc ; 2019: 3115-3118, 2019 Jul.
Article in English | MEDLINE | ID: mdl-31946547

ABSTRACT

In this study, a novel sleep pose identification method is proposed for classifying 12 different sleep postures using a two-step deep learning process. In the first step, transfer learning retrains a well-known CNN (VGG-19) to categorise the data into four main pose classes: supine, left, right, and prone. According to the decision made by VGG-19, each image is then passed to one of four dedicated sub-class CNNs. As a result, the pose estimation label is refined from one of four sleep pose labels to one of 12. Ten participants contributed infrared (IR) images of the 12 pre-defined sleep positions; participants were covered by a blanket to occlude the pose and present a more realistic sleep situation. Finally, we compared our results with (1) a traditional CNN trained from scratch and (2) a VGG-19 network retrained in a single stage. The average accuracy increased to 85.6%, from 74.5% for (1) and 78.1% for (2).
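The two-stage routing logic described above (a coarse model picks one of four main poses, then a dedicated per-pose model refines to one of 12 labels) can be sketched without the deep learning machinery. Here simple logistic-regression models on synthetic 16-dimensional feature vectors stand in for VGG-19 and the sub-class CNNs; everything about the data is an assumption for illustration.

```python
# Hierarchical two-stage classification sketch: coarse model routes each
# sample to one of 4 groups; a dedicated refiner per group picks 1 of 12
# final labels. Stand-in models, synthetic features (not the paper's CNNs).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_main, n_sub = 4, 3                       # 4 main poses x 3 variants = 12 labels
centres = rng.normal(scale=6.0, size=(n_main * n_sub, 16))

def sample(n):
    labels = rng.integers(0, n_main * n_sub, size=n)
    return centres[labels] + rng.normal(size=(n, 16)), labels

X_tr, y_tr = sample(600)
X_te, y_te = sample(200)

# Stage 1: coarse model over the main pose (label // n_sub).
coarse = LogisticRegression(max_iter=1000).fit(X_tr, y_tr // n_sub)

# Stage 2: one dedicated refinement model per main pose.
fine = {}
for m in range(n_main):
    idx = (y_tr // n_sub) == m
    fine[m] = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])

def predict(X):
    main = coarse.predict(X)
    return np.array([fine[m].predict(x[None])[0] for m, x in zip(main, X)])

acc = (predict(X_te) == y_te).mean()
print(f"two-stage accuracy: {acc:.3f}")
```

The design point is the same as in the paper: each refinement model only ever sees samples from its own coarse class, so it solves an easier 3-way problem instead of the full 12-way one.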


Subject(s)
Deep Learning , Neural Networks, Computer , Posture , Sleep , Humans
19.
Bioinformatics ; 35(13): 2276-2282, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30462147

ABSTRACT

MOTIVATION: Timely identification of Mycobacterium tuberculosis (MTB) resistance to existing drugs is vital to decrease mortality and prevent the amplification of existing antibiotic resistance. Machine learning methods have been widely applied to predicting the resistance of MTB to a specific drug and to identifying resistance markers. However, they have not been validated on a large cohort of MTB samples from multiple centres across the world in terms of resistance prediction and resistance marker identification. Several machine learning classifiers and linear dimension reduction techniques were developed and compared on a cohort of 13,402 isolates collected from 16 countries across 6 continents and tested against 11 drugs. RESULTS: Compared with a conventional molecular diagnostic test, the area under curve of the best machine learning classifier increased for all drugs, most notably by 23.11%, 15.22% and 10.14% for pyrazinamide, ciprofloxacin and ofloxacin, respectively (P < 0.01). Logistic regression and gradient tree boosting were found to perform better than other techniques. Moreover, logistic regression/gradient tree boosting with a sparse principal component analysis/non-negative matrix factorization step, compared with the classifier alone, enhanced the best performance in terms of F1-score by 12.54%, 4.61%, 7.45% and 9.58% for amikacin, moxifloxacin, ofloxacin and capreomycin, respectively, as well as increasing the area under curve for amikacin and capreomycin. The results provide a comprehensive comparison of the techniques and confirm the value of machine learning for better prediction on large, diverse tuberculosis data. Furthermore, mutation ranking showed the possibility of finding new resistance/susceptibility markers. AVAILABILITY AND IMPLEMENTATION: The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
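The comparison described above, a classifier alone versus the same classifier preceded by a linear dimension-reduction step, maps directly onto a scikit-learn pipeline. The sketch below uses non-negative matrix factorization before logistic regression on a synthetic binary mutation matrix; the data and the chosen component count are assumptions for illustration, not the study's setup.

```python
# Sketch: classifier alone vs dimension reduction + classifier,
# evaluated by AUC on synthetic binary mutation data (illustrative).
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(600, 50)).astype(float)
y = ((X[:, 0] + X[:, 1] + X[:, 2]) >= 2).astype(int)   # resistance from 3 markers

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Classifier alone.
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Linear dimension reduction (NMF) followed by the same classifier.
reduced = make_pipeline(
    NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0),
    LogisticRegression(max_iter=1000),
).fit(X_tr, y_tr)

for name, model in [("LR alone", plain), ("NMF + LR", reduced)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

On real mutation matrices the reduction step can denoise correlated variants and improve the downstream classifier, which is the effect the abstract reports for amikacin and capreomycin; on this toy deterministic data it mainly demonstrates the pipeline shape.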


Subject(s)
Mycobacterium tuberculosis , Tuberculosis , Antitubercular Agents , Humans , Machine Learning
20.
J Neurosci Methods ; 273: 96-106, 2016 11 01.
Article in English | MEDLINE | ID: mdl-27528379

ABSTRACT

BACKGROUND: Manual sleep scoring is tedious and time consuming, and even among automatic methods such as time-frequency (T-F) representations there is still room for improvement. NEW METHOD: To optimise the efficiency of T-F domain analysis of sleep electroencephalography (EEG), a novel approach is proposed for automatically identifying brain waves, sleep spindles, and K-complexes in sleep EEG signals. The proposed method is based on singular spectrum analysis (SSA). The single-channel EEG signal (C3-A2) is first decomposed, the desired components are automatically separated, and noise is removed to enhance the discriminative ability of the features. The T-F features obtained after this preprocessing stage are classified using multi-class support vector machines (SVMs) and used to identify four sleep stages over three sleep types. Furthermore, to emphasise the usefulness of the proposed method, the automatically determined spindles are parameterised to discriminate the three sleep types. RESULTS: The four sleep stages were classified with the SVM twice: with and without the preprocessing stage. The mean accuracy, sensitivity, and specificity before preprocessing were 71.5±0.11%, 56.1±0.09% and 86.8±0.04%, respectively; these values increased significantly to 83.6±0.07%, 70.6±0.14% and 90.8±0.03% after applying SSA. COMPARISON WITH EXISTING METHODS: The new T-F representation was compared with existing benchmarks. Our results show that the proposed method outperforms previous methods in the identification and representation of sleep stages. CONCLUSION: Experimental results confirm the improvement in classification rate and in the representativeness of the T-F domain.
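The SSA decomposition step at the heart of the method above can be sketched in a few lines: embed the single-channel signal into a Hankel trajectory matrix, take its SVD, and reconstruct additive components by diagonal averaging of the rank-1 elementary matrices. This is a generic textbook SSA sketch applied to a toy signal, not the paper's full pipeline (no SVM stage, and the window length is an assumption).

```python
# Minimal singular spectrum analysis (SSA) for a single-channel signal.
import numpy as np

def ssa_components(x: np.ndarray, window: int, n_components: int) -> np.ndarray:
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: lagged copies of the signal as columns.
    traj = np.column_stack([x[i:i + window] for i in range(k)])
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    comps = np.zeros((n_components, n))
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            counts[i + j] += 1          # anti-diagonal multiplicities
    for c in range(n_components):
        elem = s[c] * np.outer(u[:, c], vt[c])   # rank-1 elementary matrix
        for i in range(window):
            for j in range(k):
                comps[c, i + j] += elem[i, j]
        comps[c] /= counts                        # diagonal averaging
    return comps

# Toy "EEG": slow wave plus a fast spindle-like oscillation plus noise.
t = np.linspace(0, 2, 400)
x = (np.sin(2 * np.pi * 1.5 * t)
     + 0.4 * np.sin(2 * np.pi * 13.0 * t)
     + 0.05 * np.random.default_rng(0).normal(size=t.size))

comps = ssa_components(x, window=80, n_components=4)
recon = comps.sum(axis=0)
err = np.linalg.norm(x - recon) / np.linalg.norm(x)
print(f"relative reconstruction error with 4 components: {err:.3f}")
```

Each pure oscillation contributes roughly two leading singular components, so keeping the top four separates the slow wave and the spindle-like band from the noise floor, which is the separation the preprocessing stage exploits before feature extraction.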


Subject(s)
Brain Waves/physiology , Electroencephalography/classification , Electroencephalography/methods , Sleep/physiology , Spectrum Analysis/methods , Brain Mapping , Female , Humans , Male , Polysomnography , Sensitivity and Specificity , Support Vector Machine , Time Factors , Wakefulness/physiology