Results 1 - 4 of 4
1.
NPJ Digit Med ; 3: 30, 2020.
Article in English | MEDLINE | ID: mdl-32195365

ABSTRACT

Autoimmune diseases are chronic, multifactorial conditions. Through machine learning (ML), a branch of the wider field of artificial intelligence, it is possible to extract patterns within patient data and exploit these patterns to predict patient outcomes for improved clinical management. Here, we surveyed the use of ML methods to address clinical problems in autoimmune disease. A systematic review was conducted using the MEDLINE, Embase, and Computers and Applied Sciences Complete databases. Relevant papers included "machine learning" or "artificial intelligence" and the autoimmune disease search term(s) in their title, abstract or keywords. Exclusion criteria were: studies not written in English, no real human patient data included, publication prior to 2001, studies that were not peer reviewed, non-autoimmune disease comorbidity research, and review papers. Of 702 studies, 169 met the criteria for inclusion. Support vector machines and random forests were the most popular ML methods used. ML models using data on multiple sclerosis, rheumatoid arthritis and inflammatory bowel disease were most common. A small proportion of studies (7.7% or 13/169) combined different data types in the modelling process. Cross-validation combined with a separate testing set, for more robust model evaluation, occurred in 8.3% of papers (14/169). The field may benefit from adopting a best practice of validation, cross-validation and independent testing of ML models. Many models achieved good predictive results in simple scenarios (e.g. classification of cases and controls). Progression to more complex predictive models may be achievable in future through integration of multiple data types.
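
The evaluation practice the abstract recommends (cross-validation plus a separate, independent testing set) can be sketched as an index-splitting routine. This is a minimal illustration, not the review's protocol: the function name `split_holdout_and_folds`, the 20% test fraction, the choice of 5 folds and the sample count of 100 are all assumptions made for the example.

```python
import random

def split_holdout_and_folds(n_samples, test_fraction=0.2, k=5, seed=0):
    """Return (test_idx, folds); folds is a list of (train_idx, val_idx) pairs.

    Hypothetical helper: the fraction, k and seed are illustrative defaults.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Set the independent test set aside first; it is never used for tuning.
    n_test = int(round(n_samples * test_fraction))
    test_idx = indices[:n_test]
    dev_idx = indices[n_test:]

    # Partition the remaining samples into k disjoint validation folds.
    folds = []
    for i in range(k):
        val_idx = dev_idx[i::k]
        val_set = set(val_idx)
        train_idx = [j for j in dev_idx if j not in val_set]
        folds.append((train_idx, val_idx))
    return test_idx, folds

test_idx, folds = split_holdout_and_folds(n_samples=100)
for train_idx, val_idx in folds:
    # A model would be fitted on train_idx and scored on val_idx here.
    print(len(train_idx), len(val_idx))
```

Model selection would use only the folds; the held-out `test_idx` samples are scored once, at the end, to estimate performance on unseen patients.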

2.
BMC Bioinformatics ; 20(1): 254, 2019 May 16.
Article in English | MEDLINE | ID: mdl-31096927

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) is revolutionising diagnosis and treatment of rare diseases; however, its application to understanding common disease aetiology is limited. Rare disease applications binarily attribute genetic change(s) at a single locus to a specific phenotype. In common diseases, where multiple genetic variants within and across genes contribute to disease, binary modelling cannot capture the burden of pathogenicity harboured by an individual across a given gene/pathway. We present GenePy, a novel gene-level scoring system for integration and analysis of next-generation sequencing data on a per-individual basis that transforms NGS data interpretation from variant-level to gene-level. This simple and flexible scoring system is intuitive and amenable to integration for machine learning, network and topological approaches, facilitating the investigation of complex phenotypes. RESULTS: Whole-exome sequencing data from 508 individuals were used to generate GenePy scores. For each variant a score is calculated incorporating: i) population allele frequency estimates; ii) individual zygosity, determined through standard variant calling pipelines; and iii) any user-defined deleteriousness metric to inform on functional impact. GenePy then combines scores generated for all variants observed into a single gene score for each individual. We generated a matrix of ~14,000 GenePy scores for all individuals for each of sixteen popular deleteriousness metrics. All per-gene scores are corrected for gene length. The majority of genes generate GenePy scores < 0.01, although individuals harbouring multiple rare highly deleterious mutations can accumulate extremely high GenePy scores. In the absence of a comparator metric, we examine GenePy performance in discriminating genes known to be associated with three common, complex diseases. 
A Mann-Whitney U test conducted on GenePy scores for this positive control gene in cases versus controls demonstrates markedly more significant results (p = 1.37 × 10⁻⁴) compared to the most commonly applied association tool that combines common and rare variation (p = 0.003). CONCLUSIONS: Per-gene per-individual GenePy scores are intuitive when assessing genetic variation in individual patients or comparing scores between groups. GenePy outperforms the currently accepted best practice tools for combining common and rare variation. GenePy scores are suitable for downstream data integration with transcriptomic and proteomic data that also report at the gene level.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Software , Virulence/genetics , Alleles , Cohort Studies , Databases, Genetic , Exome , Gene Frequency/genetics , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Exome Sequencing , Zygote/metabolism
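
The abstract above names the three inputs to a per-variant score (population allele frequency, individual zygosity, a deleteriousness metric) and says the variant scores are combined per gene and corrected for gene length, but it does not give the formula. The sketch below is one plausible form under stated assumptions, not the GenePy implementation: the log-frequency weighting, the function names `variant_score` and `gene_score`, the division by length in base pairs and all input values are hypothetical.

```python
import math

def variant_score(alt_freq, zygosity, deleteriousness):
    """Score one variant for one individual (assumed, illustrative form).

    alt_freq: population frequency of the alternate allele, in (0, 1).
    zygosity: number of alternate alleles carried (0, 1 or 2).
    deleteriousness: user-chosen metric, assumed scaled to [0, 1].
    """
    if zygosity == 0:
        return 0.0
    ref_freq = 1.0 - alt_freq
    # Frequencies of the two alleles the individual carries at this site.
    pair = (alt_freq, alt_freq) if zygosity == 2 else (ref_freq, alt_freq)
    # Rarer alleles give a larger negative log, so they contribute more.
    return -deleteriousness * math.log10(pair[0] * pair[1])

def gene_score(variants, gene_length_bp):
    """Sum variant scores within a gene and correct for gene length (assumed: per bp)."""
    total = sum(variant_score(*v) for v in variants)
    return total / gene_length_bp

# Hypothetical individual: one rare homozygous and one common heterozygous variant.
variants = [(0.01, 2, 0.9), (0.30, 1, 0.2)]
print(gene_score(variants, gene_length_bp=3000))
```

Under this form a rare, homozygous, highly deleterious variant dominates the gene score, which matches the abstract's remark that individuals with multiple rare deleterious mutations accumulate high scores.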
3.
Sci Rep ; 7(1): 2427, 2017 May 25.
Article in English | MEDLINE | ID: mdl-28546534

ABSTRACT

Paediatric inflammatory bowel disease (PIBD), comprising Crohn's disease (CD), ulcerative colitis (UC) and inflammatory bowel disease unclassified (IBDU), is a complex and multifactorial condition with increasing incidence. An accurate diagnosis of PIBD is necessary for prompt and effective treatment. This study utilises machine learning (ML) to classify disease using endoscopic and histological data for 287 children diagnosed with PIBD. Data were used to develop, train, test and validate an ML model to classify disease subtype. Unsupervised models revealed overlap of CD/UC with broad clustering but no clear subtype delineation, whereas hierarchical clustering identified four novel subgroups characterised by differing colonic involvement. Three supervised ML models were developed utilising endoscopic data only, histological data only and combined endoscopic/histological data, yielding classification accuracies of 71.0%, 76.9% and 82.7% respectively. The optimal combined model was tested on a statistically independent cohort of 48 PIBD patients from the same clinic, accurately classifying 83.3% of patients. This study employs mathematical modelling of endoscopic and histological data to aid diagnostic accuracy. While unsupervised modelling categorises patients into four subgroups, supervised approaches confirm the need for both endoscopic and histological evidence for an accurate diagnosis. Overall, this paper provides a blueprint for ML use with clinical data.


Subject(s)
Inflammatory Bowel Diseases/diagnosis , Machine Learning , Adolescent , Age Factors , Child , Child, Preschool , Cluster Analysis , Female , Humans , Infant , Infant, Newborn , Male , Models, Theoretical , ROC Curve , Reproducibility of Results , Supervised Machine Learning , Unsupervised Machine Learning
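
As a small arithmetic check on the independent-cohort result, the reported 83.3% accuracy on 48 patients can be converted back to a count of correctly classified patients. The helper name `consistent_counts` is hypothetical, and the check assumes accuracy was reported as correct/total rounded to one decimal place.

```python
def consistent_counts(n_total, reported_pct, decimals=1):
    """Integer counts of correct predictions that round to the reported percentage."""
    return [k for k in range(n_total + 1)
            if round(100 * k / n_total, decimals) == reported_pct]

# Independent cohort from the abstract: 48 patients, 83.3% classified accurately.
print(consistent_counts(48, 83.3))  # → [40]
```

Only 40 of 48 is consistent with the reported figure under that assumption.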
4.
Heredity (Edinb) ; 117(5): 375-382, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27381324

ABSTRACT

The analysis of linkage disequilibrium (LD) underpins the development of effective genotyping technologies, trait mapping and understanding of biological mechanisms such as those driving recombination and the impact of selection. We apply the Malécot-Morton model of LD to create additive LD maps that describe the high-resolution LD landscape of commercial chickens. We investigated LD in chickens (Gallus gallus) at the highest resolution to date for broiler, white egg and brown egg layer commercial lines. There is minimal concordance between breeds of fine-scale LD patterns (correlation coefficient < 0.21), and even between discrete broiler lines. Regions of LD breakdown, which may align with recombination hot spots, are enriched near CpG islands and transcription start sites (P < 2.2 × 10⁻¹⁶), consistent with recent evidence described in finches, but concordance in hot spot locations between commercial breeds is only marginally greater than random. As in other birds, functional elements in the chicken genome are associated with recombination but, unlike evidence from other bird species, the LD landscape is not stable in the populations studied. The development of optimal genotyping panels for genome-led selection programmes will depend on careful analysis of the LD structure of each line of interest. Further study is required to fully elucidate the mechanisms underlying highly divergent LD patterns found in commercial chickens.


Subject(s)
Chickens/genetics , Linkage Disequilibrium , Recombination, Genetic , Animals , Breeding , Chromosome Mapping , Genetics, Population , Genotyping Techniques , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
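
The abstract above refers to the Malécot-Morton model and additive LD maps without stating the equations. The sketch below uses the form in which that model is commonly written, ρ(d) = (1 − L)·M·exp(−ε·d) + L, and builds an additive map by summing ε·d over marker intervals to give LD units. The parameter values, interval lengths and function names are illustrative assumptions and are not taken from the study.

```python
import math

def malecot_rho(distance, epsilon, M=1.0, L=0.0):
    """Expected association between two markers separated by `distance`.

    epsilon: rate of exponential decline of LD with distance.
    M: association at zero distance (intercept).
    L: residual association at large distance (asymptote).
    """
    return (1.0 - L) * M * math.exp(-epsilon * distance) + L

def ldu_map(interval_lengths, interval_epsilons):
    """Cumulative LD-unit position of each marker; positions add across intervals."""
    positions = [0.0]
    for d, eps in zip(interval_lengths, interval_epsilons):
        positions.append(positions[-1] + eps * d)
    return positions

# Hypothetical region: three intervals (kb) with per-interval epsilon (per kb).
lengths = [10.0, 20.0, 5.0]
epsilons = [0.1, 0.0, 0.4]
print(ldu_map(lengths, epsilons))
print(malecot_rho(distance=10.0, epsilon=0.1, M=0.9, L=0.1))
```

In such a map, an interval with ε near zero adds no LD units and appears as a flat block of strong LD, while a steep step marks LD breakdown of the kind the abstract associates with possible recombination hot spots.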