1.
Article in English | MEDLINE | ID: mdl-36201419

ABSTRACT

The discovery of causal relationships is a fundamental problem in science and medicine. In recent years, many elegant approaches to discovering causal relationships between two variables from observational data have been proposed. However, most of these deal only with purely directed causal relationships and cannot detect latent common causes. Here, we devise a general heuristic which takes a causal discovery algorithm that can only distinguish purely directed causal relations and modifies it to also detect latent common causes. We apply our method to two directed causal discovery algorithms, the information geometric causal inference (IGCI) of Daniusis et al. (2010) and the kernel conditional deviance for causal inference of Mitrovic et al. (2018), and extensively test on synthetic data (detecting latent common causes in additive, multiplicative and complex noise regimes) and on real data, where we are able to detect known common causes. In addition to detecting latent common causes, our experiments demonstrate that both modified algorithms preserve the performance of the originals in distinguishing directed causal relations.
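For readers unfamiliar with the base algorithm named in this abstract, the following is a minimal sketch of the slope-based IGCI score for a purely directed decision between two variables. It is not the latent-common-cause heuristic described above, whose details the abstract does not give. The function names (`igci_slope_score`, `infer_direction`) and the choice of a uniform reference measure (rescaling both variables to [0, 1]) are assumptions made for illustration.

```python
import math


def _rescale(values):
    """Rescale values to [0, 1] (uniform reference measure, an assumption here)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def igci_slope_score(cause, effect):
    """Mean log absolute slope of effect against cause, after sorting by cause.

    A lower score is read as evidence for the direction cause -> effect.
    """
    pairs = sorted(zip(_rescale(cause), _rescale(effect)))
    total, count = 0.0, 0
    for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]):
        dx, dy = x1 - x0, y1 - y0
        # Skip ties, where the log slope is undefined.
        if dx > 0 and dy != 0:
            total += math.log(abs(dy) / dx)
            count += 1
    return total / count


def infer_direction(x, y):
    """Return 'X->Y' or 'Y->X' by comparing the two slope-based scores."""
    c_xy = igci_slope_score(x, y)
    c_yx = igci_slope_score(y, x)
    return "X->Y" if c_xy < c_yx else "Y->X"


if __name__ == "__main__":
    # Synthetic, noise-free example: x on a uniform grid, y a nonlinear function of x.
    x = [i / 199 for i in range(200)]
    y = [v ** 3 for v in x]
    print(infer_direction(x, y))
```

A decision rule of this kind can only output one of two directions, which is the limitation the abstract's heuristic is meant to address.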

3.
Nat Commun ; 11(1): 4754, 2020 09 16.
Article in English | MEDLINE | ID: mdl-32938913

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

4.
Nat Commun ; 11(1): 3923, 2020 08 11.
Article in English | MEDLINE | ID: mdl-32782264

ABSTRACT

Machine learning promises to revolutionize clinical decision making and diagnosis. In medical diagnosis, a doctor aims to explain a patient's symptoms by determining the diseases causing them. However, existing machine learning approaches to diagnosis are purely associative, identifying diseases that are strongly correlated with a patient's symptoms. We show that this inability to disentangle correlation from causation can result in sub-optimal or dangerous diagnoses. To overcome this, we reformulate diagnosis as a counterfactual inference task and derive counterfactual diagnostic algorithms. We compare our counterfactual algorithms to the standard associative algorithm and 44 doctors using a test set of clinical vignettes. While the associative algorithm achieves an accuracy placing in the top 48% of doctors in our cohort, our counterfactual algorithm places in the top 25% of doctors, achieving expert clinical accuracy. Our results show that causal reasoning is a vital missing ingredient for applying machine learning to medical diagnosis.


Subject(s)
Data Accuracy; Diagnosis; Machine Learning; Algorithms; Bayes Theorem; Data Collection; Decision Making; Diagnosis, Computer-Assisted; Disease; Humans; Models, Statistical
5.
Front Digit Health ; 2: 569261, 2020.
Article in English | MEDLINE | ID: mdl-34713043

ABSTRACT

Background: AI-driven digital health tools often rely on estimates of disease incidence or prevalence, but obtaining these estimates is costly and time-consuming. We explored the use of machine learning models that leverage contextual information about diseases from unstructured text to estimate disease incidence. Methods: We used a class of machine learning models, called language models, to extract contextual information relating to disease incidence. We evaluated three different language models: BioBERT, Global Vectors for Word Representation (GloVe), and the Universal Sentence Encoder (USE), as well as an approach which uses all three jointly. The output of these models is a mathematical representation of the underlying data, known as "embeddings." We used these to train neural network models to predict disease incidence. The neural networks were trained and validated using data from the Global Burden of Disease study, and tested using independent data sourced from the epidemiological literature. Findings: A variety of language models can be used to encode contextual information about diseases. We found that, on average, BioBERT embeddings were the best for disease names across multiple tasks. In particular, BioBERT was the best performing model when predicting specific disease-country pairs, whilst a fusion model combining BioBERT, GloVe, and USE performed best on average when predicting disease incidence in unseen countries. We also found that GloVe embeddings performed better than BioBERT embeddings when applied to country names. However, the models were limited in their ability to predict previously unseen diseases. Further limitations were observed, with substantial variations across age groups and notably lower performance for diseases that are highly dependent on location and climate. Interpretation: We demonstrate that context-aware machine learning models can be used for estimating disease incidence. This method is quicker to implement than traditional epidemiological approaches. We therefore suggest it complements existing modeling efforts where data is required more rapidly or at larger scale. This may particularly benefit AI-driven digital health products where the data will undergo further processing and a validated approximation of the disease incidence is adequate.

6.
Front Artif Intell ; 3: 543405, 2020.
Article in English | MEDLINE | ID: mdl-33733203

ABSTRACT

AI virtual assistants have significant potential to alleviate the pressure on overly burdened healthcare systems by enabling patients to self-assess their symptoms and to seek further care when appropriate. For these systems to make a meaningful contribution to healthcare globally, they must be trusted by patients and healthcare professionals alike, and serve the needs of patients in diverse regions and segments of the population. We developed an AI virtual assistant which provides patients with triage and diagnostic information. Crucially, the system is based on a generative model, which allows for relatively straightforward re-parameterization to reflect local disease and risk factor burden in diverse regions and population segments. This is an appealing property, particularly when considering the potential of AI systems to improve the provision of healthcare on a global scale, in many regions and in both developing and developed countries. We performed a prospective validation study of the accuracy and safety of the AI system and human doctors. Importantly, we assessed the accuracy and safety of both the AI and human doctors independently against identical clinical cases and, unlike previous studies, also accounted for the information gathering process of both agents. Overall, we found that the AI system is able to provide patients with triage and diagnostic information with a level of clinical accuracy and safety comparable to that of human doctors. Through this approach and study, we hope to start building trust in AI-powered systems by directly comparing their performance to human doctors, who do not always agree with each other on the cause of patients' symptoms or the most appropriate triage recommendation.
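The re-parameterization property mentioned in this abstract can be illustrated with a toy Bayesian calculation: the symptom likelihoods stay fixed while the disease priors are swapped to reflect local burden. This is only a sketch under strong assumptions, not the system described in the abstract. It treats the candidate causes as mutually exclusive, and every name and number below (`posterior`, the two regions, the probabilities) is hypothetical.

```python
def posterior(priors, likelihoods):
    """Bayes' rule over mutually exclusive causes: P(d | s) is proportional to P(s | d) * P(d)."""
    joint = {d: priors[d] * likelihoods[d] for d in priors}
    evidence = sum(joint.values())
    return {d: p / evidence for d, p in joint.items()}


# Hypothetical P(fever | cause); shared across regions.
likelihood_fever = {"malaria": 0.9, "influenza": 0.8, "other": 0.1}

# Hypothetical regional priors; only these change when re-parameterizing.
priors_region_a = {"malaria": 0.001, "influenza": 0.05, "other": 0.949}
priors_region_b = {"malaria": 0.05, "influenza": 0.02, "other": 0.93}

if __name__ == "__main__":
    for name, priors in [("A", priors_region_a), ("B", priors_region_b)]:
        post = posterior(priors, likelihood_fever)
        ranked = sorted(post, key=post.get, reverse=True)
        print(name, ranked, {d: round(p, 3) for d, p in post.items()})
```

With the same symptom model, the ranking of malaria against influenza flips between the two hypothetical regions purely because the priors differ.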

7.
Bioinformatics ; 22(4): 495-6, 2006 Feb 15.
Article in English | MEDLINE | ID: mdl-16357032

ABSTRACT

SEAN is an application that predicts single nucleotide polymorphisms (SNPs) using multiple sequence alignments produced from expressed sequence tag (EST) clusters. The algorithm uses rules of sequence identity and SNP abundance to determine the quality of the prediction. A Java viewer is provided to display the EST alignments and predicted SNPs.
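As a rough illustration of abundance-based SNP calling from an alignment, the sketch below scans alignment columns and reports positions where a second allele is seen often enough to be unlikely to be a sequencing error. The abstract does not state SEAN's actual rules, so the function name `candidate_snps` and the thresholds `min_minor_count` and `min_depth` are assumptions for illustration only.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_minor_count=2, min_depth=4):
    """Return (position, major_base, minor_base) for columns with a supported second allele.

    aligned_reads: equal-length strings from a multiple sequence alignment,
    with '-' marking gaps. Thresholds are illustrative, not SEAN's.
    """
    hits = []
    for pos in range(len(aligned_reads[0])):
        # Ignore gaps and ambiguous characters in this column.
        column = [read[pos] for read in aligned_reads if read[pos] in "ACGT"]
        if len(column) < min_depth:
            continue
        top_two = Counter(column).most_common(2)
        if len(top_two) == 2 and top_two[1][1] >= min_minor_count:
            hits.append((pos, top_two[0][0], top_two[1][0]))
    return hits


if __name__ == "__main__":
    reads = [
        "ACGTAC",
        "ACGTAC",
        "ACTTAC",
        "ACTTAC",
        "ACGTA-",
        "ACGAAC",  # single mismatch at position 3, treated as a likely error
    ]
    print(candidate_snps(reads))
```

Requiring the minor allele to appear in at least two sequences is one simple way to separate candidate polymorphisms from isolated sequencing errors in EST data.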


Subject(s)
DNA Mutational Analysis/methods; Expressed Sequence Tags; Polymorphism, Single Nucleotide/genetics; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Software; User-Computer Interface; Algorithms; Chromosome Mapping/methods; Cluster Analysis; Computer Graphics; Pattern Recognition, Automated/methods