
Essentiality, protein-protein interactions and evolutionary properties are key predictors for identifying cancer-associated genes using machine learning.

Safadi, Amro; Lovell, Simon C; Doig, Andrew J.

Sci Rep ; 14(1): 9199, 2024 04 22.

Article in English | MEDLINE | ID: mdl-38649399

ABSTRACT

The distinctive nature of cancer as a disease prompts an exploration of the special characteristics that genes implicated in cancer exhibit. The identification of cancer-associated genes and their characteristics is crucial to furthering our understanding of this disease and to improving the likelihood that therapeutic drug targets succeed. However, the rate at which cancer genes are being identified experimentally is slow. Applying predictive analysis techniques, through the building of accurate machine learning models, is potentially a useful approach to enhancing the identification rate of these genes and their characteristics. Here, we investigated gene essentiality scores and found that they tend to be higher for cancer-associated genes than for other protein-coding human genes. We built a dataset of extended gene properties linked to essentiality and used it to train a machine-learning model; this model reached 89% accuracy and an Area Under the Curve (AUC) > 0.85. The model showed that essentiality, evolutionary-related properties, and properties arising from protein-protein interaction networks are particularly effective in predicting cancer-associated genes. We were able to use the model to identify potential candidate genes that have not been previously linked to cancer. Prioritising genes that score highly by our methods could aid scientists in their cancer gene research.
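
The two headline metrics in this abstract, accuracy and AUC, can be illustrated with a minimal sketch. It does not reproduce the authors' model, features, or data: the labels and scores below are invented toy values, the 0.5 decision threshold is an assumption, and the function names are hypothetical. AUC is computed here as the fraction of (positive, negative) pairs in which the positive example receives the higher score.

```python
def accuracy(labels, predictions):
    """Fraction of examples whose predicted class matches the true class."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (invented for illustration): 1 = cancer-associated, 0 = other gene.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
toy_predictions = [1 if s >= 0.5 else 0 for s in toy_scores]

toy_accuracy = accuracy(toy_labels, toy_predictions)
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"accuracy = {toy_accuracy:.2f}, AUC = {toy_auc:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is one reason the two are often reported together.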

Subject(s)

Genes, Essential , Machine Learning , Neoplasms , Humans , Neoplasms/genetics , Protein Interaction Maps/genetics , Evolution, Molecular , Computational Biology/methods

Using an integrative machine learning approach utilising homology modelling to clinically interpret genetic variants: CACNA1F as an exemplar.

Sallah, Shalaw R; Sergouniotis, Panagiotis I; Barton, Stephanie; Ramsden, Simon; Taylor, Rachel L; Safadi, Amro; Kabir, Mitra; Ellingford, Jamie M; Lench, Nick; Lovell, Simon C; Black, Graeme C M.

Eur J Hum Genet ; 28(9): 1274-1282, 2020 09.

Article in English | MEDLINE | ID: mdl-32313206

ABSTRACT

Advances in DNA sequencing technologies have revolutionised rare disease diagnostics and have led to a dramatic increase in the volume of available genomic data. A key challenge that needs to be overcome to realise the full potential of these technologies is that of precisely predicting the effect of genetic variants on molecular and organismal phenotypes. Notably, despite recent progress, there is still a lack of robust in silico tools that accurately assign clinical significance to variants. Genetic alterations in the CACNA1F gene are the commonest cause of X-linked incomplete Congenital Stationary Night Blindness (iCSNB), a condition associated with non-progressive visual impairment. We combined genetic and homology modelling data to produce CACNA1F-vp, an in silico model that differentiates disease-implicated from benign missense CACNA1F changes. CACNA1F-vp predicts variant effects on the structure of the CACNA1F-encoded protein (a calcium channel) using parameters based upon changes in amino acid properties; these include size, charge, hydrophobicity, and position. The model produces an overall score for each variant that can be used to predict its pathogenicity. CACNA1F-vp outperformed four other tools in identifying disease-implicated variants (area under receiver operating characteristic and precision-recall curves = 0.84; Matthews correlation coefficient = 0.52) using a tenfold cross-validation technique. We consider this protein-specific model to be a robust stand-alone diagnostic classifier that could be replicated in other proteins and could enable precise and timely diagnosis.
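
The Matthews correlation coefficient and tenfold cross-validation mentioned in this abstract can be sketched as follows. This is an illustration only and is not CACNA1F-vp: the labels and predictions are invented, the function names are hypothetical, and the fold assignment is a simple unshuffled interleaving (in practice the data would usually be shuffled or stratified first).

```python
import math


def matthews_corrcoef(labels, predictions):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denominator == 0 else (tp * tn - fp * fn) / denominator


def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists for k folds.

    Fold i holds out every k-th item starting at position i, so each
    item appears in exactly one test set.
    """
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        train = [i for i in indices if i % k != fold]
        yield train, test


# Toy data (invented): 1 = disease-implicated variant, 0 = benign variant.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_predictions = [1, 1, 0, 0, 1, 0, 0, 1]
toy_mcc = matthews_corrcoef(toy_labels, toy_predictions)
print(f"MCC = {toy_mcc:.2f}")

# Tenfold split over 25 hypothetical variants.
folds = list(k_fold_indices(25, k=10))
print(f"{len(folds)} folds; first test fold holds indices {folds[0][1]}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it remains informative when benign and disease-implicated variants are unevenly represented.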

Subject(s)

Genetic Testing/methods , Sequence Alignment/methods , Sequence Analysis, DNA/methods , Structural Homology, Protein , Animals , Calcium Channels, L-Type/chemistry , Calcium Channels, L-Type/genetics , Humans , Machine Learning , Mutation
