Results 1 - 10 of 10
1.
Article in English | MEDLINE | ID: mdl-38691429

ABSTRACT

DNA damage is a critical factor in the onset and progression of cancer. When DNA is damaged, the number of genetic mutations increases, making it necessary to activate DNA repair mechanisms. A crucial factor in the base excision repair process, which helps maintain the stability of the genome, is an enzyme called DNA polymerase β (Pol β) encoded by the POLB gene. It plays a vital role in the repair of damaged DNA. Additionally, variations known as Single Nucleotide Polymorphisms (SNPs) in the POLB gene can potentially affect the ability to repair DNA. This study uses bioinformatics tools that extract important features from SNPs to construct a feature matrix, which is then used in combination with machine learning algorithms to predict the likelihood of developing cancer associated with a specific mutation. Eight different machine learning algorithms were used to investigate the relationship between POLB gene variations and their potential role in cancer onset. This study not only highlights the complex link between POLB gene SNPs and cancer, but also underscores the effectiveness of machine learning approaches in genomic studies, paving the way for advanced predictive models in genetic and cancer research.

2.
Sci Rep ; 14(1): 3392, 2024 02 09.
Article in English | MEDLINE | ID: mdl-38337023

ABSTRACT

The Human leukocyte antigen (HLA) molecules are central to immune response and have associations with the phenotypes of various diseases and induced drug toxicity. Further, the role of HLA molecules in presenting antigens significantly affects the transplantation outcome. The objective of this study was to examine the extent of the diversity of HLA alleles in the population of the United Arab Emirates (UAE) using Next-Generation Sequencing methodologies and encompassing a larger cohort of individuals. A cohort of 570 unrelated healthy citizens of the UAE volunteered to provide samples for Whole Genome Sequencing and Whole Exome Sequencing. The definition of the HLA alleles was achieved through the application of the bioinformatics tools, HLA-LA and xHLA. Subsequently, the findings from this study were compared with other local and international datasets. A broad range of HLA alleles in the UAE population, of which some were previously unreported, was identified. A comparison with other populations confirmed the current population's unique intertwined genetic heritage while highlighting similarities with populations from the Middle East region. Some disease-associated HLA alleles were detected at a frequency of > 5%, such as HLA-B*51:01, HLA-DRB1*03:01, HLA-DRB1*15:01, and HLA-DQB1*02:01. The increase in allele homozygosity, especially for HLA class I genes, was identified in samples with a higher level of genome-wide homozygosity. This highlights a possible effect of consanguinity on the HLA homozygosity. The HLA allele distribution in the UAE population showcases a unique profile, underscoring the need for tailored databases for traditional activities such as unrelated transplant matching and for newer initiatives in precision medicine based on specific populations. 
This research is part of a concerted effort to improve the knowledge base, particularly in the fields of transplant medicine and investigating disease associations as well as in understanding human migration patterns within the Arabian Peninsula and surrounding regions.
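
As a rough illustration of the two quantities this abstract reports, allele frequency and homozygosity at a locus, the sketch below computes both from a list of diploid genotype calls. It is not the study's pipeline (which relied on HLA-LA and xHLA); the function names and the toy genotypes are hypothetical and are not data from the cohort.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return each allele's share of all chromosomes at one locus.

    `genotypes` is a list of (allele_1, allele_2) tuples, one per individual.
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    total_chromosomes = 2 * len(genotypes)
    return {allele: n / total_chromosomes for allele, n in counts.items()}


def homozygosity_rate(genotypes):
    """Return the fraction of individuals carrying two identical alleles."""
    homozygous = sum(1 for first, second in genotypes if first == second)
    return homozygous / len(genotypes)


# Hypothetical toy genotypes at HLA-B (illustration only, not study data).
toy_hla_b = [
    ("B*51:01", "B*51:01"),
    ("B*51:01", "B*08:01"),
    ("B*08:01", "B*35:01"),
    ("B*35:01", "B*35:01"),
]

frequencies = allele_frequencies(toy_hla_b)
common = [allele for allele, f in frequencies.items() if f > 0.05]
print(frequencies["B*51:01"], homozygosity_rate(toy_hla_b), sorted(common))
```

Filtering on `f > 0.05` mirrors the > 5% threshold mentioned in the abstract; comparing the homozygosity rate across groups of samples would be one simple way to look for the consanguinity effect the authors describe.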


Subject(s)
Histocompatibility Antigens Class II , Histocompatibility Antigens Class I , Humans , United Arab Emirates , Gene Frequency , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class II/genetics , Major Histocompatibility Complex/genetics , High-Throughput Nucleotide Sequencing , Haplotypes , Alleles , HLA-DRB1 Chains/genetics
3.
BMC Bioinformatics ; 24(1): 354, 2023 Sep 21.
Article in English | MEDLINE | ID: mdl-37735350

ABSTRACT

BACKGROUND: Plummeting DNA sequencing costs in recent years have enabled genome sequencing projects to scale up by several orders of magnitude, which is transforming genomics into a highly data-intensive field of research. This development provides the much-needed statistical power required for genotype-phenotype predictions in complex diseases. METHODS: To leverage this wealth of information efficiently, we assessed several genomic data science tools. We focused on on-premise installations to cope with situations where data confidentiality, compliance regulations, etc. rule out cloud-based solutions. We established a comprehensive qualitative and quantitative comparison between BCFtools, SnpSift, Hail, GEMINI, and OpenCGA. The tools were compared in terms of data storage technology, query speed, scalability, annotation, data manipulation, visualization, data output representation, and availability. RESULTS: Tools that leverage sophisticated data structures are noted as the most suitable for large-scale projects in varying degrees of scalability in comparison to flat-file manipulation (e.g., BCFtools and SnpSift). Remarkably, for small to mid-size projects, even lightweight relational database. CONCLUSION: The assessment criteria provide insights into the typical questions posed in scalable genomics and serve as guidance for the development of scalable computational infrastructure in genomics.


Subject(s)
Data Science , Genomics , Chromosome Mapping , Databases, Factual , Sequence Analysis, DNA
4.
Biology (Basel) ; 12(4)2023 Mar 29.
Article in English | MEDLINE | ID: mdl-37106719

ABSTRACT

Gene expression profiling is one of the most recognized techniques for inferring gene regulators and their potential targets in gene regulatory networks (GRN). The purpose of this study is to build a regulatory network for the budding yeast Saccharomyces cerevisiae genome by incorporating RNA-seq and microarray data covering a wide range of experimental conditions. We introduce a pipeline for data analysis, data preparation, and model training. Several kernel classification models, including one-class, two-class, and rare-event classification methods, are used to categorize genes. We test the impact of normalization techniques on the overall performance with RNA-seq data. Our findings provide new insights into the interactions between genes in the yeast regulatory network. The conclusions of our study are significant because they highlight the effectiveness of classification and its contribution towards enhancing the present understanding of the yeast regulatory network. When assessed, our pipeline demonstrates strong performance across different statistical metrics, such as a 99% recall rate and a 98% AUC score.

5.
BMJ Open ; 11(5): e044102, 2021 05 11.
Article in English | MEDLINE | ID: mdl-33980523

ABSTRACT

OBJECTIVE: To generate cross-national forecasts of COVID-19 trajectories and quantify the associated impact on essential critical care resources for disease management in Gulf Cooperation Council (GCC) countries. DESIGN: Population-level aggregate analysis. SETTING: Bahrain, Kuwait, Oman, Qatar, United Arab Emirates (UAE) and Saudi Arabia. METHODS: We applied an extended time-dependent SEICRD compartmental model to predict the flow of people between six states, susceptible-exposed-infected-critical-recovery-death, accounting for community mitigation strategies and the latent period between exposure and infected and contagious states. Then, we used the WHO Adaptt Surge Planning Tool to predict intensive care unit (ICU) and human resources capacity based on predicted daily active and cumulative infections from the SEICRD model. MAIN OUTCOME MEASURES: Predicted COVID-19 infections, deaths, and ICU and human resources capacity for disease management. RESULTS: COVID-19 infections vary daily from 498 per million in Bahrain to over 300 per million in UAE and Qatar, to 9 per million in Saudi Arabia. The cumulative number of deaths varies from 302 per million in Oman to 89 in Qatar. UAE attained its first peak as early as 21 April 2020, whereas Oman had its peak on 29 August 2020. In absolute terms, Saudi Arabia is predicted to have the highest COVID-19 mortality burden, followed by UAE and Oman. The predicted maximum number of COVID-19-infected patients in need of oxygen therapy during the peak of emergency admissions varies between 690 in Bahrain, 1440 in Oman and over 10 000 in Saudi Arabia. CONCLUSION: Although most GCC countries have managed to flatten the epidemiological curve by August 2020, trends since November 2020 show potential increase in new infections. The pandemic is predicted to recede by August 2021, provided the existing infection control measures continue effectively and consistently across all countries. 
Current health infrastructure, including the provision of ICUs and nursing staff, seems adequate, but health systems should keep ICUs ready to manage critically ill patients.
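
The flow between the six SEICRD states can be sketched with a simple forward-Euler simulation. This is a minimal illustration only: it uses constant rates, whereas the study describes an extended time-dependent model, and every parameter name and value below (`beta`, `sigma`, `gamma`, `kappa`, `rho`, `mu`) is a hypothetical placeholder, not an estimate from the paper.

```python
def simulate_seicrd(s0, e0, i0, days, beta, sigma, gamma, kappa, rho, mu, dt=0.1):
    """Simulate susceptible-exposed-infected-critical-recovered-dead counts.

    Assumed (hypothetical) rates per day:
      beta  - transmission rate
      sigma - 1 / latent period (exposed -> infected)
      gamma - recovery rate of non-critical infected
      kappa - rate at which infected become critical
      rho   - recovery rate of critical cases
      mu    - death rate of critical cases
    Returns one (S, E, I, C, R, D) tuple per day, starting with day 0.
    """
    s, e, i, c, r, d = float(s0), float(e0), float(i0), 0.0, 0.0, 0.0
    population = s + e + i  # kept constant in the transmission term
    history = [(s, e, i, c, r, d)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            # Flows between compartments during one time step of length dt.
            newly_exposed = beta * s * i / population * dt
            newly_infected = sigma * e * dt
            newly_critical = kappa * i * dt
            recovered_mild = gamma * i * dt
            recovered_critical = rho * c * dt
            deaths = mu * c * dt
            s -= newly_exposed
            e += newly_exposed - newly_infected
            i += newly_infected - newly_critical - recovered_mild
            c += newly_critical - recovered_critical - deaths
            r += recovered_mild + recovered_critical
            d += deaths
        history.append((s, e, i, c, r, d))
    return history


# Hypothetical run: 1 million people, 10 initial infections, 120 days.
trajectory = simulate_seicrd(999_990, 0, 10, days=120, beta=0.3, sigma=0.2,
                             gamma=0.1, kappa=0.02, rho=0.05, mu=0.03)
peak_day = max(range(len(trajectory)), key=lambda day: trajectory[day][3])
print("day with most critical cases:", peak_day)
```

In a setup like the one the abstract describes, the daily critical-case counts from such a model would be the input to a separate capacity-planning step; that step is not reproduced here.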


Subject(s)
COVID-19 , Severe Acute Respiratory Syndrome , Bahrain/epidemiology , Critical Care , Humans , Kuwait/epidemiology , Oman/epidemiology , Pandemics , Qatar , SARS-CoV-2 , Saudi Arabia/epidemiology , United Arab Emirates/epidemiology
6.
Evol Bioinform Online ; 16: 1176934320920310, 2020.
Article in English | MEDLINE | ID: mdl-35173404

ABSTRACT

Computational prediction of gene-gene associations is one of the productive directions in the study of bioinformatics. Many tools have been developed to infer the relation between genes using different biological data sources. The association of a pair of genes deduced from the analysis of biological data becomes meaningful when it reflects the directionality and the type of reaction between genes. In this work, we follow another method to construct a causal gene co-expression network while identifying transcription factors in each pair of genes using microarray expression data. We adopt a machine learning technique based on a logistic regression model to tackle the sparsity of the network and to improve the prediction accuracy. The proposed system classifies each pair of genes as either connected or nonconnected using the correlation between these genes across the whole Saccharomyces cerevisiae genome. The accuracy of the classification model in predicting related genes was evaluated using several data sets for the yeast regulatory network. Our system achieves high performance in terms of several statistical measures.
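
The idea of classifying a gene pair as connected or nonconnected from the correlation of its expression profiles can be sketched as follows. This is a simplified stand-in, not the authors' system: it uses a single feature (absolute Pearson correlation), plain gradient descent, and hypothetical toy values, and it does not address directionality or transcription factor identification.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, learning_rate=0.5, epochs=2000):
    """Fit a one-feature logistic regression by batch gradient descent."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x
            grad_b += error
        weight -= learning_rate * grad_w / n
        bias -= learning_rate * grad_b / n
    return weight, bias


def is_connected(weight, bias, correlation, threshold=0.5):
    """Label a gene pair as connected when the predicted probability is high."""
    return sigmoid(weight * abs(correlation) + bias) >= threshold


# Hypothetical training pairs: absolute correlation -> known connection label.
train_x = [0.95, 0.90, 0.85, 0.20, 0.10, 0.05]
train_y = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(train_x, train_y)

# Score a new pair from two made-up expression profiles.
profile_a = [1.0, 2.1, 2.9, 4.2, 5.0]
profile_b = [0.9, 2.0, 3.1, 3.9, 5.2]
print(is_connected(w, b, pearson(profile_a, profile_b)))
```

Raising `threshold` above 0.5 would be one simple way to favor precision in a sparse network, where most gene pairs are not connected.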

7.
BMC Bioinformatics ; 20(1): 71, 2019 Feb 08.
Article in English | MEDLINE | ID: mdl-30736739

ABSTRACT

BACKGROUND: A large number of computational methods have been proposed for predicting protein functions. The underlying techniques adopted by most of these methods revolve around predicting the functions of an unannotated protein p from already annotated proteins that have characteristics similar to those of p. Recent Information Extraction methods take advantage of the huge growth of biomedical literature to predict protein functions. They extract biological molecule terms that directly describe protein functions from biomedical texts. However, they consider only explicitly mentioned terms that co-occur with proteins in texts. We observe that some important biological molecule terms pertaining to functional categories may implicitly co-occur with proteins in texts. Therefore, the methods that rely solely on explicitly mentioned terms in texts may miss vital functional information implicitly mentioned in the texts. RESULTS: To overcome the limitations of methods that rely solely on explicitly mentioned terms in texts to predict protein functions, we propose in this paper an Information Extraction system called PL-PPF. The proposed system employs techniques for predicting the functions of proteins based on their co-occurrences with explicitly and implicitly mentioned biological molecule terms that pertain to functional categories in biomedical literature. That is, PL-PPF employs a combination of statistical-based explicit term extraction techniques and logic-based implicit term extraction techniques. The statistical component of PL-PPF predicts some of the functions of a protein by extracting the explicitly mentioned functional terms that directly describe the functions of the protein from the biomedical texts associated with the protein. The logic-based component of PL-PPF predicts additional functions of the protein by inferring the functional terms that co-occur implicitly with the protein in the biomedical texts associated with it. 
First, the system employs its statistical-based component to extract the explicitly mentioned functional terms. Then, it employs its logic-based component to infer additional functions of the protein. Our hypothesis is that important biological molecule terms pertaining to functional categories of proteins are likely to co-occur implicitly with the proteins in biomedical texts. We evaluated PL-PPF experimentally and compared it with five systems. Results revealed better prediction performance. CONCLUSIONS: The experimental results showed that PL-PPF outperformed the other five systems. This is an indication of the effectiveness and practical viability of PL-PPF's combination of explicit and implicit techniques. We also evaluated two versions of PL-PPF: one adopting the complete techniques (i.e., adopting both the implicit and explicit techniques) and the other adopting only the explicit term co-occurrence extraction techniques (i.e., without the inference rules for predicate logic). The experimental results showed that the complete version significantly outperformed the other version. This is attributed to the effectiveness of the rules of predicate logic to infer functional terms that co-occur implicitly with proteins in biomedical texts. A demo application of PL-PPF can be accessed through the following link: http://ecesrvr.kustar.ac.ae:8080/plppf/.


Subject(s)
Logic , Proteins/metabolism , Publications , Databases, Genetic , Gene Ontology , Genome, Fungal , Information Storage and Retrieval , Molecular Sequence Annotation , Reproducibility of Results , Saccharomyces cerevisiae/genetics
8.
BMC Bioinformatics ; 20(1): 70, 2019 Feb 08.
Article in English | MEDLINE | ID: mdl-30736752

ABSTRACT

BACKGROUND: Understanding the genetic networks and their role in chronic diseases (e.g., cancer) is one of the important objectives of biological researchers. In this work, we present a text mining system that constructs a gene-gene-interaction network for the entire human genome and then performs network analysis to identify disease-related genes. We recognize the interacting genes based on their co-occurrence frequency within the biomedical literature and by employing linear and non-linear rare-event classification models. We analyze the constructed network of genes by using different network centrality measures to decide on the importance of each gene. Specifically, we apply betweenness, closeness, eigenvector, and degree centrality metrics to rank the central genes of the network and to identify possible cancer-related genes. RESULTS: We evaluated the top 15 ranked genes for different cancer types (i.e., prostate, breast, and lung cancer). The average precisions for identifying breast, prostate, and lung cancer genes range from 80% to 100%. In a prostate case study, the system predicted an average of 80% prostate-related genes. CONCLUSIONS: The results show that our system has the potential for improving the prediction accuracy of identifying gene-gene interactions and disease-gene associations. We also conduct a prostate cancer case study by using the threshold property in logistic regression, and we compare our approach with some of the state-of-the-art methods.
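
Ranking genes by network centrality, as this abstract describes, can be illustrated on a small undirected graph. The sketch below covers only two of the four named measures (degree and closeness) and uses placeholder gene labels (`G1` to `G5`); the network is hypothetical and the code is not the authors' implementation.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    others = len(graph) - 1
    return {gene: len(neighbours) / others for gene, neighbours in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the sum of shortest-path distances."""
    scores = {}
    for source in graph:
        # Breadth-first search gives shortest path lengths in an unweighted graph.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        total = sum(distance.values())
        scores[source] = (len(distance) - 1) / total if total else 0.0
    return scores


def top_ranked(scores, k):
    """Return the k highest-scoring genes, breaking ties alphabetically."""
    return sorted(scores, key=lambda gene: (-scores[gene], gene))[:k]


# Hypothetical gene-gene interaction network as an adjacency mapping.
network = {
    "G1": {"G2", "G3", "G4"},
    "G2": {"G1"},
    "G3": {"G1"},
    "G4": {"G1", "G5"},
    "G5": {"G4"},
}

print(top_ranked(degree_centrality(network), 2))
print(top_ranked(closeness_centrality(network), 2))
```

In the toy network both measures place `G1` first, but on larger graphs the measures can disagree, which is presumably why the abstract combines several of them.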


Subject(s)
Epistasis, Genetic , Gene Regulatory Networks , Genetic Predisposition to Disease , Humans , Logistic Models , Male , Prostatic Neoplasms/genetics , ROC Curve
9.
Sci Rep ; 7(1): 15784, 2017 Nov 17.
Article in English | MEDLINE | ID: mdl-29150626

ABSTRACT

Text mining has become an important tool in bioinformatics research with the massive growth in the biomedical literature over the past decade. Mining the biomedical literature has resulted in an incredible number of computational algorithms that assist many bioinformatics researchers. In this paper, we present a text mining system called Gene Interaction Rare Event Miner (GIREM) that constructs gene-gene-interaction networks for the human genome using information extracted from biomedical literature. GIREM identifies functionally related genes based on their co-occurrences in the abstracts of biomedical literature. For a given gene g, GIREM first extracts the set of genes found within the abstracts of biomedical literature associated with g. GIREM aims at enhancing biological text mining approaches by identifying the semantic relationship between each co-occurrence of a pair of genes in abstracts using the syntactic structures of sentences and linguistic theories. It uses a supervised learning algorithm, weighted logistic regression, to label pairs of genes as related or unrelated and to reflect the population proportion using smaller samples. We evaluated GIREM by comparing it experimentally with other well-known approaches and a protein-protein interactions database. Results showed marked improvement.
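
The first step the abstract describes, finding which genes appear together in the same abstracts, can be sketched as a simple co-occurrence count. This covers only that counting step under simplifying assumptions (exact, case-insensitive token matching); it does not attempt GIREM's syntactic analysis or its weighted logistic regression, and the gene symbols and texts below are hypothetical placeholders.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, gene_symbols):
    """Count, for every gene pair, the abstracts that mention both genes."""
    counts = Counter()
    for text in abstracts:
        # Tokenize on letters and digits so punctuation does not hide a symbol.
        tokens = set(re.findall(r"[A-Za-z0-9]+", text.upper()))
        present = sorted(g for g in gene_symbols if g.upper() in tokens)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical gene symbols and abstract snippets (illustration only).
symbols = ["GENEA", "GENEB", "GENEC"]
texts = [
    "GENEA interacts with GENEB in vitro.",
    "GENEA and GENEB are co-expressed; GENEC is not.",
    "GENEC was studied alone.",
]

for pair, n in cooccurrence_counts(texts, symbols).most_common():
    print(pair, n)
```

Counts like these would serve only as candidate pairs; deciding whether a co-occurrence reflects a real relationship is the part the abstract assigns to sentence structure and the classifier.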


Subject(s)
Data Mining , Gene Regulatory Networks , Publications , Genes , ROC Curve
10.
BMJ Glob Health ; 2(3): e000394, 2017.
Article in English | MEDLINE | ID: mdl-29018585

ABSTRACT

OBJECTIVE: Road traffic injuries (RTIs) are the leading cause of disability-adjusted life years lost in Oman, Saudi Arabia and United Arab Emirates. Injury prevention strategies often overlook the interaction of individual and behavioural risk factors in assessing the severity of RTI outcomes. We conducted a systematic investigation of the underlying interactive effects of age and gender on the severity of fatal and non-fatal RTI outcomes in the Sultanate of Oman. METHODS: We used the Royal Oman Police national database of road traffic crashes for the period 2010-2014. Our study was based on 35 785 registered incidents: of these, 10.2% fatal injuries, 6.2% serious, 27.3% moderate, 37.3% mild injuries and 19% only vehicle damage but no human injuries. We applied a generalised ordered logit regression to estimate the effect of age and gender on RTI severity, controlling for risk behaviours, personal characteristics, vehicle, road, traffic, environment conditions and geographical location. RESULTS: The most dominant group at risk of all types of RTIs was young male drivers. The probability of severe incapacitating injuries was the highest for drivers aged 25-29 (26.6%) years, whereas the probability of fatal injuries was the highest for those aged 20-24 (26.9%) years. Analysis of three-way interactions of age, gender and causes of crash show that overspeeding was the primary cause of different types of RTIs. In particular, the probability of fatal injuries among male drivers attributed to overspeeding ranged from 3%-6% for those aged 35 years and above to 13.4% and 17.7% for those aged 25-29 years and 20-24 years, respectively. CONCLUSIONS: The high burden of severe and fatal RTIs in Oman was primarily attributed to overspeed driving behaviour of young male drivers in the 20-29 years age range. Our findings highlight the critical need for designing early gender-sensitive road safety interventions targeting young male and female drivers.
