Results 1 - 20 of 48
1.
Mutat Res Rev Mutat Res ; : 108509, 2024 Jul 06.
Article in English | MEDLINE | ID: mdl-38977176

ABSTRACT

Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder (NDD) influenced by genetic, epigenetic, and environmental factors. Recent advancements in genomic analysis have shed light on numerous genes associated with ASD, highlighting the significant role of both common and rare genetic mutations, as well as copy number variations (CNVs), single nucleotide polymorphisms (SNPs) and unique de novo variants. These genetic variations disrupt neurodevelopmental pathways, contributing to the disorder's complexity. Notably, CNVs are present in 10%-20% of individuals with autism, with 3%-7% detectable through cytogenetic methods. While the role of submicroscopic CNVs in ASD has been recently studied, their association with genomic loci and genes has not been thoroughly explored. In this review, we focus on 47 CNV regions linked to ASD, encompassing 1,632 genes, including protein-coding genes and long non-coding RNAs (lncRNAs), of which 659 show significant brain expression. Using a list of ASD-associated genes from SFARI, we detect 17 regions harboring at least one known ASD-related protein-coding gene. Of the remaining 30 regions, we identify 24 regions containing at least one protein-coding gene with brain-enriched expression and a nervous system phenotype in mouse mutants, and one lncRNA with both brain-enriched expression and upregulation in iPSC to neuron differentiation. This review not only expands our understanding of the genetic diversity associated with ASD but also underscores the potential of lncRNAs in contributing to its etiology. Additionally, the discovered CNVs will be a valuable resource for future diagnostic, therapeutic, and research endeavors aimed at prioritizing genetic variations in ASD.

2.
Brief Bioinform ; 25(2)2024 01 22.
Article in English | MEDLINE | ID: mdl-38483255

ABSTRACT

Spatially resolved transcriptomics (SRT) is a pioneering method for simultaneously studying morphological contexts and gene expression at single-cell precision. Data emerging from SRT are multifaceted, presenting researchers with intricate gene expression matrices, precise spatial details and comprehensive histology visuals. Such rich and intricate datasets, unfortunately, render many conventional methods like traditional machine learning and statistical models ineffective. The unique challenges posed by the specialized nature of SRT data have led the scientific community to explore more sophisticated analytical avenues. Recent trends indicate an increasing reliance on deep learning algorithms, especially in areas such as spatial clustering, identification of spatially variable genes and data alignment tasks. In this manuscript, we provide a rigorous critique of these advanced deep learning methodologies, probing into their merits, limitations and avenues for further refinement. Our in-depth analysis underscores that while the recent innovations in deep learning tailored for SRT have been promising, there remains a substantial potential for enhancement. A crucial area that demands attention is the development of models that can incorporate intricate biological nuances, such as phylogeny-aware processing or in-depth analysis of minuscule histology image segments. Furthermore, addressing challenges like the elimination of batch effects, perfecting data normalization techniques and countering the overdispersion and zero inflation patterns seen in gene expression is pivotal. To support the broader scientific community in their SRT endeavors, we have meticulously assembled a comprehensive directory of readily accessible SRT databases, hoping to serve as a foundation for future research initiatives.


Subject(s)
Deep Learning , Algorithms , Databases, Factual , Gene Expression Profiling , Machine Learning
3.
Article in English | MEDLINE | ID: mdl-38082750

ABSTRACT

Automated detection of atrial fibrillation (AF) from electrocardiogram (ECG) traces remains a challenging task and is crucial for telemonitoring of patients after stroke. This study aimed to quantify the generalizability of a deep learning (DL)-based automated ECG classification algorithm. We first developed a novel hybrid DL (HDL) model using the publicly available PhysioNet/CinC Challenge 2017 (CinC2017) dataset that classifies each ECG recording into one of four classes: normal sinus rhythm (NSR), AF, other rhythms (OR), and too noisy (TN). The (pre)trained HDL model was then used to classify 636 ECG samples collected by our research team with a handheld ECG device (CONTEC PM10 Portable ECG Monitor) from 102 patients (age: 68 ± 15 years, 74 male), comprising outpatients of the Eastern Heart Clinic and inpatients in the Cardiology ward of Prince of Wales Hospital, Sydney, Australia. The proposed HDL model achieved an average test F1-score of 0.892 for NSR, AF, and OR, relative to the reference values, on the CinC2017 dataset. On the dataset created by our research team, the HDL model achieved an average F1-score of 0.722 (AF: 0.905, NSR: 0.791, OR: 0.471, and TN: 0.342). After retraining the HDL model on this dataset using 5-fold cross-validation, the average F1-score increased to 0.961. We conclude that the generalizability of the HDL-based algorithm developed for AF detection from short-term single-lead ECG traces is acceptable. However, the accuracy of the pre-trained DL model was significantly improved by retraining the model parameters on the new dataset of ECG traces.
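The average F1 reported for NSR, AF, and OR appears to leave out the noisy class, as in CinC2017-style scoring. As an arithmetic check, averaging the per-class scores above over those three classes gives (0.905 + 0.791 + 0.471) / 3 ≈ 0.722, matching the reported average, whereas including TN would give about 0.627. The sketch below computes per-class F1 from a confusion matrix and averages over a chosen subset of classes; the function names and matrix layout (rows = reference, columns = prediction) are illustrative assumptions, not the authors' code.

```python
from typing import Dict, List, Sequence


def per_class_f1(confusion: List[List[int]], labels: Sequence[str]) -> Dict[str, float]:
    """F1 for each class; confusion[i][j] counts reference labels[i] predicted as labels[j]."""
    n = len(labels)
    scores: Dict[str, float] = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum = TP + FP
        reference_k = sum(confusion[k][j] for j in range(n))  # row sum = TP + FN
        denom = predicted_k + reference_k                      # = 2TP + FP + FN
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def average_f1(scores: Dict[str, float], classes: Sequence[str] = ("NSR", "AF", "OR")) -> float:
    """Mean F1 over the selected classes only (TN excluded by default)."""
    return sum(scores[c] for c in classes) / len(classes)
```

For example, `average_f1({"AF": 0.905, "NSR": 0.791, "OR": 0.471, "TN": 0.342})` reproduces the 0.722 figure when rounded to three decimals.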


Subject(s)
Atrial Fibrillation , Deep Learning , Humans , Male , Middle Aged , Aged , Aged, 80 and over , Atrial Fibrillation/diagnosis , Signal Processing, Computer-Assisted , Algorithms , Electrocardiography
4.
Article in English | MEDLINE | ID: mdl-38083096

ABSTRACT

Transfer learning (TL) has proven to be a good strategy for solving domain-specific problems in many deep learning (DL) applications. Typically, in TL, a pre-trained DL model is used as a feature extractor, and the extracted features are then fed to a newly trained classifier that serves as the model head. In this study, we propose a new ensemble approach to transfer learning that uses multiple neural network classifiers at once in the model head. We compared the classification results of the proposed ensemble approach with those of the direct approach for several popular models, namely VGG-16, ResNet-50, and MobileNet, on two publicly available tuberculosis datasets, the Montgomery County (MC) and Shenzhen (SZ) datasets. We also compared the results obtained when features were extracted from the full pre-trained DL model with those obtained when features were taken from a middle layer of the pre-trained model. Several metrics derived from the confusion matrix were used, namely accuracy (ACC), sensitivity (SNS), specificity (SPC), precision (PRC), and F1-score. We concluded that the proposed ensemble approach outperformed the direct approach. The best result was achieved by ResNet-50 when the features were extracted from a middle layer, with an accuracy of 91.2698% on the MC dataset. Clinical Relevance: The proposed ensemble approach could increase detection accuracy by 7-8% for the Montgomery County dataset and by 4-5% for the Shenzhen dataset.
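The five evaluation metrics listed above follow directly from the counts in a binary confusion matrix. A minimal sketch using the standard definitions; the function and field names are illustrative, and divisions by zero are mapped to 0.0 as an assumed convention.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float     # ACC
    sensitivity: float  # SNS, recall / true positive rate
    specificity: float  # SPC, true negative rate
    precision: float    # PRC, positive predictive value
    f1: float           # harmonic mean of precision and sensitivity


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: report 0.0 when a metric is undefined.
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    sensitivity = _safe_div(tp, tp + fn)
    specificity = _safe_div(tn, tn + fp)
    precision = _safe_div(tp, tp + fp)
    return BinaryMetrics(
        accuracy=_safe_div(tp + tn, tp + fp + tn + fn),
        sensitivity=sensitivity,
        specificity=specificity,
        precision=precision,
        f1=_safe_div(2 * precision * sensitivity, precision + sensitivity),
    )
```

With tuberculosis as the positive class, `tp` counts correctly detected TB images and `tn` correctly identified normal images.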


Subject(s)
Benchmarking , Neural Networks, Computer , Problem Solving , Machine Learning
5.
Adv Sci (Weinh) ; 10(33): e2303502, 2023 11.
Article in English | MEDLINE | ID: mdl-37816141

ABSTRACT

Single-cell Hi-C (scHi-C) has made it possible to analyze chromatin organization at the single-cell level. However, scHi-C experiments generate inherently sparse data, which poses a challenge for loop calling methods. The existing approach performs significance tests across the imputed dense contact maps, leading to substantial computational overhead and loss of information at the single-cell level. To overcome this limitation, a lightweight framework called scGSLoop is proposed, which sets a new paradigm for scHi-C loop calling by adapting the training and inferencing strategies of graph-based deep learning to leverage the sequence features and 1D positional information of genomic loci. With this framework, sparsity is no longer a challenge, but rather an advantage that the model leverages to achieve unprecedented computational efficiency. Compared to existing methods, scGSLoop makes more accurate predictions and is able to identify more loops that have the potential to play regulatory roles in genome functioning. Moreover, scGSLoop preserves single-cell information by identifying a distinct group of loops for each individual cell, which not only enables an understanding of the variability of chromatin looping states between cells, but also allows scGSLoop to be extended for the investigation of multi-connected hubs and their underlying mechanisms.


Subject(s)
Chromatin , Genomics , Chromatin/genetics , Genome
6.
Blood ; 142(17): 1448-1462, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37595278

ABSTRACT

Hematopoietic stem and progenitor cells (HSPCs) rely on a complex interplay among transcription factors (TFs) to regulate differentiation into mature blood cells. A heptad of TFs (FLI1, ERG, GATA2, RUNX1, TAL1, LYL1, LMO2) bind regulatory elements in bulk CD34+ HSPCs. However, whether specific heptad-TF combinations have distinct roles in regulating hematopoietic differentiation remains unknown. We mapped genome-wide chromatin contacts (HiC, H3K27ac, HiChIP), chromatin modifications (H3K4me3, H3K27ac, H3K27me3) and 10 TF binding profiles (heptad, PU.1, CTCF, STAG2) in HSPC subsets (stem/multipotent progenitors plus common myeloid, granulocyte macrophage, and megakaryocyte erythrocyte progenitors) and found TF occupancy and enhancer-promoter interactions varied significantly across cell types and were associated with cell-type-specific gene expression. Distinct regulatory elements were enriched with specific heptad-TF combinations, including stem-cell-specific elements with ERG, and myeloid- and erythroid-specific elements with combinations of FLI1, RUNX1, GATA2, TAL1, LYL1, and LMO2. Furthermore, heptad-occupied regions in HSPCs were subsequently bound by lineage-defining TFs, including PU.1 and GATA1, suggesting that heptad factors may prime regulatory elements for use in mature cell types. We also found that enhancers with cell-type-specific heptad occupancy shared a common grammar with respect to TF binding motifs, suggesting that combinatorial binding of TF complexes was at least partially regulated by features encoded in DNA sequence motifs. Taken together, this study comprehensively characterizes the gene regulatory landscape in rare subpopulations of human HSPCs. The accompanying data sets should serve as a valuable resource for understanding adult hematopoiesis and a framework for analyzing aberrant regulatory networks in leukemic cells.


Subject(s)
Core Binding Factor Alpha 2 Subunit , Hematopoietic Stem Cells , Humans , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , Hematopoietic Stem Cells/metabolism , Gene Expression Regulation , Hematopoiesis/genetics , Chromatin/metabolism
7.
PLoS Comput Biol ; 19(7): e1011249, 2023 07.
Article in English | MEDLINE | ID: mdl-37486921

ABSTRACT

The genetic etiology of brain disorders is highly heterogeneous, characterized by abnormalities in the development of the central nervous system that lead to diminished physical or intellectual capabilities. The process of determining which gene drives disease, known as "gene prioritization," is not entirely understood. Genome-wide searches for gene-disease associations are still underdeveloped due to reliance on previous discoveries and evidence sources with false positive or negative relations. This paper introduces DeepGenePrior, a model based on deep neural networks that prioritizes candidate genes in genetic diseases. Using the well-studied Variational AutoEncoder (VAE), we developed a score to measure the impact of genes on target diseases. Unlike other methods that use prior data to select candidate genes, based on the "guilt by association" principle and auxiliary data sources like protein networks, our study exclusively employs copy number variants (CNVs) for gene prioritization. By analyzing CNVs from 74,811 individuals with autism, schizophrenia, and developmental delay, we identified genes that best distinguish cases from controls. Our findings indicate a 12% increase in fold enrichment in brain-expressed genes compared to previous studies and a 15% increase in genes associated with mouse nervous system phenotypes. Furthermore, we identified common deletions in ZDHHC8, DGCR5, and CATG00000022283 among the top genes related to all three disorders, suggesting a common etiology among these clinically distinct conditions. DeepGenePrior is publicly available online at http://git.dml.ir/z_rahaie/DGP to address obstacles in existing gene prioritization studies identifying candidate genes.
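The abstract reports changes in fold enrichment without defining the statistic. A common definition is the fraction of prioritized genes that carry an annotation (for example, brain-expressed) divided by the fraction of annotated genes in a background set. The sketch below assumes that definition; the function name and arguments are hypothetical and this is not DeepGenePrior's code.

```python
def fold_enrichment(hits_in_set: int, set_size: int,
                    hits_in_background: int, background_size: int) -> float:
    """(hits_in_set / set_size) / (hits_in_background / background_size)."""
    if min(set_size, hits_in_background, background_size) <= 0:
        raise ValueError("set_size, hits_in_background and background_size must be positive")
    # Cross-multiplied so integer inputs stay exact until the single final division.
    return (hits_in_set * background_size) / (set_size * hits_in_background)
```

With illustrative numbers (not from the study), 30 annotated genes among 100 prioritized genes, against 2,000 among 20,000 background genes, gives `fold_enrichment(30, 100, 2000, 20000) == 3.0`.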


Subject(s)
Autistic Disorder , Deep Learning , Animals , Mice , DNA Copy Number Variations/genetics , Autistic Disorder/genetics , Brain , Genetic Predisposition to Disease/genetics
8.
Cancers (Basel) ; 15(14)2023 Jul 11.
Article in English | MEDLINE | ID: mdl-37509229

ABSTRACT

Higher eukaryotic enhancers, as a major class of regulatory elements, play a crucial role in the regulation of gene expression. Over the last decade, the development of sequencing technologies has flooded researchers with transcriptome-phenotype data alongside emerging candidate regulatory elements. Since most methods can only provide hints about enhancer function, there have been attempts to develop experimental and computational approaches that can bridge the gap in the causal relationship between regulatory regions and phenotypes. The coupling of two state-of-the-art technologies, also referred to as crisprQTL, has emerged as a promising high-throughput toolkit for addressing this question. This review provides an overview of the importance of studying enhancers, the core molecular foundation of crisprQTL, and recent studies utilizing crisprQTL to interrogate enhancer-phenotype correlations. Additionally, we discuss computational methods currently employed for crisprQTL data analysis. We conclude by pointing out common challenges, making recommendations, and looking at future prospects, with the aim of providing researchers with an overview of crisprQTL as an important toolkit for studying enhancers.

9.
Comput Biol Med ; 160: 106998, 2023 06.
Article in English | MEDLINE | ID: mdl-37182422

ABSTRACT

In recent years, cardiovascular diseases (CVDs) have become one of the leading causes of mortality globally. In the early stages, CVDs present with minor symptoms that progressively worsen. Most people experience symptoms such as exhaustion, shortness of breath, ankle swelling, and fluid retention at the onset of CVD. Coronary artery disease (CAD), arrhythmia, cardiomyopathy, congenital heart defect (CHD), mitral regurgitation, and angina are the most common CVDs. Clinical methods such as blood tests, electrocardiography (ECG) signals, and medical imaging are the most effective methods for detecting CVDs. Among the diagnostic methods, cardiac magnetic resonance imaging (CMRI) is increasingly used to diagnose and monitor the disease, plan treatment, and predict CVDs. Despite the advantages of CMR data, diagnosing CVDs from it is challenging for physicians, because each scan contains many slices and its contrast may be low. To address these issues, deep learning (DL) techniques have been employed in the diagnosis of CVDs from CMR data, and much research is currently being conducted in this field. This review provides an overview of studies on CVD detection using CMR images and DL techniques. The introduction section examines CVD types, diagnostic methods, and the most important medical imaging techniques. The following section presents research on detecting CVDs from CMR images and the most significant DL methods. Another section discusses the challenges in diagnosing CVDs from CMRI data. Next, the discussion section presents the results of this review and outlines future work on CVD diagnosis from CMR images with DL techniques. Finally, the most important findings of this study are presented in the conclusion section.


Subject(s)
Cardiovascular Diseases , Coronary Artery Disease , Deep Learning , Humans , Cardiovascular Diseases/diagnostic imaging , Magnetic Resonance Imaging , Heart , Coronary Artery Disease/diagnosis
10.
Comput Biol Med ; 158: 106841, 2023 05.
Article in English | MEDLINE | ID: mdl-37028142

ABSTRACT

Invasive angiography is the reference standard for coronary artery disease (CAD) diagnosis but is expensive and associated with certain risks. Machine learning (ML) using clinical and noninvasive imaging parameters can be used for CAD diagnosis to avoid the side effects and cost of angiography. However, ML methods require labeled samples for efficient training. The scarcity of labeled data and high labeling costs can be mitigated by active learning, which selectively queries challenging samples for labeling. To the best of our knowledge, active learning has not yet been used for CAD diagnosis. An Active Learning with Ensemble of Classifiers (ALEC) method, consisting of four classifiers, is proposed for CAD diagnosis. Three of these classifiers determine whether each of a patient's three main coronary arteries is stenotic. The fourth classifier predicts whether the patient has CAD. ALEC is first trained using labeled samples. For each unlabeled sample, if the outputs of the classifiers are consistent, the sample and its predicted label are added to the pool of labeled samples. Inconsistent samples are manually labeled by medical experts before being added to the pool. Training is then performed again using the samples labeled so far. These interleaved phases of labeling and training are repeated until all samples are labeled. Compared with 19 other active learning algorithms, ALEC combined with a support vector machine classifier attained superior performance, with 97.01% accuracy. Our method is also justified mathematically. In addition, we comprehensively analyze the CAD dataset used in this paper. As part of the dataset analysis, the pairwise correlation between features is computed. The top 15 features contributing to CAD and to stenosis of the three main coronary arteries are determined. The relationship between stenoses of the main arteries is presented using conditional probabilities. The effect of considering the number of stenotic arteries on sample discrimination is investigated. The discrimination power over dataset samples is visualized, treating each of the three main coronary arteries in turn as the sample label and the two remaining arteries as sample features.
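The interleaved labeling and training loop described above can be sketched as follows. The abstract does not state what makes the classifiers' outputs consistent; this sketch assumes the CAD prediction must be positive exactly when at least one artery is predicted stenotic. The trainer, the expert callback, and the batch size are hypothetical placeholders, not the ALEC implementation.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
# Hypothetical label layout: (artery 1 stenotic, artery 2 stenotic, artery 3 stenotic, CAD)
Label = Tuple[bool, bool, bool, bool]
Model = Callable[[Sample], Label]


def is_consistent(label: Label) -> bool:
    """Assumed rule: CAD is predicted exactly when at least one artery is predicted stenotic."""
    *arteries, cad = label
    return cad == any(arteries)


def active_learning_loop(
    train: Callable[[List[Sample], List[Label]], Model],
    labeled_x: List[Sample],
    labeled_y: List[Label],
    unlabeled: List[Sample],
    ask_expert: Callable[[Sample], Label],
    batch_size: int = 10,
) -> Tuple[List[Sample], List[Label], int]:
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled)
    expert_queries = 0
    while pool:
        model = train(xs, ys)  # retrain on everything labeled so far
        batch, pool = pool[:batch_size], pool[batch_size:]
        for x in batch:
            predicted = model(x)
            if is_consistent(predicted):
                ys.append(predicted)       # accept the ensemble's own label
            else:
                ys.append(ask_expert(x))   # inconsistent: query a medical expert
                expert_queries += 1
            xs.append(x)
    return xs, ys, expert_queries
```

Returning the number of expert queries makes it easy to compare how much manual labeling different consistency rules or batch sizes would require.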


Subject(s)
Coronary Artery Disease , Humans , Coronary Artery Disease/diagnostic imaging , Constriction, Pathologic , Algorithms , Coronary Angiography
11.
Sensors (Basel) ; 23(3)2023 Jan 28.
Article in English | MEDLINE | ID: mdl-36772503

ABSTRACT

Continuous advancements of technologies such as machine-to-machine interactions and big data analysis have led to the internet of things (IoT) making information sharing and smart decision-making possible using everyday devices. On the other hand, swarm intelligence (SI) algorithms seek to establish constructive interaction among agents regardless of their intelligence level. In SI algorithms, multiple individuals run simultaneously and possibly in a cooperative manner to address complex nonlinear problems. In this paper, the application of SI algorithms in IoT is investigated with a special focus on the internet of medical things (IoMT). The role of wearable devices in IoMT is briefly reviewed. Existing works on applications of SI in addressing IoMT problems are discussed. Possible problems include disease prediction, data encryption, missing values prediction, resource allocation, network routing, and hardware failure management. Finally, research perspectives and future trends are outlined.
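The abstract describes SI algorithms only in general terms. Particle swarm optimization (PSO) is one widely used SI algorithm, and the sketch below illustrates the idea of many agents searching simultaneously while sharing information: each particle is pulled toward its own best position and toward the swarm's best position. The function name and the parameter values (inertia 0.7, acceleration coefficients 1.5, search bounds) are illustrative assumptions, not settings from the reviewed works.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    n_particles: int = 20,
    iterations: int = 100,
    bounds: Tuple[float, float] = (-5.0, 5.0),
    inertia: float = 0.7,
    c_personal: float = 1.5,
    c_social: float = 1.5,
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)
    lo, hi = bounds
    positions = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_cost = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_cost.__getitem__)
    global_best, global_cost = personal_best[g][:], personal_cost[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: momentum + pull to own best + pull to swarm best.
                velocities[i][d] = (inertia * velocities[i][d]
                                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                                    + c_social * r2 * (global_best[d] - positions[i][d]))
                # Move and clamp the particle to the search bounds.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            cost = objective(positions[i])
            if cost < personal_cost[i]:
                personal_best[i], personal_cost[i] = positions[i][:], cost
                if cost < global_cost:
                    global_best, global_cost = positions[i][:], cost
    return global_best, global_cost
```

On a simple test function such as `pso_minimize(lambda p: sum(v * v for v in p), dim=2)`, the returned point should lie close to the origin.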


Subject(s)
Internet of Things , Wearable Electronic Devices , Humans , Algorithms , Cognition , Intelligence , Internet
12.
Int J Mol Sci ; 24(3)2023 Jan 27.
Article in English | MEDLINE | ID: mdl-36768794

ABSTRACT

Prostate cancer (PC) is the most frequently diagnosed non-skin cancer in the world. Previous studies have shown that genomic alterations represent the most common mechanism for molecular alterations responsible for the development and progression of PC. This highlights the importance of identifying functional genomic variants for early detection in high-risk PC individuals. Great efforts have been made to identify common protein-coding genetic variations; however, the impact of non-coding variations, including regulatory genetic variants, is not well understood. Identification of these variants and the underlying target genes will be a key step in improving the detection and treatment of PC. To gain an understanding of the functional impact of genetic variants, and in particular, regulatory variants in PC, we developed an integrative pipeline (AGV) that uses whole genome/exome sequences, GWAS SNPs, chromosome conformation capture data, and ChIP-Seq signals to investigate the potential impact of genomic variants on the underlying target genes in PC. We identified 646 putative regulatory variants, of which 30 significantly altered the expression of at least one protein-coding gene. Our analysis of chromatin interactions data (Hi-C) revealed that the 30 putative regulatory variants could affect 131 coding and non-coding genes. Interestingly, our study identified the 131 protein-coding genes that are involved in disease-related pathways, including Reactome and MSigDB, for most of which targeted treatment options are currently available. Notably, our analysis revealed several non-coding RNAs, including RP11-136K7.2 and RAMP2-AS1, as potential enhancer elements of the protein-coding genes CDH12 and EZH1, respectively. Our results provide a comprehensive map of genomic variants in PC and reveal their potential contribution to prostate cancer progression and development.


Subject(s)
Genome-Wide Association Study , Prostatic Neoplasms , Male , Humans , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Prostatic Neoplasms/genetics , Chromatin , Genomics , Polymorphism, Single Nucleotide
13.
Article in English | MEDLINE | ID: mdl-36322495

ABSTRACT

Alzheimer's disease (AD) is a progressive and irreversible type of dementia that causes degeneration and death of cells and their connections in the brain. AD worsens over time, greatly impacts patients' lives, and affects important mental functions, including thinking, the ability to carry on a conversation, and judgment and response to the environment. Clinically, there is no single test that effectively diagnoses AD. However, computed tomography (CT) and magnetic resonance imaging (MRI) scans can help in AD diagnosis by revealing critical changes in the size of different brain areas, typically the parietal and temporal lobes. In this work, an integrative multiresolutional ensemble deep learning-based framework is proposed to achieve better predictive performance for the diagnosis of AD. Unlike ResNet, DenseNet, and their variants, the proposed pipeline utilizes PartialNet in a hierarchical design tailored to AD detection from brain MRIs. The advantage of the proposed analysis system is that PartialNet diversifies the depth and deep supervision. It also incorporates the properties of identity mappings, which make it more powerful at learning owing to feature reuse. Besides, the proposed ensemble PartialNet better mitigates vanishing gradients and diminishing forward flow, with a low number of parameters and better training time than its counterpart networks. The proposed analysis pipeline has been tested and evaluated on the benchmark ADNI dataset collected from 379 subjects. Quantitative validation of the obtained results documented our framework's capability, outperforming state-of-the-art learning approaches for both multi- and binary-class AD detection.

14.
Int J Mol Sci ; 23(22)2022 Nov 20.
Article in English | MEDLINE | ID: mdl-36430895

ABSTRACT

Here we developed KARAJ, a fast and flexible Linux command-line tool that automates the end-to-end process of querying and downloading a wide range of genomic and transcriptomic sequence data types. KARAJ takes as input a list of PMCIDs, publication URLs, or various types of accession numbers, and automates four tasks: first, it provides a summary list of accessible datasets generated by or used in these scientific articles, enabling users to select appropriate datasets; second, it calculates the size of the files that users want to download and confirms that adequate space is available on the local disk; third, it generates a metadata table containing sample information and the experimental design of the corresponding study; and last, it enables users to download supplementary data tables attached to publications. Further, KARAJ provides a parallel downloading framework powered by Aspera Connect, which significantly reduces download time.
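KARAJ's second task checks that the requested files fit on the local disk before downloading. The abstract does not show how KARAJ does this; the sketch below is a hypothetical Python version of that kind of check using the standard library's `shutil.disk_usage`, with an assumed 5% safety margin. The function name and margin are illustrative only.

```python
import math
import shutil
from typing import Iterable, Tuple


def check_free_space(file_sizes_bytes: Iterable[int], target_dir: str = ".",
                     safety_margin: float = 0.05) -> Tuple[bool, int, int]:
    """Return (fits, required_bytes, free_bytes) for a planned download into target_dir."""
    total = sum(file_sizes_bytes)
    required = math.ceil(total * (1 + safety_margin))  # leave some headroom
    free = shutil.disk_usage(target_dir).free          # free bytes on that filesystem
    return required <= free, required, free
```

In practice the file sizes would come from the repository's metadata, so the check can run before any transfer starts.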


Subject(s)
Software , Transcriptome , Genome , Genomics , Metadata
15.
Comput Struct Biotechnol J ; 20: 4975-4983, 2022.
Article in English | MEDLINE | ID: mdl-36147666

ABSTRACT

Copy number variation (CNV) refers to a type of structural genomic alteration in which a segment of a chromosome is duplicated or deleted. To date, many CNVs have been identified as causative genetic elements for several diseases and phenotypes. However, performing a CNV-based genome-wide association study is challenging due to inconsistency in the length and occurrence of CNVs across the individuals under investigation. One of the most efficient strategies to address this issue is building CNV regions (CNVRs; genomic regions in which CNVs overlap). However, this approach is susceptible to a high false-positive rate because confounding CNVRs overlap and co-occur with true-positive CNVRs. Here, we develop PeakCNV, which differentiates false-positive CNVRs from true positives by calculating a new metric, the independence ranking score (IR-score), via a feature ranking approach. We compared the performance of PeakCNV with other existing tools in two case studies: one using CNV genotype data for individuals with prostate cancer (194 cases and 2,392 healthy individuals) and the other for individuals with neurodevelopmental disorders (19,642 cases and 6,451 healthy individuals). Crucially, our benchmarking analyses on the prostate cancer cohort indicated that PeakCNV identifies fewer candidate risk CNVRs, with shorter lengths, than other tools. Importantly, these CNVRs cover a greater proportion of cases relative to healthy individuals than those from other tools. The accuracy of PeakCNV in identifying relevant candidate CNVRs was reproducible in the case study on neurodevelopmental disorders. Using data from the FANTOM5 expression atlas and the Clinical Genomic Database, we show that the candidate CNVRs identified by PeakCNV for neurodevelopmental disorders overlap with more genes with brain-enriched expression, and more genes associated with neurological conditions, than candidate CNVRs identified by other tools. Taken together, PeakCNV outperformed existing CNV association study tools by identifying more biologically meaningful CNVRs relevant to the phenotype of interest. PeakCNV is publicly available for the analysis of CNV-associated diseases and is accessible from https://rdrr.io/github/mahdieh1/PeakCNV.
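Building CNV regions, i.e. merging CNVs that overlap into a single region, is essentially an interval merge. The sketch below assumes CNV calls given as (chromosome, start, end) tuples with inclusive ends and counts how many calls support each region. It shows only the baseline CNVR construction that the abstract describes as prone to false positives, not PeakCNV's IR-score; the names are illustrative.

```python
from typing import Iterable, List, Tuple

CNV = Tuple[str, int, int]        # (chromosome, start, end), end inclusive
CNVR = Tuple[str, int, int, int]  # (chromosome, start, end, number of supporting CNVs)


def build_cnvrs(cnvs: Iterable[CNV]) -> List[CNVR]:
    """Merge overlapping CNV calls on the same chromosome into CNV regions."""
    regions: List[CNVR] = []
    # Sorting groups calls by chromosome, then orders them by start position.
    for chrom, start, end in sorted(cnvs):
        if regions and regions[-1][0] == chrom and start <= regions[-1][2]:
            _, r_start, r_end, count = regions[-1]
            regions[-1] = (chrom, r_start, max(r_end, end), count + 1)  # extend region
        else:
            regions.append((chrom, start, end, 1))  # start a new region
    return regions
```

Because any chain of overlapping calls collapses into one region, a single long CNV can link otherwise unrelated calls, which is one way confounding CNVRs arise.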

17.
Genomics ; 114(5): 110454, 2022 09.
Article in English | MEDLINE | ID: mdl-36030022

ABSTRACT

Cis-regulatory elements (CREs) are non-coding parts of the genome that play a critical role in gene expression regulation. Enhancers, as an important example of CREs, interact with genes to influence complex traits like disease, heat tolerance and growth rate. Much of what is known about enhancers comes from studies of humans and a few model organisms like mouse, with little known about other mammalian species. Previous studies have attempted to identify enhancers in less studied mammals using comparative genomics but with limited success. Recently, Machine Learning (ML) techniques have shown promising results in predicting enhancer regions. Here, we investigated the ability of ML methods to identify enhancers in three non-model mammalian species (cattle, pig and dog) using human and mouse enhancer data from VISTA and publicly available ChIP-seq. We tested nine models, using four different representations of the DNA sequences in cross-species prediction using both the VISTA dataset and species-specific ChIP-seq data. We identified between 809,399 and 877,278 enhancer-like regions (ELRs) in the study species (11.6-13.7% of each genome). These predictions were close to the ~8% proportion of ELRs that covered the human genome. We propose that our ML methods have predictive ability for identifying enhancers in non-model mammalian species. We have provided a list of high-confidence enhancers at https://github.com/DaviesCentreInformatics/Cross-species-enhancer-prediction and believe these enhancers will be of great use to the community.
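The abstract mentions four DNA sequence representations without naming them. Two common choices, shown here as an assumption, are one-hot encoding (a 4-element indicator per base, typical input for convolutional models) and k-mer frequency vectors (counts of every length-k word, typical input for classical classifiers). The function names are hypothetical, and ambiguous bases such as N are handled by an assumed convention (zero row, skipped k-mers).

```python
from itertools import product
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """One row per base; ambiguous bases (e.g. N) become all-zero rows."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def kmer_counts(seq: str, k: int = 3) -> List[int]:
    """Counts of all 4**k k-mers in lexicographic order; k-mers containing N are skipped."""
    vocab = ["".join(p) for p in product(BASES, repeat=k)]
    counts = dict.fromkeys(vocab, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[v] for v in vocab]
```

One-hot encoding preserves position, which convolutional filters can exploit, whereas k-mer counts give a fixed-length vector regardless of sequence length, which suits models that need equal-sized inputs.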


Subject(s)
Enhancer Elements, Genetic , Genomics , Animals , Base Sequence , Cattle , Dogs , Genome, Human , Genomics/methods , Humans , Machine Learning , Mammals/genetics , Mice , Swine
18.
Front Public Health ; 10: 869238, 2022.
Article in English | MEDLINE | ID: mdl-35812486

ABSTRACT

Early diagnosis, prioritization, screening, clustering, and tracking of patients with COVID-19, and the production of drugs and vaccines, are some of the applications that have made it necessary to adopt new technologies to manage and deal with this epidemic. Strategies backed by artificial intelligence (A.I.) and the Internet of Things (IoT) have been undeniably effective in understanding how the virus works and preventing it from spreading. Accordingly, the main aim of this survey is to critically review machine learning (ML), IoT, and integrated IoT- and ML-based techniques in applications related to COVID-19, from diagnosis of the disease to prediction of its outbreak. According to the main findings, IoT provided a prompt and efficient approach to tracking the spread of the disease. On the other hand, most of the ML-based studies aimed at the detection and handling of challenges associated with the COVID-19 pandemic. Among different approaches, Convolutional Neural Network (CNN), Support Vector Machine, Genetic CNN, and pre-trained CNN, followed by ResNet, have demonstrated the best performance compared to other methods.


Subject(s)
COVID-19 , Internet of Things , Machine Learning , Artificial Intelligence , COVID-19/epidemiology , Humans , Neural Networks, Computer , Pandemics/prevention & control , Support Vector Machine
19.
BMC Bioinformatics ; 23(1): 298, 2022 Jul 25.
Article in English | MEDLINE | ID: mdl-35879674

ABSTRACT

BACKGROUND: The advent of high-throughput sequencing has enabled researchers to systematically evaluate genetic variation in cancer and to identify many cancer-associated genes. Although cancers arising in the same tissue are usually grouped together, their mutational profiles differ substantially. Consequently, there is still no definitive treatment for most cancer types. This highlights the importance of developing new pipelines that accurately identify cancer-associated genes and re-classify patients with similar mutational profiles. Grouping patients by their mutational profiles may reveal subtypes of patients who could benefit from specific treatments. RESULTS: In this study, we propose a new machine learning pipeline that uses protein-coding genes mutated in many samples to identify cancer subtypes. We apply our pipeline to 12,270 samples from the International Cancer Genome Consortium (ICGC), covering 19 cancer types, and identify 17 distinct cancer subtypes. Comprehensive phenotypic and genotypic analysis reveals distinguishing properties of the subtypes, including unique cancer-related signaling pathways. CONCLUSIONS: This subtyping approach offers a new opportunity for cancer drug development based on patients' mutational profiles. Additionally, we analyze the mutational signatures of the samples in each subtype, which provides important insight into their active molecular mechanisms. Some of the pathways identified in most subtypes, including the cell cycle and axon guidance pathways, are frequently observed in cancer. Interestingly, we also identified several mutated genes with different mutation rates across multiple cancer subtypes. In addition, our "gene-motif" analysis suggests that both the sequence context of mutations and the underlying mutational processes should be considered when identifying cancer-associated genes.
The source code for our proposed clustering pipeline and analysis is publicly available at https://github.com/bcb-sut/Pan-Cancer .
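The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: each sample is encoded as a binary profile over genes mutated in at least `min_samples` samples (a threshold chosen here for illustration), and samples are then grouped with plain k-means using a deterministic farthest-point initialization. The function names, the threshold, and the toy data are hypothetical; the authors' actual pipeline is in the repository linked above.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical helpers illustrating mutation-profile clustering; NOT the authors' pipeline.

def recurrent_genes(mutations: Dict[str, Set[str]], min_samples: int) -> List[str]:
    """Genes mutated in at least `min_samples` samples (assumed recurrence filter), sorted."""
    counts = Counter(gene for genes in mutations.values() for gene in genes)
    return sorted(gene for gene, c in counts.items() if c >= min_samples)


def binary_profiles(mutations: Dict[str, Set[str]], genes: List[str]) -> Dict[str, List[float]]:
    """Encode each sample as a 0/1 vector over the selected genes."""
    return {s: [1.0 if g in muts else 0.0 for g in genes] for s, muts in mutations.items()}


def sq_dist(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors: Dict[str, List[float]], k: int, iters: int = 50) -> Dict[str, int]:
    """Plain k-means with deterministic farthest-point initialization (an assumed choice)."""
    names = sorted(vectors)
    centroids = [list(vectors[names[0]])]
    while len(centroids) < k:
        # Pick the sample farthest from its nearest existing centroid.
        far = max(names, key=lambda n: min(sq_dist(vectors[n], c) for c in centroids))
        centroids.append(list(vectors[far]))

    labels: Dict[str, int] = {}
    for _ in range(iters):
        # Assignment step: each sample joins its nearest centroid.
        new_labels = {
            n: min(range(k), key=lambda c: sq_dist(vectors[n], centroids[c])) for n in names
        }
        if new_labels == labels:
            break  # assignments stopped changing, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members (kept if empty).
        for c in range(k):
            members = [vectors[n] for n in names if labels[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy example: two groups of samples sharing different recurrently mutated genes.
mutations = {
    "S1": {"TP53", "KRAS"},
    "S2": {"TP53", "KRAS", "GENE_X"},
    "S3": {"PIK3CA", "PTEN"},
    "S4": {"PIK3CA", "PTEN", "GENE_Y"},
}
genes = recurrent_genes(mutations, min_samples=2)
labels = kmeans(binary_profiles(mutations, genes), k=2)
print(genes)   # ['KRAS', 'PIK3CA', 'PTEN', 'TP53']
print(labels)  # {'S1': 0, 'S2': 0, 'S3': 1, 'S4': 1}
```

The singleton genes `GENE_X` and `GENE_Y` are filtered out, so the clusters are driven only by the recurrently mutated genes. A real analysis would typically also choose `k` with a model-selection criterion rather than fixing it in advance.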


Subject(s)
Neoplasms , Point Mutation , Cluster Analysis , Genome, Human , Humans , Mutation , Neoplasms/genetics
20.
Front Public Health ; 10: 879418, 2022.
Article in English | MEDLINE | ID: mdl-35712286

ABSTRACT

Age estimation from dental radiographs is used by physicians and pathologists for disease identification and legal matters. Orthopantomography (OPG) is a medical imaging technique used for this purpose, for example, in estimating the post-mortem interval, detecting child abuse, handling drug-trafficking cases, and identifying unknown bodies. Recent developments in automated image processing models have improved the limited precision of age estimation to approximately ±1 year. Although this estimate is often accepted as accurate, age estimation should be as precise as possible in the most serious matters, such as homicide. Current age estimation techniques rely heavily on manual, time-consuming image processing, yet age estimation is often time-sensitive, so processing time is critical. Recent developments in machine learning-based data processing have reduced image processing time; however, the accuracy of these techniques still needs to be improved. We propose an ensemble of image classifiers that refines OPG-based age estimation from 1-year resolution to a few months (1, 3, and 6 months). This hybrid model (HCNN-KNN) combines convolutional neural networks (CNNs) and k-nearest neighbors (KNN). We used it to analyze 1,922 panoramic dental radiographs of patients aged 15 to 23, obtained from various teaching institutes and private dental clinics in Malaysia. To reduce the risk of overfitting, we applied principal component analysis (PCA) and eliminated highly correlated features. To further improve the performance of the hybrid model, we performed systematic image pre-processing and trained the model through a series of classifications. We demonstrate that combining these approaches improves classification and segmentation, and thus the model's age-estimation outcome.
To the best of our knowledge, our model is the first to successfully estimate age at 1-year, 6-month, 3-month, and 1-month resolutions, with accuracies of 99.98%, 99.96%, 99.87%, and 98.78%, respectively.
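The abstract names the ingredients of the hybrid model but not exactly how they are connected. As a minimal, hypothetical sketch, assume each OPG has already been converted into a numeric feature vector (for example by a CNN, which is not shown). The code illustrates two of the named steps: greedily dropping features whose absolute Pearson correlation with an already-kept feature reaches a threshold (0.95 here, an arbitrary assumption), and assigning an age bin by a k-nearest-neighbors majority vote. PCA is omitted to keep the example dependency-free, and the data, age-bin labels, and function names are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence

# Hypothetical sketch of two steps named in the abstract (correlated-feature removal
# and KNN classification). Feature vectors are assumed to come from a CNN, not shown.

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for constant inputs."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def drop_correlated(rows: List[List[float]], threshold: float = 0.95) -> List[int]:
    """Greedily keep feature columns whose |r| with every kept column is below `threshold`."""
    columns = list(zip(*rows))
    kept: List[int] = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[i])) < threshold for i in kept):
            kept.append(j)
    return kept


def knn_predict(train_X: List[List[float]], train_y: List[str],
                query: List[float], k: int = 3) -> str:
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Toy data: column 1 is exactly 2 * column 0, so it should be dropped as redundant.
train_X = [[1.0, 2.0, 5.0], [2.0, 4.0, 1.0], [3.0, 6.0, 4.0],
           [10.0, 20.0, 2.0], [11.0, 22.0, 6.0], [12.0, 24.0, 3.0]]
train_y = ["15-16y", "15-16y", "15-16y", "22-23y", "22-23y", "22-23y"]  # made-up age bins

kept = drop_correlated(train_X)
reduced = [[row[j] for j in kept] for row in train_X]
print(kept)                                       # [0, 2]
print(knn_predict(reduced, train_y, [2.5, 3.0]))  # 15-16y
```

In practice, established implementations such as scikit-learn's `PCA` and `KNeighborsClassifier` would normally replace these hand-written helpers; the pure-Python version only makes each step explicit. `math.dist` requires Python 3.8 or later.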


Subject(s)
Image Processing, Computer-Assisted , Neural Networks, Computer , Algorithms , Child , Cluster Analysis , Humans , Image Processing, Computer-Assisted/methods , Infant , Radiography, Panoramic