1.
Article in English | MEDLINE | ID: mdl-37022825

ABSTRACT

Stage-based sleep screening is a widely used tool in both healthcare and neuroscientific research, as it allows for the accurate assessment of sleep patterns and stages. In this paper, we propose a novel framework that is based on authoritative guidance in sleep medicine and is designed to automatically capture the time-frequency characteristics of sleep electroencephalogram (EEG) signals in order to make staging decisions. Our framework consists of two main phases: a feature extraction process that partitions the input EEG spectrograms into a sequence of time-frequency patches, and a staging phase that searches for correlations between the extracted features and the defining characteristics of sleep stages. To model the staging phase, we utilize a Transformer model with an attention-based module, which allows for the extraction of global contextual relevance among time-frequency patches and the use of this relevance for staging decisions. The proposed method is validated on the large-scale Sleep Heart Health Study dataset and achieves new state-of-the-art results for the wake, N2, and N3 stages, with respective F1 scores of 0.93, 0.88, and 0.87 using only EEG signals. Our method also demonstrates high inter-rater reliability, with a kappa score of 0.80. Moreover, we provide visualizations of the correspondence between sleep staging decisions and features extracted by our method, which enhances the interpretability of the proposal. Overall, our work represents a significant contribution to the field of automated sleep staging and has important implications for both healthcare and neuroscience research.


Subject(s)
Sleep Stages , Sleep , Humans , Reproducibility of Results , Polysomnography/methods , Electroencephalography/methods
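
The feature extraction phase described in this abstract partitions an EEG spectrogram into a sequence of time-frequency patches. The following is a minimal sketch of such a partition, assuming non-overlapping rectangular patches that are flattened row by row; the function name, the patch sizes, and the toy spectrogram are illustrative assumptions and are not taken from the paper.

```python
from typing import List

Matrix = List[List[float]]


def partition_into_patches(spec: Matrix, patch_f: int, patch_t: int) -> List[List[float]]:
    """Split a (frequency x time) spectrogram into flattened, non-overlapping patches.

    Patches are emitted frequency band by frequency band, left to right in time.
    """
    n_f, n_t = len(spec), len(spec[0])
    if n_f % patch_f or n_t % patch_t:
        raise ValueError("spectrogram shape must be divisible by the patch shape")
    patches = []
    for f0 in range(0, n_f, patch_f):
        for t0 in range(0, n_t, patch_t):
            # Flatten the patch row by row into a single feature vector.
            patch = [spec[f][t]
                     for f in range(f0, f0 + patch_f)
                     for t in range(t0, t0 + patch_t)]
            patches.append(patch)
    return patches


# Toy 4 x 6 "spectrogram" (hypothetical values): entry (f, t) holds f * 6 + t.
spec = [[float(f * 6 + t) for t in range(6)] for f in range(4)]
patches = partition_into_patches(spec, patch_f=2, patch_t=3)
```

In a Transformer-based pipeline, each flattened patch would typically be projected to an embedding before attention is applied; that step is omitted here.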
2.
Comput Methods Programs Biomed ; 236: 107543, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37100024

ABSTRACT

BACKGROUND AND OBJECTIVE: Defining and separating cancer subtypes is essential for facilitating personalized therapy modality and prognosis of patients. The definition of subtypes has been constantly recalibrated as a result of our deepened understanding. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data such as transcriptomics that have strong correlations to the underlying biological mechanism. However, while existing studies have shown promising results, they suffer from issues associated with omics data, namely sample scarcity and high dimensionality, and they impose unrealistic assumptions in order to extract useful features from the data while avoiding overfitting to spurious correlations. METHODS: This paper proposes to leverage a recent strong generative model, Vector-Quantized Variational AutoEncoder, to tackle the data issues and extract discrete representations that are crucial to the quality of subsequent clustering by retaining only information relevant to reconstructing the input. RESULTS: Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate that the proposed clustering results can significantly and robustly improve prognosis over prevalent subtyping systems. CONCLUSION: Our proposal does not impose strict assumptions on data distribution, while its latent features are better representations of the transcriptomic data in different cancer subtypes, capable of yielding superior clustering performance with any mainstream clustering method.


Subject(s)
Neoplasms , Humans , Gene Expression Profiling , Transcriptome , Cluster Analysis
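
The discrete representations mentioned in this abstract come from vector quantization, where each continuous latent vector is replaced by its nearest entry in a codebook. The sketch below shows only that lookup step, assuming squared Euclidean distance; the codebook and latent values are made up for illustration, and the encoder, decoder, and training procedure of a Vector-Quantized Variational AutoEncoder are not shown.

```python
def quantize(vec, codebook):
    """Return (index, code) of the codebook entry closest to vec.

    Closeness is measured by squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    idx = min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))
    return idx, codebook[idx]


# Hypothetical 2-D codebook and encoder outputs, for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
latents = [[0.2, -0.1], [0.9, 1.3], [3.5, 0.4]]

# Each sample is reduced to a discrete code index, which a downstream
# clustering method could consume.
codes = [quantize(z, codebook)[0] for z in latents]
```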
3.
Life (Basel) ; 13(2), 2023 Feb 03.
Article in English | MEDLINE | ID: mdl-36836796

ABSTRACT

The use of herbal medicines in recent decades has increased because their side effects are considered lower than those of conventional medicine. Unani herbal medicines are often used in Southern Asia. These herbal medicines are usually composed of several types of medicinal plants to treat various diseases. Research on herbal medicine usually focuses on insight into the composition of plants used as ingredients. However, in the present study, we extended the analysis to the level of metabolites that exist in the medicinal plants. This study aimed to develop a predictive model of the Unani therapeutic usage based on its constituent metabolites using deep learning and data-intensive science approaches. Furthermore, the best prediction model was then utilized to extract important metabolites for each therapeutic usage of Unani. In this study, it was observed that the deep neural network approach provided a much better prediction model than other algorithms including random forest and support vector machine. Moreover, according to the best prediction model using the deep neural network, we identified 118 important metabolites for nine therapeutic usages of Unani.

4.
Plant Methods ; 18(1): 118, 2022 Nov 05.
Article in English | MEDLINE | ID: mdl-36335358

ABSTRACT

BACKGROUND: Phytochemicals or secondary metabolites are low molecular weight organic compounds with little function in plant growth and development. Nevertheless, metabolite diversity governs not only the phenetics of an organism but may also inform the evolutionary pattern and adaptation of green plants to the changing environment. Plant chemoinformatics analyzes the chemical system of natural products using computational tools and robust mathematical algorithms. It has been a powerful approach for species-level differentiation and is widely employed for species classifications and reinforcement of previous classifications. RESULTS: This study attempts to classify Angiosperms using plant sulfur-containing compound (SCC) or sulphated compound information. The SCC dataset of 692 plant species was collected from the comprehensive species-metabolite relationship family (KNApSAcK) database. The structural similarity scores of metabolite pairs under all possible combinations (plant species-metabolite) were determined, and metabolite pairs with a Tanimoto coefficient value > 0.85 were selected for clustering using a machine learning algorithm. Metabolite clustering showed an association between the structurally similar metabolite clusters and metabolite content among the plant species. Phylogenetic tree construction of Angiosperms displayed three major clades, of which clade 1 and clade 2 represented the eudicots only, and clade 3 a mixture of both eudicots and monocots. The SCC-based construction of Angiosperm phylogeny is a subset of the existing monocot-dicot classification. The majority of eudicots present in clade 1 and 2 were represented by glucosinolate compounds. These clades with SCC may have been a mixture of ancestral species whilst the combinatorial presence of monocot-dicot in clade 3 suggests sulphated-chemical structure diversification in the event of adaptation during evolutionary change.
CONCLUSIONS: Sulphated chemoinformatics informs classification of Angiosperms via a machine learning technique.
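
The pair-selection step in this abstract keeps metabolite pairs whose Tanimoto coefficient exceeds 0.85. As a rough sketch, assuming each metabolite is represented by the set of "on" bits of a structural fingerprint, the coefficient is the size of the intersection divided by the size of the union. The metabolite names and bit sets below are hypothetical, and the fingerprint type used in the study is not stated in the abstract.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_pairs(fingerprints: dict, threshold: float = 0.85):
    """Return all metabolite pairs whose Tanimoto coefficient exceeds the threshold."""
    return [(m1, m2)
            for m1, m2 in combinations(sorted(fingerprints), 2)
            if tanimoto(fingerprints[m1], fingerprints[m2]) > threshold]


# Hypothetical fingerprints given as sets of "on" bit positions.
fps = {
    "met_A": set(range(20)),       # bits 0..19
    "met_B": set(range(19)),       # bits 0..18, shares 19 of 20 bits with met_A
    "met_C": {0, 1, 2, 50, 51},    # mostly different bits
}

pairs = similar_pairs(fps)
```

The retained pairs can then be treated as edges of a similarity network that is passed to a clustering algorithm.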

5.
Antibiotics (Basel) ; 11(9), 2022 Sep 05.
Article in English | MEDLINE | ID: mdl-36139978

ABSTRACT

Jamu is the traditional Indonesian herbal medicine system that is considered to have many benefits such as serving as a cure for diseases or maintaining sound health. A Jamu medicine is generally made from a mixture of several herbs. Natural antibiotics can provide a way to handle the problem of antibiotic resistance. This research aims to discover the potential of herbal plants as natural antibiotic candidates based on a machine learning approach. Our input data consists of a list of herbal formulas with plants as their constituents. The target class corresponds to bacterial diseases that can be cured by herbal formulas. The best model has been observed by implementing the Random Forest (RF) algorithm. For 10-fold cross-validations, the maximum accuracy, recall, and precision are 91.10%, 91.10%, and 90.54% with standard deviations 1.05, 1.05, and 1.48, respectively, which imply that the model obtained is good and robust. This study has shown that 14 plants can be potentially used as natural antibiotic candidates. Furthermore, according to scientific journals, 10 of the 14 selected plants have direct or indirect antibacterial activity.

6.
Mol Inform ; 41(7): e2100247, 2022 07.
Article in English | MEDLINE | ID: mdl-35014190

ABSTRACT

Plants produce numerous types of secondary metabolites which have pharmacological importance in drug development for different diseases. Computational methods widely use the fingerprints of the metabolites to understand different properties and similarities among metabolites and for the prediction of chemical reactions etc. In this work, we developed three different deep neural network (DNN) models to predict the antibacterial property of plant metabolites. We developed the first DNN model using the fingerprint set of metabolites as features. In the second DNN model, we searched the similarities among fingerprints using correlation and used one representative feature from each group of highly correlated fingerprints. In the third model, the fingerprints of metabolites were used to find structurally similar chemical compound clusters. From each cluster a representative metabolite is selected and made part of the training dataset. The second model reduced the number of features, whereas the third model achieved better classification results for test data. In both cases, we applied the simple graph clustering method to cluster the corresponding network. The correlation-based DNN model reduced some features while retaining an almost similar performance compared to the first DNN model. The third model improves classification results for test data by capturing wider variance within training data using the graph clustering method. This third model is a somewhat novel approach and can be applied to build DNN models for other purposes.


Subject(s)
Anti-Bacterial Agents , Neural Networks, Computer , Anti-Bacterial Agents/pharmacology , Cluster Analysis
7.
PeerJ ; 9: e11876, 2021.
Article in English | MEDLINE | ID: mdl-34430080

ABSTRACT

BACKGROUND: Glucosinolates (GSLs) are plant secondary metabolites that contain nitrogen-containing compounds. They are important in the plant defense system and known to provide protection against cancer in humans. Currently, the increasing amount of data generated from various omics technologies serves as a hotspot for new gene discovery. However, a sequence similarity search is sometimes not sufficiently effective to find these genes; hence, we adapted a network clustering approach to search for potential GSL genes from the Arabidopsis thaliana co-expression dataset. METHODS: We used known GSL genes to construct a comprehensive GSL co-expression network. This network was analyzed with the DPClusOST algorithm using density values of 0.5, 0.6, 0.7, 0.8, and 0.9. The generated clusters were evaluated using Fisher's exact test to identify GSL gene co-expression clusters. A significance score (SScore) was calculated for each gene based on the generated p-value of Fisher's exact test. SScore was used to perform a receiver operating characteristic (ROC) study to classify possible GSL genes using the ROCR package. ROCR was used in determining the AUC that measured the suitable density value of the cluster for further analysis. Finally, pathway enrichment analysis was conducted using ClueGO to identify significant pathways associated with the GSL clusters. RESULTS: The density value of 0.8 showed the highest area under the curve (AUC), leading to the selection of thirteen potential GSL genes from the top six significant clusters that include IMDH3, MVP1, T19K24.17, MRSA2, SIR, ASP4, MTO1, At1g21440, HMT3, At3g47420, PS1, SAL1, and At3g14220. A total of four potential genes (MTO1, SIR, SAL1, and IMDH3) were identified from the pathway enrichment analysis on the significant clusters. These genes are directly related to GSL-associated pathways such as sulfur metabolism and valine, leucine, and isoleucine biosynthesis.
This approach demonstrates the ability of the network clustering approach to identify potential GSL genes which cannot be found from the standard similarity search.
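
The cluster evaluation step in this abstract relies on Fisher's exact test. A minimal sketch of the one-sided (enrichment) form is given below, assuming the question is whether a cluster contains more known GSL genes than expected by chance; the function name and the toy counts are illustrative assumptions. The SScore formula is not given in the abstract, so it is not reproduced here.

```python
from math import comb


def fisher_exact_greater(k: int, cluster_size: int, n_known: int, n_total: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment.

    Probability of seeing at least k known genes in a cluster of size
    cluster_size, when n_known of the n_total network genes are known genes
    (hypergeometric upper tail).
    """
    max_k = min(cluster_size, n_known)
    denom = comb(n_total, cluster_size)
    tail = sum(comb(n_known, i) * comb(n_total - n_known, cluster_size - i)
               for i in range(k, max_k + 1))
    return tail / denom


# Toy numbers (hypothetical): a network of 20 genes, 5 of them known GSL genes,
# and a cluster of 4 genes that contains 3 known GSL genes.
p_value = fisher_exact_greater(k=3, cluster_size=4, n_known=5, n_total=20)
```

With these toy counts the upper tail is (10 * 15 + 5 * 1) / 4845 = 155 / 4845, so the cluster would be called enriched at a 0.05 threshold.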

8.
Life (Basel) ; 11(8), 2021 Aug 23.
Article in English | MEDLINE | ID: mdl-34440610

ABSTRACT

BACKGROUND: We performed in silico prediction of the interactions between compounds of Jamu herbs and human proteins by utilizing data-intensive science and machine learning methods. Verifying the proteins that are targeted by compounds of natural herbs will be helpful to select natural herb-based drug candidates. METHODS: Initially, data related to compounds, target proteins, and interactions between them were collected from open access databases. Compounds are represented by molecular fingerprints, whereas amino acid sequences are represented by numerical protein descriptors. Then, prediction models that predict the interactions between compounds and target proteins were constructed using support vector machine and random forest. RESULTS: A random forest model constructed based on MACCS fingerprint and amino acid composition obtained the highest accuracy. We used the best model to predict target proteins for 94 important Jamu compounds and assessed the results by supporting evidence from published literature and other sources. There are 27 compounds that can be validated by professional doctors, and those compounds belong to seven efficacy groups. CONCLUSION: By comparing the efficacy of predicted compounds and the relations of the targeted proteins with diseases, we found that some compounds might be considered as drug candidates.

9.
Sci Rep ; 11(1): 14450, 2021 07 14.
Article in English | MEDLINE | ID: mdl-34262063

ABSTRACT

Mental disorders (MDs), including schizophrenia (SCZ) and bipolar disorder (BD), have attracted special attention from scientists due to their high prevalence and significantly debilitating clinical features. The diagnosis of MDs is still essentially based on clinical interviews, and intensive efforts to introduce biochemical-based diagnostic methods have faced several difficulties for implementation in clinics, due to the complexity and still limited knowledge in MDs. In this context, aiming for improving the knowledge in etiology and pathophysiology, many authors have reported several alterations in metabolites in MDs and other brain diseases. After potentially fishing all metabolite biomarkers reported up to now for SCZ and BD, we investigated here the proteins related to these metabolites in order to construct a protein-protein interaction (PPI) network associated with these diseases. We determined the statistically significant clusters in this PPI network and, based on these clusters, we identified 28 significant pathways for SCZ and BD that essentially compose three groups representing three major systems, namely stress response, energy and neuron systems. By characterizing new pathways with potential to innovate the diagnosis and treatment of psychiatric diseases, the present data may also contribute to the proposal of new interventions for the treatment of still unmet aspects in MDs.


Subject(s)
Bipolar Disorder , Protein Interaction Maps , Schizophrenia , Humans
10.
BMC Med Inform Decis Mak ; 21(1): 163, 2021 05 20.
Article in English | MEDLINE | ID: mdl-34016115

ABSTRACT

BACKGROUND: Sepsis is a severe illness that affects millions of people worldwide, and its early detection is critical for effective treatment outcomes. In recent years, researchers have used models to classify positive patients or identify the probability for sepsis using vital signs and other time-series variables as input. METHODS: In our study, we analyzed patients' conditions by their kinematics (position, velocity, and acceleration) in a six-dimensional space defined by six vital signs. The patient is affected by the disease after a period if the position gets "near" to a calculated sepsis position in space. We used these kinematics features as explanatory variables of long short-term memory (LSTM), convolutional neural network (CNN) and linear neural network (LNN) models and compared the prediction accuracies with those obtained using only the vital signs as input. The dataset used contained information of approximately 4800 patients, each with 48 hourly registers. RESULTS: We demonstrated that the kinematics features models had an improved performance compared with vital signs models. The kinematics features model of LSTM achieved the best accuracy, 0.803, which was nine points higher than the vital signs model. Although less accurate, the kinematics features models of the CNN and LNN showed better performances than vital signs models. CONCLUSION: Applying our novel approach for early detection of sepsis using neural networks will prove to be an invaluable, more accurate method than considering only simple vital signs as input variables. We expect that other researchers with similar objectives can use the model presented in this innovative approach to improve their results.


Subject(s)
Neural Networks, Computer , Sepsis , Biomechanical Phenomena , Early Diagnosis , Humans , Sepsis/diagnosis , Vital Signs
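
The kinematics features in this abstract (position, velocity, and acceleration in a six-dimensional vital-sign space) can be sketched with finite differences over the hourly records. The abstract does not say how the derivatives were computed, so the backward differences below are an assumption, as are the function name and the toy values; the six columns are left unnamed because the abstract does not list the vital signs.

```python
def kinematics(series):
    """Compute position, velocity, and acceleration from hourly vital-sign vectors.

    series: list of equally spaced records, each a list of vital-sign values.
    Velocity is the backward difference between consecutive records and
    acceleration is the backward difference between consecutive velocities.
    All three outputs are aligned to time steps t >= 2.
    """
    def diff(a, b):
        return [x - y for x, y in zip(a, b)]

    velocity = [diff(series[t], series[t - 1]) for t in range(1, len(series))]
    acceleration = [diff(velocity[t], velocity[t - 1]) for t in range(1, len(velocity))]
    return series[2:], velocity[1:], acceleration


# Three hypothetical hourly records with six vital-sign values each.
series = [
    [80.0, 36.5, 120.0, 80.0, 16.0, 98.0],
    [84.0, 36.7, 118.0, 79.0, 17.0, 97.0],
    [90.0, 37.1, 114.0, 76.0, 19.0, 95.0],
]
position, velocity, acceleration = kinematics(series)
```

Concatenating the three aligned vectors per time step gives an 18-dimensional input that could be fed to a sequence model in place of the six raw vital signs.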
11.
Database (Oxford) ; 2021, 2021 03 11.
Article in English | MEDLINE | ID: mdl-33705530

ABSTRACT

A biomarker is a measurable indicator of a disease or abnormal state of a body that plays an important role in disease diagnosis, prognosis and treatment. The biomarker has become a significant topic due to its versatile usage in the medical field and in rapid detection of the presence or severity of some diseases. The volume of biomarker data is rapidly increasing and the identified data are scattered. To provide comprehensive information, the explosively growing data need to be recorded in a single platform. There is no open-source freely available comprehensive online biomarker database. To fulfill this purpose, we have developed a human biomarker database as part of the KNApSAcK family databases which contain a vast quantity of information on the relationships between biomarkers and diseases. We have classified the diseases into 18 disease classes, mostly according to the National Center for Biotechnology Information definitions. Apart from this database development, we also have performed disease classification by separately using protein and metabolite biomarkers based on the network clustering algorithm DPClusO and hierarchical clustering. Finally, we reached a conclusion about the relationships among the disease classes. The human biomarker database can be accessed online and the inter-disease relationships may be helpful in understanding the molecular mechanisms of diseases. To our knowledge, this is one of the first approaches to classify diseases based on biomarkers. Database URL:  http://www.knapsackfamily.com/Biomarker/top.php.


Subject(s)
Algorithms , Proteins , Biomarkers , Cluster Analysis , Databases, Factual , Humans
12.
Life (Basel) ; 12(1), 2021 Dec 22.
Article in English | MEDLINE | ID: mdl-35054404

ABSTRACT

Using the Plethysmograph (PPG) signal to estimate blood pressure (BP) is attractive given the convenience and possibility of continuous measurement. However, due to personal differences and the insufficiency of data, the dilemma between the accuracy for a small dataset and the robustness as a general method remains. To this end, we scrutinized the whole pipeline from the feature selection to regression model construction based on a one-month experiment with 11 subjects. By constructing the explanatory features consisting of five general PPG waveform features that do not require the identification of dicrotic notch and diastolic peak and the heart rate, three regression models, which are partial least squares, locally weighted partial least squares, and Gaussian Process model, were built to reflect the underlying assumption about the nature of the fitting problem. By comparing the regression models, it can be confirmed that an individual Gaussian Process model attains the best results with 5.1 mmHg and 4.6 mmHg mean absolute error for SBP and DBP and 6.2 mmHg and 5.4 mmHg standard deviation for SBP and DBP. Moreover, the results of the individual models are significantly better than the generalized model built with the data of all subjects.
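
The two figures of merit reported in this abstract, mean absolute error and standard deviation, can be computed as in the sketch below. It assumes the standard deviation is taken over the signed prediction errors (population form), which the abstract does not specify; the readings are made up for illustration.

```python
from statistics import mean, pstdev


def mae_and_sd(y_true, y_pred):
    """Return (mean absolute error, standard deviation of the signed errors)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    return mean(abs(e) for e in errors), pstdev(errors)


# Hypothetical reference and estimated systolic pressures in mmHg.
sbp_true = [120.0, 130.0, 110.0, 125.0]
sbp_pred = [123.0, 126.0, 112.0, 125.0]
mae, sd = mae_and_sd(sbp_true, sbp_pred)
```

Here the signed errors are 3, -4, 2, and 0 mmHg, so the mean absolute error is 2.25 mmHg.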

13.
Life (Basel) ; 12(1), 2021 Dec 24.
Article in English | MEDLINE | ID: mdl-35054420

ABSTRACT

Recent advances in information technology have brought forth a paradigm shift in science, especially in the biology and medical fields [...].

14.
Mol Inform ; 40(2): e2000203, 2021 02.
Article in English | MEDLINE | ID: mdl-33164295

ABSTRACT

Deep learning approaches are widely used to search molecular structures for a candidate drug/material. The basic approach in drug/material candidate structure discovery is to embed a relationship that holds between a molecular structure and the physical property into a low-dimensional vector space (chemical space) and search for a candidate molecular structure in that space based on a desired physical property value. Deep learning simplifies the structure search by efficiently modeling the structure of the chemical space with greater detail and lower dimensions than the original input space. In our research, we propose an effective method for molecular embedding learning that combines variational autoencoders (VAEs) and metric learning using any physical property. Our method enables molecular structures and physical properties to be embedded locally and continuously into VAEs' latent space while maintaining the consistency of the relationship between the structural features and the physical properties of molecules to yield better predictions.


Subject(s)
Drug Discovery/methods , Models, Molecular , Molecular Structure , Algorithms
15.
BMC Genomics ; 21(1): 679, 2020 Oct 01.
Article in English | MEDLINE | ID: mdl-32998685

ABSTRACT

BACKGROUND: Species of the genus Monascus are considered to be economically important and have been widely used in the production of yellow and red food colorants. In particular, three Monascus species, namely, M. pilosus, M. purpureus, and M. ruber, are used for food fermentation in the cuisine of East Asian countries such as China, Japan, and Korea. These species have also been utilized in the production of various kinds of natural pigments. However, there is a paucity of information on the genomes and secondary metabolites of these strains. Here, we report the genomic analysis and secondary metabolites produced by M. pilosus NBRC4520, M. purpureus NBRC4478 and M. ruber NBRC4483, which are NBRC standard strains. We believe that this report will lead to a better understanding of red yeast rice food. RESULTS: We examined the diversity of secondary metabolite production in three Monascus species (M. pilosus, M. purpureus, and M. ruber) at both the metabolome level by LCMS analysis and at the genome level. Specifically, M. pilosus NBRC4520, M. purpureus NBRC4478 and M. ruber NBRC4483 strains were used in this study. Illumina MiSeq 300 bp paired-end sequencing generated 17 million high-quality short reads in each species, corresponding to 200 times the genome size. We measured the pigments and their related metabolites using LCMS analysis. The colors in the liquid media corresponding to the pigments and their related metabolites produced by the three species were very different from each other. The gene clusters for secondary metabolite biosynthesis of the three Monascus species also diverged, confirming that M. pilosus and M. purpureus are chemotaxonomically different. M. ruber has similar biosynthetic and secondary metabolite gene clusters to M. pilosus. The comparison of secondary metabolites produced also revealed divergence in the three species. 
CONCLUSIONS: Our findings are important for improving the utilization of Monascus species in the food industry and industrial field. However, in view of food safety, we need to determine if the toxins produced by some Monascus strains exist in the genome or in the metabolome.


Subject(s)
Genes, Plant , Genetic Speciation , Monascus/genetics , Pigments, Biological/genetics , Secondary Metabolism , Monascus/classification , Monascus/metabolism , Multigene Family , Phylogeny , Pigments, Biological/biosynthesis
16.
BMC Med Genomics ; 13(Suppl 3): 10, 2020 02 24.
Article in English | MEDLINE | ID: mdl-32093721

ABSTRACT

BACKGROUND: Multidimensional data mining from an integrated environment of different data sources is frequently performed in computational systems biology. The molecular mechanism from the analysis of a complex network of gene-miRNA can aid the diagnosis and treatment of associated diseases. METHODS: In this work, we mainly focus on finding inflammatory bowel disease (IBD) associated microRNAs (miRNAs) by biclustering the miRNA-target interactions aided by known IBD risk genes and their associated miRNAs collected from several sources. We rank different miRNAs by attributing to the dataset size and connectivity of IBD associated genes in the miRNA regulatory modules from biclusters. We search the association of some top-ranking miRNAs to IBD related diseases. We also search the network of discovered miRNAs to different diseases and evaluate the similarity of those diseases to IBD. RESULTS: According to different literature, our results show the significance of top-ranking miRNAs to IBD or related diseases. The ratio analysis supports our ranking method where the top 20 miRNAs have approximately tenfold attachment to IBD genes. From disease-associated miRNA network analysis we found that 71% of different diseases attached to those miRNAs show more than 0.75 similarity scores to IBD. CONCLUSION: We successfully identify some miRNAs related to IBD where the scoring formula and disease-associated network analysis show the significance of our method. This method can be a promising approach for isolating miRNAs for similar types of diseases.


Subject(s)
Inflammatory Bowel Diseases/genetics , MicroRNAs/analysis , Cluster Analysis , Computational Biology , Datasets as Topic , Gene Regulatory Networks , Humans
17.
Mol Inform ; 39(1-2): e1900095, 2020 01.
Article in English | MEDLINE | ID: mdl-31815371

ABSTRACT

Machine learning approaches are widely used to evaluate ligand activities of chemical compounds toward potential target proteins. Especially, exploration of highly selective ligands is important for the development of new drugs with higher safety. One difficulty in constructing a well-performing model predicting such a ligand activity is the absence of data on true negative ligand-protein interactions. In other words, in many cases we have access to plenty of information on ligands that bind to a specific protein, but less or almost no information showing that compounds don't bind to proteins of interest. In this paper, we suggested an approach to comprehensively explore candidates for ligands specifically targeting toward proteins without using information on the true negative interaction. The approach consists of 4 steps: 1) constructing a model that distinguishes ligands for the target proteins of interest from those targeting proteins that cause off-target effects, by using graph convolution neural network (GCNN); 2) extracting feature vectors after convolution/pooling processes and mapping their principal components in two dimensions; 3) specifying regions with higher density for two ligand groups through kernel density estimation; and 4) investigating the distribution of compounds for exploration on the density map using the same classifier and decomposer. If compounds for exploration are located in higher-density regions of ligand compounds, these compounds can be regarded as having relatively high binding affinity to the major target or off-target proteins compared with other compounds. We applied the approach to the exploration of ligands for β-site amyloid precursor protein [APP]-cleaving enzyme 1 (BACE1), a major target for Alzheimer Disease (AD), with less off-target effect toward cathepsin D.
We demonstrated that the density regions of BACE1 and cathepsin D ligands are well separated, and a group of natural compounds as a target for exploration of new drug candidates also has a significantly different distribution on the density map.


Subject(s)
Algorithms , Amyloid Precursor Protein Secretases/chemistry , Aspartic Acid Endopeptidases/chemistry , Cathepsin D/chemistry , Neural Networks, Computer , Amyloid Precursor Protein Secretases/antagonists & inhibitors , Aspartic Acid Endopeptidases/antagonists & inhibitors , Cathepsin D/pharmacology , Humans , Ligands
18.
BMC Bioinformatics ; 20(1): 380, 2019 Jul 09.
Article in English | MEDLINE | ID: mdl-31288752

ABSTRACT

BACKGROUND: Alkaloids, a class of organic compounds that contain nitrogen bases, are mainly synthesized as secondary metabolites in plants and fungi, and they have a wide range of bioactivities. Although there are thousands of compounds in this class, few of their biosynthesis pathways are fully identified. In this study, we constructed a model to predict their precursors based on a novel kind of neural network called the molecular graph convolutional neural network. Molecular similarity is a crucial metric in the analysis of qualitative structure-activity relationships. However, it is sometimes difficult for current fingerprint representations to emphasize specific features for the target problems efficiently. It is advantageous to allow the model to select the appropriate features according to data-driven decisions for extracting more useful information, which influences a classification or regression problem substantially. RESULTS: In this study, we applied a neural network architecture for undirected graph representation of molecules. By encoding a molecule as an abstract graph and applying "convolution" on the graph and training the weight of the neural network framework, the neural network can optimize feature selection for the training problem. By incorporating the effects from adjacent atoms recursively, graph convolutional neural networks can extract the features of latent atoms that represent chemical features of a molecule efficiently. In order to investigate alkaloid biosynthesis, we trained the network to distinguish the precursors of 566 alkaloids, which are almost all of the alkaloids whose biosynthesis pathways are known, and showed that the model could predict starting substances with an averaged accuracy of 97.5%. 
CONCLUSION: We have shown that our model can predict more accurately compared to the random forest and general neural network when the variables and fingerprints are not selected, while the performance is comparable when we carefully select 507 variables from 18000 dimensions of descriptors. The prediction of pathways contributes to understanding of alkaloid synthesis mechanisms and the application of graph based neural network models to similar problems in bioinformatics would therefore be beneficial. We applied our model to evaluate the precursors of biosynthesis of 12000 alkaloids found in various organisms and found a power-law-like distribution.


Subject(s)
Alkaloids/classification , Biosynthetic Pathways , Neural Networks, Computer , Algorithms , Alkaloids/chemistry , Metabolome , Models, Theoretical
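
This abstract describes graph convolution as recursively incorporating the effects of adjacent atoms into each atom's features. The sketch below illustrates only that neighbor-aggregation idea on an undirected molecular graph, assuming a plain unweighted sum; a real graph convolutional network also applies learned weights and nonlinearities, which are omitted, and the toy molecule and one-hot features are illustrative assumptions.

```python
def graph_conv_step(features, adjacency):
    """One unweighted aggregation step on an undirected molecular graph.

    Each atom's new feature vector is its own vector plus the sum of its
    neighbors' vectors (no learned weights, no nonlinearity).
    """
    updated = []
    for i, feat in enumerate(features):
        agg = list(feat)
        for j in adjacency[i]:
            agg = [a + b for a, b in zip(agg, features[j])]
        updated.append(agg)
    return updated


# Heavy-atom graph of ethanol (C-C-O) with one-hot features [is_carbon, is_oxygen].
features = [[1, 0], [1, 0], [0, 1]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}

one_hop = graph_conv_step(features, adjacency)   # information from direct neighbors
two_hop = graph_conv_step(one_hop, adjacency)    # information from atoms two bonds away
```

After the second step the first carbon's vector has a nonzero oxygen component even though it is not bonded to the oxygen, which is the recursive effect the abstract refers to.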
19.
Int J Comput Assist Radiol Surg ; 13(12): 1905-1913, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30159833

ABSTRACT

PURPOSE: Convolutional neural networks have become rapidly popular for image recognition and image analysis because of their powerful potential. In this paper, we developed a method for classifying subtypes of lung adenocarcinoma from pathological images using a neural network that can evaluate phenotypic features from a wider area to consider cellular distributions. METHODS: In order to recognize the types of tumors, we need not only to detail features of cells, but also to incorporate statistical distribution of the different types of cells. Variants of autoencoders as building blocks of pre-trained convolutional layers of neural networks are implemented. A sparse deep autoencoder which minimizes local information entropy on the encoding layer is then proposed and applied to images of size [Formula: see text]. We applied this model for feature extraction from pathological images of lung adenocarcinoma, which is comprised of three transcriptome subtypes previously defined by the Cancer Genome Atlas network. Since the tumor tissue is composed of heterogeneous cell populations, recognition of tumor transcriptome subtypes requires more information than the local pattern of cells. The parameters extracted using this approach will then be used in multiple reduction stages to perform classification on larger images. RESULTS: We were able to demonstrate that these networks successfully recognize morphological features of lung adenocarcinoma. We also performed classification and reconstruction experiments to compare the outputs of the variants. The results showed that a larger input image that covers a certain area of the tissue is required to recognize transcriptome subtypes. The sparse autoencoder network with [Formula: see text] input provides a 98.9% classification accuracy.
CONCLUSION: This study shows the potential of autoencoders as a feature extraction paradigm and paves the way for a whole slide image analysis tool to predict molecular subtypes of tumors from pathological features.


Subject(s)
Adenocarcinoma of Lung/classification , Image Processing, Computer-Assisted/methods , Neural Networks, Computer , Adenocarcinoma of Lung/diagnosis , Adenocarcinoma of Lung/genetics , Biopsy , Humans , Transcriptome
20.
BMC Bioinformatics ; 19(1): 264, 2018 07 13.
Article in English | MEDLINE | ID: mdl-30005591

ABSTRACT

BACKGROUND: There are different and complicated associations between genes and diseases. Finding the causal associations between genes and specific diseases is still challenging. In this work we present a method to predict novel associations of genes and pathways with inflammatory bowel disease (IBD) by integrating information of differential gene expression, protein-protein interaction and known disease genes related to IBD. RESULTS: We downloaded IBD gene expression data from NCBI's Gene Expression Omnibus, performed statistical analysis to determine differentially expressed genes, and collected known IBD genes from the DisGeNet database, which were used to construct an IBD-related PPI network with the HIPPIE database. We adapted our graph-based clustering algorithm DPClusO to cluster the disease PPI network. We evaluated the statistical significance of the identified clusters in the context of determining the richness of IBD genes using Fisher's exact test and predicted novel genes related to IBD. We showed that 93.8% of our predictions are correct in the context of other databases and published literature related to IBD. CONCLUSIONS: Finding disease-causing genes is necessary for developing drugs with synergistic effect targeting many genes simultaneously. Here we present an approach to identify novel disease genes and pathways and discuss our approach in the context of IBD. The approach can be generalized to find disease-associated genes for other diseases.


Subject(s)
Gene Regulatory Networks , Inflammatory Bowel Diseases/genetics , Algorithms , Area Under Curve , Databases, Genetic , Gene Ontology , Humans , Protein Interaction Maps/genetics , ROC Curve , Reproducibility of Results