1.
J Immunother Cancer ; 12(5)2024 05 15.
Article in English | MEDLINE | ID: mdl-38749538

ABSTRACT

BACKGROUND: Only a subset of patients with gastric cancer experience long-term benefits from immune checkpoint inhibitors (ICIs). Currently, there is a deficiency in precise predictive biomarkers for ICI efficacy. The aim of this study was to develop and validate a pathomics-driven ensemble model for predicting the response to ICIs in gastric cancer, using H&E-stained whole slide images (WSI). METHODS: This multicenter study retrospectively collected and analyzed H&E-stained WSIs and clinical data from 584 patients with gastric cancer. An ensemble model integrating four classifiers (least absolute shrinkage and selection operator, k-nearest neighbors, decision trees, and random forests) was developed and validated using pathomics features, with the objective of predicting the therapeutic efficacy of immune checkpoint inhibition. Model performance was evaluated using metrics including the area under the curve (AUC), sensitivity, and specificity. Additionally, SHAP (SHapley Additive exPlanations) analysis was used to explain the model's predicted values as the sum of the attribution values for each input feature. Pathogenomics analysis was employed to explain the molecular mechanisms underlying the model's predictions. RESULTS: Our pathomics-driven ensemble model effectively stratified the response to ICIs in the training cohort (AUC 0.985 (95% CI 0.971 to 0.999)), which was further validated in the internal validation cohort (AUC 0.921 (95% CI 0.839 to 0.999)), as well as in external validation cohort 1 (AUC 0.914 (95% CI 0.837 to 0.990)) and external validation cohort 2 (AUC 0.927 (95% CI 0.802 to 0.999)).
The univariate Cox regression analysis revealed that the prediction signature of the pathomics-driven ensemble model was a prognostic factor for progression-free survival in patients with gastric cancer who underwent immunotherapy (p<0.001, HR 0.35 (95% CI 0.24 to 0.50)), and remained an independent predictor after multivariable Cox regression adjusted for clinicopathological variables (including sex, age, carcinoembryonic antigen, carbohydrate antigen 19-9, therapy regimen, line of therapy, differentiation, location and programmed death ligand 1 (PD-L1) expression) in all patients (p<0.001, HR 0.34 (95% CI 0.24 to 0.50)). Pathogenomics analysis suggested that the ensemble model is driven by molecular-level immune-, cancer-, and metabolism-related pathways, and was correlated with immune-related characteristics, including immune score, Estimation of STromal and Immune cells in MAlignant Tumor tissues using Expression data (ESTIMATE) score, and tumor purity. CONCLUSIONS: Our pathomics-driven ensemble model exhibited high accuracy and robustness in predicting the response to ICIs using WSIs. Therefore, it could serve as a novel and valuable tool to facilitate precision immunotherapy.


Subject(s)
Immunotherapy , Stomach Neoplasms , Humans , Stomach Neoplasms/drug therapy , Stomach Neoplasms/immunology , Stomach Neoplasms/pathology , Stomach Neoplasms/therapy , Male , Female , Immunotherapy/methods , Retrospective Studies , Middle Aged , Immune Checkpoint Inhibitors/therapeutic use , Immune Checkpoint Inhibitors/pharmacology , Aged
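
Record 1 reports performance as the AUC of an ensemble of four classifiers. The abstract does not say how the four outputs are combined, so the sketch below assumes simple probability averaging (soft voting) and computes the AUC as the fraction of responder/non-responder pairs that are ranked correctly. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def average_ensemble(prob_lists: Sequence[Sequence[float]]) -> List[float]:
    """Average per-sample probabilities from several classifiers (soft voting).

    Assumption: the study's actual combination rule is not given in the abstract.
    """
    n_models = len(prob_lists)
    return [sum(column) / n_models for column in zip(*prob_lists)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # correctly ordered pair
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data, made up for illustration: 1 = responder, 0 = non-responder.
    labels = [1, 0, 1, 0]
    model_probs = [
        [0.9, 0.7, 0.6, 0.2],   # hypothetical classifier A
        [0.9, 0.9, 0.8, 0.0],   # hypothetical classifier B
    ]
    ensemble = average_ensemble(model_probs)
    print("ensemble AUC:", roc_auc(labels, ensemble))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would be preferable for large cohorts.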
2.
NPJ Precis Oncol ; 7(1): 117, 2023 Nov 06.
Article in English | MEDLINE | ID: mdl-37932419

ABSTRACT

The response rate of cancer immune checkpoint inhibitors (ICI) varies among patients, making it challenging to pre-determine whether a particular patient will respond to immunotherapy. While gene mutation is critical to the treatment outcome, a framework capable of explicitly incorporating biological knowledge has yet to be established. Here we aim to propose and validate a mutation-based deep learning model for survival analysis on 1571 patients treated with ICI. Our model achieves an average concordance index of 0.59 ± 0.13 across nine types of cancer, compared to the gold standard Cox-PH model (0.52 ± 0.10). The "black box" nature of deep learning is a major concern in the healthcare field. This model's interpretability, which results from incorporating gene pathways and protein interactions (i.e., biology-aware) rather than relying on a "black box" approach, helps patient stratification and provides insight into novel gene biomarkers, advancing our understanding of ICI treatment.
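
The concordance index quoted in record 2 can be illustrated with a small, self-contained sketch. It assumes Harrell's definition for right-censored data (a pair is comparable when the patient with the shorter follow-up had an observed event); the abstract does not state which variant the authors used, and the function name and toy data are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risks: Sequence[float],
) -> float:
    """Harrell's C-index: share of comparable pairs ordered correctly by risk.

    times  -- follow-up time per patient
    events -- 1 if the event was observed, 0 if censored
    risks  -- predicted risk score (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, made up for illustration.
    times = [2, 4, 6, 8]
    events = [1, 1, 0, 1]
    risks = [0.9, 0.3, 0.5, 0.1]
    print(concordance_index(times, events, risks))  # 4 of 5 comparable pairs -> 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.59 versus 0.52 in context.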

3.
J Biomed Inform ; 146: 104496, 2023 10.
Article in English | MEDLINE | ID: mdl-37704104

ABSTRACT

Automatic radiology report generation has the potential to alert inexperienced radiologists to misdiagnoses or missed diagnoses and improve healthcare delivery efficiency by reducing the documentation workload of radiologists. Motivated by the continuous development of automatic image captioning, more and more deep learning methods have been proposed for automatic radiology report generation. However, visual and textual data bias still poses many challenges in the medical domain. Additionally, existing methods do not integrate medical knowledge and ignore the mutual influences between medical findings, and the abundance of unlabeled medical images affects the accuracy of the generated reports. In this paper, we propose a Medical Knowledge with Contrastive Learning model (MKCL) to enhance radiology report generation. The proposed MKCL model uses the IU Medical Knowledge Graph (IU-MKG) to mine the relationships among medical findings and improve the accuracy of identifying positive disease findings from radiologic medical images. In particular, we design Knowledge Enhanced Attention (KEA), which integrates the IU-MKG and the extracted chest radiological visual features to alleviate textual data bias. Meanwhile, this paper leverages supervised contrastive learning to handle radiographic medical images that have not been labeled and to identify abnormalities from images. Experimental results on the public dataset IU X-ray show that our proposed MKCL model outperforms other state-of-the-art report generation methods. Ablation studies also demonstrate that the IU medical knowledge graph module and the supervised contrastive learning module enhance the ability of the model to detect abnormal regions and accurately describe abnormal findings. The source code is available at: https://github.com/Eleanorhxd/MKCL.


Subject(s)
Radiology , Humans , Documentation , Knowledge , Radiography , Radiologists , Learning
4.
Nat Commun ; 14(1): 5135, 2023 08 23.
Article in English | MEDLINE | ID: mdl-37612313

ABSTRACT

Substantial progress has been made in using deep learning for cancer detection and diagnosis in medical images. Yet, there is limited success on prediction of treatment response and outcomes, which has important implications for personalized treatment strategies. A significant hurdle for clinical translation of current data-driven deep learning models is lack of interpretability, often attributable to a disconnect from the underlying pathobiology. Here, we present a biology-guided deep learning approach that enables simultaneous prediction of the tumor immune and stromal microenvironment status as well as treatment outcomes from medical images. We validate the model for predicting prognosis of gastric cancer and the benefit from adjuvant chemotherapy in a multi-center international study. Further, the model predicts response to immune checkpoint inhibitors and complements clinically approved biomarkers. Importantly, our model identifies a subset of mismatch repair-deficient tumors that are non-responsive to immunotherapy and may inform the selection of patients for combination treatments.


Subject(s)
Brain Neoplasms , Deep Learning , Humans , Immunotherapy , Chemotherapy, Adjuvant , Biology , Tumor Microenvironment
5.
ISA Trans ; 142: 445-453, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37558515

ABSTRACT

In recent years, pumps have become critical components in agriculture, industry, and the military, necessitating extensive development and implementation of fault diagnosis methods. In the majority of existing fault classification models, performance remains strongly dependent on the amount of training data, yet real-world samples are difficult to obtain. Combining domain migration theory with a sample expansion method, this paper introduces a few-shot learning fault diagnosis method. Employing the t-SNE visualization algorithm, we examine the validity of the self-calibration attention mechanism (SCAM) and distribution edge prediction strategy (DEPS). The results demonstrate that the proposed algorithm can effectively map the expanded sample space within a separate interval, thereby avoiding the problem of feature aliasing caused by the overlap of sample features among similar categories and significantly enhancing the quality and quantity of training samples. The experimental analysis indicates that the proposed methodology can effectively increase the accuracy of few-shot tasks, especially in the 9-way 15-shot task, where it maintains an accuracy of 72%, which leads the mean accuracy calculated from the other methods by about 30%. It is believed that much of the work is applicable to other few-shot diagnosis cases.

6.
Cell Rep Med ; 4(8): 101146, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37557177

ABSTRACT

The tumor microenvironment (TME) plays a critical role in disease progression and is a key determinant of therapeutic response in cancer patients. Here, we propose a noninvasive approach to predict the TME status from radiological images by combining radiomics and deep learning analyses. Using multi-institution cohorts of 2,686 patients with gastric cancer, we show that the radiological model accurately predicted the TME status and is an independent prognostic factor beyond clinicopathologic variables. The model further predicts the benefit from adjuvant chemotherapy for patients with localized disease. In patients treated with checkpoint blockade immunotherapy, the model predicts clinical response and further improves predictive accuracy when combined with existing biomarkers. Our approach enables noninvasive assessment of the TME, which opens the door for longitudinal monitoring and tracking response to cancer therapy. Given the routine use of radiologic imaging in oncology, our approach can be extended to many other solid tumor types.


Subject(s)
Deep Learning , Stomach Neoplasms , Humans , Stomach Neoplasms/diagnostic imaging , Stomach Neoplasms/therapy , Tumor Microenvironment , Immunotherapy , Chemotherapy, Adjuvant
7.
IEEE Trans Pattern Anal Mach Intell ; 45(5): 6289-6306, 2023 May.
Article in English | MEDLINE | ID: mdl-36178991

ABSTRACT

Semantic segmentation is an important step in understanding the scene for many practical applications such as autonomous driving. Although Deep Convolutional Neural Networks-based methods have significantly improved segmentation accuracy, small/thin objects remain challenging to segment due to convolutional and pooling operations that result in information loss, especially for small objects. This article presents a novel attention-based method called Across Feature Map Attention (AFMA) to address this challenge. It quantifies the inner-relationship between small and large objects belonging to the same category by utilizing the different feature levels of the original image. The AFMA could compensate for the loss of high-level feature information of small objects and improve the small/thin object segmentation. Our method can be used as an efficient plug-in for a wide range of existing architectures and produces much more interpretable feature representation than former studies. Extensive experiments on eight widely used segmentation methods and other existing small-object segmentation models on CamVid and Cityscapes demonstrate that our method substantially and consistently improves the segmentation of small/thin objects.

8.
Nat Commun ; 13(1): 7142, 2022 11 21.
Article in English | MEDLINE | ID: mdl-36414658

ABSTRACT

Single cell RNA sequencing is a promising technique to determine the states of individual cells and classify novel cell subtypes. In current sequence data analysis, however, genes with low expressions are omitted, which leads to inaccurate gene counts and hinders downstream analysis. Recovering these omitted expression values presents a challenge because of the large size of the data. Here, we introduce a data-driven gene expression recovery framework, referred to as self-consistent expression recovery machine (SERM), to impute the missing expressions. Using a neural network, the technique first learns the underlying data distribution from a subset of the noisy data. It then recovers the overall expression data by imposing a self-consistency on the expression matrix, thus ensuring that the expression levels are similarly distributed in different parts of the matrix. We show that SERM improves the accuracy of gene imputation with orders of magnitude enhancement in computational efficiency in comparison to the state-of-the-art imputation techniques.


Subject(s)
Selective Estrogen Receptor Modulators , Gene Expression
9.
ACS Appl Mater Interfaces ; 14(22): 26057-26067, 2022 Jun 08.
Article in English | MEDLINE | ID: mdl-35608638

ABSTRACT

Porous materials with super-wetting surfaces (superhydrophilic/underwater superoleophobic) are ideal for oil/water separation. However, the inability to monitor the pollution degree and self-cleaning during the separation process limits their application in industrial production. In this study, a porous metal-based hydrogel is proposed, inspired by the porous structure of wood. Porous copper foam with nano-Cu(OH)2 is used as the skeleton, and its surface is coated with a polyvinyl alcohol, tannic acid, and multiwalled carbon nanotube cross-linked hydrogel coating. The hydrogel has superhydrophilicity and excellent oil/water separation efficiency (>99%) and can adapt to various environments. This approach can also realize hydrogel pollution degree self-detection according to the change in the electrical signal generated during the oil/water separation process, and the hydrogel can also be recovered by soaking to realize self-cleaning. This study will provide new insights into the application of oil/water separation materials in practical industrial manufacturing.

10.
IEEE/ACM Trans Comput Biol Bioinform ; 19(3): 1294-1301, 2022.
Article in English | MEDLINE | ID: mdl-32750871

ABSTRACT

The amount of biomedical literature is growing at an explosive speed, and much useful knowledge in it remains undiscovered. Classical information retrieval techniques allow access to explicit information in a given collection, but are not able to recognize implicit connections. Literature-based discovery (LBD) is characterized by uncovering hidden associations in non-interacting literature. It could significantly support scientific research by identifying new connections between biomedical entities. However, most of the existing approaches to LBD are not scalable and may not be sufficient to detect complex associations in non-directly-connected literature. In this article, we present a model which incorporates a biomedical knowledge graph, graph embedding, and deep learning methods for literature-based discovery. First, the relations between biomedical entities are extracted from biomedical abstracts and a knowledge graph is constructed from these relations. Second, graph embedding techniques are applied to convert the entities and relations in the knowledge graph into a low-dimensional vector space. Third, a bidirectional Long Short-Term Memory (BLSTM) network is trained based on the entity associations represented by the pre-trained graph embeddings. Finally, the learned model is used for open and closed literature-based discovery tasks. The experimental results show that our method could not only effectively discover hidden associations between entities, but also reveal the corresponding mechanism of interactions. This suggests that combining knowledge graphs and deep learning methods is an effective way of capturing the underlying complex associations between entities hidden in the literature.


Subject(s)
Neural Networks, Computer , Publications , Knowledge , Knowledge Bases , Research Design
11.
ISA Trans ; 125: 665-680, 2022 Jun.
Article in English | MEDLINE | ID: mdl-34176603

ABSTRACT

As a typical frequency-domain analysis method, quaternion discrete Fourier transform (QDFT) has been widely used in information hiding in color images. However, due to the sensitivity of QDFT to geometric attacks, existing QDFT-based information hiding schemes have limited ability to resist geometric attacks. In this study, a novel geometrically resilient polar QDFT (PQDFT) is constructed and its properties are analyzed. Subsequently, a PQDFT-based color image zero-hiding scheme robust to geometric attacks is proposed for lossless copyright protection of color images. Experiments show that it offers reasonable resistance against geometric and common attacks, indicating better robustness compared with the existing QDFT-based information hiding schemes and other leading-edge zero-hiding schemes.

12.
Clin Diabetes ; 39(3): 284-292, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34421204

ABSTRACT

This retrospective cohort study evaluated diabetes device utilization and the effectiveness of these devices for newly diagnosed type 1 diabetes. Investigators examined the use of continuous glucose monitoring (CGM) systems, self-monitoring of blood glucose (SMBG), continuous subcutaneous insulin infusion (CSII), and multiple daily injection (MDI) insulin regimens and their effects on A1C. The researchers identified 6,250 patients with type 1 diabetes, of whom 32% used CGM and 37.1% used CSII. A higher adoption rate of either CGM or CSII in newly diagnosed type 1 diabetes was noted among White patients and those with private health insurance. CGM users had lower A1C levels than nonusers (P = 0.039), whereas no difference was noted between CSII users and nonusers (P = 0.057). Furthermore, CGM use combined with CSII yielded lower A1C than MDI regimens plus SMBG (P <0.001).

13.
J Biomed Inform ; 119: 103802, 2021 07.
Article in English | MEDLINE | ID: mdl-33965640

ABSTRACT

BACKGROUND: Unlike well-established diseases that base clinical care on randomized trials, past experiences, and training, prognosis in COVID-19 relies on a weaker foundation. Knowledge from other respiratory failure diseases may inform clinical decisions in this novel disease. The objective was to predict invasive mechanical ventilation (IMV) within 48 h in patients hospitalized with COVID-19 using COVID-like diseases (CLD). METHODS: This retrospective multicenter study trained machine learning (ML) models on patients hospitalized with CLD to predict IMV within 48 h in COVID-19 patients. CLD patients were identified using diagnosis codes for bacterial pneumonia, viral pneumonia, influenza, unspecified pneumonia and acute respiratory distress syndrome (ARDS), 2008-2019. A total of 16 cohorts were constructed, including any combinations of the four diseases plus an exploratory ARDS cohort, to determine the most appropriate cohort to use. Candidate predictors included demographic and clinical parameters that were previously associated with poor COVID-19 outcomes. Model development included the implementation of logistic regression and three tree-based algorithms: decision tree, AdaBoost, and XGBoost. ML models were trained on CLD patients at Stanford Hospital Alliance (SHA) and validated on hospitalized COVID-19 patients at two healthcare systems, SHA and Intermountain Healthcare, March 2020-July 2020. RESULTS: CLD training data were obtained from SHA (n = 14,030), and validation data included 444 adult COVID-19 hospitalized patients from SHA (n = 185) and Intermountain (n = 259). XGBoost was the top-performing ML model, and among the 16 CLD training cohorts, the best model achieved an area under curve (AUC) of 0.883 in the validation set.
In COVID-19 patients, the prediction models exhibited moderate discrimination performance, with the best models achieving an AUC of 0.77 at SHA and 0.65 at Intermountain. The model trained on all pneumonia and influenza cohorts had the best overall performance (SHA: positive predictive value (PPV) 0.29, negative predictive value (NPV) 0.97, positive likelihood ratio (PLR) 10.7; Intermountain: PPV 0.23, NPV 0.97, PLR 10.3). We identified important factors associated with IMV that are not traditionally considered for respiratory diseases. CONCLUSIONS: Prediction models derived from CLD for 48-hour IMV in patients hospitalized with COVID-19 demonstrate high specificity and can be used as a triage tool at point of care. Novel predictors of IMV identified in COVID-19 are often overlooked in clinical practice. Lessons learned from our approach may assist other research institutes seeking to build artificial intelligence technologies for novel or rare diseases with limited data for training and validation.


Subject(s)
COVID-19 , Respiratory Insufficiency , Adult , Artificial Intelligence , Hospitalization , Humans , Respiratory Insufficiency/diagnosis , Respiratory Insufficiency/therapy , Retrospective Studies , SARS-CoV-2 , Triage , Ventilators, Mechanical
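
Record 13 reports a low PPV alongside a high NPV and a positive likelihood ratio above 10. The sketch below shows how these quantities follow from a confusion matrix and why they can coexist when the outcome is rare. The counts are invented for illustration and are not the study's data; `Confusion` and `triage_metrics` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Confusion:
    tp: int  # predicted IMV, IMV occurred
    fp: int  # predicted IMV, no IMV
    fn: int  # predicted no IMV, IMV occurred
    tn: int  # predicted no IMV, no IMV


def triage_metrics(c: Confusion) -> Dict[str, float]:
    """Standard diagnostic-accuracy metrics from confusion-matrix counts."""
    if min(c.tp + c.fn, c.tn + c.fp, c.tp + c.fp, c.tn + c.fn) == 0 or c.fp == 0:
        raise ValueError("a required denominator is zero")
    sensitivity = c.tp / (c.tp + c.fn)
    false_positive_rate = c.fp / (c.fp + c.tn)
    return {
        "sensitivity": sensitivity,
        "specificity": 1.0 - false_positive_rate,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        # Positive likelihood ratio = sensitivity / (1 - specificity)
        "plr": sensitivity / false_positive_rate,
    }


if __name__ == "__main__":
    # Invented counts: 10 of 200 patients need IMV (5% prevalence).
    metrics = triage_metrics(Confusion(tp=8, fp=20, fn=2, tn=170))
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With a rare outcome, even a specific model produces more false than true positives, so PPV stays low while NPV and the likelihood ratio remain high; this is one plausible reading of the reported pattern, not a reconstruction of the study's analysis.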
14.
J Med Internet Res ; 23(2): e23026, 2021 02 22.
Article in English | MEDLINE | ID: mdl-33534724

ABSTRACT

BACKGROUND: For the clinical care of patients with well-established diseases, randomized trials, literature, and research are supplemented with clinical judgment to understand disease prognosis and inform treatment choices. In the void created by a lack of clinical experience with COVID-19, artificial intelligence (AI) may be an important tool to bolster clinical judgment and decision making. However, a lack of clinical data restricts the design and development of such AI tools, particularly in preparation for an impending crisis or pandemic. OBJECTIVE: This study aimed to develop and test the feasibility of a "patients-like-me" framework to predict the deterioration of patients with COVID-19 using a retrospective cohort of patients with similar respiratory diseases. METHODS: Our framework used COVID-19-like cohorts to design and train AI models that were then validated on the COVID-19 population. The COVID-19-like cohorts included patients diagnosed with bacterial pneumonia, viral pneumonia, unspecified pneumonia, influenza, and acute respiratory distress syndrome (ARDS) at an academic medical center from 2008 to 2019. In total, 15 training cohorts were created using different combinations of the COVID-19-like cohorts with the ARDS cohort for exploratory purposes. In this study, two machine learning models were developed: one to predict invasive mechanical ventilation (IMV) within 48 hours for each hospitalized day, and one to predict all-cause mortality at the time of admission. Model performance was assessed using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value, and negative predictive value. We established model interpretability by calculating SHapley Additive exPlanations (SHAP) scores to identify important features. 
RESULTS: Compared to the COVID-19-like cohorts (n=16,509), the patients hospitalized with COVID-19 (n=159) were significantly younger, with a higher proportion of patients of Hispanic ethnicity, a lower proportion of patients with smoking history, and fewer patients with comorbidities (P<.001). Patients with COVID-19 had a lower IMV rate (15.1 versus 23.2, P=.02) and shorter time to IMV (2.9 versus 4.1 days, P<.001) compared to the COVID-19-like patients. In the COVID-19-like training data, the top models achieved excellent performance (AUROC>0.90). Validating in the COVID-19 cohort, the top-performing model for predicting IMV was the XGBoost model (AUROC=0.826) trained on the viral pneumonia cohort. Similarly, the XGBoost model trained on all 4 COVID-19-like cohorts without ARDS achieved the best performance (AUROC=0.928) in predicting mortality. Important predictors included demographic information (age), vital signs (oxygen saturation), and laboratory values (white blood cell count, cardiac troponin, albumin, etc). Our models had class imbalance, which resulted in high negative predictive values and low positive predictive values. CONCLUSIONS: We provided a feasible framework for modeling patient deterioration using existing data and AI technology to address data limitations during the onset of a novel, rapidly changing pandemic.


Subject(s)
COVID-19/diagnosis , COVID-19/mortality , Machine Learning , Pneumonia, Viral/diagnosis , Aged , Area Under Curve , Cohort Studies , Comorbidity , Female , Hospitalization/statistics & numerical data , Humans , Male , Middle Aged , Pandemics , Pneumonia, Viral/mortality , Predictive Value of Tests , Prognosis , ROC Curve , Respiration, Artificial/statistics & numerical data , Retrospective Studies , SARS-CoV-2 , Treatment Outcome
15.
Artif Intell Med ; 96: 107-115, 2019 05.
Article in English | MEDLINE | ID: mdl-31164203

ABSTRACT

Cellular processes are typically carried out by protein complexes rather than individual proteins. Identifying protein complexes is one of the keys to understanding principles of cellular organization and function. Also, protein complexes are a group of interacting genes underlying similar diseases, which points to the therapeutic importance of protein complexes. With the development of life science and computing science, an increasing amount of protein-protein interaction (PPI) data has become available, which makes it possible to predict protein complexes from PPI networks. However, PPI data produced by high-throughput experiments often contain many false positive interactions and false negative edge losses, which makes it difficult to predict complexes accurately. In this paper, we present a new method, named MEMO (Multiple network Embedding for coMplex detectiOn), to detect protein complexes. MEMO integrates multiple PPI datasets from different species into a single PPI network by using functional orthology information across multiple species and then uses a graph embedding technique to embed the protein nodes of the network into continuous vector spaces, so as to quantify the relationships between nodes and better guide the protein complex detection process. Finally, it utilizes a seed-and-extend strategy to identify protein complexes from multiple PPI networks based on the similarities of their corresponding protein representations. As part of our approach, we also define a new quality measure which combines cluster cohesiveness and cluster density to measure the likelihood of a detected protein complex being a real protein complex. Extensive experimental results demonstrate that the proposed method outperforms state-of-the-art complex detection techniques.


Subject(s)
Algorithms , Protein Interaction Mapping/methods , Proteins/chemistry , Humans
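
The seed-and-extend strategy mentioned in record 15 can be sketched as a greedy growth of a cluster from a seed protein, admitting a neighbor when its embedding is similar enough to the current members. The abstract does not specify the similarity function, the admission rule, or the quality measure, so cosine similarity, the mean-similarity rule, and the `threshold` parameter below are all assumptions, and the graph and embeddings are toy data.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def seed_and_extend(
    seed: str,
    adjacency: Dict[str, List[str]],
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed admission cut-off, not from the paper
) -> Set[str]:
    """Grow a cluster from `seed`, adding neighbors similar to current members."""
    cluster = {seed}
    changed = True
    while changed:
        changed = False
        # Candidates: nodes adjacent to the cluster but not yet in it.
        candidates = {n for m in cluster for n in adjacency.get(m, [])} - cluster
        for node in sorted(candidates):
            mean_sim = sum(
                cosine(embeddings[node], embeddings[m]) for m in cluster
            ) / len(cluster)
            if mean_sim >= threshold:
                cluster.add(node)
                changed = True
    return cluster


if __name__ == "__main__":
    # Toy PPI graph and 2-D embeddings, made up for illustration.
    adjacency = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"]}
    embeddings = {"A": (1.0, 0.0), "B": (0.9, 0.1), "C": (0.0, 1.0), "D": (1.0, 0.05)}
    print(sorted(seed_and_extend("A", adjacency, embeddings)))
```

In this toy graph, C is adjacent to the seed but points in a different embedding direction, so it is left out, while D is reached through B once B has joined the cluster.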
16.
BMC Med Inform Decis Mak ; 19(Suppl 2): 59, 2019 04 09.
Article in English | MEDLINE | ID: mdl-30961599

ABSTRACT

BACKGROUND: Drug development is an expensive and time-consuming process. Literature-based discovery has played a critical role in drug development and may be a supplementary method to help scientists speed up the discovery of drugs. METHODS: Here, we propose a relation path features embedding based convolutional neural network model with attention mechanism for drug discovery from literature, which we denote as PACNN. First, we use predications from biomedical abstracts to construct a biomedical knowledge graph, and then apply a path ranking algorithm to extract drug-disease relation path features on the biomedical knowledge graph. After that, we use these drug-disease relation features to train a convolutional neural network model combined with an attention mechanism. Finally, we employ the trained models to mine drugs for treating diseases. RESULTS: The experiment shows that the proposed model achieved promising results compared with several random walk algorithms. CONCLUSIONS: In this paper, we propose a relation path features embedding based convolutional neural network with attention mechanism for discovering potential drugs from literature. Our method could be an auxiliary method for drug discovery, which can speed up the discovery of new drugs for incurable diseases.


Subject(s)
Drug Discovery , Knowledge Bases , Neural Networks, Computer , Algorithms , Humans , Research Design
17.
BMC Bioinformatics ; 19(1): 332, 2018 Sep 21.
Article in English | MEDLINE | ID: mdl-30241459

ABSTRACT

BACKGROUND: Protein complexes are one of the keys to deciphering the behavior of a cell system. During the past decade, most computational approaches used to identify protein complexes have been based on discovering densely connected subgraphs in protein-protein interaction (PPI) networks. However, many true complexes are not dense subgraphs, and these approaches show limited performance in detecting protein complexes from PPI networks. RESULTS: To solve these problems, in this paper we propose a supervised learning method based on network node embeddings which utilizes the informative properties of known complexes to guide the search process for new protein complexes. First, node embeddings are obtained from the human protein interaction network. Then the protein interactions are weighted through the similarities between node embeddings. After that, the supervised learning method is used to detect protein complexes. Finally, a random forest model is used to filter the candidate complexes in order to obtain the final predicted complexes. Experimental results on real human and yeast protein interaction networks show that our method effectively improves the performance of protein complex detection. CONCLUSIONS: We provided a new method for identifying protein complexes from human and yeast protein interaction networks, which has great potential to benefit the field of protein complex detection.


Subject(s)
Algorithms , Computational Biology/methods , Protein Interaction Mapping/methods , Protein Interaction Maps , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Humans
18.
BMC Bioinformatics ; 19(1): 193, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29843590

ABSTRACT

BACKGROUND: Drug discovery is the process through which potential new medicines are identified. High-throughput screening and computer-aided drug discovery/design are currently the two main drug discovery methods, and they have successfully discovered a series of drugs. However, development of new drugs is still an extremely time-consuming and expensive process. Biomedical literature contains important clues for the identification of potential treatments. It could support experts in biomedicine on their way towards new discoveries. METHODS: Here, we propose a biomedical knowledge graph-based drug discovery method called SemaTyP, which discovers candidate drugs for diseases by mining published biomedical literature. We first construct a biomedical knowledge graph with the relations extracted from biomedical abstracts. Then a logistic regression model is trained by learning the semantic types of the paths of known drug therapies existing in the biomedical knowledge graph. Finally, the learned model is used to discover drug therapies for new diseases. RESULTS: The experimental results show that our method could not only effectively discover new drug therapies for new diseases, but also provide the potential mechanism of action of the candidate drugs. CONCLUSIONS: In this paper we propose a novel knowledge graph based literature mining method for drug discovery. It could be a supplementary method for current drug discovery methods.


Subject(s)
Data Mining/methods , Drug Discovery/methods , Drug Therapy , Humans , Knowledge Bases , Logistic Models , Publications
19.
BMC Bioinformatics ; 17 Suppl 7: 229, 2016 Jul 25.
Article in English | MEDLINE | ID: mdl-27454775

ABSTRACT

BACKGROUND: Accurate determination of protein complexes has become a key task of systems biology for revealing cellular organization and function. Up to now, protein complex prediction methods have mostly focused on static protein-protein interaction (PPI) networks. However, cellular systems are highly dynamic and responsive to cues from the environment. The shift from static PPI networks to dynamic PPI networks is essential to accurately predict protein complexes. RESULTS: Gene expression data, which contain crucial dynamic information about proteins and PPIs, are valuable for protein complex prediction when combined with high-throughput experimental PPI data. Firstly, we exploit gene expression data to calculate the active time point and the active probability of each protein and PPI. The dynamic active information is integrated into high-throughput PPI data to construct dynamic PPI networks. Secondly, a novel method for predicting protein complexes from the dynamic PPI networks is proposed based on the core-attachment structural feature. Our method can effectively exploit not only the dynamic active information but also the topology structure information based on the dynamic PPI networks. CONCLUSIONS: We construct four dynamic PPI networks, and accurately predict many well-characterized protein complexes. The experimental results show that (i) the dynamic active information significantly improves the performance of protein complex prediction; (ii) our method can effectively make good use of both the dynamic active information and the topology structure information of dynamic PPI networks to achieve state-of-the-art protein complex prediction capabilities.


Subject(s)
Algorithms , Computational Biology/methods , Protein Interaction Maps , Proteins/metabolism , Protein Interaction Mapping/methods , Proteins/genetics , Transcriptome
20.
Biomed Res Int ; 2015: 698527, 2015.
Article in English | MEDLINE | ID: mdl-26380291

ABSTRACT

The amount of biomedical literature is growing at an explosive speed, and much useful knowledge in this literature remains undiscovered. Researchers can form biomedical hypotheses by mining these works. In this paper, we propose a supervised learning based approach to generate hypotheses from biomedical literature. This approach splits the traditional hypothesis generation process of the classic ABC model into an AB model and a BC model, each constructed with a supervised learning method. Compared with concept co-occurrence and grammar engineering-based approaches like SemRep, machine learning based models can usually achieve better performance in information extraction (IE) from texts. Then, by combining the two models, the approach reconstructs the ABC model and generates biomedical hypotheses from literature. The experimental results on the three classic Swanson hypotheses show that our approach outperforms the SemRep system.


Subject(s)
Data Mining , Medical Informatics Computing , Publications , Algorithms , Humans , Natural Language Processing
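
The ABC model referred to in record 20 infers a candidate A-C relation when A is linked to an intermediate B and B is linked to C, while A and C are not linked directly. In the paper the A-B and B-C links come from supervised classifiers; the sketch below replaces those classifiers with hand-written pair lists, which is a simplifying assumption, and the function name is hypothetical. The example terms follow Swanson's well-known fish oil and Raynaud's disease case, but the pair lists themselves are toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def open_discovery(
    a_term: str,
    ab_pairs: Iterable[Pair],
    bc_pairs: Iterable[Pair],
) -> Dict[str, Set[str]]:
    """Return candidate C terms for `a_term`, each with its supporting B terms.

    A C term qualifies when it is reachable through at least one B term
    and is not itself directly linked to A.
    """
    b_terms = {b for a, b in ab_pairs if a == a_term}
    hypotheses: Dict[str, Set[str]] = defaultdict(set)
    for b, c in bc_pairs:
        if b in b_terms and c != a_term and c not in b_terms:
            hypotheses[c].add(b)
    return dict(hypotheses)


if __name__ == "__main__":
    # Toy relation lists standing in for the outputs of the AB and BC models.
    ab_pairs = [
        ("fish oil", "blood viscosity"),
        ("fish oil", "platelet aggregation"),
    ]
    bc_pairs = [
        ("blood viscosity", "Raynaud disease"),
        ("platelet aggregation", "Raynaud disease"),
        ("insulin", "diabetes"),  # unrelated B term, should be ignored
    ]
    for c_term, via in open_discovery("fish oil", ab_pairs, bc_pairs).items():
        print(c_term, "via", sorted(via))
```

Keeping the supporting B terms alongside each candidate makes it possible to rank hypotheses by the number of independent intermediates, a common heuristic in literature-based discovery.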