1.
Article in English | MEDLINE | ID: mdl-38451769

ABSTRACT

Regulation by transcription factors (TFs) is required for the vast majority of biological processes in living organisms, and some diseases may be caused by improper transcriptional regulation. Identifying the target genes of TFs is thus critical for understanding cellular processes and analyzing the molecular mechanisms of disease. Computational approaches can be challenging to employ when attempting to predict potential interactions between TFs and target genes. In this paper, we present a novel graph model (PPRTGI) for detecting TF-target gene interactions using DNA sequence features. Feature representations of TFs and target genes are extracted from sequence embeddings and biological associations. Then, by combining the aggregated node features with the graph structure, PPRTGI uses a graph neural network with personalized PageRank to learn interaction patterns. Finally, a bilinear decoder is applied to predict interaction scores between TF and target gene nodes. We designed experiments on six datasets from different species. The experimental results show that PPRTGI is effective in regulatory interaction inference, achieving an area under the receiver operating characteristic curve of 93.87% and an area under the precision-recall curve of 88.79% on the human dataset. This paper proposes a new method for predicting TF-target gene interactions, which provides new insights into modeling molecular networks and can thus be used to gain a better understanding of complex biological systems.


Subjects
Computational Biology , Neural Networks, Computer , Transcription Factors , Computational Biology/methods , Humans , Transcription Factors/genetics , Transcription Factors/metabolism , Algorithms , Gene Regulatory Networks/genetics , Animals , Databases, Genetic , Sequence Analysis, DNA/methods
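
The propagation and decoding steps described in this abstract can be illustrated with a minimal sketch. It assumes an APPNP-style update, Z <- (1 - alpha) * A_hat * Z + alpha * H, over a symmetrically normalized adjacency matrix with self-loops, followed by a sigmoid over a bilinear form. The abstract does not give PPRTGI's exact formulation, so the function names, the update rule, and the parameters alpha and steps are illustrative assumptions, not the authors' implementation.

```python
import math


def normalize_adjacency(adj):
    """Add self-loops, then scale entry (i, j) by 1 / sqrt(deg_i * deg_j)."""
    n = len(adj)
    a = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain-Python matrix product of x (n x m) and y (m x d)."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def ppr_propagate(adj, features, alpha=0.1, steps=10):
    """Assumed personalized-PageRank propagation: mix neighbor information
    with the original node features (the 'teleport' term) at every step."""
    a_hat = normalize_adjacency(adj)
    z = [row[:] for row in features]
    for _ in range(steps):
        az = matmul(a_hat, z)
        z = [[(1 - alpha) * az[i][j] + alpha * features[i][j]
              for j in range(len(features[0]))]
             for i in range(len(features))]
    return z


def bilinear_score(z_tf, z_gene, w):
    """Sigmoid of z_tf^T W z_gene, read as an interaction probability."""
    s = sum(z_tf[i] * w[i][j] * z_gene[j]
            for i in range(len(z_tf)) for j in range(len(z_gene)))
    return 1.0 / (1.0 + math.exp(-s))
```

In this sketch a larger alpha keeps each node closer to its own input features, while a smaller alpha lets information from more distant neighbors accumulate; in a trained model the matrix w would be learned rather than supplied by hand.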
2.
Ecol Evol ; 14(2): e11032, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38357593

ABSTRACT

Plant phenotypic characteristics, especially leaf morphology, are an important indicator for species identification. However, leaf shape can be extraordinarily complex in some species, such as oaks. The great variation in leaf morphology and the difficulty of species identification in oaks have attracted the attention of scientists since Charles Darwin. Recent advances in discrimination technology have provided opportunities to understand leaf morphology variation in oaks. Here, we aimed to compare the accuracy and efficiency of species identification in two closely related deciduous oaks by the geometric morphometric method (GMM) and deep learning, using preliminary identification by simple sequence repeats (nSSRs) as a prior. A total of 538 Asian deciduous oak trees from 16 Q. aliena and 23 Q. dentata populations were first assigned by nSSR Bayesian clustering analysis to one of the two species or to admixture, and this grouping served as the a priori identification of these trees. Then we analyzed the shapes of 2328 leaves from the 538 trees in terms of 13 characters (landmarks) by GMM. Finally, we trained and classified 2221 scanned leaf images with the Xception architecture using deep learning. The two species can be identified by GMM and deep learning using genetic analysis as a prior. Deep learning is the most cost-efficient method in terms of time, while GMM can confirm the leaf shape of admixed individuals. These methods provide high classification accuracy, highlight their application in plant classification research, and are ready to be applied to other morphological analyses.

3.
IEEE J Biomed Health Inform ; 28(4): 1937-1948, 2024 Apr.
Article in English | MEDLINE | ID: mdl-37327093

ABSTRACT

Complexes of long non-coding RNAs (lncRNAs) bound to proteins can be involved in regulating life activities at various stages of an organism. However, given the growing number of lncRNAs and proteins, verifying lncRNA-protein interactions (LPIs) with traditional biological experiments is time-consuming and laborious. With the improvement of computing power, LPI prediction has met new development opportunities. Building on state-of-the-art work, this article proposes a framework called LncRNA-Protein Interactions based on Kernel Combinations and Graph Convolutional Networks (LPI-KCGCN). We first construct kernel matrices for both lncRNAs and proteins from sequence features, sequence similarity features, expression features, and gene ontology. We then reconstruct the existing kernel matrices as the input of the next step. Combined with known LPIs, the reconstructed similarity matrices, which can be used as features of the topology map of the LPI network, are exploited to extract potential representations in the lncRNA and protein spaces using a two-layer Graph Convolutional Network. The predicted matrix is finally obtained by training the network to produce scoring matrices with respect to lncRNAs and proteins. Different LPI-KCGCN variants are ensembled to derive the final prediction results, which are tested on balanced and unbalanced datasets. Five-fold cross-validation shows that the optimal combination of feature information achieves an AUC of 0.9714 and an AUPR of 0.9216 on a dataset with 15.5% positive samples. On another highly unbalanced dataset with only 5% positive samples, LPI-KCGCN also outperforms state-of-the-art methods, achieving an AUC of 0.9907 and an AUPR of 0.9267.


Subjects
Algorithms , RNA, Long Noncoding , Humans , RNA, Long Noncoding/genetics , Computational Biology/methods
4.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37930024

ABSTRACT

Development of robust and effective strategies for synthesizing new compounds, drug targeting and constructing GEnome-scale Metabolic models (GEMs) requires a deep understanding of the underlying biological processes. A critical step in achieving this goal is accurately identifying the categories of pathways in which a compound participates. However, current machine learning-based methods often overlook the multifaceted nature of compounds, resulting in inaccurate pathway predictions. Therefore, we present a novel framework for Multi-View Multi-Label Learning for Metabolic Pathway Inference, named MVML-MPI. First, MVML-MPI learns distinct compound representations in parallel with corresponding compound encoders to fully extract features. Subsequently, we propose an attention-based mechanism that offers a fusion module to complement these multi-view representations. As a result, MVML-MPI accurately represents and effectively captures the complex relationship between compounds and metabolic pathways, which distinguishes it from current machine learning-based methods. In experiments conducted on the Kyoto Encyclopedia of Genes and Genomes pathways dataset, MVML-MPI outperformed state-of-the-art methods, demonstrating its superiority and its potential for use in the field of metabolic pathway design, which can aid in optimizing drug-like compounds and facilitating the development of GEMs. The code and data underlying this article are freely available at https://github.com/guofei-tju/MVML-MPI. Contact: jtang@cse.sc.edu, guofei@csu.edu.com or wuxi_dyj@csj.uestc.edu.cn.


Subjects
Machine Learning , Metabolic Networks and Pathways
5.
Comput Biol Med ; 167: 107660, 2023 12.
Article in English | MEDLINE | ID: mdl-37944303

ABSTRACT

Protein-protein interaction plays an important role in studying the mechanism of protein functions from the structural perspective. Because traditional experimental methods are costly and time-consuming, molecular docking is a powerful approach for detecting protein-protein complexes using computational tools. Among existing technologies, the template-based method utilizes the structural information of known homologous 3D complexes as available and reliable templates to achieve high accuracy and low computational complexity. However, the performance of the template-based method depends on the quality and quantity of templates. When templates are insufficient or unavailable, the ab initio docking method is necessary and largely enriches the docking conformations. Therefore, a feasible strategy is to fuse the effectiveness of the template-based model with the universality of the ab initio model to improve docking performance. In this study, we construct a new, diverse, comprehensive template library derived from the PDB, containing 77,685 complexes. We propose a template-based method (named TemDock), which retrieves the evolutionary relationship between the target sequence and samples in the template library and transfers similar structural information. Then, the target structure is built by superposing on the homologous template complex with TM-align. Moreover, we develop a consensus-based method (named ComDock) to integrate our TemDock and an existing ab initio method (ZDOCK). On 105 targets with templates from Benchmark 5.0, TemDock and ComDock achieve success rates of 68.57% and 71.43% in the top 10 conformations, respectively. Compared with HDOCK, ComDock obtains better I-RMSD of hit configurations on 9 targets and more hit models in the top 100 conformations. As an efficient method for protein-protein docking, ComDock is expected to help study protein-protein recognition and reveal the various biological pathways that are critical for drug discovery. The final results are stored at https://github.com/guofei-tju/mqz_ComDock_docking.


Subjects
Algorithms , Software , Molecular Docking Simulation , Computational Biology/methods , Databases, Protein , Proteins/chemistry , Protein Binding
6.
J Transl Med ; 21(1): 783, 2023 11 04.
Article in English | MEDLINE | ID: mdl-37925448

ABSTRACT

Prior research has shown that the deconvolution of cell-free RNA can uncover the tissue origin. The conventional deconvolution approaches rely on constructing a reference tissue-specific gene panel, which cannot capture the inherent variation present in actual data. To address this, we have developed a novel method that utilizes a neural network framework to leverage the entire training dataset. Our approach involved training a model that incorporated 15 distinct tissue types. Through one semi-independent and two complete independent validations, including deconvolution using a semi in silico dataset, deconvolution with a custom normal tissue mixture RNA-seq data, and deconvolution of longitudinal circulating tumor cell RNA-seq (ctcRNA) data from a cancer patient with metastatic tumors, we demonstrate the efficacy and advantages of the deep-learning approach which were exerted by effectively capturing the inherent variability present in the dataset, thus leading to enhanced accuracy. Sensitivity analyses reveal that neural network models are less susceptible to the presence of missing data, making them more suitable for real-world applications. Moreover, by leveraging the concept of organotropism, we applied our approach to trace the migration of circulating tumor cell-derived RNA (ctcRNA) in a cancer patient with metastatic tumors, thereby highlighting the potential clinical significance of early detection of cancer metastasis.


Subjects
Neoplastic Cells, Circulating , RNA , Humans , Neural Networks, Computer , RNA-Seq , Sequence Analysis, RNA
7.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37321965

ABSTRACT

In recent years, protein structure problems have become a hotspot for understanding protein folding and function mechanisms. Most protein structure methods rely on and benefit from co-evolutionary information obtained by multiple sequence alignment (MSA). For example, AlphaFold2 (AF2) is a typical MSA-based protein structure tool that is famous for its high accuracy. As a consequence, these MSA-based methods are limited by the quality of the MSAs. Especially for orphan proteins that have no homologous sequences, AlphaFold2 performs unsatisfactorily as MSA depth decreases, which may pose a barrier to its widespread application in protein mutation and design problems, where rich homologous sequences are unavailable and rapid prediction is needed. In this paper, we constructed two standard datasets for orphan and de novo proteins that have insufficient or no homology information, called Orphan62 and Design204, respectively, to fairly evaluate the performance of the various methods in this case. Then, depending on whether scarce MSA information is used, we summarized two approaches, MSA-enhanced and MSA-free methods, to effectively solve the issue without sufficient MSAs. The MSA-enhanced model aims to improve poor MSA quality at the data source through knowledge distillation and generation models. The MSA-free model directly learns the relationship between residues from enormous numbers of protein sequences using pre-trained models, bypassing the step of extracting the residue pair representation from MSAs. Next, we evaluated the performance of four MSA-free methods (trRosettaX-Single, TRFold, ESMFold and ProtT5) and one MSA-enhanced method (Bagging MSA) against a traditional MSA-based method, AlphaFold2, in two protein structure-related prediction tasks. Comparison analyses show that trRosettaX-Single and ESMFold, which are MSA-free methods, can achieve fast prediction (~40 s) and performance comparable to AF2 in tertiary structure prediction, especially for short peptides, α-helical segments and targets with few homologous sequences. In secondary structure prediction, Bagging MSA, which uses MSA enhancement, improves the accuracy of our trained MSA-based base model when homology information is poor. Our study provides biologists with insight into how to select rapid and appropriate prediction tools for enzyme engineering and peptide drug development. CONTACT: guofei@csu.edu.cn, jj.tang@siat.ac.cn.


Subjects
Algorithms , Furylfuramide , Sequence Alignment , Proteins/chemistry , Amino Acid Sequence
8.
Natl Sci Rev ; 10(5): nwad073, 2023 May.
Article in English | MEDLINE | ID: mdl-37223244

ABSTRACT

Synthetic genome evolution provides a dynamic approach for systematically and straightforwardly exploring evolutionary processes. Synthetic Chromosome Rearrangement and Modification by LoxP-mediated Evolution (SCRaMbLE) is an evolutionary system intrinsic to the synthetic yeast genome that can rapidly drive structural variations. Here, we detect over 260 000 rearrangement events after the SCRaMbLEing of a yeast strain harboring 5.5 synthetic yeast chromosomes (synII, synIII, synV, circular synVI, synIXR and synX). Remarkably, we find that the rearrangement events exhibit a specific landscape of frequency. We further reveal that the landscape is shaped by the combined effects of chromatin accessibility and spatial contact probability. The rearrangements tend to occur in 3D spatially proximal and chromatin-accessible regions. The enormous numbers of rearrangements mediated by SCRaMbLE provide a driving force to potentiate directed genome evolution, and the investigation of the rearrangement landscape offers mechanistic insights into the dynamics of genome evolution.

9.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 3033-3043, 2023.
Article in English | MEDLINE | ID: mdl-37159322

ABSTRACT

Detecting potential associations between drugs and diseases plays an indispensable role in drug development and has become a research hotspot in recent years. Compared with traditional methods, computational approaches are fast and low-cost, which greatly accelerates the prediction of drug-disease associations. In this study, we propose a novel similarity-based method of low-rank matrix decomposition based on multi-graph regularization. On the basis of low-rank matrix factorization with L2 regularization, the multi-graph regularization constraint is constructed by combining a variety of similarity matrices from drugs and diseases, respectively. In the experiments, we analyze the differences among combinations of similarities and find that combining all the similarity information in the drug space is unnecessary; only a part of the similarity information is needed to achieve the desired performance. Our method is then compared with other existing models on three datasets (Fdataset, Cdataset and LRSSLdataset) and shows a clear advantage in terms of AUPR. In addition, a case study demonstrates the superior ability of our model to predict potential disease-related drugs. Finally, we compare our model with other methods on six real-world datasets, where it performs well.


Subjects
Algorithms , Drug Development , Drug Discovery
10.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36932655

ABSTRACT

Determining drug-drug interactions (DDIs) is an important part of pharmacovigilance and has a vital impact on public health. Compared with drug trials, obtaining DDI information from scientific articles is a faster, lower-cost, yet still highly credible approach. However, current DDI text extraction methods consider the instances generated from articles to be independent and ignore the potential connections between different instances in the same article or sentence. Effective use of external text data could improve prediction accuracy, but existing methods cannot extract key information from external data accurately and reasonably, resulting in low utilization of external data. In this study, we propose a DDI extraction framework, instance position embedding and key external text for DDI (IK-DDI), which adopts instance position embedding and key external text to extract DDI information. The proposed framework integrates the article-level and sentence-level position information of the instances into the model to strengthen the connections between instances generated from the same article or sentence. Moreover, we introduce a comprehensive similarity-matching method that uses string and word sense similarity to improve the matching accuracy between the target drug and external text. Furthermore, the key sentence search method is used to obtain key information from external data. Therefore, IK-DDI can make full use of the connection between instances and the information contained in external text data to improve the efficiency of DDI extraction. Experimental results show that IK-DDI outperforms existing methods on both macro-averaged and micro-averaged metrics, which suggests that our method provides a complete framework that can be used to extract relationships between biomedical entities and process external text data.


Subjects
Data Mining , Pharmacovigilance , Data Mining/methods , Drug Interactions , Benchmarking , Drug Delivery Systems
11.
Article in English | MEDLINE | ID: mdl-34882559

ABSTRACT

N4-methylcytosine (4mC) is one of the important epigenetic modifications in DNA sequences. Detecting 4mC sites is time-consuming. Computational methods based on machine learning have provided effective help for identifying 4mC. To further improve prediction performance, we propose a Laplacian Regularized Sparse Representation based Classifier with L2,1/2-matrix norm (LapRSRC). We also utilize the kernel trick to derive the kernel LapRSRC for nonlinear modeling. Matrix factorization technology is employed to solve the sparse representation coefficients of all test samples in the training set, and an efficient iterative algorithm is proposed to solve the objective function. We implement our model on six benchmark datasets of 4mC and eight UCI datasets to evaluate performance. The results show that the performance of our method is better or comparable.


Subjects
Algorithms , Machine Learning , Epigenesis, Genetic/genetics , DNA/genetics
12.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36355452

ABSTRACT

MOTIVATION: Somatic mutation co-occurrence has been proven to have a profound effect on tumorigenesis. While some studies have been conducted on co-mutations, a centralized resource dedicated to co-mutations in cancer is still lacking. RESULTS: Using multi-omics data from over 30 000 subjects and 1747 cancer cell lines, we present the Cancer co-mutation database (CoMutDB), the most comprehensive resource devoted to describing cancer co-mutations and their characteristics. AVAILABILITY AND IMPLEMENTATION: The data underlying this article are available in the online database CoMutDB: http://www.innovebioinfo.com/Database/CoMutDB/Home.php.


Subjects
Neoplasms , Humans , Mutation , Databases, Factual , Neoplasms/genetics , Carcinogenesis , Cell Transformation, Neoplastic
13.
Comput Biol Chem ; 101: 107765, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36113329

ABSTRACT

BACKGROUND: RNA Secondary Structure (RSS) has drawn growing attention, both for its pivotal role in RNA tertiary structure prediction and for its critical effect in understanding the mechanisms of functional non-coding RNA. Computational techniques that can reduce in vitro and in vivo experimental costs have become popular in RSS prediction. However, because this is an NP-hard problem, traditional machine learning predictors leave room for improvement in the validity of predicted RSS with pseudoknots. RESULTS: In this paper, by integrating the bidirectional GRU (Gated Recurrent Unit) with an attention mechanism, we propose a multilayered neural network called BAT-Net to predict RSS. Different from state-of-the-art works, BAT-Net can not only make full use of the information about the direct predecessor and direct successor of the predicted base in the RNA sequence but also dynamically adjust the corresponding loss function. The experimental results on five representative datasets extracted from the RNA STRAND database show that the sensitivity, precision, accuracy, and MCC (Matthews Correlation Coefficient) of BAT-Net have improved by 8.52%, 8.28%, 5.66% and 9.82%, respectively, compared with the benchmark approaches on the best averages. CONCLUSIONS: BAT-Net can provide users with more credible RSS results since it makes further use of the source information of the dataset. Comparative results show that the proposed BAT-Net is superior to the other existing methods on the relevant indicators.


Subjects
Neural Networks, Computer , RNA , RNA/genetics , RNA/chemistry , Protein Structure, Secondary , Base Sequence
14.
Nat Commun ; 13(1): 5361, 2022 09 12.
Article in English | MEDLINE | ID: mdl-36097016

ABSTRACT

DNA data storage is a rapidly developing technology with great potential due to its high density, long-term durability, and low maintenance cost. The major technical challenges include various errors, such as strand breaks, rearrangements, and indels that frequently arise during DNA synthesis, amplification, sequencing, and preservation. In this study, a de novo strand assembly algorithm (DBGPS) is developed using de Bruijn graph and greedy path search to meet these challenges. DBGPS shows substantial advantages in handling DNA breaks, rearrangements, and indels. The robustness of DBGPS is demonstrated by accelerated aging, multiple independent data retrievals, deep error-prone PCR, and large-scale simulations. Remarkably, 6.8 MB of data is accurately recovered from a severely corrupted sample that has been treated at 70 °C for 70 days. With DBGPS, we are able to achieve a logical density of 1.30 bits/cycle and a physical density of 295 PB/g.


Subjects
High-Throughput Nucleotide Sequencing , Information Storage and Retrieval , Algorithms , DNA/genetics , Sequence Analysis, DNA
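
The idea of assembling a strand with a de Bruijn graph and a greedy path search can be sketched as follows. This is a simplified illustration, not the DBGPS algorithm itself: it assumes the start k-mer of the strand is known in advance (for example from an index or primer region), that the target strand length is fixed, and that the most frequent successor k-mer is the correct one. The function names and the min_count parameter are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer observed in the (possibly broken or noisy) reads.
    The k-mers are the edges of the de Bruijn graph."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def greedy_assemble(counts, start, length, min_count=1):
    """Extend `start` one base at a time, always following the most
    frequent successor k-mer, until `length` bases are assembled."""
    k = len(start)
    if k < 2:
        raise ValueError("k must be at least 2")
    strand = start
    while len(strand) < length:
        suffix = strand[-(k - 1):]
        # Candidate successors share the last k-1 bases of the strand.
        candidates = [(counts[suffix + base], base)
                      for base in "ACGT"
                      if counts[suffix + base] >= min_count]
        if not candidates:
            break  # dead end: no supported continuation
        strand += max(candidates)[1]
    return strand
```

For example, the fragments "ATGCGT", "GCGTAC" and "TGCGTA" together with the erroneous read "ATGCAT" still yield "ATGCGTAC" for k = 4 and start k-mer "ATGC", because the correct k-mer "TGCG" is seen more often than the erroneous "TGCA". No single read has to cover the whole strand, which is why this style of assembly tolerates strand breaks.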
15.
Methods Mol Biol ; 2569: 343-359, 2022.
Article in English | MEDLINE | ID: mdl-36083457

ABSTRACT

Effective population size (Ne) determines the amount of genetic diversity and the fate of genetic variants in a species and thus is an essential parameter in evolutionary genetics. There are standard approaches to determine the Ne of evolving species. For example, the long-term Ne of an extant species is calculated based on its unbiased global mutation rate and the neutral genetic diversity of the species. However, approaches for inferring Ne of ancestral lineages are less known. Here, we introduce an evolutionary genetic statistic and an analytical procedure to assess the efficiency of natural selection for deep nodes by calculating rates of nonsynonymous nucleotide substitutions leading to radical (dR) and conservative (dC) amino acid replacements, respectively. Given that radical variants are more likely to be deleterious than conservative ones, an elevated dR/dC ratio in gene families across the genome means an accelerated genome-wide accumulation of the more deleterious type of mutations (i.e., radical variants), which indicates that natural selection is less efficient and genetic drift becomes more powerful. Earlier approaches that calculate dR/dC do not consider the impact of nucleotide composition (G+C content) on the dR/dC result, which is partially accounted for in more recent methods. Here, we use these methods to demonstrate that genetic drift may have driven the early evolution of Prochlorococcus, the most abundant carbon-fixing photosynthetic bacteria in the ocean.


Subjects
Genetic Drift , Selection, Genetic , Evolution, Molecular , Genetic Variation , Genome , Models, Genetic , Mutation , Nucleotides
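
The dR/dC statistic described in this abstract can be illustrated with a deliberately simplified sketch. It assumes that amino acid replacements are classified by charge (positive R, H, K; negative D, E; all others neutral), so that a replacement is radical when the charge class changes and conservative otherwise, and that the numbers of radical and conservative sites are supplied by the caller. It applies no multiple-hit correction and no adjustment for G+C content, both of which the published methods may handle differently; all names here are hypothetical.

```python
POSITIVE = set("RHK")
NEGATIVE = set("DE")


def charge_class(aa):
    """Return the assumed charge class of a one-letter amino acid code."""
    if aa in POSITIVE:
        return "+"
    if aa in NEGATIVE:
        return "-"
    return "0"


def is_radical(aa_from, aa_to):
    """A replacement is radical when it changes the charge class."""
    return charge_class(aa_from) != charge_class(aa_to)


def dr_dc_ratio(replacements, radical_sites, conservative_sites):
    """Compute dR/dC from observed replacements and caller-supplied site counts.

    dR = radical replacements per radical site,
    dC = conservative replacements per conservative site.
    """
    changed = [(a, b) for a, b in replacements if a != b]
    radical = sum(1 for a, b in changed if is_radical(a, b))
    conservative = len(changed) - radical
    d_r = radical / radical_sites
    d_c = conservative / conservative_sites
    return d_r / d_c
```

Under these assumptions, a rise in the ratio across many gene families on a lineage would be read, as in the abstract, as a sign that selection removed radical changes less efficiently on that lineage.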
16.
Biology (Basel) ; 11(6)2022 Jun 01.
Article in English | MEDLINE | ID: mdl-35741369

ABSTRACT

As an important part of immune surveillance, major histocompatibility complex (MHC) is a set of proteins that recognize foreign molecules. Computational prediction methods for MHC binding peptides have been developed. However, existing methods share the limitation of fixed peptide sequence length, which necessitates the training of models by peptide length or prediction with a length reduction technique. Using a bidirectional long short-term memory neural network, we constructed BVMHC, an MHC class I and II binding prediction tool that is independent of peptide length. The performance of BVMHC was compared to seven MHC class I prediction tools and three MHC class II prediction tools using eight performance criteria independently. BVMHC attained the best performance in three of the eight criteria for MHC class I, and the best performance in four of the eight criteria for MHC class II, including accuracy and AUC. Furthermore, models for non-human species were also trained using the same strategy and made available for applications in mice, chimpanzees, macaques, and rats. BVMHC is composed of a series of peptide length independent MHC class I and II binding predictors. Models from this study have been implemented in an online web portal for easy access and use.

17.
Metabolites ; 12(4)2022 Mar 23.
Article in English | MEDLINE | ID: mdl-35448468

ABSTRACT

Blood pressure measurement is one of the most basic health screenings, and blood pressure has a complex relationship with chronic kidney disease (CKD). Controlling blood pressure in CKD patients is crucial for curbing kidney function decline and reducing the risk of cardiovascular disease. Two independent CKD cohorts, including matched controls (discovery n = 824; validation n = 552), were recruited. High-throughput metabolomics was conducted on the patients' serum samples using mass spectrometry. After controlling for CKD severity and other clinical hypertension risk factors, we identified ten metabolites that have significant associations with blood pressure. The quantitative importance of these metabolites was verified in a fully connected neural network model. Of the ten metabolites, seven have not previously been associated with blood pressure. The metabolites that had the strongest positive association with blood pressure were aspartylglycosamine (p = 4.58 × 10⁻⁵), fructose-1,6-diphosphate (p = 1.19 × 10⁻⁴) and N-Acetylserine (p = 3.27 × 10⁻⁴). Three metabolites that were negatively associated with blood pressure (phosphocreatine, p = 6.39 × 10⁻³; dodecanedioic acid, p = 0.01; phosphate, p = 0.04) have been reported previously to have beneficial effects on hypertension. These results suggest that intake of metabolites as supplements may help to control blood pressure in CKD patients.

18.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35134117

ABSTRACT

Targeted drugs have been applied to the treatment of cancer on a large scale, and some patients have experienced therapeutic benefit. Detecting drug-target interactions (DTIs) through biochemical experiments is a time-consuming task. At present, machine learning (ML) has been widely applied in large-scale drug screening. However, there are few methods for multiple information fusion. We propose a multiple kernel-based triple collaborative matrix factorization (MK-TCMF) method to predict DTIs. The multiple kernel matrices (containing chemical, biological and clinical information) are integrated via a multi-kernel learning (MKL) algorithm. The original adjacency matrix of DTIs is decomposed into three matrices: the latent feature matrix of the drug space, the latent feature matrix of the target space and the bi-projection matrix (used to join the two feature spaces). To obtain better prediction performance, the MKL algorithm regulates the weight of each kernel matrix according to the prediction error. The weights of drug side-effects and target sequence are the highest. Compared with other computational methods, our model has better performance on four test data sets.


Subjects
Algorithms , Drug-Related Side Effects and Adverse Reactions , Drug Interactions , Humans , Machine Learning
19.
Cancers (Basel) ; 14(2)2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35053577

ABSTRACT

Somatic mutations are one of the most important factors in tumorigenesis and are the focus of most cancer-sequencing efforts. The co-occurrence of multiple mutations in one tumor has gained increasing attention as a means of identifying cooperating mutations or pathways that contribute to cancer. Using multi-omics, phenotypical, and clinical data from 29,559 cancer subjects and 1747 cancer cell lines covering 78 distinct cancer types, we show that co-mutations are associated with prognosis, drug sensitivity, and disparities in sex, age, and race. Some co-mutation combinations displayed stronger effects than their corresponding single mutations. For example, co-mutation TP53:KRAS in pancreatic adenocarcinoma is significantly associated with disease specific survival (hazard ratio = 2.87, adjusted p-value = 0.0003) and its prognostic predictive power is greater than either TP53 or KRAS as individually mutated genes. Functional analyses revealed that co-mutations with higher prognostic values have higher potential impact and cause greater dysregulation of gene expression. Furthermore, many of the prognostically significant co-mutations caused gains or losses of binding sequences of RNA binding proteins or micro RNAs with known cancer associations. Thus, detailed analyses of co-mutations can identify mechanisms that cooperate in tumorigenesis.

20.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35062026

ABSTRACT

Inferring gene regulatory networks (GRNs) from gene expression profiles can provide insight into a number of cellular phenotypes at the genomic level and reveal the essential laws underlying various life phenomena. Unlike bulk expression data, single-cell transcriptomic data embody cell-to-cell variance and diverse biological information, such as tissue characteristics and the transformation of cell types. Inferring GRNs from such data offers unprecedented advantages for studying cell phenotypes in depth, revealing gene functions and exploring potential interactions. However, the high sparsity, noise and dropout events of single-cell transcriptomic data pose new challenges for regulation identification. We develop a hybrid deep learning framework for GRN inference from single-cell transcriptomic data, DGRNS, which encodes the raw data and fuses a recurrent neural network and a convolutional neural network (CNN) to train a model capable of distinguishing related gene pairs from unrelated gene pairs. To overcome the limitations of such datasets, it applies sliding windows to extract valuable features while preserving the direction of regulation. DGRNS is constructed as a deep learning model containing a gated recurrent unit network for exploring time-dependent information and a CNN for learning spatially related information. Our comprehensive and detailed comparative analysis on the dataset of mouse hematopoietic stem cells illustrates that DGRNS outperforms state-of-the-art methods. For the networks inferred by DGRNS, the area under the receiver operating characteristic curve is about 16% higher than that of other unsupervised methods, and the area under the precision-recall curve is about 10% higher than that of other supervised methods. Experiments on human datasets show the strong robustness and excellent generalization of DGRNS. By comparing the predictions with the standard network, we discover a series of novel interactions which have been shown to be true in some specific cell types. Importantly, DGRNS identifies a series of regulatory relationships with high confidence and functional consistency, which have not yet been experimentally confirmed and merit further research.


Subjects
Deep Learning , Gene Regulatory Networks , Algorithms , Animals , Mice , Neural Networks, Computer , Transcriptome