1.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38725156

ABSTRACT

Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) due to its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, there is a lack of benchmark datasets and bespoke predictors for non-histone acetylation site prediction. To address these problems, we have contributed to both dataset creation and predictor benchmarking in this study. First, we construct a non-histone acetylation site benchmark dataset, namely NHAC, which includes 11 subsets according to the sequence length, ranging from 11 to 61 amino acids. In total, there are 886 positive samples and 4707 negative samples for each sequence length. Second, we propose TransPTM, a transformer-based neural network model for non-histone acetylation site prediction. During the data representation phase, per-residue contextualized embeddings are extracted using ProtT5 (an existing pre-trained protein language model). This is followed by the implementation of a graph neural network framework, which consists of three TransformerConv layers for feature extraction and a multilayer perceptron module for classification. The benchmark results show that TransPTM achieves competitive performance for non-histone acetylation site prediction compared with three state-of-the-art tools. It improves our comprehension of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM dataset fills the gap in non-histone acetylation site datasets and is beneficial to the related communities. The related source code and data utilized by TransPTM are accessible at https://www.github.com/TransPTM/TransPTM.


Subject(s)
Neural Networks, Computer , Protein Processing, Post-Translational , Acetylation , Computational Biology/methods , Databases, Protein , Software , Algorithms , Humans , Proteins/chemistry , Proteins/metabolism
2.
ACS Nano ; 18(20): 13141-13149, 2024 May 21.
Article in English | MEDLINE | ID: mdl-38718265

ABSTRACT

Electrocatalytic reduction of NO2⁻ to NH3 (NO2RR) offers an effective method for alleviating NO2⁻ pollution and generating valuable NH3. Herein, a p-block single-atom alloy, namely, isolated Sb alloyed in a Cu substrate (Sb1Cu), is explored as a durable and high-current-density NO2RR catalyst. Theoretical calculations and operando spectroscopic measurements reveal that Sb1 incorporation can not only hamper the competing hydrogen evolution reaction but also optimize the d-band center of Sb1Cu and intermediate adsorption energies to boost the protonation energetics of NO2⁻-to-NH3 conversion. Consequently, Sb1Cu integrated in a flow cell achieves an outstanding NH3 yield rate of 2529.4 µmol h⁻¹ cm⁻² and an NH3 faradaic efficiency of 95.9% at a high current density of 424.2 mA cm⁻², as well as high durability over 100 h of electrolysis.
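The yield rate, current density, and faradaic efficiency quoted in this abstract are linked by Faraday's law, and the relation can be checked with a short calculation. The sketch below is illustrative only and is not taken from the paper: it assumes six electrons are transferred per NH3 formed from nitrite (N goes from +3 to -3) and that the quoted current density is the total current density at which the yield was measured. The function name is hypothetical.

```python
# Illustrative check, not the authors' code.
FARADAY = 96485.33212  # Faraday constant, C per mol of electrons


def faradaic_efficiency(yield_umol_h_cm2, current_mA_cm2, electrons_per_product):
    """Fraction of the total current that went into forming the product."""
    rate_mol_s = yield_umol_h_cm2 * 1e-6 / 3600.0  # mol s^-1 cm^-2
    partial_current = electrons_per_product * FARADAY * rate_mol_s  # A cm^-2
    return partial_current / (current_mA_cm2 * 1e-3)


# Values quoted in the abstract; 6 electrons per NH3 is an assumption
# based on NO2- + 7 H+ + 6 e- -> NH3 + 2 H2O.
fe = faradaic_efficiency(2529.4, 424.2, electrons_per_product=6)
print(f"FE(NH3) = {fe:.1%}")  # prints "FE(NH3) = 95.9%"
```

Under these assumptions the computed value agrees with the 95.9% reported in the abstract.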

3.
iScience ; 27(4): 109352, 2024 Apr 19.
Article in English | MEDLINE | ID: mdl-38510148

ABSTRACT

Gene regulatory networks (GRNs) involve complex and multi-layer regulatory interactions between regulators and their target genes. Precise knowledge of GRNs is important in understanding cellular processes and molecular functions. Recent breakthroughs in single-cell sequencing technology have made it possible to infer GRNs at the single-cell level. Existing methods, however, are limited by expensive computations and sometimes simplistic assumptions. To overcome these obstacles, we propose scGREAT, a framework to infer GRNs from single-cell transcriptomics using gene embeddings and a transformer. scGREAT starts by constructing gene expression and gene biotext dictionaries from scRNA-seq data and gene text information. The representation of TF-gene pairs is learned by optimizing the embedding space with a transformer-based engine. Results show that scGREAT outperforms other contemporary methods on benchmarks. In addition, gene representations from scGREAT provide valuable gene regulation insights, and external validation on spatial transcriptomics illuminated the mechanism behind scGREAT annotation. Moreover, scGREAT identified several TF-target regulations corroborated in studies.

4.
Biochem Genet ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38361095

ABSTRACT

Stomach adenocarcinoma (STAD) is associated with high mortality rates and poor prognoses worldwide. In STAD, competing endogenous RNAs (ceRNAs) play key roles in regulating one another at the post-transcriptional stage by competing for shared miRNAs. In this study, we aimed to elucidate the roles of lncRNAs in the ceRNA network of STAD, uncovering molecular biomarkers for targeted therapy and prognosis. Specifically, a multitude of differentially expressed lncRNAs, miRNAs, and mRNAs (i.e., 898 samples in total) was collected and processed from TCGA. Cytoplasmic lncRNAs were kept for evaluating overall survival (OS) time and constructing the ceRNA network. Differentially expressed mRNAs in the ceRNA network were also investigated for functional and pathological insights. Interestingly, we identified one ceRNA network including 13 lncRNAs, 25 miRNAs, and 9 mRNAs. Among them, 13 RNAs were found to be related to patient survival time; their individual risk scores can be adopted for prognosis inference. Finally, we constructed a comprehensive ceRNA regulatory network for STAD and developed our own risk-scoring system that can predict the OS time of STAD patients by taking into account the above.

5.
Dalton Trans ; 53(8): 3470-3475, 2024 Feb 20.
Article in English | MEDLINE | ID: mdl-38323778

ABSTRACT

Electrocatalytic NO2⁻-to-NH3 reduction (NO2RR) has emerged as an intriguing route for simultaneous mitigation of harmful nitrites and production of valuable NH3. Herein, we design for the first time undercoordinated Cu nanowires (u-Cu) as an efficient and selective NO2RR electrocatalyst, delivering a maximum NO2⁻-to-NH3 faradaic efficiency of 94.7% and an ammonia production rate of 494.5 µmol h⁻¹ cm⁻² at -0.7 V vs. RHE. Theoretical calculations reveal that the undercoordinated Cu sites created on u-Cu can enhance NO2⁻ adsorption, boost NO2⁻-to-NH3 energetics and restrict competitive hydrogen evolution, thereby enabling active and selective NO2RR.

6.
Comput Biol Med ; 168: 107753, 2024 01.
Article in English | MEDLINE | ID: mdl-38039889

ABSTRACT

BACKGROUND: Trans-acting factors, a group of proteins that can directly or indirectly recognize or bind to the 8-12 bp core sequence of cis-acting elements and regulate the transcription efficiency of target genes, are of special importance in transcription regulation. The progressive development of high-throughput chromatin capture technology (e.g., Hi-C) enables the identification of chromatin-interacting sequence groups where trans-acting DNA motif groups can be discovered. The difficulty of the problem lies in the combinatorial nature of DNA sequence pattern matching and its underlying sequence pattern search space. METHOD: Here, we propose MotifHub for trans-acting DNA motif group discovery on grouped sequences. Specifically, the main approach is to develop probabilistic modeling to accommodate the stochastic nature of DNA motif patterns. RESULTS: Based on the modeling, we develop global sampling techniques based on EM and Gibbs sampling to address the global optimization challenge of model fitting with latent variables. The results show that our proposed approaches achieve promising performance with linear time complexity. CONCLUSION: MotifHub is a novel algorithm that considers the identification of both DNA co-binding motif groups and trans-acting TFs. Our study paves the way for identifying hub TFs of stem cell development (OCT4 and SOX2) and determining potential therapeutic targets of prostate cancer (FOXA1 and MYC). To ensure scientific reproducibility and long-term impact, its matrix-algebra-optimized source code is released at http://bioinfo.cs.cityu.edu.hk/MotifHub.


Subject(s)
Algorithms , Software , Nucleotide Motifs/genetics , Reproducibility of Results , Chromatin/genetics
7.
iScience ; 26(11): 108197, 2023 Nov 17.
Article in English | MEDLINE | ID: mdl-37965148

ABSTRACT

By sponging microRNAs (miRNAs), long non-coding RNAs (lncRNAs) have the potential to regulate gene expression. Few methods based on this mechanism have been developed to predict lncRNA-gene relationships. Hence, we present lncRNA-Top to predict potential lncRNA-gene regulatory relationships. Specifically, we constructed controlled deep-learning methods using 12417 lncRNAs and 16127 genes. We have provided retrospective and innovative views on negative sampling, random seeds, cross-validation, metrics, and independent datasets. The AUC, AUPR, and our own precision@k metric were used to evaluate performance. In-depth case studies demonstrate that 47 of the top 100 predicted unknown pairs were recorded in publications, supporting the predictive power. Our additional software can annotate the scores with target candidates. lncRNA-Top will be a helpful tool for uncovering prospective lncRNA targets and better understanding the regulatory processes of lncRNAs.
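The abstract does not spell out how its precision@k is defined. As a point of reference only, the conventional ranking metric of that name can be sketched as follows; this is the textbook form and may differ from the authors' own definition, and all identifiers and toy pairs below are hypothetical.

```python
# Textbook precision@k, shown for orientation; the paper's own
# definition is not given in the abstract and may differ.
def precision_at_k(ranked_pairs, known_positives, k):
    """Share of the k highest-ranked pairs that are known positives."""
    top_k = ranked_pairs[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for pair in top_k if pair in known_positives)
    return hits / len(top_k)


# Hypothetical lncRNA-gene pairs, already sorted by predicted score.
ranked = [("lncA", "g1"), ("lncB", "g2"), ("lncA", "g3"), ("lncC", "g1")]
known = {("lncA", "g1"), ("lncC", "g1")}

print(precision_at_k(ranked, known, k=1))
print(precision_at_k(ranked, known, k=2))
```

Read this way, the case-study figure of 47 literature-supported pairs among the top 100 predictions would correspond to a precision@100 of 0.47.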

8.
Nat Commun ; 14(1): 6824, 2023 10 26.
Article in English | MEDLINE | ID: mdl-37884495

ABSTRACT

RNA-binding proteins play crucial roles in the regulation of gene expression, and understanding the interactions between RNAs and RBPs in distinct cellular conditions forms the basis for comprehending the underlying RNA function. However, current computational methods pose challenges to the cross-prediction of RNA-protein binding events across diverse cell lines and tissue contexts. Here, we develop HDRNet, an end-to-end deep learning-based framework to precisely predict dynamic RBP binding events under diverse cellular conditions. Our results demonstrate that HDRNet can accurately and efficiently identify binding sites, particularly for dynamic prediction, outperforming other state-of-the-art models on 261 linear RNA datasets from both eCLIP and CLIP-seq, supplemented with additional tissue data. Moreover, we conduct motif and interpretation analyses to provide fresh insights into the pathological mechanisms underlying RNA-RBP interactions from various perspectives. Our functional genomic analysis further explores the gene-human disease associations, uncovering previously uncharacterized observations for a broad range of genetic disorders.


Subject(s)
RNA-Binding Proteins , RNA , Humans , RNA/genetics , RNA/metabolism , RNA-Binding Proteins/metabolism , Binding Sites/genetics , Protein Binding , Chromatin Immunoprecipitation Sequencing
9.
Adv Sci (Weinh) ; 10(33): e2303502, 2023 11.
Article in English | MEDLINE | ID: mdl-37816141

ABSTRACT

Single-cell Hi-C (scHi-C) has made it possible to analyze chromatin organization at the single-cell level. However, scHi-C experiments generate inherently sparse data, which poses a challenge for loop calling methods. The existing approach performs significance tests across the imputed dense contact maps, leading to substantial computational overhead and loss of information at the single-cell level. To overcome this limitation, a lightweight framework called scGSLoop is proposed, which sets a new paradigm for scHi-C loop calling by adapting the training and inferencing strategies of graph-based deep learning to leverage the sequence features and 1D positional information of genomic loci. With this framework, sparsity is no longer a challenge, but rather an advantage that the model leverages to achieve unprecedented computational efficiency. Compared to existing methods, scGSLoop makes more accurate predictions and is able to identify more loops that have the potential to play regulatory roles in genome functioning. Moreover, scGSLoop preserves single-cell information by identifying a distinct group of loops for each individual cell, which not only enables an understanding of the variability of chromatin looping states between cells, but also allows scGSLoop to be extended for the investigation of multi-connected hubs and their underlying mechanisms.


Subject(s)
Chromatin , Genomics , Chromatin/genetics , Genome
10.
Environ Sci Technol ; 57(40): 15055-15064, 2023 10 10.
Article in English | MEDLINE | ID: mdl-37774013

ABSTRACT

The particle phase state plays a vital role in gas-particle partitioning, multiphase reactions, ice nucleation activity, and particle growth in the atmosphere. However, the characterization of the atmospheric phase state remains challenging. Herein, based on measured aerosol chemical composition and ambient relative humidity (RH), a machine learning (ML) model with high accuracy (R² = 0.952) and robustness (RMSE = 0.078) was developed to predict the particle rebound fraction, f, which is an indicator of the particle phase state. Using this ML model, the f of particles in the urban atmosphere was predicted based on seasonal average aerosol chemical composition and RH. Regardless of season, aerosols remain in the liquid state in mid-high latitude cities in the northern hemisphere and in the semisolid state over semiarid regions. In the East Asian megacities, the particles remain in the liquid state in spring and summer and in the semisolid state in other seasons. The effects of nitrate, which is becoming dominant in fine particles in several urban areas, on the particle phase state were evaluated. More nitrate led the particles to remain in the liquid state at an even lower RH. This study proposed a new approach to predict the particle phase state in the atmosphere based on RH and aerosol chemical composition.
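For readers unfamiliar with the two scores quoted for the ML model, the sketch below shows how the coefficient of determination (R²) and the root-mean-square error (RMSE) are conventionally computed. The rebound-fraction values are invented toy numbers, not data from the study, and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observations and predictions."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy rebound fractions (made up for illustration only).
measured = [0.10, 0.40, 0.70, 0.90]
predicted = [0.15, 0.35, 0.75, 0.85]

print(f"RMSE = {rmse(measured, predicted):.3f}")
print(f"R^2  = {r_squared(measured, predicted):.3f}")
```

Because f is a fraction between 0 and 1, an RMSE of 0.078 means the typical prediction error is several percentage points of rebound fraction.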


Subject(s)
Atmosphere , Nitrates , Aerosols , Atmosphere/chemistry , Cities , Seasons , Particle Size
11.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37455245

ABSTRACT

The rapid growth of omics-based data has revolutionized biomedical research and precision medicine, allowing machine learning models to be developed for cutting-edge performance. However, despite the wealth of high-throughput data available, the performance of these models is hindered by the lack of sufficient training data, particularly in clinical research (in vivo experiments). As a result, translating this knowledge into clinical practice, such as predicting drug responses, remains a challenging task. Transfer learning is a promising tool that bridges the gap between data domains by transferring knowledge from the source to the target domain. Researchers have proposed transfer learning to predict clinical outcomes by leveraging pre-clinical data (mouse, zebrafish), highlighting its vast potential. In this work, we present a comprehensive literature review of deep transfer learning methods for health informatics and clinical decision-making, focusing on high-throughput molecular data. Previous reviews mostly covered image-based transfer learning works, while we present a more detailed analysis of transfer learning papers. Furthermore, we evaluated original studies based on different evaluation settings across cross-validations, data splits and model architectures. The result shows that those transfer learning methods have great potential; high-throughput sequencing data and state-of-the-art deep learning models lead to significant insights and conclusions. Additionally, we explored various datasets in transfer learning papers with statistics and visualization.


Subject(s)
Benchmarking , Zebrafish , Animals , Mice , Zebrafish/genetics , Machine Learning , Precision Medicine , Clinical Decision-Making
12.
Adv Sci (Weinh) ; 10(22): e2205442, 2023 08.
Article in English | MEDLINE | ID: mdl-37290050

ABSTRACT

Unsupervised clustering is an essential step in identifying cell types from single-cell RNA sequencing (scRNA-seq) data. However, a common issue with unsupervised clustering models is that, in the absence of supervised information, the optimization direction of the objective function and the final generated clustering labels may be inconsistent or even arbitrary. To address this challenge, a dynamic ensemble pruning framework (DEPF) is proposed to identify and interpret single-cell molecular heterogeneity. In particular, a silhouette coefficient-based indicator is developed to determine the optimization direction of the bi-objective function. In addition, a hierarchical autoencoder is employed to project the high-dimensional data onto multiple low-dimensional latent space sets, and then a clustering ensemble is produced in the latent space by the basic clustering algorithm. Following that, a bi-objective fruit fly optimization algorithm is designed to dynamically prune the low-quality basic clusterings in the ensemble. Multiple experiments are conducted on 28 real scRNA-seq datasets and one large real scRNA-seq dataset from diverse platforms and species to validate the effectiveness of the DEPF. In addition, biological interpretability analysis and transcriptional and post-transcriptional regulatory analyses are conducted to explore biological patterns from the cell types identified, which could provide novel insights into characterizing the mechanisms.
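The indicator described here builds on the silhouette coefficient. The sketch below implements only the standard mean silhouette coefficient with Euclidean distance, to show what the underlying quantity measures; it is not the DEPF indicator itself, whose construction is not given in the abstract. The function name and toy points are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        return sum(math.dist(points[i], points[j]) for j in members) / len(members)

    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Two well-separated toy groups in 2D (made-up coordinates).
cells = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(cells, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(cells, [0, 1, 0, 1])   # labels cut across the groups
print(good > bad)
```

Values near 1 indicate compact, well-separated clusters and negative values indicate likely misassignment, which is why the score can serve as a label-free signal of clustering quality.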


Subject(s)
Algorithms , Single-Cell Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Cluster Analysis , Gene Expression Regulation
13.
Inorg Chem ; 62(22): 8487-8493, 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37219358

ABSTRACT

We report iron diboride (FeB2) as a high-performance metal diboride catalyst for electrochemical NO-to-NH3 reduction (NORR), which shows a maximum NH3 yield rate of 289.3 µmol h⁻¹ cm⁻² and an NH3 faradaic efficiency of 93.8% at -0.4 V versus the reversible hydrogen electrode. Theoretical computations reveal that Fe and B sites synergistically activate the NO molecule, while the protonation of NO is energetically more favorable on B sites. Meanwhile, both Fe and B sites preferentially adsorb NO over H atoms to suppress the competing hydrogen evolution.

14.
Nat Commun ; 14(1): 400, 2023 01 25.
Article in English | MEDLINE | ID: mdl-36697410

ABSTRACT

Single-cell RNA sequencing provides high-throughput gene expression information to explore cellular heterogeneity at the individual cell level. A major challenge in characterizing high-throughput gene expression data arises from challenges related to dimensionality, and the prevalence of dropout events. To address these concerns, we develop a deep graph learning method, scMGCA, for single-cell data analysis. scMGCA is based on a graph-embedding autoencoder that simultaneously learns cell-cell topology representation and cluster assignments. We show that scMGCA is accurate and effective for cell segregation and batch effect correction, outperforming other state-of-the-art models across multiple platforms. In addition, we perform genomic interpretation on the key compressed transcriptomic space of the graph-embedding autoencoder to demonstrate the underlying gene regulation mechanism. We demonstrate that in a pancreatic ductal adenocarcinoma dataset, scMGCA successfully provides annotations on the specific cell types and reveals differential gene expression levels across multiple tumor-associated and cell signalling pathways.


Subject(s)
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Gene Expression Profiling/methods , Pancreatic Neoplasms/genetics , Gene Expression Regulation , Transcriptome , Carcinoma, Pancreatic Ductal/genetics , Single-Cell Analysis/methods
15.
iScience ; 25(12): 105535, 2022 Dec 22.
Article in English | MEDLINE | ID: mdl-36444296

ABSTRACT

Graph and image are two common representations of Hi-C cis-contact maps. Existing computational tools have only adopted Hi-C data modeled as unitary data structures but neglected the potential advantages of synergizing the information of different views. Here we propose GILoop, a dual-branch neural network that learns from both representations to identify genome-wide CTCF-mediated loops. With GILoop, we explore the combined strength of integrating the two view representations of Hi-C data and corroborate the complementary relationship between the views. In particular, the model outperforms the state-of-the-art loop calling framework and is also more robust against low-quality Hi-C libraries. We also uncover distinct preferences for matrix density by graph-based and image-based models, revealing interesting insights into Hi-C data elucidation. Finally, along with multiple transfer-learning case studies, we demonstrate that GILoop can accurately model the organizational and functional patterns of CTCF-mediated looping across different cell lines.

16.
Bioinformatics ; 38(19): 4537-4545, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35984287

ABSTRACT

MOTIVATION: Single-cell RNA sequencing (scRNA-seq) can provide insight into gene expression patterns at the resolution of individual cells, which offers new opportunities to study the behavior of different cell types. However, it is often plagued by dropout events, a phenomenon where the expression value of a gene tends to be measured as zero in the expression matrix due to various technical defects. RESULTS: In this article, we argue that borrowing gene and cell information across column and row subspaces directly results in suboptimal solutions due to the noise contamination in imputing dropout values. Thus, to impute the dropout events in scRNA-seq data more precisely, we develop a regularization that leverages this imperfect prior information to estimate the true underlying prior subspace and then embed it in a typical low-rank matrix completion-based framework, named scWMC. To evaluate the performance of the proposed method, we conduct comprehensive experiments on simulated and real scRNA-seq data. Extensive data analyses, including simulated analysis, cell clustering, differential expression analysis, functional genomic analysis, cell trajectory inference and scalability analysis, demonstrate that our method produces improved imputation results compared to competing methods, which benefits subsequent downstream analysis. AVAILABILITY AND IMPLEMENTATION: The source code is available at https://github.com/XuYuanchi/scWMC and test data are available at https://doi.org/10.5281/zenodo.6832477. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Sequence Analysis, RNA/methods , Software , Exome Sequencing
17.
Comput Struct Biotechnol J ; 20: 3522-3532, 2022.
Article in English | MEDLINE | ID: mdl-35860402

ABSTRACT

Post-translational modifications (PTMs) are closely linked to numerous diseases, playing a significant role in regulating protein structures, activities, and functions. Therefore, the identification of PTMs is crucial for understanding the mechanisms of cell biology and disease therapy. Compared to traditional machine learning methods, deep learning approaches for PTM prediction provide accurate and rapid screening, guiding downstream wet-lab experiments to leverage the screened information for focused studies. In this paper, we reviewed recent works that use deep learning to identify phosphorylation, acetylation, ubiquitination, and other PTM types. In addition, we summarized PTM databases and discussed future directions with critical insights.

18.
Gene ; 835: 146657, 2022 Aug 15.
Article in English | MEDLINE | ID: mdl-35710083

ABSTRACT

Bladder urothelial carcinoma (BLCA) is a complex disease with high morbidity and mortality. Changes in alternative splicing (AS) and splicing factors (SFs) can affect gene expression, thus playing an essential role in tumorigenesis. In this study, we downloaded clinical information for 412 patients and transcriptome profiling data for 433 samples from TCGA, and collected 48 AS signatures from SpliceSeq. LASSO and Cox analyses were used to identify survival-related AS events in BLCA. Subsequently, 1,645 OS-related AS events in 1,129 genes were validated by Kaplan-Meier (KM) survival analysis, ROC analysis, risk curve analysis, and independent prognostic analysis. Our survey provides an AS-SF regulation network consisting of five SFs and 46 AS events. We also profiled the genes in which AS occurred in pan-cancer and the expression of the five SFs in tumor and normal samples in BLCA. We selected CLIP-seq data to validate the interactions regulated by RBPs. Our study paves the way for potential therapeutic targets of BLCA.


Subject(s)
Carcinoma, Transitional Cell , Urinary Bladder Neoplasms , Alternative Splicing , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Carcinoma, Transitional Cell/genetics , Gene Expression Regulation, Neoplastic , Humans , RNA Splicing Factors/genetics , Urinary Bladder Neoplasms/metabolism
19.
IEEE J Biomed Health Inform ; 26(8): 4303-4313, 2022 08.
Article in English | MEDLINE | ID: mdl-35439152

ABSTRACT

Exploring the prognostic classification and biomarkers in Head and Neck Squamous Carcinoma (HNSC) is of great clinical significance. We hybridized three prominent strategies to comprehensively characterize the molecular features of HNSC. We constructed a 15-gene signature to predict patients' death risk with an average AUC of 0.744 at 1, 3, and 5 years on the TCGA-HNSC training set, and average AUCs of 0.636, 0.584, and 0.755 in the GSE65858, GSE-112026, and CPTAC-HNSCC datasets, respectively. By combining NMF clustering with consensus clustering of the fraction of tumor immune cell infiltration (ICI) in the tumor microenvironment (TME), we captured more refined biological characteristics of HNSC and observed prognostic heterogeneity among patients with high tumor immunity. By matching tumor subset-specific expression signatures to drug-induced cell line expression profiles from large-scale pharmacogenomic databases in the OCTAD workspace, we identified a group of HNSC patients with poor prognosis and demonstrated that the individuals in this group are likely to show increased sensitivity to drugs that reverse differentially expressed disease signature genes. This trend is especially pronounced among those with higher death risk and tumor immunity.


Subject(s)
Gene Expression Profiling , Head and Neck Neoplasms , Biomarkers, Tumor/genetics , Head and Neck Neoplasms/drug therapy , Head and Neck Neoplasms/genetics , Humans , Prognosis , Squamous Cell Carcinoma of Head and Neck/genetics , Transcriptome , Treatment Outcome , Tumor Microenvironment/genetics
20.
iScience ; 25(4): 104081, 2022 Apr 15.
Article in English | MEDLINE | ID: mdl-35372808

ABSTRACT

Human disease prediction from microbiome data has broad implications in metagenomics. Existing methods rarely consider abundance profiles from both known and unknown microbial organisms or capture the taxonomic relationships among microbial taxa, leading to significant information loss. On the other hand, deep learning has shown unprecedented advantages in classification tasks owing to its feature-learning ability. However, it encounters the opposite situation in metagenome-based disease prediction, since high-dimensional low-sample-size metagenomic datasets can lead to severe overfitting, and black-box models fail to provide biological explanations. To circumvent these problems, we developed MetaDR, a comprehensive machine learning-based framework that integrates various information and deep learning to predict human diseases. Experimental results indicate that MetaDR achieves competitive prediction performance with a reduction in running time, and effectively discovers informative features with biological insights.
