1.
Comput Biol Med ; 174: 108392, 2024 May.
Article in English | MEDLINE | ID: mdl-38608321

ABSTRACT

Proteins must be sorted to specific subcellular compartments to perform their functions. Abnormal protein subcellular localizations are related to many diseases. Although many efforts have been made to predict protein subcellular localization from various kinds of static information, including sequences, structures and interactions, such static information cannot predict protein mis-localization events in diseases. In contrast, IHC (immunohistochemistry) images, which are widely applied in clinical diagnosis, contain information that can be used to find protein mis-localization events in disease states. In this study, we create the Vislocas method, which is capable of finding mis-localized proteins from IHC images as markers of cancer subtypes. By combining CNNs and vision transformer encoders, Vislocas can automatically extract image features at both the global and local levels. Vislocas can be trained from scratch with full-sized IHC images. It is the first attempt to create an end-to-end IHC image-based protein subcellular location predictor. Vislocas achieved performance comparable to or better than that of state-of-the-art methods. We applied Vislocas to find significant protein mis-localization events in different subtypes of glioma, melanoma and skin cancer. The mis-localized proteins, which were found purely from IHC images by Vislocas, are consistent with clinical or experimental results in the literature. All Vislocas code has been deposited in a GitHub repository (https://github.com/JingwenWen99/Vislocas). All Vislocas datasets have been deposited in Zenodo (https://zenodo.org/records/10632698).


Subjects
Immunohistochemistry , Humans , Neoplasms/metabolism , Neoplasms/classification , Neoplasms/pathology , Neoplasm Proteins/metabolism , Biomarkers, Tumor/metabolism , Image Processing, Computer-Assisted/methods
2.
Appl Opt ; 62(26): 7036-7043, 2023 Sep 10.
Article in English | MEDLINE | ID: mdl-37707044

ABSTRACT

We propose and experimentally demonstrate a tunable and switchable multi-wavelength erbium-doped fiber ring pulsed laser based on a nonlinear optical loop mirror (NOLM) and an improved Sagnac filter. To achieve multi-wavelength pulsed laser output, we adopt a NOLM as a quasi-saturable absorber and an improved Sagnac loop as a wavelength-selective filter. The constructed laser outputs up to five wavelengths, with a pulse repetition frequency of 40.45 kHz and a pulse duration of 108 ns. By adjusting the polarization controller, the laser can output single-wavelength and dual-wavelength pulses within a certain wavelength tuning range, as well as five-wavelength pulses with a constant wavelength interval of 3 nm. Dual-wavelength, three-wavelength, and four-wavelength pulsed outputs with various wavelength intervals are also obtained. The output performance of the constructed laser was tested, giving a maximum average output power of 127.45 µW and a minimum pulse duration of 52 ns. The stability of the laser output was also tested, giving a maximum power fluctuation of 0.62 dB and a minimum wavelength drift of 0.51 nm at a pump power of 350 mW.

3.
Science ; 381(6661): 961-964, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37651514

ABSTRACT

Accretion of material onto a black hole drags any magnetic fields present inwards, increasing their strength. Theory predicts that sufficiently strong magnetic fields can halt the accretion flow, producing a magnetically arrested disk (MAD). We analyzed archival multiwavelength observations of an outburst from the black hole x-ray binary MAXI J1820+070 in 2018. The radio and optical fluxes were delayed compared with the x-ray flux by about 8 and 17 days, respectively. We interpret this as evidence for the formation of a MAD. In this scenario, the magnetic field is amplified by an expanding corona, forming a MAD around the time of the radio peak. We propose that the optical delay is due to thermal viscous instability in the outer disk.

4.
Front Neurosci ; 17: 1132290, 2023.
Article in English | MEDLINE | ID: mdl-36908799

ABSTRACT

Introduction: Detecting single-trial P300 from electroencephalography (EEG) signals remains a challenge. To address the typical problems of existing single-trial P300 classification, such as complex, time-consuming and low-accuracy processing, this paper proposes a single-trial P300 classification algorithm based on a multi-person data fusion convolutional neural network (CNN), used to construct a centralized collaborative brain-computer interface (cBCI) for fast and highly accurate classification of P300 EEG signals. Methods: Two multi-person data fusion methods (parallel data fusion and serial data fusion) are used in the data pre-processing stage to fuse multi-person EEG information stimulated by the same task instructions, and the fused data are then fed as input to the CNN for classification. In building the CNN for single-trial P300 classification, the Conv layer was first used to extract the features of single-trial P300, and then the Maxpooling layer was used to connect the Flatten layer for secondary feature extraction and dimensionality reduction, thereby simplifying the computation. Finally, batch normalisation is used to train small batches of data in order to better generalize the network and speed up single-trial P300 signal classification. Results: The new algorithms were tested on the Kaggle dataset and the Brain-Computer Interface (BCI) Competition III dataset. Analysis of the P300 waveform features, the EEG topography and four standard evaluation metrics (Accuracy, Precision, Recall and F1-score) demonstrated that the single-trial P300 classification algorithms using the two multi-person data fusion CNNs significantly outperformed other classification algorithms.
Discussion: The results show that the single-trial P300 classification algorithms using the two multi-person data fusion CNNs significantly outperformed the single-person model. Compared with other algorithms, they involve smaller models and fewer training parameters, achieve higher classification accuracy, and improve the overall P300-cBCI classification rate and actual performance more effectively with a small amount of sample information.
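
The abstract above names parallel and serial data fusion but does not define them. The sketch below shows one plausible reading, purely as an assumption: parallel fusion stacks the channels of all subjects, and serial fusion concatenates the subjects' epochs along the time axis. The function names and toy data are hypothetical and are not taken from the paper.

```python
# Sketch of two ways to fuse single-trial EEG epochs from several people.
# ASSUMPTION: "parallel" = stack along the channel axis,
#             "serial"   = concatenate along the time axis.
# The abstract does not define either operation; these are illustrative guesses.
from typing import List

Epoch = List[List[float]]  # one epoch: channels x time samples


def parallel_fusion(epochs: List[Epoch]) -> Epoch:
    """Stack the channels of every subject: (subjects * channels) x samples."""
    n_samples = len(epochs[0][0])
    fused: Epoch = []
    for epoch in epochs:
        for channel in epoch:
            if len(channel) != n_samples:
                raise ValueError("all epochs must have the same number of samples")
            fused.append(list(channel))
    return fused


def serial_fusion(epochs: List[Epoch]) -> Epoch:
    """Concatenate subjects in time: channels x (subjects * samples)."""
    n_channels = len(epochs[0])
    fused: Epoch = [[] for _ in range(n_channels)]
    for epoch in epochs:
        if len(epoch) != n_channels:
            raise ValueError("all epochs must have the same number of channels")
        for idx, channel in enumerate(epoch):
            fused[idx].extend(channel)
    return fused


# Two hypothetical subjects, 2 channels x 3 samples each.
subject_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
subject_b = [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]

stacked = parallel_fusion([subject_a, subject_b])  # 4 channels x 3 samples
chained = serial_fusion([subject_a, subject_b])    # 2 channels x 6 samples
```

Under this reading, either fused array keeps a fixed shape for a fixed number of subjects, so it could be passed to a single CNN input layer.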

5.
Brief Bioinform ; 24(3), 2023 May 19.
Article in English | MEDLINE | ID: mdl-36920063

ABSTRACT

Gene essentiality is defined as the extent to which a gene is required for the survival and reproductive success of a living system. It can vary between genetic backgrounds and environments. Essential protein-coding genes have been well studied. However, the essentiality of non-coding regions is rarely reported. Most regions of the human genome do not encode proteins, so methods for determining the essentiality of non-coding genes are needed. We developed iEssLnc models, which can assign essentiality scores to lncRNA genes. As far as we know, this is the first direct quantitative estimation of the essentiality of lncRNA genes. By taking advantage of a graph neural network with meta-path-guided random walks on the lncRNA-protein interaction network, iEssLnc models can perform genome-wide screenings for essential lncRNA genes in a quantitative manner. We carried out validations and whole-genome screening in the context of human cancer cell lines and the mouse genome. In comparison with other methods, which were transferred from protein-coding genes, iEssLnc achieved better performance. Enrichment analysis indicated that iEssLnc essentiality scores clustered essential lncRNA genes at high ranks. With the screening results of iEssLnc models, we estimated the number of essential lncRNA genes in human and mouse. Functional analysis showed that essential lncRNA genes interact significantly with microRNAs and cytoskeletal proteins, which may be of interest in experimental life sciences. All datasets and code of the iEssLnc models have been deposited in GitHub (https://github.com/yyZhang14/iEssLnc).
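
To make the idea of a meta-path-guided random walk concrete, here is a minimal sketch on a toy heterogeneous network. It is an illustration only: the network, the node names, the `meta_path_walk` helper and the L-P pattern are all hypothetical, and the abstract does not say which meta-paths iEssLnc uses.

```python
import random
from typing import Dict, List

# HYPOTHETICAL toy network. Node type is the first letter of the name:
# "L" = lncRNA, "P" = protein. None of these nodes or edges come from iEssLnc.
NEIGHBOURS: Dict[str, List[str]] = {
    "L1": ["P1", "P2"],
    "L2": ["P2"],
    "P1": ["L1", "P2"],
    "P2": ["L1", "L2", "P1"],
}


def node_type(node: str) -> str:
    """Return the type tag of a node ("L" or "P" in this toy network)."""
    return node[0]


def meta_path_walk(start: str, pattern: List[str], length: int,
                   rng: random.Random) -> List[str]:
    """Random walk whose i-th node has type pattern[i % len(pattern)].

    The walk stops early if the current node has no neighbour of the
    required type.
    """
    if node_type(start) != pattern[0]:
        raise ValueError("start node must match the first type in the pattern")
    walk = [start]
    while len(walk) < length:
        wanted = pattern[len(walk) % len(pattern)]
        candidates = [n for n in NEIGHBOURS[walk[-1]] if node_type(n) == wanted]
        if not candidates:
            break
        walk.append(rng.choice(candidates))
    return walk


# Pattern L-P repeated: lncRNA -> protein -> lncRNA -> ...
walk = meta_path_walk("L1", ["L", "P"], length=5, rng=random.Random(0))
```

Walks of this kind are commonly fed to an embedding model so that nodes sharing typed neighbourhoods end up with similar vectors. Whether iEssLnc does exactly this is not stated in the abstract.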


Subjects
MicroRNAs , Neoplasms , RNA, Long Noncoding , Humans , Animals , Mice , Protein Interaction Maps , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , MicroRNAs/metabolism , Neural Networks, Computer
6.
Comput Biol Med ; 157: 106775, 2023 May.
Article in English | MEDLINE | ID: mdl-36921458

ABSTRACT

Aberrant protein sorting has been observed in many conditions, including complex diseases, drug treatments, and environmental stresses. It is important to systematically identify protein mis-localization events in a given condition. Experimental methods for finding mis-localized proteins are costly and time-consuming. Predicting protein subcellular localizations has been studied for many years. However, only a handful of existing works have considered alterations in protein subcellular location. We proposed a computational method for identifying alterations of protein subcellular locations under drug treatments. We took three drugs, TSA (trichostatin A), bortezomib and tacrolimus, as examples for this study. By introducing dynamic protein-protein interaction networks, graph neural network algorithms were applied to aggregate topological information under different conditions. We systematically reported potential protein mis-localization events under drug treatments. As far as we know, this is the first attempt to find protein mis-localization events computationally in drug treatment conditions. The literature supports that a number of proteins, which are highly related to the pharmacological mechanisms of these drugs, may undergo localization alterations. We name our method PLA-GNN (Protein Localization Alteration by Graph Neural Networks). It can be extended to other drugs and other conditions. All datasets and code of this study have been deposited in a GitHub repository (https://github.com/quinlanW/PLA-GNN).


Subjects
Algorithms , Neural Networks, Computer , Proteins/metabolism , Protein Interaction Maps , Polyesters/metabolism
7.
Interdiscip Sci ; 15(3): 433-438, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37000408

ABSTRACT

Over the last few years, an increasing number of protein mis-localization events have been reported under various conditions. It is important to understand these events and their relationship with complex disorders. Although many efforts have been made to establish models with statistical or machine learning algorithms, a comprehensive database resource is still missing. Since the records of experimentally validated protein mis-localization events are spread across the literature, a collection of all these reports on a single website is needed. In this paper, we created the dbMisLoc database by manually curating conditional protein mis-localization events from the literature. The dbMisLoc database records the protein localizations, mis-localizations, conditions for mis-localization, and the original reports. It allows users to view, search, visualize and download protein mis-localization records intuitively. It integrates a BLAST search engine, which can search for mis-localized proteins that are similar to user queries. The dbMisLoc database can be accessed directly at https://dbml.pufengdu.org. The source code of the dbMisLoc database is freely available from the GitHub repository (https://github.com/quinlanW/dbMisLoc). Users can host their own mirrors of the dbMisLoc database on their own servers. dbMisLoc is a database of manually curated protein mis-localization events. It contains mis-localization events in 14 categories of conditions, such as diseases, drug treatments and environmental stresses.


Subjects
Proteins , Software , Proteins/metabolism , Algorithms , Databases, Factual , Machine Learning
8.
Ann Vasc Surg ; 89: 302-311, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36334895

ABSTRACT

BACKGROUND: To explore whether simulation-based endovascular training with focus on radiation safety could improve correct behavior without jeopardizing the learning of procedural skills. METHODS: Twenty-four residents without previous endovascular experience completed 10 clinical scenarios on a virtual-reality endovascular simulator with software for peripheral endovascular interventions. Participants were randomized to receive feedback (n = 12) or not (n = 12) on radiation protection (RP) performance after each case. Expert assessments were done at the first, second, fourth, seventh, and 10th case on RP and endovascular skills (ES). Automatic simulator metrics on procedure time, contrast dose, handling errors, and estimated radiation exposure to patient and operator were registered. Outcome metrics were analyzed by two-way mixed analysis of variance pairwise comparisons with independent t-tests. Correlations were explored using Pearson's r for internal consistency reliability. RESULTS: The RP performance was similar in both groups at their first attempt (P = 0.61), but the feedback group significantly outperformed the control group over time (P < 0.001 for all comparisons). The feedback group was however slower to learn the ES at start (P = 0.047 at second performance), but after 7 attempts no difference was shown (P = 0.59). The feedback group used more time (19.5 vs. 15.3 min; P = 0.007) but less contrast (60 vs. 100 mL; P < 0.001). The number of errors was the same in both groups, but all metrics regarding radiation exposure favored the feedback group (P-values from 0.001 to 0.008). CONCLUSIONS: Simulation-based training (SBT) is effective to acquire basic endovascular intervention skills and concurrently learn RP behavior when feedback on radiation culture is provided.


Subjects
Radiation Protection , Simulation Training , Humans , Reproducibility of Results , Task Performance and Analysis , Treatment Outcome , Clinical Competence , Computer Simulation
9.
Ind Eng Chem Res ; 62(34): 13554-13571, 2023 Aug 30.
Article in English | MEDLINE | ID: mdl-38356642

ABSTRACT

In safety-critical chemical reactors with potential hazards, reaction kinetics and heat transfer parameters are usually known, and a mathematical model is available. It is then meaningful to base fault detection and isolation algorithms on the first-principles model as opposed to statistics, so that physically meaningful residual signals are generated from material and/or energy balances not closing, leading to reliable fault diagnosis. Additionally, to maintain the safety of the entire system, it is necessary to take appropriate control action based on the mathematical model and the identified faults, to minimize their impact and thus ensure safe operation. In the present work, these ideas will be formulated and illustrated through a continuous stirred-tank reactor case study involving the liquid-phase oxidation of alkylpyridine with hydrogen peroxide. The proposed fault tolerant control strategy monitors the DSM (distance of the system state from the boundary of the dynamic safe set) and the estimate of the fault size, and when they cross a certain limit as a result of an abnormal event, the manipulated input is switched. Simulation results show the effectiveness of the proposed fault tolerant control strategy in dealing with cooling system failure.
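
The idea of a residual generated from an energy balance that does not close can be sketched as follows. This is a minimal illustration under stated assumptions: a generic single-reaction stirred-tank energy balance, hypothetical parameter values, and a fixed detection threshold. It does not reproduce the paper's reactor model, its DSM computation, or its input-switching logic.

```python
from typing import Dict

# HYPOTHETICAL reactor parameters (SI units); none are taken from the paper.
PARAMS: Dict[str, float] = {
    "q": 0.001,     # volumetric flow rate, m^3/s
    "V": 1.0,       # reactor volume, m^3
    "rho": 1000.0,  # density, kg/m^3
    "cp": 4000.0,   # heat capacity, J/(kg K)
    "dH": -5.0e5,   # heat of reaction, J/mol (negative = exothermic)
    "UA": 2000.0,   # heat-transfer coefficient times area, W/K
}


def model_dT_dt(T: float, T_in: float, T_cool: float, rate: float,
                p: Dict[str, float]) -> float:
    """Temperature derivative (K/s) predicted by the energy balance."""
    heat_capacity = p["rho"] * p["cp"]                     # J/(m^3 K)
    flow = p["q"] / p["V"] * (T_in - T)                    # feed/outflow term
    reaction = -p["dH"] * rate / heat_capacity             # heat released
    cooling = p["UA"] * (T - T_cool) / (heat_capacity * p["V"])
    return flow + reaction - cooling


def residual(measured_dT_dt: float, T: float, T_in: float, T_cool: float,
             rate: float, p: Dict[str, float]) -> float:
    """Mismatch between the measured derivative and the healthy model."""
    return measured_dT_dt - model_dT_dt(T, T_in, T_cool, rate, p)


def fault_detected(r: float, threshold: float) -> bool:
    """Flag a fault when the residual magnitude exceeds the threshold."""
    return abs(r) > threshold


# Simulate a total loss of cooling: the plant behaves as if UA were zero.
failed_plant = dict(PARAMS, UA=0.0)
measured = model_dT_dt(350.0, 300.0, 290.0, 0.5, failed_plant)
r = residual(measured, 350.0, 300.0, 290.0, 0.5, PARAMS)
alarm = fault_detected(r, threshold=0.01)
```

With these made-up numbers the residual equals the missing cooling term, which is what makes a balance-based residual physically interpretable: its size hints at how much heat removal has been lost.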

10.
Brief Bioinform ; 25(1), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38168841

ABSTRACT

Silencers are repressive cis-regulatory elements that play crucial roles in transcriptional regulation. Experimental methods for identifying silencers are costly and time-consuming. Computational methods, which rely on genomic sequence features, have been introduced as alternative approaches. However, silencers do not have a significant epigenomic signature. Therefore, we explore a new way to computationally identify silencers by incorporating chromatin structural information. We propose the SilenceREIN method, which focuses on finding silencers on anchors of chromatin loops. By using graph neural networks, we extracted chromatin structural information from a regulatory element interaction network. SilenceREIN integrates the chromatin structural information with linear genomic signatures to find silencers. The predictive performance of SilenceREIN is comparable to or better than that of other state-of-the-art methods. We performed a genome-wide scan to systematically find silencers in the human genome. The results suggest that silencers are widespread on anchors of chromatin loops. In addition, enrichment analysis of transcription factor binding motifs supports our prediction results. As far as we can tell, this is the first attempt to incorporate chromatin structural information in finding silencers. All datasets and source code of SilenceREIN have been deposited in a GitHub repository (https://github.com/JianHPan/SilenceREIN).


Subjects
Chromatin , Silencer Elements, Transcriptional , Humans , Chromatin/genetics , Regulatory Sequences, Nucleic Acid , Genome, Human , Neural Networks, Computer
11.
Macromolecules ; 55(12): 5197-5212, 2022 Jun 28.
Article in English | MEDLINE | ID: mdl-35784657

ABSTRACT

Electrostatic interactions play a significant role in regulating biological systems and have received increasing attention due to their usefulness in designing advanced stimulus-responsive materials. Polypeptoids are highly tunable N-substituted peptidomimetic polymers that lack backbone hydrogen bonding and chirality. Therefore, polypeptoids are suitable systems to study the effect of noncovalent interactions of substituents without complications of backbone intramolecular and intermolecular hydrogen bonding. In this study, all-atom molecular dynamics (MD) simulations were performed on micelles formed by a series of sequence-defined ionic polypeptoid block copolymers consisting of a hydrophobic segment and a hydrophilic segment in an aqueous solution. By combining the results from MD simulations and experimental small-angle neutron scattering data, further insights were gained into the internal structure of the formed polypeptoid micelles, which is not always directly accessible from experiments. In addition, information was gained into the physics of the noncovalent interactions responsible for the self-assembly of weakly charged polypeptoids in an aqueous solution. While the aggregation number is governed by electrostatic repulsion of the negatively charged carboxylate (COO-) substituents on the polypeptoid chain within the micelle, MD simulations indicate that the position of the charge on singly charged chains mediates the shape of the micelle through the charge-dipole interactions between the COO- substituent and the surrounding water. Therefore, the polypeptoid micelles formed from the single-charged series offer the possibility for tailorable micelle shapes. In contrast, the polypeptoid micelles formed from the triple-charged series are characterized by more pronounced electrostatic repulsion that competes with more significant charge-sodium interactions, making it difficult to predict the shape of the micelles. 
This work has helped further develop design principles for the shape and structure of self-assembled micelles by controlling the position of charged moieties on the backbone of polypeptoid block copolymers.

12.
Comput Struct Biotechnol J ; 20: 2657-2663, 2022.
Article in English | MEDLINE | ID: mdl-35685362

ABSTRACT

Long non-coding RNAs (lncRNAs) play important roles in many biological processes. Knocking out or knocking down some lncRNAs leads to lethality or infertility. These lncRNAs are called essential lncRNAs. Knowledge of essential lncRNAs is important for establishing minimal genomes of living cells and for developing drug therapies and early diagnostic approaches for complex diseases. However, existing databases focus on collecting essential coding genes, and essential non-coding gene records are rare in them. A comprehensive collection of essential non-coding genes, particularly essential lncRNA genes, is needed. We manually curated 207 essential lncRNAs from the literature to establish a database of essential lncRNAs, named dbEssLnc (Database of essential lncRNAs). The dbEssLnc database has a user-friendly web interface that lets users browse, search, visualize and BLAST-search records in the database. The dbEssLnc database is freely accessible at https://esslnc.pufengdu.org. All data and source code for mirroring the dbEssLnc database have been deposited in GitHub (https://github.com/yyZhang14/dbEssLnc).

13.
Front Genet ; 13: 896925, 2022.
Article in English | MEDLINE | ID: mdl-35591855

ABSTRACT

5-Hydroxymethylcytosine (5hmC), one of the most important RNA modifications, plays an important role in many biological processes. Accurately identifying RNA modification sites helps in understanding the function of RNA modification. In this work, we propose a computational method for identifying 5hmC-modified regions using machine learning algorithms. We applied a sequence feature embedding method based on the dna2vec algorithm to represent the RNA sequence. The results showed that the performance of our model is better than that of state-of-the-art methods. All datasets and source code used in this study are available at: https://github.com/liu-h-y/5hmC_model.

14.
Front Genet ; 13: 877409, 2022.
Article in English | MEDLINE | ID: mdl-35419029

ABSTRACT

MicroRNAs (miRNAs) play vital roles in the regulation of gene expression. Identification of essential miRNAs is of fundamental importance in understanding their cellular functions. Experimental methods for identifying essential miRNAs are costly and time-consuming. Therefore, computational methods are considered as alternative approaches. Currently, only a handful of studies focus on predicting essential miRNAs. In this work, we propose to predict essential miRNAs using the XGBoost framework with CART (Classification and Regression Trees) on various types of sequence-based features. We named this method XGEM (XGBoost for essential miRNAs). The prediction performance of XGEM is promising. In comparison with other state-of-the-art methods, XGEM performed the best, indicating its potential in identifying essential miRNAs.

15.
Front Genet ; 13: 864564, 2022.
Article in English | MEDLINE | ID: mdl-35386279

ABSTRACT

Long noncoding RNAs (lncRNAs) play important roles in a variety of biological processes. Knocking out or knocking down some lncRNA genes can lead to death or infertility. These lncRNAs are called essential lncRNAs. Identifying essential lncRNAs is important for the diagnosis and treatment of complex diseases. However, experimental methods for identifying essential lncRNAs are costly and time-consuming. Therefore, computational methods can be considered as an alternative approach. We propose a method to identify essential lncRNAs by combining network centrality measures and lncRNA sequence information. By constructing a lncRNA-protein-protein interaction (LPPI) network, we measure the essentiality of lncRNAs from both their role in the network and their sequence. We name our method the systematic gene importance index (SGII). As far as we can tell, this is the first attempt to identify essential lncRNAs by combining sequence and network information. The results of our method indicate that essential lncRNAs have similar roles in the LPPI network as essential coding genes have in the PPI network. Another encouraging observation is that network information can significantly boost the predictive performance of sequence-based methods. All source code and datasets of SGII have been deposited in a GitHub repository (https://github.com/ninglolo/SGII).

16.
Curr Gene Ther ; 22(3): 228-244, 2022.
Article in English | MEDLINE | ID: mdl-34254917

ABSTRACT

Long non-coding RNAs (lncRNAs) are a type of RNA with little or no protein-coding ability and a length of more than 200 nucleotides. A large number of studies have indicated that lncRNAs play a significant role in various biological processes, including chromatin organization, epigenetic programming, transcriptional regulation, post-transcriptional processing, and circadian mechanisms at the cellular level. Since lncRNAs perform a wide range of functions through their interactions with proteins, identifying lncRNA-protein interactions is crucial to understanding the molecular functions of lncRNAs. However, because experimental methods are costly and time-consuming, a variety of computational methods have emerged. Recently, many effective and novel machine learning methods have been developed. In general, these methods fall into two categories: semi-supervised learning methods and supervised learning methods. The latter category can be further classified into deep learning-based methods, ensemble learning-based methods, and hybrid methods. In this paper, we focus on supervised learning methods. We summarize the state-of-the-art methods for predicting lncRNA-protein interactions and compare the performance and characteristics of the different methods. Considering the limits of the existing models, we analyze the problems and discuss future research potential.


Subjects
RNA, Long Noncoding , Computational Biology/methods , Gene Expression Regulation , Machine Learning , Proteins/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
17.
IEEE J Biomed Health Inform ; 26(4): 1861-1871, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34699377

ABSTRACT

ncRNAs play important roles in a variety of biological processes by interacting with RNA-binding proteins. Therefore, identifying ncRNA-protein interactions is important to understanding the biological functions of ncRNAs. Since experimental methods to determine ncRNA-protein interactions are always costly and time-consuming, computational methods have been proposed as alternative approaches. We developed a novel method NPI-RGCNAE (predicting ncRNA-Protein Interactions by the Relational Graph Convolutional Network Auto-Encoder). With a reliable negative sample selection strategy, we applied the Relational Graph Convolutional Network encoder and the DistMult decoder to predict ncRNA-protein interactions in an accurate and efficient way. By using the 5-fold cross-validation, we found that our method achieved a comparable performance to all state-of-the-art methods. Our method requires less than 10% training time of all state-of-the-art methods. It is a more efficient choice with large datasets in practice.
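
For readers unfamiliar with the DistMult decoder mentioned above, the sketch below shows its standard scoring rule: the sum of the element-wise product of a head embedding, a relation vector and a tail embedding. The embedding values are made up, and passing the score through a sigmoid is an assumption about how a probability might be obtained, not a statement about the NPI-RGCNAE implementation.

```python
import math
from typing import Sequence


def distmult_score(head: Sequence[float], relation: Sequence[float],
                   tail: Sequence[float]) -> float:
    """DistMult score: sum_i head[i] * relation[i] * tail[i]."""
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("all three vectors must have the same length")
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def interaction_probability(head: Sequence[float], relation: Sequence[float],
                            tail: Sequence[float]) -> float:
    """ASSUMPTION: squash the score with a sigmoid to get a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-distmult_score(head, relation, tail)))


# HYPOTHETICAL 2-dimensional embeddings, e.g. as produced by a graph encoder.
rna_vec = [1.0, 2.0]
interacts_with = [0.5, 1.0]
protein_vec = [2.0, 3.0]

score = distmult_score(rna_vec, interacts_with, protein_vec)
prob = interaction_probability(rna_vec, interacts_with, protein_vec)
```

Because the product is element-wise, swapping head and tail gives the same score, so DistMult treats a relation as symmetric. That suits an undirected "interacts with" edge between an ncRNA and a protein.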


Subjects
Computational Biology , RNA, Untranslated , Computational Biology/methods , Humans , RNA, Untranslated/metabolism
18.
Genomics ; 113(6): 4052-4060, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34666191

ABSTRACT

A super-enhancer (SE) is a cluster of active typical enhancers (TEs) with high levels of the Mediator complex, master transcription factors, and chromatin regulators. SEs play a key role in the control of cell identity and disease. Traditionally, scientists have used a variety of high-throughput data on different transcription factors or chromatin marks to distinguish SEs from TEs. Experimental methods of this kind are usually costly and time-consuming. In this paper, we propose DeepSE, a model based on a deep convolutional neural network, to distinguish SEs from TEs. DeepSE represents DNA sequences using dna2vec feature embeddings. With only the DNA sequence information, DeepSE outperformed all state-of-the-art methods. In addition, DeepSE generalizes well across different cell lines, which implies that cell-type-specific SEs may share hidden sequence patterns across different cell lines. The source code and data are stored in GitHub (https://github.com/QiaoyingJi/DeepSE).
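
A k-mer embedding of the dna2vec kind can be pictured as a lookup table from short subsequences to vectors. The sketch below is only an illustration: the tiny table, its values and the choice of k = 3 are hypothetical, and the abstract does not describe how DeepSE tokenizes sequences or handles unknown k-mers.

```python
from typing import Dict, List


def kmers(sequence: str, k: int) -> List[str]:
    """Split a sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def embed_sequence(sequence: str, k: int,
                   table: Dict[str, List[float]]) -> List[List[float]]:
    """Return one embedding vector per k-mer.

    ASSUMPTION: k-mers missing from the table map to a zero vector.
    """
    dim = len(next(iter(table.values())))
    zero = [0.0] * dim
    return [list(table.get(kmer, zero)) for kmer in kmers(sequence, k)]


# HYPOTHETICAL 2-dimensional embeddings; real dna2vec vectors are pretrained.
TOY_TABLE: Dict[str, List[float]] = {
    "ACG": [0.1, 0.2],
    "CGT": [0.3, 0.4],
}

tokens = kmers("ACGTA", 3)
matrix = embed_sequence("ACGTA", 3, TOY_TABLE)
```

The resulting matrix has one row per k-mer position, which is the kind of two-dimensional input a convolutional network can scan along the sequence axis.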


Subjects
Chromatin , Enhancer Elements, Genetic , Cell Line , Chromatin/genetics , Neural Networks, Computer , Transcription Factors/genetics , Transcription Factors/metabolism
19.
Brief Bioinform ; 22(5), 2021 Sep 02.
Article in English | MEDLINE | ID: mdl-33822882

ABSTRACT

Noncoding RNAs (ncRNAs) play crucial roles in many biological processes. Experimental methods for identifying ncRNA-protein interactions (NPIs) are always costly and time-consuming. Many computational approaches have been developed as alternative ways. In this work, we collected five benchmarking datasets for predicting NPIs. Based on these datasets, we evaluated and compared the prediction performances of existing machine-learning based methods. Graph neural network (GNN) is a recently developed deep learning algorithm for link predictions on complex networks, which has never been applied in predicting NPIs. We constructed a GNN-based method, which is called Noncoding RNA-Protein Interaction prediction using Graph Neural Networks (NPI-GNN), to predict NPIs. The NPI-GNN method achieved comparable performance with state-of-the-art methods in a 5-fold cross-validation. In addition, it is capable of predicting novel interactions based on network information and sequence information. We also found that insufficient sequence information does not affect the NPI-GNN prediction performance much, which makes NPI-GNN more robust than other methods. As far as we can tell, NPI-GNN is the first end-to-end GNN predictor for predicting NPIs. All benchmarking datasets in this work and all source codes of the NPI-GNN method have been deposited with documents in a GitHub repo (https://github.com/AshuiRUA/NPI-GNN).


Subjects
Deep Learning , Proteins/metabolism , RNA, Untranslated/metabolism , Software , Benchmarking , Datasets as Topic , Humans , Internet , Protein Binding , Proteins/genetics , RNA, Untranslated/genetics , Sensitivity and Specificity
20.
Brief Bioinform ; 22(4), 2021 Jul 20.
Article in English | MEDLINE | ID: mdl-33147622

ABSTRACT

With the development of high-throughput sequencing technology, the volume of genomic sequences has increased exponentially over the last decade. To decode these new genomic data, machine learning methods were introduced for genome annotation and analysis. Because most machine learning methods require it, biological sequences must be represented as fixed-length numerical vectors. In this representation procedure, the physicochemical properties of k-tuple nucleotides are important information. However, the values of the physicochemical properties of k-tuple nucleotides are scattered across different resources. To facilitate studies on genomic sequences, we developed the first comprehensive database, namely KNIndex (https://knindex.pufengdu.org), for depositing and visualizing physicochemical properties of k-tuple nucleotides. Currently, the KNIndex database contains 182 properties, including one for mononucleotides (DNA), 169 for dinucleotides (147 for DNA and 22 for RNA) and 12 for trinucleotides (DNA). The KNIndex database also provides a user-friendly web-based interface for users to browse, query, visualize and download the physicochemical properties of k-tuple nucleotides. With the built-in conversion and visualization functions, users can display DNA/RNA sequences as curves of multiple physicochemical properties. We hope that KNIndex will facilitate related studies in computational biology.
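
Displaying a sequence as a curve of a k-tuple property amounts to sliding a window of length k along the sequence and looking up each k-tuple in a property table. The sketch below illustrates this for dinucleotides. The property values are placeholders generated from the enumeration order, not values from KNIndex, and `property_curve` is a hypothetical helper name.

```python
from itertools import product
from typing import Dict, List

BASES = "ACGT"

# PLACEHOLDER property table: AA=0.0, AC=1.0, ..., TT=15.0.
# These numbers are NOT real physicochemical values from KNIndex.
TOY_PROPERTY: Dict[str, float] = {
    a + b: float(i) for i, (a, b) in enumerate(product(BASES, repeat=2))
}


def property_curve(sequence: str, k: int,
                   table: Dict[str, float]) -> List[float]:
    """Map each overlapping k-tuple of the sequence to its property value."""
    curve = []
    for i in range(len(sequence) - k + 1):
        ktuple = sequence[i:i + k]
        if ktuple not in table:
            raise KeyError(f"no property value for k-tuple {ktuple!r}")
        curve.append(table[ktuple])
    return curve


curve = property_curve("ACGT", 2, TOY_PROPERTY)  # values for AC, CG, GT
```

A curve like this has length len(sequence) - k + 1, so turning it into the fixed-length vector that most learning methods expect needs a further step such as averaging or autocorrelation. That step is left out of this sketch.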


Subjects
DNA/genetics , Databases, Nucleic Acid , High-Throughput Nucleotide Sequencing , Nucleotides/genetics , RNA/genetics , Software , Genomics