1.
Comput Biol Med ; 172: 108316, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38503091

ABSTRACT

Influenza, a pervasive viral respiratory illness, remains a significant global health concern. The influenza A virus, capable of causing pandemics, necessitates timely identification of specific subtypes for effective prevention and control, as highlighted by the World Health Organization. The genetic diversity of influenza A virus, especially in the hemagglutinin protein, presents challenges for accurate subtype prediction. This study introduces PreIS as a novel pipeline utilizing advanced protein language models and supervised data augmentation to discern subtle differences in hemagglutinin protein sequences. PreIS demonstrates two key contributions: leveraging pre-trained protein language models for influenza subtype classification and utilizing supervised data augmentation to generate additional training data without extensive annotations. The effectiveness of the pipeline has been rigorously assessed through extensive experiments, demonstrating a superior performance with an impressive accuracy of 94.54% compared to the current state-of-the-art model, the MC-NN model, which achieves an accuracy of 89.6%. PreIS also exhibits proficiency in handling unknown subtypes, emphasizing the importance of early detection. Pioneering the classification of HxNy subtypes solely based on the hemagglutinin protein chain, this research sets a benchmark for future studies. These findings promise more precise and timely influenza subtype prediction, enhancing public health preparedness against influenza outbreaks and pandemics. The data and code underlying this article are available in https://github.com/CBRC-lab/PreIS.


Subjects
Influenza A virus , Human Influenza , Humans , Hemagglutinins , Influenza Virus Hemagglutinin Glycoproteins/genetics , Influenza Virus Hemagglutinin Glycoproteins/metabolism , Influenza A virus/genetics , Influenza A virus/metabolism , Amino Acid Sequence
2.
ACS Omega ; 8(47): 44757-44772, 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38046344

ABSTRACT

Drug failure during experimental procedures due to low bioactivity presents a significant challenge. To mitigate this risk and enhance compound bioactivities, predicting bioactivity classes during lead optimization is essential. The existing studies on structure-activity relationships have highlighted the connection between the chemical structures of compounds and their bioactivity. However, these studies often overlook the intricate relationship between drugs and bioactivity, which encompasses multiple factors beyond the chemical structure alone. To address this issue, we propose the BioAct-Het model, employing a heterogeneous siamese neural network to model the complex relationship between drugs and bioactivity classes, bringing them into a unified latent space. In particular, we introduce a novel representation for the bioactivity classes, called Bio-Prof, and enhance the original bioactivity data sets to tackle data scarcity. These innovative approaches resulted in our model outperforming the previous ones. The evaluation of BioAct-Het is conducted through three distinct strategies: association-based, bioactivity class-based, and compound-based. The association-based strategy utilizes supervised learning classification, while the bioactivity class-based strategy adopts a retrospective study evaluation approach. On the other hand, the compound-based strategy demonstrates similarities to the concept of meta-learning. Furthermore, the model's effectiveness in addressing real-world problems is analyzed through a case study on the application of vancomycin and oseltamivir for COVID-19 treatment as well as molnupiravir's potential efficacy in treating COVID-19 patients. The data and code underlying this article are available on https://github.com/CBRC-lab/BioAct-Het. However, data sets were derived from sources in the public domain.

3.
Sci Rep ; 13(1): 20795, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-38012271

ABSTRACT

Breast cancer is a major global health concern, and recent research has highlighted the critical roles of non-coding RNAs in both cancer and the immune system. The competing endogenous RNA hypothesis suggests that various types of RNA, including coding and non-coding RNAs, compete for microRNA targets, acting as molecular sponges. This study introduces the Pre_CLM_BCS pipeline to investigate the potential of long non-coding RNAs and circular RNAs as biomarkers in breast cancer subtypes. The pipeline identifies specific modules within each subtype that contain at least one long non-coding RNA or circular RNA exhibiting significantly distinct expression patterns when compared to other subtypes. The results reveal potential biomarker genes for each subtype, such as circ_001845, circ_001124, circ_003925, circ_000736, and circ_003996 for the basal-like subtype, circ_00306 and circ_00128 for the luminal B subtype, circ_000709 and NPHS1 for the normal-like subtype, CAMKV and circ_001855 for the luminal A subtype, and circ_00128 and circ_00173 for the HER2+ subtype. Additionally, certain long non-coding RNAs and circular RNAs, including RGS5-AS1, C6orf223, HHLA3-AS1, circ_000349, circ_003996, circ_003925, circ_002665, circ_001855, and DLEU1, are identified as potential regulators of T cell mechanisms, underscoring their importance in understanding breast cancer progression in various subtypes. This pipeline provides valuable insights into cancer and immune-related processes in breast cancer subtypes.


Subjects
Breast Neoplasms , MicroRNAs , Humans , Female , Circular RNA/genetics , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Neoplastic Gene Expression Regulation , MicroRNAs/genetics , Tumor Biomarkers/genetics , Tumor Biomarkers/metabolism
4.
BMC Bioinformatics ; 24(1): 374, 2023 Oct 03.
Article in English | MEDLINE | ID: mdl-37789314

ABSTRACT

BACKGROUND: Drug repurposing is an approach that holds promise for identifying new therapeutic uses for existing drugs. Recently, knowledge graphs have emerged as significant tools for addressing the challenges of drug repurposing. However, there are still major issues with constructing and embedding knowledge graphs. RESULTS: This study proposes a two-step method called DrugRep-HeSiaGraph to address these challenges. The method integrates the drug-disease knowledge graph with the application of a heterogeneous siamese neural network. In the first step, a drug-disease knowledge graph named DDKG-V1 is constructed by defining new relationship types, and then numerical vector representations for the nodes are created using the distributional learning method. In the second step, a heterogeneous siamese neural network called HeSiaNet is applied to enrich the embedding of drugs and diseases by bringing them closer in a new unified latent space. Then, it predicts potential drug candidates for diseases. DrugRep-HeSiaGraph achieves impressive performance metrics, including an AUC-ROC of 91.16%, an AUC-PR of 90.32%, an accuracy of 84.63%, a BS of 0.119, and an MCC of 69.31%. CONCLUSION: We demonstrate the effectiveness of the proposed method in identifying potential drugs for COVID-19 as a case study. In addition, this study shows the role of dipeptidyl peptidase 4 (DPP-4) as a potential receptor for SARS-CoV-2 and the effectiveness of DPP-4 inhibitors in facing COVID-19. This highlights the practical application of the model in addressing real-world challenges in the field of drug repurposing. The code and data for DrugRep-HeSiaGraph are publicly available at https://github.com/CBRC-lab/DrugRep-HeSiaGraph .


Subjects
COVID-19 , Drug Repositioning , Humans , Automated Pattern Recognition , SARS-CoV-2 , Computer Neural Networks
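The abstract of entry 4 reports a Brier score (BS) and a Matthews correlation coefficient (MCC) alongside accuracy and AUC. As a reminder of what these two less common metrics measure, here is a minimal sketch in plain Python. The labels and probabilities are invented toy values, not data from the paper, and the 0.5 decision threshold is an assumption.

```python
import math

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def matthews_corrcoef(y_true, y_pred):
    """MCC computed from the binary confusion matrix; 0.0 if undefined."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy drug-disease association labels and model scores (hypothetical values).
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed threshold of 0.5

bs = brier_score(y_true, y_prob)
mcc = matthews_corrcoef(y_true, y_pred)
print(f"Brier score = {bs:.3f}, MCC = {mcc:.3f}")
```

A lower Brier score indicates better-calibrated probabilities, while MCC ranges from -1 to 1 and stays informative on imbalanced data.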
5.
Bioinform Adv ; 3(1): vbad098, 2023.
Article in English | MEDLINE | ID: mdl-37521309

ABSTRACT

Motivation: Metabolite-protein interactions play an important role in regulating protein functions and metabolism. Yet, predictions of metabolite-protein interactions using genome-scale metabolic networks are lacking. Here, we fill this gap by presenting a computational framework, termed SARTRE, that employs features corresponding to shadow prices determined in the context of flux variability analysis to predict metabolite-protein interactions using supervised machine learning. Results: By using gold standards for metabolite-protein interactomes and well-curated genome-scale metabolic models of Escherichia coli and Saccharomyces cerevisiae, we found that the implementation of SARTRE with random forest classifiers accurately predicts metabolite-protein interactions, supported by an average area under the receiver operating curve of 0.86 and 0.85, respectively. Ranking of features based on their importance for classification demonstrated the key role of shadow prices in predicting metabolite-protein interactions. The quality of predictions is further supported by the excellent agreement of the organism-specific classifiers on unseen interactions shared between the two model organisms. Further, predictions from SARTRE are highly competitive against those obtained from a recent deep-learning approach relying on a variety of protein and metabolite features. Together, these findings show that features extracted from constraint-based analyses of metabolic networks pave the way for understanding the functional roles of the interactions between proteins and small molecules. Availability and implementation: https://github.com/fayazsoleymani/SARTRE.

6.
J Chem Inf Model ; 63(8): 2532-2545, 2023 Apr 24.
Article in English | MEDLINE | ID: mdl-37023229

ABSTRACT

Drug repurposing or repositioning (DR) refers to finding new therapeutic applications for existing drugs. Current computational DR methods face data representation and negative data sampling challenges. Although retrospective studies attempt to use various representations, a crucial step for accurate prediction is to aggregate these features and bring the associations between drugs and diseases into a unified latent space. In addition, the number of unknown associations between drugs and diseases, which is considered negative data, is much higher than the number of known associations, or positive data, leading to an imbalanced dataset. In this regard, we propose the DrugRep-KG method, which applies a knowledge graph embedding approach for representing drugs and diseases, to address these challenges. Unlike typical DR methods, which consider all unknown drug-disease associations as negative data, we select a subset of unknown associations, provided the disease occurs because of an adverse reaction to a drug. DrugRep-KG has been evaluated under different settings and achieves an AUC-ROC (area under the receiver operating characteristic curve) of 90.83% and an AUC-PR (area under the precision-recall curve) of 90.10%, which are higher than in previous works. In addition, we checked the performance of our framework in finding potential drugs for coronavirus infection and skin-related diseases: contact dermatitis and atopic eczema. DrugRep-KG predicted beclomethasone for contact dermatitis, and fluorometholone, clocortolone, fluocinonide, and beclomethasone for atopic eczema, all of which have previously been proven to be effective in other studies. Fluorometholone for contact dermatitis is a novel suggestion by DrugRep-KG that should be validated experimentally. DrugRep-KG also predicted the associations between COVID-19 and potential treatments suggested by DrugBank, in addition to new drug candidates provided with experimental evidence. The data and code underlying this article are available at https://github.com/CBRC-lab/DrugRep-KG.


Subjects
COVID-19 , Atopic Dermatitis , Contact Dermatitis , Humans , Drug Repositioning , Retrospective Studies , Beclomethasone , Fluorometholone , Automated Pattern Recognition , Algorithms
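The negative-sampling idea described in entry 6 (keep as negatives only those unknown drug-disease pairs where the disease is an adverse reaction to the drug) can be illustrated with simple set operations. This is a sketch under assumptions, not the DrugRep-KG implementation: the function name, the pair representation, and all identifiers such as `drug_a` are hypothetical.

```python
def select_negative_pairs(known_pairs, adverse_pairs):
    """Return unknown (drug, disease) pairs where the disease is a reported
    adverse reaction to the drug. Known therapeutic pairs are excluded."""
    known = set(known_pairs)
    return sorted(pair for pair in set(adverse_pairs) if pair not in known)

# Hypothetical toy inputs: known indications and adverse-reaction pairs.
known = [("drug_a", "disease_1"), ("drug_b", "disease_2")]
adverse = [
    ("drug_a", "disease_2"),
    ("drug_b", "disease_2"),  # also a known indication, so not a negative
    ("drug_c", "disease_1"),
]

negatives = select_negative_pairs(known, adverse)
print(negatives)
```

Restricting negatives this way yields far fewer, but more trustworthy, negative examples than labelling every unknown pair as negative, which is the imbalance problem the abstract describes.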
7.
BMC Med Genomics ; 16(1): 12, 2023 Jan 23.
Article in English | MEDLINE | ID: mdl-36691005

ABSTRACT

BACKGROUND: Autism is a neurodevelopmental disorder that is usually diagnosed in early childhood. Timely diagnosis and early initiation of treatments such as behavioral therapy are important in autistic people. Discovering critical genes and regulators in this disorder can lead to early diagnosis. Since the contribution of miRNAs together with their targets can lead us to a better understanding of autism, we propose a framework containing two steps for gene and miRNA discovery. METHODS: The first step, called the FA_gene algorithm, finds a small set of genes involved in autism. This algorithm uses the WGCNA package to construct a co-expression network for control samples and seek modules of genes that are not reproducible in the corresponding co-expression network for autistic samples. Then, the protein-protein interaction network is constructed for genes in the non-reproducible modules and a small set of genes that may have potential roles in autism is selected based on this network. The second step, named the DMN_miRNA algorithm, detects the minimum number of miRNAs related to autism. To do this, DMN_miRNA defines an extended Set Cover algorithm over the mRNA-miRNA network, consisting of the selected genes and corresponding miRNA regulators. RESULTS: In the first step of the framework, the FA_gene algorithm finds a set of important genes (TP53, TNF, MAPK3, ACTB, TLR7, LCK, RAC2, EEF2, CAT, ZAP70, CD19, RPLP0, CDKN1A, CCL2, CDK4, CCL5, CTSD, CD4, RACK1, and CD74) using co-expression and protein-protein interaction networks. In the second step, the DMN_miRNA algorithm extracts critical miRNAs (hsa-mir-155-5p, hsa-mir-17-5p, hsa-mir-181a-5p, hsa-mir-18a-5p, and hsa-mir-92a-1-5p) as signature regulators for autism using the important genes and the mRNA-miRNA network. The importance of these key genes and miRNAs is confirmed by previous studies and enrichment analysis. CONCLUSION: This study suggests the FA_gene and DMN_miRNA algorithms for biomarker discovery, which lead us to a list of important players in ASD with potential roles in the nervous system or neurological disorders that can be experimentally investigated as candidates for ASD diagnostic tests.


Subjects
Autism Spectrum Disorder , MicroRNAs , Preschool Child , Humans , Gene Regulatory Networks , MicroRNAs/genetics , Biomarkers , Messenger RNA/genetics
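Entry 7 describes DMN_miRNA as an extended Set Cover algorithm over the mRNA-miRNA network. The abstract does not specify the extension, so the sketch below shows only the classic greedy approximation to plain Set Cover: repeatedly pick the miRNA that regulates the most still-uncovered genes. The function name and the toy network (`miR_a`, `g1`, and so on) are hypothetical placeholders, not results from the paper.

```python
def greedy_set_cover(universe, subsets):
    """Greedy approximation to Set Cover.

    universe: set of genes to cover.
    subsets:  dict mapping a regulator name to the set of genes it targets.
    Returns (chosen regulators in pick order, genes left uncovered).
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Iterate names in sorted order so ties are broken deterministically.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            break  # the remaining genes have no regulator in the network
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered

# Hypothetical toy mRNA-miRNA network.
genes = {"g1", "g2", "g3", "g4", "g5"}
targets = {
    "miR_a": {"g1", "g2", "g3"},
    "miR_b": {"g3", "g4"},
    "miR_c": {"g4", "g5"},
    "miR_d": {"g1"},
}

chosen, uncovered = greedy_set_cover(genes, targets)
print(chosen, uncovered)
```

The greedy rule does not guarantee a minimum cover in general, but it is the standard polynomial-time approximation for this NP-hard problem.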
8.
Comput Biol Chem ; 99: 107717, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35802991

ABSTRACT

Profiles are used to model protein families and domains. They are built from multiple sequence alignments obtained by mapping a query sequence against a database to generate a profile based on the substitution scoring matrix. Profile applications depend heavily on the alignment algorithm and the scoring system for amino acid substitution. However, sometimes the database contains no sequences similar to the query sequence under the scoring schema. In these cases, it is not possible to build a profile. This paper proposes a method named PA_SPP, based on the pre-trained ProtAlbert transformer, to predict the profile for a single protein sequence without alignment. The performance of transformers on natural languages is impressive. Because protein sequences can be viewed as a language, we can benefit from these models. We analyze the attention heads in different layers of ProtAlbert to show that the transformer can capture five essential protein characteristics of a single sequence. This assessment shows that ProtAlbert considers some protein properties when suggesting amino acids for each position in the sequence. In other words, transformers can be considered an appropriate alternative to alignment and scoring schema for predicting a profile. We evaluate PA_SPP on the CASP13 dataset, including 55 proteins. In addition, one thermophilic and two mesophilic proteins are used as case studies. The results display high similarity between the predicted profiles and HSSP profiles.


Subjects
Algorithms , Proteins , Amino Acid Sequence , Factual Databases , Proteins/chemistry , Sequence Alignment
9.
Helicobacter ; 25(6): e12731, 2020 Dec.
Article in English | MEDLINE | ID: mdl-32794288

ABSTRACT

OBJECTIVES: Disruption of protein synthesis, by drug-mediated restriction of the ribosomal nascent peptide exit tunnel (NPET), may inhibit bacterial growth. Here, we have studied the secondary and tertiary structures of domain V of the 23S rRNA in the wild-type and mutant (resistant) H. pylori strains and their mechanisms of interaction with clarithromycin (CLA). METHODS: H. pylori strains, isolated from cultured gastric biopsies, underwent CLA susceptibility testing by E test, followed by PCR amplification and sequencing of domain V of 23S rRNA. The homology model of this domain in H. pylori, in complex with L4 and L22 accessory proteins, was determined based on the E. coli ribosome 3D structure. The interactions between CLA and 23S rRNA complex were determined by molecular docking studies. RESULTS: Of the 70 H. pylori strains, isolated from 200 dyspeptic patients, 11 (16%) were CLA-resistant. DNA sequencing identified categories with no (A), A2142G (B), and A2143G (C) mutations. Docking studies of our homology model of 23S rRNA complex with CLA showed deviated positions for categories B and C, in reference to category A, with 12.19 Å and 7.92 Å RMSD values, respectively. In both mutant categories, CLA lost its interactions at positions 2142 and 2587 and gained two new bonds with the L4 accessory protein. CONCLUSION: Our data suggest that, in mutant H. pylori strains, once the nucleotides at positions 2142 and 2587 are detached from the drug, CLA interacts with and is peeled back by the L4 accessory protein, removing the drug-imposed spatial restriction of the NPET.


Subjects
Anti-Bacterial Agents , Clarithromycin , Helicobacter pylori , Ribosomes/chemistry , Anti-Bacterial Agents/chemistry , Anti-Bacterial Agents/pharmacology , Clarithromycin/chemistry , Clarithromycin/pharmacology , Bacterial Drug Resistance , Escherichia coli/drug effects , Helicobacter Infections/drug therapy , Helicobacter pylori/drug effects , Humans , Microbial Sensitivity Tests , Molecular Docking Simulation , 23S Ribosomal RNA
10.
Comput Biol Chem ; 86: 107232, 2020 Jun.
Article in English | MEDLINE | ID: mdl-32142982

ABSTRACT

The genetic information encoded in structural genes is decoded by an intracellular process called gene expression. This mechanism is regulated by epigenetic processes such as histone acetylation. Histone acetylation, which happens in nucleosomes, exposes DNA (genome) to transcription factors. Therefore, the correlation between histone acetylation and gene expression has been assessed as a fundamental issue in many previous studies. In the proposed research, we investigate which marks of histone acetylation are informative and which ones are redundant in the vicinity of SP1 transcription factor binding sites in human CD4+ T cells. To achieve this, we use information theory methods. Subsequently, we apply a multilayer perceptron neural network to show that the histone acetylation marks selected by information theory methods are sufficiently informative. Finally, we use the neural network to predict binding sites of 17 other transcription factors on chromosomes 1 and 2. The results suggest that the information conveyed by the selected histone acetylation marks is equivalent to that of all 18 marks associated with SP1 transcription factor binding sites on chromosome 1. Furthermore, almost 91.75% of SP1 binding sites on chromosome 2 are predicted by the selected histone acetylation marks, while all 18 marks predict 90.56% correctly. Moreover, the selected histone acetylation marks are efficient at predicting 17 other types of transcription factor binding sites.


Subjects
CD4-Positive T-Lymphocytes/metabolism , Histones/metabolism , Transcription Factors/metabolism , Acetylation , Binding Sites , Humans
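Entry 10 selects informative histone acetylation marks with "information theory methods" without naming a specific measure in the abstract. One standard choice for ranking features of this kind is the mutual information between a discretized mark signal and the binding label, sketched below. The choice of mutual information, the binary encoding, and the toy vectors are assumptions made for illustration only.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

# Hypothetical toy data: 1 = site bound by the factor / mark present.
bound = [1, 1, 0, 0]
mark_a = [1, 1, 0, 0]  # perfectly tracks binding
mark_b = [1, 0, 1, 0]  # independent of binding

mi_a = mutual_information(mark_a, bound)
mi_b = mutual_information(mark_b, bound)
print(f"I(mark_a; bound) = {mi_a:.2f} bits, I(mark_b; bound) = {mi_b:.2f} bits")
```

Under this measure a mark like `mark_a` would be kept as informative, while `mark_b` carries no information about binding; detecting redundancy between marks would additionally require comparing marks with each other.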
11.
Microbiol Resour Announc ; 9(7), 2020 Feb 13.
Article in English | MEDLINE | ID: mdl-32054708

ABSTRACT

The draft genome sequence of Pseudomonas aeruginosa LMG 1272, isolated from mushroom, is reported here. This strain triggers formation of a precipitate ("white line") when cocultured with Pseudomonas tolaasii. However, LMG 1272 lacks the capacity to produce a cyclic lipopeptide that is typically associated with white line formation, suggesting the involvement of a different diffusible factor.

12.
Bull Math Biol ; 82(1): 11, 2020 Jan 14.
Article in English | MEDLINE | ID: mdl-31933029

ABSTRACT

Cell cycle phase is a decisive factor in determining the repair pathway of DNA double-strand breaks (DSBs) by non-homologous end joining (NHEJ) or homologous recombination (HR). Recent experimental studies revealed that 53BP1 and BRCA1 are the key mediators of the DNA damage response (DDR) with antagonizing roles in choosing the appropriate DSB repair pathway in G1, S, and G2 phases. Here, we present a stochastic model of biochemical kinetics involved in detecting and repairing DNA DSBs induced by ionizing radiation during the cell cycle progression. A three-dimensional stochastic process is defined to monitor the cell cycle phase and DSB repair at times after irradiation. To estimate the model parameters, a Metropolis Monte Carlo method is applied to perform maximum likelihood estimation utilizing the kinetics of γ-H2AX and RAD51 foci formation in G1, S, and G2 phases. The recruitment of DSB repair proteins is verified by comparing our model predictions with the corresponding experimental data on human cells after exposure to X and γ-radiation. Furthermore, the interaction between 53BP1 and BRCA1 is simulated for G1 and S/G2 phases, determining the competition between NHEJ and HR pathways in repairing induced DSBs throughout the cell cycle. In accordance with recent biological data, the numerical results demonstrate that the maximum proportion of HR occurs in S phase cells and the high level of NHEJ takes place in G1 and G2 phases. Moreover, the stochastic realizations of the total yield of simple and complex DSB ligation are compared for G1 and S/G2 damaged cells. Finally, the proposed stochastic model is validated for DSBs induced by different types of particle radiation, such as iron, silicon, oxygen, proton, and carbon.


Subjects
Cell Cycle/physiology , Double-Stranded DNA Breaks , DNA Repair/physiology , Biological Models , BRCA1 Protein/metabolism , Computer Simulation , DNA End-Joining Repair/physiology , Histones/metabolism , Humans , Kinetics , Likelihood Functions , Markov Chains , Mathematical Concepts , Monte Carlo Method , Rad51 Recombinase/metabolism , Recombinational DNA Repair/physiology , Stochastic Processes , Tumor Suppressor p53-Binding Protein 1/metabolism
13.
BMC Bioinformatics ; 20(1): 577, 2019 Nov 14.
Article in English | MEDLINE | ID: mdl-31726977

ABSTRACT

BACKGROUND: De novo drug discovery is a time-consuming and expensive process. Nowadays, drug repositioning is utilized as a common strategy to discover a new drug indication for existing drugs. This strategy is mostly used in cases with a limited number of candidate pairs of drugs and diseases. In other words, they are not scalable to a large number of drugs and diseases. Most of the in-silico methods mainly focus on linear approaches while non-linear models are still scarce for new indication predictions. Therefore, applying non-linear computational approaches can offer an opportunity to predict possible drug repositioning candidates. RESULTS: In this study, we present a non-linear method for drug repositioning. We extract four drug features and two disease features to find the semantic relations between drugs and diseases. We utilize deep learning to extract an efficient representation for each feature. These representations reduce the dimension and heterogeneity of biological data. Then, we assess the performance of different combinations of drug features to introduce a pipeline for drug repositioning. In the available database, there are different numbers of known drug-disease associations corresponding to each combination of drug features. Our assessment shows that as the numbers of drug features increase, the numbers of available drugs decrease. Thus, the proposed method with large numbers of drug features is as accurate as small numbers. CONCLUSION: Our pipeline predicts new indications for existing drugs systematically, in a more cost-effective way and shorter timeline. We assess the pipeline to discover the potential drug-disease associations based on cross-validation experiments and some clinical trial studies.


Subjects
Deep Learning , Drug Repositioning , Pharmaceutical Preparations , Area Under Curve , Disease , Humans , Principal Component Analysis
14.
BMC Bioinformatics ; 19(1): 406, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30400807

ABSTRACT

BACKGROUND: Given the growing availability of high-quality genome sequences, reference-based assembly methods with high accuracy and efficiency are strongly required. Many different algorithms have been designed for mapping reads onto a genome sequence, aiming to enhance the accuracy of reconstructed genomes. One of the challenges in this problem occurs when some reads are aligned to multiple locations due to repetitive regions in the genomes. RESULTS: In this paper, our goal is to decrease the error rate of rebuilt genomes by resolving multi-mapping reads. To achieve this, we reduce the search space for the reads that can be aligned against the genome with mismatches, insertions or deletions, which decreases the probability of incorrect read mapping. We propose a pipeline divided into three steps: ExactMapping, InExactMapping, and MergingContigs, where exact and inexact reads are aligned in two separate phases. We test our pipeline on simulated and real data sets using several read mappers. The results show that the two-step mapping of reads onto the contigs generated by a mapper such as Bowtie2, BWA and Yara is effective in improving the contigs in terms of error rate. CONCLUSIONS: Assessment results of our pipeline suggest that reducing the error rate of read mapping can not only improve the genomes reconstructed by reference-based assembly in a reasonable running time, but can also improve the genomes generated by de novo assembly. In fact, our pipeline produces genomes comparable to those of a multi-mapping reads resolution tool, namely MMR, by decreasing the number of multi-mapping reads. Consequently, we introduce EIM as a post-processing step for genomes reconstructed by mappers.


Subjects
Algorithms , Computational Biology/methods , Escherichia coli Proteins/genetics , Escherichia coli/genetics , Bacterial Genome , High-Throughput Nucleotide Sequencing/methods , Chromosome Mapping , Humans , DNA Sequence Analysis/methods , Software
15.
Math Med Biol ; 35(4): 517-539, 2018 Dec 05.
Article in English | MEDLINE | ID: mdl-29237014

ABSTRACT

DNA double strand breaks (DSBs) are the most lethal lesions of DNA induced by ionizing radiation, industrial chemicals and a wide variety of drugs used in chemotherapy. In the context of DNA damage response system modelling, uncertainty may arise in several ways such as number of induced DSBs, kinetic rates and measurement error in observable quantities. Therefore, using the stochastic approaches is imperative to gain further insight into the dynamic behaviour of DSBs repair process. In this article, a continuous-time Markov chain (CTMC) model of the non-homologous end joining (NHEJ) mechanism is formulated according to the DSB complexity. Additionally, a Metropolis Monte Carlo method is used to perform maximum likelihood estimation of the kinetic rate constants. Here, the effects of fluctuating kinetic rates and DSBs induction rate of the NHEJ mechanism are investigated. The stochastic realizations of the total yield of simple and complex DSBs ligation are simulated to compare their asymptotic dynamics. Furthermore, it has been proved that the total yield of DSBs has a normal distribution for sufficiently large number of DSBs. In order to estimate the expected duration of repairing DSBs, the probability distribution of DSBs lifetime is calculated based on the CTMC NHEJ model. Moreover, the variability of total yield of DSBs during constant low-dose radiation is evaluated in the presented model. The findings indicate that in stochastic NHEJ model, when there is no new DSBs induction through the repair process, all DSBs are eventually repaired. However, when DSBs are induced by constant low-dose radiation, a number of DSBs remains un-repaired.


Subjects
DNA End-Joining Repair , DNA Repair , Theoretical Models , Humans , Statistical Models
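The qualitative finding of entry 15 (all DSBs are eventually repaired without new induction, while some remain under constant induction) can be reproduced with a much simpler continuous-time Markov chain than the one in the paper. The sketch below is a single-species birth-death process simulated with the Gillespie algorithm. It ignores DSB complexity and the NHEJ reaction steps, and the rate values are arbitrary assumptions.

```python
import random

def simulate_dsb_repair(n0, repair_rate, induction_rate, t_end, seed=0):
    """Gillespie simulation of a birth-death CTMC for the number of open DSBs.

    Each DSB is repaired at rate `repair_rate`; new DSBs appear at the
    constant rate `induction_rate`. Returns a list of (time, count) pairs.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        total_rate = induction_rate + repair_rate * n
        if total_rate == 0:
            break  # absorbing state: no DSBs left and no induction
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() < induction_rate / total_rate:
            n += 1  # a new DSB is induced
        else:
            n -= 1  # one DSB is repaired
        trajectory.append((t, n))
    return trajectory

# No induction: the chain is absorbed at zero DSBs.
acute = simulate_dsb_repair(n0=20, repair_rate=1.0, induction_rate=0.0, t_end=1e9)
# Constant induction: the count fluctuates around induction_rate / repair_rate.
chronic = simulate_dsb_repair(n0=20, repair_rate=1.0, induction_rate=5.0, t_end=50.0)
print(acute[-1][1], chronic[-1][1])
```

In the chronic case the stationary mean of this toy process is `induction_rate / repair_rate`, which is why a residual number of unrepaired breaks persists.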
16.
Genes Genet Syst ; 92(6): 257-265, 2018 May 03.
Article in English | MEDLINE | ID: mdl-28757510

ABSTRACT

It has long been established that in addition to being involved in protein translation, RNA plays essential roles in numerous other cellular processes, including gene regulation and DNA replication. Such roles are known to be dictated by higher-order structures of RNA molecules. It is therefore of prime importance to find an RNA sequence that can fold to acquire a particular function that is desirable for use in pharmaceuticals and basic research. The challenge of finding an RNA sequence for a given structure is known as the RNA design problem. Although there are several algorithms to solve this problem, they mainly consider hard constraints, such as minimum free energy, to evaluate the predicted sequences. Recently, SHAPE data has emerged as a new soft constraint for RNA secondary structure prediction. To take advantage of this new experimental constraint, we report here a new method for accurate design of RNA sequences based on their secondary structures using SHAPE data as pseudo-free energy. We then compare our algorithm with four others: INFO-RNA, ERD, MODENA and RNAiFold 2.0. Our algorithm precisely predicts 26 out of 29 new sequences for the structures extracted from the Rfam dataset, while the other four algorithms predict no more than 22 out of 29. The proposed algorithm is comparable to the above algorithms on RNA-SSD datasets, where they can predict up to 33 appropriate sequences for RNA secondary structures out of 34.


Subjects
RNA Folding/physiology , RNA/metabolism , RNA/physiology , Algorithms , Base Sequence , Computer Simulation , Computer-Aided Design , Nucleic Acid Conformation , RNA Folding/genetics , Software
17.
J Bioinform Comput Biol ; 15(6): 1750023, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29113564

ABSTRACT

Finding an effective measure to predict a more accurate RNA secondary structure is a challenging problem. In the last decade, an experimental method, known as selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE), was proposed to measure the tendency of forming a base pair for almost all nucleotides in an RNA sequence. These SHAPE reactivities are then utilized to improve the accuracy of RNA structure prediction. Due to a significant impact of SHAPE reactivity and in order to reduce the experimental costs, we propose a new model called HL-k-mer. This model simulates the SHAPE reactivity for each nucleotide in an RNA sequence. This is done by fetching the SHAPE reactivities for all sub-sequences of length k (k-mers) appearing in helix and loop regions. For evaluating the quality of simulated SHAPE data, ESD-Fold method is used based on the SHAPE data simulated by the HL-k-mer model ([Formula: see text]). Also, for further evaluation of simulated SHAPE data, three different methods are employed. We also extend this model to simulate the SHAPE data for the RNA pseudoknotted structure. The results indicate that the average accuracies of prediction using the SHAPE data simulated by our models (for [Formula: see text]) are higher compared to the experimental SHAPE data.


Subjects
Computational Biology/methods , RNA/chemistry , Acylation , Databases, Factual , Models, Molecular , Nucleic Acid Conformation , Thermodynamics
18.
PLoS One ; 11(11): e0166965, 2016.
Article in English | MEDLINE | ID: mdl-27893832

ABSTRACT

BACKGROUND: Non-coding RNAs perform a wide range of functions inside living cells that are related to their structures. Several algorithms have been proposed to predict RNA secondary structure based on minimum free energy. The low prediction accuracy of these algorithms indicates that free energy alone is not sufficient to predict the functional secondary structure. Recently, information obtained from SHAPE experiments has greatly improved the accuracy of RNA secondary structure prediction when added to the thermodynamic free energy as a pseudo-free energy. METHOD: In this paper, a new method is proposed to predict RNA secondary structure based on both free energy and SHAPE pseudo-free energy. For each RNA sequence, a population of secondary structures is constructed and their SHAPE data are simulated. Then, an evolutionary algorithm is used to improve each structure based on both free and pseudo-free energies. Finally, the structure with the minimum sum of free and pseudo-free energies is taken as the predicted RNA secondary structure. RESULTS AND CONCLUSIONS: Computationally simulating the SHAPE data for a given RNA sequence requires its secondary structure. Here, we overcome this limitation by employing a population of secondary structures. This allows us to simulate the SHAPE data for any RNA sequence and consequently improves the accuracy of RNA secondary structure prediction, as confirmed by our experiments. The source code and web server of our proposed method are freely available at http://mostafa.ut.ac.ir/ESD-Fold/.
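The final selection step above (minimum sum of free and pseudo-free energies) can be sketched as follows. The abstract does not state which pseudo-free energy form is used; this example assumes the widely used logarithmic form m * ln(reactivity + 1) + b with the commonly cited defaults m = 2.6 and b = -0.8 kcal/mol, applies it once per paired nucleotide as a simplification, and takes the thermodynamic free energy of each candidate as a precomputed input. The function names are hypothetical, and the evolutionary search itself is not shown.

```python
import math


def shape_pseudo_energy(reactivity, m=2.6, b=-0.8):
    """Pseudo-free energy (kcal/mol) of one paired nucleotide (assumed log form)."""
    return m * math.log(reactivity + 1.0) + b


def total_pseudo_energy(dot_bracket, reactivities):
    """Sum the pseudo-energy over paired positions; negative values mean 'no data'."""
    return sum(shape_pseudo_energy(r)
               for symbol, r in zip(dot_bracket, reactivities)
               if symbol in "()" and r >= 0.0)


def pick_structure(candidates, reactivities):
    """Choose the candidate minimising free energy plus SHAPE pseudo-free energy.

    candidates: list of (dot_bracket, free_energy_kcal_per_mol) pairs, where the
    free energy is assumed to come from an external thermodynamic model.
    """
    return min(candidates,
               key=lambda c: c[1] + total_pseudo_energy(c[0], reactivities))
```

With this form, pairing a nucleotide of low reactivity is rewarded (the term is negative near zero reactivity) while pairing a highly reactive nucleotide is penalised, so the SHAPE data acts as a soft constraint rather than a hard one.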


Subjects
Algorithms , Computer Simulation , Nucleic Acid Conformation , RNA/chemistry , Humans , Thermodynamics
19.
BMC Bioinformatics ; 17(1): 353, 2016 Sep 05.
Article in English | MEDLINE | ID: mdl-27597167

ABSTRACT

BACKGROUND: Because the function of a protein depends on its structure, two main challenging problems, Protein Structure Prediction (PSP) and Inverse Protein Folding (IPF), are investigated. Despite its essential applications, IPF has not been investigated as much as the PSP problem. The ultimate goal of the IPF problem, or protein design, is to create proteins with enhanced properties or even novel functions. One of the major computational challenges in protein design is its large sequence space: searching through all plausible sequences is impossible. Since protein secondary structure represents an appropriate primary scaffold of the protein conformation, studying the Protein Secondary Structure Inverse Folding (PSSIF) problem is a substantial step forward in protein design, as it can reduce the search space. In this paper, a novel genetic algorithm that uses native secondary sub-structures is proposed to solve the PSSIF problem. In essence, evolutionary information can lead the algorithm to design amino acid sequences appropriate to the target secondary structures. Furthermore, these sequences can be folded into tertiary structures that are nearly similar to their reference 3D structures. RESULTS: The proposed algorithm, called GAPSSIF, benefits from evolutionary information obtained from solved proteins in the PDB. We therefore construct a repository of protein secondary sub-structures to accelerate the convergence of the algorithm. The secondary structures of sequences designed by GAPSSIF are comparable with those obtained by Evolver and EvoDesign. Although we do not explicitly consider tertiary structure features in the algorithm, the structural similarity between native and designed sequences reaches acceptable values. CONCLUSIONS: Using the evolutionary information of native structures can significantly improve the quality of designed sequences. In fact, combining this information with effective features such as solvent accessibility and torsion angles leads to an efficient solution of the IPF problem. GAPSSIF can be downloaded at http://bioinformatics.aut.ac.ir/GAPSSIF/.


Subjects
Proteins/chemistry , Algorithms , Amino Acid Sequence , Chi-Square Distribution , Databases, Protein , Protein Folding , Protein Structure, Secondary , Protein Structure, Tertiary
20.
Genes Genet Syst ; 91(1): 47, 2016.
Article in English | MEDLINE | ID: mdl-27440408

ABSTRACT

"J-STAGE Advance published date: 15 January 2015" on p. 317 should be changed to "J-STAGE Advance published date: 15 January 2016".
