Results 1 - 20 of 98
1.
bioRxiv ; 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38895436

ABSTRACT

Background: Profiling circulating cell-free DNA (cfDNA) has become a fundamental practice in cancer medicine, but the effectiveness of cfDNA at elucidating tumor-derived molecular features has not been systematically compared to standard single-lesion tumor biopsies in prospective cohorts of patients. The use of plasma instead of tissue to guide therapy is particularly attractive for patients with small cell lung cancer (SCLC), a cancer whose aggressive clinical course makes it exceedingly challenging to obtain tumor biopsies. Methods: Here, in a prospective cohort of 49 plasma samples obtained before, during, and after treatment from 20 patients with recurrent SCLC, we study cfDNA by low-pass whole-genome (0.1X coverage) and exome (130X) sequencing in comparison with time-point-matched tumor biopsies characterized by exome and transcriptome sequencing. Results: Direct comparison of cfDNA with tumor biopsies reveals that cfDNA not only mirrors the mutation and copy number landscape of the corresponding tumor but also identifies clinically relevant resistance mechanisms and cancer driver alterations not found in matched tumor biopsies. Longitudinal cfDNA analysis reliably tracks tumor response, progression, and clonal evolution. Genomic sequencing coverage of plasma DNA fragments around transcription start sites shows distinct treatment-related changes and captures the expression of key transcription factors such as NEUROD1 and REST in the corresponding SCLC tumors, allowing prediction of SCLC neuroendocrine phenotypes and treatment responses. Conclusions: These findings have important implications for non-invasive stratification and subtype-specific therapies for patients with SCLC, which is currently treated as a single disease.
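The transcription start site (TSS) coverage analysis mentioned above can be illustrated with a minimal Python sketch. It assumes cfDNA fragments are given as half-open (start, end) intervals on a single chromosome and TSSs as (position, strand) pairs; the function name, window sizes, and toy data are hypothetical, and this is not the authors' pipeline. Reduced coverage at the TSS is commonly interpreted as nucleosome depletion at active promoters.

```python
from collections import defaultdict


def tss_coverage_profile(fragments, tss_sites, flank=1000, step=100):
    """Mean cfDNA fragment coverage at fixed offsets around TSSs (hypothetical helper).

    fragments: list of (start, end) half-open intervals on one chromosome.
    tss_sites: list of (position, strand) tuples, strand being "+" or "-".
    Returns a dict mapping offset -> mean coverage across all TSSs.
    """
    # Per-base coverage; adequate for toy inputs, not for whole genomes.
    coverage = defaultdict(int)
    for start, end in fragments:
        for pos in range(start, end):
            coverage[pos] += 1

    profile = {}
    for offset in range(-flank, flank + 1, step):
        total = 0
        for tss, strand in tss_sites:
            # Orient offsets so that negative values are always upstream of the gene.
            pos = tss + offset if strand == "+" else tss - offset
            total += coverage.get(pos, 0)
        profile[offset] = total / len(tss_sites)
    return profile


# Toy example: fragments leave a coverage gap around a TSS at position 100.
fragments = [(0, 50), (150, 300)]
print(tss_coverage_profile(fragments, [(100, "+")], flank=100, step=50))
# {-100: 1.0, -50: 0.0, 0: 0.0, 50: 1.0, 100: 1.0}
```

In practice the profile would be averaged over many TSSs grouped by expression level and normalized, for example by flank coverage; the abstract does not specify which normalization the authors used.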

2.
Mol Plant Pathol ; 25(6): e13468, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38808392

ABSTRACT

Phytophthora pathogens possess hundreds of effector genes that exhibit diverse expression patterns during infection, yet how the expression of effector genes is precisely regulated remains largely elusive. Previous studies have identified a few potential conserved transcription factor binding sites (TFBSs) in the promoters of Phytophthora effector genes. Here, we report a MYB-related protein, PsMyb37, in Phytophthora sojae, the major causal agent of root and stem rot in soybean. Yeast one-hybrid and electrophoretic mobility shift assays showed that PsMyb37 binds to the TACATGTA motif, the most prevalent TFBS in effector gene promoters. The knockout mutant of PsMyb37 exhibited significantly reduced virulence on soybean and was more sensitive to oxidative stress. Consistently, transcriptome analysis showed that numerous effector genes associated with suppressing plant immunity or scavenging reactive oxygen species were down-regulated in the PsMyb37 knockout mutant during infection compared to the wild-type P. sojae. Several promoters of effector genes were confirmed to drive the expression of luciferase in a reporter assay. These results demonstrate that a MYB-related transcription factor contributes to the expression of effector genes in P. sojae.


Subjects
Phytophthora , Plant Diseases , Promoter Regions, Genetic , Transcription Factors , Phytophthora/pathogenicity , Phytophthora/genetics , Transcription Factors/metabolism , Transcription Factors/genetics , Promoter Regions, Genetic/genetics , Plant Diseases/microbiology , Plant Diseases/genetics , Glycine max/microbiology , Glycine max/genetics , Virulence/genetics
3.
Int J Mol Sci ; 25(7)2024 Apr 06.
Article in English | MEDLINE | ID: mdl-38612878

ABSTRACT

We developed a procedure for locating genes on Drosophila melanogaster polytene chromosomes and described three types of chromosome structures (gray bands, black bands, and interbands), which differ markedly in their morphological and genetic properties. This was achieved using our original methods of molecular and genetic analysis, electron microscopy, and bioinformatics data processing. Analysis of the genome-wide distribution of these properties led us to a bioinformatics model of Drosophila genome organization in which the genome is divided into two groups of genes. One group consists of 6562 genes that are expressed in most cell types throughout the life cycle and perform basic cellular functions (the so-called "housekeeping genes"). The other consists of 3162 genes that are expressed only at particular stages of development ("developmental genes"). These two groups of genes are so different that we may state that the genome has two types of genetic organization. They differ in the timing of their expression, chromatin packaging levels, the composition of activating and deactivating proteins, gene size, intron length, the organization of their promoter regions, the locations of origin recognition complexes (ORCs), and DNA replication timing.


Subjects
Drosophila , Genes, Essential , Animals , Drosophila/genetics , Drosophila melanogaster/genetics , Chromatin , Introns
4.
Pharmaceutics ; 16(4)2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38675205

ABSTRACT

Understanding the regulation of transgene expression is critical for the success of plasmid-based gene therapy and vaccine development. In this study, we used two sets of plasmid vectors containing secreted embryonic alkaline phosphatase or the mouse IL-10 gene as a reporter and investigated the role of promoter elements in regulating transgene expression in vivo. We demonstrated in mice that hydrodynamic transfer of plasmids with the CMV promoter resulted in a high level of reporter gene expression that declined rapidly over time. In contrast, when plasmids with albumin promoters were used, a lower but sustained gene expression pattern was observed. We also found that plasmids containing a shorter CMV promoter sequence with fewer transcription factor binding sites showed a decrease in the peak level of gene expression without changing the overall pattern of reporter gene expression. The replacement of regulatory elements in the CMV promoter with a single regulatory element of the albumin promoter changed the pattern of transient gene expression seen in the CMV promoter to a pattern of sustained gene expression identical to that of a full albumin promoter. ChIP analyses demonstrated an elevated binding of acetylated histones and TATA box-binding protein to the promoter carrying regulatory elements of the albumin promoter. These results suggest that the strength of a promoter is determined by the number of appropriate transcription factor binding sites, while gene expression persistence is determined by the presence of regulatory elements capable of recruiting epigenetic modifying complexes that make the promoter accessible for transcription. This study provides important insights into the mechanisms underlying gene expression regulation in vivo, which can be used to improve plasmid-based gene therapy and vaccine development.

5.
Hum Genomics ; 18(1): 12, 2024 Feb 02.
Article in English | MEDLINE | ID: mdl-38308339

ABSTRACT

Genome-wide association studies (GWAS) are a powerful tool for detecting variants associated with complex traits and can help inform risk stratification and prevention strategies for pancreatic ductal adenocarcinoma (PDAC). However, the strict significance threshold commonly used makes it likely that many true risk loci are missed. Functional annotation of GWAS polymorphisms is a proven strategy for identifying additional risk loci. We aimed to investigate single-nucleotide polymorphisms (SNPs) in regulatory regions [transcription factor binding sites (TFBSs) and enhancers] that could change the expression profiles of the multiple genes they act upon and thereby modify PDAC risk. We analyzed a total of 12,636 PDAC cases and 43,443 controls from PanScan/PanC4 and the East Asian GWAS (discovery populations) and the PANDoRA consortium (replication population). We identified four associations that reached study-wide statistical significance in the overall meta-analysis: rs2472632(A) (enhancer variant, OR 1.10, 95% CI 1.06, 1.13, p = 5.5 × 10⁻⁸), rs17358295(G) (enhancer variant, OR 1.16, 95% CI 1.10, 1.22, p = 6.1 × 10⁻⁷), rs2232079(T) (TFBS variant, OR 0.88, 95% CI 0.83, 0.93, p = 6.4 × 10⁻⁶), and rs10025845(A) (TFBS variant, OR 1.88, 95% CI 1.50, 1.12, p = 1.32 × 10⁻⁵). The SNP with the most significant association, rs2472632, is located in an enhancer predicted to target the coiled-coil domain containing 34 (CCDC34) oncogene. Our results provide new insights into genetic risk factors for PDAC through a focused analysis of polymorphisms in regulatory regions and demonstrate the usefulness of functional prioritization for identifying loci associated with PDAC risk.


Subjects
Carcinoma, Pancreatic Ductal , Pancreatic Neoplasms , Humans , Genome-Wide Association Study , Genetic Predisposition to Disease , Pancreatic Neoplasms/genetics , Pancreatic Neoplasms/epidemiology , Pancreatic Neoplasms/pathology , Carcinoma, Pancreatic Ductal/genetics , Carcinoma, Pancreatic Ductal/pathology , Regulatory Sequences, Nucleic Acid , Polymorphism, Single Nucleotide/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , Binding Sites/genetics
6.
Article in English | MEDLINE | ID: mdl-38347788

ABSTRACT

INTRODUCTION: Transcription factors are vital biological components that control gene expression, and their primary biological function is to recognize DNA sequences. As research has progressed, the specificity of DNA-protein binding has been found to play a significant role in gene expression, regulation, and especially gene therapy. Convolutional neural networks (CNNs) have become increasingly popular for predicting DNA-protein binding sites, but their prediction accuracy needs to be improved. METHODS: We proposed WSHNN, a framework that combines multi-instance learning (MIL) with a hybrid neural network. First, we used sliding windows to split each DNA sequence into multiple overlapping instances, so that each sequence forms a bag containing multiple instances. The instances were then encoded using k-mer encoding. Afterward, the scores of all instances in the same bag were calculated separately by a hybrid neural network, and a fully connected network produced the final prediction for that bag. RESULTS: The framework achieved a precision of 90.73%, a recall of 82.77%, an accuracy of 87.17%, an F1-score of 0.8657, and an MCC of 0.7462. In addition, we discussed the performance of k-mer encoding. Compared with other state-of-the-art methods, the model performed better using sequence information. CONCLUSION: The experimental results indicate that a bidirectional long short-term memory (Bi-LSTM) network can better capture long-range relationships within DNA sequences. The code and data are available at https://github.com/baowz12345/Weak_Super_Network.
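The instance-and-bag construction described in METHODS can be sketched in Python as follows. This is a minimal illustration, not the WSHNN code: the window length, stride, and k are hypothetical parameters, the input is assumed to contain only A, C, G, and T, and the neural network that scores instances is omitted.

```python
from itertools import product


def sequence_to_bag(seq, window=20, stride=5):
    """Split one DNA sequence into overlapping windows (instances); the list is its bag."""
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


def kmer_vocab(k):
    """Map every k-mer over ACGT to an integer index (4**k entries)."""
    return {"".join(p): idx for idx, p in enumerate(product("ACGT", repeat=k))}


def encode_instance(instance, k, vocab):
    """Encode an instance as the list of indices of its overlapping k-mers."""
    return [vocab[instance[i:i + k]] for i in range(len(instance) - k + 1)]


vocab = kmer_vocab(2)
bag = sequence_to_bag("ACGTACGTAC", window=4, stride=3)
print(bag)                                          # ['ACGT', 'TACG', 'GTAC']
print([encode_instance(x, 2, vocab) for x in bag])  # [[1, 6, 11], [12, 1, 6], [11, 12, 1]]
```

In an MIL setting, the per-instance scores of one bag are then combined into a single bag-level prediction; according to the abstract, WSHNN uses a fully connected network for that step.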

7.
R Soc Open Sci ; 11(1): 231088, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38269075

ABSTRACT

Transcription factor binding sites (TFBS), like other DNA sequences, evolve via mutation and selection relating to their function. Models of nucleotide evolution describe DNA evolution via single-nucleotide mutation. A stationary vector of such a model is the long-term distribution of nucleotides, unchanging under the model. Neutrally evolving sites may have uniform stationary vectors, but one expects that sites within a TFBS instead have stationary vectors reflective of the fitness of various nucleotides at those positions. We introduce 'position-specific stationary vectors' (PSSVs), the collection of stationary vectors at each site in a TFBS locus, analogous to the position weight matrix (PWM) commonly used to describe TFBS. We infer PSSVs for human TFs using two evolutionary models (Felsenstein 1981 and Hasegawa-Kishino-Yano 1985). We find that PSSVs reflect the nucleotide distribution from PWMs, but with reduced specificity. We infer ancestral nucleotide distributions at individual positions and calculate 'conditional PSSVs' conditioned on specific choices of majority ancestral nucleotide. We find that certain ancestral nucleotides exert a strong evolutionary pressure on neighbouring sequence while others have a negligible effect. Finally, we present a fast likelihood calculation for the F81 model on moderate-sized trees that makes this approach feasible for large-scale studies along these lines.
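To make the notion of a stationary vector concrete, the Python sketch below builds the F81 transition matrix, P_ij(t) = e^(-βt) δ_ij + (1 - e^(-βt)) π_j, and shows by repeated application that a uniform starting distribution converges to the equilibrium frequencies π. Under F81 the stationary vector is therefore the model's base-frequency parameter, so a PSSV amounts to one such vector per motif position. The frequencies and branch length are hypothetical, and this is not the authors' likelihood code.

```python
import math

BASES = "ACGT"


def f81_transition(pi, beta_t):
    """F81 transition matrix: P[i][j] = exp(-bt) * [i == j] + (1 - exp(-bt)) * pi[j]."""
    decay = math.exp(-beta_t)
    return [[decay * (i == j) + (1 - decay) * pi[j] for j in range(4)] for i in range(4)]


def evolve(dist, P):
    """Propagate a nucleotide distribution one step: new[j] = sum_i dist[i] * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(4)) for j in range(4)]


def stationary_by_iteration(P, steps=200):
    """Start from a uniform distribution and apply P repeatedly."""
    dist = [0.25] * 4
    for _ in range(steps):
        dist = evolve(dist, P)
    return dist


# Hypothetical frequencies for a single motif position that strongly prefers A.
pi = [0.7, 0.1, 0.1, 0.1]
P = f81_transition(pi, beta_t=0.5)
print({b: round(x, 6) for b, x in zip(BASES, stationary_by_iteration(P))})
# {'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1}
```

For HKY85 the stationary vector is likewise the base-frequency parameter, but the transition matrix also distinguishes transitions from transversions, so the closed form above no longer applies.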

8.
Math Biosci Eng ; 20(9): 15809-15829, 2023 07 31.
Article in English | MEDLINE | ID: mdl-37919990

ABSTRACT

Transcription factors (TFs) are important regulators of gene expression. Revealing the mechanisms that affect TF binding specificity is key to understanding gene regulation. Most previous studies focus on TF-DNA binding sites at the sequence level and seldom use the contextual features of DNA sequences. In this paper, we develop an integrated spatiotemporal context-aware neural network framework, named GNet, for predicting TF-DNA binding signals at single-nucleotide resolution by achieving three tasks: single-nucleotide-resolution signal prediction, identification of binding regions at the sequence level, and TF-DNA binding motif prediction. GNet extracts implicit spatial contextual information with a gated highway neural mechanism, which captures large-context, multi-level patterns using linear shortcut connections; this mechanism is used throughout the encoder and decoder of GNet. An improved dual external attention mechanism learns implicit relationships both within and among samples and improves model performance. Experimental results on 53 human TF ChIP-seq datasets and 6 chromatin accessibility ATAC-seq datasets show that GNet outperforms state-of-the-art methods in all three tasks, and cross-species studies on 15 human and 18 mouse TF datasets of the corresponding TF families indicate that GNet also outperforms competing methods in cross-species prediction.


Subjects
Nucleotides , Transcription Factors , Humans , Animals , Mice , Nucleotides/metabolism , Protein Binding , Transcription Factors/genetics , Chromatin , DNA
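The gated highway mechanism mentioned in the GNet abstract is, in its standard highway-network form, the gating equation below, where $H$ is a nonlinear transform, $T$ is a sigmoid gate, and the $(1 - T)$ term carries the input $\mathbf{x}$ through a linear shortcut. This is the generic formulation, offered as a sketch; GNet's exact layer definition may differ.

```latex
\mathbf{y} = H(\mathbf{x}; W_H) \odot T(\mathbf{x}; W_T)
           + \mathbf{x} \odot \bigl(1 - T(\mathbf{x}; W_T)\bigr),
\qquad
T(\mathbf{x}; W_T) = \sigma\!\left(W_T \mathbf{x} + \mathbf{b}_T\right)
```

When the gate output approaches 0 the layer reduces to the identity, which lets information from distant sequence context pass through many stacked layers.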
9.
Mol Biol Evol ; 40(5)2023 05 02.
Article in English | MEDLINE | ID: mdl-37172323

ABSTRACT

Changes in transcription factor binding sites (TFBSs) can alter the spatiotemporal expression pattern and transcript abundance of genes. Loss and gain of TFBSs have been shown to cause shifts in expression patterns in numerous cases. However, we know little about the evolution of extended regulatory sequences incorporating many TFBSs. We compare, across the crucifers (Brassicaceae, cabbage family), the sequences between the translated regions of Arabidopsis Bsister (ABS)-like MADS-box genes (including paralogous GOA-like genes) and the next gene upstream, as an example of family-wide evolution of putative upstream regulatory regions (PURRs). ABS-like genes are essential for integument development of ovules and endothelium formation in seeds of Arabidopsis thaliana. A combination of motif-based gene ontology enrichment and reporter gene analysis using A. thaliana as a common trans-regulatory environment allows analysis of selected Brassicaceae Bsister gene PURRs. Comparison of the TFBSs of transcriptionally active ABS-like genes with those of transcriptionally largely inactive GOA-like genes shows that the number of in silico predicted TFBSs is similar between paralogs, emphasizing the importance of experimental verification for in silico characterization of TFBS activity and analysis of their evolution. Further, our data show highly conserved expression of Brassicaceae ABS-like genes almost exclusively in the chalazal region of ovules. The Arabidopsis-specific insertion of a transposable element (TE) into the ABS PURR is required for stabilizing this spatially restricted expression, while other Brassicaceae achieve chalaza-specific expression without TE insertion. We hypothesize that the chalaza-specific expression of ABS is regulated by cis-regulatory elements provided by the TE.


Subjects
Arabidopsis Proteins , Arabidopsis , Brassica , Brassicaceae , Arabidopsis/metabolism , Brassicaceae/genetics , Brassicaceae/metabolism , DNA Transposable Elements , Arabidopsis Proteins/genetics , Seeds/genetics , Brassica/genetics , Gene Expression Regulation, Plant
10.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37114659

ABSTRACT

Cyclic AMP receptor proteins (CRPs) are important transcription regulators in many species. Prediction of CRP-binding sites has mainly been based on position weight matrices (PWMs). Traditional prediction methods considered only known binding motifs, and their ability to discover inflexible binding patterns was limited. Thus, in this research we developed a novel CRP-binding site prediction model, CRPBSFinder, which combines a hidden Markov model, knowledge-based PWMs, and structure-based binding affinity matrices. We trained this model using validated CRP-binding data from Escherichia coli and evaluated it with computational and experimental methods. The results show that the model not only provides higher prediction performance than a classic method but also quantitatively indicates the binding affinity of transcription factor binding sites through its prediction scores. The predictions included most known regulated genes as well as 1089 novel CRP-regulated genes. The major regulatory roles of CRPs were divided into four classes: carbohydrate metabolism, organic acid metabolism, nitrogen compound metabolism, and cellular transport. Several novel functions were also discovered, including heterocycle metabolism and response to stimulus. Based on the functional similarity of homologous CRPs, we applied the model to 35 other species. The prediction tool and the prediction results are available online at https://awi.cuhk.edu.cn/∼CRPBSFinder.


Subjects
Cyclic AMP Receptor Protein , Escherichia coli Proteins , Cyclic AMP Receptor Protein/genetics , Cyclic AMP Receptor Protein/chemistry , Cyclic AMP Receptor Protein/metabolism , Escherichia coli Proteins/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Binding Sites/genetics , Protein Binding/genetics
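CRPBSFinder combines several scoring sources; the position weight matrix component alone can be sketched in Python as a log-odds scan. The motif counts, pseudocount, and uniform background below are hypothetical, only the forward strand is scanned, and the hidden Markov model and structure-based affinity matrices of CRPBSFinder are not reproduced.

```python
import math

BASES = "ACGT"


def counts_to_log_odds(counts, background=0.25, pseudocount=0.5):
    """Convert per-position base counts into a log2-odds position weight matrix."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def scan(sequence, pwm):
    """Return (offset, score) for every window of motif width on the forward strand."""
    width = len(pwm)
    return [(i, sum(pwm[j][sequence[i + j]] for j in range(width)))
            for i in range(len(sequence) - width + 1)]


# Hypothetical 3-position motif "TGT" built from 10 aligned sites.
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]
pwm = counts_to_log_odds(counts)
best = max(scan("AATGTAA", pwm), key=lambda hit: hit[1])
print(best[0])  # 2
```

Higher scores indicate a closer match to the motif; a genome-wide scan would also score the reverse complement and apply a score threshold.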
11.
Proc Natl Acad Sci U S A ; 120(10): e2216907120, 2023 03 07.
Article in English | MEDLINE | ID: mdl-36853943

ABSTRACT

Ultraviolet (UV) light induces different classes of mutagenic photoproducts in DNA, namely cyclobutane pyrimidine dimers (CPDs), 6-4 photoproducts (6-4PPs), and atypical thymine-adenine photoproducts (TA-PPs). CPD formation is modulated by nucleosomes and transcription factors (TFs), which has important ramifications for UV mutagenesis. How chromatin affects the formation of 6-4PPs and TA-PPs is unclear. Here, we use UV damage endonuclease-sequencing (UVDE-seq) to map these UV photoproducts across the yeast genome. Our results indicate that nucleosomes, the fundamental building blocks of chromatin, have opposing effects on photoproduct formation. Nucleosomes induce CPDs and 6-4PPs at outward rotational settings in nucleosomal DNA but suppress TA-PPs at these settings. Our data also indicate that DNA binding by different classes of yeast TFs causes lesion-specific hotspots of 6-4PPs or TA-PPs. For example, DNA binding by the TF Rap1 generally suppresses CPD and 6-4PP formation but induces a TA-PP hotspot. Finally, we show that 6-4PP formation is strongly induced at the binding sites of TATA-binding protein (TBP), which correlates with higher mutation rates in UV-exposed yeast. These results indicate that the formation of 6-4PPs and TA-PPs is modulated by chromatin differently than that of CPDs and that this may have important implications for UV mutagenesis.


Subjects
Chromatin , Saccharomyces cerevisiae , Chromatin/genetics , Saccharomyces cerevisiae/genetics , Nucleosomes/genetics , Mutagenesis , Mutagens , Adenine , Pyrimidine Dimers/genetics
12.
Methods Mol Biol ; 2594: 173-183, 2023.
Article in English | MEDLINE | ID: mdl-36264496

ABSTRACT

Reconstruction of gene regulatory networks is an important but difficult problem in plant sciences. Recently, numerous high-throughput techniques, such as chromatin immunoprecipitation sequencing (ChIP-seq) and DNA affinity purification sequencing (DAP-seq), have been developed to identify the genomic binding landscapes of regulatory factors. Understanding the relationships between transcription factors (TFs) and their corresponding binding sites on target genes is usually the first step in elucidating gene regulatory mechanisms. Therefore, a good database of plant TFs and transcription factor binding sites (TFBSs) is useful before starting a series of complex experiments. In this chapter, PlantPAN (version 3.0) is used as an example to explain how bioinformatics systems advance research on gene regulation.


Subjects
Plants , Transcription Factors , Binding Sites , Protein Binding , Plants/genetics , Plants/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , DNA/metabolism
13.
Front Plant Sci ; 13: 970018, 2022.
Article in English | MEDLINE | ID: mdl-36082286

ABSTRACT

As sessile organisms, plants have elaborate transcriptional regulatory systems that allow them to adapt to variable surrounding environments. Current understanding of plant regulatory mechanisms is greatly constrained by limited knowledge of transcription factor (TF)-DNA interactions. To mitigate this problem, we developed Plant-DTI (Plant DBD-TFBS Interaction), the first machine-learning predictor covering the largest experimental datasets of 30 plant TF families, including 7 plant-specific DNA-binding domain (DBD) types, and their transcription factor binding sites (TFBSs). Plant-DTI introduced a novel TFBS feature construction, called TFBS base-preference, which enhanced the specificity of TFBSs to DBD types. The model showed better predictive performance with the TFBS base-preference than with a simple binary representation. Plant-DTI was validated with 22 independent ChIP-seq datasets. It accurately predicted the measured DBD-TFBS pairs along with their TFBS motifs and effectively predicted interactions of other TFs containing similar DBD types. Plant-DTI performed favorably in sensitivity and specificity relative to the existing state-of-the-art position weight matrix (PWM) and TSPTFBS methods. Finally, the Plant-DTI model helped fill the knowledge gap in the regulatory mechanisms of the cassava sucrose synthase 1 gene (MeSUS1): Plant-DTI predicted MeERF72 as a regulator of MeSUS1, consistent with a yeast one-hybrid (Y1H) experiment. Taken together, Plant-DTI should help facilitate the prediction of TF-TFBS and TF-target gene (TG) interactions, thereby accelerating the study of transcriptional regulatory systems in plant species.

14.
Plants (Basel) ; 11(14)2022 Jul 16.
Article in English | MEDLINE | ID: mdl-35890495

ABSTRACT

Salvia miltiorrhiza synthesises tanshinones, which have multidirectional therapeutic effects. These compounds have a complex biosynthetic pathway whose first rate-limiting enzyme is 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGR). In the present study, a new 1646 bp fragment of the S. miltiorrhiza HMGR4 gene, consisting of a promoter, the 5' untranslated region, and part of the coding sequence, was isolated and characterised in silico using bioinformatics tools. The results indicate the presence of a TATA box, a tandem repeat, and a pyrimidine-rich sequence, and the absence of CpG islands. The sequence was rich in motifs recognised by specific transcription factors responsive mainly to light, salicylic acid, bacterial infection, and auxins; it also contained many binding sites for microRNAs. Moreover, our results suggest that HMGR4 expression is possibly regulated during flowering, embryogenesis, organogenesis, and the circadian rhythm. The data obtained were verified by comparison with microarray co-expression results for Arabidopsis thaliana. Alignment of the isolated HMGR4 sequence with other plant HMGRs indicated the presence of many common transcription factor binding sites, including conserved ones. Our findings provide valuable information for understanding the mechanisms that direct transcription of the S. miltiorrhiza HMGR4 gene.
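The absence of CpG islands reported above is typically assessed with sliding-window criteria on GC content and the observed/expected CpG ratio. The Python sketch below uses commonly cited thresholds in the spirit of the Gardiner-Garden and Frommer criteria (a 200 bp window, a GC fraction of at least 0.5, and an observed/expected CpG ratio of at least 0.6); the abstract does not state which tool or thresholds the authors used, so these values are assumptions.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c * g)


def has_cpg_island(seq, window=200, min_gc=0.5, min_oe=0.6):
    """True if any window of the given length meets both thresholds."""
    for i in range(len(seq) - window + 1):
        w = seq[i:i + window]
        if gc_fraction(w) >= min_gc and cpg_obs_exp(w) >= min_oe:
            return True
    return False


print(has_cpg_island("CG" * 150), has_cpg_island("AT" * 150))  # True False
```

Sequences shorter than the window are reported as lacking an island, since no complete window fits.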

15.
Biochim Biophys Acta Gene Regul Mech ; 1865(5): 194847, 2022 07.
Article in English | MEDLINE | ID: mdl-35901946

ABSTRACT

Transcriptional regulation is key in bacteria for providing an adequate response in time and space to changing environmental conditions. However, despite decades of research, the binding sites, and therefore the target genes and functions, of most transcription factors (TFs) remain unknown. Filling this knowledge gap through conventional methods is a colossal task, which we demonstrate here can be significantly facilitated by a widespread feature of transcriptional control: the autoregulation of TFs, which implies that the yet-unknown transcription factor binding site (TFBS) lies near the TF gene itself. In this work, we describe the "AURTHO" methodology (AUtoregulation of oRTHOlogous transcription factors), which analyzes the upstream regions of orthologous TFs in order to uncover their associated TFBSs. AURTHO enabled the de novo identification of novel TFBSs with an unprecedented improvement in terms of quantity and reliability. DNA-protein interaction studies on a selection of candidate cis-acting elements yielded a >90% success rate, demonstrating the efficacy of AURTHO at highlighting true TF-TFBS couples and promising the identification of a plethora of TFBSs across all bacterial species in the near future.


Subjects
Regulatory Sequences, Nucleic Acid , Transcription Factors , Binding Sites , Homeostasis , Reproducibility of Results , Transcription Factors/metabolism
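The first step of an AURTHO-style analysis, collecting the upstream region of each orthologous TF gene so that shared motifs can be searched for, can be sketched in Python. The coordinate convention (0-based, half-open), the 300 nt default length, and the toy genome are hypothetical, and the subsequent motif-discovery step is not shown.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=300):
    """Return up to `length` nt upstream of a gene, oriented 5'->3' on the gene's strand.

    start/end are 0-based, half-open gene coordinates on `genome`.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    # On the minus strand, "upstream" lies to the right of the gene end.
    return revcomp(genome[end:end + length])


genome = "AAAACCCCGGGGTTTT"
print(upstream_region(genome, 4, 8, "+", length=4))  # AAAA
print(upstream_region(genome, 4, 8, "-", length=4))  # CCCC
```

Upstream sequences gathered this way from many orthologs would then be passed to a motif-discovery tool to look for a conserved candidate TFBS.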
16.
Brief Funct Genomics ; 21(5): 357-375, 2022 09 16.
Article in English | MEDLINE | ID: mdl-35652477

ABSTRACT

Transcription factors are important cellular components in the control of gene expression. Transcription factor binding sites are locations where transcription factors specifically recognize DNA sequences, targeting gene-specific regions and recruiting transcription factors or chromatin regulators to fine-tune spatiotemporal gene regulation. As common proteins, transcription factors play a meaningful role in life-related activities. With the growing number of protein sequences, methods that predict protein structure and function effectively are urgently needed. Current protein-DNA-binding site prediction methods are based on traditional machine learning algorithms and deep learning algorithms. Early methods for predicting protein-DNA-binding sites were usually based on traditional machine learning algorithms. In recent years, deep learning methods that predict protein-DNA-binding sites from sequence data have achieved remarkable success. Various statistical and machine learning methods for predicting the function of DNA-binding proteins have been proposed and continuously improved. Existing deep learning methods for predicting protein-DNA-binding sites can be roughly divided into three categories: convolutional neural networks (CNNs), recurrent neural networks (RNNs), and hybrid CNN-RNN networks. The purpose of this review is to provide an overview of the computational and experimental methods applied in the field of protein-DNA-binding site prediction today. This paper introduces traditional machine learning and deep learning methods for protein-DNA-binding site prediction, covering the data processing characteristics of existing learning frameworks and the differences between basic learning model frameworks. Existing methods in this field are relatively simple compared with those in natural language processing, computer vision, computer graphics, and other fields. Therefore, a summary of existing protein-DNA-binding site prediction methods will help researchers better understand this field.


Subjects
Algorithms , Computational Biology , Binding Sites , Chromatin , Computational Biology/methods , DNA , DNA-Binding Proteins , Transcription Factors
17.
Aging (Albany NY) ; 14(12): 5163-5176, 2022 06 21.
Article in English | MEDLINE | ID: mdl-35748775

ABSTRACT

BACKGROUND: Identifying candidate SNPs in transcription factor (TF) binding sites is a novel concept, and systematic large-scale studies of these SNPs are scarce. PURPOSE: This study aimed to identify SNPs in the binding sites (TFBSs) of six TFs and to examine the association between candidate SNPs and osteoporosis. METHODS: We used the Taiwan BioBank database, the University of California, Santa Cruz (UCSC) reference genome, and a chromatin immunoprecipitation sequencing (ChIP-seq) database to detect 14 SNPs at the potential binding sites of six TFs. We then performed a case-control study and genotyped 109 patients with osteoporosis (T-score ≤ -2.5 evaluated by dual-energy X-ray absorptiometry) and 262 healthy individuals (T-score ≥ -1) at Tri-Service General Hospital from 2015 to 2019. Furthermore, we used expression quantitative trait loci (eQTL) from the Genotype-Tissue Expression (GTEx) database to identify downstream gene expression as a criterion for the function of candidate SNPs. RESULTS: Bioinformatic analysis identified 14 TFBS SNPs influencing osteoporosis. Of these, the rs130347 CC + TC genotype was associated with lower odds of osteoporosis than the TT genotype (OR = 0.57, p = 0.031). eQTL validation revealed that the rs130347 T allele influences mRNA expression of the downstream gene A4GALT in whole blood (p = 0.0041) and skeletal tissues (p = 0.011). CONCLUSIONS: We identified the unique osteoporosis locus rs130347 in the Taiwanese population and functionally validated this finding. In the future, this strategy can be extended to other diseases to identify susceptibility loci and advance personalized precision medicine.


Subjects
Osteoporosis , Transcription Factors , Case-Control Studies , Computational Biology , Gene Expression , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Osteoporosis/genetics , Polymorphism, Single Nucleotide , Transcription Factors/genetics
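An odds ratio below 1, such as the OR of 0.57 reported for rs130347, means lower odds of disease in the carrier group. To show how an OR and its confidence interval follow from a 2x2 case-control table, the Python sketch below uses the Woolf (log-based) interval; the counts are hypothetical and are not the study's data.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio (a*d)/(b*c) with a Woolf log-based confidence interval."""
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    estimate = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(estimate) - z * se)
    upper = math.exp(math.log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts: genotype carriers vs non-carriers among cases and controls.
est, lo, hi = odds_ratio(20, 80, 50, 100)
print(round(est, 2), round(lo, 2), round(hi, 2))  # 0.5 0.28 0.91
```

An interval lying entirely below 1, as in this toy example, is consistent with a protective association; the Woolf interval requires all four cells to be non-zero.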
18.
Biology (Basel) ; 11(5)2022 Apr 29.
Article in English | MEDLINE | ID: mdl-35625412

ABSTRACT

Single nucleotide polymorphisms (SNPs) that are located in the promoter regions of genes and affect the binding of transcription factors (TFs) are called regulatory SNPs (rSNPs). Their identification can be highly valuable for the interpretation of genome-wide association studies (GWAS), since rSNPs can reveal the biologically causative variant and decipher the regulatory mechanisms behind a phenotype. In our previous work, we presented agReg-SNPdb, a database of regulatory SNPs for agriculturally important animal species. To complement this work, in this study we present the extension agReg-SNPdb-Plants, which stores rSNPs and their predicted effects on TF binding for 13 agriculturally important plant species and subspecies (Brassica napus, Helianthus annuus, Hordeum vulgare, Oryza glaberrima, Oryza glumipatula, Oryza sativa Indica, Oryza sativa Japonica, Solanum lycopersicum, Sorghum bicolor, Triticum aestivum, Triticum turgidum, Vitis vinifera, and Zea mays). agReg-SNPdb-Plants can be queried via a web interface that allows users to search for SNP IDs, chromosomal regions, or genes. For a comprehensive interpretation of GWAS results or larger SNP sets, the whole list of SNPs and their impact on transcription factor binding sites (TFBSs) can be downloaded chromosome by chromosome from the website.
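One simple way to express an rSNP's predicted effect on TF binding is the change in the best position weight matrix score across windows that overlap the variant. The Python sketch below assumes a log-odds PWM given as a list of per-position dictionaries and flanks of at least motif width minus one bases; the toy PWM and function names are hypothetical, only the forward strand is scored, and agReg-SNPdb-Plants' own scoring method may differ.

```python
def best_score(seq, pwm):
    """Best forward-strand log-odds score over all windows of motif width."""
    width = len(pwm)
    return max(sum(pwm[j][seq[i + j]] for j in range(width))
               for i in range(len(seq) - width + 1))


def snp_effect(left, ref, alt, right, pwm):
    """Change in best motif score (alt minus ref) over windows overlapping the SNP."""
    width = len(pwm)
    # Trim flanks to width - 1 bases so that every window overlaps the variant.
    left = left[max(0, len(left) - (width - 1)):]
    right = right[:width - 1]
    return best_score(left + alt + right, pwm) - best_score(left + ref + right, pwm)


# Hypothetical PWM for "TGT": +2 for the consensus base, -2 otherwise.
pwm = [{b: (2 if b == c else -2) for b in "ACGT"} for c in "TGT"]
print(snp_effect("AAT", "G", "A", "TAA", pwm))  # -4
```

A negative value means the alternative allele is predicted to weaken the motif match; a positive value suggests a gained or strengthened site.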

19.
Development ; 149(7)2022 04 01.
Article in English | MEDLINE | ID: mdl-35394007

ABSTRACT

A long-standing biological question is how DNA cis-regulatory elements shape transcriptional patterns during metazoan development. Reporter constructs, cell culture assays and computational modeling have made major contributions to answering this question, but analysis of elements in their natural context is an important complement. Here, we mutate Notch-dependent LAG-1 binding sites (LBSs) in the endogenous Caenorhabditis elegans sygl-1 gene, which encodes a key stem cell regulator, and analyze the consequences for sygl-1 expression (nascent transcripts, mRNA, protein) and stem cell maintenance. Mutating one LBS in a three-element cluster approximately halved both expression and stem cell pool size, whereas mutating two LBSs essentially abolished both. Heterozygous LBS mutant clusters yielded intermediate values. Our results lead to two major conclusions. First, both LBS number and configuration affect cluster activity: LBSs act additively in trans and synergistically in cis. Second, the SYGL-1 gradient promotes self-renewal above its functional threshold and triggers differentiation below it. Our approach of coupling CRISPR/Cas9 LBS mutations with both molecular and biological readouts establishes a powerful model for in vivo analyses of DNA cis-regulatory elements.
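The "synergistic in cis" conclusion can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes that, under pure additivity, each of the three sites would contribute an equal, independent share of wild-type activity, and it uses rough values read from the abstract's wording ("approximately halved", "essentially abolished") rather than measured data. It is an illustration, not the authors' analysis; the function names are hypothetical.

```python
# Approximate relative values taken from the abstract's wording (wild type = 1):
# one LBS mutated "approximately halved", two mutated "essentially abolished".
N_SITES = 3
OBSERVED = {0: 1.0, 1: 0.5, 2: 0.0}

def additive_cis_expectation(n_mutated, n_sites=N_SITES):
    """Expected activity if every intact site contributes an equal, independent share (an assumption)."""
    return (n_sites - n_mutated) / n_sites

def additive_trans_expectation(allele_a, allele_b):
    """Expected heterozygote value if the two homologous clusters contribute independently."""
    return (allele_a + allele_b) / 2

for k in (1, 2):
    expected = additive_cis_expectation(k)
    verdict = "below additive (synergy)" if OBSERVED[k] < expected else "consistent with additivity"
    print(f"{k} site(s) mutated: expected {expected:.2f}, observed ~{OBSERVED[k]:.1f} -> {verdict}")

# Trans: a wild-type / single-mutant heterozygote would be expected near the midpoint.
print(f"heterozygote expectation: {additive_trans_expectation(OBSERVED[0], OBSERVED[1]):.2f}")
```

In both cis cases the observed activity falls below the equal-share expectation (about 0.5 vs. 0.67, and about 0 vs. 0.33), meaning the intact sites work better together than alone. The intermediate heterozygote values are what the midpoint rule predicts for additive action in trans.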


Subjects
Caenorhabditis elegans , Regulatory Elements, Transcriptional , Stem Cells , Animals , Caenorhabditis elegans/cytology , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/genetics , Cell Self Renewal , DNA/metabolism , DNA-Binding Proteins/genetics , Receptors, Notch , Stem Cells/cytology
20.
Int J Mol Sci ; 23(3)2022 Feb 03.
Article in English | MEDLINE | ID: mdl-35163661

ABSTRACT

The identification of promoters is an essential step in genome annotation, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods remain time-consuming and expensive. Several computational approaches have therefore been developed to predict the locations of TSSs and regulatory motifs quickly and reliably on a genome-wide scale. Numerous studies have examined the regulatory elements of mammalian genomes, but plant promoters, especially those of gymnosperms, have received little attention and remain poorly investigated. The aim of this study was to enhance and expand existing genome annotations using computational approaches for genome-wide prediction of TSSs in four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS prediction in other genomes, especially draft assemblies, for which reliable TSS predictions are usually unavailable. We also explored features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with those of model monocot and dicot plants. We demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The TSS predictions for the four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides an initial basis for future experimental validation and for studying regulatory regions to understand gene regulation in gymnosperms.
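The abstract does not detail how promoter nucleotide composition was measured. The sketch below shows one generic approach: extract a window around a predicted TSS in the gene's own orientation, then compute GC content and a commonly used observed/expected CpG ratio. The window sizes (1,000 bp upstream, 100 bp downstream) and all function names are assumptions for illustration, not the authors' pipeline.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]

def promoter_window(seq, tss, upstream=1000, downstream=100, strand="+"):
    """Return the window around a 0-based TSS in the gene's orientation (TSS = first downstream base)."""
    if strand == "+":
        return seq[max(0, tss - upstream):tss + downstream].upper()
    # Minus strand: "upstream" lies at higher genomic coordinates.
    return reverse_complement(seq[max(0, tss - downstream + 1):tss + upstream + 1])

def gc_content(seq):
    """Fraction of G + C among unambiguous bases (N and other symbols are ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    cpg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cpg * len(seq) / (c * g)

# Toy sequence and coordinates, purely illustrative.
genome = "AAAACCCCGGGGTTTT"
win = promoter_window(genome, tss=8, upstream=4, downstream=2)
print(win, gc_content(win), round(cpg_obs_exp(win), 2))
```

Keeping the window in transcript orientation lets minus-strand genes be compared position by position with plus-strand ones, which matters when profiling composition relative to the TSS.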


Subjects
Genome, Plant , Tracheophyta/genetics , Transcription Initiation Site , Base Composition/genetics , Binding Sites , DNA, Plant/genetics , Exons/genetics , Molecular Sequence Annotation , Nucleotide Motifs/genetics , Nucleotides/metabolism , Open Reading Frames/genetics , Promoter Regions, Genetic , Transcription Factors/metabolism