Results 1 - 14 of 14
1.
J Cancer Res Ther ; 18(1): 231-239, 2022.
Article in English | MEDLINE | ID: mdl-35381789

ABSTRACT

Aims: Non-small-cell lung carcinoma comprises 85% of lung malignancies and is usually associated with a poor prognosis because it is diagnosed at advanced stages. Molecular diagnosis of computerized tomography (CT)-guided biopsy samples has the potential to identify subtypes of lung carcinoma, such as adenocarcinoma (AC) and squamous cell carcinoma (SCC), along with their molecular stratification. This approach will help predict the genetic signature of lung cancer in individual patients. Subjects and Methods: Histopathologically proven CT-guided biopsy samples of lung cancer cases were used to screen for the expression of microRNAs (miRNAs) earlier quantitated in blood plasma. Primers against hsa-miR2114, hsa-miR2115, hsa-miR2116, hsa-miR2117, hsa-miR449c, and hsa-miR548q, with RNU6 as the control, were used to screen 30 AC, 30 SCC, 5 nonspecific granulomatous inflammation, and 8 control samples. Reverse transcription polymerase chain reaction (RT-PCR) data revealed expression of hsa-miR2114 and hsa-miR548q in AC as well as SCC. Results: RT-PCR data revealed that hsa-miR2116 and hsa-miR449c were upregulated in AC, while hsa-miR2117 was expressed in SCC cases. Bioinformatic analysis revealed that the genes in which these miRNAs are located were also upregulated, while the targets of these miRNAs were downregulated. Conclusions: The miRNA expression pattern in CT-guided biopsy samples can be used as a potential tool to differentially diagnose lung cancer subtypes. The expression pattern of miRNAs matches very well between blood plasma and tissue samples, although levels were much lower in the former than in the latter. This approach can also be used for screening mutations and other molecular markers in a personalized manner for the management of lung cancer patients.


Subjects
Lung Neoplasms , MicroRNAs , Biopsy , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Lung/diagnostic imaging , Lung Neoplasms/diagnostic imaging , Lung Neoplasms/genetics , MicroRNAs/genetics , MicroRNAs/metabolism , Tomography, X-Ray Computed
2.
BMC Genomics ; 22(1): 336, 2021 May 10.
Article in English | MEDLINE | ID: mdl-33971818

ABSTRACT

BACKGROUND: Our understanding of genome regulation is ever-evolving with the continuous discovery of new modes of gene regulation, and transcriptomic studies of mammalian genomes have revealed a considerable population of non-coding RNA molecules among the expressed transcripts. One such class of molecules is the long non-coding RNAs (lncRNAs). However, the function of lncRNAs in gene regulation is not well understood; moreover, finding lncRNAs conserved across species is a challenging task. Therefore, we propose a novel approach to identify conserved lncRNAs and functionally annotate these molecules. RESULTS: In this study, we exploited existing myogenic transcriptome data and identified conserved lncRNAs in mice and humans. We identified the lncRNAs differentially expressed between the early and later stages of muscle development. Differential expression of these lncRNAs was confirmed experimentally in cultured mouse muscle C2C12 cells. We utilized the three-dimensional architecture of the genome and identified topologically associated domains for these lncRNAs. Additionally, we correlated the expression of genes in these domains for functional annotation of these trans-lncRNAs in myogenesis. Using this approach, we identified conserved lncRNAs in myogenesis and functionally annotated them. CONCLUSIONS: With this novel approach, we identified the conserved lncRNAs in myogenesis in humans and mice and functionally annotated them. The method identified a large number of lncRNAs involved in myogenesis. Further studies are required to investigate why these lncRNAs are conserved between human and mouse although their sequences are dissimilar. Our approach can be used to identify novel lncRNAs conserved in different species and functionally annotate them.


Subjects
RNA, Long Noncoding , Animals , Computational Biology , Genome , Mice , Muscle Development/genetics , RNA, Long Noncoding/genetics , Transcriptome
3.
Bioinformatics ; 37(1): 126-128, 2021 Apr 09.
Article in English | MEDLINE | ID: mdl-33367516

ABSTRACT

SUMMARY: Since its introduction, RNA-Seq technology has been used extensively in studies of pathogenic bacteria to identify and quantify differences in gene expression across multiple samples from bacteria exposed to different conditions. With some exceptions, tools for studying gene expression, determining differential gene expression, performing downstream pathway analysis and normalizing data collected in extreme biological conditions are still lacking. Here, we describe ProkSeq, a user-friendly, fully automated RNA-Seq data analysis pipeline designed for prokaryotes. ProkSeq provides a wide variety of options for analysing differential expression, normalizing expression data and visualizing data and results. AVAILABILITY AND IMPLEMENTATION: ProkSeq is implemented in Python and is published under the MIT source license. The pipeline is available as a Docker container https://hub.docker.com/repository/docker/snandids/prokseq-v2.0, or can be used through Anaconda: https://anaconda.org/snandiDS/prokseq. The code is available on Github: https://github.com/snandiDS/prokseq and detailed user documentation, including a manual and tutorial, can be found at https://prokseqV20.readthedocs.io. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
J Comput Biol ; 27(8): 1313-1328, 2020 08.
Article in English | MEDLINE | ID: mdl-31855461

ABSTRACT

Multiple transcription factors (TFs) bind to specific sites in the genome and interact among themselves to form cis-regulatory modules (CRMs). These modules are essential in modulating the expression of genes, and studying this interplay is important for understanding gene regulation. In the present study, we integrated experimentally identified TF binding sites collected from published studies with computationally predicted TF binding sites to identify Drosophila CRMs. Along with detecting previously known CRMs, this approach identified novel protein combinations. We determined high-occupancy target sites, where a large number of TFs bind. Investigating these sites revealed that Giant, Dichaete, and Knirps are highly enriched in these locations. A common TAG team motif was observed at these sites, which might play a role in recruiting other TFs. When comparing the binding sites at distal and proximal promoters, we found that certain regulatory TFs, such as Zelda, were highly enriched in enhancers. Our study has shown that, from the information available concerning the TF binding sites, real CRMs can be predicted accurately and efficiently. Although we can claim only the co-occurrence of these proteins in this study, it may actually point to their interaction (as known interacting proteins typically co-occur). Such an integrative approach can, therefore, provide a better understanding of the interplay among the factors, even though further experimental verification is required.


Subjects
Drosophila Proteins/genetics , Nuclear Proteins/genetics , Repressor Proteins/genetics , SOX Transcription Factors/genetics , Transcription Factors/genetics , Animals , Binding Sites/genetics , Computational Biology , DNA-Binding Proteins/genetics , Gene Expression Regulation/genetics , Genome, Insect/genetics , Regulatory Elements, Transcriptional , Regulatory Sequences, Nucleic Acid/genetics , Software
5.
Sci Rep ; 9(1): 3753, 2019 03 06.
Article in English | MEDLINE | ID: mdl-30842590

ABSTRACT

The large amount of sequence data in private and public databases produced by next-generation sequencing poses new challenges because of the limitations of alignment-based methods for sequence comparison. There is therefore a pressing need for faster sequence analysis algorithms. In this study, we developed an alignment-free algorithm for faster sequence analysis. The novelty of our approach is the inclusion of a fuzzy integral with a Markov chain for sequence analysis in the alignment-free model. The method estimates the parameters of a Markov chain by considering the frequencies of occurrence of all possible nucleotide pairs in each DNA sequence. These estimated Markov chain parameters were used to calculate the similarity among all pairwise combinations of DNA sequences based on a fuzzy integral algorithm. The resulting similarity matrix is used as input for the neighbor program in the PHYLIP package for phylogenetic tree construction. Our method was tested on eight benchmark datasets and on in-house generated datasets (18S rDNA sequences from 11 arbuscular mycorrhizal fungi (AMF) and 16S rDNA sequences of 40 bacterial isolates from plant interior). The results indicate that the fuzzy integral algorithm is an efficient and feasible alignment-free method for sequence analysis on the genomic scale.
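
As a rough illustration of the first step described in the abstract above, the sketch below estimates first-order Markov chain transition probabilities from the nucleotide pair counts of a single DNA sequence. It is only a minimal sketch: the function name `transition_matrix` and the `pseudocount` smoothing are assumptions made for this example, and the fuzzy integral similarity step is not reproduced because the abstract does not specify it.

```python
from collections import Counter

ALPHABET = "ACGT"


def transition_matrix(seq, pseudocount=1.0):
    """Estimate P(next base | current base) from adjacent nucleotide pairs.

    The pseudocount is an assumption of this sketch (it avoids zero
    probabilities for short sequences); the abstract does not mention it.
    """
    seq = seq.upper()
    # Count every adjacent pair, skipping pairs with ambiguous symbols such as N.
    pair_counts = Counter(
        (a, b) for a, b in zip(seq, seq[1:]) if a in ALPHABET and b in ALPHABET
    )
    matrix = {}
    for a in ALPHABET:
        row_total = sum(pair_counts[(a, b)] for b in ALPHABET)
        row_total += pseudocount * len(ALPHABET)
        matrix[a] = {
            b: (pair_counts[(a, b)] + pseudocount) / row_total for b in ALPHABET
        }
    return matrix


# Toy sequence, purely illustrative.
m = transition_matrix("ACGTACGTAC")
```

Each sequence is reduced to a fixed-size 4 x 4 table, so two sequences of different lengths can be compared without aligning them.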


Subjects
Bacteria/genetics , Computational Biology/methods , Mycorrhizae/genetics , Sequence Analysis, DNA/methods , Algorithms , Bacteria/isolation & purification , Cluster Analysis , DNA, Ribosomal/genetics , Fuzzy Logic , Markov Chains , Mycorrhizae/isolation & purification , Phylogeny , Plants/microbiology
6.
Sci Rep ; 9(1): 2775, 2019 02 26.
Article in English | MEDLINE | ID: mdl-30808983

ABSTRACT

Sequence comparison is an essential part of modern molecular biology research. In this study, we estimated the parameters of a Markov chain by considering the frequencies of occurrence of all possible amino acid pairs in each protein sequence, without requiring an alignment. These estimated Markov chain parameters were used to calculate the similarity between two protein sequences based on a fuzzy integral algorithm. For validation, our results were compared with both alignment-based (ClustalW) and alignment-free methods on six benchmark datasets. The results indicate that our algorithm has better clustering performance for protein sequence comparison.


Subjects
Proteins/chemistry , Algorithms , Amino Acid Sequence , Electron Transport Complex I/chemistry , Electron Transport Complex I/classification , Humans , Markov Chains , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/classification , NADH Dehydrogenase/chemistry , NADH Dehydrogenase/classification , Phylogeny , Proteins/classification , Sequence Alignment
7.
J Biol Chem ; 293(37): 14342-14358, 2018 09 14.
Article in English | MEDLINE | ID: mdl-30068546

ABSTRACT

Polycomb group proteins are essential epigenetic repressors. They form multiple protein complexes, of which two kinds, PRC1 and PRC2, are indispensable for repression. Although much is known about their biochemical properties, how mammalian PRC1 and PRC2 are targeted to specific genes is poorly understood. Here, we establish the cyclin D2 (CCND2) oncogene as a simple model to address this question. We provide evidence that the targeting of PRC1 to CCND2 involves a dedicated PRC1-targeting element (PTE). The PTE appears to act in concert with an adjacent cytosine-phosphate-guanine (CpG) island to arrange for the robust binding of PRC1 and PRC2 to repressed CCND2. Our findings pave the way to identifying sequence-specific DNA-binding proteins implicated in the targeting of mammalian PRC1 complexes and provide a novel link between polycomb repression and cancer.


Subjects
Cyclin D2/genetics , Cyclin D2/metabolism , Oncogenes , Polycomb-Group Proteins/metabolism , Animals , Binding Sites , Gene Silencing , Humans , Mice , Protein Binding , Transcription, Genetic
8.
Nucleic Acids Res ; 41(19): 8822-41, 2013 Oct.
Article in English | MEDLINE | ID: mdl-23913413

ABSTRACT

In higher organisms, gene regulation is controlled by the interplay of non-random combinations of multiple transcription factors (TFs). Although numerous attempts have been made to identify these combinations, important details, such as the mutual positioning of the factors, which has an important role in the TF interplay, are still missing. The goal of the present work is the in silico mapping of some of these associating factors based on their mutual positioning, using computational screening. We selected the process of myogenesis as a case study and focused on TF combinations involving the master myogenic TF myogenic differentiation (MyoD) with other factors situated at specific distances from it. The results of our work show that some muscle-specific factors occur together with MyoD within the range of ±100 bp in a large number of promoters. We confirm the co-occurrence of MyoD with muscle-specific factors as described in earlier studies. However, we have also found novel relationships of MyoD with other factors not specific for muscle. Additionally, we have observed that MyoD tends to associate with different factors in proximal and distal promoter areas. The major outcome of our study is establishing the genome-wide connection between biological interactions of TFs and close co-occurrence of their binding sites.
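
The positional criterion mentioned in the abstract above (factors occurring within ±100 bp of MyoD) can be sketched as a simple window search over binding-site coordinates. This is a minimal illustration only: the function name `cooccurring_pairs`, the use of single-coordinate site positions, and the toy coordinates are assumptions for this example and do not come from the study.

```python
from bisect import bisect_left, bisect_right


def cooccurring_pairs(anchor_sites, partner_sites, window=100):
    """Return (anchor, partner) position pairs that lie within +/- window bp.

    Positions are assumed to be integer coordinates on the same promoter.
    """
    partners = sorted(partner_sites)
    pairs = []
    for anchor in anchor_sites:
        # Binary search for the partners inside [anchor - window, anchor + window].
        lo = bisect_left(partners, anchor - window)
        hi = bisect_right(partners, anchor + window)
        pairs.extend((anchor, p) for p in partners[lo:hi])
    return pairs


# Hypothetical coordinates for illustration only.
myod_sites = [150, 900]
partner_sites = [40, 230, 260, 1200]
pairs = cooccurring_pairs(myod_sites, partner_sites)
```

Sorting the partner sites once and using binary search keeps the scan efficient when many promoters and factors are screened.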


Subjects
MyoD Protein/metabolism , Promoter Regions, Genetic , Transcription Factors/metabolism , Animals , Binding Sites , Computer Simulation , Enhancer Elements, Genetic , Humans , Mice , Muscle Development/genetics , Myoblasts/metabolism
9.
BMC Genomics ; 13: 416, 2012 Aug 22.
Article in English | MEDLINE | ID: mdl-22913572

ABSTRACT

BACKGROUND: Identifying binding sites for transcription factors is a key component of gene regulatory network analysis. This is often done using position-weight matrices (PWMs). Because of the importance of in silico mapping of tentative binding sites, we previously developed an approach for PWM optimization that substantially improves the accuracy of such mapping. RESULTS: The present work applies the optimization algorithm to the existing PWM for the GATA-3 transcription factor and builds a new di-nucleotide PWM. The existing PWM is based on experimental data taken from JASPAR. The optimized PWM substantially improves the sensitivity and specificity of the TF mapping compared with conventional applications. The refined PWM also facilitates in silico identification of novel binding sites that are supported by experimental data. We also describe uncommon positioning of binding motifs for several T-cell lineage specific factors in human promoters. CONCLUSION: Our proposed di-nucleotide PWM approach outperforms the conventional mono-nucleotide PWM approach with respect to GATA-3. Our new di-nucleotide PWM therefore provides new insight into plausible transcriptional regulatory interactions in human promoters.
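
For readers unfamiliar with the baseline that the abstract above compares against, the sketch below shows conventional mono-nucleotide PWM scoring: a count matrix is converted to log-odds scores and slid along a sequence. It is a minimal sketch under stated assumptions: the count matrix is a made-up toy motif (not the JASPAR GATA-3 matrix), the pseudocount, uniform background and threshold are arbitrary choices for this example, and neither the authors' optimization algorithm nor their di-nucleotide matrix is reproduced.

```python
import math


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column.get(base, 0) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous symbols
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical 4-column count matrix, for illustration only.
toy_counts = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 10, "C": 0, "G": 0, "T": 0},
]
pwm = log_odds_pwm(toy_counts)
hits = scan("CCGATAGGTTGATA", pwm, threshold=4.0)
```

A mono-nucleotide matrix of this kind treats each position independently; a di-nucleotide matrix instead scores adjacent base pairs, which lets it capture dependencies between neighbouring positions.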


Subjects
Binding Sites , Computational Biology/methods , GATA3 Transcription Factor/genetics , Position-Specific Scoring Matrices , Algorithms , Databases, Genetic , Humans , Promoter Regions, Genetic
10.
Nucleic Acids Res ; 40(17): 8227-39, 2012 Sep 01.
Article in English | MEDLINE | ID: mdl-22730291

ABSTRACT

The Six1 transcription factor is a homeodomain protein involved in controlling gene expression during embryonic development. Six1 establishes gene expression profiles that enable skeletal myogenesis and nephrogenesis, among others. While several homeodomain factors have been extensively characterized with regard to their DNA-binding properties, relatively little is known of the properties of Six1. We have used the genomic binding profile of Six1 during the myogenic differentiation of myoblasts to obtain a better understanding of its preferences for recognizing certain DNA sequences. DNA sequence analyses of our genomic binding dataset, combined with biochemical characterization using binding assays, reveal that Six1 has a much broader DNA-binding sequence spectrum than had been previously determined. Moreover, using a position weight matrix optimization algorithm, we generated a highly sensitive and specific matrix that can be used to predict novel Six1-binding sites with high accuracy. Furthermore, our results support a mode of DNA recognition in which Six1 itself is sufficient for sequence discrimination, and in which Six1 domains outside of its homeodomain contribute to binding site selection. Together, our results shed new light on the properties of this important transcription factor and will enable more accurate modeling of Six1 function in bioinformatic studies.


Subjects
DNA/chemistry , Homeodomain Proteins/metabolism , Animals , Binding Sites , DNA/metabolism , Genomics/methods , Mice , Myoblasts/metabolism , Nucleotide Motifs , Position-Specific Scoring Matrices , Protein Binding , Protein Structure, Tertiary , Sequence Analysis, DNA
11.
Adv Bioinformatics ; 2011: 743782, 2011.
Article in English | MEDLINE | ID: mdl-21541071

ABSTRACT

Various enzyme identification protocols involving homology transfer by sequence-sequence or profile-sequence comparisons have been devised that utilise Swiss-Prot sequences associated with EC numbers as the training set. A profile HMM constructed for a particular EC number might select sequences that perform a different enzymatic function because of the presence of certain fold-specific residues that are conserved in enzymes sharing a common fold. We describe a protocol, ModEnzA (HMM-ModE Enzyme Annotation), which generates profile HMMs that are highly specific at a functional level, as defined by the EC numbers, by incorporating information from negative training sequences. We enrich the training dataset by mining sequences from the NCBI Non-Redundant database for increased sensitivity. We compare our method with other enzyme identification methods, both for assigning EC numbers to a genome and for identifying protein sequences associated with an enzymatic activity. We report a sensitivity of 88% and a specificity of 95% in identifying EC numbers and annotating enzymatic sequences from the E. coli genome, which is higher than for any other method. With next-generation sequencing methods producing a huge amount of sequence data, the development and use of fully automated yet accurate protocols such as ModEnzA is warranted for rapid annotation of newly sequenced genomes and metagenomic sequences.

12.
Mycopathologia ; 164(1): 1-17, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17574539

ABSTRACT

In the absence of steroid receptors and any known mechanism of gene regulation by steroid hormones in Candida albicans, we performed a genome-wide analysis of C. albicans cells treated with progesterone, using Eurogentec cDNA microarrays, to find the complete repertoire of steroid-responsive genes. Northern blotting analysis was employed to validate the genes that were differentially regulated by progesterone in the microarray experiments. A total of 99 genes were found to be significantly regulated by progesterone, of which 60 were up-regulated and 39 were down-regulated. Progesterone considerably enhanced the expression of multi-drug resistance (MDR) genes belonging to the ATP-binding cassette super-family of multidrug transporters (CDR1 and CDR2), suggesting a possible relationship between steroid stress and MDR genes. Several genes associated with hyphal induction and the establishment of pathogenesis were also found to be up-regulated. An in silico search for various transcription factor (TF) binding sites in the promoters of the affected genes revealed that EFG1, CPH1, NRG1, TUP1, MIG1 and AP-1 regulated genes are responsive to progesterone. The stress responsive elements (STRE; AG(4) or C(4)T) were also found in the promoters of several responsive genes. Our data shed new light on the regulation of gene expression in C. albicans by human steroids, and its correlation with drug resistance, virulence, morphogenesis and general stress response. A comparison with the drug-induced stress response is also discussed.


Subjects
Candida albicans/genetics , Gene Expression Regulation, Fungal/drug effects , Genome, Fungal , Progesterone/pharmacology , ATP-Binding Cassette Transporters/genetics , Blotting, Northern , Candida albicans/drug effects , Candida albicans/growth & development , Candidiasis/microbiology , Drug Resistance, Fungal/genetics , Fungal Proteins/genetics , Humans , Oligonucleotide Array Sequence Analysis
13.
BMC Bioinformatics ; 8: 104, 2007 Mar 27.
Article in English | MEDLINE | ID: mdl-17389042

ABSTRACT

BACKGROUND: Profile Hidden Markov Models (HMMs) are statistical representations of protein families derived from patterns of sequence conservation in multiple alignments and have been used in identifying remote homologues with considerable success. These conservation patterns arise from fold-specific signals, shared across multiple families, and function-specific signals unique to the families. The availability of sequences pre-classified according to their function permits the use of negative training sequences to improve the specificity of the HMM, both by optimizing the threshold cutoff and by modifying emission probabilities to minimize the influence of fold-specific signals. A protocol to generate family-specific HMMs is described that first constructs a profile HMM from an alignment of the family's sequences and then uses this model to identify sequences belonging to other classes that score above the default threshold (false positives). Ten-fold cross validation is used to optimise the discrimination threshold score for the model. The advent of fast multiple alignment methods enables the use of the profile alignments to align the true and false positive sequences, and the resulting alignments are used to modify the emission probabilities in the original model. RESULTS: The protocol, called HMM-ModE, was validated on a set of sequences belonging to six sub-families of the AGC family of kinases. These sequences have an average sequence similarity of 63% among the group, though each sub-group has a different substrate specificity. Optimising the discrimination threshold by using negative sequences scored against the model improves specificity in test cases from an average of 21% to 98%. Modifying the model probabilities using negative training sequences provides further discrimination in a few cases, with the average specificity rising to 99%. Similar improvements were obtained with a sample of G-protein coupled receptors sub-classified with respect to their substrate specificity, though the average sequence identity across the sub-families is just 20.6%. The protocol is applied in a high-throughput classification exercise on protein kinases. CONCLUSION: The protocol has the potential to maximise the contributions of discriminating residues to classify proteins based on their molecular function, using pre-classified positive and negative sequence training data. The high specificity of the method and the increasing availability of pre-classified sequence data hold the potential for its application in sequence annotation.
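
The idea of using negative training sequences to pick a discrimination threshold, as described in the abstract above, can be illustrated with a small sketch. It assumes that model scores for positive (family) and negative (other-class) sequences are already available; the scores below are invented, and the selection criterion used here (Youden's J, that is, sensitivity + specificity - 1) is an assumption of this example, not necessarily the criterion used by HMM-ModE. The ten-fold cross validation is not reproduced.

```python
def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(s >= threshold for s in pos_scores)
    true_neg = sum(s < threshold for s in neg_scores)
    return true_pos / len(pos_scores), true_neg / len(neg_scores)


def best_threshold(pos_scores, neg_scores):
    """Pick the observed score that maximises sensitivity + specificity - 1."""
    candidates = sorted(set(pos_scores) | set(neg_scores))

    def youden(threshold):
        sens, spec = sensitivity_specificity(pos_scores, neg_scores, threshold)
        return sens + spec - 1.0

    return max(candidates, key=youden)


# Hypothetical model scores, for illustration only.
positives = [120.0, 95.5, 88.0, 60.2]
negatives = [90.0, 55.0, 30.3, 12.8]
threshold = best_threshold(positives, negatives)
sens, spec = sensitivity_specificity(positives, negatives, threshold)
```

In practice the threshold would be chosen on training folds and evaluated on the held-out fold, so that the reported specificity is not measured on the same sequences used to set the cutoff.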


Subjects
Algorithms , Artificial Intelligence , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Data Interpretation, Statistical , Discriminant Analysis , Markov Chains , Models, Chemical , Models, Statistical
14.
BMC Genomics ; 6: 116, 2005 Sep 09.
Article in English | MEDLINE | ID: mdl-16150155

ABSTRACT

BACKGROUND: Theoretical proteome plots, generated by plotting theoretical isoelectric points (pI) against molecular masses of all proteins encoded by the genome, show a multimodal distribution for pI. This multimodal distribution is an effect of the allowed combinations of charged amino acids, and is not due to evolutionary causes. The variation in this distribution can be correlated with the organism's ecological niche. Contributions to this variation may be mapped to individual proteins by studying the variation in pI of orthologs across microorganism genomes. RESULTS: The distribution of ortholog pI values showed trimodal distributions for all prokaryotic genomes analyzed, similar to whole proteome plots. Pairwise analysis of pI variation shows that a few COGs are conserved within, but most vary between, the acidic and basic regions of the distribution, while molecular mass is more highly conserved. At the level of functional grouping of orthologs, five groups vary significantly from the population of orthologs, which is attributed either to conservation at the level of sequences or to a bias for either positively or negatively charged residues contributing to the function. Individual COGs conserved in both the acidic and basic regions of the trimodal distribution are identified, and orthologs that best represent the variation in levels of the acidic and basic regions are listed. CONCLUSION: The analysis of pI distribution using orthologs provides a basis for resolving theoretical proteome comparisons at the level of individual proteins. The orthologs identified as varying significantly between the major acidic and basic regions may be used as representatives of the variation of the entire proteome.


Subjects
Computational Biology/methods , Genome, Bacterial , Proteome , Proteomics/methods , Bacterial Proteins , Cluster Analysis , Computer Simulation , Databases, Protein , Electrophoresis, Gel, Two-Dimensional , Hydrogen-Ion Concentration , Isoelectric Point , Models, Statistical , Open Reading Frames , Proteins/chemistry