Results 1 - 20 of 32
1.
Small ; : e2400155, 2024 Apr 21.
Article in English | MEDLINE | ID: mdl-38644332

ABSTRACT

Nanopatterning driven by electrohydrodynamic (EHD) instability can aid in the resolution of the drawbacks inherent in conventional imprinting or other molding methods. This is because EHD force negates the requirement of physical contact and is easily tuned. However, its potential has not been examined owing to the limited size of the pattern replica (several to tens of micrometers). Thus, this study proposes a new route for large-area patterning through high-speed evolution of EHD-driven pattern growth along the in-plane axis. Through the acceleration of the in-plane growth, while selectively controlling a specific edge growth, the pattern replica area can be extended from the micro- to centimeter scale with high fidelity. Moreover, even in the case of nonuniform contact mode, the proposed rapid in-plane growth mode facilitates uniform large-scale replication, which is not possible in conventional imprinting or other molding methods.

2.
PeerJ ; 12: e17006, 2024.
Article in English | MEDLINE | ID: mdl-38426141

ABSTRACT

Single-cell omics sequencing has rapidly advanced, enabling the quantification of diverse omics profiles at a single-cell resolution. To facilitate comprehensive biological insights, such as cellular differentiation trajectories, precise annotation of cell subtypes is essential. Conventional methods involve clustering cells and manually assigning subtypes based on canonical markers, a labor-intensive and expert-dependent process. Hence, an automated computational prediction framework is crucial. While several classification frameworks for predicting cell subtypes from single-cell RNA sequencing datasets exist, these methods solely rely on single-omics data, offering insights at a single molecular level. They often miss inter-omic correlations and a holistic understanding of cellular processes. To address this, the integration of multi-omics datasets from individual cells is essential for accurate subtype annotation. This article introduces moSCminer, a novel framework for classifying cell subtypes that harnesses the power of single-cell multi-omics sequencing datasets through an attention-based neural network operating at the omics level. By integrating three distinct omics datasets (gene expression, DNA methylation, and DNA accessibility) while accounting for their biological relationships, moSCminer excels at learning the relative significance of each omics feature. It then transforms this knowledge into a novel representation for cell subtype classification. Comparative evaluations against standard machine learning-based classifiers demonstrate moSCminer's superior performance, consistently achieving the highest average performance on real datasets. The efficacy of multi-omics integration is further corroborated through an in-depth analysis of the omics-level attention module, which identifies potential markers for cell subtype annotation. 
To enhance accessibility and scalability, moSCminer is accessible as a user-friendly web-based platform seamlessly connected to a cloud system, publicly accessible at http://203.252.206.118:5568. Notably, this study marks the pioneering integration of three single-cell multi-omics datasets for cell subtype identification.


Subjects
Multiomics , Neural Networks, Computer , Machine Learning , DNA Methylation/genetics
3.
Nano Lett ; 23(24): 11949-11957, 2023 Dec 27.
Article in English | MEDLINE | ID: mdl-38079430

ABSTRACT

Electrohydrodynamic (EHD)-driven patterning is a pioneering lithographic technique capable of replicating and modifying micro/nanostructures efficiently. However, this process is currently restricted to conventional substrates, as it necessitates a uniform and robust electric field over a large area. Consequently, the use of nontraditional substrates, such as those that are flexible, nonflat, or have high insulation, has been notably limited. In our study, we extend the applicability of EHD-driven patterning by introducing a solvent-assisted capillary peel-and-transfer method that allows the successful removal of diverse EHD-induced structures from their original substrates. Compared with the traditional route, our process boasts a success rate close to 100%. The detached structures can then be efficiently transferred to nonconventional substrates, overcoming the limitations of the traditional EHD process. Our method exhibits significant versatility, as evidenced by successful transfer of structures with engineered wettability and patterned structures composed of metals and metal oxides onto nonconventional substrates.

4.
BMC Bioinformatics ; 24(1): 169, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101124

ABSTRACT

BACKGROUND: Breast cancer is a highly heterogeneous disease that comprises multiple biological components. Owing to its diversity, patients have different prognostic outcomes; hence, early diagnosis and accurate subtype prediction are critical for treatment. Standardized breast cancer subtyping systems, mainly based on single-omics datasets, have been developed to ensure proper treatment in a systematic manner. Recently, multi-omics data integration has attracted attention as a way to provide a comprehensive view of patients, but it poses a challenge due to the high dimensionality. In recent years, deep learning-based approaches have been proposed, but they still present several limitations. RESULTS: In this study, we describe moBRCA-net, an interpretable deep learning-based breast cancer subtype classification framework that uses multi-omics datasets. Three omics datasets comprising gene expression, DNA methylation and microRNA expression data were integrated while considering the biological relationships among them, and a self-attention module was applied to each omics dataset to capture the relative importance of each feature. The features were then transformed to new representations considering the respective learned importance, allowing moBRCA-net to predict the subtype. CONCLUSIONS: Experimental results confirmed that moBRCA-net has a significantly enhanced performance compared with other methods, and the effectiveness of multi-omics integration and omics-level attention was identified. moBRCA-net is publicly available at https://github.com/cbi-bioinfo/moBRCA-net .


Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Multiomics , Algorithms , Neural Networks, Computer
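
The abstract of entry 4 describes an attention step that captures the relative importance of each feature and then re-weights the features. The sketch below illustrates only that general idea with a softmax over importance scores. It is not the moBRCA-net implementation: the function names, the toy values, and the fixed `scores` list (which a real network would learn) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features, scores):
    """Re-weight each feature by its attention weight.

    `scores` stands in for learned importance scores; in a trained
    network they would come from trainable layers, not a fixed list.
    """
    weights = softmax(scores)
    reweighted = [w * x for w, x in zip(weights, features)]
    return weights, reweighted


# Hypothetical toy values for one omics block (not taken from the paper).
expression = [2.0, 0.5, 1.0]
scores = [1.0, 0.0, 1.0]
weights, reweighted = attend(expression, scores)
print(weights)      # attention weights, summing to 1
print(reweighted)   # features scaled by their weights
```

Because the weights sum to 1, they can be read as a relative ranking of the features, which is what makes this kind of module usable for interpretation.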
5.
BMC Bioinformatics ; 24(1): 168, 2023 Apr 26.
Article in English | MEDLINE | ID: mdl-37101254

ABSTRACT

BACKGROUND: Identification of the cancer subtype plays a crucial role in providing an accurate diagnosis and proper treatment to improve the clinical outcomes of patients. Recent studies have shown that DNA methylation is one of the key factors for tumorigenesis and tumor growth, where the DNA methylation signatures have the potential to be utilized as cancer subtype-specific markers. However, due to the high dimensionality and the low number of DNA methylome cancer samples with the subtype information, no cancer subtype classification method utilizing DNA methylome datasets has been proposed to date. RESULTS: In this paper, we present meth-SemiCancer, a semi-supervised cancer subtype classification framework based on DNA methylation profiles. The proposed model was first pre-trained based on the methylation datasets with the cancer subtype labels. After that, meth-SemiCancer generated the pseudo-subtypes for the cancer datasets without subtype information based on the model's prediction. Finally, fine-tuning was performed utilizing both the labeled and unlabeled datasets. CONCLUSIONS: From the performance comparison with the standard machine learning-based classifiers, meth-SemiCancer achieved the highest average F1-score and Matthews correlation coefficient, outperforming other methods. Fine-tuning the model with the unlabeled patient samples by providing the proper pseudo-subtypes encouraged meth-SemiCancer to generalize better than the supervised neural network-based subtype classification method. meth-SemiCancer is publicly available at https://github.com/cbi-bioinfo/meth-SemiCancer .


Subjects
DNA Methylation , Neoplasms , Humans , Supervised Machine Learning , Neoplasms/genetics , Machine Learning , Neural Networks, Computer
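
The three steps in the abstract of entry 5 (pre-train on labeled samples, assign pseudo-subtypes to unlabeled samples, then fine-tune on both) can be sketched in a few lines. The sketch swaps the neural network for a nearest-centroid classifier so that it stays dependency-free; that substitution, the function names, and the toy data are assumptions, not part of meth-SemiCancer.

```python
def centroids(samples, labels):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, x):
    """Label of the nearest centroid (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda y: dist(model[y]))


def pseudo_label_fit(labeled_x, labeled_y, unlabeled_x):
    # Step 1: "pre-train" on the labeled samples only.
    model = centroids(labeled_x, labeled_y)
    # Step 2: assign pseudo-labels to the unlabeled samples.
    pseudo_y = [predict(model, x) for x in unlabeled_x]
    # Step 3: "fine-tune" by refitting on labeled plus pseudo-labeled data.
    model = centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
    return model, pseudo_y


# Hypothetical two-feature methylation profiles (toy values).
labeled_x = [[0.1, 0.2], [0.9, 0.8]]
labeled_y = ["A", "B"]
unlabeled_x = [[0.2, 0.1], [0.8, 0.9], [0.0, 0.0]]
model, pseudo_y = pseudo_label_fit(labeled_x, labeled_y, unlabeled_x)
print(pseudo_y)
print(predict(model, [0.7, 0.7]))
```

A practical pipeline would usually keep only confident pseudo-labels; this sketch accepts all of them for brevity.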
6.
Cancers (Basel) ; 13(19)2021 Sep 26.
Article in English | MEDLINE | ID: mdl-34638295

ABSTRACT

The biological behavior of sebaceous carcinoma (SeC) is relatively indolent; however, local invasion or distant metastasis is sometimes reported. Nevertheless, a lack of understanding of the genetic background of SeC makes it difficult to apply effective systemic therapy. This study was designed to investigate major genetic alterations in SeCs in Korean patients. A total of 29 samples, including 20 ocular SeCs (SeC-Os) and 9 extraocular SeCs (SeC-EOs), were examined. Targeted next-generation sequencing tests including 171 cancer-related genes were performed. TP53 and PIK3CA genes were frequently mutated in both SeC-Os and SeC-EOs with slight predominance in SeC-Os, whereas the NOTCH1 gene was more commonly mutated in SeC-EOs. In clinical correlation, mutations in RUNX1 and ATM were associated with development of distant metastases, and alterations in MSH6 and BRCA1 were associated with inferior progression-free survival (all p < 0.05). In conclusion, our study revealed distinct genetic alterations between SeC-Os and SeC-EOs and some important prognostic molecular markers. Mutations in potentially actionable genes, including EGFR, ERBB2, and mismatch repair genes, were noted, suggesting consideration of a clinical trial in intractable cases.

7.
Bioinformatics ; 38(1): 275-277, 2021 12 22.
Article in English | MEDLINE | ID: mdl-34185062

ABSTRACT

MOTIVATION: Multi-omics data in molecular biology has accumulated rapidly over the years. Such data contains valuable information for research in medicine and drug discovery. Unfortunately, data-driven research in medicine and drug discovery is challenging for a majority of small research labs due to the large volume of data and the complexity of analysis pipeline. RESULTS: We present BioVLAB-Cancer-Pharmacogenomics, a bioinformatics system that facilitates analysis of multi-omics data from breast cancer to analyze and investigate intratumor heterogeneity and pharmacogenomics on Amazon Web Services. Our system takes multi-omics data as input to perform tumor heterogeneity analysis in terms of TCGA data and deconvolve-and-match the tumor gene expression to cell line data in CCLE using DNA methylation profiles. We believe that our system can help small research labs perform analysis of tumor multi-omics without worrying about computational infrastructure and maintenance of databases and tools. AVAILABILITY AND IMPLEMENTATION: http://biohealth.snu.ac.kr/software/biovlab_cancer_pharmacogenomics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Breast Neoplasms , Software , Humans , Female , Multiomics , Pharmacogenetics , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Databases, Factual
8.
Mol Clin Oncol ; 14(5): 88, 2021 May.
Article in English | MEDLINE | ID: mdl-33767857

ABSTRACT

Ependymomas are tumors of the central nervous system that can occur in patients of all ages. Guidelines from the World Health Organization (WHO) for the grading of ependymomas consider patient age, tumor resection range, tumor location and histopathological grade. However, recent studies have suggested that a greater focus on both tumor location and patient age in terms of transcriptomic, genetic, and epigenetic analyses may provide a more accurate assessment of clinical prognosis than the grading system proposed by WHO guidelines. The current study identified the differences and similarities in ependymoma characteristics using three different molecular analyses and methylation arrays. Primary intracranial ependymoma tissues were obtained from 13 Korean patients (9 adults and 4 children), after which whole-exome sequencing (WES), ion-proton comprehensive cancer panel (CCP) analysis, RNA sequencing, and Infinium HumanMethylation450 BeadChip array analysis were performed. Somatic mutations, copy number variations, and fusion genes were identified. It was observed that the methylation status and differentially expressed genes were significantly different according to tumor location and patient age. Several novel gene fusions and somatic mutations were identified, including a yes-associated protein 1 fusion mutation in a child with a good prognosis. Moreover, the methylation microarray revealed that genes associated with neurogenesis and neuron differentiation were hypermethylated in the adult group, whereas genes in the homeobox gene family were hypermethylated in the supratentorial (ST) group. The results confirmed the existence of significantly differentially expressed tumor-specific genes based on tumor location and patient age. These results provided valuable insight into the epigenetic and genetic profiles of intracranial ependymomas and uncovered potential strategies for the identification of location- and age-based ependymoma-related prognostic factors.

10.
Brief Bioinform ; 22(1): 66-76, 2021 01 18.
Article in English | MEDLINE | ID: mdl-32227074

ABSTRACT

Gene expressions are subtly regulated by quantifiable measures of genetic molecules such as interaction with other genes, methylation, mutations, transcription factor and histone modifications. Integrative analysis of multi-omics data can help scientists understand the condition or patient-specific gene regulation mechanisms. However, analysis of multi-omics data is challenging since it requires not only the analysis of multiple omics data sets but also mining complex relations among different genetic molecules by using state-of-the-art machine learning methods. In addition, analysis of multi-omics data needs quite large computing infrastructure. Moreover, interpretation of the analysis results requires collaboration among many scientists, often requiring reperforming analysis from different perspectives. Many of the aforementioned technical issues can be nicely handled when machine learning tools are deployed on the cloud. In this survey article, we first survey machine learning methods that can be used for gene regulation study, and we categorize them according to five different goals: gene regulatory subnetwork discovery, disease subtype analysis, survival analysis, clinical prediction and visualization. We also summarize the methods in terms of multi-omics input types. Then, we explain why the cloud is potentially a good solution for the analysis of multi-omics data, followed by a survey of two state-of-the-art cloud systems, Galaxy and BioVLAB. Finally, we discuss important issues when the cloud is used for the analysis of multi-omics data for the gene regulation study.


Subjects
Cloud Computing , Computational Biology/methods , Animals , Gene Expression Regulation , Humans , Machine Learning
11.
BMC Bioinformatics ; 21(1): 181, 2020 May 11.
Article in English | MEDLINE | ID: mdl-32393170

ABSTRACT

BACKGROUND: Recently, DNA methylation has drawn great attention due to its strong correlation with abnormal gene activities and informative representation of the cancer status. As a number of studies focus on DNA methylation signatures in cancer, demand for publicly available methylome datasets has increased. To satisfy this, large-scale projects were launched to discover biological insights into cancer, providing collections of datasets. However, public cancer data, especially for certain cancer types, is still too limited for research use. Several simulation tools for producing epigenetic datasets have been introduced to alleviate the issue; still, to date, no tool for generating datasets for a user-specified cancer type has been proposed. RESULTS: In this paper, we present methCancer-gen, a tool for generating DNA methylome datasets that takes the cancer type into account. Employing a conditional variational autoencoder, a neural network-based generative model, it estimates the conditional distribution with latent variables and data, and generates samples for the specified cancer type. CONCLUSIONS: To evaluate the simulation performance of methCancer-gen for the user-specified cancer type, our proposed model was compared to a benchmark method, and it could successfully reproduce cancer type-wise data with high accuracy, helping to alleviate the lack of condition-specific data. methCancer-gen is publicly available at https://github.com/cbi-bioinfo/methCancer-gen.


Subjects
Algorithms , DNA Methylation/genetics , Databases, Genetic , Neoplasms/genetics , Computer Simulation , Humans , Neural Networks, Computer , Support Vector Machine
12.
Sci Rep ; 9(1): 18835, 2019 12 11.
Article in English | MEDLINE | ID: mdl-31827198

ABSTRACT

Clinical islet transplantation has recently been a promising treatment option for intractable type 1 diabetes patients. Although early graft loss has been well studied and controlled, the mechanisms of late graft loss largely remain obscure. Since long-term islet graft survival had not been achieved in islet xenotransplantation, it has been impossible to explore the mechanism of late islet graft loss. Fortunately, recent advances where consistent long-term survival (≥6 months) of adult porcine islet grafts was achieved in five independent, diabetic nonhuman primates (NHPs) enabled us to investigate late graft loss. Regardless of the conventional immune monitoring methods applied in the post-transplant period, the initiation of late graft loss could rarely be detected before the overt graft loss observed via uncontrolled blood glucose level. Thus, we retrospectively analyzed the gene expression profiles in 2 rhesus monkey recipients using peripheral blood RNA-sequencing (RNA-seq) data to find out the potential cause(s) of late graft loss. Bioinformatic analyses showed that highly relevant immunological pathways were activated in the animal which experienced late graft failure. Further connectivity analyses revealed that the activation of T cell signaling pathways was the most prominent, suggesting that T cell-mediated graft rejection could be the cause of the late-phase islet loss. Indeed, the porcine islets in the biopsied monkey liver samples were heavily infiltrated with CD3+ T cells. Furthermore, a hypothesis test using a computational experiment reinforced our conclusion. Taken together, we suggest that bioinformatics analyses with peripheral blood RNA-seq could unveil the cause of insidious late islet graft loss.


Subjects
Graft Rejection/genetics , Hyperglycemia/surgery , Islets of Langerhans Transplantation , Macaca mulatta/surgery , RNA , Sus scrofa , Animals , Computational Biology , Gene Expression Regulation , Graft Rejection/blood , Macaca mulatta/genetics , Macaca mulatta/immunology , RNA/blood , RNA/genetics , Sequence Analysis, RNA , Signal Transduction , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Transplantation, Heterologous
13.
BMC Bioinformatics ; 20(Suppl 16): 588, 2019 Dec 02.
Article in English | MEDLINE | ID: mdl-31787073

ABSTRACT

BACKGROUND: Integrated analysis that uses multiple sample gene expression data measured under the same stress can detect stress response genes more accurately than analysis of individual sample data. However, the integrated analysis is challenging since experimental conditions (strength of stress and the number of time points) are heterogeneous across multiple samples. RESULTS: HTRgene is a computational method to perform the integrated analysis of multiple heterogeneous time-series data measured under the same stress condition. The goal of HTRgene is to identify "response order preserving DEGs" that are defined as genes not only which are differentially expressed but also whose response order is preserved across multiple samples. The utility of HTRgene was demonstrated using 28 and 24 time-series sample gene expression data measured under cold and heat stress in Arabidopsis. HTRgene analysis successfully reproduced known biological mechanisms of cold and heat stress in Arabidopsis. Also, HTRgene showed higher accuracy in detecting the documented stress response genes than existing tools. CONCLUSIONS: HTRgene, a method to find the ordering of response time of genes that are commonly observed among multiple time-series samples, successfully integrated multiple heterogeneous time-series gene expression datasets. It can be applied to many research problems related to the integration of time series data analysis.


Subjects
Algorithms , Arabidopsis/genetics , Arabidopsis/physiology , Cold Temperature , Computational Biology/methods , Genes, Plant , Heat-Shock Response/genetics , Signal Transduction/genetics , Databases, Genetic , Gene Expression Profiling , Gene Expression Regulation, Plant , Gene Regulatory Networks , Time Factors , Transcription Factors/metabolism
14.
J Clin Med ; 8(3)2019 Mar 02.
Article in English | MEDLINE | ID: mdl-30832348

ABSTRACT

Parathyroid adenoma is the main cause of primary hyperparathyroidism, which is characterized by enlarged parathyroid glands and excessive parathyroid hormone secretion. Here, we performed transcriptome analysis, comparing parathyroid adenomas with normal parathyroid gland tissue. RNA extracted from ten parathyroid adenoma and five normal parathyroid samples was sequenced, and differentially expressed genes (DEGs) were identified using strict cut-off criteria. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed using DEGs as the input, and protein-protein interaction (PPI) networks were constructed using Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and visualized in Cytoscape. Among DEGs identified in parathyroid adenomas (n = 247; 45 up-regulated, 202 down-regulated), the top five GO terms for up-regulated genes were nucleoplasm, nucleus, transcription DNA-template, regulation of mRNA processing, and nucleic acid binding, while those for down-regulated genes were extracellular exosome, membrane endoplasmic reticulum (ER), membrane, ER, and melanosome. KEGG enrichment analysis revealed significant enrichment of five pathways: protein processing in ER, protein export, RNA transport, glycosylphosphatidylinositol-anchor biosynthesis, and pyrimidine metabolism. Further, PPI network analysis identified a densely connected sub-module, comprising eight hub molecules: SPCS2, RPL23, RPL26, RPN1, SEC11C, SEC11A, RPS25, and SEC61G. These findings may be helpful in further analysis of the mechanisms underlying parathyroid adenoma development.

15.
J Clin Med ; 8(1)2019 Jan 17.
Article in English | MEDLINE | ID: mdl-30658510

ABSTRACT

BACKGROUND: We investigated the associations between v-Raf murine sarcoma viral oncogene homolog B1 (BRAFV600E, henceforth BRAF) and v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) mutations and colorectal cancer (CRC) prognosis, using The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GSE39582) datasets. MATERIALS AND METHODS: The effects of BRAF and KRAS mutations on overall survival (OS) and disease-free survival (DFS) of CRC were evaluated. RESULTS: The mutational status of BRAF and KRAS genes was not associated with OS or DFS of the CRC patients drawn from the TCGA database. The 3-year OS and DFS rates of the BRAF mutation (+) vs. mutation (-) groups were 92.6% vs. 90.4% and 79.7% vs. 68.4%, respectively. The 3-year OS and DFS rates of the KRAS mutation (+) vs. mutation (-) groups were 90.4% vs. 90.5% and 65.3% vs. 73.5%, respectively. In stage II patients, however, the 3-year OS rate was lower in the BRAF mutation (+) group than in the mutation (-) group (85.5% vs. 97.7%, p < 0.001). The mutational status of BRAF genes of 497 CRC patients drawn from the GSE39582 database was not associated with OS or DFS. The 3-year OS and DFS rates of BRAF mutation (+) vs. mutation (-) groups were 75.7% vs. 78.9% and 73.6% vs. 71.1%, respectively. However, KRAS mutational status had an effect on 3-year OS rate (71.9% mutation (+) vs. 83% mutation (-), p = 0.05) and DFS rate (66.3% mutation (+) vs. 74.6% mutation (-), p = 0.013). CONCLUSIONS: We found no consistent association between the mutational status of BRAF or KRAS and the OS and DFS of CRC patients from the TCGA and GSE39582 databases. Studies with longer-term records and larger patient numbers may be necessary to expound the influence of BRAF and KRAS mutations on the outcomes of CRC.

16.
J Bioinform Comput Biol ; 16(6): 1840028, 2018 12.
Article in English | MEDLINE | ID: mdl-30567473

ABSTRACT

In recent years, there have been many studies utilizing DNA methylome data to answer fundamental biological questions. Bisulfite sequencing (BS-seq) has enabled measurement of a genome-wide absolute level of DNA methylation at single-nucleotide resolution. However, due to the ambiguity introduced by bisulfite treatment, the aligning process, especially in large-scale epigenetic research, is still considered a huge burden. We present Cloud-BS, an efficient BS-seq aligner designed for parallel execution in a distributed environment. Utilizing the Apache Hadoop framework, Cloud-BS splits sequencing reads into multiple blocks and transfers them to distributed nodes. By designing each aligning procedure as separate map and reduce tasks, with an internal key-value structure optimized for the MapReduce programming model, the algorithm significantly improves alignment performance without sacrificing mapping accuracy. In addition, Cloud-BS minimizes the innate burden of configuring a distributed environment by providing a pre-configured cloud image. Cloud-BS shows significantly improved bisulfite alignment performance compared to other existing BS-seq aligners. We believe our algorithm facilitates large-scale methylome data analysis. The algorithm is freely available at https://paryoja.github.io/Cloud-BS/ .


Subjects
Algorithms , Cloud Computing , DNA Methylation , Sequence Analysis, DNA/methods , Sulfites , Genome, Human , Humans , Software , Workflow
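
The abstract of entry 16 describes splitting reads into blocks, aligning them in map tasks, and combining results through a key-value structure in reduce tasks. The sketch below mimics that flow inside a single Python process. It does not use Hadoop, it aligns by exact substring match only, and every name and sequence in it is a hypothetical stand-in rather than anything taken from Cloud-BS.

```python
from collections import defaultdict


def split_into_blocks(reads, block_size):
    """Partition the read list into fixed-size blocks (one per map task)."""
    return [reads[i:i + block_size] for i in range(0, len(reads), block_size)]


def map_block(block, reference):
    """Map task: emit a (position, read) pair for each read found in the reference."""
    pairs = []
    for read in block:
        pos = reference.find(read)  # exact match only, for illustration
        if pos >= 0:
            pairs.append((pos, read))
    return pairs


def reduce_pairs(all_pairs):
    """Reduce task: group the emitted reads by their alignment position (the key)."""
    grouped = defaultdict(list)
    for pos, read in all_pairs:
        grouped[pos].append(read)
    return dict(grouped)


# Toy reference and reads (assumed values).
reference = "ACGTTGCAAC"
reads = ["ACGT", "TTGC", "GGGG", "ACGT", "CAAC"]

blocks = split_into_blocks(reads, block_size=2)
pairs = [pair for block in blocks for pair in map_block(block, reference)]
alignments = reduce_pairs(pairs)
print(alignments)
```

In a real distributed run each block would be processed on a different node, and the framework, not a list comprehension, would shuffle the pairs to the reducers.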
17.
BMC Bioinformatics ; 19(1): 472, 2018 Dec 10.
Article in English | MEDLINE | ID: mdl-30526492

ABSTRACT

BACKGROUND: Bisulfite sequencing is one of the major high-resolution DNA methylation measurement methods. Due to the selective nucleotide conversion on unmethylated cytosines after treatment with sodium bisulfite, processing bisulfite-treated sequencing reads requires additional steps with high computational demands. However, the dearth of efficient aligners designed for bisulfite-treated sequencing has become a bottleneck for large-scale DNA methylome analyses. RESULTS: In this study, we present a highly scalable, efficient, and load-balanced bisulfite aligner, BiSpark, which is designed for processing large volumes of bisulfite sequencing data. We implemented the BiSpark algorithm over Apache Spark, a memory-optimized distributed data processing framework, to achieve the maximum data parallel efficiency. The BiSpark algorithm is designed to support redistribution of imbalanced data to minimize delays in large-scale distributed environments. CONCLUSIONS: Experimental results on methylome datasets show that BiSpark significantly outperforms other state-of-the-art bisulfite sequencing aligners in terms of alignment speed and scalability with respect to dataset size and the number of computing nodes while providing highly consistent and comparable mapping results. AVAILABILITY: The implementation of BiSpark software package and source code is available at https://github.com/bhi-kimlab/BiSpark/ .


Subjects
Sequence Alignment , Sequence Analysis, DNA/methods , Software , Sulfites/chemistry , Algorithms , DNA Methylation/genetics , Humans
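
The "selective nucleotide conversion on unmethylated cytosines" mentioned in the abstract of entry 17 is the reason bisulfite reads cannot be matched against the reference directly. One widely used workaround is to compare reads and reference in a reduced three-letter alphabet and then recover methylation calls from the original bases. The sketch below shows that idea for the forward strand only, with exact matching; whether BiSpark works this way is not stated in the abstract, and all names and sequences here are assumptions.

```python
def bisulfite_convert(seq, methylated_positions=frozenset()):
    """Simulate bisulfite treatment of a forward-strand fragment.

    Unmethylated C is read as T; methylated C stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def three_letter(seq):
    """Collapse C into T so converted reads and the reference become comparable."""
    return seq.replace("C", "T")


def align_and_call(reference, read):
    """Locate the read in three-letter space, then call methylation per reference C.

    Returns (offset, {reference_position: is_methylated}) or None if unaligned.
    """
    offset = three_letter(reference).find(three_letter(read))
    if offset < 0:
        return None
    calls = {}
    for i, base in enumerate(read):
        pos = offset + i
        if reference[pos] == "C":
            calls[pos] = base == "C"  # C retained means methylated
    return offset, calls


# Toy reference; the fragment covers positions 2..9 and its third base is methylated.
reference = "ACGTCCGATCGA"
read = bisulfite_convert(reference[2:10], methylated_positions={2})
print(read)
print(align_and_call(reference, read))
```

Collapsing the alphabet is also where the ambiguity comes from: after conversion, a T in the read may correspond to either a T or an unmethylated C in the reference, so more positions become candidate matches.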
18.
PLoS One ; 12(3): e0174999, 2017.
Article in English | MEDLINE | ID: mdl-28362846

ABSTRACT

miRNAs are small non-coding RNAs that regulate gene expression by binding to the 3'-UTR of genes. Many recent studies have reported that miRNAs play important biological roles by regulating specific mRNAs or genes. Many sequence-based target prediction algorithms have been developed to predict miRNA targets. However, these methods are not designed for condition-specific target predictions and produce many false positives; thus, expression-based target prediction algorithms have been developed for condition-specific target predictions. A typical strategy to utilize expression data is to leverage the negative control roles of miRNAs on genes. To control false positives, a stringent cutoff value is typically set, but in this case, these methods tend to reject many true target relationships, i.e., false negatives. To overcome these limitations, additional information should be utilized. The literature is probably the best resource that we can utilize. Recent literature mining systems compile millions of articles with experiments designed for specific biological questions, and the systems provide a function to search for specific information. To utilize the literature information, we used a literature mining system, BEST, that automatically extracts information from the literature in PubMed and that allows the user to perform searches of the literature with any English words. By integrating omics data analysis methods and BEST, we developed Context-MMIA, a miRNA-mRNA target prediction method that combines expression data analysis results and the literature information extracted based on the user-specified context. In the pathway enrichment analysis using genes included in the top 200 miRNA-targets, Context-MMIA outperformed the four existing target prediction methods that we tested. In another test on whether prediction methods can re-produce experimentally validated target relationships, Context-MMIA outperformed the four existing target prediction methods. 
In summary, Context-MMIA allows the user to specify a context of the experimental data to predict miRNA targets, and we believe that Context-MMIA is very useful for predicting condition-specific miRNA targets.


Subjects
Literature , MicroRNAs/metabolism , RNA, Messenger/metabolism , 3' Untranslated Regions/genetics , 3' Untranslated Regions/physiology , Algorithms , Computational Biology , Humans , PubMed , Software
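
The abstract of entry 18 notes that expression-based target prediction typically exploits the negative regulatory effect of a miRNA on its targets, and that a stringent cutoff controls false positives at the cost of false negatives. The sketch below illustrates that trade-off with a plain Pearson correlation filter. It is not the Context-MMIA method; the gene names, expression values, and the `-0.8` cutoff are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def candidate_targets(mirna_expr, gene_expr, cutoff=-0.8):
    """Genes whose expression is anti-correlated with the miRNA at or below `cutoff`."""
    hits = []
    for gene, values in gene_expr.items():
        r = pearson(mirna_expr, values)
        if r <= cutoff:
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: hit[1])


# Toy expression profiles across four samples (assumed values).
mirna = [1.0, 2.0, 3.0, 4.0]
genes = {
    "GENE_A": [8.0, 6.0, 4.0, 2.0],  # strongly anti-correlated
    "GENE_B": [1.0, 2.0, 3.0, 4.5],  # positively correlated
    "GENE_C": [5.0, 5.5, 4.0, 4.5],  # moderately anti-correlated
}
hits = candidate_targets(mirna, genes)
print(hits)
```

With the stringent cutoff only `GENE_A` survives; `GENE_C`, at a correlation of about -0.6, is rejected even though it might be a true target. That is the kind of false negative the abstract argues additional evidence, such as literature information, should help recover.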
19.
Biol Direct ; 11(1): 57, 2016 10 24.
Article in English | MEDLINE | ID: mdl-27776539

ABSTRACT

MOTIVATION: Transcriptome data from gene knockout experiments in mice are widely used to investigate functions of genes and their relationship to phenotypes. When a gene is knocked out, it is important to identify which genes are affected by the knockout gene. Existing methods, including differentially expressed gene (DEG) methods, can be used for the analysis. However, existing methods require cutoff values to select candidate genes, which can produce either too many false positives or false negatives. This hurdle can be addressed either by improving the accuracy of gene selection or by providing a method to rank candidate genes effectively, or both. Prioritization of candidate genes should consider the goals or context of the knockout experiment. As of now, there are no tools designed for both selecting and prioritizing genes from the mouse knockout data. Hence, the necessity of a new tool arises. RESULTS: In this study, we present CLIP-GENE, a web service that selects gene markers by utilizing differentially expressed genes, mouse transcription factor (TF) network, and single nucleotide variant information. Then, protein-protein interaction network and literature information are utilized to find genes that are relevant to the phenotypic differences. One of the novel features is to allow researchers to specify their contexts or hypotheses in a set of keywords to rank genes according to the contexts that the user specifies. We believe that CLIP-GENE will be useful in characterizing functions of TFs in mouse experiments. AVAILABILITY: http://epigenomics.snu.ac.kr/CLIP-GENE REVIEWERS: This article was reviewed by Dr. Lee and Dr. Pongor.


Subjects
Computational Biology/methods , Transcription Factors/genetics , Transcriptome , Animals , Internet , Mice , Mice, Knockout , Oligonucleotide Array Sequence Analysis
20.
Methods ; 111: 64-71, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27477210

ABSTRACT

Measuring gene expression, DNA sequence variation, and DNA methylation status is routinely done using high throughput sequencing technologies. To analyze such multi-omics data and explore relationships, reliable bioinformatics systems are much needed. Existing systems are either for exploring curated data or for processing omics data in the form of a library such as R. Thus scientists have much difficulty in investigating relationships among gene expression, DNA sequence variation, and DNA methylation using multi-omics data. In this study, we report a system called BioVLAB-mCpG-SNP-EXPRESS for the integrated analysis of DNA methylation, sequence variation (SNPs), and gene expression for distinguishing cellular phenotypes at the pairwise and multiple phenotype levels. The system can be deployed on either the Amazon cloud or a publicly available high-performance computing node, and the data analysis and exploration of the analysis result can be conveniently done using a web-based interface. In order to alleviate analysis complexity, all processes are fully automated, and a graphical workflow system is integrated to represent real-time analysis progression. The BioVLAB-mCpG-SNP-EXPRESS system works in three stages. First, it processes and analyzes multi-omics data as input in the form of the raw data, i.e., FastQ files. Second, various integrated analyses such as methylation vs. gene expression and mutation vs. methylation are performed. Finally, the analysis result can be explored in a number of ways through a web interface for the multi-level, multi-perspective exploration. Multi-level interpretation can be done at the gene, gene set, pathway, or network level, and multi-perspective exploration can proceed from the gene expression, DNA methylation, sequence variation, or relationship perspective. The utility of the system is demonstrated by analyzing a data set of 30 phenotypically distinct breast cancer cell lines. 
BioVLAB-mCpG-SNP-EXPRESS is available at http://biohealth.snu.ac.kr/software/biovlab_mcpg_snp_express/.


Subjects
Computational Biology/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Software , DNA Methylation/genetics , Databases, Genetic , Genetic Variation , Humans , Polymorphism, Single Nucleotide/genetics