Results 1 - 20 of 82
1.
BMC Med Genomics ; 16(1): 126, 2023 06 09.
Article in English | MEDLINE | ID: mdl-37296477

ABSTRACT

BACKGROUND: Hereditary genetic mutations causing predisposition to colorectal cancer are accountable for approximately 30% of all colorectal cancer cases. However, only a small fraction of these are high penetrant mutations occurring in DNA mismatch repair genes, causing one of several types of familial colorectal cancer (CRC) syndromes. Most of the mutations are low-penetrant variants, contributing to an increased risk of familial colorectal cancer, and they are often found in additional genes and pathways not previously associated with CRC. The aim of this study was to identify such variants, both high-penetrant and low-penetrant ones. METHODS: We performed whole exome sequencing on constitutional DNA extracted from blood of 48 patients suspected of familial colorectal cancer and used multiple in silico prediction tools and available literature-based evidence to detect and investigate genetic variants. RESULTS: We identified several causative and some potentially causative germline variants in genes known for their association with colorectal cancer. In addition, we identified several variants in genes not typically included in relevant gene panels for colorectal cancer, including CFTR, PABPC1 and TYRO3, which may be associated with an increased risk for cancer. CONCLUSIONS: Identification of variants in additional genes that potentially can be associated with familial colorectal cancer indicates a larger genetic spectrum of this disease, not limited only to mismatch repair genes. Usage of multiple in silico tools based on different methods and combined through a consensus approach increases the sensitivity of predictions and narrows down a large list of variants to the ones that are most likely to be significant.


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis , Colorectal Neoplasms , Humans , Colorectal Neoplasms, Hereditary Nonpolyposis/diagnosis , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Exome Sequencing , Genetic Predisposition to Disease , Pedigree , Germ-Line Mutation , Germ Cells , Colorectal Neoplasms/genetics , Colorectal Neoplasms/diagnosis
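
The consensus approach mentioned in the abstract above can be illustrated with a minimal Python sketch. The abstract does not state which tools were combined or what voting rule was used, so the majority-vote rule, the `min_fraction` threshold, and the tool and variant names below are all hypothetical.

```python
from typing import Dict, List


def consensus_call(predictions: Dict[str, bool], min_fraction: float = 0.5) -> bool:
    """Return True when more than min_fraction of the tools flag the variant.

    predictions maps a (hypothetical) tool name to its damaging/not-damaging call.
    """
    if not predictions:
        raise ValueError("at least one tool prediction is required")
    damaging = sum(1 for flagged in predictions.values() if flagged)
    return damaging / len(predictions) > min_fraction


def filter_variants(calls: Dict[str, Dict[str, bool]], min_fraction: float = 0.5) -> List[str]:
    """Keep only the variants that pass the consensus rule, preserving input order."""
    return [variant for variant, preds in calls.items()
            if consensus_call(preds, min_fraction)]


# Hypothetical example: three tools, two variants.
calls = {
    "variant_1": {"tool_a": True, "tool_b": True, "tool_c": False},
    "variant_2": {"tool_a": False, "tool_b": True, "tool_c": False},
}
shortlist = filter_variants(calls)
```

Raising `min_fraction` makes the shortlist stricter; lowering it favours sensitivity at the cost of a longer list.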
2.
Mol Genet Genomics ; 298(3): 555-566, 2023 May.
Article in English | MEDLINE | ID: mdl-36856825

ABSTRACT

The cancer syndrome polymerase proofreading-associated polyposis (PPAP) results from germline mutations in the POLE and POLD1 genes. Mutations in the exonuclease domain of these genes are associated with hyper- and ultra-mutated tumors with a predominance of base substitutions resulting from faulty proofreading during DNA replication. When a new variant is identified by gene testing of POLE and POLD1, it is important to verify whether the variant is associated with PPAP or not, to guide genetic counseling of mutation carriers. In 2015, we reported the likely pathogenic (class 4) germline POLE c.1373A>T p.(Tyr458Phe) variant and we have now characterized this variant to verify that it is a class 5 pathogenic variant. For this purpose, we investigated (1) mutator phenotype in tumors from two carriers, (2) mutation frequency in cell-based mutagenesis assays, and (3) structural consequences based on protein modeling. Whole-exome sequencing of two tumors identified an ultra-mutator phenotype with a predominance of base substitutions, the majority of which are C>T. A SupF mutagenesis assay revealed increased mutation frequency in cells overexpressing the variant of interest as well as in isogenic cells encoding the variant. Moreover, a yeast-based exonuclease repair assay supported a defect in proofreading activity. Lastly, we present a homology model of human POLE to demonstrate structural consequences leading to pathogenic impact of the p.(Tyr458Phe) mutation. The three lines of evidence, taken together with updated co-segregation and previously published data, allow the germline variant POLE c.1373A>T p.(Tyr458Phe) to be reclassified as a class 5 variant, meaning that the variant is associated with PPAP.


Subject(s)
DNA Polymerase II , Neoplasms , Humans , DNA Polymerase II/genetics , DNA Polymerase II/chemistry , DNA Polymerase II/metabolism , Poly-ADP-Ribose Binding Proteins/genetics , Neoplasms/genetics , Mutation , Exonucleases/genetics , Exonucleases/metabolism
3.
Nucleic Acids Res ; 51(D1): D564-D570, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350659

ABSTRACT

We present an update of EpiFactors, a manually curated database providing information about epigenetic regulators, their complexes, targets, and products, which is openly accessible at http://epifactors.autosome.org. The updated version of EpiFactors contains information on 902 proteins, including 101 histones and protamines, and, as a main update, a newly curated collection of 124 lncRNAs involved in epigenetic regulation. The number of publications concerning the role of lncRNA in epigenetics is rapidly growing. Yet a resource that compiles, integrates, organizes, and presents curated information on lncRNAs in epigenetics is missing. EpiFactors fills this gap and provides data on epigenetic regulators in an accessible and user-friendly form. For 820 of the genes in EpiFactors, we include expression estimates across multiple cell types assessed by CAGE-Seq in the FANTOM5 project. In addition, the updated EpiFactors contains information on 73 protein complexes involved in epigenetic regulation. Our resource is practical for a wide range of users, including biologists, bioinformaticians and molecular/systems biologists.


Subject(s)
Databases, Genetic , Epigenesis, Genetic , Humans , Histones/genetics , Histones/metabolism , Protamines , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism
4.
PLoS One ; 17(10): e0275621, 2022.
Article in English | MEDLINE | ID: mdl-36282866

ABSTRACT

Mitochondrial activity in cancer cells has been central to cancer research since Otto Warburg first published his thesis on the topic in 1956. Although Warburg proposed that oxidative phosphorylation in the tricarboxylic acid (TCA) cycle was perturbed in cancer, later research has shown that oxidative phosphorylation is activated in most cancers, including prostate cancer (PCa). However, more detailed knowledge on mitochondrial metabolism and metabolic pathways in cancers is still lacking. In this study we expand our previously developed method for analyzing functional homologous proteins (FunHoP), which can provide a more detailed view of metabolic pathways. FunHoP uses results from differential expression analysis of RNA-Seq data to improve pathway analysis. By adding information on subcellular localization based on experimental data and computational predictions we can use FunHoP to differentiate between mitochondrial and non-mitochondrial processes in cancerous and normal prostate cell lines. Our results show that mitochondrial pathways are upregulated in PCa and that splitting metabolic pathways into mitochondrial and non-mitochondrial counterparts using FunHoP adds to the interpretation of the metabolic properties of PCa cells.


Subject(s)
Genes, Mitochondrial , Prostatic Neoplasms , Male , Humans , Up-Regulation , Cell Line, Tumor , Oxidative Phosphorylation , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Tricarboxylic Acids
5.
Insect Mol Biol ; 31(6): 810-820, 2022 12.
Article in English | MEDLINE | ID: mdl-36054587

ABSTRACT

The protein vitellogenin (Vg) plays a central role in lipid transportation in most egg-laying animals. High Vg levels correlate with stress resistance and lifespan potential in honey bees (Apis mellifera). Vg is the primary circulating zinc-carrying protein in honey bees. Zinc is an essential metal ion in numerous biological processes, including the function and structure of many proteins. Measurements of Zn2+ suggest a variable number of ions per Vg molecule in different animal species, but the molecular implications of zinc-binding by this protein are not well-understood. We used inductively coupled plasma mass spectrometry to determine that, on average, each honey bee Vg molecule binds 3 Zn2+ ions. Our full-length protein structure and sequence analysis revealed seven potential zinc-binding sites. These are located in the β-barrel and α-helical subdomains of the N-terminal domain, the lipid binding site, and the cysteine-rich C-terminal region of unknown function. Interestingly, two potential zinc-binding sites in the β-barrel can support a proposed role for this structure in DNA-binding. Overall, our findings suggest that honey bee Vg binds zinc at several functional regions, indicating that Zn2+ ions are important for many of the activities of this protein. In addition to being potentially relevant for other egg-laying species, these insights provide a platform for studies of metal ions in bee health, which is of global interest due to recent declines in pollinator numbers.


Subject(s)
Insect Proteins , Vitellogenins , Bees , Animals , Vitellogenins/metabolism , Insect Proteins/metabolism , Zinc , Binding Sites , Lipids
6.
iScience ; 25(6): 104451, 2022 Jun 17.
Article in English | MEDLINE | ID: mdl-35707723

ABSTRACT

High secretion of the metabolites citrate and spermine is a unique hallmark for normal prostate epithelial cells, and is reduced in aggressive prostate cancer. However, the identity of the genes controlling this biological process is mostly unknown. In this study, we have created a gene signature of 150 genes connected to citrate and spermine secretion in the prostate. We have computationally integrated metabolic measurements with multiple transcriptomics datasets from the public domain, including 3826 tissue samples from prostate and prostate cancer. The accuracy of the signature is validated by its unique enrichment in prostate samples and prostate epithelial tissue compartments. The signature highlights genes AZGP1, ANPEP and metallothioneins with zinc-binding properties not previously studied in the prostate, and the expression of these genes is reduced in more aggressive cancer lesions. However, the absence of signature enrichment in common prostate model systems can make it challenging to study these genes mechanistically.

7.
BMC Med Genomics ; 14(1): 214, 2021 08 31.
Article in English | MEDLINE | ID: mdl-34465341

ABSTRACT

BACKGROUND: Detection of copy number variation (CNV) in genes associated with disease is important in genetic diagnostics, and next generation sequencing (NGS) technology provides data that can be used for CNV detection. However, CNV detection based on NGS data is in general not often used in diagnostic labs as the data analysis is challenging, especially with data from targeted gene panels. Wet lab methods like MLPA (MRC Holland) are widely used, but are expensive, time-consuming and have gene-specific limitations. Our aim has been to develop a bioinformatic tool for CNV detection from NGS data in medical genetic diagnostic samples. RESULTS: Our computational pipeline for detection of CNVs in NGS data from targeted gene panels utilizes coverage depth of the captured regions and calculates a copy number ratio score for each region. This is computed by comparing the mean coverage of the sample with the mean coverage of the same region in other samples, defined as a pool. The pipeline selects pools for comparison dynamically from previously sequenced samples, using the pool with an average coverage depth that is nearest to that of the sample. A sliding window-based approach is used to analyze each region, where length of sliding window and sliding distance can be chosen dynamically to increase or decrease the resolution. This helps in detecting CNVs in small or partial exons. With this pipeline we have correctly identified the CNVs in 36 positive control samples, with sensitivity of 100% and specificity of 91%. We have detected whole gene level deletion/duplication, single/multi exonic level deletion/duplication, partial exonic deletion and mosaic deletion. Since its implementation in mid-2018 it has proven its diagnostic value with more than 45 CNV findings in routine tests. CONCLUSIONS: With this pipeline as part of our diagnostic practices it is now possible to detect partial, single or multi-exonic, and intragenic CNVs in all genes in our target panel. This has helped our diagnostic lab to expand the portfolio of genes where we offer CNV detection, which previously was limited by the availability of MLPA kits.


Subject(s)
DNA Copy Number Variations
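
The copy number ratio described in the abstract above (mean coverage of the sample divided by mean coverage of the same region in a pool, evaluated in sliding windows) can be sketched in Python as follows. This is not the authors' pipeline: the function names, the equal-length depth vectors, and the omission of pool selection and any coverage normalisation are simplifying assumptions made for illustration.

```python
from statistics import mean
from typing import List, Sequence


def window_means(depth: Sequence[float], window: int, step: int) -> List[float]:
    """Mean coverage in each sliding window over one captured region."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if len(depth) < window:
        # Region shorter than the window: treat it as a single window.
        return [mean(depth)] if depth else []
    return [mean(depth[i:i + window])
            for i in range(0, len(depth) - window + 1, step)]


def copy_ratio_scores(sample: Sequence[float],
                      pool: Sequence[Sequence[float]],
                      window: int, step: int) -> List[float]:
    """Ratio of sample coverage to mean pool coverage, per window.

    Assumes every pool member covers the same positions as the sample.
    """
    sample_windows = window_means(sample, window, step)
    pool_windows = [window_means(member, window, step) for member in pool]
    scores = []
    for i, sample_mean in enumerate(sample_windows):
        reference = mean(member[i] for member in pool_windows)
        scores.append(sample_mean / reference if reference > 0 else float("nan"))
    return scores


# Hypothetical region of 8 positions; coverage halves in the second part.
sample_depth = [100, 100, 100, 100, 50, 50, 50, 50]
pool_depths = [[100] * 8, [100] * 8]
scores = copy_ratio_scores(sample_depth, pool_depths, window=4, step=4)
```

A ratio near 1 suggests the expected copy number, while a ratio near 0.5 is what a loss of one of two copies would produce. A smaller `window` and `step` give finer resolution at the cost of noisier scores.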
8.
F1000Res ; 10, 2021.
Article in English | MEDLINE | ID: mdl-34249331

ABSTRACT

Background: Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information. Description of work: We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser. Conclusion: We here provide a first iteration of a draft standard for genomic track metadata, as well as the accompanying software ecosystem. It can easily be adapted or extended to future needs of the research community regarding data, methods and tools, balancing the requirements of both data submitters and analytical end-users.


Subject(s)
Ecosystem , Metadata , Genome , Genomics , Software
9.
BMC Res Notes ; 14(1): 162, 2021 Apr 30.
Article in English | MEDLINE | ID: mdl-33931103

ABSTRACT

OBJECTIVE: Properties of gene products can be described or annotated with Gene Ontology (GO) terms. But for many genes we have limited information about their products, for example with respect to function. This is particularly true for long non-coding RNAs (lncRNAs), where the function in most cases is unknown. However, it has been shown that annotation as described by GO terms to some extent can be predicted by enrichment analysis on properties of co-expressed genes. RESULTS: GAPGOM integrates two relevant algorithms, lncRNA2GOA and TopoICSim, into a user-friendly R package. Here lncRNA2GOA does annotation prediction by co-expression, whereas TopoICSim estimates similarity between GO graphs, which can be used for benchmarking of prediction performance, but also for comparison of GO graphs in general. The package provides an improved implementation of the original tools, with substantial improvements in performance and documentation, unified interfaces, and additional features.


Subject(s)
Benchmarking , Computational Biology , Algorithms , Gene Ontology , Molecular Sequence Annotation
10.
Genomics Proteomics Bioinformatics ; 19(5): 848-859, 2021 10.
Article in English | MEDLINE | ID: mdl-33741524

ABSTRACT

Cytoscape is often used for visualization and analysis of metabolic pathways. For example, based on KEGG data, a reader for KEGG Markup Language (KGML) is used to load files into Cytoscape. However, although multiple genes can be responsible for the same reaction, the KGML-reader KEGGScape only presents the first listed gene in a network node for a given reaction. This can lead to incorrect interpretations of the pathways. Our new method, FunHoP, shows all possible genes in each node, making the pathways more complete. FunHoP collapses all genes in a node into one measurement using read counts from RNA-seq. Assuming that activity for an enzymatic reaction mainly depends upon the gene with the highest number of reads, and weighting the reads on gene length and ratio, a new expression value is calculated for the node as a whole. Differential expression at node level is then applied to the networks. Using prostate cancer as a model, we integrate RNA-seq data from two patient cohorts with metabolism data from literature. Here we show that FunHoP gives more consistent pathways that are easier to interpret biologically. Code and documentation for running FunHoP can be found at https://github.com/kjerstirise/FunHoP.


Subject(s)
Metabolic Networks and Pathways , Software , Humans , Metabolic Networks and Pathways/genetics
11.
Cancer Inform ; 19: 1176935120965542, 2020.
Article in English | MEDLINE | ID: mdl-33116353

ABSTRACT

The k-Nearest Neighbor (kNN) classifier represents a simple and very general approach to classification. Still, the performance of kNN classifiers can often compete with more complex machine-learning algorithms. The core of kNN depends on a "guilt by association" principle where classification is performed by measuring the similarity between a query and a set of training patterns, often computed as distances. The relative performance of kNN classifiers is closely linked to the choice of distance or similarity measure, and it is therefore relevant to investigate the effect of using different distance measures when comparing biomedical data. In this study on classification of cancer data sets, we have used both common and novel distance measures, including the novel distance measures Sobolev and Fisher, and we have evaluated the performance of kNN with these distances on 4 cancer data sets of different types. We find that the performance when using the novel distance measures is comparable to the performance with more well-established measures, in particular for the Sobolev distance. We define a robust ranking of all the distance measures according to overall performance. Several distance measures show robust performance in kNN over several data sets, in particular the Hassanat, Sobolev, and Manhattan measures. Some of the other measures show good performance on selected data sets but seem to be more sensitive to the nature of the classification data. It is therefore important to benchmark distance measures on similar data prior to classification to identify the most suitable measure in each case.
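
To make the role of the distance measure concrete, the following is a minimal Python sketch of a kNN classifier with a pluggable distance function. Only the well-known Manhattan and Euclidean distances are implemented; the Sobolev, Fisher, and Hassanat measures from the study are not reproduced here, and the toy data and labels are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Vector, b: Vector) -> float:
    """Square root of the sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def knn_predict(train: List[Tuple[Vector, str]], query: Vector, k: int,
                distance: Callable[[Vector, Vector], float]) -> str:
    """Label the query by majority vote among its k nearest training patterns."""
    if k < 1 or not train:
        raise ValueError("k must be >= 1 and the training set non-empty")
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set.
train = [((0, 0), "normal"), ((0, 1), "normal"),
         ((5, 5), "tumor"), ((6, 5), "tumor"), ((5, 6), "tumor")]
label_near_origin = knn_predict(train, (1, 0), k=3, distance=manhattan)
label_near_cluster = knn_predict(train, (5, 4), k=3, distance=euclidean)
```

Because `distance` is a parameter, benchmarking several measures on the same data amounts to calling `knn_predict` once per candidate function and comparing the resulting accuracy.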

12.
Sci Adv ; 6(37), 2020 09.
Article in English | MEDLINE | ID: mdl-32917713

ABSTRACT

Intestinal epithelial homeostasis is maintained by adult intestinal stem cells, which, alongside Paneth cells, appear after birth in the neonatal period. We aimed to identify regulators of neonatal intestinal epithelial development by testing a small library of epigenetic modifier inhibitors in Paneth cell-skewed organoid cultures. We found that lysine-specific demethylase 1A (Kdm1a/Lsd1) is absolutely required for Paneth cell differentiation. Lsd1-deficient crypts, devoid of Paneth cells, are still able to form organoids without a requirement of exogenous or endogenous Wnt. Mechanistically, we find that LSD1 enzymatically represses genes that are normally expressed only in fetal and neonatal epithelium. This gene profile is similar to what is seen in repairing epithelium, and we find that Lsd1-deficient epithelium has superior regenerative capacities after irradiation injury. In summary, we found an important regulator of neonatal intestinal development and identified a druggable target to reprogram intestinal epithelium toward a reparative state.


Subject(s)
Intestinal Mucosa , Paneth Cells , Cell Differentiation/genetics , Histone Demethylases/genetics , Humans , Infant, Newborn , Organoids
13.
PLoS One ; 15(7): e0235613, 2020.
Article in English | MEDLINE | ID: mdl-32634176

ABSTRACT

Germline variants inactivating the mismatch repair (MMR) genes MLH1, MSH2, MSH6 and PMS2 cause Lynch syndrome, which implies an increased cancer risk, with colon and endometrial cancer being the most frequent. Identification of these pathogenic variants is important to identify endometrial cancer patients with inherited increased risk of new cancers, in order to offer them lifesaving surveillance. However, several other genes are also part of the MMR pathway. It is therefore relevant to search for variants in additional genes that may be associated with cancer risk by including all known genes involved in the MMR pathway. Next-generation sequencing was used to screen 22 genes involved in the MMR pathway in constitutional DNA extracted from whole blood from 199 unselected endometrial cancer patients. Bioinformatic pipelines were developed for identification and functional annotation of variants, using several different software tools and custom programs. This facilitated identification of 22 exonic, 4 UTR and 9 intronic variants that could be classified according to pathogenicity. This study has identified several germline variants in genes of the MMR pathway that potentially may be associated with an increased risk for cancer, in particular endometrial cancer, and therefore are relevant for further investigation. We have also developed bioinformatics strategies to analyse targeted sequencing data, including low quality data and genomic regions outside of the protein coding exons of the relevant genes.


Subject(s)
DNA Mismatch Repair , Endometrial Neoplasms/pathology , Mismatch Repair Endonuclease PMS2/genetics , MutL Protein Homolog 1/genetics , MutS Homolog 2 Protein/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/pathology , DNA Copy Number Variations , DNA, Neoplasm/blood , DNA, Neoplasm/chemistry , DNA, Neoplasm/metabolism , Endometrial Neoplasms/genetics , Exons , Female , High-Throughput Nucleotide Sequencing , Humans , Introns , Risk Factors , Untranslated Regions/genetics
14.
BMC Bioinformatics ; 21(1): 134, 2020 Apr 06.
Article in English | MEDLINE | ID: mdl-32252623

ABSTRACT

BACKGROUND: Diseases like cancer will lead to changes in gene expression, and it is relevant to identify key regulatory genes that can be linked directly to these changes. This can be done by computing a Regulatory Impact Factor (RIF) score for relevant regulators. However, this computation is based on estimating correlated patterns of gene expression, often Pearson correlation, and an assumption about a set of specific regulators, normally transcription factors. This study explores alternative measures of correlation, using the Fisher and Sobolev metrics, and an extended set of regulators, including epigenetic regulators and long non-coding RNAs (lncRNAs). Data on prostate cancer have been used to explore the effect of these modifications. RESULTS: A tool for computation of RIF scores with alternative correlation measures and extended sets of regulators was developed and tested on gene expression data for prostate cancer. The study showed that the Fisher and Sobolev metrics lead to improved identification of well-documented regulators of gene expression in prostate cancer, and the sets of identified key regulators showed improved overlap with previously defined gene sets of relevance to cancer. The extended set of regulators led to identification of several interesting candidates for further studies, including lncRNAs. Several key processes were identified as important, including spindle assembly and the epithelial-mesenchymal transition (EMT). CONCLUSIONS: The study has shown that using alternative metrics of correlation can improve the performance of tools based on correlation of gene expression in genomic data. The Fisher and Sobolev metrics should be considered also in other correlation-based applications.


Subject(s)
Computational Biology/methods , Epigenesis, Genetic , Transcription Factors/metabolism , Databases, Genetic , Epithelial-Mesenchymal Transition , Gene Expression Regulation, Neoplastic , Humans , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/pathology , RNA, Long Noncoding/metabolism , Transcription Factors/genetics
15.
BMC Med Genomics ; 13(1): 6, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31914996

ABSTRACT

BACKGROUND: Prostate cancer (PCa) has the highest incidence rates of cancers in men in western countries. Unlike several other types of cancer, PCa has few genetic drivers, which has led researchers to look for additional epigenetic and transcriptomic contributors to PCa development and progression. Especially datasets on DNA methylation, the most commonly studied epigenetic marker, have recently been measured and analysed in several PCa patient cohorts. DNA methylation is most commonly associated with downregulation of gene expression. However, positive associations of DNA methylation to gene expression have also been reported, suggesting a more diverse mechanism of epigenetic regulation. Such additional complexity could have important implications for understanding prostate cancer development but has not been studied at a genome-wide scale. RESULTS: In this study, we have compared three sets of genome-wide single-site DNA methylation data from 870 PCa and normal tissue samples with multi-cohort gene expression data from 1117 samples, including 532 samples where DNA methylation and gene expression have been measured on the exact same samples. Genes were classified according to their corresponding methylation and expression profiles. A large group of hypermethylated genes was robustly associated with increased gene expression (UPUP group) in all three methylation datasets. These genes demonstrated distinct patterns of correlation between DNA methylation and gene expression compared to the genes showing the canonical negative association between methylation and expression (UPDOWN group). This indicates a more diversified role of DNA methylation in regulating gene expression than previously appreciated. Moreover, UPUP and UPDOWN genes were associated with different compartments - UPUP genes were related to the structures in nucleus, while UPDOWN genes were linked to extracellular features. 
CONCLUSION: We identified a robust association between hypermethylation and upregulation of gene expression when comparing samples from prostate cancer and normal tissue. These results challenge the classical view where DNA methylation is always associated with suppression of gene expression, which underlines the importance of considering corresponding expression data when assessing the downstream regulatory effect of DNA methylation.


Subject(s)
DNA Methylation , DNA, Neoplasm , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Prostatic Neoplasms , Up-Regulation , DNA, Neoplasm/genetics , DNA, Neoplasm/metabolism , Humans , Male , Prostatic Neoplasms/genetics , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/pathology
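
The UPUP and UPDOWN groups named in the abstract above can be illustrated with a small Python sketch. The abstract defines the groups only qualitatively (hypermethylation with increased or decreased expression), so the zero thresholds, the input quantities, and the `OTHER` label below are assumptions; a real analysis would also require statistical significance for both changes.

```python
from typing import Dict, Tuple


def classify_gene(delta_methylation: float, log2_fold_change: float) -> str:
    """Assign a gene to UPUP, UPDOWN, or OTHER.

    delta_methylation: tumor minus normal methylation level (assumed input).
    log2_fold_change: tumor versus normal expression change (assumed input).
    """
    if delta_methylation <= 0:
        return "OTHER"       # not hypermethylated in tumor
    if log2_fold_change > 0:
        return "UPUP"        # hypermethylated and upregulated
    if log2_fold_change < 0:
        return "UPDOWN"      # hypermethylated and downregulated (canonical)
    return "OTHER"


def classify_all(genes: Dict[str, Tuple[float, float]]) -> Dict[str, str]:
    """Classify every gene in a mapping of name -> (delta_methylation, log2FC)."""
    return {name: classify_gene(dm, lfc) for name, (dm, lfc) in genes.items()}


# Hypothetical gene names and values.
groups = classify_all({
    "gene_a": (0.20, 1.5),
    "gene_b": (0.15, -2.0),
    "gene_c": (-0.10, 0.8),
})
```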
16.
Biostatistics ; 21(3): 625-639, 2020 07 01.
Article in English | MEDLINE | ID: mdl-30698663

ABSTRACT

We present model-based analysis for ChIA-PET (MACPET), which analyzes paired-end read sequences provided by ChIA-PET for finding binding sites of a protein of interest. MACPET uses information from both tags of each PET and searches for binding sites in a two-dimensional space, while taking into account different noise levels in different genomic regions. MACPET shows favorable results compared with MACS in terms of motif occurrence and spatial resolution. Furthermore, significant binding sites discovered by MACPET are involved in a higher number of significant three-dimensional interactions than those discovered by MACS. MACPET is freely available on Bioconductor. Keywords: ChIA-PET; MACPET; Model-based clustering; Paired-end tags; Peak-calling algorithm.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin Immunoprecipitation , Genome , High-Throughput Nucleotide Sequencing , Models, Biological , Protein Binding , Sequence Analysis, DNA , Humans
17.
Clin Epigenetics ; 11(1): 193, 2019 12 12.
Article in English | MEDLINE | ID: mdl-31831061

ABSTRACT

Sequencing technologies have changed not only our approaches to classical genetics, but also the field of epigenetics. Specific methods allow scientists to identify novel genome-wide epigenetic patterns of DNA methylation down to single-nucleotide resolution. DNA methylation is the most researched epigenetic mark involved in various processes in the human cell, including gene regulation and development of diseases, such as cancer. Increasing numbers of DNA methylation sequencing datasets from the human genome are produced using various platforms, from methylated DNA precipitation to whole-genome bisulfite sequencing. Many of those datasets are fully accessible for repeated analyses. Sequencing experiments have become routine in laboratories around the world, while analysis of the resulting data is still a challenge for the majority of scientists, since in many cases it requires advanced computational skills. Even though various tools are being created and published, guidelines for their selection are often not clear, especially to non-bioinformaticians with limited experience in computational analyses. Separate tools are often used for individual steps in the analysis, and these can be challenging to manage and integrate. However, in some instances, tools are combined into pipelines that are capable of completing all the essential steps to achieve the result. In the case of DNA methylation sequencing analysis, the goal of such a pipeline is to map sequencing reads, calculate methylation levels, and distinguish differentially methylated positions and/or regions. The objective of this review is to describe basic principles and steps in the analysis of DNA methylation sequencing data that in particular have been used for mammalian genomes, and more importantly to present and discuss the most prominent computational pipelines that can be used to analyze such data. We aim to provide a good starting point for scientists with limited experience in computational analyses of DNA methylation and hydroxymethylation data, and recommend a few tools that are powerful, but still easy enough to use for their own data analysis.


Subject(s)
Computational Biology/methods , DNA Methylation , Sequence Analysis, DNA/methods , Data Analysis , Epigenesis, Genetic , Genome, Human , Humans , Software
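
The "calculate methylation levels" step mentioned in the abstract above is commonly expressed, for bisulfite sequencing, as the fraction of reads supporting methylation at a cytosine. The Python sketch below illustrates that ratio and a naive per-position difference between two samples; the function names and read counts are hypothetical, and the difference is not a statistical test for differential methylation.

```python
from typing import Optional


def methylation_level(methylated_reads: int, unmethylated_reads: int) -> Optional[float]:
    """Fraction of reads supporting methylation at one position.

    Returns None when the position has no coverage.
    """
    if methylated_reads < 0 or unmethylated_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = methylated_reads + unmethylated_reads
    if total == 0:
        return None
    return methylated_reads / total


def methylation_difference(level_a: Optional[float],
                           level_b: Optional[float]) -> Optional[float]:
    """Difference in methylation level (a minus b), or None if either is missing."""
    if level_a is None or level_b is None:
        return None
    return level_a - level_b


# Hypothetical read counts at one cytosine in two samples.
tumor_level = methylation_level(8, 2)
normal_level = methylation_level(2, 8)
difference = methylation_difference(tumor_level, normal_level)
```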
18.
BMC Bioinformatics ; 19(1): 533, 2018 Dec 19.
Article in English | MEDLINE | ID: mdl-30567492

ABSTRACT

BACKGROUND: Almost 16,000 human long non-coding RNA (lncRNA) genes have been identified in the GENCODE project. However, the function of most of them remains to be discovered. The function of lncRNAs and other novel genes can be predicted by identifying significantly enriched annotation terms in already annotated genes that are co-expressed with the lncRNAs. However, such approaches are sensitive to the methods that are used to estimate the level of co-expression. RESULTS: We have tested and compared two well-known statistical metrics (Pearson and Spearman) and two geometrical metrics (Sobolev and Fisher) for identification of the co-expressed genes, using experimental expression data across 19 normal human tissues. We have also used a benchmarking approach based on semantic similarity to evaluate how well these methods are able to predict annotation terms, using a well-annotated set of protein-coding genes. CONCLUSION: This work shows that geometrical metrics, in particular in combination with the statistical metrics, will predict annotation terms more efficiently than traditional approaches. Tests on selected lncRNAs confirm that it is possible to predict the function of these genes given a reliable set of expression data. The software used for this investigation is freely available.


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Gene Expression Regulation , Molecular Sequence Annotation , RNA, Long Noncoding/metabolism , Software , High-Throughput Nucleotide Sequencing , Humans , RNA, Long Noncoding/genetics
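
Of the four co-expression metrics compared in the abstract above, the two statistical ones (Pearson and Spearman) can be sketched in plain Python as follows. The Sobolev and Fisher metrics are not reproduced here, and the expression vectors are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / sqrt(var_x * var_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Hypothetical expression of a lncRNA and a coding gene across five tissues.
lncrna = [1, 2, 3, 4, 5]
coding = [1, 4, 9, 16, 25]
r_pearson = pearson(lncrna, coding)
r_spearman = spearman(lncrna, coding)
```

In this example the relationship is monotonic but not linear, so the Spearman coefficient is 1 while the Pearson coefficient is below 1, which illustrates why the choice of metric changes which genes count as co-expressed.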
19.
F1000Res ; 7, 2018.
Article in English | MEDLINE | ID: mdl-30271575

ABSTRACT

The Norwegian e-Infrastructure for Life Sciences (NeLS) has been developed by ELIXIR Norway to provide its users with a system enabling data storage, sharing, and analysis in a project-oriented fashion. The system is available through easy-to-use web interfaces, including the Galaxy workbench for data analysis and workflow execution. Users confident with a command-line interface and programming may also access it through Secure Shell (SSH) and application programming interfaces (APIs). NeLS has been in production since 2015, with training and support provided by the help desk of ELIXIR Norway. Through collaboration with NorSeq, the national consortium for high-throughput sequencing, an integrated service is offered so that sequencing data generated in a research project is provided to the involved researchers through NeLS. Sensitive data, such as individual genomic sequencing data, are handled using the TSD (Services for Sensitive Data) platform provided by Sigma2 and the University of Oslo. NeLS integrates national e-infrastructure storage and computing resources, and is also integrated with the SEEK platform in order to store large data files produced by experiments described in SEEK. In this article, we outline the architecture of NeLS and discuss possible directions for further development.


Subject(s)
Biological Science Disciplines , Database Management Systems , High-Throughput Nucleotide Sequencing , Humans , Information Dissemination/methods , Information Storage and Retrieval/methods , Norway
20.
FEBS Open Bio ; 8(7): 1135-1145, 2018 Jul.
Article in English | MEDLINE | ID: mdl-29988559

ABSTRACT

Proliferating cell nuclear antigen (PCNA), a member of the highly conserved DNA sliding clamp family, is an essential protein for cellular processes including DNA replication and repair. A large number of proteins from higher eukaryotes contain one of two PCNA-interacting motifs: PCNA-interacting protein box (PIP box) and AlkB homologue 2 PCNA-interacting motif (APIM). APIM has been shown to be especially important during cellular stress. PIP box is known to be functionally conserved in yeast, and here, we show that this is also the case for APIM. Several of the 84 APIM-containing yeast proteins are associated with cellular signaling as hub proteins, which are able to interact with a large number of other proteins. Cellular signaling is highly conserved throughout evolution, and we recently suggested a novel role for PCNA as a scaffold protein in cellular signaling in human cells. A cell-penetrating peptide containing the APIM sequence increases the sensitivity toward the chemotherapeutic agent cisplatin in both yeast and human cells, and both yeast and human cells become hypersensitive when the Hog1/p38 MAPK pathway is blocked. These results suggest that the interactions between APIM-containing signaling proteins and PCNA during the DNA damage response are evolutionarily conserved between yeast and mammals and that PCNA has a role in cellular signaling also in yeast.
