1.
Cancers (Basel) ; 12(2)2020 Feb 10.
Article in English | MEDLINE | ID: mdl-32050665

ABSTRACT

The authors wish to make the following corrections to this paper [1]: The authors would like to replace Table 3 in [1]. The corrections fix typographical errors introduced when translating our database from BIC format to HGVS nomenclature, and remove four carriers who had zero follow-up time. [...].

2.
Cancers (Basel) ; 11(2)2019 Jan 23.
Article in English | MEDLINE | ID: mdl-30678073

ABSTRACT

Background: We have previously demonstrated that the Norwegian frequent pathogenic BRCA1 (path_BRCA1) variants are caused by genetic drift and recurrent de-novo mutations. We here examined the penetrance of frequent path_BRCA1 variants in fertile ages as a surrogate marker for fitness. Material and methods: We conducted an observational prospective study of penetrance for cancer in Norwegian female carriers of frequent path_BRCA1 variants, and compared our observed results to penetrance of infrequent path_BRCA1 variants and to average penetrance of path_BRCA1 variants reported by others. Results: The cumulative risk for breast cancer at 45 years in carriers of frequent path_BRCA1 variants was 20% (94% confidence interval 10-30%), compared to 35% (95% confidence interval 22-48%) in carriers of infrequent path_BRCA1 variants (p = 0.02), and to the 35% (confidence interval 32-39%) average for path_BRCA1 carriers reported by others (p = 0.0001). Discussion and conclusion: Carriers of the most frequent Norwegian path_BRCA1 variants had low incidence of cancer in fertile ages, indicating a low selective disadvantage. This, together with the variant locations being hotspots for de novo mutations and subject to genetic drift, as previously described, may have caused their high prevalence today. Besides being of theoretical interest to explain the phenomenon that a few path_BRCA1 variants are frequent, the later onset of breast cancer associated with the most frequent path_BRCA1 variants may be of interest for carriers who have to decide if and when to select prophylactic mastectomy.
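Cumulative risk at a given age is commonly read from a survival curve as one minus the probability of remaining cancer-free. The abstract does not state which estimator was used, so the following is only a minimal Kaplan-Meier sketch, assuming each carrier is described by a hypothetical age at inclusion, an age at diagnosis or last follow-up, and an event flag. Because observation starts at inclusion, carriers enter the risk set late (delayed entry); carriers with zero follow-up time never enter it.

```python
def km_cumulative_risk(entry_ages, exit_ages, events, at_age):
    """Kaplan-Meier cumulative risk 1 - S(at_age), allowing delayed entry.

    entry_ages: age at inclusion (hypothetical input)
    exit_ages:  age at diagnosis or last follow-up
    events:     1 if cancer was diagnosed at exit_age, 0 if censored
    """
    surv = 1.0
    event_ages = sorted({x for x, e in zip(exit_ages, events) if e and x <= at_age})
    for t in event_ages:
        # At risk at age t: included before t and still under observation at t.
        at_risk = sum(1 for s, x in zip(entry_ages, exit_ages) if s < t <= x)
        # Only count diagnoses among carriers who were actually at risk,
        # so zero follow-up carriers (entry == exit) contribute nothing.
        diagnoses = sum(
            1 for s, x, e in zip(entry_ages, exit_ages, events) if e and x == t and s < t
        )
        if diagnoses:
            surv *= 1.0 - diagnoses / at_risk
    return 1.0 - surv
```

For example, four carriers followed from birth to ages 1, 2, 3 and 4, with diagnoses at ages 1 and 3, give a cumulative risk of 0.25 at age 2 and 0.625 at age 3.5.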

3.
Gut ; 67(7): 1306-1316, 2018 07.
Article in English | MEDLINE | ID: mdl-28754778

ABSTRACT

BACKGROUND: Most patients with path_MMR gene variants (Lynch syndrome (LS)) now survive both their first and subsequent cancers, resulting in a growing number of older patients with LS for whom limited information exists with respect to cancer risk and survival. OBJECTIVE AND DESIGN: This observational, international, multicentre study aimed to determine prospectively observed incidences of cancers and survival in path_MMR carriers up to 75 years of age. RESULTS: 3119 patients were followed for a total of 24 475 years. Cumulative incidences at 75 years (risks) for colorectal cancer were 46%, 43% and 15% in path_MLH1, path_MSH2 and path_MSH6 carriers; for endometrial cancer 43%, 57% and 46%; for ovarian cancer 10%, 17% and 13%; for upper gastrointestinal (gastric, duodenal, bile duct or pancreatic) cancers 21%, 10% and 7%; for urinary tract cancers 8%, 25% and 11%; for prostate cancer 17%, 32% and 18%; and for brain tumours 1%, 5% and 1%, respectively. Ovarian cancer occurred mainly premenopausally. By contrast, upper gastrointestinal, urinary tract and prostate cancers occurred predominantly at older ages. Overall 5-year survival for prostate cancer was 100%, urinary bladder 93%, ureter 85%, duodenum 67%, stomach 61%, bile duct 29%, brain 22% and pancreas 0%. Path_PMS2 carriers had lower risk for cancer. CONCLUSION: Carriers of different path_MMR variants exhibit distinct patterns of cancer risk and survival as they age. Risk estimates for counselling and planning of surveillance and treatment should be tailored to each patient's age, gender and path_MMR variant. We have updated our open-access website www.lscarisk.org to facilitate this.


Subject(s)
Colonic Neoplasms/epidemiology , Colorectal Neoplasms, Hereditary Nonpolyposis/complications , Colorectal Neoplasms, Hereditary Nonpolyposis/mortality , Pancreatic Neoplasms/epidemiology , Urogenital Neoplasms/epidemiology , Age Factors , Aged , Colorectal Neoplasms, Hereditary Nonpolyposis/pathology , Databases, Factual , Female , Humans , Incidence , Male , Prospective Studies
4.
Oncotarget ; 8(44): 76290-76304, 2017 09 29.
Article in English | MEDLINE | ID: mdl-29100312

ABSTRACT

Background: Metastatic colorectal cancer (CRC) is associated with highly variable clinical outcome and response to therapy. The recently identified consensus molecular subtypes (CMS1-4) have prognostic and therapeutic implications in primary CRC, but whether these subtypes are valid for metastatic disease is unclear. We performed multi-level analyses of resectable CRC liver metastases (CLM) to identify molecular characteristics of metastatic disease and evaluate the clinical relevance. Methods: In this ancillary study to the Oslo-CoMet trial, CLM and tumor-adjacent liver tissue from 46 patients were analyzed by profiling mutations (targeted sequencing), genome-wide copy number alterations (CNAs), and gene expression. Results: Somatic mutations and CNAs detected in CLM were similar to reported primary CRC profiles, while CNA profiles of eight metastatic pairs suggested intra-patient divergence. A CMS classifier tool applied to gene expression data revealed the cohort to be highly enriched for CMS2. Hierarchical clustering of genes with highly variable expression identified two subgroups separated by high or low expression of 55 genes with immune-related and metabolic functions. Importantly, induction of genes and pathways associated with immunogenic cell death (ICD) was identified in metastases exposed to neoadjuvant chemotherapy (NACT). Conclusions: The uniform classification of CLM by CMS subtyping may indicate that novel class discovery approaches need to be explored to uncover clinically useful stratification of CLM. Detected gene expression signatures support the role of metabolism and chemotherapy in shaping the immune microenvironment of CLM. Furthermore, the results point to rational exploration of immune modulating strategies in CLM, particularly by exploiting NACT-induced ICD.

5.
Article in English | MEDLINE | ID: mdl-29046738

ABSTRACT

BACKGROUND: We have previously reported a high incidence of colorectal cancer (CRC) in carriers of pathogenic MLH1 variants (path_MLH1) despite follow-up with colonoscopy including polypectomy. METHODS: The cohort included Finnish carriers enrolled in 3-yearly colonoscopy (n = 505; 4625 observation years) and carriers from other countries enrolled in colonoscopy 2-yearly or more frequently (n = 439; 3299 observation years). We examined whether the longer interval between colonoscopies in Finland could explain the high incidence of CRC and whether disease expression correlated with differences in population CRC incidence. RESULTS: Cumulative CRC incidences in carriers of path_MLH1 at 70 years of age were 41% for males and 36% for females in the Finnish series and 58% and 55% in the non-Finnish series, respectively (p > 0.05). Mean time from last colonoscopy to CRC was 32.7 months in the Finnish compared to 31.0 months in the non-Finnish (p > 0.05) and was therefore unaffected by the recommended colonoscopy interval. Differences in population incidence of CRC could not explain the lower point estimates for CRC in the Finnish series. Ten-year overall survival after CRC was similar for the Finnish and non-Finnish series (88% and 91%, respectively; p > 0.05). CONCLUSIONS: The hypothesis that the high incidence of CRC in path_MLH1 carriers was caused by a higher incidence in the Finnish series was not valid. We discuss whether the results were influenced by methodological shortcomings in our study or whether the assumption that a shorter interval between colonoscopies leads to a lower CRC incidence may be wrong. This second possibility is intriguing, because it suggests the dogma that CRC in path_MLH1 carriers develops from polyps that can be detected at colonoscopy and removed to prevent CRC may be erroneous.
In view of the excellent 10-year overall survival in the Finnish and non-Finnish series we remain strong advocates of current surveillance practices for those with LS pending studies that will inform new recommendations on the best surveillance interval.

6.
Biostatistics ; 18(3): 586-587, 2017 07 01.
Article in English | MEDLINE | ID: mdl-28334081
7.
Gut ; 66(3): 464-472, 2017 03.
Article in English | MEDLINE | ID: mdl-26657901

ABSTRACT

OBJECTIVE: Estimates of cancer risk and the effects of surveillance in Lynch syndrome have been subject to bias, partly through reliance on retrospective studies. We sought to establish more robust estimates in patients undergoing prospective cancer surveillance. DESIGN: We undertook a multicentre study of patients carrying Lynch syndrome-associated mutations affecting MLH1, MSH2, MSH6 or PMS2. Standardised information on surveillance, cancers and outcomes was collated in an Oracle relational database and analysed by age, sex and mutated gene. RESULTS: 1942 mutation carriers without previous cancer had follow-up including colonoscopic surveillance for 13 782 observation years. 314 patients developed cancer, mostly colorectal (n=151), endometrial (n=72) and ovarian (n=19). Cancers were detected from 25 years onwards in MLH1 and MSH2 mutation carriers, and from about 40 years in MSH6 and PMS2 carriers. Among the first cancers detected in each patient, the colorectal cancer cumulative incidences at 70 years by gene were 46%, 35%, 20% and 10% for MLH1, MSH2, MSH6 and PMS2 mutation carriers, respectively. The equivalent cumulative incidences for endometrial cancer were 34%, 51%, 49% and 24%; and for ovarian cancer 11%, 15%, 0% and 0%. Ten-year crude survival was 87% after any cancer, 91% if the first cancer was colorectal, 98% if endometrial and 89% if ovarian. CONCLUSIONS: The four Lynch syndrome-associated genes had different penetrance and expression. Colorectal cancer occurred frequently despite colonoscopic surveillance but resulted in few deaths. Using our data, a website has been established at http://LScarisk.org enabling calculation of cumulative cancer risks as an aid to genetic counselling in Lynch syndrome.


Subject(s)
Colorectal Neoplasms, Hereditary Nonpolyposis/epidemiology , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Endometrial Neoplasms/epidemiology , Ovarian Neoplasms/epidemiology , Population Surveillance , Adolescent , Adult , Age Factors , Aged , Aged, 80 and over , Child , Colonoscopy , Colorectal Neoplasms, Hereditary Nonpolyposis/diagnostic imaging , Colorectal Neoplasms, Hereditary Nonpolyposis/mortality , DNA-Binding Proteins/genetics , Databases, Factual , Endometrial Neoplasms/mortality , Female , Gene Expression , Heterozygote , Humans , Incidence , Male , Middle Aged , Mismatch Repair Endonuclease PMS2/genetics , MutL Protein Homolog 1/genetics , MutS Homolog 2 Protein/genetics , Ovarian Neoplasms/mortality , Prospective Studies , Survival Rate , Young Adult
8.
Gut ; 66(9): 1657-1664, 2017 09.
Article in English | MEDLINE | ID: mdl-27261338

ABSTRACT

OBJECTIVE: Today most patients with Lynch syndrome (LS) survive their first cancer. There is limited information on the incidences and outcome of subsequent cancers. The present study addresses three questions: (i) what is the cumulative incidence of a subsequent cancer; (ii) in which organs do subsequent cancers occur; and (iii) what is the survival following these cancers? DESIGN: Information was collated on prospectively organised surveillance and prospectively observed outcomes in patients with LS who had cancer prior to inclusion and analysed by age, gender and genetic variants. RESULTS: 1273 patients with LS from 10 countries were followed up for 7753 observation years. 318 patients (25.7%) developed 341 first subsequent cancers, including colorectal (n=147, 43%), upper GI, pancreas or bile duct (n=37, 11%) and urinary tract (n=32, 10%). The cumulative incidences for any subsequent cancer from age 40 to age 70 years were 73% for pathogenic MLH1 (path_MLH1), 76% for path_MSH2 carriers and 52% for path_MSH6 carriers, and for colorectal cancer (CRC) the cumulative incidences were 46%, 48% and 23%, respectively. Crude survival after any subsequent cancer was 82% (95% CI 76% to 87%) and 10-year crude survival after CRC was 91% (95% CI 83% to 95%). CONCLUSIONS: Relative incidence of subsequent cancer compared with incidence of first cancer was slightly, but not significantly, higher than cancer incidence in patients with LS without previous cancer (range 0.94-1.49). The favourable survival after subsequent cancers validated continued follow-up to prevent death from cancer. The interactive website http://lscarisk.org was expanded to calculate the risks by gender, genetic variant and age for subsequent cancer for any patient with LS with previous cancer.


Subject(s)
Colonic Neoplasms , Colorectal Neoplasms, Hereditary Nonpolyposis , DNA-Binding Proteins/genetics , MutL Protein Homolog 1/genetics , MutS Homolog 2 Protein/genetics , Adult , Aged , Colonic Neoplasms/genetics , Colonic Neoplasms/pathology , Colorectal Neoplasms, Hereditary Nonpolyposis/epidemiology , Colorectal Neoplasms, Hereditary Nonpolyposis/genetics , Colorectal Neoplasms, Hereditary Nonpolyposis/pathology , DNA Mismatch Repair/genetics , Disease Progression , Europe/epidemiology , Female , Genetic Variation , Germ-Line Mutation , Humans , Incidence , Male , Middle Aged , Neoplasm Staging , Risk Assessment/methods , Risk Assessment/statistics & numerical data , Survival Analysis
9.
Biostatistics ; 17(1): 29-39, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26272994

ABSTRACT

Removal of, or adjustment for, batch effects or center differences is generally required when such effects are present in data. In particular, when preparing microarray gene expression data from multiple cohorts, array platforms, or batches for later analyses, batch effects can have confounding effects, inducing spurious differences between study groups. Many methods and tools exist for removing batch effects from data. However, when study groups are not evenly distributed across batches, actual group differences may induce apparent batch differences, in which case batch adjustments may bias, usually deflate, group differences. Some tools therefore have the option of preserving the difference between study groups, e.g. using a two-way ANOVA model to simultaneously estimate both group and batch effects. Unfortunately, this approach may systematically induce incorrect group differences in downstream analyses when groups are distributed between the batches in an unbalanced manner. The scientific community seems to be largely unaware of how this approach may lead to false discoveries.
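The problem described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the authors' code: a hypothetical design with two batches of 30 samples, two study groups of 30 samples, null data with no true group effect, and a batch shift of 1.0. Each gene is fitted with a two-way model (intercept, group, batch) by least squares, only the estimated batch effect is subtracted, and an ordinary two-sample t-test is then run on the adjusted values, which is the downstream step the abstract warns about.

```python
import numpy as np


def false_positive_rate(a_in_batch1, n_genes=2000, seed=0):
    """Share of null genes with |t| > 2 after group-preserving batch adjustment.

    Group A has a_in_batch1 samples in batch 1 and 30 - a_in_batch1 in batch 2;
    a_in_batch1 = 15 is the balanced design (hypothetical setup).
    """
    rng = np.random.default_rng(seed)
    k = a_in_batch1
    group = np.array([1] * k + [0] * (30 - k) + [1] * (30 - k) + [0] * k)
    batch = np.array([0] * 30 + [1] * 30)
    n = group.size
    # Null data: no group effect, a batch shift of 1.0 and unit-variance noise.
    Y = rng.normal(size=(n, n_genes)) + 1.0 * batch[:, None]
    # Two-way model per gene (intercept + group + batch), fitted by least squares.
    X = np.column_stack([np.ones(n), group, batch])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    # Subtract only the estimated batch effect, "preserving" the group term.
    Y_adj = Y - np.outer(batch, beta[2])
    # Downstream analysis: pooled-variance two-sample t-test per gene.
    a, b = Y_adj[group == 1], Y_adj[group == 0]
    na, nb = len(a), len(b)
    pooled = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(pooled * (1 / na + 1 / nb))
    # |t| > 2.0 approximates a two-sided 5% test with 58 degrees of freedom.
    return float(np.mean(np.abs(t) > 2.0))
```

Comparing `false_positive_rate(25)` (unbalanced) with `false_positive_rate(15)` (balanced) should show the unbalanced design rejecting well above the nominal 5% while the balanced design stays close to it. The t-test treats the estimated batch effect as known, so it underestimates the variance of the group difference whenever groups and batches are confounded.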


Subject(s)
Data Interpretation, Statistical , Microarray Analysis/standards , Humans , Microarray Analysis/methods , Reproducibility of Results
10.
Breast Cancer Res ; 17: 29, 2015 Feb 26.
Article in English | MEDLINE | ID: mdl-25849221

ABSTRACT

INTRODUCTION: Breast cancer is commonly classified into intrinsic molecular subtypes. Standard gene centering is routinely done prior to molecular subtyping, but it can produce inaccurate classifications when the distribution of clinicopathological characteristics in the study cohort differs from that of the training cohort used to derive the classifier. METHODS: We propose a subgroup-specific gene-centering method to perform molecular subtyping on a study cohort that has a skewed distribution of clinicopathological characteristics relative to the training cohort. On such a study cohort, we center each gene on a specified percentile, where the percentile is determined from a subgroup of the training cohort with clinicopathological characteristics similar to the study cohort. We demonstrate our method using the PAM50 classifier and its associated University of North Carolina (UNC) training cohort. We considered study cohorts with skewed clinicopathological characteristics, including subgroups composed of a single prototypic subtype of the UNC-PAM50 training cohort (n = 139), an external estrogen receptor (ER)-positive cohort (n = 48) and an external triple-negative cohort (n = 77). RESULTS: Subgroup-specific gene centering improved prediction performance, with accuracies between 77% and 100%, compared to accuracies between 17% and 33% from standard gene centering, when applied to the prototypic tumor subsets of the PAM50 training cohort. It reduced classification error rates on the ER-positive (11% versus 28%; P = 0.0389), the ER-negative (5% versus 41%; P < 0.0001) and the triple-negative (11% versus 56%; P = 0.1336) subgroups of the PAM50 training cohort. In addition, it produced higher accuracy for subtyping study cohorts composed of varying proportions of ER-positive versus ER-negative cases. Finally, it increased the percentage of assigned luminal subtypes on the external ER-positive cohort and basal-like subtype on the external triple-negative cohort.
CONCLUSIONS: Gene centering is often necessary to accurately apply a molecular subtype classifier. Compared with standard gene centering, our proposed subgroup-specific gene centering produced more accurate molecular subtype assignments in a study cohort with skewed clinicopathological characteristics relative to the training cohort.
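A minimal sketch of percentile-based centering follows. It rests on one plausible reading of the method, which the abstract does not spell out: for each gene, find the percentile at which the full training cohort's median falls within the matching training subgroup, then center the study cohort on that same percentile instead of on its own median. The function and argument names are hypothetical.

```python
import numpy as np


def subgroup_specific_center(train, subgroup_mask, study):
    """Center study genes on a per-gene percentile learned from a training subgroup.

    train: samples x genes training matrix
    subgroup_mask: boolean mask of training samples resembling the study cohort
    study: samples x genes study matrix (same gene order as train)
    """
    train_median = np.median(train, axis=0)
    sub = train[subgroup_mask]
    # Percentile (0-100) of the training-cohort median within the subgroup.
    pct = 100.0 * np.mean(sub < train_median, axis=0)
    # Standard centering would subtract the study median (50th percentile);
    # here each gene is centered on its subgroup-derived percentile instead.
    centers = np.array([np.percentile(study[:, j], pct[j]) for j in range(study.shape[1])])
    return study - centers
```

For a study cohort that resembles a skewed training subgroup (for example, mostly high-expressing samples), this subtracts a value close to the training-cohort median rather than the study cohort's own, shifted median.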


Subject(s)
Biomarkers, Tumor/genetics , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Gene Expression Profiling , Molecular Typing , Cohort Studies , Datasets as Topic , Female , Gene Expression Profiling/methods , Gene Expression Regulation, Neoplastic , Humans , Molecular Typing/methods , Prognosis , Receptors, Estrogen/genetics
11.
BMC Cancer ; 14: 211, 2014 Mar 19.
Article in English | MEDLINE | ID: mdl-24645668

ABSTRACT

BACKGROUND: The aim was to assess and compare prognostic power of nine breast cancer gene signatures (Intrinsic, PAM50, 70-gene, 76-gene, Genomic-Grade-Index, 21-gene-Recurrence-Score, EndoPredict, Wound-Response and Hypoxia) in relation to ER status and follow-up time. METHODS: A gene expression dataset from 947 breast tumors was used to evaluate the signatures for prediction of Distant Metastasis Free Survival (DMFS). A total of 912 patients had available DMFS status. The recently published METABRIC cohort was used as an additional validation set. RESULTS: Survival predictions were fairly concordant across most signatures. Prognostic power declined with follow-up time. During the first 5 years of follow-up, all signatures except for Hypoxia were predictive for DMFS in ER-positive disease, and 76-gene, Hypoxia and Wound-Response were prognostic in ER-negative disease. After 5 years, the signatures had little prognostic power. Gene signatures provide significant prognostic information beyond tumor size, node status and histological grade. CONCLUSIONS: Generally, these signatures performed better for ER-positive disease, indicating that risk within each ER stratum is driven by distinct underlying biology. Most of the signatures were strong risk predictors for DMFS during the first 5 years of follow-up. Combining gene signatures with histological grade or tumor size could improve the prognostic power, perhaps also of long-term survival.


Subject(s)
Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Databases, Genetic , Gene Expression Profiling/methods , Receptors, Estrogen/genetics , Breast Neoplasms/mortality , Cohort Studies , Female , Follow-Up Studies , Humans , Prognosis , Receptors, Estrogen/biosynthesis , Reproducibility of Results , Survival Rate/trends , Time Factors
12.
BMC Bioinformatics ; 14: 313, 2013 Oct 23.
Article in English | MEDLINE | ID: mdl-24152242

ABSTRACT

BACKGROUND: Processing of reads from high-throughput sequencing is often done in terms of edges in the de Bruijn graph representing all k-mers from the reads. The memory requirements for storing all k-mers in a lookup table can be demanding, even after removal of read errors, but can be alleviated by using a memory-efficient data structure. RESULTS: The FM-index, which is based on the Burrows-Wheeler transform, provides a searchable index of all substrings from a set of strings, and is used to compactly represent full genomes for use in mapping reads to a genome: the memory required to store this is of the same order of magnitude as the strings themselves. However, reads from high-throughput sequencing mostly have high coverage and so contain the same substrings multiple times from different reads. I here present a modification of the FM-index, which I call the kFM-index, for indexing the set of k-mers from the reads. For DNA sequences, this requires 5 bits of information for each vertex of the corresponding de Bruijn subgraph, i.e. for each distinct (k-1)-mer, plus some additional overhead, typically 0.5 to 1 bit per vertex, for storing the equivalent of the FM-index for walking the underlying de Bruijn graph and reproducing the actual k-mers efficiently. CONCLUSIONS: The kFM-index could replace more memory-demanding data structures for storing the de Bruijn k-mer graph representation of sequence reads. A Java implementation with additional technical documentation is provided which demonstrates the applicability of the data structure (http://folk.uio.no/einarro/Projects/KFM-index/).
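The kFM-index is a modification of the FM-index, so the underlying idea can be shown with the plain FM-index. The sketch below builds a Burrows-Wheeler transform by sorting rotations and counts pattern occurrences with backward search. It is the standard FM-index on a single string, not the kFM-index itself, and it computes rank counts by scanning instead of using the sampled tables a memory-efficient implementation would keep.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (fine for short demos)."""
    s = text + "$"  # "$" sorts before A, C, G and T and marks the end of the text
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def count_occurrences(bwt_str, pattern):
    """FM-index backward search: number of times pattern occurs in the text."""
    # C[c]: number of characters in the text that sort strictly before c.
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    def occ(ch, i):
        # Occurrences of ch in bwt_str[:i]; a real FM-index samples these counts.
        return bwt_str.count(ch, 0, i)

    lo, hi = 0, len(bwt_str)  # half-open range of sorted rotations
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ(ch, lo)
        hi = C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `count_occurrences(bwt("ACGTACGA"), "ACG")` returns 2. The search walks the pattern from right to left, narrowing a range of sorted rotations at each step, which is the same "walking" the kFM-index adapts to traverse de Bruijn graph edges.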


Subject(s)
Algorithms , Genomics/methods , Sequence Analysis, DNA/methods
13.
Mol Cell Proteomics ; 12(6): 1723-34, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23438732

ABSTRACT

Protein complexes enact most biochemical functions in the cell. Dynamic interactions between protein complexes are frequent in many cellular processes. As they are often of a transient nature, they may be difficult to detect using current genome-wide screens. Here, we describe a method to computationally predict physical interactions between protein complexes, applied to both humans and yeast. We integrated manually curated protein complexes and physical protein interaction networks, and we designed a statistical method to identify pairs of protein complexes where the number of protein interactions between a complex pair is due to an actual physical interaction between the complexes. An evaluation against manually curated physical complex-complex interactions in yeast revealed that 50% of these interactions could be predicted in this manner. A community network analysis of the highest scoring pairs revealed a biologically sensible organization of physical complex-complex interactions in the cell. Such analyses of proteomes may serve as a guide to the discovery of novel functional cellular relationships.
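The abstract does not give the statistic used to decide whether a complex pair has more protein interactions than chance would explain. As a hypothetical illustration only, not the authors' likelihood-based method, one simple null model treats every cross-complex protein pair as interacting independently with the network-wide interaction density, and asks how surprising the observed count is via a binomial tail probability. It also assumes the two complexes share no proteins.

```python
from math import comb


def complex_pair_pvalue(size_a, size_b, observed, density):
    """P(X >= observed) for X ~ Binomial(size_a * size_b, density).

    Hypothetical null model: each of the size_a * size_b cross-complex protein
    pairs interacts independently with probability `density`; the complexes
    are assumed to be disjoint.
    """
    n = size_a * size_b
    return sum(
        comb(n, k) * density**k * (1 - density) ** (n - k)
        for k in range(observed, n + 1)
    )
```

A small p-value flags a complex pair whose cross interactions are denser than the background network, which makes it a candidate for a physical complex-complex interaction. A real analysis would also need to handle shared subunits and multiple testing across all pairs.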


Subject(s)
Algorithms , Protein Interaction Mapping/statistics & numerical data , Protein Interaction Maps , Proteome/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Databases, Protein , Humans , Likelihood Functions , Protein Binding , Protein Multimerization , Saccharomyces cerevisiae/chemistry
14.
PLoS One ; 6(3): e17845, 2011 Mar 10.
Article in English | MEDLINE | ID: mdl-21423775

ABSTRACT

BACKGROUND: Several gene sets for prediction of breast cancer survival have been derived from whole-genome mRNA expression profiles. Here, we develop a statistical framework to explore whether combination of the information from such sets may improve prediction of recurrence and breast cancer specific death in early-stage breast cancers. Microarray data from two clinically similar cohorts of breast cancer patients are used as training (n = 123) and test set (n = 81), respectively. Gene sets from eleven previously published gene signatures are included in the study. PRINCIPAL FINDINGS: To investigate the relationship between breast cancer survival and gene expression on a particular gene set, a Cox proportional hazards model is applied using partial likelihood regression with an L2 penalty to avoid overfitting and using cross-validation to determine the penalty weight. The fitted models are applied to an independent test set to obtain a predicted risk for each individual and each gene set. Hierarchical clustering of the test individuals on the basis of the vector of predicted risks results in two clusters with distinct clinical characteristics in terms of the distribution of molecular subtypes, ER, PR status, TP53 mutation status and histological grade category, and associated with significantly different survival probabilities (recurrence: p = 0.005; breast cancer death: p = 0.014). Finally, principal components analysis of the gene signatures is used to derive combined predictors used to fit a new Cox model. This model classifies test individuals into two risk groups with distinct survival characteristics (recurrence: p = 0.003; breast cancer death: p = 0.001). The latter classifier outperforms all the individual gene signatures, as well as Cox models based on traditional clinical parameters and Adjuvant! Online for survival prediction. CONCLUSION: Combining the predictive strength of multiple gene signatures improves prediction of breast cancer survival.
The presented methodology is broadly applicable to breast cancer risk assessment using any newly identified gene set.


Subject(s)
Breast Neoplasms/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Breast Neoplasms/diagnosis , Chromosome Mapping , Cluster Analysis , Endpoint Determination , Female , Genes, Neoplasm/genetics , Humans , Kaplan-Meier Estimate , Models, Genetic , Multivariate Analysis , Principal Component Analysis , Prognosis , Proportional Hazards Models , Risk Factors
15.
J Mol Evol ; 70(3): 266-74, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20213140

ABSTRACT

The question of whether natural selection favors genetic stability or genetic variability is a fundamental problem in evolutionary biology. Bioinformatic analyses demonstrate that selection favors genetic stability by avoiding unstable nucleotide sequences in protein encoding DNA. Yet, such unstable sequences are maintained in several DNA repair genes, thereby promoting breakdown of repair and destabilizing the genome. Several studies have therefore argued that selection favors genetic variability at the expense of stability. Here we propose a new evolutionary mechanism, with supporting bioinformatic evidence, that resolves this paradox. Combining the concepts of gene-dependent mutation biases and meiotic recombination, we argue that unstable sequences in the DNA mismatch repair (MMR) genes are maintained by their own phenotype. In particular, we predict that human MMR maintains an overrepresentation of mononucleotide repeats (monorepeats) within and around the MMR genes. In support of this hypothesis, we report a 31% excess in monorepeats in 250 kb regions surrounding the seven MMR genes compared to all other RefSeq genes (1.75 vs. 1.34%, P = 0.0047), with a particularly high content in PMS2 (2.41%, P = 0.0047) and MSH6 (2.07%, P = 0.043). Based on a mathematical model of monorepeat frequency, we argue that the proposed mechanism may suffice to explain the observed excess of repeats around MMR genes. Our findings thus indicate that unstable sequences in MMR genes are maintained through evolution by the MMR mechanism. The evolutionary paradox of genetically unstable DNA repair genes may thus be explained by an equilibrium in which the phenotype acts back on its own genotype.
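The monorepeat content above is expressed as a percentage of sequence. The minimum run length that counts as a monorepeat is not stated in the abstract, so the sketch below assumes a hypothetical threshold of 6 identical nucleotides and reports the fraction of bases lying in such runs.

```python
import re


def monorepeat_fraction(seq, min_len=6):
    """Fraction of bases in runs of >= min_len identical nucleotides.

    min_len = 6 is an assumed threshold, not taken from the study.
    """
    seq = seq.upper()
    # ([ACGT]) captures one base; \1{min_len-1,} requires it to repeat.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq) if seq else 0.0
```

Applied to the 250 kb windows around each gene, such a function would give the per-region percentages being compared. The reported 31% excess follows directly from the two averages: 1.75 / 1.34 - 1 is about 0.31.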


Subject(s)
Base Sequence/physiology , DNA Repair/genetics , Genetic Variation/physiology , Genomic Instability/physiology , Evolution, Molecular , Gene Frequency , Genes/physiology , Humans , Models, Biological , Models, Genetic , Models, Theoretical , Phenotype , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA
16.
Bioinformatics ; 25(8): 996-1003, 2009 Apr 15.
Article in English | MEDLINE | ID: mdl-19244388

ABSTRACT

MOTIVATION: Helix-helix interactions play a critical role in the structure assembly, stability and function of membrane proteins. On the molecular level, the interactions are mediated by one or more residue contacts. Although previous studies focused on helix-packing patterns and sequence motifs, few of them developed methods specifically for contact prediction. RESULTS: We present a new hierarchical framework for contact prediction, with an application in membrane proteins. The hierarchical scheme consists of two levels: in the first level, contact residues are predicted from the sequence and their pairing relationships are further predicted in the second level. Statistical analyses on contact propensities are combined with other sequence and structural information for training the support vector machine classifiers. Evaluated on 52 protein chains using leave-one-out cross validation (LOOCV) and an independent test set of 14 protein chains, the two-level approach consistently improves on the conventional direct approach in prediction accuracy, with 80% reduction of input for prediction. Furthermore, the predicted contacts are then used to infer interactions between pairs of helices. When at least three predicted contacts are required for an inferred interaction, the accuracy, sensitivity and specificity are 56%, 40% and 89%, respectively. Our results demonstrate that a hierarchical framework can be applied to eliminate false positives (FP) while reducing computational complexity in predicting contacts. Together with the estimated contact propensities, this method can be used to gain insights into helix-packing in membrane proteins.
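The last inference step, calling a helix-helix interaction when enough predicted residue contacts support it, can be sketched as follows. The input format (a list of predicted residue pairs and a residue-to-helix mapping) is an assumption for illustration; the threshold of three contacts is the one quoted in the abstract.

```python
from collections import Counter


def infer_helix_interactions(predicted_contacts, residue_to_helix, min_contacts=3):
    """Helix pairs supported by at least `min_contacts` predicted residue contacts.

    predicted_contacts: iterable of (residue, residue) pairs (assumed format)
    residue_to_helix: dict mapping residue id -> helix id (assumed format)
    """
    support = Counter()
    for r1, r2 in predicted_contacts:
        h1, h2 = residue_to_helix[r1], residue_to_helix[r2]
        if h1 != h2:  # only inter-helix contacts count
            support[tuple(sorted((h1, h2)))] += 1
    return {pair for pair, n in support.items() if n >= min_contacts}
```

Raising `min_contacts` trades sensitivity for specificity, which matches the pattern in the reported figures (40% sensitivity, 89% specificity at a threshold of three).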


Subject(s)
Computational Biology/methods , Membrane Proteins/chemistry , Databases, Protein , Membrane Proteins/metabolism , Models, Biological , Protein Structure, Secondary , Reproducibility of Results
17.
Nucleic Acids Res ; 35(9): 3100-8, 2007.
Article in English | MEDLINE | ID: mdl-17452365

ABSTRACT

The publication of a complete genome sequence is usually accompanied by annotations of its genes. In contrast to protein coding genes, genes for ribosomal RNA (rRNA) are often poorly or inconsistently annotated. This makes comparative studies based on rRNA genes difficult. We have therefore created computational predictors for the major rRNA species from all kingdoms of life and compiled them into a program called RNAmmer. The program uses hidden Markov models trained on data from the 5S ribosomal RNA database and the European ribosomal RNA database project. A pre-screening step makes the method fast with little loss of sensitivity, enabling the analysis of a complete bacterial genome in less than a minute. Results from running RNAmmer on a large set of genomes indicate that the location of rRNAs can be predicted with a very high level of accuracy. Novel, unannotated rRNAs are also predicted in many genomes. The software as well as the genome analysis results are available at the CBS web server.


Subject(s)
Genes, rRNA , Software , Computational Biology/methods , Genome, Bacterial , Genomics/methods , Markov Chains
18.
FEMS Immunol Med Microbiol ; 49(2): 243-51, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17284282

ABSTRACT

Neisseria meningitidis, or the meningococcus, is the source of significant morbidity and mortality in humans worldwide. Even though mutability has been linked to the occurrence of outbreaks of epidemic disease, meningococcal DNA repair pathways are poorly delineated. For the first time, a collection of meningococcal disease-associated isolates has been demonstrated to express constitutively the DNA glycosylases MutY and Fpg in vivo. DNA sequence analysis showed considerable variability in the deduced amino acid sequences of MutS and Fpg, while MutY and RecA were highly conserved. Interestingly, multi-locus sequence typing demonstrated a putative link between the pattern of amino acid substitutions and levels of spontaneous mutagenicity in meningococcal strains. These results provide a basis for further studies aimed at resolving the genotype/phenotype relationships of meningococcal genome variability and mutator activity.


Subject(s)
Bacterial Proteins/genetics , DNA Repair/genetics , Neisseria meningitidis/genetics , Anti-Bacterial Agents/pharmacology , Base Sequence , DNA Glycosylases/genetics , DNA, Bacterial/chemistry , DNA, Bacterial/genetics , DNA-Formamidopyrimidine Glycosylase/genetics , DNA-Formamidopyrimidine Glycosylase/metabolism , Drug Resistance, Bacterial , Humans , Hydrogen Peroxide/pharmacology , Microbial Viability , Molecular Sequence Data , MutS DNA Mismatch-Binding Protein/genetics , Mutation , Neisseria meningitidis/drug effects , Neisseria meningitidis/physiology , Oxidative Stress , Phylogeny , Polymorphism, Genetic , Rec A Recombinases/genetics , Rifampin/pharmacology , Sequence Homology, Amino Acid
19.
J Comput Biol ; 13(6): 1197-213, 2006.
Article in English | MEDLINE | ID: mdl-16901237

ABSTRACT

A number of non-coding RNAs are known to contain functionally important or conserved pseudoknots. However, pseudoknotted structures are more complex than orthodox ones, and most methods for analyzing secondary structures do not handle them. I present here a way to decompose and represent general secondary structures which extends the tree representation of the stem-loop structure, and use this to analyze the frequency of pseudoknots in known and in random secondary structures. This comparison shows that, though a number of pseudoknots exist, they are still relatively rare and mostly of the simpler kinds. In contrast, random secondary structures tend to be heavily knotted, and the number of available structures increases dramatically when allowing pseudoknots. Therefore, methods for structure prediction and non-coding RNA identification that allow pseudoknots are likely to be much less powerful than those that do not, unless they penalize pseudoknots appropriately.
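A secondary structure is pseudoknot-free exactly when its base pairs are properly nested, that is, when no two pairs (i, j) and (k, l) satisfy i < k < j < l. The sketch below uses this standard crossing criterion to detect pseudoknots in a list of base pairs; it does not reproduce the decomposition into an extended tree representation described in the abstract.

```python
def crossing_pairs(pairs):
    """Return all pairs of base pairs (i, j), (k, l) that cross: i < k < j < l."""
    # Normalize each pair to (smaller, larger) and sort by the 5' position.
    norm = sorted(tuple(sorted(p)) for p in pairs)
    crossings = []
    for a in range(len(norm)):
        i, j = norm[a]
        for b in range(a + 1, len(norm)):
            k, l = norm[b]
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


def is_pseudoknot_free(pairs):
    """True if the structure is fully nested (representable as a stem-loop tree)."""
    return not crossing_pairs(pairs)
```

A nested hairpin such as `[(1, 10), (2, 9), (3, 8)]` has no crossings, while a minimal H-type pseudoknot `[(1, 5), (3, 8)]` yields one crossing. Counting crossings like this in random pair sets illustrates why random structures tend to be heavily knotted.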


Subject(s)
Models, Molecular , Nucleic Acid Conformation , RNA/chemistry , Algorithms , Sequence Analysis, RNA