Results 1-19 of 19
1.
Biotechnol Biofuels ; 9: 252, 2016.
Article in English | MEDLINE | ID: mdl-27895706

ABSTRACT

BACKGROUND: Trichoderma reesei is one of the main sources of biomass-hydrolyzing enzymes for the biotechnology industry. There is a need to improve its enzyme production efficiency. The use of metabolic modeling for the simulation and prediction of this organism's metabolism is potentially a valuable tool for improving its capabilities. An accurate metabolic model is needed to perform metabolic modeling analysis. RESULTS: A whole-genome metabolic model of T. reesei has been reconstructed together with metabolic models of 55 related species using the metabolic model reconstruction algorithm CoReCo. The previously published CoReCo method has been improved to obtain better-quality models. The main improvements are the creation of a unified database of reactions and compounds and the use of reaction directions as constraints in the gap-filling step of the algorithm. In addition, the biomass composition of T. reesei has been measured experimentally to build and include a specific biomass equation in the model. CONCLUSIONS: The improvements presented in this work on the CoReCo pipeline for metabolic model reconstruction resulted in higher-quality metabolic models compared with previous versions. A metabolic model of T. reesei has been created and is publicly available in the BIOMODELS database. The model contains a biomass equation, reaction boundaries and uptake/export reactions, which make it ready for simulation. To validate the model, we demonstrate that the model is able to predict biomass production accurately and that no stoichiometrically infeasible yields are detected. The new T. reesei model is ready to be used for simulations of protein production processes.
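The claim that the model carries reaction bounds and uptake/export reactions, and shows no stoichiometrically infeasible yields, rests on two constraints: steady-state mass balance (S·v = 0) and lower/upper flux bounds. The sketch below checks both for a hypothetical three-reaction toy network. It is not the T. reesei model, and it does not solve the linear program that flux balance analysis uses to optimise biomass; all reaction names and bounds are illustrative assumptions.

```python
# Hypothetical toy network, NOT the T. reesei model:
#   R_up : -> A   (uptake)
#   R1   : A -> B
#   R_bio: B ->   (stand-in for a biomass drain)
REACTIONS = ["R_up", "R1", "R_bio"]
# Stoichiometric matrix S: one row per metabolite (A, B), one column per reaction
S = [
    [1, -1, 0],   # A
    [0, 1, -1],   # B
]
# Lower/upper flux bounds; the uptake bound caps everything downstream
BOUNDS = {"R_up": (0.0, 10.0), "R1": (0.0, 1000.0), "R_bio": (0.0, 1000.0)}

def is_feasible(flux, tol=1e-9):
    """True if the flux vector satisfies steady state (S v = 0) and all bounds."""
    v = [flux[r] for r in REACTIONS]
    balanced = all(abs(sum(s * x for s, x in zip(row, v))) <= tol for row in S)
    within = all(BOUNDS[r][0] - tol <= flux[r] <= BOUNDS[r][1] + tol for r in REACTIONS)
    return balanced and within

print(is_feasible({"R_up": 5.0, "R1": 5.0, "R_bio": 5.0}))     # True: balanced, within bounds
print(is_feasible({"R_up": 20.0, "R1": 20.0, "R_bio": 20.0}))  # False: exceeds the uptake bound
print(is_feasible({"R_up": 5.0, "R1": 3.0, "R_bio": 3.0}))     # False: metabolite A accumulates
```

In a full flux balance analysis, a linear-programming solver would search this feasible set for the flux vector that maximises the biomass reaction; in this toy network the optimum is simply R_bio = 10, fixed by the uptake bound.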

2.
PLoS One ; 11(7): e0159302, 2016.
Article in English | MEDLINE | ID: mdl-27441920

ABSTRACT

In this paper we apply machine learning methods for predicting protein interactions in fungal secretion pathways. We assume an inter-species transfer setting, where training data is obtained from a single species and the objective is to predict protein interactions in other, related species. In our methodology, we combine several state-of-the-art machine learning approaches, namely multiple kernel learning (MKL), pairwise kernels and kernelized structured output prediction in the supervised graph inference framework. For MKL, we apply recently proposed centered kernel alignment and p-norm path following approaches to integrate several feature sets describing the proteins, demonstrating improved performance. For graph inference, we apply input-output kernel regression (IOKR) in supervised and semi-supervised modes as well as output kernel trees (OK3). In our experiments simulating increasing genetic distance, IOKR proved to be the most robust prediction approach. We also show that the MKL approaches improve the predictions compared to a uniform combination of the kernels. We evaluate the methods on the task of predicting protein-protein interactions in the secretion pathways of fungi, with S. cerevisiae (baker's yeast) as the source and T. reesei as the target of the inter-species transfer learning. We identify completely novel candidate secretion proteins conserved in filamentous fungi. These proteins could contribute to their unique secretion capabilities.


Subject(s)
Fungal Proteins/metabolism , Machine Learning , Protein Interaction Mapping , Saccharomyces cerevisiae/metabolism , Secretory Pathway , Trichoderma/metabolism , Algorithms , Amino Acid Sequence , Databases, Protein , Evolution, Molecular , Fungal Proteins/chemistry , Genome, Fungal , Protein Interaction Maps , ROC Curve , Saccharomyces cerevisiae/genetics
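Centered kernel alignment, mentioned above as one way to combine feature sets in MKL, scores how well a kernel matrix agrees with a target kernel after both are double-centered. The sketch below computes it in plain Python and uses one simple heuristic, weighting each kernel by its clipped alignment with the ideal kernel y yᵀ; the paper's exact MKL procedure may differ. The feature vectors and ±1 labels are hypothetical; for network inference the target could instead be a kernel derived from the known interaction graph.

```python
import math

def linear_kernel(X):
    """Gram matrix of plain dot products between feature vectors."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]

def center(K):
    """Double-centre a kernel matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]

def frob(A, B):
    """Frobenius inner product <A, B>_F."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def centered_alignment(K1, K2):
    """<K1c, K2c>_F / (||K1c||_F ||K2c||_F); assumes neither centred kernel is all zeros."""
    A, B = center(K1), center(K2)
    return frob(A, B) / math.sqrt(frob(A, A) * frob(B, B))

def alignment_weights(kernels, labels):
    """Weight kernels by their clipped alignment with the target kernel y y^T."""
    target = [[a * b for b in labels] for a in labels]
    raw = [max(centered_alignment(K, target), 0.0) for K in kernels]
    total = sum(raw)
    if total == 0.0:
        return [1.0 / len(kernels)] * len(kernels)  # fall back to uniform weights
    return [w / total for w in raw]

def combine(kernels, weights):
    """Weighted sum of kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

# Two hypothetical feature sets describing the same four proteins
X1 = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
X2 = [[2.0, 1.0], [1.8, 1.1], [0.2, 2.0], [0.3, 1.9]]
labels = [1, 1, -1, -1]
kernels = [linear_kernel(X1), linear_kernel(X2)]
weights = alignment_weights(kernels, labels)
K = combine(kernels, weights)
print(weights)
```

Clipping negative alignments to zero keeps the combined kernel a non-negative sum of valid kernels, so it stays positive semi-definite.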
3.
Appl Microbiol Biotechnol ; 100(16): 7203-22, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27183995

ABSTRACT

The genomes of hybrid organisms, such as lager yeast (Saccharomyces cerevisiae × Saccharomyces eubayanus), contain orthologous genes, the functionality and effect of which may differ depending on their origin and copy number. How the parental subgenomes in lager yeast contribute to important phenotypic traits such as fermentation performance, aroma production, and stress tolerance remains poorly understood. Here, three de novo lager yeast hybrids with different ploidy levels (allodiploid, allotriploid, and allotetraploid) were generated through hybridization techniques without genetic modification. The hybrids were characterized in fermentations of both high gravity wort (15 °P) and very high gravity wort (25 °P), which were monitored for aroma compound and sugar concentrations. The hybrid strains with higher DNA content performed better during fermentation and produced higher concentrations of flavor-active esters in both worts. The hybrid strains also outperformed both parent strains. Genome sequencing revealed that several genes related to the formation of flavor-active esters (ATF1, ATF2, EHT1, EEB1, and BAT1) were present in higher copy numbers in the higher-ploidy hybrid strains. A direct relationship between gene copy number and transcript level was also observed. The measured ester concentrations and transcript levels also suggest that the functionality of the S. cerevisiae- and S. eubayanus-derived gene products differs. The results contribute to our understanding of the complex molecular mechanisms that determine phenotypes in lager yeast hybrids and are expected to facilitate targeted strain development through interspecific hybridization.


Subject(s)
Beer/microbiology , Chimera/genetics , Ethanol/metabolism , Fermentation/genetics , Saccharomyces cerevisiae/genetics , Chimera/growth & development , DNA, Fungal/genetics , Esters/analysis , Hybridization, Genetic , Organic Chemicals/analysis , Ploidies , Polymerase Chain Reaction , Polymorphism, Restriction Fragment Length , Saccharomyces cerevisiae/classification , Saccharomyces cerevisiae/metabolism , Transcription, Genetic/genetics
4.
Appl Microbiol Biotechnol ; 100(17): 7549-63, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27102126

ABSTRACT

We describe here the identification and characterization of two novel enzymes belonging to the IlvD/EDD protein family, the D-xylonate dehydratase from Caulobacter crescentus, Cc XyDHT (EC 4.2.1.82), and the L-arabonate dehydratase from Rhizobium leguminosarum bv. trifolii, Rl ArDHT (EC 4.2.1.25), that produce the corresponding 2-keto-3-deoxy-sugar acids. There is only a very limited amount of characterization data available on pentonate dehydratases, even though the enzymes from these oxidative pathways have potential applications with plant biomass pentose sugars. The two bacterial enzymes share 41 % amino acid sequence identity and were expressed and purified from Escherichia coli as homotetrameric proteins. Both dehydratases were shown to accept pentonate and hexonate sugar acids as their substrates and require Mg(2+) for their activity. Cc XyDHT displayed the highest activity on D-xylonate and D-gluconate, while Rl ArDHT functioned best on D-fuconate, L-arabonate and D-galactonate. The configuration of the OH groups at the C2 and C3 positions of the sugar acid was shown to be critical, and the C4 configuration also contributed substantially to substrate recognition. The two enzymes were also shown to contain an iron-sulphur [Fe-S] cluster. Our phylogenetic analysis and mutagenesis studies demonstrated that the three conserved cysteine residues in the aldonic acid dehydratase group of IlvD/EDD family members, namely C60, C128 and C201 in Cc XyDHT and C59, C127 and C200 in Rl ArDHT, are needed for coordination of the [Fe-S] cluster. The iron-sulphur cluster was shown to be crucial for the catalytic activity (kcat) but not for the substrate binding (Km) of the two pentonate dehydratases.


Subject(s)
Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Caulobacter crescentus/enzymology , Hydro-Lyases/genetics , Hydro-Lyases/metabolism , Rhizobium leguminosarum/enzymology , Amino Acid Sequence , Arabinose/metabolism , Cloning, Molecular , Escherichia coli/genetics , Escherichia coli/metabolism , Gluconates/metabolism , Sequence Alignment , Xylose/metabolism
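The distinction drawn above, that the [Fe-S] cluster affects kcat but not Km, can be read through the Michaelis-Menten rate law v = kcat·[E]·[S] / (Km + [S]). The sketch below uses hypothetical parameter values, not the measured ones: lowering kcat scales the rate down at every substrate concentration, while an unchanged Km means half-maximal rate is still reached at the same [S].

```python
def mm_rate(s, kcat, km, enzyme):
    """Michaelis-Menten initial rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * s / (km + s)

# Hypothetical parameters, NOT values from the study
KM = 2.0    # mM, assumed unchanged when the cluster is lost
E = 1e-3    # mM enzyme
for label, kcat in [("with [Fe-S]", 50.0), ("cluster-depleted", 5.0)]:
    vmax = kcat * E
    # Fraction of Vmax reached at [S] = Km; it is 0.5 whatever kcat is
    print(label, vmax, mm_rate(KM, kcat, KM, E) / vmax)
```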
5.
Appl Microbiol Biotechnol ; 100(2): 969-85, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26454869

ABSTRACT

Xylose is present with glucose in lignocellulosic streams available for valorisation to biochemicals. Saccharomyces cerevisiae has excellent characteristics as a host for the bioconversion, except that it strongly prefers glucose to xylose, and co-consumption remains a challenge. Further, since xylose is not a natural substrate of S. cerevisiae, the regulatory response it induces in an engineered strain cannot be expected to have evolved for its utilisation. Xylose-induced effects on metabolism and gene expression during anaerobic growth of an engineered strain of S. cerevisiae on medium containing both glucose and xylose were quantified. The gene expression of S. cerevisiae with an XR-XDH pathway for xylose utilisation was analysed throughout the cultivation: at early cultivation times when mainly glucose was metabolised, at times when xylose was co-consumed in the presence of low glucose concentrations, and when glucose had been depleted and only xylose was being consumed. Cultivations on glucose as the sole carbon source were used as a control. Genome-scale dynamic flux balance analysis models were simulated to analyse the metabolic dynamics of S. cerevisiae. The simulations quantitatively estimated xylose-dependent flux dynamics and challenged the utilisation of the metabolic network. A relative increase in xylose utilisation was predicted to induce bi-directionality of glycolytic flux and a redox challenge even at low glucose concentrations. Remarkably, xylose was observed to specifically delay the glucose-dependent repression of particular genes in mixed glucose-xylose cultures compared to glucose cultures. The delay occurred at a cultivation time when the metabolic flux activities were similar in both cultures.


Subject(s)
Disaccharides/metabolism , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Xylose/metabolism , Anaerobiosis , Biomass , Culture Media/chemistry , Fermentation , Gene Expression , Genetic Engineering , Glucose/metabolism , Lignin/chemistry , Metabolic Networks and Pathways/genetics , Microarray Analysis , Saccharomyces cerevisiae/growth & development
6.
Appl Microbiol Biotechnol ; 99(22): 9439-47, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26264136

ABSTRACT

An open reading frame CC1225 from the Caulobacter crescentus CB15 genome sequence belongs to the Gfo/Idh/MocA protein family and has 47 % amino acid sequence identity with the glucose-fructose oxidoreductase from Zymomonas mobilis (Zm GFOR). We expressed the ORF CC1225 in the yeast Saccharomyces cerevisiae and used a yeast strain expressing the gene coding for Zm GFOR as a reference. Cell extracts of strains overexpressing CC1225 (renamed Cc aaor) showed some Zm GFOR-type activity, producing D-gluconate and D-sorbitol when a mixture of D-glucose and D-fructose was used as substrate. However, the activity in the Cc aaor-expressing strain was >100-fold lower than in strains expressing Zm gfor. Interestingly, C. crescentus AAOR was clearly more efficient than Zm GFOR in converting a single sugar substrate, D-xylose (10 mM), to xylitol in vitro without an added cofactor, whereas this type of activity was very low with Zm GFOR. Furthermore, when cultured in the presence of D-xylose, the S. cerevisiae strain expressing Cc aaor produced nearly equal concentrations of D-xylonate and xylitol (12.5 g D-xylonate l(-1) and 11.5 g D-xylitol l(-1) from 26 g D-xylose l(-1)), whereas the control strain and the strain expressing Zm gfor produced only D-xylitol (5 g l(-1)). Deletion of the gene encoding the major aldose reductase, Gre3p, did not affect xylitol production in the strain expressing Cc aaor, but decreased xylitol production in the strain expressing Zm gfor. In addition, expression of Cc aaor together with the C. crescentus gene xylC, encoding D-xylonolactone lactonase, slightly increased the final concentration and initial volumetric production rate of both D-xylonate and D-xylitol. These results suggest that C. crescentus AAOR is a novel type of oxidoreductase able to convert the single aldose substrate D-xylose to both its oxidized and reduced products.


Subject(s)
Aldehyde Reductase/isolation & purification , Aldehyde Reductase/metabolism , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Sugar Acids/metabolism , Xylitol/metabolism , Xylose/metabolism , Aldehyde Reductase/genetics , Caulobacter crescentus/enzymology , Caulobacter crescentus/genetics , Gluconates/metabolism , Glucose/metabolism , Oxidation-Reduction , Oxidoreductases/genetics , Oxidoreductases/metabolism , Phylogeny , Saccharomyces cerevisiae/metabolism , Sorbitol/metabolism , Zymomonas/enzymology , Zymomonas/genetics
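The "nearly equal concentrations" of D-xylonate and xylitol reported above can be checked on a molar basis. The sketch below assumes xylonate is counted as free D-xylonic acid (C5H10O6) and treats the 26 g/L of D-xylose as the amount converted from; if that figure is the xylose supplied rather than consumed, the recovery fraction is only an estimate.

```python
# Values reported in the abstract (g/L)
XYLOSE_G = 26.0
XYLONATE_G = 12.5
XYLITOL_G = 11.5

# Molar masses (g/mol); xylonate counted as free D-xylonic acid (assumption)
MW_XYLOSE = 150.13    # C5H10O5
MW_XYLONIC = 166.13   # C5H10O6
MW_XYLITOL = 152.15   # C5H12O5

mass_yield_xylonate = XYLONATE_G / XYLOSE_G   # g product per g xylose
mass_yield_xylitol = XYLITOL_G / XYLOSE_G
mol_xylose = XYLOSE_G / MW_XYLOSE
mol_xylonate = XYLONATE_G / MW_XYLONIC
mol_xylitol = XYLITOL_G / MW_XYLITOL

print(f"mass yields: {mass_yield_xylonate:.2f} and {mass_yield_xylitol:.2f} g/g")
print(f"molar ratio xylonate:xylitol = {mol_xylonate / mol_xylitol:.2f}")
print(f"fraction of xylose recovered in both products: "
      f"{(mol_xylonate + mol_xylitol) / mol_xylose:.2f}")
```

A molar ratio close to 1 fits the interpretation that one enzyme oxidises and reduces D-xylose in roughly equal measure.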
7.
Metab Eng ; 31: 153-62, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26275749

ABSTRACT

Isoprene is a naturally produced hydrocarbon emitted into the atmosphere by green plants. It is also a constituent of synthetic rubber and a potential biofuel. Microbial production of isoprene can become a sustainable alternative to the prevailing chemical production of isoprene from petroleum. In this work, sequence homology searches were conducted to find novel isoprene synthases. Candidate sequences were functionally expressed in Escherichia coli and the desired enzymes were identified based on an isoprene production assay. The activity of three enzymes was shown for the first time: expression of the candidate genes from Ipomoea batatas, Mangifera indica, and Elaeocarpus photiniifolius resulted in isoprene formation. The Ipomoea batatas isoprene synthase produced the highest amounts of isoprene in all experiments, exceeding the isoprene levels obtained by the previously known Populus alba and Pueraria montana isoprene synthases that were studied in parallel as controls.


Subject(s)
Alkyl and Aryl Transferases/isolation & purification , Escherichia coli/genetics , Alkyl and Aryl Transferases/chemistry , Alkyl and Aryl Transferases/physiology , Amino Acid Sequence , Butadienes , Genome, Bacterial , Hemiterpenes/biosynthesis , Molecular Sequence Data , Pentanes , Sequence Homology
8.
BMC Biotechnol ; 14: 91, 2014 Oct 27.
Article in English | MEDLINE | ID: mdl-25344685

ABSTRACT

BACKGROUND: Trichoderma reesei is known as a good producer of industrial proteins but has hitherto been less successful in the production of therapeutic proteins. In order to elucidate the bottlenecks of heterologous protein production, human α-galactosidase A (GLA) was chosen as a model therapeutic protein. Fusion partners were designed to compare the effects of secretion using a cellobiohydrolase I (CBHI) carrier and intracellular production using a gamma zein peptide from maize (ZERA) which accumulates inside the endoplasmic reticulum (ER). The two strategies were compared on the basis of expression levels, purification performance, enzymatic activity, bioreactor cultivations, and transcriptional profiling. RESULTS: Constructs were cloned into the cbh1 locus of the T. reesei strain Rut-C30. The secretion and intracellular strains produced 20 mg/l and 636 mg/l of GLA, respectively. Purification of the secreted product was accomplished using Strep-Tactin affinity columns, and for the intracellular product a method was developed for gravity-based density separation and protein body solubilisation. The secreted protein had similar specific activity to that of the commercially available mammalian form. The intracellular version had 5-10-fold lower activity due to the enzyme's incompatibility with alkaline pH. The secretion strain achieved 10% lower total biomass than either the parental or the intracellular strain. The patterns of gene induction for intracellular and parental strains were similar, whereas the secretion strain had a broader spectrum of gene expression level changes. Identification of the genes involved indicated strong secretion stress in the secretion strain and, to a lesser extent, also in intracellular production. Genes involved in the unfolded protein response (UPR) and ER-associated degradation were induced by GLA production, including hac1, pdi1, prp1, cnx1, der1, and bap31.
CONCLUSIONS: Active human α-galactosidase could most effectively be produced intracellularly in Trichoderma reesei at >0.5 g/l by avoidance of the extracellular environment, although purification was challenging due to specific activity losses. Strain analysis revealed that, in addition to the issues with secreted proteases, the processes of secretion stress, including UPR and ER degradation, remain bottlenecks for heterologous protein production. Genetic engineering to eliminate these bottlenecks is the logical path towards establishing a strain capable of producing sensitive heterologous proteins.


Subject(s)
Protein Engineering/methods , alpha-Galactosidase/genetics , alpha-Galactosidase/metabolism , Humans , Protein Sorting Signals , Protein Transport , Secretory Pathway , Trichoderma/genetics
9.
Appl Microbiol Biotechnol ; 98(23): 9653-65, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25236800

ABSTRACT

Four potential dehydrogenases identified through literature and bioinformatic searches were tested for L-arabonate production from L-arabinose in the yeast Saccharomyces cerevisiae. The most efficient enzyme, annotated as a D-galactose 1-dehydrogenase from the pea root nodule bacterium Rhizobium leguminosarum bv. trifolii, was purified from S. cerevisiae as a homodimeric protein and characterised. We named the enzyme L-arabinose/D-galactose 1-dehydrogenase (EC 1.1.1.-), Rl AraDH. It belongs to the Gfo/Idh/MocA protein family, prefers NADP(+) but also uses NAD(+) as a cofactor, and showed the highest catalytic efficiency (kcat/Km) towards L-arabinose, D-galactose and D-fucose. Based on nuclear magnetic resonance (NMR) and modelling studies, the enzyme prefers the α-pyranose form of L-arabinose, and the stable oxidation product detected is L-arabino-1,4-lactone, which can, however, open slowly at neutral pH to the linear L-arabonate form. The pH optimum for the enzyme was pH 9, but use of a yeast-in-vivo-like buffer at pH 6.8 indicated that good catalytic efficiency could still be expected in vivo. Expression of the Rl AraDH dehydrogenase in S. cerevisiae, together with the galactose permease Gal2 for L-arabinose uptake, resulted in production of 18 g of L-arabonate per litre, at a rate of 248 mg of L-arabonate per litre per hour, with 86 % of the provided L-arabinose converted to L-arabonate. Expression of a lactonase-encoding gene from Caulobacter crescentus was not necessary for L-arabonate production in yeast.


Subject(s)
Arabinose/metabolism , Galactose Dehydrogenases/metabolism , Rhizobium leguminosarum/enzymology , Saccharomyces cerevisiae/metabolism , Sugar Acids/metabolism , Cloning, Molecular , Coenzymes/metabolism , Enzyme Stability , Galactose Dehydrogenases/chemistry , Galactose Dehydrogenases/genetics , Galactose Dehydrogenases/isolation & purification , Gene Expression , Hydrogen-Ion Concentration , Kinetics , Molecular Sequence Data , NAD/metabolism , NADP/metabolism , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/isolation & purification , Recombinant Proteins/metabolism , Rhizobium leguminosarum/metabolism , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Sequence Analysis, DNA
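The titer and rate quoted above together bound the production time. The abstract does not say whether 248 mg/L/h is a peak or an average rate, so the sketch below reads it both ways: as a peak, it gives a minimum duration; as an average, roughly the actual duration. Counting L-arabonate as free L-arabonic acid (C5H10O6) for the molar titer is an assumption.

```python
# Figures reported in the abstract
TITER_G_PER_L = 18.0        # final L-arabonate titer
RATE_MG_PER_L_H = 248.0     # reported volumetric production rate
# Assumed molar mass of free L-arabonic acid, C5H10O6 (g/mol)
MW_ARABONIC = 166.13

# If 248 mg/L/h was the highest rate reached, the run lasted at least this long;
# if it was an average rate, this is roughly the production time.
min_hours = TITER_G_PER_L * 1000.0 / RATE_MG_PER_L_H
titer_mol_per_l = TITER_G_PER_L / MW_ARABONIC
print(f"at least {min_hours:.1f} h; titer about {titer_mol_per_l * 1000:.0f} mM")
```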
10.
BMC Genomics ; 15: 763, 2014 Sep 05.
Article in English | MEDLINE | ID: mdl-25192596

ABSTRACT

BACKGROUND: Production of D-xylonate by the yeast S. cerevisiae provides an example of bioprocess development for sustainable production of value-added chemicals from cheap raw materials or side streams. Production of D-xylonate may lead to considerable intracellular accumulation of D-xylonate and to loss of viability during the production process. In order to understand the physiological responses associated with D-xylonate production, we performed transcriptome analyses during D-xylonate production by a robust recombinant strain of S. cerevisiae which produces up to 50 g/L D-xylonate. RESULTS: Comparison of the transcriptomes of the D-xylonate producing and the control strain showed considerably higher expression of the genes controlled by the cell wall integrity (CWI) pathway and of some genes previously identified as up-regulated in response to other organic acids in the D-xylonate producing strain. Increased phosphorylation of Slt2 kinase in the D-xylonate producing strain also indicated that D-xylonate production caused stress to the cell wall. Surprisingly, genes encoding proteins involved in translation, ribosome structure and RNA metabolism, processes which are commonly down-regulated under conditions causing cellular stress, were up-regulated during D-xylonate production, compared to the control. The overall transcriptional responses were, therefore, very dissimilar to those previously reported as being associated with stress, including stress induced by organic acid treatment or production. Quantitative PCR analyses of selected genes supported the observations made in the transcriptomic analysis. In addition, consumption of ethanol was slower and the level of trehalose was lower in the D-xylonate producing strain, compared to the control. 
CONCLUSIONS: The production of organic acids has a major impact on the physiology of yeast cells, but the transcriptional responses to the presence or production of different acids differ considerably, being much more diverse than responses to other stresses. D-Xylonate production apparently imposed considerable stress on the cell wall. Transcriptional data also indicated that activation of the PKA pathway occurred during D-xylonate production, leaving cells unable to adapt normally to stationary phase. This, together with intracellular acidification, probably contributes to cell death.


Subject(s)
Cell Wall/metabolism , Gene Expression Profiling/methods , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/physiology , Sugar Acids/metabolism , Gene Expression Regulation, Fungal , MAP Kinase Signaling System , Mitogen-Activated Protein Kinases/genetics , Mitogen-Activated Protein Kinases/metabolism , Molecular Sequence Data , Phosphorylation , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae Proteins/metabolism , Sequence Analysis, RNA , Stress, Physiological , Xylose/metabolism
11.
PLoS Comput Biol ; 10(2): e1003465, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24516375

ABSTRACT

We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is poor, and when reconstructing evolutionarily distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with computational steady-state biomass production experiments, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.


Subject(s)
Fungi/genetics , Fungi/metabolism , Genome, Fungal , Metabolic Networks and Pathways , Algorithms , Biomass , Biotechnology , Computational Biology , Evolution, Molecular , Fungi/classification , Gene Knockout Techniques , Industrial Microbiology , Metabolic Networks and Pathways/genetics , Models, Biological , Models, Genetic , Models, Statistical , Phylogeny , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae/metabolism , Species Specificity
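A "gapless" network is one in which every reaction can be reached from the available nutrients. A common, generic way to find gaps is forward network expansion: starting from seed compounds, repeatedly fire any reaction whose substrates are all available and add its products, until nothing changes. Reactions that never fire sit behind a gap. The sketch below is not CoReCo's probabilistic gap-filling; the toy reactions and seed set are hypothetical, and every reaction is treated as running only in its written direction.

```python
def network_expansion(reactions, seeds):
    """Forward expansion: fire any reaction whose substrates are all available,
    add its products, and repeat until no new reaction can fire."""
    available = set(seeds)
    fired = set()
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name not in fired and set(substrates) <= available:
                fired.add(name)
                available |= set(products)
                changed = True
    return available, fired

# Hypothetical toy network: name -> (substrates, products), irreversible as written
REACTIONS = {
    "r1": (["glc"], ["g6p"]),
    "r2": (["g6p"], ["f6p"]),
    "r3": (["f6p", "atp"], ["fbp"]),
    "r4": (["x"], ["y"]),   # nothing produces 'x', so r4 sits behind a gap
}
SEEDS = {"glc", "atp"}

available, fired = network_expansion(REACTIONS, SEEDS)
blocked = set(REACTIONS) - fired
print(sorted(blocked))  # ['r4']
```

A gap-filling step would then look for candidate reactions (for example, ones producing 'x') whose addition unblocks r4.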
12.
BMC Syst Biol ; 8: 16, 2014 Feb 14.
Article in English | MEDLINE | ID: mdl-24528924

ABSTRACT

BACKGROUND: Saccharomyces cerevisiae is able to adapt to a wide range of external oxygen conditions. Previously, oxygen-dependent phenotypes have been studied individually at the transcriptional, metabolite, and flux level. However, the regulation of cell phenotype occurs across the different levels of cell function. Integrative analysis of data from multiple levels of cell function in the context of a network of several known biochemical interaction types could enable identification of active regulatory paths not limited to a single level of cell function. RESULTS: The graph theoretical method called Enriched Molecular Path detection (EMPath) was extended to enable integrative utilization of transcription and flux data. The utility of the method was demonstrated by detecting paths associated with phenotype differences of S. cerevisiae under three different conditions of oxygen provision: 20.9%, 2.8% and 0.5%. The detection of molecular paths was performed in an integrated genome-scale metabolic and protein-protein interaction network. CONCLUSIONS: The molecular paths associated with the phenotype differences of S. cerevisiae under conditions of different oxygen provisions revealed paths of molecular interactions that could potentially mediate information transfer between processes that respond to the particular oxygen availabilities.


Subject(s)
Computational Biology/methods , Phenotype , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Transcription, Genetic , Cell Cycle , Down-Regulation , Fermentation , Gene Expression Regulation, Fungal , Oxygen , Saccharomyces cerevisiae/cytology
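EMPath searches an integrated metabolic and protein-protein interaction network for paths that connect measured changes. As a much simpler stand-in, not the EMPath scoring itself, the sketch below restricts a mixed-edge network to nodes flagged as changed in the data and runs a breadth-first shortest-path search. Node names, edge types and the "changed" set are all hypothetical.

```python
from collections import deque

def build_graph(edges, allowed):
    """Undirected adjacency lists, keeping only edges between allowed nodes."""
    graph = {}
    for a, b, _kind in edges:
        if a in allowed and b in allowed:
            graph.setdefault(a, []).append(b)
            graph.setdefault(b, []).append(a)
    return graph

def shortest_path(graph, source, target):
    """Breadth-first search; returns the node list from source to target, or None."""
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None

# Hypothetical integrated network mixing interaction types
EDGES = [
    ("P1", "P2", "protein-protein"),
    ("P2", "E1", "protein-protein"),
    ("E1", "M1", "enzyme-metabolite"),
    ("M1", "E2", "enzyme-metabolite"),
    ("P1", "P3", "protein-protein"),
    ("P3", "E2", "protein-protein"),
]
ALL_NODES = {n for a, b, _ in EDGES for n in (a, b)}
CHANGED = {"P1", "P2", "E1", "M1", "E2"}  # altered transcript or flux (hypothetical)

print(shortest_path(build_graph(EDGES, ALL_NODES), "P1", "E2"))  # ['P1', 'P3', 'E2']
print(shortest_path(build_graph(EDGES, CHANGED), "P1", "E2"))    # ['P1', 'P2', 'E1', 'M1', 'E2']
```

Restricting the search to data-supported nodes yields a longer path that crosses both protein-protein and metabolic edges, which is the kind of cross-level connection the abstract describes.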
13.
Microb Cell Fact ; 11: 134, 2012 Oct 04.
Article in English | MEDLINE | ID: mdl-23035824

ABSTRACT

BACKGROUND: Trichoderma reesei is a soft rot Ascomycota fungus utilised for industrial production of secreted enzymes, especially lignocellulose degrading enzymes. About 30 carbohydrate active enzymes (CAZymes) of T. reesei have been biochemically characterised. Genome sequencing has revealed a large number of novel candidates for CAZymes, thus increasing the potential for identification of enzymes with novel activities and properties. Plenty of data exists on the carbon source dependent regulation of the characterised hydrolytic genes. However, information on the expression of the novel CAZyme genes, especially on complex biomass material, is very limited. RESULTS: In this study, the CAZyme gene content of the T. reesei genome was updated and the annotations of the genes refined using both computational and manual approaches. Phylogenetic analysis was done to assist the annotation and to identify functionally diversified CAZymes. The analyses identified 201 glycoside hydrolase genes, 22 carbohydrate esterase genes and five polysaccharide lyase genes. Updated or novel functional predictions were assigned to 44 genes, and the phylogenetic analysis indicated further functional diversification within enzyme families or groups of enzymes. GH3 β-glucosidases, GH27 α-galactosidases and GH18 chitinases were especially functionally diverse. The expression of the lignocellulose degrading enzyme system of T. reesei was studied by cultivating the fungus in the presence of different inducing substrates and by subjecting the cultures to transcriptional profiling. The substrates included both defined and complex lignocellulose related materials, such as pretreated bagasse, wheat straw, spruce, xylan, Avicel cellulose and sophorose. The analysis revealed co-regulated groups of CAZyme genes, such as genes induced in all the conditions studied and also genes induced preferentially by a certain set of substrates.
CONCLUSIONS: In this study, the CAZyme content of the T. reesei genome was updated, the discrepancies between the different genome versions and published literature were removed and the annotation of many of the genes was refined. Expression analysis of the genes gave information on the enzyme activities potentially induced by the presence of the different substrates. Comparison of the expression profiles of the CAZyme genes under the different conditions identified co-regulated groups of genes, suggesting common regulatory mechanisms for the gene groups.


Subject(s)
Lignin/metabolism , Trichoderma/genetics , Biomass , Cellulases/classification , Cellulases/genetics , Databases, Factual , Gene Expression Profiling , Genome, Fungal , Glycoside Hydrolases/genetics , Glycoside Hydrolases/metabolism , Phylogeny , Polysaccharide-Lyases/genetics , Polysaccharide-Lyases/metabolism , Substrate Specificity
14.
PLoS One ; 7(3): e32235, 2012.
Article in English | MEDLINE | ID: mdl-22461885

ABSTRACT

A variety of functionally important protein properties, such as secondary structure, transmembrane topology and solvent accessibility, can be encoded as a labeling of amino acids. Indeed, the prediction of such properties from the primary amino acid sequence is one of the core projects of computational biology. Accordingly, a panoply of approaches have been developed for predicting such properties; however, most such approaches focus on solving a single task at a time. Motivated by recent, successful work in natural language processing, we propose to use multitask learning to train a single, joint model that exploits the dependencies among these various labeling tasks. We describe a deep neural network architecture that, given a protein sequence, outputs a host of predicted local properties, including secondary structure, solvent accessibility, transmembrane topology, signal peptides and DNA-binding residues. The network is trained jointly on all these tasks in a supervised fashion, augmented with a novel form of semi-supervised learning in which the model is trained to distinguish between local patterns from natural and synthetic protein sequences. The task-independent architecture of the network obviates the need for task-specific feature engineering. We demonstrate that, for all of the tasks that we considered, our approach leads to statistically significant improvements in performance, relative to a single task neural network approach, and that the resulting model achieves state-of-the-art performance.


Subject(s)
Computational Biology/methods , Neural Networks, Computer , Protein Structure, Secondary , Proteins/chemistry , Algorithms , Binding Sites , Membrane Proteins/chemistry , Reproducibility of Results
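The multitask idea above, one shared model producing several per-residue labelings, can be illustrated with a sliding window around each residue, a shared hidden layer, and one linear output head per task. This is only a structural sketch with untrained random weights; it is not the paper's deep architecture, and it omits training and the semi-supervised natural-vs-synthetic objective. Task names, window size, layer width and the example sequence are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"

def windows(seq, half_width):
    """One window of length 2*half_width+1 centred on each residue, padded at the termini."""
    padded = PAD * half_width + seq + PAD * half_width
    return [padded[i:i + 2 * half_width + 1] for i in range(len(seq))]

def one_hot(window):
    """20 indicator values per position; padding positions are all zeros."""
    vec = []
    for ch in window:
        vec.extend(1.0 if ch == aa else 0.0 for aa in AMINO_ACIDS)
    return vec

def shared_layer(x, W):
    """Hidden representation shared by all tasks."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def predict(seq, half_width, W_shared, heads):
    """Score every residue for every task from the same hidden representation."""
    out = {task: [] for task in heads}
    for win in windows(seq, half_width):
        h = shared_layer(one_hot(win), W_shared)
        for task, w in heads.items():
            out[task].append(sum(a * b for a, b in zip(w, h)))
    return out

random.seed(0)
HALF, HIDDEN = 3, 8
DIM = len(AMINO_ACIDS) * (2 * HALF + 1)
W_shared = [[random.gauss(0, 0.1) for _ in range(DIM)] for _ in range(HIDDEN)]
heads = {t: [random.gauss(0, 0.1) for _ in range(HIDDEN)]
         for t in ("helix", "buried", "transmembrane")}
scores = predict("MKTAYIAKQR", HALF, W_shared, heads)
print({task: len(v) for task, v in scores.items()})
```

Because every head reads the same hidden layer, training on one task updates features the other tasks also use, which is how multitask learning can exploit dependencies among labelings.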
15.
BMC Genomics ; 11: 441, 2010 Jul 19.
Article in English | MEDLINE | ID: mdl-20642838

ABSTRACT

BACKGROUND: Trichoderma reesei is the main industrial producer of cellulases and hemicellulases that are used to depolymerize biomass in a variety of biotechnical applications. Many of the production strains currently in use have been generated by classical mutagenesis. In this study we characterized genomic alterations in high-producing mutants of T. reesei by high-resolution array comparative genomic hybridization (aCGH). Our aim was to obtain genome-wide information which could be utilized for better understanding of the mechanisms underlying efficient cellulase production, and would enable targeted genetic engineering for improved production of proteins in general. RESULTS: We carried out an aCGH analysis of four high-producing strains (QM9123, QM9414, NG14 and Rut-C30) using the natural isolate QM6a as a reference. In QM9123 and QM9414 we detected a total of 44 previously undocumented mutation sites including deletions, chromosomal translocation breakpoints and single nucleotide mutations. In NG14 and Rut-C30 we detected 126 mutations, of which 17 were new mutations not documented previously. Among these new mutations are the first chromosomal translocation breakpoints identified in NG14 and Rut-C30. We studied the effects of two deletions identified in Rut-C30 (a deletion of 85 kb in scaffold 15 and a deletion in a gene encoding a transcription factor) on cellulase production by constructing knock-out strains in the QM6a background. Neither the 85 kb deletion nor the deletion of the transcription factor affected cellulase production. CONCLUSIONS: aCGH analysis identified dozens of mutations in each strain analyzed. The resolution was at the level of single nucleotide mutations. High-density aCGH is a powerful tool for genome-wide analysis of organisms with small genomes, e.g. fungi, especially in studies where a large set of interesting strains is analyzed.


Subject(s)
Cellulase/biosynthesis , Comparative Genomic Hybridization/methods , Oligonucleotide Array Sequence Analysis/methods , Trichoderma/genetics , Trichoderma/metabolism , DNA, Fungal/genetics , Genomics , Oligonucleotide Probes/genetics , Polymorphism, Single Nucleotide , Sequence Deletion
16.
PLoS One ; 4(4): e5179, 2009.
Article in English | MEDLINE | ID: mdl-19365549

ABSTRACT

BACKGROUND: Retroviral LTRs, paired or single, influence the transcription of both retroviral and non-retroviral genomic sequences. Vertebrate genomes contain many thousands of endogenous retroviruses (ERVs) and their LTRs. Single LTRs are difficult to detect from genomic sequences without recourse to repetitiveness or presence in a proviral structure. Understanding LTR structure increases understanding of LTR function and of functional genomics. Here we develop models of orthoretroviral LTRs useful for detection in genomes and for structural analysis. PRINCIPAL FINDINGS: Although mutated, ERV LTRs are more numerous and diverse than exogenous retroviral (XRV) LTRs. Hidden Markov models (HMMs), and alignments based on them, were created for HML- (human MMTV-like), general-beta-, gamma- and lentiretrovirus-like LTRs, plus a general-vertebrate LTR model. Training sets were XRV LTRs and RepBase LTR consensuses. The HML HMM was the most sensitive and detected 87% of the HML LTRs in human chromosome 19 at 96% specificity. When all HMMs were combined with a low cutoff for screening, they found 71% of all LTRs detected by RepeatMasker in chromosome 19. HMM consensus sequences had a conserved modular LTR structure. Target site duplications (TG-CA), TATA (occasionally absent), an AATAAA box and a T-rich region were prominent features. Most of the conservation was located in, or adjacent to, R and U5, with evidence for stem loops. Several of the long HML LTRs contained long ORFs inserted after the second A-rich module. HMM consensus alignment allowed comparison of functional features like transcriptional start sites (sense and antisense) between XRVs and ERVs. CONCLUSION: The modular, conserved and redundant orthoretroviral LTR structure with three A-rich regions is reminiscent of structurally relaxed Giardia promoters. The five HMMs provided novel broad-range, repeat-independent, ab initio LTR detection, with prospects for greater generalisation, and insight into LTR structure, which may aid development of LTR-targeted pharmaceuticals.
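
HMM-based detection of this kind generally comes down to scoring a sequence window under a trained model, comparing it with a background (null) model, and calling a hit above a cutoff. The sketch below illustrates only that scoring step. It assumes a hypothetical two-state toy HMM (an "A-rich" state and a neutral state) with made-up parameters, an i.i.d. nucleotide background and a hypothetical cutoff of 0; it does not reproduce the paper's five trained LTR models.

```python
import math


def logsumexp(values):
    """Numerically stable log of sum(exp(v) for v in values)."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_loglik(seq, start, trans, emit):
    """Log P(seq | HMM) by the forward algorithm, computed in log space.

    start[s], trans[s][t] and emit[s][symbol] are plain probabilities (> 0).
    """
    states = list(start)
    alpha = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    for symbol in seq[1:]:
        alpha = {
            t: logsumexp([alpha[s] + math.log(trans[s][t]) for s in states])
            + math.log(emit[t][symbol])
            for t in states
        }
    return logsumexp(list(alpha.values()))


def background_loglik(seq, freqs):
    """Log-likelihood under an i.i.d. nucleotide background (null) model."""
    return sum(math.log(freqs[c]) for c in seq)


# Hypothetical toy model: an "A-rich" state and a neutral state.
START = {"arich": 0.5, "other": 0.5}
TRANS = {"arich": {"arich": 0.9, "other": 0.1},
         "other": {"arich": 0.1, "other": 0.9}}
EMIT = {"arich": {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
        "other": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}}
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
CUTOFF = 0.0  # hypothetical log-odds threshold


def log_odds(seq):
    """Log-odds score of seq: HMM log-likelihood minus background log-likelihood."""
    return forward_loglik(seq, START, TRANS, EMIT) - background_loglik(seq, BACKGROUND)


def is_hit(seq):
    return log_odds(seq) > CUTOFF


print(is_hit("AATAAAAAAA"))  # True
print(is_hit("GCGCGCGCGC"))  # False
```

Lowering the cutoff admits more windows as hits, which raises sensitivity at the cost of specificity; that trade-off is the reason a low cutoff suits a screening pass.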


Subject(s)
DNA, Viral/genetics , Retroviridae/genetics , Terminal Repeat Sequences , Algorithms , Animals , Base Sequence , DNA, Viral/chemistry , Gene Expression Regulation, Viral , Genome, Human , Genome, Viral , Humans , Mice , Molecular Sequence Data , Nucleic Acid Conformation , Open Reading Frames , Opossums/genetics , Sensitivity and Specificity
17.
BMC Bioinformatics ; 8 Suppl 2: S11, 2007 May 03.
Article in English | MEDLINE | ID: mdl-17493249

ABSTRACT

BACKGROUND: Human endogenous retroviruses (HERVs) are surviving traces of ancient retrovirus infections and now reside within the human DNA. Recently, HERV expression has been detected in both normal tissues and diseased patients. However, the activities (expression levels) of individual HERV sequences are mostly unknown. RESULTS: We introduce a generative mixture model, based on Hidden Markov Models, for estimating the activities of individual HERV sequences from EST (expressed sequence tag) databases. We use the model to estimate the relative activities of 181 HERVs. We also empirically justify a faster heuristic method for HERV activity estimation and use it to estimate the activities of 2450 HERVs. The majority of these HERV activities were previously unknown. CONCLUSION: (i) Experiments on simulated data show that our methods estimate activity accurately. (ii) Our estimate on real data shows that 7% of the HERVs are active. The active ones are spread unevenly across HERV groups and relatively uniformly in terms of estimated age. HERVs with the retroviral env gene are more often active than HERVs without env. Few of the active HERVs have open reading frames for retroviral proteins.
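
One way to read "estimating activities with a generative mixture model" is as fitting the mixing weights of a mixture whose components are the individual HERVs. The sketch below shows that step with expectation-maximization. It assumes the per-EST likelihoods p(EST | HERV) have already been computed; in the paper these come from HMM-based models, while here `lik` is a hypothetical input matrix, and `estimate_activities` is an illustrative name, not the authors' code.

```python
def estimate_activities(lik, n_iter=500, tol=1e-12):
    """EM estimate of mixture weights given fixed component likelihoods.

    lik[e][h] is p(EST e | HERV h), assumed precomputed (hypothetical input).
    Returns weights summing to 1, read as relative activities of the HERVs.
    """
    n_est, n_herv = len(lik), len(lik[0])
    pi = [1.0 / n_herv] * n_herv  # start from uniform activities
    for _ in range(n_iter):
        totals = [0.0] * n_herv
        for row in lik:
            # E-step: posterior responsibility of each HERV for this EST.
            weighted = [p * lk for p, lk in zip(pi, row)]
            z = sum(weighted)
            if z == 0:
                raise ValueError("an EST has zero likelihood under every HERV")
            for h in range(n_herv):
                totals[h] += weighted[h] / z
        # M-step: each weight becomes the average responsibility.
        new_pi = [t / n_est for t in totals]
        done = max(abs(a - b) for a, b in zip(new_pi, pi)) < tol
        pi = new_pi
        if done:
            break
    return pi


# Hypothetical data: three ESTs match only HERV 0, one matches only HERV 1.
lik = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(estimate_activities(lik))  # [0.75, 0.25, 0.0]
```

ESTs that match several HERVs are shared out in proportion to the current weights, so ambiguous ESTs end up credited mostly to HERVs that already have unambiguous support. This is where a mixture model differs from simply counting hits.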


Subject(s)
Algorithms , Chromosome Mapping/methods , Databases, Genetic , Evolution, Molecular , Expressed Sequence Tags , Genome, Viral/genetics , Retroviridae/genetics , Virus Activation/genetics , Humans , Markov Chains , Retroviridae/classification , Species Specificity
18.
Int J Neural Syst ; 15(3): 163-79, 2005 Jun.
Article in English | MEDLINE | ID: mdl-16013088

ABSTRACT

About 8 per cent of the human genome consists of human endogenous retroviral sequences (HERVs), which are remains from ancient infections. The HERVs may give rise to transcripts or affect the expression of human genes. The first step in understanding HERV function is to classify HERVs into families. In this work we study the relationships of existing HERV families and detect potentially new HERV families. A Median Self-Organizing Map (SOM), a SOM for non-vectorial data, is used to group and visualize a collection of 3661 HERVs. The SOM-based analysis is complemented with estimates of the reliability of the results. A novel trustworthiness visualization method is used to estimate which parts of the SOM visualization are reliable and which are not. The reliability of extracted interesting HERV groups is verified by a bootstrap procedure suitable for SOM visualization-based analysis. The SOM detects a group of epsilonretroviral sequences and a group of ERV9, HERVW, and HUERSP3 sequences, which suggests that ERV9 and HERVW sequences may have a common origin.
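
A Median SOM needs only pairwise dissimilarities. Each map unit's prototype is restricted to one of the data items, so sequences never have to be turned into vectors. The batch sketch below illustrates the idea under assumptions of our own: a 1-D chain of units, a Gaussian neighbourhood of fixed width `sigma`, index-spread initialisation, and a toy distance matrix standing in for HERV sequence dissimilarities. It is not the implementation used in the paper.

```python
import math


def median_som(dist, n_units, sigma=1.0, n_iter=50):
    """Batch Median SOM on a 1-D chain of map units.

    dist[i][j] is a symmetric dissimilarity between items i and j; no vector
    representation is needed. Each unit's prototype is the index of a data item.
    Returns (prototypes, bmu), where bmu[i] is the best-matching unit of item i.
    """
    n = len(dist)
    # Simple initialisation (an assumption): spread prototypes over item indices.
    protos = [round(k * (n - 1) / max(n_units - 1, 1)) for k in range(n_units)]

    def h(a, b):
        # Gaussian neighbourhood between units a and b on the map grid.
        return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))

    def assign():
        # Each item goes to the unit whose prototype is least dissimilar.
        return [min(range(n_units), key=lambda k: dist[i][protos[k]]) for i in range(n)]

    for _ in range(n_iter):
        bmu = assign()
        # Each unit picks a generalized median: the item minimising the
        # neighbourhood-weighted sum of squared dissimilarities to all items.
        new_protos = [
            min(range(n), key=lambda c: sum(h(k, bmu[j]) * dist[c][j] ** 2 for j in range(n)))
            for k in range(n_units)
        ]
        if new_protos == protos:
            break
        protos = new_protos
    return protos, assign()


# Hypothetical toy data: two clusters on a line, compared only through distances.
xs = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in xs] for a in xs]
protos, bmu = median_som(dist, n_units=2, sigma=0.5)
print(bmu)  # [0, 0, 0, 1, 1, 1]
```

The neighbourhood term pulls each prototype slightly toward items mapped to adjacent units. This pull is what orders the map, so that nearby units hold similar items.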


Subject(s)
Artificial Intelligence , Chromosome Mapping/methods , DNA/genetics , Endogenous Retroviruses/genetics , Genome, Human , Algorithms , Humans , Phylogeny , Reproducibility of Results
19.
BMC Bioinformatics ; 4: 48, 2003 Oct 13.
Article in English | MEDLINE | ID: mdl-14552657

ABSTRACT

BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric: the measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets. RESULTS: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map was compared in visualizing similarity relationships among gene expression profiles. The self-organizing map performed best, except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric differs most from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric. CONCLUSIONS: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric improve it further.
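
Trustworthiness penalizes samples that appear among a sample's k nearest neighbours in the visualization without being among its k nearest neighbours in the original space. The abstract does not spell out the measure. The sketch below follows the definition commonly used in this line of work, T(k) = 1 - 2/(Nk(2N - 3k - 1)) Σ_i Σ_{j ∈ U_k(i)} (r(i, j) - k). Here U_k(i) is the set of such intruding neighbours of sample i, and r(i, j) is the rank of j by distance from i in the original space. Distances are passed as plain matrices, and the 1-D toy data are hypothetical.

```python
def neighbour_ranks(i, dist):
    """Rank (1 = nearest) of every other point j by its distance from point i.

    Ties are broken by index, an arbitrary choice made for this sketch.
    """
    order = sorted((j for j in range(len(dist)) if j != i), key=lambda j: dist[i][j])
    return {j: r for r, j in enumerate(order, start=1)}


def trustworthiness(d_input, d_vis, k):
    """Trustworthiness of a visualization for neighbourhood size k.

    d_input and d_vis are N x N distance matrices in the original space and in
    the visualization. Normalised so that 1 means no intruding neighbours.
    """
    n = len(d_input)
    if not 0 < k < n / 2:
        raise ValueError("the normalisation assumes 0 < k < N/2")
    penalty = 0
    for i in range(n):
        r_in = neighbour_ranks(i, d_input)
        r_vis = neighbour_ranks(i, d_vis)
        # U_k(i): among the k nearest in the visualization, not in the input space.
        intruders = [j for j, r in r_vis.items() if r <= k and r_in[j] > k]
        penalty += sum(r_in[j] - k for j in intruders)
    return 1 - 2 * penalty / (n * k * (2 * n - 3 * k - 1))


def line_dist(xs):
    return [[abs(a - b) for b in xs] for a in xs]


# Hypothetical 1-D "profiles" and two candidate 1-D visualizations of them.
original = line_dist([0, 1, 2, 3, 4, 5])
faithful = line_dist([0, 1, 2, 3, 4, 5])
scrambled = line_dist([0, 5, 1, 4, 2, 3])
print(trustworthiness(original, faithful, k=2))  # 1.0
print(trustworthiness(original, scrambled, k=2))
```

Each intruder is weighted by how far down it actually ranks in the original space. A sample that is visualized as a close neighbour but is really distant therefore costs more than a near miss.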


Subject(s)
Computer Graphics/standards , Gene Expression Profiling/standards , Oligonucleotide Array Sequence Analysis/standards , Animals , Cluster Analysis , Computer Graphics/trends , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Gene Expression Regulation/genetics , Gene Expression Regulation, Fungal/genetics , Humans , Mice , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Sequence Homology, Nucleic Acid