Results 1 - 9 of 9
1.
Plant Cell ; 34(2): 867-888, 2022 02 03.
Article in English | MEDLINE | ID: mdl-34865154

ABSTRACT

Plants respond to wounding stress by changing gene expression patterns and inducing the production of hormones including jasmonic acid. This wounding transcriptional response activates specialized metabolism pathways such as the glucosinolate pathways in Arabidopsis thaliana. While the regulatory factors and sequences controlling a subset of wound-response genes are known, it remains unclear how wound response is regulated globally. Here, we investigated how these responses are regulated by incorporating putative cis-regulatory elements, known transcription factor binding sites, in vitro DNA affinity purification sequencing, and DNase I hypersensitive sites to predict genes with different wound-response patterns using machine learning. We observed that regulatory sites and regions of open chromatin differed between genes upregulated at early and late wounding time points, as well as between genes induced by jasmonic acid and those not induced. Expanding on what we currently know, we identified cis-elements that improved model predictions of expression clusters over known binding sites. Using a combination of genome editing, in vitro DNA-binding assays, and transient expression assays using native and mutated cis-regulatory elements, we experimentally validated four of the predicted elements, three of which were not previously known to function in wound-response regulation. Our study provides a global model predictive of wound response and identifies new regulatory sequences important for wounding without requiring prior knowledge of the transcriptional regulators.


Subject(s)
Arabidopsis/physiology , Gene Expression Regulation, Plant , Plant Growth Regulators/physiology , Arabidopsis/drug effects , Arabidopsis/genetics , Cyclopentanes/pharmacology , Metabolic Networks and Pathways , Models, Biological , Oxylipins/pharmacology , Plant Growth Regulators/pharmacology , Plants, Genetically Modified , Regulatory Sequences, Nucleic Acid , Reproducibility of Results , Transcription Factors/genetics
2.
Mol Biol Evol ; 38(8): 3397-3414, 2021 07 29.
Article in English | MEDLINE | ID: mdl-33871641

ABSTRACT

Genetic redundancy refers to a situation where an individual with a loss-of-function mutation in one gene (single mutant) does not show an apparent phenotype until one or more paralogs are also knocked out (double/higher-order mutant). Previous studies have identified some characteristics common among redundant gene pairs, but a predictive model of genetic redundancy incorporating a wide variety of features derived from accumulating omics and mutant phenotype data is yet to be established. In addition, the relative importance of these features for genetic redundancy remains largely unclear. Here, we establish machine learning models for predicting whether a gene pair is likely redundant or not in the model plant Arabidopsis thaliana based on six feature categories: functional annotations, evolutionary conservation including duplication patterns and mechanisms, epigenetic marks, protein properties including posttranslational modifications, gene expression, and gene network properties. The definition of redundancy, data transformations, feature subsets, and machine learning algorithms used significantly affected model performance, as evaluated on held-out test phenotype data. Among the most important features in predicting gene pairs as redundant were having a paralog(s) from recent duplication events, annotation as a transcription factor, downregulation during stress conditions, and having similar expression patterns under stress conditions. We also explored the potential reasons underlying mispredictions and the limitations of our study. This genetic redundancy model sheds light on characteristics that may contribute to long-term maintenance of paralogs, and will ultimately allow for more targeted generation of functionally informative double mutants, advancing functional genomic studies.


Subject(s)
Arabidopsis/genetics , Biological Evolution , Gene Duplication , Machine Learning , Models, Genetic
3.
New Phytol ; 231(1): 475-489, 2021 07.
Article in English | MEDLINE | ID: mdl-33749860

ABSTRACT

Plant metabolites from diverse pathways are important for plant survival, human nutrition and medicine. The pathway memberships of most plant enzyme genes are unknown. While co-expression is useful for assigning genes to pathways, expression correlation may exist only under specific spatiotemporal and conditional contexts. Utilising > 600 tomato (Solanum lycopersicum) expression data combinations, three strategies for predicting memberships in 85 pathways were explored. Optimal predictions for different pathways require distinct data combinations indicative of pathway functions. Naive prediction (i.e. identifying pathways with the most similarly expressed genes) is error-prone. In 52 pathways, unsupervised learning performed better than supervised approaches, possibly due to limited training data availability. Using gene-to-pathway expression similarities led to prediction models that outperformed those based simply on expression levels. Using 36 experimentally validated genes, the pathway-best model prediction accuracy is 58.3%, significantly better than that for predicting annotated genes without experimental evidence (37.0%) or random guessing (1.2%), demonstrating the importance of data quality. Our study highlights the need to extensively explore expression-based features and prediction strategies to maximise the accuracy of metabolic pathway membership assignment. The prediction framework outlined here can be applied to other species and serves as a baseline model for future comparisons.


Subject(s)
Metabolic Networks and Pathways , Solanum lycopersicum , Gene Expression , Genes, Plant , Solanum lycopersicum/genetics , Metabolic Networks and Pathways/genetics
4.
BMC Genomics ; 22(1): 99, 2021 Feb 02.
Article in English | MEDLINE | ID: mdl-33530937

ABSTRACT

BACKGROUND: Availability of plant genome sequences has led to significant advances. However, with few exceptions, the great majority of existing genome assemblies are derived from short-read sequencing technologies with highly uneven read coverage, indicative of sequencing and assembly issues that could significantly impact any downstream analysis of plant genomes. In tomato, for example, 0.6% (5.1 Mb) and 9.7% (79.6 Mb) of the short-read-based assembly had significantly higher and lower coverage, respectively, compared to background. RESULTS: To understand the causes of such uneven coverage, we first established machine learning models capable of predicting genomic regions with variable coverage and found that high-coverage regions tend to have higher simple sequence repeat and tandem gene densities compared to background regions. To determine whether the high-coverage regions were misassembled, we examined a recently available tomato long-read-based assembly and found that 27.8% (1.41 Mb) of high-coverage regions were potentially misassembled duplicate sequences, compared to 1.4% in background regions. In addition, using a predictive model that can distinguish correctly and incorrectly assembled high-coverage regions, we found that misassembled high-coverage regions tend to be flanked by simple sequence repeats, pseudogenes, and transposable elements. CONCLUSIONS: Our study provides insights into the causes of variable coverage regions and a quantitative assessment of factors contributing to plant genome misassembly when using short reads; the generality of these causes and factors should be tested further in other species.


Subject(s)
Genome, Plant , High-Throughput Nucleotide Sequencing , DNA Transposable Elements/genetics , Genomics , Sequence Analysis, DNA
5.
In Silico Plants ; 2(1): diaa005, 2020.
Article in English | MEDLINE | ID: mdl-33344884

ABSTRACT

Plant specialized metabolites mediate interactions between plants and the environment and have significant agronomical/pharmaceutical value. Most genes involved in specialized metabolism (SM) are unknown because of the large number of metabolites and the challenge in differentiating SM genes from general metabolism (GM) genes. Plant models like Arabidopsis thaliana have extensive, experimentally derived annotations, whereas many non-model species do not. Here we employed a machine learning strategy, transfer learning, where knowledge from A. thaliana is transferred to predict gene functions in cultivated tomato with fewer experimentally annotated genes. The first tomato SM/GM prediction model using only tomato data performs well (F-measure = 0.74, compared with 0.5 for random and 1.0 for perfect predictions), but from manually curating 88 SM/GM genes, we found many mis-predicted entries were likely mis-annotated. When the SM/GM prediction models built with A. thaliana data were used to filter out genes where the A. thaliana-based model predictions disagreed with tomato annotations, the new tomato model trained with filtered data improved significantly (F-measure = 0.92). Our study demonstrates that SM/GM genes can be better predicted by leveraging cross-species information. Additionally, our findings provide an example for transfer learning in genomics where knowledge can be transferred from an information-rich species to an information-poor one.

6.
Elife ; 9, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32613943

ABSTRACT

Plants produce phylogenetically and spatially restricted, as well as structurally diverse, specialized metabolites via multistep metabolic pathways. Hallmarks of specialized metabolic evolution include enzymatic promiscuity, recruitment of primary metabolic enzymes, and genomic clustering of pathway genes. Solanaceae glandular trichomes produce defensive acylsugars, with sidechains that vary in length across the family. We describe a tomato gene cluster on chromosome 7 involved in medium-chain acylsugar accumulation due to trichome-specific acyl-CoA synthetase and enoyl-CoA hydratase genes. This cluster co-localizes with a tomato steroidal alkaloid gene cluster and is syntenic to a chromosome 12 region containing another acylsugar pathway gene. We reconstructed the evolutionary events leading to this gene cluster and found that its phylogenetic distribution correlates with medium-chain acylsugar accumulation across the Solanaceae. This work reveals insights into the dynamics behind gene cluster evolution and cell-type-specific metabolite diversity.


Plants produce a vast variety of different molecules known as secondary or specialized metabolites to attract pollinating insects, such as bees, or protect themselves against herbivores and pests. The secondary metabolites are made from simple building blocks that are readily available in plants, including amino acids, fatty acids and sugars. Different species of plant, and even different parts of the same plant, produce their own sets of secondary metabolites. For example, the hairs on the surface of tomatoes and other members of the nightshade family of plants make metabolites known as acylsugars. These chemicals deter herbivores and pests from damaging the plants. To make acylsugars, the plants attach long chains known as fatty acyl groups to molecules of sugar, such as sucrose. Some members of the nightshade family produce acylsugars with longer chains than others. In particular, acylsugars with long chains are only found in tomatoes and other closely related species. It remained unclear how the nightshade family evolved to produce acylsugars with chains of different lengths. To address this question, Fan et al. used genetic and biochemical approaches to study tomato plants and other members of the nightshade family. The experiments identified two genes in tomatoes, known as AACS and AECH, that are involved in producing acylsugars with long chains. These two genes originated from the genes of older enzymes that metabolize fatty acids (the building blocks of fats) in plant cells. Unlike the older genes, AACS and AECH were only active at the tips of the hairs on the plant's surface. Fan et al. then investigated the evolutionary relationship between 11 members of the nightshade family and two other plant species. This revealed that AACS and AECH emerged in the nightshade family around the same time that longer chains of acylsugars started appearing.
These findings provide insights into how plants evolved to be able to produce a variety of secondary metabolites that may protect them from a broader range of pests. The gene cluster identified in this work could be used to engineer other species of crop plants to start producing acylsugars as natural pesticides.


Subject(s)
Evolution, Molecular , Genes, Plant/genetics , Metabolic Networks and Pathways/genetics , Multigene Family/genetics , Solanaceae/genetics , Conserved Sequence/genetics , Genetic Variation/genetics , Solanaceae/metabolism , Solanum/genetics , Solanum/metabolism , Trichomes/metabolism
7.
Proc Natl Acad Sci U S A ; 116(6): 2344-2353, 2019 02 05.
Article in English | MEDLINE | ID: mdl-30674669

ABSTRACT

Plant specialized metabolism (SM) enzymes produce lineage-specific metabolites with important ecological, evolutionary, and biotechnological implications. Using Arabidopsis thaliana as a model, we identified distinguishing characteristics of SM and GM (general metabolism, traditionally referred to as primary metabolism) genes through a detailed study of features including duplication pattern, sequence conservation, transcription, protein domain content, and gene network properties. Analysis of multiple sets of benchmark genes revealed that SM genes tend to be tandemly duplicated, coexpressed with their paralogs, narrowly expressed at lower levels, less conserved, and less well connected in gene networks relative to GM genes. Although the values of each of these features significantly differed between SM and GM genes, any single feature was ineffective at distinguishing SM from GM genes. Using machine learning methods to integrate all features, a prediction model was established with a true positive rate of 87% and a true negative rate of 71%. In addition, 86% of known SM genes not used to create the machine learning model were correctly predicted. We also demonstrated that the model could be further improved when we distinguished between SM, GM, and junction genes responsible for reactions shared by SM and GM pathways, indicating that topological considerations may further improve the SM prediction model. Application of the prediction model led to the identification of 1,220 A. thaliana genes with previously unknown functions, each assigned a confidence measure called an SM score, providing a global estimate of SM gene content in a plant genome.

8.
Genome Biol Evol ; 10(10): 2596-2613, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30239695

ABSTRACT

Gene duplication and loss contribute to gene content differences as well as phenotypic divergence across species. However, the extent to which gene content varies among closely related plant species and the factors responsible for such variation remain unclear. Here, using the Solanaceae family as a model and Pfam domain families as a proxy for gene families, we investigated variation in gene family sizes across species and the likely factors contributing to the variation. We found that genes in highly variable families have high turnover rates and tend to be involved in processes that have diverged between Solanaceae species, whereas genes in low-variability families tend to have housekeeping roles. In addition, genes in high- and low-variability gene families tend to be duplicated by tandem and whole genome duplication, respectively. This finding, together with the observation that genes duplicated by different mechanisms experience different selection pressures, suggests that duplication mechanism impacts gene family turnover. We explored using pseudogene number as a proxy for gene loss but discovered that a substantial number of pseudogenes are actually products of pseudogene duplication, contrary to the expectation that most plant pseudogenes are remnants of once-functional duplicates. Our findings reveal complex relationships between variation in gene family size, gene functions, duplication mechanism, and evolutionary rate. The patterns of lineage-specific gene family expansion within the Solanaceae provide the foundation for a better understanding of the genetic basis underlying phenotypic diversity in this economically important family.


Subject(s)
Genome, Plant , Multigene Family , Solanaceae/genetics , Gene Duplication , Genetic Variation , Genomics , Pseudogenes
9.
Biotechniques ; 60(1): 13-20, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26757807

ABSTRACT

The zebrafish represents a revolutionary tool in large-scale genetic and small-molecule screens for gene and drug discovery. Transgenic zebrafish are often utilized in these screens. Many transgenic fish lines are maintained in the heterozygous state due to the lethality associated with homozygosity; thus, their progeny must be sorted to ensure a population expressing the transgene of interest for use in screens. Sorting transgenic embryos under a fluorescence microscope is very labor-intensive and demands fine-tuned motor skills. Here we report an efficient transgenic method of utilizing pigmentation rescue of nacre mutant fish for accurate naked-eye identification of both mosaic founders and stable transgenic zebrafish. This was accomplished by co-injecting two constructs with the I-SceI meganuclease enzyme into pigmentless nacre embryos: I-SceI-mitfa:mitfa-I-SceI to rescue the pigmentation and I-SceI-zpromoter:gene-of-interest-I-SceI to express the gene of interest under a zebrafish promoter (zpromoter). Pigmentation rescue reliably predicted transgene integration. Compared with other transgenic techniques, our approach significantly increases the overall percentage of founders and facilitates accurate naked-eye identification of stable transgenic fish, greatly reducing laborious fluorescence microscope sorting and PCR genotyping. Thus, this approach is ideal for generating transgenic fish for large-scale screens.


Subject(s)
Gene Transfer Techniques , Microphthalmia-Associated Transcription Factor/genetics , Pigmentation/genetics , Promoter Regions, Genetic , Zebrafish Proteins/genetics , Animals , Animals, Genetically Modified , Genotype , Green Fluorescent Proteins/genetics , Microscopy, Fluorescence , Zebrafish/genetics , Zebrafish/physiology