
crosshap: R package for local haplotype visualization for trait association analysis.

Marsh, Jacob I; Petereit, Jakob; Johnston, Brady A; Bayer, Philipp E; Tay Fernandez, Cassandria G; Al-Mamun, Hawlader A; Batley, Jacqueline; Edwards, David.

Bioinformatics ; 39(8)2023 08 01.

Article in English | MEDLINE | ID: mdl-37607004

ABSTRACT

SUMMARY: Genome-wide association studies (GWAS) excel at harnessing dense genomic variant datasets to identify candidate regions responsible for producing a given phenotype. However, GWAS and traditional fine-mapping methods do not provide insight into the complex local landscape of linkage that contains, and has been shaped by, the causal variant(s). Here, we present crosshap, an R package that performs robust density-based clustering of variants based on their linkage profiles to capture haplotype structures in a local genomic region of interest. Following this, crosshap provides visualization tools for choosing optimal clustering parameters (ε) before producing an intuitive figure that summarizes the complex relationships between linked variants, haplotype combinations, phenotype, and metadata traits.

AVAILABILITY AND IMPLEMENTATION: The crosshap package is freely available under the MIT license and can be downloaded directly from CRAN with R >4.0.0. The development version is available on GitHub alongside issue support (https://github.com/jacobimarsh/crosshap). Tutorial vignettes and documentation are available (https://jacobimarsh.github.io/crosshap/).
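The density-based clustering of variants by linkage profile described above can be illustrated with a minimal Python sketch. It is not crosshap's implementation (crosshap is an R package). It assumes genotypes coded as 0/1/2 allele dosages per individual, treats each variant's vector of pairwise r² values with every variant as its "linkage profile", and groups those profiles with a textbook DBSCAN. Here the hypothetical `eps` radius plays a role analogous to the ε parameter, and `min_pts` sets the minimum neighbourhood size. All function names and the toy genotypes are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two genotype vectors (0/1/2 dosages)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic variant: carries no linkage information
    return sxy / math.sqrt(sxx * syy)

def linkage_profiles(genotypes):
    """Row i holds variant i's r^2 with every variant (its linkage profile)."""
    return [[pearson_r(g, h) ** 2 for h in genotypes] for g in genotypes]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN with Euclidean distance; label -1 marks noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:  # only core points expand the cluster
                queue.extend(nb)
    return labels

# Hypothetical toy data: rows are variants, columns are six individuals.
genotypes = [
    [0, 0, 2, 2, 0, 2],
    [0, 0, 2, 2, 0, 2],
    [2, 2, 0, 0, 2, 0],  # perfectly (negatively) linked to the first two
    [0, 2, 0, 2, 0, 2],
    [0, 2, 0, 2, 0, 2],
]
labels = dbscan(linkage_profiles(genotypes), eps=0.5, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1]
```

Using r² rather than signed r places variants that are in linkage on opposite alleles into the same group, which is why the third variant clusters with the first two. Raising `min_pts` beyond the size of each linked block turns every variant into noise, mirroring how the choice of clustering parameters changes the haplotype structures that are recovered.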

Subject(s)

Documentation , Genome-Wide Association Study , Cluster Analysis , Haplotypes , Phenotype

The conservation of gene models can support genome annotation.

Tay Fernandez, Cassandria G; Bayer, Philipp E; Petereit, Jakob; Varshney, Rajeev; Batley, Jacqueline; Edwards, David.

Plant Genome ; 16(3): e20377, 2023 09.

Article in English | MEDLINE | ID: mdl-37602500

ABSTRACT

Many genome annotations include false-positive gene models, leading to errors in phylogenetic and comparative studies. Here, we propose a method to support gene model prediction based on evolutionary conservation and use it to identify potentially erroneous annotations. Using this method, we developed a set of 15,345 representative gene models from 12 legume assemblies that can be used to support genome annotations for other legumes.

Subject(s)

Fabaceae , Phylogeny

Legume-wide comparative analysis of pod shatter locus PDH1 reveals phaseoloid specificity, high cowpea expression, and stress responsive genomic context.

Marsh, Jacob I; Nestor, Benjamin J; Petereit, Jakob; Tay Fernandez, Cassandria G; Bayer, Philipp E; Batley, Jacqueline; Edwards, David.

Plant J ; 115(1): 68-80, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36970933

ABSTRACT

Pod dehiscence is a major source of yield loss in legumes, which is exacerbated by aridity. Disruptive mutations in "Pod indehiscent 1" (PDH1), a pod sclerenchyma-specific lignin biosynthesis gene, have been linked to significant reductions in dehiscence in several legume species. We compared syntenic PDH1 regions across 12 legumes and two outgroups to uncover key historical evolutionary trends at this important locus. Our results clarified the extent to which PDH1 orthologs are present in legumes, showing that the typical genomic context surrounding PDH1 arose only relatively recently, in certain phaseoloid species (Vigna, Phaseolus, Glycine). The notable absence of PDH1 in Cajanus cajan may be a major contributor to its indehiscent phenotype compared with other phaseoloids. In addition, we identified a novel PDH1 ortholog in Vigna angularis and detected remarkable increases in PDH1 transcript abundance during Vigna unguiculata pod development. Investigation of the shared genomic context of PDH1 revealed that it lies in a hotspot of transcription factors and signaling gene families that respond to abscisic acid and drought stress, which we hypothesize may be an additional factor influencing expression of PDH1 under specific environmental conditions. Our findings provide key insights into the evolutionary history of PDH1 and lay the foundation for optimizing the pod dehiscence role of PDH1 in major and understudied legume species.

Subject(s)

Phaseolus , Vigna , Vigna/genetics , Quantitative Trait Loci , Genome, Plant/genetics , Phaseolus/genetics , Genomics

Pangenomics and Crop Genome Adaptation in a Changing Climate.

Petereit, Jakob; Bayer, Philipp E; Thomas, William J W; Tay Fernandez, Cassandria G; Amas, Junrey; Zhang, Yueqi; Batley, Jacqueline; Edwards, David.

Plants (Basel) ; 11(15)2022 Jul 27.

Article in English | MEDLINE | ID: mdl-35956427

ABSTRACT

During crop domestication and breeding, wild plant species have been shaped into modern high-yield crops and adapted to the main agro-ecological regions. However, climate change will impact crop productivity in these regions, and agriculture needs to adapt to support future food production. On a global scale, crop wild relatives grow in more diverse environments than crop species, and so may host genes that could support the adaptation of crops to new and variable environments. Through identification of individuals with increased climate resilience we may gain a greater understanding of the genomic basis for this resilience and transfer this to crops. Pangenome analysis can help to identify the genes underlying stress responses in individuals harbouring untapped genomic diversity in crop wild relatives. The information gained from the analysis of these pangenomes can then be applied towards breeding climate resilience into existing crops or to re-domesticating crops, combining environmental adaptation traits with crop productivity.

An SGSGeneloss-Based Method for Constructing a Gene Presence-Absence Table Using Mosdepth.

Tay Fernandez, Cassandria G; Marsh, Jacob I; Nestor, Benjamin J; Gill, Mitchell; Golicz, Agnieszka A; Bayer, Philipp E; Edwards, David.

Methods Mol Biol ; 2512: 73-80, 2022.

Article in English | MEDLINE | ID: mdl-35818000

ABSTRACT

Presence-absence variants (PAV) are genomic regions present in some individuals of a species, but not others. PAVs have been shown to contribute to genomic diversity, especially in bacteria and plants. These structural variations have been linked to traits and can be used to track a species' evolutionary history. PAVs are usually called by aligning short read sequence data from one or more individuals to a reference genome or pangenome assembly, and then comparing coverage. Regions where reads do not align define absence in that individual, and the regions are classified as PAVs. The method below details how to align sequence reads to a reference and how to use the sequencing-coverage calculator Mosdepth to identify PAVs and construct a PAV table for use in downstream comparative genome analysis.
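The coverage-comparison step described above can be sketched in Python. This is a simplified illustration, not the chapter's SGSGeneloss-based procedure. It assumes per-gene mean depths have already been computed for each sample (for example with Mosdepth run over a BED file of gene coordinates), assumes a hypothetical tab-separated `chrom start end gene mean_depth` line format, and calls a gene absent when its mean depth falls below a hypothetical `min_depth` threshold.

```python
def parse_regions(lines):
    """Parse hypothetical 'chrom start end gene mean_depth' TSV lines into {gene: depth}."""
    depths = {}
    for line in lines:
        if not line.strip():
            continue
        _chrom, _start, _end, gene, mean = line.rstrip("\n").split("\t")
        depths[gene] = float(mean)
    return depths

def pav_table(coverage, min_depth=2.0):
    """Build a presence (1) / absence (0) table: one row per gene, one column per sample.

    `coverage` maps sample name -> {gene: mean depth}; `min_depth` is an
    illustrative cut-off, not a value taken from the published method.
    """
    samples = sorted(coverage)
    genes = sorted({g for per_sample in coverage.values() for g in per_sample})
    table = {}
    for gene in genes:
        # A gene missing from a sample's output is treated as depth 0 (absent).
        table[gene] = [1 if coverage[s].get(gene, 0.0) >= min_depth else 0 for s in samples]
    return samples, table

def to_tsv(samples, table):
    """Render the PAV table as tab-separated text for downstream analysis."""
    rows = ["gene\t" + "\t".join(samples)]
    for gene, calls in table.items():
        rows.append(gene + "\t" + "\t".join(map(str, calls)))
    return "\n".join(rows)

# Hypothetical per-sample coverage summaries.
sample_a = parse_regions(["chr1\t0\t1000\tgeneA\t12.5", "chr1\t2000\t3000\tgeneB\t0.3"])
sample_b = parse_regions(["chr1\t0\t1000\tgeneA\t9.1", "chr1\t2000\t3000\tgeneB\t7.8"])
samples, table = pav_table({"A": sample_a, "B": sample_b})
print(table)  # {'geneA': [1, 1], 'geneB': [0, 1]}
print(to_tsv(samples, table))
```

Here geneB is called absent in sample A because almost no reads align to it, which is the signal the abstract describes for classifying a region as a PAV.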

Subject(s)

Genome , Genomics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods

Evaluating Plant Gene Models Using Machine Learning.

Upadhyaya, Shriprabha R; Bayer, Philipp E; Tay Fernandez, Cassandria G; Petereit, Jakob; Batley, Jacqueline; Bennamoun, Mohammed; Boussaid, Farid; Edwards, David.

Plants (Basel) ; 11(12)2022 Jun 20.

Article in English | MEDLINE | ID: mdl-35736770

ABSTRACT

Gene models are regions of the genome that can be transcribed into RNA and translated to proteins, or belong to a class of non-coding RNA genes. The prediction of gene models is a complex process that can be unreliable, leading to false-positive annotations. To help support the calling of confident conserved gene models and minimize false positives arising during gene model prediction, we have developed Truegene, a machine learning approach to classify potential low-confidence gene models using 14 gene-based and 41 protein-based characteristics. Amino acid and nucleotide sequence-based features were calculated for conserved (high-confidence) and non-conserved (low-confidence) annotated genes from the published Pisum sativum Cameor genome. These features were used to train eXtreme Gradient Boosting (XGBoost) classifier models to predict whether a gene model is likely to be real. The optimized models demonstrated a prediction accuracy ranging from 87% to 90% and an F1 score of 0.91-0.94. We used SHapley Additive exPlanations (SHAP) and feature importance plots to identify the features that contribute to the model predictions, and we show that protein and gene-based features can be used to build accurate models for gene prediction that have applications in supporting future gene annotation processes.

Expanding Gene-Editing Potential in Crop Improvement with Pangenomes.

Tay Fernandez, Cassandria G; Nestor, Benjamin J; Danilevicz, Monica F; Marsh, Jacob I; Petereit, Jakob; Bayer, Philipp E; Batley, Jacqueline; Edwards, David.

Int J Mol Sci ; 23(4)2022 Feb 18.

Article in English | MEDLINE | ID: mdl-35216392

ABSTRACT

Pangenomes aim to represent the complete repertoire of the genome diversity present within a species or cohort of species, capturing the genomic structural variation between individuals. This genomic information, coupled with phenotypic data, can be applied to identify genes and alleles involved with abiotic stress tolerance, disease resistance, and other desirable traits. The characterisation of novel structural variants from pangenomes can support genome editing approaches such as Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated (Cas) proteins (CRISPR-Cas), providing functional information on gene sequences and new target sites in variant-specific genes with increased efficiency. This review discusses the application of pangenomes in genome editing and crop improvement, focusing on the potential of pangenomes to accurately identify target genes for CRISPR-Cas editing of plant genomes while avoiding adverse off-target effects. We consider the limitations of applying CRISPR-Cas editing with pangenome references and potential solutions to overcome these limitations.

Subject(s)

CRISPR-Cas Systems/genetics , Crops, Agricultural/genetics , Genome, Plant/genetics , Gene Editing/methods , Phenotype , Plant Breeding/methods , Plants, Genetically Modified/genetics

High-Throughput Genotyping Technologies in Plant Taxonomy.

Danilevicz, Monica F; Tay Fernandez, Cassandria G; Marsh, Jacob I; Bayer, Philipp E; Edwards, David.

Methods Mol Biol ; 2222: 149-166, 2021.

Article in English | MEDLINE | ID: mdl-33301093

ABSTRACT

Molecular markers provide researchers with a powerful tool for variation analysis between plant genomes. They are heritable and widely distributed across the genome and for this reason have many applications in plant taxonomy and genotyping. Over the last decade, molecular marker technology has developed rapidly and is now a crucial component for genetic linkage analysis, trait mapping, diversity analysis, and association studies. This chapter focuses on molecular marker discovery, its application, and future perspectives for plant genotyping through pangenome assemblies. Included are descriptions of automated methods for genome and sequence distance estimation, genome contaminant analysis in sequence reads, genome structural variation, and SNP discovery methods.

Subject(s)

DNA Barcoding, Taxonomic , Genotyping Techniques , High-Throughput Screening Assays , Plants/classification , Plants/genetics , Computational Biology/methods , DNA Barcoding, Taxonomic/methods , DNA Barcoding, Taxonomic/standards , DNA Contamination , Evolution, Molecular , Genetic Markers , Genome, Plant , Genomics/methods , Genotype , High-Throughput Screening Assays/standards , Phylogeny , Polymorphism, Single Nucleotide
