1.
Front Plant Sci ; 14: 1247181, 2023.
Article in English | MEDLINE | ID: mdl-38023883

ABSTRACT

Introduction: Ordinal traits are important complex traits in crops, and genome-wide association study (GWAS) is a widely used method for mining their genes. At present, GWAS methods for continuous quantitative traits (C-GWAS) and single-locus association analysis of ordinal traits are the main methods used for ordinal traits. However, the detection power of these two approaches is low. Methods: To address this issue, we proposed a new method, named MTOTC, in which hierarchical data of ordinal traits are transformed into continuous phenotypic data (CPData). Results: Then, FASTmrMLM, one C-GWAS method, was used to conduct GWAS for CPData. The results from the simulation studies showed that MTOTC+FASTmrMLM for ordinal traits was better than the classical methods when there were four or fewer hierarchical levels. In addition, when MTOTC was combined with FASTmrEMMA, mrMLM, ISIS EM-BLASSO, pLARmEB, and pKWmEB, relatively high power and a low false positive rate in QTN detection were observed as well. Subsequently, MTOTC was applied to analyze the hierarchical data of soybean salt-alkali tolerance. More significant QTNs were detected when MTOTC was combined with any of the above six C-GWAS methods. Discussion: Accordingly, the new method increases the choice of GWAS methods for ordinal traits and helps to mine the genes for ordinal traits in resource populations.
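
The abstract does not say how MTOTC maps ordinal levels to continuous values, so the following is only a generic illustration of the idea, not the MTOTC transformation itself. It assumes a normal-score transform, in which each level is replaced by the standard-normal quantile of its cumulative-frequency midpoint; the function name and the example grades are hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def ordinal_to_normal_scores(levels):
    """Replace each ordinal level by the standard-normal quantile of the
    midpoint of its cumulative frequency (a generic normal-score transform,
    used here as an assumed stand-in for an ordinal-to-continuous mapping)."""
    counts = Counter(levels)
    n = len(levels)
    scores = {}
    below = 0  # number of observations in strictly lower levels
    for level in sorted(counts):
        midpoint = (below + counts[level] / 2) / n
        scores[level] = NormalDist().inv_cdf(midpoint)
        below += counts[level]
    return [scores[x] for x in levels]


# Hypothetical tolerance grades for eight lines (1 = lowest grade, 4 = highest).
grades = [1, 2, 2, 3, 3, 3, 4, 4]
continuous = ordinal_to_normal_scores(grades)
```

The resulting values preserve the order of the grades and could be passed to any GWAS method for continuous traits.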

2.
Front Plant Sci ; 14: 1050313, 2023.
Article in English | MEDLINE | ID: mdl-36875585

ABSTRACT

Introduction: Quantitative trait nucleotide (QTN)-by-environment interactions (QEIs) play an increasingly essential role in the genetic dissection of complex traits in crops as global climate change accelerates. Abiotic stresses, such as drought and heat, are major constraints on maize yields. Multi-environment joint analysis can improve statistical power in QTN and QEI detection, help us to understand the genetic basis of these traits, and provide implications for maize improvement. Methods: In this study, 3VmrMLM was applied to identify QTNs and QEIs for three yield-related traits (grain yield, anthesis date, and anthesis-silking interval) of 300 tropical and subtropical maize inbred lines with 332,641 SNPs under well-watered, drought, and heat stress conditions. Results: Among the total 321 genes around 76 QTNs and 73 QEIs identified in this study, 34 known genes were reported in previous maize studies to be truly associated with these traits, such as ereb53 (GRMZM2G141638) and thx12 (GRMZM2G016649) associated with drought stress tolerance, and hsftf27 (GRMZM2G025685) and myb60 (GRMZM2G312419) associated with heat stress. In addition, among 127 homologs in Arabidopsis out of 287 unreported genes, 46 and 47 were found to be significantly and differentially expressed under drought vs. well-watered treatments, and high vs. normal temperature treatments, respectively. Using functional enrichment analysis, 37 of these differentially expressed genes were involved in various biological processes. Tissue-specific expression and haplotype difference analysis further revealed 24 candidate genes with significant phenotypic differences across gene haplotypes under different environments, of which the candidate genes GRMZM2G064159, GRMZM2G146192, and GRMZM2G114789 around QEIs may have gene-by-environment interactions for maize yield. Discussion: All these findings may provide new insights for breeding maize for yield-related traits adapted to abiotic stresses.

3.
Front Plant Sci ; 14: 1283642, 2023.
Article in English | MEDLINE | ID: mdl-38259933

ABSTRACT

Introduction: Epistasis is currently a topic of great interest in molecular and quantitative genetics. Arabidopsis thaliana, as a model organism, plays a crucial role in studying the fundamental biology of diverse plant species. However, there have been limited reports on the identification of epistasis related to flowering in genome-wide association studies (GWAS). Therefore, it is important to detect epistasis for flowering-related traits in Arabidopsis. Method: In this study, we employed Levene's test and a compressed variance component mixed model in GWAS to detect quantitative trait nucleotides (QTNs) and QTN-by-QTN interactions (QQIs) for 11 flowering-related traits of 199 Arabidopsis accessions with 216,130 markers. Results: Our analysis detected 89 QTNs and 130 pairs of QQIs. Around these loci, 34 known genes previously reported in Arabidopsis were confirmed to be associated with flowering-related traits, such as SPA4, which is involved in regulating photoperiodic flowering, and interacts with PAP1 and PAP2, affecting growth of Arabidopsis under light conditions. Then, among the previously unreported genes, we observed significant and differential expression of 35 genes in response to variations in temperature, photoperiod, and vernalization treatments. Functional enrichment analysis revealed that 26 of these genes were associated with various biological processes. Finally, the haplotype and phenotypic difference analysis revealed 20 candidate genes exhibiting significant phenotypic variations across gene haplotypes, of which the candidate genes AT1G12990 and AT1G09950 around QQIs might have interaction effects on flowering time regulation in Arabidopsis. Discussion: These findings may offer valuable insights for the identification and exploration of genes and gene-by-gene interactions associated with flowering-related traits in Arabidopsis, and may also provide a valuable reference for research on epistasis in other species.

4.
Front Plant Sci ; 13: 995609, 2022.
Article in English | MEDLINE | ID: mdl-36325550

ABSTRACT

Rice, which supports more than half the population worldwide, is one of the most important food crops. Thus, potential yield-related quantitative trait nucleotides (QTNs) and QTN-by-environment interactions (QEIs) have been used to develop efficient rice breeding strategies. In this study, a compressed variance component mixed model, 3VmrMLM, in genome-wide association studies was used to detect QTNs for eight yield-related traits of 413 rice accessions with 44,000 single nucleotide polymorphisms. These traits include florets per panicle, panicle fertility, panicle length, panicle number per plant, plant height, primary panicle branch number, seed number per panicle, and flowering time. Meanwhile, QTNs and QEIs were identified for flowering time in three different environments and five subpopulations. In total, 7 to 23 QTNs were detected for each trait, including the three single-environment flowering time traits. In the detection of QEIs for flowering time in the three environments, 21 QTNs and 13 QEIs were identified. In the five subpopulation analyses, 3 to 9 QTNs and 2 to 4 QEIs were detected for each subpopulation. Based on previous studies, we identified 87 known genes around the significant/suggested QTNs and QEIs, such as LOC_Os06g06750 (OsMADS5) and LOC_Os07g47330 (FZP). Further differential expression analysis and functional enrichment analysis identified 30 candidate genes. Of these candidate genes, 27 genes had high expression in specific tissues, and 19 of these 27 genes were homologous to known genes in Arabidopsis. Haplotype difference analysis revealed that LOC_Os04g53210 and LOC_Os07g42440 are possibly associated with yield, and LOC_Os04g53210 may be useful around a QEI for flowering time. These results provide insights for future breeding for high quality and yield in rice.

5.
Plants (Basel) ; 11(19), 2022 Sep 26.
Article in English | MEDLINE | ID: mdl-36235370

ABSTRACT

Rice (Oryza sativa) is one of the most important cereal crops in the world, and yield-related agronomic traits, including plant height (PH), panicle length (PL), and protein content (PC), are prerequisites for attaining the desired yield and quality in breeding programs. Meanwhile, the main effects and epistatic effects of quantitative trait nucleotides (QTNs) are all important genetic components for yield-related quantitative traits. In this study, we conducted genome-wide association studies (GWAS) for 413 rice germplasm resources, with 36,901 single nucleotide polymorphisms (SNPs), to identify QTNs, QTN-by-QTN interactions (QQIs), and their candidate genes, using a multi-locus compressed variance component mixed model, 3VmrMLM. As a result, two significant QTNs and 56 pairs of QQIs were detected; among the 5219 genes of these QTNs, 26 were identified as confirmed yield-related genes, such as LCRN1, OsSPL3, and OsVOZ1 for PH, and LOG and QsBZR1 for PL. To reveal the substantial contributions related to the variation of yield-related agronomic traits in rice, we further implemented an enrichment analysis and expression analysis. The results showed that 114 genes, covering nearly all significant QQIs, were involved in 37 GO terms, for example, the macromolecule metabolic process (GO:0043170), intracellular part (GO:0044424), and binding (GO:0005488). Most of the QQIs and the candidate genes were significantly involved in the biological process, molecular function, and cellular component categories for the target traits. The demonstrated genetic interactions play a critical role in yield-related agronomic traits of rice, and such epistatic interactions contributed to large portions of the missing heritability in GWAS. These results help us to understand the genetic basis underlying the inheritance of the three yield-related agronomic traits and provide implications for rice improvement.

6.
Front Genet ; 12: 648831, 2021.
Article in English | MEDLINE | ID: mdl-33981331

ABSTRACT

Genome-wide association study (GWAS) has identified thousands of genetic variants associated with complex traits and diseases. Compared with analyzing a single phenotype at a time, the joint analysis of multiple phenotypes can improve statistical power by taking into account the information shared across phenotypes. However, most established joint algorithms ignore the different levels of correlation between multiple phenotypes; instead, they analyze all phenotypes simultaneously in a single genetic model. Thus, they may fail to capture the genetic structure of phenotypes and consequently lose statistical power. In this study, we develop a novel method, the agglomerative nesting clustering algorithm for phenotypic dimension reduction analysis (AGNEP), to jointly analyze multiple phenotypes for GWAS. First, AGNEP uses an agglomerative nesting clustering algorithm to group correlated phenotypes and then applies principal component analysis (PCA) to generate representative phenotypes for each group. Finally, multivariate analysis is employed to test associations between genetic variants and the representative phenotypes rather than all phenotypes. We perform three simulation experiments with various genetic structures and a real dataset analysis for 19 Arabidopsis phenotypes. Compared to established methods, AGNEP performs better in terms of statistical power, computing time, and the number of quantitative trait nucleotides (QTNs) detected. The analysis of the Arabidopsis real dataset further illustrates the efficiency of AGNEP for detecting QTNs, which are confirmed by The Arabidopsis Information Resource gene bank.
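
The grouping step described above can be sketched as follows. This is a minimal illustration, not the AGNEP implementation: it assumes average linkage, a distance of 1 - |r| between phenotypes, and a hypothetical stopping distance of 0.5, none of which are stated in the abstract. The PCA step that would summarize each group is omitted, and the phenotype names and values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def group_phenotypes(phenos, max_dist=0.5):
    """Bottom-up (agglomerative) average-linkage clustering of phenotypes.
    Distance is 1 - |r|; merging stops once the closest pair of clusters
    is farther apart than max_dist (an assumed threshold)."""
    names = list(phenos)
    dist = {frozenset(p): 1 - abs(pearson(phenos[p[0]], phenos[p[1]]))
            for p in combinations(names, 2)}
    clusters = [[name] for name in names]

    def linkage(a, b):
        # Average pairwise distance between members of clusters a and b.
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) > max_dist:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical phenotypes measured on six accessions.
phenos = {
    "height": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "length": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],
    "flower_a": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
    "flower_b": [2.9, 1.2, 4.1, 0.8, 5.2, 2.1],
}
groups = group_phenotypes(phenos)
```

Each resulting group would then be reduced to one or a few principal components before the association test.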

7.
Front Genet ; 12: 649196, 2021.
Article in English | MEDLINE | ID: mdl-33854527

ABSTRACT

The mixed linear model (MLM) has been widely used in genome-wide association study (GWAS) to dissect quantitative traits in human, animal, and plant genetics. Most methodologies treat all single nucleotide polymorphism (SNP) effects as random effects under the MLM framework, which fails to detect the joint minor effect of multiple genetic markers on a trait. Therefore, polygenes with minor effects remain largely unexplored in today's big data era. In this study, we developed a new algorithm under the MLM framework, called the fast multi-locus ridge regression (FastRR) algorithm. The FastRR algorithm first whitens the covariance matrix of the polygenic matrix K and environmental noise, then selects, from the large-scale marker set, potentially related SNPs that have a high correlation with the target trait, and finally analyzes this subset of variables using a multi-locus deshrinking ridge regression for true quantitative trait nucleotide (QTN) detection. Results from the analyses of both simulated and real data show that the FastRR algorithm is more powerful for both large and small QTN detection, is more accurate in QTN effect estimation, and gives more stable results under various polygenic backgrounds. Moreover, compared with existing methods, the FastRR algorithm has the advantage of high computing speed. In conclusion, the FastRR algorithm provides an alternative algorithm for multi-locus GWAS in high-dimensional genomic datasets.
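
The middle step, selecting SNPs that are highly correlated with the trait, can be illustrated with a small sketch. It assumes plain Pearson correlation and a fixed number of markers to keep; the abstract gives neither the exact correlation measure nor the selection size, and the whitening and ridge regression steps are not shown. Marker names, genotype codes, and trait values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither list is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def screen_markers(genotypes, trait, keep=2):
    """Rank markers by |r| with the trait and return the top `keep` names."""
    ranked = sorted(genotypes,
                    key=lambda m: abs(pearson(genotypes[m], trait)),
                    reverse=True)
    return ranked[:keep]


# Hypothetical data: six individuals, genotypes coded 0/1/2.
trait = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]
genotypes = {
    "snp1": [0, 0, 1, 1, 2, 2],
    "snp2": [0, 2, 0, 2, 0, 2],
    "snp3": [2, 1, 2, 0, 1, 0],
}
selected = screen_markers(genotypes, trait, keep=2)
```

Only the retained markers would enter the multi-locus model, which keeps the final regression small even when the initial marker set is very large.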

8.
Genomics Proteomics Bioinformatics ; 18(4): 481-487, 2020 08.
Article in English | MEDLINE | ID: mdl-33346083

ABSTRACT

Previous studies have reported that some important loci are missed in single-locus genome-wide association studies (GWAS), especially because of the large phenotypic error in field experiments. To solve this issue, multi-locus GWAS methods have been recommended. However, only a few software packages for multi-locus GWAS are available. Therefore, we developed an R software package named mrMLM v4.0.2. This software integrates the mrMLM, FASTmrMLM, FASTmrEMMA, pLARmEB, pKWmEB, and ISIS EM-BLASSO methods developed by our lab. There are four components in mrMLM v4.0.2: dataset input, parameter setting, software running, and result output. The fread function in data.table is used to quickly read datasets, especially big datasets, and the doParallel package is used to conduct parallel computation using multiple CPUs. In addition, the graphical user interface software mrMLM.GUI v4.0.2, built upon Shiny, is also available. To confirm the correctness of the aforementioned programs, all the methods in mrMLM v4.0.2 and three widely used methods were used to analyze real and simulated datasets. The results confirm the superior performance of mrMLM v4.0.2 to other methods currently available. False positive rates are effectively controlled, albeit with a less stringent significance threshold. mrMLM v4.0.2 is publicly available at BioCode (https://bigd.big.ac.cn/biocode/tools/BT007077) or R (https://cran.r-project.org/web/packages/mrMLM.GUI/index.html) as open-source software.


Subject(s)
Genome-Wide Association Study; Polymorphism, Single Nucleotide; Software
9.
Comput Struct Biotechnol J ; 18: 59-65, 2020.
Article in English | MEDLINE | ID: mdl-31890145

ABSTRACT

The methodologies and software packages for mapping quantitative trait loci (QTLs) in bi-parental segregation populations are well established. However, it is still difficult to detect small-effect and linked QTLs. To address this issue, we proposed a genome-wide composite interval mapping (GCIM) method for bi-parental segregation populations. To popularize this method, we developed an R package. This program, with two versions (Graphical User Interface: QTL.gCIMapping.GUI v2.0 and code: QTL.gCIMapping v3.2), can be used to identify QTLs for quantitative traits in recombinant inbred lines, doubled haploid lines, backcross and F2 populations. To save running time, the fread function was used to read the dataset, parallel operation was used in parameter estimation, and conditional probability calculation was implemented in C++. Once one input file in *.csv or *.txt format is uploaded into the package, one or two output files and one figure can be obtained. Input files in the ICIM and Windows QTL Cartographer formats are accepted as well. Real data analysis for 1000-grain weight in rice showed that the GCIM detects the largest number of previously reported QTLs and genes, and has the minimum AIC value in the stepwise regression of all the identified QTLs for this trait; using stepwise regression and empirical Bayesian analyses, there are some false QTLs around the previously reported QTLs and genes from the CIM method. The above software packages for Windows, Mac and Linux can be downloaded from https://cran.r-project.org/web/packages/ or https://bigd.big.ac.cn/biocode/tools/7078/releases/27 in order to identify all kinds of omics QTLs.

10.
Sci Rep ; 9(1): 18034, 2019 12 02.
Article in English | MEDLINE | ID: mdl-31792302

ABSTRACT

One of the most important tasks in genome-wide association analysis (GWAS) is the detection of single-nucleotide polymorphisms (SNPs) that are related to target traits. With the development of sequencing technology, it has become difficult for traditional statistical methods to analyze the resulting high-dimensional, massive SNP data. Recently, machine learning methods have become more popular in high-dimensional genetic data analysis because of their fast computation speed. However, most machine learning methods have several drawbacks, such as poor generalization ability, over-fitting, unsatisfactory classification and low detection accuracy. This study proposed a two-stage algorithm based on least angle regression and random forest (TSLRF), which first controlled for population structure and polygenic effects, then selected the SNPs that were potentially related to target traits by using least angle regression (LARS), and further analyzed this variable subset using random forest (RF) to detect quantitative trait nucleotides (QTNs) associated with target traits. The new method has higher detection power in simulation experiments and real data analyses. The results of simulation experiments showed that, compared with the existing approaches, the new method effectively improved the detection ability for QTNs and the degree of model fit, and required less calculation time. In addition, the new method significantly distinguished QTNs from other SNPs. Subsequently, the new method was applied to analyze five flowering-related traits in Arabidopsis. The results showed that the distinction between QTNs and unrelated SNPs was more significant than with the other methods. The new method detected 60 genes confirmed to be related to the target trait, which was significantly more than the other methods, and simultaneously detected multiple gene clusters associated with the target trait.


Subject(s)
Genome-Wide Association Study/methods; Genomics/methods; Machine Learning; Models, Genetic; Arabidopsis/genetics; Genome, Plant/genetics; Linear Models; Multigene Family/genetics; Polymorphism, Single Nucleotide/genetics; Quantitative Trait Loci/genetics
11.
Brief Bioinform ; 20(5): 1913-1924, 2019 09 27.
Article in English | MEDLINE | ID: mdl-30032279

ABSTRACT

In the genetic system that regulates complex traits, metabolites, gene expression levels, RNA editing levels and DNA methylation, a series of small and linked genes exist. To date, however, little is known about how to design an efficient framework for the detection of these kinds of genes. In this article, we propose a genome-wide composite interval mapping (GCIM) in F2. First, controlling polygenic background via selecting markers in the genome scanning of linkage analysis was replaced by estimating polygenic variance in a genome-wide association study. This can control large, middle and minor polygenic backgrounds in genome scanning. Then, additive and dominant effects for each putative quantitative trait locus (QTL) were separately scanned so that a negative logarithm P-value curve against genome position could be separately obtained for each kind of effect. In each curve, all the peaks were identified as potential QTLs. Thus, almost all the small-effect and linked QTLs are included in a multi-locus model. Finally, adaptive least absolute shrinkage and selection operator (adaptive lasso) was used to estimate all the effects in the multi-locus model, and all the nonzero effects were further identified by likelihood ratio test for true QTL identification. This method was used to reanalyze four rice traits. Among 25 known genes detected in this study, 16 small-effect genes were identified only by GCIM. To further demonstrate GCIM, a series of Monte Carlo simulation experiments was performed. As a result, GCIM is demonstrated to be more powerful than the widely used methods for the detection of closely linked and small-effect QTLs.
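
The peak-selection step on the negative logarithm P-value curve can be sketched as below. This is an assumed, simplified rule (a point is a peak if it is higher than its left neighbour and at least as high as its right neighbour, with the two end points ignored); the abstract does not give the exact rule GCIM uses, and the curve values are made up.

```python
def find_peaks(neg_log_p):
    """Return the indices of local maxima in a -log10(P) curve.
    End points are never reported as peaks (a simplifying assumption)."""
    peaks = []
    for i in range(1, len(neg_log_p) - 1):
        if neg_log_p[i - 1] < neg_log_p[i] >= neg_log_p[i + 1]:
            peaks.append(i)
    return peaks


# Hypothetical -log10(P) values at ten consecutive genome positions.
curve = [0.2, 0.9, 2.4, 1.1, 0.7, 1.6, 3.8, 3.1, 0.5, 0.3]
peak_positions = find_peaks(curve)  # indices of putative QTL positions
```

Every peak, including the small one, is carried forward as a putative QTL; the later multi-locus shrinkage step, not a height cutoff at this stage, decides which ones are retained.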


Subject(s)
Models, Genetic; Quantitative Trait Loci; DNA Methylation; Genetic Linkage; Humans; Monte Carlo Method
12.
Brief Bioinform ; 19(4): 700-712, 2018 07 20.
Article in English | MEDLINE | ID: mdl-28158525

ABSTRACT

The mixed linear model has been widely used in genome-wide association studies (GWAS), but its application to multi-locus GWAS analysis has not been explored and assessed. Here, we implemented a fast multi-locus random-SNP-effect EMMA (FASTmrEMMA) model for GWAS. The model is built on random single nucleotide polymorphism (SNP) effects and a new algorithm. This algorithm whitens the covariance matrix of the polygenic matrix K and environmental noise, and specifies the number of nonzero eigenvalues as one. The model first chooses all putative quantitative trait nucleotides (QTNs) with P-values ≤ 0.005 and then includes them in a multi-locus model for true QTN detection. Owing to the multi-locus feature, the Bonferroni correction is replaced by a less stringent selection criterion. Results from analyses of both simulated and real data showed that FASTmrEMMA is more powerful in QTN detection and model fit, has less bias in QTN effect estimation and requires less running time than existing single- and multi-locus methods, such as empirical Bayes, settlement of mixed linear model under progressively exclusive relationship (SUPER), efficient mixed model association (EMMA), compressed MLM (CMLM) and enriched CMLM (ECMLM). FASTmrEMMA provides an alternative for multi-locus GWAS.
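
To make the difference between the first-stage cutoff and a Bonferroni threshold concrete, here is a small sketch. The 0.005 cutoff comes from the abstract; the marker names, P-values, and the marker count of 200,000 are hypothetical.

```python
def select_candidates(p_values, cutoff=0.005):
    """First-stage screen: keep markers whose single-marker P-value is at
    or below the cutoff; they become candidates for the multi-locus model."""
    return sorted(m for m, p in p_values.items() if p <= cutoff)


# Hypothetical single-marker scan results.
p_values = {"snp1": 3e-9, "snp2": 0.004, "snp3": 0.02, "snp4": 0.0009, "snp5": 0.3}

n_markers = 200_000              # assumed genome-wide marker count
bonferroni = 0.05 / n_markers    # conventional Bonferroni threshold

stage_one = select_candidates(p_values)
bonferroni_hits = sorted(m for m, p in p_values.items() if p <= bonferroni)
```

With these numbers the screen keeps three markers while Bonferroni keeps one, which illustrates why modest-effect loci can survive to the multi-locus stage.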


Subject(s)
Algorithms; Arabidopsis Proteins/genetics; Arabidopsis/genetics; Genome-Wide Association Study/methods; Polymorphism, Single Nucleotide; Bayes Theorem; Computer Simulation; Linear Models; Models, Genetic; Multifactorial Inheritance; Phenotype
13.
Heredity (Edinb) ; 120(3): 208-218, 2018 03.
Article in English | MEDLINE | ID: mdl-29234158

ABSTRACT

Although nonparametric methods in genome-wide association studies (GWAS) are robust in quantitative trait nucleotide (QTN) detection, the absence of polygenic background control in single-marker association in genome-wide scans results in a high false positive rate. To overcome this issue, we proposed an integrated nonparametric method for multi-locus GWAS. First, a new model transformation was used to whiten the covariance matrix of the polygenic matrix K and environmental noise. Using the transformed model, the Kruskal-Wallis test along with least angle regression was then used to select all the markers that were potentially associated with the trait. Finally, all the selected markers were placed into a multi-locus model, their effects were estimated by empirical Bayes, and all the nonzero effects were further identified by a likelihood ratio test for true QTN detection. This method, named pKWmEB, was validated by a series of Monte Carlo simulation studies. As a result, pKWmEB effectively controlled the false positive rate, although a less stringent significance criterion was adopted. More importantly, pKWmEB retained the high power of the Kruskal-Wallis test, and provided QTN effect estimates. To further validate pKWmEB, we re-analyzed four flowering time related traits in Arabidopsis thaliana, and detected some previously reported genes that were not identified by the other methods.
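
For readers unfamiliar with the Kruskal-Wallis test used in the marker-selection step, the sketch below computes its H statistic for one marker, with phenotypes grouped by genotype class. It is the textbook statistic with average ranks and the usual tie correction, not the pKWmEB pipeline: the model transformation, least angle regression, and empirical Bayes steps are not shown, and the example values are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction.
    `groups` is a list of lists of phenotype values, one list per genotype
    class. Raises ZeroDivisionError if every value is identical."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)
    rank_of = {}   # value -> average rank among the pooled observations
    tie_term = 0   # sum of t^3 - t over runs of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction


# Hypothetical flowering times grouped by the three genotype classes of one marker.
by_genotype = [[21.0, 22.5, 23.1], [24.0, 25.2, 26.3], [27.4, 28.0, 29.9]]
h_stat = kruskal_wallis_h(by_genotype)
```

Under the null hypothesis, H is approximately chi-square distributed with (number of groups - 1) degrees of freedom, which is how a P-value would be obtained for each marker.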


Subject(s)
Genome-Wide Association Study/methods; Models, Genetic; Multifactorial Inheritance; Arabidopsis/genetics; Arabidopsis/physiology; Bayes Theorem; Computer Simulation; Flowers/genetics; Flowers/physiology; Likelihood Functions; Monte Carlo Method
15.
Sci Rep ; 6: 29951, 2016 07 20.
Article in English | MEDLINE | ID: mdl-27435756

ABSTRACT

Composite interval mapping (CIM) is the most widely used method in linkage analysis. Its main feature is the ability to control genomic background effects via inclusion of co-factors in its genetic model. However, the result often depends on how the co-factors are selected, especially for small-effect and linked quantitative trait loci (QTL). To address this issue, here we proposed a new method under the framework of genome-wide association studies (GWAS). First, a single-locus random-SNP-effect mixed linear model method for GWAS was used to scan each putative QTL on the genome in backcross or doubled haploid populations. Here, controlling background via selecting markers in the CIM was replaced by estimating polygenic variance. Then, all the peaks in the negative logarithm P-value curve were selected as the positions of multiple putative QTL to be included in a multi-locus genetic model, and true QTL were automatically identified by empirical Bayes. This method is called genome-wide CIM (GCIM). A series of simulated and real datasets was used to validate the new method. As a result, the new method had higher power in QTL detection, greater accuracy in QTL effect estimation, and stronger robustness under various backgrounds as compared with the CIM and empirical Bayes methods.

16.
Sci Rep ; 6: 19444, 2016 Jan 20.
Article in English | MEDLINE | ID: mdl-26787347

ABSTRACT

Genome-wide association studies (GWAS) have been widely used in genetic dissection of complex traits. However, common methods are all based on a fixed-SNP-effect mixed linear model (MLM) and single marker analysis, such as efficient mixed model analysis (EMMA). These methods require Bonferroni correction for multiple tests, which is often too conservative when the number of markers is extremely large. To address this concern, we proposed a random-SNP-effect MLM (RMLM) and a multi-locus RMLM (MRMLM) for GWAS. The RMLM simply treats the SNP effect as random, but it allows a modified Bonferroni correction to be used to calculate the threshold P-value for significance tests. The MRMLM is a multi-locus model including markers selected from the RMLM method with a less stringent selection criterion. Due to the multi-locus nature, no multiple test correction is needed. Simulation studies show that the MRMLM is more powerful in QTN detection and more accurate in QTN effect estimation than the RMLM, which in turn is more powerful and accurate than the EMMA. To demonstrate the new methods, we analyzed six flowering time related traits in Arabidopsis thaliana and detected more genes than previously reported using the EMMA. Therefore, the MRMLM provides an alternative for multi-locus GWAS.
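
As a rough guide to what treating the SNP effect as random means, a single-marker mixed linear model can be written as below. The notation is an assumption chosen for illustration and is not taken from the abstract: $X\beta$ collects fixed effects, $Z_k$ is the genotype vector of marker $k$, $K$ is the kinship matrix, and $m$ is the number of markers.

```latex
% Single-marker mixed linear model (assumed notation)
y = X\beta + Z_k\gamma_k + u + \varepsilon,
\qquad u \sim \mathrm{MVN}(0,\ \sigma_g^2 K),
\qquad \varepsilon \sim \mathrm{MVN}(0,\ \sigma_e^2 I)

% Fixed-SNP-effect model: \gamma_k is an unknown constant, so
\operatorname{Var}(y) = \sigma_g^2 K + \sigma_e^2 I

% Random-SNP-effect model: \gamma_k \sim N(0,\ \sigma_k^2), so
\operatorname{Var}(y) = \sigma_k^2 Z_k Z_k^{\top} + \sigma_g^2 K + \sigma_e^2 I

% Conventional Bonferroni threshold for m single-marker tests
\alpha_{\text{Bonferroni}} = \frac{0.05}{m}
```

In the random-effect form, testing marker $k$ amounts to testing $\sigma_k^2 = 0$; the modified threshold mentioned in the abstract is not reproduced here because its definition is not given.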


Subject(s)
Genetic Loci; Genome-Wide Association Study; Linear Models; Polymorphism, Single Nucleotide; Arabidopsis/genetics; Computer Simulation; Genes, Plant; Genome-Wide Association Study/methods; Quantitative Trait, Heritable; ROC Curve; Reproducibility of Results