ABSTRACT
Data integration using hierarchical analysis based on the central dogma or common pathway enrichment analysis may not reveal non-obvious relationships among omic data. Here, we applied factor analysis (FA) and Bayesian network (BN) modeling to integrate different omic data and complex traits (production, carcass, and meat quality traits) through latent variables. A total of 14 latent variables were identified: five for phenotype, three for miRNA, four for protein, and two for mRNA data. Pearson correlation coefficients showed negative correlations between latent variables miRNA 1 (mirna1) and miRNA 2 (mirna2) (-0.47), ribeye area (REA) and protein 4 (prot4) (-0.33), REA and protein 2 (prot2) (-0.3), carcass and prot4 (-0.31), carcass and prot2 (-0.28), and backfat thickness (BFT) and miRNA 3 (mirna3) (-0.25). Positive correlations were observed among the four protein factors (0.45-0.83) and between meat quality and fat content (0.71), fat content and carcass (0.74), fat content and REA (0.76), and REA and carcass (0.99). The BN presented arcs from the carcass, meat quality, prot2, and prot4 latent variables to REA; from meat quality, REA, mirna2, and gene expression mRNA1 to fat content; from protein 1 (prot1) and mirna2 to protein 5 (prot5); and from prot5 and carcass to prot2. The relations among the protein latent variables suggest new hypotheses about the impact of these proteins on REA. The network also showed relationships among miRNAs and nebulin proteins. REA seems to be the central node in the network, influencing carcass, prot2, prot4, mRNA1, and meat quality, suggesting that REA is a good indicator of meat quality. The connection among miRNA latent variables, BFT, and fat content relates to the influence of miRNAs on lipid metabolism. The relationship between mirna1 and prot5, which is composed of nebulin isoforms, needs further investigation. The FA identified latent variables, decreasing the dimensionality and complexity of the data. The BN generated interrelationships among latent variables from different types of data, allowing the integration of omics and complex traits and identifying conditional independencies. Our framework based on FA and BN can generate new hypotheses for molecular research by integrating different types of data and exploring non-obvious relationships.
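The difference between a marginal Pearson correlation and a conditional independence, which is what a BN encodes, can be illustrated with a minimal sketch. It is not the authors' pipeline: the three score vectors `f1`, `f2`, and `f3` are simulated stand-ins for latent-variable scores, the chain structure f1 -> f2 -> f3 is an assumption made for illustration, and `partial_correlations` is a hypothetical helper that assumes roughly Gaussian scores.

```python
import numpy as np

def partial_correlations(scores):
    """Partial correlation of every pair of columns given all other columns.

    Uses the inverse covariance (precision) matrix P:
    rho_ij|rest = -P_ij / sqrt(P_ii * P_jj).
    """
    precision = np.linalg.inv(np.cov(scores, rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Simulated scores (assumption): a chain f1 -> f2 -> f3 with unit variances.
rng = np.random.default_rng(0)
n = 5000
f1 = rng.standard_normal(n)
f2 = 0.8 * f1 + 0.6 * rng.standard_normal(n)
f3 = 0.8 * f2 + 0.6 * rng.standard_normal(n)
scores = np.column_stack([f1, f2, f3])

marginal = np.corrcoef(scores, rowvar=False)  # Pearson correlations
partial = partial_correlations(scores)        # correlations given the rest

print("corr(f1, f3)      =", round(marginal[0, 2], 2))
print("corr(f1, f3 | f2) =", round(partial[0, 2], 2))
```

In this toy chain, f1 and f3 are clearly correlated marginally, yet their partial correlation given f2 is close to zero, so a network learned from such data would not need a direct arc between them.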
ABSTRACT
An important criterion to consider in genetic evaluations is the extent of genetic connectedness across management units (MU), especially if they differ in their genetic mean. Reliable comparisons of genetic values across MU depend on the degree of connectedness: the higher the connectedness, the more reliable the comparison. Traditionally, genetic connectedness was calculated through pedigree-based methods; however, in the era of genomic selection, it can be better estimated with genomic approaches. Most procedures consider only additive genetic effects, which may not accurately reflect the underlying gene action of the evaluated trait, and little is known about the impact of non-additive gene action on connectedness measures. The objective of this study was to investigate, for the first time in Brazilian field data, the extent of genomic connectedness by applying additive and non-additive relationship matrices to a fatty acid profile data set from seven farms located in three regions of Brazil and belonging to three breeding programs. Myristic acid (C14:0) was used because of its importance for human health and the reported presence of non-additive gene action. The pedigree included 427,740 animals, of which 925 were genotyped using the Bovine high-density genotyping chip. Six relationship matrices were constructed, parametrically and non-parametrically capturing additive and non-additive genetic effects from both pedigree and genomic data. We assessed genome-based connectedness across MU using the prediction error variance of difference (PEVD) and the coefficient of determination (CD). PEVD values ranged from 0.540 to 1.707, and CD from 0.146 to 0.456. Genomic information consistently enhanced the measures of connectedness compared to the numerator relationship matrix by at least 63%. Combining additive and non-additive genomic kernel relationship matrices or using a non-parametric relationship matrix increased the capture of connectedness. Overall, the Gaussian kernel yielded the largest measure of connectedness. Our findings showed that connectedness metrics can be extended to incorporate genomic information and non-additive genetic variation using field data. We propose that different genomic relationship matrices can be designed to capture additive and non-additive genetic effects, increase the measures of connectedness, and more accurately estimate the true state of connectedness in herds.
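As a rough illustration of how PEVD and CD between two management units can be obtained from a relationship matrix, the sketch below uses the standard mixed-model-equation definitions with a Gaussian kernel built from simulated genotypes. It is not the authors' implementation: the genotypes, the bandwidth `theta`, the variance ratio `lam`, the small ridge added to the kernel, and the function names `gaussian_kernel` and `connectedness` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(M, theta=1.0):
    """Gaussian kernel exp(-theta * d2), with squared Euclidean distances
    between genotype rows scaled by their mean (assumed scaling)."""
    sq = np.sum(M ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * M @ M.T, 0.0)
    return np.exp(-theta * d2 / d2.mean())

def connectedness(K, units, a, b, lam=1.0, sigma2_e=1.0):
    """PEVD and CD for the contrast between the mean genetic values of
    units a and b, with management unit fitted as a fixed effect.

    lam is the assumed ratio of residual to genetic variance.
    """
    n = K.shape[0]
    K = K + 1e-6 * np.eye(n)                 # small ridge for stability
    labels = np.unique(units)
    X = (units[:, None] == labels[None, :]).astype(float)
    Z = np.eye(n)                            # one record per animal
    # Coefficient matrix of the mixed model equations
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(K)]])
    C22 = np.linalg.inv(lhs)[X.shape[1]:, X.shape[1]:]
    # Contrast: average of unit a minus average of unit b
    x = (units == a) / np.sum(units == a) - (units == b) / np.sum(units == b)
    pevd = float(x @ C22 @ x) * sigma2_e
    cd = 1.0 - lam * float(x @ C22 @ x) / float(x @ K @ x)
    return pevd, cd

# Simulated data (assumption): 30 animals, 200 SNPs coded 0/1/2, 3 units.
rng = np.random.default_rng(1)
M = rng.integers(0, 3, size=(30, 200)).astype(float)
units = np.repeat([0, 1, 2], 10)
K = gaussian_kernel(M)
pevd, cd = connectedness(K, units, a=0, b=1)
print("PEVD:", pevd, "CD:", cd)
```

A lower PEVD and a higher CD both indicate that the two units are better connected. Swapping `K` for a pedigree-based or additive genomic matrix in the same function would give a like-for-like comparison of the kind described above.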