Results 1 - 17 of 17
1.
Nat Commun ; 15(1): 2546, 2024 Mar 21.
Article in English | MEDLINE | ID: mdl-38514647

ABSTRACT

Influenza virus continuously evolves to escape human adaptive immunity and generates seasonal epidemics. Therefore, influenza vaccine strains need to be updated annually for the upcoming flu season to ensure vaccine effectiveness. We develop a computational approach, beth-1, to forecast virus evolution and select a representative virus for the influenza vaccine. The method involves modelling site-wise mutation fitness. Informed by virus genomes and population sero-positivity, we calibrate the transition time of mutations and project the fitness landscape forward in time, based on which beth-1 selects the optimal vaccine strain. In season-to-season predictions on historical data for the influenza A pH1N1 and H3N2 viruses, beth-1 demonstrates superior genetic matching compared with existing approaches. In prospective validations, the model shows superior or non-inferior genetic matching and neutralization against circulating viruses in mouse immunization experiments, compared with the current vaccine. The method offers a promising, ready-to-use tool to facilitate vaccine strain selection for the influenza virus by capturing heterogeneous evolutionary dynamics over genome space-time and linking molecular variants to the population immune response.


Subjects
Influenza Vaccines , Influenza, Human , Humans , Animals , Mice , Influenza Vaccines/genetics , Influenza A Virus, H3N2 Subtype/genetics , Hemagglutinin Glycoproteins, Influenza Virus , Influenza, Human/epidemiology , Influenza, Human/prevention & control , Mutation , Seasons
2.
Infect Dis Model ; 8(1): 107-121, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36632179

ABSTRACT

Virus evolution is a common process of pathogen adaptation to the host population and environment. A small but important fraction of virus mutations is frequently reported to increase the risk of host infection, which is one of the major determinants of infectious disease outbreaks at the population scale. Key mutations that confer a transmission advantage on a genetic variant often grow and reach fixation rapidly. Based on classic epidemiological theories of disease transmission, we proposed a mechanistic explanation of how a between-host transmission advantage may shape the observed logistic curve of the mutation proportion in the population. The logistic growth of mutations is further generalized by incorporating time-varying selective pressure to account for the impact of external factors on pathogen adaptiveness. The proposed model is applied to real-world COVID-19 data to capture the emerging trends and changing dynamics of the B.1.1.7 strains of SARS-CoV-2 in England. The model characterizes and establishes the underlying theoretical mechanism that shapes the logistic growth of mutations in a population.
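The logistic growth described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the standard logistic form dp/dt = s(t) p (1 - p) for the mutation proportion p under a selective advantage s(t). Because the log-odds of p then grows at rate s(t), a time-varying selective pressure simply replaces s*t with the integral of s(u) over [0, t]. The function names, the trapezoidal integration and the example parameters are hypothetical.

```python
import math
from typing import Callable


def logistic_proportion(p0: float, s: float, t: float) -> float:
    """Mutation proportion at time t under a constant selective advantage s.

    Closed-form solution of dp/dt = s * p * (1 - p) with p(0) = p0:
    the odds p / (1 - p) grow by a factor exp(s * t).
    """
    odds = p0 / (1.0 - p0) * math.exp(s * t)
    return odds / (1.0 + odds)


def logistic_proportion_varying(p0: float, s: Callable[[float], float],
                                t: float, steps: int = 1000) -> float:
    """Mutation proportion when the selective pressure s(u) changes over time.

    The log-odds increases by the integral of s(u) over [0, t], approximated
    here with the trapezoidal rule on `steps` equal intervals.
    """
    h = t / steps
    integral = 0.5 * (s(0.0) + s(t)) * h
    integral += h * sum(s(i * h) for i in range(1, steps))
    logit = math.log(p0 / (1.0 - p0)) + integral
    return 1.0 / (1.0 + math.exp(-logit))


# A variant at 1% frequency with a constant advantage of 0.1 per day, after 30 days.
print(logistic_proportion(0.01, 0.1, 30.0))
# The same variant whose advantage halves after day 15 (hypothetical scenario).
print(logistic_proportion_varying(0.01, lambda u: 0.1 if u < 15 else 0.05, 30.0))
```

For example, logistic_proportion(0.5, math.log(3), 1.0) is approximately 0.75, because the odds grow from 1 to 3.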

3.
Hum Mutat ; 38(9): 1235-1239, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28419606

ABSTRACT

Genetic data consist of a wide range of marker types, including common, low-frequency, and rare variants. Multiple genetic markers and their interactions play central roles in the heritability of complex diseases. In this study, we propose an algorithm that uses a variable selection design stratified by genetic architecture and interaction effects, achieved by a dataset-adaptive W-test. The polygenic sets from all strata were integrated to form a classification rule. The algorithm was applied to the Critical Assessment of Genome Interpretation 4 bipolar challenge sequencing data. The prediction accuracy on an independent test set was 60% using genetic markers. We found that epistasis among common genetic variants contributed most substantially to prediction precision. However, the sample size was not large enough to draw conclusions about the lack of predictability of low-frequency variants and their epistasis.


Subjects
Bipolar Disorder/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Algorithms , Epistasis, Genetic , Genetic Predisposition to Disease , Humans , Models, Genetic
4.
BMC Proc ; 10(Suppl 7): 153-157, 2016.
Article in English | MEDLINE | ID: mdl-27980628

ABSTRACT

With the development of next-generation sequencing technology, the influence of rare variants on complex disease has attracted increasing attention. In this paper, we propose a clustering-based approach, the clustering sum test, to test rare-variant associations, using the simulated data provided by Genetic Analysis Workshop 19, which has an unbalanced case-control ratio. In this approach, (a) the control individuals are clustered into several subgroups, (b) a statistic comparing each control subgroup with the case group is calculated, and (c) a combined statistic is obtained based on a distance score. Collapsing of rare variants is used together with the proposed method. Comparing the same statistical test with and without clustering, the clustering strategy increases the number of true positives identified among the top 100 markers by 17.24%. Compared with the sequence kernel association test, the proposed method is more robust in terms of replication frequencies across the replicate data sets. The results suggest that the clustering approach could improve the power of nonparametric tests and that the clustering sum test has the potential to serve as a practical tool for analyzing rare variants with unbalanced case-control data in genome-wide case-control studies.

6.
Genet Epidemiol ; 40(7): 591-596, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27531462

ABSTRACT

Advances in sequencing technology enable the study of association between complex disorder phenotypes and single-nucleotide polymorphisms with rare mutations. However, rare genetic variants have extremely small variance, which impairs the testing power of traditional statistical methods. We introduce a W-test collapsing method to evaluate rare-variant association by measuring the distributional differences between cases and controls through a combined log of odds ratio within a genomic region. The method is model-free, inherits a chi-squared distribution with degrees of freedom estimated from bootstrapped samples of the data, and allows fast and accurate P-value calculation without the need for permutations. The proposed method was compared with the Weighted-Sum Statistic and the Sequence Kernel Association Test on simulated datasets and showed good performance and significantly faster computing speed. Applied to a real next-generation sequencing dataset of hypertensive disorder, it identified genes with interesting biological functions related to metabolic disorder and inflammation, including MACROD1, NLRP7, AGK, PAK6, and APBB1. The proposed method offers an efficient and effective way to test rare genetic variants in whole-exome sequencing datasets.
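A W-test-style statistic of the kind described in this abstract can be sketched as follows. This is a hedged illustration, not the published W-test: it assumes each genomic region has already been collapsed into a small number of genotype cells, sums the squared standardized log odds ratios (cell frequency in cases vs. controls) over those cells, and assumes the bootstrap step fits a scaled chi-squared h * chi2(f) by matching the mean and variance of statistics computed on bootstrapped null samples. The 0.5 pseudocount, the moment-matching rule and all function names are assumptions.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def w_statistic(case_counts: Sequence[int], control_counts: Sequence[int]) -> float:
    """Combined squared, standardized log odds ratio over genotype cells.

    case_counts[i] and control_counts[i] are the numbers of cases and controls
    falling in collapsed genotype cell i of one genomic region.
    """
    n1, n0 = sum(case_counts), sum(control_counts)
    w = 0.0
    for c1, c0 in zip(case_counts, control_counts):
        # Cell frequencies with a 0.5 pseudocount (assumption) so empty cells stay finite.
        p1 = (c1 + 0.5) / (n1 + 1.0)
        p0 = (c0 + 0.5) / (n0 + 1.0)
        log_or = math.log(p1 / (1.0 - p1)) - math.log(p0 / (1.0 - p0))
        # Delta-method variance of a log-odds estimate is 1 / (n * p * (1 - p)).
        var = 1.0 / (n1 * p1 * (1.0 - p1)) + 1.0 / (n0 * p0 * (1.0 - p0))
        w += log_or ** 2 / var
    return w


def null_w_values(cells: Sequence[int], is_case: Sequence[bool], n_cells: int,
                  reps: int = 200, seed: int = 0) -> List[float]:
    """W values on bootstrap samples drawn from the pooled subjects (no true association).

    cells[s] is the genotype-cell index (0 .. n_cells - 1) of subject s. Each replicate
    resamples subjects with replacement and splits them into pseudo-cases and
    pseudo-controls with the original group sizes.
    """
    rng = random.Random(seed)
    n1 = sum(is_case)
    values = []
    for _ in range(reps):
        sample = rng.choices(cells, k=len(cells))
        case_counts = [0] * n_cells
        control_counts = [0] * n_cells
        for position, cell in enumerate(sample):
            if position < n1:
                case_counts[cell] += 1
            else:
                control_counts[cell] += 1
        values.append(w_statistic(case_counts, control_counts))
    return values


def scaled_chi2_params(null_stats: Sequence[float]) -> Tuple[float, float]:
    """Fit W ~ h * chi2(f) by matching the mean and variance of null W values.

    For h * chi2(f), mean = h * f and variance = 2 * h**2 * f, which gives
    h = variance / (2 * mean) and f = 2 * mean**2 / variance.
    """
    m = statistics.mean(null_stats)
    v = statistics.variance(null_stats)
    return v / (2.0 * m), 2.0 * m * m / v
```

Given fitted h and f, a P-value for an observed W could then be read from the chi-squared tail, for example with SciPy as scipy.stats.chi2.sf(w / h, f), without running permutations for every test.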


Subjects
Models, Genetic , Adaptor Proteins, Signal Transducing/genetics , Carboxylic Ester Hydrolases , Genetic Association Studies , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Hypertension/genetics , Hypertension/pathology , Neoplasm Proteins/genetics , Phosphotransferases (Alcohol Group Acceptor)/genetics , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
7.
Nucleic Acids Res ; 44(12): e115, 2016 Jul 8.
Article in English | MEDLINE | ID: mdl-27112568

ABSTRACT

Epistasis plays an essential role in the development of complex diseases. Interaction methods face the common challenge of balancing persistent power, model complexity, computational efficiency, and the validity of identified biomarkers. We introduce a novel W-test to identify pairwise epistasis effects, which measures the distributional difference between cases and controls through a combined log odds ratio. The test is model-free, fast, and inherits a chi-squared distribution with data-adaptive degrees of freedom. No permutation is needed to obtain the P-values. Simulation studies demonstrated that the W-test is more powerful for low-frequency variants than alternative methods, namely the chi-squared test, logistic regression, and multifactor dimensionality reduction (MDR). In two independent real bipolar disorder genome-wide association study (GWAS) datasets, the W-test identified significant interaction pairs that can be replicated, including SLIT3-CENPN, SLIT3-TMEM132D, CNTNAP2-NDST4 and CNTCAP2-RTN4R. The genes in these pairs play central roles in neurotransmission and synapse formation. A majority of the identified loci are undiscoverable by main-effect analysis and are low-frequency variants. The proposed method offers a powerful alternative tool for mapping the genetic puzzle underlying complex disorders.


Subjects
Epistasis, Genetic , Models, Genetic , Polymorphism, Single Nucleotide/genetics , Algorithms , Computer Simulation , Humans , Logistic Models , Multifactor Dimensionality Reduction
8.
BMC Proc ; 8(Suppl 1): S47, 2014.
Article in English | MEDLINE | ID: mdl-25519328

ABSTRACT

Current sequencing technology enables the generation of whole-genome sequencing data sets that contain a high density of rare variants, each of which is carried by at most 5% of the sampled subjects. Such variants are involved in the etiology of most common diseases in humans. These diseases can be studied through relevant longitudinal phenotypic traits. Tests for association between such genotype information and longitudinal traits allow the study of the function of rare variants in complex human disorders. In this paper, we propose an association-screening framework that highlights both the genotypic differences observed at rare variants and the longitudinal nature of phenotypes. In particular, both the variants within a gene and the longitudinal phenotypes are used to create partitions of subjects. Association between the 2 sets of constructed partitions is then evaluated. We apply the proposed strategy to the simulated data from Genetic Analysis Workshop 18 and compare the results with those from the sequence kernel association test using receiver operating characteristic curves.
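The abstract does not say which statistic evaluates the association between the two sets of partitions. As an assumption for illustration only, the sketch below uses Pearson's chi-square on the cross-tabulation of two labelings of the same subjects, for example a partition by rare-variant genotype pattern within a gene and a hypothetical partition by longitudinal phenotype trajectory. The function name and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def partition_chi2(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Pearson chi-square for the cross-tabulation of two partitions of the same subjects."""
    n = len(labels_a)
    rows, cols = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))  # missing cells count as 0
    chi2 = 0.0
    for a, n_a in rows.items():
        for b, n_b in cols.items():
            expected = n_a * n_b / n  # cell count expected under independence
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    return chi2


# Hypothetical labels: genotype pattern within a gene vs. phenotype-trajectory cluster.
genotype_part = [(0, 1), (0, 1), (1, 0), (1, 0), (0, 0), (0, 0)]
trajectory_part = ["rising", "rising", "flat", "flat", "flat", "rising"]
print(partition_chi2(genotype_part, trajectory_part))
```

Two partitions that coincide give a large statistic, while partitions that split each other evenly give 0; with many sparse cells, a permutation reference distribution would be safer than the asymptotic chi-square.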

10.
BMC Proc ; 8(Suppl 1): S60, 2014.
Article in English | MEDLINE | ID: mdl-25519395

ABSTRACT

It is believed that almost all common diseases result from complex interactions between genetic markers and environmental factors. However, few such interactions have been documented to date. Conventional statistical methods for detecting gene-environment interactions are often based on the linear regression model, which assumes a linear interaction effect. In this study, we propose a nonparametric partition-based approach that is able to capture complex interaction patterns. We apply this method to the real hypertension data set provided by Genetic Analysis Workshop 18. Compared with the linear regression model, the proposed approach identifies many additional variants with significant gene-environment interaction effects. We further investigate one single-nucleotide polymorphism identified by our method and show that its gene-environment interaction effect is, indeed, nonlinear. To adjust for the family dependence of phenotypes, we apply different permutation strategies and investigate their effects on the outcomes.

11.
BMC Proc ; 8(Suppl 1): S62, 2014.
Article in English | MEDLINE | ID: mdl-25519396

ABSTRACT

Environment has long been known to play an important part in disease etiology. However, few genome-wide association studies take environmental factors into consideration, and new methods are needed to identify gene-environment interactions. In this study, we propose a 2-step approach incorporating an influence measure that captures the pure gene-environment effect. For systolic blood pressure, we found that the pure gene-age interaction shows a stronger association than the genetic effect alone, as measured by the number of single-nucleotide polymorphisms (SNPs) reaching a given significance level. When we divided the subjects into two age groups, we found no overlap in the top identified SNPs between them, suggesting that age might have a nonlinear effect on genetic association. Furthermore, for systolic blood pressure, the scores of the top SNPs in the two age subgroups were about 3 times those obtained using all subjects. In addition, the scores in the older age subgroup were much higher than those in the younger group. The results suggest that genetic effects are stronger at older ages and that genetic association studies should take environmental effects, especially age, into consideration.

12.
BMC Proc ; 8(Suppl 1): S7, 2014.
Article in English | MEDLINE | ID: mdl-25519400

ABSTRACT

In this study, we analyze the Genetic Analysis Workshop 18 (GAW18) data to identify regions of single-nucleotide polymorphisms (SNPs) that significantly influence hypertension status. We have previously studied the marginal impact of these regions on disease status; here we extend the method to handle environmental factors present in data collected over several exam periods. We consider the interactions of traits such as smoking status and age with the genetic information, aiming to augment the regions deemed marginally influential with those that contribute through an interaction effect. In particular, we focus only on rare variants and apply a procedure that combines signal among rare variants within a number of "fixed bins" along the chromosome. We extend the procedure in Agne et al [1] to incorporate environmental factors by dichotomizing subjects by traits such as smoking status and age, running the marginal procedure within each category (i.e., smokers or nonsmokers), and then combining the resulting scores into an interaction score. To avoid overlap of subjects, we examine each exam period individually. Of 629 possible fixed-bin regions on chromosome 3, 11 appear in multiple exam periods for the gene-smoking score. Fifteen regions are significant in multiple exam periods for the gene-age score, with 4 regions deemed significant in all 3 exam periods. The procedure pinpoints SNPs in 8 "answer" genes, 5 of which are significant under multiple testing schemes (Gene-Smoking, Gene-Age for Exams 1, 2, and 3).

13.
Bioinformatics ; 28(21): 2834-42, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-22945786

ABSTRACT

MOTIVATION: Epistasis, or gene-gene interaction, has gained increasing attention in studies of complex diseases. Its presence as a ubiquitous component of the genetic architecture of common human diseases has been contemplated. However, detecting gene-gene interaction is difficult because of combinatorial explosion. RESULTS: We present a novel feature selection method incorporating variable interaction. Three gene expression datasets are analyzed to illustrate our method, although it can also be applied to other types of high-dimensional data. The quality of the selected variables is evaluated in two ways: first by classification error rates, then by functional relevance assessed using biological knowledge. We show that classification error rates can be significantly reduced by considering interactions. Second, a sizable portion of the genes identified by our method for breast cancer metastasis overlaps with those reported as disease-associated in the gene-to-system breast cancer (G2SBC) database, and some of them have interesting biological implications. In summary, interaction-based methods may lead to substantial gains in biological insight as well as more accurate prediction.


Subjects
Algorithms , Breast Neoplasms/classification , Breast Neoplasms/genetics , Gene Expression Profiling/methods , Genetic Testing/methods , Models, Statistical , Neoplasm Recurrence, Local/genetics , Epistasis, Genetic , Female , Gene Expression Regulation, Neoplastic , Humans , Logistic Models , Multigene Family , Neoplasm Recurrence, Local/classification
14.
Genet Epidemiol ; 35 Suppl 1: S56-60, 2011.
Article in English | MEDLINE | ID: mdl-22128060

ABSTRACT

As part of Genetic Analysis Workshop 17 (GAW17), our group considered the application of novel and standard approaches to the analysis of genotype-phenotype association in next-generation sequencing data. Our group identified a major issue in the analysis of the GAW17 next-generation sequencing data: type I error and false-positive report probability rates higher than expected, with empirical type I error levels as high as 90%. Two main causes emerged: population stratification and long-range correlation (gametic phase disequilibrium) between rare variants. Population stratification was expected because of the diverse sample. Correlation between rare variants was attributable to both random causes (e.g., nearly 10,000 of 25,000 markers were private variants, and the sample size was small [n = 697]) and nonrandom causes (more correlation was observed than was expected by random chance). Principal components analysis was used to control for population structure and helped to minimize type I errors, but at the expense of identifying fewer causal variants. A novel multiple regression approach showed promise in handling correlation between markers. Further work is needed, first, to identify best practices for the control of type I errors in the analysis of sequencing data and then to explore and compare the many promising new aggregating approaches for identifying markers associated with disease phenotypes.


Subjects
Genetic Markers , Molecular Epidemiology/methods , Genetic Association Studies , Genotype , Human Genome Project , Humans , Phenotype , Polymorphism, Single Nucleotide , Principal Component Analysis , Regression Analysis , Sensitivity and Specificity , Sequence Analysis
15.
BMC Proc ; 5 Suppl 9: S3, 2011 Nov 29.
Article in English | MEDLINE | ID: mdl-22373412

ABSTRACT

In this study, we analyze the Genetic Analysis Workshop 17 data to identify regions of single-nucleotide polymorphisms (SNPs) that exhibit a significant influence on the response rate (proportion of subjects with an affirmative affected status), called the affected ratio, among rare variants. Under the null hypothesis, the distribution of rare variants is assumed to be uniform over case (affected) and control (unaffected) subjects. We attempt to pinpoint regions where the composition differs significantly between cases and controls, specifically where there are unusually high numbers of rare variants among affected subjects. We focus on private variants, which require a degree of "collapsing" to combine information over several SNPs to obtain meaningful results. Instead of a gene-based approach, where regions would vary in size and sometimes be too small to achieve a strong enough signal, we implement a fixed-bin approach with a preset number of SNPs per region, relying on the assumption that proximity and similarity go hand in hand. Using 100-SNP and 30-SNP fixed bins, we identify several of the most influential regions, which are later seen to contain some of the causal SNPs. The 100- and 30-SNP approaches detected seven and three causal SNPs, respectively, among the most significant regions, with two overlapping SNPs located in the ELAVL4 gene reported by both procedures.
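The fixed-bin idea in this abstract can be sketched as follows, assuming a subject-by-SNP matrix of rare-allele counts in chromosomal order. Consecutive SNPs are grouped into bins of a preset size (100 or 30 in the abstract), and each bin's affected ratio is taken here as the share of its rare alleles carried by affected subjects. Under the uniform null stated above, that share should be close to the overall proportion of cases. The binomial z-score used to flag unusually high bins, and the function name, are assumptions rather than the authors' scoring rule.

```python
import math
from typing import List, Sequence, Tuple


def fixed_bin_scores(allele_counts: Sequence[Sequence[int]], affected: Sequence[bool],
                     bin_size: int) -> List[Tuple[int, float, float]]:
    """Affected ratio and z-score for consecutive bins of `bin_size` SNPs.

    allele_counts[s][j] is the number of rare alleles subject s carries at SNP j,
    with SNPs in chromosomal order. Returns (first SNP index, affected ratio, z).
    """
    n_snps = len(allele_counts[0])
    null_ratio = sum(affected) / len(affected)  # expected share under a uniform null
    results = []
    for start in range(0, n_snps, bin_size):
        stop = min(start + bin_size, n_snps)
        case_alleles = total_alleles = 0
        for row, is_case in zip(allele_counts, affected):
            count = sum(row[start:stop])
            total_alleles += count
            if is_case:
                case_alleles += count
        if total_alleles == 0:
            results.append((start, float("nan"), 0.0))  # no rare alleles in this bin
            continue
        ratio = case_alleles / total_alleles
        # Binomial approximation (assumption): alleles treated as independent draws.
        se = math.sqrt(null_ratio * (1.0 - null_ratio) / total_alleles)
        results.append((start, ratio, (ratio - null_ratio) / se))
    return results
```

Bins with the largest positive z-scores correspond to regions where rare variants are concentrated among affected subjects; running the function with bin_size=100 and bin_size=30 mirrors the two bin sizes compared in the abstract.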

16.
BMC Proc ; 5 Suppl 9: S50, 2011 Nov 29.
Article in English | MEDLINE | ID: mdl-22373518

ABSTRACT

Advances in high-throughput next-generation sequencing technology make the analysis of rare variants possible. However, investigating rare variants in data sets of unrelated individuals faces the challenge of low power, and most methods circumvent this difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers by their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with larger absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and is thus a promising method for future studies of rare variants.
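The F-statistic ranking step in this abstract can be sketched as below. For each marker, a one-way ANOVA compares a quantitative phenotype across genotype groups, and markers are sorted by decreasing F. This assumes integer-coded genotypes and a quantitative trait; the function names are hypothetical, and the subsequent sliced inverse regression step, which needs an eigen-decomposition of the sliced covariance, is not reproduced here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def anova_f(genotypes: Sequence[int], phenotype: Sequence[float]) -> float:
    """One-way ANOVA F statistic of the phenotype across genotype groups."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for g, y in zip(genotypes, phenotype):
        groups[g].append(y)
    n, k = len(phenotype), len(groups)
    if k < 2 or n <= k:
        return 0.0  # monomorphic marker or too few subjects: no usable test
    grand = sum(phenotype) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ssb = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ssw = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    if ssw == 0.0:
        return math.inf if ssb > 0.0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def rank_markers(markers: Dict[str, Sequence[int]], phenotype: Sequence[float]) -> List[str]:
    """Marker names sorted by decreasing F statistic (the pre-screening step)."""
    return sorted(markers, key=lambda m: anova_f(markers[m], phenotype), reverse=True)
```

Only the top of this ranking would be passed on to sliced inverse regression, which keeps that second, more expensive step small.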

17.
BMC Proc ; 5 Suppl 9: S106, 2011 Nov 29.
Article in English | MEDLINE | ID: mdl-22373536

ABSTRACT

Both common and rare variants are involved in the etiology of most complex diseases in humans. Developments in sequencing technology have led to the identification of a high density of rare-variant single-nucleotide polymorphisms (SNPs) across the genome, each carried by at most 1% of the population. Genotypes derived from these SNPs allow one to study the involvement of rare variants in common human disorders. Here, we propose an association screening approach that treats genes as units of analysis. SNPs within a gene are used to create partitions of individuals, and inverse-probability weighting is used to overweight genotypic differences observed at rare variants. Association between a phenotypic trait and the constructed partition is then evaluated. We consider three association tests (one-way ANOVA, the chi-square test, and the partition retention method) and compare these strategies using the simulated data from Genetic Analysis Workshop 17. The proposed method identified several genes that contain causal SNPs as top genes.
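The partition step in this abstract can be sketched as follows. Subjects are grouped by their genotype pattern across the SNPs of one gene, and the association between the phenotype and this partition is scored with an influence measure of the form I = sum_j n_j^2 (mean_j - mean)^2, where n_j and mean_j are the size and mean phenotype of cell j and mean is the overall phenotype mean. That formula is assumed here as the score underlying partition-based screening; the inverse-probability weighting of rare variants described above is omitted, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, ...]


def genotype_partition(genotypes_by_subject: Sequence[Sequence[int]]) -> List[Cell]:
    """Assign each subject to the cell given by its genotype pattern across a gene's SNPs."""
    return [tuple(g) for g in genotypes_by_subject]


def influence_score(cells: Sequence[Cell], phenotype: Sequence[float]) -> float:
    """I = sum over cells j of n_j**2 * (mean_j - overall mean)**2."""
    totals: Dict[Cell, float] = defaultdict(float)
    sizes: Dict[Cell, int] = defaultdict(int)
    for cell, y in zip(cells, phenotype):
        totals[cell] += y
        sizes[cell] += 1
    grand = sum(phenotype) / len(phenotype)
    return sum(sizes[c] ** 2 * (totals[c] / sizes[c] - grand) ** 2 for c in sizes)


# Two SNPs, four subjects: the two genotype patterns separate low and high phenotypes.
print(influence_score(genotype_partition([[0, 1], [0, 1], [1, 0], [1, 0]]), [1.0, 1.0, 3.0, 3.0]))  # 8.0
```

A larger score means the gene's genotype patterns separate the phenotype means more strongly; a phenotype that is constant across cells scores 0.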
