Results 1 - 18 of 18
1.
Biometrics ; 79(1): 178-189, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34608993

ABSTRACT

In this paper, we propose a frequentist model averaging method for quantile regression with high-dimensional covariates. Although research on these subjects has proliferated as separate approaches, no study has considered them in conjunction. The first step of our method reduces the covariate dimension by ranking the covariates according to their marginal quantile utilities. The second step implements model averaging on the models containing the covariates that survive the screening of the first step. We use a delete-one cross-validation method to select the model weights, and prove that the resultant estimator possesses an optimal asymptotic property uniformly over quantile indices in any compact subset of (0,1). Our proof, which relies on empirical process theory, is arguably more challenging than proofs of similar results in other contexts owing to the high-dimensional nature of the problem and our relaxation of the conventional assumption that the weights sum to one. Our investigation of finite-sample performance demonstrates that the proposed method exhibits very favorable properties compared to the least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD) penalized regression methods. The method is applied to a microarray gene expression data set.


Subjects
Research Design; Humans; Computer Simulation; Regression Analysis
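
The quantile regression in the abstract above is built on the check (pinball) loss. The following is a minimal sketch of that loss only, not of the authors' screening or weight-selection procedure; the function names `check_loss`, `empirical_risk`, and `best_constant` are hypothetical and chosen for illustration. It shows that minimizing the average check loss over a constant recovers a sample quantile.

```python
def check_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def empirical_risk(y, q, tau):
    """Average check loss of predicting the constant q for every observation."""
    return sum(check_loss(yi - q, tau) for yi in y) / len(y)


def best_constant(y, tau):
    """Constant minimizing the empirical check loss.

    The empirical loss is piecewise linear in q, so a minimizer is
    attained at one of the observed values; we search only those.
    """
    return min(sorted(y), key=lambda q: empirical_risk(y, q, tau))


y = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(y, 0.5)   # robust to the outlier 100.0
upper_fit = best_constant(y, 0.9)    # an upper quantile of the sample
```

With `tau = 0.5` the loss is half the absolute error, so the fitted constant is the sample median rather than the mean.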
2.
Econom J ; 24(1): 177-197, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33746562

ABSTRACT

In this paper, we develop a model averaging method to estimate a high-dimensional covariance matrix, where the candidate models are constructed by different orders of polynomial functions. We propose a Mallows-type model averaging criterion and select the weights by minimizing this criterion, which is an unbiased estimator of the expected in-sample squared error plus a constant. Then, we prove the asymptotic optimality of the resulting model average covariance estimators. Finally, we conduct numerical simulations and a case study on Chinese airport network structure data to demonstrate the usefulness of the proposed approaches.

3.
J Am Stat Assoc ; 115(530): 972-984, 2020.
Article in English | MEDLINE | ID: mdl-34168389

ABSTRACT

Model averaging generally provides better predictions than model selection, but the existing model averaging methods cannot lead to parsimonious models. Parsimony is an especially important property when the number of parameters is large. To achieve a parsimonious model averaging coefficient estimator, we suggest a novel criterion for choosing weights. Asymptotic properties are derived in two practical scenarios: (i) one or more correct models exist in the candidate model set and (ii) all candidate models are misspecified. Under the former scenario, it is proved that our method can assign weight one to the smallest correct model, so that the resulting model averaging estimators of the coefficients have many zeros and thus lead to a parsimonious model. The asymptotic distribution of the estimators is also provided. Under the latter scenario, the focus is mainly on prediction, and we prove that the proposed procedure is asymptotically optimal in the sense that its squared prediction loss and risk are asymptotically identical to those of the best but infeasible model averaging estimator. Numerical analysis shows the promise of the proposed procedure over existing model averaging and selection methods.

4.
Trop Doct ; 47(2): 165-167, 2017 Apr.
Article in English | MEDLINE | ID: mdl-27079490

ABSTRACT

Brucellosis is a common zoonotic infection worldwide and a major public health problem in developing countries including China. The aim of our study was to investigate the seroprevalence of Brucella infection in humans in Yixing, located at the centre of the Yangtze River Delta Urban Agglomeration. A total of 895 sera from apparently healthy abattoir workers and 3303 sera from healthy members of the general population living in rural areas were collected in Yixing and screened by the Rose-Bengal plate agglutination test (RBPT); the positives were confirmed by the standard tube agglutination test (SAT) according to official Chinese diagnostic criteria. Seropositivity among abattoir workers was 16.42% compared to zero among the general population living in rural areas. No significant difference in seropositivity was observed among age groups. Contact with or inhalation of Brucella organisms from infected animals, principally goats, was found to be a significant risk factor. Education in occupational hygiene and public healthcare programmes are needed to control this emerging problem.


Subjects
Brucellosis/epidemiology; Adolescent; Adult; Age Distribution; Aged; Aged, 80 and over; Agglutination Tests; Animals; Antibodies, Bacterial/blood; Brucella; Brucellosis/diagnosis; China/epidemiology; Female; Humans; Male; Middle Aged; Rose Bengal; Seroepidemiologic Studies; Sex Distribution; Young Adult
5.
Stat Sin ; 25: 1583-1598, 2015.
Article in English | MEDLINE | ID: mdl-27761098

ABSTRACT

This paper proposes a model averaging method based on Kullback-Leibler distance under a homoscedastic normal error term. The resulting model average estimator is proved to be asymptotically optimal. When combining least squares estimators, the model average estimator is shown to have the same large sample properties as the Mallows model average (MMA) estimator developed by Hansen (2007). We show via simulations that, in terms of mean squared prediction error and mean squared parameter estimation error, the proposed model average estimator is more efficient than the MMA estimator and the estimator based on model selection using the corrected Akaike information criterion in small sample situations. A modified version of the new model average estimator is further suggested for the case of heteroscedastic random errors. The method is applied to a data set from the Hong Kong real estate market.

7.
Sankhya Ser A ; 71(1): 73-93, 2009.
Article in English | MEDLINE | ID: mdl-21191551

ABSTRACT

This paper is concerned with a generalized growth curve model. We derive the unbiased invariant least squares estimators of linear functions of the variance-covariance matrix of the disturbances. Under the minimum variance criterion, we obtain necessary and sufficient conditions for the proposed estimators to be optimal. Simulation studies show that the proposed estimators perform well.

8.
Biometrika ; 95(3): 773-778, 2008.
Article in English | MEDLINE | ID: mdl-19122890

ABSTRACT

The conventional model selection criterion AIC has been applied to choose among candidate mixed-effects models on the basis of the marginal likelihood. Vaida and Blanchard (2005) demonstrated that such a marginal AIC and its small-sample correction are inappropriate when the research focus is on clusters. Correspondingly, these authors suggested using a conditional AIC. Their conditional AIC is derived under the assumption that the variance-covariance matrix or scaled variance-covariance matrix of the random effects is known. We develop a general conditional AIC without these strong assumptions. This allows Vaida and Blanchard's conditional AIC to be applied in a wider range of settings. Simulation studies show that the proposed method is promising.

9.
Comput Stat Data Anal ; 52(5): 2538-2548, 2008 Jan 20.
Article in English | MEDLINE | ID: mdl-19158943

ABSTRACT

In survival analysis, it is of interest to appropriately select significant predictors. In this paper, we extend the AIC(C) selection procedure of Hurvich and Tsai to survival models to improve on the traditional AIC for small sample sizes. A theoretical verification under a special case of the exponential distribution is provided. Simulation studies illustrate that the proposed method substantially outperforms its counterpart, AIC, in small samples, and is competitive with it in moderate and large samples. Two real data sets are also analyzed.
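
For orientation, the small-sample correction usually quoted for AIC(C) in the linear regression setting adds a penalty term to AIC that vanishes as the sample size grows. The sketch below uses that commonly cited form; the survival-model version proposed in the abstract above may differ, and the function names `aic` and `aicc` are hypothetical.

```python
def aic(log_likelihood, k):
    """Akaike information criterion for a model with k estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k


def aicc(log_likelihood, k, n):
    """AIC with the commonly cited small-sample correction term.

    AICc = AIC + 2k(k + 1) / (n - k - 1), defined only when n > k + 1.
    This is the generic textbook form, assumed here for illustration.
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)


small_sample = aicc(-10.0, 3, 20)      # correction is noticeable
large_sample = aicc(-10.0, 3, 20000)   # correction is nearly zero
```

The extra term penalizes additional parameters more heavily when n is small, which is why the two criteria agree closely in large samples.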

10.
Comput Stat Data Anal ; 53(2): 546-553, 2008 Dec 15.
Article in English | MEDLINE | ID: mdl-20016662

ABSTRACT

In practical data analysis, nonresponse frequently occurs. In this paper, we propose an empirical likelihood based confidence interval for a common mean by combining the imputed data, assuming that data are missing completely at random. Simulation studies show that such confidence intervals perform well, even when the missing proportion is high. Our method is applied to an analysis of a real data set from an AIDS clinical trial study.

11.
Biom J ; 49(3): 406-15, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17623345

ABSTRACT

In this article we propose to use a semiparametric mixed-effects model, based on an exploratory analysis of clinical trial data, to study the relation between virologic responses and immunologic markers such as CD4+ and CD8 counts, and host-specific factors in AIDS clinical trials. The regression spline technique, used for inference for parameters in the model, reduces the unknown nonparametric components to parametric functions. It is simple and straightforward to implement the procedures using readily available software, and parameter inference can be developed from standard parametric models. We apply the model and the proposed method to an AIDS clinical study. Our findings indicate that viral load level is positively related to baseline viral load level and negatively related to CD4+ cell counts, but unrelated to either CD8 cell counts or patient age.


Subjects
Acquired Immunodeficiency Syndrome/drug therapy; Antiviral Agents/therapeutic use; Models, Biological; Randomized Controlled Trials as Topic/statistics & numerical data; Acquired Immunodeficiency Syndrome/immunology; Acquired Immunodeficiency Syndrome/virology; CD4 Lymphocyte Count; CD8-Positive T-Lymphocytes/immunology; Clinical Trials, Phase I as Topic; Clinical Trials, Phase II as Topic; HIV/immunology; Humans; Viral Load
12.
Genetics ; 173(3): 1747-60, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16624925

ABSTRACT

DNA pooling is a cost-effective approach for collecting information on marker allele frequency in genetic studies. It is often suggested as a screening tool to identify a subset of candidate markers from a very large number of markers to be followed up by more accurate and informative individual genotyping. In this article, we investigate several statistical properties and design issues related to this two-stage design, including the selection of the candidate markers for second-stage analysis, statistical power of this design, and the probability that truly disease-associated markers are ranked among the top after second-stage analysis. We have derived analytical results on the proportion of markers to be selected for second-stage analysis. For example, to detect disease-associated markers with an allele frequency difference of 0.05 between the cases and controls through an initial sample of 1000 cases and 1000 controls, our results suggest that when the measurement errors are small (0.005), approximately 3% of the markers should be selected. For the statistical power to identify disease-associated markers, we find that the measurement errors associated with DNA pooling have little effect on its power. This is in contrast to the one-stage pooling scheme, where measurement errors may have a large effect on statistical power. As for the probability that the disease-associated markers are ranked among the top in the second stage, we show that there is a high probability that at least one disease-associated marker is ranked among the top when the allele frequency differences between the cases and controls are at least 0.05 for reasonably large sample sizes, even though the errors associated with DNA pooling in the first stage are not small. Therefore, the two-stage design with DNA pooling as a screening tool offers an efficient strategy in genomewide association studies, even when the measurement errors associated with DNA pooling are nonnegligible. For any disease model, we find that all the statistical results essentially depend on the population allele frequency and the allele frequency differences between the cases and controls at the disease-associated markers. The general conclusions hold whether the second stage uses an entirely independent sample or includes both the samples used in the first stage and an independent set of samples.


Subjects
Case-Control Studies; Gene Frequency; Algorithms; DNA/genetics; Data Interpretation, Statistical; Gene Pool; Genetic Markers; Humans; Models, Genetic
13.
Genetics ; 172(1): 687-91, 2006 Jan.
Article in English | MEDLINE | ID: mdl-16204206

ABSTRACT

With respect to the multiple-tests problem, an increasing amount of attention has recently been paid to controlling the false discovery rate (FDR), the positive false discovery rate (pFDR), and the proportion of false positives (PFP). The new approaches are generally believed to be more powerful than the classical Bonferroni one. This article focuses on the PFP approach. It demonstrates via examples in genetic association studies that the Bonferroni procedure can be more powerful than the PFP-control one and also shows the intrinsic connection between controlling the PFP and controlling the overall type I error rate. Since controlling the PFP does not necessarily lead to a desired power level, this article addresses the design issue and recommends the sample sizes that can attain the desired power levels when the PFP is controlled. The results in this article also provide rough guidance for the sample sizes to achieve the desired power levels when the FDR and especially the pFDR are controlled.


Subjects
Chromosome Mapping; Genetic Linkage; Quantitative Trait Loci; Algorithms; False Positive Reactions; Humans; Models, Genetic; Sample Size
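
To make the contrast between the classical and the newer multiple-testing approaches concrete, here is a minimal sketch of the Bonferroni rule next to the Benjamini-Hochberg step-up rule, which targets the FDR. It does not implement PFP control, the focus of the abstract above, and the p-values are made up for illustration; the function names are hypothetical.

```python
def bonferroni_reject(pvalues, alpha):
    """Reject H_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg_reject(pvalues, alpha):
    """Benjamini-Hochberg step-up rule (targets the false discovery rate).

    Find the largest rank k with p_(k) <= k * alpha / m and reject the
    hypotheses with the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Illustrative (made-up) p-values for five markers.
pvals = [0.001, 0.012, 0.025, 0.2, 0.6]
bonf = bonferroni_reject(pvals, 0.05)
bh = benjamini_hochberg_reject(pvals, 0.05)
```

In this toy input the step-up rule rejects more hypotheses than Bonferroni, but the two procedures control different error rates, so the comparison says nothing by itself about which is preferable in a given study design.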
14.
Ann Hum Genet ; 69(Pt 4): 429-42, 2005 Jul.
Article in English | MEDLINE | ID: mdl-15996171

ABSTRACT

DNA pooling is a cost-effective strategy for genomewide association studies to identify disease genes. In the context of family-based association studies, Risch & Teng (1998) mainly considered families of identical structures to detect associations between genetic markers and disease, and suggested possible approaches to incorporating different family types without a thorough study of their properties. However, families collected in real genetic studies often have different structures and, more importantly, the informativeness of each family structure depends on the disease model, which is generally unknown. So there is a need to develop and investigate statistical methods to combine information from diverse family types. In this article, we propose a general strategy to incorporate different family types by assigning each family an "optimal" weight in association tests. In addition, we consider measurement errors in our analysis. When we evaluate our approach under different disease models and measurement errors, we find that our weighting scheme may lead to a substantial reduction in the required sample size compared with the approach suggested by Risch & Teng (1998), and that measurement errors may have a significant impact on the required sample size when the error rates are not negligible.


Subjects
DNA/genetics; Genetics, Medical/statistics & numerical data; Nuclear Family; Research Design/statistics & numerical data; Computer Simulation; Female; Humans; Male; Models, Genetic; Sample Size; Selection Bias
15.
Genet Epidemiol ; 26(4): 286-93, 2004 May.
Article in English | MEDLINE | ID: mdl-15095388

ABSTRACT

This report points out that some sibling genetic risk parameters can be regarded as the ratios of the characteristic values in the ascertainment subpopulation. Based on this observation, we reconsider Olson and Cordell's ([2000] Genet. Epidemiol. 18:217-235) and Cordell and Olson's ([2000] Genet. Epidemiol. 18:307-321) estimators, and re-derive these estimators. Furthermore, we provide the closed-form variance estimators. Simulation results suggest that our proposed estimators perform very well, and single ascertainment may be better than complete ascertainment for estimating these genetic parameters.


Subjects
Bias; Genetic Predisposition to Disease/genetics; Models, Genetic; Nuclear Family; Alleles; Genetic Markers/genetics; Humans; Models, Statistical; Poisson Distribution; Recurrence; Risk
16.
Genet Epidemiol ; 26(1): 1-10, 2004 Jan.
Article in English | MEDLINE | ID: mdl-14691952

ABSTRACT

Case-control association studies using unrelated individuals may offer an effective approach for identifying genetic variants that have small to moderate disease risks. In general, two different strategies may be employed to establish associations between genotypes and phenotypes: (1) collecting individual genotypes or (2) quantifying allele frequencies in DNA pools. These two technologies have their respective advantages. Individual genotyping gathers more information, whereas DNA pooling may be more cost effective. Recent technological advances in DNA pooling have generated great interest in using DNA pooling in association studies. In this article, we investigate the impacts of errors in genotyping or measuring allele frequencies on the identification of genetic associations with these two strategies. We find that, with current technologies, compared to individual genotyping, a larger sample is generally required to achieve the same power using DNA pooling. We further consider the use of DNA pooling as a screening tool to identify candidate regions for follow-up studies. We find that the majority of the positive regions identified from DNA pooling results may represent false positives if measurement errors are not appropriately considered in the design of the study.


Subjects
DNA/genetics; Models, Genetic; Selection Bias; Algorithms; Case-Control Studies; Gene Frequency; Genotype; Humans; Sample Size
17.
Hum Hered ; 56(1-3): 131-8, 2003.
Article in English | MEDLINE | ID: mdl-14614247

ABSTRACT

Several statistical methods have been proposed to estimate haplotype frequencies, either based on unrelated individuals or based on families. These estimates may yield insights on population genetics as well as associations between candidate regions and disease of interest. One limitation of the existing methods is that all these methods make the implicit assumption that there are no genotyping errors. However, genotyping errors are unavoidable in practice. Numerous methods have been developed to incorporate genotyping errors in genetic studies, but none to date have addressed the issues of haplotype inference in the presence of genotyping errors. In this article, we develop statistical methods for haplotype inference incorporating genotyping errors. We describe how our methods can be applied to analyze unrelated individuals as well as nuclear families. Our simulation results show that the proposed methods perform well in the presence of genotyping errors.


Subjects
Data Interpretation, Statistical; Gene Frequency; Haplotypes; Computer Simulation; Genotype; Humans
18.
Genetics ; 164(3): 1161-73, 2003 Jul.
Article in English | MEDLINE | ID: mdl-12871922

ABSTRACT

The identification of genotyping errors is an important issue in mapping complex disease genes. Although it is common practice to genotype multiple markers in a candidate region in genetic studies, the potential benefit of jointly analyzing multiple markers to detect genotyping errors has not been investigated. In this article, we discuss genotyping error detections for a set of tightly linked markers in nuclear families, and the objective is to identify families likely to have genotyping errors at one or more markers. We make use of the fact that recombination is a very unlikely event among these markers. We first show that, with family trios, no extra information can be gained by jointly analyzing markers if no phase information is available, and error detection rates are usually low if Mendelian consistency is used as the only standard for checking errors. However, for nuclear families with more than one child, error detection rates can be greatly increased with the consideration of more markers. Error detection rates also increase with the number of children in each family. Because families displaying Mendelian consistency may still have genotyping errors, we calculate the probability that a family displaying Mendelian consistency has correct genotypes. These probabilities can help identify families that, although showing Mendelian consistency, may have genotyping errors. In addition, we examine the benefit of available haplotype frequencies in the general population on genotyping error detections. We show that both error detection rates and the probability that an observed family displaying Mendelian consistency has correct genotypes can be greatly increased when such additional information is available.


Subjects
Genetic Testing/methods; Models, Genetic; Research Design; Family; Genetic Markers; Genotype
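
The Mendelian consistency check mentioned in the abstract above can be sketched for a single marker in a family trio. This is a minimal illustration under simple assumptions (unphased genotypes stored as pairs of allele labels, one marker at a time), not the joint multi-marker method the article studies; the function name is hypothetical.

```python
def trio_is_mendelian_consistent(father, mother, child):
    """Check single-marker Mendelian consistency for a family trio.

    Each genotype is an unordered pair of allele labels, e.g. ("A", "B").
    The child is consistent if one allele can come from the father and
    the other from the mother.
    """
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


father = ("A", "A")
mother = ("A", "B")

# A genotype the parents can produce passes the check.
ok = trio_is_mendelian_consistent(father, mother, ("A", "B"))

# A genotype they cannot produce is flagged as an error.
flagged = not trio_is_mendelian_consistent(father, mother, ("B", "B"))

# If the child's true genotype ("A", "A") is miscalled as ("A", "B"),
# the trio still looks consistent, so the error goes undetected.
undetected = trio_is_mendelian_consistent(father, mother, ("A", "B"))
```

The last case illustrates why a trio that displays Mendelian consistency may still contain genotyping errors.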