Results 1 - 14 of 14
1.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38465983

ABSTRACT

In genomics studies, the investigation of gene relationships often brings important biological insights. Large heterogeneous datasets now pose new challenges for statisticians because gene relationships are often local: they change from one sample point to another, may exist only in a subset of the sample, and can be nonlinear or even nonmonotone. Most previous dependence measures do not specifically target local dependence relationships, and the ones that do are computationally costly. In this paper, we explore a state-of-the-art network estimation technique that characterizes gene relationships at the single-cell level, under the name of cell-specific gene networks. We first show that averaging the cell-specific gene relationship over a population gives a novel univariate dependence measure, the averaged Local Density Gap (aLDG), that accumulates local dependence and can detect any nonlinear, nonmonotone relationship. Together with a consistent nonparametric estimator, we establish its robustness on both the population and empirical levels. Then, we show that averaging the cell-specific gene relationship over mini-batches determined by some external structure information (e.g., a spatial or temporal factor) better highlights meaningful local structure change points. We explore the application of aLDG and its mini-batch variant in many scenarios, including pairwise gene relationship estimation, bifurcating point detection in cell trajectory, and spatial transcriptomics structure visualization. Both simulations and real data analysis show that aLDG outperforms existing dependence measures.


Subjects
Algorithms , Single-Cell Gene Expression Analysis , Gene Expression Profiling/methods , Gene Regulatory Networks , RNA Sequence Analysis/methods
2.
R Soc Open Sci ; 10(11): 230857, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38034126

ABSTRACT

Multivariate time-series data that capture the temporal evolution of interconnected systems are ubiquitous in diverse areas. Understanding the complex relationships and potential dependencies among co-observed variables is crucial for the accurate statistical modelling and analysis of such systems. Here, we introduce kernel-based statistical tests of joint independence in multivariate time series by extending the d-variable Hilbert-Schmidt independence criterion to encompass both stationary and non-stationary processes, thus allowing broader real-world applications. By leveraging resampling techniques tailored for both single- and multiple-realization time series, we show how the method robustly uncovers significant higher-order dependencies in synthetic examples, including frequency mixing data and logic gates, as well as real-world climate, neuroscience and socio-economic data. Our method adds to the mathematical toolbox for the analysis of multivariate time series and can aid in uncovering high-order interactions in data.

3.
Biometrics ; 79(4): 3227-3238, 2023 12.
Article in English | MEDLINE | ID: mdl-37312587

ABSTRACT

It has been increasingly appealing to evaluate whether expression levels of two genes in a gene coexpression network are still dependent given samples' clinical information, in which the conditional independence test plays an essential role. For enhanced robustness regarding model assumptions, we propose a class of double-robust tests for evaluating the dependence of bivariate outcomes after controlling for known clinical information. Although the proposed test relies on the marginal density functions of bivariate outcomes given clinical information, the test remains valid as long as one of the density functions is correctly specified. Because of the closed-form variance formula, the proposed test procedure enjoys computational efficiency without requiring a resampling procedure or tuning parameters. We acknowledge the need to infer the conditional independence network with high-dimensional gene expressions, and further develop a procedure for multiple testing by controlling the false discovery rate. Numerical results show that our method accurately controls both the type-I error and false discovery rate, and it provides certain levels of robustness regarding model misspecification. We apply the method to a gastric cancer study with gene expression data to understand the associations between genes belonging to the transforming growth factor β signaling pathway given cancer-stage information.


Subjects
Gene Regulatory Networks , Neoplasms , Humans , Neoplasms/genetics
4.
Entropy (Basel) ; 25(3)2023 Feb 26.
Article in English | MEDLINE | ID: mdl-36981314

ABSTRACT

Conditional independence (CI) testing is a fundamental problem in statistics. Many nonparametric CI tests have been developed, but they share a common weakness: current methods perform poorly with a high-dimensional conditioning set. In this paper, we consider a nonparametric CI test using a kernel-based test statistic, which can be viewed as an extension of the Hilbert-Schmidt Independence Criterion (HSIC). We propose a local bootstrap method to generate samples under the null hypothesis H0: X ⫫ Y ∣ Z. The experimental results show that our proposed method leads to a significant performance improvement compared with previous methods. In particular, our method performs well as the dimension of the conditioning set grows. Moreover, it can be computed efficiently as the sample size and the dimension of the conditioning set grow.
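
For orientation, the unconditional HSIC that this abstract extends can be sketched in a few lines. The block below is only an illustration under stated assumptions: it uses the standard biased estimator trace(KHLH)/n² with Gaussian kernels on scalar data and a plain permutation null, not the authors' conditional statistic or their local bootstrap. The function names, the fixed bandwidth, and the number of permutations are all hypothetical choices.

```python
import math
import random


def gaussian_gram(values, bandwidth):
    """Gram matrix of a Gaussian kernel on scalar observations."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-center a symmetric Gram matrix: H K H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(kc, lc, perm=None):
    """Biased HSIC estimate from centered Gram matrices.

    If perm is given, the second sample is reordered by that permutation.
    """
    n = len(kc)
    idx = perm if perm is not None else list(range(n))
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += kc[i][j] * lc[idx[i]][idx[j]]
    return total / (n * n)


def hsic_permutation_test(x, y, bandwidth=1.0, n_perm=200, seed=0):
    """Return (HSIC estimate, permutation p-value) for H0: X independent of Y."""
    rng = random.Random(seed)
    kc = center(gaussian_gram(x, bandwidth))
    lc = center(gaussian_gram(y, bandwidth))
    observed = hsic(kc, lc)
    idx = list(range(len(x)))
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # permuting y breaks any dependence on x
        if hsic(kc, lc, idx) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return observed, (1 + exceed) / (1 + n_perm)
```

Calling `hsic_permutation_test(x, y)` on two equal-length lists of floats returns the statistic and a p-value. A conditional test replaces the permutation step with a resampling scheme that respects Z, which is the role the abstract assigns to the local bootstrap.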

5.
Biometrika ; 109(2): 277-293, 2022 Jun.
Article in English | MEDLINE | ID: mdl-37416628

ABSTRACT

We consider the problem of conditional independence testing: given a response Y and covariates (X,Z), we test the null hypothesis that Y⫫X∣Z. The conditional randomization test was recently proposed as a way to use distributional information about X∣Z to exactly and nonasymptotically control Type-I error using any test statistic in any dimensionality without assuming anything about Y∣(X,Z). This flexibility, in principle, allows one to derive powerful test statistics from complex prediction algorithms while maintaining statistical validity. Yet the direct use of such advanced test statistics in the conditional randomization test is prohibitively computationally expensive, especially with multiple testing, due to the requirement to recompute the test statistic many times on resampled data. We propose the distilled conditional randomization test, a novel approach to using state-of-the-art machine learning algorithms in the conditional randomization test while drastically reducing the number of times those algorithms need to be run, thereby taking advantage of their power and the conditional randomization test's statistical guarantees without suffering the usual computational expense. In addition to distillation, we propose a number of other tricks, like screening and recycling computations, to further speed up the conditional randomization test without sacrificing its high power and exact validity. Indeed, we show in simulations that all our proposals combined lead to a test that has similar power to the most powerful existing conditional randomization test implementations, but requires orders of magnitude less computation, making it a practical tool even for large datasets. We demonstrate these benefits on a breast cancer dataset by identifying biomarkers related to cancer stage.
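
The resampling loop that makes the conditional randomization test expensive can be sketched as follows. This is the plain (undistilled) test, not the distilled procedure the abstract proposes, and it rests on assumptions introduced here for illustration: X given Z is taken to be Gaussian with a known slope and unit variance, and the test statistic is the absolute covariance between Y and the residual of X. The names `crt_p_value` and `make_gaussian_model` are hypothetical.

```python
import random


def crt_p_value(x, y, z, sample_x_given_z, statistic, n_resamples=500, seed=0):
    """Conditional randomization test p-value for H0: Y independent of X given Z.

    sample_x_given_z(z_i, rng) must draw from the (assumed known) law of X | Z = z_i.
    statistic(x, y, z) may be any function; larger values mean stronger evidence.
    """
    rng = random.Random(seed)
    observed = statistic(x, y, z)
    exceed = 0
    for _ in range(n_resamples):
        # Redraw X from its conditional law; under H0 this leaves the
        # joint distribution of (X, Y, Z) unchanged.
        x_tilde = [sample_x_given_z(zi, rng) for zi in z]
        if statistic(x_tilde, y, z) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_resamples)


def make_gaussian_model(slope):
    """Illustrative assumption: X | Z = z is Normal(slope * z, 1)."""

    def sample(zi, rng):
        return rng.gauss(slope * zi, 1.0)

    def statistic(x, y, z):
        # Absolute covariance between Y and the part of X not explained by Z.
        resid = [xi - slope * zi for xi, zi in zip(x, z)]
        y_mean = sum(y) / len(y)
        return abs(sum(r * (yi - y_mean) for r, yi in zip(resid, y))) / len(y)

    return sample, statistic
```

Every resample recomputes the statistic from scratch, so a statistic built on a fitted machine learning model would be refit `n_resamples` times. That repeated cost is what the abstract's distillation step is described as reducing.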

6.
Sichuan Mental Health ; (6): 314-317, 2021.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-987499

ABSTRACT

The purpose of this article was to introduce methods for the independence test of a special high-dimensional table (i.e., the g×2×2 table) and their SAS implementation. Three approaches found in the SAS software and in statistical textbooks can be used to perform the independence test for data in a multiway table: the generalized CMH χ² test (approach-1), the weighted χ² test with the weighting coefficients in its formula (approach-2), and the weighted χ² test without the weighting coefficients in its formula (approach-3). This article showed that approach-2 and approach-3 are essentially the same weighted χ² test in different forms. It also showed that the weighted χ² test statistic is approximately equal to the CMH χ² test statistic of approach-1. Based on an example and the SAS software, the article introduced the concrete approaches for the independence test of g×2×2 table data, explained the output, and drew the statistical and professional conclusions.
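
As a rough illustration of approach-1, the Cochran-Mantel-Haenszel statistic for g strata of 2×2 tables can be computed directly from the cell counts. The article itself works in SAS; the sketch below uses Python instead, applies no continuity correction, and uses a hypothetical function name and input layout (one tuple `(a, b, c, d)` per stratum for the table [[a, b], [c, d]]).

```python
import math


def cmh_test(strata):
    """Cochran-Mantel-Haenszel test of conditional independence in a g x 2 x 2 table.

    strata: iterable of (a, b, c, d) counts, one 2x2 table [[a, b], [c, d]] per stratum.
    Returns (statistic, p_value); the statistic is referred to chi-square with 1 df.
    """
    diff = 0.0      # sum over strata of observed minus expected count in cell a
    variance = 0.0  # sum over strata of the hypergeometric variance of cell a
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two subjects carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    if variance == 0:
        raise ValueError("all strata are degenerate; the test is undefined")
    stat = diff * diff / variance
    # Upper tail of chi-square with 1 df: P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

For example, `cmh_test([(20, 5, 5, 20), (20, 5, 5, 20)])` pools two strata that each show the same strong association between the row and column factors.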

7.
Sichuan Mental Health ; (6): 318-321, 2021.
Article in Chinese | WPRIM (Western Pacific) | ID: wpr-987500

ABSTRACT

The purpose of this paper was to introduce the independence test and the SAS implementation for the "unstratified person-time data" and the "stratified person-time data". In "person-time data", the sample size in each level of the treatment factor was expressed as "person-years". Furthermore, it was necessary to use the "incidence density" to replace the "incidence rate" in the usual qualitative data analysis. The paper introduced the concrete approaches of comparing the "incidence density" of the "unstratified person-time data" and the "stratified person-time data" in detail, and demonstrated the whole process of using SAS software to realize the calculation through two examples, including the SAS program code, the SAS output results, the results explanation and the conclusion statement.
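
A comparison of incidence densities of this kind can be sketched with the Mantel-Haenszel test for person-time data, which covers the unstratified case as a single stratum. This is a Python illustration rather than the paper's SAS code, and it may not match the exact procedure the paper uses; the function name and the tuple layout `(d1, t1, d0, t0)` (events and person-years in the exposed and reference groups) are hypothetical.

```python
import math


def person_time_mh_test(strata):
    """Mantel-Haenszel comparison of two incidence densities.

    strata: list of (d1, t1, d0, t0), where d1 and d0 are event counts and
    t1 and t0 are person-time totals in the exposed and reference groups.
    Returns (chi-square statistic with 1 df, p_value, MH rate ratio).
    """
    diff = 0.0
    variance = 0.0
    rr_num = 0.0
    rr_den = 0.0
    for d1, t1, d0, t0 in strata:
        events = d1 + d0
        total_time = t1 + t0
        # Under H0, events fall in the exposed group in proportion to its person-time.
        diff += d1 - events * t1 / total_time
        variance += events * t1 * t0 / total_time ** 2
        rr_num += d1 * t0 / total_time
        rr_den += d0 * t1 / total_time
    if variance == 0:
        raise ValueError("no events observed; the test is undefined")
    stat = diff * diff / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square upper tail, 1 df
    rate_ratio = rr_num / rr_den if rr_den > 0 else math.inf
    return stat, p_value, rate_ratio
```

With one stratum, `person_time_mh_test([(30, 100, 10, 100)])` compares 30 events in 100 person-years against 10 events in 100 person-years.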

8.
Genes (Basel) ; 11(5)2020 05 18.
Article in English | MEDLINE | ID: mdl-32443545

ABSTRACT

Genetic markers on mitochondrial DNA (mtDNA) and the Y chromosome are powerful tools in population genetics. We present a study of the genetic background of the Kyrgyz group, a Chinese ethnic group living in northwest China, in which genetic polymorphisms at 60 loci on maternally inherited mtDNA and 24 loci on paternally inherited Y-chromosome short tandem repeats (Y-STRs) were investigated. The relationship between the two systems was tested, and the result indicated that they were statistically independent of each other. Genetic distances between the Kyrgyz group and 11 reference populations for mtDNA, and 13 reference populations for Y-STRs, were also calculated. The results showed that the Kyrgyz group was genetically closer to East Asian populations than to European populations based on the mtDNA loci, whereas the reverse held for the Y-STRs. These genetic analyses strengthen the understanding of the genetic background of the Kyrgyz group.


Subjects
Human Y Chromosomes/genetics , Mitochondrial DNA/genetics , Population Genetics , Microsatellite Repeats/genetics , China/epidemiology , Ethnicity/genetics , Female , Genetic Markers/genetics , Haplotypes/genetics , Humans , Male , Genetic Polymorphism/genetics
9.
BMC Genomics ; 19(1): 650, 2018 Sep 04.
Article in English | MEDLINE | ID: mdl-30180792

ABSTRACT

BACKGROUND: Long non-coding RNAs (lncRNAs) can indirectly regulate mRNA expression levels by sequestering microRNAs (miRNAs), acting as competing endogenous RNAs (ceRNAs) or sponges. Previous studies identified lncRNA-mediated sponge interactions in various cancers, including breast cancer. However, breast cancer subtypes are quite distinct in terms of their molecular profiles; therefore, ceRNAs are expected to be subtype-specific as well. RESULTS: To find lncRNA-mediated ceRNA interactions in breast cancer subtypes, we develop an integrative approach. We conduct partial correlation analysis and kernel independence tests on patient gene expression profiles and further refine the candidate interactions with miRNA target information. We find that although there are sponges common to multiple subtypes, there are also distinct subtype-specific interactions. Functional enrichment of mRNAs that participate in these interactions highlights distinct biological processes for different subtypes. Interestingly, some of the ceRNAs also reside in close proximity in the genome; for example, those involving HOX genes, HOTAIR, miR-196a-1 and miR-196a-2. We also discover subtype-specific sponge interactions with high prognostic potential. We find that patients differ significantly in their survival distributions when they are grouped based on the expression patterns of specific ceRNA interactions. This is not the case when the expression of individual RNAs participating in a ceRNA interaction is used. CONCLUSION: These results can help shed light on subtype-specific mechanisms of breast cancer, and the methodology developed herein can help uncover sponges in other diseases.


Subjects
Breast Neoplasms/genetics , Basal Cell Carcinoma/genetics , Gene Regulatory Networks , MicroRNAs/genetics , Long Noncoding RNA/genetics , Messenger RNA/genetics , ErbB-2 Receptor/metabolism , Breast Neoplasms/classification , Breast Neoplasms/metabolism , Breast Neoplasms/pathology , Basal Cell Carcinoma/classification , Basal Cell Carcinoma/metabolism , Basal Cell Carcinoma/pathology , Computational Biology , Female , Neoplastic Gene Expression Regulation , Humans , Prognosis , Survival Rate
10.
Forensic Sci Int ; 283: 173-179, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29324348

ABSTRACT

Randomly acquired characteristics (RACs), also known as accidental marks, are random markings on a shoe sole, such as scratches or holes, that are used by forensic experts to compare a suspect's shoe with a print found at the crime scene. This article investigates the relationships among three features of a RAC: its location, shape type and orientation. If these features, as well as the RACs, are independent of each other, a simple probabilistic calculation could be used to evaluate the rarity of a RAC and hence the evidential value of the shoe and print comparison, whereas a correlation among the features would complicate the analysis. Using a data set of about 380 shoes, it is found that RACs and their features are not independent, and moreover, are not independent of the shoe sole pattern. It is argued that some of the dependencies found are caused by the elements of the sole. The results have important implications for the way forensic experts should evaluate the degree of rarity of a combination of RACs.

11.
Talanta ; 178: 348-354, 2018 Feb 01.
Article in English | MEDLINE | ID: mdl-29136832

ABSTRACT

The interleaved Incremental Association Markov Blanket (inter-IAMB) is described herein as a feature selection method for the NIR spectroscopic analysis of several samples (diesel, gasoline, and etchant solutions). Although the Markov blanket (MB) has been proven to be the minimal optimal set of features (variables) that does not change the original target distribution, variables selected by the existing IAMB algorithm can be redundant and/or misleading, because the IAMB requires an unnecessarily large amount of learning data to identify the MB. The inter-IAMB overcomes this drawback by interleaving the grow phase with the shrink phase, which keeps the MB as small as possible by immediately eliminating invalid candidates. In this report, a likelihood-ratio (LR)-based conditional independence test, able to handle spectroscopic data that normally comprise a large number of continuous variables in a small number of samples, was embedded in the inter-IAMB and its utility was evaluated. The variables selected by the inter-IAMB in complexly overlapped and feature-indistinct NIR spectra were used to determine the corresponding sample properties. For comparison, the properties were also determined using the IAMB-selected variables as well as the full set of variables. The inter-IAMB was more effective in the selection of variables than the IAMB and thus improved the accuracy in the determination of the sample properties, even though a smaller number of variables was used. The proposed LR-embedded inter-IAMB could be a useful feature selection method for vibrational spectroscopic analysis, especially when the obtained spectral features are specificity-deficient and extensively overlapped.
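
The interleaving of grow and shrink phases described above can be sketched generically. The block below is an illustration, not the paper's implementation: the conditional-association measure is passed in as a hypothetical callable `assoc(feature, conditioning_set)`, standing in for the likelihood-ratio test of the abstract, and a feature is treated as conditionally independent of the target when its score does not exceed `threshold`. The toy oracle at the end is invented purely to exercise the loop.

```python
def inter_iamb(features, assoc, threshold, max_rounds=1000):
    """Interleaved grow/shrink search for a Markov blanket of an implicit target.

    assoc(feature, conditioning_set) -> float is a hypothetical user-supplied
    measure of association between the feature and the target given the set.
    """
    blanket = []
    for _ in range(max_rounds):  # guard against cycling with noisy scores
        # Grow: admit the candidate most associated with the target given the blanket.
        candidates = [f for f in features if f not in blanket]
        if not candidates:
            break
        scores = [(assoc(f, frozenset(blanket)), f) for f in candidates]
        best_score, best = max(scores, key=lambda pair: pair[0])
        if best_score <= threshold:
            break  # nothing left that is dependent on the target
        blanket.append(best)
        # Shrink immediately: drop members made redundant by the rest of the blanket.
        for f in list(blanket):
            rest = frozenset(b for b in blanket if b != f)
            if assoc(f, rest) <= threshold:
                blanket.remove(f)
    return blanket


def toy_assoc(feature, conditioning_set):
    """Invented oracle: A and B are true blanket members, C is a proxy for A,
    and D is irrelevant."""
    if feature == "A":
        return 0.9
    if feature == "B":
        return 0.5
    if feature == "C":
        return 0.0 if "A" in conditioning_set else 0.95
    return 0.0
```

In the toy oracle, C has the largest marginal score and is admitted first, then removed as soon as A enters. Removing such false positives early keeps the conditioning set small, which is the advantage the abstract attributes to the interleaved variant.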

12.
J Am Stat Assoc ; 110(512): 1726-1734, 2015.
Article in English | MEDLINE | ID: mdl-26877569

ABSTRACT

Statistical inference on conditional dependence is essential in many fields including genetic association studies and graphical models. The classic measures focus on linear conditional correlations, and are incapable of characterizing non-linear conditional relationships, including non-monotonic ones. To overcome this limitation, we introduce a nonparametric measure of conditional dependence for multivariate random variables with arbitrary dimensions. Our measure possesses the necessary and intuitive properties of a correlation index. Briefly, it is zero almost surely if and only if two multivariate random variables are conditionally independent given a third random variable. More importantly, the sample version of this measure can be expressed elegantly as the root of a V- or U-process with random kernels and has desirable theoretical properties. Based on the sample version, we propose a test for conditional independence, which our numerical simulations show to be more powerful than some recently developed tests. The advantage of our test is even greater when the relationship between the multivariate random variables given the third random variable cannot be expressed as a linear or monotonic function of one random variable versus the other. We also show that the sample measure is consistent and weakly convergent, and the test statistic is asymptotically normal. By applying our test in a real data analysis, we are able to identify two conditionally associated gene expressions, which otherwise cannot be revealed. Thus, our measure of conditional dependence is not only an ideal concept, but also has important practical utility.

13.
Am Stat ; 48(3): 158-169, 2014.
Article in English | MEDLINE | ID: mdl-25308974

ABSTRACT

We develop a novel nonparametric likelihood ratio test for independence between two random variables using a technique that is free of the common constraints of defining a given set of specific dependence structures. Our methodology revolves around an exact density-based empirical likelihood ratio test statistic that approximates in a distribution-free fashion the corresponding most powerful parametric likelihood ratio test. We demonstrate that the proposed test is very powerful in detecting general structures of dependence between two random variables, including non-linear and/or random-effect dependence structures. An extensive Monte Carlo study confirms that the proposed test is superior to the classical nonparametric procedures across a variety of settings. The real-world applicability of the proposed test is illustrated using data from a study of biomarkers associated with myocardial infarction.

14.
J Stat Plan Inference ; 142(12): 3097-3106, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23956488

ABSTRACT

A class of distribution-free tests is proposed for the independence of two subsets of response coordinates. The tests are based on the pairwise distances across subjects within each subset of the response. A complete graph is induced by each subset of response coordinates, with the sample points as nodes and the pairwise distances as the edge weights. The proposed test statistic depends only on the rank order of edges in these complete graphs. The response vector may be of any dimension. In particular, the number of samples may be smaller than the dimension of the response. The test statistic is shown to have a normal limiting distribution with known expectation and variance under the null hypothesis of independence. The exact distribution-free null distribution of the test statistic is given for a sample of size 14, and its Monte Carlo approximation is considered for larger sample sizes. We demonstrate in simulations that this new class of tests has good power properties for very general alternatives.
