Results 1 - 20 of 31
1.
Spectrochim Acta A Mol Biomol Spectrosc ; 302: 123072, 2023 Dec 05.
Article in English | MEDLINE | ID: mdl-37390722

ABSTRACT

Candida rugosa lipase (CRL, EC 3.1.1.3) is one of the main enzymes used to synthesize esters, and ZIF-8 was chosen as an immobilization carrier for the lipase. Enzyme activity testing often requires expensive reagents as substrates, and the experimental procedures are time-consuming and inconvenient. A novel approach based on near-infrared spectroscopy (NIRS) was therefore developed for predicting CRL/ZIF-8 enzyme activity. The absorbance of the immobilized-enzyme catalytic system was measured by UV-Vis spectroscopy to determine the enzyme activity of CRL/ZIF-8. Near-infrared spectra of the powdered samples were acquired, and each sample's enzyme activity was paired with its original NIR spectrum to build the NIR model. A partial least squares (PLS) model of immobilized-enzyme activity was developed by coupling spectral preprocessing with a variable screening technique. The experiments were completed within 48 h to avoid inaccuracies caused by the decline in enzyme activity with standing time between the activity tests and the NIRS modeling. The root-mean-square error of cross-validation (RMSECV), the correlation coefficient of the validation set (R), and the ratio of prediction to deviation (RPD) were employed as model assessment indicators. The final NIR model combined the best-performing second-derivative spectral preprocessing with the Competitive Adaptive Reweighted Sampling (CARS) variable screening method. For this model, the RMSECV was 0.368 U/g, the correlation coefficient of the calibration set (R_cv) was 0.943, the root-mean-square error of prediction (RMSEP) was 0.414 U/g, the correlation coefficient of the validation set (R) was 0.952, and the RPD was 3.0, indicating a satisfactory fit between the predicted and reference enzyme activity values.
The findings revealed a strong relationship between NIRS and CRL/ZIF-8 enzyme activity. The established model could therefore be used to quantify CRL/ZIF-8 enzyme activity rapidly, provided more natural-sample variation is incorporated. The prediction method is simple, rapid, and adaptable, and can serve as a theoretical and practical basis for further interdisciplinary work in enzymology and spectroscopy.
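The assessment indicators used above (root-mean-square error, correlation coefficient R, and RPD) follow from standard formulas; a minimal Python sketch (function names are ours, for illustration only):

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(y_true, y_pred):
    """Correlation coefficient between reference and predicted values."""
    n = len(y_true)
    mt, mp = sum(y_true) / n, sum(y_pred) / n
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)

def rpd(y_true, y_pred):
    """Ratio of reference standard deviation to prediction error; an RPD
    around 3 or above is commonly taken to indicate a usable model."""
    n = len(y_true)
    m = sum(y_true) / n
    sd = math.sqrt(sum((t - m) ** 2 for t in y_true) / (n - 1))
    return sd / rmse(y_true, y_pred)
```

With these definitions, the paper's reported RPD of 3.0 corresponds to a prediction error roughly one third of the reference-value spread.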


Subject(s)
Enzymes, Immobilized , Spectroscopy, Near-Infrared , Spectroscopy, Near-Infrared/methods , Least-Squares Analysis , Calibration
2.
J Am Stat Assoc ; 118(541): 135-146, 2023.
Article in English | MEDLINE | ID: mdl-37346228

ABSTRACT

With rapid advances in information technology, massive datasets are collected in all fields of science, such as biology, chemistry, and social science. Useful or meaningful information is often extracted from these data through statistical learning or model fitting. In massive datasets, both the sample size and the number of predictors can be large, in which case conventional methods face computational challenges. Recently, an innovative and effective sampling scheme based on leverage scores via singular value decompositions has been proposed to select rows of a design matrix as a surrogate for the full data in linear regression. Analogously, variable screening can be viewed as selecting columns of the design matrix. However, effective variable selection along this line of thinking has remained elusive. In this article, we bridge this gap by proposing a weighted leverage variable screening method that utilizes both the left and right singular vectors of the design matrix. We show theoretically and empirically that the predictors selected by our method consistently include the true predictors, not only for linear models but also for complicated general index models. Extensive simulation studies show that weighted leverage screening is highly computationally efficient and effective. We also demonstrate its success in identifying carcinoma-related genes using spatial transcriptome data.
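The leverage scores underlying this row-sampling scheme are the squared row norms of the left singular vectors of the design matrix, equivalently h_i = x_i^T (X^T X)^{-1} x_i; a self-contained sketch for the two-column case (pure Python, our own illustration, not the paper's code):

```python
def leverage_scores(X):
    """Statistical leverage of each row of a two-column design matrix X:
    h_i = x_i^T (X^T X)^{-1} x_i. Rows with high leverage dominate the
    least-squares fit, which is why leverage-based subsampling keeps them.
    The scores sum to the column rank of X (here 2)."""
    # Gram matrix X^T X for the 2-column case, inverted in closed form
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    det = a * d - b * b
    inv = ((d / det, -b / det), (-b / det, a / det))
    scores = []
    for x in X:
        v0 = inv[0][0] * x[0] + inv[0][1] * x[1]
        v1 = inv[1][0] * x[0] + inv[1][1] * x[1]
        scores.append(x[0] * v0 + x[1] * v1)
    return scores
```

In the example below, the row with the extreme second coordinate receives by far the largest leverage score.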

3.
Entropy (Basel) ; 25(3)2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36981413

ABSTRACT

Sufficient variable screening rapidly reduces dimensionality with high probability in ultrahigh-dimensional modeling. To rapidly screen out null predictors, we develop a quantile-adaptive sufficient variable screening framework with false discovery control. Without specifying an actual model, we first introduce a compound testing procedure, based on conditionally imputed marginal rank correlation at different quantile levels of the response, to select active predictors in high dimensionality. The test statistic can capture sufficient dependence through two paths: one controls false discovery adaptively, and the other controls the false discovery rate at a prespecified threshold. The procedure is computationally efficient and easy to implement. We establish its theoretical properties under mild conditions. Numerical studies, including simulations and a real data analysis, provide supporting evidence that the proposal performs well in practical settings.
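A much-simplified version of quantile-adaptive marginal screening can be illustrated by correlating each predictor with response-quantile indicators; this sketch uses plain Pearson correlation rather than the paper's conditionally imputed rank correlation statistic:

```python
import math

def pearson(x, y):
    """Plain Pearson correlation; returns 0 for a degenerate input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def quantile_screen(X_cols, y, taus=(0.25, 0.5, 0.75)):
    """Score each predictor by its largest absolute correlation with the
    indicator 1{y <= q_tau(y)} over several quantile levels tau, so that
    a predictor affecting any part of the response distribution is caught."""
    n = len(y)
    ys = sorted(y)
    scores = []
    for x in X_cols:
        best = 0.0
        for tau in taus:
            q = ys[int(tau * (n - 1))]
            ind = [1.0 if v <= q else 0.0 for v in y]
            best = max(best, abs(pearson(x, ind)))
        scores.append(best)
    return scores
```

An active predictor scores high at at least one quantile level, while a null predictor scores near zero at all of them.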

4.
Biometrics ; 79(2): 903-914, 2023 06.
Article in English | MEDLINE | ID: mdl-35043393

ABSTRACT

Causal inference has been increasingly reliant on observational studies with rich covariate information. To build tractable causal procedures, such as doubly robust estimators, it is imperative to first extract important features from high- or even ultrahigh-dimensional data. In this paper, we propose causal ball screening for confounder selection from modern ultrahigh-dimensional datasets. Unlike the familiar task of variable selection for prediction modeling, our confounder selection procedure aims to control for confounding while improving efficiency of the resulting causal effect estimate. Previous empirical and theoretical studies suggest excluding causes of the treatment that are not confounders. Motivated by these results, our goal is to keep all predictors of the outcome in both the propensity score and outcome regression models. A distinctive feature of our proposal is that we use an outcome-model-free procedure for propensity score model selection, thereby maintaining double robustness of the resulting causal effect estimator. Our theoretical analyses show that the proposed procedure enjoys a number of properties, including model selection consistency and pointwise normality. Synthetic and real data analyses show that our proposal compares favorably with existing methods in a range of realistic settings. Data used in preparation of this paper were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.


Subject(s)
Models, Statistical , Models, Theoretical , Computer Simulation , Propensity Score , Causality
5.
Stat Methods Med Res ; 32(1): 22-40, 2023 01.
Article in English | MEDLINE | ID: mdl-36177601

ABSTRACT

Ultra-high dimensional data, such as gene and neuroimaging data, are becoming increasingly important in biomedical science. Identifying important biomarkers among the huge number of features can provide better insight for further research. Variable screening is an efficient tool for achieving this goal in large-scale settings: it reduces the feature dimension to a moderate size by removing most of the inactive features. Developing novel variable screening methods for high-dimensional features with group structures is challenging, especially when the groups overlap. For example, genes can usually be partitioned into hundreds of pathways according to background knowledge, and a primary characteristic of this type of data is that many genes appear in more than one pathway, so different pathways overlap. However, existing variable screening methods can only handle disjoint group structures. To fill this gap, we propose a novel variable screening method for the generalized linear model that incorporates overlapping partition structures, with theoretical guarantees. Besides establishing the sure screening property, we test the performance of the proposed method through a series of numerical studies and apply it to the statistical analysis of a breast cancer dataset.
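The overlapping-pathway idea can be sketched by letting a shared gene contribute its marginal score to every pathway that contains it; a hypothetical illustration (not the paper's actual screening statistic):

```python
import math

def abs_corr(x, y):
    """Absolute Pearson correlation; 0 for a degenerate input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0

def pathway_screen(X_cols, y, pathways):
    """Score each (possibly overlapping) pathway by the largest absolute
    marginal correlation of its member genes with the response; a gene
    shared by several pathways contributes to every pathway containing it."""
    return {name: max(abs_corr(X_cols[j], y) for j in genes)
            for name, genes in pathways.items()}
```

In the test below, gene 1 belongs to both pathways, but only pathway "A" contains the truly active gene 0 and therefore scores highest.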


Subject(s)
Linear Models , Biomarkers
6.
Stat Methods Med Res ; 31(10): 1845-1859, 2022 10.
Article in English | MEDLINE | ID: mdl-35635269

ABSTRACT

Precision medicine is a medical paradigm that focuses on making effective treatment decisions based on individual patient characteristics. When a large amount of patient information, such as genetic information, medical records, and clinical measurements, is available, it is of interest to select the covariates that interact with the treatment, for example when determining an individualized treatment regime in which only a subset of covariates with treatment interactions is involved in decision making. We propose a marginal feature ranking and screening procedure for measuring interactions between the treatment and covariates. The method does not impose a specific structure on the regression model and is applicable in high-dimensional settings. Theoretical properties in terms of consistency in ranking and selection are established. We demonstrate the finite-sample performance of the proposed method by simulation and illustrate its application with two real data examples from clinical trials.
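One crude way to rank covariates for treatment interaction, in the spirit of (but much simpler than) the proposed marginal procedure, is to compare arm-specific correlations with the outcome:

```python
import math

def corr(x, y):
    """Plain Pearson correlation; 0 for a degenerate input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def interaction_screen(X_cols, y, trt):
    """Score each covariate by the absolute gap between its correlations
    with the outcome in the treated and control arms; covariates whose
    association with y changes with treatment rank highest."""
    treated = [i for i, t in enumerate(trt) if t == 1]
    control = [i for i, t in enumerate(trt) if t == 0]
    scores = []
    for x in X_cols:
        c1 = corr([x[i] for i in treated], [y[i] for i in treated])
        c0 = corr([x[i] for i in control], [y[i] for i in control])
        scores.append(abs(c1 - c0))
    return scores
```

A covariate with a qualitative interaction (effect flips sign across arms) gets a gap near 2, while a purely prognostic covariate with the same effect in both arms gets a gap near 0.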


Subject(s)
Precision Medicine , Computer Simulation , Humans , Treatment Outcome
7.
Sensors (Basel) ; 22(2)2022 Jan 17.
Article in English | MEDLINE | ID: mdl-35062644

ABSTRACT

Volatile organic compounds (VOCs) can serve as an indicator of oyster freshness. However, traditional characterization methods for VOCs have disadvantages such as high instrument cost, cumbersome pretreatment, and long analysis times. In this work, a fast and non-destructive method based on a colorimetric sensor array (CSA) and visible near-infrared spectroscopy (VNIRS) was established to identify the freshness of oysters. First, four dyes sensitive to oyster VOCs were selected and printed on a silica gel plate to obtain a CSA. Second, a charge-coupled device (CCD) camera was used to capture images of the CSA before and after exposure. Third, the VNIRS system acquired the reflectance spectra of the CSA, which capture not only the color change from the reaction of the CSA with the oysters' VOCs but also changes in the internal structure of the color-sensitive materials after that reaction. Pattern recognition on the VNIRS data showed that fresh and stale oysters could be separated directly in the principal component analysis (PCA) score plot, and a linear discriminant analysis (LDA) model based on variable selection performed well for freshness detection: the recognition rate was 100% on the calibration set and 97.22% on the prediction set. These results demonstrate that CSA combined with VNIRS shows great potential for VOC measurement and provides a fast, nondestructive method for identifying oyster freshness.


Subject(s)
Ostreidae , Volatile Organic Compounds , Animals , Colorimetry , Discriminant Analysis , Spectroscopy, Near-Infrared
8.
Anal Chim Acta ; 1191: 339298, 2022 Jan 25.
Article in English | MEDLINE | ID: mdl-35033262

ABSTRACT

Noninvasive detection of blood components is the most ideal and effective approach for preventing and detecting many clinical diseases. However, the accuracy of spectrum-based noninvasive detection is not always satisfactory, because various interferences in the measurement limit analytical accuracy. Dynamic spectrum theory can, in principle, eliminate individual differences and the influence of the measurement environment, improving measurement accuracy. The concentration of globulin is closely related to the status of the immune system and is of great significance for clinical diagnosis. This paper improves the signal-to-noise ratio at every stage of dynamic spectrum data processing to realize noninvasive detection of globulin. Through appropriate pretreatment, extraction, quality evaluation, and variable screening, the valid information in the spectrum is used to the fullest. Finally, a partial least squares model was used to predict globulin concentration. The results show that the model built on dynamic spectra processed in this way has good predictive performance for globulin: the correlation coefficient of the prediction set is 0.962 with a root-mean-square error of only 1.058 g/L, and the correlation coefficient of the calibration set is 0.996 with a root-mean-square error of 0.332 g/L. The experimental results show that careful data processing of the dynamic spectrum can effectively improve the signal-to-noise ratio of the data, give the resulting model good prediction accuracy, and realize high-precision prediction of globulin. This paper provides a complete research framework for the noninvasive detection of blood components and holds promise for noninvasive quantitative detection of trace components in blood.


Subject(s)
Globulins , Calibration , Humans , Least-Squares Analysis , Signal-To-Noise Ratio , Spectrum Analysis
9.
BMC Med Inform Decis Mak ; 21(1): 322, 2021 11 22.
Article in English | MEDLINE | ID: mdl-34809631

ABSTRACT

BACKGROUND: While random forests are one of the most successful machine learning methods, it is necessary to optimize their performance for use with datasets resulting from a two-phase sampling design with a small number of cases, a common situation in biomedical studies, which often have rare outcomes and covariates whose measurement is resource-intensive. METHODS: Using an immunologic marker dataset from a phase III HIV vaccine efficacy trial, we seek to optimize random forest prediction performance using combinations of variable screening, class balancing, weighting, and hyperparameter tuning. RESULTS: Our experiments show that while class balancing helps improve random forest prediction performance when variable screening is not applied, it has a negative impact on performance in the presence of variable screening. The impact of weighting similarly depends on whether variable screening is applied. Hyperparameter tuning is ineffective in situations with small sample sizes. We further show that random forests under-perform generalized linear models for some subsets of markers, that prediction performance on this dataset can be improved by stacking random forests and generalized linear models trained on different subsets of predictors, and that the extent of improvement depends critically on the dissimilarities between candidate learner predictions. CONCLUSION: In small datasets from a two-phase sampling design, variable screening and inverse sampling probability weighting are important for achieving good random forest prediction performance. In addition, stacking random forests and simple linear models can offer improvements over random forests alone.
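The inverse sampling probability weights highlighted in the conclusion are straightforward to compute for a stratified two-phase design; a sketch with made-up stratum counts:

```python
def ipw_weights(phase1_sizes, phase2_sizes):
    """Inverse-probability-of-sampling weights for a two-phase design:
    a phase-two subject from stratum h gets weight N_h / n_h, the inverse
    of its stratum's sampling fraction, so the weighted phase-two sample
    represents the full phase-one cohort."""
    return {h: phase1_sizes[h] / phase2_sizes[h] for h in phase2_sizes}
```

With rare cases fully sampled and controls subsampled, each retained control stands in for several phase-one controls, and the weighted totals recover the phase-one stratum sizes.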


Subject(s)
Machine Learning , Vaccine Efficacy , Humans , Probability
10.
Stat Sin ; 30: 1049-1067, 2020.
Article in English | MEDLINE | ID: mdl-32982122

ABSTRACT

Generalized varying coefficient models are particularly useful for examining dynamic effects of covariates on a continuous, binary, or count response. This paper is concerned with feature screening for generalized varying coefficient models with ultrahigh-dimensional covariates. The proposed screening procedure is based on the joint quasi-likelihood of all predictors and is therefore distinguished from the marginal screening procedures proposed in the literature. In particular, it can effectively identify active predictors that are jointly dependent on but marginally independent of the response. To carry out the procedure, we develop an effective algorithm and establish its ascent property. We further prove that the procedure possesses the sure screening property: with probability tending to one, the selected variable set includes the actual active predictors. We examine the finite-sample performance of the proposed procedure, compare it with existing ones via Monte Carlo simulations, and illustrate it with a real data example.

11.
Stat Sin ; 30(3): 1213-1233, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32742137

ABSTRACT

In the era of precision medicine, survival outcome data with high-throughput predictors are routinely collected. Models with an exceedingly large number of covariates are either infeasible to fit or likely to incur low predictability because of overfitting. Variable screening is key to identifying and removing irrelevant attributes. Recent years have seen a surge in screening methods, but most rely on particular modeling assumptions. Motivated by a study on detecting gene signatures for multiple myeloma patients' survival, we propose a model-free L_q-norm learning procedure, which includes the well-known Cramér-von Mises and Kolmogorov criteria as two special cases. The work provides an integrative framework for detecting predictors with various levels of impact, such as short- or long-term impact, on censored outcome data. The framework naturally leads to a scheme that combines results from different values of q to reduce false negatives, an aspect often overlooked by the current literature. We show that our method possesses sure screening properties. The utility of the proposal is confirmed with simulation studies and an analysis of the multiple myeloma study.
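The two special cases named above can be illustrated with an L_q distance between empirical CDFs: q = inf recovers the Kolmogorov (sup-norm) statistic and q = 2 a Cramér-von Mises-type criterion. This sketch ignores censoring, which the paper's procedure handles:

```python
import math

def ecdf(sample):
    """Empirical CDF of a sample as a callable F(t) = P_hat(X <= t)."""
    s = sorted(sample)
    n = len(s)
    def F(t):
        # number of points <= t, via linear scan (fine for a sketch)
        return sum(1 for v in s if v <= t) / n
    return F

def lq_ecdf_distance(x, y, q):
    """L_q distance between the empirical CDFs of two samples, evaluated
    on the pooled sample points. q = float('inf') gives the Kolmogorov
    sup-norm statistic; q = 2 gives a Cramer-von Mises-type criterion."""
    Fx, Fy = ecdf(x), ecdf(y)
    pts = sorted(set(x) | set(y))
    diffs = [abs(Fx(t) - Fy(t)) for t in pts]
    if q == float('inf'):
        return max(diffs)
    return (sum(d ** q for d in diffs) / len(diffs)) ** (1.0 / q)
```

Small q emphasizes distributional differences spread over the whole range (long-term impact), while large q is driven by the single largest gap (short-term, localized impact), which motivates combining several values of q.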

12.
Stat Med ; 39(16): 2167-2184, 2020 Jul 20.
Article in English | MEDLINE | ID: mdl-32282097

ABSTRACT

Model selection in high-dimensional settings has received substantial attention in recent years; however, similar advancements in the low-dimensional setting have been lacking. In this article, we introduce a new variable selection procedure for low- to moderate-scale regressions (n > p). The method repeatedly splits the data into two sets, one for estimation and one for validation, to obtain an empirically optimized threshold, which is then used to screen for variables to include in the final model. In an extensive simulation study, we show that the proposed variable selection technique enjoys superior performance compared with candidate methods (backward elimination via repeated data splitting, univariate screening at the 0.05 level, adaptive LASSO, SCAD): it is among those with the lowest inclusion of noisy predictors, has the highest power to detect the correct model, and is unaffected by correlations among the predictors. We illustrate the methods by applying them to a cohort of patients undergoing hepatectomy at our institution.


Subject(s)
Computer Simulation , Humans
13.
Comb Chem High Throughput Screen ; 23(8): 740-756, 2020.
Article in English | MEDLINE | ID: mdl-32342803

ABSTRACT

AIM AND OBJECTIVE: Near-infrared (NIR) spectroscopy data are characterized by sample sizes ranging from a few dozen to many thousands and by highly correlated variables. Quantitative analysis of such data usually requires combining analytical methods with variable selection or screening. Commonly used variable screening methods fail to recover the true model when (i) some of the variables are highly correlated and (ii) the sample size is smaller than the number of relevant variables. In these cases, partial least squares (PLS) regression-based approaches can be useful alternatives. MATERIALS AND METHODS: In this research, a fast variable screening strategy, preconditioned screening for ridge partial least squares regression (PSRPLS), is proposed for modeling NIR spectroscopy data with high-dimensional and highly correlated covariates. Under rather mild assumptions, we prove that, using the Puffer transformation, the proposed approach transforms the problem of variable screening with highly correlated predictors into one with weakly correlated covariates at little extra computational cost. RESULTS: We show that the proposed method leads to theoretically consistent model selection. Four simulation studies and two real examples are then analyzed to illustrate the effectiveness of the approach. CONCLUSION: By introducing the Puffer transformation, the high-correlation problem can be mitigated with the PSRPLS procedure we construct. By employing ridge PLS regression, the approach becomes simpler and more computationally efficient in settings where the model size exceeds the sample size, while maintaining high prediction precision.


Subject(s)
Soil/chemistry , Spectroscopy, Near-Infrared/methods , Computer Simulation , Databases, Chemical , Least-Squares Analysis , Models, Theoretical , Monte Carlo Method
14.
Biom J ; 62(3): 610-626, 2020 05.
Article in English | MEDLINE | ID: mdl-31448463

ABSTRACT

When performing survival analysis in very high dimensions, it is often necessary to reduce the number of covariates by preliminary screening. In recent years, a large number of variable screening methods for the survival context have been developed; however, guidance on choosing an appropriate method in practice is missing. The aim of this work is to provide an overview of marginal variable screening methods for survival data and to develop recommendations for their use. For this purpose, a literature review is given, offering a comprehensive and structured introduction to the topic. In addition, a novel screening procedure based on distance correlation and martingale residuals is proposed, which is particularly useful for detecting nonmonotone associations. To evaluate the performance of the discussed approaches, a simulation study is conducted, comparing the true positive rates of competing variable screening methods in different settings. A real data example on mantle cell lymphoma is provided.
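The distance-correlation building block of the proposed procedure can be sketched directly; note that the paper applies it to martingale residuals from a survival model, which this plain-sample illustration omits:

```python
import math

def distance_correlation(x, y):
    """Sample distance correlation of two 1-D samples: zero (in the
    population) iff the variables are independent, so it detects
    nonmonotone associations that Pearson correlation can miss."""
    n = len(x)
    def centered(v):
        # pairwise distance matrix, double-centered (rows = cols by symmetry)
        D = [[abs(a - b) for b in v] for a in v]
        row = [sum(r) / n for r in D]
        grand = sum(row) / n
        return [[D[i][j] - row[i] - row[j] + grand for j in range(n)]
                for i in range(n)]
    A, B = centered(x), centered(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvarx = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    dvary = sum(B[i][j] ** 2 for i in range(n) for j in range(n)) / n ** 2
    denom = math.sqrt(dvarx * dvary)
    return math.sqrt(dcov2 / denom) if denom > 0 else 0.0
```

For a symmetric quadratic relationship the Pearson correlation is exactly zero, yet the distance correlation is clearly positive, which is precisely the nonmonotone case the screening procedure targets.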


Subject(s)
Biometry/methods , Endpoint Determination , Analysis of Variance , Humans , Lymphoma, Mantle-Cell/epidemiology , Survival Analysis
15.
J Am Stat Assoc ; 114(526): 928-937, 2019.
Article in English | MEDLINE | ID: mdl-31692981

ABSTRACT

Extracting important features from ultra-high dimensional data is one of the primary tasks in statistical learning, information theory, precision medicine, and biological discovery. Many of the sure independence screening methods developed to meet these needs are suitable only for special models under particular assumptions. With the availability of more data types and possible models, a model-free generic screening procedure with fewer and less restrictive assumptions is desirable. In this paper, we propose a generic nonparametric sure independence screening procedure, called BCor-SIS, based on a recently developed universal dependence measure: Ball correlation. We show that the proposed procedure has strong screening consistency even when the dimensionality is of exponential order in the sample size, without imposing sub-exponential moment assumptions on the data. We investigate the flexibility of this procedure in three commonly encountered challenging settings in biological discovery or precision medicine: iterative BCor-SIS, interaction pursuit, and survival outcomes. We use simulation studies and real data analyses to illustrate the versatility and practicability of our BCor-SIS method.

16.
J Stroke Cerebrovasc Dis ; 28(9): 2517-2524, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31296477

ABSTRACT

BACKGROUND: The purpose of this study was to validate and pilot the use of the four-variable screening tool (4V) and two modified 4V tools to identify acute ischemic stroke and transient ischemic attack (TIA) patients at high risk of obstructive sleep apnea (OSA). METHODS: Two modified scales were designed: 4V-1 (using neck circumference instead of body mass index, regardless of gender) and 4V-2 (as above, but scored differently according to gender). These tools were administered to a consecutive cohort of 124 acute ischemic stroke/TIA patients, together with the original 4V, the STOP-BANG, the Berlin Questionnaire, and the Epworth Sleepiness Scale (ESS). Objective level 2 or level 3 polysomnography was used to confirm OSA and its severity. Both the questionnaires and polysomnography were completed within 1 week of symptom onset. RESULTS: The area under the curve (AUC) of the 4V was 0.807 (P < .0001), while the AUCs of the STOP-BANG, Berlin Questionnaire, and ESS were 0.701 (P < .0001), 0.704 (P < .0001), and 0.576 (P = .1556), respectively. The AUC of the 4V was greater than that of the STOP-BANG (z = 2.200, P = .0220), Berlin (z = 2.024, P = .0430), and ESS (z = 3.363, P = .0003). The AUCs of the modified 4V-1 and 4V-2 were 0.824 (P < .001) and 0.835 (P < .001), respectively. Performance of the modified 4V-2 was higher than that of the modified 4V-1 (z = 2.111, P = .0348) and higher, though not significantly so, than that of the regular 4V (z = 1.784, P = .0744). CONCLUSIONS: Neck circumference scored by gender is a useful substitute for body mass index in the 4V when screening for OSA in the early stages of ischemic stroke/TIA.
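The AUC values compared above are equivalent to the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen non-case; a minimal sketch with made-up scores:

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a
    randomly chosen case scores higher than a randomly chosen non-case
    (ties count one half). Equivalent to the Mann-Whitney U statistic,
    rescaled to [0, 1]."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

An AUC of 0.807, as reported for the 4V, means that in about 81% of case/non-case pairs the case receives the higher questionnaire score.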


Subject(s)
Brain Ischemia/diagnosis , Decision Support Techniques , Ischemic Attack, Transient/diagnosis , Neck/pathology , Sleep Apnea, Obstructive/etiology , Stroke/diagnosis , Aged , Blood Pressure , Body Mass Index , Brain Ischemia/complications , Brain Ischemia/physiopathology , Female , Humans , Ischemic Attack, Transient/complications , Ischemic Attack, Transient/physiopathology , Male , Middle Aged , Pilot Projects , Polysomnography , Predictive Value of Tests , Prognosis , Reproducibility of Results , Risk Assessment , Risk Factors , Severity of Illness Index , Sex Factors , Sleep Apnea, Obstructive/diagnosis , Sleep Apnea, Obstructive/physiopathology , Snoring/physiopathology , Stroke/complications , Stroke/physiopathology , Surveys and Questionnaires
17.
J Am Stat Assoc ; 114(528): 1787-1799, 2019.
Article in English | MEDLINE | ID: mdl-31929665

ABSTRACT

This paper addresses the challenge of efficiently capturing a high proportion of true signals for subsequent data analyses when sample sizes are relatively limited with respect to the data dimension. We propose the signal missing rate as a new measure for false negative control that accounts for the variability of the false negative proportion. Novel data-adaptive procedures are developed to control the signal missing rate without incurring many unnecessary false positives under dependence. We justify the efficiency and adaptivity of the proposed methods via theory and simulation. The proposed methods are applied to GWAS on human height to effectively remove irrelevant SNPs while retaining a high proportion of relevant SNPs for subsequent polygenic analysis.

18.
Article in English | MEDLINE | ID: mdl-32435328

ABSTRACT

In neuroimaging studies, regression models are frequently used to identify associations between imaging features and clinical outcomes, where the number of imaging features (e.g., hundreds of thousands of voxel-level predictors) far exceeds the number of subjects. Classical best subset selection or penalized variable selection methods that perform well for low- or moderate-dimensional data do not scale to ultrahigh-dimensional neuroimaging data. To reduce the dimensionality, variable screening has emerged as a powerful tool for feature selection in neuroimaging studies. We present a selective review of recent developments in ultrahigh-dimensional variable screening, with a focus on practical performance in the analysis of neuroimaging data with complex spatial correlation structures and high dimensionality. We conduct extensive simulation studies to compare the selection accuracy and computational costs of the different methods. We present analyses of resting-state functional magnetic resonance imaging data from the Autism Brain Imaging Data Exchange study. This article is categorized under: Applications of Computational Statistics > Computational and Molecular Biology; Statistical Learning and Exploratory Methods of the Data Sciences > Image Data Mining; Statistical and Graphical Methods of Data Analysis > Analysis of High Dimensional Data.

19.
Sleep Breath ; 23(3): 969-977, 2019 Sep.
Article in English | MEDLINE | ID: mdl-30448963

ABSTRACT

PURPOSE: Obstructive sleep apnea (OSA) is highly prevalent and causes serious cardiovascular complications. Several screening questionnaires for OSA have been introduced, but few validation studies have been conducted in the general population. The aim of the present study was to assess the diagnostic value of three OSA screening questionnaires (Berlin Questionnaire, BQ; STOP-Bang Questionnaire, STOP-B; Four-Variable Screening Tool, Four-V) in a Korean community sample. METHODS: A total of 1148 community-dwelling participants completed the BQ, STOP-B, and Four-V. Overnight in-laboratory polysomnography (PSG) was conducted in 116 randomly selected participants. Sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, and area under the curve (AUC) were calculated. RESULTS: The Four-V with a cutoff ≥ 8 showed high sensitivity for overall OSA (69.4%), and the Four-V with a cutoff ≥ 9 showed high specificity for both overall OSA (81.5%) and moderate to severe OSA (69.0%). On the other hand, the STOP-B showed acceptable sensitivity and specificity for both overall OSA (61.3% and 79.6%, respectively) and moderate to severe OSA (72.4% and 67.8%, respectively). The STOP-B also showed the largest area under the receiver-operator characteristic curve for both overall OSA (0.752) and moderate to severe OSA (0.750). The BQ showed the lowest performance in predicting OSA. CONCLUSIONS: Among the three questionnaires, the STOP-B emerged as the most useful screening tool for OSA in terms of sensitivity, specificity, and area under the receiver-operator characteristic curve in this South Korean population.
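The sensitivity, specificity, and likelihood-ratio summaries reported above all derive from a 2x2 table of questionnaire result against PSG diagnosis; a sketch with hypothetical counts (not the study's data):

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard diagnostic-accuracy summaries from a 2x2 table of a
    screening questionnaire against a polysomnography reference."""
    sens = tp / (tp + fn)                # sensitivity: cases detected
    spec = tn / (tn + fp)                # specificity: non-cases ruled out
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "lr_pos": sens / (1.0 - spec),   # positive likelihood ratio
        "lr_neg": (1.0 - sens) / spec,   # negative likelihood ratio
    }
```

Note that unlike sensitivity and specificity, the predictive values depend on disease prevalence in the screened sample, which is why community and clinical cohorts can yield different PPV/NPV for the same questionnaire.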


Subject(s)
Severity of Illness Index , Sleep Apnea, Obstructive/diagnosis , Surveys and Questionnaires/standards , Adult , Female , Humans , Male , Middle Aged , Polysomnography , Predictive Value of Tests , Republic of Korea , Sensitivity and Specificity , Translating
20.
J Am Stat Assoc ; 113(522): 780-788, 2018.
Article in English | MEDLINE | ID: mdl-30078921

ABSTRACT

Suppose one has a collection of parameters indexed by a (possibly infinite-dimensional) set. Given data generated from some distribution, the objective is to estimate the maximal parameter in this collection evaluated at the distribution that generated the data. This estimation problem is typically non-regular when the maximizing parameter is non-unique, and as a result standard asymptotic techniques generally fail in this case. We present a technique for developing parametric-rate confidence intervals for the quantity of interest in these non-regular settings. We show that our estimator is asymptotically efficient when the maximizing parameter is unique, so that regular estimation is possible. We apply our technique to a recent example from the literature in which one wishes to report the maximal absolute correlation between a prespecified outcome and one of p predictors. The simplicity of our technique enables an analysis of the previously open case where p grows with sample size; specifically, we only require that log p grow slower than n, where n is the sample size. We show that, unlike earlier approaches, our method scales to massive datasets: the point estimate and confidence intervals can be constructed in O(np) time.
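The O(np) cost of the point estimate is easy to see: one O(n) correlation per predictor, scanned once over the p columns (our own illustration, not the paper's code):

```python
import math

def max_abs_correlation(columns, y):
    """Maximal absolute Pearson correlation between the outcome y and any
    of the predictor columns. One O(n) pass per column gives O(np) total.
    Returns (best_column_index, best_absolute_correlation)."""
    n = len(y)
    my = sum(y) / n
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    best_j, best = -1, -1.0
    for j, col in enumerate(columns):
        mx = sum(col) / n
        sx = math.sqrt(sum((v - mx) ** 2 for v in col))
        if sx == 0 or sy == 0:
            continue  # skip degenerate columns
        c = abs(sum((a - mx) * (b - my) for a, b in zip(col, y)) / (sx * sy))
        if c > best:
            best_j, best = j, c
    return best_j, best
```

The non-regularity discussed in the abstract arises exactly when two or more columns attain (nearly) the same maximal absolute correlation, so the argmax is (nearly) non-unique.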
