Results 1 - 20 of 53
1.
Heliyon ; 10(13): e33695, 2024 Jul 15.
Article in English | MEDLINE | ID: mdl-39044968

ABSTRACT

The water quality index (WQI) is a widely used tool for comprehensive assessment of river environments. However, its calculation involves numerous water quality parameters, making sample collection and laboratory analysis time-consuming and costly. This study aimed to identify key water parameters and the most reliable prediction models that could provide maximum accuracy using minimal indicators. Water quality data from 2020 to 2023, including nine biophysical and chemical indicators, were collected from seventeen rivers in Yancheng and Nantong, two coastal cities in Jiangsu Province, China, adjacent to the Yellow Sea. Linear regression and seven machine learning models (Artificial Neural Network (ANN), Self-Organizing Maps (SOM), K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Random Forest (RF), Extreme Gradient Boosting (XGB) and Stochastic Gradient Boosting (SGB)) were developed to predict WQI using different groups of input variables based on correlation analysis. The results indicated that water quality improved from 2020 to 2022 but deteriorated in 2023, with inland stations exhibiting better conditions than coastal ones, particularly in terms of turbidity and nutrients. The water environment was comparatively better in Nantong than in Yancheng, with mean WQI values of approximately 55.3-72.0 and 56.4-67.3, respectively. The classifications "Good" and "Medium" accounted for 80 % of the records, with no instances of "Excellent" and 2 % classified as "Bad". The performance of all prediction models, except for SOM, improved with the addition of input variables, achieving R2 values higher than 0.99 in models such as SVM, RF, XGB, and SGB. The most reliable models were RF and XGB with key parameters of total phosphorus (TP), ammonia nitrogen (AN), and dissolved oxygen (DO) (R2 = 0.98 and 0.91 for training and testing phase) for predicting WQI values, and RF using TP and AN (accuracy higher than 85 %) for WQI grades. The prediction accuracy for "Medium" and "Low" water quality grades was highest at 90 %, followed by the "Good" level at 70 %. The model results could contribute to efficient water quality evaluation by identifying key water parameters and facilitating effective water quality management in river basins.

2.
Water Res ; 254: 121407, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38442609

ABSTRACT

The water body's suspended concentration reflects many coastal environmental indicators, which is important for predicting ecological hazards. The modeling of any concentration in water requires solving the settling-diffusion equation (SDE), and the values of several key input parameters therein (settling velocity ws, eddy diffusivity Ds, and erosion rates p(t)) directly determine the prediction performance. The time-consuming large-scale simulations would benefit if the parameter values could be estimated through available observations in the target sea area. The present work proposes a new optimization method for synchronously estimating the three parameters from limited concentration observations. First, an analytical solution to the one-dimensional vertical (1DV) SDE for suspended concentrations in an unsteady scenario is derived. Second, the near bottom suspended sediment concentration (SSC) profiles are measured with high-resolution observation. Third, the key parameters are optimized through the best fit of the measured SSC profiles and those modeled with the unsteady solution. Nonlinear least square fitting (NLSF) is introduced to judge the best fits automatically. The high-resolution concentration measurements in a specially-designed cylindrical tank experiment using the Yellow River Delta sediments test the proposed method. The method performs well in the initial period of turbulence generation when sediment resuspension is significant. It optimizes p(t), ws, and Ds with reasonable values and uniqueness of their combination. The proposed theory is a practical tool for quickly estimating key substance transport parameters from limited observations; it also has the potential to construct local parametric models to benefit the 3D modeling of coastal substance transport. Although the present work takes SSC as an example, it can be extended to any suspended particulate concentration in the water.
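As a rough illustration of estimating transport parameters from a concentration profile, the sketch below uses a deliberately simpler setting than the abstract: instead of the unsteady analytical solution and nonlinear least-squares fitting, it assumes a steady balance between settling and diffusion, ws*C + Ds*dC/dz = 0, whose solution is C(z) = C0*exp(-(ws/Ds)*z). The function name, the synthetic heights and the concentrations are all hypothetical.

```python
import math


def fit_steady_profile(heights, concentrations):
    """Fit ln C = ln C0 - (ws/Ds) * z by ordinary least squares.

    Assumes a steady-state exponential profile, which is NOT the unsteady
    solution used in the abstract; this is only a simplified stand-in.
    Returns (C0, ws/Ds).
    """
    n = len(heights)
    log_c = [math.log(c) for c in concentrations]
    z_mean = sum(heights) / n
    y_mean = sum(log_c) / n
    sxx = sum((z - z_mean) ** 2 for z in heights)
    sxy = sum((z - z_mean) * (y - y_mean) for z, y in zip(heights, log_c))
    slope = sxy / sxx
    c0 = math.exp(y_mean - slope * z_mean)
    return c0, -slope


# Hypothetical noise-free profile generated with C0 = 2.0 and ws/Ds = 5.0.
z_obs = [0.01, 0.02, 0.05, 0.10, 0.20]
c_obs = [2.0 * math.exp(-5.0 * z) for z in z_obs]
c0_hat, ratio_hat = fit_steady_profile(z_obs, c_obs)
print(c0_hat, ratio_hat)
```

A steady profile of this kind constrains only the ratio ws/Ds, so it cannot separate the two parameters or say anything about the erosion rate; time-resolved profiles, as described in the abstract, would be needed for that.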


Subject(s)
Geologic Sediments , Water , Rivers , Water Movements , Environmental Monitoring/methods
3.
Multivariate Behav Res ; 59(1): 62-77, 2024.
Article in English | MEDLINE | ID: mdl-37261427

ABSTRACT

Many person-fit statistics have been proposed to detect aberrant response behaviors (e.g., cheating, guessing). Among them, lz is one of the most widely used indices. The computation of lz assumes the item and person parameters are known. In reality, they often have to be estimated from data. The better the estimation, the better lz will perform. When aberrant behaviors occur, the person and item parameter estimates are inaccurate, which in turn degrades the performance of lz. In this study, an iterative procedure was developed to attain more accurate person parameter estimates for improved performance of lz. A series of simulations were conducted to evaluate the iterative procedure under two conditions of item parameters, known and unknown, and three aberrant response styles: difficulty-sharing cheating, random-sharing cheating, and random guessing. The results demonstrated the superiority of the iterative procedure over the non-iterative one in maintaining control of Type-I error rates and improving the power of detecting aberrant responses. The proposed procedure was applied to a high-stakes intelligence test.
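For readers unfamiliar with lz, the sketch below computes the standardized log-likelihood statistic for dichotomous items with known parameters. It covers only the basic statistic, not the iterative procedure proposed in the abstract. The Rasch response model, the item difficulties and the two response patterns are assumptions chosen for illustration.

```python
import math


def rasch_prob(theta, difficulty):
    """Probability of a correct response under a Rasch model (an assumption here)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic for 0/1 responses."""
    log_lik = sum(
        u * math.log(p) + (1 - u) * math.log(1 - p)
        for u, p in zip(responses, probs)
    )
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2 for p in probs)
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test taken by a person with ability theta = 0.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [rasch_prob(0.0, b) for b in difficulties]
consistent = [1, 1, 1, 0, 0]  # easy items right, hard items wrong
aberrant = [0, 0, 1, 1, 1]    # easy items wrong, hard items right
print(lz_statistic(consistent, probs), lz_statistic(aberrant, probs))
```

Large negative values flag response patterns that are unlikely under the model. In practice the probabilities come from estimated parameters, which is the source of the degradation the abstract addresses.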


Subject(s)
Psychometrics , Humans , Psychometrics/methods , Intelligence Tests
4.
BMC Bioinformatics ; 24(1): 362, 2023 Sep 26.
Article in English | MEDLINE | ID: mdl-37752445

ABSTRACT

BACKGROUND: The central biological clock governs numerous facets of mammalian physiology, including sleep, metabolism, and immune system regulation. Understanding gene regulatory relationships is crucial for unravelling the mechanisms that underlie various cellular biological processes. While it is possible to infer circadian gene regulatory relationships from time-series gene expression data, relying solely on correlation-based inference may not provide sufficient information about causation. Moreover, gene expression data often have high dimensions but a limited number of observations, posing challenges in their analysis. METHODS: In this paper, we introduce a new hybrid framework, referred to as Circadian Gene Regulatory Framework (CGRF), to infer circadian gene regulatory relationships from gene expression data of rats. The framework addresses the challenges of high-dimensional data by combining the fuzzy C-means clustering algorithm with dynamic time warping distance. Through this approach, we efficiently identify the clusters of genes related to the target gene. To determine the significance of genes within a specific cluster, we employ the Wilcoxon signed-rank test. Subsequently, we use a dynamic vector autoregressive method to analyze the selected significant gene expression profiles and reveal directed causal regulatory relationships based on partial correlation. CONCLUSION: The proposed CGRF framework offers a comprehensive and efficient solution for understanding circadian gene regulation. Circadian gene regulatory relationships are inferred from the gene expression data of rats based on the Aanat target gene. The results show that genes Pde10a, Atp7b, Prok2, Per1, Rhobtb3 and Dclk1 stand out, which have been known to be essential for the regulation of circadian activity. The potential relationships between genes Tspan15, Eprs, Eml5 and Fsbp with a circadian rhythm need further experimental research.
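The dynamic time warping (DTW) distance mentioned in the METHODS can be sketched with the classic dynamic-programming recurrence. This is a generic textbook version with an absolute-difference local cost, not the CGRF implementation; the function name and the toy expression profiles are hypothetical.

```python
def dtw_distance(series_a, series_b):
    """Classic O(n*m) dynamic time warping distance with |a - b| local cost."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost aligning the first i and j points
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # stretch series_b
                cost[i][j - 1],      # stretch series_a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Two hypothetical profiles with the same shape but a delayed onset.
profile_a = [0, 1, 2, 1, 0]
profile_b = [0, 0, 1, 2, 1, 0]
print(dtw_distance(profile_a, profile_b))
```

Because DTW tolerates shifts and stretches in time, profiles that peak at slightly different phases can still be treated as similar, which is presumably why it is paired with clustering for circadian time series.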


Subject(s)
Gene Expression Profiling , Gene Expression Regulation , Rats , Animals , Gene Expression Profiling/methods , Transcription Factors/metabolism , Algorithms , Circadian Rhythm/genetics , Gene Expression , Mammals/genetics
5.
J Hazard Mater ; 446: 130744, 2023 Mar 15.
Article in English | MEDLINE | ID: mdl-36630874

ABSTRACT

Effective and selective removal of 99TcO4-, one of the most troublesome radionuclides in nuclear waste, is highly desirable but remains a significant challenge. Herein, two isostructural MOFs, NCU-3-X (X = Cl, Br), were constructed from ZnX2 coordinated to the nitrogen-containing neutral ligand tri(4-(1H-imidazole-1-yl) phenyl) amine for efficient adsorption of ReO4-/TcO4-. Owing to the twofold interpenetrating structure, both of them exhibit strong alkaline resistance. Consequently, NCU-3-Br exhibited superior adsorption performance with a maximum capacity as high as 483 mg/g, which is 2.23 times larger than that of NCU-3-Cl. The primary reason for the enhanced adsorption performance of NCU-3-Br is that, compared with chlorine atoms, the lower electronegativity of bromine atoms as halogen-bond donors can facilitate the formation of σ-holes, enhance the positive charge of the skeleton, and reduce the adsorption energy associated with ReO4-/TcO4-. In addition, the one-dimensional hydrophobic channels in the NCU-3-Br framework make NCU-3-Br highly selective toward ReO4-, which has a low relative charge density compared with interfering ions. The SRS simulation removal experiment further confirmed the excellent adsorption capacity of NCU-3-Br for ReO4-/TcO4-. This work illustrates that a new halogenation strategy, incorporating different halogen atoms into MOF skeletons, can dramatically modulate the adsorption performance for ReO4-/TcO4-.

6.
Sci Rep ; 13(1): 1015, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36653488

ABSTRACT

China implemented a strict lockdown policy to prevent the spread of COVID-19 in the worst-affected regions, including Wuhan and Shanghai. This study aims to investigate the impact of these lockdowns on the air quality index (AQI) using a deep learning framework. In addition to historical pollutant concentrations and meteorological factors, we incorporate social and spatio-temporal influences in the framework. In particular, spatial autocorrelation (SAC), which combines temporal autocorrelation with spatial correlation, is adopted to reflect the influence of neighbouring cities and historical data. Our deep learning analysis estimated the lockdown effects as -25.88 in Wuhan and -20.47 in Shanghai. The corresponding prediction errors are reduced by about 47% for Wuhan and by 67% for Shanghai, which enables much more reliable AQI forecasts for both cities.


Subject(s)
Air Pollutants , Air Pollution , COVID-19 , Deep Learning , Humans , Air Pollutants/analysis , COVID-19/epidemiology , COVID-19/prevention & control , Particulate Matter/analysis , Pandemics/prevention & control , China/epidemiology , Communicable Disease Control , Air Pollution/analysis , Cities , Spatial Analysis , Environmental Monitoring
7.
Ann Bot ; 131(1): 11-16, 2023 02 07.
Article in English | MEDLINE | ID: mdl-35291007

ABSTRACT

BACKGROUND: Polyploids are common in flowering plants and they tend to have more expanded ranges of distributions than their diploid progenitors. Possible mechanisms underlying polyploid success have been intensively investigated. Previous studies showed that polyploidy generates novel changes and that subgenomes in allopolyploid species often differ in gene number, gene expression levels and levels of epigenetic alteration. It is widely believed that such differences are the results of conflicts among the subgenomes. These differences have been treated by some as subgenome dominance, and it is claimed that the magnitude of subgenome dominance increases in polyploid evolution. SCOPE: In addition to changes which occurred during evolution, differences between subgenomes of a polyploid species may also be affected by differences between the diploid donors and changes which occurred during polyploidization. The variable genome components in many plant species are extensive, which would result in exaggerated differences between a subgenome and its progenitor when a single genotype or a small number of genotypes are used to represent a polyploid or its donors. When artificially resynthesized polyploids are used as surrogates for newly formed genotypes which have not been exposed to evolutionary selection, differences between diploid genotypes available today and those involved in the formation of the natural polyploid genotypes must also be considered. CONCLUSIONS: Contrary to the now widely held views that subgenome biases in polyploids are the results of conflicts among the subgenomes and that one of the parental subgenomes generally retains more genes which are more highly expressed, available results show that subgenome biases mainly reflect legacy from the progenitors and that they can be detected before the completion of polyploidization events. Further, there is no convincing evidence that the magnitudes of subgenome biases have significantly changed during evolution for any of the allopolyploid species assessed.


Subject(s)
Genome, Plant , Magnoliopsida , Evolution, Molecular , Polyploidy , Magnoliopsida/genetics
8.
J Hazard Mater ; 443(Pt B): 130325, 2023 02 05.
Article in English | MEDLINE | ID: mdl-36372023

ABSTRACT

Eliminating anions from radioactive nuclear waste containing 99TcO4- by rationally designing anion-scavenging materials with a high charge density and more accessible adsorption sites is of great importance. Herein, a tailor-made cationic organic polymer with a donor-acceptor (D-A) structure, namely TrDCPN, was successfully synthesized by rationally modifying the benzimidazole unit for efficiently trapping perrhenate (ReO4-) as a 99Tc surrogate. Systematic control of the skeleton effect enables the material to integrate a variety of features, surmounting the long-term challenge of 99TcO4-/ReO4- remediation under extreme conditions of high acid/base and high ionic strength. Furthermore, TrDCPN shows excellent affinity toward ReO4- in the presence of a large excess of competitive anions (SO42-, NO3-, PO43-, etc.) as well as promising reusability for trapping ReO4-. The excellent stability and separation derive from the introduction of large conjugated modules, a triazine core and hydrophobicity. More importantly, the synthesized cationic organic polymer with the D-A feature provided the first evidence that introducing halogen can effectively enhance the backbone charge and increase the adsorption capacity through the synergy of ion exchange, electrostatic interaction and δ hole-anion interaction. The adsorption capacity of TrDCPN can reach 420.3 mg/g, with equilibrium attained within 20 min. It is noteworthy that TrDCPN successfully immobilizes ReO4- from simulated Hanford waste with a high separation efficiency of 93 %, providing a new paradigm for material design to address the problem of radioactive pollutants in the environment.


Subject(s)
Halogens , Radioactive Waste , Polymers , Cations , Adsorption , Ion Exchange
9.
Heliyon ; 8(11): e11474, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36411891

ABSTRACT

Centrality has always been used in transportation networks to estimate the status and importance of a node, especially in shipping networks. However, most studies either treat the shipping network as an unweighted network or consider only the tie weights in weighted networks, ignoring the fact that both the number of ties and the tie weights contribute to centrality in weighted shipping networks. Therefore, we propose a new method combining both the number of ties and the tie weights to assess node centrality based on effective distance by integrating the studies of Opsahl et al. (2010) and Du et al. (2015). An empirical analysis of the shipping network at the country level for the 21st-century Maritime Silk Road (MSR) was performed. The correlation analysis between each country's degree centrality and the Liner Shipping Connectivity Index (LSCI) published by the United Nations Conference on Trade and Development (UNCTAD) demonstrated the superiority of our method over traditional centrality metrics. In weighted networks, both the number of ties and the tie weights should be considered by adjusting the parameters. The method proposed in this study can also be used to estimate the status and importance of nodes in various networks in other fields.
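The trade-off between the number of ties and the tie weights can be illustrated with the degree centrality of Opsahl et al. (2010), which is k^(1 - alpha) * s^alpha for a node with k ties of total weight s. The sketch below covers only that ingredient; the effective-distance component attributed to Du et al. (2015) is not reproduced, and the port data are hypothetical.

```python
def opsahl_degree_centrality(tie_weights, alpha):
    """Weighted degree centrality k**(1 - alpha) * s**alpha.

    alpha = 0 counts ties only; alpha = 1 uses total tie weight only.
    """
    k = len(tie_weights)
    if k == 0:
        return 0.0
    s = sum(tie_weights)
    return (k ** (1.0 - alpha)) * (s ** alpha)


# Two hypothetical nodes with the same total weight but different tie counts.
many_weak_ties = [2, 2, 2, 2]
few_strong_ties = [4, 4]
for alpha in (0.0, 0.5, 1.0):
    print(
        alpha,
        opsahl_degree_centrality(many_weak_ties, alpha),
        opsahl_degree_centrality(few_strong_ties, alpha),
    )
```

With alpha below 1 the node with more ties ranks higher even though both nodes carry the same total weight, which is the kind of parameter adjustment the abstract refers to.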

10.
J Appl Stat ; 49(14): 3677-3692, 2022.
Article in English | MEDLINE | ID: mdl-36246863

ABSTRACT

Variable selection is fundamental to high dimensional statistical modeling, and many approaches have been proposed. However, existing variable selection methods do not perform well in the presence of outliers in the response variable and/or covariates. In order to ensure a high probability of correct selection and efficient parameter estimation, we investigate a robust variable selection method based on a modified Huber's function with an exponential squared loss tail. We also prove that the proposed method has oracle properties. Furthermore, we carry out simulation studies to evaluate the performance of the proposed method for both p < n and p > n. Our simulation results indicate that the proposed method is efficient and robust against outliers and heavy-tailed distributions. Finally, a real dataset from an air pollution mortality study is used to illustrate the proposed method.

11.
Sci Rep ; 12(1): 13867, 2022 08 16.
Article in English | MEDLINE | ID: mdl-35974067

ABSTRACT

In environmental monitoring, multiple spatial variables are often sampled at a geographical location, and they can depend on each other in complex ways, such as through non-linear and non-Gaussian spatial dependence. We propose a new mixture copula model that can capture these complex relationships among spatially correlated variables and predict univariate variables while considering the multivariate spatial relationship. The proposed method is demonstrated using an environmental application and compared with three existing methods. First, the prediction of individual variables using the multivariate spatial copula is compared with the existing univariate pair copula method. Second, prediction using the mixture copula in the multivariate spatial copula framework is compared with an existing multivariate spatial copula model that uses non-linear principal component analysis. Last, the prediction of individual variables using the non-linear non-Gaussian multivariate spatial copula model is compared with the linear Gaussian multivariate cokriging model. The results show that the proposed spatial mixture copula model outperforms the existing methods in the cross-validation of actual and predicted values at the sampled locations.


Subject(s)
Spatial Analysis , Principal Component Analysis
12.
PLoS One ; 17(8): e0271457, 2022.
Article in English | MEDLINE | ID: mdl-36001585

ABSTRACT

Many studies have considered temperature trends at the global scale, but the literature commonly focuses on an overall increase in mean temperature over a defined past period and hence lacks in-depth analysis of the latent trends. For example, in addition to heterogeneity in mean and median values, daily temperature data often exhibit quasi-periodic heterogeneity in variance, which has largely been overlooked in climate research. To this end, we propose a joint model of quantile regression and variability. By accounting appropriately for the heterogeneity in these types of data, our analysis using Australian data reveals that daily maximum temperature is warming by ∼0.21°C per decade and daily minimum temperature by ∼0.13°C per decade. More interestingly, our modeling also shows nuanced patterns of change over space and time depending on location, season, and the percentiles of the temperature series.
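Quantile regression rests on the check (pinball) loss, which penalizes under- and over-prediction asymmetrically according to the target percentile. The sketch below shows only this loss and the fact that minimizing it over a constant recovers an empirical quantile; it is not the joint quantile-variability model of the abstract, and the temperature values are made up.

```python
def pinball_loss(residual, tau):
    """Check loss for quantile level tau; residual = observed - predicted."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_by_loss(values, tau):
    """Return the observed value that minimizes the total pinball loss."""
    return min(
        values,
        key=lambda q: sum(pinball_loss(v - q, tau) for v in values),
    )


# Hypothetical daily maxima (degrees C) with one extreme day.
temps = [21.0, 22.0, 23.0, 24.0, 40.0]
print(quantile_by_loss(temps, 0.5))  # the median, unaffected by the extreme day
print(quantile_by_loss(temps, 0.9))  # an upper percentile
```

Replacing the constant with a linear function of time and covariates gives a quantile regression, which is how trends can differ across percentiles of the same series.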


Subject(s)
Climate Change , Australia , Regression Analysis , Seasons , Spatio-Temporal Analysis , Temperature
13.
Water Res ; 218: 118518, 2022 Jun 30.
Article in English | MEDLINE | ID: mdl-35526355

ABSTRACT

In-situ monitoring of water quality (suspended sediment concentration, SSC) and concurrent hydrodynamics was conducted in the subaqueous Yellow River Delta in China. Empirical mode decomposition and spectral analysis of the SSC time series reveal the different periodicities of each physical mechanism that contributes to the SSC variations. Based on this physical understanding, the decomposed SSC time series were trained separately with a newly-proposed augmented lncosh ridge regression, in which (1) a lncosh function was incorporated in traditional ridge regression for handling outliers in the original data, and (2) the temporal auto-correlation in the decomposed SSC series was used for augmented regression. Finally, the trained sub-series were added up as the final prediction. The advantage of this decomposition-ensemble framework is that it depends on SSC only, which makes it superior to the usual process-based models that need the concurrent hydrodynamics for estimating bed shear stress. This will not only reduce the measurement uncertainties of the input when training the data-driven model, but also save prediction cost, as no parameters other than SSC need to be measured and input for running the model. The framework realized 6-hour-ahead high-accuracy forecasting with mean relative errors of 5.80-9.44% in the present case study. The proposed framework can be extended to forecast any signal that is a superposition of components with various timescales (periodicities), which is common in nature.
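To see why a lncosh term helps with outliers, note that ln(cosh(r)) behaves like r^2/2 for small residuals and like |r| - ln 2 for large ones. The sketch below gives a numerically stable evaluation and one plausible form of a lncosh ridge objective; the exact objective, scaling and augmentation used in the paper are not stated in the abstract, so the function `lncosh_ridge_objective` and its arguments are assumptions.

```python
import math


def lncosh(residual):
    """Stable ln(cosh(r)): |r| + ln(1 + exp(-2|r|)) - ln 2, no overflow for large |r|."""
    a = abs(residual)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def lncosh_ridge_objective(beta, rows, targets, lam):
    """Assumed objective: sum of lncosh residuals plus lam * ||beta||^2."""
    data_term = 0.0
    for x, y in zip(rows, targets):
        prediction = sum(b * xi for b, xi in zip(beta, x))
        data_term += lncosh(y - prediction)
    return data_term + lam * sum(b * b for b in beta)


# Quadratic near zero, linear in the tails:
print(lncosh(0.01), 0.01 ** 2 / 2)
print(lncosh(1000.0), 1000.0 - math.log(2.0))
```

Compared with a squared loss, a single extreme SSC spike contributes roughly its absolute size rather than its square, so it pulls the fitted coefficients far less.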


Subject(s)
Rivers , Water Quality , Environmental Monitoring , Forecasting , Geologic Sediments/analysis , Physics
14.
Animals (Basel) ; 12(2)2022 Jan 15.
Article in English | MEDLINE | ID: mdl-35049823

ABSTRACT

Selecting the minimal best subset out of a huge number of factors influencing the response is a fundamental and very challenging NP-hard problem: the presence of many redundant genes easily results in over-fitting, missing an important gene can have an even more detrimental impact on predictions, and exhaustive search is computationally prohibitive. We propose a modified memetic algorithm (MA) based on an improved splicing method to overcome the limitations of the traditional genetic algorithm in exploitation capability and in dimension reduction of the predictor variables. The new algorithm accelerates the search for the minimal best subset of genes by incorporating the improved splicing method into a new local search operator. The improvement is also due to two other novel aspects: (a) updating subsets of genes iteratively until splicing yields no further reduction in the loss function, which increases the probability of selecting the true subsets of genes; and (b) introducing add and del operators based on backward sacrifice into the splicing method to limit the size of gene subsets. Moreover, the mutation operator is replaced by the improved splicing method to enhance exploitation capability, and the initial individuals are improved by it to enhance search efficiency. A dataset of the body weight of Hu sheep was used to evaluate the superiority of the modified MA against the genetic algorithm. According to our experimental results, our proposed optimizer can obtain a better minimal subset of genes with a few iterations, compared with all considered algorithms including the most advanced adaptive best-subset selection algorithm.

15.
Stat Med ; 40(30): 6835-6854, 2021 12 30.
Article in English | MEDLINE | ID: mdl-34619808

ABSTRACT

This article proposes a new robust smooth-threshold estimating equation to select important variables and automatically estimate parameters for high dimensional longitudinal data. A novel working correlation matrix is proposed to capture correlations within the same subject. The proposed procedure works well when the number of covariates pn increases as the number of subjects n increases. The proposed estimates are competitive with the estimates obtained with the true correlation structure, especially when the data are contaminated. Moreover, the proposed method is robust against outliers in the response variables and/or covariates. Furthermore, the oracle properties for robust smooth-threshold estimating equations under "large n, diverging pn" are established under some regularity conditions. Extensive simulation studies and a yeast cell cycle dataset are used to evaluate the performance of the proposed method, and the results show that the proposed method is competitive with existing robust variable selection procedures.


Subject(s)
Data Analysis , Models, Statistical , Computer Simulation , Humans , Research Design
16.
Lifetime Data Anal ; 27(4): 679-709, 2021 10.
Article in English | MEDLINE | ID: mdl-34215947

ABSTRACT

In medical studies, the collected covariates contain underlying outliers. For clustered/longitudinal data with censored observations, the traditional Gehan-type estimator is robust to outliers in the response but sensitive to outliers in the covariate domain, and it also ignores the within-cluster correlations. To take account of within-cluster correlations, varying cluster sizes, and outliers in covariates, we propose weighted Gehan-type estimating functions for parameter estimation in the accelerated failure time model for clustered data. We provide the asymptotic properties of the resulting estimators and carry out simulation studies to evaluate the performance of the proposed method under a variety of realistic settings. The simulation results demonstrate that the proposed method is robust to outliers in the covariate domain and leads to much more efficient estimators when a strong within-cluster correlation exists. Finally, the proposed method is applied to two medical datasets, yielding more reliable and convincing results.


Subject(s)
Research Design , Causality , Computer Simulation , Humans
17.
Stat Methods Med Res ; 30(8): 1800-1815, 2021 08.
Article in English | MEDLINE | ID: mdl-33975508

ABSTRACT

In robust regression, it is usually assumed that the distribution of the error term is symmetric or the data are symmetrically contaminated by outliers. However, this assumption is usually not satisfied in practical problems, and thus if the traditional robust methods, such as Tukey's biweight and Huber's method, are used to estimate the regression parameters, the efficiency of the parameter estimation can be lost. In this paper, we construct an asymmetric Tukey's biweight loss function with two tuning parameters and propose a data-driven method to find the most appropriate tuning parameters. Furthermore, we provide an adaptive algorithm to obtain robust and efficient parameter estimates. Our extensive simulation studies suggest that the proposed method performs better than the symmetric methods when error terms follow an asymmetric distribution or are asymmetrically contaminated. Finally, a cardiovascular risk factors dataset is analyzed to illustrate the proposed method.
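The standard Tukey biweight loss is rho_c(r) = (c^2/6) * (1 - (1 - (r/c)^2)^3) for |r| < c and c^2/6 otherwise. One simple way to make it asymmetric is to use a separate tuning constant on each side of zero. The sketch below does exactly that, purely as an assumption: the abstract does not give the authors' loss, and the names `c_neg` and `c_pos` are hypothetical.

```python
def tukey_biweight(residual, c):
    """Standard Tukey biweight loss with tuning constant c."""
    if abs(residual) >= c:
        return c * c / 6.0
    u = 1.0 - (residual / c) ** 2
    return (c * c / 6.0) * (1.0 - u ** 3)


def asymmetric_biweight(residual, c_neg, c_pos):
    """Assumed asymmetric variant: c_neg for negative residuals, c_pos otherwise."""
    return tukey_biweight(residual, c_neg if residual < 0 else c_pos)


# A tight constant on the negative side caps the penalty of negative
# outliers early, while positive residuals keep a wider quadratic zone.
for r in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(r, asymmetric_biweight(r, c_neg=2.0, c_pos=5.0))
```

Choosing the two constants from the data, as the abstract describes, would let the loss adapt to which tail is contaminated; how that selection is done is not shown here.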


Subject(s)
Algorithms , Research Design , Computer Simulation
18.
Genome ; 64(9): 847-856, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33661713

ABSTRACT

Subgenome asymmetry (SA) has routinely been attributed to different responses between the subgenomes of a polyploid to various stimuli during evolution. Here, we compared subgenome differences in gene ratio and relative diversity between artificial and natural genotypes of several allopolyploid species. Surprisingly, consistent differences were not detected between these two types of polyploid genotypes, although they differ in the time exposed to evolutionary selection. The estimated ratio of shared genes between a subgenome and its diploid donor was invariably higher for the artificial allopolyploid genotypes than for the natural genotypes, which is expected as it is now well-known that many genes in a species are not shared among all individuals. As the exact diploid parent for a given subgenome is unknown, the estimated ratios of shared genes for the natural genotypes would also include differences among individual genotypes of the diploid donor species. Further, we detected the presence of SA in genotypes before the completion of the polyploidization events as well as in those which were not formed via polyploidization. These results indicate that SA may, to a large degree, reflect differences between its diploid donors or that changes that occurred during polyploid evolution are defined by their donor genomes.


Subject(s)
Diploidy , Genome, Plant , Polyploidy , Arabidopsis , Brassica , Gossypium , Triticum
19.
Brief Bioinform ; 22(3)2021 05 20.
Article in English | MEDLINE | ID: mdl-32382739

ABSTRACT

Reversible post-translational modification (PTM) orchestrates various biological processes by changing the properties of proteins. Since many proteins are multiply modified by PTMs, identification of PTM crosstalk sites has emerged as an intriguing topic and attracted much attention. In this study, we systematically deciphered the in situ crosstalk of ubiquitylation and SUMOylation that co-occurs on the same lysine residue. We first collected 3363 ubiquitylation-SUMOylation (UBS) crosstalk sites on 1302 proteins and then investigated the prime sequence motifs, the local evolutionary degree and the distribution of structural annotations at the residue and sequence levels between the UBS crosstalk and the single modification sites. Given the properties of UBS crosstalk sites, we developed the mUSP classifier to predict UBS crosstalk sites by integrating different types of features with two-step feature optimization using a recursive feature elimination approach. Across various cross-validations, the mUSP model achieved an average area under the curve (AUC) value of 0.8416, indicating its promising accuracy and robustness. By comparison, mUSP performs significantly better, with improvements of 38.41 and 51.48% in AUC values compared to the cross-results by the previous single predictor. The mUSP was implemented as a web server available at http://bioinfo.ncu.edu.cn/mUSP/index.html to facilitate the query of our high-accuracy UBS crosstalk results for experimental design and validation.


Subject(s)
Protein Processing, Post-Translational , Proteome/metabolism , Amino Acids/metabolism , Biological Evolution , Humans , Sumoylation , Ubiquitination
20.
Stat Methods Med Res ; 29(12): 3641-3652, 2020 12.
Article in English | MEDLINE | ID: mdl-32662336

ABSTRACT

A robust approach is often desirable in the presence of outliers for more efficient parameter estimation. However, the choice of the regularization parameter value impacts the efficiency of the parameter estimators. To maximize the estimation efficiency, we construct a likelihood function for simultaneously estimating the regression parameters and the tuning parameter. The "working" likelihood function is deemed a vehicle for efficient regression parameter estimation, because we do not assume the data are generated from this likelihood function. The proposed method can effectively find a value of the regularization parameter based on the extent of contamination in the data. We carry out extensive simulation studies in a variety of cases to investigate the performance of the proposed method. The simulation results show that the efficiency can be enhanced by as much as 40% when the data follow a heavy-tailed distribution, and by as much as 468% for the heteroscedastic variance cases, compared to the traditional Huber's method with a fixed regularization parameter. For illustration, we also analyzed two datasets: one from a diabetics study and the other from a mortality study.
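The baseline that the abstract compares against, Huber's method with a fixed tuning constant, can be sketched for the simplest case of estimating a location parameter. The code below uses iteratively reweighted averaging with the conventional constant c = 1.345 and assumes the data are on a unit scale; it does not implement the working-likelihood selection of the tuning parameter, and the sample values are hypothetical.

```python
def huber_location(values, c=1.345, iterations=50):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Assumes unit scale; c is held fixed, unlike the data-driven choice
    described in the abstract.
    """
    estimate = sorted(values)[len(values) // 2]  # start near the median
    for _ in range(iterations):
        weights = [
            1.0 if abs(v - estimate) <= c else c / abs(v - estimate)
            for v in values
        ]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


# Hypothetical sample with one gross outlier.
sample = [0.1, -0.2, 0.0, 0.3, -0.1, 50.0]
print(sum(sample) / len(sample))  # the mean is dragged toward the outlier
print(huber_location(sample))     # the Huber estimate stays near the bulk
```

A smaller c gives more protection against outliers but less efficiency on clean data, which is the trade-off a data-driven choice of the tuning parameter is meant to resolve.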


Subject(s)
Likelihood Functions , Computer Simulation