1.
Braz. J. Pharm. Sci. (Online) ; 59: e22373, 2023. tab, graph
Article in English | LILACS | ID: biblio-1439538

ABSTRACT

Quantitative Structure-Activity Relationship (QSAR) is a computer-aided technique in medicinal chemistry that seeks to clarify the relationships between molecular structures and their biological activities. Such techniques can accelerate the development of new compounds by reducing the costs of drug design. This work presents 3D-QSARpy, a flexible, user-friendly and robust tool, freely available without registration, that supports the automated generation of 3D-QSAR models. The user only needs to provide aligned molecular structures and the corresponding dependent variable. The current version was developed in Python with packages such as scikit-learn and includes various machine learning techniques for regression. The diversity of techniques employed by the tool distinguishes it from established methodologies such as CoMFA and CoMSIA, because it expands the search space of possible solutions and thereby increases the chances of obtaining relevant models. Approaches for variable selection (dimension reduction) were also implemented in the tool. To evaluate its potential, experiments were carried out comparing results obtained with 3D-QSARpy against results from previously published works. The results demonstrate that 3D-QSARpy is highly useful in the field, given the strength of its results.


Subject(s)
Drug Design , Quantitative Structure-Activity Relationship , Machine Learning/classification , Costs and Cost Analysis/classification , Health Services Needs and Demand/classification
2.
Acta Pharmaceutica Sinica ; (12): 1041-1048, 2023.
Article in Chinese | WPRIM | ID: wpr-978751

ABSTRACT

The mannitol-calcium chloride metal-organic framework (MOF) cocrystal significantly improves the tabletability of β-mannitol and could be developed as a new tablet filler. However, mannitol monomer was found in the product during scale-up production of the excipient, which significantly affected its functional properties. This study aimed to quantify the components of the multi-component mannitol-calcium chloride cocrystal system. The MOF cocrystal excipient mannitol-calcium chloride was used as the model compound, and its infrared spectra were collected. Based on partial least squares regression (PLSR), abnormal bands were removed and the spectra were preprocessed by normalization. Quantitative calibration models for the mannitol-calcium chloride MOF cocrystal content of the excipient were established and compared using two variable selection methods, the genetic algorithm (GA) and competitive adaptive reweighted sampling (CARS), which selected 160 and 14 variables, respectively. The model established by CARS-PLSR performed best: its mean relative error (MRE) and root mean square error of calibration (RMSEC) were 0.0088 and 0.8925, respectively, and its coefficient of determination (R²) increased from 0.9783 to 0.9944. The quantitative method established in this study offers high prediction accuracy, fast detection and good stability, which is of great significance for optimizing the preparation conditions and quality control methods of such cocrystal excipients.

3.
Journal of Xi'an Jiaotong University(Medical Sciences) ; (6): 628-632, 2021.
Article in Chinese | WPRIM | ID: wpr-1006702

ABSTRACT

【Objective】 To compare the performance of five commonly used variable selection methods for screening variables in high-dimensional biomedical data, to explore how sample size and the association among candidate variables affect screening results, and to provide evidence for developing variable selection strategies in high-dimensional biomedical data analysis. 【Methods】 The variable selection algorithms were implemented in the R programming language. The Monte Carlo method was used to simulate high-dimensional biomedical data under different conditions, and the performance of the methods was evaluated and compared using the true positive rate and true negative rate of screening. 【Results】 For the specified high-dimensional data, the variable selection performance of all methods improved as sample size increased, and the association between candidate variables affected the screening results. The simulations indicated that the elastic net yielded the best screening performance, followed by LASSO, while ridge regression, which shrinks coefficients without setting any of them to zero, failed to screen variables at all. 【Conclusion】 The elastic net is an ideal method for variable screening in high-dimensional data.
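A small Monte Carlo comparison in the spirit of this design might look like the sketch below. It is written in Python rather than the authors' R code, and the simulation settings (AR(1)-correlated predictors with correlation `rho`, `k` true effects of size 1, unit noise) and helper names are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV


def simulate(n, p, k, rho, rng):
    """Correlated predictors, corr(x_i, x_j) = rho ** |i - j|; first k are causal."""
    idx = np.arange(p)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    beta = np.zeros(p)
    beta[:k] = 1.0
    y = X @ beta + rng.normal(size=n)
    return X, y, beta != 0


def tpr_tnr(selected, truth):
    """True positive rate and true negative rate of a selection mask."""
    tpr = np.sum(selected & truth) / np.sum(truth)
    tnr = np.sum(~selected & ~truth) / np.sum(~truth)
    return tpr, tnr


def compare(n=100, p=200, k=10, rho=0.5, seed=0):
    rng = np.random.default_rng(seed)
    X, y, truth = simulate(n, p, k, rho, rng)
    models = {
        "LASSO": LassoCV(cv=5, random_state=seed),
        "ElasticNet": ElasticNetCV(l1_ratio=0.5, cv=5, random_state=seed),
        "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 13)),
    }
    out = {}
    for name, model in models.items():
        model.fit(X, y)
        selected = np.abs(model.coef_) > 1e-8  # "selected" = non-zero coefficient
        out[name] = tpr_tnr(selected, truth)
    return out
```

Because ridge never sets a coefficient exactly to zero, it "selects" every variable, giving a TPR of 1 and a TNR near 0; repeating `compare` over several seeds, sample sizes and `rho` values mirrors the study's design.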

4.
Acta Pharmaceutica Sinica ; (12): 138-143, 2019.
Article in Chinese | WPRIM | ID: wpr-778673

ABSTRACT

Near-infrared spectroscopy (NIRS) combined with chemometrics enables rapid detection in process analysis. Variable selection effectively removes redundant information and retains the characteristic variables related to the response values; compared with a global (full-spectrum) model, model complexity is significantly reduced and prediction accuracy is improved. In this study, NIRS combined with different variable selection methods was applied to the rapid detection of baicalin during the extraction of Scutellaria baicalensis. Data sets were divided using the sample set partitioning based on joint x-y distance (SPXY) method. Competitive adaptive reweighted sampling (CARS), random frog (RF) and the successive projections algorithm (SPA) were applied for variable selection, partial least squares (PLS) models were constructed on the variables selected by each method, and their prediction results were compared. CARS, RF and SPA selected 92, 10 and 17 variables, respectively. Based on model performance, CARS was more effective and suitable than RF and SPA, and the characteristic variables it selected corresponded better to the chemical structure of baicalin. The root mean square errors of the calibration set (RMSEC) and the prediction set (RMSEP) were 0.5282 and 0.7202, respectively. Compared with the global PLS model, the correlation coefficient of the calibration set (Rc) increased from 0.9170 to 0.9799, and the relative standard error of prediction (RSEP) decreased from 10.58% to 5.59%.
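SPXY partitioning can be sketched as a Kennard-Stone selection run on a combined distance: Euclidean distances in spectral space (x) and response space (y), each divided by its maximum and then summed. The sketch below assumes that formulation; the function name `spxy` and the small random data set are illustrative only.

```python
import numpy as np


def spxy(X, y, n_cal):
    """Return (calibration_idx, prediction_idx) using the SPXY criterion."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(len(X), -1)
    dx = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dy = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    d = dx / dx.max() + dy / dy.max()  # joint x-y distance, each part scaled to [0, 1]
    # Kennard-Stone seeding: start from the two most distant samples.
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_cal:
        # Distance of each candidate to its nearest already-selected sample.
        min_d = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
y = rng.normal(size=30)
cal, pred = spxy(X, y, 20)
```

Including y in the distance spreads the calibration set over the response range as well as the spectral space, which plain Kennard-Stone does not guarantee.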

5.
Chinese Journal of Analytical Chemistry ; (12): 136-142, 2018.
Article in Chinese | WPRIM | ID: wpr-664883

ABSTRACT

Near-infrared spectroscopy (NIR) is widely used in quantitative and qualitative food analysis. With the development of chemometrics, variable selection has become a critical step in spectral modeling. In this study, a novel variable selection strategy, automatic weighting variable combination population analysis (AWVCPA), was proposed. First, a binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, was used to produce a population of variable subsets from which a population of sub-models was built. Then, two information vectors (IVs), the variable frequency (Fre) and the partial least squares regression coefficients (Reg), were weighted to obtain the contribution value of each spectral variable, so that the influence of both IVs on each variable was taken into account. Finally, an exponentially decreasing function (EDF) was used to remove wavelengths with low contributions and thereby select the characteristic variables. Prediction models based on partial least squares (PLS) were established for near-infrared spectra of beer and corn. Compared with other variable selection methods under the same conditions, AWVCPA proved to be the best variable selection strategy. On the beer dataset, AWVCPA-PLS improved on PLS by 72.7%, with the root mean square error of prediction (RMSEP) decreasing from 0.5348 to 0.1457; on the corn dataset, the improvement was 64.7%, with the RMSEP decreasing from 0.0702 to 0.0248.
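The building blocks named above can be sketched as follows. Binary matrix sampling is shown in its usual form (each column holds the same number of ones, so every variable enters the same number of sub-models), and the EDF is assumed to take the form used in CARS, r_i = a·exp(−k·i) with r_1 = 1 and r_N = 2/P. How AWVCPA actually weights Fre against Reg is not given in the abstract, so `combine_ivs` and its weight `w` are hypothetical.

```python
import numpy as np


def binary_matrix_sampling(n_vars, n_subsets, rng):
    """Rows are sub-models; each column has n_subsets // 2 ones in random rows."""
    M = np.zeros((n_subsets, n_vars), dtype=bool)
    half = n_subsets // 2
    for j in range(n_vars):
        M[rng.permutation(n_subsets)[:half], j] = True
    return M


def edf_ratio(i, n_iter, n_vars):
    """Assumed CARS-style EDF: fraction of variables kept at iteration i (1-based)."""
    a = (n_vars / 2) ** (1 / (n_iter - 1))
    k = np.log(n_vars / 2) / (n_iter - 1)
    return a * np.exp(-k * i)


def combine_ivs(fre, reg, w=0.5):
    """HYPOTHETICAL weighting: blend scaled frequency and scaled |coefficient|."""
    fre = np.asarray(fre, dtype=float)
    reg = np.abs(np.asarray(reg, dtype=float))
    return w * fre / fre.max() + (1 - w) * reg / reg.max()


def keep_top(contribution, ratio):
    """Indices of the highest-contribution variables, keeping a `ratio` share."""
    n_keep = max(1, int(round(ratio * contribution.size)))
    return np.argsort(contribution)[::-1][:n_keep]


rng = np.random.default_rng(0)
M = binary_matrix_sampling(n_vars=50, n_subsets=100, rng=rng)
```

In a full loop, each row of `M` defines a PLS sub-model, Fre and Reg are gathered from the best sub-models, and `keep_top(..., edf_ratio(i, ...))` shrinks the variable set at each iteration.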

6.
Chinese Traditional and Herbal Drugs ; (24): 3317-3321, 2017.
Article in Chinese | WPRIM | ID: wpr-852584

ABSTRACT

Objective: To determine the content of chlorogenic acid in Lonicerae Japonicae Flos by combining near-infrared (NIR) spectroscopy with variable selection methods. Methods: Synergy interval partial least squares (SIPLS), competitive adaptive reweighted sampling (CARS), variable importance in projection (VIP) and the successive projections algorithm (SPA) were used to build and compare quantitative models of chlorogenic acid in Lonicerae Japonicae Flos, with high performance liquid chromatography (HPLC) as the reference method, in order to select the optimal variable screening method. Results: SIPLS gave the best regression performance for chlorogenic acid, with R²pre of 0.9903 and RMSEP of 2.316%. Conclusion: The quantitative model of chlorogenic acid established by NIR combined with SIPLS performs well and meets the requirements of real-time analysis in traditional Chinese medicine production.

7.
Chinese Journal of Epidemiology ; (12): 679-683, 2017.
Article in Chinese | WPRIM | ID: wpr-737706

ABSTRACT

With the rapid development of genome sequencing technology and bioinformatics in recent years, it has become possible to measure thousands of omics variables that might be associated with disease progression, i.e. "high-dimensional data". A common feature of such omics data is that the number of variables p usually exceeds the number of observations n, and the independent variables are often highly correlated. Identifying truly meaningful variables from omics data is therefore a great statistical challenge. This paper summarizes Bayesian variable selection methods for the analysis of high-dimensional data.
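The core output of Bayesian variable selection is a posterior inclusion probability for each variable. For a handful of predictors this can be illustrated by enumerating every model, approximating each model's marginal likelihood with exp(−BIC/2) and using a uniform prior over models, as sketched below. This is a deliberately simplified stand-in: in the p > n setting the review addresses, enumeration is infeasible and methods such as spike-and-slab priors with MCMC are used instead.

```python
import itertools

import numpy as np


def inclusion_probabilities(X, y):
    """Posterior inclusion probabilities via BIC-weighted enumeration of all models."""
    n, p = X.shape
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    log_post, models = [], []
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            cols = list(subset)
            if cols:
                beta, *_ = np.linalg.lstsq(Xc[:, cols], yc, rcond=None)
                rss = np.sum((yc - Xc[:, cols] @ beta) ** 2)
            else:
                rss = np.sum(yc ** 2)
            bic = n * np.log(rss / n) + k * np.log(n)
            log_post.append(-0.5 * bic)  # uniform model prior assumed
            models.append(cols)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    incl = np.zeros(p)
    for weight, cols in zip(post, models):
        incl[cols] += weight  # each model votes for the variables it contains
    return incl


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=100)
incl = inclusion_probabilities(X, y)
```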

8.
Chinese Journal of Epidemiology ; (12): 679-683, 2017.
Article in Chinese | WPRIM | ID: wpr-736238

ABSTRACT

With the rapid development of genome sequencing technology and bioinformatics in recent years, it has become possible to measure thousands of omics variables that might be associated with disease progression, i.e. "high-dimensional data". A common feature of such omics data is that the number of variables p usually exceeds the number of observations n, and the independent variables are often highly correlated. Identifying truly meaningful variables from omics data is therefore a great statistical challenge. This paper summarizes Bayesian variable selection methods for the analysis of high-dimensional data.

9.
Chinese Journal of Analytical Chemistry ; (12): 1694-1702, 2017.
Article in Chinese | WPRIM | ID: wpr-666560

ABSTRACT

Near-infrared spectroscopy (NIR) was used for the quantitative detection of trans fatty acids (TFA) in edible vegetable oils, and the TFA prediction model was optimized through band selection, preprocessing, variable selection and the choice of modeling method. NIR spectra of 98 edible vegetable oil samples were collected over 4000-10000 cm⁻¹ with an Antaris Ⅱ Fourier transform near-infrared spectrometer, and the reference TFA content was measured by gas chromatography. First, the waveband and preprocessing method were optimized on the original spectra. On this basis, competitive adaptive reweighted sampling (CARS) was used to select important variables related to TFA. Finally, prediction models of TFA content in edible vegetable oils were established using principal component regression (PCR), partial least squares (PLS) and least squares support vector machine (LS-SVM). The results indicate that NIR spectroscopy is feasible for detecting TFA content in edible vegetable oils. For the best optimized model, R² in the calibration and prediction sets was 0.992 and 0.989, and the root mean square errors of calibration (RMSEC) and prediction (RMSEP) were 0.071% and 0.075%, respectively. The best model used only 26 variables, 0.854% of all variables in the full waveband. Compared with the full-waveband PLS model, R² in the prediction set increased from 0.904 to 0.989 and RMSEP decreased from 0.230% to 0.075%. Model optimization is therefore necessary: CARS effectively selects important variables related to TFA and greatly reduces the number of modeling variables, which simplifies the prediction model and considerably improves its accuracy and stability.

10.
Genomics & Informatics ; : 149-159, 2016.
Article in English | WPRIM | ID: wpr-172206

ABSTRACT

With the success of genome-wide association studies (GWASs), many candidate loci for complex human diseases have been reported in the GWAS catalog. Recently, many disease prediction models based on penalized regression or statistical learning methods have been proposed using candidate causal variants from significant single-nucleotide polymorphisms identified by GWASs. However, there have been only a few systematic studies comparing the existing methods. In this study, we first constructed risk prediction models, namely stepwise linear regression (SLR), the least absolute shrinkage and selection operator (LASSO) and Elastic-Net (EN), using a GWAS chip and the GWAS catalog. We then compared their prediction accuracy by calculating the mean square error (MSE) for body mass index on data from the Korea Association Resource (KARE). Our results show that SLR yields a smaller MSE than the other methods, while the numbers of variables selected by each model were similar.


Subject(s)
Humans , Body Mass Index , Decision Support Techniques , Genome-Wide Association Study , Korea , Learning , Linear Models
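A comparison of this kind (stepwise selection vs. LASSO vs. elastic net, judged by test-set MSE and number of selected variables) can be sketched as below. The stepwise variant here is forward selection by AIC, one of several possible SLR criteria, and the genotype matrix coded 0/1/2 with a simulated BMI-like outcome is purely synthetic; none of it reproduces the KARE analysis.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV, LassoCV, LinearRegression


def forward_stepwise(X, y):
    """Greedy forward selection by AIC on ordinary least squares."""
    n, p = X.shape

    def aic(cols):
        A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((y - A @ beta) ** 2)
        return n * np.log(rss / n) + 2 * A.shape[1]

    selected, current = [], aic([])
    while len(selected) < p:
        candidates = [c for c in range(p) if c not in selected]
        scores = [aic(selected + [c]) for c in candidates]
        best = int(np.argmin(scores))
        if scores[best] >= current:
            break  # no candidate lowers the AIC any further
        selected.append(candidates[best])
        current = scores[best]
    return selected


def compare_mse(X_train, y_train, X_test, y_test, seed=0):
    """Test-set MSE and number of selected variables for SLR, LASSO and EN."""
    out = {}
    cols = forward_stepwise(X_train, y_train)
    if cols:
        pred = LinearRegression().fit(X_train[:, cols], y_train).predict(X_test[:, cols])
    else:
        pred = np.full(len(y_test), y_train.mean())
    out["SLR"] = (float(np.mean((y_test - pred) ** 2)), len(cols))
    for name, model in {
        "LASSO": LassoCV(cv=5, random_state=seed),
        "EN": ElasticNetCV(l1_ratio=0.5, cv=5, random_state=seed),
    }.items():
        model.fit(X_train, y_train)
        mse = float(np.mean((y_test - model.predict(X_test)) ** 2))
        out[name] = (mse, int(np.sum(model.coef_ != 0)))
    return out


# Synthetic SNP genotypes (0/1/2) and a BMI-like outcome driven by three SNPs.
rng = np.random.default_rng(0)
G = rng.integers(0, 3, size=(400, 30)).astype(float)
bmi = 23.0 + 0.8 * G[:, 0] - 0.6 * G[:, 1] + 0.4 * G[:, 2] + rng.normal(size=400)
results = compare_mse(G[:300], bmi[:300], G[300:], bmi[300:])
```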
11.
Article in English | IMSEAR | ID: sea-155182

ABSTRACT

Background & objectives: Physicians’ satisfaction or dissatisfaction with their job is an important factor in health services, which deal with human life. This study was conducted to ascertain the overall level and proportion of physicians’ job satisfaction and to identify the components that influence it. Method: A comprehensive customized questionnaire was used, with Section A assessing the physicians’ demographic profile and Section B assessing satisfaction. Responses to each question were recorded on a Likert scale and converted to a normal scale so that standard statistical procedures could be applied. A total of 170 physicians were selected using multistage sampling. The questionnaire was administered one-to-one to avoid non-response. Descriptive and inferential statistical procedures were used for analysis. Result: Of the 140 physicians, 103 (74%) were satisfied with their job, with an average score of 19.15 ± 11.46, while 37 (26%) were dissatisfied, with an average score of -9.27 ± 6.30. Nine of the 15 components were found to be significant (P<0.05). Conclusion: Comparison of the present results with those of other studies revealed that the proportion of satisfied physicians in India was almost the same as in developed countries, although the magnitude of satisfaction (average score) among Indian physicians may be lower. The nine determinants identified in this study can safely be used to assess the satisfaction of any professional.
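The abstract does not give its scoring rule, but positive and negative average scores suggest Likert codes centred on the neutral point. The sketch below assumes exactly that: 5-point items coded 1 to 5, shifted by the neutral value 3 and summed per respondent, with a positive total read as "satisfied". The function name and the toy responses are hypothetical.

```python
import numpy as np


def satisfaction_summary(responses, neutral=3):
    """Centre 1..5 Likert codes on the neutral point and sum per respondent.

    A positive total counts as "satisfied" and a negative total as
    "dissatisfied"; respondents scoring exactly 0 fall in neither group.
    """
    scores = (np.asarray(responses, dtype=float) - neutral).sum(axis=1)
    sat, dis = scores[scores > 0], scores[scores < 0]
    return {
        "n_satisfied": int(sat.size),
        "n_dissatisfied": int(dis.size),
        "pct_satisfied": 100.0 * sat.size / scores.size,
        "mean_satisfied": float(sat.mean()) if sat.size else float("nan"),
        "mean_dissatisfied": float(dis.mean()) if dis.size else float("nan"),
    }


# Three hypothetical respondents answering three items.
summary = satisfaction_summary([[5, 5, 4], [1, 2, 3], [4, 4, 4]])
```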

12.
Genomics & Informatics ; : 95-101, 2007.
Article in English | WPRIM | ID: wpr-86068

ABSTRACT

In this paper, we consider variable selection methods in the Cox model when a large number of gene expression levels are related to survival time. Deciding which genes are associated with survival time has been a challenging problem because of the large number of genes and the relatively small sample size (n << p). Several variable selection methods have been proposed for the Cox model. Among them, we consider the least absolute shrinkage and selection operator (LASSO), threshold gradient descent regularization (TGDR) and two clustering threshold gradient descent regularization (CTGDR) methods (the K-means CTGDR and the hierarchical CTGDR), and compare these four methods in an application to lung cancer data. The comparison shows that the two CTGDR methods yield more compact gene selections than TGDR, while LASSO selects the smallest number of genes. When the methods are evaluated with the approach of Ma and Huang (2007), none of them separates the two risk groups satisfactorily according to the log-rank statistic computed from risk scores based on the selected genes. However, when the risk scores are calculated from the genes that are significant in the Cox model, the log-rank statistics show that the two risk groups are well separated. In particular, the TGDR method has the largest log-rank statistic, the K-means CTGDR and LASSO methods perform similarly, and the hierarchical CTGDR method has the smallest log-rank statistic.


Subject(s)
Cluster Analysis , Gene Expression , Lung Neoplasms , Lung , Sample Size
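The evaluation step described above (split patients into high- and low-risk groups by a risk score, then compare the groups with the log-rank statistic) can be sketched as follows. The median split and the linear risk score `expr @ beta` are assumptions for illustration; `beta` stands in for coefficients from whichever Cox-model selection method is being evaluated.

```python
import numpy as np


def logrank_statistic(time, event, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        died = event & (time == t)
        d = died.sum()
        d1 = (died & group).sum()
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var


def split_by_risk(expr, beta):
    """High-risk indicator: linear risk score above its median (hypothetical beta)."""
    risk = np.asarray(expr) @ np.asarray(beta)
    return risk > np.median(risk)


time = [1, 2, 3, 4, 5, 6]
event = [1, 1, 1, 1, 1, 1]
group = [True, True, True, False, False, False]
stat = logrank_statistic(time, event, group)
```

A larger statistic means better separation of the two risk groups; under the null hypothesis it is approximately chi-square with one degree of freedom.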