Search | BVS (Virtual Health Library) Regional Portal (test)

1.

Selection of optimal validation methods for quantitative structure-activity relationships and applicability domain.

Héberger, K.

SAR QSAR Environ Res ; 34(5): 415-434, 2023 May.

Article in English | MEDLINE | ID: mdl-37227317

ABSTRACT

This brief literature survey groups the (numerical) validation methods and emphasizes the contradictions and confusion concerning bias, variance and predictive performance. A multicriteria decision-making analysis has been made using the sum of absolute ranking differences (SRD), illustrated with five case studies (seven examples). SRD was applied to compare external and cross-validation techniques and indicators of predictive performance, and to select optimal methods to determine the applicability domain (AD). The ordering of model validation methods was in accordance with the statements of the original authors, but these statements contradict one another, suggesting that any variant of cross-validation can be superior or inferior to other variants depending on the algorithm, data structure and circumstances applied. A simple fivefold cross-validation proved to be superior to the Bayesian Information Criterion in the vast majority of situations. It is simply not sufficient to test a numerical validation method in one situation only, even if it is a well-defined one. SRD as a preferable multicriteria decision-making algorithm is suitable for tailoring the techniques for validation, and for the optimal determination of the applicability domain according to the dataset in question.

Subjects

Algorithms; Quantitative Structure-Activity Relationship; Bayes Theorem
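
The abstract above names the sum of absolute ranking differences (SRD) without showing the calculation. A minimal sketch follows, assuming the usual formulation in which each method's values over the same objects are ranked and compared with the ranks of a reference column. The function names `ranks` and `srd`, the default reference (the row-wise mean) and the toy data are illustrative assumptions, not taken from the paper.

```python
def ranks(values):
    """Return 1-based ranks of the values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def srd(table, reference=None):
    """Sum of absolute ranking differences for each column of `table`.

    table:     dict mapping a method name to its values over the same objects.
    reference: benchmark values over those objects; if omitted, the row-wise
               mean of all methods is used (an assumption of this sketch).
    """
    names = list(table)
    n_objects = len(table[names[0]])
    if reference is None:
        reference = [
            sum(table[name][i] for name in names) / len(names)
            for i in range(n_objects)
        ]
    ref_ranks = ranks(reference)
    return {
        name: sum(abs(a - b) for a, b in zip(ranks(table[name]), ref_ranks))
        for name in names
    }


# Hypothetical data: three methods evaluated on four objects.
example = {
    "A": [1, 2, 3, 4],
    "B": [4, 3, 2, 1],
    "C": [1, 3, 2, 4],
}
scores = srd(example, reference=[1, 2, 3, 4])
```

A smaller SRD means the method orders the objects more like the reference does. In this toy example method A reproduces the reference ranking exactly and method B reverses it.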

2.

Modelling methods and cross-validation variants in QSAR: a multi-level analysis.

Rácz, A; Bajusz, D; Héberger, K.

SAR QSAR Environ Res ; 29(9): 661-674, 2018 Sep.

Article in English | MEDLINE | ID: mdl-30160175

ABSTRACT

Prediction performance often depends on the cross- and test validation protocols applied. Several combinations of different cross-validation variants and model-building techniques were used to reveal their complexity. Two case studies (acute toxicity data) were examined, applying five-fold cross-validation (with random, contiguous and Venetian blind forms) and leave-one-out cross-validation (CV). External test sets showed the effects and differences between the validation protocols. The models were generated with multiple linear regression (MLR), principal component regression (PCR), partial least squares (PLS) regression, artificial neural networks (ANN) and support vector machines (SVM). The comparisons were made by the sum of ranking differences (SRD) and factorial analysis of variance (ANOVA). The largest bias and variance could be assigned to the MLR method and contiguous block cross-validation. SRD can provide a unique and unambiguous ranking of methods and CV variants. Venetian blind cross-validation is a promising tool. The generated models were also compared based on their basic performance parameters (r² and Q²). MLR produced the largest gap, while PCR gave the smallest. Although PCR is the best validated and balanced technique, SVM always outperformed the other methods when experimental values were the benchmark. Variable selection was advantageous, and the modelling had a larger influence than CV variants.

Subjects

Drug Discovery/methods; Models, Molecular; Quantitative Structure-Activity Relationship; Analysis of Variance; Databases, Chemical/statistics & numerical data; Toxicity Tests/statistics & numerical data
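
The cross-validation variants named in the abstract above differ only in how samples are assigned to folds. The sketch below illustrates the three five-fold forms under common chemometric definitions, which are an assumption here since the abstract does not spell them out. Leave-one-out is the special case where the number of folds equals the number of samples. All function names are hypothetical.

```python
import random


def contiguous_folds(n_samples, n_folds):
    """Consecutive blocks: the first block of samples forms fold 0, and so on."""
    return [i * n_folds // n_samples for i in range(n_samples)]


def venetian_blind_folds(n_samples, n_folds):
    """Interleaved assignment: every n_folds-th sample lands in the same fold."""
    return [i % n_folds for i in range(n_samples)]


def random_folds(n_samples, n_folds, seed=0):
    """Balanced fold labels in random order (seeded for reproducibility)."""
    labels = venetian_blind_folds(n_samples, n_folds)
    random.Random(seed).shuffle(labels)
    return labels


def splits(labels):
    """Yield (train_indices, test_indices) for each fold label in turn."""
    for fold in sorted(set(labels)):
        test = [i for i, label in enumerate(labels) if label == fold]
        train = [i for i, label in enumerate(labels) if label != fold]
        yield train, test


# Ten samples, five folds; leave-one-out would be venetian_blind_folds(10, 10).
contiguous = contiguous_folds(10, 5)
venetian = venetian_blind_folds(10, 5)
```

When the samples are stored in a sorted order (for example by activity), contiguous blocks leave out a whole region of the data at once, whereas the Venetian blind pattern leaves out samples spread across the range.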

3.

Consistency of QSAR models: Correct split of training and test sets, ranking of models and performance parameters.

Rácz, A; Bajusz, D; Héberger, K.

SAR QSAR Environ Res ; 26(7-9): 683-700, 2015.

Article in English | MEDLINE | ID: mdl-26434574

ABSTRACT

Recent implementations of QSAR modelling software provide the user with numerous models and a wealth of information. In this work, we provide some guidance on how one should interpret the results of QSAR modelling, compare and assess the resulting models, and select the best and most consistent ones. Two QSAR datasets are applied as case studies for the comparison of model performance parameters and model selection methods. We demonstrate the capabilities of sum of ranking differences (SRD) in model selection and ranking, and identify the best performance indicators and models. While the exchange of the original training and (external) test sets does not affect the ranking of performance parameters, it provides improved models in certain cases (despite the lower number of molecules in the training set). Performance parameters for external validation are substantially separated from the other merits in SRD analyses, highlighting their value in data fusion.

Subjects

Benzene Derivatives/chemistry; Maleimides/chemistry; Quantitative Structure-Activity Relationship; Amidohydrolases/antagonists & inhibitors; Amidohydrolases/chemistry; Animals; Benzene Derivatives/toxicity; Cyprinidae; Decision Support Techniques; Humans; Maleimides/toxicity; Models, Statistical; Molecular Docking Simulation; Monoacylglycerol Lipases/antagonists & inhibitors; Monoacylglycerol Lipases/chemistry; Software

4.

Comparison of comet assay parameters for estimation of genotoxicity by sum of ranking differences.

Sunjog, K; Kolarevic, S; Héberger, K; Gacic, Z; Knezevic-Vukcevic, J; Vukovic-Gacic, B; Lenhardt, M.

Anal Bioanal Chem ; 405(14): 4879-85, 2013 May.

Article in English | MEDLINE | ID: mdl-23525541

ABSTRACT

The genotoxic potential of waters in six rivers and reservoirs from Serbia was monitored in different tissues of chub (Squalius cephalus L. 1758) with the alkaline comet assay. The comet assay, or single-cell gel electrophoresis, has a wide application as a simple and sensitive method for evaluating DNA damage in fish exposed to various xenobiotics in the aquatic environment. Three types of cells, erythrocytes, gill cells, and liver cells, were used for assessing DNA damage. Images of randomly selected cells were analyzed with a Leica fluorescence microscope and image analysis by software (Comet Assay IV Image analysis system, PI, UK). Three parameters (tail length-l, tail intensity-i, and Olive tail moment-m) were analyzed on 1,700 nuclei per cell type. The procedure for sum of ranking differences (SRD) was implemented to compare different types of cells and different parameters for estimation of DNA damage. Regarding our nine different estimations of genotoxicity: tail length, intensity, and moment in erythrocytes (rel, rei, rem), liver cells (rll, rli, rlm), and gill cells (rgl, rgi, rgm), the SRD procedure has shown that the Olive tail moment and tail intensity are (almost) equally good parameters; the SRD value was lower for the tail moment and tail intensity than for tail length in the case of all types of cells. The least reliable parameter was rel; close to the borderline case were rei, rll, and rgl (~5 % probability of random ranking).

Subjects

Comet Assay/methods; Cyprinidae/genetics; DNA Damage/genetics; DNA/genetics; Data Interpretation, Statistical; Mutagenicity Tests/methods; Xenobiotics/poisoning; Animals; DNA/drug effects; Reproducibility of Results; Sensitivity and Specificity

5.

Role of Hansen solubility parameters in solid phase extraction.

Bielicka-Daszkiewicz, K; Voelkel, A; Pietrzynska, M; Héberger, K.

J Chromatogr A ; 1217(35): 5564-70, 2010 Aug 27.

Article in English | MEDLINE | ID: mdl-20643412

ABSTRACT

The sorbent-eluent systems combined from eight polymeric sorbents and seven solvents as eluents were used for the extraction of phenol and its oxidation products from water samples. The individual interactions between sorbents, eluents and analytes were characterized by Hansen solubility parameters. Principal components analysis (PCA) was used for revealing the dominant interactions (dispersive, polar, and hydrogen bonding type) in sorbent-analyte-eluent systems. The importance of solubility parameters was also determined by a novel procedure based on sum of ranking differences (SRD). Although PCA and ranking by SRD are based on different principles and calculations, they have provided very similar results. The recovery in a given system has been predicted from the magnitudes of mutual interactions (sorbent-analyte, sorbent-eluent, analyte-eluent) by multiple linear regression.

Subjects

Solid Phase Extraction/instrumentation; Adsorption; Hydrogen Bonding; Phenols/chemistry; Phenols/isolation & purification; Polymers/chemistry; Solid Phase Extraction/methods; Solubility; Solvents/chemistry

6.

Cluster and principal component analysis for Kováts' retention indices on apolar and polar stationary phases in gas chromatography.

Dallos, A; Ngo, H S; Kresz, R; Héberger, K.

J Chromatogr A ; 1177(1): 175-82, 2008 Jan 04.

Article in English | MEDLINE | ID: mdl-18067899

ABSTRACT

An extended set of Kováts' retention indices of 137 organic compounds obtained at 405.15 K with C78 standard alkane and with standalone polar interactive groups was analyzed. The retention data determined using gas chromatography were subjected to hierarchical cluster analysis and principal component analysis to detect structure in the data and to classify retention indices and solutes alike. The statistical evaluation of the retention data explored the correlation between the retention indices of the solutes and the similarities/differences of the stationary liquids. A set of chromatographic systems was selected as possible new standards relating to linear solvation energy relationships based on the chemometric data reduction. The new molecular descriptors based on retention indices were tested in correlation models for normal boiling point and olive oil/gas partition coefficient data using ridge regression. The ridge regression provided new ways for variable selection. The normal boiling points of organic compounds can reasonably be described using retention indices on apolar (C78) and polar (trifluoro, hydroxy, bromo and cyano) model compounds. The partition coefficients between olive oil and the gas phase can similarly be well correlated with retention indices on apolar (C78) and polar (tetramethoxy, trifluoro and hydroxy) model compounds.

Subjects

Chromatography, Gas/methods; Cluster Analysis; Structure-Activity Relationship

7.

Evaluation of chemometric techniques to select orthogonal chromatographic systems.

Van Gyseghem, E; Dejaegher, B; Put, R; Forlay-Frick, P; Elkihel, A; Daszykowski, M; Héberger, K; Massart, D L; Heyden, Y Vander.

J Pharm Biomed Anal ; 41(1): 141-51, 2006 Apr 11.

Article in English | MEDLINE | ID: mdl-16352413

ABSTRACT

Several chemometric techniques were compared for their ability to determine the orthogonality and similarity between chromatographic systems. Color maps based on Pearson's correlation coefficient (r) were used earlier to indicate selectivity differences between systems. These maps, in which the systems were ranked according to decreasing or increasing dissimilarities observed in the weighted-average-linkage dendrogram, were now applied as the reference method. A number of chemometric techniques were evaluated as potential alternative (visualization) methods for the same purpose. They include hierarchical clustering techniques (single, complete, unweighted-average-linkage, centroid and Ward's method), the Kennard and Stone algorithm, auto-associative multivariate regression trees (AAMRT), and the generalized pairwise correlation method (GPCM) with McNemar's statistical test. In the end, the reference method remained our preferred technique to select orthogonal systems and identify similar ones.

Subjects

Chemistry, Pharmaceutical/methods; Chromatography/methods; Pharmaceutical Preparations/analysis; Technology, Pharmaceutical/methods; Algorithms; Cluster Analysis; Evaluation Studies as Topic; Multivariate Analysis; Reproducibility of Results
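
Of the techniques listed in the abstract above, the Kennard and Stone algorithm is compact enough to sketch. It starts from the two most distant objects and then repeatedly adds the object whose nearest already-selected neighbour is farthest away. The version below assumes Euclidean distance between coordinate tuples. The paper's own dissimilarity measure is not given in the abstract, and the function name and toy points are hypothetical.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select (at least two) maximally spread-out points."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: seed the selection with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]

    # Step 2: add the candidate whose minimum distance to the selected set
    # is largest, until enough points have been chosen.
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        best = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(best)
    return selected


# Hypothetical one-dimensional example: five systems placed on a line.
systems = [(0, 0), (1, 0), (10, 0), (5, 0), (9, 0)]
chosen = kennard_stone(systems, 3)
```

In the toy example the two endpoints are picked first, followed by the midpoint, which is the point least similar to both.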

8.

Prediction of ozone concentration in ambient air using multivariate methods.

Lengyel, A; Héberger, K; Paksy, L; Bánhidi, O; Rajkó, R.

Chemosphere ; 57(8): 889-96, 2004 Nov.

Article in English | MEDLINE | ID: mdl-15488579

ABSTRACT

Multivariate statistical methods including pattern recognition (Principal Component Analysis, PCA) and modeling (Multiple Linear Regression, MLR; Partial Least Squares, PLS; and Principal Component Regression, PCR) methods were carried out to evaluate the state of ambient air in Miskolc (second largest city in Hungary). Samples were taken from near the ground at a place with extremely heavy traffic. Although PCA is not able to determine the significance of variables, it can uncover their similarities and classify the cases. PCA revealed that it is worthwhile to separate day and night data because different factors influence the ozone concentrations over the course of the day. Ozone concentration was modeled by MLR and PCR with the same efficiency if the conditions of meteorological parameters were not changed (i.e. morning and afternoon). Without night data, PCR and PLS suggest that the main process is not a photochemical but a chemical one.

Subjects

Air/analysis; Environmental Monitoring/methods; Ozone/analysis; Hungary; Linear Models; Multivariate Analysis; Principal Component Analysis

9.

Differentiation of vegetable oils by mass spectrometry combined with statistical analysis.

Jakab, A; Nagy, K; Héberger, K; Vékey, K; Forgács, E.

Rapid Commun Mass Spectrom ; 16(24): 2291-7, 2002.

Article in English | MEDLINE | ID: mdl-12478574

ABSTRACT

The main triacylglycerol (TAG) composition of different plant oils (almond, avocado, corn germ, grape seed, linseed, mustard seed, olive, peanut, pumpkin seed, sesame seed, soybean, sunflower, walnut and wheat germ) was analyzed using two different mass spectrometric techniques: HPLC/APCI-MS (high-performance liquid chromatography/atmospheric pressure chemical ionization mass spectrometry) and MALDI-TOFMS (matrix-assisted laser desorption/ionization time-of-flight mass spectrometry). Linear discriminant analysis (LDA) as a multivariate mathematical statistical method was successfully used to distinguish different plant oils based on their relative TAG composition. With LDA analysis of either APCI-MS or MALDI-MS data, the classification among the almond, avocado, grape seed, linseed, mustard seed, olive, sesame seed and soybean oil samples was 100% correct. In both cases only 6 different oil samples from a total of 73 were not classified correctly.

Subjects

Mass Spectrometry/methods; Plant Oils/chemistry; Chromatography, High Pressure Liquid; Discriminant Analysis; Molecular Structure; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization

10.

Variable selection using pair-correlation method. Environmental applications.

Héberger, K; Rajkó, R.

SAR QSAR Environ Res ; 13(5): 541-54, 2002 Jul.

Article in English | MEDLINE | ID: mdl-12442770

ABSTRACT

The pair-correlation method (PCM) has been developed for selecting between two correlated descriptor variables. PCM utilizes systematic information present in the scatter of QSAR applications. The data are suitably ordered in a 2 x 2 contingency table. Statistical tests are used to discriminate between the descriptor variables. We have developed, adapted, investigated and compared the following test statistics to each other: Conditional Fisher's exact test (CE), McNemar's test (MN), Chi-square test and Williams' t-test (Wt). If a test indicates significant difference between the descriptors, we use the terms superior-inferior or winner-loser for the overwhelming and subordinate descriptors, respectively. If more than two variables are to be compared, the discrimination can be made pair-wise and then the variables have to be ordered. Three ways of ordering have been used: simple ordering (number of wins), ordering according to the differences between wins and losses, and ordering according to probability-weighted differences between wins and losses. The basic algorithm of PCM has been described in this paper; the various selection criteria and the ordering methods were compared on suitable model systems. These case studies involved description of cAMP phosphodiesterase inhibition by flavones, toxicity of chlorobenzenes and mutagenic character of aromatic and heteroaromatic amines.

Subjects

Environmental Pollutants/toxicity; Models, Chemical; Statistics as Topic; 3',5'-Cyclic-AMP Phosphodiesterases/pharmacology; Amines/toxicity; Chlorobenzenes/toxicity; Enzyme Inhibitors; Flavonoids/adverse effects; Flavonoids/pharmacology; Structure-Activity Relationship

11.

Prediction of tumoricidal activity and accumulation of photosensitizers in photodynamic therapy using multiple linear regression and artificial neural networks.

Vanyrúr, R; Héberger, K; Kövesdi, I; Jakus, J.

Photochem Photobiol ; 75(5): 471-8, 2002 May.

Article in English | MEDLINE | ID: mdl-12017472

ABSTRACT

The biological activities of a congeneric series of pyropheophorbides used as sensitizers in photodynamic therapy have been predicted on the basis of their molecular structures, using multiple linear regression and artificial neural network (ANN) computations. Theoretical descriptors (a total of 81) were calculated by the 3DNET program based on the three-dimensional structure (3D) of the geometry-optimized molecules. These input descriptors were tested as independent variables and used for model building. Systematic descriptor selections yielded models with one, two or three descriptors with good cross-validation results. The predictive abilities of the best fitting models were checked by shuffling and cross-validation procedures. ANN was suitable for building models for both linear and nonlinear relationships. Lipophilicity was sufficient to predict the accumulation of the sensitizers in the target tissue. Weighted holistic invariant molecular descriptors weighted by atomic mass, Van der Waals volume or electronegativity were also needed to predict photodynamic activity properly. Our models were able to predict the biological activities of 13 pyropheophorbide derivatives solely on the basis of their 3D molecular structures. Moreover, linear and nonlinear variable selection methods were compared in models built linearly and nonlinearly. It is expedient to use the same method (linear or nonlinear) for variable selection as for parameter estimation.

Subjects

Antineoplastic Agents/chemistry; Photochemotherapy; Photosensitizing Agents/chemistry; Drug Design; Models, Molecular; Nerve Net; Neural Networks, Computer

12.

Principal component analysis of polarity and interaction parameters in inverse gas chromatography.

Héberger, K; Milczewska, K; Voelkel, A.

J Chromatogr Sci ; 39(9): 375-84, 2001 Sep.

Article in English | MEDLINE | ID: mdl-11565947

ABSTRACT

Inverse gas chromatography is used in the characterization of aliphatic-aromatic and aromatic ketones, their oximes, and ketone-oxime or oxime-oxime mixtures. All these organic materials are used as liquid stationary phases in gas chromatographic columns. A series of polarity and Flory-Huggins interaction parameters are determined and used to describe the physicochemical properties of examined materials, metal extractants, and products of their degradation. Principal component analysis (PCA) is performed on a data matrix consisting of polarity and interaction parameters for ketones, their oximes, and mixtures. The calculations are carried out on the correlation matrix. It is found that seven principal components account for more than 95% of the total variance in the data, indicating that the polarity (interaction) parameters are not correlating well. Physical meanings are attributed to the principal components, the most influential ones being that the first and the second principal components account for several Flory-Huggins interaction parameters, whereas the fifth is correlated with criterion "A". The plots of component loadings show characteristic groupings of polarity indicators, whereas that of component scores show several groupings of stationary phases. Cluster analysis provides mainly the same groupings. PCA allows for the grouping of polarity and solubility parameters based on the information carried within those parameters. There is no need to use more than one parameter from each cluster. McReynolds polarity and the partial molar excess Gibbs free energy of solution per methylene group carry the same information. The groups of ketones, oximes, and their mixtures can be distinguished with the use of PCA on the basis of the measured polarity, solubility parameters, or both.

Subjects

Chromatography, Gas/methods; Chemical Phenomena; Chemistry, Physical; Copper; Ketones/chemistry; Mathematics; Oximes/chemistry; Solubility; Thermodynamics

13.

Estimation of molar heat capacities in solution from gas chromatographic data.

Héberger, K; Görgényi, M.

J Chromatogr Sci ; 39(3): 113-20, 2001 Mar.

Article in English | MEDLINE | ID: mdl-11277252

ABSTRACT

The temperature dependence of retention data (retention or capacity factors) is measured for 35 aliphatic ketones and aldehydes as model compounds on a dimethylpolysiloxane stationary phase. A novel model is derived to determine the heat of solution and the solution molar heat capacities from the fits of the natural logarithm of the difference of the retention factor and the column temperature (T) versus 1/T and the temperature arrangement. The convex curvature present in the residual plots of our previously defined equation disappears when applying the newly defined model. A detailed statistical analysis clearly shows the superiority of the refined model to the earlier one in a broader temperature range. The validation of this model is made through a comparison of heat capacity values taken from literature determined by different methods. The molar heat capacity of a pure liquid oxo compound is similar to that of the same compound solvated in a stationary phase.

14.

Principal component analysis of measured quantities during degradation of hydroperoxides in oxidized vegetable oils.

Héberger, K; Keszler, A; Gude, M.

Lipids ; 34(1): 83-92, 1999 Jan.

Article in English | MEDLINE | ID: mdl-10188601

ABSTRACT

Decomposition of hydroperoxides in sunflower oil under strictly oxygen-free conditions was followed by measuring peroxide values against time, absorbance values at 232 and 268 nm, para-anisidine values, and by quantitative analyses of volatile products using various additives. The results were arranged in a matrix form and subjected to principal component analysis. Three principal components explained 89-97% of the total variance in the data. The measured quantities and the effect of additives were closely related. Characteristic plots showed similarities among the measured quantities (loading plots) and among the additives (score plots). Initial decomposition rate of hydroperoxides and the amount of volatile products formed were similar to each other. The outliers, the absorbance values, were similar to each other but carried independent information from the other quantities. Para-anisidine value (PAV) was a unique parameter. Since PAV behaved differently during the course of hydroperoxide degradation, it served as a kinetic indicator. Most additives were similar in their effects on the mentioned quantities, but two outliers were also observed. Rotation of the principal component axes did not change the dominant patterns observed. The investigations clearly showed which variables were worth measuring to evaluate different additives.

Subjects

Hydrogen Peroxide/analysis; Models, Statistical; Plant Oils/analysis; Plant Oils/chemistry; Analysis of Variance; Aniline Compounds/analysis; Biodegradation, Environmental; Food Additives/analysis; Hydrogen Peroxide/chemistry; Hydrogen Peroxide/metabolism; Oxidation-Reduction; Plant Oils/metabolism; Reproducibility of Results; Spectrum Analysis/methods; Volatilization
