Search | BVS Regional Portal

1.

On the increase of predictive performance with high-level data fusion.

Doeswijk, T G; Smilde, A K; Hageman, J A; Westerhuis, J A; van Eeuwijk, F A.

Anal Chim Acta ; 705(1-2): 41-7, 2011 Oct 31.

Article in English | MEDLINE | ID: mdl-21962346

ABSTRACT

The combination of different data sources for classification purposes, also called data fusion, can be done at different levels: low-level, i.e. concatenating data matrices; medium-level, i.e. concatenating data matrices after feature selection; and high-level, i.e. combining model outputs. In this paper the predictive performance of high-level data fusion is investigated. Partial least squares is used on each of the data sets, and dummy variables representing the classes are used as response variables. Based on the estimated responses y_j for data set j and class k, a Gaussian distribution p(g_k | y_j) is fitted. A simulation study is performed that shows the theoretical performance of high-level data fusion for two classes and two data sets. Within-group correlations of the predicted responses of the two models, and differences between the predictive ability of each of the separate models and the fused models, are studied. Results show that the error rate is always less than or equal to that of the best-performing subset and can theoretically approach zero. Negative within-group correlations always improve the predictive performance. However, if the data sets have a joint basis, as with metabolomics data, this is not likely to happen. For equally performing individual classifiers, the best results are expected for small within-group correlations. Fusion of a non-predictive classifier with a classifier that exhibits discriminative ability leads to increased predictive performance if the within-group correlations are strong. An example with real-life data shows the applicability of the simulation results.
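A minimal Python sketch of the high-level fusion step, assuming the two PLS-DA models have already produced predicted responses y_1 and y_2 for every sample. For each class it fits a bivariate Gaussian to the pair (y_1, y_2), so the within-group correlation enters through the covariance term, and a new sample is assigned to the class with the largest prior times likelihood. The function names and training values are hypothetical and the PLS step itself is omitted; this is an illustration under those assumptions, not the authors' exact formulation.

```python
import math

def fit_class_gaussian(pairs):
    """Estimate the mean vector and 2x2 covariance of (y1, y2) pairs for one class."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    s11 = sum((p[0] - m1) ** 2 for p in pairs) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in pairs) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs) / (n - 1)  # within-group covariance
    return (m1, m2), (s11, s12, s22)

def log_density(y, mean, cov):
    """Log of the bivariate normal density evaluated at y = (y1, y2)."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    d1, d2 = y[0] - mean[0], y[1] - mean[1]
    # Quadratic form d' inv(cov) d, using the closed-form inverse of a 2x2 matrix
    q = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return -0.5 * q - 0.5 * math.log(det) - math.log(2 * math.pi)

def fused_classify(y, models, priors):
    """Assign y to the class with the largest log prior + log likelihood."""
    return max(models, key=lambda k: math.log(priors[k]) + log_density(y, *models[k]))

# Hypothetical predicted responses (y1 from data set 1, y2 from data set 2)
class_a = [(0.1, 0.2), (0.3, 0.25), (0.2, 0.4), (0.25, 0.35), (0.15, 0.3)]
class_b = [(0.8, 0.7), (0.9, 0.75), (0.7, 0.65), (0.85, 0.8), (0.75, 0.6)]
models = {"A": fit_class_gaussian(class_a), "B": fit_class_gaussian(class_b)}
priors = {"A": 0.5, "B": 0.5}
print(fused_classify((0.2, 0.3), models, priors))  # prints "A"
```

Forcing s12 to zero in both classes would reduce this to a naive-Bayes product of two univariate Gaussians, which discards exactly the within-group correlation that the study identifies as important.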

Subjects

Metabolomics/methods , Artificial Intelligence , Models, Statistical , Pattern Recognition, Automated/methods

2.

Dynamic metabolomic data analysis: a tutorial review.

Smilde, A K; Westerhuis, J A; Hoefsloot, H C J; Bijlsma, S; Rubingh, C M; Vis, D J; Jellema, R H; Pijl, H; Roelfsema, F; van der Greef, J.

Metabolomics ; 6(1): 3-17, 2010 Mar.

Article in English | MEDLINE | ID: mdl-20339444

ABSTRACT

In metabolomics, time-resolved, dynamic or temporal data are increasingly being collected. The number of methods to analyze such data, however, is very limited, and in most cases the dynamic nature of the data is not even taken into account. This paper reviews current methods in use for analyzing dynamic metabolomic data. Moreover, some methods from other fields of science that may be of use in analyzing such data are described in some detail. The methods are put in a general framework after providing a formal definition of what constitutes a 'dynamic' method. Some of the methods are illustrated with real-life metabolomics examples.

3.

Matrix correlations for high-dimensional data: the modified RV-coefficient.

Smilde, A K; Kiers, H A L; Bijlsma, S; Rubingh, C M; van Erk, M J.

Bioinformatics ; 25(3): 401-5, 2009 Feb 01.

Article in English | MEDLINE | ID: mdl-19073588

ABSTRACT

MOTIVATION: Modern functional genomics generates high-dimensional datasets. It is often convenient to have a single simple number characterizing the relationship between pairs of such high-dimensional datasets in a comprehensive way. Matrix correlations are such numbers and are appealing since they can be interpreted in the same way as Pearson's correlations familiar to biologists. The high dimensionality of functional genomics data is, however, problematic for existing matrix correlations. The motivation of this article is twofold: (i) we introduce the idea of matrix correlations to the bioinformatics community and (ii) we give an improvement on the most promising matrix correlation coefficient (the RV-coefficient) that circumvents the problems of high-dimensional data. RESULTS: The modified RV-coefficient can be used in high-dimensional data analysis studies as an easy measure of the common information of two datasets. This is shown by theoretical arguments, simulations and applications to two real-life examples from functional genomics, i.e. a transcriptomics and a metabolomics example. AVAILABILITY: The Matlab m-files of the methods presented can be downloaded from http://www.bdagroup.nl.
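Both coefficients can be sketched in plain Python. The sketch follows our reading of the modification: both data sets are column-centred, the sample-by-sample cross-product matrices XX' and YY' are formed, and the modified version sets their diagonals to zero before the usual RV ratio is computed, which is intended to counter the inflation of the ordinary RV-coefficient when variables greatly outnumber samples. It is an illustrative reimplementation with hypothetical function names, not the authors' Matlab m-files.

```python
import math

def center_columns(X):
    """Subtract each column mean (rows are samples, columns are variables)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[v - m for v, m in zip(row, means)] for row in X]

def cross_product(X):
    """Sample-by-sample matrix X X^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in X] for r1 in X]

def frobenius_inner(A, B):
    """Sum of element-wise products, i.e. trace(A^T B)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def rv_coefficient(X, Y, modified=False):
    """RV-coefficient of two data sets measured on the same samples.

    With modified=True the diagonals of X X^T and Y Y^T are set to zero
    before the ratio is formed.
    """
    A = cross_product(center_columns(X))
    B = cross_product(center_columns(Y))
    if modified:
        for i in range(len(A)):
            A[i][i] = 0.0
            B[i][i] = 0.0
    return frobenius_inner(A, B) / math.sqrt(frobenius_inner(A, A) * frobenius_inner(B, B))

# Toy example: Y is a rescaled copy of X, so both coefficients equal 1 up to rounding
X = [[1.0, 2.0], [3.0, 1.0], [4.0, 5.0], [2.0, 2.0]]
Y = [[2.0 * v for v in row] for row in X]
print(rv_coefficient(X, Y), rv_coefficient(X, Y, modified=True))
```

Unlike the ordinary RV-coefficient, which lies between 0 and 1, the modified version can become negative, because the zeroed diagonals remove the always-positive self-products.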

Subjects

Genomics/methods , Algorithms , Computer Simulation , Metabolomics/methods

4.

Cross-validation of component models: a critical look at current methods.

Bro, R; Kjeldahl, K; Smilde, A K; Kiers, H A L.

Anal Bioanal Chem ; 390(5): 1241-51, 2008 Mar.

Article in English | MEDLINE | ID: mdl-18214448

ABSTRACT

In regression, cross-validation is an effective and popular approach that is used to decide, for example, the number of underlying features, and to estimate the average prediction error. The basic principle of cross-validation is to leave out part of the data, build a model, and then predict the left-out samples. While such an approach can also be envisioned for component models such as principal component analysis (PCA), most current implementations do not comply with the essential requirement that the predictions should be independent of the entity being predicted. Further, these methods have not been properly reviewed in the literature. In this paper, we review the most commonly used generic PCA cross-validation schemes and assess how well they work in various scenarios.

Subjects

Principal Component Analysis/methods , Principal Component Analysis/standards , Computer Simulation , Models, Biological , Reproducibility of Results

5.

Examples of NIR based real time release in tablet manufacturing.

Skibsted, E T S; Westerhuis, J A; Smilde, A K; Witte, D T.

J Pharm Biomed Anal ; 43(4): 1297-305, 2007 Mar 12.

Article in English | MEDLINE | ID: mdl-17166686

ABSTRACT

Real time release (RTR) of products is a new paradigm in the pharmaceutical industry. An RTR system assures that when the last manufacturing step is passed all the final release criteria are met. Various types of models can be used within the RTR framework. For each RTR system, the monitoring capability, control capability and RTR capability need to be tested. This paper presents some practical examples within the RTR framework using near-infrared and process data obtained from a tablet manufacturing process.

Subjects

Chemistry, Pharmaceutical/methods , Drug Compounding , Spectrophotometry, Infrared/methods , Tablets/chemistry , Technology, Pharmaceutical , Models, Statistical

6.

Simple assessment of homogeneity in pharmaceutical mixing processes using a near-infrared reflectance probe and control charts.

Skibsted, E T S; Boelens, H F M; Westerhuis, J A; Witte, D T; Smilde, A K.

J Pharm Biomed Anal ; 41(1): 26-35, 2006 Apr 11.

Article in English | MEDLINE | ID: mdl-16289623

ABSTRACT

Determination of homogeneous mixing of the active pharmaceutical ingredient (API) is an important in-process control within the manufacturing of solid dosage forms. In this paper two new near-infrared (NIR) based methods are presented: a qualitative and a quantitative method. Both methods are based on the calculation of net analyte signal (NAS) models, which were very easy to develop, were specific with respect to the API and required no additional reference analysis. Using a well-mixed batch as a 'gold standard' batch, control charts were developed and used for monitoring the homogeneity of other batches with NIR. The methods were fast, easy to use and non-destructive, and they provided statistical tests of homogeneity. A mixing study was characterized with the two methods, and the methods were validated by comparison with traditional HPLC analysis.
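The golden-batch control chart idea can be sketched as a Shewhart-type individuals chart. The sketch assumes limits at the reference batch mean plus or minus three sample standard deviations of its NAS values; the abstract does not state the chart type or the limit width, and the function names and numbers below are hypothetical.

```python
import statistics

def control_limits(reference, k=3.0):
    """Lower limit, centre line and upper limit from in-control reference values."""
    centre = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation
    return centre - k * sd, centre, centre + k * sd

def flag_out_of_control(values, limits):
    """Indices of values falling outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Hypothetical NAS values from a well-mixed reference batch and a batch being mixed
golden = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00]
new_batch = [1.01, 0.99, 1.10, 0.90]
limits = control_limits(golden)
print(flag_out_of_control(new_batch, limits))  # prints [2, 3]
```

Only the reference batch is used to set the limits, which matches the idea that no additional reference analysis is needed once a well-mixed batch is available.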

Subjects

Chemistry, Pharmaceutical/methods , Spectrophotometry, Infrared/methods , Technology, Pharmaceutical/methods , Chromatography, High Pressure Liquid/methods , Drug Compounding , Models, Statistical , Pharmaceutical Preparations/analysis , Reproducibility of Results , Time Factors

7.

Net analyte signal based statistical quality control.

Skibsted, E T S; Boelens, H F M; Westerhuis, J A; Smilde, A K; Broad, N W; Rees, D R; Witte, D T.

Anal Chem ; 77(22): 7103-14, 2005 Nov 15.

Article in English | MEDLINE | ID: mdl-16285655

ABSTRACT

Net analyte signal statistical quality control (NAS-SQC) is a new methodology to perform multivariate product quality monitoring based on the net analyte signal approach. The main advantage of NAS-SQC is that the systematic variation in the product due to the analyte (or property) of interest is separated from the remaining systematic variation due to all other compounds in the matrix. This enhances the ability to flag products that are out of statistical control. Using control charts, the analyte content, the variation of other compounds, and the residual variation can be monitored. As an example, NAS-SQC is used to control the content uniformity of a commercially available pharmaceutical tablet product measured with near-infrared spectroscopy. Using the NAS chart, the active pharmaceutical ingredient (API) content of new tablets is easily monitored. However, since quality is a multivariate property, other quality parameters of the tablets are also monitored simultaneously. It is demonstrated that, besides the API content, the water content of the tablets and the homogeneity of the other compounds are monitored.
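The core net analyte signal computation can be sketched as an orthogonal projection: only the part of a spectrum that is orthogonal to the space spanned by the interferent spectra is kept. This sketch assumes the interferent spectra are known exactly and that spectra mix linearly (Beer-Lambert); in practice the interferent space is usually estimated, for example with a truncated PCA of blank spectra. The function names and toy three-channel spectra are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt orthonormal basis for the span of the interferent spectra."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip spectra that are linear combinations of earlier ones
            basis.append([wi / norm for wi in w])
    return basis

def net_analyte_signal(spectrum, interferents):
    """Component of `spectrum` orthogonal to the interferent space."""
    r = list(spectrum)
    for q in orthonormal_basis(interferents):
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def nas_scalar(sample, pure_analyte, interferents):
    """Analyte amount relative to the pure spectrum, read off the NAS direction."""
    nas_pure = net_analyte_signal(pure_analyte, interferents)
    nas_sample = net_analyte_signal(sample, interferents)
    return dot(nas_sample, nas_pure) / dot(nas_pure, nas_pure)

interferent = [1.0, 1.0, 0.0]
analyte = [0.0, 1.0, 1.0]
# 2 parts analyte plus 3 parts interferent
sample = [2 * a + 3 * i for a, i in zip(analyte, interferent)]
print(nas_scalar(sample, analyte, [interferent]))  # about 2
```

The interferent contribution drops out of the projection, so the NAS value tracks the analyte alone; the variation removed by the projection is what the other NAS-SQC charts are described as monitoring.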

Subjects

Chemistry Techniques, Analytical/methods , Chemistry Techniques, Analytical/standards , Computers , Models, Chemical , Quality Control , Spectrum Analysis

8.

Classification of highly similar crude oils using data sets from comprehensive two-dimensional gas chromatography and multivariate techniques.

van Mispelaar, V G; Smilde, A K; de Noord, O E; Blomberg, J; Schoenmakers, P J.

J Chromatogr A ; 1096(1-2): 156-64, 2005 Nov 25.

Article in English | MEDLINE | ID: mdl-16236289

ABSTRACT

Comprehensive two-dimensional gas chromatography (GCxGC) has proven to be an extremely powerful separation technique for the analysis of complex volatile mixtures. This separation power can be used to discriminate between highly similar samples. In this article we describe the use of GCxGC for the discrimination of crude oils from different reservoirs within one oil field. These highly complex chromatograms contain about 6000 individual, quantified components. Unfortunately, the difference between these reservoirs is characterized by small differences in most of these 6000 components. For this reason, multivariate analysis (MVA) techniques are required to find chemical profiles describing the differences between the reservoirs. However, such methods cannot discern between 'informative variables', i.e. peaks describing differences between samples, and 'uninformative variables', i.e. peaks not describing relevant differences. Variable selection techniques are therefore required. A selection based on the information between duplicate measurements was used, and with this selection 292 peaks were used to build a discrimination model. Validation was performed using the ratio of the sum of distances between groups to the sum of distances within groups. This step revealed an outlier that could be traced to a production problem and explained retrospectively.
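The validation criterion can be sketched directly. This version assumes Euclidean distances over all unordered sample pairs, computed in whatever variable space the discrimination model uses (for instance the 292 selected peaks); the abstract specifies neither the distance nor the space, so both are assumptions, and the toy data are hypothetical.

```python
import math
from itertools import combinations

def between_within_ratio(samples, labels):
    """Sum of between-group distances divided by sum of within-group distances."""
    between = within = 0.0
    for (xa, la), (xb, lb) in combinations(zip(samples, labels), 2):
        d = math.dist(xa, xb)  # Euclidean distance (Python 3.8+)
        if la == lb:
            within += d
        else:
            between += d
    return between / within

samples = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = ["reservoir 1", "reservoir 1", "reservoir 2", "reservoir 2"]
print(between_within_ratio(samples, labels))  # about 20.05
```

A large ratio indicates well-separated groups. Because raw sums depend on how many pairs fall in each category, comparing values across data sets with different group sizes may call for average distances instead.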

Subjects

Chromatography, Gas/methods , Petroleum/classification , Multivariate Analysis , Principal Component Analysis

9.

New indicator for optimal preprocessing and wavelength selection of near-infrared spectra.

Skibsted, E T S; Boelens, H F M; Westerhuis, J A; Witte, D T; Smilde, A K.

Appl Spectrosc ; 58(3): 264-71, 2004 Mar.

Article in English | MEDLINE | ID: mdl-15035705

ABSTRACT

Preprocessing of near-infrared spectra to remove unwanted, i.e., non-related, spectral variation and selection of informative wavelengths are considered crucial steps prior to the construction of a quantitative calibration model. The standard methodology when comparing various preprocessing techniques and selecting different wavelengths is to compare prediction statistics computed with an independent set of data not used to make the actual calibration model. When the errors in the reference values are large, when no such values are available at all, or when only a limited number of samples are available, other methods exist for evaluating the preprocessing method and wavelength selection. In this work we present a new indicator (SE) that only requires blank sample spectra, i.e., spectra of samples that are mixtures of the interfering constituents (everything except the analyte), and a pure analyte spectrum or, alternatively, a sample spectrum where the analyte is present. The indicator is based on computing the net analyte signal of the analyte and the total error, i.e., instrumental noise and bias. By comparing the indicator values when different preprocessing techniques and wavelength selections are applied to the spectra, the optimal preprocessing technique and the optimal wavelength selection, i.e., those that minimize the non-related spectral variation, can be determined without knowledge of reference values. The SE indicator is compared to two other indicators that also use net analyte signal computations. To demonstrate the feasibility of the SE indicator, two near-infrared spectral data sets from the pharmaceutical industry were used: diffuse reflectance spectra of powder samples and transmission spectra of tablets. Especially in pharmaceutical spectroscopic applications, the non-related spectral variation is expected beforehand to be rather large, and it is important to remove it. The indicator gave excellent results with respect to wavelength selection and optimal preprocessing. The SE indicator performs better than the two other indicators, and it is also applicable to other situations where the Beer-Lambert law is valid.

Subjects

Chemistry, Pharmaceutical/methods , Spectroscopy, Near-Infrared/methods , Calibration , Chemistry Techniques, Analytical/instrumentation , Chemistry Techniques, Analytical/methods , Chemistry, Pharmaceutical/instrumentation , Pharmaceutical Preparations/chemistry , Powders , Spectroscopy, Near-Infrared/instrumentation , Tablets

10.

Multiblock PLS analysis of an industrial pharmaceutical process.

Lopes, J A; Menezes, J C; Westerhuis, J A; Smilde, A K.

Biotechnol Bioeng ; 80(4): 419-27, 2002 Nov 20.

Article in English | MEDLINE | ID: mdl-12325150

ABSTRACT

The performance of an industrial pharmaceutical process (production of an active pharmaceutical ingredient (API) by fermentation) was modeled by multiblock partial least squares (MBPLS). The most important process stages are inoculum production and API production fermentation. Thirty batches (runs) were produced according to an experimental plan. Rather than merging all these data into a single block of independent variables (as in ordinary PLS), four data blocks were used separately (manipulated and quality variables for each process stage). With the multiblock approach it was possible to calculate weights and scores for each independent block. It was found that the inoculum quality variables were highly correlated with API production for nominal fermentations. For the non-nominal fermentations, the manipulations of the fermentation stage explained the amount of API obtained (especially the pH and biomass concentration). Based on this process analysis, it was possible to select a smaller set of variables with which a new model was built. The cross-validated percentage of variance of the final API concentration predicted by this model was 82.4%. The advantage of the multiblock model over the standard PLS model is that the contributions of the two main process stages to the API volumetric productivity were determined.

Subjects

Bioreactors , Fermentation/physiology , Models, Biological , Streptomycetaceae/growth & development , Streptomycetaceae/metabolism , Technology, Pharmaceutical/methods , Computer Simulation , Least-Squares Analysis , Models, Statistical , Multivariate Analysis , Reproducibility of Results , Sensitivity and Specificity , Glycine max/metabolism

11.

A multiway 3D QSAR analysis of a series of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides.

Nilsson, J; Homan, E J; Smilde, A K; Grol, C J; Wikström, H.

J Comput Aided Mol Des ; 12(1): 81-93, 1998 Jan.

Article in English | MEDLINE | ID: mdl-9570091

ABSTRACT

Recently, the multilinear PLS algorithm was presented by Bro and later implemented as a regression method in 3D QSAR by Nilsson et al. In the present article a well-known set of (S)-N-[(1-ethyl-2-pyrrolidinyl)methyl]-6-methoxybenzamides, with affinity towards the dopamine D2 receptor subtype, was utilised for the validation of the multilinear PLS method. After exhaustive conformational analyses of the ligands, the active analogue approach was employed to align them in their presumed pharmacologically active conformations, using (-)-piquindone as a template. Descriptors were then generated in the GRID program, and 40 calibration compounds and 18 test compounds were selected by means of a principal component analysis in the descriptor space. The final model was validated with different types of cross-validation experiments, e.g. leave-one-out, leave-three-out and leave-five-out. The cross-validated Q2 was 62% for all experiments, confirming the stability of the model. The prediction of the test set, with a predicted Q2 of 62%, also established the predictive ability. Finally, the conformations and the alignment of the ligands, in combination with multilinear PLS, clearly played an important role in the success of our model.
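The cross-validated Q2 quoted here is commonly defined as 1 - PRESS/SS, where PRESS sums the squared errors of the predictions for the left-out compounds and SS is the total sum of squares of the observed activities about their mean. A minimal sketch, assuming the cross-validated predictions are already available; taking SS about the overall mean is an assumption (some variants use the training-set mean of each fold), and the values below are hypothetical.

```python
def q_squared(observed, predicted):
    """Q^2 = 1 - PRESS / SS, with SS taken about the mean of the observed values."""
    mean = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - mean) ** 2 for o in observed)
    return 1.0 - press / ss

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]  # e.g. leave-one-out predictions
print(q_squared(observed, predicted))
```

Here PRESS is 0.5 and SS is 5.0, giving a Q2 of 0.9; predicting every compound by the mean would give 0, and worse-than-mean predictions give negative values.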

Subjects

Benzamides/chemistry , Benzamides/pharmacology , Dopamine Antagonists/chemistry , Dopamine Antagonists/pharmacology , Algorithms , Benzamides/metabolism , Computer Simulation , Dopamine Antagonists/metabolism , Dopamine D2 Receptor Antagonists , Ligands , Models, Molecular , Molecular Conformation , Receptors, Dopamine D2/metabolism , Structure-Activity Relationship , Thermodynamics

12.

Influence of temperature on vibrational spectra and consequences for the predictive ability of multivariate models.

Wülfert, F; Kok, W T; Smilde, A K.

Anal Chem ; 70(9): 1761-7, 1998 May 01.

Article in English | MEDLINE | ID: mdl-21651271

ABSTRACT

Temperature, pressure, viscosity, and other process variables fluctuate during an industrial process. When vibrational spectra are measured on- or in-line for process analytical and control purposes, these fluctuations influence the shape of the spectra in a nonlinear manner. The influence of these temperature-induced spectral variations on the predictive ability of multivariate calibration models is assessed. Short-wave NIR spectra of ethanol/water/2-propanol mixtures are taken at different temperatures, and different local and global partial least-squares calibration strategies are applied. The resulting prediction errors and sensitivity vectors of a test set are compared. For data with no temperature variation, the local models perform best, with high sensitivity; however, once temperature variation is introduced, knowing the temperature of the prediction measurements does not help improve the local model predictions. The prediction errors of global models are considerably lower when temperature variation is present in the data set, but at the expense of sensitivity. To build temperature-stable calibration models with high sensitivity, a way of explicitly modeling the temperature should be found.

13.

Liquid chromatographic analysis of carboxylic acids using N-(4-aminobutyl)-N-ethylisoluminol as chemiluminescent label: determination of ibuprofen in saliva.

Steijger, O M; Lingeman, H; Brinkman, U A; Holthuis, J J; Smilde, A K; Doornbos, D A.

J Chromatogr ; 615(1): 97-110, 1993 May 19.

Article in English | MEDLINE | ID: mdl-8393459

ABSTRACT

N-(4-Aminobutyl)-N-ethylisoluminol was used for labelling of carboxylic acids. The derivatization reaction was carried out with 1-hydroxybenzotriazole as pre-activator of the carboxylic acid function and N-ethyl-N'-(3-dimethylaminopropyl)carbodiimide as the coupling reagent. Optimum conditions for the derivatization were determined by using factorial design analysis, with ibuprofen as the test compound. Chemiluminescence detection was carried out using a post-column on-line electrochemical hydrogen peroxide generation system and the addition of microperoxidase as the catalyst. The detection limit of derivatized ibuprofen in human saliva was 0.7 ng per 0.5 ml of saliva, with a recovery of 96.1 +/- 1.3%. The method was linear over at least three decades (2.5 ng to 2.5 micrograms) and the repeatability was satisfactory (R.S.D. = 5.2% at the 25 ng level; n = 4).
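The two figures of merit quoted above are simple ratios: recovery is the amount found divided by the amount added, and the relative standard deviation (R.S.D.) is the sample standard deviation divided by the mean. A short sketch with hypothetical function names and replicate values, not data from the study:

```python
import statistics

def recovery_percent(found, added):
    """Amount found as a percentage of the amount spiked into the sample."""
    return 100.0 * found / added

def rsd_percent(replicates):
    """Relative standard deviation (sample SD / mean) in percent."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)

replicates_ng = [24.1, 25.9, 23.8, 26.2]  # hypothetical n = 4 results at the 25 ng level
print(round(rsd_percent(replicates_ng), 1))
print(round(recovery_percent(24.0, 25.0), 1))
```

With n = 4 the sample standard deviation divides by n - 1 = 3, which is the usual convention for reporting repeatability.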

Subjects

Carboxylic Acids/analysis , Ibuprofen/analysis , Luminol/analogs & derivatives , Saliva/chemistry , Blood Proteins/metabolism , Humans , Hydrogen Peroxide/analysis , Indicators and Reagents , Luminescent Measurements , Peroxidase/chemistry , Protein Binding

14.

Optimization of the reversed-phase high-performance liquid chromatographic separation of synthetic estrogenic and progestogenic steroids using the multi-criteria decision making method.

Smilde, A K; Bruins, C H; Doornbos, D A; Vink, J.

J Chromatogr ; 410(1): 1-12, 1987 Nov 20.

Article in English | MEDLINE | ID: mdl-3429543

ABSTRACT

The optimization of the reversed-phase high-performance liquid chromatographic separation of a mixture of ethynylestradiol, desogestrel and three related compounds is described. A procedure is used that allows the prediction of the capacity factors of each individual synthetic steroid, depending on the mobile phase composition. Therefore, complete chromatograms can be predicted and evaluated with the multi-criteria decision making method.
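The abstract does not detail the multi-criteria decision making step. One common building block of MCDM is the Pareto filter: a candidate mobile phase composition is kept only if no other composition is at least as good on every criterion and strictly better on at least one. A minimal sketch, assuming each predicted chromatogram is summarised by criteria that are all to be maximised (for example the minimum resolution and the negative analysis time); the compositions and numbers are hypothetical.

```python
def dominates(a, b):
    """True if criteria vector a is at least as good as b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_optimal(options):
    """Names of options not dominated by any other option (all criteria maximised)."""
    return sorted(
        name for name, crit in options.items()
        if not any(dominates(other, crit) for other in options.values())
    )

# Hypothetical predictions: (minimum resolution, -analysis time in minutes)
options = {
    "composition 1": (1.2, -25.0),
    "composition 2": (1.5, -18.0),
    "composition 3": (0.9, -10.0),
    "composition 4": (1.4, -20.0),
}
print(pareto_optimal(options))  # prints ['composition 2', 'composition 3']
```

The filter only removes clearly inferior compositions; choosing among the remaining Pareto-optimal ones is still a trade-off left to the analyst.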

Subjects

Estrogens/isolation & purification , Progestins/isolation & purification , Chromatography, High Pressure Liquid
