Results 1 - 20 of 210
1.
J Anal Methods Chem ; 2017: 9402045, 2017.
Article in English | MEDLINE | ID: mdl-28168083

ABSTRACT

The rapid increase in the use of metabolite profiling/fingerprinting techniques to resolve complicated issues in metabolomics has stimulated demand for data processing techniques, such as alignment, to extract detailed information. In this study, a new and automated method was developed to correct the retention time shift of high-dimensional and high-throughput data sets. Information from the target chromatographic profiles was used to determine the standard profile as a reference for alignment. A novel, piecewise data partition strategy was applied for the determination of the target components in the standard profile as markers for alignment. An automated target search (ATS) method was proposed to find the exact retention times of the selected targets in other profiles for alignment. The linear interpolation technique (LIT) was employed to align the profiles prior to pattern recognition, comprehensive comparison analysis, and other data processing steps. In total, 94 metabolite profiles of ginseng were studied, including the most volatile secondary metabolites. The method used in this article could be an essential step in the extraction of information from high-throughput data acquired in the study of systems biology, metabolomics, and biomarker discovery.
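
The alignment step described in the abstract above can be illustrated with a minimal sketch. It assumes that marker peaks have already been matched between a sample profile and the standard profile, and it warps the sample by piecewise-linear interpolation between those markers. The function names `interp` and `align_profile` and the choice to pin both ends of the time axis are illustrative assumptions, not the published ATS/LIT implementation.

```python
from bisect import bisect_right

def interp(x, xs, ys):
    """Piecewise-linear interpolation of the points (xs, ys) at x.
    xs must be strictly increasing; values outside the range are clamped."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])

def align_profile(times, signal, sample_markers, reference_markers):
    """Warp `signal` so its marker peaks land on the reference retention times.
    Assumption: the first and last time points are fixed anchors."""
    ref = [times[0]] + list(reference_markers) + [times[-1]]
    smp = [times[0]] + list(sample_markers) + [times[-1]]
    aligned = []
    for t in times:
        # Map a reference-axis time to the matching time in the sample ...
        t_sample = interp(t, ref, smp)
        # ... and read the sample intensity at that time.
        aligned.append(interp(t_sample, times, signal))
    return aligned
```

For example, a sample whose marker elutes at 6 min while the standard profile has it at 5 min would be called as `align_profile(times, signal, [6], [5])`.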

2.
Guang Pu Xue Yu Guang Pu Fen Xi ; 37(1): 95-102, 2017 01.
Article in Chinese | MEDLINE | ID: mdl-30192487

ABSTRACT

Near infrared spectroscopy (NIRS) is an indirect analytical technique whose application depends on establishing a suitable calibration model. Wavelength selection is important for improving the interpretability, accuracy and modeling efficiency of the prediction model because it minimizes redundant information in the near infrared spectrum. Intelligent optimization algorithms are a commonly used class of wavelength selection methods. They build an algorithmic model by mathematical abstraction from biological behavior or the movement of matter, and then solve the combinatorial optimization problem by iterative calculation. Their core strategy is to screen effective wavelength points for multivariate calibration by successive approximation, using an objective function as the criterion. In this work, five intelligent optimization algorithms, namely ant colony optimization (ACO), genetic algorithm (GA), particle swarm optimization (PSO), random frog (RF) and simulated annealing (SA), were used to select characteristic wavelengths from NIR data of tobacco leaf for the determination of total nitrogen and nicotine content, and were combined with partial least squares (PLS) to construct multivariate calibration models. Comparative analysis of these models showed that the optimal total nitrogen models for datasets A and B were PSO-PLS and GA-PLS, respectively, and the optimal nicotine models were GA-PLS and SA-PLS, respectively. Although not all of the optimized models outperformed the full-spectrum PLS models in prediction, they were greatly simplified, and their forecasting accuracy, precision, interpretability and stability were improved. This research is therefore significant for practical application.
The informative wavelength combinations were 4 587~4 878 and 6 700~7 200 cm(-1) for total nitrogen, and 4 500~4 700 and 5 800~6 000 cm(-1) for tobacco nicotine. These selected wavelengths have real physical significance.

3.
Analyst ; 141(19): 5586-97, 2016 Oct 07.
Article in English | MEDLINE | ID: mdl-27435388

ABSTRACT

Variable selection and outlier detection are important processes in chemical modeling. Usually, they affect each other. Their performing orders also strongly affect the modeling results. Currently, many studies perform these processes separately and in different orders. In this study, we examined the interaction between outliers and variables and compared the modeling procedures performed with different orders of variable selection and outlier detection. Because the order of outlier detection and variable selection can affect the interpretation of the model, it is difficult to decide which order is preferable when the predictabilities (prediction error) of the different orders are relatively close. To address this problem, a simultaneous variable selection and outlier detection approach called Model Adaptive Space Shrinkage (MASS) was developed. This proposed approach is based on model population analysis (MPA). Through weighted binary matrix sampling (WBMS) from model space, a large number of partial least square (PLS) regression models were built, and the elite parts of the models were selected to statistically reassign the weight of each variable and sample. Then, the whole process was repeated until the weights of the variables and samples converged. Finally, MASS adaptively found a high performance model which consisted of the optimized variable subset and sample subset. The combination of these two subsets could be considered as the cleaned dataset used for chemical modeling. In the proposed approach, the problem of the order of variable selection and outlier detection is avoided. One near infrared spectroscopy (NIR) dataset and one quantitative structure-activity relationship (QSAR) dataset were used to test this approach. The result demonstrated that MASS is a useful method for data cleaning before building a predictive model.

4.
Anal Chim Acta ; 914: 17-34, 2016 Mar 31.
Article in English | MEDLINE | ID: mdl-26965324

ABSTRACT

This review focuses on recent and potential advances in chemometric methods in relation to data processing in metabolomics, especially for data generated from mass spectrometric techniques. Metabolomics is gradually being regarded as a valuable and promising biotechnology rather than an ambitious advancement. Herein, we outline significant developments in metabolomics, especially in combination with modern chemical analysis techniques and dedicated statistical and chemometric data analysis strategies. Advanced skills in the preprocessing of raw data, identification of metabolites, variable selection, and modeling are illustrated. We believe that insights from these developments will help narrow the gap between the original dataset and current biological knowledge. We also discuss the limitations and perspectives of extracting information from high-throughput datasets.


Subject(s)
Mass Spectrometry/methods , Metabolomics , Chromatography, Liquid , Models, Theoretical
5.
Int J Biol Macromol ; 87: 290-4, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26927937

ABSTRACT

A method using partial least squares (PLS) for simultaneous determination of neutral and uronic sugars was developed in this paper. This method is based on the development of the reaction between the analytes and anthrone. The calibration set was built with 25 binary solutions at the concentrations ranging from 20 to 100µg/mL for glucose and from 10 to 50µg/mL for glucuronic acid. An independent prediction set was utilized to check the robustness of the PLS calibration model. The root-mean-square error of prediction (RMSEP) values for neutral and uronic sugars are 1.2233 and 1.9367, respectively. The correlation coefficient for the prediction set (Rp(2)) values for them are 0.9971 and 0.9767, respectively. Compared with the univariate method, the proposed method improves detection accuracy. In addition, it was also applied to commercial polysaccharides and Glycyrrhiza uralensis polysaccharides (GUPs), and the results indicated that the PLS model was suitable for simultaneous determination of neutral and uronic sugars.
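
The two figures of merit quoted in the abstract above can be computed as in the following sketch. It assumes that Rp(2) denotes the coefficient of determination on the prediction set (the abstract calls it a correlation coefficient, so the squared Pearson correlation is another possible reading). The function names are illustrative.

```python
import math

def rmsep(y_true, y_pred):
    """Root-mean-square error of prediction over an independent prediction set."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

RMSEP carries the units of the measured quantity (here presumably µg/mL), whereas the coefficient of determination is dimensionless.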


Subject(s)
Carbohydrates/analysis , Carbohydrates/chemistry , Spectrophotometry, Ultraviolet/methods , Uronic Acids/chemistry , Calibration , Glycyrrhiza/chemistry , Least-Squares Analysis , Principal Component Analysis , Reproducibility of Results , Time Factors
6.
J Chromatogr B Analyt Technol Biomed Life Sci ; 1015-1016: 82-91, 2016 Mar 15.
Article in English | MEDLINE | ID: mdl-26901849

ABSTRACT

Traditional Chinese medicines (TCMs) pose a great challenge for quality control and efficacy evaluation because of their complex chemical composition. Chemometric techniques provide a good opportunity to mine more useful chemical information from TCMs, so their application in this field is both natural and necessary. This review focuses on important recent chemometric tools for chromatographic fingerprinting, including peak alignment information features and baseline correction, and on applications of chemometrics in metabolomics and the modernization of TCMs, including authentication and quality evaluation of TCMs, evaluation of their efficacy, and the essence of TCM syndromes. The conclusions present general trends and some recommendations for improving chromatographic metabolomics data analysis.


Subject(s)
Chromatography , Drugs, Chinese Herbal/analysis , Metabolomics , Chromatography/methods , Chromatography/standards , Metabolomics/methods , Metabolomics/standards
7.
Anal Chim Acta ; 911: 27-34, 2016 Mar 10.
Article in English | MEDLINE | ID: mdl-26893083

ABSTRACT

Biomarker discovery is an important goal in metabolomics. It is typically modeled as selecting the most discriminating metabolites for classification and is often referred to as variable importance analysis or variable selection. A number of variable importance analysis methods have been proposed for discovering biomarkers in metabolomics studies. However, different methods are likely to generate different variable ranking results because of their different principles. Each method generates a variable ranking list, just as an expert presents an opinion. The problem of inconsistency between different variable ranking methods is often ignored. To address this problem, a simple and ideal solution is to take every ranking into account. In this study, a strategy called rank aggregation was employed. It is an indispensable tool for merging individual ranking lists into a single "super"-list reflective of the overall preference or importance within the population. This "super"-list is regarded as the final ranking for biomarker discovery. Finally, it was used for biomarker discovery and for selecting the best variable subset with the highest predictive classification accuracy. Nine methods were used, including three univariate filtering methods and six multivariate methods. When applied to two metabolic datasets (Childhood overweight dataset and Tubulointerstitial lesions dataset), rank aggregation achieved much higher prediction accuracy than using all variables. Moreover, it also outperformed a penalized method, the least absolute shrinkage and selection operator (LASSO), with higher prediction accuracy or fewer selected variables, which are more interpretable.
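
The idea of merging several ranking lists into one "super"-list can be sketched with a Borda count, one common rank aggregation scheme. The abstract does not say which aggregation rule was used, so the Borda rule, the function name `borda_aggregate` and the alphabetical tie-break are assumptions made for illustration only.

```python
def borda_aggregate(rankings):
    """Merge ranking lists (each ordered best-first over the same variables)
    into one list, using Borda points: n for first place down to 1 for last."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, name in enumerate(ranking):
            scores[name] = scores.get(name, 0) + (n - position)
    # Highest total first; ties are broken alphabetically for reproducibility.
    return sorted(scores, key=lambda name: (-scores[name], name))
```

Each input list plays the role of one variable importance method, and the returned list is the aggregated ranking from which the top variables would be taken.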


Subject(s)
Biomarkers/metabolism , Metabolomics , Case-Control Studies , Child , Gas Chromatography-Mass Spectrometry , Humans , Models, Theoretical , Overweight/blood
8.
Analyst ; 141(6): 1973-80, 2016 Mar 21.
Article in English | MEDLINE | ID: mdl-26846329

ABSTRACT

In order to solve the spectra standardization problem in near-infrared (NIR) spectroscopy, a Transfer via Extreme learning machine Auto-encoder Method (TEAM) has been proposed in this study. A comparative study among TEAM, piecewise direct standardization (PDS), generalized least squares (GLS) and calibration transfer methods based on canonical correlation analysis (CCA) was conducted, and the performances of these algorithms were benchmarked with three spectral datasets: corn, tobacco and pharmaceutical tablet spectra. The results show that TEAM is a stable method and can significantly reduce prediction errors compared with PDS, GLS and CCA. TEAM can also achieve the best RMSEPs in most cases with a small number of calibration sets. TEAM is implemented in Python language and available as an open source package at https://github.com/zmzhang/TEAM.

9.
Anal Chim Acta ; 908: 63-74, 2016 Feb 18.
Article in English | MEDLINE | ID: mdl-26826688

ABSTRACT

In this study, a new variable selection method called bootstrapping soft shrinkage (BOSS) is developed. It is derived from the ideas of weighted bootstrap sampling (WBS) and model population analysis (MPA). The weights of variables are determined from the absolute values of regression coefficients. WBS is applied according to the weights to generate sub-models, and MPA is used to analyze the sub-models and update the weights of the variables. The optimization procedure follows the rule of soft shrinkage, in which less important variables are not eliminated directly but are assigned smaller weights. The algorithm runs iteratively and terminates when the number of variables reaches one. The optimal variable set with the lowest root mean squared error of cross-validation (RMSECV) is selected. The method was tested on three groups of near infrared (NIR) spectroscopic datasets, i.e. corn datasets, diesel fuels datasets and soy datasets. Three high performing variable selection methods, i.e. Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS) and genetic algorithm partial least squares (GA-PLS), are used for comparison. The results show that BOSS is promising, with improved prediction performance. The Matlab codes for implementing BOSS are freely available on the website: http://www.mathworks.com/matlabcentral/fileexchange/52770-boss.
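
The two operations at the core of the procedure described above, weighted bootstrap sampling and the weight update from regression coefficients, might look roughly like the sketch below. It is written in Python rather than the authors' Matlab, it leaves out the PLS fitting itself, and the function names and the exact normalization are assumptions rather than the published BOSS code.

```python
import random

def update_weights(best_submodels, n_vars):
    """New variable weights from the best sub-models.
    Each sub-model is a dict {variable index: regression coefficient}.
    Assumes at least one coefficient is non-zero."""
    raw = [0.0] * n_vars
    for coefs in best_submodels:
        for var, coef in coefs.items():
            raw[var] += abs(coef)  # importance taken as |coefficient|
    total = sum(raw)
    return [w / total for w in raw]

def weighted_bootstrap(weights, n_draws, rng):
    """Draw variable indices with replacement, with probability proportional
    to their weights, and keep the unique ones as one sub-model's variable set."""
    drawn = rng.choices(range(len(weights)), weights=weights, k=n_draws)
    return sorted(set(drawn))
```

Because duplicates are discarded after sampling with replacement, variables with small weights appear in fewer sub-models instead of being removed outright, which is the soft shrinkage behaviour the abstract describes.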


Subject(s)
Models, Chemical , Algorithms , Least-Squares Analysis , Monte Carlo Method , Spectroscopy, Near-Infrared
10.
Talanta ; 147: 82-9, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26592580

ABSTRACT

Male infertility has become an important public health problem worldwide. At present, the diagnosis of male infertility frequently depends on the results of semen quality analysis or requires more invasive surgical intervention. Therefore, it is necessary to develop a novel approach for the early diagnosis of male infertility. According to the presence or absence of normal sexual function, male infertility is classified into two phenotypes, erectile dysfunction (ED) and semen abnormalities (SA). The aim of this study was to investigate the GC-MS plasma profiles of infertile males with ED and with SA and to discover potential biomarkers. Plasma samples from healthy controls (HC) (n=61) and infertility patients with ED (n=26) or with SA (n=44) were analyzed by gas chromatography-mass spectrometry (GC-MS) for discrimination and for screening potential biomarkers. Partial least squares-discriminant analysis (PLS-DA) was performed on the GC-MS dataset. The results showed that HC could be discriminated from infertile cases with SA (AUC=86.96%, sensitivity=78.69%, specificity=84.09%, accuracy=80.95%) and from infertile cases with ED (AUC=94.33%, sensitivity=80.33%, specificity=100%, accuracy=87.36%). Some potential biomarkers were successfully discovered by two commonly used variable selection methods, variable importance on projection (VIP) and original coefficients of PLS-DA (ß). 1,5-Anhydro-sorbitol and α-hydroxyisovaleric acid were identified as potential biomarkers for distinguishing HC from male infertility patients. Meanwhile, lactate, glutamate and cholesterol were found to be the important variables for distinguishing patients with erectile dysfunction from those with semen abnormalities. Plasma metabolomics may be developed as a novel approach for fast, noninvasive, and acceptable diagnosis and characterization of male infertility.
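
The sensitivity, specificity and accuracy figures above follow from a confusion matrix, as the sketch below shows. The counts in the usage note (48 of 61 HC and 37 of 44 SA correctly classified, with HC treated as the positive class) are back-calculated from the reported percentages as an illustration. They are an assumption and are not stated in the abstract.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of positives correctly identified
    specificity = tn / (tn + fp)   # fraction of negatives correctly identified
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy
```

With the assumed counts, `classification_metrics(tp=48, fn=13, tn=37, fp=7)` reproduces the HC versus SA values of 78.69%, 84.09% and 80.95% after rounding.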


Subject(s)
Biomarkers/blood , Blood Chemical Analysis , Gas Chromatography-Mass Spectrometry , Infertility, Male/blood , Metabolomics/methods , Plasma/chemistry , Humans , Male , Multivariate Analysis
11.
Analyst ; 140(23): 7955-64, 2015 Dec 07.
Article in English | MEDLINE | ID: mdl-26514234

ABSTRACT

Accurate peak detection is essential for analyzing high-throughput datasets generated by analytical instruments. Derivatives with noise reduction and matched filtration are frequently used, but they are sensitive to baseline variations, random noise and deviations in the peak shape. A continuous wavelet transform (CWT)-based method is more practical and popular in this situation, which can increase the accuracy and reliability by identifying peaks across scales in wavelet space and implicitly removing noise as well as the baseline. However, its computational load is relatively high and the estimated features of peaks may not be accurate in the case of peaks that are overlapping, dense or weak. In this study, we present multi-scale peak detection (MSPD) by taking full advantage of additional information in wavelet space including ridges, valleys, and zero-crossings. It can achieve a high accuracy by thresholding each detected peak with the maximum of its ridge. It has been comprehensively evaluated with MALDI-TOF spectra in proteomics, the CAMDA 2006 SELDI dataset as well as the Romanian database of Raman spectra, which is particularly suitable for detecting peaks in high-throughput analytical signals. Receiver operating characteristic (ROC) curves show that MSPD can detect more true peaks while keeping the false discovery rate lower than MassSpecWavelet and MALDIquant methods. Superior results in Raman spectra suggest that MSPD seems to be a more universal method for peak detection. MSPD has been designed and implemented efficiently in Python and Cython. It is available as an open source package at .

12.
J Sep Sci ; 38(21): 3720-6, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26315612

ABSTRACT

The classical traditional Chinese formulation LiuweiDihuang, shown to have clinical efficacy for "nourishing kidney-yin" in traditional Chinese medicine, has been used for thousands of years in China. Little attention, however, has been paid to quality control methods for this formulation. Hence, a rapid and sensitive analytical technique is urgently needed for the evaluation of LiuweiDihuang preparations to assess its quality and pharmacological functionality. In this study, an ultra high performance liquid chromatography dual-wavelength method was developed to simultaneously determine 11 constituents in LiuweiDihuang preparations. This robust approach provided a fast and comprehensive quantitative determination of the major bioactive markers within LiuweiDihuang preparations. To distinguish four dosage forms of LiuweiDihuang preparations, a random forest technique was applied on the spectrometric fingerprint data obtained. This combination approach of chromatographic techniques and data analyses might serve as a rapid and efficient tool to ensure the quality of LiuweiDihuang preparations and other Chinese medicinal formulations and can support quality control and scientific research into the pharmacological potential for these formulations.


Subject(s)
Chromatography, High Pressure Liquid/methods , Drugs, Chinese Herbal/chemistry , Spectrophotometry, Ultraviolet/methods , Limit of Detection , Principal Component Analysis , Reference Standards , Reproducibility of Results
13.
Article in English | MEDLINE | ID: mdl-26262599

ABSTRACT

In this work, eleven compounds were successfully separated from Trollius chinensis Bunge by using a two-step high-speed counter-current chromatography (HSCCC) method. NRTL-SAC (nonrandom two-liquid segment activity coefficient) method, a newly developed solvent system selection strategy, was applied to screening the suitable biphasic liquid systems. Hexane/ethyl acetate/ethanol/water (3:7:3:7, v/v) solvent system was used in the first step, while the hexane/ethyl acetate/methanol/water (1:2:1:2, 1:4:1:4, 1:9:1:9, v/v) systems were employed in the second step. The chemical structures of the separated compounds were identified by UV, high resolution ESI-MS and MS/MS data. The separated compounds are 3,4-dihydroxyphenylethanol (1), vanillic acid (2), orientin (3), vitexin (4), veratric acid (5), 2″-O-(3‴, 4‴-dimethoxybenzoyl) orientin (6), 2″-O-feruloylorientin (7), 2″-O-feruloylvitexin (8), 2″-O-(2‴-methylbutyryl) vitexin (9), 2″-O-(2‴-methylbutyryl) isoswertiajaponin (10), 2″-O-(2‴-methylbutyryl) isoswertisin (11). The results demonstrate that HSCCC is a powerful tool for the separation of compounds from extremely complex samples.


Subject(s)
Countercurrent Distribution/methods , Flavonoids/isolation & purification , Hydroxybenzoates/isolation & purification , Ranunculaceae/chemistry , Chromatography, High Pressure Liquid , Solvents
14.
Int J Biol Macromol ; 79: 681-6, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26051342

ABSTRACT

Glycyrrhiza uralensis, an important Chinese medicine, has a long history of use in China. In this study, three water-soluble polysaccharides fractions (GUPs-1, GUPs-2 and GUPs-3) were isolated and purified from the root of G. uralensis by DEAE-52 and Sephadex G-100 column chromatography. Physicochemical properties and antioxidant activities of the three purified polysaccharides were investigated. The molecular weights of GUPs-1, GUPs-2 and GUPs-3 were 10,160, 11,680 and 13,360 Da, and the ratios of glucose were 23.4%, 14% and 1.13%, respectively. The antioxidant activities of the three purified polysaccharides followed the order: GUPs-1>GUPs-2>GUPs-3. GUPs with lower molecular weight and higher ratio of glucose, basically exhibited higher antioxidant activities at the same concentration. This indicated that the molecular weight and the ratio of monosaccharide composition of the GUPs could affect the antioxidant activities.


Subject(s)
Antioxidants/chemistry , Glycyrrhiza uralensis/chemistry , Iron Chelating Agents/chemistry , Plant Roots/chemistry , Polysaccharides/chemistry , Antioxidants/isolation & purification , Biphenyl Compounds/antagonists & inhibitors , Chromatography, Gel , Chromatography, Ion Exchange , Glucose/analysis , Hydroxyl Radical/antagonists & inhibitors , Iron Chelating Agents/isolation & purification , Molecular Weight , Oxidation-Reduction , Picrates/antagonists & inhibitors , Polysaccharides/isolation & purification
15.
Anal Chim Acta ; 880: 32-41, 2015 Jun 23.
Article in English | MEDLINE | ID: mdl-26092335

ABSTRACT

Partial least squares (PLS) is one of the most widely used methods for chemical modeling. However, like many other methods with tunable parameters, it has a strong tendency to over-fit. Thus, a crucial step in PLS model building is to select the optimal number of latent variables (nLVs). Cross-validation (CV) is the most popular method for PLS model selection because it selects a model from the perspective of prediction ability. However, CV may not yield a clear minimum of the prediction error, which makes model selection difficult. To solve this problem, we proposed a new strategy for PLS model selection that combines the cross-validated coefficient of determination (Qcv(2)) and model stability (S). S is defined as the stability of the PLS regression vectors, which is obtained using model population analysis (MPA). The results show that, when a clear maximum of Qcv(2) is not obtained, S can provide additional information about over-fitting and helps in finding the optimal nLVs. Compared with other regression-vector-based indicators such as the Euclidean 2-norm (B2), the Durbin-Watson statistic (DW) and the jaggedness (J), S is more sensitive to over-fitting. The model selected by our method has both good prediction ability and stability.


Subject(s)
Algorithms , Models, Chemical , Least-Squares Analysis , Software , Glycine max/chemistry , Glycine max/metabolism , Spectrophotometry, Ultraviolet
16.
Int J Biol Macromol ; 79: 983-7, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26093314

ABSTRACT

A method for quantitative analysis of the polysaccharide content of Glycyrrhiza was developed based on near infrared (NIR) spectroscopy, with the phenol-sulphuric acid method adopted as the reference method. This is the first use of this method for predicting polysaccharide content in Glycyrrhiza. To improve the predictive ability (or robustness) of the model, the competitive adaptive reweighted sampling (CARS) strategy was used to select relevant wavelengths. With the restricted set of relevant wavelengths, the PLS model was more efficient and parsimonious. The coefficient of determination of prediction (Rp(2)) and the root mean square error of prediction (RMSEP) of the optimum model obtained for polysaccharides were 0.9119 and 0.4350, respectively. The selected wavelengths were also interpreted, and all of the wavelengths selected by CARS were found to be related to functional groups of polysaccharides. The overall results show that NIR spectroscopy combined with chemometrics can be efficiently utilised for analysis of polysaccharide content in Glycyrrhiza.


Subject(s)
Antioxidants/chemistry , Glycyrrhiza/chemistry , Polysaccharides/isolation & purification , Antioxidants/isolation & purification , Fruit/chemistry , Polysaccharides/chemistry , Polysaccharides/classification , Spectroscopy, Near-Infrared
17.
Biochem Biophys Res Commun ; 461(1): 186-92, 2015 May 22.
Article in English | MEDLINE | ID: mdl-25881503

ABSTRACT

Renal interstitial fibrosis closely relates to chronic kidney disease and is regarded as the final common pathway in most cases of end-stage renal disease. Metabolomic biomarkers can facilitate early diagnosis and allow better understanding of the pathogenesis underlying renal fibrosis. Gas chromatography-mass spectrometry (GC/MS) is one of the most promising techniques for identification of metabolites. However, the existence of the background, baseline offset, and overlapping peaks makes accurate identification of the metabolites unachievable. In this study, GC/MS coupled with chemometric methods was successfully developed to accurately identify and seek metabolic biomarkers for rats with renal fibrosis. By using these methods, seventy-six metabolites from rat serum were accurately identified and five metabolites (i.e., urea, ornithine, citric acid, galactose, and cholesterol) may be useful as potential biomarkers for renal fibrosis.


Subject(s)
Algorithms , Biomarkers/blood , Blood Chemical Analysis/methods , Data Interpretation, Statistical , Gas Chromatography-Mass Spectrometry/methods , Kidney/metabolism , Renal Insufficiency, Chronic/blood , Animals , Fibrosis/blood , Male , Multivariate Analysis , Rats , Rats, Wistar , Renal Insufficiency, Chronic/diagnosis , Reproducibility of Results , Sensitivity and Specificity
18.
J Chromatogr A ; 1393: 47-56, 2015 May 08.
Article in English | MEDLINE | ID: mdl-25818557

ABSTRACT

Solvent system selection is the first step toward a successful counter-current chromatography (CCC) separation. This paper introduces a systematic and practical solvent system selection strategy based on the nonrandom two-liquid segment activity coefficient (NRTL-SAC) model, which is efficient in predicting the solute partition coefficient. Firstly, the application of the NRTL-SAC method was extended to the ethyl acetate/n-butanol/water and chloroform/methanol/water solvent system families. Moreover, the versatility and predictive capability of the NRTL-SAC method were investigated. The results indicate that the solute molecular parameters identified from the hexane/ethyl acetate/methanol/water solvent system family are capable of predicting a large number of partition coefficients in several other solvent system families. The NRTL-SAC strategy was further validated by successfully separating five components from Salvia plebeia R.Br. We therefore propose that NRTL-SAC is a promising high throughput method for rapid solvent system selection that is highly adaptable to screening suitable solvent systems for real-life CCC separation.


Subject(s)
Chromatography, High Pressure Liquid/methods , Countercurrent Distribution/methods , Solvents/chemistry , 1-Butanol/chemistry , Acetates/chemistry , Chloroform/chemistry , Hexanes/chemistry , Methanol/chemistry , Plant Extracts/chemistry , Salvia/chemistry , Water/chemistry
19.
J Sep Sci ; 38(6): 965-74, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25645318

ABSTRACT

Retention time shift is one of the most challenging problems in the preprocessing of massive chromatographic datasets. Here, an improved version of the moving window fast Fourier transform cross-correlation algorithm is presented to perform nonlinear and robust alignment of chromatograms by analyzing the shifts matrix generated by the moving window procedure. The shifts matrix in retention time can be estimated by fast Fourier transform cross-correlation with a moving window procedure. The refined shift of each scan point can be obtained by calculating the mode of the corresponding column of the shifts matrix. This version is simple, but more effective and robust than the previously published moving window fast Fourier transform cross-correlation method. It can handle nonlinear retention time shift robustly if a proper window size has been selected. The window size is the only parameter that needs to be adjusted and optimized. The properties of the proposed method are investigated by comparison with the previous moving window fast Fourier transform cross-correlation and recursive alignment by fast Fourier transform using chromatographic datasets. The pattern recognition results for a gas chromatography mass spectrometry dataset of metabolic syndrome can be improved significantly after preprocessing by this method. Furthermore, the proposed method is available as an open source package at https://github.com/zmzhang/MWFFT2.
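
The moving window and column-mode idea described above can be sketched in a few lines. For brevity this version computes the cross-correlation directly in the time domain instead of by fast Fourier transform, slides the window one point at a time, and breaks ties toward the smallest shift. Those choices and the function names are assumptions for illustration and are not taken from the MWFFT2 package.

```python
from statistics import mode

def best_shift(reference, sample, max_shift):
    """Lag s (in scan points) maximizing sum(reference[i] * sample[i + s]).
    A positive s means the sample feature appears later than in the reference."""
    n = len(reference)

    def score(s):
        return sum(reference[i] * sample[i + s]
                   for i in range(n) if 0 <= i + s < n)

    # Ties (e.g. in flat windows) are resolved toward the smallest |s|.
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: (score(s), -abs(s)))

def pointwise_shifts(reference, sample, window, max_shift):
    """Estimate one shift per scan point as the mode of the shifts found
    in every window that covers that point (a column of the shifts matrix)."""
    n = len(reference)
    votes = [[] for _ in range(n)]
    for start in range(n - window + 1):
        s = best_shift(reference[start:start + window],
                       sample[start:start + window], max_shift)
        for i in range(start, start + window):
            votes[i].append(s)
    return [mode(v) for v in votes]
```

Taking the mode over all covering windows is what makes the estimate robust: a few windows that see only part of a peak can vote for a wrong shift without changing the result.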


Subject(s)
Data Interpretation, Statistical , Drugs, Chinese Herbal/analysis , Fatty Acids, Nonesterified/blood , Metabolic Syndrome/blood , Scutellaria baicalensis/chemistry , Chromatography , Data Mining , Fourier Analysis , Humans
20.
Anal Chim Acta ; 862: 14-23, 2015 Mar 03.
Article in English | MEDLINE | ID: mdl-25682424

ABSTRACT

Variable (wavelength or feature) selection techniques have become a critical step in the analysis of datasets with a high number of variables and relatively few samples. In this study, a novel variable selection strategy, variable combination population analysis (VCPA), was proposed. This strategy consists of two crucial procedures. First, the exponentially decreasing function (EDF), which follows the simple and effective principle of 'survival of the fittest' from Darwin's natural evolution theory, is employed to determine the number of variables to keep and to continuously shrink the variable space. Second, in each EDF run, the binary matrix sampling (BMS) strategy, which gives each variable the same chance of being selected and generates different variable combinations, is used to produce a population of subsets from which a population of sub-models is constructed. Then, model population analysis (MPA) is employed to find the variable subsets with the lowest root mean square error of cross validation (RMSECV). The frequency with which each variable appears in the best 10% of sub-models is computed. The higher the frequency, the more important the variable. The performance of the proposed procedure was investigated using three real NIR datasets. The results indicate that VCPA is a good variable selection strategy when compared with four high performing variable selection methods: genetic algorithm-partial least squares (GA-PLS), Monte Carlo uninformative variable elimination by PLS (MC-UVE-PLS), competitive adaptive reweighted sampling (CARS) and iteratively retaining informative variables (IRIV). The MATLAB source code of VCPA is available for academic research on the website: http://www.mathworks.com/matlabcentral/fileexchange/authors/498750.
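
The two procedures named in the abstract above, EDF and BMS, can be sketched as follows. The decay form (an exponential chosen so that the count falls from `p` variables to `final` variables over `runs` steps) and the construction of the binary matrix (equal numbers of ones in every column, shuffled independently) are assumptions based on the abstract's description. This is an illustrative Python sketch and not the authors' MATLAB code.

```python
import math
import random

def edf_counts(p, final, runs):
    """Number of variables kept after each EDF run, decaying exponentially
    from p (before the first run) to `final` (after the last run)."""
    theta = math.log(p / final) / runs
    return [round(p * math.exp(-theta * i)) for i in range(1, runs + 1)]

def binary_matrix_sampling(n_models, n_vars, ratio, rng):
    """Return n_models rows of 0/1 masks over n_vars variables.
    Every variable is switched on in the same number of sub-models,
    so each has the same chance of being selected."""
    ones = round(n_models * ratio)
    columns = []
    for _ in range(n_vars):
        column = [1] * ones + [0] * (n_models - ones)
        rng.shuffle(column)  # independent permutation per variable
        columns.append(column)
    # Transpose: one row per sub-model.
    return [list(row) for row in zip(*columns)]
```

Each row of the returned matrix defines the variable subset of one sub-model. The sub-models would then be fitted and ranked by RMSECV, which is outside the scope of this sketch.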


Subject(s)
Models, Statistical , Algorithms , Calibration , Least-Squares Analysis , Monte Carlo Method , Multivariate Analysis