Results 1 - 9 of 9
1.
Commun Chem ; 6(1): 139, 2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37402835

ABSTRACT

Collision cross section (CCS) values derived from ion mobility spectrometry can be used to improve the accuracy of compound identification. Here, we have developed the Structure included graph merging with adduct method for CCS prediction (SigmaCCS) based on graph neural networks using 3D conformers as inputs. A model was trained, evaluated, and tested with >5,000 experimental CCS values. It achieved a coefficient of determination of 0.9945 and a median relative error of 1.1751% on the test set. A model-agnostic interpretation method and visualization of the learned representations were used to investigate the chemical rationality of SigmaCCS. An in-silico database with 282 million CCS values was generated for three different adduct types of 94 million compounds. Its source code is publicly available at https://github.com/zmzhang/SigmaCCS. Altogether, SigmaCCS is an accurate, rational, and off-the-shelf method to directly predict CCS values from molecular structures.
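
The two test-set metrics quoted in this abstract (coefficient of determination and median relative error) can be computed as in the minimal sketch below. The function names and the CCS values are hypothetical and chosen only for illustration; they are not taken from the SigmaCCS code or data.

```python
from statistics import median


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def median_relative_error_percent(y_true, y_pred):
    """Median of |predicted - experimental| / experimental, in percent."""
    return 100.0 * median(abs(p - t) / t for t, p in zip(y_true, y_pred))


# Hypothetical experimental and predicted CCS values (square angstroms).
experimental = [150.0, 200.0, 250.0, 300.0]
predicted = [153.0, 198.0, 250.0, 297.0]

print(r_squared(experimental, predicted))
print(median_relative_error_percent(experimental, predicted))
```

The median is less sensitive than the mean to a few badly predicted compounds, which is one plausible reason for reporting it alongside the coefficient of determination.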

2.
Nat Commun ; 14(1): 3722, 2023 Jun 22.
Article in English | MEDLINE | ID: mdl-37349295

ABSTRACT

Spectrum matching is the most common method for compound identification in mass spectrometry (MS). However, some challenges limit its efficiency, including the coverage of spectral libraries, the accuracy, and the speed of matching. In this study, a million-scale in-silico EI-MS library is established. Furthermore, an ultra-fast and accurate spectrum matching (FastEI) method is proposed to substantially improve accuracy using Word2vec spectral embedding and boost the speed using the hierarchical navigable small-world graph (HNSW). It achieves 80.4% recall@10 accuracy (88.3% with a 5 Da mass filter) with a speedup of two orders of magnitude compared with the weighted cosine similarity method (WCS). When FastEI is applied to identify molecules beyond the NIST 2017 library, it achieves 50% recall@1 accuracy. FastEI is packaged as standalone, user-friendly software for users with limited computational backgrounds. Overall, FastEI combined with a million-scale in-silico library facilitates compound identification as an accurate and ultra-fast tool.
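
As a rough illustration of the baseline and the metric named above, the sketch below implements a generic weighted cosine similarity between two centroided spectra and a recall@k score over ranked library hits. The weighting exponents (`mz_power`, `intensity_power`), the function names, and the toy spectra are assumptions made for this example; the abstract does not state the exact weighting used for WCS, and this is not the FastEI implementation.

```python
import math


def weighted_cosine(spec_a, spec_b, mz_power=1.0, intensity_power=0.5):
    """Cosine similarity after weighting each peak as mz**a * intensity**b.

    Spectra are dicts mapping integer m/z to intensity. The default
    exponents are illustrative assumptions, not values from the paper.
    """
    def weight(spec):
        return {mz: (mz ** mz_power) * (inten ** intensity_power)
                for mz, inten in spec.items()}

    wa, wb = weight(spec_a), weight(spec_b)
    dot = sum(wa[mz] * wb[mz] for mz in wa.keys() & wb.keys())
    norm_a = math.sqrt(sum(v * v for v in wa.values()))
    norm_b = math.sqrt(sum(v * v for v in wb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recall_at_k(ranked_hits, true_ids, k):
    """Fraction of queries whose true compound is among the top-k hits."""
    found = sum(1 for ranked, truth in zip(ranked_hits, true_ids)
                if truth in ranked[:k])
    return found / len(true_ids)


# Hypothetical query and library spectra.
query = {43: 100.0, 57: 60.0, 71: 30.0}
library = {
    "cmpd_1": {43: 95.0, 57: 65.0, 71: 25.0},
    "cmpd_2": {77: 100.0, 105: 80.0},
}
ranking = sorted(library, key=lambda cid: weighted_cosine(query, library[cid]),
                 reverse=True)
print(ranking)
print(recall_at_k([ranking], ["cmpd_1"], k=1))
```

An exhaustive scan like the `sorted` call above costs one similarity evaluation per library entry, which is the cost an approximate nearest-neighbor index such as HNSW is designed to avoid.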


Subjects
Algorithms, Electrons, Mass Spectrometry, Software, Gene Library
3.
Anal Chem ; 95(11): 4863-4870, 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36908216

ABSTRACT

Raman spectroscopy has been widely used to provide the structural fingerprint for molecular identification. Due to interference from coexisting components, noise, baseline, and systematic differences between spectrometers, component identification with Raman spectra is challenging, especially for mixtures. In this study, a method entitled DeepRaman has been proposed to solve those problems by combining the comparison ability of a pseudo-Siamese neural network (pSNN) and the input-shape flexibility of spatial pyramid pooling (SPP). DeepRaman was trained, validated, and tested with 41,564 augmented Raman spectra from two databases (pharmaceutical material and S.T. Japan). It can achieve 96.29% accuracy, 98.40% true positive rate (TPR), and 94.36% true negative rate (TNR) on the test set. Another six data sets measured on different instruments were used to evaluate the performance of the proposed method from different aspects. DeepRaman can provide accurate identification results and significantly outperform the hit quality index (HQI) method and other deep learning models. In addition, it performs well in cases of different spectral complexity and low-content components. Once the model is established, it can be used directly on different data sets without retraining or transfer learning. Furthermore, it also obtains promising results for the analysis of surface-enhanced Raman spectroscopy (SERS) data sets and Raman imaging data sets. In summary, it is an accurate, universal, and ready-to-use method for component identification in various application scenarios.
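
The accuracy, true positive rate (TPR), and true negative rate (TNR) reported above all follow from the confusion-matrix counts of a binary "component present / absent" decision. A minimal sketch, with a hypothetical function name and made-up labels, and assuming both classes occur in the data:

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, TPR and TNR for binary labels (1 = present, 0 = absent).

    Assumes at least one positive and one negative example in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "tpr": tp / (tp + fn),  # share of present components that are detected
        "tnr": tn / (tn + fp),  # share of absent components that are rejected
    }


# Hypothetical ground truth and predictions for eight spectrum pairs.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
print(confusion_metrics(y_true, y_pred))
```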

4.
Anal Chem ; 95(2): 612-620, 2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36597722

ABSTRACT

Region of interest (ROI) extraction is a fundamental step in analyzing metabolomic datasets acquired by liquid chromatography-mass spectrometry (LC-MS). However, noise and background signals in LC-MS data often affect the quality of extracted ROIs. Therefore, developing effective ROI evaluation algorithms is necessary to eliminate false positives while keeping the false-negative rate as low as possible. In this study, a deep fused filter of ROIs (dffROI) was proposed to improve the accuracy of ROI extraction by combining handcrafted evaluation metrics with convolutional neural network (CNN)-learned representations. To evaluate its performance, dffROI was compared with peakonly (CNN-learned representation) and five handcrafted metrics on three LC-MS datasets and a gas chromatography-mass spectrometry (GC-MS) dataset. Results show that dffROI can achieve higher accuracy, a better true-positive rate, and a lower false-positive rate. Its accuracy, true-positive rate, and false-positive rate are 0.9841, 0.9869, and 0.0186 on the test set, respectively. The classification error rate of dffROI (1.59%) is significantly reduced compared with peakonly (2.73%). The model-agnostic feature importance demonstrates the necessity of fusing handcrafted evaluation metrics with the convolutional neural network representations. dffROI is an automatic, robust, and universal method for ROI filtering by virtue of information fusion and end-to-end learning. It is implemented in the Python programming language and open-sourced at https://github.com/zhanghailiangcsu/dffROI under the BSD License. Furthermore, it has been integrated into the KPIC2 framework previously proposed by our group to facilitate real metabolomic LC-MS dataset analysis.


Subjects
Neural Networks (Computer), Tandem Mass Spectrometry, Liquid Chromatography, Algorithms, Gas Chromatography-Mass Spectrometry
5.
Talanta ; 244: 123415, 2022 Jul 01.
Article in English | MEDLINE | ID: mdl-35358897

ABSTRACT

DeepResolution (Deep learning-assisted multivariate curve Resolution) has been proposed to solve the co-eluting problem for GC-MS data. However, DeepResolution models must be retrained when encountering unknown components, which is undoubtedly time-consuming and burdensome. In this study, a new pipeline named DeepResolution2 was proposed to overcome these limitations. DeepResolution2 utilizes deep neural networks to divide the profile into segments, estimate the number of components in each segment, and predict the elution region of each component. Subsequently, the information obtained by these deep learning models is used to assist the multivariate curve resolution procedure. Only seven models (1 + 1 + 5) are required to automate the whole analysis procedure of untargeted GC-MS data, which is an important improvement over DeepResolution. These seven models are stable and universal. Once established, they can be used to resolve most GC-MS data. Compared with MS-DIAL, ADAP-GC, and AMDIS, DeepResolution2 can obtain more reasonable mass spectra, chromatograms and peak areas to identify and quantify compounds. DeepResolution2 (0.955) outperformed AMDIS (0.939), MS-DIAL (0.948) and ADAP-GC (0.860) in terms of the linear correlation between concentrations and peak areas on overlapped peaks in the fatty acid dataset. In real biological samples of human male infertility plasma, the peak areas and mass spectra of 136 untargeted GC-MS files were automatically extracted by DeepResolution2 without any prior information or manual intervention. DeepResolution2 includes all the functions for analyzing untargeted GC-MS datasets, from the feature extraction of raw data files to the establishment of discriminant models.
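
The comparison above scores each tool by the linear correlation between known concentrations and extracted peak areas. The abstract does not say whether the reported values are Pearson r or its square, so the sketch below simply assumes Pearson r; the function name and the calibration data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical calibration series: concentration versus resolved peak area.
concentrations = [1.0, 2.0, 3.0, 4.0, 5.0]
peak_areas = [10.2, 19.8, 30.5, 39.9, 50.1]
print(pearson_r(concentrations, peak_areas))
```

A resolution method that splits overlapped peaks poorly tends to misassign area between co-eluting compounds, which lowers this correlation even when the total signal is preserved.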


Subjects
Deep Learning, Fatty Acids, Gas Chromatography-Mass Spectrometry/methods, Humans, Male, Mass Spectrometry, Neural Networks (Computer)
6.
J Chromatogr A ; 1656: 462536, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34563892

ABSTRACT

The combination of retention time (RT), accurate mass and tandem mass spectra can improve the structural annotation in untargeted metabolomics. However, the incorporation of RT for metabolite identification has received less attention because of the limited availability of RT data, especially for hydrophilic interaction liquid chromatography (HILIC). Here, Graph Neural Network-based Transfer Learning (GNN-TL) is proposed to train a model for HILIC RT prediction. The graph neural network was pre-trained using an in silico HILIC RT dataset (pseudo-labeling dataset) with ∼306 K molecules. Then, the weights of dense layers in the pre-trained GNN (pre-GNN) model were fine-tuned by transfer learning using a small number of experimental HILIC RTs from the target chromatographic system. The GNN-TL outperformed the methods in Retip, including the Random Forest (RF), Bayesian-regularized neural network (BRNN), XGBoost, light gradient-boosting machine (LightGBM), and Keras. It achieved the lowest mean absolute error (MAE) of 38.6 s on the test set and 33.4 s on an additional test set. It has the best ability to generalize, with a small performance difference between training, test, and additional test sets. Furthermore, the predicted RTs can filter out nearly 60% of false-positive candidates on average, which is valuable for the identification of compounds complementary to mass spectrometry.
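
The two ideas in this abstract that lend themselves to a short example are the mean absolute error and the use of predicted RTs to discard candidate structures. The sketch below assumes a simple symmetric tolerance window around the observed RT; the window width, function names, and candidate list are hypothetical and are not taken from the GNN-TL paper.

```python
def mean_absolute_error(y_true, y_pred):
    """Mean of |experimental - predicted| retention times, in seconds."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def filter_by_rt(candidates, observed_rt, tolerance_s):
    """Keep candidates whose predicted RT lies within the tolerance window.

    candidates is a list of (name, predicted_rt_seconds) tuples.
    """
    return [name for name, rt in candidates
            if abs(rt - observed_rt) <= tolerance_s]


# Hypothetical candidates that share the same accurate mass.
candidates = [
    ("cand_A", 310.0),
    ("cand_B", 455.0),
    ("cand_C", 298.0),
    ("cand_D", 120.0),
    ("cand_E", 600.0),
]
kept = filter_by_rt(candidates, observed_rt=300.0, tolerance_s=80.0)
removed_fraction = 1 - len(kept) / len(candidates)
print(kept, removed_fraction)
```

In practice the tolerance would need to be tied to the model's error on the target chromatographic system: too narrow a window risks discarding the true compound, too wide a window removes few false positives.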


Subjects
Neural Networks (Computer), Tandem Mass Spectrometry, Bayes Theorem, Liquid Chromatography, Hydrophobic and Hydrophilic Interactions, Machine Learning
7.
J Chromatogr A ; 1635: 461713, 2021 Jan 04.
Article in English | MEDLINE | ID: mdl-33229011

ABSTRACT

Gas chromatography-mass spectrometry (GC-MS) is one of the major platforms for analyzing volatile compounds in complex samples. However, automatic and accurate extraction of qualitative and quantitative information is still challenging when analyzing complex GC-MS data, especially for components incompletely separated by chromatography. Deep-Learning-Assisted Multivariate Curve Resolution (DeepResolution) was proposed in this study. It essentially consists of convolutional neural network (CNN) models that determine the number of components of each overlapped peak and the elution region of each compound. With the assistance of the predicted elution regions, the informative regions (such as the selective region and zero-concentration region) of each compound can be located precisely. Then, full rank resolution (FRR), multivariate curve resolution-alternating least squares (MCR-ALS) or iterative target transformation factor analysis (ITTFA) can be chosen adaptively to resolve the overlapped components without manual intervention. The results showed that DeepResolution has superior compound identification capability and better quantitative performance than MS-DIAL, ADAP-GC and AMDIS. It was also found that baseline levels, interferents, component concentrations and peak tailing have little influence on the resolution results. In addition, DeepResolution can be extended easily when encountering unknown component(s), owing to the independence of each CNN model. All procedures of DeepResolution can be performed automatically, and adaptive selection of resolution methods ensures a balance between resolution power and computation time. It is implemented in Python and available at https://github.com/XiaqiongFan/DeepResolution.


Subjects
Deep Learning, Gas Chromatography-Mass Spectrometry/methods, Multivariate Analysis, Least-Squares Analysis, Neural Networks (Computer)
8.
Analyst ; 144(5): 1789-1798, 2019 Feb 25.
Article in English | MEDLINE | ID: mdl-30672931

ABSTRACT

Raman spectroscopy is widely used as a fingerprint technique for molecular identification. However, Raman spectra contain molecular information from multiple components and interferences from noise and instrumentation. Thus, component identification using Raman spectra is still challenging, especially for mixtures. In this study, a novel approach entitled deep learning-based component identification (DeepCID) was proposed to solve this problem. Convolutional neural network (CNN) models were established to predict the presence of components in mixtures. Comparative studies showed that DeepCID could learn spectral features and identify components in both simulated and real Raman spectral datasets of mixtures with higher accuracy and significantly lower false positive rates. In addition, DeepCID showed better sensitivity when compared with the logistic regression (LR) with L1-regularization, k-nearest neighbor (kNN), random forest (RF) and back propagation artificial neural network (BP-ANN) models for ternary mixture spectral datasets. In conclusion, DeepCID is a promising method for solving the component identification problem in the Raman spectra of mixtures.

9.
J Sep Sci ; 39(17): 3457-68, 2016 Sep.
Article in English | MEDLINE | ID: mdl-27384131

ABSTRACT

Xue Fu Zhu Yu Decoction, a famous formula that has been used for treating many blood stasis-caused diseases for many centuries, comprises 11 kinds of traditional Chinese medicines. A convenient, efficient, and rapid analytical method was developed to simultaneously determine the major compounds in this decoction. An ultra-high performance liquid chromatography with hybrid ion trap time-of-flight mass spectrometry method was used to rapidly separate and detect the major constituents of the decoction. Using this technique, we identified or tentatively identified 34 compounds, including 21 flavonoids, 5 terpenoids, 3 organic acids, 2 lactones, 1 alkaloid, 1 amino acid, and 1 cyanogenic glycoside. The MS analysis of these constituents was described in detail. Findings may contribute to future metabolic and pharmacokinetic studies of this medicine.


Subjects
High-Pressure Liquid Chromatography/methods, Chinese Herbal Drugs/chemistry, Mass Spectrometry/methods, Medicinal Plants/chemistry, Traditional Chinese Medicine
...