1.
PeerJ Comput Sci ; 10: e2073, 2024.
Article in English | MEDLINE | ID: mdl-38855250

ABSTRACT

Metabolomics data typically combine high-dimensional features with a small sample size, which makes them a characteristic example of high-dimensional small-sample (HDSS) data. Excessive dimensionality leads to the curse of dimensionality, and a small sample size tends to cause overfitting; both hinder deeper mining of metabolomics data. Feature selection is a valuable technique for handling the challenges that HDSS data pose. To address the feature selection problem for HDSS data in metabolomics, a hybrid method that combines Max-Relevance and Min-Redundancy (mRMR) with multi-objective particle swarm optimization (MCMOPSO) is proposed. Experiments on metabolomics data and several public datasets from the University of California, Irvine (UCI) repository show that MCMOPSO selects small subsets of high-quality features by efficiently eliminating irrelevant and redundant features. MCMOPSO is therefore a powerful approach for selecting features from high-dimensional metabolomics data with limited sample sizes.
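To make the mRMR criterion named in the abstract concrete, the following is a minimal sketch of greedy mRMR selection on discretized features. It covers only the mRMR idea, not the multi-objective particle swarm stage, and it is not the authors' implementation. The function names, the difference form of the score (relevance minus mean redundancy), and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr(features, target, k):
    """Greedy mRMR. `features` maps a name to a list of discrete values."""
    relevance = {
        name: mutual_information(col, target)
        for name, col in features.items()
    }
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        # Mean mutual information with the features chosen so far.
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: m1_dup duplicates m1, m2 is weaker but less redundant.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "m1": [0, 0, 0, 1, 1, 1, 1, 1],
    "m1_dup": [0, 0, 0, 1, 1, 1, 1, 1],
    "m2": [0, 1, 0, 0, 1, 1, 0, 1],
}
print(mrmr(features, target, 2))
```

On this toy data the duplicate feature is penalized for redundancy, so the second pick is the weaker but complementary feature rather than the copy of the first.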

2.
Sci Rep ; 14(1): 152, 2024 01 02.
Article in English | MEDLINE | ID: mdl-38168582

ABSTRACT

Data analysis often has to contend with large numbers of missing values, and the problem is especially prominent in metabolomics data. Imputation is a common way to handle missing metabolomics data, but traditional imputation methods usually ignore the differences between types of missingness, so their results are unsatisfactory. To discriminate between the missing types in metabolomics data, this paper proposes a missing data classification model (PX-MDC) based on the particle swarm algorithm and XGBoost. First, the largest subset of complete data is obtained from a given incomplete data set by panning the missing values. The particle swarm algorithm is then used to search for the concentration threshold of the missing data and for the proportion of low-concentration deletions among all deletions. Next, missing data are simulated based on the search results. Finally, an XGBoost model is trained on the simulated data with the feature set proposed in the paper to build a classifier for the missing data. The experimental results show that the particle swarm algorithm matches the traditional enumeration method in accuracy while significantly reducing the time needed for the concentration threshold search. Compared with current mainstream methods, the PX-MDC model achieves higher accuracy and can distinguish different deletion types for the same metabolite. The study is expected to advance metabolomics data imputation and to support research in related fields.


Subjects
Algorithms , Metabolomics , Metabolomics/methods
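The abstract above describes a particle swarm search over two quantities: a concentration threshold and the proportion of low-concentration deletions. The sketch below shows a generic, textbook particle swarm optimizer over a bounded two-parameter space. It is not the PX-MDC code: the objective function is a hypothetical placeholder with a known minimum, and the inertia and acceleration constants (`w`, `c1`, `c2`) are common default choices, not values taken from the paper.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over the box `bounds` with a basic PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position overall

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


# Hypothetical placeholder objective with its minimum at (0.3, 0.8).
# A real objective would score how well simulated missingness matches the data.
def toy_objective(params):
    threshold, low_fraction = params
    return (threshold - 0.3) ** 2 + (low_fraction - 0.8) ** 2


best, best_val = pso(toy_objective, bounds=[(0.0, 1.0), (0.0, 1.0)])
print(best, best_val)
```

Compared with enumerating a grid of threshold and proportion pairs, a swarm evaluates the objective only at the particles' positions, which is the source of the search-time reduction the abstract reports.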
3.
Math Biosci Eng ; 20(8): 14395-14413, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37679141

ABSTRACT

Dose-effect relationship analysis of traditional Chinese medicine (TCM) is crucial to the modernization of TCM. However, TCM data are complex and nonlinear and exhibit problems such as multicollinearity, which makes dose-effect relationship analysis challenging. Partial least squares can be applied to data with multicollinearity, but the principal components it extracts internally cannot adequately express the nonlinear characteristics of TCM data. To address this issue, this paper proposes an analytical model based on a deep Boltzmann machine (DBM) and partial least squares. The model uses the DBM to extract nonlinear features from the feature space, substitutes them for the components in partial least squares, and then performs a multiple linear regression, which makes it suitable for analyzing the dose-effect relationship of TCM. The model was evaluated using experimental data from Ma Xing Shi Gan Decoction and datasets from the UCI Machine Learning Repository. The experimental results show that the prediction accuracy of the model based on the DBM and partial least squares is on average 10% higher than that of existing methods.


Subjects
Machine Learning , Traditional Chinese Medicine , Least-Squares Analysis , Linear Models
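One way to read the substitution described in the abstract above is sketched in the notation below. The first line is the usual partial least squares decomposition, in which the score components T are linear in X. The second line is an assumed reading of the proposed model, in which DBM hidden features H take the place of T before an ordinary least-squares fit. All symbols are introduced here for illustration and are not taken from the paper, and the closed-form estimate assumes that the matrix H-transpose-H is invertible.

```latex
% Standard PLS: latent score components T are linear in X
X = T P^{\top} + E, \qquad y = T q + f

% Assumed reading of the abstract: DBM hidden features H replace T
H = g_{\mathrm{DBM}}(X), \qquad y = H \beta + \varepsilon, \qquad
\hat{\beta} = (H^{\top} H)^{-1} H^{\top} y
```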
4.
Article in English | MEDLINE | ID: mdl-34880918

ABSTRACT

Text similarity calculation is a core task in commercial artificial intelligence applications such as traditional Chinese medicine (TCM) auxiliary diagnosis, intelligent question answering, and prescription recommendation. However, TCM texts suffer from short sentences, inaccurate word segmentation, strong semantic relevance, high feature dimensionality, and sparseness. This study takes the temporal information of sentence context into account and proposes a TCM text similarity calculation model based on the bidirectional temporal Siamese network (BTSN). The enhanced representation through knowledge integration (ERNIE) pretrained language model was used to train character vectors instead of word vectors, which solved the problem of inaccurate word segmentation in TCM. In the Siamese network, the traditional fully connected neural network was replaced by a deep bidirectional long short-term memory (BLSTM) network to capture the contextual semantics of the current word. The improved BLSTM was used to map the sentences to be tested into two sets of low-dimensional numerical vectors, on which similarity calculation training was then performed. Experiments on two datasets, one financial and one TCM, show that the BTSN model outperformed other similarity calculation models. Accuracy was highest when the BLSTM had 6 layers. These results indicate that the proposed text similarity calculation model has high engineering value.
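The Siamese arrangement in the abstract above can be illustrated in a few lines: both sentences pass through one shared encoder, and a similarity score is computed on the two resulting vectors. In the sketch below, a character n-gram count vector stands in for the ERNIE character embeddings and the deep BLSTM encoder, so it shows only the shape of the pipeline and involves no training. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def encode(sentence, n=2):
    """Stand-in encoder: character n-gram counts (NOT ERNIE + BLSTM)."""
    chars = sentence.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = (sqrt(sum(c * c for c in u.values()))
            * sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def siamese_similarity(a, b):
    # Both inputs go through the SAME encoder; sharing it is what makes
    # the arrangement Siamese.
    return cosine(encode(a), encode(b))


print(siamese_similarity("dry cough with fever", "fever with dry cough"))
print(siamese_similarity("dry cough with fever", "joint pain"))
```

Working at the character level avoids any dependence on a word segmenter, which is the motivation the abstract gives for training character vectors.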

5.
Comput Math Methods Med ; 2020: 8308173, 2020.
Article in English | MEDLINE | ID: mdl-32328156

ABSTRACT

The basic experimental data of traditional Chinese medicine are generally obtained by high-performance liquid chromatography and mass spectrometry. The data are often high-dimensional with few samples and contain many irrelevant and redundant features, which makes in-depth exploration of Chinese medicine material information challenging. A hybrid feature selection method based on an iterative approximate Markov blanket (CI_AMB) is proposed. First, the method uses the maximal information coefficient to measure the correlation between the features and the target variable and filters out irrelevant features according to the evaluation criteria. Then, the iterative approximate Markov blanket strategy analyzes the redundancy between features and eliminates the redundant ones, finally yielding an effective feature subset. Comparative experiments on traditional Chinese medicine material basic experimental data and multiple public UCI datasets show that the new method is better at selecting a small number of highly explanatory features than Lasso, XGBoost, and the classic approximate Markov blanket method.


Subjects
Pharmaceutical Databases/statistics & numerical data , Chinese Herbal Drugs/chemistry , Automated Pattern Recognition/statistics & numerical data , Algorithms , Artificial Intelligence , High Pressure Liquid Chromatography , Computational Biology , Humans , Markov Chains , Mass Spectrometry , Traditional Chinese Medicine/statistics & numerical data
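The two stages described in the abstract above (filter irrelevant features, then remove redundant ones with an approximate Markov blanket) can be sketched as follows. This is a single-pass filter in the style of the classic approximate Markov blanket, not the iterative CI_AMB procedure. Absolute Pearson correlation is swapped in for the maximal information coefficient to keep the example dependency-free, and the function names, the `min_relevance` threshold, and the toy data are all assumptions.

```python
from math import sqrt


def abs_corr(xs, ys):
    """Absolute Pearson correlation (stand-in for the MIC measure)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return abs(sxy) / sqrt(sxx * syy) if sxx and syy else 0.0


def select_features(features, target, min_relevance=0.3):
    """Relevance filter followed by approximate-Markov-blanket pruning."""
    relevance = {n: abs_corr(col, target) for n, col in features.items()}
    # Stage 1: drop irrelevant features, rank the rest by relevance.
    ranked = sorted(
        (n for n in features if relevance[n] >= min_relevance),
        key=lambda n: (-relevance[n], n),
    )
    kept = []
    # Stage 2: a kept feature k (at least as relevant as `name`) forms an
    # approximate Markov blanket for `name` when corr(k, name) is at least
    # the relevance of `name`; such a `name` is treated as redundant.
    for name in ranked:
        if all(abs_corr(features[k], features[name]) < relevance[name]
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: the target depends on peak_a and peak_b only.
target = [0, 1, 2, 3, 0, 1, 2, 3]
features = {
    "peak_a": [0, 0, 1, 1, 0, 0, 1, 1],
    "peak_a_dup": [0, 0, 2, 2, 0, 0, 2, 2],  # rescaled copy of peak_a
    "peak_b": [0, 1, 0, 1, 0, 1, 0, 1],
    "noise": [1, 0, 0, 1, 1, 0, 0, 1],        # uncorrelated with target
}
print(select_features(features, target))
```

In this toy case the noise column is removed by the relevance filter and the rescaled copy is removed by the redundancy check, leaving the two informative columns.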