Results 1 - 7 of 7
1.
MethodsX ; 11: 102289, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37560402

ABSTRACT

Some statistical analysis techniques may require complete data matrices, but a frequent problem in the construction of databases is the incomplete collection of information, for different reasons. One option for tackling the problem is to estimate and impute the missing data. This paper describes a form of imputation that mixes regression with lower-rank approximation. To improve the quality of the imputations, a generalisation is proposed that replaces the singular value decomposition (SVD) of the matrix with a regularised SVD in which the regularisation parameter is estimated by cross-validation. To evaluate the performance of the proposal, ten sets of real data from multi-environment trials were used. Missing values were created in each set at four percentages of missing not at random, and three criteria were then considered to investigate the effectiveness of the proposal. The results show that the regularised method is very competitive when compared with the original method, beating it in several of the considered scenarios. As it is a very general scheme, its application can be extended to any multivariate data matrix.
• The imputation method is modified through the inclusion of a stable and efficient computational algorithm that replaces the classical SVD least-squares criterion with a penalised criterion. This penalty produces smoothed eigenvectors and eigenvalues that avoid overfitting problems, improving the performance of the method when the penalty is necessary. The size of the penalty can be determined by minimising one of the following criteria: the prediction errors, the Procrustes similarity statistic, or the critical angles between subspaces of principal components.
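As a rough illustration of the idea described above, the iterative scheme (fill missing cells, take a rank-k SVD with penalised singular values, re-impute) can be sketched as follows. The function name, the soft-thresholding penalty and the column-mean initialisation are assumptions for this sketch, not the authors' exact algorithm:

```python
import numpy as np

def regularized_svd_impute(X, rank=2, lam=1.0, n_iter=200, tol=1e-6):
    """EM-style imputation sketch: iteratively replace missing entries with a
    low-rank reconstruction whose singular values are shrunk by lam."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    filled = np.where(miss, np.nanmean(X, axis=0), X)  # start from column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_shrunk = np.maximum(s[:rank] - lam, 0.0)     # penalised singular values
        approx = (U[:, :rank] * s_shrunk) @ Vt[:rank]
        new = np.where(miss, approx, X)                # observed cells stay fixed
        if np.linalg.norm(new - filled) < tol:
            filled = new
            break
        filled = new
    return filled
```

With `lam=0` this reduces to plain iterative SVD imputation; the cross-validated choice of `lam` from the paper is not reproduced here.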

2.
MethodsX ; 9: 101683, 2022.
Article in English | MEDLINE | ID: mdl-35478595

ABSTRACT

This paper describes strategies to reduce the possible effect of outliers on the quality of imputations produced by a method that uses a mixture of two least-squares techniques: regression and lower-rank approximation of a matrix. To avoid the influence of discrepant data and maintain the computational speed of the original scheme, pre-processing options were explored before applying the imputation method. The first proposal is to first apply a robust singular value decomposition; the second is to detect outliers and then treat the potential outliers as missing. To evaluate the proposed methods, a cross-validation study was carried out on ten complete matrices of real data from multi-environment trials. The imputations were compared with the original data using three statistics: a measure of goodness of fit, the squared cosine between matrices, and the prediction error. The results show that the original method should be replaced by one of the options presented here, because outliers can cause low-quality imputations or convergence problems.
• The imputation algorithm based on Gabriel's cross-validation method uses two least-squares techniques that can be affected by the presence of outliers. The inclusion of a robust singular value decomposition makes it possible both to robustify the procedure and to detect outliers so that they can later be treated as missing. These forms of pre-processing ensure that the algorithm performs well on any dataset in matrix form with suspected contamination.
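The second pre-processing option (detect outliers, then treat them as missing) can be sketched roughly as follows. A robust z-score rule on low-rank residuals is used here as an assumed stand-in for the paper's robust SVD; the function name and threshold are illustrative:

```python
import numpy as np

def flag_outliers_as_missing(X, rank=1, z_cut=3.5):
    """Pre-processing sketch: fit a low-rank approximation, compute residuals,
    and mark cells with large robust z-scores (median/MAD) as missing."""
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    resid = X - approx
    med = np.median(resid)
    mad = np.median(np.abs(resid - med)) + 1e-12   # robust scale estimate
    z = 0.6745 * (resid - med) / mad               # robust z-score
    out = np.abs(z) > z_cut
    cleaned = X.copy()
    cleaned[out] = np.nan                          # treat suspects as missing
    return cleaned, out
```

The cleaned matrix, with suspect cells set to missing, would then be passed to the imputation algorithm.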

3.
Sci Rep ; 11(1): 2294, 2021 01 27.
Article in English | MEDLINE | ID: mdl-33504863

ABSTRACT

Texture features are designed to quantitatively evaluate patterns in the spatial distribution of image pixels for purposes of image analysis and interpretation. Unexplained variations in texture patterns often lead to misinterpretation and undesirable consequences in medical image analysis. In this paper we explore the ability of machine learning (ML) methods to design a radiology test for osteoarthritis (OA) at an early stage, when the number of patients' cases is small. In our experiments we use high-resolution X-ray images of patients' knees graded with Kellgren-Lawrence scores progressing from 1. The existing ML methods have provided limited diagnostic accuracy, whilst the proposed Group Method of Data Handling strategy of Deep Learning has significantly extended the diagnostic test. The comparative experiments demonstrate that the proposed framework using the Zernike-based texture features has improved the diagnostic accuracy by 11% on average. This allows us to conclude that the designed model for early diagnosis of OA will provide more accurate radiology tests, although a new study will be required when a large number of patients' cases becomes available.
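A minimal sketch of one Zernike-based texture feature, the magnitude of a single Zernike moment, which is rotation-invariant, might look like this. The normalisation and patch handling are simplified assumptions, not the paper's exact feature pipeline:

```python
import numpy as np
from math import factorial

def zernike_moment(img, n, m):
    """Magnitude |A_nm| of the Zernike moment of a square grayscale patch,
    a rotation-invariant descriptor (requires n - |m| even and non-negative)."""
    N = img.shape[0]
    ys, xs = np.mgrid[:N, :N]
    x = (2 * xs - N + 1) / (N - 1)       # map pixel grid onto [-1, 1]
    y = (2 * ys - N + 1) / (N - 1)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    disk = rho <= 1.0                    # the Zernike basis lives on the unit disk
    R = np.zeros_like(rho)               # radial polynomial R_nm(rho)
    for k in range((n - abs(m)) // 2 + 1):
        c = ((-1) ** k * factorial(n - k)
             / (factorial(k)
                * factorial((n + abs(m)) // 2 - k)
                * factorial((n - abs(m)) // 2 - k)))
        R += c * rho ** (n - 2 * k)
    V_conj = R * np.exp(-1j * m * theta)  # conjugate basis function V*_nm
    A = (n + 1) / np.pi * np.sum(img[disk] * V_conj[disk])
    return abs(A)
```

A feature vector for a knee-image patch would stack several such magnitudes over different (n, m) orders before feeding them to a classifier.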


Subjects
Deep Learning; Machine Learning; Osteoarthritis, Knee/pathology; Humans; Mathematics; Neural Networks, Computer
4.
Int J Neural Syst ; 28(6): 1750064, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29370728

ABSTRACT

The brain activity observed on EEG electrodes is influenced by volume conduction and by the functional connectivity of a person performing a task. When the task is a biometric test, the EEG signals represent a unique "brain print", which is defined by the functional connectivity represented by the interactions between electrodes, whilst the conduction components cause trivial correlations. Orthogonalization using autoregressive modeling minimizes the conduction components, and the residuals are then related to features correlated with the functional connectivity. However, the orthogonalization can be unreliable for high-dimensional EEG data. We have found that the dimensionality can be significantly reduced if the baselines required for estimating the residuals are modeled using the relevant electrodes. In our approach, the required models are learnt by a Group Method of Data Handling (GMDH) algorithm which we have made capable of discovering reliable models from multidimensional EEG data. In our experiments on the EEG-MMI benchmark data, which include 109 participants, the proposed method has correctly identified all the subjects and provided a statistically significant ([Formula: see text]) improvement in identification accuracy. The experiments have shown that the proposed GMDH method can learn new features from multi-electrode EEG data which are capable of improving the accuracy of biometric identification.
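One self-organising GMDH layer (fit a quadratic model on every pair of inputs and keep the candidates with the lowest validation error) can be sketched as follows. The quadratic candidate form and the selection rule are generic GMDH conventions, not the paper's exact algorithm:

```python
import numpy as np
from itertools import combinations

def gmdh_layer(X_tr, y_tr, X_va, y_va, keep=4):
    """One GMDH layer: for every feature pair, fit a quadratic model by least
    squares on the training split and rank candidates by validation MSE."""
    cands = []
    for i, j in combinations(range(X_tr.shape[1]), 2):
        def design(X):
            a, b = X[:, i], X[:, j]
            # quadratic Ivakhnenko-style candidate: 1, a, b, ab, a^2, b^2
            return np.column_stack([np.ones_like(a), a, b, a * b, a * a, b * b])
        w, *_ = np.linalg.lstsq(design(X_tr), y_tr, rcond=None)
        err = np.mean((design(X_va) @ w - y_va) ** 2)
        cands.append((err, design(X_tr) @ w, design(X_va) @ w))
    cands.sort(key=lambda c: c[0])                 # external selection criterion
    top = cands[:keep]
    new_tr = np.column_stack([c[1] for c in top])  # survivors feed the next layer
    new_va = np.column_stack([c[2] for c in top])
    return new_tr, new_va, top[0][0]               # best validation MSE so far
```

Stacking such layers until the best validation error stops improving yields the self-organised polynomial network that GMDH is known for.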


Subjects
Biometric Identification/methods; Brain/physiology; Electroencephalography; Neural Networks, Computer; Signal Processing, Computer-Assisted; Algorithms; Axon Guidance/physiology; Brain Mapping; Humans; Regression Analysis
5.
Comput Methods Programs Biomed ; 111(3): 602-12, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23849498

ABSTRACT

Trauma and Injury Severity Score (TRISS) models have been developed for predicting the survival probability of injured patients, the majority of whom sustain up to three injuries across six body regions. Practitioners have noted that the accuracy of TRISS predictions is unacceptable for patients with a larger number of injuries. Moreover, the TRISS method is incapable of providing accurate estimates of the predictive density of survival, which are required for calculating confidence intervals. In this paper we propose Bayesian inference for estimating the desired predictive density. The inference is based on decision tree models which split the data along explanatory variables, making these models interpretable. The proposed method has outperformed the TRISS method in terms of prediction accuracy on the cases recorded in the US National Trauma Data Bank. The developed method has been made available as a stand-alone application for evaluation purposes.
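For reference, the classical TRISS baseline that the paper improves upon is a logistic model of the Revised Trauma Score, the Injury Severity Score and an age indicator. The coefficients below are the commonly cited blunt-trauma values and are an assumption for this sketch; verify them against the original TRISS literature before any real use:

```python
import math

# Commonly cited TRISS coefficients for blunt trauma (assumed values).
B0, B_RTS, B_ISS, B_AGE = -0.4499, 0.8085, -0.0835, -1.7430

def triss_survival(rts, iss, age_index):
    """TRISS survival probability: logistic model of the Revised Trauma Score
    (rts), Injury Severity Score (iss) and age indicator (1 if age >= 55)."""
    b = B0 + B_RTS * rts + B_ISS * iss + B_AGE * age_index
    return 1.0 / (1.0 + math.exp(-b))
```

The point estimate above carries no predictive density, which is exactly the limitation the paper's Bayesian decision-tree inference addresses.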


Subjects
Bayes Theorem; Survival Analysis; Wounds and Injuries/physiopathology; Age Factors; Algorithms; Decision Trees; Female; Humans; Male; Probability; Uncertainty; United States/epidemiology; Wounds and Injuries/mortality
6.
IEEE Trans Inf Technol Biomed ; 11(3): 312-9, 2007 May.
Article in English | MEDLINE | ID: mdl-17521081

ABSTRACT

Bayesian averaging (BA) over ensembles of decision models allows evaluation of the uncertainty of decisions, which is of crucial importance for safety-critical applications such as medical diagnostics. The interpretability of the ensemble can also give useful information to experts responsible for making reliable decisions. For this reason, decision trees (DTs) are attractive decision models for experts. However, BA over such models makes an ensemble of DTs uninterpretable. In this paper, we present a new approach to probabilistic interpretation of Bayesian DT ensembles. This approach is based on the quantitative evaluation of the uncertainty of the DTs, and it allows experts to find a DT that provides high predictive accuracy and confident outcomes. To make the BA over DTs feasible in our experiments, we use a Markov chain Monte Carlo technique with a reversible-jump extension. The results obtained on clinical data show that, in terms of predictive accuracy, the proposed method outperforms the maximum a posteriori (MAP) method that has been suggested for interpretation of DT ensembles.
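A toy illustration of averaging an ensemble of tree predictions and picking a single representative member might look like this. The closest-to-average criterion is an assumed simplification of the paper's uncertainty-based selection, not its actual procedure:

```python
import numpy as np

def most_representative_tree(P):
    """P[i, j]: probability of class 1 that ensemble member i assigns to case j.
    The BA prediction is the column mean over members; the member whose
    predictions lie closest to that mean is returned as a single
    interpretable stand-in for the whole ensemble."""
    P = np.asarray(P, dtype=float)
    avg = P.mean(axis=0)                    # Bayesian-averaged probabilities
    dist = np.mean((P - avg) ** 2, axis=1)  # each member vs the average
    return int(np.argmin(dist)), avg
```

In the paper, the members would be posterior samples drawn by reversible-jump MCMC rather than an arbitrary collection of trees.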


Subjects
Algorithms; Artificial Intelligence; Bayes Theorem; Decision Support Systems, Clinical; Decision Support Techniques; Diagnosis, Computer-Assisted/methods; Pattern Recognition, Automated/methods; Monte Carlo Method
7.
Article in English | MEDLINE | ID: mdl-17044169

ABSTRACT

Microarrays have become a standard tool for investigating gene function and more complex microarray experiments are increasingly being conducted. For example, an experiment may involve samples from several groups or may investigate changes in gene expression over time for several subjects, leading to large three-way data sets. In response to this increase in data complexity, we propose some extensions to the plaid model, a biclustering method developed for the analysis of gene expression data. This model-based method lends itself to the incorporation of any additional structure such as external grouping or repeated measures. We describe how the extended models may be fitted and illustrate their use on real data.
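The additive structure of one plaid-model layer (an overall effect plus row and column effects on a bicluster) can be sketched as follows. The membership sets are taken as given here, whereas the full method also estimates them and the extensions add grouping or repeated-measures structure:

```python
import numpy as np

def fit_plaid_layer(Z, rows, cols):
    """One plaid-model layer: within the bicluster (rows x cols), fit the
    additive model mu + alpha_i + beta_j by least squares and peel it off."""
    sub = Z[np.ix_(rows, cols)]
    mu = sub.mean()
    alpha = sub.mean(axis=1) - mu          # row (gene) effects
    beta = sub.mean(axis=0) - mu           # column (sample) effects
    layer = mu + alpha[:, None] + beta[None, :]
    resid = Z.copy()
    resid[np.ix_(rows, cols)] -= layer     # residual matrix for the next layer
    return resid, (mu, alpha, beta)
```

Fitting proceeds layer by layer on the residuals, which is how the plaid model accumulates overlapping biclusters.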


Subjects
Cluster Analysis; Models, Genetic; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Computer Simulation; Databases, Genetic; Gene Expression Profiling; Humans; Tuberculosis, Meningeal/genetics; Tuberculosis, Pulmonary/complications; Tuberculosis, Pulmonary/genetics