Search | VHL Regional Portal (BVS)

Enhancing the prediction of IDC breast cancer staging from gene expression profiles using hybrid feature selection methods and deep learning architecture.

Kishore, Akash; Venkataramana, Lokeswari; Prasad, D Venkata Vara; Mohan, Akshaya; Jha, Bhavya.

Med Biol Eng Comput ; 61(11): 2895-2919, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37530887

ABSTRACT

Predicting the stage of cancer plays an important role in planning the course of treatment, and has largely relied on imaging tools, which do not capture the molecular events that cause cancer progression. Gene-expression-based analyses are able to identify these events, allowing RNA-sequencing and microarray cancer data to be used for cancer analyses. Breast cancer is the most common cancer worldwide, and is classified into four stages: stages 1, 2, 3, and 4 [2]. While machine learning models have previously been explored for stage classification with limited success, multi-class stage classification has not seen significant progress. There is a need for improved multi-class classification models, for example by investigating deep learning models. Gene-expression-based cancer data is characterised by the small size of available datasets, class imbalance, and high dimensionality, so class balancing methods must be applied to the dataset. Since not all genes are necessary for stage prediction, retaining only the necessary genes can improve classification accuracy. The breast cancer samples are to be classified into four classes, stages 1 to 4. Invasive ductal carcinoma (IDC) breast cancer samples are obtained from The Cancer Genome Atlas (TCGA) and Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) datasets and combined. Two class balancing techniques are explored: synthetic minority oversampling technique (SMOTE), and SMOTE followed by random undersampling. A hybrid feature selection approach is proposed, with three pipelines explored involving combinations of filter and embedded feature selection methods: Pipeline 1, minimum-redundancy maximum-relevancy (mRMR) and correlation feature selection (CFS); Pipeline 2, mRMR, mutual information (MI) and CFS; and Pipeline 3, mRMR and support vector machine-recursive feature elimination (SVM-RFE).
The classification is done using deep learning models, namely a deep neural network, a convolutional neural network, a recurrent neural network, a modified deep neural network, and an AutoKeras-generated model. Classification performance after class balancing and the various feature selection techniques shows marked improvement over classification prior to feature selection. The best multiclass classification was obtained by a deep neural network after SMOTE and random undersampling, with feature selection using mRMR and recursive feature elimination, giving a Cohen-Kappa score of 0.303 and a classification accuracy of 53.1%. For binary classification into early and late-stage cancer, the best performance is obtained by a modified deep neural network (DNN) after SMOTE and random undersampling, with feature selection using mRMR and recursive feature elimination, giving an accuracy of 81.0% and a Cohen-Kappa score (CKS) of 0.280. This pipeline also showed improved multiclass classification performance on neuroblastoma cancer data, with a best area under the receiver operating characteristic curve (auROC) score of 0.872, as compared to 0.71 obtained in previous work, an improvement of 22.81%. The results and analysis reveal that feature selection techniques play a vital role in gene-expression data-based classification, and that the proposed hybrid feature selection pipeline improves classification performance. Multi-class classification is possible using deep learning models, though further improvement, particularly in late-stage classification, is necessary and should be explored.
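The abstract reports both accuracy and a Cohen-Kappa score, and the gap between them (for example 81.0% accuracy with a kappa of 0.280) is easier to read with the formula in hand. The following is a minimal sketch of Cohen's kappa in plain Python. The labels are made-up toy data chosen for illustration, not the paper's confusion matrix, and the function name `cohen_kappa` is hypothetical.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples where the labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected agreement if predictions were independent of the truth,
    # given the marginal label frequencies of each labeling.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # assumption: both labelings use one identical class
    return (p_o - p_e) / (1.0 - p_e)


# Toy, made-up labels (NOT from the paper): 80 early-stage, 20 late-stage.
y_true = ["early"] * 80 + ["late"] * 20
# A hypothetical classifier that finds only 5 of the 20 late-stage samples.
y_pred = ["early"] * 76 + ["late"] * 4 + ["early"] * 15 + ["late"] * 5

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={accuracy:.3f} kappa={kappa:.3f}")
```

On imbalanced data like this toy example, accuracy can look high simply because the majority class dominates, while kappa stays low because most of the agreement is what chance alone would produce. That is one plausible reading of why the abstract stresses late-stage classification as the area needing improvement.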

Subjects

Breast Neoplasms , Deep Learning , Humans , Female , Breast Neoplasms/genetics , Transcriptome , Neoplasm Staging , Gene Expression Profiling/methods

Classification of COVID-19 from tuberculosis and pneumonia using deep learning techniques.

Venkataramana, Lokeswari; Prasad, D Venkata Vara; Saraswathi, S; Mithumary, C M; Karthikeyan, R; Monika, N.

Med Biol Eng Comput ; 60(9): 2681-2691, 2022 Sep.

Article in English | MEDLINE | ID: mdl-35834050

ABSTRACT

Deep learning provides the healthcare industry with the ability to analyse data at exceptional speeds without compromising on accuracy. These techniques are applicable to the healthcare domain for accurate and timely prediction. The convolutional neural network is a class of deep learning methods which has become dominant in various computer vision tasks and is attracting interest across a variety of domains, including radiology. Lung diseases such as tuberculosis (TB), bacterial and viral pneumonias, and COVID-19 are not predicted accurately because very few samples are available for any of these diseases. The diseases could be easily diagnosed using X-ray or CT scan images, but the number of images available is not equal across diseases, resulting in imbalanced input data. Conventional supervised machine learning methods do not achieve high accuracy when trained on a small number of COVID-19 data samples. Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset. Data augmentation helped reduce overfitting when training a deep neural network. The SMOTE (Synthetic Minority Oversampling Technique) algorithm is used to balance the classes. The novelty of this research work is to apply combined data augmentation and class balancing techniques before classification of tuberculosis, pneumonia, and COVID-19. The classification accuracy obtained with the proposed multi-level classification after training the model is recorded as 97.4% for TB and pneumonia and 88% for bacterial, viral, and COVID-19 classifications. The proposed multi-level classification method produced a ~8% to ~10% improvement in classification accuracy when compared with the existing methods in this area of research. The results reveal that the proposed system is scalable to growing medical data and classifies lung diseases and their sub-types in less time and with higher accuracy.
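The class-balancing step named in the abstract can be illustrated with a minimal sketch of the SMOTE idea: each synthetic sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. This is a simplified plain-Python illustration on made-up 2-D feature vectors, not the authors' implementation; the function name `smote_like_oversample` and its parameters are hypothetical, and how the paper maps images to feature vectors is not stated in the abstract.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (the SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base sample, excluding itself by index.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy, made-up 2-D feature vectors for a hypothetical minority class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]

new_points = smote_like_oversample(minority, n_new=6)
balanced_minority = minority + new_points
print(len(balanced_minority))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by that class, which is the main difference from simply duplicating samples.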

Subjects

COVID-19 , Deep Learning , Lung Diseases , Pneumonia, Viral , Tuberculosis , Humans , Pneumonia, Viral/diagnostic imaging

Analysis and prediction of water quality using deep learning and auto deep learning techniques.

Prasad, D Venkata Vara; Venkataramana, Lokeswari Y; Kumar, P Senthil; Prasannamedha, G; Harshana, S; Srividya, S Jahnavi; Harrinei, K; Indraganti, Sravya.

Sci Total Environ ; 821: 153311, 2022 May 15.

Article in English | MEDLINE | ID: mdl-35065104

ABSTRACT

Natural water sources like ponds, lakes and rivers are facing a great threat because of activities like the discharge of untreated industrial effluents, sewage water, wastes, etc. It is mandatory to examine water quality to ensure that only safe water is available for consumption. Traditional methods of water quality inspection are cumbersome, and hence Artificial Intelligence (AI) can be used as a catalyst for this process. AutoDL is an upcoming field that automates deep learning pipelines and enables model creation and interpretation with minimal code. However, it is still at a nascent stage. This work explores the suitability of adopting AutoDL for Water Quality Assessment by comparing AutoDL with conventional models and analysis for predicting water quality, that is, assigning an appropriate class based on the Water Quality Index, which segregates water bodies into different classes. The accuracy of conventional DL is 1.8% higher than that of AutoDL for binary class water data. The accuracy of conventional DL is 1% higher than that of AutoDL for multiclass water data. The accuracy of the conventional model was ~98% to ~99%, whereas the AutoDL method yielded ~96% to ~98%. However, the AutoDL model eases the task of finding the appropriate DL model and showed better efficiency without manual intervention.

Subjects

Deep Learning , Water Quality , Artificial Intelligence , Rivers

Automating water quality analysis using ML and auto ML techniques.

Venkata Vara Prasad, D; Senthil Kumar, P; Venkataramana, Lokeswari Y; Prasannamedha, G; Harshana, S; Jahnavi Srividya, S; Harrinei, K; Indraganti, Sravya.

Environ Res ; 202: 111720, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34297938

ABSTRACT

The generation of unprocessed effluents, municipal refuse and factory wastes, and the dumping of compostable and non-compostable effluents, have hugely contaminated natural water bodies like rivers, lakes and ponds. Therefore, it is necessary to check water standards before use. This is a problem that can greatly benefit from Artificial Intelligence (AI). Traditional methods require human inspection and are time consuming. Automatic Machine Learning (AutoML) facilities supply machine learning at the push of a button or, at a minimum, ensure that algorithm execution, data pipelines, and code are generally kept from sight, and are anticipated to be the stepping stone for normalising AI. However, it is still a field under research. This work aims to recognize the areas where an AutoML system falls short of or outperforms a traditional expert system built by data scientists. With this motive, this work examines Machine Learning (ML) algorithms to compare AutoML and an expert architecture built by the authors for Water Quality Assessment, evaluating the Water Quality Index, which gives the general water quality, and the Water Quality Class, which is assigned on the basis of the Water Quality Index. The results show that the accuracy of AutoML and TPOT was 1.4% higher than conventional ML techniques for binary class water data. For multi-class water data, AutoML was 0.5% higher and TPOT was 0.6% higher than conventional ML techniques.

Subjects

Artificial Intelligence , Water Quality , Algorithms , Food Analysis , Humans , Machine Learning

Improving classification accuracy of cancer types using parallel hybrid feature selection on microarray gene expression data.

Venkataramana, Lokeswari; Jacob, Shomona Gracia; Ramadoss, Rajavel; Saisuma, Dodda; Haritha, Dommaraju; Manoja, Kunthipuram.

Genes Genomics ; 41(11): 1301-1313, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31429008

ABSTRACT

BACKGROUND: Data mining techniques are used to mine unknown knowledge from huge data. Microarray gene expression (MGE) data plays a major role in predicting the type of cancer. But as MGE data is huge in volume, applying traditional data mining approaches is time consuming. Hence parallel programming frameworks like Hadoop, Spark and Mahout are necessary to ease the task of computation. OBJECTIVE: Since not all gene expressions are necessary for prediction, it is essential to select important genes to improve classification accuracy. Feature selection algorithms are therefore parallelized and executed on the Spark framework to eliminate unnecessary genes and identify only predictive genes in much less time without affecting prediction accuracy. METHODS: A parallelized hybrid feature selection (HFS) method is proposed to serve the purpose. This method includes parallelized correlation feature subset selection followed by rank-based feature selection methods. The selected subset of genes is evaluated using parallel classification algorithms. The accuracy values obtained are compared with existing rank-weight feature selection and parallelized recursive feature selection methods, and also with the values obtained by executing parallelized HFS on DistributedWekaSpark. RESULTS: The classification accuracy obtained with the proposed parallelized HFS method is 97% and 79% for gastric cancer and childhood leukemia respectively. The proposed parallelized HFS method produced a ~4% to ~15% improvement in classification accuracy when compared with previous methods. CONCLUSION: The results reveal that the proposed parallelized feature selection algorithm is scalable to growing medical data and predicts cancer sub-types in less time and with higher accuracy.
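The two-stage idea in METHODS (a correlation-based step followed by a rank-based step) can be sketched in a few lines. This is a sequential, simplified stand-in written in plain Python: it is not parallelized, does not use Spark, and replaces correlation feature subset selection with a simpler pairwise redundancy filter. The gene names, the toy expression values, the thresholds, and the function `hybrid_select` are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(vx * vy)


def hybrid_select(columns, labels, redundancy_threshold=0.95, top_k=2):
    """columns: dict of gene name -> expression values (one per sample)."""
    # Relevance of each gene: |correlation| with the numeric class label.
    relevance = {g: abs(pearson(v, labels)) for g, v in columns.items()}
    ranked = sorted(columns, key=lambda g: relevance[g], reverse=True)
    # Stage 1 (correlation filter): walk genes from most to least relevant
    # and drop any gene that is nearly collinear with one already kept.
    kept = []
    for g in ranked:
        redundant = any(
            abs(pearson(columns[g], columns[h])) >= redundancy_threshold
            for h in kept
        )
        if not redundant:
            kept.append(g)
    # Stage 2 (rank-based): keep the top_k survivors by relevance.
    return kept[:top_k]


# Toy, made-up expression matrix: 6 samples, 4 hypothetical genes.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "gene_a": [1.0, 1.2, 0.9, 3.1, 2.9, 3.3],  # tracks the label
    "gene_b": [2.0, 2.4, 1.8, 6.2, 5.8, 6.6],  # 2 * gene_a, so redundant
    "gene_c": [5.0, 4.0, 5.5, 4.5, 5.2, 4.8],  # mostly noise
    "gene_d": [0.5, 0.7, 0.4, 1.0, 1.4, 1.1],  # weaker signal
}
selected = hybrid_select(columns, labels)
print(selected)
```

In this toy data only one of the two collinear genes survives the first stage, so the second slot goes to a gene that carries different information. In a Spark setting the per-gene correlation computations are independent of each other, which is presumably what makes this kind of filter a natural candidate for parallelization.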

Subjects

Algorithms , Biomarkers, Tumor/genetics , Leukemia, Myeloid, Acute/classification , Microarray Analysis/methods , Stomach Neoplasms/classification , Biomarkers, Tumor/metabolism , Biomarkers, Tumor/standards , Humans , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/pathology , Microarray Analysis/standards , Molecular Diagnostic Techniques/methods , Molecular Diagnostic Techniques/standards , Sensitivity and Specificity , Stomach Neoplasms/genetics , Stomach Neoplasms/pathology
