1.
Sci Rep ; 14(1): 13126, 2024 06 07.
Article in English | MEDLINE | ID: mdl-38849422

ABSTRACT

In human-computer interaction systems, speech emotion recognition (SER) plays a crucial role because it enables computers to understand and react to users' emotions. In the past, SER has relied heavily on acoustic properties extracted from speech signals. Recent developments in deep learning and computer vision, however, have made it possible to use visual signals to enhance SER performance. This work proposes a novel method for improving speech emotion recognition using a lightweight Vision Transformer (ViT) model. We leverage the ViT model's capacity to capture spatial dependencies and high-level features in images, treating the mel spectrograms fed into the model as adequate indicators of emotional states. To determine the efficiency of the proposed approach, we conduct a comprehensive experiment on two benchmark speech emotion datasets, the Toronto Emotional Speech Set (TESS) and the Berlin Emotional Database (EMODB). The results demonstrate a considerable improvement in speech emotion recognition accuracy and attest to the method's generalizability: it achieved 98% accuracy on TESS, 91% on EMODB, and 93% on the combined TESS-EMODB set. The comparative experiment shows that the non-overlapping patch-based feature extraction method substantially improves speech emotion recognition. Our research indicates the potential of integrating vision transformer models into SER systems, opening up fresh opportunities for real-world applications that require accurate emotion recognition from speech, with performance competitive with other state-of-the-art techniques.
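
As a rough illustration of the pipeline this abstract describes (mel spectrograms rendered as images and classified by a lightweight ViT), here is a minimal sketch using librosa and timm. The model variant, input size, and seven-class output are assumptions for illustration, not details from the paper.

```python
# Sketch: mel spectrogram -> lightweight Vision Transformer classifier.
# Model choice ("vit_tiny_patch16_224") and 7 emotion classes are
# illustrative assumptions, not values taken from the paper.
import librosa
import numpy as np
import timm
import torch

def wav_to_mel_image(path, sr=16000, n_mels=128, size=224):
    y, sr = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)
    # Normalize to [0, 1] and resize to the ViT input resolution.
    mel_norm = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
    img = torch.tensor(mel_norm, dtype=torch.float32)[None, None]
    img = torch.nn.functional.interpolate(img, size=(size, size), mode="bilinear")
    return img.repeat(1, 3, 1, 1)  # ViT expects 3 channels

# A small ViT; its patches are non-overlapping 16x16 tiles, matching the
# non-overlapping patch-based feature extraction the abstract mentions.
model = timm.create_model("vit_tiny_patch16_224", pretrained=True, num_classes=7)
logits = model(wav_to_mel_image("clip.wav"))
print(logits.softmax(-1))
```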


Subject(s)
Emotions , Humans , Emotions/physiology , Speech/physiology , Deep Learning , Speech Recognition Software , Databases, Factual , Algorithms
2.
Front Big Data ; 7: 1366312, 2024.
Article in English | MEDLINE | ID: mdl-38590699

ABSTRACT

Background: Melanoma is one of the deadliest skin cancers; it originates from melanocytes when sun exposure causes mutations. Early detection boosts the cure rate to 90%, but misclassification drops survival to 15-20%. Clinical variation challenges dermatologists in distinguishing benign nevi from melanomas. Current diagnostic methods, including visual analysis and dermoscopy, have limitations, emphasizing the need for AI support in dermatology. Objectives: In this paper, we aim to explore dermoscopic structures for the classification of melanoma lesions. The training of AI models faces a challenge known as brittleness, where small changes in input images affect the classification. A prior study explored AI vulnerability in discerning melanoma from benign lesions using size, color, and shape features; tests with artificial and natural variations revealed a notable decline in accuracy, emphasizing the need for additional information such as dermoscopic structures. Methodology: The study utilizes datasets of clinically marked dermoscopic images examined by expert clinicians. Transformer- and CNN-based models are employed to classify these images based on dermoscopic structures, and classification results are validated using feature visualization. To assess model susceptibility to image variations, classifiers are evaluated on test sets with original, duplicated, and digitally modified images, as well as on ISIC 2016 images. The study focuses on three dermoscopic structures crucial for melanoma detection: blue-white veil, dots/globules, and streaks. Results: Adding convolutions to Vision Transformers proves highly effective, achieving up to 98% accuracy. CNN architectures such as VGG-16 and DenseNet-121 reach 50-60% accuracy, performing best with features other than dermoscopic structures. Vision Transformers without convolutions exhibit reduced accuracy on diverse test sets, revealing their brittleness. OpenAI CLIP, a pre-trained model, performs consistently well across the various test sets. To address brittleness, a mitigation method involving extensive data augmentation during training and 23 transformed duplicates at test time sustains accuracy. Conclusions: This paper proposes a melanoma classification scheme utilizing three dermoscopic structures across the PH2 and Derm7pt datasets, and addresses AI susceptibility to image variations. Although the dataset is small, future work should collect more annotated data and compute dermoscopic structural features automatically.
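
The brittleness mitigation mentioned above amounts to test-time augmentation: average the model's predictions over many transformed duplicates of each test image. Here is a hedged sketch; the transform set is an assumption, and only the count of 23 duplicates follows the abstract.

```python
# Sketch: test-time augmentation (TTA) as a brittleness mitigation --
# average predictions over N transformed duplicates of each test image.
# The specific transforms are assumptions; the model is any image
# classifier returning logits for a float CHW tensor in [0, 1].
import torch
from torchvision import transforms

tta = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
])

@torch.no_grad()
def tta_predict(model, image, n_views=23):
    views = torch.stack([tta(image) for _ in range(n_views)])
    probs = model(views).softmax(dim=-1)
    return probs.mean(dim=0)  # averaged class probabilities
```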

4.
Front Artif Intell ; 6: 1232640, 2023.
Article in English | MEDLINE | ID: mdl-37876961

ABSTRACT

Ensemble learning aims to improve prediction performance by combining several models or forecasts. However, determining which ensemble learning techniques are useful, and how much they help, in deep learning-based pipelines for pancreas computed tomography (CT) image classification remains a challenge. Ensemble approaches are among the most advanced solutions to many machine learning problems; they entail training multiple models and combining their predictions to improve on the predictive performance of a single model. This article introduces Stacked Ensemble Deep Learning (SEDL), a pipeline for classifying pancreas CT medical images. We employed a stacking ensemble in which the weak learners are Inception V3, VGG16, and ResNet34. By combining the first-level predictions, an input training set for XGBoost, the ensemble model at the second level of prediction, is created. Extreme Gradient Boosting (XGBoost), employed as a strong learner, makes the final classification. The Cancer Imaging Archive (TCIA) public access dataset used consists of 80 pancreas CT scans with a resolution of 512 × 512 pixels, from 53 male and 27 female subjects; a sample of 222 images was used for training and testing. After some hyperparameter adjustments, SEDL performed better than the individual models, with a 98.8% ensemble accuracy. We conclude that the SEDL technique is an effective way to strengthen the robustness and increase the performance of a pipeline for classifying pancreas CT medical images. Interestingly, grouping similarly strong learners does not make a difference.
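
A minimal sketch of the second-level stacking step: concatenate the base learners' class probabilities and fit XGBoost as the meta-learner. The base models (Inception V3, VGG16, ResNet34) are assumed to be already trained; the hyperparameter values below are placeholders.

```python
# Sketch: stacking -- base learners' class probabilities become the
# feature matrix for an XGBoost meta-learner. Any object exposing
# predict_proba(X) -> (n_samples, n_classes) works as a base model.
import numpy as np
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

def stack_probs(base_models, X):
    # Concatenate first-level predictions into one feature matrix.
    return np.hstack([m.predict_proba(X) for m in base_models])

def fit_meta(base_models, X, y):
    meta_X = stack_probs(base_models, X)
    X_tr, X_val, y_tr, y_val = train_test_split(meta_X, y, test_size=0.2)
    meta = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
    meta.fit(X_tr, y_tr)
    print("meta-learner validation accuracy:", meta.score(X_val, y_val))
    return meta
```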

5.
Front Artif Intell ; 6: 1227950, 2023.
Article in English | MEDLINE | ID: mdl-37818427

ABSTRACT

The devastating effect of plant disease infestation on crop production poses a significant threat to the attainment of the United Nations' Sustainable Development Goal 2 (SDG2) of food security, especially in Sub-Saharan Africa. This has been further exacerbated by the lack of effective and accessible plant disease detection technologies. Farmers' inability to diagnose plant diseases quickly and accurately leads to crop destruction and reduced productivity, and the diverse range of existing plant diseases further complicates detection for farmers without the right technologies, hindering efforts to combat food insecurity in the region. This study presents a web-based plant diagnosis application, referred to as the mobile-enabled Plant Diagnosis Application (mPD-App). First, a publicly available image dataset containing a diverse range of plant diseases was acquired from Kaggle to train the detection system. The dataset then underwent preprocessing, including image-to-array conversion, image reshaping, and data augmentation. The training phase leverages the computational power of a convolutional neural network (CNN) to classify the images effectively. The CNN architecture featured six convolutional layers together with a fully connected output layer, with stages such as normalization, rectified linear unit (ReLU) activation, max pooling, and dropout. Training was carefully managed to prevent underfitting and overfitting of the model, ensuring accurate predictions. The mPD-App demonstrated excellent performance in diagnosing plant diseases, achieving an overall accuracy of 93.91%; the model classified 14 different types of plant diseases with high precision and recall values, and the ROC curve showed a promising area under the curve (AUC) of 0.946, indicating the model's reliability in detecting diseases. The web-based mPD-App offers a valuable tool for farmers and agricultural stakeholders in Sub-Saharan Africa to detect and diagnose plant diseases effectively and efficiently. To further improve the application's performance, ongoing efforts should focus on expanding the dataset and refining the model's architecture, and agricultural authorities and policymakers should consider integrating such technologies into existing agricultural extension services to maximize their impact on the farming community.
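
A sketch of a CNN in the spirit described, with convolutional blocks using batch normalization, ReLU, max pooling, and dropout, ending in a fully connected softmax over 14 disease classes. The filter counts and the 128×128 input size are assumptions, not the paper's values.

```python
# Sketch: six convolutional layers with normalization/ReLU/pooling/
# dropout stages and a fully connected softmax head over 14 classes.
# Filter counts and input size are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

def build_mpd_cnn(num_classes=14, input_shape=(128, 128, 3)):
    model = keras.Sequential([keras.Input(shape=input_shape)])
    for filters in (32, 32, 64, 64, 128, 128):  # six convolutional layers
        model.add(layers.Conv2D(filters, 3, padding="same"))
        model.add(layers.BatchNormalization())
        model.add(layers.ReLU())
        model.add(layers.MaxPooling2D(2))
        model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```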

6.
J Imaging ; 9(7)2023 Jul 07.
Article in English | MEDLINE | ID: mdl-37504815

ABSTRACT

The prognosis of patients with pancreatic ductal adenocarcinoma (PDAC) is greatly improved by an early and accurate diagnosis. Several studies have created automated methods to forecast PDAC development utilising various medical imaging modalities; these papers give a general overview of the classification, segmentation, or grading of many cancer types, including pancreatic cancer, using conventional machine learning techniques and hand-engineered features. This study uses cutting-edge deep learning techniques to identify PDAC from computed tomography (CT) medical images. The work proposes the hybrid model VGG16-XGBoost (a VGG16 backbone feature extractor with an Extreme Gradient Boosting classifier) for PDAC images. The experiments show that the proposed hybrid model performs better than the alternatives considered, obtaining an accuracy of 0.97 and a weighted F1 score of 0.97 on the dataset under study. The experimental validation of the VGG16-XGBoost model uses the Cancer Imaging Archive (TCIA) public access dataset of pancreas CT images. The results of this study can be extremely helpful for PDAC diagnosis from CT pancreas images, categorising them into five tumour (T) class labels of the tumour (T), node (N), metastasis (M) (TNM) staging system: T0, T1, T2, T3, and T4.
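
A hedged sketch of the hybrid: a frozen VGG16 backbone extracts pooled features, which an XGBoost classifier maps to the five T-stage labels. The 224×224 input size, average pooling, and XGBoost settings are assumptions, not details from the paper.

```python
# Sketch: VGG16 backbone as a frozen feature extractor feeding an
# XGBoost classifier over five T-stage labels (T0..T4 -> 0..4).
# Input size and pooling choice are assumptions.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from xgboost import XGBClassifier

backbone = VGG16(weights="imagenet", include_top=False, pooling="avg",
                 input_shape=(224, 224, 3))

def extract_features(images):
    # images: (n, 224, 224, 3) array of CT slices replicated to 3 channels.
    return backbone.predict(preprocess_input(images.astype(np.float32)))

def train_hybrid(train_images, train_labels):
    feats = extract_features(train_images)   # (n, 512) pooled features
    clf = XGBClassifier(objective="multi:softprob")
    clf.fit(feats, train_labels)
    return clf
```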

7.
Sensors (Basel) ; 23(13)2023 Jun 23.
Article in English | MEDLINE | ID: mdl-37447699

ABSTRACT

Introduction: Object detection in remotely sensed satellite images is critical to socio-economic, bio-physical, and environmental monitoring, which is necessary for the prevention of natural disasters such as flooding and fires, for socio-economic service delivery, and for general urban and rural planning and management. Whereas deep learning approaches have recently gained popularity in remotely sensed image analysis, they have struggled to detect image objects efficiently due to, among other factors, complex landscape heterogeneity, high inter-class similarity and intra-class diversity, and the difficulty of acquiring training data that represents these complexities. Methods: To address these challenges, this study employed multi-object detection deep learning algorithms with a transfer learning approach on remotely sensed satellite imagery captured over a heterogeneous landscape. A new dataset of diverse features with five object classes, collected from Google Earth Engine at various locations in the southern KwaZulu-Natal province of South Africa, was used to evaluate the models. The dataset images contained objects of varying sizes and resolutions. Five object detection methods based on R-CNN and YOLO architectures were investigated via experiments on the newly created dataset. Conclusions: This paper provides a comprehensive performance evaluation and analysis of recent deep learning-based object detection methods for detecting objects in high-resolution remote sensing satellite images. The models were also evaluated on two publicly available datasets: VisDrone and PASCAL VOC2007. Results showed that the highest detection accuracy, for vegetation and swimming pool instances, exceeded 90%, and the fastest detection speed, 0.2 ms, was observed with YOLOv8.
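
A minimal sketch of the transfer learning setup for the YOLO side of the study, using the ultralytics package: start from pretrained weights and fine-tune on a custom five-class dataset. The dataset YAML path and training settings are placeholders, not values from the paper.

```python
# Sketch: fine-tuning a pretrained YOLOv8 model on a custom
# remote-sensing dataset. "kzn_satellite.yaml" is a hypothetical
# dataset config listing the five object classes.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")              # COCO-pretrained starting weights
model.train(data="kzn_satellite.yaml",  # hypothetical dataset config
            epochs=100, imgsz=640, batch=16)
metrics = model.val()                   # mAP on the validation split
results = model("sample_tile.png")      # inference on one image tile
```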


Subject(s)
Deep Learning , Remote Sensing Technology/methods , South Africa , Algorithms , Satellite Imagery
8.
Sci Rep ; 13(1): 11990, 2023 07 25.
Article in English | MEDLINE | ID: mdl-37491423

ABSTRACT

Speech emotion classification (SEC) has gained considerable prominence within the research community in recent times, and its vital role in Human-Computer Interaction (HCI) and affective computing cannot be overemphasized. Many primitive algorithmic solutions and deep neural network (DNN) models have been proposed for efficient recognition of emotion from speech; however, the suitability of these methods for accurately classifying emotion from speech with a multi-lingual background, among other factors that impede efficient classification, still demands critical consideration. This study proposes an attention-based network with a pre-trained convolutional neural network and a regularized neighbourhood component analysis (RNCA) feature selection technique for improved classification of speech emotion. The attention model has proven successful in many sequence-based and time-series tasks. An extensive experiment was carried out using three major classifiers (SVM, MLP, and Random Forest) on the publicly available Toronto Emotional Speech Set (TESS). Our proposed model (attention-based DCNN+RNCA+RF) achieved 97.8% classification accuracy, a 3.27% improvement that outperforms state-of-the-art SEC approaches. Our model evaluation revealed the consistency of the attention mechanism and feature selection with human behavioural patterns in classifying emotion from auditory speech.
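
A hedged sketch of the feature-selection-then-classify stage: deep features pass through a neighbourhood components analysis transform and a Random Forest. scikit-learn's NeighborhoodComponentsAnalysis stands in for the paper's regularized NCA (RNCA), and the features are assumed to come from a pretrained attention-based DCNN.

```python
# Sketch: deep features -> NCA dimensionality reduction -> Random
# Forest. sklearn's NeighborhoodComponentsAnalysis is a stand-in for
# the paper's RNCA; n_components and n_estimators are assumptions.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

def build_sec_classifier(n_components=64):
    return make_pipeline(
        NeighborhoodComponentsAnalysis(n_components=n_components),
        RandomForestClassifier(n_estimators=300),
    )

# deep_feats: (n_clips, n_features) array; labels: emotion id per clip.
# scores = cross_val_score(build_sec_classifier(), deep_feats, labels, cv=5)
```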


Subject(s)
Emotions , Speech , Humans , Habits , Language , Neural Networks, Computer
9.
J Imaging ; 9(4)2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37103235

ABSTRACT

Millions of people are affected by retinal abnormalities worldwide. Early detection and treatment of these abnormalities could arrest further progression, saving multitudes from avoidable blindness. Manual disease detection is time-consuming, tedious, and lacks repeatability. There have been efforts to automate ocular disease detection, riding on the successes of the application of Deep Convolutional Neural Networks (DCNNs) and Vision Transformers (ViTs) to Computer-Aided Diagnosis (CAD). These models have performed well; however, challenges remain owing to the complex nature of retinal lesions. This work reviews the most common retinal pathologies, provides an overview of prevalent imaging modalities, and presents a critical evaluation of current deep-learning research on the detection and grading of glaucoma, diabetic retinopathy, age-related macular degeneration, and multiple retinal diseases. The work concludes that CAD, through deep learning, will increasingly be vital as an assistive technology. As future work, there is a need to explore the potential impact of ensemble CNN architectures on multiclass, multilabel tasks. Efforts should also be expended on improving model explainability to win the trust of clinicians and patients.

10.
J Digit Imaging ; 36(2): 414-432, 2023 04.
Article in English | MEDLINE | ID: mdl-36456839

ABSTRACT

Retinal fundus images are acquired non-invasively and suffer from low contrast, noise, and uneven illumination. The low contrast makes objects in the retinal fundus image hard to distinguish and the segmentation of blood vessels very challenging; retinal blood vessels are significant because of their diagnostic importance in ophthalmologic diseases. This paper proposes improving retinal fundus images for optimal segmentation of blood vessels using convolutional neural networks (CNNs). The study explores several robust contrast enhancement tools on the RGB image and on the green channel of the retinal fundus images. Before segmentation in the CNN-based model, the improved images undergo quality evaluation using mean squared error (MSE), peak signal-to-noise ratio (PSNR), the Structural Similarity Index Measure (SSIM), and correlation and intersection distance measures for histogram comparison. The analysis of the simulation results reveals that the improved RGB image outperforms the improved green channel in quality. This implies that choosing the RGB image over the green channel for contrast enhancement effectively improves the quality of the fundus images; the improved contrast will, in turn, boost the predictive accuracy of the CNN-based model during segmentation. Evaluated on the DRIVE dataset, the proposed method achieves an accuracy of 94.47%, sensitivity of 70.92%, specificity of 98.20%, and an AUC (ROC) of 97.56%.
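
A minimal sketch of the quality evaluation step, comparing an enhanced fundus image against the original with MSE, PSNR, SSIM, and the histogram correlation/intersection measures. The library choices (scikit-image and OpenCV) are assumptions about a reasonable implementation.

```python
# Sketch: image-quality report for an enhanced fundus image vs. the
# original -- MSE, PSNR, SSIM, plus histogram correlation and
# intersection. Both inputs are same-size uint8 grayscale images.
import cv2
from skimage.metrics import (mean_squared_error,
                             peak_signal_noise_ratio,
                             structural_similarity)

def quality_report(original, enhanced):
    report = {
        "MSE": mean_squared_error(original, enhanced),
        "PSNR": peak_signal_noise_ratio(original, enhanced),
        "SSIM": structural_similarity(original, enhanced),
    }
    h1 = cv2.calcHist([original], [0], None, [256], [0, 256])
    h2 = cv2.calcHist([enhanced], [0], None, [256], [0, 256])
    report["hist_correlation"] = cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)
    report["hist_intersection"] = cv2.compareHist(h1, h2, cv2.HISTCMP_INTERSECT)
    return report
```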


Subject(s)
Algorithms , Retinal Diseases , Humans , Neural Networks, Computer , Fundus Oculi , Retinal Vessels/diagnostic imaging , Image Processing, Computer-Assisted/methods
11.
Front Med (Lausanne) ; 10: 1240360, 2023.
Article in English | MEDLINE | ID: mdl-38193036

ABSTRACT

Introduction: To improve comprehension of early brain growth in health and disease, it is essential to precisely segment infant brain magnetic resonance imaging (MRI) into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Nonetheless, in the isointense phase (6-8 months of age), owing to ongoing myelination and development, WM and GM display similar intensity levels in both T1-weighted and T2-weighted MRI, making tissue segmentation extremely difficult. Methods: This publication presents a comprehensive review of studies on isointense brain MRI segmentation approaches. The main aim and contribution of this study is to aid researchers by providing a thorough review that makes their search for isointense brain MRI segmentation methods easier. The systematic literature review is performed from four points of reference: (1) review of studies concerning isointense brain MRI segmentation; (2) research contributions, future work, and limitations; (3) frequently applied evaluation metrics and datasets; (4) findings of the studies. Results and discussion: The systematic review covers studies published between 2012 and 2022. A total of 19 primary studies of isointense brain MRI segmentation were selected to address the research question stated in this review.

12.
Front Big Data ; 5: 1025806, 2022.
Article in English | MEDLINE | ID: mdl-36387012

ABSTRACT

Apparent age estimation from human face images has attracted increased attention due to its numerous real-world applications. Predicting apparent age has been quite difficult for machines and humans alike. However, researchers have focused on machine estimation of "age as perceived" to a high level of accuracy, and they continue to examine different methods to enhance its results. This paper presents a critical review of modern approaches and techniques for the apparent age estimation task, together with a comparative analysis of the performance of some of those approaches on apparent facial aging benchmarks. The study also highlights the strengths and weaknesses of each approach to guide the choice of algorithms for future work in the field. The work focuses on the most popular algorithms and those that appear to have been the most successful at apparent age estimation, with a view to improving on the existing state-of-the-art results. We based our evaluations on three facial aging datasets: Looking At People (LAP)-2015, LAP-2016, and APPA-REAL, the most popular publicly available benchmark datasets for apparent age estimation.

13.
Comput Intell Neurosci ; 2022: 3364141, 2022.
Article in English | MEDLINE | ID: mdl-36211015

ABSTRACT

Classification of isolated digits is a basic challenge for many speech classification systems. While a lot of work has been carried out on other spoken languages, only limited research on spoken English digit data has been reported in the literature. This paper proposes an intelligent system for the classification of spoken digit data based on a deep feedforward neural network (DFNN) with hyperparameter optimization, alongside an ensemble method, random forest (RF), and a gradient boosting (GB) method. The paper investigates different machine learning (ML) algorithms to determine the best method for the classification of spoken English digit data. The DFNN classifier outperformed the RF and GB classifiers on the public benchmark spoken English digit data, achieving 99.65% validation accuracy. The proposed model performs better than existing models that use only traditional classifiers.
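
A hedged sketch of a DFNN classifier for ten spoken digits. The use of MFCC features, the layer sizes, and the dropout rate are assumptions about a reasonable setup, not the paper's exact configuration.

```python
# Sketch: a deep feedforward network over per-clip audio features
# (e.g., 40 mean MFCCs) for ten spoken digits. Layer sizes and dropout
# are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

def build_dfnn(n_features=40, n_digits=10):
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(n_digits, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```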


Subject(s)
Deep Learning , Algorithms , Language , Machine Learning , Neural Networks, Computer
14.
Comput Intell Neurosci ; 2022: 3514807, 2022.
Article in English | MEDLINE | ID: mdl-36093471

ABSTRACT

Biometrics is the recognition of a person from biometric characteristics, which may be physiological or behavioral. Physiological biometric features include the face, ear, iris, fingerprint, and handprint; behavioral biometrics include signatures, voice, gait pattern, and keystrokes. Numerous systems have been developed to distinguish biometric traits in applications such as forensic investigations and security systems. During the recent worldwide pandemic, facial identification failed because users wore masks; the human ear, which remains visible, has proven more suitable. The main contribution of this paper is therefore a CNN study built on EfficientNet: we present the performance achieved and show the efficiency of EfficientNet for ear recognition. The nine variants of EfficientNet were fine-tuned and evaluated on multiple publicly available ear datasets. The experiments showed that EfficientNet variant B8 achieved the best accuracy, of 98.45%.
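
A minimal sketch of fine-tuning one EfficientNet variant for ear recognition with timm. The variant shown, subject count, and optimizer settings are placeholders (the paper's best variant was B8).

```python
# Sketch: fine-tuning an EfficientNet variant as an ear classifier.
# Variant, class count, and learning rate are illustrative assumptions.
import timm
import torch

def build_ear_model(num_subjects=100, variant="tf_efficientnet_b4"):
    model = timm.create_model(variant, pretrained=True,
                              num_classes=num_subjects)
    # Fine-tune all weights with a small learning rate.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    return model, optimizer

model, opt = build_ear_model()
x = torch.randn(2, 3, 224, 224)   # a toy batch of ear images
logits = model(x)                 # (2, num_subjects)
```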


Subject(s)
Biometric Identification , Biometric Identification/methods , Biometry/methods , Humans , Iris/anatomy & histology , Recognition, Psychology
15.
ScientificWorldJournal ; 2022: 8415705, 2022.
Article in English | MEDLINE | ID: mdl-35450417

ABSTRACT

Dental caries detection has, in the past, been a challenging task given the amount of information obtained from various radiographic images. Several methods have been introduced to improve image quality for faster caries detection, and deep learning has become the methodology of choice for the analysis of medical images. This survey gives an in-depth look into the use of deep learning for object detection, segmentation, and classification, and further examines the literature on segmentation and detection methods for dental images. From the literature studied, we found that methods are grouped according to the type of dental caries (proximal, enamel), the type of X-ray images used (extraoral, intraoral), and the segmentation method (threshold-based, cluster-based, boundary-based, and region-based). The main focus of the reviewed works is on threshold-based segmentation methods, and most have preferred intraoral over extraoral X-ray images, performing segmentation on dental images of already isolated parts of the teeth. This paper presents an in-depth analysis of recent research in deep learning for dental caries segmentation and detection: it discusses the methods and algorithms used, compares existing models in terms of system performance and evaluation, and discusses the limitations of these methods as well as future perspectives on improving their performance.


Subject(s)
Dental Caries , Tooth , Algorithms , Dental Caries/diagnostic imaging , Humans
16.
Front Med (Lausanne) ; 9: 830515, 2022.
Article in English | MEDLINE | ID: mdl-35355598

ABSTRACT

The high mortality rate in Tuberculosis (TB) burden regions has increased significantly in the last decades. Despite the availability of TB treatment, high-burden regions still suffer from inadequate screening tools, which results in diagnostic delay and misdiagnosis. These challenges have led to the development of Computer-Aided Diagnostic (CAD) systems to detect TB automatically. There are several ways of screening for TB, but Chest X-Ray (CXR) is the most prominent and recommended due to its high sensitivity in detecting lung abnormalities. This paper presents the results of a systematic review, based on PRISMA procedures, that investigates state-of-the-art deep learning techniques for screening pulmonary abnormalities related to TB. The review drew on an extensive selection of scientific databases as reference sources granting access to distinctive articles in the field: four scientific databases were searched to retrieve related articles, and inclusion and exclusion criteria were defined and applied to each article to determine those included in the study. Out of the 489 articles retrieved, 62 were included. Based on the findings of this review, we conclude that CAD systems are promising in tackling the challenges of the TB epidemic, and we make recommendations for improvement in future studies.

17.
J Imaging ; 8(3)2022 Feb 25.
Article in English | MEDLINE | ID: mdl-35324610

ABSTRACT

Cure rates for kidney cancer vary according to stage and grade; hence, accurate diagnostic procedures for early detection and diagnosis are crucial. Difficulties with manual segmentation have necessitated the use of deep learning models to assist clinicians in effectively recognizing and segmenting tumors. Deep learning (DL), particularly with convolutional neural networks, has produced outstanding success in classifying and segmenting images, and researchers in medical image segmentation employ DL approaches to problems such as tumor, cell, and organ segmentation. Semantic segmentation of tumors is critical in radiation and therapeutic practice. This article discusses current advances in DL-based kidney tumor segmentation systems: we cover the various types of medical images, segmentation techniques, and assessment criteria for segmentation outcomes in kidney tumor segmentation, highlighting their building blocks and various strategies.

18.
PeerJ Comput Sci ; 7: e736, 2021.
Article in English | MEDLINE | ID: mdl-34909462

ABSTRACT

Facial Expression Recognition (FER) has gained considerable attention in affective computing due to its vast area of applications. Diverse approaches and methods have been considered for robust FER, but only a few works have considered the intensity of the emotion embedded in the expression. The available studies on expression intensity estimation either assign a nominal/regression value or classify emotion over a range of intervals, presenting only the intensity estimate, while others propose methods that predict emotion and its intensity in separate channels. These multiclass approaches and extensions do not conform to the human heuristic manner of recognising an emotion and estimating its intensity together. This work presents a Multilabel Convolutional Neural Network (ML-CNN)-based model that simultaneously recognises emotion and provides ordinal metrics as the intensity estimate of that emotion. The proposed ML-CNN is enhanced with an aggregation of the Binary Cross-Entropy (BCE) loss and Island Loss (IL) functions to minimise intraclass and interclass variations, and the model is pre-trained with Visual Geometry Group weights (VGG-16) to control overfitting. In experiments on the Binghamton University 3D Facial Expression (BU-3DFE) and extended Cohn-Kanade (CK+) datasets, we evaluate ML-CNN's performance on accuracy and loss, and we carry out a comparative study of our model against popular multilabel algorithms using standard multilabel metrics. The ML-CNN model simultaneously predicts emotion and an ordinal intensity estimate, and shows appreciable, superior performance over four standard multilabel algorithms: Classifier Chain (CC), distinct Random K-label set (RAKEL), Multilabel K-Nearest Neighbour (MLKNN), and Multilabel ARAM (MLARAM).
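
A hedged sketch of the multilabel head: a VGG-16 backbone with sigmoid outputs covering both emotion classes and ordinal intensity levels, trained with the BCE half of the paper's combined loss (the Island Loss term is omitted here). The label layout and counts are assumptions.

```python
# Sketch: multilabel prediction of emotion + ordinal intensity with a
# VGG-16 backbone and BCE-with-logits loss. The 6 emotions / 4
# intensity levels split is an illustrative assumption; the paper's
# Island Loss term is not included.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MLCNN(nn.Module):
    def __init__(self, n_emotions=6, n_intensities=4):
        super().__init__()
        self.backbone = vgg16(weights="IMAGENET1K_V1").features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(512, n_emotions + n_intensities)

    def forward(self, x):
        z = self.pool(self.backbone(x)).flatten(1)
        return self.head(z)  # raw logits; one unit per label

model = MLCNN()
criterion = nn.BCEWithLogitsLoss()     # multilabel BCE over all units
x = torch.randn(2, 3, 224, 224)
y = torch.zeros(2, 10)
y[0, 1] = y[0, 7] = 1.0                # one emotion + one intensity active
loss = criterion(model(x), y)
```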

19.
Comput Intell Neurosci ; 2021: 2921508, 2021.
Article in English | MEDLINE | ID: mdl-34950198

ABSTRACT

Recent advances in medical image analysis, especially through deep learning, are helping to identify, detect, classify, and quantify patterns in radiographs. At the center of these advances is the ability to explore hierarchical feature representations learned from data, and deep learning is becoming the most sought-after technique, leading to enhanced performance in the analysis of medical applications and systems. Deep learning techniques have achieved great results in dental image segmentation, a crucial step that helps the dentist diagnose dental caries. The performance of these deep networks is, however, restrained by the challenging characteristics of dental carious lesions: segmentation of dental images is made difficult by the vast variety of topologies, the intricacies of the medical structures, and poor image quality caused by conditions such as low contrast, noise, and irregular, fuzzy edge borders. The dental segmentation method used here is based on thresholding and connected component analysis. Images are preprocessed with a Gaussian blur filter to remove noise and corrupted pixels, then enhanced using erosion and dilation morphology operations. Finally, segmentation is done through thresholding, and connected components are identified to extract the Regions of Interest (ROIs) of the teeth. The method was evaluated on an augmented dataset of 11,114 dental images, with 10,090 images used for training and 1,024 for testing. The proposed method achieved 93% for both precision and recall.
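
A minimal sketch of the pipeline this abstract describes: Gaussian blur, erosion and dilation, thresholding, then connected components to pull out tooth ROIs. Otsu thresholding, the kernel sizes, and the area cut-off are assumptions, not the paper's exact parameters.

```python
# Sketch: blur -> erode/dilate -> threshold -> connected components ->
# tooth ROIs. Kernel sizes, Otsu thresholding, and the minimum-area
# filter are illustrative assumptions.
import cv2
import numpy as np

def segment_teeth(gray):
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)          # denoise
    kernel = np.ones((3, 3), np.uint8)
    enhanced = cv2.dilate(cv2.erode(blurred, kernel), kernel)
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    rois = []
    for i in range(1, n):                                # skip background
        x, y, w, h, area = stats[i]
        if area > 500:                                   # drop small specks
            rois.append(gray[y:y + h, x:x + w])
    return rois

# rois = segment_teeth(cv2.imread("radiograph.png", cv2.IMREAD_GRAYSCALE))
```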


Subject(s)
Dental Caries , Image Processing, Computer-Assisted , Humans , Normal Distribution
20.
Comput Intell Neurosci ; 2021: 9790894, 2021.
Article in English | MEDLINE | ID: mdl-34950203

ABSTRACT

Tuberculosis (TB) remains a life-threatening disease and is one of the leading causes of mortality in developing regions due to poverty and inadequate medical resources. Tuberculosis is treatable, but it necessitates early diagnosis through reliable screening techniques. Chest X-ray is a recommended screening procedure for identifying pulmonary abnormalities, but this recommendation is not enough without experienced radiologists to interpret the screening results, which is part of the problem in rural communities. Consequently, various computer-aided diagnostic systems have been developed for the automatic detection of tuberculosis, yet their sensitivity and accuracy remain significant challenges that require constant improvement given the severity of the disease. Hence, this study explores the application of a leading state-of-the-art convolutional neural network family (EfficientNets) to the classification of tuberculosis. Five variants of EfficientNet were fine-tuned and evaluated on two prominent, publicly available chest X-ray datasets (Montgomery and Shenzhen). The experiments show that EfficientNet-B4 achieved the best accuracy, 92.33% and 94.35% on the two datasets respectively; these results were then improved through ensemble learning, reaching 97.44%. The performance recorded in this study portrays the efficiency of fine-tuning EfficientNets for medical imaging classification and of combining them through ensembling.
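
A minimal sketch of the ensembling step: soft voting over several fine-tuned EfficientNet variants by averaging their softmax outputs. The variants listed are illustrative (the paper ensembles five), and the models are assumed to be already fine-tuned for the two-class TB task.

```python
# Sketch: soft-voting ensemble over EfficientNet variants for binary
# TB classification -- average the per-model softmax probabilities.
# Variant names are placeholders for already fine-tuned models.
import timm
import torch

variants = ["tf_efficientnet_b0", "tf_efficientnet_b2", "tf_efficientnet_b4"]
models = [timm.create_model(v, pretrained=True, num_classes=2).eval()
          for v in variants]

@torch.no_grad()
def ensemble_predict(x):
    probs = torch.stack([m(x).softmax(dim=-1) for m in models])
    return probs.mean(dim=0)          # averaged class probabilities

x = torch.randn(1, 3, 224, 224)       # a toy chest X-ray batch
print(ensemble_predict(x))
```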


Subject(s)
Neural Networks, Computer , Tuberculosis , Computer Systems , Humans , Tuberculosis/diagnosis