Results 1 - 20 of 55
1.
Sci Rep ; 14(1): 20543, 2024 09 04.
Article in English | MEDLINE | ID: mdl-39232010

ABSTRACT

Stroke, the second leading cause of mortality globally, predominantly results from ischemic conditions. Immediate attention and diagnosis, related to the characterization of brain lesions, play a crucial role in patient prognosis. Standard stroke protocols include an initial evaluation from a non-contrast CT to discriminate between hemorrhage and ischemia. However, non-contrast CTs lack sensitivity in detecting subtle ischemic changes in this phase. Alternatively, diffusion-weighted MRI studies provide enhanced capabilities, yet are constrained by limited availability and higher costs. Hence, we envision new approaches that integrate ADC stroke lesion findings into CT, to enhance the analysis and accelerate stroke patient management. This study details a public challenge where scientists applied top computational strategies to delineate stroke lesions on CT scans, utilizing paired ADC information. It also constitutes the first effort to build a paired dataset with NCCT and ADC studies of acute ischemic stroke patients. Submitted algorithms were validated against the reference annotations of two expert radiologists. The best Dice score achieved was 0.2 on a test set of 36 patient studies. Although all the teams employed specialized deep learning tools, the results reveal limitations of computational approaches in supporting the segmentation of small lesions with heterogeneous density.
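
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Dice score can be computed between a predicted lesion mask and a reference mask. The flat 0/1 mask representation, the function name `dice_score`, and the convention for two empty masks are assumptions made for illustration; they are not taken from the challenge's evaluation code.

```python
def dice_score(pred, ref):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    # Voxels marked as lesion in both masks.
    intersection = sum(1 for p, r in zip(pred, ref) if p and r)
    # Total lesion voxels in the prediction plus in the reference.
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical toy masks: 1 shared voxel, 2 predicted, 3 in the reference.
pred = [0, 1, 1, 0, 0, 0, 0, 0]
ref = [0, 0, 1, 1, 1, 0, 0, 0]
print(dice_score(pred, ref))  # 2*1 / (2+3) = 0.4
```

Because the denominator counts only lesion voxels, a one-voxel error weighs much more on a small lesion than on a large one, which is consistent with the difficulty the abstract reports for small lesions.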


Subjects
Ischemic Stroke ; Tomography, X-Ray Computed ; Humans ; Ischemic Stroke/diagnostic imaging ; Tomography, X-Ray Computed/methods ; Magnetic Resonance Imaging/methods ; Algorithms ; Diffusion Magnetic Resonance Imaging/methods ; Brain Ischemia/diagnostic imaging ; Male ; Female ; Aged ; Image Processing, Computer-Assisted/methods ; Deep Learning ; Stroke/diagnostic imaging ; Brain/diagnostic imaging ; Brain/pathology
2.
Antibiotics (Basel) ; 13(8)2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39200068

ABSTRACT

Antiviral peptides (AVPs) represent a promising strategy for addressing the global challenges of viral infections and their growing resistance to traditional drugs. Lab-based AVP discovery methods are resource-intensive, highlighting the need for efficient computational alternatives. In this study, we developed five non-trained but supervised multi-query similarity search models (MQSSMs) integrated into the StarPep toolbox. Rigorous testing and validation across diverse AVP datasets confirmed the models' robustness and reliability. The top-performing model, M13+, demonstrated impressive results, with an accuracy of 0.969 and a Matthews correlation coefficient of 0.71. To assess their competitiveness, the top five models were benchmarked against 14 publicly available machine-learning and deep-learning AVP predictors. The MQSSMs outperformed these predictors, highlighting their efficiency in terms of resource demand and public accessibility. Another significant achievement of this study is the creation of the most comprehensive dataset of antiviral sequences to date. In general, these results suggest that MQSSMs are promising tools for developing good alignment-based models that can be successfully applied in the screening of large datasets for new AVP discovery.
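
The two metrics quoted above can differ noticeably on imbalanced data, so a short sketch of both may help. The functions below compute accuracy and the Matthews correlation coefficient from confusion-matrix counts; the counts in the example are hypothetical and are not the study's results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical counts for an imbalanced screen (70 positives, 930 negatives).
tp, tn, fp, fn = 45, 900, 30, 25
print(accuracy(tp, tn, fp, fn))
print(mcc(tp, tn, fp, fn))
```

With few positives, accuracy is dominated by the true negatives, while the coefficient also penalizes false positives and false negatives, which is one reason both figures are usually reported together.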

3.
Front Radiol ; 4: 1283392, 2024.
Article in English | MEDLINE | ID: mdl-38645773

ABSTRACT

Data collection, curation, and cleaning constitute a crucial phase in Machine Learning (ML) projects. In biomedical ML, it is often desirable to leverage multiple datasets to increase sample size and diversity, but this poses unique challenges, which arise from heterogeneity in study design, data descriptors, file system organization, and metadata. In this study, we present an approach to the integration of multiple brain MRI datasets with a focus on homogenization of their organization and preprocessing for ML. We use our own fusion example (approximately 84,000 images from 54,000 subjects, 12 studies, and 88 individual scanners) to illustrate and discuss the issues faced by study fusion efforts, and we examine key decisions necessary during dataset homogenization, presenting in detail a database structure flexible enough to accommodate multiple observational MRI datasets. We believe our approach can provide a basis for future similarly-minded biomedical ML projects.

4.
NeuroRehabilitation ; 54(2): 227-235, 2024.
Article in English | MEDLINE | ID: mdl-38306062

ABSTRACT

BACKGROUND: Premature newborns have a higher risk of abnormal visual development and visual impairment. OBJECTIVE: To develop a computational methodology to help assess functional vision in premature infants by tracking iris distances. METHODS: This experimental study was carried out with children up to two years old. A pattern of image capture with the visual stimulus was proposed to evaluate the visual functions of vertical and horizontal visual tracking, visual field, vestibulo-ocular reflex, and fixation. The participants' visual responses were filmed to compose a dataset and to develop a detection algorithm that uses the OpenCV library together with FaceMesh to detect and select the face, detect specific facial points, and track the iris positions. A feasibility study was also conducted using the videos processed by the software. RESULTS: Forty-one children of different ages and diagnoses participated in the experimental study, forming a robust dataset. The software tracked iris positions during the visual function evaluation stimuli. Furthermore, 8 children participated in the feasibility study, divided into Pre-term and Term groups. There was no statistical difference between the groups in any visual variable analyzed. CONCLUSION: The computational methodology developed was able to track the distances traveled by the iris, and thus can be used to help assess visual function in children.
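
As an illustration of the quantity mentioned in the conclusion, the sketch below sums the frame-to-frame displacement of an iris center. It assumes the per-frame iris coordinates have already been extracted by a landmark detector; the function name `path_length` and the sample coordinates are hypothetical, and the study's own software may compute distances differently.

```python
import math


def path_length(points):
    """Total distance traveled along a sequence of (x, y) positions."""
    # Sum the Euclidean distance between each pair of consecutive frames.
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical iris-center positions (in pixels) over three video frames.
iris_track = [(0, 0), (3, 4), (6, 8)]
print(path_length(iris_track))  # 5.0 + 5.0 = 10.0
```

In practice the pixel distances would likely need normalizing, for example by the inter-eye distance, so that recordings taken at different camera distances remain comparable; that step is left out here.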


Subjects
Infant, Premature ; Vision, Ocular ; Infant ; Child ; Infant, Newborn ; Humans ; Infant, Premature/physiology ; Software ; Algorithms ; Feasibility Studies
5.
Data Brief ; 53: 110065, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38317735

ABSTRACT

When training Artificial Intelligence and Deep Learning models, especially by using Supervised Learning techniques, a labeled dataset is required to have an input with data and its corresponding labeled output data. In the case of images, for classification, segmentation, or other processing tasks, a pair of images is required in the same sense, one image as an input (the noisy image) and the desired (the denoised image) one as an output. For SAR despeckling applications, the common approach is to have a set of optical images that then are corrupted with synthetic noise, since there is no ground truth available. The corrupted image is considered the input and the optical one is the noiseless one (ground truth). In this paper, we provide a dataset based on actual SAR images. The ground truth was obtained from SAR images of Sentinel 1 of the same region in different instants of time and then they were processed and merged into one single image that serves as the output of the dataset. Every SAR image (noisy and ground truth) was split into 1600 images of 512 × 512 pixels, so a total of 3200 images were obtained. The dataset was also split into 3000 for training and 200 for validation, all of them available in four labeled folders.
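
The tiling arithmetic described above can be checked with a small sketch. It assumes a square scene of 20480 × 20480 pixels, which is only one layout that yields 1600 non-overlapping 512 × 512 tiles; the actual scene dimensions are not given in the abstract, and the function name `tile_origins` is hypothetical.

```python
def tile_origins(height, width, tile=512):
    """Top-left corners of non-overlapping tile x tile patches.

    Partial tiles at the right and bottom edges are dropped.
    """
    return [
        (row, col)
        for row in range(0, height - tile + 1, tile)
        for col in range(0, width - tile + 1, tile)
    ]


# Assumed scene size: 40 x 40 tiles of 512 pixels each.
origins = tile_origins(20480, 20480)
print(len(origins))      # 1600 tiles per image
print(2 * len(origins))  # 3200 images for the noisy and ground-truth pair
```

Cutting the noisy image and the ground-truth image with the same origins keeps each input tile aligned with its target tile, which is what a supervised despeckling model needs.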

6.
J Imaging Inform Med ; 37(4): 1691-1710, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38409608

ABSTRACT

Early diagnosis of potentially malignant disorders, such as oral epithelial dysplasia, is the most reliable way to prevent oral cancer. Computational algorithms have been used as an auxiliary tool to aid specialists in this process. Usually, experiments are performed on private data, making it difficult to reproduce the results. There are several public datasets of histological images, but studies focused on oral dysplasia images use inaccessible datasets. This prevents the improvement of algorithms aimed at this lesion. This study introduces an annotated public dataset of oral epithelial dysplasia tissue images. The dataset includes 456 images acquired from 30 mouse tongues. The images were categorized among the lesion grades, with nuclear structures manually marked by a trained specialist and validated by a pathologist. Also, experiments were carried out in order to illustrate the potential of the proposed dataset in classification and segmentation processes commonly explored in the literature. Convolutional neural network (CNN) models for semantic and instance segmentation were employed on the images, which were pre-processed with stain normalization methods. Then, the segmented and non-segmented images were classified with CNN architectures and machine learning algorithms. The data obtained through these processes is available in the dataset. The segmentation stage showed the F1-score value of 0.83, obtained with the U-Net model using the ResNet-50 as a backbone. At the classification stage, the most expressive result was achieved with the Random Forest method, with an accuracy value of 94.22%. The results show that the segmentation contributed to the classification results, but studies are needed for the improvement of these stages of automated diagnosis. The original, gold standard, normalized, and segmented images are publicly available and may be used for the improvement of clinical applications of CAD methods on oral epithelial dysplasia tissue images.


Subjects
Neural Networks, Computer ; Mice ; Animals ; Machine Learning ; Algorithms ; Mouth Neoplasms/diagnostic imaging ; Mouth Neoplasms/pathology ; Image Processing, Computer-Assisted/methods ; Databases, Factual ; Precancerous Conditions/diagnostic imaging ; Precancerous Conditions/pathology ; Tongue/pathology ; Tongue/diagnostic imaging ; Humans ; Mouth Mucosa/pathology ; Mouth Mucosa/diagnostic imaging
7.
BMC Health Serv Res ; 24(1): 37, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38183029

ABSTRACT

BACKGROUND: No-show to medical appointments has significant adverse effects on healthcare systems and their clients. Using machine learning to predict no-shows allows managers to implement strategies such as overbooking and reminders targeting patients most likely to miss appointments, optimizing the use of resources. METHODS: In this study, we proposed a detailed analytical framework for predicting no-shows while addressing imbalanced datasets. The framework includes a novel use of z-fold cross-validation performed twice during the modeling process to improve model robustness and generalization. We also introduce Symbolic Regression (SR) as a classification algorithm and Instance Hardness Threshold (IHT) as a resampling technique and compared their performance with that of other classification algorithms, such as K-Nearest Neighbors (KNN) and Support Vector Machine (SVM), and resampling techniques, such as Random under Sampling (RUS), Synthetic Minority Oversampling Technique (SMOTE) and NearMiss-1. We validated the framework using two attendance datasets from Brazilian hospitals with no-show rates of 6.65% and 19.03%. RESULTS: From the academic perspective, our study is the first to propose using SR and IHT to predict the no-show of patients. Our findings indicate that SR and IHT presented superior performances compared to other techniques, particularly IHT, which excelled when combined with all classification algorithms and led to low variability in performance metrics results. Our results also outperformed sensitivity outcomes reported in the literature, with values above 0.94 for both datasets. CONCLUSION: This is the first study to use SR and IHT methods to predict patient no-shows and the first to propose performing z-fold cross-validation twice. Our study highlights the importance of avoiding relying on few validation runs for imbalanced datasets as it may lead to biased results and inadequate analysis of the generalization and stability of the models obtained during the training stage.
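
To make the resampling idea concrete, here is a minimal sketch of random undersampling, one of the baseline techniques the abstract names. It is not the Instance Hardness Threshold method, which additionally needs a trained classifier to rank instances. The function name, the seed, and the toy attendance data are assumptions.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Sample without replacement down to the minority-class size.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical data: 1000 appointments, 67 of them no-shows (label 1).
rows = list(range(1000))
labels = [1] * 67 + [0] * 933
balanced_rows, balanced_labels = random_undersample(rows, labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 67 67
```

Resampling is normally applied to the training folds only, so that validation folds keep the original no-show rate and the reported sensitivity reflects realistic class proportions.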


Subjects
Algorithms ; Benchmarking ; Humans ; Brazil ; Machine Learning ; Decision Support Techniques
8.
J Mol Graph Model ; 126: 108627, 2024 01.
Article in English | MEDLINE | ID: mdl-37801808

ABSTRACT

This research investigates the application of Graph Neural Networks (GNNs) to enhance the cost-effectiveness of drug development, addressing the limitations of cost and time. Class imbalances within classification datasets, such as the discrepancy between active and inactive compounds, give rise to difficulties that can be resolved through strategies like oversampling, undersampling, and manipulation of the loss function. A comparison is conducted between three distinct datasets using three different GNN architectures. This benchmarking research can steer future investigations and enhance the efficacy of GNNs in drug discovery and design. Three hundred models for each combination of architecture and dataset were trained using hyperparameter tuning techniques and evaluated using a range of metrics. Notably, the oversampling technique outperforms eight experiments, showcasing its potential. While balancing techniques boost imbalanced dataset models, their efficacy depends on dataset specifics and problem type. Although oversampling aids molecular graph datasets, more research is needed to optimize its usage and explore other class imbalance solutions.
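
Of the strategies listed above, "manipulation of the loss function" is the least self-explanatory, so a small sketch follows. It shows one common variant, inverse-frequency class weights applied to binary cross-entropy; whether the study used this exact weighting is not stated in the abstract, and all names and numbers here are illustrative assumptions.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * counts[cls]) for cls in counts}


def weighted_bce(y_true, p_pred, weights, eps=1e-12):
    """Mean binary cross-entropy with a per-class weight on each sample."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -weights[y] * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical screen: 10 active compounds (1) and 90 inactive ones (0).
labels = [1] * 10 + [0] * 90
weights = inverse_frequency_weights(labels)
print(weights[1])  # 100 / (2 * 10) = 5.0
print(weighted_bce([1, 0], [0.3, 0.3], weights))
```

The rarer active class receives the larger weight, so a misclassified active compound contributes more to the loss than a misclassified inactive one.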


Subjects
Drug Development ; Drug Discovery ; Hydrolases ; Neural Networks, Computer
9.
Data Brief ; 51: 109689, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38099148

ABSTRACT

The objective of this document is to introduce the datasets and the methods for accessing them, derived from the article "Social, commercial, and economic diversity. Poverty and expectations among street vendors in Florencia, Caquetá, Colombia." These datasets aim to provide insights into the conditions and characteristics of street vending in Colombia. The data collection process involved both mapping and personal surveys conducted on 190 street vendors. Additionally, practical recommendations are provided for tailoring the implementation of each survey instrument based on the specific attributes of the study's target demographic. The collected data holds the potential for comparative and longitudinal analyses, not only within different Colombian cities but also in cities worldwide facing similar circumstances to those of intermediate cities like Florencia. These datasets offer a valuable resource for understanding the dynamics of street vending and its implications, fostering more comprehensive research and informed policymaking.

10.
Data Brief ; 50: 109604, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37808545

ABSTRACT

The data providing evidence of multiple steady states in the human cell line HEK 293 were obtained from a 2 L bioreactor continuous culture. A HEK 293 cell line transfected to produce soluble HER1 receptor was used. The bioreactor was operated at three different dilution rates in a sequential manner. Daily samples of culture broth were collected, and a total of 85 samples were processed. Viable cell concentration and culture viability were assessed by the trypan blue exclusion method using a hemocytometer. The heterologous HER1 concentration in the supernatant was quantified by a specific ELISA, and the metabolites by liquid chromatography coupled to mass spectrometry (LC-MS). The primary data were collected in Excel files, where the kinetic and other variables were calculated using mass balances and mathematical principles. The behavior of the steady states was compared to determine whether multiple steady states exist, taking into account the stationary phase with respect to cell density (defined as a coefficient of variation below 20%). From the LC-MS metabolic measurements, a data matrix was also built with the specific rates of the 76 metabolites obtained. The data were processed and analyzed using multivariate data analysis (MVDA) to reduce complexity and to find the main patterns present in the data. We also describe the full metabolite data, not only for the steady states but also over time, which could help others model and better understand HEK293 metabolism, especially under different culture conditions.
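
The steady-state criterion mentioned above, a coefficient of variation of cell density under 20 %, can be sketched as follows. The use of the sample standard deviation, the function names, and the example densities are assumptions; the dataset's own spreadsheets may compute the statistic differently.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def is_stationary(cell_density, threshold_pct=20.0):
    """True when the density series varies less than the threshold."""
    return coefficient_of_variation(cell_density) < threshold_pct


# Hypothetical daily viable-cell densities (10^6 cells/mL) at one dilution rate.
plateau = [2.1, 2.3, 2.0, 2.2, 2.4]
growth = [0.5, 1.0, 2.0, 4.0, 8.0]
print(is_stationary(plateau))  # True
print(is_stationary(growth))   # False
```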

11.
Genes (Basel) ; 14(8)2023 07 28.
Article in English | MEDLINE | ID: mdl-37628602

ABSTRACT

In the last decade, there has been a boost in autophagy reports due to its role in cancer progression and its association with tumor resistance to treatment. Despite this, many questions remain to be elucidated and explored among the different tumors. Here, we used omics-based cancer datasets to identify autophagy genes as prognostic markers in cancer. We then combined these findings with independent studies to further characterize the clinical significance of these genes in cancer. Our observations highlight the importance of innovative approaches to analyze tumor heterogeneity, potentially affecting the expression of autophagy-related genes with either pro-tumoral or anti-tumoral functions. In silico analysis allowed for identifying three genes (TBC1D12, KERA, and TUBA3D) not previously described as associated with autophagy pathways in cancer. While autophagy-related genes were rarely mutated across human cancers, the expression profiles of these genes allowed the clustering of different cancers into three independent groups. We have also analyzed datasets highlighting the effects of drugs or regulatory RNAs on autophagy. Altogether, these data provide a comprehensive list of targets to further the understanding of autophagy mechanisms in cancer and investigate possible therapeutic targets.


Subjects
Neoplasms ; Humans ; Neoplasms/genetics ; Autophagy/genetics ; Clinical Relevance ; Cluster Analysis ; RNA
12.
Open Res Eur ; 3: 67, 2023.
Article in English | MEDLINE | ID: mdl-37645488

ABSTRACT

The western tropical North Atlantic (WTNA) is a very complex region, with the influence of intense western boundary currents in connection with equatorial zonal currents, important atmospheric forcings (e.g., the Intertropical Convergence Zone), mesoscale activities (e.g., NBC rings), and the world's largest river discharge, the Amazon River runoff. The volume discharge is equivalent to more than one-third of the Atlantic river freshwater input, with a plume that spreads over the region, reaching the Caribbean Sea to the northwest and longitudes of 30°W to the east, and influencing structures from physical to biological. Therefore, in order to enable and encourage more understanding of the region, here we present a dataset based on an idealized scenario of no river runoff from the Amazon River and Pará River in the WTNA. The numerical simulations were conducted with a regional oceanic modeling system (ROMS) model and three pairs of files were generated with the model outputs: (i) ROMS-files, with the parameters of the ROMS-outputs raw data in a NetCDF format and monthly and weekly frequencies; (ii) MATLAB-files, which contain oceanographic parameters also in monthly and weekly frequencies; and (iii) NetCDF-files, with oceanographic parameters again in monthly and weekly frequencies. For each file, we present the coordinates and variable names, descriptions, and corresponding units. The dataset is available in the Science Data Bank repository (doi: https://doi.org/10.57760/sciencedb.02145).

13.
J Pers Med ; 13(7)2023 Jul 15.
Article in English | MEDLINE | ID: mdl-37511754

ABSTRACT

In Colombia, the first case of COVID-19 was confirmed on 6 March 2020. On 13 March 2023, Colombia registered 6,360,780 confirmed positive cases of COVID-19, representing 12.18% of the total population. The National Administrative Department of Statistics (DANE) in Colombia published in 2020 a COVID-19 vulnerability index, which estimates the vulnerability (per city block) of being infected with COVID-19. Unfortunately, DANE did not consider multiple factors that could increase the risk of COVID-19 (in addition to demographic and health), such as environmental and mobility data (found in the related literature). The proposed multidimensional index considers variables of different types (unemployment rate, gross domestic product, citizens' mobility, vaccination data, and climatological and spatial information) in which the incidence of COVID-19 is calculated and compared with the incidence of the COVID-19 vulnerability index provided by DANE. The collection, data preparation, modeling, and evaluation phases of the Cross-Industry Standard Process for Data Mining methodology (CRISP-DM) were considered for constructing the index. The multidimensional index was evaluated using multiple machine learning models to calculate the incidence of COVID-19 cases in the main cities of Colombia. The results showed that the best-performing model to predict the incidence of COVID-19 in Colombia is the Extra Trees Regressor algorithm, obtaining an R-squared of 0.829. This work is the first step toward a multidimensional analysis of COVID-19 risk factors, which has the potential to support decision making in public health programs. The results are also relevant for calculating vulnerability indexes for other viral diseases, such as dengue.
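
Since the model comparison above is reported as an R-squared value, a brief sketch of that statistic may be useful. The function below is the standard coefficient of determination; the observed and predicted incidence values are made up for illustration and do not come from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted incidence for four cities.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]
print(r_squared(observed, predicted))  # about 0.9625
```

A value of 1 means the predictions reproduce the observations exactly, while 0 means the model does no better than always predicting the mean.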

14.
Data Brief ; 48: 109219, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37383761

ABSTRACT

The TRI-POL project explores the triangle of interactive relationships between affective and ideological polarisation, political distrust, and the politics of party competition. In this project there are two complementary groups of datasets with individual-level survey data and digital trace data collected in five countries: Argentina, Chile, Italy, Portugal and Spain. These datasets are comprised of three waves carried out over a six-month period between late September 2021 and April 2022. In addition, the survey datasets include a series of experiments embedded in the different waves that examine social exposure, polarisation framing, and social sorting. The digital trace datasets include variables on individuals' behaviours and exposure to information received via digital media and social media. This data was collected using a combination of tracking technologies that the interviewees installed in their different devices. This digital trace data is matched with the individual-level survey data. These datasets are especially useful for researchers who wish to explore dynamics of polarisation, political attitudes, and political communication.

15.
J Med Internet Res ; 25: e43333, 2023 06 22.
Article in English | MEDLINE | ID: mdl-37347537

ABSTRACT

Artificial Intelligence (AI) represents a significant milestone in health care's digital transformation. However, traditional health care education and training often lack digital competencies. To promote safe and effective AI implementation, health care professionals must acquire basic knowledge of machine learning and neural networks, critical evaluation of data sets, integration within clinical workflows, bias control, and human-machine interaction in clinical settings. Additionally, they should understand the legal and ethical aspects of digital health care and the impact of AI adoption. Misconceptions and fears about AI systems could jeopardize its real-life implementation. However, there are multiple barriers to promoting electronic health literacy, including time constraints, overburdened curricula, and the shortage of capacitated professionals. To overcome these challenges, partnerships among developers, professional societies, and academia are essential. Integrating specialists from different backgrounds, including data specialists, lawyers, and social scientists, can significantly contribute to combating digital illiteracy and promoting safe AI implementation in health care.


Subjects
Artificial Intelligence ; Curriculum ; Humans ; Educational Status ; Neural Networks, Computer ; Machine Learning
16.
Data Brief ; 48: 109128, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37122923

ABSTRACT

The gold standard for the diagnosis of oral cancer is the microscopic analysis of specimens removed preferentially through incisional biopsies of oral mucosa with a clinically detected suspicious lesion. This dataset contains captured histopathological images of oral squamous cell carcinoma and leukoplakia. A total of 237 images were captured: 89 images of leukoplakia with dysplasia, 57 images of leukoplakia without dysplasia, and 91 carcinoma images. The images were captured with an optical light microscope, using 10x and 40x objectives, attached to a microscope camera and visualized through software. The images were saved in PNG format at a size of 2048 × 1536 pixels and they refer to hematoxylin-eosin stained histopathologic slides from biopsies performed between 2010 and 2021 in patients managed at the Oral Diagnosis project (NDB) of the Federal University of Espírito Santo (UFES). Oral leukoplakias were represented by samples with and without epithelial dysplasia. Since the diagnosis considers socio-demographic data (gender, age and skin color) as well as clinical data (tobacco use, alcohol consumption, sun exposure, fundamental lesion, type of biopsy, lesion color, lesion surface and lesion diagnosis), this information was also collected. Our aim in releasing the NDB-UFES dataset is to provide a new resource for researchers in Artificial Intelligence (machine and deep learning) to develop tools that assist clinicians and pathologists in the automated diagnosis of oral potentially malignant disorders and oral squamous cell carcinoma.

17.
Sensors (Basel) ; 23(8)2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37112504

ABSTRACT

Brain-Computer Interfaces (BCIs) continue to attract considerable interest because of the advantages they offer in numerous domains, notably in helping people with motor disabilities communicate with the surrounding environment. However, challenges of portability, instantaneous processing time, and accurate data processing remain for numerous BCI system setups. This work implements an embedded multi-task classifier based on motor imagery using the EEGNet network integrated into the NVIDIA Jetson TX2 (NJT2) card. Two strategies are developed to select the most discriminant channels. The former uses a criterion based on classifier accuracy, while the latter evaluates electrode mutual information to form discriminant channel subsets. Next, the EEGNet network is implemented to classify discriminant channel signals. Additionally, a cyclic learning algorithm is implemented at the software level to accelerate the model learning convergence and fully profit from the NJT2 hardware resources. Finally, motor imagery Electroencephalogram (EEG) signals provided by HaLT's public benchmark were used, in addition to the k-fold cross-validation method. Average accuracies of 83.7% and 81.3% were achieved by classifying EEG signals per subject and motor imagery task, respectively. Each task was processed with an average latency of 48.7 ms. This framework offers an alternative for online EEG-BCI systems' requirements, dealing with short processing times and reliable classification accuracy.
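
The k-fold cross-validation mentioned above can be sketched with a small index generator. The value of k, the shuffling seed, and the function name are assumptions; the abstract does not state how many folds were used or whether the split was stratified by task.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Hypothetical example: 10 EEG trials split into 5 folds.
for train, test in k_fold_indices(10, k=5):
    print(sorted(test), len(train))
```

Each trial lands in exactly one test fold, so averaging the per-fold accuracy uses every trial for evaluation once while never testing on data the model was trained on.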


Subjects
Brain-Computer Interfaces ; Humans ; Electroencephalography/methods ; Algorithms ; Imagery, Psychotherapy ; Software
18.
Data Brief ; 47: 108978, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36879615

ABSTRACT

This dataset is composed of photomicrographs of the immunohistochemical expression of Biglycan (BGN) in breast tissue, with and without cancer, using only the staining of 3,3'-diaminobenzidine (DAB), after processing the images with the color deconvolution plugin from ImageJ. The immunohistochemical DAB expression of BGN was obtained using the monoclonal antibody (M01) (clone 4E1-1G7 - Abnova Corporation, mouse anti-human). Photomicrographs were obtained, under standard conditions, using an optical microscope, with UPlanFI 100x objective (resolution: 2.75 mm), yielding an image size of 4800 × 3600 pixels. After color deconvolution, the dataset of 336 images was divided into two categories: (I) with cancer and (II) without cancer. This dataset allows the training and validation of machine learning models to diagnose, recognize and classify the presence of breast cancer, using the intensity of the colors of the BGN.

19.
Sensors (Basel) ; 23(3)2023 Jan 17.
Article in English | MEDLINE | ID: mdl-36772092

ABSTRACT

Ransomware-related cyber-attacks have been on the rise over the last decade, disturbing organizations considerably. Developing new and better ways to detect this type of malware is necessary. This research applies dynamic analysis and machine learning to identify the ever-evolving ransomware signatures using selected dynamic features. Since most of the attributes are shared by diverse ransomware-affected samples, our study can be used for detecting current and even new variants of the threat. This research has the following objectives: (1) Execute experiments with encryptor and locker ransomware combined with goodware to generate JSON files with dynamic parameters using a sandbox. (2) Analyze and select the most relevant and non-redundant dynamic features for identifying encryptor and locker ransomware from goodware. (3) Generate and make public a dynamic features dataset that includes these selected parameters for samples of different artifacts. (4) Apply the dynamic feature dataset to obtain models with machine learning algorithms. Five platforms, 20 ransomware, and 20 goodware artifacts were evaluated. The final feature dataset is composed of 2000 registers of 50 characteristics each. This dataset allows for a machine learning detection with a 10-fold cross-evaluation with an average accuracy superior to 0.99 for gradient boosted regression trees, random forest, and neural networks.

20.
J Imaging ; 9(2)2023 Feb 13.
Article in English | MEDLINE | ID: mdl-36826960

ABSTRACT

Smartphones with an in-built camera are omnipresent today in the life of over eighty percent of the world's population. They are very often used to photograph documents. Document binarization is a key process in many document processing platforms. This paper assesses the quality, file size and time performance of sixty-eight binarization algorithms using five different versions of the input images. The evaluation dataset is composed of deskjet, laser and offset printed documents, photographed using six widely-used mobile devices with the strobe flash off and on, under two different angles and four shots with small variations in the position. Besides that, this paper also pinpoints the algorithms per device that may provide the best visual quality-time, document transcription accuracy-time, and size-time trade-offs. Furthermore, an indication is also given on the "overall winner" that would be the algorithm of choice if one has to use one algorithm for a smartphone-embedded application.
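
For readers new to document binarization, the sketch below shows Otsu's method, a classic global thresholding technique. It is offered only as an example of what a binarization algorithm does; it is not presented as the paper's recommended algorithm, and the flat list of 8-bit gray values is an assumed input format.

```python
def otsu_threshold(pixels):
    """Return a threshold t; pixels with value > t are treated as background paper."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue  # no pixels at or below t yet
        w_high = total - w_low
        if w_high == 0:
            break  # every pixel is at or below t
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance (up to a constant factor).
        var_between = w_low * w_high * (mean_low - mean_high) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Map gray values to 0 (ink) or 255 (paper) using threshold t."""
    return [255 if value > t else 0 for value in pixels]


# Hypothetical page: dark ink around 10, bright paper around 200.
page = [10] * 50 + [200] * 50
t = otsu_threshold(page)
print(binarize([10, 200], t))  # [0, 255]
```

A single global threshold like this tends to struggle with uneven illumination such as flash glare or shadows in photographed documents, which is one reason many alternative algorithms exist and are worth comparing.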
