Results 1 - 20 of 5,728
1.
Comput Biol Med ; 179: 108830, 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-38991321

ABSTRACT

Undiagnosed and untreated human immunodeficiency virus (HIV) infection increases morbidity in the HIV-positive person and allows onward transmission of the virus. Minimizing missed opportunities for HIV diagnosis when a patient visits a healthcare facility is essential in restraining the epidemic and working toward its eventual elimination. Most state-of-the-art proposals employ machine learning (ML) methods and structured data to enhance HIV diagnoses; however, there is a dearth of recent proposals utilizing unstructured textual data from Electronic Health Records (EHRs). In this work, we propose to use only the unstructured text of the clinical notes as evidence for the classification of patients as suspected or not suspected. For this purpose, we first compile a dataset of real clinical notes from a hospital with patients classified as suspected or not suspected of having HIV. Then, we evaluate the effectiveness of two types of classification models to identify patients suspected of being infected with the virus: classical ML algorithms and two Large Language Models (LLMs) from the biomedical domain in Spanish. The results show that both LLMs outperform classical ML algorithms in the two settings we explore: one dataset version is balanced, containing an equal number of suspected and non-suspected patients, while the other is unbalanced, reflecting the real distribution of patients in the hospital. We obtain an F1 score of 94.7 with both LLMs in the unbalanced setting, while in the balanced one, the RoBERTaBio model outperforms the other with an F1 score of 95.7. The findings indicate that leveraging unstructured text with LLMs in the biomedical domain yields promising outcomes in diminishing missed opportunities for HIV diagnosis. A tool based on our system could assist a doctor in deciding whether a patient in consultation should undergo a serological test.
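For readers unfamiliar with the metric, the F1 score cited in this abstract is the harmonic mean of precision and recall. The sketch below is illustrative only: the confusion counts are invented for the example and are not figures from the study. It shows why F1 tends to be more informative than accuracy when suspected patients are a small minority, as in the unbalanced setting.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical unbalanced setting: 50 suspected patients among 1000 notes.
# The model finds 30 of them and raises 10 false alarms.
tp, fp, fn, tn = 30, 10, 20, 940
print(round(accuracy(tp, fp, fn, tn), 3))  # 0.97
print(round(f1_score(tp, fp, fn), 3))      # 0.667
```

With these made-up counts the accuracy looks high because the many true negatives dominate it, while F1 reflects that 20 of the 50 suspected patients were missed.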

2.
BMC Public Health ; 24(1): 1753, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956527

ABSTRACT

BACKGROUND: The aim of this review was to investigate the impact of short message service (SMS)-based interventions on childhood and adolescent vaccine coverage and timeliness. METHODS: A pre-defined search strategy was used to identify all relevant publications up until July 2022 from electronic databases. Reports of randomised trials written in English and involving children and adolescents less than 18 years old were included. The review was conducted in accordance with PRISMA guidelines. RESULTS: Thirty randomised trials were identified. Most trials were conducted in high-income countries. There was marked heterogeneity between studies. SMS-based interventions were associated with small to moderate improvements in vaccine coverage and timeliness compared to no SMS reminder. Reminders with embedded education or which were combined with monetary incentives performed better than simple reminders in some settings. CONCLUSION: Some SMS-based interventions appear effective for improving child vaccine coverage and timeliness in some settings. Future studies should focus on identifying which features of SMS-based strategies, including the message content and timing, are determinants of effectiveness.


Subjects
Reminder Systems; Text Messaging; Humans; Child; Adolescent; Vaccination Coverage/statistics & numerical data; Randomized Controlled Trials as Topic; Child, Preschool
3.
J Cheminform ; 16(1): 76, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956728

ABSTRACT

Materials science is an interdisciplinary field that studies the properties, structures, and behaviors of different materials. A large amount of scientific literature contains rich knowledge in the field of materials science, but manually analyzing these papers to find material-related data is a daunting task. In information processing, named entity recognition (NER) plays a crucial role as it can automatically extract entities in the field of materials science, which have significant value in tasks such as building knowledge graphs. The sequence labeling methods typically used for traditional named entity recognition in materials science (MatNER) tasks often fail to fully utilize the semantic information in the dataset and cannot effectively extract nested entities. Herein, we propose to convert the sequence labeling task into a machine reading comprehension (MRC) task. The MRC method can effectively solve the challenge of extracting multiple overlapping entities by transforming it into the form of answering multiple independent questions. Moreover, the MRC framework allows for a more comprehensive understanding of the contextual information and semantic relationships within materials science literature, by integrating prior knowledge from queries. State-of-the-art (SOTA) performance was achieved on the Matscholar, BC4CHEMD, NLMChem, SOFC, and SOFC-Slot datasets, with F1-scores of 89.64%, 94.30%, 85.89%, 85.95%, and 71.73%, respectively, using the MRC approach. By effectively utilizing semantic information and extracting nested entities, this approach holds great significance for knowledge extraction and data analysis in the field of materials science, thereby accelerating the development of materials science. SCIENTIFIC CONTRIBUTION: We have developed an innovative NER method that enhances the efficiency and accuracy of automatic entity extraction in the field of materials science by transforming the sequence labeling task into an MRC task. This approach provides robust support for constructing knowledge graphs and other data analysis tasks.
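The reformulation described in this abstract can be pictured with a small data-preparation sketch. It is not the authors' implementation: the entity type codes, the query wording, and the `to_mrc_examples` helper are all assumptions made for illustration. The point is that one question is asked per entity type, so a token can belong to answers of several questions, which a single tag sequence cannot express.

```python
from collections import defaultdict

# Hypothetical queries, one per entity type (type codes and wording are assumed).
QUERIES = {
    "MAT": "Which materials are mentioned in the text?",
    "PRO": "Which material properties are mentioned in the text?",
}


def to_mrc_examples(tokens, entities):
    """Turn one annotated sentence into one question-answering example per entity type.

    entities: list of (start, end, type) token spans, end exclusive.
    Spans of different types may nest or overlap.
    """
    spans_by_type = defaultdict(list)
    for start, end, etype in entities:
        spans_by_type[etype].append((start, end))
    return [
        {"query": query, "context": tokens, "answers": sorted(spans_by_type.get(etype, []))}
        for etype, query in QUERIES.items()
    ]


tokens = ["silicon", "carbide", "hardness"]
# Nested annotation: the material span lies inside the property span.
entities = [(0, 2, "MAT"), (0, 3, "PRO")]

examples = to_mrc_examples(tokens, entities)
for ex in examples:
    print(ex["query"], [tokens[s:e] for s, e in ex["answers"]])
```

Each example would then be passed to an extractive reading-comprehension model that predicts answer spans for its query; that model is outside the scope of this sketch.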

4.
Heliyon ; 10(11): e32401, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38961924

ABSTRACT

Urban guide signs, a fundamental component of traffic sign systems, convey both directional and locational information. Previous studies mainly focused on the font or volume of information, while little attention was paid to the layout of text-based Chinese guide signs, which is an unregulated area but crucial in practical applications and related to people's travel safety. This study investigates the impact of text layout and information volume on the spatial representation of road networks through two experimental studies, examining the effects of different designs on path determination and global road network knowledge. The results indicate that the text layout of urban road guide signs significantly influences the formation of spatial representation of the road network. Specifically, vertical guide signs displaying road names on both sides proved more effective than horizontal ones. While the volume of road name information does not markedly affect the formation of spatial representation, the arrangement of road names does influence the determination of information volume, with vertical layouts facilitating the presentation of more information. It is anticipated that these design recommendations for road signs can effectively mitigate the incidence of road traffic accidents.

5.
Front Psychol ; 15: 1335682, 2024.
Article in English | MEDLINE | ID: mdl-38962237

ABSTRACT

Deep learning from collaboration occurs if the learner enacts interactive activities in the sense of leveraging the knowledge externalized by co-learners as resource for own inferencing processes and if these interactive activities in turn promote the learner's deep comprehension outcomes. This experimental study investigates whether inducing dyad members to enact constructive preparation activities can promote deep learning from subsequent collaboration while examining prior knowledge as moderator. In a digital collaborative learning environment, 122 non-expert university students assigned to 61 dyads studied a text about the human circulatory system and then prepared individually for collaboration according to their experimental conditions: the preparation tasks varied across dyads with respect to their generativity, that is, the degree to which they required the learners to enact constructive activities (note-taking, compare-contrast, or explanation). After externalizing their answer to the task, learners in all conditions inspected their partner's externalization and then jointly discussed their text understanding via chat. Results showed that more rather than less generative tasks fostered constructive preparation but not interactive collaboration activities or deep comprehension outcomes. Moderated mediation analyses considering actor and partner effects indicated the indirect effects of constructive preparation activities on deep comprehension outcomes via interactive activities to depend on prior knowledge: when own prior knowledge was relatively low, self-performed but not partner-performed constructive preparation activities were beneficial. When own prior knowledge was relatively high, partner-performed constructive preparation activities were conducive while one's own were ineffective or even detrimental. 
Given these differential effects, suggestions are made for optimizing the instructional design around generative preparation tasks to streamline the effectiveness of constructive preparation activities for deep learning from digital collaboration.

6.
Artif Intell Med ; 154: 102924, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38964194

ABSTRACT

BACKGROUND: Radiology reports are typically written in a free-text format, making clinical information difficult to extract and use. Recently, the adoption of structured reporting (SR) has been recommended by various medical societies thanks to the advantages it offers, e.g. standardization, completeness, and information retrieval. We propose a pipeline to extract information from Italian free-text radiology reports that fits with the items of the reference SR registry proposed by a national society of interventional and medical radiology, focusing on CT staging of patients with lymphoma. METHODS: Our work aims to leverage the potential of Natural Language Processing and Transformer-based models to deal with automatic SR registry filling. With the availability of 174 Italian radiology reports, we investigate a rule-free generative Question Answering approach based on the Italian-specific version of T5: IT5. To address information content discrepancies, we focus on the six most frequently filled items in the annotations made on the reports: three categorical (multichoice), one free-text (free-text), and two continuous numerical (factual). In the preprocessing phase, we also encode information that is not supposed to be entered. Two strategies (batch-truncation and ex-post combination) are implemented to comply with the IT5 context length limitations. Performance is evaluated in terms of strict accuracy, F1, and format accuracy, and compared with the widely used GPT-3.5 Large Language Model. Unlike multichoice and factual, free-text answers do not have a 1-to-1 correspondence with their reference annotations. For this reason, we collect human-expert feedback on the similarity between medical annotations and generated free-text answers, using a 5-point Likert scale questionnaire (evaluating the criteria of correctness and completeness).
RESULTS: The combination of fine-tuning and batch splitting allows IT5 ex-post combination to achieve notable results in terms of information extraction of different types of structured data, performing on par with GPT-3.5. Human-based assessment scores of free-text answers show a high correlation with the AI performance metric F1 (Spearman's correlation coefficients>0.5, p-values<0.001) for both IT5 ex-post combination and GPT-3.5. The latter is better at generating plausible human-like statements, although it systematically provides answers even when none is supposed to be given. CONCLUSIONS: In our experimental setting, a fine-tuned Transformer-based model with a modest number of parameters (i.e., IT5, 220 M) performs well as a clinical information extraction system for the automatic SR registry filling task. It can extract information from more than one place in the report, elaborating it in a manner that complies with the response specifications provided by the SR registry (for multichoice and factual items), or that closely approximates the work of a human expert (free-text items), and it can discern whether or not an answer to a user query is supposed to be given.

7.
Comput Biol Med ; 179: 108819, 2024 Jul 03.
Article in English | MEDLINE | ID: mdl-38964245

ABSTRACT

Automatic skin segmentation is an efficient method for the early diagnosis of skin cancer, which can minimize the missed detection rate and treat early skin cancer in time. However, significant variations in texture, size, shape, the position of lesions, and obscure boundaries in dermoscopy images make it extremely challenging to accurately locate and segment lesions. To address these challenges, we propose a novel framework named TG-Net, which exploits textual diagnostic information to guide the segmentation of dermoscopic images. Specifically, TG-Net adopts a dual-stream encoder-decoder architecture. The dual-stream encoder comprises Res2Net for extracting image features and our proposed text attention (TA) block for extracting textual features. Through hierarchical guidance, textual features are embedded into the process of image feature extraction. Additionally, we devise a multi-level fusion (MLF) module to merge higher-level features and generate a global feature map as guidance for subsequent steps. In the decoding stage of the network, local features and the global feature map are utilized in three multi-scale reverse attention modules (MSRA) to produce the final segmentation results. We conduct extensive experiments on three publicly accessible datasets, namely ISIC 2017, HAM10000, and PH2. Experimental results demonstrate that TG-Net outperforms state-of-the-art methods, validating the reliability of our method. Source code is available at https://github.com/ukeLin/TG-Net.

8.
J Tissue Viability ; 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38964979

ABSTRACT

BACKGROUND: This pilot study assessed text messaging as an early intervention for preventing pressure ulcers (PrUs) in individuals with spinal cord injury (SCI) post-hospital discharge. METHOD: Thirty-nine wheelchair users discharged after acquiring an SCI were randomised into an intervention group (n = 20) that received text messages and a control group (n = 19). All participants received standard post-discharge care and completed a skincare questionnaire before and 6 months after discharge. Primary outcomes included feasibility and acceptability of early intervention using text messaging, alongside performance, concordance, and attitudes toward skincare. Secondary outcomes measured perception and the incidence of PrUs. RESULTS: Baseline demographics were comparable between the intervention and control groups. Eight of 20 participants in the intervention group and six participants in the control group completed the 6-month follow-up questionnaires. Participants expressed high satisfaction with text messages, understanding of content, and increased confidence in preventing PrUs. At 6 months post-discharge, the intervention group showed improved prevention practices, heightened awareness of PrU risks, and increased perceived importance of prevention, which were not observed in the control group. However, there were no significant differences in PrU incidence, possibly due to the small sample size and short follow-up. CONCLUSION: The study demonstrates that using text messaging as an early intervention for PrU prevention in individuals with SCI is feasible and well-received. Preliminary results suggest a positive impact on participants' attitudes and practices, indicating the potential of text messaging to reduce PrU incidence. However, further research with larger samples and extended follow-up is crucial to validate these promising initial findings.

9.
Heliyon ; 10(12): e32093, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38948047

ABSTRACT

Chinese agricultural named entity recognition (NER) has been studied with supervised learning for many years. However, considering the scarcity of public datasets in the agricultural domain, exploring this task in the few-shot scenario is more practical for real-world demands. In this paper, we propose a novel model named GlyReShot, integrating the knowledge of Chinese character glyphs into few-shot NER models. Although the utilization of glyphs has been proven successful in supervised models, two challenges still persist in the few-shot setting, i.e., how to obtain glyph representations and when to integrate them into the few-shot model. GlyReShot handles the two challenges by introducing a lightweight module for obtaining glyph representations and a training-free label refinement strategy. Specifically, the glyph representations are generated based on the descriptive sentences by filling the predefined template. As most steps come before training, this module aligns well with the few-shot setting. Furthermore, by computing the confidence values for draft predictions, the refinement strategy selectively utilizes the glyph information only when the confidence values are relatively low, thus mitigating the influence of noise. Finally, we annotate a new agricultural NER dataset, and the experimental results demonstrate the effectiveness of GlyReShot for few-shot Chinese agricultural NER.
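The confidence-gated refinement mentioned in this abstract can be sketched as follows. This is a guess at the general shape of such a strategy, not GlyReShot itself: the threshold of 0.6, the averaging rule, the label names, and the `refine_label` function are all assumptions made for illustration.

```python
def refine_label(draft_probs, glyph_probs, threshold=0.6):
    """Return a label, consulting glyph-based scores only for low-confidence drafts.

    draft_probs / glyph_probs: dicts mapping label -> score for one token.
    threshold: assumed cut-off below which the draft counts as uncertain.
    """
    draft_label = max(draft_probs, key=draft_probs.get)
    if draft_probs[draft_label] >= threshold:
        return draft_label  # confident draft: glyph information is ignored
    # Low confidence: average the two score sets (assumed combination rule).
    labels = sorted(draft_probs.keys() | glyph_probs.keys())
    combined = {
        label: (draft_probs.get(label, 0.0) + glyph_probs.get(label, 0.0)) / 2
        for label in labels
    }
    return max(combined, key=combined.get)


# Confident draft: the glyph scores disagree but are not consulted.
print(refine_label({"CROP": 0.9, "O": 0.1}, {"CROP": 0.2, "O": 0.8}))    # CROP
# Uncertain draft: the glyph scores tip the decision.
print(refine_label({"CROP": 0.45, "O": 0.55}, {"CROP": 0.8, "O": 0.2}))  # CROP
```

Gating on confidence means noisy glyph signals cannot overturn predictions the base model is already sure about, which matches the noise-mitigation motivation given in the abstract.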

10.
PeerJ ; 12: e17470, 2024.
Article in English | MEDLINE | ID: mdl-38948230

ABSTRACT

TIN-X (Target Importance and Novelty eXplorer) is an interactive visualization tool for illuminating associations between diseases and potential drug targets and is publicly available at newdrugtargets.org. TIN-X uses natural language processing to identify disease and protein mentions within PubMed content using previously published tools for named entity recognition (NER) of gene/protein and disease names. Target data is obtained from the Target Central Resource Database (TCRD). Two important metrics, novelty and importance, are computed from this data and, when plotted as log(importance) vs. log(novelty), aid the user in visually exploring the novelty of drug targets and their associated importance to diseases. TIN-X Version 3.0 has been significantly improved with an expanded dataset, modernized architecture including a REST API, and an improved user interface (UI). The dataset has been expanded to include not only PubMed publication titles and abstracts, but also full-text articles when available. This results in approximately 9-fold more target/disease associations compared to previous versions of TIN-X. Additionally, the TIN-X database containing this expanded dataset is now hosted in the cloud via Amazon RDS. Recent enhancements to the UI focus on making it more intuitive for users to find diseases or drug targets of interest while providing a new, sortable table-view mode to accompany the existing plot-view mode. UI improvements also help the user browse the associated PubMed publications to explore and understand the basis of TIN-X's predicted association between a specific disease and a target of interest. While implementing these upgrades, computational resources are balanced between the webserver and the user's web browser to achieve adequate performance while accommodating the expanded dataset. Together, these advances aim to extend the duration that users can benefit from TIN-X while providing both an expanded dataset and new features that researchers can use to better illuminate understudied proteins.


Subjects
User-Computer Interface; Humans; Natural Language Processing; PubMed; Software
11.
Data Brief ; 55: 110545, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38952954

ABSTRACT

This dataset involves a collection of soybean market news through web scraping from a Brazilian website. The news articles gathered span from January 2015 to June 2023 and have undergone a labeling process to categorize them as relevant or non-relevant. The news labeling process was conducted under the guidance of an agricultural economics expert, who collaborated with a group of nine individuals. Ten parameters were considered to assist participants in the labeling process. The dataset comprises approximately 11,000 news articles and serves as a valuable resource for researchers interested in exploring trends in the soybean market. Importantly, this dataset can be utilized for tasks such as classification and natural language processing. It provides insights into labeled soybean market news and supports open science initiatives, facilitating further analysis within the research community.

12.
Syst Rev ; 13(1): 174, 2024 Jul 08.
Article in English | MEDLINE | ID: mdl-38978132

ABSTRACT

BACKGROUND: The demand for high-quality systematic literature reviews (SRs) for evidence-based medical decision-making is growing. SRs are costly and require the scarce resource of highly skilled reviewers. Automation technology has been proposed to save workload and expedite the SR workflow. We aimed to provide a comprehensive overview of SR automation studies indexed in PubMed, focusing on the applicability of these technologies in real world practice. METHODS: In November 2022, we extracted, combined, and ran an integrated PubMed search for SRs on SR automation. Full-text English peer-reviewed articles were included if they reported studies on SR automation methods (SSAM), or automated SRs (ASR). Bibliographic analyses and knowledge-discovery studies were excluded. Record screening was performed by single reviewers, and the selection of full text papers was performed in duplicate. We summarized the publication details, automated review stages, automation goals, applied tools, data sources, methods, results, and Google Scholar citations of SR automation studies. RESULTS: From 5321 records screened by title and abstract, we included 123 full text articles, of which 108 were SSAM and 15 ASR. Automation was applied for search (19/123, 15.4%), record screening (89/123, 72.4%), full-text selection (6/123, 4.9%), data extraction (13/123, 10.6%), risk of bias assessment (9/123, 7.3%), evidence synthesis (2/123, 1.6%), assessment of evidence quality (2/123, 1.6%), and reporting (2/123, 1.6%). Multiple SR stages were automated by 11 (8.9%) studies. The performance of automated record screening varied largely across SR topics. In published ASR, we found examples of automated search, record screening, full-text selection, and data extraction. In some ASRs, automation fully complemented manual reviews to increase sensitivity rather than to save workload. Reporting of automation details was often incomplete in ASRs. 
CONCLUSIONS: Automation techniques are being developed for all SR stages, but with limited real-world adoption. Most SR automation tools target single SR stages, with modest time savings for the entire SR process and varying sensitivity and specificity across studies. Therefore, the real-world benefits of SR automation remain uncertain. Standardizing the terminology, reporting, and metrics of study reports could enhance the adoption of SR automation techniques in real-world practice.


Subjects
Automation; PubMed; Systematic Reviews as Topic; Humans
13.
Med Phys ; 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38981056

ABSTRACT

BACKGROUND: A comprehensive collection of data on doses in adult computed tomography procedures in Australia has not been undertaken for some time. This is largely due to the effort involved in collecting the data required for calculating the population dose. This data collection effort can be greatly reduced, and the coverage increased, if the process can be automated without major changes to the workflow of the imaging facilities providing the data. Success would provide a tool to determine a truly national assessment of the dose incurred through diagnostic imaging in Australia. PURPOSE: The aims of this study were to develop an automated tool to categorize electronic records of imaging procedures into a standardized set of broad procedure types, to validate the tool by applying it to data collected from nine facilities, and to assess the feasibility of applying the automated tool to compute population dose and determine the data manipulations required. METHODS: A rule-based classifier was implemented capitalizing on semantic and clinical rules. The keyword list was initially built from 609 unique study descriptions. It was then refined using an additional 414 unique study descriptions. The classifier was then tested on an additional 1198 unique study descriptions. Input from a radiologist provided the ground truth for the refinement of the classifier. RESULTS: From a sample of 238 139 studies containing 2794 unique study descriptions, the classifier correctly classified 2789 study types with only five misclassifications, demonstrating the feasibility of automating the process and the need for data pre-processing. Dose statistics for 21 categories were compiled using the 238 139 studies. CONCLUSION: The classifier achieved excellent classification results using the testing data supplied by the facilities. However, since all data supplied were from public facilities, the performance of the classifier may be biased. 
The performance of the classifier is yet to be tested on a more representative mix of private and public facilities.
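A rule-based classifier of the kind described in this abstract can be sketched in a few lines. The category names, keyword patterns, and rule ordering below are hypothetical and are not taken from the study; they only illustrate the idea of mapping free-text study descriptions onto a small set of procedure types with ordered keyword rules.

```python
import re

# Hypothetical keyword rules, checked in order; the first rule whose
# patterns all match wins, so more specific rules come first.
RULES = [
    ("CT chest-abdomen-pelvis", ["chest", "abdo", "pelvis"]),
    ("CT abdomen-pelvis", ["abdo", "pelvis"]),
    ("CT chest", ["chest"]),
    ("CT head", ["head|brain"]),
]


def classify(description: str) -> str:
    """Map a free-text study description to a broad procedure type."""
    text = description.lower()
    for category, patterns in RULES:
        if all(re.search(pattern, text) for pattern in patterns):
            return category
    return "unclassified"


print(classify("CT Chest Abdo Pelvis with contrast"))  # CT chest-abdomen-pelvis
print(classify("CT BRAIN NON CONTRAST"))               # CT head
print(classify("CT KNEE LEFT"))                        # unclassified

# The counts reported in the abstract: 2789 of 2794 unique descriptions correct.
print(round(100 * 2789 / 2794, 2))  # 99.82
```

Ordering the rules from most to least specific is one simple way to keep a combined examination from being captured by a single-region rule; a real keyword list would be much larger and, as the abstract notes, refined against a radiologist's ground truth.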

14.
Cas Lek Cesk ; 162(7-8): 294-297, 2024.
Article in English | MEDLINE | ID: mdl-38981715

ABSTRACT

The advent of large language models (LLMs) based on neural networks marks a significant shift in academic writing, particularly in medical sciences. These models, including OpenAI's GPT-4, Google's Bard, and Anthropic's Claude, enable more efficient text processing through transformer architecture and attention mechanisms. LLMs can generate coherent texts that are indistinguishable from human-written content. In medicine, they can contribute to the automation of literature reviews, data extraction, and hypothesis formulation. However, ethical concerns arise regarding the quality and integrity of scientific publications and the risk of generating misleading content. This article provides an overview of how LLMs are changing medical writing, the ethical dilemmas they bring, and the possibilities for detecting AI-generated text. It concludes with a focus on the potential future of LLMs in academic publishing and their impact on the medical community.


Subjects
Neural Networks, Computer; Humans; Natural Language Processing; Language; Publishing/ethics
15.
JMIR Med Inform ; 12: e59680, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38954456

ABSTRACT

BACKGROUND: Named entity recognition (NER) is a fundamental task in natural language processing. However, it is typically preceded by named entity annotation, which poses several challenges, especially in the clinical domain. For instance, determining entity boundaries is one of the most common sources of disagreements between annotators due to questions such as whether modifiers or peripheral words should be annotated. If unresolved, these can induce inconsistency in the produced corpora, yet, on the other hand, strict guidelines or adjudication sessions can further prolong an already slow and convoluted process. OBJECTIVE: The aim of this study is to address these challenges by evaluating 2 novel annotation methodologies, lenient span and point annotation, aiming to mitigate the difficulty of precisely determining entity boundaries. METHODS: We evaluate their effects through an annotation case study on a Japanese medical case report data set. We compare annotation time, annotator agreement, and the quality of the produced labeling and assess the impact on the performance of an NER system trained on the annotated corpus. RESULTS: We saw significant improvements in the labeling process efficiency, with up to a 25% reduction in overall annotation time and even a 10% improvement in annotator agreement compared to the traditional boundary-strict approach. However, even the best-achieved NER model presented some drop in performance compared to the traditional annotation methodology. CONCLUSIONS: Our findings demonstrate a balance between annotation speed and model performance. Although disregarding boundary information affects model performance to some extent, this is counterbalanced by significant reductions in the annotator's workload and notable improvements in the speed of the annotation process. These benefits may prove valuable in various applications, offering an attractive compromise for developers and researchers.
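To make the boundary problem in this abstract concrete, the sketch below contrasts three ways of deciding whether two annotations refer to the same entity. The matching definitions are assumptions chosen for illustration (the abstract does not spell out the exact criteria), and the English example text stands in for the Japanese case reports used in the study.

```python
def strict_match(span_a, span_b):
    """Boundary-strict agreement: both character offsets must be identical."""
    return span_a == span_b


def lenient_match(span_a, span_b):
    """Assumed lenient criterion: spans (start, end), end exclusive, overlap."""
    return span_a[0] < span_b[1] and span_b[0] < span_a[1]


def point_match(point, span):
    """Assumed point criterion: a single marked offset falls inside the span."""
    return span[0] <= point < span[1]


text = "severe acute pancreatitis"
annotator_a = (0, 25)   # includes the modifiers "severe acute"
annotator_b = (13, 25)  # only "pancreatitis"

print(strict_match(annotator_a, annotator_b))   # False
print(lenient_match(annotator_a, annotator_b))  # True
print(point_match(15, annotator_a), point_match(15, annotator_b))  # True True
```

Under the strict criterion the two annotators disagree over the modifiers, whereas the relaxed criteria count them as agreeing, which is the kind of disagreement the lenient span and point methodologies aim to sidestep.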

16.
Cannabis ; 7(2): 24-37, 2024.
Article in English | MEDLINE | ID: mdl-38975595

ABSTRACT

Parent communication can be protective against cannabis use among young adults. However, changes in parent-student communication frequency naturally occur during the transition from high school to college. Recent research suggests declines in parent-student communication frequency predict increased drinking and consequences during the first year of college, yet these effects on other risky behaviors are unknown. The current study investigated whether post-matriculation changes in frequency of texting/calling with parents predict cannabis use and simultaneous use of cannabis and alcohol, and whether pre-matriculation cannabis and simultaneous use predict changes in communication. First-year students (N = 287, 61.3% female, 50.9% White) reported cannabis and simultaneous use pre- and post-matriculation (T1 & T3) and changes in frequency of texting/calling their mother/father per day (T2). Negative binomial hurdle models examined whether T2 changes in communication frequency predicted T3 cannabis and simultaneous use, and logistic regression models examined whether T1 cannabis and simultaneous use predicted T2 changes in communication frequency. Results revealed that increasing (vs. decreasing) frequency of calling with mothers and texting with fathers was protective against cannabis use, whereas increasing frequency of calling with fathers was associated with greater risk of use. Changes in communication did not significantly predict simultaneous use, nor did pre-matriculation cannabis or simultaneous use predict changes in either mode of communication with parents during the college transition. These findings highlight that changes in mother and father communication may be both beneficial and detrimental to cannabis use depending on the parent and mode of communication. Implications for these findings are discussed.

17.
Proc Natl Acad Sci U S A ; 121(29): e2319514121, 2024 Jul 16.
Article in English | MEDLINE | ID: mdl-38976724

ABSTRACT

Works of fiction play a crucial role in the production of cultural stereotypes. Concerning gender, a widely held presumption is that many such works ascribe agency to men and passivity to women. However, large-scale diachronic analyses of this notion have been lacking. This paper provides an assessment of agency attributions in 87,531 fiction works written between 1850 and 2010. It introduces a syntax-based approach for extracting networks of character interactions. Agency is then formalized as a dyadic property: Does a character primarily serve as an agent acting upon the other character or as recipient acted upon by the other character? Findings indicate that female characters are more likely to be passive in cross-gender relationships than their male counterparts. This difference, the gender agency gap, has declined since the 19th century but persists into the 21st. Male authors are especially likely to attribute less agency to female characters. Moreover, certain kinds of actions, especially physical and villainous ones, have more pronounced gender disparities.


Subjects
Writing; Female; Male; Humans; History, 19th Century; History, 20th Century; History, 21st Century; Literature; Gender Identity
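The dyadic notion of agency in the abstract for this entry (does a character mostly act upon the other, or is it mostly acted upon?) can be illustrated with a small counting sketch. The triple format, the character names, and the `agency_share` function are hypothetical; the paper's syntax-based extraction of interactions is not reproduced here.

```python
from collections import Counter


def agency_share(interactions, a, b):
    """Fraction of a-b interactions in which `a` acts upon `b`.

    interactions: list of (agent, verb, recipient) triples.
    Returns None if the two characters never interact.
    """
    counts = Counter((agent, recipient) for agent, _verb, recipient in interactions)
    a_acts, b_acts = counts[(a, b)], counts[(b, a)]
    total = a_acts + b_acts
    return a_acts / total if total else None


# Hypothetical interactions extracted from one novel.
interactions = [
    ("Anna", "warned", "Boris"),
    ("Boris", "followed", "Anna"),
    ("Boris", "rescued", "Anna"),
    ("Boris", "questioned", "Anna"),
]

print(agency_share(interactions, "Anna", "Boris"))  # 0.25
print(agency_share(interactions, "Boris", "Anna"))  # 0.75
```

A share below 0.5 marks the character as primarily the recipient in that pair; comparing such shares across cross-gender pairs is one way a gap of the kind reported could be quantified.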
20.
PeerJ Comput Sci ; 10: e2084, 2024.
Article in English | MEDLINE | ID: mdl-38983195

ABSTRACT

Feature selection (FS) is a critical step in many data science-based applications, especially in text classification, as it involves selecting relevant and important features from an original feature set. This process can improve learning accuracy, shorten learning time, and simplify outcomes. In text classification, there are often many redundant and irrelevant features that degrade the performance of the applied classifiers, and various techniques have been suggested to tackle this problem, categorized as traditional techniques and meta-heuristic (MH) techniques. In order to discover the optimal subset of features, FS processes require a search strategy, and MH techniques use various strategies to strike a balance between exploration and exploitation. The goal of this research article is to systematically analyze the MH techniques used for FS between 2015 and 2022, focusing on 108 primary studies from three databases (Scopus, Science Direct, and Google Scholar) to identify the techniques used, as well as their strengths and weaknesses. The findings indicate that MH techniques are efficient and outperform traditional techniques, with the potential for further exploration of MH techniques such as Ringed Seal Search (RSS) to improve FS in several applications.
