1.
Data Brief ; 53: 110118, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38348323

ABSTRACT

Arabic, unlike many languages, suffers from punctuation inconsistency, posing a significant obstacle for Natural Language Processing (NLP). To address this, we present the Arabic Punctuation Dataset (APD), a large collection of annotated Modern Standard Arabic texts designed to train machine learning models in sentence boundary identification and punctuation prediction. APD leverages the "theme-rheme completion" principle, a grammatical feature closely linked to consistent punctuation placement. It consists of an annotated collection of Modern Standard Arabic (MSA) texts that encompass 312 million words in approximately 12 million sentences. It comprises three diverse components: Arabic Book Chapters (ABC): Manually annotated, non-fiction, book excerpts, constituting a gold-standard reference. Complete Book Translations (CBT): Parallel English-Arabic book translations with aligned sentence endings, ideal for machine translation training. Scrambled Sentences from the Arabic Component of the United Nations Parallel Corpus (SSAC-UNPC): Jumbled sentences for model training in automatic punctuation restoration. Beyond NLP, APD serves as a valuable resource for linguistics research, language learning, and real-time subtitling. Its authentic, grammar-based approach can enhance the readability and clarity of machine-generated text, opening doors for various applications such as automatic speech recognition, text summarization, and machine translation.
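The punctuation prediction task described above is often framed as word-level labeling: each word is tagged with the mark that follows it. The sketch below illustrates that framing only. The mark set in `PUNCT`, the `"O"` label, and the function name `to_training_pairs` are assumptions made for illustration and do not reflect APD's actual annotation format.

```python
# Assumed label set: common Arabic and Latin punctuation marks.
PUNCT = "،؛؟.!,;?:"


def to_training_pairs(text):
    """Turn punctuated text into (word, label) pairs.

    The label is the first punctuation mark that follows the word,
    or "O" when no mark follows it.
    """
    pairs = []
    for token in text.split():
        word = token.rstrip(PUNCT)
        trailing = token[len(word):]
        if not word:
            # A stand-alone mark is attached to the previous word, if any.
            if pairs and pairs[-1][1] == "O":
                pairs[-1] = (pairs[-1][0], trailing[0])
            continue
        pairs.append((word, trailing[0] if trailing else "O"))
    return pairs
```

A model trained on such pairs sees only the words and learns to predict the labels, which is one way to pose automatic punctuation restoration.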

2.
Neurocomputing (Amst) ; 511: 142-154, 2022 Oct 28.
Article in English | MEDLINE | ID: mdl-36097509

ABSTRACT

The Covid-19 pandemic has galvanized scientists to apply machine learning methods to help combat the crisis. Despite the significant amount of research, there is no comprehensive survey devoted specifically to deep learning methods for Covid-19 forecasting. In this paper, we fill this gap in the literature by reviewing and analyzing the current studies that use deep learning for Covid-19 forecasting. Our review considered all published papers and preprints discoverable through Google Scholar from Apr 1, 2020 to Feb 20, 2022 that describe deep learning approaches to forecasting Covid-19. Our search identified 152 studies, of which 53 passed the initial quality screening and were included in our survey. We propose a model-based taxonomy to categorize the literature. We describe each model and highlight its performance. Finally, we identify the deficiencies of the existing approaches and outline the improvements needed in future research. The study provides a gateway for researchers who are interested in forecasting Covid-19 using deep learning.

3.
Neural Comput Appl ; 34(19): 16387-16422, 2022.
Article in English | MEDLINE | ID: mdl-35971379

ABSTRACT

The bat-inspired algorithm (BA) is a robust swarm intelligence algorithm that has found success in many problem domains. Its main idea is inspired by the ecosystem of bats. This review paper surveys and analyses state-of-the-art research conducted using BA from 2017 to 2021. BA has several attractive characteristics: it is easy to use, conceptually simple, flexible, adaptable, consistent, sound, and complete. It has strong operators that incorporate the natural selection principle through a survival-of-the-fittest rule within the intensification step, which is attracted by the local-best solution. First, the growth of recent work published in Scopus-indexed articles is summarized in terms of the number of BA-based journal articles published per year, citations, top authors working with BA, top institutions, and top countries. Next, the different versions of BA developed to match the complex nature of optimization problems are highlighted, such as binary, modified, hybridized, and multiobjective BA. The successful applications of BA are then reviewed and summarized, including electrical and power systems, wireless and network systems, environmental and materials engineering, classification and clustering, structural and mechanical engineering, feature selection, image and signal processing, robotics, medicine and healthcare, scheduling, and many others. A critical analysis of the limitations and shortcomings of BA is also presented. Open-source BA codes are listed to enrich the review. Finally, the review is concluded and possible future directions are suggested, such as applying BA to dynamic, robust, multiobjective, and large-scale optimization, and improving BA performance through structured populations, parameter tuning, memetic strategies, and selection mechanisms. Readers of this review will be able to identify the domains and applications where BA performs best and to justify their BA-related contributions.
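For readers unfamiliar with BA, the following is a minimal sketch of the basic continuous variant as it is commonly described in the literature: frequency-tuned velocity updates pulled toward the best solution, a local random walk around that best solution, and greedy acceptance gated by loudness. The abstract itself gives no pseudocode, so the update rules, the function name `bat_algorithm`, and all parameter values here are assumptions chosen for illustration, not the formulation of any specific paper covered by the review.

```python
import math
import random


def bat_algorithm(objective, dim, n_bats=20, n_iter=200,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, r0=0.5,
                  lower=-5.0, upper=5.0, seed=0):
    """Minimise `objective` over [lower, upper]^dim (illustrative settings)."""
    rng = random.Random(seed)

    def clip(v):
        return max(lower, min(upper, v))

    pos = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(p) for p in pos]
    loud = [1.0] * n_bats   # loudness A_i, decays when a bat improves
    rate = [0.0] * n_bats   # pulse rate r_i, grows toward r0

    best_i = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[best_i][:], fit[best_i]

    for t in range(1, n_iter + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            # Global move: a random frequency scales the pull toward best.
            freq = f_min + (f_max - f_min) * rng.random()
            vel[i] = [v + (x - b) * freq
                      for v, x, b in zip(vel[i], pos[i], best)]
            cand = [clip(x + v) for x, v in zip(pos[i], vel[i])]
            # Intensification: random walk around the current best.
            if rng.random() > rate[i]:
                cand = [clip(b + rng.uniform(-1.0, 1.0) * mean_loud)
                        for b in best]
            cand_fit = objective(cand)
            # Survival of the fittest: keep improvements, gated by loudness.
            if cand_fit <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, cand_fit
                loud[i] *= alpha
                rate[i] = r0 * (1.0 - math.exp(-gamma * t))
            if cand_fit < best_fit:
                best, best_fit = cand[:], cand_fit
    return best, best_fit
```

For example, `bat_algorithm(lambda x: sum(v * v for v in x), dim=2)` searches for the minimum of the sphere function. The binary, hybridized, and multiobjective versions mentioned above replace or extend these update rules.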

4.
Neural Comput Appl ; 34(18): 16019-16032, 2022.
Article in English | MEDLINE | ID: mdl-35529091

ABSTRACT

Social media is becoming a source of news for many people due to its ease and freedom of use. As a result, fake news has been spreading quickly and easily regardless of its credibility, especially in the last decade. Fake news publishers take advantage of critical situations such as the Covid-19 pandemic and the American presidential elections to affect societies negatively. Fake news can seriously impact society in many fields including politics, finance, sports, etc. Many studies have been conducted to help detect fake news in English, but research conducted on fake news detection in the Arabic language is scarce. Our contribution is twofold: first, we have constructed a large and diverse Arabic fake news dataset. Second, we have developed and evaluated transformer-based classifiers to identify fake news while utilizing eight state-of-the-art Arabic contextualized embedding models. The majority of these models had not been previously used for Arabic fake news detection. We conduct a thorough analysis of the state-of-the-art Arabic contextualized embedding models as well as comparison with similar fake news detection systems. Experimental results confirm that these state-of-the-art models are robust, with accuracy exceeding 98%.

5.
Neural Comput Appl ; 34(2): 1135-1159, 2022.
Article in English | MEDLINE | ID: mdl-34483495

ABSTRACT

The process of tagging a given text or document with suitable labels is known as text categorization or classification. The aim of this work is to automatically tag a news article based on its vocabulary features. To accomplish this objective, two large datasets have been constructed from various Arabic news portals. The first dataset consists of 90k single-labeled articles from four domains (Business, Middle East, Technology and Sports). The second dataset has over 290k multi-tagged articles. To examine the single-label dataset, we employed an array of ten shallow learning classifiers. Furthermore, we added an ensemble model that applies majority voting over all studied classifiers. The performance of the classifiers on the first dataset ranged between 87.7% (AdaBoost) and 97.9% (SVM). Analyzing some of the misclassified articles confirmed the need for multi-label, as opposed to single-label, categorization for better classification results. For the second dataset, we tested both shallow learning and deep learning multi-labeling approaches. A custom accuracy metric, designed for the multi-labeling task, has been developed for performance evaluation along with the Hamming loss metric. First, we used classifiers that are compatible with multi-labeling tasks, such as Logistic Regression and XGBoost, by wrapping each in a OneVsRest classifier. XGBoost gave the higher accuracy, scoring 84.7%, while Logistic Regression scored 81.3%. Second, ten neural networks were constructed (CNN, CLSTM, LSTM, BILSTM, GRU, CGRU, BIGRU, HANGRU, CRF-BILSTM and HANLSTM). CGRU proved to be the best multi-labeling classifier, scoring an accuracy of 94.85%, higher than the rest of the classifiers.
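Two of the techniques named above, majority voting and Hamming loss, have standard definitions that can be sketched in a few lines. The function names and input shapes below are assumptions for illustration. The paper's custom accuracy metric is not specified in the abstract and is not reproduced here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine single-label predictions from several classifiers.

    `predictions` is a list with one label list per classifier, all of the
    same length. Ties go to the label seen first among the classifiers.
    """
    return [Counter(labels).most_common(1)[0][0]
            for labels in zip(*predictions)]


def hamming_loss(y_true, y_pred):
    """Fraction of label positions that differ between truth and prediction.

    Both arguments are lists of equal-length 0/1 label vectors, one per
    article.
    """
    total = sum(len(row) for row in y_true)
    wrong = sum(t != p
                for true_row, pred_row in zip(y_true, y_pred)
                for t, p in zip(true_row, pred_row))
    return wrong / total
```

A lower Hamming loss is better: 0 means every tag of every article was predicted correctly.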

6.
Afr. J. Gastroenterol. Hepatol ; 5(1): 40-57, 2022. figures, tables
Article in English | AIM (Africa) | ID: biblio-1513131

ABSTRACT

Aims: Upper gastrointestinal bleeding (UGIB) in critically ill patients under mechanical ventilation (MV) is a significant cause of morbidity and mortality. This study therefore aimed to determine the incidence, predictors, and etiology of UGIB in critically ill patients under MV. Patients and Methods: Three hundred and sixty critically ill patients were managed by mechanical ventilation. The patients were evaluated by complete clinical examination, APACHE II score, liver and kidney function tests, and abdominal ultrasound. In addition, upper gastrointestinal endoscopy was performed for patients who survived UGIB during MV, after weaning and once their clinical condition had been stable for at least 48 hours. Results: 41 patients (11.4%) had UGIB; 15 (36.6%) survived and 26 (63.4%) died. Upper endoscopy revealed large ulcers (>2 cm) in the gastric antrum (n=1), multiple antral ulcers (n=2), large (>2 cm) corporeal gastric ulcers (n=2) [all were Forrest Ib with oozing surface], bleeding small (<2 cm) duodenal bulb ulcers (n=1) [Forrest Ia with spurting], small ulcers in the lower esophagus with lower-end esophagitis (n=2), black esophagus (n=1), ulcer on top of grade III esophageal varices (n=2), severe portal hypertensive gastropathy (n=3), and candida esophagitis and gastritis (n=1). Logistic regression analysis revealed that the independent predictors of UGIB were elevated serum creatinine, APACHE II score >14, peak inspiratory pressure ≥30 cmH2O, and prolonged aPTT. Conclusions: Mechanically ventilated patients had a high risk of upper gastrointestinal bleeding, which the identified parameters can predict to guide adequate prophylaxis.


Subjects
Upper Gastrointestinal Tract
7.
Data Brief ; 33: 106503, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33294506

ABSTRACT

The automatic identification and verification of speakers through representative audio continues to gain the attention of many researchers across diverse application domains. Despite this diversity, classified and categorized multi-purpose Arabic audio libraries are scarce. Therefore, we introduce a large Arabic audio clip dataset (15,810 clips) of 30 popular reciters cantillating 37 chapters from the Holy Quran. These chapters have a variable number of verses, and each verse is allocated its own folder containing 30 audio clips, one per reciter, all covering the same textual content. An additional 397 audio clips from 12 competent imitators of the top reciters were collected based on popularity and number of views/downloads to allow for cross-comparison of text, reciters, and authenticity. Based on the volume, quality, and rich diversity of this dataset, we anticipate a wide range of deployments for speaker identification, in addition to setting a new direction for the structure and organization of similar large audio clip datasets.
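The one-folder-per-verse organization described above lends itself to a simple index keyed by chapter and verse. The sketch below assumes a hypothetical layout of `root/<chapter>/<verse>/<clip>` and a `.wav` extension. The abstract states neither the actual folder names nor the audio format, so both are placeholders, as is the function name `index_clips`.

```python
from pathlib import Path


def index_clips(root, suffix=".wav"):
    """Map (chapter, verse) folder names to the clip files they contain.

    Assumes the hypothetical layout root/<chapter>/<verse>/<clip><suffix>.
    """
    index = {}
    for clip in sorted(Path(root).glob(f"*/*/*{suffix}")):
        verse_dir = clip.parent
        key = (verse_dir.parent.name, verse_dir.name)
        index.setdefault(key, []).append(clip.name)
    return index
```

Because every verse folder holds the same text recited by different speakers, such an index gives text-matched groups of clips directly, which suits text-dependent speaker identification experiments.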

8.
Data Brief ; 25: 104076, 2019 Aug.
Article in English | MEDLINE | ID: mdl-31440535

ABSTRACT

Text classification (also known as categorization) is one of the most popular Natural Language Processing (NLP) tasks and has been an active research topic in recent years. However, much less attention has been directed towards this task in Arabic, due to the lack of rich, representative resources for training an Arabic text classifier. Therefore, we introduce a large Single-labeled Arabic News Articles Dataset (SANAD) of textual data collected from three news portals. The dataset consists of almost 200k articles distributed into seven categories, and we offer it to the research community on Arabic computational linguistics. We anticipate that this rich dataset will be a great aid for a variety of NLP tasks on Modern Standard Arabic (MSA) textual data, especially for single-label text classification. We present the data in raw form. SANAD is composed of three main datasets scraped from three news portals: AlKhaleej, AlArabiya, and Akhbarona. SANAD is public and freely available at https://data.mendeley.com/datasets/57zpx667y9.
