1.
Data Brief ; 52: 109857, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38161660

ABSTRACT

Plagiarism detection (PD) is the process of identifying instances where someone has presented another person's work or ideas as their own. It is categorized into two types. (i) Intrinsic plagiarism detection primarily concerns the assessment of authorship consistency within a single document, aiming to identify instances where portions of the text may have been copied or paraphrased from elsewhere within the same document. Author clustering, closely related to intrinsic plagiarism detection, involves grouping documents based on their stylistic and linguistic characteristics to identify common authors or sources within a given dataset. (ii) Extrinsic plagiarism detection, on the other hand, compares a suspicious document against a set of external source documents, seeking shared phrases, sentences, or paragraphs between them; this is often referred to as text reuse or verbatim copying. Detecting plagiarism in documents is a long-established NLP task with notable contributions in multiple applications. Much research has already been conducted for English and other languages, but Urdu needs considerably more attention, especially in intrinsic plagiarism detection. The major reason is that Urdu is a low-resource language, and no high-quality benchmark corpus is available for intrinsic plagiarism detection in Urdu. This study presents a high-quality benchmark corpus comprising 10,872 documents. The corpus is structured into two granularity levels: sentence level and paragraph level. The dataset serves several purposes, facilitating intrinsic plagiarism detection, verbatim text reuse identification, and author clustering in Urdu. It is also significant for natural language processing researchers and practitioners because it facilitates the development of specialized plagiarism detection models tailored to Urdu. These models can play a vital role in education and publishing by improving the accuracy of plagiarism detection, addressing an existing gap and enhancing the overall ability to identify copied content in Urdu writing.
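
To make the idea of extrinsic detection concrete, the following is a minimal sketch of one common way to score verbatim text reuse: the fraction of word n-grams in a suspicious document that also occur in a source document. It is not the method used by the authors of the corpus. The function names `word_ngrams` and `containment`, the whitespace tokenization, and the default n-gram length of 3 are all assumptions chosen for illustration.

```python
def word_ngrams(text, n=3):
    """Return the set of word n-grams in text (whitespace tokenization, an assumption)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=3):
    """Fraction of the suspicious document's n-grams that also appear in the source.

    Returns 0.0 when the suspicious text is shorter than n words.
    """
    suspicious_grams = word_ngrams(suspicious, n)
    if not suspicious_grams:
        return 0.0
    shared = suspicious_grams & word_ngrams(source, n)
    return len(shared) / len(suspicious_grams)


if __name__ == "__main__":
    # Hypothetical example texts, not drawn from the corpus.
    score = containment("a b c x y", "a b c d e")
    print(f"containment = {score:.2f}")
```

A score near 1 suggests that most of the suspicious text is reused verbatim from the source, while a score near 0 suggests little shared wording. Paraphrased reuse would generally need a different measure.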

2.
Educ Inf Technol (Dordr) ; 28(3): 2681-2725, 2023.
Article in English | MEDLINE | ID: mdl-36061104

ABSTRACT

Fundamentals of Database Systems is a core course in computing disciplines, as almost all small, medium, large, or enterprise systems require a data storage component. Database System Education (DSE) provides the foundation as well as advanced concepts in the area of data modeling and its implementation. The first course in DSE plays a pivotal role in developing students' interest in this area. Over the years, researchers have devised several different tools and methods to teach this course effectively, and have also been revisiting the curricula for database systems education. This study presents a Systematic Literature Review (SLR) that distills the existing literature on DSE to discuss these three perspectives for the first course in database systems. The SLR also discusses how teaching and learning assistant tools, teaching and assessment methods, and database curricula have evolved over the years in response to rapid change in database technology. To this end, more than 65 articles related to DSE published between 1995 and 2022 were shortlisted through a structured mechanism and reviewed to address the aforementioned objectives. The article also provides useful guidelines for instructors and discusses ideas for extending this research from several perspectives. To the best of our knowledge, this is the first research work that presents a broad review of the research conducted in the area of DSE.

3.
Data Brief ; 42: 108293, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35637892

ABSTRACT

The dataset presented in this paper was obtained from the top online automobile selling and purchasing websites. A total of 1000 text reviews related to hybrid cars were extracted with the help of the Web Scraper tool. The dataset presents customers' sentiments in the form of reviews related to hybrid cars. Various aspects are taken into consideration while annotating the reviews, such as driving, performance, comfort, safety features, interior, exterior and accessories. The data are annotated by three annotators at three levels: (1) overall polarity of a review, (2) segregation of the sentence term in which an aspect is discussed, and (3) polarity of the discussed aspect. A Cohen's Kappa score of 0.90 was achieved among the authors while annotating the reviews. The dataset can be used for sentiment analysis, information retrieval, lexicon analysis, and grammatical and morphological analysis.
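
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of how Cohen's Kappa is computed for a pair of annotators, using the standard definition (observed agreement corrected for chance agreement). The abstract does not say how the score was aggregated across three annotators, so the pairwise form shown here is an assumption, and the function name `cohens_kappa` and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical polarity labels, not taken from the dataset.
    annotator_1 = ["pos", "pos", "neg", "neg"]
    annotator_2 = ["pos", "neg", "neg", "neg"]
    print(cohens_kappa(annotator_1, annotator_2))
```

With more than two annotators, one common option is to average the pairwise scores; another is to use a multi-rater statistic such as Fleiss' Kappa.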
