BMC Res Notes ; 12(1): 801, 2019 Dec 11.
Article in English | MEDLINE | ID: mdl-31829258

ABSTRACT

OBJECTIVES: Classification of textual file formats is a topic of interest in network forensics. There are a few publicly available datasets of files with textual formats. However, there is no public dataset of file fragments of textual file formats. As a result, a major research challenge in file fragment classification of textual file formats is comparing the performance of the developed methods on the same datasets. DATA DESCRIPTION: In this study, we present a dataset that contains file fragments of five textual file formats: the binary file format for Word 97-Word 2003, the Microsoft Word Open XML format, Portable Document Format, rich text file, and standard text document. The dataset contains file fragments in three languages: English, Persian, and Chinese. For each pair of file format and language, 1500 file fragments are provided, so the dataset contains 22,500 file fragments in total.
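
The total stated in the data description follows from the layout of the dataset: five formats, three languages, and 1500 fragments per format-language pair. A minimal sketch of that arithmetic is given below. The short format labels (`doc`, `docx`, `pdf`, `rtf`, `txt`) and all variable names are illustrative assumptions, not identifiers taken from the dataset itself.

```python
from itertools import product

# Hypothetical short labels for the five formats named in the abstract.
FORMATS = ["doc", "docx", "pdf", "rtf", "txt"]
# The three languages named in the abstract.
LANGUAGES = ["English", "Persian", "Chinese"]
# Number of fragments provided per (format, language) pair, per the abstract.
FRAGMENTS_PER_PAIR = 1500

# One entry per (format, language) pair, each holding the per-pair count.
counts = {pair: FRAGMENTS_PER_PAIR for pair in product(FORMATS, LANGUAGES)}

# Sum over all pairs to get the size of the whole dataset.
total = sum(counts.values())
print(len(counts), total)
```

This gives 15 format-language pairs and 15 × 1500 = 22,500 fragments, matching the figure in the abstract.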


Subject(s)
Datasets as Topic, Classification, Information Services, Language, Software, Word Processing