ABSTRACT
With the exponential growth of high-quality fake images on social networks and in the media, it is necessary to develop recognition algorithms for this type of content. One of the most common types of image and video editing consists of duplicating areas of an image, a manipulation known as the copy-move technique. Traditional image processing approaches search manually for patterns related to the duplicated content, which limits their use in mass data classification. In contrast, approaches based on deep learning have shown better performance and promising results, but they present generalization problems, with a high dependence on training data and the need for an appropriate selection of hyperparameters. To overcome this, we propose two deep learning approaches: a model with a custom architecture and a model based on transfer learning. In each case, the impact of network depth is analyzed in terms of precision (P), recall (R), and F1 score. Additionally, the generalization problem is addressed with images from eight different open-access datasets. Finally, the models are compared in terms of evaluation metrics and training and inference times. The transfer learning model based on VGG-16 achieves metrics about 10% higher than the model with a custom architecture; however, it requires approximately twice as much inference time as the latter.
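Since both models are compared on precision, recall, and F1 score, here is a minimal sketch of how these metrics are computed from confusion-matrix counts (the counts below are illustrative, not results from the paper):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Compute precision (P), recall (R) and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # fraction of predicted forgeries that are actual forgeries
    recall = tp / (tp + fn)     # fraction of actual forgeries that were detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1

# Illustrative counts: 80 true positives, 20 false positives, 10 false negatives
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # → 0.8 0.889 0.842
```

F1 is reported alongside P and R because it penalizes a model that trades one for the other, which matters when the fake/original classes are imbalanced.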
ABSTRACT
This paper presents H-Voice, a dataset of 6672 histograms of original and fake voice recordings obtained by the Imitation [1,2] and Deep Voice [3] methods. The dataset is organized into six directories: Training_fake, Training_original, Validation_fake, Validation_original, External_test1, and External_test2. The training directories include 2088 histograms of fake voice recordings and 2020 histograms of original voice recordings. Each validation directory contains 864 histograms, obtained from fake voice recordings (Validation_fake) and original voice recordings (Validation_original), respectively. Finally, External_test1 has 760 histograms (380 from fake voice recordings obtained by the Imitation method and 380 from original voice recordings), and External_test2 has 76 histograms (72 from fake voice recordings obtained by the Deep Voice method and 4 from original voice recordings). With this dataset, researchers can train, cross-validate, and test classification models using machine learning techniques to identify fake voice recordings.
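Because the recordings are stored as histograms, each sample is a fixed-length vector that can feed a classifier directly. A minimal sketch with a nearest-centroid rule (the rule and the toy 8-bin vectors are illustrative assumptions, not the paper's method or data):

```python
import numpy as np

def nearest_centroid_fit(X: np.ndarray, y: np.ndarray) -> dict:
    """Store the mean histogram (centroid) of each class label."""
    return {label: X[y == label].mean(axis=0) for label in np.unique(y)}

def nearest_centroid_predict(centroids: dict, x: np.ndarray):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: np.linalg.norm(x - centroids[label]))

# Toy 8-bin histograms: "fake" mass concentrated in low bins, "original" in high bins
X = np.array([
    [9, 7, 1, 0, 0, 0, 1, 0],   # fake
    [8, 8, 2, 1, 0, 0, 0, 0],   # fake
    [0, 1, 0, 0, 1, 2, 8, 9],   # original
    [0, 0, 1, 0, 2, 1, 7, 8],   # original
], dtype=float)
y = np.array(["fake", "fake", "original", "original"])

centroids = nearest_centroid_fit(X, y)
query = np.array([7, 9, 1, 1, 0, 0, 0, 1], dtype=float)
print(nearest_centroid_predict(centroids, query))  # → fake
```

The External_test directories are held out from both training and validation, which is what lets a model trained this way be checked for generalization to unseen recordings.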
ABSTRACT
This paper presents the CG-1050 dataset, consisting of 100 original images, 1050 tampered images, and their corresponding masks. The dataset is organized into four directories: original images, tampered images, mask images, and a description file. The directory of original images includes 15 color and 85 grayscale images. The directory of tampered images contains 1050 images obtained through one of the following types of tampering: copy-move, cut-paste, retouching, and colorizing. The ground-truth mask for every pair of original and tampered images is included in the mask directory (1380 masks). The description file lists the names of the images (i.e., original, tampered, and mask), the image description, the photo location, the type of tampering, and the manipulated object in the image. With this dataset, researchers can train and validate fake image classification methods, either for labelling the tampered image or for pixel-level forgery detection.
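A ground-truth mask of the kind shipped in the mask directory marks the pixels where the tampered image departs from its original. A minimal sketch of deriving such a mask by differencing an original/tampered pair (the tiny arrays are illustrative; CG-1050 provides its real masks directly):

```python
import numpy as np

def tamper_mask(original: np.ndarray, tampered: np.ndarray) -> np.ndarray:
    """Binary mask marking every pixel where the tampered image differs from the original."""
    if original.ndim == 3:  # color image: a change in any channel marks the pixel
        return (original != tampered).any(axis=-1).astype(np.uint8)
    return (original != tampered).astype(np.uint8)  # grayscale image

# Toy 3x3 grayscale pair: a copy-move edit overwrites the bottom-right pixel
original = np.array([[10, 10, 10],
                     [20, 20, 20],
                     [30, 30, 30]], dtype=np.uint8)
tampered = original.copy()
tampered[2, 2] = 10  # value duplicated from the top row, as in a copy-move forgery
print(tamper_mask(original, tampered))
# → [[0 0 0]
#    [0 0 0]
#    [0 0 1]]
```

Masks like this are what make pixel-level forgery detection trainable: they give a per-pixel label rather than a single image-level one.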
ABSTRACT
Image encryption methods aim to protect content privacy. Typically, they encompass scrambling and diffusion: every pixel of the image is permuted (scrambling) and its value is transformed according to a key (diffusion). Although several methods have been proposed in the literature, some of them have been cryptanalyzed. In this paper, we present a novel method that deviates from the traditional schemes. We use variable-length codes based on the Collatz conjecture to transform the content of the image into non-intelligible audio; therefore, the scrambling and diffusion processes are performed simultaneously in a non-linear way. With our method, different ciphered audio is obtained every time, and it depends exclusively on the selected key (the size of the key space is equal to 8.57 × 10^506). Several tests were performed to analyze the randomness of the ciphered audio signals and the sensitivity of the key. Firstly, it was found that the entropy and the level of disorder of the ciphered audio signals are very close to the maximum value of randomness. Secondly, fractal behavior was detected in the scatter plots of adjacent samples, completely altering the behavior observed in natural images. Finally, if the key was slightly modified, the image could not be recovered. From the above results, it was concluded that our method is very useful in image privacy protection applications.
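The method rests on variable-length codes derived from the Collatz conjecture. One toy way to build such a code is to record the parity of each step of a value's Collatz trajectory; the sketch below is an illustrative encoding of this idea, not the authors' scheme, and it performs no keyed scrambling or diffusion:

```python
def collatz_encode(n: int) -> str:
    """Variable-length code for n: the parity bit of each Collatz step until reaching 1."""
    bits = []
    while n != 1:
        bits.append(str(n % 2))  # 0 = even step (n / 2), 1 = odd step (3n + 1)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    return "".join(bits)

def collatz_decode(code: str) -> int:
    """Invert the code by replaying the trajectory backwards from 1."""
    n = 1
    for bit in reversed(code):
        n = (n - 1) // 3 if bit == "1" else n * 2
    return n

code = collatz_encode(6)
print(code, collatz_decode(code))  # → 01010000 6
```

Because trajectory lengths vary wildly between nearby inputs, codes of different pixel values have different lengths, which is what lets position (scrambling) and value (diffusion) change together rather than in two separate passes.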