Search | VHL Regional Portal

1.

Graph-based methods for Author Name Disambiguation: a survey.

De Bonis, Michele; Falchi, Fabrizio; Manghi, Paolo.

PeerJ Comput Sci ; 9: e1536, 2023.

Article in English | MEDLINE | ID: mdl-37810360

ABSTRACT

Scholarly knowledge graphs (SKG) are knowledge graphs representing research-related information, powering discovery and statistics about research impact and trends. Author name disambiguation (AND) is required to produce high-quality SKGs, as a disambiguated set of authors is fundamental to ensure a coherent view of researchers' activity. Various issues, such as homonymy, scarcity of contextual information, and cardinality of the SKG, make simple name string matching insufficient or computationally complex. Many AND deep learning methods have been developed, and interesting surveys exist in the literature, comparing the approaches in terms of techniques, complexity, performance, etc. However, none of them specifically addresses AND methods in the context of SKGs, where the entity-relationship structure can be exploited. In this paper, we discuss recent graph-based methods for AND, define a framework through which such methods can be confronted, and catalog the most popular datasets and benchmarks used to test such methods. Finally, we outline possible directions for future work on this topic.

2.

On the Generalization of Deep Learning Models in Video Deepfake Detection.

Coccomini, Davide Alessandro; Caldelli, Roberto; Falchi, Fabrizio; Gennaro, Claudio.

J Imaging ; 9(5)2023 Apr 29.

Article in English | MEDLINE | ID: mdl-37233308

ABSTRACT

The increasing use of deep learning techniques to manipulate images and videos, commonly referred to as "deepfakes", is making it more challenging to differentiate between real and fake content, while various deepfake detection systems have been developed, they often struggle to detect deepfakes in real-world situations. In particular, these methods are often unable to effectively distinguish images or videos when these are modified using novel techniques which have not been used in the training set. In this study, we carry out an analysis of different deep learning architectures in an attempt to understand which is more capable of better generalizing the concept of deepfake. According to our results, it appears that Convolutional Neural Networks (CNNs) seem to be more capable of storing specific anomalies and thus excel in cases of datasets with a limited number of elements and manipulation methodologies. The Vision Transformer, conversely, is more effective when trained with more varied datasets, achieving more outstanding generalization capabilities than the other methods analysed. Finally, the Swin Transformer appears to be a good alternative for using an attention-based method in a more limited data regime and performs very well in cross-dataset scenarios. All the analysed architectures seem to have a different way to look at deepfakes, but since in a real-world environment the generalization capability is essential, based on the experiments carried out, the attention-based architectures seem to provide superior performances.

3.

Bus Violence: An Open Benchmark for Video Violence Detection on Public Transport.

Ciampi, Luca; Foszner, Pawel; Messina, Nicola; Staniszewski, Michal; Gennaro, Claudio; Falchi, Fabrizio; Serao, Gianluca; Cogiel, Michal; Golba, Dominik; Szczesna, Agnieszka; Amato, Giuseppe.

Sensors (Basel) ; 22(21)2022 Oct 31.

Article in English | MEDLINE | ID: mdl-36366043

ABSTRACT

The automatic detection of violent actions in public places through video analysis is difficult because the employed Artificial Intelligence-based techniques often suffer from generalization problems. Indeed, these algorithms hinge on large quantities of annotated data and usually experience a drastic drop in performance when used in scenarios never seen during the supervised learning phase. In this paper, we introduce and publicly release the Bus Violence benchmark, the first large-scale collection of video clips for violence detection on public transport, where some actors simulated violent actions inside a moving bus in changing conditions, such as the background or light. Moreover, we conduct a performance analysis of several state-of-the-art video violence detectors pre-trained with general violence detection databases on this newly established use case. The achieved moderate performances reveal the difficulties in generalizing from these popular methods, indicating the need to have this new collection of labeled data, beneficial for specializing them in this new scenario.

Subject(s)

Artificial Intelligence , Benchmarking , Violence , Algorithms , Aggression

4.

The Face Deepfake Detection Challenge.

Guarnera, Luca; Giudice, Oliver; Guarnera, Francesco; Ortis, Alessandro; Puglisi, Giovanni; Paratore, Antonino; Bui, Linh M Q; Fontani, Marco; Coccomini, Davide Alessandro; Caldelli, Roberto; Falchi, Fabrizio; Gennaro, Claudio; Messina, Nicola; Amato, Giuseppe; Perelli, Gianpaolo; Concas, Sara; Cuccu, Carlo; Orrù, Giulia; Marcialis, Gian Luca; Battiato, Sebastiano.

J Imaging ; 8(10)2022 Sep 28.

Article in English | MEDLINE | ID: mdl-36286357

ABSTRACT

Multimedia data manipulation and forgery has never been easier than today, thanks to the power of Artificial Intelligence (AI). AI-generated fake content, commonly called Deepfakes, have been raising new issues and concerns, but also new challenges for the research community. The Deepfake detection task has become widely addressed, but unfortunately, approaches in the literature suffer from generalization issues. In this paper, the Face Deepfake Detection and Reconstruction Challenge is described. Two different tasks were proposed to the participants: (i) creating a Deepfake detector capable of working in an "in the wild" scenario; (ii) creating a method capable of reconstructing original images from Deepfakes. Real images from CelebA and FFHQ and Deepfake images created by StarGAN, StarGAN-v2, StyleGAN, StyleGAN2, AttGAN and GDWCT were collected for the competition. The winning teams were chosen with respect to the highest classification accuracy value (Task I) and "minimum average distance to Manhattan" (Task II). Deep Learning algorithms, particularly those based on the EfficientNet architecture, achieved the best results in Task I. No winners were proclaimed for Task II. A detailed discussion of teams' proposed methods with corresponding ranking is presented in this paper.

5.

Deep networks for behavioral variant frontotemporal dementia identification from multiple acquisition sources.

Di Benedetto, Marco; Carrara, Fabio; Tafuri, Benedetta; Nigro, Salvatore; De Blasi, Roberto; Falchi, Fabrizio; Gennaro, Claudio; Gigli, Giuseppe; Logroscino, Giancarlo; Amato, Giuseppe.

Comput Biol Med ; 148: 105937, 2022 09.

Article in English | MEDLINE | ID: mdl-35985188

ABSTRACT

Behavioral variant frontotemporal dementia (bvFTD) is a neurodegenerative syndrome whose clinical diagnosis remains a challenging task especially in the early stage of the disease. Currently, the presence of frontal and anterior temporal lobe atrophies on magnetic resonance imaging (MRI) is part of the diagnostic criteria for bvFTD. However, MRI data processing is usually dependent on the acquisition device and mostly require human-assisted crafting of feature extraction. Following the impressive improvements of deep architectures, in this study we report on bvFTD identification using various classes of artificial neural networks, and present the results we achieved on classification accuracy and obliviousness on acquisition devices using extensive hyperparameter search. In particular, we will demonstrate the stability and generalization of different deep networks based on the attention mechanism, where data intra-mixing confers models the ability to identify the disorder even on MRI data in inter-device settings, i.e., on data produced by different acquisition devices and without model fine tuning, as shown from the very encouraging performance evaluations that dramatically reach and overcome the 90% value on the AuROC and balanced accuracy metrics.

Subject(s)

Alzheimer Disease , Frontotemporal Dementia , Atrophy , Humans , Magnetic Resonance Imaging

6.

An embedded toolset for human activity monitoring in critical environments.

Di Benedetto, Marco; Carrara, Fabio; Ciampi, Luca; Falchi, Fabrizio; Gennaro, Claudio; Amato, Giuseppe.

Expert Syst Appl ; 199: 117125, 2022 Aug 01.

Article in English | MEDLINE | ID: mdl-35431465

ABSTRACT

In many working and recreational activities, there are scenarios where both individual and collective safety have to be constantly checked and properly signaled, as occurring in dangerous workplaces or during pandemic events like the recent COVID-19 disease. From wearing personal protective equipment to filling physical spaces with an adequate number of people, it is clear that a possibly automatic solution would help to check compliance with the established rules. Based on an off-the-shelf compact and low-cost hardware, we present a deployed real use-case embedded system capable of perceiving people's behavior and aggregations and supervising the appliance of a set of rules relying on a configurable plug-in framework. Working on indoor and outdoor environments, we show that our implementation of counting people aggregations, measuring their reciprocal physical distances, and checking the proper usage of protective equipment is an effective yet open framework for monitoring human activities in critical conditions.

7.

MOCCA: Multilayer One-Class Classification for Anomaly Detection.

Massoli, Fabio Valerio; Falchi, Fabrizio; Kantarci, Alperen; Akti, Seymanur; Ekenel, Hazim Kemal; Amato, Giuseppe.

IEEE Trans Neural Netw Learn Syst ; 33(6): 2313-2323, 2022 Jun.

Article in English | MEDLINE | ID: mdl-34874873

ABSTRACT

Anomalies are ubiquitous in all scientific fields and can express an unexpected event due to incomplete knowledge about the data distribution or an unknown process that suddenly comes into play and distorts the observations. Usually, due to such events' rarity, to train deep learning (DL) models on the anomaly detection (AD) task, scientists only rely on "normal" data, i.e., nonanomalous samples. Thus, letting the neural network infer the distribution beneath the input data. In such a context, we propose a novel framework, named multilayer one-class classification (MOCCA), to train and test DL models on the AD task. Specifically, we applied our approach to autoencoders. A key novelty in our work stems from the explicit optimization of the intermediate representations for the task at hand. Indeed, differently from commonly used approaches that consider a neural network as a single computational block, i.e., using the output of the last layer only, MOCCA explicitly leverages the multilayer structure of deep architectures. Each layer's feature space is optimized for AD during training, while in the test phase, the deep representations extracted from the trained layers are combined to detect anomalies. With MOCCA, we split the training process into two steps. First, the autoencoder is trained on the reconstruction task only. Then, we only retain the encoder tasked with minimizing the L2 distance between the output representation and a reference point, the anomaly-free training data centroid, at each considered layer. Subsequently, we combine the deep features extracted at the various trained layers of the encoder model to detect anomalies at inference time. To assess the performance of the models trained with MOCCA, we conduct extensive experiments on publicly available datasets, namely CIFAR10, MVTec AD, and ShanghaiTech. We show that our proposed method reaches comparable or superior performance to state-of-the-art approaches available in the literature. Finally, we provide a model analysis to give insights regarding the benefits of our training procedure.

8.

The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval.

Amato, Giuseppe; Bolettieri, Paolo; Carrara, Fabio; Debole, Franca; Falchi, Fabrizio; Gennaro, Claudio; Vadicamo, Lucia; Vairo, Claudio.

J Imaging ; 7(5)2021 Apr 23.

Article in English | MEDLINE | ID: mdl-34460672

ABSTRACT

This paper describes in detail VISIONE, a video search system that allows users to search for videos using textual keywords, the occurrence of objects and their spatial relationships, the occurrence of colors and their spatial relationships, and image similarity. These modalities can be combined together to express complex queries and meet users' needs. The peculiarity of our approach is that we encode all information extracted from the keyframes, such as visual deep features, tags, color and object locations, using a convenient textual encoding that is indexed in a single text retrieval engine. This offers great flexibility when results corresponding to various parts of the query (visual, text and locations) need to be merged. In addition, we report an extensive analysis of the retrieval performance of the system, using the query logs generated during the Video Browser Showdown (VBS) 2019 competition. This allowed us to fine-tune the system by choosing the optimal parameters and strategies from those we tested.

9.

Hebbian semi-supervised learning in a sample efficiency setting.

Lagani, Gabriele; Falchi, Fabrizio; Gennaro, Claudio; Amato, Giuseppe.

Neural Netw ; 143: 719-731, 2021 Nov.

Article in English | MEDLINE | ID: mdl-34438195

ABSTRACT

We propose to address the issue of sample efficiency, in Deep Convolutional Neural Networks (DCNN), with a semi-supervised training strategy that combines Hebbian learning with gradient descent: all internal layers (both convolutional and fully connected) are pre-trained using an unsupervised approach based on Hebbian learning, and the last fully connected layer (the classification layer) is trained using Stochastic Gradient Descent (SGD). In fact, as Hebbian learning is an unsupervised learning method, its potential lies in the possibility of training the internal layers of a DCNN without labels. Only the final fully connected layer has to be trained with labeled examples. We performed experiments on various object recognition datasets, in different regimes of sample efficiency, comparing our semi-supervised (Hebbian for internal layers + SGD for the final fully connected layer) approach with end-to-end supervised backprop training, and with semi-supervised learning based on Variational Auto-Encoder (VAE). The results show that, in regimes where the number of available labeled samples is low, our semi-supervised approach outperforms the other approaches in almost all the cases.

Subject(s)

Neural Networks, Computer , Supervised Machine Learning

10.

TweepFake: About detecting deepfake tweets.

Fagni, Tiziano; Falchi, Fabrizio; Gambini, Margherita; Martella, Antonio; Tesconi, Maurizio.

PLoS One ; 16(5): e0251415, 2021.

Article in English | MEDLINE | ID: mdl-33984021

ABSTRACT

The recent advances in language modeling significantly improved the generative capabilities of deep neural models: in 2019 OpenAI released GPT-2, a pre-trained language model that can autonomously generate coherent, non-trivial and human-like text samples. Since then, ever more powerful text generative models have been developed. Adversaries can exploit these tremendous generative capabilities to enhance social bots that will have the ability to write plausible deepfake messages, hoping to contaminate public debate. To prevent this, it is crucial to develop deepfake social media messages detection systems. However, to the best of our knowledge no one has ever addressed the detection of machine-generated texts on social networks like Twitter or Facebook. With the aim of helping the research in this detection field, we collected the first dataset of real deepfake tweets, TweepFake. It is real in the sense that each deepfake tweet was actually posted on Twitter. We collected tweets from a total of 23 bots, imitating 17 human accounts. The bots are based on various generation techniques, i.e., Markov Chains, RNN, RNN+Markov, LSTM, GPT-2. We also randomly selected tweets from the humans imitated by the bots to have an overall balanced dataset of 25,572 tweets (half human and half bots generated). The dataset is publicly available on Kaggle. Lastly, we evaluated 13 deepfake text detection methods (based on various state-of-the-art approaches) to both demonstrate the challenges that Tweepfake poses and create a solid baseline of detection techniques. We hope that TweepFake can offer the opportunity to tackle the deepfake detection on social media messages as well.

Subject(s)

Social Media , Artificial Intelligence , Humans , Language , Markov Chains

11.

Virtual to Real Adaptation of Pedestrian Detectors.

Ciampi, Luca; Messina, Nicola; Falchi, Fabrizio; Gennaro, Claudio; Amato, Giuseppe.

Sensors (Basel) ; 20(18)2020 Sep 14.

Article in English | MEDLINE | ID: mdl-32937977

ABSTRACT

Pedestrian detection through Computer Vision is a building block for a multitude of applications. Recently, there has been an increasing interest in convolutional neural network-based architectures to execute such a task. One of these supervised networks' critical goals is to generalize the knowledge learned during the training phase to new scenarios with different characteristics. A suitably labeled dataset is essential to achieve this purpose. The main problem is that manually annotating a dataset usually requires a lot of human effort, and it is costly. To this end, we introduce ViPeD (Virtual Pedestrian Dataset), a new synthetically generated set of images collected with the highly photo-realistic graphical engine of the video game GTA V (Grand Theft Auto V), where annotations are automatically acquired. However, when training solely on the synthetic dataset, the model experiences a Synthetic2Real domain shift leading to a performance drop when applied to real-world images. To mitigate this gap, we propose two different domain adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection. Experiments show that the network trained with ViPeD can generalize over unseen real-world scenarios better than the detector trained over real-world data, exploiting the variety of our synthetic dataset. Furthermore, we demonstrate that with our domain adaptation techniques, we can reduce the Synthetic2Real domain shift, making the two domains closer and obtaining a performance improvement when testing the network over the real-world images.

Subject(s)

Image Interpretation, Computer-Assisted , Neural Networks, Computer , Pedestrians , Humans , Movement

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL