Results 1 - 20 of 24
1.
J Imaging ; 10(9)2024 Sep 14.
Article in English | MEDLINE | ID: mdl-39330449

ABSTRACT

Video summarization aims to select the most informative subset of frames in a video to facilitate efficient video browsing. Past efforts have invariably involved training summarization models with annotated summaries or heuristic objectives. In this work, we reveal that features pre-trained on image-level tasks contain rich semantic information that can be readily leveraged to quantify frame-level importance for zero-shot video summarization. Leveraging pre-trained features and contrastive learning, we propose three metrics that characterize a desirable keyframe: local dissimilarity, global consistency, and uniqueness. We show that these metrics effectively capture the diversity and representativeness of frames commonly used for the unsupervised generation of video summaries, demonstrating performance competitive with or better than past methods while requiring no training. We further propose a contrastive learning-based pre-training strategy on unlabeled videos to enhance the quality of the proposed metrics and thus improve the evaluated performance on the public benchmarks TVSum and SumMe.
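
As an illustration of how pre-trained features can score frames with no training, below is a minimal numpy sketch. It assumes frame embeddings have already been extracted with some image-level backbone, and the three scores are loose interpretations of the local dissimilarity, global consistency, and uniqueness metrics named above, not the authors' exact formulations.

```python
import numpy as np

def frame_importance(feats, window=5):
    """Heuristic zero-shot importance from frame embeddings of shape (n, dim)."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T                     # pairwise cosine similarity
    n = feats.shape[0]
    local_dissim = np.zeros(n)                # stands out from temporal neighbors
    uniqueness = np.zeros(n)                  # far from its nearest other frame
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        nbrs = np.concatenate([sim[i, lo:i], sim[i, i + 1:hi]])
        local_dissim[i] = 1.0 - nbrs.mean()
        others = np.concatenate([sim[i, :i], sim[i, i + 1:]])
        uniqueness[i] = 1.0 - others.max()
    global_consist = sim.mean(axis=1)         # agrees with the video as a whole
    return local_dissim + global_consist + uniqueness

# e.g. keep the top 15% of frames as the summary:
# picks = np.argsort(-frame_importance(embeddings))[: int(0.15 * len(embeddings))]
```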

2.
MethodsX ; 13: 102780, 2024 Dec.
Article in English | MEDLINE | ID: mdl-39007030

ABSTRACT

In today's world of multimedia content management, the volume of CCTV footage poses challenges for storage, accessibility, and efficient navigation. To tackle these issues, we propose a comprehensive video summarization technique that merges machine-learning methods with user engagement. Our methodology consists of two phases, each bringing improvements to video summarization. In Phase I, we introduce a method for summarizing videos based on keyframe detection and behavioral analysis, utilizing YOLOv5 for object recognition, Deep SORT for object tracking, and a Single Shot Detector (SSD) for creating video summaries. In Phase II, we present a user-interest-based video summarization system driven by machine learning. By incorporating user preferences into the summarization process, we augment these techniques with personalized content curation. Leveraging tools such as NLTK, OpenCV, TensorFlow, and the EfficientDet model enables our system to generate customized video summaries tailored to user preferences. This approach not only enhances user interactions but also efficiently handles the overwhelming amount of video data on digital platforms. By combining these two methodologies, we advance the application of machine-learning techniques while offering a solution to the complex challenges of managing multimedia data.

3.
Sensors (Basel) ; 23(16)2023 Aug 09.
Article in English | MEDLINE | ID: mdl-37631590

ABSTRACT

This paper introduces a transformer encoding linker network (TELNet) for automatically identifying scene boundaries in videos without prior knowledge of their structure. Videos consist of sequences of semantically related shots or chapters, and recognizing scene boundaries is crucial for various video processing tasks, including video summarization. TELNet uses a rolling window to scan through video shots, encoding their features extracted from a fine-tuned 3D CNN model (transformer encoder). By establishing links between video shots based on these encoded features (linker), TELNet efficiently identifies scene boundaries where consecutive shots lack links. TELNet was trained on multiple video scene detection datasets and achieved results comparable to other state-of-the-art models in standard settings. Notably, in cross-dataset evaluations, TELNet achieved significantly better F-scores. Furthermore, TELNet's computational complexity grows linearly with the number of shots, making it highly efficient for processing long videos.
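
The linking idea can be pictured with a plain cosine-similarity threshold standing in for TELNet's learned transformer linker; in this hypothetical sketch a boundary is declared wherever no sufficiently similar pair of shots spans the gap inside the rolling window.

```python
import numpy as np

def scene_boundaries(shot_feats, window=8, thresh=0.7):
    """Boundary after shot i when no link (similarity >= thresh) crosses
    the gap between shots i and i+1 within the rolling window."""
    f = shot_feats / np.linalg.norm(shot_feats, axis=1, keepdims=True)
    n, bounds = len(f), []
    for i in range(n - 1):
        left = range(max(0, i - window + 1), i + 1)
        right = range(i + 1, min(n, i + 1 + window))
        if not any(f[a] @ f[b] >= thresh for a in left for b in right):
            bounds.append(i + 1)              # a new scene starts at shot i+1
    return bounds
```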

4.
Sensors (Basel) ; 23(7)2023 Mar 23.
Article in English | MEDLINE | ID: mdl-37050439

ABSTRACT

People spend considerable time searching for videos on online video-sharing platforms, and video summarization helps them search through many videos efficiently and quickly. In this paper, we propose an unsupervised video summarization method based on deep reinforcement learning with an interpolation method. To train the video summarization network efficiently, we use graph-level features and design a reinforcement learning framework with a temporal consistency reward function and other reward functions. The temporal consistency reward helps select keyframes uniformly. We present a lightweight video summarization network combining transformer and CNN components to capture global and local contexts and efficiently predict a short sequence of keyframe-level importance scores for the video. The output importance scores are then interpolated to fit the video length. Using the predicted importance scores, we calculate rewards based on the reward functions, which helps select interesting keyframes efficiently and uniformly. We evaluated the proposed method on two datasets, SumMe and TVSum. The experimental results show that the proposed method achieves state-of-the-art performance compared to the latest unsupervised video summarization methods, which we demonstrate and analyze experimentally.
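
The interpolation step itself is simple; a sketch of stretching a short score sequence to the full frame count is below (the network, graph-level features, and reward functions are not reproduced here).

```python
import numpy as np

def upsample_scores(scores, n_frames):
    """Piecewise-linearly interpolate a short score sequence to video length."""
    src = np.linspace(0.0, 1.0, num=len(scores))
    dst = np.linspace(0.0, 1.0, num=n_frames)
    return np.interp(dst, src, scores)

# e.g. a network emits 200 scores for a 6000-frame video:
# full_scores = upsample_scores(net_output, 6000)
```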

5.
Artif Intell Med ; 139: 102544, 2023 05.
Article in English | MEDLINE | ID: mdl-37100512

ABSTRACT

The outbreak of the COVID-19 pandemic posed new challenges for the research community to investigate novel mechanisms for monitoring and controlling its spread in crowded scenes. Contemporary COVID-19 prevention measures enforce strict protocols in public places, and robust computer vision-enabled applications can provide intelligent frameworks for monitoring compliance. Wearing face masks is an effective COVID-19 protocol implemented in many countries across the world, but it is challenging for authorities to manually monitor compliance, particularly in densely crowded public spaces such as shopping malls, railway stations, airports, and religious sites. To overcome these issues, this research designs an operative method that automatically detects violations of face mask regulations during the COVID-19 pandemic. We present a novel technique for detecting COVID-19 protocol violations via video summarization in crowded scenes (CoSumNet). Our approach automatically yields short summaries from crowded video scenes (i.e., of people with and without masks). CoSumNet can be deployed in crowded places to assist controlling agencies in taking appropriate action against protocol violators. To evaluate its efficacy, CoSumNet was trained on the benchmark "Face Mask Detection ∼12K Images Dataset" and validated on various real-time CCTV videos. CoSumNet achieves detection accuracies of 99.98% and 99.92% in seen and unseen scenarios, respectively. Our method also performs promisingly in cross-dataset settings and on a variety of face masks. Furthermore, the model can convert longer videos into short summaries in approximately 5-20 s.


Subject(s)
COVID-19, Humans, COVID-19/epidemiology, COVID-19/prevention & control, Pandemics/prevention & control, Environment
6.
Sensors (Basel) ; 23(6)2023 Mar 07.
Article in English | MEDLINE | ID: mdl-36991606

ABSTRACT

The popularity of dogs has been increasing owing to factors such as the physical and mental health benefits associated with raising them. While owners care about their dogs' health and welfare, it is difficult for them to assess these, and frequent veterinary checkups represent a growing financial burden. In this study, we propose a behavior-based video summarization and visualization system for monitoring a dog's behavioral patterns to help assess its health and welfare. The system comprises four modules: (1) a video data collection and preprocessing module; (2) an object detection-based module that retrieves image sequences in which the dog is alone and crops them to reduce background noise; (3) a dog behavior recognition module that uses a two-stream EfficientNetV2 to extract appearance and motion features from the cropped images and their respective optical flow, followed by a long short-term memory (LSTM) model to recognize the dog's behaviors; and (4) a summarization and visualization module that provides effective visual summaries of the dog's location and behavior to help assess and understand its health and welfare. The experimental results show that the system achieved an average F1 score of 0.955 for behavior recognition, with an execution time that allows real-time processing, while the summarization and visualization results demonstrate how the system can help owners assess and understand their dog's health and welfare.


Subject(s)
Data Collection, Dogs, Animals
7.
Neural Netw ; 161: 359-370, 2023 Apr.
Article in English | MEDLINE | ID: mdl-36780859

ABSTRACT

Video summarization has long been used to ease video browsing and plays an increasingly crucial role with the explosion of online videos. In the context of event-centric videos, we aim to extract the clips corresponding to the more important events in a video. To resolve the dilemma between detection precision and clip completeness faced by previous methods, we present an efficient Boundary-Aware framework for Summary clip Extraction (BASE) that extracts summary clips with more precise boundaries while maintaining their completeness. Specifically, we propose a new distance-based importance signal that reflects the progress information in each video. This signal not only helps us detect boundaries with higher precision but also makes it possible to preserve clip completeness. For feature representation, we also explore new types of information to facilitate video summarization. Our approach outperforms current state-of-the-art video summarization models in terms of both more precise clip boundaries and more complete summary clips. Notably, our results are even comparable to manual annotations.


Subject(s)
Video Recording, Video Recording/methods
8.
Sensors (Basel) ; 22(21)2022 Oct 28.
Article in English | MEDLINE | ID: mdl-36365972

ABSTRACT

Video summarization (VS) is a widely used technique for facilitating effective reading, fast comprehension, and efficient retrieval of video content. Certain properties of new video data, such as a lack of prominent emphasis and fuzzy theme development boundaries, undermine approaches based purely on video feature information and introduce new challenges for extracting depth and breadth features from videos. In addition, the diversity of user requirements further complicates accurate keyframe screening. To overcome these challenges, this paper proposes a hierarchical spatial-temporal cross-attention scheme for video summarization based on contrastive learning. Graph attention networks (GAT) and a multi-head convolutional attention cell are used to extract local and depth features, while a GAT-adjusted bidirectional ConvLSTM (DB-ConvLSTM) extracts global and breadth features. Furthermore, a spatial-temporal cross-attention-based ConvLSTM is developed to merge hierarchical features and achieve more accurate screening within clusters of similar keyframes. Verification experiments and comparative analysis demonstrate that our method outperforms state-of-the-art methods.


Subject(s)
Algorithms, Computer-Assisted Image Interpretation, Computer-Assisted Image Interpretation/methods, Video Recording/methods
9.
Sensors (Basel) ; 22(19)2022 Oct 10.
Article in English | MEDLINE | ID: mdl-36236789

ABSTRACT

Deep summarization models have succeeded in the video summarization field thanks to the development of gated recurrent unit (GRU) and long short-term memory (LSTM) technology. However, for long videos, GRUs and LSTMs cannot effectively capture long-term dependencies. This paper proposes a deep summarization network with auxiliary summarization losses to address this problem. We introduce an unsupervised auxiliary summarization loss module with LSTM and a swish activation function to capture the long-term dependencies for video summarization, which can be easily integrated with various networks. The proposed model is an unsupervised deep reinforcement learning framework that does not depend on any labels or user interactions. Additionally, we implement a reward function R(S) that jointly considers the consistency, diversity, and representativeness of the generated summaries. Furthermore, the proposed model is lightweight and can be successfully deployed on mobile devices, enhancing the experience of mobile users and reducing the load on server operations. We conducted experiments on two benchmark datasets, and the results demonstrate that our unsupervised approach obtains better summaries than existing video summarization methods, generating F-scores nearly 6.3% higher on the SumMe dataset and 2.2% higher on the TVSum dataset compared to the DR-DSN model.
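
A hedged sketch of a diversity-plus-representativeness reward in the spirit of the DR-DSN baseline the authors compare against; the paper's additional consistency term and exact definitions are not reproduced.

```python
import numpy as np

def reward(feats, picks):
    """R(S) ~ diversity + representativeness of the selected frame subset.
    feats: (n, d) frame features; picks: indices of selected frames."""
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    s = f[picks]
    # Diversity: mean pairwise dissimilarity among selected frames.
    sim = s @ s.T
    k = len(picks)
    r_div = (1.0 - sim[np.triu_indices(k, 1)]).mean() if k > 1 else 0.0
    # Representativeness: every frame should lie near some selected frame
    # (on the unit sphere, squared distance = 2 - 2 * cosine similarity).
    d2 = 2.0 - 2.0 * (f @ s.T)
    r_rep = np.exp(-d2.min(axis=1).mean())
    return r_div + r_rep
```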


Subject(s)
Algorithms, Long-Term Memory, Long-Term Memory/physiology
10.
Comput Biol Med ; 149: 106087, 2022 10.
Article in English | MEDLINE | ID: mdl-36115301

ABSTRACT

Wireless capsule endoscopy (WCE) is an innovative technology introduced in the medical domain to directly visualize the digestive system using a battery-powered electronic capsule. It is considered a desirable substitute for conventional digestive tract diagnostic methods, offering a comfortable and painless inspection. Despite its many benefits, WCE suffers from poor video quality due to low frame resolution, which limits diagnostic accuracy. Many research groups have presented diverse, low-complexity compression techniques to economize the battery power consumed by radio-frequency transmission of the captured video, allowing images to be captured at higher resolution. Many vision-based computational methods have also been developed to improve the diagnostic yield, including approaches for automatically detecting abnormalities and reducing the time needed for video analysis. Although various research works have been put forth in the WCE imaging field, there is still a wide gap between existing techniques and current needs. Hence, this article systematically reviews recent WCE video compression and summarization techniques. The review's objectives are as follows: first, to detail the requirements, challenges, and design precepts for low-complexity WCE video compressors; second, to discuss the most recent compression methods, emphasizing simple distributed video coding methods; next, to review the most recent summarization techniques and the significance of deep neural networks in them; further, to provide a quantitative analysis of the state-of-the-art methods along with their advantages and drawbacks; and finally, to discuss existing problems and possible future directions for building a robust WCE imaging framework.


Subject(s)
Capsule Endoscopy, Data Compression, Capsule Endoscopy/methods, Data Compression/methods, Gastrointestinal Tract
11.
Med Image Anal ; 80: 102490, 2022 08.
Article in English | MEDLINE | ID: mdl-35717873

ABSTRACT

Ultrasound (US) plays a vital role in breast cancer screening, especially for women with dense breasts. Common practice requires a sonographer to recognize the key diagnostic features of a lesion and record one or several representative frames during dynamic scanning before performing the diagnosis. However, existing computer-aided diagnosis tools often focus on the final diagnosis while neglecting the influence of keyframe selection. Moreover, lesions can have highly irregular shapes and varying sizes and locations during scanning, and the recognition of their diagnostic characteristics is challenging and faces severe class imbalance. To address these issues, we propose a reinforcement learning-based framework that automatically extracts keyframes from breast US videos of arbitrary length. It is equipped with a detection-based nodule filtering module and a novel reward mechanism that integrates anatomical and diagnostic features of the lesions into keyframe searching. A simple yet effective loss function is also designed to alleviate the class imbalance issue. Extensive experiments illustrate that the proposed framework benefits from both innovations and generates representative keyframe sequences under various screening conditions.


Subject(s)
Breast Neoplasms, Breast Ultrasonography, Breast Neoplasms/diagnostic imaging, Breast Neoplasms/pathology, Computer-Assisted Diagnosis, Early Detection of Cancer, Female, Humans
12.
Front Big Data ; 5: 1106776, 2022.
Article in English | MEDLINE | ID: mdl-36700133

ABSTRACT

With the massive expansion of videos on the internet, searching through millions of them has become quite challenging. Smartphones, recording devices, and file sharing all capture massive amounts of real-time video, and the many surveillance cameras in smart cities have created a huge volume of video data whose indexing, retrieval, and administration is a difficult problem. Exploring such data takes time and degrades the user experience; in this setting, video summarization is extremely useful. Video summarization allows for the efficient storage, retrieval, and browsing of huge amounts of video information without sacrificing key features. This article presents a classification and analysis of video summarization approaches, with a focus on techniques in the real-time video summarization (RVS) domain. The study integrates essential research findings and data for quick reference, lays out the preliminaries, and investigates prospective research directions. Video summarization has been used successfully in a variety of practical applications in smart cities, including anomaly detection in video surveillance systems.

13.
Softw Impacts ; 10: 100185, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34870242

ABSTRACT

The COVID-19 pandemic has accelerated the need for automatic triaging and summarization of ultrasound videos, both for fast access to pathologically relevant information in the Emergency Department and for lowering resource requirements for telemedicine. In this work, we present a PyTorch-based unsupervised reinforcement learning methodology that incorporates multi-feature fusion to output classification labels, segmentation maps, and summary videos for lung ultrasound. The use of unsupervised training eliminates the tedious manual labeling of keyframes by clinicians, opening new frontiers in scalability by training on unlabeled or weakly labeled data. Our approach was benchmarked against expert clinicians from different geographies, displaying superior Precision and F1 scores (over 80% and 44%, respectively).

14.
Math Biosci Eng ; 18(6): 9294-9311, 2021 10 27.
Article in English | MEDLINE | ID: mdl-34814346

ABSTRACT

Numerous limitations of shot-based and content-based keyframe extraction approaches have encouraged the development of cluster-based algorithms. This paper proposes an Optimal Threshold and Maximum Weight (OTMW) clustering approach that allows accurate and automatic extraction of video summaries. First, the video content is analyzed using image color, texture, and information complexity, and a video feature dataset is constructed. A Golden Section method is then proposed to determine the optimal solution of the threshold function. The initial cluster centers and the cluster number k are obtained automatically by the improved clustering algorithm, and the video frames are partitioned into k clusters using the K-means algorithm. The representative frame of each cluster is extracted using the Maximum Weight method, yielding an accurate video summary. The proposed approach was tested on 16 multi-type videos, achieving average Fidelity and Ratio key-frame quality scores of 96.11925 and 97.128, respectively, and the extracted key-frames are consistent with artificial visual judgement. Compared with several state-of-the-art cluster-based algorithms, Fidelity is increased by 12.49721, 10.86455, 10.62984, and 10.4984375, respectively, and Ratio is increased by 1.958 on average with small fluctuations. The experimental results demonstrate the advantage of the proposed solution over several related baselines on the sixteen diverse videos and validate that the proposed approach can accurately extract summaries from multi-type videos.
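
For reference, the generic cluster-based pattern underlying approaches like this one can be sketched as below: cluster frame features, then keep the frame nearest each cluster center. The OTMW threshold optimization and Maximum Weight selection are specific to the paper and not reproduced; this is the plain closest-to-center variant.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_keyframes(feats, k):
    """One representative frame per cluster: the member nearest its center."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(feats)
    keyframes = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(feats[members] - km.cluster_centers_[c], axis=1)
        keyframes.append(int(members[np.argmin(dists)]))
    return sorted(keyframes)
```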


Subject(s)
Algorithms, Cluster Analysis
15.
Entropy (Basel) ; 23(8)2021 Jul 30.
Article in English | MEDLINE | ID: mdl-34441122

ABSTRACT

Because the data volume of news videos is increasing exponentially, the ability to quickly browse a sketch of a video is important in various applications, such as news media, archives, and publicity. This paper proposes a news video summarization method based on SURF features and an improved clustering algorithm, which overcomes the failure of existing algorithms to account for changes in shot complexity. First, we extract SURF features from the video sequences and match them between adjacent frames, then detect abrupt and gradual shot boundaries by calculating similarity scores between adjacent frames with the help of double thresholds. Second, we use an improved clustering algorithm to cluster the color histograms of the video frames within each shot, merging smaller clusters and then selecting the frame closest to each cluster center as a keyframe. Experimental results on both public and self-built datasets show the superiority of our method over the alternatives in terms of accuracy and speed. Additionally, the extracted keyframes exhibit low redundancy and credibly represent a sketch of the news video.
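
A simplified sketch of double-threshold boundary detection, using HSV histogram correlation in place of the paper's SURF matching scores: a similarity below the low threshold marks an abrupt cut, while a dip between the two thresholds that later recovers marks a gradual transition.

```python
import cv2

def shot_boundaries(video_path, t_high=0.5, t_low=0.2):
    """Return (abrupt cut indices, gradual transition (start, end) pairs)."""
    cap = cv2.VideoCapture(video_path)
    prev_hist, cuts, graduals, start, i = None, [], [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < t_low:
                cuts.append(i)                 # abrupt boundary
                start = None
            elif sim < t_high:
                if start is None:
                    start = i                  # candidate gradual transition
            elif start is not None:
                graduals.append((start, i))    # similarity recovered: it ended
                start = None
        prev_hist, i = hist, i + 1
    cap.release()
    return cuts, graduals
```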

16.
Sensors (Basel) ; 21(13)2021 Jul 02.
Article in English | MEDLINE | ID: mdl-34283118

ABSTRACT

This paper addresses the problem of unsupervised video summarization. Video summarization helps people easily browse large-scale videos by providing a summary built from selected frames of the video. In this paper, we propose an unsupervised video summarization method with piecewise linear interpolation (Interp-SUM). Our method aims to improve summarization performance and generate a natural sequence of keyframes by predicting the importance score of each frame using an interpolation method. To train the video summarization network, we exploit a reinforcement learning-based framework with an explicit reward function, and we employ the objective function of the exploring under-appreciated reward method for efficient training. In addition, we present a modified reconstruction loss to promote the representativeness of the summary. We evaluated the proposed method on two datasets, SumMe and TVSum. The experimental results show that Interp-SUM generates a more natural sequence of summary frames than other state-of-the-art methods while achieving performance comparable to the state of the art in unsupervised video summarization, as demonstrated and analyzed in our experiments.


Subject(s)
Algorithms, Computer-Assisted Image Interpretation, Humans, Video Recording
17.
Entropy (Basel) ; 22(11)2020 Nov 12.
Article in English | MEDLINE | ID: mdl-33287053

ABSTRACT

This paper proposes a video summarization algorithm called the Mutual Information and Entropy based adaptive Sliding Window (MIESW) method, designed specifically for static summaries of gesture videos. Considering that gesture videos usually contain uncertain transition postures, unclear movement boundaries, and inexplicable frames, we propose a three-step method: the first step browses the video, the second step applies the MIESW method to select candidate key frames, and the third step removes most redundant key frames. In detail, the first step converts the video into a sequence of frames and adjusts their size. In the second step, the MIESW key frame extraction algorithm is executed: the inter-frame mutual information value is used as a metric to adaptively adjust the size of the sliding window and group similar video content; then, based on the entropy value of each frame and the average mutual information value of the frame group, a threshold method is applied to optimize the grouping and extract the key frames. In the third step, speeded up robust features (SURF) analysis is performed to eliminate redundant frames among the candidate key frames. The calculation of Precision, Recall, and F-measure is optimized from the perspective of practicality and feasibility. Experiments demonstrate that key frames extracted using our method provide high-quality video summaries that substantially cover the main content of the gesture video.
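
The two quantities at the heart of MIESW have standard histogram-based definitions, sketched below; the adaptive adjustment of the window size itself is not shown.

```python
import numpy as np

def entropy(gray):
    """Shannon entropy of an 8-bit grayscale frame."""
    p = np.bincount(gray.ravel(), minlength=256).astype(float)
    p /= p.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mutual_information(gray_a, gray_b, bins=64):
    """MI between two frames, computed from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(gray_a.ravel(), gray_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px[:, None] * py[None, :])[nz])).sum())
```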

18.
Appl Plant Sci ; 8(8): e11387, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32995105

ABSTRACT

PREMISE: Aerial imagery from small unmanned aerial vehicle systems is a promising approach for high-throughput phenotyping and precision agriculture. A key requirement for both applications is to create a field-scale mosaic of the aerial imagery sequence so that the same features are in registration, a very challenging problem for crop imagery. METHODS: We have developed an improved mosaicking pipeline, Video Mosaicking and summariZation (VMZ), which uses a novel two-dimensional mosaicking algorithm that minimizes errors in estimating the transformations between successive frames during registration. The VMZ pipeline uses only the imagery, rather than relying on vehicle telemetry, ground control points, or global positioning system data, to estimate the frame-to-frame homographies. It exploits the spatiotemporal ordering of the image frames to reduce the computational complexity of finding corresponding features between frames using feature descriptors. We compared the performance of VMZ to a standard two-dimensional mosaicking algorithm (AutoStitch) by mosaicking imagery of two maize (Zea mays) research nurseries freely flown with a variety of trajectories. RESULTS: The VMZ pipeline produces superior mosaics faster. Using the speeded up robust features (SURF) descriptor, VMZ produces the highest-quality mosaics. DISCUSSION: Our results demonstrate the value of VMZ for the future automated extraction of plant phenotypes and dynamic scouting for crop management.
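
Estimating the frame-to-frame homography from feature correspondences is the core registration step; a minimal OpenCV sketch follows. ORB stands in for SURF here (SURF requires the non-free contrib build), and VMZ's error-minimizing refinements are not reproduced.

```python
import cv2
import numpy as np

def frame_homography(img_a, img_b):
    """Homography mapping img_a onto img_b, from matched local features."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)[:500]
    src = np.float32([kp_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H   # chain these across successive frames to grow the mosaic
```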

19.
Heliyon ; 5(10): e02699, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31720461

ABSTRACT

Video summarization aims to find a compact representation of an input video by identifying its interesting parts and discarding the rest. The abstracts thus generated enhance the browsing and retrieval of video data. The quality of summaries generated by video summarization algorithms can be improved if redundant frames in the input video are removed before summarization. This paper presents a novel domain-independent method for redundancy elimination prior to summarization that preserves the keyframes of the original video. The frames of the input video are first presampled by selecting two frames per second. The flow vectors between consecutive frames are computed using the SIFT Flow algorithm, and the magnitudes of the flow vectors at each pixel position are summed to obtain the displacement magnitude between consecutive frames. Redundant frames are then filtered out based on local averaging of the displacement values. The method is evaluated on two standard datasets, VSUMM and OVP. The results demonstrate that an average reduction rate of 97.64% is achieved consistently on videos of all categories, and the method gives superior results compared to other state-of-the-art redundancy elimination methods for video summarization.
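
A rough sketch of the displacement-based filtering, with dense Farneback optical flow standing in for SIFT Flow (which has no OpenCV built-in): per-pixel flow magnitudes are summed between consecutive frames, and a frame is kept only when its displacement exceeds the local average.

```python
import cv2
import numpy as np

def redundancy_filter(gray_frames, window=5):
    """Indices of non-redundant frames in a presampled grayscale sequence."""
    disp = []
    for a, b in zip(gray_frames[:-1], gray_frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(a, b, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        disp.append(np.linalg.norm(flow, axis=2).sum())   # total displacement
    disp = np.asarray(disp)
    keep = [0]                                            # always keep frame 0
    for i, d in enumerate(disp):
        lo, hi = max(0, i - window), min(len(disp), i + window + 1)
        if d >= disp[lo:hi].mean():                       # above local average
            keep.append(i + 1)
    return keep
```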

20.
Entropy (Basel) ; 20(10)2018 Sep 29.
Article in English | MEDLINE | ID: mdl-33265837

ABSTRACT

Multimedia information requires large repositories of audio-video data. The retrieval and delivery of video content is a very time-consuming process and a great challenge for researchers. Video summarization is an efficient approach for faster browsing of large video collections and more efficient content indexing and access; compressing the data by extracting keyframes is one solution to these challenges. A keyframe is a frame representative of the salient features of the video, and the output frames must represent the original video in temporal order. The proposed research presents a method of keyframe extraction using the mean of consecutive k frames of video data: a sliding window of size k/2 is employed to select the frame that matches the median entropy value of the window. This method, called the Median of Entropy of Mean Frames (MME), is a mean-based keyframe selection using the median entropy of the sliding window. The method was tested on more than 500 videos of sign language gestures and showed satisfactory results.
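
A sketch of the MME selection under stated assumptions (frames given as a list of 8-bit grayscale arrays; the paper's exact windowing details may differ): average each run of k frames, then within each sliding window of k/2 mean-frames keep the one whose entropy is closest to the window median.

```python
import numpy as np

def frame_entropy(gray):
    p = np.bincount(gray.ravel(), minlength=256).astype(float)
    p /= p.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mme_keyframes(frames, k=10):
    """Median of Entropy of Mean Frames keyframe selection."""
    means = [np.mean(frames[i:i + k], axis=0).astype(np.uint8)
             for i in range(0, len(frames) - k + 1, k)]
    ent = np.array([frame_entropy(m) for m in means])
    w = max(1, k // 2)
    picks = []
    for i in range(0, len(ent) - w + 1, w):
        win = ent[i:i + w]
        picks.append(i + int(np.argmin(np.abs(win - np.median(win)))))
    return [means[p] for p in picks]
```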
