A Pix2Pix Architecture for Complete Offline Handwritten Text Normalization.

Barreiro-Garrido, Alvaro; Ruiz-Parrado, Victoria; Moreno, A Belen; Velez, Jose F.

Sensors (Basel); 24(12), 2024 Jun 16.

Article in English | MEDLINE | ID: mdl-38931676

ABSTRACT

In offline handwritten text recognition, numerous normalization algorithms have been developed over the years as preprocessing steps applied before automatic recognition models process scanned images of handwritten text. These algorithms have proven effective at improving the overall performance of recognition architectures. However, many of them rely heavily on heuristic strategies that are not seamlessly integrated with the recognition architecture itself. This paper introduces the use of a trainable Pix2Pix model, a specific type of conditional generative adversarial network, to normalize handwritten text images. The model can also be seamlessly integrated as the initial stage of any deep learning architecture designed for handwritten recognition tasks, which makes it possible to train the normalization and recognition components as a unified whole while still maintaining some interpretability of each module. The proposed normalization approach learns from a blend of heuristic transformations applied to text images, aiming to mitigate the impact of intra-personal handwriting variability among different writers. As a result, it achieves slope and slant normalization, alongside other conventional preprocessing objectives such as normalizing the size of text ascenders and descenders. We demonstrate that the proposed architecture replicates, and in certain cases surpasses, the results of a widely used heuristic algorithm, both across two metrics and when integrated as the first step of a deep recognition architecture.
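
To make the term "slant normalization" concrete, the following is a minimal sketch of a classical heuristic deslanting step, implemented as a horizontal shear on a small binary image. It is not the Pix2Pix model described in the abstract, which learns the transformation. The function name `deslant`, the list-of-rows image format, and the assumption that the slant angle is already known are all hypothetical choices made for illustration.

```python
import math
from typing import List

# Hypothetical image format: rows of 0 (background) / 1 (ink), top row first.
Image = List[List[int]]


def deslant(image: Image, slant_degrees: float) -> Image:
    """Undo a known slant by shearing each row horizontally.

    A positive angle means strokes lean to the right, so pixels higher
    above the bottom row are shifted further to the left.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    tangent = math.tan(math.radians(slant_degrees))
    out = [[0] * width for _ in range(height)]
    for y, row in enumerate(image):
        # Rows further above the bottom row (treated as the baseline) move more.
        shift = round(tangent * (height - 1 - y))
        for x, value in enumerate(row):
            new_x = x - shift
            # Ink pixels that would leave the canvas are dropped.
            if value and 0 <= new_x < width:
                out[y][new_x] = 1
    return out


# A stroke leaning 45 degrees to the right becomes vertical.
slanted = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
]
upright = deslant(slanted, 45.0)
```

In a heuristic pipeline the angle itself would have to be estimated first; the abstract's point is that a trainable model can replace this kind of hand-tuned step.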

SSD vs. YOLO for Detection of Outdoor Urban Advertising Panels under Multiple Variabilities.

Morera, Ángel; Sánchez, Ángel; Moreno, A Belén; Sappa, Ángel D; Vélez, José F.

Sensors (Basel); 20(16), 2020 Aug 15.

Article in English | MEDLINE | ID: mdl-32824232

ABSTRACT

This work compares the Single Shot MultiBox Detector (SSD) and You Only Look Once (YOLO) deep neural networks on the problem of detecting outdoor advertisement panels under multiple and combined variabilities in the scenes. Detecting publicity panels in images offers important advantages both in the real world and in the virtual one. For example, applications like Google Street View can be used for Internet publicity: once ad panels are detected in the images, the publicity appearing inside them could be replaced by that of a funding company. In our experiments, both the SSD and YOLO detectors produced acceptable results under variable panel sizes, illumination conditions, viewing perspectives, partial occlusion of panels, complex backgrounds, and multiple panels per scene. Because annotated images for this problem are difficult to find, we created our own dataset for the experiments. The major strength of the SSD model was the near elimination of False Positive (FP) cases, which is preferable when the publicity contained inside the panels is analyzed after detection. On the other hand, YOLO produced better panel localization results, detecting a higher number of True Positive (TP) panels with higher accuracy. Finally, a comparison of the two analyzed object detection models with different types of semantic segmentation networks, using the same evaluation metrics, is also included.
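
The TP and FP counts mentioned above are usually obtained by matching predicted boxes to annotated boxes. The sketch below shows one common way to do this, assuming an intersection-over-union (IoU) threshold of 0.5 and greedy one-to-one matching. The abstract does not state the matching rule or threshold the authors used, so the threshold, the box format, and all function names here are assumptions for illustration only.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(detections: List[Box], ground_truth: List[Box],
                  threshold: float = 0.5) -> Tuple[int, int, int]:
    """Greedily match each detection to at most one annotated panel.

    Returns (true_positives, false_positives, false_negatives).
    """
    unmatched = list(ground_truth)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= threshold:
            unmatched.remove(best)  # each annotation can be matched once
            tp += 1
    fp = len(detections) - tp
    fn = len(unmatched)
    return tp, fp, fn


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision penalizes false positives; recall penalizes missed panels."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two annotated panels; one detection overlaps the first, one is spurious.
panels = [(0, 0, 10, 10), (20, 20, 30, 30)]
predicted = [(1, 1, 10, 10), (50, 50, 60, 60)]
tp, fp, fn = count_matches(predicted, panels)
precision, recall = precision_recall(tp, fp, fn)
```

Under this reading, a detector with almost no FP cases has precision close to 1, while a detector that finds more TP panels has higher recall, which is the trade-off the abstract describes between SSD and YOLO.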
