Search | BVS Regional Portal (test)

Image recognition-based petal arrangement estimation.

Nakatani, Tomoya; Utsumi, Yuzuko; Fujimoto, Koichi; Iwamura, Masakazu; Kise, Koichi.

Front Plant Sci ; 15: 1334362, 2024.

Article in English | MEDLINE | ID: mdl-38638358

ABSTRACT

Flowers exhibit morphological diversity in the number and positional arrangement of their floral organs, such as petals. The petal arrangement of a blooming flower is represented by the overlap relation between neighboring petals, an indicator of the floral developmental process; however, only specialists can identify petal arrangements. Therefore, we propose a method to support the estimation of the arrangement of the perianth organs, including petals and tepals, using image recognition techniques. The obstacle to realizing the method is that a large image dataset cannot be prepared, so we cannot apply the latest machine-learning-based image processing methods, which require a large number of images. Therefore, we describe the tepal arrangement as a sequence of interior-exterior patterns of tepal overlap in the image, and estimate the tepal arrangement by matching the pattern with the known patterns. We also use methods that require little or no training data to implement the method: the fine-tuned YOLO v5 model for flower detection, GrabCut for flower segmentation, the Harris corner detector for tepal overlap detection, MAML-based interior-exterior estimation, and circular permutation matching for tepal arrangement estimation. Experimental results showed good accuracy when flower detection, segmentation, overlap location estimation, interior-exterior estimation, and circular-permutation-matching-based tepal arrangement estimation were evaluated independently. However, the accuracy decreased when they were integrated. Therefore, we developed a user interface for manual correction of the estimated overlap positions and interior-exterior patterns, which ensures the quality of tepal arrangement estimation.
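The final matching step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each tepal is encoded by a single letter ("E" for exterior, "I" for interior) and that the known arrangements are available as strings of the same length. The pattern names, the encoding, and the use of Hamming distance are all hypothetical choices made for illustration.

```python
def rotations(pattern):
    """Return every circular rotation of a pattern string."""
    return [pattern[i:] + pattern[:i] for i in range(len(pattern))]


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def match_arrangement(observed, known_patterns):
    """Find the known pattern closest to `observed` under any rotation.

    A flower photographed from above has no fixed starting tepal, so the
    observed sequence is compared against every rotation of each known
    pattern. Returns (name, distance), or None if no pattern has the
    same length as the observation.
    """
    best = None
    for name, pattern in known_patterns.items():
        if len(pattern) != len(observed):
            continue
        distance = min(hamming(observed, r) for r in rotations(pattern))
        if best is None or distance < best[1]:
            best = (name, distance)
    return best


# Hypothetical known arrangements for a six-tepal flower.
KNOWN = {"alternating": "EIEIEI", "grouped": "EEEIII"}

print(match_arrangement("IEIEIE", KNOWN))
print(match_arrangement("IIEEEI", KNOWN))
```

Taking the minimum distance over rotations, rather than requiring an exact match, lets a single misclassified tepal still map to the nearest known arrangement; whether that tolerance is desirable depends on the application.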

ViTSTR-Transducer: Cross-Attention-Free Vision Transformer Transducer for Scene Text Recognition.

Buoy, Rina; Iwamura, Masakazu; Srun, Sovila; Kise, Koichi.

J Imaging ; 9(12)2023 Dec 13.

Article in English | MEDLINE | ID: mdl-38132694

ABSTRACT

Attention-based encoder-decoder scene text recognition (STR) architectures have been proven effective in recognizing text in the real world, thanks to their ability to learn an internal language model. Nevertheless, the cross-attention operation that is used to align visual and linguistic features during decoding is computationally expensive, especially in low-resource environments. To address this bottleneck, we propose a cross-attention-free STR framework that still learns a language model. The framework we propose is ViTSTR-Transducer, which draws inspiration from ViTSTR, a vision transformer (ViT)-based method designed for STR and the recurrent neural network transducer (RNN-T) initially introduced for speech recognition. The experimental results show that our ViTSTR-Transducer models outperform the baseline attention-based models in terms of the required decoding floating point operations (FLOPs) and latency while achieving a comparable level of recognition accuracy. Compared with the baseline context-free ViTSTR models, our proposed models achieve superior recognition accuracy. Furthermore, compared with the recent state-of-the-art (SOTA) methods, our proposed models deliver competitive results.

Explainable Connectionist-Temporal-Classification-Based Scene Text Recognition.

Buoy, Rina; Iwamura, Masakazu; Srun, Sovila; Kise, Koichi.

J Imaging ; 9(11)2023 Nov 15.

Article in English | MEDLINE | ID: mdl-37998095

ABSTRACT

Connectionist temporal classification (CTC) is a favored decoder in scene text recognition (STR) for its simplicity and efficiency. However, most CTC-based methods utilize one-dimensional (1D) vector sequences, usually derived from a recurrent neural network (RNN) encoder. This results in the absence of an explainable 2D spatial relationship between the predicted characters and corresponding image regions, which is essential for model explainability. On the other hand, 2D attention-based methods enhance recognition accuracy and offer character location information via cross-attention mechanisms, linking predictions to image regions. However, these methods are more computationally intensive than the 1D CTC-based methods. To achieve both low latency and model explainability via character localization using a 1D CTC decoder, we propose a marginalization-based method that processes 2D feature maps and predicts a sequence of 2D joint probability distributions over the height and class dimensions. Based on the proposed method, we introduce an association map that aids in character localization and model prediction explanation. This map parallels the role of a cross-attention map, as seen in computationally intensive attention-based architectures. With the proposed method, we consider a ViT-CTC STR architecture that uses a 1D CTC decoder and a pretrained vision Transformer (ViT) as a 2D feature extractor. Our ViT-CTC models were trained on synthetic data and fine-tuned on real labeled sets. These models outperform the recent state-of-the-art (SOTA) CTC-based methods on benchmarks in terms of recognition accuracy. Compared with the baseline Transformer-decoder-based models, our ViT-CTC models offer a speed boost of up to 12 times regardless of the backbone, with a maximum 3.1% reduction in total word recognition accuracy. In addition, both qualitative and quantitative assessments of character locations estimated from the association map align closely with those from the cross-attention map and ground-truth character-level bounding boxes.
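The marginalization idea in this abstract can be sketched for a single time step (one column of the 2D feature map). This is only an illustration under stated assumptions, not the authors' code: the logits are assumed to form a height-by-class grid, the joint distribution is assumed to be a softmax over the whole grid, and `height_profile` is a hypothetical stand-in for one column of the association map, whose exact definition the abstract does not give.

```python
import math


def joint_softmax(column_logits):
    """Turn an H x C grid of logits into a joint distribution P(h, c).

    The softmax is taken over all H * C cells at once, so the whole
    grid sums to 1 (assumed normalisation).
    """
    peak = max(max(row) for row in column_logits)
    exps = [[math.exp(v - peak) for v in row] for row in column_logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def marginalize_height(joint):
    """Sum P(h, c) over the height axis to obtain P(c).

    The resulting length-C vector is the per-step class distribution
    that a 1D CTC decoder expects.
    """
    return [sum(class_column) for class_column in zip(*joint)]


def height_profile(joint, class_index):
    """Return P(h, c = class_index) for every height h.

    Hypothetical association-style slice: it shows which rows of the
    column contributed to the chosen class.
    """
    return [row[class_index] for row in joint]


# Toy column with H = 2 rows and C = 3 classes.
logits = [[0.0, 1.0, 0.0],
          [0.0, 3.0, 0.0]]
joint = joint_softmax(logits)
class_probs = marginalize_height(joint)
predicted = class_probs.index(max(class_probs))
print(predicted, height_profile(joint, predicted))
```

Because the class distribution is obtained by summing the joint grid, the per-height contributions remain available after decoding, which is what makes a localization map possible without a cross-attention decoder.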

Examining Participant Adherence with Wearables in an In-the-Wild Setting.

Nolasco, Hannah R; Vargo, Andrew; Bohley, Niklas; Brinkhaus, Christian; Kise, Koichi.

Sensors (Basel) ; 23(14)2023 Jul 18.

Article in English | MEDLINE | ID: mdl-37514773

ABSTRACT

Wearable devices offer a wealth of data for ubiquitous computing researchers. For instance, sleep data from a wearable could be used to identify an individual's harmful habits. Recently, devices that are unobtrusive in size, setup, and maintenance have become commercially available. However, most data validation for these devices comes from brief, short-term laboratory studies or experiments that have unrepresentative samples and are also inaccessible to most researchers. For wearables research conducted in-the-wild, running a study carries the risk of financial costs and failure. Thus, when researchers conduct in-the-wild studies, the majority of participants tend to be university students. In this paper, we present a month-long in-the-wild study with 31 Japanese adults who wore a sleep tracking device called the Oura ring. The high device usage results found in this study can be used to inform the design and deployment of longer-term mid-size in-the-wild studies.

Subjects

Wearable Electronic Devices, Adult, Humans, Sleep

TAIM: Tool for Analyzing Root Images to Calculate the Infection Rate of Arbuscular Mycorrhizal Fungi.

Muta, Kaoru; Takata, Shiho; Utsumi, Yuzuko; Matsumura, Atsushi; Iwamura, Masakazu; Kise, Koichi.

Front Plant Sci ; 13: 881382, 2022.

Article in English | MEDLINE | ID: mdl-35592584

ABSTRACT

Arbuscular mycorrhizal fungi (AMF) infect plant roots and are hypothesized to improve plant growth. AMF have recently become available for axenic culture and are therefore expected to be used as a microbial fertilizer. To evaluate the usefulness of AMF as a microbial fertilizer, we need to investigate the relationship between the degree of root colonization by AMF and plant growth. The method popularly used to calculate the degree of root colonization, termed the magnified intersections method, is performed manually and is too labor-intensive to enable an extensive survey. Therefore, we automated the magnified intersections method by developing an application named "Tool for Analyzing root images to calculate the Infection rate of arbuscular Mycorrhizal fungi: TAIM." TAIM is a web-based application that calculates the degree of AMF colonization from images using automated computer vision and pattern recognition techniques. Experimental results showed that TAIM correctly detected sampling areas for calculation of the degree of infection and classified the sampling areas with 87.4% accuracy. TAIM is publicly accessible at http://taim.imlab.jp/.
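Once each sampling area has been classified, the degree of colonization reduces to a simple proportion. The sketch below shows that arithmetic only; it is not TAIM's code, and the label strings `"colonized"` and `"not_colonized"` are hypothetical names chosen for illustration.

```python
def colonization_rate(labels, positive_label="colonized"):
    """Return the percentage of sampling areas labelled as colonized.

    `labels` holds one classification result per sampling area
    (for example, per root/grid intersection in the image).
    """
    if not labels:
        raise ValueError("at least one sampling area is required")
    positives = sum(1 for label in labels if label == positive_label)
    return 100.0 * positives / len(labels)


# Four hypothetical sampling areas, two of them colonized.
areas = ["colonized", "not_colonized", "colonized", "not_colonized"]
print(colonization_rate(areas))  # 50.0
```

Because the rate is a ratio over classified areas, any classification error propagates directly into it, so the reported 87.4% per-area accuracy bounds how closely such an automated rate can track a manual count.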

Tiller estimation method using deep neural networks.

Kinose, Rikuya; Utsumi, Yuzuko; Iwamura, Masakazu; Kise, Koichi.

Front Plant Sci ; 13: 1016507, 2022.

Article in English | MEDLINE | ID: mdl-36714728

ABSTRACT

This paper describes a method based on a deep neural network (DNN) for estimating the number of tillers on a plant. A tiller is a branch on a grass plant, and the number of tillers is one of the most important determinants of yield. Traditionally, the tiller number is counted by hand, and so an automated approach is necessary for high-throughput phenotyping. Conventional methods use heuristic features to estimate the tiller number. Based on the successful application of DNNs in the field of computer vision, the use of DNN-based features instead of heuristic features is expected to improve the estimation accuracy. However, as DNNs generally require large volumes of data for training, it is difficult to apply them to estimation problems for which large training datasets are unavailable. In this paper, we use two strategies to overcome the problem of insufficient training data: the use of a pretrained DNN model and the use of pretext tasks for learning the feature representation. We extract features using the resulting DNNs and estimate the tiller numbers through a regression technique. We conducted experiments using side-view whole-plant images taken against a plain background. The experimental results show that the proposed methods using a pretrained model and specific pretext tasks achieve better performance than the conventional method.

Focusing on the face or getting distracted by social signals? The effect of distracting gestures on attentional focus in natural interaction.

Kajopoulos, Jasmin; Cheng, Gordon; Kise, Koichi; Müller, Hermann J; Wykowska, Agnieszka.

Psychol Res ; 85(2): 491-502, 2021 Mar.

Article in English | MEDLINE | ID: mdl-32705336

ABSTRACT

Attentional orienting towards others' gaze direction or pointing has been well investigated in laboratory conditions. However, less is known about the operation of attentional mechanisms in online naturalistic social interaction scenarios. It is equally plausible that following social directional cues (gaze, pointing) occurs reflexively, and/or that it is influenced by top-down cognitive factors. In a mobile eye-tracking experiment, we show that under natural interaction conditions, overt attentional orienting is not necessarily reflexively triggered by pointing gestures or a combination of gaze shifts and pointing gestures. We found that participants conversing with an experimenter, who, during the interaction, would play out pointing gestures as well as directional gaze movements, continued to mostly focus their gaze on the face of the experimenter, demonstrating the significance of attending to the face of the interaction partner, in line with effective top-down control over reflexive orienting of attention in the direction of social cues.

Subjects

Attention/physiology, Cues (Psychology), Face, Gestures, Spatial Orientation/physiology, Adult, Female, Ocular Fixation/physiology, Humans, Male, Photic Stimulation/methods, Young Adult

Automatic Generation of Typographic Font From Small Font Subset.

Miyazaki, Tomo; Tsuchiya, Tatsunori; Sugaya, Yoshihiro; Omachi, Shinichiro; Iwamura, Masakazu; Uchida, Seiichi; Kise, Koichi.

IEEE Comput Graph Appl ; 40(1): 99-111, 2020.

Article in English | MEDLINE | ID: mdl-31380748

ABSTRACT

The automated generation of fonts containing a large number of characters is in high demand. For example, a typical Japanese font requires over 1000 characters. Unfortunately, professional typographers create the majority of fonts, resulting in significant financial and time investments for font generation. The main contribution of this article is the development of a method that automatically generates a target typographic font containing thousands of characters, from a small subset of character images in the target font. We generate characters other than the subset so that a complete font is obtained. We propose a novel font generation method with the capability to deal with various fonts, including a font composed of distinctive strokes, which are difficult for existing methods to handle. We demonstrated the proposed method by generating 2965 characters in 47 fonts. Moreover, objective and subjective evaluations verified that the generated characters are similar to the original characters.

Estimation of reading subjective understanding based on eye gaze analysis.

Lima Sanches, Charles; Augereau, Olivier; Kise, Koichi.

PLoS One ; 13(10): e0206213, 2018.

Article in English | MEDLINE | ID: mdl-30359411

ABSTRACT

The integration of ubiquitous technologies in the field of education has considerably enhanced our way of learning. Such technologies enable students to get gradual feedback about their performance and to receive adapted learning materials. This is particularly important in the domain of foreign language learning, which requires intense daily practice. One of the main inputs of adaptive learning systems is the user's understanding of reading material. The reader's understanding can be divided into two parts: the objective understanding and the subjective understanding. The objective understanding can be measured by comprehension questions about the content of the text. The subjective understanding is the reader's perception of their own understanding. The subjective understanding plays an important role in the reader's motivation, self-esteem and confidence. However, its automatic estimation remains a challenging task. This paper is one of the first to propose a method to estimate the subjective understanding. We show that using the eye gaze to predict the subjective understanding improves the estimation by 13% as compared to using comprehension questions.

Subjects

Comprehension/physiology, Educational Measurement/methods, Eye Movements/physiology, Ocular Fixation/physiology, Reading, Adult, Algorithms, Attention/physiology, Female, Humans, Language, Language Tests, Male, Saccades/physiology, Students/psychology, Young Adult
