ABSTRACT
Videos capture events that typically contain multiple sequential and simultaneous actions, even in the span of only a few seconds. However, most large-scale datasets built to train models for action recognition in video provide only a single label per video. Consequently, models can be incorrectly penalized for classifying actions that exist in the videos but are not explicitly labeled, and they do not learn the full spectrum of information present in each video during training. To address this, we present the Multi-Moments in Time dataset (M-MiT), which includes over two million action labels for over one million three-second videos. This multi-label dataset introduces novel challenges on how to train and analyze models for multi-action detection. Here, we present baseline results for multi-action recognition using loss functions adapted for long-tailed multi-label learning, provide improved methods for visualizing and interpreting models trained for multi-label action detection, and show the strength of transferring models trained on M-MiT to smaller datasets.
Subjects
Algorithms, Learning
ABSTRACT
Generative adversarial networks (GANs) enable computers to learn complex data distributions and sample from these distributions. When applied to the visual domain, this allows artificial, yet photorealistic images to be synthesized. Their success at this very challenging task triggered an explosion of research within the field of artificial intelligence (AI), yielding various new GAN findings and applications. After explaining the core principles behind GANs and reviewing recent GAN innovations, we illustrate how they can be applied to tackle thorny theoretical and methodological problems in cognitive science. We focus on how GANs can reveal hidden structure in internal representations and how they offer a valuable new compromise in the trade-off between experimental control and ecological validity.
Subjects
Artificial Intelligence, Computer-Assisted Image Processing, Cognitive Science, Computers, Humans, Computer Neural Networks
ABSTRACT
We present the Moments in Time dataset, a large-scale human-annotated collection of one million short videos corresponding to dynamic events unfolding within three seconds. Modeling the spatial-audio-temporal dynamics even for actions occurring in three-second videos poses many challenges: meaningful events do not include only people, but also objects, animals, and natural phenomena; visual and auditory events can be symmetrical in time ("opening" is "closing" in reverse), and either transient or sustained. We describe the annotation process of our dataset (each video is tagged with one action or activity label among 339 different classes), analyze its scale and diversity in comparison to other large-scale video datasets for action recognition, and report results of several baseline models addressing separately, and jointly, three modalities: spatial, temporal, and auditory. The Moments in Time dataset, designed to have a large coverage and diversity of events in both visual and auditory modalities, can serve as a new challenge to develop models that scale to the level of complexity and abstract reasoning that a human processes on a daily basis.
Subjects
Factual Databases, Video Recording, Animals, Human Activities/classification, Humans, Computer-Assisted Image Processing, Automated Pattern Recognition
ABSTRACT
The bone morphogenetic protein (BMP) signaling pathway is essential for normal development and tissue homeostasis. BMP signal transduction occurs when ligands interact with a complex of type 1 and type 2 receptors to activate downstream transcription factors. It is well established that a single BMP receptor may bind multiple BMP ligands with varying affinity, and this has been largely attributed to conformation at the amino acid level. However, all three type 2 BMP receptors (BMPR2, ACVR2A/B) contain consensus N-glycosylation sites in their extracellular domains (ECDs), which could play a role in modulating interaction with ligand. Here, we show a differential pattern of N-glycosylation between BMPR2 and ACVR2A/B. Site-directed mutagenesis reveals that BMPR2 is uniquely glycosylated near its ligand binding domain and at a position that is mutated in patients with heritable pulmonary arterial hypertension. We further demonstrate, using a cell-free pulldown assay, that N-glycosylation of the BMPR2-ECD enhances its ability to bind BMP2 ligand but has no impact on binding by the closely related ACVR2B. Our results illuminate a novel aspect of BMP signaling pathway mechanics and demonstrate a functional difference resulting from post-translational modification of type 2 BMP receptors. Additionally, since BMPR2 is required for several aspects of normal development and defects in its function are strongly implicated in human disease, our findings are likely to be relevant in several biological contexts in normal and abnormal human physiology.