Search | VHL Regional Portal

The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines.

Damen, Dima; Doughty, Hazel; Farinella, Giovanni Maria; Fidler, Sanja; Furnari, Antonino; Kazakos, Evangelos; Moltisanti, Davide; Munro, Jonathan; Perrett, Toby; Price, Will; Wray, Michael.

IEEE Trans Pattern Anal Mach Intell ; 43(11): 4125-4141, 2021 Nov.

Article in English | MEDLINE | ID: mdl-32365017

ABSTRACT

Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest egocentric video benchmark, offering a unique viewpoint on people's interaction with objects, their attention, and even intention. In this paper, we detail how this large-scale dataset was captured by 32 participants in their native kitchen environments, and densely annotated with actions and object interactions. Our videos depict nonscripted daily activities, as recording was started every time a participant entered their kitchen. Recording took place in four countries by participants belonging to ten different nationalities, resulting in highly diverse kitchen habits and cooking styles. Our dataset features 55 hours of video consisting of 11.5M frames, which we densely labelled for a total of 39.6K action segments and 454.2K object bounding boxes. Our annotation is unique in that we had the participants narrate their own videos (after recording), thus reflecting true intention, and we crowd-sourced ground-truths based on these. We describe our object, action and anticipation challenges, and evaluate several baselines over two test splits, seen and unseen kitchens. We introduce new baselines that highlight the multimodal nature of the dataset and the importance of explicit temporal modelling to discriminate fine-grained actions (e.g., 'closing a tap' from 'opening' it).

Subject(s)

Algorithms , Cooking , Attention , Humans

Identifying Clinical Terms in Medical Text Using Ontology-Guided Machine Learning.

Arbabi, Aryan; Adams, David R; Fidler, Sanja; Brudno, Michael.

JMIR Med Inform ; 7(2): e12596, 2019 May 10.

Article in English | MEDLINE | ID: mdl-31094361

ABSTRACT

BACKGROUND: Automatic recognition of medical concepts in unstructured text is an important component of many clinical and research applications, and its accuracy has a large impact on electronic health record analysis. The mining of medical concepts is complicated by the broad use of synonyms and nonstandard terms in medical documents. OBJECTIVE: We present a machine learning model for concept recognition in large unstructured text, which optimizes the use of ontological structures and can identify previously unobserved synonyms for concepts in the ontology. METHODS: We present a neural dictionary model that can be used to predict whether a phrase is synonymous with a concept in a reference ontology. Our model, called the Neural Concept Recognizer (NCR), uses a convolutional neural network to encode input phrases into an embedding space and then rank medical concepts based on their similarity in that space. It uses the hierarchical structure provided by the biomedical ontology as an implicit prior embedding to better learn embeddings of various terms. We trained our model on two biomedical ontologies: the Human Phenotype Ontology (HPO) and Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT). RESULTS: We tested our model trained on HPO by using two different data sets: 288 annotated PubMed abstracts and 39 clinical reports. We achieved 1.7%-3% higher F1-scores than those for our strongest manually engineered rule-based baselines (P=.003). We also tested our model trained on the SNOMED-CT by using 2000 Intensive Care Unit discharge summaries from MIMIC (Multiparameter Intelligent Monitoring in Intensive Care) and achieved 0.9%-1.3% higher F1-scores than those of our baseline. The results of our experiments show high accuracy of our model as well as the value of using the taxonomy structure of the ontology in concept recognition. CONCLUSION: Most popular medical concept recognizers rely on rule-based models, which cannot generalize well to unseen synonyms. In addition, most machine learning methods typically require large corpora of annotated text that cover all classes of concepts, which can be extremely difficult to obtain for biomedical ontologies. Without relying on large-scale labeled training data or requiring any custom training, our model can be efficiently generalized to new synonyms and performs as well as or better than state-of-the-art methods custom built for specific ontologies.
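
The ranking step mentioned under METHODS can be illustrated with a minimal sketch, assuming phrases and concepts have already been mapped to vectors in a shared space. The vectors, the concept labels, and the `rank_concepts` helper are hypothetical and hand-picked for illustration, and cosine similarity is an assumed similarity measure; the sketch does not reproduce the NCR encoder, its training, or its use of the ontology hierarchy.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_concepts(phrase_vec, concept_vecs):
    """Return concept labels sorted from most to least similar to the phrase."""
    scored = [(cosine(phrase_vec, vec), label) for label, vec in concept_vecs.items()]
    scored.sort(reverse=True)
    return [label for _, label in scored]


# Hypothetical 3-dimensional embeddings, chosen by hand for illustration only.
concept_vecs = {
    "seizure": [0.9, 0.1, 0.0],
    "fever": [0.1, 0.9, 0.1],
    "headache": [0.0, 0.2, 0.9],
}
# Stand-in for the encoded form of an unseen synonym such as "convulsions".
phrase_vec = [0.8, 0.2, 0.1]

ranking = rank_concepts(phrase_vec, concept_vecs)
print(ranking)
```

With these made-up vectors the phrase lies closest to the "seizure" concept, which is how a synonym absent from the ontology could still be matched to the right concept.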

3D Object Proposals Using Stereo Imagery for Accurate Object Class Detection.

Chen, Xiaozhi; Kundu, Kaustav; Zhu, Yukun; Ma, Huimin; Fidler, Sanja; Urtasun, Raquel.

IEEE Trans Pattern Anal Mach Intell ; 40(5): 1259-1272, 2018 May.

Article in English | MEDLINE | ID: mdl-28541196

ABSTRACT

The goal of this paper is to perform 3D object detection in the context of autonomous driving. Our method aims at generating a set of high-quality 3D object proposals by exploiting stereo imagery. We formulate the problem as minimizing an energy function that encodes object size priors, placement of objects on the ground plane, as well as several depth-informed features that reason about free space, point cloud densities and distance to the ground. We then exploit a CNN on top of these proposals to perform object detection. In particular, we employ a convolutional neural net (CNN) that exploits context and depth information to jointly regress to 3D bounding box coordinates and object pose. Our experiments show significant performance gains over existing RGB and RGB-D object proposal methods on the challenging KITTI benchmark. When combined with the CNN, our approach outperforms all existing results in object detection and orientation estimation tasks for all three KITTI object classes. Furthermore, we also experiment with the setting where LIDAR information is available, and show that using both LIDAR and stereo leads to the best result.

Human-Machine CRFs for Identifying Bottlenecks in Scene Understanding.

Mottaghi, Roozbeh; Fidler, Sanja; Yuille, Alan; Urtasun, Raquel; Parikh, Devi.

IEEE Trans Pattern Anal Mach Intell ; 38(1): 74-87, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26656579

ABSTRACT

Recent trends in image understanding have pushed for scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance-based classifiers. In this work, we are interested in understanding the roles of these different tasks in improved scene understanding, in particular semantic segmentation, object detection and scene recognition. Towards this goal, we "plug in" human subjects for each of the various components in a conditional random field model. Comparisons among various hybrid human-machine CRFs give us indications of how much "head room" there is to improve scene understanding by focusing research efforts on various individual tasks.

Subject(s)

Artificial Intelligence/statistics & numerical data , Brain-Computer Interfaces/statistics & numerical data , Algorithms , Computer Simulation , Databases, Factual , Humans , Pattern Recognition, Automated/statistics & numerical data , Pattern Recognition, Visual

Combining reconstructive and discriminative subspace methods for robust classification and regression by subsampling.

Fidler, Sanja; Skocaj, Danijel; Leonardis, Ales.

IEEE Trans Pattern Anal Mach Intell ; 28(3): 337-50, 2006 Mar.

Article in English | MEDLINE | ID: mdl-16526421

ABSTRACT

Linear subspace methods that provide sufficient reconstruction of the data, such as PCA, offer an efficient way of dealing with missing pixels, outliers, and occlusions that often appear in visual data. Discriminative methods, such as LDA, which are better suited for classification tasks, are on the other hand highly sensitive to corrupted data. We present a theoretical framework for achieving the best of both types of methods: an approach that combines the discrimination power of discriminative methods with the reconstruction property of reconstructive methods, which enables one to work on subsets of pixels in images to efficiently detect and reject the outliers. The proposed approach is therefore capable of robust classification with a high breakdown point. We also show that subspace methods, such as CCA, which are used for solving regression tasks, can be treated in a similar manner. The theoretical results are demonstrated on several computer vision tasks, showing that the proposed approach significantly outperforms the standard discriminative methods in the case of missing pixels and images containing occlusions and outliers.
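
The idea of working on a subset of pixels can be illustrated with a minimal sketch, assuming a one-dimensional linear subspace (a single basis vector) and a least-squares fit of its coefficient. The helper names, the toy data, and the hand-chosen pixel subset and threshold are hypothetical; the abstract does not specify how subsets are sampled or selected, and this is not the authors' algorithm.

```python
def estimate_coefficient(basis, image, pixel_subset):
    """Least-squares coefficient a minimizing the sum over the subset of
    (image[i] - a * basis[i]) ** 2."""
    num = sum(basis[i] * image[i] for i in pixel_subset)
    den = sum(basis[i] ** 2 for i in pixel_subset)
    return num / den


def flag_outliers(basis, image, coeff, threshold):
    """Indices of pixels whose reconstruction residual exceeds the threshold."""
    return [
        i for i in range(len(image))
        if abs(image[i] - coeff * basis[i]) > threshold
    ]


# Toy "image" lying exactly in the subspace spanned by `basis`.
basis = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
true_coeff = 2.0
image = [true_coeff * b for b in basis]
image[4] = 100.0  # simulated occlusion corrupting one pixel

# Using every pixel lets the corrupted one bias the estimate.
coeff_all = estimate_coefficient(basis, image, range(len(image)))
# A subset that happens to avoid the corrupted pixel (chosen by hand here).
coeff_subset = estimate_coefficient(basis, image, [0, 1, 2, 3])
outliers = flag_outliers(basis, image, coeff_subset, threshold=1.0)

print(coeff_all, coeff_subset, outliers)
```

In this toy case the fit on all pixels is pulled far from the true coefficient, while the fit on the clean subset recovers it and the residuals then expose the corrupted pixel.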

Subject(s)

Algorithms , Artificial Intelligence , Face/anatomy & histology , Image Interpretation, Computer-Assisted/methods , Models, Biological , Pattern Recognition, Automated/methods , Signal Processing, Computer-Assisted , Cluster Analysis , Computer Simulation , Discriminant Analysis , Humans , Image Enhancement/methods , Information Storage and Retrieval/methods , Models, Statistical , Principal Component Analysis , Regression Analysis , Reproducibility of Results , Sample Size , Sensitivity and Specificity
