Sort by: Year Popularity Relevance

Learning monocular depth estimation infusing traditional stereo knowledge

, , ,  - 2019

Depth estimation from a single image represents a fascinating, yet challenging problem with countless applications. Recent works proved that this task could be learned without direct supervision from ground truth labels leveraging image synthesis on sequences or stereo pairs. Focusing on this second case, in this paper we leverage stereo matching...


Self-Supervised Learning of 3D Human Pose using Multi-view Geometry

, ,  - 2019

Training accurate 3D human pose estimators requires large amount of 3D ground-truth data which is costly to collect. Various weakly or self supervised pose estimation methods have been proposed due to lack of 3D data. Nevertheless, these methods, in addition to 2D ground-truth poses, require either additional supervision in various forms (e.g. un...


Passage Re-ranking with BERT

,  - 2019

Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference. In this paper, we describe a simple re-imple...


Scale-Adaptive Neural Dense Features: Learning via Hierarchical Context Aggregation

, ,  - 2019

How do computers and intelligent agents view the world around them? Feature extraction and representation constitutes one the basic building blocks towards answering this question. Traditionally, this has been done with carefully engineered hand-crafted techniques such as HOG, SIFT or ORB. However, there is no ``one size fits all'' approach that ...


A Hybrid SLAM and Object Recognition System for Pepper Robot

, ,  - 2019

Humanoid robots are playing increasingly important roles in real-life tasks especially when it comes to indoor applications. Providing robust solutions for the tasks such as indoor environment mapping, self-localisation and object recognition are essential to make the robots to be more autonomous, hence, more human-like. The well-known Aldebaran ...


A Two-Stream Siamese Neural Network for Vehicle Re-Identification by Using Non-Overlapping Cameras

, ,  - 2019

We describe in this paper a Two-Stream Siamese Neural Network for vehicle re-identification. The proposed network is fed simultaneously with small coarse patches of the vehicle shape's, with 96 x 96 pixels, in one stream, and fine features extracted from license plate patches, easily readable by humans, with 96 x 48 pixels, in the other one. Then...


75 Languages, 1 Model: Parsing Universal Dependencies Universally

 - 2019

We present UDify, a multilingual multi-task model capable of accurately predicting universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for all 124 Universal Dependencies treebanks across 75 languages. By leveraging a multilingual BERT self-attention model pretrained on 104 languages, we found that fine-tu...


Bidirectional Learning for Domain Adaptation of Semantic Segmentation

, ,  - 2019

Domain adaptation for semantic image segmentation is very necessary since manually labeling large datasets with pixel-level labels is expensive and time consuming. Existing domain adaptation techniques either work on limited datasets, or yield not so good performance compared with supervised learning. In this paper, we propose a novel bidirection...


Exploring the Limitations of Behavior Cloning for Autonomous Driving

, , ,  - 2019

Driving requires reacting to a wide variety of complex environment conditions and agent behaviors. Explicitly modeling each possible scenario is unrealistic. In contrast, imitation learning can, in theory, leverage data from large fleets of human-driven cars. Behavior cloning in particular has been successfully used to learn simple visuomotor pol...


Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

, , , , ,  - 2019

We address the problem of video representation learning without human-annotated labels. While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a frame-by-frame basis, which are not applicable to many video analytic tasks where spatio-temporal features are prevailing...