
Visual Localization for First-person Vision System


We describe an image-based localization method suited to a First-Person Vision system consisting of a single head-mounted video camera. Assuming the camera moves within a known environment, mapping and localization are carried out as two separate processes. In mapping, the 3D structure of the environment is reconstructed from uncalibrated images using a Structure-from-Motion technique, and the structure is then refined by a multi-view stereo algorithm. Each point in the reconstructed 3D point cloud is associated with its corresponding image features, yielding a set of triplets of 3D point, image feature, and viewing direction. Mapping is time-consuming and is performed in advance. Localization, by contrast, runs at interactive speed by exploiting the precomputed triplets. First, image features are extracted from an image captured by the head-mounted camera. Using these features as query keys, the corresponding points in the environment are retrieved, and the resulting matches are used to estimate the camera pose in the environment. We demonstrate that a large environment map can be built from more than 500 images, and that localization achieves centimeter accuracy at interactive speed. We believe our sensing system is simple and accurate, and therefore well suited to activity modeling and behavior understanding in everyday environments.
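The localization step (match query features against the precomputed map, then estimate the camera pose from the resulting 2D-3D correspondences) can be illustrated with a short sketch. This is not the authors' implementation. It assumes the map is stored as parallel NumPy arrays of 3D points and float descriptors (the viewing directions of the triplets are omitted), and that the camera intrinsics `K` are known. It swaps in brute-force nearest-neighbour matching with a ratio test and a linear (DLT) pose solver without RANSAC outlier rejection. The function names `match_features`, `estimate_pose_dlt`, and `localize` are hypothetical.

```python
import numpy as np

def match_features(query_desc, map_desc, ratio=0.8):
    """Brute-force nearest-neighbour matching with a ratio test.
    Returns (query_index, map_index) pairs; map_desc needs >= 2 rows."""
    matches = []
    for i, d in enumerate(query_desc):
        dists = np.linalg.norm(map_desc - d, axis=1)
        best, second = np.argsort(dists)[:2]
        # Keep the match only if it is clearly better than the runner-up
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches

def estimate_pose_dlt(points_3d, points_2d, K):
    """Linear (DLT) pose from >= 6 non-coplanar 2D-3D correspondences.
    Returns R (3x3) and t (3,) such that x ~ K (R X + t). No outlier handling."""
    # Convert pixel coordinates to normalized camera coordinates
    pts_h = np.hstack([points_2d, np.ones((len(points_2d), 1))])
    xn = (np.linalg.inv(K) @ pts_h.T).T
    rows = []
    for (X, Y, Z), (u, v, w) in zip(points_3d, xn):
        Xh = np.array([X, Y, Z, 1.0])
        # Each correspondence gives two linear equations in the 12 entries of P
        rows.append(np.concatenate([Xh, np.zeros(4), -(u / w) * Xh]))
        rows.append(np.concatenate([np.zeros(4), Xh, -(v / w) * Xh]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    P = Vt[-1].reshape(3, 4)  # P = s [R | t] for an unknown scale s
    # Project the left 3x3 block onto the nearest rotation and recover s
    U, S, Vt2 = np.linalg.svd(P[:, :3])
    R = U @ Vt2
    scale = S.mean()
    if np.linalg.det(R) < 0:  # s was negative: flip sign to get a proper rotation
        R = -R
        scale = -scale
    t = P[:, 3] / scale
    return R, t

def localize(query_keypoints, query_desc, map_points, map_desc, K):
    """Match query features to the map, then estimate the camera pose."""
    matches = match_features(query_desc, map_desc)
    if len(matches) < 6:
        raise ValueError("need at least 6 matches for linear pose estimation")
    qi = [q for q, _ in matches]
    mi = [m for _, m in matches]
    return estimate_pose_dlt(map_points[mi], query_keypoints[qi], K)
```

With noise-free synthetic correspondences the linear solver recovers the pose exactly. With real feature matches, a robust estimator such as RANSAC around a minimal solver, followed by nonlinear refinement, would normally be needed to reach the accuracy reported above.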

Publications

Shuntaro Yamazaki, Masaaki Mochimaru, and Takeo Kanade,
"Visual Localization for First-person Vision System" (in Japanese),
Proc. Pattern Recognition and Media Understanding, pp. 73-78,
May 2010.

Supplementary Material

CVHRRE 2009 Video

This video compiles the main results of this project: building 3D maps and performing visual localization with a single camera mounted on a mobile robot platform. It is also available on YouTube.