Visual Localization for First-person Vision System
We describe a method of image-based localization suitable for a First-Person Vision system composed of a single head-mounted video camera. Assuming the camera moves within a known environment, mapping and localization are conducted as two separate processes. In mapping, the 3D structure of the environment is reconstructed from uncalibrated images using a Structure-from-Motion technique. The structure is then refined by a multi-view stereo algorithm. Each point in the reconstructed 3D point cloud is associated with its corresponding image features, yielding a set of triplets, each consisting of a 3D point, an image feature, and a viewing direction. The mapping process is time-consuming and is conducted in advance. Localization, on the other hand, runs at an interactive speed by taking advantage of the precomputed triplets. First, image features are extracted from an image acquired by the head-mounted camera. The corresponding points in the environment are then retrieved using the features as query keys, and the resulting matches are used to estimate the camera pose in the environment. We demonstrate that a large environment map can be created from more than 500 images, and that localization achieves centimeter accuracy at an interactive speed. We believe our sensing system is simple and accurate, and therefore suitable for activity modeling and behavior understanding in everyday environments.
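The localization step described above (look up map points by feature, then estimate the camera pose from the 2D-3D matches) can be illustrated with a minimal sketch. The abstract does not state which matcher or pose estimator is used, so this example substitutes two standard techniques as assumptions: nearest-neighbor descriptor matching with a ratio test, and the Direct Linear Transform (DLT) for camera resection, which needs no prior calibration. The function names `match_features`, `estimate_projection_dlt`, and `camera_center` are hypothetical, and the viewing direction stored in each triplet is not used here.

```python
import numpy as np

def match_features(query_desc, map_desc, ratio=0.8):
    """Nearest-neighbour matching with a ratio test (an assumed matcher,
    not necessarily the one used in the paper).
    Returns index arrays (query_idx, map_idx) of accepted matches."""
    # Pairwise Euclidean distances between query and map descriptors.
    d = np.linalg.norm(query_desc[:, None, :] - map_desc[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(query_desc))
    # Keep a match only if it is clearly better than the runner-up.
    keep = d[rows, best] < ratio * d[rows, second]
    return rows[keep], best[keep]

def estimate_projection_dlt(points_3d, points_2d):
    """Estimate the 3x4 projection matrix P from at least six 2D-3D
    correspondences using the Direct Linear Transform."""
    X = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    rows = []
    for Xi, (u, v) in zip(X, points_2d):
        # Each correspondence gives two linear constraints on P.
        rows.append(np.concatenate([Xi, np.zeros(4), -u * Xi]))
        rows.append(np.concatenate([np.zeros(4), Xi, -v * Xi]))
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    return vt[-1].reshape(3, 4)

def camera_center(P):
    """Camera position in map coordinates: the null vector of P, dehomogenized."""
    _, _, vt = np.linalg.svd(P)
    c = vt[-1]
    return c[:3] / c[3]
```

In use, the matched indices would select the 3D points from the precomputed triplets, and those points together with the query keypoint locations would be passed to `estimate_projection_dlt`. A practical system would likely wrap the pose step in a robust estimator such as RANSAC to reject wrong matches; that is omitted here for brevity.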
Publications
Shuntaro Yamazaki, Masaaki Mochimaru, and Takeo Kanade,
“Visual Localization for First-person Vision System”, (in Japanese)
Proc. Pattern Recognition and Media Understanding, pp.73-78,
May 2010.
[Paper]
[Movie]
[Presentation]
[BibTex]