6D object pose estimation, which recovers both the 3D position and the 3D orientation of an object, is a key problem for autonomous systems that use visual sensors to locate objects in their environment in order to grasp and manipulate them. Think, for example, of a home assistant robot that can load a dishwasher on its own, or a manufacturing robot manipulating objects on a production line.
The CosyPose approach estimates the 6D pose of multiple known objects in a scene captured by one or more input images with unknown camera viewpoints. Its main innovation is to combine deep neural networks, trained on a mix of synthetic and real images, with multi-view geometric constraints.
CosyPose: 6D object pose estimation optimizing multi-view COnSistencY. Given a set of RGB images depicting a scene with known objects taken from unknown viewpoints (top), our method accurately reconstructs the scene (bottom), recovering all objects in the scene, their 6D poses and the camera viewpoints.
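To make the idea of multi-view consistency concrete, here is a minimal sketch in Python. It is not the CosyPose implementation: it assumes the camera poses are already known (CosyPose recovers them), restricts rotations to the vertical axis for brevity, and every name and number in it is hypothetical. It represents each 6D pose as a 4x4 rigid transform and checks whether two per-view estimates of the same object agree once both are expressed in a common world frame.

```python
import math

def pose(yaw_deg, t):
    """4x4 rigid transform: rotation about the z axis by yaw_deg, plus translation t."""
    c, s = math.cos(math.radians(yaw_deg)), math.sin(math.radians(yaw_deg))
    return [[c, -s, 0.0, t[0]],
            [s, c, 0.0, t[1]],
            [0.0, 0.0, 1.0, t[2]],
            [0.0, 0.0, 0.0, 1.0]]

def compose(a, b):
    """Matrix product a @ b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert(T):
    """Inverse of a rigid transform: rotation R^T, translation -R^T t."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(R_t[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [R_t[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def translation_gap(a, b):
    """Euclidean distance between the translation parts of two transforms."""
    return math.dist([row[3] for row in a[:3]], [row[3] for row in b[:3]])

# Hypothetical scene: one object and two cameras, all expressed in a world frame.
world_T_obj = pose(30, (0.5, 0.2, 0.0))
world_T_cam1 = pose(0, (0.0, 0.0, 1.0))
world_T_cam2 = pose(90, (1.0, 0.0, 1.0))

# Per-view object poses, as a single-view estimator would output them
# (object pose in each camera frame). View 2 gets a 2 cm error along x.
cam1_T_obj = compose(invert(world_T_cam1), world_T_obj)
cam2_T_obj = compose(invert(world_T_cam2), world_T_obj)
cam2_T_obj[0][3] += 0.02

# Map both estimates back to the world frame and compare them.
obj_from_view1 = compose(world_T_cam1, cam1_T_obj)
obj_from_view2 = compose(world_T_cam2, cam2_T_obj)
gap = translation_gap(obj_from_view1, obj_from_view2)
print(f"disagreement between views: {gap:.3f} m")
```

A gap near zero means the two views describe the same object placement. A multi-view method can treat such disagreements as errors to minimize jointly over object and camera poses.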
The award. The proposed approach achieves state-of-the-art results on multiple benchmarks, doubling the performance of existing methods on the most complex datasets, which were beyond the capabilities of previous systems. The method won 5 awards, including “The Overall Best Method”, in the 6D object pose estimation challenge at ECCV 2020, outperforming competitors by a significant margin in multiple categories. Using only inexpensive RGB sensors, it matches the performance of methods that rely on additional information from costly depth sensors, which are also sensitive to illumination. The results of the challenge are available as slides and as an overview paper.
Support from the Jean Zay computing cluster. Training the 6D object pose estimation models requires significant compute on multiple GPUs. Developing the method and winning the challenge would not have been possible without the generous support from the French National Jean Zay computing cluster.
ECCV 2020. The 6D object pose estimation challenge was held in conjunction with the European Conference on Computer Vision (ECCV) 2020. ECCV is one of the top three computer vision conferences (together with CVPR and ICCV) and is listed by Google Scholar among the 100 most cited journals and conferences across all areas of science.