Diffusion-PbD: Generalizable Robot Programming by Demonstration with Diffusion Features

University of Washington

Given just a single human demonstration, Diffusion-PbD can synthesize robot manipulation programs that adapt to unseen objects, viewpoints, and environments.

Abstract

Programming by Demonstration (PbD) is an intuitive technique for programming robot manipulation skills by demonstrating the desired behavior. However, most existing approaches either require extensive demonstrations or fail to generalize beyond their initial demonstration conditions. We introduce Diffusion-PbD, a novel approach to PbD that enables users to synthesize generalizable robot manipulation skills from a single demonstration by exploiting the representations captured by pre-trained visual foundation models. At demonstration time, hand and object detection priors are used to extract waypoints from the human demonstration, anchored to reference points in the scene. At execution time, features from pre-trained diffusion models are leveraged to identify corresponding reference points in new observations. We validate this approach through a series of real-world robot experiments, showing that Diffusion-PbD is applicable to a wide range of manipulation tasks and generalizes strongly to unseen objects, camera viewpoints, and scenes.
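
The abstract describes the execution-time step only at a high level: reference points from the demonstration are re-localized in a new observation by matching features from a pre-trained diffusion model. The sketch below illustrates one common way such feature correspondence can be done, cosine-similarity nearest-neighbor matching over per-pixel feature maps. It is a minimal illustration under stated assumptions, not the paper's implementation; how the per-pixel features are extracted from Stable Diffusion (layer, timestep, upsampling) is left abstract here and treated as given inputs.

```python
import numpy as np

def match_reference_points(demo_feats, new_feats, demo_points):
    """Transfer demonstration reference points to a new observation by
    nearest-neighbor matching of per-pixel features.

    demo_feats:  (H, W, D) array of per-pixel features for the demo frame
                 (assumed to come from a pre-trained diffusion model).
    new_feats:   (H, W, D) array of per-pixel features for the new observation.
    demo_points: list of (row, col) reference-point pixels in the demo frame.
    Returns a list of (row, col) pixels in the new observation.
    """
    H, W, D = new_feats.shape

    # Flatten and L2-normalize the new-image features so that a dot
    # product with a normalized query equals cosine similarity.
    flat = new_feats.reshape(-1, D)
    flat = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-8)

    matches = []
    for (r, c) in demo_points:
        query = demo_feats[r, c]
        query = query / (np.linalg.norm(query) + 1e-8)
        sims = flat @ query               # cosine similarity at every pixel
        best = int(np.argmax(sims))       # index of the most similar pixel
        matches.append((best // W, best % W))
    return matches
```

In a full pipeline of this kind, the matched pixels would be lifted to 3D (e.g., via a depth image) and used to re-anchor the demonstration waypoints in the new scene; those steps are omitted here.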

Video

Generalization to Unseen Objects, Viewpoints, and Scenes

Wide Variety of Manipulation Skills

Contribution of Stable Diffusion Features