Match Policy: A Simple Pipeline from Point Cloud Registration to Manipulation Policies

1Northeastern University, Boston, MA 02115;
2Worcester Polytechnic Institute; $\ast$equal advising

Introduction to Match Policy: A simple imitation pipeline without training.

Abstract

Many manipulation tasks require the robot to rearrange objects relative to one another. Such tasks can be described as a sequence of relative poses between parts of a set of rigid bodies. In this work, we propose $\text{Match Policy}$, a simple but novel pipeline for solving high-precision pick-and-place tasks. Instead of predicting actions directly, our method registers the pick and place targets to the stored demonstrations. This turns action inference into a point cloud registration task and enables us to realize nontrivial manipulation policies without any training. $\text{Match Policy}$ is designed to solve high-precision tasks in a key-frame setting. By leveraging the geometric interaction and the symmetries of the task, it achieves extremely high sample efficiency and generalizability to unseen configurations. We demonstrate its state-of-the-art performance across various tasks on the RLBench benchmark compared with several strong baselines, and test it on a real robot with six tasks.

Introduction

A Simple Pipeline without Training. To provide a convenient tool for robotic pick-and-place that requires minimal effort to deploy across different tasks, we propose $\text{Match Policy}$, a simple pipeline that transfers manipulation policy learning into point cloud registration (PCR). $\text{Match Policy}$ constructs a combined point cloud of the desired scene from segmented point clouds, with objects arranged in the expected configuration. As illustrated in Figure 1, we store a collection of combined point clouds from the demonstration data. During inference, the point clouds of the pick and place objects are registered to these stored point clouds, and the resulting registration poses are used to compute the action. Unlike prior works that require heavy training, we realize this pipeline with an optimization-based method: $\text{Match Policy}$ makes use of RANSAC and ICP and produces a pick-place policy immediately after demonstration collection.
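For concreteness, the registration step can be built from off-the-shelf tools. The following is a minimal sketch using Open3D's feature-based RANSAC for coarse alignment followed by ICP refinement; the function name `register`, the voxel size, and the distance thresholds are illustrative assumptions, not the paper's exact settings.

    import open3d as o3d

    def register(source, target, voxel=0.005):
        # Returns a 4x4 pose that maps `source` into `target`'s frame.
        src = source.voxel_down_sample(voxel)
        tgt = target.voxel_down_sample(voxel)
        feat_param = o3d.geometry.KDTreeSearchParamHybrid(radius=5 * voxel, max_nn=100)
        feats = []
        for pc in (src, tgt):
            # Normals are needed for FPFH features and point-to-plane ICP.
            pc.estimate_normals(
                o3d.geometry.KDTreeSearchParamHybrid(radius=2 * voxel, max_nn=30))
            feats.append(o3d.pipelines.registration.compute_fpfh_feature(pc, feat_param))
        # Coarse alignment: RANSAC over FPFH feature correspondences.
        coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
            src, tgt, feats[0], feats[1], mutual_filter=True,
            max_correspondence_distance=1.5 * voxel,
            estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
            ransac_n=3,
            checkers=[o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(1.5 * voxel)],
            criteria=o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
        # Refinement: point-to-plane ICP seeded with the coarse pose.
        fine = o3d.pipelines.registration.registration_icp(
            src, tgt, voxel, coarse.transformation,
            o3d.pipelines.registration.TransformationEstimationPointToPlane())
        return fine.transformation

During inference, calling such a routine once per object against the stored combined point cloud yields the registration poses $\hat{T}_a$ and $\hat{T}_b$.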

Our proposed method has several key advantages. First, the PCR step matches the local geometric details shown in the demonstration to the new observation, enabling the agent to solve high-precision tasks. Second, $\text{Match Policy}$ exhibits great sample efficiency, i.e., the ability to learn good policies with relatively few expert demonstrations. Across various experiments, we demonstrate that it achieves compelling performance with only one demonstration and generalizes to many different novel poses. Finally, $\text{Match Policy}$ shows high adaptability when tested with different camera settings, e.g., a single camera view and low-resolution cameras, as well as on tasks with long horizons and articulated objects.

Method Overview

$\text{Match Policy}$ takes the segmented point clouds as input and outputs the key-frame actions $(\text{pick}, \text{pre-place}, \text{place})$. It has three steps:

  1. Storing Combined Point Clouds $P_{ab}$. We first construct the combined point cloud $P_{ab}$ from the demonstration data. $P_{ab}$ represents either the desired pick configuration or the desired pre-place/place configuration, as shown in Figure 1.
  2. Registering $\hat{P}_a$ and $\hat{P}_b$ to $P_{ab}$. We denote $\hat{P}_a$ and $\hat{P}_b$ as the observed point clouds during inference, to distinguish them from the demonstrated point clouds. Our registration model $f_{r}\colon (\hat{P}_a, \hat{P}_b, P_{ab}) \mapsto (\hat{T}_a, \hat{T}_b)$ outputs the poses that match $\hat{P}_a$ and $\hat{P}_b$ to the combined point cloud $P_{ab}$.
  3. Calculating $(\text{pick}, \text{pre-place}, \text{place})$. We calculate the pick action as the relative pose that arranges the gripper at the current pick target. The pre-place and place actions are determined by moving the pick target, while keeping the placement target stationary, to match the desired configuration; a sketch of this pose arithmetic follows the list below.
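To make step 3 concrete, here is a minimal numpy sketch of the pose arithmetic, under the assumption that each registration output $\hat{T}$ maps an observed cloud into the stored combined-cloud frame; `T_grasp` (the demonstrated gripper pose expressed in that frame) and the function name are illustrative, not the paper's exact interface.

    import numpy as np

    def key_frame_actions(T_grasp, reg_pick, reg_pre_place, reg_place):
        # T_grasp: demonstrated 4x4 gripper pose in the combined-cloud frame
        # (assumed to be stored alongside the demonstration).
        # reg_*: (T_a, T_b) pairs mapping the observed pick / placement clouds
        # into the corresponding stored combined cloud.
        T_a_pick, _ = reg_pick
        # Pick: map the demonstrated grasp back into the current scene.
        pick = np.linalg.inv(T_a_pick) @ T_grasp

        def delta(T_a, T_b):
            # Object-level motion that recreates the demonstrated configuration
            # while the placement target stays stationary: into the demo frame
            # via T_a, back to the world via the placement registration inv(T_b).
            return np.linalg.inv(T_b) @ T_a

        # The gripper holds the pick target rigidly after grasping, so the
        # gripper target moves by the same object-level delta.
        pre_place = delta(*reg_pre_place) @ pick
        place = delta(*reg_place) @ pick
        return pick, pre_place, place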

Experiment

Our method is tested on 8 simulated tasks, including high-precision tasks, long-horizon tasks, and articulated-object manipulation. We also validate the method on a real robot with 6 different tasks. See our paper for the detailed results.

Citation


      @misc{huang2024matchpolicysimplepipeline,
        title={MATCH POLICY: A Simple Pipeline from Point Cloud Registration to Manipulation Policies},
        author={Haojie Huang and Haotian Liu and Dian Wang and Robin Walters and Robert Platt},
        year={2024},
        eprint={2409.15517},
        archivePrefix={arXiv},
        primaryClass={cs.RO},
        url={https://arxiv.org/abs/2409.15517},
      }