Diffusion Policy
Diffusion Policy presents an innovative method for generating robot actions by conceptualizing visuomotor policy learning as a conditional denoising diffusion process. During inference, it uses the gradient of the action distribution's score function and applies iterative stochastic Langevin dynamics, which lets it handle complex, multimodal, and high-dimensional action spaces robustly while keeping training stable. Key features, including receding-horizon control, visual conditioning, and a time-series diffusion transformer, further enhance the effectiveness of this approach for real-world visuomotor policy learning.
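To make the inference loop concrete, below is a minimal sketch of conditional denoising-based action sampling, loosely following the Hugging Face diffusers DDPMScheduler API; the network, step count, horizon, and action dimension are illustrative assumptions, not this pipeline's exact code.

import torch
from diffusers import DDPMScheduler

# Reverse-diffusion scheduler; 100 denoising steps is an illustrative choice.
scheduler = DDPMScheduler(num_train_timesteps=100)
scheduler.set_timesteps(100)

def sample_actions(noise_pred_net, obs_cond, horizon=16, action_dim=2):
    """Sample an action sequence by iteratively denoising Gaussian noise,
    conditioned on an observation embedding obs_cond."""
    actions = torch.randn(1, horizon, action_dim)  # start from pure noise
    for t in scheduler.timesteps:
        noise_pred = noise_pred_net(actions, t, obs_cond)  # predicted noise
        actions = scheduler.step(noise_pred, t, actions).prev_sample  # one denoising update
    return actions  # the denoised action sequence

# Smoke test with a dummy network that predicts zero noise.
print(sample_actions(lambda a, t, c: torch.zeros_like(a), obs_cond=None).shape)  # (1, 16, 2)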
A diffusion policy pipeline is provided for evaluating the diffusion policy model on the Push-T task in simulation. The pipeline includes source code optimized with Intel® OpenVINO™ for improved performance and supports both Transformer-based and CNN-based diffusion policies for inference on the Push-T task.
In this tutorial, we will introduce how to set up the Diffusion Policy simulation pipeline.
Simulation Task
Push-T
The Push-T task is a manipulation task where the robot must push a T-shaped block to a target location. The block is gray, the target is green, and the robot’s End-Effector is represented by a blue circular shape. The task includes variations in initial conditions for both the T block and the End-Effector, requiring the robot to exploit complex and contact-rich object dynamics to achieve precise positioning of the T block using point contacts.
There are two variants, both with proprioception for the End-Effector location (see the observation sketch after this list):
image: RGB image observations
low-dim: 9 2D keypoints obtained from the ground-truth pose of the T block
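As an illustration, the two observation layouts might look like the following; the keys and shapes are assumptions based on the original Diffusion Policy Push-T setup, not necessarily this pipeline's exact format.

import numpy as np

# Hypothetical image-variant observation: an RGB render plus proprioception.
obs_image = {
    "image": np.zeros((3, 96, 96), dtype=np.float32),  # RGB observation
    "agent_pos": np.zeros(2, dtype=np.float32),        # End-Effector (x, y)
}
# Hypothetical low-dim-variant observation: 9 keypoints (x, y) plus End-Effector (x, y).
obs_lowdim = np.concatenate([np.zeros(18, dtype=np.float32), np.zeros(2, dtype=np.float32)])
print(obs_image["image"].shape, obs_lowdim.shape)  # (3, 96, 96) (20,)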
An episode lasts at most 300 steps, and the reward is defined as the maximum overlap achieved between the T-shaped block and the target region during the pushing process.
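A minimal sketch of this scoring rule, assuming polygon geometry via shapely rather than the simulator's actual implementation:

from shapely.geometry import Polygon

def coverage(block: Polygon, goal: Polygon) -> float:
    # Fraction of the goal region currently covered by the T block.
    return block.intersection(goal).area / goal.area

# Toy example: a unit-square goal half covered by the block.
goal = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
block = Polygon([(0.5, 0), (1.5, 0), (1.5, 1), (0.5, 1)])
print(coverage(block, goal))  # 0.5
# The episode reward is the maximum coverage reached over the (up to 300) steps.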

Prerequisites
Please make sure you have finished the setup steps in Installation & Setup.
Installation
Install Diffusion Policy package
The Embodied Intelligence SDK provides source code optimized for Intel® OpenVINO™. Get the source code, which is installed under /opt/diffusion-policy-ov/, with the following commands:
$ sudo apt install diffusion-policy-ov
$ sudo chown -R $USER /opt/diffusion-policy-ov/
After installing the diffusion-policy-ov package, follow the README.md file in /opt/diffusion-policy-ov/ to set up the complete source code environment.
Virtual environment setup
Download and install Miniforge as follows if you don’t have conda installed on your machine:
$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
$ bash Miniforge3-Linux-x86_64.sh
$ source ~/.bashrc
You can use conda --version to verify your conda installation.
After installation, create a new Python environment named robodiff:
$ cd <diffusion-policy_SOURCE_CODE_PATH>
$ mamba env create -f conda_environment.yaml
Then activate the robodiff Python environment in your current terminal:
$ conda activate robodiff
Install Intel® OpenVINO™
Install Intel® OpenVINO™ with the following command:
$ pip install huggingface_hub==0.24.7 openvino==2024.6
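You can verify the installation and see which devices OpenVINO can target with a quick check:
$ python -c "import openvino as ov; print(ov.__version__); print(ov.Core().available_devices)"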
Run pipeline
Inference
Refer to <diffusion-policy_SOURCE_CODE_PATH>/ov_convert/README.md for instructions on downloading the pre-trained checkpoints; four checkpoints are available for the Push-T task.
Note
For detailed instructions on the model conversion process, please refer to the model tutorial at Diffusion Policy.
Item | Pre-trained checkpoint Name | Low-dim or image | Policy | Parameters
---|---|---|---|---
1 | | low-dim | diffusion policy transformer | 8.96M
2 | | low-dim | diffusion policy CNN | 65.25M
3 | | image | diffusion policy transformer | 20.18M
4 | | image | diffusion policy CNN | 262.71M
Refer to <diffusion-policy_SOURCE_CODE_PATH>/ov_convert/README.md for instructions on converting the model checkpoints to OpenVINO IR format.
Attention
You need to set --output_dir so that the converted models are saved to the ~/ov_models/pushT/ directory.
The expected result of this step is the following files in the ~/ov_models/pushT/ directory:
$ ls ~/ov_models/pushT/ -l
-rw-rw-r-- ... image_c884_obs_encoder_onepass.bin
-rw-rw-r-- ... image_c884_obs_encoder_onepass.xml
-rw-rw-r-- ... image_c884_unet_onepass.bin
-rw-rw-r-- ... image_c884_unet_onepass.xml
-rw-rw-r-- ... image_t748_obs_encoder_onepass.bin
-rw-rw-r-- ... image_t748_obs_encoder_onepass.xml
-rw-rw-r-- ... image_t748_unet_onepass.bin
-rw-rw-r-- ... image_t748_unet_onepass.xml
-rw-rw-r-- ... lowdim_c969_unet.bin
-rw-rw-r-- ... lowdim_c969_unet.xml
-rw-rw-r-- ... lowdim_t967_unet.bin
-rw-rw-r-- ... lowdim_t967_unet.xml
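Each .xml/.bin pair is one OpenVINO IR model (network topology plus weights). eval.py loads these automatically, but as a hedged illustration, one of the converted models can be opened directly with the OpenVINO runtime:

import os
import openvino as ov

core = ov.Core()
# read_model() finds the matching .bin weights file automatically.
model = core.read_model(os.path.expanduser("~/ov_models/pushT/lowdim_t967_unet.xml"))
compiled = core.compile_model(model, "CPU")
print(compiled.inputs, compiled.outputs)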
You can run the inference with the following command:
Attention
You need to set --checkpoint to select the pre-trained checkpoint, because it contains the policy model configuration.
You need to set --output_dir to specify where the inference results are saved.
You can set --seed to control the randomness of the inference; the default value is 4300000.
For converted OpenVINO IR models, you don’t need to set the model path, since the default load directory is ~/ov_models/pushT/.
$ conda activate robodiff
$ cd <diffusion-policy_SOURCE_CODE_PATH>
$ python eval.py --checkpoint <Pre-Trained_ckpt_PATH> --output_dir <output_dir>
The inference results will be saved in the <output_dir> directory, which contains the following files:
$ ls <output_dir> -l
-rw-rw-r-- ... eval_log.json
drwxrwxr-x ... media
The eval_log.json file contains the evaluation results, and the media directory contains the video of the inference process.
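To inspect the metrics programmatically, you can load the log with standard Python; the exact key names inside eval_log.json depend on the pipeline, so this sketch simply dumps everything:

import json

with open("<output_dir>/eval_log.json") as f:  # substitute your --output_dir
    log = json.load(f)
for key, value in log.items():
    print(key, value)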