Repository providing the source code for the paper
DITTO: Demonstration Imitation by Trajectory Transformation
Nick Heppert, Max Argus, Tim Welschehold, Thomas Brox and Abhinav Valada
We recommend using conda.
# conda create -n DITTO python=3.8 # for a ROS-compatible environment
conda create -n DITTO python=3.10 # future proof
conda activate DITTO
Install torch to your liking, matching your CUDA version, e.g.
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
pip install -r requirements.txt
conda install -c conda-forge libstdcxx-ng
pip install -e .
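As an optional sanity check that torch was installed with working CUDA support:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"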
First, clone the fork of the repo
git clone [email protected]:SuperN1ck/flowcontrol.git
follow the instructions in the readme, i.e.
pip install -e .
and install RAFT as described here
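As a quick sanity check (assuming the fork exposes the usual top-level flowcontrol module), verify that the package imports:
python -c "import flowcontrol"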
Download all the files from here and extract them
tar -xzvf demonstration_data.tar.gz
at a location of your liking. As defined in config.py, by default DITTO assumes the demonstrations are located under the project root in demonstrations/.
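If you extracted the archive somewhere else, a symlink is an easy way to match this default layout (a sketch; it assumes the archive extracts into a demonstrations/ folder):
ln -s /path/to/extracted/demonstrations ./demonstrations  # run from the project root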
If correctly extracted, you should see a file structure like
demonstrations/
recordings_24_01_04/
coke_tray/
000/
cnos_redetected_masks/ # We used CNOS to re-detect all masks given the initial frame.
object_seg_000.png
...
confidence/ # Confidence arrays
00000.npy
...
depth/ # Depth arrays
00000.npy
...
hands23/ # Output of hands23, see their repo (link below)
left_hand/
right_hand/
00000.png
...
rgb/ # RGB arrays
00000.png
...
sam_refined/
# Refined masks using our procedure described in the appendix.
# Provides a container segmentation or object segmentation for the timestep in the name.
# Not for every timestep.
[container_seg_000.png]
[object_seg_000.png]
...
config.yaml # Zed2 Camera Configuration
images.np.npz # All images compressed into a single archive for faster loading
intrinsics.yaml # The intrinsics of the used camera
time_steps.yaml # See below for a thorough explanation
zed2.svo # Raw Zed2 recordings
001/
...
...
recordings_25_01_05/
...
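As a minimal sketch of how a single frame can be read from this layout (the episode path is just an example; this is not the project's own loading code):
import numpy as np
import yaml
from PIL import Image

episode = "demonstrations/recordings_24_01_04/coke_tray/000"  # example episode path
rgb = np.asarray(Image.open(f"{episode}/rgb/00000.png"))      # H x W x 3 color frame
depth = np.load(f"{episode}/depth/00000.npy")                 # per-pixel depth array
confidence = np.load(f"{episode}/confidence/00000.npy")       # per-pixel confidence array
with open(f"{episode}/intrinsics.yaml") as f:
    intrinsics = yaml.safe_load(f)                            # camera intrinsics as a dict
print(rgb.shape, depth.shape, confidence.shape)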
For an overview of the Hands23 format see their Github Page.
All scripts are powered by tyro, an awesome CLI wrapper; thus, you can simply append --help to get an overview of all parameters.
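For example:
python scripts/process_with_hands23.py --help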
The scripts have not been run in a while and could potentially be broken/out of sync. We already provide pre-processed masks in our data. If you have any issues and/or suggestions, we are happy about any issue/PR!
To extract labels use
python scripts/process_with_hands23.py
To then automatically annotate timesteps based on the hand state use
python scripts/auto_annotate_timesteps.py
We provide two methods to improve object and goal segmentation masks. First, we use CNOS to re-detect objects
python scripts/redetect_with_cnos.py
And second, we use SAM and spatial distances
python scripts/refine_with_sam.py
We found the latter to perform consistently better.
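Putting the steps together, a full preprocessing pass might look like this (a sketch using the default paths; append --help to each script to see its options):
python scripts/process_with_hands23.py          # extract hand/object labels
python scripts/auto_annotate_timesteps.py       # annotate timesteps from the hand state
python scripts/redetect_with_cnos.py            # optional: CNOS-based re-detection
python scripts/refine_with_sam.py               # recommended: SAM-based refinement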
We provide a notebook under notebooks/DITTO_example_usage.ipynb
that showcases how to use DITTO.
To run the same evaluations as in the paper use the following commands.
python scripts/eval/track_correspondences_within_episode.py
python scripts/eval/track_correspondences_between_episode.py
python scripts/eval/evaluate_relative_poses_between_trajectories.py
The previously mentioned scripts will produce result files. For investigating the results, we provide two notebooks under
notebooks/
eval/
correspondences.ipynb
poses.ipynb
We also provide a notebook to render rotating 3D scenes under notebooks/show_single_trajectory.ipynb.
For each demonstration there is a time_steps.yaml file. It contains manual adjustments where needed.
# Automatically extracted from scripts/auto_annotate_timesteps.py
- t_start: 13
- t_stop: 44
# Use the CNOS-based re-detection masks over the Hands23 ones (CNOS is very unlikely to be better due to occlusions)
- cnos_over_hands23: [17, ...]
# In case none of our methods produced a good segmentation mask, we will shift a timestep back/forward
- redirect_to_other_step: {21: 22, ...}
# Overwrites the start of the episode
- t_h23_object_manual: 11
# Overwrites the stop of the episode
- t_stop_manual: 35
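A rough sketch (not the project's own loader) of how such a file could be read and the manual overrides applied; the exact top-level structure (mapping vs. list of entries) is an assumption:
import yaml

with open("demonstrations/recordings_24_01_04/coke_tray/000/time_steps.yaml") as f:  # example episode
    data = yaml.safe_load(f)

# Normalize: the file may be a plain mapping or a list of single-key entries.
if isinstance(data, list):
    time_steps = {k: v for entry in data for k, v in entry.items()}
else:
    time_steps = data

# Manual annotations, if present, override the automatically extracted ones.
t_start = time_steps.get("t_h23_object_manual", time_steps.get("t_start"))
t_stop = time_steps.get("t_stop_manual", time_steps.get("t_stop"))
print(t_start, t_stop)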
If you find this repository useful, please consider citing the paper as follows ✍🏼:
@inproceedings{heppert2024ditto,
title={DITTO: Demonstration Imitation by Trajectory Transformation},
author={Nick Heppert and Max Argus and Tim Welschehold and Thomas Brox and Abhinav Valada},
booktitle={2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2024},
organization={IEEE}
}
Also check out our utility library casino.
Please note that, different from the paper, we split the tennisball_cup task
into two separate tasks.
Originally, for the numbers in the paper, all episodes were merged under one
tennisball_cup/
000/
001/
002/
003/
004/
In this repository, as defined in config/valid_episodes.yaml, we assume a split into
tennisball_cup_laying/
000/
001/
002/
tennisball_cup_upright/
003/
004/
Vilja Lott also developed an updated metric for calculating the relative pose error. We will release the results soon.
This work was partially funded by the Carl Zeiss Foundation with the ReScaLe project.