Direct Dynamic Retargeting for Humanoid Imitation Learning from Videos

Published: arXiv: 2605.23762v1

Authors

Constant Roux, Ludovic De Matteïs, Armand Jordana, Valentin Guillet, Nicolas Mansard, Olivier Stasse, Philippe Souères

Abstract

Imitation Learning from monocular video demonstrations provides a scalable approach for teaching complex skills to humanoid robots. However, translating human motion to humanoids requires overcoming significant morphological mismatches. Standard approaches rely on Geometric Retargeting or Indirect Dynamic Retargeting pipelines. We identify that these intermediate kinematic projections introduce a geometric bias, restricting the search space and yielding suboptimal dynamic behaviors. In this paper, we propose Direct Dynamic Retargeting (DDR), a novel single-stage framework that generates high-fidelity, dynamically feasible trajectories directly from expert videos. By formulating the problem in the task space and leveraging a sampling-based Model Predictive Control solver within a physics simulator, DDR natively optimizes over complex contact sequences while mitigating input drift. Our experiments demonstrate that bypassing the geometric bias allows DDR to outperform state-of-the-art baselines in demonstration tracking accuracy. Furthermore, we establish that providing such physically viable references to RL agents accelerates training convergence and enhances the final execution of agile and balancing behaviors. Source code will be made publicly available.

Paper Summary

Problem

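Monocular video is a scalable source of demonstrations for teaching complex skills to humanoid robots through Imitation Learning (IL), but human and humanoid bodies differ in morphology, so human motion cannot be copied onto the robot directly. Existing pipelines bridge this gap with Geometric Retargeting or Indirect Dynamic Retargeting. The paper argues that the intermediate kinematic projection in these pipelines introduces a geometric bias, which restricts the search space and yields suboptimal dynamic behaviors.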

Key Innovation

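The paper introduces Direct Dynamic Retargeting (DDR), a single-stage framework that generates high-fidelity, dynamically feasible trajectories directly from expert videos. DDR formulates the problem in the task space and solves it with a sampling-based Model Predictive Control (MPC) solver running inside a physics simulator, which lets it optimize over complex contact sequences while mitigating input drift. Because it skips the intermediate kinematic projection, DDR avoids the geometric bias of earlier pipelines, and the authors report that it outperforms state-of-the-art baselines in demonstration tracking accuracy.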
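
To make the idea of sampling-based MPC inside a simulator more concrete, here is a minimal sketch of one simple variant (predictive sampling: perturb the current plan, roll each candidate out, keep the cheapest). It is not the authors' solver, whose details are not given in this summary. A 1D point mass stands in for the physics simulator, a sine wave stands in for a video-derived task-space reference, and every name and constant (step, reference, plan, SIGMA, and so on) is a hypothetical choice made for illustration.

```python
import math
import random

DT = 0.05        # integration step in seconds (assumed)
HORIZON = 10     # planning horizon in steps (assumed)
N_SAMPLES = 64   # candidate plans evaluated per control step (assumed)
SIGMA = 2.0      # std of the Gaussian noise added to the plan (assumed)


def step(state, u):
    """Toy stand-in for a physics simulator: unit point mass, semi-implicit Euler."""
    pos, vel = state
    vel = vel + u * DT
    pos = pos + vel * DT
    return (pos, vel)


def reference(t):
    """Hypothetical task-space target, standing in for a video-derived reference."""
    return math.sin(t)


def rollout_cost(state, controls, t0):
    """Simulate a candidate plan and sum squared tracking error plus a small effort term."""
    cost = 0.0
    for k, u in enumerate(controls):
        state = step(state, u)
        err = state[0] - reference(t0 + (k + 1) * DT)
        cost += err * err + 1e-4 * u * u
    return cost


def plan(state, nominal, t0, rng):
    """One predictive-sampling update: keep the cheapest of nominal + noisy copies."""
    candidates = [list(nominal)]
    for _ in range(N_SAMPLES - 1):
        candidates.append([u + rng.gauss(0.0, SIGMA) for u in nominal])
    # The nominal plan is itself a candidate, so the result never costs more than it.
    return min(candidates, key=lambda c: rollout_cost(state, c, t0))


def track(n_steps=200, seed=0):
    """Receding-horizon loop: replan, apply the first control, shift the plan."""
    rng = random.Random(seed)
    state = (0.0, 0.0)
    nominal = [0.0] * HORIZON
    errors = []
    for i in range(n_steps):
        t = i * DT
        nominal = plan(state, nominal, t, rng)
        state = step(state, nominal[0])
        errors.append(abs(state[0] - reference(t + DT)))
        nominal = nominal[1:] + [nominal[-1]]  # warm start for the next step
    return errors


if __name__ == "__main__":
    errs = track()
    print("mean tracking error:", sum(errs) / len(errs))
```

The point of the sketch is the structure: the optimizer only ever queries the simulator through rollouts, so the tracking cost is evaluated on motions that are feasible under the simulated dynamics. In a full humanoid setting the rollouts would also pass through contact dynamics, which is what the abstract refers to when it says the solver optimizes over contact sequences.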

Practical Impact

DDR produces physically viable reference trajectories for reinforcement learning (RL) agents. According to the paper, training on these references accelerates convergence and improves the final execution of agile and balancing behaviors. If the results generalize, humanoid robots could learn complex skills from ordinary video more efficiently than with references produced by geometric or indirect retargeting.

Analogy / Intuitive Explanation

Imagine trying to teach a child to ride a bike by showing them a video of an expert rider. The child would struggle to replicate the exact movements, because the video does not account for their own body and physical limits. DDR is like a "video editor" that translates the expert rider's movements into a form tailored to the child's (or robot's) body, so that the result is something they can physically execute and learn from.

Paper Information

Categories: cs.RO

Published Date:

arXiv ID: 2605.23762v1
