Controllable Motion Generation via Diffusion Modal Coupling

Luobin Wang, Hongzhan Yu, Chenning Yu, Sicun Gao, Henrik I Christensen

June, 2026

Systems Overview

Abstract

Diffusion models are increasingly used in robotics to represent multi-modal distributions over system states and behaviors, but precise control of generated outcomes without degrading physical realism remains challenging. This paper introduces a controllable diffusion framework that (i) replaces the standard unimodal Gaussian prior with an explicit multi-modal prior, and (ii) enforces modal coupling between prior components and principal data modes through novel forward and reverse diffusion processes. Sampling is initialized directly from a selected prior mode aligned with task constraints, avoiding the train–test mismatch and manifold drift commonly induced by post-hoc guidance. Empirical evaluations on motion prediction (Waymo Dataset) and multi-task control (Maze2D) show consistent improvements over guidance-based baselines in fidelity, diversity, and controllability. These results indicate that multi-modal priors with strong modal coupling provide a scalable basis for controllable motion generation in robotics. The official implementation is available at https://github.com/RobinWangSD/Diffusion-Modal-Coupling/.
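The idea of initializing sampling from a selected prior mode can be illustrated with a minimal sketch. It assumes an isotropic Gaussian-mixture prior; the component means, the shared standard deviation `sigma`, and the function name `sample_from_prior_mode` are hypothetical choices for illustration and are not taken from the official implementation. The paper's forward and reverse diffusion processes are not reproduced here; only the choice of starting point is shown.

```python
import numpy as np


def sample_from_prior_mode(means, sigma, mode, n_samples, rng):
    """Draw starting points from one component of an isotropic Gaussian-mixture prior.

    Hypothetical helper: `means` has shape (K, D), `sigma` is a shared
    standard deviation, and `mode` is the index of the component to use.
    """
    means = np.asarray(means, dtype=float)
    if not 0 <= mode < means.shape[0]:
        raise IndexError(f"mode {mode} is outside [0, {means.shape[0]})")
    # Standard Gaussian noise, shifted and scaled to the selected component.
    noise = rng.standard_normal((n_samples, means.shape[1]))
    return means[mode] + sigma * noise


rng = np.random.default_rng(0)

# Assumed prior: three well-separated components in 2-D, one per behavior mode.
prior_means = np.array([[-4.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

# A task constraint is assumed to have picked component 1; a reverse
# diffusion process would then start from these samples instead of N(0, I).
x_T = sample_from_prior_mode(prior_means, sigma=0.5, mode=1, n_samples=1000, rng=rng)
```

Because the starting points already lie in the chosen component, control comes from where sampling begins rather than from a guidance term applied during denoising.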

Publication

International Conference on Robotics and Automation
