Behavior Cloning Findings
Compliant wins. Low Kp + high Kd = best success rates.
Low loss ≠ good policy. Stiff gains are easier to fit but fail at the task.

When using position targets as the action representation, the controller gains implicitly shape the learning problem by altering the action distribution. Compliant gains produce larger, more expressive position targets, while stiff gains yield small, nearly linear targets that are easy to regress but carry less information about the intended behavior.
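To see why, invert the controller. A minimal sketch, assuming a textbook PD law tau = Kp(q* − q) − Kd·qd (the experiments' controller may include extra terms such as gravity compensation): the target q* that realizes a given torque is q + (tau + Kd·qd)/Kp, so large Kp shrinks targets toward the current position while small Kp stretches them out.

```python
import numpy as np

def targets_from_torques(q, qd, tau, kp, kd):
    # Invert the PD law tau = kp * (q_target - q) - kd * qd
    # to recover the position target that produces a given torque.
    return q + (tau + kd * qd) / kp

# Toy data: one joint, the same states and torques under both gain settings.
rng = np.random.default_rng(0)
q, qd, tau = rng.normal(size=(3, 1000))

stiff = targets_from_torques(q, qd, tau, kp=2000.0, kd=10.0)
compliant = targets_from_torques(q, qd, tau, kp=50.0, kd=10.0)

# Stiff gains: targets barely deviate from the current position, so the
# regression problem collapses toward q_target ≈ q. Compliant gains: the
# same motion demands targets 40x larger, which actually carry the behavior.
print(np.std(stiff - q), np.std(compliant - q))
```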


Gain-Dependent Demonstrations. The viewer below replays the same robot trajectory under different gain settings. Notice how the state trajectories (robot motion) remain nearly identical while the position targets (actions) change dramatically with gains.

[Interactive viewer: joint response at Kp = 200, Kd = 10, with adjustable noise injection]
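The same inversion explains what the viewer shows: to reproduce one reference motion, each gain setting needs a different action trajectory. A toy 1-DoF replay (a hypothetical setup, not the project's simulator) makes the effect measurable:

```python
import numpy as np

def replay_targets(q_ref, dt, kp, kd, mass=1.0):
    # Compute the torque a point mass needs to follow the reference,
    # then the position target a PD controller with these gains would
    # have to command to produce that torque at every step.
    qd_ref = np.gradient(q_ref, dt)
    qdd_ref = np.gradient(qd_ref, dt)
    tau = mass * qdd_ref
    return q_ref + (tau + kd * qd_ref) / kp

dt = 0.01
t = np.arange(0.0, 2.0, dt)
q_ref = 0.5 * np.sin(2 * np.pi * t)   # the state trajectory, fixed by design

stiff = replay_targets(q_ref, dt, kp=2000.0, kd=10.0)
compliant = replay_targets(q_ref, dt, kp=50.0, kd=10.0)

# Identical robot motion, very different action signals: the compliant
# replay's target excursions are 40x larger than the stiff replay's.
print(np.ptp(stiff - q_ref), np.ptp(compliant - q_ref))
```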

Live Trajectory. Use the playback controls to scrub through episodes. Switch tasks with the tabs.

Training Results. The heatmap below shows task success rates across different gain configurations. Each cell represents a (Kp, Kd) pair — darker means higher success. Tap any cell to compare its training loss curve.

Task Success Rate. Darker = higher success. Tap any cell to load its training curve below.

Training Loss. Compare convergence across gain configurations.

Notice the paradox: the best-performing gain regions (upper-left, compliant) have higher training loss than stiff regions (lower-right). This is because stiff controllers produce smooth, near-linear action distributions that are trivial for a neural network to fit — but the resulting policies lack the expressiveness needed for successful task execution.
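One way to keep this honest in practice is to score every (Kp, Kd) cell with closed-loop rollouts rather than regression error alone. A minimal evaluation sketch, assuming a Gym-style environment with an info["success"] flag (hypothetical names, not the project's code):

```python
def success_rate(policy, env, n_episodes=50, max_steps=400):
    # Closed-loop metric: the fraction of rollouts that complete the task.
    # This is what the heatmap reports; ranking configurations by training
    # MSE alone would put the stiff cells first and pick the wrong gains.
    successes = 0
    for _ in range(n_episodes):
        obs, _ = env.reset()
        info = {}
        for _ in range(max_steps):
            action = policy(obs)  # predicted position targets
            obs, _, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                break
        successes += bool(info.get("success", False))
    return successes / n_episodes
```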

This pattern holds across all five tasks, suggesting that gain selection for behavior cloning should prioritize learnability, in the sense of action targets a policy can usefully learn the behavior from, rather than low training error or task-specific stiffness.
