Behavior Cloning Findings
Compliant wins. Low Kp + high Kd = best success rates.
Low loss ≠ good policy. Stiff gains are easier to fit but fail at the task.

When using position targets as the action representation, the controller gains implicitly shape the learning problem by altering the action distribution. Compliant gains produce larger, more expressive position targets, while stiff gains yield small, nearly linear targets that are easy to regress but carry less information about the intended behavior.
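To see why, invert the controller. A minimal sketch, assuming a textbook PD law tau = Kp(q* − q) − Kd·qd (the experiments' controller may include extra terms such as gravity compensation): the target q* that realizes a given torque is q + (tau + Kd·qd)/Kp, so large Kp shrinks targets toward the current position while small Kp stretches them out.

```python
import numpy as np

def targets_from_torques(q, qd, tau, kp, kd):
    # Invert the PD law tau = kp * (q_target - q) - kd * qd
    # to recover the position target that produces a given torque.
    return q + (tau + kd * qd) / kp

# Toy data: one joint, the same states and torques under both gain settings.
rng = np.random.default_rng(0)
q, qd, tau = rng.normal(size=(3, 1000))

stiff = targets_from_torques(q, qd, tau, kp=2000.0, kd=10.0)
compliant = targets_from_torques(q, qd, tau, kp=50.0, kd=10.0)

# Stiff gains: targets barely deviate from the current position, so the
# regression problem collapses toward q_target ≈ q. Compliant gains: the
# same motion demands targets 40x larger, which actually carry the behavior.
print(np.std(stiff - q), np.std(compliant - q))
```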


Gain-Dependent Demonstrations. The viewer below replays the same robot trajectory under different gain settings. Notice how the state trajectories (robot motion) remain nearly identical while the position targets (actions) change dramatically with gains.

[Interactive viewer: joint response at Kp = 200, Kd = 10, with adjustable noise injection]
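The same inversion explains what the viewer shows: to reproduce one reference motion, each gain setting needs a different action trajectory. A toy 1-DoF replay (a hypothetical setup, not the project's simulator) makes the effect measurable:

```python
import numpy as np

def replay_targets(q_ref, dt, kp, kd, mass=1.0):
    # Compute the torque a point mass needs to follow the reference,
    # then the position target a PD controller with these gains would
    # have to command to produce that torque at every step.
    qd_ref = np.gradient(q_ref, dt)
    qdd_ref = np.gradient(qd_ref, dt)
    tau = mass * qdd_ref
    return q_ref + (tau + kd * qd_ref) / kp

dt = 0.01
t = np.arange(0.0, 2.0, dt)
q_ref = 0.5 * np.sin(2 * np.pi * t)   # the state trajectory, fixed by design

stiff = replay_targets(q_ref, dt, kp=2000.0, kd=10.0)
compliant = replay_targets(q_ref, dt, kp=50.0, kd=10.0)

# Identical robot motion, very different action signals: the compliant
# replay's target excursions are 40x larger than the stiff replay's.
print(np.ptp(stiff - q_ref), np.ptp(compliant - q_ref))
```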

Live Trajectory. Use the playback controls to scrub through episodes. Switch tasks with the tabs.

Training Results. The heatmap below shows task success rates across different gain configurations. Each cell represents a (Kp, Kd) pair — darker means higher success. Tap any cell to compare its training loss curve.

Task Success Rate. Darker = higher success. Tap any cell to load its training curve below.

Training Loss. Compare convergence across gain configurations.

Notice the paradox: the best-performing gain regions (upper-left, compliant) have higher training loss than stiff regions (lower-right). This is because stiff controllers produce smooth, near-linear action distributions that are trivial for a neural network to fit — but the resulting policies lack the expressiveness needed for successful task execution.
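One way to keep this honest in practice is to score every (Kp, Kd) cell with closed-loop rollouts rather than regression error alone. A minimal evaluation sketch, assuming a Gym-style environment with an info["success"] flag (hypothetical names, not the project's code):

```python
def success_rate(policy, env, n_episodes=50, max_steps=400):
    # Closed-loop metric: the fraction of rollouts that complete the task.
    # This is what the heatmap reports; ranking configurations by training
    # MSE alone would put the stiff cells first and pick the wrong gains.
    successes = 0
    for _ in range(n_episodes):
        obs, _ = env.reset()
        info = {}
        for _ in range(max_steps):
            action = policy(obs)  # predicted position targets
            obs, _, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                break
        successes += bool(info.get("success", False))
    return successes / n_episodes
```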

This pattern holds across all five tasks, suggesting that gain selection for behavior cloning should prioritize learnability, in the sense of action targets a policy can usefully learn the behavior from, rather than low training error or task-specific stiffness.
