Tune to Learn: How Controller Gains Shape Robot Policy Learning


¹MIT

* Equal contribution; order determined by coin flip

Abstract

Position controllers have become the dominant interface for executing learned manipulation policies. Yet a critical design decision remains understudied: how should controller gains be chosen for policy learning? We argue that gain selection should be guided by learnability: how amenable different gain settings are to the learning algorithm in use. Across three learning paradigms, we find that:

  1. Behavior Cloning (BC) learning benefits from compliant and overdamped gain regimes,
  2. Reinforcement Learning (RL) can succeed across all gain regimes given compatible hyperparameter tuning, and
  3. Sim-to-Real transfer is harmed by stiff and overdamped gain regimes.

These findings reveal that optimal gain selection depends not on the desired task behavior, but on the learning paradigm employed.
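The gain regimes referenced above (compliant vs. stiff, underdamped vs. overdamped) can be made concrete with a standard joint-space PD position-control law. The sketch below is a generic illustration, not the paper's exact controller, and the gain values are hypothetical:

```python
import numpy as np

def pd_torque(q, qd, q_target, kp, kd):
    """Joint-space PD position controller: tau = kp * (q_target - q) - kd * qd."""
    return kp * (q_target - q) - kd * qd

def damping_ratio(kp, kd, inertia=1.0):
    """For a point-mass joint, zeta = kd / (2 * sqrt(kp * inertia)).

    zeta < 1: underdamped; zeta == 1: critically damped; zeta > 1: overdamped.
    Larger kp means a stiffer controller; smaller kp means a more compliant one.
    """
    return kd / (2.0 * np.sqrt(kp * inertia))

# Illustrative (hypothetical) gain pairs:
print(damping_ratio(kp=25.0, kd=15.0))   # compliant, overdamped (zeta = 1.5)
print(damping_ratio(kp=400.0, kd=40.0))  # stiff, critically damped (zeta = 1.0)
```

Note that kp alone sets stiffness, while the kp-kd ratio sets the damping regime, which is why the two axes (compliant/stiff, under/overdamped) vary independently in the findings above.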


Conclusion & Remarks

We have presented a systematic study of how position controller gains shape learning dynamics across three paradigms of modern robot learning. Our findings reveal that gains function not as behavioral parameters, but as an inductive bias that modulates the learning interface between policy and environment.

Broader Implications

Whole-Body Tracking Controllers for Humanoids. Modern humanoid robots increasingly use RL-trained whole-body tracking policies as low-level controllers, analogous to the PD controllers studied here. These motion tracking policies tend to be inherently stiff — yet when such policies serve as the low-level interface for higher-level learning, their effective compliance directly shapes the learning dynamics of the policies above them, much as PD gains do in our setting.

The Implicit Stiff-Controller Assumption in Learning from Wearables or Videos. Paradigms that learn manipulation skills from human videos or wearable devices typically treat the observed next-timestep state as the action label, implicitly assuming perfect target tracking; our results suggest this assumption may be suboptimal for imitation learning. Whether these gain-dependent trends generalize to such cross-embodiment or whole-body control settings remains an open question.
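The imperfect-tracking point can be seen in a toy one-joint simulation (a hypothetical sketch, not the paper's experimental setup): under a compliant controller, the state reached after a control interval lags the commanded target, so labeling the observed next state as the action misrepresents what was actually commanded:

```python
def step_pd(q, qd, target, kp, kd, dt=0.01, m=1.0):
    """One semi-implicit Euler step of a unit-mass joint under PD control."""
    tau = kp * (target - q) - kd * qd  # PD torque toward the target
    qd = qd + (tau / m) * dt
    q = q + qd * dt
    return q, qd

def track(kp, kd, target=1.0, steps=50):
    """Simulate 0.5 s of tracking a fixed target from rest; return final position."""
    q, qd = 0.0, 0.0
    for _ in range(steps):
        q, qd = step_pd(q, qd, target, kp, kd)
    return q

q_stiff = track(kp=400.0, kd=40.0)  # stiff, critically damped (hypothetical gains)
q_soft = track(kp=25.0, kd=10.0)    # compliant, critically damped (hypothetical gains)
# The stiff joint ends near the target; the compliant one still lags it,
# so "observed next state == commanded target" only holds for stiff control.
print(q_stiff, q_soft)
```

For the compliant joint, the observed next state and the commanded target diverge substantially, which is exactly the gap that the perfect-tracking labeling assumption ignores.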

BibTeX

@misc{bronars2026tunelearncontrollergains,
      title={Tune to Learn: How Controller Gains Shape Robot Policy Learning},
      author={Antonia Bronars and Younghyo Park and Pulkit Agrawal},
      year={2026},
      eprint={2604.02523},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2604.02523},
}