Tune to Learn:
How Controller Gains Affect Robot Policy Learning

MIT

* Equal contribution; author order determined by coin flip

Abstract

Position controllers have become the dominant interface for executing learned manipulation policies. Yet a critical design decision remains understudied: how should we choose controller gains for policy learning? The conventional wisdom is to select gains based on desired task compliance or stiffness. However, this logic breaks down when controllers are paired with state-conditioned policies: effective stiffness emerges from the interplay between learned reactions and control dynamics, not from gains alone. We argue that gain selection should instead be guided by learnability: how amenable different gain settings are to the learning algorithm in use. In this work, we systematically investigate how position controller gains affect three core components of modern robot learning pipelines: behavior cloning, reinforcement learning from scratch, and sim-to-real transfer. Through extensive experiments across multiple tasks and robot embodiments, we find that:

  1. Behavior Cloning (BC) benefits from compliant, overdamped gain regimes,
  2. Reinforcement Learning (RL) can succeed across all gain regimes given compatible hyperparameter tuning, and
  3. Sim-to-Real transfer is harmed by stiff and overdamped gain regimes.

These findings reveal that optimal gain selection depends not on the desired task behavior, but on the learning paradigm employed.

Video Summary

Position Impedance Controllers and Gains

Position impedance controllers are commonly used in robotics to enable compliant behavior. The control law is typically defined as:

$$\tau = \underbrace{\mathbf{K}_p (\mathbf{x}_{\text{desired}} - \mathbf{x})}_{\text{feedback}} + \underbrace{\mathbf{K}_d (- \dot{\mathbf{x}})}_{\text{damping}} + \underbrace{\tau_{\text{ff}}}_{\text{feedforward}}$$

where $\tau$ is the control torque, $\mathbf{K}_p$ is the stiffness gain matrix, and $\mathbf{K}_d$ is the damping gain matrix. Click and drag the circle below to adjust the gains and see how they affect the robot's response to different excitations. The response curves are recorded from a real-world Franka Research 3.
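The control law above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names are ours, and the per-joint damping-ratio helper assumes a unit effective inertia per joint (the real Franka has configuration-dependent inertia).

```python
import numpy as np

def impedance_torque(x_desired, x, x_dot, Kp, Kd, tau_ff=None):
    """PD impedance control law: stiffness feedback + damping + feedforward."""
    if tau_ff is None:
        tau_ff = np.zeros_like(x)
    feedback = Kp @ (x_desired - x)  # pulls toward the desired position
    damping = Kd @ (-x_dot)          # resists velocity
    return feedback + damping + tau_ff

def damping_ratio(kp, kd, m=1.0):
    """Per-joint damping ratio zeta = kd / (2 * sqrt(kp * m)), assuming a
    unit-inertia second-order system: zeta < 1 is underdamped, zeta = 1
    critically damped, zeta > 1 overdamped."""
    return kd / (2.0 * np.sqrt(kp * m))
```

For example, with `kp = 100` and `kd = 20`, the helper returns a damping ratio of 1.0 (critically damped); raising `kd` to 40 pushes the joint into the overdamped regime.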

The 3D simulation below runs the same PD control law on a MuJoCo model of the Franka Research 3. Drag the gains on the chart above and observe how different gain regimes track sinusoidal or step commands — or click & drag the robot to apply perturbation forces.


Before we dig deeper into the world of low-level controllers —

How do you tune your controller gains?

Pick one to jump to the most relevant section — or just scroll through everything.

You said: Don't Tune

You're not alone — many practitioners never touch the default gains. But our experiments show that the default gain regime can quietly bottleneck policy performance. In behavior cloning, for example, swapping to a compliant, overdamped regime can improve success rates by over 30% on the same task with the same data. Keep reading to see how different gain choices interact with each stage of the learning pipeline.

You said: Tune for Task

Matching gains to the desired task stiffness is the most common heuristic — but it can be misleading when a learned policy is in the loop. Because the policy itself reacts to state, the effective stiffness is a product of both the controller gains and the learned behavior. Our findings suggest that optimizing gains for learnability — not task compliance — leads to significantly better policies.

You said: Tune for Teleop

Optimizing for a comfortable teleoperation feel is a reasonable starting point — after all, better teleop usually means better demonstration data. However, gains that feel natural for a human operator may not be optimal for the downstream learning algorithm. Our results show that the best gain regime differs depending on whether you're doing behavior cloning, RL from scratch, or sim-to-real transfer. Read on to find out which regime helps most for your pipeline.

Conclusion & Remarks

We have presented a systematic study of how position controller gains shape learning dynamics across three paradigms of modern robot learning. Our findings reveal that gains function not as behavioral parameters, but as an inductive bias that modulates the learning interface between policy and environment. Behavior cloning favors compliant, overdamped regimes; reinforcement learning adapts to any gain setting given compatible hyperparameters; and sim-to-real transfer suffers with stiff, overdamped configurations. These results provide both conceptual clarity and practical guidance for a widely used yet underexplored design decision.

Broader Implications

Whole-Body Tracking Controllers for Humanoids. Modern humanoid robots increasingly use RL-trained whole-body tracking policies as low-level controllers, analogous to the PD controllers studied here. Notably, recent work has shown that these motion tracking policies tend to be inherently stiff regardless of the underlying controller gains (Margolis et al., "SoftMimic: Learning Compliant Whole-body Control from Examples," arXiv 2025). Yet when such policies serve as the low-level interface for higher-level loco-manipulation learning, their effective compliance directly shapes the learning dynamics of the policies above them, much as PD gains do in our setting.

Learning from Wearables or Videos. Similarly, paradigms that learn manipulation skills from human videos (Qiu et al., "Humanoid Policy ~ Human Policy," arXiv 2025; Grauman et al., "Ego4D: Around the World in 3,000 Hours of Egocentric Video," CVPR 2022) or wearable devices (Chi et al., "Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots," arXiv 2024) typically treat the observed next-timestep state as the action label, implicitly assuming perfect target tracking, which our results suggest may be suboptimal for imitation learning. Whether these gain-dependent trends generalize to such cross-embodiment or whole-body control settings remains an open question, and we hope our findings offer a useful lens for investigating these directions.
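The perfect-tracking assumption above amounts to a simple labeling rule: pair each observed state with the next state as its action target. A minimal sketch (our own illustration, not from any of the cited systems):

```python
import numpy as np

def label_actions_from_states(states):
    """Build (observation, action) pairs from an observed state trajectory
    of shape (T, D), taking the next-timestep state as the action label.
    This encodes the perfect-tracking assumption: it presumes the
    controller will actually reach each commanded state in one step."""
    obs = states[:-1]      # states s_0 .. s_{T-2}
    actions = states[1:]   # labels  s_1 .. s_{T-1}
    return obs, actions
```

Under a compliant or poorly tracking controller, the executed state can lag these labels substantially, which is one way gain choice leaks into the imitation-learning signal.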

BibTeX

@inproceedings{author2026method,
  title     = {Your Paper Title Goes Here},
  author    = {One, Author and Two, Author and Three, Author and Four, Author},
  booktitle = {Conference on Robot Learning (CoRL)},
  year      = {2026}
}

Acknowledgments

This work was supported by … We thank … for helpful discussions. The website template is adapted from Nerfies.