* Equal contribution; order determined by coin flip
Position controllers have become the dominant interface for executing learned manipulation policies. Yet a critical design decision remains understudied: how should we choose controller gains for policy learning? The conventional wisdom is to select gains based on desired task compliance or stiffness. However, this logic breaks down when controllers are paired with state-conditioned policies: effective stiffness emerges from the interplay between learned reactions and control dynamics, not from gains alone. We argue that gain selection should instead be guided by learnability: how amenable different gain settings are to the learning algorithm in use. In this work, we systematically investigate how position controller gains affect three core components of modern robot learning pipelines: behavior cloning, reinforcement learning from scratch, and sim-to-real transfer. Through extensive experiments across multiple tasks and robot embodiments, we find that: (1) behavior cloning favors compliant, overdamped gain regimes; (2) reinforcement learning from scratch adapts to a wide range of gain settings given compatible hyperparameters; and (3) sim-to-real transfer suffers most under stiff, overdamped configurations.
These findings reveal that optimal gain selection depends not on the desired task behavior, but on the learning paradigm employed.
$$\tau = \underbrace{\mathbf{K}_p (\mathbf{x}_{\text{desired}} - \mathbf{x})}_{\text{feedback}} + \underbrace{\mathbf{K}_d (- \dot{\mathbf{x}})}_{\text{damping}} + \underbrace{\tau_{\text{ff}}}_{\text{feedforward}}$$
where $\tau$ is the control torque, $\mathbf{K}_p$ is the stiffness gain matrix, $\mathbf{K}_d$ is the damping gain matrix, and $\tau_{\text{ff}}$ is a feedforward torque. Click and drag the circle below to adjust the gains and see how they affect the robot's response to different excitations. The response curves are recorded from a real-world Franka Research 3. The 3D simulation below runs the same PD control law on a MuJoCo model of the Franka Research 3. Drag the gains on the chart above and observe how different gain regimes track sinusoidal or step commands, or click and drag the robot to apply perturbation forces.
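For readers who prefer code to sliders, the same PD law can be sketched on a 1-DoF point mass. This is a minimal illustration of the equation above (not the paper's implementation): `tau_ff` is set to zero, and the two traces contrast an underdamped and a critically damped choice of $\mathbf{K}_d$ for the same $\mathbf{K}_p$.

```python
import numpy as np

def simulate_pd_step(kp, kd, m=1.0, x_target=1.0, dt=1e-3, t_end=2.0):
    """Simulate tau = kp*(x_target - x) + kd*(-xdot) on a 1-DoF
    point mass of mass m responding to a step command."""
    n = int(t_end / dt)
    x, xdot = 0.0, 0.0
    trace = np.empty(n)
    for i in range(n):
        # feedback + damping terms of the control law (tau_ff = 0)
        tau = kp * (x_target - x) + kd * (-xdot)
        xdot += (tau / m) * dt  # explicit Euler integration
        x += xdot * dt
        trace[i] = x
    return trace

kp = 100.0
# For a mass-spring-damper, critical damping is kd = 2*sqrt(kp*m).
under = simulate_pd_step(kp, kd=2.0)               # underdamped: overshoots
crit = simulate_pd_step(kp, kd=2 * np.sqrt(kp))    # critically damped
```

Plotting `under` and `crit` reproduces the qualitative behavior of the interactive chart: the underdamped regime overshoots and rings, while the critically damped regime settles to the target without overshoot.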
Before we dig deeper into the world of low-level controllers, a quick question: how do you currently choose your controller gains?
Pick one to jump to the most relevant section — or just scroll through everything.
You're not alone — many practitioners never touch the default gains. But our experiments show that the default gain regime can quietly bottleneck policy performance. In behavior cloning, for example, swapping to a compliant, overdamped regime can improve success rates by over 30% on the same task with the same data. Keep reading to see how different gain choices interact with each stage of the learning pipeline.
Matching gains to the desired task stiffness is the most common heuristic — but it can be misleading when a learned policy is in the loop. Because the policy itself reacts to state, the effective stiffness is a product of both the controller gains and the learned behavior. Our findings suggest that optimizing gains for learnability — not task compliance — leads to significantly better policies.
Optimizing for a comfortable teleoperation feel is a reasonable starting point — after all, better teleop usually means better demonstration data. However, gains that feel natural for a human operator may not be optimal for the downstream learning algorithm. Our results show that the best gain regime differs depending on whether you're doing behavior cloning, RL from scratch, or sim-to-real transfer. Read on to find out which regime helps most for your pipeline.
We have presented a systematic study of how position controller gains shape learning dynamics across three paradigms of modern robot learning. Our findings reveal that gains function not as behavioral parameters, but as an inductive bias that modulates the learning interface between policy and environment. Behavior cloning favors compliant, overdamped regimes; reinforcement learning adapts to any gain setting given compatible hyperparameters; and sim-to-real transfer suffers with stiff, overdamped configurations. These results provide both conceptual clarity and practical guidance for a widely used yet underexplored design decision.
@inproceedings{author2026method,
title = {Your Paper Title Goes Here},
author = {One, Author and Two, Author and Three, Author and Four, Author},
booktitle = {Conference on Robot Learning (CoRL)},
year = {2026}
}
This work was supported by … We thank … for helpful discussions. The website template is adapted from Nerfies.