Amortizing Trajectory Diffusion
with Keyed Drift Fields

  • Anonymous authors

Overview

  • Keyed Drifting Policies (KDP) is a simple and effective one-step trajectory planning method for offline RL and robotics, designed to recover diffusion-like planning behavior without the expensive iterative denoising loop.
  • KDP is conditioning-aware. Rather than matching trajectories in the full high-dimensional window space, KDP forms neighborhoods in a compact key space aligned with the condition (for example, the current state), which avoids collapse toward average, inert trajectories.
  • KDP is trained by drift-based refinement. It pulls generated trajectories toward condition-matched dataset windows and repels them from nearby generated samples, using a stop-gradient drifted target to amortize refinement into the generator during training.
  • KDP is fast and performant. At inference, it generates a full trajectory window in a single forward pass, preserving multimodal candidate planning while substantially reducing planning latency compared with diffusion sampling.
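To make the single-forward-pass claim concrete, here is a toy sketch of a KDP-style receding-horizon loop. All names (`one_step_generator`, `plan_kdp`) and the ranking score are illustrative stand-ins, not the actual KDP implementation; the point is only that each candidate costs one network call, versus T denoising calls for a diffusion planner.

```python
# Sketch of one-step receding-horizon planning. Hypothetical names throughout.
import random

def one_step_generator(state, noise):
    # Stand-in for the trained KDP generator: one forward pass maps
    # (condition, noise) to a full trajectory window (horizon H = 8 here).
    return [state + noise * (t + 1) for t in range(8)]

def plan_kdp(state, num_candidates=4):
    """Sample K candidate windows (one network call each), rank, execute."""
    nfe = 0
    candidates = []
    for _ in range(num_candidates):
        window = one_step_generator(state, random.gauss(0.0, 1.0))
        nfe += 1                       # one forward pass per candidate
        candidates.append(window)
    best = min(candidates, key=lambda w: abs(w[-1]))  # toy ranking score
    return best[0], nfe                # execute only the first step, then replan

random.seed(0)
action, nfe = plan_kdp(state=0.0)
print(nfe)  # 4: K one-step calls, versus K * T calls for T-step diffusion
```

A diffusion planner in the same loop would replace the single generator call with T sequential denoising calls per candidate, which is the latency gap the tables below quantify.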

Challenge

Challenge visualization
  • Diffusion trajectory planners are expressive but too slow for real-time control. In receding-horizon planning, each control step may require \(T\) sequential denoising steps for every candidate trajectory, so latency grows quickly and can make closed-loop planning brittle under tight compute budgets.
  • One-step conditional trajectory generation is easy to get wrong. If we naively match trajectories in the full high-dimensional window space, distances are dominated by unconstrained future dimensions, which pulls samples toward average trajectories, collapses diversity, and leads to near-static behavior instead of useful condition-consistent plans.
Key observation: Conditional trajectory generation requires a conditioning-aware notion of neighborhood.
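The failure mode above can be reproduced in a few lines. In this toy example (not the paper's code), a window is a list [s0, ..., s7] and the condition is the current state s0; full-window distances are dominated by the unconstrained future, so nearest-neighbor matching prefers an average, inert window with the wrong condition, while a key-space distance over the conditioned slice recovers the condition-matched window.

```python
# Toy illustration of conditioning-aware neighborhoods. Not the paper's code.
import math

def dist(a, b):
    """Euclidean distance in the full window space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query   = [0.0] + [5.0] * 7    # s0 = 0, future moves toward +5
match   = [0.1] + [-5.0] * 7   # same condition, different (valid) mode
average = [3.0] + [0.0] * 7    # wrong condition, near-static future

# Full-window distance picks the average window...
full_pick = min([match, average], key=lambda w: dist(query, w))
# ...while a key-space distance (here: just the conditioned slice s0)
# picks the condition-matched window.
key_pick = min([match, average], key=lambda w: abs(query[0] - w[0]))

print(full_pick is average, key_pick is match)  # True True
```

The fix KDP adopts is exactly this split: compute neighborhoods in a compact key space aligned with the condition, but apply trajectory updates in the full window space.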

Abstract

Diffusion-based trajectory planners can synthesize rich, multimodal action sequences for offline reinforcement learning, but their iterative denoising incurs substantial inference-time cost, making closed-loop planning slow under tight compute budgets. We study the problem of achieving diffusion-like trajectory planning behavior with one-step inference, while retaining the ability to sample diverse candidate plans and condition on the current state in a receding-horizon control loop. Our key observation is that conditional trajectory generation fails under naïve distribution-matching objectives when the similarity measure used to align generated trajectories with the dataset is dominated by unconstrained future dimensions. In practice, this causes attraction toward average trajectories, collapses action diversity, and yields near-static behavior. Our key insight is that conditional generative planning requires a conditioning-aware notion of neighborhood: trajectory updates should be computed using distances in a compact key space that reflects the condition, while still applying updates in the full trajectory space. Building on this, we introduce Keyed Drifting Policies (KDP), a one-step trajectory generator trained with a drift-field objective that attracts generated trajectories toward condition-matched dataset windows and repels them from nearby generated samples, using a stop-gradient drifted target to amortize iterative refinement into training. At inference, the resulting policy produces a full trajectory window in a single forward pass. Across standard RL benchmarks and real-time hardware deployments, KDP achieves strong performance with one-step inference and substantially lower planning latency than diffusion sampling.


Algorithm

KDP architecture
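A minimal sketch of the drift-based refinement target described above, in plain Python. The uniform averaging, fixed step size, and absence of drift normalization are simplifying assumptions, not the paper's exact choices; `data_neighbors` is assumed to hold the dataset windows nearest to the generated window in key space, and `gen_neighbors` the nearby generated samples (excluding the sample itself, i.e. no self-negatives).

```python
# Illustrative drift-field target: attraction to condition-matched data,
# repulsion from nearby generated samples. Hypothetical, simplified form.
def axpy(a, x, y):                    # a * x + y, elementwise
    return [a * xi + yi for xi, yi in zip(x, y)]

def sub(x, y):                        # x - y, elementwise
    return [xi - yi for xi, yi in zip(x, y)]

def drifted_target(gen, data_neighbors, gen_neighbors, step=0.5):
    """Drift a generated window toward key-space-matched dataset windows
    and away from nearby generated samples; the result serves as a
    stop-gradient regression target for the one-step generator."""
    drift = [0.0] * len(gen)
    for d in data_neighbors:          # attraction term
        drift = axpy(1.0 / len(data_neighbors), sub(d, gen), drift)
    for g in gen_neighbors:           # repulsion term (no self-negatives)
        drift = axpy(-1.0 / len(gen_neighbors), sub(g, gen), drift)
    target = axpy(step, drift, gen)   # x_target = x + step * drift
    return target  # training loss: || G(cond, noise) - sg(target) ||^2

t = drifted_target([0.0, 0.0], [[1.0, 1.0]], [[-1.0, 0.0]])
print(t)  # [1.0, 0.5]
```

Because the target is held fixed with a stop-gradient, regressing the generator onto it amortizes the iterative refinement into training, leaving a single forward pass at inference.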

Quantitative Results

Short-horizon continuous control

TABLE I: Performance in short horizon control on D4RL Locomotion (ME=Medium-Expert, M=Medium).

| Method | HalfCheetah ME | HalfCheetah M | Hopper ME | Hopper M | Walker2D ME | Walker2D M | Avg↑ | NFE↓ |
|---|---|---|---|---|---|---|---|---|
| *Offline selection* | | | | | | | | |
| BC | 55.2 | 42.6 | 52.5 | 52.9 | 107.5 | 75.3 | 64.3 | – |
| CQL | 91.6 | 44.0 | 105.4 | 58.5 | 108.8 | 72.5 | 80.1 | – |
| IQL | 86.7 | 47.4 | 91.5 | 66.3 | 109.6 | 78.3 | 80.0 | – |
| DT | 86.8 | 42.6 | 107.6 | 67.6 | 108.1 | 74.0 | 81.1 | – |
| TT | 95.0 | 46.9 | 110.0 | 61.1 | 101.9 | 79.0 | 82.3 | – |
| MOPO | 63.3 | 42.3 | 23.7 | 28.0 | 44.6 | 17.8 | 36.6 | – |
| MOReL | 53.3 | 42.1 | 108.7 | 95.4 | 95.6 | 77.8 | 78.8 | – |
| MBOP | 105.9 | 44.6 | 55.1 | 48.8 | 70.2 | 41.0 | 60.9 | – |
| *Generative trajectory planners* | | | | | | | | |
| DQL | 96.8 | 69.1 | 111.1 | 90.5 | 110.1 | 87.0 | 91.1 | 5 |
| CAC | 84.3 | 69.1 | 100.4 | 80.7 | 110.4 | 83.1 | 88.0 | 2 |
| CBC | 32.7 | 31.0 | 90.6 | 71.7 | 110.4 | 83.1 | 69.9 | 2 |
| Diffuser | 88.9 | 42.8 | 103.3 | 74.3 | 106.9 | 79.6 | 82.6 | 20 |
| KDP | 92.5 | 62.1 | 103.6 | 90.3 | 108.5 | 87.2 | 90.7 | 1 |
TABLE II: Analysis of planning cost. Hardware: 1× NVIDIA A100. (D=Diffuser, UR=Unranked, R=Ranked).

| Method | NFE↓ | BEF↓ | PL | E2E |
|---|---|---|---|---|
| BC | 1 | 1 | 1.01 | 1.22 |
| D-UR (K=1, T=20) | 20 | 20 | 149.32 | 149.51 |
| D-R (K=16, T=20) | 20 | 320 | 172.16 | 172.34 |
| D-R (K=64, T=20) | 20 | 1280 | 279.98 | 280.03 |
| KDP-UR (K=1) | 1 | 1 | 1.98 | 2.11 |
| KDP-R (K=16) | 1 | 32 | 3.97 | 4.10 |
| KDP-R (K=64) | 1 | 128 | 4.76 | 4.96 |

Long-horizon, goal-conditioned planning

TABLE III: Long horizon manipulation analysis. PL is in s/step. (UM=U-Maze, Med.=Medium, Lar.=Large).

| Method | Maze2D UM | Maze2D Med. | Maze2D Lar. | Maze2D PL | AntMaze UM | AntMaze Med. | AntMaze PL |
|---|---|---|---|---|---|---|---|
| CQL | 5.7 | 5.0 | 12.5 | – | 84.0 | 53.7 | – |
| IQL | 47.4 | 34.9 | 58.6 | – | 62.2 | 70.0 | – |
| Diffuser | 113.9 | 121.5 | 123.0 | 3.848 | 76.0 | 31.9 | 6.153 |
| KDP | 122.3 | 130.4 | 133.0 | 0.031 | 81.3 | 67.6 | 0.052 |
TABLE IV: Training objective ablations. (Locom.=Locomotion, Act. div.=Action diversity).

| Ablation | Locom.↑ | Maze2D↑ | Act. div.↑ |
|---|---|---|---|
| Full method (KDP) | 90.7 | 128.2 | 0.93 |
| No keying | 2.9 | 19.6 | 0.14 |
| Include self-negatives | 78.08 | 91.3 | 0.93 |
| Attraction-only (no repulsion) | 1.0 | 53.7 | 2.2e6 |
| No drift normalization | 86.93 | 131.5 | 0.93 |
| Single τ | 92.37 | 102.1 | 0.93 |

Dexterous manipulation and high-DoF control

TABLE V: Adroit results.

| Task | Diffuser Score | Diffuser T(s) | KDP Score | KDP T(s) |
|---|---|---|---|---|
| pen-clone | 10.7 | 1.634 | 53.4 | 0.020 |
| door-clone | 56.7 | 1.598 | 61.8 | 0.024 |
| hammer-clone | 53.1 | 1.532 | 79.6 | 0.034 |
| relocate-clone | 56.2 | 1.685 | 62.8 | 0.036 |
TABLE VI: Action chunking (L = chunk length, Diff.=Diffuser).

| Task | L | Diff. | KDP |
|---|---|---|---|
| adroit-pen | 1 | 38% | 57% |
| adroit-pen | 8 | 41% | 73% |
| adroit-hammer | 1 | 47% | 62% |
| adroit-hammer | 8 | 41% | 63% |
| adroit-relocate | 1 | 35% | 65% |
| adroit-relocate | 8 | 64% | 71% |

Real-world hardware experiments

TABLE VII: Real-world closed-loop results on a Crazyflie and an SO-100 arm. Setup: Diffuser (K=64, T=20); KDP (K=64).

| Domain | Method | Succ. (%) | TTS (s) | Replan (Hz) | Planner p50 | E2E p50 |
|---|---|---|---|---|---|---|
| Navigation | Diffuser | 92 | 12.5 | 4.1 | 242 | 312 |
| Navigation | KDP | 94 | 12.0 | 38 | 8 | 26 |
| Manipulation | Diffuser | 88 | 35 | 3 | 311 | 410 |
| Manipulation | KDP | 90 | 32 | 17 | 9 | 47 |

Qualitative Results

Simulated benchmark results

Real-world hardware results