Velocity-Field Action Attention

Rectifying Action Inequality in Robotic Foundation Models

98.6% LIBERO SR
92.46% RoboTwin SR
+0 Parameters
4+ Base Models
Abstract

Existing robotic foundation models, while powerful, rest on an implicit assumption of temporal homogeneity: all actions are treated as equally informative during optimization. This "flat" training paradigm ignores the underlying physical hierarchy of manipulation. In reality, robot trajectories are fundamentally heterogeneous: low-velocity segments often dictate task success through precision-demanding interactions, while high-velocity motions serve as error-tolerant transitions. To rectify this misalignment, we introduce AttenA+, an architecture-agnostic framework that prioritizes kinematically critical segments via velocity-driven action attention. By reweighting the training objective based on the inverse velocity field, AttenA+ aligns the model's learning capacity with the physical demands of manipulation. As a plug-and-play enhancement, it integrates into existing backbones without structural modifications or additional parameters. Experiments show consistent improvements across state-of-the-art models, including 98.6% average success on LIBERO (up from 97.1% for the OpenVLA-OFT baseline) and 92.46% on RoboTwin 2.0, and real-world validation on a Franka manipulator further demonstrates robustness and cross-task generalization.

Motivation
Action Inequality in Robot Trajectories
Not all actions are created equal. Velocity field analysis reveals that slow, precision-demanding steps (grasp, align, place) are far more critical than fast transitions.
Figure 2: Velocity fields reveal inherent action inequality. Rapid motions are often redundant transitions, while slow-motion phases dominate task success or failure.
Method
AttenA+ Framework
A paradigm-agnostic plug-in that reweights the training loss by the inverse velocity field, with no architecture changes and no extra parameters.
Figure 3: Given visual and language observations, the velocity field assigns higher attention weights to slow, critical manipulation steps and lower weights to fast transitional motions.

Step 1 — Velocity Magnitude

Compute the L2 norm of the velocity components (joint velocities) for each action step in the chunk.

v_t = || a_t[1:6] ||_2
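
A minimal sketch of this step, assuming each action is a flat vector whose first six components are the velocity terms (the 1-based slice a_t[1:6] above) and that any remaining component, such as a gripper command, is excluded. The function name `velocity_magnitudes` and the parameter `num_velocity_dims` are illustrative and not taken from the AttenA+ code.

```python
import math

def velocity_magnitudes(action_chunk, num_velocity_dims=6):
    """Return the L2 norm of the velocity components of each action step.

    action_chunk: list of T action vectors (each a list of floats).
    num_velocity_dims: number of leading components treated as velocity
        (assumed to be 6, matching a_t[1:6] in 1-based indexing).
    """
    return [
        math.sqrt(sum(x * x for x in action[:num_velocity_dims]))
        for action in action_chunk
    ]

# Two steps of a 7-D action: a fast transit followed by a slow approach.
chunk = [
    [0.3, 0.4, 0.0, 0.0, 0.0, 0.0, 1.0],    # last entry (gripper) is ignored
    [0.03, 0.04, 0.0, 0.0, 0.0, 0.0, 1.0],
]
speeds = velocity_magnitudes(chunk)  # roughly 0.5 and 0.05
```

The second step is ten times slower than the first, so it is the one the later steps would emphasize.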

Step 2 — Attention Weight

Map each velocity magnitude to a weight through a monotonically decreasing function, then clip the weight to a configurable range for stability.

w_t ∝ 1 / v_t²
w_t ∈ [1/c_max, c_max]
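
The page gives only the proportionality and the clipping range, so the sketch below fills the gaps with labeled assumptions: a reference speed `v_ref` at which the weight equals 1, and a small `eps` to avoid dividing by zero for stationary steps. Both names are hypothetical.

```python
def attention_weights(speeds, v_ref, c_max=2.0, eps=1e-8):
    """Map per-step speeds to inverse-squared weights clipped to [1/c_max, c_max].

    v_ref: assumed reference speed; a step moving at v_ref gets weight ~1.
    eps: assumed guard against division by zero for stationary steps.
    """
    lower, upper = 1.0 / c_max, c_max
    weights = []
    for v in speeds:
        raw = (v_ref * v_ref) / (v * v + eps)        # proportional to 1 / v^2
        weights.append(min(upper, max(lower, raw)))  # clip for stability
    return weights

# Fast, reference-speed, slow, and stationary steps.
weights = attention_weights([0.5, 0.1, 0.05, 0.0], v_ref=0.1)
```

With `c_max=2.0`, the fast step is floored at 0.5 and the slow and stationary steps are capped at 2.0, so no single step can dominate the loss.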

Step 3 — Weighted Loss

Multiply per-step loss by attention weight. Works for both discriminative (L1) and generative (MSE / flow-matching) objectives.

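L = (1/(T·D)) Σ_t Σ_d w_t · |a_pred[t,d] - a_gt[t,d]|

Here T is the number of action steps in the chunk and D is the action dimension.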
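
As a rough illustration of the weighted objective, the sketch below averages the per-element error over T steps and D dimensions after scaling each step by its weight. The `squared` flag is an assumed stand-in for an MSE-style generative objective; the function name is illustrative.

```python
def weighted_action_loss(pred, target, weights, squared=False):
    """Average per-element error over T steps and D dims, scaled by w_t.

    pred, target: lists of T action vectors with D components each.
    weights: list of T per-step weights.
    squared: use squared error (MSE-style) instead of absolute error (L1).
    """
    num_steps = len(pred)
    num_dims = len(pred[0])
    total = 0.0
    for a_pred, a_gt, w in zip(pred, target, weights):
        for p, g in zip(a_pred, a_gt):
            err = p - g
            total += w * (err * err if squared else abs(err))
    return total / (num_steps * num_dims)

pred = [[1.0, 0.0], [0.5, 0.5]]
target = [[0.0, 0.0], [0.0, 0.0]]
step_weights = [0.5, 2.0]  # fast step down-weighted, slow step up-weighted

l1_loss = weighted_action_loss(pred, target, step_weights)                 # 0.625
mse_loss = weighted_action_loss(pred, target, step_weights, squared=True)  # 0.375
```

In the L1 case the slow second step contributes 2.0 of the 2.5 total before averaging, even though its raw error equals that of the first step.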

Plug-and-Play

Integrates into OpenVLA-OFT, π₀, π₀.₅, Fast-WAM, and more: just add the velocity attention wrapper to your training loop.

from attena import VelocityAttention

attena = VelocityAttention(
    weight_strategy="inverse_squared",
    clip_max_weight=2.0,
)
Figure 1: Overview of AttenA+. A paradigm-agnostic enhancement framework that seamlessly plugs into mainstream discriminative and generative architectures, as well as emerging World-Action Models, consistently improving success rates across diverse benchmarks.
Results
Performance Benchmarks
Consistent gains across simulation benchmarks and real-world robot experiments.
LIBERO Benchmark
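| Method | Spatial | Object | Goal | Long | SR (%) | ER (%) |
| --- | --- | --- | --- | --- | --- | --- |
| OpenVLA-OFT | 97.6 | 98.4 | 97.9 | 94.5 | 97.1 | 2.9 |
| π₀ | 96.8 | 98.8 | 95.8 | 85.2 | 94.15 | 5.85 |
| UniVLA | 96.5 | 96.8 | 95.6 | 92.0 | 95.23 | 4.78 |
| VLA-ADP | 99.0 | 98.2 | 96.8 | 91.2 | 96.3 | 3.7 |
| AttenA+OFT (Ours) | 99.0 | 100.0 | 98.8 | 96.6 | 98.6 | 1.4 |
| AttenA+π₀.₅ (Ours) | 99.2 | 99.6 | 98.8 | 94.2 | 97.95 | 2.05 |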
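
The page does not define the SR and ER columns. The numbers are consistent with SR being the unweighted mean of the four suite scores and ER being its complement (100 minus SR). The check below tests that reading against the LIBERO rows; it is an inference from the reported values, not a stated definition.

```python
# Suite scores and reported (SR, ER) values copied from the LIBERO table.
libero = {
    "OpenVLA-OFT": ([97.6, 98.4, 97.9, 94.5], 97.1, 2.9),
    "π₀": ([96.8, 98.8, 95.8, 85.2], 94.15, 5.85),
    "UniVLA": ([96.5, 96.8, 95.6, 92.0], 95.23, 4.78),
    "VLA-ADP": ([99.0, 98.2, 96.8, 91.2], 96.3, 3.7),
    "AttenA+OFT": ([99.0, 100.0, 98.8, 96.6], 98.6, 1.4),
    "AttenA+π₀.₅": ([99.2, 99.6, 98.8, 94.2], 97.95, 2.05),
}

def summarize(suite_scores):
    """Assumed definitions: SR = mean over suites, ER = 100 - SR."""
    sr = sum(suite_scores) / len(suite_scores)
    return sr, 100.0 - sr

def matches_table(rows, tol=0.01):
    """True if every reported SR/ER agrees with the assumed definitions up to rounding."""
    for scores, reported_sr, reported_er in rows.values():
        sr, er = summarize(scores)
        if abs(sr - reported_sr) > tol or abs(er - reported_er) > tol:
            return False
    return True

consistent = matches_table(libero)
```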
RoboTwin 2.0 Benchmark
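| Method | Embodied PT. | Clean | Random | SR (%) | ER (%) |
| --- | --- | --- | --- | --- | --- |
| π₀ | | 65.92 | 58.40 | 62.2 | 37.8 |
| π₀.₅ | | 82.74 | 76.76 | 79.75 | 20.25 |
| Motus | | 88.66 | 87.02 | 87.8 | 12.2 |
| LingBot-VA | | 92.90 | 91.50 | 92.2 | 7.8 |
| Fast-WAM | | 91.88 | 91.78 | 91.8 | 8.2 |
| AttenA+WAM (Ours) | | 93.06 | 91.86 | 92.46 | 7.54 |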
Qualitative Comparison
Figure 4: Qualitative comparison with/without AttenA+. The baseline fails due to accumulated errors in slow, critical manipulation steps (clip, align, release). AttenA+ prioritizes these high-precision segments, leading to stable task completion.
Simulation Demos
LIBERO: AttenA+ vs Baseline
Top row: AttenA+OFT successfully completes all tasks. Bottom row: baseline fails on precision-demanding actions.
AttenA+OFT (Success): Spatial, Object, Goal, Long
OpenVLA-OFT Baseline (Failed): Spatial, Object, Goal, Long
RoboTwin 2.0
Simulation Task Demonstrations
AttenA+WAM across 24 diverse tasks. Each card shows the clean environment (left) and the randomized environment (right).
Adjust Bottle
Pick Dual Bottles
Stack Bowls
Open Microwave
Place Burger & Fries
Shake Bottle
Stack Blocks
Place Bread Basket
Click Bell
Dump Bin
Grab Roller
Rotate QR Code
Scan Object
Hang Mug
Press Stapler
Lift Pot
Handover Block
Move Can to Pot
Place A→B Left
Place Dual Shoes
Put Object in Cabinet
Stack 3 Bowls
Blocks Ranking by Size
Place Object on Scale
Real-World Validation
Franka Manipulator Experiments
AttenA+OFT is deployed on a real Franka Panda robot. Each task was evaluated over 50 trials, with the largest gains on the complex multi-object (+8 percentage points) and long-horizon (+6 percentage points) tasks.
Real-World Results
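| Model | Close Drawer | Put Cube | Multi-Object | Long-Horizon | SR (%) | ER (%) |
| --- | --- | --- | --- | --- | --- | --- |
| OpenVLA-OFT | 100 | 96 | 90 | 84 | 92.5 | 7.5 |
| AttenA+OFT (Ours) | 100 | 100 | 98 | 90 | 97.0 | 3.0 |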
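
To put the percentage gains in terms of trials, the sketch below converts the reported success rates into success counts, assuming each rate is simply successes divided by the 50 trials stated above. The counts are derived here for illustration and are not reported on the page.

```python
TRIALS_PER_TASK = 50  # stated on the page

# Per-task success rates (%) copied from the real-world table.
baseline = {"Close Drawer": 100, "Put Cube": 96, "Multi-Object": 90, "Long-Horizon": 84}
attena_oft = {"Close Drawer": 100, "Put Cube": 100, "Multi-Object": 98, "Long-Horizon": 90}

def successes(rate_percent, trials=TRIALS_PER_TASK):
    """Convert a success rate in percent to a whole number of successful trials."""
    return round(rate_percent * trials / 100)

# Gain in percentage points, and the same gain expressed as extra successful trials.
gains = {task: attena_oft[task] - baseline[task] for task in baseline}
extra_successes = {
    task: successes(attena_oft[task]) - successes(baseline[task]) for task in baseline
}
```

Under this assumption, the +8 and +6 point gains correspond to 4 and 3 additional successful trials out of 50, respectively.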
Real experimental tasks: (a) Close Drawer, (b) Put Cube, (c) Multi-Object, (d) Long-Horizon
Figure 5: Overview of experimental tasks. I. Simulation: LIBERO and RoboTwin benchmarks. II. Real-world Franka experiments: (a) drawer closing, (b) pick-and-place, (c) multi-object manipulation, (d) sequential long-horizon manipulation.
Ecosystem
Supported Base Models
AttenA+ is architecture-agnostic — plug it into any action-centric foundation model.

OpenVLA-OFT: Discriminative VLA
Fast-WAM: World-Action Model
π₀ / π₀.₅: Generative Flow Matching
More Models: ACT, Diffusion Policy, ...

Citation
BibTeX
@inproceedings{peng2026attenaplus,
  title     = {AttenA+: Rectifying Action Inequality in Robotic Foundation Models},
  author    = {Peng, Daojie and Ma, Fulong and Cao, Jiahang and Zhang, Qiang
               and Xie, Xupeng and Guo, Jian and Luo, Ping
               and Luo, Andrew F. and Zhou, Boyu and Ma, Jun},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2026},
}