AttenA+: Rectifying Action Inequality in Robotic Foundation Models

Abstract

Existing robotic foundation models, while powerful, are predicated on an implicit assumption of temporal homogeneity: treating all actions as equally informative during optimization. This "flat" training paradigm remains indifferent to the underlying physical hierarchy of manipulation. In reality, robot trajectories are fundamentally heterogeneous — low-velocity segments often dictate task success through precision-demanding interactions, while high-velocity motions serve as error-tolerant transitions. To rectify this misalignment, we introduce AttenA+, an architecture-agnostic framework that prioritizes kinematically critical segments via velocity-driven action attention. By reweighting the training objective based on the inverse velocity field, AttenA+ naturally aligns the model's learning capacity with the physical demands of manipulation. As a plug-and-play enhancement, it integrates into existing backbones without structural modifications or additional parameters. Extensive experiments demonstrate significant improvements across state-of-the-art models, with real-world validation on a Franka manipulator further showcasing its robustness and cross-task generalization.

Motivation

Action Inequality in Robot Trajectories

Not all actions are created equal. Velocity field analysis reveals that slow, precision-demanding steps (grasp, align, place) are far more critical than fast transitions.

Figure 2: Velocity fields reveal inherent action inequality. Rapid motions are often redundant transitions, while slow-motion phases dominate task success or failure.

Method

AttenA+ Framework

A paradigm-agnostic plug-in that reweights the training loss using the inverse velocity field, with no architecture changes and no extra parameters.

Figure 3: Given visual and language observations, the velocity field assigns higher attention weights to slow, critical manipulation steps and lower weights to fast transitional motions.

Step 1 — Velocity Magnitude

Compute the L2 norm of the velocity components (joint velocities) for each action step in the chunk.

v_t = || a_t[1:6] ||_2
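As a rough illustration of Step 1, the sketch below computes a per-step speed for a small action chunk. It assumes that `a_t[1:6]` refers to the first six components of each action vector and that a seventh component holds a gripper command; the function name, the chunk layout, and the sample values are hypothetical and not taken from the AttenA+ code.

```python
import math

def velocity_magnitudes(chunk, num_velocity_dims=6):
    """Return v_t = ||a_t[:num_velocity_dims]||_2 for every step in the chunk.

    Assumption: the first `num_velocity_dims` entries of each action are the
    velocity-like components; any remaining entries (e.g. gripper) are ignored.
    """
    return [
        math.sqrt(sum(x * x for x in action[:num_velocity_dims]))
        for action in chunk
    ]

# Hypothetical 2-step chunk with 7-dimensional actions.
chunk = [
    [0.3, 0.4, 0.0, 0.0, 0.0, 0.0, 1.0],    # fast transit step
    [0.0, 0.0, 0.06, 0.08, 0.0, 0.0, 0.0],  # slow, precise step
]
speeds = velocity_magnitudes(chunk)
print(speeds)
```

The gripper entry of the first action is deliberately large to show that it does not affect the computed speed under this assumption.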

Step 2 — Attention Weight

Map each velocity magnitude to a weight through a monotonically decreasing function, then clip the weight to a configurable range for stability.

w_t ∝ 1 / v_t², clipped so that w_t ∈ [1/c_max, c_max]

Step 3 — Weighted Loss

Multiply per-step loss by attention weight. Works for both discriminative (L1) and generative (MSE / flow-matching) objectives.

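L = (1/(T·D)) Σ_t Σ_d w_t · |a_pred[t,d] - a_gt[t,d]|

Here T is the number of steps in the action chunk and D is the action dimension.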
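The weighted L1 objective can be sketched in plain Python as follows. This is a minimal illustration that assumes T is the number of steps in the chunk and D the action dimension; a real training loop would compute the same quantity with tensor operations, and the function name and sample values here are hypothetical.

```python
def weighted_l1_loss(pred, target, weights):
    """Compute (1 / (T * D)) * sum_t w_t * sum_d |pred[t][d] - target[t][d]|."""
    num_steps = len(pred)       # T: steps in the action chunk (assumed meaning)
    action_dim = len(pred[0])   # D: action dimension (assumed meaning)
    total = 0.0
    for a_pred, a_gt, w in zip(pred, target, weights):
        step_error = sum(abs(p - g) for p, g in zip(a_pred, a_gt))
        total += w * step_error
    return total / (num_steps * action_dim)

# Hypothetical 2-step chunk with 2-dimensional actions.
pred = [[0.0, 0.0], [1.0, 1.0]]
target = [[0.5, 0.5], [1.0, 2.0]]

uniform = weighted_l1_loss(pred, target, [1.0, 1.0])   # flat training
weighted = weighted_l1_loss(pred, target, [0.5, 2.0])  # slow step emphasized
print(uniform, weighted)
```

Both steps carry the same raw error in this example, so the difference between the two losses comes entirely from the weights: the error on the second (slow) step counts four times as much as the error on the first.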

Plug-and-Play

Integrates into OpenVLA-OFT, π₀, π₀.₅, FastWAM and more — just add the velocity attention wrapper to your training loop.

from attena import VelocityAttention

attena = VelocityAttention(
    weight_strategy="inverse_squared",
    clip_max_weight=2.0,
)

Figure 1: Overview of AttenA+. A paradigm-agnostic enhancement framework that seamlessly plugs into mainstream discriminative and generative architectures, as well as emerging World-Action Models, consistently improving success rates across diverse benchmarks.

Results

Performance Benchmarks

Consistent gains across simulation benchmarks and real-world robot experiments.

LIBERO Benchmark

Method	Spatial	Object	Goal	Long	SR (%)	ER (%)
OpenVLA-OFT	97.6	98.4	97.9	94.5	97.1	2.9
π₀	96.8	98.8	95.8	85.2	94.15	5.85
UniVLA	96.5	96.8	95.6	92.0	95.23	4.78
VLA-ADP	99.0	98.2	96.8	91.2	96.3	3.7
AttenA+OFT (Ours)	99.0	100.0	98.8	96.6	98.6	1.4
AttenA+π₀.₅ (Ours)	99.2	99.6	98.8	94.2	97.95	2.05

RoboTwin 2.0 Benchmark

Method	Embodied Pretraining	Clean	Random	SR (%)	ER (%)
π₀	✓	65.92	58.40	62.2	37.8
π₀.₅	✓	82.74	76.76	79.75	20.25
Motus	✓	88.66	87.02	87.8	12.2
LingBot-VA	✓	92.90	91.50	92.2	7.8
Fast-WAM	✗	91.88	91.78	91.8	8.2
AttenA+WAM (Ours)	✗	93.06	91.86	92.46	7.54

Figure 4: Qualitative comparison with/without AttenA+. The baseline fails due to accumulated errors in slow, critical manipulation steps (clip, align, release). AttenA+ prioritizes these high-precision segments, leading to stable task completion.

Simulation Demos

LIBERO: AttenA+ vs Baseline

Top row: AttenA+OFT successfully completes all tasks. Bottom row: baseline fails on precision-demanding actions.

AttenA+OFT (success): Spatial, Object, Goal, Long

OpenVLA-OFT baseline (failed): Spatial, Object, Goal, Long

RoboTwin 2.0

Simulation Task Demonstrations

AttenA+WAM across 24 diverse tasks. Each card shows clean environment (left) and randomized environment (right).

Adjust Bottle
Pick Dual Bottles
Stack Bowls
Open Microwave
Place Burger & Fries
Shake Bottle
Stack Blocks
Place Bread Basket
Click Bell
Dump Bin
Grab Roller
Rotate QR Code
Scan Object
Hang Mug
Press Stapler
Lift Pot
Handover Block
Move Can to Pot
Place A→B Left
Place Dual Shoes
Put Object in Cabinet
Stack 3 Bowls
Blocks Ranking by Size
Place Object on Scale

Real-World Validation

Franka Manipulator Experiments

AttenA+OFT is deployed on a real Franka Panda robot. Each task is evaluated over 50 trials; the largest gains appear on the complex multi-object (+8 percentage points, 90 → 98) and long-horizon (+6 percentage points, 84 → 90) tasks.

Real-World Results

Model	Close Drawer	Put Cube	Multi-Object	Long-Horizon	SR (%)	ER (%)
OpenVLA-OFT	100	96	90	84	92.5	7.5
AttenA+OFT (Ours)	100	100	98	90	97.0	3.0
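The summary columns of the table can be reproduced from the per-task numbers. The short check below assumes, as the figures suggest, that SR is the unweighted mean of the four task success rates and that ER is 100 minus SR; the variable names are illustrative only.

```python
tasks = ["Close Drawer", "Put Cube", "Multi-Object", "Long-Horizon"]
baseline = [100, 96, 90, 84]   # OpenVLA-OFT success rates (%) from the table
attena = [100, 100, 98, 90]    # AttenA+OFT success rates (%) from the table

def mean(values):
    return sum(values) / len(values)

# Assumed definitions: SR is the unweighted mean, ER is its complement.
sr_baseline, sr_attena = mean(baseline), mean(attena)
er_baseline, er_attena = 100 - sr_baseline, 100 - sr_attena

# Per-task gain in percentage points.
gains = {task: ours - base for task, base, ours in zip(tasks, baseline, attena)}
largest_gain_task = max(gains, key=gains.get)

print(sr_baseline, er_baseline, sr_attena, er_attena)
print(gains, largest_gain_task)
```

Under these assumptions the averages match the SR and ER columns, and the per-task differences are expressed in percentage points rather than relative percent.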

(a) Close Drawer

(b) Put Cube

(c) Multi-Object

(d) Long-Horizon

Figure 5: Overview of experimental tasks. I. Simulation: LIBERO and RoboTwin benchmarks. II. Real-world Franka experiments: (a) drawer closing, (b) pick-and-place, (c) multi-object manipulation, (d) sequential long-horizon manipulation.

Ecosystem

Supported Base Models

AttenA+ is architecture-agnostic — plug it into any action-centric foundation model.

OpenVLA-OFT

Discriminative VLA

FastWAM

World-Action Model

π₀ / π₀.₅

Generative Flow Matching

More Models

ACT, Diffusion Policy, ...

Citation

BibTeX

@inproceedings{peng2026attenaplus,
  title     = {AttenA+: Rectifying Action Inequality in Robotic Foundation Models},
  author    = {Peng, Daojie and Ma, Fulong and Cao, Jiahang and Zhang, Qiang
               and Xie, Xupeng and Guo, Jian and Luo, Ping
               and Luo, Andrew F. and Zhou, Boyu and Ma, Jun},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2026},
}