TMECH 2026

Legged Open-Vocabulary Object Navigator

1 HKUST(Guangzhou)   2 Beijing Innovation Center of Humanoid Robotics   3 HKUST
LOVON System Overview
01
Abstract
Object navigation in open-world environments remains a formidable and pervasive challenge for robotic systems, particularly for long-horizon tasks that require both open-world object detection and high-level task planning. Traditional methods often struggle to integrate these components effectively, which limits their ability to handle complex, long-range navigation missions. In this paper, we propose LOVON, a novel framework that integrates large language models (LLMs) for hierarchical task planning with open-vocabulary visual detection models, tailored for effective long-range object navigation in dynamic, unstructured environments. To tackle real-world challenges including visual jittering, blind zones, and temporary target loss, we design dedicated solutions such as Laplacian Variance Filtering for visual stabilization. We also develop a functional execution logic for the robot that guarantees LOVON's capabilities in autonomous navigation, task adaptation, and robust task completion. Extensive evaluations demonstrate the successful completion of long-sequence tasks involving real-time detection, search, and navigation toward open-vocabulary dynamic targets. Furthermore, real-world experiments across different legged robots (Unitree Go2, B2, and H1-2) showcase the compatibility and plug-and-play design of LOVON.
02
Video Demonstration
Watch LOVON navigate toward open-vocabulary targets across multiple robot platforms and environments.
03
Core Capabilities
Four pillars of the LOVON framework for open-world legged robot navigation.
01

LLM Hierarchical Planning

Large language models decompose complex long-horizon navigation missions into ordered basic instructions with adaptive replanning and execution logic.

02

Open-Vocabulary Detection

Detect any object by natural language — no pre-defined categories required.

03

Laplacian Filtering

Visual stabilization that screens out motion-blurred frames during dynamic locomotion.
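
As a rough illustration of the idea, the sketch below scores each frame by the variance of its discrete Laplacian (low variance suggests blur) and reuses the most recent sharp frame whenever the current one falls below a threshold. The function names, the threshold value, and the replace-with-last-sharp-frame policy are assumptions made for this example, not LOVON's published implementation.

```python
import numpy as np

def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour discrete Laplacian over interior pixels."""
    g = gray.astype(np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def stabilize(frames, threshold: float):
    """Return one frame per input frame, swapping blurry frames for the
    most recent sharp one (hypothetical policy, assumed for illustration)."""
    out = []
    last_sharp = None
    for frame in frames:
        if laplacian_variance(frame) >= threshold:
            last_sharp = frame
        # Until a sharp frame has been seen, pass the current frame through.
        out.append(last_sharp if last_sharp is not None else frame)
    return out

# Usage: a checkerboard has strong edges, a flat image has none.
sharp = (np.indices((8, 8)).sum(axis=0) % 2 * 255).astype(np.uint8)
flat = np.full((8, 8), 128, dtype=np.uint8)
cleaned = stabilize([sharp, flat], threshold=100.0)
```

In practice the threshold would need tuning per camera and resolution, since the raw variance scales with image contrast.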

04

Multi-Embodiment Plug-and-Play

Seamlessly deployable on Go2 and B2 (quadrupeds) and H1-2 (humanoid): same framework, different morphologies. Training takes only 1.5 hours, and the model is compact.

SR 1.00 in most environments
1.5 h training time
3 robot platforms
240x faster training
04
System Pipeline
LLM task planner decomposes tasks → detection model processes filtered video → Language-to-Motion Model generates control vectors.
LOVON Pipeline
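
The last stage of the pipeline above can be pictured as a function from a detection to a control vector. The sketch below is only a stand-in: a hand-written proportional rule replaces the learned Language-to-Motion Model, and the `Detection` and `Control` types, the gains, and the stop condition are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Detection:
    label: str
    cx: float      # normalized horizontal box center in [0, 1]
    width: float   # normalized box width in [0, 1]

@dataclass
class Control:
    vx: float  # forward speed, m/s
    wz: float  # yaw rate, rad/s (positive turns left)

def to_control(det: Optional[Detection], speed: float,
               stop_width: float = 0.6, yaw_gain: float = 1.5) -> Control:
    """Hypothetical rule-based stand-in for a learned motion model."""
    if det is None:
        # No target in view: rotate in place to search.
        return Control(vx=0.0, wz=0.5)
    # Steer so the target moves toward the image center.
    wz = yaw_gain * (0.5 - det.cx)
    # Stop advancing once the target fills enough of the frame.
    vx = 0.0 if det.width >= stop_width else speed
    return Control(vx=vx, wz=wz)

# Usage: a chair seen to the right of center, still far away.
cmd = to_control(Detection("chair", cx=0.75, width=0.2), speed=0.5)
```

A learned model can take the instruction text into account as well; this rule ignores it and only shows the shape of the data flowing between stages.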
05
Simulation Results
LOVON achieves a success rate (SR) of 1.00 in most environments, outperforming EVT and matching the state-of-the-art TrackVLA while training 240x faster (1.5 h vs. 360 h).
Results Table
06
Demo Gallery
Scroll through real-world navigation demos across diverse targets and robots.
Go2 → Backpack
B2 → Person
Go2 → Office
Go2 → Fridge
Go2 → Bench
Go2 → Bike
Go2 → Car
Go2 → Ball
H1-2 → Chair
H1-2 → Person
Go2 → Dog
Go2 → Stairs
Go2 → Plants
Go2 → Chair Kick
07
Capabilities
Multi-embodiment, open-world environments, long-horizon tasks, and robustness to visual disturbances.

Multi-Embodiment

LOVON operates across the H1-2 humanoid and the Go2 and B2 quadrupeds: same framework, seamless deployment.

Open-World Seeking

Indoor offices, labs, and stairs; outdoor parking lots, playgrounds, and wild grass, with targets detected in real time.

Long-Horizon Task

Multi-target navigation: "Run to the backpack, then to the chair at 0.5 m/s, then approach the person quickly."
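
To make "ordered basic instructions" concrete, the sketch below splits a mission like the one above into a list of target-and-speed steps. It uses a simple regular-expression splitter purely as a stand-in for the LLM planner; the `BasicInstruction` type and `decompose` function are hypothetical names, and a real planner would handle far more varied phrasing.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class BasicInstruction:
    target: str
    speed_mps: Optional[float]  # None when no numeric speed is given

def decompose(mission: str) -> List[BasicInstruction]:
    """Rule-based stand-in for an LLM planner (illustrative only)."""
    steps = []
    # Split the mission into clauses at each "then".
    clauses = re.split(r",?\s*\bthen\b\s*", mission.strip().rstrip("."))
    for clause in clauses:
        target = re.search(r"\bthe (\w+)", clause)
        speed = re.search(r"(\d+(?:\.\d+)?)\s*m/s", clause)
        if target:
            steps.append(BasicInstruction(
                target=target.group(1),
                speed_mps=float(speed.group(1)) if speed else None,
            ))
    return steps

plan = decompose(
    "Run to the backpack, then to the chair at 0.5 m/s, "
    "then approach the person quickly."
)
for step in plan:
    print(step)
```

Each step can then be executed in order, with the next one started only after the current target has been reached.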

Recapture Lost Target

An umbrella blocks the view — LOVON recovers from the occlusion and continues approaching the target.
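
One plausible way to structure this kind of recovery is a small state machine that tolerates a few missed detections before falling back to searching. The sketch below is an assumption-laden illustration: the `TargetTracker` class, the `patience` parameter, and the turn-toward-last-seen-side heuristic are hypothetical and are not taken from the LOVON paper.

```python
from enum import Enum
from typing import Optional

class Mode(Enum):
    APPROACH = "approach"
    SEARCH = "search"

class TargetTracker:
    """Hypothetical lost-target logic: keep approaching through short
    occlusions, switch to searching after `patience` missed frames."""

    def __init__(self, patience: int = 5):
        self.patience = patience
        self.missed = 0
        self.mode = Mode.SEARCH
        self.last_cx: Optional[float] = None

    def update(self, cx: Optional[float]) -> Mode:
        """cx is the normalized horizontal target position, or None
        when the detector reports no target in this frame."""
        if cx is not None:
            self.missed = 0
            self.last_cx = cx
            self.mode = Mode.APPROACH
        elif self.mode is Mode.APPROACH:
            self.missed += 1
            if self.missed > self.patience:
                self.mode = Mode.SEARCH
        return self.mode

    def search_direction(self) -> int:
        """+1 to turn left, -1 to turn right, toward the last sighting."""
        if self.last_cx is not None and self.last_cx > 0.5:
            return -1
        return 1

tracker = TargetTracker(patience=2)
modes = [tracker.update(cx) for cx in (0.8, None, None, None, 0.3)]
```

The grace period keeps a brief occlusion from interrupting the approach, while a longer loss triggers a search that starts on the side where the target was last seen.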

Dynamic Tracking

Tracking a moving person in wild grass: the robot maintains a safe distance and real-time detection.

Challenging Terrain

Navigating spiral staircases and uneven surfaces while maintaining real-time target detection.

08
Citation
If you find LOVON useful for your research, please cite our paper.
@article{daojie2025lovon,
  title={LOVON: Legged Open-Vocabulary Object Navigator},
  author={Peng, Daojie and Cao, Jiahang and Zhang, Qiang and Ma, Jun},
  journal={arXiv preprint arXiv:2507.06747},
  year={2025}
}