LOVON — Legged Open-Vocabulary Object Navigator

01

Abstract

Object navigation in open-world environments remains a formidable and pervasive challenge for robotic systems, particularly when it comes to executing long-horizon tasks that require both open-world object detection and high-level task planning. Traditional methods often struggle to integrate these components effectively, and this limits their capability to deal with complex, long-range navigation missions. In this paper, we propose LOVON, a novel framework that integrates large language models (LLMs) for hierarchical task planning with open-vocabulary visual detection models, tailored for effective long-range object navigation in dynamic, unstructured environments. To tackle real-world challenges including visual jittering, blind zones, and temporary target loss, we design dedicated solutions such as Laplacian Variance Filtering for visual stabilization. We also develop a functional execution logic for the robot that guarantees LOVON's capabilities in autonomous navigation, task adaptation, and robust task completion. Extensive evaluations demonstrate the successful completion of long-sequence tasks involving real-time detection, search, and navigation toward open-vocabulary dynamic targets. Furthermore, real-world experiments across different legged robots (Unitree Go2, B2, and H1-2) showcase the compatibility and appealing plug-and-play feature of LOVON.

02

Video Demonstration

Watch LOVON navigate toward open-vocabulary targets across multiple robot platforms and environments.

03

Core Capabilities

Four pillars of the LOVON framework for open-world legged robot navigation.

01

LLM Hierarchical Planning

Large language models decompose complex long-horizon navigation missions into ordered basic instructions with adaptive replanning and execution logic.

02

Open-Vocabulary Detection

Detect any object by natural language — no pre-defined categories required.

03

Laplacian Filtering

Visual stabilization that mitigates motion blur during dynamic locomotion.

04

Multi-Embodiment Plug-and-Play

Deployable on the Go2 and B2 quadrupeds and the H1-2 humanoid: same framework, different morphologies. Training takes only 1.5 hours, and the model is compact.

SR 1.00 in most environments

1.5 h training time

3 robot platforms

240x faster training
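
The Laplacian filtering pillar listed above can be illustrated with a minimal sketch. It scores each grayscale frame by the variance of its Laplacian response (low variance suggests blur) and, when a frame scores below a threshold, falls back to the most recent sharp frame. The fallback policy, the `SharpFrameFilter` name, and the threshold value are assumptions for illustration, not LOVON's actual implementation. The code is pure Python to stay self-contained; in practice the same score is commonly computed with OpenCV as `cv2.Laplacian(gray, cv2.CV_64F).var()`.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a 2D list of intensities, at least 3x3 in size.
    """
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses)


class SharpFrameFilter:
    """Hypothetical filter: pass sharp frames, replace blurry ones."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed value, tuned per camera in practice
        self.last_sharp = None

    def __call__(self, gray):
        # The first frame is accepted unconditionally because no fallback exists yet.
        if self.last_sharp is None or laplacian_variance(gray) >= self.threshold:
            self.last_sharp = gray
        return self.last_sharp


# Synthetic frames: a checkerboard has strong edges, a flat frame has none.
sharp = [[255 * ((x + y) % 2) for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]

frame_filter = SharpFrameFilter(threshold=100.0)
first = frame_filter(sharp)   # accepted as sharp
second = frame_filter(flat)   # rejected, so the previous sharp frame is reused
```

The downstream detector would then only see frames that passed the sharpness check, which is one plausible way to reduce detection jitter while the robot is moving.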

04

System Pipeline

LLM task planner decomposes tasks → detection model processes filtered video → Language-to-Motion Model generates control vectors.
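
The three stages above can be wired together as in the following sketch. Everything here is hypothetical scaffolding: the `Detection` and `Control` types, the `step` function, and the gains are illustrative names and values, and `language_to_motion` is a hand-written rule standing in for the learned Language-to-Motion Model, whose real inputs and outputs may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Detection:
    label: str
    cx: float     # normalized bounding-box centre x, in [0, 1]
    width: float  # normalized bounding-box width, in [0, 1]


@dataclass
class Control:
    vx: float  # forward velocity (m/s)
    wz: float  # yaw rate, positive meaning a left turn (assumed convention)


def language_to_motion(target: str, speed: float,
                       det: Optional[Detection]) -> Control:
    """Rule-based stand-in for the learned Language-to-Motion Model."""
    if det is None or det.label != target:
        return Control(vx=0.0, wz=0.5)          # no target: rotate to search
    wz = 2.0 * (0.5 - det.cx)                   # steer toward the box centre
    vx = 0.0 if det.width > 0.6 else speed      # stop once the target looks close
    return Control(vx=vx, wz=wz)


def step(instruction: Tuple[str, float], frame,
         frame_filter: Callable, detector: Callable) -> Control:
    """One control tick: filter the frame, detect the target, emit a command."""
    target, speed = instruction
    det = detector(frame_filter(frame), target)
    return language_to_motion(target, speed, det)
```

A planner would supply `instruction` one subtask at a time, and `detector` would wrap whichever open-vocabulary detection model is in use.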

05

Simulation Results

LOVON achieves a success rate (SR) of 1.00 in most environments, outperforming EVT and matching the state-of-the-art TrackVLA while training 240x faster (1.5 h vs. 360 h).

06

Demo Gallery

Real-world navigation demos across diverse targets and robots.


Go2 → Backpack

B2 → Person

Go2 → Office

Go2 → Fridge

Go2 → Bench

Go2 → Bike

Go2 → Car

Go2 → Ball

H1-2 → Chair

H1-2 → Person

Go2 → Dog

Go2 → Stairs

Go2 → Plants

Go2 → Chair Kick

07

Capabilities

Multi-embodiment, open-world environments, long-horizon tasks, and robustness to visual disturbances.

Multi-Embodiment

LOVON operates on the H1-2 humanoid and the Go2 and B2 quadrupeds with the same framework and seamless deployment.

Open-World Seeking

Indoor offices, labs, and stairs; outdoor parking lots, playgrounds, and wild grass. Targets are detected in real time.

Long-Horizon Task

Multi-target navigation: "Run to the backpack, then to the chair at 0.5 m/s, then approach the person quickly."
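
To make the decomposition concrete, here is a small sketch of the kind of ordered subtask list a planner might produce for a mission like the one quoted above. LOVON uses an LLM for this step; the regex parser below is only a stand-in that shows the shape of the output, and the `SubTask` type and `decompose` function are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubTask:
    target: str
    speed_mps: Optional[float]  # None when the clause gives no numeric speed


def decompose(mission: str) -> List[SubTask]:
    """Split a mission on 'then' and extract a target and speed per clause."""
    subtasks = []
    clauses = re.split(r",?\s*\bthen\s+", mission.strip().rstrip("."))
    for clause in clauses:
        target = re.search(r"(?:to|approach) the (\w+)", clause)
        if target is None:
            continue  # skip clauses this simple pattern cannot parse
        speed = re.search(r"at (\d+(?:\.\d+)?) m/s", clause)
        subtasks.append(
            SubTask(target.group(1), float(speed.group(1)) if speed else None)
        )
    return subtasks


plan = decompose(
    "Run to the backpack, then to the chair at 0.5 m/s, "
    "then approach the person quickly."
)
# plan holds three subtasks, in order: backpack, chair (0.5 m/s), person
```

Qualitative speed words such as "run" or "quickly" are left unparsed here; mapping them to velocities is exactly the kind of judgment an LLM planner is better suited to than a fixed pattern.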

Recapture Lost Target

An umbrella blocks the view — LOVON recovers from the occlusion and continues approaching the target.
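
One way to picture this recovery behaviour is as a small state machine, sketched below. The states, the `next_state` function, and the `max_lost_frames` timeout are assumptions made for illustration; the source does not spell out LOVON's execution logic at this level of detail.

```python
from enum import Enum, auto


class State(Enum):
    SEARCHING = auto()    # no target yet: look around
    APPROACHING = auto()  # target visible: move toward it
    RECOVERING = auto()   # target recently lost: try to reacquire it
    DONE = auto()         # target reached


def next_state(state: State, target_visible: bool, arrived: bool,
               lost_frames: int, max_lost_frames: int = 30) -> State:
    """Pick the next state from the current observation."""
    if state is State.DONE:
        return State.DONE
    if target_visible:
        return State.DONE if arrived else State.APPROACHING
    # Target not visible: keep trying to reacquire for a short while,
    # then fall back to a full search.
    recently_tracking = state in (State.APPROACHING, State.RECOVERING)
    if recently_tracking and lost_frames <= max_lost_frames:
        return State.RECOVERING
    return State.SEARCHING
```

In the umbrella scenario, the robot would move from APPROACHING to RECOVERING while the view is blocked and return to APPROACHING once the target is detected again.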

Dynamic Tracking

Tracking a moving person in wild grass: the robot maintains a safe distance and keeps detecting the target in real time.

Challenging Terrain

Navigating spiral staircases and uneven surfaces while maintaining real-time target detection.

08

Citation

If you find LOVON useful for your research, please cite our paper.

@article{daojie2025lovon,
  title={LOVON: Legged Open-Vocabulary Object Navigator},
  author={Peng, Daojie and Cao, Jiahang and Zhang, Qiang and Ma, Jun},
  journal={arXiv preprint arXiv:2507.06747},
  year={2025}
}