Ultra-Fusion

A Resilient Tightly-Coupled Multi-Sensor Fusion SLAM Framework under Sensor Degradation and Spatiotemporal Perturbation for Intelligent Transportation Systems

Yihong Tian1, Junjie Zhang2, Liuyang Li3, Deteng Zhang4, Yunfei Zuo1, and Jie Yin5

1 Beijing Institute of Technology  ·  2 Chongqing University  ·  3 Sichuan University  ·  4 Northwestern Polytechnical University  ·  5 Shanghai Jiao Tong University

Robust Fusion for Degraded Multi-Sensor Localization.

Unified sliding-window estimation with observability-aware initialization, factor-wise reliability scheduling, and online spatiotemporal calibration.

HKisland03. MARS-LVIG reconstruction rendered with aligned 3D Gaussian splats.

HKairport02. Large outdoor UAV sequence with stable localization and dense rendering.


Overview of Ultra-Fusion. The framework unifies heterogeneous sensors in a timestamp-ordered estimator, supports deployment on ground, aerial, legged, and vehicle platforms, and improves localization robustness under sensor degradation, spatiotemporal uncertainty, and long-term or high-speed operation.

TL;DR

Ultra-Fusion is a resilient tightly-coupled multi-sensor fusion SLAM framework for intelligent transportation systems. A Unified Sliding-Window Estimator orders asynchronous measurements by timestamp and supports WIO, VIO, LIO, and LVIO with optional wheel and/or GNSS augmentation. Observability-Aware Initialization, Factor-Wise Reliability Scheduling, and Online Spatiotemporal Calibration improve robustness under sensor degradation and calibration perturbation.
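The configuration acronyms used on this page (WIO, VIO, LIO, LVIO, VWIO, LWIO, LVWIO) appear to follow one pattern: optional L, V, and W prefixes followed by IO. The sketch below only illustrates that naming pattern. The function name and the sensor keys are hypothetical, and GNSS is omitted because the page does not give an acronym for it.

```python
def configuration_name(sensors):
    """Build the acronym for a sensor set; the IMU is required in every listed mode."""
    if "imu" not in sensors:
        raise ValueError("every configuration listed on the page is inertial")
    # Prefix letters in the order the page uses: LiDAR, visual, wheel.
    prefix = "".join(
        letter
        for name, letter in (("lidar", "L"), ("camera", "V"), ("wheel", "W"))
        if name in sensors
    )
    if not prefix:
        raise ValueError("need at least one of: lidar, camera, wheel")
    return prefix + "IO"


print(configuration_name({"imu", "wheel"}))
print(configuration_name({"imu", "lidar", "camera", "wheel"}))
```

The two calls print WIO and LVWIO, matching the labels used in the benchmark sections.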

Pipeline

Ultra-Fusion unified estimator pipeline

Mechanism. Ultra-Fusion first converts asynchronous sensor streams into timestamp-ordered factors, then decides which factors are observable, reliable, and calibration-safe enough to enter the same sliding-window optimization. The page evidence below follows this logic: if the scheduler and calibration modules work, difficult trajectories should remain closed, stable, and transferable.
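As a minimal sketch of the timestamp-ordering step, assuming each sensor stream is already sorted by its own timestamps, a lazy k-way merge yields a single ordered sequence. The `Measurement` type and `merge_by_timestamp` function are hypothetical names for illustration, not the Ultra-Fusion API.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Measurement(NamedTuple):
    stamp: float            # acquisition time in seconds
    sensor: str             # e.g. "imu", "lidar", "camera"
    payload: object = None  # raw data, unused in this sketch


def merge_by_timestamp(*streams: Iterable[Measurement]) -> Iterator[Measurement]:
    """Lazily merge per-sensor streams that are each sorted by stamp."""
    return heapq.merge(*streams, key=lambda m: m.stamp)


imu = [Measurement(0.000, "imu"), Measurement(0.010, "imu"), Measurement(0.020, "imu")]
lidar = [Measurement(0.005, "lidar"), Measurement(0.105, "lidar")]
camera = [Measurement(0.015, "camera")]

ordered = list(merge_by_timestamp(imu, lidar, camera))
print([m.sensor for m in ordered])
```

A heap-based merge avoids buffering and re-sorting whole streams, which matters when IMU data arrives at a much higher rate than LiDAR or camera frames.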

Online Point Cloud Demo

The online demo connects the paper's mapping claim to an inspectable reconstruction: optimized poses support both colored point-cloud mapping and LiDAR-guided 3D Gaussian Splatting on MARS-LVIG and FAST-LIVO2 outdoor scenes.

CBD_Building_01. FAST-LIVO2 outdoor sequence with stable localization.

SYSU_01. FAST-LIVO2 campus-scale sequence with dense mapping.

Colored Point Cloud
3D Gaussian Splatting


Benchmark on wheeled robots and autonomous driving

The primary benchmark asks whether one estimator can cover different sensor configurations without changing the optimization core. On M3DGR, Ultra-Fusion variants lead their compatible sensor groups; on M2DGR-Plus, the LVWIO configuration obtains the lowest average drift rate and RMSE among the compared methods.

M3DGR WIO 1.3 / 26.99

Avg. rank / ATE, improving over raw wheel odometry and Ground-Fusion WIO.

M3DGR VWIO 2.1 / 2.23

Best visual-compatible aggregate among evaluated visual fusion baselines.

M3DGR LWIO 3.5 / 0.17

Best LiDAR-compatible aggregate in the paper's grouped comparison.

M3DGR LVWIO 1.4 / 0.15

Best LiDAR-visual-wheel-inertial aggregate across ten M3DGR sequences.

M3DGR sequence highlights

M3DGR · Dark01. Nighttime low-light sequence with severe visual degradation. Ultra-Fusion: 0.08 m ATE (Rank 2nd). Top-3: PIN-SLAM 0.05 | Ours 0.08 | SR-LIVO 0.09.

M3DGR · Corridor01. Long-corridor LiDAR-degeneration sequence with repetitive structures and weak lateral constraints. Ultra-Fusion: 0.02 m ATE (Rank 1st). Top-3: Ours 0.02 | LTAOM 0.19 | Eq-LIO 0.20.

M2DGR-Plus campus-scale wheeled routes (ATE RMSE in m / drift rate)
Method | Anomaly | Switch | Tree | Building1 | Parking1 | Street2
FAST-LIVO2 | 0.10 / 1.36% | 1.60 / 1.86% | 2.70 / 3.73% | 1.62 / 3.51% | 0.81 / 2.90% | 0.57 / 1.56%
Ground-Fusion | 0.29 / 3.96% | 1.80 / 2.09% | Fail | 0.60 / 1.29% | 0.48 / 1.73% | 0.23 / 0.63%
Ground-Fusion++ | Fail | Fail | 3.40 / 4.70% | Fail | Fail | Fail
Ultra-Fusion (LVWIO) | 0.09 / 1.23% | 0.23 / 0.27% | 0.16 / 0.21% | 0.32 / 0.69% | 0.10 / 0.36% | 0.12 / 0.33%

Conclusion. Ultra-Fusion achieves the lowest average drift/RMSE on M2DGR-Plus: 0.59% / 0.24 m, compared with 2.32% / 1.48 m for FAST-LIVO2 and 1.71% / 0.75 m for Ground-Fusion.
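For readers unfamiliar with the metric, ATE RMSE is the root-mean-square translational error between estimated and ground-truth positions. The sketch below assumes the two trajectories are already time-associated and expressed in the same frame. The page does not state which alignment procedure the paper uses, so that step is left out, and the sample trajectories are made up for illustration.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of the position error between two pre-aligned, time-associated trajectories."""
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("need two non-empty trajectories of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative data only: the estimate drifts sideways by 0.1 m per step.
ground_truth = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
estimated = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.2, 0.0)]
print(round(ate_rmse(estimated, ground_truth), 3))
```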

Robustness under Sensor Degradation

Degradation experiments cover visual failure, LiDAR degeneracy, wheel slippage, and GNSS denial. Reliability-aware gating keeps informative constraints while suppressing inconsistent factors.
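One common way to suppress inconsistent factors is a chi-square gate on the normalized residual. The sketch below uses that generic technique as a stand-in. The page does not describe the actual scheduling rule, so the gate, the factor names, and the numbers are all assumptions for illustration.

```python
# 95% chi-square quantiles by residual dimension (standard table values).
CHI2_95 = {1: 3.841, 2: 5.991, 3: 7.815}


def mahalanobis_sq(residual, variances):
    """Squared Mahalanobis distance, assuming a diagonal measurement covariance."""
    return sum(r * r / v for r, v in zip(residual, variances))


def schedule_factors(factors):
    """Split (name, residual, variances) factors into kept and suppressed lists."""
    kept, suppressed = [], []
    for name, residual, variances in factors:
        consistent = mahalanobis_sq(residual, variances) <= CHI2_95[len(residual)]
        (kept if consistent else suppressed).append(name)
    return kept, suppressed


# Hypothetical factors: the wheel residual is large, as it would be during slippage.
factors = [
    ("lidar_plane", (0.02,), (0.0004,)),               # metres, std 0.02 m
    ("visual_reprojection", (1.5, -0.8), (1.0, 1.0)),  # pixels, std 1 px
    ("wheel_velocity", (0.9,), (0.01,)),               # m/s, std 0.1 m/s
]
kept, suppressed = schedule_factors(factors)
print("kept:", kept)
print("suppressed:", suppressed)
```

Here the wheel factor is nine standard deviations from its prediction and is dropped, while the LiDAR and visual factors stay in the window.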

Qualitative robustness across degraded localization scenarios
Why this matters

Drift does not accumulate when the scene becomes weakly observable.

The four scenarios intentionally remove different kinds of information: long-corridor geometry, GNSS reference, stable legged motion, and clean visual correspondences. Ultra-Fusion keeps the start and end regions aligned because it does not force low-confidence measurements into the estimator.

Corridor: weak geometry  ·  GNSS denial: no absolute anchor  ·  Stairs: oscillating platform  ·  Vegetation: ambiguous vision
LiDAR-degenerate Isaac Sim sequences (ATE RMSE, m)
Method | Wild01 | Wild02 | Tunnel01 | Tunnel02
R3LIVE | 308.03 | 210.97 | Fail | 1310.17
FAST-LIVO | 5.31 | 309.44 | 1.05 | 24.64
FAST-LIVO2 | 5.94 | 12.96 | 0.13 | 14.42
Ultra-Fusion (wo intensity) | 0.10 | 0.95 | 0.08 | 2.07
Ultra-Fusion (w intensity) | 0.06 | 0.70 | 0.08 | 2.21

Conclusion. In strongly degenerate geometry, reliability scheduling plus optional intensity constraints stabilize localization by suppressing ill-conditioned LiDAR factors.
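To illustrate what ill-conditioned LiDAR geometry looks like, the sketch below builds the 2D information matrix from surface normals and inspects its eigenvalue ratio. In a corridor all normals point across the passage, so motion along the passage is unconstrained. This is a textbook-style degeneracy test rather than the paper's criterion, and the `min_ratio` threshold is a hypothetical value.

```python
import math


def information_eigenvalues(normals):
    """Eigenvalues (small, large) of sum(n n^T) over 2D unit normals, in closed form."""
    a = sum(nx * nx for nx, ny in normals)
    b = sum(nx * ny for nx, ny in normals)
    d = sum(ny * ny for nx, ny in normals)
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    return mean - radius, mean + radius


def is_degenerate(normals, min_ratio=0.05):
    """Flag geometry whose weakest direction is far less constrained than its strongest."""
    small, large = information_eigenvalues(normals)
    return large == 0.0 or small / large < min_ratio


# Corridor along x: both walls have normals along +/- y only.
corridor = [(0.0, 1.0)] * 50 + [(0.0, -1.0)] * 50
# Room: walls facing all four directions.
room = [(1.0, 0.0)] * 25 + [(-1.0, 0.0)] * 25 + [(0.0, 1.0)] * 25 + [(0.0, -1.0)] * 25

print("corridor degenerate:", is_degenerate(corridor))
print("room degenerate:", is_degenerate(room))
```

When a direction is flagged this way, the estimator has to rely on other sources (inertial, wheel, visual, or intensity constraints) for that direction.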

Additional paper observations: GNSS assistance on Longtime02 reduces ATE from 17.40 m to 8.45 m, while GNSS-denial trajectories remain continuous through denied segments; wheel reliability checks reduce average ATE versus raw wheel and Ground-Fusion WIO baselines.

GNSS augmentation and GNSS-denial trajectory evidence
External anchors become optional

GNSS helps when reliable, but local fusion remains continuous when GNSS disappears.

Integrity-checked GNSS factors limit long-term drift, while LiDAR, visual, inertial, and wheel factors maintain continuity through denied segments.

Long-Term and High-Speed Operation

Long-Term Operation

Extended M3DGR trajectories evaluate whether localization drift stays bounded over long-duration wheeled routes.

Long-duration M3DGR sequences (ATE RMSE, m)
Method | Longtime01 | Longtime02
FAST-LIVO | 20.5 | 27.5
FAST-LIVO2 | 5.1 | 38.4
Ground-Fusion++ | 7.5 | 15.9
Ultra-Fusion (LVWIO) | 4.3 | 2.8

High-Speed Operation

Urban driving on KAIST stresses wheel-inertial prediction, LiDAR geometry, visual correction, and calibration consistency under high-speed motion.

KAIST · Urban 29. Peak speed exceeds 29 km/h over a route longer than 3.5 km. Ultra-Fusion maintains stable localization and accurate mapping on this sequence.

Onboard running. Localization and mapping during the Urban 29 drive.

KAIST Urban 29 trajectory overlaid on Google Maps

Google Maps overlay. The same Urban 29 run after completion, with the estimated trajectory projected onto the satellite map.

Both panels visualize the same KAIST Urban 29 sequence: the left panel replays the onboard run; the right panel shows the post-run trajectory on Google Maps.

Across the full KAIST benchmark (speeds up to 96.9 km/h), Ultra-Fusion (LVWIO) reaches an average drift of about 0.38% and remains stable on the high-speed urban sequences.

Robustness Across Platforms

Cross-platform evaluation tests whether the same estimation design transfers beyond the M3DGR ground robot: high-speed driving on KAIST, legged locomotion on GrandTour, and low-altitude UAV mapping on MARS-LVIG.

GrandTour (Legged Robot)

On the quadruped benchmark GrandTour, Ultra-Fusion (LVIO) achieves the lowest RTE on three of the four representative sequences (0.41 cm on SPX-2, 0.34 cm on SNOW-2, and 0.26 cm on EIG-1) and ranks second on ARC-2.

GrandTour · EIG-1. Hill-to-station descent with long walking and mixed planar/non-planar structures. Ultra-Fusion: 0.26 cm RTE (Rank 1st). Top-3: Ours 0.26 | Coco-LIC 0.40 | Fast-LIMO 1.10.

GrandTour · SPX-2. Jungfraujoch research-station sequence with dynamic initialization, thin structures, moving objects, and loop closure. Ultra-Fusion: 0.41 cm RTE (Rank 1st). Top-3: Ours 0.41 | Coco-LIC 0.44 | Voxel-SLAM 1.03.

GrandTour · ARC-2. Narrow passages in a rescue training center, with limited visibility and featureless walls challenging both vision and LiDAR. Ultra-Fusion: 0.90 cm RTE (Rank 2nd). Top-3: FAST-LIVO2 0.70 | Ours 0.90 | Coco-LIC 1.01.

GrandTour · SNOW-2. Fast walking on fully snow-covered terrain, with limited planar structures and small-scale loop closures. Ultra-Fusion: 0.34 cm RTE (Rank 1st). Top-3: Ours 0.34 | Coco-LIC 0.41 | Fast-LIMO 1.16.

MARS-LVIG (UAV)

On the aerial benchmark MARS-LVIG, Ultra-Fusion attains the best average rank / RMSE (1.5 / 1.47) with strong results across airport, island, town, and valley sequences under large viewpoint and altitude variation.

MARS-LVIG · HKisland03. Downward-looking island mapping at about 90 m altitude and 9 m/s flight speed, with large-scale sparse structures. Ultra-Fusion: 0.87 m ATE (Rank 1st). Top-3: Ours 0.87 | FAST-LIVO2 0.89 | Ground-Fusion++ 1.71.

MARS-LVIG · HKairport02. Open airfield aerial mapping at about 80 m altitude and 6 m/s flight speed, with weak geometric constraints. Ultra-Fusion: 0.61 m ATE (Rank 1st). Top-3: Ours 0.61 | R3LIVE 0.82 | AKF-LIO 0.87.

Conclusion. The same unified estimator transfers from wheeled ITS to legged and aerial platforms without redesigning the core fusion architecture.

Evidence Chain: Why Ultra-Fusion Is Robust

The experimental story follows the estimator design: first verify that the robustness modules are responsible for the gains, then stress the system with miscalibrated factors, and finally check real-time feasibility.

Module Ablations

Two core modules are ablated: Factor-Wise Reliability Scheduling (FRS) and Observability-Aware Initialization. The reported gains consistently indicate that the robustness improvements come from reliability-aware factor gating and adaptive bootstrap, rather than from adding sensors alone.

LiDAR FRS gain ATE -75.3%

Mean ATE drops by 0.45 m with LiDAR FRS enabled.

Visual FRS gain ATE -36.2%

Mean ATE drops by 1.60 m under visual degradation.

Wheel FRS gain ATE -41.3%

Mean ATE drops by 1.56 m with wheel consistency checks.

Adaptive initialization 0.153 s init

Mean init latency improves from 4.642 s (wo adaptive) to 0.153 s.

Initialization ablation (18 sequences with complete first-20s outputs)
Method | Mean init (s) | Median init (s) | Mean 20s ATE (m)
Ground-Fusion++ | 2.118 | 1.058 | 85.217
FAST-LIVO2 | 0.913 | 0.671 | 28.687
FAST-LIVO | 1.436 | 1.192 | 73.754
Ultra-Fusion (wo adaptive init) | 4.642 | 2.097 | 16.808
Ultra-Fusion (full) | 0.153 | 0.150 | 0.483

Conclusion. Adaptive bootstrap is necessary for both fast start-up and stable early-window localization.
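The size of the initialization gain can be checked directly from the two Ultra-Fusion rows of the table. The short calculation below uses only the numbers reported on this page; the helper name is ours.

```python
def relative_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


# Ultra-Fusion without adaptive init vs. full system, from the ablation table.
init_latency = relative_reduction(4.642, 0.153)   # mean init time, seconds
early_ate = relative_reduction(16.808, 0.483)     # mean 20 s ATE, metres

print(f"mean init latency reduced by {init_latency:.1f}%")
print(f"mean 20 s ATE reduced by {early_ate:.1f}%")
```

Both reductions come out at roughly 97%, so the adaptive bootstrap changes start-up behaviour by more than an order of magnitude on these 18 sequences.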

Robustness under Spatiotemporal Miscalibration

Predicted temporal offsets under injected LiDAR-IMU delays
Calibration as online evidence

Temporal offsets are estimated from motion consistency instead of assumed fixed.

The predicted offset distributions concentrate near the injected delays, supporting the claim that Ultra-Fusion treats timing uncertainty as part of localization rather than as a preprocessing assumption.
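The idea of recovering a time offset from motion consistency can be shown with a toy example: two sensors observe the same motion profile, one with a delay, and a grid search finds the shift that best aligns them. This is an offline stand-in under simplified assumptions (noise-free synthetic signal, uniform sampling), not the in-estimator calibration that Ultra-Fusion performs. All names and values are hypothetical.

```python
import math

DT = 0.01  # sample period in seconds (assumed)


def motion(t):
    """Synthetic angular-rate profile observed by both sensors."""
    return math.sin(t) + 0.5 * math.sin(3.0 * t)


def sample_linear(values, t):
    """Linearly interpolate values sampled every DT seconds from t = 0."""
    x = t / DT
    i = int(math.floor(x))
    frac = x - i
    return (1.0 - frac) * values[i] + frac * values[i + 1]


def estimate_offset(reference, delayed, candidates):
    """Return the candidate delay that minimises the squared mismatch."""
    margin = int(max(abs(c) for c in candidates) / DT) + 2  # keep lookups in range
    best, best_cost = None, float("inf")
    for c in candidates:
        cost = sum(
            (delayed[k] - sample_linear(reference, k * DT - c)) ** 2
            for k in range(margin, len(delayed) - margin)
        )
        if cost < best_cost:
            best, best_cost = c, cost
    return best


n = 1000
true_offset = 0.100  # injected delay in seconds
reference = [motion(k * DT) for k in range(n)]
delayed = [motion(k * DT - true_offset) for k in range(n)]
candidates = [i * 0.005 for i in range(-60, 61)]  # -0.3 s to +0.3 s

print("estimated offset (s):", estimate_offset(reference, delayed, candidates))
```

The search only works when the motion is exciting enough to make the mismatch curve sharply peaked, which is consistent with the page's framing of calibration as something to be estimated from motion.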

Injected IMU time offsets on Wild01 (RMSE, m)
Offset | FAST-LIVO | FAST-LIVO2 | Ground-Fusion | Ground-Fusion++ | UF wo OSC | UF full
0 ms | 8.6999 | 8.0406 | 3.4194 | 8.8026 | 0.0484 | 0.0319
+100 ms | 38.1436 | 10.4990 | 4.1732 | 8.9272 | 0.0718 | 0.0348
+200 ms | 77.0621 | 9.7028 | 10.0682 | 9.0952 | 0.4684 | 0.0344
-200 ms | 157.5297 | 10.8993 | 30.4176 | 8.6085 | 0.1304 | 0.0375
-300 ms | 16.7918 | 15.4956 | 3.5999 | 8.4933 | 1.0094 | 0.0403

Conclusion. Online temporal calibration keeps Ultra-Fusion stable under large injected delays: the full model's RMSE stays between 0.032 m and 0.040 m across all tested offsets, whereas the variant without online calibration degrades to 1.009 m at -300 ms.

Extrinsic rotation perturbation on HILTI22 (RMSE, m)
Rotation error | FAST-LIVO2 | Ground-Fusion++ | UF wo extr calib | UF full
0 deg | 0.15 | 2.93 | 0.12 | 0.10
3 deg | 0.13 | 2.44 | 0.34 | 0.10
7 deg | 8.81 | 1.50 | 0.43 | 0.14
8 deg | 145.11 | 2.75 | 0.44 | 0.10
10 deg | 940.37 | 3.81 | 0.75 | 0.25

Conclusion. Online extrinsic calibration markedly improves tolerance to large rotational perturbations.
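A quick geometric check shows why these perturbation levels are severe. A point at range r, rotated by an angular error θ, moves by the chord length 2 r sin(θ/2). The sketch below evaluates this for the tested rotation errors. The 20 m range is an assumed, illustrative value and is not taken from the HILTI22 data.

```python
import math


def displacement_from_rotation_error(range_m, error_deg):
    """Chord length traced by a point at range_m under a rotation of error_deg."""
    return 2.0 * range_m * math.sin(math.radians(error_deg) / 2.0)


ASSUMED_RANGE_M = 20.0  # hypothetical LiDAR return distance

for error_deg in (0, 3, 7, 8, 10):
    shift = displacement_from_rotation_error(ASSUMED_RANGE_M, error_deg)
    print(f"{error_deg:>2} deg -> {shift:.2f} m point displacement")
```

At 10 degrees the displacement is about 3.5 m at this range, far larger than typical LiDAR measurement noise, so uncorrected extrinsics of this size would corrupt scan registration unless they are estimated online.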

Runtime Analysis

Runtime is profiled on 60 s segments with Robosense, Velodyne, Livox, and Hesai LiDARs using a real-time configuration: per-frame LiDAR frontend, window size 4, at most 3 nonlinear iterations, 12 ms solver budget, and capped factors on an Intel Core i9-14900K CPU. Ultra-Fusion requires 5.48–10.73 ms per optimization step, satisfying real-time operation in this setting.
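The real-time claim can be sanity-checked against the stated 12 ms solver budget. The sketch below also compares the step time with a scan period, assuming a 10 Hz LiDAR. That rate is an assumption, since the page does not state it, and the optimization step is only one part of the total per-scan cost.

```python
SOLVER_BUDGET_MS = 12.0           # solver budget stated on the page
REPORTED_STEP_MS = (5.48, 10.73)  # reported range per optimization step
ASSUMED_SCAN_RATE_HZ = 10.0       # assumption: not stated on the page


def headroom_ms(step_ms, budget_ms=SOLVER_BUDGET_MS):
    """Time left in the solver budget after one optimization step."""
    return budget_ms - step_ms


def fits_scan_period(step_ms, scan_rate_hz=ASSUMED_SCAN_RATE_HZ):
    """True if one optimization step is shorter than one LiDAR scan period."""
    return step_ms < 1000.0 / scan_rate_hz


for step in REPORTED_STEP_MS:
    print(
        f"{step:.2f} ms step: headroom {headroom_ms(step):.2f} ms, "
        f"fits scan period: {fits_scan_period(step)}"
    )
```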


Time consumption per LiDAR scan of Ultra-Fusion with baselines on different LiDAR types.

BibTeX

@article{tian2026ultrafusion,
  author  = {Yihong Tian and Junjie Zhang and Liuyang Li and Deteng Zhang and Yunfei Zuo and Jie Yin},
  title   = {Ultra-Fusion: A Resilient Tightly-Coupled Multi-Sensor Fusion SLAM Framework under Sensor Degradation and Spatiotemporal Perturbation for Intelligent Transportation Systems},
  journal = {arXiv preprint arXiv:2606.21223},
  year    = {2026},
}