Abstract

Teaser image

A major bottleneck of learning-based robotic scene understanding methods is their heavy reliance on extensive annotated training data, which often limits their generalization ability. In LiDAR panoptic segmentation, this challenge becomes even more pronounced due to the need to simultaneously address both semantic and instance segmentation from complex, high-dimensional point cloud data. In this work, we address the challenge of LiDAR panoptic segmentation with very few labeled samples by leveraging recent advances in label-efficient vision panoptic segmentation. To this end, we propose a novel method, Limited-Label LiDAR Panoptic Segmentation (L³PS), which requires only a minimal amount of labeled data. Our approach first utilizes a label-efficient 2D network to generate panoptic pseudo-labels from a small set of annotated images, which are subsequently projected onto point clouds. We then introduce a novel 3D refinement module that capitalizes on the geometric properties of point clouds. By incorporating clustering techniques, sequential scan accumulation, and ground point separation, this module significantly enhances the accuracy of the pseudo-labels, improving segmentation quality by up to +10.6 PQ and +7.9 mIoU. We demonstrate that these refined pseudo-labels can be used to effectively train off-the-shelf LiDAR segmentation networks. Through extensive experiments, we show that L³PS not only outperforms existing methods but also substantially reduces the annotation burden. We release the code of our work on GitHub.

Technical Approach

Overview of our approach

In this work, we propose a novel method for Limited-Label LiDAR Panoptic Segmentation (L³PS) that focuses on reducing annotation costs. We argue that generating panoptic annotations for 2D images is substantially less expensive than for 3D point clouds. Therefore, our L³PS approach leverages recent 2D label-efficient techniques to generate large-scale 2D pseudo-labels from a minimal set of annotated images, which are then projected onto the corresponding point clouds. A multi-step refinement process subsequently produces 3D panoptic annotations, as sketched below. The resulting pseudo-labels are used to train an off-the-shelf LiDAR panoptic segmentation network for real-time deployment.
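To make the label transfer concrete, the following minimal sketch shows one way the 2D-to-3D projection could be implemented for a single calibrated camera. The function name, the calibration conventions (LiDAR-to-camera extrinsics T_cam_lidar, intrinsics K), and the void id -1 are our own illustrative assumptions, not taken from the paper's codebase.

import numpy as np

def project_labels_to_points(points, label_img, T_cam_lidar, K):
    """Transfer 2D panoptic pseudo-labels onto LiDAR points via pinhole projection.

    points:      (N, 3) LiDAR points in the sensor frame.
    label_img:   (H, W) integer panoptic label map from the 2D network.
    T_cam_lidar: (4, 4) rigid transform from the LiDAR to the camera frame.
    K:           (3, 3) camera intrinsic matrix.
    Returns an (N,) label array; points outside the image are marked void (-1).
    """
    n = points.shape[0]
    pts_h = np.hstack([points, np.ones((n, 1))])      # homogeneous coordinates
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]        # transform into camera frame
    in_front = pts_cam[:, 2] > 0.1                    # discard points behind the camera
    z_safe = np.where(in_front, pts_cam[:, 2], 1.0)   # avoid division by ~0
    uv = (K @ pts_cam.T).T[:, :2] / z_safe[:, None]   # perspective projection
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    h, w = label_img.shape
    valid = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    labels = np.full(n, -1, dtype=np.int64)           # void by default
    labels[valid] = label_img[v[valid], u[valid]]
    return labels

Points that project outside the image plane remain void and are handled later by the 3D refinement module.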

Toy example of the refinement process

We refine the initial labels using the 3D structure of the point cloud data through a series of steps. First, we estimate the relative poses of consecutive LiDAR scans with KISS-ICP and accumulate the scans into larger point clouds, each representing a single scene, following standard practice in autonomous driving datasets. Next, we apply Patchwork++ to split each scene into ground and non-ground points, which are processed separately. We then cluster the points of both partitions with HDBSCAN and assign unclustered points to the nearest existing cluster using a k-nearest neighbors classifier. Since each cluster is expected to represent a single object, all points within a cluster should share the same label. To further improve accuracy, we account for rare semantic classes, such as construction vehicles, that are underrepresented in the initial labels: if a cluster contains a high percentage of void labels, the entire cluster is labeled void; otherwise, if a rare class appears sufficiently often, that class is assigned; in all remaining cases, the cluster receives its most frequent label. This process leverages 3D spatial information to substantially improve label quality, as sketched below.
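The following sketch illustrates the cluster-level voting described above, assuming scan accumulation (KISS-ICP) and ground separation (Patchwork++) have already been applied. The function name and the thresholds void_thresh and rare_thresh are placeholders of our own choosing, not the values used in the paper; only the hdbscan and scikit-learn APIs are real.

import numpy as np
import hdbscan
from sklearn.neighbors import KNeighborsClassifier

VOID = -1

def refine_cluster_labels(points, labels, rare_classes,
                          void_thresh=0.5, rare_thresh=0.1):
    """Majority-vote label refinement within HDBSCAN clusters.

    points:       (N, 3) accumulated (e.g., non-ground) points of one scene.
    labels:       (N,) projected pseudo-labels; VOID marks unlabeled points.
    rare_classes: set of underrepresented class ids (e.g., construction vehicles).
    The thresholds are illustrative, not the values used in the paper.
    """
    cluster_ids = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(points)

    # Attach HDBSCAN noise points (-1) to the nearest existing cluster.
    noise = cluster_ids == -1
    if noise.any() and (~noise).any():
        knn = KNeighborsClassifier(n_neighbors=1)
        knn.fit(points[~noise], cluster_ids[~noise])
        cluster_ids[noise] = knn.predict(points[noise])

    refined = labels.copy()
    for cid in np.unique(cluster_ids):
        mask = cluster_ids == cid
        cluster_labels = labels[mask]
        if np.mean(cluster_labels == VOID) > void_thresh:
            refined[mask] = VOID                 # mostly unlabeled -> void
            continue
        valid = cluster_labels[cluster_labels != VOID]
        classes, counts = np.unique(valid, return_counts=True)
        freq = counts / counts.sum()
        rare = [c for c, f in zip(classes, freq)
                if c in rare_classes and f > rare_thresh]
        if rare:                                 # prefer a sufficiently frequent rare class
            refined[mask] = rare[0]
        else:                                    # otherwise majority vote
            refined[mask] = classes[np.argmax(counts)]
    return refined

Because every point in a cluster receives the same label, a handful of correctly projected points can correct many mislabeled or unlabeled neighbors on the same object.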

Video

Code

A software implementation of this project, based on PyTorch, is available in our GitHub repository for academic use and is released under the GPLv3 license. For any commercial purpose, please contact the authors.

Publications

If you find our work useful, please consider citing our paper:

Ahmet Selim Çanakçı, Niclas Vödisch, Kürsat Petek, Wolfram Burgard, and Abhinav Valada
Label-Efficient LiDAR Panoptic Segmentation
arXiv preprint arXiv:2503.02372, 2025.

(PDF) (BibTeX)

Authors

Ahmet Selim Çanakçı

University of Freiburg

Niclas Vödisch

University of Freiburg

Kürsat Petek

University of Freiburg

Wolfram Burgard

University of Technology Nuremberg

Abhinav Valada

University of Freiburg

Acknowledgment

This work was funded by the German Research Foundation (DFG) Emmy Noether Program, grant number 468878300.