RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation
Abstract
Roadside Collaborative Perception refers to a system where multiple roadside units collaborate to pool their perceptual data, assisting vehicles in enhancing their environmental awareness. Existing roadside perception methods concentrate on model design but overlook data issues like calibration errors, sparse information, and multi-view consistency, leading to poor performance on recent published datasets. To significantly enhance roadside collaborative perception and address critical data issues, we present the first simulation framework RoCo-Sim for road-side collaborative perception. RoCo-Sim is capable of generating diverse, multi-view consistent simulated roadside data through dynamic foreground editing and full-scene style transfer of a single image. RoCo-Sim consists of four components: (1) Camera Extrinsic Optimization ensures accurate 3D to 2D projection for roadside cameras; (2) A novel Multi-View Occlusion-Aware Sampler (MOAS) determines the placement of diverse digital assets within 3D space; (3) DepthSAM innovatively models foreground-background relationships from single-frame fixed-view images, ensuring multi-view consistency of foreground; and (4) Scalable Post-Processing Toolkit generates more realistic and enriched scenes through style transfer and other enhancements. RoCo-Sim significantly improves roadside 3D object detection, outperforming SOTA methods by 83.74 on Rcooper-Intersection and 83.12 on TUMTraf-V2X for AP70. RoCo-Sim fills a critical gap in roadside perception simulation. Code and pre-trained models will be released soon: https://github.com/duyuwen-duen/RoCo-Sim
Community
This work unleashes the power of simulation data to enhance roadside collaborative perception, demonstrating that simulation data, rather than model architecture, is the true winner in performance competition. The proposed RoCo-Sim is the first simulation framework specifically designed for roadside collaborative perception. RoCo-Sim generates a large scale of multi-view consistent simulation data from sparse fixed viewpoints, supports both foreground editing and full-scene style transformation, and can
be rapidly deployed to any new road environment. We hope this work will significantly accelerate the practical development of roadside collaborative perception, ultimately enhancing driving safety. Code and pre-trained models will be released soon.
Paper: https://arxiv.org/pdf/2503.10410
Github: https://github.com/duyuwen-duen/RoCo-Sim
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Doracamom: Joint 3D Detection and Occupancy Prediction with Multi-view 4D Radars and Cameras for Omnidirectional Perception (2025)
- RS2V-L: Vehicle-Mounted LiDAR Data Generation from Roadside Sensor Observations (2025)
- PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments (2025)
- Griffin: Aerial-Ground Cooperative Detection and Tracking Dataset and Benchmark (2025)
- Dur360BEV: A Real-world 360-degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving (2025)
- MetaOcc: Surround-View 4D Radar and Camera Fusion Framework for 3D Occupancy Prediction with Dual Training Strategies (2025)
- HeightFormer: Learning Height Prediction in Voxel Features for Roadside Vision Centric 3D Object Detection via Transformer (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on HF中国镜像站 checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper