Research 2026-07-01

Scenario Generation for Testing of Autonomous Driving Systems Using Real-World Failure Records

Originally published by arXiv CS.AI

arXiv:2606.31131v1. Abstract: To ensure safe on-road behavior, pre-deployment testing and failure discovery of Autonomous Driving Systems (ADS) is crucial. Present day simulation based testing methods focus largely on mathematical models for efficient search of optimal scenarios,...

What Happened

A new research paper on arXiv proposes a method for generating test scenarios for autonomous driving systems (ADS) by leveraging real-world failure records rather than purely synthetic or mathematically generated cases. The approach addresses a fundamental bottleneck in ADS validation: how to efficiently discover edge cases that could cause system failures without relying on unrealistic or overly abstract scenario models. By using actual failure data from on-road testing or deployed fleets, the method aims to produce more relevant and high-risk scenarios for simulation-based testing.

Why It Matters

The core challenge in autonomous vehicle safety is that critical failures are rare events, often described as the "long tail" of driving conditions. Current simulation testing methods typically rely on mathematical optimization or random search to find scenarios that stress the ADS. These approaches can generate many scenarios, but they may miss the specific, nuanced conditions that lead to real-world failures. The proposed method shifts the focus from generating scenarios from scratch to transforming and augmenting known failure records. This is significant because:

Relevance over volume: A single scenario derived from a real crash or near-miss can be more informative than thousands of synthetic cases that lack grounding in actual driving conditions.
Data efficiency: Instead of exploring an effectively unbounded scenario space, the method narrows the search to regions where failures have already occurred, which should reduce computational cost.
Bridging simulation and reality: By starting with real-world failure records, the generated scenarios are more likely to preserve the physical and perceptual complexities (e.g., lighting, weather, road geometry) that cause ADS misbehavior.
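The idea of searching near known failures can be illustrated with a minimal sketch. This is not the paper's algorithm (the abstract above is truncated and does not describe it); it only shows one simple way to sample scenario variants in a neighborhood of a recorded failure. The FailureRecord fields, the SPREAD widths, and the example values are all hypothetical.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FailureRecord:
    """Hypothetical, simplified failure record; real logs carry far more detail."""
    ego_speed_mps: float       # ego vehicle speed when the failure began
    lead_gap_m: float          # distance to the vehicle ahead
    lead_decel_mps2: float     # braking rate of the vehicle ahead
    sun_elevation_deg: float   # rough proxy for lighting conditions


# Assumed perturbation width per field, as a fraction of the seed value.
SPREAD = {
    "ego_speed_mps": 0.10,
    "lead_gap_m": 0.15,
    "lead_decel_mps2": 0.20,
    "sun_elevation_deg": 0.25,
}


def generate_variants(seed_record, n, rng):
    """Sample n scenarios uniformly in a neighborhood of one recorded failure."""
    variants = []
    for _ in range(n):
        changes = {
            name: getattr(seed_record, name) * (1.0 + rng.uniform(-width, width))
            for name, width in SPREAD.items()
        }
        variants.append(replace(seed_record, **changes))
    return variants


# Illustrative values only, not taken from any real failure log.
record = FailureRecord(
    ego_speed_mps=22.0, lead_gap_m=18.0, lead_decel_mps2=6.5, sun_elevation_deg=8.0
)
variants = generate_variants(record, n=5, rng=random.Random(0))
for variant in variants:
    print(variant)
```

Uniform sampling is the simplest possible choice here; a real system would likely steer the perturbations toward variants that still trigger the failure in simulation.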

For the autonomous driving industry, this could accelerate the validation cycle. Fleet operators such as Waymo log millions of miles, but manually extracting and replaying failure scenarios from those logs is labor-intensive. Automating the generation of high-fidelity, failure-derived test cases could help identify system weaknesses before deployment.

Implications for AI Practitioners

Data-centric AI for safety: This work reinforces the shift from model-centric to data-centric AI. Practitioners should invest in curating and labeling failure records from their own deployments or public datasets (e.g., nuScenes, Waymo Open Dataset) as a foundation for scenario generation.

Simulation fidelity requirements: The method’s success depends on how accurately the simulation environment can reproduce the conditions of the original failure record. Practitioners need to ensure their simulators (e.g., CARLA, LGSVL) can faithfully recreate sensor noise, lighting variations, and dynamic agent behaviors.

Integration with existing testing pipelines: The generated scenarios must be compatible with current regression testing and continuous integration workflows. This means outputting scenarios in standard formats (e.g., OpenSCENARIO) and automating their injection into simulation runs.
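A rough sketch of what automated injection into a regression run might look like follows. The JSON layout is a hypothetical placeholder, not OpenSCENARIO (a real pipeline would emit the standard format instead), and run_in_simulator is a stand-in for a real simulator call with a toy pass rule chosen purely for illustration.

```python
import json

MAX_BRAKING_MPS2 = 8.0  # assumed maximum braking rate for the toy pass rule


def run_in_simulator(scenario):
    """Stand-in for a real simulator call; returns True if the ADS passed.

    Toy rule (an assumption): the ego must be able to stop within the
    available gap at the assumed maximum braking rate.
    """
    stopping_distance = scenario["ego_speed_mps"] ** 2 / (2 * MAX_BRAKING_MPS2)
    return stopping_distance <= scenario["lead_gap_m"]


def run_regression(suite_json):
    """Replay every scenario in a serialized suite and collect failing IDs."""
    suite = json.loads(suite_json)
    return [s["id"] for s in suite["scenarios"] if not run_in_simulator(s)]


# Hypothetical suite of failure-derived scenarios.
suite_json = json.dumps({
    "source": "failure-derived",
    "scenarios": [
        {"id": "cut-in-001", "ego_speed_mps": 15.0, "lead_gap_m": 20.0},
        {"id": "cut-in-002", "ego_speed_mps": 25.0, "lead_gap_m": 20.0},
    ],
})

failing = run_regression(suite_json)
print(failing)
```

In a continuous integration job, a non-empty failing list would typically be turned into a non-zero exit code so that the build is flagged.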

Coverage metrics: Practitioners should consider developing new coverage metrics that measure how well the generated scenarios cover the failure modes observed in real-world data, rather than relying solely on traditional metrics like code coverage or scenario diversity.
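One simple form such a metric could take is the fraction of observed failure modes that at least one generated scenario exercises. The sketch below assumes each scenario carries a failure_mode label; the label names and the function failure_mode_coverage are hypothetical, and the source does not define a specific metric.

```python
def failure_mode_coverage(observed_modes, generated_scenarios):
    """Return (coverage fraction, set of observed modes with no scenario)."""
    observed = set(observed_modes)
    if not observed:
        # Nothing to cover, so coverage is trivially complete.
        return 1.0, set()
    covered = {s["failure_mode"] for s in generated_scenarios} & observed
    return len(covered) / len(observed), observed - covered


# Hypothetical failure modes seen in real-world records.
observed_modes = [
    "late_braking",
    "missed_pedestrian",
    "unprotected_left_turn",
    "glare_lane_loss",
]

# Hypothetical generated scenarios, each tagged with the mode it targets.
generated = [
    {"id": "s1", "failure_mode": "late_braking"},
    {"id": "s2", "failure_mode": "late_braking"},
    {"id": "s3", "failure_mode": "glare_lane_loss"},
]

coverage, missing = failure_mode_coverage(observed_modes, generated)
print(f"coverage = {coverage:.2f}")
print(sorted(missing))
```

The set of uncovered modes is often more actionable than the single number, since it points directly at the failure records that still need derived scenarios.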

Key Takeaways

Using real-world failure records as seeds for scenario generation produces more relevant and safety-critical test cases than purely synthetic methods.
This approach reduces the search space for edge cases, making ADS validation more computationally efficient and grounded in reality.
AI practitioners must prioritize high-fidelity simulation environments and robust data curation pipelines to fully leverage failure-derived scenarios.
The methodology highlights a broader trend in AI safety: moving from brute-force exploration to targeted, data-informed testing.

