LoCoT2V-Bench: Benchmarking Long-Form and Complex Text-to-Video Generation

arXiv:2510.26412v3

Recent advances in text-to-video generation have achieved impressive performance on short clips, yet evaluating long-form generation under complex textual inputs remains a significant challenge. To address this, we present LoCoT2V-Bench, a benchmark for long video generation (LVG) featuring multi-scene prompts with hierarchical metadata (e.g., character settings and camera behaviors), constructed from real-world videos.