VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio

arXiv:2512.10120v2 Announce Type: replace-cross General-purpose audio representations aim to map acoustically variable instances of the same event to nearby points, resolving content identity in a zero-shot setting. Unlike supervised classification benchmarks that measure adaptability via parameter updates, we