MusTBENCH: Benchmarking and Advancing Temporal Grounding in Music LLMs

arXiv:2605.29300v1 Announce Type: cross Recent Large Audio-Language Models (LALMs) have demonstrated promising abilities in understanding musical content. However, whether their responses are grounded in the correct temporal regions of the audio remains underexplored. This limitation is particularly critical for music understanding, where key information often occurs as temporally localized events, such as instrument entries and rhythmic transitions. To address this gap, we