AI RESEARCH

BlueFin: Benchmarking LLM Agents on Financial Spreadsheets

arXiv cs.AI

arXiv:2605.30907v1 Announce Type: cross We present BlueFin, a benchmark that evaluates large language model (LLM) agents on synthesis, manipulation, and comprehension tasks over spreadsheet workbooks in the professional finance domain.