AI RESEARCH
BlueFin: Benchmarking LLM Agents on Financial Spreadsheets
arXiv cs.AI
arXiv:2605.30907v1 Announce Type: cross
We present BlueFin, a benchmark that evaluates large language model (LLM) agents on synthesis, manipulation, and comprehension tasks over spreadsheet workbooks in the professional finance domain.