AI RESEARCH

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

arXiv CS.CL

arXiv:2606.02404v1 Announce Type: new Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We