Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

arXiv:2605.29861v1

Large Language Models (LLMs) have advanced autonomous agents from deep search, which retrieves concise factual answers, to deep research, which synthesizes scattered evidence into long-form reports. However, verifiable multimodal deep research remains challenging for two reasons: the synthesis is open-ended and has no deterministic ground truth to check against, and the report must interleave textual arguments with visual evidence. We propose Ptah, a multi-agent harness for interleaved report generation.