Benchmarking Multimodal LLMs on Code Generation for Complex Interactive Webpages

arXiv:2606.00154v1 Announce Type: cross Multimodal large language models (MLLMs) have recently made remarkable progress in multimodal reasoning and code generation, catalyzing a new paradigm for front-end development. In particular, these models can directly transform visual designs into executable code, significantly improving the efficiency and adaptability of web development. Modern web applications are dynamic and interactive, featuring frequent user-page interactions.