AI RESEARCH

Disentangling Adversarial Prompts: A Semantic-Graph Defense for Robust LLM Security

arXiv CS.AI

arXiv:2605.27823v1 Announce Type: cross Large Language Models (LLMs) are increasingly vulnerable to adversarial prompts that exploit semantic ambiguities to bypass safety mechanisms, resulting in harmful or inappropriate outputs. Such attacks, including jailbreaking and prompt injection, pose significant risks to the integrity and availability of LLMs in security-critical applications.