What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference

arXiv:2605.23158v1 Announce Type: cross The deployment of large language models (LLMs) on resource-constrained devices remains challenging, spurring interest in split inference, where models are partitioned between client and server to reduce computational burden and enhance privacy by transmitting only intermediate activations. However, the privacy-preserving capabilities of split inference, particularly in the context of LLMs, have not been exhaustively investigated. To fill this gap, we