AI RESEARCH
ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference
arXiv cs.AI
arXiv:2510.02361v2 Announce Type: replace-cross
Transformer-based large models excel in natural language processing and computer vision, but they suffer from severe computational inefficiency because the cost of self-attention grows quadratically with the number of input tokens. Recently, researchers have proposed a series of methods based on block selection and compression to alleviate this problem, but these methods suffer either from semantic incompleteness or from poor
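To make the quadratic-cost argument concrete, here is a rough cost model, offered as a sketch under stated assumptions and not as ChunkLLM's method. It counts query-key dot products for full causal self-attention and for a generic, hypothetical block-selection scheme. In that scheme, each query scores one summary vector per visible key block and then attends only to the keys in its `top_k` highest-scoring blocks of size `block`. The function names and parameters are illustrative assumptions.

```python
def full_causal_pairs(n: int) -> int:
    """Query-key dot products in full causal self-attention over n tokens."""
    # Token i attends to tokens 0..i, i.e. i + 1 keys, so the total is n(n+1)/2.
    return n * (n + 1) // 2


def block_selected_pairs(n: int, block: int, top_k: int) -> int:
    """Dot products under a hypothetical generic block-selection scheme.

    Assumption: each query first scores one summary vector per visible block,
    then attends only to the keys inside its top_k selected blocks.
    """
    total = 0
    for i in range(n):
        visible_blocks = i // block + 1          # blocks covering tokens 0..i
        total += visible_blocks                  # cost of scoring the block summaries
        chosen = min(top_k, visible_blocks)      # cannot select more blocks than exist
        total += min(chosen * block, i + 1)      # attend to at most chosen * block keys
    return total


if __name__ == "__main__":
    for n in (1024, 2048, 4096):
        print(n, full_causal_pairs(n), block_selected_pairs(n, block=64, top_k=8))
```

Doubling `n` roughly quadruples the full-attention count. Under the selection scheme, the per-query attention budget is capped at `top_k * block` keys. Only the cheap block-scoring term still grows with context length. That scoring overhead is one reason practical designs must balance selection quality against efficiency.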