Fine-Tuning Without Forgetting In-Context Learning: A Theoretical Analysis of Linear Attention Models

arXiv:2602.23197v2

Transformer-based large language models exhibit in-context learning, which lets them adapt to downstream tasks through few-shot prompting with demonstrations. In practice, such models are often fine-tuned to improve zero-shot performance on downstream tasks. This lets them solve tasks without examples and thereby reduces inference costs. However, fine-tuning can degrade in-context learning, which limits the performance of fine-tuned models on tasks not seen during fine-tuning.