AI RESEARCH

TRACER: A Semantic-Aware Framework for Fine-Grained Contamination Detection in Code LLMs

arXiv CS.AI

arXiv:2605.24079v1 Announce Type: cross

Data contamination is a known threat to the reliability of model evaluation. However, it remains underexplored in code large language models (LLMs), where contamination often goes beyond exact duplication. We present TRACER, a semantic-aware framework for fine-grained code contamination detection. TRACER models contamination using three levels of semantic overlap (Functionally Identical, Nearly Identical, and Shared Logic) and detects them through a coarse-to-fine pipeline. We also