
Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go

arXiv cs.LG

Training data imbalance poses a major challenge for code LLMs. Most available training data heavily overrepresents raw open-source code while underrepresenting broader software engineering tasks, especially in low-resource languages such as Golang. As a result, models excel at code autocompletion but struggle with real-world developer workflows such as unit test generation.