InFerActive: Interactive Tree-Based Exploration of LLM Sampling for Safety Evaluation

arXiv:2512.10234v2 Announce Type: replace-cross Even LLMs that appear safe during evaluation can still produce harmful responses in deployment. Because stochastic sampling yields different responses to the same prompt, low-probability harmful outputs can still reach users at scale. Common human evaluation workflows generate many random samples per prompt and review them in static spreadsheets. This practice scales poorly, forcing evaluators to repeatedly reread near-duplicate prefixes.
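
The claim that low-probability harmful outputs still reach users at scale can be illustrated with a short calculation. The sketch below assumes each sample is drawn independently and is harmful with a fixed probability p, so the chance of at least one harmful response in n samples is 1 - (1 - p)^n. The function name and the numbers are hypothetical and are not taken from the paper.

```python
def prob_at_least_one(p: float, n: int) -> float:
    """Probability that at least one of n independent samples is harmful,
    assuming each sample is harmful with the same probability p."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Hypothetical per-sample harm rate, chosen only for illustration.
    p = 1e-4
    for n in (1, 1_000, 100_000):
        print(f"n={n:>7}: P(at least one harmful) = {prob_at_least_one(p, n):.4f}")
```

Under these assumptions, a harm rate that is almost never observed in a handful of evaluation samples becomes close to certain once the same prompt is sampled often enough in deployment.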