When prompt perturbations break your A/B test: A valid statistical test for generative surveying

arXiv:2605.27463v1 Announce Type: cross Generative surveying -- where collections of LLM-based personas provide feedback on messages -- has emerged as a cheap and scalable alternative to traditional market research. However, LLMs are sensitive to small variations in prompt design and