Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

arXiv:2605.20193v1 Announce Type: cross Quantized Large Language Models (LLMs) are often used in qualitative analysis because they run fast and need fewer computing resources. This study examines how low-bit quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) on qualitative analysis. The study uses expert and non-expert responses from 82 interview transcripts. Low-bit models often produce more hallucinations and unstable results, especially when processing non-expert language with unclear terms.