Embeddings for NVIDIA's Nemotron Personas

r/LocalLLaMA

I extracted embedding vectors for the nvidia/Nemotron-Personas dataset. It's an incredible resource consisting of millions of synthetic personas with detailed backgrounds (names, ages, occupations, hobbies, and more), but finding specific personas or clustering them is difficult. To solve this, I used Qwen 0.6B to compute embeddings. Although a 0.6B model is lightweight, it works well for running semantic searches or finding K-Nearest Neighbors to build out persona groups. Precomputed embedding vectors are available for Korea, Japan, France, and USA. Please check out the web page.
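For anyone wondering what the K-Nearest Neighbors step looks like in practice, here is a minimal sketch using cosine similarity in NumPy. The post doesn't say how the precomputed vectors are stored, so the tiny 3-dimensional `toy_embeddings` array is only a stand-in for the real data, and `normalize_rows` and `top_k_neighbors` are hypothetical helper names, not part of any released code.

```python
import numpy as np


def normalize_rows(vectors):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


def top_k_neighbors(query, embeddings, k=3):
    """Return indices and cosine scores of the k rows most similar to query."""
    unit_embeddings = normalize_rows(embeddings)
    unit_query = query / max(np.linalg.norm(query), 1e-12)
    scores = unit_embeddings @ unit_query
    k = min(k, len(scores))
    order = np.argsort(-scores)[:k]  # highest similarity first
    return order, scores[order]


# Toy stand-in for persona embeddings (assumption: real vectors are much wider).
toy_embeddings = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
toy_query = np.array([1.0, 0.05, 0.0])

neighbor_ids, neighbor_scores = top_k_neighbors(toy_query, toy_embeddings, k=2)
print(neighbor_ids, neighbor_scores)
```

For semantic search, the query vector would come from embedding a text description with the same model used for the personas. With millions of rows, a brute-force matrix product like this may still be workable per country, but an approximate nearest neighbor index is the usual next step.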