Hugging Face Dataset Lineage Explorer
r/LocalLLaMA
As Hugging Face's Machine Learning Librarian, I am probably more obsessed with metadata than most. One field in the spec for HF dataset card READMEs is source_datasets. It is very rarely used, so it's quite hard to know how different datasets relate to each other. To help with this, I did a bit of work with Claude Code to explore whether it's possible to detect which datasets are derivatives of others, e.g. translations, cleaned-up versions, etc.
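To make the idea concrete, here is a minimal sketch of what you can do when source_datasets is filled in. It is not the explorer's actual code. It assumes the card metadata has already been parsed into plain dicts, and the dataset ids, the `cards` dict and the `build_lineage` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical, already-parsed card metadata keyed by dataset id.
# All ids here are invented for the example.
cards = {
    "example-org/base-corpus": {"license": "mit"},
    "example-org/base-corpus-fr": {"source_datasets": ["example-org/base-corpus"]},
    "example-org/base-corpus-clean": {"source_datasets": "example-org/base-corpus"},
    "example-org/unrelated": {"source_datasets": ["original"]},
}


def build_lineage(cards):
    """Map each declared source dataset to the datasets that list it."""
    children = defaultdict(list)
    for dataset_id, meta in cards.items():
        sources = meta.get("source_datasets") or []
        # Accept a single string as well as a list of strings.
        if isinstance(sources, str):
            sources = [sources]
        for source in sources:
            # Assumption: only values shaped like "namespace/name" are
            # treated as dataset ids; other values such as "original"
            # are skipped.
            if "/" in source:
                children[source].append(dataset_id)
    return dict(children)


lineage = build_lineage(cards)
for parent, derived in lineage.items():
    print(parent, "->", derived)
```

This only covers the easy case where the field is present. Since it rarely is, most relationships have to be inferred some other way, which is the part the explorer is trying to tackle.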