losing my mind fine-tuning jina-v5 for a legal corpus
r/LocalLLaMA
For the last month I've been trying to fine-tune jina-v5 (which performed best on my corpus out of the box) on chunks of Slovak law. No matter what I do, I can't get the model to learn the nuances of Slovak syntax. Here's the biggest trap chunk that keeps confusing the model, with my translation:

Query: "krádež cigariet" (theft of cigarettes)

Podľa § 60 ods. 1 písm. a/ Tr. zák. súd obvinenému ukladá trest prepadnutia vecí a to: 1000 ks cigariet zn. Marlboro gold, 400 ks cigariet zn. Rothmans modré, 1000 ks cigariet zn. Rothmans červené, 400 ks cigariet zn.