Reddit r/LocalLLaMA

losing my mind fine-tuning jina-v5 for a legal corpus

Signal: 35 | Hype: 15
In three lines: The poster has spent a month fine-tuning Jina-v5 on a Slovak legal corpus without success. The model fails to capture Slovak syntactic nuances, especially in ambiguous cases ("krádež" vs "prepadnutie"). Several approaches were tried (LLM-generated queries, similar-chunk injection, logit mining with Qwen 3.5-397B), but the fine-tunes consistently underperform the base model.
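The summary does not say how "underperform" was measured. One common way to make a base-versus-fine-tune comparison concrete is recall@k over a fixed set of query/chunk pairs. The sketch below is a minimal illustration under that assumption: `cosine` and `recall_at_k` are hypothetical helpers, and the toy vectors are made up for the example rather than taken from the post.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recall_at_k(query_vecs, chunk_vecs, relevant, k):
    """Fraction of queries whose relevant chunk index appears in the top-k chunks by cosine similarity."""
    hits = 0
    for q, rel in zip(query_vecs, relevant):
        ranked = sorted(
            range(len(chunk_vecs)),
            key=lambda i: cosine(q, chunk_vecs[i]),
            reverse=True,
        )
        if rel in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)

# Toy embeddings (assumed, for illustration only): three chunks, two queries.
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.1, 0.9]]
relevant = [0, 1]  # index of the correct chunk for each query

score = recall_at_k(queries, chunks, relevant, k=1)
```

In practice the same held-out queries would be embedded once with the base model and once with each fine-tune, and the resulting recall@k values compared side by side.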
Tags: Embeddings, Fine-tuning, RAG, Evals

Summary generated by Claude — human-verified