Reddit r/MachineLearning

How to fine-tune an LLM for open-ended problems? [P]

Signal: 35
Hype: 15
In three lines: A researcher asks how to fine-tune an LLM for open-ended math problems (proofs). The poster finds standard SFT and RLHF inadequate and seeks an appropriate method using the MathNet dataset.
Fine-tuning, Reinforcement learning, Reasoning

Summary generated by Claude — human-verified