Reddit r/MachineLearning

How do ML practitioners select hyperparameters, architectures, etc for self-supervised representation learning when the loss is non-monotonic? [D]

In three lines: A researcher questions hyperparameter selection practices for non-contrastive self-supervised learning (BYOL, JEPA, data2vec). They highlight that non-monotonic losses make it hard to evaluate what is actually learned, and that criteria like RankMe (based on the effective rank of the embeddings) become ineffective once integrated into the loss function.
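For readers unfamiliar with the criterion, here is a minimal sketch of an effective-rank score in the spirit of RankMe: the exponential of the entropy of the normalized singular values of an embedding matrix. The function name `rankme`, the `eps` value, and the toy data are illustrative assumptions, not code from the original post.

```python
import numpy as np


def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank of an (n_samples, dim) embedding matrix.

    Illustrative sketch: exp of the Shannon entropy of the singular
    values after L1 normalization. `eps` is an assumed stabilizer.
    """
    # Singular values only; the singular vectors are not needed.
    s = np.linalg.svd(embeddings, compute_uv=False)
    # Normalize to a distribution and keep log() away from zero.
    p = s / (s.sum() + eps) + eps
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data (hypothetical): well-spread vs. collapsed embeddings.
    spread = rng.normal(size=(512, 16))
    collapsed = np.outer(rng.normal(size=512), rng.normal(size=16))
    print("spread:", rankme(spread))
    print("collapsed:", rankme(collapsed))
```

A score near the embedding dimension suggests the representation uses most of its directions, while a score near 1 suggests collapse. This is also why the score loses diagnostic value once it is optimized directly: a model can raise it without learning anything more useful.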
RAG, Fine-tuning, Evals

Summary generated by Claude — human-verified