Reddit r/MachineLearning · 24 May 2026

How do ML practitioners select hyperparameters, architectures, etc for self-supervised representation learning when the loss is non-monotonic? [D]

In three lines: A researcher asks how practitioners select hyperparameters for non-contrastive self-supervised learning (BYOL, JEPA, data2vec). Non-monotonic losses make it hard to judge what the model has actually learned, and criteria such as RankMe (based on the effective rank of the embeddings) stop being useful as selection signals once they are built into the loss function.
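
For readers unfamiliar with the RankMe criterion mentioned above, here is a minimal sketch of the usual effective-rank computation: the exponential of the entropy of the normalized singular values of the embedding matrix. The function name `rankme`, the `eps` default, and the synthetic data are illustrative assumptions, not taken from the Reddit post.

```python
import numpy as np


def rankme(embeddings: np.ndarray, eps: float = 1e-7) -> float:
    """Effective rank of an (n_samples, dim) embedding matrix.

    Hypothetical helper: exp of the Shannon entropy of the singular
    values after normalizing them to sum to one.
    """
    singular_values = np.linalg.svd(embeddings, compute_uv=False)
    # eps keeps log() finite when some singular values are (near) zero.
    p = singular_values / singular_values.sum() + eps
    return float(np.exp(-np.sum(p * np.log(p))))


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
full_rank = rng.normal(size=(512, 32))  # spread across all 32 dimensions
collapsed = np.outer(rng.normal(size=512), rng.normal(size=32))  # rank 1

score_full = rankme(full_rank)
score_collapsed = rankme(collapsed)
```

A score near the embedding dimension suggests the representation uses most of its dimensions, while a score near 1 indicates collapse. That is why the score can serve as a label-free selection signal, and also why, as the post argues, it loses that role once the training loss optimizes it directly.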

Summary generated by Claude — human-verified
