Topic

#Llama

Llama is a family of open-weight large language models developed by Meta AI, designed for research and commercial use. For example, Llama 3 can be run locally or fine-tuned on custom datasets using libraries like Hugging Face Transformers.

40 Articles
7 Sources
65 Avg. signal
arXiv cs.AI

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

arXiv study on iterative refinement of LLM-generated reward functions for sparse structured RL. Authors identify two dominant failure modes (reward flooding, semantic misunderstanding) and propose diagnostic-driven refinement guided by a failure-mode taxonomy. Results: DoorKey-8x8 success improves from 2.3% to 97.6%, KeyCorridor from 31.2% to 86.7%. Limitations: method restricted to PPO and sparse structured tasks.
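
To make "reward flooding" concrete, here is a minimal Python sketch of one possible diagnostic. It assumes that flooding means dense shaping reward swamping the sparse task reward, which the summary does not spell out. The names `shaping_share` and `diagnose` and the 0.9 threshold are hypothetical and are not taken from the paper.

```python
def shaping_share(task_rewards, shaped_rewards):
    """Fraction of an episode's total absolute reward that comes from shaping terms."""
    task = sum(abs(r) for r in task_rewards)
    shaped = sum(abs(r) for r in shaped_rewards)
    total = task + shaped
    return shaped / total if total else 0.0


def diagnose(task_rewards, shaped_rewards, threshold=0.9):
    """Flag an episode when shaping dominates the reward signal (hypothetical rule)."""
    share = shaping_share(task_rewards, shaped_rewards)
    return "reward_flooding_suspected" if share > threshold else "ok"


# A 100-step episode with a single sparse success reward at the final step.
task = [0.0] * 99 + [1.0]

# Large per-step shaping bonus: the sparse goal signal is drowned out.
print(diagnose(task, [0.5] * 100))

# Small per-step shaping bonus: the task reward still dominates.
print(diagnose(task, [0.001] * 100))
```

A label like this could then be passed back to the LLM as part of a refinement prompt. How the paper actually detects and uses its failure modes may differ.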

Reinforcement learning · Llama · Prompt engineering
SIG 72 · HYP 00
Reddit r/LocalLLaMA

Flash Attention for llama.cpp on RDNA3: 47% less KV VRAM than Vulkan f16 K, KLD almost lossless on F16 K / q4_0 V. Part 1.

Flash Attention optimization for llama.cpp on RDNA3 GPUs: 47% less KV-cache VRAM than Vulkan with f16 K. Packs four 8-bit K values per native sudot4 instruction without lossy quantization. At 128k context with MTP draft, total usage is 21.76 GiB vs 23.18 GiB (1.42 GiB savings). Quality preserved: mean KLD 0.00455 (q4_0 V), 97.06% identical top tokens.
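
The packing idea can be illustrated in plain Python. This is a sketch of the general technique (four signed 8-bit lanes in one 32-bit word, consumed by a four-lane dot product with accumulate), not the post's kernel. The helper names `pack4_i8`, `unpack4_i8`, and `dot4_i8` are hypothetical.

```python
import struct


def pack4_i8(values):
    """Pack four signed 8-bit integers into one 32-bit little-endian word."""
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    return int.from_bytes(struct.pack("<4b", *values), "little")


def unpack4_i8(word):
    """Inverse of pack4_i8: recover the four signed 8-bit lanes."""
    return list(struct.unpack("<4b", word.to_bytes(4, "little")))


def dot4_i8(a_word, b_word, acc=0):
    """Emulate a packed four-lane int8 dot product with accumulate."""
    a = unpack4_i8(a_word)
    b = unpack4_i8(b_word)
    return acc + sum(x * y for x, y in zip(a, b))


k = pack4_i8([1, -2, 3, -4])   # four K values in one word
q = pack4_i8([5, 6, -7, 8])    # four query values in one word
print(hex(k))                  # 0xfc03fe01
print(dot4_i8(k, q))           # 5 - 12 - 21 - 32 = -60
```

On hardware, the dot product runs as a single instruction per packed word pair. The Python version only shows the data layout and arithmetic.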

Llama · Code generation · Benchmarks
SIG 82 · HYP 00
Reddit r/LocalLLaMA

Speed difference between Windows 11 and Linux with llama.cpp: a myth when using medium and large MoE models

llama.cpp benchmark comparing Windows 11 and Linux (Ubuntu 26.04) on Nvidia GPUs (RTX 5080 + 2× RTX 5060 Ti). No significant performance difference in prompt processing (PP) or token generation (TG): Qwen 3.5 122B achieves PP 300 / TG 28 (Windows) vs PP 290 / TG 28.5 (Linux); Qwen 3.5 397B achieves PP 140 / TG 16 vs PP 150 / TG 15.2. Tests repeated 4 times with a recent llama.cpp build that includes VRAM optimization.
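
A quick way to read these numbers is as relative differences. The Python sketch below uses only the figures quoted in the summary. It assumes the second pair (397B) follows the same Windows-then-Linux order as the first. The names `RESULTS`, `linux_vs_windows_pct`, and `summarize` are illustrative.

```python
# Throughput figures copied from the summary (units are not stated there).
# Each tuple is (Windows, Linux); the order for the 397B model is assumed.
RESULTS = {
    "Qwen 3.5 122B": {"PP": (300.0, 290.0), "TG": (28.0, 28.5)},
    "Qwen 3.5 397B": {"PP": (140.0, 150.0), "TG": (16.0, 15.2)},
}


def linux_vs_windows_pct(windows, linux):
    """Relative change of the Linux figure against the Windows figure, in percent."""
    return (linux - windows) / windows * 100.0


def summarize(results):
    """Return (model, metric, percent difference) rows, rounded to one decimal."""
    rows = []
    for model, metrics in results.items():
        for metric, (windows, linux) in metrics.items():
            rows.append((model, metric, round(linux_vs_windows_pct(windows, linux), 1)))
    return rows


for model, metric, pct in summarize(RESULTS):
    print(f"{model} {metric}: Linux {pct:+.1f}% vs Windows")
```

The gaps stay within a few percent and change sign across metrics and models, which fits the post's conclusion that neither OS is consistently faster.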

Llama · Qwen · Benchmarks
SIG 72 · HYP 00
arXiv cs.LG

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

Study on LLM reward design failures in sparse structured RL. Authors identify two dominant failure modes (reward flooding, semantic misunderstanding) and propose diagnostic-driven iterative refinement. On MiniGrid, DoorKey-8x8 improves from 2.3% to 97.6% success; KeyCorridor from 31.2% to 86.7%. Failure-mode taxonomy is the primary mechanism.

Reinforcement learning · Llama · Prompt engineering
SIG 72 · HYP 00
arXiv cs.CL

Keyphrase Generative Representation of Youth Crisis Conversations Beyond Static Taxonomies

Analysis of 703,975 youth crisis SMS conversations (Kids Help Phone, 2018-2023). Introduces Keyphrase Generative Representation (KGR), in which a constrained LLM generates context-specific keyphrases. Taxonomy expanded from 19 to 39 labels with 0.96 accuracy. KGR keyphrases are 81% accurate, and KGR improves the topic-retrieval workflow (+0.45 accuracy vs the manual process).

Llama · Prompt engineering · RAG
SIG 72 · HYP 00
Llama — AI news · Signal IA