#Prompt engineering

Prompt engineering is the practice of crafting and structuring instructions given to a language model to obtain accurate and useful outputs. For example, chain-of-thought prompting, which asks the model to write out intermediate reasoning steps before its final answer, measurably improves GPT-4's performance on reasoning tasks.
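
As a rough illustration of the technique named above, the sketch below assembles a few-shot chain-of-thought prompt in which each worked example shows its intermediate steps before the answer. The `Example` type, the `build_cot_prompt` helper, the Q/A layout, and the "Let's think step by step" cue are illustrative assumptions, not a format required by GPT-4 or any other model.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    reasoning: str  # worked, step-by-step rationale shown to the model
    answer: str


def build_cot_prompt(examples: list[Example], question: str) -> str:
    """Assemble a few-shot chain-of-thought prompt (illustrative format only)."""
    parts = [
        f"Q: {ex.question}\nA: Let's think step by step. {ex.reasoning} The answer is {ex.answer}."
        for ex in examples
    ]
    # End with the same cue so the model continues by writing its own reasoning.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)


demo = Example(
    question="A pen costs $2 and a notebook costs three times as much. What do both cost together?",
    reasoning="The notebook costs 3 * 2 = 6 dollars. Together they cost 2 + 6 = 8 dollars.",
    answer="$8",
)
prompt = build_cot_prompt([demo], "Tom has 5 apples and eats 2. How many are left?")
print(prompt)
```

The worked example supplies the pattern; the trailing cue on the unanswered question is what invites the model to produce its own intermediate steps.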

40 articles · 8 sources · 68 avg. signal
GitHub Trending

mksglu / context-mode

Context-mode optimizes the context window of AI coding agents by sandboxing tool outputs. The project reports a 98% token reduction and compatibility with 15 platforms.

AI Agents · Code generation · Prompt engineering
SIG 72 · HYP 00
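
The general idea of keeping bulky tool output out of an agent's context can be sketched as follows. This is a hypothetical illustration, not context-mode's code: the `OutputSandbox` class, the head/tail stub, and the content-hash reference key are assumptions made for the example, and the size reduction it prints depends entirely on the toy input.

```python
import hashlib


class OutputSandbox:
    """Keeps full tool outputs outside the prompt and hands the agent a short stub.

    Hypothetical illustration of the general idea; not context-mode's implementation.
    """

    def __init__(self, head: int = 5, tail: int = 5):
        self.head = head
        self.tail = tail
        self._store: dict[str, str] = {}

    def capture(self, output: str) -> str:
        # Content-address the full output so the agent can request it later by key.
        key = hashlib.sha256(output.encode("utf-8")).hexdigest()[:12]
        self._store[key] = output
        lines = output.splitlines()
        if len(lines) <= self.head + self.tail:
            return output  # small outputs go straight into context
        omitted = len(lines) - self.head - self.tail
        marker = f"... [{omitted} lines omitted; ref={key}] ..."
        stub = lines[: self.head] + [marker] + lines[len(lines) - self.tail :]
        return "\n".join(stub)

    def fetch(self, key: str) -> str:
        """Return the full sandboxed output when the agent explicitly asks for it."""
        return self._store[key]


sandbox = OutputSandbox()
log = "\n".join(f"test_{i} PASSED" for i in range(1000))
stub = sandbox.capture(log)
print(stub)
print(f"context chars: {len(stub)} vs {len(log)} ({1 - len(stub) / len(log):.1%} smaller)")
```

Only the stub enters the context window; the agent pays for the full output only if it calls `fetch`.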
arXiv cs.CL

Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval

DOPA, a demonstration retrieval framework, uses an out-of-distribution (OOD) proxy to approximate the inaccessible target domain and guide the selection of relevant demonstrations. A Mahalanobis distance-based global diversity constraint ensures sufficient variety among the retrieved examples. The authors report positive results across multiple LLMs and tasks under severe distribution shift.

Prompt engineering · Benchmarks · Papers
SIG 72 · HYP 00
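
One way to picture a Mahalanobis-distance diversity constraint is a greedy retriever that ranks candidates by similarity to a proxy embedding and skips any candidate that sits too close to one already chosen. This is a simplified stand-in, not DOPA itself: the pairwise minimum-distance rule, the cosine-similarity ranking, the pool covariance estimate, and the `select_demos` name are assumptions, and the embeddings are random placeholders.

```python
import numpy as np


def mahalanobis(x: np.ndarray, y: np.ndarray, cov_inv: np.ndarray) -> float:
    d = x - y
    # Clamp tiny negative values caused by floating-point error before the square root.
    return float(np.sqrt(max(float(d @ cov_inv @ d), 0.0)))


def select_demos(pool: np.ndarray, proxy: np.ndarray, k: int, min_dist: float) -> list[int]:
    """Greedy retrieval: most proxy-similar first, subject to a Mahalanobis spread constraint."""
    # Inverse covariance of the candidate pool; pinv tolerates a singular covariance.
    cov_inv = np.linalg.pinv(np.cov(pool, rowvar=False))
    sims = pool @ proxy / (np.linalg.norm(pool, axis=1) * np.linalg.norm(proxy) + 1e-12)
    chosen: list[int] = []
    for idx in np.argsort(-sims):
        # Accept only if far enough (in Mahalanobis terms) from every demo picked so far.
        if all(mahalanobis(pool[idx], pool[j], cov_inv) >= min_dist for j in chosen):
            chosen.append(int(idx))
            if len(chosen) == k:
                break
    return chosen


rng = np.random.default_rng(0)
pool = rng.normal(size=(200, 8))  # placeholder demonstration embeddings
proxy = rng.normal(size=8)        # placeholder embedding of the OOD proxy query
picks = select_demos(pool, proxy, k=4, min_dist=2.0)
print(picks)
```

Using the pool's inverse covariance means distance is measured relative to how the candidates actually vary, so a dimension with little spread counts for more than one with a lot.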
arXiv cs.CL

On the Limits of LLM Adaptability: Impact of Model-Internalized Priors on Annotation Task Performance

arXiv study on the limits of LLM adaptation for annotation tasks. Toxicity detection experiments across diverse datasets show that roughly two-thirds of zero-shot errors resist correction via prompting (rescue rate 34.8%). Models follow misaligned definitions while remaining confident. The Definition-Specific Familiarity (DSF) metric correlates with performance (r = +0.41), outperforming memorization metrics.

Prompt engineering · Evals · Benchmarks
SIG 78 · HYP 00
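
The rescue-rate figure can be read as a simple ratio. Assuming it is the share of zero-shot errors that become correct after a prompting intervention (our reading of the summary, not a definition quoted from the paper), a minimal computation looks like this; the example IDs and counts are toy values chosen to reproduce 34.8%, not the paper's data.

```python
def rescue_rate(zero_shot_wrong: set[str], after_prompting_wrong: set[str]) -> float:
    """Fraction of zero-shot errors that a prompting intervention corrects."""
    if not zero_shot_wrong:
        return 0.0
    rescued = zero_shot_wrong - after_prompting_wrong  # errors that were fixed
    return len(rescued) / len(zero_shot_wrong)


# Toy data: 1000 zero-shot errors, of which 348 become correct after prompting.
before = {f"ex{i}" for i in range(1000)}
after = {f"ex{i}" for i in range(348, 1000)}
rate = rescue_rate(before, after)
print(f"rescue rate {rate:.1%}, persistent errors {1 - rate:.1%}")
```

Under this definition the complement, 1 - 0.348 = 0.652, is the fraction of errors that prompting leaves uncorrected.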
Reddit r/MachineLearning

[P] Built a persistent cognitive runtime around an LLM — zero behavioral prompts, emergent autonomy from architecture. Comparison test: standard LLM in identical ecosystem did nothing.

Developer builds LIA, a persistent cognitive runtime around an LLM without behavioral prompts. Architecture includes 20k+ self-evaluated memories, cognitive kernel (LCRK v3), self-rule system, and private Linux domain. Control test: standard LLM in identical ecosystem remains inactive.

AI Agents · Prompt engineering · Reasoning
SIG 35 · HYP 00
GitHub Trending

pbakaus / impeccable

Impeccable is a design language to improve AI tools' ability to generate interfaces. The GitHub project offers a structured approach to guide models in creating coherent designs.

Prompt engineering · Tools
SIG 35 · HYP 00
arXiv cs.AI

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

Study on harness self-evolution (prompts, skills, memories, tools) in LLM agents. Analyzes two capabilities: harness-updating (producing useful updates) and harness-benefit (benefiting from them). Findings: harness-updating is capability-agnostic (Qwen3.5-9B matches Claude Opus gains), while harness-benefit is non-monotonic (mid-tier models benefit most).

AI Agents · Prompt engineering · Benchmarks
SIG 75 · HYP 00
arXiv cs.CL

Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents

MASA (Model-Aware Skill Alignment) adapts procedural skills for LLM agents to each model backbone without weight modification. A hierarchical evolution pipeline rewrites skills via hill climbing and UCB-driven tree search, then a lightweight rewriter trained on trajectories reproduces adaptation in a single forward pass. Gains up to 25.8 points across three interactive environments and four backbones.

AI Agents · Prompt engineering · Reasoning
SIG 78 · HYP 00
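
The UCB-driven search mentioned above balances trying new skill rewrites against re-testing promising ones. The sketch below shows only the UCB1 selection rule over a flat list of candidate rewrites (a bandit, not MASA's hierarchical tree search or hill climbing); the skill strings, success rates, and the lambda standing in for agent rollouts are hypothetical.

```python
import math
import random


def ucb1_select(counts: list[int], totals: list[float], c: float = 1.4) -> int:
    """Return the arm with the highest UCB1 score, trying each arm once first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    t = sum(counts)
    return max(
        range(len(counts)),
        key=lambda a: totals[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def search_skill_variants(variants, evaluate, budget: int):
    counts = [0] * len(variants)
    totals = [0.0] * len(variants)
    for _ in range(budget):
        arm = ucb1_select(counts, totals)
        totals[arm] += evaluate(variants[arm])  # one rollout's reward in [0, 1]
        counts[arm] += 1
    # Report the variant with the best empirical mean reward.
    best = max(range(len(variants)), key=lambda a: totals[a] / counts[a])
    return variants[best], counts


# Hypothetical skill rewrites and their success rates (unknown to the searcher).
variants = ["skill v0", "skill v1: explicit steps", "skill v2: shorter"]
true_rate = {"skill v0": 0.2, "skill v1: explicit steps": 0.8, "skill v2: shorter": 0.4}
rng = random.Random(0)
best, counts = search_skill_variants(
    variants, lambda v: 1.0 if rng.random() < true_rate[v] else 0.0, budget=300
)
print(best, counts)
```

The exploration bonus shrinks as a variant is tried more often, so most of the rollout budget drifts toward the rewrite that keeps succeeding.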
arXiv cs.AI

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

arXiv study on iterative refinement of LLM-generated reward functions for sparse structured RL. Authors identify two dominant failure modes (reward flooding, semantic misunderstanding) and propose diagnostic-driven refinement guided by failure-mode taxonomy. Results: DoorKey-8x8 improves from 2.3% to 97.6%, KeyCorridor from 31.2% to 86.7%. Limitations: method restricted to PPO and sparse structured tasks.

Reinforcement learning · Llama · Prompt engineering
SIG 72 · HYP 00
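
A diagnostic pass of this kind can start from cheap statistics over rollouts. As a hypothetical example of one such check, assuming that "reward flooding" means shaped reward that accrues whether or not the agent makes task progress (our interpretation; the paper's taxonomy may define it differently), a generated reward can be flagged when failed episodes earn as much shaped return as successful ones. The function name, the `ratio` threshold, and the episode data are illustrative.

```python
from statistics import mean


def flags_reward_flooding(episodes: list[tuple[float, bool]], ratio: float = 1.0) -> bool:
    """Heuristic check: do failed episodes collect as much shaped return as successful ones?

    episodes: (shaped_return, succeeded) pairs; assumes non-negative shaped returns.
    """
    succ = [r for r, ok in episodes if ok]
    fail = [r for r, ok in episodes if not ok]
    if not succ or not fail:
        return False  # cannot compare without both outcomes
    return mean(fail) >= ratio * mean(succ)


# Toy rollouts: lots of shaped reward, but the agent mostly fails the sparse task.
flooded = [(9.5, False), (10.2, False), (8.8, False), (7.0, True)]
# Toy rollouts where shaped reward tracks success.
healthy = [(1.1, False), (0.4, False), (6.3, True), (5.9, True)]
print(flags_reward_flooding(flooded), flags_reward_flooding(healthy))
```

A flag from a check like this would only route the reward function to a closer diagnosis; it does not by itself identify the cause.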
arXiv cs.AI

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

COLLEAGUE.SKILL is an automated trace-to-skill distillation system for generating person-grounded AI skills via expert knowledge extraction. The system produces versioned packages with two coordinated tracks: capability (practices, mental models, decision heuristics) and bounded behavior (communication style, interaction rules). 18.5k GitHub stars, 215 skills from 165 contributors.

AI Agents · Prompt engineering · Open source
SIG 72 · HYP 00
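
The two-track, versioned package described above maps naturally onto immutable records. The field names mirror the summary's wording, but the class layout, the integer version counter, the `bump` method, and the sample skill contents are assumptions for illustration, not COLLEAGUE.SKILL's actual package format.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CapabilityTrack:
    practices: tuple[str, ...] = ()
    mental_models: tuple[str, ...] = ()
    decision_heuristics: tuple[str, ...] = ()


@dataclass(frozen=True)
class BehaviorTrack:
    communication_style: str = ""
    interaction_rules: tuple[str, ...] = ()


@dataclass(frozen=True)
class SkillPackage:
    name: str
    version: int
    capability: CapabilityTrack
    behavior: BehaviorTrack

    def bump(self, **changes) -> "SkillPackage":
        # Each edit yields a new immutable version; older versions stay intact.
        return replace(self, version=self.version + 1, **changes)


v1 = SkillPackage(
    name="code-reviewer",
    version=1,
    capability=CapabilityTrack(decision_heuristics=("Block merges that lack tests",)),
    behavior=BehaviorTrack(communication_style="terse, direct"),
)
v2 = v1.bump(
    behavior=BehaviorTrack(
        communication_style="terse, direct",
        interaction_rules=("Ask before rewriting code",),
    )
)
print(v1.version, v2.version, v2.behavior.interaction_rules)
```

Keeping the tracks as separate records lets a behavior-only update leave the capability track untouched, which is the kind of coordination the two-track split suggests.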
GitHub Trending

ronisarkarexe / story-spark-ai

StorySparkAI is an open-source platform enabling users to generate and share multiple story variations from a single prompt. Designed for creative professionals.

Open source · Prompt engineering · Tools
SIG 35 · HYP 00
arXiv cs.LG

When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL

Study on LLM reward design failures in sparse structured RL. Authors identify two dominant failure modes (reward flooding, semantic misunderstanding) and propose diagnostic-driven iterative refinement. On MiniGrid, DoorKey-8x8 improves from 2.3% to 97.6% success; KeyCorridor from 31.2% to 86.7%. Failure-mode taxonomy is the primary mechanism.

Reinforcement learning · Llama · Prompt engineering
SIG 72 · HYP 00
arXiv cs.CL

Thoughts-as-Planning: Latent World Models for Chain-of-Thoughts Optimization via Reinforcement Planning

Thoughts-as-Planning formalizes reasoning chain optimization as sequential decision-making over latent semantic space. The framework learns a latent world model simulating effects of reasoning chain edits on outputs, supporting multi-scale edits (token, segment, instruction) via gradient descent or reinforcement learning planning.

Reasoning · Reinforcement learning · Prompt engineering
SIG 72 · HYP 00
arXiv cs.CL

Analyzing Persona Effects in Generated Explanations from Multimodal LLM Agents in Urban Perception

Study of persona effects on explanations generated by multimodal LLM agents in urban perception. Analysis of 59,808 annotations from 1,200 persona-conditioned agents: captions show strong convergence, justifications display systematic variation tied to socioeconomic and political attributes, perception tags show no significant persona-related differences.

Vision · AI Agents · Prompt engineering
SIG 72 · HYP 00
arXiv cs.CL

Reasoning that Travels: Dissecting How Chain-of-Thought Transfers Across Models

Study of chain-of-thought (CoT) transfer across models using a provider-receiver framework. Full traces often transfer successfully, but the mechanism varies by benchmark: answer extraction (AIME), receiver competence (MMLU-Pro), or partial structured information (ZebraLogic). In free-generation mode, partial CoTs improve performance, suggesting that they serve as guidance for the receiver's continued reasoning.

Reasoning · Prompt engineering · Benchmarks
SIG 78 · HYP 00
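
In a provider-receiver setup, a partial-CoT condition amounts to cutting the provider's trace before handing it to the receiver. The helper names, the one-step-per-line convention, and the prompt wording below are hypothetical choices for illustration, not the study's protocol.

```python
def truncate_trace(trace: str, fraction: float) -> str:
    """Keep the first `fraction` of the provider's reasoning steps (one step per non-empty line)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    steps = [s for s in trace.splitlines() if s.strip()]
    keep = int(len(steps) * fraction)  # round down
    return "\n".join(steps[:keep])


def receiver_prompt(question: str, provider_trace: str, fraction: float) -> str:
    """Build the receiver's input: the question plus a truncated provider trace."""
    partial = truncate_trace(provider_trace, fraction)
    return (
        f"Question: {question}\n"
        f"Partial reasoning from another model:\n{partial}\n"
        "Continue the reasoning and state the final answer."
    )


trace = (
    "Step 1: list the houses.\n"
    "Step 2: place the red house.\n"
    "Step 3: place the green house.\n"
    "Step 4: read off the answer."
)
print(receiver_prompt("Who owns the zebra?", trace, fraction=0.5))
```

Sweeping `fraction` from 0 to 1 gives a simple way to compare no-trace, partial-trace, and full-trace conditions with the same receiver.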
arXiv cs.CL

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

eXTC combines structured prompt optimization and reinforcement learning for text classification. The system learns a natural language rulebook first, then distills reasoning from a teacher LLM into a compact model, then expands capabilities via RL. Result: fast inference with local reasoning traces and global modular explanations of learned domain rules.

Prompt engineering · Reinforcement learning · Reasoning
SIG 72 · HYP 00
arXiv cs.AI

SkillGrad: Optimizing Agent Skills Like Gradient Descent

SkillGrad optimizes LLM agent skills using a gradient-descent-inspired framework. Task executions provide trajectory-level loss signals, automatic diagnostics generate text-based gradients, and a momentum agent accumulates recurring patterns. Evaluated on SpreadsheetBench and WikiTableQuestions, SkillGrad outperforms training-based baselines by 6.7 percentage points on average.

AI Agents · Reinforcement learning · Prompt engineering
SIG 78 · HYP 00
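
The momentum component can be pictured the way momentum works in numeric optimization: an exponential moving average, here over how often each diagnosed failure pattern recurs across batches. This is a hypothetical sketch, not SkillGrad's implementation; the `PatternMomentum` class, the pattern labels, `beta`, and the threshold for promoting a pattern into a skill edit are illustrative.

```python
from collections import Counter


class PatternMomentum:
    """Exponential moving average of failure-pattern frequencies across batches."""

    def __init__(self, beta: float = 0.9):
        self.beta = beta
        self.velocity: dict[str, float] = {}

    def step(self, diagnostics: list[str]) -> None:
        freq = Counter(diagnostics)
        total = max(len(diagnostics), 1)
        # Patterns absent from this batch decay toward zero; new ones enter the average.
        for pattern in set(self.velocity) | set(freq):
            g = freq[pattern] / total
            self.velocity[pattern] = self.beta * self.velocity.get(pattern, 0.0) + (1 - self.beta) * g

    def recurring(self, threshold: float) -> list[str]:
        """Patterns whose accumulated weight is high enough to justify a skill edit."""
        return sorted(p for p, v in self.velocity.items() if v >= threshold)


m = PatternMomentum(beta=0.5)
m.step(["wrong sheet range", "wrong sheet range", "missing header"])
m.step(["wrong sheet range", "formula typo"])
print(m.recurring(threshold=0.3))
```

A one-off diagnosis decays quickly, while a pattern that shows up batch after batch accumulates weight, which is the filtering role momentum plays here.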
arXiv cs.AI

Hierarchical Prompt-Domain Control and Learning for Resource-Constrained Agentic Language Models

Hierarchical framework for compact LLMs in resource-constrained agentic systems. It combines model distillation with an oracle-controller loop that monitors protocol validity, projects histories into a feasible prompt domain, and triggers lightweight fine-tuning under drift. The design separates schema learning from semantic adaptation. Evaluated on Multi-Fidelity Bayesian Optimization, with reported gains in reliability and cost-efficiency.

AI Agents · Fine-tuning · Prompt engineering
SIG 72 · HYP 00
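
A minimal sketch of the monitor-project-trigger loop, assuming the protocol is a JSON object with a fixed set of keys: invalid outputs are projected onto the nearest conforming message, and a fine-tuning flag is raised when the recent invalid rate exceeds a threshold. The `PROTOCOL` defaults, the `OracleController` class, the window size, and the threshold are hypothetical, and the fine-tuning step itself is out of scope.

```python
import copy
import json
from collections import deque

# Hypothetical protocol: every message must have exactly these keys (values are defaults).
PROTOCOL = {"action": "noop", "args": {}}


class OracleController:
    def __init__(self, window: int = 20, drift_threshold: float = 0.3):
        self.recent: deque = deque(maxlen=window)  # validity of the last `window` outputs
        self.drift_threshold = drift_threshold

    def check(self, raw: str) -> tuple[dict, bool]:
        """Validate a model output and project it into the protocol-conforming domain."""
        try:
            msg = json.loads(raw)
        except json.JSONDecodeError:
            msg = None
        valid = isinstance(msg, dict) and set(msg) == set(PROTOCOL)
        if not isinstance(msg, dict):
            msg = {}
        # Projection: keep known keys, fill missing ones with fresh defaults, drop unknown keys.
        projected = {k: msg[k] if k in msg else copy.deepcopy(d) for k, d in PROTOCOL.items()}
        self.recent.append(valid)
        return projected, valid

    def needs_finetune(self) -> bool:
        """Signal drift when the share of invalid outputs in the window exceeds the threshold."""
        if not self.recent:
            return False
        invalid_rate = 1 - sum(self.recent) / len(self.recent)
        return invalid_rate > self.drift_threshold


ctl = OracleController(window=4, drift_threshold=0.5)
for raw in ['{"action": "evaluate", "args": {"x": 0.3}}', "not json", '{"action": "evaluate"}']:
    print(ctl.check(raw), ctl.needs_finetune())
```

Projection keeps the agent loop running on every step, while the windowed invalid rate decides when repeated repairs indicate drift worth a fine-tuning pass.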