Topic

#Code generation

Code generation refers to an AI model's ability to produce source code from a natural language prompt. GitHub Copilot, originally powered by OpenAI's Codex model, is one of the most widely used tools in this space.

40 Articles

10 Sources

68 Avg. signal

arXiv cs.CL·Jun 18

JetFlow: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

JetFlow improves speculative decoding by combining parallel drafting efficiency with branch-wise causal conditioning. On H100 GPUs, it achieves 9.64x speedup on MATH-500 and 4.58x on open-ended conversations, outperforming existing tree-based methods on dense and MoE Qwen3 models.

Benchmarks Code generation Open source

SIG

HYP

arXiv cs.LG·Jun 18

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

PROPEL is a framework that trains task generators via RL to create optimally difficult problems for agent learning. A lightweight probe predicts the solver's pass rate without repeated rollouts, reducing evaluation to a single forward pass. On code and SWE tasks, the share of generated tasks at the learnable frontier rises from 10.1% to 20% (Qwen2.5-3B) and from 9.8% to 19.6% (Qwen3.5-27B).

Reinforcement learning AI Agents Code generation

SIG

HYP

arXiv cs.LG·Jun 18

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

Ghost Attractor Networks introduce an efficient dynamical decoder for sequential generation in robotics. With 2.3M parameters, the decoder matches the offline accuracy of a 1.07B-parameter Diffusion Transformer (462× fewer parameters, 32× lower latency). On LIBERO-10, phase conditioning improves success rate by 13.5 percentage points over an MLP baseline.

Code generation Robotics Reasoning

SIG

HYP

arXiv cs.LG·Jun 18

CODEBLOCK: Learning to Supervise Code at the Right Granularity

CodeBlock is a structure-aware sparse supervision framework for code LLM fine-tuning. It selects syntactically coherent code blocks rather than isolated tokens, estimating utility via generalized cross-entropy and data-flow signals. On 6 code-generation benchmarks, CodeBlock outperforms full-token SFT while using only 1.9% of supervised response tokens.

Code generation Fine-tuning Papers

SIG

HYP

arXiv cs.AI·Jun 18

X+Slides: Benchmarking Audience-Conditioned Slide Generation

X+Slides is a benchmark for evaluating audience-conditioned slide generation. Built on 113 topics and 8,133 probes, it measures four metrics: Audience Coverage, Domain-wise Coverage, Efficiency, and Correctness. Tests on DeepPresenter, SlideTailor, and NotebookLM show Audience Coverage scores between 0.594 and 0.853.

Benchmarks Code generation

SIG

HYP

Simon Willison·Jun 17

GLM-5.2 is probably the most powerful text-only open weights LLM

Z.ai released GLM-5.2 (753B parameters, 40B active via MoE) under MIT license on June 16th. Text-only model with a 1M-token context window. Ranks 1st on Artificial Analysis Intelligence Index v4.1 (score 51), ahead of DeepSeek V4 Pro and Kimi K2.6, and 2nd on Code Arena WebDev behind Claude Fable 5.

Open source Benchmarks Code generation

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

My GLM-5.2-FP8 HGX-H200 SGLang docker deploy config

Docker deployment config for GLM-5.2-FP8 on HGX-H200 using SGLang. Achieves 70 tokens/s and 262k context by disabling DP and moe-a2a-backend deepep, with mem-fraction-static set to 0.83. Official vLLM recipes incompatible with H200.

Qwen Code generation Infrastructure

SIG

HYP

The Decoder·Jun 17

Zhipu AI's GLM-5.2 closes in on closed-source leaders in coding marathons

Zhipu AI releases GLM-5.2 under MIT license with stable 1-million-token context. On FrontierSWE benchmark for long-duration coding tasks, the open-source model trails Anthropic's Claude Opus 4.8 by just one percentage point. Significant gap remains on reasoning versus closed-source rivals.

Open source Code generation Benchmarks

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Multilingual-Multimodal-NLP/LoopCoder-V2 · Hugging Face

LoopCoder-V2 is a 7B code model based on Parallel Loop Transformer (PLT) that improves test-time performance through two passes of shared Transformer blocks. Trained on 18T tokens of mixed text/code data, it reaches 64.4 on SWE-bench Verified (vs 43.0 baseline), with two loops as the optimal gain-cost setting.

Code generation Reasoning Benchmarks

SIG

HYP

Simon Willison·Jun 17

Quoting Charity Majors

Charity Majors observes that in 2025, the economics of code production flipped: generating code became nearly free and instant instead of expensive and time-consuming. Lines of code shifted from being treasured and carefully curated to disposable and regenerable overnight.

Code generation Prompt engineering

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Gemma 4 E2B running in-browser at 255 tok/s using WebGPU kernels written by Fable 5

Gemma 4 E2B runs in-browser at 255 tokens/sec using WebGPU kernels optimized by Fable 5. Demo and kernels released on Hugging Face.

Gemini Code generation Open source

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

GameCraft-Bench evaluates whether AI agents can build playable games end-to-end in a real game engine. Benchmark tests Opus-4.7, GPT-5.5, Kimi-K2.6, DeepSeek-V4-Pro and others. No results reported for medium-sized models (27B-31B).

AI Agents Benchmarks Code generation

SIG

HYP

Hacker News (AI)·Jun 17

Launch HN: Adam (YC W25) – Open-Source AI CAD

Adam is an open-source AI-powered CAD software launched by a YC W25 startup. The project aims to automate computer-aided design through AI models.

Open source Tools Code generation

SIG

HYP

Hacker News (AI)·Jun 17

Agentic coding deserves more than a chat box bolted onto VS Code

Critical take on how agentic coding is integrated into VS Code as a simple chat interface. The author argues that current tools lack the depth to exploit the potential of agentic systems and that editors need an architectural redesign.

AI Agents Code generation Tools

SIG

HYP

The Decoder·Jun 17

Nvidia research shows robots that train themselves through AI coding agents

Researchers from Nvidia, Carnegie Mellon University, and UC Berkeley use AI coding agents to teach robots dexterous grasping in real-world conditions. A fleet of eight robots achieves 99% success rate on complex tasks.

AI Agents Code generation Robotics

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

GLM-5.2 is a win for local AI

GLM-5.2 (744B) under MIT license marks progress for local AI despite its massive footprint. The community can distill its reasoning capabilities into 8B/70B models, significantly improving local setups.

Open source Fine-tuning Reasoning

SIG

HYP

Reddit r/LocalLLaMA·Jun 17

Headless screenshot loops let a local 30B agent finish a raytraced FPS demo in pure C

A local Qwen 27B agent completed a raytraced FPS demo in pure C using headless screenshot loops for visual debugging. Adding headless mode with keyboard/mouse injection and frame capture transformed the approach: the model learned to automate recursive visual debugging loops independently.

Qwen AI Agents Code generation

SIG

HYP

GitHub Trending·Jun 17

continuedev/continue

Continue is an open-source coding agent featured on GitHub Trending. The project provides a software development assistance solution.

AI Agents Code generation Open source

SIG

HYP

GitHub Trending·Jun 17

DeusData/codebase-memory-mcp

High-performance code intelligence MCP server. Indexes codebases into a persistent knowledge graph in milliseconds. Supports 158 languages, sub-ms queries, 99% fewer tokens. Single static binary, zero dependencies.

MCP Code generation RAG

SIG

HYP

GitHub Trending·Jun 17

Lampese/codex-switcher

Lampese/codex-switcher is a desktop application for managing multiple OpenAI Codex CLI accounts. Open-source tool enabling account switching.

OpenAI Code generation Tools

SIG

HYP

Latent Space·Jun 17

[AINews] GLM-5.2: the top Frontend Coding model in the world, IndexShare for Speculative Decoding

GLM-5.2 becomes the top open-source model for frontend coding. Zhipu AI also announces IndexShare, a speculative decoding technique to accelerate inference.

Code generation Benchmarks Open source

SIG

HYP

arXiv cs.CL·Jun 17

Self-Generated Error Training for Token Editing in Diffusion Language Models

Training method to improve token editing in diffusion language models (LLaDA2.1). Addresses training-inference mismatch between random corruptions and model's own errors. Uses no-gradient draft pass followed by supervision on self-generated corruptions via LoRA. Reduces edit intensity and transcription errors.

Code generation Fine-tuning Reasoning

SIG

HYP

arXiv cs.CL·Jun 17

MLLP-VRAIN UPV system for the IWSLT 2026 Simultaneous Speech Translation task

MLLP-VRAIN group participates in IWSLT 2026 simultaneous speech translation using Parakeet and Qwen 3.5 models. Cascaded system with adaptive policies and RAG mechanism for domain-specific context. +5.82 XCOMET-XL improvement on En→De test set versus previous year.

Qwen RAG Code generation

SIG

HYP

arXiv cs.CL·Jun 17

GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine?

GameCraft-Bench evaluates coding agents' ability to generate playable games end-to-end in Godot. The benchmark comprises 140 tasks across 15 game families. Top agents achieve only 41.46% success, revealing struggles to produce complete games with sufficient content and coherent visual feedback.

Code generation AI Agents Benchmarks

SIG

HYP

arXiv cs.CL·Jun 17

VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

VoidPadding introduces a dedicated [VOID] token for padding in masked diffusion language models (MDLMs), freeing [EOS] for semantic termination. On Dream-7B-Instruct, it improves mathematical reasoning and code generation benchmarks by +17.84 points over baseline and +6.95 over RainbowPadding, reducing NFE by 55.7%.

Code generation Reasoning Benchmarks

SIG

HYP

arXiv cs.LG·Jun 17

Discrete Autoregressive Transformer for Generative Mechanism Synthesis

Discrete autoregressive transformer for mechanism synthesis. Conditional sequence model with VAE latent and quantized joint coordinates. Trained on >1M mechanisms with Chamfer distance and DTW metrics. Mean Chamfer distance 0.0132, DTW 0.153 on held-out tests.

Code generation Benchmarks Papers

SIG

HYP

arXiv cs.LG·Jun 17

Operator Boosting Produces Pareto-Efficient PDE Surrogates

Operator Boosting constructs compact neural-operator surrogates for PDEs via stagewise residual learning. Tested on FNO, DeepONet, and CNO across 30 benchmarks (PDEBench, APEBench), the method reduces parameters by 72–95% while improving accuracy on 21 dataset-architecture pairs and achieves Pareto gains on 7/10 PDE benchmarks.

Papers Benchmarks Code generation

SIG

HYP

arXiv cs.AI·Jun 17

Dissecting model behavior through agent trajectories

Study of harness-model alignment via 138k agent trajectories. Authors introduce Simple Strands Agent (SSA), a generic harness tested on Claude, Gemini, GPT, Grok, Qwen across SWE-Pro, SWE-Verified, and Terminal-Bench-2. Beyond pass@1 scores, analysis reveals fine-grained behavioral differences: edit frequency, testing activity, phase transitions.

AI Agents Benchmarks Code generation

SIG

HYP

arXiv cs.AI·Jun 17

Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns

SkillMigrator is an LLM agent that learns reusable web skills and transfers them across sites by matching layout structure rather than specific element references. Induced skills are stored as transferable interaction patterns (TIPs). On WebArena and Mind2Web, SkillMigrator reduces average LLM-action count by 8-10% at matched success rate.

AI Agents Code generation Benchmarks

SIG

HYP

arXiv cs.AI·Jun 17

FllumaOne: A Code-Native Multimodal CAD Dataset with Executable Programs and Kernel-Validated Feature Histories

FllumaOne is a multimodal CAD dataset of 100,000 models generated by executable Python programs in Flluma (OpenCASCADE-based CAD system). Each sample aligns the program with a feature tree, STEP representation, point cloud, and natural-language descriptions. A Qwen2.5-Coder-1.5B baseline achieves 99.98% Python syntax validity and 99.14% STEP-export validity.

Code generation Benchmarks Vision

SIG

HYP

arXiv cs.AI·Jun 17

LongWebBench: Evaluating Structural and Functional Webpage Generation in Long-Horizon Settings

LongWebBench is a benchmark evaluating long-horizon webpage generation by vision-language models. It contains 490 real-world pages for structural evaluation and 507 goal-oriented interaction tasks over 129 pages. Experiments show structural fidelity degrades with webpage length, and visually plausible generations often fail to support multi-step executable interactions.

Vision Benchmarks AI Agents

SIG

HYP

arXiv cs.AI·Jun 17

PreAct: Computer-Using Agents that Get Faster on Repeated Tasks

PreAct compiles successful runs of computer-using agents into small state-machine programs, replayed 8.5-13x faster with no per-step LLM calls. An independent evaluator validates each program before storage. Across three benchmarks (mobile, desktop, web), this verification prevents faulty program accumulation (+1.75-2.6 tasks).

AI Agents Code generation Benchmarks

SIG

HYP

arXiv cs.CL·Jun 17

Bridging Functional Correctness and Runtime Efficiency Gaps in LLM-Based Code Translation

SwiftTrans, an LLM-based code translation framework, combines multi-perspective exploration (MpTranslator with parallel in-context learning) and difference-aware selection (DiffSelector) to improve both functional correctness and runtime efficiency. Evaluated on CodeNet, F2SBench, and SwiftBench.

Code generation Prompt engineering Benchmarks

SIG

HYP

arXiv cs.LG·Jun 17

When the Next Step Is Not One Step: Distribution-Aware Execution Modeling for Concurrent Go Programs

7B model fine-tuned to predict next step in concurrent Go programs by learning event distributions rather than single labels. On 798 predictions from real bugs (CockroachDB, Kubernetes, gRPC, etcd), achieves 36.2% accuracy with <1000 traces, outperforming Gemini 3.5 Flash zero-shot (34.8%). Dataset, adapters, and tooling released.

Code generation Benchmarks Fine-tuning

SIG

HYP

arXiv cs.AI·Jun 17

DecoSearch: Complexity-Aware Routing and Plan-Level Repair for Text-to-SQL

DecoSearch is a training-free framework for text-to-SQL translation that routes queries by complexity. A schema selector prunes the database, an LLM judger decides if decomposition is needed, and a DAG solves atomic sub-questions. Achieves 70.53% on BIRD and 88.31% on Spider with DeepSeek, outperforming training-free baselines.

Code generation Reasoning RAG

SIG

HYP

Simon Willison·Jun 17

<click-to-play> — a still that plays

Web Component <click-to-play> that converts a static image into a play button to load GIFs on demand. Improves performance by preventing automatic loading of large files.

Tools Code generation

SIG

HYP

Vercel AI Blog·Jun 17

Introducing eve, an open-source agent framework

Vercel releases eve, an open-source framework for building and deploying AI agents. Minimal agent requires only two files (model + instructions). Add tools, skills, channels by creating files. Deploy to production with vercel deploy, unchanged from local development.

AI Agents Open source Tools

SIG

HYP

Reddit r/LocalLLaMA·Jun 16

GLM-5.2 just dropped open weights and it already looks weirdly strong for coding

GLM-5.2 released with open weights under MIT license. 1M context window, two reasoning effort modes, strong coding arena performance. Open-source model unlike API-only alternatives.

Qwen Open source Code generation

SIG

HYP

Reddit r/LocalLLaMA·Jun 16

GLM 5.2 API is live, weights are on HF, and ollama has it already

GLM-5.2 API live at $1.4/M input tokens, $4.4/M output. Weights released MIT-licensed on HuggingFace, Ollama support available. Benchmarks: 81.0 Terminal-Bench 2.1, 62.1 SWE-bench Pro, 74.4 FrontierSWE. 1M context window, two thinking modes (High/Max).

Open source Code generation Benchmarks

SIG

HYP