# Image generation

Image generation refers to an AI model's ability to create visuals from a text description. Stable Diffusion, for example, produces realistic or artistic images in seconds from a simple prompt.

40 articles · 8 sources · 56 avg. signal

Latent Space·Jun 18

[AINews] Midjourney Medical: scan your organs like you step on a scale

Midjourney announces its second product: a medical application enabling organ scanning via smartphone without specialized medical equipment. The AI model analyzes captured images to provide preliminary diagnostics.

Image generation · Vision · Business

Reddit r/MachineLearning·Jun 17

I deployed a GAN on a Raspberry Pi 4 and built a physical NFT minting device [P]

DCGAN 128×128 deployed on Raspberry Pi 4 with ESP32 display. Model trained 800 epochs on M3 (4h), 2480 images, exported to ONNX (53MB). Inference 3s per face. Generates hybrid faces with randomized titles. Presented as street art installation in NYC.

Image generation · Open source · Tools
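
The post gives the output size (128×128) but not the generator's layers. The sketch below assumes the standard DCGAN layout: a 1×1 latent is projected to 4×4, and then transposed convolutions with kernel 4, stride 2 and padding 1 each double the resolution. It uses the usual transposed-convolution size formula (dilation 1, no output padding) to work out the spatial sizes. The layer recipe is an assumption, not the author's published architecture.

```python
from typing import List


def conv_transpose_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output size of a transposed convolution (dilation 1, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def dcgan_sizes(target: int = 128) -> List[int]:
    """Spatial sizes produced by an assumed DCGAN-style generator up to `target`."""
    # First layer: 1x1 latent -> 4x4 (kernel 4, stride 1, padding 0).
    sizes = [conv_transpose_out(1, kernel=4, stride=1, padding=0)]
    # Each later layer doubles the resolution (kernel 4, stride 2, padding 1).
    while sizes[-1] < target:
        sizes.append(conv_transpose_out(sizes[-1], kernel=4, stride=2, padding=1))
    return sizes


print(dcgan_sizes(128))  # [4, 8, 16, 32, 64, 128]
```

Under this layout, reaching 128×128 takes five upsampling layers after the initial projection. That is one more than the 64×64 generator in the original DCGAN paper.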

GitHub Trending·Jun 16

ParthJadhav/app-store-screenshots

Open-source tool for automated app store screenshot generation using AI. Automates visual marketing asset creation for mobile applications.

Image generation · Tools · Open source

Le Big Data·Jun 15

This madman is trying to recreate GTA 6 from A to Z… using nothing but AI

A developer attempts to recreate GTA 6 entirely using AI, in parallel with the official release scheduled for November. The project leverages AI models to generate code, graphics assets, and game design.

Code generation · Image generation · Tools

GitHub Trending·Jun 14

AUTOMATIC1111/stable-diffusion-webui

AUTOMATIC1111/stable-diffusion-webui is a popular open-source GitHub repository that provides a web interface for Stable Diffusion, making image generation accessible through a browser UI.

Image generation · Open source · Tools

Reddit r/LocalLLaMA·Jun 12

Open Dungeon: local roleplay with Gemma 4 QAT + inline Uncen-FLUX images, running at full 256K context under 8GB RAM (OS)

Open Dungeon is a local roleplay game using Gemma 4 QAT (12B) via Ollama for narration and FLUX for image generation. Runs on 7.7GB RAM with full 256K context, no APIs or cloud. Features Do/Say/Story modes, line editing, model selection. MIT licensed, source available.

Gemini · Open source · Image generation

Reddit r/LocalLLaMA·Jun 10

Lemonade v10.7 release and project organization update

Lemonade v10.7 introduces omni-modal models (image gen/editing), a benchmarking tool (lemonade bench) comparing llama.cpp, FastFlowLM and vLLM, and extends multi-vendor support (CUDA, Vulkan). 19 contributors, 6 working groups with 4 led by non-AMDers. GPU acceleration on AMD, Apple Silicon, Nvidia, Intel.

Open source · Benchmarks · Image generation

Reddit r/LocalLLaMA·Jun 10

DiffusionGemma: The Developer Guide - Google Developers Blog

Google releases a developer guide for DiffusionGemma, its diffusion-based image generation model. The guide covers integration, optimization, and practical use cases for developers.

Gemini · Image generation · Tools

The Decoder·Jun 8

Microsoft Research's Lens proves detailed captions matter more than raw scale for training efficient image generators

Microsoft Research presents Lens, a text-to-image model with 3.8 billion parameters that matches much larger rivals on benchmarks at a fraction of training cost. Key innovation: 800 million detailed captions generated by GPT-4.1 instead of vague web alt-text. Code and weights released under open-source license.

Image generation · Benchmarks · Open source

Reddit r/MachineLearning·Jun 8

Open image generation models are closer to closed-source quality than this sub thinks [D]

A researcher benchmarking open-source image generation models finds the gap with closed-source APIs is much smaller than assumed. The latest checkpoints handle multi-object scenes and text rendering (70-80% success rate) comparably to paid endpoints, with inference taking 2 minutes for a 2MP image on a consumer GPU.

Image generation · Benchmarks · Open source

Hacker News (AI)·Jun 7

Efficient and Training-Free Single-Image Diffusion Models

New approach for single-image diffusion models that generates images without additional training. The method is computationally efficient and memory-optimized.

Image generation · Papers

Reddit r/LocalLLaMA·Jun 5

Horus Image Generation is here! 🤩📷

TokenAI announces Horus Lens 1.0, the first open-source image generation model developed entirely in Egypt. This specialized family of text-to-image models marks a major milestone for the AI ecosystem in Egypt and the Arab world.

Image generation · Open source

Le Big Data·Jun 4

Don't know what to buy? Amazon's AI will handle it

Amazon deploys a generative AI image feature to simplify purchasing. It lets users visually generate products from text descriptions, integrating image generation directly into the shopping journey.

Image generation · Business

The Decoder·Jun 4

xAI updates Grok Imagine to 1.5 with image-to-video generation at 720p resolution

xAI releases grok-imagine-video-1.5-preview, an image-to-video model generating cinematic videos up to 720p from still images and text prompts. Multiple clips can be stitched together into longer scenes.

Video generation · Image generation

Le Big Data·Jun 4

Google launches Dreambeans, an AI app that creates short stories based on your life

Google launches Dreambeans, an AI app generating personalized micro-stories based on user data. The app offers an alternative to endless scrolling on traditional social networks.

DeepMind · Image generation

Le Big Data·Jun 4

Ideogram 4.0 posts record performance: the new king of open-source image AI?

Ideogram releases Ideogram 4.0, an image generation AI model with record-breaking performance. The model is positioned as a potential leader among open-source image generation solutions.

Image generation · Open source

Latent Space·Jun 4

[AINews] Reve 2 and Ideogram 4: Layouts in Imagegen

Reve 2 and Ideogram 4 introduce layout capabilities in image generation. Two major updates for spatial control and composition in visual creation tools.

Image generation

The Decoder·Jun 3

Ideogram 4.0 drops as an open-weight model with native 2K resolution and improved text rendering

Ideogram 4.0 releases as open-weight model with native 2K resolution, bounding box control, and improved text rendering. On DesignArena leaderboard, it ranks first among open models, behind only OpenAI and Google. Commercial use requires paid license.

Image generation · Open source · Benchmarks

Reddit r/LocalLLaMA·Jun 3

Ideogram 4 is open source! (top ranked on DesignArena)

Ideogram 4, an image generation model, has been released as open source. It ranks first on DesignArena benchmark for visual quality and prompt adherence.

Image generation · Open source · Benchmarks

The Decoder·Jun 3

Build 2026: Microsoft tops Google in image generation while playing catch-up on reasoning

Microsoft announces seven in-house AI models at Build 2026, including its first reasoning model. The company also introduces a new tuning method and an autonomous background agent.

Reasoning · Image generation · AI Agents

Reddit r/LocalLLaMA·Jun 2

1-bit Bonsai Image 4B and Ternary Bonsai Image 4B Image Generation for Local Devices with just 0.93 GB and 1.21 GB respectively of Diffusion Transformer Footprint. So tiny!

PrismML releases Bonsai Image 4B as 1-bit and ternary quantized image generation models, with Diffusion Transformer footprints of 0.93 GB and 1.21 GB respectively. These compressed variants run on local devices with minimal memory.

Image generation · Open source · Tools
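
As a back-of-the-envelope check on those sizes, weight storage is roughly parameter count × bits per weight ÷ 8. The sketch below assumes "4B" means exactly 4×10⁹ quantized weights. It ignores per-tensor scales, embeddings, and any layers kept at higher precision, so it gives lower bounds only. The reported 0.93 GB and 1.21 GB sit above these bounds, and the summary does not break down the difference. Ternary weights are shown both at the information-theoretic minimum of log2(3) ≈ 1.58 bits and with simple 2-bit packing.

```python
import math


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Lower-bound storage for num_params weights at bits_per_weight each."""
    return num_params * bits_per_weight / 8


N = 4e9  # assumption: "4B" taken as exactly 4e9 quantized parameters

for label, bits in [
    ("fp16", 16.0),
    ("binary (1-bit)", 1.0),
    ("ternary, 2-bit packing", 2.0),
    ("ternary, log2(3) bits", math.log2(3)),
]:
    print(f"{label:>24}: {weight_bytes(N, bits) / 1e9:.2f} GB")
```

The fp16 row is included only for scale. It works out to 8 GB, against 0.5 GB for binary and about 0.79 GB for ideally packed ternary weights.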

Reddit r/LocalLLaMA·Jun 2

NVIDIA releases Cosmos 3 Omnimodal world models on HF

NVIDIA releases Cosmos 3, a collection of omnimodal world models (Nano 16B, Super 64B) capable of generating dynamic video, image, audio, and action commands from text, image, video, and action trajectory inputs. Available on Hugging Face for Physical AI applications.

Video generation · Image generation · Open source

arXiv cs.LG·Jun 2

Geometric Erasure by Contrastive Velocity Matching in Rectified Flows

GEM is a concept erasure framework for Rectified Flow Transformers. It bridges trajectory-based unlearning (Generative Flow Networks) and teacher-guided erasure, using geometric guidance signals to suppress unwanted concepts while preserving benign generation and preventing harmful content synthesis.

AI safety · Alignment · Papers
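
The summary does not spell out GEM's objective. Its title indicates that it works by matching velocities, but the details are not given here. As background only: a rectified flow learns a velocity field along straight paths between a noise sample x0 and a data sample x1. The path is x_t = (1 - t)·x0 + t·x1, and the regression target is v = x1 - x0. The sketch below shows just the interpolation, the target, the squared-error loss, and a fixed-step Euler sampler. A hypothetical "oracle" velocity for one fixed pair stands in for a trained network. None of this is GEM's erasure procedure.

```python
from typing import List

Vec = List[float]


def interpolate(x0: Vec, x1: Vec, t: float) -> Vec:
    """Straight-line path between noise x0 (t=0) and data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]


def velocity_target(x0: Vec, x1: Vec) -> Vec:
    """Regression target for the velocity field: constant along the path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(v_pred: Vec, x0: Vec, x1: Vec) -> float:
    """Mean squared error between predicted and target velocity."""
    target = velocity_target(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, target)) / len(target)


def euler_sample(v, x0: Vec, steps: int) -> Vec:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = [xi + dt * vi for xi, vi in zip(x, v(x, t))]
    return x


x0, x1 = [0.0, 2.0], [1.0, -1.0]


def oracle(x: Vec, t: float) -> Vec:
    """Hypothetical perfect model for this single (x0, x1) pair."""
    return velocity_target(x0, x1)


print(euler_sample(oracle, x0, steps=4))  # [1.0, -1.0]
```

The target velocity is constant along a straight path. Euler integration with the exact velocity therefore lands on x1 regardless of step count, whereas a learned model only approximates it.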

Hacker News (AI)·May 31

1-Bit Bonsai Image 4B Image Generation for Local Devices

Bonsai Image 4B is a 1-bit quantized image generation model designed to run on local devices. The model compresses weights to 1-bit to drastically reduce size and computational requirements, enabling inference on resource-constrained hardware.

Image generation · Open source · Infrastructure
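
The summary does not say how Bonsai maps weights to 1 bit. The sketch below uses two common generic recipes, each keeping one floating-point scale per tensor. These are assumptions, not PrismML's published method. For binary weights, it keeps sign(w) with scale mean |w|, the XNOR-Net choice. For ternary weights, it divides by mean |w|, then rounds and clips to {-1, 0, +1} (BitNet b1.58-style "absmean"). The weight values are made up.

```python
from typing import List, Tuple


def binarize(w: List[float]) -> Tuple[List[int], float]:
    """1-bit: codes in {-1, +1} plus one scale (mean |w|, as in XNOR-Net)."""
    scale = sum(abs(x) for x in w) / len(w)
    codes = [1 if x >= 0 else -1 for x in w]
    return codes, scale


def ternarize(w: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Ternary: absmean scaling, then round and clip to {-1, 0, +1}."""
    scale = sum(abs(x) for x in w) / len(w)
    codes = [max(-1, min(1, round(x / (scale + eps)))) for x in w]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Reconstruct approximate weights as code * scale."""
    return [c * scale for c in codes]


w = [0.9, -0.05, 0.3, -1.2, 0.02, 0.6]  # made-up weights
b_codes, b_scale = binarize(w)
t_codes, t_scale = ternarize(w)
print(b_codes, round(b_scale, 4))  # [1, -1, 1, -1, 1, 1] 0.5117
print(t_codes, round(t_scale, 4))  # [1, 0, 1, -1, 0, 1] 0.5117
```

The ternary variant can zero out small weights such as -0.05 and 0.02, which binarization cannot. That extra expressiveness is one reason ternary models cost slightly more storage than 1-bit ones.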

GitHub Trending·May 31

Comfy-Org/ComfyUI

ComfyUI is a modular GUI for diffusion models with a node/graph-based interface, providing API and backend capabilities for image generation.

Image generation · Open source · Tools
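
In a node/graph interface, each node's outputs feed other nodes' inputs, and the graph runs in dependency order. The sketch below is not ComfyUI's API or workflow format. It only illustrates the general idea with a hypothetical four-node graph, whose node names and stand-in string functions are made up. The graph is executed in topological order with Python's standard `graphlib.TopologicalSorter`.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical node table: name -> (function, names of upstream nodes it consumes).
Node = Tuple[Callable[..., Any], List[str]]


def run_graph(nodes: Dict[str, Node]) -> Dict[str, Any]:
    """Execute each node after its dependencies, passing upstream outputs as args."""
    deps = {name: set(inputs) for name, (_, inputs) in nodes.items()}
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        fn, inputs = nodes[name]
        results[name] = fn(*(results[i] for i in inputs))
    return results


# Stand-in functions; a real pipeline would load models and pass tensors here.
graph: Dict[str, Node] = {
    "load_model": (lambda: "model", []),
    "encode_prompt": (lambda m: f"cond({m}, 'a red cup')", ["load_model"]),
    "sample": (lambda m, c: f"latent({m}, {c})", ["load_model", "encode_prompt"]),
    "decode": (lambda z: f"image({z})", ["sample"]),
}
print(run_graph(graph)["decode"])
# image(latent(model, cond(model, 'a red cup')))
```

Because execution order comes from the edges rather than from how the nodes are listed, rewiring a workflow only means changing input lists.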

Reddit r/LocalLLaMA·May 31

Diffusion in prod: how are you handling spiky GPU load and cold starts?

Production challenges with diffusion models: handling GPU load spikes, cold starts, and inference costs. Scaling from 100 to 10k requests exposes architectural issues and multi-tenancy problems.

Image generation · Infrastructure · Tools
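
One common way to reason about spiky load is Little's law: the mean number of requests in flight equals arrival rate × time in system. The sketch below turns that into a warm-replica count with a burst headroom factor. The per-request latency (3 s), the per-GPU concurrency (4), and the default headroom of 1.5 are illustrative assumptions, not figures from the thread. The two example rates share only the thread's 100× ratio.

```python
import math


def replicas_needed(req_per_s: float, latency_s: float,
                    concurrency_per_gpu: int, headroom: float = 1.5) -> int:
    """Warm GPU replicas needed to absorb req_per_s at the given latency.

    Little's law: mean requests in flight = arrival rate * time in system.
    headroom > 1 leaves slack for bursts above the mean rate.
    """
    in_flight = req_per_s * latency_s
    return max(1, math.ceil(in_flight * headroom / concurrency_per_gpu))


print(replicas_needed(2, 3.0, 4))    # 3
print(replicas_needed(200, 3.0, 4))  # 225
```

The gap between the two results is what cold starts have to cover. Any replica that is not already warm when the burst arrives adds its startup time to the latency of queued requests.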

Hacker News (AI)·May 31

AI grifters are creating fake Black people to sell Shein junk

Scammers are using AI-generated images of fake Black people to promote Shein products on social media. Fraudulent marketing practice exploiting image generation and racial bias.

Image generation · AI safety · Business

Reddit r/MachineLearning·May 28

A new dataset with more than 100M high-quality, curated images, with captions and metadata! [P]

MONET, an Apache 2.0 dataset of 104.9M high-quality images with captions and metadata, released on Hugging Face. Built from 2.9B images and refined. Includes paper, UMAP visualization, text/image retrieval tool, and codebase for training T2I models.

Image generation · Embeddings · Open source
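
The summary mentions a text/image retrieval tool but not how it works. The sketch below assumes a common approach: images and captions are embedded into a shared vector space, for example with a CLIP-style encoder, and items are ranked by cosine similarity to the query. The 3-dimensional vectors and image IDs are made up. Real embeddings would come from a trained model and have hundreds of dimensions.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def top_k(query: List[float], index: Dict[str, List[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query, best first."""
    scored = [(key, cosine(query, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:k]


# Made-up image embeddings keyed by hypothetical image IDs.
index = {
    "img_cat": [0.9, 0.1, 0.0],
    "img_dog": [0.7, 0.6, 0.1],
    "img_car": [0.0, 0.2, 0.95],
}
query = [1.0, 0.2, 0.0]  # stand-in for an encoded text query such as "a cat"
print([key for key, _ in top_k(query, index)])  # ['img_cat', 'img_dog']
```

At the scale of a 104.9M-image dataset, a linear scan like this would normally be replaced by an approximate nearest-neighbor index. The ranking criterion stays the same.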

GitHub Trending·May 28

yossTheDev/removerized

Removerized is an AI image toolkit running fully in the browser. Free, private, and offline-first with no server dependency.

Image generation · Open source · Tools

The Decoder·May 28

Amazon builds its own AI production platform and greenlights three AI animated series for Prime Video

Amazon MGM Studios and AWS launch a creators' fund and in-house AI platform called 'Project Nara'. Three animated series are in production with five-week timelines for pilots. Amazon claims the only end-to-end AI content ecosystem in the industry.

Image generation · Video generation · Business

Le Big Data·May 28

No more templates? CapCut launches Design Studio 2.0, the AI that plays art director

CapCut launches Design Studio 2.0, an AI-powered platform for graphic creation that replaces traditional templates. The tool offers automated artistic direction for visual design.

Image generation · Tools · Business

The Decoder·May 27

Microsoft's MAI-Image-2.5 pulls even with Google's Nano Banana 2 on benchmarks

Microsoft MAI-Image-2.5 ranks third on Arena's text-to-image leaderboard, matching Google Nano Banana 2 but trailing OpenAI Image-2. The model shows clear improvements in rendering text within images and commercial visuals.

Image generation · Benchmarks

Reddit r/LocalLLaMA·May 27

Does Engram Do Memory Retrieval in Autoregressive Image Generation?

An Engram module (O(1) hash-keyed associative memory) injected into Transformers for autoregressive image generation on ImageNet 256×256 fails to improve quality (FID) despite FLOP gains. Gate-clamp, donor-probe, and frozen-table experiments show the module acts as a gated architectural side-pathway, not a content-addressed retrieval mechanism.

Papers · Image generation · Benchmarks
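
The post describes Engram as an O(1) hash-keyed associative memory whose output enters the network through a gate. The sketch below is a simplified, hypothetical illustration of that shape, not the module's actual implementation. It hashes the last n image-token IDs into a fixed-size table, looks up a vector, and adds it to the hidden state scaled by a sigmoid gate. The table size, n, the random initialization, and the gate parameters are all made-up assumptions.

```python
import math
import random
from typing import List, Sequence


class HashedNgramMemory:
    """Toy O(1) lookup: hash of the last n token IDs -> row of a fixed table."""

    def __init__(self, num_slots: int, dim: int, n: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.n, self.num_slots = n, num_slots
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_slots)]

    def slot(self, tokens: Sequence[int]) -> int:
        # Hashes of tuples of ints are deterministic across Python runs.
        return hash(tuple(tokens[-self.n:])) % self.num_slots

    def read(self, tokens: Sequence[int]) -> List[float]:
        return self.table[self.slot(tokens)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_update(hidden: List[float], memory: List[float],
                 gate_w: List[float], gate_b: float) -> List[float]:
    """h + g * m, where scalar g = sigmoid(w . h + b) sets how much memory is added."""
    g = sigmoid(sum(w * h for w, h in zip(gate_w, hidden)) + gate_b)
    return [h + g * m for h, m in zip(hidden, memory)]


mem = HashedNgramMemory(num_slots=1024, dim=4, n=2)
h = [0.5, -0.1, 0.3, 0.0]
m = mem.read([17, 42, 7])  # slot chosen by the last n=2 tokens: (42, 7)
print(gated_update(h, m, gate_w=[0.1, 0.1, 0.1, 0.1], gate_b=0.0))
```

The slot depends only on a hash of recent token IDs, so lookup cost stays constant regardless of table size. Whether the gate actually learns to use the retrieved content is the question the post's experiments address.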

Reddit r/LocalLLaMA·May 26

Small comparison on full compute performance (Anima) of 5090 (600, 475 and 400W) vs 6000 PRO MaxQ (325W), and 6000 PRO WS/SE (600W).

Compute performance benchmark (text-to-image diffusion) comparing RTX 5090 (400-600W) vs RTX 6000 PRO MaxQ (325W) and 6000 PRO WS (600W). Tests on Forge Neo with SageAttention 2.1, 896x1088 resolution, batch size 4. 5090 undervolted/overclocked (2930MHz, +4400MHz VRAM), 6000 PRO MaxQ modified (+550MHz core).

Image generation · Benchmarks · Infrastructure

Reddit r/LocalLLaMA·May 26

PrismML just released Binary and Ternary Bonsai Image 4B: 1-bit/ternary text-to-image diffusion transformers that can even run 100% locally in your browser on WebGPU.

PrismML releases Bonsai Image 4B, 1-bit/ternary quantized text-to-image diffusion transformers. ~3GB model size (vs 16GB for FLUX.2 Klein), runs 100% locally in browser via WebGPU. Apache-2.0 licensed.

Image generation · Open source · Tools

arXiv cs.LG·May 26

Filtered Posterior Mean Collections: A Unified Framework for Analytical Models of Diffusion Generalization

Unified framework (FPMC) modeling denoising functions in diffusion models. Consolidates existing approaches through query precision vectors, response weights, and source distributions. Improves performance via soft relaxations and distribution augmentations.

Image generation · Papers · Benchmarks
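
Analytical models of diffusion generalization usually start from the closed-form optimal denoiser for a finite training set. Assume a forward process x_t = alpha·x0 + sigma·noise and a uniform distribution over training points x_i. The posterior mean E[x0 | x_t] is then a softmax-weighted average of the x_i, with weights proportional to exp(-||x_t - alpha·x_i||² / (2·sigma²)). The sketch below implements only that baseline. FPMC's query precision vectors, response weights, and source distributions are not modeled, and the two-point dataset is illustrative.

```python
import math
from typing import List

Vec = List[float]


def posterior_mean_denoiser(x_t: Vec, data: List[Vec],
                            alpha: float, sigma: float) -> Vec:
    """E[x0 | x_t] for a uniform empirical distribution over `data`.

    Weights w_i are proportional to exp(-||x_t - alpha * x_i||^2 / (2 sigma^2)),
    normalized after subtracting the max logit for numerical stability.
    """
    logits = [
        -sum((a - alpha * b) ** 2 for a, b in zip(x_t, x_i)) / (2 * sigma ** 2)
        for x_i in data
    ]
    top = max(logits)
    weights = [math.exp(lg - top) for lg in logits]
    total = sum(weights)
    return [sum(w * x_i[d] for w, x_i in zip(weights, data)) / total
            for d in range(len(x_t))]


data = [[1.0, 1.0], [-1.0, -1.0]]
print(posterior_mean_denoiser([0.9, 1.1], data, alpha=1.0, sigma=0.1))  # ~[1.0, 1.0]
print(posterior_mean_denoiser([0.0, 0.0], data, alpha=1.0, sigma=0.1))  # [0.0, 0.0]
```

At small sigma, the output snaps to the nearest training point, so this denoiser reproduces training data. Analytical models such as FPMC aim to explain how trained networks generalize beyond it.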

arXiv cs.LG·May 22

Hierarchical Variational Policies for Reward-Guided Diffusion

Hierarchical variational framework for adapting pretrained diffusion models to reward-aligned objectives. It formulates test-time adaptation as a lightweight stochastic policy that amortizes per-step control. On 4× super-resolution, it achieves better perceptual quality with 5× faster inference than the best baseline.

Reinforcement learning · Image generation

Reddit r/LocalLLaMA·May 21

Training a vision model from scratch on iPod touch 4 images

A user trains a DCGAN model from scratch on 350 images of a red Solo cup taken with an iPod touch 4 under varying lighting and backgrounds. The goal is to capture the device's sensor-specific artifacts. The generated images resemble 2022-era DALL-E output.

Image generation · Open source

Le Big Data·May 20

Don't get fooled anymore: ChatGPT images now carry a "mark"

OpenAI adds an invisible watermark to images generated by ChatGPT to identify them and combat misinformation. This watermarking technique enables detection of AI-generated content.

GPT · Image generation · AI safety

Hacker News (AI)·May 19

OpenAI Adopts Google's SynthID Watermark for AI Images with Verification Tool

OpenAI integrates Google's SynthID watermark into DALL-E to mark AI-generated images. A verification tool detects these invisible markings, improving traceability of synthetic content.

OpenAI · Image generation · AI safety
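
Neither summary explains how the watermark is embedded. Google describes SynthID as an imperceptible watermark designed to survive common edits such as compression and cropping. The sketch below swaps in a far simpler and fragile technique, least-significant-bit (LSB) embedding, purely to show what "invisible" means at the pixel level. It is not SynthID. The grayscale pixel values and the payload bits are made up.

```python
from typing import List


def embed_lsb(pixels: List[int], bits: List[int]) -> List[int]:
    """Write each payload bit into the least significant bit of one 8-bit pixel."""
    if len(bits) > len(pixels):
        raise ValueError("payload longer than image")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the LSB, then set it to the payload bit
    return out


def extract_lsb(pixels: List[int], n_bits: int) -> List[int]:
    """Read back the first n_bits least significant bits."""
    return [p & 1 for p in pixels[:n_bits]]


pixels = [200, 201, 17, 54, 255, 0, 128, 99]  # made-up grayscale values
payload = [1, 0, 1, 1, 0, 0, 1, 0]            # hypothetical "AI-generated" tag
marked = embed_lsb(pixels, payload)
print(marked)                                          # [201, 200, 17, 55, 254, 0, 129, 98]
print(extract_lsb(marked, len(payload)) == payload)    # True
print(max(abs(a - b) for a, b in zip(pixels, marked)))  # 1
```

Each pixel changes by at most one intensity level, which is why the mark is invisible. Re-encoding as JPEG or resizing would scramble these bits, and that is exactly the weakness robust schemes like SynthID are built to avoid.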
