arXiv cs.AI·19 May 2026

Vision Transformer-Conditioned UNet for Domain-Adaptive Semantic Segmentation

In three lines: ViTC-UNet conditions a UNet on frozen, pre-trained Vision Transformer representations via learnable tokens and a two-way attention decoder. The approach improves biomedical semantic segmentation on MRI and CT without end-to-end fine-tuning, combining ViT global priors with the UNet's local inductive bias and high-resolution decoding.

Vision Papers Benchmarks

Summary generated by Claude — human-verified
