Vision Transformer-Conditioned UNet for Domain-Adaptive Semantic Segmentation
Signal: 72
Hype: 18
In three lines
ViTC-UNet conditions a UNet on frozen, pre-trained Vision Transformer representations via learnable tokens and a two-way attention decoder. The approach improves biomedical semantic segmentation on MRI and CT without end-to-end fine-tuning, combining ViT global priors with UNet local inductive bias and high-resolution decoding.
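The summary gives no implementation details, so the following is only a minimal NumPy sketch of what one two-way attention step between learnable tokens and frozen ViT patch features might look like. Everything in it is an assumption made for illustration: the single-head attention, the residual updates, the dimensions, the random stand-in for ViT features, and the names `cross_attention` and `two_way_block`. It is not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, w_q, w_k, w_v):
    # Single-head scaled dot-product attention: queries read from context.
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def two_way_block(tokens, patches, params):
    # Hypothetical two-way step (assumed, not taken from the paper):
    # 1) the tokens attend to the ViT patch features;
    tokens = tokens + cross_attention(tokens, patches, *params["tokens_from_patches"])
    # 2) the patch features attend back to the updated tokens.
    patches = patches + cross_attention(patches, tokens, *params["patches_from_tokens"])
    return tokens, patches

rng = np.random.default_rng(0)
dim, n_tokens, n_patches = 16, 4, 49  # illustrative sizes only

def init_projections():
    # Three random projection matrices (query, key, value).
    return tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))

params = {
    "tokens_from_patches": init_projections(),
    "patches_from_tokens": init_projections(),
}

vit_patches = rng.normal(size=(n_patches, dim))  # stand-in for frozen ViT output
tokens = rng.normal(size=(n_tokens, dim))        # would be learned in a real model

new_tokens, new_patches = two_way_block(tokens, vit_patches, params)
print(new_tokens.shape, new_patches.shape)
```

In a trainable version, only the tokens and projection matrices would receive gradients while the ViT stays frozen. How the resulting features are injected into the UNet is not stated in this summary and is left out of the sketch.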
Summary generated by Claude — human-verified