Vision Transformer-Conditioned UNet for Domain-Adaptive Semantic Segmentation
ViTC-UNet conditions a UNet on frozen, pre-trained Vision Transformer (ViT) representations via learnable tokens and a two-way attention decoder. The approach improves biomedical semantic segmentation on MRI and CT images without end-to-end fine-tuning, combining ViT global priors with the UNet's local inductive bias and high-resolution decoding.
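To make the two-way attention idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that "two-way" means cross-attention in both directions: the learnable tokens first attend to the frozen ViT patch features, and the patch features then attend to the updated tokens. The function names, the `params` layout, the single attention head, and the dimensions (16 channels, 4 tokens, 196 patches) are all hypothetical choices for illustration. Layer normalization, multi-head projection, and the way the outputs are injected into the UNet are omitted because the text above does not specify them.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: queries attend to context."""
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return weights @ v


def two_way_attention(tokens, vit_feats, params):
    """One two-way block (assumed form): tokens <- features, then features <- tokens."""
    # Direction 1: learnable tokens gather global context from the ViT features.
    tokens = tokens + cross_attention(tokens, vit_feats, *params["tok_from_img"])
    # Direction 2: ViT features are refined using the updated tokens.
    vit_feats = vit_feats + cross_attention(vit_feats, tokens, *params["img_from_tok"])
    return tokens, vit_feats


# Hypothetical sizes; random arrays stand in for real tokens and ViT outputs.
rng = np.random.default_rng(0)
dim, num_tokens, num_patches = 16, 4, 196
tokens = rng.normal(size=(num_tokens, dim))      # would be trainable parameters
vit_feats = rng.normal(size=(num_patches, dim))  # would come from the frozen ViT
params = {
    name: [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3)]
    for name in ("tok_from_img", "img_from_tok")
}

new_tokens, new_feats = two_way_attention(tokens, vit_feats, params)
print(new_tokens.shape, new_feats.shape)
```

In a trainable version of this sketch, only the tokens and the attention projections (together with the UNet) would receive gradients, while the ViT that produces `vit_feats` stays frozen. That is one reading of how the approach avoids end-to-end fine-tuning.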