U-Net is a convolutional neural network that was developed for image segmentation. The network is based on a fully convolutional neural network whose architecture was modified and extended to work with fewer training images and to yield more precise segmentation. Segmentation of a 512 × 512 image takes less than a second on a modern (2015) GPU using the U-Net architecture.
The U-Net architecture has also been employed in diffusion models for iterative image denoising. This technology underlies many modern image generation models, such as DALL-E, Midjourney, and Stable Diffusion.
U-Net is also being explored for language models. Tokenization is not a separate step, allowing the model to more easily understand spelling and concurrently vectorizing / tokenizing higher level concepts.