North Coast Synthesis Ltd.

Vision Transformers

2025-12-01, access: $ Basic

What if we applied the Transformer architecture from "Attention Is All You Need" to images instead of language? That's the question considered in this paper from 2021, which laid the groundwork for today's multi-modal models.
