METHOD (the main body of the paper)
In model design we follow the original Transformer (Vaswani et al., 2017) as closely as possible. An advantage of this intentionally simple setup is that scalable NLP Transformer architectures – and their efficient implementations – can be used almost out of the box.
The paper emphasizes right away that ViT essentially adopts the original Transformer architecture. Key points in the sentence that follows:
- "intentionally simple setup": the design is kept deliberately simple, meaning the Transformer is used directly without any image-specific structural adaptations, underscoring the model's simplicity.
- "out of the box": scalable NLP Transformer implementations can be reused almost as-is.
ViT Model Architecture
This section opens with the model architecture figure:
- The paper immediately states the key problem for applying Transformers to images: how to turn a 2D image (with multiple channels) into 1D data: "The standard Transformer receives as input a 1D sequence of token embeddings."
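To make the 2D-to-1D conversion concrete, here is a minimal NumPy sketch (a hypothetical helper, not from the paper's released code) of the patch-flattening step: an image of shape (H, W, C) is cut into N = HW/P² non-overlapping P×P patches, each flattened to a vector of length P²·C, yielding the 1D sequence the Transformer consumes.

```python
import numpy as np

def image_to_patches(x, P):
    """Split an (H, W, C) image into a sequence of N flattened P x P patches.

    Returns an array of shape (N, P*P*C) where N = (H/P) * (W/P).
    Hypothetical illustration of the patchify step, not the paper's code.
    """
    H, W, C = x.shape
    assert H % P == 0 and W % P == 0, "image dims must be divisible by P"
    # carve the image into a (H/P, W/P) grid of P x P patches
    x = x.reshape(H // P, P, W // P, P, C)  # (H/P, P, W/P, P, C)
    x = x.transpose(0, 2, 1, 3, 4)          # (H/P, W/P, P, P, C)
    # flatten each patch into a single vector
    return x.reshape(-1, P * P * C)         # (N, P*P*C)

# ViT-Base defaults: a 224x224 RGB image with 16x16 patches
img = np.arange(224 * 224 * 3, dtype=np.float32).reshape(224, 224, 3)
patches = image_to_patches(img, P=16)
print(patches.shape)  # (196, 768): 14*14 = 196 patches, each 16*16*3 = 768 dims
```

With P = 16 and a 224×224 input, this gives the sequence length 196 that the paper's standard setting uses; each 768-dimensional patch vector is then linearly projected to the model width before entering the Transformer.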