Understanding the Scaling Vision Transformers Paper
- 1. Abstract
- 2. Summary of the main conclusions
- 2.1 Few-shot transfer learning
- 2.2 Pareto front
- 3. Discussion
- 3.1 Limitations
- 3.2 Societal impact
- 4. Conclusion
- References
1. Abstract
Attention-based neural networks such as the Vision Transformer (ViT) have recently attained state-of-the-art results on many computer vision benchmarks. Scale is a primary ingredient in attaining excellent results, therefore, understanding a model’s scaling propertie