Continuing from the previous post.
Approach
The abstract and introduction gave a high-level overview; in Chapter 2, the Approach section, the paper describes the model design and the zero-, one-, and few-shot settings.
Right at the start of this chapter, the paper states that GPT-3 uses the same architecture as GPT-2, only with the model size, data scale, and training time scaled up accordingly: "Our basic pre-training approach, including model, data, and training, is similar to the process described in [RWC+19], with relatively straightforward scaling up of the model size, dataset size and diversity, and length of training."
The in-context learning setup also follows GPT-2: "Our use of in-context learning is also similar to [RWC+19], but in this work we systematically explore different settings for learning within the context."
So the paper's point is to evaluate GPT-3 from multiple angles, namely, as mentioned in Chapter 1, how little GPT-3 depends on any specific NLP task.
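The zero-, one-, and few-shot settings differ only in how many worked demonstrations are placed in the prompt; the model's weights are never updated. A minimal sketch of prompt construction in these three settings (the task description, `=>` separator, and example pairs are illustrative placeholders, not the paper's exact format):

```python
def build_prompt(description, examples, query):
    """Assemble a prompt: task description, k demonstrations, then the query.
    k = 0 gives zero-shot, k = 1 one-shot, k > 1 few-shot."""
    parts = [description]
    for src, tgt in examples:
        parts.append(f"{src} => {tgt}")
    parts.append(f"{query} =>")  # the model is expected to complete this line
    return "\n".join(parts)

# Illustrative translation demonstrations (hypothetical example data)
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

zero_shot = build_prompt("Translate English to French:", [], "cheese")
one_shot = build_prompt("Translate English to French:", demos[:1], "cheese")
few_shot = build_prompt("Translate English to French:", demos, "plush giraffe")
```

The only variable across the three settings is the number of demonstrations in the context window, which is what the paper systematically varies in its evaluations.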