Continuing from the previous post.
Approach
The abstract and introduction gave a high-level overview; in Chapter 2, the Approach section, the paper describes the model design and the zero-, one-, and few-shot settings.
Right at the start of this chapter, the paper states that GPT-3 uses the same architecture as GPT-2, only with the model size, data scale, and training time scaled up accordingly: "Our basic pre-training approach, including model, data, and training, is similar to the process described in [RWC+19], with relatively straightforward scaling up of the model size, dataset size and diversity, and length of training."
The in-context learning setup also follows GPT-2: "Our use of in-context learning is also similar to [RWC+19], but in this work we systematically explore different settings for learning within the context."
So the paper's point is to evaluate GPT-3 from multiple angles, namely, as mentioned in Chapter 1, how little GPT-3 depends on any specific NLP task.
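The zero-, one-, and few-shot settings differ only in how many worked demonstrations are placed in the prompt; the model's weights are never updated. A minimal sketch of prompt construction in these three settings (the task description, `=>` separator, and example pairs are illustrative placeholders, not the paper's exact format):

```python
def build_prompt(description, examples, query):
    """Assemble a prompt: task description, k demonstrations, then the query.
    k = 0 gives zero-shot, k = 1 one-shot, k > 1 few-shot."""
    parts = [description]
    for src, tgt in examples:
        parts.append(f"{src} => {tgt}")
    parts.append(f"{query} =>")  # the model is expected to complete this line
    return "\n".join(parts)

# Illustrative translation demonstrations (hypothetical example data)
demos = [("sea otter", "loutre de mer"), ("cheese", "fromage")]

zero_shot = build_prompt("Translate English to French:", [], "cheese")
one_shot = build_prompt("Translate English to French:", demos[:1], "cheese")
few_shot = build_prompt("Translate English to French:", demos, "plush giraffe")
```

The only variable across the three settings is the number of demonstrations in the context window, which is what the paper systematically varies in its evaluations.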