GPT-3: The Latest in the NLP Town

What is GPT-3?

The launch of OpenAI’s third-generation pre-trained language model, GPT-3 (Generative Pre-trained Transformer 3), has the data science fraternity buzzing with excitement!

The world of Language Models (LMs) is quite fascinating. To give a brief introduction: these models learn the probabilities of sequences of words as they occur in a commonly spoken language (say, English) and predict the next possible word in a sequence (a minimal code sketch follows the list below). They are essential for numerous NLP tasks like:

  • Language Translation
  • Text Classification
  • Sentiment Extraction
  • Reading Comprehension
  • Named Entity Recognition
  • Question Answering Systems
  • News Article Generation, etc.
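
As a concrete picture of the “predict the next word” idea, here is a minimal sketch using the publicly available GPT-2 via the Hugging Face transformers library (GPT-3 itself is only reachable through OpenAI’s hosted API, so GPT-2 stands in here):

```python
# Score candidate next words with a pre-trained language model (GPT-2 as a stand-in).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the vocabulary for the *next* token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()]):>10s}  {prob.item():.3f}")
```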

They’ve become immensely popular since the release of BERT by Google, with a host of companies competing to build the next big thing in the NLP domain!

OpenAI’s GPT-3 is the largest language model, with 175 BN parameters, 10x more than Microsoft’s Turing NLG

OpenAI has been in this race for a long time now. The capabilities, features and limitations of its latest edition, GPT-3, have been described in a detailed research paper. Its predecessor GPT-2 (released in Feb 2019) was trained on 40GB of text data and had 1.5 BN parameters. In comparison, GPT-3 has a whopping 175 BN parameters, 10 times more than the next largest LM, Turing NLG, developed by Microsoft with 17 BN parameters!

Fig-1: Parameter-wise comparison of all available language models (LMs)

GPT-3 is based on the same transformer and attention concepts as GPT-2 (a minimal sketch of the attention mechanism follows below). It was trained on a large and varied collection of data, including Common Crawl, web texts, books and Wikipedia, sampled according to the tokens available from each source. Prior to training the model, the average quality of the datasets was improved in 3 steps: filtering Common Crawl against high-quality reference corpora, fuzzy deduplication across documents, and adding known high-quality corpora to the mix.
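
For readers new to transformers, here is a minimal sketch of scaled dot-product attention, the operation at the heart of GPT-2 and GPT-3, written in plain NumPy for clarity. This is the textbook formulation, not code from either model:

```python
# Scaled dot-product attention: each token attends to all tokens in the sequence.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: arrays of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V  # weighted sum of value vectors

# Toy example: 4 tokens, one 8-dimensional attention head.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```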

The following table shows the training corpus of GPT-3:

  Dataset                    Tokens (BN)   Weight in training mix
  Common Crawl (filtered)    410           60%
  WebText2                   19            22%
  Books1                     12            8%
  Books2                     55            8%
  Wikipedia                  3             3%

  (Figures as reported in the GPT-3 paper.)

GPT-3 comes in variants that differ in terms of:

  • Sizes (parameters and layers), ranging from 125 MN to 175 BN parameters
  • Architectures
  • Learning hyper-parameters (batch size in tokens, and learning rate)

“The largest version of GPT-3 has 175 BN parameters, 96 attention layers and a 3.2 MN batch size”
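
As a sanity check on the headline number, a common rule of thumb for a standard transformer decoder is that parameters scale as roughly 12 × n_layers × d_model². With the 96 layers quoted above and the hidden size of 12288 reported in the paper, this lands close to 175 BN. A sketch assuming that heuristic (embedding matrices are ignored):

```python
# Back-of-the-envelope estimate of transformer parameter counts:
# attention projections + feed-forward dominate, at ~12 * n_layers * d_model^2.
def approx_transformer_params(n_layers: int, d_model: int) -> float:
    return 12 * n_layers * d_model ** 2

print(f"{approx_transformer_params(96, 12288) / 1e9:.0f} BN")  # ~174 BN
```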

Here are the details of the different variants of the GPT-3 model:

Fig-2: Details of the variants of the GPT-3 model

What can it do?

Many of the NLP tasks discussed in this blog can be performed by GPT-3 without any gradient or parameter updates, or fine-tuning. This makes it a task-agnostic model: it can perform tasks given very few, or even zero, prompts, examples or demonstrations, called shots.

The following image displays a zero-/one-/few-shot task accuracy comparison across model sizes (in terms of parameters) for a simple task, removing random symbols from a word, with the number of in-context examples ranging from 10 to 100.
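
As an illustration of what “few-shot” means in practice, here is a hedged sketch of how such a prompt might be laid out: a handful of in-context examples followed by the query, with no gradient updates anywhere. The prompt format and examples here are illustrative, not taken verbatim from the paper:

```python
# Build a few-shot prompt for the "remove random symbols from a word" task.
examples = [
    ("s.u!c/c!e.s s i/o/n", "succession"),
    ("c;l:e a;n", "clean"),
]
query = "g*a?r+b,l-e!d"

prompt = "Please unscramble the word by removing the extra symbols.\n\n"
for noisy, clean in examples:
    prompt += f"Input: {noisy}\nOutput: {clean}\n\n"
prompt += f"Input: {query}\nOutput:"

# Feed this string to the model; the expected completion is "garbled".
print(prompt)
```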

Fig-3: Zero/one/few-shot task accuracy comparison for models of different sizes

The “Fake News” Conundrum

Earlier, the release of the largest GPT-2 model was briefly stalled due to a controversial debate about its being capable of generating fake news. It was later published on Colab notebooks. In recent times, however, fake news has become quite common, and the real news itself has been hard to believe!

The fake news generated by GPT-3 has proven very difficult to distinguish from the real thing: in one of the experiments, the results show that human judges could correctly identify the generated news only about 50% of the time, hardly better than chance!

Fig-4: Accuracy of manual fake-news detection for models of different sizes

In a task to predict the last word of a sentence (the LAMBADA benchmark), GPT-3 outperformed the previous SOTA (state-of-the-art) score by 8%, with an accuracy of 76% in the zero-shot setting. In the few-shot setting, it achieved an accuracy of 86.4%!

In closed-book question-answering tasks, GPT-3 outperformed a fine-tuned SOTA model that uses an information-retrieval component, in both the one-shot and few-shot settings.

Fig-5: Performance of GPT-3 on TriviaQA for models of different sizes

Access to the GPT-3 API is gated by a waiting list, but the folks who got a chance to try it have shared their interesting findings and the amazing results of this powerful model. Here are a few things observed while experimenting with the API’s interface, called the Playground.

Summary of the OpenAI GPT-3 API Playground

Settings and Presets: Upon clicking the settings icon, one can configure various parameters like the text length, temperature (from low/boring through standard to chaotic/creative), start and stop sequences for the generated text, etc. There are also multiple presets to choose from and play around with, like Chat, Q&A, Parsing Unstructured Data, and Summarize for a 2nd Grader, detailed below after a sketch of how these settings map onto API parameters.
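
For those calling the API directly rather than using the Playground, here is a sketch of how the Playground settings correspond to request parameters, assuming the 2020-era openai Python client and access to the davinci engine; the prompt and parameter values are illustrative:

```python
# Map Playground settings onto GPT-3 API parameters (legacy openai client).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder; obtained via the API waiting list

response = openai.Completion.create(
    engine="davinci",          # the largest GPT-3 engine at launch
    prompt="Q: What is a language model?\nA:",
    max_tokens=64,             # "text length" in the Playground
    temperature=0.7,           # low = boring/deterministic, high = chaotic/creative
    stop=["\n"],               # stop sequence that ends the generated text
)
print(response.choices[0].text.strip())
```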

  • Chat:

    The Chat preset looks more like a chatbot. You can set the character of the AI to friendly, creative, clever and helpful, in which case it provides informative answers in a very polite manner; whereas if you set the character of the AI to brutal, it responds exactly as that character suggests!

  • Q&A:

    The Question Answering preset needs some training examples before it starts answering questions, and people did not have any complaints about the kind of answers they received.

  • Parsing Unstructured Data:

    This is an interesting preset which can comprehend unstructured text and extract structured information from it (see the prompt sketch after this list).

  • Summarize for 2nd Grader:

    This preset shows another level of text compression, rephrasing difficult sentences and concepts into simpler words and sentences that can be easily understood by a kid.
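
As promised above, here is a hedged sketch of what a “Parsing Unstructured Data”-style prompt can look like: one worked example of text-to-fields extraction, then a new text for the model to complete. The prompt wording is mine; the actual preset text may differ:

```python
# An extraction-style prompt: show one text -> fields example, then a new text.
prompt = """Extract the company, product, and year from the text.

Text: In 2019, OpenAI released GPT-2, trained on 40GB of text.
Company: OpenAI | Product: GPT-2 | Year: 2019

Text: Microsoft announced Turing NLG, a 17 BN parameter model, in 2020.
Company:"""

# Expected completion: " Microsoft | Product: Turing NLG | Year: 2020"
print(prompt)
```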

Multilingual text processing: GPT-3 can handle languages other than English better than GPT-2 could. People have tried tasks in various languages, German, Russian and Japanese among them; it performed well and seems very much ready for multilingual text processing.

Text Generation: It can generate poems on demand, in a particular style if required, and can write stories and essays, even in other languages, with some fine-tuning.

Code Generation: People have claimed that the API can generate code from minimal prompts.

Here is an article showcasing all its capabilities, with excerpts from social media.

And this is what the Playground interface looks like (the image below shows the Q&A preset):

Fig-6: Preview of the Playground page with the Q&A preset

How can we use it?

Unlike a lot of language models, GPT-3 does not need transfer learning, where the model is fine-tuned on task-specific datasets for specific tasks. The authors of the GPT-3 research paper mention the following advantages of a task-agnostic model:

  • Collecting task-specific data is difficult
  • Fine-tuning might yield out-of-distribution performance
  • There is a need for an adaptable NLP system, similar to humans, that can understand natural language (e.g. English) and perform tasks with few or no prompts

The applications of GPT-3 rely on in-context learning: the model is fed a task, prompt or example (a shot), and it responds on the basis of the skills and pattern-recognition abilities learnt during training, adapting them to the current specific task.

Despite its tremendous usability, the huge model size is the biggest factor hindering its use for most people, other than those with the required resources. However, there are discussions in the fraternity that distillation might come to the rescue! A sketch of the idea follows.
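
For context, knowledge distillation trains a small “student” model to match the softened output distribution of a large “teacher”. Here is a minimal PyTorch sketch of the standard distillation loss, on toy tensors rather than GPT-3 itself:

```python
# Knowledge distillation: student mimics the teacher's softened output distribution.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T: float = 2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # T^2 rescaling keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * T * T

teacher_logits = torch.randn(4, 50257)  # e.g., vocabulary-sized outputs
student_logits = torch.randn(4, 50257, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
print(loss.item())
```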

What are the limitations?

The OpenAI founder himself said that “GPT-3 has weaknesses and it makes silly mistakes”. It is weak at sentence comparison, where it has to judge the usage of a word in two different sentences.

As per the researchers, it still faces problems with the following tasks:

  • Repetitions
  • Coherence loss
  • Contradictions
  • Drawing real conclusions
  • Multi-digit addition and subtraction

Fig-7: Results of different arithmetic tasks in a few-shot setting for models of different sizes

Conclusion

It is great to have an NLP system that doesn’t require large amounts of task-specific data or custom model architectures to solve specific NLP tasks. The experiments conducted show its power, potential and impact on the future of NLP.

Though GPT-3 doesn’t do well on everything, and its sheer size makes it difficult for everyone to use, this is just the threshold of many new improvements to come in the field of NLP!

Translated from: https://medium.com/quick-bites/gpt-3-the-latest-in-the-nlp-town-961259a0930f
