openai-gpt
Disclaimer: My opinions are informed by my experience maintaining Cortex, an open source platform for machine learning engineering.
If you frequent any part of the tech internet, you’ve come across GPT-3, OpenAI’s new state of the art language model. While hype cycles around new technology aren’t new (GPT-3’s predecessor, GPT-2, generated quite a few headlines as well), GPT-3 is in a league of its own.
Looking at Hacker News for the last couple months, there have been dozens of hugely popular posts, all about GPT-3:
If you’re on Twitter, you’ve no doubt seen projects built on GPT-3 going viral, like this Apple engineer who used GPT-3 to write Javascript using a specific 3D rendering library:
And of course, there have been plenty of “Is this the beginning of SkyNet?” articles written:

The excitement over GPT-3 is just a piece of a bigger trend. Every month, we see more and more new initiatives released, all built on machine learning.
To understand why this is happening, and what the trend’s broader implications are, GPT-3 serves as a useful case study.
What’s so special about GPT-3?
The obvious take here is that GPT-3 is simply more powerful than any other language model, and that the increase in production machine learning lately can be chalked up to similar improvements across the field.
Undoubtedly, yes. This is a factor. But, and this is crucial, GPT-3 isn’t so popular just because it’s powerful. GPT-3 is ubiquitous because it is usable.
By “usable,” I mean that anyone can build with it, and it’s easy. For context, after the full GPT-2 was released, most of the popular projects built on it were built by machine learning specialists, and required substantial effort:
Comparatively, it has only been a couple of months since GPT-3's announcement, and we’re already seeing dozens of viral projects built on it, often of the “I got bored and built this in an afternoon” variety:
Anyone with some basic engineering chops can now build an application leveraging state of the art machine learning, and this increase in the usability of models—not just their raw power—is an industry-wide phenomenon.
Why it’s suddenly so easy to build with machine learning
One of the biggest blockers to using machine learning in production has been infrastructure. We’ve had models capable of doing incredible things for a long time, but actually building with them has remained a major challenge.
For example, consider GPT-2. How would you build a GPT-2 application?
Intuitively, the model is more or less an input-output machine, and the most logical thing to do would be to treat it as some sort of microservice, a predict() function your application could call. Pass in some text and receive GPT-2 generated text in return, just like any other API.
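That input-output contract can be sketched in a few lines. The `generate_text` stub below is hypothetical and stands in for real GPT-2 inference; in practice you would load the model with a library such as Hugging Face transformers and serve `predict()` behind an HTTP endpoint.

```python
# A minimal sketch of the predict() idea: text in, generated text out.

def generate_text(prompt: str, max_tokens: int = 20) -> str:
    # Stub: a real implementation would run model inference here.
    return prompt + " [generated continuation]"

def predict(request_body: dict) -> dict:
    """The microservice contract: accept a prompt, return generated text."""
    prompt = request_body.get("text", "")
    return {"generated": generate_text(prompt)}

print(predict({"text": "Once upon a time"})["generated"])
```

Everything else in the deployment, such as model loading, batching, and scaling, exists to serve this one function.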
This is the main way of deploying GPT-2 (what is known as realtime inference), and it comes with some serious challenges:
GPT-2 is massive. The fully trained model is roughly 6 GB. Hosting a GPT-2 microservice requires a lot of disk space.
GPT-2 is compute hungry. Without at least one GPU, you will not be able to generate predictions with anywhere near acceptable latency.
GPT-2 is expensive. Given the above, you need to deploy GPT-2 to a cluster provisioned with large GPU instances—very expensive at scale.
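The expense point is easy to make concrete with back-of-envelope arithmetic. The hourly rate below is an assumed, illustrative figure, not a quote from any provider; check current cloud pricing before budgeting.

```python
# Back-of-envelope cost sketch for hosting GPT-2 on GPU instances.

gpu_hourly_usd = 0.526    # assumed on-demand rate for one small GPU instance
replicas = 10             # instances needed to handle traffic at scale
hours_per_month = 24 * 30

monthly_cost = gpu_hourly_usd * replicas * hours_per_month
print(f"~${monthly_cost:,.0f} per month")
```

Even with these conservative, assumed numbers, the bill grows linearly with replica count, which is why always-on GPU clusters get expensive fast.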
And this is just for the vanilla, pretrained GPT-2 model. If you want to fine tune GPT-2 for other tasks, that too will be its own technical challenge.
This is why machine learning has been so unusable. Using it in production required you not only to be versed in machine learning, but also DevOps and backend development. This describes very few people.
Over the last several years, this has changed. There has been an emphasis in the community to improve infrastructure, and as a result, it’s gotten much easier to actually use models. Now, you can take a new model, write your API, and hit deploy, with no DevOps needed.
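The "write your API, hit deploy" workflow typically means implementing a small predictor class that the platform wraps in a web server for you. The class below is modeled loosely on the predictor interface used by platforms like Cortex (the project mentioned in the disclaimer); exact signatures vary by version, so treat it as an illustrative sketch, not a copy-paste deployment.

```python
# Sketch of a deployable predictor: the platform calls __init__ once at
# startup and predict() once per request.

class PythonPredictor:
    def __init__(self, config: dict):
        # In a real deployment this would download and load the model once,
        # e.g. a GPT-2 checkpoint via the transformers library (assumed).
        self.model_name = config.get("model", "gpt2")

    def predict(self, payload: dict) -> dict:
        # Stub: echoes the input instead of running real inference.
        return {"model": self.model_name,
                "generated": payload["text"] + " [generated continuation]"}

predictor = PythonPredictor({"model": "gpt2"})
print(predictor.predict({"text": "Hello"}))
```

The point is the division of labor: you write the two methods above, and the platform handles containers, GPUs, autoscaling, and request routing.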
GPT-3 is an extreme example of this trend. The model, which is almost certainly too large for most teams to host, was actually released as an API.
While this move rankled many, it had a secondary effect. All of a sudden, using the most powerful language model in the world was easier than sending a text message with Twilio or setting up payments with Stripe.
In other words, you could call GPT-3 the most complex language model in history, but you could also call it just another API.
The number of people who can query an API, as it turns out, is orders of magnitude higher than the number of people that can deploy GPT-2 to production, hence the huge number of GPT-3 projects.
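"Just another API" is literal: using GPT-3 amounts to an authenticated HTTP request. The sketch below builds (but does not send) such a request; the endpoint URL and field names reflect the 2020 beta and may have changed, so consult OpenAI's current API reference before relying on them.

```python
import json

# Build a GPT-3 completion request (2020-era beta endpoint, assumed).
def build_completion_request(prompt: str, api_key: str,
                             max_tokens: int = 64) -> dict:
    return {
        "url": "https://api.openai.com/v1/engines/davinci/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": json.dumps({"prompt": prompt, "max_tokens": max_tokens}),
    }

req = build_completion_request("Write a haiku about APIs:", "sk-XXXX")
print(req["url"])
```

Compare this with the GPU cluster required to self-host GPT-2: the entire barrier to entry has collapsed into obtaining an API key.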
Machine learning engineering is mainstream now
GPT-3’s hype train is a convergence of things. It does have unprecedented accuracy, but it is also incredibly usable, and was released at a time when machine learning engineering has matured as an ecosystem and discipline.
For context, machine learning engineering is a field focused on building applications out of models. “How can I train a model to most accurately generate text?” is an ML research question. “How can I use GPT-2 to write folk music?” is a machine learning engineering question.
Because the machine learning engineering community is growing rapidly, companies are releasing new models the way they release web frameworks, hoping to attract engineers to build with them. Usability, therefore, has to be a consideration: they want to release not just the most powerful model, but the most used one.
Obviously, the proliferation of machine learning has many implications, but for engineers, there are two big conclusions to draw from this GPT-3 situation:
- It is easier than ever for you to actually build with machine learning.
- It is unlikely that in the near future you will be working on a piece of software that doesn’t incorporate machine learning in some way.
Machine learning is becoming a standard part of the software stack, and that trend is only accelerating. If you’re not already, it’s time to get familiar with production machine learning.
Translated from: https://towardsdatascience.com/why-are-you-seeing-gpt-3-everywhere-f156a71b77b0