Instability in Training Text GANs


Introduction

In text generation, a model is conventionally trained with maximum likelihood estimation (MLE) to generate text one token at a time. Each generated token is compared against the ground-truth token at the same position, and any mismatch is used to update the model. However, such training tends to make the generations generic or repetitive.

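To make the token-level supervision concrete, here is a minimal sketch in plain Python; the toy vocabulary, model probabilities, and sentences are made up for illustration:

```python
import math

# Hypothetical per-position distributions predicted by a model for a
# 4-token target sentence; each dict maps token -> probability.
predicted = [
    {"the": 0.6, "a": 0.4},
    {"cat": 0.5, "dog": 0.5},
    {"sat": 0.7, "ran": 0.3},
    {"<eos>": 0.9, "sat": 0.1},
]
target = ["the", "cat", "sat", "<eos>"]

def mle_loss(predicted, target):
    # Average negative log-likelihood of each ground-truth token,
    # computed independently at every position (teacher forcing).
    return sum(-math.log(p[tok]) for p, tok in zip(predicted, target)) / len(target)

loss = mle_loss(predicted, target)
```

Every position contributes its own gradient signal, which is what makes MLE training fast and stable, but also what biases it toward frequent, generic continuations.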

A Generative Adversarial Network (GAN) tackles this problem by introducing two models: a generator and a discriminator. The discriminator's goal is to determine whether a sentence x is real or fake (fake meaning generated by the model), whereas the generator attempts to produce sentences that fool the discriminator. The two models compete against each other, which improves both networks until the generator can produce human-like sentences.

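The two-player setup can be sketched as a single training step. The stand-in generator and discriminator below are illustrative placeholders, not real models:

```python
import random

def generator(noise):
    # Stand-in: maps a random seed to a "sentence" (list of tokens).
    random.seed(noise)
    return [random.choice(["the", "cat", "dog", "sat"]) for _ in range(4)]

def discriminator(sentence):
    # Stand-in: probability that the sentence is real. A real model
    # would be learned; here we simply favour sentences with a verb.
    return 0.9 if "sat" in sentence else 0.2

def adversarial_step(noise):
    fake = generator(noise)
    d_score = discriminator(fake)
    # G's objective is to maximize d_score on its fakes;
    # D's objective is to minimize it on the same fakes.
    g_loss = -d_score
    d_loss_on_fake = d_score
    return g_loss, d_loss_on_fake

g_loss, d_loss = adversarial_step(noise=42)
```

The two losses pull in exactly opposite directions, which is the source of both the power and the instability discussed below.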

Although the computer vision and text generation communities have reported some promising results, getting hands-on with this type of modeling is difficult.


Problems with GANs

  1. Mode Collapse (Lack of Diversity). This is a common problem in GAN training. Mode collapse occurs when the model ignores the input random noise and keeps generating the same sentence regardless of the input. Since the model's only objective is to fool the discriminator, finding a single point that does so is sufficient.

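A common way to catch mode collapse is to track a diversity metric over a batch of samples, e.g. the distinct-1 ratio (unique tokens divided by total tokens); the sample sentences below are made up:

```python
def distinct_1(sentences):
    # Ratio of unique tokens to total tokens across a batch of samples.
    tokens = [tok for s in sentences for tok in s]
    return len(set(tokens)) / len(tokens)

collapsed = [["i", "am", "fine"]] * 5          # G ignores the noise entirely
diverse = [["i", "am", "fine"],
           ["how", "are", "you"],
           ["the", "cat", "sat"],
           ["dogs", "like", "bones"],
           ["rain", "falls", "today"]]

# A collapsed generator scores far lower than a diverse one.
```

Watching this number drop during training is often the first visible symptom of collapse.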

  2. Unstable Training. The most important problem is ensuring that the generator and the discriminator stay on par with each other. If either one outperforms the other, the whole training becomes unstable and nothing useful is learned. For example, when the generator's loss drops steadily, it means the generator has found a way to fool the discriminator even though its generations are still immature. On the other hand, when the discriminator is overpowered, there is no new information for the generator to learn: every generation is judged fake, so the generator has to fall back on randomly changing words in search of a sentence that might fool the discriminator.

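One simple guard against this imbalance is to gate updates on the discriminator's recent accuracy; a sketch, with illustrative thresholds that are not taken from the article:

```python
def choose_update(d_accuracy, low=0.55, high=0.9):
    """Decide which network to update this step (thresholds are illustrative)."""
    if d_accuracy > high:       # D is overpowered: every fake is caught,
        return "generator"      # so give G extra steps to catch up.
    if d_accuracy < low:        # D is barely better than chance: its
        return "discriminator"  # feedback is noise, so strengthen D first.
    return "both"               # roughly balanced: normal alternating step.
```

This kind of accuracy-gated schedule is one heuristic among many; it does not fix instability, it only slows the runaway dynamics.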

  3. Intuition is NOT Enough. Sometimes your intended modeling is correct, but it simply does not work as you want; getting it to work requires more than a sound idea. Frequently you need hyperparameter tuning: tweaking the learning rate, trying different loss functions, using batch normalization, or trying different activation functions.


  4. Lots of Training Time. Some works report training for up to 400 epochs. That is tremendous compared with a Seq2Seq model, which might need only 50 epochs or so to produce well-structured generations. The cause of the slowness is exploration: G does not receive any explicit signal about which token is bad; rather, it receives a single signal for the whole generation. To produce a natural sentence, G has to explore many combinations of words to get there. How often do you think G can accidentally produce <eos> out of nowhere? With MLE, by contrast, the signal is perfectly clear: there should be an <eos>, followed immediately by <pad> tokens.

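The contrast between the two kinds of feedback can be sketched directly; the reward value and the example sentences are illustrative:

```python
def sequence_reward(sentence, discriminator_score):
    # Adversarial signal: one scalar for the entire generation;
    # every token shares the blame equally.
    return [discriminator_score] * len(sentence)

def per_token_signal(sentence, target):
    # MLE-style signal: an explicit right/wrong verdict at every
    # position, including for <eos> and the <pad> tokens after it.
    return [tok == ref for tok, ref in zip(sentence, target)]

gen = ["the", "cat", "ran", "<eos>", "<pad>"]
ref = ["the", "cat", "sat", "<eos>", "<pad>"]
```

Under MLE the third position is pinpointed as the mistake; under the adversarial signal, G only learns that the sentence as a whole scored 0.3, and must explore to find out why.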

Potential Solutions

Many approaches have been attempted to handle this type of training.


  1. Use the ADAM Optimizer. Some suggest using ADAM for the generator and SGD for the discriminator. Most importantly, some papers tweak ADAM's betas, e.g. betas=(0.5, 0.999).

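To see why lowering beta1 to 0.5 helps, note that beta1 controls Adam's exponential moving average of past gradients. A minimal illustration of just that first-moment average, with a made-up gradient sequence whose sign flips when the opponent changes:

```python
def first_moment(grads, beta1):
    # Adam's first-moment estimate: an exponential moving average
    # of the gradient, with decay rate beta1.
    m = 0.0
    for g in grads:
        m = beta1 * m + (1 - beta1) * g
    return m

# Gradient points one way for 10 steps, then the opponent updates
# and it flips sign for 3 steps.
grads = [1.0] * 10 + [-1.0] * 3
m_default = first_moment(grads, 0.9)  # default Adam beta1
m_gan = first_moment(grads, 0.5)      # commonly used in GAN training
```

With beta1=0.9 the momentum still points in the stale direction after the flip, while beta1=0.5 has already tracked the new one; shorter memory suits a loss surface that the other player keeps moving.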

  2. Wasserstein GAN. Some works report that using WGAN stabilizes training greatly. In our experiments, however, WGAN could not even reach the quality of a regular GAN. Perhaps we are missing something. (See? It's quite difficult.)

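For reference, the core of the original WGAN recipe is a difference of critic scores plus weight clipping to enforce the Lipschitz constraint; a dependency-free sketch with illustrative numbers:

```python
def critic_loss(real_scores, fake_scores):
    # The critic maximizes E[f(real)] - E[f(fake)], i.e. it minimizes:
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)

def clip_weights(weights, c=0.01):
    # Original WGAN enforces Lipschitz-ness by clipping every
    # critic weight into [-c, c] after each update.
    return [max(-c, min(c, w)) for w in weights]
```

Note that the scores are unbounded real numbers, not probabilities, which is why the critic's loss remains informative even when it easily separates real from fake.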

  3. GAN Variations. Some suggest trying KL-GAN or VAE-GAN. These can make the models easier to train.


  4. Input Noise to the Discriminator. To keep the discriminator's learning on par with the generator, which in general has a harder time than the discriminator, we add some noise to the discriminator's input and also use dropout to make things easier for the generator.

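A sketch of one such scheme, with Gaussian input noise that decays linearly over training (the schedule and constants are assumptions, not from the article):

```python
import random

def noisy_input(embedding, step, total_steps, start_std=0.5):
    # Gaussian noise whose standard deviation decays linearly to zero
    # over training, so D's task gets harder as G improves.
    std = start_std * max(0.0, 1.0 - step / total_steps)
    return [x + random.gauss(0.0, std) for x in embedding]
```

Early in training the blurred inputs keep the real and fake distributions overlapping, so the discriminator's gradients stay informative instead of saturating.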

  5. DCGAN (Deep Convolutional GAN). This applies only to computer vision tasks, but the model is known to avoid unstable training. Its key guidelines are to use LeakyReLU in the discriminator instead of plain ReLU, to use BatchNorm, and to replace pooling with strided convolutions.


  6. Ensemble of Discriminators. Instead of a single discriminator, multiple discriminators are trained on different batches so that each captures a different aspect of the data. The generator then cannot simply fool a single D; it has to generalize enough to fool all of them. This is also related to Dropout-GAN (many Ds, some of which are dropped during training).

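A minimal sketch of ensemble scoring with discriminator dropout; the scoring functions and drop rate here are stand-ins:

```python
import random

def ensemble_score(sentence, discriminators, drop_prob=0.3, rng=random):
    # Dropout-GAN style: each step, keep a random subset of the
    # discriminators and average only the kept ones.
    kept = [d for d in discriminators if rng.random() >= drop_prob]
    if not kept:                 # guard: never drop every discriminator
        kept = discriminators
    return sum(d(sentence) for d in kept) / len(kept)
```

Because the generator never knows which subset it will face, a solution that exploits one discriminator's blind spot no longer works, which counteracts mode collapse.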

  7. Parameter Tuning. This covers the learning rate, dropout ratio, batch size, and so on. It is difficult to determine how much better one model is than another, so some test multiple parameter settings and keep whichever works best. One bottleneck is that there is no reliable evaluation metric for GANs, which forces a lot of manual checking to judge quality.

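A simple grid search over these knobs can be sketched as follows; the evaluation function is a placeholder, since, as noted, judging GAN quality often comes down to manual inspection:

```python
import itertools

def grid_search(evaluate, grid):
    # Exhaustively try every combination of the listed values and
    # keep the configuration with the highest score.
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Illustrative search space; the values are examples, not recommendations.
grid = {"lr": [1e-3, 1e-4], "dropout": [0.1, 0.3], "batch_size": [32, 64]}
```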

  8. Scheduling G and D. Naive schedules, such as training G five times followed by D once, are reported to be useless in many works. If you want to try scheduling, do something more meaningful, for example:


    # e.g. train whichever network is currently behind
    while generator_is_behind():
        train_G()
    while discriminator_is_behind():
        train_D()

Conclusion

Adversarial text generation opens a new avenue for how a model is trained: instead of relying on MLE, one or more discriminators signal whether a generation is correct. However, such training has the downside of being quite hard to carry out. Many studies suggest tips for avoiding the problems described above; even so, you will need to try a variety of settings (and parameters) to ensure your generative model learns properly.


Further Reading

Translated from: https://towardsdatascience.com/instability-in-training-text-gan-20273d6a859a


