Instability in Training Text GAN
Introduction
In text generation, maximum likelihood estimation (MLE) is conventionally used to train a model to generate text one token at a time. Each generated token is compared against the ground-truth token, and any mismatch is used to update the model. However, such training tends to make the generation generic or repetitive.
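The token-level MLE objective described above can be sketched as follows (a minimal illustration, assuming a model that returns per-position logits over the vocabulary):

```python
import torch
import torch.nn.functional as F

def mle_loss(logits, targets):
    # logits: (batch, seq_len, vocab) model scores for each position;
    # targets: (batch, seq_len) ground-truth token ids.
    # Each generated position is compared against the ground-truth token
    # via cross-entropy, one token at a time.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
```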
A Generative Adversarial Network (GAN) tackles this problem by introducing two models: a generator and a discriminator. The discriminator's goal is to determine whether a sentence x is real or fake (fake meaning generated by a model), whereas the generator attempts to produce sentences that can fool the discriminator. The two models compete against each other, and both networks improve until the generator can produce human-like sentences.
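The two competing objectives can be written as the standard non-saturating GAN losses (a sketch; it assumes the discriminator outputs a probability that its input is real):

```python
import torch

def d_loss(d_real, d_fake):
    # Discriminator: push scores on real sentences toward 1
    # and scores on generated sentences toward 0.
    eps = 1e-8
    return -(torch.log(d_real + eps) + torch.log(1 - d_fake + eps)).mean()

def g_loss(d_fake):
    # Non-saturating generator loss: maximize log D(G(z)),
    # i.e. try to make the discriminator call fakes real.
    eps = 1e-8
    return -torch.log(d_fake + eps).mean()
```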
Although the computer vision and text generation communities have reported some promising results, getting hands-on with this type of modeling is difficult.
Problems with GANs
Mode Collapse (Lack of Diversity). This is a common problem in GAN training. Mode collapse occurs when the model ignores the input random noise and keeps generating the same sentence regardless of the input. In this situation the model is only trying to fool the discriminator, and finding a single output that does so is sufficient.
Unstable Training. The most important problem is ensuring that the generator and the discriminator stay on par with each other. If either one outperforms the other, the whole training becomes unstable, and no useful information is learned. For example, when the generator's loss is slowly decreasing, the generator has started to find a way to fool the discriminator even though its generations are still immature. On the other hand, when the discriminator is overpowering, there is no new information for the generator to learn: every generation is evaluated as fake, so the generator has to rely on randomly changing words in search of a sentence that might fool the discriminator.
Intuition is NOT Enough. Sometimes your intended modeling is correct, but it still does not work the way you want it to; it may require more than correct intuition. Frequently you need to tune hyperparameters: tweaking the learning rate, trying different loss functions, using batch normalization, or trying different activation functions.
Lots of Training Time. Some work reports training for up to 400 epochs. That is enormous compared with Seq2Seq, which might take only 50 epochs or so to reach a well-structured generation. The cause of the slowness is exploration: G does not receive any explicit signal about which token is bad; rather, it receives a single signal for the whole generation. To be able to produce a natural sentence, G needs to explore many combinations of words to get there. How often do you think G can accidentally produce <eos> out of nowhere? With MLE, the signal is very clear that there should be an <eos>, with <pad> tokens right after it.
Potential Solutions
Many approaches have been attempted to handle this type of training.
Use the Adam Optimizer. Some suggest using Adam for the generator and SGD for the discriminator. Most importantly, some papers tweak Adam's betas, e.g. betas=(0.5, 0.999).
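In PyTorch this setup looks like the following (the linear layers are stand-ins for real networks, and the learning rates are illustrative defaults, not values from the article):

```python
import torch
import torch.nn as nn

G = nn.Linear(16, 32)  # stand-in for the real generator
D = nn.Linear(32, 1)   # stand-in for the real discriminator

# Adam with beta1 lowered from the default 0.9 to 0.5 for the generator,
# plain SGD for the discriminator, as some work suggests.
g_opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_opt = torch.optim.SGD(D.parameters(), lr=1e-2)
```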
Wasserstein GAN. Some work reports that using WGAN stabilizes training greatly. In our experiments, however, WGAN could not even reach the quality of a regular GAN; perhaps we are missing something. (See? It's quite difficult.)
GAN Variations. Some suggest trying KL-GAN or VAE-GAN. These can make the models easier to train.
Input Noise to the Discriminator. To keep the discriminator's learning on par with the generator, which in general has a harder time than the discriminator, we add some noise to the discriminator's input, and also use dropout, to make things easier for the generator.
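The noise-injection part of this trick can be sketched as a one-liner (a common convention is to anneal sigma toward zero as training progresses):

```python
import torch

def noisy(x, sigma=0.1):
    # Add Gaussian instance noise to the discriminator's input,
    # blurring the real/fake boundary so D cannot win too easily.
    return x + sigma * torch.randn_like(x)
```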
DCGAN (Deep Convolutional GAN). This is only for computer vision tasks, but the model is known to avoid unstable training. The keys in this model are to avoid plain ReLU (using LeakyReLU instead), use BatchNorm, and use strided convolutions rather than pooling.
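A typical DCGAN-style discriminator block following those guidelines might look like this (kernel size 4, stride 2 is the usual choice; the exact channel counts are up to you):

```python
import torch
import torch.nn as nn

def d_block(in_ch, out_ch):
    # Strided convolution halves the spatial resolution (no pooling),
    # followed by BatchNorm and LeakyReLU, per the DCGAN guidelines.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )
```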
Ensemble of Discriminators. Instead of a single discriminator, multiple discriminators are trained on different batches so that each captures different aspects of the data. The generator then cannot just fool a single D; it has to generalize enough to fool all of them. This is also related to Dropout-GAN (many discriminators, with some dropped out during training).
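A minimal sketch of the ensemble idea: the generator's adversarial loss is averaged over several discriminators, so fooling one of them is not enough (the stand-in discriminators here are just linear layers producing a real/fake logit):

```python
import torch
import torch.nn as nn

def g_loss_ensemble(discriminators, fake):
    # Average the generator's loss over all discriminators; a Dropout-GAN
    # variant would additionally drop a random subset of Ds each step.
    eps = 1e-8
    losses = [-torch.log(torch.sigmoid(d(fake)) + eps).mean()
              for d in discriminators]
    return torch.stack(losses).mean()
```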
Parameter Tuning. Tune the learning rate, dropout ratio, batch size, and so on. It is difficult to determine how much better one model is than another, so some test multiple parameter settings and keep whichever works best. One bottleneck is that there is no standard evaluation metric for GANs, which forces a lot of manual checking to judge quality.
Scheduling G and D. Training G five times followed by D one time is reported to be useless in many works. If you want to try scheduling, do something more meaningful, for example:
    while generator_loss > threshold:
        train_G()
    while discriminator_loss > threshold:
        train_D()
Conclusion
Adversarial text generation opens a new avenue for how a model is trained: instead of relying on MLE, one or more discriminators signal whether the generation is correct. However, such training has the downside of being quite hard to carry out. Many studies suggest tips for avoiding the problems described above; even so, you need to try a variety of settings (or parameters) to make sure your generative model can learn properly.
Further Reading
Translated from: https://towardsdatascience.com/instability-in-training-text-gan-20273d6a859a