Text decoding means repeatedly predicting the next word from everything entered so far. The model is first trained on large amounts of text; once trained, it keeps predicting the next word based on what it has learned. It works a bit like a parrot: if you keep saying "hello, handsome" to it, then one day when you say "hello", it will naturally follow up with "handsome". Text decoding works the same way.
The training data is far larger, of course. Besides "hello, handsome", the model has also seen "hello, beauty". So how does the AI know which one to say? It looks at the preceding context: if, in the articles we fed it, the word "woman" tends to co-occur with "beauty", then when "woman" appears earlier in the text and "hello" follows, the model knows "beauty" is more probable than "handsome", and "beauty" comes out first.
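The intuition above, picking the more likely continuation given the context, can be sketched as a toy bigram model that simply counts which word follows which. This is a minimal illustration with a made-up corpus, not how GPT-2 actually works internally (GPT-2 uses a neural network over long contexts, not raw counts):

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the most frequent follower of `prev` (greedy decoding)."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training data: "handsome" follows "hello" more often,
# so greedy decoding picks it.
corpus = [
    "hello handsome",
    "hello handsome",
    "hello beauty",
]
model = train_bigram(corpus)
print(predict_next(model, "hello"))  # -> handsome
```

A real language model does the same kind of thing with probabilities computed by a neural network, conditioned on the whole preceding context rather than just the previous word.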
import mindspore
from mindnlp.transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("iiBcai/gpt2", mirror='modelscope')

# add the EOS token as PAD token to avoid warnings
model = GPT2LMHeadModel.from_pretrained("iiBcai/gpt2", pad_token_id=tokenizer.eos_token_id, mirror='modelscope')

# encode the context the generation is conditioned on
input_ids = tokenizer.encode('I enjoy walking with my cute dog', return_tensors='ms')

mindspore.set_seed(0)

# sample with top_k = 5, top_p = 0.95, and num_return_sequences = 3
sample_outputs = model.generate(
    input_ids,
    do_sample=True,
    max_length=50,
    top_k=5,
    top_p=0.95,
    num_return_sequences=3,
)

print("Output:\n" + 100 * '-')
for i, sample_output in enumerate(sample_outputs):
    print("{}: {}".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
Output:
----------------------------------------------------------------------------------------------------
0: I enjoy walking with my cute dog."My dog loves the smell of the dog. I'm so happy that she's happy with me."I love to walk with my dog. I'm so happy that she's happy
1: I enjoy walking with my cute dog. I'm a big fan of my cat and her dog, but I don't have the same enthusiasm for her. It's hard not to like her because it is my dog.My husband, who
2: I enjoy walking with my cute dog, but I'm also not sure I would want my dog to walk alone with me."She also told The Daily Beast that the dog is very protective."I think she's very protective of
As this example shows, given the input "I enjoy walking with my cute dog", the AI keeps writing a continuation, and overall the results look quite good.
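The `top_k` and `top_p` arguments passed to `generate()` above control which candidate tokens are allowed at each step before sampling. The filtering logic can be sketched over a toy probability distribution; this is a simplified re-implementation for illustration (the token names and probabilities are made up), not mindnlp's internal code:

```python
def top_k_top_p_filter(probs, top_k=5, top_p=0.95):
    """Keep the top_k most likely tokens, then keep the smallest set of
    those whose cumulative probability reaches top_p, and renormalize.
    `probs` maps token -> probability."""
    # Sort tokens by probability, descending.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    # top-k: truncate to the k most likely tokens.
    ranked = ranked[:top_k]
    # top-p (nucleus): keep tokens until cumulative mass reaches top_p.
    kept, cum = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}

# Hypothetical next-token distribution after some context.
probs = {"beauty": 0.5, "handsome": 0.3, "dog": 0.1,
         "cat": 0.06, "hi": 0.03, "!": 0.01}
filtered = top_k_top_p_filter(probs, top_k=5, top_p=0.9)
print(filtered)  # only "beauty", "handsome", "dog" survive
```

The model then samples the next token from this filtered, renormalized distribution, which is why each of the three returned sequences can differ while still staying fluent.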