【Python NLTK Natural Language Processing Library】

Installation

import nltk
nltk.download()

Running this opens the NLTK downloader window; click Download to fetch the data.
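
The downloader window is awkward in scripts or on machines without a display. As a minimal sketch, the data can also be requested by name. The helper `download_resources` and its `downloader` parameter are illustrative names, not part of NLTK, and the resource identifiers are assumptions chosen to cover the examples in this post; they vary by NLTK version (newer releases may ask for `punkt_tab` and `averaged_perceptron_tagger_eng` instead).

```python
# Resource names are assumptions; adjust them to your NLTK version.
RESOURCES = ["punkt", "stopwords", "averaged_perceptron_tagger"]


def download_resources(resources, downloader=None):
    """Download each named resource and report which ones succeeded.

    `downloader` is a hypothetical hook for testing; by default it is
    nltk.download, which returns True on success and False otherwise.
    """
    if downloader is None:
        import nltk  # imported lazily so this module loads without NLTK
        downloader = nltk.download
    return {name: bool(downloader(name, quiet=True)) for name in resources}


# Typical call (needs network access):
# status = download_resources(RESOURCES)
# print(status)
```

If NLTK later raises a `LookupError`, the message names the missing resource, which can then be added to the list.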

Tokenize

###Word tokenization

from nltk.tokenize import word_tokenize
text = "The vendor paid $20,000,000."
tokens = word_tokenize(text)
print(tokens)

Output

['The', 'vendor', 'paid', '$', '20,000,000', '.']
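
Note that the currency sign is split off while the number with its thousands separators stays in one piece. The sketch below reproduces that behaviour for this one sentence with a plain regular expression. It is a simplified stand-in, not the rule set `word_tokenize` uses, and `TOKEN_PATTERN` and `simple_tokenize` are hypothetical names.

```python
import re

# Simplified pattern, NOT the rules NLTK uses: try, in order,
# a currency sign, a number with thousands separators, a word,
# and finally any single punctuation character.
TOKEN_PATTERN = re.compile(r"\$|\d+(?:,\d{3})*(?:\.\d+)?|\w+|[^\w\s]")


def simple_tokenize(text):
    """Return the non-overlapping matches of TOKEN_PATTERN, left to right."""
    return TOKEN_PATTERN.findall(text)


print(simple_tokenize("The vendor paid $20,000,000."))
# ['The', 'vendor', 'paid', '$', '20,000,000', '.']
```

A pattern like this ignores contractions, quotes and many other cases, which is the reason to prefer the library tokenizer for real text.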

###Sentence tokenization

import nltk
text = "I am Angela. I am happy."
sentences = nltk.sent_tokenize(text)
print(sentences)

Output

['I am Angela.', 'I am happy.']
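
For comparison, here is a naive splitter that breaks after every `.`, `!` or `?` followed by whitespace. It is only a sketch to show why a dedicated sentence tokenizer is useful; `naive_sent_split` is a hypothetical helper and is not how `sent_tokenize` works internally.

```python
import re


def naive_sent_split(text):
    """Split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


print(naive_sent_split("I am Angela. I am happy."))
# ['I am Angela.', 'I am happy.']

# The naive rule also splits after abbreviations such as "Dr.":
print(naive_sent_split("Dr. Lee is here. She is happy."))
# ['Dr.', 'Lee is here.', 'She is happy.']
```

`sent_tokenize` relies on a pretrained Punkt model, which is designed to recognise abbreviations instead of treating every period as a sentence end.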

###Chinese word segmentation (with jieba)

from jieba import lcut
chinese_sentence = "我正在練習自然語言處理。"
chinese_tokens = lcut(chinese_sentence)
print(chinese_tokens)

Output

['我', '正在', '練習', '自然', '語言', '處理', '。']

Stop words

Filtering stop words

from nltk.corpus import stopwords  
from nltk.tokenize import word_tokenize  
text = "I would like to watch movie."  
tokens = word_tokenize(text)
print(tokens)  
stopwords_list = set(stopwords.words('english'))  
filtered_tokens = [word for word in tokens if word.lower() not in stopwords_list]
print(filtered_tokens)

Output

['I', 'would', 'like', 'to', 'watch', 'movie', '.']
['would', 'like', 'watch', 'movie', '.']
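
The filtered list still contains the final period, because punctuation is not in the stop word list. A minimal sketch of removing both in one pass follows. `DEMO_STOPWORDS` is a tiny hand-written set used here so the example runs without NLTK data, and `filter_tokens` is a hypothetical helper; in practice `set(stopwords.words('english'))` would be passed instead.

```python
import string

# Tiny hand-written stop list for illustration only; the real
# stopwords.words('english') list is much longer.
DEMO_STOPWORDS = {"i", "to", "a", "the", "was", "by", "me"}


def filter_tokens(tokens, stop_words=DEMO_STOPWORDS):
    """Drop stop words (case-insensitively) and pure-punctuation tokens."""
    return [
        tok for tok in tokens
        if tok.lower() not in stop_words
        and not all(ch in string.punctuation for ch in tok)
    ]


tokens = ["I", "would", "like", "to", "watch", "movie", "."]
print(filter_tokens(tokens))
# ['would', 'like', 'watch', 'movie']
```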

Part-of-speech tagging

import nltk
sentence = "I am happy."
tokens = nltk.word_tokenize(sentence)
pos_tags = nltk.pos_tag(tokens)
print(pos_tags)

Output

[('I', 'PRP'), ('am', 'VBP'), ('happy', 'JJ'), ('.', '.')]
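
The tags come from the Penn Treebank tag set. As a small sketch, the lookup table below spells out the four tags that appear in this output; `TAG_DESCRIPTIONS` and `describe_tags` are hypothetical names, and the table is deliberately incomplete.

```python
# Descriptions for the tags in the output above (Penn Treebank tag set).
TAG_DESCRIPTIONS = {
    "PRP": "personal pronoun",
    "VBP": "verb, non-3rd person singular present",
    "JJ": "adjective",
    ".": "sentence-final punctuation",
}


def describe_tags(tagged_tokens):
    """Attach a description to each (word, tag) pair; unknown tags get '?'."""
    return [(word, tag, TAG_DESCRIPTIONS.get(tag, "?")) for word, tag in tagged_tokens]


pos_tags = [("I", "PRP"), ("am", "VBP"), ("happy", "JJ"), (".", ".")]
for word, tag, meaning in describe_tags(pos_tags):
    print(f"{word}\t{tag}\t{meaning}")
```

NLTK also provides `nltk.help.upenn_tagset`, which prints the description of a tag once the tag set data has been downloaded.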

Word frequency

import nltk
from nltk.corpus import stopwords
sentence="I would like to buy a book. The book was bought by me."
full_stop = "."
tokens = nltk.word_tokenize(sentence.lower())
stopwords_list = set(stopwords.words('english'))
stopwords_list.add(full_stop)
filtered_tokens = [word for word in tokens if word not in stopwords_list]
print(filtered_tokens)
freq = nltk.FreqDist(filtered_tokens)
for key, val in freq.items():
    print(str(key) + ':' + str(val))
standard_freq=freq.most_common(3)
print(standard_freq)

Output

['would', 'like', 'buy', 'book', 'book', 'bought']
would:1
like:1
buy:1
book:2
bought:1
[('book', 2), ('would', 1), ('like', 1)]
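
`FreqDist` is a subclass of `collections.Counter`, so the counting step can be checked with the standard library alone. The sketch below starts from the filtered tokens printed above; the relative frequency calculation and the name `book_share` are additions for illustration.

```python
from collections import Counter

filtered_tokens = ["would", "like", "buy", "book", "book", "bought"]
counts = Counter(filtered_tokens)

# Same ranking as freq.most_common(3); ties keep first-seen order.
print(counts.most_common(3))
# [('book', 2), ('would', 1), ('like', 1)]

# Relative frequency of one word: its count divided by the total.
total = sum(counts.values())
book_share = counts["book"] / total
print(f"book: {book_share:.2f}")
# book: 0.33
```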
