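Basic fastText usage, with Java and Python examples

This morning on the subway I saw a post on Zhihu where someone used fastText for text classification, so I tried it out once I got to the office. A quick search on GitHub showed that the original implementation is in C++, but Java and Python versions now exist as well, so I grabbed them to get some practice.

Python:
The Python version follows the post linked below. Its author gives a detailed implementation and also provides data that has already been segmented into Chinese words, which is convenient; thanks to the author. All the data the code uses is provided there: the linked post points to a Baidu Pan share where it can be downloaded. The Java interface uses the same data.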
http://blog.csdn.net/lxg0807/article/details/52960072

    import logging

    import fasttext

    logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

    # Training call, left commented out because a trained model is loaded below
    # classifier = fasttext.supervised("fasttext/news_fasttext_train.txt", "fasttext/news_fasttext.model", label_prefix="__label__")

    # Load the trained model
    classifier = fasttext.load_model('fasttext/news_fasttext.model.bin', label_prefix='__label__')
    result = classifier.test("fasttext/news_fasttext_test.txt")
    print(result.precision)
    print(result.recall)

    labels_right = []
    texts = []
    with open("fasttext/news_fasttext_test.txt") as fr:
        for line in fr:
            # Each line is "<text>\t__label__<label>"
            fields = line.split("\t")
            texts.append(fields[0])
            labels_right.append(fields[1].rstrip().replace("__label__", ""))

    # predict() returns a 2-D result: one list of labels per input text
    labels_predict = [e[0] for e in classifier.predict(texts)]

    text_labels = list(set(labels_right))
    text_predict_labels = list(set(labels_predict))
    print(text_predict_labels)
    print(text_labels)

    A = dict.fromkeys(text_labels, 0)          # correct predictions per class
    B = dict.fromkeys(text_labels, 0)          # test samples per class
    C = dict.fromkeys(text_predict_labels, 0)  # predictions per class
    for i in range(len(labels_right)):
        B[labels_right[i]] += 1
        C[labels_predict[i]] += 1
        if labels_right[i] == labels_predict[i]:
            A[labels_right[i]] += 1
    print(A)
    print(B)
    print(C)

    # Compute precision, recall and F-score for each class
    for key in B:
        # precision = correct / predicted; a class that is never predicted gets 0
        p = float(A[key]) / C[key] if C.get(key) else 0.0
        # recall = correct / actual
        r = float(A[key]) / B[key]
        f = p * r * 2 / (p + r) if p + r else 0.0
        print("%s:\tp:%f\tr:%f\tf:%f" % (key, p, r, f))
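
The per-class counting in the script does not depend on fastText, so it can be checked in isolation. Here is a minimal sketch for Python 3, assuming the gold and predicted labels are plain lists of strings; `per_class_metrics` is a name made up for this illustration. Precision is taken as correct / predicted and recall as correct / actual, and a class that is never predicted gets precision 0 instead of raising an error.

```python
from collections import Counter


def per_class_metrics(gold, predicted):
    """Return {label: (precision, recall, f1)} for two equal-length label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    actual = Counter(gold)        # samples per class in the test set
    guessed = Counter(predicted)  # predictions per class
    correct = Counter(g for g, p in zip(gold, predicted) if g == p)

    metrics = {}
    for label in actual:
        # precision: correct / predicted (0 if the class was never predicted)
        precision = correct[label] / guessed[label] if guessed[label] else 0.0
        # recall: correct / actual
        recall = correct[label] / actual[label]
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


if __name__ == "__main__":
    # Toy labels, invented for the example
    gold = ["edu", "edu", "game", "ent"]
    predicted = ["edu", "game", "game", "edu"]
    for label, (p, r, f) in sorted(per_class_metrics(gold, predicted).items()):
        print("%s:\tp:%f\tr:%f\tf:%f" % (label, p, r, f))
```

Using `Counter` avoids the `KeyError` that a plain dict raises when a label appears in the test set but never in the predictions.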

Java:
https://github.com/ivanhk/fastText_java

    package test;

    import java.util.List;

    import fasttext.FastText;
    import fasttext.Main;
    import fasttext.Pair;

    public class Test {
        public static void main(String[] args) throws Exception {
            // Training arguments
            String[] text = {
                    "supervised",
                    "-input",
                    "/Users/shuubiasahi/Documents/python/fasttext/news_fasttext_train.txt",
                    "-output", "/Users/shuubiasahi/Documents/faste.model", "-dim",
                    "10", "-lr", "0.1", "-wordNgrams", "2", "-minCount", "1",
                    "-bucket", "10000000", "-epoch", "5", "-thread", "4" };
            Main op = new Main();
            op.train(text);

            FastText fasttext = new FastText();
            String[] test = { "就讀", "科技", "學生", "學生", "學生" };
            fasttext.loadModel("/Users/shuubiasahi/Documents/faste.model.bin");
            // Get the six most likely labels together with their scores
            List<Pair<Float, String>> list = fasttext.predict(test, 6);
            for (Pair<Float, String> pair : list) {
                System.out.println("key is:" + pair.getKey() + "   value is:"
                        + pair.getValue());
            }
            // Probability of the most likely label
            System.out.println(Math.exp(list.get(0).getKey()));
        }
    }

Output:
key is:0.0   value is:__label__edu
key is:-17.75125   value is:__label__affairs
key is:-17.75125   value is:__label__economic
key is:-17.75125   value is:__label__ent
key is:-17.75125   value is:__label__fashion
key is:-17.75125   value is:__label__game
1.0
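
The keys in this output look like natural-log probabilities: the final line is `Math.exp` applied to the first key, and exp(0.0) is 1.0. As a small sketch under that assumption, the scores can be turned back into probabilities in Python. The list simply copies the printed values, and `to_probabilities` is a name invented here.

```python
import math


def to_probabilities(scored_labels):
    """Convert (log_probability, label) pairs into (probability, label) pairs."""
    return [(math.exp(score), label) for score, label in scored_labels]


# Values copied from the printed output (assumed to be natural-log probabilities)
scored = [
    (0.0, "__label__edu"),
    (-17.75125, "__label__affairs"),
    (-17.75125, "__label__economic"),
    (-17.75125, "__label__ent"),
    (-17.75125, "__label__fashion"),
    (-17.75125, "__label__game"),
]

for prob, label in to_probabilities(scored):
    print("%s\t%.3g" % (label, prob))
```

Read this way, the model puts essentially all of its probability on `__label__edu`, and the other five labels share the same very small score.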

Note that fastText is strict about the input format: each label must be written as "__label__" followed by the actual label name. That's all.
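
To make the format concrete, here is a minimal sketch that builds one training line from segmented words and a label, and parses it back. It assumes the layout the Python script reads (text, a tab, then the prefixed label) and space-separated tokens. `to_fasttext_line` and `parse_fasttext_line` are hypothetical helpers, not part of any fastText package.

```python
LABEL_PREFIX = "__label__"


def to_fasttext_line(words, label):
    """Join segmented words with spaces and append the prefixed label after a tab."""
    if not label or any(ch.isspace() for ch in label):
        raise ValueError("label must be a non-empty string without whitespace")
    text = " ".join(w for w in words if w.strip())
    return "%s\t%s%s" % (text, LABEL_PREFIX, label)


def parse_fasttext_line(line):
    """Inverse of to_fasttext_line: return (text, label) without the prefix."""
    text, prefixed = line.rstrip("\n").split("\t")
    if not prefixed.startswith(LABEL_PREFIX):
        raise ValueError("missing %s prefix" % LABEL_PREFIX)
    return text, prefixed[len(LABEL_PREFIX):]


line = to_fasttext_line(["就讀", "科技", "學生"], "edu")
print(line)
print(parse_fasttext_line(line))
```

Keeping the prefix in one constant makes it harder for the training file and the `label_prefix` argument passed at load time to drift apart.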

If you have any questions, contact me.

May 26, 2016: my model is now in production, and the results are quite good.