Reading a Word Document with Python, Counting Word Frequencies, and Exporting to Excel

1. Install the required packages

```
# Each package is listed twice: first from the default index, then from the
# Tsinghua mirror. Run whichever one works for your network.

# Read .docx files
!pip install python-docx
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple python-docx
# Chinese and English word segmentation
!pip install jieba
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple jieba
# Export to Excel
!pip install pandas
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas
```

2. Read the docx file into one large string

```python
from docx import Document

# Open the Word file and join the text of every paragraph with spaces
document = Document("Python.docx")
content = " ".join(para.text for para in document.paragraphs)
```
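`document.paragraphs` covers body paragraphs only, so text stored inside tables is not part of `content`. If the Word file keeps material in tables, a sketch along these lines could collect both. `collect_text` is a hypothetical helper name, not part of python-docx; it relies only on the `paragraphs`, `tables`, `rows`, `cells`, and `text` attributes that python-docx objects expose.

```python
def collect_text(document):
    """Join paragraph text and table-cell text into one string.

    `document` is assumed to be a python-docx Document (or any object
    exposing the same `paragraphs` and `tables` attributes).
    """
    parts = [para.text for para in document.paragraphs]
    for table in document.tables:
        for row in table.rows:
            for cell in row.cells:
                parts.append(cell.text)
    # Skip empty strings so the result has no runs of blank separators
    return " ".join(part for part in parts if part)
```

With this helper, the last line of the step would become `content = collect_text(document)`. Merged table cells may appear more than once in the result, which would inflate counts slightly.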

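3. Chinese word segmentation

```python
import jieba

# Precise mode (cut_all=False); jieba.cut returns a generator
seg_list = jieba.cut(content, cut_all=False)
print(type(seg_list))

# Drop punctuation and single characters that carry no meaning
seg_list = [
    word
    for word in seg_list
    if len(word) > 1
]
print(seg_list[:30])
```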
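The `len(word) > 1` test removes single characters, but multi-character punctuation such as `……` and pure numbers still pass through. A somewhat stricter filter might look like the sketch below. `is_meaningful` and `STOPWORDS` are hypothetical names, and the stopword set is a tiny made-up sample; neither is part of jieba.

```python
# Illustrative stopword set (an assumption); replace with a real list as needed
STOPWORDS = {"我們", "可以", "the", "and"}


def is_meaningful(token):
    """Keep tokens longer than one character that contain at least one
    letter or CJK character and are not in STOPWORDS."""
    token = token.strip()
    if len(token) <= 1 or token.lower() in STOPWORDS:
        return False
    # str.isalpha() is True for CJK characters as well as Latin letters
    return any(ch.isalpha() for ch in token)


tokens = ["Python", "……", "的", "我們", "2023", "詞頻", " "]
filtered = [t for t in tokens if is_meaningful(t)]
print(filtered)  # ['Python', '詞頻']
```

In the pipeline, the condition `if len(word) > 1` could be swapped for `if is_meaningful(word)`.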

4. Count word frequencies

```python
from collections import Counter

counter = Counter(seg_list)
# Preview the first 10 entries (insertion order, not yet sorted by count)
for key, count in list(counter.items())[:10]:
    print(key, count)
```
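The preview loop prints entries in the order they were first seen, which is not necessarily the most frequent ones. `Counter.most_common` returns entries sorted by count, so a quick look at the top words needs no extra sorting. The token list below is made up for illustration.

```python
from collections import Counter

# Made-up tokens standing in for seg_list
sample_words = ["詞頻", "Python", "詞頻", "Excel", "Python", "詞頻"]
sample_counter = Counter(sample_words)

# most_common(n) returns the n highest counts as (word, count) pairs
top_two = sample_counter.most_common(2)
for word, count in top_two:
    print(word, count)
```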

5. Build a pandas DataFrame and sort it

```python
import pandas as pd

# One row per word, sorted from most to least frequent
df = pd.DataFrame(list(counter.items()), columns=["word", "count"])
df.sort_values(by="count", ascending=False, inplace=True)
df.head()
```
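The title promises Excel output, but the steps stop at the sorted DataFrame. One way to finish is sketched here under a few assumptions: `DataFrame.to_excel` writes an `.xlsx` file and needs an Excel writer engine such as `openpyxl` to be installed (`pip install openpyxl`). The function names and the file name `word_count.xlsx` are placeholders, not from the original.

```python
from collections import Counter

import pandas as pd


def frequencies_to_frame(counter):
    """Build a DataFrame of (word, count) rows, most frequent first."""
    df = pd.DataFrame(list(counter.items()), columns=["word", "count"])
    df = df.sort_values(by="count", ascending=False)
    # Renumber rows 0..n-1 after sorting
    return df.reset_index(drop=True)


def save_frequencies(df, path="word_count.xlsx"):
    """Write the table to an Excel file without the index column.

    Assumes an Excel writer engine such as openpyxl is installed.
    """
    df.to_excel(path, index=False)


# Made-up counts for illustration
example = frequencies_to_frame(Counter({"Python": 2, "詞頻": 5, "Excel": 1}))
print(example)
```

Calling `save_frequencies(df)` on the DataFrame from this step should produce a spreadsheet with a `word` column and a `count` column.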

Converting a list to a dict

```python
a = ['hello', 'world', '1', '2']
# Pair even-indexed items (keys) with odd-indexed items (values)
b = dict(zip(a[0::2], a[1::2]))
b
```
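`zip` stops at the shorter input, so a list with an odd number of items silently loses its last element. The sketch below shows that behavior together with one possible guard. `pairs_to_dict` is a hypothetical helper name.

```python
def pairs_to_dict(items):
    """Turn [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}.

    Raises ValueError for odd-length input instead of dropping the last key.
    """
    if len(items) % 2 != 0:
        raise ValueError("expected an even number of items")
    return dict(zip(items[0::2], items[1::2]))


print(pairs_to_dict(["hello", "world", "1", "2"]))  # {'hello': 'world', '1': '2'}

# Plain zip drops the unpaired trailing item "c"
odd = ["a", "b", "c"]
print(dict(zip(odd[0::2], odd[1::2])))  # {'a': 'b'}
```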