【63 Pandas+Pyecharts | Pop Mart Weibo Hot-Search Comment Analysis and Visualization】

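Contents

🏳️‍🌈 1. Import modules
🏳️‍🌈 2. Data processing with Pandas
- 2.1 Read the data
- 2.2 Data overview
- 2.3 Remove duplicates
- 2.4 Drop missing values
- 2.5 Time processing
- 2.6 Gender processing
- 2.7 Comment text processing
🏳️‍🌈 3. Data visualization with Pyecharts
- 3.1 IP location distribution of commenters
- 3.2 Topic like-count trend
- 3.3 Topic comment-count trend
- 3.4 Comment volume by time of day
- 3.5 Distribution of comment likes
- 3.6 Sentiment distribution
- 3.7 Gender share of users
- 3.8 Comment word cloud
🏳️‍🌈 4. Visualization project source code + data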

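Hi everyone, I'm 👉 【Python當打之年】

In this issue we use Python to analyze the "Weibo Pop Mart hot-search comments dataset" and look at: the map distribution of commenters' IP locations, the topic like-count trend, the topic comment-count trend, comment volume by time of day, sentiment distribution, gender share of users, a word cloud of comment text, and more. I hope it is helpful; if you have questions or see something that could be improved, feel free to contact the author.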


Libraries used:

Pandas: data processing
Pyecharts: data visualization

🏳️‍🌈 1. Import modules

import jieba
import pandas as pd
from snownlp import SnowNLP
from pyecharts.charts import *
from pyecharts import options as opts
import warnings
warnings.filterwarnings('ignore')

🏳️‍🌈 2. Data processing with Pandas

2.1 Read the data

df = pd.read_excel('微博泡泡瑪特數據.xlsx')


2.2 Data overview

df.info()


2.3 Remove duplicates

df1 = df.drop_duplicates()

2.4 Drop missing values

df1 = df1.dropna()
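
To make the effect of these two cleaning steps concrete, here is a minimal sketch on a toy frame. The column names `user` and `comment` and the sample rows are made up for illustration and are not taken from the dataset. The extra `.copy()` is an optional addition that gives an independent frame, so later column assignments do not act on a view.

```python
import pandas as pd

# Toy frame standing in for the Weibo export (hypothetical columns and rows).
raw = pd.DataFrame({
    "user": ["a", "a", "b", "c"],
    "comment": ["nice", "nice", None, "cute"],
})

# Drop exact duplicate rows first, then rows containing any missing value.
clean = raw.drop_duplicates().dropna().copy()

print(len(raw), "rows before,", len(clean), "rows after")
```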

2.5 Time processing

df1['發布時間_s'] = df1['時間'].str[:10]
df1['時間_d'] = pd.to_datetime(df1['時間']).dt.day
df1['時間_h'] = pd.to_datetime(df1['時間']).dt.hour
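
As a small sketch of what these derived columns contain, the same idea can be run on two made-up timestamps. The `YYYY-MM-DD HH:MM:SS` format is an assumption about how the `時間` column is stored. This variant parses the column once and reuses the result rather than calling `pd.to_datetime` twice.

```python
import pandas as pd

# Made-up timestamps; the real column format is assumed, not confirmed.
demo = pd.DataFrame({"時間": ["2025-06-08 19:35:12", "2025-06-09 07:05:40"]})

ts = pd.to_datetime(demo["時間"])                # parse once, reuse below
demo["發布時間_s"] = ts.dt.strftime("%Y-%m-%d")  # calendar date as a string
demo["時間_d"] = ts.dt.day                       # day of month
demo["時間_h"] = ts.dt.hour                      # hour of day, 0-23

print(demo)
```

Formatting the parsed value with `strftime` does not depend on the raw string having the date in its first ten characters, which slicing with `.str[:10]` does.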

2.6 Gender processing

df1['性別'] = df1['性別'].replace({'f':'女性','m':'男性'})

2.7 Comment text processing

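# comments: the comment texts of df1 as a list; its definition is not shown in this excerpt
score = []
for comm in comments:
    s = SnowNLP(comm)  # s.sentiments is a probability between 0 and 1
    score.append(round(s.sentiments, 4))
df1['情感評分'] = score
# Labels: 消極 = negative, 中性 = neutral, 積極 = positive; include_lowest=True keeps a score of exactly 0 in the first bin
df1['情感評分區間'] = pd.cut(df1['情感評分'], bins=[0, 0.3, 0.7, 1], labels=['消極', '中性', '積極'], include_lowest=True)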
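
How `pd.cut` treats the bin edges matters for scores that land exactly on 0, 0.3 or 0.7. The sketch below uses made-up scores (no SnowNLP call) to show that intervals are right-closed by default, so 0.3 falls in the first bin and 0.7 in the second, and that a score of exactly 0 becomes NaN unless `include_lowest=True` is passed.

```python
import pandas as pd

# Made-up sentiment scores, chosen to hit the bin edges.
scores = pd.Series([0.0, 0.12, 0.3, 0.55, 0.7, 0.98])
bins = [0, 0.3, 0.7, 1]
names = ["消極", "中性", "積極"]  # negative, neutral, positive

# Default behaviour: bins are (0, 0.3], (0.3, 0.7], (0.7, 1].
labels = pd.cut(scores, bins=bins, labels=names)

# With include_lowest=True the first bin becomes [0, 0.3].
labels_incl = pd.cut(scores, bins=bins, labels=names, include_lowest=True)

print(pd.DataFrame({"score": scores, "default": labels, "include_lowest": labels_incl}))
```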


🏳️‍🌈 3. Data visualization with Pyecharts

3.1 IP location distribution of commenters

def get_chart():
    # data, subtitle and range_color are defined elsewhere in the full source
    chart = (
        Map()
        .add('', data, 'china')
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='1-用戶評論IP分布',
                subtitle=subtitle,
                pos_top='2%',
                pos_left='center',
                title_textstyle_opts=opts.TextStyleOpts(font_size=20),
            ),
            visualmap_opts=opts.VisualMapOpts(
                is_show=True,
                pos_left='15%',
                pos_bottom='10%',
                range_color=range_color,
            ),
            legend_opts=opts.LegendOpts(is_show=False),
        )
    )
    return chart


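Comment counts are noticeably higher in the eastern regions than in central and western China, and the coastal provinces stand out most, which also indirectly reflects local economic conditions.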
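
The `data` argument passed to the map is not shown in this article. One plausible way to build it, sketched here with a hypothetical column name `ip_region` and made-up rows, is to count comments per region and convert the counts into `[name, value]` pairs.

```python
import pandas as pd

# Hypothetical column name and sample values; the real dataset may differ.
demo = pd.DataFrame({"ip_region": ["廣東", "廣東", "北京", "上海", "廣東", "北京"]})

# value_counts sorts regions by frequency, highest first.
counts = demo["ip_region"].value_counts()

# Convert to a list of [region, count] pairs with plain Python ints.
data = [[region, int(n)] for region, n in counts.items()]

print(data)
```

Region names generally have to match the names used by the map asset, so some normalization of the raw IP labels may be needed before plotting.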

3.2 Topic like-count trend


The topic's popularity peaked on 06-08 and declined steadily afterwards, which matches the usual pattern of public-opinion events.
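
The code for this chart is not included in the article. A minimal sketch of the aggregation behind a daily trend line could look like the following, assuming a hypothetical `likes` column alongside the `發布時間_s` date column from section 2.5; the rows are made up.

```python
import pandas as pd

# Made-up rows; "likes" is a hypothetical column name.
demo = pd.DataFrame({
    "發布時間_s": ["2025-06-08", "2025-06-08", "2025-06-09", "2025-06-10"],
    "likes": [120, 80, 60, 15],
})

# Total likes per calendar day, in chronological order.
daily = demo.groupby("發布時間_s")["likes"].sum().sort_index()

x_data = daily.index.tolist()
y_data = [int(v) for v in daily]

print(list(zip(x_data, y_data)))
```

The resulting `x_data` and `y_data` lists have the shape that a `Line` chart's `add_xaxis` and `add_yaxis` calls accept.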

3.3 Topic comment-count trend

def get_chart():
    # x_data, y_data, subtitle and range_color are defined elsewhere in the full source
    chart = (
        Line()
        .add_xaxis(x_data)
        .add_yaxis('', y_data)
        .set_colors(range_color[1])
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='3-話題評論熱度趨勢',
                subtitle=subtitle,
                pos_top='2%',
                pos_left='center',
                title_textstyle_opts=opts.TextStyleOpts(font_size=20),
            ),
            legend_opts=opts.LegendOpts(is_show=False),
        )
    )
    return chart


3.4 Comment volume by time of day


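Looking at when comments were posted, volume peaks in the evening between 19:00 and 21:00 and is relatively flat at other times; a secondary peak appears in the morning between 07:00 and 09:00, which coincides with the morning commute.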
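
The hourly counts behind a chart like this can be derived from the `時間_h` column created in section 2.5. The sketch below uses made-up hours and reindexes to all 24 hours so that hours without comments appear as zero instead of being dropped; this is one possible approach, not necessarily the author's.

```python
import pandas as pd

# Made-up hour values standing in for df1['時間_h'].
demo = pd.DataFrame({"時間_h": [19, 20, 20, 8, 19, 20]})

# Count comments per hour and fill the missing hours with 0.
hourly = demo["時間_h"].value_counts().reindex(range(24), fill_value=0)

x_data = [f"{h:02d}:00" for h in hourly.index]  # "00:00" ... "23:00"
y_data = hourly.tolist()
peak_hour = int(hourly.idxmax())

print("peak hour:", x_data[peak_hour], "with", y_data[peak_hour], "comments")
```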

3.5 Distribution of comment likes

def get_chart():
    # x_data, y_data, subtitle and range_color are defined elsewhere in the full source
    chart = (
        Scatter()
        .add_xaxis(x_data)
        .add_yaxis('', y_data, label_opts=opts.LabelOpts(is_show=False))
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='5-評論點贊量分布',
                subtitle=subtitle,
                pos_top='2%',
                pos_left='center',
                title_textstyle_opts=opts.TextStyleOpts(font_size=20),
            ),
            visualmap_opts=opts.VisualMapOpts(is_show=False, range_color=range_color),
            legend_opts=opts.LegendOpts(is_show=False),
        )
    )
    return chart


3.6 Sentiment distribution


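In terms of sentiment, positive comments still make up the largest share, but the gap with neutral comments is small, suggesting that positive and negative opinion fluctuate to some degree.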
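
The shares described here can be computed from the `情感評分區間` column produced in section 2.7. The following sketch uses made-up labels to show one way of turning them into percentage pairs suitable for a pie or bar chart; the name `datas` simply mirrors the variable used in the pie chart code of this article.

```python
import pandas as pd

# Made-up sentiment labels standing in for df1['情感評分區間'].
demo = pd.Series(["積極", "中性", "積極", "消極", "中性", "積極"], name="情感評分區間")

# Share of each label in percent, rounded to one decimal place.
share = demo.value_counts(normalize=True).mul(100).round(1)

datas = [[label, float(pct)] for label, pct in share.items()]

print(datas)
```

Comparing the percentages directly makes a statement such as "the gap between positive and neutral is small" checkable against the data.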

3.7 Gender share of users

def get_chart():
    # datas and subtitle are defined elsewhere in the full source
    chart = (
        Pie()
        .add('', datas, center=['50%', '50%'])
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='7-用戶性別占比',
                subtitle=subtitle,
                pos_top='2%',
                pos_left='center',
                title_textstyle_opts=opts.TextStyleOpts(font_size=20),
            ),
            legend_opts=opts.LegendOpts(is_show=True, pos_top='12%'),
        )
    )
    return chart


Male and female users are roughly evenly split, suggesting that this topic has little to do with gender.

3.8 Comment word cloud

def get_chart():
    # words and range_color are defined elsewhere in the full source
    chart = (
        WordCloud()
        .add('', words, word_size_range=[20, 50])
        .set_global_opts(
            title_opts=opts.TitleOpts(
                title='8-評論內容詞云',
                pos_top='2%',
                pos_left='center',
                title_textstyle_opts=opts.TextStyleOpts(font_size=20),
            ),
            visualmap_opts=opts.VisualMapOpts(is_show=False, range_color=range_color),
        )
    )
    return chart
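
The `words` list fed to the word cloud is not shown in the article. A minimal sketch of how word frequencies could be prepared is given below. The helper `count_words`, its parameters, the stopword set and the sample comments are all hypothetical. To keep the example runnable without extra dependencies it tokenizes pre-segmented text with `str.split`; on real comments a segmenter such as `jieba.lcut` could be passed as `tokenize` instead.

```python
from collections import Counter


def count_words(comments, tokenize, stopwords=frozenset(), min_len=2):
    """Return (word, count) pairs sorted by descending frequency.

    Hypothetical helper: tokens shorter than min_len or listed in
    stopwords are skipped.
    """
    counter = Counter()
    for text in comments:
        for token in tokenize(text):
            token = token.strip()
            if len(token) >= min_len and token not in stopwords:
                counter[token] += 1
    return counter.most_common()


# Made-up, already segmented comments (tokens separated by spaces).
sample = ["泡泡瑪特 好 可愛", "泡泡瑪特 太 難 搶", "可愛 但是 難 搶"]

words = count_words(sample, str.split, stopwords={"但是"})

print(words)
```

Filtering single-character tokens and stopwords before counting keeps filler words from dominating the cloud.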


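🏳️‍🌈 4. Visualization project source code + data

Link: 【All visualization project source code + data】

That's everything for this issue. Go ahead and try it yourself. Original work takes effort, so if you liked it, feel free to like, bookmark, or share it (with attribution) so more people can see it.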
