[Hands-on] Classifying User Review Data with DeepSeek

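In day-to-day work we often need to classify data, for example assigning a label to each piece of text. Doing this by hand is a large amount of work, and manual labels are not very accurate either. Tooling lets us improve both efficiency and accuracy.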

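1. Sample data

The columns are 用戶ID (user ID), 時間戳 (timestamp), 評論場景 (review scenario), and 評論內容 (review text). The reviews are left in Chinese because the analysis below uses SnowNLP, a Chinese-language sentiment library, and the code refers to these column names as written.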

用戶ID	時間戳	評論場景	評論內容
U001	2023/10/1 9:05	電商購物	"剛收到快遞，包裝完好，實物比圖片還漂亮！"
U001	2023/10/3 14:30	電商購物	"用了兩天發現電池續航很差，和宣傳不符，失望。"
U001	2023/10/5 11:15	客服溝通	"客服很快解決了問題，補償了優惠券，態度點贊！"
U002	2023/10/2 18:20	社交媒體	"今天和朋友聚餐，餐廳氛圍超棒，但菜品有點咸。"
U003	2023/10/4 10:00	旅行預訂	"航班延誤了3小時，機場服務混亂，體驗極差！"
U003	2023/10/4 15:45	旅行預訂	"酒店免費升級了海景房，意外驚喜！"

2. Data analysis

Data cleaning

Use Python to strip special symbols from the text.
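The script later in this section does not include a cleaning step, so here is a minimal sketch of one. It assumes that "special symbols" means anything other than Chinese characters, ASCII letters and digits, whitespace, and common punctuation. The function name `clean_text` and the exact character set are illustrative choices, not part of the original workflow.

```python
import re

# Assumption: keep CJK ideographs, ASCII letters/digits, whitespace,
# and a small set of Chinese/ASCII punctuation; drop everything else.
_DISALLOWED = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。！？、；：,.!?%]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove special symbols and collapse runs of whitespace."""
    text = _DISALLOWED.sub("", text)
    return _WHITESPACE.sub(" ", text).strip()


# The surrounding quotes and the star symbols are removed; the review text stays.
print(clean_text('"剛收到快遞，包裝完好★★ 實物比圖片還漂亮！"'))
```

Such a function could be applied to the 評論內容 column before sentiment scoring, for instance with `df["評論內容"].map(clean_text)`.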

Install dependencies

pip install pandas snownlp matplotlib openpyxl jinja2

Hands-on code

import pandas as pd
from snownlp import SnowNLP
import matplotlib.pyplot as plt

# 1. Load the data
df = pd.read_excel("數據分析.xlsx", sheet_name="Sheet1")
# Parse timestamps so that sorting is chronological rather than lexical
df["時間戳"] = pd.to_datetime(df["時間戳"])

# 2. Sentiment function (SnowNLP Chinese sentiment analysis)
def classify_sentiment(text):
    score = SnowNLP(text).sentiments
    if score > 0.6:
        return ("積極", score)  # positive
    elif score < 0.4:
        return ("消極", score)  # negative
    else:
        return ("中性", score)  # neutral

# Apply the classifier to every review
df[["情緒標簽", "情緒強度"]] = df["評論內容"].apply(lambda x: pd.Series(classify_sentiment(x)))

# 3. Build the summary report
report = df.groupby("情緒標簽").agg(
    評論數量=("用戶ID", "count"),
    用戶數=("用戶ID", "nunique"),
    平均情緒強度=("情緒強度", "mean"),
).reset_index()

# 4. Per-user sentiment trajectory
user_timelines = []
for uid, group in df.groupby("用戶ID"):
    timeline = group.sort_values("時間戳").reset_index(drop=True)
    user_timelines.append({
        "用戶ID": uid,
        "情緒變化序列": " → ".join(timeline["情緒標簽"]),
        # First and last review of each user
        "關鍵轉折點": timeline.iloc[[0, -1]][["時間戳", "情緒標簽"]].to_dict("records"),
    })

# 5. Visualisation
# Configure matplotlib fonts
plt.rcParams['axes.unicode_minus'] = False  # keep the minus sign '-' from rendering as a box
plt.rcParams['font.family'] = 'Kaiti SC'    # a font with Chinese glyphs; replace as needed
plt.figure(figsize=(12, 6))

# Pie chart of the sentiment distribution
ax1 = plt.subplot(121)
df["情緒標簽"].value_counts().plot.pie(autopct="%1.1f%%", ax=ax1)
ax1.set_title("情緒分布比例")

# Timeline example (U001)
ax2 = plt.subplot(122)
u001 = df[df["用戶ID"] == "U001"].sort_values("時間戳")
ax2.plot(u001["時間戳"], u001["情緒強度"], marker="o", linestyle="--")
ax2.set_title("U001情緒波動趨勢")
plt.xticks(rotation=45)
plt.tight_layout()
plt.savefig("sentiment_analysis.png", dpi=300)

# 6. Export the report
with pd.ExcelWriter("情緒分析報告.xlsx") as writer:
    df.to_excel(writer, sheet_name="原始數據+情緒標注", index=False)
    report.to_excel(writer, sheet_name="統計摘要", index=False)
    # Cast to str so the nested records fit into a single spreadsheet cell
    pd.DataFrame(user_timelines).astype(str).to_excel(writer, sheet_name="用戶軌跡", index=False)

print("Analysis complete. Generated files: 情緒分析報告.xlsx and sentiment_analysis.png")
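To make the thresholding in `classify_sentiment` easier to check in isolation, the sketch below separates the score-to-label step from the SnowNLP call. `label_from_score` and its `low`/`high` parameters are hypothetical names introduced here; the 0.4 and 0.6 defaults are taken from the script above, and the scores in the loop are illustrative values, not model output.

```python
def label_from_score(score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Map a sentiment score in [0, 1] to one of three labels.

    Scores strictly above `high` are positive, scores strictly below `low`
    are negative, and everything in between (boundaries included) is neutral.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be within [0, 1], got {score}")
    if low > high:
        raise ValueError("low must not exceed high")
    if score > high:
        return "積極"  # positive
    if score < low:
        return "消極"  # negative
    return "中性"  # neutral


# Illustrative scores only; these are not SnowNLP outputs.
for s in (0.05, 0.4, 0.5, 0.6, 0.95):
    print(s, label_from_score(s))
```

Keeping the thresholds as parameters also makes it straightforward to plug in calibrated values later instead of the fixed 0.4 and 0.6.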

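Generated files

Raw data with sentiment labels (sheet 原始數據+情緒標注)

Summary statistics (sheet 統計摘要)

User trajectories (sheet 用戶軌跡)

Sentiment pie chart (sentiment_analysis.png)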

Improving efficiency

The current model becomes slow on large datasets; at that scale, swap in a different model.

# Use a Hugging Face Chinese model (GPU support needed)
from transformers import pipeline

classifier = pipeline("text-classification", model="uer/roberta-base-finetuned-jd-binary-chinese")
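On a large dataset it usually helps to pass reviews to the model in batches rather than making one call per row. The sketch below assumes `classifier` is any callable that accepts a list of strings and returns one `{"label": ..., "score": ...}` dict per input, which is how a `transformers` text-classification pipeline behaves when given a list. The helper names, the batch size, and the stand-in classifier are all illustrative.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Prediction = Dict[str, object]


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_in_batches(
    texts: Iterable[str],
    classifier: Callable[[List[str]], List[Prediction]],
    batch_size: int = 32,
) -> List[Prediction]:
    """Run `classifier` batch by batch and return predictions in input order."""
    results: List[Prediction] = []
    for chunk in batched(list(texts), batch_size):
        results.extend(classifier(chunk))
    return results


def fake_classifier(batch: List[str]) -> List[Prediction]:
    """Stand-in for a real model so the sketch runs without any download."""
    return [{"label": "stub", "score": float(len(text))} for text in batch]


predictions = classify_in_batches(["a", "bb", "ccc"], fake_classifier, batch_size=2)
print(predictions)
```

With the real pipeline, the `classifier` object created above could be passed in place of `fake_classifier`.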

Real-time monitoring integration

# Example: a Flask API endpoint
from flask import Flask, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    text = request.json["text"]
    return {"sentiment": classify_sentiment(text)}
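The endpoint above reads `request.json["text"]` directly, so a request without a `text` field would likely surface as a server error rather than a clear client error. A minimal sketch of a validation helper follows. `build_response` and its error messages are hypothetical, and `classify` stands for whatever classification function is in use. Flask view functions may return a `(body, status)` tuple, so the result could be returned from `predict` as-is.

```python
from typing import Any, Callable, Dict, Tuple


def build_response(
    payload: Any,
    classify: Callable[[str], Any],
) -> Tuple[Dict[str, Any], int]:
    """Validate a decoded JSON payload and return (body, HTTP status)."""
    if not isinstance(payload, dict):
        return {"error": "request body must be a JSON object"}, 400
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        return {"error": "'text' must be a non-empty string"}, 400
    return {"sentiment": classify(text.strip())}, 200


def demo_classify(text: str) -> str:
    """Stand-in classifier for demonstration; not a real sentiment model."""
    return "placeholder"


print(build_response({"text": "客服很快解決了問題"}, demo_classify))
print(build_response({}, demo_classify))
```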

Dynamic threshold adjustment

# Calibrate the thresholds automatically from historical data
def auto_threshold(df):
    q_low = df["情緒強度"].quantile(0.3)
    q_high = df["情緒強度"].quantile(0.7)
    return q_low, q_high
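As a worked illustration of what the calibrated thresholds mean, the sketch below reproduces the 30th and 70th percentile calculation in plain Python, using linear interpolation between neighbouring order statistics (the default method of pandas `quantile`). The helper names and the sample scores are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between neighbouring order statistics."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be within [0, 1]")
    ordered = sorted(values)
    position = (len(ordered) - 1) * q
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def calibrated_thresholds(scores: Sequence[float]) -> Tuple[float, float]:
    """Return the (30th percentile, 70th percentile) of the scores."""
    return quantile(scores, 0.3), quantile(scores, 0.7)


# Made-up scores, not real model output.
sample = [0.0, 0.25, 0.5, 0.75, 1.0]
low, high = calibrated_thresholds(sample)
print(low, high)
```

With thresholds chosen this way, roughly the lowest 30% of historical scores fall below the lower bound and roughly the highest 30% fall above the upper bound, so the label proportions follow the data rather than the fixed 0.4 and 0.6 cut-offs.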
