seaborn分類數據可視：散點圖|箱型圖|小提琴圖|lv圖|柱狀圖|折線圖

一、散點圖stripplot( ) 與swarmplot（）

1.分類散點圖stripplot( )?

用法stripplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,jitter=True, dodge=False, orient=None,

　　　　　　? color=None, palette=None,size=5, edgecolor="gray", linewidth=0, ax=None, **kwargs)

x，y 分類字段和分布統計字段
hue 在x分類的基礎上進行二次分類的字段
data 源數據
order 圖表中顯示的分類
jitter?當點數據重合較多時用該參數做一些調整，可以設置為True或者間距0.1，否則會有重合的點
dodge 如果有二次分類，二次分類是否拆分顯示

tips = sns.load_dataset("tips") #導入系統數據
print(tips.head())
print(tips['day'].value_counts())

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
Sat     87
Sun     76
Thur    62
Fri     19
Name: day, dtype: int64

輸出結果

fig = plt.figure(figsize=(15,10))
ax1 = plt.subplot(221)
# 對data數據按day分類，統計total_bill的分布，如果點重合較多適當顯示開
sns.stripplot(x="day", y="total_bill",  data=tips, jitter = True, size = 5, edgecolor = 'w',linewidth=1, marker = 'o', ax=ax1)ax2 = plt.subplot(222)
# 對data數據按day分類，統計total_bill的分布，并且圖表中只顯示按day分類的中的Sat和Sun
sns.stripplot(x="day", y="total_bill", data=tips,jitter = True, order = ['Sat','Sun'],ax=ax2) ax3 = plt.subplot(223)
# 對data數據按sex分類后再按day分類，統計total_bill的分布
sns.stripplot(x="sex", y="total_bill", hue="day",data=tips, jitter=True,ax = ax3)ax4 = plt.subplot(224)
# 對data數據按sex分類后再按day分類，統計total_bill的分布，并且不同的day拆分顯示
sns.stripplot(x="sex", y="total_bill", hue="day",data=tips, jitter=True,palette="Set2",dodge=True,ax=ax4)

2.分簇散點圖swarmplot（）

用法swarmplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,dodge=False, orient=None, color=None,

? ? ? ? ? ? ? ? ? ? ? palette=None,size=5, edgecolor="gray", linewidth=0, ax=None, **kwargs)

swarmplot()除了沒有jitter參數，其他用法類似stripplot()。

fig = plt.figure(figsize=(20,5))
ax1 = plt.subplot(141)
# 對data數據按day分類，統計total_bill的分布，如果點重合較多適當顯示開
sns.swarmplot(x="day", y="total_bill",  data=tips,  size = 5, edgecolor = 'w',linewidth=1, marker = 'o', ax=ax1)ax2 = plt.subplot(142)
# 對data數據按day分類，統計total_bill的分布，并且圖表中只顯示按day分類的中的Sat和Sun
sns.swarmplot(x="day", y="total_bill", data=tips, order = ['Sat','Sun'],ax=ax2) ax3 = plt.subplot(143)
# 對data數據按sex分類后再按day分類，統計total_bill的分布
sns.swarmplot(x="sex", y="total_bill", hue="day",data=tips, ax = ax3)ax4 = plt.subplot(144)
# 對data數據按sex分類后再按day分類，統計total_bill的分布，并且不同的day拆分顯示
sns.swarmplot(x="sex", y="total_bill", hue="day",data=tips, palette="Set2",dodge=True,ax=ax4)

二、箱型圖boxplot()

boxplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,orient=None, color=None, palette=None,

? ? ? ? ? ? ?saturation=.75,width=.8, dodge=True, fliersize=5, linewidth=None,whis=1.5, notch=False, ax=None, **kwargs)

x，y 分類字段和分布統計字段
hue 在x分類的基礎上進行二次分類的字段
data 源數據
order 圖表中顯示的分類
dodge 如果有二次分類，二次分類是否拆分顯示
width 箱的間隔的比例，值越大間隔越小
filtersize? 異常點大小
whis 設置IQR
notch 是否以中值做凹槽

fig = plt.figure(figsize=(12,5))ax1 = plt.subplot(121)
sns.boxplot(x="day", y="total_bill", data=tips,linewidth = 2, width = 0.8, fliersize = 10, palette = 'hls', whis = 1.5, notch = True )
sns.swarmplot(x="day", y="total_bill", data=tips,color ='g',size = 3,alpha = 0.8)  #在箱型圖上做分簇散點圖

ax2 = plt.subplot(122)
sns.boxplot(x="day", y="total_bill", data=tips, hue = 'smoker', order = ['Sat','Sun'],palette = 'Reds') #根據day分類，再根據smkker分類

三、小提琴圖violinplot()

violinplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,bw="scott", cut=2,

? ? ? ? ? ? ? ?scale="area", scale_hue=True,?gridsize=100, width=.8,?inner="box", split=False, dodge=True,

? ? ? ? ? ? ? ?orient=None,linewidth=None, color=None, palette=None,?saturation=.75,ax=None, **kwargs)

x，y 分類字段和分布統計字段
hue 在x分類的基礎上進行二次分類的字段
data 源數據
order 圖表中顯示的分類
dodge 如果有二次分類，二次分類的多個小提琴位置是否錯開，默認為True，False則多個小提琴會重復? （dodge=True與split=False效果相同）
split?如果有二次分類，二次分類是否拆分整個提琴，默認為False顯示為多個獨立的小提琴，True則顯示為一個小提琴，左右兩側表示二次分類
scale = 'area' 設置小提琴圖的寬度，area-保持小提琴面積相同，count-按照樣本數量決定寬度，width-寬度一樣
gridsize = 100 設置小提琴圖邊線的平滑度，越高越平滑
inner = 'box' 設置內部顯示類型 → “box”箱型圖, “quartile”分位數, “point”點, “stick”, None
bw = 0.8 # 控制擬合程度，'scott'、'silverman'或者一個浮點數，一般可以不設置

fig = plt.figure(figsize=(20,5))ax1 = plt.subplot(141)
sns.violinplot(x="day",y="total_bill",data=tips,linewidth=2,width=0.8,palette='hls',scale= 'area',gridsize=50,inner='box')  ax2 = plt.subplot(142)
sns.violinplot(x="day",y="total_bill",data=tips,hue = 'smoker',palette="muted",dodge=False,inner="point")#二次分類小提琴位置不錯開

ax3 = plt.subplot(143)
sns.violinplot(x="day",y="total_bill",data=tips,hue = 'smoker',palette="muted",split=False,inner="stick")#二次分類不拆分小提琴，顯示為多個獨立小提琴

ax4 = plt.subplot(144)
sns.violinplot(x="day",y="total_bill",data=tips,hue = 'smoker',palette="muted",split=True,inner="quartile")#二次分類拆分小提琴，左右兩側分別表示二次

小提琴圖與分簇散點圖結合sns.violinplot（）+ sns.swarmplot（）

sns.violinplot(x="day", y="total_bill", data=tips, palette = 'hls',alpha=0.5, inner = None)
sns.swarmplot(x="day", y="total_bill", data=tips, color="w")

四、增強箱圖boxenplot()

boxenplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,orient=None, color=None,

? ? ? ? ? ? ? ? palette=None, saturation=.75,width=.8,dodge=True, k_depth='proportion', linewidth=None,

? ? ? ? ? ? ? ? scale='exponential', outlier_prop=None, ax=None, **kwargs)

(lv圖表使用boxenplot(),lvplot()即將被遺棄)

x，y 分類字段和分布統計字段
hue 在x分類的基礎上進行二次分類的字段
data 源數據
order 圖表中顯示的分類
dodge 如果有二次分類，二次分類的多個小提琴位置是否錯開，默認為True，False則多個小提琴會重復? （dodge=True與split=False效果相同）
scale = 'area' 設置lv圖的寬度，“linear”、“exonential”、“area”? ?（一般scale和k_depth保持默認就好）
k_depth = 'proportion', # 設置框的數量 → “proportion”、“tukey”、“trustworthy”
width 箱之間間隔

sns.lvplot(x="day", y="total_bill", data=tips, palette="mako", width = 0.8, scale = 'area',k_depth = 'proportion')
sns.swarmplot(x="day", y="total_bill", data=tips, color ='k',size = 5)

五、柱狀圖barplot()?

seaborn中的柱狀圖不是單純的表示數量，而是表示了一個統計標準和對應的置信區間。

barplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,estimator=np.mean, ci=95,?
? ? ? ? ? ?n_boot=1000, units=None,orient=None, color=None, palette=None, saturation=.75,errcolor=".26",?
? ? ? ? ? ?errwidth=None, capsize=None, dodge=True,ax=None, **kwargs)

x，y 分類字段和分布統計字段
hue 在x分類的基礎上進行二次分類的字段
data 源數據
order 圖表中顯示的分類
estimater 柱狀圖表示的統計量，默認和常使用均值
ci 置信區間的誤差，0-100之內、或sd標準差，或None，默認為95
saturation 顏色飽和度
errcolor與errwidth 誤差線顏色與寬度
capsize 誤差線延長寬度
dodge 如果有二次分類，二次分類的多個多個柱狀圖位置是否錯開
edgecolor 柱子的邊框顏色

#導入泰坦尼克號、小費和汽車事故的3個表的數據結構，在不同窗口顯示前5行
titanic = sns.load_dataset("titanic")
titanic.head()
tips = sns.load_dataset('tips') #
tips.head()
crashes = sns.load_dataset("car_crashes")#
crashes.head()

???? ?

fig = plt.figure(figsize=(15,4))
ax1 = plt.subplot(131)  #泰坦尼克，在性別分類的基礎上再按艙級別分類，統計生還率
sns.barplot(x="sex",y="survived",hue="class",data=titanic,palette = 'hls',capsize = 0.05,saturation=.8,errcolor = 'gray',errwidth = 2,ci = 'sd')
ax2 = plt.subplot(132)  #小費，在日期分類的基礎上再按性別分類，統計給的小費，置信區間的誤差為標準差
sns.barplot(x="day", y="total_bill", hue="sex", data=tips,edgecolor = 'white',errcolor='gray',capsize=0.1,ci='sd')
ax3 = plt.subplot(133)  #小費，同上，置信區間的誤差為默認的95
sns.barplot(x="day", y="total_bill", hue="sex", data=tips,edgecolor = 'white',errcolor='gray',capsize=0.1)

crashes = sns.load_dataset("car_crashes").sort_values("total", ascending=False)
f,ax = plt.subplots(figsize=(8, 20))# 創建圖表

sns.set_color_codes("pastel")
sns.barplot(x="total", y="abbrev", data=crashes,label="Total", color='b',edgecolor = 'w')# 設置第一個柱狀圖
sns.set_color_codes("muted")
sns.barplot(x="alcohol", y="abbrev", data=crashes, label="Alcohol-involved",color='b',edgecolor = 'w')# 設置第二個柱狀圖

ax.legend(ncol=2, loc="lower right")
sns.despine(left=True, bottom=True)

六、計數柱狀圖countplot()?

countplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,orient=None, color=None,
? ? ? ? ? ? ? ? palette=None, saturation=.75,dodge=True, ax=None, **kwargs)

x，y 同時表示分類字段和顯示方向，即在x軸上或在y軸上對指定的字段進行計數顯示
hue 在x分類或者y分類的基礎上進行二次分類的字段
data 源數據
order 圖表中顯示的分類
dodge 如果有二次分類，二次分類的多個多個柱狀圖位置是否錯開
edgecolor 柱子的邊框顏色

fig = plt.figure(figsize=(12,4))
ax1 = plt.subplot(121)
sns.countplot(x="day", hue="sex", data=tips, palette = 'muted') #豎直顯示，在日期分類的基礎上再按性別分類
ax2 = plt.subplot(122)
sns.countplot(y="day", hue="sex", data=tips, palette = 'muted') #水平顯示

七、折線圖pointbar()

折線圖pointbar()和barplot()的用法類似，只是barplot()用柱狀圖表示均值，而pointbar()用一個點表示了均值。

pointplot(x=None, y=None, hue=None, data=None, order=None, hue_order=None,estimator=np.mean, ci=95,
? ? ? ? ? ? ? n_boot=1000, units=None,markers="o", linestyles="-", dodge=False, join=True, scale=1,
? ? ? ? ? ? ? orient=None, color=None, palette=None, errwidth=None,capsize=None, ax=None, **kwargs)

x，y 分類字段和分布統計字段
hue 在x分類的基礎上進行二次分類的字段
data 源數據
order 圖表中顯示的分類
estimater 柱狀圖表示的統計量，默認和常使用均值
ci 置信區間的誤差，0-100之內、或sd標準差，或None，默認為95
marker 均值的表示形式
errwidth 誤差線顏色與寬度
capsize 誤差線延長寬度
dodge 如果有二次分類，二次分類的多個線是否分開
joint 是否連線

sns.pointplot(x="day",y="total_bill",hue = 'sex',data=tips,palette = 'hls',dodge = True,join = True,markers=["o", "x"],linestyles=["-", "--"])
tips.groupby(['day','sex']).mean()['total_bill']