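Course source: Bilibili uploader 【螞蟻學python】
Course link: 【數據可視化】Python數據圖表可視化入門到實戰 ([Data Visualization] Python chart visualization, from beginner to practice)
Course materials link: [link]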
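Plotting a scatter chart in Python to examine the relationship between BMI and insurance charges
Scatter chart:
- Two sets of data are paired into coordinate points. The distribution of those points is examined to judge whether the two variables are related, or to summarize the pattern the points follow.
- The core value of a scatter chart is revealing relationships between variables, which can then support predictive analysis and better-informed decisions.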
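As a rough companion to the two points above, the sketch below puts a number on "is there a relationship" (the Pearson correlation coefficient) and on "then predict" (an ordinary least-squares line). The helper names `pearson_r` and `fit_line` and the toy values are assumptions made for illustration; they are not part of the course material or the insurance dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up illustrative values (exactly y = 2x + 1), NOT insurance data.
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [3.0, 5.0, 7.0, 9.0]

r = pearson_r(toy_x, toy_y)
slope, intercept = fit_line(toy_x, toy_y)
predicted_at_5 = slope * 5.0 + intercept
print(r, slope, intercept, predicted_at_5)
```

A coefficient near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. Real data such as BMI against charges will usually be far noisier than this toy example, so the scatter chart itself remains the first thing to look at.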
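Example: in the personal medical cost dataset, the relationship between body mass index (BMI) and individual medical charges.
Original dataset: https://www.kaggle.com/mirichoi0218/insurance/home
1. Read the insurance dataset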
import pandas as pd

df = pd.read_csv("../DATA_POOL/PY_DATA/ant-learn-visualization-master/datas/insurance/insurance.csv")
df.head(10)
| | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.900 | 0 | yes | southwest | 16884.92400 |
| 1 | 18 | male | 33.770 | 1 | no | southeast | 1725.55230 |
| 2 | 28 | male | 33.000 | 3 | no | southeast | 4449.46200 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.880 | 0 | no | northwest | 3866.85520 |
| 5 | 31 | female | 25.740 | 0 | no | southeast | 3756.62160 |
| 6 | 46 | female | 33.440 | 1 | no | southeast | 8240.58960 |
| 7 | 37 | female | 27.740 | 3 | no | northwest | 7281.50560 |
| 8 | 37 | male | 29.830 | 2 | no | northeast | 6406.41070 |
| 9 | 60 | female | 25.840 | 0 | no | northwest | 28923.13692 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   age       1338 non-null   int64
 1   sex       1338 non-null   object
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64
 4   smoker    1338 non-null   object
 5   region    1338 non-null   object
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB
2. Draw the scatter chart with pyecharts
# Sort the data by bmi in ascending order
# inplace=True modifies df itself instead of returning a sorted copy
df.sort_values(by="bmi", inplace=True)
df.head()
| | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 172 | 18 | male | 15.960 | 0 | no | northeast | 1694.79640 |
| 428 | 21 | female | 16.815 | 1 | no | northeast | 3167.45585 |
| 1226 | 38 | male | 16.815 | 2 | no | northeast | 6640.54485 |
| 412 | 26 | female | 17.195 | 2 | yes | northeast | 14455.64405 |
| 1286 | 28 | female | 17.290 | 0 | no | northeast | 3732.62510 |
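The effect of `inplace=True` in the sort above can be seen on a small frame. This is a minimal sketch on made-up rows (the `toy` DataFrame is an assumption for illustration, not insurance data), contrasting the default behavior, which returns a sorted copy, with in-place sorting.

```python
import pandas as pd

# Made-up rows for illustration only; NOT taken from the insurance dataset.
toy = pd.DataFrame({"bmi": [30.1, 18.5, 24.9], "charges": [300.0, 100.0, 200.0]})

# Default: sort_values returns a new sorted DataFrame and leaves `toy` untouched.
sorted_copy = toy.sort_values(by="bmi")
order_after_copy = toy["bmi"].to_list()

# inplace=True: `toy` itself is reordered and the call returns None.
result = toy.sort_values(by="bmi", inplace=True)
order_after_inplace = toy["bmi"].to_list()

# The original index labels travel with their rows in both cases.
index_after_inplace = toy.index.to_list()
print(order_after_copy, order_after_inplace, index_after_inplace, result)
```

The last point explains why the sorted table shows index labels such as 172 and 428 rather than 0, 1, 2: sorting moves rows but keeps each row's original label.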
bmi = df["bmi"].to_list()
charges = df["charges"].to_list()
import pyecharts.options as opts
from pyecharts.charts import Scatter
scatter = (
    Scatter()
    .add_xaxis(xaxis_data=bmi)
    .add_yaxis(
        series_name="",
        y_axis=charges,
        symbol_size=4,
        label_opts=opts.LabelOpts(is_show=False),
    )
    .set_global_opts(
        xaxis_opts=opts.AxisOpts(type_="value"),
        yaxis_opts=opts.AxisOpts(type_="value"),
        title_opts=opts.TitleOpts(title="BMI vs. insurance charges", pos_left="center"),
    )
)
from IPython.display import HTML

# scatter.render() returns the path of the generated HTML file as a string
with open(scatter.render(), "r", encoding="utf-8") as file:
    html_content = file.read()

# Render the HTML directly in JupyterLab
HTML(html_content)