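I. Random Variables
A random variable is a function from the sample space $\Omega$ to the real numbers $\mathbb{R}$. For example, a random variable $X$ can represent the number shown when a die is rolled. Random variables generally fall into two classes:
- Discrete random variables: the variable takes finitely many or countably infinitely many values.
- Continuous random variables: the variable takes values in a continuum, so there are uncountably many possible values.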
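The `scipy.stats` module provides random variables for many probability distributions, both discrete and continuous. The common methods of a discrete random variable are:
Method | Purpose |
---|---|
rvs | Draw random samples from the distribution |
pmf | Probability mass function |
cdf | Cumulative distribution function |
stats | Compute the mean, variance, skewness, and kurtosis of the distribution, selected through `moments`: mean ('m'), variance ('v'), skew ('s'), kurtosis ('k') |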
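The common methods of a continuous random variable are:
Method | Purpose |
---|---|
rvs | Draw random samples from the distribution |
pdf | Probability density function |
cdf | Cumulative distribution function |
stats | Compute the mean, variance, skewness, and kurtosis of the distribution, selected through `moments`: mean ('m'), variance ('v'), skew ('s'), kurtosis ('k') |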
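As a quick illustration of the two tables, the sketch below calls each listed method on one discrete and one continuous distribution. The choice of `binom(10, 0.3)` and the standard normal, the variable names, and the seed `random_state=0` are all arbitrary assumptions made for this example.

```python
import numpy as np
from scipy.stats import binom, norm

# Discrete example: a "frozen" binomial distribution B(10, 0.3)
discrete_rv = binom(10, 0.3)
discrete_sample = discrete_rv.rvs(size=5, random_state=0)  # five random draws
pmf_at_3 = discrete_rv.pmf(3)                               # P(X = 3)
cdf_at_3 = discrete_rv.cdf(3)                               # P(X <= 3)
d_mean, d_var, d_skew, d_kurt = discrete_rv.stats(moments="mvsk")

# Continuous example: the standard normal N(0, 1)
continuous_rv = norm(loc=0, scale=1)
continuous_sample = continuous_rv.rvs(size=5, random_state=0)
pdf_at_0 = continuous_rv.pdf(0.0)                           # density at x = 0
cdf_at_0 = continuous_rv.cdf(0.0)                           # P(X <= 0)
c_mean, c_var, c_skew, c_kurt = continuous_rv.stats(moments="mvsk")

print("binomial sample:", discrete_sample)
print("P(X = 3):", pmf_at_3, " P(X <= 3):", cdf_at_3)
print("binomial mean and variance:", float(d_mean), float(d_var))
print("normal sample:", continuous_sample)
print("pdf(0):", pdf_at_0, " cdf(0):", cdf_at_0)
print("normal skewness and kurtosis:", float(c_skew), float(c_kurt))
```

Note that the kurtosis returned by `stats` is the excess kurtosis, so it is 0 for a normal distribution.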
II. Common Discrete Distributions
1. Binomial Distribution
If the random variable $X$ has the probability mass function $P(X=k) = C_n^k p^k q^{n-k},\ k = 0, 1, \ldots, n$, where $p + q = 1$, then $X$ is said to follow the binomial distribution with parameters $n, p$, written $X \sim B(n,p)$.
- Expectation: $E(X) = np$
- Variance: $D(X) = np(1 - p)$
- Plot the binomial PMF for several parameter settings, with $(n, p)$ equal to $(10, 0.3)$, $(10, 0.5)$, $(10, 0.7)$:

```python
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

if __name__ == '__main__':
    fig, ax = plt.subplots(3, 1, figsize=(10, 10))
    # Adjust the vertical spacing between subplots
    fig.subplots_adjust(hspace=0.5)
    params = [(10, 0.3), (10, 0.5), (10, 0.7)]
    for i, (n, p) in enumerate(params):
        x = np.arange(0, n + 1)
        y = binom(n, p).pmf(x)
        # Theoretical mean and variance of the distribution
        mean, var = binom.stats(n, p, moments='mv')
        ax[i].scatter(x, y, color='blue', marker='o')
        ax[i].set_title('n = {}, p = {}'.format(n, p))
        ax[i].set_xticks(x)
        ax[i].text(1, 0.2, 'Mean: {:.2f}\nVariance: {:.2f}'.format(float(mean), float(var)))
        ax[i].grid()
    plt.show()
```

Output: a figure with three stacked panels, one per $(n, p)$ pair, each showing the PMF as a scatter plot annotated with its mean and variance.
- Draw random arrays from binomial distributions with different parameters (100,000 samples each) and inspect their frequency distribution:

```python
import numpy as np
from scipy.stats import binom
import matplotlib.pyplot as plt

if __name__ == '__main__':
    fig, ax = plt.subplots(3, 1, figsize=(10, 10))
    # Adjust the vertical spacing between subplots
    fig.subplots_adjust(hspace=0.5)
    params = [(10, 0.3), (10, 0.5), (10, 0.7)]
    for i, (n, p) in enumerate(params):
        x = np.arange(0, n + 1)
        # Draw 100,000 samples
        sample = binom.rvs(n=n, p=p, size=100000)
        print(sample)
        # One unit-width bin centred on each integer, so that with density=True
        # the bar heights are relative frequencies comparable to the PMF
        bins = np.arange(-0.5, n + 1.5)
        ax[i].hist(sample, color='blue', density=True, bins=bins)
        ax[i].set_title('n = {}, p = {}'.format(n, p))
        ax[i].set_xticks(x)
        ax[i].grid()
    plt.show()
```

Output: a figure with three stacked histograms, one per $(n, p)$ pair, showing the relative frequency of each sampled value.
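The binomial formulas above can be checked numerically without plotting. The following is a minimal sketch, assuming $n = 10$, $p = 0.3$ and an arbitrary seed (`random_state=0`); it compares the PMF written out from the formula with `binom.pmf`, the closed-form moments with `binom.stats`, and the sampled relative frequencies with the PMF.

```python
from math import comb

import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
k = np.arange(0, n + 1)

# PMF written out from the formula C(n, k) * p^k * (1 - p)^(n - k)
formula_pmf = np.array([comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(n + 1)])
scipy_pmf = binom.pmf(k, n, p)

# Moments reported by SciPy, to compare with np and np(1 - p)
mean, var = binom.stats(n, p, moments="mv")

# Relative frequencies of 100,000 draws (the seed is an arbitrary choice)
sample = binom.rvs(n, p, size=100_000, random_state=0)
freq = np.bincount(sample, minlength=n + 1) / sample.size

print("formula matches scipy:", np.allclose(formula_pmf, scipy_pmf))
print("mean:", float(mean), "closed form:", n * p)
print("variance:", float(var), "closed form:", n * p * (1 - p))
print("largest |frequency - pmf|:", np.abs(freq - scipy_pmf).max())
```

Counting with `np.bincount` avoids any dependence on histogram bin edges, which matters for integer-valued data.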
2. Geometric Distribution
If the random variable $X$ has the probability mass function $P(X = k) = (1 - p)^{k - 1}p,\ k = 1, 2, \ldots$, where $0 < p < 1$, then $X$ is said to follow the geometric distribution with parameter $p$, written $X \sim Ge(p)$.
- Expectation: $E(X) = \frac{1}{p}$
- Variance: $D(X) = \frac{1 - p}{p^2}$
- Plot the geometric PMF for several parameter settings, with $p$ equal to $0.3$, $0.5$, $0.7$:

```python
import numpy as np
from scipy.stats import geom
import matplotlib.pyplot as plt

if __name__ == '__main__':
    fig, ax = plt.subplots(3, 1, figsize=(10, 10))
    # Adjust the vertical spacing between subplots
    fig.subplots_adjust(hspace=0.5)
    params = [0.3, 0.5, 0.7]
    for i, p in enumerate(params):
        # The support starts at k = 1
        x = np.arange(1, 15)
        y = geom(p=p).pmf(x)
        print(y)
        # Theoretical mean and variance of the distribution
        mean, var = geom.stats(p=p, moments='mv')
        ax[i].scatter(x, y, color='blue', marker='o')
        ax[i].set_title('p = {}'.format(p))
        ax[i].set_xticks(x)
        ax[i].text(5, 0.2, 'Mean: {:.2f}\nVariance: {:.2f}'.format(float(mean), float(var)))
        ax[i].grid()
    plt.show()
```

Output: a figure with three stacked panels, one per value of $p$, each showing the PMF as a scatter plot annotated with its mean and variance.
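- Draw random arrays from geometric distributions with different parameters (100,000 samples each) and inspect their frequency distribution:

```python
import numpy as np
from scipy.stats import geom
import matplotlib.pyplot as plt

if __name__ == '__main__':
    fig, ax = plt.subplots(3, 1, figsize=(10, 10))
    # Adjust the vertical spacing between subplots
    fig.subplots_adjust(hspace=0.5)
    params = [0.3, 0.5, 0.7]
    for i, p in enumerate(params):
        x = np.arange(0, 15)
        # Draw 100,000 samples
        sample = geom.rvs(p=p, size=100000)
        print(sample)
        # One unit-width bin centred on each integer from 1 up to the largest draw,
        # so that with density=True the bar heights are relative frequencies
        bins = np.arange(0.5, sample.max() + 1.5)
        ax[i].hist(sample, color='blue', density=True, bins=bins)
        ax[i].set_title('p = {}'.format(p))
        ax[i].set_xlim(0, 15)
        ax[i].set_xticks(x)
        ax[i].grid()
    plt.show()
```

Output: a figure with three stacked histograms, one per value of $p$, showing the relative frequency of each sampled value.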
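A short numerical check of the geometric formulas, as a sketch with $p = 0.3$ chosen arbitrarily. It also confirms that `scipy.stats.geom` uses the same support as the definition here, $k = 1, 2, \ldots$, so the probability at $k = 0$ is zero.

```python
import numpy as np
from scipy.stats import geom

p = 0.3
k = np.arange(1, 15)

# PMF written out from the formula (1 - p)^(k - 1) * p
formula_pmf = (1 - p) ** (k - 1) * p
scipy_pmf = geom.pmf(k, p)

# Moments reported by SciPy, to compare with 1/p and (1 - p)/p^2
mean, var = geom.stats(p, moments="mv")

# k = 0 lies outside the support
pmf_at_zero = geom.pmf(0, p)

print("formula matches scipy:", np.allclose(formula_pmf, scipy_pmf))
print("mean:", float(mean), "closed form:", 1 / p)
print("variance:", float(var), "closed form:", (1 - p) / p**2)
print("P(X = 0):", pmf_at_zero)
```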
3. Poisson Distribution
If the random variable $X$ has the probability mass function $P(X=k) = \frac{\lambda^k}{k!}e^{-\lambda},\ k = 0, 1, 2, \ldots$, where $\lambda > 0$, then $X$ is said to follow the Poisson distribution with parameter $\lambda$, written $X \sim P(\lambda)$.
- Expectation: $E(X) = \lambda$
- Variance: $D(X) = \lambda$
- Plot the Poisson PMF for several parameter settings, with $\lambda$ equal to $2$, $6$, $8$:

```python
import numpy as np
from scipy.stats import poisson
import matplotlib.pyplot as plt

if __name__ == '__main__':
    fig, ax = plt.subplots(3, 1, figsize=(10, 10))
    # Adjust the vertical spacing between subplots
    fig.subplots_adjust(hspace=0.5)
    params = [2, 6, 8]
    for i, numda in enumerate(params):
        # The support starts at k = 0, so include it in the plot
        x = np.arange(0, 15)
        y = poisson(numda).pmf(x)
        # Theoretical mean and variance of the distribution
        mean, var = poisson.stats(numda, moments='mv')
        ax[i].scatter(x, y, color='blue', marker='o')
        ax[i].set_title('lambda = {}'.format(numda))
        ax[i].set_xticks(x)
        ax[i].set_yticks([0, 0.1, 0.2, 0.3, 0.4])
        ax[i].text(5, 0.2, 'Mean: {:.2f}\nVariance: {:.2f}'.format(float(mean), float(var)))
        ax[i].grid()
    plt.show()
```

Output: a figure with three stacked panels, one per value of $\lambda$, each showing the PMF as a scatter plot annotated with its mean and variance.
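- Draw random arrays from Poisson distributions with different parameters (100,000 samples each) and inspect their frequency distribution:

```python
import numpy as np
from scipy.stats import poisson
import matplotlib.pyplot as plt

if __name__ == '__main__':
    fig, ax = plt.subplots(3, 1, figsize=(10, 10))
    # Adjust the vertical spacing between subplots
    fig.subplots_adjust(hspace=0.5)
    params = [2, 6, 8]
    for i, numda in enumerate(params):
        x = np.arange(0, 16)
        # Draw 100,000 samples
        sample = poisson.rvs(numda, size=100000)
        print(sample)
        # One unit-width bin centred on each integer from 0 up to the largest draw,
        # so that with density=True the bar heights are relative frequencies
        bins = np.arange(-0.5, sample.max() + 1.5)
        ax[i].hist(sample, color='blue', density=True, bins=bins)
        ax[i].set_title('lambda = {}'.format(numda))
        ax[i].set_xticks(x)
        ax[i].set_xlim(-0.5, 16)
        ax[i].grid()
    plt.show()
```

Output: a figure with three stacked histograms, one per value of $\lambda$, showing the relative frequency of each sampled value.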
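The Poisson formulas can be verified the same way. This is a minimal sketch assuming $\lambda = 6$ (the name `lam` is chosen for this example); it evaluates the PMF from the formula and compares it with `poisson.pmf`, then checks that the mean and the variance both equal $\lambda$.

```python
from math import exp, factorial

import numpy as np
from scipy.stats import poisson

lam = 6
k = np.arange(0, 15)

# PMF written out from the formula lam^k / k! * exp(-lam)
formula_pmf = np.array([lam**j / factorial(j) * exp(-lam) for j in range(15)])
scipy_pmf = poisson.pmf(k, lam)

# For a Poisson distribution the mean and the variance are both lam
mean, var = poisson.stats(lam, moments="mv")

print("formula matches scipy:", np.allclose(formula_pmf, scipy_pmf))
print("mean:", float(mean), "variance:", float(var))
```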
III. Common Continuous Distributions
1. Normal Distribution
If the random variable $X$ has the probability density function $f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x - \mu)^2}{2\sigma^2}},\ (-\infty < x < +\infty)$, then $X$ is said to follow the normal distribution with parameters $(\mu, \sigma^2)$, written $X \sim N(\mu, \sigma^2)$. When $\mu = 0, \sigma = 1$, $X$ follows the standard normal distribution.
- Expectation: $E(X) = \mu$
- Variance: $D(X) = \sigma^2$
- Plot the normal PDF for several parameter settings, with $(\mu, \sigma)$ equal to $(0, 1)$, $(0, 3)$:

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

if __name__ == '__main__':
    fig, ax = plt.subplots(figsize=(10, 8))
    # Each entry is (loc, scale, color); scale is the standard deviation
    params = [(0, 1, 'red'), (0, 3, 'blue')]
    x = np.linspace(-20, 20, 1000)
    for loc, scale, color in params:
        # Theoretical mean and variance of the distribution
        mean, var = norm.stats(loc, scale, moments='mv')
        ax.plot(x, norm(loc=loc, scale=scale).pdf(x), color=color,
                label='loc={}, scale={}, mean={}, variance={}'.format(loc, scale, mean, var))
    ax.set_xticks(np.arange(-20, 21))
    ax.grid()
    ax.legend()
    plt.show()
```
- Draw random arrays from normal distributions with different parameters (100,000 samples each) and inspect their frequency distribution:

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

if __name__ == '__main__':
    fig, ax = plt.subplots(2, 1, figsize=(10, 8))
    # Each entry is (loc, scale, color); scale is the standard deviation
    params = [(0, 1, 'red'), (0, 3, 'blue')]
    x = np.linspace(-20, 20, 1000)
    for i, (loc, scale, color) in enumerate(params):
        # Plot the theoretical density
        ax[i].plot(x, norm(loc=loc, scale=scale).pdf(x), color=color,
                   label='loc={}, scale={}'.format(loc, scale))
        # Overlay the normalized histogram of 100,000 random draws
        ax[i].hist(norm(loc=loc, scale=scale).rvs(size=100000), density=True, bins=100)
        ax[i].set_xticks(np.arange(-20, 21))
        ax[i].grid()
        ax[i].legend()
    plt.show()
```
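One detail worth checking numerically is that the `scale` argument of `norm` is the standard deviation, not the variance. The sketch below assumes $\mu = 0$ and a standard deviation of 3 (variable names are chosen for this example); it compares the density formula with `norm.pdf` and shows that passing the variance as `scale` gives a different curve.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 3.0
x = np.linspace(-20, 20, 1001)

# Density written out from the formula, with sigma as the standard deviation
formula_pdf = np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
scipy_pdf = norm.pdf(x, loc=mu, scale=sigma)

# The variance reported by SciPy is scale squared
mean, var = norm.stats(loc=mu, scale=sigma, moments="mv")

# Deliberate mistake for comparison: passing the variance as scale
wrong_pdf = norm.pdf(x, loc=mu, scale=sigma**2)

print("formula matches scipy:", np.allclose(formula_pdf, scipy_pdf))
print("mean:", float(mean), "variance:", float(var))
print("variance passed as scale still matches:", np.allclose(formula_pdf, wrong_pdf))
```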
2. Exponential Distribution
If the random variable $X$ has the probability density function $f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0\\ 0 & x < 0\end{cases}\ (\lambda > 0)$, then $X$ is said to follow the exponential distribution with parameter $\lambda$, written $X \sim E(\lambda)$.
- Expectation: $E(X) = \frac{1}{\lambda}$
- Variance: $D(X) = \frac{1}{\lambda^2}$
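In SciPy, the exponential distribution `expon` does not take $\lambda$ directly: its `scale` parameter is the reciprocal of $\lambda$. The SciPy documentation says: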
A common parameterization for expon is in terms of the rate parameter lambda, such that pdf = lambda * exp(-lambda * x). This parameterization corresponds to using scale = 1 / lambda.
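The `scale = 1 / lambda` correspondence can be confirmed with a short sketch, here assuming $\lambda = 1.5$ (the name `lam` is chosen for this example). It compares the density formula with `expon.pdf` and checks the mean $1/\lambda$ and variance $1/\lambda^2$.

```python
import numpy as np
from scipy.stats import expon

lam = 1.5
x = np.linspace(0, 15, 1000)

# Density written out from the formula lam * exp(-lam * x) for x >= 0
formula_pdf = lam * np.exp(-lam * x)
scipy_pdf = expon.pdf(x, scale=1 / lam)

# Moments reported by SciPy, to compare with 1/lam and 1/lam^2
mean, var = expon.stats(scale=1 / lam, moments="mv")

# The density is zero for negative x
pdf_negative = expon.pdf(-1.0, scale=1 / lam)

print("formula matches scipy:", np.allclose(formula_pdf, scipy_pdf))
print("mean:", float(mean), "closed form:", 1 / lam)
print("variance:", float(var), "closed form:", 1 / lam**2)
print("pdf(-1):", pdf_negative)
```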
- Plot the exponential PDF for several parameter settings, with $\lambda$ equal to $0.5$, $1$, $1.5$:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import expon

if __name__ == '__main__':
    fig, ax = plt.subplots(figsize=(10, 8))
    # Each entry is (lambda, color)
    params = [(0.5, 'red'), (1, 'blue'), (1.5, 'green')]
    x = np.linspace(0, 15, 1000)
    for numda, color in params:
        # expon takes scale = 1 / lambda
        mean, var = expon.stats(loc=0, scale=1 / numda, moments='mv')
        ax.plot(x, expon(scale=1 / numda).pdf(x), color=color,
                label='lambda = {:.2f}, mean: {:.2f}, variance: {:.4f}'.format(
                    numda, float(mean), float(var)))
    ax.grid()
    ax.legend()
    plt.show()
```
- Draw random arrays from exponential distributions with different parameters (100,000 samples each) and inspect their frequency distribution:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import expon

if __name__ == '__main__':
    fig, ax = plt.subplots(3, 1, figsize=(10, 8))
    # Each entry is (lambda, color)
    params = [(0.5, 'red'), (1, 'blue'), (1.5, 'green')]
    x = np.linspace(0, 15, 1000)
    for i, (numda, color) in enumerate(params):
        # Plot the theoretical density; expon takes scale = 1 / lambda
        ax[i].plot(x, expon(scale=1 / numda).pdf(x), color=color,
                   label='lambda={}'.format(numda))
        # Overlay the normalized histogram of 100,000 random draws
        ax[i].hist(expon(scale=1 / numda).rvs(size=100000), density=True, bins=100)
        ax[i].set_xticks(np.arange(0, 15))
        ax[i].set_xlim(0, 15)
        ax[i].grid()
        ax[i].legend()
    plt.show()
```
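Besides comparing histograms by eye, the sampled arrays can be summarized numerically. The sketch below is one possible check, with an arbitrary seed (`random_state=0`) and a hypothetical `results` dictionary: for each $\lambda$ it draws 100,000 values and compares the sample mean and variance with $1/\lambda$ and $1/\lambda^2$.

```python
import numpy as np
from scipy.stats import expon

# Maps lambda -> (sample mean, sample variance, theoretical mean, theoretical variance)
results = {}
for lam in (0.5, 1.0, 1.5):
    # The seed is fixed only to make the run repeatable
    sample = expon.rvs(scale=1 / lam, size=100_000, random_state=0)
    results[lam] = (sample.mean(), sample.var(), 1 / lam, 1 / lam**2)

for lam, (s_mean, s_var, t_mean, t_var) in results.items():
    print("lambda = {:.1f}: sample mean {:.4f} vs {:.4f}, sample variance {:.4f} vs {:.4f}".format(
        lam, s_mean, t_mean, s_var, t_var))
```

With this many samples the empirical values should land close to the theoretical ones, though the exact figures depend on the seed.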