Every beginner in Machine Learning starts by studying what regression means and how the linear regression algorithm works. In fact, the ease of understanding, the explainability and the vast range of effective real-world use cases of linear regression are what make the algorithm so famous. However, there are some situations to which linear regression is not suited. In this article, we will see what these situations are, what the kernel regression algorithm is and how it fits into the picture. Finally, we will code the kernel regression algorithm with a Gaussian kernel from scratch. Basic knowledge of Python and numpy is required to follow the article.
Brief Recap on Linear Regression
Given data in the form of N feature vectors x = [x₁, x₂, …, xₙ] consisting of n features and the corresponding label vector y, linear regression tries to fit a line that best describes the data. For this, it tries to find the optimal coefficients cᵢ, i ∈ {0, …, n}, of the line equation y = c₀ + c₁x₁ + c₂x₂ + … + cₙxₙ, usually by gradient descent, with the model accuracy measured on the RMSE metric. The equation obtained is then used to predict the target y* for a new, unseen input vector x*.
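To make the recap concrete, here is a minimal sketch (my own illustration, not code from this article) that fits y = c₀ + c₁x on a single feature by gradient descent and reports the RMSE; the data and learning rate are made up for the example:

import numpy as np

# Toy one-feature data, roughly y = 1 + 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

c0, c1, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    error = (c0 + c1 * x) - y
    c0 -= lr * 2 * error.mean()          # gradient of the MSE w.r.t. c0
    c1 -= lr * 2 * (error * x).mean()    # gradient of the MSE w.r.t. c1

rmse = np.sqrt(np.mean(((c0 + c1 * x) - y) ** 2))
print(c0, c1, rmse)                      # coefficients close to 1 and 2, small RMSE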
Linear regression is a simple algorithm that cannot model very complex relationships between the features. Mathematically, this is because, well, it is linear: the degree of the equation is 1, which means that linear regression will always model a straight line. Indeed, this linearity is the weakness of the linear regression algorithm. Why?
Well, let’s consider a situation where our data doesn’t have the form of a straight line: let’s take data generated using the function f(x) = x³. If we use linear regression to fit a model to this data, we will never get anywhere close to the true cubic function because the equation for which we are finding the coefficients does not have a cubic term! So, for any data not generated using a linear function, linear regression is very likely to underfit. So, what do we do?
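To see the underfitting concretely, here is a small sketch of my own (using numpy's polyfit rather than hand-rolled gradient descent) that fits a straight line to data generated from f(x) = x³:

import numpy as np

x = np.linspace(-3, 3, 50)
y = x ** 3                                   # data generated by a cubic, no noise
c1, c0 = np.polyfit(x, y, 1)                 # best straight line through this data
rmse = np.sqrt(np.mean(((c0 + c1 * x) - y) ** 2))
print(rmse)                                  # stays large: a line cannot capture the cubic shape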
We can use another type of regression called polynomial regression, which tries to find the optimal coefficients of (as the name suggests) a polynomial equation of degree n, with n > 1. However, with polynomial regression another problem arises: as a data analyst, you cannot know in advance what the degree of the equation should be for the resulting equation to fit the data best. This can only be determined by trial and error, which is made more difficult by the fact that above degree 3, the model built using polynomial regression is difficult to visualize.
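For illustration, that trial-and-error loop could look like the following sketch (my own, again leaning on numpy's polyfit instead of implementing polynomial regression from scratch):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = x ** 3 + rng.normal(0, 1, 50)            # noisy cubic data
for degree in (1, 2, 3, 5):
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    print(degree, np.sqrt(np.mean((pred - y) ** 2)))   # error drops sharply once the degree reaches 3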
This is where kernel regression can come to the rescue!
What is Kernel Regression?
Seeing the name, you may ask: if ‘linear’ in linear regression meant a linear function and ‘polynomial’ in polynomial regression meant a polynomial function, what does ‘kernel’ mean? Turns out, it means a kernel function! So, what is a kernel function? Simply put, it is a similarity function that takes two inputs and spits out how similar they are. We will see shortly how a kernel function is used in kernel regression.
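As a toy illustration of the idea (my own sketch, anticipating the Gaussian kernel used later in the article), a kernel gives a high score to inputs that are close together and a low score to inputs that are far apart:

import numpy as np

def gaussian_similarity(a, b, bandwidth=10):
    # Higher when a and b are close, approaching 0 as they move apart
    return np.exp(-0.5 * ((a - b) / bandwidth) ** 2)

print(gaussian_similarity(50, 52))    # close inputs  -> score near 1
print(gaussian_similarity(50, 120))   # distant inputs -> score near 0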
Now about kernel regression. Unlike linear and polynomial regression, in which the optimal parameter vector c = [c₀, c₁, …, cₙ] needs to be learnt, kernel regression is non-parametric, meaning that it calculates the target y* by performing computations directly on the input x*.
How?
Given data points (xᵢ, yᵢ), kernel regression goes about predicting by first constructing a kernel kᵢ for each data point xᵢ. Then for a given new input x*, it computes a similarity score with each xᵢ (given by xᵢ − x*) using the kernel; the similarity score acts as a weight wᵢ that represents the importance of that kernel (and the corresponding label yᵢ) in predicting the target y*. The prediction is then obtained by multiplying the weight vector w = [w₁, w₂, …, wₙ] with the label vector y = [y₁, y₂, …, yₙ].
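In other words (matching the predict() method implemented later in the article), the prediction is a kernel-weighted average of the training labels: y* = Σᵢ wᵢ·yᵢ / N with wᵢ = N·kᵢ / Σⱼ kⱼ, which simplifies to y* = Σᵢ kᵢ·yᵢ / Σⱼ kⱼ, where kᵢ is the kernel similarity between x* and xᵢ.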

Now, there can be different kernel functions, which give rise to different types of kernel regression. One such type is Gaussian Kernel Regression, in which the shape of the constructed kernel is the Gaussian curve, also known as the bell-shaped curve. In the context of Gaussian Kernel Regression, each constructed kernel can also be viewed as a normal distribution with mean value xᵢ and standard deviation b. Here, b is a hyperparameter that controls the shape of the curve (in particular, the width of the Gaussian curve). The equation for the Gaussian kernel k is given below. Notice the similarity between this equation and that of the Gaussian (also called normal) distribution.
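Written out in the form used by the gaussian_kernel() function below, the kernel for a training point xᵢ evaluated at a query x* is k(x*, xᵢ) = (1/√(2π)) · exp(−((xᵢ − x*)/b)² / 2): a Gaussian bump centered on xᵢ whose width is controlled by b.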

We will code this type of kernel regression next.
Coding Gaussian Kernel Regression
We will first look at the case of a one-dimensional feature vector and then extend it to n dimensions.
from scipy.stats import norm
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math

class GKR:
    
    def __init__(self, x, y, b):
        self.x = x
        self.y = y
        self.b = b
    
    '''Implement the Gaussian Kernel'''
    def gaussian_kernel(self, z):
        return (1/math.sqrt(2*math.pi))*math.exp(-0.5*z**2)
    
    '''Calculate weights and return prediction'''
    def predict(self, X):
        kernels = [self.gaussian_kernel((xi-X)/self.b) for xi in self.x]
        weights = [len(self.x) * (kernel/np.sum(kernels)) for kernel in kernels]
        return np.dot(weights, self.y)/len(self.x)
We define a class for Gaussian Kernel Regression which takes in the feature vector x, the label vector y and the hyperparameter b during initialization. Inside the class, we define a function gaussian_kernel() that implements the Gaussian kernel. You can see that we just write out the mathematical equation as code. Next, we define the function predict() that takes in the feature vector x* (referred to in the code as X) whose target value has to be predicted. Inside the function, we construct kernels for each xᵢ, calculate the weights and return the prediction, again by plugging the mathematical equations into code as-is.

Now, let’s pass in some dummy data and see the prediction that is output. We predict the value for x* = 50 (ignoring, for demonstration purposes, that it is already present in the training data).
gkr = GKR([10,20,30,40,50,60,70,80,90,100,110,120], [2337,2750,2301,2500,1700,2100,1100,1750,1000,1642, 2000,1932], 10)
gkr.predict(50)
This gives us an output of 1995.285.
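To see what the fitted curve looks like (a quick visualization sketch of my own; the article's full visualization code lives in the repositories linked at the end), we can predict over a grid of query points and plot them against the training data:

import numpy as np
import matplotlib.pyplot as plt

x_train = [10,20,30,40,50,60,70,80,90,100,110,120]
y_train = [2337,2750,2301,2500,1700,2100,1100,1750,1000,1642,2000,1932]
gkr = GKR(x_train, y_train, 10)              # the one-dimensional GKR class defined above

grid = np.linspace(10, 120, 200)
preds = [gkr.predict(xq) for xq in grid]

plt.scatter(x_train, y_train, label='training data')
plt.plot(grid, preds, color='red', label='Gaussian kernel regression, b=10')
plt.legend()
plt.show()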

Now, let’s extend the code for the case of n-dimensional feature vectors. The only modification we need to make is in the similarity score calculation. Instead of obtaining the difference between xᵢ and x*, we calculate the similarity score in the n-dimensional case as the Euclidean distance ||xᵢ − x*|| between them. Note that for the purposes of handling n-dimensional vectors, we use numpy wherever needed.
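For instance, the distance that feeds the kernel for the query x* = [20, 40] and the training point [22, 30] (both taken from the dummy data further below) is simply:

import numpy as np

print(np.linalg.norm(np.array([22, 30]) - np.array([20, 40])))   # ||x_i - x*|| ≈ 10.2

The full n-dimensional class follows.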
from scipy.stats import multivariate_normal

'''Class for Gaussian Kernel Regression'''
class GKR:
    
    def __init__(self, x, y, b):
        self.x = np.array(x)
        self.y = np.array(y)
        self.b = b
    
    '''Implement the Gaussian Kernel'''
    def gaussian_kernel(self, z):
        return (1/np.sqrt(2*np.pi))*np.exp(-0.5*z**2)
    
    '''Calculate weights and return prediction'''
    def predict(self, X):
        kernels = np.array([self.gaussian_kernel((np.linalg.norm(xi-X))/self.b) for xi in self.x])
        weights = np.array([len(self.x) * (kernel/np.sum(kernels)) for kernel in kernels])
        return np.dot(weights.T, self.y)/len(self.x)

Again, let’s pass in some 2D dummy data and predict for x* = [20, 40].
gkr = GKR([[11,15],[22,30],[33,45],[44,60],[50,52],[67,92],[78,107],[89,123],[100,137]], [2337,2750,2301,2500,1700,1100,1000,1642, 1932], 10)
gkr.predict([20,40])
We get y* = 2563.086.
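The bandwidth b is the one knob we have: a small b makes the prediction follow the nearest training points closely, while a large b averages over more of the data. A quick way to see this (my own sketch, reusing the n-dimensional GKR class and dummy data from above) is to rerun the same prediction with different values of b:

x_train = [[11,15],[22,30],[33,45],[44,60],[50,52],[67,92],[78,107],[89,123],[100,137]]
y_train = [2337,2750,2301,2500,1700,1100,1000,1642,1932]
for b in (1, 10, 50):
    gkr = GKR(x_train, y_train, b)
    print(b, gkr.predict([20, 40]))   # small b hugs the closest label; larger b pulls toward a broader average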
The extended code (including for visualizations) for this article can be found on GitHub and Kaggle.
Conclusion
We saw where and why linear regression and polynomial regression cannot be used, and with that background understood the intuition behind kernel regression, how it works, and how it can be used as an alternative. We went into the details of Gaussian kernel regression and coded it from scratch in Python by simply plugging the mathematical equations into code.
Translated from: https://towardsdatascience.com/kernel-regression-from-scratch-in-python-ea0615b23918