Polynomial Regression From Scratch in Python

Polynomial regression is an improved version of linear regression. If you know linear regression, it will be simple for you. If not, I will explain the formulas in this article. There are other, more advanced and more efficient machine learning algorithms out there, but it is a good idea to learn linear-regression-based techniques first: they are simple, fast, and work with well-known formulas, though they may not handle a complex set of data.

Polynomial Regression Formula

Linear regression can perform well only if there is a linear correlation between the input variables and the output variable. As I mentioned before, polynomial regression is built on linear regression. If you need a refresher on linear regression, here is the link:

Polynomial regression can find the relationship between the input features and the output variable even when the relationship is not linear. It uses the same basic formula as linear regression:

Y = BX + C

I am sure we all learned this formula in school. For linear regression, we use symbols like this:

Y = θ0 + θ1X

Here, we get X and Y from the dataset. X is the input feature and Y is the output variable. Theta values are initialized randomly.

For polynomial regression, the formula becomes like this:

Y = θ0 + θ1X + θ2X^2 + θ3X^3 + … + θnX^n

We are adding more terms here: we use the same input feature and raise it to different powers to make more features. That way, our algorithm is able to learn the data better.

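As a concrete sketch of this feature expansion (the column name and values here are made up for illustration), the same input feature can be raised to different powers to create new columns:

```python
import pandas as pd

# A single made-up input feature.
X = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0]})

# Reuse the same feature at different powers to create extra features.
X['x2'] = X['x'] ** 2
X['x3'] = X['x'] ** 3

print(X)
```

Each new column is just a transform of the original feature, so the model stays linear in the theta values while fitting a curve in the original feature.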
The powers do not have to be 2, 3, or 4. They could be 1/2, 1/3, or 1/4 as well. Then the formula will look like this:

Y = θ0 + θ1X + θ2X^(1/2) + θ3X^(1/3)

Cost Function and Gradient Descent

The cost function gives an idea of how far the predicted hypothesis is from the actual values. The formula is:

J = (1/(2m)) Σ (h(x) − y)^2

This equation may look complicated, but it is doing a simple calculation: subtract the hypothesis from the original output variable, square the result to eliminate negative values, then sum over the training examples and divide by 2 times the number of training examples.

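As a quick worked example of this calculation (all numbers made up), the cost for three predictions can be computed like this:

```python
import numpy as np

# Made-up hypothesis outputs and the actual output values.
y_pred = np.array([3.0, 5.0, 7.0])
y_true = np.array([2.0, 5.0, 9.0])
m = len(y_true)

# Subtract, square, sum, then divide by 2 times the number of examples.
cost = np.sum((y_pred - y_true) ** 2) / (2 * m)
print(cost)  # (1 + 0 + 4) / 6 ≈ 0.8333
```
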
What is gradient descent? It helps fine-tune our randomly initialized theta values. I am not going into the differential calculus here. If you take the partial derivative of the cost function with respect to each theta, we can derive these formulas:

θc := θc − (α/m) Σ (h(x) − y) Xc    (for each theta, where Xc is the corresponding feature column)

Here, alpha is the learning rate. You choose the value of alpha.

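A minimal sketch of a single gradient-descent update on made-up data (the column of ones plays the role of the bias feature for theta 0):

```python
import numpy as np

# Toy data: the first column is the bias (all ones), the second is the feature.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = np.zeros(2)
alpha = 0.1
m = len(y)

# One vectorized update of all thetas: theta_c -= (alpha/m) * sum((h - y) * X_c)
h = X @ theta                               # hypothesis with the current theta
theta = theta - alpha * (X.T @ (h - y)) / m
print(theta)
```

Repeating this update many times is exactly what the gradient-descent loop later in the article does, one theta at a time.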
Python Implementation of Polynomial Regression

Here is the step-by-step implementation of polynomial regression.

1. We will use a simple dummy dataset for this example that gives salary data for positions. Import the dataset:

import pandas as pd
import numpy as np
df = pd.read_csv('position_salaries.csv')
df.head()

2. Add the bias column for theta 0. This bias column will contain only 1s, because multiplying a number by 1 does not change it.

df = pd.concat([pd.Series(1, index=df.index, name='00'), df], axis=1)
df.head()

3. Delete the ‘Position’ column, because it contains strings and the algorithm does not understand strings. We have the ‘Level’ column to represent the positions.

df = df.drop(columns='Position')

4. Define our input variable X and the output variable y. In this example, ‘Level’ is the input feature and ‘Salary’ is the output variable. We want to predict the salary for each level.

y = df['Salary']
X = df.drop(columns = 'Salary')
X.head()

5. Take powers of the ‘Level’ column to make the ‘Level1’ and ‘Level2’ columns.

X['Level1'] = X['Level']**2
X['Level2'] = X['Level']**3
X.head()

6. Now, normalize the data: divide each column by the maximum value of that column, so that each column’s values range from 0 to 1. The algorithm should work even without normalization, but it helps the algorithm converge faster. Also, calculate the value of m, which is the length of the dataset.

m = len(X)
X = X/X.max()

7. Define the hypothesis function, which uses X and theta to predict ‘y’.

def hypothesis(X, theta):
    y1 = theta * X
    return np.sum(y1, axis=1)

8. Define the cost function, using our cost-function formula above:

def cost(X, y, theta):
    y1 = hypothesis(X, theta)
    return np.sum((y1 - y) ** 2) / (2 * m)

9. Write the function for gradient descent. We will keep updating the theta values until we find our optimum cost. For each iteration, we will calculate the cost for future analysis.

def gradientDescent(X, y, theta, alpha, epoch):
    J = []  # cost recorded at each epoch
    k = 0
    while k < epoch:
        y1 = hypothesis(X, theta)
        for c in range(0, len(X.columns)):
            theta[c] = theta[c] - alpha * sum((y1 - y) * X.iloc[:, c]) / m
        j = cost(X, y, theta)
        J.append(j)
        k += 1
    return J, theta

10. All the functions are defined. Now, initialize theta. I am initializing an array of zeros; you can take any other random values. I am choosing alpha as 0.05, and I will iterate the theta values for 700 epochs.

theta = np.array([0.0]*len(X.columns))
J, theta = gradientDescent(X, y, theta, 0.05, 700)

11. We got our final theta values and the cost in each iteration as well. Let’s find the salary prediction using our final theta.

y_hat = hypothesis(X, theta)

12. Now plot the original salary and our predicted salary against the levels.

%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
plt.scatter(x=X['Level'],y= y)
plt.scatter(x=X['Level'], y=y_hat)
plt.show()

Our prediction does not exactly follow the trend of the salaries, but it is close. Linear regression can only return a straight line, but in polynomial regression we can get a curved line like that. And where the data does not follow a nice curve, polynomial regression can learn more complex trends as well.

13. Let’s plot the cost we calculated in each epoch in our gradient descent function.

plt.figure()
plt.scatter(x=list(range(0, 700)), y=J)
plt.show()

The cost fell drastically in the beginning, and then the fall slowed. In a good machine learning algorithm, the cost should keep going down until convergence. Please feel free to try it with a different number of epochs and different learning rates (alpha).

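To make that experiment concrete, here is a self-contained sketch on made-up data (the helper name `run` is illustrative, not from the article) comparing a few learning rates over the same number of epochs:

```python
import numpy as np

# Made-up data: y is exactly 2 * x, and X carries a bias column of ones.
X = np.c_[np.ones(5), np.arange(1.0, 6.0)]
y = 2.0 * X[:, 1]

def run(alpha, epochs):
    """Plain gradient descent; returns the cost recorded at each epoch."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    costs = []
    for _ in range(epochs):
        h = X @ theta
        theta = theta - alpha * (X.T @ (h - y)) / m
        costs.append(np.sum((h - y) ** 2) / (2 * m))
    return costs

# With a stable step size, a larger alpha reaches a lower cost in 200 epochs.
for alpha in (0.001, 0.01, 0.05):
    print(alpha, run(alpha, 200)[-1])
```

If alpha is pushed too high, the updates overshoot and the recorded costs grow instead of shrinking, which is easy to spot on the same kind of cost plot.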
Here is the dataset: salary_data

Follow this link for the full working code: Polynomial Regression

Translated from: https://towardsdatascience.com/polynomial-regression-from-scratch-in-python-1f34a3a5f373

