Machine Learning: Observe How a Linear Model Works by Predicting the Prices of the Fiat 500

Introduction

In this article, I'd like to talk about linear models by walking you through a real project of mine. The project, which you can find on my GitHub, consists of predicting the prices of the Fiat 500.

The dataset for my model has 1538 rows and the 8 columns listed below.

  • model: pop, lounge, sport
  • engine_power: power of the engine in kW
  • age_in_days: age of the car in days
  • km: kilometres travelled by the car
  • previous_owners: number of previous owners
  • lat: latitude of the seller (the price of cars in Italy varies from the north to the south of the country)
  • lon: longitude of the seller
  • price: selling price

In the first part of this article, we will look at some concepts behind linear regression, ridge regression and lasso regression. Then I will show you the main insights I found in the dataset, and last but not least we will see how I prepared the model and which metrics I used to evaluate its performance.

Part I: Linear Regression, Ridge Regression and Lasso Regression

Linear models are a class of models that make a prediction using a linear function of the input features.

For regression, the general prediction formula of a linear model is:

ŷ = m[0]·x[0] + m[1]·x[1] + ... + m[p]·x[p] + b

Here x[0] to x[p] are the features of a single data point, m and b are the parameters of the model that are learned, and ŷ is the prediction the model makes.
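To make the formula concrete, here is a minimal sketch that evaluates it for a single data point with NumPy. The coefficient, intercept and feature values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the formula
m = np.array([2.0, -1.0, 0.5])   # one learned coefficient per feature
b = 10.0                         # learned intercept
x = np.array([3.0, 4.0, 2.0])    # features x[0]..x[p] of one data point

# Explicit weighted sum: m[0]*x[0] + m[1]*x[1] + m[2]*x[2] + b
y_hat_sum = sum(m_i * x_i for m_i, x_i in zip(m, x)) + b

# The same computation written as a dot product
y_hat = np.dot(m, x) + b

print(y_hat_sum, y_hat)
```

Both expressions evaluate to 6 - 4 + 1 + 10 = 13, which is what a fitted linear model does internally for each row when it predicts.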

There are many linear models for regression. They differ in how the model parameters m and b are learned from the training data and in how model complexity can be controlled. We will look at three of them.

  • Linear regression (ordinary least squares) → it finds the parameters m and b that minimize the mean squared error (MSE) between the predictions and the true regression targets, y, on the training set. The MSE is the sum of the squared differences between the predictions and the true values, divided by the number of samples. The model can be fitted with scikit-learn as follows.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# X holds the features and y the prices
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
lr = LinearRegression()
lr.fit(X_train, y_train)
print("lr.coef_: {}".format(lr.coef_))
print("lr.intercept_: {}".format(lr.intercept_))
  • Ridge regression → it makes predictions with the same formula as linear regression. In ridge regression, however, the coefficients (m) are chosen not only to predict well on the training data but also to satisfy an additional constraint: all entries of m should be close to zero. In other words, each feature should have as little effect on the outcome as possible (a small slope) while the model still predicts well. This constraint is an example of regularization, which means explicitly restricting a model to avoid overfitting. The kind used by ridge regression is known as L2 regularization. Ridge regression is implemented in linear_model.Ridge. Increasing alpha moves the coefficients toward zero, which decreases training set performance but might help generalization.

from sklearn.linear_model import Ridge

ridge = Ridge(alpha=11).fit(X_train, y_train)
print("Training set score: {:.2f}".format(ridge.score(X_train, y_train)))
print("Test set score: {:.2f}".format(ridge.score(X_test, y_test)))
  • Lasso regression → an alternative to ridge for regularizing linear regression is the lasso. Like ridge regression, the lasso restricts the coefficients to be close to zero, but in a slightly different way, called L1 regularization. The consequence of L1 regularization is that some coefficients become exactly zero, which means some features are entirely ignored by the model.

import numpy as np
from sklearn.linear_model import Lasso

lasso = Lasso(alpha=3).fit(X_train, y_train)
print("Training set score: {:.2f}".format(lasso.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lasso.score(X_test, y_test)))
print("Number of features used: {}".format(np.sum(lasso.coef_ != 0)))
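The difference between the three models is easiest to see by comparing their fitted coefficients. The sketch below uses synthetic data (not the Fiat 500 dataset) in which only the first two of five features influence the target. The alpha values and the variable names ending in _demo are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Synthetic data (an assumption for illustration, not the Fiat 500 dataset):
# 5 features, of which only the first two influence the target.
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.5, size=200)

ols_demo = LinearRegression().fit(X_demo, y_demo)
ridge_demo = Ridge(alpha=100.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.5).fit(X_demo, y_demo)

# L2 shrinks all coefficients; L1 can set some of them exactly to zero.
print("OLS coefficients:  ", np.round(ols_demo.coef_, 2))
print("Ridge coefficients:", np.round(ridge_demo.coef_, 2))
print("Lasso coefficients:", np.round(lasso_demo.coef_, 2))
print("Features used by lasso:", np.sum(lasso_demo.coef_ != 0))
```

With a large alpha the ridge coefficients are visibly smaller than the least squares ones, while the lasso drops uninformative features altogether. On real data, alpha is usually tuned rather than fixed by hand.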

Part II: Insights that I found

Before moving on to the preparation and evaluation of the model, it is useful to take a look at the dataset.

In the scatter matrix below, we can observe clear correlations between some features, such as km, age_in_days and price.

[Image: scatter matrix of the dataset features. Image by author.]

The correlation matrix that follows summarizes the correlations between the features as numbers.

In particular, price is strongly correlated with both age_in_days and km.

This is the starting point for building our model and for deciding which machine learning model is likely to fit best.

[Image: correlation matrix of the dataset features. Image by author.]
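A correlation matrix like the one described above can be computed with pandas. The following is a minimal sketch on a few made-up rows. Only the column names come from the dataset description, and cars_demo is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample rows; the values are made up for illustration.
cars_demo = pd.DataFrame({
    "age_in_days": [400, 900, 1500, 2500, 4000],
    "km": [10000, 30000, 60000, 90000, 150000],
    "price": [10500, 9500, 8200, 6500, 4200],
})

# Pairwise Pearson correlations between the numeric columns
corr = cars_demo.corr()
print(corr.round(2))

# Correlation of every feature with the target, strongest negative first
print(corr["price"].drop("price").sort_values())
```

In this toy sample, older cars with more kilometres are cheaper, so both features correlate negatively with price. The sign and strength on the real dataset should be read from the actual correlation matrix.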

Part III: Prepare and evaluate the performance of the model

To train and test the model I used linear regression.

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
lr = LinearRegression()
lr.fit(X_train, y_train)

out:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
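The snippet above assumes that X and y already exist. The article does not show how they were built, so the following is only a sketch of one possible preparation: the categorical model column is one-hot encoded so that every feature is numeric. The rows are made up, and cars_demo, X_demo and y_demo are hypothetical names.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the article.
cars_demo = pd.DataFrame({
    "model": ["pop", "lounge", "sport", "pop"],
    "engine_power": [51, 51, 74, 51],
    "age_in_days": [882, 1186, 4658, 2739],
    "km": [25000, 32500, 142228, 160000],
    "previous_owners": [1, 1, 1, 1],
    "lat": [44.9, 45.7, 45.5, 40.6],
    "lon": [8.6, 12.2, 10.4, 17.6],
    "price": [8900, 8800, 4200, 6000],
})

# One-hot encode the categorical column so every feature is numeric
# (assumed preprocessing; the article does not show this step).
X_demo = pd.get_dummies(cars_demo.drop(columns="price"), columns=["model"])
y_demo = cars_demo["price"]

print(list(X_demo.columns))
```

The six numeric columns are kept as they are, and model becomes three indicator columns (model_lounge, model_pop, model_sport), giving nine features in total.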

The following table shows the coefficient of each feature that I considered for my model.

import pandas as pd

# One row per feature, with its fitted coefficient
coef_df = pd.DataFrame(lr.coef_, X.columns, columns=['Coefficient'])
coef_df

out:

[Image: table of the fitted coefficient for each feature]

Now it is time to evaluate the model. The following graph compares predicted and actual values for a sample of 30 data points. As we can see, the model performs quite well.

[Image: predicted vs. actual values for a sample of 30 data points. Image by author.]

R-squared measures how much of the variation of the dependent variable is explained by the model inputs. In our case, it is 0.85, that is, 85%.

from sklearn.metrics import r2_score

y_pred = lr.predict(X_test)  # predictions on the test set
round(r2_score(y_test, y_pred), 2)

out:

0.85
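To see what r2_score measures, it helps to compute it by hand as 1 minus the ratio between the residual sum of squares and the total sum of squares. The sketch below uses made-up prices and predictions (y_true and y_hat are hypothetical names) and checks the result against scikit-learn.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up prices and predictions, for illustration only
y_true = np.array([9000.0, 7500.0, 6000.0, 4500.0])
y_hat = np.array([8800.0, 7700.0, 6300.0, 4400.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_hat))
```

Here ss_res is 180000 and ss_tot is 11250000, so R-squared is 1 - 0.016 = 0.984. A model that always predicted the mean price would score 0.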

Now I compute the MAE, the MSE and the RMSE to get a more precise picture of the model's performance.

import numpy as np
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
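These three metrics are simple averages over the prediction errors. A minimal sketch with made-up values (y_true and y_hat are hypothetical names) shows how each one is computed and that the results agree with scikit-learn:

```python
import numpy as np
from sklearn import metrics

# Made-up prices and predictions, for illustration only
y_true = np.array([9000.0, 7500.0, 6000.0, 4500.0])
y_hat = np.array([8800.0, 7700.0, 6300.0, 4400.0])

errors = y_true - y_hat
mae = np.mean(np.abs(errors))   # mean of |error|: (200 + 200 + 300 + 100) / 4
mse = np.mean(errors ** 2)      # mean of error squared: 180000 / 4
rmse = np.sqrt(mse)             # back in the same unit as the price

print(mae, metrics.mean_absolute_error(y_true, y_hat))
print(mse, metrics.mean_squared_error(y_true, y_hat))
print(rmse)
```

The MAE and the RMSE are expressed in the same unit as the price, which makes them easier to interpret than the MSE. The RMSE penalizes large errors more heavily than the MAE.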

Finally, comparing the training set score with the test set score shows how well the model generalizes to unseen data.

print("Training set score: {:.2f}".format(lr.score(X_train, y_train)))
print("Test set score: {:.2f}".format(lr.score(X_test, y_test)))

out:

Training set score: 0.83
Test set score: 0.85

Conclusion

Linear models are a class of models that are widely used in practice and have been studied extensively in recent years, in particular for machine learning. I hope this article gives you a good starting point for building your own linear model.

Thanks for reading. There are some other ways you can keep in touch with me and follow my work:

  • Subscribe to my newsletter.
  • You can also get in touch via my Telegram group, Data Science for Beginners.

Translated from: https://towardsdatascience.com/machine-learning-observe-how-a-linear-model-works-by-predicting-the-prices-of-the-fiat-500-fb38e0d22681
