The Aha! Moments in 4 Popular Machine Learning Algorithms

Most people fall into one of two camps:

  • I don’t understand these machine learning algorithms.

  • I understand how the algorithms work, but not why they work.

This article seeks to explain not only how these algorithms work, but also to give an intuitive understanding of why they work, to deliver that lightbulb aha! moment.

Decision Trees

Decision Trees divide the feature space using horizontal and vertical lines. For example, consider the very simplistic Decision Tree below, which has one conditional node and two class nodes, indicating a condition and the category into which a training point satisfying it will fall.

[Figure: a one-condition decision tree and the two regions it carves out of the feature space]

Note that there is a lot of mismatch between the regions marked as each color and the data points within those regions that actually are that color; this mismatch is (roughly) the entropy. The decision tree is constructed to minimize the entropy. In this scenario, we can add an additional layer of complexity: if we add another condition, say x less than 6 and y greater than 6, we can designate points in that area as red. The entropy is lowered with this move.

[Figure: the partition after adding the second condition (x < 6, y > 6), lowering the entropy]

At each step, the Decision Tree algorithm attempts to build the tree such that the entropy is minimized. Think of entropy more formally as the amount of ‘disorder’ or ‘confusion’ a certain divider (the conditions) has, and its opposite as ‘information gain’: how much a divider adds information and insight to the model. Feature splits that have the highest information gain (as well as the lowest entropy) are placed at the top.
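
To make entropy and information gain concrete, here is a minimal sketch in Python; the toy points and split thresholds are made up for illustration, not taken from the figures:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, mask):
    """Entropy reduction from splitting `labels` by a boolean condition `mask`."""
    n = len(labels)
    left, right = labels[mask], labels[~mask]
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children

# hypothetical 2-D training points and their colors
x = np.array([2, 3, 5, 7, 8, 9])
y = np.array([7, 8, 4, 2, 3, 9])
color = np.array(['red', 'red', 'red', 'blue', 'blue', 'blue'])

print(information_gain(color, x < 6))  # clean split: gain = 1.0 bit
print(information_gain(color, y < 5))  # messy split: gain of roughly 0.08 bits
```

The algorithm greedily picks the condition with the highest gain (here, x < 6) as the next split.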

[Figure: a decision tree with several conditions, ordered by information gain]

The conditions may split their one-dimensional features somewhat like this:

條件可能會將其一維特征分解為如下形式:

[Figure: the one-dimensional splits produced by each condition]

Note that condition 1 has clean separation, and therefore low entropy and high information gain. The same cannot be said for condition 3, which is why it is placed near the bottom of the Decision Tree. This construction ensures that the tree remains as lightweight as possible.
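
In practice a library handles this greedy construction for you. A minimal sketch using scikit-learn, with invented toy data; `criterion='entropy'` tells the tree to split by information gain:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# invented 2-D points, loosely mirroring the figures above
X = [[2, 7], [3, 8], [5, 4], [7, 2], [8, 3], [9, 9], [4, 8], [6, 1]]
y = ['red', 'red', 'red', 'blue', 'blue', 'blue', 'red', 'blue']

tree = DecisionTreeClassifier(criterion='entropy').fit(X, y)
print(export_text(tree, feature_names=['x', 'y']))  # highest-gain splits print at the top
```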

You can read more about entropy and its use in Decision Trees as well as neural networks (cross-entropy as a loss function) here.

Random Forest

Random Forest is a bagged (bootstrap aggregated) version of the Decision Tree. The primary idea is that several Decision Trees are each trained on a subset of the data. Then, an input is passed through each model, and their outputs are aggregated through a function like the mean to produce a final output. Bagging is a form of ensemble learning.
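
A rough sketch of the bagging loop itself, assuming numpy arrays and integer class labels; scikit-learn's RandomForestClassifier wraps this bootstrap-train-aggregate pattern, plus random feature subsets at each split:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predict(X_train, y_train, X_test, n_trees=100, seed=0):
    """Train each tree on a bootstrap sample, then majority-vote the predictions."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # rows sampled with replacement
        tree = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(tree.predict(X_test))
    votes = np.asarray(votes, dtype=int)  # shape (n_trees, n_test); integer labels assumed
    # aggregate: majority vote per test point (use the mean instead for regression)
    return np.array([np.bincount(col).argmax() for col in votes.T])
```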

[Figure: several decision trees trained on subsets of the data, with their outputs aggregated]

There are many analogies for why Random Forest works well. Here is a common version of one:

You need to decide which restaurant to go to next. To ask someone for their recommendation, you must answer a variety of yes/no questions, which will lead them to their decision about which restaurant you should go to.

Would you rather only ask one friend or ask several friends, then find the mode or general consensus?

Unless you only have one friend, most people would answer the second. The insight this analogy provides is that each tree has some sort of ‘diversity of thought’ because they were trained on different data, and hence have different ‘experiences’.

This analogy, clean and simple as it is, never really stood out to me. In the real world, the single friend has less total experience than all the friends combined, but in machine learning, the decision tree and the random forest are trained on the same data, and hence have the same ‘experiences’. The ensemble model is not actually receiving any new information. If I could ask one all-knowing friend for a recommendation, I would see no objection to that.

How can a model that randomly pulls subsets of the data to simulate artificial ‘diversity’ perform better than one trained on the data as a whole?

Take a sine wave with heavy normally distributed noise. This is your single Decision Tree classifier, which is naturally a very high-variance model.

[Figure: a sine wave with heavy normally distributed noise]

100 ‘approximators’ will be chosen. These approximators randomly select points along the sine wave and generate a sinusoidal fit, much like decision trees being trained on subsets of the data. These fits are then averaged to form a bagged curve. The result? A much smoother curve.
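
A sketch of this experiment in numpy; the particular fitting procedure (least-squares sinusoids on random subsets) is my assumption about the setup, but it reproduces the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 4 * np.pi, 400)
y_noisy = np.sin(x) + rng.normal(scale=0.8, size=x.size)  # heavy Gaussian noise

fits = []
for _ in range(100):  # the 100 'approximators'
    idx = rng.choice(x.size, size=40, replace=False)  # random subset of points
    # least-squares fit of a*sin(x) + b*cos(x) to the sampled points
    A = np.column_stack([np.sin(x[idx]), np.cos(x[idx])])
    coef, *_ = np.linalg.lstsq(A, y_noisy[idx], rcond=None)
    fits.append(np.column_stack([np.sin(x), np.cos(x)]) @ coef)

bagged = np.mean(fits, axis=0)  # the averaged ('bagged') curve is far smoother than any single fit
```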

[Figure: the 100 individual sinusoidal fits and their smoother bagged average]

Bagging works because it reduces the variance of the model and improves its ability to generalize, by artificially making the model more ‘confident’. This is also why bagging does not work as well on already low-variance models like logistic regression.

You can read more about the intuition and more rigorous proof of the success of bagging here.

Support Vector Machines

Support Vector Machines attempt to find the hyperplane that divides the data best, relying on the concept of ‘support vectors’ to maximize the margin between the two classes.

[Figure: a hyperplane separating two classes, with the support vectors defining the margin]

Unfortunately, most datasets are not so easily separable, and if they were, SVM would likely not be the best algorithm for the job. Consider the one-dimensional separation task below: there is no good divider, since any single split will lump points from two different classes together.

[Figure: One proposal for a split.]

SVM is powerful at solving these kinds of problems by using the so-called ‘kernel trick’, which projects data into new dimensions to make the separation task easier. For instance, let’s create a new dimension, simply defined as x² (where x is the original dimension):

[Figure: the data projected into two dimensions as (x, x²), now linearly separable]

Now the data is cleanly separable after being projected onto the new dimension (each data point represented in two dimensions as (x, x²)).
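
A minimal sketch of that projection with made-up 1-D data (one class in the middle, the other on both sides), feeding the explicit (x, x²) features to a linear SVM:

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data that no single threshold can separate
x = np.array([-4.0, -3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0, 4.0])
y = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0])

X2 = np.column_stack([x, x ** 2])       # project each point to (x, x²)
clf = SVC(kernel='linear').fit(X2, y)   # a flat line in the new space now separates the classes

x_new = np.array([1.0, -3.5])
print(clf.predict(np.column_stack([x_new, x_new ** 2])))  # expect [1 0]
```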

Using a variety of kernels (most popularly, the polynomial, sigmoid, and RBF kernels), the kernel trick does the heavy lifting to create a transformed space in which the separation task is simple.
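
In practice you rarely build the projection by hand: the kernel computes it implicitly. A sketch reusing the 1-D toy data from above, where the SVM never sees an explicit x² feature:

```python
from sklearn.svm import SVC

X1 = x.reshape(-1, 1)  # the raw 1-D data from the previous sketch
for kernel in ('poly', 'sigmoid', 'rbf'):
    clf = SVC(kernel=kernel, degree=2).fit(X1, y)  # degree applies to 'poly' only
    print(kernel, clf.score(X1, y))  # the polynomial and RBF kernels handle this data easily
```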

Neural Networks

Neural Networks are the pinnacle of machine learning. Their discovery, and the unlimited variations and improvements that can be made upon them, have warranted them a field of their own: deep learning. Admittedly, our explanation for the success of neural networks is still incomplete (“Neural networks are matrix multiplications that no one understands”), but the easiest way to explain them is through the Universal Approximation Theorem (UAT).

At its core, every supervised algorithm seeks to model some underlying function of the data; usually this is either a regression plane or the feature boundary. Consider the function y = x², which can be modelled to arbitrary accuracy with several horizontal steps.
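
A sketch of that staircase construction in numpy: carve the x-axis into intervals and output a constant (the square of the interval midpoint) on each, so the error shrinks as the number of steps grows:

```python
import numpy as np

def staircase_square(x, n_steps=20, lo=-2.0, hi=2.0):
    """Approximate y = x² with n_steps horizontal steps over [lo, hi]."""
    edges = np.linspace(lo, hi, n_steps + 1)
    mids = (edges[:-1] + edges[1:]) / 2
    bins = np.clip(np.searchsorted(edges, x) - 1, 0, n_steps - 1)
    return mids[bins] ** 2  # each point gets its interval's constant value

x = np.linspace(-2, 2, 1000)
for n in (10, 100, 1000):
    print(n, np.max(np.abs(staircase_square(x, n) - x ** 2)))  # error shrinks with more steps
```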

[Figure: y = x² approximated by a series of horizontal steps]

This is essentially what a neural network can do. Perhaps it can be a little more complex and model relationships beyond horizontal steps (like the quadratic and linear segments below), but at its core, the neural network is a piecewise function approximator.
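
And a small network does the same with learned pieces. A minimal sketch using scikit-learn's MLPRegressor, with arbitrary, untuned hyperparameters; with ReLU activations the learned function is literally piecewise linear:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

x = np.linspace(-2, 2, 400).reshape(-1, 1)
y = (x ** 2).ravel()

net = MLPRegressor(hidden_layer_sizes=(32,), activation='relu',
                   solver='lbfgs', max_iter=5000, random_state=0).fit(x, y)
print(net.predict([[1.5]]))  # should land close to 1.5 squared = 2.25
```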

[Figure: a piecewise approximation built from quadratic and linear segments]

Each node is delegated to one part of the piecewise function, and the purpose of the network is to activate certain neurons responsible for parts of the feature space. For instance, if one were to classify images of men with beards or without beards, several nodes should be delegated specifically to the pixel locations where beards often appear. Somewhere in multi-dimensional space, these nodes represent a numerical range.

Note, again, that the question “why do neural networks work” is still unanswered. The UAT doesn’t answer this question, but states that neural networks, under certain human interpretations, can model any function. The field of Explainable/Interpretable AI is emerging to answer these questions with methods like activation maximization and sensitivity analysis.

You can read a more in-depth explanation and view visualizations of the Universal Approximation Theorem here.

All four of these algorithms, and many others, look very simplistic in low dimensions. A key realization in machine learning is that much of the ‘magic’ and ‘intelligence’ we purport to see in AI is really a simple algorithm hidden under the guise of high dimensionality.

Decision trees splitting regions into squares is simple, but decision trees splitting high-dimensional space into hypercubes is less so. An SVM performing a kernel trick to improve separability from one to two dimensions is understandable, but an SVM doing the same thing on a dataset hundreds of dimensions large is almost magic.

Our admiration of, and confusion about, machine learning is predicated on our lack of understanding of high-dimensional spaces. Learning how to get around high dimensionality, and to understand algorithms in their native space, is instrumental to an intuitive understanding.

All images created by author.

Translated from: https://towardsdatascience.com/the-aha-moments-in-4-popular-machine-learning-algorithms-f7e75ef5b317

