Deep Learning Algorithms and Machine Learning Algorithms
Most people fall into one of two camps:
- I don’t understand these machine learning algorithms.
- I understand how the algorithms work, but not why they work.
This article seeks to explain not only how these algorithms work, but also to give an intuitive understanding of why they work, to deliver that lightbulb aha! moment.
Decision Trees
Decision Trees divide the feature space using horizontal and vertical lines. For example, consider the very simplistic Decision Tree below, which has one conditional node and two class nodes, indicating a condition and the category a training point satisfying it will fall into.

Note that there is considerable mismatch between the regions marked as each color and the data points within them that actually are that color; this mismatch is, roughly, the entropy. The decision tree is constructed to minimize the entropy. In this scenario, we can add another layer of complexity. If we add another condition, say x is less than 6 and y is larger than 6, we can designate points in that area as red. This move lowers the entropy.

At each step, the Decision Tree algorithm attempts to find a way to build the tree such that the entropy is minimized. Think of entropy more formally as the amount of 'disorder' or 'confusion' a certain divider (the condition) leaves behind, and of its opposite, 'information gain', as how much a divider adds information and insight to the model. Feature splits with the highest information gain (and hence the lowest entropy) are placed at the top.
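To make this concrete, here is a minimal sketch (my own illustration, not the author's code) of computing the entropy of a set of labels and the information gain of a candidate split such as x < 6; the toy points and the thresholds are made up for the example.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(x, labels, threshold):
    """Entropy reduction from splitting on the condition x < threshold."""
    left, right = labels[x < threshold], labels[x >= threshold]
    n = len(labels)
    child_entropy = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - child_entropy

# Toy 1-D data: reds cluster at small x, blues at large x (made-up numbers).
x = np.array([1, 2, 3, 4, 7, 8, 9, 10])
y = np.array(['red', 'red', 'red', 'red', 'blue', 'blue', 'blue', 'blue'])

print(entropy(y))                  # 1.0 bit: a 50/50 mix is maximally 'confused'
print(information_gain(x, y, 6))   # 1.0: the split x < 6 separates the classes perfectly
print(information_gain(x, y, 3))   # lower: a worse split leaves a mixed child node
```

A greedy tree builder evaluates many candidate splits like this and places the one with the highest information gain at the top of the tree.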

The conditions may split their one-dimensional features somewhat like this:

Note that condition 1 has clean separation, and therefore low entropy and high information gain. The same cannot be said for condition 3, which is why it is placed near the bottom of the Decision Tree. This construction of the tree ensures that it can remain as lightweight as possible.
You can read more about entropy and its use in Decision Trees as well as neural networks (cross-entropy as a loss function) here.
Random Forest
Random Forest is a bagged (bootstrap aggregated) version of the Decision Tree. The primary idea is that several Decision Trees are each trained on a subset of data. Then, an input is passed through each model, and their outputs are aggregated through a function like a mean to produce a final output. Bagging is a form of ensemble learning.
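As a rough sketch of bagging by hand (assuming scikit-learn and a made-up toy dataset, not the article's code), each tree is trained on a bootstrap resample of the data and their predictions are averaged:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.utils import resample

rng = np.random.default_rng(0)

# Toy regression data: a noisy quadratic (invented for the example).
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 1, size=200)

# Bagging by hand: each tree sees a different bootstrap resample of the data.
n_trees = 25
trees = []
for i in range(n_trees):
    X_boot, y_boot = resample(X, y, random_state=i)   # sample with replacement
    trees.append(DecisionTreeRegressor().fit(X_boot, y_boot))

# To predict, pass the input through every tree and average the outputs.
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
bagged_prediction = np.mean([t.predict(X_test) for t in trees], axis=0)
print(bagged_prediction)
```

A Random Forest adds one more source of diversity on top of this: each split within a tree also considers only a random subset of the features.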

There are many analogies for why Random Forest works well. Here is a common version of one:
You need to decide which restaurant to go to next. To ask someone for their recommendation, you must answer a variety of yes/no questions, which will lead them to a decision about which restaurant you should go to.
Would you rather only ask one friend or ask several friends, then find the mode or general consensus?
Unless you only have one friend, most people would answer the second. The insight this analogy provides is that each tree has some sort of ‘diversity of thought’ because they were trained on different data, and hence have different ‘experiences’.
This analogy, clean and simple as it is, never really stood out to me. In the real world, the single-friend option has less experience than all the friends combined, but in machine learning, the decision tree and the random forest are trained on the same data, and hence have the same 'experience'. The ensemble model is not actually receiving any new information. If I could ask one all-knowing friend for a recommendation, I would see no objection to that.
How can a model that randomly pulls subsets of the same data to simulate artificial 'diversity' perform better than one trained on the data as a whole?
Take a sine wave with heavy normally distributed noise. This is your single Decision Tree classifier, which is naturally a very high-variance model.

100 ‘approximators’ will be chosen. These approximators randomly select points along the sine wave and generate a sinusoidal fit, much like decision trees being trained on subsets of the data. These fits are then averaged to form a bagged curve. The result? A much smoother curve.
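Here is a rough numpy sketch of that experiment, under some assumptions of my own: each 'approximator' does a least-squares sinusoidal fit (a·sin x + b·cos x) on a small random subset of the noisy points, and the 100 fits are averaged on a common grid.

```python
import numpy as np

rng = np.random.default_rng(0)

# A sine wave with heavy normally distributed noise.
x = np.linspace(0, 2 * np.pi, 500)
y_noisy = np.sin(x) + rng.normal(0, 0.8, size=x.shape)

def sinusoidal_fit(x_sub, y_sub, x_eval):
    """Least-squares fit of a*sin(x) + b*cos(x), evaluated on x_eval."""
    A = np.column_stack([np.sin(x_sub), np.cos(x_sub)])
    (a, b), *_ = np.linalg.lstsq(A, y_sub, rcond=None)
    return a * np.sin(x_eval) + b * np.cos(x_eval)

# 100 approximators, each fit on a random subset of the noisy points.
fits = []
for _ in range(100):
    idx = rng.choice(len(x), size=25, replace=False)
    fits.append(sinusoidal_fit(x[idx], y_noisy[idx], x))

bagged_curve = np.mean(fits, axis=0)   # averaging the fits gives a much smoother curve
print(np.max(np.abs(bagged_curve - np.sin(x))))  # stays close to the true sine despite the heavy noise
```

Any individual fit wobbles with its particular subset; the average of the fits tracks the underlying sine much more closely.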

Bagging works because it reduces the variance of the model and helps improve its ability to generalize, by artificially making the model more 'confident'. This is also why bagging does not work as well on already low-variance models like logistic regression.
You can read more about the intuition behind bagging, and a more rigorous proof of its success, here.
Support Vector Machines
Support Vector Machines attempt to find the hyperplane that divides the data best, relying on the concept of 'support vectors' to maximize the margin between the two classes.
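As a quick sketch (assuming scikit-learn, with an invented toy dataset), a linear SVM exposes exactly which training points ended up as support vectors, i.e. the points that pin down the maximum-margin hyperplane:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable blobs in 2-D (made-up points).
X = np.array([[1, 1], [2, 1], [1, 2],      # class 0
              [5, 5], [6, 5], [5, 6]])     # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)

# Only the points closest to the boundary, the support vectors, define the hyperplane;
# maximizing the margin between them is what 'divides the data best'.
print(clf.support_vectors_)
print(clf.coef_, clf.intercept_)   # w.x + b = 0 is the separating hyperplane
```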

Unfortunately, most datasets are not so easily separable, and if they were, SVM would likely not be the best algorithm for the job. Consider this one-dimensional separation task: there is no good divider, since any single split will lump points from two different classes together.

SVM is powerful at solving these kinds of problems using the so-called 'kernel trick', which projects data into new dimensions to make the separation task easier. For instance, let's create a new dimension, defined simply as x² (where x is the original dimension):

Now that the data has been projected into a new dimension, with each data point represented in two dimensions as (x, x²), it is cleanly separable.
Using a variety of kernels (most popularly polynomial, sigmoid, and RBF kernels), the kernel trick does the heavy lifting of creating a transformed space in which the separation task is simple.
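Here is a small sketch of that projection (my own toy example, assuming scikit-learn): a 1-D dataset where one class sits between the two halves of the other cannot be split by any single threshold, but after mapping each point x to (x, x²) a straight line separates the classes, which is roughly what a degree-2 polynomial kernel does implicitly.

```python
import numpy as np
from sklearn.svm import SVC

# 1-D data: class 1 sits in the middle, class 0 on both sides (made-up values).
x = np.array([-4, -3, -2.5, -1, 0, 1, 2.5, 3, 4])
y = np.array([ 0,  0,  0,    1, 1, 1,  0,  0, 0])

# No single threshold on x separates the classes, but after projecting to (x, x^2)
# the middle class has small x^2 and the rest large x^2, so a straight line splits them.
X_projected = np.column_stack([x, x ** 2])
print(SVC(kernel='linear').fit(X_projected, y).score(X_projected, y))   # 1.0

# A polynomial kernel performs an equivalent projection implicitly, on the original 1-D data.
print(SVC(kernel='poly', degree=2).fit(x.reshape(-1, 1), y).score(x.reshape(-1, 1), y))
```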
Neural Networks
Neural Networks are the pinnacle of machine learning. Their discovery, and the unlimited variations and improvements that can be made upon them, have warranted a field of their own: deep learning. Admittedly, our understanding of why neural networks succeed is still incomplete (“Neural networks are matrix multiplications that no one understands”), but the easiest way to explain them is through the Universal Approximation Theorem (UAT).
At their core, all supervised algorithms seek to model some underlying function of the data; usually this is either a regression surface or a boundary between classes. Consider the function y = x², which can be modelled to arbitrary accuracy with several horizontal steps.

This is essentially what a neural network can do. Perhaps it can be a little more complex and model relationships beyond horizontal steps (such as the quadratic and linear pieces below), but at its core, the neural network is a piecewise function approximator.
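As a toy illustration of that piecewise view (my own sketch, not the article's), here is y = x² approximated by horizontal steps; the maximum error shrinks as the number of steps grows, which is the spirit of the Universal Approximation Theorem.

```python
import numpy as np

def step_approximation(f, x, n_steps, lo=-2.0, hi=2.0):
    """Approximate f on [lo, hi] with n_steps horizontal steps (piecewise constant)."""
    edges = np.linspace(lo, hi, n_steps + 1)
    centers = (edges[:-1] + edges[1:]) / 2          # each step takes f's value at its center
    bins = np.clip(np.digitize(x, edges) - 1, 0, n_steps - 1)
    return f(centers)[bins]

f = lambda x: x ** 2
x = np.linspace(-2, 2, 1000)

for n_steps in (4, 16, 64, 256):
    error = np.max(np.abs(step_approximation(f, x, n_steps) - f(x)))
    print(n_steps, error)   # the maximum error shrinks as the number of steps grows
```

A network with ReLU activations builds piecewise linear segments rather than flat steps, but the principle is the same.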

Each node is delegated to one part of the piecewise function, and the purpose of the network is to activate the neurons responsible for that part of the feature space. For instance, if one were to classify images of men with or without beards, several nodes would be delegated specifically to pixel locations where beards often appear. Somewhere in multi-dimensional space, these nodes represent a numerical range.
Note, again, that the question “why do neural networks work” is still unanswered. The UAT doesn’t answer this question, but states that neural networks, under certain human interpretations, can model any function. The field of Explainable/Interpretable AI is emerging to answer these questions with methods like activation maximization and sensitivity analysis.
You can read a more in-depth explanation and view visualizations of the Universal Approximation Theorem here.
All four of these algorithms, and many others, look very simplistic at low dimensionality. A key realization in machine learning is that a lot of the 'magic' and 'intelligence' we purport to see in AI is really a simple algorithm hidden under the guise of high dimensionality.
Decision trees splitting regions into squares is simple, but decision trees splitting high-dimensional space into hypercubes is less so. An SVM performing a kernel trick to improve separability from one to two dimensions is understandable, but an SVM doing the same thing on a dataset hundreds of dimensions large is almost magic.
Our admiration of, and confusion about, machine learning is predicated on our lack of understanding of high-dimensional spaces. Learning how to get around high dimensionality and to understand algorithms in their native space is instrumental to an intuitive understanding.
All images created by author.
Translated from: https://towardsdatascience.com/the-aha-moments-in-4-popular-machine-learning-algorithms-f7e75ef5b317