Image Recognition Demystified

by gk_

Nothing in machine learning captivates the imagination quite like the ability to recognize images. Identifying imagery must connote “intelligence,” right? Let’s demystify.

The ability to “see,” when it comes to software, begins with the ability to classify. Classification is pattern matching with data. Images are data in the form of 2-dimensional matrices.

Image recognition is classifying data into one bucket out of many. This is useful work: you can classify an entire image or things within an image.

One of the classic and quite useful applications for image classification is optical character recognition (OCR): going from images of written language to structured text.

This can be done for any alphabet and a wide variety of writing styles.

Steps in the process

We’ll build code to recognize numerical digits in images and show how this works. This will take 3 steps:

gather and organize data to work with (85% of the effort)
build and test a predictive model (10% of the effort)
use the model to recognize images (5% of the effort)

Preparing the data is by far the largest part of our work, and this is true of most data science work. There’s a reason it’s called DATA science!

The building of our predictive model and its use in predicting values is all math. We’re using software to iterate through data, to iteratively forge “weights” within mathematical equations, and to work with data structures. The software isn’t “intelligent”; it works through mathematical equations to do narrow knowledge work, in this case recognizing images of digits.

In practice, most of what people label “AI” is really just software performing knowledge work.

Our predictive model and data

We’ll be using one of the simplest predictive models: the “k-nearest neighbors” or “kNN” classifier, first described by E. Fix and J.L. Hodges in 1951.

A simple explanation of this algorithm is here, and a video of its math is here. There is also a walkthrough here for those who want to build the algorithm from scratch.

Here’s how it works: imagine a graph of data points and circles capturing k points, with each value of k validated against your data.

The validation error for k in your data has a minimum which can be determined.

Given the ‘best’ value for k you can classify other points with some measure of precision.
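
To make the idea concrete, here is a minimal from-scratch sketch of kNN classification and of picking k by validation error. It is not the scikit-learn implementation used later in this article; the toy two-cluster data, the candidate k values, and the helper names (knn_predict, validation_error, make_cluster) are all assumptions made for illustration.

```python
from collections import Counter
import math
import random


def knn_predict(train_points, train_labels, query, k):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def validation_error(train_points, train_labels, val_points, val_labels, k):
    """Fraction of validation points that kNN with this k gets wrong."""
    wrong = sum(knn_predict(train_points, train_labels, p, k) != y
                for p, y in zip(val_points, val_labels))
    return wrong / len(val_points)


# Hypothetical toy data: two noisy 2-D clusters labeled "a" and "b".
rng = random.Random(0)


def make_cluster(cx, cy, label, n):
    return [((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label) for _ in range(n)]


samples = make_cluster(0, 0, "a", 60) + make_cluster(3, 3, "b", 60)
rng.shuffle(samples)
points = [p for p, _ in samples]
labels = [y for _, y in samples]

# Hold out the last 30 points for validation.
train_points, val_points = points[:90], points[90:]
train_labels, val_labels = labels[:90], labels[90:]

# Validate each candidate k and keep the one with the lowest error.
errors = {k: validation_error(train_points, train_labels,
                              val_points, val_labels, k)
          for k in (1, 3, 5, 7, 9)}
best_k = min(errors, key=errors.get)
print(errors, "best k:", best_k)
```

Note that there is no separate training step here: the "model" is just the stored training points, and all the work happens at prediction time.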

We’ll use scikit-learn’s kNN algorithm to avoid building the math ourselves. Conveniently, this library also provides our image data.

Let’s begin.

The code is here. We’re using an IPython notebook, which is a productive way of working on data science projects. The code syntax is Python and our example is borrowed from scikit-learn.

Start by importing the necessary libraries:
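
The original import cell is not preserved in this copy of the article. A plausible minimal set, assuming only NumPy and scikit-learn are needed for the steps that follow, would be:

```python
import numpy as np                        # array handling
from sklearn import datasets, neighbors   # bundled digits data and the kNN classifier
```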

Next we organize our data:
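
The data-preparation cell is also missing here, so the following is a sketch rather than the author's exact code. It assumes the scikit-learn digits dataset, a simple unshuffled split, and a 'fraction' of 0.85 (a guess that is consistent with the training count reported below; the exact test count depends on how the split is sliced and may differ by one).

```python
from sklearn import datasets

digits = datasets.load_digits()
X = digits.data      # shape (n_samples, 64): one flattened 8x8 image per row
y = digits.target    # the digit (0-9) that each image represents

fraction = 0.85      # share of the data used for training (assumed value)
n_training = int(fraction * len(X))

# The first part is used for training, the remainder is held back for testing.
X_train, y_train = X[:n_training], y[:n_training]
X_test, y_test = X[n_training:], y[n_training:]

print("training images: %d, test images: %d" % (len(X_train), len(X_test)))
```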

training images: 1527, test images: 269

You can change the fraction to have more or less test data. We’ll see shortly how this impacts our model’s accuracy.

By now you’re probably wondering: how are the digit images organized? They are arrays of values, one for each pixel in an 8x8 image. Let’s inspect one.

# one-dimension
[  0.   1.  13.  16.  15.   5.   0.   0.   0.   4.  16.   7.  14.  12.   0.   0.   0.   3.  12.   2.  11.  10.   0.   0.   0.   0.   0.   0.  14.   8.   0.   0.   0.   0.   0.   3.  16.   4.   0.   0.   0.   0.   1.  11.  13.   0.   0.   0.   0.   0.   9.  16.  14.  16.   7.   0.   0.   1.  16.  16.  15.  12.   5.   0.]

# two-dimensions
[[  0.   1.  13.  16.  15.   5.   0.   0.]
 [  0.   4.  16.   7.  14.  12.   0.   0.]
 [  0.   3.  12.   2.  11.  10.   0.   0.]
 [  0.   0.   0.   0.  14.   8.   0.   0.]
 [  0.   0.   0.   3.  16.   4.   0.   0.]
 [  0.   0.   1.  11.  13.   0.   0.   0.]
 [  0.   0.   9.  16.  14.  16.   7.   0.]
 [  0.   1.  16.  16.  15.  12.   5.   0.]]

The same image data is shown as a flat (one-dimensional) array and again as an 8x8 array of arrays (two-dimensional). Think of each row of the image as an array of 8 pixels; there are 8 rows. We could ignore the gray-scale (the values) and work with 0’s and 1’s, which would simplify the math a bit.
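
As a small illustration, the flat array above can be turned into its 8x8 form with NumPy's reshape, and reduced to 0's and 1's with a threshold. The threshold of 8 (half of the maximum pixel value of 16) is an arbitrary choice for this sketch, not something the article specifies.

```python
import numpy as np

# The 64 pixel values shown above, as one flat array.
flat = np.array([
    0, 1, 13, 16, 15, 5, 0, 0,
    0, 4, 16, 7, 14, 12, 0, 0,
    0, 3, 12, 2, 11, 10, 0, 0,
    0, 0, 0, 0, 14, 8, 0, 0,
    0, 0, 0, 3, 16, 4, 0, 0,
    0, 0, 1, 11, 13, 0, 0, 0,
    0, 0, 9, 16, 14, 16, 7, 0,
    0, 1, 16, 16, 15, 12, 5, 0,
], dtype=float)

image = flat.reshape(8, 8)          # 8 rows of 8 pixels each
binary = (image > 8).astype(int)    # 1 where the pixel is dark, else 0 (assumed threshold)

print(image)
print(binary)
```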

We can ‘plot’ this to see this array in its ‘pixelated’ form.
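
The plotting cell and its image are not preserved in this copy. As a stand-in that needs no plotting library, this sketch renders the same 8x8 values as text, mapping darker pixels to denser characters. The render helper and the character ramp are made up for this example.

```python
rows = [
    [0, 1, 13, 16, 15, 5, 0, 0],
    [0, 4, 16, 7, 14, 12, 0, 0],
    [0, 3, 12, 2, 11, 10, 0, 0],
    [0, 0, 0, 0, 14, 8, 0, 0],
    [0, 0, 0, 3, 16, 4, 0, 0],
    [0, 0, 1, 11, 13, 0, 0, 0],
    [0, 0, 9, 16, 14, 16, 7, 0],
    [0, 1, 16, 16, 15, 12, 5, 0],
]

SHADES = " .:#"  # lightest to darkest


def render(image, max_value=16):
    """Return the image as lines of text, one character per pixel."""
    lines = []
    for row in image:
        # Scale each pixel value (0..max_value) to an index into SHADES.
        chars = [SHADES[int(v) * len(SHADES) // (max_value + 1)] for v in row]
        lines.append("".join(chars))
    return "\n".join(lines)


print(render(rows))
```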

What digit is this? Let’s ask our model, but first we need to build it.
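
The model-building cell is missing from this copy, so here is a hedged sketch of what it might look like with scikit-learn's KNeighborsClassifier. It assumes the same unshuffled 0.85 split as before and the classifier's default of 5 neighbors; the author's actual settings are not shown in the text.

```python
from sklearn import datasets, neighbors

digits = datasets.load_digits()
X, y = digits.data, digits.target

fraction = 0.85  # assumed training share
n_training = int(fraction * len(X))
X_train, y_train = X[:n_training], y[:n_training]
X_test, y_test = X[n_training:], y[n_training:]

# "Fitting" a kNN model mostly means storing the labeled training images.
knn = neighbors.KNeighborsClassifier()  # default n_neighbors=5
knn.fit(X_train, y_train)

# score() returns the fraction of test images classified correctly.
score = knn.score(X_test, y_test)
print("KNN score: %f" % score)
```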

KNN score: 0.951852

Against our test data, our nearest-neighbor model had an accuracy score of about 95%, which is not bad. Go back and change the ‘fraction’ value to see how this impacts the score.
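
Asking the model about a single image is one call to predict. The sketch below rebuilds the model under the same assumptions as before (unshuffled 0.85 split, default neighbors) and passes in the 64 pixel values listed earlier; predict expects a 2-D input with one row per image, hence the reshape.

```python
import numpy as np
from sklearn import datasets, neighbors

digits = datasets.load_digits()
n_training = int(0.85 * len(digits.data))  # assumed training share
knn = neighbors.KNeighborsClassifier()
knn.fit(digits.data[:n_training], digits.target[:n_training])

# The flat 64-value image shown earlier in the article.
sample = np.array([
    0, 1, 13, 16, 15, 5, 0, 0,
    0, 4, 16, 7, 14, 12, 0, 0,
    0, 3, 12, 2, 11, 10, 0, 0,
    0, 0, 0, 0, 14, 8, 0, 0,
    0, 0, 0, 3, 16, 4, 0, 0,
    0, 0, 1, 11, 13, 0, 0, 0,
    0, 0, 9, 16, 14, 16, 7, 0,
    0, 1, 16, 16, 15, 12, 5, 0,
], dtype=float)

# One row per image: shape (1, 64).
prediction = knn.predict(sample.reshape(1, -1))
print(prediction)
```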

array([2])

The model predicts that the array shown above is a ‘2’, which looks correct.

Let’s try a few more. Remember, these are digits from our test data: we did not use these images to build our model (very important).

Not bad.

We can create a fictional digit and see what our model thinks about it.
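
The author's fictional digit is not preserved here. As an illustration only, this sketch draws a made-up 8x8 shape (a thick vertical bar), flattens it, and asks a model built under the same assumptions as before what it sees. The model always answers with one of the labels it was trained on, however odd the input.

```python
import numpy as np
from sklearn import datasets, neighbors

digits = datasets.load_digits()
n_training = int(0.85 * len(digits.data))  # assumed training share
knn = neighbors.KNeighborsClassifier()
knn.fit(digits.data[:n_training], digits.target[:n_training])

# A hypothetical hand-drawn image: a two-pixel-wide vertical bar.
fictional = np.zeros((8, 8))
fictional[:, 3:5] = 16.0

# Flatten to one row of 64 values, the shape the model was trained on.
guess = knn.predict(fictional.reshape(1, -1))
print(guess)
```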

If we had a collection of nonsensical digit images, we could add those to our training data with a non-numeric label: just another classification.

So how does image recognition work?

image data is organized: both training and test, with labels (X, y)

Training data is kept separate from test data, which also means we remove duplicates (or near-duplicates) between them.

a model is built using one of several mathematical models (kNN, logistic regression, convolutional neural network, etc.)

Which type of model you choose depends on your data and the type and complexity of the classification work.

new data is put into the model to generate a prediction

This is lightning fast: the result of a single mathematical calculation.

If you have a collection of pictures with and without cats, you can build a model to classify if a picture contains a cat. Notice you need training images that are devoid of any cats for this to work.

Of course you can apply multiple models to a picture and identify several things.

Large Data

A significant challenge in all of this is the size of each image. 8x8 is not a reasonable image size for anything but small digits, and it’s not uncommon to be dealing with 500x500 pixel images or larger. That’s 250,000 pixels per image, so 10,000 training images means doing math on 2.5 billion values to build a model. And the math isn’t just addition or multiplication: we’re multiplying matrices, multiplying by floating-point weights, and calculating derivatives. This is why processing power (and memory) is key in certain machine learning applications.
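
The arithmetic is easy to check. The memory estimate at the end assumes one 8-byte floating-point number per pixel value, which is an assumption for this sketch; real pipelines often use smaller types.

```python
pixels_per_image = 500 * 500                 # 250,000 pixels
training_images = 10_000
total_values = pixels_per_image * training_images

print(pixels_per_image)                      # 250000
print(total_values)                          # 2500000000

# Assumption: each value is stored as an 8-byte float.
bytes_needed = total_values * 8
print(bytes_needed / 1e9, "GB")              # 20.0 GB
```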

There are strategies to deal with this image size problem:

use graphics processing units (GPUs) to speed up the math
reduce images to smaller dimensions, without losing clarity
reduce colors to gray-scale and gradients (you can still see the cat)
look at sections of an image to find what you’re looking for
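
Three of these strategies can be sketched in a few lines of NumPy. The random 500x500 color image stands in for a real photo, the 10x reduction factor and the cropped region are arbitrary, and the gray-scale weights are the common ITU-R BT.601 luma coefficients; none of these choices come from the article.

```python
import numpy as np

# Hypothetical 500x500 RGB image with values 0..255.
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(500, 500, 3)).astype(float)

# Gray-scale: a weighted sum of the red, green and blue channels.
gray = rgb @ np.array([0.299, 0.587, 0.114])          # shape (500, 500)

# Smaller dimensions: average each non-overlapping 10x10 block.
factor = 10
small = gray.reshape(500 // factor, factor,
                     500 // factor, factor).mean(axis=(1, 3))  # shape (50, 50)

# A section of the image: a 100x100 window.
section = gray[100:200, 100:200]

print(rgb.size, gray.size, small.size, section.size)
```

Each step cuts the number of values the model has to process: from 750,000 for the color image to 250,000 in gray-scale, and to 2,500 after downsampling.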

The good news is once a model is built, no matter how laborious that was, the prediction is fast. Image processing is used in applications ranging from facial recognition to OCR to self-driving cars.

Now you understand the basics of how this works.
