by Wolfgang Beyer

How to Build a Simple Image Recognition System with TensorFlow (Part 2)

This is the second part of my introduction to building an image recognition system with TensorFlow. In the first part we built a softmax classifier to label images from the CIFAR-10 dataset. We achieved an accuracy of around 25–30%. Since there are 10 different and equally likely categories, labeling the images randomly we’d expect an accuracy of 10%. So we’re already a lot better than random, but there’s still plenty of room for improvement.

In this post, I’ll describe how to build a neural network that performs the same task. Let’s see by how much we can increase our prediction accuracy!

Neural Networks

Neural networks are very loosely based on how biological brains work. They consist of a number of artificial neurons which each process multiple incoming signals and return a single output signal. The output signal can then be used as an input signal for other neurons.

Let’s take a look at an individual neuron:

What happens in a single neuron is very similar to what happens in the softmax classifier. Again we have a vector of input values and a vector of weights. The weights are the neuron’s internal parameters. Both the input vector and the weights vector contain the same number of values, so we can use them to calculate a weighted sum.

So far, we’re doing exactly the same calculation as in the softmax classifier, but now comes a little twist: as long as the result of the weighted sum is a positive value, the neuron’s output is this value. But if the weighted sum is a negative value, we ignore that negative value and the neuron generates an output of 0 instead. This operation is called a Rectified Linear Unit (ReLU).
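
As a rough illustration of the neuron described above, here is a minimal plain-Python sketch. The names relu and neuron_output and the example numbers are made up for this illustration; the actual model uses TensorFlow operations instead.

```python
def relu(x):
    """Pass positive values through unchanged, replace negative values with 0."""
    return x if x > 0 else 0.0


def neuron_output(inputs, weights):
    """Weighted sum of the inputs, followed by the ReLU."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights))
    return relu(weighted_sum)


inputs = [1.0, 2.0, 3.0]
# Weighted sum is positive (about 0.3), so the neuron outputs it unchanged.
print(neuron_output(inputs, [0.5, -0.25, 0.1]))
# Weighted sum is negative (about -0.7), so the neuron outputs 0.
print(neuron_output(inputs, [-0.5, -0.25, 0.1]))
```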

The reason for using a ReLU is that this creates a nonlinearity. The neuron’s output is now not strictly a linear combination (= weighted sum) of its inputs anymore. We’ll see why this is useful when we stop looking at individual neurons and instead look at the whole network.

The neurons in artificial neural networks are usually not connected randomly to each other. Most of the time they are arranged in layers:

The input image’s pixel values are the inputs for the network’s first layer of neurons. The output of the neurons in layer 1 is the input for neurons of layer 2 and so forth. This is the reason why having a nonlinearity is so important. Without the ReLU at each layer, we would only have a sequence of weighted sums. And stacked weighted sums can be merged into a single weighted sum, so the multiple layers would give us no improvement over a single layer network. Introducing the ReLU nonlinearity solves this problem as each additional layer really adds something to the network.
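
To make the point about stacked weighted sums concrete, the following small sketch compares two stacked linear layers with a single merged layer, and then shows that inserting a ReLU changes the result. It is plain Python with made-up weights and helper names of my own choosing. For brevity each weight matrix is applied to a single input vector, with one row per output neuron.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) with a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def relu(vector):
    return [max(0.0, v) for v in vector]


W1 = [[1.0, -2.0], [0.5, 1.0]]  # layer 1: 2 inputs -> 2 outputs
W2 = [[1.0, 1.0]]               # layer 2: 2 inputs -> 1 output
x = [1.0, 2.0]

stacked_linear = matvec(W2, matvec(W1, x))    # two weighted sums in a row
merged_linear = matvec(matmul(W2, W1), x)     # one weighted sum with merged weights
with_relu = matvec(W2, relu(matvec(W1, x)))   # ReLU between the layers

print(stacked_linear, merged_linear)  # identical: the second layer added nothing
print(with_relu)                      # differs: the ReLU breaks the merge
```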

The outputs of the network’s final layer are the values we are interested in: the scores for the image categories. In this network architecture each neuron is connected to all neurons of the previous layer, which is why this kind of network is called a fully connected network. As we shall see in Part 3 of this tutorial, that is not necessarily always the case.

And that’s already the end of my very brief part on the theory of neural networks. Let’s get started building one!

The Code

The full code for this example is available on GitHub. It requires TensorFlow and the CIFAR-10 dataset (see Part 1 on how to install the prerequisites).

If you’ve made your way through my previous blog post, you’ll see that the code for the neural network classifier is pretty similar to the code for the softmax classifier. But in addition to switching out the part of the code that defines the model, I’ve added a couple of small features to show some of the things TensorFlow can do:

Regularization: this is a very common technique to prevent overfitting of a model. It works by applying a counter-force during the optimization process which aims to keep the model simple.
Visualization of the model with TensorBoard: TensorBoard is included with TensorFlow and allows you to generate charts and graphs from your models and from data generated by your models. This helps with analyzing your models and is especially useful for debugging.
Checkpoints: this feature allows you to save the current state of your model for later use. Training a model can take quite a while, so it’s essential to not have to start from scratch each time you want to use it.

The code is split into two files this time: there’s two_layer_fc.py, which defines the model, and run_fc_model.py, which runs the model (in case you’re wondering: ‘fc’ stands for fully connected).

2-Layer Fully Connected Neural Network

Let’s look at the model itself first and deal with running and training it later. two_layer_fc.py contains the following functions:

inference() gets us from input data to class scores.
loss() calculates the loss value from class scores.
training() performs a single training step.
evaluation() calculates the accuracy of the network.

Generating Class Scores: `inference()`

inference() describes the forward pass through the network. How are the class scores calculated, starting from input images?

The images parameter is the TensorFlow placeholder containing the actual image data. The next three parameters describe the shape/size of the network. image_pixels is the number of pixels per input image, classes is the number of different output labels and hidden_units is the number of neurons in the first/hidden layer of our network.

Each neuron takes all values from the previous layer as input and generates a single output value. Each neuron in the hidden layer therefore has image_pixels inputs and the layer as a whole generates hidden_units outputs. These are then fed into the classes neurons of the output layer which generate classes output values, one score per class.

reg_constant is the regularization constant. TensorFlow allows us to add regularization to our network very easily by handling most of the calculations automatically. I’ll go into a bit more detail when we get to the loss function.

Since our neural network has 2 similar layers, we’ll define a separate scope for each. This allows us to reuse variable names in each scope. The biases variable is defined in the way we already know, by using tf.Variable().

The definition of the weights variable is a bit more involved. We use tf.get_variable(), which allows us to add regularization. weights is a matrix with dimensions of image_pixels by hidden_units (input vector size x output vector size). The initializer parameter describes the weight variable’s initial values.

Up to now, we’ve initialized our variables to 0, but this wouldn’t work here. Think about the neurons in a single layer. They all receive exactly the same input values. If they all had the same internal parameters as well, they would all make the same calculation and all output the same value. To avoid this, we need to randomize their initial weights. We use an initialization scheme which usually works well: the weights are initialized to normally distributed values, values which are more than 2 standard deviations from the mean are dropped and redrawn, and the standard deviation is set to the inverse of the square root of the number of input pixels. Luckily TensorFlow handles all these details for us. We just need to specify that we want to use a truncated_normal_initializer, which does exactly what we want.
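
To illustrate the initialization scheme described above, here is a plain-Python sketch that does not use TensorFlow. The function names truncated_normal and init_weights, the seed and the layer size of 4 are my own choices for the example. I assume that an out-of-range value is simply redrawn, which is how I understand TensorFlow’s truncated normal to behave.

```python
import math
import random


def truncated_normal(stddev, rng):
    """Draw from N(0, stddev^2), redrawing values beyond 2 standard deviations."""
    while True:
        value = rng.gauss(0.0, stddev)
        if abs(value) <= 2.0 * stddev:
            return value


def init_weights(n_inputs, n_outputs, seed=0):
    """Return an n_inputs x n_outputs matrix with stddev = 1 / sqrt(n_inputs)."""
    rng = random.Random(seed)
    stddev = 1.0 / math.sqrt(n_inputs)
    return [[truncated_normal(stddev, rng) for _ in range(n_outputs)]
            for _ in range(n_inputs)]


# 3072 = 32 x 32 x 3 input values per CIFAR-10 image; 4 hidden units keeps the demo small.
weights = init_weights(n_inputs=3072, n_outputs=4)
print(len(weights), len(weights[0]))
```

Because every neuron starts with different weights, the neurons in a layer no longer compute identical outputs for the same input.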

The final parameter for the weights variable is the regularizer. All we have to do at this point is to tell TensorFlow we want to use L2-regularization for the weights variable. I’ll cover regularization in the section on overfitting and regularization below.

To create the first layer’s output we multiply the images matrix and the weights matrix with each other and add the biases variable. This is exactly the same as in the softmax classifier from the previous blog post. Then we apply the ReLU function, tf.nn.relu(), to arrive at the hidden layer’s output.

Layer 2 is very similar to layer 1. The number of inputs is hidden_units, the number of outputs is classes. Therefore the dimensions of the weights matrix are [hidden_units, classes]. Since this is the final layer of our network, there’s no need for a ReLU anymore. We arrive at the class scores (logits) by multiplying input (hidden) and weights with each other and adding bias.
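
The two layers can be sketched in plain Python for a single input vector. This is only an illustration with tiny made-up sizes and values, not the TensorFlow code from two_layer_fc.py, which processes a whole batch of images with matrix multiplications. The helper name dense is my own.

```python
def dense(inputs, weights, biases):
    """Weighted sums for one layer; weights[i][j] connects input i to output j."""
    return [sum(x * weights[i][j] for i, x in enumerate(inputs)) + biases[j]
            for j in range(len(biases))]


def inference(image, w1, b1, w2, b2):
    """Forward pass: layer 1 with ReLU, layer 2 without."""
    hidden = [max(0.0, h) for h in dense(image, w1, b1)]
    logits = dense(hidden, w2, b2)
    return logits


# Toy sizes: 2 "pixels", 3 hidden units, 2 classes.
image = [1.0, 2.0]
w1 = [[1.0, -1.0, 0.5],
      [0.0, -1.0, 0.5]]   # shape [image_pixels, hidden_units]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 0.0],
      [5.0, 5.0],
      [2.0, -2.0]]        # shape [hidden_units, classes]
b2 = [0.5, 0.0]

logits = inference(image, w1, b1, w2, b2)
print(logits)  # one score per class
```

The second hidden unit has a negative weighted sum, so the ReLU sets it to 0 and its large layer-2 weights have no effect on the scores.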

The summary operation tf.histogram_summary() allows us to record the value of the logits variable for later analysis with TensorBoard. I’ll cover this later.

To sum it up, the inference() function as a whole takes in input images and returns class scores. That’s all a trained classifier needs to do, but in order to arrive at a trained classifier, we first need to measure how good those class scores are. That’s the job of the loss function.

Calculating the Loss: `loss()`

First we calculate the cross-entropy between logits (the model’s output) and labels (the correct labels from the training dataset). That was our whole loss function for the softmax classifier, but this time we want to use regularization, so we have to add another term to our loss.

Let’s take a step back first and look at what we want to achieve by using regularization.

Overfitting and Regularization

When a statistical model captures the random noise in the data it was trained on instead of the true underlying relationship, this is called overfitting.

In the above image there are two different classes, represented by the blue and red circles. The green line is an overfitted classifier. It follows the training data perfectly, but it is also heavily dependent on it and is likely to handle unseen data worse than the black line, which represents a regularized model.

So our goal for regularization is to arrive at a simple model without any unnecessary complications. There are different ways to achieve this, and the option we are choosing is called L2-regularization. L2-regularization adds the sum of the squares of all the weights in the network to the loss function. This corresponds to a heavy penalty if the model is using big weights and a small penalty if the model is using small weights.

That’s why we used the regularizer parameter when defining the weights and assigned an l2_regularizer to it. This tells TensorFlow to keep track of the L2-regularization terms (and weight them by the parameter reg_constant) for this variable. All regularization terms are added to a collection called tf.GraphKeys.REGULARIZATION_LOSSES, which the loss function accesses. We then add the sum of all regularization losses to the previously calculated cross-entropy to arrive at the total loss of our model.
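
The structure of this loss can be sketched in plain Python as cross-entropy plus a weighted sum of squared weights. This is an illustration with made-up numbers and function names, not the TensorFlow implementation. In particular, libraries differ in constant factors (some include a factor of 1/2 in the L2 term), so absolute values may not match what TensorFlow reports.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one example; label is the correct class index."""
    max_logit = max(logits)  # subtracting the max keeps exp() numerically stable
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]


def l2_penalty(weight_matrices, reg_constant):
    """reg_constant times the sum of the squares of all weights."""
    return reg_constant * sum(w * w
                              for matrix in weight_matrices
                              for row in matrix
                              for w in row)


def total_loss(batch_logits, labels, weight_matrices, reg_constant):
    """Mean cross-entropy over the batch plus the regularization term."""
    data_loss = sum(cross_entropy(l, y)
                    for l, y in zip(batch_logits, labels)) / len(labels)
    return data_loss + l2_penalty(weight_matrices, reg_constant)


small_weights = [[[0.1, -0.1], [0.1, 0.1]]]
big_weights = [[[1.0, 2.0], [3.0, 4.0]]]
batch_logits, labels = [[0.0, 0.0]], [0]

print(total_loss(batch_logits, labels, small_weights, reg_constant=0.1))
print(total_loss(batch_logits, labels, big_weights, reg_constant=0.1))
```

With identical class scores, the model with big weights ends up with a much larger total loss, which is the counter-force that pushes the optimizer towards small weights.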

Optimizing the Variables: `training()`

global_step is a scalar variable which keeps track of how many training iterations have already been performed. When repeatedly running the model in our training loop, we already know this value. It’s the iteration variable of the loop. The reason we’re adding this value directly to the TensorFlow graph is that we want to be able to take snapshots of the model. And these snapshots should include information about how many training steps have already been performed.

The definition of the gradient descent optimizer is simple. We provide the learning rate and tell the optimizer which variable it is supposed to minimize. In addition, the optimizer automatically increments the global_step parameter with every iteration.
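
Conceptually, a single gradient descent step with a step counter looks like the following plain-Python sketch. The function training_step and the toy problem are invented for illustration. In the real code TensorFlow computes the gradients and increments global_step for us.

```python
def training_step(params, grads, learning_rate, global_step):
    """Apply one gradient descent update and count the step."""
    new_params = [p - learning_rate * g for p, g in zip(params, grads)]
    return new_params, global_step + 1


# Toy problem: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
params, global_step = [0.0], 0
for _ in range(100):
    grads = [2.0 * (params[0] - 3.0)]
    params, global_step = training_step(params, grads, 0.1, global_step)

print(global_step, params[0])  # 100 steps taken, w is now very close to 3
```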

Measuring Performance: `evaluation()`

The calculation of the model’s accuracy is the same as in the softmax case: we compare the model’s predictions with true labels and calculate the frequency of how often the prediction is correct. We’re also interested in how the accuracy evolves over time, so we’re adding a summary operation which keeps track of the value of accuracy. We’ll cover this in the section about TensorBoard.
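
In plain Python, the accuracy calculation could be sketched like this (the function name and the example scores are made up; the TensorFlow version works on tensors instead of lists):

```python
def accuracy(batch_logits, labels):
    """Fraction of examples whose highest-scoring class equals the true label."""
    predictions = [max(range(len(logits)), key=lambda c: logits[c])
                   for logits in batch_logits]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


batch_logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]
print(accuracy(batch_logits, labels))  # two of the three predictions are correct
```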

To summarize what we have done so far: we have defined the behavior of a 2-layer artificial neural network using 4 functions. inference() constitutes the forward pass through the network and returns class scores. loss() compares the predicted class scores with the true labels and generates a loss value. training() performs a training step and optimizes the model’s internal parameters, and evaluation() measures the performance of our model.

Running the Neural Network

Now that the neural network is defined, let’s look at how run_fc_model.py runs, trains and evaluates the model.

After the obligatory imports we’re defining the model parameters as external flags. TensorFlow has its own module for command line parameters, which is a thin wrapper around Python’s argparse. We’re using it here for convenience, but you can just as well use argparse directly instead.

In the first couple of lines, the various command line parameters are being defined. The parameters for each flag are the flag’s name, its default value and a short description. Executing the file with the -h flag displays these descriptions.

The second block of lines calls the function which actually parses the command line parameters. Then the values of all parameters are printed to the screen.
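
If you prefer to use argparse directly, the flag definitions might look roughly like the sketch below. The flag names and default values are taken from the parameter printout in the results section; the help texts and the build_parser function are my own wording, not copied from run_fc_model.py.

```python
import argparse


def build_parser():
    """Define the model parameters as command line flags."""
    parser = argparse.ArgumentParser(description='Train a 2-layer fully connected network.')
    parser.add_argument('--learning_rate', type=float, default=0.001,
                        help='Learning rate for the training.')
    parser.add_argument('--max_steps', type=int, default=2000,
                        help='Number of steps to run the trainer.')
    parser.add_argument('--hidden1', type=int, default=120,
                        help='Number of units in the hidden layer.')
    parser.add_argument('--batch_size', type=int, default=400,
                        help='Batch size. Must divide the dataset size evenly.')
    parser.add_argument('--train_dir', type=str, default='tf_logs',
                        help='Directory to put the training logs.')
    parser.add_argument('--reg_constant', type=float, default=0.1,
                        help='Regularization constant.')
    return parser


# An empty list means "use the defaults"; a real script would call parse_args()
# without arguments so that the actual command line is read.
flags = build_parser().parse_args([])

print('Parameters:')
for name, value in sorted(vars(flags).items()):
    print('{} = {}'.format(name, value))
```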

Here we define constants for the number of pixels per image (32 x 32 x 3) and the number of different image categories. Then we start measuring the runtime by creating a timer.

We want to log some info about the training process and use TensorBoard to display that info. TensorBoard requires the logs for each run to be in a separate directory, so we’re adding date and time info to the name of the log directory.
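
One simple way to build such a per-run directory name is sketched below. The helper name make_logdir and the exact timestamp format are assumptions made for this example, and the original script may format the name differently.

```python
import os
from datetime import datetime


def make_logdir(train_dir, now=None):
    """Return a log directory path that is unique per run (to the second)."""
    now = now or datetime.now()
    return os.path.join(train_dir, now.strftime('%Y%m%d-%H%M%S'))


print(make_logdir('tf_logs'))
```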

load_data() loads the CIFAR-10 data and returns a dictionary containing separate training and test datasets.

Generate the TensorFlow Graph

We’re defining TensorFlow placeholders. When performing the actual calculations, these will be filled with training/testing data.

The images_placeholder has dimensions of batch size x pixels per image. A batch size of ‘None’ allows us to run the graph with different batch sizes (the batch size for training the net can be set via a command line parameter, but for testing we’re passing the whole test set as a single batch).

The labels_placeholder is a vector of integer values containing the correct class label, one per image in the batch.

Here we’re referencing the functions we covered earlier in two_layer_fc.py.

inference() gets us from input data to class scores.
loss() calculates a loss value from class scores.
training() performs a single training step.
evaluation() calculates the accuracy of the network.

Defines a summary operation for TensorBoard (covered in the TensorBoard section below).

Generates a saver object to save the model’s state at checkpoints (covered later in this section).

We start the TensorFlow session and immediately initialize all variables. Then we create a summary writer which we will use to periodically save log information to disk.

These lines are responsible for generating batches of input data. Let’s pretend we have 100 training images and a batch size of 10. In the softmax example we just picked 10 random images for each iteration. This means that after 10 iterations each image will have been picked once on average(!). But in fact some images will have been picked multiple times while some images haven’t been part of any batch so far. As long as you repeat this often enough, it’s not that terrible that randomness causes some images to be part of the training batches somewhat more often than others.
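
The effect of purely random batches can be checked with a small simulation. This sketch uses only the standard library and a fixed seed of my choosing. I assume each batch contains 10 distinct images, which may differ slightly from how the softmax example sampled them.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the simulation is repeatable
counts = Counter()

# 10 iterations, each picking a random batch of 10 out of 100 images.
for _ in range(10):
    counts.update(rng.sample(range(100), 10))

never_picked = 100 - len(counts)
picked_more_than_once = sum(1 for c in counts.values() if c > 1)

print('picks in total:', sum(counts.values()))
print('images never picked:', never_picked)
print('images picked more than once:', picked_more_than_once)
```

Although each image is picked once on average, a sizeable share of the images never shows up in any of the 10 batches, while others appear several times.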

But this time we want to improve the sampling process. What we do is we first shuffle the 100 images of the training dataset. The first 10 images of the shuffled data are our first batch, the next 10 images are our second batch and so forth. After 10 batches we’re at the end of our dataset and the process starts again. We shuffle the data another time and run through it from front to back. This guarantees that no image is being picked more often than any other while still ensuring that the order in which the images are returned is random.

In order to achieve this, the gen_batch() function in data_helpers.py returns a Python generator, which returns the next batch each time it is evaluated. The details of how generators work are beyond the scope of this post. We’re using Python’s built-in zip() function to generate a list of tuples of the form [(image1, label1), (image2, label2), ...], which is then passed to our generator function.
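
A generator with this shuffle-then-slice behavior could be sketched as follows. This is my own simplified version for illustration, not the gen_batch() from the repository: the signature (in particular the rng parameter) and the placeholder data are assumptions.

```python
import random


def gen_batch(data, batch_size, rng=random):
    """Yield batches forever; every pass over the data uses a fresh shuffle."""
    data = list(data)
    while True:
        rng.shuffle(data)
        # Walk through the shuffled data from front to back, one batch at a time.
        for start in range(0, len(data) - batch_size + 1, batch_size):
            yield data[start:start + batch_size]


# Placeholder data standing in for 100 training images and their labels.
images = ['image{}'.format(i) for i in range(100)]
labels = list(range(100))

batches = gen_batch(zip(images, labels), batch_size=10, rng=random.Random(0))
first_epoch = [next(batches) for _ in range(10)]

# "Unzip" one batch into separate image and label sequences.
image_batch, label_batch = zip(*first_epoch[0])
print(len(image_batch), len(label_batch))
```

After 10 batches every image has been returned exactly once, and the next call to next(batches) starts a new, differently shuffled pass.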

next(batches) returns the next batch of data. Since it’s still in the form of [(imageA, labelA), (imageB, labelB), ...], we need to unzip it first to separate images from labels, before filling feed_dict, the dictionary containing the TensorFlow placeholders, with a single batch of training data.

Every 100 iterations the model’s current accuracy is evaluated and printed to the screen. In addition, the summary operation is run and its results are added to the summary_writer, which is responsible for writing the summaries to disk. From there they can be read and displayed by TensorBoard (see the TensorBoard section below).

This line runs the train_step operation (defined previously to call two_layer_fc.training(), which contains the actual instructions for the optimization of the variables).

When training a model takes a longer period of time, there is an easy way to save a snapshot of your progress. This allows you to come back later and restore the model in exactly the same state. All you need to do is to create a tf.train.Saver object (we did that earlier) and then call its save() method every time you want to take a snapshot.

Restoring a model is just as easy: just call the saver’s restore() method. There is a working code example showing how to do this in the file restore_model.py in the GitHub repository.

After the training is finished, the final model is evaluated on the test set (remember, the test set contains data that the model has not seen so far, allowing us to judge how well the model is able to generalize to new data).

Results

Let’s run the model with the default parameters via “python run_fc_model.py”. My output looks like this:

Parameters:
batch_size = 400
hidden1 = 120
learning_rate = 0.001
max_steps = 2000
reg_constant = 0.1
train_dir = tf_logs

Step 0, training accuracy 0.09
Step 100, training accuracy 0.2675
Step 200, training accuracy 0.3925
Step 300, training accuracy 0.41
Step 400, training accuracy 0.4075
Step 500, training accuracy 0.44
Step 600, training accuracy 0.455
Step 700, training accuracy 0.44
Step 800, training accuracy 0.48
Step 900, training accuracy 0.51
Saved checkpoint
Step 1000, training accuracy 0.4425
Step 1100, training accuracy 0.5075
Step 1200, training accuracy 0.4925
Step 1300, training accuracy 0.5025
Step 1400, training accuracy 0.5775
Step 1500, training accuracy 0.515
Step 1600, training accuracy 0.4925
Step 1700, training accuracy 0.56
Step 1800, training accuracy 0.5375
Step 1900, training accuracy 0.51
Saved checkpoint
Test accuracy 0.4633
Total time: 97.54s

We can see that the training accuracy starts at a level we would expect from guessing randomly (10 classes -> 10% chance of picking the correct one). Over roughly the first 1000 iterations the accuracy increases to around 50%, and it fluctuates around that value for the next 1000 iterations. The test accuracy of 46% is not much lower than the training accuracy, which indicates that our model is not significantly overfitted. The performance of the softmax classifier was around 30%, so 46% is a relative improvement of about 50%. Not bad!

Visualization with TensorBoard

TensorBoard allows you to visualize different aspects of your TensorFlow graphs and is very useful for debugging and improving your networks. Let’s look at the TensorBoard-related lines of code spread throughout the codebase.

In two_layer_fc.py we find the following:

Each of these three lines creates a summary operation. By defining a summary operation you tell TensorFlow that you are interested in collecting summary information from certain tensors (logits, loss and accuracy in our case). The other parameter for the summary operation is just a label you want to attach to the summary.

There are different kinds of summary operations. We’re using scalar_summary to record information about scalar (non-vector) values and histogram_summary to collect info about a distribution of multiple values (more info about the various summary operations can be found in the TensorFlow docs).

In run_fc_model.py the following lines are relevant for the TensorBoard visualization:

An operation in TensorFlow doesn’t run by itself: you need to either call it directly or call another operation which depends on it. Since we don’t want to call each summary operation individually each time we want to collect summary information, we’re using tf.merge_all_summaries to create a single operation which runs all our summaries.

During the initialization of the TensorFlow session we’re creating a summary writer. The summary writer is responsible for actually writing summary data to disk. In its constructor we supply logdir, the directory where we want the logs to be written. The optional graph argument tells TensorBoard to render a display of the whole TensorFlow graph.

Every 100 iterations we execute the merged summary operation and feed the results to the summary writer which writes them to disk.

To view the results we run TensorBoard via “tensorboard --logdir=tf_logs” and open localhost:6006 in a web browser. In the “Events”-tab we can see how the network’s loss decreases and how its accuracy increases over time.

The “Graphs”-tab shows a visualization of the TensorFlow graph we have defined. You can interactively rearrange it until you’re satisfied with how it looks. I think the following image shows the structure of our network pretty well.

In the “Distribution”- and “Histograms”-tabs you can explore the results of the tf.histogram_summary operation we attached to logits, but I won’t go into further details here. More info can be found in the relevant section of the official TensorFlow documentation.

Further Improvements

Maybe you’re thinking that training the softmax classifier took a lot less computation time than training the neural network. While that’s true, even if we kept training the softmax classifier for as long as it took the neural network to train, it wouldn’t reach the same performance. The longer you train a model, the smaller the additional gains get, and after a certain point the performance improvement is minuscule. We’ve reached this point with the neural network too. Additional training time would not improve the accuracy significantly anymore. There’s something else we could do though:

The default parameter values were chosen to work reasonably well, but there is some room for improvement left. By varying parameters such as the number of neurons in the hidden layer or the learning rate, we should be able to improve the model’s accuracy some more. A testing accuracy greater than 50% should definitely be possible with this model with some further optimization, although I would be very surprised if this model could be tuned to reach 65% or more. But there’s another type of network architecture for which such an accuracy is easily doable: convolutional neural networks. These are a class of neural networks which are not fully connected. Instead they try to make sense of local features in their input, which is very useful for analyzing images. It intuitively makes a lot of sense to take spatial information into account when looking at images. In Part 3 of this series we will see the principles of how convolutional neural networks work and build one ourselves.

Stay tuned for part 3 on convolutional neural networks and thanks a lot for reading! I’m happy about any feedback you might have!

You can also check out other articles I’ve written on my blog.