Neural Networks for Handwritten Digit Recognition, Multiclass
In this exercise, you will use a neural network to recognize the handwritten digits 0-9.
Outline
- 1 - Packages
- 2 - ReLU Activation
- 3 - Softmax Function
- Exercise 1
- 4 - Neural Networks
- 4.1 Problem Statement
- 4.2 Dataset
- 4.3 Model representation
- 4.4 Tensorflow Model Implementation
- 4.5 Softmax placement
- Exercise 2
1 - Packages
First, run the cell below to import all the packages that you will need during this assignment.
- numpy is the fundamental package for scientific computing with Python.
- matplotlib is a popular library to plot graphs in Python.
- tensorflow is a popular platform for machine learning.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.activations import linear, relu, sigmoid
%matplotlib widget
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')

import logging
logging.getLogger("tensorflow").setLevel(logging.ERROR)
tf.autograph.set_verbosity(0)

from public_tests import *
from autils import *
from lab_utils_softmax import plt_softmax
np.set_printoptions(precision=2)
2 - ReLU Activation
This week, a new activation was introduced, the Rectified Linear Unit (ReLU).
$$a = \max(0, z) \quad\quad \text{ReLU function}$$
plt_act_trio()
The example from the lecture on the right shows an application of the ReLU. In this example, the derived “awareness” feature is not binary but has a continuous range of values. The sigmoid is best for on/off or binary situations. The ReLU provides a continuous linear relationship. Additionally, it has an ‘off’ range where the output is zero.
The “off” feature makes the ReLU a non-linear activation. Why is this needed? It enables multiple units to contribute to the resulting function without interfering. This is examined more in the supporting optional lab.
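As a small illustration of the "off" range, the sketch below implements ReLU in NumPy and adds two units together. The shift of 2.0 for the second unit is a value made up for this example and is not taken from the lab. Each unit outputs zero until its input crosses its own threshold, so the second unit only changes the slope for z > 2 and leaves the earlier segment untouched.

```python
import numpy as np

def relu(z):
    """ReLU activation: element-wise max(0, z)."""
    return np.maximum(0, z)

z = np.array([-1.0, 0.0, 1.0, 2.0, 3.0])

# Two illustrative units (the 2.0 offset is an assumption for this sketch).
unit_1 = relu(z)          # switches on at z = 0
unit_2 = relu(z - 2.0)    # switches on at z = 2

# The sum is piecewise linear: slope 0, then slope 1, then slope 2.
combined = unit_1 + unit_2
print(combined)
```

Because each unit is exactly zero in its "off" range, adding more units of this kind can bend the curve at new points without disturbing the segments already built.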
3 - Softmax Function
A multiclass neural network generates N outputs. One output is selected as the predicted answer. In the output layer, a vector $\mathbf{z}$ is generated by a linear function and fed into a softmax function. The softmax function converts $\mathbf{z}$ into a probability distribution as described below. After applying softmax, each output will be between 0 and 1 and the outputs will sum to 1, so they can be interpreted as probabilities. Larger inputs to the softmax correspond to larger output probabilities.
The softmax function can be written:
$$a_j = \frac{e^{z_j}}{\sum_{k=0}^{N-1} e^{z_k}} \tag{1}$$
where $z = \mathbf{w} \cdot \mathbf{x} + b$ and $N$ is the number of features/categories in the output layer.
Exercise 1
Let’s create a NumPy implementation:
# UNQ_C1
# GRADED CELL: my_softmax

def my_softmax(z):
    """ Softmax converts a vector of values to a probability distribution.
    Args:
      z (ndarray (N,))  : input data, N features
    Returns:
      a (ndarray (N,))  : softmax of z
    """
    ### START CODE HERE ###
    ez = np.exp(z)
    a = ez/np.sum(ez)
    ### END CODE HERE ###
    return a

z = np.array([1., 2., 3., 4.])
a = my_softmax(z)
atf = tf.nn.softmax(z)
print(f"my_softmax(z):         {a}")
print(f"tensorflow softmax(z): {atf}")

# BEGIN UNIT TEST
test_my_softmax(my_softmax)
# END UNIT TEST
my_softmax(z):         [0.03 0.09 0.24 0.64]
tensorflow softmax(z): [0.03 0.09 0.24 0.64]
All tests passed.
Hints: One implementation uses a for loop to first build the denominator and then a second loop to calculate each output.
def my_softmax(z):
    N = len(z)
    a =                     # initialize a to zeros
    ez_sum =                # initialize sum to zero
    for k in range(N):      # loop over number of outputs
        ez_sum +=           # sum exp(z[k]) to build the shared denominator
    for j in range(N):      # loop over number of outputs again
        a[j] =              # divide the exp of each output by the denominator
    return(a)
Code:
def my_softmax(z):
    N = len(z)
    a = np.zeros(N)
    ez_sum = 0
    for k in range(N):
        ez_sum += np.exp(z[k])
    for j in range(N):
        a[j] = np.exp(z[j])/ez_sum
    return(a)
Or, a vector implementation:
def my_softmax(z):
    ez = np.exp(z)
    a = ez/np.sum(ez)
    return(a)
Below, vary the values of the z inputs. Note in particular how the exponential in the numerator magnifies small differences in the values. Note as well that the output values sum to one.
plt.close("all")
plt_softmax(my_softmax)
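For readers without the interactive widget, the magnifying effect of the exponential can also be seen numerically. The sketch below compares softmax with a plain proportional split (each value divided by the sum) on the same input used earlier, z = [1, 2, 3, 4]. The function name `softmax` and the "proportional split" comparison are choices made for this illustration.

```python
import numpy as np

def softmax(z):
    """Vectorized softmax, as in the exercise above."""
    ez = np.exp(z)
    return ez / np.sum(ez)

z = np.array([1.0, 2.0, 3.0, 4.0])

# Plain proportional split: differences between entries stay linear.
linear_share = z / np.sum(z)

# Softmax: the exponential stretches the gaps between entries.
softmax_share = softmax(z)

print("proportional:", np.round(linear_share, 2))
print("softmax:     ", np.round(softmax_share, 2))
print("softmax sum: ", softmax_share.sum())
```

The largest input receives 40% of the total under the proportional split but about 64% under softmax, while both sets of outputs still sum to one.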
4 - Neural Networks
In last week's assignment, you implemented a neural network to do binary classification. This week you will extend that to multiclass classification. This will utilize the softmax activation.
4.1 Problem Statement
In this exercise, you will use a neural network to recognize ten handwritten digits, 0-9. This is a multiclass classification task where one of n choices is selected. Automated handwritten digit recognition is widely used today - from recognizing zip codes (postal codes) on mail envelopes to recognizing amounts written on bank checks.
4.2 Dataset
You will start by loading the dataset for this task.
- The load_data() function shown below loads the data into variables X and y.
- The data set contains 5000 training examples of handwritten digits $^1$.
  - Each training example is a 20-pixel x 20-pixel grayscale image of the digit.
    - Each pixel is represented by a floating-point number indicating the grayscale intensity at that location.
    - The 20 by 20 grid of pixels is “unrolled” into a 400-dimensional vector.
    - Each training example becomes a single row in our data matrix X.
    - This gives us a 5000 x 400 matrix X where every row is a training example of a handwritten digit image.
$$X = \left(\begin{array}{cc} --- (x^{(1)}) --- \\ --- (x^{(2)}) --- \\ \vdots \\ --- (x^{(m)}) --- \end{array}\right)$$
- The second part of the training set is a 5000 x 1 dimensional vector y that contains labels for the training set.
  - y = 0 if the image is of the digit 0, y = 4 if the image is of the digit 4, and so on.
$^1$ This is a subset of the MNIST handwritten digit dataset (http://yann.lecun.com/exdb/mnist/)
# load dataset
X, y = load_data()
4.2.1 View the variables
Let’s get more familiar with your dataset.
- A good place to start is to print out each variable and see what it contains.
The code below prints the first element of X, followed by the first and last elements of y.
print ('The first element of X is: ', X[0])
The first element of X is:  [ 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
  0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
  ...
  0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00
  0.00e+00]
print ('The first element of y is: ', y[0,0])
print ('The last element of y is: ', y[-1,0])
The first element of y is: 0
The last element of y is: 9
4.2.2 Check the dimensions of your variables
Another way to get familiar with your data is to view its dimensions. Please print the shape of X and y and see how many training examples you have in your dataset.
print ('The shape of X is: ' + str(X.shape))
print ('The shape of y is: ' + str(y.shape))
The shape of X is: (5000, 400)
The shape of y is: (5000, 1)
4.2.3 Visualizing the Data
You will begin by visualizing a subset of the training set.
- In the cell below, the code randomly selects 64 rows from X, maps each row back to a 20 pixel by 20 pixel grayscale image and displays the images together.
- The label for each image is displayed above the image.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8,8, figsize=(5,5))
fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.91]) #[left, bottom, right, top]

#fig.tight_layout(pad=0.5)
widgvis(fig)
for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)

    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Display the label above the image
    ax.set_title(y[random_index,0])
    ax.set_axis_off()
fig.suptitle("Label, image", fontsize=14)
4.3 Model representation
The neural network you will use in this assignment is shown in the figure below.
- This has two dense layers with ReLU activations followed by an output layer with a linear activation.
  - Recall that our inputs are pixel values of digit images.
  - Since the images are of size $20\times20$, this gives us $400$ inputs.
- The parameters have dimensions that are sized for a neural network with $25$ units in layer 1, $15$ units in layer 2 and $10$ output units in layer 3, one for each digit.
  - Recall that the dimensions of these parameters are determined as follows:
    - If a network has $s_{in}$ units in a layer and $s_{out}$ units in the next layer, then
      - $W$ will be of dimension $s_{in} \times s_{out}$.
      - $b$ will be a vector with $s_{out}$ elements.
  - Therefore, the shapes of W and b are:
    - layer1: The shape of W1 is (400, 25) and the shape of b1 is (25,)
    - layer2: The shape of W2 is (25, 15) and the shape of b2 is (15,)
    - layer3: The shape of W3 is (15, 10) and the shape of b3 is (10,)
Note: The bias vector b could be represented as a 1-D (n,) or 2-D (n,1) array. Tensorflow utilizes a 1-D representation and this lab will maintain that convention.
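The shape rules above can be checked with a few lines of NumPy. The sketch below is not the lab's model: it builds placeholder weights (random values, zero biases, both assumptions made only for this illustration) with the layer sizes 400, 25, 15 and 10, and pushes one fake input row through the layers to show how the shapes line up.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [400, 25, 15, 10]   # inputs, layer 1, layer 2, layer 3

# One (W, b) pair per layer: W is (s_in, s_out), b is 1-D with s_out elements.
params = [
    (rng.normal(size=(s_in, s_out)), np.zeros(s_out))
    for s_in, s_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

x = rng.random((1, 400))          # a single fake "image" as one row
a = x
for i, (W, b) in enumerate(params, start=1):
    z = a @ W + b                 # (1, s_in) @ (s_in, s_out) -> (1, s_out)
    # ReLU on the two hidden layers, linear (no activation) on the output layer
    a = np.maximum(0, z) if i < len(params) else z
    print(f"layer{i}: W {W.shape}, b {b.shape}, output {a.shape}")
```

Storing b as a 1-D array works here because NumPy broadcasting adds it to every row of the (1, s_out) product.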
4.4 Tensorflow Model Implementation
Tensorflow models are built layer by layer. A layer’s input dimensions ($s_{in}$ above) are calculated for you. You specify a layer’s output dimensions and this determines the next layer’s input dimension. The input dimension of the first layer is derived from the size of the input data specified in the model.fit statement below.
Note: It is also possible to add an input layer that specifies the input dimension of the first layer. For example:
tf.keras.Input(shape=(400,)),    #specify input shape
We will include that here to illuminate some model sizing.
4.5 Softmax placement
As described in the lecture and the optional softmax lab, numerical stability is improved if the softmax is grouped with the loss function rather than the output layer during training. This has implications when building the model and using the model.
Building:
- The final Dense layer should use a ‘linear’ activation. This is effectively no activation.
- The model.compile statement will indicate this by including from_logits=True.
  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
- This does not impact the form of the target. In the case of SparseCategoricalCrossentropy, the target is the expected digit, 0-9.
Using the model:
- The outputs are not probabilities. If output probabilities are desired, apply a softmax function.
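To make the stability argument concrete, here is a minimal NumPy sketch of the idea, not TensorFlow's actual implementation. It computes the cross-entropy loss for one example in two ways: first by applying softmax and then taking the log, and second directly from the logits using the log-sum-exp form with the maximum subtracted. The function names and the extreme logit values are assumptions chosen to trigger the overflow.

```python
import numpy as np

def naive_loss(z, y):
    """Softmax first, then -log of the probability of the true class y."""
    a = np.exp(z) / np.sum(np.exp(z))
    return -np.log(a[y])

def stable_loss(z, y):
    """Same loss computed from logits: -(z_y - log(sum(exp(z)))),
    with max(z) subtracted first so the exponentials cannot overflow."""
    z_shift = z - np.max(z)
    log_sum = np.log(np.sum(np.exp(z_shift)))
    return -(z_shift[y] - log_sum)

# On moderate logits the two versions agree.
z_small = np.array([1.0, 2.0, 3.0])
print(naive_loss(z_small, 2), stable_loss(z_small, 2))

# On extreme logits the naive version overflows (exp(1000) is inf).
z_large = np.array([1000.0, 10.0, -5.0])
with np.errstate(over="ignore", invalid="ignore"):
    naive = naive_loss(z_large, 0)
stable = stable_loss(z_large, 0)
print(naive, stable)
```

This is the reason the model keeps a linear output layer and leaves the softmax to the loss: when the loss sees the raw logits, it is free to rearrange the computation in a form like the second function.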
Exercise 2
Below, use the Keras Sequential model and Dense layers with ReLU activations to construct the three-layer network described above.
# UNQ_C2
# GRADED CELL: Sequential model
tf.random.set_seed(1234) # for consistent results
model = Sequential(
    [
        ### START CODE HERE ###
        tf.keras.Input(shape=(400,)),
        Dense(25,activation='relu'),
        Dense(15,activation='relu'),
        Dense(10,activation='linear')
        ### END CODE HERE ###
    ], name = "my_model"
)
model.summary()
Model: "my_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 25) 10025
_________________________________________________________________
dense_1 (Dense) (None, 15) 390
_________________________________________________________________
dense_2 (Dense) (None, 10) 160
=================================================================
Total params: 10,575
Trainable params: 10,575
Non-trainable params: 0
_________________________________________________________________
Expected Output: The `model.summary()` function displays a useful summary of the model. Note that the names of the layers may vary, as they are auto-generated unless a name is specified.
Model: "my_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
L1 (Dense) (None, 25) 10025
_________________________________________________________________
L2 (Dense) (None, 15) 390
_________________________________________________________________
L3 (Dense) (None, 10) 160
=================================================================
Total params: 10,575
Trainable params: 10,575
Non-trainable params: 0
_________________________________________________________________
Hints:
tf.random.set_seed(1234)
model = Sequential(
    [
        ### START CODE HERE ###
        tf.keras.Input(shape=(400,)),
        Dense(25, activation='relu', name = "L1"),
        Dense(15, activation='relu', name = "L2"),
        Dense(10, activation='linear', name = "L3"),
        ### END CODE HERE ###
    ], name = "my_model"
)
# BEGIN UNIT TEST
test_model(model, 10, 400)
# END UNIT TEST
All tests passed!
The parameter counts shown in the summary correspond to the number of elements in the weight and bias arrays as shown below.
Let’s further examine the weights to verify that tensorflow produced the same dimensions as we calculated above.
[layer1, layer2, layer3] = model.layers
#### Examine Weights shapes
W1,b1 = layer1.get_weights()
W2,b2 = layer2.get_weights()
W3,b3 = layer3.get_weights()
print(f"W1 shape = {W1.shape}, b1 shape = {b1.shape}")
print(f"W2 shape = {W2.shape}, b2 shape = {b2.shape}")
print(f"W3 shape = {W3.shape}, b3 shape = {b3.shape}")
W1 shape = (400, 25), b1 shape = (25,)
W2 shape = (25, 15), b2 shape = (15,)
W3 shape = (15, 10), b3 shape = (10,)
Expected Output
W1 shape = (400, 25), b1 shape = (25,)
W2 shape = (25, 15), b2 shape = (15,)
W3 shape = (15, 10), b3 shape = (10,)
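The parameter counts reported by model.summary() follow directly from these shapes: each Dense layer holds s_in x s_out weights plus s_out biases. A short check of that arithmetic, using only the layer sizes given in this lab:

```python
layer_sizes = [400, 25, 15, 10]   # inputs, layer 1, layer 2, layer 3

# Per layer: weights (s_in * s_out) plus biases (s_out)
counts = [
    s_in * s_out + s_out
    for s_in, s_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

print(counts)        # per-layer parameter counts
print(sum(counts))   # total parameters
```

The three values match the Param # column of the summary (10025, 390 and 160), and their sum matches the reported total of 10,575.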
The following code:
- defines a loss function, SparseCategoricalCrossentropy, and indicates the softmax should be included with the loss calculation by adding from_logits=True
- defines an optimizer. A popular choice is Adaptive Moment (Adam), which was described in lecture.
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
)

history = model.fit(
    X,y,
    epochs=40
)
Epoch 1/40
157/157 [==============================] - 2s 1ms/step - loss: 1.7107
Epoch 2/40
157/157 [==============================] - 0s 1ms/step - loss: 0.7461
...
Epoch 40/40
157/157 [==============================] - 0s 847us/step - loss: 0.0329
Epochs and batches
In the fit statement above, the number of epochs was set to 40. This specifies that the entire data set should be applied during training 40 times. During training, you see output describing the progress of training that looks like this:
Epoch 1/40
157/157 [==============================] - 2s 1ms/step - loss: 1.7107
The first line, Epoch 1/40, describes which epoch the model is currently running. For efficiency, the training data set is broken into ‘batches’. The default size of a batch in Tensorflow is 32. There are 5000 examples in our data set, which gives 5000/32 = 156.25, rounded up to 157 batches. The notation on the 2nd line, 157/157 [====, describes which batch has been executed.
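The figure of 157 batches can be reproduced with a quick calculation. The sketch below uses the 5000 examples and the default batch size of 32 mentioned in the text; the variable names are chosen for this illustration.

```python
import math

m = 5000          # number of training examples
batch_size = 32   # default batch size mentioned above

# 5000 / 32 = 156.25, so a final, smaller batch is needed
batches_per_epoch = math.ceil(m / batch_size)

# Size of that final partial batch
last_batch_size = m - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch, last_batch_size)
```

So each epoch consists of 156 full batches of 32 examples and one last batch holding the remaining 8.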
Loss (cost)
In course 1, we learned to track the progress of gradient descent by monitoring the cost. Ideally, the cost will decrease as the number of iterations of the algorithm increases. Tensorflow refers to the cost as loss. Above, you saw the loss displayed each epoch as model.fit was executing. The .fit method returns a variety of metrics including the loss. This is captured in the history variable above. This can be used to examine the loss in a plot as shown below.
plot_loss_tf(history)
Prediction
To make a prediction, use Keras predict. Below, X[1015] contains an image of a two.
image_of_two = X[1015]
display_digit(image_of_two)

prediction = model.predict(image_of_two.reshape(1,400))  # prediction

print(f" predicting a Two: \n{prediction}")
print(f" Largest Prediction index: {np.argmax(prediction)}")
 predicting a Two:
[[ -8.45  -3.27   1.03  -2.2  -10.83  -9.65  -9.07  -2.18  -4.75  -6.29]]
 Largest Prediction index: 2
The largest output is prediction[2], indicating the predicted digit is a ‘2’. If the problem only requires a selection, that is sufficient. Use NumPy argmax to select it. If the problem requires a probability, a softmax is required:
prediction_p = tf.nn.softmax(prediction)

print(f" predicting a Two. Probability vector: \n{prediction_p}")
print(f"Total of predictions: {np.sum(prediction_p):0.3f}")
 predicting a Two. Probability vector:
[[6.92e-05 1.24e-02 9.12e-01 3.58e-02 6.41e-06 2.10e-05 3.74e-05 3.67e-02
  2.79e-03 6.01e-04]]
Total of predictions: 1.000
To return an integer representing the predicted target, you want the index of the largest probability. This is accomplished with the NumPy argmax function.
yhat = np.argmax(prediction_p)

print(f"np.argmax(prediction_p): {yhat}")
np.argmax(prediction_p): 2
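Softmax preserves the ordering of its inputs, so argmax gives the same index whether it is applied to the raw outputs or to the probabilities. The NumPy sketch below checks this using the rounded output values printed above for the image of a two; because those values are rounded to two decimals, the resulting probabilities are only approximately those shown in the lab output.

```python
import numpy as np

# Rounded model outputs (logits) copied from the printed prediction above
logits = np.array([-8.45, -3.27, 1.03, -2.2, -10.83,
                   -9.65, -9.07, -2.18, -4.75, -6.29])

# Softmax with the maximum subtracted first for numerical safety
shifted = logits - np.max(logits)
probs = np.exp(shifted) / np.sum(np.exp(shifted))

print("argmax of logits:       ", np.argmax(logits))
print("argmax of probabilities:", np.argmax(probs))
print("sum of probabilities:   ", round(float(probs.sum()), 3))
```

If only the predicted digit is needed, the softmax step can therefore be skipped; it is required only when the probabilities themselves are of interest.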
Let’s compare the predictions vs the labels for a random sample of 64 digits. This takes a moment to run.
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
# You do not need to modify anything in this cell

m, n = X.shape

fig, axes = plt.subplots(8,8, figsize=(5,5))
fig.tight_layout(pad=0.13,rect=[0, 0.03, 1, 0.91]) #[left, bottom, right, top]
widgvis(fig)
for i,ax in enumerate(axes.flat):
    # Select random indices
    random_index = np.random.randint(m)

    # Select rows corresponding to the random indices and
    # reshape the image
    X_random_reshaped = X[random_index].reshape((20,20)).T

    # Display the image
    ax.imshow(X_random_reshaped, cmap='gray')

    # Predict using the Neural Network
    prediction = model.predict(X[random_index].reshape(1,400))
    prediction_p = tf.nn.softmax(prediction)
    yhat = np.argmax(prediction_p)

    # Display the label above the image
    ax.set_title(f"{y[random_index,0]},{yhat}",fontsize=10)
    ax.set_axis_off()
fig.suptitle("Label, yhat", fontsize=14)
plt.show()
Let’s look at some of the errors.
Note: increasing the number of training epochs can eliminate the errors on this data set.
print( f"{display_errors(model,X,y)} errors out of {len(X)} images")
14 errors out of 5000 images
Congratulations!
You have successfully built and utilized a neural network to do multiclass classification.
PyTorch implementation of MNIST (handwritten digit) classification
import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import DataLoader , TensorDataset
from torch import nn
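Before building the model, let's look at two packages: SummaryWriter and torchvision.
SummaryWriter: According to the official documentation, it writes entries directly to event files in log_dir for consumption by TensorBoard. SummaryWriter provides a high-level API for creating an event file in a given directory and adding summaries and events to it. The class updates the file contents asynchronously, which allows a training program to call methods that add data to the file directly from the training loop without slowing down training.
Put simply, it records data produced during training, such as the loss and the accuracy, so that they can be visualized.
torchvision: According to the official documentation, torchvision is a library for building computer vision models and loading data. It includes datasets, model architectures, data transforms, and more. Here we use torchvision to download the MNIST dataset and torchvision.transforms to preprocess the data. In earlier experiments we built the dataset ourselves with TensorDataset.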
# Define the training device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Download the dataset
train_dataset = torchvision.datasets.MNIST(root='./data',train=True,transform=torchvision.transforms.ToTensor(),download=True)
test_dataset = torchvision.datasets.MNIST(root='./data',train=False,transform=torchvision.transforms.ToTensor(),download=True)

# Dataset sizes
train_data_size = len(train_dataset)
test_data_size = len(test_dataset)
train=True selects the training split, and train=False selects the test split. transform converts the data to the Tensor type.
Next, let's look at the shape of the dataset.
print(train_dataset.data.shape)
print(train_dataset.targets.shape)
torch.Size([60000, 28, 28])
torch.Size([60000])
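From the output we can see that the MNIST training set has 60000 samples, each with 28*28 pixels. After the ToTensor transform, each pixel value lies between 0 and 1. Next, we use DataLoader to load the dataset.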
# Load the training and test sets.
train_dataloader_in = DataLoader(train_dataset,64)
test_dataloader = DataLoader(test_dataset,64)
print(train_dataloader_in.dataset.data.shape)
print(train_dataloader_in.dataset.targets.shape)
torch.Size([60000, 28, 28])
torch.Size([60000])
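The dataset used here differs from the one above: it consists of 60000 images of 28 x 28 pixels. When MNIST is imported this way, the dataset is already fully set up at this step. Because our model expects 2-D input, we reshape the data below to (-1, 28 * 28). The -1 is computed automatically from the data. Note that reading .data directly bypasses the ToTensor transform, so the pixel values used below are the raw 0-255 intensities.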
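The -1 rule works the same way in NumPy as in PyTorch, so it can be illustrated without downloading MNIST. The sketch below uses a small stand-in array of 6 random "images" (an assumption for this example; the real tensor has 60000). It also shows an optional scaling to the 0-1 range, a step that the code in this post does not perform and that is included here only as a common preprocessing choice.

```python
import numpy as np

# Small stand-in for the MNIST image tensor: 6 fake 28x28 images of uint8 pixels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(6, 28, 28), dtype=np.uint8)

# -1 asks the library to infer that dimension: 6 * 28 * 28 / 784 = 6 rows
flat = images.reshape(-1, 28 * 28)

# Optional (not done in the post): map raw 0-255 intensities to 0-1
scaled = flat.astype(np.float32) / 255.0

print(flat.shape)
print(scaled.min(), scaled.max())
```

Each row of `flat` is one image unrolled into a 784-dimensional vector, which is the 2-D layout that the first Linear layer expects.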
X = train_dataloader_in.dataset.data.reshape(-1,28*28)
y = train_dataloader_in.dataset.targets
Xt = test_dataloader.dataset.data.reshape(-1,28*28)
yt = test_dataloader.dataset.targets
# Change the data types to avoid errors when feeding the data to the network
X = torch.tensor(X,dtype=torch.float32)
y = torch.tensor(y,dtype=torch.long)
Xt = torch.tensor(Xt,dtype=torch.float32)
yt = torch.tensor(yt,dtype=torch.long)
C:\Users\10766\AppData\Local\Temp\ipykernel_20736\1033584978.py:7: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  X = torch.tensor(X,dtype=torch.float32)
C:\Users\10766\AppData\Local\Temp\ipykernel_20736\1033584978.py:8: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  y = torch.tensor(y,dtype=torch.long)
C:\Users\10766\AppData\Local\Temp\ipykernel_20736\1033584978.py:9: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  Xt = torch.tensor(Xt,dtype=torch.float32)
C:\Users\10766\AppData\Local\Temp\ipykernel_20736\1033584978.py:10: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  yt = torch.tensor(yt,dtype=torch.long)
print(X.shape)
print(y.shape)
torch.Size([60000, 784])
torch.Size([60000])
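We use TensorDataset and DataLoader to build the dataset. As mentioned before:
TensorDataset: packs tensors together into one dataset (the data must be Tensors), similar to Python's zip.
DataLoader: feeds the data to the model in batches.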
train_dataset_re = TensorDataset(X,y)
train_dataloader = DataLoader(train_dataset_re,batch_size=64,shuffle=False)

test_dataloader_re = TensorDataset(Xt,yt)
test_dataloader = DataLoader(test_dataloader_re,batch_size=64,shuffle=False)
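Conceptually, the pair of TensorDataset and DataLoader (with shuffle=False) keeps features and labels aligned and hands them out in consecutive slices. The NumPy sketch below mimics that behaviour; it is a simplified stand-in written for this explanation, not PyTorch's implementation, and `iterate_batches` is a hypothetical helper name.

```python
import numpy as np

def iterate_batches(features, labels, batch_size):
    """Yield aligned (features, labels) slices of at most batch_size rows, in order."""
    assert len(features) == len(labels)
    for start in range(0, len(features), batch_size):
        end = start + batch_size
        yield features[start:end], labels[start:end]

# Tiny demo data: 10 samples with 4 features each
X_demo = np.arange(10 * 4, dtype=np.float32).reshape(10, 4)
y_demo = np.arange(10)

batches = list(iterate_batches(X_demo, y_demo, batch_size=4))
for xb, yb in batches:
    print(xb.shape, yb)
```

With 10 samples and a batch size of 4, the last batch holds only 2 samples, which mirrors how a DataLoader handles a dataset whose size is not a multiple of the batch size (unless drop_last=True is set).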
Building the network model
Note: TensorBoard is launched from the terminal:
tensorboard --logdir=path
class MinistNet(nn.Module):
    def __init__(self):
        super(MinistNet,self).__init__()
        self.model = nn.Sequential(
            nn.Linear(784,25),
            nn.ReLU(),
            nn.Linear(25,15),
            nn.ReLU(),
            nn.Linear(15,10)
        )

    def forward(self,x):
        x = self.model(x)
        return x
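The network model is now defined. Note how SummaryWriter is used in the code below: it is called when training starts, after gradient updates, and after each test pass completes.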
# Use a separate name for the instance so the MinistNet class stays available
# for re-creating the model when the saved weights are loaded later
net = MinistNet()
net = net.to(device)

# Loss function
loss_fn = nn.CrossEntropyLoss()
loss_fn = loss_fn.to(device)

# Optimizer
learning_rate = 1e-3
optimizer = torch.optim.Adam(net.parameters(),lr=learning_rate)

# Step counters
total_train_step = 0
total_test_step = 0

# Number of epochs
epoch = 10

# Add TensorBoard
writer = SummaryWriter("./logs_train")

for i in range(epoch):
    print("-------- Epoch {} training started --------".format(i+1))
    # Training phase
    net.train()
    for data in train_dataloader:
        imgs,targets = data
        imgs = imgs.to(device)
        targets = targets.to(device)
        outputs = net(imgs)
        loss = loss_fn(outputs,targets)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_train_step += 1
        if total_train_step % 100 == 0:
            print("Training step: {}, loss: {}".format(total_train_step,loss.item()))
            writer.add_scalar("train_loss",loss.item(),total_train_step)
            writer.flush()

    # Test phase
    net.eval()
    # Test loss and accuracy
    total_test_loss = 0
    total_accuracy = 0
    with torch.no_grad():
        # Same as the training phase, but on the test set
        for data in test_dataloader:
            imgs,targets = data
            imgs = imgs.to(device)
            targets = targets.to(device)
            outputs = net(imgs)
            # Compute the loss
            loss = loss_fn(outputs,targets)
            # Accumulate the total loss
            total_test_loss += loss.item()
            # Number of correct predictions
            accuracy = (outputs.argmax(1) == targets).sum()
            total_accuracy += accuracy
    print("Loss on the full test set: {}".format(total_test_loss))
    print("Accuracy on the full test set: {}".format(total_accuracy/test_data_size))
    writer.add_scalar("test_loss",total_test_loss,total_train_step)
    writer.add_scalar("test_accuracy",total_accuracy/test_data_size,total_train_step)
    total_test_step += 1
    if i == 5:
        torch.save(net.state_dict(),"model_dict{}.pth".format(i+1))
        print("Model saved")

# Don't forget to close the writer
writer.close()
-------- Epoch 1 training started --------
Training step: 100, loss: 1.4312055110931396
Training step: 200, loss: 1.2895567417144775
Training step: 300, loss: 1.1959021091461182
...
Training step: 8900, loss: 0.1826225370168686
Training step: 9000, loss: 0.06963329017162323
Training step: 9100, loss: 0.07875239849090576
Training step: 9200, loss: 0.10090328007936478
Training step: 9300, loss: 0.22011485695838928
Loss on the full test set: 42.02128033316694
Accuracy on the full test set: 0.932699978351593
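The figures above show how the loss and accuracy recorded in TensorBoard change over training. From the curves we can judge how well the model is converging. Here the loss keeps decreasing and the accuracy keeps increasing, so the model is training well. Next, let's verify it on a sample.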
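The accuracy reported during testing comes from comparing the index of the largest output in each row with the true label. The NumPy sketch below reproduces that calculation on four made-up output rows (the numbers are invented for this illustration); in the PyTorch code the same steps are expressed with outputs.argmax(1) and a comparison against targets.

```python
import numpy as np

# Made-up raw outputs for 4 samples and 3 classes (illustrative values only)
outputs = np.array([[ 2.0, 0.1, -1.0],
                    [ 0.3, 1.5,  0.2],
                    [ 0.9, 0.8,  0.7],
                    [-0.5, 0.0,  2.2]])
targets = np.array([0, 1, 2, 2])

# Predicted class = index of the largest output in each row
predicted = outputs.argmax(axis=1)

# Count the matches and divide by the number of samples
correct = int((predicted == targets).sum())
accuracy = correct / len(targets)

print(predicted, correct, accuracy)
```

The third sample is misclassified (class 0 has the largest output but the label is 2), so 3 of the 4 predictions are correct.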
# Test
X_test = X[0]
print(X_test)
print(y[0])
tensor([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., ...0., 0., 0., 0.])
tensor(5)
model = MinistNet()
model.load_state_dict(torch.load("model_dict6.pth",map_location=torch.device("cpu")))
model.eval()

prediction = model(X[0].reshape(1,-1))
print("Predicted values:",prediction)
print("Predicted class:",prediction.argmax(dim=1))
print("True class:",y[0])
Predicted values: tensor([[17.8051,  2.4538, 11.3420, 31.7410,  3.3992, 40.8306,  5.0347, 23.3100,
          0.6284, 26.4911]], grad_fn=<AddmmBackward0>)
Predicted class: tensor([5])
True class: tensor(5)
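The prediction matches the true label, nice.
Congratulations, you have used PyTorch to classify the handwritten digits of the MNIST dataset!
If you know a better implementation or a more accurate, more concise explanation, feel free to discuss it in the comments. I hope this helps with your studies!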