How to build a convolutional neural network that recognizes sign language gestures
by Vagdevi Kommineni
Sign language has been a major boon for people who are hearing- and speech-impaired. But it serves its purpose only when the other person can understand it. It would therefore be really useful to have a system that could convert a hand-gesture image to the corresponding English letter. The aim of this post is to build such an American Sign Language recognition system.
Wikipedia defines ASL as follows:
American Sign Language (ASL) is a natural language that serves as the predominant sign language of Deaf communities in the United States and most of Anglophone Canada.
First, the data: it is really important to account for the diversity within each image class with respect to factors like lighting and zoom. The Kaggle ASL dataset has all such variants. Training on such data makes sure our model has pretty good knowledge of each class. So, let's work with the Kaggle data.
The dataset consists of images of hand gestures for each letter of the English alphabet. The images within a single class vary — zoomed versions, dim and bright lighting conditions, and so on. There are as many as 3000 images for each class. For simplicity, let us consider classifying only the "A", "B", and "C" images. Here are the links to the full code for training and testing.
We are going to build an AlexNet to achieve this classification task. Since we are training the CNN from scratch, make sure computational resources such as a GPU are available.
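Before kicking off training, it is worth confirming that a GPU is actually visible. Assuming a TensorFlow backend (typical for this version of Keras), a quick check:

import tensorflow
from tensorflow.python.client import device_lib

# list the devices TensorFlow can see; an empty list here means
# training would silently fall back to the much slower CPU
gpus = [d.name for d in device_lib.list_local_devices() if d.device_type == 'GPU']
print(gpus)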
We start by importing the necessary modules.
import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import os
import cv2
import random
import numpy as np
import keras
from random import shuffle
from keras.utils import np_utils
from shutil import unpack_archive

print("Imported Modules...")
Download the data zip file from Kaggle. Now, let us select the gesture images for A, B, and C and split the obtained data into training, validation, and test sets.
# data folder path
data_folder_path = "asl_data/new"
files = os.listdir(data_folder_path)

# shuffling the images in the folder
for i in range(10):
    shuffle(files)
print("Shuffled Data Files")

# dictionary to maintain numerical labels
class_dict = {"A": 0, "B": 1, "C": 2}

# dictionary to maintain per-class counts
class_count = {'A': 0, 'B': 0, 'C': 0}

# training lists
X = []
Y = []

# validation lists
X_val = []
Y_val = []

# testing lists
X_test = []
Y_test = []

for file_name in files:
    label = file_name[0]
    if label in class_dict:
        path = data_folder_path + '/' + file_name
        image = cv2.imread(path)
        resized_image = cv2.resize(image, (224, 224))
        # first 2000 images per class go to the training set
        if class_count[label] < 2000:
            class_count[label] += 1
            X.append(resized_image)
            Y.append(class_dict[label])
        # the next 750 go to the validation set
        elif class_count[label] < 2750:
            class_count[label] += 1
            X_val.append(resized_image)
            Y_val.append(class_dict[label])
        # the rest go to the test set
        else:
            X_test.append(resized_image)
            Y_test.append(class_dict[label])
Each image in the dataset is named according to a naming convention: the 34th image of class A is named "A_34.jpg". Hence, we consider only the first character of the file name and check whether it belongs to one of the desired classes.
Also, we split the images based on counts and store them in the X and Y lists — X for the images, and Y for the corresponding class labels. Here, counts refer to the number of images we wish to put in the training, validation, and test sets respectively. So, out of 3000 images per class, I have put 2000 images in the training set, 750 in the validation set, and the remaining 250 in the test set.
Some people prefer to split based on the total dataset (not per class, as we did here), but that doesn't guarantee all classes are represented well enough to be learned properly. The images are read and stored as NumPy arrays in the lists.
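For reference, scikit-learn can produce such a class-balanced split directly from the whole dataset. A minimal sketch (an alternative to the per-class counting above, not the author's code), run on the integer labels before one-hot encoding; the stratify option keeps the A/B/C proportions identical in every split:

from sklearn.model_selection import train_test_split

# hold out one third of the data, preserving class proportions
X_tr, X_rest, y_tr, y_rest = train_test_split(
    X, Y, test_size=1/3, stratify=Y, random_state=42)
# split the held-out third evenly into validation and test, still stratified
X_va, X_te, y_va, y_te = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)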
Now the label lists (the Y's) are encoded into numerical one-hot vectors by np_utils.to_categorical:

# one-hot encodings of the classes
Y = np_utils.to_categorical(Y)
Y_val = np_utils.to_categorical(Y_val)
Y_test = np_utils.to_categorical(Y_test)
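As a quick illustration, the three integer labels map to the rows of a 3×3 identity matrix:

from keras.utils import np_utils

print(np_utils.to_categorical([0, 1, 2]))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
# label 0 ("A") becomes [1, 0, 0], label 1 ("B") becomes [0, 1, 0], and so on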
Now, let us store these arrays in the form of .npy files. Basically, we create separate .npy files for the images and labels of each split.
# folder to hold the .npy files
npy_data_path = "Numpy_folder"
if not os.path.exists(npy_data_path):
    os.makedirs(npy_data_path)

np.save(npy_data_path + '/train_set.npy', X)
np.save(npy_data_path + '/train_classes.npy', Y)

np.save(npy_data_path + '/validation_set.npy', X_val)
np.save(npy_data_path + '/validation_classes.npy', Y_val)

np.save(npy_data_path + '/test_set.npy', X_test)
np.save(npy_data_path + '/test_classes.npy', Y_test)

print("Data pre-processing Success!")
Now that we have completed the data preprocessing part, let us take a look at the full data preprocessing code here:
# preprocess.py

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import os
import cv2
import random
import numpy as np
import keras
from random import shuffle
from keras.utils import np_utils
from shutil import unpack_archive

print("Imported Modules...")

# data folder path
data_folder_path = "asl_data/new"
files = os.listdir(data_folder_path)

# shuffling the images in the folder
for i in range(10):
    shuffle(files)
print("Shuffled Data Files")

# dictionary to maintain numerical labels
class_dict = {"A": 0, "B": 1, "C": 2}

# dictionary to maintain per-class counts
class_count = {'A': 0, 'B': 0, 'C': 0}

# training lists
X = []
Y = []

# validation lists
X_val = []
Y_val = []

# testing lists
X_test = []
Y_test = []

for file_name in files:
    label = file_name[0]
    if label in class_dict:
        path = data_folder_path + '/' + file_name
        image = cv2.imread(path)
        resized_image = cv2.resize(image, (224, 224))
        if class_count[label] < 2000:
            class_count[label] += 1
            X.append(resized_image)
            Y.append(class_dict[label])
        elif class_count[label] < 2750:
            class_count[label] += 1
            X_val.append(resized_image)
            Y_val.append(class_dict[label])
        else:
            X_test.append(resized_image)
            Y_test.append(class_dict[label])

# one-hot encodings of the classes
Y = np_utils.to_categorical(Y)
Y_val = np_utils.to_categorical(Y_val)
Y_test = np_utils.to_categorical(Y_test)

# folder to hold the .npy files
npy_data_path = "Numpy_folder"
if not os.path.exists(npy_data_path):
    os.makedirs(npy_data_path)

np.save(npy_data_path + '/train_set.npy', X)
np.save(npy_data_path + '/train_classes.npy', Y)
np.save(npy_data_path + '/validation_set.npy', X_val)
np.save(npy_data_path + '/validation_classes.npy', Y_val)
np.save(npy_data_path + '/test_set.npy', X_test)
np.save(npy_data_path + '/test_classes.npy', Y_test)

print("Data pre-processing Success!")
Now comes the training part! Let us start by importing the modules essential to constructing and training the CNN, AlexNet. This is done primarily with Keras.
# importing
import numpy as np  # needed for loading the .npy files below
import keras        # needed for the ModelCheckpoint callback below
from keras.optimizers import SGD
from keras.models import Sequential
from keras.preprocessing import image
from keras.layers.normalization import BatchNormalization
from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D

print("Imported Network Essentials")
Next, we load the images stored in the .npy files:
npy_data_path = "Numpy_folder"  # path where preprocess.py saved the .npy files

X_train = np.load(npy_data_path + "/train_set.npy")
Y_train = np.load(npy_data_path + "/train_classes.npy")

X_valid = np.load(npy_data_path + "/validation_set.npy")
Y_valid = np.load(npy_data_path + "/validation_classes.npy")

X_test = np.load(npy_data_path + "/test_set.npy")
Y_test = np.load(npy_data_path + "/test_classes.npy")
We then move on to defining the structure of our CNN. Assuming prior knowledge of the AlexNet architecture, here is the Keras code for it.
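As a quick sanity check on the layer shapes (standard convolution arithmetic, not from the original post): with 'valid' padding, the output width of a convolution is floor((input − kernel) / stride) + 1. The first convolution below therefore maps the 224×224×3 input to floor((224 − 11) / 4) + 1 = 54, giving a 54×54×96 volume, and the following 2×2, stride-2 max-pool halves that to 27×27×96.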
model = Sequential()

# 1st Convolutional Layer
model.add(Conv2D(filters=96, input_shape=(224, 224, 3), kernel_size=(11, 11),
                 strides=(4, 4), padding='valid'))
model.add(Activation('relu'))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
# Batch Normalisation before passing it to the next layer
model.add(BatchNormalization())

# 2nd Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(11, 11), strides=(1, 1), padding='valid'))
model.add(Activation('relu'))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

# 3rd Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3, 3), strides=(1, 1), padding='valid'))
model.add(Activation('relu'))
# Max Pooling
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

# Passing it to a dense layer
model.add(Flatten())
# 1st Dense Layer
model.add(Dense(4096, input_shape=(224*224*3,)))
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

# 2nd Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.6))
# Batch Normalisation
model.add(BatchNormalization())

# 3rd Dense Layer
model.add(Dense(1000))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.5))
# Batch Normalisation
model.add(BatchNormalization())

# Output Layer: one unit per class (A, B, C), matching the one-hot labels
model.add(Dense(3))
model.add(Activation('softmax'))

model.summary()
The Sequential model is a linear stack of layers. We add convolutional layers (which apply the filters), activation layers (for non-linearity), max-pooling layers (for computational efficiency), and batch-normalization layers (to standardize the inputs coming from the previous layer to the next), and this pattern is repeated five times.
The batch normalization layer was introduced by Ioffe and Szegedy in 2015. It addresses the vanishing gradient problem by standardizing the output of the previous layer, it speeds up training by reducing the number of required iterations, and it enables the training of deeper neural networks.
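Concretely, for each unit, batch normalization computes the mini-batch mean μ_B and variance σ_B² of its inputs and outputs

x̂ = (x − μ_B) / √(σ_B² + ε),    y = γ·x̂ + β,

where ε is a small constant for numerical stability and γ, β are learned scale and shift parameters (this is the standard formulation from the Ioffe and Szegedy paper).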
At last, three fully-connected dense layers, along with dropout (to avoid over-fitting), are added.
To get a summarized description of the model, use model.summary().
The following is the code for the compilation part of the model. We define the optimization method as SGD and set its parameters.
# Compile
sgd = SGD(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

checkpoint = keras.callbacks.ModelCheckpoint(
    "Checkpoint/weights.{epoch:02d}-{val_loss:.2f}.hdf5",
    monitor='val_loss', verbose=0, save_best_only=False,
    save_weights_only=False, mode='auto', period=1)
lr in SGD is the learning rate. Since this is a categorical classification task, we use categorical_crossentropy as the loss function in model.compile. We set the optimizer to sgd, the SGD object defined above, and set the evaluation metric to accuracy.
While using a GPU, its run may sometimes be interrupted. Using checkpoints is the best way to store the weights obtained up to the point of interruption, so that we can resume from them later. The first parameter sets the place to store them: the weights are saved as weights.{epoch:02d}-{val_loss:.2f}.hdf5 in the Checkpoint folder.
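If a run does get interrupted, resuming is simply a matter of loading the most recent checkpoint before training again. A minimal sketch (the checkpoint file name below is hypothetical; use whichever file actually exists on disk):

# restore the weights saved at the point of interruption
model.load_weights("Checkpoint/weights.10-0.05.hdf5")
# then call model.fit(...) again, exactly as before, to continue training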
Finally, we save the model architecture in JSON format and the weights in .h5 format. They are thus saved locally in the specified folders.
# serialize model to JSON
model_json = model.to_json()
with open("Weights_Full/model.json", "w") as json_file:
    json_file.write(model_json)

# serialize weights to HDF5
model.save_weights("Weights_Full/model_weights.h5")
print("Saved model to disk")
Let's look at the whole code for defining and training the network. Consider this as a separate file, training.py.
# training.py

import numpy as np
import keras
from keras.optimizers import SGD
from keras.models import Sequential
from keras.preprocessing import image
from keras.layers.normalization import BatchNormalization
from keras.layers import Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D

print("Imported Network Essentials")

# loading .npy dataset
npy_data_path = "Numpy_folder"
X_train = np.load(npy_data_path + "/train_set.npy")
Y_train = np.load(npy_data_path + "/train_classes.npy")
X_valid = np.load(npy_data_path + "/validation_set.npy")
Y_valid = np.load(npy_data_path + "/validation_classes.npy")
X_test = np.load(npy_data_path + "/test_set.npy")
Y_test = np.load(npy_data_path + "/test_classes.npy")
print(X_test.shape)  # sanity check on the loaded data

model = Sequential()
# 1st Convolutional Layer
model.add(Conv2D(filters=96, input_shape=(224, 224, 3), kernel_size=(11, 11),
                 strides=(4, 4), padding='valid'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
# Batch Normalisation before passing it to the next layer
model.add(BatchNormalization())

# 2nd Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(11, 11), strides=(1, 1), padding='valid'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

# 3rd Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 4th Convolutional Layer
model.add(Conv2D(filters=384, kernel_size=(3, 3), strides=(1, 1), padding='valid'))
model.add(Activation('relu'))
# Batch Normalisation
model.add(BatchNormalization())

# 5th Convolutional Layer
model.add(Conv2D(filters=256, kernel_size=(3, 3), strides=(1, 1), padding='valid'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2), padding='valid'))
# Batch Normalisation
model.add(BatchNormalization())

# Passing it to a dense layer
model.add(Flatten())
# 1st Dense Layer
model.add(Dense(4096, input_shape=(224*224*3,)))
model.add(Activation('relu'))
# Add Dropout to prevent overfitting
model.add(Dropout(0.4))
# Batch Normalisation
model.add(BatchNormalization())

# 2nd Dense Layer
model.add(Dense(4096))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.6))
# Batch Normalisation
model.add(BatchNormalization())

# 3rd Dense Layer
model.add(Dense(1000))
model.add(Activation('relu'))
# Add Dropout
model.add(Dropout(0.5))
# Batch Normalisation
model.add(BatchNormalization())

# Output Layer: one unit per class (A, B, C)
model.add(Dense(3))
model.add(Activation('softmax'))

model.summary()

# (4) Compile
sgd = SGD(lr=0.001)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
checkpoint = keras.callbacks.ModelCheckpoint(
    "Checkpoint/weights.{epoch:02d}-{val_loss:.2f}.hdf5",
    monitor='val_loss', verbose=0, save_best_only=False,
    save_weights_only=False, mode='auto', period=1)

# (5) Train: pixel values are scaled to [0, 1]; labels are already one-hot
model.fit(X_train / 255.0, Y_train, batch_size=32, epochs=50, verbose=1,
          validation_data=(X_valid / 255.0, Y_valid), shuffle=True,
          callbacks=[checkpoint])

# serialize model to JSON
model_json = model.to_json()
with open("Weights_Full/model.json", "w") as json_file:
    json_file.write(model_json)
# serialize weights to HDF5
model.save_weights("Weights_Full/model_weights.h5")
print("Saved model to disk")
When we run the training.py file, we see something like the following:
For example, consider the first epoch of the run shown (Epoch 1/12):
- it took 1852s to complete that epoch
- the training loss was 0.2441
- the training accuracy was 0.9098
- 0.0069 was the validation loss, and
- 0.9969 was the validation accuracy.
Based on these values, we can tell which epochs performed better, decide where to stop training, and tune the hyperparameter values accordingly.
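The "where to stop training" decision can also be automated with Keras's EarlyStopping callback. A minimal sketch (not part of the original post) that halts once the validation loss stops improving:

from keras.callbacks import EarlyStopping

# stop if val_loss has not improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1)

model.fit(X_train / 255.0, Y_train, batch_size=32, epochs=50, verbose=1,
          validation_data=(X_valid / 255.0, Y_valid), shuffle=True,
          callbacks=[checkpoint, early_stop])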
Now it’s time for testing!
# test.py

import warnings
warnings.filterwarnings("ignore", category=DeprecationWarning)

import numpy as np
from keras.preprocessing import image
from keras.models import model_from_json
from sklearn.metrics import accuracy_score

# dimensions of our images
image_size = 224

# load the model architecture in json format
with open('Model/model.json', 'r') as f:
    model = model_from_json(f.read())
model.summary()

# assign the learned weights (the second load overrides the first
# with the chosen checkpoint's weights)
model.load_weights('Model/model_weights.h5')
model.load_weights('Weights/weights.250-0.00.hdf5')

X_test = np.load("Numpy/test_set.npy")
Y_test = np.load("Numpy/test_classes.npy")

# scale pixels to [0, 1] to match the training preprocessing
Y_predict = model.predict(X_test / 255.0)
Y_predict = [np.argmax(r) for r in Y_predict]
Y_test = [np.argmax(r) for r in Y_test]

print("##################")
acc_score = accuracy_score(Y_test, Y_predict)
print("Accuracy: " + str(acc_score))
print("##################")
In the above code, we load the saved model architecture and the best weights. We also load the .npy files (the NumPy form of the test set) and run the prediction on this test set of images. In short, we load the saved model architecture and assign it the learned weights.
Now the approximator function, along with the learned coefficients (weights), is ready. We just need to test it by feeding it the test-set images and evaluating its performance on this set. One well-known evaluation metric is accuracy, which is given by accuracy_score from sklearn.metrics.
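Beyond aggregate accuracy, the same loaded model can classify a single photo. A minimal sketch assuming the preprocessing used in training (the file name here is hypothetical):

import cv2
import numpy as np

labels = ['A', 'B', 'C']                    # mirrors class_dict from preprocess.py
img = cv2.imread('my_gesture.jpg')          # read the image (BGR, as cv2 did in training)
img = cv2.resize(img, (224, 224)) / 255.0   # resize and scale exactly as in training
img = np.expand_dims(img, axis=0)           # add the batch dimension: (1, 224, 224, 3)
prediction = model.predict(img)
print("Predicted letter:", labels[int(np.argmax(prediction))])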
Thank you for reading! Happy learning! :)
Original article: https://www.freecodecamp.org/news/asl-using-alexnet-training-from-scratch-cfec9a8acf84/