by Bharath Raj
How to play Quidditch using the TensorFlow Object Detection API
Deep Learning never ceases to amaze me. It has had a profound impact on several domains, beating benchmarks left and right.
Image classification using convolutional neural networks (CNNs) is fairly easy today, especially with the advent of powerful front-end wrappers such as Keras with a TensorFlow back-end. But what if you want to identify more than one object in an image?
This problem is called “object localization and detection.” It is much more difficult than simple classification. In fact, until 2015, image localization using CNNs was very slow and inefficient. Check out this blog post by Dhruv to read about the history of object detection in Deep Learning, if you’re interested.
Sounds cool. But is it hard to code?
Worry not, TensorFlow’s Object Detection API comes to the rescue! They have done most of the heavy lifting for you. All you need to do is prepare the dataset and set some configurations. You can then train your model and use it for inference.
TensorFlow also provides pre-trained models, trained on the MS COCO, Kitti, or Open Images datasets. You can use them as-is if you just want standard object detection. The drawback is that they are pre-defined: they can only predict the classes defined by those datasets.
But, what if you wanted to detect something that’s not on the possible list of classes? That’s the purpose of this blog post. I will guide you through creating your own custom object detection program, using a fun example of Quidditch from the Harry Potter universe! (For all you Star Wars fans, here’s a similar blog post that you might like).
Getting started
Start by cloning my GitHub repository, found here. This will be your base directory. All the files referenced in this blog post are available in the repository.
Alternatively, you can clone the TensorFlow models repo. If you choose the latter, you only need the folders named “slim” and “object_detection,” so feel free to remove the rest. Don’t rename anything inside these folders (unless you’re sure it won’t mess with the code).
Dependencies
Assuming you have TensorFlow installed, you may need to install a few more dependencies, which you can do by executing the following in the base directory:
pip install -r requirements.txt
The API uses Protobufs to configure and train model parameters. We need to compile the Protobuf libraries before using them. First, you have to install the Protobuf Compiler using the below command:
sudo apt-get install protobuf-compiler
Now, you can compile the Protobuf libraries using the following command:
protoc object_detection/protos/*.proto --python_out=.
You need to append the path of your base directory, as well as your slim directory to your Python path variable. Note that you have to complete this step every time you open a new terminal. You can do so by executing the below command. Alternatively, you can add it to your ~/.bashrc file to automate the process.
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
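If you go the ~/.bashrc route, substitute the absolute path of your base directory for `pwd`, since the automatic expansion only makes sense in the terminal you run it from. The path below is just an example:

export PYTHONPATH=$PYTHONPATH:/home/user/base_directory:/home/user/base_directory/slim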
Preparing the inputs
My motive was pretty straightforward. I wanted to build a Quidditch Seeker using TensorFlow. Specifically, I wanted to write a program to locate the snitch at every frame.
But then, I decided to up the stakes. How about trying to identify all the moving pieces of equipment used in Quidditch?
We start by preparing the label_map.pbtxt file. This would contain all the target label names as well as an ID number for each label. Note that the label ID should start from 1. Here’s the content of the file that I used for my project.
item {
  id: 1
  name: 'snitch'
}
item {
  id: 2
  name: 'quaffle'
}
item {
  id: 3
  name: 'bludger'
}
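If you want to sanity-check the file, the API ships a small utility for loading label maps. The snippet below is a quick sketch using the TF1-era object_detection helpers; the path assumes the annotations folder layout described later, so verify both against your own checkout:

from object_detection.utils import label_map_util

label_map = label_map_util.load_labelmap('annotations/label_map.pbtxt')
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=3, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
print(category_index)  # expect entries for 'snitch', 'quaffle' and 'bludger'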
Now, it’s time to collect the dataset.
Fun! Or boring, depending on your taste, but it’s a mundane task all the same.
I collected the dataset by sampling all the frames from a Harry Potter video clip, using a small code snippet I wrote with the OpenCV framework. Once that was done, I used another code snippet to randomly sample 300 images from the dataset. The code snippets are available in utils.py in my GitHub repo if you would like to do the same.
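If you would rather roll your own than use utils.py, a minimal sketch of both steps with OpenCV could look like the following. The video filename and folder names are placeholders, not the ones used in the repo:

import os
import random
import cv2

def extract_frames(video_path, out_dir):
    # Dump every frame of the clip as a numbered JPEG.
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, '%d.jpg' % index), frame)
        index += 1
    cap.release()

def sample_images(frame_dir, sample_dir, k=300):
    # Move a random subset of k frames into the folder you will annotate.
    os.makedirs(sample_dir, exist_ok=True)
    frames = [f for f in os.listdir(frame_dir) if f.endswith('.jpg')]
    for name in random.sample(frames, k):
        os.rename(os.path.join(frame_dir, name), os.path.join(sample_dir, name))

extract_frames('quidditch_clip.mp4', 'frames')
sample_images('frames', 'images', k=300)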
You heard me right. Only 300 images. Yeah, my dataset wasn’t huge. That’s mainly because I can’t afford to annotate a lot of images. If you want, you can opt for paid services like Amazon Mechanical Turk to annotate your images.
Annotations
Every image localization task requires ground truth annotations. The annotations used here are XML files with 4 coordinates representing the location of the bounding box surrounding an object, and its label. We use the Pascal VOC format. A sample annotation would look like this:
<annotation>
  <filename>182.jpg</filename>
  <size>
    <width>1280</width>
    <height>586</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>bludger</name>
    <bndbox>
      <xmin>581</xmin>
      <ymin>106</ymin>
      <xmax>618</xmax>
      <ymax>142</ymax>
    </bndbox>
  </object>
  <object>
    <name>quaffle</name>
    <bndbox>
      <xmin>127</xmin>
      <ymin>406</ymin>
      <xmax>239</xmax>
      <ymax>526</ymax>
    </bndbox>
  </object>
</annotation>
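For reference, such a file can be read back with the Python standard library alone. This is only an illustrative parser, not the exact code used by create_tf_record.py:

import xml.etree.ElementTree as ET

def read_voc_annotation(xml_path):
    # Return the image filename and a list of (label, xmin, ymin, xmax, ymax) boxes.
    root = ET.parse(xml_path).getroot()
    filename = root.find('filename').text
    boxes = []
    for obj in root.findall('object'):
        label = obj.find('name').text
        bndbox = obj.find('bndbox')
        coords = tuple(int(bndbox.find(tag).text)
                       for tag in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((label,) + coords)
    return filename, boxes

print(read_voc_annotation('annotations/xmls/182.xml'))  # path is illustrative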
You might be thinking, “Do I really need to go through the pain of manually typing in annotations in XML files?” Absolutely not! There are tools which let you use a GUI to draw boxes over objects and annotate them. Fun! LabelImg is an excellent tool for Linux/Windows users. Alternatively, RectLabel is a good choice for Mac users.
A few footnotes before you start collecting your dataset:
Do not rename your image files after you annotate them. The code tries to look up an image using the file name specified inside your XML file (which LabelImg automatically fills in with the image file name). Also, make sure your image and XML files have the same name.
Make sure you resize the images to the desired size before you start annotating them. If you do so later on, the annotations will not make sense, and you will have to scale the annotation values inside the XMLs (a minimal resizing sketch follows this list).
LabelImg may output some extra elements to the XML file (such as <pose>, <truncated>, <path>). You do not need to remove those, as they won’t interfere with the code.
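As referenced in the footnote above, here is a minimal resizing sketch using OpenCV. The target resolution and folder names are placeholders; pick whatever size you actually want to annotate and train at:

import os
import cv2

def resize_images(src_dir, dst_dir, width=1280, height=720):
    # Resize every JPEG in src_dir to a fixed size before annotation.
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if not name.endswith('.jpg'):
            continue
        image = cv2.imread(os.path.join(src_dir, name))
        resized = cv2.resize(image, (width, height))
        cv2.imwrite(os.path.join(dst_dir, name), resized)

resize_images('raw_frames', 'images')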
In case you messed up anything, the utils.py file has some utility functions that can help you out. If you just want to give Quidditch a shot, you could download my annotated dataset instead. Both are available in my GitHub repository.
Lastly, create a text file named trainval.txt. It should contain the names of all your image/XML files (without extensions). For instance, if you have img1.jpg, img2.jpg and img1.xml, img2.xml in your dataset, your trainval.txt file should look like this:
img1
img2
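Rather than typing the names by hand, you can generate the file from your image folder. A small sketch, assuming the images/annotations layout described next:

import os

names = sorted(os.path.splitext(f)[0]
               for f in os.listdir('images') if f.endswith('.jpg'))
with open('trainval.txt', 'w') as out:
    out.write('\n'.join(names))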
Separate your dataset into two folders, namely images and annotations. Place the label_map.pbtxt and trainval.txt inside your annotations folder. Create a folder named xmls inside the annotations folder and place all your XMLs inside that. Your directory hierarchy should look something like this:
-base_directory
|-images
|-annotations
||-xmls
||-label_map.pbtxt
||-trainval.txt
The API accepts inputs in the TFRecords file format. Worry not, you can easily convert your current dataset into the required format with the help of a small utility function. Use the create_tf_record.py file provided in my repo to convert your dataset into TFRecords. You should execute the following command in your base directory:
python create_tf_record.py \
    --data_dir=`pwd` \
    --output_dir=`pwd`
You will find two files, train.record and val.record, after the program finishes its execution. The standard dataset split is 70% for training and 30% for validation. You can change the split fraction in the main() function of the file if needed.
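Purely to illustrate what that split amounts to, a shuffle-and-slice over the names in trainval.txt looks roughly like this; it is a sketch of the idea, not the exact code inside create_tf_record.py:

import random

with open('annotations/trainval.txt') as f:
    names = f.read().split()

random.shuffle(names)
split = int(0.7 * len(names))  # 70% train / 30% validation
train_names, val_names = names[:split], names[split:]
print(len(train_names), len(val_names))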
Training the model
Whew, that was a rather long process to get things ready. The end is near. We need to select a localization model to train. The problem is, there are so many options to choose from. Each varies in performance in terms of speed and accuracy. You have to choose the right model for the right job. If you wish to learn more about the trade-off, this paper is a good read.
In short, SSDs are fast but may fail to detect smaller objects with decent accuracy, whereas Faster RCNNs are relatively slower and larger, but have better accuracy.
The TensorFlow Object Detection API has provided us with a bunch of pre-trained models. It is highly recommended to initialize training using a pre-trained model. It can heavily reduce the training time.
Download one of these models, and extract the contents into your base directory. Since I was more focused on the accuracy, but also wanted a reasonable execution time, I chose the ResNet-50 version of the Faster RCNN model. After extraction, you will receive the model checkpoints, a frozen inference graph, and a pipeline.config file.
One last thing remains! You have to define the “training job” in the pipeline.config file. Place the file in the base directory. What really matters is the last few lines of the file — you only need to set the highlighted values to your respective file locations.
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
}
train_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "train.record"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  num_readers: 1
  tf_record_input_reader {
    input_path: "val.record"
  }
}
If you have experience in setting the best hyperparameters for your model, you may do so. The creators have given some rather brief guidelines here.
You’re all set to train your model now! Execute the below command to start the training job.
python object_detection/train.py \
    --logtostderr \
    --pipeline_config_path=pipeline.config \
    --train_dir=train
My laptop GPU couldn’t handle the model size (Nvidia 950M, 2 GB), so I had to run it on the CPU instead. It took around 7–13 seconds per step on my device. After about 10,000 excruciating steps, the model achieved pretty good accuracy. I stopped training after it reached 20,000 steps, solely because it had already taken two days.
You can resume training from a checkpoint by modifying the “fine_tune_checkpoint” attribute from model.ckpt to model.ckpt-xxxx, where xxxx represents the global step number of the saved checkpoint.
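For example, to resume from a checkpoint saved at step 10000 (the step number and path prefix are illustrative; point it at wherever your train directory wrote the checkpoint files), the line in pipeline.config would become:

fine_tune_checkpoint: "train/model.ckpt-10000"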
Exporting the model for inference
What’s the point of training the model if you can’t use it for object detection? API to the rescue again! But there’s a catch. Their inference module requires a frozen graph model as an input. Not to worry though: using the following command, you can export your trained model to a frozen graph model.
python object_detection/export_inference_graph.py \
    --input_type=image_tensor \
    --pipeline_config_path=pipeline.config \
    --trained_checkpoint_prefix=train/model.ckpt-xxxxx \
    --output_directory=output
Neat! You will obtain a file named frozen_inference_graph.pb, along with a bunch of checkpoint files.
You can find a file named inference.py in my GitHub repo. You can use it to test or run your object detection module. The code is pretty self explanatory, and is similar to the Object Detection Demo, presented by the creators. You can execute it by typing in the following command:
python object_detection/inference.py \
    --input_dir={PATH} \
    --output_dir={PATH} \
    --label_map={PATH} \
    --frozen_graph={PATH} \
    --num_output_classes={NUM}
Replace the highlighted characters {PATH} with the filename or path of the respective file/directory. Replace {NUM} with the number of objects you have defined for your model to detect (In my case, 3).
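For instance, with the directory layout used throughout this post, a filled-in call might look like the following; the input and output directories are placeholders you would create yourself:

python object_detection/inference.py \
    --input_dir=test_images \
    --output_dir=inference_output \
    --label_map=annotations/label_map.pbtxt \
    --frozen_graph=output/frozen_inference_graph.pb \
    --num_output_classes=3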
Results
Check out these videos to see its performance for yourself! The first video demonstrates the model’s capability to distinguish all three objects, whereas the second video flaunts its prowess as a seeker.
Pretty impressive I would say! It does have an issue with distinguishing heads from Quidditch objects. But considering the size of our dataset, the performance is pretty good.
Training it for too long led to massive over-fitting (it was no longer size invariant), even though it reduced some mistakes. You can overcome this by having a larger dataset.
Thank you for reading this article! Hit that clap button if you did! Hope it helped you create your own Object Detection program. If you have any questions, you can hit me up on LinkedIn or send me an email (bharathrajn98@gmail.com).
Original article: https://www.freecodecamp.org/news/how-to-play-quidditch-using-the-tensorflow-object-detection-api-b0742b99065d/