by Harini Janakiraman
Day 22: How to build an AI Game Bot using OpenAI Gym and Universe
Let's face it, AI is everywhere. A face-off battle is unfolding between Elon Musk and Mark Zuckerberg on the future of AI. Some demonize it. Others hold utopian views, claiming that AI could be almost God-like in helping humanity. Whichever way your views tilt, AI is here to stay.
“With artificial intelligence, we are summoning the demon.” — Elon Musk
“Fearing a rise of killer robots is like worrying about overpopulation on Mars.” — Andrew Ng
If you’re excited to dive right in and tinker with AI, then games are a great place to start. They have been the go-to testbed for AI. But before jumping in, here’s a little bit of history on how game programming has evolved through time.
The History of Game Programming
Game programmers used to use heuristic if-then-else type decisions to make educated guesses. We saw this in the earliest arcade videos games such as Pong and PacMan. This trend was the norm for a very long time. But game developers can only predict so many scenarios and edge cases so your bot doesn’t run in circles!
Game developers then tried to mimic how humans would play a game, and modeled human intelligence in a game bot.
The team at DeepMind did this by generalizing and modeling intelligence to solve any Atari game thrown at it. The game bot used deep learning neural networks that had no game-specific knowledge. It beat games based on the pixels it saw on screen and its knowledge of the game controls. However, parts of DeepMind's work are still not open-sourced, as Google uses it to beat the competition.
The Democratization of AI
To avoid concentrating the incredible power of AI in the hands of a few, Elon Musk co-founded OpenAI. It seeks to democratize AI by making it accessible to all. Today we shall explore OpenAI Gym and the recently released Universe, which is built on top of Gym.
OpenAI Gym provides a simple interface for interacting with and managing any arbitrary dynamic environment. OpenAI Universe is a platform that lets you build a bot and test it out.
There are thousands of environments. They range from classic Atari games, Minecraft, and Grand Theft Auto, to protein fold simulations that can cure cancer. You can create a bot and run it in any environment using only a few lines of Python code. This is too awesome not to try!
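As a taste of how few lines that really is, a minimal random agent looks something like the sketch below (CartPole-v0 is just a small stand-in here; any registered Gym environment id works):

import gym

env = gym.make('CartPole-v0')      # swap in any registered environment id
observation = env.reset()
for _ in range(1000):
    env.render()                                         # draw the environment
    action = env.action_space.sample()                   # pick a random action
    observation, reward, done, info = env.step(action)   # advance the simulation by one step
    if done:
        observation = env.reset()                        # start a new episode when this one ends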
Project (1 Hour)
We are going to build an AI Game Bot that uses the “Reinforcement Learning” technique. I'll explain that later. It will autonomously play and beat the Flash game Neon Race (you can select any game you want). We will build this game bot using OpenAI's Gym and Universe libraries.
Step 1: Installation
Ensure you have Python installed, or install it using Homebrew. You can download a dedicated Python IDE like PyCharm or iPython notebook. I like to keep it simple and use Sublime. Finally, install Gym, Universe and other required libraries using pip.
# Install Python using brew
brew install python3

# Install the required OpenAI libraries
pip3 install gym
pip3 install numpy incremental
brew install golang libjpeg-turbo
pip install universe
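If you want a quick sanity check that everything installed cleanly (an optional check on my part, not part of the original instructions), a one-liner like the following should print the Gym version without raising an import error:

python3 -c "import gym, universe; print(gym.__version__)"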
Everything in Universe (the environments) runs as containers inside Docker. In case you don’t have it already, install and run Docker from here.
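A simple way to confirm the Docker daemon is actually up (again, just an optional check of mine) is to list running containers; if the daemon is down, this command errors out instead of printing a (possibly empty) table:

docker ps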
Step 2: Code the Game Bot
The Game Bot is coded in Python, so we start by importing the only two dependencies needed: Gym and Universe.
import gym
import universe
For this Game Bot, let’s use my favorite childhood game, Neon Race Cars, as the test environment. You can find a complete list of other environment/games you can choose from here.
Universe lets you run as many environments as you want in parallel. But for this project, we will use only one.
env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1)  # creates a local Docker container
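If you do want to experiment with parallelism later, my understanding of the Universe API is that the remotes argument controls how many environment instances (and therefore Docker containers) get started, so something like the line below should return one observation, reward, and done flag per instance at each step:

env.configure(remotes=2)  # hedged: should start two local containers instead of one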
Reinforcement Learning
Now we add the game bot logic that uses the reinforcement learning technique. This technique observes the game’s previous state and reward (such as the pixels seen on the screen or the game score). It then comes up with an action to perform on the environment.
The goal is to make its next observation better (in our case, to maximize the game score). This action is chosen and performed by an agent (the Game Bot) with the intention of maximizing the score. It is then applied to the environment. The environment records the resulting state and reward based on whether the action was beneficial or not (did it win the game?).
Now we can retrieve the list of observations for each environment initialized using the env.reset() method.
observation_n = env.reset()
The observation here is an environment-specific object. It represents what was observed, such as the raw pixel data on the screen or the game status/score.
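If you are curious what that object looks like for a Universe (VNC-based) environment, a quick, hedged peek like the one below is handy. In my experience each entry is None while the remote container is still booting, and a dict (typically with a 'vision' pixel array and a 'text' field) once the game is live:

for ob in observation_n:
    if ob is None:
        print('environment is still starting up...')
    else:
        print(type(ob), list(ob.keys()))  # e.g. ['vision', 'text'] once the remote is ready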
The next step is to create a game agent using an infinite loop, which continuously performs some action based on the observation. In our bot, let's define a single action of repeatedly pressing the up arrow (silly bot! Feel free to evolve it into a more complex one…). The action here is defined by the event type (KeyEvent), the control key (Up Arrow), and setting it to true for every observation that the agent sees.
while True:
    action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
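If you want the bot to do more than hold the accelerator, the same action format extends naturally. The line below is a variation of my own that presses up and right together ('ArrowRight' is my assumption for the key name, following the same KeyEvent convention as 'ArrowUp'); it drops in as a replacement for the action line inside the loop:

    action_n = [[('KeyEvent', 'ArrowUp', True), ('KeyEvent', 'ArrowRight', True)]
                for ob in observation_n]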
We then use the env.step() method to apply that action and move the environment forward by one time step. This is a very basic implementation of reinforcement learning.
observation_n, reward_n, done_n, info = env.step(action_n)
The step method here returns four variables:
- observation_n: Observations of the environment
- reward_n: Whether your action was beneficial or not: +1/-1
- done_n: Indicates if the game is over or not: Yes/No
- info: Additional info such as performance and latency, for debugging purposes
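Those return values are what make the loop useful. As a small, hedged example of putting them to work, the lines below could sit right after the env.step() call inside the loop (total_reward is a variable of my own, initialized to 0 before the while loop):

    total_reward += sum(r or 0 for r in reward_n)   # accumulate the score; treat missing rewards as 0
    if any(done_n):
        print('episode finished; total reward so far:', total_reward)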
You can run this action simultaneously for all the environments in which you're training your bot. Use the env.render() method to display the environment and watch the bot play.
env.render()
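To see how the pieces above fit together, here is one way the whole bot could look as a single file. This is a sketch assembled only from the snippets in this post; the reward bookkeeping and the print statement are my additions:

import gym
import universe

env = gym.make('flashgames.NeonRace-v0')
env.configure(remotes=1)          # creates a local Docker container
observation_n = env.reset()

total_reward = 0
while True:
    # Our silly policy: hold the up arrow in every environment.
    action_n = [[('KeyEvent', 'ArrowUp', True)] for ob in observation_n]
    observation_n, reward_n, done_n, info = env.step(action_n)
    total_reward += sum(r or 0 for r in reward_n)    # treat missing rewards as 0
    if any(done_n):
        print('episode finished; total reward so far:', total_reward)
    env.render()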
Now you have the Game Bot ready to compete with the environment. The complete code for this basic bot, as well as an advanced version, is available in my GitHub repo here.
Step 3: Run the Game Bot
Now for the fun part: ensure Docker is running and run the bot. See it in action beating other cars or failing to do so. If it fails, keep tweaking your bot to make it beat intelligence!
python gamebot.py
Keep tinkering with AI and eventually you can unlock God Mode! #100DaysOfCode
If you enjoyed this, please clap 👏 so others can see it as well! Follow me on Twitter @HariniLabs or Medium to get the latest updates on other stories, or just to say hi :)
PS: Sign up for my newsletter here to be the first to get fresh new content. It's filled with a dose of inspiration from the world of #WomenInTech, and yes, men can sign up too!
Originally published at: https://www.freecodecamp.org/news/how-to-build-an-ai-game-bot-using-openai-gym-and-universe-f2eb9bfbb40a/