A Brief Introduction to Reinforcement Learning

by ADL

Reinforcement Learning is an aspect of Machine Learning where an agent learns to behave in an environment by performing certain actions and observing the rewards/results which it gets from those actions.

With the advancements in robotic arm manipulation, Google DeepMind's AlphaGo beating a professional Go player, and more recently the OpenAI team beating a professional DOTA player, the field of reinforcement learning has really exploded in recent years.

In this article, we’ll discuss:

  • What reinforcement learning is, and its nitty-gritty like rewards, tasks, etc.

  • The 3 categorizations of reinforcement learning

What is Reinforcement Learning?

Let’s start the explanation with an example — say there is a small baby who starts learning how to walk.

Let’s divide this example into two parts:

1. Baby starts walking and successfully reaches the couch

Since the couch is the end goal, the baby and the parents are happy.

So, the baby is happy and receives appreciation from her parents. It’s positive — the baby feels good (Positive Reward +n).

2. Baby starts walking and falls due to some obstacle in between and gets bruised

Ouch! The baby gets hurt and is in pain. It’s negative — the baby cries (Negative Reward -n).

That’s how we humans learn — by trial and error. Reinforcement learning is conceptually the same, but it is a computational approach to learning through actions.

Reinforcement Learning

Let’s suppose that our reinforcement learning agent is learning to play Mario as an example. The reinforcement learning process can be modeled as an iterative loop that works as follows:

  • The RL Agent receives state S0 from the environment, i.e. Mario

  • Based on that state S0, the RL agent takes an action A0, say — our RL agent moves right. Initially, this is random.

  • Now, the environment is in a new state S1 (a new frame from Mario or the game engine)

  • The environment gives some reward R1 to the RL agent. It probably gives a +1 because the agent is not dead yet.

This RL loop continues until we are dead or we reach our destination, and it continuously outputs a sequence of states, actions and rewards.

The basic aim of our RL agent is to maximize the reward.

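To make this loop concrete, here is a minimal sketch in Python. The tiny ToyEnv class and its reset()/step() interface are placeholders invented for illustration (loosely in the spirit of common RL environment APIs), not part of the article or of Mario itself:

    import random

    class ToyEnv:
        """A stand-in environment: the episode lasts 5 steps, +1 reward per step survived."""
        def reset(self):
            self.t = 0
            return self.t                           # initial state S0
        def step(self, action):
            self.t += 1
            done = self.t >= 5                      # episode ends after 5 steps
            return self.t, 1.0, done                # next state, reward, done flag

    def run_episode(env, actions=("left", "right")):
        state = env.reset()                         # agent receives state S0
        total_reward, done = 0.0, False
        while not done:
            action = random.choice(actions)         # initially, the action is random
            state, reward, done = env.step(action)  # environment returns S1, R1, done
            total_reward += reward                  # the agent's basic aim: maximize this
        return total_reward

    print(run_episode(ToyEnv()))                    # prints 5.0

Each pass through the while loop is one turn of the state, action, reward cycle described above.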

Reward Maximization

The RL agent basically works on a hypothesis of reward maximization. That’s why the agent should take the best possible action at each step in order to maximize the reward.

The cumulative reward at each time step, with the respective actions, is written as:
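
In standard notation, with G_t denoting the cumulative reward (the return) from time step t and T the final time step, this sum is:

    G_t = R_{t+1} + R_{t+2} + R_{t+3} + ... + R_T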

However, things don’t quite work this way when simply summing up all the rewards.

Let us understand this in detail:

Let us say our RL agent (a robotic mouse) is in a maze which contains cheese, electric shocks, and cats. The goal is to eat the maximum amount of cheese before being eaten by the cat or getting an electric shock.

It seems obvious to eat the cheese near us rather than the cheese close to the cat or the electric shock, because the closer we are to the electric shock or the cat, the greater the danger of being dead. As a result, the reward near the cat or the electric shock, even if it is bigger (more cheese), will be discounted. This is done because of the uncertainty factor.

It makes sense, right?

Discounting of rewards works like this:

We define a discount rate called gamma. It should be between 0 and 1. The larger the gamma, the smaller the discount and vice versa.

So, our cumulative expected (discounted) reward is:
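
In the same notation, each future reward is weighted by an increasing power of the discount rate γ:

    G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + ...  =  Σ_{k=0}^{∞} γ^k · R_{t+k+1},   with 0 ≤ γ ≤ 1

Rewards further in the future are multiplied by more factors of γ, so they count for less; this is the discounting the robotic mouse applies to the cheese near the cat.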

Tasks and their types in reinforcement learning

A task is a single instance of a reinforcement learning problem. We basically have two types of tasks: continuous and episodic.

Continuous tasks

These are the types of tasks that continue forever. For instance, an RL agent that does automated Forex/stock trading.

In this case, the agent has to learn how to choose the best actions while simultaneously interacting with the environment. There is no starting point or end state.

The RL agent has to keep running until we decide to manually stop it.

Episodic tasks

In this case, we have a starting point and an ending point called the terminal state. This creates an episode: a list of States (S), Actions (A), Rewards (R).

For example, consider a game of Counter-Strike, where we either shoot our opponents or get killed by them. We shoot all of them and complete the episode, or we are killed. So, there are only two ways for an episode to end.

Exploration and exploitation trade-off

There is an important concept in reinforcement learning called the exploration and exploitation trade-off. Exploration is all about finding more information about an environment, whereas exploitation is exploiting already known information to maximize the rewards.

Real-life example: say you go to the same restaurant every day. You are basically exploiting. But on the other hand, if you search for a new restaurant every time before going to any one of them, then that’s exploration. Exploration is very important in the search for future rewards, which might be higher than the nearby rewards.

In the above game, our robotic mouse can have a good amount of small cheese (+0.5 each). But at the top of the maze there is a big sum of cheese (+100). So, if we only focus on the nearest reward, our robotic mouse will never reach the big sum of cheese — it will just exploit.

But if the robotic mouse does a little bit of exploration, it can find the big reward i.e. the big cheese.

This is the basic concept of the exploration and exploitation trade-off.

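One common way to implement this trade-off (an illustration, not something the article prescribes) is an epsilon-greedy rule: with a small probability epsilon the agent explores a random action, and otherwise it exploits the action it currently estimates to be the best:

    import random

    def epsilon_greedy(q_values, epsilon=0.1):
        """Pick an action index: explore with probability epsilon, otherwise exploit."""
        if random.random() < epsilon:
            return random.randrange(len(q_values))                   # explore: try a random action
        return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit: best known action

    # Estimated values of three actions, e.g. grabbing nearby small cheese vs. heading for the big cheese.
    q_values = [0.5, 0.5, 100.0]
    print(epsilon_greedy(q_values))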

Approaches to Reinforcement Learning

Let us now understand the approaches to solving reinforcement learning problems. There are basically 3 approaches, but we will only cover the 2 major ones in this article:

1. Policy-based approach

In policy-based reinforcement learning, we have a policy which we need to optimize. The policy basically defines how the agent behaves:

We learn a policy function which helps us in mapping each state to the best action.

Going deeper into policies, we further divide them into two types:

  • Deterministic: a policy at a given state (s) will always return the same action (a). That is, it is pre-mapped as S = s → A = a.

  • Stochastic: it gives a probability distribution over the different actions, i.e. a stochastic policy → p(A = a | S = s).
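
As a rough sketch of the difference (the states "s1"/"s2" and the actions below are made up for illustration), a deterministic policy is a plain lookup from state to action, while a stochastic policy samples from p(A = a | S = s):

    import random

    # Deterministic policy: each state is pre-mapped to exactly one action.
    deterministic_policy = {"s1": "right", "s2": "jump"}

    def act_deterministic(state):
        return deterministic_policy[state]

    # Stochastic policy: a probability distribution over actions for each state.
    stochastic_policy = {
        "s1": {"right": 0.8, "left": 0.2},
        "s2": {"jump": 0.6, "right": 0.4},
    }

    def act_stochastic(state):
        actions, probs = zip(*stochastic_policy[state].items())
        return random.choices(actions, weights=probs, k=1)[0]

    print(act_deterministic("s1"), act_stochastic("s1"))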

2. Value-based approach

In value-based RL, the goal of the agent is to optimize the value function V(s), which is defined as the maximum expected future reward the agent will get at each state.

The value of each state is the total amount of the reward an RL agent can expect to collect over the future, from a particular state.

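In standard notation, the value of a state s is the expected discounted sum of the future rewards collected when starting from s:

    V(s) = E[ R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + ... | S_t = s ]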

The agent will use this value function to select which state to choose at each step. The agent will always take the state with the biggest value.

In the example below, we see that at each step we take the biggest value to achieve our goal: 1 → 3 → 4 → 6 and so on…
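
As a small sketch of that greedy rule (the state names and values below are made up, not taken from the article's figure):

    # Hypothetical values of the states the agent could move to next.
    state_values = {"s1": 1, "s3": 3, "s4": 4, "s6": 6}

    def pick_next_state(reachable_states):
        """Value-based control: always move to the reachable state with the biggest value."""
        return max(reachable_states, key=lambda s: state_values[s])

    print(pick_next_state(["s1", "s3"]))   # 's3' -- the agent follows the larger value at each step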

The game of Pong — An intuitive case study

Let us take a real-life example of playing Pong. This case study will just introduce you to the intuition of how reinforcement learning works. We will not get into details in this example, but in the next article we will certainly dig deeper.

Suppose we teach our RL agent to play the game of Pong.

Basically, we feed the game frames (new states) to the RL algorithm and let the algorithm decide whether to go up or down. This network is said to be a policy network, which we will discuss in our next article.

The method used to train this algorithm is called the policy gradient. We feed in frames from the game engine, the algorithm produces a random output which earns a reward, and this reward is fed back to the algorithm/network. This is an iterative process.
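
As a minimal sketch of what such a policy network might look like (the 80x80 frame size, the single linear layer, and the use of numpy are assumptions made here for illustration, not details from the article), one layer over the flattened frame can output the probability of moving up:

    import numpy as np

    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((80 * 80,))      # weights of one linear layer, learned during training

    def policy_network(frame):
        """Map a preprocessed 80x80 frame to P(move up); 1 - P(up) is P(move down)."""
        logit = W @ frame.reshape(-1)               # flatten the frame and apply the linear layer
        return 1.0 / (1.0 + np.exp(-logit))         # sigmoid -> probability of UP

    frame = rng.standard_normal((80, 80))           # stand-in for a real game frame
    p_up = policy_network(frame)
    action = "UP" if rng.random() < p_up else "DOWN"
    print(round(float(p_up), 3), action)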

We will discuss policy gradients in the next article in greater detail.

In the context of the game, the scoreboard acts as a reward or feedback for the agent. Whenever the agent scores +1, it understands that the action it took in that state was good enough.

Now we will train the agent to play Pong. To start, we feed in a bunch of game frames (states) to the network/algorithm and let the algorithm decide the action. The initial actions of the agent will obviously be bad, but our agent can sometimes be lucky enough to score a point, and this might be a purely random event. But due to this lucky random event, it receives a reward, and this helps the agent understand that the series of actions it took was good enough to fetch a reward.

So, in the future, the agent is more likely to take the actions that fetched a reward over the actions that did not. Intuitively, the RL agent is learning to play the game.

Limitations

During training, when the agent loses an episode, the algorithm will discard or lower the likelihood of taking the whole series of actions that occurred in that episode.

But if the agent was performing well from the start of the episode and lost the game only because of the last 2 actions, it does not make sense to discard all the actions. Rather, it makes sense to only penalize the last 2 actions which resulted in the loss.

This is called the credit assignment problem. It arises because of the sparse reward setting: instead of getting a reward at every step, we get the reward only at the end of the episode. So, it is on the agent to learn which actions were correct and which action actually led to losing the game.
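
To see why this is hard with a sparse reward, here is a small illustration (with made-up numbers): when the only feedback is a single -1 at the end of a lost episode, every action in the episode gets almost the same share of the blame, including the early actions that were actually fine:

    def discounted_returns(rewards, gamma=0.99):
        """Compute the return G_t for every time step of one episode."""
        returns = [0.0] * len(rewards)
        running = 0.0
        for t in reversed(range(len(rewards))):
            running = rewards[t] + gamma * running
            returns[t] = running
        return returns

    # An episode whose only feedback is a -1 at the very end (the game was lost).
    rewards = [0, 0, 0, 0, -1]
    print(discounted_returns(rewards))   # every earlier action is blamed, just slightly less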

So, due to this sparse reward setting in RL, the algorithm is very sample-inefficient. This means that a huge number of training examples have to be fed in to train the agent. But the fact is that sparse reward settings fail in many circumstances due to the complexity of the environment.

So, there is a technique called reward shaping which is used to solve this. But again, reward shaping also suffers from a limitation, as we need to design a custom reward function for every game.

Closing Note

Today, reinforcement learning is an exciting field of study. Major developments have been made in the field, of which deep reinforcement learning is one.

We will cover deep reinforcement learning in our upcoming articles. This article covers a lot of concepts, so please take your time to understand the basics of reinforcement learning.

But I would like to mention that reinforcement learning is not a secret black box. Whatever advancements we are seeing today in the field of reinforcement learning are the result of bright minds working day and night on specific applications.

Next time we’ll work on a Q-learning agent and also cover some more basic stuff in reinforcement learning.

Until then, enjoy AI…

Important: This article is the 1st part of the Deep Reinforcement Learning series. The complete series will be available both in text-readable form on Medium and in video explanatory form on my channel on YouTube.

For a deeper and more intuitive understanding of reinforcement learning, I would recommend that you watch the video below:

Subscribe to my YouTube channel for more AI videos: ADL.

If you liked my article, please click the clap button, as it keeps me motivated to write, and please follow me on Medium.

If you have any questions, please let me know in a comment below or on Twitter. Subscribe to my YouTube channel for more tech videos: ADL.

Translated from: https://www.freecodecamp.org/news/a-brief-introduction-to-reinforcement-learning-7799af5840db/
