by Tal Kol
How to write bulletproof code in Go: a workflow for servers that can’t fail
From time to time you may find yourself facing a daunting task: building a server that really isn’t allowed to fail, a project where the cost of error is extraordinarily high. What is the methodology for approaching such a task?
Does your server really need to be bulletproof?
Before diving into this excessive workflow, you should ask yourself — does my server really need to be bulletproof? There’s a lot of overhead involved in preparing for the worst, and it’s not always worth it.
If the cost of error isn’t extraordinarily high, a perfectly valid approach is to make a reasonable best effort for things to work, and if your server breaks, just deal with it. Monitoring tools today and modern workflows of continuous delivery allow us to spot problems in production quickly and fix them almost immediately. For many cases, this is good enough.
In the project I’m working on today, it isn’t. I’m working on implementing a blockchain — a distributed server infrastructure for executing code securely under consensus in a low trust environment. One of the applications of this technology is digital currencies. This is a textbook example where the cost of error is literally high. We naturally want its implementation to be as bulletproof as possible.
There are other cases though, even when not dealing with currencies, where bulletproof code makes sense. The cost of maintenance skyrockets quickly for a codebase that fails frequently. Being able to identify problems earlier in the development cycle, when the cost of fixing them is still low, has a good chance of paying back the upfront investment in a bulletproof methodology.
Is TDD the magic answer?
Test Driven Development (TDD) is often hailed as the silver bullet against malfunctioning code. It is a puristic development methodology where new code isn’t added unless it satisfies a failing test. This process guarantees test coverage of 100 percent and often gives the illusion that your code is tested against every possible scenario.
This isn’t the case. TDD is a great methodology that works well for some, but by itself it still isn’t enough. Even worse, TDD instills false confidence in code and may make developers lazy when considering paranoid edge cases. I’ll show a good example of this later on.
Tests are important: they are the key
It doesn’t matter if you write tests before the fact or after, using a technique like TDD or not. All that matters is that you have tests. Tests are the best line of defense for protecting your code against breaking in production.
Since we’re going to run our entire test suite very frequently — after every new line of code if possible — tests must be automated. No part of our confidence in our code can result from a manual QA process. Humans make mistakes. Human attention to detail deteriorates after doing the same mind-numbing task a hundred times in a row.
Tests must be fast. Blazingly fast.
If a test suite takes more than a few seconds to run, developers are likely going to become lazy, pushing code without running it. This is one of the great things about Go — it has one of the fastest toolchains out there. It compiles, rebuilds, and tests in seconds.
Tests are also important enablers for open-source projects. Blockchains, for example, are almost religiously open-source. The codebase must be open to establish trust — expose itself for audit and create a decentralized atmosphere where no single governing entity controls the project.
It is unreasonable to expect massive external contributions in an open-source project without a thorough test suite. External contributors need a quick way to check if their contribution breaks any existing behavior. The entire test suite, in fact, must run automatically on every pull request and fail automatically if the PR broke anything.
Full test coverage is a misleading metric, but it is important. It may feel excessive to reach 100% coverage, but when you think about it, it makes no sense to ship code to production that was never executed beforehand.
Full test coverage doesn’t necessarily mean that we have enough tests and it doesn’t mean that our tests are meaningful. What is certain is that if we don’t have 100% coverage, we don’t have enough to consider ourselves bulletproof, since parts of our code were never tested.
Nevertheless, there is such a thing as too many tests. Ideally, every bug we encounter should break a single test. If we have redundant tests — different tests that check the same thing — modifying existing code and breaking existing behavior in the process will incur too much overhead in fixing failed tests.
Why is Go a great choice for bulletproof code?
Go is statically typed. Types provide a contract between various pieces of code running together. Without automatic type checking during build, if we wanted to adhere to our strict coverage rules, we would have to implement these contract tests ourselves. This is the case with environments like Node.js and JavaScript. Writing comprehensive contract tests manually is a lot of extra work we prefer to avoid.
Go is simple and dogmatic. Go is known for being stripped of many traditional language features like classic OOP inheritance. Complexity is the worst enemy of bulletproof code. Problems tend to creep up in the seams. While the common case is easy to test, it’s the strange edge case you haven’t thought of that will eventually get you.
Dogma is also helpful in this sense. There’s often only one way to do something in Go. This may inhibit the free spirit of man, but when there’s one way to do something, it’s more difficult to get this one thing wrong.
Go is concise yet expressive. Readable code is easier to review and audit. If the code is too verbose, its core purpose may be drowned by the noise of boilerplate. If the code is too concise, it becomes hard to follow and understand.
Go strikes a nice balance between the two. There’s not a lot of language boilerplate like in Java or C++, but the language is still very explicit and verbose in areas like error handling — making it easy to verify that you’ve checked every possible route.
Go has clear paths of error and recovery. Dealing gracefully with errors in runtime is a cornerstone for bulletproof code. Go has a strict convention of how errors are returned and propagated. Environments like Node.js — where multiple flavors of control flow like callbacks, promises, and async are mixed together — often result in leakage like unhandled promise rejections. Recovering from these is almost impossible.
Go has an extensive standard library. Dependencies add risk, especially when coming from sources that aren’t necessarily well-maintained. When shipping your server, you ship all of your dependencies with it. You are responsible for their malfunctions as well. Environments overflowing with fragmented dependencies, like Node.js, are harder to keep bulletproof.
This is also risky from a security standpoint, as you are as vulnerable as your weakest dependency. Go’s extensive standard library is well-maintained and reduces reliance on external dependencies.
Development velocity is still rapid. The main appeal of environments like Node.js is an extremely rapid development cycle. Code just takes less time to write and you become more productive.
Go preserves these benefits quite well. The build toolchain is fast enough to make feedback immediate. Compilation time is negligible, and code seems to run like it’s interpreted. The language has enough abstractions like garbage collection to focus engineering efforts on core functionality.
Let’s play with a working example
Now, with the introductions over, it’s time to dive into some code. We need an example that is simple enough so we can focus on methodology, but complicated enough to have substance. I find it’s easiest to take something from my day to day, so let’s build a server that processes currency-like transactions. Users will be able to check the balance for an account. Users will also be able to transfer funds from one account to another.
We’ll keep things very simple. Our system will only have a single server. We’re also not going to deal with user authentication or cryptography. These are product features, whereas we want to focus on building the bulletproof software foundation.
Breaking down complexity to manageable parts
Complexity is the worst enemy of bulletproof code. One of the best ways to deal with complexity is divide and conquer — split the problem into smaller problems and solve each one separately. How do we split? We’ll follow the principle of separation of concerns. Every part should deal with a single concern.
This goes hand in hand with the popular architecture of microservices. Our server will be composed of services. Each service will be given a clear responsibility and a well-defined interface for communication with the other services.
Once we’ve structured our server this way, we’ll be free to decide how each service is running. We can run all services together in the same process, make each service its own separate server and communicate via RPC, or split services to run on different machines.
Since we’re just starting out, we’ll keep things simple — all services will share the same process and communicate directly as libraries. We’ll be able to change this decision easily in the future.
So which services should we have? Our server is a little too simple for splitting up, but to demonstrate this principle we’ll do so anyway. We need to respond to HTTP requests from clients for checking balances and making transactions. One service can deal with the client HTTP interface; we’ll call it PublicApi. Another service will own the state, the ledger of all balances, so we’ll call it StateStorage. The third service will connect the two and implement our business logic of the “contract” for changing balances. Since blockchains usually allow these contracts to be deployed by application developers, the third service will be charged with running them; we’ll call it VirtualMachine.
We’ll place the code for services in our project under /services/publicapi, /services/virtualmachine and /services/statestorage.
Defining service boundaries clearly
When implementing services, we’ll want to be able to work on each one separately. Possibly even assign different services to different developers. Since services are dependent on one another and we’re going to parallelize work on their implementation, we’ll have to start by defining clear interfaces between them. Using this interface, we’ll be able to test a service individually and mock everything else.
How can we define the interface? One option is to document it, but documentation tends to grow stale and out of sync with the code. We could use Go interface declarations. This makes sense, but it’s nicer to define the interface in a language agnostic way. Our server isn’t limited to Go only. We may decide down the road to reimplement one of the services in a different language more appropriate to its requirements.
One approach is to use protobuf — a simple language-agnostic syntax by Google to define messages and service endpoints.
Let’s start with StateStorage. We’ll structure state as a key-value store:
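As a rough Go-side illustration of such a contract (not the protobuf definition itself), the sketch below assumes string keys and int32 values and reuses the endpoint names ReadKey and WriteKey that appear later in the text. The memoryStorage type and the empty-key check are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// StateStorage is a hypothetical Go view of the key-value contract.
// The key and value types are assumptions made for illustration.
type StateStorage interface {
	ReadKey(key string) (int32, error)
	WriteKey(key string, value int32) error
}

// memoryStorage is a naive in-memory implementation backed by a map.
// It is not safe for concurrent use.
type memoryStorage struct {
	data map[string]int32
}

func newMemoryStorage() *memoryStorage {
	return &memoryStorage{data: make(map[string]int32)}
}

// ReadKey returns the stored value, or zero for a key that was never written.
func (s *memoryStorage) ReadKey(key string) (int32, error) {
	if key == "" {
		return 0, errors.New("empty key")
	}
	return s.data[key], nil
}

// WriteKey stores the value under the given key, overwriting any previous one.
func (s *memoryStorage) WriteKey(key string, value int32) error {
	if key == "" {
		return errors.New("empty key")
	}
	s.data[key] = value
	return nil
}

func main() {
	var storage StateStorage = newMemoryStorage()
	if err := storage.WriteKey("user1", 100); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	balance, err := storage.ReadKey("user1")
	fmt.Println(balance, err)
}
```

Every endpoint returns an error, because even a simple store can fail, and callers should be forced to consider that path.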
Although PublicApi is accessed via client HTTP, it’s still a good practice to give it a clear interface in the same way:
This will require us to define Transaction and Address data structures:
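For illustration only, here is how these types and the PublicApi contract might look once expressed in Go. The field names (Username, From, To, Amount) and the two method signatures are assumptions, not the generated code.

```go
package main

import "fmt"

// Address identifies an account. The Username field is an assumption.
type Address struct {
	Username string
}

// Transaction moves Amount from one address to another.
type Transaction struct {
	From   Address
	To     Address
	Amount int32
}

// String renders the transaction in a compact, log-friendly form.
func (t Transaction) String() string {
	return fmt.Sprintf("%s -> %s: %d", t.From.Username, t.To.Username, t.Amount)
}

// PublicApi is a hypothetical Go view of the client-facing contract.
type PublicApi interface {
	GetBalance(from Address) (int32, error)
	Transfer(tx Transaction) error
}

func main() {
	tx := Transaction{
		From:   Address{Username: "user1"},
		To:     Address{Username: "user2"},
		Amount: 17,
	}
	fmt.Println(tx) // prints "user1 -> user2: 17"
}
```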
We’ll place the .proto definitions for services in our project under /types/services and general data structures under /types/protocol. Once the definitions are ready, they can be compiled to Go code. The benefit of this approach is that code which doesn’t meet the contract will simply not compile. Alternate methods would require us to write contract tests explicitly.
The complete definitions, generated Go files, and compilation instructions are available here. Kudos to Square Engineering for making goprotowrap.
Note that we’re not integrating an RPC transport layer yet, and calls between services will currently be regular library calls. When we’re ready to split services to different servers, we can add a transport layer like gRPC.
The types of tests in our project
Since tests are the key to bulletproof code, let’s discuss first which types of tests we’ll be writing:
Unit tests
This is the base of the testing pyramid. We’ll test every unit in isolation. What’s a unit? In Go, we can define a unit to be every file in a package. If we have /services/publicapi/handlers.go, we’ll place its unit test in the same package under /services/publicapi/handlers_test.go.
It’s preferable to place unit tests in the same package as the tested code so the tests have access to non-exported variables and functions.
Service / integration / component tests
The next type of tests has multiple names that all refer to the same thing: taking several units and testing them together. This is one level up the pyramid. In our case, we’ll focus on an entire service. These tests define the specifications for a service. For the StateStorage service, for example, we’ll place them in /services/statestorage/spec.
It’s preferable to place these tests in a different package than the tested code to enforce access through exported interfaces only.
End-to-end tests
This is the top of the testing pyramid, where we test our entire system together with all services combined. These tests define the end-to-end specifications for the system, therefore we’ll place them in our project under /e2e/spec.
These tests as well should be placed in a different package than the tested code to enforce access through exported interfaces only.
Which tests should we write first? Do we start at the base and work our way up? Or go top-down? Both approaches are valid. The benefit of the top-down approach is for building specifications. It’s usually easier to reason about the specifications for the entire system first. Even if we split our system to services the wrong way, the system spec would remain the same. This would also help us understand that.
The drawback of starting top-down is that our end-to-end tests will be the last ones to pass (only after the entire system has been implemented). This means they’ll remain failing for a long time.
End-to-end tests
Before writing tests, we need to consider whether we’re going to write everything bare-boned or use a framework. Relying on frameworks for dev dependencies is less dangerous than relying on frameworks for production code. In our case, since the Go standard library doesn’t have great support for BDD and this format is excellent for defining specs, we’ll opt for a framework.
There are many excellent candidates like GoConvey and Ginkgo. My personal preference is Ginkgo with Gomega (terrible names, but what can you do), which use syntax like Describe() and It().
So what does a test look like? Checking user balance:
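A minimal sketch of such a balance check, written with the standard library only rather than Ginkgo and Gomega so that it runs on its own. The /api/balance route, the from query parameter, and the fake server standing in for the real system are all assumptions.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// newFakeServer stands in for the real system under test.
// The route and the query parameter name are hypothetical.
func newFakeServer(balances map[string]int) *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/api/balance", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%d", balances[r.URL.Query().Get("from")])
	})
	return httptest.NewServer(mux)
}

// getBalance calls the public HTTP API the way an external client would.
func getBalance(baseURL, user string) (string, error) {
	resp, err := http.Get(baseURL + "/api/balance?from=" + url.QueryEscape(user))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}

func main() {
	server := newFakeServer(map[string]int{"user1": 0})
	defer server.Close()

	// Spec: a user that never took part in a transaction has a zero balance.
	balance, err := getBalance(server.URL, "user1")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("user1 balance:", balance)
}
```

In the real suite, the server would be the fully assembled system, and the assertion would live inside a BDD-style block instead of main.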
Since our server provides a public HTTP interface to the world, we access this web API using http.Get. What about making a transaction?
The test is very descriptive and can even replace documentation. As you can see above, we’re allowing accounts to reach a negative balance. This is a product choice. If this weren’t allowed, the test would reflect that.
The complete test file is available here.
Service integration / component tests
Now that we’re done with end-to-end tests, we go down the pyramid and implement service tests. This is done for every service separately. Let’s choose a service which has a dependency on another service, because this case is more interesting.
We’ll start with VirtualMachine. The protobuf interface for this service is available here. Because VirtualMachine relies on service StateStorage and makes calls to it, we’re going to have to mock StateStorage in order to test VirtualMachine in isolation. The mock object will allow us to control StateStorage’s responses during the test.
How can we implement mock objects in Go? We can simply create a bare-boned stub implementation, but using a mocking library will also provide us with useful assertions during the test. My preference is go-mock.
We’ll place the mock for StateStorage in /services/statestorage/mock.go. It’s preferable to place mocks in the same package as the mocked code to provide access to non-exported variables and functions. The mock is pretty much just boilerplate at this point, but as our services get more complicated, we may find ourselves adding some logic here. This is the mock:
If you assign different services to different developers, it makes sense to implement the mocks first and share them between the team.
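To give a feel for what a mock of StateStorage involves, here is a hand-rolled sketch that does not use a mocking library. It only shows the two ingredients a mock needs: canned responses and a record of calls. The interface shape and all field names are assumptions.

```go
package main

import "fmt"

// StateStorage is the assumed interface of the real service.
type StateStorage interface {
	ReadKey(key string) (int32, error)
	WriteKey(key string, value int32) error
}

// MockStateStorage returns canned responses and records how it was called.
type MockStateStorage struct {
	ReadValues map[string]int32 // canned values returned by ReadKey
	ReadErr    error            // if set, every ReadKey call fails with it
	WriteErr   error            // if set, every WriteKey call fails with it
	ReadCalls  []string         // keys passed to ReadKey, in call order
	Writes     map[string]int32 // last value written per key
}

func NewMockStateStorage() *MockStateStorage {
	return &MockStateStorage{
		ReadValues: make(map[string]int32),
		Writes:     make(map[string]int32),
	}
}

func (m *MockStateStorage) ReadKey(key string) (int32, error) {
	m.ReadCalls = append(m.ReadCalls, key)
	if m.ReadErr != nil {
		return 0, m.ReadErr
	}
	return m.ReadValues[key], nil
}

func (m *MockStateStorage) WriteKey(key string, value int32) error {
	if m.WriteErr != nil {
		return m.WriteErr
	}
	m.Writes[key] = value
	return nil
}

// Compile-time check that the mock satisfies the service interface.
var _ StateStorage = (*MockStateStorage)(nil)

func main() {
	mock := NewMockStateStorage()
	mock.ReadValues["user1"] = 100

	value, err := mock.ReadKey("user1")
	fmt.Println(value, err, len(mock.ReadCalls))
}
```

The compile-time assertion is the useful part for a team: if the service interface changes, the shared mock stops compiling instead of silently drifting.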
Let’s get back to writing our service test for VirtualMachine. Which scenario should we test here exactly? It’s best to follow the interface for the service and design tests for each endpoint. We’ll implement the test for the endpoint CallContract() with the method argument of "GetBalance" first:
Notice that the service we’re testing, VirtualMachine, receives a pointer to its dependency StateStorage in its Start() method via simple dependency injection. That’s where we pass the mocked instance. Also notice how the test instructs the mock how to respond when accessed: when its ReadKey method is called, it should return the value 100. The test then verifies that ReadKey was indeed called exactly once.
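The injection pattern can be sketched in a few lines. This is a simplified stand-in, assuming CallContract takes a method name and an account name, and using a tiny hand-written stub in place of a mocking library.

```go
package main

import (
	"errors"
	"fmt"
)

// StateStorage is the dependency as the VirtualMachine sees it (assumed shape).
type StateStorage interface {
	ReadKey(key string) (int32, error)
}

// stubStorage returns one canned value and counts how often it is read.
type stubStorage struct {
	value int32
	calls int
}

func (s *stubStorage) ReadKey(key string) (int32, error) {
	s.calls++
	return s.value, nil
}

// VirtualMachine holds its dependency instead of constructing it itself.
type VirtualMachine struct {
	stateStorage StateStorage
}

// Start receives the dependency: this is the whole dependency injection.
func (vm *VirtualMachine) Start(stateStorage StateStorage) {
	vm.stateStorage = stateStorage
}

// CallContract dispatches on the contract method name.
func (vm *VirtualMachine) CallContract(method, account string) (int32, error) {
	switch method {
	case "GetBalance":
		return vm.stateStorage.ReadKey(account)
	default:
		return 0, errors.New("unknown method: " + method)
	}
}

func main() {
	stub := &stubStorage{value: 100} // respond with 100 when read
	vm := &VirtualMachine{}
	vm.Start(stub)

	balance, err := vm.CallContract("GetBalance", "user1")
	fmt.Println(balance, err, stub.calls)
}
```

Because VirtualMachine depends on an interface rather than a concrete type, the test and production code differ only in what is passed to Start().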
These tests become the specifications for the service. The full suite for service VirtualMachine is available here. The suites for the other services are available here and here.
Let’s implement a unit, finally
Implementing the contract for method "GetBalance" is a bit too simple, so let’s move instead to the slightly more complicated implementation for method "Transfer". The transfer contract needs to read the balances of both the sender and recipient, calculate their new balances, and write them back to state. The service integration test for it is very similar to the one we just implemented:
We’ll finally get down to business and create a unit called processor.go that contains the actual implementation for the contract. This is what our initial implementation looks like:
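A first version along these lines might look like the following sketch. It is deliberately naive: two reads, two writes, and an error check after each call. The function signature and the map-backed storage are assumptions.

```go
package main

import "fmt"

// StateStorage is the assumed interface of the state service.
type StateStorage interface {
	ReadKey(key string) (int32, error)
	WriteKey(key string, value int32) error
}

// mapStorage is a trivial in-memory implementation used to run the sketch.
type mapStorage map[string]int32

func (m mapStorage) ReadKey(key string) (int32, error) { return m[key], nil }

func (m mapStorage) WriteKey(key string, value int32) error {
	m[key] = value
	return nil
}

// processTransfer reads both balances, computes the new ones, and writes
// them back. Any of the four calls to the state may fail.
func processTransfer(state StateStorage, from, to string, amount int32) error {
	fromBalance, err := state.ReadKey(from)
	if err != nil {
		return err
	}
	toBalance, err := state.ReadKey(to)
	if err != nil {
		return err
	}
	if err := state.WriteKey(from, fromBalance-amount); err != nil {
		return err
	}
	if err := state.WriteKey(to, toBalance+amount); err != nil {
		return err
	}
	return nil
}

func main() {
	state := mapStorage{"user1": 100, "user2": 50}
	err := processTransfer(state, "user1", "user2", 17)
	fmt.Println(state["user1"], state["user2"], err) // prints "83 67 <nil>"
}
```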
This satisfies the service integration test, but the integration test only contains a common case scenario. What about edge cases and potential failures? As you can see, any of the calls we make to StateStorage may fail. If we’re aiming for 100-percent coverage, we need to check all of these cases. A unit test would be a great place to do that.
Since we’re going to have to run the function multiple times with different inputs and mock settings to reach all flows, a table driven test would make this process a little more efficient. The convention in Go is to avoid fancy frameworks in unit tests. We can drop Ginkgo, but we should probably keep Gomega so our matchers look similar to our previous tests. This is the test:
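A table driven test for this function could be sketched as follows. To keep it runnable without dependencies it uses plain comparisons rather than Gomega matchers, and a plain function instead of a testing.T test. The failingStorage stub and the row fields are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// failingStorage is a stub whose read and write failures are configurable.
type failingStorage struct {
	data     map[string]int32
	readErr  error
	writeErr error
}

func (s *failingStorage) ReadKey(key string) (int32, error) {
	if s.readErr != nil {
		return 0, s.readErr
	}
	return s.data[key], nil
}

func (s *failingStorage) WriteKey(key string, value int32) error {
	if s.writeErr != nil {
		return s.writeErr
	}
	s.data[key] = value
	return nil
}

// processTransfer is the naive unit under test.
func processTransfer(state *failingStorage, from, to string, amount int32) error {
	fromBalance, err := state.ReadKey(from)
	if err != nil {
		return err
	}
	toBalance, err := state.ReadKey(to)
	if err != nil {
		return err
	}
	if err := state.WriteKey(from, fromBalance-amount); err != nil {
		return err
	}
	return state.WriteKey(to, toBalance+amount)
}

// Each row of the table describes one scenario and its expected outcome.
var transferCases = []struct {
	name              string
	readErr, writeErr error
	wantErr           bool
	wantFrom, wantTo  int32
}{
	{name: "common case", wantFrom: 83, wantTo: 67},
	{name: "read fails", readErr: errors.New("read failed"), wantErr: true, wantFrom: 100, wantTo: 50},
	{name: "write fails", writeErr: errors.New("write failed"), wantErr: true, wantFrom: 100, wantTo: 50},
}

// runTransferCases runs every row and returns a description of each failure.
func runTransferCases() []string {
	var failures []string
	for _, tc := range transferCases {
		state := &failingStorage{
			data:     map[string]int32{"user1": 100, "user2": 50},
			readErr:  tc.readErr,
			writeErr: tc.writeErr,
		}
		err := processTransfer(state, "user1", "user2", 17)
		if (err != nil) != tc.wantErr {
			failures = append(failures, tc.name+": unexpected error result")
			continue
		}
		if state.data["user1"] != tc.wantFrom || state.data["user2"] != tc.wantTo {
			failures = append(failures, tc.name+": unexpected balances")
		}
	}
	return failures
}

func main() {
	fmt.Println("failures:", runTransferCases())
}
```

Adding a scenario means adding one row, which keeps the cost of covering another error path low.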
If you’re weirded out by the “Ω” symbol don’t worry, it’s just a regular variable name (holding a pointer to Gomega). You’re welcome to rename it to anything you like.
For the sake of time, we didn’t show the strict methodology of TDD, where a new line of code would only be written to resolve a failing test. Using this methodology, the unit test and implementation for processTransfer() would be implemented over several iterations.
The full suite of unit tests in the VirtualMachine service is available here. The unit tests for the other services are available here and here.
We’ve reached 100% coverage, our end-to-end tests are passing, our service integration tests are passing and our unit tests are passing. The code fulfills its requirements to the letter and is thoroughly tested.
Does that mean that everything is working? Unfortunately not. We still have several nasty bugs hiding in plain sight in our simple implementation.
The importance of stress tests
All of our tests so far tested a single request being handled at any given time. What about synchronization issues? Every HTTP request in Go is handled in its own goroutine. Since these goroutines run concurrently, potentially on different OS threads on different CPU cores, we face synchronization problems. These are very nasty bugs that aren’t easy to track down.
One of the approaches for finding synchronization issues is stressing the system with many requests in parallel and making sure everything still works. This should be an end-to-end test because we want to test synchronization issues across our entire system with all services. We’ll place stress tests in our project under /e2e/stress.
This is what a stress test looks like:
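The shape of such a test can be sketched in-process, without HTTP: generate a random but reproducible workload, fire it in concurrent batches, and compare the result with a sequential replay. The ledger below is a hypothetical, already synchronized stand-in for the system, so the sketch runs cleanly; the seed, batch size, and names are assumptions.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// ledger is a synchronized stand-in for the system under test.
type ledger struct {
	mu       sync.Mutex
	balances map[string]int
}

func (l *ledger) transfer(from, to string, amount int) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.balances[from] -= amount
	l.balances[to] += amount
}

type transferRequest struct {
	from, to string
	amount   int
}

// generateRequests builds a pseudo-random workload. A constant seed makes
// the workload identical on every run.
func generateRequests(seed int64, count int, users []string) []transferRequest {
	rng := rand.New(rand.NewSource(seed))
	reqs := make([]transferRequest, count)
	for i := range reqs {
		reqs[i] = transferRequest{
			from:   users[rng.Intn(len(users))],
			to:     users[rng.Intn(len(users))],
			amount: rng.Intn(100),
		}
	}
	return reqs
}

// runStress fires the requests in batches of batchSize concurrent goroutines.
func runStress(l *ledger, reqs []transferRequest, batchSize int) {
	for start := 0; start < len(reqs); start += batchSize {
		end := start + batchSize
		if end > len(reqs) {
			end = len(reqs)
		}
		var wg sync.WaitGroup
		for _, r := range reqs[start:end] {
			wg.Add(1)
			go func(r transferRequest) {
				defer wg.Done()
				l.transfer(r.from, r.to, r.amount)
			}(r)
		}
		wg.Wait()
	}
}

// replaySequentially computes the expected balances one request at a time.
func replaySequentially(reqs []transferRequest, users []string, initial int) map[string]int {
	expected := make(map[string]int)
	for _, u := range users {
		expected[u] = initial
	}
	for _, r := range reqs {
		expected[r.from] -= r.amount
		expected[r.to] += r.amount
	}
	return expected
}

func main() {
	users := []string{"user1", "user2", "user3"}
	l := &ledger{balances: map[string]int{"user1": 1000, "user2": 1000, "user3": 1000}}
	reqs := generateRequests(42, 1000, users)
	runStress(l, reqs, 200)

	expected := replaySequentially(reqs, users, 1000)
	for _, u := range users {
		fmt.Println(u, l.balances[u] == expected[u])
	}
}
```

Since additions and subtractions commute, the final balances do not depend on goroutine scheduling, which is what makes a sequential replay a valid oracle here.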
Notice that the stress test includes random data. It’s recommended to use a constant seed for the random generator to make the test deterministic. Running a different scenario every time we run our tests isn’t a good idea. Flaky tests that sometimes pass and sometimes fail reduce developer confidence in the suite.
The tricky part about stress tests over HTTP is that most machines have a hard time simulating thousands of concurrent users and opening thousands of concurrent TCP connections (you’ll see strange failures like “maximum file descriptors” or “connection reset by peer”). The code above tries to deal with this gracefully by limiting concurrent connections to batches of 200 and using IdleConnection Transport settings to recycle TCP sessions between batches. If this test is flaky on your machine, try reducing the batch size to 100.
Oh no…the test fails:
What happens here? StateStorage is implemented as a simple in-memory map. It seems we’re trying to write to this map in parallel from different threads. It may seem at first that we should just replace the regular map with the thread-safe sync.Map, but our problem runs a little deeper.
Take a look at the processTransfer() implementation. It reads twice from the state and then writes twice. The set of reads and writes isn’t an atomic transaction, so if another thread changes the state after one thread has read from it, we’re going to have data corruption. The fix is to make sure only one instance of processTransfer() can run at a time; you can see it here.
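One way to serialize the function, shown as a sketch rather than the project’s actual fix, is a single sync.Mutex held for the whole read-modify-write sequence. The package-level transferMutex and the map-backed storage are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// mapStorage is a plain map with no synchronization of its own.
type mapStorage map[string]int32

func (m mapStorage) ReadKey(key string) (int32, error) { return m[key], nil }

func (m mapStorage) WriteKey(key string, value int32) error {
	m[key] = value
	return nil
}

// transferMutex serializes transfers, so each read-modify-write sequence
// is atomic with respect to every other transfer.
var transferMutex sync.Mutex

func processTransfer(state mapStorage, from, to string, amount int32) error {
	transferMutex.Lock()
	defer transferMutex.Unlock()

	fromBalance, err := state.ReadKey(from)
	if err != nil {
		return err
	}
	toBalance, err := state.ReadKey(to)
	if err != nil {
		return err
	}
	if err := state.WriteKey(from, fromBalance-amount); err != nil {
		return err
	}
	return state.WriteKey(to, toBalance+amount)
}

// runConcurrentTransfers moves one unit from user1 to user2, n times in parallel.
func runConcurrentTransfers(n int) (int32, int32) {
	state := mapStorage{"user1": int32(n), "user2": 0}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = processTransfer(state, "user1", "user2", 1)
		}()
	}
	wg.Wait()
	return state["user1"], state["user2"]
}

func main() {
	from, to := runConcurrentTransfers(100)
	fmt.Println(from, to) // prints "0 100"
}
```

A single coarse lock trades throughput for simplicity. It also shows why swapping in sync.Map alone would not help: each map operation would be safe, but the sequence of four operations would still interleave.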
Let’s try to run the stress test again. Oh no, another failure!
This one requires a little more debugging to understand. It seems that it happens when a user tries to transfer an amount to themselves (the same user is both the sender and recipient). Looking at the implementation, it’s easy to see why this happens.
This one is a little disturbing. We’ve followed a TDD-like workflow and we still hit a hard business logic bug. How can that be? Isn’t our code tested against every scenario with 100% coverage?! Well… this bug is the result of a faulty product requirement, not a faulty implementation. The requirements for processTransfer() should have clearly stated that if a user transfers an amount to themselves, nothing happens.
When we discover a business logic bug, we should always reproduce it first in our unit tests. It’s very easy to add this case to our table driven test from before. The fix is also simple — you can see it here.
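The self-transfer case is easy to reproduce in isolation. In this simplified sketch (plain maps instead of the StateStorage service, hypothetical function names), both balances are read before either is written, so the second write overwrites the first when sender and recipient are the same account.

```go
package main

import "fmt"

// naiveTransfer reads both balances first and then writes both.
// When from == to, the second write overwrites the first one.
func naiveTransfer(balances map[string]int32, from, to string, amount int32) {
	fromBalance := balances[from]
	toBalance := balances[to]
	balances[from] = fromBalance - amount
	balances[to] = toBalance + amount
}

// fixedTransfer treats a transfer to oneself as a no-op, as the corrected
// requirement demands.
func fixedTransfer(balances map[string]int32, from, to string, amount int32) {
	if from == to {
		return
	}
	naiveTransfer(balances, from, to, amount)
}

func main() {
	buggy := map[string]int32{"user1": 100}
	naiveTransfer(buggy, "user1", "user1", 10)

	fixed := map[string]int32{"user1": 100}
	fixedTransfer(fixed, "user1", "user1", 10)

	fmt.Println(buggy["user1"], fixed["user1"]) // prints "110 100"
}
```

The naive version creates money out of thin air, which is exactly the kind of outcome a currency-like system cannot tolerate.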
Are we finally home free?
After adding the stress tests and making sure everything passes, is our system finally working as intended? Is it finally bulletproof?
Unfortunately not.
We still have some nasty bugs that even the stress test did not uncover. Our “simple” function processTransfer() is still at risk. Consider what happens if the second of the two writes fails. The first write to state succeeded but the second did not. We’re about to return an error, but we’ve already corrupted our state by writing half-baked data to it. If we’re going to return an error, we’ll have to undo the first write.
This is a little more complicated to fix. The best solution is probably to change our interface altogether. Instead of having an endpoint in StateStorage named WriteKey that we call twice, we should probably rename it to WriteKeys, an endpoint that we’ll call once to write both keys together in one transaction.
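An all-or-nothing WriteKeys could be sketched like this: validate every update first, then apply all of them under one lock. The storage type, the map-based argument, and the empty-key validation rule are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// storage is a hypothetical in-memory StateStorage with a batch write.
type storage struct {
	mu   sync.Mutex
	data map[string]int32
}

// WriteKeys applies all updates or none of them. Validation happens before
// the first write, so a rejected batch leaves the state untouched.
func (s *storage) WriteKeys(updates map[string]int32) error {
	s.mu.Lock()
	defer s.mu.Unlock()

	for key := range updates {
		if key == "" {
			return errors.New("empty key in batch")
		}
	}
	for key, value := range updates {
		s.data[key] = value
	}
	return nil
}

// ReadKey returns the stored value, or zero for a key that was never written.
func (s *storage) ReadKey(key string) int32 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.data[key]
}

func main() {
	s := &storage{data: map[string]int32{"user1": 100, "user2": 50}}

	// A valid batch: both balances change together.
	err := s.WriteKeys(map[string]int32{"user1": 83, "user2": 67})
	fmt.Println(s.ReadKey("user1"), s.ReadKey("user2"), err)

	// An invalid batch: nothing changes, not even the valid entry.
	err = s.WriteKeys(map[string]int32{"user1": 0, "": 1})
	fmt.Println(s.ReadKey("user1"), s.ReadKey("user2"), err)
}
```

With this interface, processTransfer() never sees a state where only one side of the transfer has been written, so there is nothing to undo on failure.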
There’s a bigger lesson here: a methodical test suite is not enough. Dealing with complex bugs requires critical thinking and paranoid creativity by developers. It’s recommended to have someone else look at your code and perform code reviews in your team. Even better, open sourcing your code and encouraging the community to audit it is one of the best ways to make your code more bulletproof.
All the code in this article is available on Github as a single example repository. You’re welcome to use this project as a starter kit for your next server. You’re also welcome to review the repo and uncover more bugs and make it more bulletproof. Be creatively paranoid!
orbs-network/go-scaffold: Scaffold starter project in Go for a microservices-based server with thorough testing (github.com)
Tal is a founder at Orbs.com — a public blockchain infrastructure for large scale consumer applications with millions of users. To learn more and read the Orbs white papers click here. [Follow on Telegram, Twitter, Reddit]
Note: if you’re interested in blockchain — come contribute! Orbs is a fully open source project where anyone can participate.
Translated from: https://www.freecodecamp.org/news/how-to-write-bulletproof-code-in-go-a-workflow-for-servers-that-cant-fail-10a14a765f22/