A million requests per second with Python

by Paweł Piotr Przeradowski
Is it possible to hit a million requests per second with Python? Probably not until recently.
A lot of companies are migrating away from Python to other programming languages so that they can boost their operation performance and save on server costs, but there's no need really. Python can be the right tool for the job.
The Python community is doing a lot of work around performance lately. CPython 3.6 boosted overall interpreter performance with a new dictionary implementation. CPython 3.7 is going to be even faster, thanks to the introduction of a faster calling convention and dictionary lookup caches.
For number crunching tasks you can use PyPy with its just-in-time code compilation. You can also run NumPy’s test suite, which now has improved overall compatibility with C extensions. Later this year PyPy is expected to reach Python 3.5 conformance.
All this great work inspired me to innovate in one of the areas where Python is used extensively: web and micro-service development.
Enter Japronto!
Japronto is a brand new micro-framework tailored for your micro-services needs. Its main goals include being fast, scalable, and lightweight. It lets you do both synchronous and asynchronous programming thanks to asyncio. And it’s shamelessly fast. Even faster than NodeJS and Go.
Errata: As user @heppu points out, Go’s stdlib HTTP server can be 12% faster than this graph shows when written more carefully. Also there’s an awesome fasthttp server for Go that apparently is only 18% slower than Japronto in this particular benchmark. Awesome! For details see https://github.com/squeaky-pl/japronto/pull/12 and https://github.com/squeaky-pl/japronto/pull/14.
We can also see that the Meinheld WSGI server is almost on par with NodeJS and Go. Despite its inherently blocking design, it is a great performer compared to the preceding four, which are asynchronous Python solutions. So never trust anyone who says that asynchronous systems are always speedier. They are almost always more concurrent, but there's much more to it than just that.
I performed this micro benchmark using a “Hello world!” application, but it clearly demonstrates server-framework overhead for a number of solutions.
These results were obtained on an AWS c4.2xlarge instance with 8 VCPUs, launched in the São Paulo region with default shared tenancy, HVM virtualization, and magnetic storage. The machine was running Ubuntu 16.04.1 LTS (Xenial Xerus) with the Linux 4.4.0-53-generic x86_64 kernel. The OS reported a Xeon® CPU E5-2666 v3 @ 2.90GHz. I used Python 3.6, which I freshly compiled from its source code.
To be fair, all the contestants (including Go) were running a single-worker process. Servers were load tested using wrk with 1 thread, 100 connections, and 24 simultaneous (pipelined) requests per connection (cumulative parallelism of 2400 requests).
HTTP pipelining is crucial here since it’s one of the optimizations that Japronto takes into account when executing requests.
Most of the servers execute requests from pipelining clients in the same fashion they would from non-pipelining clients. They don't try to optimize it. (In fact Sanic and Meinheld will also silently drop requests from pipelining clients, which is a violation of the HTTP 1.1 protocol.)
In simple words, pipelining is a technique in which the client doesn't need to wait for the response before sending subsequent requests over the same TCP connection. To ensure the integrity of the communication, the server sends back responses in the same order the requests were received.
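To make this concrete, here is a small stdlib-only sketch (not Japronto code) that pipelines two requests over a single TCP connection against Python's built-in http.server, then checks that the responses come back in request order:

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Hello(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'   # keep-alive, so one connection can carry many requests

    def do_GET(self):
        body = ('Hello ' + self.path).encode()
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # silence per-request logging
        pass

server = HTTPServer(('127.0.0.1', 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Pipelining: both requests are sent up front, before any response arrives.
sock = socket.create_connection(server.server_address)
sock.sendall(b'GET /first HTTP/1.1\r\nHost: x\r\n\r\n'
             b'GET /second HTTP/1.1\r\nHost: x\r\n\r\n')

# The server answers both, in the same order the requests were sent.
data = b''
while b'Hello /second' not in data:
    data += sock.recv(4096)

print(data.index(b'Hello /first') < data.index(b'Hello /second'))  # True
sock.close()
server.shutdown()
```

Note that http.server simply handles the queued requests one by one from the connection buffer; it preserves ordering but, unlike Japronto, makes no attempt to batch the work.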
The gory details of optimizations
When many small GET requests are pipelined together by the client, there’s a high probability that they’ll arrive in one TCP packet (thanks to Nagle’s algorithm) on the server side, then be read back by one system call.
Doing a system call and moving data from kernel-space to user-space is a very expensive operation compared to, say, moving memory inside process space. That's why it's important to perform as few system calls as necessary (but no fewer).
When Japronto receives data and successfully parses several requests out of it, it tries to execute all the requests as fast as possible, glue responses back in correct order, then write back in one system call. In fact the kernel can aid in the gluing part, thanks to scatter/gather IO system calls, which Japronto doesn’t use yet.
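For illustration, here is what the scatter/gather approach looks like from Python: os.writev (POSIX-only) hands the kernel several buffers in a single system call. The response bytes are made up for the example:

```python
import os

# Vectored IO: writev submits several buffers in one system call,
# letting the kernel do the "gluing" instead of user-space code.
responses = [b'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok'] * 3

r, w = os.pipe()
written = os.writev(w, responses)   # one syscall, three buffers
os.close(w)
data = os.read(r, 4096)
os.close(r)

print(written == len(data) == sum(len(b) for b in responses))  # True
```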
Note that this isn’t always possible, since some of the requests could take too long, and waiting for them would needlessly increase latency.
Take care when you tune heuristics, and consider the cost of system calls and the expected request completion time.
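A hypothetical version of such a heuristic might look like the following; the function and its arguments are illustrative, not Japronto's actual code. Fast responses get glued into one write, while a slow request triggers an early flush so it doesn't hold up the batch:

```python
def drain(requests, handle, is_fast, send):
    """Glue fast responses into one write; flush before any slow request."""
    batch = []
    for req in requests:
        if not is_fast(req):
            if batch:                   # flush what we have, paying one syscall
                send(b''.join(batch))
                batch = []
            send(handle(req))           # the slow response goes out on its own
        else:
            batch.append(handle(req))
    if batch:
        send(b''.join(batch))

sends = []
drain(['a', 'b', 'SLOW', 'c'], str.encode, lambda r: r != 'SLOW', sends.append)
print(sends)  # [b'ab', b'SLOW', b'c']
```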
Besides delaying writes for pipelined clients, there are several other techniques that the code employs.
Japronto is written almost entirely in C. The parser, protocol, connection reaper, router, request, and response objects are written as C extensions.
Japronto tries hard to delay creation of the Python counterparts of its internal structures until asked for explicitly. For example, a headers dictionary won't be created until it's requested in a view. All the token boundaries are already marked beforehand, but normalization of header keys and creation of the several str objects is deferred until they're accessed for the first time.
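The idea can be sketched in pure Python (the class and field names here are illustrative, not Japronto's internals): byte offsets into the raw buffer are recorded at parse time, and the dict and str objects are only materialized on first access:

```python
class Request:
    def __init__(self, raw, header_spans):
        self.raw = raw                    # untouched receive buffer
        self.header_spans = header_spans  # [(key_start, key_end, val_start, val_end), ...]
        self._headers = None

    @property
    def headers(self):
        if self._headers is None:         # first access: normalize keys, build strs
            self._headers = {
                self.raw[ks:ke].decode('ascii').title():
                self.raw[vs:ve].decode('ascii')
                for ks, ke, vs, ve in self.header_spans
            }
        return self._headers

raw = b'GET / HTTP/1.1\r\nhost: example.org\r\n\r\n'
req = Request(raw, [(16, 20, 22, 33)])
print(req.headers)  # {'Host': 'example.org'}
```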
Japronto relies on the excellent picohttpparser C library for parsing the status line, headers, and a chunked HTTP message body. Picohttpparser directly employs text processing instructions found in modern CPUs with SSE4.2 extensions (almost any 10-year-old x86_64 CPU has them) to quickly match boundaries of HTTP tokens. The I/O is handled by the super awesome uvloop, which itself is a wrapper around libuv. At the lowest level, this is a bridge to the epoll system call, providing asynchronous notifications on read-write readiness.
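The same readiness-notification model is exposed in the stdlib selectors module, which also picks epoll on Linux; a minimal sketch:

```python
import selectors
import socket

# Readiness notification: register a socket, then ask "what can I read now?"
# DefaultSelector uses epoll on Linux, the facility uvloop/libuv sit on top of.
sel = selectors.DefaultSelector()
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ)

quiet = sel.select(timeout=0)       # nothing to read yet: returns []
a.sendall(b'ping')
events = sel.select(timeout=1)      # b is now readable

print(quiet == [] and len(events) == 1)  # True
sel.unregister(b)
a.close()
b.close()
```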
Python is a garbage collected language, so care needs to be taken when designing high performance systems so as not to needlessly increase pressure on the garbage collector. The internal design of Japronto tries to avoid reference cycles and to do as few allocations/deallocations as possible. It does this by preallocating some objects into so-called arenas. It also tries to reuse Python objects for future requests if they're no longer referenced, instead of throwing them away.
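A toy freelist illustrates the reuse idea (Japronto does this in C with arenas; the names below are made up for the sketch):

```python
class Request:
    def __init__(self):
        self.reset()

    def reset(self):
        self.path = None

class RequestPool:
    """Recycle request objects instead of handing them to the garbage collector."""
    def __init__(self):
        self._free = []

    def acquire(self):
        return self._free.pop() if self._free else Request()

    def release(self, req):
        req.reset()                 # scrub state before the object goes back
        self._free.append(req)

pool = RequestPool()
a = pool.acquire()
pool.release(a)
b = pool.acquire()
print(a is b)  # True: the same object served two requests, no new allocation
```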
All the allocations are done as multiples of 4KB. Internal structures are carefully laid out so that data used frequently together is close enough in memory, minimizing the possibility of cache misses.
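The rounding policy itself is simple; a sketch (of the policy only, not Japronto's allocator):

```python
def alloc_size(requested, block=4096):
    # round the requested size up to the next multiple of 4 KB
    return -(-requested // block) * block

print(alloc_size(1), alloc_size(4096), alloc_size(5000))  # 4096 4096 8192
```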
Japronto tries not to copy between buffers unnecessarily, and performs many operations in-place. For example, it percent-decodes the path before matching in the router.
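Sketched with the stdlib (Japronto does the decoding in-place in C, which this copy-based Python version does not replicate): decode the path once, then match against the route table.

```python
from urllib.parse import unquote

# Illustrative route table: matching is just a dict lookup on the decoded path.
routes = {'/hello world': lambda request: 'hi'}

def match(raw_path):
    return routes.get(unquote(raw_path))

print(match('/hello%20world') is not None)  # True: %20 decoded before lookup
print(match('/missing'))                    # None
```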
Open source contributors, I could use your help.
I've been working on Japronto continuously for the past 3 months, often during weekends as well as on normal work days. This was only possible because I took a break from my regular programmer job and put all my effort into this project.
I think it’s time to share the fruit of my labor with the community.
Currently Japronto implements a pretty solid feature set:
- HTTP 1.x implementation with support for chunked uploads
- Full support for HTTP pipelining
- Keep-alive connections with configurable reaper
- Support for synchronous and asynchronous views
- Master-multiworker model based on forking
- Support for code reloading on changes
- Simple routing
I would like to look into WebSockets and streaming HTTP responses asynchronously next.
There’s a lot of work to be done in terms of documenting and testing. If you’re interested in helping, please contact me directly on Twitter. Here’s Japronto’s GitHub project repository.
Also, if your company is looking for a Python developer who’s a performance freak and also does DevOps, I’m open to hearing about that. I am going to consider positions worldwide.
Final words
All the techniques that I've mentioned here are not really specific to Python. They could probably be employed in other languages like Ruby, JavaScript, or even PHP. I'd be interested in doing such work too, but this sadly will not happen unless somebody can fund it.
I'd like to thank the Python community for their continuous investment in performance engineering, namely Victor Stinner @VictorStinner, INADA Naoki @methane, and Yury Selivanov @1st1, as well as the entire PyPy team.
For the love of Python.
Originally published at https://www.freecodecamp.org/news/million-requests-per-second-with-python-95c137af319/