by Luc Russell
A blockchain experiment with Apache Kafka
Blockchain technology and Apache Kafka share characteristics which suggest a natural affinity. For instance, both share the concept of an ‘immutable append only log’. In the case of a Kafka partition:
Each partition is an ordered, immutable sequence of records that is continually appended to — a structured commit log. The records in the partitions are each assigned a sequential id number called the offset that uniquely identifies each record within the partition [Apache Kafka]
Whereas a blockchain can be described as:
a continuously growing list of records, called blocks, which are linked and secured using cryptography. Each block typically contains a hash pointer as a link to a previous block, a timestamp and transaction data [Wikipedia]
Clearly, these technologies share the parallel concepts of an immutable sequential structure, with Kafka being particularly optimized for high throughput and horizontal scalability, and blockchain excelling in guaranteeing the order and structure of a sequence.
By integrating these technologies, we can create a platform for experimenting with blockchain concepts.
Kafka provides a convenient framework for distributed peer to peer communication, with some characteristics particularly suitable for blockchain applications. While this approach may not be viable in a trustless public environment, there could be practical uses in a private or consortium network. See Scaling Blockchains with Apache Kafka for further ideas on how this could be implemented.
Additionally, with some experimentation, we may be able to draw on concepts already implemented in Kafka (e.g. sharding by partition) to explore solutions to blockchain challenges in public networks (e.g. scalability problems).
The purpose of this experiment is therefore to take a simple blockchain implementation and port it to the Kafka platform; we’ll take Kafka’s concept of a sequential log and guarantee immutability by chaining the entries together with hashes. The blockchain topic on Kafka will become our distributed ledger.
Introduction to Kafka
Kafka is a streaming platform designed for high-throughput, real-time messaging, i.e. it enables publication and subscription to streams of records. In this respect it is similar to a message queue or a traditional enterprise messaging system. Some of the characteristics are:
- High throughput: Kafka brokers can absorb gigabytes of data per second, translating into millions of messages per second. You can read more about the scalability characteristics in Benchmarking Apache Kafka: 2 Million Writes Per Second.
- Competing consumers: Simultaneous delivery of messages to multiple consumers, typically expensive in traditional messaging systems, is no more complex than for a single consumer. This means we can design for competing consumers, guaranteeing that each consumer will receive only one of the messages and achieving a high degree of horizontal scalability.
- Fault tolerance: By replicating data across multiple nodes in a cluster, the impact of individual node failures is minimized.
- Message retention and replay: Kafka brokers maintain a record of consumer offsets — a consumer’s position in the stream of messages. Using this, consumers can rewind to a previous position in the stream even if the messages have already been delivered, allowing them to recreate the status of the system at a point in time. Brokers can be configured to retain messages indefinitely, which is necessary for blockchain applications.
In Kafka, each topic is split into partitions, where each partition is a sequence of records which is continually appended to. This is similar to a text log file, where new lines are appended to the end. The entries in the partition are each assigned a sequential id, called an offset, which uniquely identifies the record.
The Kafka broker can be queried by offset, i.e. a consumer can reset its offset to some arbitrary point in the log to retrieve records from that point forward.
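A minimal sketch of an offset rewind, assuming the pykafka client (which matches the consumer parameters used later in this tutorial):

```python
from pykafka import KafkaClient
from pykafka.common import OffsetType

# Connect to the broker and attach a consumer to the blockchain topic.
client = KafkaClient(hosts="127.0.0.1:9092")
topic = client.topics[b"blockchain"]
consumer = topic.get_simple_consumer(consumer_group=b"blockchain")

# Rewind partition 0 to the earliest available offset; records that were
# already delivered are simply re-read from that point forward.
consumer.reset_offsets([(topic.partitions[0], OffsetType.EARLIEST)])
for message in consumer:
    print(message.offset, message.value)
```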
Tutorial
Full source code is available here.
Prerequisites
- Some understanding of blockchain concepts: The tutorial below is based on implementations from Daniel van Flymen and Gerald Nash, both excellent practical introductions. The following tutorial builds heavily on these concepts, while using Kafka as the message transport. In effect, we’ll port a Python blockchain to Kafka, while maintaining most of the current implementation.
- Basic knowledge of Python: the code is written for Python 3.6.
- Docker: docker-compose is used to run the Kafka broker.
- kafkacat: This is a useful tool for interacting with Kafka (e.g. publishing messages to topics).
On startup, our Kafka consumer will try to do three things: initialize a new blockchain if one has not yet been created; build an internal representation of the current state of the blockchain topic; then begin reading transactions in a loop:
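A minimal sketch of that startup sequence (the helper names are illustrative stand-ins for the linked source, and pykafka is assumed as the client):

```python
def main(partition_id):
    # Step 1: publish a genesis block if the chain doesn't exist yet.
    initialize_blockchain()

    # Step 2: replay the blockchain topic to rebuild our local copy of the
    # chain, remembering the offset of the newest block we've seen.
    chain, latest_offset = read_and_validate_chain()

    # Step 3: read transactions in a loop, mining a new block every few
    # transactions (both steps are fleshed out below).
    consume_transactions(partition_id, chain, latest_offset)
```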
The initialization step looks like this: first, we find the highest available offset on the blockchain topic. If nothing has ever been published to the topic, the blockchain is new, so we start by creating and publishing the genesis block:
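A sketch of this step, again assuming pykafka; new_block() and publish_block() are illustrative helpers (publish_block() is sketched at the end of this section):

```python
import json

from pykafka import KafkaClient

client = KafkaClient(hosts="127.0.0.1:9092")
blockchain_topic = client.topics[b"blockchain"]

def initialize_blockchain():
    # latest_available_offsets() reports, per partition, the next offset
    # that will be written; zero means nothing has ever been published.
    offsets = blockchain_topic.latest_available_offsets()
    if offsets[0].offset[0] == 0:
        # Brand new chain: create and publish the genesis block, using the
        # fixed proof and previous_hash from van Flymen's implementation.
        publish_block(new_block(proof=100, previous_hash='1'))
```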
In read_and_validate_chain(), we’ll first create a consumer to read from the blockchain topic:
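A sketch of that consumer creation, using the parameters explained below:

```python
from pykafka.common import OffsetType

def read_and_validate_chain(from_offset=None):
    # from_offset is used later, during mining, to fetch only blocks newer
    # than the ones we already hold.
    consumer = blockchain_topic.get_simple_consumer(
        consumer_group=b"blockchain",
        auto_offset_reset=OffsetType.EARLIEST,
        auto_commit_enable=True,
        reset_offset_on_start=True,
        consumer_timeout_ms=5000,
    )
```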
Some notes on the parameters we’re creating this consumer with:
- Setting the consumer group to the blockchain group allows the broker to keep a reference of the offset the consumers have reached, for a given partition and topic.
- auto_offset_reset=OffsetType.EARLIEST indicates that we’ll begin downloading messages from the start of the topic.
- auto_commit_enable=True periodically notifies the broker of the offset we’ve just consumed (as opposed to manually committing).
- reset_offset_on_start=True is a switch which activates the auto_offset_reset for the consumer.
- consumer_timeout_ms=5000 will trigger the consumer to return from the method after five seconds if no new messages are being read (we’ve reached the end of the chain).
Then we begin reading block messages from the blockchain topic:
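Continuing the sketch inside read_and_validate_chain(); valid_block() stands in for the hash and proof-of-work checks:

```python
    chain, last_offset = [], None
    for message in consumer:
        block = json.loads(message.value)
        if not chain:
            chain.append(block)        # genesis block: nothing to validate against
        elif valid_block(chain[-1], block):
            chain.append(block)        # hash and proof line up with the previous block
        last_offset = message.offset   # remember how far into the topic we've read
    return chain, last_offset
```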
For each message we receive:
- If it’s the first block in the chain, skip validation and add it to our internal copy (this is the genesis block)
- Otherwise, check the block is valid with respect to the previous block, and append it to our copy
- Keep a note of the offset of the block we just consumed
At the end of this process, we’ll have downloaded the whole chain, discarding any invalid blocks, and we’ll have a reference to the offset of the latest block.
At this point, we’re ready to create a consumer on the transactions topic:
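A sketch of the transactions consumer; the partition number is passed on the command line when the node starts (see In Action below):

```python
import sys

transactions_topic = client.topics[b"transactions"]
partition_id = int(sys.argv[1])  # 0 or 1

transaction_consumer = transactions_topic.get_simple_consumer(
    consumer_group=b"transactions",
    partitions=[transactions_topic.partitions[partition_id]],
    auto_offset_reset=OffsetType.LATEST,  # only transactions from now on
    auto_commit_enable=True,
)
```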
Our example topic has been created with two partitions, to demonstrate how partitioning works in Kafka. The partitions are set up in the docker-compose.yml file, with this line:
KAFKA_CREATE_TOPICS=transactions:2:1,blockchain:1:1
transactions:2:1 specifies the number of partitions and the replication factor (i.e. how many brokers will maintain a copy of the data on this partition).
This time, our consumer will start from OffsetType.LATEST, so we only get transactions published from the current time onwards.
By pinning the consumer to a specific partition of the transactions topic, we can increase the total throughput of all consumers on the topic. The Kafka broker will evenly distribute incoming messages across the two partitions of the transactions topic, unless we specify a partition when we publish to the topic. This means each consumer will be responsible for processing 50% of the messages, doubling the potential throughput of a single consumer.
Now we can begin consuming transactions:
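A sketch of the transaction loop; mine() is described next:

```python
transactions = []
for message in transaction_consumer:
    transactions.append(json.loads(message.value))
    if len(transactions) == 3:
        # Enough transactions buffered: run consensus and proof of work,
        # then publish the new block to the blockchain topic.
        mine(transactions)
        transactions = []
```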
As transactions are received, we’ll add them to an internal list. Every three transactions, we’ll create a new block and call mine():
- First, we’ll check if our blockchain is the longest one in the network; is our saved offset the latest, or have other nodes already published later blocks to the blockchain? This is our consensus step.
- If new blocks have already been appended, we’ll make use of the read_and_validate_chain from before, this time supplying our latest known offset to retrieve only the newer blocks.
- At this point, we can attempt to calculate the proof of work, basing it on the proof from the latest block.
- To reward ourselves for solving the proof of work, we can insert a transaction into the block, paying ourselves a small block reward.
- Finally, we’ll publish our block onto the blockchain topic. The publish method looks like this:
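A sketch of that method, assuming pykafka’s synchronous producer:

```python
def publish_block(block):
    # The sync producer waits for the broker to acknowledge each message,
    # trading some throughput for simplicity.
    with blockchain_topic.get_sync_producer() as producer:
        producer.produce(json.dumps(block).encode('utf-8'))
```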
In Action
1. First start the broker:
docker-compose up -d
2. Run a consumer on partition 0:
python kafka_blockchain.py 0
3. Publish 3 transactions directly to partition 0:
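The publication script isn’t reproduced here; a rough kafkacat equivalent (the transaction fields follow van Flymen’s schema, and the values are illustrative) would be to run a command like this three times:

echo '{"sender": "alice", "recipient": "bob", "amount": 5}' | kafkacat -P -b kafka:9092 -t transactions -p 0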
4. Check the transactions were added to a block on the blockchain topic:
kafkacat -C -b kafka:9092 -t blockchain
You should see output like this:
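The exact fields depend on the block structure; with van Flymen’s layout, each message would be a JSON block resembling this (values illustrative):

{"index": 2, "timestamp": 1529068742.35, "transactions": [{"sender": "alice", "recipient": "bob", "amount": 5}], "proof": 35293, "previous_hash": "24a8..."}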
To balance transactions across the two consumers, start a second consumer on partition 1, and remove -p 0 from the publication script above.
Conclusion
Kafka can provide the foundation for a simple framework for blockchain experimentation. We can take advantage of features built into the platform, and associated tools like kafkacat, to experiment with distributed peer to peer transactions.
While scaling transactions in a public setting presents one set of issues, within a private network or consortium, where real-world trust is already established, transaction scaling might be achieved via an implementation which takes advantage of Kafka concepts.
Originally published at https://www.freecodecamp.org/news/a-blockchain-experiment-with-apache-kafka-97ee0ab6aefc/