flink 檢查點
Apache Flink is a popular real-time data processing framework. It’s gaining more and more popularity thanks to its low-latency processing at extremely high throughput in a fault-tolerant manner.
Apache Flink是一種流行的實時數據處理框架。 它以容錯的方式以極高的吞吐量進行低延遲處理,因此越來越受歡迎。
While there is a good documentation provided by Flink it took me some time to get to understand the various mechanics that come together to make Flink Check pointing and Recovery work end to end. In this article I will explain the key steps one need to perform at various operator levels to create a fault tolerant Flink Job. Flink basic operators are Source, Process and Sink. Process operators could be of various flavors.
盡管Flink提供了很好的文檔,但是我花了一些時間來理解使Flink Check Pointing和Recovery工作端到端結合在一起的各種機制。 在本文中,我將解釋在各種操作員級別上創建容錯Flink Job所需執行的關鍵步驟。 Flink的基本運算符是Source,Process和Sink。 過程操作員可能具有多種口味。
So let’s get started on what you need to do to enable check pointing and making all operators Checkpoint aware.
因此,讓我們開始您需要做的事情,以啟用檢查點并使所有操作員都知道Checkpoint。
Flink環境配置(檢查指向) (Flink Environment Configuration (Check pointing))
源運營商檢查點 (Source Operator Checkpointing)
Source operator is the one which fetches data from the source. I wrote a simple SQL continuous query based source operator and kept track of the timestamp till the data has been queried. This information is what will be stored as part of check pointing process by flink. State of the source is saved by flink at the Job Operator level. CheckPointedFunction interface or ListCheckpointed interface should be implemented by the Source function as follows:
源運算符是從源獲取數據的運算符。 我編寫了一個簡單的基于SQL連續查詢的源運算符,并跟蹤時間戳,直到查詢完數據為止。 該信息將作為flink在檢查點過程中存儲的信息。 源的狀態通過flink在作業操作員級別保存。 CheckPointedFunction接口或ListCheckpointed接口應該由Source函數實現,如下所示:
snapshotState method will be called by the Flink Job Operator every 30 seconds as configured. Method should return the value to be saved in state backend
Flink作業操作員將按配置每30秒調用一次snapshotState方法。 方法應返回要保存在狀態后端的值
restoreState method is called when the operator is restarting and this method is the handler method to set the last stored timestamp (state) during a checkpoint
當操作員重新啟動時將調用restoreState方法,并且該方法是在檢查點期間設置最后存儲的時間戳(狀態)的處理程序方法
過程功能檢查點 (Process Function Checkpointing)
Flink supports saving state per key via KeyedProcessFunction. ProcessWindowFunction can also save the state of windows on per key basis in case of Event Time processing
Flink支持通過KeyedProcessFunction保存每個鍵的狀態。 在事件時間處理的情況下, ProcessWindowFunction還可以按鍵保存窗口的狀態
For KeyedProcessFunction, ValueState need to be stored per key as follows:
對于KeyedProcessFunction ,需要按以下方式存儲每個鍵的ValueState :
ValueState is just one of the examples. There are other ways to save the state as well. ProcessWindowFunction automatically saves the window state and no variable need to be set.
ValueState只是示例之一。 還有其他保存狀態的方法。 ProcessWindowFunction自動保存窗口狀態,無需設置任何變量。
接收器功能檢查點 (Sink Function Checkpointing)
Sink function check pointing works similar to Source Function check pointing and state is saved at the Job Operator level. I have implemented Sink function for Postgres DB. There could be multiple approaches to make sink function fault tolerant and robust considering performance and efficiency. I have taken a simplistic approach and will improve upon it in future.
接收器功能檢查指向的工作方式類似于源功能檢查指向,并且狀態保存在作業操作員級別。 我已經為Postgres DB實現了Sink功能。 考慮到性能和效率,可以有多種方法使接收器功能具有容錯性和魯棒性。 我采用了一種簡單的方法,將來會對其進行改進。
By committing statement in snapshotState method I’m ensuring that all pending data is flushed and committed as part of checkpointing trigger.
通過在snapshotState方法中提交語句,我確保將所有未決數據刷新并作為檢查點觸發器的一部分提交。
可以了,好了 (All Set)
Finally, you need to run your job and you can try to cancel it in between of processing and try to rerun it by providing the checkpoint location as follows. You will need to pass the latest checkpoint yourself, pay attention to -s parameter.
最后,您需要運行您的作業,您可以嘗試在處理之間取消它,并通過提供以下檢查點位置來嘗試重新運行它。 您將需要自己通過最新的檢查點,請注意-s參數。
.\flink.bat run -m localhost:8081 -s D:\flink-checkpoints\1d96f28886b693452ab1c88ab72a35c8\chk-10 -c <Job class Name> <Path to Jar file>
結論 (Conclusion)
This is a basic approach toward checkpointing and failure recovey and might need more improvements depending upon each use case. Feel free to provide me your feedback. Happy Reading!!
這是進行檢查點和故障重新報告的基本方法,并且可能需要根據每個用例進行更多的改進。 隨時向我提供您的反饋。 閱讀愉快!
Repository Link to codebase:
倉庫鏈接到代碼庫:
翻譯自: https://towardsdatascience.com/flink-checkpointing-and-recovery-7e59e76c2d45
flink 檢查點
本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。 如若轉載,請注明出處:http://www.pswp.cn/news/388036.shtml 繁體地址,請注明出處:http://hk.pswp.cn/news/388036.shtml 英文地址,請注明出處:http://en.pswp.cn/news/388036.shtml
如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!