HybridTime - Accessible Global Consistency with High Clock Uncertainty

Amazon’s Dynamo [9] and Facebook’s Cassandra [13], relax the consistency model,and offer only eventual consistency.

Others such as HBase [1] and BigTable [4] offer strong consistency only for operations touching a single partition, but not across the database as a whole.

Dynamo和Cassandra都是基于Vector Clocks和Gossip算法,強調高可用性,可以達到eventual consistency

而HBase,可以提供單分區的強一致性,但對于跨分區,或跨數據庫,就無法保證;

for example, in a geo-replicated service, a user may first be routed to a local datacenter and perform some action such as sending an email message.?
If they then reload their browser and are routed to another datacenter backend, they would still expect to see their sent message in their email outbox.

這個例子比較典型,要達到異地的一致性

?

一般要解決分布式問題的一致性問題的最簡單的辦法就是用synchronized clocks across all servers,但這基本是達不到的

the traditional approach to global consistency has been to use logical clocks.?
Logical clocks, such as Lamport Clocks [15] or Vector Clocks [10, 20].

所以傳統的做法就是用邏輯時鐘

https://www.zhihu.com/question/19994133

貼個回答,寫的不錯

當然可以,但是不適合在分布式數據庫里面。為什么呢? 首先,需要先確定你這個時間戳取自哪里。1、所有的時間戳都取自一個中心點節點,這樣讓時間戳得到一個全序關系,這是最簡單的實現,如果你的分布式數據庫只是部署在一個數據中心的話,但是這樣就增加了一個中心節點,影響數據庫的擴展性,而且任何一個取時間戳的動作都要增加上網絡延遲的開銷。如果分布式數據庫部署需要跨數據中心,這種方案變的不可行,跨數據中心網絡時延太大。從而影響整體性能2、所有的時間戳取自已所在的節點。這就涉及到時鐘同步的問題,就像Dongdong說的,如果取的物理時間那么這些節點并沒有一個一致的時間,總會有一點誤差,這種解決方案就有兩種思路:?
一、就是現在以Leslie Lamport 提出的邏輯時鐘的方法,重新定義了一種分布式的全序關系。vector clock就是以邏輯時鐘為基礎上發展而來的。?
二、使用物理時鐘,就是google spanner使用gps 加原子鐘進行時間同步,不同節點之間的時間是不能精確同步的,只能把不同節點之間的時間同步到一個范圍之內,spanner一個節點上獲得一個時間戳,不是一個時間值,而是一個時間值加上一個誤差范圍,也就是一個時間窗,如果兩個時間窗有重合,就無法比較大小,從而無法比較出來時間窗代表的事件的先后關系。而時間同步算法就是讓分布式各節點之間的時間的誤差盡量的小,這樣取時間戳這個動作效率是最高的,只在本節點取一下物理時間即可,但是為了同步時間,各個節點肯定在周期與其它節點進行通訊來同步物理時間(google時間同步的算法好像沒有開源)。時間同步算法只能讓時間盡量的精確。分布式數據庫里面是不能直接使用NTP來同步時間的,因為NTP時間同步協議里面允許節點的時間能夠回退,而數據庫里面要求本地的時鐘必須是遞增的。?
三、還有一種混合邏輯時鐘(Logical Physical Clocks),也是由邏輯時鐘演變而來,但是時鐘的值比較接近于物理時鐘,而且不依賴于下面物理時鐘的同步的算法,它的時間戳都可以在本地取,而且取出來是一個時間值,而不像spanner一樣,取出來是一個時間窗。詳細可以參見Logical Physical Clocks?and Consistent Snapshots in Globally Distributed Databases這篇論文

關于Lamport或向量時鐘,

Lamport時鐘就是邏輯時鐘,即不依賴于物理時鐘,而依賴event之間的因果關系來定義的一種偏序

參考,全序, 分布式一致性的本質

而Vector時鐘是,邏輯時鐘的一種實現,具體的邏輯看下文,

Why Vector Clock are Easy or Hard?

摘自上面的知乎回答

“dynamo paper是七、八年前寫的了,現在的amazon dynamo早已摒棄了version vector,而采用了synchronous replication(類似paxos的protocol),每一個partition會有一個leader負責寫入。其實version vector并不scale,因為對于一個key來說,隨著writer數量的增加,version vector數量成指數級的增長”?

邏輯時鐘的缺點,

first, they require that all clients must propagate clock data to achieve consistent views,?
and second, the assigned timestamps have no relation to physical time.

?

Spanner introduced?commit-wait, a way of ensuring physical-time based consistent global state by forcing operations to wait long enough so that all participants agree that the operation’s timestamp has passed based on worst case synchronization error bounds.

While innovative, the system performance becomes highly dependent on the quality of the time synchronization infrastructure, and thus may have unacceptable performance absent specialized hardware such as atomic clocks and GPS receivers.

Spanner利用原子鐘和GPS接收器,實現了一個較為精確的時鐘,這個時鐘叫做TrueTime,每次調用TrueTime API返回的是一個時間區間,而不是一個具體的值,這個TrueTime保證的是真實時間(absolute time/real time)一定在這個區間內,這個區間范圍通常大約14ms,甚至更小。

所以只有每個transaction的時間區間,都不重合,那么就可以比較容易的做到全局有序;

所以這里需要commit-wait,

image

可以看到,我在開始transaction的時候,取一次true time,[t1.earliest, t1.latest]

那么什么時候,我可以把這個transaction commit掉,即釋放掉鎖,即別人可以開始新的transaction

當我再取一次true time,[t2.earliest, t2.latest],當t2.earliest > t1.latest的時候,即可

因為這樣兩個transaction的時間區間就不可能重疊了,當然代價就是每個transaction都有等待大概2個error bound的時間

當然,spanner的實現依賴硬件的infra設備,普通用戶無法復制;

?

HybridTime(HT), a hybrid between physical and logical clocks and we show how HT can be used to achieve the same semantics as Spanner, but with good performance even with commonly available time synchronization.

Like Spanner, our approach relies on physical time measurements with bounded error to assign HybridTime timestamps to events that occur in the system.?
However, unlike Spanner, our approach does not usually require the error to be waited out, thus allowing for usage in common deployment scenarios where clocks are synchronized through common protocols such as the Network Time Protocol, in which clock synchronization error is often higher than with Spanner’s TrueTime.

The trade-off is that, in order to avoid commit-wait, HybridTime requires that timestamps be propagated across machines to achieve the same consistency semantics as Spanner.?
Contrary to vector clocks, which can expand as the number of participants in the cluster grows, HybridTime timestamps have constant and small size.

HybridTime說是physical and logical clocks的結合,和spanner一樣,也可以使用帶error bound的物理時間?
但由于他沒有spanner的硬件設施,所以做不到低延遲的時間同步,所以他依然使用邏輯時鐘來達到一致性;并且他解決了vector clocks隨著client的數量增加而會size膨脹的問題

HybridTime clocks follow similar update rules to Lamport clocks, but the time values are not purely logical:?
each time value has both a logical component, which helps in guaranteeing the same properties as a Lamport Clocks, and a physical component which allows the event to be associated with a physical point-in-time.

?

One of Spanner’s key benefits is that is?externally consistent, which is defined as fully linearizable, even in the presence of hidden channels.

Additionally we use the term?globally consistent?to describe a system which provides the same linearizability semantics, provided that there are no hidden channels present.

區分為,是否有hidden channels?意思是clients間存在因果或先后關系,比如通過發message,都是這個因果關系對于系統是不可知的,比如A write,然后發送消息給B,然后B write,那么因果關系上,B write一定后于A write,但是對于數據庫而言他并不知道,所以并不能保證A write先寫入;

spanner是可以保證externally consistent, 由于他通過 commit wait,來保證每個transaction之間一定是全局有序的

相對的globally consistent,更簡單些,因為并沒有hidden channels

?

Spanner’s key innovation is that timestamps assigned by the system can be used to achieve?external consistency, but also have?physical meaning.

HybridTime is always?globally consistent, and through selective application of commit-wait is externally consistent.

Spanner的主要創新在于實現external一致性的,同時還保留了物理時間的

而HybridTime,默認情況下是可以實現globally consistent,即偏序,因為他本身就是使用lamport時鐘,并且當你選擇commit-wait時,也可以保證externally consistent;

?

HybridTime Assumptions

1. HybridTime assumes that machines have a?reasonably accurate physical clock, represented by the?PCi(e)?function, which outputs the numeric timestamp returned by the physical clock as read by?process i for event e, that is able to provide absolute time measurements (usually in milli- or microseconds since 1 January 1970).

2. keeps the physical clocks across different servers synchronized with regard to a reference server,?the “reference” time, represented by the?PCref(e)?function which outputs the numeric timestamp returned by the “reference” process for event e;

Additionally, we assume that such a substrate is able to provide an?error bound?along with each time measurement, denoted by the?Ei(e)?function, which outputs the numeric value ε error of process i at the time e occurred

3. we make no assumptions on the actual accuracy of the clocks, i.e. the physical timestamps returned by server’s clocks may have an arbitrarily large but finite error, as long as this error’s bound is known

說白了,假設

有個相對靠譜的物理時鐘,一個理想的參照時鐘,以及他們之間相差的,error bound

最后,我們并不假設這個error bound會很小,只要有限就可以

假設1,我們有有限的error bound

image

?

假設2, 進程級別的物理時鐘單調遞增,注意是進程級別

image

?

基于以上假設,提出HybridTime Clock and Protocol

?

HybridTime Clock and Protocol

HybridTime clock(HTC) is a pair?(physical,logical)?where the first component is a representation of the physical time at which the event occurred and the second component is a logical sequence number.

定義其實顯而易見。。。

Algorithm 2 depicts the HTC algorithm.

image

HTC算法如上,兩部分,NOW和UPDATE

now就是取當前的HybridTime clock

upate就是根據in 來更新當前的clock

Algorithm 2 implements a Lamport Clock, with the additional advantage that generated timestamps have physical meaning and are accurate representations of physical time within a bound error.

算法本身,其實就是Lamport Clock,只是增加了physical clock的部分

給出一個例子,

image

?

To order the events timestamped using the HybridTime Clock algorithm we use Definition 1.

Definition 1. HCT(e) < HCT( f ) is defined as the lexicographical ordering of the timestamp two-tuple (physical,logical)

Theorem 1. The HybriTime clock happened-before relation forms a total order of events

Theorem 2. For any event in a causal chain f , the physical component of a HTC timestamp approximates the “real” time the event occurred, with a error defined and bounded by

image

?

Implementation

No Consistency - In this mode there are no external consistency guarantees, transactions are assigned timestamps from each server’s physical clock and no guarantee is made that reads are consistent or repeatable.

直接用local的物理時間,不保證一致性

HybridTime Consistency - In this mode our implementation guarantees the global consistency as Spanner, absent hidden channels, but using HybridTime instead of commit-wait.?
Clients choosing this consistency mode on writes must make sure that the timestamp that is received from the server is propagated to other servers and/or clients.?
Within the same client process, timestamps are automatically propagated on behalf of the user.

這個其實和邏輯時鐘,沒有區別,就是保證偏序

Commit-wait Consistency - In this mode our implementation guarantees the same external consistency semantics as Spanner by also using commit-wait in the way described in the original paper.?
However instead of using TrueTime, which is a proprietary and private API, we implemented commit-wait on top of the widely used Network Time Protocol (NTP). Hence, in this consistency mode we support hidden channels.

這個估計沒啥用,在沒有spanner硬件保證的情況下,commit-wait,性能不能忍吧

本文來自互聯網用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務,不擁有所有權,不承擔相關法律責任。
如若轉載,請注明出處:http://www.pswp.cn/news/282408.shtml
繁體地址,請注明出處:http://hk.pswp.cn/news/282408.shtml
英文地址,請注明出處:http://en.pswp.cn/news/282408.shtml

如若內容造成侵權/違法違規/事實不符,請聯系多彩編程網進行投訴反饋email:809451989@qq.com,一經查實,立即刪除!

相關文章

公司目前實行的git團隊協作方案

1. git init 新建本地倉庫2. git clone 項目地址 獲取遠程master代碼3. 在本地master代碼上進行開發, 并將修改提交到待推送區4. 開發完, 在本地master分支基礎上創建ready分支5. 在本地ready分支上(本地測試分支), 拉取并合并遠程nanle分支最新代碼(遠程測試分支)6. 將本地re…

bzoj3122 [Sdoi2013]隨機數生成器(bsgs+擴歐+數列)

Description Input 輸入含有多組數據&#xff0c;第一行一個正整數T&#xff0c;表示這個測試點內的數據組數。 接下來T行&#xff0c;每行有五個整數p&#xff0c;a&#xff0c;b&#xff0c;X1&#xff0c;t&#xff0c;表示一組數據。保證X1和t都是合法的頁碼。 注意&…

邊寫 Javascript 代碼邊玩游戲 – WarriorJS

在 github 上看到這個有趣的項目 – WarriorJS &#xff0c;項目的內容寫著 – 令人興奮的程序設計和人工智慧游戲&#xff0c;Ok 我坦白我是看到人工智慧被這個專案所吸引&#xff0c;但是玩了兩個關卡&#xff0c;還是不知道這個游戲跟人工智慧有什么關系&#xff0c;不過這個…

挑選合適自己的一門編程語言

2019獨角獸企業重金招聘Python工程師標準>>> 導讀想學編程的原因有很多&#xff0c;你也許是想要做一個程序&#xff0c;又或者你只是想投身于這個行業&#xff0c;所以&#xff0c;在選擇你的第一門編程語言之前&#xff0c;問問你自己&#xff1a;你想要在哪里運行…

css 實現章節名稱不換行,多余部分用 ... 代替

修改之前:修改之后: 代碼: <p style "white-space: nowrap;text-overflow: ellipsis;overflow: hidden;"><? $d[name] ?></p> <i><? $d[pen_name] ?></i> <i><?phpforeach ($d[tags] as $t) {echo $t[tag_name];…

.NET 反向代理-YARP 部署Https(SSL)

相關文章&#xff1a;.NET 反向代理-YARP.NET 反向代理-YARP 根據域名轉發分享一個基于Abp 和Yarp 開發的API網關項目使用 Yarp 做網關YARP&#xff08;Yet Another Reverse Proxy&#xff09;是使用 .NET 構建的高度可定制的反向代理C# 開源一個基于 yarp 的 API 網關 Demo&am…

shell腳本--cut命令

bash&shell系列文章&#xff1a;http://www.cnblogs.com/f-ck-need-u/p/7048359.html 1.1 選項說明 cut命令將行按指定的分隔符分割成多列&#xff0c;它的弱點在于不好處理多個分隔符重復的情況&#xff0c;因此經常結合tr的壓縮功能。 -b&#xff1a;按字節篩選&#xff…

12C RAC for ASM添加磁盤步驟

RHEL 7.2使用EMC Powerpath擴容2T磁盤空間&#xff0c;需要添加至以用12C RAC for ASM系統中。下面是具體步驟&#xff0c;主機人員告知擴容別名為data_center_16、data_center_17 1&#xff1a;linux 7 系統下添加映射存儲LUN(無需重啟)1>查看機器HBA卡信息--兩個節點機器都…

Windows 下 Redis 的下載和安裝

一 安裝redis 1. 下載redis https://github.com/MicrosoftArchive/redis/releases 注: 如果上面網址下載不了, 就到這里下載 https://download.csdn.net/download/m_nanle_xiaobudiu/104370342. 解壓壓縮文件夾3. 運行redis服務端到此 , redis已經可以正常使用了,但是為了方便…

什么是行內塊元素?

2019獨角獸企業重金招聘Python工程師標準>>> 我們都知道行內元素和塊級元素&#xff0c;在實際開發中&#xff0c;經常會聽到行內塊元素&#xff0c;那么什么是行內塊元素呢&#xff1f; 行內塊元素實際就是把塊元素以行的形式展現,保留了塊元素可以設置的對應CSS屬…

WPF-08 控件模板

模板是描述控件外觀&#xff0c;WPF中每個控件都有一個默認的模板&#xff0c;你可以通過定義模板來重寫控件可視化外觀和行為&#xff0c;WPF中有兩種常用的模板Control Template和Data TemplateControl Template控件模板定義了控件的可視化外觀&#xff0c;所有的UI控件都有自…

玄學搜索\隨稽化

正解又不會寫&#xff0c;又懶得去想 只好每次考試大大暴力&#xff0c;維持一下生活了 ----------------------- P1337 [JSOI2004]平衡點 / 吊打XXX 題目描述 有n個重物&#xff0c;每個重物系在一條足夠長的繩子上。每條繩子自上而下穿過桌面上的洞&#xff0c;然后系在一起。…

第0次作業

問題1:你為什么選擇計算機專業&#xff1f;你認為你的條件如何&#xff1f; 答:我平時比較喜歡研究一些自己認為神秘的東西&#xff0c;我認為計算機就是這樣的神秘東西&#xff01;所以我選擇這個專業&#xff01;我認為我自己可以學會計算機這個專業&#xff01;我對自己有信…

Nginx +Tomcat 實現動靜態分離(轉)

Nginx Tomcat 實現動靜態分離 動靜態分離就是Nginx處理客戶端的請求的靜態頁面(html頁面)或者圖片&#xff0c;Tomcat處理客戶端請求的動態頁面&#xff08;jsp頁面&#xff09;&#xff0c;因為Nginx處理的靜態頁面的效率高于Tomcat。 一&#xff0e;Nginx簡介&#xff1a; Ng…

Beanstalked的初步了解和使用(包括利用beanstalkd 秒殺消息隊列的實現)

一 Beanstalkd 是什么 Beanstalkd&#xff0c;一個高性能、輕量級的分布式內存隊列系統二 Beanstalkd 特性 1. 優先級&#xff08;priority&#xff09; 注&#xff1a;優先級就意味 支持任務插隊&#xff08;數字越小&#xff0c;優先級越高&#xff0c;0的優先級最高&#…

WPF效果第二百篇之再玩Gamma曲線

前面效果中使用比較low的方式實現了2.4的Gamma曲線;雖說后面加了點動畫呈現效果,但也就是個過渡版;今天才基本符合需求的效果:1、還是基于WPF效果第一百七十八篇之貝塞爾曲線他來實現的:3個ListBox 3個LandmarkControl2、在LandmarkControl增加插點位事件View:LandmarkControl …

2018企業面試總匯(答案請自行搜羅) 新增19年阿里面題(反向拓展技術棧)

Java 1.多個線程同時讀寫&#xff0c;讀線程的數量遠遠大于寫線程&#xff0c;你認為應該如何解決并發的問題&#xff1f;你會選擇加什么樣的鎖&#xff1f; 2.JAVA的AQS是否了了解&#xff0c;它是干嘛的&#xff1f; 3.除了synchronized關鍵字之外&#xff0c;你是怎么來保障…

skynet源碼閱讀5--協程調度模型

注&#xff1a;為方便理解&#xff0c;本文貼出的代碼部分經過了縮減或展開&#xff0c;與實際skynet代碼可能會有所出入。 作為一個skynet actor&#xff0c;在啟動腳本被加載的過程中&#xff0c;總是要調用skynet.start和skynet.dispatch的&#xff0c;前者在skynet-os中…

ASP.NET Core GRPC 和 Dubbo 互通

一.前言Dubbo 是比較流行的服務治理框架&#xff0c;國內不少大廠都在使用。以前的 Dubbo 使用的是私有協議&#xff0c;采集用的 hessian 序列化&#xff0c;對于多語言生態來說是極度的不友好。現在 Dubbo 發布了新版本 v3&#xff0c;推出了基于 gRPC 的新協議 Triple&#…

詳解C# 迭代器

[引用&#xff1a;https://www.cnblogs.com/yangecnu/archive/2012/03/17/2402432.html] 迭代器模式是設計模式中行為模式(behavioral pattern)的一個例子&#xff0c;他是一種簡化對象間通訊的模式&#xff0c;也是一種非常容易理解和使用的模式。簡單來說&#xff0c;迭代器模…