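Distributed database system
Designed for massive data sets: scalable, high throughput, low latency
Does not support a full relational data model
Data is indexed by row and column names, which can be arbitrary strings
Stored values are also uninterpreted strings
Bigtable is a sorted map: each value is an array of bytes, looked up by row key, column key, and timestamp.
(row:string, column:string, time:int64) --> string
Reads and writes under a single row key are atomic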
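The (row, column, timestamp) --> value mapping can be sketched as a small in-memory structure. This is only an illustration of the data model, not how Bigtable stores data; the class name TinyTable and its methods are hypothetical, and the sample row and column names are placeholders.

```python
from collections import defaultdict
from typing import Optional


class TinyTable:
    """Toy in-memory model of the (row, column, timestamp) -> bytes map."""

    def __init__(self) -> None:
        # row key -> {(column key, timestamp): value}
        self._rows = defaultdict(dict)

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._rows[row][(column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the value at `timestamp`, or the newest version if None."""
        cells = self._rows.get(row, {})
        versions = {ts: v for (col, ts), v in cells.items() if col == column}
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)  # newest version wins
        return versions.get(timestamp)


table = TinyTable()
table.put("com.example.www", "contents:", 3, b"old page")
table.put("com.example.www", "contents:", 5, b"new page")
latest = table.get("com.example.www", "contents:")
```

Omitting the timestamp returns the newest version, which mirrors the common case of reading the most recent value of a cell.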
The row range for a table is dynamically partitioned.
Each row range is called a tablet, which is the unit of distribution and load balancing.
A tablet contains multiple rows
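Because rows are kept in sorted order and each tablet covers a contiguous row range, finding the tablet for a row reduces to a binary search over the tablets' end rows. A minimal sketch, assuming three hypothetical tablets and using "\uffff" as a stand-in for "end of table":

```python
import bisect

# Hypothetical tablets, each identified by its inclusive end row.
TABLET_END_ROWS = ["g", "p", "\uffff"]
TABLET_NAMES = ["tablet-0", "tablet-1", "tablet-2"]


def tablet_for_row(row: str) -> str:
    """Return the tablet whose row range contains `row`."""
    # First tablet whose end row is >= the requested row.
    idx = bisect.bisect_left(TABLET_END_ROWS, row)
    # Clamp so rows past the sentinel still land in the last tablet.
    return TABLET_NAMES[min(idx, len(TABLET_NAMES) - 1)]
```

With these boundaries, rows up to and including "g" fall in tablet-0, rows after "g" up to "p" fall in tablet-1, and everything else falls in tablet-2.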
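The column family is the basic unit of access control
Data in the same column family is usually of the same type, and data in the same family is compressed together
Column families are part of the schema; their number is small (in the hundreds at most) and they rarely change
The number of columns is not limited (a table may have an unbounded number of columns)
A column key is named family:qualifier
Timestamps
Can be assigned by Bigtable or by the client
Can keep only the newest n versions of a cell, or only the versions written in the last few days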
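The two retention policies (keep the newest n versions, or keep only recent versions) can be sketched as two small filters over a cell's version list. The function names and the (timestamp, value) tuple layout are assumptions made for this illustration.

```python
def keep_last_n(versions, n):
    """Keep the n newest (timestamp, value) pairs, newest first."""
    return sorted(versions, key=lambda tv: tv[0], reverse=True)[:n]


def keep_newer_than(versions, now, max_age):
    """Keep pairs whose age (now - timestamp) is at most max_age, newest first."""
    recent = [tv for tv in versions if now - tv[0] <= max_age]
    return sorted(recent, key=lambda tv: tv[0], reverse=True)


cell_versions = [(1, b"a"), (5, b"b"), (3, b"c")]
newest_two = keep_last_n(cell_versions, 2)
recent_only = keep_newer_than(cell_versions, now=6, max_age=1)
```

Versions that neither policy keeps are the ones eligible for garbage collection.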
Uses GFS to store log and data files
Uses the SSTable file format to store Bigtable data
Internally, each SSTable contains a sequence of blocks (typically each block is 64KB in size, but this is configurable).
A block index (stored at the end of the SSTable) is used to locate blocks; the index is loaded into memory when the SSTable is opened.
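The block index lookup can be sketched as a binary search over an in-memory list. The index layout used here (the last key of each block paired with the block's offset) is an assumption for illustration; the notes above only say that the index locates blocks and is held in memory.

```python
import bisect


class BlockIndex:
    """In-memory index mapping a key to the offset of the block that may hold it."""

    def __init__(self, entries):
        # entries: list of (last_key_in_block, block_offset), sorted by key.
        self._last_keys = [key for key, _ in entries]
        self._offsets = [offset for _, offset in entries]

    def locate(self, key):
        """Return the offset of the only block that can contain `key`, or None."""
        i = bisect.bisect_left(self._last_keys, key)
        if i == len(self._last_keys):
            return None  # key sorts after the last block
        return self._offsets[i]


# Three hypothetical 64KB blocks.
index = BlockIndex([("d", 0), ("m", 65536), ("z", 131072)])
```

A lookup then needs at most one disk read: the search runs in memory and only the block it selects is fetched.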
Chubby provides a namespace that consists of directories and small files.
Each directory or file can be used as a lock, and reads and writes to a file are atomic.
Chubby's responsibilities:
1. to ensure that there is at most one active master at any time
2. to store the bootstrap location of Bigtable data
3. to discover tablet servers and finalize tablet server deaths
4. to store Bigtable schema information (the column family information for each table)
5. to store access control lists
If Chubby becomes unavailable for an extended period of time, Bigtable becomes unavailable.
The implementation has three major components:
a library that is linked into every client
one master server
many tablet servers (can be dynamically added or removed)
Master server's responsibilities:
1. assigning tablets to tablet servers
2. detecting the addition and expiration of tablet servers
3. balancing tablet-server load
4. garbage collection of files in GFS
5. handling schema changes such as table and column family creations
Tablet server's responsibilities:
1. manages a set of tablets (typically somewhere between ten and a thousand tablets per tablet server)
2. handles read and write requests to the tablets that it has loaded
3. splits tablets that have grown too large
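Each table consists of a set of tablets, and each tablet covers multiple rows
Tablet locations are indexed with a three-level hierarchy similar to a B+-tree
The first level is a file stored in Chubby that records the location of the root tablet
The second level is the root tablet, which holds the locations of all tablets of the METADATA table. To keep the hierarchy at three levels, there is only one root tablet and it is never split.
The third level is the other METADATA tablets, which hold the locations of user tablets. The key is the tablet's table identifier plus its end row, and the value is the location of that tablet.
The root tablet is simply the first tablet of the METADATA table.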
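The three-level lookup can be sketched as two chained range searches, starting from the root location that would be read from Chubby. Everything named here (STORE, CHUBBY_ROOT_LOCATION, the tablet and server names) is hypothetical and stands in for real tablets and RPCs.

```python
import bisect

# Level 1: the Chubby file records where the root tablet lives.
CHUBBY_ROOT_LOCATION = "root"

# Each tablet is a sorted list of ((table_id, end_row), location) entries.
STORE = {
    # Level 2: root tablet, pointing at METADATA tablets.
    "root": [(("users", "m"), "meta-0"), (("users", "\uffff"), "meta-1")],
    # Level 3: METADATA tablets, pointing at user tablets.
    "meta-0": [(("users", "f"), "ts1:users-a"), (("users", "m"), "ts2:users-b")],
    "meta-1": [(("users", "\uffff"), "ts3:users-c")],
}


def find(entries, key):
    """Return the location of the first entry whose key is >= `key`."""
    keys = [k for k, _ in entries]
    return entries[bisect.bisect_left(keys, key)][1]


def locate(table_id, row):
    """Resolve (table, row) to a user tablet location in two hops."""
    key = (table_id, row)
    metadata_tablet = find(STORE[CHUBBY_ROOT_LOCATION], key)
    return find(STORE[metadata_tablet], key)
```

Because the root tablet is never split, the depth of this lookup stays fixed no matter how many user tablets exist.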
Locality groups:
Clients can group column families that are accessed together into a locality group; each locality group is stored in its own SSTable
Compression:
The SSTables of a locality group can be stored compressed to save space
Caching for read performance:
Higher-level cache (Scan Cache): caches the key-value pairs returned by the SSTable interface
Lower-level cache (Block Cache): caches SSTable blocks that were read from GFS
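How the two cache levels cooperate can be sketched with two dictionaries in front of a simulated disk. The class and attribute names are hypothetical, the caller passes the block id directly instead of consulting a block index, and eviction is left out to keep the sketch short.

```python
class CachedReader:
    """Toy reader with a key-value cache layered over a block cache."""

    def __init__(self, blocks):
        self._disk = blocks      # block_id -> {key: value}, stand-in for blocks in GFS
        self.scan_cache = {}     # higher level: key -> value
        self.block_cache = {}    # lower level: block_id -> block contents
        self.disk_reads = 0

    def _load_block(self, block_id):
        if block_id not in self.block_cache:
            self.disk_reads += 1  # only a block-cache miss touches the disk
            self.block_cache[block_id] = self._disk[block_id]
        return self.block_cache[block_id]

    def get(self, block_id, key):
        if key in self.scan_cache:  # repeated read of the same key
            return self.scan_cache[key]
        value = self._load_block(block_id).get(key)
        self.scan_cache[key] = value
        return value


reader = CachedReader({0: {"a": b"1", "b": b"2"}})
first = reader.get(0, "a")   # disk read, fills both caches
second = reader.get(0, "b")  # served from the cached block
```

The higher level helps when the same keys are read repeatedly, while the lower level helps when reads touch keys that sit close together in the same block.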
Bloom filters:
A read operation has to read from all SSTables that make up the state of a tablet.
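A Bloom filter lets a read skip SSTables that cannot contain the requested data, which reduces the number of disk accesses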
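A minimal Bloom filter sketch shows why the filter saves disk reads: a negative answer is always correct, so the SSTable can be skipped, while a positive answer only means the SSTable must be checked. The bit-array size, the number of hashes, and the use of SHA-256 are assumptions chosen for simplicity, not details taken from Bigtable.

```python
import hashlib


class BloomFilter:
    """Set-membership filter with no false negatives and rare false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key: str):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(key))


sstable_filter = BloomFilter()
sstable_filter.add("row1/anchor:example")
```

A read would call might_contain for each SSTable of the tablet and open only those that answer True.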