Big Data Series | A Detailed Guide to Deploying ClickHouse Clusters on ZooKeeper or ClickHouse Keeper
- 1. Differences between ClickHouse and MySQL
- 2. Install the ClickHouse server on every machine in the cluster
- 2.1. Online installation of clickhouse
- 2.2. Offline installation of clickhouse
- 3. Installing a ClickHouse Keeper/ZooKeeper cluster
- 4. Set the cluster configuration in the configuration file
- 5. Create a local table on each instance
- 6. Create a distributed table
- 7. Running multiple ClickHouse clusters in the same environment
1. Differences between ClickHouse and MySQL
● Data storage:
MySQL is a relational database management system (RDBMS); it organizes data in tables and supports the SQL query language.
ClickHouse is a columnar DBMS; it stores data column by column, which makes it very efficient when processing large volumes of data.
● Use cases:
MySQL is typically used for transactional (OLTP) workloads, such as the backend database of a web application, and supports highly concurrent reads and writes.
ClickHouse is better suited to analytical (OLAP) workloads: large-scale data analysis, reporting, and real-time analytics.
● Performance:
ClickHouse is usually faster than MySQL on large datasets, especially for analytical and aggregation queries.
MySQL may be faster at transaction processing, because that is what it was designed for.
● Replication and high availability:
MySQL has mature replication and high-availability solutions, such as MySQL Replication and Galera Cluster, which ensure data reliability and availability.
ClickHouse also offers replication and high-availability solutions, but they are comparatively less mature than MySQL's.
The concept of a transaction:
A transaction is a group of operations executed by a database management system that is treated as one logical unit of work: either all of it succeeds or none of it takes effect. Transactions have four properties, commonly known as the ACID properties:
Atomicity: a transaction is an atomic operation; it either completes entirely or not at all. If any part of the transaction fails, the whole transaction is rolled back, undoing the earlier operations and keeping the database in a consistent state.
Consistency: executing a transaction must move the database from one consistent state to another. In other words, after the transaction runs, the database must still satisfy all predefined constraints and integrity rules.
Isolation: concurrently executing transactions should be isolated from one another; the execution of one transaction should not be affected by others. Each transaction should behave as if it ran in its own independent environment, even though they may run at the same time.
Durability: once a transaction commits, its results are stored permanently; even if the system crashes or restarts, the database can be recovered to the committed state.
Here is an everyday example of transactions in MySQL:
Suppose you run a small online bookstore whose website lets people buy books. You need every purchase to be reliable and error-free. Let's walk through a typical purchase flow and how a MySQL transaction keeps it safe:
Start the transaction: when someone clicks the buy button, you begin a transaction. This marks the purchase and tells the system: "I'm about to perform a purchase, pay attention!"
Deduct stock: during the purchase you check that stock is sufficient and decrease the stock count accordingly. This step happens inside the transaction; if stock is insufficient or anything else goes wrong, you can roll back the transaction, cancelling the purchase.
Create the order: if stock is sufficient, you create an order recording the buyer's information, the books bought, the payment details, and so on. This also happens inside the transaction, making order creation atomic: it either fully succeeds or fully fails.
Charge the buyer: finally, you debit the buyer's bank account to complete payment. Again this runs inside the transaction, guaranteeing that the payment is accurate and reliable.
Commit the transaction: if all the steps above complete successfully, you commit the transaction, telling the system: "the purchase is done; update the database and notify the user!"
Finish the purchase: once the commit succeeds, the purchase is complete. You can update stock counts, send a confirmation email to the buyer, and do any other necessary processing.
Because MySQL transactions are atomic, every step of the purchase either fully succeeds or is fully rolled back. Even if an error occurs mid-way, the database stays consistent (stock, orders, and payments all agree) and intact.
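The flow above maps directly onto SQL. A minimal sketch, assuming hypothetical books, orders, and accounts tables (names and columns are illustrative, not a real schema):

START TRANSACTION;

-- deduct stock, but only if copies remain
UPDATE books SET stock = stock - 1 WHERE book_id = 42 AND stock > 0;

-- record the order
INSERT INTO orders (book_id, buyer_id, price) VALUES (42, 7, 19.99);

-- charge the buyer
UPDATE accounts SET balance = balance - 19.99 WHERE buyer_id = 7;

COMMIT;   -- or ROLLBACK; if any step failed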
2. Install the ClickHouse server on every machine in the cluster
Official deployment docs: https://clickhouse.com/docs/zh/getting-started/tutorial
RPM package downloads: https://packages.clickhouse.com/rpm/stable/
hostnamectl set-hostname ck1
hostnamectl set-hostname ck2
hostnamectl set-hostname ck3
hostnamectl set-hostname ck4
cat /etc/hosts
127.0.0.1 localhost
127.0.0.1 vm10-1-248-33.cloud.local
10.1.248.33 ck1
10.1.248.247 ck2
10.1.248.68 ck3
10.1.248.16 ck4
Choose either online or offline installation; ClickHouse must be installed on every node.
Passwordless SSH login should be set up between the nodes (see the sketch below).
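A minimal sketch of the passwordless-login setup, run on each node (assuming root access and that the ck1..ck4 names from /etc/hosts above resolve):

ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa    # generate a key pair without a passphrase
for h in ck1 ck2 ck3 ck4; do
    ssh-copy-id root@$h                     # push the public key to every node
done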
2.1. Online installation of clickhouse
The official pre-compiled rpm packages are recommended for CentOS, RedHat, and all other rpm-based Linux distributions.
First, add the official repository:
yum install -y yum-utils
yum-config-manager --add-repo https://packages.clickhouse.com/rpm/clickhouse.repo
yum install -y clickhouse-server clickhouse-client
rpm -qa | grep clickhouse-server
clickhouse-server-24.6.2.17-1.x86_64
rpm -qa | grep clickhouse-client
clickhouse-client-24.6.2.17-1.x86_64
/etc/init.d/clickhouse-server start
chown -R clickhouse: '/var/run/clickhouse-server/'
Will run sudo --preserve-env -u 'clickhouse' /usr/bin/clickhouse-server --config-file /etc/clickhouse-server/config.xml --pid-file /var/run/clickhouse-server/clickhouse-server.pid --daemon
Waiting for server to start
Waiting for server to start
Server started
Set a login password:
Method 1 (random):
PASSWORD=$(base64 < /dev/urandom | head -c8); echo "$PASSWORD"; echo -n "$PASSWORD" | sha256sum | tr -d '-'
xS3UbICi
019a756a1aaf0d0df25f3507978bbb0c375ed97cfa6282832cbc95143e7b7e4e
Method 2 (specified):
echo -n 123456 | openssl dgst -sha256
<password_sha256_hex>8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92</password_sha256_hex>
Edit users.xml to make the password take effect:
vim /etc/clickhouse-server/users.xml
<!-- <password></password> -->    // comment out this line
<password_sha256_hex>019a756a1aaf0d0df25f3507978bbb0c375ed97cfa6282832cbc95143e7b7e4e</password_sha256_hex>
systemctl restart clickhouse-server
systemctl enable clickhouse-server
clickhouse-client               // run this directly if no password is set
clickhouse-client --password    // pass --password if a password is set
ClickHouse client version 24.6.2.17 (official build).
Password for user (default):    // enter the password
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 24.6.2.

Warnings:
 * Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled
 * Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.

clickhouse1 :) quit
Bye.
2.2. Offline installation of clickhouse
export LATEST_VERSION=24.6.2.17
curl -O https://packages.clickhouse.com/rpm/stable/clickhouse-client-$LATEST_VERSION.x86_64.rpm
curl -O https://packages.clickhouse.com/rpm/stable/clickhouse-common-static-$LATEST_VERSION.x86_64.rpm
curl -O https://packages.clickhouse.com/rpm/stable/clickhouse-common-static-dbg-$LATEST_VERSION.x86_64.rpm
curl -O https://packages.clickhouse.com/rpm/stable/clickhouse-server-$LATEST_VERSION.x86_64.rpm
clickhouse-common-static: ClickHouse's compiled binary files.
clickhouse-server: creates the clickhouse-server symlink and installs the default server configuration.
clickhouse-client: creates the clickhouse-client symlink and installs the client configuration files.
clickhouse-common-static-dbg: ClickHouse binary files with debugging information.
rpm -ivh clickhouse-common-static-24.6.2.17.x86_64.rpm
rpm -ivh clickhouse-common-static-dbg-24.6.2.17.x86_64.rpm
rpm -ivh clickhouse-client-24.6.2.17.x86_64.rpm
rpm -ivh clickhouse-server-24.6.2.17.x86_64.rpm
ClickHouse binary is already located at /usr/bin/clickhouse
Symlink /usr/bin/clickhouse-server already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-server to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-client already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-client to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-local already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-local to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-benchmark already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-benchmark to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-obfuscator already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-obfuscator to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-git-import to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-compressor already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-compressor to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-format already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-format to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-extract-from-config already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-extract-from-config to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-keeper already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper to /usr/bin/clickhouse.
Symlink /usr/bin/clickhouse-keeper-converter already exists but it points to /clickhouse. Will replace the old symlink to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-keeper-converter to /usr/bin/clickhouse.
Creating symlink /usr/bin/clickhouse-disks to /usr/bin/clickhouse.
Symlink /usr/bin/ch already exists. Will keep it.
Symlink /usr/bin/chl already exists. Will keep it.
Symlink /usr/bin/chc already exists. Will keep it.
Creating clickhouse group if it does not exist.
groupadd -r clickhouse
Creating clickhouse user if it does not exist.
useradd -r --shell /bin/false --home-dir /nonexistent -g clickhouse clickhouse
Will set ulimits for clickhouse user in /etc/security/limits.d/clickhouse.conf.
Creating config directory /etc/clickhouse-server/config.d that is used for tweaks of main server configuration.
Creating config directory /etc/clickhouse-server/users.d that is used for tweaks of users configuration.
Config file /etc/clickhouse-server/config.xml already exists, will keep it and extract path info from it.
/etc/clickhouse-server/config.xml has /var/lib/clickhouse/ as data path.
/etc/clickhouse-server/config.xml has /var/log/clickhouse-server/ as log path.
Users config file /etc/clickhouse-server/users.xml already exists, will keep it and extract users info from it.
Creating log directory /var/log/clickhouse-server/.
Creating data directory /var/lib/clickhouse/.
Creating pid directory /var/run/clickhouse-server.
chown -R clickhouse:clickhouse '/var/log/clickhouse-server/'
chown -R clickhouse:clickhouse '/var/run/clickhouse-server'
chown clickhouse:clickhouse '/var/lib/clickhouse/'
Enter password for the default user:    // enter the password (xS3UbICi)
Password for the default user is saved in file /etc/clickhouse-server/users.d/default-password.xml.
Setting capabilities for clickhouse binary. This is optional.
Cannot set 'net_admin' or 'ipc_lock' or 'sys_nice' or 'net_bind_service' capability for clickhouse binary. This is optional. Taskstats accounting will be disabled. To enable taskstats accounting you may add the required capability later manually.
chown -R clickhouse:clickhouse '/etc/clickhouse-server'
ClickHouse has been successfully installed.

Start clickhouse-server with:
 sudo clickhouse start

Start clickhouse-client with:
 clickhouse-client --password

Created symlink from /etc/systemd/system/multi-user.target.wants/clickhouse-server.service to /usr/lib/systemd/system/clickhouse-server.service.
View the saved password (stored as a SHA-256 digest):
cat /etc/clickhouse-server/users.d/default-password.xml
<clickhouse>
    <users>
        <default>
            <password remove='1' />
            <password_sha256_hex>019a756a1aaf0d0df25f3507978bbb0c375ed97cfa6282832cbc95143e7b7e4e</password_sha256_hex>
        </default>
    </users>
</clickhouse>
Start clickhouse-server:
systemctl start clickhouse-server
[root@clickhouse2 ~]# ps -ef | grep clickhouse
clickho+ 24511 1 0 13:50 ? 00:00:00 clickhouse-watchdog --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
clickho+ 24512 24511 27 13:50 ? 00:00:01 /usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid
clickhouse-client --password
ClickHouse client version 24.6.2.17 (official build).
Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 24.6.2.

Warnings:
 * Linux transparent hugepages are set to "always". Check /sys/kernel/mm/transparent_hugepage/enabled
 * Maximum number of threads is lower than 30000. There could be problems with handling a lot of simultaneous queries.

clickhouse2 :)
3. Installing a ClickHouse Keeper/ZooKeeper cluster
Why use ZooKeeper at all? You can manage without it, but it is recommended. ClickHouse offers a very capable ZooKeeper-based table replication mechanism, alongside cluster-based replication.
ClickHouse Keeper can run on the same hosts as the ClickHouse cluster or separately on other hosts.
ClickHouse Keeper is included in the ClickHouse packages, so even a standalone Keeper cluster requires installing clickhouse-server; then configure the config.xml file content as follows and start the clickhouse-server service.
<listen_host>0.0.0.0</listen_host>
<keeper_server>
    <tcp_port>2181</tcp_port>
    <server_id>1</server_id> <!-- each host must use a different server_id -->
    <raft_configuration>
        <server>
            <id>1</id>
            <hostname>keeper1</hostname>
            <port>9444</port>
        </server>
        <server>
            <id>2</id>
            <hostname>keeper2</hostname>
            <port>9444</port>
        </server>
        <server>
            <id>3</id>
            <hostname>keeper3</hostname>
            <port>9444</port>
        </server>
    </raft_configuration>
</keeper_server>
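Once all three Keeper nodes are up, they can be checked with the four-letter-word commands on the Keeper port (a quick sketch; as far as I know, ruok and mntr are in the default four_letter_word_white_list):

echo ruok | nc keeper1 2181    # a healthy node answers imok
echo mntr | nc keeper1 2181    # reports zk_server_state (leader/follower), connection and znode counts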
4. Set the cluster configuration in the configuration file
ClickHouse is configured mainly through config.xml and users.xml.
Edit the config.xml file:
cp /etc/clickhouse-server/config.xml /etc/clickhouse-server/config.xml.bak
vim /etc/clickhouse-server/config.xml
Configure the listen_host field:
<listen_host>0.0.0.0</listen_host>
Configure the remote_servers field:
Find the remote_servers tag in the configuration file. It ships with a lot of sample content that we do not use; it is only an example. Delete its contents and paste in our own layout.
Configure 2 shards, each containing 2 replicas:
<remote_servers>
    <cluster1> <!-- cluster name -->
        <shard>
            <replica>
                <host>ck1</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
            <replica>
                <host>ck3</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
            <internal_replication>true</internal_replication> <!-- set internal replication to true -->
        </shard>
        <shard>
            <replica>
                <host>ck2</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
            <replica>
                <host>ck4</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
            <internal_replication>true</internal_replication>
        </shard>
    </cluster1>
</remote_servers>
After all four nodes are configured, check whether ClickHouse shows the new cluster in the cluster system table:
ck1 :) SELECT cluster FROM system.clusters;

SELECT cluster
FROM system.clusters

Query id: 0f8461fd-c06c-4a39-8f13-9dee83ada84a

Password for user (default):
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 24.6.2.

┌─cluster──┐
1. │ cluster1 │
2. │ cluster1 │
3. │ cluster1 │
4. │ cluster1 │
└──────────┘

4 rows in set. Elapsed: 0.002 sec.
Configure the macros field according to each machine's shard/replica assignment:
Configuration on node CK1:
<macros>
    <shard>1</shard>
    <replica>1</replica>
</macros>
Configuration on node CK3:
<macros>
    <shard>1</shard>
    <replica>2</replica>
</macros>
Configuration on node CK2:
<macros>
    <shard>2</shard>
    <replica>1</replica>
</macros>
Configuration on node CK4:
<macros>
    <shard>2</shard>
    <replica>2</replica>
</macros>
The configuration file is hot-reloaded, so the service does not need a restart after these changes; each node's values can be verified as shown below.
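A quick way to verify the substitution on any node is the system.macros table (a sketch; on ck1 it should list shard = 1 and replica = 1):

ck1 :) SELECT * FROM system.macros;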
ck1 :) SELECT cluster, shard_num, replica_num, host_name, port FROM system.clusters WHERE cluster = 'cluster1' ORDER BY shard_num ASC, replica_num ASC;

SELECT
    cluster,
    shard_num,
    replica_num,
    host_name,
    port
FROM system.clusters
WHERE cluster = 'cluster1'
ORDER BY
    shard_num ASC,
    replica_num ASC

Query id: 47c42f7b-50b6-4988-bff6-49b9792662ff

┌─cluster──┬─shard_num─┬─replica_num─┬─host_name─┬─port─┐
1. │ cluster1 │ 1 │ 1 │ ck1 │ 9000 │
2. │ cluster1 │ 1 │ 2 │ ck3 │ 9000 │
3. │ cluster1 │ 2 │ 1 │ ck2 │ 9000 │
4. │ cluster1 │ 2 │ 2 │ ck4 │ 9000 │
└──────────┴───────────┴─────────────┴───────────┴──────┘

4 rows in set. Elapsed: 0.004 sec.
For cluster replication, configure the zookeeper field:
Whether you run ClickHouse Keeper or ZooKeeper, the configuration uses the zookeeper node in both cases.
<zookeeper>
    <node>
        <host>10.1.248.23</host>
        <port>2181</port>
    </node>
    <node>
        <host>10.1.248.59</host>
        <port>2181</port>
    </node>
    <node>
        <host>10.1.248.173</host>
        <port>2181</port>
    </node>
</zookeeper>
After configuring, restart clickhouse, then check that the zookeeper system database is present:
ck1 :) SELECT * FROM system.zookeeper WHERE path IN ('/','clickhouse');

SELECT *
FROM system.zookeeper
WHERE path IN ('/', 'clickhouse')

Query id: cc0bbdb1-60b8-4be6-8f56-b73ad616586b

┌─name───────┬─value─┬─path───────┐
1. │ sessions │ │ clickhouse │
2. │ task_queue │ │ clickhouse │
3. │ zookeeper │ │ / │
4. │ clickhouse │ │ / │
└────────────┴───────┴────────────┘

4 rows in set. Elapsed: 0.003 sec.
The other clickhouse nodes need the same configuration.
DDL statements executed in ClickHouse are placed on a queue in ZooKeeper. To view the queue in ZooKeeper:
[zk: localhost:2181(CONNECTED) 4] ls /clickhouse/task_queue/ddl
[query-0000000000]
To delete the queue:
deleteall /clickhouse/task_queue/ddl
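If you run ClickHouse Keeper rather than ZooKeeper, the bundled keeper client can do the same inspection (a sketch; I am assuming the ls and rmr commands of keeper-client in recent releases):

clickhouse keeper-client -h keeper1 -p 2181
/ :) ls /clickhouse/task_queue/ddl
/ :) rmr /clickhouse/task_queue/ddl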
5. Create a local table on each instance
ck1 :) CREATE TABLE default.local_table(EventDate DateTime, Number UInt32, id UInt32) ENGINE = MergeTree() PARTITION BY toYYYYMM(EventDate) ORDER BY (Number, EventDate, intHash32(id)) SAMPLE BY intHash32(id);

CREATE TABLE default.local_table
(
    `EventDate` DateTime,
    `Number` UInt32,
    `id` UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(EventDate)
ORDER BY (Number, EventDate, intHash32(id))
SAMPLE BY intHash32(id)

Query id: f9b0e82d-f288-424b-9d2e-f9ef13b0f389

Ok.

0 rows in set. Elapsed: 0.007 sec.

ck1 :) show tables;

SHOW TABLES

Query id: 3a3c8a9e-5a5c-4e0c-96b5-e71cdd91b819

┌─name────────┐
1. │ local_table │
└─────────────┘

1 row in set. Elapsed: 0.003 sec.
Create a cluster-wide object with the on cluster clause:
Run the database-creation DDL against cluster1 using the on cluster clause, and ClickHouse will automatically create the database on all 4 hosts:
ck1 :) CREATE DATABASE my_db ON CLUSTER cluster1;

CREATE DATABASE my_db ON CLUSTER cluster1

Query id: 20cfe4e4-8001-4009-a132-076c0e13c816

┌─host─┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
1. │ ck1 │ 9000 │ 0 │ │ 3 │ 0 │
2. │ ck4 │ 9000 │ 0 │ │ 2 │ 0 │
3. │ ck2 │ 9000 │ 0 │ │ 1 │ 0 │
4. │ ck3 │ 9000 │ 0 │ │ 0 │ 0 │
└──────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.069 sec.
If creation fails, check the clickhouse log for the following error:
tail -f /var/log/clickhouse-server/clickhouse-server.log
2024.07.15 05:02:11.192027 [ 26122 ] {} <Debug> DDLWorker: Scheduling tasks
2024.07.15 05:02:11.192125 [ 26141 ] {} <Debug> DDLWorker: Cleaning queue
2024.07.15 05:02:11.194949 [ 26122 ] {} <Trace> DDLWorker: scheduleTasks: initialized=true, size_before_filtering=5, queue_size=5, entries=query-0000000000..query-0000000004, first_failed_task_name=none, current_tasks_size=0, last_current_task=none, last_skipped_entry_name=query-0000000003
2024.07.15 05:02:11.194979 [ 26122 ] {} <Debug> DDLWorker: Will schedule 1 tasks starting from query-0000000004
2024.07.15 05:02:11.194992 [ 26122 ] {} <Trace> DDLWorker: Checking task query-0000000004
2024.07.15 05:02:11.197936 [ 26122 ] {} <Debug> DDLWorker: Will not execute task query-0000000004: There is no a local address in host list
2024.07.15 05:02:11.197985 [ 26122 ] {} <Debug> DDLWorker: Waiting for queue updates
The root cause was my use of cloud hosts: in the remote_servers configuration you must use the IP address bound to the NIC, never the cloud host's elastic IP, otherwise DDLWorker finds no local address in the host list and skips the task.
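A quick sanity check (a sketch, assuming the ck1..ck4 names from /etc/hosts above) is to confirm that the address each hostname resolves to is actually bound to a local NIC:

getent hosts ck1                 # the address remote_servers will resolve for this host
ip -4 addr show | grep 'inet '   # the output must include that address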
ck4 :) show databases

SHOW DATABASES

Query id: c8f89ea9-9125-4c47-8ba7-a7937ce5d670

┌─name───────────────┐
1. │ INFORMATION_SCHEMA │
2. │ default │
3. │ information_schema │
4. │ my_db │
5. │ system │
└────────────────────┘

5 rows in set. Elapsed: 0.002 sec.
Next, create a replicated table, again using the on cluster clause against cluster1. ClickHouse will automatically create four tables across the hosts (2 shards with 2 replicas). The two tables on host 1 and host 3 are the two replicas of one shard; the two tables on host 2 and host 4 are the replicas of the other shard.
A replicated MergeTree table needs two parameters to work: first, the ZooKeeper path, which points to the table's metadata; second, the replica name. All other replicas periodically pull the replication log from the same ZooKeeper path, notice new entries, read them, and then download the new data from the source replica over the internal HTTP replication protocol.
CREATE TABLE my_db.my_table1 ON CLUSTER cluster1
(
    `WatchID` UInt64, `JavaEnable` UInt8, `Title` String, `GoodEvent` Int16, `EventTime` DateTime, `EventDate` Date,
    `CounterID` UInt32, `ClientIP` UInt32, `ClientIP6` FixedString(16), `RegionID` UInt32, `UserID` UInt64, `CounterClass` Int8,
    `OS` UInt8, `UserAgent` UInt8, `URL` String, `Referer` String, `URLDomain` String, `RefererDomain` String,
    `Refresh` UInt8, `IsRobot` UInt8, `RefererCategories` Array(UInt16), `URLCategories` Array(UInt16), `URLRegions` Array(UInt32), `RefererRegions` Array(UInt32),
    `ResolutionWidth` UInt16, `ResolutionHeight` UInt16, `ResolutionDepth` UInt8, `FlashMajor` UInt8, `FlashMinor` UInt8, `FlashMinor2` String,
    `NetMajor` UInt8, `NetMinor` UInt8, `UserAgentMajor` UInt16, `UserAgentMinor` FixedString(2), `CookieEnable` UInt8, `JavascriptEnable` UInt8,
    `IsMobile` UInt8, `MobilePhone` UInt8, `MobilePhoneModel` String, `Params` String, `IPNetworkID` UInt32, `TraficSourceID` Int8,
    `SearchEngineID` UInt16, `SearchPhrase` String, `AdvEngineID` UInt8, `IsArtifical` UInt8, `WindowClientWidth` UInt16, `WindowClientHeight` UInt16,
    `ClientTimeZone` Int16, `ClientEventTime` DateTime, `SilverlightVersion1` UInt8, `SilverlightVersion2` UInt8, `SilverlightVersion3` UInt32, `SilverlightVersion4` UInt16,
    `PageCharset` String, `CodeVersion` UInt32, `IsLink` UInt8, `IsDownload` UInt8, `IsNotBounce` UInt8, `FUniqID` UInt64,
    `HID` UInt32, `IsOldCounter` UInt8, `IsEvent` UInt8, `IsParameter` UInt8, `DontCountHits` UInt8, `WithHash` UInt8,
    `HitColor` FixedString(1), `UTCEventTime` DateTime, `Age` UInt8, `Sex` UInt8, `Income` UInt8, `Interests` UInt16,
    `Robotness` UInt8, `GeneralInterests` Array(UInt16), `RemoteIP` UInt32, `RemoteIP6` FixedString(16), `WindowName` Int32, `OpenerName` Int32,
    `HistoryLength` Int16, `BrowserLanguage` FixedString(2), `BrowserCountry` FixedString(2), `SocialNetwork` String, `SocialAction` String, `HTTPError` UInt16,
    `SendTiming` Int32, `DNSTiming` Int32, `ConnectTiming` Int32, `ResponseStartTiming` Int32, `ResponseEndTiming` Int32, `FetchTiming` Int32,
    `RedirectTiming` Int32, `DOMInteractiveTiming` Int32, `DOMContentLoadedTiming` Int32, `DOMCompleteTiming` Int32, `LoadEventStartTiming` Int32, `LoadEventEndTiming` Int32,
    `NSToDOMContentLoadedTiming` Int32, `FirstPaintTiming` Int32, `RedirectCount` Int8, `SocialSourceNetworkID` UInt8, `SocialSourcePage` String, `ParamPrice` Int64,
    `ParamOrderID` String, `ParamCurrency` FixedString(3), `ParamCurrencyID` UInt16, `GoalsReached` Array(UInt32), `OpenstatServiceName` String, `OpenstatCampaignID` String,
    `OpenstatAdID` String, `OpenstatSourceID` String, `UTMSource` String, `UTMMedium` String, `UTMCampaign` String, `UTMContent` String,
    `UTMTerm` String, `FromTag` String, `HasGCLID` UInt8, `RefererHash` UInt64, `URLHash` UInt64, `CLID` UInt32,
    `YCLID` UInt64, `ShareService` String, `ShareURL` String, `ShareTitle` String,
    `ParsedParams` Nested(Key1 String, Key2 String, Key3 String, Key4 String, Key5 String, ValueDouble Float64),
    `IslandID` FixedString(16), `RequestNum` UInt32, `RequestTry` UInt8
)
ENGINE = ReplicatedMergeTree ('/clickhouse/tables/my_table1/{shard}', '{replica}')
PRIMARY KEY (CounterID, EventDate, UserID, EventTime, WatchID);

Query id: 4f2bd2f5-1e2f-49b9-bb67-5db6c8842efa

┌─host─┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
1. │ ck1 │ 9000 │ 0 │ │ 3 │ 0 │
2. │ ck4 │ 9000 │ 0 │ │ 2 │ 0 │
3. │ ck2 │ 9000 │ 0 │ │ 1 │ 0 │
4. │ ck3 │ 9000 │ 0 │ │ 0 │ 0 │
└──────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.167 sec.
// Modify a column's data type:
ALTER table my_db.my_table1_distributed on cluster cluster1 MODIFY COLUMN ClientIP Int64;
The basic CREATE TABLE syntax is as follows:
CREATE TABLE [IF NOT EXISTS] [db.]table_name ON CLUSTER cluster
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
    INDEX index_name1 expr1 TYPE type1(...) GRANULARITY value1,
    INDEX index_name2 expr2 TYPE type2(...) GRANULARITY value2
) ENGINE = engine_name()
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...];
Option descriptions:
● db: the database name; if the statement does not include 'db', the currently selected database is used.
● cluster: the cluster name; ON CLUSTER creates a local table on every node of the named cluster (cluster1 in this deployment).
● type: the column's data type, e.g. UInt32.
● DEFAULT: the column's default value. If an INSERT does not include the column, its value is computed from this expression and filled in.
● MATERIALIZED: a materialized-column expression. The column cannot be INSERTed; it is always computed. INSERT statements need not supply it, and SELECT * result sets do not include it.
● ALIAS: an alias column. Such columns are not stored in the table at all. Their values cannot be written via INSERT, and they are not expanded when SELECT uses an asterisk. They can still be referenced in a SELECT, in which case the alias is substituted during query analysis.
The difference between materialized and alias columns: a materialized column stores its data, so queries need no computation, while an alias column stores nothing and is computed at query time, returning the expression's result. A minimal sketch contrasting the two follows.
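A minimal sketch (the col_demo table name is illustrative):

CREATE TABLE default.col_demo
(
    `raw` String,
    `len_stored` UInt32 MATERIALIZED length(raw), -- computed at INSERT time and stored on disk
    `len_alias` UInt32 ALIAS length(raw)          -- never stored; computed at query time
)
ENGINE = MergeTree
ORDER BY raw;

INSERT INTO default.col_demo (raw) VALUES ('hello');
SELECT raw, len_stored, len_alias FROM default.col_demo;  -- both computed columns return 5; SELECT * shows only raw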
The following options are engine-specific; only the MergeTree family of table engines supports them:
● PARTITION BY: the partition key. Data is usually partitioned by date, but any other column or column expression works too.
● ORDER BY: the sort key. It can be a tuple of columns or an arbitrary expression.
● PRIMARY KEY: the primary key; by default it is identical to the sort key, so in most cases a separate PRIMARY KEY clause is unnecessary.
● SAMPLE BY: the sampling expression; to use sampling, the primary key must contain this expression.
● SETTINGS: additional parameters that affect performance.
● GRANULARITY: the index granularity parameter.
● The {shard} and {replica} parameters need no literal values; they are filled in from each node's macros configuration.
For a highly available cluster (two replicas), use ReplicatedMergeTree or another Replicated-family engine; otherwise no data is replicated between the replicas, and queries will return inconsistent results.
Check on each of the 4 nodes that the new table has been synchronized:
ck1 :) use my_db;
USE my_db
Query id: 4b1be3f0-f408-41fc-ac85-044190111512
Ok.
0 rows in set. Elapsed: 0.002 sec.

ck1 :) show tables;
SHOW TABLES
Query id: 92639317-9f4d-4dfd-8b6f-25fc475b9cc9
┌─name──────┐
1. │ my_table1 │
└───────────┘
1 row in set. Elapsed: 0.004 sec.

ck2 :) use my_db;
USE my_db
Query id: 732b124f-4c5e-4224-8a40-0dbe6cd45071
Ok.
0 rows in set. Elapsed: 0.002 sec.

ck2 :) show tables;
SHOW TABLES
Query id: 2a6b6ccb-bf35-4542-87a1-3c2125a7c381
┌─name──────┐
1. │ my_table1 │
└───────────┘
1 row in set. Elapsed: 0.004 sec.

ck3 :) use my_db;
USE my_db
Query id: 3282e619-d2b4-4227-9e86-2bef0705e77d
Ok.
0 rows in set. Elapsed: 0.002 sec.

ck3 :) show tables;
SHOW TABLES
Query id: 8352c44e-13e6-435c-aace-c2337e262b4f
┌─name──────┐
1. │ my_table1 │
└───────────┘
1 row in set. Elapsed: 0.006 sec.

ck4 :) use my_db;
USE my_db
Query id: b8e06690-7a12-45a5-bc2c-01d8575a0f1b
Ok.
0 rows in set. Elapsed: 0.002 sec.

ck4 :) show tables;
SHOW TABLES
Query id: 9720c5d2-b8c0-4509-87d3-e426859c56f0
┌─name──────┐
1. │ my_table1 │
└───────────┘
1 row in set. Elapsed: 0.004 sec.
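Since ck1 and ck3 are two replicas of the same shard, a quick way to watch replication work is to insert on one replica and count on the other (a sketch; the sample row is made up, and omitted columns take their default values):

-- on ck1
INSERT INTO my_db.my_table1 (WatchID, EventTime, EventDate, CounterID, UserID)
VALUES (1, now(), today(), 62, 100);

-- on ck3, a moment later, the row has been pulled in via the replication log
SELECT count() FROM my_db.my_table1;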
6. Create a distributed table
A distributed table stores no data itself; it provides a single table interface for uniform access to remote tables located on different hosts.
Distributed tables can be created on one or more hosts. When a query targets a distributed table, the table forwards the query to all shards, waits for the shards' results, then computes and returns the combined result.
Sending a query directly to one of the remote tables is also possible, and sometimes reasonable for specific data-distribution reasons.
Sharding and replication are completely independent: replication serves data integrity and automatic failover, while sharding serves horizontal scaling of the cluster. You can also forgo sharding and use replication alone purely for high availability.
After host3 fails, host1 continues to serve traffic; once host3 recovers, host1 automatically syncs its replica back to host3.
ClickHouse Keeper or ZooKeeper provides the coordination system for data replication, notifying the replicas of a shard about state changes so that all replicas of a shard execute the same operations in the same order.
ClickHouse Keeper stores only ClickHouse metadata, never the table data itself.
Next, create a distributed table so that the remote tables on the different hosts are reachable through one unified table interface. It can be created on one or more hosts:
When the distributed table points at a table that exists on the current server, the as clause can be used to adopt that table's schema automatically:
ck1 :) CREATE TABLE my_db.my_table1_distributed ON CLUSTER cluster1
AS my_db.my_table1
ENGINE = Distributed(cluster1, my_db, my_table1, rand());

CREATE TABLE my_db.my_table1_distributed ON CLUSTER cluster1 AS my_db.my_table1
ENGINE = Distributed(cluster1, my_db, my_table1, rand())

Query id: 4e8d8a6b-defe-4436-bfd8-36c50c888af9

┌─host─┬─port─┬─status─┬─error─┬─num_hosts_remaining─┬─num_hosts_active─┐
1. │ ck1 │ 9000 │ 0 │ │ 3 │ 0 │
2. │ ck2 │ 9000 │ 0 │ │ 2 │ 0 │
3. │ ck4 │ 9000 │ 0 │ │ 1 │ 0 │
4. │ ck3 │ 9000 │ 0 │ │ 0 │ 0 │
└──────┴──────┴────────┴───────┴─────────────────────┴──────────────────┘

4 rows in set. Elapsed: 0.069 sec.
To showcase ClickHouse's performance, download the ClickBench dataset, which contains one hundred million records, and load all of them into the distributed table just created.
Because the distributed table above uses the rand() function as its sharding key, and rand returns a pseudo-random integer uniformly distributed over all integers, the hundred million records will be spread evenly over the ClickHouse cluster's two shards. Each shard should hold roughly fifty million records, and the two replicas of each shard should end up containing the same fifty million rows.
ClickBench download: https://github.com/ClickHouse/ClickBench
https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz
wget https://datasets.clickhouse.com/hits_compatible/hits.tsv.gz --no-check-certificate
gzip -d hits.tsv.gz
ls -lh hits.tsv
-rw-r--r-- 1 root root 70G Jul 17 14:10 hits.tsv
clickhouse-client --password --time --query "INSERT INTO my_db.my_table1_distributed FORMAT TSV" < hits.tsv
ck1 :) SELECT formatReadableQuantity(count()) FROM my_db.my_table1_distributed;
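To confirm the even split, you can group by the Distributed engine's virtual _shard_num column (a sketch; each shard should report roughly fifty million rows):

ck1 :) SELECT _shard_num, formatReadableQuantity(count()) FROM my_db.my_table1_distributed GROUP BY _shard_num;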
Next, send a typical analytical query to the distributed table. The distributed table forwards the query to the shards, more precisely to one replica of each shard, and then returns the combined result:
ck3 :) SELECT
    URL,
    COUNT(*) AS PageViews
FROM my_db.my_table1_distributed
WHERE (CounterID = 62)
    AND (EventDate >= '2013-07-01')
    AND (EventDate <= '2013-07-31')
    AND (DontCountHits = 0)
    AND (Refresh = 0)
    AND (URL != '')
GROUP BY URL
ORDER BY PageViews DESC
LIMIT 10;
7. Running multiple ClickHouse clusters in the same environment
In the same environment, configure a second cluster, cluster2, consisting of four non-replicated shards. It can be used for non-critical data that needs high write throughput. A usage sketch follows the configuration.
<remote_servers>
    <cluster1>
        <shard>
            <replica>
                <host>ck1</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
            <replica>
                <host>ck3</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
            <internal_replication>true</internal_replication>
        </shard>
        <shard>
            <replica>
                <host>ck2</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
            <replica>
                <host>ck4</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
            <internal_replication>true</internal_replication>
        </shard>
    </cluster1>
    <cluster2>
        <shard>
            <replica>
                <host>ck1</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>ck2</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>ck3</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
        </shard>
        <shard>
            <replica>
                <host>ck4</host>
                <port>9000</port>
                <user>default</user>
                <password>123.com</password>
            </replica>
        </shard>
    </cluster2>
</remote_servers>
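As a usage sketch (the table names here are illustrative, not from a real deployment), a plain MergeTree table sharded across all four nodes of cluster2, plus a distributed table over it, might look like:

CREATE TABLE my_db.events_local ON CLUSTER cluster2
(
    `ts` DateTime,
    `payload` String
)
ENGINE = MergeTree   -- non-replicated: each shard has a single copy
ORDER BY ts;

CREATE TABLE my_db.events_distributed ON CLUSTER cluster2
AS my_db.events_local
ENGINE = Distributed(cluster2, my_db, events_local, rand());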