Deploying the Latest Apache Druid 32.0.0, a Big-Data Time-Series Database, on Win10 with Docker Desktop
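Preface
In big-data analytics there is a common scenario: time-series data. In short, once a data point is produced it is never modified, and as time passes every point in time gets a new state value. The volume of such data is often staggering, for example the raw monitoring data from sensors: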
https://lizhiyong.blog.csdn.net/article/details/114898620
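A single simple accelerometer produces about 3.1 billion records a year! If manufacturing sensor data is not preprocessed by PLCs or other lower-level controllers and is sent straight to the edge-computing gateway, the load is huge even over MQTT!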
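As a back-of-the-envelope check on the yearly figure above, here is a minimal Python sketch. The 100 Hz sampling rate is an assumption chosen for illustration; the original does not state the sensor's rate, and `samples_per_year` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year


def samples_per_year(sampling_hz: int) -> int:
    """Number of readings one sensor channel produces in a year."""
    return sampling_hz * SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumed rate: 100 Hz (hypothetical, not taken from the article).
    print(f"{samples_per_year(100):,}")  # prints 3,153,600,000
```

Under that assumption a single channel lands at roughly 3.15 billion readings per year, which is the same order of magnitude as the figure quoted above.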
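Similarly, there is raw server monitoring data, for example from the widely used Prometheus and Zabbix. With many clusters there are just as many monitored items, and once you count the monitoring deployed inside containers and virtual machines after virtualization, the data volume becomes really considerable! Tens of billions of records per hour is common!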
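However, storing all of the raw monitoring data is frighteningly expensive, and its information density is extremely low. Most of the time we only care about the full set of recent hot data for online model training, and having people look through thousands of records per second is unrealistic. In practice, a simple per-second or per-minute aggregate satisfies most analysis scenarios, and cold data older than one day has largely lost its timeliness.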
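To make the per-minute aggregation idea concrete, here is a minimal Python sketch of pre-aggregation. It only illustrates the concept; it is not how Druid implements it internally, and the function name and the `(epoch_seconds, value)` event shape are assumptions made for the example.

```python
from collections import defaultdict


def rollup_by_minute(events):
    """Pre-aggregate (epoch_seconds, value) pairs into per-minute statistics."""
    buckets = defaultdict(
        lambda: {"count": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}
    )
    for ts, value in events:
        minute = ts - ts % 60  # truncate the timestamp to the start of its minute
        bucket = buckets[minute]
        bucket["count"] += 1
        bucket["sum"] += value
        bucket["min"] = min(bucket["min"], value)
        bucket["max"] = max(bucket["max"], value)
    return dict(buckets)


if __name__ == "__main__":
    raw = [(0, 1.0), (30, 3.0), (61, 5.0)]
    for minute, stats in sorted(rollup_by_minute(raw).items()):
        print(minute, stats)
```

Three raw rows collapse into two aggregated rows here; with thousands of readings per second the reduction is far larger, which is why storing only the aggregates is so much cheaper.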
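This kind of scenario calls for a database that offers high throughput and pre-aggregation. After stress testing Apache Druid, ClickHouse, and Kylin, I chose the first. Leave specialized work to specialized components!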
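Business developers who do not work on the engine internals or on secondary development should mostly focus on the API, the features, and the usage, and should not spend much effort on deployment! I had already set up Docker Desktop earlier: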
https://lizhiyong.blog.csdn.net/article/details/145580868
Today I will set up the latest Apache Druid on Win10 to play with.
Choosing a version
Official site:
https://druid.apache.org/
Note that this is not Druid, the Alibaba database connection pool!
As of 2025-02-13, the latest Apache Druid release is 32.0.0.
Preparing resources
See the official docs:
https://druid.apache.org/docs/latest/tutorials/docker
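The official docs provide a tutorial that orchestrates the containers with docker-compose.yml. For a real-time component, plenty of memory is a must! Still, starting 7 containers (ZooKeeper + PostgreSQL + 5 Druid services), each using at most about 7 GB of memory, is no big deal!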
https://raw.githubusercontent.com/apache/druid/32.0.0/distribution/docker/docker-compose.yml
That gives the following file:
version: "2.2"

volumes:
  metadata_data: {}
  middle_var: {}
  historical_var: {}
  broker_var: {}
  coordinator_var: {}
  router_var: {}
  druid_shared: {}

services:
  postgres:
    container_name: postgres
    image: postgres:latest
    ports:
      - "5432:5432"
    volumes:
      - metadata_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=FoolishPassword
      - POSTGRES_USER=druid
      - POSTGRES_DB=druid

  # Need 3.5 or later for container nodes
  zookeeper:
    container_name: zookeeper
    image: zookeeper:3.5.10
    ports:
      - "2181:2181"
    environment:
      - ZOO_MY_ID=1

  coordinator:
    image: apache/druid:32.0.0
    container_name: coordinator
    volumes:
      - druid_shared:/opt/shared
      - coordinator_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
    ports:
      - "8081:8081"
    command:
      - coordinator
    env_file:
      - environment

  broker:
    image: apache/druid:32.0.0
    container_name: broker
    volumes:
      - broker_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8082:8082"
    command:
      - broker
    env_file:
      - environment

  historical:
    image: apache/druid:32.0.0
    container_name: historical
    volumes:
      - druid_shared:/opt/shared
      - historical_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8083:8083"
    command:
      - historical
    env_file:
      - environment

  middlemanager:
    image: apache/druid:32.0.0
    container_name: middlemanager
    volumes:
      - druid_shared:/opt/shared
      - middle_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "8091:8091"
      - "8100-8105:8100-8105"
    command:
      - middleManager
    env_file:
      - environment

  router:
    image: apache/druid:32.0.0
    container_name: router
    volumes:
      - router_var:/opt/druid/var
    depends_on:
      - zookeeper
      - postgres
      - coordinator
    ports:
      - "3012:8888" # changed to 3012 here so it does not occupy a useful port
    command:
      - router
    env_file:
      - environment
See also another page of the official docs:
https://druid.apache.org/docs/latest/configuration/
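For casual experimentation you can leave these runtime settings alone for now. Since everything runs in containers, redeploying later is easy!
You also need: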
https://raw.githubusercontent.com/apache/druid/32.0.0/distribution/docker/environment
as the second configuration file:
# Java tuning
#DRUID_XMX=1g
#DRUID_XMS=1g
#DRUID_MAXNEWSIZE=250m
#DRUID_NEWSIZE=250m
#DRUID_MAXDIRECTMEMORYSIZE=6172m

DRUID_SINGLE_NODE_CONF=micro-quickstart

druid_emitter_logging_logLevel=debug

druid_extensions_loadList=["druid-histogram", "druid-datasketches", "druid-lookups-cached-global", "postgresql-metadata-storage", "druid-multi-stage-query"]

druid_zk_service_host=zookeeper

druid_metadata_storage_host=
druid_metadata_storage_type=postgresql
druid_metadata_storage_connector_connectURI=jdbc:postgresql://postgres:5432/druid
druid_metadata_storage_connector_user=druid
druid_metadata_storage_connector_password=FoolishPassword

druid_indexer_runner_javaOptsArray=["-server", "-Xmx1g", "-Xms1g", "-XX:MaxDirectMemorySize=3g", "-Duser.timezone=UTC", "-Dfile.encoding=UTF-8", "-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager"]
druid_indexer_fork_property_druid_processing_buffer_sizeBytes=256MiB

druid_storage_type=local
druid_storage_storageDirectory=/opt/shared/segments
druid_indexer_logs_type=file
druid_indexer_logs_directory=/opt/shared/indexing-logs

druid_processing_numThreads=2
druid_processing_numMergeBuffers=2

DRUID_LOG4J=<?xml version="1.0" encoding="UTF-8" ?><Configuration status="WARN"><Appenders><Console name="Console" target="SYSTEM_OUT"><PatternLayout pattern="%d{ISO8601} %p [%t] %c - %m%n"/></Console></Appenders><Loggers><Root level="info"><AppenderRef ref="Console"/></Root><Logger name="org.apache.druid.jetty.RequestLog" additivity="false" level="DEBUG"><AppenderRef ref="Console"/></Logger></Loggers></Configuration>
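The lowercase `druid_` entries in this file look like Druid runtime properties with the dots replaced by underscores. As far as I understand the official image's startup script, it performs the reverse mapping when the container starts; treat that as an assumption and check the image's own documentation. The Python sketch below only illustrates the naming convention, and `env_to_druid_properties` is a hypothetical helper, not part of Druid.

```python
def env_to_druid_properties(env_text: str) -> dict:
    """Map `druid_a_b=value` lines to `druid.a.b` property names.

    Assumption: underscores in the variable name stand for dots in the
    property name. Upper-case DRUID_* variables (JVM and logging settings)
    are not runtime properties and are skipped.
    """
    props = {}
    for line in env_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split only on the first '='
        if key.startswith("druid_"):
            props[key.replace("_", ".")] = value
    return props


if __name__ == "__main__":
    sample = """
    # Java tuning
    DRUID_SINGLE_NODE_CONF=micro-quickstart
    druid_zk_service_host=zookeeper
    druid_metadata_storage_type=postgresql
    """
    for name, value in env_to_druid_properties(sample).items():
        print(f"{name}={value}")
```

Read this way, `druid_metadata_storage_type=postgresql` corresponds to the `druid.metadata.storage.type` property described in the configuration reference linked earlier.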
The deployment files are small, but they have everything they need!
Deployment
PS C:\Users\zhiyong> cd E:\dockerData\volume\druid1
PS E:\dockerData\volume\druid1> ls

    Directory: E:\dockerData\volume\druid1

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        2025-02-13     23:26           2980 docker-compose.yml
-a----        2025-02-13     23:33           1576 environment

PS E:\dockerData\volume\druid1> docker compose up -d
time="2025-02-13T23:34:39+08:00" level=warning msg="E:\\dockerData\\volume\\druid1\\docker-compose.yml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion"
[+] Running 72/15
 ✔ router Pulled                          230.7s
 ✔ coordinator Pulled                     230.7s
 ✔ postgres Pulled                        181.0s
 ✔ historical Pulled                      230.7s
 ✔ broker Pulled                          230.7s
 ✔ middlemanager Pulled                   230.7s
 ✔ zookeeper Pulled                        85.7s
[+] Running 15/15
 ✔ Network druid1_default            Created    0.1s
 ✔ Volume "druid1_druid_shared"      Created    0.0s
 ✔ Volume "druid1_historical_var"    Created    0.0s
 ✔ Volume "druid1_middle_var"        Created    0.0s
 ✔ Volume "druid1_router_var"        Created    0.0s
 ✔ Volume "druid1_metadata_data"     Created    0.0s
 ✔ Volume "druid1_coordinator_var"   Created    0.0s
 ✔ Volume "druid1_broker_var"        Created    0.0s
 ✔ Container postgres                Started    2.4s
 ✔ Container zookeeper               Started    2.4s
 ✔ Container coordinator             Started    1.6s
 ✔ Container router                  Started    2.5s
 ✔ Container broker                  Started    2.3s
 ✔ Container historical              Started    2.5s
 ✔ Container middlemanager           Started    2.8s
PS E:\dockerData\volume\druid1>
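Once the images have been pulled, the containers come up quickly:
Well, well... it also exposes the ports of the other components along the way.
So you also get a PostgreSQL and a ZooKeeper instance for free!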
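To confirm from the host which of those published ports actually accept connections, a small Python sketch like the following can help. The port numbers are the host-side values from the compose file above (with the router remapped to 3012); the function names are hypothetical, and an open TCP port only shows that something is listening, not that the service is fully started.

```python
import socket

# Host ports published by the compose file (router remapped to 3012).
PUBLISHED_PORTS = {
    "postgres": 5432,
    "zookeeper": 2181,
    "coordinator": 8081,
    "broker": 8082,
    "historical": 8083,
    "middlemanager": 8091,
    "router": 3012,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(ports: dict, host: str = "localhost") -> dict:
    """Map each service name to whether its published port is reachable."""
    return {name: is_port_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, ok in check_ports(PUBLISHED_PORTS).items():
        print(f"{name:14s} {'open' if ok else 'closed'}")
```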
Verification
http://localhost:3012/unified-console.html#
Very nice, we now have the latest Apache Druid 32.0.0 up and running!
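Besides opening the console in a browser, the router can be checked from a script. The sketch below calls Druid's `/status/health` endpoint, which returns the JSON literal `true` when the process is up. The host and port assume the 3012 mapping used in this post, and the helper names are hypothetical.

```python
import json
import urllib.request


def router_url(path: str, host: str = "localhost", port: int = 3012) -> str:
    """Build a URL on the router; 3012 is the host port chosen in this post."""
    return f"http://{host}:{port}{path}"


def parse_health(body: bytes) -> bool:
    """Interpret the body of /status/health, which is the JSON literal true."""
    return json.loads(body.decode("utf-8")) is True


def router_is_healthy(timeout: float = 3.0) -> bool:
    """Return True if the router answers /status/health with HTTP 200 and true."""
    try:
        with urllib.request.urlopen(router_url("/status/health"), timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except (OSError, ValueError):
        # Connection failures, HTTP errors, or a malformed body count as unhealthy.
        return False


if __name__ == "__main__":
    print("router healthy:", router_is_healthy())
```

If this prints `False` right after `docker compose up -d`, waiting a little and retrying is reasonable, since the Druid processes take some time to start.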
Please credit the source when reposting: https://lizhiyong.blog.csdn.net/article/details/145622903