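Background
To meet business needs, our big data platform has to use Spark for machine learning, data mining, real-time computation, and similar work, so we decided to use Cloudera Manager 5.2.0 and CDH5.
I had previously set up Cloudera Manager 4.8.2 with CDH4. While setting up Cloudera Manager 5.2.0, I found that the Host Monitor and Service Monitor roles could not be configured with an external database. At first I assumed I had misconfigured something, but it turned out that the storage mechanism changed in the new version. After going through a lot of documentation, I confirmed it: in the new version, Host Monitor and Service Monitor do not need a database. They use a built-in datastore by default, and this cannot be changed.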
Cloudera Manager uses databases to store information about the Cloudera Manager configuration, as well as information such as the health of the system or task progress. For quick, simple installations, Cloudera Manager can install and configure an embedded PostgreSQL database as part of the Cloudera Manager installation process. In addition, some CDH services use databases and are automatically configured to use a default database. If you plan to use the embedded and default databases provided during the Cloudera Manager installation, see Installation Path A - Automated Installation by Cloudera Manager.
Although the embedded database is useful for getting started quickly, you can also use your own PostgreSQL, MySQL, or Oracle database for the Cloudera Manager Server and services that use databases.
Required Databases
The Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server all require databases. The type of data contained in the databases and their estimated sizes are as follows:
- Cloudera Manager - Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (<100 MB) is the most important to back up.
- Activity Monitor - Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
- Reports Manager - Tracks disk utilization and processing activities over time. Medium-sized.
- Hive Metastore - Contains Hive metadata. Relatively small.
- Sentry Server - Contains authorization metadata. Relatively small.
- Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
- Cloudera Navigator Metadata Server - Contains authorization, policies, and audit report metadata. Relatively small.
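The Cloudera Manager Service Host Monitor and Service Monitor roles have an internal datastore. (Note: this is where the documentation states that, in CM5, Host Monitor and Service Monitor cannot be configured with an external database and can only use the internal datastore, unlike in CM4.)
Cloudera Manager offers three installation methods. Path A is an automated installation; Paths B and C are manual installations using rpm or tar: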
- Path A automatically installs an embedded PostgreSQL database to meet the requirements of the services. This path reduces the number of installation tasks to complete and choices to make. In Path A you can optionally choose to create external databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
- Path B and Path C require you to create databases for the Cloudera Manager Server, Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server.
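The difference between the installation paths can be summarized as a small lookup. The following is only a sketch based on the documentation quoted above; the function name `databases_to_create` and the `external_in_path_a` flag are made up for illustration and are not part of any Cloudera tool.

```python
# Components that the quoted documentation lists as requiring a database.
SERVER_DB = "Cloudera Manager Server"
SERVICE_DBS = [
    "Activity Monitor",
    "Reports Manager",
    "Hive Metastore",
    "Sentry Server",
    "Cloudera Navigator Audit Server",
    "Cloudera Navigator Metadata Server",
]

# Roles that use an internal datastore in CM5, so they never appear below.
INTERNAL_DATASTORE_ROLES = ["Host Monitor", "Service Monitor"]


def databases_to_create(path, external_in_path_a=False):
    """Return the components you have to create databases for yourself.

    `path` is "A", "B", or "C". `external_in_path_a` is a hypothetical flag
    modelling the optional external databases allowed in Path A.
    """
    key = path.strip().upper()
    if key == "A":
        # Path A installs an embedded PostgreSQL database automatically;
        # external databases for the six services are optional.
        return list(SERVICE_DBS) if external_in_path_a else []
    if key in ("B", "C"):
        # Paths B and C require databases for the server and all six services.
        return [SERVER_DB] + SERVICE_DBS
    raise ValueError(f"unknown installation path: {path!r}")


if __name__ == "__main__":
    for p in ("A", "B", "C"):
        print(p, databases_to_create(p))
```

Whichever path is chosen, Host Monitor and Service Monitor never show up in the result, which matches the CM5 behaviour described above.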
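Using external databases requires more input and setup work, but Cloudera provides more compatibility and extensibility in return, letting you choose and configure your databases flexibly.
You can, of course, install several different types of database within one system, but doing so introduces a lot of uncertainty, so Cloudera recommends consistently using the same type of database.
In most cases, you should install a service and its database on the same machine, which reduces network I/O and improves overall efficiency.
You can also install the service and its database on separate machines. Large deployments or database administrators may require this setup, for example when an Oracle DBA needs to manage the databases independently.
For database setup, refer to the official documentation, which has detailed configuration steps: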
Setting up the Cloudera Manager Server database
Setting up external databases for Activity Monitor, Reports Manager, Hive Metastore, Sentry Server, Cloudera Navigator Audit Server, and Cloudera Navigator Metadata Server
Setting up external databases for Hue and Oozie
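To give a rough idea of what creating an external database for one of these services involves, here is a minimal sketch that generates the SQL for MySQL. It assumes MySQL 5.x syntax (`GRANT ... IDENTIFIED BY` was removed in MySQL 8.0). The helper `mysql_setup_statements` is hypothetical, and the database name, user, and password in the usage example are placeholders. Follow the official documentation for the actual steps.

```python
import re

# Only accept simple identifiers, so names can be embedded in SQL safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def mysql_setup_statements(db_name, user, password, host="%"):
    """Build MySQL 5.x statements that create a database and grant access.

    All arguments are placeholders chosen by the administrator.
    """
    for name in (db_name, user):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    # Reject characters that would break out of the quoted SQL literals.
    for value in (password, host):
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported here")
    return [
        f"CREATE DATABASE {db_name} DEFAULT CHARACTER SET utf8;",
        f"GRANT ALL ON {db_name}.* TO '{user}'@'{host}' "
        f"IDENTIFIED BY '{password}';",
    ]


if __name__ == "__main__":
    # Placeholder values for an Activity Monitor database.
    for statement in mysql_setup_statements("amon", "amon", "amon_password"):
        print(statement)
```

The printed statements can be reviewed and then run in a `mysql` client session by someone with sufficient privileges. Generating them rather than executing them directly keeps the sketch free of any database driver dependency.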
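In the next article, I will cover in detail how Cloudera Manager stores data in its databases, how to configure them, and how to tune them.
Original article. Reposting is welcome; please credit the source.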