Hadoop 2.2.0 + Hive 0.10.0 Fully Distributed Installation Guide
1. JDK: jdk-7u60-linux-x64.tar.gz
http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk7-downloads-1880260.html
2. Hive: hive-0.10.0.tar.gz
https://archive.apache.org/dist/hive/hive-0.10.0/
3. Hadoop: hadoop-2.2.0.tar.gz
http://apache.fayea.com/apache-mirror/hadoop/common/hadoop-2.2.0/
4. Linux distribution: ubuntu-14.04-server-amd64.iso
http://releases.ubuntu.com/14.04/
Three machines are simulated. Add the following entries to the hosts file on every node; the roles are assigned as follows:
192.168.1.150   hdp1   # namenode, secondarynamenode, resourcemanager
192.168.1.151   hdp2   # datanode, nodemanager
192.168.1.152   hdp3   # datanode, nodemanager
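The entries can be appended on each node like this (run as root; the role comments above are annotations and should not go into /etc/hosts):
# cat >> /etc/hosts <<'EOF'
192.168.1.150 hdp1
192.168.1.151 hdp2
192.168.1.152 hdp3
EOF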
1. Installing the JDK
(1) Extract the downloaded jdk-7u60-linux-x64.tar.gz into an installation directory (choose the path to suit your environment):
# tar zxf jdk-7u60-linux-x64.tar.gz
# mv jdk1.7.0_60 /usr/local/jdk7

(2) Configure the JDK environment variables
# vi ~/.bashrc   # open .bashrc and append the following
export JAVA_HOME="/usr/local/jdk7"
export PATH="$PATH:$JAVA_HOME/bin"
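Reload the configuration so the variables take effect in the current shell:
# source ~/.bashrc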
(3) Verify the installation
# java -version
java version "1.7.0_60"
Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)
2. Create a new user, e.g. hadoop, and set its password
# groupadd hadoop
# useradd -c "Hadoop User" -d /home/hadoop -g hadoop -m -s /bin/bash hadoop
# passwd hadoop   # enter the password (this guide uses "hadoop") when prompted
3. Configure SSH
(1) On hdp1, switch to the newly created hadoop user: # su - hadoop
(2) $ ssh-keygen -t rsa
(3) $ cat .ssh/id_rsa.pub >> .ssh/authorized_keys
(4) $ ssh localhost   # verify that passwordless login works
(5) Configure SSH on hdp2 and hdp3 the same way, then append each node's .ssh/id_rsa.pub to hdp1's .ssh/authorized_keys:
As the hadoop user on hdp2: scp .ssh/id_rsa.pub hadoop@hdp1:.ssh/hdp2_rsa
As the hadoop user on hdp3: scp .ssh/id_rsa.pub hadoop@hdp1:.ssh/hdp3_rsa
On hdp1: cat .ssh/hdp2_rsa >> .ssh/authorized_keys
         cat .ssh/hdp3_rsa >> .ssh/authorized_keys
For hdp1 to log in to hdp2 and hdp3 without a password (which the cluster start-up scripts rely on), also copy the merged key file back to the slaves:
On hdp1: scp .ssh/authorized_keys hadoop@hdp2:.ssh/
         scp .ssh/authorized_keys hadoop@hdp3:.ssh/
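A quick check that the passwordless logins work in the direction the start-up scripts need (run as hadoop on hdp1):
$ ssh hdp2 hostname
$ ssh hdp3 hostname
Each command should print the remote hostname without asking for a password.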
Note: the preparation above must be identical on all three machines; pay particular attention to the installation directories and to setting each machine's hostname correctly.
Next, install Hadoop
1. Extract the archive and configure environment variables
Extract the downloaded hadoop-2.2.0.tar.gz under /home/hadoop:
   tar -zxvf hadoop-2.2.0.tar.gz -C /home/hadoop/
Move it to /usr/local, renaming it to match the HADOOP_HOME set below:
   sudo mv /home/hadoop/hadoop-2.2.0 /usr/local/hadoop
Note: the installation path must be identical on every machine!
# vi ~/.bashrc   # open .bashrc and append the following
export?HADOOP_HOME=/usr/local/hadoop
export?HADOOP_MAPRED_HOME=${HADOOP_HOME}
export?HADOOP_COMMON_HOME=${HADOOP_HOME}
export?HADOOP_HDFS_HOME=${HADOOP_HOME}
export?YARN_HOME=${HADOOP_HOME}
export?HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export?PATH=$PATH:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
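Reload and sanity-check:
$ source ~/.bashrc
$ hadoop version   # should report Hadoop 2.2.0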
2. Configuring Hadoop
Before editing the configuration, create the following directories on hdp1's local filesystem (and, since the layout must be identical everywhere, on the other nodes as well):
~/hadoop/dfs/name
~/hadoop/dfs/data
~/hadoop/temp
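They can be created in one command:
$ mkdir -p ~/hadoop/dfs/name ~/hadoop/dfs/data ~/hadoop/temp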
Seven configuration files are involved:
/usr/local/hadoop/etc/hadoop/hadoop-env.sh
/usr/local/hadoop/etc/hadoop/yarn-env.sh
/usr/local/hadoop/etc/hadoop/slaves
/usr/local/hadoop/etc/hadoop/core-site.xml
/usr/local/hadoop/etc/hadoop/hdfs-site.xml
/usr/local/hadoop/etc/hadoop/mapred-site.xml
/usr/local/hadoop/etc/hadoop/yarn-site.xml
Some of these do not exist by default; they can be created by copying the corresponding .template file, as shown below.
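For example, in Hadoop 2.2.0 mapred-site.xml ships only as a template:
$ cd /usr/local/hadoop/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml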
Configuration file 1: hadoop-env.sh
Set JAVA_HOME to the JDK installed earlier (export JAVA_HOME=/usr/local/jdk7)
Configuration file 2: yarn-env.sh
Set JAVA_HOME in the same way (export JAVA_HOME=/usr/local/jdk7)
Configuration file 3: slaves (this file lists all the slave nodes)
Put in the following:
hdp2
hdp3
Configuration file 4: core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hdp1:9000</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home/hadoop/hadoop/temp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
Configuration file 5: hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hdp1:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>The cluster has only two datanodes, so the replication factor is set to 2.</description>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
Configuration file 6: mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hdp1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hdp1:19888</value>
</property>
</configuration>
Configuration file 7: yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hdp1:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hdp1:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hdp1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hdp1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hdp1:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
</configuration>
3. Copy to the other nodes
A shell script makes this convenient when there are many nodes, e.g. cp2slave.sh:
#!/bin/bash
scp -r /usr/local/hadoop hadoop@hdp2:/usr/local/
scp -r /usr/local/hadoop hadoop@hdp3:/usr/local/
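Make the script executable and run it from hdp1 (remember that ~/.bashrc, the JDK, and the ~/hadoop directories must also be in place on the slaves):
$ chmod +x cp2slave.sh
$ ./cp2slave.sh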
4. Start-up and verification
4.1 Starting Hadoop
Enter the installation directory: cd /usr/local/hadoop/
(1) Format the namenode: bin/hdfs namenode -format
(2) Start HDFS: sbin/start-dfs.sh
At this point hdp1 runs the processes: namenode, secondarynamenode
hdp2 and hdp3 run: datanode
(3) Start YARN: sbin/start-yarn.sh
Now hdp1 runs: namenode, secondarynamenode, resourcemanager
hdp2 and hdp3 run: datanode, nodemanager
(4) Start the history server: sbin/mr-jobhistory-daemon.sh start historyserver
Check the cluster status: bin/hdfs dfsadmin -report
Check the file-block layout: bin/hdfs fsck / -files -blocks
HDFS web UI: http://192.168.1.150:50070
ResourceManager web UI: http://192.168.1.150:8088
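The daemon lists above can be confirmed with the JDK's jps tool on each node, e.g.:
$ jps   # on hdp1: NameNode, SecondaryNameNode, ResourceManager (plus JobHistoryServer after step 4)
$ jps   # on hdp2/hdp3: DataNode, NodeManager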
4.2 Running an example job
Run the bundled pi estimator to confirm that MapReduce jobs execute on YARN (a wordcount variant that first creates a folder on HDFS is sketched below):
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 1000
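A minimal wordcount run from the same examples jar; the local file words.txt and the /input, /output paths are illustrative:
bin/hdfs dfs -mkdir -p /input
bin/hdfs dfs -put words.txt /input/
bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
bin/hdfs dfs -cat /output/part-r-00000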
Next, install MySQL, which will store the Hive metadata
1. sudo apt-get install mysql-server   # follow the prompts and set the root password
2. Create a MySQL user named hive
   $ mysql -u root -p   # log in as root
   mysql> CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
3. Grant privileges:
   mysql> GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
4. Log in as the new hive user: $ mysql -u hive -p
5. Create the hive database
mysql> create database hive;
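A quick sanity check while logged in as hive:
mysql> show databases;   -- the newly created hive database should be listed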
Next, install Hive
1. Extract the archive and configure environment variables
Extract the downloaded hive-0.10.0.tar.gz under /home/hadoop, then move it into place:
sudo mv hive-0.10.0 /usr/local/hive
Note: the installation path must be identical on every machine!
# vi ~/.bashrc   # open .bashrc and append the following
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:${HIVE_HOME}/bin
2. Add hive-site.xml under hive/conf
<?xml?version="1.0"?>
<?xml-stylesheet?type="text/xsl"?href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.local</name>
<value>false</value>
<description>Thrift?uri?for?the?remote?metastore.?Used?by?metastore?client?to?connect?to?remote?metastore.</description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hdp1:9083</value>
<description></description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://hdp1:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.PersistenceManagerFactoryClass</name>
<value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
<description>class implementing the jdo persistence</description>
</property>
<property>
<name>javax.jdo.option.DetachAllOnCommit</name>
<value>true</value>
<description>detaches all objects from session so that they can be used after transaction is committed</description>
</property>
<property>
<name>javax.jdo.option.NonTransactionalRead</name>
<value>true</value>
<description>reads outside of transactions</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
</configuration>
3. Copy the MySQL JDBC driver into Hive's lib directory.
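For example, assuming the downloaded Connector/J jar is named mysql-connector-java-5.1.30-bin.jar (the exact version is just an illustration):
$ cp mysql-connector-java-5.1.30-bin.jar /usr/local/hive/lib/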
4. Start Hive and test it. Since hive.metastore.uris points at a remote metastore, start the metastore service before the CLI:
$ hive --service metastore &
$ hive
hive> show tables;
OK
Time taken: 5.204 seconds
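A slightly fuller smoke test; the table name t1 is arbitrary:
hive> create table t1 (id int, name string);
hive> show tables;
If the metastore is wired up correctly, the TBLS table in MySQL's hive database will now contain a row for t1.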