Table of Contents
- 1. Passwordless SSH login
- 2. Installing Hadoop
- 3. Spark configuration
The full detailed report is in the resources section; all of the experiment material has been uploaded, so download it there if needed.
1. Passwordless SSH login
Configuration commands used:
cd ~/.ssh/
ssh-keygen -t rsa
Press Enter
y
Press Enter
Press Enter
The key pair is generated, as shown above.
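If you would rather not answer the prompts one by one, the same key can also be generated in a single command (an optional alternative; -N "" sets an empty passphrase and -f names the key file):
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa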
cat ./id_rsa.pub >> ./authorized_keys
ssh hadoop01
exit
scp /root/.ssh/id_rsa.pub root@hadoop02:/root/.ssh/id_rsa.pub
Then enter the password for hadoop02 and the key is copied over.
scp /root/.ssh/id_rsa.pub root@hadoop03:/root/.ssh/id_rsa.pub
Then enter the password for hadoop03 and the key is copied over.
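Note that copying id_rsa.pub over is not enough by itself: for passwordless login the key must also end up in authorized_keys on the target node. A minimal sketch, run on hadoop02 and on hadoop03 (assuming the key was copied to /root/.ssh/id_rsa.pub as above):
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys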
When the output looks like the screenshot, simply reboot.
Reboot all the nodes, then run the commands again from the start to verify: ssh hadoop02
ssh hadoop03
If no password is required, the setup has succeeded; log out with: exit
2. Installing Hadoop
java -version
The Java version is displayed as shown. Next, configure JAVA_HOME in ~/.bashrc:
nano ~/.bashrc
Append the following at the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
Then save and exit: Ctrl+X, type Y, and press Enter.
Apply the configuration:
source ~/.bashrc
Verify that JAVA_HOME was configured successfully:
echo $JAVA_HOME
As shown above, JAVA_HOME has been configured successfully.
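For reference, with the export added above the command should print the JDK path:
/usr/lib/jvm/java-8-openjdk-amd64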
cd /usr/local
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.5/hadoop-3.3.5.tar.gz
Extract it: tar -xzvf hadoop-3.3.5.tar.gz
Rename it: mv hadoop-3.3.5 /usr/local/hadoop
Fix the ownership: chown -R root:root ./hadoop
ls -l hadoop/
Configure the Hadoop environment variables:
nano ~/.bashrc
Append the following at the bottom:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Save and exit: Ctrl+X, Y, Enter
source ~/.bashrc
Check that the hadoop command is available:
cd /usr/local/hadoop
./bin/hadoop version
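If the installation is in order, the first line of the output should report the downloaded version:
Hadoop 3.3.5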
Configure the cluster/distributed environment:
Edit the /etc/profile file:
cd /usr/local/hadoop/etc/hadoop
nano /etc/profile
Add the following content:
# Hadoop Service Users
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
source /etc/profile
Edit the workers file:
nano workers
hadoop01
hadoop02
hadoop03
Save and exit: Ctrl+X, Y, Enter
Edit the core-site.xml file:
nano core-site.xml
Add the following configuration:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop01:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>
Edit the hdfs-site.xml file:
nano hdfs-site.xml
Add the following content:
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop03:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>
Edit the mapred-site.xml file:
nano mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop01:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop01:19888</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=/usr/local/hadoop</value>
    </property>
</configuration>
Save and exit: Ctrl+X, Y, Enter
Edit the yarn-site.xml file:
nano yarn-site.xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop01</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Save and exit: Ctrl+X, Y, Enter
Edit the hadoop-env.sh file:
nano hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Save and exit: Ctrl+X, Y, Enter
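As an optional sanity check (relying on the hdfs command already being on the PATH as configured earlier), hdfs getconf can confirm that the edited files are actually being read:
hdfs getconf -confKey fs.defaultFS      # expected: hdfs://hadoop01:9000
hdfs getconf -confKey dfs.replication   # expected: 3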
Package the Hadoop folder on the hadoop01 node and distribute it to the other nodes:
cd /usr/local
tar -zcf ~/hadoop.master.tar.gz ./hadoop
cd ~
scp ./hadoop.master.tar.gz hadoop02:/root
scp ./hadoop.master.tar.gz hadoop03:/root
On hadoop02:
tar -zxf ~/hadoop.master.tar.gz -C /usr/local
chown -R root /usr/local/hadoop
On hadoop03:
tar -zxf ~/hadoop.master.tar.gz -C /usr/local
chown -R root /usr/local/hadoop
On hadoop01, format the NameNode:
cd /usr/local/hadoop
./bin/hdfs namenode -format
Start Hadoop:
cd /usr/local/hadoop
./sbin/start-dfs.sh
./sbin/start-yarn.sh
./sbin/mr-jobhistory-daemon.sh start historyserver
jps
On hadoop02: jps
On hadoop03: jps
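For reference, given the configuration above (NameNode and ResourceManager on hadoop01, SecondaryNameNode on hadoop03, and all three hosts listed in workers), the expected processes are roughly:
hadoop01: NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer, Jps
hadoop02: DataNode, NodeManager, Jps
hadoop03: DataNode, NodeManager, SecondaryNameNode, Jps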
Back on hadoop01:
./bin/hdfs dfsadmin -report
To stop the cluster afterwards:
stop-yarn.sh
stop-dfs.sh
mr-jobhistory-daemon.sh stop historyserver
This completes the Hadoop configuration.
3. Spark configuration
(1) Extract Spark into /usr/local:
tar -zxf /root/spark-3.4.2-bin-without-hadoop.tgz -C /usr/local
cd /usr/local
mv ./spark-3.4.2-bin-without-hadoop ./spark
chown -R root ./spark
(2) Configure the relevant files:
Edit the spark-env.sh file:
cd /usr/local/spark
cp ./conf/spark-env.sh.template ./conf/spark-env.sh
nano ./conf/spark-env.sh
export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath)
Ctrl+X, Y, Enter
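A note on the line just added: this Spark build is packaged "without hadoop", so SPARK_DIST_CLASSPATH is what lets Spark find the Hadoop jars. Running the command inside $( ) by hand shows exactly what gets put on the classpath:
/usr/local/hadoop/bin/hadoop classpath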
At this point something was found to be wrong; tracing back, the problem was in ~/.bashrc, so edit that file so it ends up containing the following:
nano ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin
export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export SPARK_HOME=/usr/local/spark
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:${JAVA_HOME}/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin
export PYSPARK_PYTHON=/root/anaconda3/bin/python
Ctrl+X, Y, Enter
source ~/.bashrc
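To confirm the new variables took effect (values as configured above):
echo $SPARK_HOME        # /usr/local/spark
echo $HADOOP_CONF_DIR   # /usr/local/hadoop/etc/hadoop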
(3) Set the logging level:
cd /usr/local/spark/conf
sudo mv log4j2.properties.template log4j2.properties
vim log4j2.properties
(Spark 3.4 uses Log4j 2, so the file needs to keep the log4j2.properties name for the settings to be picked up.)
Press i to enter insert mode.
Change the rootLogger.level line so that it reads rootLogger.level = error.
Press Esc to leave insert mode, then save and quit: in command mode type :wq and press Enter.
Verify that Spark was installed successfully:
cd /usr/local/spark
./bin/run-example SparkPi
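The example prints a fair amount of log output; the result line can be filtered out directly, for example:
./bin/run-example SparkPi 2>&1 | grep "Pi is roughly"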
Use Anaconda to switch the Python version:
conda create -n pyspark python=3.8
y
Switch to the new Python environment:
conda activate pyspark
Start pyspark:
cd /usr/local/spark
./bin/pyspark
Launch pyspark in Spark on YARN mode:
cd /usr/local/spark
./bin/pyspark --master yarn
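Beyond the interactive shell, a batch job can also be submitted to YARN as a further check. A minimal sketch using the Pi example bundled with the Spark distribution (assuming it is present at examples/src/main/python/pi.py):
cd /usr/local/spark
./bin/spark-submit --master yarn examples/src/main/python/pi.py 10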
Finished successfully!