Environment Preparation
1. Hardware Requirements
Hadoop 2.x needs at least two machines (one master node and one worker node) with sufficient memory and disk space. Suggested specifications:
Master node: 4 CPU cores, 8 GB RAM, 500 GB disk
Worker node: 2 CPU cores, 4 GB RAM, 500 GB disk
2. Software Requirements
To deploy Hadoop 2.x on Linux, install the following packages:
Apache Hadoop 2.x
Java Development Kit (JDK) 1.8
Apache Maven 3.5.x
An SSH client (e.g., OpenSSH)
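Before continuing, it can save time to confirm that each prerequisite is actually installed. A small sketch (it only assumes the tools are expected on PATH under their usual command names):

```shell
# Check that each required tool is on PATH before continuing;
# anything reported MISSING still needs to be installed.
for tool in java mvn ssh; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: MISSING"
    fi
done
```

Running `java -version` and `mvn -version` afterwards confirms the versions match the requirements above (JDK 1.8, Maven 3.5.x).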
3. Network Configuration
Make sure all machines can reach one another and that the firewall allows SSH connections. On the master node, create a dedicated user and give it SSH access. To create a user named "hadoop":
sudo useradd hadoop
sudo passwd hadoop
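After creating the account, you can switch into it and confirm the shell is running as the new user (this assumes the "hadoop" user name from the commands above):

```shell
# Switch to the newly created account (prompts for the password set above)
su - hadoop
# Confirm which user the shell is now running as; should print: hadoop
whoami
```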
Downloading and Extracting Hadoop
1. Download Hadoop 2.x from the Apache Hadoop website, choosing a suitable archive format (tar.gz or tar.bz2). To download the tar.gz archive:
wget https://downloads.apache.org/hadoop/common/hadoop-2.9.3/hadoop-2.9.3.tar.gz
2. Upload the archive to the server and extract it:
tar -zxvf hadoop-2.9.3.tar.gz
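To confirm the extraction succeeded, a quick sanity check on the unpacked tree (this assumes the archive was extracted into the current directory):

```shell
# The top-level directory should contain bin/, sbin/, and etc/hadoop/
ls hadoop-2.9.3
# Fail early if the expected layout is missing
[ -d hadoop-2.9.3/bin ] && echo "layout ok"
# Run the bundled binary to verify the download is intact
hadoop-2.9.3/bin/hadoop version
```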
Configuring Hadoop
1. Configure environment variables
Edit the ~/.bashrc file and add the following lines:
export HADOOP_HOME=/path/to/hadoop-2.9.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
Save and exit, then apply the changes:
source ~/.bashrc
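After sourcing the file, it is worth verifying that the variables resolved as intended:

```shell
# Both variables should point into the extracted install tree
echo "$HADOOP_HOME"
echo "$HADOOP_CONF_DIR"
# hadoop is now found through the updated PATH; prints the build version
hadoop version
```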
2. Configure the Hadoop core components
Edit the $HADOOP_CONF_DIR/core-site.xml, $HADOOP_CONF_DIR/hdfs-site.xml, $HADOOP_CONF_DIR/mapred-site.xml, and $HADOOP_CONF_DIR/yarn-site.xml files to match your deployment. For example, to set the HDFS replication factor (in hdfs-site.xml):
<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>
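For core-site.xml, the one setting every node needs is the default filesystem URI, which tells clients and daemons where the NameNode lives. A minimal sketch, assuming the master node is reachable under the hostname `master` (adjust the host and port to your setup):

```xml
<configuration>
    <!-- Clients and daemons use this URI to locate the NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
```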
Configuring Passwordless SSH (Optional)
To simplify cluster operations, you can set up passwordless SSH. Generate an SSH key pair on the master node:
ssh-keygen -t rsa -P '' -f $HOME/.ssh/id_rsa
Copy the public key into the ~/.ssh/authorized_keys file on each worker node:
cat $HOME/.ssh/id_rsa.pub | ssh user@slave_ip "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
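Two follow-up checks are worth doing, since sshd silently ignores key files with lax permissions (the `user@slave_ip` placeholder matches the command above):

```shell
# On the worker node: tighten permissions, or sshd will ignore the key
ssh user@slave_ip "chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys"
# From the master: if key-based login works, this prints the worker's
# hostname without asking for a password
ssh user@slave_ip hostname
```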
Starting the Hadoop Components (Optional)
Once passwordless SSH is configured, you can start all Hadoop components directly from the master node:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
start-dfs.sh starts the HDFS daemons (the NameNode on the master and DataNodes on the workers); start-yarn.sh starts the YARN ResourceManager and NodeManagers; the last command starts the MapReduce JobHistory Server.
You can then monitor and manage the cluster through the web interfaces, reachable from a browser on the master, a worker, or any other machine on the network that can access them:
NameNode (HDFS): http://master_ip:50070
DataNode: http://slave_ip:50075
ResourceManager (YARN): http://master_ip:8088
NodeManager: http://slave_ip:8042
You can also work with the cluster from the command line on any node: use hdfs dfsadmin for HDFS administration, and yarn jar (or hadoop jar) to submit MapReduce jobs.
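To check that everything came up, list the running daemon processes and ask HDFS for a cluster report (jps ships with the JDK):

```shell
# On the master you should see NameNode, SecondaryNameNode, and
# ResourceManager; on workers, DataNode and NodeManager
jps
# Summarizes configured capacity and the number of live DataNodes
hdfs dfsadmin -report
```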
Original article by K-seo. If you repost it, please cite the source: https://www.kdun.cn/ask/194080.html