Setting Up a Hadoop Environment on Linux

Environment Preparation

1. Hardware Requirements

Hadoop 2.x needs at least two machines (one master node and one slave node) with sufficient memory and disk space. The specific hardware requirements are:


Master node: 4-core CPU, 8 GB RAM, 500 GB disk space

Slave node: 2-core CPU, 4 GB RAM, 500 GB disk space

2. Software Requirements

Deploying Hadoop 2.x on Linux requires the following software packages:

Apache Hadoop 2.x

Java Development Kit (JDK) 1.8

Apache Maven 3.5.x


An SSH client (e.g., OpenSSH)

3. Network Configuration

Make sure all machines can reach one another and that the firewall allows SSH connections. On the master node, create a new user and grant it SSH access. For example, to create a user named "hadoop":

sudo useradd hadoop
sudo passwd hadoop

Download and Extract Hadoop

1. Download the latest Hadoop 2.x release from the Apache Hadoop website, choosing a suitable archive format (tar.gz or tar.bz2). For example, to download the tar.gz archive:

wget https://downloads.apache.org/hadoop/common/hadoop-2.9.3/hadoop-2.9.3.tar.gz

2. Upload the archive to the server and extract it:

tar -zxvf hadoop-2.9.3.tar.gz
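If you want to dry-run the extraction flags before the real download finishes, the same tar invocation can be exercised against a small synthetic tarball with the same layout (all names below are throwaway paths used only for this sketch):

```shell
# Build a tiny stand-in archive laid out like the Hadoop tarball
mkdir -p demo-src/hadoop-2.9.3/bin demo-out
tar -czf hadoop-demo.tar.gz -C demo-src hadoop-2.9.3

# Same flags as above: -z (gzip), -x (extract), -v (verbose), -f (archive file)
tar -zxvf hadoop-demo.tar.gz -C demo-out

# The archive unpacks into a hadoop-2.9.3/ directory
ls demo-out/hadoop-2.9.3   # prints: bin
```

The real archive behaves the same way: extraction produces a hadoop-2.9.3/ directory in the current working directory (or wherever -C points).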

Configure Hadoop

1. Configure Environment Variables

Edit the ~/.bashrc file and add the following lines:


export HADOOP_HOME=/path/to/hadoop-2.9.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Save and exit, then run the following command to apply the changes:

source ~/.bashrc
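The effect of these exports can be sanity-checked in a shell. The sketch below uses /opt/hadoop-2.9.3 as an assumed install path (substitute your actual extraction directory):

```shell
# Same exports as above, with /opt/hadoop-2.9.3 as an assumed install path
export HADOOP_HOME=/opt/hadoop-2.9.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Confirm the derived configuration directory
echo "$HADOOP_CONF_DIR"   # prints /opt/hadoop-2.9.3/etc/hadoop
```

If `hadoop version` also runs after this, the PATH entries are correct.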

2. Configure Hadoop Core Component Parameters

Edit the $HADOOP_CONF_DIR/core-site.xml, $HADOOP_CONF_DIR/hdfs-site.xml, $HADOOP_CONF_DIR/mapred-site.xml, and $HADOOP_CONF_DIR/yarn-site.xml files to suit your deployment. For example, to set the HDFS replication factor in hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
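For orientation, here is a minimal sketch of typical entries in the other three files, assuming a master host named "master" (a hypothetical hostname; use your own) and Hadoop 2.x default ports. Each property goes inside that file's <configuration> element:

```xml
<!-- core-site.xml: HDFS entry point -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>

<!-- mapred-site.xml: run MapReduce on YARN -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

<!-- yarn-site.xml: shuffle service needed by MapReduce -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
```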

Configure Passwordless SSH Login (Optional)

To simplify administration, you can set up passwordless SSH login. On the master node, generate an SSH key pair:

ssh-keygen -t rsa -P '' -f $HOME/.ssh/id_rsa

Copy the public key into the ~/.ssh/authorized_keys file on each slave node:

cat $HOME/.ssh/id_rsa.pub | ssh user@slave_ip 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
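The key-generation step can be exercised safely in a scratch directory, so an existing ~/.ssh/id_rsa is never overwritten; the slave hostname and the copy step are omitted from this sketch:

```shell
# Generate a throwaway RSA key pair in a temporary directory
tmpdir=$(mktemp -d)
ssh-keygen -t rsa -P '' -f "$tmpdir/id_rsa" -q

# The .pub file is what gets appended to the slave's authorized_keys;
# its first field names the key type
key_type=$(cut -d' ' -f1 "$tmpdir/id_rsa.pub")
echo "$key_type"   # prints ssh-rsa
rm -rf "$tmpdir"
```

After copying the real key, `ssh user@slave_ip` from the master should log in without prompting for a password.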

Start the Hadoop Components

If passwordless SSH login is configured, you can start all Hadoop components directly from the master node with the following commands:

start-dfs.sh                                  # starts the HDFS daemons (NameNode on the master, DataNodes on the slaves)
start-yarn.sh                                 # starts the YARN daemons (ResourceManager on the master, NodeManagers on the slaves)
mr-jobhistory-daemon.sh start historyserver   # starts the MapReduce JobHistory Server

Once the daemons are up, you can monitor and manage the cluster through the web interfaces, from the master machine or any other machine on the network that can reach it:

HDFS NameNode: http://master_ip:50070

HDFS DataNode: http://slave_ip:50075 (one per DataNode)

YARN ResourceManager: http://master_ip:8088

YARN NodeManager: http://slave_ip:8042 (one per NodeManager)

You can also check cluster status from the command line, for example with hdfs dfsadmin -report and yarn node -list.

Original article by K-seo. If reposting, please credit the source: https://www.kdun.cn/ask/194080.html
