Setting Up a Hadoop Environment on Linux

Environment Preparation

1. Hardware Requirements

Hadoop 2.x requires at least two machines (one master node and one slave node), along with sufficient memory and disk space. The specific hardware requirements are as follows:


Master node: 4-core CPU, 8 GB RAM, 500 GB of disk space

Slave node: 2-core CPU, 4 GB RAM, 500 GB of disk space

2. Software Requirements

To deploy Hadoop 2.x on Linux, the following packages need to be installed (a quick way to verify them is sketched after the list):

Apache Hadoop 2.x

Java Development Kit (JDK) 1.8

Apache Maven 3.5.x (only required if you build Hadoop from source)


An SSH client (such as OpenSSH)
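A minimal check, assuming the JDK, Maven, and SSH client were installed through your distribution's package manager, is to confirm each one is on the PATH and reports the expected version:

java -version     # should report a 1.8.x JDK
mvn -version      # should report Maven 3.5.x (only needed for source builds)
ssh -V            # confirms an SSH client is available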

3. Network Configuration

Make sure all machines can reach each other and that the firewall allows SSH connections. On the master node, create a new user and give it SSH access. For example, to create a user named "hadoop":

sudo useradd hadoop
sudo passwd hadoop
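As a hedged sketch of the network side (the host names and IP addresses below are placeholders, and the firewall commands assume a firewalld-based distribution), map the node names in /etc/hosts on every machine and open the SSH service:

# /etc/hosts entries on every node (example addresses)
192.168.1.10  master
192.168.1.11  slave1

# allow SSH through the firewall (firewalld example)
sudo firewall-cmd --permanent --add-service=ssh
sudo firewall-cmd --reload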

Download and Extract Hadoop

1. Download the latest Hadoop 2.x release from the Apache Hadoop website, choosing a suitable archive format (tar.gz or tar.bz2). For example, to download the tar.gz archive:

wget https://downloads.apache.org/hadoop/common/hadoop-2.9.3/hadoop-2.9.3.tar.gz

2. Upload the archive to the server and extract it:

tar -zxvf hadoop-2.9.3.tar.gz
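Optionally, move the extracted directory to a permanent location and hand ownership to the hadoop user; the /opt target below is an assumption for illustration, so substitute whatever path you later reference as HADOOP_HOME:

sudo mv hadoop-2.9.3 /opt/
sudo chown -R hadoop:hadoop /opt/hadoop-2.9.3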

Configure Hadoop

1. Configure Environment Variables

Edit the ~/.bashrc file and add the following lines:


export HADOOP_HOME=/path/to/hadoop-2.9.3
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Save and exit, then run the following command to apply the changes:

source ~/.bashrc
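Hadoop also needs to know where the JDK is installed. As a hedged sketch (the JDK path below is an assumption; use the path of your actual JDK 1.8 installation), set JAVA_HOME in $HADOOP_CONF_DIR/hadoop-env.sh:

# in $HADOOP_CONF_DIR/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk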

2. Configure the Hadoop Core Components

Edit the $HADOOP_CONF_DIR/core-site.xml, $HADOOP_CONF_DIR/hdfs-site.xml, $HADOOP_CONF_DIR/mapred-site.xml, and $HADOOP_CONF_DIR/yarn-site.xml files according to your requirements. For example, to set the HDFS replication factor in hdfs-site.xml:

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
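As another hedged example (the host name master and port 9000 are assumptions; point them at your actual NameNode), core-site.xml usually tells clients where HDFS lives via fs.defaultFS:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>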

Configure Passwordless SSH Login (Optional)

To make administration easier, you can set up passwordless SSH login. Generate an SSH key pair on the master node:

ssh-keygen -t rsa -P '' -f $HOME/.ssh/id_rsa

Append the public key to the ~/.ssh/authorized_keys file on each slave node:

cat $HOME/.ssh/id_rsa.pub | ssh user@slave_ip "mkdir -p $HOME/.ssh && cat >> $HOME/.ssh/authorized_keys"
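Where ssh-copy-id is available it does the same thing in one step, and a test login confirms the setup (user and slave_ip are the placeholders from the command above):

ssh-copy-id user@slave_ip
ssh user@slave_ip hostname   # should print the slave's hostname without asking for a password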

Start the Hadoop Components (Optional)

If passwordless SSH has been configured, all of the Hadoop daemons can be started directly from the master node with the commands shown below.
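Before the very first start, the NameNode metadata directory has to be formatted once. Run this only on a fresh installation, since it erases any existing HDFS metadata:

hdfs namenode -format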

start-dfs.sh                                   # start the HDFS daemons (NameNode, SecondaryNameNode, DataNodes)
start-yarn.sh                                  # start the YARN daemons (ResourceManager, NodeManagers)
mr-jobhistory-daemon.sh start historyserver    # start the MapReduce JobHistory server

Once the daemons are running, the cluster can be monitored and managed through its web interfaces, reachable from any machine on the network that can access the cluster:

NameNode (HDFS): http://master_ip:50070

DataNode: http://master_ip:50075

ResourceManager (YARN): http://master_ip:8088

NodeManager: http://master_ip:8042
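A quick way to confirm that everything started is the JDK's jps tool; on the master you should see processes such as NameNode, SecondaryNameNode, ResourceManager, and JobHistoryServer, while each slave should show DataNode and NodeManager:

jps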
