How I Built a Hadoop 3.3.4 Cluster on Three VMs Running the Home-Grown OpenEuler 24.03 (Complete Scripts Included)

张开发
2026/4/8 2:45:07 · 15 min read


# A Hands-On Guide to Building a Hadoop 3.3.4 Cluster on OpenEuler 24.03

## 1. Environment Preparation and System Configuration

Before deploying a Hadoop cluster on OpenEuler 24.03, China's home-grown Linux distribution, some groundwork is required. Unlike the more common CentOS or Ubuntu, OpenEuler has its own quirks in package management and system configuration. Prepare three identically configured virtual machines, each with at least 4 GB of RAM and 50 GB of disk. The detailed steps follow.

### 1.1 System Installation and Basic Configuration

Installing OpenEuler 24.03:

- Download the ISO image from the official site and choose the server installation profile when creating each VM.
- Suggested partitioning: `/boot` 1 GB, swap 4 GB, `/` the remaining space.
- Create a regular user `hadoop` and grant it sudo privileges.

Network configuration:

```bash
# Edit the NIC configuration
sudo vim /etc/sysconfig/network-scripts/ifcfg-ens33
```

Key parameters:

```ini
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.90.100.101   # adjust to your own addressing plan
GATEWAY=10.90.100.2
DNS1=114.114.114.114
```

Hostname and hosts file:

```bash
# Set the hostname
sudo hostnamectl set-hostname node1
# Edit the hosts file
sudo vim /etc/hosts
```

Add:

```
10.90.100.101 node1
10.90.100.102 node2
10.90.100.103 node3
```

### 1.2 System Tuning and Dependencies

A few adjustments to OpenEuler's defaults are needed.

Disable the firewall and SELinux:

```bash
sudo systemctl stop firewalld
sudo systemctl disable firewalld
sudo setenforce 0
sudo sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
```

Install the required dependencies:

```bash
sudo dnf install -y rsync openssh-clients vim wget
```

Set up passwordless SSH:

```bash
ssh-keygen -t rsa
ssh-copy-id node1
ssh-copy-id node2
ssh-copy-id node3
```

## 2. JDK Installation and Environment Setup

Hadoop 3.3.4 requires a Java 8 or Java 11 runtime; JDK 11 is used here.

### 2.1 Installing the JDK

Download and unpack the JDK:

```bash
wget https://download.java.net/openjdk/jdk11/ri/openjdk-11+28_linux-x64_bin.tar.gz
sudo mkdir -p /opt/software
sudo tar -xzf openjdk-11+28_linux-x64_bin.tar.gz -C /opt/software/
sudo ln -s /opt/software/jdk-11 /opt/software/jdk
```

Configure the environment variables:

```bash
sudo vim /etc/profile.d/java.sh
```

Add:

```bash
export JAVA_HOME=/opt/software/jdk
export PATH=$JAVA_HOME/bin:$PATH
```

Verify the installation:

```bash
source /etc/profile.d/java.sh
java -version
```

### 2.2 Syncing Across the Cluster

Use rsync to push the JDK and its configuration to the other nodes:

```bash
# Create the distribution script xsync
mkdir -p ~/bin
cat > ~/bin/xsync << 'EOF'
#!/bin/bash
if [ $# -lt 1 ]; then
  echo "Usage: xsync file_or_dir"
  exit 1
fi
for host in node2 node3; do
  rsync -avz "$1" "$host:$1"
done
EOF
chmod +x ~/bin/xsync

# Sync the JDK and the environment variables
xsync /opt/software/jdk-11
xsync /opt/software/jdk
xsync /etc/profile.d/java.sh
```
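The `xsync` script copies files out to the workers; at several later steps it is also handy to run one command on every node, for example `java -version` after syncing the JDK, or `jps` after starting the cluster. Below is a minimal companion sketch under the same assumptions (passwordless SSH and the node1–node3 names above); the `NODES` and `XCALL_DRY_RUN` variables are hypothetical knobs added here so the loop can be exercised without a live cluster.

```shell
# xcall: run the same command on every cluster node over SSH (sketch).
# Assumes passwordless SSH and the node names from section 1.1.
# NODES and XCALL_DRY_RUN are hypothetical additions for this sketch.
NODES=${NODES:-"node1 node2 node3"}

xcall() {
  local host
  for host in $NODES; do
    echo "========== $host =========="
    if [ "${XCALL_DRY_RUN:-0}" = "1" ]; then
      # Dry run: print the command instead of executing it
      echo "ssh $host $*"
    else
      ssh "$host" "$@"
    fi
  done
}

# Example: xcall java -version
```

Saving this as `~/bin/xcall` next to `xsync` gives you a two-command toolkit for the rest of the walkthrough.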
## 3. Hadoop Cluster Deployment

### 3.1 Installing and Configuring Hadoop

Download and unpack Hadoop:

```bash
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
sudo tar -xzf hadoop-3.3.4.tar.gz -C /opt/software/
sudo ln -s /opt/software/hadoop-3.3.4 /opt/software/hadoop
```

Configure the environment variables:

```bash
sudo vim /etc/profile.d/hadoop.sh
```

Add:

```bash
export HADOOP_HOME=/opt/software/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
```

Edit the core configuration files.

`core-site.xml`:

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9820</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/software/hadoop/data</value>
  </property>
</configuration>
```

`hdfs-site.xml`:

```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>node1:9870</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node3:9868</value>
  </property>
</configuration>
```

`yarn-site.xml`:

```xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node2</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```

`workers` file:

```
node1
node2
node3
```

### 3.2 Syncing and Verifying the Cluster

Sync the Hadoop installation to the rest of the cluster:

```bash
xsync /opt/software/hadoop-3.3.4
xsync /opt/software/hadoop
xsync /etc/profile.d/hadoop.sh
```

Format HDFS (first run only):

```bash
hdfs namenode -format
```

Start the cluster:

```bash
# On node1, start HDFS
start-dfs.sh
# On node2, start YARN
start-yarn.sh
```

Verify the services:

```bash
# Check the Java processes
jps
# Web UIs:
# HDFS: http://node1:9870
# YARN: http://node2:8088
```

## 4. Cluster Management Scripts and Tuning

### 4.1 An Automation Script

Create a management script `hdp.sh` to simplify day-to-day operations:

```bash
cat > ~/bin/hdp.sh << 'EOF'
#!/bin/bash
case $1 in
start)
  echo "Starting Hadoop cluster..."
  ssh node1 start-dfs.sh
  ssh node2 start-yarn.sh
  ;;
stop)
  echo "Stopping Hadoop cluster..."
  ssh node2 stop-yarn.sh
  ssh node1 stop-dfs.sh
  ;;
status)
  echo "Cluster status:"
  for node in node1 node2 node3; do
    echo "$node"
    ssh "$node" jps
  done
  ;;
*)
  echo "Usage: hdp.sh [start|stop|status]"
  ;;
esac
EOF
chmod +x ~/bin/hdp.sh
```

### 4.2 Performance Tuning Suggestions

Memory settings, in `yarn-site.xml`:

```xml
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
```

HDFS block size, in `hdfs-site.xml`:

```xml
<property>
  <name>dfs.blocksize</name>
  <value>256m</value>
</property>
```

Log aggregation, in `yarn-site.xml`:

```xml
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
```

### 4.3 Troubleshooting Common Problems

A node fails to join the cluster:

- Check that passwordless SSH is configured correctly.
- Verify that the hosts file is identical on every node.
- Confirm that the firewall is disabled.

HDFS formatting fails:

```bash
# Clear the temporary directory and format again
rm -rf /opt/software/hadoop/data/*
hdfs namenode -format
```

YARN resource allocation issues:

- Check the `yarn.nodemanager.resource.memory-mb` setting.
- Verify that the node has enough physical memory.

In a real deployment, tuning these parameters to the actual hardware is the key. OpenEuler works well with Hadoop, but be aware that some system-level settings differ from other common Linux distributions.
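The status check above only prints raw `jps` output per node; deciding whether that output is healthy still takes a mental checklist. The helper below is my own sketch of that checklist as a function. The function name is made up, but the per-node daemon layout follows the configuration in this guide: NameNode on node1, ResourceManager on node2, SecondaryNameNode on node3, with DataNode and NodeManager on every worker.

```shell
# check_daemons NODE: read `jps` output on stdin and report any expected
# Hadoop daemon that is not running on that node (sketch; layout assumed
# from the core-site/hdfs-site/yarn-site configs above).
check_daemons() {
  local node=$1 out d expected missing=0
  out=$(cat)
  case $node in
    node1) expected="NameNode DataNode NodeManager" ;;
    node2) expected="ResourceManager DataNode NodeManager" ;;
    node3) expected="SecondaryNameNode DataNode NodeManager" ;;
    *) echo "unknown node: $node" >&2; return 2 ;;
  esac
  for d in $expected; do
    # jps prints "PID Name"; -w matches the daemon name as a whole word
    if ! printf '%s\n' "$out" | grep -qw "$d"; then
      echo "$node: missing $d"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "$node: OK"
  fi
  return "$missing"
}

# Example: ssh node1 jps | check_daemons node1
```

A non-zero exit status makes the check easy to wire into the `status` branch of `hdp.sh` or a cron-driven health probe.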
