Setting up a Hadoop distributed environment

Installation environment

  • CentOS 8
  • JDK: 1.8.0_231
  • Hadoop: 3.2.1

Host preparation

  • Hostname: hadoop (set it via vi /etc/hostname)

  • Map the hostname to the host's IP address:

    ```bash
    vi /etc/hosts
    # add a line mapping this host's IP to the hostname, e.g.
    192.168.1.103 hadoop
    ```
  • Make sure the firewall is down:

    ```bash
    systemctl status firewalld
    ```
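
    If it is still active, stop and disable it. A minimal sketch (firewalld is the stock CentOS 8 firewall service; run as root):

    ```bash
    # stop the firewall now and keep it from starting at boot
    systemctl stop firewalld
    systemctl disable firewalld
    ```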
  • Create the hadoop user:

    ```bash
    useradd hadoop
    passwd hadoop
    ```
  • Configure the JDK in the hadoop user's environment variables (/home/hadoop/.bash_profile); see the sketch below.
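
    A minimal sketch, assuming the JDK has been unpacked to /usr/jdk1.8.0_231 (the path used in the rest of this post):

    ```bash
    # /home/hadoop/.bash_profile
    JAVA_HOME=/usr/jdk1.8.0_231
    PATH=$JAVA_HOME/bin:$PATH

    export JAVA_HOME
    export PATH
    ```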

Installing Hadoop

  • Configure passwordless SSH login for the hadoop user

    1. Generate a key pair and authorize it:

      ```bash
      cd ~
      ssh-keygen -t rsa
      cd .ssh
      cat id_rsa.pub >> authorized_keys
      chmod 600 authorized_keys
      chmod 700 ~/.ssh/
      ```

    2. Verify the passwordless login:

      ```bash
      ssh hadoop
      exit
      ```
  • Upload the installation packages (e.g. to /home/hadoop/tools/)

  • Extract the archive and move it to /usr/:

    ```bash
    tar zxvf hadoop-3.2.1.tar.gz
    su
    mv /home/hadoop/tools/hadoop-3.2.1 /usr/
    su - hadoop
    ```
  • Add the Hadoop environment variables to ~/.bash_profile:

    ```bash
    $ vi ~/.bash_profile

    JAVA_HOME=/usr/jdk1.8.0_231
    HADOOP_HOME=/usr/hadoop-3.2.1
    PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH

    export JAVA_HOME
    export HADOOP_HOME
    export PATH

    $ source .bash_profile
    ```
  • Hadoop's basic configuration file, hadoop-env.sh:

    ```bash
    $ cd /usr/hadoop-3.2.1/etc/hadoop

    $ vi hadoop-env.sh

    # around line 54 of the file, set:
    export JAVA_HOME=/usr/jdk1.8.0_231
    ```
  • Test that the basic setup is complete:

    ```bash
    hadoop version
    Hadoop 3.2.1
    Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
    Compiled by rohithsharmaks on 2019-09-10T15:56Z
    Compiled with protoc 2.5.0
    From source with checksum 776eaf9eee9c0ffc370bcbc1888737
    This command was run using /usr/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
    ```
  • Prepare the directory /usr/local/hadoop, used as the temporary directory for job execution and for data storage:

    ```bash
    cd /usr/local/
    mkdir hadoop                  # create the directory first (as root)
    chown hadoop:hadoop hadoop
    ```
  • Core configuration

    1. core-site.xml  `$HADOOP_HOME/etc/hadoop/core-site.xml`

      ```bash
      $ vi /usr/hadoop-3.2.1/etc/hadoop/core-site.xml
      ```

      ```xml
      <!-- default filesystem URI -->
      <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop:9000</value>
      </property>

      <!-- base directory for temporary files -->
      <property>
      <name>hadoop.tmp.dir</name>
      <value>/usr/local/hadoop/tmp</value>
      </property>
      ```
    2. hdfs-site.xml  `$HADOOP_HOME/etc/hadoop`

      ```xml
      <!-- number of replicas for files stored on HDFS -->
      <property>
      <name>dfs.replication</name>
      <value>1</value>
      </property>
      <!-- HDFS web UI listen address -->
      <property>
      <name>dfs.namenode.http-address</name>
      <value>hadoop:9870</value>
      </property>

      <!-- namenode data storage path -->
      <property>
      <name>dfs.namenode.name.dir</name>
      <value>/usr/local/hadoop/dfs/name</value>
      </property>

      <!-- datanode data storage path -->
      <property>
      <name>dfs.datanode.data.dir</name>
      <value>/usr/local/hadoop/dfs/data</value>
      </property>
      ```
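
      With this in place, the NameNode web UI should be reachable at http://hadoop:9870/ once HDFS is started.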
    3. mapred-site.xml

      ```xml
      <!-- run MapReduce jobs on YARN -->
      <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
      </property>

      <!-- classpath for MapReduce applications -->
      <property>
      <name>mapreduce.application.classpath</name>
      <value>
      /usr/hadoop-3.2.1/etc/hadoop:/usr/hadoop-3.2.1/share/hadoop/common/lib/*:/usr/hadoop-3.2.1/share/hadoop/common/*:/usr/hadoop-3.2.1/share/hadoop/hdfs:/usr/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/usr/hadoop-3.2.1/share/hadoop/hdfs/*:/usr/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/usr/hadoop-3.2.1/share/hadoop/mapreduce/*:/usr/hadoop-3.2.1/share/hadoop/yarn:/usr/hadoop-3.2.1/share/hadoop/yarn/lib/*:/usr/hadoop-3.2.1/share/hadoop/yarn/*
      </value>
      </property>
      ```
    4. yarn-site.xml

      ```xml
      <!-- hostname of the ResourceManager -->
      <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop</value>
      </property>
      <!-- auxiliary service required by the MapReduce shuffle -->
      <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
      </property>
      <property>
      <name>mapreduce.application.classpath</name>
      <value>
      /usr/hadoop-3.2.1/etc/hadoop:/usr/hadoop-3.2.1/share/hadoop/common/lib/*:/usr/hadoop-3.2.1/share/hadoop/common/*:/usr/hadoop-3.2.1/share/hadoop/hdfs:/usr/hadoop-3.2.1/share/hadoop/hdfs/lib/*:/usr/hadoop-3.2.1/share/hadoop/hdfs/*:/usr/hadoop-3.2.1/share/hadoop/mapreduce/lib/*:/usr/hadoop-3.2.1/share/hadoop/mapreduce/*:/usr/hadoop-3.2.1/share/hadoop/yarn:/usr/hadoop-3.2.1/share/hadoop/yarn/lib/*:/usr/hadoop-3.2.1/share/hadoop/yarn/*
      </value>
      </property>
      ```
    5. workers

      ```
      hadoop
      ```

      The workers file lists the hosts that run the DataNode and NodeManager; on this single-node setup it contains only the host itself.

Starting the Hadoop cluster

  • When Hadoop is started for the first time, the namenode must be formatted:

    ```bash
    $ hdfs namenode -format

    2020-07-07 14:25:59,234 INFO common.Storage: Storage directory /usr/local/hadoop/dfs/name has been successfully formatted.
    ```
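
    Format only once: reformatting wipes the namenode metadata and strands any data already written to the datanodes.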
  • Start the two services (HDFS and YARN) and check the daemons with jps:

    ```bash
    $ start-dfs.sh

    $ start-yarn.sh

    $ jps

    14337 Jps
    11608 jar
    13610 SecondaryNameNode
    13867 ResourceManager
    13388 DataNode
    13261 NameNode
    13997 NodeManager
    ```
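
    As a quick sanity check, the two web UIs should also respond: the NameNode UI on the port configured above, and the ResourceManager UI on its default port 8088. A sketch using curl:

    ```bash
    curl -s -o /dev/null -w "%{http_code}\n" http://hadoop:9870/   # NameNode web UI
    curl -s -o /dev/null -w "%{http_code}\n" http://hadoop:8088/   # ResourceManager web UI
    ```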
  • Common HDFS file commands

  1. Check HDFS safe-mode status

    ```bash
    $ hdfs dfsadmin -safemode get

    Safe mode is OFF

    # Note: "Safe mode is OFF" means HDFS has left safe mode and data can be read and written
    ```
  2. List the root directory

    ```bash
    $ hdfs dfs -ls /
    ```
  3. Create a directory

    ```bash
    $ hdfs dfs -mkdir /data

    $ hdfs dfs -ls /
    ```
  4. Create nested directories recursively

    ```bash
    $ hdfs dfs -mkdir -p /data/subdata/input

    $ hdfs dfs -ls -R /
    ```
  5. Upload a local file to an HDFS directory

    ```bash
    $ hdfs dfs -put jdk-8u231-linux-x64.tar.gz /data/

    $ hdfs dfs -ls /data
    ```
  6. Download an HDFS file to the local filesystem

    ```bash
    $ hdfs dfs -get /data/jdk-8u231-linux-x64.tar.gz ./

    $ ll
    ```
  7. Copy a file

    ```bash
    $ hdfs dfs -cp /data/jdk-8u231-linux-x64.tar.gz /data/subdata/jdk.tar.gz
    ```
  8. Delete a file or directory

    ```bash
    $ hdfs dfs -rm -r /data/subdata/
    ```
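
    Note: once the trash is enabled (see the last section), -rm moves files into the current user's trash directory instead of deleting them outright; add -skipTrash to delete immediately:

    ```bash
    $ hdfs dfs -rm -r -skipTrash /data/subdata/
    ```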

HDFS administration commands

  • Safe mode

    ```bash
    $ hdfs dfsadmin -safemode get

    # usage: hdfs dfsadmin -safemode get|enter|leave|wait
    ```
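
    For example, entering and leaving safe mode by hand (HDFS is read-only while safe mode is on):

    ```bash
    $ hdfs dfsadmin -safemode enter   # HDFS now rejects writes
    $ hdfs dfsadmin -safemode leave   # back to normal read-write operation
    ```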
  • The report command

    ```bash
    $ hdfs dfsadmin -report

    Configured Capacity: 37558423552 (34.98 GB)
    Present Capacity: 29635686400 (27.60 GB)
    DFS Remaining: 29440000000 (27.42 GB)
    DFS Used: 195686400 (186.62 MB)
    DFS Used%: 0.66%
    Replicated Blocks:
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0
    Erasure Coded Block Groups:
    Low redundancy block groups: 0
    Block groups with corrupt internal blocks: 0
    Missing block groups: 0
    Low redundancy blocks with highest priority to recover: 0
    Pending deletion blocks: 0

    -------------------------------------------------
    Live datanodes (1):

    Name: 192.168.1.103:9866 (hadoop)
    Hostname: hadoop
    Decommission Status : Normal
    Configured Capacity: 37558423552 (34.98 GB)
    DFS Used: 195686400 (186.62 MB)
    Non DFS Used: 7922737152 (7.38 GB)
    DFS Remaining: 29440000000 (27.42 GB)
    DFS Used%: 0.52%
    DFS Remaining%: 78.38%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Tue Jul 07 14:54:09 CST 2020
    Last Block Report: Tue Jul 07 14:28:45 CST 2020
    Num of Blocks: 2
    ```

HDFS trash

  • Disabled by default.

  • To enable the trash, add a property to core-site.xml, as sketched below.
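
    A minimal sketch: fs.trash.interval sets how many minutes deleted files are kept in the trash before being purged, and the default value 0 disables the trash entirely.

    ```xml
    <!-- keep deleted files in the trash for 1 day (1440 minutes) before purging -->
    <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
    </property>
    ```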
      
