Section 12: Enterprise Development Case Study: Aggregation - Learn the Full Big Data Framework Stack with Teacher Yu

Aggregation

Objective: Flume-1 on app-11 monitors the file /hadoop/test/group.log, and Flume-2 on app-12 monitors the data stream on a port. Flume-1 and Flume-2 both send their data to Flume-3 on app-13, which prints the merged data to the console.

Analysis: [figure]

Procedure:

I. Preparation

1. Create FlumeSqoopC1, C2, and C3.

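Enter C1.

(1) Authenticate the three machines
Switch to the root user.
Command: sudo /bin/bash
Change to the /hadoop directory and check which folders it contains.
Command: cd /hadoop/
Run the initHosts.sh script to authenticate the three machines.
Command: ./initHosts.sh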

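(2) Start the cluster
1. Switch to the hadoop user (password: Yhf_1018).
Command: su - hadoop
2. Change to the hadoop root directory.
Command: cd /hadoop/
3. Run startAll.sh. This script contains every start command for the three machines.
Command: ./startAll.sh

(3) Create the test file and the job directories
1. Create the test folder, change into it, and create the group.log file.
Commands:
mkdir test
cd test
touch group.log
2. Change to the Flume/apache-flume-1.9.0-bin/ directory and create the job folder.
Commands:
cd ..
cd Flume/apache-flume-1.9.0-bin/
mkdir job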

3. Create the job folder under Flume/apache-flume-1.9.0-bin/ on app-12 and app-13 as well.
Commands:
ssh hadoop@app-12 "cd /hadoop/Flume/apache-flume-1.9.0-bin/ && mkdir job"
ssh hadoop@app-13 "cd /hadoop/Flume/apache-flume-1.9.0-bin/ && mkdir job"

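II. Running the experiment

app-11
1. Change to the job directory.
Command: cd job
2. Create the f1.conf configuration file. Its Source monitors the group.log file, and its Sink forwards the data to the next-level Flume.
Command: vi f1.conf
Press a or i to enter insert mode, then add the following content to the file.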

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /hadoop/test/group.log
a1.sources.r1.shell = /bin/bash -c
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = app-13
a1.sinks.k1.port = 4141
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
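A frequent slip when typing an agent file by hand is a binding that names a channel which was never declared. The following is a minimal sketch in Python, not part of Flume, that checks the bindings of f1.conf for that kind of mismatch. The helper names `parse_props` and `check_wiring` are hypothetical, and the parser assumes simple `key = value` lines only.

```python
# A trimmed copy of f1.conf from above (capacity settings omitted).
F1_CONF = """\
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /hadoop/test/group.log
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = app-13
a1.sinks.k1.port = 4141
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""


def parse_props(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent):
    """Return a list of problems found in the source/sink/channel bindings."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for src in props.get(f"{agent}.sources", "").split():
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} has no channels")
        for ch in bound:
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    for sink in props.get(f"{agent}.sinks", "").split():
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch!r}")
    return problems


print(check_wiring(parse_props(F1_CONF), "a1"))  # prints [] when the wiring is consistent
```

Note that a source binds with the plural key `channels` while a sink binds with the singular `channel`; the sketch mirrors that difference.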

3. Install the nc tool.
Command: sudo yum install -y nc
The installation succeeds.

app-12
4. Log in to app-12 (passwordless SSH).
Command: ssh app-12

Change to the /hadoop/Flume/apache-flume-1.9.0-bin/job directory and create the f2.conf configuration file.
Commands:
cd /hadoop/Flume/apache-flume-1.9.0-bin/job
vi f2.conf

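Configure a Source that listens to the data stream on port 44444 and a Sink that forwards the data to the next-level Flume. Press a or i to enter insert mode, then add the following content to the file.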

# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Describe/configure the source
a2.sources.r1.type = netcat
a2.sources.r1.bind = app-12
a2.sources.r1.port = 44444
# Describe the sink
a2.sinks.k1.type = avro
a2.sinks.k1.hostname = app-13
a2.sinks.k1.port = 4141
# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

Press Esc to leave insert mode, then type :wq to save and quit.
5. Install the nc tool.
Command: sudo yum install -y nc

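app-13
6. Log in to app-13 (passwordless SSH).
Command: ssh app-13
Change to the /hadoop/Flume/apache-flume-1.9.0-bin/job directory and create the f3.conf configuration file. Its source receives the data streams sent by Flume-1 and Flume-2, and its sink writes the merged result to the console.
Commands:
cd /hadoop/Flume/apache-flume-1.9.0-bin/job
vi f3.conf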

Press a or i to enter insert mode, then add the following content to the file.

# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1
# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.bind = app-13
a3.sources.r1.port = 4141
# Describe the sink
a3.sinks.k1.type = logger
# Describe the channel
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

Press Esc to leave insert mode, then type :wq to save and quit.
7. Install the nc tool.
Command: sudo yum install -y nc
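Aggregation only works if both upstream Avro sinks point at the host and port on which the Avro source of Flume-3 listens. As a rough sanity check, the sketch below compares the values copied from f1.conf, f2.conf, and f3.conf. The function name `mismatched_sinks` and the dictionary layout are hypothetical, and comparing `hostname` to `bind` literally is a simplification that suits this lab; it would not hold if the source were bound to a wildcard address such as 0.0.0.0.

```python
# Values copied from f1.conf, f2.conf and f3.conf above.
UPSTREAM_SINKS = {
    "a1.k1": {"type": "avro", "hostname": "app-13", "port": 4141},
    "a2.k1": {"type": "avro", "hostname": "app-13", "port": 4141},
}
AGGREGATOR_SOURCE = {"type": "avro", "bind": "app-13", "port": 4141}


def mismatched_sinks(sinks, source):
    """Return the names of sinks that would not reach the aggregator's source."""
    bad = []
    for name, sink in sorted(sinks.items()):
        same_endpoint = (sink["hostname"], sink["port"]) == (source["bind"], source["port"])
        if sink["type"] != source["type"] or not same_endpoint:
            bad.append(name)
    return bad


print(mismatched_sinks(UPSTREAM_SINKS, AGGREGATOR_SOURCE))  # prints []
```

An empty list means every upstream sink targets the aggregator; any name in the list identifies the sink whose hostname, port, or type needs another look.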

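8. Run the configuration files. On app-11, app-12, and app-13, from the /hadoop/Flume/apache-flume-1.9.0-bin/ directory, start an agent with the corresponding configuration file: f1.conf, f2.conf, and f3.conf respectively.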

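Start app-13 first so that its Avro source is already listening on port 4141 when the upstream Avro sinks connect.

app-13:
Command: flume-ng agent --name a3 --conf-file job/f3.conf -Dflume.root.logger=INFO,console
app-12:
Command: flume-ng agent --name a2 --conf-file job/f2.conf
app-11:
Command: flume-ng agent --name a1 --conf-file job/f1.conf

9. Open a new terminal and log in to app-12.
Command: ssh app-12
Send data to port 44444 on app-12.
Command: nc app-12 44444
Type bcd and press Enter.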
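What nc does in step 9 is open a TCP connection and send each typed line terminated by a newline. A minimal Python sketch of the same idea follows; `send_line` and `receive_one_line` are hypothetical helpers, and the demonstration talks to a throwaway local listener rather than to Flume so that it runs on any machine.

```python
import socket
import threading


def send_line(host, port, line, timeout=5.0):
    """Open a TCP connection, send one newline-terminated line, then close."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("utf-8") + b"\n")


def receive_one_line(server, received):
    """Stand-in for the netcat source: accept one client and store one line."""
    conn, _ = server.accept()
    with conn:
        received.append(conn.makefile("rb").readline())


# Local demonstration against a throwaway listener, so the sketch runs anywhere.
server = socket.create_server(("127.0.0.1", 0))  # port 0 = any free port
received = []
worker = threading.Thread(target=receive_one_line, args=(server, received))
worker.start()
send_line("127.0.0.1", server.getsockname()[1], "bcd")
worker.join()
server.close()
print(received)  # prints [b'bcd\n']
```

On the lab cluster, calling `send_line("app-12", 44444, "bcd")` while Flume-2 is running would play the same role as typing bcd into nc.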

10. On app-11, append content to group.log in the /hadoop/test directory.
Commands:
cd test
echo abc >> group.log

11. Check the data on app-13. The Flume-3 console should show events carrying both bcd (from Flume-2) and abc (from Flume-1).