
Spark Streaming (Part 2): Flume

Published: 2024/9/18 (豆豆)


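Analysis of the current situation

How do we move data from other servers onto Hadoop?

Option 1: copy the files with a shell script (cp) to a machine in the Hadoop cluster, then upload them with hadoop fs -put. Problems: 1. how do you monitor this process? 2. transferring text data this way puts a heavy load on disk I/O; 3. you must pick a fixed interval (e.g. copy once a minute), so the data is never fresh; 4. how do you handle fault tolerance and load balancing?

Option 2: use Flume. Fault tolerance, load balancing, latency and compression are all handled well by Flume; you only need to write a configuration file.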

Flume overview

Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

Its main functions are collecting, aggregating and moving data.
In other words, logs produced on a web server (the source side) can be moved by Flume into HDFS (the destination side).

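Flume architecture

source: collects the data

channel: buffers (aggregates) the collected events until the sink consumes them

sink: writes the events out to the destination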
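To make the three roles concrete, here is a toy in-process model in Python. It is not Flume code: the class and function names are hypothetical, and only the idea of a bounded in-memory buffer with a per-transaction batch limit (the memory channel's capacity and transactionCapacity settings used later in this article) is borrowed from Flume.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    # Like a Flume event: string headers plus a byte-array body
    headers: dict = field(default_factory=dict)
    body: bytes = b""


class MemoryChannel:
    """Toy memory channel: a bounded FIFO buffer between source and sink."""

    def __init__(self, capacity, transaction_capacity):
        self.capacity = capacity                          # max events held
        self.transaction_capacity = transaction_capacity  # max events per put/take
        self.queue = deque()

    def put_batch(self, events):
        # All-or-nothing put of one batch coming from the source
        if len(events) > self.transaction_capacity:
            raise ValueError("batch exceeds transaction capacity")
        if len(self.queue) + len(events) > self.capacity:
            raise OverflowError("channel is full")
        self.queue.extend(events)

    def take_batch(self):
        # The sink takes up to transaction_capacity events at once
        n = min(len(self.queue), self.transaction_capacity)
        return [self.queue.popleft() for _ in range(n)]


def collect(lines):
    """Source role: turn raw input lines into events."""
    return [Event(body=line.encode("utf-8")) for line in lines]


def output(events):
    """Sink role: deliver events; here they are just decoded to text."""
    return [event.body.decode("utf-8") for event in events]


channel = MemoryChannel(capacity=1000, transaction_capacity=100)
channel.put_batch(collect(["hello", "flume"]))
print(output(channel.take_batch()))  # ['hello', 'flume']
```

The channel is what decouples the two ends: the source can keep collecting while the sink is slow, up to the channel's capacity.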

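Comparison with similar products

(Widely used) Flume: an Apache project, written in Java
Scribe: a Facebook project, written in C/C++; weak load balancing and fault tolerance; no longer maintained
Chukwa: a Yahoo/Apache project, written in Java; no longer maintained
Fluentd: similar to Flume, written in Ruby
(Widely used) Logstash: the L in the ELK stack (Elasticsearch + Logstash + Kibana)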

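Installing Flume

Prerequisites

Java 1.8 or later

Enough memory for the sources, channels and sinks

Enough disk space

Read/write permissions on the directories the agent uses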

Downloading and installing Flume

Download Flume: use the CDH 5.15.1 build.

Extract it: tar -xvf flume-ng-1.6.0-cdh5.15.1.tar.gz -C ~/app/

Add the environment variables to ~/.bashrc:

# FLUME_HOME 1.6.0
export FLUME_HOME=/home/iie4bu/app/apache-flume-1.6.0-cdh5.15.1-bin
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$SPARK_HOME/bin:$FLUME_HOME/bin:$PATH

Then run source ~/.bashrc so the variables take effect.

Configuring Flume

In the conf directory run cp flume-env.sh.template flume-env.sh,
then add export JAVA_HOME=/home/iie4bu/app/jdk1.8.0_101 to flume-env.sh.
Check the installation: in the bin directory run ./flume-ng version

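Flume in practice

Requirement 1

Collect data from a given network port and print it to the console.

Configuration

The key to using Flume is writing the configuration file:

A) configure the source
B) configure the channel
C) configure the sink
D) wire the three components together

a1: the agent name; r1: the source name; k1: the sink name; c1: the channel name

Create example.conf in the conf directory with the following content: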

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
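A common mistake at step D is the key name: sources use the plural .channels (a source can feed several channels), while sinks use the singular .channel. As a hedged illustration, the Python sketch below parses a config like the one above with a deliberately simplified key = value parser (not Flume's own loader) and reports missing types or unbound components. The function names are hypothetical.

```python
def parse_properties(text):
    """Minimal key = value parser: skips blank lines and # comments.

    Simplified: ignores the line continuations and escapes that full
    Java .properties files allow.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def wiring_problems(props, agent):
    """List problems in how an agent's sources and sinks are bound to channels."""
    problems = []
    channels = props.get(f"{agent}.channels", "").split()
    for kind in ("sources", "sinks", "channels"):
        for name in props.get(f"{agent}.{kind}", "").split():
            if f"{agent}.{kind}.{name}.type" not in props:
                problems.append(f"{kind}.{name} has no type")
    for name in props.get(f"{agent}.sources", "").split():
        # A source may feed several channels, hence the plural key
        bound = props.get(f"{agent}.sources.{name}.channels", "").split()
        if not bound:
            problems.append(f"source {name} is not bound to a channel")
        problems += [f"source {name} uses undeclared channel {c}"
                     for c in bound if c not in channels]
    for name in props.get(f"{agent}.sinks", "").split():
        # A sink drains exactly one channel, hence the singular key
        channel = props.get(f"{agent}.sinks.{name}.channel")
        if channel is None:
            problems.append(f"sink {name} is not bound to a channel")
        elif channel not in channels:
            problems.append(f"sink {name} uses undeclared channel {channel}")
    return problems


EXAMPLE_CONF = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
"""

print(wiring_problems(parse_properties(EXAMPLE_CONF), "a1"))  # []
```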

Start the agent

bin/flume-ng agent --name a1 --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/example.conf -Dflume.root.logger=INFO,console

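where:

--name: the agent name; it must match the prefix used for the keys in the configuration file (a1 here)

--conf: the configuration directory, $FLUME_HOME/conf (where flume-env.sh lives)

--conf-file: the configuration file you wrote

-D: sets a Java system property; -Dflume.root.logger=INFO,console prints INFO-level logs to the console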
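If you start agents from a script rather than by hand, the same flags can be assembled programmatically. A minimal Python sketch, assuming FLUME_HOME is set as in ~/.bashrc above (or passed explicitly); flume_agent_cmd is a hypothetical helper name, and the flags are exactly the ones listed above.

```python
import os
import shlex


def flume_agent_cmd(agent_name, conf_file, flume_home=None, root_logger="INFO,console"):
    """Build (but do not run) the flume-ng command described above."""
    home = flume_home or os.environ["FLUME_HOME"]  # KeyError if FLUME_HOME is unset
    return [
        os.path.join(home, "bin", "flume-ng"), "agent",
        "--name", agent_name,                  # must match the prefix in conf_file
        "--conf", os.path.join(home, "conf"),  # configuration directory
        "--conf-file", conf_file,              # the agent's own config file
        f"-Dflume.root.logger={root_logger}",  # Java system property for logging
    ]


home = "/home/iie4bu/app/apache-flume-1.6.0-cdh5.15.1-bin"
cmd = flume_agent_cmd("a1", os.path.join(home, "conf", "example.conf"), flume_home=home)
print(shlex.join(cmd))
# To actually start the agent: subprocess.run(cmd, check=True)
```

Returning a list instead of a single string avoids shell quoting problems when the command is later passed to subprocess.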

Testing with nc

Open another terminal and connect to local port 44444 with nc (nc localhost 44444), then type some text such as abc.
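If nc is not installed, a few lines of Python can play the same role. This is a hedged sketch: send_lines is a hypothetical helper, and it assumes the netcat source's default behaviour of replying "OK" to every line it turns into an event (if acknowledgements were disabled, the readline call would block until the timeout).

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send newline-terminated lines, like typing them into `nc host port`.

    Assumes the server answers every line with one reply line, as the
    netcat source does when it acknowledges each event with "OK".
    """
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8") as reader:
            for line in lines:
                sock.sendall((line + "\n").encode("utf-8"))
                replies.append(reader.readline().rstrip("\r\n"))
    return replies


# Usage, with the a1 agent from example.conf running:
#   send_lines("localhost", 44444, ["abc"])
```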

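The output appears in the Flume console.

In the log output you can see the Event:
Event: { headers:{} body: 61 62 63 abc }
The body is printed as hexadecimal bytes (61 62 63 are the ASCII codes of a, b and c) followed by the readable text. An Event is the basic unit of data transfer in Flume.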
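The Python function below approximates that log line for illustration. It is not Flume's code and real output may differ in spacing; the 16-byte truncation is an assumption modelled on the logger sink only logging the start of each body.

```python
def render_event(headers, body, max_bytes=16):
    """Approximate the logger sink's one-line dump of an event.

    Shows up to max_bytes of the body as hex, then the same bytes as
    text with non-printable bytes replaced by '.'.
    """
    shown = body[:max_bytes]
    hex_part = " ".join(f"{b:02X}" for b in shown)
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in shown)
    header_part = ", ".join(f"{k}={v}" for k, v in headers.items())
    return f"Event: {{ headers:{{{header_part}}} body: {hex_part} {text_part} }}"


print(render_event({}, b"abc"))  # Event: { headers:{} body: 61 62 63 abc }
```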

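Requirement 2

Monitor a file and collect newly appended data in real time, printing it to the console.

Configuration
Agent components: exec source + memory channel + logger sink
Create exec-memory-logger.conf in $FLUME_HOME/conf with the following content: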

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/iie4bu/data/hello.txt
a1.sources.r1.shell = /bin/sh -c

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start the agent

bin/flume-ng agent --name a1 --conf $FLUME_HOME/conf --conf-file /home/iie4bu/app/apache-flume-1.6.0-cdh5.15.1-bin/conf/exec-memory-logger.conf -Dflume.root.logger=INFO,console

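This monitors /home/iie4bu/data/hello.txt and prints every newly appended line to the console (tail -F keeps following the file even if it is rotated or recreated).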
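To generate test data for the exec source you can simply run echo hello >> /home/iie4bu/data/hello.txt. The Python sketch below does the same for several lines (append_lines is a hypothetical helper). Flushing after every line matters: tail -F only sees what has actually been written to the file.

```python
import time
from pathlib import Path


def append_lines(path, lines, interval=0.0):
    """Append lines to a file, flushing each one so `tail -F` sees it promptly."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
            f.flush()  # push the line to the OS instead of keeping it buffered
            if interval:
                time.sleep(interval)  # simulate data arriving over time


# Usage, with the exec-memory-logger agent running:
#   append_lines("/home/iie4bu/data/hello.txt", ["hello", "flume"], interval=1.0)
```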

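Requirement 3

Collect the logs on server A into server B in real time.

Choosing the agents

Agent on machine A: exec source + memory channel + avro sink
Agent on machine B: avro source + memory channel + logger sink

Configuring the agents
In this demo both agents run on the same machine, so the avro sink sends to localhost. On two real machines, set the sink's hostname to server B's address, and bind B's avro source to an address reachable from A (for example 0.0.0.0 instead of localhost).
Create exec-memory-avro.conf with the following content: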

exec-memory-avro.sources = exec-source
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel

# Describe/configure the source
exec-memory-avro.sources.exec-source.type = exec
exec-memory-avro.sources.exec-source.command = tail -F /home/iie4bu/data/hello.txt
exec-memory-avro.sources.exec-source.shell = /bin/sh -c

# Describe the sink
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = localhost
exec-memory-avro.sinks.avro-sink.port = 44444

# Use a channel which buffers events in memory
exec-memory-avro.channels.memory-channel.type = memory
exec-memory-avro.channels.memory-channel.capacity = 1000
exec-memory-avro.channels.memory-channel.transactionCapacity = 100

# Bind the source and sink to the channel
exec-memory-avro.sources.exec-source.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel

Create avro-memory-logger.conf with the following content:

avro-memory-logger.sources = avro-source
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel

# Describe/configure the source
avro-memory-logger.sources.avro-source.type = avro
avro-memory-logger.sources.avro-source.bind = localhost
avro-memory-logger.sources.avro-source.port = 44444

# Describe the sink
avro-memory-logger.sinks.logger-sink.type = logger

# Use a channel which buffers events in memory
avro-memory-logger.channels.memory-channel.type = memory
avro-memory-logger.channels.memory-channel.capacity = 1000
avro-memory-logger.channels.memory-channel.transactionCapacity = 100

# Bind the source and sink to the channel
avro-memory-logger.sources.avro-source.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
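A's avro sink and B's avro source must agree on host and port; a mismatch here is a likely reason events never arrive. The hedged Python sketch below (hypothetical helper names, simplified properties parsing rather than Flume's own loader) checks the two files against each other. It also flags the loopback-only addresses that work in this single-machine demo but not across two machines.

```python
LOOPBACK = {"localhost", "127.0.0.1"}


def parse_properties(text):
    """Minimal key = value parser: skips blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def avro_endpoints(props, agent, kind, host_key):
    """(host, port) of every avro source or sink declared by an agent."""
    endpoints = []
    for name in props.get(f"{agent}.{kind}", "").split():
        prefix = f"{agent}.{kind}.{name}."
        if props.get(prefix + "type") == "avro":
            endpoints.append((props.get(prefix + host_key), props.get(prefix + "port")))
    return endpoints


def link_problems(a_conf, a_agent, b_conf, b_agent, same_machine):
    """Check that agent A's avro sink can reach agent B's avro source."""
    sinks = avro_endpoints(parse_properties(a_conf), a_agent, "sinks", "hostname")
    sources = avro_endpoints(parse_properties(b_conf), b_agent, "sources", "bind")
    problems = []
    if not {p for _, p in sinks} & {p for _, p in sources}:
        problems.append("no avro sink port matches an avro source port")
    if not same_machine:
        if any(host in LOOPBACK for host, _ in sinks):
            problems.append("A's avro sink sends to localhost, not to B")
        if any(bind in LOOPBACK for bind, _ in sources):
            problems.append("B's avro source listens on loopback only")
    return problems


# Usage:
#   with open("exec-memory-avro.conf") as a, open("avro-memory-logger.conf") as b:
#       print(link_problems(a.read(), "exec-memory-avro",
#                           b.read(), "avro-memory-logger", same_machine=True))
```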

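Start the agents

Mind the startup order: the avro sink on A connects to the avro source on B, so B should already be listening when A starts; otherwise A's sink fails to connect and has to retry.
First start the agent defined in avro-memory-logger.conf: bin/flume-ng agent --name avro-memory-logger --conf $FLUME_HOME/conf --conf-file /home/iie4bu/app/apache-flume-1.6.0-cdh5.15.1-bin/conf/avro-memory-logger.conf -Dflume.root.logger=INFO,console

Then start the agent defined in exec-memory-avro.conf: bin/flume-ng agent --name exec-memory-avro --conf $FLUME_HOME/conf --conf-file /home/iie4bu/app/apache-flume-1.6.0-cdh5.15.1-bin/conf/exec-memory-avro.conf -Dflume.root.logger=INFO,console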
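The ordering can also be scripted: wait until B's avro source accepts connections, then launch A. A minimal Python sketch, assuming a plain TCP connect is an acceptable readiness probe (it opens and immediately closes a connection to the avro source, which may show up in B's log); wait_for_port is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until something accepts TCP connections on host:port.

    Returns True as soon as a connection succeeds, False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # includes ConnectionRefusedError and timeouts
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage: start B first, then
#   if wait_for_port("localhost", 44444):
#       ...start the exec-memory-avro agent (for example with subprocess.Popen)
```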

When we append content to /home/iie4bu/data/hello.txt, the agent running avro-memory-logger.conf prints the corresponding events.

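Latency

There is a noticeable delay between the two agents. Events are not forwarded one by one: they are buffered in memory (the memory channel has a configured capacity and transactionCapacity) and moved in batches, so an event may wait until a batch fills up or a timeout expires before it is passed on.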
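The effect of batching on latency can be seen in a small simulation. This Python sketch is a simplified, hypothetical model, not Flume's implementation: items are released either when a batch is full or when the oldest buffered item has waited long enough, so a single line written to hello.txt can sit until the timeout. Before tuning Flume itself, check your version's user guide for the batching-related properties of the exec source and the avro sink (for example the exec source's batchSize).

```python
class Batcher:
    """Hold items and release them in batches, by size or by age.

    A hypothetical model of why events can lag: batch_size and timeout
    are illustrative parameters, not Flume configuration keys.
    """

    def __init__(self, batch_size, timeout):
        self.batch_size = batch_size
        self.timeout = timeout  # seconds the oldest buffered item may wait
        self.buffer = []
        self.oldest = None      # arrival time of the oldest buffered item

    def offer(self, item, now):
        if not self.buffer:
            self.oldest = now
        self.buffer.append(item)
        return self.poll(now)

    def poll(self, now):
        """Return a batch if one is due, otherwise an empty list."""
        full = len(self.buffer) >= self.batch_size
        expired = bool(self.buffer) and now - self.oldest >= self.timeout
        if full or expired:
            batch, self.buffer = self.buffer, []
            return batch
        return []


b = Batcher(batch_size=20, timeout=3.0)
print(b.offer("line 1", now=0.0))  # []  (waits for more data)
print(b.poll(now=1.0))             # []
print(b.poll(now=3.0))             # ['line 1'], released 3 s after it arrived
```

Larger batches mean fewer, cheaper transfers but a longer worst-case wait for a lone event; that trade-off is what the delay above reflects.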

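Summary of the log-collection flow

Machine A monitors a file: when users visit the main site, their behaviour logs are written to access.log.

The avro sink sends the newly produced log lines to the hostname and port of machine B.

The agent with the avro source on machine B writes the logs to the console (in a later step, to Kafka).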
