Flume Example: File Data Collection, Step-by-Step Analysis

Published: 2024/4/13 · 豆豆

Collecting a File into HDFS

Requirement

Suppose a business system writes its logs with log4j, and the log file keeps growing. The data appended to that log file must be collected into HDFS in real time.
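
To try this out without a real log4j application, something has to keep appending lines to the monitored file. The sketch below is one way to simulate that; `append_log_lines` is a hypothetical helper introduced here for illustration, not something from the original setup.

```shell
# Hypothetical helper: append COUNT timestamped lines to LOG_FILE,
# standing in for an application that keeps writing to its log.
append_log_lines() {
  log_file=$1
  count=$2
  i=1
  while [ "$i" -le "$count" ]; do
    # ">>" appends, so earlier lines stay in place, as in a real log file
    printf '%s line %d\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$i" >> "$log_file"
    i=$((i + 1))
  done
}
```

For example, `append_log_lines /export/servers/taillogs/access_log 100` would add 100 lines to the file that the configuration in this article tails.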

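Analysis

Based on the requirement, first define the following three elements:

Source: watches the file for newly appended content: exec 'tail -F file'

Sink: the target the data is delivered to, here the HDFS file system: hdfs sink

Channel: the conduit that passes events from the source to the sink; either a file channel or a memory channel can be used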
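
The configuration in this article uses a memory channel, which is fast but loses buffered events if the agent process stops. As a rough sketch of the file channel alternative, the snippet below writes a channel definition to a file. The function name and the two directory arguments are assumptions made for illustration; `type`, `checkpointDir` and `dataDirs` are the file channel's property names in Flume.

```shell
# Hypothetical helper: write a file-channel definition for agent1/channel1.
# The checkpoint and data directories are placeholders supplied by the caller,
# not paths taken from the article.
write_file_channel_conf() {
  out=$1
  checkpoint_dir=$2
  data_dir=$3
  cat > "$out" <<EOF
# Durable alternative to the memory channel: events are persisted to local disk
agent1.channels.channel1.type = file
agent1.channels.channel1.checkpointDir = $checkpoint_dir
agent1.channels.channel1.dataDirs = $data_dir
EOF
}
```

The generated lines would replace the `agent1.channels.channel1.*` memory settings; the source and sink bindings stay the same.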

Define the Flume Configuration File

    cd /export/servers/apache-flume-1.8.0-bin/conf
    vim tail-file.conf

    agent1.sources = source1
    agent1.sinks = sink1
    agent1.channels = channel1

    # Describe/configure tail -F source1
    agent1.sources.source1.type = exec
    agent1.sources.source1.command = tail -F /export/servers/taillogs/access_log
    agent1.sources.source1.channels = channel1

    # Describe sink1
    agent1.sinks.sink1.type = hdfs
    #a1.sinks.k1.channel = c1
    agent1.sinks.sink1.hdfs.path = hdfs://node01:8020/weblog/flume-collection/%y-%m-%d/%H-%
    agent1.sinks.sink1.hdfs.filePrefix = access_log
    agent1.sinks.sink1.hdfs.maxOpenFiles = 5000
    agent1.sinks.sink1.hdfs.batchSize = 100
    agent1.sinks.sink1.hdfs.fileType = DataStream
    agent1.sinks.sink1.hdfs.writeFormat = Text
    agent1.sinks.sink1.hdfs.round = true
    agent1.sinks.sink1.hdfs.roundValue = 10
    agent1.sinks.sink1.hdfs.roundUnit = minute
    agent1.sinks.sink1.hdfs.useLocalTimeStamp = true

    # Use a channel which buffers events in memory
    agent1.channels.channel1.type = memory
    agent1.channels.channel1.keep-alive = 120
    agent1.channels.channel1.capacity = 500000
    agent1.channels.channel1.transactionCapacity = 600

    # Bind the source and sink to the channel
    agent1.sources.source1.channels = channel1
    agent1.sinks.sink1.channel = channel1
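
The `hdfs.round`, `hdfs.roundValue` and `hdfs.roundUnit` settings tell the sink to round the event timestamp down before it expands the time escapes in `hdfs.path`, so with a value of 10 and a unit of minute, events fall into ten-minute buckets. The arithmetic can be sketched in plain shell; `round_down_minute` is a hypothetical helper written for this illustration, not a Flume command.

```shell
# Hypothetical helper (not part of Flume): round a minute-of-hour value
# down to a multiple of UNIT, which is what round/roundValue/roundUnit request.
round_down_minute() {
  minute=${1#0}        # drop one leading zero so "08" is not read as octal
  minute=${minute:-0}  # "0" becomes empty after stripping; restore it
  unit=$2
  printf '%02d\n' $(( minute / unit * unit ))
}

round_down_minute 37 10   # prints 30
round_down_minute 08 10   # prints 00
```

Under these settings, an event stamped 14:37 would be treated as 14:30 when the path is built, so any minute escape in the path would only take the values 00, 10, 20, 30, 40 and 50.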
