Storing Flume Data in BOS
Last updated: 2024-03-22
Flume
Flume is a distributed, reliable, and highly available system for aggregating massive volumes of log data. It supports custom data senders for collecting data, and it can perform simple processing on the collected data before writing it to various (customizable) data receivers.
Flume supports multiple sink types; by using the HDFS Sink, the collected data can be stored in BOS.
Getting Started
1. Download and install apache-flume
Omitted here; a typical download-and-unpack is sketched below.
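As a rough sketch only (the release version and mirror URL are placeholders, not from the original doc): download the apache-flume binary tarball and unpack it under /opt so that the paths used in the later steps match.
Bash
# Download the Apache Flume binary release (replace 1.xx.0 with an actual version,
# e.g. one listed at https://flume.apache.org/download.html)
wget https://dlcdn.apache.org/flume/1.xx.0/apache-flume-1.xx.0-bin.tar.gz
# Unpack under /opt; later steps assume /opt/apache-flume-1.xx.0-bin
tar -zxf apache-flume-1.xx.0-bin.tar.gz -C /opt
cd /opt/apache-flume-1.xx.0-bin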
2. Configure the environment
If you already have a Hadoop environment that is configured to access BOS, skip this step. Otherwise:
- Download the bos-hdfs JAR package into the /opt/apache-flume-1.xx.0-bin/lib directory;
- Add the BOS access settings to Hadoop's core-site.xml configuration file and copy it into the /opt/apache-flume-1.xx.0-bin/conf directory (a sample configuration is sketched below).
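A minimal core-site.xml sketch follows. The property names and values are assumptions based on common BOS-HDFS conventions and may differ by bos-hdfs version; verify them against the BOS-HDFS documentation for the JAR you downloaded, and replace the access key, secret key, and endpoint with your own values.
XML
<configuration>
    <!-- Assumed property names; verify against the BOS-HDFS docs for your JAR version -->
    <property>
        <name>fs.bos.access.key</name>
        <value>{your access key}</value>
    </property>
    <property>
        <name>fs.bos.secret.access.key</name>
        <value>{your secret key}</value>
    </property>
    <property>
        <name>fs.bos.endpoint</name>
        <value>{your BOS endpoint, e.g. bj.bcebos.com}</value>
    </property>
    <property>
        <name>fs.bos.impl</name>
        <value>org.apache.hadoop.fs.bos.BaiduBosFileSystem</value>
    </property>
</configuration>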
3. Create the Flume configuration file
Use Flume's StressSource as the source and a memory channel, and write to BOS through the HDFS Sink (HDFS protocol).
Properties
#ss2bos.properties
agent.sources = stress_source
agent.channels = mem_channel
agent.sinks = bos_hdfs_sink

agent.sources.stress_source.type = org.apache.flume.source.StressSource
agent.sources.stress_source.channels = mem_channel
agent.sources.stress_source.size = 1024
agent.sources.stress_source.maxTotalEvents = 1000
agent.sources.stress_source.maxEventsPerSecond = 10
agent.sources.stress_source.batchSize = 10

agent.channels.mem_channel.type = memory
agent.channels.mem_channel.capacity = 1000000
agent.channels.mem_channel.transactionCapacity = 100

agent.sinks.bos_hdfs_sink.channel = mem_channel
agent.sinks.bos_hdfs_sink.type = hdfs
agent.sinks.bos_hdfs_sink.hdfs.useLocalTimeStamp = true
# Prefix files with the host to distinguish writers and avoid concurrent-write conflicts
agent.sinks.bos_hdfs_sink.hdfs.filePrefix = %{host}_bos_hdfs_sink
# Replace with your bucket path
agent.sinks.bos_hdfs_sink.hdfs.path = bos://{your bucket}/flume/%Y-%m-%d-%H-%M
agent.sinks.bos_hdfs_sink.hdfs.fileType = DataStream
agent.sinks.bos_hdfs_sink.hdfs.writeFormat = Text
# Roll a new file every 100 events; size-based and time-based rolling are disabled
agent.sinks.bos_hdfs_sink.hdfs.rollSize = 0
agent.sinks.bos_hdfs_sink.hdfs.rollCount = 100
agent.sinks.bos_hdfs_sink.hdfs.rollInterval = 0
agent.sinks.bos_hdfs_sink.hdfs.batchSize = 100
# Round the path timestamp down to 10-minute buckets
agent.sinks.bos_hdfs_sink.hdfs.round = true
agent.sinks.bos_hdfs_sink.hdfs.roundValue = 10
agent.sinks.bos_hdfs_sink.hdfs.roundUnit = minute
4. Start the Flume agent
Bash
./bin/flume-ng agent -n agent -c conf/ -f ss2bos.properties
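Once the StressSource has generated its events (maxTotalEvents = 1000 above), you can check the output with the Hadoop client, assuming a local Hadoop installation configured with the same bos-hdfs JAR and core-site.xml; the bucket path below is the same placeholder used in the configuration.
Bash
# List the files written by the HDFS sink (replace the bucket placeholder)
hadoop fs -ls bos://{your bucket}/flume/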