Kafka consumer commit error
Problem description:
In the new version of our Kafka message-processing program, the following warning appears repeatedly when the message volume is very high, and multiple consumers with the same groupId end up consuming the same messages more than once.
2018-10-12 19:49:34,903 WARN [DESKTOP-8S2E5H7 id2-1-C-1] Caller+0 at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$4.onComplete(ConsumerCoordinator.java:649)
Auto-commit of offsets {xxxTopic-5=OffsetAndMetadata{offset=359, metadata=''}} failed for group My-Group-Name: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
Solution:
Analysis:
1. Following the hint in the warning, we optimized the message-handling code, since per-message processing time was too long (overall processing time did drop). The warning became less frequent, but still occurred.
2. Also following the warning, we set max.poll.records to 200 (the default is 500) and increased the session timeout (session.timeout.ms=60000, up from the 10s default). The logs showed further improvement, but the problem persisted.
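The tuning steps above can be sketched as a consumer configuration. This is a minimal illustration using plain string keys in a `java.util.Properties`; the broker address and group name are placeholders, not values from the original setup.

```java
import java.util.Properties;

// Sketch of the tuned consumer configuration described in the steps above.
public class TunedConsumerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumption: local broker
        props.put("group.id", "My-Group-Name");
        props.put("enable.auto.commit", "true");
        // Step 2: fewer records per poll() so each batch finishes faster (default 500).
        props.put("max.poll.records", "200");
        // Step 2: longer session timeout to make rebalances less trigger-happy.
        props.put("session.timeout.ms", "60000");
        // Final fix described below: longer auto-commit interval (default 5000).
        props.put("auto.commit.interval.ms", "7000");
        return props;
    }
}
```

These properties would be passed to the `KafkaConsumer` constructor (or the equivalent Spring Kafka consumer factory) in place of the defaults.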
As for the duplicate consumption: when a large volume of messages was sent (group.id=abc), consumer1 took too long to process them. Because the consumer uses auto-commit and could not finish processing within the default auto-commit window, the commit failed, so Kafka treated those messages as not yet consumed. After the rebalance, consumer2 (another instance with the same group.id) was assigned the partition and re-read the messages from the last committed offset. This can be verified by checking the lag of this group for the topic in Kafka.
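Lag here is simply the distance between the partition's log end offset and the group's last committed offset; `kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group My-Group-Name` reports these as LOG-END-OFFSET, CURRENT-OFFSET, and LAG. A tiny worked example, using the committed offset 359 from the warning above and a hypothetical log end offset of 420:

```java
public class LagCheck {
    // Consumer lag for a partition = log end offset - last committed offset.
    public static long lag(long logEndOffset, long committedOffset) {
        return logEndOffset - committedOffset;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: broker at offset 420, group committed 359.
        System.out.println(lag(420, 359)); // prints 61
    }
}
```

A lag that keeps growing while consumers are running is the signature of processing falling behind and commits failing.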
The final fix: increase auto.commit.interval.ms (default 5000). After raising it to 7000, the warning essentially disappeared under the same message volume.
Why this parameter? Because the root cause of the warning is that message processing takes too long to finish and commit offsets within the configured auto-commit interval.