Decrease topic replication factor after Kafka brokers were removed from the cluster and reassignments failed


The topic replication factor has increased to 45 while the number of available Kafka brokers in the cluster is 40.

This happened because of repeated partition reassignments that got stuck and were then stopped.

kafka-topics --topic top --zookeeper zoo_url --describe

shows:

Partition: 0 Leader: 20464 Replicas: 20464,20765,1882,20870,873,898,20752,16789,17181,20743,20854,20762,894,20459,20851,21070,20757,20766,20763,890,21173,20852,895,21314,20767,883,20467,16787,21071,20750,887,20760,7067,876,20764,891,20768,4880,20769,16788,20756,886,21172,1582,871,16827 Isr: 20464,20765,1882,20870,873,898,20752,16789,17181,20743,20762,894,20459,21070,20757,20766,20763,890,21173,895,21314,20767,883,20467,16787,20750,887,20760,7067,876,20764,891,20768,4880,20769,16788,20756,886,21172,871,16827 ...

Some of the replica ids belong to brokers that are no longer part of the cluster.
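To see which replica ids no longer exist, you can compare the "Replicas:" field of the describe output with the ids of the live brokers. The following is a minimal sketch, assuming you supply the live broker ids yourself (for example by listing the /brokers/ids znode in ZooKeeper); the helper names parse_replicas and missing_brokers are hypothetical, not part of any Kafka tool.

```python
import re
from typing import Iterable, List


def parse_replicas(describe_line: str) -> List[int]:
    """Extract broker ids from the 'Replicas:' field of one kafka-topics --describe line."""
    match = re.search(r"Replicas:\s*([\d,]+)", describe_line)
    if match is None:
        raise ValueError("no 'Replicas:' field found in line")
    return [int(broker) for broker in match.group(1).split(",")]


def missing_brokers(replicas: Iterable[int], live_brokers: Iterable[int]) -> List[int]:
    """Return replica ids (in assignment order) that are not live brokers."""
    live = set(live_brokers)
    return [broker for broker in replicas if broker not in live]


if __name__ == "__main__":
    line = "Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2"
    replicas = parse_replicas(line)
    # Hypothetical live-broker list; broker 2 has been removed from the cluster.
    print(missing_brokers(replicas, [1, 3]))
```

Running it on the sample line reports broker 2 as the replica that is no longer in the cluster.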

Running:

kafka-reassign-partitions --zookeeper zoo_url --topics-to-move-json-file assign.json --generate --broker-list ...

fails with the following error:

Partitions reassignment failed due to replication factor: 45 larger than available brokers: 21
kafka.admin.AdminOperationException: replication factor: 45 larger than available brokers: 21
    at kafka.admin.AdminUtils$.assignReplicasToBrokers(AdminUtils.scala:117)
    at kafka.admin.ReassignPartitionsCommand$$anonfun$generateAssignment$1.apply(ReassignPartitionsCommand.scala:110)
    at kafka.admin.ReassignPartitionsCommand$$anonfun$generateAssignment$1.apply(ReassignPartitionsCommand.scala:108)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
    at kafka.admin.ReassignPartitionsCommand$.generateAssignment(ReassignPartitionsCommand.scala:108)
    at kafka.admin.ReassignPartitionsCommand$.generateAssignment(ReassignPartitionsCommand.scala:91)
    at kafka.admin.ReassignPartitionsCommand$.main(ReassignPartitionsCommand.scala:50)
    at kafka.admin.ReassignPartitionsCommand.main(ReassignPartitionsCommand.scala)

The --broker-list argument contains the ids of the online brokers.

How can I force the topic replication factor to decrease?

The only approach that worked was decreasing the replication factor of one partition by running

kafka-reassign-partitions --zookeeper zoo_url --reassignment-json-file /tmp/assign.json --execute

where /tmp/assign.json looks like this:

{ "partitions": [ { "partition": 0, "replicas": [20743,20762,894,20459,20757,895,20467,20760], "topic": "topic" } ], "version": 1 }

and then rerunning the partition reassignment (generating the assignment and executing it).
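Because a hand-written reassignment file is easy to get wrong, it can help to check it against the live brokers before passing it to --execute. The following is a minimal sketch, assuming the JSON layout shown above (a "version" of 1 and a "partitions" list of topic/partition/replicas entries) and a live-broker list you supply; validate_reassignment is a hypothetical helper, and it checks only a few obvious problems rather than everything Kafka itself validates.

```python
import json
from typing import Iterable, List


def validate_reassignment(doc: dict, live_brokers: Iterable[int]) -> List[str]:
    """Return a list of problems found in a reassignment document (empty if none)."""
    live = set(live_brokers)
    problems = []
    if doc.get("version") != 1:
        problems.append("version should be 1")
    for entry in doc.get("partitions", []):
        where = f"{entry.get('topic')}-{entry.get('partition')}"
        replicas = entry.get("replicas", [])
        if not replicas:
            problems.append(f"{where}: empty replica list")
        # A broker may appear at most once in a partition's replica list.
        if len(set(replicas)) != len(replicas):
            problems.append(f"{where}: duplicate broker ids")
        # Every target replica must be a broker that is currently in the cluster.
        dead = [broker for broker in replicas if broker not in live]
        if dead:
            problems.append(f"{where}: brokers not in cluster: {dead}")
    return problems


if __name__ == "__main__":
    with open("/tmp/assign.json") as f:
        doc = json.load(f)
    # Hypothetical live-broker list; replace with the ids of your online brokers.
    live_ids = [20743, 20762, 894, 20459, 20757, 895, 20467, 20760]
    for problem in validate_reassignment(doc, live_ids):
        print(problem)
```

An empty result only means these basic checks passed; the reassignment tool may still reject the file for other reasons.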

Kafka 0.9.0.1 is deployed as part of Cloudera.

Answer by OneCricketeer:

You can add more than one partition to the "partitions" list in the JSON file, but kafka-reassign-partitions is the only built-in way to handle this. External tools may be able to generate that JSON programmatically, but they may also fail when trying to look up the unknown broker ids that are currently in the assignment.
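Since generic tools may trip over the unknown broker ids, one option is to generate the multi-partition JSON yourself from the current assignment. The following is a minimal sketch, assuming you already have each partition's current replica list (for example, parsed from the --describe output), the live broker ids, and a target replication factor; shrink_assignment is a hypothetical helper. It drops ids that are not live brokers and keeps the first target_rf survivors in their original order, so the first surviving broker remains the preferred leader. It does not look at the ISR, so you may want to check that the replicas it keeps are in sync, as was done manually in the question.

```python
import json
from typing import Dict, Iterable, List


def shrink_assignment(topic: str, current: Dict[int, List[int]],
                      live_brokers: Iterable[int], target_rf: int) -> dict:
    """Build a reassignment document that keeps only live brokers, capped at target_rf."""
    live = set(live_brokers)
    partitions = []
    for partition in sorted(current):
        # Preserve order (first replica is the preferred leader) and drop duplicates.
        ordered = list(dict.fromkeys(current[partition]))
        survivors = [broker for broker in ordered if broker in live]
        if len(survivors) < target_rf:
            raise ValueError(
                f"{topic}-{partition}: only {len(survivors)} live replicas, "
                f"need {target_rf}")
        partitions.append({"topic": topic, "partition": partition,
                           "replicas": survivors[:target_rf]})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    # Hypothetical current assignment and live brokers for illustration.
    current = {0: [5, 1, 2, 3], 1: [2, 9, 3]}
    doc = shrink_assignment("topic", current, [1, 2, 3], target_rf=2)
    print(json.dumps(doc, indent=2))
```

The printed JSON can be saved to a file and passed to kafka-reassign-partitions with --reassignment-json-file and --execute, as in the question.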