Nifi and Kafka are now both available in Cloudera Data Platform, CDP public cloud. Nifi is great at talking to everything and Kafka is a mainstream message bus, I just wondered:
What are the minimal steps needed to Produce/Consume data to Kafka from Apache Nifi within CDP Public Cloud
I would Ideally look for steps that work in any cloud, for instance Amazon AWS and Microsoft Azure.
I am satisfied with answers that follow best practices and work with the default configuration of the platform, but if there are common alternatives these are welcome as well.
There will be multiple form factors available in the future, for now I will assume you have an environment that contains 1 datahub with NiFi, and 1 Data Hub with Kafka. (The answer still works if both are on the same datahub).
Prerequisites
These steps allow you to Produce data from NiFi to Kafka in CDP Public Cloud
Unless mentioned otherwise, I have kept everything to its default settings.
In Kafka Data Hub Cluster:
broker1.abc:9093,broker2.abc:9093,broker3.abc:9093
In NiFi GUI:
GenerateFlowFile
processorPublishKafka_2_0
, configure it as follows:GenerateFlowFile
processor to yourPublishKafka_2_0
processor and start the flowThese are the minimal steps, a more extensive explanation can be found on in the Cloudera Documentation. Note that it best practice to create topics explicitly (this example leverages the feature of Kafka that automatically lets it create topics when produced to).
These steps allow you to Consume data with NiFi from Kafka in CDP Public Cloud
A good check to see if data was written to Kafka, is consuming it again.
In NiFi GUI:
ConsumeKafka_2_0
, configure its Properties as follows:And that is it, within 30 seconds you should see that the data that you published to Kafka is now flowing into NiFi again.
Full Disclosure: I am an employee of Cloudera, the driving force behind Nifi.