I am new to Spark and Kafka, and I have a slightly different usage pattern of Spark Streaming with Kafka. I am using:
spark-core_2.10 - 2.1.1
spark-streaming_2.10 - 2.1.1
spark-streaming-kafka-0-10_2.10 - 2.0.0
kafka_2.10 - 0.10.1.1
Continuous event data is being streamed to a Kafka topic, which I need to process from multiple Spark Streaming applications. But when I run the Spark Streaming apps, only one of them receives the data.
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

Map<String, Object> kafkaParams = new HashMap<String, Object>();
kafkaParams.put("bootstrap.servers", "localhost:9092");
kafkaParams.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
kafkaParams.put("auto.offset.reset", "latest");
kafkaParams.put("group.id", "test-consumer-group");
kafkaParams.put("enable.auto.commit", "true");
kafkaParams.put("auto.commit.interval.ms", "1000");
kafkaParams.put("session.timeout.ms", "30000");

Collection<String> topics = Arrays.asList("4908100105999_000005");

JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
        ssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));
... //spark processing
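Here ssc is a standard JavaStreamingContext; the app name and batch interval below are only placeholders for what I actually use:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

SparkConf conf = new SparkConf().setAppName("event-stream-app");
JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(5));

// ... createDirectStream as above, then per-batch processing:
stream.foreachRDD(rdd -> {
    // process each micro-batch here
});

ssc.start();
ssc.awaitTermination(); // declared as throws InterruptedException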
I have two Spark Streaming applications; usually the first one I submit consumes the Kafka messages, while the second application just waits for messages and never proceeds. As I have read, a Kafka topic can be subscribed to by multiple consumers. Is that not true for Spark Streaming? Or is there something I am missing about the Kafka topic and its configuration?
Thanks in advance.
You can create multiple streams on the same topic, but what each one receives is governed by the Kafka consumer group: consumers that share a group.id split the topic's partitions between them (with a single-partition topic, one consumer gets everything and the others get nothing), while consumers with different group.ids each receive every message. Here are more details from the online documentation; for the 0.8 integration there are two approaches:
Approach 1: Receiver-based Approach
Approach 2: Direct Approach (No Receivers)
You can read more at Spark Streaming + Kafka Integration Guide 0.8
From your code it looks like you are using the 0.10 integration, so refer to the Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher).
Even though it uses the Spark Streaming API, consumption is controlled entirely by the Kafka consumer properties, so depending on the group.id you specify you can start multiple streams: give each application a different group ID and each one will receive all of the topic's data. For example:
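A minimal sketch (the helper name and the group IDs below are made up for illustration): reuse the consumer configuration from your question, but give each Spark application its own group.id, so each forms its own consumer group and therefore receives every message of the topic.

import java.util.HashMap;
import java.util.Map;

// Same consumer config as in the question, parameterised by group.id.
public static Map<String, Object> kafkaParamsFor(String groupId) {
    Map<String, Object> params = new HashMap<>();
    params.put("bootstrap.servers", "localhost:9092");
    params.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    params.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    params.put("auto.offset.reset", "latest");
    params.put("group.id", groupId); // the only value that differs between the applications
    params.put("enable.auto.commit", "true");
    return params;
}

// First application:
//   ConsumerStrategies.<String, String>Subscribe(topics, kafkaParamsFor("app1-consumer-group"))
// Second application:
//   ConsumerStrategies.<String, String>Subscribe(topics, kafkaParamsFor("app2-consumer-group"))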
Cheers!