I want to read streaming XML files and parse them in Apache Storm. I am using Kafka as the message queue for the XML files, which are about 500 KB each. I want to pass a whole file as a single message to KafkaSpout. How should I go about it?
What is the best way to pass XML files (of size 500-600 KB) as Kafka messages?
Asked by vick
Just go ahead and pass the whole file. The benchmark from LinkedIn supports this; I have quoted the relevant details:
I have mostly shown performance on small 100 byte messages. Smaller messages are the harder problem for a messaging system as they magnify the overhead of the bookkeeping the system does. We can show this by just graphing throughput in both records/second and MB/second as we vary the record size.
So, as we would expect, this graph shows that the raw count of records we can send per second decreases as the records get bigger. But if we look at MB/second, we see that the total byte throughput of real user data increases as messages get bigger:
We can see that with the 10 byte messages we are actually CPU bound by just acquiring the lock and enqueuing the message for sending—we are not able to actually max out the network. However, starting with 100 bytes, we are actually seeing network saturation (though the MB/sec continues to increase as our fixed-size bookkeeping bytes become an increasingly small percentage of the total bytes sent).
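To make "pass the whole file" concrete, here is a minimal sketch of the producer-side preparation, not a definitive implementation. It assumes the file is sent as a raw `byte[]` value and that the producer's `max.request.size` is left at its documented default of 1 MiB, which a 500-600 KB file fits under. The class name `XmlFilePayload`, the 1 KB headroom, the topic name `xml-files`, and the broker address are all hypothetical. The block uses only the JDK so it runs on its own; the actual send, which needs the `kafka-clients` library, is shown in a comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class XmlFilePayload {
    // Documented default of the producer setting max.request.size (1 MiB).
    static final int DEFAULT_MAX_REQUEST_SIZE = 1_048_576;

    /** Producer settings for sending raw bytes; the broker address is a placeholder. */
    static Properties producerConfig(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("compression.type", "gzip"); // XML text usually compresses well
        props.put("max.request.size", Integer.toString(DEFAULT_MAX_REQUEST_SIZE));
        return props;
    }

    /** True if the payload fits in one request with some room left for framing. */
    static boolean fitsInOneMessage(byte[] payload, int maxRequestSize) {
        int headroom = 1024; // assumed allowance for key, headers and record framing
        return payload.length + headroom <= maxRequestSize;
    }

    /** Reads the whole XML file into memory so it can be sent as one message value. */
    static byte[] readWholeFile(Path xmlFile) throws IOException {
        return Files.readAllBytes(xmlFile);
    }

    public static void main(String[] args) throws IOException {
        // With no argument, fall back to a synthetic 600 KB payload.
        byte[] payload = args.length > 0
                ? readWholeFile(Paths.get(args[0]))
                : new byte[600 * 1024];
        Properties props = producerConfig("localhost:9092");
        System.out.println("payload bytes: " + payload.length);
        System.out.println("fits in one message: "
                + fitsInOneMessage(payload, DEFAULT_MAX_REQUEST_SIZE));
        // With kafka-clients on the classpath, the send itself would look like:
        //   try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
        //       producer.send(new ProducerRecord<>("xml-files", "file-name.xml", payload));
        //   }
    }
}
```

If files ever grow past roughly 1 MB, the producer's `max.request.size` and the broker's `message.max.bytes` (along with the consumer fetch size) would need to be raised together; at 500-600 KB the defaults should be enough.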