I have a source of files that I need to process. From each file, my code generates a variable number of data objects; call it N. I also have K processing objects that can be used to process the N data objects.
I'm thinking of doing the following with tbb::flow:
- Create a function_node with concurrency K and put my K processing objects into a concurrent_queue.
- Use input_node to read the file, generate the N data objects, and try_put each into the function_node.
- The function_node body dequeues a processing object, uses it to process a data object, then returns the processing object back to the concurrent_queue when done.
Another way I can think of is possibly like so:
- Create a function_node with serial concurrency.
- Use input_node to read the file, generate the N data objects, put them into a collection, and send the collection over to the function_node.
- At the function_node, partition the N objects into K ranges and use each of the K processing objects to process one range concurrently. I'm not sure whether it is possible to customize parallel_for for this purpose.
The advantage of the first method is probably lower latency, because I can start sending data objects through the dataflow the moment they are generated, rather than having to wait for all N data objects to be generated.
What do you think is the best way to go about parallelizing this processing?
Yes, you are right that the first method has the advantage of not having to wait for all of the data objects before starting their processing. It has another advantage as well: it does not have to wait for the processing of all the data objects passed to parallel_for to complete. This becomes especially visible if the processing speed varies per data object and/or per processing object.

Also, it seems enough to have a buffer_node followed by a (perhaps reserving) join_node instead of a concurrent_queue for keeping the processing objects for reuse. In this case, the function_node would return the processing object back to the buffer_node once it finishes processing the data object. So the graph would look like the following:

[graph diagram: the data objects and the buffer_node holding the processing objects feed a reserving join_node, whose output goes to the function_node; the function_node sends each processing object back to the buffer_node]

In this case, the concurrency of the function_node can be left unlimited, as it would be limited automatically by the number of processing objects (available tokens) that exist in the graph.

Also, note that generating data objects from different files can be done in parallel as well. If you see a benefit from that, consider using a function_node instead of an input_node, since the latter is always serial. However, in that case, use a join_node with the queueing policy, since function_node is not reservable.

Also, please consider using tbb::parallel_pipeline instead, as yours seems to be a classic pipelining scheme of processing. In particular, this and that link might be useful.