I am trying to understand the difference between threaded ports and the @parallel annotation in IBM InfoSphere Streams. I have searched in many places but couldn't find an answer. As I understand it, both of them are useful for making an operator multi-threaded, but I am not sure when and where to use each one, or whether they can be used together for a performance boost. Could someone please illustrate their usage with examples?
Thanks.
Threaded ports are primarily about pipeline parallelism. When you specify a threaded port in an operator's `config` clause, you're telling the runtime to execute that operator on a different thread from the one that executes the upstream operator. We call this pipeline parallelism because the operators in the pipeline are able to execute simultaneously. If the operators in your pipeline are computationally expensive enough, this can improve throughput.

The `@parallel` annotation is primarily about data parallelism. When you apply the `@parallel(width=N)` annotation to an operator invocation, Streams will replicate that operator N times. Your running application will have N copies of that operator, each receiving a different subset of the incoming tuples. We call this data parallelism because we're processing different data (tuples, in the case of Streams) simultaneously by replicating the operator. When you have an operator that is computationally expensive, and it's okay to process incoming tuples out of order, `@parallel` can improve throughput.

In practice, using the `@parallel` annotation will sometimes inject threaded ports into your application to make sure that the replicated operators actually execute in parallel. As a side effect, this also introduces some pipeline parallelism, so the two mechanisms can coexist in the same application.

This application demonstrates both threaded ports and `@parallel`: streamsx.demo.logwatch. It is the source code for the application developed in the Optimizing Streams Applications presentation.
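To make the two execution models concrete outside of Streams, here is a minimal plain-Python sketch. It is an analogy only, not SPL and not the Streams runtime API. `pipeline` gives each operator its own thread connected by bounded queues, roughly what a threaded port does between two operators. `data_parallel` runs `width` replicas of one operator and splits the tuples among them, roughly what `@parallel(width=N)` does. All names (`stage`, `pipeline`, `data_parallel`, `expensive`), the round-robin split and the queue size of 100 are hypothetical choices for this sketch. Because of CPython's global interpreter lock, it demonstrates structure and ordering behavior, not a real speedup.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def expensive(x):
    # Stand-in for a computationally expensive operator body.
    return x * x


def stage(fn, inbox, outbox):
    """Run one operator on its own thread, like an operator behind a threaded port."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(fn(item))


def feed(data, outbox):
    """Source thread: push every tuple downstream, then signal end of stream."""
    for item in data:
        outbox.put(item)
    outbox.put(SENTINEL)


def drain(outbox, expected_sentinels):
    """Collect results until every upstream thread has signalled completion."""
    results, seen = [], 0
    while seen < expected_sentinels:
        item = outbox.get()
        if item is SENTINEL:
            seen += 1
        else:
            results.append(item)
    return results


def pipeline(data, fns, queue_size=100):
    """Pipeline parallelism: one thread per stage, bounded FIFO queues between them."""
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(fns) + 1)]
    threads = [threading.Thread(target=feed, args=(data, queues[0]))]
    threads += [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()
    results = drain(queues[-1], expected_sentinels=1)
    for t in threads:
        t.join()
    return results


def data_parallel(data, fn, width=4, queue_size=100):
    """Data parallelism: `width` replicas of one operator, each fed a subset of tuples."""
    inboxes = [queue.Queue(maxsize=queue_size) for _ in range(width)]
    outbox = queue.Queue()  # shared merge point; arrival order is not guaranteed
    replicas = [
        threading.Thread(target=stage, args=(fn, inboxes[i], outbox))
        for i in range(width)
    ]
    for t in replicas:
        t.start()
    # Round-robin split of the incoming tuples across the replicas.
    for i, item in enumerate(data):
        inboxes[i % width].put(item)
    for box in inboxes:
        box.put(SENTINEL)
    results = drain(outbox, expected_sentinels=width)
    for t in replicas:
        t.join()
    return results


if __name__ == "__main__":
    print(pipeline(range(5), [expensive, lambda x: x + 1]))  # [1, 2, 5, 10, 17]
    print(sorted(data_parallel(range(5), expensive, width=2)))  # [0, 1, 4, 9, 16]
```

The pipeline version always returns results in input order, because each stage has a single thread reading a FIFO queue. The data-parallel version can return them in any order, which mirrors why `@parallel` suits operators where out-of-order processing is acceptable.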