Possible to parallelize for loop with dependencies?

Hello veteran R users,

I'm quite new to R and am wondering if there's any way to parallelize my process. My dataset is derived from a pcap file, from which I've extracted the packets belonging to one particular protocol: MODBUS/TCP. There are over 800k packets, and every two consecutive packets form the query and response of a single (i.e., the same) MODBUS transaction.

Since some values appear only in the query and others only in the response, I wrote an initial for loop that walks through the data line by line to "line up" each pair, producing a single row per transaction with the variables filled in from both the query and the response. The only way to tell a query from a response is the source/destination port number, which I test with conditional if statements.
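For illustration, a vectorized sketch of the pairing step; the column names (`src_port`, `dst_port`, `time`, `func_code`, `value`) are hypothetical, and it assumes MODBUS/TCP traffic uses the standard server port 502. With strict query/response alternation, the row-by-row loop can often be replaced by two filtered subsets bound side by side:

```r
library(data.table)

# Hypothetical layout: one row per packet, queries and responses alternating.
# MODBUS/TCP servers conventionally listen on TCP port 502, so a packet with
# dst_port == 502 is a query and one with src_port == 502 is a response.
pair_transactions <- function(pkts) {
  q <- pkts[dst_port == 502L]    # query rows
  r <- pkts[src_port == 502L]    # response rows
  stopifnot(nrow(q) == nrow(r))  # sanity check: strict pairing assumed
  # one row per transaction, fields taken from whichever side carries them
  cbind(q[, .(time_query = time, func_code)],
        r[, .(time_resp  = time, value)])
}
```

Because both subsets preserve the original packet order, the i-th query lines up with the i-th response; if retransmissions or dropped packets break the alternation, joining on a transaction-ID column would be safer.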

I'm using data.tables, setting keys, and preallocating the merged results table. Functions applied to whole vectors (columns of the result data.table) execute fairly quickly.

My PC runs Debian Wheezy and has 4 processors. From what I've read, my understanding is that the dependencies make it impossible to leverage parallel processing directly. However, is there some way to partition the dataset, process the partitions in parallel, and then merge the results? The loop took over 3 hours to run, so perhaps there's some other optimization I can apply?
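Since the dependency only spans each query/response pair, the table can be split on pair boundaries and the chunks processed independently. A sketch using the `parallel` package, where `process_chunk()` is a hypothetical stand-in for the body of the existing loop:

```r
library(parallel)
library(data.table)

# process_chunk() is a placeholder for a per-chunk version of the loop,
# returning one data.table with one row per transaction.
parallel_pairing <- function(pkts, ncores = 4L) {
  n      <- nrow(pkts)
  npairs <- n %/% 2L
  # assign each pair (rows 1-2, 3-4, ...) to one of ncores chunks,
  # so that no query/response pair is ever split across chunks
  pair_id  <- rep(seq_len(npairs), each = 2L)[seq_len(n)]
  chunk_id <- cut(pair_id, breaks = ncores, labels = FALSE)
  chunks   <- split(pkts, chunk_id)
  results  <- mclapply(chunks, process_chunk, mc.cores = ncores)
  rbindlist(results)  # reassemble in original order
}
```

On a 4-core box this parallelizes the bulk of the work; note that `mclapply` forks the R process, so it works on Linux (including Debian) but not natively on Windows.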

Any guidance/pointers greatly appreciated. Thanks!

1 Answer

Answered by lstilo:

I've reimplemented the code in C, and have since discovered Rcpp, which I'm currently exploring. This seems to be the way to go.
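For readers landing here, a minimal illustration of the Rcpp route, with hypothetical column names: the tight row-by-row loop moves into compiled C++ while the rest of the pipeline stays in R.

```r
library(Rcpp)

# Compile a small C++ loop over packet pairs; src_port/value are hypothetical
# columns, and port 502 marks the response side as in the question.
cppFunction('
NumericVector pair_values(IntegerVector src_port, NumericVector value) {
  int n = src_port.size();
  NumericVector out(n / 2);
  for (int i = 0; i + 1 < n; i += 2) {
    // take the value from whichever packet of the pair is the response
    out[i / 2] = (src_port[i] == 502) ? value[i] : value[i + 1];
  }
  return out;
}')
```

Interpreted R loops over hundreds of thousands of rows are typically much faster once moved into compiled code this way, which matches the experience described above.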