Hello veteran R users,
I'm quite new to R and am wondering if there's any way to parallelize my process. My dataset is derived from a pcap file, from which I've extracted the packets belonging to a particular protocol, MODBUS/TCP. There are over 800k packets, and every two consecutive packets correspond to the query and response of a single (i.e., the same) MODBUS transaction.
Since some values appear only in the query and others only in the response, I wrote an initial for loop that walks through the table line by line to "line up" the data, so that I end up with a single row per transaction containing the variables from both the query and response rows. The only way to tell a query from a response is the source/destination port number, which I check in conditional if statements.
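Roughly, the loop looks like this (a simplified sketch: the column names `tid`, `src_port`, `dst_port`, `value` are placeholders, and the real data has many more columns):

```r
library(data.table)

# Simplified stand-in for my packet table; rows alternate strictly:
# query row, then its matching response row.
pkts <- data.table(tid      = rep(1:3, each = 2),
                   src_port = rep(c(1024L, 502L), 3),
                   dst_port = rep(c(502L, 1024L), 3),
                   value    = c(NA, 10, NA, 20, NA, 30))

npairs <- nrow(pkts) / 2
merged <- data.table(tid        = integer(npairs),   # preallocated result
                     query_port = integer(npairs),
                     resp_value = numeric(npairs))

for (j in seq_len(npairs)) {
  i <- 2L * j - 1L                                   # index of the query row
  if (pkts$dst_port[i] == 502L) {                    # dst port 502 => query
    set(merged, j, "tid",        pkts$tid[i])
    set(merged, j, "query_port", pkts$src_port[i])
    set(merged, j, "resp_value", pkts$value[i + 1L]) # taken from the response
  }
}
```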
I'm using data.table, setting keys, and preallocating the merged results table. Functions applied to whole vectors (columns of the result data.table) execute quickly; it's the row-by-row loop that's slow.
My PC runs Debian wheezy and has 4 cores. Since each output row depends on two consecutive input rows, my understanding from what I've read is that the loop can't simply be parallelized as-is. However, is there some way I can partition the dataset, process the partitions in parallel, and then merge the results? The run took over 3 hours, so perhaps there's some other optimization I can apply.
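For example, would something along these lines be sound? (`pair_up()` stands in for my existing pairing logic, and chunks are cut on pair boundaries so a query/response pair never straddles two chunks; again the columns are placeholders):

```r
library(parallel)
library(data.table)

# Hypothetical partition-and-merge sketch; pkts stands in for the full
# packet table, with rows strictly alternating query/response.
pkts <- data.table(tid   = rep(1:8, each = 2),
                   value = as.numeric(1:16))

pair_up <- function(chunk) {
  q <- chunk[seq(1, .N, by = 2)]                 # query rows
  r <- chunk[seq(2, .N, by = 2)]                 # response rows
  data.table(tid = q$tid, resp_value = r$value)
}

# Assign each query/response pair (two consecutive rows) to one of 4
# contiguous chunks so that no pair straddles a chunk boundary.
n_pairs  <- nrow(pkts) / 2
chunk_of <- rep(cut(seq_len(n_pairs), breaks = 4, labels = FALSE), each = 2)
chunks   <- split(pkts, chunk_of)

# Process the chunks on 4 cores, then recombine in order.
merged <- rbindlist(mclapply(chunks, pair_up, mc.cores = 4))
```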
Any guidance/pointers greatly appreciated. Thanks!
Update: I've since reimplemented the code in C, and have discovered Rcpp, which I'm currently exploring. This seems to be the way to go.
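As a first Rcpp experiment, I'm sketching the pairing loop like this (column names are placeholders; compiled from R with `Rcpp::sourceCpp("pair_up.cpp")`):

```cpp
// pair_up.cpp -- a sketch of the pairing loop in C++ via Rcpp.
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
DataFrame pair_up(IntegerVector tid, IntegerVector src_port,
                  NumericVector value) {
  int n = tid.size() / 2;              // one output row per query/response pair
  IntegerVector out_tid(n), out_port(n);
  NumericVector out_value(n);
  for (int i = 0; i < n; ++i) {
    out_tid[i]   = tid[2 * i];         // fields from the query row
    out_port[i]  = src_port[2 * i];
    out_value[i] = value[2 * i + 1];   // field from the response row
  }
  return DataFrame::create(_["tid"]        = out_tid,
                           _["query_port"] = out_port,
                           _["resp_value"] = out_value);
}
```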