Note:
I recognize this is a more amorphous, less reproducible problem than is ideal, but I feel it is worth asking given the other instances we've seen on Stack Overflow and its potential applicability to other parallel computing issues.
Problem:
I've been parallelizing my R code on Windows using foreach and %dopar% with the "doParallel" package. It all worked as expected for several months before suddenly exhibiting very strange behavior. Each child process writes its own .csv file when it completes, after ~5 minutes. For months, once those 5 minutes were up, my session would continue to work as expected.
Then something changed: each child process would still finish and write its .csv after ~5 minutes, but the R session would stay stuck, as if the master process were still working even though all the child processes were clearly done. And when it finally completes, the whole session runs extremely slowly until I close R.
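A minimal sketch of the pattern described above, for anyone trying to reproduce it. This is not the actual code: run_task, out_dir, the file names, and the worker count are hypothetical placeholders, and the real task takes ~5 minutes rather than milliseconds.

```r
library(doParallel)  # also attaches foreach and parallel

# Hypothetical stand-in for the real ~5 minute task: each call writes one .csv
run_task <- function(i, out_dir) {
  result <- data.frame(task = i, value = i^2)
  out_file <- file.path(out_dir, sprintf("result_%02d.csv", i))
  write.csv(result, out_file, row.names = FALSE)
  out_file
}

out_dir <- file.path(tempdir(), "dopar_demo")
dir.create(out_dir, showWarnings = FALSE)

cl <- makeCluster(2)      # socket (PSOCK) cluster of separate R processes
registerDoParallel(cl)

written <- tryCatch(
  foreach(i = 1:4, .combine = c) %dopar% run_task(i, out_dir),
  finally = stopCluster(cl)  # shut the workers down even if a task fails
)

print(written)
```

The explicit stopCluster() in the finally clause is there so that worker processes are not left running after the loop, which is one thing worth ruling out when the session appears stuck.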
It seems to be exactly the same situation as in these other Stack Overflow questions, which remain unsolved (and which have comments from others with the same problem):
R foreach do parallel does not end after loop completion
%dopar% parallel foreach loop fails to exit when called from inside a function (R)
Given that it's a recurring issue across several Stack Overflow questions, I wanted to see if anyone could provide some additional context or guidance for the situation.
I code with data.table, and per this blogpost, data.table's multithreading can conflict with parallel processing. I'm not sure whether that started the problem, but I now call setDTthreads(1) at the start of the function, which I would think would solve it. It's weird, though, that this went from never being a problem for months to almost always being a problem.
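One thing that may matter here: on Windows, doParallel workers are separate R sessions, so a setDTthreads(1) call made in the master session is not guaranteed to apply inside them. A hedged sketch of setting and checking the thread count on the workers themselves, assuming a two-worker cluster and that data.table is installed on the machine:

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

# Run the setup on every worker: load data.table there and limit it to 1 thread
clusterEvalQ(cl, {
  library(data.table)
  setDTthreads(1)
})

# Ask each task how many threads data.table will use inside the worker
threads <- tryCatch(
  foreach(i = 1:2, .combine = c, .packages = "data.table") %dopar% getDTthreads(),
  finally = stopCluster(cl)
)

print(threads)
```

If getDTthreads() reports more than 1 inside the workers in the real code, the setting is not reaching them.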
Potentially relevant details
Sometimes one of my child processes will seem to break and start spinning out of control, writing weird .csv files. When that happens I need to restart R. I'm not sure whether one of those episodes could have caused some damage somehow.
More concerning, around the time this started happening, my computer and Explorer began running really choppily, whether or not I was using R, while showing only ~10% CPU usage. It didn't go away after ~10-20 restarts, but it did seem to stop happening after a month or two.
Main questions
1.) Is it possible for parallel processing gone wrong to cause some sort of actual damage to a Windows computer? Would it be the CPU? The SSD? Or something else? Would a factory reset fix it? Something else?
2.) Could something have gone wrong in the packages or the R installation? Does it merit reinstalling those?
3.) Is it likely that all these different Stack Overflow issues were caused by conflicts between multithreading and parallel processing? Should setDTthreads(1) at the top of the function prevent such problems? What are best practices for avoiding them?
4.) Would switching to a Linux partition likely alleviate these problems? What about switching to a different parallelization package in R?
A huge thanks to anyone who actually read this whole thing! I appreciate any help. This is a really dire and frustrating issue that seems to have affected several people.