R in Parallel - Partools Package - Error with `calm()` function


I am using the partools package to run linear regressions in parallel. I am doing this using the calm() function, which is a wrapper for the package's version of R's lm().

I'm using 20 cores on a 64 GB node.

I receive errors when I run the calm() function, and I've isolated the problem to a single variable: agelvl. Since partools must split a dataset into chunks (one chunk per core), variables, from what I can tell, are stored as either character or integer. agelvl is stored as character because of its named levels, so I wrap it in factor() in the model formula.

Here's the code:

    lpmvbac2 <- calm(cls, 'vbac ~ factor(agelvl),data=nat[nat$prec==1,]')$tht

Here's the error:

    Error in cabase(cls, ovf, coef, vcov) :
      likely cause is constant variable in some chunk
    Calls: calm -> cabase
    In addition: Warning message:
    In f(init, x[[i]]) :
      number of columns of result is not a multiple of vector length (arg 2)

When I run the above code on my local machine (using 3 cores instead of 20), I can't reproduce the error. This suggests that the problem occurs in the chunking, specifically that a given level of agelvl is missing from one or more chunks.
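To illustrate why a missing level would matter, here is a minimal base-R sketch (no partools involved). The data are simulated and only the names vbac and agelvl come from the question. It shows that when factor() is evaluated separately on each chunk, a chunk lacking a level yields a shorter coefficient vector, and combining vectors of different lengths with rbind() is the kind of operation that produces the "number of columns of result is not a multiple of vector length" warning. Whether partools combines the per-chunk results in exactly this way is an assumption.

```r
# Simulated data; only the column names mirror the question.
set.seed(1)
lv <- c("under 15", "15-19", "20-24", "45-49")
chunk_a <- data.frame(
  vbac   = rbinom(200, 1, 0.3),
  agelvl = rep(lv, each = 50)      # every level is present
)
# chunk_b happens to contain no "45-49" rows.
chunk_b <- chunk_a[chunk_a$agelvl != "45-49", ]

# factor() is evaluated per chunk, so each fit builds its own dummies.
coef_a <- coef(lm(vbac ~ factor(agelvl), data = chunk_a))
coef_b <- coef(lm(vbac ~ factor(agelvl), data = chunk_b))

length(coef_a)   # intercept plus three dummies
length(coef_b)   # one dummy fewer
setdiff(names(coef_a), names(coef_b))   # the coefficient chunk_b lacks
```

If the per-chunk coefficient vectors differ in length like this, they cannot be averaged element by element, which is consistent with the error appearing only for some chunk counts.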

However, here's a summary of agelvl in the unchunked data:

under 15    15-19    20-24    25-29    30-34    35-39    40-44    45-49 
    7440   336242   698606   770127   620437   267777    48342     2176 

It seems unlikely to me that, with the data split into 20 chunks, any one chunk would be missing any of these levels. I even checked each of the 20 chunks individually, and I don't see any levels missing:

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16732    34284    37552    30392    13225     2410      105      382

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16774    34906    38727    31012    13469     2445      113      386

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   17007    34762    38820    31159    13311     2326      104      344

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16836    34839    38387    31251    13594     2429       91      405

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16621    35150    38519    31103    13470     2505      109      355

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16768    35020    38673    31034    13379     2467       97      395

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16724    35036    38376    31211    13473     2538      120      354

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16948    34831    38714    31013    13486     2373      107      361

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16948    34807    38845    30801    13532     2432      107      360

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16746    35042    38581    31184    13369     2381      130      400

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16796    35045    38616    31200    13351     2335      111      378

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16837    35298    38579    30858    13369     2424      106      361

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16882    34955    38529    31136    13403     2459      104      365

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16839    35096    38360    31210    13383     2462      106      376

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   17109    35106    38450    30991    13322     2377      112      366

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16869    35118    38310    31083    13426     2530      122      374

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16850    34885    38768    31210    13284     2371      101      363

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16644    35086    38968    30840    13450     2378      103      364

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16707    35086    38762    31010    13371     2387      121      388

   15-19    20-24    25-29    30-34    35-39    40-44    45-49 under 15
   16605    34254    37591    30739    13110     2313      107      363
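The per-chunk check above can be sketched in base R as follows. One thing worth verifying is that the counts are taken after the prec == 1 filter, since the model is fitted only on those rows. Everything here is an assumption for illustration: nat_demo is made-up data, and the contiguous split into equal blocks may not match how the chunks were actually created.

```r
# Made-up stand-in for `nat`; only the column names come from the question.
set.seed(42)
n  <- 2000
lv <- c("under 15", "20-24", "45-49")
nat_demo <- data.frame(
  agelvl = sample(lv, n, replace = TRUE, prob = c(0.01, 0.98, 0.01)),
  prec   = rbinom(n, 1, 0.2)
)

n_chunks <- 20
# Assumption: rows are dealt into contiguous, roughly equal blocks.
chunk_id <- cut(seq_len(n), breaks = n_chunks, labels = FALSE)

# Tabulate only the rows the model uses (prec == 1), chunk by chunk.
keep   <- nat_demo$prec == 1
counts <- table(
  chunk  = factor(chunk_id[keep], levels = seq_len(n_chunks)),
  agelvl = factor(nat_demo$agelvl[keep], levels = lv)
)

# Chunks in which at least one level has no rows after filtering.
which(apply(counts == 0, 1, any))
```

Setting the factor levels explicitly keeps empty cells in the table, so a level that vanishes from a chunk shows up as a zero rather than being silently dropped.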

Interestingly, when I split the data into 3 chunks and use 3 cores on the cluster, instead of 20, I get it to run, just as I'm able to on my local machine.

So, why does this problem occur when using 20 cores but not 3?

1 Answer

Mason Malone (best answer)

According to the author of partools, this could be a scaling issue: even if no level of a categorical variable is missing from any chunk, the error may still occur because the number of observations in a given level is low in both absolute and relative terms.
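To put rough numbers on "absolutely and relatively low", here is a small sketch using the level counts from the question's summary. It only computes shares and average rows per chunk; it assumes an even split and ignores the prec == 1 filter, which would make every count smaller.

```r
# Level counts copied from the summary in the question.
counts <- c("under 15" = 7440, "15-19" = 336242, "20-24" = 698606,
            "25-29" = 770127, "30-34" = 620437, "35-39" = 267777,
            "40-44" = 48342, "45-49" = 2176)

# Share of all rows in each level, in percent (the "relative" size).
round(100 * counts / sum(counts), 2)

# Average rows per level per chunk (the "absolute" size) for 3 and 20 chunks.
per_chunk <- sapply(c(`3` = 3, `20` = 20), function(k) round(counts / k))
per_chunk
```

For the 45-49 level this works out to roughly 725 rows per chunk with 3 chunks but only about 109 with 20, which matches the per-chunk tables in the question.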

Solutions

  1. Decrease the number of chunks: assuming there is a chunk count below which the error disappears, you can reduce the number of chunks until it does. However, fewer chunks also means fewer cores, so (a) each chunk may become large enough to cause memory problems, (b) the parallel processes may run too slowly, or (c) both.

  2. Alter the levels or the variable's structure: keep the desired number of chunks/cores and instead alter the levels so that each level has enough observations. For agelvl, you could widen the intervals (10 years instead of 5) or, if possible, change age from a categorical variable to a continuous one. Keep in mind that such changes could alter the explanatory power of the model or cause it to be misspecified.
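As a rough sketch of the second option, the 5-year levels could be collapsed into wider bands with a lookup vector. The band labels and the name agelvl10 are hypothetical, chosen only for illustration; the right grouping depends on the analysis.

```r
# Hypothetical wider bands; the cut points are an illustration only.
band <- c("under 15" = "under 20", "15-19" = "under 20",
          "20-24"    = "20-29",    "25-29" = "20-29",
          "30-34"    = "30-39",    "35-39" = "30-39",
          "40-44"    = "40-49",    "45-49" = "40-49")

# A few example values standing in for nat$agelvl.
agelvl <- c("under 15", "15-19", "20-24", "45-49", "40-44", "30-34")

# Look up each original level's band; unname() drops the lookup names.
agelvl10 <- unname(band[agelvl])
table(agelvl10)
```

The recoded column would then replace agelvl in the formula passed to calm(), provided it exists in the data on every worker.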