How to fix memory allocation issues when converting annotated NLP model to dataframe in R

Question

Asked by nigus21 on 02 December 2020 at 19:08 (93 views)

I am trying to convert an annotated NLP model of size 1.2GB to a data frame. I am using the udpipe package for natural language processing in R with the following code:

# Additional Topic Models
# annotate and tokenize corpus
model <- udpipe_download_model(language = "english")
udmodel_english <- udpipe_load_model(model$file_model)
s <- udpipe_annotate(udmodel_english, cleaned_text_NLP)
options(java.parameters = "-Xmx32720m")
memory.limit(3210241024*1024)
x <- data.frame(s)

Note that I have 32GB RAM and allocated all available memory to R to run the code. I also tried deleting large objects stored in the R environment space that are not relevant for running the above code. R cannot seem to allocate enough memory for the task and the following error message was the result:

Error in strsplit(x$conllu, "\n") : 
  could not allocate memory (4095 Mb) in C function 'R_AllocStringBuffer'

My question is twofold:

What does the above error message mean?
What workarounds are available to fix this issue?

1 answer

**AudioBubble** · Accepted Answer · 2020-12-03T12:02:30+00:00

You probably have a large number of documents to annotate. It is better to annotate in chunks, as shown at https://cran.r-project.org/web/packages/udpipe/vignettes/udpipe-parallel.html

The following code annotates in chunks of 50 documents in parallel across 2 cores and essentially does what your data.frame command does. The issue should no longer occur, because the function runs strsplit on each chunk of 50 documents rather than on your full dataset, where the annotated text was apparently too large to fit within the limits of a string buffer in R.

x <- udpipe(cleaned_text_NLP, udmodel_english, parallel.cores = 2L, parallel.chunksize = 50)
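
If you would rather keep the udpipe_annotate and data.frame steps from the question, the same chunking idea can be sketched by hand. This is only a minimal sketch, not udpipe's own implementation: annotate_in_chunks and its annotate_fn argument are hypothetical names introduced here, and the commented usage assumes udmodel_english and cleaned_text_NLP exist as in the question.

```r
# Hypothetical helper (not part of udpipe): annotate a character vector in
# chunks so that no single call has to convert the full annotation output.
# annotate_fn(texts, doc_ids) must return a data.frame for one chunk.
annotate_in_chunks <- function(texts, annotate_fn, chunk_size = 50L) {
  if (length(texts) == 0L) {
    return(data.frame())
  }
  doc_ids <- paste0("doc", seq_along(texts))
  # Chunk index per document: 1, 1, ..., 2, 2, ...
  chunk_of <- ceiling(seq_along(texts) / chunk_size)
  pieces <- lapply(split(seq_along(texts), chunk_of), function(idx) {
    annotate_fn(texts[idx], doc_ids[idx])
  })
  # Bind the per-chunk data frames into one result
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

# Usage with udpipe, assuming the objects from the question exist:
# library(udpipe)
# x <- annotate_in_chunks(
#   cleaned_text_NLP,
#   function(txt, ids) {
#     as.data.frame(udpipe_annotate(udmodel_english, x = txt, doc_id = ids))
#   }
# )
```

Because each chunk is converted to a data frame separately, strsplit only ever sees the CoNLL-U text of chunk_size documents at a time. A smaller chunk_size lowers peak memory at the cost of more calls.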
