R: building a wide matrix with pivot_wider/spread fails with Error: Can't assign to elements that don't exist


Win10 Build 18363.836; R version R-4.0.2; RStudio version 1.3.1093; CPU: Intel i7-7500U; Physical Memory: 16GB; HDD: 500GB SSD

I am working on the Instacart Market Basket dataset. In order to build recommendation models with the R package, I need to create a matrix with one row per customer (user_id) and one column per product_name for collaborative filtering.

After loading the data, I merged and filtered it to prepare the training set.

data_train = orders %>% 
  filter(eval_set=='train') %>% 
  left_join(order_products) %>%
  left_join(products) %>%
  mutate(actual=1) %>%
  select(user_id, order_id, product_id, product_name, actual)

I tried to build the matrix like this.

data_train %>% 
  select(user_id, product_name) %>% 
  mutate(n=1)  %>%
  arrange(product_name) %>% 
  pivot_wider(names_from = "product_name", values_from = "n", values_fill=0)

But I got this error message:

Error: Can't assign to elements that don't exist.
x Locations 2, 3, 4, 5, 6, etc. don't exist.
i There are only 1 element.
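Two things that could be checked before pivoting are whether any (user_id, product_name) pair occurs more than once and whether the number of cells in the wide table exceeds R's integer range. This is only a guess about where to look, not a confirmed diagnosis. A minimal sketch with dplyr, using a made-up toy table in place of data_train:

```r
library(dplyr)

# Toy stand-in for data_train (hypothetical values, not the Instacart data)
toy <- tibble(
  user_id      = c(1, 1, 2, 2),
  product_name = c("Banana", "Banana", "Milk", "Eggs")
)

# Pairs that occur more than once would end up in the same wide cell
dup_pairs <- toy %>%
  count(user_id, product_name) %>%
  filter(n > 1)

# Number of cells the wide table would have (as a double, to avoid overflow)
n_cells <- as.numeric(n_distinct(toy$user_id)) * n_distinct(toy$product_name)

# TRUE if the cell count does not fit in a 32-bit integer
exceeds_int <- n_cells > .Machine$integer.max

print(dup_pairs)
print(n_cells)
print(exceeds_int)
```

Run against the real data, a non-empty dup_pairs or a TRUE exceeds_int would each be worth investigating.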

I could not solve the issue, so I tried another way to build the matrix, following this article:

ratings_matrix <- train01 %>%
  select(user_id, product_name) %>% 
  mutate(value = 1) %>%
  spread(product_name, value, fill = 0) %>%
  select(-user_id) %>%
  as.matrix() %>%
  as("binaryRatingMatrix")
ratings_matrix

Now it shows:

Error: cannot allocate vector of size 19.2 Gb

So I extended the memory limit with:

memory.limit(50000)

Then I checked my system: it shows a total paging file size for all drives of 43377 MB (virtual memory), which is far more than R asked for to generate the matrix. But I still got the same error message.
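For scale, a dense numeric matrix takes roughly rows × columns × 8 bytes, because R stores each double in 8 bytes. A minimal sketch of that arithmetic follows. The helper name dense_gib and the example dimensions are hypothetical, not the real Instacart counts:

```r
# Approximate size of a dense matrix in GiB (hypothetical helper)
dense_gib <- function(n_rows, n_cols, bytes_per_cell = 8) {
  n_rows * n_cols * bytes_per_cell / 2^30
}

# Example with made-up dimensions: 100,000 users by 10,000 products
example_gib <- dense_gib(100000, 10000)
print(round(example_gib, 2))
```

Plugging in n_distinct(user_id) and n_distinct(product_name) from the real data gives a rough lower bound for the matrix itself. Intermediate copies made during the conversion come on top of that.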

I also tried splitting the training data into four pieces and converting just 25% of it, but still got an error:

Error: cannot allocate vector of size 3.2 Gb

This is far less than the physical memory currently available on my laptop. And since I need to do collaborative filtering on the data, it is better to generate the matrix as a whole. Can anyone help me find the mistake in my code, or show me another way to generate the matrix? Thanks.
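One alternative that might avoid the dense intermediate is to build a sparse matrix directly from the (user_id, product_name) pairs, so that only the non-zero cells are stored. A minimal sketch, assuming the Matrix package and a made-up toy table in place of data_train. Converting the result to a rating-matrix class is not shown here:

```r
library(Matrix)

# Toy stand-in for data_train (hypothetical values, not the Instacart data)
toy <- data.frame(
  user_id      = c(1, 1, 2, 3, 3),
  product_name = c("Banana", "Milk", "Banana", "Eggs", "Milk"),
  stringsAsFactors = FALSE
)

# Factors give each user and each product a stable integer index
users    <- factor(toy$user_id)
products <- factor(toy$product_name)

# One stored entry per purchased (user, product) pair; all other cells are implicit zeros
sparse_ratings <- sparseMatrix(
  i        = as.integer(users),
  j        = as.integer(products),
  x        = 1,
  dims     = c(nlevels(users), nlevels(products)),
  dimnames = list(levels(users), levels(products))
)

print(sparse_ratings)
```

Note that sparseMatrix() sums the x values of repeated (i, j) pairs by default, so duplicate pairs would need to be removed first (for example with unique()) to keep the matrix binary.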
