Datalab kernel crashes because of data set size. Is load balancing an option?


I am currently running the virtual machine with the highest memory, n1-highmem-32 (32 vCPUs, 208 GB memory).

My data set is around 90 GB, but it has the potential to grow in the future.

The data is stored in many zipped CSV files. I am loading the data into a sparse matrix in order to perform some dimensionality reduction and clustering.
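A simplified sketch of what the loading and clustering step looks like (the file layout, dtypes, and parameters here are illustrative, not the exact code):

```python
import glob

import numpy as np
import pandas as pd
from scipy import sparse
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import MiniBatchKMeans

# Read each zipped CSV and convert it to a sparse block so the full
# dense data never has to sit in memory at once.
blocks = []
for path in glob.glob("data/*.csv.gz"):  # illustrative file layout
    chunk = pd.read_csv(path, compression="gzip")
    blocks.append(sparse.csr_matrix(chunk.to_numpy(dtype=np.float32)))

# Stack the blocks into one large sparse matrix.
X = sparse.vstack(blocks, format="csr")

# TruncatedSVD accepts sparse input directly, without densifying it.
reduced = TruncatedSVD(n_components=50).fit_transform(X)

# Cluster in the reduced space.
labels = MiniBatchKMeans(n_clusters=10).fit_predict(reduced)
```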


1 Answer

Answered by Chris Meyers

The Datalab kernel runs on a single machine. Since you are already running on a 208 GB RAM machine, you may have to switch to a distributed system to analyze the data.

If the operations you are doing on the data can be expressed as SQL, I'd suggest loading the data into BigQuery, which Datalab has a lot of support for. Otherwise you may want to convert your processing pipeline to use Dataflow (which has a Python SDK). Depending on the complexity of your operations, either of these may be difficult, though.
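For example, once the data is loaded into a BigQuery table, Datalab can push the heavy work (filtering, joins, aggregations) into BigQuery and pull only the much smaller result back into the notebook. A minimal sketch using the google.datalab.bigquery module bundled with Datalab (the project, dataset, and column names are placeholders):

```python
import google.datalab.bigquery as bq

# Run the aggregation inside BigQuery; only the result set
# comes back into the notebook's memory.
query = bq.Query('''
  SELECT feature_a, feature_b, COUNT(*) AS n
  FROM `my-project.my_dataset.my_table`   -- placeholder table
  GROUP BY feature_a, feature_b
''')

df = query.execute().result().to_dataframe()
```

For processing that does not map well to SQL, an Apache Beam pipeline run on Dataflow plays the same role: you express the transformation once and the service distributes it across workers, rather than relying on a single large VM.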