How to build a test environment (Linux, Spark, JupyterHub)


I am working on my thesis, and I have the opportunity to set up a working environment to test its functionality and see how it works.

The following points should be covered:

  • JupyterHub (within a private cloud)
  • pandas, numpy, SQL, nbconvert, nbviewer
  • get data into a DataFrame (CSV), analyze the data, store the data (RDD? HDF5? HDFS?)
  • Spark for future analysis
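To make the "CSV into a DataFrame" point concrete, here is a minimal sketch of what I have in mind, assuming pandas is installed. The column names (`doc_id`, `item`, `amount`) and the tiny in-memory sample are hypothetical stand-ins for the real exports. Reading in chunks is only one possible way to keep memory use predictable on a multi-GB file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large CSV export; the column names are assumptions.
SAMPLE_CSV = io.StringIO(
    "doc_id,item,amount\n"
    "4500000001,10,100.0\n"
    "4500000001,20,250.5\n"
    "4500000002,10,75.0\n"
    "4500000003,10,20.0\n"
)


def load_in_chunks(source, chunksize=2):
    """Read a CSV in fixed-size chunks and concatenate them into one DataFrame."""
    chunks = pd.read_csv(
        source,
        # Explicit dtypes avoid type guessing and keep document numbers as text.
        dtype={"doc_id": str, "item": "int32", "amount": "float64"},
        chunksize=chunksize,
    )
    return pd.concat(chunks, ignore_index=True)


df = load_in_chunks(SAMPLE_CSV)

# Simple analysis step: total amount per document.
total_by_doc = df.groupby("doc_id")["amount"].sum()

# Measured in-memory size, useful for extrapolating from a sample to the full data.
memory_bytes = int(df.memory_usage(deep=True).sum())
```

Measuring `memory_bytes` on a sample of the real files might be one way to estimate how much RAM a single node would need for the full tables.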

The test scenario will consist of:

  • a multi-user environment with notebooks per user/topic
  • analyzing structured tables (RSEG, MSEG, EKPO) with several million rows each in a 3-way match using pandas, numpy, Spark (Spark SQL) and matplotlib; the three tables hold about 3 GB of data in total
  • exporting notebooks with nbconvert and nbviewer to PDF, a read-only notebook and/or reveal.js
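The 3-way match I want to test could look roughly like the following pandas sketch. It assumes that all three tables share the purchase order number (`EBELN`), the item number (`EBELP`) and a quantity (`MENGE`), and that EKPO has one row per order item. The sample rows are made up, and a real match would probably also compare amounts and tolerances.

```python
import pandas as pd

# Assumed join keys: purchase order number and purchase order item.
KEYS = ["EBELN", "EBELP"]

# Made-up sample rows standing in for the real tables.
ekpo = pd.DataFrame({  # purchase order items (ordered quantity)
    "EBELN": ["4500000001", "4500000001", "4500000002"],
    "EBELP": [10, 20, 10],
    "MENGE": [5.0, 3.0, 8.0],
})
mseg = pd.DataFrame({  # goods movements (received quantity, may be split)
    "EBELN": ["4500000001", "4500000001", "4500000001", "4500000002"],
    "EBELP": [10, 10, 20, 10],
    "MENGE": [2.0, 3.0, 3.0, 6.0],
})
rseg = pd.DataFrame({  # invoice items (invoiced quantity)
    "EBELN": ["4500000001", "4500000001", "4500000002"],
    "EBELP": [10, 20, 10],
    "MENGE": [5.0, 2.0, 6.0],
})


def three_way_match(ekpo, mseg, rseg):
    """Compare ordered, received and invoiced quantity per order item."""
    # Aggregate first so that partial deliveries and invoices add up per item.
    received = (
        mseg.groupby(KEYS, as_index=False)["MENGE"].sum()
        .rename(columns={"MENGE": "qty_received"})
    )
    invoiced = (
        rseg.groupby(KEYS, as_index=False)["MENGE"].sum()
        .rename(columns={"MENGE": "qty_invoiced"})
    )
    ordered = ekpo.rename(columns={"MENGE": "qty_ordered"})

    # Left joins keep every order item, even without receipt or invoice.
    result = ordered.merge(received, on=KEYS, how="left")
    result = result.merge(invoiced, on=KEYS, how="left")
    qty_cols = ["qty_received", "qty_invoiced"]
    result[qty_cols] = result[qty_cols].fillna(0.0)

    result["matched"] = (
        (result["qty_ordered"] == result["qty_received"])
        & (result["qty_received"] == result["qty_invoiced"])
    )
    return result


result = three_way_match(ekpo, mseg, rseg)
print(result)
```

In this sample only the first order item matches on all three quantities. The same aggregate-then-join logic should translate to Spark SQL as two `GROUP BY` subqueries joined to EKPO, which is what I would like to compare against the pandas version.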

Can you give me some hints or share your experience on how many nodes I should use for testing, and which Linux distribution is a good starting point? I am sure there are many more questions; I am having trouble finding ways or information to evaluate possible answers.

Thanks in advance!

