I am working on my thesis, and I have the opportunity to set up a working environment to evaluate how the components work together in practice.
The following points should be covered:
- JupyterHub (within a private cloud)
- pandas, NumPy, SQL, nbconvert, nbviewer
- get data into a DataFrame (from CSV), analyze it, and store it (RDD? HDF5? HDFS?) (see the sketch after this list)
- Spark for future analyses
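To make the data-handling step concrete, here is roughly what I have in mind for a single table (a minimal sketch only; the file name, separator, and HDF5 key are placeholders):

```python
import pandas as pd

# Read one of the exported SAP tables from CSV
# (file name and separator are placeholders)
rseg = pd.read_csv("rseg_export.csv", sep=";", dtype=str)

# ... cleaning / analysis with pandas and numpy ...

# Store the cleaned table in HDF5 (requires the PyTables package),
# so the CSV does not have to be re-parsed on every run
rseg.to_hdf("matching_data.h5", key="rseg", mode="a", format="table")

# Read it back later
rseg = pd.read_hdf("matching_data.h5", key="rseg")
```

My open question here is whether local HDF5 files are enough, or whether the Spark part makes HDFS the better store.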
The test scenario will consist of:
- a multi-user environment with notebooks per user/topic (config sketch below)
- analyzing the structured tables RSEG, MSEG, and EKPO (several million rows, about 3 GB of data across the three tables) in a 3-way match with pandas, NumPy, Spark (Spark SQL), and matplotlib (sketch below)
- exporting notebooks with nbconvert/nbviewer to PDF, read-only notebooks, and/or reveal.js slides (sketch below)
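For the multi-user point, I was thinking of something like this minimal jupyterhub_config.py (a sketch only, assuming a recent JupyterHub; the usernames and notebook directory are placeholders):

```python
# jupyterhub_config.py: minimal multi-user sketch
# (usernames and paths are placeholders)
c = get_config()  # provided by JupyterHub at startup

# Where the Hub listens inside the private cloud
c.JupyterHub.bind_url = "http://0.0.0.0:8000"

# Which system users may log in, and who administers the Hub
c.Authenticator.allowed_users = {"alice", "bob"}
c.Authenticator.admin_users = {"admin"}

# Start each user's server in their own notebook directory (one per user/topic)
c.Spawner.notebook_dir = "~/notebooks"
```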
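For the 3-way match itself, this is roughly the Spark SQL direction I am considering (a sketch; the CSV paths are placeholders, and I am assuming the EBELN/EBELP purchase-order keys and the MENGE quantity columns survive the export):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("three-way-match").getOrCreate()

# Load the three table exports (paths are placeholders)
ekpo = spark.read.csv("ekpo.csv", header=True, inferSchema=True)
mseg = spark.read.csv("mseg.csv", header=True, inferSchema=True)
rseg = spark.read.csv("rseg.csv", header=True, inferSchema=True)

ekpo.createOrReplaceTempView("ekpo")
mseg.createOrReplaceTempView("mseg")
rseg.createOrReplaceTempView("rseg")

# Join purchase-order items, goods receipts, and invoice items on the PO keys
# (assumed: EBELN/EBELP identify the PO item in all three tables)
matches = spark.sql("""
    SELECT e.EBELN, e.EBELP,
           e.MENGE AS ordered_qty,
           m.MENGE AS received_qty,
           r.MENGE AS invoiced_qty
    FROM ekpo e
    JOIN mseg m ON m.EBELN = e.EBELN AND m.EBELP = e.EBELP
    JOIN rseg r ON r.EBELN = e.EBELN AND r.EBELP = e.EBELP
""")
matches.show(10)
```

Mismatches could then be filtered in SQL, and smaller result sets pulled into pandas with matches.toPandas() for matplotlib.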
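And for the export step, the nbconvert Python API seems to cover both PDF and reveal.js (a sketch; the notebook name is a placeholder, and PDF export needs a LaTeX installation):

```python
from nbconvert import PDFExporter, SlidesExporter

# Notebook to PDF (nbconvert calls out to LaTeX under the hood)
pdf_body, _ = PDFExporter().from_filename("analysis.ipynb")
with open("analysis.pdf", "wb") as f:
    f.write(pdf_body)

# The same notebook as reveal.js slides
slides_body, _ = SlidesExporter().from_filename("analysis.ipynb")
with open("analysis.slides.html", "w", encoding="utf-8") as f:
    f.write(slides_body)
```

The same thing is available on the command line via jupyter nbconvert --to pdf / --to slides, and nbviewer can serve the read-only versions.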
Can you please give me some hints or share your experiences on how many nodes I should use for testing, and which Linux distribution is a good starting point? I am sure there are many more questions; my main problem is finding information on how to evaluate the possible options.
Thanks in advance!