Not sure what you mean by "parallel computing". We're largely implemented in C/C++, but we also support CUDA as well as MKL, and POWER chips as well.
Given how broad your question is, I can only assume you aren't really looking for a "deep" answer, but I can tell you that we have the buzzwords you'd expect: OpenMP, BLAS/LAPACK, sparse, ..
So editing my answer a bit: NumPy operations that are "vectorized" are just for loops in C. Python inherently has slow loops and is largely a slow language.
Another edit: it would be physically impossible for us to support GPUs if there wasn't a ton of C code buried in there. We also couldn't do BLAS without JNI. Nd4j is definitely NOT a pure Java library.
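To make the "vectorized ops are just for loops in C" point concrete, here's a minimal sketch (not actual NumPy or libnd4j source) of what an element-wise add looks like once you drop below the high-level API: a plain loop over contiguous memory, compiled to native code, which is why it beats an interpreted Python loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: the entire "vectorized" element-wise add is just
// one native loop over the buffers. The speedup over Python comes from
// the loop being compiled, not from any magic in the operation itself.
std::vector<double> elementwise_add(const std::vector<double>& a,
                                    const std::vector<double>& b) {
    std::vector<double> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];  // one fused add per element, no interpreter overhead
    }
    return out;
}
```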
We run all the real logic in: https://github.com/deeplearning4j/libnd4j
So yes, in that sense we do have "C++-based for loops" in there. Those for loops are multithreaded, or "parallelized", using CUDA and OpenMP/MKL.
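As a rough illustration of that last point (a hedged sketch, not libnd4j's actual code): this is the general shape of an OpenMP-parallelized C++ for loop. Compiled with `-fopenmp`, the iterations are split across threads; without the flag, the pragma is ignored and the loop runs serially, producing the same result either way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of an OpenMP-parallelized elementwise kernel.
// Each thread handles a chunk of the index range; iterations are
// independent, so no synchronization is needed inside the loop.
void scale_inplace(std::vector<float>& data, float factor) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

The same loop body is what a CUDA kernel would compute per thread; only the dispatch mechanism (thread blocks on the GPU vs. OpenMP threads on the CPU) changes.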