Using Apache Spark with an HDInsight cluster from a web application


I am currently trying to create a big data processing web application using Apache Spark, which I have successfully installed on my HDInsight cluster. In the past I have written MapReduce programs in C# that connect to my cluster, and I have been able to run applications that connect simply by supplying my account name, storage key, and so on. From what I have found online, it seems the only way to submit a Spark job is to connect to the cluster over RDP, but I see no easy way to incorporate that into a web app (I am new to dealing with clusters/big data). Is it possible to connect to my cluster in a manner similar to the way I do when I run MapReduce jobs?
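For reference, this is roughly the submission pattern I mean. This is a minimal sketch assuming the Microsoft.Hadoop.Client NuGet package; the cluster URI, credentials, jar, and paths are all placeholders, not my real values:

    using System;
    using Microsoft.Hadoop.Client;

    class MapReduceSubmit
    {
        static void Main()
        {
            // Placeholder cluster URI and credentials.
            var credentials = new BasicAuthCredential
            {
                Server = new Uri("https://CLUSTERNAME.azurehdinsight.net"),
                UserName = "admin",
                Password = "PASSWORD"
            };

            var jobClient = JobSubmissionClientFactory.Connect(credentials);

            // Describe the MapReduce job: a jar in the cluster's default
            // storage account, plus its driver class and arguments.
            var mrJob = new MapReduceJobCreateParameters
            {
                JarFile = "wasb:///example/jars/hadoop-examples.jar",
                ClassName = "wordcount"
            };
            mrJob.Arguments.Add("wasb:///example/data/davinci.txt");
            mrJob.Arguments.Add("wasb:///example/output");

            var jobResults = jobClient.CreateMapReduceJob(mrJob);
            Console.WriteLine("Submitted job " + jobResults.JobId);
        }
    }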

I was also thinking it might be possible to write this logic inside a MapReduce job, where I would already be running in the context of my cluster. Would that be possible in any way?


There is 1 answer

Andrew Moll

If you install Spark via Script Actions, the Spark-specific ports are not opened outside of the cluster. You can still use Spark through virtual networks, though: if you set up an Azure VNet between your endpoint and the cluster, you can use the native Spark protocols for remote job submission and querying. Submitting jobs through Oozie is also possible; a sketch of that approach follows.
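A minimal sketch of the Oozie route, assuming a workflow.xml containing the Spark action is already uploaded to the cluster's default storage and that the Oozie REST endpoint is reachable through the cluster gateway (that reachability is an assumption; verify it for your cluster type). All names and credentials below are placeholders:

    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Text;
    using System.Threading.Tasks;

    class OozieSubmit
    {
        static async Task Main()
        {
            // Placeholder cluster name and credentials.
            var clusterUrl = "https://CLUSTERNAME.azurehdinsight.net";
            var token = Convert.ToBase64String(Encoding.UTF8.GetBytes("admin:PASSWORD"));

            // Oozie job configuration: points at a workflow application
            // directory in the default storage account; the workflow.xml
            // there defines the actual Spark action.
            var config =
                "<configuration>" +
                "<property><name>user.name</name><value>admin</value></property>" +
                "<property><name>oozie.wf.application.path</name>" +
                "<value>wasb:///user/admin/spark-workflow</value></property>" +
                "</configuration>";

            using (var client = new HttpClient())
            {
                client.DefaultRequestHeaders.Authorization =
                    new AuthenticationHeaderValue("Basic", token);
                var content = new StringContent(config, Encoding.UTF8, "application/xml");

                // "?action=start" tells Oozie to run the workflow as soon
                // as it is submitted; the response body contains the job id.
                var response = await client.PostAsync(
                    clusterUrl + "/oozie/v2/jobs?action=start", content);
                Console.WriteLine(await response.Content.ReadAsStringAsync());
            }
        }
    }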

You could also investigate the newly announced preview Spark clusters on HDInsight and their support for C# job submissions.
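For example, those clusters accept remote job submissions over a Livy REST endpoint, which fits a web app much better than RDP. A minimal sketch, assuming the endpoint is at /livy/batches as the HDInsight Spark documentation describes; the cluster name, credentials, jar path, and class name are placeholders:

    using System;
    using System.Net.Http;
    using System.Net.Http.Headers;
    using System.Text;
    using System.Threading.Tasks;

    class LivySubmit
    {
        static async Task Main()
        {
            // Placeholder cluster name and credentials.
            var clusterUrl = "https://CLUSTERNAME.azurehdinsight.net";
            var token = Convert.ToBase64String(Encoding.UTF8.GetBytes("admin:PASSWORD"));

            using (var client = new HttpClient())
            {
                client.DefaultRequestHeaders.Authorization =
                    new AuthenticationHeaderValue("Basic", token);

                // Livy batch request: "file" points at an application jar
                // already uploaded to the cluster's default storage.
                var body = "{ \"file\": \"wasb:///example/jars/spark-app.jar\", " +
                           "\"className\": \"com.example.SparkApp\" }";
                var content = new StringContent(body, Encoding.UTF8, "application/json");

                var response = await client.PostAsync(clusterUrl + "/livy/batches", content);
                Console.WriteLine(await response.Content.ReadAsStringAsync());
            }
        }
    }

The response includes a batch id, which you can then poll (GET /livy/batches/{id}) from the web app to track the job's state.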