What is the most efficient solution for hundreds of download requests per minute for an HDFS folder?


In my company, we have a continuous learning process. Every 5-10 minutes we create a new model in HDFS. A model is a folder of several files:

  1. model, ~1 GB (binary file)
  2. model metadata, ~1 KB (text file)
  3. model features, ~1 KB (CSV file) ...

On the other hand, we have hundreds of model-serving instances that need to download the model to the local filesystem once every 5-10 minutes and serve from it. Currently, we are using WebHDFS from our service (the Java FileSystem client), but it probably puts a load on our Hadoop cluster, since it redirects requests to the concrete data nodes.
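For reference, a minimal sketch of what such a client-side download looks like with the Hadoop FileSystem API over webhdfs://; the NameNode host/port and the /models/latest path are illustrative assumptions, not our actual values:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class ModelDownloader {
    public static void main(String[] args) throws Exception {
        // Hypothetical NameNode host and WebHDFS port, for illustration only.
        URI webhdfs = new URI("webhdfs://namenode.example.com:9870");
        Configuration conf = new Configuration();

        try (FileSystem fs = FileSystem.get(webhdfs, conf)) {
            // Each read is redirected by the NameNode to a concrete DataNode,
            // so hundreds of instances doing this every few minutes adds load.
            Path modelDir = new Path("/models/latest");
            Path localDir = new Path("/var/local/models/latest");
            fs.copyToLocalFile(false, modelDir, localDir, true);
        }
    }
}
```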

We are considering using the HttpFS service. Does it have caching capability, so that the first request loads a folder into the service's memory and subsequent requests use the already-downloaded result?

What other technology/solution could be used for such a use case?


1 Answer

Answered by Julias:

We have found a nice solution.

It can be used in front of Hadoop to reduce the read load, or in front of Google Storage / S3 buckets to reduce cost.

We simply set up a couple of Nginx servers and configure them as caching proxies with a 2-minute file cache, as sketched below.
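A minimal nginx.conf sketch of that setup, assuming an HttpFS gateway at httpfs.example.com:14000; the listen port, cache path, and size limits are illustrative, not our actual values:

```nginx
# Goes inside the http {} context.
# Cache on local disk; entries are considered fresh for 2 minutes, so each
# new model is fetched from Hadoop only once per Nginx node.
proxy_cache_path /var/cache/nginx/models levels=1:2 keys_zone=models:10m
                 max_size=50g inactive=10m use_temp_path=off;

server {
    listen 8080;

    location /webhdfs/ {
        proxy_pass http://httpfs.example.com:14000;   # HttpFS gateway (assumed host/port)
        proxy_cache models;
        proxy_cache_valid 200 2m;                     # serve cached responses for 2 minutes
        proxy_cache_lock on;                          # only one upstream fetch per cache miss
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```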

That way, only the Nginx machines download the data from the Hadoop cluster.

And all the serving machines (possibly hundreds) pull the data from the Nginx servers, where it is already cached.
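For illustration, a serving instance then fetches over plain HTTP from the proxy instead of from Hadoop directly; the proxy hostname and the WebHDFS-style URL below are assumptions about the layout, not our actual values:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class ProxyModelFetcher {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Hypothetical proxy host and WebHDFS-style OPEN URL; adjust to your layout.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://nginx-proxy.example.com:8080/webhdfs/v1/models/latest/model.bin?op=OPEN"))
                .GET()
                .build();

        // The first instance to ask warms the Nginx cache; the rest are served from it.
        HttpResponse<Path> response = client.send(
                request, HttpResponse.BodyHandlers.ofFile(Path.of("/var/local/models/model.bin")));
        System.out.println("HTTP " + response.statusCode() + " -> " + response.body());
    }
}
```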