Collect logs from a Mesos cluster


My team is deploying a new cluster on Amazon EC2 instances. After a bit of research, we decided to go with Apache Mesos as the cluster manager and Spark for computation.

The first question we asked ourselves is what the best way would be to collect logs from all the machines, for each framework. So far, we have developed some custom bash/python scripts that collect logs from predefined locations, zip them, and send the compressed file to S3. This rotation is triggered by a cron job that runs every hour.
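For reference, here is a minimal Python sketch of that kind of hourly rotation script. The log directories, bucket name, and key layout are placeholders, and it assumes boto3 is installed:

    # Minimal sketch of the hourly rotation described above; the log
    # directories and the bucket name are placeholders.
    import gzip
    import shutil
    import time
    from pathlib import Path

    import boto3

    LOG_DIRS = [Path("/var/log/mesos"), Path("/var/log/spark")]  # assumed locations
    BUCKET = "my-cluster-logs"                                   # placeholder bucket

    def rotate_logs():
        s3 = boto3.client("s3")
        stamp = time.strftime("%Y%m%d-%H%M")
        for log_dir in LOG_DIRS:
            for log_file in log_dir.glob("*.log"):
                archive = Path("/tmp") / (log_file.stem + "-" + stamp + ".gz")
                # Compress the log before shipping it off the machine
                with log_file.open("rb") as src, gzip.open(archive, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                s3.upload_file(str(archive), BUCKET, log_dir.name + "/" + archive.name)
                archive.unlink()  # clean up the temporary archive

    if __name__ == "__main__":
        rotate_logs()  # wired to an hourly cron job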

I have been searching for the "best" (or standard) way to do this. I found Apache Flume, which is a data collector that also handles logs, but I don't understand how it could be integrated into a Mesos cluster to collect logs (and work with Spark).

I found this similar question, but the solutions there are either not open source or no longer supported.

Is there a better way to rotate logs, or a standard approach I'm missing?

Thank you very much

1 Answer

Phillip Mann

There is no perfect answer to this. If you are using Spark and are interested in using Flume, you will have to write a custom Flume -> Spark interface, as one doesn't exist as far as I know. However, what you can do is this:

  1. Use Flume to ingest log data in real time.
  2. Have Flume pre-process the log data with a custom interceptor.
  3. Have Flume write to Kafka once pre-processing is done (a sample agent configuration follows this list).
  4. Have Spark Streaming read off the Kafka queue to process the logs and run your computations (see the consumer sketch at the end of this answer).
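To make steps 1-3 concrete, here is a minimal sketch of a single Flume agent configuration. The log path, Kafka topic, broker addresses, and the LogCleanupInterceptor class are all assumptions; the sink property names shown are the Flume 1.7+ style (older releases used brokerList/topic instead):

    # Hypothetical agent: tail a log file, pre-process it with a custom
    # interceptor, and publish the events to Kafka.
    agent1.sources  = logsrc
    agent1.channels = memch
    agent1.sinks    = kafkasink

    # Step 1: ingest by tailing a log file (exec source; path is an assumption)
    agent1.sources.logsrc.type     = exec
    agent1.sources.logsrc.command  = tail -F /var/log/mesos/mesos-slave.log
    agent1.sources.logsrc.channels = memch

    # Step 2: custom pre-processing interceptor (hypothetical class)
    agent1.sources.logsrc.interceptors            = clean
    agent1.sources.logsrc.interceptors.clean.type = com.example.flume.LogCleanupInterceptor$Builder

    # Buffer events in memory between source and sink
    agent1.channels.memch.type     = memory
    agent1.channels.memch.capacity = 10000

    # Step 3: publish the cleaned events to Kafka
    agent1.sinks.kafkasink.type                    = org.apache.flume.sink.kafka.KafkaSink
    agent1.sinks.kafkasink.channel                 = memch
    agent1.sinks.kafkasink.kafka.topic             = cluster-logs
    agent1.sinks.kafkasink.kafka.bootstrap.servers = kafka1:9092,kafka2:9092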

Spark Streaming is reportedly not yet production grade, but this is one potential solution.
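For step 4, here is a minimal PySpark Streaming consumer sketch. It assumes the spark-streaming-kafka (Kafka 0.8 receiver) package is on the classpath; the ZooKeeper quorum, consumer group, and topic name are placeholders, and the ERROR count is just a stand-in for your real computation:

    # Minimal sketch of step 4: consume the "cluster-logs" Kafka topic from
    # Spark Streaming. All connection details below are placeholders.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="LogProcessor")     # submit with --master mesos://...
    ssc = StreamingContext(sc, batchDuration=10)  # 10-second micro-batches

    stream = KafkaUtils.createStream(
        ssc,
        zkQuorum="zk1:2181,zk2:2181",
        groupId="log-processors",
        topics={"cluster-logs": 1},
    )

    # Records arrive as (key, message) pairs; count ERROR lines per batch
    errors = stream.map(lambda kv: kv[1]).filter(lambda line: "ERROR" in line)
    errors.count().pprint()

    ssc.start()
    ssc.awaitTermination()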