AWS EMR cluster fails because disk is full


I am running some MapReduce jobs on an AWS EMR cluster with ~10 nodes (emr 4.7.11, m3.xlarge).

While the job is running, the worker nodes start to die one by one after ~4 hours. In the logs I found the following error:

"1/3 local-dirs are bad: /mnt/yarn; 1/1 log-dirs are bad: /var/log/hadoop-yarn/containers"

The disks on the worker nodes were at 96% usage when the nodes failed, so I assume the disks filled up to 100% and no more files could be written.

So I tried attaching a 500 GB EBS volume to each instance, but Hadoop only uses /mnt and does not use the additional volume (/mnt2).

How do I configure the AWS EMR cluster to use /mnt2? I tried to use a configuration file, but the cluster now fails on startup with the error "On the master instance (i-id), bootstrap action 6 returned a non-zero return code". Unfortunately, there is no bootstrap action 6 log in the S3 bucket.

The config file:

[
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.tmp.dir": "/mnt2/var/lib/hadoop/tmp"
    }
  },
  {
    "Classification": "mapred-site",
    "Properties": {
      "mapred.local.dir": "/mnt2/var/lib/hadoop/mapred"
    }
  }
]

Does anyone have a hint why the cluster fails on startup? Or is there another way to increase the initial EBS volume of the m3.xlarge instances?

https://forums.aws.amazon.com/thread.jspa?threadID=225588 looks like the same issue, but there is no solution there.


1 Answer

Answered by jc mannem:

If a disk (like /mnt/) goes beyond 90% utilization, the core/task node will be marked unhealthy and unusable by YARN. See yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage in http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
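
That threshold can be tuned with the same configurations API the question already uses. A minimal sketch (the 99.0 value is only an illustration, and raising the limit postpones the problem rather than fixing the full disk):

[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage": "99.0"
    }
  }
]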

Now, if you attach EBS volumes with the EMR API (while you provision your cluster), EMR does use those volumes automatically for certain properties. For example, mapred.local.dir will use all mounts. However, some properties (like hadoop.tmp.dir and yarn.nodemanager.log-dirs) may not use all mounts. For such properties, you will need to add comma-separated directory paths as values and set them using the configurations API or by manually editing the necessary files; see the sketches below.
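
A sketch of the provisioning route (release label, instance counts, and volume size are placeholders, not a tested command):

aws emr create-cluster \
  --release-label emr-4.7.1 \
  --applications Name=Hadoop \
  --instance-groups file://instance-groups.json

# instance-groups.json -- one 500 GB gp2 EBS volume per core node (sizes/counts are examples)
[
  {
    "InstanceGroupType": "MASTER",
    "InstanceType": "m3.xlarge",
    "InstanceCount": 1
  },
  {
    "InstanceGroupType": "CORE",
    "InstanceType": "m3.xlarge",
    "InstanceCount": 9,
    "EbsConfiguration": {
      "EbsBlockDeviceConfigs": [
        {
          "VolumeSpecification": { "VolumeType": "gp2", "SizeInGB": 500 },
          "VolumesPerInstance": 1
        }
      ]
    }
  }
]

EMR formats and mounts such volumes alongside the instance store (which is where the /mnt2 in the question comes from), and the Hadoop properties can then be pointed at the extra mount, for example: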

  <property>
    <name>mapred.local.dir</name>
    <value>/mnt/mapred,/mnt1/mapred</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>/mnt/var/lib/hadoop/tmp</value>
  </property>