Apache Storm 2.1.0 memory related configurations


We are in the process of migrating to 2.1.0 from 1.1.x.

In our current setup we have the following memory configurations in storm.yaml:

nimbus.childopts: -Xmx2048m
supervisor.childopts: -Xmx2048m
worker.childopts: -Xmx16384m

I see many other memory-related configs in https://github.com/apache/storm/blob/master/conf/defaults.yaml, and have the following questions about them.

  1. What is the difference between worker.childopts and topology.worker.childopts? If we are setting worker.childopts in storm.yaml, do we still have to override topology.worker.childopts?
  2. If we are setting worker.childopts in storm.yaml, do we still have to override worker.heap.memory.mb? Is there a relationship between these two configs?
  3. Should topology.component.resources.onheap.memory.mb be smaller than the -Xmx value in worker.childopts? How should we decide the value of topology.component.resources.onheap.memory.mb?

I would appreciate it if someone could explain these points.

Answered by Tyreal

I have recently fiddled with some of these configs myself, so I am sharing my insights here:

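  1. worker.childopts vs topology.worker.childopts - the first parameter sets the JVM options for all workers on the cluster. The second sets topology-specific JVM options for an individual topology, e.g. by using conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "someJvmArgsHere"); according to the Config Javadoc, these are used in addition to worker.childopts rather than replacing them. If the options in storm.yaml suit all of your topologies, you do not need to set topology.worker.childopts at all.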

  2. According to defaults.yaml in the Storm repository, the default value for worker.childopts is "-Xmx%HEAP-MEM%m -XX:+PrintGCDetails -Xloggc:artifacts/gc.log -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1M -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=artifacts/heapdump". Pay attention to the first argument: it contains the placeholder %HEAP-MEM%, which is replaced with whatever you configure for worker.heap.memory.mb. You can override worker.heap.memory.mb from inside a topology's configuration in Java, so I guess it was built this way to make it quick to adjust the Java heap for individual topologies. One thing I noticed is that, when overriding, Storm only seems to use the override value if at least one spout or bolt is configured with .setMemoryLoad(int heapSize).

  3. This depends heavily on the individual topology's needs, but in general it is most likely a good idea to keep topology.component.resources.onheap.memory.mb smaller than whatever you have configured for -Xmx in worker.childopts. Finding a good value for topology.component.resources.onheap.memory.mb takes testing and knowledge of the memory consumption of your topology's components. For instance, I have a topology whose spout receives tuples from Redis and emits them. If the bolts are busy, tuples may pile up in the spout, so I configure it with some memory headroom. However, I normally do not modify topology.component.resources.onheap.memory.mb; instead I call setMemoryLoad(int heapSize) on each component of the topology, as this allows different values for individual components. The Storm docs cover this and related topics.
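
To make the %HEAP-MEM% behaviour from point 2 more concrete, here is a minimal Java sketch of that kind of placeholder substitution. It is only an illustration and not Storm's actual code: the class HeapMemSketch, the method substituteHeapMem and the value 2048 are made up for this example. It assumes the placeholder is the only link between worker.heap.memory.mb and worker.childopts, which would mean that a hard-coded -Xmx (as in the question's storm.yaml) is left unchanged by the heap setting.

```java
// Illustrative only: mimics the %HEAP-MEM% placeholder substitution described
// in point 2. This is NOT Storm's own implementation; names are hypothetical.
final class HeapMemSketch {
    // Placeholder that appears in the default worker.childopts value.
    static final String HEAP_MEM_PLACEHOLDER = "%HEAP-MEM%";

    // Returns childopts with every %HEAP-MEM% replaced by the heap size in MB.
    static String substituteHeapMem(String childopts, int heapMemoryMb) {
        return childopts.replace(HEAP_MEM_PLACEHOLDER, Integer.toString(heapMemoryMb));
    }

    public static void main(String[] args) {
        // Hypothetical value standing in for worker.heap.memory.mb.
        int workerHeapMemoryMb = 2048;

        // Shortened form of the default: the placeholder picks up the heap setting.
        String template = "-Xmx%HEAP-MEM%m -XX:+HeapDumpOnOutOfMemoryError";
        System.out.println(substituteHeapMem(template, workerHeapMemoryMb));

        // A fixed -Xmx has no placeholder, so the heap setting does not change it.
        System.out.println(substituteHeapMem("-Xmx16384m", workerHeapMemoryMb));
    }
}
```

Under that assumption, the first call yields a string starting with -Xmx2048m, while the second returns -Xmx16384m untouched. So if you want worker.heap.memory.mb (or a per-topology override of it) to control the heap, worker.childopts would need to keep the %HEAP-MEM% placeholder instead of a fixed -Xmx.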