How do I debug a failing cloudera-scm-server process?


I am trying to install Cloudera Manager 5 on CentOS 6, but the cloudera-scm-server process keeps dying without a clear error in the logs.

service --status-all

cloudera-scm-agent (pid  7058) is running...
cloudera-scm-server dead but pid file exists
pg_ctl: server is running (PID: 13650)
/usr/bin/postgres "-D" "/var/lib/cloudera-scm-server-db/data"

cat /var/log/cloudera-scm-server/cloudera-scm-server.out

JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
Killed (core dumped)

cat /var/log/cloudera-scm-server/cloudera-scm-server.log

...
2015-06-15 13:54:23,642 INFO main:org.springframework.context.annotation.AnnotationConfigApplicationContext: Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@6424e9d8: startup date [Mon Jun 15 13:54:23 UTC 2015]; root of context hierarchy
2015-06-15 13:54:23,682 INFO main:org.springframework.beans.factory.support.DefaultListableBeanFactory: Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@3738baec: defining beans [org.springframework.context.annotation.internalConfigurationAnnotationProcessor,org.springframework.context.annotation.internalAutowiredAnnotationProcessor,org.springframework.context.annotation.internalRequiredAnnotationProcessor,org.springframework.context.annotation.internalCommonAnnotationProcessor,defaultValidatorConfiguration,messageInterpolator,validServiceDependencyValidator,uniqueServiceTypeValidator,uniqueRoleTypeValidator,existingServiceTypeValidator,existingRoleTypeValidator,expressionValidator,autoConfigSharesValidValidator,sdlParser,mdlParser,parcelParser,alternativesParser,permissionsParser,manifestParser,stringInterpolator,serviceDescriptorValidatorWithoutDependencyCheck,serviceDescriptorValidatorWithDependencyCheck,referenceValidator,serviceMonitoringDefinitionsDescriptorValidator,descriptorVisitor,parcelDescriptorValidator,alternativesDescriptorValidator,permissionsDescriptorValidator,manifestDescriptorValidator,springConstraintValidatorFactory,validatorFactoryBean,metricNameFormatValidator,nameForCrossEntityAggregateFormatValidator,builtInServiceTypes,builtInRoleTypes,builtInNamesForCrossEntityAggregateMetrics,uniqueFieldValidator]; root of factory hierarchy
2015-06-15 13:54:48,589 INFO main:com.cloudera.csd.components.MdlRegistry: Loaded /mdls/cdh5/oozie.mdl
2015-06-15 13:54:48,627 INFO main:com.cloudera.cmf.rules.RulesEngine: Loading rules knowledge base

The end of the log is not 100% consistent, but in general this is the point after which the server regularly fails. An OutOfMemoryError would kill the application like this, but in that case I would expect to find an indication of the error in the logs. The heap should also get dumped, but I cannot find a heap dump: there is no *.hprof file anywhere on the machine. The cloudera-scm-server.out log mentions a core dump, but I cannot find that either. Where should I look for it?

The server DB is the embedded one and is running properly. The only suspicious error message I see in the logs says that the relation 'cm_version' does not exist.

There is 1 answer

kutschkem (best answer)

The problem was memory-related: it was not the heap space that ran out, but the actual physical memory. My VM had a default of 512 MB of memory, while the JVM was configured with a 2 GB heap. Once physical memory filled up, the OS silently killed the process, hence no useful log entries. The solution was to increase the memory of the VM.
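
For anyone hitting the same symptom, here is a rough sketch of how one might check for this situation. It compares a heap size against the machine's physical memory and then searches the kernel log for traces of the OOM killer. The helper name heap_fits is made up for illustration, the 2048 MB value is simply the 2 GB heap mentioned above, and the grep patterns assume the usual kernel wording, which may differ between kernel versions. Reading dmesg may also require root on some systems.

```shell
#!/bin/sh
# heap_fits is a hypothetical helper, not part of Cloudera Manager.
# Usage: heap_fits HEAP_MB RAM_MB
# Note: a JVM needs more than its heap (permgen, thread stacks, native
# memory), so a heap below total RAM is necessary but not sufficient.
heap_fits() {
    if [ "$1" -lt "$2" ]; then
        echo "ok: $1 MB heap fits in $2 MB RAM"
    else
        echo "too large: $1 MB heap does not fit in $2 MB RAM"
    fi
}

# Physical memory in MB; MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    # 2048 MB is the assumed heap size (the 2 GB from the answer above).
    heap_fits 2048 "$total_mb"
fi

# Search the kernel ring buffer for signs that the OOM killer was active.
dmesg 2>/dev/null | grep -i -E 'out of memory|killed process' \
    || echo "no OOM killer messages found in dmesg"
```

If the ring buffer has already rotated, the same grep can be run against the system log instead (on CentOS 6 this is typically /var/log/messages, assuming the default syslog setup).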