opscenter 6 failed query to node - take-snapshot timeout


I upgraded from OpsCenter 5.2.0 to 6.0.7 yesterday (on DSE 4.8.10), following the instructions here. After the upgrade, I noticed that all the scheduled backup jobs were missing. The Activity tab for the Backup Service took a very long time to load and returned timeout errors. To work around that, I added a timeout setting to opscenterd.conf:

[ui]
 default_api_timeout=400

Then I added a simple scheduled job: Backup On Server, daily, 2 column families. I saved the job, then ran it with "Run Now". The backup failed with the following error:

Failed query to http://10.0.30.161:61621/ops/take-snapshot?req-id=78c53af6-bc02-49f2-b9a2-89452ec91cc9 : The http request to the agent timed out. This most likely indicates that you need to restart the agent on this node.

All agents are shown as Up in the OpsCenter UI. Did I miss something during the upgrade? Or is there a known issue with 6.0.7?

EDIT: I found these errors in the agent.log

INFO [qtp203625144-16505] 2017-01-05 14:15:43,183 HTTP: :put /agent-conf {} - 200
INFO [qtp203625144-16506] 2017-01-05 14:15:43,184 HTTP: :get /connection-status {} - 200
WARN [qtp203625144-16505] 2017-01-05 14:16:16,175 Unrecognized config key: :rollup_subscriptions
WARN [qtp203625144-16505] 2017-01-05 14:16:16,176 Unrecognized config key: :destinations
WARN [qtp203625144-16505] 2017-01-05 14:16:16,176 Unrecognized config key: :f8c8f581a1a94776a052c25149156860
WARN [qtp203625144-16505] 2017-01-05 14:16:16,176 Unrecognized config key: :6ed69a25565942e9b7d65f17f754f7db
WARN [qtp203625144-16505] 2017-01-05 14:16:16,176 Unrecognized config key: :7109d86e3c68442a98cfc6ecc6da5b26
WARN [qtp203625144-16505] 2017-01-05 14:16:16,176 Unrecognized config key: :64707f80a8d242c392ba5331e1d5f545
WARN [qtp203625144-16505] 2017-01-05 14:16:16,177 Unrecognized config key: :02238b5e495c4e47af5214ce70640b45
INFO [qtp203625144-16505] 2017-01-05 14:16:16,177 Got new config [note values in address.yaml override those from OpsCenter]: {:rollup_subscriptions [], :destinations ["02238b5e495c4e47af5214ce70640b45" "6ed69a25565942e9b7d65f17f754f7db" "64707f80a8d242c392ba5331e1d5f545" "7109d86e3c68442a98cfc6ecc6da5b26" "f8c8f581a1a94776a052c25149156860"], :metrics_ignored_column_families "", :f8c8f581a1a94776a052c25149156860 {:delete_this "False", :throttle_bytes_per_second "0", :path "cassandra-prod-bkup", :provider "s3", :access_key "*REDACTED*", :access_secret "*REDACTED*"}, :metrics_enabled true, :6ed69a25565942e9b7d65f17f754f7db {:throttle_bytes_per_second "0", :delete_this "False", :path "cass-prod-opsc6", :server_side_encryption "False", :provider "s3", :access_key "*REDACTED*", :access_secret "*REDACTED*"}, :cassandra_log_location "/var/log/cassandra", :hosts ["10.0.30.161"], :jmx_user "*REDACTED*", :kerberos_service "", :7109d86e3c68442a98cfc6ecc6da5b26 {:delete_this "False", :throttle_bytes_per_second "0", :path "cassandra-prod-bkup", :provider "s3", :access_key "*REDACTED*", :access_secret "*REDACTED*"}, :config_md5 "840a5bf32f266bf8c74be84adaad0520", :monitored_cassandra_pass "*REDACTED*", :monitored_thrift_port 9160, :metrics_ignored_solr_cores "", :rollups60_ttl 604800, :max_pending_repairs 5, :api_port "61621", :cassandra_port 9042, :jmx_pass "*REDACTED*", :64707f80a8d242c392ba5331e1d5f545 {:delete_this "False", :throttle_bytes_per_second "0", :path "cassandra-prod-bkup", :provider "s3", :access_key "*REDACTED*", :access_secret "*REDACTED*"}, :kerberos_keytab_location "", :rollups300_ttl 2419200, :cassandra_pass "*REDACTED*", :metrics_ignored_keyspaces "system, system_traces, system_auth, system_distributed, dse_auth, OpsCenter", :use_ssl false, :cassandra_user "*REDACTED*", :monitored_cassandra_user "*REDACTED*", :thrift_port 9160, :storage_keyspace "OpsCenter", :monitored_cassandra_port 9042, :kerberos_client_principal "", :02238b5e495c4e47af5214ce70640b45 {:delete_this "False", :throttle_bytes_per_second "0", :path "cassandra-prod-bkup", :provider "s3", :access_key "*REDACTED*", :access_secret "*REDACTED*"}, :cassandra_install_location "", :restore_req_update_period 1, :jmx_port 7199, :ec2_metadata_api_host "169.254.169.254", :rollups86400_ttl 0, :jmx_operations_pool_size 4, :backup_staging_dir "", :rollups7200_ttl 31536000}
INFO [qtp203625144-16505] 2017-01-05 14:16:16,179 Configuration change for component class opsagent.backups.destinations.DestinationService: before: {:dest-map {"db4533f0dad548499089bbcfcc03e2fa" {:throttle_bytes_per_second "0", :delete_this "False", :path "cass-prod-opsc6", :server_side_encryption "False", :provider "s3", :access_key "*REDACTED*", :access_secret "*REDACTED*"}}, :destinations [nil nil "db4533f0dad548499089bbcfcc03e2fa" "64707f80a8d242c392ba5331e1d5f545" "7109d86e3c68442a98cfc6ecc6da5b26" "f8c8f581a1a94776a052c25149156860"]}, after: {:destinations [nil nil "64707f80a8d242c392ba5331e1d5f545" "7109d86e3c68442a98cfc6ecc6da5b26" "f8c8f581a1a94776a052c25149156860"]}
INFO [async-dispatch-13] 2017-01-05 14:16:16,180 Starting system.
INFO [async-dispatch-13] 2017-01-05 14:16:16,181 Configuration change for component class opsagent.backups.destinations.DestinationService: before: {:dest-map {"db4533f0dad548499089bbcfcc03e2fa" {:throttle_bytes_per_second "0", :delete_this "False", :path "cass-prod-opsc6", :server_side_encryption "False", :provider "s3", :access_key "*REDACTED*", :access_secret "*REDACTED*"}}, :destinations [nil nil "db4533f0dad548499089bbcfcc03e2fa" "64707f80a8d242c392ba5331e1d5f545" "7109d86e3c68442a98cfc6ecc6da5b26" "f8c8f581a1a94776a052c25149156860"]}, after: {:destinations [nil nil "64707f80a8d242c392ba5331e1d5f545" "7109d86e3c68442a98cfc6ecc6da5b26" "f8c8f581a1a94776a052c25149156860"]}
INFO [async-dispatch-13] 2017-01-05 14:16:16,181 The following components have had a config change and will be rebuilt and restarted: (:destination-service)
INFO [async-dispatch-13] 2017-01-05 14:16:16,181 The component restart for (:destination-service) when accounting for dependencies requires these components to be restarted #{:destination-service}
INFO [async-dispatch-13] 2017-01-05 14:16:16,182 Starting DynamicEnvironmentComponent
INFO [async-dispatch-13] 2017-01-05 14:16:16,207 Finished starting system.
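
Since the UI reports every agent as Up while the take-snapshot request times out, it may help to probe the agent's HTTP port directly from the OpsCenter host. Below is a minimal Python sketch of such a check. It rests on assumptions: that the /connection-status path seen in the agent log is served on the api_port (61621) from the same log, and that a plain unauthenticated GET is accepted. The helper names agent_url and probe_agent and the 5-second timeout are made up for illustration and are not part of OpsCenter.

```python
import urllib.error
import urllib.request


def agent_url(host, port=61621, path="/connection-status"):
    """Build the agent URL. Port and path are taken from the agent log
    above; pairing them this way is an assumption, not documented API."""
    return f"http://{host}:{port}{path}"


def probe_agent(url, timeout=5.0):
    """Issue one GET and return (ok, detail) instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The agent answered, but with an error status code.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection refused, and socket timeouts.
        return False, f"unreachable: {exc}"


if __name__ == "__main__":
    # 10.0.30.161 is the node address from the failed query above.
    ok, detail = probe_agent(agent_url("10.0.30.161"))
    print("agent responded" if ok else "agent did not respond", "-", detail)
```

If this returns quickly with a 200 while take-snapshot still times out, the agent process is reachable and the delay is more likely inside the snapshot request itself than in basic connectivity.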
