DC/OS Marathon constraints hostname list


When I try to use

"constraints": [["hostname", "CLUSTER", "192.168.18.6(1|2)"]]

or

"constraints": [["hostname", "CLUSTER", "DCOS-S-0(1|2)"]] 

the Marathon app "/zaslepki/4maxpl" stays in the Waiting state the whole time.
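One way to see why Marathon keeps an app in Waiting is to inspect its launch queue; the /v2/queue endpoint lists apps that are still waiting for matching offers. A minimal sketch, assuming Marathon is reachable directly on a master at port 8080 (the host, port, and any authentication depend on your cluster):

# Sketch: show apps queued in Marathon while they wait for matching offers
curl -s http://192.168.18.51:8080/v2/queue | python -m json.tool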

So I tried to use an attribute instead. I executed:

[root@DCOS-S-00 etc]# systemctl stop dcos-mesos-slave-public.service
[root@DCOS-S-00 etc]# mesos-slave --work_dir=/var/lib/mesos/slave --attributes=DC:DL01 --master=zk://192.168.18.51:2181,192.168.18.51:2181,192.168.18.53:2181/mesos
WARNING: Logging before InitGoogleLogging() is written to STDERR
I1229 13:16:19.800616 24537 main.cpp:243] Build: 2016-11-07 21:31:04 by 
I1229 13:16:19.800720 24537 main.cpp:244] Version: 1.0.1
I1229 13:16:19.800726 24537 main.cpp:251] Git SHA: d5746045ac740d5f28f238dc55ec95c89d2b7cd9
I1229 13:16:19.807195 24537 systemd.cpp:237] systemd version `219` detected
I1229 13:16:19.807232 24537 main.cpp:342] Inializing systemd state
I1229 13:16:19.820071 24537 systemd.cpp:325] Started systemd slice `mesos_executors.slice`
I1229 13:16:19.821051 24537 containerizer.cpp:196] Using isolation: posix/cpu,posix/mem,filesystem/posix,network/cni
I1229 13:16:19.825422 24537 linux_launcher.cpp:101] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I1229 13:16:19.826690 24537 main.cpp:434] Starting Mesos agent
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@log_env@730: Client environment:host.name=DCOS-S-00
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@log_env@738: Client environment:os.arch=3.10.0-514.2.2.el7.x86_64
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@log_env@739: Client environment:os.version=#1 SMP Tue Dec 6 23:06:41 UTC 2016
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@log_env@747: Client environment:user.name=root
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@log_env@767: Client environment:user.dir=/opt/mesosphere/etc
2016-12-29 13:16:19,827:24537(0x7f8ecae60700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=192.168.18.51:2181,192.168.18.51:2181,192.168.18.53:2181 sessionTimeout=10000 watcher=0x7f8ed221a030 sessionId=0 sessionPasswd=<null> context=0x7f8ebc001ee0 flags=0
I1229 13:16:19.828233 24537 slave.cpp:198] Agent started on 1)@192.168.18.60:5051
2016-12-29 13:16:19,828:24537(0x7f8ec8c49700):ZOO_INFO@check_events@1728: initiated connection to server [192.168.18.51:2181]
I1229 13:16:19.828263 24537 slave.cpp:199] Flags at startup: --appc_simple_discovery_uri_prefix="http://" --appc_store_dir="/tmp/mesos/store/appc" --attributes="DC:DL01" --authenticate_http_readonly="false" --authenticate_http_readwrite="false" --authenticatee="crammd5" --authentication_backoff_factor="1secs" --authorizer="local" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_kill_orphans="true" --docker_registry="https://registry-1.docker.io" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --docker_volume_checkpoint_dir="/var/run/mesos/isolators/docker/volume" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hostname_lookup="true" --http_authenticators="basic" --http_command_executor="false" --image_provisioner_backend="copy" --initialize_driver_logging="true" --ip_discovery_command="/opt/mesosphere/bin/detect_ip" --isolation="posix/cpu,posix/mem" --launcher_dir="/opt/mesosphere/packages/mesos--253f5cb0a96e2e3574293ddfecf5c63358527377/libexec/mesos" --logbufsecs="0" --logging_level="INFO" --master="zk://192.168.18.51:2181,192.168.18.51:2181,192.168.18.53:2181/mesos" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_enable_support="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/var/lib/mesos/slave"
I1229 13:16:19.829263 24537 slave.cpp:519] Agent resources: cpus(*):8; mem(*):6541; disk(*):36019; ports(*):[31000-32000]
I1229 13:16:19.829306 24537 slave.cpp:527] Agent attributes: [ DC=DL01 ]
I1229 13:16:19.829319 24537 slave.cpp:532] Agent hostname: DCOS-S-00
2016-12-29 13:16:19,832:24537(0x7f8ec8c49700):ZOO_INFO@check_events@1775: session establishment complete on server [192.168.18.51:2181], sessionId=0x1593f6a1ef20fce, negotiated timeout=10000
I1229 13:16:19.832623 24548 state.cpp:57] Recovering state from '/var/lib/mesos/slave/meta'
I1229 13:16:19.832695 24547 group.cpp:349] Group process (group(1)@192.168.18.60:5051) connected to ZooKeeper
I1229 13:16:19.832723 24547 group.cpp:837] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I1229 13:16:19.832736 24547 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
I1229 13:16:19.834234 24547 detector.cpp:152] Detected a new leader: (id='70')
I1229 13:16:19.834319 24547 group.cpp:706] Trying to get '/mesos/json.info_0000000070' in ZooKeeper
I1229 13:16:19.835002 24547 zookeeper.cpp:259] A new leading master ([email protected]:5050) is detected
Failed to perform recovery: Incompatible agent info detected.
------------------------------------------------------------
Old agent info:
hostname: "192.168.18.60"
resources {
  name: "ports"
  type: RANGES
  ranges {
    range {
      begin: 1
      end: 21
    }
    range {
      begin: 23
      end: 5050
    }
    range {
      begin: 5052
      end: 32000
    }
  }
  role: "slave_public"
}
resources {
  name: "disk"
  type: SCALAR
  scalar {
    value: 37284
  }
  role: "slave_public"
}
resources {
  name: "cpus"
  type: SCALAR
  scalar {
    value: 8
  }
  role: "slave_public"
}
resources {
  name: "mem"
  type: SCALAR
  scalar {
    value: 6541
  }
  role: "slave_public"
}
attributes {
  name: "public_ip"
  type: TEXT
  text {
    value: "true"
  }
}
id {
  value: "8bc3d621-ed8a-4641-88c1-7a7163668263-S9"
}
checkpoint: true
port: 5051

------------------------------------------------------------
New agent info:
hostname: "DCOS-S-00"
resources {
  name: "cpus"
  type: SCALAR
  scalar {
    value: 8
  }
  role: "*"
}
resources {
  name: "mem"
  type: SCALAR
  scalar {
    value: 6541
  }
  role: "*"
}
resources {
  name: "disk"
  type: SCALAR
  scalar {
    value: 36019
  }
  role: "*"
}
resources {
  name: "ports"
  type: RANGES
  ranges {
    range {
      begin: 31000
      end: 32000
    }
  }
  role: "*"
}
attributes {
  name: "DC"
  type: TEXT
  text {
    value: "DL01"
  }
}
id {
  value: "8bc3d621-ed8a-4641-88c1-7a7163668263-S9"
}
checkpoint: true
port: 5051

------------------------------------------------------------
To remedy this do as follows:
Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
        This ensures agent doesn't recover old live executors.
Step 2: Restart the agent.
[root@DCOS-S-00 etc]# rm -f /var/lib/mesos/slave/meta/slaves/latest
[root@DCOS-S-00 etc]# systemctl start dcos-mesos-slave-public.service
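For reference, rather than starting mesos-slave by hand, a DC/OS agent can normally be given extra Mesos flags through an environment override file that the agent unit reads. A minimal sketch, assuming this DC/OS release reads /var/lib/dcos/mesos-slave-common and that /var/lib/mesos/slave is the agent work_dir (values taken from the session above and DC/OS defaults; verify against your version's docs):

echo 'MESOS_ATTRIBUTES=DC:DL01' >> /var/lib/dcos/mesos-slave-common
rm -f /var/lib/mesos/slave/meta/slaves/latest        # agent config changed, drop the old checkpoint
systemctl restart dcos-mesos-slave-public.service    # this node is a public agent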

and in the application's .json configuration file I used

"constraints": [["DC", "CLUSTER", "DL01"]]

The application status is still Waiting.
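It can also help to confirm that the leading master actually sees the DC attribute on the agent before debugging the constraint itself. A minimal sketch against the Mesos master's /slaves endpoint (the master address is taken from the log above; python -m json.tool is only used for pretty-printing):

curl -s http://192.168.18.51:5050/slaves | python -m json.tool | grep -E '"hostname"|"DC"'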

This is my .json file for the application "/zaslepki/4maxpl":

{
  "id": "/zaslepki/4maxpl",
  "cmd": null,
  "cpus": 0.5,
  "mem": 256,
  "disk": 0,
  "instances": 2,
  "constraints": [["hostname", "CLUSTER", "DCOS-S-0(3|4)"]],
  "acceptedResourceRoles": [
    "slave_public"
  ],
  "container": {
    "type": "DOCKER",
    "volumes": [],
    "docker": {
      "image": "arekmax/4maxpl",
      "network": "BRIDGE",
      "portMappings": [
        {
          "containerPort": 80,
          "hostPort": 0,
          "servicePort": 10015,
          "protocol": "tcp",
          "labels": {}
        }
      ],
      "privileged": false,
      "parameters": [],
      "forcePullImage": false
    }
  },
  "healthChecks": [
    {
      "path": "/",
      "protocol": "HTTP",
      "portIndex": 0,
      "gracePeriodSeconds": 300,
      "intervalSeconds": 30,
      "timeoutSeconds": 10,
      "maxConsecutiveFailures": 2,
      "ignoreHttp1xx": false
    }
  ],
  "labels": {
    "HAPROXY_GROUP": "external"
  },
  "portDefinitions": [
    {
      "port": 10015,
      "protocol": "tcp",
      "labels": {}
    }
  ]
}
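For completeness, a sketch of how such a definition is typically pushed from a file with the DC/OS CLI (assuming the CLI is installed and attached to this cluster; the file name is arbitrary):

dcos marathon app add 4maxpl.json                        # first deployment
dcos marathon app update /zaslepki/4maxpl < 4maxpl.json  # redeploy an edited definition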

What am I doing wrong? I found the same problem (link), but there it was fixed by using

constraints: [["DC", "CLUSTER", "DL01"]]


1 Answer

Answered by janisz:

You've got a clue in the log:

Invalid attribute key:value pair 'DL01'

Change your attribute to a key:value pair, e.g. DC:DL01, and it should work. You will probably need to clean the metadata directory, because you are changing the agent configuration.

The CLUSTER operator doesn't work with multiple values. You need to pass a regular expression with the LIKE operator, so it should look like this:

"constraints": [["hostname", "LIKE", "192.168.18.6(1|2)"]]