Unable to set up Vespa containers on multiple instances

I have two instances, and I need to deploy Vespa in a Docker container on each of them. One container will act as the config server, a container cluster node, and a content cluster node, while the other will act only as a container cluster node and a content cluster node.

The hosts.xml file for the application looks like this:

<hosts>
  <host name="vespa-master">
    <alias>admin0</alias>
  </host>

  <host name="vespa-searcher">
    <alias>searcher1</alias>
  </host>

</hosts>

The services.xml file for the application looks like this:

<services version="1.0">
    <admin version="2.0">
        <adminserver hostalias="admin0"/>
        <configservers>
            <configserver hostalias="admin0"/>
        </configservers>
    </admin>

    <container id="container" version="1.0">
        <document-api />
        <search/>
        <nodes>
            <node hostalias="admin0"/>
            <node hostalias="searcher1"/>
        </nodes>
    </container>

    <content id="content" version="1.0">
        <documents>
            <!--version 1 docs starts-->
            <document type="document_name" mode="index" />
            <!--version 1 docs ends-->
        </documents>

        <redundancy>2</redundancy>
        <engine>
            <proton>
                <searchable-copies>1</searchable-copies>
            </proton>
        </engine>

        <group name="top-group">
            <distribution partitions="*"/>
            <group name="group0" distribution-key="0">
                <node hostalias="admin0" distribution-key="0"/>
                <node hostalias="searcher1" distribution-key="1"/>
            </group>
        </group>
    </content>
</services>

I am using Docker Swarm to create an overlay network between the two instances. The command looks something like this:

docker network create --driver=overlay --subnet=<IP>/24 vespa_conn --attachable

The command I used to create the container on the first instance is:

docker run --detach --hostname vespa-master --network=vespa_conn <other arguments> --env VESPA_CONFIGSERVERS=vespa-master vespaengine/vespa

and the command to create a container on the second instance is:

docker run --detach --hostname vespa-searcher --network=vespa_conn <other arguments> --env VESPA_CONFIGSERVERS=vespa-master vespaengine/vespa

These commands are based on this page.

After creating and deploying my application, the node in the second container is not coming up:

vespa-get-cluster-state 

Cluster content:
content/distributor/0: up
content/distributor/1: down
content/storage/0: up
content/storage/1: down

The errors I found were:

content/distributor/0: Failed to fetch json: Connection error: socket write error
admin/cluster-controllers/0: Failed to fetch json: Connection error: socket write error
admin/slobrok.0: Failed to fetch json: Connection error: socket write error
admin/metrics/vespa-master: Failed to fetch json: Connection error: socket write error
hosts/vespa-master/sentinel: Failed to fetch json: Connection error: socket write error
hosts/vespa-master/logd: Failed to fetch json: Connection error: socket write error
[generation not up-to-date ignored]
container/container.1: Failed to fetch json: Connection error: socket write error
hosts/vespa-searcher/logd: Failed to fetch json: Connection error: socket write error
[generation not up-to-date ignored]

After a few attempts, I fixed the problem by adding 'override VESPA_CONFIGSERVERS vespa-master' to the /opt/vespa/conf/vespa/default-env.txt file in the second container and then restarting the services.

Is there a better way to do this, so that I don't have to update the default-env.txt file manually?

Also, when I added the 'configserver' or 'services' argument at the end of the docker run command, as specified on that page, I got this error:

[2020-10-15 11:36:13.782540] 1935/8285 (vespa-model-inspect.config.frt.frtconnection) warning: Connection to tcp/localhost:19090 failed or timed out
[2020-10-15 11:36:13.782631] 1935/8285 (vespa-model-inspect.config.frt.frtconnection) warning: FRT Connection tcp/localhost:19090 suspended until 2020-10-15 11:36:23 GMT
[2020-10-15 11:36:13.782647] 1935/8285 (vespa-model-inspect.config.frt.frtconfigagent) info: Error response or no response from config server (key: name=model,namespace=cloud.config,configId=admin/model) (errcode=104, validresponse:0), trying again in 6000 milliseconds

What is the reason for this error? Am I doing something wrong here?

Answer by Arnstein Ressem (best answer)

To get this working, you should avoid underscores in the network name, use the fully qualified name for the config server, and name the containers so that DNS works.
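
The naming advice above can be checked before any network or container is created. This is only a minimal sketch: is_dns_safe_name is a hypothetical helper, not part of Docker or Vespa, and it assumes a name is acceptable when it contains nothing but letters, digits, hyphens, and dots.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a Docker or Vespa tool).
# Succeeds only if the name is non-empty and made up solely of letters,
# digits, hyphens, and dots, so any underscore makes it fail.
is_dns_safe_name() {
    case "$1" in
        ""|*[!A-Za-z0-9.-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Check the names used in the question and in this answer.
for name in vespa_conn vespa-net vespa-master.vespa-net; do
    if is_dns_safe_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: contains characters that may break DNS lookups"
    fi
done
```

Running such a check against the network name from the question (vespa_conn) flags the underscore, while vespa-net passes.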

Create the network on a manager swarm host:

docker network create --driver=overlay --attachable vespa-net

Start a Vespa container running both the config server and the services (no argument to the entrypoint):

docker run --detach --name vespa-master --hostname vespa-master.vespa-net --network=vespa-net --env VESPA_CONFIGSERVERS=vespa-master.vespa-net vespaengine/vespa

Start a Vespa container running only the services (services argument to entrypoint):

docker run --detach --name vespa-searcher --hostname vespa-searcher.vespa-net --network=vespa-net --env VESPA_CONFIGSERVERS=vespa-master.vespa-net vespaengine/vespa services
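
Because the same fully qualified names have to appear in --hostname, in VESPA_CONFIGSERVERS, and later in hosts.xml, it can help to derive them from one place. The following is a rough sketch that only prints the two docker run commands shown above instead of executing them. The function names fqdn and vespa_run_cmd and the two variables are made up for illustration.

```shell
#!/bin/sh
# Assumed values, taken from the commands in this answer.
NETWORK="vespa-net"
CONFIG_NODE="vespa-master"

# Build the fully qualified name <container>.<network>.
fqdn() {
    printf '%s.%s\n' "$1" "$NETWORK"
}

# Print (do not run) the docker command for one node.
# $1 = container name
# $2 = optional entrypoint argument, e.g. "services"; omit it to run both
#      the config server and the services.
vespa_run_cmd() {
    printf 'docker run --detach --name %s --hostname %s --network=%s --env VESPA_CONFIGSERVERS=%s vespaengine/vespa%s\n' \
        "$1" "$(fqdn "$1")" "$NETWORK" "$(fqdn "$CONFIG_NODE")" "${2:+ $2}"
}

vespa_run_cmd vespa-master
vespa_run_cmd vespa-searcher services
```

Printing the commands first makes it easy to confirm that every hostname ends in the network name before anything is started.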

Then use the fully qualified names in the hosts.xml:

<hosts>
  <host name="vespa-master.vespa-net">
    <alias>admin0</alias>
  </host>

  <host name="vespa-searcher.vespa-net">
    <alias>searcher1</alias>
  </host>

</hosts>

After deploying your unmodified services.xml, I get the following state:

[root@vespa-master /]# vespa-get-cluster-state

Cluster content:
content/distributor/0: up
content/distributor/1: up
content/storage/0: up
content/storage/1: up