I have two instances, and I need to deploy Vespa in a Docker container on each of them. One container will act as the config server, a container cluster node, and a content cluster node, while the other will act as a container cluster node and a content cluster node.
The hosts.xml file for the application looks like this:
<hosts>
  <host name="vespa-master">
    <alias>admin0</alias>
  </host>
  <host name="vespa-searcher">
    <alias>searcher1</alias>
  </host>
</hosts>
The services.xml file for the application looks like this:
<services version="1.0">
  <admin version="2.0">
    <adminserver hostalias="admin0"/>
    <configservers>
      <configserver hostalias="admin0"/>
    </configservers>
  </admin>
  <container id="container" version="1.0">
    <document-api />
    <search/>
    <nodes>
      <node hostalias="admin0"/>
      <node hostalias="searcher1"/>
    </nodes>
  </container>
  <content id="content" version="1.0">
    <documents>
      <!--version 1 docs starts-->
      <document type="document_name" mode="index" />
      <!--version 1 docs ends-->
    </documents>
    <redundancy>2</redundancy>
    <engine>
      <proton>
        <searchable-copies>1</searchable-copies>
      </proton>
    </engine>
    <group name="top-group">
      <distribution partitions="*"/>
      <group name="group0" distribution-key="0">
        <node hostalias="admin0" distribution-key="0"/>
        <node hostalias="searcher1" distribution-key="1"/>
      </group>
    </group>
  </content>
</services>
I am using Docker Swarm to create an overlay network between the two instances. The command looks something like this:
docker network create --driver=overlay --subnet=<IP>/24 vespa_conn --attachable
The command I used to create the container on the first instance is:
docker run --detach --hostname vespa-master --network=vespa_conn <other arguments> --env VESPA_CONFIGSERVERS=vespa-master vespaengine/vespa
and the command to create a container on the second instance is:
docker run --detach --hostname vespa-searcher --network=vespa_conn <other arguments> --env VESPA_CONFIGSERVERS=vespa-master vespaengine/vespa
These commands are based on this page.
After creating and deploying my application, the node in the second container does not come up:
vespa-get-cluster-state
Cluster content:
content/distributor/0: up
content/distributor/1: down
content/storage/0: up
content/storage/1: down
The errors I found were:
content/distributor/0: Failed to fetch json: Connection error: socket write error
admin/cluster-controllers/0: Failed to fetch json: Connection error: socket write error
admin/slobrok.0: Failed to fetch json: Connection error: socket write error
admin/metrics/vespa-master: Failed to fetch json: Connection error: socket write error
hosts/vespa-master/sentinel: Failed to fetch json: Connection error: socket write error
hosts/vespa-master/logd: Failed to fetch json: Connection error: socket write error
[generation not up-to-date ignored]
container/container.1: Failed to fetch json: Connection error: socket write error
hosts/vespa-searcher/logd: Failed to fetch json: Connection error: socket write error
[generation not up-to-date ignored]
After a few attempts, I fixed the problem by adding 'override VESPA_CONFIGSERVERS vespa-master' to the /opt/vespa/conf/vespa/default-env.txt file in the second container and then restarting the services.
Is there a better way to do this, so that I don't have to manually update the default-env.txt file?
Also, when I added 'configserver' or 'services' at the end of the docker run command, as specified on that page, I got this error:
[2020-10-15 11:36:13.782540] 1935/8285 (vespa-model-inspect.config.frt.frtconnection) warning: Connection to tcp/localhost:19090 failed or timed out
[2020-10-15 11:36:13.782631] 1935/8285 (vespa-model-inspect.config.frt.frtconnection) warning: FRT Connection tcp/localhost:19090 suspended until 2020-10-15 11:36:23 GMT
[2020-10-15 11:36:13.782647] 1935/8285 (vespa-model-inspect.config.frt.frtconfigagent) info: Error response or no response from config server (key: name=model,namespace=cloud.config,configId=admin/model) (errcode=104, validresponse:0), trying again in 6000 milliseconds
What could be the reason for this error? Am I doing something wrong here?
To get this working, you should avoid underscores in the network name, use the fully qualified name for the config server, and name the containers so that DNS works.
Create the network on a manager swarm host:
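A minimal sketch of this step, wrapped in a small shell function. The function name and the network name `vespanet` are assumptions, chosen only so that the name contains no underscore. The subnet is passed in as an argument because the question does not state it.

```shell
# Hypothetical helper: create an attachable overlay network for the Vespa containers.
# $1: subnet in CIDR notation (not given in the question, so the caller supplies it)
# $2: network name, defaults to the assumed name "vespanet" (no underscore)
create_vespa_network() {
  docker network create --driver overlay --attachable --subnet "$1" "${2:-vespanet}"
}
```

Run it on the swarm manager node with the subnet you used for the original network as the first argument.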
Start a Vespa container running both the config server and the services (no argument to the entrypoint):
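A sketch of what this could look like, assuming an overlay network with the hypothetical name `vespanet` and assuming that Docker's embedded DNS resolves `vespa-master.vespanet` once the container is named `vespa-master`. The function name is illustrative, and the other arguments elided in the question are not reproduced here.

```shell
# Hypothetical helper: start the container that runs the config server and the services.
# $1: network name, defaults to the assumed name "vespanet"
# Add the other arguments from your original docker run command as needed.
start_vespa_master() {
  net="${1:-vespanet}"
  docker run --detach \
    --name vespa-master \
    --hostname "vespa-master.$net" \
    --network "$net" \
    --env VESPA_CONFIGSERVERS="vespa-master.$net" \
    vespaengine/vespa
}
```

No argument follows the image name, so the entrypoint starts both the config server and the services.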
Start a Vespa container running only the services (services argument to entrypoint):
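A sketch under the same assumptions: a hypothetical network name `vespanet` and a config server reachable as `vespa-master.vespanet`. The function name is illustrative, and the other arguments elided in the question are left out.

```shell
# Hypothetical helper: start the container that runs only the Vespa services.
# $1: network name, defaults to the assumed name "vespanet"
# VESPA_CONFIGSERVERS points at the first container, not at this one.
start_vespa_searcher() {
  net="${1:-vespanet}"
  docker run --detach \
    --name vespa-searcher \
    --hostname "vespa-searcher.$net" \
    --network "$net" \
    --env VESPA_CONFIGSERVERS="vespa-master.$net" \
    vespaengine/vespa services
}
```

The trailing `services` is the entrypoint argument that keeps this container from starting its own config server.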
Then use the fully qualified names in the hosts.xml:
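For illustration, the following writes a hosts.xml that uses such names. The `.vespanet` suffix is an assumption based on a hypothetical network name, so substitute the name of your own network. The aliases are unchanged from the question, so services.xml does not need to change.

```shell
# Write hosts.xml into the application package directory (here: the current directory).
# The host names assume a network called "vespanet"; adjust them to your network name.
cat > hosts.xml <<'EOF'
<hosts>
  <host name="vespa-master.vespanet">
    <alias>admin0</alias>
  </host>
  <host name="vespa-searcher.vespanet">
    <alias>searcher1</alias>
  </host>
</hosts>
EOF
```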
By deploying your unmodified services.xml I get the following state: