I am new to K8s, so this might be a small mistake or a big one on my part, but I am not able to resolve it on my own. So here I am with my setup details and problem.
I am using a minikube cluster with 2 nodes on the same machine.
minikube profile list
|----------|-----------|---------|--------------|------|---------|---------|-------|--------|
| Profile | VM Driver | Runtime | IP | Port | Version | Status | Nodes | Active |
|----------|-----------|---------|--------------|------|---------|---------|-------|--------|
| minikube | docker | docker | 192.168.76.2 | 8443 | v1.27.4 | Running | 2 | * |
|----------|-----------|---------|--------------|------|---------|---------|-------|--------|
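For reference, a two-node cluster like this can be created with something along these lines (a sketch matching the driver and node count in the table above, not necessarily the exact command used):
minikube start --driver=docker --nodes=2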
I created 2 VFs from one PF:
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 0c:c4:7a:77:f9:80 brd ff:ff:ff:ff:ff:ff
vf 0 link/ether 1e:4c:7c:6b:7d:0f brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
vf 1 link/ether 9e:52:6e:e0:e9:07 brd ff:ff:ff:ff:ff:ff, spoof checking on, link-state auto, trust off, query_rss off
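For reference, VFs like these are typically created through the standard sysfs interface; a minimal sketch, assuming the PF is eth0:
# on the host, as root: create 2 VFs on the PF
echo 2 > /sys/class/net/eth0/device/sriov_numvfs
# verify the VFs appear under the PF
ip link show eth0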
I want to create a pod with multiple interfaces, and I want to attach one VF to each pod.
For that I installed SR-IOV CNI and placed the sriov binary in the /opt/cni/bin folder, as suggested in the documentation.
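That install typically amounts to something like the following (a sketch based on the upstream sriov-cni README; the build output path may differ):
git clone https://github.com/k8snetworkplumbingwg/sriov-cni
cd sriov-cni
make
cp build/sriov /opt/cni/bin/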
After that I downloaded the sriov-network-device-plugin, then created and applied a ConfigMap. Below is an extract from my config file:
apiVersion: v1
kind: ConfigMap
metadata:
  name: sriovdp-config
  namespace: kube-system
data:
  config.json: |
    {
      "resourceList": [{
        "resourceName": "intel_sriov_netdevice",
        "selectors": {
          "vendors": ["8086"],
          "devices": ["15a8"],
          "drivers": ["ixgbevf"]
        }
      },
The device plugin pods are running on both nodes:
k -n kube-system get pod -l app=sriovdp -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-sriov-device-plugin-amd64-64bf7 1/1 Running 0 5h41m 192.168.76.2 minikube <none> <none>
kube-sriov-device-plugin-amd64-7knxh 1/1 Running 0 5h41m 192.168.76.3 minikube-m02 <none> <none>
My nodes are able to see my VFs as well:
kubectl get node minikube -o jsonpath='{.status.allocatable}' |jq -r '."intel.com/intel_sriov_netdevice"'
2
I applied the Multus DaemonSet as well.
k get pod -l app=multus -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system kube-multus-ds-47zg5 1/1 Running 0 5h38m 192.168.76.2 minikube <none> <none>
kube-system kube-multus-ds-rjrzn 1/1 Running 0 5h38m 192.168.76.3 minikube-m02 <none> <none>
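The error events below reference a NetworkAttachmentDefinition named sriov-network. For the resource advertised above, a typical definition looks roughly like this (the IPAM block here is an illustrative assumption):
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-network
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_netdevice
spec:
  config: '{
      "type": "sriov",
      "cniVersion": "0.3.1",
      "name": "sriov-network",
      "ipam": {
        "type": "host-local",
        "subnet": "10.56.217.0/24",
        "routes": [{"dst": "0.0.0.0/0"}],
        "gateway": "10.56.217.1"
      }
    }'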
After all this, when I try to launch a pod, it gets stuck and never comes up.
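For context, a test pod for this setup is typically launched with a spec roughly like the following (the image and container name here are illustrative assumptions; the annotation and resource request are what tie the pod to sriov-network and to the VF pool):
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-network
spec:
  containers:
  - name: appcntr1
    image: alpine
    command: ["sleep", "infinity"]
    resources:
      requests:
        intel.com/intel_sriov_netdevice: '1'
      limits:
        intel.com/intel_sriov_netdevice: '1'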
Events from the pod are:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 15s default-scheduler Successfully assigned default/testpod1 to minikube-m02
Normal AddedInterface 14s multus Add eth0 [10.244.1.32/24] from kindnet
Warning FailedCreatePodSandBox 14s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "0f5d4259dfaa0087fba50ee5c656050fc6af6c01430bdd820e0642fba9d384de" network for pod "testpod1": networkPlugin cni failed to set up pod "testpod1_default" network: plugin type="multus" name="multus-cni-network" failed (add): [default/testpod1/:sriov-network]: error adding container to network "sriov-network": SRIOV-CNI failed to load netconf: LoadConf(): failed to get VF information: "PF network device not found"
Normal AddedInterface 13s multus Add eth0 [10.244.1.33/24] from kindnet
Warning FailedCreatePodSandBox 12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "39c2cc9ceff18eb34fbd4f7b0746fd0807a3999fdcef6f8dd8487b43f812a31a" network for pod "testpod1": networkPlugin cni failed to set up pod "testpod1_default" network: plugin type="multus" name="multus-cni-network" failed (add): [default/testpod1/:sriov-network]: error adding container to network "sriov-network": SRIOV-CNI failed to load netconf: LoadConf(): failed to get VF information: "PF network device not found"
Normal AddedInterface 12s multus Add eth0 [10.244.1.34/24] from kindnet
Warning FailedCreatePodSandBox 11s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "20fb94f524db27d3b1bcb0e743c925ddb6af038c9220de28cd84ec92215b0ab3" network for pod "testpod1": networkPlugin cni failed to set up pod "testpod1_default" network: plugin type="multus" name="multus-cni-network" failed (add): [default/testpod1/:sriov-network]: error adding container to network "sriov-network": SRIOV-CNI failed to load netconf: LoadConf(): failed to get VF information: "PF network device not found"
As per the article written by Vinayak Pandey, there are different scenarios that can cause the FailedCreatePodSandBox error when you attempt to create a pod. In general, check whether CNI is working on the node and whether all the CNI configuration files are correct; you should also verify that the system resource limits are properly set.
Scenario 1: CNI not working on the node
The Kubernetes Container Network Interface (CNI) configures networking between pods. If CNI isn't running properly on a node, pods scheduled there can't be created and will be stuck in the ContainerCreating state. As your environment has 2 nodes in it, you can simulate this scenario by preventing the SR-IOV CNI from running on one node, following the steps mentioned in the article.
Debugging and resolution
The error message indicates that CNI on the node where the pod is scheduled is not functioning properly, so the first step is to check whether the CNI pod is running on that node. If the CNI pod is running properly, one possible root cause is eliminated. In this case, once you remove the nodeSelector from the DaemonSet definition and ensure the CNI pod is running on the node, the pod should come up fine.
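A quick way to check that, assuming the node named in the events above, is something like:
kubectl -n kube-system get pods -o wide --field-selector spec.nodeName=minikube-m02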
Scenario 2: Missing or incorrect CNI configuration files
Even if the CNI pod is running, there may be issues if the CNI configuration files have errors. To simulate this, make some changes to the CNI configuration files, which are stored under the /etc/cni/net.d directory, by following the steps mentioned in the article.
Debugging and resolution
In this scenario, verify the CNI configuration files. Check the configuration files on the other nodes of the cluster and see whether they match the ones on the problematic node. If you find any issue with the configuration files, copy them from the other nodes to that node and then try recreating the pod.
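On minikube you can inspect those files over minikube ssh; a sketch, assuming the problematic node is minikube-m02 (the exact file names may differ, but Multus usually installs a 00-multus.conf):
# on the host
minikube ssh -n minikube-m02
# inside the node
ls /etc/cni/net.d/
cat /etc/cni/net.d/00-multus.conf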