Why doesn't Kafka start when deployed on local k8s?


I have a Windows machine with Docker installed and k8s enabled from Docker. To create a Kafka instance in k8s, I chose Strimzi.

To deploy Kafka I used these commands:

kubectl create namespace kafka
kubectl create -f 'https://strimzi.io/install/latest?namespace=kafka' -n kafka
kubectl apply -f https://strimzi.io/examples/latest/kafka/kafka-persistent-single.yaml -n kafka

Everything launched successfully, but when I restarted my notebook, the Kafka pod started with an error (screenshot from Lens):

[image]

When I opened the logs, I saw a ZooKeeper connection error. When I opened the ZooKeeper pod logs, I saw an error like this:

2023-12-09 18:06:49,991 INFO Created server with tickTime 2000 ms minSessionTimeout 4000 ms maxSessionTimeout 40000 ms clientPortListenBacklog -1 datadir /var/lib/zookeeper/data/version-2 snapdir /var/lib/zookeeper/data/version-2 (org.apache.zookeeper.server.ZooKeeperServer) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
2023-12-09 18:06:49,991 ERROR Couldn't bind to my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.kafka.svc/<unresolved>:2888 (org.apache.zookeeper.server.quorum.Leader) [QuorumPeer[myid=1](plain=127.0.0.1:12181)(secure=0.0.0.0:2181)]
java.net.SocketException: Unresolved address
    at java.base/java.net.ServerSocket.bind(ServerSocket.java:380)
    at java.base/java.net.ServerSocket.bind(ServerSocket.java:342)
    at org.apache.zookeeper.server.quorum.Leader.createServerSocket(Leader.java:322)
    at org.apache.zookeeper.server.quorum.Leader.lambda$new$0(Leader.java:301)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.concurrent.ConcurrentHashMap$KeySpliterator.forEachRemaining(ConcurrentHashMap.java:3573)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)

I tried resetting k8s and Docker to factory settings and changing Docker's resources (increasing memory), but the error is the same.

Updates:

List of permissions: [image]

DNS logs: [image]

This means the coredns-5dd5756b68-qhp5q pod can't connect to 192.168.65.7:53.

After restarting the k8s node, I saw this error in the same DNS logs:

[ERROR] plugin/errors: 2 5593748469660065637.885187837306804871. HINFO: read udp 10.1.0.27:42685->192.168.65.7:53: i/o timeout
[ERROR] plugin/errors: 2 5593748469660065637.885187837306804871. HINFO: read udp 10.1.0.27:44025->192.168.65.7:53: i/o timeout

There are 2 answers

Roberto (best answer)

My workaround is to restart the node after the PC starts. I use a .bat file like this:

@echo OFF
echo start docker and k8s..
timeout 20

echo stop node k8s..
kubectl cordon docker-desktop
kubectl delete pod my-cluster-kafka-0 -n kafka
kubectl drain docker-desktop --delete-emptydir-data --ignore-daemonsets --force
timeout 20

kubectl uncordon docker-desktop
echo start k8s node..
echo pod status
kubectl get pods -n kafka
timeout 60

echo pod status
kubectl get pods -n kafka
timeout 60

I then launch it via gpedit.msc when I start working on the PC.
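As a possible refinement of this workaround (a sketch, not something tested on this setup), the fixed pauses could be replaced by waiting until the pod actually reports Ready. The pod and namespace names below come from the question; the function name wait_for_kafka and the 300s timeout are assumptions.

```shell
# Sketch: block until the Kafka pod is Ready instead of sleeping a fixed time.
# wait_for_kafka is a hypothetical helper; the 300s timeout is an assumption.
wait_for_kafka() {
    ns="${1:-kafka}"
    pod="${2:-my-cluster-kafka-0}"
    # kubectl wait exits non-zero if the condition is not met before the timeout.
    kubectl wait --for=condition=Ready "pod/$pod" -n "$ns" --timeout=300s
}

# Typical use against a live cluster:
#   wait_for_kafka kafka my-cluster-kafka-0 && kubectl get pods -n kafka
```

The kubectl wait line itself can also be placed directly in the .bat file. Note that kubectl wait fails immediately if the pod does not exist yet, so a short pause after deleting the pod may still be needed.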

Tri Duong

Note: This response was partially structured using GenAI technology and is currently under review for accuracy and adherence to Stack Overflow's guidelines by myself, a new member of this community. I'm in the process of familiarizing myself with the community standards and code of conduct.

It looks like you're encountering two distinct issues with your Kafka deployment on Kubernetes using Strimzi. My guess is that:

  1. The ZooKeeper pod can't bind to its quorum address because the hostname is reported as an "Unresolved address", which in turn keeps the Kafka pod from connecting to ZooKeeper.
  2. The CoreDNS pod times out when contacting its upstream DNS server (192.168.65.7:53), and working DNS is critical for service discovery within the Kubernetes cluster.

To address these potential issues one by one:

Issue 1: ZooKeeper "Unresolved address" Error

The error message "Couldn't bind to my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.kafka.svc/<unresolved>:2888" suggests a service discovery issue: the ZooKeeper pod could not resolve its own hostname. This could be because the ZooKeeper headless service (my-cluster-zookeeper-nodes) is not properly set up, or because cluster DNS was not answering when the pod started.

Resolution Steps to consider:

  • Ensure that your Zookeeper pods are running without issues.
  • Try checking the Zookeeper headless service with kubectl get svc -n kafka and ensuring it's correctly pointing to the Zookeeper pods.
  • Confirm that the Kafka broker configuration for Zookeeper (zookeeper.connect in server.properties) is correct.
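One way to check whether the hostname from the error resolves inside the cluster is to run nslookup from a throwaway pod. This is only a sketch: the function name check_cluster_dns, the pod name dns-check, and the busybox:1.28 image are assumptions, while the hostname and namespace are taken from the log in the question.

```shell
# Sketch: resolve a hostname from inside the cluster using a temporary pod.
# check_cluster_dns, the pod name "dns-check", and the busybox:1.28 image
# are assumptions made for illustration.
check_cluster_dns() {
    host="${1:-my-cluster-zookeeper-0.my-cluster-zookeeper-nodes.kafka.svc}"
    ns="${2:-kafka}"
    # --rm deletes the pod once nslookup exits; --restart=Never runs it once.
    kubectl run dns-check --rm -i --restart=Never --image=busybox:1.28 \
        -n "$ns" -- nslookup "$host"
}

# Typical use against a live cluster:
#   check_cluster_dns
```

If the lookup fails here as well, the problem is more likely in cluster DNS than in the Kafka or ZooKeeper configuration.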

Issue 2: CoreDNS Resolution Problem

The DNS resolution issue is indicated by the CoreDNS pod timing out when it queries its upstream DNS server (the "read udp ...->192.168.65.7:53: i/o timeout" lines). This is often due to networking misconfiguration or resource constraints.

Resolution Steps:

  • Check the CoreDNS pod logs in detail for any clues: kubectl logs -n kube-system -l k8s-app=kube-dns.
  • Verify CoreDNS ConfigMap for any misconfiguration.
  • Make sure there's no network policy blocking traffic to the CoreDNS pods.
  • Ensure that your Docker Desktop and Kubernetes have sufficient resources allocated, as DNS issues can sometimes be a symptom of resource starvation.
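To see at a glance which upstream servers CoreDNS is timing out on, the log output can be filtered. The helper below is a minimal sketch; the function name coredns_timeout_upstreams is hypothetical, and it assumes the log lines have the same shape as the ones quoted in the question.

```shell
# Sketch: read CoreDNS log lines on stdin and print each upstream resolver
# (ip:port) that appears in an "i/o timeout" error, one per line.
# coredns_timeout_upstreams is a hypothetical helper name.
coredns_timeout_upstreams() {
    grep 'i/o timeout' \
        | sed -n 's#.*->\([0-9.]*:[0-9]*\): i/o timeout.*#\1#p' \
        | sort -u
}

# Typical use against a live cluster:
#   kubectl logs -n kube-system -l k8s-app=kube-dns | coredns_timeout_upstreams
```

For the two log lines in the question this prints a single address, 192.168.65.7:53, which points at the upstream resolver rather than at anything inside the kafka namespace.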

Additional Troubleshooting Steps:

  • Use kubectl describe pod <kafka-pod-name> -n kafka to see the pod's status and events, which give more detail on why the Kafka pod can't start.
  • Examine the events in the Kafka namespace for any anomalies: kubectl get events -n kafka.
  • Look into any persistent storage issues if applicable, as Kafka requires a persistent volume to function correctly.

It's also worth noting that Docker Desktop's Kubernetes cluster is meant for development purposes and might behave differently than a production cluster. Ensure that you're using compatible versions of Strimzi and Kubernetes as provided by Docker Desktop.

If these steps don't resolve the issue, please provide additional logs and configuration details for further diagnosis.