I am trying to install the PGO operator by following the docs. When I run this command
kubectl apply --server-side -k kustomize/install/default
the pod starts, but soon goes into CrashLoopBackOff.
What I have done: I checked the pod's logs with this command
kubectl logs pgo-98c6b8888-fz8zj -n postgres-operator
Result
time="2023-01-09T07:50:56Z" level=debug msg="debug flag set to true" version=5.3.0-0
time="2023-01-09T07:51:26Z" level=error msg="Failed to get API Group-Resources" error="Get \"https://10.96.0.1:443/api?timeout=32s\": dial tcp 10.96.0.1:443: i/o timeout" version=5.3.0-0
panic: Get "https://10.96.0.1:443/api?timeout=32s": dial tcp 10.96.0.1:443: i/o timeout
goroutine 1 [running]:
main.assertNoError(...)
github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:42
main.main()
github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:84 +0x465
To check the network connection to the API server host, I ran this command
wget https://10.96.0.1:443/api
The Result is
--2023-01-09 09:49:30-- https://10.96.0.1/api
Connecting to 10.96.0.1:443... connected.
ERROR: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:
Unable to locally verify the issuer's authority.
To connect to 10.96.0.1 insecurely, use `--no-check-certificate'.
As you can see, the wget connects to the API server.
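Note that the operator pod connects over the pod network (flannel), which may behave differently from wherever this wget was run. As a hypothetical extra check (assuming a throwaway busybox pod can be scheduled), the same endpoint could be probed from inside a pod:
# Run a temporary pod and hit the API service IP from the pod network.
# Even an HTTP 401/403 response here proves connectivity; an i/o timeout
# would reproduce the operator pod's problem.
kubectl run net-test --rm -it --image=busybox:1.36 --restart=Never -- \
  wget --no-check-certificate -qO- https://10.96.0.1:443/api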
Something strange that might be useful:
I ran kubectl get pods --all-namespaces
and saw this output
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-9gmmq 1/1 Running 0 3d16h
kube-flannel kube-flannel-ds-rcq8l 0/1 CrashLoopBackOff 10 (3m15s ago) 34m
kube-flannel kube-flannel-ds-rqwtj 0/1 CrashLoopBackOff 10 (2m53s ago) 34m
kube-system etcd-masterk8s-virtual-machine 1/1 Running 1 (5d ago) 3d17h
kube-system kube-apiserver-masterk8s-virtual-machine 1/1 Running 2 (5d ago) 3d17h
kube-system kube-controller-manager-masterk8s-virtual-machine 1/1 Running 8 (2d ago) 3d17h
kube-system kube-scheduler-masterk8s-virtual-machine 1/1 Running 7 (5d ago) 3d17h
postgres-operator pgo-98c6b8888-fz8zj 0/1 CrashLoopBackOff 7 (4m59s ago) 20m
As you can see, two of my kube-flannel pods are also in CrashLoopBackOff and only one is running. I am not sure whether this is the main cause of the problem.
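For reference, the flannel pod logs can be pulled the same way (pod names are taken from the output above); if flannel itself is crashing, pod-to-API-server traffic is likely to break for anything scheduled on those nodes:
kubectl logs -n kube-flannel kube-flannel-ds-rcq8l
kubectl logs -n kube-flannel kube-flannel-ds-rcq8l --previous   # logs from the last crashed attempt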
What I want: to run the PGO pod successfully, with no errors.
How you can help: please help me find the issue, or suggest another way to get more detailed logs. I cannot find the root cause: if this were a network issue, why does the wget connect? And if it is something else, how can I dig up more information?
Update: new errors after applying the fixes:
time="2023-01-09T11:57:47Z" level=debug msg="debug flag set to true" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Metrics server is starting to listen" addr=":8080" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="upgrade checking enabled" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="starting controller runtime manager and will wait for signal to exit" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting server" addr="[::]:8080" kind=metrics path=/metrics version=5.3.0-0
time="2023-01-09T11:57:47Z" level=debug msg="upgrade check issue: namespace not set" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1beta1.PostgresCluster" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.ConfigMap" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Endpoints" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.PersistentVolumeClaim" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Secret" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Service" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.ServiceAccount" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Deployment" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.StatefulSet" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Job" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Role" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.RoleBinding" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.CronJob" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.PodDisruptionBudget" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.Pod" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting EventSource" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster source="kind source: *v1.StatefulSet" version=5.3.0-0
time="2023-01-09T11:57:47Z" level=info msg="Starting Controller" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster version=5.3.0-0
W0109 11:57:48.006419 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:57:48.006642 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
time="2023-01-09T11:57:49Z" level=info msg="{\"pgo_versions\":[{\"tag\":\"v5.1.0\"},{\"tag\":\"v5.0.5\"},{\"tag\":\"v5.0.4\"},{\"tag\":\"v5.0.3\"},{\"tag\":\"v5.0.2\"},{\"tag\":\"v5.0.1\"},{\"tag\":\"v5.0.0\"}]}" X-Crunchy-Client-Metadata="{\"deployment_id\":\"288f4766-8617-479b-837f-2ee59ce2049a\",\"kubernetes_env\":\"v1.26.0\",\"pgo_clusters_total\":0,\"pgo_version\":\"5.3.0-0\",\"is_open_shift\":false}" version=5.3.0-0
W0109 11:57:49.163062 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:57:49.163119 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:57:51.404639 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:57:51.404811 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:57:54.749751 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:57:54.750068 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:58:06.015650 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:58:06.015710 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:58:25.355009 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:58:25.355391 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
W0109 11:59:10.447123 1 reflector.go:324] k8s.io/[email protected]/tools/cache/reflector.go:167: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
E0109 11:59:10.447490 1 reflector.go:138] k8s.io/[email protected]/tools/cache/reflector.go:167: Failed to watch *v1.PodDisruptionBudget: failed to list *v1.PodDisruptionBudget: poddisruptionbudgets.policy is forbidden: User "system:serviceaccount:postgres-operator:pgo" cannot list resource "poddisruptionbudgets" in API group "policy" at the cluster scope
time="2023-01-09T11:59:47Z" level=error msg="Could not wait for Cache to sync" controller=postgrescluster controllerGroup=postgres-operator.crunchydata.com controllerKind=PostgresCluster error="failed to wait for postgrescluster caches to sync: timed out waiting for cache to be synced" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Stopping and waiting for non leader election runnables" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Stopping and waiting for leader election runnables" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Stopping and waiting for caches" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=error msg="failed to get informer from cache" error="Timeout: failed waiting for *v1.PodDisruptionBudget Informer to sync" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=error msg="error received after stop sequence was engaged" error="context canceled" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Stopping and waiting for webhooks" version=5.3.0-0
time="2023-01-09T11:59:47Z" level=info msg="Wait completed, proceeding to shutdown the manager" version=5.3.0-0
panic: failed to wait for postgrescluster caches to sync: timed out waiting for cache to be synced
goroutine 1 [running]:
main.assertNoError(...)
github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:42
main.main()
github.com/crunchydata/postgres-operator/cmd/postgres-operator/main.go:118 +0x434
If this is a new deployment, I suggest using v5.
That said, since PGO manages the networking for Postgres clusters (and, as such, manages listen_addresses), there is no reason to modify the listen_addresses configuration parameter. If you need to manage networking or network access, you can do that by setting the pg_hba config or by using NetworkPolicies; a sketch follows below.
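A minimal sketch of the NetworkPolicy route (the postgres-operator.crunchydata.com/cluster label, the cluster name hippo, the namespace, and the app: my-app client selector are all assumptions to adjust for your setup):
# Hypothetical policy: only pods labeled app=my-app may reach the Postgres
# pods of cluster "hippo" on port 5432.
kubectl apply -n postgres-operator -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-postgres
spec:
  podSelector:
    matchLabels:
      postgres-operator.crunchydata.com/cluster: hippo
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: my-app
    ports:
    - protocol: TCP
      port: 5432
EOF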
Please go through the issue Custom 'listen_addresses' not applied #2904 for more information.
CrashLoopBackOff: check the pod logs for configuration or deployment issues such as missing dependencies (for example, Kubernetes has no equivalent of docker-compose's depends_on, which can break setups migrated from docker-compose), and also check whether pods are being OOM-killed or using excessive resources; a few typical commands are shown below.
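Commands that usually help with CrashLoopBackOff diagnosis (pod and namespace names taken from your output):
kubectl describe pod pgo-98c6b8888-fz8zj -n postgres-operator      # events, exit codes, OOMKilled status
kubectl logs pgo-98c6b8888-fz8zj -n postgres-operator --previous   # logs from the previous, crashed container
kubectl get events -n postgres-operator --sort-by=.metadata.creationTimestamp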
Also check for timeout issues; there is a lab on the timeout problem that may help.
A possible fix for the flannel errors: first, remove the flannel.1 IP link on every host that has the problem;
second, delete the kube-flannel-ds DaemonSet from the cluster;
finally, recreate kube-flannel-ds; the flannel.1 link will be recreated and the pods should come back healthy. The commands below sketch these steps.
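A sketch of those steps (the manifest URL is the upstream flannel manifest and may differ for your installation):
sudo ip link delete flannel.1     # run on every affected host
kubectl delete daemonset -n kube-flannel kube-flannel-ds
kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml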
(For flannel to work correctly, the cluster must be initialized with --pod-network-cidr=10.244.0.0/16 passed to kubeadm init, or flannel's network configuration must be changed to match the cluster's pod CIDR; see the check below.)
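If the cluster already exists, you can compare the pod CIDR it was initialized with against what flannel is configured to use (the kube-flannel-cfg ConfigMap name is assumed from the upstream manifest); the two must match:
kubectl cluster-info dump | grep -m 1 cluster-cidr                  # controller-manager --cluster-cidr flag
kubectl get configmap -n kube-flannel kube-flannel-cfg -o yaml | grep -A 3 net-conf.json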
Edit: please also check this similar issue and solution, which may help resolve your problem.