In my project, I am trying to implement custom metrics to support HPA in Kubernetes.
Following the guide Prometheus Custom Metrics Adapter step by step, I prepared the code and deployed it into my cluster (the only difference is that in my version of Kubernetes I register v1.custom.metrics.k8s.io instead of v1beta1.custom.metrics.k8s.io).
I'm using GKE on GCP.
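As a quick sanity check (just an illustrative command, not part of the guide), the API versions registered for the custom metrics group can be listed with:

kubectl get apiservices | grep custom.metrics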
YAML file with the deployments, services, and APIService:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: fb-poster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      containers:
        - name: prometheus
          image: prom/prometheus
          args:
            - '--storage.tsdb.retention=6h'
            - '--storage.tsdb.path=/prometheus'
            - '--config.file=/etc/prometheus/prometheus.yml'
          ports:
            - containerPort: 9090
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus/
            - name: data-volume
              mountPath: /prometheus
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: data-volume
          emptyDir: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-adapter-monitor
  labels:
    app: prometheus-adapter-monitor
  namespace: fb-poster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-adapter-monitor
  template:
    metadata:
      labels:
        app: prometheus-adapter-monitor
      name: prometheus-adapter-monitor
    spec:
      serviceAccountName: cluster-monitoring
      containers:
        - name: prometheus-adapter-monitor
          image: directxman12/k8s-prometheus-adapter-amd64:v0.5.0
          args:
            - "--secure-port=6443"
            - "--tls-cert-file=/var/run/serving-cert/serving.crt"
            - "--tls-private-key-file=/var/run/serving-cert/serving.key"
            - "--logtostderr=true"
            - "--metrics-relist-interval=1m"
            - "--prometheus-url=http://prometheus-service:9090"
            - "--v=10"
            - "--config=etc/adapter/config.yml"
          ports:
            - containerPort: 6443
          volumeMounts:
            - mountPath: /var/run/serving-cert
              name: volume-serving-cert
              readOnly: true
            - mountPath: /etc/adapter/
              name: config
              readOnly: true
            - mountPath: /tmp
              name: tmp-vol
      volumes:
        - name: volume-serving-cert
          secret:
            secretName: cm-adapter-serving-certs
        - name: config
          configMap:
            name: prometheus-config
        - name: tmp-vol
          emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-service
spec:
  selector:
    app: prometheus
  ports:
    - protocol: TCP
      port: 9090
      targetPort: 9090
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-adapter-monitor
  namespace: fb-poster
spec:
  ports:
    - port: 443
      targetPort: 6443
  selector:
    app: prometheus-adapter-monitor
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1.custom.metrics.k8s.io
spec:
  service:
    name: prometheus-adapter-monitor
    namespace: fb-poster
  group: custom.metrics.k8s.io
  version: v1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100
configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: fb-poster
data:
  prometheus.yml: |
    global:
      scrape_interval: 10s
      evaluation_interval: 10s
    rule_files:
      - "custom_rules.yml"
    scrape_configs:
      - job_name: 'selenium-hub'
        static_configs:
          - targets: ['selenium-hub-export-service:9104']
      - job_name: 'prometheus'
        static_configs:
          - targets: ['prometheus-service:9090']
  custom_rules.yml: |
    groups:
      - name: custom.rules
        rules:
          - record: free_nodes_percentage
            expr: 100 * (selenium_grid_node_count - selenium_grid_session_count) / selenium_grid_node_count
  config.yml: |
    rules:
      - seriesQuery: 'free_nodes_percentage'
        resources:
          overrides:
            kubernetes_namespace: {resource: "namespace"}
            kubernetes_pod_name: {resource: "pod"}
        name:
          matches: "free_nodes_percentage"
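As a sanity check that the recorded metric exists in Prometheus and carries the labels the adapter rule maps (kubernetes_namespace, kubernetes_pod_name), one can query the series endpoint from inside the cluster. This is only a sketch: curl-check and the curlimages/curl image are throwaway placeholders, -g is needed because of the brackets in the query, and it assumes prometheus-service resolves the same way it does for the adapter:

kubectl -n fb-poster run -it --rm curl-check --image=curlimages/curl --restart=Never -- \
  curl -sg 'http://prometheus-service:9090/api/v1/series?match[]=free_nodes_percentage'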
After applying this YAML to the cluster, all my pods are running. From the prometheus-adapter-monitor pod I get the following logs:
I0620 21:06:01.122341 1 api.go:74] GET http://prometheus-service:9090/api/v1/series?match%5B%5D=selenium_grid_node_count%7Bkubernetes_namespace%21%3D%22%22%2Ckubernetes_pod_name%21%3D%22%22%7D&start=1687293961.117 200 OK
I0620 21:06:01.122618 1 api.go:93] Response Body: {"status":"success","data":[]}
I0620 21:06:01.122800 1 provider.go:270] Set available metric list from Prometheus to: [[]]
I0620 21:06:01.286377 1 handler.go:143] prometheus-metrics-adapter: GET "/apis/custom.metrics.k8s.io/v1" satisfied by gorestful with webservice /apis/custom.metrics.k8s.io
I0620 21:06:01.286709 1 wrap.go:42] GET /apis/custom.metrics.k8s.io/v1: (736.267µs) 404 [[Go-http-client/2.0] 10.45.0.31:48374]
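Side note: to see which group version the adapter binary itself serves (v1beta1 vs the v1 I registered), one option is to port-forward to it and hit both version paths. This is just a sketch; -k is only there because of the self-signed serving cert, and depending on the adapter's delegated auth it may answer 401/403 instead of a resource list:

kubectl -n fb-poster port-forward deploy/prometheus-adapter-monitor 6443:6443 &
curl -k https://localhost:6443/apis/custom.metrics.k8s.io/v1beta1
curl -k https://localhost:6443/apis/custom.metrics.k8s.io/v1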
When I run kubectl describe apiservice v1.custom.metrics.k8s.io, I receive:
Name:         v1.custom.metrics.k8s.io
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2023-06-20T21:04:55Z
  Managed Fields:
    API Version:  apiregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        f:group:
        f:groupPriorityMinimum:
        f:insecureSkipTLSVerify:
        f:service:
          .:
          f:name:
          f:namespace:
          f:port:
        f:version:
        f:versionPriority:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-06-20T21:04:55Z
    API Version:  apiregistration.k8s.io/v1
    Fields Type:  FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          .:
          k:{"type":"Available"}:
            .:
            f:lastTransitionTime:
            f:message:
            f:reason:
            f:status:
            f:type:
    Manager:         kube-apiserver
    Operation:       Update
    Subresource:     status
    Time:            2023-06-20T21:05:02Z
  Resource Version:  11540103
  UID:               7c82cbb9-41df-4469-9394-a03da9d34bef
Spec:
  Group:                     custom.metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            prometheus-adapter-monitor
    Namespace:       fb-poster
    Port:            443
  Version:           v1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2023-06-20T21:04:55Z
    Message:               failing or missing response from https://10.45.1.70:6443/apis/custom.metrics.k8s.io/v1: bad status from https://10.45.1.70:6443/apis/custom.metrics.k8s.io/v1: 404
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>
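Since the discovery check above goes to the pod IP on port 6443, it is also worth verifying that the Service actually has endpoints behind it (illustrative commands only):

kubectl -n fb-poster get endpoints prometheus-adapter-monitor
kubectl -n fb-poster get pods -l app=prometheus-adapter-monitor -o wide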
I have no clue how to solve this problem. I found some related posts, but they don't cover my case and are from a few years ago.
UPDATE 27.06.2023: @DanF, following your answer I have checked my cluster's network details:
Private cluster: Disabled
Network: default
Subnet: default
Stack type: IPv4
Private control plane's endpoint subnet: default
VPC-native traffic routing: Enabled
Pod IPv4 address range (default): 10.82.128.0/17
Cluster Pod IPv4 ranges (additional): None
IPv4 service range: 10.83.0.0/22
so I filled in @DanF's command like this:
gcloud compute firewall-rules create allow-prometheus-adapter --action ALLOW --direction INGRESS --source-ranges 110.82.128.0/17 --rules tcp:6443 --network default
but I'm still receiving the same error:
Spec:
  Group:                     custom.metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            prometheus-adapter-monitor
    Namespace:       fb-poster
    Port:            443
  Version:           v1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2023-06-27T19:07:53Z
    Message:               failing or missing response from https://10.82.128.155:6443/apis/custom.metrics.k8s.io/v1: bad status from https://10.82.128.155:6443/apis/custom.metrics.k8s.io/v1: 404
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>
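A quick way to tell whether the discovery request even reaches the adapter, as opposed to being blocked on the way, is to grep the adapter's logs for the discovery path (only a rough check):

kubectl -n fb-poster logs deploy/prometheus-adapter-monitor | grep 'custom.metrics.k8s.io/v1'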
Yeah, I ran into this a while ago. Not sure if it's the same issue, but my solution was to run
I think the main cause is that you are not allowing traffic into that port, which is why the discovery check fails.
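If that is the case, a sketch of how one might check and open that port on GKE (the rule name allow-apiserver-to-adapter and CONTROL_PLANE_CIDR below are placeholders; the relevant source range is the control plane's, since the discovery check comes from the API server, not from pods):

gcloud compute firewall-rules list --filter="name~gke-"
gcloud compute firewall-rules create allow-apiserver-to-adapter \
  --network default --direction INGRESS --action ALLOW \
  --source-ranges CONTROL_PLANE_CIDR --rules tcp:6443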