I have a server running nginx with a large number of configurations for different web resources. Each configuration has its own redirection rules and upstreams.
From time to time, errors like the following appear in the logs (each domain has its own log; here are three examples for different domains):
2023/11/17 16:00:27 [error] 90304#90304: *16977866956 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.1, server: my1.site.com, request: "GET /papi/1.0/40-a46a-bdd7e4749 HTTP/1.1", upstream: "http://10.0.0.2:30415/60-d74f", host: "my1.site.com"
2023/11/17 16:00:27 [error] 90305#90305: *16977868169 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.58.1, server: my2.site.com, request: "PUT /papi/1.0/b9ad-4c28 HTTP/1.1", upstream: "http://10.0.0.3:30415/papi/1.0/c28-b913-02aebafeea", host: "my2.site.com"
2023/11/17 16:00:28 [error] 90301#90301: *16977870167 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.0.8, server: my3.site.com, request: "GET /papi/1.0/d-6329-4c78-ba08 HTTP/1.1", upstream: "http://10.0.0.92:30415/912d-6329-", host: "my3.site.com"
I need Zabbix to receive information about whether there are errors in the logs. As an example, I'm counting the number of matches in the last minute:
awk '($0 >= from)' from="$(LC_ALL=C date +"%Y/%m/%d %H:%M" -d -1minute)" /var/log/nginx/* | grep "Connection timed out" | wc -l
I send the number of errors found to Zabbix, and if it exceeds my trigger threshold, the trigger fires.
But Nginx serves many different domains, and I also have several such environments with other domains. Because of this, the total number of errors is not particularly informative, since it does not show which specific domain the errors occur for.
Perhaps there is a way, based on the log lines above, to get not only the number of errors over a certain period but also to associate them with the domain, and to pass all of this to Zabbix.
Ideally, I would use Discovery Rules in a Zabbix template for this. That is, based on the domains found in the logs and the number of errors for each, Zabbix would automatically create an item named after the domain and a trigger for it. Later errors would be counted against the existing item for that domain, and a new item would be created only when needed, for example when a new configuration for a new domain has been added to Nginx.
I still can’t figure out how to implement it correctly. Maybe someone has a simple solution?
You can create a discovery rule to find all domains, something like:
sed -n 's/.*host: "\([^"]*\)".*/\1/p' /var/log/nginx/* | sort -u
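Zabbix low-level discovery expects JSON rather than a plain list, so the domain list would need to be wrapped before it is returned. A minimal sketch, assuming a Zabbix version that accepts a bare JSON array (4.2 or later, as far as I know; older versions want it wrapped in a "data" object). The function name discover_domains and the macro name {#DOMAIN} are made up for illustration, and no JSON escaping is done, so it assumes host values contain no quotes or backslashes:

```shell
#!/bin/sh
# Hypothetical helper: print the distinct host: "..." values found in the
# given nginx error logs as a JSON array for a Zabbix discovery rule.
# {#DOMAIN} is an assumed LLD macro name, not something from the question.
discover_domains() {
    # Extract the host value from every line that has one, then deduplicate.
    sed -n 's/.*host: "\([^"]*\)".*/\1/p' "$@" |
        sort -u |
        awk 'BEGIN { printf "[" }
             { printf "%s{\"{#DOMAIN}\":\"%s\"}", (NR > 1 ? "," : ""), $0 }
             END { print "]" }'
}

# Run only when log files are passed as arguments,
# e.g. ./discover.sh /var/log/nginx/*
if [ "$#" -gt 0 ]; then
    discover_domains "$@"
fi
```

For the three log lines in the question this would print one object per domain; with no matching lines it prints an empty array.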
Then create an item prototype that passes the domain as a parameter ($1) to the script you already have. Resulting command example:
awk '($0 >= from)' from="$(LC_ALL=C date +"%Y/%m/%d %H:%M" -d -1minute)" /var/log/nginx/* | grep "Connection timed out" | grep -F "$1" | wc -l
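The same per-domain count could also be done in a single awk pass, which avoids the chain of greps and matches the domain as a fixed string inside the host: "..." field rather than anywhere on the line. This is only a sketch: the function name count_timeouts and its argument order are my own choices, and it relies on GNU date for -d -1minute, as the original command does:

```shell
#!/bin/sh
# Hypothetical helper: count "Connection timed out" lines for one domain
# whose timestamp is at or after a "YYYY/MM/DD HH:MM" prefix.
# Usage: count_timeouts DOMAIN FROM LOGFILE...
count_timeouts() {
    domain=$1
    from=$2
    shift 2
    # index() does a plain substring match, so dots in the domain are literal.
    awk -v from="$from" -v needle="host: \"$domain\"" '
        $0 >= from && index($0, "Connection timed out") && index($0, needle) { n++ }
        END { print n + 0 }
    ' "$@"
}

# Run only when called with a domain and at least one log file,
# e.g. ./count.sh my1.site.com /var/log/nginx/*
if [ "$#" -ge 2 ]; then
    d=$1
    shift
    count_timeouts "$d" "$(LC_ALL=C date +'%Y/%m/%d %H:%M' -d -1minute)" "$@"
fi
```

Matching on host: "domain" rather than on the bare domain keeps my1.site.com from also counting lines for a domain such as dev.my1.site.com.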