Using: Telegraf v1.0.1
Telegraf procstat plugin's documentation: https://github.com/influxdata/telegraf/tree/master/plugins/inputs/procstat
My custom config file, /etc/telegraf/telegraf.d/my_custom_process_service-telegraf.conf, contains:
[[inputs.procstat]]
  exe = "."
  prefix = "service_process"

[[inputs.procstat]]
  pid_file = "/var/run/jenkins/jenkins.pid"
  prefix = "service_process"
The above configuration is syntactically valid and works. It gives me metrics whose names start with procstat.service.process.x.x (if you are converting _ to a . character) or simply procstat.service_process.x.x.
There are two ways to pick up processes here:
- exe = "." catches every process running on the machine (it does a pgrep "." operation) and reports each one with a process_name=<process> value.
- pid_file = /var/run/jenkins/jenkins.pid is for processes that run behind Java or other wrappers. NOTE: this only works if the user running the telegraf service has READ permission on the pid file.
If you give pid_file = /var/run/jenkins/jenkins.pid, Jenkins is running under user jenkins, and the /var/run/jenkins folder doesn't have at least "r-x" access plus read "r" access on the pid file itself, then Telegraf will throw a "permission denied" error:
2017-01-10T18:13:30Z E! Error: procstat getting process, exe: [] pidfile: [/var/run/jenkins/jenkins.pid] pattern: [] user: [] Failed to read pidfile '/var/run/jenkins/jenkins.pid'. Error: 'open /var/run/jenkins/jenkins.pid: permission denied'
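To narrow down which part of the path is blocking the read, a small check like the one below can help. This is only a sketch: the function name diagnose_pid_file is made up for illustration, it is not part of Telegraf, and it simply tests the directory and the pid file with the permissions of whoever runs the script.

```python
import os


def diagnose_pid_file(pid_file):
    """Explain whether the current user can read pid_file (hypothetical helper)."""
    directory = os.path.dirname(pid_file) or "."
    if not os.path.isdir(directory):
        return "directory missing: " + directory
    # Opening a file requires search (x) permission on its directory.
    if not os.access(directory, os.X_OK):
        return "no search (x) permission on " + directory
    if not os.path.exists(pid_file):
        return "pid file missing: " + pid_file
    # Reading the pid requires read (r) permission on the file itself.
    if not os.access(pid_file, os.R_OK):
        return "no read (r) permission on " + pid_file
    with open(pid_file) as handle:
        return "ok, pid " + handle.read().strip()


if __name__ == "__main__":
    # Path taken from the config above.
    print(diagnose_pid_file("/var/run/jenkins/jenkins.pid"))
```

The result is only meaningful when the script runs as the same user as the telegraf service (for example via sudo -u, assuming that user is named telegraf on your hosts). Run as root, it will report "ok" regardless of the file modes.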
Question:
Is it possible to run Telegraf in sudo mode? That is, suppose I don't have r-x/r access to read a process's PID file, and there are lots of such processes (running behind Java or some wrapper, so exe = xxxx won't work for them). In that case I have to use the pid_file = ... method. How can I get Telegraf working with pid_file so that I get process_name as jenkins, nexus, etc.?
PS: Doing chmod -R 775 (or 755) /var/run on every host may not be feasible.
If I give 755 permission to the /var/run/jenkins folder and 644 to the jenkins.pid file, the permission error goes away. After this I tried to use the metric procstat.service.process.cpu.usage against the jenkins process (i.e. process_name="jenkins"), but it does not find jenkins as its value. Did I miss anything?
Added the following config in /etc/telegraf/telegraf.d/someFile.conf and fixed the permission issue using
Ansible's file module: http://docs.ansible.com/ansible/file_module.html
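For reference, the permission change itself (755 on the folder, 644 on the pid file, as described in the question) can be sketched in plain Python. This is not the Ansible task used above, just an illustration of the same idea limited to one directory and one file instead of a recursive chmod on /var/run. The function name open_pid_file_for_reading is hypothetical, and the script has to run as a user allowed to change those modes.

```python
import os
import stat


def open_pid_file_for_reading(pid_file):
    """Add r-x for group/other on the folder and r on the pid file (hypothetical helper)."""
    directory = os.path.dirname(pid_file)

    # Keep the existing bits and add r-x for group and other (700 becomes 755).
    dir_mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, dir_mode | 0o055)

    # Keep the existing bits and add r for group and other (600 becomes 644).
    file_mode = stat.S_IMODE(os.stat(pid_file).st_mode)
    os.chmod(pid_file, file_mode | 0o044)


if __name__ == "__main__":
    # Path taken from the question.
    open_pid_file_for_reading("/var/run/jenkins/jenkins.pid")
```

One caveat: if the service recreates its pid file or folder on restart, the modes may revert, which is one reason to keep the fix in configuration management rather than running it once by hand.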