I am trying to locate a machine in one of our environments that is causing a Sensu notification. The hostname and IP address listed in the notification are all messed up, because at the time the machine was being created, it had different data. So the wrong data stuck, and the machine is still alive and kicking... I mean, sending wrong data to the Sensu server from somewhere.
I have tried to track down the address of the machine. With the help of tcpdump, I found the kind of packet I am looking for in two places:
1) At every machine that is running a Sensu client, I see packets with the right payload leaving for the Sensu server machine. Sensu config files tell me that Sensu is using RabbitMQ on the same machine as the Sensu server, and the packets are heading straight for that.
2) At the Sensu server, I see all of those packets incoming from a local 10...* IP address, from all kinds of different ports. When I probed that IP address with wget, it gave me the index.html of the Sensu dashboard, so that local address seems to be the same machine - probably RabbitMQ or something, since Sensu uses that.
There are probably up to a hundred machines running a Sensu client in our environments, but there are nowhere near as many connections or source IP addresses in the incoming traffic. So I can't figure out how to find the right source machine, other than the brute-force approach of shutting down every machine one by one and seeing when a different notification pops up.
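For reference, this is roughly how the source addresses in a capture can be tallied. It is only a sketch: it assumes the default text output of `tcpdump -n` for IPv4, and it assumes RabbitMQ is listening on its default AMQP port 5672 (the actual port is not confirmed here). The function name `count_sources` and the sample addresses are made up for illustration.

```python
import re
from collections import Counter

# Matches the "IP src.port > dst.port:" part of a default `tcpdump -n` IPv4 line.
_TCPDUMP_IPV4 = re.compile(
    r"\bIP (\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > (\d{1,3}(?:\.\d{1,3}){3})\.(\d+):"
)


def count_sources(lines, dst_port):
    """Count packets per source IP among the lines addressed to dst_port."""
    counts = Counter()
    for line in lines:
        match = _TCPDUMP_IPV4.search(line)
        # Group 4 is the destination port; skip traffic for other services.
        if match and int(match.group(4)) == dst_port:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; real input could come from `tcpdump -n` piped to a file.
sample = [
    "12:00:00.000001 IP 10.0.0.5.51234 > 10.0.0.9.5672: Flags [P.], length 120",
    "12:00:00.000002 IP 10.0.0.5.51240 > 10.0.0.9.5672: Flags [P.], length 98",
    "12:00:00.000003 IP 10.0.0.7.40000 > 10.0.0.9.22: Flags [S], length 0",
]
for ip, packets in count_sources(sample, 5672).most_common():
    print(ip, packets)
```

If the tally shows only a handful of source addresses for roughly a hundred clients, something between the clients and the server is opening the connections on their behalf.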
Extra information: our machines are all in AWS, and are provisioned by Puppet after creation. Sensu is baked into the base AMI so that we get alerted right away if Puppet fails. Except that in this case the machine didn't even know who it was at the time Puppet failed.
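One alternative to shutting machines down one by one would be to run a small check on every host that compares what the Sensu client was configured with against what the host actually is. The sketch below assumes the client definition is JSON with a top-level "client" object holding "name" and "address"; the file location, the helper name `identity_mismatches`, and the sample values are all assumptions for illustration, not taken from our setup.

```python
import json


def identity_mismatches(config_text, actual_hostname, actual_addresses):
    """Return a list of differences between a Sensu client config and the host.

    config_text      -- contents of the client definition file (JSON)
    actual_hostname  -- what the host reports, e.g. socket.gethostname()
    actual_addresses -- addresses the host really has
    """
    client = json.loads(config_text).get("client", {})
    problems = []

    name = client.get("name")
    if name != actual_hostname:
        problems.append(f"name: configured {name!r}, actual {actual_hostname!r}")

    address = client.get("address")
    if address not in actual_addresses:
        problems.append(
            f"address: configured {address!r}, actual {sorted(actual_addresses)!r}"
        )
    return problems


# Hypothetical example: a config baked in before the host got its final identity.
stale_config = '{"client": {"name": "ip-10-0-0-1", "address": "10.0.0.1"}}'
for problem in identity_mismatches(stale_config, "web-01", ["10.0.0.22"]):
    print(problem)
```

Run across the fleet with whatever remote execution is available, a host that reports any mismatch is a candidate for the one sending the stale data. Whether the configured name is a short hostname or an FQDN depends on how the client was set up, so the comparison may need adjusting.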
EDIT: also, now that I think about it, it may be important that the Sensu server is sitting behind an Elastic Load Balancer, which is behind a Route 53 entry, which is where all the Sensu clients are sending stuff.
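The ELB turned out to be the trouble. It opens its own connections to the backend, so the Sensu server only ever saw the load balancer's internal addresses instead of the clients'. As soon as I pointed Route 53 directly at the Sensu server and (because of caching issues) took the Sensu server out of the ELB, all the incoming connections showed their correct source IP addresses. It wasn't a Sensu problem after all.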