High CPU usage by sssd_nss during heavy disk IO


I'm on Oracle Enterprise Linux 7u2 where I perform frequent, heavy maven builds which generate a large number of jars/wars/ears. What I've noticed recently (after some of the meltdown / spectre patches) is very heavy CPU utilization by this process:

/usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files

When my server is idle? No problems. But during the heavy disk IO portions of my maven builds, the maven java process and sssd_nss fight over CPU, each taking about 50% of the total. (For reference, I have a 4-core Xeon server.)

I don't really know what this process is (except that it might deal with LDAP?) or why it would care about java file copying and zipping. (This is all on local / non-NFS disk.)

Answer by Eric M. Johnson:

sssd_nss is the SSSD daemon that answers user/group information requests on behalf of back-end services such as LDAP. It doesn't actually do the lookup itself: it first checks a local disk cache and, if the entry isn't there, passes the request on to the service that does the lookup.

This makes me think that the heavy I/O portions of your build are doing a lot of user and group operations (e.g. looking up the username for a UID, or looking up the groups for a UID).
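To get a feel for what such a lookup costs on your machine, here is a minimal Python sketch that resolves UIDs to names in a loop and times it. The function name time_uid_lookups and the repeat count are made up for illustration; pwd.getpwuid is the standard-library call that goes through the system's name service configuration, so on a host where SSSD is configured the requests may end up at sssd_nss (or be answered from a client-side cache).

```python
import os
import pwd
import time


def time_uid_lookups(uids, repeat=1000):
    """Resolve every UID in `uids` to a passwd entry `repeat` times.

    Returns (number_of_lookups, elapsed_seconds).
    """
    lookups = 0
    start = time.perf_counter()
    for _ in range(repeat):
        for uid in uids:
            try:
                # Goes through the name service switch; with SSSD
                # configured, this is the kind of request sssd_nss serves.
                pwd.getpwuid(uid)
            except KeyError:
                # Unknown UID: the lookup was still attempted.
                pass
            lookups += 1
    return lookups, time.perf_counter() - start


if __name__ == "__main__":
    count, seconds = time_uid_lookups([os.getuid()])
    print(f"{count} lookups in {seconds:.3f}s")
```

If sssd_nss CPU jumps while this runs, that supports the idea that the build is generating a stream of similar lookups.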

You should also check whether the high sssd_nss CPU usage coincides with high IOWAIT. That would indicate that you are indeed doing a lot of user/group queries and that they are somehow being held up by disk I/O. You can use top to see the overall system IOWAIT (look for the wa value in the CPU summary line), and iotop to get per-process I/O metrics.
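If you would rather log the system-wide IOWAIT during a build than watch top, a rough sketch is to sample the first line of /proc/stat twice and compare the counters. The helper names below are my own; the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for /proc/stat on Linux.

```python
import time


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line of /proc/stat into a list of ints."""
    fields = line.split()
    # fields[0] is the label "cpu"; the counters follow in a fixed order:
    # user nice system idle iowait irq softirq steal guest guest_nice
    return [int(value) for value in fields[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Guest time is already counted inside user/nice, so sum only the first 8.
    total = sum(deltas[:8])
    return 100.0 * deltas[4] / total if total else 0.0


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate CPU line (the first line of /proc/stat)."""
    with open(path) as handle:
        return handle.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)  # sampling interval, chosen arbitrarily
    print(f"iowait: {iowait_percent(first, read_cpu_line()):.1f}%")
```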

If it is primarily IOWAIT, you may need to add I/O capacity or separate your build volumes from your system volumes. However, I doubt that this is the root cause of your issue.

You mention this started happening after the Meltdown/Spectre patches. This may indicate that the build process is triggering a lot of system calls in sssd_nss, which are now slower with those patches. You may want to look into your build process and see whether it runs unnecessary user/group-related commands. You can inspect the system calls being made with strace -p $pid_of_sssd_nss, or use sysdig for even fancier analysis. If that service is doing a lot of system calls, look at which calls it is making, figure out where your build process is triggering them, and then try to minimize them.
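Before reaching for strace, a quick sanity check is to see how much of the sssd_nss CPU time is spent in the kernel rather than in user space. The sketch below reads the utime and stime fields (fields 14 and 15, in clock ticks, cumulative since the process started) from /proc/<pid>/stat. The function names are hypothetical, and a high system share is only a hint that system calls dominate, not proof.

```python
import sys


def cpu_ticks(stat_text):
    """Return (utime, stime) in clock ticks from the text of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting on whitespace.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 14 is rest[11] and field 15 is rest[12].
    return int(rest[11]), int(rest[12])


def system_share(stat_text):
    """Fraction of the process's CPU time that was spent in kernel mode."""
    utime, stime = cpu_ticks(stat_text)
    total = utime + stime
    return stime / total if total else 0.0


if __name__ == "__main__":
    pid = sys.argv[1]  # pass the PID of sssd_nss on the command line
    with open(f"/proc/{pid}/stat") as handle:
        text = handle.read()
    print(f"kernel share of CPU time: {system_share(text):.0%}")
```

Because the counters are cumulative, it is more telling to take one reading before a build and one after and compare the differences.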