Htop cpu bar red, 100% kernel time

Question

1.7k views Asked by wstcegg At 17 April 2022 at 13:07

I found some similar topics, but none of them offered a helpful solution. Since I have more information to provide, I opened this question.

My PyTorch script frequently gets stuck on a training server. Htop shows that there is only one green CPU bar while other active cores are almost 100% red. According to the F1 explanation, red means kernel time.

Whenever this 100% red CPU bar occurs, the training gets stuck and GPU-util drops to 0%. The weird thing is that this only happens on two of the servers I use. It never happens on my PC (less powerful) and never happens on another powerful server.

The strace command shows that when the problem occurs, there are many calls like this:

futex(0x55bbb0e82db0, FUTEX_WAKE_PRIVATE, 1) = 0

Is there an explanation of what the problem is and how to avoid it? Is there any further information I should provide?


There is 1 answer

**wstcegg** · Accepted Answer · 2022-06-26T07:17:42+00:00

I solved the problem and found possible causes.

High CPU usage means the CPU is busy, so disk I/O is not the limiting factor.
Low GPU utilization means the GPU is not being fed data fast enough.
That leaves RAM as the most likely bottleneck in my case.

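As mentioned in the GitHub issue linked below, when multiple processes access the same Python object, each access changes the object's reference count. In fork mode the worker processes share the parent's memory pages copy-on-write, so these reference-count writes force the kernel to allocate and copy pages, which slows the whole system down.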
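To make the reference-count effect concrete, here is a small sketch (CPython only; the function name is made up for illustration). It shows that merely taking another reference to an object, without modifying it, changes the count stored in the object's header. In a forked worker, that write is what touches the shared memory page.

```python
import sys


def refcount_delta_on_alias() -> int:
    """Return how much an object's refcount changes when one more name points to it."""
    # Stands in for an object the worker inherited from the parent process.
    shared = ["sample", "payload"]
    before = sys.getrefcount(shared)
    alias = shared  # the list contents are untouched; only a new reference is created
    after = sys.getrefcount(shared)
    del alias
    return after - before


print(refcount_delta_on_alias())
```

The absolute values returned by `sys.getrefcount` vary between Python versions, which is why the sketch only looks at the difference.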

This system-level behavior cannot be detected by Python memory profilers such as Memray [https://github.com/bloomberg/memray], but it might be detected by system-level memory tools such as Valgrind [https://valgrind.org/].

https://github.com/pytorch/pytorch/issues/13246#issuecomment-905703662

The final solution is to reduce how often the forked processes access Python objects.
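One possible way to do that, sketched below with only the standard library, is to store a large collection in a couple of flat buffers instead of one Python object per item. The class name `PackedStrings` and the example file names are hypothetical, and this is not necessarily the change I made in my own script. The idea is simply that a worker reading item `i` touches the reference counts of two container objects rather than one object per record.

```python
from array import array


class PackedStrings:
    """Read-only string collection kept in two flat buffers.

    All strings live in a single bytes object, and their boundaries live in a
    single array of integers, so there is no per-item Python object whose
    reference count would be written on every access.
    """

    def __init__(self, items):
        encoded = [s.encode("utf-8") for s in items]
        self._data = b"".join(encoded)
        # offsets[i] and offsets[i + 1] are the byte boundaries of item i
        self._offsets = array("q", [0])
        for chunk in encoded:
            self._offsets.append(self._offsets[-1] + len(chunk))

    def __len__(self):
        return len(self._offsets) - 1

    def __getitem__(self, index):
        # Negative indices are deliberately not supported in this sketch.
        if not 0 <= index < len(self):
            raise IndexError(index)
        start, end = self._offsets[index], self._offsets[index + 1]
        # A fresh str is created here, private to the calling process.
        return self._data[start:end].decode("utf-8")


paths = PackedStrings(["img_0001.jpg", "img_0002.jpg", "img_0003.jpg"])
print(len(paths), paths[1])
```

A dataset class could build such a container once in the parent process and index into it from `__getitem__` in the workers. The same idea applies to numeric data kept in typed arrays instead of lists of Python numbers.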
