Use shared GPU memory with TensorFlow?


So I installed the GPU version of TensorFlow on a Windows 10 machine with a GeForce GTX 980 graphics card on it.

Admittedly, I know very little about graphics cards, but according to dxdiag it does have:

4060 MB of dedicated memory (VRAM), and

8163 MB of shared memory,

for a total of about 12224 MB.

What I noticed, though, is that this "shared" memory seems to be pretty much useless. When I start training a model, the VRAM will fill up and if the memory requirement exceeds these 4GB, TensorFlow will crash with a "resource exhausted" error message.

I CAN, of course, prevent reaching that point by choosing the batch size suitably low, but I do wonder if there's a way to make use of these "extra" 8GB of RAM, or if that's it and TensorFlow requires the memory to be dedicated.
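As a rough sketch of why batch size is the main lever here (the per-sample activation count below is invented purely for illustration, not measured from any real model):

```python
# Back-of-the-envelope estimate of activation memory vs. batch size.
# activations_per_sample is a made-up figure; real models differ widely.
def activation_memory_mb(batch_size, activations_per_sample=2_000_000,
                         bytes_per_value=4):
    """Activation memory scales linearly with batch size (float32 = 4 bytes)."""
    return batch_size * activations_per_sample * bytes_per_value / 1024**2

# With ~2M float32 activations per sample, batch size 64 already needs
# ~488 MB for activations alone, before weights, gradients and optimizer
# state, so halving the batch size roughly halves this term.
```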


There are 5 answers

Alex I (BEST ANSWER)

Shared memory is an area of the main system RAM reserved for graphics. References:

https://en.wikipedia.org/wiki/Shared_graphics_memory

https://www.makeuseof.com/tag/can-shared-graphics-finally-compete-with-a-dedicated-graphics-card/

https://youtube.com/watch?v=E5WyJY1zwcQ

This type of memory is what integrated graphics (e.g. the Intel HD series) typically use.

This is not on your NVIDIA GPU, and CUDA can't use it. Tensorflow can't use it when running on GPU because CUDA can't use it, and also when running on CPU because it's reserved for graphics.

Even if CUDA could somehow use it, it wouldn't be useful, because system RAM bandwidth is around 10x lower than GPU memory bandwidth, and the data would still have to travel to and from the GPU over the slow (and high-latency) PCIe bus.

Bandwidth numbers for reference:

GeForce GTX 980: 224 GB/s

DDR4 on a desktop motherboard: approx. 25 GB/s

PCIe 16x: 16 GB/s

This doesn't take into account latency. In practice, running a GPU compute task on data which is too big to fit in GPU memory and has to be transferred over PCIe every time it is accessed is so slow for most types of compute that doing the same calculation on CPU would be much faster.
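The bandwidth figures above can be turned into a quick timing sketch (latency ignored, so real numbers are worse; the 8 GB working set is hypothetical):

```python
# Rough transfer-time comparison using the bandwidth figures quoted above.
def seconds_to_move(gigabytes, bandwidth_gb_s):
    """Time to move a payload at a given sustained bandwidth (latency ignored)."""
    return gigabytes / bandwidth_gb_s

working_set_gb = 8.0  # hypothetical data that doesn't fit in 4 GB of VRAM

vram_time = seconds_to_move(working_set_gb, 224)  # read from GTX 980 VRAM
pcie_time = seconds_to_move(working_set_gb, 16)   # re-fetch over PCIe 16x

# Every re-fetch over PCIe costs roughly 14x more than reading from VRAM
# (224 / 16), which is why spilling to system RAM usually loses to just
# running the computation on the CPU.
```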

Why do you see that kind of memory being allocated when you have a NVIDIA card in your machine? Good question. I can think of a couple of possibilities:

(a) You have both NVIDIA and Intel graphics drivers active (e.g. when driving different displays from both). Uninstall the Intel drivers and/or disable Intel HD graphics in the BIOS, and the shared memory will disappear.

(b) NVIDIA is using it, e.g. as extra texture memory. It could also not be real memory, but just a memory-mapped area that corresponds to GPU memory. Look in the advanced settings of the NVIDIA driver for a setting that controls this.

In any case, no, there isn't anything that Tensorflow can use.

Lit Mage

The accepted answer is based on old information and an ambiguity caused by a new graphics driver infrastructure in Windows 10:

Windows 10 Task Manager GPU Tab

This "shared memory" has nothing at all to do with the BIOS or a dedicated/integrated GPU. (That memory will show up as "dedicated GPU memory" on this page.) It is purely a driver convention, hardcoded into Windows as 50% of your system RAM, and the only way to increase it is to add more RAM. It can be used by DirectML, and thus, by proxy, by TensorFlow and PyTorch.
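The 50%-of-RAM convention can be sketched in one line (the function name is mine, not a Windows API):

```python
# Windows reports "shared GPU memory" as a fixed half of installed system
# RAM, regardless of which GPU is present. Hypothetical helper:
def reported_shared_gpu_memory_mb(installed_ram_mb: int) -> int:
    """Model of the Task Manager figure: 50% of system RAM, in MB."""
    return installed_ram_mb // 2

# A machine with 16 GB (16384 MB) of RAM reports ~8192 MB of shared GPU
# memory, which lines up with the ~8163 MB the question saw (the small
# difference being regions reserved by the system).
```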

So, to answer your question: you can make use of it if you use DirectML. It will be a bit slower, but at least it will work.

(I am not allowed to embed images yet)

Doedigo

Well, that's not entirely true. You're right about lowering the batch size, but it depends on what model type you are training. If you train XSeg, it won't use the shared memory, but when you get into SAEHD training you can place your model optimizers on the CPU (instead of the GPU), along with your learning dropout rate, which lets you use that shared memory for those optimizations while saving the dedicated GPU memory for your model resolution and batch size.

So it may seem like that shared memory is useless, but play with your settings and you'll see that, for certain settings, it's just a matter of redistributing the right tasks. You'll have increased iteration times, but you'll be utilizing that shared memory one way or another. I had to experiment a lot to find what worked with my GPU, and there were some surprising revelations. This is an old post, but I bet you've figured it out by now, hopefully.

juro

I had the same problem. My VRAM is 6 GB, but only 4 GB was detected. I read some code about TensorFlow limiting GPU memory, tried it, and it works:

import tensorflow as tf

# Setting GPU memory limit
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    # Restrict TensorFlow to allocate only 6 GB of memory on the first GPU
    try:
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=6144)])
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

Note: if you have 10 GB of VRAM, try a memory limit of 10240.
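The `memory_limit` values above follow the usual binary-megabyte convention; a tiny helper (my own name, not a TensorFlow API) makes the conversion explicit:

```python
# Convert a card's VRAM size in GB to the MiB figure used for memory_limit.
# Hypothetical helper, shown only to document the 6 GB -> 6144 arithmetic.
def vram_gb_to_memory_limit(vram_gb: float) -> int:
    """1 GB = 1024 MiB, so a 6 GB card maps to 6144 and 10 GB to 10240."""
    return int(vram_gb * 1024)
```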

Ferry

CUDA can make use of system RAM as well. In CUDA, memory that is automatically migrated between VRAM and system RAM is called unified memory. However, TensorFlow does not allow it, for performance reasons.