There may be a way to solve your problem.
Just look here:
sudo nvidia-smi
Mon Nov 14 16:14:48 2016
+------------------------------------------------------+
| NVIDIA-SMI 358.16     Driver Version: 358.16         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  Off  | 0000:01:00.0     Off |                  N/A |
| 35%   76C    P2   111W / 250W |    475MiB / 12287MiB |     60%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     12235    C   /home/tteikhua/torch/install/bin/luajit        215MiB |
|    0     27771    C   python                                         233MiB |
+-----------------------------------------------------------------------------+
Using the "nvidia-smi" command, you can see that a program written to use the GPU in Torch is also sharing the GPU memory (just total up the individual GPU memory and you can see that it is less than the total 12G memory which Titan X have) with the python (which is running Tensorflow).
What is the underlying mechanism that makes this sharing possible? Both Torch and Tensorflow are built on CUDA, and each process opens the same GPU device node, so access to the GPU itself is shared:
ls -al /proc/12235/fd|grep nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 12 07:54 10 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 12 07:54 11 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 12 07:54 12 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 12 07:54 4 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 12 07:54 5 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 12 07:54 6 -> /dev/nvidia0
ls -al /proc/27771/fd|grep nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 14 15:51 10 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 14 15:51 11 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 14 15:51 15 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 14 15:51 16 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 14 15:51 17 -> /dev/nvidia0
lrwx------ 1 tteikhua tteikhua 64 Nov 14 15:51 9 -> /dev/nvidia0
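You can reproduce this with any CUDA program, not just Torch or Tensorflow. Here is a minimal sketch (the file name minimal_ctx.cu and the 60-second sleep are arbitrary choices of mine): it forces the CUDA runtime to create a context on the GPU and then sleeps, so while it runs you can list /proc/<pid>/fd and see the same /dev/nvidia0 descriptors.

// minimal_ctx.cu - create a CUDA context and keep it alive for inspection
// build with: nvcc minimal_ctx.cu -o minimal_ctx
#include <cstdio>
#include <unistd.h>
#include <cuda_runtime.h>

int main() {
    // cudaFree(0) is a common idiom to force context creation on the
    // current device without allocating anything.
    cudaFree(0);
    printf("CUDA context created, pid = %d\n", (int)getpid());
    printf("try: ls -al /proc/%d/fd | grep nvidia\n", (int)getpid());
    sleep(60);  // keep the process (and its /dev/nvidia0 fds) alive
    return 0;
}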
So how is this achieved? Have a look at the picture here:
http://cuda-programming.blogspot.sg/2013/01/shared-memory-and-synchronization-in.html

and this:
https://www.bu.edu/pasi/files/2011/07/Lecture31.pdf
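For illustration, one mechanism in that family is mapped, page-locked host memory; the sketch below (the file name mapped_host.cu, the kernel name, and the buffer size are arbitrary choices of mine) lets the CPU and the GPU read and write the very same buffer:

// mapped_host.cu - one process letting CPU and GPU touch the same buffer
// build with: nvcc mapped_host.cu -o mapped_host
#include <cstdio>
#include <cuda_runtime.h>

__global__ void add_one(int *p, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) p[i] += 1;
}

int main() {
    const int n = 1024;
    int *h_buf = nullptr;

    // Allow mapping of page-locked host memory into the GPU address space,
    // then allocate a buffer that both the CPU and the GPU can access.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc((void **)&h_buf, n * sizeof(int), cudaHostAllocMapped);
    for (int i = 0; i < n; ++i) h_buf[i] = i;

    // Get the device-side pointer to the same physical memory.
    int *d_buf = nullptr;
    cudaHostGetDevicePointer((void **)&d_buf, h_buf, 0);

    add_one<<<(n + 255) / 256, 256>>>(d_buf, n);
    cudaDeviceSynchronize();

    printf("h_buf[0] = %d (incremented by the GPU)\n", h_buf[0]);
    cudaFreeHost(h_buf);
    return 0;
}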
That, however, is sharing between the GPU and the CPU within a single process. What you are asking about is different: two separate processes using the same GPU's memory at the same time. That is also possible, as shown below.
Modifying simpleMultiCopy from the CUDA samples and launching it several times in parallel gives this:
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     12235    C   /home/tteikhua/torch/install/bin/luajit        215MiB |
|    0     27771    C   python                                         233MiB |
|    0     31014    C   ./simpleMultiCopy                              238MiB |
|    0     31021    C   ./simpleMultiCopy                              238MiB |
|    0     31024    C   ./simpleMultiCopy                              238MiB |
+-----------------------------------------------------------------------------+
You can see that running multiple copies of the same program results in concurrent sharing of the GPU's memory: each process gets its own allocation, and the per-process figures add up to the total memory in use on the GPU.
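If you want to try this without touching the CUDA sample, a few lines of your own are enough. The sketch below (the file name hold_mem.cu and the 256 MiB size are my own arbitrary choices) just allocates a buffer on the GPU and holds it for a minute:

// hold_mem.cu - each instance allocates 256 MiB on the GPU and holds it
// build with: nvcc hold_mem.cu -o hold_mem
#include <cstdio>
#include <unistd.h>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 256UL * 1024 * 1024;   // 256 MiB per process
    void *d_buf = nullptr;

    cudaError_t err = cudaMalloc(&d_buf, bytes);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("pid %d holding %zu MiB on the GPU\n", (int)getpid(), bytes >> 20);
    sleep(60);      // keep the allocation alive so nvidia-smi can see it
    cudaFree(d_buf);
    return 0;
}

Launch it a few times in the background (./hold_mem & ./hold_mem & ./hold_mem &) and run nvidia-smi while they sleep; you should see one row per process, with the per-process usage adding up just as above.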
For Chainer, I did a "git clone https://github.com/pfnet/chainer", then in the examples/mnist directory ran "python train_mnist.py --gpu=0" twice, and got this:
|   0  GeForce GTX 850M    Off  | 0000:01:00.0     Off |                  N/A |
| N/A   64C    P0    N/A /  N/A |    322MiB /  4043MiB |     81%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     17406    C   python                                         160MiB |
|    0     17414    C   python                                         160MiB |
+-----------------------------------------------------------------------------+
which means the two separate processes are again sharing the GPU's memory, each with its own allocation.