
I am working on a scientific cluster that was recently upgraded by the administrator, and now my code is extremely slow, whereas it used to run at a decent speed. I am using Python 3.4.

The way this kind of thing works is the following: I have to guess what the administrator may have changed and then ask him to make the appropriate changes, because if I ask him an open-ended question we will get nowhere.

So I have run my code with a profiler and found that a few routines are called many times. These routines are:

  1. built-in method array (called ~10^5 times, execution time 0.003 s)
  2. sort of numpy.ndarray (~5000 calls, 0.03 s)
  3. uniform of mtrand.RandomState (~2000 calls, 0.03 s)
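
For reference, this breakdown came from cProfile; a minimal sketch of how to obtain it (main() stands for a hypothetical entry point of my code):

    import cProfile
    import pstats

    cProfile.run("main()", "profile.out")          # profile the whole run
    stats = pstats.Stats("profile.out")
    stats.sort_stats("tottime").print_stats(10)    # ten most expensive routines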

My guess is that some of these libraries were parallelized in the previously installed version of Python, for example by being linked to MPI-parallelized or multi-threaded math kernel libraries.
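
One way to test whether a multithreaded BLAS is in play is to pin the thread count before importing numpy and compare timings (a sketch; note that OMP_NUM_THREADS is honored by OpenBLAS and MKL, while ATLAS fixes its thread count at build time):

    import os
    os.environ["OMP_NUM_THREADS"] = "1"   # must be set before numpy is imported

    import time
    import numpy as np

    a = np.random.rand(1500, 1500)
    t0 = time.time()
    np.dot(a, a)                          # BLAS-backed, sensitive to threading
    print("dot with 1 thread: %.3f s" % (time.time() - t0))

If repeating this with the variable unset is noticeably faster, a threaded BLAS is being used.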

I would like to know if my guess is correct, or if I have to consider something else, because my code itself has not changed.


The routines I have quoted here are the most relevant, because they account for 85% of the total time. In particular, array takes 55% of the total time. The performance of my code has degraded by a factor of 10. Before talking with the system administrator I would like to get confirmation that these routines do have a parallel version.


Of course I cannot test my code on both the new and the old configuration of the cluster, because the old configuration is gone. But I can see that on this cluster numpy.array takes 8 minutes, while on the other cluster that I have access to it takes 2 seconds. From top I can see that the memory used is always very low (~0.1%) while a single CPU is used at 100%.
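
For completeness, this is the kind of micro-benchmark I used to compare the two clusters (a sketch; the array size is arbitrary):

    import timeit

    # ~10^5 calls to numpy.array, matching the call count in the profile
    t = timeit.timeit("numpy.array(range(100))",
                      setup="import numpy", number=10**5)
    print("10^5 numpy.array calls: %.1f s" % t)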


 In [3]: numpy.__config__.show()
 lapack_info:
     libraries = ['lapack']
     library_dirs = ['/usr/lib64']
     language = f77
 atlas_threads_info:
     libraries = ['satlas']
     library_dirs = ['/usr/lib64/atlas']
     define_macros = [('ATLAS_WITHOUT_LAPACK', None)]
     language = c
     include_dirs = ['/usr/include']
 blas_opt_info:
     libraries = ['satlas']
     library_dirs = ['/usr/lib64/atlas']
     define_macros = [('ATLAS_INFO', '"\\"3.10.1\\""')]
     language = c
     include_dirs = ['/usr/include']
 atlas_blas_threads_info:
     libraries = ['satlas']
     library_dirs = ['/usr/lib64/atlas']
     define_macros = [('ATLAS_INFO', '"\\"3.10.1\\""')]
     language = c
     include_dirs = ['/usr/include']
 openblas_info:
   NOT AVAILABLE
 lapack_opt_info:
     libraries = ['satlas', 'lapack']
     library_dirs = ['/usr/lib64/atlas', '/usr/lib64']
     define_macros = [('ATLAS_WITHOUT_LAPACK', None)]
     language = f77
     include_dirs = ['/usr/include']
 lapack_mkl_info:
   NOT AVAILABLE
 blas_mkl_info:
   NOT AVAILABLE
 mkl_info:
   NOT AVAILABLE            

ldd /usr/lib64/python3.4/site-packages/numpy/core/_dotblas.cpython-34m.so
     linux-vdso.so.1 =>  (0x00007fff46172000)
     libsatlas.so.3 => /usr/lib64/atlas/libsatlas.so.3 (0x00007f0d941a0000)
     libpython3.4m.so.1.0 => /lib64/libpython3.4m.so.1.0 (0x00007f0d93d08000)
     libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f0d93ae8000)
     libc.so.6 => /lib64/libc.so.6 (0x00007f0d93728000)
     libgfortran.so.3 => /lib64/libgfortran.so.3 (0x00007f0d93400000)
     libm.so.6 => /lib64/libm.so.6 (0x00007f0d930f8000)
     libdl.so.2 => /lib64/libdl.so.2 (0x00007f0d92ef0000)
     libutil.so.1 => /lib64/libutil.so.1 (0x00007f0d92ce8000)
     /lib64/ld-linux-x86-64.so.2 (0x00007f0d950e0000)
     libquadmath.so.0 => /lib64/libquadmath.so.0 (0x00007f0d92aa8000)
     libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f0d92890000)

Numpy is already linked to ATLAS, and I see a link to libpthread.so (so I assume it is already multithreaded, is it?).

On the other hand, I updated the version of numpy from 1.8.2 to 1.9.2 and now the array method takes only 5 s instead of 300 s. I think this is probably the reason for my code slowing down (maybe the system administrator downgraded the numpy version? Who knows!).
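
For reference, the installed version can be checked directly from the interpreter; the output below corresponds to the updated installation:

     In [1]: import numpy
     In [2]: numpy.__version__
     Out[2]: '1.9.2'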

  • The most probable issue could be that, before the update, numpy was linked to an optimized/multithreaded BLAS library for vector and matrix operations (for instance OpenBLAS, ATLAS, MKL, etc.), while it uses a slower reference implementation after the update. Commented Apr 29, 2015 at 0:19
  • Could you rather show the total time in your profiling? It doesn't matter so much that a routine was called 10^5 times if the cumulative time spent there is low. Just to clarify, your Python code by itself is not parallel, and rather relies on a multithreaded implementation of BLAS, right? Commented Apr 29, 2015 at 0:22
  • @rth I have edited my question with answers to your comments. Commented Apr 29, 2015 at 10:43
  • Ummm... what was the previous version? I guess it was 2.x; then most probably that version of Python didn't get removed. Instead of tracking what changed in Python (although changes in Python itself shouldn't make code slower going from 2 to 3), and which libraries were linked to parallel math kernels and now are not, it might be easier just to use the Python 2 interpreter. My bet is that python is now linked to python3, but python2 should still be available. Commented Apr 29, 2015 at 10:58
  • @simona: This question (stackoverflow.com/questions/21671040/…) and its answer give good information about checking the configuration. Commented Apr 29, 2015 at 12:52

1 Answer


A parallelized BLAS only helps with a limited number of numpy/scipy functions (see these test scripts), such as the following; a timing sketch follows the list:

  • numpy.dot
  • scipy.linalg.cholesky
  • scipy.linalg.svd
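
A quick way to check whether the linked BLAS is fast (and, by watching top while it runs, whether it is multithreaded) is to time a large matrix product; a minimal sketch:

    import time
    import numpy as np

    a = np.random.rand(2000, 2000)
    t0 = time.time()
    np.dot(a, a)                   # dispatched to the linked BLAS
    print("2000x2000 dot: %.2f s" % (time.time() - t0))

An optimized BLAS typically finishes this in well under a second, while an unoptimized build can take several seconds.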

If you can run

import numpy.core._dotblas

without getting an ImportError, you have an optimized numpy.dot available.
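
For example (a sketch; the except branch indicates that numpy has fallen back to its slower internal dot):

    try:
        import numpy.core._dotblas   # present when numpy.dot is BLAS-accelerated
        print("optimized numpy.dot available")
    except ImportError:
        print("numpy.dot falls back to the slow internal implementation")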

Array creation speed should not be influenced by this, however.

Can you post your code and how you use it, or else a minimal example that reproduces the problem? How is your code run on the cluster?
