Why do we use the nomkl version of the python modules?

heather999 · September 27, 2017, 3:01pm

Hi,
I apologize if this has been answered elsewhere. Can someone remind me why we use the nomkl version of the Anaconda Python modules (numpy, scipy, etc.)? We’ve been investigating performance issues at NERSC and inevitably the question is going to come up. Right off the top of my head, I don’t have a ready answer.
http://www.nersc.gov/users/data-analytics/data-analytics-2/python/python-on-cori-knl

Thank you!
Heather

timj · September 27, 2017, 3:11pm

Because there is a problem with a symbol clash in one of our packages that is still to be fixed: DM-8146.

heather999 · September 27, 2017, 3:21pm

That’s actually good news - since it looks like we will ultimately move to the mkl versions of those python modules.
Is there any sense of whether this is being actively looked at? I see the final comment is from last Jan.
Thanks,
Heather

KSK · September 27, 2017, 4:43pm

It doesn’t look like it is being currently worked on. It doesn’t seem like it would be particularly hard to fix this, but I don’t understand the details. Is this going to be a blocker for you going forward and do you think there would be effort in NERSC/DESC to look for a potential fix?

heather999 · September 27, 2017, 5:08pm

The situation is murky. We are seeing a significant (but not unexpected) performance hit at NERSC as we try out KNL. It may be that Python just runs that much more slowly on KNL, but the MKL may offer an improvement. It would be very entertaining to try out a representative bit of the stack with the MKL and see what improvement we get. It would be great to be able to do that using galsim, but meas_base seems like a fairly low-level package, and I’m guessing galsim depends on it (I need to check that).
If so, could we formulate a suitable test and try some runs with and without the MKL libs? I suspect that if we find a significant improvement, DESC would be interested in pursuing a fix. If we really can’t provide any evidence, it’s still worth asking…
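As a starting point for such a test, here is a minimal sketch of a micro-benchmark that could be run once in a nomkl environment and once in an MKL environment. It is not taken from the stack: the function names (`time_call`, `run_benchmark`), the matrix size, and the choice of operations are all assumptions, picked because matrix multiplication, Cholesky, and SVD are typically handed off to whatever BLAS/LAPACK numpy was built against. A real comparison would still need a representative galsim or imSim run.

```python
import time

import numpy as np


def time_call(func, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmark(n=500, repeats=3, seed=0):
    """Time a few dense linear-algebra operations on n x n matrices.

    The operations and sizes are illustrative assumptions, not a
    stand-in for real stack workloads.
    """
    rng = np.random.RandomState(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    # Build a symmetric positive-definite matrix so Cholesky is valid.
    spd = a @ a.T + n * np.eye(n)
    return {
        "matmul": time_call(lambda: a @ b, repeats),
        "cholesky": time_call(lambda: np.linalg.cholesky(spd), repeats),
        "svd": time_call(lambda: np.linalg.svd(a), repeats),
    }


if __name__ == "__main__":
    # Print the build configuration so each result is labelled
    # with the BLAS/LAPACK it was measured against.
    np.show_config()
    for name, seconds in run_benchmark().items():
        print(f"{name:10s} {seconds:.4f} s")
```

Running the same script in both environments on the same node and comparing the printed timings would give a first hint of whether the MKL matters for dense linear algebra on KNL.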

KSK · September 27, 2017, 5:31pm

Galsim does not depend on meas_base but the imsim tool that uses the catsim packages does. If doing a test with galsim is sufficient, I would be very interested in the results.

heather999 · September 27, 2017, 6:23pm

Perhaps a question for @timj: if I were to attempt a fresh newinstall of galsim that uses the MKL, I could set up an external Python with the MKL versions of numpy, etc. that are close to what is listed in the lsstsw package list:

github.com

lsst/lsstsw/blob/master/etc/conda3_packages-linux-64.txt

# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
asn1crypto=0.22.0=py36_0
astropy=2.0.1=np113py36_0
bottleneck=1.2.1=np113py36_0
certifi=2016.2.28=py36_0
cffi=1.10.0=py36_0
cryptography=1.8.1=py36_0
cycler=0.10.0=py36_0
cython=0.26=py36_0
dbus=1.10.20=0
expat=2.1.0=0
fontconfig=2.12.1=3
freetype=2.5.5=2
future=0.16.0=py36_1
h5py=2.7.0
glib=2.50.2=1
gst-plugins-base=1.8.0=0
gstreamer=1.8.0=0

(This file has been truncated.)

Would that be enough, or would newinstall attempt to pull in the nomkl versions? I think that might work, since I may have accidentally done that in the past.
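One way to compare an external environment against a package list like the one above is to parse the `name=version=build` lines and look for the `nomkl` or `mkl` packages. The sketch below does that; the helper names (`parse_spec_line`, `blas_flavour`) are hypothetical, and the `nomkl` and `numpy` lines in the usage example are made up for illustration, since the truncated listing above does not show them.

```python
def parse_spec_line(line):
    """Split a 'name=version=build' line into a (name, version, build) tuple.

    Returns None for blank lines and '#' comments. Fields that are
    missing from the line (e.g. 'h5py=2.7.0' has no build) become None.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split("=")
    parts += [None] * (3 - len(parts))
    return tuple(parts[:3])


def blas_flavour(lines):
    """Guess 'nomkl', 'mkl', or 'unknown' from the package names listed."""
    names = set()
    for line in lines:
        spec = parse_spec_line(line)
        if spec is not None:
            names.add(spec[0])
    if "nomkl" in names:
        return "nomkl"
    if "mkl" in names:
        return "mkl"
    return "unknown"


if __name__ == "__main__":
    # The first three lines come from the listing above; the last two
    # are hypothetical examples of what a nomkl environment might list.
    example = [
        "# platform: linux-64",
        "astropy=2.0.1=np113py36_0",
        "h5py=2.7.0",
        "nomkl=1.0=0",
        "numpy=1.13.1=py36_nomkl_0",
    ]
    print(blas_flavour(example))
```

The same function could be fed the output of `conda list --export` from the external Python to see which flavour it reports.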

Take care,
Heather

timj · September 27, 2017, 6:37pm

I thought that if you provide your own Python, newinstall.sh won’t try to force its own versions of conda packages to be installed (it shouldn’t do that). I just did a version bump (see DM python and associated packages version baseline change) of all our conda packages, so specific versions shouldn’t be an issue once you’ve got a new astropy, numpy and matplotlib.
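If you do supply your own Python, a quick sanity check is to ask numpy what it was built against. The sketch below captures the text printed by `numpy.show_config()` and looks for the string "mkl". This is only a heuristic: the function name `numpy_build_mentions_mkl` is made up here, and the output format of `show_config()` varies between numpy versions, so treat a negative result with some caution.

```python
import contextlib
import io

import numpy as np


def numpy_build_mentions_mkl():
    """Return True if numpy's printed build configuration mentions MKL.

    Heuristic only: it searches the text printed by np.show_config()
    and does not inspect the linked libraries themselves.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return "mkl" in buffer.getvalue().lower()


if __name__ == "__main__":
    print("numpy", np.__version__, "MKL mentioned:", numpy_build_mentions_mkl())
```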

heather999 · September 27, 2017, 10:12pm

One more random question that’s not really related to the MKL: it looks like we want to add compiler flags (-g --dynamic -debug inline-debug-info) to allow profiling at NERSC. Is there already an option to perform a debug build via newinstall or lsstsw?

NERSC mentions the Intel compiler often and seems to suggest we might see better performance. Thus far I’ve stuck with gcc, but has anyone tried using an Intel compiler with the stack? Can we? I see the libgcc module in Anaconda. Does that tie us to the GNU compiler?
Take care,
Heather

jbosch · September 28, 2017, 1:54pm

I know we got the stack working with an Intel compiler many years ago (before construction), but I don’t think anyone has tried since. It does build with Clang on Linux (at least it did earlier this year), so we’re certainly not tied too tightly to gcc. My best guess is that compiling with an Intel compiler would not work out-of-the-box, but it could be made to work with relatively minor changes, as long as the compiler version is new enough to support C++11.

swinbank · September 28, 2017, 11:56pm

Maybe worth adding that many of the developers and the CI system regularly build with Clang on macOS, too.

heather999 · October 26, 2017, 3:09am

Just coming back to note that I haven’t forgotten about this. I was able to get Mario’s suggested changes incorporated into meas_base and built at NERSC. The meas_base unit tests completed successfully. Then I set up an MKL installation of the stack. Thus far, running galsim and imSim has not shown any significant speedups. I’ll keep investigating when I get a chance. I do think it’s still worthwhile to consider taking Mario’s update to allow everyone the benefit of using the MKL. It would likely make it easier for users to use their own Anaconda Python installations.

Next up, I’ll try the Intel compiler.

Take care,
Heather