Conda Updates: Implementing the switch to conda-forge, conda compilers [RFC-679]

Tags: newinstall, conda, lsstsw, eups, miniconda

(Brian Van Klaveren) #1

New Conda Environment

This post is to notify the Science Pipelines community of the implementation of a new conda environment from RFC-679. After some final checks tonight and tomorrow with ci_hsc, we intend to roll this out tomorrow, April 30th.

The essence of this change is a new conda environment, a new compiler, and an upgrade of our third parties. Details of those changes are below.

Orchestrating a change like this isn’t easy, and while I’ve tried to account for all the details, I may have missed one. If, after the change, you see some issues, please notify me and create a ticket (please use conda as a component if possible.)

Switching channels to conda-forge

By default, we have switched to conda-forge to supply our third party Python dependencies, rather than the conda distribution (“defaults”).

conda-forge is a “community-led collection of recipes, build infrastructure and distributions for the conda package manager” — fundamentally using conda-build and its metadata to build package recipes, with a process to contribute those recipes, leveraging common CI infrastructure to build and rebuild recipes for multiple platforms to provide compatible binaries for all sorts of packages.

Using conda-forge for (most) third party libraries and compilers.

Significantly, most third party dependencies, including Python and non-Python (C++), come from conda-forge instead of eups now. Most notably, we are now on cfitsio 3.47, and we have merged some code to help with that transition a bit. See DM-24376. This isn’t perfect, but we aim to improve this experience while working within the FITS standard.

We have contributed, and continue to contribute, recipes for third parties to the conda-forge community to support our software. Our general approach going forward is to push third-party or generally useful software to conda-forge when necessary. For developing recipes to achieve this, there is extensive documentation on the conda-forge site on how to add or maintain a recipe, and there are lots of helpful people in #dm-conda on Slack.

Using the compilers provided from conda-forge

To support this change to the conda-forge channel, especially on CentOS 7, we needed to switch to the conda compilers to ensure compatibility with the provided packages, specifically those maintained by the conda-forge community. We are using the default “comp7” set of compilers, which are based on GCC 7.3.0 on Linux and clang 9.0.1 on macOS.

System Requirements, Installing and Building

We are working to reduce the differences between newinstall and lsstsw in addition to simplifying newinstall. Much of this work will occur after the change.

On most Linux distributions, you should only need git, patch, curl, and make installed to get up and running with lsstsw or newinstall.
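
A quick way to confirm those prerequisites before starting is to check that each tool is on your PATH. The sketch below is illustrative only: `missing_prereqs` is a hypothetical helper, not part of newinstall or lsstsw, and it checks only the four tools named above.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list
# that cannot be found on PATH, one name per line.
missing_prereqs() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the four tools named above; adjust the list for your platform.
missing=$(missing_prereqs git patch curl make)
if [ -n "$missing" ]; then
    echo "Missing prerequisites:" $missing
else
    echo "All prerequisites found"
fi
```

Anything it reports can be installed with your distribution's package manager before running lsstsw or newinstall.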

newinstall.sh defaults to conda compilers

By default, newinstall.sh in master assumes you want the conda-system compiler. This is a generic term denoting that conda provides the compilers. The -g flag reverts to the old behavior for compatibility.

lsstsw

Build manifests (EUPS version lists) now include the conda environment (repo and SHA1) that was activated when the build occurred. This links a published EUPS tag with the conda environment.

Additional Notes

Jenkins

Most notably, Jenkins is dropping centos6 and adding centos8 for builds. We are keeping centos7 as the baseline at present.

With this change, the packer-layercake method of building Docker containers will be deprecated in favor of Dockerfiles in the lsst-dm/docker-scipipe repo. The Dockerfiles in the base-7 and base-8 directories of that repo correspond to lsstdm/scipipe-base:7 and lsstdm/scipipe-base:8 respectively, as they currently exist on Docker Hub. docker-newinstall will also be updated to be based on the lsstdm/scipipe-base:7 image. The infra monthly jobs in jenkins-dm-jobs, which build some containers, will be modified soon after the changes occur.

Modifying a deployed conda environment

We don’t currently encourage users to modify their environment or install extra software, but it is inevitable some users may need to.

Currently, we do not change the conda environment configuration (.condarc) to add conda-forge to the channels, so if you install additional software on top of your environment, you may want to add conda-forge to your environment’s channels. You can do so from your activated environment by running the following command:

conda config --env --add channels conda-forge

This matters all the more because we do not pin the dependencies in an environment once it is installed, so running conda install may update packages from the defaults channel rather than from conda-forge. If you do modify your environment, pinning can prevent your dependencies from changing:

conda list > $CONDA_PREFIX/conda-meta/pinned
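
If you would rather pin only package names and versions, the output of conda list can be reduced to one spec per line first. This is a minimal sketch, assuming the pinned file accepts specs of the form `name ==version` (the form used in the conda documentation's pinning examples); `to_pinned_specs` is a hypothetical helper name, and pinning every package this way may be stricter than you need.

```shell
# Hypothetical helper: convert `conda list` output on stdin into one
# "name ==version" spec per line. Comment lines (starting with '#')
# and lines with fewer than two fields are dropped.
to_pinned_specs() {
    awk '!/^#/ && NF >= 2 { print $1 " ==" $2 }'
}

# Possible usage from an activated environment:
#   conda list | to_pinned_specs > "$CONDA_PREFIX/conda-meta/pinned"
```

To undo the pins, delete the pinned file or remove individual lines from it.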

Adding new third parties

After an RFC is adopted, the process of adding a new third party is simplified. It usually involves only adding it to the conda bleed file and getting a maintainer of the conda environment repo to regenerate the package’s files, although it’s possible to generate those files yourself.

If the third party library is to be compiled against and needs a sconsUtils config, you must now create the config in the configs directory of sconsUtils instead of putting such a file in the ups directory of the third party you would have added. See the configs directory in sconsUtils for some examples.

Modifying conda-bleed files

Conda bleed files have been slightly modified to have some structure similar to conda’s meta.yaml requirements sections.

build/host/run sections have been added with some packages moved into those sections. This may help provide some guidance to downstream conda packaging of the stack (e.g. stackvana or others).

The run section is not currently populated. It is intended for additional software that is not required to build, test, or use the stack but is needed to provide a useful deployment environment, such as on the lsst-dev “shared-stack”.

Added packages and version changes

For the conda environment, we have had to make a few compatibility pins to get things out the door.

These include:

  • boost = 1.70.0
  • pybind11 < 2.3
  • treecorr < 4
  • pyqt < 5.12 (Linux only)
    • PyQt is required by matplotlib; this is to avoid a bug in the pyqt recipe which reports additional packages in conda env export

There are a few notable exceptions where we still rely on software in eups which is also available in conda-forge:

  • eigen: jointcal (and packages depending on jointcal) will still set up the eups eigen package.

  • psfex: Our version of psfex has changed substantially from the version available from astromatic in conda-forge (astromatic-psfex) and they are not equivalent. Our fork has had autotools/Makefile changes from the upstream project backported to improve the builds with conda-forge.

This commit shows which packages have moved into conda-forge. These changes are summarized below.

  • apr 1.5.2→1.6.5
  • autograd 1.1→1.3
  • boost 1.69→1.70.0
  • cfitsio 3.360→3.470
  • eigen (See Note [1]) 3.3.9→3.3.9
  • esutil 0.63→0.64
  • galsim 2.2.1→2.2.3
  • gsl 2.6→2.6
  • healpy 1.10.3→1.13.0
  • libaprutil (apr_util) 1.5.4→1.6.1
  • lapack (new; required to build psfex)
  • lmfit 0.9.3→1.0.0
  • log4cxx 0.10.0→0.10.0
  • lsstdesc.coord (coord) 1.1→1.2.1
  • minuit2 5.34→6.18.00
  • mpi4py 3.0.0→3.0.3
  • mpich 3.2.1→3.3.2
  • ndarray 1.5.3→1.5.3
  • pybind11 2.2.4→2.2.4
  • starlink-ast 1.3.8→9.1.0
  • treecorr 3.2.3→3.3.11
  • wcslib 5.13→7.2
  • ws4py 0.4.2→0.5.1
  • xpa 2.1.15→2.1.20

Notes:

  1. eigen from conda is used if jointcal is not set up. If jointcal is set up, eigen from eups will supersede conda-forge eigen.

(Krzysztof Findeisen) #2

What does this mean for those of us working with the shared stack on lsst-dev? The use of conda compilers sounds incompatible with the use of devtoolset.


(K-T Lim) #3

scons should pick up the compilers automatically, depending on the environment set up by loadLSST.bash for the stack version you are using. But existing user clones may need to be cleaned and rebuilt, as can happen with any new shared stack update. lsstsw users will similarly need to do a new deploy.
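
One way to see which compilers your shell currently exposes is to print the CC and CXX variables, which the conda compiler packages export when their environment is activated. This is a rough sketch: `show_compilers` is a hypothetical helper, and it assumes that sourcing loadLSST.bash has activated the conda environment; it does not show what scons itself selects.

```shell
# Hypothetical helper: print the compiler variables visible in the
# current shell, marking any that are unset or empty.
show_compilers() {
    for var in CC CXX; do
        eval "value=\${$var:-}"
        printf '%s=%s\n' "$var" "${value:-<unset>}"
    done
}

show_compilers
```

If both variables are reported as unset after sourcing loadLSST.bash, the environment is probably not using the conda compilers.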


(Jim Bosch) #4

I’m happy to report a huge (for me and a few others, at least) side benefit of this change: our binaries now work on non-RedHat Linux distributions.

I tested this (on Mint 19.3, an Ubuntu 18.04 derivative) by:

Also, yes, you can use eups distrib with lsstsw, and I highly recommend it. If you keep your local versiondb in sync, rebuild should automatically use any (now binary!) installs you already have when they match what it would have built.

Anyhow, I hope this means we can update the docs and newinstall.sh to advertise and auto-select binaries for Linux, not just particular RH releases (and we could probably switch to just making binaries on one RH release now, and just test that they work on any others).


(Chris Walter) #5

Thanks for the work on this! A note and question:

On most Linux distributions, you should only need git, patch, curl, and make installed to get up and running with lsstsw or newinstall.

Since git seems to be in the new conda env should this statement be edited? Or is the order of the bootstrap such that you need it before conda is installed?

treecorr < 4

Could you say something about this and why it is necessary? @rmjarvis has added some important changes for calculating errors in treecorr in the recent releases.


(Michael Jarvis) #6

There are API changes between 3.x and 4.x. Pretty minor ones, but someone would probably need to run through the treecorr usage to make sure nothing breaks when switching to 4.x.

Ref. https://github.com/rmjarvis/TreeCorr/blob/releases/4.0/CHANGELOG.rst


(K-T Lim) #7

Pre-installing git from the Linux distribution will avoid a warning/question from newinstall.sh. That step will be removed from newinstall (or its equivalent) soon. We may be able to similarly remove all of the other OS-level prerequisites, at least on CentOS (where curl is part of the base distribution).